Editing Narration and Speech for Video

By Craig Anderton

With video presentations and pre-recorded video streams taking the place of many “in-person” activities (from church services to product demonstrations), the quality of recorded narration and speech is more important than ever. Fortunately, those working with video programs already have many of the tools needed to edit speech, without having to use dedicated audio software. Furthermore, audio plug-ins can provide additional functionality.

The examples given here are based on Magix Vegas, a popular video editing program for Windows. However, the same techniques apply to most other video editing programs, as well as audio-only programs if the speech is being edited outside of the video production environment.

Controlling P-Pops

P-pops result from plosives, when a sudden rush of air from a sound like “p” or “b” hits the mic. Although it’s best to prevent pops at the source with pop filters (Fig. 1) and proper mic technique (e.g., not speaking too close to the mic), editing can fix pops as well.

Fig. 1: Gator’s RI-Popfilter is an inexpensive, nylon-screen pop filter that controls pops at the source. Its C-clamp fits most standard mic stand shafts and booms.

Locate the pop in the audio; pops have a distinctive shape you’ll learn to recognize. Split the audio just before the pop begins, and add a fade-in. The fade-in’s length and curve control how much you reduce the pop (Fig. 2).

Fig. 2: The upper image shows the original file. The middle image shows fading in on a p-pop to reduce the low-frequency “pop” that can occur from plosives and directional mics. The bottom image shows that the “p” sound is still present, but its level has been controlled by the fade-in.
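To make the fade-in idea concrete, here is a minimal Python sketch that treats a clip as a list of floating-point samples. It is an illustration under stated assumptions, not how Vegas implements its fades: the function name `fade_in`, the ramp shape, and the `curve` exponent (values above 1.0 keep the level lower for longer at the start of the fade) are all hypothetical.

```python
def fade_in(samples, start, length, curve=1.0):
    """Return a copy of samples with a fade-in beginning at index start.

    Models splitting the clip just before the pop and fading in the
    right-hand clip: samples before start are untouched, the next `length`
    samples ramp up from silence, and everything after stays at full level.
    curve=1.0 is a linear ramp; hypothetical values above 1.0 hold the
    level down longer at the start of the fade.
    """
    if length <= 0:
        raise ValueError("fade length must be positive")
    out = list(samples)
    for i in range(start, min(start + length, len(out))):
        t = (i - start) / length  # 0.0 at the split point, approaching 1.0
        out[i] *= t ** curve
    return out
```

A steeper curve trades a little more of the consonant for a bigger reduction in the low-frequency thump, so it's worth comparing a couple of settings by ear.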

Reducing Breath Noise

Although it’s possible to cut breath noises, removing all of them makes speech sound unnatural. We’re human, and we breathe. I usually cut some inhales, but reduce the volume of others. To do this, split the audio clip at the start and end of the inhale, and reduce only the inhale’s level. Depending on the program, you can do this with volume automation or by varying the clip’s level (Fig. 3).

Fig. 3: Gain is being reduced for only the isolated clip of breath noise.
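Reducing an inhale’s level amounts to multiplying only the samples between the two split points by a gain below 1.0. The sketch below assumes the clip is a list of floats and that the split points are already known as sample indices; `reduce_region` is a hypothetical helper, and the gain is given in dB because that is how most editors display clip level.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear multiplier (-6 dB is about 0.5)."""
    return 10 ** (db / 20)

def reduce_region(samples, start, end, db):
    """Hypothetical helper: return a copy with samples[start:end] changed by db decibels."""
    out = list(samples)
    gain = db_to_gain(db)
    for i in range(max(0, start), min(end, len(out))):
        out[i] *= gain
    return out

# Example: an inhale between samples 2 and 4, pulled down by 12 dB.
quieter = reduce_region([0.3, 0.3, 0.2, 0.2, 0.3, 0.3], 2, 4, -12.0)
```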

De-Essing Sibilants

Those nasty “ess” sounds can be a problem, particularly if you use compression or limiting, or boost the treble to make the voice more intelligible. Although there are de-esser plug-ins (like IK Multimedia’s T-Racks De-Esser) and dedicated hardware units, you can also use a Multiband Dynamics plug-in (Fig. 4) and compress only the high frequencies to reduce ess sounds.

Fig. 4: The Multiband Dynamics processor in Vegas reduces ess sounds automatically.
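The core idea of compressing only the high frequencies can be sketched in a few dozen lines. This is a simplified split-band de-esser, not the algorithm inside Vegas’s Multiband Dynamics or T-Racks De-Esser: it assumes a one-pole (6 dB/oct) crossover, and the 5 kHz crossover frequency, threshold, ratio, and attack/release times are illustrative values only.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order (6 dB/oct) low-pass filter used here as a simple crossover."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def split_band_deess(samples, sample_rate, crossover_hz=5000.0, threshold=0.1,
                     ratio=4.0, attack_s=0.001, release_s=0.05):
    """Compress only the band above crossover_hz (illustrative defaults, not a plug-in's)."""
    low = one_pole_lowpass(samples, crossover_hz, sample_rate)
    high = [x - lo for x, lo in zip(samples, low)]  # low + high == input
    attack = 1.0 - math.exp(-1.0 / (attack_s * sample_rate))
    release = 1.0 - math.exp(-1.0 / (release_s * sample_rate))
    env = 0.0
    out = []
    for lo, hi in zip(low, high):
        level = abs(hi)
        # Envelope follower: rise quickly on ess onsets, fall back more slowly.
        env += (attack if level > env else release) * (level - env)
        if env > threshold:
            # Above threshold, the high band's level grows only 1/ratio as fast.
            gain = (threshold + (env - threshold) / ratio) / env
        else:
            gain = 1.0
        out.append(lo + hi * gain)
    return out
```

Deriving the high band as the input minus the low band means the two bands always sum back to the original signal, so whenever the high band stays below the threshold, the output is identical to the input.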

When you need a really fine degree of control, you can also de-ess manually. The technique is similar to fixing breath noises: locate the sibilant (its waveform will look like a dense ball of sound; see Fig. 5), split before and after the ess, and lower the level.

Fig. 5: It’s possible to de-ess manually by locating the ess sound, isolating it, and reducing its level.
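The “dense” look of a sibilant comes from the waveform crossing zero far more often than it does during voiced sounds. A rough, hypothetical way to flag candidate regions automatically is to count zero crossings per window. The window size and thresholds below are assumptions that would need tuning for a given voice and sample rate, and flagged regions still need checking by ear before you lower their level.

```python
def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
    return crossings / max(1, len(window) - 1)

def find_sibilant_windows(samples, window_size, zcr_threshold=0.2, min_peak=0.02):
    """Return (start, end) windows that look dense enough to be sibilants.

    Hypothetical thresholds: zcr_threshold separates hissy, high-frequency
    content from voiced speech; min_peak skips near-silent windows, where
    background noise also crosses zero often.
    """
    hits = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        loud_enough = max(abs(v) for v in window) >= min_peak
        if loud_enough and zero_crossing_rate(window) >= zcr_threshold:
            hits.append((start, start + window_size))
    return hits
```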

Attenuating Wind Noise

When recording outside, especially with a shotgun or lavalier mic, wind noise often comes along as an unwelcome guest. As with p-pops, it’s better to stop this at the source with an appropriate acoustic wind shield (Fig. 6).

Fig. 6: The Røde Minifur-Lav is an artificial fur wind shield designed specifically for lavalier mics.

However, if the wind noise is already “baked into” a track, you still have some options. Fortunately, most wind noise consists of low frequencies, many of which are below the range of the human voice. Adding a low-cut filter just below the voice range can help; use the steepest rolloff available (see Fig. 7). This probably won’t eliminate the wind noise, but it will likely lower its volume.

Fig. 7: Vegas’s Track EQ plug-in is using a low-cut shelf filter to reduce low frequencies. Note the relatively sharp filter rolloff of 24 dB/oct.
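For readers curious how a 24 dB/oct low cut can be built, one common approach (assumed here, and not necessarily what Vegas’s Track EQ does internally) cascades two second-order high-pass sections into a fourth-order Butterworth filter. The sketch below uses the high-pass formulas from Robert Bristow-Johnson’s Audio EQ Cookbook; the cutoff frequency is left to the caller.

```python
import math

def highpass_biquad(cutoff_hz, q, sample_rate):
    """High-pass biquad coefficients (RBJ Audio EQ Cookbook), normalized by a0."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = ((1.0 + cos_w0) / (2.0 * a0), -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / (2.0 * a0))
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)  # a1, a2
    return b, a

def run_biquad(samples, b, a):
    """Filter samples with one biquad (transposed direct form II)."""
    z1 = z2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + z1
        z1 = b[1] * x - a[0] * y + z2
        z2 = b[2] * x - a[1] * y
        out.append(y)
    return out

def low_cut_24db(samples, cutoff_hz, sample_rate):
    """Fourth-order Butterworth high-pass built from two cascaded biquads."""
    out = samples
    for k in range(2):
        # Butterworth section Qs for order 4: about 0.541 and 1.307
        q = 1.0 / (2.0 * math.cos(math.pi * (2 * k + 1) / 8.0))
        b, a = highpass_biquad(cutoff_hz, q, sample_rate)
        out = run_biquad(out, b, a)
    return out
```

Each biquad rolls off at 12 dB/oct below the cutoff, so two in series give the 24 dB/oct slope mentioned in the caption.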

Dealing with Mouth Clicks

These are sharp, trebly, short-duration transients that distract from the narration. The easiest solution is to use iZotope’s RX7 restoration plug-in, whose Mouth De-click function (available in all RX7 versions except RX7 Elements) is almost 100% effective. If you don’t have RX7, locate the click, which will look like a needle in the waveform (Fig. 8).

Fig. 8: A mouth click (circled in white) is an extremely short, high-frequency transient.
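The “needle” can also be located programmatically. One simple, hypothetical heuristic compares each sample with the average of its two neighbors: smooth speech waveforms change gradually, so a sample that deviates sharply from both neighbors stands out. Real mouth clicks often span a few samples, so treat this as a rough locator under that simplifying assumption, not a replacement for a dedicated de-clicker.

```python
def find_clicks(samples, threshold):
    """Return indices of samples that jump away from both neighbors.

    Hypothetical heuristic: compare each sample with the average of the
    samples on either side. Smooth speech changes gradually, so the
    difference stays small; a one-sample needle makes it large.
    """
    hits = []
    for i in range(1, len(samples) - 1):
        neighbor_avg = 0.5 * (samples[i - 1] + samples[i + 1])
        if abs(samples[i] - neighbor_avg) > threshold:
            hits.append(i)
    return hits
```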

When you split a waveform in Vegas, it adds a short, automatic fade-in and fade-out to prevent clicks at the split point. We can use this to our advantage with mouth clicks: split directly in the center of the click, and the fades will eliminate it most of the time (Fig. 9). If not, simply cut the section containing the click. Mouth clicks are so short that removing one will generally not produce an audible discontinuity.

Fig. 9: Splitting exactly on a mouth click in Vegas will almost always get rid of it.
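The effect of splitting on the click can be modeled as a short gain dip centered on the split point: the left clip fades out to silence and the right clip fades in from silence. The sketch below assumes linear fades, and `fade_len` is a hypothetical parameter, since the length of Vegas’s automatic fades isn’t specified here.

```python
def split_with_fades(samples, split_index, fade_len):
    """Model a split at split_index with short automatic fades.

    The clip to the left fades out over its last fade_len samples and the
    clip to the right fades in over its first fade_len samples, so the level
    dips to zero right at the split. fade_len is a hypothetical value.
    """
    out = list(samples)
    for k in range(fade_len):
        gain = k / fade_len  # 0.0 at the split, rising with distance from it
        left, right = split_index - 1 - k, split_index + k
        if 0 <= left < len(out):
            out[left] *= gain
        if 0 <= right < len(out):
            out[right] *= gain
    return out
```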

Phrase-by-Phrase Gain Changes

To achieve a consistent voice level, many people use a compressor or limiter to narrow the dynamic range. However, this can sometimes produce pumping and other undesirable sonic artifacts. I prefer to use normalization or gain changes to make individual phrases more uniform (Fig. 10). Because you’re only changing level (it’s no different from turning a volume control), there are no artifacts. Then, if you want to add compression or limiting, you can use much less of it, and attain better consistency than you could with dynamics processing alone.

Fig. 10: The upper and lower waveforms are the same audio file. However, the lower version has been split in multiple places to isolate individual phrases, and the gain varied for the phrases to create a more uniform level.
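Phrase-by-phrase normalization can be sketched as finding each phrase’s peak and scaling that phrase alone so its peak hits a common target. The phrase boundaries are assumed to be known sample ranges (the split points in Fig. 10), and the -6 dBFS target is an arbitrary illustrative default; some editors normalize to RMS or loudness rather than peak.

```python
def normalize_phrases(samples, phrases, target_db=-6.0):
    """Scale each (start, end) phrase so its peak reaches target_db dBFS.

    Samples outside the listed phrases (pauses, room tone) are left alone.
    Returns the processed samples and the linear gain applied to each phrase.
    """
    target = 10 ** (target_db / 20)
    out = list(samples)
    gains = []
    for start, end in phrases:
        peak = max((abs(v) for v in samples[start:end]), default=0.0)
        gain = target / peak if peak > 0 else 1.0  # leave silent phrases alone
        for i in range(start, end):
            out[i] *= gain
        gains.append(gain)
    return out, gains
```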

Adjusting Equalization

The ear is most sensitive around 3-4 kHz, so boosting voice frequencies in that range will make the voice “pop” more. However, a little goes a long way; excessive boosting can make the voice harsh and screechy. Similarly, adding a gentle high-frequency shelf can increase overall articulation, while trimming the low frequencies helps reduce “boominess” (although you don’t want to cut too much, or the voice will lose “warmth”). Fig. 11 shows a typical EQ curve for a male voice.

Fig. 11: This EQ curve is designed to make a male voice stand out more. The exact settings will differ for various speakers, because their voices have different timbres.
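As an illustration of the presence boost, the following sketch computes coefficients for a peaking EQ band using the Audio EQ Cookbook formulas and evaluates its frequency response. The +3 dB at 3.5 kHz with a Q of 1.0 are assumed example values, not the settings shown in Fig. 11.

```python
import cmath
import math

def peaking_eq(center_hz, gain_db, q, sample_rate):
    """Peaking EQ biquad coefficients (RBJ Audio EQ Cookbook), unnormalized."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = (1 + alpha * A, -2 * cos_w0, 1 - alpha * A)
    a = (1 + alpha / A, -2 * cos_w0, 1 - alpha / A)
    return b, a

def response_db(b, a, freq_hz, sample_rate):
    """Magnitude response in dB at freq_hz, evaluated on the unit circle."""
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))

# Assumed example: a gentle +3 dB presence boost centered at 3.5 kHz.
b, a = peaking_eq(3500.0, 3.0, 1.0, 48000)
```

At the center frequency the response equals the requested boost, and it returns toward 0 dB far from the center; a lower Q widens the boosted region.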

Specialized Vocal Processing Tools

We mentioned RX7’s mouth de-click function, but the entire family of iZotope RX7 audio restoration tools is exceptional. Recently I was given some audio to repair that was recorded by actress Ali MacGraw, and the voice had a considerable amount of background noise throughout. RX7’s Voice De-Noise function (Fig. 12) removed most of it, which made a major improvement in the overall quality.

Fig. 12: Voice is being separated from the noise, so that the noise can be attenuated independently.

RX7 even has a function that lets you split a phrase that runs together into two phrases, and change the ending pitch of the first phrase so it sounds like the person actually ended the sentence there, sort of like adding a virtual period.

Of the various RX7 versions, iZotope RX7 Elements has basic repair options, but I’d recommend the Standard version, which has a comprehensive array of tools. The Advanced version offers features useful for complex applications, like fixing problematic movie dialog. For example, it can isolate dialog from noise, “de-rustle” the sound that often occurs with lavalier mics, reduce wind noise, minimize reverb, and even match ambiances, which is ideal when looping dialog that needs to match the sound of dialog recorded on location.

Magix Sound Forge Pro is another option. This program’s main orientation is sophisticated audio editing, but it also includes restoration tools of its own, and incorporates iZotope RX Elements and iZotope Ozone Elements for signal processing. If you don’t have extreme restoration needs, Sound Forge Pro is an excellent choice because of its extensive editing and format conversion capabilities. Among other things, its editing tools let you remove “ummms,” coughs, and the like.
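Removing an “ummm” or cough is essentially a cut, and a short crossfade across the join helps avoid a click where the two sides meet. The sketch below assumes a linear crossfade and a hypothetical `cut_with_crossfade` helper; it doesn’t describe how Sound Forge Pro performs its edits.

```python
def cut_with_crossfade(samples, start, end, xfade):
    """Hypothetical helper: remove samples[start:end] with a crossfade at the join.

    The last xfade samples before the cut overlap the first xfade samples
    after it, using a linear fade-out/fade-in, so the result is
    (end - start) + xfade samples shorter than the input.
    """
    if start < xfade or end + xfade > len(samples):
        raise ValueError("not enough audio on both sides for the crossfade")
    out = list(samples[:start - xfade])
    for k in range(xfade):
        t = (k + 1) / (xfade + 1)  # fade-in weight for the audio after the cut
        out.append(samples[start - xfade + k] * (1 - t) + samples[end + k] * t)
    out.extend(samples[end + xfade:])
    return out
```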

The Fix Is In

It’s worth taking the time to use these techniques to turn “okay” narration or presentations into polished, professional-level audio. The impact of quality audio on the listener is huge, and regardless of the subject matter, superior sound quality not only attracts attention, but gives a feeling of confidence in what’s being said.
