GearCast - Editing Narration and Speech for Video

Editing Narration and Speech for Video

By Craig Anderton

Published 5/7/2020

With video presentations and pre-recorded video streams taking the place of many “in-person” activities (from church services to product demonstrations), the quality of recorded narration and speech is more important than ever. Fortunately, those working with video programs already have many of the tools needed to edit speech, without having to use dedicated audio software. Furthermore, audio plug-ins can provide additional functionality.

The examples given here are based on Magix Vegas, a popular video editing program for Windows. However, the same techniques apply to most other video editing programs, as well as audio-only programs if the speech is being edited outside of the video production environment.

Controlling P-Pops

P-pops result from plosives, when a sudden rush of air from a sound like “p” or “b” hits the mic. Although it’s best to prevent pops at the source with pop filters (Fig. 1) and proper mic technique (e.g., not speaking too close to the mic), editing can fix pops as well.

Pop Filter — **Fig. 1:** **Gator’s RI-Popfilter** is an inexpensive, nylon-screen pop filter that controls pops at the source. Its C-clamp fits most standard mic stand shafts and booms.

Locate the pop in the audio; pops have a distinctive shape you’ll learn to recognize. Split the audio just before the pop begins, and add a fade-in. The fade-in length and curve controls how much you reduce the pop (Fig. 2).

P-Pop reduce — **Fig. 2:** The upper image shows the original file. The middle image shows fading in on a p-pop to reduce the low-frequency “pop” that can occur from plosives and directional mics. The bottom image shows that the “p” sound is still present, but its level has been controlled by the fade-in.
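
For readers who process audio outside a video editor, the split-and-fade idea can be sketched in a few lines of Python. This is only an illustration, assuming the audio is a plain list of floating-point samples. The function name `apply_fade_in` and the `curve` exponent are hypothetical and are not part of Vegas.

```python
def apply_fade_in(samples, start, length, curve=2.0):
    """Return a copy of samples with a fade-in that begins at index start.

    start  -- sample index just before the pop (the "split" point)
    length -- fade duration in samples; longer fades reduce the pop more
    curve  -- hypothetical shape control: 1.0 is linear, higher values
              keep the level lower for longer
    """
    if length <= 0:
        raise ValueError("length must be positive")
    out = list(samples)
    end = min(start + length, len(out))
    for i in range(start, end):
        t = (i - start) / length   # 0.0 at the split, approaching 1.0
        out[i] *= t ** curve       # audio before the split is untouched
    return out
```

A longer `length` or a higher `curve` value corresponds to a more aggressive fade, which trades more pop reduction against a softer “p” sound.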

Reducing Breath Noise

Although it’s possible to cut breath noises, if you remove all of them speech sounds unnatural; we’re humans, and we breathe. I usually cut some inhales, but also reduce the volume of others. You do this by splitting an audio clip at the start and end of the inhale, and reducing only the inhale’s level. Depending on the program, this can be done with volume automation, or varying the clip’s level (Fig. 3).

Reduce Breath Noise — **Fig. 3:** Gain is being reduced for only the isolated clip of breath noise.
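
The same operation can be sketched in Python as a gain change applied only between two split points. This is a minimal illustration that assumes the audio is a list of float samples. The function name `reduce_region` and the example gain value are hypothetical.

```python
def reduce_region(samples, start, end, gain_db):
    """Return a copy of samples with gain_db applied to indices start..end-1.

    A negative gain_db lowers the level, e.g. -12 dB for a breath.
    """
    factor = 10 ** (gain_db / 20)   # convert decibels to a linear multiplier
    return [s * factor if start <= i < end else s
            for i, s in enumerate(samples)]
```

For example, `reduce_region(audio, 4800, 9600, -12.0)` would lower a breath that occupies samples 4800 through 9599 by 12 dB while leaving the surrounding speech at its original level.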

De-Essing Sibilants

Those nasty “ess” sounds can be a problem, particularly if you use compression or limiting, or increase the treble to make the voice more intelligible. Although there are de-esser plug-ins (like IK Multimedia’s T-Racks De-Esser) and dedicated hardware units, you can also use a Multiband Dynamics plug-in (Fig. 4), and compress only the high frequencies to reduce ess sounds.

Multiband dynamics — **Fig. 4:** The Multiband Dynamics processor in Vegas reduces ess sounds automatically.

When you need a really fine degree of control, you can also de-ess manually. The technique is similar to fixing breath noises. Locate the sibilant (the waveform will look like a ball of dense sound; see Fig. 5), split before and after the ess, and lower the level.

Reduce Sibilance — **Fig. 5:** It’s possible to de-ess manually by locating the ess sound, isolating it, and reducing its level.

Attenuating Wind Noise

When recording outside, especially with a shotgun or lavalier mic, wind noise often comes along as an unwelcome guest. As with p-pops, it’s better to stop this at the source with an appropriate acoustic wind shield filter (Fig. 6).

Wind shield — **Fig. 6:** The **Røde Minifur-Lav** is an artificial fur wind shield designed specifically for lavalier mics.

However, if the wind noise is already “baked into” a track, you still have some options. Fortunately, most wind noise consists of low frequencies, many of which are below the range of the human voice. Adding a steep low-cut filter just below the voice range can help (use the steepest rolloff possible; see Fig. 7). This probably won’t eliminate wind noise, but its volume will likely be lower.

Reduce Wind — **Fig. 7:** **Vegas’s Track EQ plug-in** is using a low-cut shelf filter to reduce low frequencies. Note the relatively sharp filter rolloff of 24 dB/oct.
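
To show what a 24 dB/oct low cut does to a signal, here is a rough Python sketch. It is not how Vegas’s Track EQ is implemented; it swaps in a standard fourth-order Butterworth high-pass filter built from two cascaded second-order sections (the widely used “Audio EQ Cookbook” biquad formulas). The function names and the 80 Hz default cutoff are assumptions chosen for illustration.

```python
import math

def highpass_biquad(samples, sample_rate, cutoff_hz, q):
    """One second-order (12 dB/oct) high-pass section."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def low_cut_24db(samples, sample_rate, cutoff_hz=80.0):
    """Cascade two sections (Butterworth Q values) for a 24 dB/oct slope.

    The 80 Hz default is an assumed starting point, not a value from Vegas.
    """
    stage1 = highpass_biquad(samples, sample_rate, cutoff_hz, 0.5412)
    return highpass_biquad(stage1, sample_rate, cutoff_hz, 1.3066)
```

With these settings, content well below the cutoff (where much wind rumble lives) is strongly attenuated, while frequencies in the main voice range pass through nearly unchanged.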

Dealing with Mouth Clicks

These are sharp, trebly, short-duration transients that distract from the narration. The easiest solution is to use iZotope’s RX7 restoration plug-in, whose Mouth De-Clicker function (available in all RX7 versions except RX7 Elements) is almost 100% effective. If you don’t have RX7, locate the click, which will look like a needle in the waveform (Fig. 8).

When you split a waveform in Vegas, it adds an automatic, short fade-in and fade-out to prevent clicks at the split point. We can use this to advantage with mouth clicks: split directly in the center of the click, and the fades will eliminate it most of the time (Fig. 9). If not, simply cut the section with the click. Mouth clicks are of such short duration that cutting one out will generally not produce an audible discontinuity.

Kill Mouth Click with Split — **Fig. 9:** Splitting exactly on a mouth click in Vegas will almost always get rid of it.
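
The effect of splitting on the click can be approximated in Python as a very short fade-out into the split point followed by a very short fade-in after it. This is only a sketch: the actual fade length and shape Vegas uses are not stated here, so the linear ramp, the function name `split_with_fades`, and the `fade_len` parameter are all assumptions.

```python
def split_with_fades(samples, split, fade_len):
    """Return a copy of samples with short linear fades around index split.

    The sample just before and just at the split are silenced, and the
    level ramps back to full over fade_len samples on each side.
    """
    if fade_len <= 0:
        raise ValueError("fade_len must be positive")
    out = list(samples)
    for k in range(fade_len):
        gain = k / fade_len        # 0.0 at the split, rising away from it
        left = split - 1 - k       # fade-out side (before the split)
        right = split + k          # fade-in side (after the split)
        if left >= 0:
            out[left] *= gain
        if 0 <= right < len(out):
            out[right] *= gain
    return out
```

Because a mouth click typically lasts only a handful of samples, a fade this short can cover it without audibly affecting the speech on either side.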

Phrase-by-Phrase Gain Changes

To achieve a consistent voice level, many people use a compressor or limiter to narrow the dynamic range. However, sometimes this can produce pumping and other undesirable sonic artifacts. I prefer to use normalization or gain changes to make individual phrases more uniform (Fig. 10). Because you’re only changing level (it’s no different from turning up a volume control), there are no artifacts. Then, if you want to add compression or limiting, you can use a much lesser amount, and attain better consistency than you could by using dynamics processing alone.
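
As an illustration of phrase-by-phrase normalization, the following Python sketch scales each phrase so its peak reaches a common target. It assumes the audio is a list of float samples and that the phrase boundaries have already been found by ear or by eye. The function name `normalize_phrases` and the `target_peak` default are hypothetical.

```python
def normalize_phrases(samples, phrases, target_peak=0.5):
    """Return a copy of samples with each phrase scaled to target_peak.

    phrases -- list of (start, end) index pairs, end exclusive
    Each phrase gets one constant gain, so its internal dynamics are
    preserved; only the overall level of the phrase changes.
    """
    out = list(samples)
    for start, end in phrases:
        peak = max((abs(s) for s in out[start:end]), default=0.0)
        if peak > 0.0:                  # skip silent phrases
            gain = target_peak / peak
            for i in range(start, end):
                out[i] *= gain
    return out
```

Since every sample within a phrase is multiplied by the same constant, this is equivalent to moving a volume control once per phrase, which is why it avoids the pumping a compressor can introduce.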

Adjusting Equalization

The ear is most sensitive around 3-4 kHz, so boosting voice frequencies in that range will make the voice “pop” more. However, a little goes a long way; excessive boosting can make the voice harsh and screechy. Similarly, adding a gentle high-frequency shelf can increase the overall articulation, while trimming the low frequencies helps reduce “boominess” (although you don’t want to cut too much, or the voice will lose “warmth”). Fig. 11 shows a typical EQ curve for male voice.

Vocal EQ — **Fig. 11:** This EQ curve is designed to make a male voice stand out more. The exact settings will differ for various speakers, because their voices have different timbres.

Specialized Vocal Processing Tools

We mentioned RX7’s mouth de-click function, but the entire family of iZotope RX7 audio restoration tools is exceptional. Recently I was given some audio to repair that was recorded by actress Ali MacGraw, and the voice had a considerable amount of background noise throughout. RX7’s Voice De-Noise function (Fig. 12) got rid of most of it, which made a major improvement in the overall quality.

Voice De-Noise — **Fig. 12:** Voice is being separated from the noise, so that the noise can be attenuated independently.

RX7 even has a function where you can split a phrase that runs together into two phrases, and change the ending pitch of the first phrase so that it sounds like the person actually ended the sentence there, sort of like adding a virtual period.

Of the various RX7 versions, iZotope RX7 Elements has basic repair options, but I’d recommend the Standard version, which has a comprehensive array of tools. The Advanced version offers features useful for complex applications, like fixing problematic movie dialog. For example, it can isolate dialog from noise, “de-rustle” the sound that often occurs with lavalier mics, reduce wind noise, minimize reverb, and even match ambiances, which is ideal when looping dialog that needs to match the sound of dialog recorded on location.

Magix Sound Forge Pro is another option. This software program’s main orientation is sophisticated audio editing, but it also includes restoration tools of its own, and incorporates iZotope RX Elements and iZotope Ozone Elements for signal processing. If you don’t have extreme restoration needs, Sound Forge Pro is an excellent choice because of the extensive editing and format conversion capabilities. Among other functions, editing allows removing “ummms,” coughs, and the like.

The Fix Is In

It’s worth taking the time to use these techniques to turn “okay” narration or presentations into polished, professional-level audio. The impact of quality audio on the listener is huge, and regardless of the subject matter, superior sound quality not only attracts attention, but gives a feeling of confidence in what’s being said.

