With video presentations and pre-recorded video streams taking the place of many “in-person” activities, from church services to product demonstrations, the quality of recorded narration and speech is more important than ever. Fortunately, those working with video programs already have many of the tools needed to edit speech, without having to use dedicated audio software. Furthermore, audio plug-ins can provide additional functionality.
The examples given here are based on Magix Vegas, a popular video editing program for Windows. However, the same techniques apply to most other video editing programs, as well as audio-only programs if the speech is being edited outside of the video production environment.
Controlling P-Pops
P-pops result from plosives, when a sudden rush of air from a sound like “p” or “b” hits the mic. Although it’s best to prevent pops at the source with pop filters (Fig. 1) and proper mic technique (e.g., not speaking too close to the mic), editing can fix pops as well.
Locate the pop in the audio; pops have a distinctive shape you’ll learn to recognize. Split the audio just before the pop begins, and add a fade-in. The fade-in length and curve control how much you reduce the pop (Fig. 2).
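For readers who process audio outside a video editor, the fade-in can be sketched in a few lines of Python. This is only an illustration of the idea, not how Vegas implements its fades: the function name, the `curve` exponent, and the toy sample values are all assumptions made for the example.

```python
def apply_fade_in(samples, start, length, curve=2.0):
    """Return a copy of samples with a fade-in of `length` samples beginning at `start`.

    Samples before `start` are untouched (they belong to the previous clip).
    curve=1.0 is linear; larger values hold the gain low for longer,
    which suppresses more of the pop. Both values are illustrative.
    """
    out = list(samples)
    end = min(start + length, len(out))
    for i in range(start, end):
        position = (i - start) / length  # 0.0 at the split, approaching 1.0
        out[i] *= position ** curve
    return out


# A toy "pop": a burst of full-scale samples right after the split point.
clip = [0.1, 0.1, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2]
faded = apply_fade_in(clip, start=2, length=4, curve=2.0)
```

A longer `length` or a higher `curve` removes more of the pop, at the cost of softening the start of the word, which is the same trade-off you make when dragging the fade handle in the editor.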
Reducing Breath Noise
Although it’s possible to cut breath noises, removing all of them makes speech sound unnatural; we’re humans, and we breathe. I usually cut some inhales and reduce the volume of others. To do this, split the audio clip at the start and end of the inhale, and reduce only the inhale’s level. Depending on the program, this can be done with volume automation, or by varying the clip’s level (Fig. 3).
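Lowering only the inhale amounts to multiplying one region of the audio by a gain below 1. The sketch below shows the arithmetic, assuming the usual 20·log10 relationship between decibels and amplitude; the function names, the 12 dB reduction, and the sample values are hypothetical choices for illustration.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def reduce_region(samples, start, end, reduction_db):
    """Return a copy with samples[start:end] lowered by reduction_db decibels."""
    gain = db_to_gain(-abs(reduction_db))
    return [s * gain if start <= i < end else s for i, s in enumerate(samples)]


# Indexes 2 to 4 stand in for an inhale between two words.
speech = [0.5, 0.5, 0.2, 0.2, 0.2, 0.5]
quieter = reduce_region(speech, start=2, end=5, reduction_db=12)
```

A 12 dB cut leaves the breath at roughly a quarter of its original amplitude, so it is still present but no longer draws attention.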
De-Essing Sibilants
Those nasty “ess” sounds can be a problem, particularly if you use compression or limiting, or increase the treble to make the voice more intelligible. Although there are de-esser plug-ins (like IK Multimedia’s T-Racks De-Esser) and dedicated hardware units, you can also use a Multiband Dynamics plug-in (Fig. 4), and compress only the high frequencies to reduce ess sounds.
When you need a really fine degree of control, you can also de-ess manually. The technique is similar to fixing breath noises. Locate the sibilant (the waveform will look like a ball of dense sound; see Fig. 5), split before and after the ess, and lower the level.
Attenuating Wind Noise
When recording outside, especially with a shotgun or lavalier mic, wind noise often comes along as an unwelcome guest. As with p-pops, it’s better to stop this at the source with an appropriate acoustic wind shield filter (Fig. 6).
However, if the wind noise is already “baked into” a track, you still have some options. Fortunately, most wind noise consists of low frequencies, many of which are below the range of the human voice. Adding a steep low-cut filter just below the voice range can help (use the steepest rolloff possible; see Fig. 7). This probably won’t eliminate the wind noise, but its volume will likely be lower.
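To show what a steep low-cut does, here is a minimal sketch that cascades two second-order high-pass sections, using the widely published Audio EQ Cookbook coefficient formulas. The 80 Hz cutoff, the Q of 0.7071, the two-stage cascade, and the test tones are assumptions for illustration; the right cutoff depends on the voice, and an editor's built-in filter may be designed differently.

```python
import math


def highpass_coefficients(cutoff_hz, sample_rate, q=0.7071):
    """Second-order high-pass coefficients (Audio EQ Cookbook form), scaled so a0 = 1."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Run one second-order section over the samples (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def low_cut(samples, cutoff_hz, sample_rate, stages=2):
    """Cascade identical high-pass sections; each stage adds about 12 dB/octave of slope."""
    b, a = highpass_coefficients(cutoff_hz, sample_rate)
    for _ in range(stages):
        samples = biquad(samples, b, a)
    return samples


rate = 48000
rumble = [math.sin(2 * math.pi * 30 * n / rate) for n in range(rate)]   # stand-in for wind
voice = [math.sin(2 * math.pi * 300 * n / rate) for n in range(rate)]   # stand-in for speech
rumble_out = low_cut(rumble, cutoff_hz=80, sample_rate=rate)
voice_out = low_cut(voice, cutoff_hz=80, sample_rate=rate)
```

In this sketch the 30 Hz tone comes out far quieter while the 300 Hz tone passes almost unchanged, which mirrors the point above: the filter lowers the wind noise rather than removing it, and only works for the part of the noise that sits below the voice.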
Dealing with Mouth Clicks
These are sharp, trebly, short-duration transients that distract from the narration. The easiest solution is to use iZotope’s RX7 restoration plug-in, whose Mouth De-Clicker function (available in all RX7 versions except RX7 Elements) is almost 100% effective. If you don’t have RX7, locate the click, which will look like a needle in the waveform (Fig. 8).
When you split a waveform in Vegas, it adds an automatic, short fade-in and fade-out to prevent clicks at the split point. We can use this to advantage with mouth clicks: split directly in the center of the click, and the fades will eliminate it most of the time (Fig. 9). If not, simply cut the section with the click. Mouth clicks are of such short duration that cutting them will generally not produce an audible discontinuity.
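The effect of splitting at the center of a click can be sketched as a short fade-out into the split point followed by a short fade-in after it. The fade length and linear shape below are assumptions, since the exact fades Vegas applies are not specified here, and the function name and sample values are hypothetical.

```python
def split_with_fades(samples, split_index, fade_len):
    """Mimic a split with automatic short fades: fade out into the split, fade in after it."""
    out = list(samples)
    for k in range(fade_len):
        gain = k / fade_len  # 0.0 right at the split, rising with distance from it
        before = split_index - 1 - k
        after = split_index + k
        if before >= 0:
            out[before] *= gain
        if after < len(out):
            out[after] *= gain
    return out


# The click occupies indexes 3 and 4; the split lands between them.
audio = [0.3, 0.3, 0.3, 0.9, 0.9, 0.3, 0.3, 0.3]
repaired = split_with_fades(audio, split_index=4, fade_len=2)
```

Because the click is only a few samples wide, fades this short can silence it without removing any audible part of the surrounding speech.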
Phrase-by-Phrase Gain Changes
To achieve a consistent voice level, many people use a compressor or limiter to narrow the dynamic range. However, sometimes this can produce pumping and other undesirable sonic artifacts. I prefer to use normalization or gain changes to make individual phrases more uniform (Fig. 10). Because you’re only changing level (it’s no different from turning up a volume control), there are no artifacts. Then, if you want to add compression or limiting, you can use a much lesser amount, and attain better consistency than you could by using dynamics processing alone.
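Phrase-by-phrase normalization can be sketched as one gain value per phrase, chosen so that each phrase peaks at the same level. The phrase boundaries, the 0.8 target peak, and the function name below are hypothetical; in an editor you would mark the phrases by splitting the clip and normalizing each piece.

```python
def normalize_phrases(samples, phrases, target_peak=0.8):
    """Scale each (start, end) phrase so its peak reaches target_peak."""
    out = list(samples)
    for start, end in phrases:
        peak = max((abs(s) for s in out[start:end]), default=0.0)
        if peak == 0.0:
            continue  # silent phrase: nothing to scale
        gain = target_peak / peak  # one fixed gain per phrase, like a volume knob
        for i in range(start, end):
            out[i] *= gain
    return out


# Two phrases separated by a silent sample; the second was spoken much more quietly.
narration = [0.2, -0.4, 0.1, 0.0, 0.05, -0.1, 0.08]
leveled = normalize_phrases(narration, phrases=[(0, 3), (4, 7)], target_peak=0.8)
```

Each phrase gets a single constant gain, so the level within a phrase never shifts the way it can under a compressor; only the differences between phrases are evened out.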
Adjusting Equalization
The ear is most sensitive around 3-4 kHz, so boosting voice frequencies in that range will make the voice “pop” more. However, a little goes a long way; excessive boosting can make the voice harsh and screechy. Similarly, adding a gentle high-frequency shelf can increase the overall articulation, while trimming the low frequencies helps reduce “boominess” (although you don’t want to cut too much, or the voice will lose “warmth”). Fig. 11 shows a typical EQ curve for male voice.
Specialized Vocal Processing Tools
We mentioned RX7’s mouth de-click function, but the entire family of iZotope RX7 audio restoration tools is exceptional. Recently I was given some audio to repair that was recorded by actress Ali MacGraw, and the voice had a considerable amount of background noise throughout. RX7’s Voice De-Noise function (Fig. 12) got rid of most of it, which made a major improvement in the overall quality.
RX7 even has a function where you can split a phrase that runs together into two phrases, and change the ending pitch of the first phrase so that it sounds like the person actually ended the sentence there, sort of like adding a virtual period.
Of the various RX7 versions, iZotope RX7 Elements has basic repair options, but I’d recommend the Standard version, which has a comprehensive array of tools. The Advanced version offers features useful for complex applications, like fixing problematic movie dialog. For example, it can isolate dialog from noise, “de-rustle” the sound that often occurs with lavalier mics, reduce wind noise, minimize reverb, and even match ambiances, which is ideal when looping dialog that needs to match the sound of dialog recorded on location.
Magix Sound Forge Pro is another option. This software program’s main orientation is sophisticated audio editing, but it also includes restoration tools of its own, and incorporates iZotope RX Elements and iZotope Ozone Elements for signal processing. If you don’t have extreme restoration needs, Sound Forge Pro is an excellent choice because of its extensive editing and format conversion capabilities. Among other functions, editing allows removing “ummms,” coughs, and the like.
The Fix Is In
It’s worth taking the time to use these techniques to turn “okay” narration or presentations into polished, professional-level audio. The impact of quality audio on the listener is huge, and regardless of the subject matter, superior sound quality not only attracts attention, but gives a feeling of confidence in what’s being said.