Musicians like to experiment with the keys and tempos of musical works. If adapting to a new key on an acoustic instrument is hard, you can try a digital instrument with a built-in transpose function, which changes the key without changing the way you play. Sound engineers can raise or lower the pitch of a sound with a pitch shifter during recording, or transpose the soundtracks in a DAW in post-production. For those who are comfortable with the command-line interface, FFmpeg and SoX are great companions for audio processing. This article introduces FFmpeg, Rubberband and SoX for audio playback, format conversion, and ultimately adjusting the tempo and the pitch to your taste in the terminal.
How will the Concerto in D Minor after Marcello (BWV 974) by J.S. Bach sound in C Minor? Check it out yourself after you finish reading. 😉
What are FFmpeg, Rubberband and SoX?
FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge - https://www.ffmpeg.org/about.html
FFmpeg has a large suite of libraries and programs for processing audio, video and other multimedia files and streams. First released in 2000, FFmpeg has been used as a core module for handling multimedia in software applications such as YouTube, Chrome, iTunes, VLC media player, Handbrake, and Blender, just to name a few. If you are unfamiliar with FFmpeg, check out this tutorial for an introduction.
Rubberband is an audio time-stretching and pitch-shifting library and utility program. It includes a simple (free) command-line utility program that you can use for fixed adjustments to the speed and pitch of existing audio files - https://breakfastquay.com/rubberband/
Rubberband is available in FFmpeg as an audio filter through librubberband. It also has a command-line interface independent of FFmpeg. For modifying the tempo or pitch of an audio file, Rubberband is far better than low-level manipulation via filters such as asetrate in FFmpeg.
SoX (Sound eXchange) is a cross-platform command-line utility for audio manipulation. It can read audio and apply effects to it, and it is particularly suited for quick, simple edits and batch processing.
SoX is almost a decade older than FFmpeg. Unlike FFmpeg, which has encompassing libraries for both audio and video, SoX focuses on audio processing only. However, it includes most of the tools that you will also find in a DAW.
Quality Control with Metadata and Playback
Checking metadata is the first step in quality control. You can extract the metadata of a single audio file with the command ffprobe <input> in FFmpeg, or use soxi <input1> [input2] [input3] to display information for multiple audio files with SoX.
SoX only supports audio formats that are not patent-encumbered or whose patents have expired. For example, it can process audio files with the ogg extension, but cannot read the compatible audio bitstreams inside m4b containers. For a complete list of the supported audio formats, please refer to the official documentation. For files that cannot be processed by SoX, FFmpeg comes to the rescue. It can transcode or transmux (change containers without re-encoding) audio in almost any format, with decent quality by default.
Playback is an intuitive way to check for potential problems. Both FFmpeg and SoX allow you to play audio in compatible formats in the terminal. FFmpeg ships an ffplay command for multimedia playback. Add the optional -nodisp flag if you don't want the graphical display during playback.
ffplay input.wav -nodisp
The playback position can be controlled with a timestamp in the [-]S+[.m...][s|ms|us] format. Be mindful that ffplay does not always seek to an accurate position for playback. Seeking for transcoding, on the other hand, is accurate to the given timestamps.
ffplay input.wav -ss <start> -t <duration>
SoX comes with a play command for audio playback with no pop-up display. You can control the playback with an optional start position via trim seconds, and an optional pause or end position via trim =seconds. You can even skip to a position measured from the end of a soundtrack via trim -seconds. The play command below plays from 00:15 until 00:30, skips ahead, and resumes for the last 20 seconds until the end. Playback seeking is accurate in SoX.
play input.wav trim 00:15 =00:30 -20
Transcoding and Generation Loss
You can convert audio from one format into another as long as both formats are supported by the tool of your choice. FFmpeg is more suitable for this task thanks to its massive container library (libavformat) and codec library (libavcodec). You can check the supported containers with ffmpeg -formats and the supported codecs with ffmpeg -codecs.
Here are the commands for converting audio formats with SoX and FFmpeg respectively:
sox input.wav output.mp3 # format conversion with SoX
ffmpeg -i input.wav output.mp3 # format conversion with FFmpeg
Transcoding can introduce generation loss. The commands above re-encode the source audio with the target codec. If you just want to change containers (transmuxing) and the target codec is the same as the source codec, you should copy the original audio stream and avoid re-encoding. To do so, add the -c:a copy flag, where -c:a selects the audio codec.
ffmpeg -i input.ogg -c:a copy output.webm
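Whether stream copy applies depends on the codec inside the source file, which ffprobe can report. The sketch below is one possible way to check before converting. The helper names audio_codec and codec_args are hypothetical, and the codec you want in the target container (vorbis in the usage line) is an assumption you have to supply yourself.

```shell
# Print the codec name of the first audio stream, e.g. "vorbis" or "mp3".
# Requires ffprobe on the PATH.
audio_codec() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Print "-c:a copy" when the source codec ($1) already matches the codec
# wanted in the target container ($2). Print nothing otherwise, so that
# ffmpeg falls back to re-encoding with its default encoder.
codec_args() {
    if [ "$1" = "$2" ]; then
        printf '%s\n' "-c:a copy"
    fi
}

# Usage (assumes input.ogg exists; the substitution is left unquoted on
# purpose so that "-c:a copy" splits into two arguments):
#   ffmpeg -i input.ogg $(codec_args "$(audio_codec input.ogg)" vorbis) output.webm
```

Keeping the decision in a separate function means it can be checked without touching any audio file.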
Adjusting the Tempo
You can use SoX or FFmpeg to change the tempo of an existing audio file, a technique also known as time stretching. SoX has a straightforward command and produces output of decent quality. FFmpeg has a slightly more complicated command syntax, but with the right filter it can produce slightly better output.
In SoX you can change the tempo during playback with the command play <input> tempo <factor>, where factor is the ratio of the new tempo to the old tempo, so 1.2 speeds up the tempo by 20% and 0.7 slows it down by 30%. The tempo effect uses the WSOLA algorithm, which time-stretches the audio without changing its pitch or sampling rate, though artifacts might be perceptible in the output. If the bitrate of the saved file drops, the likelier cause is re-encoding to a lossy output format. Use the command below to save a copy.
sox <input> <output> tempo 1.2
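If you think in beats per minute rather than ratios, the factor can be worked out from the two tempos. This is a minimal sketch, assuming you already know the BPM of the recording; the helper name tempo_factor is hypothetical.

```shell
# Ratio of the new tempo to the old tempo, as expected by SoX's tempo effect.
# $1 = original BPM, $2 = target BPM
tempo_factor() {
    awk -v old_bpm="$1" -v new_bpm="$2" 'BEGIN { printf "%.3f\n", new_bpm / old_bpm }'
}

tempo_factor 100 120   # prints 1.200 (20% faster)
tempo_factor 100 70    # prints 0.700 (30% slower)

# Usage (assumes input.wav exists and SoX is installed):
#   sox input.wav output.wav tempo "$(tempo_factor 100 120)"
```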
FFmpeg requires an audio filter for changing the tempo of a soundtrack. You can list the filters available in FFmpeg's filter library (libavfilter) via ffmpeg -filters. FFmpeg accepts multiple filters, daisy-chained with commas (see the command syntax below). The -af flag is equivalent to -filter:a for audio filters. Likewise, -vf is shorthand for -filter:v for video filters. A filter can take multiple parameters, each with a default value. The parameters of a filter are joined by colons.
ffmpeg -i <input> -af "filter1,filter2,filter3" <output>
ffmpeg -i <input> -af "filter=param1=value1:param2=value2" <output>
I tried the atempo filter with ffplay <input> -af atempo=1.5 to increase the tempo to 1.5 times the original. The audio quality was so poor that I don't recommend it. Another option is the rubberband filter in FFmpeg, which uses Rubberband (librubberband), a dedicated library for time stretching and pitch shifting. The filter is only available if your FFmpeg build was configured with --enable-librubberband. Rubberband can be used as an FFmpeg audio filter or as an independent command-line utility.
The playback command via librubberband looks like this: ffplay <input> -af rubberband=tempo=1.5. You can save a copy of the audio at a different playback speed with either command below.
ffmpeg -i <input> -af rubberband=tempo=1.5 <output>
rubberband --tempo 1.5 <input> <output>
Rubberband produces the best audio quality. SoX has decent quality and its command is easy to remember. You should probably avoid the FFmpeg atempo filter for tempo adjustment.
Shifting the Pitch
SoX has very good support for pitch shifting with pitch [-]<shift>, where <shift> is the shift in cents (hundredths of a semitone); a negative sign lowers the pitch and no sign or a positive sign raises it. The following commands show how to lower the pitch by 2 semitones.
play <input> pitch -200 # for playback
sox <input> <output> pitch -200 # for conversion
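Because SoX counts in cents, a transposition given in semitones has to be multiplied by 100. The hypothetical helper below does that conversion; it is a small sketch that also accepts fractional semitones.

```shell
# Convert a shift in semitones (may be negative or fractional) to cents.
semitones_to_cents() {
    awk -v n="$1" 'BEGIN { printf "%.0f\n", n * 100 }'
}

semitones_to_cents -2    # prints -200 (D minor down to C minor)
semitones_to_cents 0.5   # prints 50 (a quarter tone up)

# Usage (assumes input.wav exists and SoX is installed):
#   sox input.wav output.wav pitch "$(semitones_to_cents -2)"
```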
When it comes to Rubberband, the pitch-shifting syntax is less consistent. It depends on whether Rubberband is used as a command-line utility or as an FFmpeg audio filter. Let's look at the command-line scenario first.
According to the help text of the Rubberband command-line utility, which is available via rubberband -h in the terminal, you can lower the pitch by 2 semitones with the following command, where the --pitch flag is interchangeable with -p.
rubberband --pitch -2 <input> <output>
If you want to use Rubberband as an FFmpeg audio filter, you need to be careful with the value of the pitch parameter. For instance, the command ffplay -i <input> -af rubberband=pitch=-2 throws an out-of-range error, because the pitch value must fall between 0.01 and 100. Since a person can normally hear sounds between 20 and 20,000 Hz, is this somehow related to the pitch value range in the filter?
How about changing the command to ffplay -i <input> -af rubberband=pitch=2? Will I get a shift of 2 semitones above the original key? It turns out it's one octave higher! Reducing the pitch value to 0.5 gives one octave lower than the original key. Evidently the pitch parameter in the FFmpeg audio filter is a frequency ratio, not a number of semitones as in the Rubberband command-line utility. How can I calculate the value for shifting the pitch down by 2 semitones with the filter?
To answer this question, a bit of music theory is needed. The size of a semitone depends on the tuning system in use. In twelve-tone equal temperament, each semitone is equal to one twelfth of an octave. The ratio of the frequencies between two adjacent octaves is 2:1. The ratio of the frequencies between two adjacent semitones is therefore the twelfth root of two, 2^(1/12), and a shift of n semitones corresponds to a ratio of 2^(n/12).
| Pitch shift | Formula | Frequency ratio |
| +12 semitones | 2^(12/12) | 2 |
| +1 semitone | 2^(1/12) | 1.0594630944 |
| -1 semitone | 1 ÷ 2^(1/12) | 0.9438743127 |
| -2 semitones | 1 ÷ 2^(1/6) | 0.8908987181 |
| -12 semitones | 1 ÷ 2^(12/12) | 0.5 |
If the original key is scaled to 1, the ratio for shifting the pitch down by 2 semitones is 0.8908987181. You can use the Rubberband audio filter in FFmpeg to achieve the transposition.
ffplay -i <input> -af rubberband=pitch=0.8908987181 # for playback
ffmpeg -i <input> -af rubberband=pitch=0.8908987181 <output> # for conversion
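The ratios in the table come from raising 2 to the power of the semitone shift divided by 12, so you could compute the value on the fly rather than look it up. This is only a sketch: the function names semitones_to_ratio and rubberband_filter are hypothetical, and only the rubberband filter and its pitch option come from FFmpeg.

```shell
# Frequency ratio for a shift of n semitones in twelve-tone equal temperament.
semitones_to_ratio() {
    awk -v n="$1" 'BEGIN { printf "%.10f\n", 2 ^ (n / 12) }'
}

# Build an FFmpeg audio filter string that shifts the pitch by n semitones.
rubberband_filter() {
    printf 'rubberband=pitch=%s\n' "$(semitones_to_ratio "$1")"
}

semitones_to_ratio -2    # prints 0.8908987181
rubberband_filter -2     # prints rubberband=pitch=0.8908987181

# Usage (assumes input.wav exists and FFmpeg was built with librubberband):
#   ffmpeg -i input.wav -af "$(rubberband_filter -2)" output.wav
```

With this wrapper, the FFmpeg filter takes semitones as input, much like the Rubberband CLI does.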
The syntax for pitch shifting in SoX and the Rubberband CLI is more musician-friendly than that of the audio filter in FFmpeg. The audio quality from librubberband in FFmpeg is the weakest of the three. The Rubberband CLI produces the best audio quality and its syntax makes sense. The only problem is the absence of a playback command similar to play <input> in SoX. Therefore, you can use SoX for convenient playback and the Rubberband CLI for high-quality conversion.