CLI Tools for Digital Audio
2021-04-23

Musicians like to experiment with the keys and tempos of musical works. If it's hard to adapt to a new key on an acoustic instrument, you can try a digital instrument with a built-in transpose function, which changes the key without changing the way you play. A sound engineer can raise or lower the pitch of a sound with a pitch shifter during recording, or transpose the soundtracks in a DAW in post-production. For those who are comfortable with the command-line interface, FFmpeg and SoX are great companions for audio processing. This article introduces FFmpeg, Rubberband and SoX for audio playback, format conversion, and ultimately adjusting the tempo and the pitch to your taste in the terminal.

How will the Concerto in D Minor after Marcello (BWV 974) by J.S. Bach sound in C Minor? Check it out yourself after you finish reading. 😉

What are FFmpeg, Rubberband and SoX?

FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge - https://www.ffmpeg.org/about.html

FFmpeg has a large suite of libraries and programs for processing audio, video and other multimedia files and streams. First released in 2000, FFmpeg has been used as a core module for handling multimedia in software applications such as YouTube, Chrome, iTunes, VLC media player, Handbrake, and Blender, just to name a few. If you are unfamiliar with FFmpeg, check out this tutorial for an introduction.

Rubberband is an audio time-stretching and pitch-shifting library and utility program. It includes a simple (free) command-line utility program that you can use for fixed adjustments to the speed and pitch of existing audio files - https://breakfastquay.com/rubberband/

Rubberband is available in FFmpeg as the rubberband audio filter when FFmpeg is built with librubberband. It also has a command-line interface independent of FFmpeg. For modifying the tempo or pitch of an audio file, Rubberband is far better than low-level manipulation via the aresample, atempo, or asetrate filters in FFmpeg.

SoX (Sound eXchange) is a cross-platform command-line utility for audio manipulation. It can read audio files and apply effects to them, and it is particularly suited to quick, simple edits and batch processing.

SoX is almost a decade older than FFmpeg. Unlike FFmpeg, which has extensive libraries for both audio and video, SoX focuses on audio processing only. However, it includes most of the tools that you will also find in a DAW.

Quality Control with Metadata and Playback

Checking metadata is the first step in quality control. You can extract the metadata of a single audio file with the FFmpeg command ffprobe <input>, or use the SoX command soxi <input1> [input2] [input3] to display information for multiple audio files at once.
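Because ffprobe takes one input at a time, checking a whole folder needs a loop. The following is a minimal sketch for a POSIX shell. The function name probe_all and the selection of fields are my own choices for illustration; the ffprobe options themselves are standard ones.

```shell
# probe_all: print codec, sample rate, channel count and duration
# for every file given on the command line.
# The function name is hypothetical; adjust the fields to your needs.
probe_all() {
  for f in "$@"; do
    echo "== $f"
    ffprobe -v error \
      -show_entries stream=codec_name,sample_rate,channels:format=duration \
      -of default=noprint_wrappers=1 "$f"
  done
}

# Usage:
# probe_all *.flac
```

The -v error option hides the banner and build information that ffprobe prints by default, so only the requested fields remain.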

SoX only supports audio formats that are not patent-encumbered or whose patents have expired. For example, it can process audio files with extensions such as mp3, wav, aiff, flac, vorbis, opus and ogg, but it cannot read the compatible audio bitstreams inside webm, mp4, m4a or m4b containers. For a complete list of the supported audio formats, please refer to the official documentation. For files that cannot be processed by SoX, FFmpeg comes to the rescue. It can transcode or transmux (change containers without re-encoding) audio in almost any format with decent quality by default.
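One way to combine the two tools is to let FFmpeg decode the containers that SoX cannot open. The sketch below assumes that the file extension reflects the container and that the extension is lower case. Both helper names are hypothetical.

```shell
# needs_ffmpeg: succeed when the extension is one of the containers
# listed above as unreadable by SoX (webm, mp4, m4a, m4b).
needs_ffmpeg() {
  case "${1##*.}" in
    webm|mp4|m4a|m4b) return 0 ;;
    *) return 1 ;;
  esac
}

# to_wav_if_needed: decode such a file to WAV with FFmpeg and print the
# path that SoX should open. Other files are passed through untouched.
# Note: -y overwrites an existing .wav file of the same name.
to_wav_if_needed() {
  if needs_ffmpeg "$1"; then
    ffmpeg -v error -y -i "$1" "${1%.*}.wav" && echo "${1%.*}.wav"
  else
    echo "$1"
  fi
}

# Usage:
# soxi "$(to_wav_if_needed talk.m4b)"
```

Decoding to WAV avoids a second lossy encode, at the cost of a larger intermediate file.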

Playback is an intuitive way to check for potential problems. Both FFmpeg and SoX allow you to play audio in compatible formats in the terminal. FFmpeg ships an ffplay command for multimedia playback. Add the optional -nodisp flag if you don't want the graphical display during playback.

ffplay input.wav -nodisp

The playback position can be controlled with a timestamp in the [HH:]MM:SS[.m...] or [-]S+[.m...][s|ms|us] format. Be aware that seeking in ffplay is not always accurate. Seeking for transcoding, on the other hand, is always accurate to the given timestamps.

ffplay input.wav -ss <start> -t <duration>

SoX comes with a play command for audio playback with no pop-up display. The trim effect controls which parts are played: trim seconds sets an optional start, =seconds sets an optional pause or end at an absolute position, and -seconds sets a position measured from the end of the soundtrack. The play command below plays from 00:15 until 00:30, skips ahead, and resumes for the last 20 seconds. Playback seeking is accurate in SoX.

play input.wav trim 00:15 =00:30 -20

Transcoding and Generation Loss

You can convert an audio file from one format into another as long as both formats are supported by the tool of your choice. FFmpeg is more suitable for this task thanks to its massive container library (libavformat) and codec library (libavcodec). You can check the supported containers with ffmpeg -formats and the supported codecs with ffmpeg -codecs.

Here are the commands for converting audio formats with SoX and FFmpeg respectively:

sox input.wav output.mp3        # format conversion with SoX
ffmpeg -i input.wav output.mp3  # format conversion with FFmpeg

Transcoding can introduce generation loss. The commands above re-encode the source audio with the target codec. If you just want to change containers (transmuxing) and the target container accepts the source codec, you should copy the original audio stream and avoid re-encoding. To do so, add the -c:a copy flag, where -c:a means the audio codec.

ffmpeg -i input.ogg -c:a copy output.webm
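The decision between copying and re-encoding can be scripted by asking ffprobe for the source codec first. This is a sketch under assumptions: the helper names are hypothetical, and the caller has to supply the list of codecs that the target container accepts, since the script does not look that up.

```shell
# audio_codec: print the codec name of the first audio stream.
audio_codec() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# remux_or_transcode INPUT OUTPUT CODEC...
# Copies the audio stream when the source codec is in the given list;
# otherwise lets FFmpeg re-encode with its default encoder.
remux_or_transcode() {
  in=$1; out=$2; shift 2
  codec=$(audio_codec "$in")
  for ok in "$@"; do
    if [ "$codec" = "$ok" ]; then
      ffmpeg -i "$in" -c:a copy "$out"
      return
    fi
  done
  ffmpeg -i "$in" "$out"
}

# Usage (WebM accepts Vorbis and Opus audio):
# remux_or_transcode input.ogg output.webm vorbis opus
```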

Changing Tempo

You can pick SoX or FFmpeg for changing the tempo of an existing audio file. This technique is also called time stretching. SoX has a straightforward command and decent output quality. FFmpeg has a slightly more complicated command syntax, and if used with the right filter, it can produce slightly better output.

In SoX you can change the tempo in playback with the command play <input> tempo <factor>, where factor is the ratio of the new tempo to the old tempo, so 1.2 speeds up the tempo by 20% and 0.7 slows it down by 30%. According to the SoX manual, the tempo effect is based on the WSOLA algorithm, which cuts the audio into short segments and overlaps them again at a different spacing, so the output is not a bit-perfect copy and the difference might be perceptible. Use the command below to save a copy.

sox <input> <output> tempo 1.2
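If you think in beats per minute rather than ratios, the factor is simply the new BPM divided by the old BPM, and the new duration is the old duration divided by the factor. A small sketch with awk doing the arithmetic; the function names and the BPM values are made up for illustration.

```shell
# tempo_factor OLD_BPM NEW_BPM: ratio of the new tempo to the old tempo.
tempo_factor() {
  awk -v old_bpm="$1" -v new_bpm="$2" \
    'BEGIN { printf "%.4g\n", new_bpm / old_bpm }'
}

# stretched_duration SECONDS FACTOR: length of the audio after the change.
stretched_duration() {
  awk -v secs="$1" -v factor="$2" \
    'BEGIN { printf "%.2f\n", secs / factor }'
}

tempo_factor 100 120          # prints 1.2
stretched_duration 180 1.2    # prints 150.00

# Usage:
# sox input.wav output.wav tempo "$(tempo_factor 100 120)"
```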

An audio filter is required in FFmpeg for changing the tempo of a soundtrack. You can list the filters in the FFmpeg filter library (libavfilter) via ffmpeg -filters. FFmpeg accepts multiple filters, which are daisy-chained with commas (see the command syntax below). The -af flag is shorthand for -filter:a (audio filter). Likewise, -vf is shorthand for -filter:v (video filter). A filter can have multiple parameters, each with a default value. The parameters of a filter are joined by colons.

ffmpeg -i <input> -af "filter1,filter2,filter3" <output>
ffmpeg -i <input> -af "filter=param1=value1:param2=value2" <output>
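When a chain grows, it can be easier to assemble the -af argument from separate pieces. The helper below is a hypothetical convenience, not part of FFmpeg; it only joins its arguments with commas. The volume filter in the example is a standard FFmpeg audio filter, used here just to have a second link in the chain.

```shell
# join_filters: join filter descriptions with commas into one -af argument.
# The subshell keeps the IFS change from leaking into the caller.
join_filters() {
  ( IFS=,; printf '%s\n' "$*" )
}

chain=$(join_filters "rubberband=tempo=1.2" "volume=0.8")
echo "$chain"   # prints rubberband=tempo=1.2,volume=0.8

# Usage:
# ffmpeg -i input.wav -af "$chain" output.wav
```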

I tried the atempo filter with ffplay <input> -af atempo=1.5 to speed the tempo up to 1.5 times the original. The audio quality is so miserable that I don't recommend it. Another option is the rubberband filter in FFmpeg, which relies on Rubberband, a dedicated library for time stretching and pitch shifting. Rubberband can be used as an FFmpeg audio filter or as an independent command-line utility.

The playback command via librubberband looks like this: ffplay <input> -af rubberband=tempo=1.5. You can save a copy of the audio at the new tempo with either command below.

ffmpeg -i <input> -af rubberband=tempo=1.5 <output>
rubberband --tempo 1.5 <input> <output>

Rubberband produces the best audio quality. SoX has decent quality and its command is easy to remember. You should probably avoid the FFmpeg atempo filter at all costs for tempo adjustment.

Shifting Pitch

SoX has very good support for shifting pitch with pitch [-]<shift>, where the shift value is given in cents (hundredths of a semitone) and an optional negative sign lowers the pitch instead of raising it. The following commands show how to lower the pitch by 2 semitones.

play <input> pitch -200           # for playback
sox <input> <output> pitch -200   # for conversion
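Since musicians usually count in semitones, a one-line conversion to cents saves some mental arithmetic. The function name is my own. Moving the concerto from D minor down to C minor, for instance, is a shift of -2 semitones.

```shell
# semitones_to_cents: convert a (possibly fractional) semitone shift
# to the cents value that the SoX pitch effect expects.
semitones_to_cents() {
  awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 100 }'
}

semitones_to_cents -2    # prints -200
semitones_to_cents 0.5   # prints 50

# Usage:
# sox input.wav output.wav pitch "$(semitones_to_cents -2)"
```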

When it comes to Rubberband, the pitch shifting syntax is somewhat problematic. It depends on whether it's used as a command-line utility program or as an FFmpeg audio filter. Let's look at the command-line scenario first.

According to the Rubberband command line utility help guide, which is also available via rubberband -h in the terminal, you can lower the pitch by 2 semitones with the following command, where the --pitch flag is interchangeable with -p.

rubberband --pitch -2 <input> <output>

If you want to use Rubberband as an FFmpeg audio filter, you need to be careful with the value for the scaled pitch. For instance, the command ffplay -i <input> -af rubberband=pitch=-2 throws an out-of-range error, because the pitch value must fall between 0.01 and 100. Since a person can normally hear sounds between 20 and 20,000 Hz, is this somehow related to the pitch value range in librubberband?

How about changing the command to ffplay -i <input> -af rubberband=pitch=2? Will I get a shift of 2 semitones above the original key? It turns out it's one octave higher! Reducing the pitch value to 0.5 gives a result one octave lower than the original key. Clearly, the pitch parameter of the FFmpeg audio filter is a frequency ratio, not a number of semitones as in the Rubberband command-line utility. How can I calculate the value for shifting the pitch down by 2 semitones with ffplay then?

To answer this question, a bit of music theory is needed. The size of a semitone depends on the tuning system in use. In twelve-tone equal temperament, each semitone is equal to one twelfth of an octave. The ratio of the frequencies between two adjacent octaves is 2:1. The ratio of the frequencies between two adjacent semitones is therefore the twelfth root of two, $\sqrt[12]{2} = 2^{1/12} \approx 1.0595$.

| Shift distance | Multiplier | Ratio |
|---|---|---|
| +12 semitones | $2^{12/12}$ | 2 |
| +2 semitones | $2^{1/6}$ | 1.1224620483 |
| +1 semitone | $2^{1/12}$ | 1.0594630944 |
| base | $2^{0/12}$ | 1 |
| -1 semitone | $1 \div 2^{1/12}$ | 0.9438743127 |
| -2 semitones | $1 \div 2^{1/6}$ | 0.8908987181 |
| -12 semitones | $1 \div 2^{12/12}$ | 0.5 |

If the original key is scaled to 1, the ratio for shifting the pitch down by 2 semitones is 0.8908987181. You can use the Rubberband audio filter in FFmpeg to achieve the transposition.

ffplay -i <input> -af rubberband=pitch=0.8908987181            # for playback
ffmpeg -i <input> -af rubberband=pitch=0.8908987181 <output>  # for conversion
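Rather than looking the ratio up, you can compute it for any shift as 2 raised to the power of semitones divided by 12. A minimal sketch, assuming twelve-tone equal temperament and an awk that supports the ^ operator; the function name is hypothetical.

```shell
# semitones_to_ratio: frequency ratio for a shift of N semitones
# in twelve-tone equal temperament, i.e. 2^(N/12).
semitones_to_ratio() {
  awk -v n="$1" 'BEGIN { printf "%.10f\n", 2 ^ (n / 12) }'
}

semitones_to_ratio -2   # prints 0.8908987181
semitones_to_ratio 12   # prints 2.0000000000

# Usage:
# ffplay -i input.wav -af "rubberband=pitch=$(semitones_to_ratio -2)"
```

Any result stays well inside the 0.01 to 100 range of the filter as long as the shift is within a few octaves.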

The pitch-shifting syntax in SoX and the Rubberband CLI is more musician-friendly than that of the audio filter in FFmpeg. The audio quality from librubberband in FFmpeg is the weakest of the three. The Rubberband CLI produces the best audio quality and its syntax makes sense. Its only drawback is the absence of a playback command similar to play <input> in SoX. You can therefore use SoX for convenient playback and the Rubberband CLI for high-quality conversion.