Improving Firefox’s time-stretching parameters for better audio playback above 1x speed

When you play media in the browser at speeds other than 1x, the browser uses a time-stretching algorithm that changes the speed without affecting pitch. (Without it, sped-up audio would have the chipmunk effect.) A common implementation is Scaletempo, which is used all over the place because it’s open source, relatively efficient, and sounds decent. Firefox itself uses the SoundTouch library, as the preference names below show, but the idea is the same: cut the audio into short segments (strides), then blend them back together with small overlaps. When choosing where the segments overlap, the algorithm searches for spots where the waveforms are similar, to minimize distortion and pops.
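To make the segment-and-blend idea concrete, here is a minimal Python sketch in the style of WSOLA (waveform-similarity overlap-add). It is not Firefox’s or SoundTouch’s code: the function names, the mono list-of-floats input, the linear cross-fade, and the normalized-correlation similarity measure are my own assumptions for illustration. The three millisecond arguments mirror the stride, overlap, and search parameters.

```python
import math


def similarity(a, b):
    """Normalized cross-correlation of two equal-length sample lists (-1..1)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def time_stretch(samples, rate, sample_rate=48000,
                 stride_ms=35, overlap_ms=10, search_ms=20):
    """Change tempo by `rate` (1.5 = 50% faster) without changing pitch.

    WSOLA-style sketch on a mono list of floats: emit stride-length segments,
    cross-fade each into the previous one over `overlap_ms`, and pick each
    segment's start within `search_ms` after its nominal input position so
    that it resembles the audio it is blended with.
    """
    stride = int(sample_rate * stride_ms / 1000)
    overlap = int(sample_rate * overlap_ms / 1000)
    search = int(sample_rate * search_ms / 1000)
    if rate <= 0 or not 0 < overlap < stride:
        raise ValueError("need rate > 0 and 0 < overlap < stride")

    step_out = stride - overlap      # new output samples per segment
    step_in = step_out * rate        # nominal input advance per segment

    out = list(samples[:stride])
    pos = step_in
    while pos + search + stride <= len(samples):
        tail = out[-overlap:]        # audio the next segment will blend into
        base = int(pos)
        # Choose the candidate start whose first `overlap` samples match best.
        best = max(range(search + 1),
                   key=lambda k: similarity(tail, samples[base + k:base + k + overlap]))
        seg = samples[base + best:base + best + stride]
        # Linear cross-fade: fade the old tail out while fading the segment in.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            out[-overlap + i] = out[-overlap + i] * (1 - w) + seg[i] * w
        out.extend(seg[overlap:])
        pos += step_in
    return out
```

Calling `time_stretch(samples, 1.5)` returns roughly two thirds as many samples at the same pitch. This is pure Python and slow; searching only forward from the nominal position just keeps the sketch short, and real implementations may search differently.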

For whatever brain-dead reason, Firefox ships with a poor choice of defaults for these parameters. If you often watch YouTube videos at 1.5x or so, you might consider tweaking them. There are three main parameters of interest: stride, overlap, and search. Stride sets the length of each segment in milliseconds, overlap sets how many milliseconds two consecutive segments overlap when they are blended, and search sets how wide a window (also in milliseconds) the algorithm scans for the best joining point.
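Since the parameters are given in milliseconds, it can help to see what they mean in samples. The hypothetical helper below assumes a 48 kHz stream and a simple model in which each segment contributes (stride − overlap) new output samples while the input read position jumps ahead by the playback rate times that amount. SoundTouch’s internal bookkeeping may round or clamp these lengths, so treat the results as ballpark figures.

```python
def segment_plan(stride_ms, overlap_ms, search_ms, rate, sample_rate=48000):
    """Rough per-segment numbers for an overlap-add time stretcher.

    Simple model (an assumption, not SoundTouch's exact bookkeeping): each
    segment adds (stride - overlap) new output samples, and the input read
    position advances `rate` times that much.
    """
    def to_samples(ms):
        return round(sample_rate * ms / 1000)

    stride, overlap, search = (to_samples(x) for x in (stride_ms, overlap_ms, search_ms))
    if not 0 <= overlap < stride:
        raise ValueError("overlap must be shorter than stride")
    step_out = stride - overlap
    return {
        "stride_samples": stride,
        "overlap_samples": overlap,
        "search_samples": search,
        "new_output_per_segment": step_out,
        "input_skip_per_segment": round(step_out * rate),
        "joins_per_output_second": sample_rate / step_out,
    }


# Firefox defaults vs. the values suggested below, both at 1.5x.
for prefs in ((10, 8, 15), (35, 10, 20)):
    print(prefs, segment_plan(*prefs, rate=1.5))
```

Under this model at 1.5x, Firefox’s defaults (10 / 8 / 15) add only 96 new samples (2 ms) per segment, about 500 joins per second of output, whereas 35 / 10 / 20 adds 1,200 samples (25 ms) per segment, about 40 joins per second.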

Firefox calls these three sequence, overlap, and seek window, and their defaults are 10, 8, and 15 ms respectively. Here’s the documentation from the source tree (as of Firefox 112):

# Time-stretch algorithm single processing sequence length in milliseconds.
# This determines to how long sequences the original sound is chopped in the
# time-stretch algorithm.
- name: media.audio.playbackrate.soundtouch_sequence_ms
  type: RelaxedAtomicInt32
  value: 10
  mirror: always

# Time-stretch algorithm overlap length in milliseconds. When the chopped sound
# sequences are mixed back together, to form a continuous sound stream, this
# parameter defines over how long period the two consecutive sequences are let
# to overlap each other.
- name: media.audio.playbackrate.soundtouch_overlap_ms
  type: RelaxedAtomicInt32
  value: 8
  mirror: always

# Time-stretch algorithm seeking window length in milliseconds for algorithm
# that finds the best possible overlapping location. This determines from how
# wide window the algorithm may look for an optimal joining location when mixing
# the sound sequences back together.
- name: media.audio.playbackrate.soundtouch_seekwindow_ms
  type: RelaxedAtomicInt32
  value: 15
  mirror: always

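After some experimentation, I have settled on the values 35 / 10 / 20, corresponding to the following lines in your profile’s prefs.js (edit it only while Firefox is closed, since Firefox overwrites prefs.js on exit; putting the same lines in user.js in that folder also works):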

user_pref("media.audio.playbackrate.soundtouch_sequence_ms", 35);
user_pref("media.audio.playbackrate.soundtouch_overlap_ms", 10);
user_pref("media.audio.playbackrate.soundtouch_seekwindow_ms", 20);
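If you’d rather script the change than edit files by hand, the sketch below (a hypothetical helper, not a Mozilla tool) writes the same three lines into user.js in a profile folder you pass in, replacing any earlier values for those prefs. It assumes you look up the folder path yourself (about:profiles shows it); Firefox reads user.js at startup, so restart it afterwards.

```python
import json
import re
from pathlib import Path

PREFS = {
    "media.audio.playbackrate.soundtouch_sequence_ms": 35,
    "media.audio.playbackrate.soundtouch_overlap_ms": 10,
    "media.audio.playbackrate.soundtouch_seekwindow_ms": 20,
}

_PREF_RE = re.compile(r'^\s*user_pref\("([^"]+)"')


def _pref_name(line):
    """Return the pref name of a user_pref(...) line, or None."""
    m = _PREF_RE.match(line)
    return m.group(1) if m else None


def set_user_prefs(profile_dir, prefs=None):
    """Add or update user_pref(...) lines in <profile_dir>/user.js."""
    prefs = PREFS if prefs is None else prefs
    path = Path(profile_dir) / "user.js"
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    # Drop old lines for the prefs we are about to set; keep everything else.
    kept = [ln for ln in lines if _pref_name(ln) not in prefs]
    # json.dumps renders ints, booleans (true/false) and strings as JS literals.
    kept += [f'user_pref("{name}", {json.dumps(value)});' for name, value in prefs.items()]
    path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return path
```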

You can also enter these in about:config. Feel free to experiment and see what sounds best for you.