- commit
- 70299e8e7f58e0cee8b59fbd65fcc1ae43543504
- parent
- 65edef9ecfb6335d2c272a3d62a4c8a21ce85d82
- Author
- Tobias Bengfort <tobias.bengfort@posteo.de>
- Date
- 2025-06-23 13:30
add post on loadness normalization
Diffstat
| A | _content/posts/2025-06-23-loudness/index.md | 289 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 files changed, 289 insertions, 0 deletions
diff --git a/_content/posts/2025-06-23-loudness/index.md b/_content/posts/2025-06-23-loudness/index.md
@@ -0,0 +1,289 @@
-1 1 ---
-1 2 title: Notes on Loudness Normalization
-1 3 date: 2025-06-23
-1 4 tags: [audio, linux]
-1 5 description: "Dialogue in a movie is barely audible, but explosions are far too loud. So I experimented with loudness normalization."
-1 6 ---
-1 7
-1 8 You know the situation: Dialogue in a movie is barely audible, so you turn the
-1 9 volume all the way up. The next scene has an explosion and your ears explode.
-1 10
-1 11 To prevent this, there are algorithms to normalize loudness. I wasn't really
-1 12 interested in reading everything there is to know about loudness normalization
-1 13 though. Instead, I just experimented with a few options. I also looked into how
-1 14 they can be used on a Linux desktop.
-1 15
-1 16 # PipeWire filters
-1 17
-1 18 [PipeWire](https://gitlab.freedesktop.org/pipewire/pipewire) has recently
-1 19 replaced older sound servers like PulseAudio or Jack on Linux desktops.
-1 20 It provides backwards compatibility with the old systems, so you can for
-1 21 example use the Pulse Volume Control GUI. It also provides features similar to
-1 22 Jack (or even [PureData](https://en.wikipedia.org/wiki/Pure_data)) where you
-1 23 can create different audio processing nodes and connect them together.
-1 24
-1 25 Creating a [filter node](https://docs.pipewire.org/group__pw__filter.html) was
-1 26 easy enough. However, I had to manually connect it to the audio streams I
-1 27 wanted to process. For stereo audio, that meant manually making 4 links (2
-1 28 links from movie to filter and 2 links from filter to speakers). I tried to
-1 29 automatically create these links via the API to no avail. I also tried to
-1 30 fiddle with
-1 31 [WirePlumber](https://pipewire.pages.freedesktop.org/wireplumber/policies/smart_filters.html)
-1 32 with similar results.
-1 33
-1 34 Finally I found [filter
-1 35 chains](https://docs.pipewire.org/page_module_filter_chain.html), apparently a
-1 36 completely unrelated feature in PipeWire that creates a virtual sink in front
-1 37 of the filter and automatically connects its output to the default sink. This
-1 38 makes it really easy to use the filter with standard GUIs.
-1 39
-1 40 Filter chains are configured using a syntax that looks like JSON without commas.
-1 41 The documentation says they should be saved to
-1 42 `~/.config/pipewire/filter-chain.conf.d/`, but for me they didn't load unless I
-1 43 saved them to `~/.config/pipewire/pipewire.conf.d/`.
-1 44
-1 45 If there is any error in the configuration, the filter will just be ignored. I
-1 46 added `ExecStart=/usr/bin/pipewire -vvv` to
-1 47 `/usr/lib/systemd/user/pipewire.service` to get some debug output, which helped
-1 48 a little but not much.
-1 49
-1 50 For the filters themselves you have a couple of options:
-1 51
-1 52 - a couple of builtin low-level primitives like multiplication or logarithms
-1 53 - ladspa/lv2 plugins
-1 54 - SOFA filters for spatially oriented audio
-1 55 - EBU R 128 filters (we will get to that)
-1 56
-1 57 Out of all of these, ladspa/lv2 plugins provide the most flexibility. However,
-1 58 I didn't get them to work. So I was mostly stuck with the builtin primitives to
-1 59 build my filters.
-1 60
-1 61 This whole experience was a bit bumpy. Once I got this to work it was a joy,
-1 62 but documentation and the debug experience could certainly be improved.
-1 63
-1 64 # Reshaping curves
-1 65
-1 66 My first idea was to apply a function directly to the audio signal. I landed on
-1 67 $f(x) = 1.5x - 0.5x^3$. This function is symmetric around (0, 0), boosts small
-1 68 values, and compresses larger values so the maximum value is still at 1.
-1 69
-1 70 It also reshapes the sound waves. A pure sine wave would be distorted when sent
-1 71 through this filter. I was curious to hear how that would affect the sound.
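
As a rough illustration of what this curve does, here is a minimal Python sketch. It is not part of the PipeWire setup; the helper names `reshape` and `harmonic_amplitude` are made up for this example. It evaluates the curve at a few points and measures which harmonics show up when one period of a full-scale sine wave is sent through it.

```python
import math


def reshape(x):
    """Waveshaping curve f(x) = 1.5x - 0.5x^3 for samples in [-1, 1]."""
    return 1.5 * x - 0.5 * x ** 3


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th sine harmonic in one full period of samples."""
    n = len(samples)
    return 2 / n * sum(
        y * math.sin(2 * math.pi * k * i / n) for i, y in enumerate(samples)
    )


# Small values are boosted by a factor close to 1.5, the peak stays at 1.
for x in (0.1, 0.5, 1.0):
    print(f"f({x}) = {reshape(x):.4f}")

# One period of a full-scale sine wave, before and after the curve.
N = 1024
sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
shaped = [reshape(y) for y in sine]

for k in (1, 2, 3):
    print(f"harmonic {k}: {harmonic_amplitude(shaped, k):.4f}")
```

Since $\sin^3(x) = (3\sin(x) - \sin(3x)) / 4$, a full-scale sine comes out as $1.125\sin(x) + 0.125\sin(3x)$: the distortion is a third harmonic at one ninth of the amplitude of the fundamental. For quieter signals the cubic term shrinks much faster than the linear one, so the added harmonic is weaker still.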
-1 72
-1 73 This is the PipeWire configuration I came up with:
-1 74
-1 75 ```
-1 76 context.modules = [
-1 77     {
-1 78         name = libpipewire-module-filter-chain
-1 79         args = {
-1 80             node.description = "compressor"
-1 81             media.name = "compressor"
-1 82             filter.graph = {
-1 83                 nodes = [
-1 84                     {
-1 85                         type = builtin
-1 86                         name = copy
-1 87                         label = copy
-1 88                     }
-1 89                     {
-1 90                         type = builtin
-1 91                         name = cube
-1 92                         label = mult
-1 93                     }
-1 94                     {
-1 95                         type = builtin
-1 96                         name = mixer
-1 97                         label = mixer
-1 98                         control {
-1 99                             "Gain 1" = 1.5
-1 100                             "Gain 2" = -0.5
-1 101                         }
-1 102                     }
-1 103                 ]
-1 104                 links = [
-1 105                     { output = "copy:Out" input = "cube:In 1" }
-1 106                     { output = "copy:Out" input = "cube:In 2" }
-1 107                     { output = "copy:Out" input = "cube:In 3" }
-1 108                     { output = "copy:Out" input = "mixer:In 1" }
-1 109                     { output = "cube:Out" input = "mixer:In 2" }
-1 110                 ]
-1 111             }
-1 112             audio.channels = 2
-1 113             capture.props = {
-1 114                 node.name = "effect_input.compressor"
-1 115                 media.class = Audio/Sink
-1 116             }
-1 117             playback.props = {
-1 118                 node.name = "effect_output.compressor"
-1 119                 node.passive = true
-1 120             }
-1 121         }
-1 122     }
-1 123 ]
-1 124 ```
-1 125
-1 126 The result sounded ok, but also not quite like what I had in mind: The
-1 127 compression for larger values was barely noticeable because the audio data
-1 128 doesn't really contain many large values. On the plus side, this meant that the
-1 129 wave distortion effect was small. But it didn't really do much beyond
-1 130 increasing the volume.
-1 131
-1 132 ## Fourier Transforms
-1 133
-1 134 It is a fun exercise to apply techniques from image processing to sound or the
-1 135 other way around.
-1 136
-1 137 I had experimented with optimizing images by spreading each of the red, green,
-1 138 and blue channels so that the minimum value for each is 0% and the maximum
-1 139 value is 100%. That technique turned out useful to remove color casts from old
-1 140 photos.
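
The channel spreading itself is just a linear rescale. A minimal sketch in plain Python, assuming the channel values are given as fractions between 0 and 1 (the function name `stretch` and the sample values are made up for illustration):

```python
def stretch(channel):
    """Linearly rescale values so the minimum maps to 0.0 and the maximum to 1.0."""
    low, high = min(channel), max(channel)
    if high == low:
        # A flat channel has no range to stretch; return it unchanged.
        return list(channel)
    return [(value - low) / (high - low) for value in channel]


# Hypothetical red channel of a faded photo, as fractions of full intensity.
red = [0.2, 0.35, 0.5, 0.8]
print(stretch(red))
```

Applying this to each channel separately is what removes a color cast: a channel that only covers part of the range gets stretched more than one that already spans all of it.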
-1 141
-1 142 To apply this technique to sound, my approach was to first do a Fourier
-1 143 transform to get the strength of each frequency, spread these strengths, and
-1 144 then do the inverse Fourier transform.
-1 145
-1 146 The minimum turned out to be 0 in most cases. But I thought this might also be
-1 147 a good chance to do some additional noise reduction. So I shifted the minimum
-1 148 anyway.
-1 149
-1 150 On the other end, I didn't want to cancel out all differences in loudness. So
-1 151 instead of stretching the maximum to 100% everywhere, I opted to just push it
-1 152 slightly in that direction by applying a square root.
-1 153
-1 154 Finally, I didn't want to have abrupt changes in loudness. So I smoothed the
-1 155 minimum and maximum by mixing it with the previous values.
-1 156
-1 157 Because I didn't know how to implement this using PipeWire filter chains, I
-1 158 prototyped it in Python instead:
-1 159
-1 160 ```python
-1 161 import sys
-1 162
-1 163 import numpy as np
-1 164 import soundfile as sf
-1 165
-1 166 CHUNK_SIZE = 2048  # samples per FFT chunk
-1 167 KEEP = 0.9  # weight of the previous min/max when smoothing
-1 168 CUTOFF = 0.02  # fraction of the range removed as noise floor
-1 169 BOOST = 0.5  # exponent applied to the maximum (0.5 = square root)
-1 170
-1 171 audio_data, sample_rate = sf.read(sys.argv[1])
-1 172
-1 173 chunks = []
-1 174 min_magnitude = 0
-1 175 max_magnitude = 1
-1 176
-1 177 for start in range(0, len(audio_data), CHUNK_SIZE):
-1 178     end = min(start + CHUNK_SIZE, len(audio_data))
-1 179
-1 180     fft_data = np.fft.fft(audio_data[start:end], axis=0)  # along time, not channels
-1 181     magnitude = np.abs(fft_data)
-1 182     min_magnitude = np.min(magnitude) * (1 - KEEP) + min_magnitude * KEEP
-1 183     max_magnitude = np.max(magnitude) * (1 - KEEP) + max_magnitude * KEEP
-1 184
-1 185     spread_magnitude = (
-1 186         (magnitude - min_magnitude - (max_magnitude - min_magnitude) * CUTOFF)
-1 187         / (max_magnitude - min_magnitude - (max_magnitude - min_magnitude) * CUTOFF)
-1 188         * (max_magnitude ** BOOST)
-1 189     )
-1 190     spread_magnitude = np.clip(spread_magnitude, 0, 1)
-1 191
-1 192     new_fft_data = spread_magnitude * np.exp(1j * np.angle(fft_data))
-1 193     processed_chunk = np.fft.ifft(new_fft_data, axis=0)
-1 194     chunks.append(np.real(processed_chunk))
-1 195
-1 196 sf.write('processed.flac', np.concatenate(chunks), sample_rate)
-1 197 ```
-1 198
-1 199 The result sounded OK (no noticeable distortion) but the quiet parts were
-1 200 still too quiet.
-1 201
-1 202 # EBU R 128
-1 203
-1 204 In the meantime I did some reading on the last kind of filter that PipeWire had
-1 205 to offer. I had never heard of EBU R 128 before. It turns out it has quite an
-1 206 interesting story.
-1 207
-1 208 EBU is short for the "European Broadcasting Union". That is the same
-1 209 organization that does the [Eurovision Song
-1 210 Contest](https://en.wikipedia.org/wiki/European_Song_Contest), so this is
-1 211 already off to a glamorous start.
-1 212
-1 213 In the last few decades, there was a thing called the [Loudness
-1 214 War](https://en.wikipedia.org/wiki/Loudness_war): Audio producers who wanted
-1 215 their songs and jingles to be more noticeable used compression to increase the
-1 216 average loudness of the sound, while leaving the peaks at the same level. [EBU
-1 217 R 128](https://tech.ebu.ch/files/live/sites/tech/files/shared/r/r128.pdf)
-1 218 provides loudness recommendations for its member organizations, which
-1 219 effectively stopped the loudness war.
-1 220
-1 221 We shouldn't give too much credit to EBU though. Much of the specification is
-1 222 in turn based on [ITU-R
-1 223 BS.1770-5](https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en) by the
-1 224 International Telecommunication Union. This might actually be one of the best
-1 225 standards I have ever read. It first gives a conceptual overview, then provides
-1 226 all normative formulas, and then goes deep into the rationale and methodology.
-1 227 It was a very interesting and at the same time approachable read.
-1 228
-1 229 The only downside is of course the name. I can understand why EBU R 128 is more
-1 230 commonly used.
-1 231
-1 232 Loudness is typically measured as the logarithm of power, which in turn is
-1 233 calculated as the mean of the squared audio signal. In the case of ITU-R
-1 234 BS.1770-5 (for a single channel):
-1 235
-1 236 $Loudness(y) = 10 \log_{10}\left(\frac{1}{T}\int_{t=0}^T y(t)^2 dt\right) - 0.691$
-1 237
-1 238 The unit for loudness is LKFS (Loudness, K-weighted, relative to full scale).
-1 239 EBU uses the same unit, but calls it LUFS (Loudness units relative to full
-1 240 scale).
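
To get a feeling for the numbers, here is a minimal Python sketch of that calculation for a single channel of discrete samples. Power is taken as the mean of the squared samples. It deliberately skips the frequency weighting that the standard applies first, so the results are not real LKFS values; the function name `loudness` and the test signal are assumptions made for this example.

```python
import math


def loudness(samples):
    """Unweighted loudness of one channel: 10*log10(mean square) - 0.691."""
    mean_square = sum(y * y for y in samples) / len(samples)
    return 10 * math.log10(mean_square) - 0.691


RATE = 48000
# One second of a 1 kHz sine wave at full scale and at a tenth of full scale.
full = [math.sin(2 * math.pi * 1000 * i / RATE) for i in range(RATE)]
quiet = [0.1 * y for y in full]

print(f"full scale: {loudness(full):.2f}")
print(f"one tenth:  {loudness(quiet):.2f}")
```

A full-scale sine has a mean square of 0.5, so this prints about -3.70 for the first signal. Scaling the amplitude by a factor of 10 changes the result by 20, which gives about -23.70 for the second. Pure silence has no defined value here, because the logarithm of 0 does not exist.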
-1 241
-1 242 Before all that is calculated, frequencies are weighted to account for human
-1 243 hearing. The industry standard is a curve simply called
-1 244 [A-weighting](https://en.wikipedia.org/wiki/Sound_level_meter#Frequency_weighting).
-1 245 ITU-R BS.1770-5 however refers to a [study by
-1 246 Soulodre](https://jcaa.caa-aca.ca/index.php/jcaa/article/download/1673/1420/1810)
-1 247 that found that no weighting actually performs better, and a new curve called
-1 248 RLB performs even better than that.
-1 249
-1 250 In addition to the frequency weighting curve, ITU-R BS.1770-5 also specifies an
-1 251 algorithm to calculate "gated" loudness. In this version, power is calculated
-1 252 as the average over many small chunks. Chunks that are too quiet are ignored.
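
A minimal sketch of the gating idea in Python, under a few stated assumptions: it handles a single channel, skips the frequency weighting, and uses the parameters from my reading of the standard (400 ms chunks with 75% overlap, an absolute threshold of -70 LKFS, and a relative threshold 10 LU below the average of the remaining chunks). All function names are made up for this example.

```python
import math

ABSOLUTE_GATE = -70.0  # chunks below this loudness are always ignored
RELATIVE_GATE = -10.0  # offset relative to the average of the remaining chunks


def to_loudness(power):
    return 10 * math.log10(power) - 0.691


def block_powers(samples, rate, block_seconds=0.4, overlap=0.75):
    """Mean square of overlapping chunks (400 ms, 75% overlap)."""
    size = int(rate * block_seconds)
    step = int(size * (1 - overlap))
    return [
        sum(y * y for y in samples[start:start + size]) / size
        for start in range(0, len(samples) - size + 1, step)
    ]


def gated_loudness(samples, rate):
    powers = block_powers(samples, rate)
    # First pass: drop chunks below the absolute threshold (including silence).
    loud = [p for p in powers if p > 0 and to_loudness(p) > ABSOLUTE_GATE]
    if not loud:
        return float("-inf")
    # Second pass: drop chunks more than 10 below the average of the rest.
    threshold = to_loudness(sum(loud) / len(loud)) + RELATIVE_GATE
    kept = [p for p in loud if to_loudness(p) > threshold]
    return to_loudness(sum(kept) / len(kept))


RATE = 8000
# Hypothetical test signal: 2 s of a 440 Hz tone followed by 2 s of silence.
tone = [0.5 * math.sin(2 * math.pi * 440 * i / RATE) for i in range(2 * RATE)]
signal = tone + [0.0] * (2 * RATE)

ungated = to_loudness(sum(y * y for y in signal) / len(signal))
print(f"ungated: {ungated:.2f}")
print(f"gated:   {gated_loudness(signal, RATE):.2f}")
```

Without gating, the two seconds of silence pull the result down by about 3. With gating, the silent chunks are ignored, so the result stays close to the loudness of the tone itself.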
-1 253
-1 254 On top of this, [EBU Tech
-1 255 3341](https://tech.ebu.ch/files/live/sites/tech/files/shared/tech/tech3341.pdf)
-1 256 defines three profiles:
-1 257
-1 258 - "Momentary Loudness" is measured over a 400ms window without gating
-1 259 - "Short-term Loudness" is measured over a 3s window without gating
-1 260 - "Integrated Loudness" is measured over the complete audio with gating
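
The difference between the first two profiles is easy to see with a sliding window. The following Python sketch measures ungated loudness over the last 400 ms and the last 3 s of a made-up test signal that jumps from quiet to loud. It again skips the frequency weighting, and `window_loudness` is a name invented for this example.

```python
import math


def window_loudness(samples, rate, end, seconds):
    """Ungated loudness of the `seconds` long window ending at sample `end`."""
    start = max(0, end - int(rate * seconds))
    window = samples[start:end]
    mean_square = sum(y * y for y in window) / len(window)
    return 10 * math.log10(mean_square) - 0.691


RATE = 8000
# Hypothetical test signal: 4 s of a quiet tone followed by 4 s of a loud one.
tone = [math.sin(2 * math.pi * 440 * i / RATE) for i in range(8 * RATE)]
signal = [(0.01 if i < 4 * RATE else 0.5) * y for i, y in enumerate(tone)]

# Measure both profiles at a few points around the jump at t = 4 s.
for t in (4.0, 4.5, 5.0, 7.0):
    end = int(t * RATE)
    momentary = window_loudness(signal, RATE, end, 0.4)
    short_term = window_loudness(signal, RATE, end, 3.0)
    print(f"t={t:.1f}s momentary={momentary:6.1f} short-term={short_term:6.1f}")
```

Half a second after the jump, the momentary value has already settled at the loudness of the loud tone. The short-term value keeps climbing until its 3 s window no longer contains any of the quiet part.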
-1 261
-1 262 If you want to use this system with PipeWire, the repo contains an [example of
-1 263 how to use its ebur128
-1 264 filter](https://gitlab.freedesktop.org/pipewire/pipewire/-/blob/master/src/daemon/filter-chain/35-ebur128.conf).
-1 265 Fair warning though: The current version has [a
-1 266 typo](https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4667) so that
-1 267 "Shortterm" must be written as "Shorttem" instead.
-1 268
-1 269 I have used this filter with some success. This really does normalize loudness.
-1 270 However, there are still some issues. With the "Short-term" profile there is a
-1 271 noticeable ramp when going from a quiet section to a loud section or the other
-1 272 way around. So when there is a sudden bang after a quiet section, it gets
-1 273 amplified even further.
-1 274
-1 275 # Conclusion
-1 276
-1 277 I want to be able to hear all dialogue, but I don't want loud explosions or
-1 278 background noise to be amplified. It is tricky to make that distinction with
-1 279 these simple techniques. I feel like I could get lost in trying to tweak all
-1 280 the parameters to perfection, so I better stop here.
-1 281
-1 282 PipeWire turned out to be extremely flexible in theory, but also very limited
-1 283 in practice. For example, I wish there was a builtin power filter (it has
-1 284 $const^x$, but not $x^{const}$) or that it was possible to apply these filters to
-1 285 control values (e.g. the gain factor generated by ebur128). While the
-1 286 documentation is decent, I still had issues finding relevant information.
-1 287
-1 288 At this point this is just a collection of notes. I will use the ebur128 filter
-1 289 for a while and then maybe come back to this topic with some new ideas.