- commit: 70299e8e7f58e0cee8b59fbd65fcc1ae43543504
- parent: 65edef9ecfb6335d2c272a3d62a4c8a21ce85d82
- Author: Tobias Bengfort <tobias.bengfort@posteo.de>
- Date: 2025-06-23 13:30
add post on loadness normalization
Diffstat
A | _content/posts/2025-06-23-loudness/index.md | 289 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 files changed, 289 insertions, 0 deletions
diff --git a/_content/posts/2025-06-23-loudness/index.md b/_content/posts/2025-06-23-loudness/index.md
@@ -0,0 +1,289 @@
---
title: Notes on Loudness Normalization
date: 2025-06-23
tags: [audio, linux]
description: "Dialogue in a movie is barely audible, but explosions are far too loud. So I experimented with loudness normalization."
---

You know the situation: Dialogue in a movie is barely audible, so you turn the
volume all the way up. The next scene has an explosion and your ears explode.

To prevent this, there are algorithms to normalize loudness. I wasn't really
interested in reading everything there is to know about loudness normalization
though. Instead, I just experimented with a few options. I also looked into how
they can be used on a Linux desktop.

# PipeWire filters

[PipeWire](https://gitlab.freedesktop.org/pipewire/pipewire) has recently
replaced older sound servers like PulseAudio or Jack on Linux desktops.
It provides backwards compatibility with the old systems, so you can for
example use the Pulse Volume Control GUI. It also provides features similar to
Jack (or even [PureData](https://en.wikipedia.org/wiki/Pure_data)) where you
can create different audio processing nodes and connect them together.

Creating a [filter node](https://docs.pipewire.org/group__pw__filter.html) was
easy enough. However, I had to manually connect it to the audio streams I
wanted to process. For stereo audio, that meant manually making 4 links (2
links from movie to filter and 2 links from filter to speakers). I tried to
automatically create these links via the API to no avail. I also tried to
fiddle with
[WirePlumber](https://pipewire.pages.freedesktop.org/wireplumber/policies/smart_filters.html)
with similar results.
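
To make the four links concrete, here is a minimal sketch that spells them out
as `pw-link` commands. This is not what I actually ran: the node names (`mpv`,
`my-filter`, `speakers`) and the port names (`output_FL`, `input_FL`,
`playback_FL`, ...) are assumptions for illustration. The real names on your
system can be listed with `pw-link -o` and `pw-link -i`.

```python
import subprocess

CHANNELS = ("FL", "FR")  # stereo: front left and front right


def stereo_links(source, filter_node, sink):
    """Return (output port, input port) pairs for source -> filter -> sink.

    The port naming scheme is an assumption, not taken from a real setup.
    """
    links = []
    for channel in CHANNELS:
        links.append((f"{source}:output_{channel}", f"{filter_node}:input_{channel}"))
        links.append((f"{filter_node}:output_{channel}", f"{sink}:playback_{channel}"))
    return links


def create_links(links, dry_run=True):
    """Print the pw-link commands and only run them if dry_run is False."""
    for output_port, input_port in links:
        command = ["pw-link", output_port, input_port]
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Placeholder node names: 2 links into the filter, 2 links out of it.
links = stereo_links("mpv", "my-filter", "speakers")
create_links(links)
```

With `dry_run=True` this only prints the commands, so it is safe to run before
the names have been adjusted.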

Finally I found [filter
chains](https://docs.pipewire.org/page_module_filter_chain.html), apparently a
completely unrelated feature in PipeWire that creates a virtual sink in front
of the filter and automatically connects its output to the default sink. This
makes it really easy to use the filter with standard GUIs.

Filter chains are configured using a syntax that looks like JSON without commas.
The documentation says they should be saved to
`~/.config/pipewire/filter-chain.conf.d/`, but for me they didn't load unless I
saved them to `~/.config/pipewire/pipewire.conf.d/`.

If there is any error in the configuration, the filter will just be ignored. I
added `ExecStart=/usr/bin/pipewire -vvv` to
`/usr/lib/systemd/user/pipewire.service` to get some debug output, which helped
a little but not much.

For the filters themselves you have a couple of options:

- a couple of builtin low-level primitives like multiplication or logarithms
- ladspa/lv2 plugins
- SOFA filters for spatially oriented audio
- EBU R 128 filters (we will get to that)

Out of all of these, ladspa/lv2 plugins provide the most flexibility. However,
I didn't get them to work. So I was mostly stuck with the builtin primitives to
build my filters.

This whole experience was a bit bumpy. Once I got this to work it was a joy,
but documentation and the debug experience could certainly be improved.

# Reshaping curves

My first idea was to apply a function directly to the audio signal. I landed on
$f(x) = 1.5x - 0.5x^3$. This function is symmetric around (0, 0), boosts small
values, and compresses larger values so the maximum value is still at 1.

It also reshapes the sound waves. A pure sine wave would be distorted when sent
through this filter.
I was curious to hear how that would affect the sound.

This is the PipeWire configuration I came up with:

```
context.modules = [
    {
        name = libpipewire-module-filter-chain
        args = {
            node.description = "compressor"
            media.name = "compressor"
            filter.graph = {
                nodes = [
                    {
                        type = builtin
                        name = copy
                        label = copy
                    }
                    {
                        type = builtin
                        name = cube
                        label = mult
                    }
                    {
                        type = builtin
                        name = mixer
                        label = mixer
                        control = {
                            "Gain 1" = 1.5
                            "Gain 2" = -0.5
                        }
                    }
                ]
                links = [
                    { output = "copy:Out" input = "cube:In 1" }
                    { output = "copy:Out" input = "cube:In 2" }
                    { output = "copy:Out" input = "cube:In 3" }
                    { output = "copy:Out" input = "mixer:In 1" }
                    { output = "cube:Out" input = "mixer:In 2" }
                ]
            }
            audio.channels = 2
            capture.props = {
                node.name = "effect_input.compressor"
                media.class = Audio/Sink
            }
            playback.props = {
                node.name = "effect_output.compressor"
                node.passive = true
            }
        }
    }
]
```

The result sounded OK, but also not quite like what I had in mind: The
compression for larger values was barely noticeable because the audio data
doesn't really contain many large values. On the plus side, this meant that the
wave distortion effect was small. But it didn't really do much beyond
increasing the volume.

## Fourier transforms

It is a fun exercise to apply techniques from image processing to sound or the
other way around.

I had experimented with optimizing images by spreading each of the red, green,
and blue channels so that the minimum value for each is 0% and the maximum
value is 100%.
That technique turned out to be useful for removing color casts from old
photos.

To apply this technique to sound, my approach was to first do a Fourier
transform to get the strength of each frequency, spread these strengths, and
then do the inverse Fourier transform.

The minimum turned out to be 0 in most cases. But I thought this might also be
a good chance to do some additional noise reduction. So I shifted the minimum
anyway.

On the other end, I didn't want to cancel out all differences in loudness. So
instead of stretching the maximum to 100% everywhere, I opted to just push it
slightly in that direction by applying a square root.

Finally, I didn't want to have abrupt changes in loudness. So I smoothed the
minimum and maximum by mixing them with the previous values.

Because I didn't know how to implement this using PipeWire filter chains, I
prototyped it in Python instead:

```python
import sys

import numpy as np
import soundfile as sf

CHUNK_SIZE = 2048  # samples per FFT window
KEEP = 0.9  # smoothing: share of the previous min/max that is kept
CUTOFF = 0.02  # share of the range that is cut off at the bottom (noise)
BOOST = 0.5  # exponent applied to the maximum (0.5 = square root)

audio_data, sample_rate = sf.read(sys.argv[1])

chunks = []
min_magnitude = 0
max_magnitude = 1

for start in range(0, len(audio_data), CHUNK_SIZE):
    end = min(start + CHUNK_SIZE, len(audio_data))

    # axis=0 is the time axis, for mono as well as multi-channel files
    fft_data = np.fft.fft(audio_data[start:end], axis=0)
    magnitude = np.abs(fft_data)
    # smooth the minimum and maximum by mixing them with the previous values
    min_magnitude = np.min(magnitude) * (1 - KEEP) + min_magnitude * KEEP
    max_magnitude = np.max(magnitude) * (1 - KEEP) + max_magnitude * KEEP

    spread_magnitude = (
        (magnitude - min_magnitude - (max_magnitude - min_magnitude) * CUTOFF)
        / (max_magnitude - min_magnitude - (max_magnitude - min_magnitude) * CUTOFF)
        * (max_magnitude ** BOOST)
    )
    spread_magnitude = np.clip(spread_magnitude, 0, 1)

    # recombine the new magnitudes with the original phases
    new_fft_data = spread_magnitude * np.exp(1j * np.angle(fft_data))
    processed_chunk = np.fft.ifft(new_fft_data, axis=0)
    chunks.append(np.real(processed_chunk))

processed = np.concatenate(chunks)
sf.write('processed.flac', processed, sample_rate)
```

The result sounded OK (no noticeable distortion) but the quiet parts were
still too quiet.

# EBU R 128

In the meantime I did some reading on the last kind of filter that PipeWire had
to offer. I had never heard of EBU R 128 before. It turns out it has quite an
interesting story.

EBU is short for the "European Broadcasting Union". That is the same
organization that does the [Eurovision Song
Contest](https://en.wikipedia.org/wiki/European_Song_Contest), so this is
already off to a glamorous start.

In the last few decades, there was a thing called the [Loudness
War](https://en.wikipedia.org/wiki/Loudness_war): Audio producers who wanted
their songs and jingles to be more noticeable used compression to increase the
average loudness of the sound, while leaving the peaks at the same level. [EBU
R 128](https://tech.ebu.ch/files/live/sites/tech/files/shared/r/r128.pdf)
provides loudness recommendations for its member organizations, which
effectively stopped the loudness war.

We shouldn't give too much credit to EBU though. Much of the specification is
in turn based on [ITU-R
BS.1770-5](https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en) by the
International Telecommunication Union. This might actually be one of the best
standards I have ever read. It first gives a conceptual overview, then provides
all normative formulas, and then goes deep into the rationale and methodology.
It was a very interesting and at the same time approachable read.

The only downside is of course the name.
I can understand why EBU R 128 is more
commonly used.

Loudness is typically measured as the logarithm of power, which in turn is
calculated as the mean of the squared audio signal. In the case of ITU-R
BS.1770-5:

$\mathrm{Loudness}(y) = 10 \log_{10}\left(\frac{1}{T} \int_{t=0}^T y(t)^2 dt\right) - 0.691$

The unit for loudness is LKFS (Loudness, K-weighted, relative to full scale).
EBU uses the same unit, but calls it LUFS (Loudness units relative to full
scale).

Before all that is calculated, frequencies are weighted to account for human
hearing. The industry standard is a curve simply called
[A-weighting](https://en.wikipedia.org/wiki/Sound_level_meter#Frequency_weighting).
ITU-R BS.1770-5 however refers to a [study by
Soulodre](https://jcaa.caa-aca.ca/index.php/jcaa/article/download/1673/1420/1810)
that found that no weighting actually performs better, and a new curve called
RLB performs even better than that.

In addition to the frequency weighting curve, ITU-R BS.1770-5 also specifies an
algorithm to calculate "gated" loudness. In this version, power is calculated
as the average over many small chunks. Chunks that are too quiet are ignored.

On top of this, [EBU Tech
3341](https://tech.ebu.ch/files/live/sites/tech/files/shared/tech/tech3341.pdf)
defines three profiles:

- "Momentary Loudness" is measured over a 400 ms window without gating
- "Short-term Loudness" is measured over a 3 s window without gating
- "Integrated Loudness" is measured over the complete audio with gating

If you want to use this system with PipeWire, the repo contains an [example of
how to use its ebur128
filter](https://gitlab.freedesktop.org/pipewire/pipewire/-/blob/master/src/daemon/filter-chain/35-ebur128.conf).
Fair warning though: The current version has [a
typo](https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4667) so that
"Shortterm" must be written as "Shorttem" instead.

I have used this filter with some success. This really does normalize loudness.
However, there are still some issues. With the "Short-term" profile there is a
noticeable ramp when going from a quiet section to a loud section or the other
way around. So when there is a sudden bang after a quiet section, it gets
amplified even further.

# Conclusion

I want to be able to hear all dialogue, but I don't want loud explosions or
background noise to be amplified. It is tricky to make that distinction with
these simple techniques. I feel like I could get lost in trying to tweak all
the parameters to perfection, so I had better stop here.

PipeWire turned out to be extremely flexible in theory, but also very limited
in practice. For example, I wish there was a builtin power filter (it has
$const^x$, but not $x^{const}$) or that it was possible to apply these filters to
control values (e.g. the gain factor generated by ebur128). While the
documentation is decent, I still had issues finding relevant information.

At this point this is just a collection of notes. I will use the ebur128 filter
for a while and then maybe come back to this topic with some new ideas.