blog

git clone https://git.ce9e.org/blog.git

commit
70299e8e7f58e0cee8b59fbd65fcc1ae43543504
parent
65edef9ecfb6335d2c272a3d62a4c8a21ce85d82
Author
Tobias Bengfort <tobias.bengfort@posteo.de>
Date
2025-06-23 13:30
add post on loadness normalization

Diffstat

A _content/posts/2025-06-23-loudness/index.md 289 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 files changed, 289 insertions, 0 deletions


diff --git a/_content/posts/2025-06-23-loudness/index.md b/_content/posts/2025-06-23-loudness/index.md

@@ -0,0 +1,289 @@
   -1     1 ---
   -1     2 title: Notes on Loudness Normalization
   -1     3 date: 2025-06-23
   -1     4 tags: [audio, linux]
   -1     5 description: "Dialogue in a movie is barely audible, but explosions are far too loud. So I experimented with loudness normalization."
   -1     6 ---
   -1     7 
   -1     8 You know the situation: Dialogue in a movie is barely audible, so you turn the
   -1     9 volume all the way up. The next scene has an explosion and your ears explode.
   -1    10 
   -1    11 To prevent this, there are algorithms to normalize loudness. I wasn't really
   -1    12 interested in reading everything there is to know about loudness normalization
   -1    13 though. Instead, I just experimented with a few options. I also looked into how
   -1    14 they can be used on a Linux desktop.
   -1    15 
   -1    16 # PipeWire filters
   -1    17 
   -1    18 [PipeWire](https://gitlab.freedesktop.org/pipewire/pipewire) has recently
   -1    19 replaced older sound servers like PulseAudio or Jack on Linux desktops.
   -1    20 It provides backwards compatibility with the old systems, so you can for
   -1    21 example use the Pulse Volume Control GUI. It also provides features similar to
   -1    22 Jack (or even [PureData](https://en.wikipedia.org/wiki/Pure_data)) where you
   -1    23 can create different audio processing nodes and connect them together.
   -1    24 
   -1    25 Creating a [filter node](https://docs.pipewire.org/group__pw__filter.html) was
   -1    26 easy enough. However, I had to manually connect it to the audio streams I
   -1    27 wanted to process. For stereo audio, that meant manually making 4 links (2
   -1    28 links from movie to filter and 2 links from filter to speakers). I tried to
   -1    29 automatically create these links via the API to no avail. I also tried to
   -1    30 fiddle with
   -1    31 [WirePlumber](https://pipewire.pages.freedesktop.org/wireplumber/policies/smart_filters.html)
   -1    32 with similar results.
   -1    33 
   -1    34 Finally I found [filter
   -1    35 chains](https://docs.pipewire.org/page_module_filter_chain.html), apparently a
   -1    36 completely unrelated feature in PipeWire that creates a virtual sink in front
   -1    37 of the filter and automatically connects its output to the default sink. This
   -1    38 makes it really easy to use the filter with standard GUIs.
   -1    39 
   -1    40 Filter chains are configured using a syntax that looks like JSON without commas.
   -1    41 The documentation says they should be saved to
   -1    42 `~/.config/pipewire/filter-chain.conf.d/`, but for me they didn't load unless I
   -1    43 saved them to `~/.config/pipewire/pipewire.conf.d/`.
   -1    44 
   -1    45 If there is any error in the configuration, the filter will just be ignored. I
   -1    46 added `ExecStart=/usr/bin/pipewire -vvv` to
   -1    47 `/usr/lib/systemd/user/pipewire.service` to get some debug output, which helped
   -1    48 a little but not much.
   -1    49 
   -1    50 For the filters themselves you have a couple of options:
   -1    51 
   -1    52 -   a couple of builtin low-level primitives like multiplication or logarithms
   -1    53 -   ladspa/lv2 plugins
   -1    54 -   SOFA filters for spatially oriented audio
   -1    55 -   EBU R 128 filters (we will get to that)
   -1    56 
   -1    57 Out of all of these, ladspa/lv2 plugins provide the most flexibility. However,
   -1    58 I didn't get them to work. So I was mostly stuck with the builtin primitives to
   -1    59 build my filters.
   -1    60 
   -1    61 This whole experience was a bit bumpy. Once I got this to work it was a joy,
   -1    62 but documentation and the debug experience could certainly be improved.
   -1    63 
   -1    64 # Reshaping curves
   -1    65 
   -1    66 My first idea was to apply a function directly to the audio signal. I landed on
   -1    67 $f(x) = 1.5x - 0.5x^3$. This function is symmetric around (0, 0), boosts small
   -1    68 values, and compresses larger values so the maximum value is still at 1.
   -1    69 
   -1    70 It also reshapes the sound waves. A pure sine wave would be distorted when sent
   -1    71 through this filter. I was curious to hear how that would affect the sound.
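
To get a feel for the curve before wiring it into PipeWire, here is a minimal Python sketch. The helper name `reshape` and the sample values are made up for illustration. It evaluates $f$ at a few amplitudes and prints the effective gain in dB; for small inputs the gain approaches $20 \log_{10}(1.5) \approx 3.5$ dB, and at full scale it is 0 dB.

```python
import math


def reshape(x: float) -> float:
    """Hypothetical helper: f(x) = 1.5x - 0.5x^3 for a sample x in [-1, 1]."""
    return 1.5 * x - 0.5 * x ** 3


# Compare input and output amplitude for a few illustrative sample values.
for x in (0.01, 0.1, 0.5, 0.9, 1.0):
    gain_db = 20 * math.log10(reshape(x) / x)
    print(f"x={x:<5} f(x)={reshape(x):.4f} gain={gain_db:+.2f} dB")
```

The gain only drops noticeably once samples get close to full scale, so a signal that rarely reaches those values will mostly just get louder.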
   -1    72 
   -1    73 This is the PipeWire configuration I came up with:
   -1    74 
   -1    75 ```
   -1    76 context.modules = [
   -1    77     {
   -1    78         name = libpipewire-module-filter-chain
   -1    79         args = {
   -1    80             node.description = "compressor"
   -1    81             media.name = "compressor"
   -1    82             filter.graph = {
   -1    83                 nodes = [
   -1    84                     {
   -1    85                         type = builtin
   -1    86                         name = copy
   -1    87                         label = copy
   -1    88                     }
   -1    89                     {
   -1    90                         type = builtin
   -1    91                         name = cube
   -1    92                         label = mult
   -1    93                     }
   -1    94                     {
   -1    95                         type = builtin
   -1    96                         name = mixer
   -1    97                         label = mixer
   -1    98                         control = {
   -1    99                             "Gain 1" = 1.5
   -1   100                             "Gain 2" = -0.5
   -1   101                         }
   -1   102                     }
   -1   103                 ]
   -1   104                 links = [
   -1   105                     { output = "copy:Out" input = "cube:In 1" }
   -1   106                     { output = "copy:Out" input = "cube:In 2" }
   -1   107                     { output = "copy:Out" input = "cube:In 3" }
   -1   108                     { output = "copy:Out" input = "mixer:In 1" }
   -1   109                     { output = "cube:Out" input = "mixer:In 2" }
   -1   110                 ]
   -1   111             }
   -1   112             audio.channels = 2
   -1   113             capture.props = {
   -1   114                 node.name =  "effect_input.compressor"
   -1   115                 media.class = Audio/Sink
   -1   116             }
   -1   117             playback.props = {
   -1   118                 node.name =  "effect_output.compressor"
   -1   119                 node.passive = true
   -1   120             }
   -1   121         }
   -1   122     }
   -1   123 ]
   -1   124 ```
   -1   125 
   -1   126 The result sounded ok, but also not quite like what I had in mind: The
   -1   127 compression for larger values was barely noticeable because the audio data
   -1   128 doesn't really contain many large values. On the plus side, this meant that the
   -1   129 wave distortion effect was small. But it didn't really do much beyond
   -1   130 increasing the volume.
   -1   131 
   -1   132 ## Fourier Transforms
   -1   133 
   -1   134 It is a fun exercise to apply techniques from image processing to sound or the
   -1   135 other way around.
   -1   136 
   -1   137 I had experimented with optimizing images by spreading each of the red, green,
   -1   138 and blue channels so that the minimum value for each is 0% and the maximum
   -1   139 value is 100%. That technique turned out useful to remove color casts from old
   -1   140 photos.
   -1   141 
   -1   142 To apply this technique to sound, my approach was to first do a Fourier
   -1   143 transform to get the strength of each frequency, spread these strengths, and
   -1   144 then do the inverse Fourier transform.
   -1   145 
   -1   146 The minimum turned out to be 0 in most cases. But I thought this might also be
   -1   147 a good chance to do some additional noise reduction. So I shifted the minimum
   -1   148 anyway.
   -1   149 
   -1   150 On the other end, I didn't want to cancel out all differences in loudness. So
   -1   151 instead of stretching the maximum to 100% everywhere, I opted to just push it
   -1   152 slightly in that direction by applying a square root.
   -1   153 
   -1   154 Finally, I didn't want to have abrupt changes in loudness. So I smoothed the
   -1   155 minimum and maximum by mixing it with the previous values.
   -1   156 
   -1   157 Because I didn't know how to implement this using PipeWire filter chains, I
   -1   158 prototyped it in Python instead:
   -1   159 
   -1   160 ```python
   -1   161 import sys
   -1   162 
   -1   163 import numpy as np
   -1   164 import soundfile as sf
   -1   165 
   -1   166 CHUNK_SIZE = 2048
   -1   167 KEEP = 0.9
   -1   168 CUTOFF = 0.02
   -1   169 BOOST = 0.5
   -1   170 
   -1   171 audio_data, sample_rate = sf.read(sys.argv[1])
   -1   172 
   -1   173 chunks = []
   -1   174 min_magnitude = 0
   -1   175 max_magnitude = 1
   -1   176 
   -1   177 for start in range(0, len(audio_data), CHUNK_SIZE):
   -1   178     end = min(start + CHUNK_SIZE, len(audio_data))
   -1   179 
   -1   180     fft_data = np.fft.fft(audio_data[start:end], axis=0)  # along time, per channel
   -1   181     magnitude = np.abs(fft_data)
   -1   182     min_magnitude = np.min(magnitude) * (1 - KEEP) + min_magnitude * KEEP
   -1   183     max_magnitude = np.max(magnitude) * (1 - KEEP) + max_magnitude * KEEP
   -1   184 
   -1   185     spread_magnitude = (
   -1   186         (magnitude - min_magnitude - (max_magnitude - min_magnitude) * CUTOFF)
   -1   187         / (max_magnitude - min_magnitude - (max_magnitude - min_magnitude) * CUTOFF)
   -1   188         * (max_magnitude ** BOOST)
   -1   189     )
   -1   190     spread_magnitude = np.clip(spread_magnitude, 0, 1)
   -1   191 
   -1   192     new_fft_data = spread_magnitude * np.exp(1j * np.angle(fft_data))  # keep the phase
   -1   193     processed_chunk = np.fft.ifft(new_fft_data, axis=0)
   -1   194     chunks.append(np.real(processed_chunk))
   -1   195 
   -1   196 sf.write('processed.flac', np.concatenate(chunks), sample_rate)
   -1   197 ```
   -1   198 
   -1   199 The result sounded OK (no noticeable distortion) but the quiet parts were
   -1   200 still too quiet.
   -1   201 
   -1   202 # EBU R 128
   -1   203 
   -1   204 In the meantime I did some reading on the last kind of filter that PipeWire had
   -1   205 to offer. I had never heard of EBU R 128 before. It turns out it has quite an
   -1   206 interesting story.
   -1   207 
   -1   208 EBU is short for the "European Broadcasting Union". That is the same
   -1   209 organization that does the [Eurovision Song
   -1   210 Contest](https://en.wikipedia.org/wiki/European_Song_Contest), so this already
   -1   211 starts glamorous.
   -1   212 
   -1   213 In the last few decades, there was a thing called the [Loudness
   -1   214 War](https://en.wikipedia.org/wiki/Loudness_war): Audio producers who wanted
   -1   215 their songs and jingles to be more noticeable used compression to increase the
   -1   216 average loudness of the sound, while leaving the peaks at the same level. [EBU
   -1   217 R 128](https://tech.ebu.ch/files/live/sites/tech/files/shared/r/r128.pdf)
   -1   218 provides loudness recommendations for its member organizations, which
   -1   219 effectively stopped the loudness war.
   -1   220 
   -1   221 We shouldn't give too much credit to EBU though. Much of the specification is
   -1   222 in turn based on [ITU-R
   -1   223 BS.1770-5](https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en) by the
   -1   224 International Telecommunication Union. This might actually be one of the best
   -1   225 standards I have ever read. It first gives a conceptual overview, then provides
   -1   226 all normative formulas, and then goes deep into the rationale and methodology.
   -1   227 It was a very interesting and at the same time approachable read.
   -1   228 
   -1   229 The only downside is of course the name. I can understand why EBU R 128 is more
   -1   230 commonly used.
   -1   231 
   -1   232 Loudness is typically measured as the logarithm of power, which in turn is
   -1   233 calculated as the time average of the squared audio signal. In the case of ITU-R
   -1   234 BS.1770-5:
   -1   235 
   -1   236 $Loudness(y) = 10 \log_{10}\left(\frac{1}{T} \int_{t=0}^T y(t)^2 dt\right) - 0.691$
   -1   237 
   -1   238 The unit for loudness is LKFS (Loudness, K-weighted, relative to full scale).
   -1   239 EBU uses the same unit, but calls it LUFS (Loudness units relative to full
   -1   240 scale).
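
As a rough illustration, the formula can be sketched for a single channel in Python. This version deliberately ignores the frequency weighting that the standard applies first, and it assumes that power is the mean of the squared samples. The function name and the test signal are made up, so the printed number will not match what a real meter reports.

```python
import math


def loudness_unweighted(samples: list[float]) -> float:
    """Loudness of one channel, WITHOUT the frequency weighting of the standard.

    Assumption: power is the mean of the squared samples.
    """
    power = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(power) - 0.691


# One second of a full-scale 1 kHz sine at 48 kHz (illustrative values).
rate = 48_000
sine = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
print(f"{loudness_unweighted(sine):.2f}")
```

Because of the logarithm, halving the amplitude of the signal lowers the result by about 6 dB regardless of how loud it was to begin with.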
   -1   241 
   -1   242 Before all that is calculated, frequencies are weighted to account for human
   -1   243 hearing. The industry standard is a curve simply called
   -1   244 [A-weighting](https://en.wikipedia.org/wiki/Sound_level_meter#Frequency_weighting).
   -1   245 ITU-R BS.1770-5 however refers to a [study by
   -1   246 Soulodre](https://jcaa.caa-aca.ca/index.php/jcaa/article/download/1673/1420/1810)
   -1   247 that found that no weighting actually performs better, and a new curve called
   -1   248 RLB performs even better than that.
   -1   249 
   -1   250 In addition to the frequency weighting curve, ITU-R BS.1770-5 also specifies an
   -1   251 algorithm to calculate "gated" loudness. In this version, power is calculated
   -1   252 as the average over many small chunks. Chunks that are too quiet are ignored.
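
The gating can be sketched in a few lines of Python. This is a simplified single-channel version that starts from already computed per-chunk powers. The function names and the example data are made up. The two thresholds are the ones I took from the standard: an absolute gate at -70 LKFS and a relative gate 10 LU below the average of the chunks that pass the absolute gate.

```python
import math

ABSOLUTE_GATE = -70.0  # LKFS, absolute threshold
RELATIVE_GATE = -10.0  # LU, relative to the chunks that pass the absolute gate


def to_loudness(power: float) -> float:
    return 10 * math.log10(power) - 0.691


def gated_loudness(chunk_powers: list[float]) -> float:
    """Gated loudness from per-chunk mean-square powers (single channel)."""
    # Step 1: drop chunks below the absolute threshold.
    loud = [p for p in chunk_powers if p > 0 and to_loudness(p) > ABSOLUTE_GATE]
    if not loud:
        return float("-inf")
    # Step 2: drop chunks more than 10 LU below the average of what is left.
    threshold = to_loudness(sum(loud) / len(loud)) + RELATIVE_GATE
    kept = [p for p in loud if to_loudness(p) > threshold]
    return to_loudness(sum(kept) / len(kept))


# Half dialogue, half near-silence (made-up powers).
powers = [0.1] * 10 + [1e-5] * 10
print(f"ungated: {to_loudness(sum(powers) / len(powers)):.2f}")
print(f"gated:   {gated_loudness(powers):.2f}")
```

With gating, long pauses no longer drag the measured loudness down, so the result reflects the parts that actually contain sound.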
   -1   253 
   -1   254 On top of this, [EBU Tech
   -1   255 3341](https://tech.ebu.ch/files/live/sites/tech/files/shared/tech/tech3341.pdf)
   -1   256 defines three profiles:
   -1   257 
   -1   258 -   "Momentary Loudness" is measured over a 400ms window without gating
   -1   259 -   "Short-term Loudness" is measured over a 3s window without gating
   -1   260 -   "Integrated Loudness" is measured over the complete audio with gating
   -1   261 
   -1   262 If you want to use this system with PipeWire, the repo contains an [example of
   -1   263 how to use its ebur128
   -1   264 filter](https://gitlab.freedesktop.org/pipewire/pipewire/-/blob/master/src/daemon/filter-chain/35-ebur128.conf).
   -1   265 Fair warning though: The current version has [a
   -1   266 typo](https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4667) so that
   -1   267 "Shortterm" must be written as "Shorttem" instead.
   -1   268 
   -1   269 I have used this filter with some success. This really does normalize loudness.
   -1   270 However, there are still some issues. With the "Short-term" profile there is a
   -1   271 noticeable ramp when going from a quiet section to a loud section or the other
   -1   272 way around. So when there is a sudden bang after a quiet section, it gets
   -1   273 amplified even further.
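
The ramp can be reproduced with a toy model. This is not how PipeWire's filter is implemented; it is a naive normalizer that I made up for illustration, which derives its gain from the unweighted loudness of the last 3 seconds. The step size, the input powers, and the function name are assumptions; the target of -23 LUFS is the level EBU R 128 recommends.

```python
import math

WINDOW = 30     # 3 s of history at one power value per 100 ms (assumed step)
TARGET = -23.0  # target loudness in LUFS


def short_term_gain_db(powers: list[float]) -> list[float]:
    """Gain (dB) a naive normalizer would apply at each step."""
    gains = []
    for i in range(len(powers)):
        window = powers[max(0, i - WINDOW + 1): i + 1]
        loudness = 10 * math.log10(sum(window) / len(window)) - 0.691
        gains.append(TARGET - loudness)
    return gains


# 5 s of quiet material followed by 5 s of loud material (made-up powers).
powers = [1e-4] * 50 + [1e-1] * 50
gains = short_term_gain_db(powers)
for i in (49, 50, 55, 65, 80):
    print(f"t={i / 10:.1f}s gain={gains[i]:+.1f} dB")
```

In this model the gain needs the full 3 seconds to settle after the jump, so the first moments of the loud section are played back louder than the rest of it.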
   -1   274 
   -1   275 # Conclusion
   -1   276 
   -1   277 I want to be able to hear all dialogue, but I don't want loud explosions or
   -1   278 background noise to be amplified. It is tricky to make that distinction with
   -1   279 these simple techniques. I feel like I could get lost in trying to tweak all
   -1   280 the parameters to perfection, so I better stop here.
   -1   281 
   -1   282 PipeWire turned out to be extremely flexible in theory, but also very limited
   -1   283 in practice. For example, I wish there was a builtin power filter (it has
   -1   284 $const^x$, but not $x^{const}$) or that it was possible to apply these filters to
   -1   285 control values (e.g. the gain factor generated by ebur128). While the
   -1   286 documentation is decent, I still had issues finding relevant information.
   -1   287 
   -1   288 At this point this is just a collection of notes. I will use the ebur128 filter
   -1   289 for a while and then maybe come back to this topic with some new ideas.