Ever wondered how an MP3 encoder works its magic? A search on Google will give you thousands of definitions, so I went back to some old documents to understand better what actually happens when an MP3 file gets encoded or decoded. Here are some factoids about the process.
How MP3 Got Its Name
MP3 is short for MPEG-1 Audio Layer III. MPEG is itself an acronym for Moving Picture Experts Group, established in 1988 as a working group of the International Organization for Standardization (ISO). MPEG’s job was to establish standards for compressed digital video and audio signals.
In 1989 MPEG’s call for audio coding submissions drew 14 responses. A group of researchers from Germany submitted their work, a codec named ASPEC, and with some tweaking this became the basis of MPEG-1 Audio Layer III.
As you guessed, two other submissions got the nod as well. These became Layer I and Layer II – lower complexity, lower quality, simpler to implement. However, as computers got more powerful and capable of handling the complicated but more efficient Layer III standard, the others have largely fallen out of everyday use (Layer II survives mainly in digital broadcasting).
The filename extension chosen for MPEG-1 Audio Layer III was .mp3. And this is what gave MP3 its name.
MP3 is Proprietary
The original research was performed by the University of Erlangen-Nuremberg and sponsored by the German research institute Fraunhofer IIS, and further work was in part sponsored by the French company Thomson SA. Between them Fraunhofer and Thomson control the licensing of MP3 technology throughout the world.
However, Fraunhofer and Thomson have not been too diligent about enforcing their rights, at least as far as software is concerned. Freeware encoders and decoders are easy to locate on the Internet, although many shareware and all commercial products do indeed license their MP3 technology from these two companies (as do all portable MP3 players).
Psychoacoustics
The basis of MP3’s effectiveness is psychoacoustics, which is the study of how humans perceive sound. The human ear is not a perfect instrument, and the algorithms used in MP3 compression take advantage of this.
Auditory masking is a phenomenon whereby we can’t hear weaker audio signals in a particular frequency range in the presence of a stronger audio signal at a single, similar frequency. At low frequencies, the range around one of these dominant frequencies may only be 100 Hz. At the upper end of the scale, it’s around 4 kHz.
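To put rough numbers on those ranges, here is a small sketch. It uses a textbook approximation of the ear's "critical bandwidth" (Zwicker and Terhardt's formula), which is my own choice for illustration and is not taken from the MP3 standard; real encoders work from their own psychoacoustic tables. The function name is made up for this post.

```python
def critical_bandwidth_hz(freq_hz: float) -> float:
    """Approximate width of the masking range around a tone.

    Assumption: Zwicker and Terhardt's critical-bandwidth approximation,
    used here only as an illustration, not as what an MP3 encoder runs.
    """
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


# Low tones mask a narrow neighbourhood, high tones a much wider one.
for f in (100, 1000, 5000, 15000):
    width = critical_bandwidth_hz(f)
    print(f"{f:>6} Hz tone -> masking range of roughly {width:.0f} Hz")
```

The approximation lands near 100 Hz at the bottom of the scale and near 4 kHz at the top, which lines up with the figures above.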
Psychoacoustics and auditory masking are the guts of MP3 encoding: take samples of the audio signal across 32 predefined frequency bands, break them into single tones of varying volume, and spend fewer bits on the tones that a louder neighbour would mask anyway. It’s worth pointing out that MP3 encoding is a whole lot more complicated than this. There are subtleties in the way the signal is analysed that help minimise the effect of sound “artifacts”, like echoes or ringing, and the signal-to-noise ratio comes in for special treatment as well.
MP3 is Lossy
Lossy is the opposite of lossless. It means that some of the original signal is lost irretrievably, and can’t be recreated. But that’s the point of using psychoacoustic theory – the audio that’s lost is the stuff we can’t hear (at least, not at high bit rates).
Frames
MP3 files are encoded into frames – chunks of data that represent the audio signal. The amount of data in a frame varies according to the bit rate specified during encoding (which can range from 32 to 320 kbit/sec), the sample rate, and the tonal complexity of the audio.
Each frame holds 1,152 samples, organised as a group of 36 sets of 32 frequency samples (one from each frequency band). So frames are based on the number of samples, not time. A single frame doesn’t cover a fixed number of milliseconds; it covers a fixed number of samples, which works out to about 26 milliseconds at 44.1 kHz.
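Here is a quick sketch of the arithmetic behind a frame. The duration follows directly from the 1,152 samples per frame. The byte count uses the commonly quoted frame-length formula for MPEG-1 Layer III (144 × bit rate ÷ sample rate, plus an optional padding byte), which is not spelled out in this post, so treat it as a nominal figure. The function names are my own.

```python
SAMPLES_PER_FRAME = 1152  # 36 sets of 32 frequency samples


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Time covered by one frame at the given sample rate."""
    return SAMPLES_PER_FRAME / sample_rate_hz * 1000.0


def frame_size_bytes(bitrate_kbps: int, sample_rate_hz: int, padding: bool = False) -> int:
    """Nominal frame length in bytes, header included.

    144 is 1152 samples divided by 8 bits per byte. The padding flag in the
    frame header adds one byte so the average bit rate stays on target.
    """
    size = 144 * bitrate_kbps * 1000 // sample_rate_hz
    return size + (1 if padding else 0)


for rate in (32000, 44100, 48000):
    print(f"{rate} Hz: {frame_duration_ms(rate):.2f} ms per frame, "
          f"{frame_size_bytes(128, rate)} bytes at 128 kbit/sec")
```

Same number of samples every time, but the milliseconds and the bytes both move with the settings.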
Compression
Those frames aren’t fixed length. What happens is that some frames that represent complex-toned audio can use space that’s left over from shorter frames earlier in the file. If that extra space remains unused, then the file is shorter and compression is higher. If it is used, then sound quality is better than it would be otherwise.
Also, there’s a reasonable amount of redundancy in the data held by each frame. MP3 uses a form of compression called Huffman coding to squeeze that data down to a minimal size without losing any of it. This is a bit like zipping a file with WinZip.
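For the curious, here is a minimal sketch of the idea behind Huffman coding: values that turn up often get short bit patterns, rare values get long ones. One big caveat: this toy version builds its code table from the data, whereas MP3 picks from a set of predefined tables, so it illustrates the principle and not the actual MP3 bitstream. The sample values are invented.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker stops Python from ever comparing the dicts.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical quantised values: mostly zeros and small numbers.
values = [0] * 40 + [1] * 12 + [-1] * 10 + [2] * 3 + [5]
code = huffman_code(values)
encoded_bits = sum(len(code[v]) for v in values)
fixed_bits = len(values) * 3  # five distinct values need 3 bits each at fixed width

print(code)
print(f"Huffman: {encoded_bits} bits, fixed width: {fixed_bits} bits")
```

Nothing is thrown away in this step. The bit patterns can be decoded back to exactly the same values, which is why it sits happily on top of the lossy part.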
Sample Rates
MP3 files usually have a specified sample rate of 44.1 kHz, which is the same as a retail CD. Other sample rates allowed for are 32 kHz and 48 kHz, and there are plenty of encoders that use lower values down to 8 kHz through later extensions (MPEG-2 and the unofficial MPEG-2.5).
The standard 44.1 kHz is good for human hearing. Theoretically it can represent frequencies up to 22.05 kHz, half the sample rate (beyond what we can hear). That’s the one I use. If I was a dog I’d use 48 kHz, but I’m not.
For speech, lo-fi or radio, 32 kHz could be the choice to get a narrower frequency range and smaller file sizes, and 48 kHz is really only there because it was used on digital tapes and, more recently, on DVDs.
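The rule of thumb is that a sample rate can only represent frequencies up to half its own value (the Nyquist limit). A tiny sketch, with a function name of my own choosing:

```python
def nyquist_khz(sample_rate_khz: float) -> float:
    """Highest frequency a given sample rate can represent, in kHz."""
    return sample_rate_khz / 2.0


for rate in (32.0, 44.1, 48.0):
    print(f"{rate} kHz sampling -> frequencies up to {nyquist_khz(rate):.2f} kHz")
```

Human hearing tops out at around 20 kHz at best, so 44.1 kHz already leaves a little headroom.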
Sample Depth
Only two choices are available, 8 bit and 16 bit. This is different to bit rate, which is the bits per second used to represent the audio. 320 kbit/sec is the highest quality, and 32 kbit/sec is the lowest.
Sample depth means the number of bits used to represent a sample. If it’s 8 bit, it means the sample can have one of 256 different values. If it’s 16 bit, the sample can have one of 65,536 different values. So 16 bit is much higher resolution and the one to use. It doesn’t make much of a difference to the file size anyway.
Sample depth only matters when you’re decoding the MP3 file (playing it back), as the MP3 encoding process uses the bits it needs to represent the audio signal. It fits the audio to the required bit rate (a predefined value between 32 and 320 kbit/sec), so the file size comes down to the bit rate and the length of the audio: longer audio simply means more frames.
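A back-of-the-envelope sketch shows why bit rate, and not sample depth, decides the size of an MP3. It assumes a constant bit rate and ignores tags, and it compares against uncompressed CD-style audio (44.1 kHz, 16 bit, stereo), where sample depth does matter. The function names are mine.

```python
def mp3_size_bytes(bitrate_kbps: int, duration_s: float) -> int:
    """Approximate MP3 size: bits per second times seconds, divided by 8.

    Assumes constant bit rate and no tag data.
    """
    return int(bitrate_kbps * 1000 / 8 * duration_s)


def pcm_size_bytes(duration_s: float, sample_rate_hz: int = 44100,
                   bits: int = 16, channels: int = 2) -> int:
    """Size of the same audio stored uncompressed (CD-style by default)."""
    bytes_per_second = sample_rate_hz * channels * bits // 8
    return int(bytes_per_second * duration_s)


four_minutes = 4 * 60
for kbps in (32, 128, 320):
    mp3 = mp3_size_bytes(kbps, four_minutes)
    pcm = pcm_size_bytes(four_minutes)
    print(f"{kbps:>3} kbit/sec: {mp3 / 1e6:.2f} MB, about 1/{pcm / mp3:.0f} of the original")
```

Halving the sample depth would halve the uncompressed figure, but the MP3 figure wouldn't budge.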
Tags
MP3 tags aren’t a part of the MPEG standard. They were an afterthought and, even now, aren’t covered by an ISO standard. ID3v2.3 tags are used almost universally these days and support for ID3v2.4 is growing. ID3v2 tag data can be located at any frame boundary but usually sits at the beginning of the MP3 file. The older ID3v1.1 tags are positioned at the end of the file.
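Here is a rough sketch of how a program might look for both kinds of tag. It assumes the usual layouts: an ID3v1 tag is the last 128 bytes of the file and starts with "TAG", and an ID3v2 tag opens with a 10-byte header that starts with "ID3" and stores its size as four 7-bit bytes. The function names and the fake file are made up for the example, and a real tag reader handles far more cases than this.

```python
import io


def _text(raw: bytes) -> str:
    """Decode a fixed-width ID3v1 field, dropping the zero padding."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(stream):
    """Return ID3v1/v1.1 fields from the last 128 bytes, or None if absent."""
    stream.seek(0, io.SEEK_END)
    if stream.tell() < 128:
        return None
    stream.seek(-128, io.SEEK_END)
    block = stream.read(128)
    if block[:3] != b"TAG":
        return None
    tag = {
        "title": _text(block[3:33]),
        "artist": _text(block[33:63]),
        "album": _text(block[63:93]),
        "year": _text(block[93:97]),
    }
    # ID3v1.1: a zero byte at offset 125 means offset 126 holds the track number.
    if block[125] == 0 and block[126] != 0:
        tag["track"] = block[126]
    return tag


def id3v2_size(stream):
    """Return the ID3v2 tag size (excluding its 10-byte header), or None."""
    stream.seek(0)
    header = stream.read(10)
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    size = 0
    for byte in header[6:10]:  # "syncsafe" integer: 7 useful bits per byte
        size = (size << 7) | (byte & 0x7F)
    return size


# A fake in-memory "file": ID3v2.3 header, 257 filler bytes, then an ID3v1.1 tag.
fake = io.BytesIO(
    b"ID3\x03\x00\x00\x00\x00\x02\x01" + b"\x00" * 257
    + b"TAG" + b"Song".ljust(30, b"\x00") + b"Band".ljust(30, b"\x00")
    + b"Album".ljust(30, b"\x00") + b"1999" + b"\x00" * 28 + b"\x00\x07" + b"\x0c"
)
print(id3v2_size(fake))
print(read_id3v1(fake))
```

With a real file you would pass in `open("some.mp3", "rb")` instead of the fake one.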
More Information?
I wanted to keep this post simple, without too much techo detail. But if you’re hankering for some in-depth information, here are a few links that can take you further:
Wikipedia
Audio data compression (entry)
Others
Yen Pan, D., A Tutorial on MPEG/Audio Compression (1995)


