About MP3
MP3 is a popular digital audio encoding and lossy compression format, developed and standardized in the early 1990s by a team of engineers working in the framework of the ISO/IEC MPEG audio committee under the chairmanship of Professor Hans Musmann (University of Hannover, Germany). It was designed to greatly reduce the amount of data required to represent audio while still sounding like a faithful reproduction of the original uncompressed audio to most listeners. In popular usage, MP3 also refers to files of sound or music recordings stored in the MP3 format on computers.
Overview
MP3 is a compression format. It represents pulse-code modulation (PCM) encoded audio data in a much smaller size by discarding portions that are considered less important to human hearing (similar to JPEG, a lossy compression format for images).
A number of techniques, including psychoacoustics, are used in MP3 to determine which portions of the audio can be discarded. MP3 audio can be compressed at different bit rates, providing a range of tradeoffs between data size and sound quality.
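At its heart, the MP3 format uses a hybrid transformation to convert a time-domain signal into a frequency-domain signal: a 32-band polyphase quadrature filter bank, followed by an MDCT (modified discrete cosine transform) with long (36-sample) or short (12-sample) blocks, and alias-reduction post-processing.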
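The MDCT stage of this hybrid transformation can be illustrated in isolation. The following Python sketch assumes the textbook MDCT definition (2N time samples in, N coefficients out) rather than any reference encoder's code; it omits the polyphase filter bank, windowing and alias reduction, and uses a direct O(N²) sum instead of a fast algorithm. With N = 18 it corresponds to a long block (36 samples in, 18 coefficients out per subband). The function names are illustrative only.

```python
import math

def _kernel(n, k, half):
    # cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)), shared by MDCT and IMDCT
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))

def mdct(block):
    """2N time-domain samples -> N frequency coefficients (direct form)."""
    if len(block) % 2:
        raise ValueError("MDCT input length must be even")
    half = len(block) // 2
    return [sum(x * _kernel(n, k, half) for n, x in enumerate(block))
            for k in range(half)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled by 1/N."""
    half = len(coeffs)
    return [sum(c * _kernel(n, k, half) for k, c in enumerate(coeffs)) / half
            for n in range(2 * half)]

def overlap_add_roundtrip(signal, half=18):
    """Transform 50%-overlapping blocks and overlap-add the inverses.

    Aliasing from neighbouring blocks cancels (TDAC), so every sample covered
    by two blocks is reconstructed exactly. No analysis window is applied.
    """
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        for i, y in enumerate(imdct(mdct(signal[start:start + 2 * half]))):
            out[start + i] += y
    return out

signal = [math.sin(0.3 * i) + 0.01 * i for i in range(90)]  # 5 x 18 samples
rebuilt = overlap_add_roundtrip(signal)
# Samples 18..71 are each covered by two blocks and match the input.
print(max(abs(a - b) for a, b in zip(signal[18:72], rebuilt[18:72])))
```

Each block yields only half as many coefficients as input samples, yet overlapping consecutive blocks by half and adding the inverse transforms cancels the time-domain aliasing, which is what makes the MDCT attractive for frame-based coding.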
MP3 Surround, a version of the format supporting 5.1 channels for surround sound, was introduced in December 2004. MP3 Surround is backward compatible with standard stereo MP3, and file sizes are similar.
In terms of the MPEG specifications, AAC (Advanced Audio Coding) from MPEG-4 is intended to be the successor of the MP3 format, although there has also been a significant movement to create and popularize other audio formats. Nevertheless, such a succession is unlikely to happen for a long time, because MP3 enjoys extremely wide popularity and support, not just among end users and software but also in hardware such as DVD and CD players.
History
Development
MPEG-1 Audio Layer 2 encoding began as the Digital Audio Broadcast (DAB) project managed by Egon Meier-Engelen of the DFVLR (later renamed DLR, Germany's aerospace research agency) in Germany. This project was financed by the European Union as part of the EUREKA research program, where it was commonly known as EU-147. EU-147 ran from 1987 to 1994.
In 1991, two proposals were available: Musicam (known as Layer 2) and ASPEC (Adaptive Spectral Perceptual Entropy Coding). The Musicam technique, as proposed by Philips (the Netherlands), CCETT (France), and IRT (Germany), was chosen for its simplicity and error robustness, as well as the low computational power it requires to encode high-quality compressed audio. The Musicam format, based on subband coding, laid the foundation of the MPEG Audio compression format (sampling rates, frame structure, headers, number of samples per frame). Its technologies and ideas were fully incorporated into the definition of ISO MPEG Audio Layer I and Layer II, and later into the Layer III (MP3) format. Under the chairmanship of Professor Musmann (University of Hannover), the editing of the standard was carried out under the responsibility of L. van de Kerkhof (Layer I) and G. Stoll (Layer II).
Later, a working group consisting of J. D. Johnston (US), Gerhard Stoll (Germany), Yves-François Dehery (France), and Karlheinz Brandenburg (Germany) took ideas from Musicam and ASPEC, added some of their own, and created MP3, which was designed to achieve the same quality at 128 kbit/s as MP2 at 192 kbit/s.
All algorithms were finalized in 1992 as part of MPEG-1, the first standard suite by MPEG, which resulted in the international standard ISO/IEC 11172-3, published in 1993.
Further work on MPEG audio was finalized in 1994 as part of the second suite of MPEG standards, MPEG-2, more formally known as international standard ISO/IEC 13818-3, originally published in 1995.
The compression efficiency of encoders is typically defined by the bit rate, because the compression ratio depends on the bit depth and sampling rate of the input signal. Nevertheless, compression ratios are often published that use the CD parameters as a reference (44.1 kHz, 2 channels at 16 bits per channel, or 2×16 bit). Sometimes the Digital Audio Tape (DAT) SP parameters are used instead (48 kHz, 2×16 bit). Compression ratios quoted against this reference are higher for the same bit rate, which demonstrates the problem with the term "compression ratio" for lossy encoders.
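The reference figures above follow from simple arithmetic: an uncompressed PCM stream carries sampling rate × bits per sample × channels bits per second, and the quoted ratio divides that by the encoded bit rate. The Python sketch below (function names are illustrative) shows that the same 128 kbit/s stream gives a ratio of about 11:1 against the CD reference but 12:1 against the DAT reference.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bit/s."""
    return sample_rate_hz * bits_per_sample * channels

def compression_ratio(reference_bps, encoded_bps):
    """How many times smaller the encoded stream is than the PCM reference."""
    return reference_bps / encoded_bps

CD_BPS = pcm_bit_rate(44_100, 16, 2)   # 1,411,200 bit/s
DAT_BPS = pcm_bit_rate(48_000, 16, 2)  # 1,536,000 bit/s

for name, reference in (("CD", CD_BPS), ("DAT SP", DAT_BPS)):
    print(f"128 kbit/s against {name}: {compression_ratio(reference, 128_000):.1f}:1")
```

Nothing about the encoded file changes between the two lines of output; only the reference does, which is the ambiguity the paragraph above describes.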
Karlheinz Brandenburg used a CD recording of Suzanne Vega's song "Tom's Diner" to assess the MP3 compression algorithm. This song was chosen because of its softness and simplicity, which make it easier to hear imperfections in the compression format during playback. More demanding audio excerpts (glockenspiel, triangle, accordion, and so on) were taken from the EBU V3/SQAM reference compact disc and have been used by professional sound engineers to assess the subjective quality of the MPEG Audio formats.
MP3 goes public
Reference simulation software written in the C language, known as ISO 11172-5, was developed by members of the ISO MPEG Audio committee in order to produce bit-compliant MPEG Audio files (Layer 1, Layer 2, Layer 3). Running in non-real time on a number of operating systems, it was used to demonstrate the first real-time hardware (DSP-based) decoding of compressed audio. Other real-time implementations of MPEG Audio encoders were available for digital broadcasting (DAB radio, DVB television) to consumer receivers and set-top boxes.
Later, on July 7, 1994, the Fraunhofer Society released the first software MP3 encoder, called l3enc. The filename extension .mp3 was chosen by the Fraunhofer team on July 14, 1995 (previously, the files had been named .bit). With the first real-time software MP3 player, WinPlay3 (released September 9, 1995), many people were able to encode and play back MP3 files on their PCs. Because hard drives of that time were relatively small (around 500 MB), the technology was essential for storing music on a computer for listening.
MP2 and MP3 and the Internet
In October 1993, MP2 (MPEG-1 Audio Layer 2) files appeared on the Internet. They were often played back using the Xing MPEG Audio Player, and later with a Unix program by Tobias Bading called MAPlay, initially released on February 22, 1994 (MAPlay was also ported to Microsoft Windows).
Initially the only encoder available for MP2 production was the Xing Encoder, accompanied by the program CDDA2WAV, a CD ripper that transformed CD audio tracks into computer data files.
The Internet Underground Music Archive (IUMA) is generally recognized as the start of the online music revolution. IUMA was the Internet's first high-fidelity music website, hosting thousands of authorized MP2 recordings before MP3 or the web was popularized. IUMA was started in 1993 by Rob Lord (who later headed the pioneering company Nullsoft) and Jeff Patterson, both from the University of California, Santa Cruz. Other founding members included Jon Luini, Brandee Selck, and Ahin Savara.
From the first half of 1995 through the late 1990s, MP3 files flourished on the Internet. MP3's popularity was largely due to, and closely tied to, the success of companies and software packages such as Nullsoft's Winamp (released in 1997), mpg123, and Napster (released in 1999). These programs made it very easy for the average user to play back, create, share, and collect MP3s.
Controversies regarding peer-to-peer file sharing of MP3 files have flourished in recent years, largely because high compression enables sharing of files that would otherwise be too large and cumbersome to share. Because of the vastly increased spread of MP3s through the Internet, some major record labels reacted by filing a lawsuit against Napster to protect their copyrights (see also intellectual property).
Commercial online music distribution services (such as the iTunes Music Store) usually prefer other, proprietary music file formats that support Digital Rights Management (DRM) to control and restrict the use of digital music. This preference is most likely an attempt to prevent piracy of copyright-protected material, but many users with at least an intermediate understanding of computers expect that it is only a matter of time before someone makes it easy to convert such proprietary file formats.
Quality of MP3 audio
Because MP3 is a lossy format, it offers a number of different options for its "bit rate", that is, the number of bits of encoded data used to represent each second of audio. Typical rates are between 128 and 256 kbit/s. By contrast, uncompressed audio as stored on a compact disc has a bit rate of about 1,411 kbit/s (44.1 kHz × 16 bits × 2 channels).
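Because the bit rate counts bits per second of audio, the size of a constant-bit-rate stream follows directly from its duration. This Python sketch ignores tags and other per-file overhead, so actual files are slightly larger; the function name is illustrative.

```python
def stream_bytes(bit_rate_bps, seconds):
    """Audio payload of a constant-bit-rate stream, in bytes (8 bits each)."""
    return bit_rate_bps * seconds / 8

CD_BPS = 44_100 * 16 * 2  # uncompressed CD audio: 1,411,200 bit/s

for label, bps in (("CD audio", CD_BPS), ("MP3 256 kbit/s", 256_000),
                   ("MP3 128 kbit/s", 128_000)):
    print(f"{label}: {stream_bytes(bps, 60) / 1e6:.2f} MB per minute")
```

At 128 kbit/s a minute of music needs under 1 MB, against more than 10 MB as uncompressed CD audio.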
MP3 files encoded at a lower bit rate will generally play back at a lower quality. With too low a bit rate, "compression artifacts" (i.e., sounds that were not present in the original recording) may appear in the reproduction. The sound of applause is a good demonstration of compression artifacts: it is hard to compress because it is highly random, so the failings of the encoder are more obvious and are audible as ringing.
Besides the bit rate of the encoded file, the quality of MP3 files depends on the quality of the encoder and the difficulty of the signal being encoded. For average signals with good encoders, many listeners accept an MP3 bit rate of 128 kbit/s as near enough to compact disc quality, which corresponds to a compression ratio of approximately 11:1. However, listening tests show that with a bit of practice many listeners can reliably distinguish 128 kbit/s MP3s from CD originals, and in many cases come to consider the MP3 audio of unacceptably low quality. Yet other listeners, and the same listeners in other environments (such as in a noisy moving vehicle or at a party), will consider the quality acceptable.
The Fraunhofer Gesellschaft (FhG) publishes on its official webpage the following compression ratios and data rates for MPEG-1 Layer 1, 2 and 3, intended for comparison:
- Layer 1: 384 kbit/s, compression 4:1
- Layer 2: 192...256 kbit/s, compression 6:1...8:1
- Layer 3: 112...128 kbit/s, compression 10:1...12:1
The differences between the layers are caused by the different psychoacoustic models they use; the Layer 1 algorithm is typically substantially simpler, so a higher bit rate is needed for transparent encoding. However, as different encoders use different models, it is difficult to draw absolute comparisons of this kind.
Many people consider these quoted rates heavily skewed in favour of Layer 2 and Layer 3 recordings. They contend that more realistic rates would be as follows:
- Layer 1: excellent at 384 kbit/s
- Layer 2: excellent at 256...384 kbit/s, very good at 224...256 kbit/s, good at 192...224 kbit/s
- Layer 3: excellent at 224...320 kbit/s, very good at 192...224 kbit/s, good at 128...192 kbit/s
When comparing compression schemes, it is important to use encoders of equivalent quality. Tests may be biased against older formats in favour of new ones by using older encoders based on out-of-date technologies, or even buggy encoders for the old format. Because lossy encoding discards information, MP3 encoders work hard to ensure that the discarded parts cannot be detected by human listeners, by modeling the general characteristics of human hearing (e.g., noise masking). Different encoders achieve this with varying degrees of success.
A few possible encoders:
- LAME, first created by Mike Cheng in early 1998. Unlike the others, it is a fully LGPL-licensed MP3 encoder, with excellent speed and quality that rival even MP3's technological successors.
- Fraunhofer Gesellschaft: some of its encoders are good, while others have bugs.
Early encoders that are no longer widely used include:
- ISO dist10 reference code
- Xing
- BladeEnc
- ACM Producer Pro
Good encoders produce acceptable quality at 128 to 160 kbit/s and near-transparency at 160 to 192 kbit/s, while low-quality encoders may never reach transparency, not even at 320 kbit/s. It is therefore misleading to speak of 128 kbit/s or 192 kbit/s quality, except in the context of a particular encoder or of the best available encoders. A 128 kbit/s MP3 produced by a good encoder might sound better than a 192 kbit/s MP3 produced by a bad encoder.
It is important to note that the quality of an audio signal is subjective. A given bit rate suffices for some listeners but not for others. Individual acoustic perception varies, so no single psychoacoustic model can be assumed to give satisfactory results for everyone. Merely changing the listening conditions, such as the audio playback system or environment, can expose unwanted distortions caused by lossy compression. The numbers given above are rough guidelines that work for many people, but in the field of lossy audio compression the only true measure of the quality of a compression process is to listen to the results.
If your aim is to archive sound files with no loss of quality (or to work on the sound files in a studio, for example), you should use lossless compression algorithms, which are currently capable of compressing 16-bit PCM audio to around 38% of its original size while leaving the audio identical to the original. Examples include Lossless Audio (LA), Apple Lossless, FLAC, Windows Media Audio 9 Lossless (WMA) and Monkey's Audio, among others. Lossless formats are strongly preferred for material that will be edited, mixed, or otherwise processed, because the perceptual assumptions made by lossy encoders may not hold true after processing. The losses produced by multiple stages of coding may also compound each other, becoming more evident when the signal is re-encoded after processing. Lossless formats produce the best possible result, at the expense of a lower compression ratio.
Some simple editing operations, such as cutting sections of audio, may be performed directly on the encoded MP3 data without re-encoding. For these operations, the concerns mentioned above are not necessarily relevant, as long as appropriate software (such as mp3DirectCut and MP3Gain) is used to avoid extra decoding and encoding steps.
Bit rate
MP3 files can be encoded at different bit rates. The general rule is that the higher the bit rate, the more information from the original sound file is retained, and thus the higher the quality during playback. In the early days of MP3 encoding, a fixed bit rate was used for the entire file.
The bit rates available in MPEG-1 Layer 3 are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and the available sampling frequencies are 32, 44.1 and 48 kHz. 44.1 kHz is almost always used (it matches the sampling rate of compact discs), and 128 kbit/s has become the de facto "good enough" standard, although 192 kbit/s is becoming increasingly popular on peer-to-peer file sharing networks. MPEG-2 and the unofficial MPEG-2.5 include some additional bit rates: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160 kbit/s.
Variable bit rates (VBR) are also possible. Audio in MP3 files is divided into frames, each with its own bit rate, so the bit rate can be changed dynamically as the file is encoded (although this was not originally implemented, VBR is in extensive use today). This technique makes it possible to use more bits for parts of the sound with higher dynamics (more sound movement) and fewer bits for parts with lower dynamics, further increasing quality and decreasing storage space. The method is comparable to a sound-activated tape recorder that reduces tape consumption by not recording silence. Some encoders use this technique extensively.
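In a VBR stream each frame declares its own bit rate, but every frame still represents the same number of samples, so all frames last equally long. A minimal Python sketch, assuming MPEG-1 Layer III frames of 1152 samples and a list of per-frame bit rates already read from the frame headers (the list below is made up for illustration), derives the duration, average bit rate and payload size. The function name is illustrative.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III; MPEG-2/2.5 Layer III frames hold 576

def vbr_summary(frame_kbps, sample_rate_hz=44_100):
    """Return (duration_s, average_kbps, payload_bytes) for per-frame bit rates.

    Every frame lasts SAMPLES_PER_FRAME / sample_rate_hz seconds regardless of
    its bit rate, so the time-weighted average reduces to the plain mean.
    Byte rounding and padding of real frames are ignored.
    """
    if not frame_kbps:
        raise ValueError("no frames")
    frame_seconds = SAMPLES_PER_FRAME / sample_rate_hz
    duration = frame_seconds * len(frame_kbps)
    total_bits = sum(kbps * 1000 * frame_seconds for kbps in frame_kbps)
    return duration, total_bits / duration / 1000, total_bits / 8

# Hypothetical stream: 300 quiet frames at 64 kbit/s, 100 busy frames at 256 kbit/s.
duration, average, size = vbr_summary([64] * 300 + [256] * 100)
print(f"{duration:.2f} s, average {average:.0f} kbit/s, about {size / 1000:.0f} kB")
```

The quiet frames dominate by count, so the average lands at 112 kbit/s rather than midway between the two rates, while the busy passage still gets the bits it needs.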
Non-standard bit rates up to 640 kbit/s can be achieved with the LAME encoder and its --freeformat option; however, only a few MP3 players can play such files.
Design limitations of MP3
There are several limitations inherent to the MP3 format that cannot be overcome by using a better encoder.
Newer audio compression formats such as Vorbis and AAC no longer have these limitations.
In technical terms, MP3 is limited in the following ways:
- Bit rate is limited to a maximum of 320 kbit/s
- Time resolution can be too low for highly transient signals
- There is no scale factor band for frequencies above 15.5/15.8 kHz
- Joint stereo is done on a frame-to-frame basis
- Overall encoder/decoder delay is not defined, so there is no official provision for gapless playback; gaps may be introduced between tracks, although this can be avoided to a degree by encoding with LAME.
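The gaps arise from frame granularity: the encoder emits whole frames, and encoders typically also add some delay samples before the audio, so the decoded stream is longer than the original track. A minimal Python sketch of that arithmetic, assuming MPEG-1 Layer III frames of 1152 samples and a hypothetical encoder delay of 576 samples (the real value depends on the encoder, which is exactly what the standard leaves undefined); the function name is illustrative.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III

def frame_layout(track_samples, encoder_delay):
    """Return (frames, trailing_padding) needed to hold a track.

    encoder_delay is an encoder-specific number of samples emitted before the
    real audio (hypothetical here); the unused remainder of the last frame is
    trailing padding. Both play back as extra samples unless trimmed.
    """
    needed = encoder_delay + track_samples
    frames = -(-needed // SAMPLES_PER_FRAME)  # integer ceiling division
    return frames, frames * SAMPLES_PER_FRAME - needed

# A 3-minute track at 44.1 kHz, with a hypothetical 576-sample encoder delay.
delay = 576
frames, padding = frame_layout(180 * 44_100, delay)
extra = delay + padding
print(f"{frames} frames, {extra} extra samples ({extra / 44.1:.1f} ms)")
```

A player can only remove these extra samples if it knows both numbers; this is the kind of information LAME records in encoder-specific metadata, which is why encoding with LAME helps.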
Nevertheless, a well-tuned MP3 encoder can perform competitively even with these restrictions.
Encoding of MP3 audio
The MPEG-1 standard does not include a precise specification for an MP3 encoder. By contrast, the decoding algorithm and file format are well defined. Implementers of the standard were expected to devise their own algorithms for removing parts of the information in the raw audio (or rather in its MDCT representation in the frequency domain). This is the domain of psychoacoustics, which aims at understanding how human acoustic perception works (both in our ears and in our brain).
As a result, many different MP3 encoders are available, each producing files of differing quality. Comparisons are widely available, so it is easy for a prospective user to research the best choice. Keep in mind that an encoder that is proficient at higher bit rates (such as LAME, which is widely used for encoding at higher bit rates) is not necessarily as good at lower bit rates.
Decoding of MP3 audio
Decoding, on the other hand, is carefully defined in the standard. Most decoders are "bitstream compliant", meaning that the uncompressed output they produce from a given MP3 file will be the same (within a specified rounding tolerance) as the output specified mathematically in the standard document. An MP3 file consists of a sequence of frames, each holding 384, 576, or 1152 samples (depending on the MPEG version and layer). Every frame has a 32-bit header and side information of 9, 17, or 32 bytes (depending on the MPEG version and whether the audio is mono or stereo). The header and side information help the decoder to decode the associated Huffman-encoded data correctly.
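The 32-bit frame header can be decoded with a few shifts and masks. The Python sketch below covers Layer III only and follows the commonly documented header layout (11-bit sync, version, layer, protection bit, bit rate index, sampling rate index, padding bit, channel mode). It rejects free-format streams, does not verify CRCs, and does not parse the side information itself; the tables reproduce the bit rates and sampling frequencies listed earlier in this article, plus the MPEG-2/2.5 sampling rates. The function and table names are illustrative.

```python
# Layer III bit rates in kbit/s, indexed by the 4-bit bit rate field.
# Index 0 ("free format") and 15 (invalid) are stored as None.
BITRATES_KBPS = {
    "1": [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None],
    "2": [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None],
}
# Sampling rates in Hz, indexed by the 2-bit field (index 3 is reserved).
SAMPLE_RATES_HZ = {
    "1": [44_100, 48_000, 32_000],
    "2": [22_050, 24_000, 16_000],
    "2.5": [11_025, 12_000, 8_000],
}
VERSIONS = {0b00: "2.5", 0b10: "2", 0b11: "1"}  # 0b01 is reserved

def parse_layer3_header(data: bytes) -> dict:
    """Decode the 4-byte frame header at the start of `data` (Layer III only)."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(data[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("no frame sync")
    version = VERSIONS.get((h >> 19) & 0b11)
    if version is None:
        raise ValueError("reserved MPEG version")
    if ((h >> 17) & 0b11) != 0b01:
        raise ValueError("not Layer III")
    mpeg1 = version == "1"
    kbps = BITRATES_KBPS["1" if mpeg1 else "2"][(h >> 12) & 0xF]  # 2.5 shares MPEG-2 rates
    rate_index = (h >> 10) & 0b11
    if kbps is None or rate_index == 3:
        raise ValueError("free format, invalid bit rate or reserved sampling rate")
    sample_rate = SAMPLE_RATES_HZ[version][rate_index]
    padding = (h >> 9) & 1
    mono = ((h >> 6) & 0b11) == 0b11
    return {
        "version": version,
        "bitrate_kbps": kbps,
        "sample_rate_hz": sample_rate,
        "samples_per_frame": 1152 if mpeg1 else 576,
        "side_info_bytes": (17 if mono else 32) if mpeg1 else (9 if mono else 17),
        "crc_protected": ((h >> 16) & 1) == 0,
        # Whole frame length in bytes, header included.
        "frame_bytes": (144 if mpeg1 else 72) * kbps * 1000 // sample_rate + padding,
    }

# FF FB 90 64: MPEG-1 Layer III, 128 kbit/s, 44.1 kHz, joint stereo, no CRC, no padding.
print(parse_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

For the example header the frame is 417 bytes long, or 418 when the padding bit is set; alternating between the two is how a 128 kbit/s stream at 44.1 kHz averages out to exactly 128 kbit/s.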
Therefore, comparisons of decoders are based almost exclusively on how computationally efficient they are (i.e., how much memory or CPU time they use in the decoding process).
ID3 and other tags
A "tag" is data stored in an MP3 file (as well as in files of other formats) that contains metadata such as the title, artist, album, track number or other information about the file, so that this information is kept in the file itself. The most widespread standard tag formats are currently the ID3v1 and ID3v2 tags, and the more recent APEv2 tag. APEv2 was originally developed for the MPC file format (see the APEv2 specification). APEv2 can coexist with ID3 tags in the same file, but it can also be used by itself.
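As an illustration of the simplest of these formats, an ID3v1 tag is a fixed 128-byte block at the very end of the file: the bytes "TAG", then 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a 1-byte genre number; the ID3v1.1 variant puts a zero byte and a track number in the last two comment bytes. The Python sketch below decodes that block, assuming ISO-8859-1 (Latin-1) text, the usual interpretation. ID3v2 and APEv2 are considerably more complex and are not handled; the function name and the hand-built example tag are illustrative.

```python
def read_id3v1(data: bytes):
    """Return ID3v1/v1.1 fields from the last 128 bytes of `data`, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(start, length):
        # Fields are padded with NUL bytes (sometimes spaces); cut at the first NUL.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").rstrip(" ")

    info = {
        "title": text(3, 30),
        "artist": text(33, 30),
        "album": text(63, 30),
        "year": text(93, 4),
        "comment": text(97, 30),
        "track": None,
        "genre_id": tag[127],
    }
    # ID3v1.1: comment byte 28 is zero and byte 29 holds the track number.
    if tag[125] == 0 and tag[126] != 0:
        info["track"] = tag[126]
    return info

# A hand-built tag appended to two dummy bytes of audio data.
example = (b"TAG" + b"Tom's Diner".ljust(30, b"\x00") + b"Suzanne Vega".ljust(30, b"\x00")
           + bytes(30) + b"1987" + bytes(28) + b"\x00" + bytes([1]) + bytes([12]))
print(read_id3v1(b"\xff\xfb" + example))
```

With a real file, one would read the final 128 bytes and pass them in; files without a tag simply return None.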
Volume normalization
As compact discs and other sources are recorded and mastered at different volumes, it is useful to store volume information about a file in the tag so that the volume can be adjusted dynamically at playback time.
A few standards for encoding the gain of an MP3 file have been proposed. The idea is to normalize the average volume (not the volume peaks) of audio files, so that the volume does not change between consecutive tracks.
The most popular and widely used solution for storing replay gain is known simply as "Replay Gain". Typically, the average volume and clipping information for an audio track are stored in the metadata tag.
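Applying a stored gain at playback time is a small calculation: a gain in decibels becomes the linear factor 10^(dB/20), and the stored peak can be used to keep the boosted signal from clipping. The Python sketch below assumes floating-point samples in the range -1.0 to 1.0 and a peak value on the same scale; it is not taken from the Replay Gain specification, and the function names are illustrative.

```python
def replay_gain_factor(gain_db, peak=None):
    """Convert a gain in dB to a linear factor, optionally limited by the peak.

    `peak` is the track's largest absolute sample value (1.0 = full scale).
    If boosting by the full gain would push that peak past full scale, the
    factor is reduced to exactly 1 / peak instead.
    """
    factor = 10 ** (gain_db / 20)
    if peak is not None and peak > 0 and peak * factor > 1.0:
        factor = 1.0 / peak
    return factor

def apply_gain(samples, gain_db, peak=None):
    """Scale float samples (range -1.0..1.0) by the Replay Gain factor."""
    factor = replay_gain_factor(gain_db, peak)
    return [s * factor for s in samples]

print(replay_gain_factor(-6.0))                   # about 0.501: a loud track lowered by 6 dB
print(apply_gain([0.5, -0.9], +6.0, peak=0.9))    # boost limited so -0.9 reaches full scale
```

Storing the peak alongside the gain is what lets the player trade a little loudness for the absence of clipping when a quiet track needs a large boost.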
Alternative technologies
Many other lossy audio codecs exist, including:
- MPEG-1/2 Audio Layer 2 (MP2), MP3's predecessor;
- Ogg Vorbis from the Xiph.org Foundation, a free software and patent-free codec;
- MPC, also known as Musepack (formerly MP+), a derivative of MP2;
- mp3PRO from Thomson Multimedia, combining MP3 with SBR (spectral band replication);
- AC-3, used in Dolby Digital and DVD;
- ATRAC, used in Sony's MiniDisc;
- MPEG-4 AAC, used by Apple's iTunes Music Store and iPod;
- Windows Media Audio (WMA) from Microsoft;
- QDesign, used in QuickTime at low bit rates;
- AMR-WB+ (Extended Adaptive Multi-Rate Wideband), optimized for cellular and other limited-bandwidth use;
- RealAudio from RealNetworks, frequently used for streaming on websites;
- Speex, a free software and patent-free codec based on CELP, specifically designed for speech and VoIP.
mp3PRO, MP3, AAC, and MP2 are all members of the same technological family and depend on roughly similar psychoacoustic models. The Fraunhofer Gesellschaft owns many of the basic patents underlying these codecs, with Dolby Labs, Sony, Thomson Consumer Electronics, and AT&T holding other key patents.
There are also some lossless audio compression methods used on the Internet, such as FLAC and Monkey's Audio. While they are not similar to MP3, they are good examples of other compression schemes available.
Listening tests have attempted to find the best-quality lossy audio codecs at certain bit rates. The tests have suggested that for some audio samples, newer audio codecs including Ogg Vorbis, mp3PRO, AC-3, Windows Media Audio, MPC and RealAudio perform better than MP3. Generally, these codecs achieve the equivalent of 128 kbit/s MP3 at around 80 kbit/s. At 128 kbit/s, Ogg Vorbis and MPC performed marginally better than other codecs. At 64 kbit/s, AAC and mp3PRO performed marginally better than other codecs. At high bit rates (128 kbit/s and above), most people do not hear significant differences. What is considered "CD quality" is quite subjective; for some listeners 128 kbit/s MP3 is sufficient, while for others 192 kbit/s MP3 is necessary.
Although proponents of newer codecs such as WMA and RealAudio have asserted that their respective algorithms can achieve CD quality at 64 kbit/s, listening tests have shown otherwise; however, the quality of these codecs at 64 kbit/s is clearly superior to MP3 at the same bit rate. The developers of the patent-free Ogg Vorbis codec claim that their algorithm surpasses the sound quality of MP3, RealAudio and WMA, and the listening tests mentioned above support that claim. Thomson claims that its mp3PRO codec achieves CD quality at 64 kbit/s, but listeners have reported that a 64 kbit/s mp3PRO file is comparable in quality to a 112 kbit/s MP3 file and does not come reasonably close to CD quality until about 80 kbit/s.
MP3, which was designed and tuned for use alongside MPEG-1/2 Video, generally performs poorly on monaural data below 48 kbit/s or on stereo data below 80 kbit/s.
Licensing and patent issues
Thomson Consumer Electronics controls licensing of the MPEG-1/2 Layer 3 patents in countries that recognize software patents, including the United States and Japan, but not in EU countries. Thomson has been actively enforcing these patents. Thomson has been granted software patents in EU countries, but it is unclear whether they would be enforced by courts there. See Software patents under the European Patent Convention.
In September 1998, the Fraunhofer Institute sent a letter to several developers of MP3 software stating that a license was required to "distribute and/or sell decoders and/or encoders". The letter claimed that unlicensed products "infringe the patent rights of Fraunhofer and THOMSON. To make, sell and/or distribute products using the [MPEG Layer-3] standard and thus our patents, you need to obtain a license under these patents from us."
These patent issues significantly slowed the development of unlicensed MP3 software and led to increased focus on creating and popularizing alternatives such as WMA and Ogg Vorbis.
Microsoft, the maker of the Windows operating system, chose to move away from MP3 to its own proprietary Windows Media formats to avoid the licensing issues associated with the patents. Until the key patents expire, open-source/free software encoders and players appear to be illegal for commercial use in countries that recognize software patents.
In spite of the patent restrictions, the MP3 format remains in widespread use; the reasons for this appear to be the network effects caused by:
- familiarity with the format, and not knowing that alternatives exist,
- the fact that these alternatives do not universally provide a definite advantage over MP3,
- the large quantity of music now available in the MP3 format,
- the wide variety of existing software and hardware that takes advantage of the file format,
- the lack of DRM protection technology, which makes MP3 files easy to edit, copy and distribute over networks,
- the majority of home users not knowing or not caring about the software patent controversy, which is in general irrelevant to their choice of the MP3 format for personal use.
Sisvel S.p.A. [1] and Audio MPEG, Inc. [2] are suing Thomson for patent infringement on MP3 technology [3]. Audio MPEG has also started licensing MP3 to vendors of MP3 products, so the legal status of MP3 is unclear.
Online music resources
Tools such as iRate try to make it easier to find music that matches the listener's tastes. There are several online music stores; Apple's iTunes Store is presently the most popular commercial online music offering. Independent artists are able to use smaller sites for distribution. A controversial MP3 portal is the Russian site AllOfMP3.com, which, through Russia's copyright laws, can legally distribute music by any label or artist. The music industry has closed down many file-sharing networks, and the public's demand for free MP3 downloads has paved the way for sites such as bestmp3links.com and Erik Brown's MP3 Links, which list links to free, legal MP3 download sites.
There are also several online columnists who edit news sites focused on digital music and the grassroots community it spawned. They include Richard Menta's MP3 Newswire, an early MP3 news site started in 1998, Jon Newton's P2Pnet, and Thomas Mennecke's Slyck.com. Other sites, such as Download.com and Vitaminic.com, allow artists to post their own music for free download.
See also