Part I of this series introduced jicyshout, an open-source project to get the Java Media Framework to play MP3-over-http streams provided by Shoutcast, Icecast, and Live 365 servers. In this part, we get into the value-added service of providing metadata about that stream.
There are multiple standards for providing this metadata, some formal, some ad hoc, aiming to serve different purposes. Since the two most popular standards put the data in the media stream, we first need to understand how MP3 streams work.
MP3 streams are arranged into "frames", each with a header that provides basic information about encoding type, bitrate, sampling frequency, etc. By using this approach instead of expecting a single monolithic header at the top of a file, MP3s are well-suited for streaming: a client can just start reading and ignoring bytes until it hits the beginning of a frame.
To make the frame header easy to find, the header starts off with a single FF byte followed by another F, or in the case of an "MPEG 2.5" frame, an E. Getting an FF in purely random data is just a 1 in 256 chance of course, and getting the next three bits on makes it a 1 in 2048, making the header pretty easy to find, especially if a decoder sanity-checks the rest of the supposed header. Here's what the whole four-byte header looks like:
where A-F are (respectively) one-bit flags for MPEG 2.5, CRC, pad, private, copyright, and original. A detailed description of the header can be found at MP3 Tech. What's relevant for now is the fact that once found, this four-byte header gives a decoder everything it needs to know to start reading and decoding the audio data.
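The search for a frame header can be sketched in a few lines of Java. This is only an illustration, not jicyshout's code: the class and method names are made up, and the sanity check assumes the standard MPEG audio header layout, where the layer bits sit in the second byte and the bitrate and sampling-frequency indexes sit in the third.

```java
/** Sketch: scan a buffer for something that looks like an MP3 frame header. */
final class FrameSync {

    /** True if b0 is FF and the top three bits of b1 are set (FFF... or FFE...). */
    static boolean isSync(byte b0, byte b1) {
        return (b0 & 0xFF) == 0xFF && (b1 & 0xE0) == 0xE0;
    }

    /**
     * Sanity-checks the four bytes at 'pos': rejects the reserved layer value
     * and the invalid bitrate and sampling-frequency indexes.
     */
    static boolean looksPlausible(byte[] buf, int pos) {
        if (pos < 0 || pos + 4 > buf.length || !isSync(buf[pos], buf[pos + 1])) {
            return false;
        }
        int layer = (buf[pos + 1] >> 1) & 0x03;        // 00 is reserved
        int bitrateIndex = (buf[pos + 2] >> 4) & 0x0F; // 1111 is invalid
        int sampleIndex = (buf[pos + 2] >> 2) & 0x03;  // 11 is reserved
        return layer != 0 && bitrateIndex != 0x0F && sampleIndex != 0x03;
    }

    /** Index of the first plausible header at or after 'from', or -1 if none. */
    static int findFrame(byte[] buf, int from) {
        for (int i = Math.max(from, 0); i + 4 <= buf.length; i++) {
            if (looksPlausible(buf, i)) {
                return i;
            }
        }
        return -1;
    }
}
```

A client joining a stream mid-frame would call findFrame on each buffer it reads and discard everything before the returned index. A real decoder would likely go further and confirm that another header follows at the computed frame length.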
ID3 is probably the best-known standard for providing MP3 metadata, and is supported by most software and hardware MP3 players. While it calls itself an "informal standard", ID3 is quite rigorous and far less ambiguous than the other schemes in this article.
Curiously, though, the ID3 website went dark in mid-July and has not reappeared as of this writing. That means we have to depend on the Google cache to refer to the introductory ID3v2 made easy and other documentation.
The original ID3 tag went at the end of a flat file. This was
determined to be bad for streaming, so ID3v2 moved the tag to the
front of the file. An ID3 tag could also be received in-stream,
provided it's not inside an MP3 frame (since that would mess up
parsing of the audio data). That would create a stream that looks
like this:
One interesting concern is how to keep the ID3 data, which the MP3 parser sees as "junk" between frames, from including an FFF or FFE that could be mistaken for a frame header. ID3 fixes this with a pair of related schemes called "unsynchronization" and "synchsafe integers". The former converts sequences of 11111111 111xxxxx to 11111111 00000000 111xxxxx inside ID3 frames, while the latter tweaks integers by declaring the top bit of each byte to always be 0 and ignoring it, meaning the four-byte size in the ID3 header has only 28 meaningful bits.
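Both schemes are easy to express in code. The following is a minimal sketch, not taken from jicyshout; the class and method names are hypothetical. It packs and unpacks a 28-bit value using seven bits per byte, and reverses unsynchronization by dropping the zero byte stuffed in after each FF.

```java
import java.io.ByteArrayOutputStream;

/** Sketch of ID3v2 synchsafe integers and reversing unsynchronization. */
final class Synchsafe {

    /** Decodes four synchsafe bytes (7 payload bits each) into a 28-bit value. */
    static int decode(byte[] b, int off) {
        return ((b[off] & 0x7F) << 21)
             | ((b[off + 1] & 0x7F) << 14)
             | ((b[off + 2] & 0x7F) << 7)
             |  (b[off + 3] & 0x7F);
    }

    /** Encodes a value of at most 28 bits as four bytes with the top bit clear. */
    static byte[] encode(int value) {
        if (value < 0 || value >= (1 << 28)) {
            throw new IllegalArgumentException("value needs more than 28 bits");
        }
        return new byte[] {
            (byte) ((value >> 21) & 0x7F),
            (byte) ((value >> 14) & 0x7F),
            (byte) ((value >> 7) & 0x7F),
            (byte) (value & 0x7F)
        };
    }

    /** Reverses unsynchronization: drops the 00 byte stuffed in after an FF. */
    static byte[] removeUnsync(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            out.write(in[i]);
            if ((in[i] & 0xFF) == 0xFF && i + 1 < in.length && in[i + 1] == 0) {
                i++; // skip the stuffed zero byte
            }
        }
        return out.toByteArray();
    }
}
```

For example, the value 255 cannot be stored as a plain FF byte, so it is written as 00 00 01 7F, and decode turns those four bytes back into 255.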
That said, nobody seems to actually put ID3 tags into network-streaming audio, and instead, ID3 tags are encountered only when reading from a local filesystem.
Like the MP3 frame, ID3 tags start with a common header. There
have been three major versions of the ID3v2 standard, but fortunately
the 10-byte header has remained constant:
The header begins with the string ID3 followed by two version bytes indicating major version (02, 03 or 04) and revision. Bits in the flags indicate use of unsynchronization, an "extended header" block, whether the tag is experimental, and whether a "footer" block is present. The last 4 bytes are the size of the subsequent data, in the aforementioned "synchsafe" format. Of course, this is all defined in the standard; with the ID3 site MIA, I was only able to get the ID3v2.3.0 docs from Google's cache.
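As a rough illustration of reading that 10-byte header, here is a small Java sketch. It is not jicyshout's ID3 code; the class name and accessors are invented for the example, and the flag bit positions (0x80, 0x40, 0x20, 0x10) are my reading of the ID3v2 specs, where the footer bit only has meaning in version 04.

```java
/** Sketch of the 10-byte ID3v2 tag header: "ID3", version, flags, size. */
final class Id3Header {
    final int majorVersion; // 2, 3 or 4
    final int revision;
    final int flags;
    final int size;         // bytes of tag data following the header

    private Id3Header(int majorVersion, int revision, int flags, int size) {
        this.majorVersion = majorVersion;
        this.revision = revision;
        this.flags = flags;
        this.size = size;
    }

    /** Parses the header at the start of 'b', or returns null if it is not one. */
    static Id3Header parse(byte[] b) {
        if (b.length < 10 || b[0] != 'I' || b[1] != 'D' || b[2] != '3') {
            return null;
        }
        for (int i = 6; i < 10; i++) {
            if ((b[i] & 0x80) != 0) {
                return null; // synchsafe bytes never have the top bit set
            }
        }
        // Each size byte contributes its low seven bits.
        int size = (b[6] << 21) | (b[7] << 14) | (b[8] << 7) | b[9];
        return new Id3Header(b[3] & 0xFF, b[4] & 0xFF, b[5] & 0xFF, size);
    }

    boolean unsynchronized()    { return (flags & 0x80) != 0; }
    boolean hasExtendedHeader() { return (flags & 0x40) != 0; }
    boolean experimental()      { return (flags & 0x20) != 0; }
    boolean hasFooter()         { return (flags & 0x10) != 0; } // version 04 only
}
```

A reader that gets a non-null result knows exactly how many bytes to hand to the ID3 frame parser before MP3 audio resumes.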
What ID3 calls a "tag" is a collection of "frames", each containing a name, a value, and needed information such as size or text encoding for reading and parsing the frame. ID3v2.2 used three-byte names, while ID3v2.3 increased that to four-byte names. Thus, their frame headers look like this:
The majority of declared frames are textual in nature, and their names start with a T. In ID3v2.3, there's TCOM for composer name, TALB for the original album or CD the track came from, and curiously TIT2 for song title. Why TIT2 and not TIT1? Because TIT1 allows you to group different MP3s together, most obviously for classical music that has distinct movements. For example, an MP3 of the last movement of Beethoven's Ninth Symphony might have "Sym. No. 9 in D" for TIT1, while TIT2 would be "Presto-O Freunde, nicht diese Tone-Allegro assai".
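To make the text frames concrete, here is a hedged Java sketch of reading one ID3v2.3 text frame such as TIT2. It assumes the v2.3 layout as I understand it: a four-byte name, a four-byte plain big-endian size, two flag bytes, then a body whose first byte selects the text encoding (0 for ISO-8859-1, 1 for UTF-16 with a byte-order mark). The class name is hypothetical and this is not how jicyshout necessarily does it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: read one ID3v2.3 text frame (name starts with T, except TXXX). */
final class Id3TextFrame {
    final String id;
    final String text;

    private Id3TextFrame(String id, String text) {
        this.id = id;
        this.text = text;
    }

    /** Parses the frame starting at 'off', or returns null if it is not a text frame. */
    static Id3TextFrame parse(byte[] b, int off) {
        if (off < 0 || off + 10 > b.length) {
            return null;
        }
        String id = new String(b, off, 4, StandardCharsets.ISO_8859_1);
        // Assumption: v2.3 frame sizes are plain big-endian, not synchsafe.
        int size = ((b[off + 4] & 0xFF) << 24) | ((b[off + 5] & 0xFF) << 16)
                 | ((b[off + 6] & 0xFF) << 8) | (b[off + 7] & 0xFF);
        // Bytes off+8 and off+9 are frame flags, ignored in this sketch.
        if (!id.startsWith("T") || id.equals("TXXX")
                || size < 1 || size > b.length - off - 10) {
            return null;
        }
        Charset cs = (b[off + 10] == 1)
                ? StandardCharsets.UTF_16        // honors the byte-order mark
                : StandardCharsets.ISO_8859_1;
        String text = new String(b, off + 11, size - 1, cs);
        int nul = text.indexOf('\0');            // drop any terminator and what follows
        return new Id3TextFrame(id, nul >= 0 ? text.substring(0, nul) : text);
    }
}
```

TXXX is skipped because its body carries a description before the value, so it needs its own handling.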
The dictionary of available frames is quite rich, and even allows for wide-open user-defined text frames (TXXX), URL frames (WXXX) and a private PRIV frame. This allows for vast new applications. For example, the various sites that stream music from the Dance Dance Revolution video game could be turned into world-wide online games by using PRIV frames to send client apps information about what steps players are to make in time to the music.
I said a few paragraphs ago that streaming-MP3 servers don't actually send ID3 data. So how do they work? The Shoutcast server also sends tags as part of the audio stream, but its approach is very different.
For one thing, the formality of the ID3 standard is absent; in fact, this web-board post may be the closest thing to an authoritative guide that exists.
Secondly, there is no effort in Shoutcast to play nicely alongside the MP3 frames. Instead, Shoutcast's metadata blocks can and do appear right in the middle of an MPEG frame or header. This of course requires a client to strip the metadata blocks before they make it to the MP3 decoder.
A Shoutcast stream starts with a series of name-value pairs separated by CRLFs (in Java, \r\n). These include information about the stream and the server, sometimes using a pseudo-HTML markup to encourage you to use the WinAMP player. Notice how Shoutcast is recognizable by its use of the term "icy", a false cognate that has nothing to do with the Icecast server.
ICY 200 OK
icy-notice1:<BR>This stream requires <a href="http://www.winamp.com/">Winamp</a><BR>
icy-notice2:SHOUTcast Distributed Network Audio Server/win32 v1.8.2<BR>
icy-name:Core-upt Radio
icy-genre:Punk Ska Emo
icy-url:http://www.core-uptrecords.com
icy-pub:1
icy-metaint:8192
icy-br:56

The really interesting value here is icy-metaint. It will only be sent if the client sent the name:value pair Icy-Metadata:1 in the http request headers before opening the stream.
What icy-metaint tells the client is how often it can expect to receive the Shoutcast metadata blocks; the value is almost always 8192 bytes. So, after the initial headers end with a blank line, a client can read 8192 bytes of real MP3 data before having to handle the next metadata section.
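The handshake described above can be sketched as follows. This is an illustrative reading of the protocol rather than jicyshout's implementation; the class and method names are made up. It reads the response one byte at a time so that no audio data is swallowed by a buffering reader, and it treats the "ICY 200 OK" status line as just another line without a colon.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: ask for in-stream metadata and read the Shoutcast response headers. */
final class IcyHeaders {

    /** Builds a request that asks the server to include icy-metaint. */
    static String request(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Icy-Metadata:1\r\n"
             + "\r\n";
    }

    /** Reads name:value lines up to the blank line; names are lower-cased. */
    static Map<String, String> read(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) { // the status line has no colon and is skipped
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    /** Returns the icy-metaint value, or -1 if the server did not send one. */
    static int metaInt(Map<String, String> headers) {
        String value = headers.get("icy-metaint");
        try {
            return value == null ? -1 : Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Reads one CRLF- or LF-terminated line; returns null at end of stream. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                line.write(c);
            }
        }
        if (c == -1 && line.size() == 0) {
            return null;
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

After read returns, the stream is positioned at the first byte of MP3 data, which is where the metaint byte count starts.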
The metadata block begins with a single byte, which indicates how many 16-byte segments need to be read. In real life, this value is almost always zero, since in-stream metadata is only sent when the song changes. When there is metadata, the format is completely different from that seen at the top of the file, for example:
StreamTitle='Final Fantasy 8 - Nobuo Uematsu - Blue Fields';StreamUrl='';

From this, we can see that each name-value pair is terminated by a semicolon, that an equals sign is used to separate name from value, and that the values are surrounded by single quotes.
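Putting the length byte and the pair format together, a client's read loop might look roughly like the sketch below. The names are hypothetical and this is not lifted from jicyshout's IcyInputStream. Two assumptions go beyond the description above: that unused space in the last 16-byte segment is padded with zero bytes, and that the text can be decoded as ISO-8859-1, since no encoding is formally specified.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: separate Shoutcast metadata blocks from the MP3 data around them. */
final class IcyMetadata {

    // name='value'; pairs; the lazy match tolerates apostrophes inside a value
    private static final Pattern PAIR = Pattern.compile("(\\w+)='(.*?)';");

    /** Splits a metadata string into its name-value pairs. */
    static Map<String, String> parse(String block) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(block);
        while (m.find()) {
            tags.put(m.group(1), m.group(2));
        }
        return tags;
    }

    /** Reads the length byte and that many 16-byte segments; "" if the length is zero. */
    static String readBlock(InputStream in) throws IOException {
        int segments = in.read();
        if (segments < 0) {
            throw new EOFException("stream ended before the metadata length byte");
        }
        byte[] block = new byte[segments * 16];
        new DataInputStream(in).readFully(block);
        int end = block.length;
        while (end > 0 && block[end - 1] == 0) {
            end--; // assumption: trailing padding is zero bytes
        }
        return new String(block, 0, end, StandardCharsets.ISO_8859_1);
    }

    /** Copies metaInt audio bytes to 'audio', then returns the tags that follow. */
    static Map<String, String> readInterval(InputStream in, int metaInt, OutputStream audio)
            throws IOException {
        byte[] chunk = new byte[metaInt];
        new DataInputStream(in).readFully(chunk);
        audio.write(chunk);
        return parse(readBlock(in));
    }
}
```

Calling readInterval in a loop with the icy-metaint value hands the decoder a clean MP3 stream, and an empty map on most iterations simply means the song has not changed.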
Icecast and Live 365 streams offer some metadata, but don't do so by sending anything in the media stream itself.
Both of these servers send an initial collection of name-value pairs in the http response headers, although the only thing interesting in Live 365's case is the pair "Server:Nanocaster/2.0", which is the only way we know the name of their server software.
Icecast streams, though rare, are much more interesting, offering headers that describe the stream's title, home page, and even the physical location of the server, as in this example from the CalArts School of Music:
Server: icecast/1.3.12
Content-Type: audio/mpeg
x-audiocast-location: Valencia CA
x-audiocast-admin: tre@shoko.calarts.edu
x-audiocast-server-url: http://shoko.calarts.edu
x-audiocast-udpport: 8000
x-audiocast-mount: /som
x-audiocast-name: CalArts School of Music
x-audiocast-description: Default description
x-audiocast-url: http://shoko.calarts.edu
x-audiocast-genre: Classical Jazz Experimental World Rock
x-audiocast-bitrate: 32
x-audiocast-public: 1
An interesting response header here is x-audiocast-udpport. It is only sent if the client includes an x-audiocast-udpport in the request headers, indicating the client's interest in receiving metadata updates via UDP packets (the client can indicate what port it would prefer to listen on, but the response doesn't necessarily match the request). jicyshout makes such a request, but work on code to read those messages has only just been started... in other words, it doesn't work yet.
Nevertheless, there's an interesting comparison to be made between the three major metadata systems:
Live 365 also offers information about the track it's streaming, but only to listeners using the site's popup window. The playlist is in a frame inside this window, pulled with a URL like http://www.live365.com/pls/front?handler=playlist&cmd=view&handle=handle&site=.. where the handle is a short name for the stream. Where do you get the name from? If Live 365 sends you a .pls file when you click the play button in their stream guide, a file1 URL in the .pls contains a session variable that appears to be the needed handle. Theoretically, you could screen-scrape this HTML to get the currently-playing item, but any changes to the HTML layout would break your code, to say nothing of whether Live 365 would tolerate this kind of reverse-engineering for long.
To see how these tags are parsed and represented, start with a quick glance at the net.sourceforge.jicyshout.jicylib1.metadata package. This defines an abstract MP3Tag class, which is just a simple name-value pair. Concrete subclasses identify the source of a tag: ID3Tag, IcyTag, HTTPHeaderTag, etc. Note that jicyshout calls these pairs "tags", as a way to abstract away the differences between ID3 frames, http headers, etc.
The StringableTag interface indicates that a value is a String or has a meaningful toString(), as opposed to the as-yet unimplemented Imageable tag, which would be used for ID3's graphic frames.
Back in the jicylib1 package, an MP3StreamFactory class is responsible for figuring out if a stream has embedded tags, i.e., ID3 or Shoutcast metadata. If so, it returns a suitable stream, such as the ID3InputStream or the IcyInputStream. Look through those classes for implementations of parsing metadata from input streams. One thing you might notice is that the Shoutcast parsing is arguably easier, since we always know how many bytes we can read before the next block of metadata. For ID3, we always have to search through every byte[] we read for the ID3 string that indicates an ID3 tag header.
Both of those streams implement MP3MetadataParser, an interface to provide TagParseEvents when metadata is parsed. Typically, the SimpleMP3DataSource is the only object that gets events from the stream, but it is also an MP3MetadataParser, re-firing the events it receives from the stream while also firing events for tags parsed from http headers and theoretically from other sources.
In practical terms, this means a caller that creates a SimpleMP3DataSource as part of setting up a Player can also use the object to get metadata by calling getTags() to retrieve metadata that was parsed while setting up the streams, and by adding a TagParseListener to handle new metadata as it arrives.
The net.sourceforge.jicyshout.gui.SimpleDevGUI shows a straightforward use of this, putting parsed MP3Tags into a JTable. Part I of this series had a screenshot of it playing a Shoutcast stream and displaying its metadata, so here is what it looks like when you hand-enter a file:// URL to a local file with ID3 tags:
There's much to do on jicyshout, but hopefully this series has helped to show that streaming MP3 audio is not only possible in the Java Media Framework, but also straightforward to implement and capable of some nifty tricks.