Part I of this series introduced jicyshout, an open-source project to get the Java Media Framework to play MP3-over-http streams provided by Shoutcast, Icecast, and Live 365 servers. In this part, we get into the value-added service of providing metadata about that stream.

There are multiple standards for providing this metadata, some formal, some ad hoc, aiming to serve different purposes. Since the two most popular standards put the data in the media stream, we first need to understand how MP3 streams work.

MP3 streams are arranged into "frames", each with a header that provides basic information about encoding type, bitrate, sampling frequency, etc. Because there is no single monolithic header at the top of a file, MP3s are well-suited for streaming: a client can just start reading, ignoring bytes until it hits the beginning of a frame.

To make the frame header easy to find, the header starts off with a single FF byte followed by another F, or in the case of an "MPEG 2.5" frame, an E. Getting an FF in purely random data is just a 1 in 256 chance of course, and requiring the next three bits to be on as well makes it 1 in 2048, making the header pretty easy to find, especially if a decoder sanity-checks the rest of the supposed header. Here's what the whole four-byte header looks like:

where A-F are (respectively) one-bit flags for MPEG 2.5, CRC, pad, private, copyright, and original. A detailed description of the header can be found at MP3 Tech. What's relevant for now is the fact that once found, this four-byte header gives a decoder everything it needs to know to start reading and decoding the audio data.
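As a rough illustration of the "read and ignore bytes until a frame starts" idea, here is a minimal sketch of the sync search. It is not jicyshout's code: the class and method names are made up for this example, and it only checks the sync pattern (an FF byte followed by a byte whose top three bits are on), leaving out the sanity checks on the rest of the header that a real decoder would want.

```java
/** Sketch only: locate a candidate MP3 frame header by its sync pattern. */
final class FrameSyncFinder {

    /**
     * Returns the index of the first byte pair at or after 'from' that looks
     * like a frame sync (0xFF, then a byte with its top three bits set),
     * or -1 if there is none in the buffer.
     */
    static int findFrameSync(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + 1 < data.length; i++) {
            boolean firstByteAllOnes = (data[i] & 0xFF) == 0xFF;
            boolean nextThreeBitsOn = (data[i + 1] & 0xE0) == 0xE0;
            if (firstByteAllOnes && nextThreeBitsOn) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The lone FF at index 2 is not a sync, because 0x12 follows it.
        byte[] sample = {0x00, 0x49, (byte) 0xFF, 0x12,
                         (byte) 0xFF, (byte) 0xFB, (byte) 0x90, 0x00};
        System.out.println(findFrameSync(sample, 0)); // prints 4
    }
}
```

A hit from this search is only a candidate. Validating the bitrate and sampling-frequency fields of the supposed header is what makes false positives rare in practice.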

The ID3v2 standard

ID3 is probably the best-known standard for providing MP3 metadata, and is supported by most software and hardware MP3 players. While it calls itself an "informal standard", ID3 is quite rigorous and far less ambiguous than the other schemes in this article.

Curiously, though, the ID3 website went dark in mid-July and has not reappeared as of this writing. That means we have to depend on the Google cache to refer to the introductory ID3v2 made easy and other documentation.

The original ID3 tag went at the end of a flat file. This was determined to be bad for streaming, so ID3v2 moved the tag to the front of the file. An ID3 tag could also be received in-stream, provided it's not inside an MP3 frame (since that would mess up parsing of the audio data). That would create a stream that looks like this:

One interesting concern is how to keep the ID3 data, which the MP3 parser sees as "junk" between frames, from including an FFF or FFE that could be mistaken for a frame header. ID3 fixes this with a pair of related schemes called "unsynchronization" and "synchsafe integers". The former converts sequences of 11111111 111xxxxx to 11111111 00000000 111xxxxx inside ID3 frames, while the latter tweaks integers by declaring the top bit of each byte to always be 0 and ignoring it, meaning the four-byte size in the ID3 header has only 28 meaningful bits.
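Both schemes are easy to undo on the reading side. The following is a minimal sketch, not code from jicyshout, with hypothetical class and method names. The unsynchronization reversal assumes, as I read the ID3v2.3 document, that the encoder also stuffs a zero byte after any FF that was already followed by 00, so that dropping one 00 after every FF restores the original bytes.

```java
import java.io.ByteArrayOutputStream;

/** Sketch only: reading-side helpers for ID3v2's two sync-avoidance schemes. */
final class Id3SyncSketch {

    /** Decodes a four-byte synchsafe integer: 7 useful bits per byte, 28 in total. */
    static int decodeSynchsafe(byte[] b, int off) {
        return ((b[off] & 0x7F) << 21)
             | ((b[off + 1] & 0x7F) << 14)
             | ((b[off + 2] & 0x7F) << 7)
             |  (b[off + 3] & 0x7F);
    }

    /** Reverses unsynchronization by dropping the 0x00 that follows each 0xFF. */
    static byte[] removeUnsynchronization(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i < data.length; i++) {
            out.write(data[i]);
            boolean isFF = (data[i] & 0xFF) == 0xFF;
            if (isFF && i + 1 < data.length && data[i + 1] == 0) {
                i++; // skip the stuffed zero byte
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 00 00 02 01 means (2 << 7) | 1
        System.out.println(decodeSynchsafe(new byte[] {0x00, 0x00, 0x02, 0x01}, 0)); // prints 257
    }
}
```

Note that the size has to be decoded first, since it describes the tag as stored in the stream, stuffed bytes included.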

That said, nobody seems to actually put ID3 tags into network-streaming audio, and instead, ID3 tags are encountered only when reading from a local filesystem.

Like the MP3 frame, ID3 tags start with a common header. There have been three major versions of the ID3v2 standard, but fortunately the 10-byte header has remained constant:

The header begins with the string ID3 followed by two version bytes indicating major version (02, 03 or 04) and revision. Bits in the flags indicate use of unsynchronization, an "extended header" block, whether the tag is experimental, and whether a "footer" block is present. The last 4 bytes are the size of the subsequent data, in the aforementioned "synchsafe" format. Of course, this is all defined in the standard — with the ID3 site MIA, I was only able to get the ID3v2.3.0 docs from Google's cache.
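To make the layout concrete, here is a small sketch that pulls those fields out of the 10 header bytes. The class is hypothetical and is not jicyshout's ID3 parser. The flag bit positions are the ones I understand ID3v2.3 and v2.4 to use (the footer bit exists only in v2.4), so treat them as assumptions for other versions.

```java
/** Sketch only: the fields of the 10-byte ID3v2 tag header. */
final class Id3v2HeaderSketch {
    final int majorVersion;
    final int revision;
    final boolean unsynchronized;
    final boolean extendedHeader;
    final boolean experimental;
    final boolean footer;
    final int tagSize; // bytes of tag data following the header

    private Id3v2HeaderSketch(byte[] h) {
        majorVersion = h[3] & 0xFF;
        revision = h[4] & 0xFF;
        int flags = h[5] & 0xFF;
        unsynchronized = (flags & 0x80) != 0; // assumed bit 7
        extendedHeader = (flags & 0x40) != 0; // assumed bit 6
        experimental = (flags & 0x20) != 0;   // assumed bit 5
        footer = (flags & 0x10) != 0;         // assumed bit 4, v2.4 only
        // The size is synchsafe: 7 useful bits per byte.
        tagSize = ((h[6] & 0x7F) << 21) | ((h[7] & 0x7F) << 14)
                | ((h[8] & 0x7F) << 7) | (h[9] & 0x7F);
    }

    /** Returns the parsed header, or null if the bytes do not start with "ID3". */
    static Id3v2HeaderSketch parse(byte[] h) {
        if (h == null || h.length < 10) {
            return null;
        }
        if (h[0] != 'I' || h[1] != 'D' || h[2] != '3') {
            return null;
        }
        return new Id3v2HeaderSketch(h);
    }

    public static void main(String[] args) {
        byte[] header = {'I', 'D', '3', 3, 0, (byte) 0x80, 0, 0, 2, 1};
        Id3v2HeaderSketch parsed = parse(header);
        System.out.println(parsed.majorVersion + " " + parsed.tagSize); // prints 3 257
    }
}
```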

What ID3 calls a "tag" is a collection of "frames", each containing a name, a value, and needed information such as size or text encoding for reading and parsing the frame. ID3v2.2 used three-byte names, while ID3v2.3 increased that to four-byte names. Thus, their frame headers look like this:
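A sketch of reading either kind of frame header follows. Beyond the name lengths given above, the layout here is my reading of the specs and should be treated as an assumption: a three-byte size after the three-byte name (6 bytes total), or a four-byte size plus two flag bytes after the four-byte name (10 bytes total), with the size becoming synchsafe from major version 04 on. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Sketch only: read the name and size from an ID3v2 frame header. */
final class Id3FrameHeaderSketch {
    final String id;        // e.g. "TT2" or "TIT2"
    final int size;         // bytes of frame data after the header
    final int headerLength; // 6 for three-byte names, 10 for four-byte names

    private Id3FrameHeaderSketch(String id, int size, int headerLength) {
        this.id = id;
        this.size = size;
        this.headerLength = headerLength;
    }

    static Id3FrameHeaderSketch parse(byte[] buf, int off, int majorVersion) {
        if (majorVersion == 2) {
            // Three-byte name, three-byte big-endian size (assumed layout).
            String id = new String(buf, off, 3, StandardCharsets.ISO_8859_1);
            int size = ((buf[off + 3] & 0xFF) << 16)
                     | ((buf[off + 4] & 0xFF) << 8)
                     |  (buf[off + 5] & 0xFF);
            return new Id3FrameHeaderSketch(id, size, 6);
        }
        // Four-byte name, four-byte size, two flag bytes (assumed layout).
        String id = new String(buf, off, 4, StandardCharsets.ISO_8859_1);
        int size;
        if (majorVersion >= 4) {
            // Assumed: frame sizes are synchsafe from v2.4 on.
            size = ((buf[off + 4] & 0x7F) << 21) | ((buf[off + 5] & 0x7F) << 14)
                 | ((buf[off + 6] & 0x7F) << 7) | (buf[off + 7] & 0x7F);
        } else {
            size = ((buf[off + 4] & 0xFF) << 24) | ((buf[off + 5] & 0xFF) << 16)
                 | ((buf[off + 6] & 0xFF) << 8) | (buf[off + 7] & 0xFF);
        }
        return new Id3FrameHeaderSketch(id, size, 10);
    }

    public static void main(String[] args) {
        byte[] v23 = {'T', 'I', 'T', '2', 0, 0, 0, 11, 0, 0};
        Id3FrameHeaderSketch h = parse(v23, 0, 3);
        System.out.println(h.id + " " + h.size); // prints TIT2 11
    }
}
```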

The majority of declared frames are textual in nature, and their names start with a T. In ID3v2.3, there's TCOM for composer name, TALB for the original album or CD the track came from, and curiously TIT2 for song title. Why TIT2 and not TIT1? Because TIT1 allows you to group different MP3s together, most obviously for classical music that has distinct movements. For example, an MP3 of the last movement of Beethoven's Ninth Symphony might have "Sym. No. 9 in D" for TIT1, while TIT2 would be "Presto-O Freunde, nicht diese Töne-Allegro assai".

The dictionary of available frames is quite rich, and even allows for wide-open user-defined text frames (TXXX), URL frames (WXXX) and a private PRIV frame. This allows for vast new applications. For example, the various sites that stream music from the Dance Dance Revolution video game could be turned into worldwide online games by using PRIV frames to send client apps information about what steps players are to make in time to the music.

The Shoutcast standard

I said a few paragraphs ago that streaming-MP3 servers don't actually send ID3 data. So how do they work? The Shoutcast server also sends tags as part of the audio stream, but its approach is very different.

For one thing, the formality of the ID3 standard is absent; in fact, this web-board post may be the closest thing to an authoritative guide that exists.

Secondly, there is no effort in Shoutcast to play nicely alongside the MP3 frames. Instead, Shoutcast's metadata blocks can and do appear right in the middle of an MPEG frame or header:

This, of course, requires a client to strip the metadata blocks before they make it to the MP3 decoder.

A Shoutcast stream starts with a series of name-value pairs separated by CRLFs (in Java, \r\n). These include information about the stream and the server, sometimes using a pseudo-HTML markup to encourage you to use the WinAMP player. Notice how Shoutcast is recognizable by its use of the term "icy", a false cognate that has nothing to do with the Icecast server.

ICY 200 OK
icy-notice1:<BR>This stream requires <a href="http://www.winamp.com/">Winamp</a><BR>
icy-notice2:SHOUTcast Distributed Network Audio Server/win32 v1.8.2<BR>
icy-name:Core-upt Radio
icy-genre:Punk Ska Emo
icy-url:http://www.core-uptrecords.com
icy-pub:1
icy-metaint:8192
icy-br:56

The really interesting value here is icy-metaint. It will only be sent if the client sent the name:value pair Icy-Metadata:1 in the http request headers before opening the stream.
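Reading that initial header block is simple string work. Here is a minimal sketch, using a hypothetical class rather than jicyshout's own code, that splits a header block like the one above into name-value pairs. It assumes the block has already been read into a String and that the first colon on each line separates name from value. A status line such as ICY 200 OK has no colon and is skipped.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch only: split Shoutcast's initial header block into name-value pairs. */
final class IcyHeaderSketch {

    static Map<String, String> parse(String headerBlock) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : headerBlock.split("\r\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the headers; audio data follows
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no name here, e.g. the "ICY 200 OK" status line
            }
            // Only the first colon counts, so URLs in values stay intact.
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        String block = "ICY 200 OK\r\n"
                + "icy-name:Core-upt Radio\r\n"
                + "icy-url:http://www.core-uptrecords.com\r\n"
                + "icy-metaint:8192\r\n"
                + "\r\n";
        Map<String, String> headers = parse(block);
        System.out.println(headers.get("icy-metaint")); // prints 8192
    }
}
```

A client would check for icy-metaint here: if it is absent, the server is not going to interleave metadata and the stream can go straight to the decoder.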

What icy-metaint tells the client is how often it can expect to receive the Shoutcast metadata blocks. The value is almost always 8192 bytes. So, after the initial headers end with a blank line, a client can read 8192 bytes of real MP3 data before having to handle the next metadata section.

The metadata block begins with a single byte, which indicates how many 16-byte segments need to be read. In real life, this value is almost always zero, since in-stream metadata is only sent when the song changes. When there is metadata, the format is completely different from that of the headers at the top of the stream, for example:

StreamTitle='Final Fantasy 8 - Nobuo Uematsu - Blue Fields';StreamUrl='';

From this, we can see that each name-value pair is separated by semi-colons, that an equals sign is used to separate name from value, and that the values are surrounded by single quotes.
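Putting the last few paragraphs together, here is a minimal sketch of one read cycle: copy icy-metaint bytes of audio, read the length byte, read that many 16-byte segments, and split the result into pairs. This is not jicyshout's IcyInputStream, and the names are hypothetical. It also assumes that the unused tail of the last segment is padded with zero bytes and that the text is a single-byte encoding, neither of which is stated above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: separate Shoutcast metadata blocks from the audio bytes. */
final class IcyMetadataSketch {

    // name='value'; with a lazy match so apostrophes inside a value survive
    private static final Pattern PAIR = Pattern.compile("(\\w+)='(.*?)';");

    /**
     * Copies one interval of audio to audioOut, then returns the metadata
     * text that follows it (an empty string when the length byte is zero).
     */
    static String readInterval(InputStream in, int metaint, OutputStream audioOut)
            throws IOException {
        DataInputStream din = new DataInputStream(in); // unbuffered, so safe to re-wrap
        byte[] audio = new byte[metaint];
        din.readFully(audio);
        audioOut.write(audio);

        int segments = din.readUnsignedByte();
        byte[] meta = new byte[segments * 16];
        din.readFully(meta);

        int length = meta.length;
        while (length > 0 && meta[length - 1] == 0) {
            length--; // drop the assumed zero padding
        }
        return new String(meta, 0, length, StandardCharsets.ISO_8859_1);
    }

    /** Splits text like StreamTitle='...';StreamUrl=''; into name-value pairs. */
    static Map<String, String> parsePairs(String metadata) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(metadata);
        while (m.find()) {
            tags.put(m.group(1), m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        String block = "StreamTitle='Final Fantasy 8 - Nobuo Uematsu - Blue Fields';StreamUrl='';";
        System.out.println(parsePairs(block).get("StreamTitle"));
        // prints Final Fantasy 8 - Nobuo Uematsu - Blue Fields
    }
}
```

A player would call readInterval in a loop, handing audioOut's bytes to the decoder and firing an event whenever the returned string is non-empty.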

The out-of-band standards

Icecast and Live 365 streams offer some metadata, but don't do so by sending anything in the media stream itself.

Both of these servers send an initial collection of name-value pairs in the http response headers, although the only thing interesting in Live 365's case is the pair "Server:Nanocaster/2.0", which is the only way we know the name of their server software.

Icecast streams, though rare, are much more interesting, offering headers that describe the stream's title, home page, and even the physical location of the server, as in this example from the CalArts School of Music:

Server: icecast/1.3.12
Content-Type: audio/mpeg
x-audiocast-location: Valencia CA
x-audiocast-admin: tre@shoko.calarts.edu
x-audiocast-server-url: http://shoko.calarts.edu
x-audiocast-udpport: 8000
x-audiocast-mount: /som
x-audiocast-name: CalArts School of Music
x-audiocast-description: Default description
x-audiocast-url: http://shoko.calarts.edu
x-audiocast-genre: Classical Jazz Experimental World Rock
x-audiocast-bitrate: 32
x-audiocast-public: 1

An interesting response header here is x-audiocast-udpport. It is only sent if the client includes an x-audiocast-udpport in the request headers, indicating the client's interest in receiving metadata updates via UDP packets (the client can indicate what port it would prefer to listen on, but the response doesn't necessarily match the request). jicyshout makes such a request, but work on code to read those messages has only just been started... in other words, it doesn't work yet.

Nevertheless, there's an interesting comparison to be made between the three major metadata systems:

ID3 tags are carefully crafted to work with MP3 streams, but how would they be included in another streaming-audio format?
Shoutcast's "icy" in-stream tags could work with any streaming format, but is it a good idea to totally corrupt the stream like that?
Icecast's UDP dodges the stream issue entirely, but by assuming it's running over IP, it's no longer suitable for simpler connections like serial ports.

Live 365 also offers information about the track it's streaming, but only to listeners using the site's popup window. The playlist is in a frame inside this window, pulled with a URL like http://www.live365.com/pls/front?handler=playlist&cmd=view&handle=handle&site=.., where the handle is a short name for the stream. Where do you get the name from? If Live 365 sends you a .pls file when you click the play button in their stream guide, a file1 URL in the .pls contains a session variable that appears to be the needed handle. Theoretically, you could screen-scrape this html to get the currently-playing item, but any changes to the html layout would break your code, to say nothing of whether Live 365 would tolerate this kind of reverse-engineering for long.

jicyshout's metadata support

To see how these tags are parsed and represented, start with a quick glance at the net.sourceforge.jicyshout.jicylib1.metadata package. This defines an abstract MP3Tag class, which is just a simple name-value pair. Concrete subclasses identify the source of a tag: ID3Tag, IcyTag, HTTPHeaderTag, etc. Note that jicyshout calls these pairs "tags", as a way to abstract away the differences between ID3 frames, http headers, etc.

The StringableTag interface indicates that a value is a String or has a meaningful toString(), as opposed to the as-yet unimplemented Imageable tag, which would be used for ID3's graphic frames.

Back in the jicylib1 package, an MP3StreamFactory class is responsible for figuring out if a stream has embedded tags, i.e., ID3 or Shoutcast metadata. If so, it returns a suitable stream, such as the ID3InputStream or the IcyInputStream. Look through those classes for implementations of parsing metadata from input streams. One thing you might notice is that the Shoutcast parsing is arguably easier, since we always know how many bytes we can read before the next block of metadata. For ID3, we always have to search through every byte[] we read for the ID3 string that indicates an ID3 tag header.
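To show why the ID3 case is fiddlier, here is a sketch of that search. It is not the code in ID3InputStream, and the class is hypothetical. The wrinkle it illustrates is that the three marker bytes can straddle two reads, so the scanner has to remember a partial match between buffers.

```java
import java.nio.charset.StandardCharsets;

/** Sketch only: find the "ID3" marker even when it spans two buffers. */
final class Id3MarkerScanner {
    private static final byte[] MARKER = {'I', 'D', '3'};

    // How many marker bytes have matched so far, carried across calls.
    private int matched;

    /**
     * Scans the first len bytes of buf. Returns the index just past the
     * marker if it ends inside this buffer, or -1 if it was not completed.
     */
    int scan(byte[] buf, int len) {
        for (int i = 0; i < len; i++) {
            if (buf[i] == MARKER[matched]) {
                matched++;
                if (matched == MARKER.length) {
                    matched = 0;
                    return i + 1;
                }
            } else {
                // The marker bytes are all different, so after a mismatch
                // the only possible restart is an 'I' at this position.
                matched = (buf[i] == MARKER[0]) ? 1 : 0;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Id3MarkerScanner scanner = new Id3MarkerScanner();
        byte[] first = "xxI".getBytes(StandardCharsets.ISO_8859_1);
        byte[] second = "D3yy".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(scanner.scan(first, first.length));   // prints -1
        System.out.println(scanner.scan(second, second.length)); // prints 2
    }
}
```

Compare this with the Shoutcast case, where the reader never inspects the audio bytes at all and simply counts down to the next metadata block.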

Both of those streams implement MP3MetadataParser, an interface to provide TagParseEvents when metadata is parsed. Typically, the SimpleMP3DataSource is the only object that gets events from the stream. It is also an MP3MetadataParser itself, re-firing the events it receives from the stream while also firing events for tags parsed from http headers and, theoretically, from other sources.

In practical terms, this means a caller that creates a SimpleMP3DataSource as part of setting up a Player can also use the object to get metadata by calling getTags() to retrieve metadata that was parsed while setting up the streams, and by adding a TagParseListener to handle new metadata as it arrives.

The net.sourceforge.jicyshout.gui.SimpleDevGUI shows a straightforward use of this, putting parsed MP3Tags into a JTable. Part I of this series had a screenshot of it playing a Shoutcast stream and displaying its metadata, so here is what it looks like when you hand-enter a file:// URL to a local file with ID3 tags:
jicyshout dev GUI with local ID3 file

There's much to do on jicyshout, but hopefully this series has helped to show that streaming MP3 audio is not only possible in the Java Media Framework, but also straightforward to implement and capable of some nifty tricks.