The Lyrics Server

Michael Herf
October 2005, reflecting on events from 1994-1996

Some stories fade with time, and this is one that I've not written down in its entirety, though I've told it to friends and colleagues over the years. Already the exact dates and conversation are both fading a bit for me, but the gist of the story remains intact.

I had some interesting experiences about 10 years ago that seem more and more relevant every year, as I read about online music and copyright issues. I put up a website in 1994 called the "Vivarin Lyrics Server" and it provided lyrics to a large variety of songs online, totalling about 50,000 songs by 1996.

Back in 1994, I was a student at Carnegie Mellon University, and I had been a fan for a couple years of Dave Datta's ftp.uwp.edu. (That's the University of Wisconsin at Parkside, for those not up on Wisconsin geography.) At that time his archive was available via FTP and Gopher, and it contained a variety of media, including the lyrics to many songs. It was a very popular service on the Internet, and had been around long enough to predate the Web.

I had been playing with a small utility to mirror FTP archives—I'd been using it to mirror some early distributions of a UNIX-like operating system called Linux. So I ran this tool one night against Dave's server, and in the morning I had a complete mirror of the UWP lyrics archives, containing the lyrics to about 30,000 songs.

It seems obvious in retrospect that the thing to do was to put this information on the Web so it could be more widely accessed. The thing was, this wasn't exactly as obvious back then as it is today. Netscape was several months away from its 0.9 release, Yahoo! was less than a year old, and Mosaic was the popular web browser to use. But I decided that this was the thing to do anyway.

So one very late night of hacking during midterms in September 1994, I wrote a set of CGI scripts that could take Dave's archive and make it available for viewing on the web.

CGI itself was a pretty new thing, having been supported but not used much. Nobody was yet running Perl scripts (let alone PHP), and all the samples NCSA provided were in C. I wrote some simple glue in shell script and C that could display an index of songs, arranged by artist and song name. I guess there were other dynamic websites back then, but I didn't really know of any.

Around this time, I sent a note to the "Yanoff List", an early email list of new stuff on the web, and shortly afterwards, NCSA's "Best of What's New". There appear to be archives of these old descriptions here and here.

I also told lots of people I knew, and quite soon people were using the service daily. I made sure to leave that machine up for the holidays.

Over Christmas I wrote a small parser that could build an index of song titles and artist names and make them both searchable, displaying links to the appropriate pages. This index was simply a 600KB file (using front-end compression) that I read brute-force for each search. It worked really well.
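For readers unfamiliar with front-end compression (also called front coding), here is a minimal sketch in Python of the general idea: each entry in a sorted list is stored as the number of leading characters it shares with the previous entry, plus the remaining suffix. This is an illustration only, not the original C code; the function names and the "Artist - Title" entry format are assumptions made for the example, and the sample entries are just placeholder data.

```python
def front_encode(sorted_entries):
    """Encode sorted strings as (shared_prefix_length, suffix) pairs."""
    encoded = []
    previous = ""
    for entry in sorted_entries:
        # Count how many leading characters match the previous entry.
        shared = 0
        limit = min(len(previous), len(entry))
        while shared < limit and previous[shared] == entry[shared]:
            shared += 1
        encoded.append((shared, entry[shared:]))
        previous = entry
    return encoded


def front_decode(encoded):
    """Rebuild the original strings one at a time, in order."""
    previous = ""
    for shared, suffix in encoded:
        entry = previous[:shared] + suffix
        yield entry
        previous = entry


def brute_force_search(encoded, query):
    """Scan every entry and keep those containing the query (case-insensitive)."""
    needle = query.lower()
    return [entry for entry in front_decode(encoded) if needle in entry.lower()]


# Hypothetical sample data in an assumed "Artist - Title" format.
entries = sorted([
    "Beatles - Help!",
    "Beatles - Hey Jude",
    "Beatles - Yesterday",
    "Beach Boys - Good Vibrations",
])
index = front_encode(entries)
print(brute_force_search(index, "hey"))
```

Because the list is sorted by artist, neighboring entries share long prefixes, so most entries shrink to a small number plus a short suffix. A search still has to decode and check every entry, which is the brute-force part, but the whole file is small enough to read in one sequential pass.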

This was the era of personal homepages and I started being linked, quite a lot. The server became reasonably popular throughout 1995, and when I left for the summer, I convinced a friend to let me keep my PC in his office, and convinced CMU's sysops to point DNS there. I learned Photoshop from one of the artists at Microsoft (where I had an internship), and we made some graphics that became the banner for the server.

I had long been playing with ways to manipulate lots of words with computers—several years before I'd written a cute anagram program, and some other tricks. But when I got back to school in 1995, I holed up until I managed to build a fulltext search of the archives. It was a thing I really wanted to have. I knew nothing about the field of text indexing, so I pretty much made it up as I went along.

The thing was, the fulltext index didn't really fit the CGI stateless model very well. It really worked faster if you could load a chunk of data into RAM (you could save a lot of seeks, for one thing). So Dave Meltzer, who was a great network hacker I knew, wrote me a simple webserver and helped make it serve all the search requests. It ran on the same machine on another port, sometimes running multiple fulltext searches per second!

In case that doesn't sound very impressive, I should mention that the machine I was running this system on was my Pentium-90 (a fast machine for the time), and it had 24MB of RAM. I also ran XWindows on it, compiled things, and did all my homework on it. I think I later upgraded it to have 32MB of RAM, which made XWindows a little faster.

The new search engine was really very popular, and CNET's search.com reviewed it pretty soon after I put it online, proclaiming it the "fastest search engine on the Web". It was pretty impressively fast, especially considering XWindows used most of the RAM in the machine.
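The original index format isn't described here, so the following is only a rough sketch in Python of one common approach, an in-memory inverted index, to show why a long-running server beats a stateless CGI script for this job: the index is built (or loaded) once and then answers many queries from RAM. All names and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


# Hypothetical documents; a real archive would be read from disk once at startup.
documents = {
    "song1.txt": "walking down the empty road tonight",
    "song2.txt": "the road goes on and the night is long",
    "song3.txt": "coffee keeps me up all night",
}
index = build_index(documents)  # built once, kept in memory by the server
print(search(index, "road night"))
```

A CGI script would pay the cost of loading the index on every request, while a persistent process pays it once and then each search is a handful of dictionary lookups and a set intersection.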

Somebody told me that when the netops people at CMU ran their statistics, I was #2 in bandwidth usage at CMU, and #1 was Lycos. I was filling about a megabit/s with just text!

The strange thing about this time was the sort of fame it brought. For a while, I had linked every page on the server back to my personal page, which contained a picture of me. During the summer, when I was in Seattle and at Microsoft in Redmond, I had three people I'd never met before come up and introduce themselves, just to say they liked the lyrics server and ask some questions about it. I'd never had an experience like that before.

But next, I guess it was May 1996 that I got a call from Grey Advertising. I remember that I was preparing to leave for the summer for my second internship at Microsoft, and it seemed very strange to be discussing these kinds of things at my parents' house in Indiana. I was talking about things that sounded important with a big ad agency in New York, and it seemed like they were a big deal.

The conversation started something like this: "Well, we went to build a website for Vivarin (we're their ad agency), and a web search turned up thousands of links pointing to your server!" What followed was a really enjoyable conversation, and a nice discussion of Vivarin sponsoring the server...I mean, how cool would that be? But the next day, I got another call, "Uh, no, our lawyers say we can't even think of doing that. You're lucky to be getting away with it."

So Vivarin rather informally and nicely asked me to stop using the name Vivarin for any popular service, and I agreed that I would. Now that I've been through the usual progression of this stuff in corporate America, it strikes me that this discussion was quite amicable and friendly.

I'd been in touch with Dave Datta for some time, and his archive remained intact at UWP. He'd just gotten a sponsor for the service, which surprisingly was AOL. They were starting to let their users roam free on the Web, and to allay some fears of the abuse and newbie-ish spam this would cause, they were sending hardware and general good feelings to people who ran interesting Web services. Dave was the lucky recipient of a big Sun SPARC-20 with a huge RAID-10, and he was getting a dedicated line to Chicago from Parkside. Wouldn't it be cool if we moved the web portion of the service back to UWP?

The timing couldn't have been more perfect. I needed a new name for the service, and running this service off my own PC was getting pretty taxing. I had already patched the Linux kernel several times to enable more TCP sockets due to the traffic I was getting, and it seemed that SunOS with an enormous quantity of RAM must just work better.

So I eagerly jumped on his new box and compiled all my code, and started pushing traffic his way. The system grew and grew. We had a mail archive of love letters from people, and we kept several thousand that said things like, "I was trying to figure out what this song was, and I searched and found it, and then I bought the album!" At one point, we had an 18,000 message backlog, and no way to reply to it all.

I'd talked with Dave Datta some about copyright, and one day he got a call from Warner Bros., a message on his machine when he arrived at work at 8AM central time. He told me later he thought "This is it. This is the day."

The guys he was supposed to talk to were on the west coast (Pacific time) and so Dave paced around, worried, for the entire morning, for the two hours before he could talk to them.

How did the call go? Well, the Warner Brothers lawyers were happy, thanked him up and down. Apparently these guys had used our "research service" to successfully prosecute a copyright case, and thanked us for providing such a wonderful service.

Crisis...averted?

But the thing to know is that in late 1996 the WIPO (World Intellectual Property Organization) was meeting, and they formed a set of International "Internet treaties". Among other things, these treaties required copyright holders to aggressively pursue infringement of copyrighted works online.

So in October 1996, we received a cease-and-desist from the Harry Fox Agency, representing writers for ASCAP, BMI, and SESAC. Their letter stated that our archive hurt their market for tablature sales (basically, that people wouldn't buy sheet music if it was available on the Internet).

We had a week or so to respond. I called up some lawyers, and they told us things like these: "Well, let's see, you're at a .edu address, you're providing a research service, and there's not a clear profit motive. You might have a shot, but it will take $500,000 and 5 years of your life to find out, and if you lose it will be incredibly expensive."

My uncle, a lawyer in Arizona, took the opportunity to inform me that the reason you never hear "Happy Birthday" in a chain restaurant is because the copyright holders aggressively pursue anyone who tries to perform the song publicly. It seemed like the right time to give up.

But during that next week before we took the service down, a thing happened that I could never figure out.

The search engine became suddenly "broken", and as explanation it provided a cryptic link allowing anyone to download a large TGZ file containing all the lyrics and all the code.

There were 17 full downloads of that file, and I guess this explains why there were quite similar services all over Europe the year after that.