Thursday, November 11, 2004

Today I feel complete. We just launched the last in a series of component improvements to the GarageBand system. Now, I can gladly say that the system works the way it was always meant to work. I'll bore you with the full details of the GarageBand system some other time, but let me explain this most recent improvement as an example that other music companies could follow.

GarageBand unleashed an enormous amount of intelligence and activity by creating a model to review and rank songs without the need for a centralized editorial staff. We developed a way for a large group of non-experts to do what takes a small group of experts much longer. The accuracy of the system requires subdividing songs into contextual categories of manageable size. The problem was that every existing classification model had to be managed by a small group of experts, so adopting one of the old models would just replace one bottleneck with another. So we created a new paradigm: an Emergent Taxonomy of Music.

What we've done should be familiar to anyone aware of how genomics has been applied to taxonomy in biology. Arguments in biological systematics center on one question: should characteristics (Linnean taxonomy) or evolutionary history (phylogeny) define the categorical relationships among organisms? Comparing genetic maps asks a more direct question: what are the relationships among individual organisms? The categories, and the relationships between categories, emerge from that.

One company, All Music Guide, has become the de facto musicologist for the internet by creating a fairly comprehensive database of Linnean categorization (style, mood, instrument, theme, country). The centerpiece of their taxonomy is genre, which incorporates all the Linnean factors with reference to an extensive history of music styles (the music maps). Ultimately, they first ask "What is Rap?" and then ask whether a specific song fits that category. This works fairly well for categorizing a historical body of material, but the main problem in applying it to a new music catalog like GarageBand's is that the categories can't predict the new forms of music being created.

By contrast, a new company, Music Plasma, focuses entirely on the relationships between artists and eschews categorization altogether. In a sense, each artist is their own category and is connected in a web with other similar artists. Webs are not confined the way categories are, because relationships aren't restricted by definitions of genres. While it is not really a taxonomy, the editors of Music Plasma have created a more fluid, relationship-minded way of browsing music.

At its core, GarageBand's new system is like Music Plasma, with one important difference: we have no editors to define the relationships between artists. This is partly by necessity. Our bands are so little known, so plentiful (10x what either of these other sites deals with), and so varied in quality that there is no way we could pay for a large enough editorial staff. Instead, the artists themselves create the web of relationships by writing in three artists they sound like, three artists they are influenced by, and the name of the genre of music they play. You can browse from artist to artist via search results for the "sounds like," "influenced by," and "genre" data.
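The browsing described above amounts to an inverted index over what bands write about themselves. The Python sketch below is an illustration only, not GarageBand's actual implementation: the `BandProfile` record, the `normalize`, `build_index`, and `browse` helpers, and the case-insensitive matching are all hypothetical assumptions. It shows how "which bands say they sound like X" becomes a simple lookup with no editor in the loop.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BandProfile:
    # Hypothetical profile shape (an assumption, not GarageBand's schema):
    # each band writes in up to three "sounds like" artists, up to three
    # influences, and a free-text genre name.
    name: str
    sounds_like: list = field(default_factory=list)
    influenced_by: list = field(default_factory=list)
    genre: str = ""


def normalize(text):
    # Assumed matching rule: ignore case and collapse extra whitespace.
    return " ".join(text.lower().split())


def build_index(profiles):
    """Map each (field, written-in value) pair to the bands that wrote it."""
    index = defaultdict(set)
    for p in profiles:
        for artist in p.sounds_like[:3]:  # only the first three entries count
            index[("sounds_like", normalize(artist))].add(p.name)
        for artist in p.influenced_by[:3]:
            index[("influenced_by", normalize(artist))].add(p.name)
        if p.genre.strip():
            index[("genre", normalize(p.genre))].add(p.name)
    return index


def browse(index, field_name, value):
    """Return, sorted, the bands whose self-description lists `value` in `field_name`."""
    # .get() avoids inserting empty sets into the defaultdict on a miss.
    return sorted(index.get((field_name, normalize(value)), set()))
```

Following a result leads to that band's own written-in artists, so a listener can walk the web one hop at a time. The relationships are whatever the bands claim; nothing in this sketch validates them.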

Despite the difference in process, the new system shares functionality with All Music Guide. In addition to browsing the web-like relationships between artists, we make genre categorization available. Again, the big difference is that there are no editors. In addition to writing in an exact genre name, artists select the closest fit from a prequalified list of genres. This prequalified list is generated automatically according to how frequently each exact genre name is written in. In this way the genre taxonomy emerges organically from the relationships between the songs themselves. Moreover, new genres emerge almost as soon as they are created, without the latency of an editorial cycle.
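The post doesn't spell out how frequency turns into the prequalified list, so the following is only a minimal Python sketch under stated assumptions: written-in names are matched case-insensitively, and the hypothetical `min_count` and `max_genres` parameters stand in for whatever cutoff the real system uses. The list contains the written-in genres that clear the threshold, most frequent first.

```python
from collections import Counter


def normalize(text):
    # Assumed matching rule: ignore case and collapse extra whitespace.
    return " ".join(text.lower().split())


def prequalified_genres(written_in, min_count=2, max_genres=None):
    """Return genres written in at least `min_count` times, most frequent first.

    `min_count` and `max_genres` are hypothetical knobs, not documented settings.
    """
    counts = Counter(normalize(g) for g in written_in if g.strip())
    ranked = sorted(
        ((genre, n) for genre, n in counts.items() if n >= min_count),
        key=lambda item: (-item[1], item[0]),  # by frequency, then alphabetically
    )
    if max_genres is not None:
        ranked = ranked[:max_genres]
    return [genre for genre, _ in ranked]


if __name__ == "__main__":
    written = ["Indie Rock", "indie rock", "Emo", "Nerdcore", "Emo", "emo", "Polka"]
    print(prequalified_genres(written))  # ['emo', 'indie rock']
```

Rerunning this whenever new bands sign up is what lets a name like "nerdcore" enter the list once enough bands independently use it, without anyone deciding in advance that the genre exists.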

Ultimately, the new genre system is an empirical process rather than a theoretical construct. A genre exists not because experts can describe it and derive it from a model they've already built, but because the word for it is already in common use. The theoretical model is thus inferred, and constantly re-inferred, from the data.

You can see the new system in action at http://www.garageband.com/genre