Research in Modeling Musical Knowledge

We visit Masataka Goto of Waseda University's Muraoka Laboratory

Computing Japan first visited Tokyo's Waseda University in spring 1995 while researching "Computer Education at Japan's Universities" (May 1995, p. 19), and then again for a feature article on "Computer Science R&D in Japan" (July 1995, p. 29). We return once more to this prestigious university to examine a fascinating project underway at a lab within Waseda's School of Science and Engineering.

by Steven Myers

For most of us, the first things that come to mind when we think of computers and music are synthesizers, samplers, and MIDI. A whole new genre of music based on these components surfaced in the 1980s, and the technologies involved have since permeated virtually all areas of modern recording and performance.

In the past few years, though, an increasing amount of research -- led by such institutions as MIT, Carnegie Mellon, and the University of Edinburgh -- has been conducted in a slightly different (but related) vein: using artificial intelligence (AI) techniques to model musical knowledge. Technical papers describing systems for "intelligent" musical representation and recognition are garnering considerable attention not only from the computer science and music communities, but from researchers in psychology and human cognition as well.

To learn more about research in this intriguing field, Computing Japan consulted Masataka Goto, who for the past three years has been at work on a system that can understand musical signals and events in a human-like fashion. A doctoral candidate conducting his research at the Muraoka Laboratory of Waseda University (one of the best-known and most respected research institutions in Japan), Goto has been active in computer music research since 1992, when he first joined the Muraoka lab as an undergraduate.

As an initial step in building his system, Goto has developed a subsystem for recognizing and understanding music at its most basic level: the beat. From the very beginning, he has chosen to work with raw acoustic signals in real time rather than with pre-formatted data such as MIDI streams. Goto contends that a system for understanding music in pre-formatted form, however advanced it may be, would prove exceedingly difficult to convert into one capable of handling real-time acoustic music.

Tracking the beat

Goto's BTS (Beat Tracking System) can accurately pinpoint the beat in most popular songs that have a 4/4 time signature and a tempo between 78 and 167 MM (beats per minute). The beat information generated by the system is sent over an Ethernet network, where other applications can use it to synchronize their own events with the music. In a highly entertaining demonstration that Goto performed for us, BTS used the beat information from a popular song to control the movements of a "virtual dancer" displayed on another workstation, so that the dancer moved in time to the song as it played. Goto also showed how a separate workstation with a MIDI instrument connected to it could generate other sounds, such as handclaps and drum beats, in time to the music.
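The article does not describe what a beat information message contains, so the following Python sketch assumes a purely hypothetical layout (the time of the most recent beat, the inter-beat interval, and whether that beat was strong or weak) to show how a client such as the virtual dancer or the handclap generator could turn one message into a schedule of upcoming beats. The byte layout and the `BeatInfo`, `encode`, `decode`, and `upcoming_beats` names are illustrative assumptions, not the RMCP wire format.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout, NOT the actual RMCP packet format:
# big-endian double beat_time (s), double interval (s), bool strong
_LAYOUT = struct.Struct(">dd?")


@dataclass
class BeatInfo:
    beat_time: float  # when the most recent beat occurred (s)
    interval: float   # seconds between beats (60 / tempo in MM)
    strong: bool      # True if that beat was a strong beat (1 or 3)


def encode(info: BeatInfo) -> bytes:
    """Pack a beat report into the assumed 17-byte layout."""
    return _LAYOUT.pack(info.beat_time, info.interval, info.strong)


def decode(payload: bytes) -> BeatInfo:
    """Unpack a payload produced by encode()."""
    return BeatInfo(*_LAYOUT.unpack(payload))


def upcoming_beats(info: BeatInfo, count: int):
    """List (time, strong) for the next `count` beats so a client can schedule
    dance moves or handclaps ahead of time instead of reacting late.
    Strong and weak beats alternate in 4/4."""
    return [
        (info.beat_time + k * info.interval, info.strong == (k % 2 == 0))
        for k in range(1, count + 1)
    ]
```

For example, at 120 MM the interval is 60 / 120 = 0.5 s. A client that decodes a message reporting a weak beat at 10.0 s would schedule its next events at 10.5 s (strong) and 11.0 s (weak). Predicting ahead, rather than waiting for the next message, is what lets the client land its events on the beat.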

As figure 2 illustrates, BTS works in the following manner:

1. The analog signals of the acoustic musical source are converted to digital data on a Sun SPARCstation 10 workstation. Each signal is digitized at 16 bits/22.05 kHz and then divided into blocks of 256 samples. These blocks are the fundamental units for all processing done by BTS.

2. Each block is sent via SCSI to a distributed-memory Fujitsu AP1000 parallel computer (with 64 processing nodes) for analysis.

3. Separate processes running in parallel on the AP1000 parse the blocks for "sound events," measure the time from each event to the next event of a similar type, associate a beat with the event, and try to predict when the next beat will occur based on that information. For example, after observing the interval between the onset times of two events, the system generates a hypothesis that another event will occur after the same interval.

4. Using the onset times of these events, 30 separate "agents" generate beat predictions, which are coordinated by a separate manager process. Having several agents generate and check predictions in parallel ensures that a correct prediction emerges without significant delay. (Imagine how much longer the task would take if one prediction had to be formulated and followed to completion before another could be generated.)

5. Finally, after combining the information from several predictions that have proven correct, the manager formulates a "beat information" (BI) block, which is sent over Ethernet as an RMCP (remote music control protocol) packet. (RMCP is a UDP/IP protocol for communication between clients and servers on a distributed system that integrates MIDI and LAN.)
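For scale, each 256-sample block at 22.05 kHz covers 256 / 22,050 s, or about 11.6 ms of audio. Steps 3 to 5 can be illustrated with a heavily simplified Python sketch of interval-based prediction by competing agents. It is not Goto's algorithm: here each hypothetical agent takes one pair of onsets whose spacing lies within the 78 to 167 MM range (roughly 0.36 to 0.77 s per beat) as its interval hypothesis, predicts beats forward, and counts how many predictions land near an observed onset; a manager then reports the most reliable hypothesis. The 35 ms tolerance, the pair-based seeding rule, and all names are assumptions, and the agents run one after another rather than in parallel on separate processors.

```python
from dataclasses import dataclass

TEMPO_RANGE_MM = (78.0, 167.0)  # tempo range quoted for BTS (beats per minute)
TOLERANCE = 0.035               # assumed: max |prediction - onset| that counts as a hit (s)
MAX_AGENTS = 30                 # the article mentions 30 agents


@dataclass
class Agent:
    interval: float       # hypothesized seconds per beat
    last_beat: float      # most recent beat time the agent believes in
    reliability: int = 0  # number of predictions confirmed by an onset

    def track(self, onsets):
        """Predict beats forward and check each one against the observed onsets."""
        horizon = max(onsets)
        predicted = self.last_beat + self.interval
        while predicted <= horizon + TOLERANCE:
            nearest = min(onsets, key=lambda t: abs(t - predicted))
            if abs(nearest - predicted) <= TOLERANCE:
                self.reliability += 1
                self.last_beat = nearest    # confirmed: re-anchor on the real onset
            else:
                self.last_beat = predicted  # missed: keep the predicted beat
            predicted = self.last_beat + self.interval

    def next_beat(self):
        return self.last_beat + self.interval


def spawn_agents(onsets, max_agents=MAX_AGENTS):
    """Seed one agent per onset pair (onsets sorted ascending) whose spacing
    is a plausible beat interval."""
    shortest = 60.0 / TEMPO_RANGE_MM[1]
    longest = 60.0 / TEMPO_RANGE_MM[0]
    agents = []
    for i, first in enumerate(onsets):
        for second in onsets[i + 1:]:
            if len(agents) == max_agents:
                return agents
            if shortest <= second - first <= longest:
                agents.append(Agent(interval=second - first, last_beat=second))
    return agents


def manager(onsets):
    """Let every agent track the onsets, then report (interval, next beat)
    from the most reliable hypothesis, or None if no agent could be seeded."""
    agents = spawn_agents(onsets)
    if not agents:
        return None
    for agent in agents:
        agent.track(onsets)
    best = max(agents, key=lambda a: a.reliability)
    return best.interval, best.next_beat()
```

With a spurious onset at 0.8 s mixed into a steady 0.5 s pulse (`[0.0, 0.5, 0.8, 1.0, 1.5, 2.0]`), the agent seeded by the first two onsets confirms three predictions and wins, so the manager reports a 0.5 s interval (120 MM) and a next beat at 2.5 s; the agent seeded by the stray event confirms none.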

Remember that this is not an archival analysis; it all happens in real time. The system makes significant use of the signals generated by the bass and snare drums (represented by "Detect BD and SD" in the figure). It exploits the fact that in much of popular music, especially rock, the bass drum falls on the strong beats (1 and 3) of a measure while the snare drum falls on the weak beats (2 and 4). This bass-and-snare pattern is stored in the system's knowledge base as part of its musical intelligence, and is used to recognize these events and extract their timing when they occur in the song.
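The bass-drum/snare-drum rule can be sketched in Python as a simple vote over beat positions. The sketch assumes that beat times have already been tracked and that bass and snare onset times are available (how BTS actually detects them is not shown here). The hypothetical `strong_beat_parity` function checks whether the bass drum tends to fall on even- or odd-numbered beats and the snare drum on the others, and `label_beats` marks each beat strong or weak. The 35 ms tolerance is an assumption. Like the rule described above, this separates strong beats from weak ones but does not by itself tell beat 1 from beat 3.

```python
TOLERANCE = 0.035  # assumed: how close (s) a drum onset must be to count as "on" a beat


def _on_beat(beat_time, drum_onsets, tol=TOLERANCE):
    """True if any drum onset falls within `tol` seconds of the beat."""
    return any(abs(beat_time - t) <= tol for t in drum_onsets)


def strong_beat_parity(beats, bass, snare):
    """Return 0 if beats 0, 2, 4, ... look like the strong beats (1 and 3 of the
    measure), or 1 if beats 1, 3, 5, ... do, using the rock pattern of bass drum
    on strong beats and snare drum on weak beats."""
    votes = [0, 0]  # votes[p]: evidence that beats with index % 2 == p are strong
    for i, t in enumerate(beats):
        if _on_beat(t, bass):
            votes[i % 2] += 1        # a bass drum hit suggests this beat is strong
        if _on_beat(t, snare):
            votes[(i + 1) % 2] += 1  # a snare hit suggests this beat is weak
    return 0 if votes[0] >= votes[1] else 1


def label_beats(beats, parity):
    """Mark each tracked beat as 'strong' or 'weak'."""
    return ["strong" if i % 2 == parity else "weak" for i in range(len(beats))]
```

If tracking happens to start on a weak beat, for example beats at `[0.0, 0.5, 1.0, 1.5, 2.0, 2.5]` with the bass drum at 0.5, 1.5, and 2.5 s and the snare at 1.0 and 2.0 s, parity 1 wins and the first beat is labeled weak.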

At this point, it is reasonable to ask, "Why not just cue off the bass and snare in order to track the beat?" This method does not work accurately, says Goto, because the acoustic signals for all events are generally noisy; no single event can be used in isolation to reliably detect the beat. Rather, data must be gathered from many different events and compared in order to generate a correct prediction. In Goto's BTS, the temporal information taken from bass and snare drum occurrences is used primarily for distinguishing between strong and weak beats.
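One generic way to let many noisy events outvote any single one is an inter-onset-interval histogram: every pair of onsets whose spacing falls within the tempo range casts a vote for that spacing, and the most popular spacing wins. The Python sketch below illustrates only that general idea; it is not the method BTS uses, and the 10 ms bin width and the `vote_interval` name are assumptions.

```python
from collections import Counter

TEMPO_RANGE_MM = (78.0, 167.0)  # tempo range quoted for BTS (beats per minute)
BIN_SECONDS = 0.01              # assumed histogram bin width; absorbs small timing jitter


def vote_interval(onsets, bin_seconds=BIN_SECONDS):
    """Estimate the beat interval (s) from every plausible pair of sorted onsets,
    or return None if no pair falls within the tempo range."""
    shortest = 60.0 / TEMPO_RANGE_MM[1]
    longest = 60.0 / TEMPO_RANGE_MM[0]
    votes = Counter()
    for i, first in enumerate(onsets):
        for second in onsets[i + 1:]:
            spacing = second - first
            if shortest <= spacing <= longest:
                votes[round(spacing / bin_seconds)] += 1  # one vote per pair
    if not votes:
        return None
    best_bin, _count = votes.most_common(1)[0]
    return best_bin * bin_seconds
```

In the onset list `[0.0, 0.5, 0.8, 1.0, 1.5, 2.0]`, the stray event at 0.8 s contributes a single vote for 0.7 s, while the regular pulse contributes four votes for 0.5 s. Cueing off the single pair at 0.8 s and 1.5 s would have produced the wrong tempo.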

Possible applications and future plans

The possibilities for practical and commercial application of Goto's work are many and varied. During our visit, Goto pointed out the obvious use for such a system in multimedia applications such as video editing (synchronizing video to music) and automated stage lighting for live performances. Also, in audio recording, beat tracking would allow music to be indexed automatically in such a way that users of the recording system can deal with acoustic signals as a set of beats rather than as raw wave data.
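Indexing recorded audio by beat amounts to mapping each tracked beat time to a sample position. The Python sketch below assumes mono audio held in a plain list at the 22.05 kHz rate quoted for BTS, with beat times (in seconds) already known. `beat_sample_indices` and `split_by_beats` are hypothetical helpers that cut the waveform into one segment per beat, so that an editor could address a beat number instead of a raw sample range.

```python
SAMPLE_RATE = 22050  # Hz, the digitizing rate quoted for BTS


def beat_sample_indices(beat_times, sample_rate=SAMPLE_RATE):
    """Convert beat times in seconds to sample positions in the recording."""
    return [round(t * sample_rate) for t in beat_times]


def split_by_beats(samples, beat_times, sample_rate=SAMPLE_RATE):
    """Return one segment per beat: the samples from each beat up to the next."""
    marks = beat_sample_indices(beat_times, sample_rate)
    return [samples[start:end] for start, end in zip(marks, marks[1:])]
```

At 120 MM, for example, each beat spans 0.5 × 22,050 = 11,025 samples, so beats at 0.0, 0.5, 1.0, and 1.5 s yield three segments of 11,025 samples each.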

The broader concept of modeling human musical intelligence on a computer is also highly intriguing. Consider, for instance, the possibilities for musicians and music educators. Wouldn't it be extremely useful for a musician, especially one who improvises frequently, to catalog his own collection of riffs (musical phrases) in an intelligent database that can perceive relationships between those riffs, and then construct entire solos by mixing and matching riffs from the database? Such a system could help the musician assemble larger musical ideas that might not otherwise occur to him. It would also let students of music and improvisation see more clearly how higher-level musical groupings are built from more primitive components. Additionally, a student could see how a riff he has been using in one context could also work in completely different contexts.

This area of musical knowledge modeling has so far been one of the least explored in computer science and artificial intelligence research. The possibilities for application of such research to commercial products, however, are immensely fascinating, and could well prove highly lucrative for all types of companies involved in the production of audio and musical equipment.

And where will Goto focus his energies next? He has demonstrated a working system that can track the beat within a limited domain (in his experiments, it tracked the beat correctly in 42 of 44 popular songs). Next, he intends to improve the system's musical knowledge base, as well as the interaction among the prediction agents. He will also explore applying BTS to other multimedia systems, such as the "virtual dancer" implemented at Waseda.

Contact information

Masataka Goto

Muraoka Laboratory, School of Science and Engineering,
Waseda University, Tokyo
Phone/fax: +81-3-3209-5198