Jump to content

Music information retrieval

From Wikipedia, the free encyclopedia

Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. Those involved in MIR may have a background in academic musicology, psychoacoustics, psychology, signal processing, informatics, machine learning, optical music recognition, computational intelligence, or some combination of these.

Applications

[edit]

Music information retrieval is being used by businesses and academics to categorize, manipulate and even create music.

Music classification

[edit]

One of the classical MIR research topics is genre classification, which is categorizing music items into one of the pre-defined genres such as classical, jazz, rock, etc. Mood classification, artist classification, instrument identification, and music tagging are also popular topics.

Recommender systems

[edit]

Several recommender systems for music already exist, but surprisingly few are based upon MIR techniques, instead of making use of similarity between users or laborious data compilation. Pandora, for example, uses experts to tag the music with particular qualities such as "female singer" or "strong bassline". Many other systems find users whose listening history is similar and suggests unheard music to the users from their respective collections. MIR techniques for similarity in music are now beginning to form part of such systems.

Music source separation and instrument recognition

[edit]

Music source separation is about separating original signals from a mixture audio signal. Instrument recognition is about identifying the instruments involved in music. Various MIR systems have been developed that can separate music into its component tracks without access to the master copy. In this way, for example, karaoke tracks can be created from normal music tracks, though the process is not yet perfect owing to vocals occupying some of the same frequency space as the other instruments.

Automatic music transcription

[edit]

Automatic music transcription is the process of converting an audio recording into symbolic notation, such as a score or a MIDI file.[1] This process involves several audio analysis tasks, which may include multi-pitch detection, onset detection, duration estimation, instrument identification, and the extraction of harmonic, rhythmic or melodic information. This task becomes more difficult with greater numbers of instruments and a greater polyphony level.

Music generation

[edit]

The automatic generation of music is a goal held by many MIR researchers. Attempts have been made with limited success in terms of human appreciation of the results.

Methods used

[edit]

Data source

[edit]

Scores give a clear and logical description of music from which to work, but access to sheet music, whether digital or otherwise, is often impractical. MIDI music has also been used for similar reasons, but some data is lost in the conversion to MIDI from any other format, unless the music was written with the MIDI standards in mind, which is rare. Digital audio formats such as WAV, mp3, and ogg are used when the audio itself is part of the analysis. Lossy formats such as mp3 and ogg work well with the human ear but may be missing crucial data for study. Additionally some encodings create artifacts which could be misleading to any automatic analyser. Despite this the ubiquity of the mp3 has meant much research in the field involves these as the source material. Increasingly, metadata mined from the web is incorporated in MIR for a more rounded understanding of the music within its cultural context, and this recently consists of analysis of social tags for music.

Feature representation

[edit]

Analysis can often require some summarising,[2] and for music (as with many other forms of data) this is achieved by feature extraction, especially when the audio content itself is analysed and machine learning is to be applied. The purpose is to reduce the sheer quantity of data down to a manageable set of values so that learning can be performed within a reasonable time-frame. One common feature extracted is the Mel-Frequency Cepstral Coefficient (MFCC) which is a measure of the timbre of a piece of music. Other features may be employed to represent the key, chords, harmonies, melody, main pitch, beats per minute or rhythm in the piece. There are a number of available audio feature extraction tools[3] Available here

Statistics and machine learning

[edit]

Other issues

[edit]

Academic activity

[edit]

See also

[edit]

References

[edit]
  1. ^ A. Klapuri and M. Davy, editors. Signal Processing Methods for Music Transcription. Springer-Verlag, New York, 2006.
  2. ^ Eidenberger, Horst (2011). "Fundamental Media Understanding", atpress. ISBN 978-3-8423-7917-6.
  3. ^ David Moffat, David Ronan, and Joshua D Reiss. "An Evaluation of Audio Feature Extraction Toolboxes". In Proceedings of the International Conference on Digital Audio Effects (DAFx), 2016.
[edit]

Example MIR applications

[edit]