Music Information Retrieval Music Digital Library Systems Computer Science Essay



Music Information Retrieval (MIR) is the science of retrieving information from music. Its computational methods include classification, modelling, clustering, pattern-matching and retrieval of music. Music Digital Libraries (MDL) are collections of sources, resources and services of digitally stored music. A relevant document should describe specific MIR/MDL topics in any context, ranging from music retrieval methods and music information needs to musical data extraction and systems evaluation methods.

Documents concerning music analysis, representation, music perception and cognition, as well as other similar metrics, are relevant. Documents concerning advanced signal or natural language processing and speech retrieval are considered irrelevant. Relevance judgements of the retrieved documents will be made by the same person who pre-defined the topic and converted it to queries, that is, by me. This systems-centred evaluation will use two queries, one 'natural language' and one 'Boolean', to examine Factiva, Altavista, Dialog Classic and Metacrawler in a total of 7 searches.

A basic analysis of the facets is shown on the table above. Information retrieval is the science behind the storage, organisation and access of information relevant to a user's query. Digital libraries are collections of electronic documents, in the case of this topic, music recordings. However, despite the keywords' definitions and thesaurus entries, both terms are used unchanged in the initial search for this information need, as they appear to be very common in information and library science contexts.

Search strategy for Boolean systems:

To build a Boolean query, since most of the keywords (facets) are terms, I used the Quick-search method with a query made up of only the keywords. The first search service I chose was Factiva because, as a service oriented towards news and business information, it is the least likely to retrieve relevant information:


music AND ((information AND retrieval) OR (digital AND library))

Running the Quicksearch1 query in Factiva's Free Text Search, with duplicate article identification on and all dates selected, returned 73333 publications. After sorting by relevance I realised that most of the top retrieved documents were irrelevant or only partially relevant. Only 2 out of the first 10 documents were relevant. The partially relevant documents referred to digital libraries. Only 2 out of 10 documents referred to any kind of information retrieval. This, together with the huge number of returned documents, made me refine my search into something simpler but definitely more specific.


music information retrieval OR music digital library

Running the Quicksearch2 query with the same options as the initial search returned 138 documents. After sorting by relevance, most documents were relevant or partially relevant. Partially relevant documents mostly referred to news about symposiums and conferences in the field of MIR. Some documents were only partially accessible (abstract only). Other documents contained links to relevant documents.

Using Factiva's Search Form the following search was performed for all dates and duplicate article identifier ON:

All of these words: Music

At least one of these words:

None of these words:

This exact phrase: Information retrieval, digital library

This returned no results, so the query had to be altered. Since 'digital' and 'library' can each apply to music, as in 'digital music' and 'music library', I decided to rearrange the keywords to create the following query:

All of these words: Digital, library, music

At least one of these words:

None of these words:

This exact phrase: Information retrieval

This returned 232 results, sorted by relevance. Among the top 10 results, some relevant documents were identical to those found in the free text search, and the others were irrelevant.

My information need required something more specific than 'information retrieval', as it is a very wide subject. While investigating some of the relevant articles I came across the term 'music information retrieval', which might prove a more suitable phrase to limit the retrieved documents and increase relevance.

All of these words: Digital, library

At least one of these words:

None of these words:

This exact phrase: Music information retrieval

This query retrieved 12 results, which were sorted by relevance and are shown in the Appendix. This is the final query to be used on all systems. The process of deriving the final query for the evaluation mixed several strategies. Initially, the quicksearch strategy was followed to derive a basic Boolean query. The first queries returned results similar to those of the 'natural language' query. This made me decide to create a query with two exact terms combined with the word 'music', a tactic similar to building blocks. After getting no results, most of the terms were divided into different blocks. By browsing through the retrieved documents to address my Anomalous State of Knowledge (Belkin) I came across some relevant societies and projects (ISMIR, IMIRSEL, MIREX). By mixing the two strategies together, and through initial searching, feedback, citation pearl growing, further interaction and manipulation of the query, the final query was derived.


music information retrieval AND digital AND library
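The narrowing effect of the final query can be illustrated with a minimal Python sketch. It assumes a very simple matching model (lower-cased word tokens, AND as "all words present", an exact phrase as consecutive tokens); the real search services may tokenise and match differently. The two sample documents and all function names are hypothetical and were invented for illustration only.

```python
import re

def tokens(text):
    """Lower-case word tokens of a document."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_all(text, words):
    """True if every word occurs somewhere in the text (Boolean AND)."""
    present = set(tokens(text))
    return all(w in present for w in words)

def has_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in the text."""
    doc, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(doc[i:i + n] == target for i in range(len(doc) - n + 1))

def quicksearch1(text):
    # music AND ((information AND retrieval) OR (digital AND library))
    return has_all(text, ["music"]) and (
        has_all(text, ["information", "retrieval"])
        or has_all(text, ["digital", "library"])
    )

def final_query(text):
    # exact phrase "music information retrieval" AND digital AND library
    return has_phrase(text, "music information retrieval") and has_all(
        text, ["digital", "library"]
    )

# Hypothetical documents, invented for illustration only.
docs = {
    "business": "Retail chain opens digital music store next to the public library",
    "mir": "Music information retrieval tools for a digital library of recordings",
}

# For each document: (matches Quicksearch1, matches final query)
matches = {name: (quicksearch1(t), final_query(t)) for name, t in docs.items()}
print(matches)
```

In this sketch the business-news sentence satisfies the first query because the three keywords merely co-occur, while only the second sentence satisfies the final query, which requires the exact phrase.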

Evaluation methodology & metrics

In Information Retrieval, users are limited to the functionality and sources of the information system they are using. To fulfil an information need, a user must focus the search on information systems that are more likely to retrieve documents relevant to the context. Information systems evaluation is an empirical method for evaluating information systems, specifically search services: which is best suited to the information need specified by the topic?

The type of this evaluation will be systems-centred. This means that many 'real' variables are eliminated, like time taken to conduct experiment or cost of experiment. In addition, other variables are held constant, like the pre-defined topic converted to queries. 'Natural language' and 'boolean' queries are static and are used on all search services in the same way. The number of experiments is predefined in the coursework specification. Additional filters like industry, region or language were not used but results were sorted by relevance on all systems.

Systems-centred evaluations are quantitative in nature, therefore some metrics must be calculated. The standard evaluation measures, precision and recall, are used to measure the relevance quality of systems. Precision is the proportion of retrieved documents that are relevant. In other words, it measures the relevance of the results. Recall is the proportion of all relevant documents that are retrieved. This means it measures the ability of the system to detect relevant documents.

For the purposes of this evaluation, precision will be calculated at the top 5 and top 10 results to evaluate how the accuracy of results deteriorates as the retrieved collection grows. Recall, however, cannot be calculated since it is unrealistic to determine how many relevant documents exist in each of the systems used. Estimated average precision (EAP) is commonly reported alongside the precision measure and is very popular in Information Retrieval research.
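As a rough illustration of these measures, the sketch below computes precision at 5, precision at 10 and an average precision from a list of binary relevance judgements. The essay does not state the exact formula behind its EAP figures, so the normalisation used here (dividing by the number of relevant documents retrieved, since the total number of relevant documents is unknown) is an assumption, and the judgement list is hypothetical.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results judged relevant (1) rather than not (0).

    If fewer than k results exist, the missing ranks count as not relevant.
    """
    return sum(relevance[:k]) / k

def average_precision(relevance):
    """Mean of precision@rank over the ranks that hold a relevant result.

    Assumption: normalised by the number of relevant results retrieved,
    because the total number of relevant documents is unknown.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical judgements for a top-10 list (1 = relevant, 0 = not relevant).
judgements = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

p5 = precision_at_k(judgements, 5)    # 3 relevant in the top 5
p10 = precision_at_k(judgements, 10)  # 4 relevant in the top 10
ap = average_precision(judgements)    # (1/1 + 2/2 + 3/4 + 4/7) / 4
print(p5, p10, round(ap, 3))
```

Because average precision rewards relevant documents that appear early in the ranking, two systems with the same precision at 10 can still differ on this measure.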

This can be considered a laboratory-type evaluation, as there are no users involved and the binary relevance judgements are made by the researcher. In addition to the binary document-topic relevance relationship, precision and EAP, the following metrics will be considered:


repeated documents,

broken links,

not retrieved and

spam.

These diagnostic measures are needed to provide more information about the results beyond the binary relevance relationship between documents and information need. They are used for investigating the reasons why irrelevant documents are retrieved.

Repeated Documents

Documents that are identical, similar or from the same source in a web search are considered repeated documents. Repeated documents can be relevant or irrelevant. Where available, similar documents are identified and treated as a single document (Factiva). Otherwise, if the repeated document is relevant, the first occurrence is considered relevant while the following ones are judged as irrelevant.

Broken Links

Broken links are links to documents that cannot be accessed because the file path is incorrect.

Not Retrieved

Not retrieved documents are documents that are inaccessible due to restricted access. This can occur for a number of reasons, such as the document file format, a required special authorization or subscription, or links to other inaccessible information sources.


Spam

Sponsored links and advertisements are considered 'spam'. In addition, documents linking to commercial sales websites such as Amazon are also 'spam'.

Evaluation Experiment

Two queries, 'natural language' and 'boolean', were used to search 4 different systems: Factiva, DialogClassic, AltaVista and Metacrawler. Together they yield 7 searches, two for each system except DialogClassic, which only supports 'boolean' queries. The metrics described in the evaluation methodology were calculated and the results are displayed in the table below:

Query    System         P@5    P@10   EAP    Repeated  Broken  Not Retrieved  Spam
Natural  AltaVista      0.600  0.600  0.513  0         0       1              0
Natural  Factiva        0.000  0.000  0.000  0         0       0              0
Natural  MetaCrawler    0.200  0.300  0.092  1         0       0              7
Boolean  Factiva        0.600  0.500  0.323  0         0       1              0
Boolean  AltaVista      0.200  0.400  0.198  5         0       0              0
Boolean  MetaCrawler    1.000  0.800  0.733  2         0       0              0
Boolean  DialogClassic  1.000  0.900  0.900  1         0       0              0

For the purposes of this topic, the best search service is DialogClassic. It scores a precision of 1 at the top 5 results, meaning they are all relevant. Precision at 10 deteriorates very little, down to 0.9, with an Estimated Average Precision of 0.9. Of course, DialogClassic only offers 'boolean' query support, therefore the accuracy of the search depends on how well the query was constructed. Metacrawler, a meta-search engine that searches services including Yahoo, Google, Bing and Ask, comes very close to Dialog on 'boolean' search. With the same precision at 5 as Dialog, it only drops to 0.8 precision at 10 and 0.733 EAP. Looking at the diagnostic measures, DialogClassic retrieved 1 repeated document while Metacrawler retrieved 2. Even if these documents were relevant, they were treated as duplicates and judged irrelevant, thus reducing the precision measures. Therefore, if the duplicate documents were counted as relevant, the two search services DialogClassic and Metacrawler would score the same.

Altavista seems to under-perform when used for 'boolean' search compared to 'natural language'. Under 'boolean' search, Altavista returned 5 repeated sources. Although most of the links were relevant, they came from the same domain (source) and were therefore counted as irrelevant, greatly reducing the precision measures. The 'natural language' Altavista search returned more relevant results from different sources, which adds greater value. However, Altavista was the exception, as the rest of the services perform better in 'boolean' searches. Factiva doesn't retrieve any relevant document when the 'natural language' query is used, while for 'boolean' search it scores an EAP of 0.323.

Considering that Factiva covers mostly business news, which is more likely to be irrelevant to the topic, its score is satisfactory. Metacrawler, when used for 'natural language' queries, is greatly influenced by other services' advertisements, more specifically Google's sponsored ads, which reduces its EAP score to 0.092 with 7 links considered 'spam'.

This experiment does not show a clear distinction between online and web search, because the two online systems (Factiva and Dialog) produced contrasting results. Web search can be used for retrieving information in any context. Online search is very different: each online search system is an index of databases or collections of records. To satisfy an information need, the right choice has to be made about which information sources and systems contain the most relevant information. For the purposes of this topic, Factiva is the least and DialogClassic the most useful search service. The table below compares the top web and online searches.

Type    System         P@5    P@10   EAP
Web     AltaVista      0.600  0.600  0.513
Web     MetaCrawler    1.000  0.800  0.733
Online  DialogClassic  1.000  0.900  0.900
Online  Factiva        0.600  0.500  0.323

The experiment indicates that meta-search used for 'boolean' search performs better than any other form of web search ('boolean' or 'natural'). This is not the case for 'natural language' search, because it seems to inherit some of the advertising links offered by the services it searches.

In conclusion, online search is the most efficient way to retrieve information if it is targeted successfully at relevant subjects, fields or industries. Otherwise, it can lead to irrelevant and confusing information. Web search is the fastest and easiest way but not the most accurate and precise. Metasearch combines the good and bad features of multiple search services. Information can be retrieved correctly only if a well-structured query is created.

Part 2: Research Topic

Title: Music Information Retrieval in Music Digital Libraries: how are content-based music information retrieval methods used in digital libraries?


The purpose of this paper is to investigate how content-based music information retrieval is used in digital libraries. What musical content is used for music information retrieval? How do different levels of this content affect the search? For information retrieval to be effective, music information is extracted in the form of features. The paper examines the various methods of feature extraction as well as how different features are used for specific retrieval purposes. It includes all forms of music information representation, including musical notation and audio recordings. It also briefly discusses how music information retrieval systems are implemented in digital libraries and how digital libraries scale to the growth of music.


Information retrieval is the science of searching for information in documents. Digital Libraries are collections of sources, resources and services. Records are kept in an electronic format accessible by computers. Information retrieval and digital libraries are closely related, since information retrieval mechanisms are embedded within digital libraries to provide the functionality and effectiveness of searching. Music Information Retrieval (MIR) is a science involving many disciplines, the basic ones being computer science, mathematics, library & information science and cognitive psychology, and its aim is to search through musical information.

In MIR, there are two methods of retrieving musical information: content-based and metadata search techniques. Metadata search, similarly to text retrieval, relies on text information about the music such as the artist's name, song title or record label. Music information retrieval mostly depends on metadata search, as it is the most common retrieval technique used by music digital libraries.

Content-based music retrieval involves searching for information in the actual music, for example querying by humming, melody/harmony matching, or even by mood, based on music perception and cognition. Have you ever heard music and wanted to find more information about it without knowing any meta-data? Haven't you ever wanted to listen to similar kind of music to what you listen to now? Beneficiaries are not only end-users who are given the opportunity to use advanced querying methods for searching, but also music professionals, like performers, teachers, copyright lawyers and producers. MIR contributes to machine learning and automated systems for musical feature extraction, pattern matching and automatic music identification.

Overview of major issues & research areas:

Features and Specificity

There are various ways of describing music by its characteristics. These musical characteristics are described by extracting information from music as features. MIR's challenge is to transform music audio into representations that are easily interpreted, manipulated and searched by humans. This introduces new ways of indexing the musical information found in digital libraries, leading to improved retrieval. In the context of music information retrieval, these features are divided by specificity.

Specificity is a measure of how specific information is. At its highest level, information is very specific, similar to exact matching, while at its lower levels it gives a broader description of the information. In the context of Music Information Retrieval, searching with high-specificity features retrieves an exact match of audio signal content. At the medium level, high-level musical features such as melody are extracted for querying, and only these are compared to the audio signal. Low-specificity features are widely used to compare more ambiguous musical terms, usually genre.

Features of different specificity levels are used depending on the purpose of retrieval. The table below indicates common tasks in MIR according to specificity:

Specificity  Tasks
High         Music
Medium       Melody; Identical; Performer; Sounds like; Performance
Low          Recommendation; Mood; Style/Genre; Instruments; Music/Speech


High-level features use musical concepts like melody and harmony. These features are measured by extracting the melodic content straight from music recordings, thus enabling end-users to perform their query by singing or humming. Many commercial websites use such methods nowadays.

Low-level features are also measurements of the information contained in the actual audio signal.

However, it is impossible to describe a single aspect of music directly from audio because of the trade-off between audio, where noise and effects exist, and high-level features. To overcome such limitations, most low-level audio features are based on the short-time spectrum of audio signals.

One of the key challenges in MIR is the extraction of key features from any kind of music representation, especially audio. The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL), through its Music Information Retrieval Evaluation eXchange (MIREX), provides a solid framework for evaluations of MIR systems using these high-level features. Due to the large number of music recordings, there is a tendency to extract high-level music features from low-level audio content.

Music Analysis

Beat-tracking is the automatic estimation of the temporal structure of music. Features describing the temporal structure include beat, tempo, rhythm and meter. It can be used for querying and retrieval, automatic classification, music recommendation and playlist generation (iTunes Genius). Beat-tracking depends on the temporal structure of music; therefore, changes in tempo or rhythmic complexity can affect its effectiveness.

Melody is the strongest indicator for music identification, while bass is responsible for the harmony, according to music perception. "It is melody that enables us to distinguish one work from another. It is melody that human beings are innately able to reproduce by singing, humming, and whistling. It is melody that makes music memorable: we are likely to recall a tune long after we have forgotten its text" (E. Selfridge-Field, 1998). One problem faced by MIR is the difficulty of distinguishing such features in multi-channel stereo audio signals.

Chord and Key Recognition systems use Hidden Markov Models, which are also very popular for speech recognition. Bello and Pickens created a chord recognizer by training Hidden Markov Models with the Expectation Maximization algorithm in combination with their own knowledge; it extracted harmonic content from audio signals with an accuracy of 75%.

Music structure is segmented depending on what classification is required by the MIR system. In order to improve similarity measures in automated feature extraction, segments that are not representative of the track, like the intro and outro, are excluded. A segment in music information is an internal part of a track that has similar features such as timbre or instrumentation. Although this segmentation method is simple, it can be ambiguous, because various divisions and segmenting criteria might exist depending on the listener's perception and cognition (Saffran 1999).

Symbolic similarity:

String-based similarity methods for monophonic melodies use strings to represent music information such as sequences, contour and pitch. Each character describes one note. String-matching algorithms are well known and are applied to monophonic melodies, since their notes are ordered in time. Standard string-matching algorithms are used by Themefinder, which is only able to perform an exact match search. More advanced services, like Musipedia, use regular expressions to calculate a distance between the query and the database data, providing approximate matching in search.
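The difference between exact and approximate string matching can be sketched as follows. The approximate part uses the Levenshtein edit distance, a standard string-matching technique chosen here for illustration; it is not claimed to be the method used by Themefinder or Musipedia. The melody strings (one letter per note) and the library names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

# Hypothetical melodies, one letter per note name.
library = {"tune_a": "CCGGAAG", "tune_b": "EDCDEEE"}
query = "CCGGAG"  # one note missing compared with tune_a

# Exact matching: the query must occur verbatim inside a melody.
exact = [name for name, notes in library.items() if query in notes]

# Approximate matching: rank melodies by their distance from the query.
ranked = sorted(library, key=lambda name: edit_distance(query, library[name]))
print(exact, ranked)
```

Here the exact search finds nothing because of a single missing note, while the distance-based ranking still places the intended melody first, which is the practical benefit of approximate matching for sung or hummed queries.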

Geometry-based similarity methods for polyphonic melodies consider music as sets of events that carry the following properties: onset time, pitch and duration. This method is especially good for polyphonic melodies, whose notes are not strictly ordered in time. Exact matches are supersets of the query found in the library. Approximate matches involve all subsets of the query and retrieve supersets of those subsets. In addition, weightings can be calculated according to the music's segmentation; for example, the chorus is of more value than the intro or outro. Other statistical and transportation distance measurements are suggested by Typke et al. and Ukkonen et al. in their papers at the International Conference on Music Information Retrieval 2003.
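A minimal sketch of the set-based idea is given below. It makes several simplifying assumptions that are not taken from the cited papers: notes are reduced to (onset, pitch) pairs with duration ignored, a match is allowed under any shift in time and pitch, and approximate matching is scored simply as the largest number of query notes covered by one shift. The piece and the queries are hypothetical.

```python
def exact_match(query, piece):
    """True if some time/pitch shift maps every query note onto the piece.

    Notes are (onset, pitch) pairs; duration is ignored in this sketch.
    """
    piece_set = set(piece)
    q_onset, q_pitch = query[0]
    for onset, pitch in piece_set:
        dt, dp = onset - q_onset, pitch - q_pitch
        if all((o + dt, p + dp) in piece_set for o, p in query):
            return True
    return False

def best_partial_match(query, piece):
    """Largest number of query notes covered by a single time/pitch shift."""
    piece_set = set(piece)
    best = 0
    for q_onset, q_pitch in query:
        for onset, pitch in piece_set:
            dt, dp = onset - q_onset, pitch - q_pitch
            covered = sum((o + dt, p + dp) in piece_set for o, p in query)
            best = max(best, covered)
    return best

# Hypothetical polyphonic piece: two notes sound together at onsets 0 and 2.
piece = [(0, 60), (0, 64), (1, 62), (2, 64), (2, 67), (3, 65)]

transposed_query = [(10, 50), (11, 52), (12, 54)]  # same shape, shifted
altered_query = [(0, 60), (1, 62), (2, 65)]        # last note differs

found_exact = exact_match(transposed_query, piece)
found_altered = exact_match(altered_query, piece)
partial_score = best_partial_match(altered_query, piece)
print(found_exact, found_altered, partial_score)
```

Because the piece is treated as an unordered set of points, simultaneous notes cause no difficulty, which is why this family of methods suits polyphonic music better than one-dimensional strings.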

The New Zealand Digital Library MELody inDEX (MELDEX):

MELDEX is a music digital library system that enables users to retrieve music by singing a few bars as the query. MELDEX ranks results according to how well they match and retrieves music in any representation, musical notation or audio downloads in many formats. The melody is transcribed with pitch tracking, note segmentation and pitch representation. In addition, the system is able to adapt to each user's tuning, meaning that even if the query is expressed in a different pitch or key the system will be able to transpose it and retrieve matching results. Features used in MELDEX don't consider the key in which the notes are played. They are derived as pitch ratios or musical intervals. The size of the musical intervals is irrelevant. What is measured is the direction of the intervals, as in melodic contour, which is represented with Parsons code. MELDEX's basic tasks are:


representation and indexing of musical material,


integration of traditional bibliographic systems with advanced MIR tools,


managing intellectual property rights for producers and users.
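The melodic contour representation described for MELDEX above can be sketched in a few lines. The function below derives a Parsons code string from a sequence of pitches: an asterisk for the first note, then U, D or R for each note that is higher than, lower than or the same as the previous one. Representing pitches as MIDI note numbers and the function name are assumptions made for illustration; this is not MELDEX's actual implementation.

```python
def parsons_code(pitches):
    """Melodic contour: '*' for the first note, then U (up), D (down), R (repeat)."""
    code = ["*"]
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            code.append("U")
        elif current < previous:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)

# Opening of "Twinkle, Twinkle, Little Star" as MIDI note numbers (assumed encoding).
melody_in_c = [60, 60, 67, 67, 69, 69, 67]
melody_in_g = [p + 7 for p in melody_in_c]  # same tune transposed up a fifth

print(parsons_code(melody_in_c))
print(parsons_code(melody_in_g))
```

Both versions yield the same contour string, which illustrates why a query sung in a different key can still match: only the direction of each interval is kept, not its size or the absolute pitch.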

The MIR/MDL evaluation:

The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) project provides the resources and materials required for the development and evaluation of Music Information Retrieval and Music Digital Library technologies. The main purpose of this project is to create large-scale collections of music, including metadata, audio recordings and symbolic data, which are used by the research community to evaluate and compare the different approaches in the field.

IMIRSEL is divided into two distinct yet inter-related components:


The Virtual Research Labs (VRL) using Music-to-Knowledge (M2K) project.

Based on the NCSA's (National Center for Supercomputing Applications) D2K (Data-to-Knowledge), Music-to-Knowledge (M2K) is a machine learning environment that provides a framework for complex MIR/MDL evaluations. Other relevant tools used are:


Weka : a collection of machine learning algorithms for data mining


Marsyas (Music Analysis, Retrieval and Synthesis for Audio Signals) : framework for audio processing


Matlab : algorithm development, data visualisation & analysis, numerical computation




The Human Use of Music Information Retrieval (HUMIRS)

This subproject was created to focus on music information seeking behaviour by examining real-world examples. Its purpose is to provide IMIRSEL with evaluation tasks that can be deployed in reality, making the evaluation measures more meaningful.


In 2006 and 2007 MIREX conducted evaluations on low-specificity audio similarity. Other research and evaluation was done using high-specificity features (Allamanche et al. 2001). The best-performing system used low-level audio matching methods rather than extracting a high-level melody feature from audio. However, different features are used more efficiently for specific tasks in MIR. For example, beat-based segmentation improves cover song identification.

One of the major issues in MIR/MDL is the exponential growth of available tracks on the web. This accelerates information overload. In contrast with metadata search, which is greatly affected by this excessive information, content-based retrieval seems more able to cope with big digital libraries. Furthermore, for metadata search to be effective, musical metadata has to be tagged in the library by a person with advanced knowledge and expertise in music.

Yet music is very ambiguous as a difference exists in every person's perception. Depending on this perception, different people might consider different tactics of structuring their query. However, content-based search satisfies the user's Anomalous State of Knowledge (Belkin) because it is easier to identify music one doesn't know any information about. The actual content of music can help them find information about it.

Content-based information retrieval techniques greatly increase the accessibility of music digital libraries, since they remove linguistic and cultural barriers to query formation for users. Content-based retrieval can understand the semantics in musical data and is able to retrieve matching results. Additionally, by using machine learning algorithms, library repositories can be trained to extract music information and index it automatically.