Artificial Intelligence and Approaches to Music Education

Info: 5404 words (22 pages) Dissertation
Published: 12th Dec 2019

Abstract

The goal of this paper is to review the principal approaches to Music Education with a focus on Artificial Intelligence (AI). Music is a domain which requires creativity, problem-seeking and problem-solving from both learner and teacher, and is therefore a challenging domain for Artificial Intelligence. It is argued that remedial intelligent tutoring systems are inadequate for teaching a subject that requires open-ended thinking. Traditional classroom methods are sometimes favoured because tutors can focus on individual differences and enhance creativity and motivation.

However, it can also be argued that AI is a mechanism which enables those without traditional musical skills to ‘create’ music. Almost the only goal that applies to music composition in general is ‘compose something interesting’ (Levitt, 1985). This paper will review different approaches to AI in Music Education. Approaches considered will be: Intelligent Tutoring Systems in Music; AI based Music Tools; highly interactive interfaces that employ AI theories.

1. Introduction

This paper will review some of the approaches to using Artificial Intelligence in Music Education. This field is highly interdisciplinary and involves contributions from the fields of education, music, artificial intelligence (AI), the psychology of music, cognitive psychology, human-computer interaction, philosophy, computer science and many others. AI in education is itself a very broad field, dating from around 1970 (Carbonell, 1970), and has its own theories, methodologies and technologies. For brevity, we will abbreviate Artificial Intelligence in Education to AI-ED, following a standard convention.

Definitions

The scope of AI in Education (AI-ED) is not sharply defined, so it will be useful to consider some definitions. A common definition is: any application of AI techniques or methodologies to educational systems. Other definitions focus more narrowly, for example: any computer-based learning system which has some degree of autonomous decision-making with respect to some aspect of its interaction with its users (Holland, 1995). This definition implies that AI techniques must do some reasoning at the point of interaction with the user.

This reasoning might concern the best teaching approach, the subject being taught, or any misconceptions or gaps in the student’s knowledge. However, AI-ED in a wider context is sometimes defined as: ‘the use of AI methodologies and AI ways of thinking applied to discovering insights and methods for use in education, whether AI programs are involved at the point of delivery or not’ (Naughton, 1986). In practice, these contrasting approaches form a continuum.

Music: An open-ended domain

A useful distinction in AI-ED is between formalised domains and more open-ended domains (‘domain’ means the subject area to be taught). In domains such as mathematics and Newtonian dynamics there are clear targets, correct answers and a reasonably clear and concise structure to follow for success. In open-ended domains such as music composition, by contrast, there are in general no clear goals, no set criteria to follow and no correct answers.

As mentioned earlier, often the only goal is ‘compose something interesting’ (Levitt, 1985). Rittel and Webber (1984) describe problems of this kind as ‘wicked problems’. In such domains there cannot be a definitive formulation of the problem or of the answer. Wicked domains such as music composition require learners not just to solve problems but also to seek them (Cook, 1994). The term ‘problem seeking’ is used in a number of disciplines, such as animal behaviour (Menzel, 1991). Cook (1994) imported the term into AI in Education, with particular reference to the sense used by the philosopher Lipman (1991). In this sense, Cook (1994) characterises ‘problem seeking’ as follows:

Problems are treated as ill-defined and open-ended

There is a continual intertwining of problem specification and solution

Criteria for completion are very limited

Context greatly affects the interpretation of the problem

Problems are always open to re-interpretation and re-conceptualisation

In the expressive performing arts and in music composition there is no pre-defined goal or problem to be solved. The learner must find or create goals and problems, which may then need to be revised, modified or rejected according to his or her taste.

2. Computer-Aided Instruction

As background to AI approaches in education, it is worth briefly considering the music education programs that make little or no use of AI. Historically, computers used in music, as in most other subjects, were associated with the behaviourist theory of learning. These systems (branching teaching programs) stepped through the following algorithm (O’Shea and Holland, 1983):

Present a ‘frame’ to the student, i.e. present the student with pre-stored material (textual or audio visual)

Solicit a response from the student

Compare the response with pre-stored alternative responses

Give any pre-stored comment associated with the response

Look up the next frame to present on the basis of the response
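The steps above can be sketched in a few lines of Python. This is only an illustration of the branching idea, not a reconstruction of GUIDO or of any other real system: the `Frame` class, the `run_frames` function, the frame identifiers and the lesson content are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One pre-stored frame: material plus the anticipated responses."""
    material: str
    # Maps each anticipated response to (pre-stored comment, next frame id).
    branches: dict = field(default_factory=dict)
    # Used when the response matches no pre-stored alternative;
    # a next-frame id of None means "present the same frame again".
    default: tuple = ("Not recognised; try again.", None)


def run_frames(frames, start, responses):
    """Step through the frames, consuming one response per presentation.

    Returns a transcript of (material, response, comment) tuples.
    """
    transcript = []
    current = start
    for response in responses:
        if current == "END":
            break
        frame = frames[current]                                      # present the frame
        comment, nxt = frame.branches.get(response, frame.default)   # compare response
        transcript.append((frame.material, response, comment))      # give the comment
        current = current if nxt is None else nxt                    # look up next frame
    return transcript


# Hypothetical two-frame ear-training lesson.
FRAMES = {
    "interval_1": Frame(
        "Listen: C then G. Which interval is this?",
        {"fifth": ("Correct.", "interval_2"),
         "fourth": ("Close: a fourth is one step narrower.", "interval_1")},
    ),
    "interval_2": Frame(
        "Listen: C then E. Which interval is this?",
        {"third": ("Correct.", "END")},
    ),
}

log = run_frames(FRAMES, "interval_1", ["fourth", "fifth", "third"])
for material, response, comment in log:
    print(f"{material} -> {response!r}: {comment}")
```

Every comment and every branch here has to be written in advance by the author, which is exactly the limitation of such programs: a response the author did not anticipate can only fall through to the default.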

An example of this kind of system was the GUIDO ear-training system (Hofstetter, 1981). Branching teaching programs tend to respond to the user in a manner that has been more or less explicitly pre-planned by the author. This tends to limit such programs to a fairly simple treatment of the subject.

Multimedia and Hypermedia

Multimedia and hypermedia have had a great impact on music education and have transformed music education software, giving it a different emphasis from the earlier behaviourist programs. Recent educational music programs such as Seventh Heaven, Ear Trainer, Interval and Listen aim to provide practice in recognising or reproducing intervals, chords or melodies. MacGAMUT is a classroom simulation program that dictates exercises and provides a detailed marking scheme.

Other programs such as MiBAC Music Lessons, Perceive and Practica Musica offer a comprehensive ear training program including scales, durations, modes and tuning. See Yavlow (1982) for information on the aforementioned programs. Since these domains are relatively clear-cut and non-problematic, ear training and music theory are popular topics for non-AI music education programs. There are also many useful musical computer tools applicable to education, such as music editors, sequencers, computer-aided composition tools, multimedia reference tools on CD-ROM Masterworks and much more.

3. Intelligent Tutoring Systems for Music: A ‘Classical’ Approach

The history of AI in education can be divided into two periods, the ‘classical’ period (1970 – 1987) and the ‘modern’ period (1987 to the present day). In the classical period, the three-component ‘traditional’ model of an Intelligent Tutoring System (ITS) was the most common and influential idea. This model was sometimes extended to a four-component model. After 1987, attention shifted towards finding alternatives to the traditional model. However, this shift was limited by the research available at the time, and the traditional model remains influential and is still used today.

The traditional ITS model (Sleeman and Brown, 1982) consists of three AI components, each of which can be considered a separate ‘expert system’ in its own area. The first component, the domain model, is an expert in the subject being taught. So in the case of a vocal tutor, the domain expert itself would be able to perform vocal tasks. This requirement is essential if the system is to be able to answer unforeseen questions about the task in hand.

The second component is the student model. Its purpose is to build a model of the student’s knowledge, capabilities and attitudes, which allows the system to vary its approach according to the individual student. In its simplest form, the student model can be viewed as a checklist of skills. This is sometimes modelled as an overlay, i.e. a tick list of the elements held in the domain model. More sophisticated approaches may treat the student model as a deliberately distorted version of the domain model, that is, a faulty ‘expert’ system whose errors are intended to mirror the student’s misconceptions.
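As a minimal sketch of the overlay idea, the student model can be held as a subset of the domain model's elements. The skill names and the `OverlayStudentModel` class below are hypothetical and are not taken from any system discussed here.

```python
# Hypothetical domain elements for a harmony tutor (illustrative only).
DOMAIN_SKILLS = {"triads", "cadences", "voice_leading", "modulation"}


class OverlayStudentModel:
    """Student knowledge recorded as a subset ('overlay') of the domain."""

    def __init__(self, domain_skills):
        self.domain = set(domain_skills)
        self.mastered = set()

    def record_result(self, skill, passed):
        """Tick or untick a skill after a diagnostic task."""
        if skill not in self.domain:
            raise KeyError(f"unknown skill: {skill}")
        if passed:
            self.mastered.add(skill)
        else:
            self.mastered.discard(skill)

    def gaps(self):
        """Domain elements not yet ticked off, in alphabetical order."""
        return sorted(self.domain - self.mastered)


model = OverlayStudentModel(DOMAIN_SKILLS)
model.record_result("triads", True)
model.record_result("cadences", True)
model.record_result("cadences", False)   # a later task reveals a problem
print(model.gaps())                      # ['cadences', 'modulation', 'voice_leading']
```

An overlay of this kind can only record what is missing. It has no way to represent a misconception, which is why the more sophisticated student models mentioned above use a distorted copy of the domain expertise instead.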

A fair diagnosis of a student’s knowledge, skills, capabilities and beliefs is often a hard problem in AI. One partial way around the diagnosis problem would be to ask the student about their capabilities, beliefs, previous experience and so on. A more stringent approach is to set the student tasks specifically designed to analyse their skills. The results can then be used to construct the student model.

The third component of the traditional ITS model is the teaching model. Typically, this may draw on teaching strategies ranging from Socratic tutoring, coaching and teaching by analogy (Elsom-Cook, 1990) to simply allowing the student to explore the available materials unhindered, with or without the guidance of a human teacher. The fourth component, where the model is extended to include it, is an interactive user interface for the tasks mentioned. Note that not all Intelligent Tutoring Systems include every component. It is common to focus on one or perhaps two components, and to omit or greatly simplify the others.

In particular, most ITSs in music focus on the expert or student model. Irrespective of the emphasis, ITS models require explicit, formalisable knowledge of the task. However, many skills in music correspond to wicked problems and are resistant to explicit formalisation. This narrows the number of areas in music education to which ITS models can be applied. One example area is harmonisation, which is one of the few musical topics for which relatively detailed rules of thumb can be found in a textbook. But even here, the traditional ITS model may not be effective.

Two systems from the classical ITS period are good examples of the potential and limitations of the ITS approach in music: Vivace and MacVoice.

3.1 Vivace: An expert system

Vivace is a four-part chorale writing system, created by Thomas (1985). Vivace is not an ITS in itself, yet it has formed the basis of one. It takes an eighteenth-century chorale melody and writes a bass line and two inner voices that fit the melody. It employs rules and guidelines for harmonisation taken from textbooks, which in turn abstract them from the practice of past composers. These rules can be categorised into four types: firm requirements, preferences, firm prohibitions and less firm prohibitions.

Three specific problems can be identified for any human or machine trying to harmonise on the basis of these rules. The first problem, common in beginners’ classes, is that it is possible to satisfy all the formal rules and still produce a composition which is correct but aesthetically unsatisfactory. The second problem is that most of the guidelines are prohibitions rather than positive suggestions. Milton Babbitt observes that ‘the rules…are not intended to tell you what to do, but what not to do’ (Pierce, 1983).

In other words, if we view harmonisation as a typical AI ‘generate and test’ problem, the rules offer some help in the testing phase but little help with well-focused generation. The third problem is that it is generally impossible to satisfy all of the preferences at once, so some preference rules may have to be broken. Traditional descriptions do not assign a clear order of importance to the preference rules; in fact, it is not at all clear that any fixed order would make sense.
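The ‘generate and test’ view can be sketched as follows. This is a toy illustration under stated assumptions, not Vivace's method: it chooses chord labels rather than writing four voices, and the chord table, the two firm rules, the two preferences and their weights are all invented for the example.

```python
from itertools import product

# Diatonic triads in C major as pitch-class sets (0 = C). Illustrative only.
TRIADS = {
    "I": {0, 4, 7}, "ii": {2, 5, 9}, "IV": {5, 9, 0},
    "V": {7, 11, 2}, "vi": {9, 0, 4},
}


def generate(melody):
    """Generate every chord sequence whose chords contain the melody notes."""
    options = [[name for name, pcs in TRIADS.items() if note % 12 in pcs]
               for note in melody]
    return product(*options)


# Firm rules: any violation rejects the candidate outright (assumed rules).
def no_v_to_iv(seq):
    return all((a, b) != ("V", "IV") for a, b in zip(seq, seq[1:]))


def ends_on_tonic(seq):
    return seq[-1] == "I"


FIRM_RULES = [no_v_to_iv, ends_on_tonic]


# Preferences: may be broken. They only contribute to a score.
def chord_changes(seq):
    return sum(a != b for a, b in zip(seq, seq[1:]))


def cadence_bonus(seq):
    return 2 if len(seq) > 1 and seq[-2] == "V" else 0


# The weights are arbitrary; the text notes that no fixed ordering is obvious.
PREFERENCES = [(1.0, chord_changes), (1.0, cadence_bonus)]


def score(seq):
    return sum(weight * pref(seq) for weight, pref in PREFERENCES)


def harmonise(melody):
    """Test all generated candidates and rank the survivors by preference."""
    survivors = [seq for seq in generate(melody)
                 if all(rule(seq) for rule in FIRM_RULES)]
    return sorted(survivors, key=score, reverse=True)


melody = [64, 62, 60]          # E, D, C as MIDI note numbers
ranked = harmonise(melody)
print(ranked[0])               # ('I', 'V', 'I')
```

Note where the rules sit: they filter and rank candidates, but the generator itself is blind enumeration. With four real voices the number of candidates grows very quickly, which is why rules that only say what not to do give so little help with generation.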

However, it is possible to write a rule-based system that implements textbook rules. In principle, a traditional ITS can use these rules to criticise students’ work and to serve as a model of the expertise they are supposed to acquire. Given the limitations just mentioned, how useful or effective would such a tutor be? Thomas used the system to illuminate the limitations of the theory. By using Vivace, Thomas was able to establish that textbook rules are an inadequate characterisation of the task when it is performed at expert level.

Thomas discovered that, using only conventional rules about range and movement, the tenor voice would almost certainly move to the top of its range and stay there. Thomas suggested that there must be a set of missing rules and meta-rules to fill these gaps, and she used Vivace as an experimental tool to identify them. In each experiment Thomas had to use her intuition to decide whether the results were musically viable or not. She discovered that many of the traditional rules were overstated or needed redefining. She also uncovered new guidelines and came to understand the task at a more strategic level. With the assistance of her human pupils, Thomas formulated a number of heuristics for ‘what to do’ rather than ‘what not to do’.

Experiments with Vivace enabled Thomas to realise the need to make human pupils aware of high-level phrase structure prior to detailed chord writing. Using the new knowledge about the task that she had gained from ‘teaching’ her expert system, Thomas was able to write a new teaching textbook based on her findings. Part of this knowledge was used in a simple commercial ITS, MacVoice, which criticises students’ voice-leading.

3.2 MacVoice

MacVoice criticises voice-leading aspects of four-part harmonisation. It is a Macintosh program based on the expert system Vivace, and it includes a music editor as part of its interface. MacVoice makes it possible to input notes one at a time, a chord at a time, a voice at a time, or in any disconnected fashion. As soon as a note is placed on the stave, MacVoice displays its guess as to the function of the corresponding chord in the form of an annotated Roman numeral.

There are two important limitations of this system: firstly, all chords must form homophonic blocks (all notes must be of the same duration); and secondly, the piece must be in a single key. There is one other menu function, called ‘voice-leading’. This function checks the harmonisation against a set of basic voice-leading rules and indicates any errors. MacVoice is quite flexible to use.

MacVoice has been used in practice at Carnegie Mellon University. It does not give positive strategic advice: it only points out errors, and it does not assess the merits of the chord sequences involved. Further work on this topic might include a visual display of the voice-leading constraints or of the possible preferred outcomes.

3.3 Lasso

Lasso (Newcomb, 1985) is an intelligent tutoring system for 16th-century counterpoint, as formalised by Fux (1725), and is limited to two voices. Newcomb’s approach aims to provide simple and consistent guidelines that help students know what is required to pass exams. Codifying the necessary knowledge goes beyond textbook rules and guidance. Like Thomas, Newcomb was aware of this, but approached the problem probabilistically, analysing scores to establish such facts as ‘the allowable ratio of skip to non-skip melodic intervals’ and ‘how many eighth note passages can be expected to be found in a piece of a given length’ (Newcomb, 1985).

The knowledge used for criticising students’ work is encoded as branching procedural code, with unvarying canned error messages, help messages and congratulatory messages. These messages offer students some assistance and motivation. Lasso is a very impressive system: it has a high-quality music editor, tackles a complex musical paradigm and has been used in real teaching contexts. However, there are some intrinsic problems. The rules are at a very low level, and there are a great many of them. There is even a system rule which prevents more than one hundred comments being made about any one attempt to complete an exercise. Typical remarks made by Lasso include:

“A melodic interval of a third is followed by stepwise motion in the same direction.”

“Accented quarter passing note? The dissonant quarter note is not preceded by a descending step.” (Newcomb, 1985).
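To make the style of such low-level criticism concrete, here is a small sketch of a checker that produces the first remark quoted above and enforces a cap on the number of comments. It is an assumption-laden illustration, not Lasso's actual code: the representation of a melody as diatonic step numbers, the function names and the message prefix are all hypothetical.

```python
MAX_COMMENTS = 100   # the cap mentioned in the text


def third_then_step(melody):
    """Flag a melodic third followed by a step in the same direction.

    `melody` is a list of diatonic step numbers (C=0, D=1, E=2, ...), so a
    third is a difference of 2 and a step is a difference of 1.
    """
    comments = []
    for i in range(len(melody) - 2):
        first = melody[i + 1] - melody[i]
        second = melody[i + 2] - melody[i + 1]
        same_direction = first * second > 0
        if abs(first) == 2 and abs(second) == 1 and same_direction:
            comments.append(
                f"Note {i + 1}: a melodic interval of a third is followed "
                "by stepwise motion in the same direction."
            )
    return comments


def criticise(melody, rules):
    """Apply every rule and truncate the combined list of canned comments."""
    comments = [c for rule in rules for c in rule(melody)]
    return comments[:MAX_COMMENTS]


# C E F E D: the third C-E is followed by the step E-F, both upward.
for comment in criticise([0, 2, 3, 2, 1], [third_then_step]):
    print(comment)
```

Each rule of this kind looks only at a few adjacent notes and knows nothing of the others, so a single weak phrase can trigger many unrelated remarks.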

The sheer quantity of text needed to put the myriad low-level criticisms into context could easily overwhelm students. Students complained that it was so difficult to meet Lasso’s demands that they were forced to revise the same task repeatedly. A solution to this problem would be to incorporate general principles that govern the low-level rules. Codified principles of this kind would reduce the number of comments required and allow observations to be generalised.

3.4 Concluding remarks on Intelligent Tutoring Systems: A ‘Classical’ Approach

The traditional Intelligent Tutoring System approach assumes an objectivist view of knowledge. Such systems depend on the assumption that there is a well-defined body of knowledge to be taught, which can be expressed as precise concepts and relationships. This works for four-part harmonisation and 16th-century counterpoint. However, in a more open-ended context, an objectivist approach can be very limited. Even in domains which are artificially limited, teaching rules drawn from practical experience tends not to be a very good approach.

Verbal definitions are a limited means of teaching a musical concept, and they do not compare with the knowledge an experienced musician draws on to grasp what those definitions really mean. It is all very well to define a chord, such as a dominant seventh, in terms of its interval pattern and to provide general rules, but to an experienced musician the ‘meaning’ of a dominant seventh depends much more on its context. Being able to manipulate structures intelligently is far more important than merely being able to understand and obey a set of rules. Rather than just a set of explanations, a student needs a structured set of experiences that make them more aware of musical structures, better able to manipulate them intelligently and, most importantly, more capable of formulating sensible musical goals to pursue.

4. Open-ended Microworlds: The Logo Philosophy

An idea that contrasts with the classical ITS approach to AI in education, and which has been just as influential, is the Logo approach (Papert, 1980). The Logo philosophy is particularly attractive for open-ended domains such as music. It centres on the idea of an educational microworld: an open-ended environment for learning with no specific built-in lessons. The Logo approach and its associated microworlds need not involve much, or indeed any, AI at the point of delivery.

However, the designs of microworlds tend to be strongly influenced by AI methodologies and tools. A simple version of an AI programming language is used to build microworlds, and students are encouraged to write or modify programs as a means of exploring the domain. Logo is also the name of a programming language based on Lisp, used for just this purpose. There are three distinct elements in the Logo approach: Logo (and similar languages) as a programming tool; Logo as a vehicle for expressing various AI theories for educational purposes; and Logo as an educational philosophy.

Firstly, we will briefly explore Logo as an educational philosophy. In early work, Logo was mainly used for learning mathematics, poetry and music. One version encouraged children to produce new melodies by rearranging and modifying melodic phrases. The learning philosophy aimed to give children a better understanding of a concept by making them envision or pre-hear a result, work out how to achieve it, and understand the reasons for any unexpected result. This learning philosophy was derived from a number of sources, including the psychologist Piaget’s notions of how children construct their own knowledge through play.

The Logo approach in relation to microworlds can be somewhat complex. Students are sometimes provided with a simplified version of an AI model in some problem domains. For example, in the case of music composition, fragments of illustrative material can be generated using generative grammars as models of particular composition techniques. The supplied programs can be used by students to explore, criticise, and refine their own (or someone else’s) model of process.

Notice that none of the three components of the ITS model is required in the Logo approach. In practice, students need some form of guidance from teachers in order to realise their full potential with Logo systems. Without guidance from a teacher, students risk learning only a technique, without appreciating the wider possibilities or understanding what it truly means to be an experienced musician. The educational philosophy associated with Logo has been applied to a number of systems in music at different levels and in different ways, as described below.

4.1 Music Logo System: Bamberger’s System

Jeanne Bamberger’s Music Logo System (1986, 1991) can be used to work with sound cards or synthesisers. It uses programming elements called functions to structure and control musical sounds. Music Logo’s central data structures are lists of integers representing sequences of pitches and durations, which are stored separately and can be manipulated separately before being played by a synthesiser. So, for example, to play A above middle C for 30 beats, then middle C for 20 beats, then G for 20 beats, the following expression might be used:

Play [a c g] [30 20 20]

Programming constructs such as repeat can easily be understood by beginners and put to musical use. Note lists and duration lists can be manipulated separately using arithmetic and list manipulation functions. Features such as recursion and random number generators can be used to build complex musical structures. Common musical operations are also provided as list manipulation functions.

For example, one function takes a duration list and a pitch list and generates a number of repetitions of the phrase, shifted at each repetition by a constant pitch increment, creating a simple sequence (in the musical sense of the term). Bamberger’s Music Logo System also provides other musical functions, such as retrograde (reverses a pitch list), invert (maps a pitch list to the complementary values within an octave), and fill (makes a list of all intermediate pitches between two specified pitches).
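The list operations just described are easy to mimic in Python, which may help to show how little machinery is involved. The sketch below is not Music Logo code. It assumes pitches are integers counting semitones above C, it reads ‘complementary values within an octave’ as 12 minus the pitch (modulo 12), and it assumes fill includes both end points; the function signatures are likewise invented.

```python
def retrograde(pitches):
    """Reverse a pitch list."""
    return pitches[::-1]


def invert(pitches):
    """Map each pitch to its complement within the octave.

    Computing (12 - p) mod 12 is one reading of 'complementary values'
    and is an assumption of this sketch.
    """
    return [(12 - p) % 12 for p in pitches]


def fill(low, high):
    """List every pitch from `low` to `high` inclusive, in either direction."""
    step = 1 if high >= low else -1
    return list(range(low, high + step, step))


def sequence(pitches, durations, repetitions, increment):
    """Repeat a phrase, shifting it by `increment` at each repetition."""
    out_pitches, out_durations = [], []
    for k in range(repetitions):
        out_pitches += [p + k * increment for p in pitches]
        out_durations += durations
    return out_pitches, out_durations


motif, rhythm = [0, 4, 7], [30, 20, 20]
print(retrograde(motif))               # [7, 4, 0]
print(invert(motif))                   # [0, 8, 5]
print(fill(4, 7))                      # [4, 5, 6, 7]
print(sequence(motif, rhythm, 2, 2))   # ([0, 4, 7, 2, 6, 9], [30, 20, 20, 30, 20, 20])
```

Because pitches and durations live in separate lists, a student can reverse or transpose the pitches while leaving the rhythm untouched, and then listen to what changed.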

Bamberger suggests many simple exercises, such as trying to guess the musical outcome of manipulating lists and procedures or, conversely, iteratively manipulating list representations to try to reproduce something previously imagined. These techniques are, in many ways, a reflection of the educational techniques suggested by Laurillard (1993) for general use in higher education. Bamberger points to two particular classes of phenomena which emphasise the importance of ‘shocks’ as learning experiences.

Firstly, the perceived phrase boundaries in melodic and rhythmic fragments can depend on small manipulations of the duration list. Secondly, there is an unpredictable relationship between the degree of change in the data structure and the degree of perceived change produced. In principle, the Logo system allows students to focus on manipulating any kind of musical structuring technique. However, in practice the focus tends to be on simple, small-scale structures such as motives and their transformations.

4.2 A series of microworlds: LOCO

Peter Desain and Henkjan Honing developed a series of microworlds and tools applying the Logo philosophy. The first was LOCO (Desain and Honing, 1986, 1992). The second was POCO (Honing, 1990), followed by Expresso (Honing, 1992) and LOCO-Sonnet (Desain and Honing, 1996). All of these microworlds carefully reflect the thinking behind AI methodologies and how they can be applied to music education.

LOCO is similar to Bamberger’s Music Logo in that it also focuses on music composition. The central component is a set of tools for representing sequences of musical events, which can be interfaced with any output device or instrument. It is also flexible enough to take input from practically any composition system.

The microworlds each offer tools for useful style-independent composition techniques, particularly stochastic processes and context-free music grammars. Essentially only two kinds of musical object are provided: ‘rests’ and ‘notes’. LOCO’s time-structuring mechanism is simple and elegant. There are two relations, Parallel and Sequential, used to combine arbitrary musical objects. Sequential is a function which causes the musical objects in its argument list to be played one after another, whereas Parallel is a function that causes its arguments to be played simultaneously.
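The two relations can be sketched as a small data structure in Python. This is an illustration of the idea only, not LOCO itself: the class names mirror the terms used in the text, but the `duration` and `schedule` functions and the event format are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int
    duration: float


@dataclass
class Rest:
    duration: float


@dataclass
class Sequential:
    parts: list   # played one after another


@dataclass
class Parallel:
    parts: list   # all started at the same time


def duration(obj):
    """Total duration of a musical object."""
    if isinstance(obj, (Note, Rest)):
        return obj.duration
    if isinstance(obj, Sequential):
        return sum(duration(p) for p in obj.parts)
    return max((duration(p) for p in obj.parts), default=0)


def schedule(obj, start=0.0):
    """Flatten a structure into sorted (onset, pitch, duration) events."""
    if isinstance(obj, Note):
        return [(start, obj.pitch, obj.duration)]
    if isinstance(obj, Rest):
        return []
    events = []
    if isinstance(obj, Sequential):
        for part in obj.parts:
            events += schedule(part, start)
            start += duration(part)
    else:  # Parallel: every part begins at the same onset
        for part in obj.parts:
            events += schedule(part, start)
    return sorted(events)


# A Sequential nested inside a Parallel, itself nested inside a Sequential.
melody = Sequential([Note(60, 1), Rest(1), Note(64, 2)])
piece = Sequential([Note(55, 1), Parallel([melody, Note(48, 4)])])
print(schedule(piece))
```

The structure stays ordinary data until `schedule` is called, so it can be computed, transformed or generated by a program before anything is played.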

It is quite simple to nest a parallel structure within a sequential structure, and vice versa. Sequential and Parallel objects are treated as data which can be computed and manipulated before they are played. As a result, arbitrary time structuring can be applied with great flexibility. As mentioned earlier, LOCO provides a base for composing using stochastic processes and context-free grammars. Various effects can be produced, depending on how stochastic variables are defined, including:

A random choice among its possible values

A choice weighted by a probability distribution

A random choice in which previous values cannot recur until all other values have been chosen

Selection of a value in a fixed circular order

The above are easily put together using composition (in the mathematical sense) of functions. For example, the value of an increment could be specified as a stochastic variable. This can produce a variable that performs a Brownian random walk. Brownian variables can be used, for example, as arguments in commands to instruments within a time-structured framework. These techniques can be used to construct concise, easy-to-read programs for transition nets and other stochastic processes. In each case, the operation of a program can be modified using the general programming language. See Ames (1989) for more information on the compositional uses of Markov chains.
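As a hedged sketch of how such variables compose, each kind of choice listed above can be written as a Python generator, and a Brownian walk can then be built by feeding one generator into another. None of the names below comes from LOCO, and the clamping of the walk to a pitch range is an added assumption.

```python
import random
from itertools import cycle


def random_choice(values, rng):
    """A random choice among the possible values."""
    while True:
        yield rng.choice(values)


def weighted_choice(values, weights, rng):
    """A choice weighted by a probability distribution."""
    while True:
        yield rng.choices(values, weights=weights, k=1)[0]


def series_choice(values, rng):
    """Within each pass, no value recurs until all the others have been chosen."""
    while True:
        bag = list(values)
        rng.shuffle(bag)
        yield from bag


def circular_choice(values):
    """Selection of a value in a fixed circular order."""
    return cycle(values)


def brownian(start, increments, low, high):
    """Random walk: each new value is the old one plus a stochastic increment."""
    value = start
    while True:
        yield value
        value = min(high, max(low, value + next(increments)))


rng = random.Random(1)                        # seeded so that runs are repeatable
steps = random_choice([-2, -1, 1, 2], rng)    # the stochastic increment
walk = brownian(60, steps, low=48, high=72)   # clamped to an assumed pitch range
pitches = [next(walk) for _ in range(8)]
print(pitches)
```

Swapping `random_choice` for `weighted_choice` or `series_choice` changes the character of the walk without touching `brownian`, which is the sense in which the behaviours are composed rather than programmed afresh.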

The primary design goals of LOCO include ease of use for everyone from non-programmers to experts. A more recent version, LOCO-Sonnet, mirrors LOCO but also includes a graphical front end drawn from Jameson’s (1992) Sonnet, a domain-independent data flow language originally designed for adding sound to user interfaces. It is designed for use by novices and experts alike. LOCO has been used in workshops for novices and professionals, and courseware is available for it.

4.3 Concluding comments on the Logo approach

The Logo approach is associated with constructivism. Constructivism, as a view of knowledge and learning, holds that even where ‘objectively true’ knowledge exists, simply presenting it to a student limits how effectively they learn. It is based on the assumption that learning arises from learners interacting with the world, which leads them to construct their own knowledge.

The resulting knowledge will vary between individuals, producing unique ideas and outcomes. This fits very well with open-ended domains such as music, where the basis of knowledge is learning how to create your ‘own’ masterpiece.

Unlike classical Intelligent Tutoring Systems, Logo requires intensive support from a human teacher. This can be viewed as both a weakness and a strength of the approach. Intelligent Tutoring Systems and the Logo approach were both influential ideas in the early years of AI in education. As their strengths and limitations were noted over the years, combining characteristics of the two became a prime focus of research, which led to Interactive Learning Environments (ILE). We will return to these after a brief discussion of AI-based tools.

5. Applications in Education: focus on AI-based tools

There are a number of application tools that employ AI but whose purpose is not primarily educational. It is nevertheless useful to consider some of these systems, as they have clear educational applications. There are quite a few music programming languages based on AI languages such as LISP and CLOS that are technically similar to the Music Logo systems described earlier. However, the philosophy of use may be quite different. The commercial system Symbolic Composer (for Macintosh and Atari) is one example of this difference.

It is built on Lisp and has a vast library of functions, including neural net facilities, for processing, generating and transforming musical data and processes. The system is primarily aimed at composers and researchers. Another culture which offers an educational paradigm with many links to AI culture is the Smalltalk culture. An example of such a system is Pachet’s (1994) MusES environment, implemented in Smalltalk 80. It is aimed at experimenting with knowledge representation techniques in tonal music.

MusES includes systems for harmonisation, analysis and improvisation. Finally, an example of a commercial program is Band in a Box (Binary Designs, 1996). It takes a chord sequence as input and plays an accompaniment based on the chords in a wide variety of styles. At one time this would have been regarded as requiring AI techniques, but today it is considered conventional programming.

6. Supporting learning with Computational Models of Creativity

6.1 A cognitive support framework: constraint-based model of creativity

“I noticed that the [drawing] teacher didn’t tell people much….Instead, he tried to inspire us to experiment with new approaches. I thought of how we teach physics: we have so many techniques-so many mathematical methods – that we never stop telling the students how to do things. On the other hand, the drawing teacher is afraid to teach you anything.

If your lines are very heavy, the teacher can’t say “your lines are too heavy” because some artist has figured out a way of making great pictures using heavy lines. The teacher doesn’t want to push you in some particular direction. So the drawing teacher has this problem of communicating how to draw by osmosis and not by instruction, while the physics teacher has the problem of always teaching techniques, rather than spirit of how to go about solving physical problem”

Feynman (1986)

“John and I….were quite happy to nick things off people, because…you start off with the nicked piece and it gets into the song…and when you’ve put it all together…of course it does make something original”

Paul McCartney quoted in (Moore, 1992)

Both of the traditional approaches to AI in education discussed earlier (ITS and Logo) have limitations. ITSs do not work very well in problem-seeking domains, and Logo-type approaches require support from a human teacher in order to be effective. One attempt to address these problems is MC (Holland, 1989, 1991; Holland and Elsom-Cook, 1990). ‘MC’ stands for both ‘Meta Constraints’ and ‘Master of Ceremonies’, and is a general framework for interactive learning environments in open-ended domains. We will focus on the domain model rather than the teaching model.

The current version is designed to teach ab initio students to compose tonal chord sequences, with partic