Study Of Innovative Applications Within Augmented Reality Computer Science Essay

Within this report the field of Augmented Reality (AR) is reviewed, focussing mainly on the advance of AR onto the consumer mobile phone platform. The technology required to generate and display an AR application is described, along with some of the earliest implementations of AR and mobile AR applications. The key researchers who have helped to advance the development of AR on mobile phone platforms are mentioned and their work in this field is reviewed. Different types of fiducial markers, or marker tags, are also examined. These include ARToolkit tags, ARTags and QRcode tags. The visual representation and simplicity of the ARToolkit tags are given as reasons for their continued use, whereas the data capacity and redundancy of QRcodes make them an ideal form of marker tag for future AR applications. Markerless natural feature recognition technology such as SURF is compared with the use of fiducial marker tags, and it is concluded that both have attributes which support their continued use; the use of fiducial markers in advertising is one example. Different AR toolkit libraries and APIs are mentioned and the features contained within them are discussed. Finally, some of the latest AR mobile phone applications are reviewed, and the tracking incorporated within them and their implications are discussed.


What is AR?

Augmented Reality (AR) is the practice of overlaying, or combining, the visible real world environment with computer generated virtual objects (Milgram & Kishino, 1994). One use of this technology is to provide users with additional information about their surroundings that may be of assistance to them and that they may not previously have known. Practical implementations include giving users additional information about artefacts they are viewing in a museum (Bruns & Bimber, 2009) and providing a heads-up display (HUD) of relevant traffic information on a car windscreen (Charissis & Papanastasiou, 2010).

Although AR is thought of as a recent area of research, its beginning can in fact be dated back to the late 1960s with Sutherland's (1968) research paper "A head-mounted three-dimensional display". The technology involved has advanced greatly since then. However, some of the original problems that faced Sutherland remain, for example, how to properly handle occlusions of virtual objects with real world objects (Fortin & Hebert, 2006). In recent years AR has moved away from the research arena and become a topic of interest in the public domain.

How it works

Some of the fundamental technologies that are required to create and generate AR applications are visual displays, methods of tracking position and direction and a user interface for interaction with the augmented reality being generated. In addition to this, a way to calibrate the device is also usually required.

AR displays can be categorised into three main types: handheld, head mounted and projection. Head mounted displays (HMDs) can be further divided into two forms: optical see-through, which uses transparent lenses overlaid with the AR image, and video see-through, which requires the user to wear head mounted cameras whose view, mixed with the virtual objects, is shown on small screens in front of the user's eyes. Handheld displays work in a similar fashion to video see-through HMDs. They have a camera on one side and an LCD screen on the other which displays the augmented reality. Projection displays work by projecting the AR environment onto a surface. Viewing AR on a computer screen would fall into this category (Azuma et al., 2001).

Tracking where the user is looking, and therefore where to generate the virtual overlay on the real world, can be performed in several ways. The methods used include the Global Positioning System (GPS), a digital compass, inertial tracking with accelerometers, visual tracking via fiducial marker tags or natural features, or a combination of any or all of these.

One visual tracking library that can be used in the development of AR applications is ARToolkit. This toolkit enables visual tracking of the camera position relative to black-bordered, square fiducial markers. When the camera detects one of these markers, the virtual 3D image is displayed on top of the marker's position. The toolkit also has tools for calibrating visual devices and other features that allow a developer to begin creating AR applications quickly (ARToolkit). ARToolkit was released by the University of Washington and is open source and free to use under the GPL.

Traditional AReas

Traditionally, the use of AR is separated into several subtopics. These include medical applications that can assist with surgery (State et al., 1996) and systems that assist technicians in the repair of complex machinery, such as that described by Feiner et al. (1993). In that work an HMD displayed a virtual overlay on top of a printer to show the user where to refill the printer with toner cartridges and paper. Another traditional area of AR research is military applications, for example heads-up displays (HUDs) in an aircraft cockpit (Furness, 1986). Path planning and learning aids are two other traditional areas. An example of the use of AR as a learning aid is the "Remembrance Agent" developed by MIT students (Starner et al., 1997). This assisted the operator by reminding him of appointments he had arranged when he checked his calendar, and by providing a list of relevant research papers as references while he was writing a report. The final traditional area, and the main topic of this report, is mobile AR. One of the first examples of this was the "Touring Machine" (Feiner et al., 1997).

All of the traditional AR implementations mentioned previously required a user to wear an HMD. In contrast to this, the main focus of this report is specifically on the implementation of AR on hand held mobile devices and consumer mobile phones.

Chosen Sub topic

Although AR is still very much an area of research in the traditional areas, it has evolved from these beginnings to become more prominent in the public domain. Areas where it has become more widely recognised include entertainment and advertising (Azuma et al., 2001). One reason for this is the capability of modern mobile phones. Since their processors and operating systems have become sufficiently powerful, and many phones contain the components needed for visual tracking, orientation and the display of 3D graphics, expensive, cumbersome, customised equipment is no longer needed to make AR mobile. In addition, as Henrysson and Ollila (2004) state, "The mobile device is the most ubiquitous device and a part of most peoples' everyday life." Having AR on a mobile phone therefore makes the technology available to a much wider audience.

Related Work

Major figures in Mobile AR

Several people are renowned in the field of mobile AR for developments that have helped to advance the technology. Jun Rekimoto is one of these key figures. He is credited as being one of the first people to implement an AR application on a handheld device, while researching for Sony, thereby moving away from the traditional norm of requiring the user to wear an HMD. He was also the first person to make use of fiducial markers, or tracking markers as they are more commonly known, in the form of coloured stickers, as a way of tracking the position of virtual objects in the real world environment (Rekimoto & Nagao, 1995). Rekimoto was also one of the first to make a collaborative AR application on a handheld device, allowing car designers to view a 3D model of a design concept together (Rekimoto, 1996).

Anders Henrysson is another person who has made several notable contributions to the advancement of AR applications on mobile phones. He is responsible for porting the ARToolkit library to Nokia's Symbian smartphone platform (Henrysson & Ollila, 2004). This helped pave the way for others to develop AR applications for this platform. He and his colleagues were also the first to create a collaborative, two-player AR game on a commercially available mobile phone, in which players played an AR version of table tennis (Henrysson et al., 2005).

Another person worthy of mention is Daniel Wagner. He ported the ARToolkit tracking library to the Windows CE PDA platform (Wagner & Schmalstieg, 2003) and was the inspiration for Henrysson to port it to the Symbian platform. He and his colleagues are also responsible for adapting and improving ARToolkit into ARToolkitPlus (Wagner & Schmalstieg, 2007). He later developed a completely new and more efficient development library and vision library for mobile AR, called Studierstube ES and Studierstube Tracker respectively (Studierstube a).

Early Work in Mobile AR

The "Touring Machine" is one of the earliest examples of mobile AR and wearable computing (Feiner et al., 1997). This, like much of the early work in mobile AR, required a user to wear a backpack and an HMD. Tracking and orientation in this system were performed using GPS and an accelerometer.

ARQuake (Thomas et al., 2000) is another example of mobile AR in which the user had to wear a backpack and HMD. Quake was a very popular first-person shooter computer game released by id Software in 1996. A notable point of interest in this study is the method used for tracking the user's position and orientation: both the Global Positioning System (GPS) and fiducial markers were used. The reason the authors gave for this was that GPS is fairly inaccurate and was best used at long distances. Using the fiducial markers alongside GPS allowed the 3D virtual objects, in this case monsters, to be rendered in the correct position at close range when one of these markers came into view of the user wearing the HMD. The authors commented that using GPS alone resulted in "monsters walking through walls".
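
The hybrid scheme described for ARQuake can be thought of as a simple priority rule: use the marker-derived position whenever a fiducial marker is in view, and fall back to GPS otherwise. The sketch below is illustrative only. The function name, the tuple-based positions and the sample coordinates are assumptions made for this example and are not taken from Thomas et al. (2000).

```python
from typing import Optional, Tuple

# (x, y) in metres in a local frame; an assumed convention for this sketch.
Position = Tuple[float, float]


def choose_position(gps_fix: Position,
                    marker_fix: Optional[Position]) -> Tuple[Position, str]:
    """Return the position estimate to render with, and which source supplied it.

    A marker-derived fix is preferred whenever one is available, because GPS
    is described as too coarse for rendering objects at close range.
    """
    if marker_fix is not None:
        return marker_fix, "marker"
    return gps_fix, "gps"


# No marker in view: fall back to the coarse GPS estimate.
print(choose_position((12.0, 40.0), None))
# Marker in view: the marker estimate overrides GPS.
print(choose_position((12.0, 40.0), (11.2, 38.7)))
```

A real system would more likely blend the two sources over time rather than switch abruptly between them, but the priority rule captures the reasoning given for combining them.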

Main problems in Mobile AR

The main difficulties currently faced by developers of AR applications for mobile phones lie with natural feature recognition and tracking. Because of the nature of mobile phones, the owner will often be somewhere they have not been before and, more often than not, outdoors, which makes it impractical to rely on fiducial markers for object recognition. The ability of a mobile phone to recognise natural features without markers would therefore be of great benefit to AR applications. David Lowe (1999) created the Scale-Invariant Feature Transform (SIFT) algorithm to enable natural feature recognition through computer vision. In the simplest terms, it allows a computer or mobile phone to compare features in an image against images held in a database to find a match. A newer and more efficient method of natural feature recognition is the Speeded-Up Robust Features (SURF) algorithm. It is partly based on Lowe's SIFT algorithm but is reported to be several times faster and more robust when faced with difficult images. It works by recognising high contrast areas in an image, such as those found at the corners and edges of walls (Bay et al., 2006). There is an open source implementation of this detector called OpenSURF, which can be used on the most recent mobile phone operating systems such as Google Android and the Apple iPhone. Daniel Wagner and his team also claim to have created, in 2008, the first real-time natural feature recognition from video on a mobile phone (Studierstube b).
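
The idea of comparing features in an image against a database can be sketched as nearest-neighbour matching between descriptor vectors. This is a minimal illustration under stated assumptions, not the SIFT or SURF algorithm itself: the descriptors here are invented three-value vectors (real descriptors are much longer and are computed from image gradients), and the function names, thresholds and labels are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    """Straight-line distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_matches(query: List[Descriptor], reference: List[Descriptor],
                  max_distance: float) -> int:
    """Count query descriptors whose nearest reference descriptor is close enough."""
    if not reference:
        return 0
    return sum(
        1 for q in query
        if min(euclidean(q, r) for r in reference) <= max_distance
    )


def best_match(query: List[Descriptor],
               database: Dict[str, List[Descriptor]],
               max_distance: float = 0.5,
               min_matches: int = 2) -> Optional[str]:
    """Return the label of the stored image sharing the most features, or None."""
    best_label, best_count = None, 0
    for label, reference in database.items():
        count = count_matches(query, reference, max_distance)
        if count > best_count:
            best_label, best_count = label, count
    return best_label if best_count >= min_matches else None


# Toy data: labels and numbers are invented purely for illustration.
database = {
    "library": [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
    "cafe": [[0.0, 1.0, 0.0], [0.5, 0.5, 0.5]],
}
query = [[0.1, 0.0, 0.9], [0.9, 0.1, 0.0]]
print(best_match(query, database))
```

The brute-force search shown here compares every query descriptor with every stored descriptor, which hints at why efficiency matters so much on a mobile phone with a large database.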

Because there are so many different types of mobile phone, with cameras of differing quality and focal length, the problem of image recognition is not an easy one to solve.

Fiducial Markers

The other approach to visual tracking is to make use of marker tags, or fiducial markers as they are called. Although at first thought these may seem less useful for tracking in mobile AR than natural feature recognition, for the reasons previously mentioned, there are many reasons why the use of these markers is still very important. Advertising companies have recently made frequent use of them to generate 3D images when viewed by a mobile phone camera or webcam, and they are now often printed on posters.

The most commonly used type of fiducial marker is that from ARToolkit, like the one used by Toyota to display an interactive AR advertisement for their iQ car (see Figure 1 below). These tags are very simple: all that is required for tracking is that they have a black square border. They are also created by the user rather than generated by the library. This makes them very appealing to advertisers, as they can place a custom image inside the black square which gives the AR user an idea of what will be seen when viewing the marker tag through the AR application on their mobile phone or computer.

Figure 1: Toyota iQ ARToolkit type fiducial marker

Another type of fiducial marker system is ARTag. ARTag markers are data matrix style markers, meaning that they contain data for error correction and other computer-readable information. They differ from the ARToolkit markers in that they are generated by the library itself, which contains 2002 different variants. They have a much better detection rate than the ARToolkit markers and cope much better with variations in lighting. They also do not require the preloading of pattern files as ARToolkit does. Another benefit is that, in contrast to the ARToolkit markers, they can still function correctly even if part of the marker is ripped or occluded (ARTag). However, a benefit of the ARToolkit markers is that they are very easy to create and can be customised to give the user an idea of what will be displayed in AR. They have more of an aesthetic quality, which is appealing to advertisers and designers.
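
The way a digitally encoded marker can tolerate a damaged or misread cell can be illustrated by matching an observed bit grid against a library of known patterns in all four orientations, accepting the closest pattern if it differs in only a few cells. This is a deliberately simplified sketch and does not reproduce ARTag's actual encoding or error correction. The 3x3 grids, marker IDs and function names are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Square grid of 0/1 cells read from inside the marker border.
Grid = List[List[int]]


def rotate(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def hamming(a: Grid, b: Grid) -> int:
    """Number of cells in which two grids of the same size differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def identify(observed: Grid, library: Dict[int, Grid],
             max_errors: int = 1) -> Optional[Tuple[int, int]]:
    """Return (marker_id, clockwise_quarter_turns) for the closest pattern.

    Returns None if every pattern, in every orientation, differs from the
    observed grid in more than max_errors cells.
    """
    best = None
    best_errors = max_errors + 1
    for marker_id, pattern in library.items():
        candidate = pattern
        for turns in range(4):
            errors = hamming(observed, candidate)
            if errors < best_errors:
                best, best_errors = (marker_id, turns), errors
            candidate = rotate(candidate)
    return best


# Hypothetical two-marker library.
library = {
    7: [[1, 1, 0], [0, 0, 0], [0, 0, 0]],
    9: [[1, 0, 1], [0, 1, 0], [0, 0, 0]],
}
# Marker 7 seen rotated by one quarter turn, with one cell misread.
observed = [[0, 0, 1], [0, 0, 1], [0, 1, 0]]
print(identify(observed, library))
```

Because the recovered orientation comes out of the same search, a pattern that is not rotationally symmetric identifies both which marker is in view and which way up it is.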

A paper comparing ARTag markers with those from the ARToolkitPlus library was written by Fiala (2005). It favours ARTag over the ARToolkitPlus fiducial markers for reasons similar to those previously mentioned: ARTag fiducials are much more robust than ARToolkitPlus tags under difficult lighting conditions and when occluded. Although these are valid reasons to favour ARTag over ARToolkit markers, the paper does not mention how simple it is to create ARToolkit markers, or that any black square can be used as a marker with that library. There may be some degree of bias towards ARTag in this paper, since its author, Mark Fiala, is also the creator of ARTag.

Another type of matrix tag is the Quick Response (QR) tag, which was invented by Denso-Wave, a subsidiary of Toyota. QR tags are two-dimensional barcodes and can hold the largest amount of data of any tag of this type: over 7000 numeric characters, over 4000 alphanumeric characters or nearly 3000 bytes of binary data. They are an ISO standard and, although the patent is held by Denso-Wave, it is not enforced. These barcodes are therefore free for anyone to create and use (Denso-Wave). There are many websites hosting generators and decoders for this form of barcode or marker. Figure 2 below is a QR tag with the map coordinates of Glasgow Caledonian University encoded into it. It was created using an online generator, which can also be used to decode QR codes.

Although these 2D barcodes were not created with the specific purpose of being used as a visual tagging system for Augmented Reality applications, they are in several ways ideally suited to it. As they can store such a large amount of data, they can contain information which tells a mobile phone what to display when it reads the code. For example, they could contain a small 3D model or some other binary data to be displayed when scanned with a mobile phone. This is a benefit over other types of fiducial markers, which require the preinstallation of a program or application that tells the device what to display when they are viewed. QR codes also have the same benefits as ARTags in that they have built-in error correction and will still work correctly if partly occluded or even ripped. They are also asymmetrical, so they could be used to determine the orientation of the camera and which way up to display a 3D model. Many mobile phones running the Android operating system come preinstalled with the ZXing barcode reader, which can read QR codes.
These tags are becoming very popular in the areas of advertising and have also been used in art works, music album covers and music videos. In these areas a common use for them is to contain code which will launch a mobile phone user's browser and take them directly to a webpage when the tag is scanned (Wikipedia contributors,).
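
The capacity limits discussed above determine what can realistically be stored in a single code. The sketch below checks whether a payload would fit, using the commonly published maxima for the largest QR symbol (version 40 at the lowest error-correction level). It is only a rough guide: a real encoder may mix modes and has a small per-segment overhead. The function names are hypothetical and the coordinate string is a placeholder, not the value encoded in Figure 2.

```python
# Commonly published maxima for the largest QR symbol (version 40) at the
# lowest error-correction level.
MAX_NUMERIC = 7089        # decimal digits
MAX_ALPHANUMERIC = 4296   # characters from the 45-character QR alphanumeric set
MAX_BYTES = 2953          # 8-bit bytes

QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def densest_mode(payload: str) -> str:
    """Pick the most compact single QR mode able to hold the whole payload."""
    if payload and all(c in "0123456789" for c in payload):
        return "numeric"
    if payload and all(c in QR_ALPHANUMERIC for c in payload):
        return "alphanumeric"
    return "byte"


def fits_in_one_code(payload: str) -> bool:
    """Check the payload against the maximum capacity of its densest mode."""
    mode = densest_mode(payload)
    if mode == "numeric":
        return len(payload) <= MAX_NUMERIC
    if mode == "alphanumeric":
        return len(payload) <= MAX_ALPHANUMERIC
    return len(payload.encode("utf-8")) <= MAX_BYTES


location = "geo:55.87,-4.25"  # placeholder coordinates for illustration
print(densest_mode(location), fits_in_one_code(location))
print(fits_in_one_code("x" * 5000))
```

Under these limits a short string such as a set of coordinates or a web address fits easily, whereas anything larger than a few kilobytes would have to be fetched from elsewhere.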

Figure 2: A QR-code marker containing Glasgow Caledonian University's map coordinates

Some researchers have already used QRcodes as fiducial markers for AR applications. One example is the paper by Tsung-Yu Liu et al. (2007), which describes a system developed to assist Taiwanese students in learning the English language. The students were all given PDAs equipped with cameras. On the PDAs was a map showing the locations of various QRcode markers throughout their campus. The data contained within a QRcode was sent from the PDA to a server via Wi-Fi. Depending on the data received, the server sent information back to the PDA, which then displayed a 3D "Virtual Learning Partner" that gave English language information to the student.

Mobile AR ToolKits and APIs

In addition to the toolkits created by Henrysson and Wagner, mentioned previously, there are some very new AR Application Programming Interfaces (APIs) which have been developed specifically for the latest mobile phones. Two of these are the Wikitude API and the Layar API. The Layar API runs on both the Android and iPhone OS platforms, and over 1000 AR applications have been created for it so far (Layar). The Wikitude API was initially restricted to the Android platform but is now also available for the iPhone and Symbian platforms (Wikitude a). It is free to develop an AR application using either of these APIs and, after approval into the marketplace, a developer can earn money from their applications. The popularity of AR applications and APIs, and their prospects for the future, are made apparent by the fact that as recently as 16 November 2010 Layar received $14 million in funding from Intel Capital to further develop their AR applications (Venturebeat, 2010).

Current Applications

In this section some of the latest and most innovative AR applications on mobile phones are described. The methods they use for tracking and orientation are discussed, and the mobile phone platforms on which they are available are also mentioned. The three main smartphone platforms capable of running AR applications at present are Nokia Symbian, Google Android and iPhone OS.

The first, and possibly the most exciting, current mobile AR application looked at was the Parrot AR.Drone (Ardrone). This is a remote-controlled drone operated through an application which runs on the iPhone. The user can control the helicopter-type drone via Wi-Fi by tilting and rotating their iPhone. This is possible because of the iPhone's built-in accelerometers. The drone itself has cameras attached to it which give the user a view, on the screen of their iPhone, of what is in front of and below the drone. There are AR games that can be played between two AR.Drone owners. With fiducial marker tag stickers attached to the drones, other drones are able to identify them and target them in dogfight games. During this type of game the iPhone screen overlays virtual gun and missile fire onto the view from the drone's camera (see Figure 3 below).

Figure 3: A two player dogfight game using the Parrot AR.Drone with the iPhone

A very innovative combination of AR and QRcodes is the N Building in Tokyo, Japan. Covering the whole side of this building, which is an indoor shopping centre, is a huge QRcode, as shown in Figure 4 below. A user who views this code through their iPhone is shown an augmented view of the inside of the building, with adverts for the various shops and their locations. What people within the building are tweeting on Twitter is also shown, if they have given permission for this. The justification the developer gave for the application was that conventional advertising takes away from the building's identity, and that using QRcodes would also give more up to date information (Vimeo).

Figure 4: N Building, Tokyo, Japan, with a QR code covering the full side of the building

"Wikitude Drive" is a path finding type of mobile AR application. It is similar to a car GPS system which provides a driver with directions to a destination. An image showing the application in use can be seen in Figure 5 below. The key difference between this application and a standard car GPS system is that it displays the directions on top of the driver's view of the road, seen through their mobile phone's camera (Wikitude b). Its function could be compared to that of an HUD such as that researched by Charissis and Papanastasiou (2010). Their paper compares an HUD to a head-down display (HDD) and gives reasons why an HUD is better when the driver is faced with immediate traffic hazards. However, the conclusions are slightly flawed, as the HDD in the paper did not display any of the same information as the HUD, so a fair comparison could not be drawn. Using an adapted GPS system similar to Wikitude Drive as the HDD would have made for a better experiment.

Figure 5: Wikitude Drive AR directional GPS application

The final AR application to be mentioned in this report is "Tweeps around". This is a Twitter application which was developed with, and requires, the Layar API. It is a social networking AR app which shows the Twitter posts of people near the user's location and gives the distance and direction to each poster's location. It runs on the Android platform (Androidzoom).
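
The distance and direction that a location-based application of this kind reports can be derived from two GPS fixes. The sketch below uses the standard haversine and initial-bearing formulas on a spherical Earth. It is a minimal illustration and is not taken from the Layar API or from "Tweeps around"; the function name and sample coordinates are assumptions.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_and_bearing(lat1: float, lon1: float,
                         lat2: float, lon2: float) -> Tuple[float, float]:
    """Great-circle distance in metres and initial compass bearing in degrees.

    Inputs are decimal degrees. The bearing is measured clockwise from north,
    from point 1 towards point 2.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine formula for the distance along the surface of a sphere.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing, normalised to the range [0, 360).
    y = math.sin(dlambda) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlambda))
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, bearing


# Hypothetical user and poster positions, a short distance apart.
user = (55.8660, -4.2500)
poster = (55.8670, -4.2480)
metres, degrees = distance_and_bearing(*user, *poster)
print(f"{metres:.0f} m away, bearing {degrees:.0f} degrees")
```

Subtracting the phone's digital compass heading from the bearing would give the direction of the poster relative to where the camera is pointing, which is what an overlay needs in order to place a label on screen.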


Conclusion

Within this report the key areas of Augmented Reality have been discussed. The technology required to generate and display AR applications has been mentioned including different display methods and methods of tracking and orientation. Some of the latest AR mobile phone applications have also been mentioned.

Augmented Reality has come a long way since Feiner's "Touring Machine". With the advanced capabilities and components of modern mobile phones, a backpack containing a laptop and a head mounted display are no longer needed to use an AR application. As mentioned by Henrysson, the ubiquity of the mobile phone makes it an ideal device for creating AR applications that will assist with tasks (Henrysson & Ollila, 2004). Although HMD technology has become advanced enough to be built into a pair of glasses, it is still unlikely that the general public will feel comfortable wearing these on a day-to-day basis any time soon (Azuma et al., 2001). This may change in the future, however, if the most recent 3D televisions are anything to go by, as they also require the viewer to wear glasses and seem to be becoming more popular.

The different methods of tracking and orientation have been looked at, including visual tracking. Although natural feature recognition may in many ways be more desirable for mobile AR than fiducial markers dotted over the landscape, fiducial markers still have many advantages, including faster and more accurate tracking. The use of fiducial markers in advertising, such as on poster ads or on websites, also gives support to their continued use. Projects like the Japanese N Building show the ingenuity that can be displayed using these tags. The use of QR codes as fiducial markers is an exciting prospect, as they can contain the information to be displayed when viewed by a mobile phone, without the prior installation of another program. However, other types of tag such as the ARToolkit marker still have merit, as they are more aesthetically pleasing and can give a visual cue as to what will be generated by the AR application. In summary, there are valid reasons for the continued use of both natural feature tracking and marker tag tracking in AR, and a combination of both should be considered when creating a mobile AR application.

Finally, AR applications can be of great assistance to users in many areas. From directional information, as in Wikitude Drive, to details on the location of train stations or local coffee shops, they can benefit society in many ways. As most people have a mobile phone, being able to access this additional information without an additional device or an expensive HUD is an exciting prospect. Although AR is still at an early stage in commercial mobile phone applications, it is likely to become much more prevalent in the near future. As applications like "Tweeps around", and the possibility of facial recognition software linking to Facebook or Twitter profiles, show, there may be privacy implications that will need to be addressed as this happens.