Voice And Vision Controlled Lego Robot Computer Science Essay


Audio processing and pattern recognition are important aspects of the control and behavior of mobile robots: intelligent behavior depends on recognizing visual and audio stimuli. This work presents an economical recognition system developed in an integrated manner for a LEGO robot. It consists of two main parts: an isolated-word speech recognizer, and an artificial-vision-based hand pattern recognizer that tracks the moving body in front of the robot, keeping it in the field of view. The objective is to design a person tracking system for a mobile robot with text dependent voice recognition and a visual pattern recognition mechanism.

Research and development in robotics is looking into humanoid robots to help make human activities more comfortable and safe. For this reason, robust technologies must be developed to make this interaction easy to use. This work is intended to improve the performance of a LEGO robot by combining its humanoid structure with a voice and vision based artificial intelligence sensory system. By doing so, we upgrade the interaction between robots and humans in a natural and economical way.

There are many applications of robots that are useful in domestic and industrial environments. Currently, many robotics companies make a variety of robots with different functions, each specializing in a specific field because of hardware and software limitations. Hardware has made some progress toward platform standardization, but software platforms lag behind and should be standardized as well. Microsoft Robotics Studio can help standardize the platform for developing the various functions of robots. This project therefore takes a different approach: we specifically explore a model of voice and vision based robot control using the LEGO NXT.

1.1 Scope

The main objective is to design a vision based hand recognition service for a robot, with text dependent voice recognition for command obedience. The robot must stay alert and "listen" to its master whenever the master speaks, so that it can execute predetermined tasks. The most relevant task is to detect and follow the visual hand pattern as precisely as possible.

1.2 Background

As the number of senior citizens increases, more research effort has gone into service robots for welfare applications. We too are developing a service robot that fetches things ordered by its user. A speech interface may be the first choice of user interface for such a robot. However, using only speech is not enough to realize a user-friendly interface. We humans use gestures and other nonverbal behaviors in addition to speech in our communication, so multimodal interfaces using speech, vision and other modalities have been intensively studied.

It is more convenient if service robots have vision that can recognize gestures, so that we can give them orders by gesture in addition to speech. However, we use vision for many more purposes in our communication. An important one is to get information about the real world around us. We assume that our partner has the same visual capability as ours and obtains the same information about the world, so we omit or abbreviate from our utterances things that are supposed to be obtained by vision. For example, we just say, "Come here," assuming that our partner knows that "here" means the position of the speaker and knows it by vision (or other perception). We ask our partner, "Bring the red book on the bookshelf," assuming that he or she knows where the bookshelf is or can find it by vision. To realize a user-friendly interface, we need the vision capability to obtain such information about the real world in addition to gesture recognition.

1.3 Summary

In this project, I describe the implementation of voice and vision services using Microsoft Robotics Studio for speech and hand recognition to control a LEGO robot. The voice recognition system senses all detectable voices and pronounced words. Once a word is recognized, the text dependent recognition system analyzes it and performs the commanded task. Pattern recognition is implemented using local machine vision: the robot detects any moving body in its vicinity, analyzes its position, and tracks it according to its direction.


In order to design the system, it is necessary to know the functional and non-functional requirements of the project. The proposed system was designed and tested with the following considerations in mind. The requirements for this project are broken down into functional and non-functional requirements; also included are the hardware and software required to successfully complete the project.

2.1 Functional Requirement

Functional requirements are statements of the services the system should provide, how the system should react to particular inputs, and how it should behave in particular situations.

2.1.1 Acquisition of Real World Information

Objects in the real world are defined by attributes such as color and shape. The system finds these objects from the results of image processing modules to detect the related attributes. In the current implementation, we prepare image processing modules to detect the regions of the specified color and to detect simple geometric shapes such as rectangles and ellipses from edge detection results.
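As an illustration of the color-detection module described above, the sketch below finds the bounding box of pixels that fall inside a target color range. It is plain Python rather than the project's C# service code, and the image representation and color range are assumptions for illustration:

```python
# Illustrative sketch: find pixels whose color falls inside a target range,
# then report the bounding box of the matching region. Pure Python; a real
# implementation would use an image library, but the logic is the same.

def in_range(pixel, lo, hi):
    """True if each RGB channel of `pixel` lies within [lo, hi]."""
    return all(lo[c] <= pixel[c] <= hi[c] for c in range(3))

def detect_color_region(image, lo, hi):
    """Return the bounding box (x0, y0, x1, y1) of in-range pixels, or None."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if in_range(pixel, lo, hi):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))

# A 3x3 test image with a single red pixel in the center.
img = [[(0, 0, 0)] * 3 for _ in range(3)]
img[1][1] = (220, 30, 25)
print(detect_color_region(img, (150, 0, 0), (255, 80, 80)))  # (1, 1, 1, 1)
```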

Gesture Recognition:

There are different methods for hand gesture recognition; vision-based methods can exploit properties such as texture and color when analyzing a gesture. In this framework, the region of the hand is detected by a color segmentation technique. Hand segmentation is then completed through a background subtraction method, and static hand gestures are classified. A challenging issue in hand gesture recognition is segmentation ambiguity, caused by spatio-temporal variations in hand posture and motion trajectory: it is sometimes not easy to determine when a gesture starts and when it ends in a continuous hand trajectory. Another problem is the varying environmental backgrounds and illumination changes that a service robot has to cope with.

2.1.2 Basic Speech Analysis

As our goal is for the robot to follow commands, we do not need a vast vocabulary. First, the system listens for the end of an utterance. The utterance (speech input) is converted into text, and the words in the text are classified into predefined categories. After categorizing and analyzing the words, if the system has enough information to start the robot's action, it sends the command to the action module to be carried out. However, if the command is ambiguous due to lack of information, the system attempts to work out an appropriate solution using other available information.
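The categorize-then-act step above can be sketched as follows. The command words and categories are illustrative, not the project's exact vocabulary:

```python
# Sketch of the speech-analysis step: recognized text is split into words,
# each word is mapped to a predefined category, and a motion command is
# issued only when enough information is present.

CATEGORIES = {
    "move": "action", "turn": "action", "stop": "action",
    "forward": "direction", "backward": "direction",
    "left": "direction", "right": "direction",
}

def parse_utterance(text):
    """Classify words, then return a command dict or None if ambiguous."""
    found = {}
    for word in text.lower().split():
        category = CATEGORIES.get(word)
        if category:
            found[category] = word
    # 'stop' needs no direction; other actions do.
    if found.get("action") == "stop":
        return {"action": "stop"}
    if "action" in found and "direction" in found:
        return found
    return None  # ambiguous: consult other information or ask the user

print(parse_utterance("move forward"))  # {'action': 'move', 'direction': 'forward'}
print(parse_utterance("move"))          # None (direction missing)
```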

2.2 Performance Requirement

Performance requirements are necessary for system design and development; they describe how well the services are to be executed.

2.2.1 Intuitiveness

It is a requirement that any interaction system be intuitive and easy to use. Should a system become complicated to use, its user base narrows. The goal of any user focused system should be to reduce task time and to allow new users to learn the system's input modality rapidly and adapt to it, whether through experimentation or streamlined instruction.

2.2.2 Responsive

Robotic Systems that are unresponsive can be dangerous and unreliable, as well as add to user frustration. The system needs to react quickly to allow the user to know that the human system communication was successful.

2.2.3 Resolution, Precision and Accuracy

The system needs decent resolution, precision and accuracy to be effective.


Resolution is the smallest difference between two measurements that can be detected. Processing can truncate a measurement and thereby reduce resolution, while sampling can improve it. Any system that relies on any form of detection and filtering is affected by detection resolution.


Precision is difficult to define singularly, as it takes a different meaning based on context. For the purpose of this study, precision is the range of possible output values for one input.


Accuracy is independent of resolution and precision. The ability to distinguish between two different values does not make a system accurate; accuracy is the agreement between the system's determined value and the actual value.

2.3 Hardware and Software Requirements

The following sections list the minimum hardware and software requirements to successfully demonstrate the system performance.

2.3.1 Hardware

LEGO Mindstorms NXT kit with:

NXT Intelligent Brick 2.0

Two Servo Motors

Ultrasonic Sensor

Bluetooth Dongle

Integrated Webcam

IDT High Definition Audio

2.3.2 Software

Microsoft Visual C# Express Edition

Microsoft Visual Studio Standard, Professional, or Team Edition

Microsoft .NET 3.0 Framework

Microsoft Robotics Studio 2008 R3

Microsoft Internet Explorer


2.4 Summary

This chapter dealt with the functional and non-functional requirements of the project and listed the hardware and software required for this system to function properly.


Figure 2.1 shows the simple system design model. The system consists of a LEGO Mindstorms kit with the NXT controller, servo motors and sensors; its built-in Bluetooth module communicates with a Bluetooth device attached to a laptop. The laptop has an integrated webcam and audio devices (microphone and speaker). The code runs through the Microsoft Robotics Studio software.


NXT Brick


Microsoft Robotics Studio Software

Motors and Sensors



Figure 2.1: System Design Model

3.1 Microsoft Robotics Studio

Microsoft developed and published a software environment named MSRS for robotics users. MSRS is aimed at all kinds of robots and offers three key functions. First, MSRS supports an extensible runtime architecture for all types of robot: it can be used for robots with 8-bit, 16-bit or 32-bit processors, and even for robots with multiple processors. Second, MSRS provides a set of useful tools for programming and debugging in a simulation environment, so users can debug a robot efficiently in simulation; visual programming with a drag-and-drop interface is also included. Third, MSRS delivers useful technology libraries, services and tutorials in several programming languages to start writing robot applications. MSRS uses the LEGO NXT for its tutorials because of the merits of the NXT.

Distributed Software Services:

Microsoft Robotics Studio relies on a very interesting concept known as Decentralized Software Services (DSS). This is a method of programming robotic systems that essentially embodies each robotic component as a potential asynchronous parallel process exposing its functionality over REST-style web services. The Microsoft Concurrency and Coordination Runtime (CCR) provides the framework that gives DSS all the synchronization it needs.

DSS itself is difficult for a beginner to comprehend, as it breaks the sequential programming style many are familiar with and presents an architecture where objects and hardware have to be represented as services. An interesting analogy can be drawn to the Communicating Sequential Processes (CSP) model: in DSS, services often synchronize at the point of communication, when messages are passed between them. The major advantage of DSS is that once a method is written it can be reused often, and entire services can be added to an application. Microsoft has advanced this functionality with generic services that do not interface with any specific hardware but can potentially work with any hardware, given the correct XML manifest. This makes DSS useful for rapid prototyping devices like the NXT: the NXT manifest just needs to be swapped with another manifest, and the other device can use the same methods with few alterations.
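As a loose analogy to this CSP-style synchronization, and in Python rather than the project's C#, the sketch below runs two "services" independently whose only synchronization point is the message queue between them. The service name and messages are invented for illustration:

```python
# Two independent "services" that synchronize only at message passing,
# in the spirit of DSS/CSP. Names and messages are illustrative.
import queue
import threading

requests = queue.Queue()
responses = queue.Queue()

def motor_service():
    """Waits for one request, 'handles' it, and replies with a response."""
    request = requests.get()          # blocks until a message arrives
    responses.put(f"ack: {request}")  # synchronization happens here

worker = threading.Thread(target=motor_service)
worker.start()
requests.put("SetDrivePower(0.5, 0.5)")  # another service sends a request
print(responses.get())                   # ack: SetDrivePower(0.5, 0.5)
worker.join()
```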

Visual Programming Language:

Included with Microsoft Robotics Studio is the Microsoft Visual Programming Language (VPL). While visual programming is not a new concept, the VPL environment is well suited to the development of Decentralized Software Services, as it can visually depict the services, their inputs and outputs, and the connections between them. VPL is a good basis for developing any DSS based application, because standard services can be planned out visually and C# code can then be generated to match the planned system.

3.2 LEGO Mindstorms NXT

The NXT intelligent brick (I-Brick) is the programmable brick of the NXT. The I-Brick has a 32-bit Atmel ARM7 processor with 256 KB of flash memory and 64 KB of RAM; an Atmel 8-bit AVR processor serves as its co-processor.

Figure 2.2: LEGO NXT with Peripherals

The I-Brick offers two ways to download programs: a universal serial bus (USB) 2.0 port and wireless communication using a Bluetooth module. A dot matrix display of 100 x 64 pixels is located on the front of the I-Brick. This display can show numbers, text and figures, and can also print a bitmap image file of the '.ric' file type. There are three output ports for motors and four input ports for various types of sensors. The basic NXT set comes with a touch sensor, sound sensor, light sensor and ultrasonic sensor. Three interactive servo motors are also included, as are LEGO parts such as beams and gears. Figure 2.2 shows the I-Brick with its peripherals.

A robot was made with the NXT. An ultrasonic sensor is attached to a driving base with two wheels. Each wheel is directly connected to a motor, and the motors can be operated independently. Figure 2.3 shows the robot made with the NXT for the experiment. The touch sensor returns a binary value: on or off. The NXT motors can be controlled by rotation direction with an amount of power, or by degree of rotation. Because the two motors can be controlled individually, to make the robot turn right the motor on the right side rotates clockwise while the other motor rotates anti-clockwise. The I-Brick can also synchronize the motors to move perfectly forward or backward. The left motor is connected to the left wheel of the robot and the right motor to the right wheel; both motors rotate in reverse to move backward. The program repeats continuously until the user stops the running I-Brick. When any object comes within ultrasonic sensor range while the robot is moving forward, the robot turns left.

Figure 2.3 LEGO Robot
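The differential-drive logic described above can be sketched as a simple mapping from a high-level action to per-wheel power. This is plain Python for illustration, not real NXT calls; the power fractions are assumed values:

```python
# Sketch of differential-drive control: the two NXT motors are driven
# independently, and opposite rotation directions produce a turn.
# Power values are illustrative fractions of full power.

def drive_command(action):
    """Map a high-level action to (left_power, right_power)."""
    commands = {
        "forward":  (0.5, 0.5),    # both wheels forward
        "backward": (-0.5, -0.5),  # both wheels reverse
        "left":     (-0.5, 0.5),   # left wheel back, right wheel forward
        "right":    (0.5, -0.5),   # left wheel forward, right wheel back
    }
    return commands[action]

print(drive_command("right"))  # (0.5, -0.5)
```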

3.3 Communication Interface

Communication between the robot and the UI uses Bluetooth V2.0 with Enhanced Data Rate (EDR). This communication occurs when downloading a new program to the microcontroller, and during runtime to control the robot from the UI and to provide the UI with sensor measurements. The Bluetooth device is connected to the host computer over a Universal Serial Bus (USB) connection, while the LEGO NXT microcontroller has Bluetooth built in.

3.4 Summary

For the implementation of the robotic methods, we adopted a platform consisting of Decentralized Software Services (DSS), C#, and the Microsoft Visual Programming Language (VPL), which are technologies provided by Microsoft in Microsoft Robotics Studio 2008 R3. This framework is designed for the rapid development of robotic applications in both home and industry, providing an extensible simulation environment for testing the produced DSS solutions. The aim of Robotics Studio is to make asynchronous programming easy.


The previous chapter outlined the design of the system, describing the modules and the roles they play. This chapter describes the implementation of the services needed to reach the proposed design goals. The chosen platform for building the services was Microsoft technology using C#, with a direct focus on the intercommunication of the modules through Microsoft Robotics Studio.

4.1 Gesture Recognition Service (SimpleVision)

In gesture recognition we need to segment and track three objects of interest: the face and the two hands.

The service proceeds through the following stages:

Foreground Image Calculation

Motion Detection

Skin Color Segmentation

Face Detection

Hand Detection (Right Hand / Left Hand / Both Hands)

Flow Chart 1: Hand Gesture Recognition Service

The flow chart shows the sequential flow of the Hand Gesture Recognition service. The skin segmentation module is responsible for segmenting skin objects, while the object tracking module matches the skin blobs produced by segmentation to the blobs of the previous frame, keeping track of the occlusion status of the three objects.

4.1.1 Skin Segmentation

In order to robustly detect the skin objects, we combine three useful features: color, motion and position. Color cue is useful because the skin has a distinct color that helps to differentiate it from other colors. The motion cue is useful in discriminating foreground from background pixels. Finally, the predicted position of objects using Kalman filter helps to reduce the search space.

Color Information:

In order to collect candidate skin pixels, we use two simple classifiers. First, a general skin model (color range) is applied on small search windows around the predicted positions of skin objects. As a fixed color range can miss some skin pixels, we add a second color-distance metric that takes advantage of prior knowledge of the last segmented object. This metric is the distance between the average skin color of the previously segmented skin object and the current pixel at position (x, y) in the search window.

Pseudo code:

IsSkin( ) //Identify skin color
{
    Initialize the skin color range;
    Check whether the pixel is within the range;
}

IsMyColor( ) //Match against the identified color object
{
    Check whether the pixel matches the identified color object;
}
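The two classifiers above can be sketched in plain Python. The RGB skin range and the distance threshold are assumed values for illustration, not the project's tuned parameters:

```python
# Sketch of the two skin classifiers: a fixed color-range test, plus a
# distance test against the average color of the previously segmented object.

SKIN_LO, SKIN_HI = (95, 40, 20), (255, 220, 170)   # assumed RGB skin range

def is_skin(pixel):
    """Fixed-range classifier on RGB channels."""
    return all(SKIN_LO[c] <= pixel[c] <= SKIN_HI[c] for c in range(3))

def is_my_color(pixel, prev_avg, threshold=60.0):
    """Distance between this pixel and the previous object's average color."""
    dist = sum((pixel[c] - prev_avg[c]) ** 2 for c in range(3)) ** 0.5
    return dist < threshold

pixel = (200, 150, 120)
print(is_skin(pixel))                       # True
print(is_my_color(pixel, (190, 140, 110)))  # True (distance is about 17.3)
```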


4.1.2 Motion Information

Finding the movement information takes two steps: first motion detection, then finding candidate foreground pixels. The first step examines local gray-level changes between successive frames by frame differencing.

The second step assigns a probability value for each pixel in the search window to represent how likely this pixel belongs to a skin object. This is done by looking backward to the last segmented skin object binary image in the previous frame search window and applying the following model on the pixels.

In this way, pixels with small difference values (stationary pixels) that were previously segmented as object pixels are assigned high probability values, since they represent skin pixels that have not moved, while new background pixels with high difference values are assigned small probability values. In short, this model gives high probability values to candidate skin pixels and low values to candidate background pixels.

Pseudo Code:

DetectMotion( )
{
    Convert the color image into a gray-scale image;
    Calculate the x, y pixels and the pixel count of the image;
    Motion is detected if the count of changed pixels exceeds a threshold;
}
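The DetectMotion step can be sketched as grayscale frame differencing followed by a pixel-count threshold. Frames are lists of RGB rows, and both thresholds are illustrative values:

```python
# Sketch of motion detection by frame differencing: compare grayscale
# intensities between successive frames and count changed pixels.

def to_gray(frame):
    """Convert an RGB frame to grayscale intensities."""
    return [[sum(p) // 3 for p in row] for row in frame]

def detect_motion(prev, curr, diff_threshold=30, count_threshold=1):
    """True if enough pixels changed between the two frames."""
    g_prev, g_curr = to_gray(prev), to_gray(curr)
    changed = 0
    for row_p, row_c in zip(g_prev, g_curr):
        for a, b in zip(row_p, row_c):
            if abs(a - b) > diff_threshold:
                changed += 1
    return changed >= count_threshold

still = [[(10, 10, 10)] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[0][0] = (250, 250, 250)
print(detect_motion(still, moved))  # True
print(detect_motion(still, still))  # False
```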


4.1.3 Binary Image

After collecting the color and motion features, we combine them logically to obtain a binary decision image.

Pseudo Code:


CalculateBinaryImage( ) //Reconstructed name for the combination step
{
    IsSkin();
    DetectMotion();
    Calculate the foreground from the difference between the background and the current image;
}
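The logical combination above can be sketched as a per-pixel AND of the skin mask and the motion/foreground mask. The 0/1 grids are illustrative inputs:

```python
# Sketch of the binary-decision step: combine the skin mask and the
# foreground mask with a logical AND into one binary image.

def combine_masks(skin_mask, foreground_mask):
    """Keep a pixel only if it is both skin-colored and foreground."""
    return [
        [s & f for s, f in zip(skin_row, fg_row)]
        for skin_row, fg_row in zip(skin_mask, foreground_mask)
    ]

skin = [[1, 1, 0],
        [0, 1, 1]]
fore = [[1, 0, 0],
        [0, 1, 0]]
print(combine_masks(skin, fore))  # [[1, 0, 0], [0, 1, 0]]
```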


4.1.4 Skin-Based Face and Hand Gesture Tracking

The basis of our face detection system is skin-color blob detection. To get blobs, we run a two-pass segmentation algorithm on the binary image and keep only regions larger than a certain size. The result of blob detection is a set of regions containing skin color.
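The two-pass segmentation step can be sketched as classic two-pass connected-component labeling with union-find over provisional labels (4-connectivity here); the minimum blob size is an assumed value:

```python
# Sketch of two-pass blob detection on a binary image: pass 1 assigns
# provisional labels and records equivalences, pass 2 resolves them and
# counts blob sizes; only blobs above a minimum size are kept.

def label_blobs(binary, min_size=2):
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}  # union-find over provisional labels

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    for y in range(h):  # pass 1: provisional labels and equivalences
        for x in range(w):
            if not binary[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if up and left:
                labels[y][x] = find(up)
                parent[find(left)] = find(up)  # merge equivalent labels
            elif up or left:
                labels[y][x] = find(up or left)
            else:
                labels[y][x] = next_label
                parent[next_label] = next_label
                next_label += 1

    sizes = {}
    for y in range(h):  # pass 2: resolve equivalences and count sizes
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
                sizes[labels[y][x]] = sizes.get(labels[y][x], 0) + 1

    return sum(1 for s in sizes.values() if s >= min_size)

img = [[1, 1, 0, 0],
       [0, 0, 0, 1],
       [0, 1, 0, 1]]
print(label_blobs(img))  # 2 blobs of size >= 2
```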

To find the face among the segmented face candidate regions, the service uses an ellipse validation and two simple geometric constraints: the face should sit on a color object, and the face should be in the upper image plane.

To detect hand gestures, the service uses geometric information over the skin image and the foreground image, along with the results of the detected color object and face. The foreground image is calculated by subtracting the current image from the background image. The background image is grabbed when no motion is detected between the current and previous camera frames for a specified time.

Pseudo Code:

FindHead( )
{
    Initialize SegmentationRegions; //Initialize regions for segmentation
    Find HeadRegionCandidates; //Find any head region in the image
    Find a face with a color object; //A color object such as the user's colored shirt
    ValidateFaceByEllipse; //Check that it fulfills the geometric constraint of a face
    Calculate the number of heads;
    Check that the headbox is located above the color object;
    Otherwise, find a face without a color object;
}

FindHeadRegionCandidates( )
{
    Initialize the headbox and boundbox regions;
    Check the geometric constraint; //The face should be in the upper image plane
}

FindHandGestures( )
{
    Check whether a head was found;
    Find the right hand gesture by checking the head and any color object found;
    Gather hand pixels from the skin image and foreground image in a selected range, excluding the head region;
    Accumulate the hand blob area to enhance the motion information;
    Repeat the same steps for the left hand gesture;
}


4.2 Summary

In this chapter the implementation of the design was explored. The hand gesture recognition service is built by modifying the SimpleVision service and attaching an onboard laptop with a web camera. The service captures images from the attached web camera and processes them for face and hand gesture recognition. Images are captured every 100 ms, and the result is stored in an array of binary values.


This chapter will outline the testing procedures that are required to ensure the functionality of the system. Testing is important as it will help locate errors that can interfere with the performance of the system. The DSS Log Analyzer is a tool from MSRS which can help us to collect and analyze the service logs and to see how services are communicating with each other.

5.1 DSS Log Analyzer

The Decentralized Software Services (DSS) Log Analyzer tool allows you to view log files that have been recorded using DSS logging. You can step through messages, filter message flows across multiple DSS services, and inspect message details.

Services appear as oval-shaped nodes, with lines showing the flow of messages between them. Clicking on one of the lines highlights it and shows the message header details at the bottom of the window.

Message Flow View:

DSS Log Analyzer has two main views, a "Message Flow" view and a "Message Details" view. The Message Flow view is the default view shown once the log files have been loaded. It shows how many messages are sent between the various DSS services by varying the thickness of the lines connecting them: thicker lines mean more messages, thinner lines fewer.

Figure 5.1: DSS Log Analyzer - The "Message Flow" view

You can also change which services are displayed in the "Services View". One way is by changing the value of the "Message Threshold" slider, which limits the services displayed to those that sent more messages than a certain threshold to some other service.

Message Details View:

Clicking on the "Message Details" tab will take you to the Message Details view. This view provides a timeline view of all the messages that were sent and received throughout the execution of your program. It should look similar to the below image. You can also select several messages by left clicking and dragging the mouse in one of the rows. Upon releasing the left mouse button, the messages will appear highlighted and display in the "Causality View" window.

Figure 5.2 DSS Log Analyzer - The "Message Details" view

The highlighted blue boxes in Figure 5.2 denote the InsertRequest and InsertResponse messages. This is helpful when determining when a service sent or received such a message. You can zoom in on messages of interest with the "Zoom" and "Timeline offset" sliders.

Figure 5.3 DSS Log Analyzer - The "Message Details" view with 'Zoom' and 'Timeline offset'


5.2 Summary

The Message Flow view shows the interfaces and the data flowing between the services. In the Message Details view, we can see how many times one service requested another. The result in Figure 5.3 shows that the simplevision service, which is responsible for hand recognition, is called every 100 ms.


The system consists of both hardware and software. Development of these two elements may proceed concurrently, as may their integration. However, it may be necessary for the hardware to be in place and operational before the software can be developed, integrated and tested. Ideally, both elements will be ready for integration into the final system at the same time.

6.1 System Integration

System integration is the successful assembly of the various services of a system so that they work together to perform what the system was intended to do. Microsoft Visual Programming Language (MVPL) is an application development environment designed on a graphical dataflow-based programming model. An MVPL dataflow consists of a connected sequence of activities, represented as blocks with inputs and outputs that can be connected to other activity blocks. The connections between activities pass data as messages. Activities that represent DSS services may require configuration information about how they should start; these settings are referred to as the initial state. In addition, partner services may also need to be started.

Hand Recognition Service:

The Hand Recognition service implements image processing functions using a webcam. It performs simplified face and hand gesture detection; other services can obtain the detection results by subscribing to it.

The NXT Drive activity's input is configured by an If statement that evaluates the output of the hand gesture detection service and initializes the power values for the left and right wheels of the NXT differential drive.

Figure 6.1: MVPL Diagram for Vision Service Implementation

Speech Recognition Service:

For the robot to receive verbal input, you need to add the speech recognition service to your diagram. Select Speech Recognizer from the Services toolbox and drag it onto the diagram.

The Speech Recognizer service uses a grammar that defines the words and phrases to be recognized. Without a grammar, the Speech Recognizer recognizes nothing and your program will not work. The easiest way to set up a grammar is to run the Speech Recognizer Gui service, which provides a web page as an interface. Drag this service onto the diagram as well; it does not require any connections, it just needs to be on the diagram so that it starts when you run the program. Speech recognition converts spoken words to written text and can therefore be used to provide user interfaces with spoken input.

The NXT Drive activity's input is configured by an If statement that evaluates the Command input and decides what command to pass to the NXT Drive. As its name suggests, this activity's Set Drive Power commands are sent straight to the robot.

Figure 6.2: MVPL Diagram for Voice Service Implementation

6.2 Test Requirement

Adequate light brightness

Image background should have some offset with skin color

Face or Hand image should be greater than 0.2 and less than 0.85

Test Facilities:

Robot will be used only indoors in a controlled environment

Robot will only operate on a flat surface

6.3 Test Procedure

A software program has been written to control the robot based on speech and hand gesture recognition. Commands are needed for the robot to start and move forward. Any command can be issued at random; however, if commands are not issued in a logical order, a proper course of action cannot be taken. For instance, if the 'Forward' command is issued before the robot is started, the command is recognized but no action is taken. Table 6.1 below shows the command words for controlling the robot. The test was conducted with different speakers (male and female) and in different environments (noisy and silent). Out of 100 tests, the required accuracy is over 85%, and in almost all cases the measured accuracy is above 90%.

Table 6.1: Recognition Rate of Command Word to Control Robot

Command Word      Correct Result (out of 100)
Move Forward
Move Backward
Move Left
Move Right


The images are acquired at a location remote from the robot. The system was observed to be more accurate under normal lighting conditions. Table 6.2 below shows the recognition results for controlling the robot using the Hand Gesture service. The accuracy of the result is required to be over 80%.

Table 6.2: Recognition Rate of Hand Posture to Control Robot

Hand Posture      Correct Result (out of 100)
Left Hand
Right Hand

Testing was conducted in different environments, with different lighting and background conditions; this caused some tests to fail. For more accurate results, the assumptions and test facilities mentioned in section 6.2 must be followed strictly.

6.4 Test Results

Figure 6.3 shows the result of the Hand Gesture Recognition service. Light brightness plays a major role, as can be seen in the captured image: with an adequate amount of light, detection of the face or hand is good. So far, the designed algorithm has achieved the expected output.

Figure 6.3: Test Result

6.5 Summary

For the performance evaluation of hand gesture recognition, the system was tested on a set of 10 users. Each user performed a predefined set of 10 movements, giving 100 gestures with which to evaluate the application. Figure 6.3 shows the performance evaluation results; as Table 6.2 shows, hand gesture recognition works correctly for an average of 76% of the cases.


A simple approach was taken to develop and implement service based image processing algorithms designed for MSRS. This experimental work succeeded in using MSRS with a LEGO MINDSTORMS NXT robot, allowing the robot to track and recognize hand gestures. With a focus on incorporating a vision based hand gesture recognition service into robotics, and on developing systems that actively use computer vision to drive and control robots, the approach of this project was to develop a prototype system that incorporates all the problem domains encountered when computer vision is used in the field of robotics.

7.1 Future Work

The project suggests several avenues for future study, most of them enhancements to the system itself. The largest extension would be the implementation of an appearance based system to power visual detection. There is also the avenue of adding further interaction modalities to the robot, such as remote control, to provide additional options for the user, and testing how accurately these systems can cooperate.


References

Microsoft Corporation, Microsoft Robotics Studio 2008 R3 Help.

Microsoft Corporation, Microsoft Robotics Studio 2008 R3 Tutorials.

Pham Trung, Nitin Afzulpurkar, and Dhananjay Bodhale, "Development of Vision Service in Robotics Studio for Road Signs Recognition and Control of LEGO MINDSTORMS Robot," Proceedings of the 2008 IEEE International Conference on Robotics and Biomimetics, Bangkok, Thailand, February 21-26, 2009.

M. Manigandan and I. Manju Jackin, "Wireless Vision based Mobile Robot Control using Hand Gesture Recognition through Perceptual Color Space," 2010 International Conference on Advances in Computer Engineering.

Jean-Christophe Terrillon, Arnaud Pilpré, Yoshinori Niwa, and Kazuhiko Yamamoto, "DRUIDE: A Real-Time System for Robust Multiple Face Detection, Tracking and Hand Posture Recognition in Color Video Sequences," Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04).

Thomas Coogan, George Awad, Junwei Han, and Alistair Sutherland, "Real Time Hand Gesture Recognition Including Hand Segmentation and Tracking."

Khalid M. Alajel, Wei Xiang, and John Lies, "Face Detection Technique Based on Skin Color and Facial Features."

Md. Hasanuzzaman, V. Ampornaramveth, Tao Zhang, M. A. Bhuiyan, Y. Shirai, and H. Ueno, "Real-time Vision-based Gesture Recognition for Human Robot Interaction," Proceedings of the 2004 IEEE International Conference on Robotics and Biomimetics, Shenyang, China, August 22-26, 2004.