MeMouse is an interactive platform that has been developed using Digital Image Processing Techniques to provide the user with mouse functionalities. Computer Vision techniques are the most fundamental in this scenario. This chapter describes the basic functionalities of the system and the specifications according to which the system has been developed.
Human Computer Interaction is an important field today, and research and development in it is growing steadily. New problems are being solved and existing solutions are being improved. MeMouse also provides a way to interact with the computer; the idea was inspired by the smart board. MeMouse lets the user interact with a presentation using a camera instead of a mouse. The software focuses on controlling a system through defined hand gestures. Hand gestures take over the functionality of the mouse, so the user can interact with the computer while standing at the point of presentation.
While presenting, a user needs to move between the slideshow and the hardware that controls it. In our environment, the computer is placed on the left side of the stage and the presentation is projected on the right side. If an instructor, for example, wants to explain a particular point, he moves to the projected screen. But to change the slide or perform any mouse operation, he has to walk back to the computer. This consumes time and distracts both the presenter and the audience. MeMouse addresses this problem by providing mouse functionalities through Computer Vision techniques.
Goals and Objectives
The objective of MeMouse is to provide hardware-free mouse functionalities to the presenter using computer vision techniques. The mandatory functions are single click, double click and right click. The system first captures video of the presenter and then extracts the hand. Hand gestures are observed, and if a gesture matches one of the defined gestures, the computer performs the corresponding operation. All of these operations need to be performed in real time.
The deliverable for this project is the software that will control the computer according to the gestures of the presenter.
This document provides basic knowledge about MeMouse. This chapter describes the project. Chapter 2 explains related work in the field and discusses other products that provide the same functionality. Chapter 3 gives the requirement specification of the software, and Chapter 4 the design specification. Chapter 5 describes the implementation details of the project, followed by an analysis of the software and a few suggestions for future enhancements.
MeMouse is an interactive platform that provides mouse functionalities under the domain of Computer Vision. This chapter introduced MeMouse and the goals and objectives set for developing the software. A brief outline of the document's contents has also been given.
This chapter provides an insight into different technologies that have already been developed for Human Computer Interaction in different domains, including Signal Processing and hardware devices. All of these devices or applications provide relevant functionality. An overview of these technologies is presented along with a comparison with MeMouse.
The Smart Board is an interactive whiteboard that uses touch detection for user input (e.g., scrolling, right mouse-click) in the same way that normal PC input devices, such as a mouse or keyboard, detect input. A projector displays the computer's video output on the interactive whiteboard, which then acts as a large touch screen. The components are connected wirelessly or via USB or serial cables. Figures 2.1(a) and 2.1(b) show the Smart Board being used. The interactive whiteboard accepts touch input from a finger, pen or other solid object. Each contact with the Smart Board is interpreted as a left click of the mouse. Other functions are implemented in the same way using different inputs.
Figure 2.1(a) Figure 2.1(b)
Use of Smart Board in classrooms and other learning environments
MeMouse, on the other hand, needs only a camera connected to the computer and positioned so that the user is within its visual field. A large touch screen therefore does not need to be installed, which reduces the cost of the system. A Smart Board costs from a few hundred to several thousand dollars. Smart Boards are also touch-sensitive devices, so an accidental touch can trigger unwanted operations on screen. MeMouse has fewer such drawbacks and is therefore preferred over Smart Boards.
Interactive Whiteboards Using the Wiimote
The Wiimote is a remote control device with an integrated infrared (IR) camera that detects and tracks sources of infrared light, so a user can track pens that have an IR LED in the tip. By pointing a Wiimote at a projection screen or LCD display, a user can create an interactive whiteboard or tablet display at very low cost compared to Smart Boards. MeMouse provides an even cheaper solution because it uses a web camera, which is far cheaper than an IR camera. MeMouse also does not need any aiding device for hand detection or input. Hence MeMouse is preferable to the Wiimote.
Figure 2.2(a) Figure 2.2(b)
Interaction using a Wiimote
A digital presenter consists of a small USB receiver that is connected to the computer and a handheld device of similar size that communicates with it via Bluetooth. Buttons on the presenter trigger the limited set of operations required to change slides during a presentation. MeMouse, on the other hand, has a broader scope: it controls not only slides but also other mouse operations. Hence, MeMouse has an edge over digital presenters.
Figure 2.3: Digital Presenter
This chapter described existing technologies that provide the same functionality as MeMouse. Smart Boards, digital presenters and the Wiimote are closely similar to MeMouse, and each has been analysed and compared with it. MeMouse is cheaper than any of these technologies and provides the same results, and hence has an edge over these devices.
This chapter describes the requirements specification for version 1.0 of MeMouse, an undergraduate degree project in software engineering. The product is a software program that helps the presenter interact with the presentation comfortably. It captures the hand gestures of the presenter using a camera and performs the corresponding mouse operations, so the presenter does not need to have the hardware within reach at all times. This document provides the specification that the software needs to fulfil in order to be most useful.
MeMouse has a limited scope that is to be able to provide the following functionalities of mouse as outputs:
Slide Show Control
All of these functionalities need to be provided using hand gestures as inputs. Hands need to be tracked and analyzed. If any hand gesture matches the defined gesture, associated operation is performed.
MeMouse needs to have some specific features that ensure the usability and durability of the product. These features are the core functionalities of the system and have been useful in design of the system.
Recognize Hand Gestures
MeMouse can interact with the Windows API as a mouse without the use of an actual mouse. The user can perform right click, left click and cursor tracking on the screen. These operations are initiated by performing defined hand gestures, namely keeping the hand static for a particular time.
MeMouse can detect the skin of any person without the aid of color markers or other hardware. The user just needs to keep his hand in the visual field of the camera.
MeMouse can find and recognize the hand anywhere within the visual field of the camera.
Hand is tracked in the image using predictions of Kalman Filter. The focus of the program is the area where hand is present in order to reduce search space.
Windows API interaction
MeMouse interacts with Windows API in order to pass messages about mouse operations. Messages are passed to Operating System about performing single click, double click and right click.
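The source does not say which Windows API call MeMouse uses to pass these messages. As a hedged sketch (written in Python rather than the project's C#, for brevity), the three operations could be expressed as sequences of button-down and button-up flags for the Win32 `mouse_event` function. The flag values are the documented Win32 constants; the helper names `click_sequence` and `send_click` are hypothetical.

```python
import sys

# Documented Win32 mouse_event flags.
MOUSEEVENTF_LEFTDOWN = 0x0002
MOUSEEVENTF_LEFTUP = 0x0004
MOUSEEVENTF_RIGHTDOWN = 0x0008
MOUSEEVENTF_RIGHTUP = 0x0010


def click_sequence(operation):
    """Return the list of mouse_event flags for a named operation."""
    left = [MOUSEEVENTF_LEFTDOWN, MOUSEEVENTF_LEFTUP]
    right = [MOUSEEVENTF_RIGHTDOWN, MOUSEEVENTF_RIGHTUP]
    sequences = {
        "single_click": left,
        "double_click": left + left,  # two press/release pairs
        "right_click": right,
    }
    if operation not in sequences:
        raise ValueError("unknown operation: " + operation)
    return sequences[operation]


def send_click(operation):
    """Send the operation at the current cursor position (Windows only)."""
    flags = click_sequence(operation)
    if sys.platform != "win32":
        return flags  # nothing is sent on other platforms
    import ctypes
    for flag in flags:
        # mouse_event(dwFlags, dx, dy, dwData, dwExtraInfo)
        ctypes.windll.user32.mouse_event(flag, 0, 0, 0, 0)
    return flags
```

Keeping the mapping from gesture to flag sequence separate from the call that sends it makes the mapping easy to check without moving the real cursor.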
Assumptions and Dependencies
For the development of MeMouse, some assumptions have been made. This section describes the assumptions and dependencies on which the development of the software was based.
It has been assumed that the hand is the object that moves the most in our environment. The drawback of this assumption is that if any other skin-colored object moves faster than the human hand, MeMouse will classify it as the hand. It is also assumed that the hand does not stay still for 2 seconds or more unless an operation is intended.
Windows versions later than Windows 98 are the targeted Operating System family, selected because Windows is the most widely used operating environment.
System features according to functional requirements provided by the Project Supervisor are as follows:
The system needs to capture video so that the human hand can be detected in the user's environment and actions can be performed based on the hand's gesture. The user has to launch the program and then click the "start" button on the form to start capturing. The system responds by displaying the video as output.
After video capture, the hand has to be extracted so that tracking and gesture recognition can be performed. This feature has high priority because it initializes the system, and no further processing can be done without the input generated by this module. At this stage, the user interacts with the camera rather than directly with the program. Hence the response from the camera has to be accurate for the video to be processed.
Hand tracking is an essential part of MeMouse. It has to be efficient so that the hand is tracked and the cursor moves accordingly. The response of this feature is the expected position of the object, based on its present position and velocity.
Gesture recognition is a core requirement of MeMouse. After analysis, the performed gesture has to be classified as legal or illegal. The response of gesture recognition is to classify the gesture and send a request to the main class to perform the related action.
External Interface Requirements
MeMouse has specific interface requirements that have been discussed in this chapter.
MeMouse is an interactive program that continuously needs input from the user. The user needs to be within the visual field of the camera, and the hand must be visible for input to be taken. Input can only be taken if the hand stays in the visual field while making a specific gesture. The gestures must be able to perform the mouse operations needed for the presentation; the hand must act as a mouse. Gestures have been defined for Single Click, Double Click, Right Click and Slide Show Control.
Also, the hand has to be detected using properties such as skin color and hand movement; no other aiding material should be needed.
MeMouse gets input through web-camera that must have high resolution in order to get clear input with less noise.
This product needs to interact with the operating system of the platform through API. Targeted Operating System must include Windows XP and Windows Vista.
Other Nonfunctional Requirements
Certain other functionalities are required based on performance and response of MeMouse. These requirements are described here.
MeMouse has to be efficient in terms of response and operation. Its domain is image manipulation, which needs fast processing on the machine, so the program logic and data flow need to be as efficient as possible. MeMouse needs to work under normal lighting conditions with a non-static background, be robust and compatible with the platform, and respond within minimum time.
Software Quality Attributes
MeMouse has to follow some requirements that affect the quality of the system. Quality of MeMouse has to be improved by following the quality requirements described in this section.
Runtime System Qualities
At run time, MeMouse has to provide the user with the required functionalities. Because the system has to perform its functions in real time, the runtime qualities of MeMouse are as follows:
MeMouse must perform the functions like right click, single click and double click at any point of the screen.
MeMouse must be able to perform operations within an acceptable time, that is, within a second.
MeMouse must be available whenever the user needs it, for example throughout a presentation.
MeMouse has to be user friendly. User must be able to use MeMouse in most convenient way.
Non-Runtime System Qualities
Non-runtime qualities of MeMouse are those required to enhance the code or to help other developers extend the system to other requirements and environments.
MeMouse has to accommodate changes, such as incorporating more gestures. The software must also be able to accommodate other functions that another user, such as a programmer, wants to add.
MeMouse should be able to run under different computing environments. The targeted environment is a presentation or lecture setting, but other environments in which a user wants to use the system for personal use should also be covered.
MeMouse must be reusable in new applications. If a system is developed that needs the functionalities of MeMouse, MeMouse should be easy to understand and integrate.
Separately developed components of the system have to work correctly together in MeMouse. The modules of MeMouse must collaborate with each other so that the system is most useful.
MeMouse must be testable so that faults can be found and removed. Different tests, including beta testing, are necessary to remove faults and make the software perform in accordance with the specified requirements.
MeMouse needs to be robust and able to manage disaster situations that arise during operation, and hence work efficiently in real time. A disaster situation is one in which undesired inputs are provided to the system. For example, if the user's hand goes out of the frame, MeMouse should be able to resume tracking it when it reappears.
This chapter describes the requirements of the system as described by Project Supervisor. It includes interface, functional and non functional requirements along with the main features required by the system. These requirements have been set after checking the feasibility of the system. These requirements have been considered as the fundamental principles for testing and standardization of the product.
This chapter provides the design specifications of MeMouse. These specifications have been developed from the requirements described in the previous chapter. The chapter also provides information about the system structure and architecture.
MeMouse is software intended to assist users while they deliver a presentation. There are several ways to control a presentation, including manual interaction through a mouse or keyboard, a digital presenter or a smart board. These create inconvenience during the presentation and distract the audience, which in turn wastes time. With MeMouse, the presenter can interact with the presentation at the point where the output is projected.
Assumptions and Dependencies
The basic assumption underlying development is that the presenter's hand is the object that moves the most in MeMouse's environment. Another assumption is that a Windows operating system released after Windows 98 is used in the presentation environment.
For better performance, the system on which MeMouse runs should have a Pentium 4 processor or above (Core 2 Duo recommended), 512 MB RAM (recommended) and, optionally, a graphics card.
MeMouse is efficient software that provides output in different scenarios, but some conditions have to be met for the system to be usable. The following constraints apply:
The room must have sufficient light so that skin color can be recognized.
Initially, the hand should move more than any other body part, particularly during the first 2 seconds after the program starts.
Full-sleeve shirts are recommended for best performance. Clothing must not be of a color that matches skin.
There must not be any other skin colored object in the background.
Design decisions and strategies that affect the overall organization and higher-level structure of the system are described here. Important issues such as language, platform and project extensions are covered in this section.
C# has been used to develop MeMouse. The main reason is that the application runs in real time and requires fast execution. Another reason is that C# is widely used for image processing. It was preferred over MATLAB because MATLAB is efficient for manipulating still images but does not produce efficient results with real-time video.
An open source library, AForge.NET, is used for image processing. It was considered the most suitable open source library for image processing with C#. The alternative, OpenCV, was not used because of compatibility issues with C#.
At present, the main focus is controlling slideshows using hand gestures. There is a plan to extend the functionality so that gestures take over all mouse functions, and to build an application integrated with MeMouse that acts as an interactive whiteboard.
To perform mouse operations using MeMouse, hand tracking and gesture recognition are necessary. Hand gesture recognition follows certain steps: the user needs to be present in the visual field of the camera, his hand has to be extracted and tracked, and the gestures the hand shows have to be analyzed. With these steps in view, MeMouse was divided into five modules: Video capture, Hand extraction, Tracking, Gesture recognition and Control (Windows message passing).
The flow of data through the system is shown in Figure 4.1. When MeMouse starts, it first captures video of the presenter. All processing depends on this video, which is captured in real time. Because the input is a hand gesture, the hand is extracted from the video, which has first been converted into frames. The next step is to track the hand in order to move the cursor and find the next input. If the hand shows a defined gesture, a message is passed to the Windows API to perform the corresponding operation. Each of these modules is explained in detail later in this section.
Hand extraction is not a single standalone task but a series of tasks: skin detection, edge detection, motion residue and blob finding. A logical AND is applied to the images resulting from skin detection, edge detection and motion residue. The resulting blob is considered to be the hand.
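The logical AND of the three binary images can be sketched as follows. This is an illustrative Python sketch, not the project's C# code; the function name `combine_masks` and the representation of a binary image as a list of rows of 0/1 values are assumptions made for the example.

```python
def combine_masks(skin, edges, motion):
    """Pixel-wise logical AND of three equally sized binary masks."""
    return [
        [s & e & m for s, e, m in zip(skin_row, edge_row, motion_row)]
        for skin_row, edge_row, motion_row in zip(skin, edges, motion)
    ]


# Tiny 2x3 example masks (1 = pixel passed the corresponding test).
skin = [[1, 1, 0], [1, 1, 1]]
edges = [[1, 0, 0], [1, 1, 0]]
motion = [[1, 1, 1], [0, 1, 1]]
combined = combine_masks(skin, edges, motion)
```

Only pixels that are skin-colored, lie on an edge and have moved survive in `combined`; every other pixel becomes 0.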
Figure 4.1: Control flow diagram
Once the hand has been identified, it is tracked. A Kalman filter is used to predict the next position of the hand based on its present position. Once the next position is predicted, skin-colored objects are searched for only in the reduced window around it, and the object found there is taken to be the hand.
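The idea of searching only a reduced window around the predicted position can be sketched as below. This is a simplified Python illustration, not the project's C# code: it uses a plain constant-velocity prediction in place of the full Kalman filter, and the function names, the window half-size and the frame dimensions are all hypothetical.

```python
def predict_position(position, velocity, dt=1.0):
    """Constant-velocity prediction of the next (x, y) position."""
    x, y = position
    vx, vy = velocity
    return (x + vx * dt, y + vy * dt)


def search_window(center, half_size, frame_width, frame_height):
    """Return (left, top, right, bottom) of a square window clipped to the frame."""
    cx, cy = center
    left = max(0, int(cx - half_size))
    top = max(0, int(cy - half_size))
    right = min(frame_width, int(cx + half_size))
    bottom = min(frame_height, int(cy + half_size))
    return (left, top, right, bottom)


predicted = predict_position((100, 80), (10, -5))
window = search_window(predicted, 40, 320, 240)
```

Skin detection then only has to scan the pixels inside `window` instead of the whole 320 by 240 frame.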
The gestures had to be as convenient to use as possible, so time-dependent gestures have been defined. The system clock is used to check how long the hand is kept static over a particular area of the screen. A notification about the click option appears, and the operation is performed when the hand moves after having been kept static for a particular time.
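A time-dependent gesture of this kind can be sketched as a small state machine. The Python sketch below is an illustration only, not the project's C# code: the class name `DwellDetector`, the 2-second dwell time and the 15-pixel radius are assumptions chosen for the example.

```python
class DwellDetector:
    """Detects 'hold still, then move' gestures from tracked hand positions."""

    def __init__(self, dwell_seconds=2.0, radius=15.0):
        self.dwell_seconds = dwell_seconds  # how long the hand must stay still
        self.radius = radius                # tolerance for small hand jitter
        self.anchor = None                  # position where the hand came to rest
        self.anchor_time = None             # time the hand arrived at the anchor
        self.armed = False                  # True once the dwell time has elapsed

    def update(self, position, timestamp):
        """Feed one tracked position; return 'idle', 'armed' or 'fire'."""
        if self.anchor is None:
            self.anchor, self.anchor_time = position, timestamp
            return "idle"
        dx = position[0] - self.anchor[0]
        dy = position[1] - self.anchor[1]
        moved = (dx * dx + dy * dy) ** 0.5 > self.radius
        if moved:
            fired = self.armed
            # Restart the dwell timer at the new position.
            self.anchor, self.anchor_time, self.armed = position, timestamp, False
            return "fire" if fired else "idle"
        if timestamp - self.anchor_time >= self.dwell_seconds:
            self.armed = True  # this is where a notification would be shown
            return "armed"
        return "idle"


detector = DwellDetector()
states = [
    detector.update((100, 100), 0.0),
    detector.update((102, 101), 1.0),
    detector.update((101, 100), 2.1),
    detector.update((200, 100), 2.5),
    detector.update((300, 100), 2.6),
]
```

The detector reports "armed" once the hand has rested long enough and "fire" when the hand then moves away, which matches the order of notification and operation described in the text.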
The use case diagram of the system has been given below. This diagram describes the interaction of user and the system.
Basic Flow of System
User places his hand in front of camera; camera takes the image and sends it to clip board where it is saved. System captures new frame and sends a copy of frame to detect skin; skin module sends the extracted image back to the clip board. System sends another copy to detect edges; module returns extracted edges in frame. System sends current and previous frame to motion residue module to calculate motion in subsequent frames; module returns a frame with difference of both the frames.
Fig 4.2 Use Case Diagram
System performs a logical AND of all three returned frames; blob counting is then done to extract the largest blob from the resulting frame, which is the 'Hand' of the user. System sends the center of the hand to the filter to predict the possible position of the hand in the next frame, and also sends the hand shape for gesture recognition. The filter takes the position and predicts the next position. Gesture recognition module processes the shape to find the gesture; if a gesture is found, system sends a message to the Windows API to perform the action, else the next frame is captured. The next frame is searched in a restricted area obtained from the Kalman Filter output.
System has performed the action which was requested by user through hand gesture.
There is no alternate scenario because the objective is defined and product is being developed by strictly following the requirements.
The class diagram of MeMouse is presented in Figure 4.3 and elaborated in this section. All the classes in the diagram are described briefly, and a legend in the diagram describes the symbols used and their purpose.
Figure 4.3: Class Diagram of MeMouse
As shown in Figure 4.3, MeMouse is the main class. It controls all other classes and interacts with them to perform the required functionality. Once the user has started the program, the user's direct interaction with it is over, and it is the responsibility of the MeMouse class to carry out further actions and procedures.
MeMouse class shifts control to the WebcamCapture class to capture video and recognize gestures. The WebcamCapture class contains objects of KalmanProcessing, KalmanProperties and MotionDetector3, which are used to interact with the respective classes.
KalmanProperties class contains the data and functions required for predicting the track of an object in the scene. It simply defines data variables and makes them available to other classes, i.e. KalmanProcessing, for predicting the moving object in the frames.
The KalmanProcessing class is responsible for predicting the track of the object in the scene. It contains methods for predicting the next position of the object.
MotionDetector3 is the implementation of the IMotionDetector interface. This class contains methods for processing frames and extracting the hand from a frame. It also holds an object of the Vision class, which is used to call functions such as skindetection, edgeDetection and motion residue.
Vision class has necessary methods for finding the required attributes in any image, from skin detection to motion detection and logical AND of bitmap images.
MeMouse has to perform operations in real time, which is why it has to be properly designed for efficiency. This chapter elaborated the design of the software in accordance with the assumptions and constraints applied during development. The class diagram, data flow diagram and use case have been included and explained to give a better understanding of the system's functionality.
This chapter summarizes the different approaches people have used to address the problem MeMouse targets. All of these approaches are useful but differ in efficiency and response. The complete MeMouse system has been subdivided into five modules or subsystems, based on previous technologies and team effort: Video Capturing, Hand Extraction, Hand Tracking, Gesture Recognition and Windows API Interaction. Hand Extraction can be further subdivided into Skin Color Detection, Edge Detection and Residue Image. Gesture Recognition can be further subdivided into Time-based Gestures and Up/Down Gestures.
The first and most important step in MeMouse is to capture video for real-time processing. Because the software performs operations according to hand gestures rather than the mouse itself, these gestures have to be captured. Video of the presenter is therefore captured and converted into a format that the other system modules can operate on. A camera captures video at a rate that depends on its quality and resolution. We use a web camera because it is cheap to buy and maintain. The captured video can be converted into images. As the C# platform does not provide functionality to convert video into usable data, another tool or technique is needed. We considered two tools that provide this functionality, AForge.NET and OpenCV. Both are available as open source software.
OpenCV is written mainly in C, which makes it portable to specific platforms such as digital signal processors, and wrappers for C++, C# and Java are available. As C# is our development language, compatibility with C# is the main concern. The program needs to convert video captured in .avi format into images. In this scenario, OpenCV did not provide some important functions that were required. Hence OpenCV was found to be incompatible with our C# programming interface.
AForge.NET is also an open source library for image processing. Unlike OpenCV, AForge.NET is a C# framework designed for developers in the fields of computer vision and artificial intelligence. Hence it provides complete functionality for image manipulation in the C# programming environment. AForge.NET was chosen for the "Video Capture" module of MeMouse, where it is used for real-time image manipulation. Its image processing functions are available through the AForge.Imaging library, a library of image processing routines and filters. It can be used to convert the video into image frames according to the frame capture rate. As MeMouse has to process images in real time using C#, AForge.NET provides the best solution for image capture and manipulation.
Before the actual processing starts, the human hand needs to be detected, since MeMouse depends on hand movement and hand gestures. Hand extraction techniques extract the human hand from the image and subtract the background. The approach has been developed using image segmentation techniques that include Skin Color Detection, Edge Detection and Motion Detection. Skin detection alone could be used to detect hands, but because our system needs real-time results, the real-time hand segmentation techniques of Khurshid and Vincent and of Askar et al. have been used in combination. Put together, these techniques give us the human hand in the video.
Skin color is the most important property through which we can detect the hand. Skin has a specific range of colors that varies from region to region as well as with lighting conditions. Various color models can be used to find skin color, including RGB, normalized RGB, TSL, YCrCb and HSI. All of these are efficient in different scenarios; the two most important are RGB and HSI. RGB space can be used in this scenario, but its main disadvantage is that it detects a range of many other colors that are not required by the system. If it is implemented with improvements that avoid these errors, the efficiency of the system decreases. HSI does not show this inefficiency. It has been shown that, irrespective of race, human skin color falls into a finite subset of hue values. Keeping this in view, and based on experimental results, the HSI color space has been chosen for skin detection. The literature shows that the range of HSI values for human skin falls into a finite subset of real values; this was the basis for implementing the skin color detection sub-module. The hue and saturation values of skin color are the more important factors. Using the HSI model, better results have been achieved.
Table 5.1: HSI values used
Component     Range
Hue           4 to 35
Saturation    0 to 0.7
Intensity     0 to 0.8
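A per-pixel skin test based on these ranges can be sketched as follows. The Python sketch is illustrative and is not the project's C# code. It assumes that the three ranges in Table 5.1 are, in order, hue in degrees, saturation and intensity (the latter two normalized to 0..1), and it uses the standard geometric RGB-to-HSI conversion; the function names are hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0..1, intensity 0..1)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0  # black pixel
    saturation = 1.0 - min(r, g, b) / intensity
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        return 0.0, saturation, intensity  # grey pixel: hue undefined, use 0
    cos_h = 0.5 * ((r - g) + (r - b)) / denominator
    cos_h = max(-1.0, min(1.0, cos_h))  # guard against rounding error
    hue = math.degrees(math.acos(cos_h))
    if b > g:
        hue = 360.0 - hue
    return hue, saturation, intensity


def is_skin(r, g, b):
    """Apply the assumed Table 5.1 ranges to one RGB pixel."""
    hue, saturation, intensity = rgb_to_hsi(r, g, b)
    return 4 <= hue <= 35 and saturation <= 0.7 and intensity <= 0.8
```

For example, a light brown pixel such as `is_skin(200, 140, 110)` passes the test, while pure blue or pure white does not. Applying `is_skin` to every pixel yields the binary skin image used in the later steps.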
Using skin color detection alone, the hand cannot be detected efficiently. A major limitation is that the face is also detected. Another limitation is the presence of skin-colored objects in the background, which will also be classified as skin. Other techniques therefore need to be used in combination with skin detection. The resulting image is converted into a binary image for further manipulation.
Along with skin detection, another relevant technique is edge detection. Finding edges makes it easier to separate the hand from other objects. It is an important technique in image segmentation, that is, finding an object in the given space. The edges of an object vary with its shape. Edge detection is used for feature detection and feature extraction, which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. Various techniques can be used for edge detection in images or videos, including the Sobel operator, differential edge detection, the Canny edge detector, the Prewitt operator and the Roberts cross operator. Thresholding is another technique that can be applied to find edges. Several of these methods were tried on the videos, and the most accurate results were obtained with the Sobel operator. The resulting image is converted into a binary image for further manipulation.
The Sobel operators that have been used are shown in Figure 5.1.
Figure 5.1: Sobel Operators
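Since Figure 5.1 is not reproduced here, the sketch below assumes the standard 3x3 Sobel kernels. It is an illustrative Python version, not the project's C# code; the function name `sobel_edges`, the use of |Gx| + |Gy| as the gradient magnitude and the threshold value are assumptions.

```python
# Standard 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(gray, threshold):
    """Return a binary edge mask for a grayscale image given as a list of rows."""
    height, width = len(gray), len(gray[0])
    mask = [[0] * width for _ in range(height)]
    # Border pixels are left at 0 because the 3x3 window does not fit there.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = gray[y + j - 1][x + i - 1]
                    gx += GX[j][i] * pixel
                    gy += GY[j][i] * pixel
            # |gx| + |gy| is a common cheap approximation of the magnitude.
            if abs(gx) + abs(gy) > threshold:
                mask[y][x] = 1
    return mask


# A dark-to-bright vertical step: the edge lies between columns 2 and 3.
step = [[0, 0, 0, 255, 255]] * 3
edges = sobel_edges(step, 100)
```

The thresholding step at the end is what turns the gradient image into the binary image that the text says is needed for further manipulation.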
The presenter uses his hands the most during a presentation, so the hands become the objects that move the most in the video. This property of hand movement is also useful for detecting the hands. For this purpose the residue image, which detects motion in a sequence of frames, is computed by taking the difference between two frames. The two images are treated as matrices, and moving objects are found by examining the gray-level changes in the video sequence. Let F_i(x,y) be the i-th frame of the sequence; then the residue image D_i(x,y) is a binary image formed by applying a threshold to the difference of the i-th and (i+1)-th frames. This is done in order to extract motion from complex backgrounds.
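The residue image can be sketched as a thresholded absolute difference of two grayscale frames. The Python sketch below is illustrative and not the project's C# code; the function name `residue_image` and the threshold value are assumptions.

```python
def residue_image(frame_a, frame_b, threshold):
    """Binary mask: 1 where the gray level changed by more than the threshold."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


# A bright pixel moves one column to the right between two frames.
frame_i = [[10, 10, 10], [10, 200, 10]]
frame_next = [[10, 12, 10], [10, 10, 200]]
residue = residue_image(frame_i, frame_next, 30)
```

Both the position the object left and the position it arrived at are marked, while the small change from 10 to 12 is suppressed by the threshold as noise.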
The results of 'Skin Detection', 'Edge Detection' and 'Residue Image' together give the hand extracted from the video. As all three images are binary, the area common to the three images is the hand in the video sequence. Together the three images provide a 'Combined' image. We find the largest contour area and its center, and then draw a bounding box of fixed width and length, which represents the hand region we are looking for. Further operations are performed on the area bounded by this box.
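The combination step can be sketched as follows. This is only an approximation of the procedure described above: the "largest contour area" is approximated here by the largest 4-connected region of the combined mask, found with a breadth-first flood fill, and the 40x40 box size and all function names are assumptions.

```python
from collections import deque
import numpy as np

def largest_component_center(mask):
    """Return the (row, col) center of the largest 4-connected region of 1s, or None."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    best = []
    for sr in range(h):
        for sc in range(w):
            if mask[sr, sc] == 0 or seen[sr, sc]:
                continue
            # Breadth-first flood fill of one connected region.
            queue = deque([(sr, sc)])
            seen[sr, sc] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            if len(pixels) > len(best):
                best = pixels
    if not best:
        return None
    rows, cols = zip(*best)
    return (sum(rows) // len(rows), sum(cols) // len(cols))

def hand_box(skin, edges, residue, box_h=40, box_w=40):
    """AND the three binary masks; return a fixed-size box (top, left, bottom, right)."""
    combined = (skin & edges & residue).astype(np.uint8)
    center = largest_component_center(combined)
    if center is None:
        return None  # no pixel is common to all three masks
    cr, cc = center
    top, left = cr - box_h // 2, cc - box_w // 2
    return (top, left, top + box_h, left + box_w)
```

The sketch does not clip the box to the frame borders; a real implementation would need to do so before cropping.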
Tracking of the hand is an important part of MeMouse. The tracking algorithm gives the cursor its position on screen. Searching the complete frame for the hand makes the software inefficient, so a technique is needed that makes tracking efficient. A Kalman filter is used to predict the next position of the moving object using the basic equations of motion, which reduces the search space in each frame. The Kalman filter assumes that the state x of the process evolves according to a linear equation, where u is an optional control input and w is the process noise:
$$x_k = A x_{k-1} + B u_{k-1} + w_{k-1}$$
The Kalman filter estimates a process by using a form of feedback control: the filter estimates the process state at some time and then obtains feedback in the form of (noisy) measurements. As such, the equations for the Kalman filter fall into two groups: time update equations and measurement update equations. The time update equations are responsible for projecting forward (in time) the current state and error covariance estimates to obtain the a priori estimates for the next time step. The measurement update equations are responsible for the feedback, that is, for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate.
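Hence, the time update equations are predictor equations, while the measurement update equations are corrector equations. Indeed, the final estimation algorithm resembles a predictor-corrector algorithm for solving numerical problems. In the equations below, $\hat{x}^-_k$ and $P^-_k$ denote the a priori state estimate and error covariance, $\hat{x}_k$ and $P_k$ denote the a posteriori ones, $Q$ and $R$ are the process and measurement noise covariances, $H$ relates the state to the measurement $z_k$, and $K_k$ is the Kalman gain. The time update equations of the Kalman filter are:
$$\hat{x}^-_k = A \hat{x}_{k-1} + B u_{k-1}$$
$$P^-_k = A P_{k-1} A^T + Q$$
The measurement update equations of the discrete Kalman filter are:
$$K_k = P^-_k H^T (H P^-_k H^T + R)^{-1}$$
$$\hat{x}_k = \hat{x}^-_k + K_k (z_k - H \hat{x}^-_k)$$
$$P_k = (I - K_k H) P^-_k$$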
After each time and measurement update pair, the process is repeated with the previous a posteriori estimates used to project or predict the new a priori estimates. The Kalman filter recursively conditions the current estimate on all of the past measurements.
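The predict and correct cycle can be illustrated with a small tracker for the hand center. This is a sketch under stated assumptions rather than the MeMouse implementation: it assumes a constant-velocity motion model with state (x, y, vx, vy), no control input (so the B u term is dropped), and made-up noise variances. The class name `KalmanTracker` and its parameters are hypothetical.

```python
import numpy as np

class KalmanTracker:
    """Constant-velocity Kalman filter for a 2-D position (assumed model)."""

    def __init__(self, x0, y0, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0], dtype=float)  # state estimate
        self.P = np.eye(4)                                   # error covariance
        self.A = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)       # position += velocity * dt
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)       # only position is measured
        self.Q = process_var * np.eye(4)                     # assumed process noise
        self.R = meas_var * np.eye(2)                        # assumed measurement noise

    def predict(self):
        """Time update: project the state and covariance one frame ahead."""
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Measurement update: correct the prediction with a measured position z."""
        z = np.asarray(z, dtype=float)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In use, `predict()` would be called once per frame to centre a small search window, and `update()` would be called with the hand position actually found inside that window.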
The user needs to show a hand gesture that triggers a mouse operation on the computer. Several types of gestures could be used. Time-dependent gestures are simple to implement as well as convenient for the user, so a timer has been integrated in order to recognize the user's input. In MeMouse, the user keeps his hand static in order to initiate a mouse operation, and the time required depends on how often the operation is used. As single click is the most frequently used operation, only 2 seconds are required to trigger it. Double click is the next most frequent operation, for which a 4 second timer has been set, and right click requires 6 seconds. The user is notified on screen once he has kept his hand static long enough for a particular operation. When the user moves his hand after keeping it static, the operation is performed according to the input.
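The timing rule above can be sketched as a small state machine. The 2, 4 and 6 second limits come from the text; the movement tolerance of 10 pixels, the operation names and the class `DwellDetector` are assumptions made for illustration.

```python
def pending_operation(static_seconds):
    """Map the time the hand has been held still to the operation that would fire."""
    if static_seconds >= 6.0:
        return "right_click"
    if static_seconds >= 4.0:
        return "double_click"
    if static_seconds >= 2.0:
        return "single_click"
    return None

class DwellDetector:
    """Fires an operation when the hand moves after being held static."""

    def __init__(self, move_tolerance=10.0):
        self.move_tolerance = move_tolerance  # pixels; assumed value
        self.anchor = None                    # position where the hand came to rest
        self.anchor_time = None               # time at which it came to rest

    def step(self, position, now):
        """Feed one tracked hand position and timestamp (seconds)."""
        if self.anchor is None:
            self.anchor, self.anchor_time = position, now
            return None
        dx = position[0] - self.anchor[0]
        dy = position[1] - self.anchor[1]
        if (dx * dx + dy * dy) ** 0.5 <= self.move_tolerance:
            return None  # still static; keep timing
        # The hand moved: fire whatever the elapsed static time has earned.
        fired = pending_operation(now - self.anchor_time)
        self.anchor, self.anchor_time = position, now
        return fired
```

`pending_operation` can also be called on every frame with the current static time to decide which on-screen notification to show.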
Windows API Interaction
Finally, the system needs to interact with the Windows API to trigger operations on the computer. This is done using the functions defined in the Windows API. The functions needed most for mouse control are those for positioning the mouse cursor and for generating single and double clicks. A class for Windows interaction has been added to perform this core functionality of the system.
Implementation details of MeMouse have been discussed in this chapter. Techniques such as skin color detection, edge detection, motion detection, hand extraction and the Kalman filter have been described. The process starts with video capture and extracts the hand using motion, skin color and edge detection. After that, tracking is performed in order to move the cursor and capture inputs. Gestures are recognized according to how long the hand is held static, and the corresponding operations are performed.
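One possible shape for such a class is sketched below in Python using `ctypes`. The essay does not name the Windows functions that MeMouse calls, so the use of `SetCursorPos` and `mouse_event` from `user32` is an assumption, as are the operation names and helper functions. Microsoft's documentation notes that `mouse_event` has been superseded by `SendInput`; it is used here only because it keeps the example short.

```python
import sys

# Flag values for user32.mouse_event, as documented in the Win32 API.
MOUSEEVENTF_LEFTDOWN = 0x0002
MOUSEEVENTF_LEFTUP = 0x0004
MOUSEEVENTF_RIGHTDOWN = 0x0008
MOUSEEVENTF_RIGHTUP = 0x0010

def event_sequence(operation):
    """Return the list of mouse_event flags that make up one operation."""
    left = [MOUSEEVENTF_LEFTDOWN, MOUSEEVENTF_LEFTUP]
    right = [MOUSEEVENTF_RIGHTDOWN, MOUSEEVENTF_RIGHTUP]
    table = {
        "single_click": left,
        "double_click": left + left,  # two press/release pairs in a row
        "right_click": right,
    }
    return table[operation]

def perform(operation, x, y):
    """Move the cursor to screen position (x, y) and send the operation.

    Returns False without doing anything when not running on Windows.
    """
    if sys.platform != "win32":
        return False
    import ctypes  # imported lazily; ctypes.windll only exists on Windows
    user32 = ctypes.windll.user32
    user32.SetCursorPos(int(x), int(y))
    for flag in event_sequence(operation):
        user32.mouse_event(flag, 0, 0, 0, 0)
    return True
```

Keeping the flag table separate from the system calls means the mapping from gesture to mouse events can be checked on any platform.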
Results and Analysis
MeMouse has been developed to work in a real-time environment. It is a way to control a presentation in real time using computer vision techniques, rather than signal processing or other approaches such as touch-sensitive surfaces or devices that operate using wireless signals or Bluetooth.
MeMouse has been developed to assist users while delivering a presentation. The idea has been to keep the user comfortable by not requiring any aiding device or material. The milestone of controlling a slideshow during a presentation has been achieved. The series of snapshots presented in this chapter gives a better understanding of the results we achieved.
Hand detection is an important milestone, for which image processing techniques have been used. These techniques detect the hand so that further operations can be performed. Figure 6.1 shows hand detection during the operation of MeMouse. A rectangular window covers the area where the hand is detected by MeMouse. This has been achieved by using skin detection, edge detection and blob finding, and then taking the combined image, whose common region is the hand. As Figure 6.1 shows, the hand has been detected, which is the major milestone for obtaining input in this scenario.
Figure 6.1: Hand Detection and tracking in MeMouse
Tracking of the hand is another important task, and it has been achieved efficiently. Using the Kalman filter has improved the tracking. On-screen tracking is an important step, as it marks the point where the user wants to perform an operation. The result of hand tracking is shown on screen simultaneously: as the hand moves, the cursor moves on the computer screen.
Figure 6.2: Indication for Single Click
Static gestures have been added to the system to perform operations. If the user keeps his hand static for 2 seconds, he is notified that if he moves his hand now, a Single Click operation will be performed. Figure 6.2 shows the Single Click indication given to the user after his hand has been static for 2 seconds. If the user wants to perform a Double Click operation instead, he has to keep his hand static for 2 more seconds. Overall, the user needs to keep his hand static for 4 seconds in order to perform a Double Click operation. Figure 6.3 shows the user being notified that if he moves his hand now, a Double Click operation will be performed.
Figure 6.3: Indication for Double Click
If the user needs to perform a Right Click on screen, 6 seconds is the minimum time for which he has to keep his hand static. After that, an indication appears on screen that if the user moves his hand, a Right Click operation will be performed. Figure 6.4 depicts the operation performed when the user keeps his hand static for 6 seconds. As in the previous two cases, a notification precedes the Right Click operation. The user usually needs to perform a Single Click after performing a Right Click. Figure 6.5 shows that the user has performed the 'Refresh' operation by Right Clicking on the desktop and then Single Clicking on the Refresh option.
Figure 6.4: Right Click operation performed
Figure 6.5: User performed Refresh operation
Other technologies used to control a slide show include Smart Boards, the Wiimote and Digital Presenters. These devices need more hardware support than MeMouse.
A Digital Presenter is a device connected via Bluetooth to its other half in order to provide limited mouse functionality. The user needs to attach a Bluetooth receiver to the computer in order to perform operations. The required hardware is expensive and costs more still if a fault occurs. A Digital Presenter controls the slides during a presentation but cannot perform any other operation, such as opening another program, so the user is restricted to a particular program. MeMouse, in contrast, provides mouse functionality that can also be used to open and close other programs during a presentation.
A Smart Board is a useful device. It has functionality similar to that of MeMouse, but it requires a large touch screen in order to perform mouse operations. Smart Board equipment is quite expensive, which makes it difficult for many institutes and organizations to purchase. MeMouse therefore provides a cheaper solution to the same problem. MeMouse provides mouse operations in real time, which makes it a better option than a Smart Board.
The Wiimote is a device that uses an IR camera in order to perform its functions. It also requires the user to use a pen-like device that has a Light Emitting Diode (LED) at its tip. When that LED emits light, the Wiimote tracks it and performs operations. MeMouse, on the other hand, does not need any aiding material and provides the same functionality in terms of mouse operations. The Wiimote is also more expensive than the web camera that MeMouse uses to get input. Hence, MeMouse is a smart choice.
MeMouse can perform operations such as Single Click, Double Click and Right Click, which in turn help to control the slide show. Some of these devices provide fewer functions than MeMouse while being more expensive. MeMouse provides the same functionality as similar technologies at a much lower cost.
Conclusion and Future Work
This chapter describes the overall achievements of MeMouse. Some suggestions are also presented for enhancing and upgrading the system. MeMouse can be extended to cover a broader domain of Human Computer Interaction.
The concept of MeMouse was derived from Smart Boards and Digital Presenters. These devices are expensive and also need further expense for their maintenance. In addition, these devices are hardware dependent, whereas MeMouse gives the user freedom from dependence on such hardware. MeMouse provides a more convenient and useful environment for performing the same functions that are required during presentations.
MeMouse is just a first step in the wide field of Human Computer Interaction during presentations. It can achieve many more milestones that make a presenter feel more comfortable with the environment while presenting.
MeMouse can be extended to provide an on-screen keyboard to users. That can be done by defining coordinates in the video capture area that correspond to the keys of a keyboard. This could even let the user edit documents on screen.
Another suggestion is to extend the software to detect human hand motion and capture it in order to write on the projected screen. This idea is similar to writing on a white board. Techniques for tracking the input object can be used along with algorithms that enhance the software's ability to learn. Artificial Intelligence techniques would be the most useful here.
MeMouse can also be extended to take over every kind of computer control that a user performs. Digital image processing techniques can be used along with Artificial Intelligence to develop a complete system for controlling the computer.
All of these techniques have to work with the Windows API to perform the desired operations.
Human Computer Interaction can be made more and more convenient. Computer Vision and Digital Image Processing have to work in collaboration with Artificial Intelligence to make this a reality. There is a wide range of options available for making Human Computer Interaction more comfortable.