Voice Transmission Control of a Robot to Operate, Monitor by Video, and Capture Pictures in Places a Person Cannot Enter
Nowadays, systems are increasingly automated in order to meet new challenges. Automated systems require fewer manual operations and offer flexibility, reliability, and accuracy. Because of this demand, every field prefers automated control systems, and in electronics in particular they perform well. In this project, the robot follows a path that is determined by the operator's voice commands.
As the title "Voice over Robot Control" indicates, the robot is controlled by voice, and the image from a webcam is displayed on the PC. The system has two sections (transmitter and receiver), as shown in the block diagrams. Instructions such as Left, Right, etc. are processed and stored in the PC for the authorized person's voice using MATLAB.
[Block diagram label: MAX232]
1.2 Receiver Section:
In the transmitter section, the authorized person speaks the instructions into a microphone connected to the PC. The voice instruction is processed and compared with the stored instructions. The data representing the action for the instruction is sent by the transmitter if and only if the voice is verified as authorized.
In the receiver section, the signals from the transmitter section are received by the RF receiver through the receiving antenna. The controller sets the robot's direction according to the instruction given at the transmitter section, and the robot then moves in that direction.
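The transmit-and-receive decision described above can be sketched as follows. This is only an illustration: the one-byte codes, the instruction names beyond Left and Right, and the function names are assumptions, since the essay does not specify the data format sent over the RF link.

```python
# Hypothetical one-byte codes for each spoken instruction (not from the essay).
COMMAND_CODES = {
    "left": b"L",
    "right": b"R",
    "forward": b"F",
    "back": b"B",
    "stop": b"S",
}


def encode_command(instruction, speaker_authorized):
    """Transmitter side: return the byte to send, or None to send nothing."""
    if not speaker_authorized:
        return None  # unauthorized voice: the transmitter stays silent
    # Unknown instructions also yield None.
    return COMMAND_CODES.get(instruction.strip().lower())


def decode_command(code):
    """Receiver side: map a received byte back to a direction name."""
    for name, value in COMMAND_CODES.items():
        if value == code:
            return name
    return "stop"  # assumption: unrecognized data halts the robot
```

For example, `encode_command("Left", True)` gives `b"L"`, and `decode_command(b"L")` gives `"left"` on the robot side.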
In addition to controlling the robot's direction, we monitor its surroundings on the PC using a camera mounted on the robot. The camera is wireless, so the robot can travel over longer distances. The camera's receiver is connected to the PC to show the live feed from the robot.
MATLAB (Matrix Laboratory)
Keil C compilers, macro assemblers, debuggers, real-time kernels, single-board computers, and emulators support all 8051 derivatives.
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment. MATLAB stands for matrix laboratory. It was originally written to provide easy access to matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB is therefore built on a foundation of sophisticated matrix software in which the basic element is a matrix that does not require pre-dimensioning.
USES OF MATLAB
1. Math and computation
2. Algorithm development
3. Data analysis, exploration and visualization
1.2 HARDWARE COMPONENTS
1. TRANSMITTER SECTION (Components in Transmitting Section)
Components in Receiving Section
Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
PRINCIPLES OF SPEAKER RECOGNITION
Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. The figure shows the basic structures of speaker identification and verification systems.
Speaker recognition methods can also be divided into text-independent and text-dependent methods. In a text-independent system, speaker models capture characteristics of somebody's speech which show up irrespective of what one is saying. In a text-dependent system, on the other hand, the recognition of the speaker's identity is based on his or her speaking one or more specific phrases, like passwords, card numbers, PIN codes, etc.
All speaker recognition technologies (identification and verification, text-independent and text-dependent) have their own advantages and disadvantages and may require different treatments and techniques. The choice of which technology to use is application-specific. The system that we will develop is classified as a text-independent speaker identification system, since its task is to identify the person who speaks regardless of what is being said.
At the highest level, all speaker recognition systems contain two main modules: feature extraction and feature matching.
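The two modules can be illustrated with a deliberately simple sketch. It is not the method used in this project: the features (average log-energy and zero-crossing rate), the nearest-neighbour matching, and all function names are assumptions chosen only to show how extraction and matching fit together.

```python
import math


def extract_features(samples, frame_len=200):
    """Feature extraction: mean log-energy and zero-crossing rate over frames."""
    if len(samples) < frame_len:
        raise ValueError("need at least one full frame of samples")
    energies, zcrs = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log10(energy + 1e-12))  # offset avoids log(0)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcrs.append(crossings / (frame_len - 1))
    n = len(energies)
    return (sum(energies) / n, sum(zcrs) / n)


def match_speaker(features, enrolled):
    """Feature matching: name of the enrolled speaker with the closest features."""
    return min(enrolled, key=lambda name: math.dist(features, enrolled[name]))
```

Here `enrolled` is a dictionary that maps each registered speaker's name to the feature pair stored during training. A practical system would use richer features and a statistical speaker model.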
3. SPEECH SOUNDS
Speech sounds can be classified into three distinct classes according to their mode of excitation:
1. Plosive Sounds
2. Voiced Sounds
3. Unvoiced Sounds
1. PLOSIVE SOUNDS:
Plosive sounds result from making a complete closure (usually toward the front end of the vocal tract), building up pressure behind the closure, and abruptly releasing it.
2. VOICED SOUNDS:
Voiced sounds are produced by forcing air through the glottis with the tension of the vocal cords adjusted so that they vibrate in a relaxation oscillation, thereby producing quasi-periodic pulses of air which excite the vocal tract. Voiced sounds are characterized by:
• High energy levels
• Very distinct resonant and formant frequencies
The rate at which the vocal cords vibrate determines the pitch. These vibrations are periodic in time, so the excitation for voiced sounds is approximated by an impulse train. The spacing between impulses is the pitch period, the reciprocal of the pitch frequency F0.
3. UNVOICED SOUNDS:
Unvoiced sounds, also known as fricatives, are generated by forming a constriction at some point in the vocal tract (usually toward the mouth end) and forcing air through the constriction at a high enough velocity to produce turbulence. This creates a broad-spectrum noise source to excite the vocal tract.
Unvoiced sounds are characterized by:
• Lower energy levels than voiced sounds
• Higher frequencies than voiced sounds
In other words, unvoiced sounds (e.g. /sh/, /s/, /p/) are generated without vocal cord vibration. The excitation is modeled by a white Gaussian noise source. Unvoiced sounds have no pitch since they are excited by a non-periodic signal.
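The two excitation models described above (an impulse train for voiced sounds, white Gaussian noise for unvoiced sounds) can be sketched as below. The function names, the sampling rate, and the parameter values are illustrative assumptions, not part of the project.

```python
import random


def voiced_excitation(f0_hz, fs_hz, n_samples):
    """Impulse train: one unit impulse every pitch period (about fs / F0 samples)."""
    period = round(fs_hz / f0_hz)  # pitch period in samples
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def unvoiced_excitation(n_samples, sigma=1.0, seed=None):
    """White Gaussian noise: independent zero-mean samples with no periodicity."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]
```

With an assumed F0 of 100 Hz and a sampling rate of 8000 Hz, the impulses in `voiced_excitation(100, 8000, 400)` fall 80 samples apart, which is the pitch period of 10 ms.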
The speech waveform needs to be converted into digital format before it is suitable for processing in the speech recognition system. The raw speech waveform is in analog format before conversion. The conversion of an analog signal to a digital signal involves three phases: sampling, quantization, and coding. In the sampling phase, the analog signal is transformed from a waveform that is continuous in time into a discrete signal, that is, a sequence of samples that are discrete in time. In the quantization phase, each sampled value is approximated by one of a finite set of values contained in a code set. These two stages allow the speech waveform to be represented by a sequence of values, each belonging to a finite set. After sampling and quantization, the signal is coded in the coding phase, usually as binary code. These three phases need to be carried out with care, as miscalculation, under-sampling, and quantization noise all result in loss of information. The problems faced in these phases are described below.
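The three phases can be sketched in a few lines. This is a minimal illustration assuming a uniform quantizer and a signal limited to the range from -peak to +peak; the function names and the 3-bit word length used in the example are assumptions, not values from the project.

```python
import math


def sample(signal, fs_hz, duration_s):
    """Sampling: evaluate a continuous-time function at t = n / fs."""
    n_samples = int(round(fs_hz * duration_s))
    return [signal(n / fs_hz) for n in range(n_samples)]


def quantize(x, n_bits, peak=1.0):
    """Quantization: map x in [-peak, peak] to one of 2**n_bits level indices."""
    levels = 2 ** n_bits
    step = 2 * peak / levels
    index = int((x + peak) / step)
    return max(0, min(levels - 1, index))  # clip values at or beyond the peak


def encode(index, n_bits):
    """Coding: write the level index as a fixed-width binary word."""
    return format(index, "0{}b".format(n_bits))


# Example: 1 ms of a 1 kHz sine sampled at 8 kHz, coded with 3 bits per sample.
tone = sample(lambda t: math.sin(2 * math.pi * 1000 * t), 8000, 0.001)
words = [encode(quantize(x, 3), 3) for x in tone]
```

Each entry of `words` is a 3-bit string; fewer bits give a coarser approximation and therefore more quantization noise.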
According to the Nyquist theorem, the minimum sampling rate required is twice the bandwidth of the signal. This minimum sampling frequency is needed to reconstruct a band-limited waveform without error. Aliasing distortion will occur if the minimum sampling rate is not met. The figure below compares a properly sampled case with an improperly sampled case.
Aliasing Distortion Caused by Improper Sampling
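A small numerical check makes the aliasing effect concrete. The frequencies below are assumed example values: a 7 kHz tone needs a sampling rate of at least 14 kHz, so sampling it at 10 kHz produces exactly the same samples as a 3 kHz tone.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a real tone at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


def sampled_cosine(f_hz, fs_hz, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]


under_sampled = sampled_cosine(7000, 10000, 20)  # violates the Nyquist rate
low_tone = sampled_cosine(3000, 10000, 20)       # properly sampled
indistinguishable = all(
    abs(a - b) < 1e-9 for a, b in zip(under_sampled, low_tone)
)
```

Once the samples are taken, the 7 kHz tone cannot be told apart from the 3 kHz tone, which is why the signal must be band-limited before sampling.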
Speech signals are more likely to have amplitude values near zero than at the extreme peak values allowed. For example, in digitizing voice, if the peak value allowed is 1 V, weak passages may have voltage levels on the order of 0.1 V. Speech signals with a non-uniform amplitude distribution are likely to experience quantizing noise if the step size is not reduced for amplitude values near zero and increased for extremely large values. In delta modulation, this quantizing noise takes two forms: granular noise and slope overload noise. Granular noise occurs when the step size is too large for the small, slowly varying amplitudes near zero. Slope overload noise occurs when the step size is too small for the output to keep up with rapid, large changes in amplitude. To reduce both, Delta Modulation (DM) with an adaptive step size is used: the step size is reduced for amplitude values near zero and increased for extremely large amplitude values. The figure below illustrates the two types of noise.
Analog Input and Accumulator Output Waveform
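The two noise types can be reproduced with a fixed-step delta modulator. This is a minimal sketch under assumed values (step size 0.5, hand-picked inputs); the function name is hypothetical, and an adaptive scheme would additionally vary `step` from sample to sample.

```python
def delta_modulate(samples, step):
    """Fixed-step delta modulation: one bit per sample plus the accumulator output."""
    bits, staircase = [], []
    accumulator = 0.0
    for x in samples:
        bit = 1 if x >= accumulator else 0   # is the input above the estimate?
        accumulator += step if bit else -step
        bits.append(bit)
        staircase.append(accumulator)
    return bits, staircase


# Granular noise: a constant input makes the output hunt up and down by one step.
flat_bits, flat_out = delta_modulate([0.0, 0.0, 0.0, 0.0], step=0.5)

# Slope overload: the input rises by 1.0 per sample but the output only by 0.5.
ramp_bits, ramp_out = delta_modulate([1.0, 2.0, 3.0, 4.0], step=0.5)
```

For the flat input the bits alternate and the accumulator oscillates around the true value. For the ramp all bits are 1, yet the accumulator falls further behind the input at every sample.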
APPROACHES TO SPEECH RECOGNITION
Human beings are the best "machine" to recognize and understand speech. We are able to combine a wide variety of linguistic knowledge concerned with syntax and semantics and adaptively use this knowledge according to the difficulties and characteristics of the sentences. Speech recognition systems are built with the aim of matching or exceeding human performance. There are generally three approaches to speech recognition: the acoustic-phonetic approach, the pattern recognition approach, and the artificial intelligence approach. These three approaches will be explained in greater detail in the following sections.
The project "Voice over Robot Control" has been successfully designed and tested.
It has been developed by integrating the features of all the hardware components used. The presence of every module has been reasoned out and each module placed carefully, contributing to the best working of the unit.
Secondly, with the help of highly advanced ICs and growing technology, the project has been successfully implemented.
Sooner or later, robots can become more interactive and human-like by taking voice commands from people.
Voice over Robot Control can be used to analyze the speaker's state of mind at the time of speaking, such as the different emotions associated with different tones of voice.
Training air traffic controllers
Telephony and other domains