In electrical engineering and computer science, image processing is any form of signal processing for which the input is an image, such as photographs or frames of video; the output of image processing can be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it.
The red, green, and blue color channels of a photograph by Sergei Mikhailovich Prokudin-Gorskii. The fourth image is a composite.
Among many other image processing operations are:
- Euclidean geometry transformations such as enlargement, reduction, and rotation
- Color corrections such as brightness and contrast adjustments, quantization, or color translation to a different color space
- Digital compositing or optical compositing (combination of two or more images). Used in film-making to make a "matte"
- Interpolation, demosaicing, and recovery of a full image from a raw image format using a Bayer filter pattern
- Image registration, the alignment of two or more images
- Image differencing and morphing
- Image recognition, for example, extract the text from the image by using optical character recognition
- Image segmentation
- High dynamic range imaging by combining multiple images
- Geometric hashing for 2-D object recognition with affine invariance
- Computer vision
- Augmented Reality
- Face detection
- Feature detection
- Lane departure warning system
- Non-photorealistic rendering
- Medical image processing
- Microscope image processing
- Morphological image processing
- Remote sensing
Computer vision is the science and technology of machines that see. As a scientific discipline, computer vision is concerned with the theory for building artificial systems that obtain information from images. The image data can take many forms, such as a video sequence, views from multiple cameras, or multi-dimensional data from a medical scanner.
As a technological discipline, computer vision seeks to apply its theories and models to the construction of computer vision systems. Examples of applications of computer vision include systems for:
- Controlling processes (e.g., an industrial robot or an autonomous vehicle).
- Detecting events (e.g., for visual surveillance or people counting).
- Organizing information (e.g., for indexing databases of images and image sequences).
- Modeling objects or environments (e.g., industrial inspection, medical image analysis or topographical modeling).
- Interaction (e.g., as the input to a device for computer-human interaction).
Computer vision can also be described as a complement (but not necessarily the opposite) of biological vision. In biological vision, the visual perception of humans and various animals are studied, resulting in models of how these systems operate in terms of physiological processes. Computer vision, on the other hand, studies and describes artificial vision systems that are implemented in software and/or hardware. Interdisciplinary exchange between biological and computer vision has proven increasingly fruitful for both fields.
Computer vision is, in some ways, the complement to computer graphics. While computer vision obtains models and understanding from visual media, computer graphics uses models of the world to synthesize visual media. There is also a trend towards a combination of the two disciplines, e.g., as explored in augmented reality.
Applications for computer vision
One of the most prominent application fields is medical computer vision or medical image processing. This area is characterized by the extraction of information from image data for the purpose of making a medical diagnosis of a patient. Generally, image data is in the form of microscopy images, X-ray images, angiography images, ultrasonic images, and tomography images. An example of information which can be extracted from such image data is detection of tumours, arteriosclerosis or other malign changes. It can also be measurements of organ dimensions, blood flow, etc. This application area also supports medical research by providing new information, e.g., about the structure of the brain, or about the quality of medical treatments.
A second application area in computer vision is in industry, sometimes called machine vision, where information is extracted for the purpose of supporting a manufacturing process. One example is quality control where details or final products are being automatically inspected in order to find defects. Another example is measurement of position and orientation of details to be picked up by a robot arm.
Military applications are probably one of the largest areas for computer vision. The obvious examples are detection of enemy soldiers or vehicles and missile guidance. More advanced systems for missile guidance send the missile to an area rather than a specific target, and target selection is made when the missile reaches the area based on locally acquired image data. Modern military concepts, such as "battlefield awareness", imply that various sensors, including image sensors, provide a rich set of information about a combat scene which can be used to support strategic decisions. In this case, automatic processing of the data is used to reduce complexity and to fuse information from multiple sensors to increase reliability.
Artist's Concept of Rover on Mars, an example of an unmanned land-based vehicle. Notice the stereo cameras mounted on top of the Rover. (credit: Maas Digital LLC)
One of the newer application areas is autonomous vehicles, which include submersibles, land-based vehicles (small robots with wheels, cars or trucks), aerial vehicles, and unmanned aerial vehicles (UAV). The level of autonomy ranges from fully autonomous (unmanned) vehicles to vehicles where computer vision based systems support a driver or a pilot in various situations. Fully autonomous vehicles typically use computer vision for navigation, i.e. for knowing where it is, or for producing a map of its environment (SLAM) and for detecting obstacles. It can also be used for detecting certain task specific events, e. g., a UAV looking for forest fires. Examples of supporting systems are obstacle warning systems in cars, and systems for autonomous landing of aircraft. Several car manufacturers have demonstrated systems for autonomous driving of cars, but this technology has still not reached a level where it can be put on the market. There are ample examples of military autonomous vehicles ranging from advanced missiles, to UAVs for recon missions or missile guidance. Space exploration is already being made with autonomous vehicles using computer vision, e. g., NASA's Mars Exploration Rover.
Other application areas include:
- Support of visual effects creation for cinema and broadcast, e.g., camera tracking (matchmoving).
Typical tasks of computer vision
Each of the application areas described above employ a range of computer vision tasks; more or less well-defined measurement problems or processing problems, which can be solved using a variety of methods. Some examples of typical computer vision tasks are presented below.
The classical problem in computer vision, image processing and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. This task can normally be solved robustly and without effort by a human, but is still not satisfactorily solved in computer vision for the general case: arbitrary objects in arbitrary situations. The existing methods for dealing with this problem can at best solve it only for specific objects, such as simple geometric objects (e.g., polyhedrons), human faces, printed or hand-written characters, or vehicles, and in specific situations, typically described in terms of well-defined illumination, background, and pose of the object relative to the camera.
Different varieties of the recognition problem are described in the literature:
- Object recognition: one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene.
- Identification: An individual instance of an object is recognized. Examples: identification of a specific person's face or fingerprint, or identification of a specific vehicle.
- Detection: the image data is scanned for a specific condition. Examples: detection of possible abnormal cells or tissues in medical images or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation.
Several specialized tasks based on recognition exist, such as:
- Content-based image retrieval: finding all images in a larger set of images which have a specific content. The content can be specified in different ways, for example in terms of similarity relative a target image (give me all images similar to image X), or in terms of high-level search criteria given as text input (give me all images which contains many houses, are taken during winter, and have no cars in them).
- Pose estimation: estimating the position or orientation of a specific object relative to the camera. An example application for this technique would be assisting a robot arm in retrieving objects from a conveyor belt in an assembly line situation.
- Optical character recognition (OCR): identifying characters in images of printed or handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g. ASCII).
Several tasks relate to motion estimation where an image sequence is processed to produce an estimate of the velocity either at each points in the image or in the 3D scene, or even of the camera that produece the images . Examples of such tasks are:
- Egomotion: determining the 3D rigid motion (rotation and translation) of the camera from an image sequence produced by the camera.
- Tracking: following the movements of a (usually) smaller set of interest points or objects (e.g., vehicles or humans) in the image sequence.
- Optical flow: to determine, for each point in the image, how that point is moving relative to the image plane, i.e., its apparent motion. This motion is a result both of how the corresponding 3D point is moving in the scene and how the camera is moving relative to the scene.
Given one or (typically) more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case the model can be a set of 3D points. More sophisticated methods produce a complete 3D surface model.
The aim of image restoration is the removal of noise (sensor noise, motion blur, etc.) from images. The simplest possible approach for noise removal is various types of filters such as low-pass filters or median filters. More sophisticated methods assume a model of how the local image structures look like, a model which distinguishes them from the noise. By first analysing the image data in terms of the local image structures, such as lines or edges, and then controlling the filtering based on local information from the analysis step, a better level of noise removal is usually obtained compared to the simpler approaches.
Face detection is a computer technology that determines the locations and sizes of human faces in arbitrary (digital) images. It detects facial features and ignores anything else, such as buildings, trees and bodies.
Definition and relation to other tasks
Face detection can be regarded as a specific case of object-class detection; In object-class detection, the task is to find the locations and sizes of all objects in an image that belong to a given class. Examples include upper torsos, pedestrians, and cars.
Face detection can be regarded as a more general case of face localization; In face localization, the task is to find the locations and sizes of a known number of faces (usually one). In face detection, one does not have this additional information.
Early face-detection algorithms focused on the detection of frontal human faces, whereas newer algorithms attempt to solve the more general and difficult problem of multi-view face detection. That is, the detection of faces that are either rotated along the axis from the face to the observer (in-plane rotation), or rotated along the vertical or left-right axis (out-of-plane rotation),or both.
Face detection as a pattern-classification task
Many algorithms implement the face-detection task as a binary pattern-classification task. That is, the content of a given part of an image is transformed into features, after which a classifier trained on example faces decides whether that particular region of the image is a face, or not.
Often, a window-sliding technique is employed. That is, the classifier is used to classify the (usually square or rectangular) portions of an image, at all locations and scales, as either faces or non-faces (background pattern).
Face detection is used in biometrics, often as a part of (or together with) a facial recognition system. It is also used in video surveillance, human computer interface and image database management. Some recent digital cameras use face detection for autofocus. Also, face detection is useful for selecting regions of interest in photo slideshows that use a pan-and-scale Ken Burns effect.
Feature detection (computer vision)
In computer vision and image processing the concept of feature detection refers to methods that aim at computing abstractions of image information and making local decisions at every image point whether there is an image feature of a given type at that point or not. The resulting features will be subsets of the image domain, often in the form of isolated points, continuous curves or connected regions.
Definition of a feature
There is no universal or exact definition of what constitutes a feature, and the exact definition often depends on the problem or the type of application. Given that, a feature is defined as an "interesting" part of an image, and features are used as a starting point for many computer vision algorithms. Since features are used as the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. Consequently, the desirable property for a feature detector is repeatability: whether or not the same feature will be detected in two or more different images of the same scene.
Feature detection is a low-level image processing operation. That is, it is usually performed as the first operation on an image, and examines every pixel to see if there is a feature present at that pixel. If this is part of a larger algorithm, then the algorithm will typically only examine the image in the region of the features. As a built-in pre-requisite to feature detection, the input image is usually smoothed by a Gaussian kernel in a scale-space representation and one or several feature images are computed, often expressed in terms of local derivative operations.
Occasionally, when feature detection is computationally expensive and there are time constraints, a higher level algorithm may be used to guide the feature detection stage, so that only certain parts of the image are searched for features.
Where many computer vision algorithms use feature detection as the initial step, so as a result, a very large number of feature detectors have been developed. These vary widely in the kinds of feature detected, the computational complexity and the repeatability. At an overview level, these feature detectors can (with some overlap) be divided into the following groups:
Types of image features
Edges are points where there is a boundary (or an edge) between two image regions. In general, an edge can be of almost arbitrary shape, and may include junctions. In practice, edges are usually defined as sets of points in the image which have a strong gradient magnitude. Furthermore, some common algorithms will then chain high gradient points together to form a more complete description of an edge. These algorithms usually place some constraints on the properties of an edge, such as shape, smoothness, and gradient value.
Locally, edges have a one dimensional structure.
The terms corners and interest points are used somewhat interchangeably and refer to point-like features in an image, which have a local two dimensional structure. The name "Corner" arose since early algorithms first performed edge detection, and then analysed the edges to find rapid changes in direction (corners). These algorithms were then developed so that explicit edge detection was no longer required, for instance by looking for high levels of curvature in the image gradient. It was then noticed that the so-called corners were also being detected on parts of the image which were not corners in the traditional sense (for instance a small bright spot on a dark background may be detected). These points are frequently known as interest points, but the term "corner" is used by tradition.
Medical imaging is the technique and process used to create images of the human body (or parts and function thereof) for clinical purposes (medical procedures seeking to reveal, diagnose or examine disease) or medical science (including the study of normal anatomy and physiology).
As a discipline and in its widest sense, it is part of biological imaging and incorporates radiology (in the wider sense), nuclear medicine, investigative radiological sciences, endoscopy, (medical) thermography, medical photography and microscopy (e.g. for human pathological investigations).
Measurement and recording techniques which are not primarily designed to produce images, such as electroencephalography (EEG), magnetoencephalography (MEG), Electrocardiography (EKG) and others, but which produce data susceptible to be represented as maps (i.e. containing positional information), can be seen as forms of medical imaging.
The electron microscope is a microscope that can magnify very small details with high resolving power due to the use of electrons as the source of illumination, magnifying at levels up to 2,000,000 times.
Electron microscopy is employed in anatomic pathology to identify organelles within the cells. Its usefulness has been greatly reduced by immunhistochemistry but it is still irreplaceable for the diagnosis of kidney disease, identification of immotile cilia syndrome and many other tasks
Two forms of radiographic images are in use in medical imaging; projection radiography and fluoroscopy, with latter useful for intraoperative and catheter guidance. These 2D techniques are still in wide use despite the advance of 3D tomography due to the low cost, high resolution, and depending on application, lower radiation dosages. This imaging modality utilizes a wide beam of x rays for image acquisition and is the first imaging technique available in modern medicine.
Fluoroscopy produces real-time images of internal structures of the body in a similar fashion to radiography, but employs a constant input of x-rays, at a lower dose rate. Contrast media, such as barium, iodine, and air are used to visualize internal organs as they work. Fluoroscopy is also used in image-guided procedures when constant feedback during a procedure is required. An image receptor is required to convert the radiation into an image after it has passed through the area of interest. Early on this was a fluorescing screen, which gave way to an Image Amplifier (IA) which was a large vacuum tube that had the receiving end coated with cesium iodide, and a mirror at the opposite end. Eventually the mirror was replaced with a TV camera.
Projectional radiographs, more commonly known as x-rays, are often used to determine the type and extent of a fracture as well as for detecting pathological changes in the lungs. With the use of radio-opaque contrast media, such as barium, they can also be used to visualize the structure of the stomach and intestines - this can help diagnose ulcers or certain types of colon cancer.
Magnetic resonance imaging (MRI)
A brain MRI representation
A magnetic resonance imaging instrument (MRI scanner), or "nuclear magnetic resonance (NMR) imaging" scanner as it was originally known, uses powerful magnets to polarise and excite hydrogen nuclei (single proton) in water molecules in human tissue, producing a detectable signal which is spatially encoded, resulting in images of the body. MRI uses three electromagnetic fields: a very strong (on the order of units of teslas) static magnetic field to polarize the hydrogen nuclei, called the static field; a weaker time-varying (on the order of 1kHz) field(s) for spatial encoding, called the gradient field(s); and a weak radio-frequency (RF) field for manipulation of the hydrogen nuclei to produce measurable signals, collected through an RF antenna.
Photoacoustic imaging is a recently developed hybrid biomedical imaging modality based on the photoacoustic effect. It combines the advantages of optical absorption contrast with ultrasonic spatial resolution for deep imaging in (optical) diffusive or quasi-diffusive regime. Recent studies have shown that photoacoustic imaging can be used in vivo for tumor angiogenesis monitoring, blood oxygenation mapping, functional brain imaging, and skin melanoma detection, etc.
Needs main article Digital Infrared Imaging Thermography is based on the principle that metabolic activity and vascular circulation in both pre-cancerous tissue and the area surrounding a developing breast cancer is almost always higher than in normal breast tissue. Cancerous tumors require an ever-increasing supply of nutrients and therefore increase circulation to their cells by holding open existing blood vessels, opening dormant vessels, and creating new ones (neoangiogenesis). This process frequently results in an increase in regional surface temperatures of the breast. Digital Infrared Imaging uses extremely sensitive medical infrared cameras and sophisticated computers to detect, analyze, and produce high-resolution diagnostic images of these temperature variations. Because of DII's sensitivity, these temperature variations may be among the earliest signs of breast cancer and/or a pre-cancerous state of the breast.
- Linear tomography: This is the most basic form of tomography. The X-ray tube moved from point "A" to point "B" above the patient, while the cassette holder (or "bucky") moves simultaneously under the patient from point "B" to point "A." The fulcrum, or pivot point, is set to the area of interest. In this manner, the points above and below the focal plane are blurred out, just as the background is blurred when panning a camera during exposure. No longer carried out and replaced by computed tomography.
- Poly tomography: This was a complex form of tomography. With this technique, a number of geometrical movements were programmed, such as hypocycloidic, circular, figure 8, and elliptical. Philips Medical Systems  produced one such device called the 'Polytome.' This unit was still in use into the 1990s, as its resulting images for small or difficult physiology, such as the inner ear, was still difficult to image with CTs at that time. As the resolution of CTs got better, this procedure was taken over by the CT.
- Zonography: This is a variant of linear tomography, where a limited arc of movement is used. It is still used in some centres for visualising the kidney during an intravenous urogram (IVU).
- Orthopantomography (OPT or OPG): The only common tomographic examination in use. This makes use of a complex movement to allow the radiographic examination of the mandible, as if it were a flat bone. It is often referred to as a "Panorex", but this is incorrect, as it is a trademark of a specific company's equipment
- Computed Tomography (CT), or Computed Axial Tomography (CAT): A CT scan, also known as a CAT scan, is a helical tomography (latest generation), which traditionally produces a 2D image of the structures in a thin section of the body. It uses X-rays. It has a greater ionizing radiation dose burden than projection radiography; repeated scans must be limited to avoid health effects.
Medical ultrasonography uses high frequency broadband sound waves in the megahertz range that are reflected by tissue to varying degrees to produce (up to 3D) images. This is commonly associated with imaging the fetus in pregnant women. Uses of ultrasound are much broader, however. Other important uses include imaging the abdominal organs, heart, breast, muscles, tendons, arteries and veins. While it may provide less anatomical detail than techniques such as CT or MRI, it has several advantages which make it ideal in numerous situations, in particular that it studies the function of moving structures in real-time, emits no ionizing radiation, and contains speckle that can be used in elastography. It is very safe to use and does not appear to cause any adverse effects, although information on this is not well documented. It is also relatively inexpensive and quick to perform. Ultrasound scanners can be taken to critically ill patients in intensive care units, avoiding the danger caused while moving the patient to the radiology department. The real time moving image obtained can be used to guide drainage and biopsy procedures. Doppler capabilities on modern scanners allow the blood flow in arteries and veins to be assessed.
Until the early 1990s, most image acquisition in video microscopy applications was typically done with an analog video camera, often simply closed circuit TV cameras. While this required the use of a frame grabber to digitize the images, video cameras provided images at full video frame rate (25-30 frames per second) allowing live video recording and processing. While the advent of solid state detectors yielded several advantages, the real-time video camera was actually superior in many respects.
Today, acquisition is usually done using a CCD camera mounted in the optical path of the microscope. The camera may be full colour or monochrome. Very often, very high resolution cameras are employed to gain as much direct information as possible. Cryogenic cooling is also common, to minimise noise. Often digital cameras used for this application provide pixel intensity data to a resolution of 12-16 bits, much higher than is used in consumer imaging products.
2D image techniques
Image processing for microscopy application begins with fundamental techniques intended to most accurately reproduce the information contained in the microscopic sample. This might include adjusting the brightness and contrast of the image, averaging images to reduce image noise and correcting for illumination non-uniformities. Such processing involves only basic arithmetic operations between images (i.e. addition, subtraction, multiplication and division). The vast majority of processing done on microscope image is of this nature.
Another class of common 2D operations called image convolution are often used to reduce or enhance image details. Such "blurring" and "sharpening" algorithms in most programs work by altering a pixel's value based on a weighted sum of that and the surrounding pixels. (a more detailed description of kernel based convolution deserves an entry for itself).
3D image techniques
Another common requirement is to take a series of images at a fixed position, but at different focal depths. Since most microscopic samples are essentially transparent, and the depth of field of the focused sample is exceptionally narrow, it is possible to capture images "through" a three-dimensional object using 2D equipment like confocal microscopes. Software is then able to reconstruct a 3D model of the original sample which may be manipulated appropriately. The processing turns a 2D instrument into a 3D instrument, which would not otherwise exist. In recent times this technique has led to a number of scientific discoveries in cell biology.
Remote sensing is the small or large-scale acquisition of information of an object or phenomenon, by the use of either recording or real-time sensing device(s) that are wireless, or not in physical or intimate contact with the object (such as by way of aircraft, spacecraft, satellite, buoy, or ship). In practice, remote sensing is the stand-off collection through the use of a variety of devices for gathering information on a given object or area. Thus, Earth observation or weather satellite collection platforms, ocean and atmospheric observing weather buoy platforms, the monitoring of a parolee via an ultrasound identification system, Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), X-radiation (X-RAY) and space probes are all examples of remote sensing. In modern usage, the term generally refers to the use of imaging sensor technologies including: instruments found in aircraft and spacecraft as well as those used in electrophysiology, and is distinct from other imaging-related fields such as medical imaging.
There are two kinds of remote sensing. Passive sensors detect natural radiation that is emitted or reflected by the object or surrounding area being observed. Reflected sunlight is the most common source of radiation measured by passive sensors. Examples of passive remote sensors include film photography, Infrared, charge-coupled devices, and radiometers. Active collection, on the other hand, emits energy in order to scan objects and areas whereupon a sensor then detects and measures the radiation that is reflected or backscattered from the target. RADAR is an example of active remote sensing where the time delay between emission and return is measured, establishing the location, height, speed and direction of an object.
Generally speaking, remote sensing works on the principle of the inverse problem. While the object or phenomenon of interest (the state) may not be directly measured, there exists some other variable that can be detected and measured (the observation), which may be related to the object of interest through the use of a data-derived computer model. The common analogy given to describe this is trying to determine the type of animal from its footprints. For example, while it is impossible to directly measure temperatures in the upper atmosphere, it is possible to measure the spectral emissions from a known chemical species (such as carbon dioxide) in that region. The frequency of the emission may then be related to the temperature in that region via various thermodynamic relations.
The quality of remote sensing data consists of its spatial, spectral, radiometric and temporal resolutions.