Video coding standard



Since the introduction of the H.264 video coding standard, many applications have benefited from it. Applications such as satellite video transmission, High Definition Television (HDTV), and surveillance and security systems have been shown to benefit from the growth of the standard. H.264 has evolved from a monoscopic video coding standard into a more complex stereoscopic one. Today, researchers are focusing on developing a multiview video codec based on the original H.264 standard. The objectives of this research are to identify how multiview video coding operates, to examine whether a multiview video codec can increase video quality while decreasing the bit rate needed to transfer the data, and to determine whether a new algorithm can be developed that improves the encoding of the image. Perceived viewing quality must also be taken into consideration to achieve a pleasant viewing experience. To be successful and efficient, the limitations of the current approach must be resolved. In conclusion, this video coding standard can offer more than current standards are able to.

Chapter 1: Introduction


Demand for displaying video sequences stereoscopically has increased recently because many modern applications call for a "real-life experience". Stereoscopic video coding delivers three-dimensional (3-D) images or video sequences that give viewers a sensation of depth.


The first issue that this research needs to overcome is achieving a higher compression rate while still delivering acceptable visual quality (i.e. without distortion from blocking or ringing artefacts) to the viewer.

Optimising stereoscopic video coding is another issue that needs to be addressed. This will ensure that all codec parameters that influence the output quality are properly

The organisation of the experiment is also discussed in this report, and the questionnaire relating to this experiment is reviewed.

Research Objective and Questions

This research has the following objectives:

1) Carry out a complete and critical literature review on stereoscopic and multi-view video coding technology.

2) Review and understand the H.264 MVC and H.264 SVC video coding standards.

3) Develop novel algorithms for efficient, joint motion-disparity estimation in multi-view images, in both the pixel domain and the multi-resolution DWT domain.

4) Develop new coding algorithms for multi-view video that are resolution, PSNR, spatial, temporal and ROI scalable.

5) Investigate the subjective quality of the stereoscopic video sequences and images gained from the codec.

The following research questions will be investigated:

1) What is the best existing, multi-view video coding algorithm? What are its limitations and how can they be overcome?

2) How does one achieve optimal rate-distortion performance for a given multi-view camera configuration?

3) Can joint motion-disparity estimation be done in DWT domain? If so, how?

4) Can resolution, PSNR, bit-rate, ROI and temporal scalabilities be incorporated within the codec design? If so, how?

5) How can the subjective quality of decoded multi-view images be improved via selective compression of ROIs?

Chapter Overview:

Chapter 1: Introduction

This chapter discusses stereoscopic video coding using the H.264/SVC codec, along with its advantages and constraints. The main research objectives and questions are also set out in this chapter.

Chapter 2: Literature Review

This chapter focuses on previous academic research relating to the topic. The emergence of the H.264/SVC codec, existing techniques, edges and eye constraints are discussed in this chapter.

Chapter 3: Proposed System

This chapter discusses the experimental setup and the procedures conducted to achieve the desired objectives. The designed questionnaire is also reviewed.

Chapter 4: Preliminary Study Analysis

This chapter will examine the results gathered from the experiment performed.

Chapter 5: Current and Future Work

This chapter discusses the work that has been done in the second year of this research and explains the future work that will be performed in the third year.

Chapter 6: Annexure

Annex-A: Questionnaire

Annex-B: Gantt chart

Annex-C: References

Chapter 2: Literature Review & Background


This review was carried out to improve understanding and awareness of the current state of knowledge on the topic at hand. Papers and articles were gathered from numerous sources such as publications, conference proceedings and the Internet. The following is an overview of the background and existing research.

A Brief History

In 1998, the Video Coding Experts Group (VCEG) proposed a number of novel ideas to increase the coding efficiency of video compared with earlier video coding techniques such as MPEG-1, MPEG-2 and MPEG-4. These ideas were brought together within a single standard aimed at improving rate-distortion performance. This coding standard was named H.26L. In 2001, VCEG and the Moving Picture Experts Group (MPEG) joined forces to form the Joint Video Team (JVT), which has since worked on the further standardization of H.264. Since the initial standardization of H.264/AVC, the JVT has worked on new extensions of H.264/AVC in order to expand the applicability of the standard. These include Scalable Video Coding (SVC) and Multiview Video Coding (MVC). Our particular interest here is in the H.264/MVC standard and in the exploitation of scalabilities under the H.264/SVC standard.

The Emergence of the H.264/AVC Standard

H.264/AVC was developed to support video coding and is recognized for achieving a very high data compression ratio. Schäfer (2003) mentioned that the technique was developed through cooperation between the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group (MPEG). The standardization effort aimed to keep the video coding design simple while raising compression performance.

The standard covers both a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL represents the video content. The first picture is coded without reference to any other picture, using only information from within itself. This is known as "Intra" coding. Each subsequent frame is then coded using prediction from a previously coded frame. This is known as "Inter" coding. The motion data is transmitted separately and applied by the decoder to form the prediction. The NAL, on the other hand, formats the VCL data and provides header information. This makes the transmitted information more robust and flexible.

Each picture of the video is divided into smaller image blocks of either 16x16 or 8x8 samples.

H.264/AVC supports five slice coding types:

  • I Slice

Intra slice - blocks are coded independently, without reference to other blocks

  • P Slice

Predictive slice - signals are predicted using a previous block

  • B Slice

Bi-predictive slice - blocks use a weighted average of two distinct motion-compensated prediction values

  • SP Slice

Switching P - coded slice for efficient switching between video streams, similar to coding of a P slice

  • SI Slice

Switching I - similar to coding of an I slice

SP and SI slices are used when switching between bit streams that were coded at different bit rates.

To limit the effect of transmission errors, H.264/AVC has a feature called Flexible Macroblock Ordering (FMO), which assigns a pattern of macroblocks to each slice. If a slice is lost in transmission, recovery can be based on the other slices that were received error-free.

Each predicted block can then be divided into smaller blocks. For example, when a 16x16 macroblock is split into four 8x8 partitions and each of these is further split into 4x4 sub-blocks, a maximum of 16 motion vectors can be transmitted for that macroblock. The motion can therefore be described more precisely. Motion vectors may also point outside the frame boundaries.

This standard furthermore supports multi-picture motion-compensated prediction: more than one previously coded frame can be used as a reference to predict the current frame.

The transform used in H.264/AVC is a separable integer transform rather than a 4x4 DCT. It has almost the same properties as a 4x4 DCT, but because it uses exact integer operations, mismatch between the encoder and decoder in the inverse transform is eliminated. Scalar quantization is applied to the transform coefficients. The quantized coefficients are usually scanned in a zigzag order and transmitted using entropy coding methods.
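The transform and scan steps described above can be sketched as follows. This is a minimal illustration rather than codec code: it uses the well-known 4x4 forward core transform matrix and frame-mode zigzag order of H.264/AVC, leaves out the scaling that the standard folds into quantization, and the function names are our own.

```python
# 4x4 forward core transform matrix of H.264/AVC (integer entries only).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# Zigzag scan order for a 4x4 block (frame mode), given as raster indices.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Separable transform Y = CF * X * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


def zigzag(coeffs):
    """Reorder a 4x4 coefficient block into a 16-element zigzag list."""
    flat = [c for row in coeffs for c in row]
    return [flat[i] for i in ZIGZAG_4X4]


if __name__ == "__main__":
    flat_block = [[1] * 4 for _ in range(4)]  # a block with no detail
    print(zigzag(forward_core_transform(flat_block)))
```

For a completely flat block all of the energy lands in the first (DC) coefficient, which is why the zigzag order, running from low to high frequencies, tends to place the non-zero values first and the zeros last.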

The default entropy coding uses a single infinite-extent codeword table, and only the mapping to this table is adapted to the data statistics. For transmitting the quantized coefficients, Context-Adaptive Variable Length Coding (CAVLC) is applied, in which the VLC tables for various syntax elements are switched depending on previously transmitted syntax elements. Efficiency can be improved further by using Context-Adaptive Binary Arithmetic Coding (CABAC). This coding allows a non-integer number of bits to be assigned to each symbol and permits adaptation to non-stationary symbol statistics. CABAC reduces the bit rate by 10-15% compared to CAVLC when coding TV signals.

The in-loop deblocking filter helps H.264/AVC to reduce the blocking artefacts produced by compression, which improves the perceived quality of the image.

Interlaced coding is also supported in the H.264/AVC standard. Either a frame or its individual fields can be coded using the prediction processes and prediction residual coding. Deblocking filtering is then applied to all blocks in the frame.

A conformance point is a point at which a test can be made to determine whether a system meets a set of conformance criteria. Profiles and levels identify the conformance points for the H.264/AVC standard. A profile identifies the set of coding tools for generating a bitstream, while a level sets constraints on certain key parameters of the bitstream. A decoder has to support all features of a specific profile, whereas an encoder is not required to use any particular feature of a profile but must produce a conforming bitstream. The same set of level definitions is used with all profiles.

H.264/AVC has been compared with other video coding standards such as MPEG-2 Visual, H.263++ and MPEG-4 Visual, using the same video sequences to measure their performance. The results of this testing show that H.264/AVC achieves the largest bit-rate savings of these standards. This is mostly due to its highly flexible motion model and efficient context-based arithmetic coding.

With this efficient data compression, H.264/AVC can be applied in business-related and other application fields. Television broadcast over satellite can be improved in both quality and quantity. In mobile telecommunication, the cost of transmitting and receiving streaming video can be reduced because a lower bit rate is needed. Some companies in Europe have optimized this standard for their main domain. Compared to older video coding standards, the standard lowers the bit rate needed to transmit high quality video.

H.264/AVC is a major milestone in the video coding sector. This standard improves video applications and opens a new window for fresh business opportunities. As this is an open standard, meaning that anyone can apply it, the cost and price of new innovations are lowered. The application of this technology is now affordable for everyone, anywhere.

Many researchers are trying to develop coding techniques that exploit the properties of the H.264/AVC video codec to achieve a better compression rate and better output quality.

Adikari (2005) proposed sending the main and auxiliary streams to a multiplexer, where they are merged into one stream of images before being sent to a modified H.264 encoder. The streams are exploited using motion, disparity and worldline correlation. In this technique the left image is the reference frame and the right image is the predicted coded image. The results of this experiment show a significant PSNR gain when using the proposed codec.

Adikari et al. (2006) proposed a technique that exploits disparity alongside motion compensation to improve compression efficiency. This technique was able to save more than 12% of the total bit rate per frame compared to the original H.264 codec.

Akbari et al. (2007) alternatively proposed a novel technique that re-sorts the frames of sequences captured by multiple cameras to produce a single sequence. Macroblock prediction can be performed from either disparity or motion compensation. This technique performs better than simulcast coding at all bit rates. The gain was achieved by exploiting the correlation among all views rather than just between two neighbouring views.

In conclusion, H.264/AVC has been shown to deliver much better video quality at a lower bit rate compared to other video coding techniques. H.264/AVC also helps to reduce the blocking artefacts present.

Stereoscopic Image & Video Compression

Video coding techniques have become more and more efficient at producing higher quality output. Techniques such as MPEG-1, H.262/MPEG-2 Video and H.263 have shown that newer techniques produce better results, each with its own characteristics and advantages. As modern equipment (e.g. satellites) demands that more data be transmitted, video coding techniques have had to evolve as well. One of the benefits of H.264/AVC is that it is faster and requires a lower bit rate. Stereoscopic video can give viewers a sense of depth if the coding is done properly. Long exposure to stereo vision can cause discomfort in the viewer's eyes because of the strain of merging the two separate images. A balance between the coding technique used to compress the image and human visual perception is therefore needed to optimise stereoscopic video.

According to Lo (2003), illumination, lighting arrangements and object position can also provide the illusion that depth is present. However, participants take much longer to judge the stereo images presented to them. The viewers must feel that they are looking at a stereo image instead of a photograph.

Seutiens et al. (2006) and Stelmach et al. (1998) both agree that compressing images can save bandwidth during transmission. In spite of this, the quality of the image after transmission can decrease because of blocking effects and loss of spatial detail. As compression increases, quality and sharpness decrease and eye strain increases.

Ienaga et al. (2001) recommend binocular disparity as the best technique for rebuilding a 3D image. Images are sent alternately to reduce the number of transmissions. The stated benefit is that users can make better use of the original stereoscopic video compared with stereoscopic video sent over two-channel transmission. The drawback of this method is that it reduces the time of transmission and the resolution.

Symptoms of eye fatigue include eyestrain, dried tears around the eyes, pressure and ache around the eyes, difficulty in focusing, stiff shoulders and headache (Ukai and Howarth, 2007). The human eyes perceive depth from psychological cues and physiological cues. Psychological cues include perspective, overlap, shadow, size and texture, while physiological cues consist of binocular parallax (the difference between the images seen by the two eyes, caused by their different locations), accommodation and convergence. Kooi and Toet (2004) conclude that even a small amount of left/right asymmetry can affect viewing comfort. Jitter, flickering, image motion and poor resolution can also influence viewing comfort. The presence of crosstalk during viewing can also produce eye strain and headache.

Dodgson et al. (1999) reported that displays presenting a separate image to each eye were not widely accepted because they required stereoscopic glasses. A new problem therefore arises: tracking eye movement in order to display the images directly to the user. Normally, human eyes detect 3D objects through stereo parallax and movement parallax. Stereo parallax is seeing a different image with each eye, and movement parallax is seeing different images when we move our heads. Dodgson proposed a multi-view display design that enables users to move their heads freely as long as they stay within the viewing area. Nevertheless, it is difficult to build a display with many views and to generate all views at once.

Video coding on a monoscopic basis, however, lacks depth perception. Depth perception is required in applications such as High Definition Television, gaming consoles, and security and surveillance, as it can give viewers more detail about the image. This requires at least two cameras placed parallel to each other. According to Puri (1997), this technique requires more processing power because two images from two cameras have to be processed. The images need to be checked for redundancies to avoid sending unnecessary information that might slow down transmission.

Balasubramaniyam et al. (2006) mentioned that an adjacency constraint between regions has commonly been used in region matching for stereoscopic images. Because stereoscopic images may contain overlapping objects, this constraint is not accurate and might affect the outcome. The authors proposed a new algorithm based on relative position constraints between regions. Each selected region is compared and matched until all regions have been compared.

The efficiency of compression and of video streaming over the Internet is also an important factor in implementing stereoscopic video. Aksaya (2006) proposes a content-adaptive stereo video coding scheme that exploits inter-view correlations and represents only one of the views temporally. This was done by making use of inter-view and psycho-visual redundancy. The exploitation reduces bandwidth requirements. The test resulted in zero packet loss.

General Theory & Existing Techniques

H.264/AVC has more attributes that help it to produce higher quality images at a lower bit rate compared to previous video coding standards.

The depth perception produced by H.264/Multiview Coding arises because, when each eye is presented with the corresponding image from the two views that form a stereoscopic video, the viewer experiences the sensation of three-dimensional (3-D) vision (Yang et al., 2006). However, long-term exposure can cause dizziness. Steps can be taken to reduce this, such as interchanging the images within a scene: users do not notice a difference in quality, but the stress on the eyes is reduced (Balasubramaniyam, 2006).

Redundancy is also a major problem when applying multiview video coding. It causes the encoder to transmit unnecessary data to the decoder, which increases the bit rate and the amount of computation needed without any difference in quality.

As multiview video coding is still new, we plan to implement existing techniques that are currently applied to monoscopic video coding. Researchers have proposed different methods to tackle this obstacle. One proposal applies a hybrid coding technique that combines motion-compensated prediction, a 2D DCT transform and exploitation of inter-view redundancy. Another applies the existing motion/disparity estimation technique from the H.264 encoding environment. Yang (2006) proposes predicting the motion vectors and prediction modes of the macroblocks of the predicted frame. The finding of this proposal is that the exploitation of binocular redundancy is not fully optimized.

An MPEG-4 compatible codec for multiview video coding combines MPEG-4 coding for one view with joint disparity and motion compensation for the other. Yang (2006) applies this technique to five-view encoding and evaluates it against four available prediction structures. The result was that the proposed encoder produced higher quality images at the same bit rate as other video codecs.

A separate edge-preserving method for joint disparity and motion estimation in stereoscopic video sequences is also discussed. A block-based joint estimation algorithm was used to calculate the differences between the given stereoscopic images. In order to preserve edges, Sobel edge values are incorporated into the algorithm. Another algorithm was incorporated to identify textured and homogeneous regions so that they can be regularized separately. Experimental results showed that the proposed algorithm is better than the existing algorithm (Yang, 2004).

One of the applications of H.264/MVC is tracking objects in multiview images from a set of cameras. An efficient approach to tracking objects between overlapping and non-overlapping camera images was established. The background was removed to uncover the moving objects. Temporal alignment was carried out between the video sequences to account for the different processing rates of the cameras. Experiments were performed to confirm the theory and helped to enhance the performance of object tracking and trajectory prediction. The results are based on multiple outdoor surveillance videos (Black, 2002).

In order to display the encoded images, scalable video coding must also be taken into consideration. As applications differ in screen resolution (e.g. mobile phones have a lower resolution than HDTV) and computing power, the coding has to be optimized for these differences to support each application (Barbarien, 2004).

Motion Estimation Techniques

Motion estimation plays a major role when encoding a moving object. The principle of motion estimation is to estimate how regions within a scene move from frame to frame (Edirisinghe, 2006). By calculating the prediction error of a selected region, that is, by subtracting the best match from the original, we can exploit the redundancy that exists. Many motion estimation techniques have been proposed, but which method is best depends on the nature of the video.
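As a minimal sketch of this principle, the following example performs an exhaustive (full search) block match and then forms the prediction error for the best match. The sum of absolute differences (SAD) cost, the frame representation as nested lists of pixel values, and all function names are assumptions made for illustration and are not taken from the codec used in this research.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between an n x n block of the current
    frame at (cx, cy) and an n x n block of the reference frame at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, cx, cy, n, search_range):
    """Test every displacement within +/- search_range and return the
    lowest-cost candidate as (cost, dx, dy), or None if none fits."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def prediction_error(cur, ref, cx, cy, dx, dy, n):
    """Residual block: the original block minus its best match."""
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(n)] for j in range(n)]
```

When the match is good, the residual block is close to zero and is therefore much cheaper to code than the original block. The cost of this exhaustive search grows with the square of the search range, which is the motivation for the reduced-search techniques reviewed in the following paragraphs.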


For depth to be perceived, edges play a major role in determining the borders of each object. To reduce redundancy among stereoscopic images, we can exploit the fact that the two images of a pair are similar once disparities are compensated. Kim et al. (1997) proposed an algorithm called adaptive directional limited search (ADLS). Although other algorithms can reduce the computational complexity, their results are still worse than those of a full search algorithm, so full search is still widely used in many coding schemes. ADLS takes advantage of the homogeneous directional property of disparity vectors and the high correlation between neighbouring blocks. The algorithm starts from a pre-determined position, which is evaluated by restricted directional searching, and then performs a limited search. As a result, the algorithm decreases the computational load and improves performance compared with the full search algorithm.

Kim and Sohn (1999) proposed a new edge-preserving directional regularization technique that smooths the disparity field in smooth regions and preserves edges at object boundaries. However, this technique might over-smooth the object boundaries. The researchers overcame this by calculating the mean absolute error (MAE). If the value is smaller than a predetermined threshold, the block is considered to be a smooth area.

One technique that might recover a damaged edge was proposed by Lee et al. (2007). The technique repairs the damaged area by replacing it with a similar area taken from the previous frame. A block matching algorithm (BMA) uses the neighbouring blocks of the damaged block and searches for the best matching block in the previous frame. This technique achieves good concealment quality even when the error rate is high or the object moves quickly. However, a large amount of computation is needed to find the best-match block.


In this chapter, we have discussed some basic terms used in this report. The facts and figures mentioned above give us a general idea on how to proceed with this research.

Chapter 3: Proposed System


The aim of the proposed research is to investigate the perceived visual quality of compressed stereoscopic video sequences that have been passed through an H.264/SVC video codec. The H.264/SVC codec used at the beginning was obtained from a previous researcher who proposed a parallel video sequence technique.

This research follows seven steps in order to achieve its objective.

Parallel Video Sequence

In this technique, both the left and right video sequences are fed into the codec, alternately switching from one to the other. A reference frame is temporarily stored to act as an indication of the original if any part of the bit stream is lost during the encoding/decoding process. By doing this, we are able to compress either one of the left/right video sequences at a higher rate while still obtaining good quality output.
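The alternating feed can be sketched as a simple interleaving of the two views. This is an illustration only: the left-first ordering, the tuple layout and the function names are assumptions, since the report does not specify how the codec orders or labels the frames.

```python
def interleave_views(left_frames, right_frames):
    """Merge two equally long views into a single sequence
    L0, R0, L1, R1, ... with each frame tagged by view and time index."""
    if len(left_frames) != len(right_frames):
        raise ValueError("both views must contain the same number of frames")
    merged = []
    for t, (left, right) in enumerate(zip(left_frames, right_frames)):
        merged.append(("L", t, left))
        merged.append(("R", t, right))
    return merged


def split_views(merged):
    """Inverse operation, as a decoder would need: recover the two views."""
    left = [frame for view, _, frame in merged if view == "L"]
    right = [frame for view, _, frame in merged if view == "R"]
    return left, right
```

Tagging each frame with its view and time index keeps the original pairing recoverable after decoding, so the left and right sequences can be re-synchronised for display.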

Video Subjective Quality Optimization

In order to achieve the main objective of this research, subjective video quality must be examined. A set of video sequences that have been encoded with the H.264/SVC video codec is shown to participants. The viewers are then required to answer a questionnaire prepared to gather information about their experience after the experiment. The parameters available in the H.264/SVC codec include group of pictures, rate, distortion, CPU cycles, memory and subjective quality, which can be manipulated in order to achieve the desired output quality and compression ratio.

The questionnaire is used to collect the data needed to answer the research questions. Data from Intel's VTune Performance Analyzer are also gathered to measure CPU usage. These steps are necessary to further understand human-machine interaction.

Although this research focuses on the H.264 coding technique, it requires subjective human feedback in order to answer the research questions. This simple questionnaire was designed to gather as much information as possible about the viewer's experience of the video sequences. The data can be used to determine how far the viewer's experience affects the experiment results.


The subjective quality experiment was performed in order to determine whether the theory discussed can be applied in the real world. The subjects are asked a series of specific questions about whether they feel any discomfort while watching the selected video sequences, which are shown in a controlled environment. The subjects are also asked about the general quality of the stereoscopic video sequences as a series of videos is increasingly compressed at controlled intervals.

A copy of the questionnaire is attached in Annex-A. The questionnaire is divided into 4 parts, which the researcher believes will help to obtain more specific information from the subjects.

Part 1: Viewer's background

This section asks about the viewer's general experience of stereoscopic video coding. Viewers are also asked about any visual aids they wear, in order to investigate whether there is any relationship between wearing visual aids and eye strain.

Part 2: General Video Quality

A series of 15 video sequences is shown to the subjects. There are 3 different video sequences (blood cells, football and helicopter), each of which is shown 5 times. The viewers then mark each video sequence on a scale from 1 (very bad) to 5 (very good). Each right video sequence is uncompressed, while the left video sequences are compressed at intervals of 10 in the quantisation parameter. If viewers experience eye strain at any time while the videos are shown, they are able to mark it down. The viewers are also asked to mark any affected areas that appear in the video sequences. This section asks the viewers whether they noticed any flickering or smearing in the video sequences. They are also asked to rate how far these defects irritate them and distract them from a pleasant viewing experience.

Part 3: Eye strain

This section asks the viewers whether they experienced any eye strain during the experiment and, if so, which eye was affected.

Part 4: Improvements

This section asks the subjects for any suggestions that might improve the viewing experience for future developments and fine-tuning the questionnaire so more valuable information can be extracted.

Previous Work

We decided to improve on our previous work on stereoscopic video coding, which focused on alternating the left/right video sequences and in which subjective quality was not fully analysed. That research was performed using the current video codec, and we managed to achieve a compression rate of up to 50%.

Proposed Method

The left and right video sequences are fed into the codec. The compression rate for the left video sequence is set to normal, whereas the compression rate for the right sequence is decreased by 10 intervals. The images captured by these cameras tend to contain redundancy between the reference and the predicted frames. The redundancy can be classified as temporal, binocular and worldline redundancy. Temporal redundancy is the redundancy between the first and second frames of the reference sequence. Binocular redundancy is the redundancy between the reference frame and the prediction frame. Worldline redundancy, on the other hand, exists between the first frame of the reference sequence and the second frame of the prediction sequence. As the difference between the left and right sequences is quite large, we only selected the redundancy between the left-middle sequences or the right-middle sequences.
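The three redundancy classes can be illustrated with a small helper that labels the relationship between two frames. Identifying a frame by a (view, time index) pair, and the function name itself, are assumptions made for this sketch and do not come from the codec.

```python
def redundancy_type(frame_a, frame_b):
    """Classify the redundancy between two frames, each identified by a
    (view, time_index) pair such as ("left", 0)."""
    (view_a, time_a), (view_b, time_b) = frame_a, frame_b
    same_view = view_a == view_b
    same_time = time_a == time_b
    if same_view and same_time:
        return None          # the same frame, nothing to classify
    if same_view:
        return "temporal"    # same view, different time instants
    if same_time:
        return "binocular"   # different views, same time instant
    return "worldline"       # different views and different time instants
```

For example, redundancy_type(("left", 0), ("right", 1)) returns "worldline", matching the definition above of redundancy between the first frame of the reference sequence and the second frame of the prediction sequence.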

The encoded and decoded videos are then shown to 10 subjects in a pilot experiment, to gather as much information as possible before a major roll-out involving 50 participants is conducted. The feedback gathered is then used to improve the experiment settings as well as the coding technique.

We also need to identify where in the code the Peak Signal-to-Noise Ratio (PSNR) and the bit rate are calculated. PSNR is defined as 'a measure of the similarity of an image that is computed by measuring the pixel difference between the original image and the compressed image' (Mulopulus et al., 2003). As we configure the encoding settings, PSNR and bit rate will help us to identify any changes that occur.
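For reference, PSNR is commonly computed from the mean squared error (MSE) between the two images as 10 log10(peak^2 / MSE). The sketch below assumes 8-bit samples (peak value 255) stored as nested lists of equal size; it illustrates the usual definition and is not the code found in the codec.

```python
import math


def mse(original, compressed):
    """Mean squared pixel difference between two equally sized images."""
    total = 0
    count = 0
    for row_o, row_c in zip(original, compressed):
        for p_o, p_c in zip(row_o, row_c):
            total += (p_o - p_c) ** 2
            count += 1
    return total / count


def psnr(original, compressed, peak=255):
    """Peak Signal-to-Noise Ratio in dB; infinite for identical images."""
    error = mse(original, compressed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

A higher PSNR means that the compressed image is numerically closer to the original. It does not necessarily follow that viewers perceive the quality as better, which is why the subjective tests are also needed.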

We have selected three video sequences with different characteristics for the experiment. "Blood Cell" is a 160 x 240 stereoscopic video sequence, while "Football" and "Helicopter" are 640 x 480 stereoscopic video sequences. These video sequences were selected because each has its own characteristics. These differences are ideal for determining whether the movement of the object or of the camera has any effect on the video quality and bit rate.

As the participants' level of expertise in stereoscopic video coding varies, participants are specifically told to focus on the edges of the images as well as on the general feel of the videos. The original video sequences are also shown so that a quick comparison can be made between the compressed and the uncompressed video sequences.

Research Novelty

Based on the literature review, the research performed to date focuses mostly on PSNR and rate distortion. Little has been done to investigate the relationship between compressing video sequences and the perceived quality from a human point of view. The novelty of this research is therefore as listed below:

a) Involving human subjects to determine acceptable video quality when video is compressed up to a certain threshold

b) Differing in terms of experimental setup, with the tests performed in a real-life situation (i.e. lighting and surroundings that mimic real-world conditions)

c) Optimising the parameters that contribute to compression while still maintaining the desired output quality

d) Developing new coding algorithms for multiview video that are resolution, PSNR, spatial, temporal and ROI scalable

Initial Design and Hardware

Figure 3.1 below displays the block diagram of the initial design and hardware of this research. There are three main blocks: Human, Questionnaire and Redesign Output. In the block diagram, the human is the controller of the whole process. As viewers take part in the experiment, the eye delivers messages to the brain, which analyses whether the quality shown is adequate to the viewer's perception. The feedback gained is analysed, any adjustments (from the codec point of view) are made, and the newly amended video is shown to the viewers again.


In conclusion, this closed-loop process continues until the viewers arrive at a desirable quality output at a certain compression rate. It can be said that this is a two-way interaction that needs significant improvement to answer the research questions and meet the objectives. The results and findings of the experiments above will be discussed in Chapter 4.
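One possible reading of this closed loop can be sketched in Python as follows. The sketch assumes that the adjusted codec setting is the quantization parameter (QP), that the loop starts at heavy compression and relaxes it step by step, and that a mean viewer rating of 3.0 on a five-point scale counts as acceptable. The callback `collect_mean_rating` is hypothetical: it stands in for encoding the video at a given QP, showing it to the viewers and averaging their questionnaire scores.

```python
def closed_loop_search(start_qp, min_qp, step, collect_mean_rating, target=3.0):
    """Lower the QP until viewers rate the quality as acceptable.

    collect_mean_rating(qp) is a hypothetical callback that returns the
    mean viewer rating for a video encoded at that QP.
    Returns (qp, rating) for the first acceptable setting, or None if no
    setting in the range reaches the target (target value is an assumption).
    """
    if step <= 0:
        raise ValueError("step must be positive")

    qp = start_qp
    while qp >= min_qp:
        rating = collect_mean_rating(qp)  # one pass through the viewers
        if rating >= target:
            return qp, rating  # highest compression tried that is acceptable
        qp -= step  # relax the compression and show the video again
    return None
```

Starting from the most compressed setting means the first acceptable result is also the most compressed acceptable setting among those tried, which matches the aim of finding a desirable quality at a given compression rate.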

Chapter 4: Preliminary Study Analysis


This chapter explores in detail the preliminary study of the relationship between human quality perception and video quality. The study involves 10 participants with varying levels of video compression experience. The participants are required to fill in a questionnaire covering quality, eye strain, affected areas, movement, flicker and smearing.

Scope of Experiment

This initial study investigates human perception of quality in relation to video compression. Firstly, the video is shown to the participants on a 17" CRT monitor in conjunction with CrystalEyes stereoscopic viewing glasses. We tried to replicate the experiment conditions stated in Recommendation ITU-R BT.500, Methodology for the subjective assessment of the quality of television pictures.

Secondly, the video sequences are compressed using the Parallel Virtual Machine setup. The left sequences are kept at the original rate and only the right sequences are compressed. Both left and right sequences are fed through the encoder, as the right sequence uses the left sequence as its reference. The codec settings are kept the same except for the Quantization Parameter (QP) value. This results in compressed videos with different Peak Signal-to-Noise Ratio (PSNR) values, and we then calculate the compression achieved.
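The text does not state how the compression is calculated. A minimal sketch, assuming the usual definitions (compression ratio as original size divided by compressed size, and bit rate as compressed bits per second of video), could look like this in Python. The function names and the choice of bytes and kilobits per second as units are assumptions made for illustration.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Return how many times smaller the compressed sequence is."""
    if original_bytes <= 0 or compressed_bytes <= 0:
        raise ValueError("sizes must be positive")
    return original_bytes / compressed_bytes


def bit_rate_kbps(compressed_bytes, frame_count, frames_per_second):
    """Return the average bit rate of a compressed sequence in kbit/s."""
    if frame_count <= 0 or frames_per_second <= 0:
        raise ValueError("frame count and frame rate must be positive")
    duration_seconds = frame_count / frames_per_second
    return compressed_bytes * 8 / duration_seconds / 1000
```

Repeating both calculations for each QP value gives a compression figure and a bit rate to set against the corresponding PSNR.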

Finally, the feedback gathered from the questionnaire is analysed. As quality, eye strain, movement, flicker and smearing can only be judged by humans, this part of the experiment is vital for answering the research questions. Participants are also asked to identify the affected areas if they notice any distortion while viewing the video sequences.

Results and Analysis

Ten viewers participated in this preliminary study. The participants were aged between 22 and 37. Quality was rated on a scale from 1 (very bad) to 5 (very good). To allow a constructive analysis, the mean video quality for each sequence was calculated from the viewers' responses.
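The per-sequence mean described above can be sketched in Python as follows. The sketch assumes the questionnaire responses are held as a dictionary mapping each sequence name to a list of ratings; that data layout and the function name are illustrative assumptions.

```python
from statistics import mean


def mean_quality(ratings_by_sequence):
    """Return the mean quality rating for each video sequence.

    ratings_by_sequence maps a sequence name to the list of viewer
    ratings on the scale 1 (very bad) to 5 (very good).
    """
    result = {}
    for name, ratings in ratings_by_sequence.items():
        if not ratings:
            raise ValueError(f"no ratings for sequence {name!r}")
        if any(not 1 <= r <= 5 for r in ratings):
            raise ValueError(f"rating outside the 1 to 5 scale for {name!r}")
        result[name] = mean(ratings)
    return result
```

For example, `mean_quality({"Football": [4, 3, 5]})` gives a mean of 4 for that sequence; these ratings are placeholders and not data from the study.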

Blocking and Blurring Artefacts

The presence of blocking and blurring artefacts in the video sequence directly affects whether the desired quality level is reached. Lowering the bit rate during compression caused these artefacts to appear in the background of the video sequences.

Presence of Eye Strain

Some of the participants felt mild eye strain during the experiment. This mostly involved participants who wore visual aids such as eyeglasses or contact lenses together with the CrystalEyes stereoscopic display glasses. This finding will be investigated further to gain an in-depth understanding of the link between wearing visual aids and the presence of eye strain.

Affected Areas

Participants are aware of the affected areas when we show them highly compressed video sequences. The higher the compression, the more noticeable the affected areas become. Despite the presence of artefacts, the viewers still manage to view all of the video sequences stereoscopically (3D effect).

Movement, Flicker and Smearing

In the preliminary experiment, the viewers agreed that the movement in the videos was excellent, but they found flicker and smearing artefacts very annoying, especially when the highly compressed videos were played. Flicker and smearing also contribute to eye strain.

The data collected from this experiment were analysed using one-way Analysis of Variance (ANOVA), which is used to test for differences between the means of independent groups.
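For reference, the F statistic at the heart of a one-way ANOVA can be computed as in the Python sketch below. How the ratings are grouped is not stated in the text; grouping them by compression level (one list of ratings per QP value) is an assumption made here for illustration. The sketch returns the F statistic and its degrees of freedom only; obtaining a p-value additionally requires the F distribution, which a statistics package would provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups is a list of lists, one list of observations per group.
    """
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")

    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F value relative to the critical value for the given degrees of freedom suggests that at least one group mean differs from the others.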

Chapter 5: Current and Future Work


This chapter discusses the work performed during the second year of this research as well as the future work to be conducted in the third year.

Second Year

We managed to set up the experiment with the help of Dr. Balamuralii Balasubramaniyam. His experience of setting up the Parallel Virtual Machine (PVM) during his own research was vital in allowing us to resume the experiment successfully. A stereoscopic questionnaire was also designed to obtain the necessary feedback from the viewers about the experiment.

We were able to run the preliminary experiment with 10 participants and obtain all of the crucial data needed. As the experiment involves human participants, we were required to obtain health and safety approval from the Ethical Advisory Committee.

Future Work

For the final year, we propose to focus on creating new stereoscopic coding algorithms with optimised codec variables. The codec parameters will be modified until an optimised set of parameters is achieved. This process will be repeated until a significant video compression ratio is attained with minimum loss of quality.

We are also trying to extend the current H.264/SVC codec to accommodate MVC coding. The current algorithm will be applied to MVC to investigate whether it achieves the desired objectives.

A detailed time frame for this research is attached in Annex B.