H.264 parameter optimization

AIM

To investigate the choice of optimal parameters for the H.264 encoder in order to achieve a more network-friendly video representation for conventional and non-conventional applications such as video streaming, live broadcast and video storage.

BACKGROUND

H.264 is an international video coding standard jointly developed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) under the Joint Video Team (JVT) partnership. It is a block-oriented, motion-compensation-based codec standard in wide use today.

The H.264 video compression standard was developed to provide high-quality video at lower bitrates (roughly half those of earlier standards such as MPEG-2 and H.263) without increasing complexity to the point where it becomes difficult to implement. It was also intended to be flexible enough to serve a wide variety of applications, network types and systems, including low- and high-bitrate/resolution video, broadcast, IP packet networks, DVD storage and multimedia telephony systems.

The design structure of H.264 is shown in fig 1 below, which shows the typical video encoding and decoding chain. This structure gives it the flexibility and customizability needed for new applications that may be deployed over existing and future networks. This is achieved with the Video Coding Layer (VCL), which is designed to represent the video content efficiently, and the Network Abstraction Layer (NAL), which formats the VCL representation and provides header information in a manner appropriate for delivery by transport layer protocols over various data link and physical layers.

Relative to earlier video coding standards, the high coding efficiency of H.264 is achieved through enhancements to the tools used to predict the content of the picture being encoded. These include variable block-size motion compensation (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4), quarter-sample-accurate motion compensation, weighted prediction, spatial prediction from the edges of neighboring blocks for intra coding, and entropy coding using CABAC and CAVLC.

The high compression efficiency of H.264 has increased demand for it in video conferencing, mobile TV broadcasting, live video streaming, Multimedia Messaging Service, and optical and magnetic storage devices such as DVDs. Its main drawback, complexity, can be effectively mitigated by selecting optimal parameters for each application's needs.

EXPERIMENTAL PROCEDURE

The experiment was carried out using a configuration file, three video sequences, an H.264 encoder executable and a laptop PC. The configuration file was used to set the parameters used by H.264 for encoding the video sequences.

The encoder used for this experiment was the JM (Joint Model) H.264 encoder version 10.2 FRExt, with the baseline configuration file. The three QCIF video sequences chosen for the experiment were akiyo.qcif (300 frames), container.qcif (300 frames) and coastguard.qcif (300 frames). The computer used had the following specification: HP Pavilion dv6664en, Windows Vista Home Premium (32-bit), Intel® Core™ 2 Duo CPU T750 @ 2.00GHz, 2.00GB RAM. To reduce error in the measured encoding times, most CPU- and memory-intensive applications, such as antivirus, Outlook and Messenger, were turned off during the experiment.

The experiment was then carried out following the steps outlined below:

I. Encoded the three video sequences with the H.264 encoder using the default parameter settings in the configuration file and obtained the Rate-Distortion curve for each sequence. This is the reference curve.

  • The configuration file's input file name was edited to the sequence name.
  • The target bitrate was varied using 100, 200, 300, 400, 500 and 600 kbps.
  • The encoder was run and its output logged to a file for each set.
  • PSNR and bitrate were recorded for each sequence. The luminance component of the video sequence was used, given that the human eye is more sensitive to it than to the chroma components. PSNR = SNR_Y, measured in decibels (dB).
  • The total encoding time for each set was recorded.
  • A Rate-Distortion (R-D) curve of PSNR (dB) against bitrate (kbps) was plotted for each sequence.

II. Repeated step (I) above with one or more basic parameters altered in the base configuration file. The following parameters were used:

  1. CABAC
  2. RDO (Rate Distortion Optimization) ON or OFF
  3. B-pictures
  4. Multiple Reference Frames
  5. Fast and Low Complexity Motion Search
  6. Large and small motion estimation block sizes
  7. Motion estimation search window size (Search range)
  • The encoder was switched from the Baseline to the Main profile for some parameters, because these parameters are not available in the Baseline profile.
  • PSNR and bitrate were recorded for each set of parameters, together with the total encoding time.

III. Plotted R-D curves for all values obtained in step (II). Each set of graphs contains the R-D curves for each altered parameter plus the reference curve as a base for comparison.
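
The luminance PSNR recorded in step I follows the usual definition for 8-bit video, PSNR = 10 log10(255^2 / MSE), where MSE is the mean squared error between the original and decoded luma samples. The JM encoder reports SNR_Y itself; the following is only a minimal Python sketch of the calculation, assuming frames are supplied as equal-sized lists of rows of 8-bit luma values. The function name luma_psnr and the synthetic frames are illustrative, not part of the experiment.

```python
import math

def luma_psnr(original, decoded, peak=255):
    """Return the luminance PSNR in dB between two equal-sized 8-bit luma frames."""
    if len(original) != len(decoded) or any(
        len(a) != len(b) for a, b in zip(original, decoded)
    ):
        raise ValueError("frames must have identical dimensions")
    squared_error = 0
    samples = 0
    for row_o, row_d in zip(original, decoded):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            samples += 1
    if samples == 0:
        raise ValueError("frames must not be empty")
    mse = squared_error / samples
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)

# Synthetic QCIF-sized (176x144) luma frames, used only to exercise the function.
flat = [[128] * 176 for _ in range(144)]
off_by_one = [[129] * 176 for _ in range(144)]
psnr_db = luma_psnr(flat, off_by_one)
```

A sequence-level figure would normally be the average of the per-frame values, which is an assumption here about how the logged SNR_Y was summarised.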

EXPERIMENTAL RESULTS AND ANALYSIS

The experimental steps and results are now discussed in more detail. For each experiment, this covers the parameters changed in the configuration file, the tables of values obtained, the R-D plots and the observations. The encoding times are used only in the commentary on the observations; they are not shown on the graphs.

EXPERIMENT 1: Running the Encoder with Default Parameter settings (Reference)

In this experiment, the encoder was run using default parameters. The following settings were changed in the configuration file.

InputFile = “akiyo.qcif”, “container.qcif”, “coastguard.qcif” (selecting the sequence to be encoded)

RateControlEnable = 1 (to turn on rate control)

Bitrate = 100000, 200000, 300000, 400000, 500000, 600000 (Target bitrates used)
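
The eighteen reference runs implied by these settings (three sequences at six target bitrates) can be enumerated programmatically. The sketch below only builds the parameter overrides as text. The helper names are hypothetical, and how the overrides reach the encoder (for example by editing them into the configuration file, as was done here) is left open.

```python
from itertools import product

SEQUENCES = ["akiyo.qcif", "container.qcif", "coastguard.qcif"]
TARGET_BITRATES_BPS = [100000, 200000, 300000, 400000, 500000, 600000]

def reference_runs():
    """Yield one dict of configuration overrides per (sequence, bitrate) pair."""
    for sequence, bitrate in product(SEQUENCES, TARGET_BITRATES_BPS):
        yield {
            "InputFile": f'"{sequence}"',
            "RateControlEnable": 1,  # turn on rate control
            "Bitrate": bitrate,      # target bitrate in bits per second
        }

def render_overrides(overrides):
    """Format overrides as 'Name = Value' lines, matching the listing above."""
    return "\n".join(f"{name} = {value}" for name, value in overrides.items())

runs = list(reference_runs())
first_run_text = render_overrides(runs[0])
```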

The result is tabulated below, followed by the rate distortion curves.

Table 1a Readings obtained for default parameter settings

Graph 1.0 R-D curve (Reference) for Akiyo.qcif, container.qcif and coastguard.qcif

Observation

The three video sequences show different levels of Rate-Distortion performance. Akiyo.qcif has the highest PSNR at each bitrate, followed by container.qcif. This can generally be explained by the fact that encoding video sequences with low motion content produces compressed sequences with considerably higher luminance SNR values. Akiyo.qcif is a news broadcast sequence containing only head and mouth motion, while container.qcif and coastguard.qcif contain more motion (mostly motion vectors arising from the motion estimation and compensation of the water current and of the ship and boat in each video respectively).

EXPERIMENT 2: Running encoder with Rate Distortion Optimization (RDO) Parameter

This parameter enables Lagrangian-based rate-distortion-optimized mode decision. RDO can be run in three modes, as shown below:

RDOptimization = 0 (default) # rd-optimized mode decision

# 0: RD-off (Low complexity mode)

# 1: RD-on (High complexity mode)

# 2: RD-on (Fast high complexity mode - not work in FREX Profiles)

The H.264 encoder runs RDO in Low Complexity Mode by default; the investigation was therefore carried out with RDO turned on, first in High Complexity Mode and then in Fast High Complexity Mode.
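
Lagrangian mode decision picks, for each macroblock, the coding mode that minimizes the cost J = D + lambda * R, where D is the distortion, R is the number of bits needed and lambda is a multiplier that sets the trade-off between the two. The sketch below illustrates the principle only. The candidate modes and their distortion and rate figures are made-up numbers for illustration, not measurements from this experiment or values taken from the JM encoder.

```python
def best_mode(candidates, lagrange_multiplier):
    """Return the mode name with the lowest cost J = D + lambda * R."""
    def cost(item):
        _, (distortion, rate_bits) = item
        return distortion + lagrange_multiplier * rate_bits
    name, _ = min(candidates.items(), key=cost)
    return name

# Hypothetical (distortion, rate in bits) pairs for one macroblock.
CANDIDATES = {
    "SKIP": (400.0, 1),
    "16x16": (220.0, 24),
    "8x8": (150.0, 60),
    "4x4": (120.0, 110),
}

low_lambda_choice = best_mode(CANDIDATES, 0.5)    # bits are cheap: favor low distortion
mid_lambda_choice = best_mode(CANDIDATES, 2.0)
high_lambda_choice = best_mode(CANDIDATES, 10.0)  # bits are expensive: favor low rate
```

With these illustrative numbers, a small multiplier selects the mode with the lowest distortion, while a large one selects the cheapest mode in bits. Evaluating the true rate and distortion of every candidate is what makes the high complexity mode slower.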

The following input parameters were used in the configuration file in addition to the basic ones used in experiment 1.

i. RDO in High Complexity Mode:

RDOptimization= 1

ii. RDO in Fast High Complexity Mode:

RDOptimization= 2

Tabulated results and graphs for the 3 video sequences are shown below (Note: the reference from exp 1 is included for comparison):

Note: RDO-HC (Rate-Distortion Optimization High Complexity mode)

RDO-FHC (Rate-Distortion Optimization in Fast High Complexity mode)

Table 2a Tabulated result for experiment 2 using sequence akiyo.qcif

Observation

From the results above it is evident that using Rate Distortion Optimization gives a slightly improved luminance SNR_Y (better quality video sequences) compared to the reference. It can also be inferred from the results that High Complexity Mode produces a slightly better PSNR than Fast High Complexity Mode, but at twice the encoding time.

I would not recommend RDO for applications such as live streaming or live broadcast, where encoding time needs to be minimal. For entertainment applications, where quality is in demand and encoding time can be traded off against it, RDO could be used.

EXPERIMENT 3: Investigating CABAC against CAVLC Parameter

CABAC (Context-Adaptive Binary Arithmetic Coding) is a method of arithmetic coding in which probability models are updated based on the statistics of previously coded symbols in an H.264 bit stream. CAVLC (Context-Adaptive Variable Length Coding) is an entropy coding method specially designed for coding transform coefficients, in which different sets of variable-length codes are chosen depending on the statistics of recently coded coefficients (context adaptation). CABAC offers better compression performance than CAVLC through the use of arithmetic coding (instead of variable length coding) and by adapting its probability estimates to the local statistics of the bit stream. REF

In H.264, CABAC runs in either fixed or adaptive mode. The encoder runs CAVLC by default, and because CABAC is not allowed in the Baseline profile, the encoder was switched to the Main profile.
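
The benefit of adapting probability estimates to local statistics can be illustrated with a toy model. The sketch below is not CABAC's actual binarization, context selection or state machine. It assumes a simple count-based estimator and charges each symbol the ideal arithmetic-coding cost of -log2(p) bits, then compares the total with a fixed one-bit-per-symbol code. The input sequence is made up for illustration.

```python
import math

def adaptive_code_length(bits):
    """Ideal arithmetic-coding cost, in bits, under a count-based adaptive model."""
    counts = [1, 1]  # start with one pseudo-count each for symbols 0 and 1
    total = 0.0
    for bit in bits:
        probability = counts[bit] / (counts[0] + counts[1])
        total += -math.log2(probability)  # ideal cost of coding this symbol
        counts[bit] += 1                  # adapt the estimate to what was seen
    return total

# Made-up binary source with skewed statistics: 90% zeros.
skewed = ([0] * 9 + [1]) * 10
adaptive_bits = adaptive_code_length(skewed)
fixed_bits = len(skewed)  # a fixed code spending one bit per symbol
```

Because the model learns that zeros dominate, frequent symbols cost well under one bit each, which a code built from whole-bit codewords cannot achieve for a binary symbol.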

The following input parameters were used in the configuration file in addition to the basic ones used in experiment 1.

i. CABAC - Fixed Mode:

ProfileIDC = 77 (encode using “main” profile)

SymbolMode = 1 (turn on CABAC)

ContextInitMethod = 0 (Set CABAC Fixed Mode)

ii. CABAC Adaptive Mode:

ProfileIDC = 77 (encode using “main” profile)

SymbolMode = 1 (turn on CABAC)

ContextInitMethod = 1 (Set CABAC Adaptive Mode)

Below are the results:

Observations

From the results above, it can be deduced that CABAC gives a better luminance SNR_Y (hence better rate-distortion performance) than CAVLC, and also a lower encoding time. Comparing adaptive CABAC with fixed CABAC, both give the same rate-distortion performance, with adaptive CABAC having a slightly lower encoding time at higher bitrates.

I would recommend the use of adaptive CABAC in most applications like live streaming, entertainment etc., since it improves rate-distortion performance with less encoding time.

EXPERIMENT 4: Running encoder with B-Pictures

B-pictures (B slices) are bi-directionally predicted frames that use both previous and future frames as references. Predicting the current frame from both previous and future frames places a higher computational demand on the encoder and so increases encoding time, but it usually produces decoded sequences with a better PSNR than P-frames, since more reference options are available for motion estimation. B-pictures typically require fewer bits to encode than either I or P pictures.

Using B-frames, a better rate-distortion performance is expected with increased encoding time.

H.264 does not allow B-pictures in the Baseline profile, so the encoder was switched to the Main profile. Because B-pictures reference future frames, the frame skip must be set to match the number of B-pictures used (i.e. if the number of B-pictures is set to 2, then frame skip is also set to 2). The number of frames to be encoded must also be adjusted in the configuration file to account for the skipped frames, using the formula below:

FramesToBeEncoded = int((TotalNumberOfFrames - 1) / (NumberBFrames + 1)) + 1 [REF]

Total number of frames used for the experiment is 61.
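
The formula above can be checked with a few lines of Python. The function name is illustrative; integer division is used in place of int(...), which gives the same result for non-negative inputs.

```python
def frames_to_be_encoded(total_frames, number_b_frames):
    """Apply FramesToBeEncoded = int((TotalNumberOfFrames - 1) / (NumberBFrames + 1)) + 1."""
    if total_frames < 1 or number_b_frames < 0:
        raise ValueError("need at least one frame and a non-negative B-frame count")
    return (total_frames - 1) // (number_b_frames + 1) + 1

one_b_picture = frames_to_be_encoded(61, 1)   # 61 frames, NumberBFrames = 1
two_b_pictures = frames_to_be_encoded(61, 2)  # 61 frames, NumberBFrames = 2
```

For the 61-frame case this gives 31 frames with one B-picture and 21 frames with two.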

When using B-pictures, it is also ensured that QPISlice, QPBSlice and QPPSlice are set to the same value.

The following input parameters were used in the configuration file in addition to the basic ones used in experiment 1.

i. Using 1 B-Picture:

ProfileIDC = 77

NumberBFrames = 1

FrameSkip = 1

FramesToBeEncoded= int ((61-1)/ (1+1)) +1 = 31

ii. Using 2 B-picture:

ProfileIDC = 77

NumberBFrames = 2

FrameSkip = 2

FramesToBeEncoded= int ((61-1)/ (2+1)) +1 = 21

The obtained results are shown below:

Observation

From the results above, it can be said that using B-pictures produces a high PSNR at lower bitrates, but the performance deteriorates at higher bitrates. This is more evident when a larger number of B-pictures is used: the higher the number of B-pictures, the higher the PSNR at low bitrates. The encoding time also improves with a larger number of B-pictures.

B-pictures are recommended for all applications, especially live broadcast and video streaming.

EXPERIMENT 5: Effect of running H.264 using Multiple Reference Frames

In H.264, using multiple reference frames enables the encoder to choose from more than one previously decoded frame for motion compensation REF. Because of the additional work involved in searching these frames for matching blocks, encoding takes longer, but video quality improves.

Up to a maximum of 16 reference frames can be used.

The following input parameters were used in the configuration file in addition to the basic ones used in experiment 1.

i. Using 4 Reference Frames :

NumberReferenceFrames = 4

ii. Using 6 Reference Frames:

NumberReferenceFrames = 6

The results are shown below:

Observation

The results above confirm that using multiple reference frames slightly improves the quality of the decoded video sequence, but at the expense of more encoding time. Since the added encoding time is undesirable for live broadcast, I would not recommend using multiple reference frames in live broadcasting.

EXPERIMENT 6: Effect of using High or Low Complexity Motion Search

The H.264 encoder is set by default to use the simplified UMHexagon (Uneven Multi-Hexagon) fast motion estimation algorithm. This experiment investigated the effect of turning off fast motion estimation and of using the low complexity motion search algorithm.

From the H.264 configuration file, the Fast Motion Search is controlled by the parameters as shown below:

UseFME = 2 (default) # Use fast motion estimation (0=disable, 1=UMHexagons, 2=Simplified UMHexagons)

The following input parameters were used in the configuration file in addition to the basic ones used in experiment 1.

i. Using Low Complexity Motion Search (UMHexagon) :

UseFME = 1

ii. Turning off Fast Motion Search:

UseFME = 0

Obtained results are shown below:

Observation

Using Low Complexity Motion Search produces a very small increase in PSNR, and therefore slightly better performance. This becomes more evident at higher bitrates, as can be clearly seen in the result for the sequence container.qcif. When Fast Motion Estimation is turned off, the PSNR is higher, but at a much higher encoding time.

The above shows that using Fast Motion Estimation improves the overall performance of the encoder, and it is thus recommended for live broadcasting.

EXPERIMENT 7: Effect of using Large and Small Motion Estimation Block Sizes

H.264 uses large and small motion estimation block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4) for inter-prediction and motion estimation. The purpose of this experiment is to investigate the effect of using only small motion estimation block (MEB) sizes (8x4, 4x8, 4x4) and of using both large and small sizes simultaneously. One would expect that using small block sizes for motion estimation would increase rate-distortion performance by capturing finer movement between pixels in video sequences with high motion content (such as container.qcif), but the higher computational demand will further increase the encoding time.
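
Part of the added cost of small blocks is simply that there are more of them, and each partition needs its own motion search and its own motion vector. The short sketch below counts how many partitions cover one 16x16 macroblock, under the simplifying assumption that the whole macroblock is tiled with a single block size.

```python
MACROBLOCK_SIZE = 16
BLOCK_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def partitions_per_macroblock(width, height):
    """Number of width x height blocks needed to tile one 16x16 macroblock."""
    return (MACROBLOCK_SIZE // width) * (MACROBLOCK_SIZE // height)

vectors = {f"{w}x{h}": partitions_per_macroblock(w, h) for w, h in BLOCK_SIZES}
```

Under that assumption a macroblock coded as 4x4 blocks carries sixteen motion vectors where a 16x16 block carries one, so the bits spent on vectors rise along with the search effort.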

The following input parameters were used in the configuration file in addition to the basic ones used in experiment 1.

i. Using small motion estimation block sizes:

InterSearch16x16 = 0

InterSearch16x8 = 0

InterSearch8x16 = 0

InterSearch8x8 = 0

InterSearch8x4 = 1

InterSearch4x8 = 1

InterSearch4x4 = 1

ii. Using both large and small motion estimation block sizes:

InterSearch16x16 = 1

InterSearch16x8 = 1

InterSearch8x16 = 1

InterSearch8x8 = 1

InterSearch8x4 = 1

InterSearch4x8 = 1

InterSearch4x4 = 1

The results are shown below:

Observations

From the results above, using only small motion estimation block sizes produced lower SNR_Y values than using large motion estimation block sizes. The results also show a slightly higher PSNR when both small and large block sizes are used; however, this requires additional encoding time, given the added computation involved in evaluating all the available block sizes.

EXPERIMENT 8: Investigating the effect of the Motion Estimation Window size (Search Range)

The search range is used to set the maximum range allowed for motion estimation.

Increasing the search window size is likely to have some effect on the rate-distortion performance of the encoder, since more candidate positions are available in which to find a better matching block. It also increases the encoding time, given the additional computation required to find the best matching block. Reducing the search window will reduce the encoding time, but may result in lower SNR_Y values.

The encoder by default is set to use a search range of 16.
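
The cost of a larger window is easiest to see with an exhaustive search. The sketch below is a plain integer-sample full search using the sum of absolute differences (SAD). It assumes the search range R means displacements of up to R samples in each direction, and it omits the sub-sample refinement and rate terms a real encoder such as JM would add. All names and the synthetic frames are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )

def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range; return (vector, cost, candidates tested)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, checked = None, None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue  # skip candidates that fall outside the reference frame
            checked += 1
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, checked

def max_candidates(search_range):
    """Upper bound on positions tested per block: (2R + 1) squared."""
    return (2 * search_range + 1) ** 2

# Synthetic 16x16 frames: the current frame equals the reference shifted by (2, 1).
ref = [[3 * x + 5 * y for x in range(16)] for y in range(16)]
cur = [[3 * (x + 2) + 5 * (y + 1) for x in range(16)] for y in range(16)]
mv, cost, checked = full_search(cur, ref, bx=4, by=4, n=4, search_range=3)
```

Under this assumption the number of candidates per block grows from 289 at a range of 8 to 1089 at 16 and 4225 at 32, which is why exhaustive search becomes expensive quickly and fast algorithms test only a subset of positions.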

The following input parameters were used in the configuration file in addition to the basic ones used in experiment 1.

i. Reducing Motion Estimation search window to 8:

SearchRange = 8

ii. Increasing Motion Estimation search window to 32:

SearchRange = 32

The results are shown below:

Observation

From the above, there is no change in the SNR_Y values obtained when varying the search window size. However, as predicted, there is a slight drop in encoding time when a smaller search window is used. This is more evident in akiyo.qcif, which contains less motion than the other two sequences.

I can recommend the use of a small search window size in live TV newscasts or continuous CCTV monitoring applications.

EXPERIMENT 9: Choosing Parameters for Optimal Performance

Having compared and analyzed the results from experiments 1 to 8, the following combination of basic parameters was chosen to optimize the performance of the H.264 encoder. The parameters were chosen based on their observed improvement in PSNR and reduction in encoding time.

The chosen parameters are as follows:

  1. B-Pictures - 1 B frame
  2. CABAC- Adaptive mode
  3. Large and Small Motion Estimation Block sizes
  4. RDO Fast High Complexity Mode
  5. Low Complexity Motion Search - UMHexagon

The configuration file settings used are shown below:

ProfileIDC = 77

NumberBFrames = 1

FrameSkip = 1

FramesToBeEncoded = 31

SymbolMode = 1 (turn on CABAC)

ContextInitMethod = 1 (Set CABAC Adaptive Mode)

UseFME = 1

RDOptimization= 2

InterSearch16x16 = 1

InterSearch16x8 = 1

InterSearch8x16 = 1

InterSearch8x8 = 1

InterSearch8x4 = 1

InterSearch4x8 = 1

InterSearch4x4 = 1

Below is the result

Observation

From the results obtained above, using the chosen optimal parameters produced a better Rate-Distortion curve than the reference curve. The PSNR obtained was clearly higher for the optimized parameters at both high and low bitrates. This is clearly reflected in the container.qcif and coastguard.qcif sequences.

The increased PSNR was obtained at a slightly longer encoding time than with the reference settings. However, this time is within acceptable limits given the considerable increase in quality.

CONCLUSION

At the start of this assignment, the goal was to choose parameters for the H.264 encoder that optimize rate-distortion performance (produce a high quality coded video sequence) and minimize the encoding time. From the experimental results obtained, it can be deduced that the quality of the coded video sequence was enhanced by the use of B-pictures, a large number of reference frames, a large search range, all partition block sizes, fast motion estimation disabled, CABAC in adaptive mode, and high complexity rate-distortion optimized mode. Likewise, the encoding time was lowest with a small number of reference frames, a small motion search window, fast motion estimation enabled, and rate-distortion optimization disabled or in fast mode.

High quality coded sequences are most desirable in DVD storage, high-definition devices, Blu-ray devices, games and other multimedia applications, while faster encoding time is most desirable in MMS applications, instant messaging and video conferencing.

Achieving a high PSNR at lower bitrates reduces the cost of transmission in terms of bandwidth requirement.

With a careful combination of the parameters that improve PSNR, while avoiding or minimizing the use of those that introduce additional computational requirements, the H.264 encoder can be optimized to produce a high quality coded sequence with minimal encoding time.
