H.264 Video Streaming System on Embedded Platform
The adoption of technological products like digital television and video conferencing has made video streaming an active research area.
This report presents the integration of a video streamer module into a baseline H.264/AVC encoder running on a TMSDM6446EVM embedded platform. The main objective of this project is to achieve real-time streaming of baseline H.264/AVC video over a local area network (LAN) as part of a video surveillance system.
The encoding of baseline H.264/AVC and the hardware components of the platform are discussed first. Various streaming protocols are then studied in order to implement the video streamer on the DM6446 board. A multi-threaded encoder application is used to encode raw video frames into H.264/AVC format and write them to a file. For video streaming, the open source Live555 MediaServer is used to stream video data to a remote VLC client over the LAN.
Initially, file streaming was implemented from PC to PC. After successful implementation on the PC, the video streamer was ported to the board. The steps involved in porting the Live555 application are also described in this report. Both unicast and multicast file streaming were implemented in the video streamer.
Because of the limitations of file streaming, a live streaming approach was adopted. Several methodologies for integrating the video streamer with the encoder program are discussed. Modifications were made to both the encoder program and the Live555 application to achieve live streaming of H.264/AVC video. Results of both file and live streaming are presented in this report. The implemented video streamer module will serve as a base module of the video surveillance system.
Chapter 1: Introduction
Significant breakthroughs have been made over the last few years in the area of digital video compression technologies. As a result, applications making use of these technologies have become prevalent and remain active research topics today. For example, digital television and video conferencing are applications that are now commonly encountered in daily life. One application of interest here is a video camera surveillance system, which can enhance the security of consumers' business and home environments.
In typical surveillance systems, the captured video is sent over a cable network to be monitored and stored at remote stations. As the captured raw video contains a large amount of data, it is advantageous to compress the data before it is transferred over the network. One compression technique that is suitable for this type of application is the H.264 coding standard.
H.264 is better suited to video streaming than earlier coding techniques because it offers higher coding efficiency and is more robust to data loss, both of which are important when streaming over a shared Local Area Network. With the increasing acceptance of H.264 and the availability of embedded systems with high computing power, a digital video surveillance system based on H.264 on an embedded platform is feasible and potentially more cost-effective.
Implementing an H.264 video streaming system on an embedded platform is a logical extension of video surveillance systems, which are still typically implemented on high computing power stations (e.g. PCs). In an embedded version, a Digital Signal Processor (DSP) forms the core of the system and executes the intensive signal processing algorithms. Current embedded systems typically also include network features that enable data streaming applications. To facilitate data streaming, a number of network protocol standards have been defined and are currently used for digital video applications.
1.2. Objective and Scope
The objective of this final year project is to implement a video surveillance system based on the H.264 coding standard running on an embedded platform. Such a system covers a wide range of functionality and would require an extensive amount of development time if implemented from scratch. Hence this project focuses on the data streaming aspect of a video surveillance system.
After some initial investigation and experimentation, it was decided to confine the main scope of the project to developing a live streaming H.264-based video system running on a DM6446 EVM development platform. The work to be performed progressively was then broken down as follows:
1. Familiarization of open source live555 streaming media server
Due to the complexity of implementing the various standard protocols needed for multimedia streaming, the live555 media server program is used as a base to implement the streaming of the H.264-based video data.
2. Streaming of stored H.264 file over the network
The live555 program is then modified to support streaming of a raw encoded H.264 file from the DM6446 EVM board over the network. Knowledge of the H.264 coding standard is necessary in order to parse the file stream before streaming it over the network.
3. Modifying a demo version of an encoder program and integrating it together with live555 to achieve live streaming
The demo encoder was modified to send encoded video data to the Live555 program which would do the necessary packetization to be streamed over the network. Since data is passed from one process to another, various inter-process communication techniques were studied and used in this project.
The resources used for this project are as follows:
1. DM6446 (DaVinci™) Evaluation Module
2. SWANN C500 Professional CCTV Camera Solution 400 TV Lines CCD Color Camera
3. LCD Display
4. IR Remote Control
5. TI DaVinci demo version of MontaVista Linux Pro v4.0
6. A Personal Workstation with CentOS v5.0
7. VLC player v.0.9.8a as client
8. Open source live555 program (downloaded from www.live555.com)
The system setup of this project is shown below:
1.4. Report Organization
This report consists of 8 chapters.
Chapter 1 introduces the motivation behind embedded video streaming system and defines the scope of the project.
Chapter 2 presents a literature review of the H.264/AVC video coding technique and the various streaming protocols which are to be implemented in the project.
Chapter 3 presents a literature review of the hardware platform used in the project. The architecture, memory management, inter-process communication and the software tools are also discussed in this chapter.
Chapter 4 explains the execution of the encoder program on the DM6446EVM board. The interaction of the various threads in this multi-threaded application is also discussed in order to fully understand the encoder program.
Chapter 5 gives an overview of the Live555 MediaServer, which is used as a base to implement the video streamer module on the board. Adding support for unicast and multicast streaming, porting live555 to the board and receiving the video stream on a remote VLC client are explained in this chapter.
Chapter 6 explains the limitations of file streaming and the move towards a live streaming system. Various integration methodologies and the modifications to both the encoder program and the live555 program are shown as well.
Chapter 7 summarizes the implementation results of file and live streaming and analyses the performance of these results.
Chapter 8 concludes the report by stating the current limitations and problems, and the scope for future implementation.
Chapter 2: Video Literature Review
2.1. H.264/AVC Video Codec Overview
At the time of writing, H.264/AVC is one of the most advanced video coding techniques. Compared with earlier video coding schemes such as the preceding H.26x and MPEG standards, H.264/AVC introduces many improvements and tools for coding efficiency and error resilience. This chapter briefly discusses the network aspects of the video coding technique. It also covers the error resilience needed for transmission of video data over the network. For a more detailed explanation of H.264/AVC, refer to Appendix A.
2.1.1. Network Abstraction Layer (NAL)
The aim of the NAL is to ensure that the data coming from the Video Coding Layer (VCL) is “network worthy” so that the data can be used in numerous systems. The NAL facilitates the mapping of H.264/AVC VCL data to different transport layers such as:
* RTP/IP real-time streaming over wired and wireless mediums
* Different storage file formats such as MP4, MMS, AVI and so on
The concepts of NAL and error robustness techniques of the H.264/AVC will be discussed in the following parts of the report.
The encoded data from the VCL are packed into NAL units. A NAL unit is a packet made up of a certain number of bytes. The first byte of the NAL unit is the header byte, which indicates the data type of the NAL unit. The remaining bytes make up the payload data of the NAL unit.
The NAL unit structure makes provision for two kinds of transport system, namely packet-oriented and bit stream-oriented. To cater for bit stream-oriented transport systems like MPEG-2, the NAL units are organized into byte stream format. Each unit is prefixed by a three-byte start code prefix, 0x000001. The start code prefix marks the start of each NAL unit and hence defines the boundaries of the units.
For packet-oriented transport systems, the encoded video data are transported in packets defined by the transport protocols. Hence, the boundaries of the NAL units are known without having to include the start code prefix. The details of the packetization of NAL units will be discussed in later sections of the report.
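The byte stream format described above can be illustrated with a minimal Python sketch that splits a buffer into NAL units at each 0x000001 start code prefix. The function name `split_nal_units` is hypothetical and is not taken from the encoder or from live555. The sketch also assumes that a four-byte start code (0x00000001) may appear, in which case the extra leading zero byte is stripped from the end of the preceding unit.

```python
from typing import List

START_CODE = b"\x00\x00\x01"  # three-byte start code prefix

def split_nal_units(stream: bytes) -> List[bytes]:
    """Return the NAL units found in a byte-stream-format buffer.

    The start code prefixes themselves are not included in the result.
    """
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Assumption: trailing zero bytes belong to the next start code
        # (four-byte form) or are padding, so they are stripped here.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units
```

For packet-oriented transport the same units would be carried without the prefixes, since the packet boundaries already delimit them.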
NAL units are further categorized into two types:
* VCL unit: contains the encoded video data
* Non-VCL unit: contains additional information such as parameter sets, which hold important header information. It may also contain supplemental enhancement information (SEI), which carries timing information and other data that increase the usability of the decoded video signal.
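As an illustration of how the header byte distinguishes the two categories, the following Python sketch decodes the three fields of the one-byte NAL header defined by the H.264 standard (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type). The function name and the small lookup table are illustrative choices and list only a few common type values.

```python
# A few common nal_unit_type values from the H.264 standard.
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
}

def parse_nal_header(header_byte: int) -> dict:
    """Decode the one-byte NAL unit header."""
    nal_unit_type = header_byte & 0x1F
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,
        "nal_ref_idc": (header_byte >> 5) & 0x03,
        "nal_unit_type": nal_unit_type,
        "name": NAL_TYPE_NAMES.get(nal_unit_type, "other"),
        # Types 1 to 5 carry coded slice data and are therefore VCL units.
        "is_vcl": 1 <= nal_unit_type <= 5,
    }
```

For example, a header byte of 0x67 decodes to a sequence parameter set (a non-VCL unit), while 0x65 decodes to an IDR slice (a VCL unit).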
A group of NAL units which adheres to a certain form is called an access unit. When one access unit is decoded, one decoded picture is formed. In Table 1 below, the functions of the NAL units that make up an access unit are explained.
Data/Error robustness techniques
H.264/AVC has several techniques to mitigate errors and data loss, which is essential for streaming applications. The techniques are as follows:
· Parameter sets: contain information that applies to a large number of VCL NAL units. There are two kinds of parameter sets:
- Sequence Parameter Set (SPS) : Information pertaining to a sequence of encoded pictures
- Picture Parameter Set (PPS) : Information pertaining to one or more individual pictures
These parameters rarely change, so they need not be transmitted repeatedly, which saves overhead. The parameter sets can be sent “in-band”, carried in the same channel as the VCL NAL units, or “out-of-band” using a reliable transport protocol. This improves resilience to data loss and errors.
· Flexible Macroblock Ordering (FMO)
FMO maps the macroblocks to different slice groups. If a slice group is lost, the missing data is concealed by interpolating from the other slice groups.
· Redundant Slices (RS)
A redundant representation of the picture can be stored in redundant slices. If the original slice is lost, the decoder can make use of the redundant slices to recover it.
These techniques introduced in H.264/AVC make the codec more robust and resilient to data loss and errors.
2.1.2. Profiles and Levels
A profile of a codec is defined as the set of features identified to meet the specifications of the intended applications. For the H.264/AVC codec, it is defined as a set of features identified to generate a conforming bit stream. A level imposes restrictions on some key parameters of the bit stream.
In H.264/AVC, there are three profiles, namely Baseline, Main and Extended. Figure 5 shows the relationship between these profiles. The Baseline profile is the most likely to be used by network cameras and encoders as it requires limited computing resources. It is well suited to supporting real-time streaming applications on an embedded platform.
2.2. Overview of Video Streaming
In earlier systems, video data was accessed across a network using the ‘download and play’ approach. In this approach, the client had to wait until the whole video was downloaded to the media player before play out could begin. To combat the long initial play out delay, the concept of streaming was introduced.
Streaming allows the client to play out the earlier part of the video data while the remaining part is still being transferred. A major advantage of streaming over the traditional ‘download and play’ approach is that the video data need not be stored on the client's computer. Streaming also reduces the long initial play out delay experienced by the client.
Streaming adopts the traditional client/server model. The client connects to the listening server and requests video data. The server then sends the video data to the client for play out.
2.2.1. Types of Streaming
There are three different types of streaming video data. They are pre-recorded/ file streaming, live/real-time streaming and interactive streaming.
* Pre-recorded/file streaming: The encoded video is stored in a file and the system streams the file over the network. A major drawback is the long initial play out delay (10-15s) experienced by the client.
* Live/real-time streaming: The encoded video is streamed over the network directly without being stored in a file, which reduces the initial play out delay. Care must be taken to ensure that the play out rate does not exceed the sending rate, which may result in a jerky picture. On the other hand, if the sending rate is too slow, packets may arrive too late at the client and be dropped, causing the picture to freeze. The timing requirement for the end-to-end delay is more stringent in this scenario.
* Interactive streaming: Like live streaming, the video is streamed directly over the network. In addition, the system responds to the user's control inputs, such as rewind, pause, stop, play and forward, for the particular video stream.
In this project, both pre-recorded and live streaming are implemented. Some functionality of interactive streaming controls like stop and play are also part of the system.
2.2.2. Video Streaming System modules
The video source captures the raw video sequence. A CCTV camera is used as the video source in this project. Most such cameras have analogue outputs, which are connected to the encoding station via video connections. This project uses only one video source because of the limited number of video connections on the encoding station. The raw video sequence is then passed on to the encoding station.
The encoding station digitizes the raw video sequence and encodes it into the desired format. In the actual system, the encoding into H.264/AVC format is done by the DM6446 board. Since the encoding is CPU intensive, it forms the bottleneck of the whole streaming system. The H.264 video is then passed on to the video streaming server module of the system.
Video Streaming and WebServer
The role of the video streaming server is to packetize the H.264/AVC video so that it can be streamed over the network. It serves the requests from individual clients and needs to support the total bandwidth requirements of the video streams requested by those clients. The WebServer offers a URL link which connects to the video streaming server. For this project, the video streaming server module is embedded in the DM6446 board and serves every individual client's requests.
The video player acts as a client, connecting to the video streaming server and requesting video data from it. Once video data is received, the video player buffers the data for a while and then begins play out. The video player used for this project is the VideoLAN (VLC) Player. It includes the relevant H.264/AVC codec, so it can decode and play the H.264/AVC video data.
2.2.3. Unicast VS Multicast
There are two key delivery techniques employed by streaming media distribution.
Unicast transmission is the sending of data to one particular network destination host over a packet switched network. It establishes a two-way point-to-point connection between client and server, and the client communicates directly with the server via this connection. The drawback is that every connection receives a separate video stream, which uses up network bandwidth rapidly.
Multicast transmission is the sending of only one copy of the data over the network so that many clients can receive it simultaneously. In video streaming, it is more cost effective to send a single copy of the video data over the network so as to conserve network bandwidth. Since multicast is not connection oriented, the clients cannot control the streams that they receive.
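The bandwidth trade-off between the two techniques can be made concrete with a small Python sketch. The function name and the figures used (a 1500 kbps stream and 10 clients) are purely illustrative assumptions and are not measurements from this project.

```python
def server_bandwidth_kbps(stream_kbps: int, clients: int, multicast: bool) -> int:
    """Estimate the outgoing bandwidth needed at the streaming server.

    Unicast sends one copy of the stream per client, whereas multicast
    sends a single copy regardless of the number of receivers.
    """
    if clients <= 0:
        return 0
    return stream_kbps if multicast else stream_kbps * clients

# Hypothetical example: 10 clients watching a 1500 kbps stream.
unicast_load = server_bandwidth_kbps(1500, 10, multicast=False)
multicast_load = server_bandwidth_kbps(1500, 10, multicast=True)
```

Under these assumptions the unicast load grows linearly with the number of clients, while the multicast load stays at the bit rate of a single stream.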
In this project, unicast transmission is used to stream encoded video over the network. The client connects directly to the DM6446 board where it gets the encoded video data. The project can easily be extended to multicast transmission.
2.3. Streaming Protocols
When streaming video content over a network, a number of network protocols are used. These protocols are well defined by the Internet Engineering Task Force (IETF) and the Internet Society (IS) and documented in Request for Comments (RFC) documents. These standards are adopted by many developers today.
In this project, the same standards are employed in order to stream H.264/AVC content over a simple Local Area Network (LAN). The following sections discuss the various protocols that were studied in the course of this project.
2.3.1. Real-Time Streaming Protocol (RTSP)
RTSP is the most commonly used application layer protocol for controlling streaming media. It acts as a control protocol for media streaming servers: it establishes a connection between the two end points of the system and controls media sessions. Clients issue VCR-like commands such as play and pause to control the real-time playback of media streams from the servers. However, this protocol is not involved in the transport of the media stream itself over the network. For this project, RTSP version 1.0 is used.
Like the Hyper Text Transfer Protocol (HTTP), it contains several methods. They are OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, RECORD and TEARDOWN. These commands are sent together with the RTSP URL. The default port number used by this protocol is 554. An example of such a URL is:
<method name > rtsp://
· OPTIONS: An OPTIONS request returns the types of request that the server will accept. An example of the request is:
OPTIONS rtsp://18.104.22.168:554/test.264 RTSP/1.0
User-agent: VLC media Player
The CSeq parameter keeps track of the number of requests sent to the server and is incremented every time a new request is issued. The User-agent header identifies the client making the request.
* DESCRIBE: This method gets the presentation or the media object identified in the request URL from the server. An example of such a request:
DESCRIBE rtsp://22.214.171.124:554/test.264 RTSP/1.0
User-agent: VLC media Player
The Accept header is used to describe the formats understood by the client. The response to the DESCRIBE method must contain all the media initialization information for the resource that it describes.
· SETUP: This method will specify the mode of transport mechanism to be used for the media stream. A typical example is:
SETUP rtsp://126.96.36.199:554/test.264 RTSP/1.0
Transport: RTP/AVP;unicast;client_port=1200-1201
User-agent: VLC media Player
The Transport header specifies the transport mechanism to be used. In this case, the Real-time Transport Protocol is used in a unicast manner. The client port numbers on which the stream is to be received are also given; they are chosen by the client, and the server returns its own port numbers in its response. Since RTSP is a stateful protocol, a session is created upon successful acknowledgement of this method.
· PLAY: This method requests the server to start sending the data via the transport mechanism stated in the SETUP method. The request is the same as for the other methods except for:
Range: npt=0.000-
The Session header specifies the unique session id. This is important as the server may establish several sessions and the id keeps track of them. The Range header positions the play time at the beginning and plays until the end of the range.
* PAUSE: This method informs the server to pause sending of the media stream. Once the PAUSE request is sent, the range header will capture the position at which the media stream is paused. When a PLAY request is sent again, the client will resume playing from the current position of the media stream as specified in the range header.
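The request format used by the methods above can be sketched in Python as follows. This is only an illustration of the message layout: the class name `RtspRequestBuilder` is hypothetical, the address 192.0.2.10 is a placeholder from the documentation address range rather than an address used in this project, and the session id is made up. The sketch builds the request text only and does not open a network connection.

```python
class RtspRequestBuilder:
    """Builds RTSP/1.0 request messages with an incrementing CSeq."""

    def __init__(self, url, user_agent="VLC media player"):
        self.url = url
        self.user_agent = user_agent
        self.cseq = 0

    def build(self, method, extra_headers=None):
        # CSeq is incremented for every new request issued to the server.
        self.cseq += 1
        lines = [
            "%s %s RTSP/1.0" % (method, self.url),
            "CSeq: %d" % self.cseq,
            "User-Agent: %s" % self.user_agent,
        ]
        for name, value in (extra_headers or {}).items():
            lines.append("%s: %s" % (name, value))
        # Header lines end with CRLF; an empty line terminates the request.
        return "\r\n".join(lines) + "\r\n\r\n"


builder = RtspRequestBuilder("rtsp://192.0.2.10:554/test.264")  # placeholder URL
options_request = builder.build("OPTIONS")
setup_request = builder.build(
    "SETUP", {"Transport": "RTP/AVP;unicast;client_port=1200-1201"}
)
play_request = builder.build(
    "PLAY", {"Session": "12345678", "Range": "npt=0.000-"}  # hypothetical session id
)
```

Each call increases CSeq by one, so the three requests above carry CSeq values 1, 2 and 3.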
RTSP Status Codes
Whenever the client sends a request message to the server, the server sends a corresponding response message back to the client. The response codes are similar to those of HTTP, and both protocols use ASCII text. Some of them are as follows:
405: Method Not Allowed
451: Parameter Not Understood
454: Session Not Found
457: Invalid Range
461: Unsupported Transport
462: Destination Unreachable
These are some of the RTSP status codes. There are many others but the codes mentioned above are of importance in the context of this project.
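Because the response is plain ASCII text, the status code can be extracted from the first line of the response with very little code. The following Python sketch shows one way a client might do this; the function name is hypothetical and the table contains only the codes listed above.

```python
# Status codes listed in this section.
RTSP_STATUS = {
    405: "Method Not Allowed",
    451: "Parameter Not Understood",
    454: "Session Not Found",
    457: "Invalid Range",
    461: "Unsupported Transport",
    462: "Destination Unreachable",
}

def parse_status_line(line):
    """Split a line such as 'RTSP/1.0 454 Session Not Found' into (code, reason)."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("RTSP/"):
        raise ValueError("not an RTSP status line: %r" % line)
    code = int(parts[1])
    # Fall back to the table when the server omits the reason phrase.
    reason = parts[2] if len(parts) == 3 else RTSP_STATUS.get(code, "")
    return code, reason
```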
2.3.2. Real-time Transport Protocol (RTP)
RTP defines a packet structure which is used for transporting media streams over the network. It provides transport functions, but developers often view it as part of the application layer protocol stack. The protocol facilitates jitter compensation and detection of out-of-sequence arrival of data, which is common for transmission over IP networks. For the transmission of media data it is important that packets arrive in a timely manner, as media playback is loss tolerant but not delay tolerant. Due to the high latency of the Transmission Control Protocol (TCP) in establishing connections, RTP is often built on top of the User Datagram Protocol (UDP). RTP also supports multicast transmission of data.
RTP is also a stateful protocol, as a session is established before data can be packed into RTP packets and sent over the network. The session contains the destination IP address and the RTP port number, which is usually an even number. The following section explains the RTP packet structure used for transmission.
RTP Packet Structure
The figure below shows an RTP packet header, which is prepended to the media data.
The minimum size of the RTP header is 12 bytes. Optional extension information may be present after the header information. The fields of the header are:
· V: (2 bits) to indicate the version number of the protocol. Version used in this project is 2.
· P (Padding): (1 bit) to indicate whether there is padding at the end of the packet, which can be used by encryption algorithms
· X (Extension): (1 bit) to indicate if there is extension information between header and payload data.
· CC (CSRC Count) : (4 bits) indicates the number of CSRC identifiers
· M (Marker): (1 bit) used by application to indicate data has specific relevance in the perspective of the application. The setting for M bit marks the end of video data in this project
· PT (Payload Type): (7 bits) to indicate the type of payload data carried by the packet. H.264 is used for this project
· Sequence number: (16 bits) incremented by one for every RTP packet. It is used to detect packet loss and out of sequence packet arrival. Based on this information, application can take appropriate action to correct them.
· Time Stamp: (32 bits) receivers use this information to play samples at correct intervals of time. Each stream has independent time stamps.
· SSRC: (32 bits) uniquely identifies the source of the stream.
· CSRC: (32 bits each) when a stream combines several sources, the contributing sources are enumerated by their source IDs.
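The field layout listed above can be sketched in Python using the standard `struct` module. This is a minimal illustration of the fixed 12-byte header with no padding, no extension and no CSRC entries. The function names are hypothetical, and the payload type value 96 used in the example is an assumption (a value from the dynamic range commonly chosen for H.264), not a value confirmed by this project.

```python
import struct

RTP_VERSION = 2

def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the fixed 12-byte RTP header (P=0, X=0, CC=0)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # Network byte order: two single bytes, a 16-bit and two 32-bit fields.
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data):
    """Decode the fixed 12-byte RTP header into its fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 0x01,
        "extension": (byte0 >> 4) & 0x01,
        "csrc_count": byte0 & 0x0F,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Hypothetical packet: payload type 96, marker bit set.
header = pack_rtp_header(96, seq=1, timestamp=90000, ssrc=0x12345678, marker=True)
fields = unpack_rtp_header(header)
```

The masking of the sequence number and timestamp reflects the fact that both fields wrap around once their 16-bit and 32-bit ranges are exhausted.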
This project does not involve the use of Extension field in the packet header and hence will not be explained in this report. Once this header information is appended to the payload data, the packet is sent over the network to the client to be played. The table below summarizes the payload types of RTP and highlighted region is of interest in this project.
Table 2: Payload Types of RTP Packets
2.3.3. RTP Control Protocol (RTCP)
RTCP is a sister protocol which is used in conjunction with the RTP. It provides out-of-band statistical and control information to the RTP session. This provides certain Quality of Service (QoS) for transmission of video data over the network.
The primary functions of the RTCP are:
* To gather statistical information about the quality of the media stream during an RTP session. This data is sent to the session media source and its participants. The source can exploit this information for adaptive media encoding and to detect transmission errors.
* It provides canonical end point identifiers (CNAME) to all its session participants. This allows unique identification of end points across different application instances and serves third party monitoring tools.
* It also sends RTCP reports to all its session participants. As the number of participants grows, the report traffic grows proportionally. In order to avoid congestion, RTCP has bandwidth management techniques that limit it to 5% of the total session bandwidth.
RTCP statistical data is sent on odd numbered ports. For instance, if the RTP port number is 196, then RTCP will use 197 as its port number. There is no default port number assigned to RTCP.
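The port pairing rule can be expressed as a one-line helper. The Python sketch below uses a hypothetical function name and simply assumes that the RTP port is even and that RTCP uses the next higher port, as in the example above.

```python
def rtcp_port_for(rtp_port):
    """Return the RTCP port paired with an even-numbered RTP port."""
    if rtp_port % 2 != 0:
        raise ValueError("the RTP port is expected to be an even number")
    return rtp_port + 1
```

This is the same convention seen in the SETUP request, where the client port range 1200-1201 covers the RTP port and its paired RTCP port.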
RTCP Message Types
RTCP sends several types of packets different from RTP packets. They are sender report, receiver report, source description and bye.
· Sender Report (SR): Sent periodically by senders to report the transmission and reception statistics of RTP packets sent in a period of time. It also includes the sender's SSRC and sender's packet count information. The timestamp of the RTP packet is also sent to allow the receiver to synchronize the RTP packets. The bandwidth required for SR is 25% of RTCP bandwidth.
· Receiver Report (RR): It reports the QoS to other receivers and senders. Information such as the highest sequence number received, the inter-arrival jitter of RTP packets and the fraction of packets lost describes the QoS of the transmitted media streams. The bandwidth allotted to RR is 75% of the RTCP bandwidth.
· Source Description (SDES): Sends the CNAME to its session participants. Additional information like name, address of the owner of the source can also be sent.
· End of Participation (BYE): The source sends a BYE message to indicate that it is shutting down the stream. It serves as an announcement that a particular end point is leaving the conference.
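To make the bandwidth figures concrete, the following Python sketch computes the RTCP budget from the percentages quoted in this section (5% of the session bandwidth for RTCP, split 25% for sender reports and 75% for receiver reports). The function name and the 2000 kbps session bandwidth in the example are illustrative assumptions.

```python
def rtcp_bandwidth_shares(session_kbps):
    """Split the session bandwidth into the RTCP shares described above."""
    rtcp_total = session_kbps * 5 / 100        # 5% of the session bandwidth
    return {
        "rtcp_total": rtcp_total,
        "sender_reports": rtcp_total * 25 / 100,    # 25% of the RTCP share
        "receiver_reports": rtcp_total * 75 / 100,  # 75% of the RTCP share
    }

# Hypothetical session of 2000 kbps.
shares = rtcp_bandwidth_shares(2000)
```

For this hypothetical session, RTCP is limited to 100 kbps, of which 25 kbps is available for sender reports and 75 kbps for receiver reports.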
Further RTCP Consideration
This protocol is important to ensure that QoS standards are achieved. The interval between these reports should be less than one minute. In large applications, the interval may grow because of the RTCP bandwidth control mechanism, in which case the statistical reporting on the quality of the media stream becomes less accurate.
Since there are no long delays introduced between the reports in this project, the RTCP is adopted to incorporate a certain level of QoS on streaming H.264/AVC video over embedded platform.
2.3.4. Session Description Protocol (SDP)
The Session Description Protocol is a standard for describing streaming media initialization parameters. These parameters describe sessions for the purposes of session announcement, session invitation and parameter negotiation. This protocol can be used together with RTSP. As noted in the previous sections of this chapter, SDP is used in the DESCRIBE method of RTSP to obtain the session's media initialization parameters. SDP is extensible to include different media types and formats.
The session is described by attribute/value pairs. The syntax of SDP is summarized below.
In this project, the use of SDP is important in streaming as the client is VLC Media Player. If the streaming is done via RTSP, then VLC expects an SDP description from the server in order to set up the session and facilitate the playback of the streaming media.
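The attribute/value structure of an SDP description can be illustrated with a short Python sketch. The description text below is a hypothetical example written for illustration (placeholder address 192.0.2.10, dynamic payload type 96); it is not the description produced by the server in this project. Each line has the form of a single-character type, an equals sign and a value, and a type may occur more than once, so the sketch returns a list of pairs rather than a dictionary.

```python
# Hypothetical SDP description for an H.264 video stream.
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=H.264 Video
t=0 0
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
"""

def parse_sdp(text):
    """Parse an SDP description into a list of (type, value) pairs."""
    fields = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or len(key) != 1:
            raise ValueError("malformed SDP line: %r" % raw_line)
        fields.append((key, value))
    return fields

sdp_fields = parse_sdp(EXAMPLE_SDP)
```

In this example the `m=` line announces a video stream carried over RTP/AVP, and the `a=rtpmap` attribute maps payload type 96 to H.264 with a 90 kHz clock.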
Chapter 3: Hardware Literature Review
3.1. Introduction to Texas Instruments DM6446EVM DaVinci™
The development of this project is based on the DM6446EVM board, so it is necessary to understand the hardware and software aspects of this board. The DM6446 has an ARM processor operating at a clock speed of up to 300 MHz and a C64x Digital Signal Processor operating at a clock speed of up to 600 MHz.
3.1.1. Key Features of DM6446
The key features shown above are:
* 1 video port which supports composite or S-video
* 4 video DAC outputs: component, RGB, composite
* 256 MB of DDR2 DRAM
* UART, Media Card interface (SD, xD, SM, MS ,MMC Cards)
* 16 MB of non-volatile Flash Memory, 64 MB NAND Flash, 4 MB SRAM
* USB2 interface
* 10/100 Mbps Ethernet interface
* Configurable boot load options
* IR Remote Interface, real time clock via MSP430
3.1.2. DM6446EVM Architecture
The architecture of the DM6446 board is organized into several subsystems. By knowing the architecture of the DM6446, developers can design and build application modules on the board's underlying architecture.
The figure shows that the DM6446 has three subsystems which are connected to the underlying hardware peripherals. This provides a decoupled architecture which allows developers to implement applications on a particular subsystem without having to modify the other subsystems. Some of the subsystems are discussed in the following sections.
The ARM subsystem is responsible for the master control of the DM6446 board. It handles the system-level initialization, configuration, user interface, connectivity functions and control of the DSP subsystem. The ARM has a larger program memory space and better context switching capabilities, and hence it is better suited to handling the complex tasks and multitasking of the system.
The DSP subsystem is mainly responsible for encoding the raw captured video frames into the desired format. It performs the number crunching operations needed to achieve the desired compression. It works together with the Video Imaging Coprocessor to compress the video frames.
Video Imaging Coprocessor (VICP)
The VICP is supported by a signal processing library which contains various software algorithms that execute on the VICP hardware accelerator. It helps the DSP by taking over the computation of various intensive tasks. Since a hardware implementation of number crunching operations has a faster execution time, the DSP's performance is significantly enhanced. Some of the algorithms supported by the VICP are:
* Matrix and Array operations, e.g. Matrix multiplication/transpose, Array Multiplication, Look-up table
* Digital Signal Processing Operations: 1D, 2D FIR Filtering, Convolution and Correlation
* Digital Image and Video Processing Functions: Alpha Blending, Colour space Conversion, Median Filtering
Video Processing Subsystem
This subsystem processes the video frames. The Resizer module crops the video frame to the appropriate resolution. The subsystem also has an On-Screen Display (OSD) module to output either the encoded or the yet to be encoded video frames to the LCD display. Four DAC channels are connected to this subsystem to condition the video signals.
Switched Central Resources (SCR)
SCR acts as an interface between various subsystems and the underlying hardware peripherals. It manages the hardware resources and it decides which subsystem can gain control of the hardware resources in an efficient manner. SCR has several techniques to ensure that allocation of hardware for contesting subsystems does not result in deadlock.
3.2. Memory Management of DM6446EVM
Understanding the memory management of the DM6446EVM is important when developing for an embedded system. It ensures that the developed system does not exceed the memory capabilities of the embedded board.
The figure shows the breakdown of the different memory types present on the DM6446EVM board, which has a large byte-addressable memory space. Since the DSP component is treated as a ‘black box', the memory mapping is shown with respect to the ARM processor. The ARM instruction and data RAM occupy about 2MB. The Flash/NAND memory is used to store the contents of the developed program to be loaded into the file system.
3.2.1. Current Memory map of system
For this project, the memory map is shown in the figure above. It is divided into different sections, each handling a different function. The sections are as follows:
· LINUX Section: manages all the resources required by the applications. Whenever an application requests a resource, Linux grants it depending on availability and the UNIX permissions. The memory partition is segmented into 4KB pages.
· DDRALGHEAP Section: contains the heap memory which the codec uses to allocate dynamic memory. The memory size is large because the video codec consumes a lot of memory.
· DDR Section: contains the DSP-side code, the static data for the codec, and the system code of DSP/BIOS and the Codec Engine.
· DSPLINKMEM Section: memory allocated for DSPLINK inter-process communication. This module communicates between the ARM and the DSP. It also loads the DSP code and controls DSP execution.
· RESET_VECTOR Section: contains the DSP reset vector.
3.2.2. Contiguous Memory Allocator (CMEM)
The ARM and the DSP work on different regions of the memory on the DM6446EVM. The ARM views the memory of the DSP as virtual. The DSP requires the allocation of contiguous memory space. If contiguous memory is not allocated, the DSP can corrupt the memory space of the ARM, causing the system to crash. CMEM is an API created to share buffers between ARM Linux processes and the DSP. CMEM takes a physical memory region and carves it into pools of contiguous memory. This is done at module insertion time, which occurs before any application runs on the DM6446EVM board. The advantage of CMEM is that it is configurable by the user. The command input by the user at the target is:
This command sets the start and end physical addresses of the contiguous memory space. The memory is partitioned into 4 pools of various sizes. CMEM is an important module in the memory management of the DM6446EVM. It helps developers to configure the memory pools needed by their applications running on the DM6446EVM.
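The carving of a physical region into pools can be illustrated with a minimal Python sketch. This is not the CMEM kernel module; the function name `carve_pools`, the example addresses and the rounding of each buffer to a 4KB page are assumptions made only for illustration.

```python
PAGE_SIZE = 4096  # assumed page granularity for this sketch


def carve_pools(phys_start, phys_end, pools):
    """Carve [phys_start, phys_end) into pools of contiguous buffers.

    pools is a list of (buffer_count, buffer_size) pairs.
    Returns one list of (start, end) address pairs per pool.
    """
    cursor = phys_start
    layout = []
    for count, size in pools:
        # Round every buffer up to a whole number of pages.
        size = (size + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE
        buffers = []
        for _ in range(count):
            if cursor + size > phys_end:
                raise MemoryError("pools do not fit in the region")
            buffers.append((cursor, cursor + size))
            cursor += size
        layout.append(buffers)
    return layout


# Hypothetical region and pool sizes, chosen only for the example.
layout = carve_pools(0x87800000, 0x88000000, [(2, 4096), (1, 5000)])
```

Because the layout is fixed when the pools are created, every buffer handed out later is guaranteed to be physically contiguous, which is the property the DSP needs.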
3.3. Inter-Process Communication (IPC) of DM6446 EVM
Since the DM6446EVM consists of both an ARM processor and a DSP, there must be IPC between the two processors in order to exchange data. DSP/BIOS LINK is a software framework that allows communication between the ARM and the DSP.
3.3.1. Software Architecture of DSP/BIOS™ LINK
The figure above shows the software architecture of DSP LINK. The GPP component refers to the General Purpose Processor, which here is the ARM processor. The components of the architecture are:
On the GPP side
An operating system must be running on the GPP. In this project, MontaVista Linux runs on the ARM processor.
* OS ADAPTATION LAYER: wrapper which encapsulates the generic OS services needed by the other components of the DSP LINK. Hence, the other components make use of this API exported by this component instead of direct OS calls. This makes DSP LINK portable across platforms.
* LINK DRIVER: encapsulates the low-level control on the physical link between ARM and DSP.
* PROCESSOR MANAGER: logs information for all components. It also allows various boot loaders to be integrated into the system.
* DSP/BIOS™ LINK API: interfaces for all clients on the ARM side.
On the DSP side
· LINK DRIVER: is part of the DSP/BIOS drivers. It communicates with the ARM over the physical link.
3.3.2. Types of Communication in DSP/BIOS™ LINK
There are four types of IPC in DSP/BIOS™ LINK which allow communication between the ARM processor and the DSP. They are PROC, CHNL, MSGQ and POOL.
PROC represents the DSP processor from the application's perspective. It makes the DSP callable from the ARM processor. Currently, only one DSP is supported, but the use of a processorId allows the number of DSPs to scale.
CHNL refers to the logical data channel in application space. It is mainly responsible for data transfer between the ARM processor and the DSP. Multiplexing of channels on a single physical link is also supported. The data does not carry information about its source or destination, so this must be established explicitly. The figure shows a simple CHNL example.
MSGQ refers to IPC via message queuing. This component exchanges short messages of varied length between ARM and DSP clients. The writer places messages on the queue and the reader retrieves them. A MSGQ supports one reader and multiple writers. The figure below shows a MSGQ example.
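The one-reader, multiple-writer pattern can be sketched in Python with a thread-safe queue. This only illustrates the messaging pattern; it does not use the DSP/BIOS LINK API, and the function name `run_msgq_demo` and the message layout are assumptions.

```python
import queue
import threading


def run_msgq_demo(num_writers=3, msgs_per_writer=2):
    """Several writers post variable-length messages; one reader drains them."""
    msgq = queue.Queue()  # stands in for a message queue owned by the reader

    def writer(writer_id):
        for seq in range(msgs_per_writer):
            payload = b"x" * (seq + 1)  # messages of varied length
            msgq.put((writer_id, seq, payload))

    writers = [threading.Thread(target=writer, args=(i,))
               for i in range(num_writers)]
    for t in writers:
        t.start()

    # The single reader blocks until each expected message arrives.
    received = [msgq.get() for _ in range(num_writers * msgs_per_writer)]

    for t in writers:
        t.join()
    return received


received = run_msgq_demo()
```

Messages from different writers may interleave in any order, but the messages of any one writer arrive in the order they were posted.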
POOL is an API which opens and closes the memory pools used by the CHNL and MSGQ components. These pools supply the buffers needed to transfer data between the ARM processor and the DSP.
3.4. Software Framework and Tools of DM6446EVM
The DM6446EVM is equipped with software frameworks and tools which allow developers to reduce their system's development time. These software frameworks include the Codec Engine, the eXpress DSP Algorithm Interoperability Standard (xDAIS) tools and the eXpress DSP Components (XDC) toolset. The following sections of the report elaborate on these features.
3.4.1. Codec Engine
Codec Engine is a collection of APIs with which the developer can instantiate and execute xDAIS algorithms. It has a Video, Image, Speech and Audio (VISA) interface to communicate with the xDAIS algorithms. One set of APIs is defined per codec class, so an MPEG-4 codec can be swapped for an H.264 codec by changing the configuration. This allows software reusability. The Codec Engine supports real-time execution of codecs. APIs are also defined to access memory, log CPU utilization statistics and record execution trace information.
The advantages of this software framework are:
· Easy to use: developers just need to specify the codec to run
· Scalable and configurable: supports addition of new algorithm through the use of standard tools
· Portable: APIs are target, platform and codec independent.
The figure below shows the architecture of an application that uses the Codec Engine.
The application calls the Core Engine and VISA APIs. The VISA APIs use stubs to call the Core Engine's System Programming Interfaces (SPIs) and the skeletons. The VISA SPIs access the algorithms. On a board with both an ARM and a DSP, the application, media middleware and video encoder stubs run on the ARM processor. The video encoder skeleton and the codecs run on the DSP.
3.4.2. eXpress DSP Algorithm Interoperability Standard (xDAIS)
This standard was developed by Texas Instruments for the TMS320 DSP family. It eases the integration of various DSP algorithms into a system. The xDAIS standard handles issues pertaining to resource allocation and utilization of DSP CPU cycles. It defines a set of guidelines that all compliant DSP algorithms follow.
The major advantages of using this standard are:
* Reduces integration time of algorithms
* Allows comparisons of different algorithms from different sources
* Has broad range of compliant algorithms from third parties and reduces the need to custom develop new algorithms
* Works well with Codec Engine Framework
3.4.3. eXpress DSP Components (XDC)
The eXpress DSP Components toolset creates reusable software components. These components are optimized for use on real-time embedded platforms. The reusable components are called packages. The main advantage of XDC is that the delivered content is standardized, which makes it easier to integrate into applications. XDC is used by two groups of developers, namely consumers and producers. Consumers integrate target content into their own applications, and producers develop the packages used by consumers. The figure below shows the relationship between consumers and producers.
Chapter 4: Execution of DM6446 Programs
This chapter illustrates how the multi-threaded DM6446 programs work. The program of interest in this project is the encode program. Understanding this program is essential during the later stages of implementing the system. The interaction between the various threads in the program is also explained. The setting up of the environment and the compiling of the program are included in Appendix B for the user's reference.
4.1. Understanding the encode program of DM6446
In this project, the encode program of the DM6446 is used as a base for encoding the raw captured video frames into the desired baseline H.264/AVC format. The resulting output bit stream is written to a file on the NFS. It is therefore important to understand the workings of the encode program before modifying it. This section of the report discusses the workflow of the encode program.
4.1.1. Overview of encode program
The encode program's objective is to capture raw video frames from the camera source and encode them into baseline H.264/AVC format to be written to an output file. The program is a multi-threaded application. The threads make use of mutual exclusion and condition synchronization to ensure the correct execution of the application. The figure below shows the various threads involved in this program.
The program makes use of 6 POSIX threads: the control, video, display, capture, speech and writer threads. The main thread becomes the control thread of the application, and all the other threads are created from the main thread. These threads are configured to be pre-emptive and priority-based scheduled. Initialization and cleanup of threads are coordinated by the Rendezvous module, which uses condition synchronization to synchronize the threads. Each thread first initializes itself and then signals the Rendezvous object. Once all the threads are initialized, they are unblocked and execute their main loop routines. The same mechanism ensures that shared buffers are not freed while other threads are still using them.
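The rendezvous idea described above can be sketched in Python. The encode program itself is written in C, so this is only an illustration; `threading.Barrier` is used here as a stand-in for the Rendezvous module, and the thread names and the function `run_rendezvous_demo` are assumptions.

```python
import threading


def run_rendezvous_demo(names=("video", "display", "capture", "writer")):
    """Each thread initializes, meets at the rendezvous, then enters its loop."""
    rendezvous = threading.Barrier(len(names))
    log_lock = threading.Lock()  # mutual exclusion on the shared event log
    events = []

    def worker(name):
        with log_lock:
            events.append(("init", name))   # per-thread initialization
        rendezvous.wait()                   # block until every thread is ready
        with log_lock:
            events.append(("loop", name))   # main loop starts only after that

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_rendezvous_demo()
```

No thread records a "loop" event until every thread has recorded its "init" event, which is the guarantee the rendezvous provides at start-up. The same pattern can be applied at cleanup so that no thread releases a shared buffer early.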
4.1.2. Functions of threads in encode program
Each thread handles a certain function in the whole application. The functions of speech thread will not be discussed as it is not in the scope of this project.
The main thread handles all the initializations and also checks the arguments given by the user. Based on these arguments, it creates the necessary threads to start the encoding application. The main thread then invokes the control thread. The figure below shows the workflow of the main thread.
The control thread handles the user's interaction with the application. It constantly polls the IR interface to check whether the user has pressed any button on the IR remote. If the keyboard is enabled, it also checks whether the user has pressed any key on the keyboard. The thread also draws text and graphics on the LCD display console, using the simplewidget utility to do so. Both the ARM and the DSP CPU load are calculated and displayed on the LCD console. Parameters such as frame rate, bit rate and time elapsed are also displayed.
The video thread is in charge of encoding the video frames into H.264/AVC format. The buffer from the capture thread is passed to the video thread and is encoded by the H.264 algorithm running on the DSP side. The video thread allocates a contiguous memory buffer for the writer thread to write the output to the NFS, and then passes the buffer to the writer thread. The figure below shows the workflow of the video thread.
The display thread allows the user to see a preview of the video frame while the encoding is taking place. It makes use of the Video Processing Subsystem (VPSS) to copy the frames so that they can be displayed on the LCD console.
The capture thread removes the interlacing artifacts in the raw captured video frames. This is done by using the VPSS resizer module, which consists of the Smooth and the Rszcopy modules. The Smooth module removes the interlacing artifacts, whereas the Rszcopy module copies the raw buffer without any modification. The removal of interlacing artifacts can also be disabled by the user.
Finally, the writer thread writes the encoded video frames to an output file on the NFS which is specified by the user. DSP processing and writing to the file are done in parallel so as to make good use of the CPU cycles.
4.1.3. Interaction of Threads in the encode program
After exploring the individual functions of the various threads, it is essential to know how the threads interact with each other. Since this is a multi-threaded application, it is important that the threads execute in a certain sequence so that safety and liveness properties are not violated. The figure below shows the interaction between the various threads in the application.
After all the threads have been initialized, a raw buffer from the capture device is dequeued by the capture thread. The capture thread sends the raw buffer to the display thread so that the raw video frame is shown on the LCD screen. It then fetches an empty raw buffer from the video thread, uses the Smooth module to remove interlacing artifacts and places the result into that buffer. The video thread receives this buffer and encodes the video frame.
The video thread fetches an I/O buffer from the writer thread, where it will place the encoded data. The display thread copies the raw buffer to the display device frame buffer using the Video Processing Subsystem (VPSS) resizer. At the same time, the video thread is encoding the same buffer on the DSP. Since both the VPSS and the DSP access the capture buffer only for reading, there is no contention over the data. After the display thread finishes copying the buffer, it creates a new frame buffer.
When the video encoder on the DSP has finished encoding, the video thread sends the I/O buffer to the writer thread to be written to the Linux network file system. The capture buffer is then given back to the capture thread. The writer thread writes the encoded frame while the capture thread is waiting for the next dequeued buffer of the capture device to be ready. This cycle of execution continues until the user interrupts or stops the program.
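The hand-off of buffers between the capture, video and writer threads can be sketched as a small producer-consumer pipeline. The sketch below is an illustration in Python rather than the C code of the encode program; the queue names, the string placeholders for buffers and the `None` end-of-stream marker are assumptions, and the display thread is left out for brevity.

```python
import queue
import threading


def run_pipeline(num_frames=5):
    """capture -> video (encode) -> writer, connected by blocking queues."""
    to_video = queue.Queue()    # raw buffers from capture to video
    to_writer = queue.Queue()   # encoded buffers from video to writer
    written = []

    def capture():
        for i in range(num_frames):
            to_video.put("raw%d" % i)        # stands in for a captured frame
        to_video.put(None)                   # end-of-stream marker

    def video():
        while True:
            raw = to_video.get()
            if raw is None:
                to_writer.put(None)
                break
            to_writer.put(raw.replace("raw", "enc"))  # stands in for encoding

    def writer():
        while True:
            enc = to_writer.get()
            if enc is None:
                break
            written.append(enc)              # stands in for writing to the file

    threads = [threading.Thread(target=f) for f in (capture, video, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written


frames = run_pipeline(3)
```

Because each stage blocks on its input queue, the writer can be storing one frame while the video stage is already working on the next, which mirrors the overlap described above.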
Chapter 5: Open-Source Live555 MediaServer
5.1. Introduction to Live555 MediaServer
The Live555 Media Server is a complete, open-source RTSP server application. It makes use of RTSP, RTP, RTCP and SDP for streaming media. Because of the complexity of the network protocols needed for streaming, live555 was used as the base for developing the streaming module of this system. This chapter gives an overview of the live555 open-source application. It describes how support was added for unicast and multicast streaming of stored H.264 video files over the network, and it highlights the steps to cross compile and execute the live555 application on the DM6446EVM board.
5.1.1. Overview of Live555 MediaServer
The Live555 MediaServer can stream different types of media files over the network. These media files include:
* MPEG Transport Stream file (“.ts” file)
* MPEG 1 or 2 Program Stream file (“.mpg” file)
* MPEG 4 Video Elementary Stream file (“.m4e” file)
* MPEG 1 or 2 audio file (“.mp3” file)
* WAV (PCM) audio file (“.wav” file)
* AMR audio file (“.amr” file)
* AAC (ADTS format) audio file (“.aac” file)
Although live555 does not support the H.264/AVC codec standard, support was added to the live555 media server by implementing classes which encapsulate the streaming of an H.264/AVC media file.
Modifying the live555 to stream H.264/AVC
The live555 server application is implemented in C++ using an event-driven model. The task was to implement classes that encapsulate H.264/AVC file streaming over the network. Five classes were implemented to achieve this: H264VideoFileSink, H264VideoFileServerMediaSubsession, H264VideoRTPSink, H264VideoStreamFramer and H264VideoStreamParser. These classes are explained in the following paragraphs. The figure below shows the order in which the implemented classes execute to achieve file streaming over the network.
The H264VideoFileSink class opens the output file and writes to it. A file is created when the user enters the media file to be streamed. When the class reads the first frame of the media file, it adds the 4-byte start code (0x00000001) to the file and continues to write the rest of the data of the media file into this output file. This file is passed on to other classes for further processing.
The H264VideoFileServerMediaSubsession class creates a dynamic session for streaming the data over the network. It inherits the connection type (unicast or multicast) from another class. The session must be created because RTSP is a stateful protocol. Bandwidth is allocated for this session and other auxiliary parameters are validated. After the session is created, the RTP sink and the video framer are instantiated.
The RTP sink (H264VideoRTPSink) is the underlying transport mechanism for the H.264/AVC data. It handles the packetization of NAL units to be sent over the network. First, it validates the dynamic SDP parameters such as the payload type, sprop parameter sets, profile ID and packetization mode of the media data. It then considers three cases when sending NAL units:
· Case 1: NAL unit data is present in the buffer and it is small enough to send to the RTP sink as it is.
· Case 2: NAL unit data is present in the buffer but it is too large to send to the RTP sink. The first fragment of the data is sent as an FU-A packet with one extra preceding header byte.
· Case 3: NAL unit data is in the buffer and some fragments have already been sent to the RTP sink. The next fragment of the NAL unit data is sent as an FU-A packet with two extra preceding header bytes.
The last NAL unit of the data is marked by setting the ‘M' bit of the RTP packet. Appropriate delays are applied to the fragments so that playout of the media file is smooth at the client side.
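The three cases above correspond to the FU-A fragmentation scheme of the H.264 RTP payload format. The following Python sketch shows how one NAL unit can be turned into RTP payloads; it is an illustration only and not the live555 C++ code. The function name `packetize_nal` and the `max_payload` parameter are assumptions, and the RTP header itself (including the ‘M' bit) is left out.

```python
FU_A_TYPE = 28  # NAL unit type value that identifies an FU-A packet


def packetize_nal(nal, max_payload):
    """Split one NAL unit (without start code) into RTP payloads."""
    if max_payload < 3:
        raise ValueError("max_payload must leave room for the two FU bytes")
    if len(nal) <= max_payload:
        return [bytes(nal)]                  # Case 1: fits in a single packet

    header = nal[0]
    fu_indicator = (header & 0xE0) | FU_A_TYPE   # keep F and NRI, type = 28
    nal_type = header & 0x1F
    body = nal[1:]                           # the NAL header is not repeated
    chunk = max_payload - 2                  # room left after the two FU bytes

    payloads = []
    for offset in range(0, len(body), chunk):
        fu_header = nal_type
        if offset == 0:
            fu_header |= 0x80                # S bit: first fragment (Case 2)
        if offset + chunk >= len(body):
            fu_header |= 0x40                # E bit: last fragment
        piece = body[offset:offset + chunk]  # later fragments are Case 3
        payloads.append(bytes([fu_indicator, fu_header]) + piece)
    return payloads


# A hypothetical 11-byte IDR slice (header 0x65) with a 6-byte payload limit.
packets = packetize_nal(bytes([0x65]) + bytes(range(10)), 6)
```

The receiver rebuilds the original NAL header from the top three bits of the FU indicator and the low five bits of the FU header, then concatenates the fragment bodies in order.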
The aim of the H264VideoStreamFramer class is to divide the input video data into frames. This is done by continuously reading the input file and identifying the data which belongs to each frame. The frame size is computed and the frame rate of the video is set appropriately by setting the presentation time of each frame.
The H264VideoStreamParser class is invoked by the H264VideoStreamFramer in order to parse the data into frames correctly. It checks for the 4-byte start code before parsing a frame. In the absence of the start code, the frames are not parsed. The parser returns the frame size so that it can be passed to the stream framer for further processing.
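Start-code based parsing can be illustrated with a short Python sketch. It is not the H264VideoStreamParser implementation; the function name `split_nal_units` is an assumption, and only the 4-byte start code is handled, although H.264 byte streams may also use a 3-byte start code.

```python
START_CODE = b"\x00\x00\x00\x01"  # 4-byte start code


def split_nal_units(stream):
    """Return the NAL units found between 4-byte start codes.

    Data before the first start code is ignored, and a stream without
    any start code yields no units at all.
    """
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = nxt if nxt != -1 else len(stream)
        units.append(stream[begin:end])
        pos = nxt
    return units


# Hypothetical byte stream with two NAL units and some leading junk.
sample = b"junk" + START_CODE + b"\x67\x01" + START_CODE + b"\x65\x02\x03"
units = split_nal_units(sample)
sizes = [len(u) for u in units]  # sizes that would be reported to the framer
```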
5.1.2. Adding support for unicast and multicast streaming
The live555 MediaServer provides support for both unicast and multicast streaming. The unicast connection is straightforward, as the system allows only one user to connect to the streamer module in order to receive the video stream. To model this type of connection, the H264VideoFileServerMediaSubsession is used to dynamically create a media session for a single user to receive the video stream. This is modeled after the OnDemandServerMediaSubsession class. The server checks the filename entered by the user, compares its extension and then returns the appropriate media subsession for the user.
The multicast connection is slightly more complex. Since this is a video surveillance system, it must also allow multiple users to connect and receive the video stream from the streamer. The multicast connection is implemented using the PassiveServerMediaSubsession class. This class makes use of a multicast address to stream to multiple clients. The address is generated at random from the range [188.8.131.52, 184.108.40.206). As long as the clients are connected to the same network, this address remains valid.
With a multicast connection, different users may connect at different moments in time. The system is implemented such that even when users connect at different times, they all receive the same video stream. For example, suppose user A connects at time = 1s and user B connects at time = 5s. At time = 6s, user A continues receiving the video stream as normal, and user B receives the same video stream as user A instead of starting from the beginning of the video stream. To achieve this, the reuseFirstSource parameter is used. When this parameter is set, the server reuses the source of the first client's stream for every later client rather than starting a new stream.
5.2. Porting Live555 MediaServer to DM6446 board
The next stage of development involved porting the live555 MediaServer onto the DM6446 board. Since the system should run on the embedded board, the live555 application has to be ported onto the board. This section of the report explains the porting process and the running of the live555 application.
5.2.1. Modifying the make files of Live555 and DM6446 encode program
The live555 application uses make files to build the executable. The make files describe how the various classes and objects are to be compiled and linked. The original application uses the GNU C++ compiler to build the application. In order to port the application to the board, it must be cross compiled with the board's tool chain. The board has a MontaVista tool chain which makes use of the ARM C and C++ cross compilers. If the cross compiler can compile the application, it can run on the board.
Firstly, the make files of live555 had to be modified so that the classes are compiled with the board's cross compiler. The figure below shows the top portion of the original make file.
In the make file, the various compiler and suffixes are defined. The classes make use of these parameters to compile the application.
From the two figures shown above, the C_COMPILER and CPLUSPLUS_COMPILER variables are changed to the MontaVista tool chain compilers. Once this modification is done, live555 can be cross compiled for the DM6446 board.
The make files of the encode program are also changed so that live555 is compiled along with the encode program and the resulting executable is stored directly in the appropriate directory. Firstly, the make file in dvevm_1_10/demos is changed as:
The live directory is added to the SUBDIRS variable. Once this is done, the live555 will be compiled along with the encode programs of the board and the resulting executable will be stored into /home/ansary/workdir/filesys/opt/dvevm directory.
5.2.2. Cross compiling live555 and running it on the board
Firstly, open a terminal on the Linux host and change to the dvevm_1_10 directory. Type ‘make' to compile and ‘make install' to install into the appropriate directory. Then, boot the board using minicom and change to the opt/dvevm directory. Lastly, type ‘./live555MediaServer' to run the application on the board.
5.3. Receiving video stream on VLC player
When the live555 MediaServer is running on the DM6446 board, the VLC player must connect to the board to receive the video stream for playback to the client. This section of the report explains the steps for connecting to the DM6446 board to receive the video stream.
5.3.1. Playing video stream via VLC
For this system, the VLC client is running on a Windows host machine. The steps in receiving the video stream via VLC are as follows:
1. Launch VLC player
2. Under the Media tab, click on Open Network option.
3. A dialog box should appear
4. Under the Protocol drop box, select RTSP. In the Address text field, type rtsp://220.127.116.11/test
5. Click the Play button.
The VLC player connects to the board's IP address and makes use of the RTSP protocol to start receiving the video stream. Lastly, the video stream is played back for the client.
Chapter 6: From File streaming to Live streaming
6.1. Reasons for Live Streaming and investigation of approaches
The current system encodes the video data and writes it to the Linux file system. The file is then passed as an input to the live555 media server program to be streamed over the network. This is a two-step streaming approach, and some issues arise from this implementation.
· PROBLEM 1: The file is accessed by both the encode program and the live555 media server program. The two programs contend for the file resource, which could lead to data contention. This also leads to the next problem.
· PROBLEM 2: The end-to-end initial delay for playback of the H.264/AVC video on a remote station is about 10 - 15 seconds. If the remote VLC media player establishes an RTSP connection with the board while the file is being used by the encode program for writing data, the connection times out and is torn down. Hence, the user has to re-establish the connection.
· PROBLEM 3: The live555 program reads from the beginning of the encoded video file and streams it to the remote VLC media player. Hence, VLC plays a delayed version of the encoded video stream.
Due to these limitations of stored file streaming, the concept of live streaming was explored. Live streaming sends the encoded video frames directly over the network. This approach prevents the programs from contending for the file resource and so avoids data contention. Since the encoded video frames are streamed over the network directly, the end-to-end initial delay of playing back the video stream is significantly reduced. Hence, the user experiences the live version of the encoded video rather than a delayed version.
6.1.1. Possible Integration methodologies
The live555 media server program and the encode program have to be integrated in order to achieve live streaming. Several integration methodologies were explored. Currently, the live555 media server is written in C++ whereas the encode program is written in C. The encode program is a multi-threaded program whereas the live555 media server is an event-driven program. Two of the methodologies that were considered were:
§ Having a single integrated program which encapsulates both the encoding program and the live555 media server.
§ Having two separate programs communicating with each other via inter-process communication mechanisms.
Single Program VS Multi Program System
Firstly, a single program approach was considered. The live555 media server must be compiled as a library to be integrated with the encode program. The live555 C++ functions and classes must be callable from the encode C program, so the C++ functions that are accessed by the C program must be declared with ‘extern "C"' linkage. The encode program uses threads, so the live555 library must be instantiated as a separate thread inside the encode program for the integration. The table below summarizes the pros and cons of having a single program.
Table 3: Pros and Cons of Single Program approach
PROS of Single Program
· Only need to run a single program on the target
CONS of Single Program
· Need to modify a significant portion of the code to ensure it is callable by the encode program
· Understanding the relationship of threads and analysing a multi-threaded program is a time-consuming process
· Need to ensure the live555 thread does not violate the thread safety and liveness aspects of the program execution
· Program size increases
For the single program system, the disadvantages outweigh the advantages. Hence, the multi program approach was explored.
In the multi program approach, the two programs run separately on the target and communicate with each other via inter-process communication, in this case through sockets. The program flow of the multi program approach is:
1. The encode program captures raw video frame and encodes the video frame into H.264/AVC format.
2. The encode program opens a socket and writes the encoded video frame to the socket.
3. The live555 media server, which is listening on the socket, receives the video frame and streams it over the network.
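The socket hand-off in the steps above can be sketched in Python. The source does not state the socket type or how frame boundaries are preserved, so the connected stream socket and the 4-byte length prefix used here are assumptions, as are the helper names `send_frame`, `recv_exact` and `recv_frame`. `socket.socketpair()` stands in for the connection between the two programs so that the sketch is self-contained.

```python
import socket
import struct


def send_frame(sock, frame):
    """Encoder side: prefix the frame with its length and send it."""
    sock.sendall(struct.pack("!I", len(frame)) + frame)


def recv_exact(sock, count):
    """Read exactly count bytes, since recv may return fewer."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        data += chunk
    return data


def recv_frame(sock):
    """Streamer side: read one length-prefixed frame."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


encoder_end, streamer_end = socket.socketpair()
send_frame(encoder_end, b"\x00\x00\x00\x01\x65frame-one")
send_frame(encoder_end, b"\x00\x00\x00\x01\x41frame-two")
first = recv_frame(streamer_end)
second = recv_frame(streamer_end)
encoder_end.close()
streamer_end.close()
```

A length prefix is one simple way to meet the reliable-transfer requirement of this approach, because a stream socket does not preserve message boundaries on its own.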
The pros and cons of this approach are summarized in the table below.
Table 4: Pros and Cons of Multi program approach
PROS of Multi program system
· Fewer modifications to the individual programs, which reduces integration time
· Use of sockets facilitates communication between the programs
· Program sizes are smaller than in the single program approach
CONS of Multi program system
· Need to ensure reliable data transfer between the programs via sockets
· OS must be efficient in allocating resources to execute both programs simultaneously
After weighing the pros and cons of the two approaches, the multi program system approach was selected to be implemented in order to achieve live streaming of H.264/AVC on the DM6446EVM.
6.1.2. Writing a script to run both encode and live555 MediaServer
The DM6446 is booted up via the minicom application. Since only one instance of the minicom application can communicate with the board, it is only possible to run one program at a time. Therefore, a shell script is written in order to execute both programs at the same time via the minicom command line. The steps in writing a script are as follows:
1. Open an empty text file on the Linux host.
2. Type the following statements:
./encode -v test.264 -r 352x288
3. Save this file as <filename>.sh. This .sh indicates that this file is a shell script. Copy the file into the appropriate directo