Digital Signal Processors Are Microprocessors Hardware Computer Science Essay

2.0 Introduction

The backbone of any digital system or electronic device is a digital processor. The importance of digital processors in industry, and the range of their applications today, cannot be overemphasised. This chapter deals with the features of digital signal processor (DSP) devices. Some common architectures used in their implementation, and their real-life applications, are also highlighted. Finally, the features of the TMS320C6713DSK, a typical DSP processor, and some of its applications are treated alongside its daughter card.

2.1 Digital signal processor (DSP)

Digital signal processors are microprocessors (hardware) designed to handle defined functions or tasks of digital signal processing [Smith]. These microprocessors are primarily used to carry out intensive real-time computations. The use of arithmetic algorithms on DSP processors to accomplish a specific task is regarded as digital signal processing. While a digital signal processor is hardware, digital signal processing is more concerned with the software implemented on that hardware. The hardware can carry out different operations without any modification to its construction or architecture, simply by reconfiguring the software algorithms to suit each task. These algorithms originate in the work of the French scientist Jean Baptiste Joseph Fourier (1768 - 1830) [Patrick], who established the Fourier series while seeking a solution to the heat equation in a metallic material. The Fourier series was later found to be very useful in many areas such as signal processing, image processing and acoustics. Hence it formed the basis of the arithmetic algorithms used in digital signal processing.
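As an illustration of the kind of Fourier-based arithmetic referred to above, the following is a minimal sketch of a naive discrete Fourier transform. It is not taken from any of the cited sources; the function name `dft` and the 8-sample cosine test signal are assumptions chosen only for illustration, and a real DSP would normally use an optimised FFT routine instead.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, O(N^2); for illustration only."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical test signal: 8 samples of a cosine completing one cycle.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(signal)

# The energy of a one-cycle cosine concentrates in bins 1 and 7.
for k, value in enumerate(spectrum):
    print(k, round(abs(value), 6))
```

Each output bin is a long sum of multiply-and-add steps, which is why DSP hardware is built around fast multiply-accumulate operations.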

The use of digital signal processors in industry has grown tremendously, especially in the past fifteen (15) years. This is the result of continuous efforts by researchers and microprocessor manufacturers to improve DSP performance in terms of computational capability, ease of real-time implementation, reduced physical size, minimal power consumption, faster operation, affordability, application-specific needs [3] and suitable architectures. The need for telecommunication companies to deliver high-quality services, the demand for efficient digital-voice technologies and the quest for optimum productivity in other DSP-based systems have been key factors in the rapid development of DSPs.

The use of modern DSP devices has positively influenced our standard of living by providing advanced medical equipment, modern security gadgets, superior health-support devices, advanced communication systems, improved learning facilities and ingenious entertainment devices, to mention but a few. DSP has also played a very important role in the world economy by providing a wider market for digital electronic devices whilst increasing the demand for higher quality and reducing the overall cost of signal processing devices [Denyer]. Taking the United Kingdom as an example, which aims to phase out all analogue television and adopt digital television by the year 2012, digital signal processors are the key players in achieving this goal. Likewise, robotics, which serves as a helper to human beings, could not have existed without DSPs, and the role of DSP devices in modern power-generation systems cannot be neglected either. Table 2.1 outlines specific applications of DSP processors in different areas. The performance of DSP devices varies and depends on their architectural design and their memory structure. The next sections deal with architectural structures, fixed/floating-point DSPs and DSP features respectively.

Table 2.1 Applications of digital signal processor chips [Radivojevic].

Error detection and correction
Message encryption
Coding and modulation
Control applications
High-precision servo control
Disk controllers
Fault-tolerant systems
Artificial intelligence applications
Speech processing
Machine vision
Neural network simulation
Human interface (graphics)
Spectrum analysers
Function generators
Numeric applications
Array processing
Graphics and image processing
Translation, rotation, shading
Matrix arithmetic
Image restoration, compression
Standard peripherals
Target recognition

2.1.1 DSP Architectures

The architectural design of any DSP processor plays an important role in its functionality. Several architectural structures have been designed with the aim of optimising DSP performance, but most processors available today rely on the sequential routines of the Von Neumann architectural concept [3].

Von Neumann architecture: this architecture was an initiative of John Von Neumann (1903-1957), a Hungarian-American mathematician [Smith]. It consists of a single memory unit and a single bus. The bus is responsible for transferring data to and from the central processing unit (CPU) [Smith]. Although most modern computers are based on this architecture, the slow data transfer caused by the shared bus remains its major drawback.

Harvard architecture: this architecture was developed in the mid-1940s at Harvard University under the leadership of Howard Aiken (1900-1973) [Smith]. It is based on two separate spaces for data and program memories with a separate bus for each, allowing instruction fetch and execution to overlap completely ([3], [15], [16]). This architecture shows a reasonable improvement in speed of operation over the single-bus Von Neumann architecture. The Harvard architecture was later modified into the Super Harvard architecture with two additional important features.

Super Harvard architecture: this architecture is basically the same as the Harvard architecture but has two distinguishing features that make it superior. The new features are an input/output (I/O) controller and an instruction cache [Smith]. The instruction cache is a small, fast memory that temporarily holds recently used instructions, so that instructions repeated in a loop need not be fetched again over the program memory bus, which is then free to carry data [Udina]. The I/O controller regulates the flow of signals into and out of the system, for example through the ADC and DAC, which convert signals from analogue to digital and vice versa. Figures 2.1, 2.2 and 2.3 show the structures of the Von Neumann, Harvard and Super Harvard architectures respectively.
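The difference between the three bus arrangements can be made concrete with a deliberately simplified model. The sketch below counts how many bus cycles are needed to bring in one instruction and two operands (a sample and a coefficient) for a single multiply-accumulate step. The function name, the bus names and the assumption that each bus carries one transfer per cycle are illustrative choices, not figures from the cited sources.

```python
def bus_cycles_per_tap(architecture):
    """Bus cycles to fetch one instruction and two operands in a toy model."""
    # Each entry maps a bus to the transfers it must carry for one
    # multiply-accumulate: an instruction, a sample and a coefficient.
    traffic = {
        # One shared bus carries everything, one transfer at a time.
        "von_neumann": {"shared_bus": ["instruction", "sample", "coefficient"]},
        # Separate program and data buses work in parallel.
        "harvard": {"pm_bus": ["instruction"], "dm_bus": ["sample", "coefficient"]},
        # Assumption: the instruction is already in the instruction cache and
        # the coefficient is stored in program memory as secondary data.
        "super_harvard": {"pm_bus": ["coefficient"], "dm_bus": ["sample"]},
    }
    buses = traffic[architecture]
    # Buses operate concurrently, so the busiest bus sets the cycle count.
    return max(len(transfers) for transfers in buses.values())


for name in ("von_neumann", "harvard", "super_harvard"):
    print(name, bus_cycles_per_tap(name))
```

Under these assumptions the model gives three cycles for the single-bus case, two for the Harvard case and one for the Super Harvard case once the loop instruction is cached.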

Fig. 2.1 Von Neumann Architecture [Smith]. Diagram labels: memory (data and instructions), address bus, data bus.

Fig. 2.2 Harvard Architecture [Smith]. Diagram labels: program memory (instructions only), data memory (data only), PM address bus, PM data bus, DM address bus, DM data bus.

Fig. 2.3 Super Harvard Architecture [Smith]. Diagram labels: program memory (instructions and secondary data), data memory (data only), instruction cache, I/O controller, PM address bus, PM data bus, DM address bus, DM data bus.

The need to increase the number of operations carried out per instruction and the number of instructions executed per cycle drove the rapid development of advanced architectures and techniques such as single-instruction, multiple-data (SIMD), Acorn RISC Machine (ARM), Scalable Processor Architecture (SPARC), Very Long Instruction Word (VLIW) and Static Superscalar Processing (SSP) architectures [3], [Tanenbaum].

Single-Instruction, Multiple Data (SIMD) architecture: SIMD architecture consists of multiple execution units and multiple data paths that are capable of carrying out the same operation on multiple data items concurrently under one instruction. This type of system exploits data-level parallelism in its operation. SIMD systems are known for their efficiency: a single instruction is sent to accomplish multiple tasks. In addition, only instructions that are relevant to all blocks are sent; that is, every instruction sent out must apply to the blocks. On the other hand, SIMD consumes considerable power and occupies a large chip area, owing to its large file content. The Lucent DSP16000 and Texas Instruments TMS320C62x are some of the DSP processors that use this architecture [3]. The effectiveness of SIMD depends heavily on the programmer and the compiler.
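The idea of one instruction acting on several data items at once can be sketched in software. The example below treats a 32-bit word as two independent 16-bit lanes and adds both lanes in one call. The lane width and the helper names `pack`, `unpack` and `packed_add` are assumptions for illustration; they do not correspond to the instruction set of any processor named above.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack(high, low):
    """Place two 16-bit values side by side in one 32-bit word."""
    return ((high & LANE_MASK) << LANE_BITS) | (low & LANE_MASK)


def unpack(word):
    """Return the (high, low) 16-bit lanes of a 32-bit word."""
    return (word >> LANE_BITS) & LANE_MASK, word & LANE_MASK


def packed_add(word_a, word_b):
    """Add both lanes with a single call; carries never cross lanes."""
    a_high, a_low = unpack(word_a)
    b_high, b_low = unpack(word_b)
    return pack(a_high + b_high, a_low + b_low)


result = packed_add(pack(1000, 65535), pack(234, 1))
print(unpack(result))  # (1234, 0): the low lane wraps without touching the high lane
```

Hardware SIMD units perform the lane-wise additions in parallel; the Python version only models the result, not the timing.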

Acorn RISC Machine (ARM): ARM originally stood for Acorn RISC (reduced instruction set computer) Machine and was later renamed Advanced RISC Machine. It is an instruction set architecture (ISA) based on a 32-bit RISC design. ARM is the most widely used 32-bit instruction set architecture in low-power applications such as mobile phones, PDAs, music players and embedded electronics, owing to its simplicity, low power consumption, low cost and the small size of its microprocessors and microcontrollers [wiki/ARM]. Development of the architecture began in 1983, and ARM Holdings, the designer of ARM architectures, has developed several processor families, of which ARM7, ARM9, ARM11 and Cortex are the most prominent. The newest Cortex series (Cortex-A9) can run at clock speeds above 2 GHz, consuming about 1.9 W while delivering 10,000 Dhrystone MIPS [Angel]. ARM operates mostly on single-cycle execution, with a uniform array of 16 × 32-bit processor registers and a fixed 32-bit (4-byte) instruction length for simple decoding and pipelining [wiki/ARM].

Scalable Processor Architecture (SPARC): SPARC is one of the first ISAs designed using the RISC architectural technique. SPARC was introduced in 1987 by Sun Microsystems, based on the results of research done at the University of California, Berkeley [Tanenbaum] [SPARC] [Opensparc] [wiki/SPARC]. Early SPARC designs were 32-bit architectures, while newer ones (such as UltraSPARC I and II, which implement version 9) were upgraded to 64 bits to suit market demand. Modern SPARC processors have 64-bit virtual addresses and integer data, fast context switching, simple instruction decoding, an optimising compiler, fault tolerance and support for both big- and little-endian byte orders [SPARC], as well as an addressable memory space of about 2^64 bytes. This memory size makes provision for future expansion and improvement [Tanenbaum]. Modern SPARC processors are scalable (the same core instruction set serves everything from an embedded processor to a large server processor) [wiki/SPARC]. The architecture has been used to design a processor capable of 128 billion floating-point operations per second [wiki/SPARC]. It has been used mainly in workstations and servers.

Very Long Instruction Word (VLIW) architecture: the VLIW architecture described here is that of Texas Instruments' VelociTI platform. It consists of two data paths and eight independent execution units arranged in two sets. Each unit takes a short instruction 32 bits wide, and the instructions for all the units are linked together into one very long instruction word packet [3]. VLIW processors are known for the high number of instructions they process per cycle. The TMS320C67x family is a common example of DSP processors with a VLIW architecture.
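The packet structure described above can be sketched as follows: eight 32-bit instruction words are joined into a single 256-bit packet and then split back out, one word per execution unit. The unit names follow the labels in Fig. 2.7, but assigning words to units by their position in the packet is a simplifying assumption made here for illustration and is not claimed to be how the VelociTI hardware routes instructions.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1
UNITS = ["L1", "S1", "M1", "D1", "L2", "S2", "M2", "D2"]


def build_packet(instructions):
    """Join eight 32-bit instruction words into one 256-bit packet."""
    if len(instructions) != len(UNITS):
        raise ValueError("expected one instruction word per execution unit")
    packet = 0
    for word in instructions:
        if not 0 <= word <= WORD_MASK:
            raise ValueError("instruction words must fit in 32 bits")
        packet = (packet << WORD_BITS) | word
    return packet


def dispatch(packet):
    """Split a packet into per-unit words (position-based, an assumption)."""
    words = {}
    for position, unit in enumerate(UNITS):
        shift = WORD_BITS * (len(UNITS) - 1 - position)
        words[unit] = (packet >> shift) & WORD_MASK
    return words


packet = build_packet([1, 2, 3, 4, 5, 6, 7, 8])
print(packet.bit_length() <= 256)
print(dispatch(packet))
```

Eight words of 32 bits give the 256-bit program data bus width shown in the VLIW diagram.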

Superscalar architecture: this is a technique that increases the rate of instruction processing per cycle in DSP processors by taking advantage of instruction-level parallelism [3], [Udina]. A superscalar processor executes multiple instructions per clock cycle by dispatching several instructions at once to redundant functional units of the processor [Udina]. Superscalar machines are dynamic in nature and can independently process simultaneously initiated instructions. They can execute numerous instructions in the pipeline concurrently. This architecture is common in Pentium processors, PowerPC and the Analog Devices TigerSHARC [3]. Figures 2.4, 2.5, 2.6 and 2.7 show typical structures of the SIMD, ARM, SPARC and VLIW architectures respectively.



Fig. 2.4 Architectural diagram of SIMD [Udina]. Diagram label: instruction memory.

Fig. 2.5 ARM Cortex A9 in multiple core configuration architecture [Angel]

Fig. 2.6 SPARC V9 Architecture [Marejka]

Fig. 2.7 Architectural diagram of VLIW [3]. Diagram labels: 256-bit program data bus; instruction fetch, dispatch and decode; data path 1 with register file 1 and units L1, S1, M1, D1; data path 2 with register file 2 and units L2, S2, M2, D2; 32-bit data bus A; 32-bit data bus B.

2.1.2 Fixed/Floating point DSP

DSP chips are classified into two groups: fixed-point and floating-point architectures. Fixed-point DSPs are either 16-bit or 24-bit processors; this is the format in which data are stored in their memories. A typical 16-bit fixed-point processor, for example the TMS320C55x, stores numeric values in a 16-bit integer format. Although signals and coefficients can only be stored at 16-bit precision, intermediate values (products) may be kept internally in 32-bit format to minimise the overall loss of precision. Fixed-point DSPs are known to be faster and less expensive than their floating-point counterparts because of their simpler structure [18].
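The benefit of keeping products at a wider width can be illustrated with a small fixed-point sketch. The code below assumes the common Q15 convention (16-bit signed values representing fractions in the range -1 to just under 1); the convention and the helper names `to_q15`, `from_q15` and `dot_q15` are assumptions for illustration and are not taken from the cited sources. Python integers are unbounded, so the accumulator here only stands in for a wide hardware register.

```python
Q15_ONE = 1 << 15  # 32768 represents 1.0 in Q15


def to_q15(value):
    """Convert a float in [-1, 1) to a saturated 16-bit Q15 integer."""
    scaled = int(round(value * Q15_ONE))
    return max(-Q15_ONE, min(Q15_ONE - 1, scaled))


def from_q15(fixed):
    """Convert a Q15 integer back to a float."""
    return fixed / Q15_ONE


def dot_q15(samples, coefficients):
    """Multiply-accumulate Q15 values, rounding only once at the end."""
    accumulator = 0  # holds Q30 products at full width
    for sample, coefficient in zip(samples, coefficients):
        accumulator += sample * coefficient  # Q15 * Q15 gives a Q30 product
    # Round to nearest and rescale from Q30 back to Q15, then saturate.
    result = (accumulator + (1 << 14)) >> 15
    return max(-Q15_ONE, min(Q15_ONE - 1, result))


samples = [to_q15(0.5), to_q15(-0.25)]
coefficients = [to_q15(0.5), to_q15(0.5)]
print(from_q15(dot_q15(samples, coefficients)))  # 0.125
```

Rounding each product back to 16 bits before adding would discard low-order bits at every step; keeping the wide products and rounding once limits that loss.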

Floating-point processors, on the other hand, are basically classified as 32-bit devices. They store a 24-bit mantissa and an 8-bit exponent [18]. A typical 32-bit floating-point DSP chip such as the TMS320C3x has the advantage of a very wide dynamic range [16], although it still operates at only 24-bit resolution. Dynamic-range constraints are therefore largely insignificant in floating-point designs, unlike fixed-point DSP designs, where scaling factors have to be applied to avoid arithmetic overflow, which is an inflexible and time-consuming task. Floating-point numbers have the form n = M × 2^E, where M is the mantissa and E is the exponent [16]. Floating-point processors are suitable for applications where the coefficients change with time, where the signals and coefficients have wide disparities, or where huge memory structures are needed, such as in video processing [18].
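The form n = M × 2^E can be checked directly with the Python standard library. The sketch below uses `math.frexp`, which returns a mantissa with magnitude in the range 0.5 to 1 together with an integer exponent. This normalisation is Python's own and is only an illustration of the mantissa and exponent idea; it is not the storage format of the TMS320C3x or any other DSP.

```python
import math


def decompose(value):
    """Split a float into (mantissa, exponent) with value = mantissa * 2**exponent."""
    return math.frexp(value)


def recompose(mantissa, exponent):
    """Rebuild the float from its mantissa and exponent."""
    return math.ldexp(mantissa, exponent)


mantissa, exponent = decompose(6.0)
print(mantissa, exponent)             # 0.75 3, since 6.0 = 0.75 * 2**3
print(recompose(mantissa, exponent))  # 6.0

# A very small and a very large value share the same mantissa here and
# differ only in the exponent, which is the source of the wide dynamic range.
print(decompose(0.75 * 2.0 ** -100))
print(decompose(0.75 * 2.0 ** 100))
```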

2.1.3 Features of DSP

DSP systems are typified by their real-time operation, with special emphasis on a high throughput rate. They use algorithms that involve heavy arithmetic computation (multiplication, addition and circular convolution) and therefore produce very large volumes of output data from the processor [3]. The general features of DSPs can be summarised as follows:

DSP devices are highly flexible to implement. They can be reprogrammed many times to handle different tasks without modifying the hardware.

They are low-power devices.

They have a built-in low-dropout power regulator.

They offer real-time self-diagnostic capability for medical end equipment.

Modern DSP processors have ultra-fast processing ability, that is, a high number of instructions processed in one operation.

A DSP processor such as the TMS320C6713DSK has an initial clock rate of 225 - 300 MHz and a processing power of 1.35 giga floating-point operations per second (GFLOPS).

Today's DSP processors are smaller and cheaper, easy to implement in real time, and highly reliable in their performance.
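The circular convolution mentioned at the start of this section is a typical example of the multiply-and-add workload involved. The following minimal sketch computes it directly from the definition; the function name and the test sequences are illustrative assumptions, and practical implementations usually rely on FFT-based methods rather than this direct form.

```python
def circular_convolve(x, h):
    """Circular convolution of two equal-length sequences, by definition."""
    n = len(x)
    if len(h) != n:
        raise ValueError("both sequences must have the same length")
    # Output k sums x[m] * h[(k - m) mod n] over all m: n multiply-adds each.
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


x = [1, 2, 3, 4]
print(circular_convolve(x, [1, 0, 0, 0]))  # [1, 2, 3, 4]: unit impulse leaves x unchanged
print(circular_convolve(x, [0, 1, 0, 0]))  # [4, 1, 2, 3]: delayed impulse rotates x by one
```

For sequences of length N this direct form needs N × N multiply-accumulate operations, which shows why throughput is the defining feature of DSP hardware.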

2.2.0 Hardware Specification

2.2.1 TMS320C6713 Development Starter Kit

The TMS320C6713 development starter kit (DSK) is an advanced very-long-instruction-word (VLIW) DSP device with superior performance from the TMS320C6000 platform of floating-point DSP processors. The C6000 DSP was first introduced in February 1997 by Texas Instruments and has had several models, the latest being the TMS320C6713 (C6713 for simplicity). This is owing to the rapid growth in semiconductor (integrated chip) development, as newer chips become ever more sophisticated, reliable and efficient. The C6713 has very high computing power and a large on-chip memory, which make it highly suited to real-time applications ([19], [17]). It also provides enhanced direct memory access (EDMA) and operates at an initial clock rate of 225 MHz, with a processing speed of up to 1.35 giga floating-point operations per second (GFLOPS) [17]. The C6713 is based on the advanced properties of TI's VelociTI architecture, which encompass instruction packing, pre-fetched branching and conditional branching. This architecture is highly flexible and resourceful, as it places little or no restriction on its mode of operation (in terms of how and when instructions are fetched, executed or stored) [17], making it a good choice for multichannel, multipurpose applications. Figures 2.6 and 2.7 show the typical C6713DSK board and its codec interface respectively.

Fig. 2.6 Typical C6713DSK board [Roinos]

Fig. 2.7 C6713DSK codec interface [Udina]
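The clock rate and throughput figures quoted for the C6713 are related by simple arithmetic, sketched below. The 225 MHz clock, the eight functional units and the two multipliers come from this chapter and Table 2.2. The assumption that six of the eight units can each complete one floating-point operation per cycle is introduced here only because it reproduces the quoted 1.35 GFLOPS; it should be checked against the device data sheet.

```python
CLOCK_MHZ = 225            # initial clock rate quoted in the text
FUNCTIONAL_UNITS = 8       # independent functional units (Table 2.2)
MULTIPLIERS = 2            # multipliers among those units (Table 2.2)
FLOATING_POINT_UNITS = 6   # assumption chosen to match the quoted 1.35 GFLOPS


def millions_per_second(operations_per_cycle):
    """Peak rate in millions of operations per second at the given clock."""
    return CLOCK_MHZ * operations_per_cycle


print("MIPS:  ", millions_per_second(FUNCTIONAL_UNITS))      # 1800
print("MFLOPS:", millions_per_second(FLOATING_POINT_UNITS))  # 1350
print("MMACS: ", millions_per_second(MULTIPLIERS))           # 450
```

These are peak figures that assume every unit is kept busy on every cycle; sustained rates depend on how well the compiler and programmer fill the instruction packets.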

2.2.2 Features of TMS320C6713DSK

The TMS320C6713 is an important real-time DSP processor that is used in many different operations owing to its versatility and flexibility. Table 2.2 below summarises the key features of the C6713DSK with a brief description of each, while Figures 2.8 and 2.9 show the features diagrammatically and the memory map of the board respectively.

Table 2.2 Summary of the features of TMS320C6713DSK [19]

Initial clock frequency
4-kbyte, direct-mapped, 64-byte cache line.
4-kbyte, 2-way set associative, 32-byte cache line, 64-bit wide dual-ported.
5-cycle L1P miss penalty, 4-cycle L1D miss penalty, up to 64 Kbytes, four 64-bit banks.
L2 cache: 5-cycle L1P miss penalty, 4-cycle L1D miss penalty, up to 64 Kbytes, 1/2/3/4-way set associative, 128-byte cache line, four 64-bit banks.
L2 to L1 read path: 128 bits.
L1 to L2 write buffer: 32-bit, 4-entry; L2 can process a write request every 2 cycles.
Processor speed
Advanced VelociTI™ architecture (advanced very long instruction word, AVLIW)
Rate of instruction processing: 1800 million instructions per second.
Operating temperature: 0 - 90 °C
Core supply (V)
I/O supply (V): 3.3 V
About 450 million multiply-accumulate operations per second (MMACS)
Functional units: 8 independent functional units, which are 2 ALUs (Floating-Point), 4 ALUs (Floating-/Fixed Point) and 2 Multipliers (Floating-/Fixed Point).
External connectivity: Standard expansion connectors for daughter card use.
Configurable boot options.

L1P and L1D stand for level-one program and level-one data respectively, while L2 is level two. L1 and L2 represent the two levels of memory cache in the C6713. L1 consists of a two-way set associative cache 4 Kbytes in size, while L2 has 192 Kbytes of on-chip Static Random Access Memory (SRAM) for data and program storage. A further 64 Kbytes can be configured to work with any of the three C6713 memory systems (cache system, external Synchronous Dynamic Random Access Memory (SDRAM) and on-chip SRAM). It is worth noting that the C6713 is compatible with other members of the C6000 family, so that code written for the C6711 or C6712 can also be built on the C6713.
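The cache descriptions in Table 2.2 can be turned into line and set counts with a short calculation. The sketch below takes the two 4-Kbyte entries (direct-mapped with 64-byte lines, and 2-way set associative with 32-byte lines) and shows how an address would select a set. The function names, the example address and the simple modulo indexing are illustrative assumptions and are not taken from the device documentation.

```python
def cache_geometry(size_bytes, line_bytes, ways):
    """Return (number of lines, number of sets) for a cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    return lines, sets


def set_index(address, line_bytes, sets):
    """Set selected by an address under simple modulo indexing (assumption)."""
    return (address // line_bytes) % sets


# 4-Kbyte, direct-mapped (1 way), 64-byte cache line.
print(cache_geometry(4096, 64, 1))   # (64, 64)

# 4-Kbyte, 2-way set associative, 32-byte cache line.
print(cache_geometry(4096, 32, 2))   # (128, 64)

# Hypothetical address 0x1040 falls in set 1 of the direct-mapped cache.
print(set_index(0x1040, 64, 64))     # 1
```

In the direct-mapped case each address has exactly one possible line, whereas in the 2-way case each set offers two candidate lines, which reduces conflicts between addresses that share an index.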

Fig. 2.8 Key features of TMS320C6713DSK board [19]

Fig. 2.9 Memory map of C6713DSK [Udina]

2.2.3 TMS320C6713DSK Daughter Card

One of the attractive features of the TMS320C6713DSK is its provision for an external board (daughter card). The TMS320C6713DSK has three expansion ports for plug-in daughter cards. A daughter card enables users to customise their C6713DSK platform, either by assigning specific I/O or by expanding the board's capabilities. The three expansion ports are for memory, the Host Port Interface (HPI) and peripherals [Spectrum]. The memory port links the DSP's asynchronous External Memory Interface (EMIF) signals with the board's memories and memory-mapped devices, while the HPI enables multiple DSP processors to communicate and carry out a specified task. The peripheral port provides access to peripheral signals such as clocks, timers and Multichannel Buffered Serial Ports (McBSPs) [Spectrum]. The signals from these ports are buffered so that they cannot affect the functionality of the board; however, for most daughter cards these signals cannot be modified on the board. Figures 2.10 and 2.11 show typical daughter cards for the TMS320C6713DSK, each for a different purpose. Fig. 2.10 is the RS-UART card, which is used for data transfer. It has two serial ports: data terminal device (DTD) and data communication device (DCD) [IndiaMART]. Fig. 2.11 is the DSK_Audio4 daughter card, which offers four synchronised 16-bit ADC and DAC channels to the C6713DSK [Educational].

Fig. 2.10 RS-UART C6713DSK daughter card [IndiaMART]

Fig. 2.11 DSK_Audio4 C6713DSK daughter card [Educational]