Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. G. Fettweis
Digital Signal Transmission Lab
SS 08
Oliver Arnold
Steffen Kunze
Hardware
TU Dresden, 4/29/2008
Slide 3
Digital Signal Processing (DSP)
DSP: Technology Enabler
Wireless / Cellular
• Voice-band audio
• RF codecs
• Voltage regulation
HDD
• PRML read channel
• MR pre-amp
• Servo control
• SCSI transceivers
Consumer Audio
• Stereo A/D, D/A
• PLL
• Mixers
Automotive
• Digital radio A/D/A
• Active suspension
• Voltage regulation
Multimedia
• Stereo audio
• Imaging
• Graphics palette
• Voltage regulation
DTAD
• Speech synthesizer
• Mixed-signal processor
Slide 4
System Considerations
Performance
Interfacing
Power
Size
Ease of Use
• Programming
• Interfacing
• Debugging
Integration
• Memory
• Peripherals
Cost
• Device cost
• System cost
• Development cost
• Time to market
Slide 5
Why Go Digital?
Digital signal processing techniques are now
so powerful that sometimes it is extremely
difficult, if not impossible, for analogue signal
processing to achieve similar performance.
Examples:
FIR filter with linear phase
Adaptive filters
Slide 6
Why Go Digital?
Analogue signal processing is achieved by
using analogue components such as:
Resistors
Capacitors
Inductors
The inherent tolerances of these components,
together with temperature and voltage changes
and mechanical vibrations, can dramatically
degrade the performance of the analogue
circuitry.
Slide 7
Why Go Digital?
With DSP it is easy to:
Change applications
Correct applications
Update applications
Additionally DSPs reduce:
Noise susceptibility
Chip count
Development time
Cost
Power consumption
Slide 8
General Introduction to DSPs
Slide 9
What Problem Are We Trying To Solve?
x → ADC → DSP → DAC → Y
Digital sampling of an analog signal (figure: amplitude A over time t)
Most DSP algorithms can be expressed as:
$$Y = \sum_{i=1}^{\mathrm{count}} a_i \, x_i$$
for (i = 0; i < count; i++) {   /* count terms, 0-based arrays */
    sum += m[i] * n[i]; }
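The sum of products above can be sketched as a small C function. This is only an illustration, not code from the lab: the name `sum_of_products` and the choice of 16-bit inputs with a wider accumulator are assumptions made here.

```c
#include <stddef.h>

/* Sum of products: a[0]*x[0] + ... + a[count-1]*x[count-1].
 * Hypothetical helper; 16-bit inputs and a wider accumulator are assumed. */
static long sum_of_products(const short *a, const short *x, size_t count)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += (long)a[i] * x[i];   /* one multiply-accumulate per term */
    }
    return sum;
}
```

Each loop iteration is one multiply-accumulate, which is the operation the following slides argue a DSP should perform in hardware.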
Slide 10
What are the typical DSP algorithms?
The Sum of Products (SOP) is the key element in
most DSP algorithms:
Algorithm: Equation
Finite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k \, x(n-k)$
Infinite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k \, x(n-k) + \sum_{k=1}^{N} b_k \, y(n-k)$
Convolution: $y(n) = \sum_{k=0}^{N} x(k) \, h(n-k)$
Discrete Fourier Transform: $X(k) = \sum_{n=0}^{N-1} x(n) \exp[-j(2\pi/N)nk]$
Discrete Cosine Transform: $F(u) = \sum_{x=0}^{N-1} c(u) \, f(x) \cos\left[\frac{\pi}{2N} u (2x+1)\right]$
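As one instance of the sum-of-products pattern, the FIR equation can be sketched in C as follows. The function name `fir_output`, the use of `double`, and the convention that samples before `x[0]` are treated as zero are assumptions made for this illustration.

```c
#include <stddef.h>

/* y(n) = sum over k = 0..M of a[k] * x[n-k].
 * Hypothetical helper; samples before x[0] are assumed to be zero. */
static double fir_output(const double *a, size_t M, const double *x, size_t n)
{
    double y = 0.0;
    for (size_t k = 0; k <= M && k <= n; k++) {
        y += a[k] * x[n - k];   /* one multiply-accumulate per tap */
    }
    return y;
}
```

With the two-tap coefficients {0.5, 0.5} this computes a moving average of the current and the previous sample.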
Slide 11
Why do we need DSP processors?
Use a DSP processor when the following
are required:
Cost saving
Smaller size
Low power consumption
Processing of many “high” frequency signals in
real-time
Use a general-purpose processor (GPP) when the following
are required:
Large memory
Advanced operating systems
Slide 12
Hardware vs. Microcode multiplication
DSP processors are optimized to perform
multiplication and addition operations.
Multiplication and addition are done in
hardware and in one cycle.
Example: 4-bit multiply (unsigned).
Hardware:
    1011
  x 1110
--------
10011010
Microcode:
    1011
  x 1110
--------
    0000     Cycle 1
   1011.     Cycle 2
  1011..     Cycle 3
 1011...     Cycle 4
--------
10011010     Cycle 5
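The microcode column corresponds to a shift-and-add loop. A minimal C sketch of that idea, assuming 4-bit unsigned operands (the function name `shift_add_mul4` is made up for this example):

```c
#include <stdint.h>

/* Multiply two 4-bit unsigned values by shift-and-add, one partial
 * product per iteration, as in the microcode column.
 * Hypothetical helper; inputs are masked to 4 bits. */
static uint8_t shift_add_mul4(uint8_t a, uint8_t b)
{
    uint8_t product = 0;
    a &= 0x0Fu;
    b &= 0x0Fu;
    for (int bit = 0; bit < 4; bit++) {
        if (b & (1u << bit)) {
            product = (uint8_t)(product + (a << bit));  /* add shifted partial product */
        }
    }
    return product;
}
```

A hardware multiplier produces the same result without the loop, which is the point of the comparison on the slide.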
Slide 13
General Purpose DSP vs. DSP in ASIC
Application Specific Integrated Circuits
(ASICs) are semiconductors designed for
dedicated functions.
The advantages and disadvantages of using
ASICs are listed below:
Advantages
• High throughput
• Lower silicon area
• Lower power consumption
• Improved reliability
• Reduction in system noise
• Low overall system cost
Disadvantages
• High investment cost
• Less flexibility
• Long time from design to market
Slide 14
Floating vs. Fixed point processors
Applications which require:
High precision
Wide dynamic range
High signal-to-noise ratio
Ease of use
→ Need a floating point processor
Drawback of floating point processors:
Higher power consumption
Usually higher cost
Usually slower than fixed-point counterparts and
larger in size
Slide 15
TMS320C6711 Architectural Overview
Slide 16
General DSP System Block Diagram
Block diagram components:
Central Processing Unit
Internal Memory
Internal Buses
Peripherals
External Memory
Slide 17
‘6711 CPU Overview
Specification
Clock Rate: 100/150 MHz → 600/900 MFLOPS
0.18-μm/5-Level Metal Process – CMOS Technology
The CPU has two datapaths with, altogether:
Four ALUs (Floating- and Fixed-Point)
Two ALUs (Fixed-Point)
Two Multipliers (Floating- and Fixed-Point)
Load-Store Architecture
2 × 16 32-Bit General-Purpose Registers
Slide 18
‘6711 CPU Overview
VelociTI → advanced very-long instruction words (VLIW)
Program Memory Width is 256 Bit
Up to eight 32-bit instructions can be executed in parallel per cycle
16-, 32- and 40-bit fixed-point operands
32- and 64-bit floating-point operands
Instruction parallelism is detected at compile time;
no data dependency checking is done in hardware.
Instruction Packing Reduces Code Size
All operations work on registers
Memory Architecture
4K-Byte L1P Program Cache (Direct Mapped)
4K-Byte L1D Data Cache (2-Way Set-Associative)
64K-Byte L2 Unified Mapped RAM/L2 Cache (Flexible Data/Program Allocation)
Slide 19
Functional Block and CPU Diagram
Slide 20
A ‘6711 Datapath
.S & .L
Arithmetic, Logical
& Branch functions
.M
Multiply, Rotation,
Bit expansion
.D
Data-addressing
Only way to access
memory
Cross path
Slide 21
Functional Units and Operations Performed
Slide 22
C6700: Instruction Set
.S Unit
ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP
.L Unit
ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP
.M Unit
MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
MPYSP, MPYDP, MPYI, MPYID
.D Unit
ADD, ADDAB (B/H/W), LDB (B/H/W), LDDW, MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
No Unit Used
NOP, IDLE
Slide 23
'C6x System Block Diagram
Program
RAM
Data Ram
Addr
Internal Buses
DMA
D (32)
EMIF
Serial Port
Host Port
Boot Load
Timers
.D1 .D2
.M1 .M2
.L1 .L2
Ext’l
Memory
- Sync
- Async
.S1 .S2
Control Regs
CPU
Pwr Down
Slide 24
‘C6000 Internal Buses
Bus: Width
Program Addr: x32
Program Data: x256
Data Addr - T1: x32
Data Data - T1: x32/64
Data Addr - T2: x32
Data Data - T2: x32/64
DMA Addr - Read: x32
DMA Data - Read: x32
DMA Addr - Write: x32
DMA Data - Write: x32
Connected blocks: PC, A regs, B regs, Internal Memory, External Memory, Peripherals, DMA
Slide 25
How are Peripherals Controlled?
Control and configuration of internal peripherals is
done by memory mapped control registers
There is a separate memory mapped register file of
control registers
Example of Timer mode control register:
Bits 31-12: Rsvd
Bit 11: TSTAT
Bit 10: INVINP
Bit 9: CLKSRC
Bit 8: C/P
Bit 7: HLD
Bit 6: GO
Bit 5: Rsvd
Bit 4: PWID
Bit 3: DATIN
Bit 2: DATOUT
Bit 1: INVOUT
Bit 0: FUNC
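Accessing such a memory-mapped register from C usually comes down to bit masks on a `volatile` pointer. The sketch below is an illustration only: the helper names are hypothetical, the two masks use the bit positions 6 (GO) and 7 (HLD) from the layout above, and the register address and the meaning of each bit have to be taken from the device documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks for two fields of the timer control register
 * (bit positions as shown on the slide; names are assumptions). */
#define TIMER_CTL_GO   (1u << 6)
#define TIMER_CTL_HLD  (1u << 7)

/* Set the bits in mask, leaving all other bits unchanged. */
static void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;
}

/* Clear the bits in mask, leaving all other bits unchanged. */
static void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg &= ~mask;
}

/* True if every bit in mask is currently set. */
static bool reg_test_bits(const volatile uint32_t *reg, uint32_t mask)
{
    return (*reg & mask) == mask;
}
```

On the target, `reg` would point at the register's address in the peripheral region of the memory map; on a PC the helpers can be tried on an ordinary variable standing in for the register.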
Slide 26
‘C6711 Memory Map
Byte Address   Contents
0000_0000      64K x 8 Internal (L2 cache)
0180_0000      On-chip Peripherals
8000_0000      External space 0: 256M x 8 External
9000_0000      External space 1: 256M x 8 External
A000_0000      External space 2: 256M x 8 External
B000_0000      External space 3: 256M x 8 External
FFFF_FFFF      End of address space
Internal Memory: Unified (data or prog); 4 blocks, each can be RAM or cache
External Memory: Async (SRAM, ROM, etc.) or Sync (SBSRAM, SDRAM)
Level 1 Cache: 4KB Program, 4KB Data; not in map
CPU, L1P (4K), L1D (4K), L2 (64K)
Slide 27
Memory Map
Byte Address   Contents
0000_0000      64KB Internal (Program or Data)
0180_0000      On-chip Periph
8000_0000      256MB External: 16MB SDRAM
9000_0000      256MB External: 128K byte FLASH
9008_0000      4 byte I/O Port (LEDs, Switches, DSK status, DSK rev#, Daughter Card)
A000_0000      256MB External: available via Daughter Card Connector
B000_0000      256MB External: available via Daughter Card Connector
FFFF_FFFF      End of address space
Slide 28
Operands
Operands can be
5-bit constants (or 16-bit in some special instructions)
32-bit Registers
40-bit Registers
64-bit Registers
A 40-bit or a 64-bit register can be obtained by
concatenating two registers
The registers must be from the same side
The lower register must be even and the upper register the next odd one; pairs are written odd:even (e.g.
A1:A0, B9:B8 or A15:A14)
The registers must be consecutive
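The pairing rules can be written down as a small check. This is a sketch under assumptions: registers on one side are represented by their numbers 0 to 15, so the "same side" rule is implied by the caller, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Check whether odd_reg:even_reg (for example 1 and 0 for A1:A0) forms a
 * valid register pair on one side of the register file.
 * Hypothetical helper; register numbers 0..15 on a single side are assumed. */
static bool is_valid_register_pair(int odd_reg, int even_reg)
{
    if (even_reg < 0 || odd_reg > 15) {
        return false;                        /* outside the register file */
    }
    return (even_reg % 2 == 0)               /* lower register is even */
        && (odd_reg == even_reg + 1);        /* registers are consecutive */
}
```

For example, the pairs 1:0, 9:8 and 15:14 pass the check, while 2:1 and 3:0 do not.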
Slide 29
Conditional execution
All instructions in each Functional Unit of both Data
paths can be executed conditionally
Only the Registers A1, A2, B0, B1, B2 can hold the
condition
Conditional execution uses the syntax
[condition] instruction or [!condition] instruction
e.g.
[!B0] ADD .L1 A1,A2,A3 ; add if B0 == 0
[B0] ADD .L1 A1,A2,A3 ; add if B0 != 0
Slide 30
Branches
Branches are required to realize loops and change
the program flow
Branches are very useful in conjunction with
conditional execution
There are two branch types supported:
Relative Branching
Absolute Branching
Slide 31
More on the Branch Instruction (1)
With this processor all instructions are encoded
in 32 bits.
Therefore the label must have a dynamic range of
less than 32 bits, as the B opcode itself also has to be
coded in the instruction word.
Instruction word (32-bit): B opcode + 21-bit relative address
Case 1:
B .S1 label
Relative branch.
Label limited to a ±2^20 offset.
Slide 32
More on the Branch Instruction (2)
By specifying a register as an operand instead of
a label, it is possible to have an absolute branch.
This allows a dynamic range of 2^32.
Instruction word (32-bit): B opcode + 5-bit register code
Case 2:
B .S2 register
Absolute branch.
Operates on .S2 ONLY!
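Whether a relative branch can reach its label is a question of whether the offset fits a signed 21-bit field. A minimal sketch of that range check follows; the function name is hypothetical, and the unit in which the offset is counted is not specified here and has to be taken from the instruction set documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if offset fits in a two's-complement field of `bits` bits,
 * i.e. -2^(bits-1) <= offset < 2^(bits-1).
 * Hypothetical helper; bits is assumed to be between 1 and 63. */
static bool fits_signed_field(int64_t offset, unsigned bits)
{
    int64_t limit = (int64_t)1 << (bits - 1);
    return offset >= -limit && offset < limit;
}
```

With `bits` set to 21 the accepted range is -2^20 to 2^20 - 1; anything outside it needs the register form of the branch.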
Slide 33
Getting Data from the Memory
All Instructions work exclusively on Registers
The .D Units in the Data-Paths are used to load and
store the required Data from and to the Memory
Load and Store Instructions use an Address
operator X:
Slide 34
Addressing Modes
There are two addressing modes supported:
Linear Addressing
Circular Addressing (e.g. Convolution)
Circular Addressing supports block sizes of 2^N
Only the lower N bits of the address are modified by address
arithmetic. This is equivalent to mod 2^N operations.
The addressing mode is selected by the control register
"AMR"
Operands for circular addressing are limited to A4-A7, B4-B7
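The effect of circular addressing can be imitated in C with a bit mask: only the low N bits of the address take part in the addition. This is a software sketch of the idea, not a description of how the hardware is configured; the function name and parameters are assumptions.

```c
#include <stdint.h>

/* Advance addr by step inside a block of 2^n bytes: the low n bits wrap
 * around, the upper bits stay unchanged.
 * Hypothetical helper; n is assumed to be smaller than 32. */
static uint32_t circular_add(uint32_t addr, int32_t step, unsigned n)
{
    uint32_t mask = ((uint32_t)1 << n) - 1u;          /* selects the low n bits */
    uint32_t low  = (addr + (uint32_t)step) & mask;   /* addition mod 2^n */
    return (addr & ~mask) | low;
}
```

For a block of 8 bytes starting at 0x1000, stepping forward by 4 from 0x1006 wraps around to 0x1002 instead of leaving the block.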
Slide 35
Floating vs. Fixed point processors
Fixed point arithmetic
16-bit (integer or fractional)
Signed or unsigned
Floating point arithmetic
32-bit single precision
64-bit double precision
Using signed and unsigned integers in
$$y(n) = \sum_{k=0}^{N-1} a(k) \, x(n-k)$$
Multiplication overflow.
Addition overflow
→ Saturate the result
→ Double precision result
→ Fractional arithmetic
e.g. If A and B are fractional then: |A x B| <= min(|A|, |B|)
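Two of the remedies listed above, saturation and fractional arithmetic, can be sketched in C. The Q15 format (16-bit values scaled by 2^15) and the function names are assumptions chosen for this illustration.

```c
#include <stdint.h>

/* 16-bit addition that clamps to the representable range instead of wrapping. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Q15 fractional multiply: the 32-bit product is scaled back by 2^15.
 * Note: >> on a negative value is implementation-defined in C; an
 * arithmetic shift is assumed here. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = ((int32_t)a * (int32_t)b) >> 15;
    if (product > INT16_MAX) product = INT16_MAX;   /* only for (-1) * (-1) */
    return (int16_t)product;
}
```

Because both operands have magnitude at most one, the fractional product needs clamping in a single corner case only, whereas integer products can overflow for many inputs.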
Slide 36
C6000 C Data Types
Type                 Size      Representation
char, signed char    8 bits    ASCII
unsigned char        8 bits    ASCII
short                16 bits   2's complement
unsigned short       16 bits   binary
int, signed int      32 bits   2's complement
unsigned int         32 bits   binary
long, signed long    40 bits   2's complement
unsigned long        40 bits   binary
enum                 32 bits   2's complement
float                32 bits   IEEE 32-bit
double               64 bits   IEEE 64-bit
long double          64 bits   IEEE 64-bit
pointers             32 bits   binary
Slide 37
Numerical Issues - Useful Tips
Multiply by 2: Use shift left
Divide by 2: Use shift right
Log2N: Use shift
Sine, Cosine, Log: Use look up tables
To convert a fractional number to hex:
Num x 2^15
Then convert to hex
e.g.: convert 0.5 to hex
0.5 x 2^15 = 16384
(16384)dec = (0x4000)hex
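The conversion rule can be sketched as a C helper. The name `to_q15`, the clamping at the ends of the range and the truncation toward zero are choices made for this example, not something the slides prescribe.

```c
#include <stdint.h>

/* Convert a fractional number in [-1, 1) to 16-bit fixed point by scaling
 * with 2^15. Hypothetical helper; out-of-range inputs are clamped and the
 * fractional remainder is truncated toward zero. */
static int16_t to_q15(double value)
{
    double scaled = value * 32768.0;          /* Num x 2^15 */
    if (scaled >= 32767.0)  return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    return (int16_t)scaled;
}
```

Calling `to_q15(0.5)` gives 16384, which is 0x4000 as in the worked example.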
Slide 38
Numerical Issues - 32-bit Multiplication
It is possible to perform 32-bit multiplication
using 16-bit multipliers.
Example: c = a x b (with 32-bit values).
a = ah:al and b = bh:bl (32 bits each, split into an upper and a lower 16-bit half)
a * b = ((ah << 16) + al) * ((bh << 16) + bl)
= [(ah * bh) << 32] + [(al * bh) << 16] +
[(ah * bl) << 16] + [al * bl]
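The decomposition can be checked with a short C sketch that builds the 64-bit product from four 16 x 16 partial products. Unsigned operands and the function name `mul32_via_16` are assumptions for this illustration; signed operands need extra care with the sign of the upper halves.

```c
#include <stdint.h>

/* 32 x 32 -> 64 bit unsigned multiply from four 16 x 16 partial products.
 * Hypothetical helper following the expansion on the slide. */
static uint64_t mul32_via_16(uint32_t a, uint32_t b)
{
    uint32_t ah = a >> 16, al = a & 0xFFFFu;
    uint32_t bh = b >> 16, bl = b & 0xFFFFu;

    uint64_t hh = (uint64_t)ah * bh;   /* weight 2^32 */
    uint64_t lh = (uint64_t)al * bh;   /* weight 2^16 */
    uint64_t hl = (uint64_t)ah * bl;   /* weight 2^16 */
    uint64_t ll = (uint64_t)al * bl;   /* weight 2^0  */

    return (hh << 32) + (lh << 16) + (hl << 16) + ll;
}
```

Each partial product fits in 32 bits, so a 16-bit multiplier with a wide enough accumulator is sufficient.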
Slide 39
Selected ‘6711 Peripherals
Slide 40
C6000 Peripherals
XB
Internal
Memory
Host μC
Host Port
PCI
External
Memory
16/32
EMIF
Internal Buses
McBSPs
.D1 .D2
.M1 .M2
.L1 .L2
EDMA
DMA
Boot Loader
.S1 .S2
Timer/Count
PLL
CPU
Slide 41
The McBSP
Multichannel Buffered Serial Port
Up to 100 Mb/sec performance
2 (or 3) full-duplex, synchronous serial-ports
Enables direct interfacing to industry standard
Codecs, Analog interface Chips and other serially
connected devices
Supports a wide range of data-sizes, including 8, 12,
16, 20, 24 and 32 bits
→ Bit, Word (channel), Frame, Phase
→ In our lab the McBSP is used to connect to the A/D,
D/A daughter card
Slide 42
PCM3003
MONOLITHIC 20-BIT ΔΣ (DELTA-SIGMA) ADC AND DAC
16-/20-BIT INPUT/OUTPUT DATA
HARDWARE CONTROL: PCM3003
STEREO ADC: SNR: 90dB & Dynamic Range: 90dB
STEREO DAC: SNR: 94dB & Dynamic Range: 94dB
Digital Attenuation (256 Steps), Soft Mute, Digital Loop Back
SAMPLING RATE: Up to 48kHz
SYSTEM CLOCK: 256fS, 384fS, 512fS
Slide 43
What is the bootloader?
[Figure: C6211/C6711 with CPU, L1P Cache, L1D Cache, DMA and Boot Config; external EPROM at addresses 0000, 0001, 0002, 0003, ...]
When the DSP is NOT powered or under
reset the internal program memory is in a
random state.
Slide 44
What is the bootloader?
[Figure: C6211/C6711 with CPU (PC=0000123), L1P Cache, L1D Cache, DMA and Boot Config; external EPROM at addresses 0000, 0001, 0002, 0003, ...]
When the DSP is powered and the CPU is taken out of
reset, the internal memory is still in a random state and
the program will start running from address zero.
Slide 45
What is the bootloader?
[Figure: C6211/C6711 with CPU, L1P Cache, L1D Cache, DMA and Boot Config; external EPROM]
With the bootloader, a portion of code can be
automatically copied from external to internal
memory.
Slide 46
Interrupts
DSPs must be able to execute tasks on
asynchronous events
Interrupts suspend the current processor task
and save its context
An interrupt service routine (ISR) is executed
After completion of the ISR, the context of the
former task is restored and execution
continues
Interrupts are organized hierarchically
→ vs. Polling
Slide 47
Interrupt and Thread Types
HWI priorities set by hardware → One ISR per interrupt
14 SWI priority levels → Multiple SWIs at each level
15 TSK priority levels → Multiple TSKs at each level
Multiple IDL functions → Continuous loop
→ HWI triggered by hardware interrupt
→ IDL runs as the background thread
Slide 48
The DSK6711 Development Kit
Slide 49
DSK Contents
Hardware
150 MHz ‘C6711 DSP
TI 16-bit A/D Converter (‘AD535)
External Memory
16M Bytes SDRAM
128K Bytes Flash ROM
LED’s
Daughter card expansion
Power Supply & Parallel Port Cable
Software
Code Generation Tools
(C Compiler, Assembler & Linker)
Code Composer Debugger
(256K program limitation)
Example Programs & S/W Utilities
Power-on Self Test
Flash Utility Program
Board Confidence Test
Host access via DLL
Sample Program(s)
Slide 50
C6711 DSK Overview
1.8V Power Supply 16M SDRAM
128K FLASH
Daughter Card I/F
(EMIF Connector)
Parallel
Port I/F
TMS320C6711
‘C6711
DSP
Power
Jack
D. Card I/F
(Periph Con.)
Power
LED
User DIP
switches
Three User LEDs
Reset
3.3V Power Supply
16-bit codec (A/D & D/A)
Emulation
JTAG Header
Line Level Input (microphone)
Line Level Output (speakers)
JTAG Header
Slide 51
Software: (4) PC → DSK Communications
CCS uses parallel port to control DSP via JTAG port
You can use full TI eXtended Dev System (XDS) via 14 pin
header connector
Communicate from Windows program (C++, VB) via parallel
port using Win32 DLL
Use HPI via Win32 DLL
[Figure: PC parallel port connected through JTAG emulation to the JTAG port of the DSP]
Slide 52
What happens to the Source-Code?