Differences

This shows the differences between the revision you selected and the current version.

dsp [2016/05/29 11:00]
gongyu created
dsp [2019/04/24 14:14] (current)
gongyu
Line 3: Line 3:
 The goal of DSPs is usually to measure, filter and/or compress continuous real-world analog signals. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully, but dedicated DSPs usually have better power efficiency, so they are more suitable for portable devices such as mobile phones, where power consumption is constrained.[3] DSPs often use special memory architectures that are able to fetch multiple data and/or instructions at the same time.
  
-Overview[edit]
+#### Overview
+
+##### A typical digital processing system
  
-A typical digital processing system 
 Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a series of data samples. Signals (perhaps from audio or video sensors) are constantly converted from analog to digital, manipulated digitally, and then converted back to analog form. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable.
  
Line 12: Line 13:
 The architecture of a digital signal processor is optimized specifically for digital signal processing. Most also support some of the features of an applications processor or microcontroller, since signal processing is rarely the only task of a system. Some useful features for optimizing DSP algorithms are outlined below.
  
-Architecture[edit]
-Software architecture[edit]
+#### Architecture
+##### Software architecture
 By the standards of general-purpose processors, DSP instruction sets are often highly irregular. One implication for software architecture is that hand-optimized assembly-code routines are commonly packaged into libraries for re-use, instead of relying on advanced compiler technologies to handle essential algorithms.
  
-Instruction sets[edit]
+##### Instruction sets
 Multiply–accumulate (MAC) operations, including fused multiply–add (FMA),
 used extensively in all kinds of matrix operations
Line 29: Line 30:
 VLIW
 superscalar architecture
 +
 Specialized instructions for modulo addressing in ring buffers and bit-reversed addressing mode for FFT cross-referencing
 Digital signal processors sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.
 Multiple arithmetic units may require memory architectures to support several accesses per instruction cycle
 Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing
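
The addressing modes and multiply–accumulate operations listed above can be illustrated in software. The following is a minimal Python sketch, not a description of any particular DSP's instruction set: `bit_reverse` and `RingBufferFIR` are hypothetical names chosen for this example, and what a DSP would do in a single addressing step is written out here as explicit arithmetic.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (the FFT reordering pattern)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


class RingBufferFIR:
    """FIR filter whose delay line is a ring buffer addressed modulo its length.

    Hypothetical illustration class; floats are used for readability.
    """

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delay = [0.0] * len(self.coeffs)
        self.head = 0  # index of the most recently written sample

    def step(self, sample: float) -> float:
        n = len(self.delay)
        # Modulo addressing: advance the write pointer and wrap at the end.
        self.head = (self.head + 1) % n
        self.delay[self.head] = sample
        acc = 0.0
        for k, c in enumerate(self.coeffs):
            # One multiply-accumulate per tap; (head - k) % n steps back in time.
            acc += c * self.delay[(self.head - k) % n]
        return acc
```

For an 8-point FFT, `[bit_reverse(i, 3) for i in range(8)]` gives the order `[0, 4, 2, 6, 1, 5, 3, 7]`. With the two-tap averaging coefficients `[0.5, 0.5]`, feeding `1.0, 3.0, 5.0` through `step` yields `0.5, 2.0, 4.0`. The ring buffer never moves old samples; only the pointer advances, which is the saving that hardware modulo addressing is meant to provide.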
-Data instructions[edit]
+
+##### Data instructions
 Saturation arithmetic, in which operations that would overflow clamp at the maximum (or minimum) value the register can hold rather than wrapping around (maximum+1 stays at maximum instead of overflowing to minimum, as it would on many general-purpose CPUs). Sometimes various sticky-bit operation modes are available.
 Fixed-point arithmetic is often used to speed up arithmetic processing
 Single-cycle operations to increase the benefits of pipelining
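
To make the difference between saturation and wraparound concrete, here is a small Python sketch. It assumes 16-bit signed registers and the Q15 fixed-point format (15 fractional bits); both are common in DSP work but are assumptions of this example, not something the text above specifies. All function names are hypothetical.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def saturate16(value: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def wrap16(value: int) -> int:
    """Two's-complement wraparound to 16 bits, as on many general-purpose CPUs."""
    return ((value + 32768) % 65536) - 32768


def sat_add16(a: int, b: int) -> int:
    """Saturating 16-bit addition: overflow sticks at the range limit."""
    return saturate16(a + b)


def to_q15(x: float) -> int:
    """Convert a real number in [-1, 1) to Q15, saturating out-of-range input."""
    return saturate16(round(x * 32768))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: the product is Q30, so shift right by 15."""
    return saturate16((a * b) >> 15)
```

Here `wrap16(32767 + 1)` gives `-32768`, while `sat_add16(32767, 1)` stays at `32767`. In Q15, `to_q15(0.5)` is `16384`, and `q15_mul(16384, 16384)` is `8192`, which represents 0.25. The one product that does not fit, `q15_mul(-32768, -32768)`, saturates to `32767` instead of wrapping to a negative value.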
-Program flow[edit]
+
+##### Program flow
 Floating-point unit integrated directly into the datapath
 Pipelined architecture
 Highly parallel multiplier–accumulators (MAC units)
 Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations
-Hardware architecture[edit]
-Memory architecture[edit]
+
+##### Hardware architecture
+##### Memory architecture
 DSPs are usually optimized for streaming data and use special memory architectures that are able to fetch multiple data and/or instructions at the same time, such as the Harvard architecture or Modified von Neumann architecture, which use separate program and data memories (sometimes even concurrent access on multiple data buses).
  
Line 57: Line 62:
 Exclusion of a memory management unit
 Memory-address calculation unit
-History[edit]
+
+#### History
 Prior to the advent of the stand-alone DSP chips discussed below, most DSP applications were implemented using bit-slice processors. The AMD 2901 bit-slice chip with its family of components was a very popular choice. There were reference designs from AMD, but very often the specifics of a particular design were application-specific. These bit-slice architectures would sometimes include a peripheral multiplier chip. Examples of these multipliers were a series from TRW including the TDC1008 and TDC1010, some of which included an accumulator, providing the requisite multiply–accumulate (MAC) function.
  
Line 66: Line 72:
 In 1980 the first stand-alone, complete DSPs – the NEC µPD7720 and AT&T DSP1 – were presented at the International Solid-State Circuits Conference '80. Both processors were inspired by the research in PSTN telecommunications.
  
-The Altamira DX-1 was another early DSP, utilizing quad integer pipelines with delayed branches and branch prediction.[citation needed]
+The Altamira DX-1 was another early DSP, utilizing quad integer pipelines with delayed branches and branch prediction.
  
 Another DSP produced by Texas Instruments (TI), the TMS32010 presented in 1983, proved to be an even bigger success. It was based on the Harvard architecture, and so had separate instruction and data memory. It already had a special instruction set, with instructions like load-and-accumulate or multiply-and-accumulate. It could work on 16-bit numbers and needed 390 ns for a multiply–add operation. TI is now the market leader in general-purpose DSPs.
Line 76: Line 82:
 The fourth generation is best characterized by the changes in the instruction set and the instruction encoding/decoding. SIMD extensions were added, and VLIW and the superscalar architecture appeared. As always, clock speeds increased; a 3 ns MAC became possible.
  
-Modern DSPs
+#### Modern DSPs
 Modern signal processors yield greater performance; this is due in part to both technological and architectural advancements such as smaller design rules, fast-access two-level cache, (E)DMA circuitry and a wider bus system. Not all DSPs provide the same speed, and many kinds of signal processors exist, each better suited to a specific task, ranging in price from about US$1.50 to US$300.
  
Line 101: Line 107:
 In communications, a new breed of DSPs that fuse DSP functions with hardware acceleration is making its way into the mainstream. Such modem processors include ASOCS ModemX and CEVA's XC4000.
  
-See also[edit]
+#### See also
   * Digital signal controller
   * Graphics processing unit