Getting into Digital Signal Processing Part 4: The DSP
Part 1, Part 2 and Part 3 of this series on digital signal processing examined the processes involved in converting analogue signals to a digital format and back again. Now let’s look at the digital hardware in the middle. But before we do, here are two examples of typical tasks that show just how simple the DSP-specific hardware really is.
FIR digital filter
The Finite Impulse Response (FIR) digital filter is an algorithm that can be implemented in either hardware or software (Fig.1). Here we have a ‘2-pole’ filter, so-called because it uses two previous input samples, x(n-1) and x(n-2), from the ADC in addition to the current sample x(n) to produce an output sample y(n). The maths consists of three multiplications by constants (tap-gains) a0, a1 and a2, followed by two additions. If these operations can be completed within one sample interval, then the output will be a filtered sample stream ready for conversion back to analogue form by the DAC – in real time.
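In software, the whole thing reduces to a shift, three multiplies and two adds per sample. Here is a minimal Python sketch (the function name and tap values are mine, purely for illustration, not a designed filter):

```python
def fir_filter(samples, taps):
    """y(n) = a0*x(n) + a1*x(n-1) + a2*x(n-2) + ... for each input sample."""
    history = [0.0] * len(taps)              # x(n), x(n-1), x(n-2), ...
    output = []
    for x in samples:
        history = [x] + history[:-1]         # shift the newest sample in
        y = sum(a * xk for a, xk in zip(taps, history))  # multiply-accumulate
        output.append(y)
    return output

# Illustrative tap-gains only: a crude weighted moving average (low-pass)
print(fir_filter([4.0, 4.0, 4.0, 4.0], [0.5, 0.25, 0.25]))   # → [2.0, 3.0, 4.0, 4.0]
```

Note that the inner `sum` line is exactly the multiply-accumulate operation we will meet again below.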
There is still the problem of deciding how many ‘poles’ are needed and of finding the values of the corresponding tap-gains. I hope to give a detailed design example in a future article, but for now, just consider the basic principles for a low-pass filter. As with analogue filters, the steeper the ‘roll-off’ required, the more poles you need. A ‘brick-wall’ filter, with an instantaneous transition from passband to stopband, would be nice, but that needs an infinite number of poles, which is obviously impractical in either analogue or digital form. However, a software design featuring more than 50 poles is perfectly possible on a modern digital signal processor chip, depending on the frequencies involved. You could build such a filter out of discrete digital logic hardware, but it would become bulky and expensive for anything bigger than the 2-pole filter shown here.
The values of those tap-gain multipliers are obtained from the impulse response of that brick-wall filter. As I’ve said, I’ll go into the practical detail in a future article, but if you want to know more in the meantime, here is a good link with actual code examples. The main takeaway here is that a digital signal processing application, such as a digital filter, does not require complex mathematical functions: just multiply and add. In many cases, floating-point maths is not required either and a resolution as low as 8 bits may provide sufficient accuracy.
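To give a flavour of what that future article will cover: one common approach is the windowed-sinc design, where the tap-gains are samples of the ideal brick-wall filter’s impulse response, truncated and smoothed by a window. The sketch below is my own illustration of that idea (the parameter values are arbitrary), not the method from the linked code:

```python
import math

def sinc_taps(num_taps, cutoff):
    """Tap-gains sampled from the impulse response of an ideal (brick-wall)
    low-pass filter: a windowed-sinc design.
    cutoff is the normalised cutoff frequency (0..0.5 of the sample rate)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        # Ideal low-pass impulse response: sin(2*pi*fc*t) / (pi*t)
        h = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # A Hamming window tames the ripple caused by truncating the response
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    return taps

taps = sinc_taps(21, 0.1)   # 21 tap-gains for a cutoff at 10% of sample rate
```

Note that only multiplies and adds are needed at run time; the transcendental functions appear once, at design time.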
The perceptron
The perceptron is the smallest ‘unit’ of artificial intelligence: a single-layer neural network roughly equivalent to a biological neuron and its synapses (Fig.2). It can have any number of binary inputs, but only one binary output.
The maths performed this time is pretty much the same as that of the FIR filter: input data is multiplied by constants and the results are all added together, or accumulated. The activation function returns a binary output: y = 0 if a < 0, or y = 1 if a ≥ 0, where a is the accumulated sum. The perceptron can only distinguish two classes of objects, but it was the first attempt to produce an artificial neural network (ANN), back in 1958. It may not do much, but it does have a primitive ability to learn, deriving the values of those weight constants itself while being trained. Some years later it was realised that networks made of multiple layers of perceptrons were capable of learning complex tasks, and so deserved to be classified as ‘artificial intelligence’.
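The multiply-accumulate core and the learning rule fit in a few lines of Python. This is my own minimal sketch, assuming the classic perceptron update (weight += learning-rate × error × input); the names and the AND-gate training set are illustrative:

```python
def predict(weights, bias, inputs):
    # Multiply-accumulate, then the threshold activation function
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if a >= 0 else 0

def train(samples, epochs=10, rate=1):
    """Classic perceptron learning rule: nudge each weight by the output
    error times its input. samples is a list of (inputs, target) pairs."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Train on a 2-input AND gate: linearly separable, so the perceptron can learn it
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
print([predict(w, b, inputs) for inputs, _ in data])   # → [0, 0, 0, 1]
```

Training an XOR gate instead would fail, whatever the number of epochs: that is exactly the two-class limitation mentioned above.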
Choosing the right hardware
These two very different examples of DSP applications make use of the same hardware building block: the multiply-accumulator, or MAC. Early microprocessors and microcontrollers didn’t even have hardware multiply instructions, so any real-time DSP code would only work in very slow applications! The first microprocessor with a dedicated MAC and an instruction set to match was the Texas Instruments TMS32010, launched back in 1982. It was the first DSP chip and proved to be a real game-changer. For those interested in the history of technology, here is a paper describing how the chip came to be developed.
A modern DSP device will have extra features aimed at speeding up execution of algorithms such as those above. Anything that reduces the number of instructions executed within the sampling interval increases the maximum sampling rate available, and hence the upper limit on the input’s analogue frequency. DSPs often provide extra hardware for this purpose:
- A hardware loop counter eliminates the need for code which increments a register and triggers an exit when the terminal count is reached.
- A very long accumulator register to cope with the growth in word length as products are summed. For instance, Microchip’s 16-bit dsPIC33 has two 40-bit accumulators.
- A MAC instruction will, in one clock cycle, perform an XY multiply, add the result to the accumulator, and update the pointers to the RAM blocks containing X and Y.
- Numerical overflow (for example, when a signed addition of two positive numbers yields a negative result) can be a problem with a long sequence of MAC instructions. DSPs usually detect this and saturate the result at the maximum valid positive or negative number as appropriate. In the analogue world this is called ‘clipping’: the final output signal is distorted, but not reduced to random rubbish.
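Saturating accumulation is easy to picture in a short sketch. This is my own Python illustration, assuming a 40-bit signed accumulator for concreteness (the operand values are invented):

```python
ACC_MAX = 2**39 - 1          # largest positive 40-bit signed value
ACC_MIN = -2**39             # most negative 40-bit signed value

def mac_saturating(acc, x, y):
    """Multiply-accumulate with saturation ('clipping') on overflow."""
    acc += x * y
    if acc > ACC_MAX:
        return ACC_MAX       # clip instead of wrapping round to a negative value
    if acc < ACC_MIN:
        return ACC_MIN
    return acc

acc = 0
for _ in range(1000):
    acc = mac_saturating(acc, 2**20, 2**20)   # each product is 2**40: overflow
print(acc == ACC_MAX)   # → True: the accumulator clips at full scale
```

Without the saturation test, the accumulator would wrap round and the output would be the ‘random rubbish’ mentioned above rather than a merely clipped signal.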
So, for example, the ‘core’ assembly code for a 21-tap FIR filter on a dsPIC33 might look like this:
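The original listing is not reproduced here, but based on Microchip’s documented REPEAT and MAC instruction forms it would look something like the sketch below. The register assignments (W4/W5 for operands, W8/W10 as pointers) are illustrative, following the usual convention of holding the sample delay line in X data memory and the tap-gains in Y data memory:

```asm
    clr    A, [W8]+=2, W4, [W10]+=2, W5        ; clear ACCA, prefetch first pair
    repeat #20                                 ; execute the next instruction 21 times
    mac    W4*W5, A, [W8]+=2, W4, [W10]+=2, W5 ; ACCA += W4*W5, prefetch next pair
```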
The REPEAT instruction sets up a hardware loop counter which causes the MAC instruction to execute 21 times. In other words, the whole FIR cycle between samples takes just 21 processor clock cycles to execute, plus of course the overhead of loading a data sample from the ADC and sending the output sample to the DAC.
DSP hardware and instruction sets can be found in many ‘ordinary’ microcontrollers nowadays thanks to the widespread use of the ARM Cortex-M series of processor cores. Any device based on a Cortex-M4 or -M7 core will have DSP functionality.
When one DSP is not enough
Basic real-time signal processing functions like filtering and correlation (locating a particular data pattern within a continuous transmission) are easily handled by a single DSP. Large-scale neural-network-based AI running at video speed is quite another thing: a deep-learning inference engine needs a large array of DSPs operating in parallel for real-time object classification from a video camera. Fortunately, we’ve established that the essential core of a DSP consists of a multiplier followed by an adder and an accumulator register. Many field-programmable gate arrays (FPGAs) now provide large numbers of blocks labelled ‘DSP’ which contain just this core logic. See my recent article on FPGAs.
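Correlation, incidentally, is the same MAC operation yet again: slide the known pattern along the incoming signal and multiply-accumulate at each offset, then look for the peak. A minimal sketch (the signal and pattern values are invented for illustration):

```python
def correlate(signal, pattern):
    """Cross-correlation scores for each offset of pattern within signal."""
    scores = []
    for offset in range(len(signal) - len(pattern) + 1):
        acc = 0
        for p, s in zip(pattern, signal[offset:]):
            acc += p * s                 # the same multiply-accumulate again
        scores.append(acc)
    return scores

signal  = [0, 0, 1, -1, 1, 0, 0]
pattern = [1, -1, 1]
scores = correlate(signal, pattern)
best = scores.index(max(scores))
print(best)   # → 2: the pattern starts at offset 2
```

Each offset’s score is independent of the others, which is precisely why the job parallelises so well across an array of MAC blocks.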
The mathematical principles behind digital signal processing go back many decades. Early computers could confirm the validity of the algorithms, but not in real-time, which made them something of an academic novelty. Now we have the hardware technology to put them to work in embedded systems.
If you're stuck for something to do, follow my posts on Twitter. I link to interesting articles on new electronics and related technologies, retweeting posts I spot about robots, space exploration, and other issues.