How do you feel about this article? Help us to provide better content for you.
Thank you! Your feedback has been received.
There was a problem submitting your feedback, please try again later.
What do you think of this article?
FIGnition: an Arduino-format single-board computer, based on the ATmega328 MCU and programmed in Forth that just needs a video monitor and power supply to work. It's an excellent tool for education, but my own designs for a Forth computer were aimed at the embedded-applications development engineer.
I began my project to create a version of the vintage computer language Forth for a modern microcontroller, the Microchip dsPIC33, several years ago. My starting point was LUT-Forth, code I wrote in 1982 for the popular Zilog Z80 microprocessor as used in the Sinclair ZX81 and Spectrum at the time. At first glance, it seems like comparing apples with oranges to set an old technology 8-bit microprocessor (MPU) up against a modern 16-bit microcontroller (MCU). The MCU is of course loaded with on-chip peripheral hardware such as sophisticated timer/counters, serial bus controllers and the like. The inclusion of these functions together with integral non-volatile program memory and data SRAM are what turned a basic MPU into an MCU. Modern MPUs have lots of extra hardware too, principally Memory Management Units (MMU) and Graphics Processor Units (GPU) to reflect their use in general-purpose computers running operating systems like Windows or Linux.
Both my Forth compiler/interpreters were designed to generate code for embedded applications. In 1982 the Z80 was ubiquitous, being found inside the first ‘home’ computers and, before the IBM PC arrived, in desktop/laboratory machines with 8in. floppy disk drives and an operating system called CP/M. Microcontroller chips were a bit thin on the ground – apart from the Intel 8051 – so any embedded project needed memory, serial and/or parallel I/O interface chips and peripheral devices such as analogue-digital converters (ADCs and DACs).
The Forth programming language was invented because:
- The mainframe high-level language compilers such as FORTRAN, ALGOL and COBOL produced code that just wouldn’t fit into the maximum 64KB memory space of the early MPU chips. The compilers were certainly far too big to run on a micro and you needed a mainframe or minicomputer to compile your programs into executable machine code.
- A BASIC interpreter would just about fit into the available memory space along with the user’s source program and didn’t need an operating system, but it ran far too slowly and was only really suitable for education. That’s why all those early home computers ran BASIC!
- Fast operation is needed for hardware control or embedded applications.
A Forth implementation contains both interpreter and compiler, and yet takes up only a few Kbytes of program space. Like BASIC, it doesn’t need an operating system to be present. Unlike BASIC, it has a compiler that converts the source code to a linked-list executing independently of the interpreter. As a result, it’s a whole lot faster. Forth is also fast because it’s simple and the source code which it has to interpret or compile-then-run is structured to suit a computer brain, not a human one. It all comes down to the order of the operators and operands in each line of source code.
Get your operators and operands in the right order
The traditional way of recording a series of numerical operations is in the form of an equation, with placeholders for actual numbers called variables. So, we write down an equation for the addition of two numbers in this form:
A = B + C where A will be assigned the result of adding the contents of B and C.
This is a mathematical expression because the operands (B and C) are variables, as is the result A. It becomes arithmetic when actual numbers are substituted in place of the variable names. It’s also an example of infix notation because the operator (+) is placed between the two operands. Programming languages like BASIC and Python both use this convention.
Nearly all pocket calculators use it too, albeit with numbers not variables. For example:
6 x 4 = Pressing the ‘=’ key causes the operation (x) to occur and assigns the result to the display.
The infix convention is great for human brains because that nice ‘equals’ sign makes it easy for us to follow what’s happening. The problem is that a computer wastes valuable time converting a command line in human-friendly infix syntax to one it can ‘understand’. Rearranging the order of operators and operands removes this unnecessary overhead:
- The ‘equals’ sign becomes redundant and
- Parentheses to ensure operations are executed in the right order are also redundant.
So, first up is a prefix or ‘Polish’ notation, where the operator comes before the operands like this:
B + C becomes + B C where the + operator ‘knows’ to wait for the two operands B and C before executing. In computer terms, + and B are ‘pushed’ onto a Last-In Last-Out (LIFO) stack until C is input, then the operation is carried out leaving the result as the top item on the stack. No equals sign is needed. Another more complex example:
(B – C) x D becomes x – B C D Note how the operator x is ‘stacked’ until it has two operands to work on, which only occurs when – B C has yielded a result. No parentheses are needed, at the expense of slightly incomprehensible syntax!
Finally, we get to postfix or ‘Reverse Polish’ notation (RPN), where amazingly, the operands come before the operator:
(B – C) x D becomes B C – D x or even D B C – x Both made possible by that LIFO stack. This time though, operands are pushed onto the stack until an operator is input which then works on the last two stack entries. So, B – C is executed first and the result left on top of the stack, followed by (result of B – C) x D. Easy isn’t it? Postfix is better than Infix or Prefix because operators do not have to be saved temporarily on the stack.
The bottom line is: the extra effort involved in using a programming syntax that the processor more readily ‘understands’ pays dividends in terms of speed and efficiency. In other words, feed it data and instructions in the right order to avoid wasting time with temporary storage. That’s what RPN does for you.
Back in the early 1980s versions were being written unofficially for just about every microprocessor in existence – and their variants. Even now, adherents to the Forth cause are churning out code for the latest devices including those with ARM Cortex-M processor cores. In 1982 a set of ‘benchmark’ programs was devised: sixteen short pieces of code for testing the execution speed of basic Forth instructions. Each one was embedded in a loop which repeated up to 100,000 times so that a stopwatch could be used for timing! The Forth source code for these benchmarks can be found under Downloads below. Benchmarking FORTHdsPIC is made a lot easier and more accurate by using one of the dsPIC33’s on-chip interrupt-driven 32-bit timers. The Forth code for the updated benchmark set can be downloaded from the project page.
Each test repeated 10,000 times for stopwatch timing (!)
The comparison table shows just how much faster the 70MHz dsPIC33E is than the 4MHz Z80A. No surprise there then. But there are other factors:
- The Z80 is only an 8-bit machine. However, it does possess some instructions that treat its six 8-bit working registers as three 16-bit register pairs.
- The Z80 chip is an Intel 8080 ‘on steroids’, with more registers and a significantly enhanced instruction set. LUT-Forth only uses the 8080 instructions and register set, so it would probably run a lot faster if the full range of Z80 features had been used!
- Most dsPIC33 instructions take a single 70MHz clock cycle to execute (with the exception of BRAnches and CALLs). All Z80 instructions require multiple 4MHz clock cycles.
- The dsPIC33 is in a different league when it comes to the Single-Precision and Double-Precision arithmetic timings. That’s because it has multiply and divide hardware with single-cycle, signed and unsigned mixed-precision multiply instructions.
- The dsPIC33 is very much a stack-oriented processor: any Forth implementation benefits from its multiple hardware stacks and an addressing mode that allows most instructions to access them directly.
Forth has always been considered fast and compact in its use of memory, making it ideal for small microprocessors running embedded applications. Its use of RPN puts a lot of people off because it’s seen as ‘difficult’. But Forth code is considered to be very reliable and less likely to contain bugs. This may be due to the programmer taking greater care when generating the code. Being able to test out new word definitions in interpreter mode before compiling them also helps. Code reliability and speed may explain why so many NASA and ESA robotic spacecraft have been programmed in Forth, often running on a special radiation-hardened microcontroller, the Harris/Intersil RTX2010.
The Z80 MPU lives on in the form of the Z84C00xx, the Z80180xx and the eZ80F91 .
If you're stuck for something to do, follow my posts on Twitter. I link to interesting articles on new electronics and related technologies, retweeting posts I spot about robots, space exploration and other issues.