CRX - Bits and Pieces

A Complex Instruction Set Computer

Coding in assembly language allows a choice between the compact slow way of doing things and the expansive fast way. Your typical "C" compiler cannot make these choices because there is no way to tell it which parts of your program you want done compactly and which parts need to be fast. The choice of ways is most likely to exist if the hardware has a "complex instruction set" rather than a "reduced instruction set".

Here are some examples from the complex Intel 486 instruction set. They are taken from an assembly listing with minimal editing.

Even something as primitive as a move between registers may be a choice. Some registers can be exchanged using a one byte operation so it can be right to us an exchange even when only the transfer in one direction is useful - saves a byte, costs a couple of cycles.

    0000   1   8B C3        mov ax,bx
    0002   3   93           xchg ax,bx

The Jump-if-CX-Zero instruction looks to be a very useful two byte instruction but there is a price - it takes twice the time of the four byte equivalent sequence.

    0003  8,5  E3 28        jcxz SomeLabel
    0005   1   0B C9        or cx,cx
    0007  3,1  74 24        jz SomeLabel

Apart from one pair of similar segment registers, all the registers of the 486 have unique characteristics. Sometimes this makes an easy task out of choosing which register to hold a variable in. Sometimes care is needed to gather in the potential gains. Here, just one of the instructions gains from the use of the AX register rather than BX.

    0009   1   B8 012C      mov ax,300
    000C   1   BB 012C      mov bx,300
    000F   1   05 012C      add ax,300
    0012   1   81 C3 012C   add bx,300
    0016   1   83 C0 03     add ax,3
    0019   1   83 C3 03     add bx,3

Much of the functionality of the 486 comes from the use of prefix bytes on instructions. Here is one with a prefix to say that the register used as index is unusual in that role, a prefix to say that 32 bits are to be operated on although the mode is 16 bits, and a prefix to select a segment register for the addressing. Thirteen bytes was the longest instruction that I could generate and it is worth noting that the assembler expects the "pipelining" of the hardware to be good enough to perform this instruction in one cycle time.

    001C   1   67& 66| 2E: C7 80 mov Place[eax],1
                 000000F5 R

The use of long and short branches has always been a fertile ground for space saving trickery. Here we have an instruction that looks the same as the two-byte jump instruction above but compiles to a four byte instruction; the assembler makes this choice because the target of the jump is further away. There is an algorithm for making a perfect job of this choice. (Just changing from four to two bytes on instructions individually where they have close enough targets is not a perfect solution.)

    0029  3,1  0F 84 00CC   jz SomeLabelx
    002D                   SomeLabel:
    002D    00C8 [          byte 200 dup(?)
    00F5 00000000          Place dword ?
    00F9                   SomeLabelx:

However, the potential does not end there. A long jump can be replaced by a short jump to an instruction which unconditionally jumps to the target. It can even be replaced by a jump to a conditional jump if the conditional jump is suitable. Some compilers of a decade ago would offer this sort of compaction of the code; I have not seen an assembler that does. It would actually improve the code of the CRX implementation noticeably, since that code is heavily splattered with branches to the one place where SYNTAX errors are handled.

A third interpreter loop

CRX uses two interpreter loops, essentially to restrict the overheads of tracing to the times when tracing is enabled. There is a case for a third interpreter loop, used when executing Built-in functions. The REXX Standard contains large amounts of coding in REXX, used to define the meanings of Built-in functions and of the ADDRESS instruction etc. So a plausible way of implementing such features in a REXX processor would be to take the code from the Standard, compile it to pseudo-code, and incorporate the pseudo-code in the REXX processor. When the user's program called for a Builtin function to be executed the REXX processor would execute that pseudo-code.

The interpreter loop for that particular executing could be a specialized one. It could make the assumption that there were no errors in the code from the Standard. For easy reading the code in the Standard uses many different variable names but it can easily be recast by a utility program to use the same names for different variables when the uses of the variables do not overlap. If this reduced the number of names to less than 128 the "third interpreter" could use a pseudo-code format with one-byte operands.

Not all the REXX code in the Standard would be suitable for this treatment, because of performance implications, there is much that would be. For example the DATE and TIME routines could well be handled this way, reducing the risk of implementor error in transcribing them from the Standard into assembler code.

Brian Marks