REXXwish number 1


Dis-assembly made easy

One thing common to all computer languages is that they work in only one direction: you write the source version and translate it to machine code (or that step happens when you actually run an interpreted program).  Going from an executable back to the source code is, in general, not really doable.  Some people can read core dumps, but not many.  So where does REXX come in?  Consider Java and its byte-codes.  Byte-codes are not machine code but an intermediate way to specify computer instructions; one needs the Java Virtual Machine (JVM) to turn byte-code into machine code.  NetRexx creates the same byte-codes, and can use the same JVM.  But Java and NetRexx are most certainly different languages!

NetRexx is very readable and easily understandable, right?  So what do you do when you get a program written in Java, but don't know Java from American "instant coffee"?  Translate the Java into NetRexx, right?  So create the byte-code for the Java version, and "dis-assemble" it into NetRexx.  Great!  But that is not (yet) possible.  (sigh)  Mike Cowlishaw (NetRexx's author) mentioned in an email that he had also seen the possibilities of such a dis-assembler.  I got the (very vague) impression that it might be tackled seriously someday.  But don't hold your breath; we want you to read a few more issues of this newsletter first...

But let's take the dis-assembly concept a bit further.  Wouldn't REXX users benefit from being able to dis-assemble a compiled Fortran program into REXX, fix a bug or add an enhancement, recompile it, and then run it?  Without having to move heaven and earth to get hold of the source code, AND without having to rewrite the whole thing in REXX?  If we could do this, the whole Y2K scare would be history...  Or suppose some neat little utility does almost what you want, but was written in C++.  OK, compile the C++ version to byte-code, and dis-assemble that into classic REXX.  Make your changes and test-drive it, then do the REXX --> byte-code --> C++ thing, and mail the new version to the author of the program.  In the language *she* understands!

Of course making it possible for such translations from one computer language to any other is a piece of cake, something any casual programmer can do overnight with her eyes closed.  All it takes to set this up is:

  1. Create a "neutral" language, halfway between source and machine code.
  2. Write programs which translate source code to the neutral language.
  3. Write the same set of programs as in step (2), except that they translate from the neutral language back TO the source languages.
  4. Then write the neutral-language to machine-code translators.
  5. Finally, write the machine-code to neutral-language translators.
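To make steps (1) through (3) a little more concrete, here is a toy sketch in Python.  It is only an illustration under loud assumptions: the "neutral" instruction set (PUSH_VAR, PUSH_CONST, ADD, SUB, MUL, DIV, STORE) is made up for this example and is NOT Java byte-code, the only "source language" handled is a simple REXX-style assignment of an arithmetic expression, and the function names rexx_to_neutral and neutral_to_source are hypothetical.  One front end (step 2) produces neutral code; one back end (step 3) turns that same neutral code into either REXX-style or C-style source text.

```python
import re

# Step (1): a HYPOTHETICAL stack-based "neutral code" (not Java byte-code).
# Each instruction is a tuple, e.g. ("PUSH_VAR", "a") or ("ADD",).
PREC = {"ADD": 1, "SUB": 1, "MUL": 2, "DIV": 2}
SYMBOL = {"ADD": "+", "SUB": "-", "MUL": "*", "DIV": "/"}


def tokenize(expr):
    """Split an arithmetic expression into numbers, names and operators."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", expr)


def compile_expr(tokens):
    """Recursive-descent parser that emits neutral stack instructions."""
    code = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            expr()
            advance()  # skip the closing ")"
        elif tok.isdigit():
            code.append(("PUSH_CONST", int(tok)))
        else:
            code.append(("PUSH_VAR", tok))

    def term():
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            code.append(("MUL",) if op == "*" else ("DIV",))

    def expr():
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            code.append(("ADD",) if op == "+" else ("SUB",))

    expr()
    return code


def rexx_to_neutral(line):
    """Step (2): REXX-style 'target = expression' -> neutral code."""
    target, source = (part.strip() for part in line.split("=", 1))
    return compile_expr(tokenize(source)) + [("STORE", target)]


def neutral_to_source(code, style):
    """Step (3): neutral code -> 'rexx' or 'c' style assignment text."""
    stack = []  # entries are (text, precedence); 3 means an atom
    for ins in code:
        op = ins[0]
        if op == "PUSH_CONST":
            stack.append((str(ins[1]), 3))
        elif op == "PUSH_VAR":
            stack.append((ins[1], 3))
        elif op in PREC:
            (right, rp), (left, lp) = stack.pop(), stack.pop()
            p = PREC[op]
            # Add parentheses only where operator precedence requires them.
            if lp < p:
                left = f"({left})"
            if rp < p or (rp == p and op in ("SUB", "DIV")):
                right = f"({right})"
            stack.append((f"{left} {SYMBOL[op]} {right}", p))
        elif op == "STORE":
            text, _ = stack.pop()
            end = ";" if style == "c" else ""
            return f"{ins[1]} = {text}{end}"
    raise ValueError("neutral code did not end with STORE")


if __name__ == "__main__":
    neutral = rexx_to_neutral("x = (a + b) * 2")
    print(neutral_to_source(neutral, "rexx"))  # x = (a + b) * 2
    print(neutral_to_source(neutral, "c"))     # x = (a + b) * 2;
```

Note that the parentheses in the output are reconstructed from operator precedence, not copied from the input; the neutral code itself has no parentheses at all.  Real translators face the same problem on a much larger scale, plus the harder questions raised further below (variable names, comments, missing features).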

As I said, one all-nighter, including documentation.  Yeah...  Right...

For Java and NetRexx, step (1) already exists: the byte-code concept.  Step (2) has been done too; programs written in Java and in NetRexx both compile to the same byte-code format.  The Java Virtual Machine is the implementation of step (4), so we only need to implement steps (3) and (5).  Not an easy task.  But tell a programmer "it can't be done", and he'll prove you wrong before you can blink an eye!

"Neutral code" also increases portability in more than one way:

  • Pass the byte-code on, not the source code.  Others will translate back to their own preferred language.
  • Forget about machine code optimizations; the neutral-to-machine code translator for whatever machine it'll be run on takes care of that.
  • Machine-to-neutral translators will allow you to solve legacy problems much more easily.
  • It will be much harder for the evil hackers to do their thing once a compiled virus can be turned back into readable source code.

Translators from source code to "neutral code" may be easier to design than today's full-blown interpreters and compilers.  The same goes for the step from "neutral code" to machine code.  And translating from "neutral code" (a standardized halfway step by definition) to a source language should in principle also be easier than going the full route in one shot.  The same holds when going from machine code to standardized "neutral code".
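Besides making each translator simpler, a neutral layer also cuts down how many translators are needed at all.  The back-of-the-envelope count below is a rough sketch, assuming every source language should be translatable into every other one and every machine type should be reachable in both directions; the function names are hypothetical.

```python
def direct_translators(n_langs: int, n_machines: int) -> int:
    """Rough count of translators needed WITHOUT a neutral code."""
    compilers = n_langs * n_machines            # each language -> each machine
    decompilers = n_machines * n_langs          # each machine -> each language
    source_to_source = n_langs * (n_langs - 1)  # every ordered pair of languages
    return compilers + decompilers + source_to_source


def neutral_translators(n_langs: int, n_machines: int) -> int:
    """Rough count WITH a neutral code: steps (2) through (5) of the list."""
    # One translator to and one from neutral code per language (steps 2, 3),
    # and one to and one from neutral code per machine type (steps 4, 5).
    return 2 * n_langs + 2 * n_machines


if __name__ == "__main__":
    print(direct_translators(5, 3), neutral_translators(5, 3))  # 50 16
```

The gap widens quickly: under the same assumptions, 10 languages and 4 machine types need 170 direct translators but only 28 via neutral code.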

Positive sides galore!  Now a few negatives:

  • You need to update TWO translators for a major source language change.
  • How do you provide for functionality not available in the destination language?  Can this be made a non-issue?
  • Would variable names survive the two-step translation unscathed?
  • Can comments be passed on from one source language to another?  And if so, are they of any use?
  • And how about human-language translators for those comments?

Please send your thoughts, using the subject line:
   REXXwishes; "Neutral Code"
to the email address stated below.

F. Scott Ophof <FSOphof@BLCLinks.net>