"I think I understand how the FORMAT builtin function works"

In seven years as chairman of the Rexx standardizing committee, and a few before that chairing the committee that policed IBM's Systems Application Architecture for Rexx, I have found that when the subject of the FORMAT builtin function comes up it is always greeted by groans from the assembled experts. So when the subject came up at a recent meeting and I made the (foolish? reckless? boastful?) statement that provides the title for this article, it turned into a challenge to write an account for this website. The account is in two parts, one about using FORMAT, and one about implementing it.

What FORMAT does is:

- Turn the first argument into a Rexx numeric value.
- Use that value in conjunction with the fourth and fifth arguments to decide whether the result will be in exponential notation (aka floating point) or non-exponential notation (aka plain).
- Take the value arranged in the chosen notation and round it, as necessary, according to the third argument.
- Add leading blanks (and maybe a sign) to the number (and maybe leading zeros on its exponent) to lay out the number with the second argument matching the number of characters before the decimal point, the third argument matching the number of characters after the decimal point, and (if relevant) the fourth argument matching the number of digits in the exponent.
In the text above I have assumed the Classic Rexx syntax, in which the five arguments are, in order, Input, Before, After, ExponentDigits and TestDigits.

The first step, turning Input into a Rexx numeric value (in effect evaluating Input+0), has practical effects. Some variations on the input, like leading zeros, are removed. Whether the input was "001" or "+1E-0", the input becomes "1" before anything else happens. (Note though that "1.00" does not become "1", since that might lose some implication about the accuracy of the number.) Also it is possible for the number to be changed to floating point notation and for some least-significant digits to be lost. (To keep more than the default of nine digits overall you will need to make use of the NUMERIC DIGITS feature.) All this is not peculiar to FORMAT, but is characteristic of Rexx decimal arithmetic; that topic is worthy of an article in itself.

The next step in FORMAT is to decide between plain and floating point for the result. If you want to be certain of plain notation you should set the ExponentDigits argument to zero. If you happen to know that the values in your program are always in a range that won't lead to floating then you will get away with omitting the ExponentDigits argument, but the certain way is to use an ExponentDigits of zero; obviously if you say there are no character positions in the result for an exponent then FORMAT cannot produce an exponent.

On the other hand, if you want to be certain of getting a floating point result, set the TestDigits argument to zero. If you set TestDigits greater than zero then the Decimal Arithmetic algorithm (with the TestDigits value in the role of the NUMERIC DIGITS value) will come into play in deciding which numbers get floated. Rather than think through this algorithm, it will usually make sense just to try some values of TestDigits, increasing the number if FORMAT seems too eager to use floating point, according to your taste.

If floating is decided on then the form of floating notation (ENGINEERING or SCIENTIFIC) will be determined by the current value of NUMERIC FORM.
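A few calls may make the Input+0 step and the floating-or-plain decision more concrete. This is only a sketch: it assumes the defaults of NUMERIC DIGITS 9 and NUMERIC FORM SCIENTIFIC, and the results in the comments are those given for these calls in the usual Rexx reference documentation, so a particular implementation may still differ.

```rexx
/* Sketch only: assumes NUMERIC DIGITS 9 and NUMERIC FORM SCIENTIFIC */

/* The initial Input+0 step */
say '001' + 0                   /* 1                               */
say '1.00' + 0                  /* 1.00  (trailing zeros are kept) */

/* ExponentDigits = 0: plain notation is guaranteed */
say FORMAT('1234567e5',,3,0)    /* 123456700000.000 */

/* TestDigits = 0: floating, unless the exponent would be zero */
say FORMAT('12345.73',,3,,0)    /* 1.235E+4 */
say FORMAT('1.234573',,3,,0)    /* 1.235    */

/* TestDigits > 0: the number floats only if it needs more places */
say FORMAT('12345.73',,,3,6)    /* 12345.73     */
say FORMAT('12345.73',,,2,2)    /* 1.234573E+04 */
```

In the last two calls the integer part needs five places, so a TestDigits of 6 leaves the number plain while a TestDigits of 2 floats it.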
By default the form is SCIENTIFIC, with just one digit to the left of the decimal point. Use ENGINEERING notation when you want all exponent values to be multiples of three, relating for example to milliseconds/microseconds/nanoseconds... Remember that the instruction

Having decided on floating or plain, the result of FORMAT can be constructed. By giving values for Before, After, and ExponentDigits you can be sure what the layout will be, so your program can produce columns of numbers with the numbers aligned.

If ExponentDigits is zero there will be a total of Before+After+(After>0) characters in the result, for example " 35.76" if Before=3 and After=2. The minus sign, when needed, counts as a Before character. The After value can never turn out too small because it has been used to round out of existence any further digits. The Before value can turn out to be too small to allow a large number to be fitted in. When this happens the digits are not simply lost; an error condition is raised instead. (Depending on the Rexx product you are using this may be the ubiquitous "SYNTAX 40" or something more friendly and specific.)

When ExponentDigits is not zero the layout will provide room for the letter 'E' and the exponent sign, so the length of the result is consistently Before+After+(After>0)+ExponentDigits+2. For example "-35.76E+06" if Before=3, After=2 and ExponentDigits=2 (with NUMERIC FORM ENGINEERING in effect). An exponent of zero, eg E+00, is never shown. To keep the length of the result consistent, the character positions where it would appear are filled with ExponentDigits+2 blanks.

The meaning of omitting the Before, After, or ExponentDigits arguments is "as few as are needed", so the length of the result can vary with the value of the input. That is not good for aligning columns but good when using the result as a word in a prose message.

If you have set ExponentDigits to zero you should omit TestDigits; it is not an error to give TestDigits in that case, but it can't have any effect.
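The length rules can be checked with a short program. This is a sketch under the assumption of default NUMERIC DIGITS 9 and NUMERIC FORM SCIENTIFIC; the sample values and the variable names are made up for illustration, and the brackets are there only to make the blanks visible.

```rexx
/* Sketch only: assumes NUMERIC DIGITS 9 and NUMERIC FORM SCIENTIFIC */

/* Plain layout: Before=4, After=2, ExponentDigits=0,
   so every result is 4+2+1 = 7 characters long */
values = '35.756 -0.5 1234 7'
do i = 1 to words(values)
  say '[' || FORMAT(word(values, i), 4, 2, 0) || ']'
end
/* Expected:
   [  35.76]
   [  -0.50]
   [1234.00]
   [   7.00]
*/

/* Floating layout: Before=3, After=2, ExponentDigits=2, TestDigits=0,
   so every result is 3+2+1+2+2 = 10 characters long */
say '[' || FORMAT(-357600000, 3, 2, 2, 0) || ']'   /* [ -3.58E+08] */
say '[' || FORMAT(1.5, 3, 2, 2, 0) || ']'          /* [  1.50    ] */
```

The last call shows the zero exponent being replaced by ExponentDigits+2 = 4 blanks. Trailing blanks are easy to overlook, so it is worth running a check like this on the Rexx product you actually use.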
If ExponentDigits is not set to zero, omitting TestDigits is the same as leaving the floating-or-plain decision as it was made by the initial Input+0 step.

FORMAT does not have the repertoire of formatting that is found in languages which use an "editing mask". If you want to retain exponents of zero, you will have to program the overlaying of the blanks that FORMAT produces. Retention of leading zeros cannot be done with FORMAT alone. If only a range of positive integers is to be handled, something like RIGHT(1000+input,3) will work, but in general FORMAT followed by CHANGESTR will be needed (and more for negative numbers). Retaining the '+' sign even when it isn't needed requires careful programming because, even with fixed sizes for the components of the number, the position of the leading sign is not fixed. Putting in the commas so that -100000000 appears as -100,000,000 is best done with a subroutine.

Perhaps when the plans for this website to be a code repository have progressed, somebody will provide a corpus of fragments and routines sufficiently tried and tested to be the natural source of formatting variations for Rexx.

NetRexx has some differences in FORMAT, but they are unlikely to bother you.

In the initial step, Input+0, NetRexx FORMAT never makes a floating point number out of a plain one. (Other Rexx family members will do so if the plain number is sufficiently large.) So the NetRexx equivalent of FORMAT(1234567891,1) will fail because it needs more than one character ahead of the decimal point, rather than showing 1.23456789E+9. (But this is a strange use of FORMAT anyway.)

In NetRexx omitting TestDigits means "never float", the same as setting ExponentDigits to zero. This will make a difference from other family members when the result of the Input+0 step is floating and ExponentDigits is non-zero. (But if you wanted "never float" you would have set ExponentDigits to zero anyway.)
NetRexx allows an extra argument on FORMAT so that you can specify the ENGINEERING-or-SCIENTIFIC choice for this particular use of the FORMAT method.
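Returning to the layouts that Classic Rexx FORMAT cannot produce alone, here is a minimal sketch of two of the workarounds mentioned earlier: leading zeros for small positive integers, and a comma-inserting subroutine. The routine name commas and its variable names are hypothetical, it assumes a plain (non-exponential) number that fits within NUMERIC DIGITS, and it is not a tried and tested library routine.

```rexx
/* Leading zeros for positive integers below 1000 */
say RIGHT(1000 + 7, 3)       /* 007 */
say RIGHT(7, 3, '0')         /* 007, using the pad argument of RIGHT */

/* Commas in the integer part */
say commas(-100000000)       /* -100,000,000 */
say commas(1234.5)           /* 1,234.5      */
exit

/* Hypothetical helper: expects a number in plain notation */
commas: procedure
  parse arg number
  minus = ''
  if left(number, 1) == '-' then do
    minus = '-'                      /* set the sign aside */
    number = substr(number, 2)
  end
  parse var number integer '.' fraction
  /* work leftwards from the units, inserting after every third digit */
  do position = length(integer) - 3 to 1 by -3
    integer = insert(',', integer, position)
  end
  if fraction \== '' then integer = integer || '.' || fraction
  return minus || integer
```

Setting the sign aside before inserting the commas avoids results such as "-,100" for a negative number whose digit count is a multiple of three.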
The author of the book "The REXX language" would agree that the FORMAT specification there lacks the completeness and clarity that characterizes most of the book. Starting with the definition there we need to note:

- Taken literally, the phrase "If the number of places needed for the integer or decimal part exceeds
- The rounding implied by After must be done after the decision whether to use exponential notation, otherwise we would not know the position of the point and hence not know what digit to round.
- Working out the exponent value and position of the point (which can be different according to ENGINEERING or SCIENTIFIC notation) before rounding is necessary but not sufficient. If rounding changes the number of digits before the point then the correct exponent has to be worked out again.

The other advice for implementors is "Beware, FORMAT is difficult". In October 98 there was known to be an error in the published ANSI specification, and different errors in at least three implementations. (Fortunately errors like missing trailing blanks, or misnormalizing such as a 10E-1 result, rarely get noticed - or maybe users do not complain because they think they must have misunderstood the FORMAT specification!)

Brian Marks, Formcroft@compuserve.com