The Charlotte Web Browser, part 2


The first part of this 2-part article was published in the October 1998 issue of the RexxLA Newsletter.  Carl Forde, Jonathan Scott, and Perry Ruiter answer questions posed by Scott Ophof on the making of Charlotte, the text-mode web browser for the VM world.

SCOTT:  Carl, since you now work at Beyond Software, how about some information on their web browser (and how it compares with Charlotte)?

CARL:  Beyond has a web browser called EnterpriseView ("EView") that is based on Charlotte Version 1.  But I wouldn't want this to become an "advertorial" for Beyond.

SCOTT: Jonathan, how did you get involved with Charlotte?

JONATHAN:  Charlotte version 1.2 was being used by some enthusiasts within IBM, and we decided to help enhance it to provide effective web access suitable for all CMS users.  Some of us worked initially with Carl Forde and then later with Perry Ruiter, implementing changes to support our requirements and then passing the changes back in CMS UPDATE format.  The first part of this work (including SOCKS support) was led by Tony Benedetti, then I took over in mid-1996.  We also had some useful contributions from James Johnson at Central Missouri State University (the web map support, and the initial support for tables).  Charlotte/2.1.0 was eventually made generally available in May 1997.  About two-thirds of the code in version 2 was new or totally rewritten since version 1, and there were many smaller changes too.

SCOTT: Charlotte was originally written in REXX; why?

CARL:  The original Torun Package which Charlotte is based on was written in REXX.  At the time I started working on it, I saw no reason to change that.

JONATHAN:  Charlotte is mostly written in REXX because it's so easy to write and debug, and because it is the easiest language in which to interface to CMS Pipelines, another of the wonders of the modern world!  It was probably originally written in REXX because it was based on previous similar REXX programs such as gopher clients.

PERRY:  Jonathan is right on the money here.  REXX interfaces so well with Pipelines.  Couple that with the fact that Charlotte was written not by a professional software house, but rather by working VM systems programmers as a tool for themselves and their users.  When VM folk write tools it's almost always done with REXX and Pipes.

Looking back at the original Torun code, it was very clear that Rick Troth's VM gopher client (aka Rice Gopher) was the starting point.  It's interesting to note that virtually all of the Rice client's screen handling code was lifted from a Pipeline debugger called PipeDemo, written by Chuck Boeheim of SLAC (the Stanford Linear Accelerator Center), an organization well known to REXX fans.  So you can see that Charlotte has a checkered pedigree.

SCOTT:  Some parts were changed from REXX to another language; can you tell us which parts, and why?

JONATHAN:  The HTML parsing and formatting logic necessarily contained a lot of intensive loops, and was not performing well in REXX, even when compiled.  It was also becoming messy to maintain because performance was being given priority over good program structure, and there were many cases in which input was not validated properly because it would have impacted performance too much.

SCOTT:  Sounds almost like "had we known we were going to rewrite it to a different language, then ...".

CARL:  Well, when Maciej wrote the original implementation he had no idea what would become of it.  When I rewrote it into Charlotte 1.0 I had no idea that it would become as popular as it has.  The thought never crossed my mind that (portions of) Charlotte would be ported to another language. I could do everything I needed to do and performance was far superior to the previous attempts.  The performance bottleneck of course was the parsing and formatting, which is the portion of the code that Jonathan rewrote into C.

SCOTT:  Carl, could you comment on the value of REXX as a prototyping language?

CARL:  The characterization of REXX that I like is:  REXX is such a good prototyping language, that when you're done with the prototype you put it into production.

SCOTT:  And you, Jonathan?  How did you go about rewriting REXX code in another language?  How useful was REXX to you in the sense of prototyping? Did the rewrite give you any migration problems?

JONATHAN:  After some experiments with various languages in August 1996, I wrote a new parsing and formatting program using C/370 plus CMS Pipelines.  The C code had easy access to REXX variables through CMS Pipelines, so it was easy to integrate it with the existing REXX code.  The new parser was initially written as a "batch" parser, taking an HTML file as input and producing a monospaced text file as output, with no reference at all to Charlotte.  (I originally started it out of curiosity to see what sort of algorithms were needed to parse and format HTML properly, especially tables.)  When I integrated it into Charlotte, I looked at the calling routine to determine what variables it expected to be set, but I did not look at the original REXX parser at all.

The sense in which the original would have been a "prototype" is therefore rather limited in this particular case.  It was also a rather long-lived "prototype", having survived the whole of version 1 and two beta releases of version 2.

The new program was written from scratch based on the new HTML 3.2 specification, which meant that the new program was not only much faster than the REXX program but had a lot more function.  The program originally required the C/370 run-time libraries, which caused us some problems in early beta testing, but with help from the Master Plumber, John Hartmann, I managed to convert it to use the Systems Programming C (SPC) environment, so that it looks just like Assembler as far as Pipelines is concerned.

SCOTT:  It is said that one can learn from history, and from the mistakes made by others.  Could you give an example or two of "why this routine in REXX, why that one in Pipelines", etc.?

JONATHAN:  I can't think of any simple examples.  Most of the changes were to support new function.  We did try to use REXX built-in functions for scanning rather than loops, but this tended to mean that large strings were being manipulated instead of single characters, which often made things worse.  Most of the performance changes were superseded when the parser was replaced with the C version.

REXX coding for performance is a topic in itself; the main principle that I used was to restructure the code to minimize the logic executed on the main paths.  In particular, it often helped performance to replace:
     If A & B then whatever
with:
     If A then
          If B then whatever
in cases where "A" was not usually true and "B" was a non-trivial expression.  This sort of structure is also required where "B" is only valid when "A" is true.  However, this code structure doesn't work very well when an "else" clause is required.  I often wish that REXX had short-circuit forms of the "AND" and "OR" operators.
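
To illustrate the restructuring described above, here is a minimal REXX sketch.  It is not taken from the Charlotte source: the names "line", "calls", "isTagLine" and "IsTag" are hypothetical, and "IsTag" merely stands in for a non-trivial test.  The sketch assumes, as the answer above implies, that the interpreter evaluates both operands of "&".  The last part shows one possible way to keep a single "else" branch, by recording the combined result in a flag first.

```rexx
/* Sketch only: all names here are hypothetical, not from Charlotte.   */
calls = 0
line  = 'plain text with no markup'

/* Form 1: both operands of "&" are evaluated, so IsTag() is called    */
/* even though the cheap first test is already false.                  */
if left(line, 1) = '<' & IsTag(line) then say 'tag'
say 'After form 1, IsTag was called' calls 'time(s)'

/* Form 2: nested IFs skip the expensive test when the cheap one fails */
calls = 0
if left(line, 1) = '<' then
  if IsTag(line) then say 'tag'
say 'After form 2, IsTag was called' calls 'time(s)'

/* With an "else" branch: compute the combined condition into a flag,  */
/* still calling IsTag() only when the first test succeeds.            */
isTagLine = 0
if left(line, 1) = '<' then isTagLine = IsTag(line)
if isTagLine then say 'tag'
else say 'text'
exit

/* Stand-in for a non-trivial test; counts how often it is invoked.    */
IsTag:
  calls = calls + 1
  return pos('>', arg(1)) > 0
```

The flag costs one extra assignment on every pass, so whether it pays off depends on how expensive the second test is and how rarely the first one is true.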

SCOTT:  Now for a loaded question.  Aside from using REXX, which features make Charlotte a better browser than others?

JONATHAN:  I'm not familiar with any other current browser on CMS, so I can't make direct comparisons.  Charlotte/2.1.0 is probably very much faster than any pure REXX browser, but because most of it is still in REXX we still have the flexibility to add new function quite easily when required.  Here are some other strong points:

  • Implements almost all of HTML 3.2 that a text-only browser can.
    • Includes tables, with nesting and cell width attribute support.
    • Includes client-side image maps (MAP and AREA tags).
  • Includes some useful Netscape extensions.
    • Cookies (Netscape-style and RFC 2109).
    • Access to FRAMES-only documents (via list of links).
  • Automatically uses terminal code page for data translation.
    • Ensures correct square brackets and national characters.
    • Uses box characters for table borders where available.
  • Supports SOCKS or proxy firewalls.
    • Can choose between multiple firewalls based on URL or IP address.
    • Includes support for alternate name server for "external" hosts.
  • Provides tracing of all TCP/IP requests.
    • Includes all data transmitted and received.
    • Includes name resolution and firewall selection.
  • Link to an active document is treated as a return to that level.
    • Simplifies navigation of hierarchical documents.
  • Includes a simple integrated news reader interface.
    • Keeps track of highest article read in each group.
  • Fully tailorable PF key functions.
    • Includes predefined sets for CMS, CUA and PROFS conventions.
  • User or installation can set up alternative configuration files.
    • Can include or exclude settings using command line options.

SCOTT:  And which points are not so good?

JONATHAN:  I'm not aware of much in the way of weaknesses, as we tried to eliminate them as far as possible!  The majority of limitations encountered when using Charlotte are caused by web page authors increasingly failing to cater for text-only browsers.  (Even then, Charlotte can usually get by.)  Just a couple of things come to mind:

  • Charlotte does require quite a lot of storage (virtual and DASD) to support a long web session.  This is a consequence of an early design decision to cache all documents for the current session, which had the performance advantage of eliminating having to fetch them again, and made it easy to retain changes such as forms input.  However, now that connections are faster, documents are larger and some VM/CMS sites are getting meaner with disk space, this original decision is more questionable.
  • We would have liked to be able to provide SSL encryption support, but this would be problematical for two reasons.  Firstly, there would be technical problems interfacing SSL support routines, written in C, with the existing Charlotte code which uses REXX and CMS Pipelines. Secondly, there would probably be license problems, as it would presumably be necessary to pay to use the SSL implementation, but Charlotte itself is currently free.

SCOTT:  And what would you say, Perry?

PERRY:  Jonathan hit on the bad points.  Charlotte (and REXX deserves some of the blame) is memory hungry.  That and page authors who never considered anything but their own browser (e.g., a page with nothing but images on it, and no alternate text for the images).

SCOTT:  How easy is it to use Charlotte?

PERRY:  Charlotte is fast, easy and intuitive to use.  If the pages you're interested in contain primarily textual information it can't be beat!  As a sample end user (and recent Charlotte convert) let me present my wife.  She was becoming increasingly frustrated with Netscape, so I suggested she try using Charlotte.  After a minute or so of introduction (she already had a passing familiarity with 3270s) she was using it and raving!

JONATHAN:  It's very easy to use Charlotte.  I think it's probably easier than using XEDIT.

SCOTT:  Why Pipelines Fullscreen instead of XEDIT for the user interface?

JONATHAN:  The structure of Charlotte is built around using Pipelines to allow overlapping activities to proceed in parallel, and the use of the Pipelines Fullscreen function fits in well with that design, as well as giving more direct control of the screen layout than we could achieve with XEDIT.

CARL:  For example, while waiting for more data to arrive, Charlotte can format what has already arrived.  The idea is to "keep the data moving", and here you really get the benefit of Pipelines.

PERRY:  Remember I mentioned PipeDemo, Chuck Boeheim's Pipeline debugger? Well, no plumber writing a Pipes debugger would ever consider using anything but Pipes for the screen I/O  ;-).

SCOTT:  <grin>  Say I want to write a text-mode browser for myself to run in "good old DOS", using Charlotte as example.  But my knowledge of C is zero, so I want to write it in REXX.  Any advice?

JONATHAN:  As someone who has written programs to run under PC/DOS (in macro assembler and C) I can safely say that the ONLY way that Charlotte is of any help is in showing that text-mode browsers are feasible.

SCOTT:  Oh.  In other words, "leave it to the experts"...  I'd better wrap it up.  Carl, anything on the general topic of VM's role on the Web?

CARL:  I strongly believe that VM has a very important role to play on the Web.  A VM Web server can do anything that a Unix Web server can do and more.  It is also better positioned, in that it is "closer to" the corporate data that the company wants to make available on the Web.  In many cases, everyone in the company has a VM id.  It is trivially easy for them to put up personal Web pages should they choose to do so.  CGIs are far easier to write using REXX and Pipelines than using C or Perl.

The corporate world already has a huge investment in mainframe applications.  The hard work, database design, application structure and integration into the business have already been done.  In many cases what is required is to make these applications available to people outside the company or on the LAN.  The most cost effective way to do that is to simply use the data and application knowledge that exists to write CGIs to generate the HTML that "recreates" the application on the Web.  The best place for that Web server is on the mainframe close to the data and where large parts of the application can run unchanged.  VM is ideally positioned to take advantage of this booming market.  What better corporate Web server than the one that has the data?  VM's qualities of stability, reliability and recovery are valuable assets that make it even more desirable as a network server.

SCOTT:  Thank you Carl, Perry, and Jonathan, for participating in this discussion.  The virtual beer was *excellent*!

Anyone interested in Charlotte can find it being discussed on the WWW-VM mailing list.  To join that list, just send your SUBSCRIBE command to:

LISTSERV@LISTSERV.net

This generic list server will forward it to the appropriate list server for processing.

The authors of this article are:
  Carl Forde, Carl_Forde@Beyond-Software.com, Beyond Software,
  Perry Ruiter, Perry.Ruiter@gems1.gov.bc.ca, ITSD, Province of British Columbia,
  Jonathan Scott, Jonathan_Scott@VNET.IBM.COM, IBM UK Labs, Hursley,
  F. Scott Ophof, Newsletter@RexxLA.org, RexxLA newsletter Editor

Disclaimer:  The authors of this article do not speak officially for their respective (ex-)employers in any capacity.