The PL/I programming language has many keywords, but none are actually reserved words, so you can have variables named IF, THEN and ELSE, and write statements like:

IF IF = THEN THEN THEN = ELSE; ELSE ELSE = IF;

I've long wondered how to write a scanner and parser for it.
1/
#ProgrammingLanguage design

I think the scanner could return each keyword as its own token and a "word" token for all other words, and the grammar could have productions like:

keyword = IF | THEN | ELSE | BEGIN | END | DO | DECLARE | PROCEDURE [etc.]
identifier = keyword | word

But I don't think the resulting grammar would be LALR(1), or even LR(k). I'm not entirely sure that it is even context free.
2/
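As a rough illustration, here is a toy recursive-descent parser in Python for a tiny hypothetical subset of PL/I: only IF/THEN/ELSE and assignment, bare words as operands, and `=` serving as both assignment and comparison. It is not how PL/I(F) works, and the names `tokenize`, `Parser`, and `parse` are made up for this sketch. Its one trick is to peek one token past the first word of a statement: if that token is `=`, the statement is an assignment, even when the word is spelled IF, THEN, or ELSE.

```python
import re

# Toy tokenizer: words (upper-cased so keyword checks ignore case, an
# assumption of this sketch) plus the two punctuation tokens '=' and ';'.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_][A-Za-z0-9_]*)|(=|;))")


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1).upper() if m.group(1) else m.group(2))
        pos = m.end()
    return tokens


def is_word(tok):
    return tok is not None and tok not in ("=", ";")


class Parser:
    def __init__(self, tokens):
        self.toks = tokens
        self.i = 0

    def peek(self, k=0):
        j = self.i + k
        return self.toks[j] if j < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def statement(self):
        # A word immediately followed by '=' starts an assignment, even if
        # the word is spelled like a keyword; no keyword is reserved.
        if is_word(self.peek()) and self.peek(1) == "=":
            target = self.take()
            self.take("=")
            value = self.expr()  # a second '=' is parsed as a comparison
            self.take(";")
            return ("assign", target, value)
        if self.peek() == "IF":
            self.take("IF")
            cond = self.expr()
            self.take("THEN")
            then_part = self.statement()
            else_part = None
            # "ELSE = ..." is an assignment to a variable named ELSE,
            # not an ELSE clause, so check the token after ELSE too.
            if self.peek() == "ELSE" and self.peek(1) != "=":
                self.take("ELSE")
                else_part = self.statement()
            return ("if", cond, then_part, else_part)
        raise SyntaxError(f"unknown statement starting at {self.peek()!r}")

    def expr(self):
        # operand ('=' operand)*, where '=' means comparison here
        node = self.operand()
        while self.peek() == "=":
            self.take("=")
            node = ("eq", node, self.operand())
        return node

    def operand(self):
        tok = self.take()
        if not is_word(tok):
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return tok


def parse(src):
    p = Parser(tokenize(src))
    stmts = []
    while p.peek() is not None:
        stmts.append(p.statement())
    return stmts


print(parse("IF IF = THEN THEN THEN = ELSE; ELSE ELSE = IF;"))
```

With this rule the example statement parses as an IF whose condition compares IF with THEN, whose THEN unit assigns ELSE to THEN, and whose ELSE unit assigns IF to ELSE. The one-token peek falls apart quickly in fuller PL/I, though: `IF(I) = 0;` is an assignment to an element of an array named IF, while `IF (I) = 0 THEN X = 1;` is an IF statement, and a parser has to look arbitrarily far ahead (past the whole parenthesized subscript and right-hand side) before it can tell which one it is reading.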

I suppose I could try writing the grammar based on either the ANSI X3.53-1976/INCITS 53-1976/ECMA 50 standard or the IBM language spec ([G]Y33-6003), and see what Bison thinks of it. If it is context free but not LR(k), Bison might accept it in GLR mode.
3/

I became curious about how early IBM PL/I compilers worked. The Bitsavers archive has scans of several revisions of the PL/I(F) compiler Program Logic Manual (PLM), [G]Y28-6800, so I spent a few hours reading the -6 revision (June 1972).
I suppose it should not surprise me that a compiler originally written in the late 1960s has an amazingly complex structure with a large number of phases that each only do a tiny part of the work, much like IBM's 63-phase FORTRAN compiler for the 1401.
4/

Anyhow, reading the PL/I(F) PLM didn't give me much overall insight into how it recognizes statements in the face of so much apparent syntactic ambiguity. A parser built with modern compiler techniques would almost certainly need a huge amount of backtracking, but PL/I(F) seems to use little or none.
5/

IBM had various other compilers, including subset, checkout, and optimizing compilers, and I have not looked at those at all.
The Multics PL/I compiler version 2 source code is available online, but I haven't yet looked at that either.
Digital Research, best known for the CP/M operating system, actually offered a PL/I compiler for 8080 CP/M systems. Before CP/M, Gary Kildall developed a PL/M compiler for the 8008 and 8080 for Intel, originally written in FORTRAN.
6/

FORTRAN is also well known for needing parsing tricks. The following are two valid FORTRAN statements with wildly different effects:

DO I = 1,5
DO I = 1.5

The first statement, with the comma, starts a loop, with the iteration variable I. The second statement, with the period, is a simple assignment statement, which sets the value of a variable to 1.5. But what variable does it set?
7/

If you guessed that the second statement sets I to 1.5, you're mistaken. Instead, it sets a variable named "DOI" to 1.5, which is unlikely to be what the programmer wanted.
To understand why, you must consider three "features" of FORTRAN:

1) Spaces are insignificant in a FORTRAN program, except in the line number/comment/continuation fields, but those can't contain a statement. FORTRAN compilers generally strip all spaces early in the compilation process.
8/

As a result, "DOI", "DO I", "D OI", and "D O I" all refer to the same variable.

2) FORTRAN 66 doesn't require variables to be declared prior to use. If DOI wasn't previously declared, the second example implicitly declares it as a REAL variable, since names beginning with I through N default to INTEGER and all others to REAL. (IMPLICIT NONE, a statement that disallows implicit variable declaration, was a common extension in FORTRAN 77 compilers and became standard in Fortran 90.)

3) Syntactically, the comma is the only difference between the assignment statement and the DO loop statement.
9/
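The usual way around this can be sketched in a few lines of Python. This is an assumption about the general technique, not a description of any particular compiler, and the function names are hypothetical: delete the blanks, then treat a statement beginning with `DO` as a loop only if a comma appears after the `=` outside any parentheses; otherwise it is an assignment whose left-hand side includes the letters `DO`. For simplicity it ignores Hollerith constants, where blanks do matter, and it accepts the labeled form (`DO 10 I = 1,5`) as well as the label-less form used above.

```python
import re

# DO, optional statement label, loop variable, '=', then the loop limits.
DO_HEAD = re.compile(r"DO(\d*)([A-Z][A-Z0-9]*)=(.+)")


def has_top_level_comma(text):
    """True if text contains a comma that is not inside parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False


def classify(statement):
    # Blanks are insignificant (outside Hollerith constants, which this
    # sketch ignores), so remove them before looking at anything else.
    s = statement.replace(" ", "").upper()
    m = DO_HEAD.fullmatch(s)
    if m and has_top_level_comma(m.group(3)):
        label, var, limits = m.groups()
        return ("do", label or None, var, limits)
    lhs, eq, rhs = s.partition("=")
    if eq:
        return ("assign", lhs, rhs)
    return ("other", s)


print(classify("DO I = 1,5"))  # ('do', None, 'I', '1,5')
print(classify("DO I = 1.5"))  # ('assign', 'DOI', '1.5')
```

The parenthesis check matters because a comma inside a subscript, as in `DOI = A(1,2)`, doesn't make the statement a loop.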

I bring this up only to illustrate that this sort of programming language issue was known when PL/I was designed, but obviously not considered serious enough to warrant using reserved words.

However, PL/I has these problems to a much greater extent because, like ALGOL 60 and the C family of languages, it is a free-form language, with nested statements that have their own delimiters, such as the THEN clause of an IF statement.
10/


@brouhaha I ran into a similar problem with the 'chain' command and tape drive syntax assuming drives were named either 'a#' or 'b#'. It was fun. The IBMer was furious that a mere mortal found a flaw in their FORTRAN compiler.
