The PL/I programming language has many keywords, but none are actually reserved words, so you can have variables named IF, THEN and ELSE, and write statements like:
IF IF = THEN THEN THEN = ELSE; ELSE ELSE = IF;
I've long wondered how to write a scanner and parser for it.
1/
#ProgrammingLanguage design
I think the scanner could emit each keyword as its own token type, plus a "word" token for all other words, and the grammar could have productions like:
keyword = IF | THEN | ELSE | BEGIN | END | DO | DECLARE | PROCEDURE [etc.]
identifier = keyword | word
But I don't think the resulting grammar would be LALR(1), or even LR(k). I'm not even entirely sure that it is context-free.
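To make the problem concrete, here is a minimal Python sketch of a different approach: a hand-written recursive-descent parser for a tiny, hypothetical subset of PL/I (assignments, IF/THEN/ELSE, and single comparisons). PL/I uses `=` for both assignment and comparison. The scanner reserves nothing, and the parser decides from context with one extra token of lookahead: a word followed by `=` at the start of a statement is always an assignment target, whatever it is spelled. The class and function names are made up for illustration. This is not how PL/I(F) did it, and it says nothing about whether a full PL/I grammar is LR(k).

```python
import re

def tokenize(src):
    # Every word is a plain word token; no word is reserved.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=;]", src)

class Parser:
    """Toy recursive-descent parser for a hypothetical PL/I subset."""

    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self, k=0):
        i = self.pos + k
        return self.toks[i] if i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def operand(self):
        tok = self.take()
        if tok in ("=", ";"):
            raise SyntaxError(f"expected operand, got {tok!r}")
        return tok

    def expression(self):
        # operand [ '=' operand ]; inside an expression '=' is comparison.
        left = self.operand()
        if self.peek() == "=":
            self.take("=")
            return ("eq", left, self.operand())
        return left

    def statement(self):
        # Lookahead rule: WORD '=' at statement start means assignment,
        # even when the word is IF, THEN or ELSE.
        if self.peek(1) == "=":
            target = self.take()
            self.take("=")
            value = self.expression()
            self.take(";")
            return ("assign", target, value)
        if self.peek() == "IF":
            self.take("IF")
            cond = self.expression()
            # The expression ends at a word that cannot continue it,
            # so the next THEN must be the keyword.
            self.take("THEN")
            then_branch = self.statement()
            else_branch = None
            # Toy rule: ELSE is the keyword only if not followed by '='.
            if self.peek() == "ELSE" and self.peek(1) != "=":
                self.take("ELSE")
                else_branch = self.statement()
            return ("if", cond, then_branch, else_branch)
        raise SyntaxError(f"unrecognized statement at {self.peek()!r}")

def parse(src):
    p = Parser(tokenize(src))
    stmts = []
    while p.peek() is not None:
        stmts.append(p.statement())
    return stmts

print(parse("IF IF = THEN THEN THEN = ELSE; ELSE ELSE = IF;"))
```

For the example statement this yields a single IF node whose condition compares IF with THEN, whose THEN branch assigns ELSE to THEN, and whose ELSE branch assigns IF to ELSE. The rule that `ELSE =` starts a new statement rather than an ELSE clause is a choice made for this toy; whether a real PL/I compiler resolves that corner the same way is not established here.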
2/
I became curious about how early IBM PL/I compilers worked. The Bitsavers archive has scans of several revisions of the PL/I(F) compiler Program Logic Manual (PLM), [G]Y28-6800, so I spent a few hours reading the -6 revision (June 1972).
I suppose it should not surprise me that a compiler originally written in the late 1960s has an amazingly complex structure with a large number of phases that each only do a tiny part of the work, much like IBM's 63-phase FORTRAN compiler for the 1401.
4/
IBM had various other compilers, including subset, checkout, and optimizing compilers, and I have not looked at those at all.
The Multics PL/I compiler version 2 source code is available online, but I haven't yet looked at that either.
Digital Research, best known for the CP/M operating system, actually offered a PL/I compiler for 8080 CP/M systems. Before CP/M, Kildall developed a PL/M compiler for the 8008 and 8080 for Intel, originally written in FORTRAN.
6/
FORTRAN is also well known for needing parsing tricks. The following are two valid FORTRAN statements with wildly different effects:
DO I = 1,5
DO I = 1.5
The first statement, with the comma, starts a loop with iteration variable I. The second statement, with the period, is a simple assignment statement, which sets the value of a variable to 1.5. But what variable does it set?
7/
If you guessed that the second statement sets I to 1.5, you're mistaken. Instead, it sets a variable named "DOI" to 1.5, which is unlikely to be what the programmer wanted.
To understand why, you must consider three "features" of FORTRAN:
1) Spaces are insignificant in a FORTRAN program, except in the line number/comment/continuation fields, but those can't contain a statement. FORTRAN compilers generally strip all spaces early in the compilation process.
8/
As a result, "DOI", "DO I", "D OI", and "D O I" all refer to the same variable.
2) FORTRAN 66 doesn't require variables to be declared prior to use. If DOI wasn't previously declared, the second example implicitly declares it as a REAL variable, since only names beginning with I through N default to INTEGER. (IMPLICIT NONE, which makes undeclared variables an error, was a common extension in FORTRAN 77 compilers and became standard in Fortran 90.)
3) Syntactically, the comma is the only difference between the assignment statement and the DO loop statement.
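Those three features can be combined into a small statement classifier. The Python sketch below is an assumption-laden illustration, not how any historical FORTRAN compiler was written, and the function names are hypothetical. It strips blanks, treats a statement as a DO only if it starts with DO and has a comma at parenthesis depth 0 after the `=`, and otherwise treats it as an assignment whose target gets the implicit type. Standard FORTRAN 66 and 77 DO statements also carry a statement label, as in `DO 10 I = 1,5`, so the sketch accepts an optional label.

```python
import re

def implicit_type(name):
    # Feature 2: undeclared names starting with I-N are INTEGER, others REAL.
    return "INTEGER" if name[0] in "IJKLMN" else "REAL"

def has_top_level_comma_after_eq(s):
    # Feature 3: look for a comma outside parentheses, after the '='.
    depth = 0
    seen_eq = False
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and ch == "=":
            seen_eq = True
        elif depth == 0 and ch == "," and seen_eq:
            return True
    return False

def classify(stmt):
    # Feature 1: blanks are insignificant, so strip them all first.
    s = stmt.replace(" ", "").upper()
    if s.startswith("DO") and has_top_level_comma_after_eq(s):
        m = re.fullmatch(r"DO(\d*)([A-Z][A-Z0-9]*)=(.+)", s)
        if m is None:
            raise SyntaxError(f"malformed DO statement: {stmt!r}")
        label, var, bounds = m.groups()
        return ("DO", label or None, var, bounds)
    if "=" not in s:
        return ("OTHER", s)
    target, _, value = s.partition("=")
    m = re.match(r"[A-Z][A-Z0-9]*", target)
    if m is None:
        raise SyntaxError(f"bad assignment target: {stmt!r}")
    name = m.group(0)
    return ("ASSIGN", name, implicit_type(name), value)

print(classify("DO I = 1,5"))
print(classify("DO I = 1.5"))
```

Note that the comma has to be at depth 0: `DOI = A(1,2)` contains a comma, but only inside the subscript, so it is still an assignment to DOI.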
9/