Introduction:
We have designed a novel programming language grammar formalism that is intuitive and powerful. A grammar is a set of general rewrite rules α -->β with bottomup precedences. Starting with the input string, the parser repeatedly looks for the highest precedence rule that can rewrite some substring α (using associativity to break ties), and replaces it with β.
As α may be any string of terminals or nonterminals, this formalism nicely generalizes operatorprecedence grammars. It allows one to define multifix operators, new kinds of parentheses, exceptional associativity rules, ifthenelse and forloop constructs, etc.
We have finished building a parser for the formalism. A novel feature of our parser is that it does not require a total ordering of precedences. This lets the user add new constructs rapidly, without having to think out their precedence with respect to all existing constructs. Once an expression arises where the parser cannot determine its next move, it will show that case to the user and ask for clarification of precedence.
Analysis:
Our parsing algorithm operates on a string of length n in time O(log n)
per reduction (times a small grammar constant). The total parse time is
then O(n log n) if the reduction rules have contextfree form.
This runtime is only slightly worse than the O(n) of standard LALR or operatorprecedence parsing. It is more than acceptable on modern computers, as is the requirement to store the entire string in memory.
The runtime is much faster than the O(n3) of previous work on enriching programminglanguage CFGs with precedence information [1, 5]. This is thanks to the prescribed bottomup nature of our parser. Some programming languages do have constructions that would be tedious to describe with our bottomup approach, but ultimately our approach is Turingcomplete because it allows general rewrite rules.
Tools Used:
The parser uses a graph template library
(GTL) extensively. The grammar is considered as a graph to handling
of a total ordering among the precedence levels. The parser also
used the ATT toolkit graphviz
specially the dot package of it to generate the heirarchical or layered
drawings of directed graphs.
Future Work
Adding a facility for a semantic attachments (both primitive and
deried). For example the use can define macro operators of
"priority union" and "lenient composition". (.P. and .O.) by the
semantic rules
Q .P. R > Q | [~upper(Q) .o. R]
R.O.C > [R.o.C].P.R
whose corresponding syntactic rules are determined automatically by type inference:
Expression .P. Expression > Expression
Expression .O. Expression > Expression
REFERENCES
[1] Annika Aasa. User Defined Syntax. PhD thesis, Chalmers University of Technology and University of G¨
oteborg, 1992.
[2] Jon Bentley. Programming pearls [column]. Communications of the ACM, 29(8):711–721, August 1986.
[3] Lauri Karttunen. The proper treatment of optimality in computational phonology. In Proceedings of FSMNLP’98 (International Workshop on FiniteState Methods in Natural Language Processing), pages 1–12, Bilkent University, Ankara, Turkey, July 1998.
[4] Gertjan van Noord and Dale Gerdemann. An extendible regular expression compiler for finitestate approaches in natural language processing. In Automata Implementation: 4th International Workshop on Implementing Automata (WIA ’99), Potsdam, Germany, July 17–19, 2001, Revised Papers, number 22 in Springer Lecture Notes in Computer Science, 2001.
[5] E. Visser. Syntax Definition for Language Prototyping. PhD thesis, University of Amsterdam, 1997.