User Extensible Syntax for Little Languages 

Introduction:

Many applications furnish specialpurpose languages for specifying behavior or data [2]. Users often want to extend these “little languages” with new operators and constructions. Some programming languages (Prolog, C++, Lisp) do allow the definition of new operators or the redefinition of old ones. However, the arity, syntax, and precedence of these operators is either fixed in advance or must be specified in gory detail by the user. We have built a more flexible tool for parsing extensible programming languages. The user can define new operators and other constructions on the fly, in a surprisingly simple and flexible way.
This work will be used as part of a finitestate learning toolkit.  Extended weighted regular expressions constitute a powerful “little language” for specifying finitestate probability models. This language is frequently extended with new macro operators for both local and global use—as in the previous (nonlearning) toolkit of [4], which uses Prolog to define new operators.

We have designed a novel programming language grammar formalism that is intuitive and powerful. A grammar is a set of general rewrite rules α -->β with bottomup precedences. Starting with the input string, the parser repeatedly looks for the highest precedence rule that can rewrite some substring α (using associativity to break ties), and replaces it with β.

As α may be any string of terminals or nonterminals, this formalism nicely generalizes operatorprecedence grammars. It allows one to define multifix operators, new kinds of parentheses, exceptional associativity rules, ifthenelse and forloop constructs, etc.

We have finished building a parser for the formalism. A novel feature of our parser is that it does not require a total ordering of precedences. This lets the user add new constructs rapidly, without having to think out their precedence with respect to all existing constructs. Once an expression arises where the parser cannot determine its next move, it will show that case to the user and ask for clarification of precedence.

Analysis:
Our parsing algorithm operates on a string of length n in time O(log n) per reduction (times a small grammar constant). The total parse time is then O(n log n) if the reduction rules have contextfree form.

This runtime is only slightly worse than the O(n) of standard LALR or operatorprecedence parsing. It is more than acceptable on modern computers, as is the requirement to store the entire string in memory.

The runtime is much faster than the O(n3) of previous work on enriching programminglanguage CFGs with precedence information [1, 5]. This is thanks to the prescribed bottomup nature of our parser. Some programming languages do have constructions that would be tedious to describe with our bottomup approach, but ultimately our approach is Turingcomplete because it allows general rewrite rules.

Tools Used:
The parser uses a graph template library (GTL) extensively. The grammar is considered as a graph to handling of a total ordering among the  precedence levels. The parser also used the ATT toolkit graphviz specially the dot package of it to generate the heirarchical or layered drawings of directed graphs.


Future Work
Adding a facility for a semantic attachments (both primitive and deried). For example  the use can define macro operators of "priority union" and "lenient composition". (.P. and .O.) by  the semantic rules

Q .P. R > Q | [~upper(Q) .o. R]

R.O.C > [R.o.C].P.R

whose corresponding syntactic rules are determined automatically by type inference:

Expression .P. Expression > Expression

Expression .O. Expression > Expression

REFERENCES

[1] Annika Aasa. User Defined Syntax. PhD thesis, Chalmers University of Technology and University of G¨

oteborg, 1992.

[2] Jon Bentley. Programming pearls [column]. Communications of the ACM, 29(8):711–721, August 1986.

[3] Lauri Karttunen. The proper treatment of optimality in computational phonology. In Proceedings of FSMNLP’98 (International Workshop on FiniteState Methods in Natural Language Processing), pages 1–12, Bilkent University, Ankara, Turkey, July 1998.

[4] Gertjan van Noord and Dale Gerdemann. An extendible regular expression compiler for finitestate approaches in natural language processing. In Automata Implementation: 4th International Workshop on Implementing Automata (WIA ’99), Potsdam, Germany, July 17–19, 2001, Revised Papers, number 22 in Springer Lecture Notes in Computer Science, 2001.

[5] E. Visser. Syntax Definition for Language Prototyping. PhD thesis, University of Amsterdam, 1997.