%% -*- mode: prolog -*-

%% This file implements OT directional constraint operators and
%% a small grammar of directional constraints for syllabification.
%%
%% Your job is to uncomment and complete all definitions commented as "%...%".
%% For each definition commented as "%...%", you should delete "%...%"
%% and fill in the part of the definition that is currently written
%% as "...".

%% General notes that will make your life easier:
%%
%% - It's easiest to uncomment and complete one definition at a time.
%%   Test each definition using the GUI until you're sure it's working right.
%% - Sometimes the comments give examples that will help your testing.
%% - I suggest working from the top of this file toward the bottom,
%%   filling in the higher definitions first.
%% - Make sure to read the other macros and comments so that you
%%   understand what's going on!  Trying some of the other macros
%%   out in the GUI is a very good idea.
%%
%% - Remember that http://www.cs.jhu.edu/~jason/405/software.html
%%   has a quick reference to FSA Utilities, and links to the manual.
%% - If you have trouble with FSA Utilities, feel free to write to me.
%% - If you have trouble and figure it out, feel free to write to the
%%   class mailing list.
%%
%% - To start the FSA Utilities so that they read this file (otdir.plg):
%%      fsa regex_cache=on -tk -aux otdir.plg
%% - To reread this file into the FSA Utilities after you've edited it:
%%      use the ReconsultAux command on the File menu.
%% - If you get a syntax error when reading the file:
%%      The error message mentions the line number, a description of
%%      the error, and an internal description of the line.
%%
%% - Don't leave out "regex_cache=on" in the command line above!  This
%%   makes the FSA Utilities remember the machine it constructs for any
%%   subexpression in case you use that subexpression again.  This
%%   speeds up interactive use considerably!  You could also try
%%   controlling the cache explicitly so that only certain subexpressions
%%   are cached: see section 3.55 of the FSA manual.
%%
%% - To define a macro in the file you are now reading:
%%      macro(name, def)
%% - To define a macro with arguments:
%%      macro(name(X,Y), def that mentions X and Y)
%%   The arguments must be Prolog variables, i.e., they must start
%%   with capital letters.
%% - To make your macro cache X and Y for speed if they are mentioned multiple
%%   times in the definition (see section 3.57 of FSA manual):
%%      macro(name(X,Y), pragma([X,Y], def that repeatedly mentions X and Y))
%%
%% - To type expressions into the Regex field of the GUI:
%%      Remember to quote capital letters: 'A'
%%      Remember to quote or escape special symbols: '*' or \* or star
%% - To type expressions into the String field of the GUI:
%%      No quoting is necessary here.
%% - To examine the output of a transducer T on an input string S:
%%      If you enter T into the Regex field and S into the String
%%      field, the possible outputs are printed in your terminal
%%      window.  If there are infinitely many possible outputs, they
%%      are printed using a special notation for cycles (see section
%%      15.49 of the FSA manual).  The notation |3+| means that the
%%      previous 3 characters could be repeated one or more times.
%%
%%      Another option is to type range(S o T) into the Regex field
%%      and look at the result graphically.
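%%
%% - Example macros illustrating the tips above.  These are a hypothetical
%%   sketch: the ex_* names are illustrative only and are not part of the
%%   assignment.  They rely on deeptosurf, elim, and star, which are
%%   defined further down in this file.
%%      Entering ex_surf_abc in the Regex field should display an acceptor
%%      for the single surface string ABC.
%%      Entering elim(star) in the Regex field and a*b in the String
%%      field should yield the single output ab.

% range(S o T) with S = the deep string abc and T = deeptosurf*, which
% maps each deep segment to its corresponding surface segment.
macro(ex_surf_abc, range([a,b,c] o deeptosurf*)).

% A macro whose argument X is mentioned twice, so pragma/2 caches X.
% For example, ex_between(star) accepts strings that begin and end
% with a star, such as *ab*.
macro(ex_between(X), pragma([X], [X, ?*, X])).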
%%
%% - To understand the labels in a graphically displayed automaton:
%%      a:b                    read a, output b
%%      []:b                   insert b (i.e., read empty string [], output b)
%%      b:[]                   delete b
%%      b                      copy b from input to output
%%      `b                     copy any character other than b from input to output
%%      b..e                   copy any character b,c,d,e from input to output
%%      `b..e                  copy any character other than b,c,d,e from input to output
%%      b:{a,b}                read input b, output either a or b
%%      b:[a,b]                read input b, output a and then b
%%      $@(b..e):[a,$@(b..e)]  read any character in b..e, output a and
%%                             then THAT CHARACTER ($@ is a backreference)
%%                             Example: t_determinize([[]:a,b..e])
%%   If there are multiple arcs from state i to state j, the
%%   displayer combines them.  For example, try the expression
%%      intr(surf)
%%   The resulting arc is labeled with both
%%      `{A..E,[,]}     copy any character other than a capital letter or bracket
%%      []:{A..E,[,]}   insert any capital letter or bracket

%% --------------------
%% The following lines allow export of the macros in this file
%% for use in other Prolog files.

:- multifile macro/2.
:- multifile rx/2.

%% --------------------
%% Define the classes of symbols we will use.

%% corrpair tells us which deep segments correspond to which surface segments.
%% Deep segments are lowercase letters, surface segments are capital letters.
%% There are other surface symbols that are not segments: the brackets [].

macro(corrpair, {[a,'A'],[b,'B'],[c,'C'],[d,'D'],[e,'E']}).
macro(deeptosurf, t_minimize([?, [] x ?] o corrpair o [? x [], ?])).

macro(deepseg, domain(deeptosurf)).                  % deep vs. surf segments.
macro(surfseg, range(deeptosurf)).

macro(surfvowel, surfseg & {'A','E','I','O','U'}).   % vowels vs. consonants
macro(surfcons, surfseg - surfvowel).

macro(lsyl,'[').                                     % syllable boundaries
macro(rsyl,']').

macro(deep, deepseg).                                % all deep symbols.
macro(surf, {surfseg,lsyl,rsyl}).                    % all surface symbols.  Disjoint from deep.

macro(star, '*').                                    % stars

%% --------------------
%% Some "traditional" utility transducers.
%%
%% - elim(C) is a transducer that removes from its input all single
%%   characters matching C.
%%
%% - intr(C) is a transducer that freely introduces single characters
%%   matching C into a string with no such characters.  (This is
%%   traditionally called intro(C), but that name is already in use
%%   by replace.pl.)
%%
%% - ignore(E,C) matches any string that would match E if we ignored
%%   all characters matching C in the string.

macro(intr(C), {(? - C), [] x (C & ?)}* ).
macro(ignore(E,C), range(E o intr(C))).
macro(elim(C), inverse(intr(C))).

%% --------------------
%% Define the OT generator gen.  This inserts surface symbols.  It
%% includes a hard constraint that left and right brackets [] be
%% matched up properly and not nested.

macro(gen, deep* o intr(surf) o matchedbrackets).
macro(matchedbrackets, ignore([lsyl,rsyl]* , ? - {lsyl,rsyl})).

%% --------------------
%% Some macros that are very useful in describing constraints.
%%
%% - addstarwhere(L,R) is a transducer that applies to an unstarred
%%   string and adds a single star at every position surrounded by L,R
%%   (i.e., preceded by a substring matching L and followed by a
%%   substring matching R).
%%
%% - delstarwhere(L,R) is similarly a transducer for removing such stars.
%%
%% - constraint(Lif,Rif,Lthen,Rthen) specifies a constraint that requires
%%   every position surrounded by Lif, Rif to also be surrounded by
%%   Lthen, Rthen.  It inserts a star at any position that violates this
%%   property.
%%
%% - surfconstraint(Lif,Rif,Lthen,Rthen) is a front-end to constraint()
%%   that is very useful for defining surface constraints.  Surface
%%   constraints (as defined in the problem set) only evaluate the
%%   surface material.  For example, surfconstraint('A','B',{},{})
%%   is violated at any surface position that would have A to the left
%%   and B to the right if we ignore the deep segments.
%%   Example: It maps aABcCaAabB to aA*BcCaA*abB, inserting stars "in
%%   the same two places" as if it were merely mapping ABCAB to A*BCA*B.

% addstarwhere() and delstarwhere() are implemented inefficiently here
% in terms of the replace() operator we discussed in class.  We load
% that operator from a file.

:- ensure_loaded('~rflorian/software/lib/fsa/GerdemannVannoord99/eacl99').

macro(addstarwhere(L,R), replace([]:star, ignore(L,star), R)).
macro(delstarwhere(L,R), replace(star:[], ignore(L,star), ignore(R,star))).

%...% macro(constraint(Lif,Rif,Lthen,Rthen), ...).       % hint: use addstarwhere
%...% macro(surfconstraint(Lif,Rif,Lthen,Rthen), ...).   % hint: use constraint

%% --------------------
%% Now let's define some actual constraints!

%% - nodel ("No deletion"): every deep segment is immediately followed by
%%   the corresponding surface segment.  Example: aAbcC --> aA*bcC
%%   That is, deep abc mapping to surface AC gets a * for deleting b.
%% - noins ("No insertion"): every surface segment is immediately preceded
%%   by the corresponding deep segment.  Example: aABcC --> aAB*cC
%%   That is, deep ac mapping to surface ABC gets a * for inserting B.
%% - syllabify: every surface segment is inside a syllable.
%%   Example: [aAbB]cCD[eE]E --> [aAbB]cC*D*[eE]E*
%% - onset: every syllable starts with a consonant.
%%   Example: [eEbBaAdDeE][cCdD] --> [*eEbBaAdDeE][cCdD]
%% - nocomplex: two surface consonants may not be next to each other.
%%   That is, they must be separated by a surface vowel or
%%   a syllable boundary.
%%   Example: [eEbBaAdDdD][cC] --> [eEbBaAdD*dD][cC]
%% - nucleus: every syllable contains a surface vowel.
%%   Example: [eEbBaAdDeE][cCdD] --> [eEbBaAdDeE][*cCdD]
%% - singlenuc: any two surface vowels must be separated by a
%%   syllable boundary (and perhaps some other material too).
%%   This says that a syllable cannot contain more than one
%%   surface vowel (nor, for that matter, can any substring
%%   that is *not* in a syllable).
%%   Example: [eEbBaAdDeE][cCdD] --> [eE*bBaA*dDeE][cCdD]
%%   This example gets one star per extra vowel.

macro(nodel, constraint([],deepseg,[],corrpair)).
%...% macro(noins, constraint(...)).
macro(syllabify, surfconstraint(surfseg,[],ignore(lsyl,surfseg),[])).
%...% macro(onset, surfconstraint(...)).
%...% macro(nocomplex, surfconstraint(...)).
macro(nucleus, surfconstraint(lsyl,[],[],ignore(surfvowel,surfcons))).
%...% macro(singlenuc, surfconstraint(...)).

%% --------------------
%% Define two new left-associative infix operators od and do.
%% (These correspond to o> and