600.105 M&M's - Homework
Due Monday, November 15, 2010

OVERVIEW:
The goal of this homework is to help you understand one sample task in natural language processing and information retrieval, namely morphological analysis, and to give you some experience with regular expressions in the process. You may also gain a better understanding of your own language or a language you are studying.

TO DO:
Using the following morphological description of the inflectional morphology of English as a guide, either:
  (a) write a similar description of the inflectional morphology of another language of your choice, or
  (b) write a similar description of the derivational morphology of English.

NOTE:
Inflectional morphology includes things such as verb tense (e.g. -ed, -ing), noun plurals (e.g. -s), and adjective comparatives and superlatives (e.g. -er, -est), plus such things as noun/adjective gender and case. Inflectional morphology doesn't change the basic part of speech of a word (e.g. verbs stay verbs).

Derivational morphology typically involves changing the part of speech of a word, such as employ(verb) to employee(N), employer(N), and employment(N), or global(adjective) to globalize(verb), but it can also include noun to noun (e.g. Boston -> Bostonian, Maryland -> Marylander, Baltimore -> Baltimorean).

In general, you should assume that transformations are based on the endings of words (e.g. y:ies means s/y$/ies/ as in cry:cries). Also, assume that if multiple patterns match, the one lower in the list applies. Multiple matches in the same block will all apply.

Be creative: if you want or need to extend the syntax of these rules to handle your language, feel free to do so, but ideally document your extensions.
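To make the two conventions above concrete (suffix rewriting and "the lower one applies"), here is a minimal Python sketch. It is an illustration, not part of the assignment: the names `apply_suffix_rule` and `inflect` are hypothetical, and it assumes that "multiple matches in the same block will all apply" means every matching rule in the winning block yields its own output form.

```python
def apply_suffix_rule(word, rule):
    """Apply one 'old:new' suffix rule, e.g. 'y:ies' behaves like s/y$/ies/.

    Returns the rewritten word, or None if the word does not end in 'old'.
    An empty 'old' (as in ':s') matches every word and simply appends 'new'.
    """
    old, new = rule.split(":", 1)
    if not word.endswith(old):
        return None
    return word[: len(word) - len(old)] + new


def inflect(word, blocks):
    """Return the outputs of the lowest block that has a matching rule.

    'blocks' is a list of rule lists in top-to-bottom order. Assumption:
    all matching rules inside the winning block contribute one output each.
    """
    for block in reversed(blocks):  # later (lower) blocks take precedence
        outputs = []
        for rule in block:
            result = apply_suffix_rule(word, rule)
            if result is not None:
                outputs.append(result)
        if outputs:
            return outputs
    return []


# Three noun-plural blocks: default, spelling rules, irregular forms.
PLURAL_BLOCKS = [
    [":s"],
    ["y:ies", "ch:ches", "x:xes", "o:oes", "o:os"],
    ["child:children", "mouse:mice"],
]

print(inflect("cat", PLURAL_BLOCKS))
print(inflect("cry", PLURAL_BLOCKS))
print(inflect("photo", PLURAL_BLOCKS))
print(inflect("child", PLURAL_BLOCKS))
```

Note that a word such as "photo" produces two candidates under this reading, because both o:oes and o:os sit in the same block; choosing between them is outside what the rules themselves specify.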
===========================================================
ABBREVIATIONS:
{V}=aeiou
{C}=bcdfghjklmnpqrstvwxyz
## you can also use abbreviations to handle
## accented characters with aliases, to make
## rule writing simpler, for example:
N=ñ
E=é
-----------------------------------------------------------
NN:NNS  :s

NN:NNS  y:ies                                        # cry:cries
"       ch:ches                                      # church:churches
"       o:oes, o:os                                  # :potatoes, :photos
"       x:xes                                        # box:boxes
"       ([sz]):($1)es                                # class:classes
"       ([{V}])([sz]):($1)($2)($2)es                 # bus:busses
"       us:ora                                       # corpus:corpora
"       is:es                                        # axis:axes
"       man:men                                      # :tradesmen, :walkmans
"       us:i                                         # alumnus:alumni
"       a:ae                                         # alumna:alumnae
"       ix:ices                                      # matrix:matrices
"       ex:ices                                      # vertex:vertices
"       um:a                                         # medium:media
"       ma:mata                                      # stigma:stigmata
"       on:a                                         # phenomenon:phenomena
"       to:ti                                        # libretto:libretti
"       eau:eaux                                     # bureau:bureaux

NN:NNS  child:children
"       mouse:mice
"       louse:lice
"       ox:oxen

VB:VBZ  :s

VB:VBZ  y:ies
"       ch:ches
"       o:oes, o:os                                  # :potatoes, :photos
"       x:xes
"       ([sz]):($1)es                                # class:classes
"       ([{V}])([sz]):($1)($2)($2)es                 # bus:busses

VB:VBG  :ing

VB:VBG  e:ing                                        # :paging
"       ([{V}])([bdfgklmnprstvz]):($1)($2)($2)ing    # :bussing
"       ([{V}])c:($1)cking                           # :picnicking

VB:VBD  :ed

VB:VBD  e:ed                                         # :paged
"       y:ied                                        # :cried
"       ([{V}])([bdfgklmnprstvz]):($1)($2)($2)ed     # :bussed
"       ([{V}])c:($1)cked                            # :picnicked

VBD:VBN :

VB:VBN  :en
## (there will always be a VBD analysis; so slot is filled
## how do we know what expected frequency to use?)
## notation about stem changing productivity/irregularity?

VB:VBN  e:en                                         # :taken
"       y:ien                                        # ?
"       ([{V}])([bdfgklmnprstvz]):($1)($2)($2)en     # ?
"       ([{V}])c:($1)cken                            # ?

JJ:JJR  :er

JJ:JJR  e:er                                         # :bluer
"       y:ier                                        # :drier
"       ([{V}])([bdfgklmnprstvz]):($1)($2)($2)er     # :bigger
"       ([{V}])c:($1)cker                            # ?

JJ:JJT  :est

JJ:JJT  e:est                                        # :bluest
"       y:iest                                       # :driest
"       ([{V}])([bdfgklmnprstvz]):($1)($2)($2)est    # :biggest
"       ([{V}])c:($1)ckest                           # ?

## EXAMPLE DERIVATIONAL RULES

VB:NN(result-of)  :ment                              # employ:employment
VB:NN(agent)      :er, :or                           # employ:employer
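The rules that use abbreviations and capture groups, such as ([{V}])([sz]):($1)($2)($2)es, can be read as ordinary regular-expression substitutions anchored at the end of the word. The following Python sketch shows one possible translation. The function names `compile_rule` and `apply_rule` are hypothetical, and the sketch assumes that the left-hand side is a regular expression once {V} and {C} are expanded and that ($n) on the right-hand side refers to the n-th capture group.

```python
import re

# Abbreviations copied from the ABBREVIATIONS block of the rule listing.
ABBREVIATIONS = {"V": "aeiou", "C": "bcdfghjklmnpqrstvwxyz"}


def compile_rule(rule, abbreviations=ABBREVIATIONS):
    """Turn 'left:right' into (compiled regex, replacement template)."""
    left, right = rule.split(":", 1)
    # Expand {V}, {C}, ... so that [{V}] becomes the character class [aeiou].
    for name, chars in abbreviations.items():
        left = left.replace("{" + name + "}", chars)
    # Rewrite ($1) as \g<1>, the group-reference syntax Python templates use.
    template = re.sub(r"\(\$(\d+)\)", r"\\g<\1>", right)
    # Anchor at the end of the word, mirroring s/y$/ies/.
    return re.compile(left + "$"), template


def apply_rule(word, rule):
    """Apply one rule to a word; return None when the rule does not match."""
    pattern, template = compile_rule(rule)
    match = pattern.search(word)
    if match is None:
        return None
    return word[: match.start()] + match.expand(template)


print(apply_rule("bus", "([{V}])([sz]):($1)($2)($2)es"))
print(apply_rule("class", "([sz]):($1)es"))
print(apply_rule("picnic", "([{V}])c:($1)cking"))
print(apply_rule("big", "([{V}])([bdfgklmnprstvz]):($1)($2)($2)er"))
```

If you extend the rule syntax for another language (for example with aliases such as N=ñ), adding the alias to the abbreviation table is one simple way to keep the rules themselves in plain ASCII.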