Yawipa Data Extract

Winston Wu and David Yarowsky

Last updated from enwiki-20210901.

Type      Size   Lines
alter     12M    327216
anagrams  27M    442205
ant       1.7M   42490
cog       11M    315941
coord     1.3M   28682
def       679M   9854898
deftr     192M   4276464
der       38M    939731
desc      11M    283319
etym      134M   2334835
formof    276M   4146376
holo      45K    1065
hyper     475K   11329
hypo      1.6M   36493
mero      83K    2040
noncog    123K   2958
pos       242M   7741547
pron      101M   2678937
rel       29M    690689
syn       13M    322194
tr        176M   2576445
el-pron   1.2M   33827
es-pron   25M    667157
fr-pron   162M   4176767
fr-pos    149M   4530134
fr-tr     47M    1039038
it-pron   7.7M   188093

Citations

If you use this data, please cite our paper:

@inproceedings{wu-yarowsky-2020-computational,
    title = "Computational Etymology and Word Emergence",
    author = "Wu, Winston and Yarowsky, David",
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.397",
}

If you use the formof or tr data, please also cite:

@inproceedings{wu-yarowsky-2020-wiktionary,
    title = "{W}iktionary Normalization of Translations and Morphological Information",
    author = "Wu, Winston and Yarowsky, David",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.413",
}

Let us know if you found this helpful, or if you have any questions or suggestions!