Yawipa Data Extract
Last updated from enwiki-20210901.
Type | Size | Lines |
---|---|---|
alter | 12M | 327216 |
anagrams | 27M | 442205 |
ant | 1.7M | 42490 |
cog | 11M | 315941 |
coord | 1.3M | 28682 |
def | 679M | 9854898 |
deftr | 192M | 4276464 |
der | 38M | 939731 |
desc | 11M | 283319 |
etym | 134M | 2334835 |
formof | 276M | 4146376 |
holo | 45K | 1065 |
hyper | 475K | 11329 |
hypo | 1.6M | 36493 |
mero | 83K | 2040 |
noncog | 123K | 2958 |
pos | 242M | 7741547 |
pron | 101M | 2678937 |
rel | 29M | 690689 |
syn | 13M | 322194 |
tr | 176M | 2576445 |
el-pron | 1.2M | 33827 |
es-pron | 25M | 667157 |
fr-pron | 162M | 4176767 |
fr-pos | 149M | 4530134 |
fr-tr | 47M | 1039038 |
it-pron | 7.7M | 188093 |
Citations
If you use this data, please cite our paper:

@inproceedings{wu-yarowsky-2020-computational,
  title = "Computational Etymology and Word Emergence",
  author = "Wu, Winston and Yarowsky, David",
  booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
  month = may,
  year = "2020",
  address = "Marseille, France",
  publisher = "European Language Resources Association",
  url = "https://www.aclweb.org/anthology/2020.lrec-1.397",
}

If you use the formof or tr data, please also cite:

@inproceedings{wu-yarowsky-2020-wiktionary,
  title = "{W}iktionary Normalization of Translations and Morphological Information",
  author = "Wu, Winston and Yarowsky, David",
  booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
  month = dec,
  year = "2020",
  address = "Barcelona, Spain (Online)",
  publisher = "International Committee on Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.coling-main.413",
}

Let us know if you found this helpful, or if you have any questions or suggestions!