A massively large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval
Citation
@inproceedings{sun2020clirmatrix,
title={CLIRMatrix: A massively large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval},
author={Sun, Shuo and Duh, Kevin},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages={4160--4170},
year={2020}
}
BI-139
A bilingual dataset of queries in one language matched with relevant documents in another language, covering 139 × 138 = 19,182 language pairs (every ordered pair of distinct languages among the 139).
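The pair count follows from treating each (query language, document language) combination as an ordered pair of two distinct languages. A minimal sketch of that arithmetic, assuming placeholder language codes (the `lang000`-style names are hypothetical and are not the dataset's real identifiers):

```python
from itertools import permutations

# Number of languages stated for BI-139.
NUM_LANGUAGES = 139

# Hypothetical placeholder codes, used only to illustrate the counting.
languages = [f"lang{i:03d}" for i in range(NUM_LANGUAGES)]

# Ordered (query_language, document_language) pairs with distinct languages.
pairs = list(permutations(languages, 2))

print(len(pairs))  # 19182
```

Because the pairs are ordered, a query-language/document-language combination and its reverse count as two separate pairs, which is why the total is 139 × 138 rather than half of that.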