Benchmark Databases for Historical Linguistics
- Benchmark Databases for Historical Linguistics (BDHL) is an attempt to establish a series of online benchmark databases for evaluating how well quantitative methods in historical linguistics work.
- Currently, we plan to offer three different kinds of data, namely
- data for the evaluation of phonetic alignment algorithms (see BDPA),
- data for the evaluation of cognate detection algorithms, and
- data for the evaluation of linguistic reconstruction algorithms.
- Most of the data was collected while I was working on my dissertation, but this is still ongoing work: we will try to update the database regularly and encourage other linguists to join in and share their benchmark data with us. If you are interested in the original benchmark data collected for my publication, you can browse and download it from http://SequenceComparison.github.io.
- If you are interested in the current state of the benchmark data, please have a look at our GitHub repository.
- The Benchmark Database for Phonetic Alignments was officially launched on June 5. You can browse, query, and download all data at http://alignments.lingpy.org. If you are interested in the structure of the data, see the paper by List and Prokić (2014).
- The Benchmark Database for Cognate Detection is now available as a supplement to my PhD thesis (List 2014).