Today, a new preprint with Cormac Anderson, Tiago Tresoldi, Simon J. Greenhill, Robert Forkel, and Russell D. Gray, titled "Measuring variation in phoneme inventories", appeared online at Research Square (DOI: 10.21203/rs.3.rs-891645/v1). The study systematically compares phoneme inventories and how they are coded in different datasets.
For over a century, the phoneme has played a central role in linguistic research. In recent years, collections of phoneme inventories, originally designed for cross-linguistic purposes, have increasingly been used in comparative studies involving neighbouring disciplines. Despite the extended application of this type of data, there has been no research into its comparability or tests of its reliability. In this study, we carry out a systematic comparison of four popular phoneme inventory collections. We render them comparable by linking them to standardised formats for the handling of cross-linguistic datasets and develop new measures to test both size and similarity. We find considerable differences in inventories supposedly representing the same language variety, both in terms of size and transcriptional choices. While some of these differences appear to be predictable, reflecting design decisions in the different collections, much of the observed variation is unsystematic. These results should sound a note of caution for comparative studies based on phoneme inventories, which we suggest need to take the question of comparability more seriously. We make a number of proposals for improving the comparability of phoneme inventories.