New Paper on Etymological Word Relations

A draft of this paper has been available for download on my homepage for some time, but the paper has now finally been officially published, and I am taking this as an opportunity to announce it here. The paper, titled "Beyond cognacy: historical relations between words and their implication for phylogenetic reconstruction" (PDF here, URL here), deals with etymological relations between words and summarizes many ideas that I have shared before, both in my dissertation and in quite a few blog posts. In the paper, I expand on these thoughts and conclude that we need multistate models to handle certain problems of linguistic variation, even when we only consider the dimensions along which words can vary. Here is the abstract of the paper:

This article investigates the terminology and the processes underlying the fundamental historical relations between words in linguistics (cognacy) and genes in biology (homology). The comparison between linguistics and biology shows that there are major inconsistencies in the analogies drawn between the two research fields and the models applied in phylogenetic reconstruction in linguistics. Cognacy between words is treated as a binary relation which is either present or not. Words, however, can exhibit different degrees of cognacy which go beyond the distinction between orthologous and paralogous genes in biology. The complex nature of cognacy has strong implications for the models used for phylogenetic reconstruction. Instead of modeling lexical evolution as a process of cognate gain and cognate loss, we need to go beyond the cognate relation and develop models which take the degrees of cognacy into account. This opts for the use of evolutionary models which handle multistate characters and allow to define potentially asymmetrical transition tendencies among the character states instead of time-reversible binary state models in phylogenetic approaches. The benefit of multistate models with asymmetric transition tendencies is demonstrated by testing how well different models of lexical change perform in semantic reconstruction on a lexicostatistical dataset of 23 Chinese dialects in a parsimony framework. The results show that the improved models largely outperform the popular gain–loss models. This suggests that improved models of lexical change may have strong consequences for phylogenetic approaches in linguistics.
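To make the difference between symmetric and asymmetric transition tendencies a bit more concrete, here is a minimal sketch of Sankoff's weighted parsimony, a standard way to score multistate characters under a cost ("step") matrix. This is not the code used in the paper (that is linked below). The tree, the two states "A" and "B" (think of them as two hypothetical cognate sets expressing the same concept), and the cost values are all invented for illustration. Under the symmetric model, both root states are equally parsimonious; once A → B is assumed to be cheaper than B → A, the reconstruction favours A.

```python
import math
from typing import Dict, List, Tuple, Union

# A tree is either a leaf (the state observed in that variety) or a tuple of
# subtrees. The states "A" and "B" are hypothetical cognate sets.
Tree = Union[str, Tuple["Tree", ...]]
CostMatrix = Dict[Tuple[str, str], float]


def sankoff(tree: Tree, states: List[str], cost: CostMatrix) -> Dict[str, float]:
    """Minimal weighted number of changes below `tree` for each possible state
    of its root (Sankoff's weighted parsimony). cost[(x, y)] is the cost of a
    change from ancestral state x to descendant state y."""
    if isinstance(tree, str):
        # Leaf: the observed state costs nothing; any other state is excluded.
        return {s: (0.0 if s == tree else math.inf) for s in states}
    child_scores = [sankoff(child, states, cost) for child in tree]
    scores = {}
    for parent in states:
        # For each child, take the cheapest child state given this parent state.
        scores[parent] = sum(
            min(cost[(parent, c)] + child[c] for c in states)
            for child in child_scores
        )
    return scores


STATES = ["A", "B"]

# Symmetric model: every change costs one step, in either direction.
SYMMETRIC = {(x, y): (0.0 if x == y else 1.0) for x in STATES for y in STATES}

# Hypothetical asymmetric model: A -> B is cheap, B -> A is expensive.
ASYMMETRIC = {("A", "A"): 0.0, ("B", "B"): 0.0, ("A", "B"): 1.0, ("B", "A"): 3.0}

# Toy tree: one variety shows A, two sister varieties show B.
TREE = ("A", ("B", "B"))

print(sankoff(TREE, STATES, SYMMETRIC))   # {'A': 1.0, 'B': 1.0}
print(sankoff(TREE, STATES, ASYMMETRIC))  # {'A': 1.0, 'B': 3.0}
```

A binary gain–loss coding would instead split this into separate present/absent characters for A and B, which cannot express that one direction of change is more likely than the other.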

If you are interested in the supplementary material, you can find the data at this link and the source code at this link.