|1.21||earth, land||zemlja||Erde, Land||...|
|1.212||ground, soil||počva||Erde, Boden||...|
Database of Cross-Linguistic Colexifications (CLICS):
Sources of CLICS
The idea for the Database of Cross-Linguistic Colexifications (CLICS) was inspired by the work of Steiner, Stadler, and Cysouw (2011). A colexification occurs when a language uses the same word form for two different concepts. In their paper, the authors use such cross-linguistic polysemies, or colexifications, to model historical semantic similarities between concepts for the purpose of cognate detection. They themselves trace their idea back to Haspelmath's (2003) notion of semantic maps, which are in some respects similar to colexifications but serve to compare grammatical functions across languages rather than the meanings of words.
Complete Network in CLICS
Since the resulting network is very dense, we try to break it down into smaller, more interesting pieces in one of two ways: (1) by using community-identification algorithms, which partition the network into small groups in which the number of links within a group is higher than the number of links leading outside it (INFOMAP algorithm, Rosvall and Bergstrom 2008), or (2) by extracting subgraphs from the network up to a certain resolution depth.
Community analysis is important for further analyses of the network and usually breaks the network down into the most relevant, cross-cultural units.
Subgraph extraction may be particularly interesting to study areal features.
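The following is a minimal sketch of how a colexification network could be built from word lists and how a subgraph of a given depth could be extracted from it. It is not the CLICS implementation and it does not implement the INFOMAP algorithm. The names `WORDS`, `build_network`, and `subgraph` are hypothetical, and the languages "X" and "Y" with their forms are invented for illustration; only the German and Russian forms are real.

```python
from collections import defaultdict, deque
from itertools import combinations

# Toy word list entries: (language, concept, form).
# "X" and "Y" are invented languages with invented forms.
WORDS = [
    ("German", "EARTH/LAND", "Erde"),
    ("German", "EARTH/LAND", "Land"),
    ("German", "GROUND/SOIL", "Erde"),
    ("German", "GROUND/SOIL", "Boden"),
    ("German", "COUNTRY", "Land"),
    ("Russian", "EARTH/LAND", "zemlja"),
    ("Russian", "GROUND/SOIL", "počva"),
    ("X", "GROUND/SOIL", "tana"),
    ("X", "DUST", "tana"),
    ("Y", "EARTH/LAND", "tera"),
    ("Y", "GROUND/SOIL", "tera"),
]


def build_network(words):
    """Map each concept pair to the number of languages colexifying it."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in words:
        concepts_by_form[(language, form)].add(concept)

    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # One form covering several concepts links every pair of them.
        for a, b in combinations(sorted(concepts), 2):
            languages_by_pair[(a, b)].add(language)
    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


def subgraph(network, start, depth):
    """Keep only edges whose endpoints lie within `depth` steps of `start`."""
    neighbours = defaultdict(set)
    for a, b in network:
        neighbours[a].add(b)
        neighbours[b].add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search, stopping at the depth limit
        node = queue.popleft()
        if distance[node] == depth:
            continue
        for other in neighbours[node]:
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return {pair: weight for pair, weight in network.items()
            if pair[0] in distance and pair[1] in distance}


network = build_network(WORDS)
print(network)
print(subgraph(network, "EARTH/LAND", 1))
```

In this sketch, the edge weight is the number of languages attesting a colexification. Weighting by language families instead would be a reasonable alternative when the sample is genealogically unbalanced.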
With our work on the Concepticon (List et al. 2015), we have assembled a large collection of metadata. Initially, we only wanted to link concept lists proposed as alternatives to Swadesh's original concept lists of 100 (1955) and 200 (1952) words, so that we would have authoritative concept labels (glosses), similar to the way Glottolog provides stable identifiers for language varieties.
In the meantime, the project has grown to 160 different lists in which 30 000 concept labels are linked to 2500 concept sets. The resource is further enhanced with additional metadata, including links to WordNet, BabelNet, and EAT. Since the concepts of CLICS are also mapped to the Concepticon, we now have everything we need to start investigating differences and similarities between concept hierarchies and associations on the one hand and colexifications on the other.
LingPy (List et al. 2015) has been developing rapidly, not so much in the sense that many new algorithms have been added, but rather in the sense that the library has become more and more stable (especially thanks to Robert Forkel, who joined LingPy in 2014). LingPy seems to be the perfect place to offer high-quality code for colexification analyses, including the community detection analyses mentioned before, but also the calculation of various simple and complex statistics, such as node centrality, weighted degree, or PageRank.
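To illustrate the kind of statistics meant here, the sketch below computes weighted degree and PageRank for a small colexification network using only the Python standard library. It does not use or reproduce any LingPy API. The function names and the toy weights in `NETWORK` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical colexification weights: concept pair -> number of languages.
NETWORK = {
    ("EARTH/LAND", "GROUND/SOIL"): 2,
    ("COUNTRY", "EARTH/LAND"): 1,
    ("DUST", "GROUND/SOIL"): 1,
}


def adjacency(network):
    """Turn the undirected edge list into a symmetric adjacency mapping."""
    adj = defaultdict(dict)
    for (a, b), weight in network.items():
        adj[a][b] = weight
        adj[b][a] = weight
    return adj


def weighted_degree(network):
    """Sum of the edge weights attached to each concept."""
    return {node: sum(nbrs.values())
            for node, nbrs in adjacency(network).items()}


def pagerank(network, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an undirected network."""
    adj = adjacency(network)
    nodes = list(adj)
    strength = {n: sum(adj[n].values()) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour passes on rank in proportion to the edge weight.
            incoming = sum(rank[m] * w / strength[m]
                           for m, w in adj[n].items())
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = updated
    return rank


degrees = weighted_degree(NETWORK)
ranks = pagerank(NETWORK)
for concept in sorted(ranks, key=ranks.get, reverse=True):
    print(concept, degrees[concept], round(ranks[concept], 3))
```

A fixed number of iterations keeps the sketch short. A convergence threshold would be the more usual choice for large networks.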
With a little over 200 languages at present, CLICS is quite large, but still far smaller than it should be. We hope to expand the data in various ways. First, the Intercontinental Dictionary Series will be relaunched soon and offer some 300 languages, giving us the opportunity to increase both the accuracy of the data currently underlying CLICS and the size of the sample.
Second, we hope to profit from our mappings to the Concepticon to further expand the data on a semi-automatic basis.
... for the moment