Reconciling Classical and Computational Approaches in Historical Linguistics
The Cross-Linguistic Data Formats initiative (Forkel et al. 2016, http://cldf.clld.org) aims to increase the comparability of cross-linguistic data. It comes along with:
As of now, a couple of software tools (LingPy, Beastling, EDICTOR) support CLDF. In the future, we hope that the number of users will increase, and that the community helps to develop the formats further.
The Concepticon is an attempt to link the large amount of different concept lists which are used in the linguistic literature, ranging from Swadesh lists in historical linguistics to naming tests in clinical studies and psycholinguistics.
This resource, our Concepticon, links concept labels from different concept lists to concept sets. Each concept set is given a unique identifier, a unique label, and a human-readable definition. Concept sets are further structured by defining different relations between the concepts.
With version 1.1 of the Concepticon, which will be released later in 2017, many new features become available:
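The linking described above can be pictured as a small data model. The following is only a minimal sketch, not the Concepticon's actual implementation: the identifiers 1277 and 2121 are the ones used in the example table in this document, while the list names, the labels, the definition strings, and the relation are placeholders assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptSet:
    identifier: str  # unique identifier
    label: str       # unique label
    definition: str  # human-readable definition


# Placeholder concept sets (labels and definitions are assumptions).
CONCEPT_SETS = {
    "1277": ConceptSet("1277", "HAND", "placeholder definition"),
    "2121": ConceptSet("2121", "ARM OR HAND", "placeholder definition"),
}

# Links from (concept list, concept label) to a concept set identifier.
# "ListA", "ListB", and "ListC" are hypothetical concept lists.
LINKS = {
    ("ListA", "hand"): "1277",
    ("ListB", "Hand (n)"): "1277",
    ("ListC", "рука"): "2121",
}

# Relations between concept sets, as (source, relation, target) triples.
# The relation shown here is assumed for illustration only.
RELATIONS = [("1277", "narrower", "2121")]


def lookup(list_name, label):
    """Return the concept set a label from a given list is linked to."""
    identifier = LINKS.get((list_name, label))
    return CONCEPT_SETS.get(identifier) if identifier else None


def labels_for(identifier):
    """Return all labels, across lists, linked to one concept set."""
    return sorted(label for (_, label), target in LINKS.items()
                  if target == identifier)
```

With this layout, two differently worded labels from two lists resolve to the same concept set, which is what makes the lists comparable.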
This is an attempt to create a cross-linguistic phonetic alphabet, realized as a dialect of IPA, for cross-linguistic approaches to language comparison.
The basic idea is to provide a fixed set of symbols for phonetic representation along with a full description of their pronunciation, following the tradition of IPA. The list can be expanded as new language data arise, and it can be linked to alternative datasets, like Mielke's (2008) P-Base, and PHOIBLE.
These efforts are still preliminary and experimental. A rather new idea is to use orthography profiles (Moran and Cysouw 2017) to automatically convert between consistent alphabets (such as the GLD alphabet) and CLPA, or to adjust existing datasets so that they conform to CLPA standards.
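To illustrate the idea of profile-based conversion, here is a minimal sketch that treats an orthography profile as a mapping from source graphemes (possibly several characters long) to target symbols and segments a word by greedy longest match. The profile contents and the function name `segment` are assumptions made for this example; it is not the implementation used by any existing tool.

```python
# Hypothetical profile: source grapheme -> target symbol.
PROFILE = {
    "sch": "ʃ",
    "ch": "x",
    "s": "s",
    "a": "a",
    "h": "h",
    "n": "n",
    "t": "t",
}


def segment(word, profile, unknown="?"):
    """Convert a word by greedy longest-match lookup in the profile.

    Characters not covered by the profile are replaced by `unknown`,
    so that gaps in the profile become visible in the output.
    """
    longest = max(len(grapheme) for grapheme in profile)
    segments = []
    i = 0
    while i < len(word):
        # Try the longest possible grapheme first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in profile:
                segments.append(profile[chunk])
                i += size
                break
        else:
            segments.append(unknown)
            i += 1
    return segments
```

Marking uncovered characters explicitly, rather than dropping them, is a deliberate choice here: it shows where a dataset does not yet conform to the target alphabet.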
ID | Language_ID | Parameter_ID | Value | Segments |
---|---|---|---|---|
1 | stan1295 | 1277 | Hand (n) | h a n t |
2 | stan1293 | 1277 | hand | h æ n d |
3 | russ1263 | 2121 | рука | r u ˈk a |
... | ... | ... | ... | ... |
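A table like the one above can be processed with very little code. The sketch below assumes the rows are stored as comma-separated text with the same column names, reads them with Python's standard `csv` module, splits the space-separated `Segments` column into a list, and groups the forms by `Parameter_ID`. The function names are chosen for this example only, and a real dataset would normally be read together with its accompanying metadata.

```python
import csv
import io
from collections import defaultdict

# The three example rows from the table, as comma-separated text.
FORMS_CSV = """ID,Language_ID,Parameter_ID,Value,Segments
1,stan1295,1277,Hand (n),h a n t
2,stan1293,1277,hand,h æ n d
3,russ1263,2121,рука,r u ˈk a
"""


def read_forms(text):
    """Parse the rows and split the Segments column on whitespace."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Segments"] = row["Segments"].split()
        rows.append(row)
    return rows


def by_parameter(rows):
    """Group (language, segments) pairs by their concept identifier."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Parameter_ID"]].append(
            (row["Language_ID"], row["Segments"]))
    return dict(grouped)
```

Grouping by `Parameter_ID` puts the German and English forms side by side, which is the starting point for comparing their segments.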
Partial Cognate Detection (List et al. 2016)
Partial Colexification Analysis
Ancestral State Reconstruction
Tools or interfaces serve two major purposes in the CALC framework:
Tools do not necessarily need to offer solutions for both of these aspects: we will often have tools which serve only for inspection, or tools which serve only for data creation and correction.
Tools for data input should have the following functionalities:
Tools for data inspection are less bound to standards, since the problems they may address are so different. Nevertheless, they should also accept standardized formats as input data. In addition, we recommend using web-based frameworks (JavaScript / CSS / HTML) to guarantee platform independence. This will also minimize learning time for users, given that we are all nowadays more familiar with web-based interfaces than with complicated GUIs.
The EDICTOR is a web-based tool for editing, analysing, and publishing etymological data. It is available as a prototype in Version 0.1 and will be further developed in the project "Computer-Assisted Language Comparison" (2017-2021). The tool can be accessed via the website at http://edictor.digling.org, or be downloaded and used offline. All that is needed to use the tool is a web browser (Firefox, Safari, Chrome). Offline usage is currently restricted to Firefox. The tool is file-based: input is not a database structure, but a plain tab-separated text file (like a single sheet from a spreadsheet editor). The data formats are identical to those used by LingPy, thus allowing for a close interaction between automatic analysis and manual refinement.
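The file-based workflow can be sketched as a simple round trip: a wordlist is written to tab-separated text, a judgment is corrected by hand, and the result is read back for further automatic analysis. The column names used here (`ID`, `DOCULECT`, `CONCEPT`, `IPA`, `COGID`) and the example rows are assumptions for illustration and should be checked against the format documentation of the tools involved.

```python
import csv
import io

# Assumed column layout of the tab-separated wordlist.
COLUMNS = ["ID", "DOCULECT", "CONCEPT", "IPA", "COGID"]


def dump_tsv(rows):
    """Serialize wordlist rows as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_tsv(text):
    """Parse tab-separated text back into a list of row dictionaries."""
    return [dict(row) for row in
            csv.DictReader(io.StringIO(text), delimiter="\t")]


# Illustrative rows: an automatic analysis put the two words
# into different cognate sets.
rows = [
    {"ID": "1", "DOCULECT": "German", "CONCEPT": "hand",
     "IPA": "hant", "COGID": "1"},
    {"ID": "2", "DOCULECT": "English", "CONCEPT": "hand",
     "IPA": "hænd", "COGID": "2"},
]

# Manual refinement: assign both words to the same cognate set.
rows[1]["COGID"] = "1"
refined = load_tsv(dump_tsv(rows))
```

Because the exchange format is plain text, the same file can be opened in a spreadsheet editor, in a browser-based tool, or in a script.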
The EDICTOR structure is modular, consisting of different panels that allow for:
CLICS is an online database of synchronic lexical associations ("colexifications"), currently covering 221 language varieties of the world. Large databases offering lexical information on the world's languages are already readily available for research in different online sources. However, the information on tendencies of meaning associations they contain cannot easily be extracted from these sources themselves.
As CLICS comes along with a powerful visualization suite, it is very convenient to query the information regarding meaning associations. CLICS thus also serves as an example of computer-assisted language comparison, insofar as it illustrates how analyses created by machines can be made accessible for detailed inspection by researchers.
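The notion of colexification can be made concrete with a small sketch: two concepts are colexified in a language when that language expresses both with the same word form. The function below counts, for each pair of concepts, the number of languages in which this happens. It is a simplified illustration under that definition, not the procedure used to build CLICS, and the entries for "LanguageX" are invented.

```python
from collections import defaultdict
from itertools import combinations


def colexifications(entries):
    """Count languages per concept pair sharing one word form.

    `entries` is an iterable of (language, concept, form) triples.
    """
    # Collect all concepts expressed by the same form in one language.
    concepts_by_form = defaultdict(set)
    for language, concept, form in entries:
        concepts_by_form[(language, form)].add(concept)

    # Every pair of concepts sharing a form counts once per language.
    languages_by_pair = defaultdict(set)
    for (language, _), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)

    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


ENTRIES = [
    ("Russian", "ARM", "рука"),
    ("Russian", "HAND", "рука"),
    ("English", "ARM", "arm"),
    ("English", "HAND", "hand"),
    ("LanguageX", "ARM", "ta"),   # invented data
    ("LanguageX", "HAND", "ta"),  # invented data
]
```

The resulting counts can be read as edge weights in a network of concepts, which is the kind of structure a visualization suite can then display.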
We are only at the beginning, and many things still need to be done:
Thank you for your attention!