Chances of the New Approaches
Automatic Cognate Detection
- detecting which words in different languages go back to the same ancestral form (i.e., identifying etymologically related words, or cognates) is notoriously difficult and tedious
- the traditional method relies on an intensive comparison of words across languages, during which patterns of regularly recurring sound correspondences are extracted and evaluated
- the results of this endeavour are the well-known etymological dictionaries, which tell us where the words of our languages come from
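The bookkeeping at the heart of the traditional method can be sketched as follows: count which sounds occupy the same position in aligned words, then keep the pairs that recur. This is a minimal sketch that assumes the words are already aligned segment by segment. The toy alignments and the `min_count` threshold are hypothetical, and a real comparison weighs far more evidence than this.

```python
from collections import Counter

# Hypothetical toy data: pairs of words from two languages, already aligned
# segment by segment; "-" marks a gap (a sound with no counterpart).
ALIGNED_PAIRS = [
    (["t", "a", "k"], ["d", "a", "-"]),
    (["t", "i", "r"], ["d", "i", "r"]),
    (["t", "o", "n"], ["d", "o", "n"]),
]


def count_correspondences(pairs):
    """Count how often each segment pair (a, b) occupies the same aligned slot."""
    counts = Counter()
    for left, right in pairs:
        if len(left) != len(right):
            raise ValueError("aligned words must have the same length")
        counts.update(zip(left, right))
    return counts


def recurring_correspondences(counts, min_count=2):
    """Keep only correspondences attested at least min_count times."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


counts = count_correspondences(ALIGNED_PAIRS)
print(recurring_correspondences(counts))  # {('t', 'd'): 3}
```

In this toy sample only t : d recurs. It is therefore the only candidate for a regular correspondence.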
- etymologies play an important role in science, since they give us insights into the history of our languages, but also in literary studies, in textual criticism, and in the decipherment of ancient texts
- etymologies also matter in rhetoric and in literary practice, as they may inspire authors and speechwriters in their creative work
- for a long time, identifying etymologically related words automatically was considered impossible
- starting with the pioneering work by Kondrak (2000), automatic cognate detection was taken more seriously by scholars from different backgrounds (computer science, computational linguistics, and classical historical linguistics)
- based on new algorithms for sequence alignment (a method common in bioinformatics, here substantially adapted to the needs of historical linguistics), List (2014) presented a stable Python library along with new algorithms that closely mimic the classical comparative method
- a study by List et al. (2017), which further improved the algorithms, reports average accuracy scores of 89% in tests on five different language families
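Sequence alignment, the technique borrowed from bioinformatics, can be illustrated with the classic Needleman-Wunsch algorithm for global alignment. The algorithm fills a dynamic-programming table of best partial scores and then traces back one optimal alignment. This is a minimal sketch with a flat, hypothetical scoring scheme (`match`, `mismatch`, `gap`). It does not reproduce the linguistically adapted algorithms of List (2014).

```python
from typing import List, Sequence, Tuple

GAP_SYMBOL = "-"


def align(a: Sequence[str], b: Sequence[str], match: int = 1,
          mismatch: int = -1, gap: int = -1) -> Tuple[List[str], List[str], int]:
    """Globally align two segment sequences with Needleman-Wunsch."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP_SYMBOL)
            i -= 1
        else:
            out_a.append(GAP_SYMBOL)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


print(align(["t", "o", "x"], ["t", "x"]))  # (['t', 'o', 'x'], ['t', '-', 'x'], 1)
```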
LingPy:
- Python library for quantitative tasks in historical linguistics
- offers many methods for sequence comparison (phonetic alignment, cognate detection), phylogenetic reconstruction, ancestral state reconstruction, etc.
- developed collaboratively by a core team of currently two developers, with many contributors in the past
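Cognate detection ultimately has to partition the words for one concept into cognate sets. The sketch below gives a rough illustration of that last step. It groups words whose normalized edit distance stays below a threshold and links them transitively (single linkage). The words, the `threshold` value, and the use of plain edit distance are hypothetical simplifications. This is a naive baseline and not one of LingPy's methods.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer word."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cognate_sets(words: dict, threshold: float = 0.5) -> list:
    """Single-linkage clustering: words within threshold end up in one set."""
    parent = {key: key for key in words}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    for a, b in combinations(words, 2):
        if normalized_distance(words[a], words[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in words:
        groups.setdefault(find(key), []).append(key)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical forms for one concept in four languages
WORDS = {"lang_a": "tox", "lang_b": "tux", "lang_c": "tox", "lang_d": "mira"}
print(cognate_sets(WORDS))  # [['lang_a', 'lang_b', 'lang_c'], ['lang_d']]
```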
Automatic Cognate Detection: Partial Cognates
Automatic Cognate Detection: Performance
List, Greenhill, and Gray, PLOS ONE, 2017
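Performance figures for automatic cognate detection come from comparing automatically inferred cognate sets with expert judgments. One measure commonly used for this kind of cluster comparison is the B-cubed F-score. Here it is assumed purely for illustration. The sketch does not reproduce the evaluation setup of the study cited above, and the word IDs and labels are hypothetical.

```python
def b_cubed(gold: dict, predicted: dict):
    """Return B-cubed precision, recall, and F-score for two clusterings.

    Both arguments map the same word IDs to cognate-set labels.
    Assumption: B-cubed is used here only as an illustrative measure.
    """
    items = list(gold)

    def average_overlap(first, second):
        # For each item: share of its cluster in `first` that also
        # shares its cluster in `second`, averaged over all items
        total = 0.0
        for item in items:
            same_first = {other for other in items if first[other] == first[item]}
            same_second = {other for other in items if second[other] == second[item]}
            total += len(same_first & same_second) / len(same_first)
        return total / len(items)

    precision = average_overlap(predicted, gold)
    recall = average_overlap(gold, predicted)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


gold = {"w1": 1, "w2": 1, "w3": 2}             # expert cognate sets
predicted = {"w1": "a", "w2": "b", "w3": "b"}  # hypothetical automatic sets
print(b_cubed(gold, predicted))  # each score is approximately 0.667
```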
Automatic Cognate Detection: Summary
- although our algorithms will never completely replace trained experts, they are good enough to assist them, and they usually outperform untrained linguists, even those who know the languages under investigation very well
- current methods for automatic cognate detection succeed because they carefully model the classical method for cognate detection: they take inspiration from similar tasks in evolutionary biology without blindly following the implementations that biology offers (adapt rather than transfer)
Database of Cross-Linguistic Colexifications
Polysemy, Homophony, Colexification
- Polysemy: a word has two or more meanings that are historically related.
- Homophony: two words that do not share a common etymological history have an identical pronunciation.
- Colexification (term coined by François 2008): one word form denotes several meanings.
Colexification Networks
Key | Concept | Russian | German | ... |
1.1 | world | mir, svet | Welt | ... |
1.21 | earth, land | zemlja | Erde, Land | ... |
1.212 | ground, soil | počva | Erde, Boden | ... |
1.420 | tree | derevo | Baum | ... |
1.430 | wood | derevo | Holz | ... |
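The table shows how colexifications can be read off a multilingual word list. Whenever the same form of the same language appears under two concepts, those concepts are colexified: Russian derevo covers "tree" and "wood", and German Erde covers "earth, land" and "ground, soil". Below is a minimal sketch that takes the table rows as input, leaving out the Key column and the elided languages. In line with the definition of colexification, it deliberately makes no attempt to tell polysemy from homophony.

```python
from collections import defaultdict
from itertools import combinations

# Rows of the table above: (concept, {language: [forms]}); Key column omitted
TABLE = [
    ("world", {"Russian": ["mir", "svet"], "German": ["Welt"]}),
    ("earth, land", {"Russian": ["zemlja"], "German": ["Erde", "Land"]}),
    ("ground, soil", {"Russian": ["počva"], "German": ["Erde", "Boden"]}),
    ("tree", {"Russian": ["derevo"], "German": ["Baum"]}),
    ("wood", {"Russian": ["derevo"], "German": ["Holz"]}),
]


def find_colexifications(table):
    """Return {(concept_a, concept_b): [(language, form), ...]}."""
    # Index every (language, form) pair by the concepts it expresses
    concepts_by_form = defaultdict(set)
    for concept, entries in table:
        for language, forms in entries.items():
            for form in forms:
                concepts_by_form[(language, form)].add(concept)
    # A form listed under two or more concepts colexifies each pair of them
    colexifications = defaultdict(list)
    for (language, form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            colexifications[pair].append((language, form))
    return dict(colexifications)


for pair, forms in sorted(find_colexifications(TABLE).items()):
    print(pair, forms)
# ('earth, land', 'ground, soil') [('German', 'Erde')]
# ('tree', 'wood') [('Russian', 'derevo')]
```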
- concepts are represented as nodes in a colexification network
- instances of colexification in the languages are represented as links between the nodes
- edge weights in the network reflect the number of attested instances of a given colexification, or the number of languages or language families in which the colexification occurred
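Turning individual colexification instances into a weighted network amounts to grouping them by concept pair. The sketch computes all three weighting options listed above for each undirected edge. The input tuples and their language and family names are hypothetical, and the actual CLICS code may compute the weights differently.

```python
from collections import defaultdict

# Hypothetical instances: (language, family, concept_a, concept_b)
INSTANCES = [
    ("lang_a", "family_1", "tree", "wood"),
    ("lang_b", "family_1", "wood", "tree"),
    ("lang_c", "family_2", "tree", "wood"),
    ("lang_c", "family_2", "tree", "wood"),  # a second form in the same language
    ("lang_a", "family_1", "earth", "ground"),
]


def build_network(instances):
    """Map each undirected edge to its instance, language, and family counts."""
    n_instances = defaultdict(int)
    languages = defaultdict(set)
    families = defaultdict(set)
    for language, family, concept_a, concept_b in instances:
        edge = tuple(sorted((concept_a, concept_b)))  # ignore direction
        n_instances[edge] += 1
        languages[edge].add(language)
        families[edge].add(family)
    return {
        edge: {
            "instances": n_instances[edge],
            "languages": len(languages[edge]),
            "families": len(families[edge]),
        }
        for edge in n_instances
    }


for edge, weights in build_network(INSTANCES).items():
    print(edge, weights)
# ('tree', 'wood') {'instances': 4, 'languages': 3, 'families': 2}
# ('earth', 'ground') {'instances': 1, 'languages': 1, 'families': 1}
```

Counting languages or families instead of raw instances stops a colexification that is frequent in only a few closely related languages from dominating the edge weights.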
Analyzing Colexification Networks
CLICS¹ Database
Database of Cross-Linguistic Colexifications (CLICS):
- CLICS¹ offered information on colexification in 221 different languages.
- 301,498 words covering 1,280 different concepts
- 45,667 cases of colexification, identified with the help of a strictly automatic procedure, correspond to 16,239 different links between the 1,280 concepts in CLICS
CLICS² Database
Problems of CLICS¹
- difficult to curate
- difficult to correct
- difficult to use computationally
- difficult to re-use by the community for similar projects
- difficult to expand (only three sources, only 221 languages)
Basic ideas for CLICS²
- use the standardized formats proposed by the Cross-Linguistic Data Formats initiative (Forkel et al. 2018) as the basic format for data representation
- link many different datasets to reference catalogs like Concepticon and Glottolog
- make a new CLICS application with a transparent Python API
- separate data, data analysis, and data deployment
- create a CLLD (http://clld.org) application for easy deployment of the data
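A shared format plus links to reference catalogs means that one small piece of code can pool any number of datasets. The sketch reads two tiny CSV tables laid out like a CLDF FormTable, using only the `ID`, `Language_ID`, `Parameter_ID`, and `Form` columns. It maps each dataset's local concept IDs to shared concepts through a hypothetical `CONCEPT_LINKS` dictionary, which stands in for Concepticon links, and then lists the colexifications found across both datasets. It uses only Python's standard `csv` module, not the clics2 API or any CLDF library, and all data are invented.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Two tiny, hypothetical datasets laid out like a CLDF FormTable
DATASET_A = """ID,Language_ID,Parameter_ID,Form
a1,russian,tree,derevo
a2,russian,wood,derevo
"""
DATASET_B = """ID,Language_ID,Parameter_ID,Form
b1,german,c21,Erde
b2,german,c22,Erde
b3,german,c23,Holz
"""

# Hypothetical links from each dataset's local concept IDs to shared
# concept labels (a stand-in for links to a catalog such as Concepticon)
CONCEPT_LINKS = {"tree": "TREE", "wood": "WOOD",
                 "c21": "EARTH", "c22": "GROUND", "c23": "WOOD"}


def read_forms(text):
    """Yield (language, shared concept, form) for every row of a form table."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["Language_ID"], CONCEPT_LINKS[row["Parameter_ID"]], row["Form"]


def colexifications(*datasets):
    """List (language, form, concept_a, concept_b) across all datasets."""
    concepts_by_form = defaultdict(set)
    for text in datasets:
        for language, concept, form in read_forms(text):
            concepts_by_form[(language, form)].add(concept)
    return sorted(
        (language, form, a, b)
        for (language, form), concepts in concepts_by_form.items()
        for a, b in combinations(sorted(concepts), 2)
    )


print(colexifications(DATASET_A, DATASET_B))
# [('german', 'Erde', 'EARTH', 'GROUND'), ('russian', 'derevo', 'TREE', 'WOOD')]
```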
Results for CLICS² (List et al. 2018)
- more than 1000 languages
- more than 1500 concepts
- full replicability with the clics2 Python API (https://github.com/clics/clics2)
- sources of CLICS² (15 different datasets) are fully traceable
- a new web application that is more attractive than the previous one
- the old look-and-feel is preserved in a standalone application based on pure JavaScript that runs on any server
CLICS Database
CLICS², Linguistic Typology, List et al. 2018
Summary on CLICS
- CLICS provides information on patterns in the languages of the world that was not available in this form before
- CLICS uses a completely automatic workflow, but thanks to thorough curation of the data and the direct presentation of the data together with their sources, experts can spot potential errors quickly; at the same time, the workflow reveals the major signal in the data, which is unlikely to be affected by minor problems in the original data
- CLICS clearly has the potential to provide new insights, and it offers data in a form that humans could not compile without technical support in a reasonable time frame
Rhyme Networks in Ancient Chinese
Rhyme Networks, Bulletin of Chinese Linguistics, List 2017
- the rhyme network approaches (List 2017 and List et al. 2017) helped us to gain new insights into the structure of rhyme patterns in Chinese poetry
- thanks to the network representation of rhyme data, we could confirm and correct existing reconstructions of Old Chinese phonology
- we could also show that most reconstruction systems proposed for Old Chinese maintain strict vowel purity, avoiding rhymes between words with different vowels
- the rhyme browser, which was created from the data, gives scholars quick access to the original data, so that they can spot errors, correct them, or use the data in their own research
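The network representation of rhyme data can be sketched like this. Rhyme words are nodes, two words are linked whenever they rhyme in the same stanza, and edge weights count how often that happens. Given a reconstruction, the share of links that join words with the same main vowel is a rough measure of vowel purity. This is a minimal sketch on hypothetical toy data. The linking rule and the purity measure are assumptions for illustration, not the procedure of List (2017).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each stanza lists the words that rhyme with each other
STANZAS = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
# Hypothetical reconstructed main vowels for the rhyme words
VOWELS = {"A": "a", "B": "a", "C": "a", "D": "e"}


def rhyme_network(stanzas):
    """Link every pair of words that rhyme in the same stanza; count repeats."""
    edges = Counter()
    for words in stanzas:
        for a, b in combinations(sorted(set(words)), 2):
            edges[(a, b)] += 1
    return edges


def vowel_purity(edges, vowels):
    """Share of rhyme links (weighted by frequency) joining words with the same vowel."""
    total = sum(edges.values())
    pure = sum(n for (a, b), n in edges.items() if vowels[a] == vowels[b])
    # An empty network is treated as vacuously pure
    return pure / total if total else 1.0


network = rhyme_network(STANZAS)
print(dict(network))  # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1, ('C', 'D'): 1}
print(vowel_purity(network, VOWELS))  # 0.8
```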