... for Computational Language Comparison
Comparative linguistics has provided us with many new insights into the past of our languages.
Comparative linguistics has great and so far underexplored potential to help us learn more about human cognition.
Comparative linguistics can help us to distinguish culturally specific traits from universal tendencies.
We face many problems, resulting largely from the lack of
Linguists rarely follow standards in naming languages, referencing concepts, or transcribing words.
Linguists use a large number of different methods and barely agree even on the basic procedures of inference, as illustrated by the "comparative method", which differs from scholar to scholar and has never been properly formalized.
Linguistic data for different or identical languages are largely incomparable, since key aspects of the data have often not been unified, as reflected in idiosyncratic elicitation glosses, language names, or transcription systems.
The Quantitative Turn in Historical Linguistics
Expertendämmerung (roughly, "twilight of the experts")
Computer-Assisted Language Comparison (List 2016)
Core ideas of the CALC framework
Retro-Standardizing Data through Data-Lifting
(Retro-)standardization (or data lifting) is one of the core aspects of our recent efforts, pursued as part of the Cross-Linguistic Data Formats initiative (https://cldf.clld.org). This means we
Cross-Linguistic Data Formats (Forkel et al. 2018, https://cldf.clld.org)
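To make data lifting concrete, here is a minimal Python sketch, assuming a legacy wordlist with free-text language names and elicitation glosses. The lookup tables and the identifiers in them (LANG_GERMAN, CONCEPT_HAND, and so on) are hypothetical placeholders; in a real project, these mappings would be curated against Glottolog and Concepticon. The output uses the column names ID, Language_ID, Parameter_ID, and Form, which to our understanding correspond to the CLDF FormTable; a full CLDF dataset would also need metadata, which is not shown here.

```python
import csv
import io

# Raw wordlist with idiosyncratic language names and glosses (hypothetical data).
RAW_ROWS = [
    {"language": "Deutsch", "gloss": "Hand (body part)", "word": "Hand"},
    {"language": "german", "gloss": "hand", "word": "Hand"},
    {"language": "English", "gloss": "the hand", "word": "hand"},
]

# Hypothetical lookup tables; the identifiers are placeholders, not real
# Glottolog or Concepticon entries.
LANGUAGE_MAP = {"deutsch": "LANG_GERMAN", "german": "LANG_GERMAN", "english": "LANG_ENGLISH"}
CONCEPT_MAP = {"hand (body part)": "CONCEPT_HAND", "hand": "CONCEPT_HAND", "the hand": "CONCEPT_HAND"}


def lift(rows):
    """Map raw rows to standardized identifiers; return (lifted, unmapped)."""
    lifted, unmapped = [], []
    for i, row in enumerate(rows, start=1):
        lang = LANGUAGE_MAP.get(row["language"].strip().lower())
        concept = CONCEPT_MAP.get(row["gloss"].strip().lower())
        if lang is None or concept is None:
            unmapped.append(row)  # left for manual curation
            continue
        lifted.append({"ID": str(i), "Language_ID": lang,
                       "Parameter_ID": concept, "Form": row["word"]})
    return lifted, unmapped


def to_form_table(rows):
    """Serialize lifted rows as CSV with CLDF-style FormTable column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["ID", "Language_ID", "Parameter_ID", "Form"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


lifted, unmapped = lift(RAW_ROWS)
print(to_form_table(lifted))
```

Rows that cannot be mapped are collected rather than silently dropped, since deciding on them is exactly the kind of expert work that the computer should assist with, not replace.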
Reference Catalogs
Reference catalog for language varieties (languages and dialects), providing language identifiers, geolocations, classifications, and references (Hammarström et al. 2020, https://glottolog.org).
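Classifications in a catalog like Glottolog can be read as lineages running from a top-level family down to a variety. As a hedged illustration (the lineages and node identifiers below are invented), the closest subgroup shared by two varieties is the longest common prefix of their lineages:

```python
# Hypothetical lineages, listed from the top-level family downwards.
# Node identifiers are placeholders, not actual Glottolog entries.
LINEAGES = {
    "lang_a": ["fam_x", "branch_1", "sub_1a"],
    "lang_b": ["fam_x", "branch_1", "sub_1b"],
    "lang_c": ["fam_x", "branch_2"],
    "lang_d": ["fam_y"],
}


def closest_common_node(lang1, lang2, lineages=LINEAGES):
    """Return the deepest classification node shared by two varieties, or None."""
    shared = None
    for node1, node2 in zip(lineages[lang1], lineages[lang2]):
        if node1 != node2:
            break  # lineages diverge here
        shared = node1
    return shared


print(closest_common_node("lang_a", "lang_b"))
print(closest_common_node("lang_a", "lang_d"))
```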
Reference catalog for concepts, which are defined independently of concrete languages, providing concept identifiers, concept metadata, concept relations, and references (List et al. 2020, https://concepticon.clld.org).
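Concept relations make it possible to compare datasets whose glosses were linked to concepts of different granularity. The sketch below assumes a hypothetical table of "broader" links (from a narrower concept to a broader one); the labels are illustrative and not copied from Concepticon.

```python
from collections import deque

# Hypothetical "broader" links: narrower concept -> list of broader concepts.
BROADER = {
    "THUMB": ["FINGER"],
    "FINGER": ["BODY PART"],
    "TOE": ["BODY PART"],
}


def broader_closure(concept, relations=BROADER):
    """Return every concept reachable from `concept` by following broader links."""
    seen = set()
    queue = deque(relations.get(concept, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guards against cycles in the relation table
        seen.add(current)
        queue.extend(relations.get(current, []))
    return seen


def common_broader(a, b, relations=BROADER):
    """Concepts (including a and b themselves) under which both can be compared."""
    return ({a} | broader_closure(a, relations)) & ({b} | broader_closure(b, relations))


print(common_broader("THUMB", "TOE"))  # {'BODY PART'}
```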
Reference catalog for speech sounds (across different transcription systems and datasets), offering sound identifiers, feature-based sound descriptions, and references (List et al. 2019, https://clts.clld.org).
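Feature-based sound descriptions can serve as a pivot between transcription systems: if each system maps its symbols to the same descriptive names, symbols can be translated through those names. The tables below are a small hypothetical excerpt written in the spirit of CLTS, not its actual data, and the input is assumed to be segmented already.

```python
# Hypothetical symbol -> feature-name tables for two transcription systems.
IPA = {
    "ʃ": "voiceless post-alveolar sibilant fricative consonant",
    "tʃ": "voiceless post-alveolar sibilant affricate consonant",
    "a": "unrounded open front vowel",
}
AMERICANIST = {
    "š": "voiceless post-alveolar sibilant fricative consonant",
    "č": "voiceless post-alveolar sibilant affricate consonant",
    "a": "unrounded open front vowel",
}


def convert(segments, source, target):
    """Translate segments from `source` to `target` via shared feature names.

    Unknown symbols, or names without a counterpart in `target`, become "?".
    """
    inverse = {name: symbol for symbol, name in target.items()}
    return [inverse.get(source.get(seg), "?") for seg in segments]


print(convert(["tʃ", "a", "ʃ"], IPA, AMERICANIST))  # ['č', 'a', 'š']
```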
Annotation lets us add information to a dataset that was not previously given by the structure of the data. For these tasks, we use
Prime examples of annotation are morpheme segmentation (assignment of morpheme boundaries) and cognate annotation (assignment of etymological relationships).
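One way to store both kinds of annotation in a wordlist, which we assume here (it mirrors, as far as we know, conventions used in tools such as EDICTOR), is to mark morpheme boundaries with "+" in a space-separated segment string and to assign one partial cognate ID per morpheme. With that layout, morphemes can be grouped into partial cognate sets, and inconsistent annotations can be caught automatically. The entries below are hypothetical.

```python
from collections import defaultdict

# Hypothetical annotated wordlist: "+" marks morpheme boundaries, and
# "cogids" holds one partial cognate ID per morpheme.
ENTRIES = [
    {"language": "A", "concept": "MOON", "segments": "t a + m o", "cogids": "1 2"},
    {"language": "B", "concept": "MOON", "segments": "t a + l u", "cogids": "1 3"},
    {"language": "C", "concept": "MOON", "segments": "l u", "cogids": "3"},
]


def morphemes(segments):
    """Split a space-separated segment string into morphemes at '+'."""
    return [part.split() for part in segments.split("+")]


def partial_cognate_sets(entries):
    """Group morphemes by partial cognate ID, rejecting inconsistent rows."""
    sets = defaultdict(list)
    for entry in entries:
        parts = morphemes(entry["segments"])
        ids = entry["cogids"].split()
        if len(parts) != len(ids):
            raise ValueError(f"{len(parts)} morphemes but {len(ids)} cognate IDs: {entry}")
        for cogid, part in zip(ids, parts):
            sets[int(cogid)].append((entry["language"], " ".join(part)))
    return dict(sets)


print(partial_cognate_sets(ENTRIES))
```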
Cognates and Sound Correspondences
List et al. (2016): Partial cognate detection workflow.
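A partial cognate workflow compares morphemes rather than whole words before clustering them. The sketch below only mimics the overall shape of such a workflow under simplifying assumptions: it swaps in plain normalized edit distance and single-linkage clustering, with a hypothetical threshold of 0.5, for the more elaborate similarity scores and clustering methods used in the actual workflow, and the morphemes are invented.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two lists of segments."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the length of the longer sequence."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def cluster_morphemes(items, threshold=0.5):
    """Single-linkage clustering of (label, segments) pairs via union-find.

    Returns one cluster number per item, numbered in order of first appearance.
    """
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if normalized_distance(items[i][1], items[j][1]) <= threshold:
            parent[find(i)] = find(j)

    numbers, result = {}, []
    for i in range(len(items)):
        root = find(i)
        numbers.setdefault(root, len(numbers) + 1)
        result.append(numbers[root])
    return result


# Hypothetical morphemes of words for one concept, labelled language.position.
MORPHEMES = [
    ("A.1", ["t", "a", "n"]),
    ("A.2", ["m", "o", "k"]),
    ("B.1", ["t", "a", "m"]),
    ("B.2", ["l", "u", "s"]),
    ("C.1", ["l", "u", "z"]),
]
print(cluster_morphemes(MORPHEMES))  # [1, 2, 1, 3, 3]
```

Single linkage is chosen here only because it is short to write; it tends to chain dissimilar items together, which is one reason why real workflows rely on better-behaved clustering.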
List (2019): Correspondence pattern detection.
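Correspondence patterns can be read off the columns (alignment sites) of aligned cognate sets. Here is a simplified sketch, not the published algorithm as such: each site is treated as a tuple of sounds (None for missing data), two sites count as compatible if they never disagree where both have data, and compatible sites are merged greedily, most complete sites first. The languages A, B, C and the sites are hypothetical.

```python
def compatible(p, q):
    """Sites are compatible if they never disagree where both have data."""
    return all(x is None or y is None or x == y for x, y in zip(p, q))


def merge(p, q):
    """Fill the gaps of one pattern with the sounds of the other."""
    return tuple(x if x is not None else y for x, y in zip(p, q))


def group_sites(sites):
    """Greedily group alignment sites into correspondence patterns.

    Returns a list of [merged_pattern, site_indices] pairs.
    """
    patterns = []
    # Most complete sites first: they constrain a pattern most strongly.
    order = sorted(range(len(sites)), key=lambda i: sum(s is None for s in sites[i]))
    for i in order:
        for entry in patterns:
            if compatible(entry[0], sites[i]):
                entry[0] = merge(entry[0], sites[i])
                entry[1].append(i)
                break
        else:
            patterns.append([sites[i], [i]])
    return patterns


# Hypothetical alignment sites for languages (A, B, C); None = missing data.
SITES = [
    ("t", "t", "d"),
    ("t", None, "d"),
    ("t", "t", "t"),
    (None, "t", "t"),
    ("k", "k", "g"),
]
for pattern, members in group_sites(SITES):
    print(pattern, members)
```

In this toy example, A and B show t in two distinct patterns, while C has d in one and t in the other, which is the kind of split that would prompt a search for a conditioning context. Greedy merging depends on processing order, so a careful implementation has to handle ambiguous sites explicitly.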
Colexification Networks
List et al. (2013): Using community detection methods to identify communities in colexification networks.
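Colexification networks link concepts that are expressed by the same word form in a language, with edges weighted by the number of languages showing the colexification. The sketch below assumes one form per concept and language and uses invented wordlists; for the community detection step, it swaps in a simple deterministic weighted label propagation, which serves only as a stand-in and is not necessarily the method used by List et al. (2013).

```python
from collections import defaultdict
from itertools import combinations


def colexification_network(wordlists):
    """Weighted edges between concepts that share a form in some language.

    `wordlists` maps language -> {concept: form}; the weight of an edge is the
    number of languages in which the two concepts are colexified.
    """
    edges = defaultdict(int)
    for forms in wordlists.values():
        by_form = defaultdict(set)
        for concept, form in forms.items():
            by_form[form].add(concept)
        for concepts in by_form.values():
            for a, b in combinations(sorted(concepts), 2):
                edges[(a, b)] += 1
    return dict(edges)


def label_propagation(edges, max_rounds=100):
    """Deterministic weighted label propagation, a simple stand-in for
    community detection."""
    neighbors = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbors[a][b] = weight
        neighbors[b][a] = weight
    labels = {node: node for node in neighbors}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            scores = defaultdict(int)
            for other, weight in neighbors[node].items():
                scores[labels[other]] += weight
            # Highest total weight wins; ties go to the alphabetically first label.
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=sorted)


# Invented wordlists: language -> {concept: form}.
WORDLISTS = {
    "L1": {"ARM": "x1", "HAND": "x1", "TREE": "x2", "WOOD": "x2"},
    "L2": {"ARM": "y1", "HAND": "y1", "TREE": "y2", "WOOD": "y2"},
    "L3": {"ARM": "z1", "HAND": "z2", "FINGER": "z2", "TREE": "z3", "WOOD": "z3"},
}
network = colexification_network(WORDLISTS)
print(network)
print(label_propagation(network))
```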
Too many studies are still being published without accompanying data and code, increasing the number of irreproducible studies in comparative linguistics.
Jacques and List (2019): Incomplete lineage sorting.
Our models are still naive, far too naive, and we need to invest much more time in carefully modeling our problems, rather than trusting that increased computational power will solve them for us.
Thank you for your attention!