CLICS 2016


Chances and Challenges

Agenda 2016

  • Introduction
  • Analyses
  • Plans

Introduction

Colexification

  • Polysemy:
    • If a word has two or more meanings which are historically related.
  • Homophony:
    • If two words which do not share a common etymological history have an identical pronunciation.
  • Colexification (coined by François 2008):
    • If one word form denotes several meanings.

Introduction

Colexification

  • Polysemy:
    • English wood 'forest; wood (material)'
  • Homophony:
    • German Arm 'arm' vs. German arm 'poor'
  • Colexification:
    • English wood, German Arm/arm, etc.

Introduction

Colexification: Cross-Linguistic Perspective

Key     Concept        Russian     German        ...
1.1     world          mir, svet   Welt          ...
1.21    earth, land    zemlja      Erde, Land    ...
1.212   ground, soil   počva       Erde, Boden   ...
1.420   tree           derevo      Baum          ...
1.430   wood           derevo      Holz          ...
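
The table can be searched for colexifications by grouping concepts under identical word forms within one language. The following is a minimal sketch under that assumption, not the procedure actually used for CLICS; the `WORDLIST` layout and the function name `find_colexifications` are hypothetical, and only the rows shown in the table are used.

```python
from collections import defaultdict
from itertools import combinations

# Toy word list copied from the table: (key, concept, language, forms).
WORDLIST = [
    ("1.1", "world", "Russian", ["mir", "svet"]),
    ("1.21", "earth, land", "Russian", ["zemlja"]),
    ("1.212", "ground, soil", "Russian", ["počva"]),
    ("1.420", "tree", "Russian", ["derevo"]),
    ("1.430", "wood", "Russian", ["derevo"]),
    ("1.1", "world", "German", ["Welt"]),
    ("1.21", "earth, land", "German", ["Erde", "Land"]),
    ("1.212", "ground, soil", "German", ["Erde", "Boden"]),
    ("1.420", "tree", "German", ["Baum"]),
    ("1.430", "wood", "German", ["Holz"]),
]


def find_colexifications(wordlist):
    """Return sorted (language, form, concept_a, concept_b) tuples.

    Two concepts count as colexified when the same language lists an
    identical form for both (assumption: plain string identity).
    """
    concepts_by_form = defaultdict(set)
    for _key, concept, language, forms in wordlist:
        for form in forms:
            concepts_by_form[(language, form)].add(concept)

    hits = []
    for (language, form), concepts in concepts_by_form.items():
        # Every pair of concepts sharing a form is one colexification.
        for concept_a, concept_b in combinations(sorted(concepts), 2):
            hits.append((language, form, concept_a, concept_b))
    return sorted(hits)


colexifications = find_colexifications(WORDLIST)
for row in colexifications:
    print(row)
```

On this toy data the sketch finds Russian derevo ('tree' and 'wood') and German Erde ('earth, land' and 'ground, soil'). Plain string identity cannot tell polysemy from homophony, which is exactly why the neutral term colexification is useful here.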

Introduction

CLICS

Database of Cross-Linguistic Colexifications (CLICS):

  • CLICS offers information on colexification in 221 different languages.
  • 301,498 words covering 1,280 different concepts
  • 45,667 cases of colexification, identified by a strictly automatic procedure, corresponding to 16,239 different links between the 1,280 concepts in CLICS

Sources of CLICS

  • IDS (Key and Comrie 2007): 178 languages
  • WOLD (Haspelmath & Tadmor 2009): 33 languages
  • LOGOS (http://www.logosdictionary.org): 4 languages
  • Språkbanken (University of Gothenburg): 6 languages

Introduction

CLICS Cross-Linguistic Perspective

The idea for the Database of Cross-Linguistic Colexifications (CLICS) was inspired by the work of Steiner, Stadler, and Cysouw (2011). In their paper, the authors use cross-linguistic polysemies, or colexifications, to model historical semantic similarities between concepts for the purpose of cognate detection. They themselves trace their idea back to Haspelmath's (2003) notion of semantic maps, which are in some sense similar to colexifications but are used to compare grammatical functions across languages rather than the meanings of words.

Introduction

CLICS: Network Modeling

  • Concepts are represented as nodes in our network.
  • Instances of colexification in the languages of CLICS are represented as links between the nodes (we link the concept 'poor' with the concept 'arm' since German colexifies both concepts).
  • Edge weights in the network reflect the number of attested instances of a given colexification, or the number of languages or language families in which the colexification occurred.
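
The network model in the list above can be sketched with plain dictionaries. This is an illustrative sketch only: the record layout, the function name `build_network`, and the entry for "LanguageX" are hypothetical placeholders and not part of the CLICS data.

```python
from collections import defaultdict

# Toy colexification records: (language, family, concept_a, concept_b).
# "LanguageX" / "FamilyY" are invented placeholders for illustration.
RECORDS = [
    ("German", "Indo-European", "arm", "poor"),
    ("Russian", "Indo-European", "tree", "wood"),
    ("German", "Indo-European", "earth, land", "ground, soil"),
    ("LanguageX", "FamilyY", "tree", "wood"),
]


def build_network(records):
    """Map each undirected concept pair to its language and family counts."""
    languages = defaultdict(set)
    families = defaultdict(set)
    for language, family, concept_a, concept_b in records:
        # Sort the pair so that (a, b) and (b, a) are the same edge.
        edge = tuple(sorted((concept_a, concept_b)))
        languages[edge].add(language)
        families[edge].add(family)
    return {
        edge: {
            "languages": len(languages[edge]),
            "families": len(families[edge]),
        }
        for edge in languages
    }


network = build_network(RECORDS)
for edge, weights in sorted(network.items()):
    print(edge, weights)
```

Keeping both counts on each edge makes it possible to choose the weighting later: counting families rather than languages reduces the influence of closely related languages that share a colexification by inheritance.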

Analysis

Communities

[Figure: Complete network in CLICS]

Analysis

Communities

Since the resulting network is very dense, we break it down into smaller, more interesting pieces by:

  • using community detection algorithms, which partition the network into small groups in which the number of links within the group is higher than the number of links leaving the group (Infomap algorithm, Rosvall and Bergstrom 2008), or
  • extracting subgraphs from the network with a certain resolution depth.

Community analysis is important for further analyses of the network and usually breaks the network down into the most relevant, cross-cultural units.

Subgraph extraction may be particularly useful for studying areal features.
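
Subgraph extraction can be sketched as a breadth-first search from a start concept. The sketch assumes that "resolution depth" means the maximum number of links between the start concept and any included concept; this reading, the function names, and the toy edges are assumptions made for illustration. It does not implement Infomap.

```python
from collections import deque

# Toy undirected network; the edges are illustrative, not CLICS data.
EDGES = [
    ("tree", "wood"),
    ("wood", "forest"),
    ("forest", "mountain"),
    ("arm", "poor"),
]


def build_adjacency(edges):
    """Return a dict mapping each concept to the set of its neighbours."""
    adjacency = {}
    for concept_a, concept_b in edges:
        adjacency.setdefault(concept_a, set()).add(concept_b)
        adjacency.setdefault(concept_b, set()).add(concept_a)
    return adjacency


def extract_subgraph(edges, start, depth):
    """Return the edges among concepts at most `depth` links from `start`."""
    adjacency = build_adjacency(edges)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return [(a, b) for a, b in edges if a in distance and b in distance]


subgraph = extract_subgraph(EDGES, "tree", 2)
print(subgraph)
```

With depth 2, the search from 'tree' reaches 'wood' and 'forest' but stops before 'mountain', and the unconnected pair 'arm' and 'poor' is never visited.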

Analysis

Visualization

In Mayer et al. (2014), a new interactive visualization of CLICS was presented, based on JavaScript and the D3 library (Bostock et al. 2011). The visualization was designed for an active user who investigates colexification patterns for linguistic research on semantic change, though not necessarily restricted to diachronic studies. With the help of Thomas Mayer, who created the visualization, the loose collection of colexifications, which we first presented only as browsable tables (List et al. 2013), turned into an interactive tool that also highlights the complexity behind the colexification data.

Analysis

Visualization: Examples

Plans

Concepticon and CLICS

With our work on the Concepticon (List et al. 2015), we have assembled a large collection of metadata. Initially, we only wanted to link concept lists proposed as alternatives to Swadesh's original concept lists of 100 (1955) and 200 (1952) words, in order to provide authoritative concept labels (glosses), similar to the way Glottolog provides stable identifiers for language varieties.

Plans

Concepticon and CLICS

In the meantime, the project has grown to 160 different lists in which 30 000 concept labels are linked to 2500 concept sets. The resource is further enhanced with additional metadata, including links to WordNet, BabelNet, and EAT. Since the concepts of CLICS are also mapped to the Concepticon, we now have everything we need to start investigating differences and similarities between concept hierarchies and associations on the one hand and colexifications on the other.

Plans

LingPy and CLICS

LingPy (List et al. 2015) has been developing rapidly, not in the sense that many new algorithms have been added, but in the sense that the library has become more and more stable (especially thanks to Robert Forkel, who joined LingPy in 2014). LingPy seems to be the perfect place to offer high-quality code for colexification analyses, including the community detection analyses mentioned before, but also the calculation of various simple and complex statistics, be it node centrality, weighted degree, or PageRank.
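
Two of the statistics mentioned above, weighted degree and PageRank, can be sketched in plain Python. This is not LingPy code and does not reflect any LingPy API; the function names, the damping factor of 0.85, the fixed iteration count, and the toy weights are assumptions made for illustration.

```python
# Toy weighted colexification network: (concept_a, concept_b, weight).
# The weights are invented for illustration.
EDGES = [
    ("tree", "wood", 3),
    ("wood", "forest", 2),
    ("arm", "poor", 1),
]


def weighted_degree(edges):
    """Sum the weights of all edges attached to each concept."""
    degree = {}
    for concept_a, concept_b, weight in edges:
        degree[concept_a] = degree.get(concept_a, 0) + weight
        degree[concept_b] = degree.get(concept_b, 0) + weight
    return degree


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank on an undirected network by power iteration.

    Assumes positive weights, so every listed concept has degree > 0.
    """
    degree = weighted_degree(edges)
    nodes = sorted(degree)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every concept receives the same share of the teleport mass.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for concept_a, concept_b, weight in edges:
            # Undirected edge: rank flows in both directions,
            # in proportion to the edge weight.
            new_rank[concept_b] += damping * rank[concept_a] * weight / degree[concept_a]
            new_rank[concept_a] += damping * rank[concept_b] * weight / degree[concept_b]
        rank = new_rank
    return rank


degrees = weighted_degree(EDGES)
ranks = pagerank(EDGES)
for concept in sorted(ranks, key=ranks.get, reverse=True):
    print(concept, degrees[concept], round(ranks[concept], 3))
```

In this toy network 'wood' has both the highest weighted degree and the highest PageRank, since it is the only concept linked to two others.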

Plans

New Data for CLICS

With currently a bit more than 200 languages, CLICS is quite big, but still far smaller than it should be. We hope to expand the data in various ways. First, the Intercontinental Dictionary Series will be relaunched soon and offer some 300 languages, giving us the possibility to increase both the accuracy of the data currently underlying CLICS and the size of the sample.

Furthermore, we hope we can profit from our mappings to the Concepticon to further expand the data on a semi-automatic basis.

THE END

... for the moment