Digital Chinese Historical Phonology



Agenda for the Talk

  1. Introduction
  2. Rhyme Networks
  3. Character Formation Graphs
  4. Rhyme Annotation
  5. Outlook

Introduction

Chinese Historical Phonology

Chinese Historical Phonology (音韵学 yīnyùnxué) is a classical discipline, developed by Chinese scholars long before the arrival of modern linguistic theories (which are mostly based on synchronic grammar).

Introduction

Chinese Historical Phonology

Chinese Historical Phonology investigates specific aspects of the development and the stages of the Chinese language, including specifically:

  • the diversification of the dialects (as a rather modern part of the discipline),
  • the pronunciation of the Chinese variety encoded in the rhyme books (600 AD) and rhyme tables (ca. 1100 AD), also known as Middle Chinese, and
  • the pronunciation of the ancient poems, especially the Book of Odes (詩經 Shījīng, ca. 600 BC), and ancient characters, based on the structural analysis of phonophoric characters.

Introduction

Chinese Historical Phonology

Achievements of Chinese Historical Phonology:

  • linguistic reconstruction of Old Chinese pronunciation based on rhyme pattern analysis of the Book of Odes and character structure analysis of phonophoric characters
  • early phonological classification of Chinese speech sounds
  • classification of Chinese dialects based on their individual divergence from Middle Chinese pronunciation

Introduction

Digital Historical Linguistics

What is Digital Historical Linguistics?

  • So far, the term has barely been used.
  • Having a way to distinguish research in Computational Historical Linguistics, which implies a closeness to NLP approaches, from research that makes active use of computers without relying completely on them, seems desirable.
  • In this sense, Digital Historical Linguistics can be seen as the idea of having a discipline that integrates classical, qualitative approaches and computational, quantitative approaches in linguistics.

Introduction

Computer-Assisted Language Comparison

  • data in linguistics is steadily increasing
  • our methods reach their practical limits, as they are tedious to apply
  • we need to take computational methods into account
  • but computational methods are not very accurate and may yield wrong results

Introduction

Computer-Assisted Language Comparison

Computer-assisted language comparison (CALC) can be seen as one aspect of Digital Historical Linguistics, with the latter covering a broader range than CALC, as not all problems in historical linguistics are problems of language comparison.

The general strategy of CALC, however, also holds for my vision of Digital Historical Linguistics as a discipline that uses digital approaches to address scientific questions relevant to linguistics: avoid black-box approaches and bridge the gap between quantitative and qualitative approaches, while fostering the development of new quantitative and qualitative methods with a specific focus on linguistic questions (as opposed to engineering or NLP questions).

Introduction

Why «Digital»?

«Measure what is measurable, and make measurable what is not so.» (quote apparently falsely attributed to Galileo Galilei, see Kleinert 2009)

Introduction

Why «Digital»? Juggling Lessons...

[Figures: juggling patterns 3b, 3b441, and 3b531]

Introduction

Why «Digital»? Juggling Lessons...

Lessons from Juggling (and Galilei):

  1. It does not hurt to try to measure something.
  2. Even if you fail measuring some phenomenon, you will learn something about it.
  3. Restricting your view of a problem with some kind of model may limit you at first, but it may also encourage you to look at aspects of the phenomenon that you had ignored so far.

→ It does not hurt to try, but it may hurt not to try, even if it does not seem to hurt while you are not trying.

Introduction

Digital Chinese Historical Phonology

Problems of Chinese Historical Phonology:

  1. The discipline was always data-driven, but so far, the interpretation, even of large datasets, was done manually.
  2. Scholars have assembled large collections of data, but they are not available in computer-readable form.
  3. Collaboration and interdisciplinary approaches in the field are still rare.

Introduction

Digital Chinese Historical Phonology

Potential of Digital Chinese Historical Phonology:

  1. Digital approaches may help to offer a new perspective on our data.
  2. Digital techniques may even further increase the data basis, especially when helping to unify the data and analyses proposed by different scholars.
  3. Techniques for exploratory data analysis may help scholars to develop new hypotheses.

Rhyme Networks

The Poetic Function of Language

Rhyme Networks

Old Chinese Phonology

  • Old Chinese (spoken around 1000 BC) pronunciation can only be inferred by means of linguistic reconstruction, since the Chinese characters give few hints as to how the words were originally pronounced
  • rhyming in ancient Chinese poems serves as main evidence regarding the pronunciation of syllable finals
  • quantitative network analyses on ancient Chinese poetry can support traditional analyses which have been mainly based on manual inspection by scholars
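As a minimal sketch of what a rhyme network could look like in code, the example below links words that are judged to rhyme in the same stanza and then reads off connected components as rough candidates for rhyme classes. The data and the function names are hypothetical toy material, not the datasets or methods used in the published studies, which rely on more refined network analyses.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each stanza is a list of rhyme groups, and each
# group lists the words judged to rhyme with each other in that stanza.
stanzas = [
    [["鳩", "洲", "逑"]],
    [["流", "求"], ["得", "服", "側"]],
    [["洲", "流"]],
]


def build_rhyme_network(stanzas):
    """Map each word pair to the number of rhyme groups that link it."""
    edges = defaultdict(int)
    for stanza in stanzas:
        for group in stanza:
            for a, b in combinations(sorted(set(group)), 2):
                edges[(a, b)] += 1
    return dict(edges)


def connected_components(edges):
    """Group words into components, a rough stand-in for rhyme classes."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


network = build_rhyme_network(stanzas)
classes = connected_components(network)
print(len(network), "edges,", len(classes), "components")
```

Connected components are the simplest possible grouping; a single doubtful rhyme is enough to merge two classes, which is one reason why qualitative inspection by scholars remains necessary.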

Rhyme Networks

Summary

  • poetic traditions in the history of Chinese are attested for more than 3000 years
  • many datasets are already digitized and thus amenable to quantitative investigation
  • dynamic network approaches could help us to study both the development of the language and its varieties and the development of different traditions, thus enabling us to study the interaction of linguistic, cultural, and cognitive factors during the evolution of poetry in China

Rhyme Networks

Literature

Published papers:

  • List, J.-M. (2017): Using network models to analyze Old Chinese rhyme data. Bulletin of Chinese Linguistics 9.2. 218-241.
  • List, J.-M., J. Pathmanathan, N. Hill, E. Bapteste, and P. Lopez (2017): Vowel purity and rhyme evidence in Old Chinese reconstruction. Lingua Sinica 3.1. 1-17.

Character Formation Graphs

The Derivative Character of Chinese Writing

Character Formation Graphs

Chinese Writing and Old Chinese Reconstruction

  • since the work of Duàn Yǔcái 段玉裁 (1735-1815), we know that derived characters were not only phonetically very similar at one time, but also tend to rhyme in the Book of Odes
  • scholars have since then classified Chinese characters with respect to their phonetic elements in order to aid the reconstruction of Old Chinese phonology, with pioneering work produced by Bernhard Karlgren (1889-1978)
  • but so far, the classification assigns characters to uniform groups and barely takes their derivative character into account, although we may assume that at least some aspects of Chinese phonology have left traces in the derivation structure

Character Formation Graphs

Testing Hypotheses on Old Chinese phonology

  • we start from a character formation network along with Middle Chinese readings
  • we then investigate systematic differences in Middle Chinese pronunciations for each graph and check if the graph structure reflects them
  • our application is based on a Python script that reads in the data, applies logical queries for the different hypotheses, and produces the results in the form of a PDF that scholars can conveniently inspect qualitatively
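The workflow above can be sketched roughly as follows, under loud assumptions: the records, the placeholder character names (X1 to X6), and the function `types_per_subseries` are hypothetical and only illustrate the idea of checking whether a Middle Chinese distinction lines up with the derivation structure. The actual script and its queries are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy data: (character, phonetic it is derived from, type).
# "A" and "B" stand for a Middle Chinese distinction of interest; the
# character names are placeholders, not a real phonetic series.
records = [
    ("X1", None, "A"),
    ("X2", "X1", "A"),
    ("X3", "X1", "B"),
    ("X4", "X3", "B"),
    ("X5", "X3", "B"),
    ("X6", "X2", "B"),
]


def types_per_subseries(records):
    """For every character that serves as a phonetic, list the types
    found in the subgraph derived from it (the character included)."""
    kind = {char: label for char, _, label in records}
    children = defaultdict(list)
    for char, parent, _ in records:
        if parent is not None:
            children[parent].append(char)

    def subtree(node):
        members = [node]
        for child in children.get(node, []):
            members.extend(subtree(child))
        return members

    return {
        node: sorted({kind[member] for member in subtree(node)})
        for node in list(children)
    }


report = types_per_subseries(records)
for node, labels in report.items():
    status = "uniform" if len(labels) == 1 else "mixed"
    print(node, labels, status)
```

A subseries that is uniform below a mixed root (here the one derived from X3) is the kind of pattern that would be passed on to a scholar for qualitative inspection.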

Character Formation Graphs

Testing Hypotheses on Old Chinese phonology

Five hypotheses:

  1. Hypothesis SAGART: The distinction between A (without medial -j- in Middle Chinese) and B syllables in Old Chinese is reflected in part in writing.
  2. Hypothesis PAN: The distinction of uvular sounds is reflected in writing.
  3. Hypothesis STAROSTIN: Old Chinese had a distinction between final -n, -r, and -j.
  4. Hypothesis HAUDRICOURT: The departing tone (qùshēng) in Old Chinese goes back to a final -s.
  5. Hypothesis GABELENTZ: Old Chinese distinguished syllabic (Cə-) from non-syllabic prefixes (C-).

Character Formation Graphs

Hypothesis SAGART

  • we find 293 character series (following Karlgren's 1964 analysis), in which both types of syllable are present
  • quite a few of the series support the hypothesis, but we find notable exceptions
  • with a quantitative analysis alone, we cannot investigate these exceptions, as the character structures we have are not necessarily ancient
  • thorough studies by experts will be needed, taking data from excavated manuscripts and ancient stages of Chinese writing into account

Character Formation Graphs

Hypothesis PAN

We find no series in which a clear sub-distinction between velars and uvulars is visible, but we find series which point exclusively to uvulars and velars.

Character Formation Graphs

Remaining Hypotheses

  • Nathan W. Hill is still evaluating the results, but it seems that not all of the hypotheses hold for subseries of character formation.

Character Formation Graphs

Summary

  • More research is needed, especially on refining the data basis.
  • Scholars who work on ancient Chinese writing should start coding their data in the form of derivation graphs and pay attention to potential Old Chinese readings.
  • The derivational structure of the Chinese writing system results in non-trivial relationships that cannot be captured by purely computational methods. Instead, thorough qualitative and thorough quantitative investigations will both be needed.

Character Formation Graphs

Summary

Published / accepted papers:

  • List, J.-M. (2018): More on Network Approaches in Historical Chinese Phonology (音韵学). In: The 2nd Li Fang-Kuei Society Young Scholars Symposium. 157-174.
  • Hill, N. W. and List, J.-M. (forthcoming): Using Chinese character formation graphs to test proposals in Chinese historical phonology. Bulletin of Chinese Linguistics.

Rhyme Notation

Excursus: Notation of Music

  • When comparing traditions of musical notation across times and cultures, it is clear that none of the techniques used for notation is capable of faithfully rendering the music as it was perceived when people originally created it.
  • Despite this general problem of faithful notation, people across times and cultures have tried to develop systems that would freeze the music they heard or played in a durable medium.
  • When comparing the notation systems for music that are currently in use, we can also say that people have indeed succeeded, at least to some degree, in capturing the ephemeral with the help of their notation systems.

Rhyme Notation

Excursus: Notation of Music

  • Linguistics faces a similar problem of notation, given that we want to represent speech in a durable medium.
  • What is surprising, however, is that the linguistic practice of notating speech is often less strict, showing much more variation and a much more limited degree of comparability than we find in music: despite the efforts of the IPA, there is huge variation in how linguists actually use the IPA (Anderson et al. forthcoming, https://clts.clld.org).

Rhyme Notation

Excursus: Notation of Music

  • While the high degree of variation in linguistics may also relate to the higher degree of variation in languages in general, it may be helpful for linguists to look at different systems for notation in different cultures and practices, in order to improve the techniques by which we try to represent speech.

Rhyme Notation

From Notation to Annotation

  • We can roughly say that notation serves to reflect a specific practice in a different, usually visual, medium, while annotation aims to add information, such as some kind of analysis or interpretation.
  • We can distinguish two basic techniques for annotation (when dealing with texts): stand-off and inline, with stand-off annotation representing the analysis independent of the text, by indexing its words, for example, and inline-annotation representing the analysis in the text itself (Eckart 2012).
  • It is not clear to me at this point, whether the distinction between notation and annotation is useful after all, as I need to read more about the topic in general.
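To make the difference between the two techniques concrete, here is a small sketch that holds a stand-off annotation (labels stored apart from the text, indexed by word position) and renders it as an inline annotation. The bracket syntax `word[label]` and the function name `to_inline` are assumptions made for this illustration only.

```python
# Plain text of one line, split into words.
words = ["the", "cat", "sat", "on", "the", "mat"]

# Stand-off annotation: kept apart from the text, pointing at word indices.
standoff = {1: "a", 5: "a"}


def to_inline(words, standoff):
    """Render stand-off labels inside the text (hypothetical bracket syntax)."""
    return " ".join(
        f"{word}[{standoff[index]}]" if index in standoff else word
        for index, word in enumerate(words)
    )


inline = to_inline(words, standoff)
print(inline)
```

The stand-off version leaves the text untouched, so that several competing analyses can point at the same words, whereas the inline version is easier to read and type by hand.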

Rhyme Notation

Annotation of Rhyme Judgments

  • Although rhyme analysis plays a crucial role in the reconstruction of Old Chinese phonology, the field has not yet developed a standardized annotation framework for rhyme judgments applied to Ancient Chinese texts.

Rhyme Notation

Annotation of Rhyme Judgments

Wang Li's annotation (1980)

Rhyme Notation

Annotation of Rhyme Judgments

Baxter's annotation (1992)

Rhyme Notation

Annotation of Rhyme Judgments

Karlgren's annotation (1950)

Rhyme Notation

Annotation of Rhyme Judgments

Starostin's annotation (1989)

Rhyme Notation

Annotation of Rhyme Judgments

Behr's annotation A (2008)

Rhyme Notation

Annotation of Rhyme Judgments

Behr's annotation B (2008)

Rhyme Notation

Annotation of Rhyme Judgments

Problems resulting from missing standards

  1. We have huge problems in comparing different analyses on rhyme judgments.
  2. We have problems in digitizing different analyses on rhyme judgments in order to make them comparable.
  3. We have only a few contributions in which scholars actually publish their analyses, given that preparing the annotation is so tedious.

Rhyme Notation

A Framework for Rhyme Judgments

Main ideas:

  1. In the spirit of the Zen of Python: "Simple things should be simple, complex things should be possible" (a maxim usually attributed to Alan Kay)
  2. Simplicity: allow for a framework that can be realized in simple spreadsheet editors (like Excel or LibreOffice)
  3. Exhaustiveness: allow for a framework that captures many aspects we already know will be important for rhyme annotation.
  4. Flexibility: allow for a framework that can be easily lifted to more complex annotations, even when initial ones were lacking certain aspects.

Rhyme Notation

A Framework for Rhyme Judgments

Learning from wordlist annotation and CLDF

In order to achieve all these goals, we draw largely from our experience with the enhanced annotation and computer-assisted manipulation of wordlists in historical linguistics (Hill and List 2017) and their subsequent inclusion into the CLDF specifications.

Rhyme Notation

A Framework for Rhyme Judgments

Basic structure

  • Table format, with first row serving as header, and content per cell in a specific column being standardized.
  • Python API for analyzing and checking the data, which also supports conversion across formats.
  • Examples of best practice to help scholars create their data according to our format specifications.
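A table with a header row and standardized cell content lends itself to simple automated checks. The sketch below reads tab-separated text and validates it; the column names in `REQUIRED_COLUMNS`, the sample rows, and the function `read_annotations` are hypothetical and do not reproduce the actual format specification or API.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only.
REQUIRED_COLUMNS = ("ID", "POEM", "STANZA", "LINE", "RHYME_WORD", "RHYME_ID")

# Toy sample table; the first row serves as the header.
SAMPLE = "\n".join(
    "\t".join(cells)
    for cells in [
        REQUIRED_COLUMNS,
        ("1", "Ode 1", "1", "1", "鳩", "a"),
        ("2", "Ode 1", "1", "2", "洲", "a"),
        ("3", "Ode 1", "1", "4", "逑", "a"),
    ]
)


def read_annotations(handle):
    """Read a tab-separated table and check header and numeric columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    rows = []
    for number, row in enumerate(reader, start=2):
        for name in ("ID", "STANZA", "LINE"):
            if not (row[name] or "").isdigit():
                raise ValueError(f"row {number}: {name} must be a whole number")
        rows.append(row)
    return rows


rows = read_annotations(io.StringIO(SAMPLE))
print(len(rows), "rows passed the checks")
```

Because the table is plain tab-separated text, the same file can be edited in a spreadsheet program and checked by a script.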

Rhyme Notation

Examples

Example 1: Wang's (1980) judgments in our format

Rhyme Notation

Examples

Example 2: Providing alignments of Wang's (1980) judgments

Rhyme Notation

A Simplified Format

  • In addition, we offer a simplified format that allows scholars to prepare a dataset in some initial form, which can later be converted to the extended format and edited there.
  • This format mimics how poems are displayed in normal documents and makes extensive use of inline annotations.
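The conversion from a poem-like text to table rows might look roughly like this. The inline syntax (`word[label]`, one verse per text line, stanzas separated by blank lines), the toy poem, and the function `to_rows` are assumptions for illustration and not the simplified format itself.

```python
import re

# Hypothetical simplified format: one verse per text line, stanzas
# separated by blank lines, rhyme words marked as word[label].
poem = """\
roses are red[a]
violets are blue[b]
sugar is sweet[c]
and so are you[b]

the end is said[a]
"""

RHYME = re.compile(r"^(?P<word>.+)\[(?P<label>\w+)\]$")


def to_rows(text):
    """Turn the inline-annotated poem into one table row per rhyme word."""
    rows = []
    blocks = re.split(r"\n\s*\n", text.strip())
    for stanza_no, block in enumerate(blocks, start=1):
        for line_no, line in enumerate(block.splitlines(), start=1):
            for position, token in enumerate(line.split(), start=1):
                match = RHYME.match(token)
                if match:
                    rows.append({
                        "stanza": stanza_no,
                        "line": line_no,
                        "position": position,
                        "word": match.group("word"),
                        "rhyme": match.group("label"),
                    })
    return rows


rows = to_rows(poem)
for row in rows:
    print(row)
```

Keeping the stanza, line, and word position in every row means that the tabular version can later be enriched without going back to the inline text.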

Rhyme Notation

A Simplified Format

Example 1: Wang's (1980) judgments in the simplified format

Rhyme Notation

A Simplified Format

Example 2: Song «Песня для Цоя» from Zoopark in our simplified annotation.

Rhyme Notation

Visualization of Patterns

  • The annotation is not everything, but it allows us to make use of programming solutions to produce quick visualizations of the patterns in the data.
  • One very straightforward case is, for example, the visualization of general rhyming schemes along with the rhyme words inside a stanza.
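As a plain-text stand-in for such a visualization, the following sketch derives a rhyme scheme (such as AB-B) from the rhyme labels of a stanza and prints it next to the lines. The stanza, the labels, and the function `rhyme_scheme` are hypothetical; the graphical output shown in the examples is not reproduced.

```python
from string import ascii_uppercase

# Hypothetical annotated stanza: (verse line, rhyme label or None).
stanza = [
    ("roses are red", "r1"),
    ("violets are blue", "r2"),
    ("sugar is sweet", None),
    ("and so are you", "r2"),
]


def rhyme_scheme(stanza):
    """Map rhyme labels to letters in order of first appearance.

    Lines without a rhyme label are shown as "-". The sketch supports at
    most 26 different rhymes per stanza.
    """
    letters, scheme = {}, []
    for _, label in stanza:
        if label is None:
            scheme.append("-")
            continue
        if label not in letters:
            letters[label] = ascii_uppercase[len(letters)]
        scheme.append(letters[label])
    return "".join(scheme)


scheme = rhyme_scheme(stanza)
for letter, (line, _) in zip(scheme, stanza):
    print(letter, line)
```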

Rhyme Notation

Visualization of Patterns

Example 1: Song «Leto» and its rhyme pattern structure

Rhyme Notation

Visualization of Patterns

Example 2: Poem «Zwielicht» (Eichendorff) and its rhyme pattern structure

Rhyme Notation

Visualization of Patterns

Example 3: Dylan's «I want you»

Rhyme Notation

Visualization of Patterns

Example 4: «Yuèliàng dàibiǎo wǒ de xīn»

Rhyme Notation

Visualization of Patterns

Example 5: Silvio Rodriguez «Te doy una canción»

Rhyme Notation

Visualization of Patterns

Two more examples on visualization techniques

Rhyme Notation

Analysis of Patterns

  • We can not only visualize patterns conveniently in our framework, but also analyze them in multiple ways (many of which we have not yet even developed or thought of).
  • A straightforward analysis is the comparison of alternative rhyme judgments, by different scholars.
  • But also simple statistics, regarding the number of stanzas, the number of rhyme words, etc., in a given collection are straightforward.

Rhyme Notation

Analysis of Patterns: Baxter vs. Wang

  • Both authors (Baxter 1992 and Wáng 1980) describe the same data but analyze it independently.
  • So far, we do not know how similar or different rhyme judgments are among scholars.
  • But we would like to know, as the judgments have a huge impact on the reconstruction.

Rhyme Notation

Analysis of Patterns: Baxter vs. Wang

  • Of 1070 common stanzas, 175 differ between Wáng and Baxter, which amounts to roughly 16%.
  • Applying enhanced measures that also assess partial similarity between stanzas and general trends, we find 97% similarity between Baxter's and Wáng’s rhyme judgments.
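The contrast between counting whole stanzas as different and assessing partial similarity can be sketched as follows. The measure used here, the Jaccard overlap of rhyming line pairs, is an assumption chosen for illustration; it is not the enhanced measure behind the figures above, and the judgments are invented toy data.

```python
from itertools import combinations


def pair_set(labels):
    """Index pairs of lines that one analysis puts into the same rhyme."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(labels), 2)
        if a is not None and a == b
    }


def stanza_similarity(first, second):
    """Jaccard overlap of rhyming line pairs; 1.0 if neither marks a rhyme."""
    p, q = pair_set(first), pair_set(second)
    if not p and not q:
        return 1.0
    return len(p & q) / len(p | q)


# Hypothetical judgments of two scholars for the same three stanzas
# (one label per line, None for lines that do not rhyme).
scholar_1 = [["a", "a", "b", "b"], ["a", None, "a", None], ["a", "a", "a", "a"]]
scholar_2 = [["x", "x", "y", "y"], ["a", "a", "a", None], ["a", "a", "b", "b"]]

pairs = list(zip(scholar_1, scholar_2))
differing = sum(pair_set(a) != pair_set(b) for a, b in pairs)
mean_similarity = sum(stanza_similarity(a, b) for a, b in pairs) / len(pairs)

print(f"{differing} of {len(pairs)} stanzas differ")
print(f"mean partial similarity: {mean_similarity:.2f}")
```

Comparing pairs of lines rather than label names makes the measure independent of how each scholar happens to name the rhymes, so that the first stanza counts as identical although the labels differ.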

Rhyme Notation

Analysis of Patterns: Baxter vs. Wang

Wáng (1980) vs. Baxter (1992)

Rhyme Notation

Summary

  • Annotation can unleash very powerful forces in scientific research, and its importance is way too often neglected.
  • Rhyme annotation offers, specifically for Digital Chinese Linguistics and in general for Digital Linguistics, many possibilities for analyses that have not yet been carried out and that could help to address questions that have not yet been investigated.
  • In the future, we need to work on increasing the number of examples, in order to provide more illustrations for the usefulness of our framework, and for annotation in general.

Rhyme Notation

Literature

  • List, J.-M., N. Hill, and C. Forster (2018): Towards a standardized annotation of rhyme judgments in Chinese historical phonology (and beyond). [Draft article under review]

Outlook

  • We should try to measure what we investigate, even if we are convinced it can't be measured.
  • Digital Historical Linguistics, as a field that bridges between quantitative and qualitative approaches in historical linguistics, bears a lot of potential for future research, especially given its potential to integrate classical and computational research.
  • Digital Chinese Historical Phonology is an example of the usefulness of digital approaches in linguistics.
  • Much more work will need to be done in the future, and we need more scientists to join the digital faction.

Thank you for listening!
