Keynote speakers
- Emily M. Bender, University of Washington
Title: Computer Assisted (Morpho-)Syntax: Grammar Engineering for Linguistic Hypothesis Testing, Linguistic Typology, and Language Documentation.
Abstract: Grammar engineering is the process of encoding formal grammars in machine-readable form, so that the computer can do the tedious work of verifying analyses against data. In this talk, I will give an overview of two long-standing projects which aim to facilitate the use of grammar engineering for linguistic research: the Grammar Matrix and AGGREGATION projects. The Grammar Matrix (Bender et al. 2002, 2010) is an open-source toolkit for helping create implemented precision grammars based on a shared core grammar and a series of typologically informed 'libraries'. The Grammar Matrix itself provides an interesting test-bed for typological generalizations, as each new library must be interoperable with existing ones. The Grammar Matrix solicits a linguistic description through a web questionnaire and then outputs a grammar to spec. The AGGREGATION project (Bender et al. 2013, Howell et al. 2017, Zamaraeva et al. 2019) is exploring methods for automatically answering the Grammar Matrix questionnaire on the basis of collections of interlinear glossed text produced by linguists working in the field. In the short term, this project provides useful feedback to the linguist about patterns in their data, facilitating the language documentation effort. In the long term, our goal is to be able to create implemented grammars which can be used to parse interestingly large fragments of the languages at hand. These grammars should be useful for both linguistic research and ultimately language technology.
- Sabrina Bendjaballah, Université de Nantes
Title: On Descriptive Adequacy
Abstract: On the surface, linguistic fieldwork is confronted with a dizzying wealth of variation. However, this wealth of variation turns out to be neither arbitrary nor unlimited. The linguistic expressions we observe on the surface are manifestations of general principles that limit the logical space of variation by defining the mechanisms that allow the creation of linguistic expressions. The methodological principle behind this hypothesis consists in the meticulous study of the particular phenomena in each language from a universalist perspective. Rather than simply taking stock of particular, language-specific observations, these phenomena are taken to reveal interesting facts about the interplay of general properties of the human language faculty. This research project cannot succeed without a reliable, grammatically structured database. I illustrate how such a database can give rise to robust generalizations with the example of Modern South Arabian languages (minority languages with an oral tradition spoken in the south of the Arabian Peninsula: Yemen, Oman). The data will be mainly from morpho-phonology.
- Steven Bird, Charles Darwin University & Nawarddeken Academy & University of California, Berkeley
Title: Sparse Transcription: Representing and Processing Oral Languages
Abstract: In the rush to document the world's endangered languages, the "transcription bottleneck" is often cited as the main obstacle standing in the way of efforts to make large quantities of recorded speech amenable to translation, analysis, and pedagogy. One solution is to extend methods from automatic speech recognition and machine translation to low-resource scenarios and recruit linguists to provide phonetic transcriptions and sentence-aligned translations. However, I believe that these approaches are not a good fit with the interests and aptitudes of speakers, and with long-established transcription practices that are essentially word based. In seeking a new approach, I consider a century of transcription practice in linguistics and a variety of computational approaches, before proposing a computational model which I call "sparse transcription". This represents a shift away from current assumptions that we transcribe phones, transcribe fully, and transcribe first. Sparse transcription combines the orthodox practice of word transcription with interpretive, iterative, and interactive processes which are amenable to wider participation and which open the way to new tools for efficient processing of oral languages.
- Michel Jacobson, Très Grande Infrastructure de Recherche Huma-Num
Title: FAIR deals?
Abstract: The new acronym FAIR (findable, accessible, interoperable, reusable) summarizes concerns which seem clear and self-evident, but which constitute ever-renewed challenges in a digital environment characterized by the rapid evolution of all its technological layers. I will try to show how the Linked Data movement can help to address some of these issues. I will also show that important issues that shine through when discussing FAIR data concern sustainability and long-term conservation.