Invited speakers

Emily M. Bender, University of Washington

Title: Computer Assisted (Morpho-)Syntax: Grammar Engineering for Linguistic Hypothesis Testing, Linguistic Typology, and Language Documentation

Abstract: Grammar engineering is the process of encoding formal grammars in machine-readable form, so that the computer can do the tedious work of verifying analyses against data. In this talk, I will give an overview of two long-standing projects which aim to facilitate the use of grammar engineering for linguistic research: the Grammar Matrix and AGGREGATION projects. The Grammar Matrix (Bender et al. 2002, 2010) is an open-source toolkit for helping create implemented precision grammars based on a shared core grammar and a series of typologically informed 'libraries'. The Grammar Matrix itself provides an interesting test-bed for typological generalizations, as each new library must be interoperable with existing ones. The Grammar Matrix solicits a linguistic description through a web questionnaire and then outputs a grammar to spec. The AGGREGATION project (Bender et al. 2013, Howell et al. 2017, Zamaraeva et al. 2019) is exploring methods for automatically answering the Grammar Matrix questionnaire on the basis of collections of interlinear glossed text produced by linguists working in the field. In the short term, this project provides useful feedback to the linguist about patterns in their data, facilitating the language documentation effort. In the long term, our goal is to be able to create implemented grammars which can be used to parse interestingly large fragments of the languages at hand. These grammars should be useful for both linguistic research and, ultimately, language technology.

Sabrina Bendjaballah, Université de Nantes

Title: On descriptive adequacy

Abstract: At first glance, the field linguist is confronted with particularly striking linguistic variation. This diversity, however, turns out to be neither random nor unlimited: surface differences between languages can be explained by general principles that define what is possible and what is not.

The methodological principle that leads to this result is to study the particular phenomena of each language with the greatest precision, and then to reduce surface variation to universal principles that are characteristic of the human capacity for language.

This enterprise can succeed only if it rests on a structured body of data. Taking one particular case, that of the Modern South Arabian languages, which are minority, oral languages spoken in the south of the Arabian Peninsula (Yemen, Oman), I illustrate how a database organized according to precise linguistic criteria makes it possible to formulate robust generalizations. My talk will draw mainly on morpho-phonological questions.

Steven Bird, Charles Darwin University & Nawarddeken Academy & University of California Berkeley

Title: Sparse Transcription: Representing and Processing Oral Languages

Abstract: In the rush to document the world's endangered languages, the "transcription bottleneck" is often cited as the main obstacle standing in the way of efforts to make large quantities of recorded speech amenable to translation, analysis, and pedagogy. One solution is to extend methods from automatic speech recognition and machine translation to low-resource scenarios and recruit linguists to provide phonetic transcriptions and sentence-aligned translations. However, I believe that these approaches are not a good fit with the interests and aptitudes of speakers, or with long-established transcription practices that are essentially word-based. In seeking a new approach, I consider a century of transcription practice in linguistics and a variety of computational approaches, before proposing a computational model which I call "sparse transcription". This represents a shift away from current assumptions that we transcribe phones, transcribe fully, and transcribe first. Sparse transcription combines the orthodox practice of word transcription with interpretive, iterative, and interactive processes which are amenable to wider participation and which open the way to new tools for efficient processing of oral languages.

Michel Jacobson, Très Grande Infrastructure de Recherche Huma-Num

Title: Les affaires FAIR ?

Abstract: The new acronym FAIR (findable, accessible, interoperable, reusable) brings together concerns that are not new, but that must be constantly reinvented in a digital environment in which every technological layer evolves rapidly. I will try to show how the web of data movement can help to address these issues, at least in part. I will also show that, behind these terms, the question being raised is often that of sustainability.
