Methods and tools for automating language engineering

Academic year: 2021

Grégoire Détrez

Thesis submitted for the degree of Doctor of Philosophy in Computer Science at the Department of Computer Science and Engineering, Chalmers University of Technology & University of Gothenburg, Göteborg, Sweden

To be defended in public at 10.00 am, 2nd June 2016, in room EA, Hörsalsvägen 11, Göteborg

Faculty opponent: Assistant Professor Mans Hulden, Department of Linguistics, University of Colorado, U.S.A.

Department of Computer Science and Engineering Chalmers University of Technology

University of Gothenburg

SE-412 96 Göteborg, Sweden

Telephone +46 (0)31-772 1000

Thesis for the degree of Doctor of Philosophy in Computer Science

GRÉGOIRE DÉTREZ

Department of Computer Science and Engineering

Chalmers University of Technology & University of Gothenburg

Abstract

Language-processing software is increasingly present in our society. Making such tools available to as many people as possible is not just a question of access to technology but also a question of language, since the tools need to be adapted, or localized, to each linguistic community. It is thus important to make the tools needed to engineer language-processing systems as accessible as possible, for instance through automation. The aim is not so much to help traditional software creators as to enable communities to bring their language use into the digital world on their own terms.

Smart paradigms are created in the hope that they can reduce the work of the lexicographer who wishes to create or update a morphological lexicon. In the first paper, we evaluate smart paradigms implemented in Grammatical Framework (GF). How good are they at guessing the correct inflection tables? How much information do they require? How good are they at compressing the lexicon?

In the second paper, we step back from smart paradigms: although they are used in this work, they are not the main focus of the study. Instead, we compare two rule-based machine translation systems based on different translation models and try to determine the potential of a possible hybridization.

In the third paper, we come back to smart paradigms. Even if they can reduce the work of the lexicographer, someone still needs to create the smart paradigms in the first place. In this paper, we explore the possibility of creating smart paradigms automatically from existing traditional paradigms using machine-learning techniques.

Finally, the last paper presents a collection of tools meant to help grammar engineering work in the Grammatical Framework community: a tokenizer, a library for embedding grammars in Java applications, a build server, a document translator, and a kernel for Jupyter notebooks.

Keywords: Natural language processing, Language Engineering, Morphology, Lexicon, Complexity
