Morphology based automatic acquisition of large-coverage lexica
CLÉMENT, Lionel
Laboratoire Bordelais de Recherche en Informatique [LaBRI]
Linguistic signs, grammar and meaning: computational logic for natural language [SIGNES]
Language: English
Conference paper
Published in: LREC 04, Lisbon, 2004, pp. 1841-1844
Abstract
In this article, we introduce a new technique for constructing wide-coverage morphological lexica from large corpora and morphological knowledge, with an application to French. Basically, it relies on the idea that the existence of a hypothetical lemma can be guessed if several different words found in the corpus are best interpreted as morphological variants of this lemma. We first validated our technique by extracting verbs and adjectives from a general French corpus of 25 million words. Compared with other lexical resources available for French, our results are very satisfying, since we cover many words, often derived words, that are not always present in other lexica. Applying our algorithm to the acquisition of domain-specific adjectives from a botanical corpus also gave very good results, demonstrating its usability for extracting domain-specific lexica. Moreover, the technique is generalizable to any language with a substantial morphology.
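The central idea of the abstract, that a lemma is hypothesized when several distinct corpus words are best explained as its inflected variants, can be illustrated with a toy sketch. This is not the authors' algorithm. The inflection classes in `CLASSES`, the functions `hypotheses` and `acquire`, the threshold `min_forms`, and the rule "each word keeps its most supported interpretation" are all hypothetical simplifications chosen for illustration. The actual system relies on a full morphological description of French and a more elaborate ranking of competing analyses.

```python
from collections import defaultdict

# HYPOTHETICAL, heavily simplified inflection classes (not the paper's description):
# class name -> (lemma ending, endings of the inflected forms it can produce).
CLASSES = {
    "adj": ("", ["", "e", "s", "es"]),                       # petit, petite, petits, petites
    "adj-if": ("if", ["if", "ive", "ifs", "ives"]),          # actif, active, actifs, actives
    "verb-er": ("er", ["er", "e", "es", "ons", "ez", "ent"]),  # chanter, chante, ...
}


def hypotheses(word):
    """Yield every (lemma, class) pair under which `word` could be an inflected form."""
    for cls, (lemma_end, form_ends) in CLASSES.items():
        for end in form_ends:
            # Require a non-empty stem; slicing by length also handles end == "".
            if word.endswith(end) and len(word) > len(end):
                stem = word[: len(word) - len(end)]
                yield (stem + lemma_end, cls)


def acquire(corpus, min_forms=3):
    """Return hypothetical lemmas supported by at least `min_forms` distinct corpus words."""
    words = set(corpus)

    # Step 1: for each hypothetical (lemma, class), collect the distinct words it explains.
    support = defaultdict(set)
    for w in words:
        for hyp in hypotheses(w):
            support[hyp].add(w)

    # Step 2: each word keeps only its best interpretation, here simply the
    # hypothesis explaining the most distinct words (ties broken deterministically).
    assigned = defaultdict(set)
    for w in words:
        hyps = set(hypotheses(w))
        if not hyps:
            continue
        best = max(hyps, key=lambda h: (len(support[h]), h))
        assigned[best].add(w)

    # Step 3: accept lemmas whose best-interpreted variants are numerous enough.
    return {h: sorted(ws) for h, ws in assigned.items() if len(ws) >= min_forms}


if __name__ == "__main__":
    corpus = ["actif", "active", "actifs", "actives",
              "chanter", "chante", "chantons", "chantez",
              "petit", "petite", "rouge"]
    for (lemma, cls), forms in sorted(acquire(corpus).items()):
        print(lemma, cls, forms)
```

On this toy corpus, `actif` (class `adj-if`) and `chanter` (class `verb-er`) are each supported by four distinct forms and are accepted. Competing analyses such as `activer` or `activ` explain only two words and lose every word to the better-supported lemma. `petit` is rejected at the default threshold because only two of its forms occur.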
Origin: imported from HAL