
hal.structure.identifier: Centre de recherche sur la langue et les textes basques [IKER]
dc.contributor.author: JOUITTEAU, Mélanie
dc.date.accessioned: 2024-01-11T03:13:05Z
dc.date.available: 2024-01-11T03:13:05Z
dc.date.issued: 2023-08-18
dc.date.conference: 2023-08-18
dc.identifier.uri: https://oskar-bordeaux.fr/handle/20.500.12278/187032
dc.description.abstractEn: This paper is a position paper concerning corpus-building strategies in minoritized languages in the Global North. It draws attention to the structure of the non-technical community of speakers, and concretely addresses how their needs can inform the design of technical solutions. Celtic Breton is taken as a case study for its relatively small speaker community, which is rather well-connected to modern technical infrastructures, and is bilingual with a non-English language (French). I report on three different community internal initiatives that have the potential to facilitate the growth of NLP-ready corpora in FAIR practices (Findability, Accessibility, Interoperability, Reusability). These initiatives follow a careful analysis of the Breton NLP situation both inside and outside of academia, and take advantage of preexisting dynamics. They are integrated to the speaking community, both on small and larger scales. They have in common the goal of creating an environment that fosters virtuous circles, in which various actors help each other. It is the interactions between these actors that create quality-enriched corpora usable for NLP, once some low-cost technical solutions are provided. This work aims at providing an estimate of the community's internal potential to grow its own pool of resources, provided the right NLP resource gathering tools and ecosystem design. Some projects reported here are in the early stages of conception, while others build on decade-long society/research interfaces for the building of resources. All call for feedback from both NLP researchers and the speaking communities, contributing to building bridges and fruitful collaborations between these two groups.
dc.language.iso: en
dc.source.title: Proceedings of the 2nd Annual Meeting of the ELRA/ISCA SIG on Under-resourced Languages (SIGUL 2023)
dc.subject.en: FAIR practices
dc.subject.en: corpus-building tools
dc.subject.en: citizen science
dc.subject.en: open science
dc.subject.en: language policies
dc.subject.en: Celtic
dc.subject.en: Breton
dc.title.en: Community Internally-driven Corpus Buildings. Three Examples from the Breton Ecosystem
dc.type: Conference paper
dc.identifier.doi: 10.21437/sigul.2023-22
dc.subject.hal: Computer Science [cs]
dc.subject.hal: Humanities and Social Sciences/Linguistics
bordeaux.hal.laboratories: IKER - UMR 5478*
bordeaux.institution: Université Bordeaux Montaigne
bordeaux.country: IE
bordeaux.title.proceeding: Proceedings of the 2nd Annual Meeting of the ELRA/ISCA SIG on Under-resourced Languages (SIGUL 2023)
bordeaux.conference.city: Dublin
bordeaux.peerReviewed: yes
hal.identifier: hal-04384300
hal.version: 1
hal.invited: no
hal.proceedings: yes
hal.conference.end: 2023-08-20
hal.popular: no
hal.audience: International
hal.origin.link: https://hal.archives-ouvertes.fr//hal-04384300v1

