On November 17th and 18th, two study days on dictionary creation will take place in Lyon. The first day will be devoted to discussing methods and practices, with a session dedicated to the Wiktionary project. The second day will be participatory, with group training in dictionary writing in the Wiktionary project, focused on the ten words of the Francophonie. And it’s co-organized by Lyokoï and Noé, two editors of Actualités!
- In August, we mentioned a Quebec-based association for the promotion of the Shawiya language. An interview in the newspaper L’initiative tells you more about their activities, which include a workshop on contributing to Wiktionary!
- In an article in the newspaper Le Soir, Michel Francard looks at the term tote-bag and notes that “Wiktionary, well known for its lexical scouting, mentions one [attestation] as early as June 2013”.
- In an English-language article, the Kansas City Star writes about whitesplaining, referring to the English Wiktionary entry whitesplain. This word has not yet been added to the French Wiktionary.
- The Swiss newspaper 24 heures reports on a court case in which Wiktionary was called upon because it was the only dictionary to offer a definition of a disputed term. In the first judgment, the court had to determine whether the term used by the accused was a racist insult, and relied solely on the definition given in Wiktionary. On appeal, the Federal Court pointed out that “Wiktionary has no official character and its definitions are open to modifications”. An entirely correct analysis, and one that is especially true in this case, since the page in question doesn't yet have any usage examples that could help identify the context in which the term is used. The case was transferred to the Cantonal Court of Vaud.
- A columnist on the website Jeuxvideo.com does not hesitate to rely on the “knowledge of Wiktionary” to define video game in the preamble to a controversial article.
- In a long article published on the website Les Numériques, Jérôme Cartegini presents and compares the encyclopedia Universalis 2018, Le Grand Robert online and Wikipedia. Too bad Wiktionary was not included in the comparison, as it contains more entries than the Grand Robert and more usage examples.
- According to Français de nos régions, the names blanco, tipex and blanc (correcteur) for correction fluid vary by region, much more than we think. And according to the author, Wiktionary is the only dictionary to list all the variants of blanc correcteur (or not...).
- An article on a lemonde.fr blog reports on the role of the internet in the evolution of conlang communities. It is a story that quickly skips over the previous centuries, whose creations were described in a book presented in the January 2016 issue of Actualités.
- From mid-September to mid-October (from 09/20/2017 to 10/20/2017)
- French entries increased by 1,508 and quotations increased by 1,080. There are now 357,235 lemmas, 527,416 definitions and 331,833 quotations or examples.
- The three other languages that grew the most are Northern Sami (+ 6,554 entries), Italian (+ 1,209 entries) and Esperanto (+ 280 entries).
- Four languages were added to the project (here with their French names): lhomi (+1), merei (+1), limilngan (+1) and lorrain (+1).
- In October, 11,272 entries were created in 76 languages!
- New lexicons
- Creation of Catégorie:Lexique en français de l’e-commerce.
- Words of the month
Statistics provided by Wikiscan:
A vote decided to divide existing thesauri with ambiguous titles and led to the creation of: cirque (naturel) and cirque (spectacle); langue (anatomie) and langue (linguistique); paresseux (animal) and paresseux (personne); assimilation culturelle and assimilation (biologie); racine (végétale), racine (odontologie), racine (linguistique), racine (informatique), racine (géologie) and racine (figuré et sociologique).
In addition, User:Assassas77 continued creating thesauri in Tagalog, adding six more!
As of October 31st 2017, the French Wiktionary offers 317 thesauri in French and a total of 452 thesauri in 54 languages!
Twenty-three new thesauri this month, five of them in French: punition [punishment], peine de mort [death penalty], prison [jail] (first thesaurus created by Classiccardinal!), armure [armour] and tissage [weaving].
- There are 32,855 illustrations (images and videos) in the French Wiktionary entries, and 258 have been added since last month.
The Questions on words page (WT:QM) records 189 questions in October, compared to 197 in September, 141 in August and 124 in July.
Identifying a root
In automatic language processing, several operations can be used to produce tools for a language. Richard Khoury and Francesca Sapsford set out to create a Latin stemming tool based on the English Wiktionary, which they report in their article “Latin word stemming using Wiktionary” (in Digital Scholarship in the Humanities, volume 31, number 2, June 2016, pages 368–373). Their pilot approach consisted of exploiting the database and the links between pages specified in very precise inflection templates, in order to link roots to endings for verbs and to suffixes for nouns. Starting from a May 2015 database dump, they applied three cleaning steps and obtained 655,434 word forms for 32,860 roots.
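The lookup-table idea can be illustrated with a minimal Python sketch. It is not the authors' code: the two ending lists and the three roots are simplified, hypothetical stand-ins for Wiktionary's inflection templates. Each root is expanded with the endings its template attaches, which gives a table from word form to root and part of speech; a word missing from the table is simply left unstemmed.

```python
# Minimal sketch of dictionary-based stemming; NOT the authors' implementation.
# The ending lists and roots below are hypothetical, simplified stand-ins
# for Wiktionary inflection templates.
FIRST_DECLENSION = ["a", "ae", "am", "arum", "is", "as"]      # simplified noun endings
FIRST_CONJUGATION = ["o", "as", "at", "amus", "atis", "ant"]  # present-tense verb endings

ENTRIES = [
    ("ros", "noun", FIRST_DECLENSION),    # rosa "rose"
    ("puell", "noun", FIRST_DECLENSION),  # puella "girl"
    ("am", "verb", FIRST_CONJUGATION),    # amo "I love"
]


def build_lookup(entries):
    """Expand each root with its endings into a form -> (root, part of speech) table."""
    lookup = {}
    for root, pos, endings in entries:
        for ending in endings:
            # If two roots generated the same form, the later one would win here.
            lookup[root + ending] = (root, pos)
    return lookup


LOOKUP = build_lookup(ENTRIES)


def stem(word):
    """Return (root, pos) for a known form, or None for a word the table lacks."""
    return LOOKUP.get(word.lower())


def stem_text(words):
    """Replace known forms by their root; unknown words are kept, lowercased."""
    result = []
    for w in words:
        hit = stem(w)
        result.append(hit[0] if hit else w.lower())
    return result


if __name__ == "__main__":
    print(stem_text(["Puella", "rosam", "amat", "puellae", "rosas", "amant", "et"]))
```

Here et is not in the table, so it passes through unchanged, while the six inflected forms reduce to three roots. Keeping the part of speech alongside each root costs nothing, which hints at how the same data could also support the morphosyntactic tagging the authors mention.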
The best tool available before their experiment, the Schinke stemmer, worked on a different principle: it was a set of rules that stemmed words automatically by producing hypothetical roots. These were not necessarily real words, but they still reduced the number of distinct words in a text and made it easier, for example, to search it with a search engine.
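For contrast, here is a rule-based sketch in the same spirit: strip the longest matching suffix while keeping a minimum stem length. The suffix list and the two-letter minimum are hypothetical simplifications, not the actual Schinke rule set. The resulting roots need not be real words, and unrelated words can collapse onto the same root, yet the vocabulary of a text still shrinks.

```python
# Rule-based suffix stripping; the suffix list and MIN_STEM are hypothetical
# simplifications and NOT the real Schinke rules.
SUFFIXES = sorted(
    ["ibus", "arum", "orum", "ant", "ae", "am", "as", "is", "um", "us", "at", "a", "o"],
    key=len,
    reverse=True,  # try longer suffixes first
)
MIN_STEM = 2  # never leave fewer than two letters


def rule_stem(word):
    """Strip the longest listed suffix that leaves at least MIN_STEM letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w  # no rule applies: keep the word as-is


def vocabulary_size(words, stemmer):
    """Count distinct types after stemming, a simple measure of vocabulary reduction."""
    return len({stemmer(w) for w in words})


if __name__ == "__main__":
    words = ["rosa", "rosam", "rosarum", "amat", "amant"]
    print(len(set(words)), "->", vocabulary_size(words, rule_stem))
    print(rule_stem("canis"), rule_stem("cano"))
```

In this sketch canis (dog) and cano (I sing) both reduce to can, the kind of hypothetical root such rules produce: useful for search, but not a dictionary headword.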
Comparing the two tools, they observe that the Wiktionary-based one misses the words it does not know, but nevertheless reduces the vocabulary of a text much more effectively. It also gives direct access to a dictionary of definitions afterwards, which was not possible with the previous tool. They even plan to extend their use of the Wiktionary database to include the part-of-speech categories of entries, in order to produce an additional tool for the morphosyntactic tagging of a corpus.
These uses show that the Wiktionary projects contain data that are not only usable as a dictionary but, thanks to their regular structure, can also be reused by machines to create new tools. — a review by Noé
About patrol and patrollers
Some remarks about the role of patrollers:
Patrollers are editors who spend some of their time reading contributions made on Wiktionary.
They have a tool that lists the changes still awaiting patrol. Only anonymous contributions, or edits made by users without the "auto-patrol" flag, need to be checked.
After proofreading a contribution, they can mark it as patrolled.
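The selection rule described above can be written as a small filter. This Python sketch is only an illustration: the Edit record and its fields are hypothetical, whereas on a real wiki this information comes from MediaWiki's recent-changes data.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical record of one edit; field names are illustrative only."""
    rev_id: int
    user: str
    anonymous: bool
    user_has_autopatrol: bool
    patrolled: bool = False


def needs_patrol(edit):
    """An edit awaits review if it is unpatrolled and comes from an IP or a user without auto-patrol."""
    if edit.patrolled:
        return False
    return edit.anonymous or not edit.user_has_autopatrol


def patrol_queue(edits):
    """Return the edits a patroller still has to check, in their original order."""
    return [e for e in edits if needs_patrol(e)]


def mark_patrolled(edit):
    """Record that a patroller has proofread this edit."""
    edit.patrolled = True


if __name__ == "__main__":
    edits = [
        Edit(1, "192.0.2.1", anonymous=True, user_has_autopatrol=False),
        Edit(2, "Alice", anonymous=False, user_has_autopatrol=True),
        Edit(3, "Bob", anonymous=False, user_has_autopatrol=False),
    ]
    print([e.rev_id for e in patrol_queue(edits)])
    mark_patrolled(edits[0])
    print([e.rev_id for e in patrol_queue(edits)])
```

Alice's edit never enters the queue because of her flag, and the IP edit leaves it once it has been marked as patrolled.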
Being patrolled means being free of vandalism in a broad sense, which implies:
- deletion of clearly defamatory material
- deletion of material containing personal information
- deletion of information irrelevant to the page title
- deletion of copyrighted information
- restoring correct information after deletion or corruption
These are the patroller's basic tasks. In this context, patrollers who are not administrators may have to ask an administrator to hide contributions containing defamation, personal information or copyright violations.
The patroller can then, if they wish, go further by working on the presentation, with various possible additional actions such as:
- to correct a page to conform to the expected structure of a Wiktionary page
- to correct typography
- to correct spelling
- to correct or add templates
- to correct or add categories
- to check the sources and references that are used.
Last but not least, and by far the most interesting, the patroller can examine the substance, checking the accuracy of a contribution, or even providing additional information or corrections.
Admittedly, this part is by far the longest and also the most difficult.
Thus, it is possible:
- to add missing inflections
- to add quotes
- to add pronunciations, anagrams, etc.
- to verify the accuracy of translations.
This last point requires a certain level of linguistic skill, rich reference material covering many languages and knowledge of the grammar of several languages, which not everyone has.
Translation errors are indeed numerous, although made in good faith, often because metonymic processes are not the same in every language. This is why copying a translation found elsewhere (a dictionary, Wikipedia, etc.) can sometimes be fatal.
For example, many languages distinguish by different names the action from its result, the content from its container, the building from the institution, etc., where the French language does not necessarily do so. Thus, in Finnish: loading (action): kuormaus / loading (what is loaded): kuormitus; the town hall (the building): kaupungintalo / the town hall (administration): pormestarin
And of course, we find the same problem in the opposite direction, from Finnish to French.
It is, however, quite rare to encounter real mistranslations. I remember one from several years ago on the English Wiktionary that amused me. I was intrigued to find several web pages giving the Inuktitut word anaullaut, and, knowing that this word meant stick, I found after some research that a contributor had read "anaullaut: bat" in an Inuktitut/English dictionary and had created this entry on the English Wiktionary under Category:Animals; the entry was then reused and translated into French by other websites.
Alas for him, it was the English word bat in its sense of batte, as in baseball, and not the animal...
If you have also noticed some crazy or funny contributions, do not hesitate to report them here for a future issue. — a chronicle by Unsui
Dictionary of the month
- Yann Lukas, Les Mots celtes clandestins, Coop Breizh, 2017, ISBN 978-2-84346-834-6
What happens when Wiktionary becomes a reference against its own will? When we discuss the sources of our project, it becomes clear that they are not at all structured like those of Wikipedia. We don't share the same attitude towards original research, and we could even serve as a source ourselves. Well, we actually do already, and I can prove it with this little dictionary of the month: a pocket reference that gives an overview of “French vocabulary borrowed from Gaulish, Breton and the Celtic languages”. Yann Lukas shows us some familiar words and some with an unexpected Celtic origin. He suggests Celtic roots for some slang words where standard dictionaries are at a loss: à dache, loufer, morfal and many more.
But on page 62 we find a funny turn of phrase: “Tamis: although disputed, the Gaulish etymology of tamis is tempting. In his Dictionnaire des étymologies obscures (Payot, 1982), Pierre Guiraud opts for the Latin origin stamen, also the root of étamine. Wiktionary prefers the Low Franconian tamisa (source of Old Dutch teems). [...]” So we are cited in a recent etymological analysis. And our hypothesis for tamis isn't very solid: it was actually added by an IP without any sources, and other users have built on it since. Still, it shouldn't be discarded entirely, since an etymologist has attested such a base.
Apart from this small appearance, which might bring us fame (or not), this short dictionary of Celtic words is full of anecdotes about the Celtic languages that help us understand them better in today's world. We also learn how Breton was saddled with words that don't suit it: menhir (the Bretons say peulvan), dolmen (they say lichaven), kermesse (from the Flemish kerkmisse) or even triskèle (from Greek, and written triskell to make it look more Celtic). — a chronicle by Lyokoï
This section reviews the month's videos about linguistics and the French language; feel free to add any videos you come across!
- Le Monde: the website of the newspaper Le Monde published a 4-minute video on inclusive writing.
- Benoît Sagot: Extracting an Etymological Database from Wiktionary is a talk given at the eLex conference (in Leiden, Netherlands) by the French lexicographer Benoît Sagot. He extracted a simple etymological tree, but one with a large number of entries, from the English Wiktionary: EtymDB.
- Doct’Auvergne: in the show “le dicovergne”, we hear about serendipity.
- Linguisticae: a first video that puts forward etymological hypotheses about anglicisms that aren't actually anglicisms, and another with a few arguments about inclusive writing.
Curiosity: The phatic function
Among the six main functions of language defined by Roman Jakobson, the phatic function covers whatever ensures that the communication channel is working properly. It includes, first of all, words and expressions such as “tu vois” (you see) or “tu me suis ?” (are you following me?), but also the words used when starting a phone call, such as “allô ?”. Marina Yaguello extends the analysis to all the small talk whose only purpose is to keep the conversation going, without actually sharing anything. Even staying at the level of sentences and words, describing these uses is a delicate task for a dictionary. On the one hand, the terms used vary greatly, and finding written attestations is not always easy. On the other hand, it is difficult to explain the function of these terms properly. They are often whole sentences, containing a verb, but emptied of their meaning so that they serve only a communicative function. — a chronicle by Noé