I have been looking for a list of French nouns with their gender for a machine learning experiment. So far I've only found lists of words without gender, or small lists that include gender, mostly embedded in HTML pages. Could anyone point me to a comprehensive (10,000+ words), free, downloadable word list?
Go to Dicollecte.
Click on Lexique [5.2] and download it.
Once you unzip it, you'll find what you're looking for in the file lexique-dicollecte-fr-v5.2.txt
... welcome to the French-speaking world!
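A minimal Python sketch for pulling nouns and their gender out of that file. It assumes the file is UTF-8, tab-separated, with a header row that includes a "Flexion" (word form) column and an "Étiquettes" column of space-separated tags such as "nom fem sg", and that lines starting with "#" are comments. Those column names and tag values are assumptions, so check them against the actual header of lexique-dicollecte-fr-v5.2.txt before trusting the output.

```python
import csv
from typing import Dict, Iterable, Tuple

# ASSUMED tag values in the "Étiquettes" column (verify against the real file).
GENDER_TAGS = {"mas": "m", "fem": "f"}
NUMBER_TAGS = {"sg": "s", "pl": "p"}


def read_dicollecte_nouns(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each noun form to (gender, number), e.g. "chat" -> ("m", "s").

    ASSUMPTION: tab-separated data with a header row containing the
    columns "Flexion" and "Étiquettes"; lines starting with "#" are comments.
    """
    # Drop blank lines and comment lines before handing the rest to csv.
    data = (line for line in lines if line.strip() and not line.startswith("#"))
    nouns: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(data, delimiter="\t", quoting=csv.QUOTE_NONE):
        tags = (row.get("Étiquettes") or "").split()
        if "nom" not in tags:
            continue  # not a noun
        gender = next((GENDER_TAGS[t] for t in tags if t in GENDER_TAGS), None)
        number = next((NUMBER_TAGS[t] for t in tags if t in NUMBER_TAGS), None)
        # Entries without a single masculine/feminine and sg/pl tag are skipped.
        if gender and number:
            nouns[row["Flexion"]] = (gender, number)  # later duplicates overwrite
    return nouns
```

If the assumptions hold, read_dicollecte_nouns(open("lexique-dicollecte-fr-v5.2.txt", encoding="utf-8")) should return the noun forms with their labels. Epicene entries (no masculine or feminine tag) are left out, and when a form appears more than once, the last entry wins.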
-
Thanks, everyone! Interestingly, now with two lists as answers we can compare a bit, and I noticed that the following words are feminine in the Dicollecte list and masculine in the Lexique list: "auto-tamponneuse", "belon", "binaire", "borée", "corpuscule", "flaveur", "haut-de-chausse", "idée-force", "marine", "myriophylle", "penny", "raideur", "sparte", "tire-fesse", "zénana" – Sjoerd C. de Vries Dec 07 '14 at 22:30
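Given one word-to-gender mapping per list, disagreements like these fall out of a set intersection. A small Python sketch; the sample dictionaries are made up for illustration and only reuse two words from the comment above.

```python
from typing import Dict, List, Tuple


def gender_disagreements(
    first: Dict[str, str], second: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (word, gender_in_first, gender_in_second) for every word
    present in both lists whose gender labels differ, sorted by word."""
    shared = first.keys() & second.keys()  # words found in both lists
    return sorted((w, first[w], second[w]) for w in shared if first[w] != second[w])


if __name__ == "__main__":
    dicollecte = {"raideur": "f", "sparte": "f", "chat": "m"}  # made-up sample
    lexique = {"raideur": "m", "sparte": "m", "chat": "m", "maison": "f"}
    print(gender_disagreements(dicollecte, lexique))
    # [('raideur', 'f', 'm'), ('sparte', 'f', 'm')]
```

The same call works on mappings keyed by plural forms.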
-
Same thing for the plurals: "balanes", "belons", "binaires", "boxes", "corpuscules", "délices", "fixe-chaussettes", "hydromètres", "ides", "lombes", "marines", "orgues", "ranches", "tétragones" – Sjoerd C. de Vries Dec 07 '14 at 22:38
-
Le Robert lists belon as feminine (it is the name of a flat, rounded oyster), sometimes masculine; binaire is an epicene adjective; borée is not listed, but it may be the feminine form of an adjective meaning "relating to boron"; corpuscule, haut-de-chausse, sparte, penny and zénana are masculine – Personne Dec 07 '14 at 22:38
-
orgue, délice and amour are masculine in the singular and feminine in the plural. The others should be checked against http://www.cnrtl.fr/lexicographie/ . – Personne Dec 07 '14 at 22:41
-
In fact, the plural « orgues » is feminine when it refers to all or part of a single instrument (« les grandes orgues », the great organ) and masculine when it refers to several instruments (« les plus beaux orgues de la ville », the finest organs in town). – Édouard Dec 08 '14 at 06:30
-
There is no way 'auto-tamponneuse' can be masculine, since '-euse' is a feminine suffix, as opposed to the masculine '-eur'. 'Tire-fesses' is masculine. 'Raideur' is a feminine word meaning 'stiffness', but there is also a masculine 'raideur', borrowed from the English 'raider'. – Destal Aug 03 '16 at 22:49
-
@SimonDéchamps auto[mobile] is feminine, and so is its adjective. A « tire » is a car in slang, but above all, the word is associated with tire-bouchons, which is masculine. – Personne Aug 03 '16 at 23:00
I'll add Lexique to the list. You can download it in Excel format, and use the Find feature to search for words when you have the document open.
The first 6 columns are the most interesting to French learners (and non-linguists), so you may consider deleting the other columns. There are also other resources available from the same group.
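If you would rather script this than delete columns by hand, here is a Python sketch that keeps only the nouns and reads just the columns it needs (any other columns are simply ignored). It assumes the tab-separated download with a header row naming the columns ortho, cgram, genre and nombre, with cgram equal to NOM for nouns and genre/nombre holding m/f and s/p. Those names and codes are assumptions, exposed as parameters so they can be adjusted to the actual header.

```python
import csv
from typing import Dict, Iterable, Tuple


def read_lexique_nouns(
    lines: Iterable[str],
    word_col: str = "ortho",     # ASSUMED column names: adjust to the real header
    pos_col: str = "cgram",
    gender_col: str = "genre",
    number_col: str = "nombre",
) -> Dict[str, Tuple[str, str]]:
    """Map each noun form to (gender, number), e.g. "maisons" -> ("f", "p")."""
    nouns: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        gender, number = row[gender_col], row[number_col]
        # ASSUMED codes: "NOM" for nouns, "m"/"f" for gender, "s"/"p" for number.
        if row[pos_col] == "NOM" and gender in ("m", "f") and number in ("s", "p"):
            nouns[row[word_col]] = (gender, number)  # later duplicates overwrite
    return nouns
```

Reading columns by name rather than by position means the extra columns never need to be deleted.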
Chris Cirefice
-
To open it in LibreOffice or OpenOffice, you first have to convert the tabs to ; and change the .txt extension to .csv – Personne Dec 07 '14 at 22:30
-
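The same conversion can be scripted. A Python sketch, assuming a UTF-8 input file; using the csv module rather than a plain search-and-replace means a field that itself contains a semicolon gets quoted instead of being split into two columns.

```python
import csv
from pathlib import Path


def tsv_to_semicolon_csv(src: Path) -> Path:
    """Write a ';'-separated copy of a tab-separated file next to it,
    with the .txt extension replaced by .csv, and return the new path."""
    dst = src.with_suffix(".csv")
    # ASSUMPTION: the input is UTF-8 encoded.
    with src.open(encoding="utf-8", newline="") as fin, \
         dst.open("w", encoding="utf-8", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(fout, delimiter=";")  # quotes fields containing ';'
        writer.writerows(reader)
    return dst
```

For example, tsv_to_semicolon_csv(Path("lexique-dicollecte-fr-v5.2.txt")) would write lexique-dicollecte-fr-v5.2.csv in the same folder.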
Thanks a lot! Interestingly, now with two lists we can compare a bit, and I noticed that the following words are masculine in the Dicollecte list and feminine in the Lexique list: "acerbité", "aérobic", "alexithymie", "baston", "bute", "calcite", "chamane", "conteste", "ecstasy", "intifada", "job", "liche", "loco", "pole", "robinsonnade", "rocket", "super", "syzygie", "tome", "unetelle". – Sjoerd C. de Vries Dec 07 '14 at 22:31
-
Same thing for the plurals: "aérobics", "bastons", "chamanes", "faînes", "grands-oncles", "iles", "jobs", "jupes-culottes", "kilotonnes", "locos", "rockets", "santiags", "supers", "syzygies", "tapas", "tomes" – Sjoerd C. de Vries Dec 07 '14 at 22:37
-
@SjoerdC.deVries Interesting... I'll have to look into that when my semester is over. If you're interested, the TLF (Trésor de la langue française) is a good source to compare against. – Chris Cirefice Dec 07 '14 at 22:41
-
I noticed that quite a few of the differences concern loan words, so I can imagine that their gender is not well defined. There's still work for the Académie française to do. – Sjoerd C. de Vries Dec 07 '14 at 22:47
-
@SjoerdC.deVries Ah yeah, if there are differences, for your algorithms it might be safe to assume masculine/singular. If you want, you can try to guess by the morphology of the word, but that's more linguistic work and I don't know if you'd be interested in doing that. – Chris Cirefice Dec 07 '14 at 22:52
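A sketch of that kind of morphological guess, falling back to masculine as suggested here. The suffix table is a small hand-picked assumption, not derived from either list, and every rule has known exceptions, so it is only meant for words the lists leave unlabeled or disagree on. '-eur' is deliberately left out because many -eur nouns (raideur, couleur) are feminine.

```python
from typing import List, Tuple

# ASSUMPTION: a tiny hand-picked suffix table, not taken from either list.
# Every rule has exceptions ("page" and "plage" are feminine despite "-age").
SUFFIX_GENDER: List[Tuple[str, str]] = sorted(
    [("euse", "f"), ("tion", "f"), ("sion", "f"), ("ette", "f"),
     ("ment", "m"), ("isme", "m"), ("age", "m")],
    key=lambda rule: len(rule[0]),
    reverse=True,  # try longer suffixes first
)


def guess_gender(word: str, default: str = "m") -> str:
    """Guess "m" or "f" from the word ending; return `default` if no rule fits."""
    w = word.lower()
    # Also try the form without a final plural "s" ("fourchettes" -> "fourchette").
    candidates = [w, w[:-1]] if w.endswith("s") else [w]
    for form in candidates:
        for suffix, gender in SUFFIX_GENDER:
            if form.endswith(suffix):
                return gender
    return default
```

With this table, guess_gender("auto-tamponneuse") gives "f", and a word matching no rule, such as "zénana", falls back to "m".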
-
Well, the words above are only a small minority. The two lists combined give me 92996 nouns (f/m; p/s). A few misclassifications (if that is what they are) won't matter. – Sjoerd C. de Vries Dec 07 '14 at 22:59
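One way to combine the two lists, sketched in Python under the assumption that each has been reduced to a word-to-gender mapping. Words the lists agree on keep their label; for disagreements, the on_conflict parameter either applies a default (masculine, following the earlier suggestion) or drops the word when set to None. This illustrates the merging step only; it is not necessarily how the 92996 figure was obtained.

```python
from typing import Dict, Optional


def merge_gender_lists(
    first: Dict[str, str],
    second: Dict[str, str],
    on_conflict: Optional[str] = "m",
) -> Dict[str, str]:
    """Union of two word -> gender mappings.

    Words with a single agreed label keep it; words the lists disagree on
    get `on_conflict`, or are dropped when `on_conflict` is None.
    """
    merged: Dict[str, str] = {}
    for word in first.keys() | second.keys():
        # Collect the distinct labels this word has across both lists.
        labels = {src[word] for src in (first, second) if word in src}
        if len(labels) == 1:
            merged[word] = labels.pop()
        elif on_conflict is not None:
            merged[word] = on_conflict
    return merged
```

Dropping conflicts (on_conflict=None) keeps only words both sources agree on, at the cost of a slightly smaller list.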