open-dsl-dict/wikidict-dsl-fr

wikidict-dsl-fr - Wikidata Bilingual DSL Dictionaries (French)

This repository makes available a collection of bilingual French dictionaries in DSL format derived from interwiki links (links between article titles in different languages) in Wikipedia. The data has been extracted from Wikidata.

Format

ABBYY Lingvo DSL is a flexible dictionary format that can be read by dictionary applications such as GoldenDict and converted to other formats using tools such as pyglossary. The dsl-tools project also provides a number of tools for creating DSL format dictionaries.

DSL files must be saved as UTF-16 to be usable by dictionary programs. The raw source files in this repository are instead saved as UTF-8, which makes them significantly smaller and lets git read and diff them. Fully encoded and compressed .dsl.dz dictionaries, ready for use, are available in the Releases section.
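
If you only need the encoding step for a single file, the conversion from UTF-8 to UTF-16 can be sketched in Python as below. This is a minimal illustration, not the method used to build the releases: the function name `utf8_to_utf16` is hypothetical, and the choice of little-endian UTF-16 with a byte-order mark is an assumption about what dictionary programs expect. It does not perform the .dz compression.

```python
import codecs
from pathlib import Path


def utf8_to_utf16(src: Path, dst: Path) -> None:
    """Re-encode a UTF-8 DSL source file as UTF-16 LE with a byte-order mark.

    Hypothetical helper; the little-endian + BOM layout is an assumption.
    """
    # "utf-8-sig" also accepts input that starts with an optional UTF-8 BOM.
    text = src.read_text(encoding="utf-8-sig")
    # Write the BOM explicitly so the byte order does not depend on the platform.
    dst.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
```

For example, `utf8_to_utf16(Path("data/en-fr_wikidict.dsl"), Path("en-fr_wikidict_utf16.dsl"))` would write a UTF-16 copy next to your working directory, assuming the source file lives under `data/`.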

You can also use the rezip_dsl.rb and unzip_dsl.rb scripts provided by the dsl-tools repo to encode/compress and decode/decompress the dictionaries, either individually or as a group.

Data

The data directory contains one bilingual dictionary per language pair, named according to the ISO code of the source language.

The basic filename pattern is [ISO]-fr_wikidict.dsl, with [ISO] being the source language ISO code. A list of all language pairs is below.
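
The filename pattern above can be handled programmatically, for instance when looping over the data directory. The sketch below builds a filename from a source language code and recovers the code from a filename. The helper names are hypothetical, and the character class assumes codes consist of lowercase letters and underscores, as in the list below (for example `zh_min_nan`).

```python
import re
from typing import Optional

# Pattern described in this README: [ISO]-fr_wikidict.dsl
# Assumption: codes use only lowercase letters and underscores.
FILENAME_RE = re.compile(r"^(?P<iso>[a-z_]+)-fr_wikidict\.dsl$")


def dictionary_filename(iso_code: str) -> str:
    """Return the dictionary filename for a source language code."""
    return f"{iso_code}-fr_wikidict.dsl"


def source_language(filename: str) -> Optional[str]:
    """Return the source language code, or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    return match.group("iso") if match else None
```

With these helpers, `dictionary_filename("en")` gives `en-fr_wikidict.dsl`, and `source_language("zh_min_nan-fr_wikidict.dsl")` gives `zh_min_nan`.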

Available language pairs

Language codes Language names
af-fr Afrikaans => French
am-fr Amharic => French
ang-fr Anglo-Saxon => French
ar-fr Arabic => French
arc-fr Aramaic => French
bg-fr Bulgarian => French
bi-fr Bislama => French
bn-fr Bengali => French
bo-fr Tibetan => French
br-fr Breton => French
bs-fr Bosnian => French
ca-fr Catalan => French
cdo-fr Min Dong => French
chr-fr Cherokee => French
chy-fr Cheyenne => French
cr-fr Cree => French
cs-fr Czech => French
cy-fr Welsh => French
da-fr Danish => French
de-fr German => French
el-fr Greek => French
en-fr English => French
eo-fr Esperanto => French
es-fr Spanish => French
et-fr Estonian => French
eu-fr Basque => French
fa-fr Persian => French
ff-fr Fula => French
fi-fr Finnish => French
ga-fr Irish => French
gan-fr Gan => French
gd-fr Scottish Gaelic => French
gu-fr Gujarati => French
gv-fr Manx => French
ha-fr Hausa => French
hak-fr Hakka => French
haw-fr Hawaiian => French
he-fr Hebrew => French
hi-fr Hindi => French
hr-fr Croatian => French
ht-fr Haitian => French
hu-fr Hungarian => French
hy-fr Armenian => French
id-fr Indonesian => French
ig-fr Igbo => French
is-fr Icelandic => French
it-fr Italian => French
iu-fr Inuktitut => French
ja-fr Japanese => French
jbo-fr Lojban => French
jv-fr Javanese => French
ka-fr Georgian => French
kg-fr Kongo => French
ki-fr Kikuyu => French
kl-fr Greenlandic => French
km-fr Khmer => French
ko-fr Korean => French
la-fr Latin => French
lg-fr Luganda => French
lo-fr Lao => French
lt-fr Lithuanian => French
lv-fr Latvian => French
mg-fr Malagasy => French
mi-fr Maori => French
mn-fr Mongolian => French
ms-fr Malay => French
mt-fr Maltese => French
nah-fr Nahuatl => French
ne-fr Nepali => French
nl-fr Dutch => French
nn-fr Norwegian (Nynorsk) => French
no-fr Norwegian => French
nv-fr Navajo => French
ny-fr Chichewa => French
oc-fr Occitan => French
pa-fr Punjabi => French
pi-fr Pali => French
pl-fr Polish => French
ps-fr Pashto => French
pt-fr Portuguese => French
qu-fr Quechua => French
ro-fr Romanian => French
ru-fr Russian => French
sa-fr Sanskrit => French
se-fr Northern Sami => French
sh-fr Serbo-Croatian => French
sk-fr Slovak => French
sl-fr Slovenian => French
sn-fr Shona => French
so-fr Somali => French
sq-fr Albanian => French
sr-fr Serbian => French
sv-fr Swedish => French
sw-fr Kiswahili => French
ta-fr Tamil => French
te-fr Telugu => French
th-fr Thai => French
tl-fr Tagalog => French
tpi-fr Tok Pisin => French
tr-fr Turkish => French
ug-fr Uyghur => French
uk-fr Ukrainian => French
ur-fr Urdu => French
vi-fr Vietnamese => French
wo-fr Wolof => French
wuu-fr Wu => French
xh-fr Xhosa => French
yi-fr Yiddish => French
yo-fr Yoruba => French
za-fr Zhuang => French
zh-fr Chinese (Mandarin) => French
zh_classical-fr Classical Chinese => French
zh_min_nan-fr Min Nan => French
zh_yue-fr Cantonese => French
zu-fr Zulu => French

Statistics

Dictionary size

Language pair # of entries
af-fr 24890
am-fr 5996
ang-fr 2316
ar-fr 145641
arc-fr 1343
bg-fr 113743
bi-fr 455
bn-fr 19543
bo-fr 2766
br-fr 41847
bs-fr 30045
ca-fr 251879
cdo-fr 2082
chr-fr 466
chy-fr 608
cr-fr 107
cs-fr 169662
cy-fr 28050
da-fr 107879
de-fr 594509
el-fr 60564
en-fr 1011051
eo-fr 127902
es-fr 487072
et-fr 64190
eu-fr 139674
fa-fr 180170
ff-fr 202
fi-fr 202110
ga-fr 21677
gan-fr 4819
gd-fr 11319
gu-fr 3677
gv-fr 4169
ha-fr 395
hak-fr 2881
haw-fr 1329
he-fr 103087
hi-fr 25527
hr-fr 76046
ht-fr 15369
hu-fr 151501
hy-fr 46422
id-fr 115432
ig-fr 737
is-fr 23163
it-fr 563341
iu-fr 355
ja-fr 288858
jbo-fr 1159
jv-fr 16753
ka-fr 50545
kg-fr 778
ki-fr 287
kl-fr 1580
km-fr 1715
ko-fr 145182
la-fr 89649
lg-fr 160
lo-fr 1105
lt-fr 77136
lv-fr 39405
mg-fr 49683
mi-fr 2176
mn-fr 10784
ms-fr 128930
mt-fr 2422
nah-fr 6954
ne-fr 8045
nl-fr 468595
nn-fr 57749
no-fr 193420
nv-fr 1745
ny-fr 110
oc-fr 79195
pa-fr 8333
pi-fr 2250
pl-fr 459190
ps-fr 2759
pt-fr 399551
qu-fr 11735
ro-fr 158249
ru-fr 435871
sa-fr 4766
se-fr 5628
sh-fr 131164
sk-fr 129404
sl-fr 70604
sn-fr 1520
so-fr 2347
sq-fr 31260
sr-fr 149024
sv-fr 351538
sw-fr 18710
ta-fr 23268
te-fr 8040
th-fr 51146
tl-fr 36109
tpi-fr 1165
tr-fr 116138
ug-fr 2226
uk-fr 238619
ur-fr 34069
vi-fr 211578
wo-fr 931
wuu-fr 2369
xh-fr 261
yi-fr 7523
yo-fr 15025
za-fr 651
zh-fr 285587
zh_classical-fr 2531
zh_min_nan-fr 10436
zh_yue-fr 18199
zu-fr 582
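
To check the size of a dictionary yourself, one rough approach is to count headword lines in the UTF-8 source file. The sketch below assumes the usual DSL layout: header lines start with `#`, headwords start in the first column, and entry bodies are indented with a space or tab. The function name is hypothetical, and the figures above may have been computed differently, so small discrepancies are possible.

```python
from pathlib import Path


def count_headwords(path: Path, encoding: str = "utf-8-sig") -> int:
    """Count lines that look like DSL headwords in a source file.

    Assumes: '#' starts a header line, indented lines belong to an
    entry body, and every other non-blank line is a headword.
    """
    count = 0
    with path.open(encoding=encoding) as handle:
        for line in handle:
            if not line.strip():
                continue  # blank separator line
            if line.startswith("#"):
                continue  # header such as #NAME or #INDEX_LANGUAGE
            if line[0] in " \t":
                continue  # indented body line
            count += 1
    return count
```

Calling `count_headwords(Path("data/cr-fr_wikidict.dsl"))` on one of the smaller dictionaries is a quick way to try it, assuming the file is under `data/`.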

Top ten dictionaries by number of entries

Language pair # of entries
en-fr 1011051
de-fr 594509
it-fr 563341
es-fr 487072
nl-fr 468595
pl-fr 459190
ru-fr 435871
pt-fr 399551
sv-fr 351538
ja-fr 288858

License

According to the Wikidata website:

"All structured data from the main and property namespace is available under the Creative Commons CC0 License"

The data in this repository is therefore made available under the same Creative Commons CC0 License as that used by the Wikidata project. All of the data has been derived from the Wikidata JSON format database dumps.