lingvoj.org
Dedicated to the publication and use of multilingual RDF descriptions of human languages,
to be used as Linked Data.
What's in there?
- http://www.lingvoj.org/lingvoj.rdf
is the complete RDF file, which currently gathers the descriptions of 522 languages,
including all languages defined by ISO 639-1 and most of ISO 639-2 codes, and all
languages having an active Wikipedia. Descriptions include all ISO 639 codes when available,
owl:sameAs links to the URIs defining those languages in different RDF publications, and
labels available in multilingual Wikipedias. About 17,000 such labels are available in
the current version, in over 250 languages.
- Each individual language is identified by a URI in the namespace
http://www.lingvoj.org/lang/. The final part of the URI is a language code conforming
to BCP 47.
The language code is the two-letter code defined by ISO 639-1 when available, or the
three-letter ISO 639-2 code, or failing those the ISO 639-3 code. It can also include regional
tags, for example en-us or en-gb. Such codes are used as values of the
"xml:lang" attribute, and also as the prefix of the Wikipedia in this language.
For example, "zh" is the code for the Chinese language. Therefore this language is identified by
http://www.lingvoj.org/lang/zh.
Content negotiation is used to redirect this URI either to a human-readable HTML page,
http://www.lingvoj.org/lingvo/zh.html, or to an RDF page containing the formal description,
http://www.lingvoj.org/lingvo/zh.rdf.
- The Lingvoj Ontology is used,
declaring a "Lingvo" Class, its attributes such as ISO 639 codes and the way to link
languages to FOAF resources (properties having Lingvo class as rdfs:range). Some examples of
the use of those properties are available in my FOAF profile.
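The URI scheme and content negotiation described above can be sketched as follows. This is a minimal illustration, not the site's own client code: the function names are hypothetical, and the "application/rdf+xml" and "text/html" media types are assumptions about what the server negotiates on. The request is only prepared, not sent.

```python
import urllib.request

# Namespace for language URIs, as stated above.
LANG_NAMESPACE = "http://www.lingvoj.org/lang/"


def language_uri(code: str) -> str:
    """Return the lingvoj URI for a BCP 47 language code such as 'zh'."""
    return LANG_NAMESPACE + code


def build_request(code: str, want_rdf: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a request for the RDF or HTML description.

    The Accept values are assumptions, not taken from the lingvoj.org pages.
    """
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(language_uri(code), headers={"Accept": accept})


request = build_request("zh")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would actually fetch the description, provided the server still performs the redirect.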
Disclaimer
Resources provided here have no official status, but URIs in the lingvoj namespaces are intended to remain "cool", which means stable and dereferenceable. ISO codes have been cross-checked carefully. Beyond that, the quality of information for each resource may vary. Labels are subject to change with data quality checking. Alternate URIs linking to other data sets may not always be available and dereferenceable.
Please report bugs, remarks and questions to Bernard Vatant, or to the Linking Open Data mailing list.
What's up?
2009-04-06 : New version of the ontology
Version 1.3 introduces the use of dcterms:language, as a superproperty of various lingvoj object properties, and its inverse property "is language of", used to link to active Wikipedia in the language when available (265 such languages to date).
2009-04-02 : Links to and from other data sets in the Linked Data Cloud
- Lingvoj.org URIs are used by the Linked Movie Data Base for languages of movies. Over 28,000 links to 82 different languages.
- Musicbrainz language URIs are now linked to and from lingvoj.org URIs.
- Hugh Glaser and Ian Millard from the RKB Explorer initiative team provided hundreds of alternate URIs mapped to lingvoj's ones, including UMBEL and Wordnet URIs.
2009-04-01 : Long overdue new release!
Updated Wikipedia languages and labels. Some stats : 522 languages, 1,585 values of ISO 639 codes, 1,452 alternate URIs, and 16,950 labels in 251 different languages. All languages have at least an English label. The best represented languages for multilingual labels are French (549), German (457), Russian (419) and Spanish (407) ... more.
Fixed CYC URIs, which had been broken for a while, and added Freebase URIs. Both are based on DBpedia owl:sameAs declarations.
Simplified the stylesheet, removing the (questionable and incomplete) depictions. The stylesheet is now called from inside the individual RDF files; both IE and Firefox seem happy with it.
2008-01-28 : Release of Lingvoj Ontology
v1.2, declaring the "Lingvo" class as subclass of "LinguisticSystem" as defined by the
new release of Dublin Core terms in RDFS.
2007-11-29 : Release of Lingvoj Ontology
v1.1, including the Translation class, which makes it possible to declare facts such as : The
resource A in original language L1 has been translated into resource B in target language
L2, by the translator Z. Examples of use for translations of W3C
recommendations.
2007-10-09 : Eventually, with the precious help from the Linking Open Data community,
achieved publication with proper content negotiation, which works well with Firefox. For some
reason this content negotiation is not well supported by Internet Explorer.
Note that
this results in new URIs for languages. URIs used in previous versions are no longer
supported. Cool URIs never change, which means the previous ones were not cool, and the new
ones should be stable from now on.
What does "lingvoj" mean?
"Lingvoj" means "Languages" in Esperanto. It's
the plural form of "Lingvo".
Why do we need that?
Languages are an endangered heritage
According to Ethnologue, the number of human
languages currently used in the world amounts to almost 7,000. About half of them are on the
verge of extinction. Only a small fraction are supported by some writing system and have a
written heritage, and among those, still fewer are used in modern information systems and on
the Web. A good idea of the number of languages used on the Web is provided by the multilingual editions of
Wikipedia, to date 265 different languages.
While the ranking of languages by
the size of their respective Wikipedias is a fairly good indicator of the Web influence of
their communities of speakers, it is of course very different from the ranking obtained by
the number of speakers. An interesting indicator for each language is the ratio of the number
of articles to the number of speakers. For English,
it is about 1 to 200, whereas for Hindi, it is
about 1 to 30,000.
See also: The Wikipedia Challenge
We need languages as RDF resources
In current XML and RDF practice, languages are identified by tags, typically used in the
"xml:lang" attribute. The allowed values of tags are defined by BCP 47. Those language tags are
typically used for rdfs:label or rdfs:comment, and allow the filtering of such elements of
description by language, for example in SPARQL queries. But they do not provide support for
queries such as:
- "Can I find native speakers of Bengali in Berlin?"
- "Which books by Victor Hugo are translated into Arabic?"
- "Is this software documented in Chinese?"
To answer such queries, languages need to be represented as resources, linked to other
resources representing books, people, organizations, places, events, products ... through
object properties. DBpedia provides some information of this kind, e.g. the countries
of which Bengali is an official language. But more can be done, for example a simple add-on to
FOAF defining properties that capture the level of proficiency of a
person in a language, as defined in Wikipedia:Babel.
See also: Languages as RDF resources on the ESW Wiki.
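The difference between language tags and language resources can be illustrated with a small in-memory sketch. It is an illustration only, not lingvoj.org code: the triples are plain Python tuples rather than a real RDF store, the example.org document URIs are invented, and dcterms:language is used as the linking property because the ontology notes above mention it.

```python
from typing import NamedTuple, Optional

LINGVOJ = "http://www.lingvoj.org/lang/"
DCTERMS_LANGUAGE = "http://purl.org/dc/terms/language"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Literal(NamedTuple):
    """A text value with an optional BCP 47 tag, as carried by xml:lang."""
    text: str
    lang: Optional[str] = None


# Hypothetical data: the example.org URIs are invented for illustration.
TRIPLES = [
    ("http://example.org/doc/manual-zh", DCTERMS_LANGUAGE, LINGVOJ + "zh"),
    ("http://example.org/doc/manual-en", DCTERMS_LANGUAGE, LINGVOJ + "en"),
    (LINGVOJ + "zh", RDFS_LABEL, Literal("Chinese", "en")),
    (LINGVOJ + "zh", RDFS_LABEL, Literal("chinois", "fr")),
]


def labels_in(subject: str, lang: str) -> list:
    """Language tags only allow filtering literals by their tag."""
    return [o.text for s, p, o in TRIPLES
            if s == subject and p == RDFS_LABEL
            and isinstance(o, Literal) and o.lang == lang]


def resources_in_language(code: str) -> list:
    """A language URI can be the object of a link, so it can be queried."""
    return [s for s, p, o in TRIPLES
            if p == DCTERMS_LANGUAGE and o == LINGVOJ + code]


print(labels_in(LINGVOJ + "zh", "fr"))
print(resources_in_language("zh"))
```

The first function corresponds to filtering labels by language tag; the second is the kind of lookup behind a question like "Is this software documented in Chinese?", which only becomes possible once the language is a resource.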
Sources
Sources defining languages as RDF resources
Other sources
- Ethnologue provides a description page for every
language in the ISO 639-3 code list.
- Multilingual Wikipedias and their interwiki links are the source used to find the labels of a
language in other languages.