lingvoj.org

Dedicated to the publication and use of multilingual RDF descriptions of human languages, to be used as Linked Data.

What's up?

(2008-06-10) Updated Wikipedia languages and labels. Some stats: 519 languages, 1585 values of ISO 639 codes, 2084 alternative URIs, and 14186 labels in 244 different languages. All languages have at least an English label and a matching Wikipedia article in English. The most represented languages in labels are German (424), French (397), Spanish (374) ... more. I intend to publish 'monolingual files', for example all languages with a French label, etc.

(2008-01-28) Release of Lingvoj Ontology v1.2, declaring the "Lingvo" class as a subclass of "LinguisticSystem" as defined by the new release of Dublin Core terms in RDFS.

(2007-11-29) Release of Lingvoj Ontology v1.1, including the Translation class, which makes it possible to declare facts such as: the resource A in original language L1 has been translated into resource B in target language L2 by the translator Z. Examples of use for translations of W3C Recommendations.

(2007-11-09) Started mining OpenCyc URIs, and added 270 mappings to lingvoj URIs. More to come.

(2007-10-26) Started adding some depictions, mostly following the common but certainly bad practice of using national flags to identify languages. This practice unfortunately conflates national identity with language identity (a cause of many wars and conflicts throughout history) and should be avoided. But there is currently not much alternative: although some languages have representative organisations that have defined official flags or logos independent of national flags, most languages have no official depiction.
I currently use shameless hotlinking to SVG files from Wikimedia Commons Flags, although this is yet another bad practice.

(2007-10-09) Eventually, with precious help from the Linking Open Data community, achieved publication with proper content negotiation, which works well with Firefox. For some reason this content negotiation is not well supported by Internet Explorer.
Note that this results in new URIs for languages. URIs used in previous versions are no longer supported. Cool URIs never change, which means the previous ones were not cool, and the new ones should be stable from now on.

What's in there?

  • http://www.lingvoj.org/lingvoj.rdf is the complete RDF file, currently gathering the descriptions of more than 500 languages, including all languages defined by ISO 639-1, most of those with ISO 639-2 codes (a few exceptions remain, for which Wikipedia articles are not consistent with the ISO classification), and all languages having an active Wikipedia. Descriptions include all ISO 639 codes when available, owl:sameAs links to the URIs defining those languages in different RDF publications, and labels available in multilingual Wikipedias. More than 14,000 such labels are available in the current version, in over 200 languages.
  • Each individual language is identified by a URI in the namespace http://www.lingvoj.org/lang/. The last path segment is a language code conforming to BCP 47.
    The language code is the two-letter code defined by ISO 639-1 when available, otherwise the three-letter ISO 639-2 code, or failing that the ISO 639-3 code. It can also include region subtags, for example en-us or en-gb. Such codes are used as values of the "xml:lang" attribute, and also as the prefix of the Wikipedia in this language.
    For example, "zh" is the code for the Chinese language. This language is therefore identified by http://www.lingvoj.org/lang/zh .
    Content negotiation is used to redirect this URI either to a human-readable HTML page http://www.lingvoj.org/lingvo/zh.html , or to an RDF page containing the formal description http://www.lingvoj.org/lingvo/zh.rdf .
  • The Lingvoj Ontology is used, declaring a "Lingvo" Class, its attributes such as ISO 639 codes and the way to link languages to FOAF resources (properties having Lingvo class as rdfs:range). Some examples of the use of those properties are available in my FOAF profile.
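
The URI scheme and redirection described above can be sketched in a few lines of Python. This is only an illustration, not the code running on the server: the function names are hypothetical, and the negotiation rule (serve RDF only when the Accept header names application/rdf+xml, ignoring quality values) is a simplifying assumption.

```python
LANG_NS = "http://www.lingvoj.org/lang/"   # namespace of language URIs
DOC_NS = "http://www.lingvoj.org/lingvo/"  # where the HTML and RDF pages live


def pick_code(iso639_1=None, iso639_2=None, iso639_3=None, region=None):
    """Prefer the ISO 639-1 code, then ISO 639-2, then ISO 639-3."""
    code = iso639_1 or iso639_2 or iso639_3
    if code is None:
        raise ValueError("at least one ISO 639 code is required")
    # Optional region subtag, written in lower case as in en-us or en-gb.
    return f"{code}-{region.lower()}" if region else code


def language_uri(code):
    """URI identifying the language itself."""
    return LANG_NS + code


def negotiate(code, accept_header):
    """Return the document URL a client would be redirected to.

    Assumption: RDF is chosen only if application/rdf+xml is listed in
    the Accept header; q-values and wildcards are ignored.
    """
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept_header.split(",")]
    suffix = ".rdf" if "application/rdf+xml" in media_types else ".html"
    return DOC_NS + code + suffix


print(language_uri(pick_code(iso639_1="zh", iso639_2="zho")))
print(negotiate("zh", "application/rdf+xml"))
print(negotiate("zh", "text/html,application/xhtml+xml"))
```

A Linked Data client would normally send the Accept header with an HTTP request and follow the redirect; the sketch only shows which document each kind of client ends up with.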

Disclaimer: Resources provided here are work in progress and have no official status.
Please report bugs, remarks and questions to Bernard Vatant, or on the Linking Open Data mailing list.

What does "lingvoj" mean?

"Lingvoj" means "Languages" in Esperanto. It's the plural form of "Lingvo".

Why do we need that?

Languages are an endangered heritage

According to Ethnologue, almost 7,000 human languages are currently in use in the world. About half of them are on the verge of extinction. Only a small fraction are supported by a writing system and have a written heritage, and among those, fewer still are used in modern information systems and on the Web. A good idea of the number of languages used on the Web is provided by the multilingual editions of Wikipedia, to date in over 250 different languages.
While ranking languages by the importance of their respective Wikipedias is a fairly good indicator of the Web influence of their communities of speakers, it is of course very different from the ranking obtained by number of speakers. An interesting indicator for each language is the ratio of the number of articles to the number of speakers. For English, it is about 1 to 200, whereas for Hindi, it is about 1 to 30,000.

See also: The Wikipedia Challenge

We need languages as RDF resources

In current XML and RDF practice, languages are identified by tags, typically used in the "xml:lang" attribute. The allowed values of tags are defined by BCP 47. Those language tags are typically used for rdfs:label or rdfs:comment, and allow the filtering of such elements of description by language, for example in SPARQL queries. But they do not provide support for queries such as:

  • "Can I find native speakers of Bengali in Berlin?"
  • "Which books by Victor Hugo are translated into Arabic?"
  • "Is this software documented in Chinese?"

To answer such queries, languages need to be represented as resources, linked to other resources representing books, people, organizations, places, events, products ... through object properties. DBpedia provides some information of this kind, for example the countries of which Bengali is an official language. But more can be done, for example a simple add-on to FOAF defining properties that capture a person's level of proficiency in a language, as defined in Wikipedia:Babel.
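
The difference between a language tag and a language resource can be illustrated with a toy example. The sketch below uses plain Python tuples rather than a real RDF store or SPARQL; the example.org namespace, the "author" and "language" property names, and the translated book are all made up for illustration and are not taken from the Lingvoj Ontology.

```python
LANG = "http://www.lingvoj.org/lang/"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

HUGO = EX + "person/VictorHugo"

# Literal objects are (text, language_tag) pairs; resource objects are URIs.
triples = [
    (EX + "book/1", "rdfs:label", ("Notre-Dame de Paris", "fr")),
    (EX + "book/1", "rdfs:label", ("The Hunchback of Notre-Dame", "en")),
    (EX + "book/1", EX + "author", HUGO),
    (EX + "book/1", EX + "language", LANG + "fr"),
    (EX + "book/2", EX + "author", HUGO),          # made-up translated edition
    (EX + "book/2", EX + "language", LANG + "ar"),
]


def labels_in(lang_tag):
    """What a language tag supports: filtering literals by language."""
    return [obj[0] for _, _, obj in triples
            if isinstance(obj, tuple) and obj[1] == lang_tag]


def subjects_with(predicate, obj):
    """All subjects having the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def books_by_in_language(author, language):
    """What a language resource supports: joining on object properties."""
    return sorted(subjects_with(EX + "author", author)
                  & subjects_with(EX + "language", language))


print(labels_in("fr"))
print(books_by_in_language(HUGO, LANG + "ar"))
```

The first function can only select among labels of a resource already at hand, while the second treats the language URI as a node that other resources point to, which is what a question like the Victor Hugo one requires.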

See also: Languages as RDF resources on the ESW Wiki.

Sources

Sources defining languages as RDF resources

Other sources

  • Ethnologue provides a description page for every language in the ISO 639-3 code list.
  • Multilingual Wikipedias, and their interwiki links, are the source used to find the labels of a language in other languages.
This material is Open Data.