2.7 Digital Dictionaries and Thesauri
This project will rely heavily on dictionary and thesaurus data. There are several digital
encoding formats, sources, and services openly available. In this section we identify
those which are most likely to be of use for the project.
There is some terminology we need to be aware of before continuing.
Antonym
Term that possesses the opposite meaning of another term
Canonical
Most basic lexical form of a term, e.g. book, but not books
Concept
Area of identifiable knowledge that can be associated with other entities and
concepts
Lemma
Canonical form of a term
Lexicon
Vocabulary of terms usually including descriptions, sometimes restricted to a
specific area of interest
Metadata
Data about data
Morphological
Structure and form of words
Ontology
Data model that describes concepts (usually within a specific knowledge domain)
Pertainym
Term, usually an adjective, that is defined as pertaining to another term
Related
Semantic relationships are described as related, broader, and narrower
Semantic
Meaning of a term or set of terms (sentence)
Synonym
Term which has the same or similar meaning as another term
Synset
Set of synonyms
Term
Word or phrase
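To make the relationships between these terms concrete, the following minimal Python sketch
models a toy lexicon. The class name Synset, the helpers lemma_of and are_synonyms, and the
sample words are illustrative assumptions, not part of any source discussed in this section.
The lemma lookup is a hard-coded table standing in for a real morphological analyser.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous terms sharing one meaning (hypothetical structure)."""
    terms: frozenset
    antonyms: frozenset = field(default_factory=frozenset)


# Toy table mapping inflected forms to their lemma (canonical form).
# A real system would use morphological analysis instead of a fixed table.
LEMMAS = {"books": "book", "bigger": "big", "largest": "large"}


def lemma_of(term: str) -> str:
    """Return the canonical form of a term, or the lower-cased term if unknown."""
    return LEMMAS.get(term.lower(), term.lower())


# Sample synset: "big" and "large" are synonyms; "small" and "little" are antonyms.
BIG = Synset(
    terms=frozenset({"big", "large"}),
    antonyms=frozenset({"small", "little"}),
)


def are_synonyms(a: str, b: str, synset: Synset) -> bool:
    """Treat two terms as synonyms if both lemmas belong to the same synset."""
    return lemma_of(a) in synset.terms and lemma_of(b) in synset.terms
```

Reducing each term to its lemma before the lookup means that inflected forms such as
"bigger" still match the synset that contains "big".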
2.7.1 SKOS
SKOS (Simple Knowledge Organisation System) is a format used to encode data around
concepts. Miles, a key member of the SKOS community, describes SKOS as allowing us
to [Mil05]:
"- identify concepts with URIs
- label concepts with literals (e.g. `love'@en), symbols, sounds? other?
- document concepts with definitions, examples, scope notes, history notes, editorial notes...
- semantically relate concepts
- organise concepts into concept schemes, and into smaller meaningful groupings (`arrays')
- use concepts to subject-index documents"
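The capabilities listed above can be sketched as a small in-memory data model. This is only
an illustration of the ideas, not the SKOS vocabulary itself: the class names Concept and
ConceptScheme, the function index_document, and the example.org URIs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str                                     # concepts are identified by URIs
    labels: dict = field(default_factory=dict)   # language tag -> label literal
    definition: str = ""                         # documentation for the concept
    broader: set = field(default_factory=set)    # URIs of broader concepts
    related: set = field(default_factory=set)    # URIs of related concepts


@dataclass
class ConceptScheme:
    uri: str
    concepts: dict = field(default_factory=dict)  # concept URI -> Concept

    def add(self, concept: Concept) -> None:
        self.concepts[concept.uri] = concept

    def narrower(self, uri: str) -> list:
        """Concepts that name the given URI as one of their broader concepts."""
        return sorted(c.uri for c in self.concepts.values() if uri in c.broader)


BASE = "http://example.org/concepts/"  # placeholder namespace (assumption)

scheme = ConceptScheme(uri="http://example.org/scheme")
scheme.add(Concept(uri=BASE + "emotion", labels={"en": "emotion"}))
scheme.add(Concept(
    uri=BASE + "love",
    labels={"en": "love"},
    definition="A strong feeling of affection",
    broader={BASE + "emotion"},
))

# Subject index: concept URI -> identifiers of documents about that concept.
subject_index = {}


def index_document(doc_id: str, concept_uri: str) -> None:
    subject_index.setdefault(concept_uri, set()).add(doc_id)


index_document("doc-1", BASE + "love")
```

Only the broader links are stored here. The narrower direction is derived on demand, which
keeps the two views of the hierarchy from drifting apart.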
It is worth noting that, at the time of writing, a final SKOS specification has yet to be
published. The current specification was written in 2005 and is a working draft [MB05].
Central to SKOS is RDF (Resource Description Framework). RDF is a metadata model
that builds on XML and URI technology and provides a format in which data can be
encoded and optionally referenced using URIs.
RDF is typically written in XML and is intended to provide a logical hierarchical format that
is easily processed by machines. RDF attempts to provide a semantic view of data so that
machines can understand how resources relate to one another. RDF documents are not
designed to be viewed directly on the World Wide Web [HK07, W3S07, BM04].
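As an illustration of how a machine might read such a document, the following sketch parses a
small RDF/XML fragment describing one SKOS concept and lists the statements it contains as
(subject, predicate, object) triples. It uses only Python's standard xml.etree.ElementTree
module. The example.org URIs and the function extract_triples are assumptions made for this
example, and the function handles only the simple subset of RDF/XML shown here (typed nodes
with rdf:about, and properties holding either a literal or an rdf:resource reference). A full
RDF parser would be needed for real data.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical document: one concept with an English label and a broader concept.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concepts/love">
    <skos:prefLabel xml:lang="en">love</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/concepts/emotion"/>
  </skos:Concept>
</rdf:RDF>"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag form into a full URI."""
    return tag[1:].replace("}", "")


def extract_triples(rdf_xml: str) -> list:
    """Return (subject, predicate, object) triples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        # The element name of a typed node gives the resource's rdf:type.
        triples.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            # The object is either a referenced resource or a literal value.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, expand(prop.tag), obj))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

Each triple states one relationship between resources, which is the sense in which RDF lets
machines see how resources relate to one another.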