Friday, November 1, 2013

The n-triples, n-quads and .... n-quinds

Today I'm in a writing mood, so here goes a third post.

RDF is basically a triple collection: <subject>, <predicate>, <object>
The object might be a literal.

One of the syntax formats is n-triples. Example from the linked page:

<http://www.w3.org/2001/08/rdf-test/>  <http://purl.org/dc/elements/1.1/creator>    "Dave Beckett" .
<http://www.w3.org/2001/08/rdf-test/>  <http://purl.org/dc/elements/1.1/creator>    "Jan Grant" .
<http://www.w3.org/2001/08/rdf-test/>  <http://purl.org/dc/elements/1.1/publisher>  _:a .
_:a                                    <http://purl.org/dc/elements/1.1/title>      "World Wide Web Consortium" .
_:a                                    <http://purl.org/dc/elements/1.1/source>     <http://www.w3.org/> .

In triple stores, where lots of triples are grouped, there is a need to register the origin of the triplet.
To be able to do so, the triplets are extended with a fourth element: <context>.
And, as you have probably guessed already, the matching syntax format is n-quads.
So now we have <subject> <predicate> <object> <context>
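
As a rough illustration (a sketch, not the API of any particular triple store), a quad can be modelled in Python as a 4-tuple, and the fourth element then lets you group statements by origin. The Quad type, the by_context function and all example.org names are made up for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    context: str  # fourth element: where the triplet came from


# Hypothetical statements; all example.org names are made up.
QUADS = [
    Quad("<http://example.org/me>", "<http://example.org/email>",
         '"xyz@something.com"', "<http://example.org/source-a>"),
    Quad("<http://example.org/me>", "<http://example.org/name>",
         '"Me"', "<http://example.org/source-a>"),
    Quad("<http://example.org/me>", "<http://example.org/email>",
         '"old@something.com"', "<http://example.org/source-b>"),
]


def by_context(quads):
    """Group quads by their fourth element, the context."""
    groups = defaultdict(list)
    for quad in quads:
        groups[quad.context].append(quad)
    return dict(groups)


# Print each context with the number of triplets it contributed.
for context, members in by_context(QUADS).items():
    print(context, len(members))
```

Note that the context tells you where a triplet came from, but it gives you no handle on the individual triplet itself.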

Despite the name "context" this element is meant to represent the provenance of the triplet. Why wasn't it called <provenance>? No idea. But "context" is not wrong: it is to be interpreted as the production context (or source context) of the triplet. This is also called the graph (an RDF graph is a set of triplets).

But does this fourth element, the <context / provenance / graph>, fulfill all our needs?

For me it doesn't. It has undeniable value but it's not enough.

Who has never gone through the tough exercise of changing their primary email address? If you haven't, you are either lucky or smart (or both).
Changing your internet provider, changing jobs, etc. might be the cause of this tedious task (if you happened to tie your primary address to your ISP or company).
Thus the statement
 <I> <email address> <xyz@something.com>
has two characteristics:
- it identifies me (the <I>) because nobody else has the same address
- it is more or less temporary (it is valid only for a period of time)

This last piece of information in particular should be associated with the triplet.

There are ways to do so in RDF but they are far more complicated than necessary.
A triplet (or statement) <subject> <predicate> <object> tells you something about the subject.
If I want to tell something about this statement the easiest way is to identify this triplet in some way and use that as a subject in another statement.

That's where the n-quind comes in (no hyperlink here because I invented the term - as far as I know).

The n-quind syntax:
<subject> <predicate> <object> <context> <id>
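
To make the idea concrete, here is a minimal Python sketch, assuming a quind is simply a 5-tuple. The Quind type, the statements_about helper, the validUntil predicate and all example.org names are hypothetical; only the five positions come from the syntax above. The id of the email statement is reused as the subject of a second statement that says until when the first one is valid.

```python
from typing import NamedTuple


class Quind(NamedTuple):
    subject: str
    predicate: str
    object: str
    context: str
    id: str  # fifth element: identifies this very statement


# Hypothetical data; every example.org name is invented for illustration.
email = Quind("<http://example.org/me>", "<http://example.org/email>",
              '"xyz@something.com"', "<http://example.org/source-a>",
              "<http://example.org/stmt/1>")

# A statement about the statement: its subject is the id of `email`.
valid_until = Quind(email.id, "<http://example.org/validUntil>",
                    '"2013-11-01"', "<http://example.org/source-a>",
                    "<http://example.org/stmt/2>")


def statements_about(statement, quinds):
    """Return the quinds whose subject is the id of `statement`."""
    return [q for q in quinds if q.subject == statement.id]


for q in statements_about(email, [email, valid_until]):
    print(" ".join(q))
```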

Note that the <context> element might become unnecessary because we can capture the same information like this:
<subject> <predicate> <object> <id-1>
<id-1> <context-predicate> <context> <id-2>
Here the <context-predicate> represents the implicit meaning of the fourth element of a quad.
But I choose to keep the <context>, so as not to confuse existing quad parsers, which would not know whether they are reading a <context> or an <id> in the fourth position.
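
The context-less rewriting shown above could be sketched in Python as follows. This is only an illustration under assumptions: the Quind type, the without_context function and the hasContext predicate name are invented here, and the id of the second statement has to be supplied by the caller.

```python
from typing import NamedTuple


class Quind(NamedTuple):
    subject: str
    predicate: str
    object: str
    context: str
    id: str


# Hypothetical name for the <context-predicate>; not a standard term.
CONTEXT_PREDICATE = "<http://example.org/hasContext>"


def without_context(quind, new_id):
    """Rewrite one quind as two 4-element statements without a context slot."""
    statement = (quind.subject, quind.predicate, quind.object, quind.id)
    provenance = (quind.id, CONTEXT_PREDICATE, quind.context, new_id)
    return [statement, provenance]


# Hypothetical data; every example.org name is invented for illustration.
q = Quind("<http://example.org/me>", "<http://example.org/email>",
          '"xyz@something.com"', "<http://example.org/source-a>",
          "<http://example.org/stmt/1>")

for row in without_context(q, "<http://example.org/stmt/2>"):
    print(" ".join(row))
```

The output has four elements per line, just like a quad, which is exactly why a quad parser could not tell an <id> from a <context> in that position.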

One additional remark on the context. It is meant to represent the provenance, but it could also be used to "identify" a graph containing a single triplet. I think that in doing so we would be moving away from the originally intended meaning of provenance. Hence we end up with quinds.

And, yes, long, long ago, in a galaxy (not so far) away, this existed already.


2013-11-02 Edit: Added a forgotten id in the context-less example above.
2013-11-13 Edit: typos
