Monday, March 14, 2005

DRAFT Translation Link Metadata Tutorial

What’s this about?

Do you blog in more than one language? Do you sometimes post translations of your own or other people’s posts (or excerpts of posts)? Would you like to make your translations easier to discover on the web, and maybe help to train machine translators of the future? If so, read on – “decorating” your blog posts with a few tags and attributes can make it happen.

For some background on translation metadata and why it’s cool, see this post. [Todo – this section needs some expanding]. For a more formal, “Reference Manual” type specification for these annotations, see here.

This note is intended for folks who are annotating their blog posts “by hand”, as well as those who are writing tools to help automate the process. It describes how to annotate some simple links on your blog post which indicate that it’s a translation (or “original”), what part of the post is translated, and who’s doing the translating. These are links you’d probably include anyway, so let’s start with a simple example. [Both of the example posts here and here are from my own blog for now; let’s pretend they’re from different blogs. 8) ]

Basics of Linking to Translations.

Let’s say you have a post which translates an excerpt of some bike racing news from a French source. Somewhere, you’d probably include a link to the original post, for instance like this:

[Translated (badly) from the original French here.]


Let’s look at the markup for this section:

<p><small>[Translated (badly) from the original French <a href="http://lewy14.blogspot.com/2005/01/phonak-gets-reprieve.html"> here</a>.]</small></p>

The first thing to do is add the ISO-639 language code with an hreflang attribute to indicate the language of the original (French, in this case):

<p><small>[Translated (badly) from the original French <a href="http://lewy14.blogspot.com/2005/01/phonak-gets-reprieve.html" hreflang="fr">here</a>.]</small></p>

Which Document is the Original?

Now, add an indication that the current document is a translation of the original linked document. We do this with the rel attribute, “which specifies the relationship from the current document” (see here) with a “space separated list of link types”. For this we’ll define a link type called original, and use it as a value for the rel attribute.

<p><small>[Translated (badly) from the original French <a href="http://lewy14.blogspot.com/2005/01/phonak-gets-reprieve.html" hreflang="fr" rel="original">here</a>.]</small></p>

Note, there’s a complementary link type called translation, which annotates links to translations in the language indicated by the hreflang attribute: the implication is that the “current document” is the “original”. The translation link type can be thought of as conveying more “authority”, in that the translation is explicitly endorsed by the author of the original. Finally, note that documents can link to each other reciprocally with original and translation link types (this is the case with the two examples I’ve posted here and here).

Where within the Document is the Translation Excerpt(s)?

To make things even easier for automatic translation harvesters (“rosettabots”), consider wrapping the section of your post which consists of translated text in a div element, and give that element a unique id attribute. The id attribute value acts as a fragment identifier, allowing the “rosettabot” to easily identify the translated text. Let’s say the translated text is wrapped in a div element with an id of “rb-1”. We’d add the following to the rel attribute:

<p><small>[Translated (badly) from the original French <a href="http://lewy14.blogspot.com/2005/01/phonak-gets-reprieve.html" hreflang="fr" rel="original xlt-id:#rb-1">here</a>.]</small></p>

This indicates that the translation is contained in the element indicated by the fragment identifier “rb-1”. (There’s an org-id: link type prefix as well, indicating the fragment identifier (if any) for the original. There are also a couple of other ways of specifying excerpts within posts; we’ll cover those in another tutorial, but for now refer to the spec and the examples.)
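For the tool writers in the audience, here’s a rough Python sketch of how a “rosettabot” might pull apart a rel value like the one above. The function name parse_rel and the shape of the returned dictionary are my own inventions for illustration, and I’m assuming org-id: values look just like xlt-id: values (a prefix followed by a fragment identifier); check the spec before relying on that.

```python
def parse_rel(rel_value):
    """Split a rel attribute value into plain link types and fragment ids.

    Hypothetical helper: assumes tokens are space separated and that
    org-id: values follow the same pattern as xlt-id: values.
    """
    result = {"types": [], "xlt_id": None, "org_id": None}
    for token in rel_value.split():
        if token.startswith("xlt-id:"):
            # Fragment identifier of the translated excerpt in this document.
            result["xlt_id"] = token[len("xlt-id:"):].lstrip("#")
        elif token.startswith("org-id:"):
            # Fragment identifier of the excerpt in the original document.
            result["org_id"] = token[len("org-id:"):].lstrip("#")
        else:
            # Anything else is an ordinary link type such as "original".
            result["types"].append(token)
    return result


print(parse_rel("original xlt-id:#rb-1"))
# {'types': ['original'], 'xlt_id': 'rb-1', 'org_id': None}
```

The leading “#” is stripped so the result can be compared directly against an element’s id attribute.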

And the Translator is…

Now, an interested reader (or ‘bot) might want to know – who’s doing the translating? In the case of this example, a (passable) translation was constructed by a person (me) with lousy French skills, from a risibly bad machine translation (Google – hey, no offense Google, but most current, free, public machine translation services are pretty pathetic). So how can we capture this? Simple. Once again, we take it from the top, this time adding links to both the human and machine translator. And I’ll cut right to the chase this time, since you know the drill: we’ll annotate each link with its own special “link type”, a rel attribute value indicating the translator is either a human or machine.

<p><small>[Translated (badly) from the original French <a href="http://lewy14.blogspot.com/2005/01/phonak-gets-reprieve.html" hreflang="fr" rel="original xlt-id:#rb-1">here</a>, by <a href="mailto:lewykatorz@yahoo.com" rel="human">yours truly</a> with the aid of <a href="http://www.google.com/language_tools?hl=en" rel="machine">Google</a>]</small></p>
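If you’d rather have a tool write this markup for you, something like the following Python sketch could generate the annotated credit line. The helper names link and credit_line, their parameters, and the exact wording of the generated sentence are all assumptions made for this example; only the attributes and link types come from the tutorial.

```python
from html import escape


def link(href, text, **attrs):
    """Build an <a> element, escaping attribute values and link text."""
    parts = [f'href="{escape(href, quote=True)}"']
    parts += [f'{name}="{escape(value, quote=True)}"' for name, value in attrs.items()]
    return f'<a {" ".join(parts)}>{escape(text)}</a>'


def credit_line(original_url, lang, fragment_id, human_url, machine_url, machine_name):
    """Return a credit paragraph with original, human and machine links."""
    original = link(original_url, "here", hreflang=lang,
                    rel=f"original xlt-id:#{fragment_id}")
    human = link(human_url, "yours truly", rel="human")
    machine = link(machine_url, machine_name, rel="machine")
    return (f"<p><small>[Translated from the original {original}, "
            f"by {human} with the aid of {machine}]</small></p>")


print(credit_line(
    "http://lewy14.blogspot.com/2005/01/phonak-gets-reprieve.html",
    "fr",
    "rb-1",
    "mailto:lewykatorz@yahoo.com",
    "http://www.google.com/language_tools?hl=en",
    "Google",
))
```

Escaping the attribute values matters mostly for URLs containing “&”, which would otherwise produce invalid markup.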

Wrapping it up, literally.

Almost done. One last step: bundle up all the links mentioned above into their own div section, and give that section a class attribute with the value rosettabot. Why do we do this? A few reasons:
  • It creates a relationship between the links, which is important when there are multiple such sets of links in a single document (the front page of a blog, for instance, or a document with many translation excerpts from different sources).
  • It separates the link text and markup from the main body of the post, which can make it easier for “rosettabots” to separate the “data” from the “metadata”.
  • It serves as a “namespace”, to limit the scope of the “link types” (values of the rel attribute) that we defined above.

Putting it all together – here are all the annotated links, grouped together within a div element:

<div class="rosettabot"><p><small>[Translated (badly) from the original French <a href="http://lewy14.blogspot.com/2005/01/phonak-gets-reprieve.html" hreflang="fr" rel="original xlt-id:#rb-1">here</a>, by <a href="mailto:lewykatorz@yahoo.com" rel="human">yours truly</a> with the aid of <a href="http://www.google.com/language_tools?hl=en" rel="machine">Google</a>]</small></p></div>
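To give a feel for how the grouping helps a “rosettabot”, here’s a minimal Python sketch that collects the links inside each div with class rosettabot, using the standard library’s html.parser module. The class name RosettabotParser and the structure of its groups attribute are hypothetical, and a real harvester would need to cope with much messier markup than this.

```python
from html.parser import HTMLParser


class RosettabotParser(HTMLParser):
    """Collect annotated links, grouped per div with class "rosettabot"."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # div nesting depth inside a rosettabot block; 0 means outside
        self.groups = []  # one list of link dictionaries per rosettabot block

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                # A nested div inside a rosettabot block.
                self.depth += 1
            elif "rosettabot" in (attrs.get("class") or "").split():
                # Entering a new rosettabot block: start a new group.
                self.depth = 1
                self.groups.append([])
        elif tag == "a" and self.depth:
            self.groups[-1].append({
                "href": attrs.get("href"),
                "hreflang": attrs.get("hreflang"),
                "rel": (attrs.get("rel") or "").split(),
            })

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


sample = (
    '<div class="rosettabot"><p><small>[Translated from the original French '
    '<a href="http://lewy14.blogspot.com/2005/01/phonak-gets-reprieve.html" '
    'hreflang="fr" rel="original xlt-id:#rb-1">here</a>, by '
    '<a href="mailto:lewykatorz@yahoo.com" rel="human">yours truly</a> with the aid of '
    '<a href="http://www.google.com/language_tools?hl=en" rel="machine">Google</a>]'
    '</small></p></div>'
)

parser = RosettabotParser()
parser.feed(sample)
parser.close()
for found in parser.groups[0]:
    print(found["rel"], found["hreflang"], found["href"])
```

Links outside any rosettabot div are ignored, which is the “namespace” effect described above: a rel="human" somewhere else on the page won’t be mistaken for a translator credit.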

So there you go – not hard at all. There are some more techniques for delimiting excerpts, as I mentioned above, but this should be enough to get you started. If you have any feedback on this tutorial, the spec, the ideas behind them, or my bad French, leave a comment below or email me – thanks!