Wednesday, March 09, 2005

DRAFT Translation Link Metadata Profile

[UPDATE 3/14] Minor edits and a few "issues" added.
[UPDATE 3/15] changed rel values to original and translation

The following is a brief but reasonably formal specification of a proposed profile for translation link annotation metadata. This profile specifies the elements and attributes proposed to annotate links to (human language) translations of HTML documents (and excerpts of such documents) with metadata. This metadata can specify the language of the translation, whether the translation is endorsed by the author of the original, and the identity of the translator. It can also specify (or give hints about) the range of text translated. The metadata profile can be extended to satisfy other requirements. An informal requirements document for translation link metadata can be found here.

This brief specification assumes the reader is familiar with the XHTML Meta Data Profile from the Global Multimedia Protocols Group. XMDP defines a format to define “properties” and the values that those properties can take on. This profile format extends the notion of “property” to cover XPath expressions. Nested nodeset specifications can be viewed as location steps, allowing a complete XPath expression to be "read off" from the Profile. Using these techniques, richer Semantic XHTML constructs can be specified more precisely, with an appropriate level of profile complexity.

The following is a DRAFT specification for the purpose of discussion and review. In particular, the XPath expressions need to be tested.



Required. The links to translated text and the link to the translating entity are contained within a div element whose class attribute contains "rosettabot". The translation link metadata is contained within a div element in order to group together the link to the translated document (or excerpt) and the link(s) identifying the translating entity(ies). This is especially convenient when there are several translation link blocks within the same document. The grouping also serves to separate these links from the text being translated. Finally, the grouping acts as a kind of informal “namespace” (not to be confused with formal namespaces in XML), delimiting the scope of certain tokens used within rel attributes on the links.
[Issue: should we also allow span elements to serve as containers?]
[Issue: earlier examples contained the "urn:" prefix on "rosettabot". Should Semantic XHTML profiles which define class attribute values ("class names") employ a urn naming scheme to prevent collisions?]
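As a rough illustration of the container rule above, here is a minimal Python sketch of how a rosettabot might collect the links inside a "rosettabot" div. It assumes the links are ordinary a elements with href and rel attributes, and it treats the class attribute as a whitespace delimited list; the class name RosettabotLinkCollector, the sample markup, and the example.org URLs are made up for this sketch and are not part of the profile.

```python
from html.parser import HTMLParser


class RosettabotLinkCollector(HTMLParser):
    """Collect <a> links that appear inside <div class="... rosettabot ..."> blocks."""

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one bool per open div: is it a rosettabot container?
        self.links = []      # (href, rel) pairs found inside a container

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("rosettabot" in classes)
        elif tag == "a" and any(self.div_stack):
            # Only links nested somewhere inside a container div are kept.
            self.links.append((attrs.get("href"), attrs.get("rel")))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


# Hypothetical markup: one unrelated link, then one translation link block.
sample = """
<p>Original text here. <a href="http://example.org/other">unrelated</a></p>
<div class="rosettabot">
  <a rel="translation" href="http://example.org/post-fr.html">French</a>
  <a href="http://example.org/translator">translator</a>
</div>
"""

collector = RosettabotLinkCollector()
collector.feed(sample)
links = collector.links
```

The link outside the div is ignored, which is the "informal namespace" effect described above: only links inside the container are interpreted against this profile.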


Required. This is the link to the translated text. It always has a rel attribute which contains either "translation" or "original".


This is the link to a document which represents a "parallel text" of the containing document, expressed in another language.


The ISO-639-2 language code for the language of the linked document.


The value of the rel attribute is a whitespace delimited list of tokens. The meaning of the presence of these tokens is specified below. Other tokens from other profiles (e.g. XFN) may be present.

contains(., "original")

Indicates that the linked document is the original document, and that the containing document is the translation. Either "original" or "translation" is required to be present in the rel attribute value. [Issue: the “translated” document is presumed to have a language code somewhere – do we want to rely on this, or for convenience require it be specified as well?]

contains(., "translation")

Indicates that the linked document is a translation of the containing document, which is the original. This confers somewhat greater authority in that it represents an endorsement by the author of the original document. When a back link in the translated document is present, even more authority is conferred. Either "original" or "translation" is required to be present in the rel attribute value. [Issue: the “original” document is presumed to have a language code somewhere. Do we want to rely on this, or for convenience require that it be specified as well?]

contains(., "org-id:")

Optional. If at all possible, supply a fragment identifier, ideally one which targets a container (e.g. a div element) holding the original or translated text. The characters after the ':' and before the next space character specify the fragment identifier (if any) which contains or anchors the original text excerpt. Note that the "original" text is in the linked document if original is present in the rel attribute value, and in the containing document if translation is present.

contains(., "xlt-id:")

Optional. Like "org-id:", but indicating the fragment identifier which contains or anchors the translated text excerpt in the translated document.

contains(., "org-xp:")

Optional. Like "org-id", but followed by an XPath spec which points to the original text excerpt in the original document.

contains(., "xlt-xp:")

Optional. Like "xlt-id", but followed by an XPath spec which points to the translated text excerpt in the translated document.
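To make the rel tokens above concrete, here is a small Python sketch that splits a rel value on whitespace and picks out the tokens defined in this profile. It compares whole tokens rather than substrings (a plain substring test would also match a token such as "unoriginal"), and it assumes that XPath values after "org-xp:" and "xlt-xp:" contain no whitespace, since the list is whitespace delimited. The function name parse_translation_rel and the sample values are hypothetical.

```python
# Prefixes defined by this profile for locating the excerpt.
PREFIXES = ("org-id", "xlt-id", "org-xp", "xlt-xp")


def parse_translation_rel(rel):
    """Split a rel attribute value into the profile's tokens and everything else."""
    info = {"role": None, "other": []}
    info.update({prefix: None for prefix in PREFIXES})
    for token in rel.split():
        if token in ("original", "translation"):
            info["role"] = token
            continue
        prefix, sep, value = token.partition(":")
        if sep and prefix in PREFIXES:
            info[prefix] = value
        else:
            # Tokens from other profiles (e.g. XFN) are kept but not interpreted.
            info["other"].append(token)
    return info


# Hypothetical rel value mixing profile tokens with an XFN token.
parsed = parse_translation_rel("translation org-id:post-body xlt-id:corps friend")
```

Here parsed["role"] is "translation", the two fragment identifiers are "post-body" and "corps", and the XFN token "friend" ends up in parsed["other"].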






Optional. This is a link identifying a human translator, or an entity (corporation, non-profit, etc) which did the translation.


Web page, email address, etc.




Optional. This is a link identifying a machine translation program. Note that both a human translator and a machine translator may be specified, in which case the translation should be considered to be a machine translation which was "cleaned up" by the human entity. (Note: while technically optional, the translating entity link(s) really should be present.)


Ideally, specifies a web page where the machine translation program can be accessed.



child::comment()[starts-with(., "rosettabot")]

Optional. Translation hints may be present to aid rosettabots which are attempting to strip extraneous text and markup and identify aligned parallel texts. Alignment hints are present as HTML comments. (Note: Processing Instructions were examined for this role and rejected as some current blogging tools permit HTML comments to be embedded in a friendlier manner.) Alignment hints may be present with or without fragment identifiers for the original and translated text (see above). Note that a rosettabot may succeed in identifying parallel texts without any hints or fragment identifiers at all, but will require substantially more sophistication.

contains(., "org-hint-begin:")

Indicates an alignment hint, which specifies the first few words of the original text excerpt as a double quote delimited string following the ':'.

contains(., "org-hint-end:")

Indicates an alignment hint, which specifies the last few words of the original text excerpt as a double quote delimited string following the ':'.

contains(., "xlt-hint-begin:")

Like "org-hint-begin", except for the translated text.

contains(., "xlt-hint-end:")

Like "org-hint-end", except for the translated text.
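The alignment hints above could be read out of a comment along the following lines. This is only a sketch: it assumes the comment text begins with "rosettabot" (leading whitespace is tolerated here, which is more lenient than the starts-with test), that several hints may share one comment, and that optional whitespace may follow the ':'. The function name parse_alignment_hints and the sample comment are invented for illustration.

```python
import re

# Matches e.g. org-hint-begin:"first few words" or xlt-hint-end:"last few words".
HINT_RE = re.compile(r'\b(org|xlt)-hint-(begin|end):\s*"([^"]*)"')


def parse_alignment_hints(comment_text):
    """Return a dict of alignment hints found in one HTML comment's text."""
    # Assumption: tolerate leading whitespace before the "rosettabot" marker.
    if not comment_text.lstrip().startswith("rosettabot"):
        return {}
    return {
        f"{side}-hint-{edge}": words
        for side, edge, words in HINT_RE.findall(comment_text)
    }


# Hypothetical comment body (the text between <!-- and -->).
comment = 'rosettabot org-hint-begin:"Four score and" org-hint-end:"from the earth."'
hints = parse_alignment_hints(comment)
```

A comment that does not start with the "rosettabot" marker yields an empty dict, so ordinary comments in the page are left alone.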



For a couple of quick examples, view the HTML source of this original document and this translation. For a brief tutorial outlining how to annotate your links according to this profile, see here.
