Friday, November 04, 2005

More on a tagging standard


Rel-Tag (a.k.a. relTag) is a de facto standard for tagging a specific page. Rel-Tag was defined by Tantek Çelik, following a concept from Derek Powazek and Kevin Marks. It is part of a larger collection of microformats, a wonderfully pragmatic approach to building a practical semantic web.

To tag your page with a Rel-Tag, you just include a link and use the attribute rel="tag" on the link:

<a href="" rel="tag">sample</a>

It doesn't matter what page the URL points to, as long as the URL ends with the tag name.

Incredibly simple. Paste this piece of HTML into your page, and your page is tagged, and indexed as such by services that support this convention, such as Technorati.
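For tools that generate these links, here is a minimal Python sketch of building one. The tag space `https://example.com/tags/` and the function name `rel_tag_link` are made up for illustration, and percent-encoding the tag as UTF-8 is my assumption about how to keep the tag safe as the last part of the URL.

```python
from html import escape
from urllib.parse import quote

# Hypothetical tag space, used only for illustration.
TAG_SPACE = "https://example.com/tags/"


def rel_tag_link(tag: str, tag_space: str = TAG_SPACE) -> str:
    """Return an <a rel="tag"> element whose URL ends with the encoded tag."""
    # Percent-encode the tag as UTF-8 so it is safe as the last path segment.
    href = tag_space + quote(tag, safe="")
    # Escape both the attribute value and the visible text for HTML.
    return f'<a href="{escape(href)}" rel="tag">{escape(tag)}</a>'


print(rel_tag_link("sample"))
# <a href="https://example.com/tags/sample" rel="tag">sample</a>
```

Note that the tag ends up in two places, the URL and the link text, which matters for the redundancy point.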

I do share Kevin Burton's feeling though, that the Rel-Tag specification is somewhat lacking (or, as Kevin defined it, under-specified).

Points to consider:

  • Mixing Data and Representation
    • relTags are easy to create by hand (at least, assuming that you know HTML). They're even easier to generate with an application that supports them (such as Zoundry). But once generated, relTags are not just metadata, they're actually part of your content. So it becomes very hard to build a tool that edits them. You might have used these links in your content. You might have moved them around. You might want to change the link text but keep the tag. It becomes messy. Last time I checked, the problem of separating data from representation in a practical, widely supported manner was already solved.
  • Redundancy
    • The tag appears twice, once in the URL and once in the link text. Wanna bet how quickly these two get out of sync?
  • Usability
    • relTags are very easy to use, if you know how to access and edit your HTML. But if you know that, there isn't much difference between editing an "<a href" tag to add rel="tag" and cutting and pasting a piece of XML to achieve the same goal. And if you don't know HTML, both are equally inaccessible. So, with end-user usability pretty much the same, why not choose a more robust solution?
  • Tag Spaces
    • According to relTag, the actual URL you use can point to any page, as long as that page is a "tag space", loosely defined as "a place that collates or defines tags". The spec goes on to insist that tag spaces can be used to provide a specific meaning to the tag. What does this mean? Is it a method to tag tags? A categorization system? A meaningless technical detail? These questions are too big to be left unspecified. The obscurity of this is leading people (e.g. me :) to repeat the tags multiple times, once for each of their favorite "tag spaces" (and to some cool gizmos).
  • Scope definition
    • relTags are used to tag the text that includes them. They do not carry a scope definition with them, so it would be hard for tools (or people, for that matter) to understand whether these tags refer to the entire page or to a specific section of the page (or a post in a blog).
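To make the redundancy point concrete, here is a rough Python sketch of what a tool has to do to read relTags back out of a page and notice when the URL and the link text disagree. It assumes the tag is the last path segment of the URL, percent-decoded as UTF-8. The class and function names and the example.com URLs are hypothetical, and a case-insensitive comparison is just one possible definition of "out of sync".

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class RelTagParser(HTMLParser):
    """Collect (tag_from_url, link_text) pairs for every <a rel="tag"> link."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._url_tag = None  # tag of the rel="tag" link currently open, if any
        self._text = []

    def handle_starttag(self, name, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if name == "a" and "tag" in rels:
            path = urlsplit(attrs.get("href") or "").path
            # Assumption: the tag is the last path segment, percent-decoded.
            self._url_tag = unquote(path.rstrip("/").rsplit("/", 1)[-1])
            self._text = []

    def handle_data(self, data):
        if self._url_tag is not None:
            self._text.append(data)

    def handle_endtag(self, name):
        if name == "a" and self._url_tag is not None:
            self.pairs.append((self._url_tag, "".join(self._text).strip()))
            self._url_tag = None


def find_mismatches(html):
    """Return the (url_tag, link_text) pairs that no longer agree."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return [(u, t) for u, t in parser.pairs if u.lower() != t.lower()]


page = (
    '<p><a href="https://example.com/tags/monopoly" rel="tag">monopoly</a> and '
    '<a href="https://example.com/tags/semantic%20web" rel="tag">microformats</a></p>'
)
print(find_mismatches(page))
# [('semantic web', 'microformats')]
```

Nothing in this sketch can tell whether a tag applies to the whole page or only to the post around it, which is the scope problem again.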

It'd be great if a more general and robust tagging standard supported relTags for backward compatibility.

I've been playing with the tagging format I've suggested in a previous post, and found some changes that should be applied to it in order to make it more useful. But this post is already getting way too long, so more on that later :)



Kevin Marks said...

Lots of points there - consider adding the substantive ones to the issues page
* Mixing data and representation is by design. If the tags are embedded in the content they don't get detached.
* Redundancy is not a problem in practice (I have 18 million examples). The URL provides a defined encoding, as it is in escaped UTF-8, which avoids encoding issues in the document.
* Usability: straw man argument here. Bloggers know how to make links. Adding rel="tag" is very easy to remember. In any case, if a tool is generating it this is moot. XML is no more robust.
* Tag Spaces: these are there for disambiguation, and to provide alternatives. You should pick an appropriate one for each tag, bearing in mind that it should make sense to your readers if clicked on. See your own complaint of redundancy supra.
* Scope: this is deliberately left unspecified in the rel="tag" definition. Other microformats that use rel="tag" such as hReview and the rel="tag directory" compound can specify the scope appropriately.

Priyantha said...

Based on the current relTag format, it should be easy for a tool developer to provide some UI to edit a tagged URI since (as Kevin noted) relTag's data (URI) and representation (node text) are attached and available to the tool. In Zoundry, when you edit an existing link (with relTag), the tag words and tagspace are displayed in two different fields (performing the necessary UTF-8 URI encode/decode).

We (Zoundry) are also experimenting with microformats, specifically the hReview format for product recommendations. Currently, if you use Zoundry's toolbar plugin and perform a "BlogThis" on a supported merchant product page, the content is formatted as hReview.

YanivG said...

Kevin, I'd love to update the issues page, but for some reason the create new user page on the wiki returns an error message.

Adam said...

I had a similar problem. Check the username you are trying to create. I tried to create awillard, but had to use Adam Willard instead.

I have proposed to my manager to use regular expressions and an interface that would allow you to select a word; it would wrap the word like <tag>SelectedTag</tag> and store it in the db. On page render (either in SQL Server or ASP) a regular expression would run, similar to:

<tag>monopoly</tag> → <tag>(.*?)</tag> → <a href="/tag/$1" rel="tag">$1</a> → <a href="/tag/monopoly" rel="tag">monopoly</a>

Just an idea, but I think it could work. The author would not have to know how to create the specific link, and the terms would not be an issue during cut and paste.
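
A minimal Python sketch of that render-time substitution, for illustration only (the comment mentions SQL Server or ASP, so Python is a swapped-in language here). The function name `render_tags` is hypothetical and the `/tag/` prefix is taken from the example above. The capture group is non-greedy so that two tags on the same line are matched separately, and the tag is percent-encoded before it goes into the URL.

```python
import re
from html import escape
from urllib.parse import quote

# Non-greedy capture: each <tag>...</tag> pair is matched on its own.
TAG_PATTERN = re.compile(r"<tag>(.*?)</tag>")


def render_tags(stored: str, tag_space: str = "/tag/") -> str:
    """Replace stored <tag>word</tag> markers with rel="tag" links."""

    def to_link(match):
        tag = match.group(1)
        # Percent-encode the tag for the URL, HTML-escape it for display.
        href = tag_space + quote(tag, safe="")
        return f'<a href="{escape(href)}" rel="tag">{escape(tag)}</a>'

    return TAG_PATTERN.sub(to_link, stored)


print(render_tags("I like <tag>monopoly</tag>."))
# I like <a href="/tag/monopoly" rel="tag">monopoly</a>.
```

Because the link is generated from a single stored word, the URL and the link text cannot drift apart in the stored content.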