Monday, November 07, 2005

tagstd:recap

Several folks have sent me interesting comments on the tagging standard I proposed a few days ago (and its follow-up). Thank you! I'd like to try to address some of them here.

  • On a general note: the list of predefined values for the rel attribute is defined in the HTML4 specification. Obviously, it does not include "tag" as a predefined value.
    The specification does, however, explicitly permit defining new values for the rel attribute (though it recommends that authors who do so cite the conventions they use in the profile attribute of the head element).
    So, technically speaking, using rel="tag" is OK from an HTML point of view. But, putting the spec aside, why take the risk of using a value that is not universally accepted when we don't have to? I don't really see why
    <a tagstd:rel="tag" href=...>
    is significantly less usable than
    <a rel="tag" href=...>
    and switching to the former would already be a big improvement in the robustness of RelTag.
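To make the robustness point concrete, here is a minimal sketch, using Python's standard html.parser, of a tool that collects tag links. It assumes tags are marked either with the standard rel attribute or with the hypothetical tagstd:rel attribute; a plain HTML parser simply sees the prefixed name as a different attribute, so links that use rel for other purposes can never match it. The URLs are made up for illustration.

```python
from html.parser import HTMLParser


class TagLinkCollector(HTMLParser):
    """Collect hrefs of <a> elements marked as tag links.

    Links marked with the standard rel attribute and with the hypothetical
    tagstd:rel attribute are kept in separate lists.
    """

    def __init__(self):
        super().__init__()
        self.rel_tags = []     # hrefs from <a rel="tag">
        self.tagstd_tags = []  # hrefs from <a tagstd:rel="tag">

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)  # names arrive lowercased; the "tagstd:" prefix is kept
        href = attrs.get("href")
        # rel holds a space-separated list of (case-insensitive) link types
        if "tag" in (attrs.get("rel") or "").lower().split():
            self.rel_tags.append(href)
        if "tag" in (attrs.get("tagstd:rel") or "").lower().split():
            self.tagstd_tags.append(href)


html = (
    '<p><a rel="tag" href="http://example.com/tag/jazz">jazz</a> '
    '<a tagstd:rel="tag" href="http://example.com/tag/blues">blues</a></p>'
)
collector = TagLinkCollector()
collector.feed(html)
print(collector.rel_tags)     # ['http://example.com/tag/jazz']
print(collector.tagstd_tags)  # ['http://example.com/tag/blues']
```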

Kevin Marks (of Technorati, who is credited with the concept of Rel-Tag) raised some great points:

Mixing data and representation is by design. If the tags are embedded in the content they don't get detached (link)

I agree, this is a worthy cause. But, there are several different options for embedding the tag in the contents, and some of them provide the ability to embed the tag in the content without mixing data and representation. For example, using
<span tagstd:tags="tag1 tag2 tag3">content here</span>
would get the job done just as well (though using span has other drawbacks).
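A tool reading the span form would only need to split the attribute value on whitespace. A minimal sketch with Python's html.parser, assuming the hypothetical tagstd:tags attribute holds a space-separated list of tags:

```python
from html.parser import HTMLParser


class SpanTagExtractor(HTMLParser):
    """Collect tags from the hypothetical tagstd:tags attribute on any element."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attribute names are lowercased by the parser; the prefix survives.
            if name == "tagstd:tags" and value:
                self.tags.extend(value.split())


extractor = SpanTagExtractor()
extractor.feed('<span tagstd:tags="tag1 tag2 tag3">content here</span>')
print(extractor.tags)  # ['tag1', 'tag2', 'tag3']
```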

Redundancy is not a problem in practice (I have 18 million examples). (link)

While it's hard to argue with 18 million examples :), it'd be interesting to know: does the Technorati crawler verify that the tag indicated by the URL suffix matches the tag indicated by the link text?
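Such a check could be quite simple. Here is a minimal sketch in Python, assuming (as Rel-Tag describes) that the tag is the last segment of the URL's path. Reading '+' as a space, ignoring a trailing slash, and comparing case-insensitively are assumptions made for this sketch, not a description of what the Technorati crawler actually does.

```python
from urllib.parse import unquote_plus, urlsplit


def tag_from_href(href):
    """Return the tag named by the last path segment of a tag URL.

    Assumption: the segment is percent-decoded with '+' read as a space,
    and a trailing slash is ignored.
    """
    path = urlsplit(href).path.rstrip("/")
    return unquote_plus(path.rsplit("/", 1)[-1])


def link_matches(href, link_text):
    """Check that the URL-derived tag agrees with the visible link text.

    Whitespace is normalized and the comparison is case-insensitive
    (both choices are assumptions of this sketch).
    """
    expected = " ".join(tag_from_href(href).split()).casefold()
    shown = " ".join(link_text.split()).casefold()
    return expected == shown


print(link_matches("http://example.com/tag/Santa+Barbara", "santa barbara"))  # True
print(link_matches("http://example.com/tag/jazz", "blues"))                   # False
```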

When you cut and paste, you get cut-and-paste errors; that's one of life's most basic truths, second only to "if you use 1.0, it will crash".

Usability: straw man argument here. Bloggers know how to make links. Adding rel="tag" is very easy to remember. In any case, if a tool is generating it this is moot. XML is no more robust. (link)

Hmpff. I beg to differ. Bloggers != HTML coders.

For people who click "new post", write their thoughts, and then click "Post", rel="tag" is not only hard to remember, it's also... well, they pretty much have no idea what we're talking about :)

Now, this could of course be solved by tools. But if we rely on a tool anyway, why not choose a format that assumes a tool (while still allowing the markup to be added manually)?

I am not sure why you're saying that XML is not more robust. XML can be validated against a schema, which guarantees that the actual syntax matches the expected syntax. And even without a schema, the mere fact that XML supports namespaces solves a lot of problems. Even just defining your own tagstd:rel attribute instead of using the standard HTML rel attribute would be a huge improvement in robustness, IMO.
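The namespace part of this argument can be shown with Python's xml.etree.ElementTree. In a namespace-aware XML parser, an unprefixed attribute stays in no namespace, while a prefixed one is expanded to its namespace URI, so rel and tagstd:rel cannot be confused. The tagstd namespace URI below is hypothetical, and this sketch does not cover schema validation, which would need a validating library.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical namespace URI for the tagstd vocabulary (not a published URI).
TAGSTD_NS = "http://example.org/ns/tagstd"

doc = f"""<html xmlns="{XHTML_NS}" xmlns:tagstd="{TAGSTD_NS}">
  <body>
    <a rel="tag" href="http://example.com/tag/jazz">jazz</a>
    <a tagstd:rel="tag" href="http://example.com/tag/blues">blues</a>
  </body>
</html>"""

root = ET.fromstring(doc)

# The unprefixed attribute is stored as plain "rel"; the prefixed one
# is stored as "{namespace-uri}rel".
found = [
    (a.text, a.get("rel"), a.get(f"{{{TAGSTD_NS}}}rel"))
    for a in root.iter(f"{{{XHTML_NS}}}a")
]
print(found)  # [('jazz', 'tag', None), ('blues', None, 'tag')]
```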

Tag Spaces: these are there for disambiguation, and to provide alternatives. You should pick an appropriate one for each tag, bearing in mind that it should make sense to your readers if clicked on. See your own complaint of redundancy supra. (link)

I am not clear on this point: disambiguation of what? And alternatives to what?

Let's keep in mind that the whole tagging thing is about a flat namespace, in which all tags start as equals. That's the beauty of tagging. And, when we do want to disambiguate them, I doubt it will be done by URLs.

Let's admit it: everyone reading this blog is probably an early adopter to some extent. Let's try to imagine what the tagging world will look like once the dust settles a bit. I'd guess we'll have three or four "tag collection" websites, with direct tagging and tag-lookup support integrated into the browsers, and lots of non-technical people using them.

I'd guess that when these people want to look up a tag (e.g. find all "things of type X" associated with it), they'll want to decide for themselves, at "run time", which repository to use for the lookup, rather than use the one chosen by the author.
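Reader-side lookup could work roughly as follows: take only the tag name from the author's link and build the lookup URL on whichever repository the reader has picked. This is a minimal sketch; the repository names and URL templates are hypothetical, and it assumes the tag is the last segment of the link's path.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical lookup templates a reader might configure in a browser;
# these sites and URL shapes are illustrative only.
READER_REPOSITORIES = {
    "photos": "http://photos.example.com/tags/{tag}",
    "blogs": "http://blogs.example.org/search?tag={tag}",
}


def tag_from_href(href):
    """Take the tag from the last path segment of the author's link."""
    return unquote(urlsplit(href).path.rstrip("/").rsplit("/", 1)[-1])


def lookup_url(author_href, repository):
    """Build a lookup URL on the reader's chosen repository,
    ignoring whichever site the author happened to link to."""
    tag = tag_from_href(author_href)
    return READER_REPOSITORIES[repository].format(tag=quote(tag, safe=""))


print(lookup_url("http://example.com/tag/jazz", "photos"))
# http://photos.example.com/tags/jazz
print(lookup_url("http://example.com/tag/new%20york", "blogs"))
# http://blogs.example.org/search?tag=new%20york
```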

Scope: this is deliberately left unspecified in the rel="tag" definition.

Just wondering: what issue was leaving the scope undefined meant to avoid? Also, as Kevin pointed out, while the Rel-Tag spec avoids this, the other microformats use facilities such as the class attribute to define scope. Again, according to the specs, this is OK. But there are practical questions to ask. What would prevent a CSS developer from defining a CSS class "tag" and using it? How would the tools and the browser resolve the conflict? And the biggest question: why not reduce the chances of such a conflict at the spec level? A very simple solution would be to use class="tagstd:tag".
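The collision scenario can be sketched with a naive extractor that looks for a class token, using Python's html.parser. The markup is made up for illustration: it contains a stylesheet-only class="tag", a microformat-style class="tag", and the class="tagstd:tag" value suggested above. A class-based extractor cannot tell the first two apart, while the prefixed token matches only the intended element.

```python
from html.parser import HTMLParser


class ClassTokenFinder(HTMLParser):
    """Record the element names whose class attribute contains a given token."""

    def __init__(self, token):
        super().__init__()
        self.token = token
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # class holds a space-separated list of tokens
        classes = (dict(attrs).get("class") or "").split()
        if self.token in classes:
            self.matches.append(tag)


html = (
    '<div class="tag">styled box</div>'       # styling only
    '<span class="tag">jazz</span>'           # meant as a tag
    '<span class="tagstd:tag">blues</span>'   # prefixed alternative
)

results = {}
for token in ("tag", "tagstd:tag"):
    finder = ClassTokenFinder(token)
    finder.feed(html)
    results[token] = finder.matches
print(results)  # {'tag': ['div', 'span'], 'tagstd:tag': ['span']}
```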

Also, it should be noted that Priyantha, from Zoundry (the tool I'm using to write this post), expressed confidence in Rel-Tag (and the other microformats). And Hendrik responded to Eran's comments, saying that in his opinion a tagging standard should not assume any changes to the (X)HTML specs, and should in fact be independent of HTML.



Kevin Marks said...

I am still at a loss as to what you are trying to solve here.
You are of course free to make up arbitrary markup to express tags; however, nothing you have said so far is an improvement on rel="tag", and most of it is worse.

Your proposals have invisible tags. These are bad, as they exacerbate the problem of authors knowing what their content is saying.

Your tool argument is specious too. You can validate rel="tag". Namespaces do not solve anything; they just give you a fig leaf for inventing endless arbitrary variations on a theme.

The HTML rel is well defined, and 'tag' is a legitimate extension of it.

The disambiguation of tagspaces can help resolve ambiguous homonyms (the very problem you advocate namespaces in attributes for).

In practice this has not been a big problem, but telling people where to link is not part of the standard.

HTML classes (which you call CSS classes) do not have the same namespace as rel values, so the putative collision does not exist.

Designing specifications around imaginary naming conflicts is not a productive way of working.

I know inventing new specifications seems more attractive than learning to apply existing ones, but do have a read through the microformats process document to understand why we have taken the approach we have.

mary hodder said...

Just so you know, of the 18 million 'tags' currently collected, 5% are actually tags in the sense that they were placed inside the blog post. 95% of the tags currently collected by Technorati are actually categories set up in the back end of the blog. Categories, as bloggers use them, act more like large bucket collectors than the small contextual tags we see that are specific to a post or object.

This matters because you are proposing tagging ideas that, right now, only a small number of users might be willing to adopt, and with a small number of tags, rather than the number you might expect if you work from the '18 million tags' figure and assume those are all tags inside blog posts.