« The "feed" URI scheme | Main | FeedDemon and well-formed Atom feeds, Part II »

Monday, January 12, 2004

Comments

You can follow this conversation by subscribing to the comment feed for this post.

I think this is a great idea. It's not about using market clout to destroy the future of an upcoming standard; it's about making sure that standard is easy for anyone to parse and stays well-formed XML.

It might take a bit of a rough start, but if the majority of aggregators enforce this then everyone will be better off.

I just hope the rest of the community doesn't decide the opposite should be true and FD / NNW lose customers as a result.

So one day I'm at lunch with C and we are explaining news readers to another person in the group. Sites produce a feed, a news reader reads the feed, you bring the news to you. So on and so forth. Then we got to talking about using smart quotes on the site. C flips out and exclaims we can't use smart quotes on the site because when he copies and pastes from our site into Blogger it spits out broken XML and 50 people send him email telling him that his feed is broken. For many, many people with very few subscribers (me) this won't be a problem. But C is Cory Doctorow, posting on boingboing.net. A very smart guy, with very little time. He wants to do the right thing, but if his publishing tool is broken, he is going to be the one who pays for it.
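
To see why the paste breaks things, here's a minimal sketch, assuming the pasted quotes arrive as Windows-1252 bytes inside a feed that declares UTF-8 (the snippet and feed markup are invented for illustration):

    # A minimal sketch, not Cory's actual workflow: text copied from a
    # Windows-1252 page carries bytes 0x93/0x94 for curly quotes, which
    # are not valid UTF-8, so a strict parser rejects the whole feed.
    import xml.etree.ElementTree as ET

    snippet = 'He said \u201chello\u201d'  # curly "smart" quotes
    feed = ('<?xml version="1.0" encoding="utf-8"?>'
            '<item><title>%s</title></item>' % snippet)

    # Simulate the paste: the bytes are Windows-1252, but the XML
    # declaration still claims UTF-8.
    broken = feed.encode('windows-1252')

    try:
        ET.fromstring(broken)
    except ET.ParseError as err:
        print('feed rejected:', err)  # not well-formed (invalid token)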

I'm not trying to say that enforcing valid XML is a "bad" thing, just that it's going to affect people in an adverse way that you might not foresee. If all publishing tools enforced validity, I guess it wouldn't be a problem. But I always think about Zeldman and his hand-rolled XML feed, and Cory with his smart-quote copy-and-paste problem.

I think it's the best option.
I'm working on a project where we try to aggregate feeds from several academic institutions, and the lack of a DTD for RSS is a "delayer"... it has brought us to this world of "funky" feeds.

Look at the BBC... they even created their own DTD :)

I'm really looking forward to the Atom spec.

Thanks so much for pushing forthrightly for standards-compliance and well-formedness. I'm a happy customer. :D

Great news, Nick. As I posted over at Brent's, the full definition of what constitutes a valid Atom feed is still under discussion. My own preference would be for the accept/reject switch to happen at the level of DTD validity, with errors in content considered non-fatal. Your opinion on these (and any other) issues would be valued on the atom-syntax list.
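
To make the distinction concrete: a feed can be perfectly well-formed XML yet still fail DTD validity. A quick sketch, assuming the third-party lxml library is available and using a toy DTD invented purely for illustration:

    # Well-formedness vs. DTD validity, roughly. The DTD here is a
    # toy stand-in, not anything from the Atom draft.
    from io import StringIO
    from lxml import etree

    dtd = etree.DTD(StringIO('<!ELEMENT feed (title)>'
                             '<!ELEMENT title (#PCDATA)>'))

    # Parses fine (well-formed), but <subtitle> violates the DTD.
    doc = etree.fromstring('<feed><subtitle>hm</subtitle></feed>')

    print(dtd.validate(doc))   # False -- a validity error, not a
                               # well-formedness error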

It will be interesting to see how Postel's Law works in this situation.

Ironically, this page is not valid XHTML. In fact, it is so incredibly invalid that the W3C validator presents an error I've never seen before, and doesn't even show the offending source.

I also note that you are using Typepad, which lists as one of its selling points strict standards compliance (producing valid XHTML by default). But something you have done (I honestly do not know what) has slipped past Typepad's defenses.

Luckily, my browser has been *specially designed* to ignore author errors like this and display the rest of your page anyway. And thank goodness for that, otherwise we would not be having this stimulating conversation.

There are no exceptions to Postel's Law.

OK, I dug into the source for this page and figured out the problem. It's the trackback you received from "Znarf Infos" -- it contains characters that are illegal according to your specified character set. That is, in fact, what the W3C validator was trying to tell me, but I had never seen the error message before and didn't understand it.
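
For the curious, here's roughly the check the validator was performing: does every byte of the page actually decode under the charset the page declares? The payload below is invented for illustration; the real bad characters arrived via the trackback.

    # A hedged sketch of the check, not the W3C validator's code.
    def find_bad_byte(page_bytes: bytes, declared_charset: str):
        try:
            page_bytes.decode(declared_charset)
            return None  # page is consistent with its declaration
        except UnicodeDecodeError as err:
            return err.start, page_bytes[err.start:err.start + 1]

    page = b'<p>Znarf Infos says: \x9c</p>'  # hypothetical bad byte
    print(find_bad_byte(page, 'utf-8'))      # -> (21, b'\x9c')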

Now let's pretend that you were doing XHTML the right way. And by that, I mean that you were serving it with a MIME type of application/xhtml+xml. This triggers an unforgiving XML mode in Mozilla; if your page is not well-formed XML, it will display an XML debugging error instead of the contents of your page. This is analogous to the behavior you are suggesting incorporating into FeedDemon.
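
Here's a minimal sketch of that all-or-nothing behavior -- the function name render_strict is mine, not any real browser's API:

    # If the document is not well-formed, show the parse error instead
    # of the content. Same deal as the proposed FeedDemon behavior.
    import xml.etree.ElementTree as ET

    def render_strict(doc: bytes) -> str:
        try:
            root = ET.fromstring(doc)
            return 'rendering <%s> normally' % root.tag
        except ET.ParseError as err:
            return 'XML debugging error: %s' % err  # content never shown

    print(render_strict(b'<html><p>fine</p></html>'))
    print(render_strict(b'<html><p>oops</html>'))  # mismatched tag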

Let's further pretend that every browser works this way.

Now let's pretend that you were hyper-diligent with your smart quotes and your ampersands, and that you validated your page immediately after authoring it, either with the W3C's validator or by viewing it in Mozilla (making sure the page was visible in its strict XML mode). Or perhaps such validation could be built into Typepad itself, and it would not let you post an invalid page. Regardless, the page was valid when you posted it.

So you have done everything right, and yet the page is now invalid, due to circumstances completely and utterly beyond your control. Because someone linked to your page, and your (uncustomizable) publishing tools are buggy, and they wrecked your page even after you did everything right.

What happens next? Well, first of all, the discussion certainly stops, since everyone is using unforgiving browsers and all they see when they visit this page is an XML debugging error. Second of all, some of those people will be frustrated enough to hunt down your email address and tell you that your site is broken. Some of them might be intelligent enough to give you a URL; others will just curse at you and tell you you suck. (Surely you understand what I'm talking about -- you've dealt with end-user bug reports before.)

Keep in mind that, during all this, you don't have the slightest idea what's going on, since the page was valid when you authored it. Does Typepad even give you the option of deleting or editing trackbacks? I'm guessing it lets you delete them. Of course that assumes that you can figure out what the problem is in the first place.

But wait! Let's pretend that the administration page displays the trackbacks before it lets you delete them. But of course you can't see that page, for the same reason your users can't view your published page -- bad characters snuck in. Now you've got a catch-22, and you're sending emails off to Six Apart saying "WTF? I'm totally locked out of my admin page, and my readers are screaming at me, and why the heck am I paying for this grief?"

This is the world you're advocating, a world where clients enforce data quality, no exceptions. Think about who it hurts, before you go jumping into it whole hog.

Mark has a point. The only aspect that might make a difference is that Atom is made for machine-to-machine interoperability, while HTML is for machine-to-human.

I think that an Atom viewer for humans should have an option to be "tolerant" or "unforgiving"... On this, Tim Bray has a point.

My 2 cents.

The whole reason we have XHTML (XML) is that it was impossible to parse and reuse HTML docs consistently, because they were unpredictable in structure due to every syntax mistake under the sun being tolerated.

Now, if Atom and RSS are indeed XML, then in no way, shape, or form should they be tolerated in any other form -- i.e., as non-validating XML. If they are, then we lose the best feature of XML, and to some extent XML as a standard: predictable parsability.
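
A small sketch of what's at stake: the same malformed markup that a tag-soup HTML parser guesses its way through is rejected outright by an XML parser, so every XML consumer sees the identical structure or none at all.

    # Same broken markup, two parsers. The feed content is invented
    # for illustration.
    import xml.etree.ElementTree as ET
    from html.parser import HTMLParser

    soup = '<item><title>Oops & broken</title><item>'  # raw &, unclosed tag

    class Sniffer(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print('HTML parser shrugs and opens:', tag)

    Sniffer().feed(soup)  # no error -- it just guesses at the structure

    try:
        ET.fromstring(soup)
    except ET.ParseError as err:
        print('XML parser refuses:', err)  # not well-formed (invalid token)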

"Atom, however, is a new format, and there's a chance we can get it right."

This is a dangerous way of thinking, that gets proven wrong again and again. (Netscape: "what we need to do to win the browser war is rewrite the code. This time we'll get it right.")

Sure, we may have learned things since the last format, but new things have happened since then, too. People haven't gotten smarter, more responsible, or less lazy since the last format. So there's certainly no way we can "get it more right" on the human level.

On the other hand, computers got faster and parsers got better, so make them do the dirty work of suffering through poorly-formed XML.

The comments to this entry are closed.