Thursday, September 09, 2004

Comments

Extremely minor but as long as you're posting code... I'm curious as to why you don't use string.Empty instead of "".

This was actually the first piece of ASP.NET code I've written - so I wasn't aware of string.Empty :)

Heh, sometimes the same idea hits two brains at once. I was thinking along the lines of query string params to pass the date this morning when I stumbled on your article, but that's generally non-standard, so the HTTP header idea is probably the best.

I'm in Russia on paid-per-Mb cable, so I really feel the pain as my blog list grows bigger ;)

Nick,

I've been thinking about Scoble's recent blog entry as well. In fact I went back to check my log entries, and he has a point...

Here's a thought for FeedDemon that maybe you haven't considered. Use an approach similar to Bloglines and aggregate content for all your users. You can then push the status to your desktop clients using whatever method you like, and you could use a more efficient protocol between the client and your server.

This allows the FeedDemon server to make one status poll for all the users. If you look at your logs you'll see that this is exactly what Bloglines does, and I think it's a pretty decent approach.

This changes your model of selling only desktop software a bit, but it is worth giving it some thought.

Excellent! I'm working on an ASP.NET app that consumes loads of feeds, but I need to optimize the retrieval process. At the moment I'm using Atom.NET and RSS.NET (SourceForge projects) to load each feed and check its last-modified date. I guess it would be much more efficient to manually check each If-Modified-Since header before actually loading it. However, this field is null on every feed I've tried so far. Any idea why? Here's how I load the If-Modified-Since header (C#):

HttpRequest r = new HttpRequest(null, feed.FeedURL, null);
string lastmodified = r.Headers.Get("If-Modified-Since");

-kenny
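
A side note on why that header comes back null: If-Modified-Since is a request header that a client sends, so a freshly constructed HttpRequest has nothing in it; the date worth remembering is the Last-Modified header of the server's response. A minimal sketch of the client side of a conditional GET using HttpWebRequest (feedUrl and lastModified are illustrative names, with lastModified saved from an earlier successful fetch) might look like this:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(feedUrl);
request.IfModifiedSince = lastModified; // sends the If-Modified-Since header

try
{
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        // 200 OK: the feed changed, so remember the new date and read the body
        lastModified = response.LastModified;
        // ... hand response.GetResponseStream() to Atom.NET / RSS.NET here ...
    }
}
catch (WebException ex)
{
    HttpWebResponse error = ex.Response as HttpWebResponse;
    if (error == null || error.StatusCode != HttpStatusCode.NotModified)
        throw;
    // 304 Not Modified: nothing new to download
}

(This assumes a using System.Net; directive. GetResponse throws a WebException on a 304, which is why the check lives in the catch block.)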

The flaw in this approach is that you're relying on the aggregators supporting conditional GETs, which was the problem in the first place. Those clients that don't support conditional GETs will never send an If-Modified-Since header and therefore will always receive a freshly generated copy.
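
For readers who haven't seen the post's code, the server-side check being discussed looks roughly like this in an ASP.NET feed page (a sketch only, not the post's actual code; feedLastModifiedUtc stands in for however the feed's newest timestamp is tracked, and date parsing and time zones are glossed over):

string ifModifiedSince = Request.Headers["If-Modified-Since"];

if (ifModifiedSince != null && ifModifiedSince != string.Empty)
{
    // The aggregator told us which copy it has; if the feed hasn't changed
    // since then, answer 304 Not Modified with no body.
    DateTime since = DateTime.Parse(ifModifiedSince).ToUniversalTime();
    if (feedLastModifiedUtc <= since)
    {
        Response.StatusCode = 304;
        Response.SuppressContent = true;
        Response.End();
    }
}

Response.AppendHeader("Last-Modified", feedLastModifiedUtc.ToString("r"));
// ... otherwise generate and write the feed XML as usual ...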

mmj: I'd not be surprised to see cases where unconditional GETs for RSS feeds get returned either an error or some static data suggesting that an updated aggregator is required.
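
That could be as simple as a guard at the top of the same handler; another rough sketch, with placeholderFeedXml standing in for a tiny pre-built stub feed that tells the reader to upgrade their aggregator:

string ifModifiedSince = Request.Headers["If-Modified-Since"];

if (ifModifiedSince == null || ifModifiedSince == string.Empty)
{
    // This client never sends conditional GETs, so don't regenerate the
    // full feed for it; hand back a small static stub instead.
    Response.ContentType = "text/xml";
    Response.Write(placeholderFeedXml);
    Response.End();
}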

I can see a scenario in which this would break quite badly.

A shared cache (e.g. an ISP's proxy server) requests the feed for the first time, and stores the result.

Somebody else at the same ISP requests the same feed within 15 minutes. Since that user last checked the feed more than 15 minutes ago, the cache will see that its own copy is fresher, so it validates its copy (i.e. sends a request whose If-Modified-Since matches its copy's Last-Modified date). It receives a 304 response, passes that on to the client, and updates its own copy to reflect how recently it checked for freshness.

A third person requests the feed, again within 15 minutes. The cache notices that it's got a fresher copy, validates it again, and the same thing happens again and again.

As long as at least one user of this proxy requests the feed in each 15 minute period, no users will ever receive an up-to-date feed.

You can't fix this by switching off public caching, as that would completely undermine your efforts to save bandwidth.

It would be better to set an Expires header 15 minutes into the future. That way nobody should request the feed more often than that.
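
In ASP.NET that might look something like this (a sketch of Jim's suggestion, not anything from the post):

// Declare the feed fresh for 15 minutes so aggregators and shared caches
// wait that long before asking again, instead of revalidating on every hit.
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetExpires(DateTime.Now.AddMinutes(15));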

Actually, it's already happening and yes, it does break when shared caches come into play. See http://nick.typepad.com/blog/2004/05/rss_abuse_and_s.html and the "blog entry" mentioned there, for example.

I'd read it, but forgotten about it!

Well, that's not quite the same issue, as Nick's proposal here isn't about banning identical IP addresses. It's the same underlying principle, though; the wrong people are being left out because the system can't deal with shared caches effectively. It's just that the "identifying mark" is the Last-Modified header rather than the IP address.

mmj, I agree with Gwyn. If I were shelling out $$$ to pay for bandwidth consumed by aggregators that fail to support essential features like conditional HTTP GET, I'd ban them from retrieving my feeds. I wouldn't be surprised to see this happen with some high-profile feeds before too long.

Jim, are you saying that the proxy server would use its own Last-Modified date rather than the one FeedDemon uses in its HTTP request?

Please don't mind the "with" part; it's part of a whole post, but MT uses the excerpt to send trackbacks, so I suppose I can't add links there.

Sorry.

A nit: DateTime.UtcNow

The comments to this entry are closed.