Tuesday, August 29, 2006

Comments

I wasn't talking about all contexts for all sets of feeds, but rather about a single publication, the NY Times, where it works. It also worked for the BBC.

Having written my own share of syndication software I have to completely concur with Nick.

Feeds really need a guid/id to help consuming software avoid potential duplication issues. The title, link, and pubdate can even fail if, say for instance, the feed has no timestamp or the feed publisher reorgs their site and changes the link.
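A minimal sketch of the fallback order described above, in Python. It assumes each item is a plain dict with optional `guid`, `title`, `link` and `pubdate` keys; the names `dedup_key` and `drop_duplicates` and that dict layout are illustrative assumptions, not how FeedDemon or any other aggregator actually works.

```python
from typing import Optional


def dedup_key(item: dict) -> Optional[tuple]:
    """Return a hashable key for spotting duplicate items within one feed.

    Hypothetical helper: `item` is assumed to be a dict with optional
    'guid', 'title', 'link' and 'pubdate' string values.
    """
    guid = (item.get("guid") or "").strip()
    if guid:
        # A publisher-supplied ID survives title edits and link changes.
        return ("guid", guid)

    # Fallback: combine whatever weaker fields are present.
    parts = tuple(
        (item.get(name) or "").strip() for name in ("title", "link", "pubdate")
    )
    if not any(parts):
        return None  # nothing to compare on
    return ("fallback",) + parts


def drop_duplicates(items: list) -> list:
    """Keep the first item seen for each key; items with no key are always kept."""
    seen = set()
    kept = []
    for item in items:
        key = dedup_key(item)
        if key is not None and key in seen:
            continue
        if key is not None:
            seen.add(key)
        kept.append(item)
    return kept
```

Note that the fallback key changes as soon as the publisher edits a title, drops the timestamp, or reorganizes links, so without a guid the same post will be treated as a new item.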

Ah - sorry, Dave, I missed that. So I guess we're in agreement that relying on title for de-duping only works for a certain subset of feeds.

I'm a happy FeedDemon customer, but I would love an option on each feed to suppress items that have the same title and the same date. I have lots of problems with duplicate items in feeds.

Michael, could you share the URLs of a couple of these feeds?

Nick, it's happened on quite a few feeds. I did report it to your support once before. I don't think the problem is with FeedDemon though. Often I can see that an item has been changed by the author and appears as a duplicate. It hasn't happened this week except for one feed, but they had moved to WordPress and the items all had new dates.

I must say that it seems to be happening less and less lately, so hopefully the feeds are improving.

Nick, Mike,

I use RSS a lot; however, I have never set up an RSS feed (other than automagically with software), so I know nothing technically regarding RSS feeds. Forgive me if my question is a little pointless :)

Doesn't each item in an RSS feed have a unique ID, so that if you get the same item from 2 or more sources the copies are not listed because you have already seen them? This is how things work regarding usenet posts and it works pretty well IMHO. A number postfixed with the domain name of the original feed or something would work fine, eg [email protected] for this post then [email protected] for the next, etc.

Would this not work with RSS? As I said, I have very little knowledge; however, usenet has many more posts per day than each site has new items posted to their RSS feed (even Scoble can't post that much heh).

Or does something like this already exist but it doesn't work? Or it does work if people use it but nobody uses it :)

Cheers,

Morgan

Nick,

Your point is *very* well taken. More than half of my common feeds wouldn't work with Winer's "why the title works" solution. If it were just news, I wouldn't mind a few false positives. But for other feeds there are items I'd sure hate to miss.

Morgan, items in RSS 2.0 and Atom feeds are *supposed* to have unique IDs, but they often don't (and RSS 0.9x feeds never do).
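To see whether a given feed actually declares IDs, something like the following Python sketch could be used. It is only an illustration under stated assumptions: `item_ids` is a made-up name, only RSS 2.0 (`<rss><channel><item>`) and namespaced Atom 1.0 feeds are handled, and RSS 0.9x/1.0 or older Atom drafts would simply return an empty list.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace; older Atom drafts used a different one (not handled here).
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def item_ids(feed_xml: str) -> list:
    """Return the declared ID of each item/entry, or None where it is missing."""
    root = ET.fromstring(feed_xml)
    if root.tag == ATOM_NS + "feed":
        entries = root.findall(ATOM_NS + "entry")
        id_tag = ATOM_NS + "id"
    else:
        # Assumed RSS 2.0 layout: <rss><channel><item><guid>
        entries = root.findall("./channel/item")
        id_tag = "guid"

    ids = []
    for entry in entries:
        text = entry.findtext(id_tag)
        # Treat an absent or empty element as "no ID".
        ids.append(text.strip() if text and text.strip() else None)
    return ids
```

Any `None` in the result marks an item that gives the aggregator nothing better than title, link, and date to compare on.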

I subscribe to a blog where the author posts regular "Quote of the Day" (that's the title) posts.

Hi Nick,

I just had the problem again so I thought I'd post the feed:
http://dilbertblog.typepad.com/the_dilbert_blog/index.rdf

The post "Human Behavior" appeared as an unread item. I'm 100% sure all items in the feed were marked as read. I checked and there was no duplicate of the item in FeedDemon. Normally when I have this problem there is a duplicate of the item, but about 10% of the time it seems to just have "forgotten" that the item was read.

Regards
Mike

Nick: You are seeing Atom feeds without an id? I can see where many (most?) RSS feeds do not have a guid, but Atom has made id an explicit requirement from the beginning. I ask because it would surprise me to learn that a significant number of Atom feeds exist without one.

I just had exactly the same problem on the same feed. This time the post "Silent H" became unread, and there was no duplicate item.

Mike

Timothy, I have seen some Atom feeds without IDs, but admittedly very few.

I have the same problem here: the title is not enough. The publications even come from dpa (Deutsche Presse Agentur), and a lot of German newspapers rely on the dpa messages. But it is not straightforward to get unique titles and RSS items from their messages, because they are corrected now and then and sometimes deleted. So it can happen that the RSS feed is generated and then the original posting is dropped. That makes it very hard to determine whether the message is the same. Checking only the title is really not enough for newspapers.

So the idea isn't necessarily to use guid in RSS 2.0; rather, we have to determine uniqueness on a per-feed basis? That's the strongest justification for using Atom I've heard in a long while...
