An Attention Namespace for OPML
In a recent post I said that OPML would be a great format for sharing attention data, but I wasn't sure whether this would be possible due to uncertainty over OPML's support for namespaces. This afternoon I talked with Dave Winer and Steve Gillmor, and to make a long story short, I'm happy to report that namespaces will be supported by OPML. So an attention namespace for OPML seems like a fine idea at this stage.
As I mentioned previously, FeedDemon already stores attention data in OPML, but it uses a proprietary fd: namespace which relies on attributes that make little sense outside of FeedDemon. What I propose is that aggregator users and developers have an open discussion about what specific attention data could (and should) be collected by aggregators.
Although there's a lot of attention data that could be stored in OPML, my recommendation is that we keep it simple - otherwise, we risk seeing each aggregator support a different subset of attention data. So rather than come up with a huge list of attributes, I'll start by recommending a single piece of attention data: rank.
We need a way to rank feeds that makes sense across aggregators, so that when you export OPML from one aggregator, the aggregator you import into would know which feeds you're paying the most attention to. This could be used for any number of things - recommending related feeds, giving higher ranked feeds higher priority in feed listings, etc.
Although user interface and workflow differences require each aggregator to have its own algorithm for ranking feeds, we should be able to define a ranking attribute that makes sense to every aggregator. In FeedDemon's case, a simple scale (say, 0-100) would work: feeds you rarely read would be ranked closer to zero, while feeds you read all the time would be ranked closer to 100. Whether this makes sense outside of FeedDemon remains to be seen, so I'd love to hear from developers of other aggregators about this.
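To make the idea concrete, here is a minimal Python sketch of what writing and reading such a rank might look like. Nothing here is an agreed format: the namespace URI, the `attn` prefix, and the `rank` attribute name are placeholders chosen for illustration, and the 0-100 scale is just the one suggested above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix; no attention namespace is defined yet.
ATTN_NS = "http://example.org/attention"
ET.register_namespace("attn", ATTN_NS)
RANK_ATTR = f"{{{ATTN_NS}}}rank"  # ElementTree's "{uri}name" form


def build_opml(feeds):
    """Build an OPML string; feeds is a list of (title, xml_url, rank 0-100)."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "Subscriptions"
    body = ET.SubElement(opml, "body")
    for title, xml_url, rank in feeds:
        if not 0 <= rank <= 100:
            raise ValueError(f"rank out of range: {rank}")
        ET.SubElement(body, "outline", {
            "text": title,
            "type": "rss",
            "xmlUrl": xml_url,
            RANK_ATTR: str(rank),
        })
    return ET.tostring(opml, encoding="unicode")


def read_ranks(opml_text):
    """Return {xmlUrl: rank} for every outline that carries the rank attribute."""
    root = ET.fromstring(opml_text)
    ranks = {}
    for outline in root.iter("outline"):
        value = outline.get(RANK_ATTR)
        if value is not None:  # outlines without a rank are simply skipped
            ranks[outline.get("xmlUrl")] = int(value)
    return ranks
```

An aggregator that doesn't know about the namespace would just see ordinary `outline` elements with an extra attribute it can ignore, which is the main appeal of doing this inside OPML.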
Beyond rank, what other attention data do you think aggregators should collect? And how should they use that data to serve you better?

Posts
I just have a note for aggregator developers: you need to make sure that high-traffic feeds get treated properly. E.g. when I'm subscribing to an aggregated feed such as http://del.icio.us/rss/tag/rails I'll only click on a few of the feed's items, yet that doesn't mean this feed isn't as important to me as a feed where I read every single article.
I'm not sure how one could properly handle this situation on the OPML level though -- maybe there need to be both a metric of clicks/item (which is the rating you propose) and a metric of items/time; or something else that would allow aggregators to cope with high-traffic feeds.
See also: http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/
Nick, I'm looking forward to your implementation, I think it's a great idea.
Posted by: mardoen | Thursday, November 17, 2005 at 07:21 PM
That's an excellent point, Martin, and it's one that a very sharp FeedDemon user named Radek has raised in our forums:
http://www.newsgator.com/forum/shwmessage.aspx?forumid=7&messageid=9245
Taking clicks/item into account when determining rank makes a lot of sense. Perhaps the attention namespace should include attributes for (1) the number of items in the feed when it was exported (2) the number of clicks that feed received (3) the date the user subscribed to the feed. These values could help aggregators determine attention without forcing them to use a specific algorithm.
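As a rough sketch of how an importing aggregator might use those three values, assuming hypothetical field names (`item_count`, `click_count`, `subscribed`) that no format defines yet:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FeedAttention:
    item_count: int   # items in the feed when it was exported
    click_count: int  # clicks the feed received
    subscribed: date  # date the user subscribed to the feed


def clicks_per_item(a: FeedAttention) -> float:
    """Clicks relative to feed size; 0.0 for an empty feed."""
    return a.click_count / a.item_count if a.item_count else 0.0


def clicks_per_day(a: FeedAttention, exported: date) -> float:
    """Clicks relative to subscription age, counting at least one day."""
    days = max((exported - a.subscribed).days, 1)
    return a.click_count / days
```

How these ratios feed into a rank is left to each aggregator; the sketch only shows that the raw values are enough to derive them.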
Posted by: Nick Bradbury | Thursday, November 17, 2005 at 08:13 PM
Thanks for the link, there are some really excellent ideas in there!
Posted by: mardoen | Thursday, November 17, 2005 at 08:31 PM
Oh, one thing I forgot to mention in my previous comment is that the newspaper views used by many aggregators (including FeedDemon) make tracking item clicks tricky at best.
For example, in FeedDemon I'll often read an entire page of news items, and I never click on any of them - but I did read them. This is why FeedDemon also tracks clicks at the feed level, and includes that in its attention algorithm.
Posted by: Nick Bradbury | Thursday, November 17, 2005 at 08:35 PM
I'm wondering if there's any way to track changes in rank over time. Putting it in physical terms, if rank is equivalent to speed, I'd like to measure acceleration (and deceleration.)
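One way to picture this, purely as a sketch: if an aggregator kept dated snapshots of a feed's rank (an assumption, nothing proposed so far stores a history), the change per day between snapshots would be the "acceleration" in this analogy.

```python
from datetime import date


def rank_change_per_day(snapshots):
    """snapshots: list of (date, rank) pairs sorted by date.

    Returns the rank change per day between consecutive snapshots.
    Positive values mean rising attention, negative values falling.
    """
    rates = []
    for (d0, r0), (d1, r1) in zip(snapshots, snapshots[1:]):
        days = (d1 - d0).days
        if days > 0:  # skip duplicate or out-of-order dates
            rates.append((r1 - r0) / days)
    return rates
```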
Posted by: Ed Batista | Friday, November 18, 2005 at 12:31 AM
Nick, let's try and coordinate our tags and namespaces for OPML in your stuff and BlogBridge. Can you contact me directly over email? Thanks! Pito
Posted by: Pito Salas | Friday, November 18, 2005 at 08:33 AM
Nice to see you here, Pito! I'd really like to keep this discussion public - any chance you'd be willing to discuss this here (or in your own blog) instead of by email?
Posted by: Nick Bradbury | Friday, November 18, 2005 at 09:22 AM
Ed, how rank has changed over time would certainly be useful to know. I'm wondering, though, how it could be expressed in a way that's not tied to a specific application or algorithm?
Posted by: Nick Bradbury | Friday, November 18, 2005 at 09:26 AM
Nick, do you have an example of the data you're using (proposing), please?
I'm particularly interested in seeing what advantage there is in putting attention terms in namespaces inside OPML, compared to say RSS. The reason I have doubts is that unlike RSS, which generally follows the XML+namespaces approach to structure and semantics, OPML has its own interpretation/extension point, the type attribute (see http://dannyayers.com/archives/2005/11/18/opml-revisited-2/ ).
Posted by: Danny | Friday, November 18, 2005 at 10:08 AM
I was discussing this the other week with my brother. We both read our news in a newspaper format and have difficulty with ensuring our collections are completely read. I use Bloglines and prioritise by moving feeds up and down between collections as my ranking system. Sometimes I'll read a collection that has many unread feeds only to get through half of them. I want to tell my reader the point which I've managed to reach in that session.
So, I'd like to see a news reader that has the ability to mark feed items in my collections as read as I scroll and view them on screen, or an option to click and say "I've read to here". I see this viewed item information combined with a feed subscription date, a "last read" date, as well as a click-through counter, and my own human ranking - like BlogBridge does - all used to generate my rank. On top of this, the ability to filter by rank methods.
I'd also like the scroll view counter for items to be intelligent and know when I've skipped content by scrolling too quick to read items.
As for the namespace. Rank, yes. Definitely. Please.
As for number of items in a feed on export, I don't believe it is needed. The aggregator that imports can get that info from a "last read" date stamp for a feed instead.
The number of clicks that feed received should be related to its rank as well. But for the sake of tracking what users value most, maybe it's worth including. On the flip side, partial feed content getting more click-throughs doesn't mean it's any more important or deserving of attention than feeds I might simply read and not click through to.
I'm just not sure clicking through to comment or to learn more about a site can be separated, in attention terms, from clicking through to simply read the entire article.
A definite yes to click-through counting only on full feeds for me, though. Not easy to do, I'd think.
The date the user subscribed to a feed, definitely. Useful for plotting a user's feed reading history. Imagine tracking your interests over time and charting that by topic. :) I could reminisce about all the feed reading phases of my life. lol.
So my feed namespace recommendations would be: Rank, Last Read Date and Subscription Date.
Sorry for the long comment. :)
Posted by: Craig | Friday, November 18, 2005 at 10:25 AM
Nick, no problem with being public. I just worry about details being lost when doing this kind of thing in a discussion thread. But I will gladly post here and in my own blog. I'm working on the info.
Posted by: Pito Salas | Friday, November 18, 2005 at 10:28 AM
Danny, as you recall from our previous discussion, I'm proposing this not because of its technical merits (I freely admit that there are technically superior solutions), but instead because existing aggregators already support OPML. This means that users don't need to import attention separately - that to me is the biggest benefit.
At this stage I don't have examples, as I'm simply asking for input on what attention data should be stored in OPML.
Posted by: Nick Bradbury | Friday, November 18, 2005 at 10:34 AM
Rank should be a float with a range of 0 to 1. This way any aggregator could scale the number to whichever local scale they wished.
5 stars would be: int (rank * 5)
Zagat Rating would be: int (rank * 30)
This way more than just simple newsfeeds could be ranked. Any URI / web resource can be rated. This could include an endpoint for a web service.
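The scaling above can be sketched in a few lines of Python. The function name `scale_rank` is made up for illustration; the only thing taken from the formulas is the `int(rank * top)` conversion.

```python
def scale_rank(rank: float, top: int) -> int:
    """Map a normalized rank in [0, 1] onto a local scale of 0..top."""
    if not 0.0 <= rank <= 1.0:
        raise ValueError(f"rank must be between 0 and 1, got {rank}")
    return int(rank * top)  # int() truncates toward zero


def stars(rank: float) -> int:
    return scale_rank(rank, 5)


def zagat(rank: float) -> int:
    return scale_rank(rank, 30)
```

One thing to be aware of: because `int()` truncates, only a rank of exactly 1.0 reaches the top of the local scale. An aggregator that wants the top value to be easier to hit might prefer `round()` instead.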
Posted by: Ted Tschopp | Friday, November 18, 2005 at 11:37 AM
Er, yes Nick, but existing aggregators already support RSS ;-) But you're right - figuring out what data points are needed (in the attention domain model) is definitely the first job.
Ok, "rank" seems worth having, but I'd suggest the definition needs to be clearer - i.e. what is the ranking value a measure of?
Another question: how do the characteristics listed in Attention.xml line up against the ones you've actually been using in FeedDemon? If there's even a moderate match, those terms might be the best starting point (even if a different format is used).
Another reference for you: "MeNow" - attention/presence stuff:
http://crschmidt.net/semweb/menow/
Posted by: Danny | Friday, November 18, 2005 at 12:34 PM
Using attributes already defined by Attention.xml makes a lot of sense, and would certainly simplify transforming between the two formats.
Of the attention.xml attributes, the obvious ones to use at the feed level are etag, tags, lastupdated, lastread and dateadded. Dateremoved could also be useful, since that could tell the importing aggregator not to recommend feeds you removed from the exporting aggregator.
Of course, it could be reasonably argued that we might as well use all of the attention.xml attributes, but IMO a simple subset is a good starting point.
Posted by: Nick Bradbury | Friday, November 18, 2005 at 01:17 PM
Yeah, the definition of 'rank' does need to be clearer, but it's tricky to do that without trying to attach it to a specific algorithm (which I believe would be a very bad idea).
In my mind, 'rank' simply expresses how important a feed is to the user. The higher the value, the more important it is. It's up to each aggregator as to how 'rank' is calculated, but it must be within a specific range of values.
BTW, I agree with Ted that a float between 0 and 1 makes more sense than an integer value.
Posted by: Nick Bradbury | Friday, November 18, 2005 at 01:23 PM
Isn't the best algorithm highly personalized? With iTunes, I spend my time tweaking my Smart Playlists for what I value rather than depending on Apple to give me some baked-in solution.
Posted by: Robert Gable | Friday, November 18, 2005 at 01:44 PM
Robert, that's my point exactly - the algorithm is defined by the application, not the format.
Posted by: Nick Bradbury | Friday, November 18, 2005 at 01:57 PM
Ok, but if your application has rank by page-view count, and my application has rank by cat-photo count (according to our local measures of importance), and we exchange data, is a 'neutral' rank measure in the format going to tell either of us anything useful? i.e. the value is 100. 100 what?
Posted by: Danny | Friday, November 18, 2005 at 04:03 PM
Sorry, I missed this bit:
"...attention.xml attributes, but IMO a simple subset is a good starting point"
- works for me ;-)
Posted by: Danny | Friday, November 18, 2005 at 05:27 PM
Hi Nick,
Nice post :-)
What do you make of the 'Vote' attribute idea?
See: http://blogs.msdn.com/alexbarn/archive/2005/11/18/494369.aspx
Posted by: Alex Barnett | Friday, November 18, 2005 at 07:20 PM
I want to reiterate what Danny is saying (at least what I think he is saying). Storing attention data from which you can calculate a rank would be far more useful than the rank itself.
Rank is only meaningful to the application that created it. Trying to exchange rank between applications would be like trying to compare apples and oranges.
Posted by: James Holderness | Friday, November 18, 2005 at 11:32 PM
James, I agree 100% that it would be more useful to store attention data that could be used to calculate rank, and that's actually part of my goal here - to determine exactly what data should be captured.
However, each aggregator may have application-specific attention data that makes little sense elsewhere. In FeedDemon's case, drag-and-dropping an item into a news bin increases its rank - but this action only makes sense in FeedDemon. Likewise, flagging an item increases its rank, but many aggregators don't support flagging items.
Since each aggregator will have its own method of determining rank, a generic 'rank' attribute could help the importing aggregator determine how important each feed is to the user. If the exporting aggregator includes enough attention data in the OPML for the importing aggregator to calculate rank, then the rank attribute could be ignored.
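In code, the importing side of that might look something like the sketch below. The attribute names (`itemCount`, `clickCount`, `rank`) are hypothetical stand-ins, and `compute_local_rank` represents whatever algorithm the importing aggregator already uses.

```python
def importer_rank(attrs, compute_local_rank):
    """Decide how important a feed is from its exported attention attributes.

    attrs: dict of attribute name -> string value read from one outline.
    compute_local_rank: the importing aggregator's own ranking function.
    """
    needed = ("itemCount", "clickCount")  # hypothetical raw-data attributes
    if all(name in attrs for name in needed):
        # Enough raw data: use our own algorithm and ignore the exported rank.
        return compute_local_rank(attrs)
    if "rank" in attrs:
        # Otherwise fall back to the exporter's generic rank.
        return float(attrs["rank"])
    return None  # no attention data at all
```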
I don't believe exchanging rank between applications would have to be like comparing apples and oranges - I think of it along the same lines as sharing a list of your favorite artists between music stores. If one aggregator knows which feeds you read the most, then surely that's useful information to another aggregator?
Posted by: Nick Bradbury | Saturday, November 19, 2005 at 07:14 AM
Thanks for your comment re: the vote attribute. I agree with you when you wrote:
"subscribing to a feed is automatically a vote for it."
I want to clarify. I wrote:
"A vote would work at the item level. (I repeat: by item I mean RSS item, webpage, blog post, podcast, or video or whatever - if it has an url it can be voted for). Voting would be explicit, requiring a user action, maybe a quick check of a box. "
So, I'm not talking of a 'vote' at the feed level, I'm talking at a more granular level - specific content the feeds point to that has value to you.
Posted by: Alex Barnett | Saturday, November 19, 2005 at 12:38 PM
I agree with Alex - The blog world is getting so large that reading all the items within each feed is really starting to hurt. I want to keep up with various feeds, but can no longer keep up with all the items within the feeds.
Before we start thinking of rank, we have to start thinking of categorization, and the ability to find/seek what we are looking for. Having been around for a while, I remember the web before search (AltaVista, Google, etc.), and how difficult it was to find something when you didn't know where to look. Now that it has been categorized (still problems, just ask Scoble), we are turning our attention to rank and context to help find information in the clutter.
We all know that blogs and the contributions to blogs are growing like wildfire - and we are now producing so much that we can't find what we are looking for, or the inverse, weed out what we don't want.
OPML is not the answer, nor do I think a centralized search engine is either. Thinking outside of the box for a second, I would like to think of DNS. It contains information that allows us to find what we are looking for, and it's not centralized.
Could we not do for tags what we have done for .net, .com, .org, etc.? Could we not have tags that are registered within a library/DNS that we could then point to an "item" (as defined by Alex in the previous post) rather than a computer with a numerical address?
Tags are also something that I'm working on to categorize marketing messages, but I don't want to "subscribe"; I want to publish what I'm interested in and have "items" delivered to me - now that would be aggregation at its finest (IMHO).
Posted by: Richard Ruekema | Saturday, November 19, 2005 at 02:03 PM