
Wednesday, October 06, 2004

Comments


I still think you should include a 'paid feeds' group in FeedDemon. I would appreciate the extra news sources, as long as they were put in a special area. It would also make FeedDemon more useful right out of the box. Reconsider!

Now that RSS and blogs have infected my life, I find myself spending more of my day filtering and consuming information. Before, I had a few select sites that I would visit each day, with very little new information on them. Now there is a huge influx of information into FeedDemon every day, and it takes hours to wade through it all. First thing in the morning is especially bad. I've had a few ideas on how news aggregation could be improved in this respect:

1. Bayesian filtering/ranking in a similar vein to many antispam tools. Posts with certain keywords (e.g. source control, code generation) get ranked up, while posts with other keywords (e.g. kids, vacation) get ranked down. It would need to be categorizable as well: given the diversity of feeds one can consume, a post may rank well for "Software Development" but poorly for "Sports". If a post scores above some threshold, it is flagged/bucketed for that category. See POPFile for an antispam solution that works using this system.
(Besides the post body, the ranking process would also take into account the feed, the author, and other "metadata".)

2. Moderation, similar to Slashdot. Some qualities of a post would be extremely difficult for a computer to rank based on the words it comprises, such as "Humor", "Usefulness", or "Inflammatory". One could manually rank a post for certain attributes and have the aggregator post that back to some central authority, where other aggregators can consume this info and report it back to their users (e.g. "10587 users have flagged this post as funny.").

3. Bandwidth conservation. This is becoming an issue for some popular feed sources, such as weblogs.asp.net. Aggregators and publishers need to come up with, and agree upon, some extension to RSS that saves bandwidth while keeping things simple for the user. I've had one idea on this: a standardized feed URL notation that uses time information to split the feed into multiple files, allowing aggregators to get only the pieces they need. For example, if the "main" feed URL is http://www.mywebsite.com/feed/Temporal/Daily/Current.xml
then a "temporal aware" aggregator knows it can look for /feed/Temporal/Daily/2004-10-05.xml to get yesterday's news. This would allow feed publishers to split their feed on a frequency appropriate for their site (weblogs.asp.net would probably want "Hourly" instead of "Daily", while a single author may want "Monthly"). Any aggregator could then figure out, based on the interval and the last time it checked the feed, which files it needs to request. I think the URL could take two standardized forms: one for static file feeds, such as the example above, and one for dynamic feeds that are built on the fly by code, such as www.mysite.com/TemporalFeed.aspx?Daily=Current
Basically, one uses a file path and the other uses a querystring.

4. A standardized protocol for viewing and posting comments. If I'm viewing the post body in my news aggregator, I shouldn't have to browse through to your website to post a comment. This would also allow my reader to automatically flag posts that I comment on, and then monitor each such post and its comments for updates or replies.

5. A quick one for FeedDemon: in the newspaper view, if a post is extremely long, truncate it, with the option to view the whole post if I choose. Maybe some DHTML to auto-hide/show the whole post would suffice.

6. Another seemingly simple one that FeedDemon could do: "Related links" based on links in the post or trackbacks. In newspaper mode, I envision the main post taking up three-quarters of the page width, with the links in small text along the right margin.
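For idea 1, here is a minimal sketch of per-category ranking with a naive Bayes classifier and Laplace smoothing, roughly in the spirit of the antispam tools mentioned. It assumes the user trains each category by marking posts as relevant or not; `CategoryRanker`, the `meta:` tokens used for feed/author metadata, and the 0.5 threshold are hypothetical choices, not part of FeedDemon or POPFile.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text, meta=()):
    """Lowercase word tokens plus prefixed metadata tokens (feed, author, ...)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + [f"meta:{m.lower()}" for m in meta]


class CategoryRanker:
    """Hypothetical naive Bayes ranker for a single category."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, relevant, meta=()):
        self.word_counts[relevant].update(tokenize(text, meta))
        self.doc_counts[relevant] += 1

    def score(self, text, meta=()):
        """Return an estimate of P(relevant | post) in [0, 1]."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_prob = {}
        for label in (True, False):
            # Laplace-smoothed class prior
            lp = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            n_words = sum(self.word_counts[label].values())
            for tok in tokenize(text, meta):
                # Laplace-smoothed word likelihood (unseen words get count 0 + 1)
                count = self.word_counts[label][tok]
                lp += math.log((count + 1) / (n_words + len(vocab) + 1))
            log_prob[label] = lp
        # Numerically stable logistic function of the log-odds
        log_odds = log_prob[True] - log_prob[False]
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)


rankers = defaultdict(CategoryRanker)  # one ranker per category name


def categories_for(text, meta=(), threshold=0.5):
    """Categories whose score exceeds the threshold (the post is bucketed there)."""
    return [name for name, r in rankers.items() if r.score(text, meta) > threshold]
```

Training "Software Development" with a post about source control as relevant and one about a family vacation as irrelevant is enough for a new post about code generation to score above 0.5 and a post about kids and vacations to score below it. Passing `meta=("feed:weblogs.asp.net", "author:someone")` lets the feed and author influence the score alongside the body text.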
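For idea 2, the central authority would at minimum need to count flags without letting one user inflate them. A hypothetical sketch of that bookkeeping (`ModerationTally` is an illustrative name; how aggregators would submit and fetch these counts is left open, as in the comment):

```python
from collections import defaultdict


class ModerationTally:
    """Hypothetical central service: each user may flag a post once per attribute."""

    def __init__(self):
        # (post_id, attribute) -> set of user ids who applied that flag
        self._flags = defaultdict(set)

    def flag(self, user_id, post_id, attribute):
        self._flags[(post_id, attribute.lower())].add(user_id)

    def count(self, post_id, attribute):
        # .get() avoids creating empty entries for unflagged posts
        return len(self._flags.get((post_id, attribute.lower()), ()))

    def summary(self, post_id, attribute):
        n = self.count(post_id, attribute)
        noun = "user has" if n == 1 else "users have"
        return f"{n} {noun} flagged this post as {attribute.lower()}."
```

Storing a set of user ids per (post, attribute) pair makes repeated flags from the same user idempotent.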
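For idea 3, an aggregator consuming the static-file form of the proposed URL convention could work out which files to request as follows. This is a minimal sketch that handles only the "Daily" interval and the `YYYY-MM-DD.xml` naming from the example; `daily_files_to_fetch` is a hypothetical helper, and the querystring form for dynamic feeds is not covered.

```python
from datetime import date, timedelta


def daily_files_to_fetch(feed_dir, last_checked, today):
    """URLs a "temporal aware" aggregator would request for a Daily feed.

    feed_dir is the directory holding Current.xml, for example
    http://www.mywebsite.com/feed/Temporal/Daily
    """
    urls = []
    day = last_checked
    # Every day since the last check, including the day of the check itself,
    # since entries may have been added after the aggregator last looked
    while day < today:
        urls.append(f"{feed_dir}/{day.isoformat()}.xml")
        day += timedelta(days=1)
    # Today's entries are still accumulating in Current.xml
    urls.append(f"{feed_dir}/Current.xml")
    return urls
```

An aggregator that last checked on 2004-10-04 and runs on 2004-10-06 would request 2004-10-04.xml, 2004-10-05.xml, and Current.xml; one that already checked today would request only Current.xml.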
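For idea 5, the truncation step itself is simple for plain text. A sketch assuming a hypothetical 500-character default limit; truncating HTML safely would also require closing any tags left open, which this does not attempt.

```python
def truncate_post(text, limit=500):
    """Return (excerpt, was_truncated), cutting at the last space before limit.

    Plain text only: HTML would also need its open tags closed.
    """
    if len(text) <= limit:
        return text, False
    cut = text.rfind(" ", 0, limit)
    if cut <= 0:  # no space to break on; fall back to a hard cut
        cut = limit
    return text[:cut].rstrip() + "...", True
```

The `was_truncated` flag is what the view would use to decide whether to show a "view the whole post" control.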
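For idea 6, the "Related links" list could be gathered from the post's HTML with Python's standard `html.parser`, with trackback URLs appended afterwards. A sketch; `related_links` is a hypothetical helper, and trackback URLs are assumed to be already known to the aggregator.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags, in order, without duplicates."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href not in self.links:
                self.links.append(href)


def related_links(post_html, trackbacks=()):
    """Links found in the post body, then any trackback URLs not already listed."""
    collector = _LinkCollector()
    collector.feed(post_html)
    collector.close()
    links = collector.links
    for url in trackbacks:
        if url not in links:
            links.append(url)
    return links
```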

That's what I've got off the top of my head.

The biggest issue with RSS or blogging is the threat from within. The problem with mainstream media is that they live in a cocoon. Mainstream writers and reporters do nothing but talk, write, and preach to the choir (i.e. each other). The reason blogging has taken off is that bloggers are pushing to find new sources from around the 'net. But if that were to stop, the blogging movement would really come to a halt. Because RSS and blogging make personal publishing possible, this becomes a human issue, and humans tend to form cliques. We would wind up with a bunch of groups living in cocoons.

Secondly, keeping RSS/blogging as open as possible is vital. Already we've seen problems with comment and trackback spam causing some people to turn those features off, essentially blocking feedback and links. Without those connections, blogging turns into a medium we already have: newspapers and mainstream media websites.

Technologically, as you have mentioned, RSS readers inherit the security problems that come with the web. At this point, only a slip in diligence will keep your efforts from being a success.

One area that I see being a future source of problems and embarrassment is information leaking out of "private" feeds.

RSS has proven itself useful enough that users are starting to want feeds of *everything*, including potentially sensitive info. In my office, RSS is slowly replacing email as the preferred route for distributing memos and internal announcements.

Basically, if a web application with feeds has authentication more complex than the HTTP Auth or cookies that most client aggregators support (e.g., Columbia has some pretty convoluted systems of redirects, which I certainly wouldn't expect FeedDemon or anyone else to support), you have to poke a hole in your security to make the feeds accessible and useful. That's bad enough, but if users use a centralized aggregator like Bloglines, then you also have to trust them with your sensitive data. With the recent work on Bloglines + FeedDemon type integration, we may even see ways for info to leak from the client up to a public server without the user being aware.

For example, at work our bug-tracking database and internal blogs have RSS feeds. These systems normally require authentication to access, but we opened up the feeds so people could subscribe to them. Our info isn't particularly sensitive, but other companies' might be.

Anyway, I won't be surprised if a few companies learn the hard way in the next few years that they can't open up internal RSS feeds to external readers, no matter how useful it may seem to their employees. We see this kind of thing happen with leaked email all the time; it's just a matter of time before it starts happening with RSS.

Unfortunately, I don't really see any easy solution to the problem. You could probably concoct some crazy solutions with encrypted feeds, but I don't know if they would really be worth the trouble.

Jim, that's quite a comment! Although you're really talking about features rather than security issues, you've touched upon a number of areas that I'm also researching, some of which require a server-side component to be useful. This is one of the big reasons I'm working with server-side aggregator developers - synch is just the start, IMO.

Anders, you raise an interesting point about accidental leaking of private data. I'm not sure that extra layers of encryption/security are always necessary, though.

I can see many users treating feed readers strictly as notification tools. For example, FeedDemon would be your notification inbox that lets you know when new information is available, but if that information is sensitive, you'd use another tool to access it.

Xueilonox, I believe the creation of blogging cliques is unavoidable. However, there will always be bloggers who act like editors for their clique, collecting links to information they believe will be of interest to their readers. The end result is that someone like me can subscribe to a small number of feeds yet still be in tune with a wide array of topics. Quite honestly, this is one of the things about blogging and feed reading that I find most interesting, and it enables escaping the cocoon without much effort.

Anders--

Perhaps sensitive data shouldn't be stored in an RSS feed at all. Or if it is, maybe it needs to be encrypted in some way that is only reversible by my reader. Perhaps "attach" the sensitive data as UUEncoded data in the body of the item, which decodes to a password-protected file format. That in itself would be a cool feature that could facilitate security.
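The UUEncode step of this idea could look like the following sketch, using Python's standard `binascii` module. It covers only wrapping an already-protected file as text inside a post and recovering it; the password-protected file format itself is not shown, and `uuencode_attachment`/`uudecode_attachment` are hypothetical names.

```python
import binascii


def uuencode_attachment(data: bytes, name: str = "attachment.bin", mode: int = 0o644) -> str:
    """Wrap bytes in classic uuencode text ("begin ... end")."""
    lines = [f"begin {mode:o} {name}"]
    # uuencode lines carry at most 45 raw bytes each
    for i in range(0, len(data), 45):
        chunk = binascii.b2a_uu(data[i:i + 45], backtick=True)
        lines.append(chunk.decode("ascii").rstrip("\n"))
    lines.append("`")  # zero-length line marks the end of the data
    lines.append("end")
    return "\n".join(lines) + "\n"


def uudecode_attachment(text: str) -> bytes:
    """Recover the original bytes from uuencode_attachment() output."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("begin "):
        raise ValueError("missing 'begin' line")
    out = bytearray()
    for line in lines[1:]:
        if line == "end":
            break
        out += binascii.a2b_uu(line)
    return bytes(out)
```

Note that UUEncoding is only a text-safe wrapper, not protection: anyone can decode it, so the security would have to come entirely from the file format inside.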

Personally, I am still uneasy with the concept of pulling sensitive information via newsfeeds. The thought of having my credit card statement right next to the latest post on slashdot about "Micro$oft is teh Evil!" seems weird to me. Maybe some visual division between "public" and "private" feeds needs to be established.

In my opinion, the general public isn't ready to go this far with aggregation. People need to be educated not to give out their passwords for chocolate (http://www.securitypipeline.com/news/18902074) before they will be ready for secure, private, aggregated, personalized web content.


PS: This comments box is way too small.

One minor feature I would find tremendously useful is some form of feed relocation information for when a blog moves to new blog software, domain name etc. This would remove the pain of having to hunt down the new feed link and update it manually.

I would probably want to approve any relocations.
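One way to handle relocation with approval, assuming the old server answers with an HTTP permanent redirect (301, or 308) whose Location header holds the new absolute feed URL. The comment doesn't specify a mechanism, so this is a hypothetical sketch: the aggregator records the proposed move and switches the subscription only after the user approves it.

```python
from dataclasses import dataclass
from typing import Optional

PERMANENT_REDIRECTS = (301, 308)


@dataclass
class Subscription:
    url: str
    pending_url: Optional[str] = None  # proposed new location awaiting approval


def note_response(sub: Subscription, status: int, location: Optional[str]) -> None:
    """Record a permanent redirect as a pending relocation instead of applying it."""
    if status in PERMANENT_REDIRECTS and location and location != sub.url:
        sub.pending_url = location


def approve_relocation(sub: Subscription) -> None:
    if sub.pending_url:
        sub.url, sub.pending_url = sub.pending_url, None


def reject_relocation(sub: Subscription) -> None:
    sub.pending_url = None
```

Temporary redirects (302, 307) are deliberately ignored, since they don't signal that the feed has moved for good.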

Good question! But I don't think it has anything to do with RSS. The question you are asking, IMO, is this: why are we disinformed, and what can we do to change that?
Of course it is not simple to answer. In a sentence: I think we are disinformed because information equals money and power.
RSS, like TV, radio, newspapers, and other wonderful things, will be used as a vehicle for disinformation. To change this, we need to think very, very hard and find other, new ways of organizing society. Yes, I know, it is a long debate... :-) but sooner or later we will have to think about it seriously and not let other people think for us.

In my personal opinion, security is the user's responsibility and fault. We can all add things to software, but at the end of the day, if Joe Bloggs downloads something and then uses an aggregator, he's going to blame the aggregator. I do believe, though, that the raw data of the XML file should be scanned after downloading, before extracting the nodes; that way the whole structure could be examined, protecting the user a little more. But how many aggregators are going to implement a full XML 'clean' is another matter!
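A pre-parse scan of the raw download might look like this minimal sketch. It rejects oversized files and any inline entity declarations, which ordinary RSS doesn't need and which enable entity-expansion attacks. A plain DOCTYPE (such as the one RSS 0.91 feeds carry) is deliberately allowed through. The size cap, `prescan_feed`, and `UnsafeFeedError` are hypothetical, and the byte-level check assumes an ASCII-compatible encoding.

```python
import re

MAX_FEED_BYTES = 2_000_000  # hypothetical cap; tune per application

# Inline entity declarations (in a DTD internal subset) are not needed by
# ordinary RSS, and they are how entity-expansion attacks are built
_ENTITY_DECL = re.compile(rb"<!ENTITY")


class UnsafeFeedError(ValueError):
    pass


def prescan_feed(raw: bytes) -> bytes:
    """Reject obviously hostile input before any XML parsing happens.

    Assumes an ASCII-compatible encoding (UTF-8, ISO-8859-1, ...); a UTF-16
    document would need decoding before this byte-level check is meaningful.
    """
    if len(raw) > MAX_FEED_BYTES:
        raise UnsafeFeedError(f"feed is {len(raw)} bytes; limit is {MAX_FEED_BYTES}")
    if _ENTITY_DECL.search(raw):
        raise UnsafeFeedError("feed declares its own entities")
    return raw
```

The check is conservative: the literal text `<!ENTITY` inside a CDATA section would also be rejected, a trade-off that favors safety over accepting every well-formed feed.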

Excellent -- you're asking the right questions. One possible but little-understood risk comes from the XML libraries that aggregators use to parse the RSS. XML has ways to tell a parser to pull in information from external sources (such as schemas, DTDs, stylesheets, and even document fragments), and some parsing libraries will support that silently and by default. The results could range from denial-of-service (with deliberately malformed XML) to unintentional disclosure (remote server logs show what feed you're reading and when) to forged information in the feed itself. Check your XML parser options, and make sure you have DTD validation, schema validation, external entities, and anything else like that turned off.


David Megginson
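As an illustration of checking parser options, here is what the relevant switches look like with Python's standard Expat-based `xml.sax` parser: a sketch that turns off external general and parameter entities before parsing and collects `<title>` text. Python 3.7.1 and later already default external general entities to off, so setting them explicitly mainly documents intent; Expat is non-validating, so DTD and schema validation don't apply here. `TitleCollector` and `parse_titles` are illustrative names.

```python
import io
import xml.sax
from xml.sax.handler import feature_external_ges, feature_external_pes


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element in document order."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # text fragments while inside <title>, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title" and self._buf is not None:
            self.titles.append("".join(self._buf))
            self._buf = None


def parse_titles(raw: bytes):
    parser = xml.sax.make_parser()
    # Never fetch external entities or external DTD subsets
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TitleCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(raw))
    return handler.titles
```

These switches address the external-fetch risks; internal entity expansion is a separate concern.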

That's an excellent point, David, and it's one I didn't really consider when developing FeedDemon. As it turns out, though, I'm using a home-grown XML parser to read incoming RSS feeds, and it doesn't use any of the options you mention.

I'm glad to hear that. The other thing to look for is buffer-overflow vulnerabilities in your home-grown parser. How does it deal with, say, a 2 GB element name?

aaaaaaaaaaaaa ....

or a 2 GB attribute value literal, or 2M attributes in a single tag? Even if there's not a buffer-overflow exploit possible, these could at least cause memory exhaustion, or will your parser stop reading at some sane limit and report an error?

It's good that you're asking these questions. I'm not so concerned about the RSS world, but I'm worried that a lot of people are glossing over these vulnerabilities while rushing to Web Services for critical government and corporate infrastructure. These problems don't mean that we should avoid XML, but we have to look a little more closely at managing the risks.

-- David Megginson
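A parser can be made to "stop reading at some sane limit and report an error" by combining a hard cap on total bytes with per-tag checks. Here is a sketch using `xml.sax` incremental feeding. The byte cap bounds how much memory a hostile 2 GB name or value can consume, because oversized input never reaches the parser. The handler rejects tags whose names, attribute counts, or attribute values exceed limits. All limit values and names here are hypothetical, and this guards an Expat-based parser rather than the home-grown one discussed above.

```python
import xml.sax
from xml.sax.handler import feature_external_ges

MAX_DOC_BYTES = 2_000_000   # hypothetical: refuse feeds larger than ~2 MB
MAX_NAME_LEN = 256          # element and attribute names
MAX_ATTRS = 64              # attributes per tag
MAX_ATTR_VALUE = 8192       # characters per attribute value


class FeedLimitError(ValueError):
    pass


class LimitCheckingHandler(xml.sax.ContentHandler):
    """Aborts parsing as soon as a start tag exceeds the configured limits.

    Subclass this (rather than ContentHandler) to keep the checks active.
    """

    def startElement(self, name, attrs):
        if len(name) > MAX_NAME_LEN:
            raise FeedLimitError(f"element name of {len(name)} chars")
        if len(attrs) > MAX_ATTRS:
            raise FeedLimitError(f"{len(attrs)} attributes on <{name}>")
        for attr_name in attrs.getNames():
            if len(attr_name) > MAX_NAME_LEN:
                raise FeedLimitError(f"attribute name of {len(attr_name)} chars")
            if len(attrs.getValue(attr_name)) > MAX_ATTR_VALUE:
                raise FeedLimitError(f"attribute value over {MAX_ATTR_VALUE} chars")


def parse_with_limits(chunks, handler=None):
    """Feed an iterable of byte chunks (e.g. read from a socket) to the parser."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler or LimitCheckingHandler())
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > MAX_DOC_BYTES:
            # Stop before the oversized data ever reaches the parser
            raise FeedLimitError(f"document exceeds {MAX_DOC_BYTES} bytes")
        parser.feed(chunk)
    parser.close()
```

Exceptions raised inside a SAX callback propagate out of `feed()`, so parsing halts at the first offending tag instead of reading the rest of the document.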
