Wednesday, April 12, 2006

Feed Stream Analysis - Web Feed/Post Analysis to Group Like/Related Posts

One of my pet vision projects is to build a next-generation RSS/Atom/web feed reader (yeah, that’s all... simple, right? I should be done with that this weekend, right? lol).

I subscribe to thousands of feeds/bloggers. Information overload is a constant problem, and it’s only going to get worse in the coming years. You think web feeds are everywhere now? I’d bet we’re not even close. When IE7 ships and makes it really easy to subscribe to feeds, easy enough that my wife, parents and daughter are reading feeds, THEN we’ll see the real feed explosion.

The current and near-future feed readers are great tools, but they don’t really help readers analyze their aggregated feed stream and deal with the overload. The folder metaphor only goes so far; it just doesn’t scale. "Newspaper" views and River of News (Dave Winer) help, but still don’t go far enough, IMHO.

What I want is a reader that takes my feeds and applies topic, conceptual and textual analysis to new posts, taking into consideration past posts that I’ve labeled or tagged. I want to see what’s hot in my aggregated feed stream, read and deal with that topic, and then move on to the next. I want the reader to know what I found interesting in the past and to highlight posts on similar topics. Not just matching words (which many readers do now) but matching topics. I want my reader to learn from me, kind of like Pandora or Last.FM do for music.

For example, when Google releases a new service or app, just how many times do I really need to hear about it? I’d like to browse that topic group, mark them all as read and then move on to the next, maybe related, topic group.

I want to text mine the posts... And then present that information in a dynamic visual interface.
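To make the "topic group" idea a little more concrete, here is a minimal sketch in Python, assuming each post is reduced to a bag of words. Note that this is plain word matching (cosine similarity over word counts), which is exactly the baseline I want to go beyond; a topic- or concept-level measure (a WordNet-based similarity, say) would be swapped in for `cosine`. The stop-word list, the function names and the 0.3 threshold are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real reader would need a much larger one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "is", "its", "new"}

def tokenize(text):
    """Lower-case the text and keep alphanumeric words that are not stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def group_posts(posts, threshold=0.3):
    """Single-link grouping: any two posts at or above the threshold end up in the same group."""
    vectors = [Counter(tokenize(p)) for p in posts]
    parent = list(range(len(posts)))  # union-find forest, one node per post

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(posts)):
        groups.setdefault(find(i), []).append(posts[i])
    return list(groups.values())

posts = [
    "Google releases new calendar service",
    "Google calendar service launched today",
    "WordNet semantic similarity in .NET",
    "Semantic similarity with WordNet library",
]
for group in group_posts(posts):
    print(group)
```

With these four headlines the two Google posts land in one group and the two WordNet posts in another, which is the "browse the group, mark them all as read" workflow in miniature.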

A Galaxy of Data UI. Star clusters represent like topics: the closer the stars, the more related the data; the more stars, the more posts on related topics. Solar systems are closely related data... Think 3D interface, with the ability to easily rotate and zoom (yeah, I know, too much science fiction reading lately, I guess). A visual method to see the relationships among your subscribed posts.

Or something like that... A new way to help deal with the information overload. To help me focus on and deal with common items, and thereby highlight those items that are NOT all that common.

Okay, enough for now. Sorry... I get excited when I get on this kick.

What got me started was that I happened on a reference which reminded me of the WordNet project (WordNet-based semantic similarity measurement), leading me to the WordNet.Net project (WordNet.Net Open Source WordNet Library for .Net). Then on to the ConceptNet Project (The ConceptNet Project V2.1) which reminded me of my OpenNLP post ("Statistical parsing of English sentences")...

Which reminded me of my Uber-Reader vision.

Posting my visions is one way I remind myself of them and try to kick myself in the butt to do some work on them... ;)

Related Past Post XRef:
WordNet
"Statistical parsing of English sentences"


6 comments:

Anonymous said...

I'm getting back into the RSS Aggregator project we e-mailed about last year. I had a basic aggregator working, but ran into problems with embedded databases. I'm in the planning stages of a "take two" version based on NHibernate for database abstraction, .NET 2.0, and hopefully support for integration with the IE7 feeds system.

Most importantly, though, I'm trying to scheme up a plugin / pipeline model that will make it easy to upgrade the information that's presented. I'd like to have support for auto-prioritization of posts I'm likely to be interested in, and I'd like that filtering / sorting / searching / prioritization to be pluggable to allow for extension.
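One way to picture a pluggable prioritization step: each plugin is just a function that scores a post, and the pipeline sums the plugin scores and sorts. This is a hypothetical Python sketch; `Post`, `PrioritizationPipeline` and the two plugins are invented for illustration and are not part of any real aggregator or of the IE7 feeds system.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Post:
    title: str
    feed: str
    score: float = 0.0

# Hypothetical plugin contract: a function that takes a post and returns a score contribution.
Scorer = Callable[[Post], float]

@dataclass
class PrioritizationPipeline:
    scorers: List[Scorer] = field(default_factory=list)

    def register(self, scorer: Scorer) -> None:
        """Add a plugin; every registered plugin contributes to each post's score."""
        self.scorers.append(scorer)

    def prioritize(self, posts: List[Post]) -> List[Post]:
        """Score every post with every plugin, then return them highest score first."""
        for post in posts:
            post.score = sum(scorer(post) for scorer in self.scorers)
        return sorted(posts, key=lambda p: p.score, reverse=True)

# Two hypothetical plugins.
def keyword_boost(post: Post) -> float:
    return 2.0 if "wordnet" in post.title.lower() else 0.0

def favorite_feed_boost(post: Post) -> float:
    return 1.0 if post.feed == "favorite-blog" else 0.0

pipeline = PrioritizationPipeline()
pipeline.register(keyword_boost)
pipeline.register(favorite_feed_boost)
ranked = pipeline.prioritize([
    Post("Google news", "other"),
    Post("WordNet.Net update", "other"),
    Post("Weekend plans", "favorite-blog"),
])
print([p.title for p in ranked])
```

Adding a new filter or ranking rule is then just another `register` call; the pipeline itself never changes.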

Chat or e-mail me if you'd like to brainstorm. I totally agree with you: we 1000+ feed subscribers are edge cases now, but with IE7 bringing feed subscription to the masses this problem will be more widespread.

Greg said...

Perfect... I wasn't really interested in reinventing the feed plumbing (I SO didn't want to deal with all the ways ATOM/RSS feeds get mangled). Hooking into the IE7 feed system is a great idea. Let MS do what they do well: provide the basic framework...

Before I do much more than I already have I need to clear it with work. I am still fighting IP battles... (IP = "I think it, they own it")... sigh

But worst case, where I can't actually code, I can at least talk, beta and make comments...

BTW, congrats on getting Scobled! I just found out today via another blog (http://www.theserverside.net/news/thread.tss?thread_id=39899). I saw his post, but for whatever reason it didn't click (talk about info overload)...

Thank you again for including me in your list. It's very humbling to be included with the others in your list...

Bryant Likes said...

I imagine the next big thing around rss/feeds/blogs will be a service that does something similar to Amazon's suggestion service. We noticed you enjoyed this post by x on y, so you will probably also like z's post on y.

Of course, this would involve adding some kind of rating system for posts and blogs and subject categories, etc. But that would be a way to add value to all the rss content out there much like Amazon has added value to all the ISBN content out there.
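The "you enjoyed x, so you'll probably like z" idea can be approximated with simple item-to-item co-occurrence counts. Below is a minimal Python sketch, assuming a hypothetical in-memory map from each reader to the posts they liked (no real service or API is implied). It is a plain counting approach for illustration, not Amazon's actual algorithm.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(likes_by_user):
    """For each post, count how often every other post was liked by the same reader."""
    co = {}
    for liked in likes_by_user.values():
        for a, b in combinations(sorted(set(liked)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co

def suggest(post, co, already_read=(), top_n=3):
    """Posts most often liked alongside `post`, skipping ones the reader has already seen."""
    candidates = co.get(post, Counter())
    return [p for p, _ in candidates.most_common() if p not in already_read][:top_n]

# Hypothetical "likes" data: reader -> posts they rated highly.
likes = {
    "ann": ["x-on-y", "z-on-y"],
    "bob": ["x-on-y", "z-on-y", "w-on-q"],
    "dan": ["z-on-y"],
}
co = build_cooccurrence(likes)
print(suggest("x-on-y", co))
print(suggest("x-on-y", co, already_read={"z-on-y"}))
```

Because two readers liked both "x-on-y" and "z-on-y", z's post is the first suggestion for anyone who liked x's post; already-read posts are filtered out.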

Just my $.02...

Greg said...

I'd bet you're right, Bryant.

A social aspect seems like a great fit.

That has me thinking...

I'd really dig having my Reader not only take into consideration what I thought was hot, but also what other respected people (like you and Jon) also thought was hot.

I personally wouldn't want a "worldwide" hot list (we have that today with Digg, etc.), but a selected/peer/buddy hot list would be very cool.

Though an Amazon-like hot/suggestion list might be cool too.


I'm thinking a level deeper than a blogroll... A post-based system where a user marks, tags or rates a post, and that data is posted to a central web service.

A user creates a circle, adding people with like interests, etc. Their reader then uses the data from their circle(s) when doing its post analysis.

It would also make sense to be able to apply different circles to different feed groups...

Making "stars" brighter (or changing colors) where there's a higher interest rating...
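The circle idea can be sketched the same way. This is a minimal Python sketch, assuming a hypothetical 1-5 rating scale and a plain dictionary standing in for the central web service: a post's interest is the average rating given by members of the chosen circle, and that average is mapped onto a 0.0-1.0 "star" brightness. Applying a different circle to a different feed group only means passing a different `circle` list.

```python
from statistics import mean

def circle_interest(post_id, ratings, circle):
    """Average rating (hypothetical 1-5 scale) given to a post by circle members; None if unrated."""
    scores = [ratings[member][post_id] for member in circle
              if post_id in ratings.get(member, {})]
    return mean(scores) if scores else None

def star_brightness(interest, min_rating=1, max_rating=5):
    """Map an average rating onto 0.0-1.0 brightness; posts nobody in the circle rated stay dark."""
    if interest is None:
        return 0.0
    return (interest - min_rating) / (max_rating - min_rating)

# Hypothetical stand-in for the central web service: member -> {post id: rating}.
ratings = {
    "bryant": {"post-1": 5, "post-2": 2},
    "jon": {"post-1": 4},
    "stranger": {"post-1": 1},  # not in my circle, so ignored
}
circle = ["bryant", "jon"]
for post_id in ("post-1", "post-2", "post-3"):
    interest = circle_interest(post_id, ratings, circle)
    print(post_id, interest, star_brightness(interest))
```

Only ratings from people in the circle count, so the stranger's low rating does not dim "post-1".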

Oh yeah, I want that...

Chris Saad said...

Hey Greg,

I love your ideas - we're working on a similar project called 'Touchstone'. The idea is to do all the ranking and relevance you talked about based on your attention profile. Then we present the information to you while you work - the more important a headline, the more disruptive the presentation.

I'd love to hear what you think!

You can find out more at www.touchstonegadget.com

Greg said...

Touchstone does look pretty cool.

I'll be checking it out in more detail this weekend...

Thanks!