Slashdot Mirror


Using AI To Filter RSS Feeds

holden writes "According to a blog post, AideRSS has moved from closed to open beta. I've been using AideRSS over the past few weeks to filter my RSS feeds (including Slashdot and Reddit) and I've been quite impressed. They talk a bit about how the filtering system works, which apparently tracks a mixture of things, from pick-up in other blogs, to some clustering technology."

53 comments

  1. Some filtered RSS feeds by holdenkarau · · Score: 3, Informative

    I'm not sure if it is bad form to comment on your own story, but here goes anyways :). You can take a look at the scored version of the slashdot RSS feed here, or del.icio.us or my (holden's) blog. There is also a really cool widget I've put on the side of my blog which lets people subscribe to only posts of a certain quality (you can look at it here).

    1. Re:Some filtered RSS feeds by eMilkshake · · Score: 1

      "Using AI to Filter RSS feeds" only scored a 1.4 and is at the bottom of my page. Wouldn't you think those using an AI to filter feeds would be interested in a story about AIs filtering feeds? Seems like an automatic top of list.

  2. Title really sucks by martin_henry · · Score: 1, Redundant

    It should read more like "AideRSS finally released" or "AideRSS goes live."

    As for the article, what kind of person or group has too many RSS feeds to look through?
    I'm asking because I really have no idea. I have linked the RSS bar in my Gmail to Tomshardware and Slashdot, but that's about all that I need....

    --
    www.purevolume.com/martyd
    1. Re:Title really sucks by Anonymous Coward · · Score: 1, Funny

      I have too many RSS feeds to read through (but that's in part because I try and read the digg rss feed... ugh...)

    2. Re:Title really sucks by Anonymous Coward · · Score: 0

      I actually have a couple hundred. RSS is more than just reading slashdot.

      So besides /. I have Microsoft Developer Network. Not the main feed, as there is a lot of junk, so I subscribe to the individual feeds. For example I could care less about VB .NET, but I do want 3D, C#, C++, web programming, servers, etc. All in all Microsoft makes up 25 of my feeds.

      Then there are news sites. Again not the main feeds, because you get a ton of junk. I honestly could care less what Paris Hilton is up to in the news, but the main feed would send it down, so I ignore it completely and get world news, tech news, etc.

      Then there are other uses for RSS, like Netflix. I hate getting emails from them; emails about shipping and receiving suck. However, they do have customized RSS feeds for your account letting you know when something is shipping.

      I am also on several open source projects. When there are updates, check-ins, and so on, those all come via RSS. Then I am on the board of directors for a user group, with its own website and its own RSS feed, and also some special interest groups.

      Also totally in left field. I love to grill. I have a couple grilling RSS feeds and get weekly recipes and so on.

      Then you have friends and family. A member of my family just had a baby. The baby has its own .com already registered, and baby pics and so on are posted there. Again, when there are new ones, I get them in my RSS feeds.

      Really I rarely browse the web anymore. RSS has really got me more informed and in touch. I mean honestly I wouldn't go visit the baby pics on the web if I didn't get them via feed.

  3. Secret Sauce and GeoRSS by Lord+Satri · · Score: 2, Informative

    From TFA: "Some of that data we show on the site itself: Technorati, del.icio.us, etc. Essentially, we're interested in measuring the 'social engagement' of each post. To make this a little less hand-wavy, I think we'll agree that a bookmark is nice but a comment involves more work, a trackback even more so, etc. - hence, engagement). Once we have all this data, we apply our 'secret sauce', which comes in a form of statistical analysis with respect to the author's previous history/posts. PostRank is not a global score, it's with respect to the blogger him/herself."

    Secret sauce? Why do I prefer open sauce? ;-)
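
    For readers who want an "open sauce" version of the idea quoted above, here is a minimal sketch. It is not AideRSS's actual algorithm, which is unpublished. The weights, the field names (`bookmarks`, `comments`, `trackbacks`) and the use of a z-score against the author's earlier posts are all assumptions chosen to illustrate "engagement relative to the blogger's own history".

```python
from statistics import mean, pstdev

# Hypothetical weights: the quote only says bookmark < comment < trackback.
WEIGHTS = {"bookmarks": 1.0, "comments": 2.0, "trackbacks": 3.0}


def engagement(post):
    """Weighted sum of a post's interaction counts."""
    return sum(weight * post.get(kind, 0) for kind, weight in WEIGHTS.items())


def relative_score(post, history):
    """Z-score of a post's engagement against the same author's earlier posts.

    Returns 0.0 when there is no history or no variation to compare against.
    """
    past = [engagement(p) for p in history]
    if not past:
        return 0.0
    spread = pstdev(past)
    if spread == 0:
        return 0.0
    return (engagement(post) - mean(past)) / spread


# Made-up counts for one author's earlier posts and one new post.
history = [
    {"bookmarks": 2, "comments": 1},
    {"bookmarks": 4, "comments": 3, "trackbacks": 1},
    {"bookmarks": 0, "comments": 0},
]
new_post = {"bookmarks": 5, "comments": 6, "trackbacks": 2}
print(relative_score(new_post, history))
```

    Because the score is computed per author, a quiet blog's best post can outrank a busy site's average post, which matches the "not a global score" remark in the quote.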

    One other way to filter RSS is by geographic location through using GeoRSS. However, the source RSS must be offered in GeoRSS for this geolocalization filtering to work... but it's only a matter of time, we'll get there. (hey, even slash has a plugin that works for publishing GeoRSS)

    1. Re:Secret Sauce and GeoRSS by bergie · · Score: 2, Informative

      Geolocation is a possible additional filter (think "local news" section of a newspaper), but I guess most people are interested in items from their field of interest regardless of the physical location where the post was made.

      I made some experiments on a more open source version of the "secret sauce". It seems quite easy to determine relevance of posts using the various social news services out there.

      --
      Midgard Project - Open Source CMS
    2. Re:Secret Sauce and GeoRSS by belg4mit · · Score: 1

      Actually you'd be surprised. What about the economies of scale in running a centralized
      event calendar, with the advantages of letting the reader select its own idea of "local"?
      (Obviously you'd need to apply some minimal pre-filtering for "local" on the server)

      --
      Were that I say, pancakes?
  4. Re:Yes it is bad form by Anonymous Coward · · Score: 1

    because the slashdot janitors' javascript programming skills are even worse than their perl programming skills and their editing skills.

  5. If Only ... by nxtr · · Score: 5, Funny

    If only they could get the AI to do the work I'm missing out when I'm reading RSS feeds.

    1. Re:If Only ... by $RANDOMLUSER · · Score: 1

      Sorry, what is this arse feed thing which you refer to?

      --
      No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    2. Re:If Only ... by jeffeb3 · · Score: 1

      Oh, I know.... I barely have time to reply, I'm so busy at work right now.

  6. Download URL by Rupy · · Score: 1

    Anyone know where you can download it?

    1. Re:Download URL by forgotten_my_nick · · Score: 2, Interesting

      Someone can correct me if I am wrong but it looks like you're supposed to put your feeds into the website then link to the one feed there.

      Seeing as half my feeds are internal work related and the fact I don't want someone profiling all feeds I am reading I won't be using the service.

    2. Re:Download URL by Pollardito · · Score: 1

      Someone can correct me if I am wrong but it looks like you're supposed to put your feeds into the website then link to the one feed there.

      yes, "you put your feed in there", exactly.

      the privacy concerns for this seem to mirror those of any of the public feed browsers like Google Reader, it's probably a bad idea to use them for private feeds
  7. "lindsay lohan arrested" scored 5.0 on CNN. by schwaang · · Score: 1, Funny

    The secret sauce tastes like teen spirit.

  8. Potentially scary side-effects already. by gethoht · · Score: 5, Interesting

    There are some companies out there (e.g. http://www.collectiveintellect.com/) that are using AI to mine RSS feeds and specifically the blogosphere, and selling that data to corporations for various reasons.

    Let's say you're a drug company that is releasing a potentially controversial drug. You can mine the data of the blogosphere and issue press releases as a pre-emptive strike to larger media stories. This starts the real beginning of being able to effectively monitor and even potentially control some of the social aspects of the internet. I think it's a great innovation indeed, with potentially scary side-effects.

    Personally it is nice to be able to filter through a billion RSS feeds to find information that I'm interested in though.

    --
    All things are subject to interpretation, whichever interpretation prevails at a given time is a function of power and n
    1. Re:Potentially scary side-effects already. by cookieinc · · Score: 0

      Personally I think it would be nice to be able to filter words such as "blogosphere" out of RSS feeds ;)

    2. Re:Potentially scary side-effects already. by rm999 · · Score: 2, Insightful

      "This starts the real beginning of being able to effectively monitor and even potentially control some of the social aspects of the internet"

      I fail to see your reasoning. Companies have always been able to "monitor" blogs and subscribe to RSS feeds. And they aren't controlling the social aspects of the internet at all. A press release has always been a standard communication means of corporations; as long as they aren't creating fake blogs, I don't think they are trying to control any aspect of the social internet.

      And personally, if a company does analyze blogs, I think it's a great thing. It means they care what normal people think about them and their products. Almost every blogger who talks about a company hopes that the company is listening to them.

      As an AI student, I wish people on Slashdot weren't so afraid of "intelligent" algorithms. They really aren't meant to be evil; they are usually meant to make something that is tedious more efficient. Yes, it can be abused, but just about everything can; for example, just because airplanes are used by the military to kill people does not make airplanes inherently evil.

    3. Re:Potentially scary side-effects already. by matthewcraig · · Score: 1

      I see what you are saying. In the past, powerful corporations might say, "We cannot do actions that might put us in a poor light, because public opinion would turn against us." Now, they can instead measure public opinion closely and watch web-trends based on RSS keywords. They can measure their actions by the result of the "blogosphere" and then gauge their next nefarious action accordingly.

      In reality, however, I think the common practice of spider-intelligence-gathering is simply another tool for marketing departments. They measure what trends are rising and what trends are ebbing, in near real-time. Plus, they can measure the "buzz" regarding their products, again in near real-time.

      Like any new technology, there are circumstances when it will be used for ill, but mostly it will simply be used to make us all more efficient. The hardest part is just getting our minds wrapped around all the new possibilities!

    4. Re:Potentially scary side-effects already. by dada21 · · Score: 2, Interesting

      There are some companies out there (e.g. http://www.collectiveintellect.com/) that are using AI to mine RSS feeds and specifically the blogosphere, and selling that data to corporations for various reasons.

      Sounds like a fantastic market, actually. I recently picked up a client in the casino management market because I had made some comments on a blog regarding their lack of insight towards proper marketing and keeping a decent percentage of return customers. They actually contacted me, and I've spent a large amount of time parsing through literally hundreds of RSS feeds covering different search terms (the company, the company's competitors, key words of the market, etc, etc). They're always impressed when I hit them up with 50-100 new blogs, news reports, and websites condemning or supporting various new tactics or old dogs. I think that selling companies information that can help them (fix or spin) the problem is a huge market waiting to be tapped properly.

      Let's say you're a drug company that is releasing a potentially controversial drug. You can mine the data of the blogosphere and issue press releases as a pre-emptive strike to larger media stories. This starts the real beginning of being able to effectively monitor and even potentially control some of the social aspects of the internet. I think it's a great innovation indeed, with potentially scary side-effects.

      I'm not sure I'd call any of the side-effects scary, though. Honestly, I hit Google Blogsearch and Technorati before I hit Google News. What is most important to me, though, is a personal rating system so I can give weight to certain information providers that have given me good information. Lately, I've had better luck from blogs than from the mainstream media. That's a nail in the coffin for the MSM if they don't move with the speed that today's news readers desire.

      The benefit of the "free market of information" on the web is that you can now see way more than the black and white sides of things. A drug company may put a spin on bad news, but the issue isn't just "bad news" and "good news." There might be 3 or 4 degrees of separation between the actual negative news and what others may say that could have a positive effect. With the growing (and shrinking?) anonymity of the web, whistle-blowers also have a chance to get the word out -- but again, without a positive history of truth, they may not have a high ranking with me.

      Personally it is nice to be able to filter through a billion RSS feeds to find information that I'm interested in though.

      I've been sorting through thousands of RSS feeds for a few years now, and have been using some collaborative filtering sites and systems to try to give weight to what I consider the better news sources. Collaborative filtering will be a necessary element to an AI filter because what matters to YOU personally may be missed, but if you use a collaborative ranking, you will also gather information that is important to people who have views/needs similar to yours.

      I find a lot more power in the collaborative filtering market than in the AI market. The downside of collaborative filtering is that it can be gamed (see Digg). The upside is that metamoderating of other collaborators can work to fix the gaming one individual at a time.

    5. Re:Potentially scary side-effects already. by magus_melchior · · Score: 1

      You do realize that this is the crowd that watched such movies as the Terminator trilogy...

      --
      "We are Microsoft. You shall be assimilated. Competition is futile."
    6. Re:Potentially scary side-effects already. by vague_ascetic · · Score: 1

      A bit tangential but lately in server logs I have access to I've noticed a proliferation of 'vertical search engine' bots, which do not claim to ever be planning to provide the data they acquire at the websites' bandwidth cost in any manner which could possibly be deemed as reciprocal. Even more troubling are a few sites that throw mad wget type bots at sites, with user strings claiming to be a common browser, without concern for bandwidth spikes by using decent time intervals between GET requests, and will use logic attacks to guess server file structure. One I am currently keeping a close eye on (no name YET, but DC suburb in Virginia based, and purported to have gov as well as DMCA related contracts) has even managed to get written up in 'white hat' tech news as a new modern defender of corporate properties. Not one of the sites whose logs I access have anything even remotely related to DMCA issues, but a couple could be construed as political, which still does not give them the right to attack servers, just cuz, and their being written up as a 'white hat' currently offends me. It's akin to the 'Jack Bauer wannabe' epidemic currently afflicting the asshats who are officials in the American Republic.

      If you use evil to battle evil, then you are evil. If you rationalise your use of evil in an attempt to argue it is for good ends, then you are a coward, and a liar, as well as being evil; in short: scum of the earth.

      --
      Rush Limbaugh is a perfect real world example of an oxycontinmoron
  9. Another site using AI by Sanity · · Score: 3, Interesting

    Thoof (disclaimer: it's my website) uses Bayesian analysis (you could call it AI, so much as anything is AI) to determine what you are interested in reading, based on a variety of factors, including:
    • The referring website (and what other people from that site liked)
    • Your OS/Browser (and what other people with your OS/Browser liked)
    • Your geographic location (and what other people close to you liked)
    • What you yourself read
    It also allows users to edit stories, a mechanism conceptually similar to a wiki, but with an additional voting process to help prevent abuse.

    Unlike AideRSS, Thoof isn't an RSS aggregator, rather users submit stories, in a manner similar to Slashdot, Digg, and Reddit.

    1. Re:Another site using AI by IwantToKeepAnon · · Score: 0

      Bayesian analysis (you could call it AI, so much as anything is AI)

      Bayesian filtering "learns" from past events, so yeah, it is an AI of sorts. I *love* popfile for classifying my emails. Haven't read spam in a long time, but that is only ONE of the classes that it correctly infers.

      Here's my stats: Classification Accuracy
      • Messages classified: 100,895
      • Classification errors: 272
      • Accuracy: 99.73%

      Not too shabby.
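
      For anyone curious what "learns from past events" looks like in code, here is a minimal naive Bayes text classifier. It is a sketch only and is not popfile's implementation: the class name, the whitespace tokenizer, the Laplace smoothing and the tiny training set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes classifier over whitespace-split words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus log likelihood of each word, with add-one smoothing
            # so unseen words do not zero out the whole score.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training messages; real filters learn from thousands of them.
nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("win money now", "spam")
nb.train("meeting agenda for monday", "work")
nb.train("project status report", "work")
print(nb.classify("buy cheap pills"))
```

      Each correction the user makes is just another `train` call, which is why accuracy tends to improve as the message count grows.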
      --
      "Happy families are all alike; every unhappy family is unhappy in its own way." -- Anna Karenina by Leo Tolstoy
    2. Re:Another site using AI by jamshid · · Score: 1

      Wikipedia says "intelligence is a property of mind that encompasses many related abilities, such as the capacities to reason, plan, solve problems, think abstractly, comprehend ideas and language, and learn."

      Fancy stats != intelligence

    3. Re:Another site using AI by Sanity · · Score: 1

      Fancy stats != intelligence
      Are you sure that intelligence isn't simply the combination of billions of neurons each processing information?

      I have a bachelor's degree in Artificial Intelligence, and I certainly wouldn't claim that intelligence doesn't simply boil down to mathematical computation at some point, indeed, I suspect that it probably does. It's just that we don't understand it yet.

  10. recursion by shird · · Score: 4, Insightful

    What if the 'other blogs' they 'pick up' on, are in turn using AideRSS to determine what to blog. The whole blogging thing really does seem like one giant feedback loop with only a few people generating actual useful content.

    --
    I.O.U One Sig.
    1. Re:recursion by jagdish · · Score: 3, Informative

      So basically the situation would be unchanged.

  11. Artificial Intelligence on Slashdot by Anonymous Coward · · Score: 2, Funny

    I suppose in these modern days when natural resources are being rapidly depleted by overpopulation and overconsumption, there had to come a time when we would start running out of intelligence... of course I wouldn't know because I'm a little short on it myself...
    however
    It is pleasing to see that scientists around the world have started to produce artificial intelligence to make up for the loss of natural intelligence, but I think that like everything else, perhaps it is also equally important that we conserve and recycle the little natural intelligence we have left and refine our methods to efficiently extract and use that intelligence to, uh, do something or other, but do it efficiently and without any needless waste. Yes, that's my point.

    And to that end I see this Artificial Intelligence RSS Feed Filter as a great marvelous invention, because you see, it combines the old and the new, it uses artificial intelligence to extract natural intelligence efficiently and use it for something in a wonderful postmodern fashion. Now, modern invention assists the primitive natural.

    Now, all we need is to have a massive SETI like project running this AI RSS Filler Feeder to search for signs of intelligence on slashdot. Oh, oh, cross my fingers, I hope my post makes it past the filter...

  12. Dupes! by DaSH+Alpha · · Score: 1

    Yes, but can it filter dupes?

    1. Re:Dupes! by Anonymous Coward · · Score: 0

      I have the same question actually, I read a few feeds, drudge, cnn, digg and a lot of times on breaking stories I'll get several duplicates of the same story (posting as anonymous to hide that I read drudge and digg)
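
      Whether AideRSS filters dupes is not stated in the thread. As a rough idea of how a reader could do it locally, the sketch below drops items whose normalized titles have already been seen. The item structure (`title`, `feed`) and the normalization rule are assumptions, and exact-title matching will miss stories that different sites headline differently.

```python
import re


def normalize(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def drop_dupes(items):
    """Keep the first item for each normalized title, preserving order."""
    seen = set()
    unique = []
    for item in items:
        key = normalize(item["title"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# Made-up items from three feeds; two carry the same story.
items = [
    {"title": "AideRSS Goes Live!", "feed": "digg"},
    {"title": "aiderss goes live", "feed": "cnn"},
    {"title": "New kernel released", "feed": "slashdot"},
]
for item in drop_dupes(items):
    print(item["feed"], "-", item["title"])
```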

  13. But... by skipsandwichdx · · Score: 0

    I don't know anyone named Al.

  14. Another half-baked open sourced approach by Manifest · · Score: 1

    Way back in 2003 I wrote some code to do something similar. I called it Intelli-Aggie and the code is released under GPL. It remains a developmental prototype as I got side-tracked.

    IA works, as noted in the readme, by computing a relevance factor, which in turn is based on four other 'relv' - category relevance, feed relevance, keyword relevance and item relevance. I used it as my reader for quite some time before moving over to 'better' readers.

    --
    ... "follow me" the wise man said, but he walked behind ...
  15. Filtering RSS by AmberBlackCat · · Score: 1

    I vaguely remember somebody saying the whole point of RSS is that you never get content you don't want because you have to subscribe to it in the first place. What's stopping us from unsubscribing instead of filtering?

  16. Re: [ot] intelli-aggie by Anonymous Coward · · Score: 0

    The term "Intelli-Aggie" sounds like an oxymoron to me. Hint: Just ask any Longhorn, and he will tell you there's no such thing as an intelligent Aggie.

  17. Uh.. nice.. but... by Anonymous Coward · · Score: 0

    Why not download this RSS tool instead. It has no T-Shirts but it does filter dupes and put the feeds into a SQL database.

  18. Sux0r : Bayesian RSS filter and you can run it too by Herve5 · · Score: 2, Informative

    http://www.nullwhore.com/sux0r/index.php?c=/0/login/
    http://sourceforge.net/projects/sux0r/
    What I find interesting is, it is one of the verrry rare examples of 'internet 2' service that you can own yourself (instead of registering here or there for more ads or worse).
    A downside of Sux0r is that it seems not to have evolved for a couple of years (but still works, possibly that's why :-)
    I for one am desperately waiting for a *local* RSS aggregator which would allow *me* (and not some site's AI) to Bayes-filter my selected feeds. I'm almost sure this will happen sooner or later.

    --
    Herve S.
  19. AI? by Anonymous Coward · · Score: 0

    Come on. I hear AI I think, you know, self-aware. Couldn't we call it something more appropriate like "smart" or "learning"?

    Calling everything and my grandma's toaster AI is getting a little old.

    1. Re:AI? by pclminion · · Score: 1

      I guess that's your problem. Sorry, you don't get to define the terms.

  20. Personalize instead by Catil · · Score: 3, Insightful

    I think there are basically two kinds of RSS Feeds, either they show the latest news (last in first out) or they show an already sorted frontpage (e.g. "crowdsourced" like Digg); both are useful.

    Using an AI to re-sort those feeds is definitely interesting from a coder's point of view but trying to give some kind of objective view to a feed is probably not what the average user wants.

    Why not do it the other way around and personalize them instead? Maybe it has been done before, but it would be nice if there was a reader to rerank (or even filter out) certain domains, keywords, tags and categories. It could take the given rank as the base score and then re-sort it according to the user's personal preference, e.g. if someone doesn't like politics he could give the keywords "Bush, Cheney, election, etc." a negative multiplier and maybe the keyword "funny" gets a positive one. It could even consider the time of the day - politics in the morning and funny pictures during the lunch break or something.
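
    A minimal sketch of that reranking idea follows. The item fields (`title`, `rank`) and the example multipliers are made up, and "negative multiplier" is read here as a factor below 1 that shrinks the score rather than a literally negative number; matching is a plain substring test on the title.

```python
def personalized_score(item, multipliers):
    """Scale the feed's own rank by every keyword multiplier found in the title."""
    score = item["rank"]
    title = item["title"].lower()
    for keyword, factor in multipliers.items():
        if keyword.lower() in title:
            score *= factor
    return score


def rerank(items, multipliers):
    """Return items sorted by personalized score, highest first."""
    return sorted(items, key=lambda it: personalized_score(it, multipliers), reverse=True)


# Hypothetical preferences: demote politics, promote funny items.
multipliers = {"election": 0.2, "funny": 2.0}
items = [
    {"title": "Election results roll in", "rank": 9.0},
    {"title": "Funny cat pictures", "rank": 4.0},
    {"title": "New kernel released", "rank": 6.0},
]
for item in rerank(items, multipliers):
    print(item["title"])
```

    Time-of-day preferences could be layered on by swapping in a different `multipliers` dictionary depending on the clock.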

    Just a quick thought though, someone can perhaps come up with something better. Anyway, I am pretty sure that personalization is the better approach here.

    1. Re:Personalize instead by adam.jimenez · · Score: 2, Insightful

      spot on.

      i also think there should just be a thumbs up / thumbs down option which would save you typing keywords in.

  21. Re:recursion excursion by ancientt · · Score: 1

    No, then a few people would be generating useful content.

    Dare to dream!

    --
    B) Eliminate all the stupid users. This is frowned upon by society.
  22. Risks of non-algorithmic filtering by dpbsmith · · Score: 1

    I view with alarm the increasing use of "artificial intelligence" to filter, screen, or otherwise judge human-generated material. In this case it's not enormously important, but it's part of a growing trend.

    The issue is lack of responsibility or accountability, because at a certain level of complexity, it is no longer practical to understand or explain the basis of individual decisions. The company can just say "the computer did it."

    A few years back there was serious consideration being given to using neural nets or something like that to make judgements on loan applications. IIRC the proposed way of handling some sort of legal issues regarding accountability was to add to the system a subsystem that would automatically test the effect of hypothetical changes in the applicant's income. Thus the company could always say "this application was rejected because the applicant's income was too low, and would have been accepted if the applicant had earned X thousand more a year." Raising the question, of course, of whether this was the real reason. Or what it means to talk about "the real reason" in the case of a decision made by a neural net.
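
    The income-probing subsystem described above can be sketched in a few lines. This is only an illustration of the idea, not any lender's system: `decide` stands in for the opaque model, and the applicant fields, step size and search limit are assumptions.

```python
def income_needed(decide, applicant, step=1000, limit=200_000):
    """Smallest income increase (in multiples of `step`) that flips a rejection.

    `decide` is treated as a black box: applicant dict in, True/False out.
    Returns 0 if already approved, or None if nothing up to `limit` helps.
    """
    if decide(applicant):
        return 0
    for extra in range(step, limit + step, step):
        trial = dict(applicant, income=applicant["income"] + extra)
        if decide(trial):
            return extra
    return None


def toy_model(applicant):
    """Stand-in for the neural net: approve if income covers 4x the yearly payment."""
    return applicant["income"] >= 4 * applicant["yearly_payment"]


applicant = {"income": 30_000, "yearly_payment": 9_000}
print(income_needed(toy_model, applicant))
```

    Note that the probe reports a sufficient change in one input, which is exactly the worry raised here: it produces an explanation that is true of the model without showing that income was what actually drove the decision.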

    In the case of a neural net made of meat, it's possible to cross-examine the net and attempt to find out whether illegal bias played a role in the decision. In the case of an AI neural net, there may be bias built-in... but there's no way to ask the neural net itself about this, and unless the programmers did it deliberately and consciously and left a paper trail, it's pretty hard to find out about it.

    1. Re:Risks of non-algorithmic filtering by pclminion · · Score: 1

      Thus the company could always say "this application was rejected because the applicant's income was too low, and would have been accepted if the applicant had earned X thousand more a year." Raising the question, of course, of whether this was the real reason. Or what it means to talk about "the real reason" in the case of a decision made by a neural net.

      How is it any different from dealing with a person? You have no idea if what somebody tells you is "real." And people can get hunches, where they feel that a situation is a certain way but without knowing quite why they feel that way. Basically, there's no difference between a neural net not being able to explain its "opinion" and a human not being able to explain his opinion.

      You seem to think that we can always get to the root of the matter by simply questioning the banker. That seems... laughable.

  23. Just what I need by uigin · · Score: 1

    Finally! I am one of those people who are swamped by news feeds. Some of the feeds I subscribe to are updated very regularly (the news ones) and I don't need to read everything that appears on them; others (personal blogs) are infrequently updated and I want to read everything.

    Two things I'd like to see:
    An offline version; I know it's unlikely to appear (Web 2.0 business model and all that) but I'll never use the online one in the long term.

    The ability to upload a bookmarks file filled with RSS links. I don't want to have to manually upload all my RSS feeds. Also it'd be nice to be able to change the story levels for all of the feeds from the one page (radio buttons and a table?) rather than having to access each feed before setting the story level.

    D.

  24. My first thought? by wtfpgh · · Score: 1

    What's Al Gore got to do with this besides inventing the Internet, and how can I get him to filter my RSS feeds?!

    --
    Every time you ________ in Soviet Russia, kitten kills God!
  25. I would have replied to this earlier by Zepalesque · · Score: 1

    But AideRSS filtered the post out...

  26. Re:recursion excursion by drew · · Score: 1

    So basically the situation would be unchanged.

    --
    If I don't put anything here, will anyone recognize me anymore?
  27. Wish Google Reader has similar function by samxiao · · Score: 0

    I have all my RSS on Google Reader; it'd be good to have AI filtering on it too.

  28. Need RSS Feeds filter by vague_ascetic · · Score: 1

    I am probably an exception to the rule, but I just counted up the different individual news feeds in my NewsReader's (Default-RSS OWL Java OS) start-up OPML file, and there are 1074 unique feeds in it. Granted, there are a few major Mainstream News site feeds that I just recently updated the feed lists from their RSS pages, and haven't yet filtered out the many I consider to be irrelevant from them. Two I recently rescraped but haven't filtered are McClatchy News and the NYTimes, but even after filtering, the number will still be around 1000.

    There are many different reasons for using RSS Feeds. It's more than just keeping up on your favorite web pastime sites. I usually have anywhere from five to ten Google news feeds for specific current event topics I am interested in. For any who dislike the default number of 10 articles that the GoogleNews generated feeds provides as default, add a new parameter argument at the beginning of them (after the ? in the RSS feed URL):

    num={varX}&

    where {varX} is the number of different headlines desired. I haven't checked, but would not be surprised to find an upper limiter on the number, and 100 would be my first guess.
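
    If you keep many such feeds, the parameter can be added with a small script instead of by hand. The sketch below uses only Python's standard `urllib.parse`; the feed URL is a placeholder rather than a real Google News address, and the helper name `with_num` is made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_num(feed_url, num):
    """Return feed_url with num=<num> as the first query parameter.

    Any existing num parameter is replaced rather than duplicated.
    """
    parts = urlsplit(feed_url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "num"]
    query.insert(0, ("num", str(num)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL, used only to show the shape of the result.
url = "http://news.example.com/news?q=rss+filtering&output=rss"
print(with_num(url, 30))
# prints http://news.example.com/news?num=30&q=rss+filtering&output=rss
```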

    I keep an eye on the feeds provided for open source software packages I use, for notices of upgrades, bugfixes, security issues etc.

    I keep an eye on the wonkage produced by many Tanks/Policy orgs via RSS Feeds. I scan tanks/org I generally agree/disagree with, as well as those I consider to be POV neutral.

    There are a few very specific feeds to aid in my work and personal free-time collaboration projects.

    I desire to acquire as much data input as my mind is capable of assimilating. The issue is avoiding buffer overflows. Any good feed filtering software available that I can use with a high degree of confidence in its results is appreciated, and I am always curious about untried packages.

    --
    Rush Limbaugh is a perfect real world example of an oxycontinmoron