Internet Data Mining for Investment Analysis
CaroKann writes "Reuters is reporting on a Wall Street investment research company, Majestic Research, that is using web crawling techniques to track business performance. Instead of attempting to estimate business conditions by talking to company management, or pounding the pavement visiting stores, this company uses data mining systems to collect real-time sales data and other information on companies that have a web presence. Using this data, Majestic attempts to estimate company earnings more accurately than traditional research outfits."
I wrote a project in Perl some years ago that downloaded online financial news stories, counted certain key words, weighted each one by its connotation, and compared the result to the direction of the stock market. For example, if the words "stocks" and "down" started showing up a lot in sentences in online news stories, you might expect a downward trend.
I posted the preliminary code online in the Perl newsgroup.
google "data mining" "news" "perl" etc
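For anyone curious what that word-counting idea looks like, here is a minimal sketch in Python (not the original Perl, which isn't reproduced here). The word list, the weights, and the function names `score_story` and `agreement` are all made up for illustration; a real version would need a far larger vocabulary and real price data.

```python
import re
from collections import Counter

# Hypothetical word weights (illustrative only): positive values suggest
# upward pressure, negative values suggest downward pressure.
WEIGHTS = {
    "up": 1.0, "rally": 1.5, "gains": 1.0,
    "down": -1.0, "slump": -1.5, "losses": -1.0,
}

def score_story(text, weights=WEIGHTS):
    """Sum the weights of every weighted word occurrence in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return sum(weights[w] * n for w, n in counts.items() if w in weights)

def agreement(daily_scores, daily_moves):
    """Fraction of days on which the sign of the news score matches the
    sign of the market move. Days with a zero score or zero move are skipped."""
    pairs = [(s, m) for s, m in zip(daily_scores, daily_moves) if s and m]
    if not pairs:
        return 0.0
    hits = sum(1 for s, m in pairs if (s > 0) == (m > 0))
    return hits / len(pairs)
```

Feeding `agreement` the same day's market moves only tells you whether the news describes the market. Pairing each day's score with the following day's move is what would test whether it predicts anything.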
eat shiat and bark at the moon
A friend of mine has developed software that goes even further. It parses streaming news stories for good/bad news and executes orders before humans even finish reading. That advantage is enough to make this company a mint.
Having spoken to the company, I'll call your bull call. There are good proxies for retail sales data available over the web - how significant a channel is Amazon for some manufacturers? I don't doubt you can license all kinds of goodness from other online properties - how about cars.com? Also, TFA isn't entirely accurate. As well as data mining the internet, they have access to a large number of proprietary (& $$) data sources - the drug prescription data you mention comes from companies like IMS Health & ImpactRX.
Stop whoring! AC me up Scotty!
Computers should be able to give a much more unbiased assessment of the economy than any person ever could. People are essentially incapable of interpreting economic data in a straightforward way; political agendas always seem to work their way into economists' opinions about the economy. By using algorithms to do the analysis (and allowing market forces to refine those algorithms), we should be able to get a much better understanding of the REAL economy.
This is a good thing for mankind.
So you wrote a program that would read some stories that said the stock market was going down, and it told you the market was down? Did your program also see if weather news reports contained words like "rain" and "downpour" and hence "predict" rain?
beware the jabberwock, my son! the jaws that bite, the claws that catch!
Does anyone know whether Majestic Research has any connections to Majestic 12 (http://www.majestic12.co.uk/)? For those who don't know, Majestic 12 is a distributed search engine. The distributed part is that they have a bunch of people donate CPU cycles and bandwidth to run a web crawler, SETI@home style. Now I thought this was a good thing to join, because we kind of need some independent alternatives to Google. But if it turns out I'm sponsoring some marketing firm, well... I'd feel pretty stupid.
BEHOLD! The Power of the Meme!
http://www.realmeme.com/Main/miner/stock.jsp?startup=http://www.realmeme.com:80/Main/miner/investment/AMZNDejanews.png
I once interviewed with a group in San Francisco that did stuff like this. They weren't clear about who they were working for, but I do remember some of the techniques they mentioned during the interview. Some of these were actually implemented, others were just ideas:
- An eBay crawler that could estimate the number of auctions and average selling price to predict whether eBay would make their earnings target or not. eBay quickly blacklisted their IP space, so they started using a bunch of open proxies they found.
- By analyzing client/server communication for the Sims Online, they discovered that each connection was assigned a sequentially incrementing connection ID number. By looking at the rate at which the connection ID numbers were increasing each time they logged in, they determined that the Sims Online wasn't going to be nearly as popular as Electronic Arts was forecasting.
- They talked about placing a camera somewhere in Union Square (in SF) to monitor the entrance to Tiffany's during the holiday shopping season, and doing image analysis to determine what percentage of shoppers left the store with a Tiffany's bag in hand.
- Monitoring wireless carriers' spectrum to determine what percentage of GSM/CDMA channels were in use for data vs. voice. The communication itself is encrypted of course, but you can still tell whether a channel is carrying voice or data. They wanted to determine if wireless carriers' forecasts about revenue from data services were accurate.
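The connection ID trick from the Sims Online item is simple enough to sketch. This is only a guess at the arithmetic, not what that group actually ran: it assumes IDs increase by exactly one per new connection, and the function name `connections_per_hour` and the sample values below are hypothetical.

```python
from datetime import datetime

def connections_per_hour(samples):
    """Estimate the rate of new connections from (timestamp, connection_id)
    observations, assuming IDs are handed out sequentially, one per connection."""
    samples = sorted(samples)  # order observations by timestamp
    if len(samples) < 2:
        raise ValueError("need at least two observations")
    (t0, id0), (t1, id1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("observations must span a positive time interval")
    # IDs issued between first and last login, divided by the elapsed hours.
    return (id1 - id0) / hours

# Hypothetical observations: two logins, two hours apart.
observed = [
    (datetime(2003, 1, 1, 12, 0), 1000),
    (datetime(2003, 1, 1, 14, 0), 1500),
]
rate = connections_per_hour(observed)
```

Comparing a rate like this, sampled over several weeks, against the usage implied by the publisher's subscriber forecast is presumably how they reached their conclusion.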
The only factor Greenspan had in most models was "the Greenspan put", which impacted growth only indirectly (because it freed an awful lot of risk capital). Macro forecasts are not worth the paper they are printed on; my models were designed to interpret when others were missing things like market share shifts and competitive advantages that were forming or decaying. The economy played a relatively small role in how those dynamics shifted (e.g. Dell was a better competitor than HP in good times, 1998-2000, and bad, 2001-2003). The real question was to what extent investors gave Dell too much credit and HP too little credit for their successes and failures. Spotting that early is what most analysts are paid to do.
Degaussing scares the bad magnetism out of the monitor and fills it with good karma.