Actually that is the intent, I just need to parse it. I've got the curr database on my computer and it is slightly parsed. I've figured out most of the relevant structure (what is a discussion vs. an article, what separates articles, the wikicode); I just need to sit down and parse all the wikicode and HTML. Preprocessing aside, I've estimated a couple of CPU weeks for Wikipedia (~1.7GB).
One interesting thing about Wikipedia: I counted all the words in it and looked at the ones that only occurred a few times. These were almost all misspellings or typos. I thought about going through and correcting a lot of them but thought better of it... they serve a purpose: they flag less professionally done articles.
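For anyone who wants to try the rare-word trick, here is a minimal sketch, not the actual script I ran. The function name `rare_words`, the `max_count` cutoff and the sample sentence are all made up for illustration, and the tokenizer is a crude regex that assumes plain English text with the wikicode already stripped.

```python
import re
from collections import Counter

def rare_words(text, max_count=2):
    """Return the words that occur at most max_count times (case ignored)."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(w for w, n in counts.items() if n <= max_count)

# Hypothetical sample text with one deliberate misspelling ("recieved").
sample = "the cat sat on the mat and the cat recieved a treat on the mat"
print(rare_words(sample, max_count=1))
```

On a tiny sample the rare list is mostly ordinary words; the misspellings only start to dominate once the corpus is large enough that real words repeat.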
The other article used "hat + head" and "hat + banana" and said that because hat and head is way more common than hat and banana there is a correlation, however let's look at the numbers:
about 175,000,000 for hat
about 162,000,000 for head
about 10,400,000 for banana
about 8,900,000 for head hat
about 517,000 for banana hat
so 5.49% of head sites have hat in them
and 4.97% of banana sites have hat in them
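The arithmetic behind those two percentages, as a small sketch. The hit counts are the ones quoted above; the helper name `share_with_hat` is just something made up for this example.

```python
def share_with_hat(pair_hits, word_hits):
    """Percentage of pages for a word that also mention 'hat'."""
    return 100.0 * pair_hits / word_hits

# Hit counts as quoted above (search-engine estimates, so only approximate).
head = share_with_hat(8_900_000, 162_000_000)
banana = share_with_hat(517_000, 10_400_000)

print(f"head: {head:.2f}%  banana: {banana:.2f}%  gap: {head - banana:.2f} points")
```

The point is that the raw pair counts differ by a factor of about 17, but once you divide by how common each word is on its own, the gap shrinks to roughly half a percentage point.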
I wrote a program that gathered, analyzed and used word pair frequency data (various situational pairings). It needs more raw data, but shows a lot of promise. I opted not to use literature, as that often has archaic and purposefully awful word usage. Some of the issues involved: case, like Fall vs. fall (I chose to ignore case), and grammatical structure (it needs to integrate with a grammar checker). Coupling this with a thesaurus is my eventual goal; this leads to some obvious difficulties, though it has potential rewards. I had considered Google, and have run a few tests using it, but that solution was too simple, and not quite as powerful in the long run. Just had to share, sorry to waste your time.
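To give a rough idea of what word pair frequency data looks like, here is a minimal sketch. It is not my program: it assumes the simplest possible pairing (adjacent words), ignores case as described above, and the names `pair_counts` and `most_likely_next` are invented for the example.

```python
import re
from collections import Counter

def pair_counts(text):
    """Count adjacent word pairs, ignoring case (so Fall and fall merge)."""
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: pairs are simply neighbours; this also pairs words
    # across sentence boundaries, which a real version might not want.
    return Counter(zip(words, words[1:]))

def most_likely_next(counts, word):
    """Return the word that most often follows `word`, or None if unseen."""
    candidates = {b: n for (a, b), n in counts.items() if a == word}
    return max(candidates, key=candidates.get) if candidates else None

# Hypothetical sample text.
counts = pair_counts("Leaves fall in the Fall. Leaves fall fast. Fall leaves fall.")
print(counts[("leaves", "fall")], most_likely_next(counts, "fall"))
```

Folding case keeps the table small but loses the season-vs-verb distinction, which is exactly the kind of thing a grammar checker would have to recover.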
Why not make the key beep, powered by a small embedded wristwatch battery? That way, if your key starts to sound like a Geiger counter in Chernobyl, you can tell something's up. Sure, the battery would die out, but by then the key would be old, and hence the associated car would be old as well, and either not as worth stealing or there would be other more effective methods.
And one more thing, screw causality!
"I ain't a physics geek"
and yet your username is the Schrödinger equation? I always called it PsiStarPsi, but it's all the same.
Yes, mostly. Though they had problems with O2->concrete in the first trial, this was solved on a later trial as I recall. Also it is believed that they smuggled some medical supplies in at one point.
Exactly what got to me. A decent game, perhaps even innovative, but yeah, when you conquer some land, log off content and happy that progress has been made, then log back on only to see that land overrun by the opposing faction, blah. Maybe the expansion or patches handled this, though I doubt it; I quit well before then.
Or maybe everyone is 'posting' where it matters, as per the article... Well maybe not, but I can always hope.. that is until they craft some overly broad legislation to crack down on "electronic file transfers," to only be selectively applied.
As much as I dislike DoubleClick, I like the fact that they make it easy to block their ads. Perhaps a future company or future owner will make it harder, so DoubleClick going away or changing hands may not be so great.
Why not standardize the EULA, so companies could pick from a few that meet their needs or create their own? This might allow people to know what is said without having to read it more than once.
Dang hypocrites.
Will do,
I downloaded the research paper/proposal/whatever-it-is, but have yet to read it. Does that count?
Playing a bit more, I tried the following (changed order in search):
about 669,000 for hat banana
about 8,870,000 for hat head
so by percentage this indicates that hats have a stronger correlation with banana than with head!!
5.475% (head)
6.433% (banana)
!!!
.5% difference.... not a great example.
just over
Just send it in as a story, be creative, I think they read those sometimes.
Or for those who don't feel like logging in, and want a clicky link, clikicky clicky.
I think it was drunk driving which caused one of them to be in line.
I know, whatever happened to the good ol' fashioned fry-the-hard-drive type viruses?
Actually it isn't that hard, a few more than 15 but I'll bet I could do it in under 500, excluding libraries.
Not only that, Drudge is linking to a story saying that they fixed the problem, as is a post in here.
Interesting. Yeah, what I learned of it was mostly from my environmental sciences class 9 years ago. Thanks for the correction.
What's wrong with the Fox News one you linked to? It seems to be Fair and Balanced... oh, right....
A meteor is one that burns up (in any body's atmosphere), a meteorite is one that hits, an asteroid is one out in space. Your link even says this!?!
A meteorite.
What good is email if you can't use it as a means of communication for fear of spam/forged headers?
So's a kerfuffle.
So how does that prevent malware?
Kinda like this.