Real-time (or near real-time) search is one of Solr/Lucene's weaker points. I think it could keep up with the updates, but making them appear in the index immediately while keeping the caches effective can be tricky.
We have one index that's updated every 20 minutes, but it only has about 50k documents, and a combination of Solr cache auto-warming and Squid's stale-while-revalidate logic works there.
In another system where updates need to be faster, we had to do some custom work to make it perform: there is an in-memory index for recent changes, an on-disk index of previous changes, and a process for moving documents from one to the other. Hopefully these improvements will make their way back to Lucene in the future.
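To make the two-tier idea concrete, here is a minimal sketch in Python. It is not our actual code and uses no Solr or Lucene API: the class name, the `flush_threshold` parameter, and the whitespace tokenizer are all made up for illustration, and the "on-disk" tier is just a second dictionary standing in for a real index.

```python
from collections import defaultdict


class TwoTierIndex:
    """Toy inverted index: a small in-memory tier for recent documents
    and a second tier standing in for the larger on-disk index."""

    def __init__(self, flush_threshold=1000):
        self.recent = defaultdict(set)   # term -> doc ids, searchable immediately
        self.on_disk = defaultdict(set)  # hypothetical stand-in for the on-disk index
        self.recent_docs = 0
        self.flush_threshold = flush_threshold

    def add(self, doc_id, text):
        # New documents only touch the small tier, so they are visible at once.
        for term in text.lower().split():
            self.recent[term].add(doc_id)
        self.recent_docs += 1
        if self.recent_docs >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The "process for moving from one to the other": merge, then reset.
        for term, doc_ids in self.recent.items():
            self.on_disk[term] |= doc_ids
        self.recent.clear()
        self.recent_docs = 0

    def search(self, term):
        # A query consults both tiers and unions the results.
        term = term.lower()
        return self.recent.get(term, set()) | self.on_disk.get(term, set())
```

The point of the split is that caches built over the large tier only need to be invalidated when a flush happens, not on every update. The sketch ignores deletes and updates to existing documents, which is where most of the real work is.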
Solr/Lucene power a number of sites that would be in the enterprise search category (Apple, Netflix, CNET). Where I work, we index 5 million docs in Solr/Lucene and serve millions of search requests a day. It's not Google scale, but most people don't need that. The markets where one needs a FAST are dwindling quickly.
I agree TWC is a tool. I still don't think it's due to fear of losing customers to online video. Hopefully in the future, they will come up with more reasonable policies.
While stories of the cable companies running in fear from the impending flood of online content, and restricting bandwidth in response, have been common on Slashdot for a while, it's disappointing that something like the Economist has picked up this fable.
The reality is that most of the content offered on cable today won't make its way to the web for free under the current revenue models of content providers (not cable cos). Currently half of the revenue that channels like TLC get is from cable subscriptions. The other half is from advertising. These channels aren't interested in cutting their revenues in half in the hope that online advertising somehow doubles in profitability. This is especially the case when online advertising currently brings in only about 10% of what the same ads get you on TV.
Hulu is an experiment by major networks (FOX/NBC), who already provide their content for free and get most of their revenues from advertising, to see if they can make the online advertising model work and capture more eyes than they are currently getting. While the site is successful in terms of traffic, the advertising dollars aren't there yet. As long as it doesn't undermine the real money of TV advertising, it's a useful experiment. (There is some cable network content on Hulu from Viacom, à la Comedy Central/Sci Fi, but it's the content that is so mainstream that the advertising experiment may pay off.)
While the Economist was missing the boat, they decided to throw in the à la carte pricing myth as well. That model doesn't work either, since not everyone wants the same 15 channels. If you move to that model, then today's 100 channels would be tomorrow's 20. Hope you like Home Shopping Network more than BBC America.
As to why DOCSIS 3 is so slow to make its way out? First, cable cos are large companies that have basically been monopolies for long enough that their culture reflects that. They are slow to do anything. Second, they are waiting for the end of analog signals so they can reclaim some bandwidth. Third, it's expensive. When you have 30 million customers, $100 a pop is real money. Under these conditions, there is little incentive to rush to market. That said, cable cos are starting to roll out this service and are increasing the bandwidth of existing customers.
So what will happen? I think content providers will partner with cable cos to provide their content online. It will be on a different part of the pipe, just like phone service is, so you get good quality of service for HD even when your neighbor has his torrents at full throttle. You'll see reasonable network caps like Comcast's 250GB a month, but your video viewing (on that separate pipe) won't count towards that, so you mostly won't care, provided you have a job or something else to do other than watch torrent content all day. Will you suddenly get all the good content you want for free? Unlikely. Will your most affordable (and legal) option for getting content still be from a cable co/tele co? Probably. Hopefully as more options become available (cable, fiber, Netflix, iTunes), the pressure will be on cable/tele cos to provide a better experience.
Picture your Tivo now, with its great recording software. Compare that to the crappy software your cable company uses on their DVR. Well, the OCAP part of the CableCard 2.0 standard requires all hardware be running the cable company's software. In other words, your Tivo would have to be running Comcast/Cox/whoever's horrid interface instead of the standard one. At least, that's how I understand it.
... I think that's a fairly accurate summary of the history of CableCard and tru2way. No, this will not replace CableCards. Actually, this is just another step in the process towards adopting them.
Your summary is mostly accurate (and much more so than most other comments).
The Tivo software that Comcast has rolled out in Boston is actually built on the OCAP stack, so you won't necessarily be stuck with the cable company's crappy interface. Reviews of that service have been mostly positive, so it appears that the OCAP/tru2way platform is flexible enough to build a reasonable interface. This should also allow better integration with VOD service as well as switched digital, which have been the problems for Tivo users so far.
One of the other motivations for this is that cable companies no longer have to be the sole provider of set-top boxes. I don't think that Slashdot readers get what it's like to deploy software to millions of homes on hardware that is made as cheap as it possibly could be. Diversity in this environment is a support nightmare, and cable companies pay all the upfront costs for those boxes (hence their cheapness).
Tru2way should allow a lot more diversity in the market for people who want high-end boxes. If this is bundled with your several-thousand-dollar HDTV, the impact is far less noticeable.
There is a PBS Nova show on this topic which discusses several alternatives to the Clovis-first theory.
America's Stone Age Explorers
http://www.pbs.org/wgbh/nova/stoneage/
It was recently airing (again) so you may be able to catch it again.
I've had my 650 for about 2 weeks (on Sprint) and haven't noticed these problems. I just asked my wife, who probably accounts for half my cell phone time, and she hasn't noticed anything in the quality of reception. In noisy environments like my car I use the earpiece (which I think is also the law in my state), so perhaps it's the mic on the phone in noisy environments, but that doesn't seem unusual. The first problem was a non-issue for me when upgrading, and SD cards are so cheap I doubt it ever will be.
I really like the 650. The touch screen is much easier to navigate with my finger and nail than my 300, and the increased resolution is really nice. Browsing is also fast and quite usable. I don't bother with the Palm version of Slashdot since I can read the regular one just fine.
There are a few minor interface issues. It takes a while to move to the dialing screen, which sometimes makes me think that it didn't register the click. The call log interface highlights both the last call dialed and the cancel button, but the cancel button has focus (not the last number), so I'm sometimes confused when I hit enter and it doesn't call the last number. Nothing huge though, just something to get used to.
The only reason I see to wait is if you're still deciding whether to shell out $450 for a phone. For me it was worth it.
"Wicked Cool" seems like a pretty dated term to me but after all it is a book on shell scripts. Perhaps we'll see "Hella Cool Perl Scripts" next. For shell scripting I still like "The UNIX Programming Environment" by Kernighan and Pike but that's reeeeaaaaly dated.
I did something similar but it wasn't quite the hack that yours was. When glass bottles were still used, the openers in convenience stores dropped the bottle caps in an attached bucket. I would go to several convenience stores and collect the caps from the buckets. My best take was $50 from Pepsi's 21 game (I found an odd-numbered cap!) but it was worth it just for the free soda.
I've just spent 20 minutes reading posts looking for something to mod up and resisting the strong urge to mod down. Now I'm giving up and posting myself. There seem to be four main arguments people are making that seem misguided (IMHO).
First, people keep complaining about students having to submit their work to this site instead of the teacher submitting it. This is such a non-point. What difference does it make if the teacher submits it or the student does? It's perfectly reasonable to request electronic submission, and three lines of code can make a paper submitted to the teacher's site get sent on to the plagiarism site.
Second, the idea that this somehow violates trust between the student and teacher. When you turn your paper in, you expect that the teacher will check for these sorts of things. The means by which they do it doesn't change these expectations. Trust is based on a personal relationship. I'd prefer the grading be as objective as possible and be the same regardless of whether the professor likes, trusts, or hates me.
Third, why does everyone assume that the "originality report" mentioned in the article only contains a binary value? Systems I've used look a lot like the output of a visual code diff (only the matching areas are highlighted). The systems flag essays for review, and then you make the call whether the specific case is actually plagiarism or just a quoted passage or a coincidence. There is no presumption of guilt, just a tool to make the assessment easier.
Finally, I don't know why no one has ripped this comment from the article apart: "The reality is that the high monitoring of students really isn't about catching cheaters, it is a substitute for hiring enough faculty members to take the time to read student work," said Ian Boyko, national chair of the student federation.
The papers still get graded so someone reads them. If you hire more people then that means that one person doesn't see all the papers which means that in-class plagiarism has more of a chance of succeeding.
What is interesting in these functions is that, as pointed out in the article, there seems to be something wrong with Sun's implementation for Java.
For many math functions Java uses a software implementation rather than the built-in hardware functions on the processor. This is to ensure that these functions behave exactly the same on different architectures. This probably accounts for the difference in performance.
You should really consider a state school, which in general has much lower tuition. I went to a state school for undergrad and have taught at an Ivy League university where I went for grad school (this makes me perhaps biased but also informed). The curriculum for CS was not that different at the undergrad level or even the master's level. What you pay for at a better school is being around more motivated people. The bar starts out higher for everyone, and most people work a little harder (and/or grade grub more) at a top-level school. If you're motivated and are inclined to be friends with other motivated people, you can get just as good an education for a fraction of the price.
There are also often state-funded scholarships which are available to anyone going to a state school who meets some minimum requirements. In Florida, for example, you can qualify for a scholarship to a state school with 1100 SAT/3.5 GPA. It will cover tuition and some books but not much else. You may have to take out some small loans but nothing like the 60K of debt many people come out of private universities with.
One caveat. If you want to go to grad school, then name recognition and recommendations from faculty that other faculty know of are really important. Not having had a chance to meet these people is a potential downside of going to a state school if you later want to go to a more prestigious grad school. Most people don't take this route, and it's not an insurmountable transition (I did it), so I think it's worth the lack of debt. In grad school (specifically in PhD work) being on scholarship is the norm, so there are more funding opportunities once you get there. Good luck.
The maxent code in this package classifies events with a discrete set of outcomes. You would have to break the above tasks down into discrete tasks. There might be a very specific task like: given features computed over some visual input, at what angle (where 360 degrees is divided into sufficiently small but discrete parts) should I approach the basketball for collection? Or, if you have a routine to do one of those tasks, you might use maxent to decide when you should enter that routine. It could compute isDirty for your room.
It's useful for places where you want to put an "if" in your code but don't know how to express the conditional well, as it may depend on a large number of factors. In these cases you may be able to use a maxent model to estimate that conditional for you rather than trying to figure out how the different factors in your conditional should be weighted or combined. Hope this helps.
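As a rough illustration of the "learned if" idea, here is a small from-scratch sketch in Python. It does not use the maxent package's API; with only two outcomes a maxent model is the same thing as logistic regression, so that is what the sketch trains. The feature names, the toy room data, and the training settings are all invented for the example.

```python
import math


def predict(weights, features):
    """Probability of the positive outcome under a two-outcome maxent model."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def train_maxent(examples, epochs=500, lr=0.1):
    """Fit weights by stochastic gradient ascent on the log-likelihood.

    examples: list of (features, label); features maps name -> float, label is 0 or 1.
    """
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, features)
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) + lr * error * value
    return weights


# Hypothetical training data: observations of a room, labeled clean (0) or dirty (1).
ROOMS = [
    ({"bias": 1.0, "clothes_on_floor": 0.0, "dishes_on_desk": 0.0}, 0),
    ({"bias": 1.0, "clothes_on_floor": 1.0, "dishes_on_desk": 0.0}, 0),
    ({"bias": 1.0, "clothes_on_floor": 4.0, "dishes_on_desk": 2.0}, 1),
    ({"bias": 1.0, "clothes_on_floor": 6.0, "dishes_on_desk": 1.0}, 1),
]


def is_dirty(weights, features, threshold=0.5):
    """The 'if' you did not have to write by hand."""
    return predict(weights, features) >= threshold


weights = train_maxent(ROOMS)
```

After training, `is_dirty(weights, observation)` can sit wherever the hand-written conditional would have gone. The model decides how much each factor counts, and you can move the threshold if false alarms are cheaper or more expensive than misses.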
The short answer is yes. Spam filtering can be thought of as a document classification problem. Some documents are classified as going in the inbox and others to the trash. The Maximum Entropy or SVM classifier software which is included could be used to train a model for this type of classification. You would need data to train it (in this case email marked as either spam or inbox or any other category you want). The model will produce a probability of whether or not it's spam. To make it really useful you'd want to integrate something like this into your email client so that you could tell it when it makes a mistake and retrain it. Tight email integration would also allow you to use your contacts as a source of information for the model, so that it could learn that even if your close friends mention the words penis enlargement in the same mail, you might still want to read it.
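The part about feeding corrections and contacts back into the model might look something like the sketch below. Again this is plain Python written for illustration, not the API of the package or of any mail client: `featurize`, `correct`, the feature names, and the example addresses are all hypothetical, and a real filter would need far more training mail than two messages.

```python
import math


def featurize(sender, body, contacts):
    """Bag-of-words features plus one feature for whether the sender is a known contact."""
    features = {"bias": 1.0}
    for word in body.lower().split():
        features["word=" + word] = 1.0
    if sender in contacts:
        features["sender_in_contacts"] = 1.0
    return features


def spam_probability(weights, features):
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def correct(weights, features, is_spam, lr=0.5):
    """One gradient step toward the label the user supplied (the 'retrain' button)."""
    error = (1.0 if is_spam else 0.0) - spam_probability(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + lr * error * value


# Made-up mailbox: one spam message and one joke from a friend using the same word.
contacts = {"friend@example.com"}
weights = {}
spam = featurize("stranger@example.net", "cheap enlargement pills", contacts)
joke = featurize("friend@example.com", "ha ha enlargement joke", contacts)
for _ in range(50):
    correct(weights, spam, True)
    correct(weights, joke, False)
```

Because the contact feature only ever shows up on mail the user keeps, its weight goes negative, so the same words coming from a friend score as less spammy than they do from a stranger.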
There is a mature statistical machine learning package on sourceforge. Check out maxent.sourceforge.net.
It's primarily been applied to natural language processing but it's applicable to a wide range of classification problems. There are even examples in the download package. I use it regularly and like it a lot but I'm also the primary maintainer so I might be biased.
SCO, schmo. Does the following passage sound incredibly scary to anyone else?
And Microsoft, which has been accused of conniving with SCO in its march against Linux, is slated to enter the search market and compete against Google. The widgetry, which is supposed to retrieve all kinds of file types, both structured and unstructured, and all kinds of storage systems, beginning with the user's own drive, will be integrated into its operating systems like the anticipated Longhorn.
Given M$'s past habit of looking around the hard drive and downloading what they find, the last thing we need is to blur the line between local searches and web searches so that users become completely oblivious. In this horror film you don't find out that the call is coming from inside the house, but that the operating system is the spyware.
It's a current event quiz (on "News at a Glance").
I concur with most posts that this isn't a really useful source of news, but it is a fun way to see how up on the news you are. Plus you can check instantly to see if you're right. I get most of my news from the radio, so it was refreshing to put a face on the news of the day.
TechTV also had a review of projectors under $1000 in September. It featured the Epson PowerLite S1, the Toshiba TLP-S10, and the Gateway 205 Projector. The Gateway came out on top but check out the full review here.
This is a non-issue. There is little to no incentive for Oracle to make their own distribution. The benefit of RedHat is that a large number of vendors certify their products against it. An Oracle distribution isn't very helpful unless Oracle is the only vendor you use. There is an implicit benefit for vendors in standardizing on a single distribution, in the same way that there is a benefit to most people using the same auction service (think eBay). RedHat has already filled that niche. They may be turning their back on the community (that remains to be seen) but the wolves are nowhere in sight.
The point you make doesn't address whether you should use a statistical or symbolic approach. Symbolic approaches can ignore context just as easily as statistical ones. The benefit of statistical approaches is that they typically make it much easier to determine the combined influence of a number of different factors. The bane of symbolic approaches is the hours that can be spent tweaking one rule only to find that the new tweak interacts badly with some other rule. The addition and integration of context and other more meaning-based types of information will likely be much simpler in a statistical framework. Case in point: when translating names, some systems in the evaluation mentioned gave more weight to statistics gathered from documents similar to the one being translated. This has the effect of providing a sort of document-level topic bias. This type of approach could potentially allow you to prefer sentence as a grammatical unit in a school or linguistics context and sentence as punishment in a legal or penal context.
That said, it is always possible to construct pathological or even reasonable cases that will thwart a particular approach. But most people would be happy with something that did a good job most of the time. Specifically DARPA, the evaluation's sponsor, would like a system that gave them the ability to determine which documents are worth having a human being translate. A recent article in Time says the number of Arabic speakers in the FBI has tripled since 9/11 to 208. That's still not that many people given the amount of data they monitor. Mediocre translation techniques, available today and easy to adapt to new languages, are probably the best bet for leveraging the vastly larger number of English-speaking government employees.
Thanks for the correction. I read quickly and misinterpreted what had happened in that section. This answers my main question and motivation for posting, which is, "How the hell did he get GoToMyPC on the user's machine?" Answer: he didn't; "access a computer with GoToMyPC software" meant already installed, as opposed to via. Prepositional-phrase attachment ambiguity strikes again!
Below is what I was referring to. I was left with the impression that the digital transmission and the digital tier were somehow related, such as not having to transcode an analog signal in real time, but given that it says "independent" they may not be. Thanks for the correction.
http://connectedhome2go.com/2009/04/20/project-cavalry-in-philly-and-beyond/
"Independent of the broadcast digital transition set to take place on June 12th, Comcast is making the all-digital shift in about half of its markets this year. (All Motorola markets so far) In fact, the Comcast move is taking place in my own Philadelphia backyard at the moment. As part of a “marination” period, Comcast is deploying digital set-tops and DTAs for basic subcribers during already-scheduled truck rolls. After new hardware is the field, Comcast aims to move roughly 40 analog channels to its digital tier - all as a way to free up bandwidth for more HD content and DOCSIS 3.0 channel bonding."
Never mind. Google probably just collapses the two since they both resolve to the same IP address.
Actually, it looks like Google is blocking this. The top hit is www.caldera.com. Not that different, but shame on Google for blocking the www.sco.com site.
You can still get pictures of it from Google Image Search.
What is interesting in these functions is that, as pointed in the article, there seems to be something wrong with Sun's implementation for Java.
For many math functions java uses a software implementation rather than using the built in hardware functions on the processer. This is to ensure that these function perform exactly the same on different architectures. This probably accounts for the difference in performance.
You should really consider a state school which in general have much lower tuition. I went to a state school for undergrad and have taught at an Ivy League university where I went for grad school (this makes me perhaps biased but also informed). The curriculum for CS was not that different at the undergrad level or even masters level. What you pay for at a better school is being around more motivated people. The bar starts out higher for everyone and most people work a little harder (and/or grade grub more) at a top level school. If you're motivated and are inclined to be friends with other motivated people can get just-as-good an education for a fraction of the price.
There are also often state-funded scholarships which are available to anyone going to a state school who meets some minimum requirements. In Florida, for example, you can qualify for a scholarship to a state school with 1100 SAT/3.5 GPA. It will cover tuition and some books but not much else. You may have to take out some small loans but nothing like the 60K of debt many people come out of private universities with.
One caveat. If you want to go to grad school, then name recognition and recommendations from faculty that other faculty know of are really important. Not having had a chance to meet these people is a potential downside of going to a state school if you later want to go to a more prestigious grad school. Most people don't take this route and it's not an insurmountable transition (I did it), so I think it's worth the lack of debt. In grad school (specifically in PhD work) being on scholarship is the norm, so there are more funding opportunities once you get there. Good luck.
The maxent code in this package classifies events with a discrete set of outcomes. You would have to break the above tasks down into discrete tasks. There might be very specific tasks like: given features computed over some visual input, what angle (where 360 degrees is divided into sufficiently small but discrete parts) should I approach the basketball from for collection? Or if you have a routine to do one of those tasks, you might use maxent to decide when you should enter that routine. It could compute isDirty for your room. It's useful for places where you want to put an "if" in your code but don't know how to express the conditional well, as it may depend on a large number of factors. In these cases you may be able to use a maxent model to estimate that conditional for you rather than trying to figure out how the different factors in your conditional should be weighted or combined. Hope this helps.
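To make the "divide 360 into discrete parts" idea concrete, here is a minimal sketch in Python. The function names and the 10-degree bucket width are my own assumptions for illustration; none of this is part of the maxent package. The classifier would be trained to predict the bucket label, and you would turn the predicted label back into an angle to act on it.

```python
def angle_to_bucket(angle_degrees: float, num_buckets: int = 36) -> int:
    """Map a continuous angle to one of num_buckets discrete outcome labels."""
    width = 360.0 / num_buckets
    bucket = int((angle_degrees % 360.0) // width)
    # Guard against floating-point rounding pushing the result to num_buckets.
    return min(bucket, num_buckets - 1)


def bucket_to_angle(bucket: int, num_buckets: int = 36) -> float:
    """Return the center angle of a bucket, so a predicted label can be acted on."""
    width = 360.0 / num_buckets
    return (bucket + 0.5) * width
```

Smaller buckets give finer control but need more training data per outcome, so the bucket count is a trade-off you would have to tune.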
The short answer is yes. Spam filtering can be thought of as a document classification problem. Some documents are classified as going in the inbox and others to the trash. The Maximum Entropy or SVM classifier software which is included could be used to train a model for this type of classification. You would need data to train it (in this case email marked as either spam or inbox or any other category you want). The model will produce a probability of whether or not it's spam. To make it really useful you'd want to integrate something like this into your email client so that you could tell it when it makes a mistake and retrain it. Tight email integration would also allow you to use your contacts as a source of information for the model, so that it could learn that even if your close friends mention the words penis enlargement in the same mail you might still want to read that.
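As a rough illustration of spam filtering as document classification, here is a toy sketch in Python. It is not the API of the package mentioned above: with only two outcomes a maxent model is equivalent to logistic regression, so this trains one with plain gradient ascent over bag-of-words features. The function names, the training messages, and the learning settings are all made up for the example.

```python
import math
from collections import defaultdict

# Hypothetical hand-labeled training mail: 1 = spam, 0 = inbox.
TRAINING = [
    ("cheap pills buy now", 1),
    ("win money now click here", 1),
    ("buy cheap watches click now", 1),
    ("lunch tomorrow at noon", 0),
    ("meeting notes attached for review", 0),
    ("are we still on for lunch", 0),
]


def features(text):
    """Bag-of-words features: the set of lowercase tokens in the message."""
    return set(text.lower().split())


def spam_probability(weights, text):
    """Estimated probability that the message is spam under the current weights."""
    score = weights.get("__bias__", 0.0)
    score += sum(weights.get(f, 0.0) for f in features(text))
    score = max(-30.0, min(30.0, score))  # keep math.exp well within range
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=200, learning_rate=0.5):
    """Fit one weight per word by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            error = label - spam_probability(weights, text)
            weights["__bias__"] += learning_rate * error
            for f in features(text):
                weights[f] += learning_rate * error
    return weights
```

Calling train(TRAINING) and then spam_probability(weights, some_message) gives the probability mentioned above. Retraining after the user corrects a mistake is just adding the corrected message to the examples and calling train again; a "sender is in my contacts" flag could be added as one more feature alongside the words.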
There is a mature statistical machine learning package on sourceforge. Check out maxent.sourceforge.net. It's primarily been applied to natural language processing but it's applicable to a wide range of classification problems. There are even examples in the download package. I use it regularly and like it a lot but I'm also the primary maintainer so I might be biased.
SCO, schmo. Does the following passage sound incredibly scary to anyone else?
And Microsoft, which has been accused of conniving with SCO in its march against Linux, is slated to enter the search market and compete against Google. The widgetry, which is supposed to retrieve all kinds of file types, both structured and unstructured, and all kinds of storage systems, beginning with the user's own drive, will be integrated into its operating systems like the anticipated Longhorn.
Given M$'s habit in the past of looking around the hard drive and downloading what they find, the last thing we need is to blur the line between local searches and web searches so that users become completely oblivious. In this horror film you don't find out that the call is coming from inside the house, but that the operating system is the spyware.
I concur with most posts that this isn't a really useful source of news, but it is a fun way to see how up on the news you are. Plus you can check instantly to see if you're right. I get most of my news from the radio, so it was refreshing to put a face on the news of the day.
TechTV also had a review of projectors under $1000 in September. It featured the Epson PowerLite S1, the Toshiba TLP-S10, and the Gateway 205 Projector. The Gateway came out on top but check out the full review here.
This is a non-issue. There is little to no incentive for Oracle to make their own distribution. The benefit of RedHat is that a large number of vendors certify their products against it. An Oracle distribution isn't very helpful unless Oracle is the only vendor you use. There is an implicit benefit for vendors in standardizing on a single distribution, in the same way that there is a benefit to most people using the same auction service (think eBay). RedHat has already filled that niche. They may be turning their back on the community (that remains to be seen) but the wolves are nowhere in sight.
The point you make doesn't address whether you should use a statistical or symbolic approach. Symbolic approaches can ignore context just as easily as statistical ones. The benefit of statistical approaches is that they typically make it much easier to determine the combined influence of a number of different factors. The bane of symbolic approaches is the hours that can be spent tweaking one rule only to find that the new tweak interacts badly with some other rule. The addition and integration of context and other more meaning-based types of information will likely be much simpler in a statistical framework. Case in point: some systems in the evaluation mentioned gave more weight to statistics gathered from documents similar to the one being translated when translating names. This has the effect of providing a sort of document-level topic bias. This type of approach could potentially allow you to prefer "sentence" as a grammatical unit in a school or linguistics context and "sentence" as punishment in a legal or penal context.
That said, it is always possible to construct pathological or even reasonable cases that will thwart a particular approach. But most people would be happy with something that did a good job most of the time. Specifically DARPA, the evaluation's sponsor, would like a system that gave them the ability to determine which documents are worth having a human being translate. A recent article in Time says the number of Arabic speakers in the FBI has tripled since 9/11 to 208. That's still not that many people given the amount of data they monitor. Mediocre translation techniques, available today and easy to adapt to new languages, are probably the best bet for leveraging the vastly larger number of English-speaking government employees.
Thanks for the correction. I read quickly and misinterpreted what had happened in that section. This answers my main question and motivation for posting, which is, "How the hell did he get GoToMyPC on the user's machine?" Answer: He didn't. "access a computer with GoToMyPC software" meant a computer with it already installed, as opposed to access via GoToMyPC. Prepositional-phrase attachment ambiguity strikes again!