By Thomas Piketty. I came across this book through a review in the NYTimes. Here's a quote from The Times that intrigued me enough to pick it up:
"Mr. Piketty argues that the decades after World War II, when the divisions between the classes narrowed and opportunities to move up the economic ladder expanded — that is, when the middle class as we knew it was formed — may actually have been an aberration. Society, Mr. Piketty wrote, risks a return to the historical norm of a yawning gap between rich and poor."
The special cases that you describe (similar files with slight variations) are much more difficult to handle programmatically. If you expect the number of such files to be small, then I would just handle them manually after the rest of the dedupe process is done. However, if you think there will be numerous such files and that classifying them would take a non-trivial amount of time, then I would consider automating the step using a service such as Amazon's Mechanical Turk. With MTurk, a real person is involved in the classification loop (I don't recall what they charge, but it's pennies for each classification request).
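To make the split concrete, here is a rough sketch of what I mean, not a finished tool: exact duplicates are grouped by content hash, and files that are merely similar are set aside for a person to look at. The names `file_digest`, `split_duplicates` and `review_threshold` are made up for illustration, the 0.9 cutoff is an arbitrary assumption, and the pairwise comparison is only practical for a small number of smallish files.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def split_duplicates(paths, review_threshold=0.9):
    """Group exact duplicates by hash and list near-duplicate pairs for manual review.

    review_threshold is a hypothetical cutoff; tune it for your own files.
    """
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p)].append(p)

    # Files with identical content: safe to dedupe automatically.
    exact = [group for group in by_digest.values() if len(group) > 1]

    # One representative per distinct content, compared pairwise (quadratic).
    reps = [group[0] for group in by_digest.values()]
    needs_review = []
    for a, b in combinations(reps, 2):
        matcher = SequenceMatcher(None, a.read_bytes(), b.read_bytes(), autojunk=False)
        ratio = matcher.ratio()
        if ratio >= review_threshold:
            needs_review.append((a, b, ratio))
    return exact, needs_review
```

Whatever lands in `needs_review` is the pile you would go through by hand, or hand off to a human classification service if it turns out to be large.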
Actually, MYIE2 already does this. It's got an experimental feature where you can ask it to use Mozilla's Gecko rendering engine in place of the standard IE engine built into Windows.
takes on a whole new meaning.
The summary incorrectly states 3 GB/s when it's actually 3 Gb/s.
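The difference is a factor of eight, since a byte is eight bits. A quick sketch of the conversion, assuming decimal prefixes and ignoring any encoding overhead on the link (the function name is just for illustration):

```python
BITS_PER_BYTE = 8


def gigabits_to_gigabytes(gigabits_per_second: float) -> float:
    """Convert a rate in Gb/s to GB/s (decimal prefixes, no encoding overhead)."""
    return gigabits_per_second / BITS_PER_BYTE


print(gigabits_to_gigabytes(3))  # 0.375
```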
T stands for "Time Until Launch". Check out http://space.launch.info/countdown.html for a nice overview of a launch countdown.
I was scrolling through the Slashdot main page and I read the headline as 'SpeakEasy embarrasses Firefox' hehe.
I just checked out the iPod Shuffle. It's super cool! I can't wait to get my hands on it :-)
In an article like this, you'd expect to see some cool pics... The link has no pics! Whoever posted this article... you suck big time!!!
Google's cache for those who can't access the site.