Temporal · Slashdot Mirror

Re:your sig on Johnson & Johnson Loses Major Trademark Lawsuit · 2008-05-27 06:15 · Score: 1

New punctuation update "~" (no quotes) at the end of a line to indicate sarcasm.

FYI: A tilde at the end of a word or sentence is already widely used to mean that the text should be read in a sing-songy/flirty voice. As I understand it, the usage originates from Japanese, and is used in particular by a lot of anime fans. If some people start using this for sarcasm too, it's going to be really confusing. Example: "I had fun last night.~"

Re:To be fair, he's a VP on Google Opens Up (Some) Search Algorithms · 2008-05-24 06:49 · Score: 2, Informative

The engineering VPs at Google are all engineers themselves. Udi himself was hired for his extensive background in web search, at Yahoo and Amazon. He knows a great deal about what he oversees.

Re:Oh My, on Senators OK $1 Billion for Online Child Porn Fight · 2008-05-16 11:44 · Score: 1

My initial reading of the title left off the "Fight" part - anyone else?

Yes. In fact, that's how it appeared on my screen.

Re:Literate programming... on Donald Knuth Rips On Unit Tests and More · 2008-04-26 09:16 · Score: 2, Funny

What's a cow-orker? A misspelled description of somebody who irritates cows?

Scott Adams once observed that a person you work with is technically called a "co-worker". The common misspelling "coworker" is actually a contraction of "cow orker", "ork" being an old Scottish slang term the meaning of which is not hard to guess.

Re:File a counter notice on More DMCA Censorship at Yahoo! · 2008-04-07 19:08 · Score: 1

Under the DMCA, to qualify as a safe harbor, you must have a policy for terminating repeat offenders. Yahoo has not gone "above and beyond" here; they are doing exactly what is required under the law.

Re:File a counter notice on More DMCA Censorship at Yahoo! · 2008-04-06 19:39 · Score: 1

Indeed, Yahoo is just doing what they are required to do by law, and filing a counter notice is the correct response. Complaining to Slashdot... not so much.

Re:Well, block them. on Users Know Advertisers Watch Them, and Hate It · 2008-04-01 18:01 · Score: 5, Insightful

Of course, note that by using TOR, you are essentially telling every web site you visit: "I am a user who is excessively concerned with privacy and knows how to anonymize himself. Statistically speaking, I am probably (though not certainly) college-age, computer-savvy, geeky, single, and male. Effective ads for me are likely to include ads for dating services, computer hardware, nifty gadgets, video games, and Ron Paul." Normally, advertisers would have to do a ton of tracking and data mining to determine these things, but you're just telling them right off the bat.

Just saying.

Re:Bad Engineering on Mac OS X Secretly Cripples Non-Apple Software · 2008-02-28 17:13 · Score: 1

It turns out that not everyone who posts on Slashdot is the same person.

Re:why? on Has Ron Paul Quit? · 2008-02-09 11:38 · Score: 1

Neither of them, let alone McCain, have any clue about US foreign policy.

Whereas I'm sure you, Master of Transhuman, are among the country's foremost experts on foreign policy. Clearly Obama should fire all his foreign policy advisers and hire you instead.

Re:Databases? WTF? on MapReduce — a Major Step Backwards? · 2008-01-18 16:40 · Score: 1

From reading the article, my impression is that the authors wrote it in response to some questions, and it's targeted at database people who have heard the MapReduce hype and are wondering if it will help them. Kind of like when people wondered if they should use Java for everything in the late 90s.

The article seems to assume that MapReduce is trying to compete with RDBMSs, and even attacks the authors of MapReduce, suggesting that they should read up on database theory. An article which simply argued that MapReduce is not a good alternative to RDBMSs while acknowledging that it is very useful in other areas would be more agreeable.

I agree that the hype around MapReduce seems a bit silly. It's just an engineering tool, not a computer science revelation. The main things it provides are scalability and fault tolerance (when running a task across thousands of machines, you have to expect failures). Theoretically speaking, it's not very interesting.

Re:Databases? WTF? on MapReduce — a Major Step Backwards? · 2008-01-18 14:12 · Score: 2, Insightful

I guess if you consider anything that involves (key, value) pairs to be basically an RDBMS, you might as well classify almost everything as an RDBMS, which seems to make the term pointless. Why write software anymore when we can just use a database? The reality is that I would use MapReduce and MySQL to solve very different problems.

I think TFA is being silly in trying to compare MapReduce to DBMSs. Yes, of course MapReduce compares unfavorably, because it isn't a DBMS. The comment that MapReduce is "A sub-optimal implementation, in that it uses brute force instead of indexing" is particularly telling: MapReduce is not intended for situations where you would want indexing, and never was. In general, the whole article is trying to judge MapReduce on points that are completely irrelevant to what it was designed for and the way it is actually used.

Really, if MapReduce were a DBMS, then why did the creators of MapReduce also create BigTable? BigTable *is* meant to be like a database, although it omits a lot of features in favor of scalability. MapReduce and BigTable are used for completely different things. I think Jeff and Sanjay (creators of both MapReduce and BigTable) probably find it pretty amusing to see MapReduce evaluated as a DBMS.

Re:Databases? WTF? on MapReduce — a Major Step Backwards? · 2008-01-18 10:19 · Score: 1

Um... Nope, sorry, the OP is right. MapReduce is a framework for batch processing of gigantic data sets where you intend to do something with every item in the set, or at least a large fraction of them. Relational databases are better for quickly looking up subsets of the items in a database based on query terms, and can be used for serving real-time queries.
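To make the batch-processing shape concrete, here is a toy, single-process sketch of the map / shuffle / reduce pattern in Python. It is only an illustration: the function names (map_phase, shuffle, reduce_phase, word_mapper, count_reducer) are made up for this example and are not Google's MapReduce API, and it has none of the distribution or fault tolerance that makes the real thing useful.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Run the mapper over every record; each call yields (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group all values by key, the step that sits between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical job: count word occurrences across all input lines.
def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1

def count_reducer(key, values):
    return sum(values)

def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)

print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

Note that every input line is touched exactly once and nothing is looked up by key along the way, which is why an index would not help here.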

Re:poorly publicized pre-primary polls on Diebold Voter Fraud Rumors in New Hampshire Primaries · 2008-01-10 09:59 · Score: 1

What's so great about your guy?

You didn't ask me but I'm answering. I recommend some videos:

Iowa victory speech: Beautiful. If you compare this with Hillary's post-Iowa speech, there's quite a contrast. Her speech was all about how important it is that we elect a Democrat and fight the evil Republicans. Obama's speech was about how we need to get past partisanship and negativity and reach for our dreams.

Obama speaking at Google: Gets into policy more than most of his speeches. (OK, so I partly chose this video because I work at Google and was there (this post reflects my own opinions and not those of my employer, etc.).)

I think the key reasons I support Obama are his commitment to openness and transparency in government -- something we really, really need right now -- and because he has the ability to inspire people. The latter is more important than it sounds: I believe Obama will be able to marshal far more support behind any cause (e.g. fighting climate change) than any other candidate would be able to, even if they believe the same thing. And, yeah, I generally agree with his other policies, though all of the Democrats are pretty similar on most of these.

As for Ron Paul, I respect his integrity and resolve, but his policies would ruin this nation. It's really tempting to believe that Libertarian policy can work, since it's so simple and elegant, but in reality the world is too complicated.

Re:Employee Games the system on Google's Prediction Market · 2008-01-07 15:49 · Score: 1

I think he actually received some sort of tongue-in-cheek award.

Re:Sour milk on IE 8 Passes Acid2 Test · 2007-12-19 15:31 · Score: 1

Oh come on. Firefox has a "quirks" mode for rendering old non-compliant sites. Why is it wrong if IE does the same?

Re:Go Yahoo on Yahoo Becomes Apache Platinum Sponsor · 2007-12-16 17:14 · Score: 5, Informative

Err... It's great of Yahoo to do this and all, but as others have pointed out, Google was already a platinum sponsor of Apache, and until now was the only platinum sponsor.

Google also contributes directly to the Linux kernel, GCC, Mozilla, and many other projects, funds tons of open source development via the Summer of Code program, releases many of its own projects open source (from small things like its Java collections framework to huge things like Android), provides free hosting for open source projects, etc.

Not trying to diminish Yahoo's contributions -- they release plenty of code too -- but just saying that you can hardly claim Google doesn't do enough for OSS.

Re:10-100x better than what? on Spam Trap Claims 10x-100x Accuracy Gain · 2007-12-03 17:33 · Score: 1

10 to 100 times more accurate than existing systems means that for every 10 to 100 mistakes that existing systems make, this system will make just one.

Right, and the site claimed it makes 1 mistake per 100, so if it makes 1 mistake for every 100 mistakes that some existing system makes, then that existing system must be making 100 mistakes per 100.

I think the site just made a mistake in their numbers, but I found it funny.

10-100x better than what? on Spam Trap Claims 10x-100x Accuracy Gain · 2007-12-03 15:55 · Score: 1

From the web site:

Unprecedented accuracy. Over 99 percent spam blocking means fewer than one mistake in every 100 messages processed. That's 10 to 100 times fewer mistakes than any other available systems.

Uhh. So this system makes 1 mistake in 100, and claims this is 100x fewer than some other system. Apparently this other system they are comparing against gets it wrong every single time. I guess one way to make your products look good is to compare them against the theoretical worst competitor imaginable.
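For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes, as the post does, that "fewer than one mistake in every 100" means an error rate of exactly 1 in 100; the function name is made up for this example.

```python
def implied_competitor_error_rate(own_error_rate, improvement_factor):
    """Error rate a competitor must have if ours is improvement_factor times lower."""
    return own_error_rate * improvement_factor

own = 1 / 100  # the site's claimed error rate, taken at its upper bound

for factor in (10, 100):
    rate = implied_competitor_error_rate(own, factor)
    print(f"{factor}x fewer mistakes -> competitor error rate of {rate:.0%}")
```

At the 100x end of the claim, the implied competitor misclassifies every message it sees.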

Re:3 reasons this will suck donkey balls on Google Conducts Trial on User-Voted Search Results · 2007-12-01 09:40 · Score: 1

Hypothetically, imagining that this does eventually affect other users' results, do you really think Google engineers are so dumb that they would not account for spammers?

You can rest assured that Google is not going to make any change that hurts the quality of its search results.

Re:3 reasons this will suck donkey balls on Google Conducts Trial on User-Voted Search Results · 2007-11-29 10:25 · Score: 1

It's one thing not to RTFA, but apparently you didn't even read the summary.

Other Google users will not be affected by the individual tweaking: instead it will be stored along with the users' own personal information for the next time they search for this word or phrase, so users are required to log in to avail of it.

Re:Next week on Googledot... on GOOG-411's "Biddy-Biddy-Boop" Sound Backstory · 2007-11-11 08:22 · Score: 1

Way to be pedantic, Colm. I think I use the MV toilets a lot more often than you do. ;)

Re:Next week on Googledot... on GOOG-411's "Biddy-Biddy-Boop" Sound Backstory · 2007-11-10 15:32 · Score: 2, Informative

Actually, all the toilets at Google HQ are the Japanese kind that wash your ass for you.

No joke.

Re:Briefcase... on Google Vows to Increase Gmail Limit · 2007-10-14 14:55 · Score: 1

Hashes aren't nearly reliable enough. By their very nature, a single hash is shared by multiple files. On such a large scale, even with ridiculously long hashes, there's bound to be multiple such instances.

So, as described here, we can approximate the chance of a hash collision using the function:

p(r, n) = 1 - e^(-n^2 / (2r))

Where n is the number of messages in our system and r is the number of unique hashes. If you're playing along at home, here's some Python for that:

>>> import math
>>> def p(r, n): return 1 - math.exp(-n**2/(2*r))
...

Let's test it using known correct values for the birthday problem:

>>> p(365.0, 0.)
0.0
>>> p(365.0, 23.)
0.51550953806151678
>>> p(365.0, 365.)
1.0

Yep, looks like we're getting the right results.
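As a sanity check on the approximation itself, it can be compared against the exact birthday probability, which is one minus the chance that every draw misses all the earlier ones. This is a sketch in Python 3; p_approx is the formula from the post and p_exact is a helper added here for comparison.

```python
import math

def p_approx(r, n):
    """Approximation from the post: 1 - e^(-n^2 / (2r))."""
    return 1 - math.exp(-n**2 / (2 * r))

def p_exact(r, n):
    """Exact chance that n uniform draws from r possible values contain a repeat."""
    no_collision = 1.0
    for k in range(n):
        no_collision *= (r - k) / r  # draw k must miss the k values already seen
    return 1 - no_collision

for n in (10, 23, 50):
    print(n, round(p_exact(365, n), 4), round(p_approx(365, n), 4))
```

For 23 people the exact value is about 0.507, so the approximation runs slightly high but is close enough for order-of-magnitude estimates like the ones here.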

Now, let's imagine that we are using a 128-bit MD5 hash (forget for the moment that it is broken). Let's also imagine that our system contains six billion billion messages -- that is, every single person on the planet has a gmail account and has sent one billion e-mail attachments. Plug and chug.

>>> p(2.0**128, 6.0e18)
0.051522532469024385

So, the chance that this system contains even a single hash collision is 1 in 20, despite the ridiculously large number of messages in it. And what if we use a better hash function, like SHA-256?

>>> p(2.0**256, 6.0e18)
0.0

Oops, looks like the probability is too small for this formula to compute in double-precision floating point: e raised to such a tiny negative exponent rounds to exactly 1.0, so the subtraction yields 0.
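The 0.0 for SHA-256 is a rounding artifact, not a true zero. One way to get a usable number is to compute 1 - e^(-x) with math.expm1, which stays accurate when x is tiny. A sketch, where p_stable is a name made up for this example:

```python
import math

def p_stable(r, n):
    """Same approximation, 1 - e^(-x) with x = n^2 / (2r), computed via expm1."""
    x = n**2 / (2 * r)
    return -math.expm1(-x)  # expm1(y) = e^y - 1, accurate even for tiny y

print(p_stable(2.0**128, 6.0e18))  # agrees with the MD5 figure above
print(p_stable(2.0**256, 6.0e18))  # tiny but nonzero
```

With this version the SHA-256 case comes out on the order of 1e-40 rather than 0.0.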

Now, it turns out MD5 is broken, and someone was able to construct two messages that have the same MD5 hash by using clever math. However, SHA-256 is not (as far as we know). If you were to find two messages with the same SHA-256 hash, this would be considered a significant event among cryptographers and you'd probably even get a front-page story on Slashdot (the cracking of SHA-1 did, and they didn't even find an actual collision yet).

All that said, if you really don't trust hash functions, you can always do your diff in the case of a collision. Since collisions are so rare, it wouldn't be too painful to take the hit of doing a full diff when they actually happen.

And we're talking about e-mail here, not normal files. Do you intend to do appropriate mime decoding on all files in each e-mail before hashing and storage, and accurately re-encode all known types of mime when the raw e-mail is again needed for forwarding, POP3 download, and the like? It's obviously MUCH easier to avoid this step.

Assuming you already have reusable MIME encoding and decoding code -- which GMail would have to in order to support the web interface -- this doesn't sound very hard, and obviously it would be worth it.

Re:Briefcase... on Google Vows to Increase Gmail Limit · 2007-10-14 08:23 · Score: 1

An obvious problem, but the solution is most certainly not trivial. Running diff over the network, against every file that has the same size, is ridiculously resource intensive, probably more so than the storage they would stand to save. And the security issues involved would be huge.

There really isn't any good method to do this, without having an explicit "share" option that the user selects.

The solution most certainly is trivial. If two files have the same content, they also have the same hash. Throw your hashes in a database and check for matches any time a new file comes in. If you use a good hash function you don't even need to compare the actual contents of the files.
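Roughly what I mean, as a minimal in-memory sketch in Python. The DedupStore class and its method names are hypothetical, a plain dict stands in for the database, and SHA-256 is just one reasonable choice of hash. The optional verify flag does a full byte comparison when a hash matches, for anyone who doesn't want to trust the hash alone.

```python
import hashlib

class DedupStore:
    """Stores each distinct blob once, keyed by its SHA-256 digest."""

    def __init__(self, verify=False):
        self._blobs = {}       # hex digest -> stored bytes (stand-in for a database)
        self._verify = verify  # if True, compare contents whenever a digest matches

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        existing = self._blobs.get(digest)
        if existing is None:
            self._blobs[digest] = data  # first time we have seen this content
        elif self._verify and existing != data:
            raise ValueError("hash collision: same digest, different content")
        return digest  # callers keep the digest as their reference to the blob

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self):
        return len(self._blobs)

store = DedupStore(verify=True)
first = store.put(b"vacation.jpg contents")
second = store.put(b"vacation.jpg contents")  # same attachment, sent again
print(first == second, len(store))
```

The lookup is a single keyed query per incoming file, so there is no need to diff against every existing file of the same size.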

Re:Missing information in story on Future Looks Bright for Large Scale Solar Farms · 2007-09-23 06:57 · Score: 1

Um... nope.

User: Temporal

Comments · 1,094