SanityInAnarchy · Slashdot Mirror

Re:Bogus argument on Microsoft Claims Google Chrome Steals Your Privacy · 2010-03-31 15:31 · Score: 1

But in all other browsers I've ever seen that isn't the case with the address bar.

*facepalm*

You've seen Firefox, haven't you?

Re:Bogus argument on Microsoft Claims Google Chrome Steals Your Privacy · 2010-03-31 15:29 · Score: 1

At least until you actually open that URL, at which point IE happily sends it to Microsoft to make sure it's not spyware.

Re:Correct on Microsoft Claims Google Chrome Steals Your Privacy · 2010-03-31 15:18 · Score: 2, Insightful

do you not think that Firefox is becoming the new IE?

Well, seeing as Firefox supports HTML5 and web standards well enough for me to create pages that work as well in Firefox as they do in Chrome, Safari, Opera, etc... ...and said pages break in IE, and only in IE...

When Firefox starts breaking the fucking Internet, it will be the new IE, and not before.

Re:Not Correct on Microsoft Claims Google Chrome Steals Your Privacy · 2010-03-31 15:13 · Score: 4, Interesting

The fact is, Chrome is the most privacy intrusive browser

Firefox's Awesome Bar does the exact same thing, by default. IE's anti-phishing sends every URL you visit to Microsoft.

Re:They Suck on New Litigation Targets 20,000 BitTorrent-Using Downloaders · 2010-03-31 11:28 · Score: 1

Your entire response is thinly veiled justification,

Nope, it was a nuanced view. There is indeed a middle ground between "IT'S WRONG!!!" and "STUFF SHOULD BE FREE!!!" You are the one trying to paint it as black and white -- you seem to have disregarded half of what I said and interpreted it all as a "stuff should be free" response.

albeit in terms of attempting to marginalize those who would speak up for their rights.

Setting aside the question of whether copyright is or should be a fundamental right, who was I "attempting to marginalize", and can you show me doing that? I skimmed my own post to make sure, and I can't see how you got that impression.

It's both laughable and sad, and honestly not worth wasting the time to compose a full reply to.

In other words, so is your response. If you're not going to actually back up these claims, I can dismiss them out of hand.

I certainly hope you're never in the position of having to defend the rights to your own creations against those who would attempt to minimize their economic value.

I hope so, too, but I thought I made my position on that fairly clear -- both what I expect from my own creative output, and what others should expect from it.

your definition of "compensated for your labor" (something vaguely assumed to be the satisfaction of writing it or a one-time payment)

Yeah, pretty much. That's how most jobs work, by the way -- retirement plans aside, a factory worker doesn't get to collect income when he's 80 from work he did when he was 30. Why should "knowledge workers" expect to?

is most assuredly not the limit of compensation possible under the GPL,

I never claimed it was, but I'm also not out to get the most I possibly can out of every scrap of code that I write. If I were, I would be spending a lot more time talking to lawyers than I would writing code.

Re:They Suck on New Litigation Targets 20,000 BitTorrent-Using Downloaders · 2010-03-31 11:22 · Score: 1

you make a big mistake equating a child needing to steal just to stay alive with you needing to break the law to be entertained by new movies.

You assume it's about new movies, specifically. Aside from a desire to participate in our culture, there's also old movies (copyright lasts a long time), documentaries, books, etc. I'm going to regret this, but Stallman had a point with "the right to read" -- it is rapidly becoming difficult, if not impossible, to be literate without either accepting these unreasonable terms (or the unreasonable terms of a provider) or "pirating" in one form or another.

These days, I occasionally get news via YouTube clips -- but those YouTube clips may or may not count as fair use, and I certainly am not paying for the privilege.

You're right, my analogy is a bit extreme. Food is much more important than cultural literacy.

Of course many people will choose option C and go illegal, and from a moral standpoint I would argue they are justified (the media companies are basically forming a cartel against the consumer, and the artists), but being morally right isn't the same as being legal

The question I was responding to was a question of whether it is morally right. I was never in much doubt that it's illegal -- though even that gets fuzzy, often, like the YouTube clips I mentioned.

Re:So there are NO cheaters on Xbox Live? on Hacker Will Try To Restore Linux Support On PS3 · 2010-03-31 11:13 · Score: 1

I know you were talking about cheats -- aka, exploits -- but I was referring to hacking the device itself to run arbitrary code

In other words, you were changing the subject. The AC I originally replied to specifically said, "I just will not play with rampant cheaters," and was presumably hoping that the locked-down nature of a console prevents that. I was pointing out that it doesn't.

Re:No executable required? on New Method Could Hide Malware In PDFs, No Further Exploits Needed · 2010-03-31 07:37 · Score: 1

Whoops -- sorry, it's not in the summary. It is, however, in TFA.

Re:No executable required? on New Method Could Hide Malware In PDFs, No Further Exploits Needed · 2010-03-31 07:36 · Score: 1

At some point, in order for the exploit to trigger, some executable must operate on the data enclosed in the file. It is therefore an exploit in an executable, and thus it is important to know which executables are vulnerable.

All which correctly implement the PDF spec. Posting before reading the summary is also disingenuous.

Re:PDF-XChange on New Method Could Hide Malware In PDFs, No Further Exploits Needed · 2010-03-31 07:27 · Score: 1

Are you sure that's how he does it? He apparently has a better proof-of-concept that he hasn't posted, only sent to Adobe.

Re:They Suck on New Litigation Targets 20,000 BitTorrent-Using Downloaders · 2010-03-30 17:21 · Score: 4, Insightful

I didn't say it isn't copyright infringement. I said it's stealing.

Even if you don't see the two as mutually exclusive, both I and the Supreme Court disagree with you as to it being stealing. Read the first bloody sentence and you'll see what I mean.

Are you more interested in justifying stealing via quibbles over legal terminology,

This isn't about justifying it.

Look, if someone had their pocket picked, and they started whining about being raped, we'd rightly correct them on their terminology -- frankly, it's an insult to the real rape victims to claim that pickpocketing is rape. That's not "justifying" it in the least, and we can still say that pickpocketing is wrong without also saying it's "rape".

The difference between theft and copyright infringement is significant, and it's not just about legal definitions.

are you interested in stating your opinion on whether it's right or wrong to do such a thing?

I'm not sure I believe in absolute rights and wrongs. Take the above pickpocketing example -- is pickpocketing always wrong? A child living on the streets might have no other source of income, or might have a "pimp" who will beat him if he doesn't do it. It might be that the pickpocket is a security expert hired by the victim. It might be that the pickpocket needs ransom money to save his family. It might be that the pickpocket is a spy, and has legitimate reasons for needing information stored in the victim's wallet...

So, moving back to copyright infringement: in general, I don't like it. However, compared with the pickpocketing example, it's a relatively small offense, much smaller than pickpocketing. Let me make this even clearer: I run Linux. My options for HD video are basically:

Buy a Blu-ray player and rent or buy physical discs.
Stick to DVDs (also physical), and break copyright law every time I play (or rip) them.
Use an authorized copy of Windows, only on authorized output devices, heavy DRM all the way, and I get something like Netflix WatchNow.
YouTube.
BitTorrent.

The issue here is that no one is legitimately providing the product that I actually want. Even something like Netflix WatchNow -- what if I want to download something at a higher quality than I can stream, and leave it downloading all day so I can watch a two-hour movie in the evening?

So pretty much the only realistic options are ripping DVDs myself (violates the DMCA) or downloading via BitTorrent (violates copyright law), and BitTorrent is going to give me a superior product -- I don't have to leave the house, and it'll be higher quality (HD) than I'd get with a rented DVD.

Now, is it wrong?

That's a harder question. There are some DVDs I go out of my way to buy, because I feel it's wrong for me to continue to watch the rips I downloaded without paying. DVD is, after all, not that encumbered, since it's been so brutally cracked. I also do tend to buy content I like when it's available as a DRM-free download, and I'll even tolerate a reasonable amount of DRM (I like Steam) on video games.

But in many cases, the option simply isn't there. I want to support Firefly, but where can I buy a 1080p version of it that I can play on my Linux laptop? I can't, so the choice isn't between buying and torrenting, it's between torrenting or simply not watching it.

I don't know that I could say it's right, but I could certainly say that at that point, it's not even as bad as jaywalking.

Would you be okay with someone lifting GPL licensed code, changing it for their own purposes, and then selling the code without disclosing the source?

That depends.

First, you're begging the question. "Lifting" here is a synonym for "stealing".

In general, no. However, I also don't think copyright law should be so ludicrously long, and I think that definitely any software older t

Re:So there are NO cheaters on Xbox Live? on Hacker Will Try To Restore Linux Support On PS3 · 2010-03-30 16:46 · Score: 1

XBox (360) was hacked a long time ago. PS3 has not (yet) been. Thus Sony's desire to keep it that way.

Did you read my comment? The particular "hack" I was talking about didn't require any sort of custom software running on the Xbox. (And it was an original Xbox, though I'm sure 360 has fared no better.) I don't know of people doing this on the PS3, but I wouldn't be terribly surprised, and there are always glitches.

As for the PS3 being hacked, actually, yes it has -- this was an attempt by Sony to close an existing exploit. It will fail in the long run, as DRM always does, it's just a matter of time.

So there are NO cheaters on Xbox Live? on Hacker Will Try To Restore Linux Support On PS3 · 2010-03-30 15:10 · Score: 1

Really?

I mean, last I checked, it was mostly cheaters at any level beyond the first few in Halo 2 -- mostly people exploiting things like the fact that one Xbox gets picked "at random" to be the server, but it wasn't particularly difficult to influence which one was chosen as the server. I mean, without even hacking the box, you could do this as simply as resetting your router.

I'd guess that particular problem has been dealt with, but preventing cheating, even on a console, is somewhat like DRM -- you can't do it globally. The best you can do is allow dedicated servers with admins who pay attention -- something not really possible with most console games, or (apparently) most console-ish PC ports.

Re:First DUH!! on Hacker Will Try To Restore Linux Support On PS3 · 2010-03-30 15:05 · Score: 1

While I'm not as surprised as you that Sony has made this move, I suspect the "duh" is about someone trying to hack it now that it's happened.

Did anyone really not see that coming?

Re:They Suck on New Litigation Targets 20,000 BitTorrent-Using Downloaders · 2010-03-30 15:01 · Score: 2, Informative

What would you think about people taking code licensed under the GPL and incorporating it into a commercial, closed-source program? That would be stealing, too.

Nope, that would still be copyright infringement.

Read GP again, particularly the part where the Supreme Court ruled you're wrong.

Re:Good thing on New Litigation Targets 20,000 BitTorrent-Using Downloaders · 2010-03-30 14:55 · Score: 1

I wouldn't buy a Lamborghini "anyway" but I shouldn't get one for free.

Why not?

See, car analogies fail hard here. Remember those "You wouldn't steal a..." scare ads that ran in movie previews and on DVDs -- in other words, in places where someone pirating that film would never actually see them? My favorite response was someone who, on hearing "You wouldn't steal a car!" stood up and shouted "I would if I could fucking download it!"

Let's put this in even simpler terms -- let's pretend fabricators get to the point where they can create food out of thin air. Anything you want to eat, you can download from the Internet. Now, that's certainly not going to sit well with the master chefs, who want to force you to go to their restaurants and pay through the nose if you want something gourmet, but it would pretty much end starvation overnight.

So tell me again -- why shouldn't I get a car for free, if it were possible to duplicate them for free?

Your answer is inevitably going to involve something about the cost of designing said car, testing it, etc -- all of which is very nice, but also the kind of argument that doesn't really apply to cars. You wouldn't steal a car, because cars can't easily be duplicated, and stealing one deprives the owner of said car.

Even if you bring it back to the discussion about movies, why is it "a farce"? Please explain, without using the fatally-flawed car analogy.

Re:Good thing on New Litigation Targets 20,000 BitTorrent-Using Downloaders · 2010-03-30 14:43 · Score: 1

right up until you heard about this you were paying for legal access to indie content, were you?

I frequently do, when I can get it -- and particularly when I can get it in a relatively unencumbered form, ideally digital download. I did buy Sanctuary, for example.

However, I also tend to boycott companies which do things I don't like. Right now, that's mostly Sony and Best Buy.

Re:hopefully.. on Adobe Flash Now Officially a Part of Google Chrome · 2010-03-30 10:08 · Score: 3, Informative

ctrl+f does more or less the same thing. I agree, I wish the whole process was a bit more configurable, but it is all there.

Re:Seven years for eight hours work on Novell Wins vs. SCO · 2010-03-30 09:16 · Score: 1

While we're on the subject of epistemology, I have far more evidence that PJ exists than that God exists.

Re:Seven years for eight hours work on Novell Wins vs. SCO · 2010-03-30 09:15 · Score: 1

Yeah, that's about the only way a post like that escapes a troll mod.

Re:Article summary on Why Some Devs Can't Wait For NoSQL To Die · 2010-03-30 04:26 · Score: 1

Now with something like YAML, the serializing & deserializing code will probably be just as short, but what I'll get out of it is an untyped array or a map, and in a statically typed language, I'll have to do all the casts/parsing myself.

Not necessarily. What, exactly, is typed about XML? As far as XML itself is concerned, you're just dealing with strings.

In fact, the default YAML implementation in Ruby definitely encodes type information. For example:

> puts 1.to_yaml
--- 1
> puts '1'.to_yaml
--- "1"

Or, let's just pick a random class:

> require 'ipaddr'
=> true
> puts IPAddr.new('127.0.0.1').to_yaml
--- !ruby/object:IPAddr
addr: 2130706433
family: 2
mask_addr: 4294967295

And if this didn't happen, then yes, I would have to parse it out myself. Ruby may be dynamically typed, but it is strongly typed -- I can't just take a string and pretend it's an integer, I have to at least call to_i.
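A small sketch of the same point, assuming a current Ruby whose default YAML engine is Psych (so a string serializes with single quotes, --- '1', rather than the double quotes in the 2010-era output above): the type survives a round trip, and strong typing still forces an explicit conversion.

```ruby
require 'yaml'

# Integers and strings serialize differently, so the type survives a round trip.
int_yaml = 1.to_yaml   # "--- 1\n"
str_yaml = '1'.to_yaml # quoted under Psych: "--- '1'\n"

p YAML.load(int_yaml).class # Integer
p YAML.load(str_yaml).class # String

# Strong typing: Ruby refuses to add an Integer to a String implicitly...
begin
  '1' + 1
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# ...so the conversion has to be spelled out.
p '1'.to_i + 1 # 2
```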

For AJAX, JSON is clearly preferred simply because 1) all JS frameworks have first-class support, and 2) all server-side frameworks these days have first-class support, too.

That, I didn't know -- I know towards the beginning, XML was the default.

However, as I said, I like HTML. It depends on the app -- if I'm actually building a client-side browser application, I'd probably tend towards JSON. On the other hand, I like the fact that with microformats and CSS, I can effectively create a REST interface that just is the website, and I can easily pull out anything I need to deal with client-side with jQuery -- assuming I'm not just doing what I would be 99% of the time, which is taking the HTML response and injecting it right into the document.

It seems to be a Ruby-only thing, and I've only toyed with Ruby as a language. I'm generally not very fond of dynamically typed languages, preferring something statically typed, hybrid OO/FP, like Scala.

Definitely a Ruby-only thing. I have to wonder why you like something statically-typed, but that's another discussion.

Point is, it provides a frontend which has reasonably high-level concepts for querying, working with records, relationships can be navigated as if they were just properties or queried on as if they were tables, etc. Yet much of this is decoupled from a specifically relational model -- it's up to the adapter to map the query (constructed in an entirely type-safe, injection-free manner by an internal DSL) to the database in question. It works very well on relational databases, but it also works on entirely different beasts -- I've been contributing to the Google App Engine adapter.
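To make the adapter idea concrete, here's a rough sketch. Query, SqlAdapter, and MemoryAdapter are hypothetical names invented for illustration; this is not DataMapper's actual API, just the shape of "one query object, many backends."

```ruby
# Hypothetical query object: a model name plus equality conditions.
# NOT DataMapper's API; it only illustrates the adapter idea.
Query = Struct.new(:model, :conditions)

# One adapter maps the query to parameterized SQL. Condition values go into
# bind parameters; the model name is assumed to come from trusted code.
class SqlAdapter
  def read(query)
    where = query.conditions.keys.map { |k| "#{k} = ?" }.join(' AND ')
    sql = "SELECT * FROM #{query.model}"
    sql += " WHERE #{where}" unless where.empty?
    [sql, query.conditions.values]
  end
end

# Another adapter answers the same query against an in-memory store of hashes.
class MemoryAdapter
  def initialize(store)
    @store = store # { "posts" => [{ "author" => "alice", ... }, ...] }
  end

  def read(query)
    (@store[query.model] || []).select do |row|
      query.conditions.all? { |k, v| row[k] == v }
    end
  end
end
```

The caller builds the same Query either way; only the adapter knows whether it ends up as SQL or as a filter over some other store.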

(I realize I say App Engine a bit too much. I don't work for Google, it's just the database I've been using the most lately, aside from SQLite.)

I.e. ability to scale transaction to multiple-entity updates, or a sequence of updates interspersed with queries.

App Engine does the latter just fine, so long as the queries are all within an appropriate scope -- but that scope is fixed. The former depends on how your model is defined -- you can easily update multiple entities, but they have to share an entity group.
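A toy model of that constraint, assuming keys are [entity_group, id] pairs. GroupedStore and its methods are hypothetical and only mimic the rule described above; this is not App Engine's API.

```ruby
# Hypothetical in-memory store; keys are [entity_group, id] pairs.
# Mimics the "one entity group per transaction" rule, not App Engine's API.
class GroupedStore
  class CrossGroupError < StandardError; end

  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  # Collects writes from the block, then applies them all at once,
  # but only if every written key belongs to the same entity group.
  def transaction
    writes = {}
    yield(writes)
    groups = writes.keys.map(&:first).uniq
    raise CrossGroupError, "writes span groups #{groups.inspect}" if groups.size > 1
    @data.merge!(writes)
    true
  end
end
```

Updating two entities in the "user:42" group works; touching "user:42" and "user:7" in one transaction raises and leaves both untouched.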

It seems like many (most?) applications fit easily into that model. For example, take Slashdot -- my user preferences could all be stuffed into a single entity-group. It's hard to come up with an example where I'd want to atomically update the preferences of more than one user. This post would be its own entity-group, most likely, as there's no particular need to even update multiple posts on a given story simultaneously, and it's trivial to order them by posting date, and deterministically after that -- however, it would probably make sense to group all moderations associated with this post with the post itself. (You'd want to store all moderations to be a

Re:Article summary on Why Some Devs Can't Wait For NoSQL To Die · 2010-03-29 16:30 · Score: 1

Isn't it kinda the whole point of NoSQL - to limit the DB only to those abstractions that are either easily scaled automatically, or can be scaled in a manual but obvious way by the user?

That's one point. Another is that it can be much easier to plan for and reason about.

As an example, apply any document-oriented database to email. Store the original email as a giant text blob. Include one view which exposes a hash mapping header names to values, and suddenly you can query on any header. Include another which actually parses out the MIME structure and exposes attachments and the like. Again, all of this is possible in SQL, but it goes against the grain when the initial step is something like "dump a giant text blob and figure it out later."
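Roughly what that header "view" might compute, as a sketch. header_view and find_by_header are hypothetical helpers; the parsing is deliberately simplified (it handles folded lines but ignores RFC 2047 encoded words and the MIME body).

```ruby
# Hypothetical "view" over a raw email blob: header block -> hash of
# lowercase header name => array of values (headers can repeat).
# Simplified: no RFC 2047 decoding, body ignored entirely.
def header_view(raw)
  head = raw.split(/\r?\n\r?\n/, 2).first.to_s
  headers = Hash.new { |h, k| h[k] = [] }
  last = nil
  head.each_line do |line|
    line = line.chomp
    if line =~ /\A[ \t]/ && last
      # Folded continuation line: append to the previous header's value.
      headers[last][-1] += ' ' + line.strip
    elsif (m = line.match(/\A([^:\s]+):\s*(.*)\z/))
      last = m[1].downcase
      headers[last] << m[2]
    end
  end
  headers
end

# "Query on any header" is then just a filter over the view's output.
def find_by_header(emails, name, value)
  emails.select { |raw| header_view(raw)[name.downcase].include?(value) }
end
```

In a real document store the view output would be computed once and indexed; here it's recomputed per query to keep the sketch short.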

Another example would be any schemaless database for rapid development -- just add a field ad-hoc, no need to deal with migrations, much less some massive ALTER TABLE once you're up and running.

Well, the trick here is that I do not really have to parse that XML at all.

Well, yes you do. You do have to deal with an XML tree, which is more complex than a JSON tree. You also have to be able to look at the XML yourself and figure out what it means.

I mean, I certainly second the tendency to use a library, but given the choice between a simpler and more complex format, both with good libraries, I'll choose the simpler one, provided it meets my needs.

Of course, it's the same for JSON and YAML, but therein lies the catch: which one is most widely supported by most frameworks? Even more importantly, which one is best supported (i.e. least lines of code) on the one I'm using at the moment?

That is indeed the more important step. More important than that is which is best supported in the app you're using at the moment.

I can see being apathetic about XML in that respect, but it can't be that hard to wrap the existing JSON stuff into something usable, even re-usable.

complete with validation (if I specify a property as int, it is read as int - or error is reported)

I don't know to what extent I care about that.

Of course, the price is parsing overhead. XML isn't easy to parse if you do it all right - encodings, entities, etc (been there, done that), whereas JSON is trivial.

Also the complexity of the tree you get out of it, and the format itself, and the bandwidth used.

So, for example, I would again find it perverse to use XML as a wire format for AJAX. HTML, I understand, because it can easily be dumped into the DOM, but if the client is actually meant to read and understand the message, JSON seems obvious. Rails made it trivial to support all of the above -- HTML, XML, JSON, and YAML -- so I would use JSON for the browser (until I switched to HTML) and YAML for other Rails apps or standalone Ruby scripts.
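Outside Rails, the standard library alone covers the JSON and YAML cases. A minimal sketch, where render is a hypothetical stand-in for what a Rails controller's respond_to would dispatch on:

```ruby
require 'json'
require 'yaml'

# One record, rendered in whichever format the client asked for.
# Hypothetical helper; Rails would do this via respond_to in a controller.
def render(record, format)
  case format
  when :json then JSON.generate(record)
  when :yaml then record.to_yaml
  else raise ArgumentError, "unsupported format: #{format}"
  end
end

post = { 'id' => 1, 'title' => 'NoSQL' }
puts render(post, :json) # {"id":1,"title":"NoSQL"}
puts render(post, :yaml)
```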

all ORMs I've seen so far are very leaky abstractions, so that is definitely not the answer...

Unless they haven't been done right yet. Out of curiosity, have you played with DataMapper much? It has an absurd number of backends on all kinds of things relational and otherwise.

Even so, I know that I do want ACID for sure (preferably optimistic concurrency in form of snapshots) by default, with flexible scaling boundaries.

What do you mean by "flexible scaling boundaries"?

And one possible reaction to the article -- you're not a bank any more than I'm Google. How much consistency do you really need?

Re:Article summary on Why Some Devs Can't Wait For NoSQL To Die · 2010-03-29 14:35 · Score: 1

As a side note, though, at this point we're straying beyond the original question of "which is higher-level".

A bit. I suppose if you can show that in principle it would be possible to scale, you win. It seems pretty difficult to me, and it seems like the abstractions many NoSQL databases have chosen are deliberately biased towards scale.

It's also worth reminding that the claim in TFA was, in fact, "you don't really need a database cluster for your database - you're not Google".

I'm not, but at the point where I need much beyond SQLite, I probably do want the ability to scale, or at least do some sort of real-time replication and failover.

Something in addition to what? ANSI SQL? I think it's a meaningless point to raise, because NoSQL does not have a standard to begin with

That's actually my point. If you're straying beyond ANSI SQL, and getting into more and more vendor-specific stuff, you lose the advantage of having a standard at all.

The best we can do is compare relational and NoSQL implementations one-to-one, in which case "APPLY" etc are fair game.

Fair enough.

You mean, if you "save" the output within the database (say, via CREATE VIEW)? I do not know the answer to that, unfortunately. I believe that most implementations will recompute in full. In theory, of course, nothing in either relational model nor SQL prevents optimizing this.

Seems like that depends how pure it is, whether or not it has access to other rows in the same table, or other tables.

For traditional relational databases, you usually just throw more powerful hardware at them.

At which point, it gets much more expensive.

Yahoo had a 100TB Oracle database back then.

I don't know how Oracle stores data, specifically, though I do remember them doing some interesting things in multimaster, shared media, cluster filesystems. To me, that spells expensive. It spells some sort of massive RAID or SAN which your database cluster has to be wired to directly... ...versus vanilla hard disks (not even RAID, but hard disks with some checksumming) in some beige boxes connected via Ethernet.

The only real advantage of SQL is that pretty much everyone knows the basics of it. It's like XML or Java in that respect - not perfect, and in some cases just downright ugly, but easy to find experts in, has stable and mature solutions, and excellent tooling that, in part, alleviates the pain of having to deal with its deficiencies as a language.

That's actually a really good analogy. If I may:

There aren't many places I'd use XML now. It makes sense for document markup, maybe, and I like HTML, especially the extensibility of it. But it's used in a lot of places it doesn't make sense, like AJAX -- straight HTML or JSON is usually better there -- or REST -- if it's two web apps talking to each other, YAML is smaller and easier to parse -- or serialization, or config files -- YAML is smaller, easier to parse, and easier for humans to read.

Having an XML expert is nice, but it'll take you far less time to become a JSON expert -- something like 20 minutes.

Java is similar. I'm not sure the difference is quite as obvious, and it certainly takes longer to learn a new programming language, new frameworks, etc, than it does to learn a newer, lighter markup language. On the other hand, the productivity gains by switching to something like Ruby, despite missing certain tools Java has, are enormous. I mean, both are Turing-complete, and JRuby proves they aren't so different after all, but still, attr_accessor vs defining accessors yourself? Rails and convention-over-configuration vs piles and piles of XML? No contest.
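For anyone who hasn't seen it, the attr_accessor comparison looks like this: one line versus the hand-written getter and setter that line generates.

```ruby
# With attr_accessor: one line generates both a reader and a writer.
class Post
  attr_accessor :title
end

# The same behavior written out by hand, Java-style.
class VerbosePost
  def title
    @title
  end

  def title=(value)
    @title = value
  end
end
```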

More broadly speaking, relational is buying me the ability to spe

Re:Article summary on Why Some Devs Can't Wait For NoSQL To Die · 2010-03-29 12:56 · Score: 1

such a search would be O(n) for every array (which, of course, isn't a problem for your designated use case of tag clouds).

Wait, what?

That sounds like it would be. Every time a user clicks on a given tag, O(n) for each array? That's a serious performance hit compared to what App Engine does here, which is a string lookup in a hash table. Even in SQL, things like acts-as-taggable-on do a better job.

Of course, if it's just O(n) on insert and update, that's not a problem.
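The difference in miniature, as a sketch with made-up data: scanning every record's tag array on each query versus paying the cost once at insert time and doing a single hash lookup per query. TagIndex is hypothetical and insert-only (no removal on update).

```ruby
# Records with tag arrays (hypothetical data shape).
Record = Struct.new(:id, :tags)

# Scanning: every query walks every record's tag array.
def tagged_by_scan(records, tag)
  records.select { |r| r.tags.include?(tag) }.map(&:id)
end

# Indexing: O(tags) work per insert, then each query is one hash lookup.
# Insert-only sketch; a real index would also handle updates and deletes.
class TagIndex
  def initialize
    @index = Hash.new { |h, k| h[k] = [] }
  end

  def add(record)
    record.tags.uniq.each { |t| @index[t] << record.id }
  end

  def lookup(tag)
    @index.fetch(tag, []) # fetch avoids creating entries for unknown tags
  end
end
```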

You can, obviously, shove anything into a blob or a text field - the DB won't preclude you from doing so. Querying it is also not a problem if the DB in question allows for custom user-defined functions written in a foreign language (practically all modern RDBMS do). Such a query will still be parallelized - it won't parallelize the body of your function, of course, but it will parallelize the application of that function to rows.

Still raises some questions, like: Will the results of the application of that function be cached and indexed for future queries?

it won't parallelize the body of your function, of course, but it will parallelize the application of that function to rows.

Will it do so across multiple machines?

Now, I'm not aware of a standard ANSI SQL way to get multiple output tuples from a single input one (though I suspect there is something in SQL99 or SQL03 additions). In MSSQL, I can define a UDF (written in T-SQL, with CREATE FUNCTION) that produces a table value from its input parameters, and then join on that. For example:

So, something in the additions somewhere might be able to do it. Either way, it's going to be ugly.

Effectively, SELECT..APPLY in MSSQL is precisely the "map" part of MapReduce, with exact same semantics.

Next question: Suppose I update, insert, or delete a row in xs. Do I have to reapply the whole thing?

It seems like what I'm getting here is more or less what I expected -- yes, you can do it (probably, I still have doubts about the performance/reliability tradeoffs), but it's cumbersome, to say the least. In order to take full advantage of these, you're going to have to dig deep into the non-portable details of your chosen database.

You also didn't really touch on scalability. I just checked the Postgres wiki, and found more or less the same situation I did a while ago -- there is a total of one solution that can scale writes, and it does so with sharding. The closest alternative is MySQL Cluster, which currently holds all your data in RAM.

And then there's Riak, which can scale today.

So at the end of the day, what is SQL buying you as a language, and what is the relational model buying you here?

Re:Article summary on Why Some Devs Can't Wait For NoSQL To Die · 2010-03-29 10:14 · Score: 1

You can have optimistic concurrency in form of snapshots/MVCC.

Out of curiosity, what about the App Engine behavior, in which only a single version succeeds, and any concurrent transactions that affect the same entity group fail?
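A sketch of that "first committer wins" behavior, assuming each entity group carries a version number. OptimisticStore is hypothetical and only models the semantics, not App Engine's implementation; the mutex makes the version check and the write a single atomic step.

```ruby
# Hypothetical optimistic-concurrency store: each entity group has a version.
# A transaction remembers the version it started from; commit fails if
# another transaction committed to the same group in the meantime.
class OptimisticStore
  class ConflictError < StandardError; end

  def initialize
    @groups = Hash.new { |h, k| h[k] = { version: 0, data: {} } }
    @lock = Mutex.new
  end

  def begin_txn(group)
    @lock.synchronize do
      { group: group, read_version: @groups[group][:version], writes: {} }
    end
  end

  def commit(txn)
    @lock.synchronize do
      g = @groups[txn[:group]]
      raise ConflictError, "group #{txn[:group]} changed" unless g[:version] == txn[:read_version]
      g[:data].merge!(txn[:writes])
      g[:version] += 1
    end
  end

  def read(group, key)
    @lock.synchronize { @groups[group][:data][key] }
  end
end
```

Two transactions that start from the same version can both do their work, but only the first commit lands; the second gets a ConflictError and has to retry.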

You can have arrays - there's no reason why an array cannot be an atomic type for the purpose of an attribute definition

Does any existing database do it this way? And more importantly, can you query on the members of said array? In the above key example, a query for "Which records are tagged with a given tag?" is stupidly simple.

You can dump raw XML or JSON into strings or blobs - in fact, many RDBMS these days have dedicated types which also allow querying over that

Great, what about YAML? What about ID3 tags on MP3s?

and, of course, if you want to do filter on the client, you can also query that XML/JSON in any language and in any way you see fit.

Doing it on the client is not nearly as efficient as implementing a map/reduce which runs where the data actually is.

And "emulating" MapReduce in an RDBMS? That's what SQL SELECT pretty much is, no?

No, not even close, unless I'm missing a lot about what SQL is.

A Map function is an arbitrary bit of code in a Turing-complete language which executes over every element in a given set, creating an arbitrary number (zero or more) corresponding elements which form the result set. A Reduce function is an arbitrary bit of code in a Turing-complete language which executes once for every element in a given set, taking that element and the result of reduce on the previous element as input, thus creating a single datum as output.
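Those two definitions, translated into a minimal single-process sketch (no distribution, no data locality). map_reduce is a hypothetical helper; the example totals comment scores per story and emits nothing for posts without a score, showing the "zero or more" part.

```ruby
# map_fn:    element -> array of [key, value] pairs (zero or more).
# reduce_fn: (accumulator, value) -> accumulator, folded over each key's values.
# The initial value is shared across keys, so it should be immutable (e.g. 0).
def map_reduce(elements, initial, map_fn, reduce_fn)
  emitted = elements.flat_map { |e| map_fn.call(e) }
  emitted.group_by(&:first).transform_values do |pairs|
    pairs.map(&:last).reduce(initial) { |acc, v| reduce_fn.call(acc, v) }
  end
end

posts = [
  { story: 'a', score: 2 },
  { story: 'b', score: 4 },
  { story: 'a', score: 1 },
  { story: 'a', score: nil } # emits nothing
]
map_fn    = ->(post) { post[:score] ? [[post[:story], post[:score]]] : [] }
reduce_fn = ->(acc, v) { acc + v }
p map_reduce(posts, 0, map_fn, reduce_fn) # {"a"=>3, "b"=>4}
```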

The power of this approach is that a given map can run concurrently on all elements in a given set, wherever they are -- generally, it makes sense to do this where the data is physically stored -- and the results can generally be cached, again wherever the data is physically stored. While a reduce can't necessarily run concurrently, many of them can run in parallel, and again, it's run where the data is.
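The caching point, sketched: keep each document's map output, so an insert, update, or delete only re-runs map for that one document. IncrementalView is hypothetical, and summing stands in for a general reduce; a real system would also cache reduce results per key instead of re-summing.

```ruby
# Hypothetical incremental view: map output is cached per document.
class IncrementalView
  attr_reader :map_calls

  def initialize(&map_fn)
    @map_fn = map_fn
    @emitted = {} # doc_id => [[key, value], ...]
    @map_calls = 0
  end

  # Insert or update: re-map only this document.
  def put(doc_id, doc)
    @map_calls += 1
    @emitted[doc_id] = @map_fn.call(doc)
  end

  # Delete: drop the cached output; nothing is re-mapped.
  def delete(doc_id)
    @emitted.delete(doc_id)
  end

  # Sums cached values for one key; no document is re-mapped here.
  def sum(key)
    @emitted.each_value.sum { |pairs| pairs.sum { |k, v| k == key ? v : 0 } }
  end
end

view = IncrementalView.new { |doc| doc[:tags].map { |t| [t, 1] } }
view.put(1, { tags: %w[ruby sql] })
view.put(2, { tags: %w[ruby] })
view.put(2, { tags: %w[sql] }) # update re-maps only document 2
```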

A trivial example, for which there is probably a specific database feature, would be a fulltext keyword search. Just create a map function which emits a word and a count for each unique word in each document. Then it's just a matter of a key-based lookup on the result. I'm aware that there are many tools for searching in the relational world -- the point is that this is something you can do trivially, using a tool which isn't specific to search, without any sort of administrative overhead of dealing with cron jobs, rebuilding indices, etc.
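The keyword-search example as a sketch: the map emits [word, [doc_id, count]] for each unique word, and the lookup is a key fetch on the grouped output. The tokenizer (lowercase, letters only) is a naive assumption, and tally needs Ruby 2.7 or later.

```ruby
# Map step: one [word, [doc_id, count]] pair per unique word in a document.
# Naive tokenizer assumption: lowercase, split on anything that isn't a-z.
def word_map(doc_id, text)
  text.downcase.scan(/[a-z]+/).tally.map { |word, n| [word, [doc_id, n]] }
end

# Group the emitted pairs into word => [[doc_id, count], ...].
def build_keyword_index(docs)
  docs.flat_map { |id, text| word_map(id, text) }
      .group_by(&:first)
      .transform_values { |pairs| pairs.map(&:last) }
end

index = build_keyword_index({ 1 => 'NoSQL and SQL', 2 => 'sql sql ruby' })
p index['sql'] # [[1, 1], [2, 2]]
```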

Another example might be to take a given record that's an ODF and convert it to PDF, or extract out relevant metadata or keywords to search on.

Comments · 12,413