Excellent, this is precisely the kind of answer I was hoping for. I've done some fairly interesting things with databases in the past (including a rather odd sort of partial replication - for security purposes), and while I'm up on much of the theory of these things, I don't have much practical experience. Do you have any books you can recommend that would cover the issues of access to redundant databases, and this partitioning (which from what you said seems to be a method of load balancing, akin to how items in a hash table are "balanced" through the use of a good hash)?
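To make the hash-table analogy concrete, here is a minimal sketch of hash-based partitioning. Everything in it is an assumption for illustration: the names `shard_for` and `partition`, the choice of SHA-256 as the "good hash", and the idea that records are addressed by a string key. It is not a description of how any particular database does it.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    SHA-256 is used only because it spreads keys evenly and gives the
    same answer on every machine; any well-mixed hash would do.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys into num_shards buckets, one bucket per database."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical keys; with a good hash the buckets come out roughly equal.
    keys = [f"user{i}" for i in range(10000)]
    print([len(bucket) for bucket in partition(keys, 4)])
```

The catch with plain modulo placement is that changing the number of shards moves most keys to a different shard, so it balances load well but makes adding machines expensive.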
I think many of the "share-nothing" architectures that really just push all the sharing/bottlenecks into the database are popular for two reasons: the people writing databases like Oracle/DB2 tackle those kinds of critical issues so much better than some random webapp developer under a strict deadline, and a lot of this "scalability" is limited to low enough numbers of machines that the database is rarely a problem. But you're right, it doesn't really "solve" the scalability problem, it merely delays it.
I've been programming for many years now, but I'm new to web-app development. I've been learning Ruby on Rails (for various reasons) and one of the points the book I'm reading makes (Agile Web Development with Rails) is that good scalability is best achieved through the use of a "share nothing" architecture - basically reduction of chokepoints by reduction of shared content in a system.
I'm studying this as I'm looking at scalability concerns in an app I'm putting together, and I did a google search on the topic, but the only thing of interest I could find was this article, which doesn't really go into the downsides of this approach. What does slashdot think about this?
These machines are actually starting to sound like something some people in the US might even like. I can imagine sitting outside in some remote area, working as much as I like without even worrying about running out of batteries (and getting exercise at the same time).
What I'd really like to see is an inexpensive laptop which has a screen that's highly visible, even outdoors. I could get a lot of work done that way, and work on my tan at the same time. Does anyone know of any? I'd assume it'd work best with grayscale.
Is there any benefit to having this "professionally produced content"? I mean, there are still a few good things on TV, and even occasionally at the theatres - could it be that having a strong commercial incentive could allow for more niche shows to take off online, where the demographics are better targeted? Could Gotuit allow for better service to small viewerships, especially those who are fans of low budget comedies? (Red Dwarf, anyone?)
We live in an age where, due to widely-held populist views, and political correctness, it is a "sin" to act in a manner that is supposedly "elitist." Now my question is, what precisely is wrong with believing that people have different potentials, and contribute different amounts to society? Most of the great inventions in history were not created by common folk - they were created by people who were excellent in some way. Very intelligent, very wealthy, or maybe just very, very persistent. These uncommon qualities lead to uncommon achievement, and most of us owe our lives to them (without modern technology, most of us wouldn't have lived to see 20, or maybe even 2).
I think the reason for all this elitism towards places like myspace, livejournal, etc from /.ers is because we once believed that the "democratization" of this medium would lead to a renaissance, would be a life-changing event and would open the floodgates on good content. The problem is not that most people should not be allowed to post on the internet (that is ridiculous), but that most people really do not have anything to say that is valuable to anyone other than their friends. Because their audience is so narrow, the "value added" for the internet as a whole is very small compared to the amount of noise this generates. Add to this the number of bloggers who believe their insights are unique and wonderful (and yet are absolutely not) and the signal to noise ratio on the internet goes way down. I think many people on slashdot feel let down by this; it has made them more cynical about the masses.
Not everyone has some brilliant insight to share with others - I know I sure don't, which is why I don't run a blog. I think myspace is great if you're a kid, and people should respect this, but I'd love to be able to tell google to ignore things like myspace/livejournal etc when conducting searches (by default rather than something I must do manually).
So perhaps the "elitist" /.ers are going too far in saying something like this shouldn't exist, but really, what is so very wrong about being "elitist"?
Myspace is a fad. It is a fad that may be here for a long time, but it too shall pass. This type of abuse, as well as the abuse by sexual predators and antagonistic peers will eat away at its usefulness until it is outlived and replaced by the new "cool" thing.
Blaming the users for anything should raise a huge red flag that you've got some usability problems.
Bollocks! The fact that flying an F22 is probably fatal for untrained grandmothers does not mean it has "usability problems" - not every task in life is meant to be done by idiots, and the more effort is put into idiot proofing software, the less is put into reliability, functionality, and extensibility for the rest of us. Some things are too hard for a segment of the population to do, and ontologically tagging complex relationships between data entries may simply be beyond the average user. That's not a bug, that's a challenge.
There are too many generalizations like "blaming the user is always wrong" and "security through obscurity is not useful" that are incorrect under many conditions, and /. posters and moderators seem to be doing their best to propagate these. People have finite time, money, intelligence, knowledge, skill and experience. Not everything can be easy enough for everyone to use.
Google does not extract any semantics from content. It merely analyses the linking between websites and connects that with keywords. No semantics here.
I believe you are referring to PageRank, which is one of many algorithms used by google to determine search relevance. This article discusses their use of Latent Semantic Indexing, a somewhat crude but effective form of semantic inference that is widely used in the field of NLP.
The biggest problem with the semantic web is spam. If you can trust the tags, it's a beautiful idea. If you can't, it's worse than useless - it's a waste of time. Google has the right idea, automatic extraction of semantics from content. If there's no real content, then (hopefully) that will be reflected in the semantic analysis.
Me, I estimate we're 5-10 years away from doing anything terribly useful with all of this stuff, but I can definitely envision the day when an internet without semantics seems as distant as an internet without Google.
I wonder if they have any intention of getting these brain boxes drunk then get it to recite the ABC's?
That's quite a funny post but it brings me to an (IMHO) interesting point - given a virtual "brain" capable of performing a certain task, can specifically targeted "damage" to the system result in creativity? Many of the most creative minds in our history got their inspiration in part due to mind-altering chemicals...
While I was an intern at the Jet Propulsion Laboratory, back when I was an undergraduate, I was very gung-ho about biologically inspired computing - I implemented an automatic flowchart positioning system using a genetic algorithm that would "evolve" a correct solution to the problem. While this certainly worked to some extent, the instability and sheer unpredictability of such a stochastic algorithm made it impossible to use in a mission-critical setting. Many biologically inspired algorithms solve problems through methods that cannot be proven correct (unlike, say, the mathematics circuitry in a CPU), but are merely empirically observed to "do a good job."
One of the main drawbacks of human engineering is the need for certainty, which often prohibits the use of many high-efficiency stochastic algorithms (especially for things like mesh communication) in conservative industries, like the US defense industry. This is also a significant problem in other areas, however, and many biologically inspired algorithms have properties that we cannot, so far, completely explain - they are treated like "black boxes" with many unknowns for engineering purposes.
I think that in certain circles, the tremendous success that is evolution on this planet has overshadowed its inherent weaknesses - that it is a greedy, local optimizer which cannot reach a large part of the possible biological search space due to being stuck in local optima, and the added constraint that everything must be constructed out of self-replicating units (these two factors are why something useful, like, say, a Colt 45, will never emerge without the pre-existence of an intelligence). Biological examples are fascinating and often practical, but the biological approach is almost always "brute force" and/or "sub-optimal but still alive."
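The "greedy, local optimizer" point can be shown with a toy example. This is only a sketch of plain hill climbing on a made-up one-dimensional fitness landscape (the `landscape` list and the `hill_climb` name are invented for illustration); real evolution, and real genetic algorithms, add mutation and a population on top of this, which helps but does not remove the problem.

```python
def hill_climb(fitness, start):
    """Greedy local search: step to the better neighbor until none is better."""
    x = start
    while True:
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n < len(fitness)]
        best = max(neighbors, key=lambda n: fitness[n])
        if fitness[best] <= fitness[x]:
            return x  # no neighbor improves on x: a local optimum
        x = best


# Made-up landscape with a small peak at index 3 and a taller one at index 12.
landscape = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 5, 4]

if __name__ == "__main__":
    print(hill_climb(landscape, 0))  # climbs the nearby small peak and stops
    print(hill_climb(landscape, 7))  # only this start reaches the taller peak
```

Starting on the left, the search never crosses the valley at index 6, because every step toward the taller peak first makes things worse.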
I think biologically-inspired algorithms will continue to gain prominence, but in my estimation, it is likely that there will be harsh limits imposed on how far guarantees of performance from empirical tests and symbolic analysis will actually hold.
This is more satire taken seriously by an idiot on the web.
I happen to be a friend of Dan Lurie, and he's only 17, so it's possible that he took this seriously, however he is also very bright, so I can believe that his presentation of it as fact was perhaps a joke to see who he could fool. Apparently that list includes slashdot editors...
I too have made such generalizations in the past about Slashdot, however the truth is that slashdot is quite heterogeneous (basically the problem with the moderation system is that it only functions well in homogeneous environments, hence the improved moderation system which a colleague and I invented). Slashdot has many reasonable people, and many unreasonable... those who just want to RTFA, and those who would not be caught dead doing so. My audience is whoever will listen, and I write for them and them alone. If they disagree or if it is effort wasted, well I have gotten practice that is valuable in my growth as a communicator. If someone gleans something of value from what I write, then that too is useful.
No, the type of algorithm I have developed is pretty much impervious to anyone gaming the system, except over very short runs... this kind of thing can be proven statistically in a very straightforward way. Also it is possible to detect the "short term" abuse that is possible in these systems and reduce that... it's a fairly simple signal detection problem.
Well there's nothing terribly "secret" about it (I suspect it is similar to how amazon recommendations are calculated, but it may have a different mathematical basis), however I can't really publish it until I have real-life statistical data to test it on, as it assumes a very specific generative model that I can only conjecture represents the users of a site well. It is also possible that because the generative model does not translate very well to linear algebraic calculations (like, say, fixed size intermediate feature models), it will not be efficient enough to calculate on scales of hundreds of thousands of users (though I think it can be approximately optimized in polynomial time).
Either way, I really do need real life data to support this from something like Digg/Reddit (not slashdot) - where users are constantly rating things.
No, I understand that these features would be used by only a small subset of the population, but why do we have any niche apps? For one, many of these features are so simple that even h4x0rs can implement them, without explicit support. This low marginal cost may be lower than marginal benefit all by itself, which makes it a rational action (provided one factors in opportunity cost).
One thing you may discount is that the "power users" who use these features are often trendsetters in their small communities and cliques. Personally I've had such a great experience with the DS that I would never hesitate to recommend it to anyone over a PSP - if the PSP had many of these great features that hackers have been adding, I'd recommend that instead.
I did not mean to imply that there is only one network (did I actually say that somewhere?) The problem is that according to the network effect, there is usually a significant, positive marginal utility for growing a network - sometimes even proportional to the current value of the network (leading to an exponential relationship between size and network utility). Of course real life is a little more complicated than that, but in general smaller networks can be so much less useful than larger networks (except when used by mostly isolated cliques) that it prevents the smaller networks from having a real chance to grow, which was the main thrust of my previous post.
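To spell out the "proportional marginal utility leads to exponential growth" step as a toy model only: write U(n) for the utility of a network with n members and k > 0 for an assumed constant of proportionality (both symbols are mine, purely for illustration, and treating n as continuous is a simplification).

```latex
\frac{dU}{dn} = k\,U(n)
\quad\Longrightarrow\quad
U(n) = U(n_0)\,e^{k\,(n - n_0)}
```

Under that assumption, two networks whose sizes differ by m members differ in utility by a factor of e^{km}, so a modest lead in size turns into a large lead in usefulness.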
I do agree that Education and Government, being fairly monolithic, cliquish networks, may be able to achieve more independence from Microsoft than most random home or corporate users can, but that's still nowhere near "killing" Vista.
Yeah, many of these algorithms have unacceptably large classification error - all it takes is to not see a SINGLE REALLY IMPORTANT EMAIL, and that program goes right back to the digital netherworld from which it came.
The "conversation" view provided by gmail (and some others, but not as well IMHO) has really changed how I use email, however. It reduces the marginal cost of sending small, almost Instant Message-like emails, as additional entries to a conversation do not add to the clutter of my inbox, and the entire conversation can be read at a glance (rather than shuffling through the myriad levels of >>>'s in the quoted text, backwards).
Machine classification algorithms are improving all the time (Bayesian filtering's success at weeding out spam is merely the beginning, as a Bayesian filter is one of the simplest and also least reliable classifiers to come out of the field)... I think some day having a computer manage all of this will really just work better than handling it manually - especially if email volume per individual keeps increasing, as humans are easily overloaded, in comparison to many of these algorithms which give better results with more training samples (more emails).
The easiest way to improve SNR on things like slashdot, digg, etc is to apply a meta-filtering technique, perhaps through Yet Another Community Portal, but with much smarter filtering technology. A colleague and I have come up with an algorithm that would eliminate most of these problems, but after talking to Digg for a while about it, they weren't interested. If someone with a reasonable chance of success were to set up yet another community portal, I might be inclined to donate my research to its benefit.
How about we vote? Me: give him death via organ donation.
It's fun to think about these guys being tortured to death for what they do to everyone, but seriously, what you suggest is a far worse punishment than we give to most people convicted of raping children or serial murder, despite our being the only western nation that even has the death penalty. Sometimes there are better ways to solve a problem, and I cannot condone capital punishment for nonviolent crimes (even violent crimes are not considered bad enough for that in most of the western world).
I think this kind of thing is an important reminder to all humans how much we really have to learn about this crazy but wonderful world we live in.
Alright, I should apologize to all of slashdot, I merely wanted to do that once, just to get it out of my system. There, I feel much better now.
I, for one, welcome our new neural-regenerating rodent overlords...
Maybe a non-profit organization of independent web developers could be formed (perhaps already exists?) that could obtain membership on their behalf?
This is Slashdot.
-1 - forgetting your audience.
:)
Never underestimate the power of the Alpha Geek
That's like having an "ethics department of sudan" or "NSA oversight committee".
Actually considering how insecure the US has proven to be, I'd say the NSA Oversight Committee must be working overtime!
*ducks*