What you call a "hash table database" others might call an "indexed cursor".
Others would be wrong ;)
An indexed cursor only contains a reference to the original data. Memcached contains a duplicate of the original data, so I'd argue it was a database in its own right.
However, even if Memcached doesn't meet the criteria of a database, DBM-based databases certainly do. They operate on a similar principle: a unique key points to a specific piece of data. Unlike Memcached, they are persistent, but like Memcached they are very fast and easily scalable.
I was asking for an example of a data storage technique that scales better than RDB.
Well, consider a modern DBM-based database like Tokyo Cabinet. Let's say we want to distribute it evenly across 16 machines, labelled 0 to F. When a request for data comes in, we MD5 the key and use the first 4 bits of the hash to determine the machine to use. This gives us an even and consistent spread of data between machines.
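A minimal Python sketch of that routing step, assuming one machine per hexadecimal digit. `shard_for` is a hypothetical helper name for illustration, not part of Tokyo Cabinet:

```python
import hashlib

def shard_for(key: str) -> str:
    """Return the hex label of the machine that should hold `key`.

    Hypothetical helper: the name and the string-key assumption are
    illustrative only.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # The top 4 bits of the first digest byte give a value from 0 to 15.
    return format(digest[0] >> 4, "X")

if __name__ == "__main__":
    for key in ("user:42", "session:abc", "page:/index"):
        print(key, "->", shard_for(key))
```

The same key always lands on the same machine, and since MD5 output is close to uniform, keys should spread roughly evenly across the 16 labels.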
Relational databases can't easily use the same trick, because table joins are very costly to perform if the table data is distributed across several machines. In a nutshell, the flexibility of relational databases reduces their speed and scalability compared to databases with a more limited scope.
You mean an index? - the kind of thing you would use to efficiently access large amounts of stored data?
No, I was more thinking of a distributed hash table database like Memcached. Hashtable databases are less useful than a full relational database, but as they can be trivially distributed over any number of machines, they make scaling extremely easy.
So if you look at any large website, there will typically be two database layers. The relational database is used as the master, and the more scalable hashtable database is used as a read-only cache.
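A rough Python sketch of that two-layer arrangement, under loud assumptions: an in-memory SQLite database stands in for the relational master, a plain dict stands in for Memcached, and the `users` table and `get_user_name` function are made up for illustration:

```python
import sqlite3

# Stand-ins (assumptions): SQLite plays the relational master,
# a plain dict plays the hashtable cache such as Memcached.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")
cache = {}

def get_user_name(user_id):
    """Read through the cache, falling back to the master on a miss."""
    key = f"user:{user_id}"
    if key in cache:
        # Fast path: the relational database is not touched at all.
        return cache[key]
    row = db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    name = row[0] if row else None
    if name is not None:
        # Populate the cache so later reads skip the master.
        cache[key] = name
    return name
```

Reads that hit the cache never reach the master, which is where the scaling benefit comes from. Writes would still go to the relational database.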
99.9% of databases claim to follow the relational model.
The rest have scalability problems that 99.9% of developers will never see throughout their entire careers.
Uh, actually, relational databases are pretty damn hard to scale. That's basically the main problem with them. Why do you think relational databases are so often paired with a cache made from a hashtable-based database?
Wouldn't base64 work just as well? Take 6 bytes from /dev/random, then base64 them into 8 characters. Assuming a truly random source, all characters are equally likely.
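Something like the following Python sketch would do it. Note the assumption: `os.urandom` reads from the operating system's random source (typically /dev/urandom on Linux rather than /dev/random), and `random_password` is just an illustrative name:

```python
import base64
import os

def random_password() -> str:
    """Return 8 base64 characters built from 6 random bytes."""
    # Assumption: os.urandom stands in for reading /dev/random directly.
    raw = os.urandom(6)  # 6 bytes = 48 bits
    # 48 bits at 6 bits per base64 character is exactly 8 characters,
    # so no '=' padding is produced.
    return base64.b64encode(raw).decode("ascii")

if __name__ == "__main__":
    print(random_password())
```

Because 6 bytes divide evenly into 6-bit groups, every output position is drawn uniformly from the 64-character alphabet.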
The other problem is that every damn thing on the internet now requires a login and password - so much that we start using crap passwords like "asdf" for sites like your phpbb forum login, which happens to be the same as the other 50 forums you have accounts on or ever needed to register for to ask a one-off question.
Or you have one master password, and hash that together with the domain to give you a site-specific password.
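A hedged Python sketch of the idea. It uses PBKDF2 with the domain as the salt rather than a single bare hash, which is my substitution to make guessing the master password slower. The function name, iteration count, and output length are all arbitrary choices for illustration:

```python
import base64
import hashlib

def site_password(master: str, domain: str, length: int = 16) -> str:
    """Derive a site-specific password from a master password and a domain."""
    # Assumption: PBKDF2-HMAC-SHA256 with the domain as salt; the
    # iteration count of 100,000 is an arbitrary illustrative value.
    raw = hashlib.pbkdf2_hmac(
        "sha256",
        master.encode("utf-8"),
        domain.encode("utf-8"),
        100_000,
    )
    # 32 raw bytes become 44 base64 characters; keep the first `length`.
    return base64.b64encode(raw).decode("ascii")[:length]

if __name__ == "__main__":
    print(site_password("correct horse", "example.com"))
    print(site_password("correct horse", "example.org"))
```

The same master password and domain always give the same result, so nothing needs to be stored, and a leak on one site does not directly reveal the password used on another.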
Unfortunately (or maybe not) the truth is 20 years later, to write multi-platform products, the best option is still C/C++...
Python seems to do pretty well. It's less performant than C/C++, but it's suitable for a large proportion of desktop applications.
Re:I am afraid, there is lack of direction for Rub
on
Ruby 1.9.1 Released
·
· Score: 1
C/Java/Whatever doesn't require the newlines
It's still more verbose, and you can't chain for-loops. You're also limited to built-in looping structures; there's no support for folds or the like.
...now the only real difference is the fact you are declaring what types are in the list
Types are useful, so long as you have a decent type system. Unfortunately, Java has a very poor type system. I wouldn't necessarily say it was the worst type system ever devised, but it's certainly in the top 10.
I mean I got your point, Ruby is good in implementing containing data structures because of the blocks, but do you really think a project gets developed faster because a language has blocks and closures?
Certainly. When I program in Ruby, I use blocks regularly. Assuming that I am not a closure fanatic, then presumably I have a reason for using them. If they didn't reduce development time, I'd do without.
Especially in ruby, where you have to use blocks which are quite unreadable for everything
Personally, I find blocks make code more readable:
children = people.select { |p| p.age < 18 }
Compared to:
List<Person> children = new ArrayList<Person>();
for (Person p : people) {
    if (p.age < 18)
        children.add(p);
}
Blocks are also more concise. It's my opinion that the more readable code you can fit in your editor window at any one time, the better idea you can get of the whole program. A function that is more than four lines of code is a big function in my book.
I have never in my life seen a problem made simpler by changing languages rather than downloading a library.
Presumably you're not familiar with a great range of languages, then.
Say you want to make a bag in Java:
<T> Map<T, Integer> bag(Collection<T> coll) {
    Map<T, Integer> m = new HashMap<T, Integer>();
    for (T key : coll) {
        if (m.containsKey(key))
            m.put(key, m.get(key) + 1);
        else
            m.put(key, 1);
    }
    return m;
}
Whilst in Ruby:
def bag(coll)
  Hash.new(0).tap { |m| coll.each { |k| m[k] += 1 } }
end
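For what it's worth, a Python version is about as short, assuming the standard library's `collections.Counter` is acceptable as the bag type (the function name `bag` just mirrors the examples above):

```python
from collections import Counter

def bag(coll):
    """Count how many times each item occurs in `coll`."""
    # Counter is a dict subclass mapping each item to its count.
    return Counter(coll)

if __name__ == "__main__":
    print(bag(["a", "b", "a", "c", "a"]))
```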
But short examples like this can't really capture how much languages differ when used in large projects. Different languages open up entirely different ways to abstract a problem. In Ruby you can write methods that generate methods for other classes. In Clojure and other Lisps, there are macros which rewrite the syntax tree at compile time.
These sorts of features create abstractions that simply don't exist in other languages. There are whole classes of solutions that are simply impractical to use in some types of languages. I've written a few thousand lines of Clojure code, which is quite a lot for the language, and very little of what I've written could be easily translated back to Java.
The question is whether the beams can supply a black hole with enough mass that it passes the turning point and is able to grow further from the mass absorbed by falling through Earth's crust.
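Atoms are about 1e-10 m apart, and the Schwarzschild radius is 1.48e-27 m/kg. A black hole with a radius equal to that spacing would need about 1e-10 / 1.48e-27 ≈ 6.8e16 kg of mass. So unless the LHC boffins plan to accelerate tens of trillions of tonnes of matter through the collider, the answer is no.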
Waterfall can work fairly well in a large software development environment with strictly defined and relatively lengthy release cycles. We used it when I worked at the Unisys Airline Development/Support Center for the large USAS products sold to airlines (development cycle roughly 12-18 months between versions), and it worked just fine.
Perhaps I was too harsh :)
Obviously I can only speak from my own experience, but all the waterfall-based projects I've worked on would have benefited from a more fluid approach to software development. Perhaps there are projects that work well with waterfall, but personally I've never seen the waterfall methodology do anything but hinder, even for long-term projects.
The waterfall method is still the best development model. You have to analyze, then plan, then code, then test, then maintain. The steps need to be in order and you can't skip any of them. Unfortunately waterfall doesn't fit into the real world of software development because you can't freeze your requirements for so long a time.
The waterfall model is a nice model that is unfortunately entirely disconnected from reality. Even if your requirements are fixed, the waterfall model is too rigid to be of any use. To effectively plan a software project, you need to be aware of all the pitfalls that you will only discover in the coding phase.
Software development should be strategic. Construct a plan with the best information you have at the time, but be aware that plans rarely survive battle intact. You need to continually adapt your plan as new information is obtained. You need to scout out difficult terrain, and be ready to retreat when the tide turns against you. Try out unusual tactics and unconventional weaponry: this may result in failure, but when it succeeds the results can rewrite the rulebook.
Probably the biggest problem I see with open source is the lack of critical review. Without this someone that turns out garbage code will continue to do so forever.
Isn't this more a problem with closed source software? Quality of code seems more important for an open source project that wishes to attract volunteers than a closed source project where the developers are paid.
Except the browser is an excellent application to hack, even if sandboxed, because it has network access and is used for nearly everything these days, including online banking. If you want to be safer you'll have to use separate sandboxed browsers for finance vs email vs... vs random browsing.
Isn't Chrome meant to do this? Each tab in Chrome is an individual sandboxed process.
I think your example is very flawed. The compiler can't easily make the kind of analysis to drastically reorder or regroup this computation.
Let's simplify your example for a minute by dropping the roots bit.
Yes, I agree that dropping the root calculations results in an expression that is not easily parallelisable unless you know that addition is commutative. However, if you have an expression like:
((sqrt 1) + ((sqrt 2) + (sqrt 3)))
Then the order in which you evaluate the sqrts obviously doesn't matter, and it wouldn't be hard for the compiler to automatically create three threads, one for each sqrt calculation.
That said, I didn't mean to imply that functional evaluation was a panacea for parallel programming, merely that it can help.
That's my understanding of why most current research is on adding primitives that let the programmer specify parallel algorithms at the source level, such as Parallel Haskell [hw.ac.uk]'s par and seq primitives.
You're probably right on that score. Though I'd contend that even this being the case, par and pseq are easier to adapt to existing functional algorithms than "new Thread" or "fork" is to imperative algorithms.
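import Control.Parallel (par, pseq)

-- Sequential version.
sumRoots [] = 0
sumRoots (x:xs) = y + ys
  where y  = sqrt x
        ys = sumRoots xs

-- Parallel version: spark y while the rest of the list is summed.
psumRoots [] = 0
psumRoots (x:xs) = par y $ pseq ys $ y + ys
  where y  = sqrt x
        ys = psumRoots xs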
This seems to be a much more useful and practical way to take advantage of parallel hardware - keep the familiarity of the languages you know and then pick from a smorgasbord of parallel execution libraries as fits your needs.
The problem with this approach is that you wind up having to coordinate your threads and locking manually (such as in pthreads), or you wind up programming in a way that's only a stone's throw away from functional programming anyway (such as MapReduce).
First, in a functional language the runtime has to make a decision that other languages specify - order of evaluation. If order doesn't matter, a sequential language can start multiple threads too.
Yes, you can manually put in threads in an imperative language, but a sufficiently smart compiler can do this for you in a functional language.
For instance, let's say you want to find the sum of the square roots of a sequence. In an imperative language, you'd write something like this:
from math import sqrt

def sum_roots(xs):
    y = 0
    for x in xs:
        y += sqrt(x)
    return y
Evaluation order is fixed by the language; the sqrt of the first item in the list is calculated first, then the second item in the list, and so on. To make this algorithm concurrent we'd have to manually go in and alter the code to support this.
In a functional language, by contrast, the evaluation order is not defined. We could evaluate the last sqrt calculation first, or assign the first 5 sqrt calculations to one thread, the next 5 to another thread, and so on. Which sqrt is calculated first is entirely up to the compiler.
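To make that partitioning concrete, here is a hand-written Python sketch of the split described above, five items per worker. It shows what a compiler would be free to do when order is unspecified, not something Python does for you. `parallel_sum_roots` and `chunk_sum` are illustrative names, and in CPython the global interpreter lock means threads won't actually speed up this CPU-bound work:

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt

def chunk_sum(chunk):
    """Sum the square roots of one chunk; independent of all other chunks."""
    return sum(sqrt(x) for x in chunk)

def parallel_sum_roots(xs, chunk_size=5):
    """Split xs into chunks and sum each chunk's roots on a worker thread."""
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    with ThreadPoolExecutor() as pool:
        # The chunks may be evaluated in any order; only the final
        # addition needs all of the partial results.
        return sum(pool.map(chunk_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_roots(list(range(1, 21))))
```

Nothing in `chunk_sum` depends on any other chunk, which is exactly the property that makes the reordering safe.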
Second, you can have immutable data in conventional languages too. Just don't modify variables.
And how many conventional languages have a standard library that is optimized for immutable data structures? For instance, if I have an immutable HashTable in Java, and I want to add a new value to it, I have to create a copy of my existing HashTable. This is obviously very inefficient.
Conversely, functional languages have data structures that are designed to be worked with immutably. New data structures can be created from old ones, not by copying their data wholesale, but by referencing their data in a way that's similar to how version control systems work. You don't need to maintain a copy of every past version; you just need to keep a list of changes.
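A small Python sketch of that sharing, using the simplest possible case: an immutable singly linked list. The `Node`, `prepend`, and `to_list` names are made up for illustration, and real functional languages use more sophisticated structures (trees, tries) built on the same principle:

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    """One immutable cell of a singly linked list."""
    head: object
    tail: Optional["Node"]

def prepend(value, lst):
    """Return a new list with `value` in front; `lst` itself is untouched."""
    # The new cell points at the old list instead of copying it.
    return Node(value, lst)

def to_list(lst):
    """Flatten the linked list into a Python list, for inspection only."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

old = prepend(2, prepend(3, None))
new = prepend(1, old)
```

Both `old` and `new` remain valid afterwards, and `new.tail` is the very same object as `old`, so "adding" an element costs one small allocation rather than a full copy.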
Obviously you could get some of the benefit of this simply by using a conventional language with some third-party library that gives it efficient immutable data structures. But once you do this, you're basically halfway to functional languages anyway, and you lose all of the syntax sugar that dedicated functional languages have for working with immutable data.
My response back then was to get excited about FP. My response now is: Where is the proof?
Whether functional programming is the best paradigm to use for parallel computing is undecided. But it does have a couple of advantages over imperative programming.
First, imperative programming specifies the order of evaluation, whilst functional programming does not. In Haskell, for instance, an expression can essentially be evaluated in any order. In Java, evaluation is strictly sequential; you have to evaluate line 1 before line 2.
Second, imperative languages like Java favour mutable data, whilst functional languages like Haskell favour immutable data structures. Mutability is the bane of parallel programming, because you have to have all sorts of locks and constraints to keep your data consistent between threads. Programming languages that do not allow mutable data don't have this problem.
No matter how fine the granularity of the responses of the AI becomes, it's still just a collection of little functions that passed the point of "photorealism" from a conversational perspective. That doesn't mean it's self aware.
I could substitute "cells" for "functions", and claim the same thing about you. How can a machine built out of hydrocarbons and water ever be conscious?
Do you have a better idea?
A better idea for what? Different problems require different solutions. Relational databases are useful tools to solve a wide range of problems, but they're not particularly easy to scale. That's one of their main weaknesses.
There are over 100 git commands, and a command can do radically different things depending on the switches and target syntax. It's more confusing than any other revision control system that I have worked with.
I find it helps to have a good understanding of the foundations of Git. I'm not sure if you've read into blobs and trees, but I'd highly recommend doing so. There are a lot of tools for Git, but they're all built up from basic building blocks.
If you haven't already, try taking a look through Git from the Bottom Up.
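As a taste of how simple those building blocks are, here is a Python sketch of how a blob's ID is computed, assuming the default SHA-1 object format. `git_blob_id` is an illustrative name, not a Git API:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content."""
    # Assumption: SHA-1 object format. A blob is hashed as the header
    # "blob <size in bytes>", a NUL byte, then the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

if __name__ == "__main__":
    print(git_blob_id(b"hello world\n"))
```

This should match what `git hash-object` reports for the same bytes. Trees and commits are hashed the same way with a different header, which is why so much of Git can be layered on top.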
A Classic example of the west's hypocrisy
You may not be aware of this, but the western world is made up of many individuals with differing opinions.
In the UK and to a lesser extent here in Australia a "git" is akin to a moron.
Actually, git is more akin to "bastard" or "son of a bitch". You can say to someone "he was a clever old git" without it being considered an oxymoron.
Incidentally, Linus claims he named Git after himself.
I don't think so. The reason that you don't have side effects is precisely because everything is copied to message channels.
What the hell are "message channels", and why would you need them when your data is immutable?
Maybe because there isn't a better open source alternative.