Thank you. I wrote CBofN as a labor of love with little expectation that anyone else would ever read it. Comments like yours are the best possible reward.
Just for the record: I agree. That's why I often pronounce it as "Flakenstein" à la Mel Brooks's Young Frankenstein. You gotta have fun with a name like mine.
A surprisingly large number of responders have asked why anyone would need 64GB, and have speculated that we are either lazy, stupid, or have money to burn.
While I can't reveal too much, I'll try to give you an approximate idea of the sort of problems that we are working on.
Basically, we are doing two things: graph-theoretic algorithms on graphs with hundreds of millions of vertices and billions of edges, and robust classification systems trained on training sets of a similar size.
For our purposes, we need to access these data structures in a deterministic order that cannot be predicted in advance. We could cache all of our data structures on disk, but the algorithms would require years to complete because of disk seek time. Instead, we are going to try to keep a compressed version of the data in memory. The difference is that the in-memory approach will take minutes as opposed to years.
And if you are thinking that we need a new algorithm, trust me on this: we are using the faster algorithm. Most of the other candidates run in exponential time.
If you want a real clue as to what we're doing, then read this. If you like what you've read and think you can help, then contact me. I am hiring.
Michael Mozer of U. Colorado CS department wired his entire house with sensors and controls connected to neural networks and other machine learning systems. He did this at least five years ago, so the idea is hardly new. The house has a web page with an overview and another link that shows the status of the house.
One great story about the house has it that Mozer's students would call his house whenever the toilet sensor showed that he was sitting on the can for more than a minute.
I believe that particular sensor was later disabled.
As far as I can tell, CT's post and the article have a number of things wrong. I've known some of the people involved with DjVu for a couple of years, so let me list a couple of facts in no particular order:
DjVu was originally developed at AT&T by a group that has traditionally worked in machine learning. LizardTech purchased the technology from AT&T.
This format is specialized for scanned documents.
The technology is very different from just about everything else because it separates background and foreground planes. The background is compressed with wavelets, and the foreground probably uses a form of clustering on character shapes (in a typeface- and language-independent manner). As a result of the latter, you get a form of OCR almost for free. You can also do text search.
Everything can be viewed at 300dpi directly in your browser and in real time (you normally view only at 100dpi, but you can zoom in).
The Linux viewer plugin and compressor have been available for years.
The main attraction of DjVu is that your scanned documents are tiny (typically less than 50KB), which makes it feasible to put them on the web. Just about every other format results in files too big for easy distribution on the web. Interestingly, you can convert a *.ps.gz file into a DjVu file and see a dramatic improvement in file size while preserving almost all of the detail. I am not talking about simple pages here, but very complex ones with a mixture of real images, artwork, and text.
Apologies for any mistakes, but I think that I got most of it right.
This is obviously a shameless plug, but look at my book, The Computational Beauty of Nature. There are a few dozen projects in there suitable for high school students. It's been used as the basis for a few college courses, but I think bright high school students can get something out of it as well.
Basically, the book is about computation, fractals, chaos, complex systems, adaptation, and how all of these things relate to one another.
Check out the book's website for more information.
Of possible interest to some readers, my latest Brain Candy column at Fatbrain.com is about Einstein's special theory of relativity and how it implies that any faster-than-light communication would allow messages to be sent backwards in time.
Of course, this yields all sorts of ugly paradoxes, which is why most physicists consider FTL communication unrealizable.
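The core of the argument can be written down with the Lorentz transformation (this is the standard textbook derivation, not an excerpt from the column). Let a signal leave event A and reach event B, separated by Δt > 0 and Δx in our frame, with speed u = Δx/Δt > c. An observer moving at speed v along x measures:

```latex
\Delta t' = \gamma\left(\Delta t - \frac{v\,\Delta x}{c^2}\right)
          = \gamma\,\Delta t\left(1 - \frac{u\,v}{c^2}\right),
\qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}
```

Because u > c, we have c²/u < c, so any observer with c²/u < v < c finds 1 − uv/c² < 0 and therefore Δt' < 0: in that frame the signal arrives before it is sent. Relaying a reply with a second superluminal signal from a suitably moving frame can then bring it back before the original message left, which is where the paradoxes come from.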
First, a disclaimer: I am on the Fatbrain payroll, not as a regular employee but as the resident science columnist. As such, everything in this message is purely my own opinion and in no way reflects the opinion(s) of Fatbrain or its management.
Now, regarding user comments and such, as a science author I find Fatbrain's website to be vastly superior to Amazon's. For example, I submitted information about my book's website in August of 1998 to both Fatbrain and Amazon. It appeared on Amazon's page for my book in April of 1999, a mere 8-month delay. On the other hand, Fatbrain built a rather elaborate page for my book before the book even hit the stores. (To be clear, I only started writing for Fatbrain about two months ago, so this has nothing to do with them giving me special treatment; they didn't even know me at the time.)
That said, if you are looking for a Stephen King book, then definitely go to Amazon. But if you are looking for scientific and technical books, it makes sense (at least to me) to shop at a place that specializes in geek books, seeks out authors to improve its website (with original content, author recommendations, etc.), and puts a great deal of effort into building an informative website with user reviews, excerpts, and author information, not just for Oprah's book club authors, but for the science and technical authors that Amazon.com regularly ignores.
Just to set the record straight, the poster's assertion that the article is a Northern Light plug is completely baseless. The authors (Lawrence and Giles) work at the NEC Research Institute (where I work), which has no connection to Northern Light. In fact, they did an earlier and less comprehensive study a year ago that showed HotBot and AltaVista had the greatest coverage at that time.
The average human brain has about 10^10 neurons, each of which is connected to an average of 10^4 other neurons. That gives about 10^14 synapses, so at a minimum (one bit per synapse) the brain holds 10^14 bits. Dividing this last number by (8 * 1024^4) translates it into about 11.37 TBytes.
This is, of course, a ridiculously low lower bound.
Synaptic connections are far more likely to come in many shades of gray. Let's say that each synapse has about one thousand possible strength values. Since log2(1000) ≈ 10, that takes about 10 bits per synapse, so you can multiply this lower bound by a factor of 10.
Next, if you include the specifics of the wiring, you can factor in the combinatorial nature of how the wiring could take place. Naming each synapse's destination neuron in binary takes log(10^10)/log(2) ≈ 33 bits, which buys you roughly another factor of 30 over the one-bit baseline.
But with spiking neurons, who knows? So all bets are off.
IBM's move to fund more basic research is certainly good news for everyone, but as a working scientist I am happy that this is not big news in the mainstream press. Here's why:
The public's expectations for science have often been way out of proportion to reality. Think about neural networks, AI, chaos and complexity theory, etc. While all of these topics have introduced either new insights or new technology, none of them has supplied, or will supply, any sort of holy grail.
I never found the exact funding level for the DCI, but if the quoted level of $29M is correct, then this really isn't that big a deal. $29M will get you a research lab with between 50 and 200 research scientists. If the scientists have a working budget, then count on 50, but if they get money from outside sources, then count on 200. This is probably a mere fraction of what IBM pays for advertising.
``Deep Computing'' is advertisement-speak, not science-speak.
In short, I'd rather see science over-deliver and under-promise than over-promise and under-deliver.
Like complexity, emergence is one of those topics that no one can agree on how to define. If you ask a holist, s/he will tell you that an emergent phenomenon has top-level behavior that cannot be predicted from the bottom-level description. However, this definition seems to discount simulation as a way to predict that unusual top-level behavior can arise.
If you ask a reductionist, s/he will tell you that nothing is emergent. In fact, Marvin Minsky's Society of Mind is often cited as a model that explains how intelligence can emerge from dumb lower-level building blocks. However, Minsky is a firm reductionist and claims that nothing is in fact emergent. (Ask him yourself.)
In any event, this article would seem to imply that emergence is a recent discovery that has placed us within an epsilon of building an AI. Like most mainstream articles on science, it's very much out of date. People have been actively talking about emergence for decades. And while we have gained a great deal of insight over that time, we still do not understand how the visual cortex works (which is the best-understood portion of the human brain) let alone more complicated things such as language and planning.
But none of this means that building an AI breaks any of the rules. Evolution took a few billion years to go from a single cell to multicellular beasts, and another billion years to produce a beast capable of talking about itself. Computer Science has only been around for a few decades, so give it a bit more time.
(All of the above was written by Gary Flake's personal agent;-)
Here is my litmus test: go to an advanced OS class, look at what is studied, and note which piece of the Linux distribution is where those things are realized. It's not in the apps or the libraries. It's the kernel.
So I claim that the Linux OS *is* the kernel.
Regarding what was said about the apps (GCC, etc.): you *can* plug in many different alternatives and still have a working system. I am not saying that it is easy or painless, just that it is possible.
Regarding the POSIX interface, great, wonderful,..., we are in full agreement.
But I think the really important point is this: you cannot force people, through wordplay or force of personality, to name something after something else that you hold dear. It's like trying to force people to like you. It doesn't work that way. Language and names are defined by common usage (despite what prescriptive linguists say), and bullies have been notoriously unsuccessful in changing this basic fact.
There are at least two other annoying facets to this situation that I haven't seen raised. First, the GNU tools have been ported to and packaged for many other operating systems (e.g., Solaris, BSD, Ultrix, HP-UX, and, yes, even Windows and arguably DOS). Does RMS insist that we call these by a mangled GNU/XXX name? Of course not. The GNU tools are only tools, after all, and giving them such an unnatural level of prominence would be obviously silly.
So, the cult of GNU would say ``See, the GNU tools are an essential and defining piece of Linux, ergo it is GNU/Linux!''
Not true. If you take your favorite Linux distribution, there is no reason why you couldn't replace the entire suite of tools with alternatives (and without ever rebooting the machine). If you did this, what would you have? Why, an altered Linux system, of course, but still a Linux system.
Now take your favorite Linux distribution and substitute in a new kernel. What do you have? Well, err, a new operating system. My point is that one piece (the kernel) actually defines the properties of the entire operating system while all other pieces (a superset of the GNU tools) can be replaced without changing the underlying structure.
As to the second point that I promised, I am disturbed not by the cult of GNU's desire for credit (they deserve credit, and a lot of it) but by RMS's insistence that the credit be noted by changing how we speak. Can you say ``double plus good?!?''
You know, I am no Katz fan (read my comments from yesterday), but I find your post infuriating. I thought that Contact was superficial, annoyingly new agey, and could only pretend to any real depth at all. But you liked it. And that's fine.
What irks me is your audacity in speaking for "enlightened people" and equating them with "those who go to college." Now, I've been to college. I even got one of those Ph.D. thingies. But having a college education is completely orthogonal to being "enlightened". And generalizing things to the point where you start equating "likes Contact" with "is smart" is pretty naive.
-- GWF
Me love you long time.
I can only assume that it is on here because it has Microsoft in the title and gives the editors a chance to whip out the (frankly stupid) Borg icon.
And here I thought that this morning's edition of slashdot had been personalized just for me!
ROFL. Thanks, you made my week. I've always wanted to be the Darth Vader of the Internets, if just for a day.
I beg to differ: Bill Flake is my father.
I am posting this from NS6PR3. I had a similar problem with core dumps (mentioned in other posts). For me, the fix was to download and install the recommended distribution and not the custom or full. Both custom and full seem to have problems with dynamically linked libraries.
FYI: I am on a Mandrake 7.1 system.
In terms of allowing obvious patents, this one is about the most absurd I've found yet.
I disagree with you that writing a GC has to be difficult. I've written several, and all of them have been a page or less of code. Here's the GC from my minimal Lisp interpreter, which I call Stutter:
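void garbage_collect(void)
{
    CELL *cell;
    int i, count = 0;   /* count tallies reclaimed cells (not used further in this excerpt) */

    /* Mark phase: everything reachable from the global bindings and from
       the table of explicitly protected cells survives. */
    mark(binding_list);
    for (i = 0; i < protect_used; i++)
        mark(protect_table[i]);

    /* Sweep phase: thread every unmarked cell onto the free list through
       its car field, and clear each mark bit for the next collection. */
    for (cell = heap, i = 0; i < heap_size; cell++, i++) {
        if (!cell_mark(cell)) {
            cell_car(cell) = free_list;
            free_list = cell;
            count++;
        }
        cell_mark(cell) = 0;
    }
}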
Surely, this is not rocket science.
Check out this column, which intuitively explains why FTL communication in any form violates causality.
The real question is:
Why does this guy have to write a check at the end of a date?
Hmmm?
This reminds me of the mid-70s, when many people owned CB radios. You could occasionally hear someone screaming at the top of his/her lungs about something very important.
Usually it was done in a song, though.