For example, is there a similar service for LINUX machines? I could certainly use a lot of extra machines to run simulations, but I mostly use LINUX. Having to switch everything over to Solaris makes this service much less useful.
I agree; trading space is the way to go. You might also like the Distributed Internet Backup System (DIBS) project. It provides a way to trade space with strangers and protects your data with cryptography, and error correction coding.
I agree. For people who want a free, open source solution which can make distributed off-site backups as large as you like with built-in encryption and error correction I recommend the Distributed Internet Backup System (DIBS).
Check out the Connexions Project
on
MIT Everyware
·
· Score: 2, Interesting
A similar project with a more open source flavor is the Connexions Project. The feature I admire most about Connexions is the idea of creating open content which people can combine in different ways to make a textbook tailored to their course or their own interests.
The following is the description of the Connexions project on the main web page:
The Connexions Project is a collaborative, community-driven approach to authoring, teaching, and learning that seeks to provide a cohesive body of high-quality educational content to anyone in the world, for free. The project involves two basic, interrelated components: (1) a Content Commons of collaboratively developed, freely-available material that can be modified for any purpose, and (2) open-source software tools to help students, instructors and authors manage the information assets in the Content Commons. Connexions provides an open, standards-based approach for sharing and advancing knowledge to benefit the global educational community.
I'm disappointed to see all the negative reactions to this idea. I think trying to provide an incentive to many different experts to predict events of national importance is a great way to leverage the best minds in the country.
I can understand that some people are uncomfortable with the idea of letting investors profit from the death and misery of others. One way to alleviate such concerns would be to stipulate that any profits from the proposed futures market could only be donated to charities. This way, an incentive still exists for an investor to accurately weight his beliefs for different events, but succesful investors will only benefit society and not profit from the pain of others.
This is exactly the plan I have in mind for DIBS. Currently, encryption is done via GPG. "Striping" is in the works. The plan is to use Reed-Solomon (RS) error correcting codes to generate the parity. The RS algorithm is currently written (http://www.csua.berkeley.edu/~emin/source_code/py _ecc) and will be integrated into DIBS sometime in the next few releases.
Bacula looks like a nice project but designed for a different purpose. After looking through the manual, I didn't see any discussion of peer-to-peer capabilities. That is, if we both install Bacula, how do I set up my Bacula client to backup my data to your machine given that I don't trust you all that much?
People have pointed out some other issues which I'd like to respond to:
* Why not just use some rsync/gpg/perl scripts to backup files from one machine onto another?
-- What if you only have one machine? Using DIBS, you can trade (automatically encrypted) files with peers without having a user account on anyone else's machine or owning multiple machines.
* What if you are behind a firewall?
-- As stated in the manual, DIBS has a number of different communication modes to allow things to work if one peer is on the Internet and another is behind a firewall. There is even a way to make things work if both peers are behind firewalls, but this is kind of kludgy at the moment.
* Why not use Gnutella/Napster/Kazaa?
-- While these file-sharing networks are terrific, they are not back-up networks. Who is going to want to store the pictures from your summer vacation? Probably nobody if you are talking about Gnuetall/Napster/etc. On the other hand, you can store your stuff on my machine if you let me store stuff on your machine.
* What if your peers have varying connectivity to the network?
-- The idea is that DIBS will store your data on multiple peers so if some of the are unavailable, then you get your data from somewhere else. Also, the plan for later versions of DIBS is to have your client periodically probe peers to measure how responsive they are. Once you have this information, your client will automatically prefer trading with more reliable peers.
Once again, I don't mean to imply that there are absolutely no issues with distributed backup and I welcome more comments on potential problems. However, I think these problems can be solved and that is why I'm working on DIBS.
Thanks again for all the feedback, -Emin Martinian
A lot of people have pointed out issues related to security, bandwidth, efficiency, etc. My vision is that DIBS will be designed to take things into account.
For example, DIBS uses GPG to encrypt and sign all communications so that peers can't read the data they are storing for you and so that other people can't pretend to be you and store their files with your peers.
Also, my vision is to include state-of-the-art erasure correction codes so DIBS uses redundancy efficiently. (Erasure correction codes are a generlaization of parity checks used by RAID). In fact, I have already written a python implementation of Reed-Solomon codes available at www.csua.berkeley.edu/~emin/source_code/py_ecc. I haven't had time to put this into DIBS yet since I'm currently working on my PhD at MIT and that keeps me pretty busy.
Incremental backup is another feature I'm planning to add. There are some issues with how incremental backup interacts with encryption and erasure correction. I think resolving these issues may take a little more thought so I might have to wait until I graduate, become a professor and get some grad students of my own to help me.
A Slashdot post isn't the place to go into all the arguments for or against DIBS. However, I think distributed backup is a viable idea. While there are some serious issues, I believe that through clever engineering, we can solve them and create a cheap, simple, efficient, and secure backup system usable by anyone with a network connection.
I decided to start writing a distributed backup prototype like DIBS in order to find out what the major issues are and how to address them. Sure, currently DIBS has some flaws, but it is a prototype written by a grad student. With more feedback from the community and some more development effort I believe DIBS can become a valuable tool. If you agree, I invite you to join the development effort, or try it out and tell me how you think it could be improved, or even take whatever parts you find useful and make something better. The project page is at sourceforge.
As the headline of the story says, check out Audio Spotlight from MIT. I was lucky enough to see Joseph Pompei (the inventor of Audio Spotlight) give a demonstration and it was amazing. The technology works as promised: it produces a directed beam of sound which can make noises come from anywhere in the room you want. Furthermore, Mr. Pompei struck me as an exceptionally competent researcher. He had looked at a lot of issues like what kind of frequency response you can get (bass is harder to get than treble), whether the ultrasound causes long term damage (not according to a Harvard study), and how to manufacture (short answer: lots of DSP chips).
I don't new about the guy Newsweek talks about, but the technology is real and I'm looking forward to hearing it.
The article mentions distributed backup as a possible application, but in my mind distributed backup is the killer application.
Consider a distributed backup program which works roughly as follows.
You install the program and give it a certain amount of space on your hard drive.
You tell the agent which files or directories you want backed up (e.g./home).
The distributed backup program periodically contacts other computers and swaps encrypted versions of your data for their data.
If your machine crashes or you lose data or your city gets nuked, you can easily recover your data from the computers you shared with.
This type of application would provide at least 3 important benefits for backup. First, its relatively cheap. If you want to backup more data, just buy more local disk space and trade files with more computers. This seems much easier (at least for a home user) than setting up a tape backup system, making sure the tapes get replaced, making sure the tapes get put someplace safe, etc. Second, its much safer than pretty much any backup system you could buy today commericially since your data is literally spread all over the world. Finally, the backup system isn't controlled by any large corporation.
Obviously there are still some details left to be worked out such as how to let computers who want to trade files find each other (both centralized and distributed options exist analagous to napster and gnutella), how to prevent cheating (having your computer periodically ask its partners for hashes of the data they are backing up should work), how to control redundancy most efficiently (error correcting codes like Reed-Solomon codes or Tornado codes would probably be smarter than just repeating data).
If you're looking for a great distributed open source project that will make the world a better place, I encourage you to develop prototypes for distributed backup. I plan to develop my own prototype one day, but currently I'm pretty busy with graduate school.
Just like everyone else, I'm really excited about being able to get a PS2 and run Linux on it. Is there a mail/email address for the appropriate place in Sony that I can write to? I'd like to tell them that as soon as they release this in the US I'll be at the store waiting in line to buy it.
Since we all spend a lot of effort writing letters to companies who do things we DON'T like, it would be nice to write letters and express support (both moral and financial) for companies that do things we DO like.
By the way, many schools seem interested in buying computers for their students/teachers. With Linux running on a PS2, these schools could now have a cheap hardware platform which can run office apps, graphics, math programs, and play DVD's and video games! What do you think kids would be more excited about, getting a free desktop running Windows or a free PS2 running Linux?
You raise a good point, but leave out several important issues.
First, while there are lots of books on almost any subject you can think of, most of the books aren't necessarily that great. Even in the good books, a lot of ancillary information is included. A good set of lecture notes should summarize the most important points and provide you with the essentials. If you have these you can then learn the details from the book. Having good lecture notes to start with is quite useful.
Second, many of the valuable lessons learned in engineering are learned by working on projects. Of course, anyone can make up there own "projects" to work on for educational benefit. However, a relevant project which can be completed in a reasonable amount of time while illustrating important practical and theoretical issues which has been debugged and comes with the necessary background material (code, libraries, etc.) is very valuable.
Finally, although you could go to a library to check out books and learn the necessary material. It is much easier and more conveniant to be able to do it in the comfort of your home/office via the Internet.
As you say, scalability really is the main issue. A recent technical paper called "The Capcity of Wireless Networks" argues that the per user capacity of a wireless net with n nodes is proportional to 1/sqrt(n). This means that if you quadruple the number of nodes you halve the capacity each user gets. This would imply that eventually adding more nodes decreases everyone's capacity!
The math behind the 1/sqrt(n) argument in the paper is a little involved. However, the 1/sqrt(n) essentially occurs because as you add more nodes, each node has to spend more and more time relaying other people's packets.
To see why this occurs, consider a WIRED network where every node is on a square k by k grid where the number of users is n = k*k. Let each node be connected by a wire only to it's four nearest neighbors. Assume that each wire can carry 1 packet/sec. To route packets to a far away desitination you need the intermediate nodes to relay for you.
If each node (i,j) wants to transmit a packet to another randomly chosen node (i',j'), then each packet has to go thourgh an average of k hops. This is because on average |i-i'| + |j-j'| = k. Therefore on average each packet goes through k hops. Since there are only n wires, on average you can only have n/k packets/second of throughput.
To get the throughput per user you divide the total throughput by the number of users to get capacity per user = n/k/n = 1/k = 1/sqrt(n) and there you have it. As you add more users the per user capacity goes down.
You might think that in a wireless network you can avoid the hops and have node (i,j) transmit directly to node (i',j'). The problem is that this would cause so much interference to all the other nodes that it makes your capacity even worse.
Although I don't know much about the design decisions involved in implementing passport, I don't see why they don't use a zero-knowledge protocol (ZKP). Basically a ZKP is a way for Alice to prove to Bob that a certain claim, C, is true. Furthermore, under certain assumptions (e.g. factoring is hard, graph-isomorphism is hard, etc.) you can prove that Bob doesn't learn anything beyond the fact that C is true.
How would this be used for authentication? I generate an instance of a hard problem, P, along with a claim, C, which I only I can prove. I publish (P,C) as a type of public key.
If I want to prove to slashdot or Hotmail that I am me, I use a ZKP to prove C thus authenticating myself. Since I used a ZKP, even though slashdot now knows C is true, slashdot doesn't know how to prove C itself. So slashdot can't pretend to be me when talking to Hotmail (unless slashdot can factor or solve my chosen hard problem).
Some benefits of using a ZKP include:
I only need to log into my computer with
a single passphrase and then my computer
can use a ZKP to convince all the other
web sites of my identity.
The system is provably secure under certain
assumptions.
No central authentication server has a list
of passwords or other information it can
use to impersonate me.
Since no central authentication server is
necessary, the authentication prover (i.e.
the program that runs on my computer to
prove who I am) and the authentication
verifier (i.e. the program that runs on
slashdot to check my identity) could be
implemented by different companies. Thus
you could use an open source prover with
an MS verifier allowing interoperability.
So my question is why doesn't MS use a zero-knowledeg protocol to implement passport? Is this type of idea patented, or are there are other issues such as security, speed, etc.? I'm not trying to bash MS since I know that they have some pretty smart people there I'm just trying to find out why they didn't use ZKP.
I suspect the answer is because a ZKP based system would probably be easy to clone by open source people or other companies. On the other hand, passport seems to give them significant business advantages at the cost of security, interoperability, elegance, etc.
A lot of people have been pointing out that
the ultrawideband idea doesn't open up
more spectrum. Therefore it doesn't give
you better capacity, speed, etc. From what
I have heard, the main benefit of UWB is
supposed to be that the transmitters can
be made dirt cheap.
The UWB transmitters can be made by a spark gap. They basically spit out a really short pulse (i.e. a spark). This means you don't need fancy linear amplifiers. For conventional transmitters, CDMA, TDMA, FDMA, whatever you need to burn a lot of power do to circuit inefficiencies.
However, you don't hear Time-Domain talking about his advantage much, so maybe it doesn't exist...
Another issue people haven't talked about is channel estimation. If you blast your signal over an enormous range of the specturm it is going to get distored. Generally you compensate for this distortion with an equalizer. However, you have to somehow learn what the channel is in order to equalize it. Now if you are transmitting over a reasonably narrow range this is possible. It might be more difficult if you are transmitting over an enormous amount of spectrum.
I'm not saying UWB won't work. I'm just pointing out that many of its potential benefits and drawbacks don't seem to be addressed in the article or in the papers I've read. Does anynone know of references which discuss these issues?
Here is an alternate hat problem I heard from Everest Huang who heard it from Matt Secor:
Ten people are in a line so that each person can only see the people in front of him. God puts either a red or blue hat on each person. No one is allowed to look at their own hat but can see the hats of all people before them. Each person is allowed 1 guess at their own hat color. If they get it right, they live, otherwise they die immediatly. The person in the back of the line who can see the other 9 people guesses first and the guesses go on down the line.
Question: If god is malicious and places the hat colors to kill the maximum number of people, how many can you save? You can easily save 5 as follows: person 10 says person 9's hat color possibly killing person 10. Person 9 now knows his hat color and says it so he lives. Person 8 says person 7's hat color and possibly dies. Person 7 knows his hat color, says it and lives, etc. In this scheme, all the even numbered people die and the odd numbers live so you save 5. Can you save more?
More interesting actually are support vector machines (svms). There are several papers on the web about them, they were devloped at bell labs by some russian dude whose name escapes me at the moment. They can be more effective than NNs, but the math is a bit harder to understand.
The russian dude is named Vladimir Vapnik. He developed SVMs based on something called Statistical Learning Theory (SLT). One of the main insights of SLT regards the convergance of empirical averages.
The idea is that if you want to make a complicated estimate you need more data. For example, you need more data to fit a 10 degree polynomial to your data than you do to fit a 2 degree polynomial. This makes sense intuitively, but SLT provides the mathematical justification for the intuition. Basically SLT says that the estimate for the 2 degree polynomial will converge faster than the estimate for the 10 degree polynomial and it quantifies exactly how fast. Furthermore, SLT provides a way to measure how much data a particular kind of estimator or regression requires.
In the polynomial fitting example, it turns out that the model complexity is equal to the number of free parameters. This is not true in general. For example, NNs can have lots of free parameters but still have a low model complexity. This makes it possible to accuratly train an NN with 50 neurons using much less data than would be required for accuraly fitting a 50 degree polynomial. I think this is part of the reasons NNs do well in practice: NNs can represent large classes of models but training NNs converges better with less data because they have low model complexity in the sense of SLT.
The idea behind SVMs is to build a structure which is rich enough to represent a large class of relationships but constrained enough so that you can accuratly train it with limitied data. Thus SVMs trade off richness with accuracy based on the data. If you have only a little bit of data you use a structre which is constrained enough to be trained accuratly but might not model subtle relationships. Once you get more data then you can use richer structures and still maintain training accuracy. Its really a pretty nice idea.
Anyway, the point is that NNs do seem to work well in practice and Vapnik's work might help explain this. If you are interested in NNs or similiar things, I would suggest checking out SLT and SVMs. Unfortuantly both SLT and SVMs are pretty new so the math is still a little ugly.
The 2600 article was well written and convincing. I admit I haven't followed the DeCSS issue, but if the 2600 article is correct then it seems like this case is a major threat to free speech.
Before sending a check to the EFF and posting DeCSS on my web site, I just have one question. Is there a well written piece telling the other side of the story? Even though I don't have any reason to doubt the 2600 story, its always best to hear both sides before making a decision.
Something I've wanted for a long time is an XML based word processor. I would like to write a document indicating bold, italic, chapters, footnotes, equations, references, figures, etc. using XML tags. Then I'd like to run some kind of compiler that would take the XML and produce the output in ASCII, HTML, PDF, PostScript, Man Page format, Info Format, or whatever. For example, I could use a tool like this to write my resume ONCE and have it converted to whatever format I need to use it in.
I can kind of do this with MS Word, or with latex+latex2html, texinfo, etc. but those are all incomplete. I think XML could make word processing much better not only because of the backward compatability, but also because you can write one document and "print" it in lots of different ways.
Have applications been written which do this kind of thing or do I need to write one myself?
A lot of the spooky stuff in Quantum Mechanics is related to the Uncertainty Principle. The Uncertainty Principle says that you can't measure the position and momementum of a particle beyond some fundamental uncertainty. Some people think this is because the observer effects the system.
In high school, the explanation I got was that if you try and measure the position of an electron you have to bounce photons off of it. So you get a good measurement of the position. However, you get a crummy measurement of the momentum because bouncing photons off the electron changed its momentum. This is an interesting interpretation but it is wrong!
I wrote a semi-technical explanation of the Uncertainty Principle for a friend of mine a while ago and posted it on my web site at http://www.CSUA.Berkeley.EDU/~emin/writings/quantu m_ind.html in case anyone is interested. The explanation is available in PDF and HTML. To understand the explanation you need a basic idea of what integrals and derivatives are but you don't need to remember how to calculate them.
I think the DMCA is definitely something we should worry about because of its potential to stop freedom of speech. The DMCA makes it illegal to publish source code which could be used to circumvent copying restrictions. This is disturbing because restricting the publishing and distribution of source code is a drastic restriction similar to restricting freedom of speech. Furthermore it is conceivable that "dual use" programs which can be used for completely legitimate purposes would be illegal because of the DMCA. DeCSS is a good example of this.
Unfortunately it seems like a lot of people are more concerned about the "Free Beer" issues instead of the "Free Speech" issues. An excellent example is Napster. I'm sure that Napster could be legitimately used for legally exchanging music, but lets be honest. The main purpose of Napster is to allow people to exchange pirated music. When the big record labels convince sites to disallow Napster, they are not suppressing culture; they are stopping piracy.
I think that the record labels are justified in trying to stop piracy. They go to a lot of effort to produce the music they distribute, when you take their music without paying for it you are effectively stealing from them. If you think they are charging too much for the music you are free not to buy it. However to claim that they are evil because they are trying to stop people from pirating music is pure hypocrisy.
What makes the record companies and the DMCA evil is not that it is trying to stop piracy and tools like Napster. What makes them evil is that they are willing to trample over all sorts of other rights in their quest. We as a community should resist their efforts to make publishing source code or uncopyrighted music illegal. This is a position where we can make a strong ethical stand. However, if we focus on tools for piracy such as Napster or claim that record companies are stamping out culture by going after Napster we only weaken our position.
I would much rather have free speech than free beer. Protecting the right to distribute source code or express opinions is important but protecting piracy is not. By making a lot of noise to defend piracy we diminish our ability to defend the truly important issues.
I have a general question about Garbage Collection that I'm hoping someone on/. can answer: Is it possible to write garbage collectors that work even if the programmer is stupid/malicious?
For example, what can GC do if the programmer has a circular chain of objects which all reference each other but aren't referenced by anything else? Alternatively, what if a programmer creates a hole bunch of objects in the beginning of a program (for example in the first line of the main() function). If the programmer doesn't use these objects anymore after line 10 of main() will the GC clean them up at line 11? It seems like the objects are still in scope so the GC couldn't delete them. However they are never used again so they sit around wasting memory. Using malloc and free you could malloc a ton of objects in line 1 of main() and free them in line 11 of main().
Are their GC algorithms which can be proved to handle pathological cases such as this? You might say that this is an irrelvant question because nothing can save you from stupid/malicious programmers. I would agree. However, GC is often touted as protecting programs from stupid programmers or programming mistakes. Yet if you can't prove that GC always works then it seems like GC just makes memory management bugs less common and much more subtle. After many years of C/C++ programming, I'd rather have many simple memory leaks which I can track down with various tools as opposed to rare but impossible to understand bugs due to problems with the GC.
I'm not trying to put down GC. It's just that when I've used various GC langauges such as Tcl or Lisp I've encountered things that seem a lot like memory leaks. Is this because of buggy implementations of GC, bad programs which prevent the GC from working or because of fundamental limitations of GC?
By the way, I've heard that in languages like Java you need to set objects to NULL after you are done using them to be assured that the GC will clean them up. I haven't actually used Java, but it seems like that would defeat the whole purpose of garbage collection. After all, if I always have to remember to set objects to NULL then I might as well just always remember to call delete! Please tell me that the person I heard this from is wrong and Java isn't that lame.
Are there any competitors to the Sun Grid?
For example, is there a similar service for LINUX machines? I could certainly use a lot of extra machines to run simulations, but I mostly use LINUX. Having to switch everything over to Solaris makes this service much less useful.
Thanks.
I agree; trading space is the way to go. You might also like the Distributed Internet Backup System (DIBS) project. It provides a way to trade space with strangers and protects your data with cryptography, and error correction coding.
I agree. For people who want a free, open source solution which can make distributed off-site backups as large as you like with built-in encryption and error correction I recommend the Distributed Internet Backup System (DIBS).
I'm disappointed to see all the negative reactions to this idea. I think trying to provide an incentive to many different experts to predict events of national importance is a great way to leverage the best minds in the country.
I can understand that some people are uncomfortable with the idea of letting investors profit from the death and misery of others. One way to alleviate such concerns would be to stipulate that any profits from the proposed futures market could only be donated to charities. This way, an incentive still exists for an investor to accurately weight his beliefs for different events, but succesful investors will only benefit society and not profit from the pain of others.
-Emin
Templates are particulary useful and efficient for container classes like hash tables.
This is exactly the plan I have in mind for DIBS. Currently, encryption is done via GPG. "Striping" is in the works. The plan is to use Reed-Solomon (RS) error correcting codes to generate the parity. The RS algorithm is currently written (http://www.csua.berkeley.edu/~emin/source_code/py _ecc) and will be integrated into DIBS sometime in the next few releases.
-Emin
Bacula looks like a nice project but designed for a different purpose. After looking through the manual, I didn't see any discussion of peer-to-peer capabilities. That is, if we both install Bacula, how do I set up my Bacula client to backup my data to your machine given that I don't trust you all that much?
-Emin
People have pointed out some other issues which I'd like to respond to:
* Why not just use some rsync/gpg/perl scripts to backup files from one machine onto another?
-- What if you only have one machine? Using DIBS, you can trade (automatically encrypted) files with peers without having a user account on anyone else's machine or owning multiple machines.
* What if you are behind a firewall?
-- As stated in the manual, DIBS has a number of different communication modes to allow things to work if one peer is on the Internet and another is behind a firewall. There is even a way to make things work if both peers are behind firewalls, but this is kind of kludgy at the moment.
* Why not use Gnutella/Napster/Kazaa?
-- While these file-sharing networks are terrific, they are not back-up networks. Who is going to want to store the pictures from your summer vacation? Probably nobody if you are talking about Gnuetall/Napster/etc. On the other hand, you can store your stuff on my machine if you let me store stuff on your machine.
* What if your peers have varying connectivity to the network?
-- The idea is that DIBS will store your data on multiple peers so if some of the are unavailable, then you get your data from somewhere else. Also, the plan for later versions of DIBS is to have your client periodically probe peers to measure how responsive they are. Once you have this information, your client will automatically prefer trading with more reliable peers.
Once again, I don't mean to imply that there are absolutely no issues with distributed backup and I welcome more comments on potential problems. However, I think these problems can be solved and that is why I'm working on DIBS.
Thanks again for all the feedback,
-Emin Martinian
A lot of people have pointed out issues related to security, bandwidth, efficiency, etc. My vision is that DIBS will be designed to take things into account.
For example, DIBS uses GPG to encrypt and sign all communications so that peers can't read the data they are storing for you and so that other people can't pretend to be you and store their files with your peers.
Also, my vision is to include state-of-the-art erasure correction codes so DIBS uses redundancy efficiently. (Erasure correction codes are a generlaization of parity checks used by RAID). In fact, I have already written a python implementation of Reed-Solomon codes available at www.csua.berkeley.edu/~emin/source_code/py_ecc. I haven't had time to put this into DIBS yet since I'm currently working on my PhD at MIT and that keeps me pretty busy.
Incremental backup is another feature I'm planning to add. There are some issues with how incremental backup interacts with encryption and erasure correction. I think resolving these issues may take a little more thought so I might have to wait until I graduate, become a professor and get some grad students of my own to help me.
A Slashdot post isn't the place to go into all the arguments for or against DIBS. However, I think distributed backup is a viable idea. While there are some serious issues, I believe that through clever engineering, we can solve them and create a cheap, simple, efficient, and secure backup system usable by anyone with a network connection.
I decided to start writing a distributed backup prototype like DIBS in order to find out what the major issues are and how to address them. Sure, currently DIBS has some flaws, but it is a prototype written by a grad student. With more feedback from the community and some more development effort I believe DIBS can become a valuable tool. If you agree, I invite you to join the development effort, or try it out and tell me how you think it could be improved, or even take whatever parts you find useful and make something better. The project page is at sourceforge.
As the headline of the story says, check out Audio Spotlight from MIT. I was lucky enough to see Joseph Pompei (the inventor of Audio Spotlight) give a demonstration and it was amazing. The technology works as promised: it produces a directed beam of sound which can make noises come from anywhere in the room you want. Furthermore, Mr. Pompei struck me as an exceptionally competent researcher. He had looked at a lot of issues like what kind of frequency response you can get (bass is harder to get than treble), whether the ultrasound causes long term damage (not according to a Harvard study), and how to manufacture (short answer: lots of DSP chips).
I don't new about the guy Newsweek talks about, but the technology is real and I'm looking forward to hearing it.
Consider a distributed backup program which works roughly as follows.
This type of application would provide at least 3 important benefits for backup. First, its relatively cheap. If you want to backup more data, just buy more local disk space and trade files with more computers. This seems much easier (at least for a home user) than setting up a tape backup system, making sure the tapes get replaced, making sure the tapes get put someplace safe, etc. Second, its much safer than pretty much any backup system you could buy today commericially since your data is literally spread all over the world. Finally, the backup system isn't controlled by any large corporation.
Obviously there are still some details left to be worked out such as how to let computers who want to trade files find each other (both centralized and distributed options exist analagous to napster and gnutella), how to prevent cheating (having your computer periodically ask its partners for hashes of the data they are backing up should work), how to control redundancy most efficiently (error correcting codes like Reed-Solomon codes or Tornado codes would probably be smarter than just repeating data).
If you're looking for a great distributed open source project that will make the world a better place, I encourage you to develop prototypes for distributed backup. I plan to develop my own prototype one day, but currently I'm pretty busy with graduate school.
-Emin
Just like everyone else, I'm really excited about being able to get a PS2 and run Linux on it. Is there a mail/email address for the appropriate place in Sony that I can write to? I'd like to tell them that as soon as they release this in the US I'll be at the store waiting in line to buy it.
Since we all spend a lot of effort writing letters to companies who do things we DON'T like, it would be nice to write letters and express support (both moral and financial) for companies that do things we DO like.
By the way, many schools seem interested in buying computers for their students/teachers. With Linux running on a PS2, these schools could now have a cheap hardware platform which can run office apps, graphics, math programs, and play DVD's and video games! What do you think kids would be more excited about, getting a free desktop running Windows or a free PS2 running Linux?
You raise a good point, but leave out several important issues.
First, while there are lots of books on almost any subject you can think of, most of the books aren't necessarily that great. Even in the good books, a lot of ancillary information is included. A good set of lecture notes should summarize the most important points and provide you with the essentials. If you have these you can then learn the details from the book. Having good lecture notes to start with is quite useful.
Second, many of the valuable lessons learned in engineering are learned by working on projects. Of course, anyone can make up there own "projects" to work on for educational benefit. However, a relevant project which can be completed in a reasonable amount of time while illustrating important practical and theoretical issues which has been debugged and comes with the necessary background material (code, libraries, etc.) is very valuable.
Finally, although you could go to a library to check out books and learn the necessary material. It is much easier and more conveniant to be able to do it in the comfort of your home/office via the Internet.
-Emin
As you say, scalability really is the main issue. A recent technical paper called "The Capcity of Wireless Networks" argues that the per user capacity of a wireless net with n nodes is proportional to 1/sqrt(n). This means that if you quadruple the number of nodes you halve the capacity each user gets. This would imply that eventually adding more nodes decreases everyone's capacity!
The math behind the 1/sqrt(n) argument in the paper is a little involved. However, the 1/sqrt(n) essentially occurs because as you add more nodes, each node has to spend more and more time relaying other people's packets.
To see why this occurs, consider a WIRED network where every node is on a square k by k grid where the number of users is n = k*k. Let each node be connected by a wire only to it's four nearest neighbors. Assume that each wire can carry 1 packet/sec. To route packets to a far away desitination you need the intermediate nodes to relay for you.
If each node (i,j) wants to transmit a packet to another randomly chosen node (i',j'), then each packet has to go thourgh an average of k hops. This is because on average |i-i'| + |j-j'| = k. Therefore on average each packet goes through k hops. Since there are only n wires, on average you can only have n/k packets/second of throughput.
To get the throughput per user you divide the total throughput by the number of users to get capacity per user = n/k/n = 1/k = 1/sqrt(n) and there you have it. As you add more users the per user capacity goes down.
You might think that in a wireless network you can avoid the hops and have node (i,j) transmit directly to node (i',j'). The problem is that this would cause so much interference to all the other nodes that it makes your capacity even worse.
By the way, for those of you interested in learning more about zero-knowledge proofs can check out http://theory.lcs.mit.edu/~cis/zk/zk.html
How would this be used for authentication? I generate an instance of a hard problem, P, along with a claim, C, which I only I can prove. I publish (P,C) as a type of public key. If I want to prove to slashdot or Hotmail that I am me, I use a ZKP to prove C thus authenticating myself. Since I used a ZKP, even though slashdot now knows C is true, slashdot doesn't know how to prove C itself. So slashdot can't pretend to be me when talking to Hotmail (unless slashdot can factor or solve my chosen hard problem).
Some benefits of using a ZKP include:
So my question is why doesn't MS use a zero-knowledeg protocol to implement passport? Is this type of idea patented, or are there are other issues such as security, speed, etc.? I'm not trying to bash MS since I know that they have some pretty smart people there I'm just trying to find out why they didn't use ZKP.
I suspect the answer is because a ZKP based system would probably be easy to clone by open source people or other companies. On the other hand, passport seems to give them significant business advantages at the cost of security, interoperability, elegance, etc.
A lot of people have been pointing out that
the ultrawideband idea doesn't open up
more spectrum. Therefore it doesn't give
you better capacity, speed, etc. From what
I have heard, the main benefit of UWB is
supposed to be that the transmitters can
be made dirt cheap.
The UWB transmitters can be made by a spark gap. They basically spit out a really short pulse (i.e. a spark). This means you don't need fancy linear amplifiers. For conventional transmitters, CDMA, TDMA, FDMA, whatever you need to burn a lot of power do to circuit inefficiencies.
However, you don't hear Time-Domain talking about his advantage much, so maybe it doesn't exist...
Another issue people haven't talked about is channel estimation. If you blast your signal over an enormous range of the specturm it is going to get distored. Generally you compensate for this distortion with an equalizer. However, you have to somehow learn what the channel is in order to equalize it. Now if you are transmitting over a reasonably narrow range this is possible. It might be more difficult if you are transmitting over an enormous amount of spectrum.
I'm not saying UWB won't work. I'm just pointing out that many of its potential benefits and drawbacks don't seem to be addressed in the article or in the papers I've read. Does anynone know of references which discuss these issues?
Here is an alternate hat problem I heard from Everest Huang who heard it from Matt Secor:
Ten people are in a line so that each person can only see the people in front of him. God puts either a red or blue hat on each person. No one is allowed to look at their own hat but can see the hats of all people before them. Each person is allowed 1 guess at their own hat color. If they get it right, they live, otherwise they die immediatly. The person in the back of the line who can see the other 9 people guesses first and the guesses go on down the line.
Question: If god is malicious and places the hat colors to kill the maximum number of people, how many can you save? You can easily save 5 as follows: person 10 says person 9's hat color possibly killing person 10. Person 9 now knows his hat color and says it so he lives. Person 8 says person 7's hat color and possibly dies. Person 7 knows his hat color, says it and lives, etc. In this scheme, all the even numbered people die and the odd numbers live so you save 5. Can you save more?
If x is the answer then
echo x | md5sum
will return
7c5aba41f53293b712fd86d08ed5b36e -
The russian dude is named Vladimir Vapnik. He developed SVMs based on something called Statistical Learning Theory (SLT). One of the main insights of SLT regards the convergance of empirical averages.
The idea is that if you want to make a complicated estimate you need more data. For example, you need more data to fit a 10 degree polynomial to your data than you do to fit a 2 degree polynomial. This makes sense intuitively, but SLT provides the mathematical justification for the intuition. Basically SLT says that the estimate for the 2 degree polynomial will converge faster than the estimate for the 10 degree polynomial and it quantifies exactly how fast. Furthermore, SLT provides a way to measure how much data a particular kind of estimator or regression requires.
In the polynomial fitting example, it turns out that the model complexity is equal to the number of free parameters. This is not true in general. For example, NNs can have lots of free parameters but still have a low model complexity. This makes it possible to accuratly train an NN with 50 neurons using much less data than would be required for accuraly fitting a 50 degree polynomial. I think this is part of the reasons NNs do well in practice: NNs can represent large classes of models but training NNs converges better with less data because they have low model complexity in the sense of SLT.
The idea behind SVMs is to build a structure which is rich enough to represent a large class of relationships but constrained enough so that you can accuratly train it with limitied data. Thus SVMs trade off richness with accuracy based on the data. If you have only a little bit of data you use a structre which is constrained enough to be trained accuratly but might not model subtle relationships. Once you get more data then you can use richer structures and still maintain training accuracy. Its really a pretty nice idea.
Anyway, the point is that NNs do seem to work well in practice and Vapnik's work might help explain this. If you are interested in NNs or similiar things, I would suggest checking out SLT and SVMs. Unfortuantly both SLT and SVMs are pretty new so the math is still a little ugly.
The 2600 article was well written and convincing. I admit I haven't followed the DeCSS issue, but if the 2600 article is correct then it seems like this case is a major threat to free speech.
Before sending a check to the EFF and posting DeCSS on my web site, I just have one question. Is there a well written piece telling the other side of the story? Even though I don't have any reason to doubt the 2600 story, its always best to hear both sides before making a decision.
Thanks.
I can kind of do this with MS Word, or with latex+latex2html, texinfo, etc. but those are all incomplete. I think XML could make word processing much better not only because of the backward compatability, but also because you can write one document and "print" it in lots of different ways.
Have applications been written which do this kind of thing or do I need to write one myself?
-Emin Martinian
In high school, the explanation I got was that if you try and measure the position of an electron you have to bounce photons off of it. So you get a good measurement of the position. However, you get a crummy measurement of the momentum because bouncing photons off the electron changed its momentum. This is an interesting interpretation but it is wrong!
I wrote a semi-technical explanation of the Uncertainty Principle for a friend of mine a while ago and posted it on my web site at http://www.CSUA.Berkeley.EDU/~emin/writings/quantu m_ind.html in case anyone is interested. The explanation is available in PDF and HTML. To understand the explanation you need a basic idea of what integrals and derivatives are but you don't need to remember how to calculate them.
-Emin Martinian
Unfortunately it seems like a lot of people are more concerned about the "Free Beer" issues instead of the "Free Speech" issues. An excellent example is Napster. I'm sure that Napster could be legitimately used for legally exchanging music, but lets be honest. The main purpose of Napster is to allow people to exchange pirated music. When the big record labels convince sites to disallow Napster, they are not suppressing culture; they are stopping piracy.
I think that the record labels are justified in trying to stop piracy. They go to a lot of effort to produce the music they distribute, when you take their music without paying for it you are effectively stealing from them. If you think they are charging too much for the music you are free not to buy it. However to claim that they are evil because they are trying to stop people from pirating music is pure hypocrisy.
What makes the record companies and the DMCA evil is not that it is trying to stop piracy and tools like Napster. What makes them evil is that they are willing to trample over all sorts of other rights in their quest. We as a community should resist their efforts to make publishing source code or uncopyrighted music illegal. This is a position where we can make a strong ethical stand. However, if we focus on tools for piracy such as Napster or claim that record companies are stamping out culture by going after Napster we only weaken our position.
I would much rather have free speech than free beer. Protecting the right to distribute source code or express opinions is important but protecting piracy is not. By making a lot of noise to defend piracy we diminish our ability to defend the truly important issues.
For example, what can GC do if the programmer has a circular chain of objects which all reference each other but aren't referenced by anything else? Alternatively, what if a programmer creates a hole bunch of objects in the beginning of a program (for example in the first line of the main() function). If the programmer doesn't use these objects anymore after line 10 of main() will the GC clean them up at line 11? It seems like the objects are still in scope so the GC couldn't delete them. However they are never used again so they sit around wasting memory. Using malloc and free you could malloc a ton of objects in line 1 of main() and free them in line 11 of main().
Are their GC algorithms which can be proved to handle pathological cases such as this? You might say that this is an irrelvant question because nothing can save you from stupid/malicious programmers. I would agree. However, GC is often touted as protecting programs from stupid programmers or programming mistakes. Yet if you can't prove that GC always works then it seems like GC just makes memory management bugs less common and much more subtle. After many years of C/C++ programming, I'd rather have many simple memory leaks which I can track down with various tools as opposed to rare but impossible to understand bugs due to problems with the GC.
I'm not trying to put down GC. It's just that when I've used various GC langauges such as Tcl or Lisp I've encountered things that seem a lot like memory leaks. Is this because of buggy implementations of GC, bad programs which prevent the GC from working or because of fundamental limitations of GC?
By the way, I've heard that in languages like Java you need to set objects to NULL after you are done using them to be assured that the GC will clean them up. I haven't actually used Java, but it seems like that would defeat the whole purpose of garbage collection. After all, if I always have to remember to set objects to NULL then I might as well just always remember to call delete! Please tell me that the person I heard this from is wrong and Java isn't that lame.
Thanks.