> What was wrong with the Bay Area commute experience?
It wasn't the daily commute that was wrong - it was the rest of it. :) The big bike lane on the expressway (and the commuters who actually watched for bikes in it) was a very pleasant surprise. Sunnyvale, though, was an unexciting place to live, and without a car, it was difficult to do things other than get up, go to work, and get home (and go for a long ride through the mountains on weekends -- very nice). Caltrain stops running early enough that trips to San Francisco were a pain. I biked up there once, and only once - the roads were pretty nasty. It's not a very pedestrian-friendly place for the times when you don't want to have to haul your bike & lock around with you. Even getting to Safeway took about 15 minutes.
The commute part was just fine. It was mostly the rest of living there without a car that I didn't like. In contrast, living in Boston without a car is great. Car ownership here just induces stress (has it been dinged yet today? stolen? ticketed?). For places within 20 miles of the city, I can usually get there just as fast on bike as you can via car.
I spent a summer in the Bay Area without a car (interesting experience; not recommended). Part of that involved a 12mi each way commute by bicycle. It was usually the highlight of the day - took about 50 minutes if I didn't want to get sweaty, about 5 minutes longer than it took via Caltrain and walking. I'd highly recommend attempting your commute on a normal bike for a while and seeing how it pans out, particularly if it's under, say, 10-15 miles. The exercise is great, and it's a nice way to flush work from your system on the way home -- and you get to pass all of those poor suckers in cars during rush hour. :) The advantages to a non-powered bike are several:
Easy storage - you can haul it into your office / apartment / up stairs, etc., with no effort.
Value - bike theft is a major problem in some areas. A good commuter bike is cheaper than an e-bike, and (because of the easy storage thing) easier to secure.
Efficiency - the MPGs are a little higher... ;-)
Maintenance - maintaining your own bike is easy and rewarding. I suggest Zinn and the Art of Road Bike Maintenance (or mountain, if you're of that persuasion). Verrrrry good book. With no engine to take care of, it's easier to deal with on your own.
Exercise - goes without saying.
Easier to stuff in a car... just in case. :)
If you can shower at work, it's easier, but it's also very possible to take it a little easy on the way in to work and not show up smelling. Then you get the option of hammering it on the way home or just taking it easy. :)
Good luck with whichever way you decide to get to work -- far better than hauling a 3000lb steel beast to and fro every day!
A Google results page for the word "search" is 20803 bytes of HTML, plus 311 bytes of header data. That's about 15 packets, so throwing in TCP+IP header overhead, it's about 21714 bytes. Without the CSS at the top, but with a few characters left for referencing the external style sheet, it's 21008 bytes. You've saved 706 bytes - or half a packet.
You've fetched an external CSS document. Google has sent a TCP SYN ACK (40 bytes), a TCP/IP header (40 bytes), an HTTP header (311 bytes), the CSS document (706 bytes), and two ACKs (80 bytes). Total = 1177. You've also processed an extra connection to the server, whose cost is difficult to quantify.
If your typical search model is that users grab the first page of results and go, external CSS is a big lose. You've wasted bandwidth, doubled the incoming bandwidth usage and number of server connections, slowed the user's experience, and increased the number of packets by about 20%. If users grab 3 or 4 pages, then it's a win. I bet google knows their usage patterns better than we do...
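If you want to check the arithmetic yourself, here's a quick sketch in Python. The byte counts are the ones quoted above; the flat 40 bytes of TCP+IP header per packet and all of the variable names are my assumptions for illustration, not anything taken from a capture.

```python
# Back-of-the-envelope byte counts for inline vs. external CSS.
# Assumption: 40 bytes of TCP+IP header per packet, no TCP options.
TCP_IP_HEADER = 40

html_bytes = 20803   # results page for the query "search", CSS inline
http_header = 311    # HTTP response header
packets = 15         # approximate packet count for the page
css_bytes = 706      # bytes saved by moving the CSS out of the page

# Page with inline CSS, including per-packet header overhead.
inline_total = html_bytes + http_header + packets * TCP_IP_HEADER

# Same page with the CSS moved to an external stylesheet.
external_page_total = inline_total - css_bytes

# Extra traffic for fetching the stylesheet over a second connection.
css_fetch = (
    TCP_IP_HEADER        # SYN ACK
    + TCP_IP_HEADER      # TCP/IP header on the data packet
    + http_header        # HTTP header for the CSS response
    + css_bytes          # the CSS document itself
    + 2 * TCP_IP_HEADER  # two ACKs
)

# What a one-search user actually pulls down with external CSS.
first_page_cost = external_page_total + css_fetch

print(inline_total, external_page_total, css_fetch, first_page_cost)
```

With these numbers, the one-page user pulls down 22185 bytes with external CSS against 21714 with inline CSS, and that's before counting the cost of the second connection.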
Now, of course, this argument is actually completely wrong -- because Google is very good about using content-encoding gzip. So what's the cost of that extra header when the results page is gzipped? 4905 bytes vs. 5204 bytes. They both fit in 4 packets. That 300 bytes, effectively tagged on to the end of a packet that had to be sent anyway, amounts to pretty much squat. It's smaller than the (incompressible, except by link compression) HTTP header required to fetch the CSS document. Note that Google doesn't compress for some browsers; if you're having problems replicating this in Opera, set your browser to ID itself as Mozilla 4.78 and it'll work properly.
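The "both fit in 4 packets" bit is easy to sanity-check. This sketch assumes a 1460-byte TCP payload per packet (a typical Ethernet MSS); that figure and the helper name are my assumptions, not something read off the wire.

```python
import math

MSS = 1460  # assumed TCP payload bytes per packet (typical Ethernet MSS)


def packets_needed(payload_bytes: int, mss: int = MSS) -> int:
    """Number of segments needed to carry payload_bytes, rounding up."""
    return math.ceil(payload_bytes / mss)


without_css = 4905  # gzipped results page, CSS moved out
with_css = 5204     # gzipped results page, CSS inline

print(packets_needed(without_css), packets_needed(with_css))
print(with_css - without_css)
```

Under that assumption both sizes round up to the same packet count, so the inline CSS rides along in a packet that had to be sent anyway.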
I also don't buy the "ISP's cache" argument. There are tons and tons of users who don't get proxy cached. Also, if that cache is on the other end of a modem, then you're still not saving the user the expensive bit.
It's easy to replicate this. Fire up Ethereal and capture "tcp and port 80" while you do a Google search. Follow the stream, and you'll see the relative sizes of things. The search I just did took 5536 bytes downstream - a bit more than the thought experiment version, but still 4 packets.
I think you're pretty off-base with respect to Google's awareness of the byte costs of various approaches. First of all, Google is trying to optimize the user's perception of speed - and downloading a separate CSS doc would require a second TCP connection, etc., etc., which could negatively impact both the user experience and the load on Google's servers. I wager that their common case is one search per user.
Second, have you actually _looked_ at the returned HTML from a Google search? It does use CSS within the returned page (see the style section), and it's very compact CSS and HTML.
The rest of their site has some "potential inefficiences" that could be corrected, but keep in mind that probably more than 99% of Google's traffic is search traffic. Amdahl's law - optimize the part that slows you down the most, not the little corner cases. Google's search results pages are very efficient.
Oh, and re the Orkut thread, it was seeded with Orkut's friends and coworkers at Google, pretty much. The social network is pretty obvious in the way it grows out from there - Stanford, Google, Bay Area, computer science, geek schools, other schools, general population.
Except that while it's probably great in the studio, you rarely need 8fps for 40 frames when you're doing studio work; you need it for shooting in the field.
But, if you're a news photographer shooting close to your car, it's probably really easy to add a high-gain omnidirectional antenna on your car and have your shots get automatically replicated back there. Bet TV stations would sell the service to their print brethren for a hefty fee. (grin).
But the security angle of it is worrisome, if the only protocol it supports is FTP...
> Would you say that the patents for which you've applied have led you to write more software than you would have otherwise?
Indirectly, yes. The software I worked on was while I was doing an internship in a corporate research lab, and the patent made it more attractive for the company to pay me to do the work...
But I think there's a more fundamental issue. There are two kinds of patents you can get in computer science - a process patent (e.g., the method by which you perform RSA encryption) and an object patent (e.g., a disk drive). When people say software patents, they mean a process patent on the steps taken by the software. You don't actually have to write the software to get a patent on it - you have to figure out how the software would work. In many cases, this comes down to designing an algorithm. Creating a non-obvious, useful, and novel algorithm takes either loads of creativity, very hard work, or both.
I'm in academia, so I'd write software and design toys anyway. But corporate research labs (Google, Microsoft, IBM, VMware, Compaq-HP-whatever-they-are, etc.) hire a _lot_ of Ph.D.s, and pay them a lot of money, to design improved algorithms and processes for doing things. They need to justify their investment by getting use out of it, and to do so, they either have to keep it secret, or patent it. If they're selling a product based on it, keeping something secret in computer science is hard. If they're doing something in-house, like processing people's data and shipping it back to them, it's a bit easier, but much of our industry revolves around having people run the software you produce.
Outside of academic projects, there's not a ton of open-source research going on of the type that would produce an algorithm like RSA. There's a lot of implementation, and a lot of innovation, but not of the "I spent three years of my life working on this, and now I have a cool new invention." That is the kind of thing that I think software patents are good for--it gives people the incentive to take a big risk of investing a lot of time in developing an idea. I can take two weeks and hack up whatever I want, and nobody will notice the time I was gone, but if I want to work on something more substantive, I've got to justify it, and that applies whether in academia, a corporate lab, or working on my own.
The project I worked on that summer involved nine people over about six months - not an insubstantial time or money investment. Developing this thing probably cost the company $300k.
Now, I do have some opinions about seventeen years in the context of an industry like ours, but perhaps once we stop experiencing exponential growth, longer terms of patents will make sense.
Re:No, and you're assuming facts not in evidence (on "Microsoft's Patent Problem", Score: 4, Interesting)
> Almost all software patents are for stuff that's obvious
As one of the named inventors on a pending software patent application, I call BS on this. The patents you usually hear about, particularly on Slashdot, are bad. But that doesn't mean that "almost all" software patents are for stuff that was obvious when they were filed. In 1999, was the use of stego to encode digital watermarking information really obvious? The first academic conference on stego-related issues wasn't even created until 1996. I know some of the people who worked at Intertrust during its heyday - and they're damn smart crypto and security researchers. Look at some of the research papers from Intertrust. If you know anything about security, you'll recognize some very good computer scientists in there. Martin Abadi invented the logic used to analyze security protocols. Robert Tarjan quite literally wrote the book on advanced algorithms and data structures.
Now, contrast that with something like "a patent on the use of a web server to sell things" -- well, duh. But a patent that describes the method by which you use the high frequency components of an audio signal to digitally watermark an audio sample? It sounds kind of obvious in 2003 because that's how everyone's doing it, but the technology was quite new five years ago, and Intertrust was doing some of the preeminent research on it.
Don't blast all software patents because some are stupid. The system has a problem - a big one - but the fundamental concept of software patents isn't as silly as you might believe.
They shouldn't have sued for extortion. In the case of the guy who didn't own a TV, he should have informed DirecTV that he wasn't using it to pirate software, etc., and then, if DirecTV had filed suit against him, he should have filed a Rule 11 motion against DirecTV's attorney for failing to do due diligence before filing the lawsuit. From the Federal Rules of Civil Procedure, Rule 11(b), Representations to Court:
By presenting to the court (whether by signing, filing, submitting, or later advocating) a pleading, written motion, or other paper, an attorney or unrepresented party is certifying that to the best of the person's knowledge, information, and belief, formed after an inquiry reasonable under the circumstances,--
(1) it is not being presented for any improper purpose, such as to harass or to cause unnecessary delay or needless increase in the cost of litigation;
(2) the claims, defenses, and other legal contentions therein are warranted by existing law or by a nonfrivolous argument for the extension, modification, or reversal of existing law or the establishment of new law;
(3) the allegations and other factual contentions have evidentiary support or, if specifically so identified, are likely to have evidentiary support after a reasonable opportunity for further investigation or discovery; and
(4) the denials of factual contentions are warranted on the evidence or, if specifically so identified, are reasonably based on a lack of information or belief.
People are too intimidated by lawsuits, and it's a crime that they let companies like DirecTV bully them into forking over a few grand. Of course, it's also pretty awful that defending themselves against this kind of thing would probably cost $10k+...
Close, but not quite. PlanetLab is not a closed, high performance network. Rather, it's more of an overlay testbed: The machines reside on the Internet (companies that host nodes) and on Internet2 (research universities). That's part of what's so cool about it - the machines reside all over the world (see the map on the PlanetLab website - it's an accurate reflection of the location of the nodes). They have a lot of visibility into nooks and crannies on the Internet, and they're beginning to be deployed enough that there's often a PlanetLab node nearby, wherever in the network you are.
Article fluffy, PlanetLab not fluffy. For the moment, PlanetLab is primarily a research testbed. It has about 160 nodes deployed at 65 sites; these nodes are in use most of the time by a decently large group of researchers conducting Internet measurement studies and research into distributed computation.
But - that's only part of the goal. Ultimately, I believe that the goal of PlanetLab is to help transition these research technologies into deployed, useful services; so the network becomes more than just a research platform, it becomes the next DNS infrastructure, or the next Akamai, or the next Napster (ok, ok, don't sue!).
So, some of the examples the article cited are pretty illustrative. For example, the MIT Chord project is a Distributed Hash Table. DHTs are peer-to-peer storage/retrieval systems that allow completely decentralized resource sharing between cooperating hosts. And so on, and so on. The hope of the PlanetLab folk is that some of these projects will become the foundation for the next Internet architecture, or Internet middleware, or whatever it is you want to call it -- the next set of critical services that change the way we use the 'net.
But even before that, PlanetLab is one heck of a useful research tool. There are several papers at this year's SIGCOMM conference (big computer networking conference) that took their measurements using PlanetLab. There are a number of other papers and projects in the pipeline that're using PlanetLab as their research testbed. The cool thing about PlanetLab is that it's now considerably larger than most prior testbeds, and has a lot more momentum for future growth. Full disclosure: I spend a part of my time working on PlanetLab, but this post is not any kind of official view, it's just my interpretation.
Pretty cool tricks - they use multicast and filesystem-specific compression techniques to load the disks on a subset of the machines in the cluster in parallel. Very very very fast. (I use the disk imaging part of their software to load images on my test machines at MIT, and I'm quite impressed).
I'm from the same lab from which SFS comes, so I'm a bit biased, but I've been using it in a production setting for the last two years. My major use is to work from home and access my MIT filesystem remotely. I also maintain a network of ~40 machines distributed around the world, and I use SFS to provide access to centralized home directories on them. Very, very convenient. The software is stable, and the support is good. It works on *BSD and Linux. It also works on some versions of MacOS X, but may require an upgraded gcc on the latest (see the fs.net mail archives).
> The short of it is that Matt was unable to treat many of his fellow developers with the civility and respect that they deserve.
I think that's fairly clear. There are many strong, good hackers in this world who wouldn't be able to work together. While it's unfortunate that Matt and the rest of -core weren't able to resolve it, it's a fact of life in a big project...
> however Braitenberg's ideas came first so he
> probably deserves more recognition for this
> train of thought than the much more publicized Brooks.
Brooks teaches the Embodied Intelligence course at MIT (which I took two years ago). One of the first things the course covers is Braitenberg's creatures (see the syllabus).
So while Brooks may certainly get more air-time than Braitenberg, he certainly gives credit where credit is due... but then, remember that Braitenberg focused on astoundingly simple circuits that lead to interesting-appearing behavior, whereas Brooks has used his approach to build working autonomous robots...
The nice thing about DHTs is that the interface is nearly identical on all of the platforms: Given a key, find the associated object. (And insert, of course). Most of the DHT teams are already working together to create a common interface so that they can easily be evaluated against each other. It's likely that the higher-level results from IRIS will be DHT agnostic. Some of the lower-level things (like making the DHTs themselves more resilient) will probably be done using each group's own DHT.
(Disclaimer: While I work in one of the groups that's participating in IRIS, these are only my guesses, not any kind of official word).
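To make the "given a key, find the associated object" interface concrete, here's a toy, single-process sketch in Python. It places keys on a hash ring and hands each key to its successor node, which is the basic idea behind Chord-style lookup, but it is not any group's actual implementation: there's no network, no routing tables, no replication, and the class and function names are made up for illustration.

```python
import bisect
import hashlib


def key_id(name: str) -> int:
    """Map a string onto the identifier ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Single-process stand-in for a DHT exposing put(key, value) and get(key)."""

    def __init__(self, node_names):
        # Hypothetical nodes, sorted by their position on the ring.
        self._ring = sorted((key_id(n), n) for n in node_names)
        self._ids = [node_id for node_id, _ in self._ring]
        self._stores = {n: {} for n in node_names}

    def owner(self, key: str) -> str:
        """Return the node responsible for key: its successor on the ring."""
        # First node whose id is >= the key's id, wrapping around to the start.
        idx = bisect.bisect_left(self._ids, key_id(key)) % len(self._ring)
        return self._ring[idx][1]

    def put(self, key: str, value) -> None:
        self._stores[self.owner(key)][key] = value

    def get(self, key: str):
        return self._stores[self.owner(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("some-file", b"contents")
print(dht.owner("some-file"), dht.get("some-file"))
```

The point is just that the caller only ever sees put and get; which node ends up holding the object is decided by the hash, which is why the same interface can sit on top of different DHTs.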
The model for much computer science research in the systems areas (networking, OS, etc.) is surprisingly close to open. The major publication players are USENIX, ACM, and IEEE. Of these, USENIX and ACM make all publications available on the web for free. IEEE digital library subscriptions are pretty affordable, and for all of these, subscriptions to the journals themselves are also affordable. An ACM SIGCOMM membership (4 issues of CCR) is $23/year, $10 for students. Journal subscriptions are about $40/year.
Much of this has to do with CS researchers forcing the conference publishers to allow distribution of papers via personal webpages. Once you have that, the rest follows.
But in fairness, Nature is only $160/year ($100 students), which covers 52 issues. Of course, you have to put up with advertising and pay a subscription...
Interesting. Do you know which towers these are? I've encountered one Verizon tower that was off by 1 second, but I reported the problem, and they got it fixed shockingly quickly. All of my time collectors are happy right now, and I suspect some must be on Verizon (the one in New York is, for certain). The time zone is irrelevant; the raw signal sends out GPS time, which has no time zone.
As a safeguard against this, my boxes NTP peer with a subset of each other, and each box peers with at least one external, nearby stratum-1 timeserver. It's a fairly robust setup; overall, there are about 15 CDMA time receivers, 3 GPS receivers, and 13 external stratum-1 servers involved. We're susceptible to GPS problems because of the large GPS-derived presence in our network, though three of the sites do peer with NIST atomic clocks. But that's not too big a worry. No individual clock failure will hurt things much, except rendering the attached box useless.
Maintaining a medium-size net of clocks (on "Do You Have The Time?", Score: 5, Interesting)
As part of the Resilient Overlay Networks project at MIT, I maintain a testbed of about 20 nodes, most of which have GPS-based time synchronization. We've started using a really fun little box from EndRun Technologies called the Praecis Ct. It gets GPS time that's being rebroadcast by cellular CDMA base stations. It provides accuracy to about 10 microseconds, and doesn't require a roof antenna -- anywhere you can get CDMA cellular service, you can use these things. They're kind of pricey (about $1k), but they're completely easy to use and set up. For more general information about NTP and things, see ntp.org, which maintains a nice FAQ about all things NTP.
For a few of the European hosts, we use GPS time receivers, primarily the Motorola Oncore UT+ kits. You can get eval units of these; google around. They're nearly as easy to use, but do require a kernel config change.
It's really kind of addictive playing with time. :-) And you get spoiled by never having any clock weirdness on any of your machines...
Not really - consider SGI's servers, for instance. The Origin 3800 can handle 1 TB of RAM -- but it's a CC-NUMA machine, meaning you have to go through an intermediate router (don't think Internet; much faster) to get to the memory. SGI machines have a limit of 8GB per processor "brick", and their bricks interconnect at 1.6 or 1.2GB/s.
Then consider the SunFire 15K - it's an SMP machine; processors fit on boards that can contain up to 32GB of RAM; after that, you have to go off-board through a switch to get to other memory. Each system board has about 9.6GB/s of offboard memory access speed.
In short, Cray isn't tooting needlessly - this is impressive bandwidth to the memory. Latency is probably fairly high on it, but for streaming vast quantities of data in and out of local storage, it's probably amazingly nice.
com.google.soap.search.GoogleSearchFault: Invalid authorization key: xxxxxxxxxxxxxxx at com.google.soap.search.QueryLimits.lookUpAndLoadFromINSIfNeedBe(Query ...
Alas, looks like the rest of us won't be able to play with Google's beta SOAP service. Which makes quite a bit of sense - this would be a great way for Google to allow people to resell Google in a standardized way, be it from inside a program (scary, too easy to reverse engineer) or from some other web service (less scary :-). It doesn't make much sense for Google to say, "Hey, world, come and use our search services for free without our ads."
SFS (was Re:Corrections, pointers, and cautions) (on "Understanding NFS", Score: 3, Informative)
Yup. SFS is still "developmentware," but it's the most stable developmentware you'll ever use; DM writes really solid code. I've been using it for more than a year to edit source code, listen to music, and generally access my school home directory from home (and from my laptop when I travel).
I haven't had any SFS problems for over 6 months, since 0.5i. But the notice is correct - your mileage may vary, and use with caution. I've seen SFS tickle bugs in the Linux NFS implementation, but the latest Linux NFS support is much improved over 2.2. On Open/FreeBSD, it's quite solid, IMHO.
For further info, browse the SFS-users mailing list. It's a good way to get a feel for the issues involved in running SFS.
(Obligatory disclosure: I'm not one of the developers, but my office is across the hall).
Corrections, pointers, and cautions (on "Understanding NFS", Score: 5, Informative)
A few things in the article deserve to be clarified. First, Lucas states that "One thing to note is that NFS uses the same usernames on each side of the connection." This is not accurate - NFS uses the same UIDs on both sides of the connection. If you don't have a unified UID space between your machines, you'll have... issues.
Second, if you export NFS to the world, you're insane and deserve what you get. If you want remote filesystem access, use a secure protocol like the Self-Certifying Filesystem (SFS). SFS also completely avoids the problem of having a shared UID space.
Finally, his advice to mount your filesystems intr is good, but insufficient - also mount them soft, so that filesystem calls will eventually time out if the server goes poof.
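Here's a small Python sketch of why the UID point matters. It doesn't talk to NFS at all; it just models two machines whose account tables disagree, with made-up usernames and UIDs, to show what happens when file ownership travels as a number rather than a name.

```python
# Hypothetical account tables for two machines (username -> UID).
# The names and numbers are invented purely for illustration.
server_passwd = {"alice": 1001, "bob": 1002}
client_passwd = {"bob": 1001, "alice": 1002}


def owner_seen_on(passwd: dict, uid: int) -> str:
    """Resolve a numeric UID to a local username, falling back to the number."""
    by_uid = {u: name for name, u in passwd.items()}
    return by_uid.get(uid, str(uid))


# alice creates a file on the server; only the numeric UID goes with it.
file_uid = server_passwd["alice"]

print(owner_seen_on(server_passwd, file_uid))  # owner as the server sees it
print(owner_seen_on(client_passwd, file_uid))  # owner as the client sees it
```

With these tables, the file alice created on the server shows up on the client as belonging to bob, which is exactly the kind of "issue" you get without a unified UID space.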
Please don't use the Google mirror. We've changed some of the links in the main page, and updated it a bit to point out things like the 173 megabyte download. If you use the Google mirror, it will actually hurt our servers more than it will help. Ironic, that.
Some comments from another student in Nick and Jaeyoun's group:
Sorry about the slashdotting. Small server configuration error that's been fixed now. Browse away.
Roboguard and friends were a class project; it wasn't DARPA or NSF funded, it was all for fun and a good grade.:)
Our research group does networks and mobile systems research for our day jobs...
The Cricket Project that was used in the "Mother" robot is part of our real research.
Much of the robotics research at MIT happens in the AI Lab, so if you're curious about robotics, browse over there and see the things that the Humanoid Robotics Group is doing. Very cool stuff.
Federal Rules of Civil Procedure, Rule 11
(b) Representations to Court.
By presenting to the court (whether by signing, filing, submitting, or later advocating) a pleading, written motion, or other paper, an attorney or unrepresented party is certifying that to the best of the person's knowledge, information, and belief, formed after an inquiry reasonable under the circumstances,--
(1) it is not being presented for any improper purpose, such as to harass or to cause unnecessary delay or needless increase in the cost of litigation;
(2) the claims, defenses, and other legal contentions therein are warranted by existing law or by a nonfrivolous argument for the extension, modification, or reversal of existing law or the establishment of new law;
(3) the allegations and other factual contentions have evidentiary support or, if specifically so identified, are likely to have evidentiary support after a reasonable opportunity for further investigation or discovery; and
(4) the denials of factual contentions are warranted on the evidence or, if specifically so identified, are reasonably based on a lack of information or belief.
People are too intimidated by lawsuits, and it's a crime that they let companies like DirecTV bully them into forking over a few grand. Of course, it's also pretty awful that defending themselves against this kind of thing would probably cost $10k+...
Close, but not quite. PlanetLab is not a closed, high performance network. Rather, it's more of an overlay testbed: the machines reside on the Internet (companies that host nodes) and on Internet2 (research universities). That's part of what's so cool about it - the machines reside all over the world (see the map on the PlanetLab website - it's an accurate reflection of the location of the nodes). They have a lot of visibility into nooks and crannies on the Internet, and they're beginning to be deployed widely enough that there's often a PlanetLab node nearby, wherever in the network you are.
But - that's only part of the goal. Ultimately, I believe that the goal of PlanetLab is to help transition these research technologies into deployed, useful services, so that the network becomes more than just a research platform: it becomes the next DNS infrastructure, or the next Akamai, or the next Napster (ok, ok, don't sue!).
So, some of the examples the article cited are pretty illustrative. For example, the MIT Chord project is a Distributed Hash Table. DHTs are peer-to-peer storage/retrieval systems that allow completely decentralized resource sharing between cooperating hosts. And so on, and so on. The hope of the PlanetLab folk is that some of these projects will become the foundation for the next Internet architecture, or internet middleware, or whatever it is you want to call it -- the next set of critical services that change the way we use the 'net.
But even before that, PlanetLab is one heck of a useful research tool. There are several papers at this year's Sigcomm conference (big computer networking conference) that took their measurements using PlanetLab. There are a number of other papers and projects in the pipeline that are using PlanetLab as their research testbed. The cool thing about PlanetLab is that it's now considerably larger than most prior testbeds, and has a lot more momentum for future growth. Full disclosure: I spend a part of my time working on PlanetLab, but this post is not any kind of official view, it's just my interpretation.
This reminds me of a paper that was just presented at USENIX:
Fast, Scalable Disk Imaging with Frisbee. Fun talk.
Pretty cool tricks - they use multicast and filesystem-specific compression techniques to load the disks in parallel on a subset of the machines in the cluster. Very very very fast. (I use the disk imaging part of their software to load images on my test machines at MIT, and I'm quite impressed.)
Anyway, just a bit of related cool stuff.
I'm from the same lab from which SFS comes, so I'm a bit biased, but I've been using it in a production setting for the last two years. My major use is to work from home and access my MIT filesystem remotely. I also maintain a network of ~40 machines distributed around the world, and I use SFS to provide access to centralized home directories on them. Very, very convenient. The software is stable, and the support is good. It works on *BSD and Linux. It also works on some versions of MacOS X, but may require an upgraded gcc on the latest (see the fs.net mail archives).
Highly recommend checking it out. Mega convenient.
I think that's fairly clear. There are many strong, good hackers in this world who wouldn't be able to work together. While it's unfortunate that Matt and the rest of -core weren't able to resolve it, it's a fact of life in a big project...
> probably deserves more recognition for this
> train of thought than the much more publicized Brooks.
Brooks teaches the Embodied Intelligence course at MIT (which I took two years ago). One of the first things the course covers is Braitenberg's creatures (see the syllabus). So while Brooks may get more air-time than Braitenberg, he certainly gives credit where credit is due. But then, remember that Braitenberg focused on astoundingly simple circuits that lead to interesting-appearing behavior, whereas Brooks has used his approach to build working autonomous robots...
> I wonder which DHT they'll use
The nice thing about DHTs is that the interface is nearly identical on all of the platforms: Given a key, find the associated object. (And insert, of course). Most of the DHT teams are already working together to create a common interface so that they can easily be evaluated against each other. It's likely that the higher-level results from IRIS will be DHT agnostic. Some of the lower-level things (like making the DHTs themselves more resilient) will probably be done using each group's own DHT.
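To make the "given a key, find the associated object (and insert)" interface concrete, here's a minimal single-process sketch. It is not Chord or any group's real code: the class name `ToyDHT`, the 32-bit ring, and the node names are all made up for illustration, and the "network" is just a dictionary per node. The only idea it borrows is hashing keys and nodes onto the same identifier ring and handing each key to the next node around the ring.

```python
import bisect
import hashlib


def ring_id(name, bits=32):
    """Hash a string onto a 2**bits identifier ring (SHA-1, truncated)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """Hypothetical stand-in for a DHT: just put(key, value) and get(key)."""

    def __init__(self, node_names):
        # Sorted node identifiers; each node gets a plain dict as its storage.
        self._ids = sorted(ring_id(n) for n in node_names)
        self._store = {node_id: {} for node_id in self._ids}

    def _owner(self, key):
        # The owner is the first node at or after the key's position,
        # wrapping around to the smallest identifier past the end of the ring.
        idx = bisect.bisect_left(self._ids, ring_id(key))
        return self._ids[idx % len(self._ids)]

    def put(self, key, value):
        self._store[self._owner(key)][key] = value

    def get(self, key):
        return self._store[self._owner(key)].get(key)


# Callers never say which node holds the data; they only name the key.
dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("paper.pdf", "contents of the paper")
print(dht.get("paper.pdf"))
```

Anything written against `put`/`get` doesn't care how the lookup is routed underneath, which is roughly why higher-level work can stay DHT agnostic.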
(Disclaimer: While I work in one of the groups that's participating in iris, these are only my guesses, not any kind of official word).
Much of this has to do with CS researchers forcing the conference publishers to allow distribution of papers via personal webpages. Once you have that, the rest follows.
But in fairness, Nature is only $160/year ($100 students), which covers 52 issues. Of course, you have to put up with advertising and pay a subscription...
As a safeguard against this, my boxes NTP peer with a subset of each other, and each box peers with at least one external, nearby stratum-1 timeserver. It's a fairly robust setup; overall, there are about 15 CDMA time receivers, 3 GPS receivers, and 13 external stratum-1 servers involved. We're susceptible to GPS problems because of the large GPS-derived presence in our network, though three of the sites do peer with NIST atomic clocks. But that's not too big a worry. No individual clock failure will hurt things much, except rendering the attached box useless.
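A rough sketch of why peering with several independent sources helps: if most of them agree, the one that's gone off into the weeds is easy to spot and ignore. This is a deliberately simplified illustration, not the selection algorithm ntpd actually runs; the function name, the source names, and the 10 ms tolerance are all assumptions made up for the example.

```python
import statistics


def combine_offsets(offsets_ms, tolerance_ms=10.0):
    """Average the offsets that agree with the median; report the rest.

    offsets_ms maps a (hypothetical) source name to its measured clock
    offset in milliseconds. Returns (estimate_ms, rejected_source_names).
    """
    if not offsets_ms:
        raise ValueError("need at least one time source")
    median = statistics.median(offsets_ms.values())
    agreeing = {
        name: off
        for name, off in offsets_ms.items()
        if abs(off - median) <= tolerance_ms
    }
    if not agreeing:
        raise ValueError("sources disagree too much to pick a winner")
    rejected = sorted(set(offsets_ms) - set(agreeing))
    estimate = sum(agreeing.values()) / len(agreeing)
    return estimate, rejected


# One broken receiver among several healthy sources gets voted out.
measured = {
    "cdma-1": 0.4,
    "cdma-2": -0.2,
    "gps-1": 0.1,
    "stratum1-a": 0.3,
    "broken-clock": 850.0,
}
estimate, rejected = combine_offsets(measured)
print(estimate, rejected)
```

The flip side is the GPS worry above: if many sources share one upstream and fail together, they can outvote the good ones, so the mix of source types matters.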
For a few of the European hosts, we use GPS time receivers, primarily the Motorola Oncore UT+ kits. You can get eval units of these; google around. They're nearly as easy to use, but do require a kernel config change.
It's really kind of addictive playing with time. :-) And you get spoiled by never having any clock weirdness on any of your machines...
Not really - consider SGI's servers, for instance. The Origin 3800 can handle 1 TB of RAM -- but it's a CC-NUMA machine, meaning you have to go through an intermediate router (don't think Internet; much faster) to get to the memory. SGI machines have a limit of 8GB per processor "brick", and their bricks interconnect at 1.6 or 1.2GB/s.
Then consider the SunFire 15K - it's an SMP machine; processors fit on boards that can contain up to 32GB of RAM; after that, you have to go off-board through a switch to get to other memory. Each system board has about 9.6GB/s of offboard memory access speed.
In short, Cray isn't tooting needlessly - this is impressive bandwidth to the memory. Latency is probably fairly high on it, but for streaming vast quantities of data in and out of local storage, it's probably amazingly nice.
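For a sense of scale, here's a back-of-the-envelope calculation using only the SGI and Sun figures quoted above: how long it would take to stream one brick's or board's worth of RAM across its off-board link. It assumes a single link sustaining the quoted rate the whole time and ignores latency and contention, so treat the results as rough lower bounds. The function name is made up for the example.

```python
def drain_seconds(memory_gb, link_gb_per_s):
    """Seconds to move memory_gb across a link sustaining link_gb_per_s."""
    return memory_gb / link_gb_per_s


# (local memory in GB, off-board bandwidth in GB/s), as quoted in the post.
figures = {
    "SGI brick, 1.6 GB/s link": (8, 1.6),
    "SGI brick, 1.2 GB/s link": (8, 1.2),
    "SunFire 15K system board": (32, 9.6),
}

for label, (memory_gb, rate) in figures.items():
    print(f"{label}: {drain_seconds(memory_gb, rate):.1f} s")
```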
com.google.soap.search.GoogleSearchFault: Invalid authorization key: xxxxxxxxxxxxxxx
at com.google.soap.search.QueryLimits.lookUpAndLoadFromINSIfNeedBe(Query
...
Alas, looks like the rest of us won't be able to play with Google's beta SOAP service. Which makes quite a bit of sense - this would be a great way for Google to allow people to resell Google in a standardized way, be it from inside a program (scary, too easy to reverse engineer) or from some other web service (less scary :-). It doesn't make much sense for Google to say, "Hey, world, come and use our search services for free without our ads."
I haven't had any SFS problems for over 6 months, since 0.5i. But the notice is correct - your mileage may vary, and use with caution. I've seen SFS tickle bugs in the Linux NFS implementation, but the latest Linux NFS support is much improved over 2.2. On Open/FreeBSD, it's quite solid, IMHO.
For further info, browse the SFS-users mailing list. It's a good way to get a feel for the issues involved in running SFS.
(Obligatory disclosure: I'm not one of the developers, but my office is across the hall).
Second, if you export NFS to the world, you're insane and deserve what you get. If you want remote filesystem access, use a secure protocol like the Self-Certifying File System (SFS). SFS also completely avoids the problem of having a shared UID space.
Finally, his advice to mount your filesystems intr is good. But insufficient - also mount them soft, so that filesystem calls will eventually time out if the server goes poof.
Please don't use the google mirror.
We've changed some of the links in the
main page, and updated it a bit to
point out things like the 173 megabyte
download. If you use the google mirror,
it will actually hurt our servers more
than it will help. Ironic, that.
-Dave
- Sorry about the slashdotting. Small server configuration error that's been fixed now. Browse away.
- Roboguard and friends were a class project; it wasn't DARPA or NSF funded, it was all for fun and a good grade. :) Our research group does networks and mobile systems research for our day jobs...
- The Cricket Project that was used in the "Mother" robot is part of our real research.
- Much of the robotics research at MIT happens in the AI Lab, so if you're curious about robotics, browse over there and see the things that the Humanoid Robotics Group is doing. Very cool stuff.
(You may, if you'd like, verify that I exist: http://nms.lcs.mit.edu/~dga, not that it's a particularly exciting page.)
-Dave