Internet2 is a consortium that maintains a high-speed backbone across the US; the costs are subsidized by the government so universities can communicate with each other at 10 Gbps rates without having to go out to the commercial Internet. A small portion of the funding goes to middleware projects.
Most NSF-funded networking projects use I2 as their testbed, but they're not necessarily part of I2. For example, GENI, the US effort to redesign Internet protocols from the ground up, will run in parallel with I2. GENI is the US counterpart to this Japanese effort (although it's hard to tell from the light-on-details article).
How is lower-quality broadband out in the middle of Bumfuck, Iowa, hurting the American economy? Because it means that those living in Bumfuck, Iowa can't participate in the American economy?
A salary of $60k in Iowa is equivalent to $100k in California. $60k/year will buy you a nice family house, a decent car, and an easy-going lifestyle. If the national telecom infrastructure were up to date, many jobs that are now done in California could be done in the middle of Iowa. Alternatively, for a bit more than the salary you pay an Indian programmer (well, a bit more than the ones now demanding more money...), you can get a native English speaker *in a nearby timezone*.
With low-quality or no broadband, you lose this potential workforce.
Or, at least, so goes the theory.
Good points!
NSF does provide seed money (i.e., smaller grants to get infrastructure started). However, it needs to find a better way to provide sustainable funding. Nothing's worse than seeing a seed grant help a site get started, only to have the money dry up 2 years later (or force the poor PI to become a full-time grant writer).
I'm not saying we need to send $50 million to Nowhereville, Arkansas. However, it'd be nice to see a little more invested in infrastructure out there...
Coming from a school which isn't on either coast, I think it's great that NSF money is being spread around a little bit.
While I'm not questioning the excellence of the existing facilities, there are certainly faculty elsewhere who might be more cost-effective with their money. A $2M grant falls into the "small fish" category in some places but might be huge at a smaller school. Further, I personally believe it's less expensive to get a grant off the ground in the Midwest. $60k is a great wage in the area where I live; taking the cost of living into account, it is equivalent to $100k on the coast. The NSF's money can simply go farther.
I'm not faulting anyone here (the big places are top-notch, and NSF projects usually do show a good return on investment), but it'd be nice to see the money go to other places.
I am sucking it up, but you made a few mistakes.
First of all, you assume that I'm taking engineering courses. I'm not. If I were taking engineering courses, it would bother me less. I'm bothered by the $40 / hour fee for theory courses and thesis hours in Computer SCIENCE.
Further, in grad school, all your courses are CSE courses. Traditionally, the department agrees to pay for your tuition in order to entice you to come (my undergrad professors told me that if they don't offer to pay for your tuition plus a stipend, a CS department isn't seriously giving you an offer). So, student fees effectively reduce the standard stipend amount by 10%.
If you reduce an already underwhelming amount (compared to industry) by 10%, it seriously affects the department's ability to recruit new students. I sure wouldn't have come to the department if they had told me the college would be skimming 10% off my pay.
Tuition is also going up, but not fast enough to cover costs. The state continues to increase its funding of the university, but more slowly than inflation, so the burden falls on students through federal student loans. Where the state used to cover most of the university's costs, the feds now supply the largest chunk of the pie through loans.
Again, the department attracts students by paying their tuition. The college gets around this by calling the charge a "fee", meaning it's cold cash from my pocket. I realize the value of a good education, so I pay it, but the question is how many prospective students go to Kansas/Missouri/Iowa because they don't want to cough up $1000 / semester in cash to the university while on an "all tuition paid" scholarship.
Trust me, I've discussed where the fees are going. All of it goes directly into the Engineering college's general coffers. Of this, a certain (smaller) percentage goes back to the CS department's budget.
The CS department is headed by a very logical guy. He took the department's portion of the money and raised the TA/RA salary by $50 a month, which covers about 50% of the fees. It's quite literally the best he could do - he had no say in the levying of the fee.
So, basically, the money changes hands a couple of times, most of it ends up in the college's accounts, and there's no overall benefit to the CS department. Sucks for us, eh?
The fees took the CS department by surprise; the professors were just as mad as the students. They knew the fees would hinder their ability to attract good students without really benefiting the department.
I'm enrolled at the University of Nebraska-Lincoln, and those engineering fees SUCK.
For example, the Computer Science and Engineering department is 40% engineering, so every 3-hour class I take has a $150 fee attached to it. What if it is a 3-hour Computing Theory course? $150 extra. What if it is thesis hours? $150 extra. All because engineering courses "cost more", even when, as with thesis hours, there is no classroom.
What's worse is that this is a FEE, not additional tuition, so graduate students can't have it covered by their scholarships. The all-tuition-paid scholarship doesn't quite mean the same thing at UNL if you have to pay $1000 PER SEMESTER in fees. The involved departments have a harder time attracting top-quality talent because of this. They are quite literally focused on the short-term cash gain rather than the long-term effects on the college.
There are other, indirect effects. Bio, chem, and physics students used to take computational courses to learn the basics of cluster computing. This resulted in long-lasting collaborations between these departments: computational scientists worked out better algorithms for the physicists, and the physicists got better results. Grad students no longer take these classes, so they are either at a disadvantage or simply ignore the computational side of their subjects.
It's lose-lose-lose for the students, professors, and departments involved. The university, however, makes a bit more money.
(Not only do these fees specifically piss me off, but they also decided to "surprise" the students with them. I mean, the plans were put out for anyone to read. In a cellar. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'.)
So, 30 racks per month... for a 15-year project. Say you only buy the first 5 years' worth of disks: a simple 1800 racks.
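The 1800 figure is just 30 racks/month × 12 months × 5 years. A trivial sketch of that arithmetic, assuming (as the comment does) a constant purchase rate; the 15-year total is included only for comparison:

```python
# Back-of-the-envelope rack count using the figures quoted above.
# Assumes a flat purchase rate; improving disk density would lower later numbers.
RACKS_PER_MONTH = 30
MONTHS_PER_YEAR = 12


def racks_needed(racks_per_month: int, years: int) -> int:
    """Total racks bought if capacity is added at a constant monthly rate."""
    return racks_per_month * MONTHS_PER_YEAR * years


first_five_years = racks_needed(RACKS_PER_MONTH, 5)
whole_project = racks_needed(RACKS_PER_MONTH, 15)
print(first_five_years, whole_project)  # 1800 5400
```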
The LHC went with a tape-based, distributed storage system. Seven T1 sites around the world keep the data on tape (one copy at CERN, another copy at a T1 site). The T1 sites reconstruct the raw data and write the reconstructed ("reco") data to disk. They then distribute the reco data to a T2 site, which has a large amount of disk-only space (like you suggest). The individual physicist does the analysis at the T2 site.
It still gets worse:
From the section on whose assets can be frozen:
"""
or to have acted or purported to act for or on behalf of, directly or indirectly,
"""
So, if someone accuses you of doing this (she's a witch!), they can freeze your assets. Forget being able to face your accuser, presumed innocence, fair trial, etc. I thought we left Salem a long time ago.
So, what happens after they freeze your assets because your neighbors said they saw you at a communist, err... terrorist, meeting?
"""
Sec. 8. This order is not intended to, and does not, create any right, benefit, or privilege, substantive or procedural, enforceable at law or in equity by any party against the United States, its departments, agencies, instrumentalities, or entities, its officers or employees, or any other person.
"""
In other words, if we screw up in freezing the assets, we don't give you the right to file a lawsuit or any procedure to get your things back.
Lovely.
Uh, I don't think you parsed that English correctly; that's not what the sentence says.
...
Let me cut out some of the extra fluff. The first sentence says:
"I find that, due to unusual threats posed by violence in Iraq and efforts undermining economic reconstruction in Iraq, it's in US interest to take additional steps. I hereby order..."
The first paragraph is just an introduction. It says that the point of the Executive Order is to hurt those who are trying to hurt Iraq; that has no legally binding meaning, except as a justification for why it's being done.
I also don't have a problem with the gov't blocking bank accounts of terrorists. They already do this. Part (i) of Section 1 goes after people who are doing the terrorism. Part (iii) is the interesting one:
"""
(iii) to be owned or controlled by, or to have acted or purported to act for or on behalf of, directly or indirectly, any person whose property and interests in property are blocked pursuant to this order.
(b) The prohibitions in subsection (a) of this section include, but are not limited to, (i) the making of any contribution or provision of funds, goods, or services by, to, or for the benefit of any person whose property and interests in property are blocked pursuant to this order...
"""
See what happens there? They aren't referring to terrorists; they are referring to people who may be indirectly linked to terrorists. That's where the privacy-rights people get up in arms. If I buy oil from the Saudis, and the Saudis donate the money to a charity which turns it over to terrorists, do I "indirectly" help them out? Who gets to define what "indirect" means? If it's the executive branch, it isn't a jury of your peers...
Ambiguity like this covers a wide swath of activities. I'm not claiming something crazy like they are going to start arresting people for buying gas, but it's not hard to read this order as "we now have the power to arbitrarily arrest people, but we only plan to apply it to terrorists."
I wish the text said the power "CAN only" be used against terrorism. Instead, the new power is claimed to encompass some ill-defined "indirect contributors" group, and a press release said it "WILL only" be used against terrorism. The latter depends on you trusting the government to keep its word, which doesn't always seem to happen.
Correct Usage: Human writes SQL schema, then integrates the ORM. As long as human doesn't do silly things in the programming language, we have success. Database objects become much easier to use, and performance is good.
Incorrect Usage: Human writes an object spec. ORM auto-generates SQL schema. Human blindly uses machine-generated ORM bindings without understanding underlying SQL. Database gets mildly large, then human complains "the stupid thing is slow".
If ORM is used for convenience, great! That's what I use it for. If ORM is used in lieu of understanding how SQL works, you could be headed for trouble.
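A minimal sketch of the "correct usage" ordering, using Python's built-in sqlite3 rather than any particular ORM. The tables, the index, and the `PostRepository` class are hypothetical; the point is only that the schema (including the index the queries rely on) is written by hand first, and the object layer is mapped onto it afterwards:

```python
import sqlite3
from dataclasses import dataclass
from typing import List

# Step 1: a human writes the schema, including the index the queries will need.
SCHEMA = """
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE post (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id),
    title     TEXT NOT NULL
);
CREATE INDEX post_author_idx ON post(author_id);
"""


# Step 2: the object layer is mapped onto that schema, not the other way around.
@dataclass
class Post:
    id: int
    author_id: int
    title: str


class PostRepository:
    """Hypothetical thin mapper; an ORM would normally generate this plumbing."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def add(self, author_id: int, title: str) -> Post:
        cur = self.conn.execute(
            "INSERT INTO post (author_id, title) VALUES (?, ?)", (author_id, title)
        )
        return Post(cur.lastrowid, author_id, title)

    def by_author(self, author_id: int) -> List[Post]:
        # This WHERE clause is served by post_author_idx from the schema above.
        rows = self.conn.execute(
            "SELECT id, author_id, title FROM post WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        return [Post(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO author (id, name) VALUES (1, 'alice')")
repo = PostRepository(conn)
repo.add(1, "ORM is great")
repo.add(1, "...until it isn't")
print([p.title for p in repo.by_author(1)])
```

A real ORM would generate the `PostRepository` part for you; the part worth keeping under human control is the `SCHEMA` string.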
ORM is great!... until you have a couple hundred thousand rows. Then it's slow... until you have a couple million rows. Then it's unbearable.
I love ORM for smaller applications, but there's always a point where heading down the hall to say "hi" to the local DBA is a good idea. And beware: moving the DB from the ORM's schema to your own can be extremely painful. How close the ORM's schema is to "pleasant" depends heavily on the package you use.
This is from someone who is trying to perform queries on someone else's database designed with Hibernate. One that has 12 million rows (average row size, 9KB). Which has been running my simple query for 40 minutes.
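One common culprit when an ORM-backed query crawls on millions of rows is a WHERE clause on a column with no index, which forces a full table scan. That is an assumption here; the comment doesn't say what its 40-minute query was doing. A quick way to check, sketched with SQLite's `EXPLAIN QUERY PLAN` (other databases, including whatever sits under Hibernate, have their own `EXPLAIN`); the `event` table is hypothetical:

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail strings, joined by '; '."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable description.
    return "; ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, run INTEGER, payload BLOB)")

query = "SELECT id FROM event WHERE run = 42"
before = query_plan(conn, query)  # no index on run: every row must be scanned

conn.execute("CREATE INDEX event_run_idx ON event(run)")
after = query_plan(conn, query)   # the same query can now search via the index

print(before)
print(after)
```

The exact wording varies between SQLite versions (e.g. `SCAN event` vs. `SCAN TABLE event`), so look for SCAN versus SEARCH ... USING INDEX rather than exact strings.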
How about something that scales up to handle the data needs of the largest scientific experiment known to man? dCache and Castor (both distributed storage services) run the SRM protocol, which is SOAP-based. Both scale to many petabytes.
Now, both are also somewhat buggy and unreliable, as they are funded on shoestring budgets by national labs. But that's moot: the point is that this has already been done.
I know some prior art which would strike some of these claims.
Distributed storage system: dCache, http://www.dcache.org/. This allows one to configure storage as multiple commodity storage servers at one site (up to several petabytes) or distributed over several sites (as used by NorduGrid).
One of the many protocols dCache supports is SRM (Storage Resource Manager), a SOAP-based web-services protocol that lets you perform your usual copy/delete/ls operations. It's designed as a generic protocol that several distributed systems implement.
SRM v2.2 (whose published spec predates the patent application, even if the big implementations may not) also has the concept of "Storage Classes", which let the user specify how a file should be stored (temporary with lifetime X, on tape, on disk, on disk with multiple replicas, etc.).
Plus, Globus has been doing web-services storage for years, though not within a single "distributed" system, unless you count the grid as the system. Finally, the SRB from UCSD also implements a lot of these claims.
If the patent examiner is competent, many of these claims will be struck down and the patent will be refiled with a much narrower scope. Hopefully he's reading this very article!
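To make the "generic protocol plus storage classes" idea concrete, here is a hypothetical Python interface: one abstract copy/delete/ls API that several backends could implement, plus a storage-class value describing how a file should be kept. This is not SRM's actual WSDL, method names, or storage-class encoding; `StorageResource`, `StorageClass`, and the in-memory backend are all invented for illustration:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Medium(Enum):
    DISK = "disk"
    TAPE = "tape"


@dataclass(frozen=True)
class StorageClass:
    """Hypothetical 'how to store this file' request (not SRM's real encoding)."""
    medium: Medium
    replicas: int = 1
    lifetime_s: Optional[int] = None  # None = permanent; a number = temporary copy


class StorageResource(ABC):
    """Generic interface; each distributed storage system supplies its own backend."""

    @abstractmethod
    def ls(self, prefix: str) -> List[str]: ...

    @abstractmethod
    def copy(self, src: str, dst: str, cls: StorageClass) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...


class InMemoryStorage(StorageResource):
    """Toy backend: records storage classes but does not enforce them."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.classes: Dict[str, StorageClass] = {}

    def put(self, path: str, data: bytes, cls: StorageClass) -> None:
        self.files[path] = data
        self.classes[path] = cls

    def ls(self, prefix: str) -> List[str]:
        return sorted(p for p in self.files if p.startswith(prefix))

    def copy(self, src: str, dst: str, cls: StorageClass) -> None:
        self.files[dst] = self.files[src]
        self.classes[dst] = cls

    def delete(self, path: str) -> None:
        self.files.pop(path, None)
        self.classes.pop(path, None)


store = InMemoryStorage()
store.put("/store/raw/run1.dat", b"raw", StorageClass(Medium.TAPE))
# Temporary disk copy with a one-day lifetime, e.g. staged for analysis.
store.copy("/store/raw/run1.dat", "/store/scratch/run1.dat",
           StorageClass(Medium.DISK, lifetime_s=86_400))
print(store.ls("/store"))  # ['/store/raw/run1.dat', '/store/scratch/run1.dat']
```

The design point is that clients only see `StorageResource`; whether a given backend is disk-only, tape-backed, or replicated is expressed through the requested `StorageClass`.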
One of the interesting parts of the LGPL that I found out about recently is Section 6. Someone else has linked the text already in this article, so I'll spare you the details.
Basically, it says that the user must be able to replace the current LGPL code with his/her own version and the device must still be operational. So, Apple doesn't need to release Safari's source code (beyond WebKit), but it *does* have to give you the opportunity to replace the WebKit pieces with your custom patches and have them work on the iPhone.
Or so goes one reading of the license. I'm sure Apple's legal dep't has a different reading.
At any rate, it's a huge step backward for a company that touts the power of open source in its OS on its webpage. It's a huge step backward for the company that recently made a tiny step forward by selling DRM-free AAC files from its online store.
In the off chance that Steve Jobs reads this, I *did* buy my Macs almost solely because of the Unix/Open Source underpinnings. Having a closed iPhone is not all that exciting.
It works fine, but we actually tend to lean toward many streams as opposed to uber-fast single streams.
Truthfully, you have to tweak the system pretty hard to get decent performance over a single stream (for us, 155 Mbps isn't sufficient; I work on an LHC project), especially from Nebraska to Switzerland (CERN). FAST TCP helps out a whole lot. GridFTP is the other piece of the equation: it is basically FTP with multiple data streams.
We tend to lean on hundreds of streams a whole lot more than on tweaking TCP settings, and the Caltech guys give us heck for that. They're right, however: if you're getting 100s of KBps per stream to some European site, it just takes a ridiculous number of streams to get up to 100 MBps. Right now, the storage systems are behind the network, so we haven't even been able to start playing with FAST TCP yet.
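The "ridiculous number of streams" follows from the window limit: a TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput is at most window / RTT. The sketch below assumes a 64 KiB window (no window scaling) and a 120 ms round trip; both numbers are illustrative assumptions, not measurements from the comment:

```python
# Window-limited TCP throughput: at most one window of data per round trip,
# so throughput <= window / RTT. WINDOW_BYTES and RTT_MS are assumptions.

WINDOW_BYTES = 64 * 1024  # classic 64 KiB window without window scaling
RTT_MS = 120              # assumed round-trip time, US Midwest <-> CERN


def max_stream_rate(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on one stream's throughput, in bytes per second."""
    return window_bytes * 1000 / rtt_ms


def streams_needed(target_bytes_per_s: int, window_bytes: int, rtt_ms: int) -> int:
    """Parallel streams needed to reach the target, rounded up (integer math)."""
    return -(-target_bytes_per_s * rtt_ms // (window_bytes * 1000))


def bdp_bytes(link_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: the window one stream needs to fill the link."""
    return link_bps * rtt_ms // (8 * 1000)


print(round(max_stream_rate(WINDOW_BYTES, RTT_MS)))       # 546133 (bytes/s)
print(streams_needed(100_000_000, WINDOW_BYTES, RTT_MS))  # 184
print(bdp_bytes(10_000_000_000, RTT_MS))                  # 150000000
```

Under those assumptions a single stream tops out around 546 KB/s, about 184 streams are needed for 100 MB/s, and one stream filling a 10 Gbps path would need a roughly 150 MB window. That is the trade-off in the comment: many streams, or a heavily tuned TCP stack.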
In fact, we often talk with the Caltech folk about deploying FAST TCP; the problem is that both ends need to deploy the kernel patches. Truthfully, the limiting factor becomes the disk systems, not the network. When we start to push closer to 10 Gbps instead of 4-6 Gbps, we'll need to make smarter decisions about the TCP stacks.
I was referring to the Caltech efforts, not the commercialized device; I don't know anything about the Aria products. For example, see the link below:
http://ultralight.caltech.edu/web-site/sc05/html/index.html
This is based on a patched Linux 2.6 kernel, and it's a couple of years old.
Think again. I suspect that you have only tried that on a low-speed link (DSL, cable, FiOS, etc.). Try thinking about 2 orders of magnitude faster.
I transfer about 20 TB / day at work, and that wouldn't be possible with a "typical FTP connection".
If you read the papers coming out of Caltech, you'd see they were optimizing for 10 Gbps lines, not residential lines. 15-20x faster is a very fair estimate; look at Caltech's presentations at SC05 or SC07.
You're thinking way too small. FAST TCP was designed with 10 Gbps links in mind, i.e., Internet2-type applications. FAST TCP streams are able to achieve several hundred Mbps. FTP streams over TCP Reno usually max out at something relatively pathetic, like 10-20 Mbps.
Caltech's SC07 presentation showed commodity servers that could transfer 2 Gbps end-to-end using their FDT tool (Java-based, actually). The servers had 4 HDDs and dual Gigabit Ethernet connections, and ran a Linux 2.6 kernel with the FAST TCP patches.
On the other hand, GridFTP takes a different approach: parallelizing several TCP streams at a time. Why get a single stream going 10x faster using a special Linux kernel when you can just send 10 parallel TCP streams at once? (GridFTP over standard TCP with lots of streams is more popular than GridFTP over FAST TCP with a small number of streams, but imagine what happens to your RAID array when a large number (>100) of streams are reading data off different parts of the same disk...)
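The parallel-stream idea can be sketched in a few lines: cut the file into offset-tagged blocks, deal them round-robin across N streams, and have the receiver write each block at its offset, so arrival order doesn't matter. This is a hypothetical illustration of the concept only, not GridFTP's actual protocol or code; `stripe` and `reassemble` are invented names:

```python
from typing import List, Tuple

Block = Tuple[int, bytes]  # (byte offset in the file, block contents)


def stripe(data: bytes, n_streams: int, block_size: int) -> List[List[Block]]:
    """Deal offset-tagged blocks round-robin across n_streams 'streams'."""
    streams: List[List[Block]] = [[] for _ in range(n_streams)]
    for i, offset in enumerate(range(0, len(data), block_size)):
        streams[i % n_streams].append((offset, data[offset:offset + block_size]))
    return streams


def reassemble(streams: List[List[Block]], total_size: int) -> bytes:
    """Write every block at its offset; the order blocks arrive in is irrelevant."""
    out = bytearray(total_size)
    for stream in streams:
        for offset, block in stream:
            out[offset:offset + len(block)] = block
    return bytes(out)


payload = bytes(range(256)) * 40  # 10,240 bytes of test data
parts = stripe(payload, n_streams=10, block_size=1000)
print([len(s) for s in parts])    # [2, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(reassemble(parts, len(payload)) == payload)  # True
```

Each stream's blocks sit at widely separated offsets; on the disk side, that scattered access pattern is what the RAID remark worries about once the stream count passes 100.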
Traditional TCP streams (such as what you get with FTP) top out around 10-20 Mbps. If you want to see a single stream go a couple hundred Mbps, you need TCP tweaks like FAST (though FAST is only one of many competing TCP "fixes").
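The 10-20 Mbps ceiling for untuned, loss-driven (Reno-style) TCP on long paths is consistent with the well-known Mathis et al. approximation for steady-state TCP throughput, where p is the packet-loss rate. The values plugged in below (1460-byte MSS, 100 ms RTT, 0.01% loss) are illustrative assumptions, not measurements:

$$
\mathrm{BW} \approx \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \frac{C}{\sqrt{p}}, \qquad C = \sqrt{3/2} \approx 1.22
$$

$$
\mathrm{BW} \approx \frac{1460 \times 8\ \text{bit}}{0.1\ \text{s}} \cdot \frac{1.22}{\sqrt{10^{-4}}} = 116{,}800\ \text{bit/s} \times 122 \approx 14\ \text{Mbit/s}
$$

Because throughput falls off as 1/RTT and 1/√p, even rare losses on a transatlantic path cap a single loss-driven stream. FAST TCP is delay-based rather than loss-driven, which is part of why it can do better.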
Anyone know what the first file sent across it was? I'd imagine either pics of Xena or a Stargate: Atlantis episode.
Hi,
I transferred some of the first files over this network. It was Monte Carlo physics data produced for the CMS project. It took us about 20 minutes to go from 0 to 6.5 Gbps (we have a 10Gbps link to I2).
P2P is not a big application on I2. Simply put, clients like BitTorrent don't scale well for individual transfers, and there aren't enough transfers to aggregate into an impressive number.
I expect at least 100 Mbps per file using our transfer tools, then transfer many files at once.
I'm of the same thought: the other guy's missile defense system is about as valid a military target as possible.
The missile-targeting issue is nearly moot: it only takes, what, a couple of minutes to re-aim the missiles in the ground?
The larger issue is that Russia feels threatened by US armaments in its "backyard" for no apparent reason, just like we would feel threatened if Russia put an ABM system in Canada to "fight off the Mexicans". They don't buy Bush's line of "we're doing this to fight the terrorists". Sometimes it would be nice if politicians used Newspeak - "we view the ABM system as plusungood" instead of veiled threats. We need to work with these people, not stir up the anthill.
That said, I don't really care for the crackdown on liberties in Russia.
If your numbers are right, the Roman Catholic Church is a nice second place compared to the Mormon church:
http://en.wikipedia.org/wiki/Finances_of_The_Church_of_Jesus_Christ_of_Latter-day_Saints
In 1997 (ten years ago; they've grown significantly since then), they were estimated to have $30 billion in assets and an annual income of $6 billion.
The Internet 2 is a consortium maintains a high-speed backbone across the US; the costs are subsidized by the government so universities can communicate each other at 10Gbps rates without having to go out to the commercial Internet. A small portion of the funding goes to some middleware projects.
However, most NSF-funded networking projects use the I2 as their testbed, but they're not necessarily a part of the I2. For example, GENI - the US effort to redesign internet protocols from the ground up - will run in parallel with I2. GENI is the US counterpart to this Japanese effort (although it's hard to tell from the light-on-details article).
A salary of $60k in Iowa is equivalent to $100k in California. $60k/year will buy you a nice family house, decent car, and a easy-going lifestyle. If the national telecom infrastructure was up to date, there would be many jobs that can be done in the middle of Iowa that are now done in California. Alternately, for a bit more than the salary you pay to an Indian programmer (well, a bit more than those who now are demanding more money...), you can get a native English speaker *in a nearby timezone*.
With low-quality or no broadband, you lose this potential workforce.
Or, at least, so goes the theory.
Good points!
NSF does do seed money (i.e., smaller grants to get infrastructure started). However, they need to find a way to do sustainable funding in a better manner. Nothing's worse to see a seed grant help a site get started, only to have it dry up 2 years later (or, forcing the poor PI to become a full-time grant writer).
I'm not saying we need to send $50 million to Nowhereville, Arkansas. However, it'd be nice to see a little more invested in infrastructure out there...
Coming from a school which isn't on either coast, I think it's great that NSF money is being spread around a little bit.
While I'm not questioning the excellence of the existing facilities, there's certainly faculty outside of those who might be able to be more cost-effective with their money. A $2M grant falls into the "small fish" category in some places, while a $2M grant might be huge at smaller schools. Further, I personally believe the cost of getting a grant off the ground is less expensive in the Midwest. $60k is a great wage in the area where I live; taking into account the cost of living, it is equivalent to $100k on the coast. The NSF's money can simply be spread around farther.
I'm not faulting anyone here (the big places are top-notch, NSF projects usually do show good return on investment), but it'd be nice to see the money go to other places.
I am sucking it up, but you made a few mistakes.
First of all, you assume that I'm taking any engineering courses. I'm not. If I was taking engineering courses, it would bother me less. I'm bothered by the $40 / hour fee for theory courses and thesis hours in Computer SCIENCE.
Further, in grad school, all your courses are CSE courses. Traditionally, the department agrees to pay for your tuition in order to entice you to come (my undergrad professors told me that if they don't offer to pay for your tuition plus a stipend, a CS department isn't seriously giving you an offer). So, student fees effectively reduce the standard stipend amount by 10%.
If you reduce an already underwhelming amount by 10% (compared to industry), it seriously effects the department's ability to recruit new students. I sure wouldn't have come to the department if they told me the college would be skimming off 10% of my pay.
The tuition is also going up - but not at enough of a rate to cover costs. The state continues to increase its funding of the university at a rate smaller than inflation, so the burden of the costs falls upon students through federal student loans. Where the state used to cover most of the university's costs, the feds supply the largest chunk of the pie through loans.
Again, the department attracts students by paying their tuition. The college gets around this by calling it a "fee", meaning it's cold cash from my pocket. I realize the value of a good education so I pay it, but the question is how many prospective students go to Kansas/Missouri/Iowa because they don't want to cough up $1000 / semester in cash to the university on an "all tuition paid" scholarship.
Trust me, I've discussed where the fees are going. All of it goes directly to the Engineering college's general coffer. Of this, a certain (smaller) percentage goes back to the CS department's budget.
The CS department is headed by a very logical guy. He took the department's portion of the money and raised the TA/RA salary by $50 a month, which covers about 50% of the fees. It's quite literally the best he could do - he had no say in the levying of the fee.
So, basically the money changes hands a couple of times, most of it ending up in the college's accounts, and no overall benefit to the CS department. Sucks for us, eh?
The fees took the CS department by surprise - the professors were just as mad as the students. They knew it would hinder their ability to attract good students and not really benefit them.
I'm enrolled at the University of Nebraska-Lincoln, and those engineering fees SUCK.
For example, the Computer Science and Engineering is 40% engineering, so every 3 hour class I take has a $150 fee attached to it. What if it is a 3 hour Computing Theory course? $150 extra. What if it is thesis hours? $150 extra. All because engineering courses "cost more". Even if, like thesis hours, there is no classroom.
What's worse is that this is FEE, not additional tuition. So, graduate students can't get them paid by their scholarships. The all-tuition-paid scholarship doesn't quite mean the same thing at UNL if you have to pay $1000 PER SEMESTER in fees. The involved departments have a harder time attracting top quality talent because of this. They are quite literally focused on the short term cash gain rather than the long term effects on the college.
There are other, indirect effects. Bio, chem, and physics students used to take computational courses to learn the basics of clustered computing. This resulted in long-lasting collaborations between these departments. Computational scientists worked out better algorithms for the physicists, and the physicists got better results. The grad students no longer take these classes, meaning that they are at a disadvantage - or just ignore the computational side of their subjects.
It's lose-lose-lose for the students, professors, and departments involved. The university, however, makes a bit more money.
(Not only do these fees specifically piss me off, they decided to "surprise" the students with them. I mean, the plans were put out for anyone to read. In a cellar. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'.)
So, 30 racks per month ... for a 15 year project. Say you only buy the first 5 years worth of disks - a simple 1800 racks.
The LHC went with a tape-based, distributed storage system. Seven T1 sites around the world keep the data on tape (one copy at CERN, another copy at a T1 site). They do reconstruction of the raw data, and write the reconstructed data on disk. They then distribute the reco data to a T2 site, which has a large amount of disk-only space (like you suggest). The individual physicist does the analysis at the T2 site.
It still gets worse:
From the section on whose assets can be frozen.
"""
or to have acted or purported to act for or on behalf of, directly or indirectly,
"""
So, if someone accuses you of doing this (she's a witch!), they can freeze your assets. Forget being able to face your accuser, presumed innocence, fair trial, etc. I thought we left Salem a long time ago.
So, what happens after they freeze your assets because your neighbors said they say you at a communist, err... terrorist, meeting?
"""
Sec. 8. This order is not intended to, and does not, create any right, benefit, or privilege, substantive or procedural, enforceable at law or in equity by any party against the United States, its departments, agencies, instrumentalities, or entities, its officers or employees, or any other person.
"""
In other words, if we screw up in freezing the assets, we don't give you the right to file a lawsuit or any procedure to get your things back.
Lovely.
Uh, I don't think you parsed that English correctly, that's not what the sentence says.
...
Let me cut out some of the extra fluff. The first sentence says
"I find that, due to unusual threats posed by violence in Iraq and efforts undermining economic reconstruction in Iraq, it's in US interest to take additional steps. I hereby order..."
The first paragraph is just an introduction. It says that the point of the Executive Order is to hurt those who are trying to hurt Iraq; that has no legally binding meaning, except as a justification to why it's being done.
I also don't have a problem with the gov't blocking bank accounts of terrorists. They already do this. Part (i) of Section 1 goes after people who are doing the terrorism. Part (iii) is the interesting one:
"""
(iii) to be owned or controlled by, or to have acted or purported to act for or on behalf of, directly or indirectly, any person whose property and interests in property are blocked pursuant to this order.
(b) The prohibitions in subsection (a) of this section include, but are not limited to, (i) the making of any contribution or provision of funds, goods, or services by, to, or for the benefit of any person whose property and interests in property are blocked pursuant to this order
"""
See what happens there? They aren't referring to terrorists; they are referring to people who may be indirectly linked to terrorists. That's where the privacy rights people get up in arms. If I buy oil from the Saudis, and the Saudis donate the money to a charity which turns it over to terrorists, do I "indirectly" help them out? Who gets to define what "indirect" means; if it's the executive branch, it isn't a jury of your peers...
Ambiguity like this covers a wide swath of activities. I'm not claiming something crazy like they are going to start arresting people for buying gas, but it's not hard to read this order as "we now have the power to arbitrarily arrest people, but we only plan to apply it to terrorists."
I wish the logic said the power "CAN only" be used against terrorism. But instead, they the new power is claimed encompassing some ill-defined "indirect contributors" group, and a press release was made saying it "WILL only" be used against terrorism. The later depends on you trusting the government to hold its word; doesn't always seem to be true.
Correct Usage: Human writes SQL schema, then integrates the ORM. As long as human doesn't do silly things in the programming language, we have success. Database objects become much easier to use, and speed is fast.
Incorrect Usage: Human writes an object spec. ORM auto-generates SQL schema. Human blindly uses machine-generated ORM bindings without understanding underlying SQL. Database gets mildly large, then human complains "the stupid thing is slow".
If ORM is done for convenience, great! That's what I use it for.
If ORM is used in lieu of understanding how SQL works, you could be headed for trouble.
ORM is great! ... until you you have a couple hundred thousand rows. Then it's slow. ... until you have a couple million rows. Then it's unbearable.
I love ORM for smaller applications, but there's always a point where heading down the hall to say "hi" to the local DBA is a good idea. And beware, redesigning the DB from the ORM to your own schema can be extremely painful. How close the ORM schema is to "pleasant" depends highly upon the package you use.
This is from someone who is trying to perform queries on someone else's database designed with Hibernate. One that has 12 million rows (average row size, 9KB). Which has been running my simple query for 40 minutes.
How about something that scales up to handle the data needs of the largest scientific experiment known to man? dCache and Castor (both distributed storage services) run the SRM protocol, which is SOAP-based. Both scale to many petabytes.
Now, both are also somewhat buggy and unreliable, as they are funded on shoestring budgets by national labs. But that's moot - the idea is that this has already been done.
I know some prior art which would strike some of these claims.
Distributed storage system: dCache, http://www.dcache.org/. This allows one to configure storage as multiple commodity storage servers at one site (up to several petabytes) or distributed over several sites (as used by NorduGrid).
One of the many protocols dCache supports is SRM (Storage Resource Manager), a web-services (SOAP) based protocol that lets you perform the usual copy/delete/ls operations. It's designed as a generic protocol which several distributed systems implement.
Finally, SRM v2.2 (whose published spec at least predates the patent application, if not any of the big implementations) also has the concept of "Storage Classes" which allows the user to specify how the file should be stored (temporary with lifetime X, on tape, on disk, on disk with multiple replicas, etc).
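To make the "storage class" idea concrete, here's a minimal sketch of a request type covering the cases listed (temporary with a lifetime, on tape, on disk, on disk with multiple replicas). The `StorageClass` name and its fields are hypothetical; they are not the types defined in the SRM v2.2 specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageClass:
    """Hypothetical description of how a file should be stored.

    lifetime_s:    seconds before the system may delete the file (None = permanent)
    tape_copies:   number of copies kept on tape
    disk_replicas: number of copies kept online on disk
    """
    lifetime_s: Optional[int]
    tape_copies: int
    disk_replicas: int

    def __post_init__(self) -> None:
        # Reject requests that make no sense before they reach the storage system.
        if self.tape_copies < 0 or self.disk_replicas < 0:
            raise ValueError("copy counts must be non-negative")
        if self.tape_copies + self.disk_replicas == 0:
            raise ValueError("file must be stored somewhere")
        if self.lifetime_s is not None and self.lifetime_s <= 0:
            raise ValueError("lifetime must be positive")

    def describe(self) -> str:
        parts = []
        if self.lifetime_s is not None:
            parts.append(f"temporary ({self.lifetime_s} s)")
        if self.tape_copies:
            parts.append(f"{self.tape_copies} on tape")
        if self.disk_replicas:
            parts.append(f"{self.disk_replicas} on disk")
        return ", ".join(parts)


# The examples from the text, expressed with this hypothetical type.
TEMPORARY = StorageClass(lifetime_s=3600, tape_copies=0, disk_replicas=1)
ON_TAPE = StorageClass(lifetime_s=None, tape_copies=1, disk_replicas=0)
ON_DISK = StorageClass(lifetime_s=None, tape_copies=0, disk_replicas=1)
REPLICATED = StorageClass(lifetime_s=None, tape_copies=0, disk_replicas=3)
```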
Plus, Globus has been doing web-services storage for years, though perhaps not within a single "distributed" system - unless you count the grid as the system. And the SRB (Storage Resource Broker) from UCSD also implements a lot of these claims.
If the patent examiner is competent, many of these claims will be struck down and the patent will be refiled with a much narrower scope. Hopefully he's reading this very article!
One of the interesting parts of the LGPL that I found out about recently is Section 6. Someone else has linked the text already in this article, so I'll spare you the details.
Basically, it says that the user must be able to replace the current LGPL code with his/her own version and the device must still be operational. So, Apple doesn't need to release Safari's source code (beyond WebKit), but it *does* have to give you the opportunity to replace the WebKit pieces with your custom patches and have them work on the iPhone.
Or so goes one reading of the license. I'm sure Apple's legal dep't has a different reading.
At any rate, it's a huge step backward for a company which touts the power of open source in its OS on its webpage. It's a huge step backward for the company which recently made a tiny step forward by selling DRM-free songs from its online store.
In the off chance that Steve Jobs reads this, I *did* buy my Macs almost solely because of the Unix/Open Source underpinnings. Having a closed iPhone is not all that exciting.
It works fine, but we actually tend to lean toward many streams as opposed to uber-fast single streams.
http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?graph=quantity&entity=dest&src_filter=&dest_filter=Nebraska&no_mss=true&period=l14d&upto=&.submit=Update
Truthfully, you have to tweak the system pretty hard to get decent performance over a single stream (for us, 155 Mbps isn't sufficient - I work on an LHC project), especially from Nebraska to Switzerland (CERN). FAST TCP helps out a whole lot. GridFTP is the other piece of the equation - it is basically FTP with multiple data streams.
We tend to lean on hundreds of streams a whole lot more than on tweaking TCP settings, and the Caltech guys give us heck for that. They're right, however - if you're getting 100s of KBps per stream to some European site, it takes a ridiculous number of streams to get up to 100 MBps. Right now, the storage systems are behind the network, so we haven't even been able to start playing with FAST TCP yet.
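To put a number on "a ridiculous number of streams": taking 300 KB/s per stream as an assumed value from the "100s of KBps" range, and using decimal units,

```latex
N = \left\lceil \frac{100\ \text{MB/s}}{300\ \text{KB/s}} \right\rceil
  = \left\lceil \frac{100\,000}{300} \right\rceil
  = \lceil 333.3 \rceil = 334\ \text{streams}
```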
Yup, http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?graph=quantity&entity=dest&src_filter=&dest_filter=Nebraska&no_mss=true&period=l14d&upto=&.submit=Update
In fact, we often talk with the Caltech folk about deploying FAST TCP; the problem is that both ends need to deploy the kernel patches. Truthfully, the limiting factor becomes the disk systems, not the network. When we start to push closer to 10 Gbps instead of 4-6 Gbps, we'll need to make smarter decisions about the TCP stacks.
I was referring to the Caltech efforts, not the commercialized device. I don't know anything about the Aria products. For an example of the Caltech work, see: http://ultralight.caltech.edu/web-site/sc05/html/index.html
This is based on a patched Linux 2.6 kernel, and it's a couple of years old.
Think again. I suspect you have only tried that on a low-speed link (DSL, cable, FiOS, etc.). Try thinking about something 2 orders of magnitude faster.
I transfer about 20 TB / day at work, and that wouldn't be possible with a "typical FTP connection".
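For a sense of scale, here is that daily volume expressed as an average rate (decimal terabytes assumed), compared against one traditional stream at the 20 Mbps upper end of the range quoted elsewhere in this thread:

```latex
\begin{aligned}
\frac{20 \times 10^{12}\ \text{B} \times 8\ \text{b/B}}{86\,400\ \text{s}} &\approx 1.85 \times 10^{9}\ \text{b/s} \approx 1.85\ \text{Gbps} \\
\frac{1.85\ \text{Gbps}}{20\ \text{Mbps}} &\approx 93\ \text{streams running flat out, around the clock}
\end{aligned}
```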
If you read the papers coming out of Caltech, you'd see they were optimizing for 10 Gbps lines, not residential lines. 15-20x faster is a very fair estimate; look at Caltech's presentations at SC05 or SC07.
Hi -
You're thinking way too small. FAST TCP was designed with 10 Gbps links in mind - i.e., Internet2 type applications. FAST TCP streams are able to achieve several hundred Mbps. FTP streams over TCP Reno usually max out on something relatively pathetic, like 10-20 Mbps.
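The comment doesn't spell out why a plain Reno stream stalls at 10-20 Mbps. One standard back-of-the-envelope explanation (added here, not taken from the comment) is that a single stream is capped both by its window, at most one window per round trip, and by packet loss, via the well-known Mathis et al. approximation. The 120 ms RTT and 0.01% loss rate in the example are assumed values for a transatlantic path, not measurements.

```python
import math


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Ceiling set by the window: at most one full window per round trip."""
    return window_bytes * 8 / rtt_s


def mathis_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. steady-state estimate for one Reno-style stream:
    rate ~= (MSS / RTT) * sqrt(3/2) / sqrt(p), in bits per second."""
    if rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("need rtt_s > 0 and 0 < loss_rate < 1")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss_rate)


def single_stream_bps(mss_bytes: int, rtt_s: float, loss_rate: float,
                      window_bytes: int) -> float:
    """A single stream gets whichever limit bites first."""
    return min(window_limited_bps(window_bytes, rtt_s),
               mathis_bps(mss_bytes, rtt_s, loss_rate))


if __name__ == "__main__":
    # Assumed path: 1460-byte segments, 120 ms RTT, 0.01% packet loss.
    for window in (65_535, 16 * 2**20):
        rate = single_stream_bps(1460, 0.120, 1e-4, window)
        print(f"window {window:>9} B -> {rate / 1e6:6.1f} Mbps")
```

Under those assumptions, a default 64 KB window caps the stream near 4.4 Mbps, and even a 16 MiB window only gets it to about 12 Mbps because loss becomes the limit. That is the gap FAST TCP (fix the single stream) and GridFTP (use many streams) each attack in their own way.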
Caltech's SC07 presentation showed commodity servers which could transfer 2 Gbps end-to-end using their FDT tool (Java based, actually). The servers had 4 HDDs, dual Gigabit Ethernet connections, and ran a Linux 2.6 kernel with the FAST TCP patches.
On the other hand, GridFTP takes a different approach - parallelizing several TCP streams at a time. Why get a single stream going 10x faster using a special Linux kernel when you can just send 10 parallel TCP streams at once? (GridFTP with lots of plain TCP streams is more popular than GridFTP with a small number of FAST TCP streams, but imagine what happens to your RAID array when a large number (>100) of streams are pulling data off different parts of the same disk...)
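The parallel-stream idea is easy to sketch locally. The code below is not GridFTP and speaks no network protocol: it assumes N worker threads stand in for N TCP streams, each copying one byte range of a file. It does make the parenthetical's concern visible, since every worker seeks to a different offset of the same source file. All function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def byte_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Split [0, size) into `streams` contiguous (offset, length) chunks."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def copy_range(src: str, dst: str, offset: int, length: int) -> int:
    """One hypothetical 'stream': copy one slice, writing it at the same offset."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)  # every stream reads from a different spot on the disk
        data = fin.read(length)
        fout.seek(offset)
        fout.write(data)
    return len(data)


def parallel_copy(src: str, dst: str, streams: int) -> int:
    """Copy src to dst using `streams` concurrent workers; returns bytes copied."""
    size = os.path.getsize(src)
    ranges = byte_ranges(size, streams)
    with open(dst, "wb") as f:
        f.truncate(size)  # pre-size the output so each worker writes in place
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(lambda r: copy_range(src, dst, *r), ranges))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src.bin"), os.path.join(d, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(1_000_003))
        print(parallel_copy(src, dst, streams=10), "bytes copied by 10 workers")
```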
Actually, FAST TCP is also available as a Linux kernel patch. It's a well-tuned Caltech product which has been in development for years:
http://netlab.caltech.edu/FAST/
Several highlights include:
- Caltech held the world record for data transfer for a while
- Won the bandwidth challenge at SC05
It's one of the best ways to tune a single TCP stream. Finally, the list of about 50 TCP-related publications should indicate this isn't handwavium:
http://netlab.caltech.edu/FAST/fastpub.html
Traditional TCP streams (such as what you get with FTP) top out around 10-20 Mbps. If you want to see a single stream go a couple hundred Mbps, you need TCP tweaks like FAST (however, FAST is one of many competing TCP "fixes").
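FAST's core idea, described in the Caltech publications linked above, is to adjust the window based on measured queueing delay rather than waiting for packet loss. Below is a minimal sketch of its published window update, w ← min{2w, (1−γ)w + γ(baseRTT/RTT · w + α)}, driven by a toy single-bottleneck model assumed purely for illustration (packets beyond the bandwidth-delay product sit in a queue and add delay). The `simulate` helper and its parameter values are hypothetical.

```python
def fast_update(w: float, base_rtt: float, rtt: float,
                alpha: float, gamma: float) -> float:
    """One FAST TCP window update (window in packets):
    w <- min(2w, (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha))
    """
    return min(2 * w, (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha))


def simulate(capacity_pps: float, base_rtt: float, alpha: float,
             gamma: float, steps: int, w0: float = 10.0) -> float:
    """Toy model: packets beyond the BDP queue at the bottleneck and add delay."""
    bdp = capacity_pps * base_rtt  # packets the pipe holds with an empty queue
    w = w0
    for _ in range(steps):
        queue = max(0.0, w - bdp)
        rtt = base_rtt + queue / capacity_pps  # queueing delay inflates the RTT
        w = fast_update(w, base_rtt, rtt, alpha, gamma)
    return w


if __name__ == "__main__":
    # Assumed link: 10,000 packets/s, 100 ms base RTT, alpha = 200 packets.
    print(simulate(capacity_pps=10_000, base_rtt=0.1, alpha=200, gamma=0.5, steps=200))
```

In this model the window settles at the bandwidth-delay product plus α, i.e. the flow keeps about α packets queued at the bottleneck, which is how a delay-based scheme can keep a long, fat pipe full without repeatedly driving it into loss the way Reno does.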
I transferred some of the first files over this network. It was Monte Carlo physics data produced for the CMS project. It took us about 20 minutes to go from 0 to 6.5 Gbps (we have a 10Gbps link to I2).
P2P is not a big application on I2. Simply put, clients like BitTorrent don't scale well for individual transfers, and there aren't enough transfers to really aggregate to an impressive number.
I expect at least 100 Mbps per file using our transfer tools, then transfer many files at once.
I attended a presentation by the CEO of I2 last month. This should be finished by the end of this year.
I'm of the same thought - the other guy's missile defense system is about as valid a military target as possible.
The missile-targeting issue is nearly moot: it only takes, what, a couple of minutes to re-aim the missiles in the ground?
The larger issue is that Russia feels threatened by US armaments in its "backyard" for no apparent reason, just like we would feel threatened if Russia put an ABM system in Canada to "fight off the Mexicans". They don't buy Bush's line of "we're doing this to fight the terrorists". Sometimes it would be nice if politicians used Newspeak - "we view the ABM system as plusungood" instead of veiled threats. We need to work with these people, not stir up the anthill.
That said, I don't really care for the crackdown on liberties in Russia.