Multicast solves the problem well for broadcast, or for extremely popular YouTube videos, when lots of people are watching the same video at the same time. It doesn't work when each user wants to watch a different video.
This user upload model would work well for videos that are popular enough that many people are watching them, but not so popular that lots of people are watching them simultaneously. If the video isn't popular, no users will have downloaded it and therefore nobody can upload it to new users. If the video is really popular multicast might be a better solution.
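To put rough numbers on the multicast point, here is a minimal Python sketch. It assumes an idealized network where a single multicast stream serves every simultaneous viewer of a video; the function names and audience sizes are made up for illustration.

```python
def unicast_streams(viewers_per_video):
    """Streams the server sends with unicast: one per viewer."""
    return sum(viewers_per_video.values())


def multicast_streams(viewers_per_video):
    """Streams the server sends with idealized multicast: one per video
    that has at least one simultaneous viewer (an assumption, not a
    model of any real deployment)."""
    return sum(1 for viewers in viewers_per_video.values() if viewers > 0)


# Hypothetical audiences: one hit video vs. 1000 different videos.
one_hit = {"hit": 1000}
long_tail = {f"video{i}": 1 for i in range(1000)}

print(unicast_streams(one_hit), multicast_streams(one_hit))
print(unicast_streams(long_tail), multicast_streams(long_tail))
```

With everyone on the same video, multicast cuts the server's load from 1000 streams to 1. With everyone on a different video, it saves nothing.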
What if you got in a car accident? Had a previously unknown heart defect? Fell down a flight of stairs and broke your back?
Many young people think they don't need health insurance because they're healthy. Well, you're healthy now, but an accident could make you very unhealthy very quickly, and your medical bills could easily be more than you could possibly afford. In that situation, taxpayers end up paying your way.
Try this New York Magazine article which includes the story of a healthy young guy with no health insurance who got appendicitis.
I choose one menu item on my MacBook, and I'm connected to the Internet through a Bluetooth connection to my cell phone. That solves #2 and #3, and I didn't even have to pay for a Treo.
Actually, the difference between a 22 kHz sine wave and a 22 kHz square wave is the higher frequency information. A square wave, at any frequency, is the sum of an infinite series of sine waves: the fundamental plus its odd harmonics, up to infinite frequency. For a 22 kHz square wave, the first component above the fundamental sits at 66 kHz. So, when it is said that "humans can't hear above 22 kHz", that is equivalent to saying that humans can't tell the difference between a sine wave and a square wave at 22 kHz.
In addition, the oversampling and filtering process in your CD player is meant to ensure that no frequencies higher than 22 kHz make it out of the player, meaning that a 22 kHz wave of any kind encoded on a CD will come out of the player as a sine wave.
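Here is a small Python sketch of that argument. It assumes an ideal square wave of amplitude 1, whose Fourier series contains only odd harmonics with amplitude 4/(pi*n), and it models the player's output filter as a perfect brick wall at 22 kHz, which real filters only approximate. The function names are mine.

```python
import math


def square_wave_harmonics(fundamental_hz, count):
    """(frequency, amplitude) pairs for the first `count` sine components
    of an ideal unit square wave: odd harmonics n = 1, 3, 5, ..."""
    return [(n * fundamental_hz, 4 / (math.pi * n))
            for n in range(1, 2 * count, 2)]


def band_limit(components, cutoff_hz):
    """Idealized low-pass filter: drop every component above the cutoff."""
    return [(freq, amp) for freq, amp in components if freq <= cutoff_hz]


components = square_wave_harmonics(22_000, 4)
surviving = band_limit(components, 22_000)

print([freq for freq, _ in components])
print([freq for freq, _ in surviving])
```

The first list is 22, 66, 110 and 154 kHz; only the 22 kHz fundamental survives the filter, and a lone fundamental is just a sine wave.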
Even if you're lucky enough to use 1 Gbit/s cards and cabling and routers that can handle it, the aggregate throughput between nodes is 128 MB/s.
That might be true if you were using an Ethernet hub, but nobody does that for Gigabit Ethernet. Gigabit switches typically have a switching fabric capable of bandwidth well in excess of 1 Gbit/sec. Of course, the connection between two nodes is limited to 1 Gbit/sec, but the aggregate bandwidth is higher.
Just as a simple data point, I run a real application on a cluster of PCs over Gigabit Ethernet, using a relatively cheap gigabit switch. Aggregate bandwidth is over 300 MB/sec. I doubt the switch could handle anywhere near the 4 GB/sec you quote, but I imagine this hardware was quite a bit cheaper.
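A back-of-the-envelope version of the hub vs. switch difference, as a Python sketch. It assumes decimal units (1 Gbit/s = 10^9 bits/s), an idealized non-blocking switch, nodes paired off with traffic in one direction only, and no protocol overhead. Real numbers will be lower, and the function names are just for illustration.

```python
LINK_BITS_PER_SEC = 1_000_000_000  # 1 Gbit/s, decimal units assumed


def link_mb_per_sec(bits_per_sec=LINK_BITS_PER_SEC):
    """Raw capacity of one link in MB/s (10^6 bytes), ignoring overhead."""
    return bits_per_sec / 8 / 1_000_000


def hub_aggregate_mb_per_sec(nodes):
    """A hub is a shared medium: only one node can send at a time,
    so the aggregate never exceeds one link's capacity."""
    return link_mb_per_sec()


def switch_aggregate_mb_per_sec(nodes):
    """Idealized non-blocking switch: nodes pair off and every pair
    gets a full link's worth of bandwidth at the same time."""
    return (nodes // 2) * link_mb_per_sec()


print(link_mb_per_sec())
print(hub_aggregate_mb_per_sec(8))
print(switch_aggregate_mb_per_sec(8))
```

Under these assumptions an 8-node hub tops out at 125 MB/s total, while an 8-node switch could carry up to 500 MB/s, which is why an aggregate over 300 MB/sec on a cheap switch is plausible.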
Your theory on AOL's intentions seems pretty far-fetched and naive to me.
Well, if their intentions were otherwise, they certainly covered their tracks well:
The data was released one week before the world's most prestigious information retrieval conference (SIGIR), and was announced directly to researchers.
AOL had a booth set up at the SIGIR conference, presumably to talk to researchers about this data release. Of course, the booth was vacant most of the time, since by the time the conference started the data was gone.
Given that their response after the public outcry hasn't been a textbook case study in crisis management, I'd be surprised if these actions were just some kind of smokescreen.
If they really wanted to make the most money possible, they would have sold these logs (non-anonymized) to the scores of direct marketers that I'm sure would love to have this data. Instead, they packaged it up and tried to make it available to academic researchers. These researchers honestly just want to make better search engines that run faster and return better results. Furthermore, when academics come up with a great new idea, it gets published so that anyone can read it.
Every once in a while, someone suggests an open source search engine. Check out Nutch if you want to see work in this area. However, if open source search solutions are going to be any good at all, they'll have to rely on the decades of public, published information retrieval research that's already out there.
We are entering a time when companies are capable of totally outpacing academia because they have query log data, so they know exactly what users actually do. There is no way that an academic can get this kind of data unless a company releases it. Researchers at AOL, in good faith, tried to release data so researchers could have a chance at success. Ultimately, of course, that's good for AOL since they're not in the top three search engines out there. Public research can only help raise AOL's standing by helping to level the playing field. But, it's good for you too, because you can build your open source solution based on this research too.
Yes, the release was botched, and yes, the long term user identifiers were a mistake. But don't make AOL out to be some evil company that was only out to destroy your privacy. They made a mistake!
If we decide that a scientist must study intrinsic properties of the world, and not man-made artifacts, then I'll grant that most of computer science is not science. However, it seems like we should have a word for a person that uses the scientific method in the study of man-made artifacts. Wikipedia says that engineers "use creativity, technology, and scientific knowledge to solve practical problems," which sounds like part of computer science, but certainly not all of it.
Nevertheless, the original poster talked about the ability to create awe, and said that this was the domain of hardware engineers; you seem to imply that it's the domain of physical scientists instead. Both statements sound a bit narrow minded. I personally agree with Bill Gates (gasp!) that software is still the most exciting field right now, and the one most likely to make a major impact on humans in the next few years (with the possible exception of biological/health sciences). Like every discipline, the excitement of software will give way to the next discipline eventually, but I don't think that time has come yet.
Google is hiring Computer Science Ph.D.s at an astounding rate. I guess you could call these people programmers (you'd hope they'd know how to write a program or two) but hopefully you'd also call them scientists.
Your second statement seems contradictory. Wasn't it in part the windowing systems and object-oriented programming that made us excited about Xerox PARC? Is that not software?
Is a search engine not software? Yes, it's deployed on massive hardware, but it's a software application. The Grand Challenge vehicles are (in my opinion) primarily feats of software.
I use VMware for testing and it is great, but virtualisation tends to be used in production to compensate for software that doesn't cooperate well with other software.
I remember this kind of argument from Mac devotees in the pre-OS X days when the Mac didn't have real protected memory, and still used cooperative multitasking. People would say that pre-emptive multitasking was just a crutch, that cooperative multitasking was cleaner and potentially more efficient, and that "good" programs would consistently yield processor time in tight loops to let other programs run.
It turns out that putting yield statements in every inner loop of every program you run is a big huge hassle, and that pre-emptive multitasking solves the problem elegantly; so elegantly that everyone does it. Not yielding CPU time is not "bad code"; it's just leaning on an abstraction that you know exists.
This same pattern of argument has been used to downplay high level languages ("optimizing compilers are just a crutch--quality software has hand-scheduled instructions"). Now we'd legitimately have to call the x86 ISA a crutch, since modern processors effectively process x86 instructions in emulation.
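For anyone who never had to live with cooperative multitasking, here is a toy Python sketch of it using generators as tasks. This is only an illustration of the idea, not how classic Mac OS implemented it; the scheduler and task names are invented.

```python
from collections import deque


def polite_task(name, steps, log):
    """Does one unit of work, then yields so other tasks can run."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative: voluntarily hand control back


def greedy_task(name, steps, log):
    """Never yields inside its loop, so nothing else runs until it is done."""
    for i in range(steps):
        log.append(f"{name}:{i}")
    yield  # only gives up the CPU after all the work is finished


def run_round_robin(tasks):
    """Toy cooperative scheduler: resume each task until its next yield."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, drop it
        queue.append(task)


fair_log = []
run_round_robin([polite_task("a", 2, fair_log), polite_task("b", 2, fair_log)])

starved_log = []
run_round_robin([greedy_task("g", 2, starved_log), polite_task("a", 2, starved_log)])

print(fair_log)
print(starved_log)
```

The two polite tasks interleave (a:0, b:0, a:1, b:1), but the greedy one finishes all of its work before the other task gets a single turn. A pre-emptive scheduler takes that decision away from the task, so nobody has to sprinkle yields through their inner loops.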
As a summary, any user of Word, Excel, etc. knows that since Office 95 there has been a massive explosion of features that makes it nearly impossible to find anything (massive menus plus an explosion of toolbars plus lots of context-sensitive palette-like sidebar things). As Harris states, 4 of the most highly requested features in Word are already in the product, but people can't find them. After loads of research, they decided to do this Ribbon UI thing.
Because it creates a solid incentive for power companies to be cleaner, and is more likely to result in cleaner air than you purchasing a hybrid?
I figure there are three alternatives for dealing with cleaning up emissions from power generation:
1. Just hope that power producers invest in clean technology on their own. This doesn't work because there's no financial incentive; polluting companies will undercut the prices of clean companies, and the clean companies will go out of business.
2. Regulate: tell all the power companies that they must adhere to certain emission cleanliness standards. This works to a point, but basically ensures that no companies will work to beat the standard (see point 1).
3. Carbon credits: this has all the same environmental benefits of the previous point (you can set aggregate emissions quality to the exact same level as in standard regulation), but it encourages companies to have even cleaner output.
Carbon credits seem (to me) to be the best deal overall for society.
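A toy example of why, sketched in Python. Every number here is invented: two plants that each emit 100 tons, a constant cost per ton of cleanup at each plant, and a target of cutting 40 tons in total. Real abatement costs are not constant, so treat this as an illustration of the incentive and nothing more.

```python
def flat_standard_cost(cut_per_plant, cost_per_ton):
    """Regulation: every plant must make the same cut, whatever it costs."""
    return sum(cut_per_plant * cost for cost in cost_per_ton.values())


def cap_and_trade_cost(total_cut, cost_per_ton, emissions):
    """Credits: the same total cut, made wherever it is cheapest.
    Plants that cut more than their share sell credits to the others."""
    remaining = total_cut
    total_cost = 0
    cuts = {}
    for plant in sorted(cost_per_ton, key=cost_per_ton.get):
        cut = min(remaining, emissions[plant])
        cuts[plant] = cut
        total_cost += cut * cost_per_ton[plant]
        remaining -= cut
    return total_cost, cuts


# Hypothetical plants: A is cheap to clean up, B is expensive.
emissions = {"A": 100, "B": 100}
cost_per_ton = {"A": 10, "B": 50}

print(flat_standard_cost(20, cost_per_ton))
print(cap_and_trade_cost(40, cost_per_ton, emissions))
```

Both schemes remove the same 40 tons, but the flat standard costs 1200 while trading costs 400, because plant A has a reason to go past the standard and sell the extra credits to plant B.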
"Even that assumes Apple never changes the software."
Apple can only change new versions of the software. As long as I'm not updating my iTunes software on my computer, I can retain all the functionality I want. That's the great thing about actually buying music from iTunes instead of subscribing like on other services; if Rhapsody changes its licensing terms or raises its fees suddenly, there's nothing I can do about it.
"Say a vastly better portable mp3 player comes out from another company."
This is analogous to any format switch (LP/cassette/CD or VHS/Beta/DVD). Burning iTunes tracks to CD and then re-ripping them is arguably easier than moving from LP to CD. Also, it's faster for me to buy an iTunes track, burn to CD, then rip it back as mp3 than it is to buy the CD and rip to mp3 in the first place.
As another poster commented, you can tell where you are based on the text at the top of the results list. For instance, after scrolling down a little on a search for "windows", it says "windows 5-9 (151,200,195)". Arguably that's just as informative as Google when you switch to the next result page.
As for the original poster's question, I'd gather that it's because research shows that users almost never look at a second page of results, no matter how bad the first page is. I assume Microsoft is hoping that this 'infinite result list' will encourage people to look deeper in the results. If my scrollwheel worked with it, I'd see it as a major improvement over the Google interface.
What!? Why do you get to dictate what e-mail is for?
E-mail is a service used by employees to get work done. In the case of marketing/sales types, 1GB of saved e-mail is common, and it's critical business data. Yes, some of that data is binary, but it is critical.
Often administrators impose quotas, let the users whine a bit, and then the whining subsides. The administrators think that the problem is solved; nope, what actually happened is that all that critical e-mail just got moved to local folders. When that local hard disk inevitably crashes, taking the critical data for a $1 million sales deal along with it, the whining will turn to screaming.
The solution (in my opinion) is for administrators and companies to reevaluate how much e-mail is worth to users. For many, I'd argue it's worth many thousands of dollars. I'm sure some of that money could be used for a reasonable amount of storage.
Important differences between the keynote benchmarks and those shown by ArsTechnica:
The keynote benchmarks were compiled using Intel's C compiler, which tends to produce much faster code than gcc. To be fair, Apple compiled the G5 code with the IBM compiler. I assume xBench was compiled with gcc on both platforms.
The QuickTime test almost certainly uses hand-tuned Altivec instructions on the G4/G5. There's no evidence yet that the Intel code is tuned for SSE.
Even though ArsTechnica likely doesn't have access to the Intel compiler, I was surprised to see that they didn't try compiling any code themselves to benchmark with. That would have been the best way to know exactly what variables are under control here. There are plenty of applications (encryption, rendering, etc.) that have optimized open source implementations; these seem like the best place to start.
How can it be manipulative to quote performance numbers based on the SPEC benchmarks, which are the industry standard?
I drove across the USA in the summer of 2003 (north on the west coast from California to Washington, then across the northern states until Massachusetts), and I took a Verizon cell phone with data access. I recall having trouble getting a signal in eastern Montana and western North Dakota, but I was able to get data access everywhere else. Coverage has only improved since then.
Getting reception in buildings can still be a problem, I admit. However, it's becoming increasingly difficult to find a usable phone jack anywhere. Many office buildings end up having a phone system that doesn't permit easy modem access anyway.
Have the same level of expenditure on open, published computer science research as Microsoft? (Remember to count MSR Redmond, Silicon Valley, Cambridge, and Asia)
Have the same level of sponsorships of academic conferences?
I think Microsoft has shown that they do believe advancing computer science will help them be competitive in the future. While all the old research labs seem to have slashed research budgets, Microsoft continues to funnel money into MSR.
Bachelor's Degree program in Software Engineering
That's a bit cynical, don't you think?
Don't fear abstraction! It's good for you.
Jensen Harris BayCHI Ribbon UI Podcast
10 was right out.
Would you care to name other companies that: