Mind you, this is only tangentially related to current P2P systems, but it's still interesting.
As the AC post just above yours pointed out, it's fairly simple to scrape this information from a wide variety of sources. It would be sufficient for someone to update a simple table once every few months. Even if it was baked into the uTorrent executable, it would get updated reasonably often along with the regular point releases.
Keep in mind also that most P2P clients represent a large set of distributed cooperating applications that could analyze and monitor the internet from "their perspective" and exchange routing efficiency data.
Perfection is not only not required, but not useful, because some randomness is still required to ensure that the P2P mesh is robust in the face of a partial failure somewhere.
would you care to provide links to these papers? i'm sure you've read quite a few that back this up otherwise you wouldn't be arguing this position.
I only have vague memories of reading some related stuff a while back (I'm not exactly keeping tabs of this stuff for research or anything), but I remember that there was a Slashdot post a while back about this paper from Microsoft Research:
It basically says that there's an even more efficient form of P2P, where the blocks are optimally encoded using something akin to a huge "RAID 5" type parity algorithm, so that every peer can help every other peer, immediately. From what I gather, this doesn't quite work in practice, not just because of the computational requirements, but because this algorithm has issues with untrusted peers -- users can only validate complete files, or something like that. The Slashdot post where it was mentioned had some good discussion and links to P2P design and issues, and I believe one or two developers who've worked on well-known P2P systems chimed in with some good points. One guy mentioned that the reconstitution algorithm thrashes the disk, for example.
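To make the "RAID 5 type parity" comparison a bit more concrete, here's a rough sketch of plain XOR parity: one extra block lets you rebuild any single missing block from the others. This is only the RAID-5-style idea, not the encoding used in the paper, and the function names are made up for illustration.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(blocks):
    """Parity block: the XOR of all the data blocks."""
    return reduce(xor_blocks, blocks)


def rebuild_missing(known_blocks, parity_block):
    """Rebuild the one missing block from the surviving blocks plus parity."""
    return reduce(xor_blocks, known_blocks, parity_block)


blocks = [b"aaaa", b"bbbb", b"cccc"]
p = parity(blocks)
# A peer holding blocks 0 and 2 plus the parity block can recover block 1.
recovered = rebuild_missing([blocks[0], blocks[2]], p)
```

Note that nothing here tells you whether a received block is genuine, which is the untrusted-peer issue mentioned above.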
From the reading I've done, it looks like traditional P2P (random block exchange) is surprisingly good, as long as the peers are chosen well. I actually studied some P2P protocols at work as a potential method for distributing files to an enterprise with thousands of sites, but I couldn't find any at the time that could be embedded into a larger system as a module and make intelligent peer choices. Most of the ones I tried kept chewing up WAN bandwidth instead of preferring local LAN bandwidth. Note that uTorrent does detect peers on the same subnet, but not adjacent subnets.
I can't find the link any more, but I remember that there was a paper recently that proved mathematically that certain types of data transfer mechanisms like client-server can never utilize more than a certain percentage of the total theoretical bandwidth of a network like the internet, but P2P does a lot better, and P2P with network coding is basically optimal, or close to it, but still not 100%.
I agree. I see other peers on the same ISP as me, and others on another ISP that is also based in the same city as me, and yet it doesn't connect to them.
Coding a way so you can manually prioritise that peer or domain would be easy to do.
I see this all the time too. It shits me to no end that I could be connecting to users with 10Mbit uplinks in the same city, but uTorrent blindly connects to peers in places like Hungary which is almost precisely the furthest possible distance from me.
Your post is precisely what is wrong: It's all about what you get out of an individual download.
Not everything in the world is about directly maximizing YOUR OWN PERSONAL BENEFIT. More importantly, you can actually improve your own personal speeds by cooperating. Read up on the Tragedy of the Commons. If many players all blindly try to maximize their personal utilization of a common resource, they all suffer as an aggregate.
This is particularly true of peer-to-peer protocols, which are ideally placed to utilize otherwise wasted local bandwidth. I've read papers that show that an efficiently designed P2P protocol can actually maximize the return on investment of a switched network, a feat that essentially no other type of protocol can achieve, largely because a well designed P2P protocol can minimize the amount of data flowing through inter-network or international links.
Because I do not want to make the job of the MAFIAA any easier.
You do realise that the RIAA and the MPAA represent Recording Artists and Movie Producers, respectively, right?
Neither group are ISPs. Neither invest billions into internet infrastructure.
If peer-to-peer users would just play nice and use the ISP infrastructure efficiently, then maybe the ISPs wouldn't be so inclined to side with the content producers.
You might even find that if digital content distribution is done right, then the ISPs might start to push for laws similar to the "copyright tax" on writeable media to allow their users to legally download content.
Keeping traffic completely local would make it much easier to snag a bunch of file sharers in a massive "three strikes and you're out" campaign, don't you think? Since mere use of torrent software seems to be associated with illicit activity in the minds of the ignorant (ie. the authoRIAAties), I'm not sure that "I was just downloading the latest Ubuntu ISO" would be enough to avoid being threatened by the ISP. Lots of local inter-ISP torrent traffic might also cause them to alert local law enforcement to take a closer look. This could increase one's risk significantly, particularly if any 'infringing' content is ever shared (by an occasional, less enlightened, user of the connection, for example). Seems safer to not have to worry about local/non-local bandwidth, to be honest. Might be smarter to prefer connections that are as non-local and non-concentrated as possible. It's not always just about data transfer speed and bandwidth saving - there are other factors to consider.
[citation needed]
Keep in mind that in large part, the motivation of ISPs for monitoring or throttling bittorrent is not concerns over copyright violations, but the impact to their bottom line. All ISPs have three classes of links: internal, peered, and external. They have a strong preference to maximize the utilization of internal links over external ones, as internal links are effectively free and often underutilized, while external links are often very expensive and overloaded.
If torrent traffic utilized internal connections much more than external connections, ISPs wouldn't be financially motivated to monitor at all, because monitoring equipment is expensive. Right now, monitoring and throttling is worth it, because bittorrent tends to use external links the majority of the time.
In effect, improving efficiency would improve security.
It doesn't have to be reliable, it just has to be better than "totally random". Even a very bad peer selection policy would be a HUGE improvement over what they have now.
Prefix bits do not indicate location. 2 Class C's can be a long way from each other geographically. Even if the entire Internet was broken down into Class C spaces, and you prioritised addresses in your Class C, I don't think you would see many hits. I mean, there may be 50k people on the torrent, but how many of them are in the same neighbourhood as you?
That's why the Vuze plugin uses an IP->location mapping database.
True, but it's still better than random. Many countries were allocated IP blocks from large ranges. Most of Australia's IP addresses start with prefixes around 200-something, for example. Similarly, most ISPs have large blocks allocated to them, like /8 ranges or the like. Some ISPs are big enough that torrent users could have 10 or more connections to peers in the same ISP for reasonably common files like TV shows, and only need 1 or 2 to the outside world.
Still, you're correct, adding even a simple country database would help a lot. There's easily obtainable databases of "AS" numbers that map IP ranges to organizations and/or countries, and embedding that into the client would also be a fairly simple exercise.
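Something like the following is roughly what I mean by embedding a table in the client. It's a sketch only: the prefixes are the reserved documentation ranges, the AS numbers and country codes attached to them are invented, and `lookup` / `locality` are hypothetical names, not anything from a real client.

```python
import ipaddress

# Hypothetical table: (prefix, AS number, country). A real client would
# ship a much larger table generated from public routing/registry data.
PREFIX_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "AS64500", "AU"),
    (ipaddress.ip_network("198.51.100.0/24"), "AS64501", "AU"),
    (ipaddress.ip_network("203.0.113.0/24"), "AS64502", "HU"),
]


def lookup(ip):
    """Return (asn, country) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, asn, country in PREFIX_TABLE:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn, country)
    return best[1:] if best else None


def locality(own_ip, peer_ip):
    """2 = same AS, 1 = same country, 0 = elsewhere or unknown."""
    own, peer = lookup(own_ip), lookup(peer_ip)
    if own is None or peer is None:
        return 0
    if own[0] == peer[0]:
        return 2
    if own[1] == peer[1]:
        return 1
    return 0
```

The score could then be used as a weight when choosing peers, so unknown addresses still get picked sometimes.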
THEIR arrogance is astounding? How about yours? They are working FOR FREE. You are merely complaining. Get your hands dirty and start doing some work yourself.
You can suggest things all you want, but once you start insulting someone for their free work, you've crossed a line. Nobody is forced to use their client. There are dozens of decent clients and probably hundreds of open source ones.
As for their choices, they will work on what's more important to them, I'm sure. Since they don't need this 'local' feature, they haven't got much incentive to actually work on it.
First of all, they're not working for 'free', uTorrent is owned by BitTorrent Inc, a for-profit company. Initially it was free, but it's now developed by a corporation. Those devs are salaried employees.
More importantly, uTorrent depends on and uses infrastructure that is not free, by any stretch of the imagination. International links are $billions expensive.
So by your logic, just because a user can download their client for free, it gives BitTorrent Inc carte blanche to do anything at all they want, including shit all over the internet infrastructure?
How the fuck does it make sense for a company whose product uses something like 30% of the total internet bandwidth to not make an hour's worth of effort to minimize their impact on said infrastructure? Their product in its present state is so harmful that ISPs are buying millions of dollars worth of equipment to throttle it, and with good reason.
Compare their behavior to the largely free, open, and volunteer efforts of the dedicated people who worked on the early Internet protocols like DNS and NNTP. These were systems designed to scale, use bandwidth efficiently, and 'play nice'.
What happened since then? Why is it acceptable now to design a protocol that is maximally inefficient? Why would anyone support this kind of behavior?
There's a much bigger issue with uTorrent that the developers seem to refuse to solve, or even acknowledge.
In essence, uTorrent connects to clients randomly, and makes no attempt to prioritize "nearby" clients. This may not be a huge issue for Americans, but everywhere else, you know, like the rest of the fucking planet, this is hugely inefficient, for both the end users, and most importantly, ISPs. This is why they're throttling bittorrent: because it tends to make connections to peers outside the ISP's internal network, which costs ISPs money. In Australia for example, international bandwidth is extremely limited and very expensive, but local bandwidth, even between ISPs, is essentially unlimited, high-speed, and often free or 'unmetered'.
What do you think is going to be faster: connecting to your neighbour through the same fucking router, or to some kid's home PC in Kazakhstan over 35 hops away? Even connections from here to America have to go through thousands of miles of fiber optic cable over an ocean.
Note that some other clients like Azureus have already implemented weighted peer choices, where peers with similar IP addresses are preferred over other peers. It's not hard. Heck, it's a trivial change to make, as no changes need to be made to the protocol itself. A reasonably competent programmer could implement this in an hour: simply take the user's own IP address, and then sort the IPs of potential peers by the number of prefix bits in common, then do a random selection from that list, weighted towards the best-matching end. How hard is that?
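For the sceptics, here's roughly what that looks like in Python. It's a minimal sketch, not what Azureus or any other client actually does: IPv4 only, the function names are mine, and the linear "1 + shared bits" weighting is just an assumption to keep some randomness in the mix.

```python
import ipaddress
import random


def common_prefix_bits(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def pick_peers(own_ip, candidates, k, rng=random):
    """Pick up to k distinct peers at random, weighted towards similar IPs."""
    # Sort best match first, as described above; the weights do the real work.
    pool = sorted(candidates, key=lambda p: common_prefix_bits(own_ip, p),
                  reverse=True)
    # Assumed weighting: every peer keeps a non-zero chance of being chosen.
    weights = [1 + common_prefix_bits(own_ip, p) for p in pool]
    chosen = []
    while pool and len(chosen) < k:
        i = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)
    return chosen


peers = ["10.1.2.3", "10.1.200.7", "192.168.5.5", "172.16.0.9"]
print(pick_peers("10.1.2.99", peers, k=2))
```

Far-away peers still get picked occasionally, which is the point: the mesh stays connected to the outside world, it just stops preferring it.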
The arrogance of the uTorrent devs is simply staggering. They're a group of developers who could, with an hour's effort, reduce international bandwidth usage by double-digit percentages and improve torrent download speeds by an order of magnitude, but they just... don't.
I love exams like that, where there is no simple 'right or wrong' answer, but it's a competition against the rest of the class.
At the University of New South Wales, I had a similar experience in my first year of studying Computer Science. We had this great professor who was a bit more inventive than usual with exams and projects.
We started the first year studying a pure functional language called Haskell. A mere 4 weeks into the semester, just when we had figured out what 'functional' meant, and some of us could write a recursive loop, we got our first project: write a program to do optical character recognition (OCR). My jaw hit the floor when I heard that. Our marks were decided largely by the recognition rate on a test set generated from randomly sized characters picked from a huge set of fonts. Mind you, the problem was substantially simplified from a 'real' OCR program, but still, getting above 85% was hard. If your program could only recognize less than 50% of the test characters, you got that as your mark, which was the first step towards failing the course.
The second project was even better: we were told to write an AI to play the card game Hearts. As a part of the project materials, we were given a "game server" that could play a set of AIs against each other, so we could trial our AIs ourselves. The marking was evil: the professor ran our AIs randomly against each other for a large number of games, and ranked them by the number of wins. The AI that won the most games received 100%, and the lowest received under 50%.
I heard about other projects that he did for other years. I think one group had to write an AI to play a game similar to Monopoly, again with 4 AIs per game.
No, ADSL in Australia is not > 10Mbit/sec to most people.
Whirlpool's latest survey showed that half the connections are running at less than 10Mbit/sec.
Yeah, but that's today. The national broadband system won't be rolled out for at least a decade, by which time that will have improved. A lot. Telstra is about to start rolling out DOCSIS 3.0 for their cable broadband, which starts at around 40 Mbps, and can go much higher.
Sounds about right... the Australian government is notorious for under delivering. Expect this roll-out to complete in a decade, by which time the average consumer will have 10 gigabits wired directly into their brain.
The Tesla c1060 processor boards sound like a very efficient way of packing in compute power, but unless they're neglecting to mention it, the 4GB of GDDR3 RAM each has on board has no error correction. Given the rates of correctable errors observed e.g. here, I could never recommend using it for computing simulations that matter. A flipped bit in a floating point number can have a disproportionate effect on the outcome of calculations that rely upon it, and short of running the whole simulation a second or third time, one couldn't be confident that such an error did not occur.
Large compute-intensive simulations can take weeks, and are used to justify engineering and business decisions that involve the disposition of large amounts of money and other resources — it is important that the computational part of the process can be relied upon.
Which is why the upcoming NVIDIA "Fermi" GPU based boards will support 4GB of ECC memory. Also, they'll have about 2 TFLOPS of single-precision power, and you can stack 4 of them in a box = 8 TFLOPS beside your desk.
I can't wait until the US government starts banning these things because they could be used by terrorists to design nuclear weapons or something. 8)
"Some years ago, there was talk of building some huge fiber-optic ring around the Pacific, connecting a bunch of countries. The only telco in Australia at the time that could afford to buy into the project was Telstra. One of the VPs of Telstra was quoted as saying "we have sufficient bandwidth right now". Think about it: the VP of a telco couldn't quite understand the need to maintain exponential growth in bandwidth right when broadband was taking off. Thanks to morons like that overpaid suit, Australia has been bandwidth-starved for a decade, which is why you don't see that many truly "unlimited" plans or free WiFi access points like in other countries."
A good example of incompetence, a bad example however as far as a solution for your problem. A fiber ring would help little for communications confined within a continent.
It was going around the Pacific rim, as in the ocean, not Australia, the continent.
From a quick Google - it's based on the ARM core (easily licensable cpu core)
Must be a coincidence, but I was just thinking a week ago why nobody's tried to make a many-core CPU by doing a cookie-cutter job and just replicating a simple ARM core a bunch of times... looks like someone has!
I know somebody who works on network infrastructure for Telstra. I suggested to him that a lot of traffic which currently goes through wireless and wired LANs will soon run through the cellular networks. He was horrified at the idea. Apparently TCP/IP traffic from 3G cells has to go all the way back to the internet backbone, so anything resembling P2P still saturates the links between the base stations and the back end. That's a minor issue just now, but in addition the links to the 3G cells are only just keeping up with demand right now.
I pointed to the European environment where 3G data is much cheaper and more bandwidth is available. He says that we don't do that kind of investment here. So at the end of the day it's a money problem. Lots of profit being taken while they can get away with it.
Yeah, I love the lack of forward planning by Telcos in Australia.
Some years ago, there was talk of building some huge fiber-optic ring around the Pacific, connecting a bunch of countries. The only telco in Australia at the time that could afford to buy into the project was Telstra. One of the VPs of Telstra was quoted as saying "we have sufficient bandwidth right now". Think about it: the VP of a telco couldn't quite understand the need to maintain exponential growth in bandwidth right when broadband was taking off. Thanks to morons like that overpaid suit, Australia has been bandwidth-starved for a decade, which is why you don't see that many truly "unlimited" plans or free WiFi access points like in other countries.
Wouldn't be the first time, except maybe for AT&T.
I don't think that it's limited to just AT&T - I am in Australia, so have never even had to deal with them, but I am finding that in the vast majority of Australian companies as well, simple back to basics work quality is plummeting. Everything seems to be about making everything as cheap as possible - whether or not it even functions the way it is supposed to. That also goes for the majority of customer service dealings as well.
It seems that the "Do it once but do it properly" mentality is limited to very few people and businesses. I work as a business analyst, and I have to do a lot of arguing on each project to get extra money spent to do things properly (the majority of the time it saves money in the long run anyhow for other projects, and I am not even taking the maintenance and support savings into that equation), yet I seem to always have to fight the same battles over and over.
There's a simple reason for that: money is trivial to measure. Quality is much harder to measure. For example, failure rates like MTBFs often don't translate directly into straight dollars and cents, but into a small percentage chance that something might cost a large but unknown amount at some point in the future. This kind of thing confuses people, so they stick to the simple stuff. In an Excel spreadsheet, the solution that costs fewer up-front dollars is just "better" in the world view of most people.
I've had a conversation recently with the CIO of a major business who didn't quite understand why backups were worthwhile. He said something along the lines of "how does this help the business sell more widgets?".
I see the same thing, but often much worse, in big government or big bureaucracies. Project management is complex, so to simplify things, they just ignore the rest of the business or potential future requirements like they don't even exist. In the past, I've tried to point out that, say, with an additional 10% spend on one project they could halve the cost of a dozen future projects, but that's basically crazy talk to a project manager that has to minimize the cost of this project, right now. I've given up trying, and I bet a lot of other people have too.
Microsoft can really change things around if they decided to port Win7 to ARM, instead of offering only Windows CE.
But considering monopolies, I wouldn't expect that any time soon.
People generally use Windows on PCs because they have x86 Windows software they need to run.
How many people have a stack of ARM software to run on ARM Windows? If you're going to need new software anyway, why would anyone in their right mind pick Windows to run it on?
Because 6 months before you can even buy "Windows 8 - ARM Edition", Microsoft will have released a Visual Studio patch that enables "ARM" as a target alongside the existing x86/x64/Itanium platforms. Both .NET and Java will have runtimes ported as well. Converting 32-bit code from one CPU to another is much easier than going from 32-bit to 64-bit, so it wouldn't take very long for vendors to update their software for it. Also, Microsoft strongarms ISVs into compatibility. For example, it's often hard (or harder) to get "Windows Logo" certifications for software unless it works on various platforms.
By the time an ARM-compatible Windows is released, there would be thousands of titles compatible with it.
Seriously. I actually like iTunes, but damn is it a resource hog. Sometimes it will chew up 90%+ of CPU for no apparent reason. It will often be unresponsive to clicks for a couple seconds. I am not sure what is so complicated about a music player that causes this.
And then every time it asks me for an upgrade, it insists on installing Quicktime and other things that I don't want on my PC.
I don't use Macs, but wonder if all of Steve's apps behave this way...
I actually need and use iTunes (to talk to my iPhone), but one thing that shits me to no end is that every time I get a point-release update of iTunes, it installs two hidden "on startup" items. I have to use the 'msconfig' tool to get rid of them every bloody time.
Programs should really stop the habit of silently installing background processes that mostly do nothing except slow down the computer's boot time.
For example, since Vista, Windows has had a great task scheduler API that lets developers schedule system tasks like "check for update" on lots of complex criteria, such as "30 minutes after the PC goes idle". That way, the processes are only run once per machine (not user), don't slow down the boot, and can close to conserve memory after the check is done.
And don't get me started with the hideous piece-of-s*** that is Bonjour, which is a system service installed by iTunes that intercepts and modifies DNS requests. It opens your computer to vulnerabilities and has broken some apps. A music player has absolutely no business fucking around with system-wide DNS.
Every time someone complains that their machine is 'slow', it's either a virus, or I just use msconfig to disable the 50 startup processes installed by crap like iTunes. Miraculously, it turns out that there was nothing wrong with their hardware after all.
I had the same experience, but it took me a lot less time & money to reproduce it!
Some guy at work got a deal on Sennheiser open headphones, and I picked up a HD 595 for about AUD 300 back when they were about 450 in stores. I used a Sony amp I already had, and now my PC is set up so that the output from the PC goes to the amp over digital optics, and then I just plug the headphone into the amp. That eliminates the static and noise from the PC motherboard, otherwise the only function of the amp is that it provides a convenient volume control knob.
I used to have 'proper' speakers and amps for my PC, but the headphones are a night & day difference, even to my untrained ear. I immediately noticed subtle details in audio, and I can now easily hear compression artifacts that I couldn't detect before.
At the time, I was playing Diablo II, and I noticed little things in the audio I couldn't make out before. For example, the blacksmith in the second stage throws her hammer up in the air and catches it. There's a tiny little 'slap' sound when she catches the handle, which I just couldn't hear before.
I figured then that audio in some good quality games is a lot like the visuals: if you play a game designed for 24-bit color on a 16-bit display, you're going to be missing the intention of the artist. The same goes for sound, artists would be using good equipment, and the sounds will have subtle nuances you can't hear without at least decent audio equipment. If you use the $5 speakers that came with your PC, you just won't have the same experience.
The advantages of SANs are easy to realize. It need not necessarily be FibreChannel vs NAS (NFS/CIFS), as a SAN could be iSCSI, FCoE, FCIP, FICON, etc.
-Storage Consolidation compared with internal disk.
-Fewer components in your servers that can break.
-Server admins don't have to focus on Storage except at the VolMgr/Filesystem level
-Higher Utilization (a WebServer might not need 500GB of internal disk).
-Offloading storage based functions (RAID in the array vs RAID on your server's CPU; I'd rather the CPU perform application work than calculate parity, replace failed disks, etc). This benefit increases when you want to replicate to a DR site.
This is not a ZFS vs SANs argument. I think ZFS running on SAN based storage is a great idea as ZFS replaces/combines two applications that are already on the host (volmgr & filesystem).
I swear, I used to believe in this stuff, but I'm starting to see that it's more marketing myth than technical wizardry.
I love the way that RAID used to stand for Redundant Array of Inexpensive Disks, but according to EMC and their cohorts, it now stands for Redundant Array of Independent Disks. Notice the way they dropped the problematic "inexpensive" part?
It's one thing to reduce your costs by consolidating local disks onto cheaper networked disks, but my experience is that SAN arrays usually cost more than internal disk, even though they should be cheaper.
The genius of ZFS is that it's a return to the "inexpensive" part of RAID. An administrator can take a bunch of "low-end" SATA disks, apply some ZFS magic, and end up with performance and reliability numbers that would make your jaw drop. I've seen benchmarks that claim that a SUN Thumper, a single box of a mere 4 rack units, can do 1GB/sec of IO throughput. Not 1 gigabit per second, but 1 gigabyte per second. That's faster than the best 8Gb FC SANs!
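To put the bits-versus-bytes comparison in numbers, here's a back-of-the-envelope conversion. It assumes a nominal 8 Gbit/s line rate and 8b/10b encoding (8 payload bits per 10 bits on the wire) for a single FC link, and it ignores framing overhead and multi-port arrays, so treat it as a rough figure for one link only.

```python
def payload_gbytes_per_sec(line_rate_gbits: float,
                           encoding_efficiency: float = 0.8) -> float:
    """Approximate payload throughput of one link in gigabytes per second.

    encoding_efficiency=0.8 is the assumed 8b/10b overhead.
    """
    return line_rate_gbits * encoding_efficiency / 8


fc_link = payload_gbytes_per_sec(8)   # one nominal 8Gb FC link
thumper = 1.0                         # the benchmark figure quoted above
print(f"single 8Gb FC link: ~{fc_link:.1f} GB/s, Thumper claim: {thumper:.1f} GB/s")
```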
More importantly, any competent admin can manage a Solaris ZFS filer. The ZFS command line utilities are simple to use, and I say this as a self-confessed "Windows Server Administrator". Compare this to most SAN arrays, which are so complex that most enterprises won't allow anyone but a "certified administrator" to even touch them.
Single instancing is a big step towards ending the dominance of the big players on the storage market. As soon as someone creates a software RAID like ZFS that has integrated controller redundancy instead of just storage redundancy, the era of traditional FC SANs is over. There's essentially nothing that a "hardware" RAID does other than controller redundancy that hasn't been already implemented in software. It's just a matter of time now...
Windows Storage Server 2003 (yes, yes I know it's from Microsoft) shipped with this feature (that is called Single Instance Storage)
http://blogs.technet.com/josebda/archive/2008/01/02/the-basics-of-single-instance-storage-sis-in-wss-2003-r2-and-wudss-2003.a
It's not even close to the same thing.
We investigated this a while back, and it is basically a dirty, filthy hack on top of vanilla NTFS.
First of all, it doesn't compare blocks or byte-ranges, but entire files only. If two files are 99% identical, then they are different, and SIS won't merge them.
Second, it uses a reparse point to merge the files, which has significant overhead, at least 4KB for each file, if I remember correctly. That is, SIS won't save you any disk space for small files, which is actually quite common on file servers. The overhead erases much of the benefit even for larger files, to the level that SIS will skip files smaller than 32KB by default.
Third, it operates in the background, after files have been written. This means that files have to be written out in their entirety, read back in, compared byte-for-byte to another file, and then erased later. This is incredibly inefficient. On large file servers, the disk was thrashed like crazy.
Lastly, we found that the Copy-on-Write mechanism immediately copied out the entire file if it was changed even slightly. For small files, this is not noticeable, but for large files this can be a massive performance hog. A 4KB write can potentially be translated into a multi-GB copy!
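To put a number on that write amplification, here is a small Python sketch. The 2 GiB file size and the 128 KiB block size are hypothetical figures picked for illustration, and the block-level case assumes the write is aligned to a block boundary.

```python
def bytes_copied_on_write(file_size, write_size, block_size=None):
    """Bytes that copy-on-write has to copy when write_size bytes change.

    block_size=None models whole-file CoW (the behaviour described above);
    otherwise only the blocks touched by an aligned write are copied.
    """
    if block_size is None:
        return file_size
    blocks_touched = -(-write_size // block_size)  # ceiling division
    return min(file_size, blocks_touched * block_size)


KIB, GIB = 1024, 1024 ** 3
file_size = 2 * GIB   # hypothetical large file
write_size = 4 * KIB  # the small write from the example above

whole_file = bytes_copied_on_write(file_size, write_size)
per_block = bytes_copied_on_write(file_size, write_size, block_size=128 * KIB)

print(f"whole-file CoW copies {whole_file // write_size}x the bytes written")
print(f"block-level CoW copies {per_block // write_size}x the bytes written")
```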
Proper single-instancing systems use in-memory hash tables that are often partitioned using "file similarity" heuristics to prevent cache thrashing. Even more advanced systems can maintain single-instancing during replication and backups, reducing bandwidth requirements enormously. Take a look at the features of the Data Domain filers for an idea of what the current state of the art is.
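For contrast with whole-file SIS, here is a toy sketch of block-level single-instancing with an in-memory hash table, done inline at write time. It is only an illustration under simplifying assumptions: fixed-size blocks, SHA-256 digests as keys, everything held in RAM, and none of the "file similarity" partitioning mentioned above. The class and method names are invented, and this is not how any particular product works.

```python
import hashlib


class BlockStore:
    """Toy inline deduplicating store: identical blocks are kept once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of digests

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            # Only store the block if this digest has not been seen before.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


# Two files that are 99% identical: only the last block differs.
file_a = b"".join(bytes([i]) * 4096 for i in range(100))
file_b = file_a[:-4096] + b"\xff" * 4096

store = BlockStore()
store.write("a.bin", file_a)
store.write("b.bin", file_b)

logical = len(file_a) + len(file_b)
print(f"logical bytes: {logical}, physically stored: {store.stored_bytes()}")
```

A whole-file scheme would treat these two files as completely different and store both in full.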
I assure you that I can maintain my grip on my sanity even in the face of the most advanced heuristics! 8)
Also just found another one, but this is a bit heavy on the maths:
Network information flow - R Ahlswede, N Cai, SYR Li, RW Yeung, 2000
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.4568&rep=rep1&type=pdf
I think this was the original paper that pointed out that it's possible to exceed the naive "optimal broadcast" efficiency of a single source on a switched network, by allowing intermediate nodes to perform some computation.
Mind you, this is only tangentially related to current P2P systems, but it's still interesting.
It wouldn't even require the cooperation of ISPs.
As the AC post just above yours pointed out, it's fairly simple to scrape this information from a wide variety of sources. It would be sufficient for someone to update a simple table once every few months. Even if it was baked into the uTorrent executable, it would get updated reasonably often along with the regular point releases.
Keep in mind also that most P2P clients represent a large set of distributed cooperating applications that could analyze and monitor the internet from "their perspective" and exchange routing efficiency data.
Perfection is not only not required, but not useful, because some randomness is still required to ensure that the P2P mesh is robust in the face of a partial failure somewhere.
would you care to provide links to these papers? i'm sure you've read quite a few that back this up otherwise you wouldn't be arguing this position.
I only have vague memories of reading some related stuff a while back (I'm not exactly keeping tabs of this stuff for research or anything), but I remember that there was a Slashdot post a while back about this paper from Microsoft Research:
Network Coding for Large Scale Content Distribution
http://research.microsoft.com/pubs/67246/tr-2004-80.pdf
It basically says that there's an even more efficient form of P2P, where the blocks are optimally encoded using something akin to a huge "RAID 5" type parity algorithm, so that every peer can help every other peer, immediately. From what I gather, this doesn't quite work in practice, not just because of the computational requirements, but because this algorithm has issues with untrusted peers -- users can only validate complete files, or something like that. The Slashdot post where it was mentioned had some good discussion and links to P2P design and issues, and I believe one or two developers who've worked on well-known P2P systems chimed in with some good points. One guy mentioned that the reconstitution algorithm thrashes the disk, for example.
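To illustrate just the "RAID 5" analogy (not the paper's actual scheme, which as far as I understand it mixes blocks using random linear combinations rather than a single parity block), here is a minimal XOR parity sketch in Python. The function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(known_blocks, parity):
    """Rebuild the one missing block from the others plus the parity block."""
    return xor_blocks(list(known_blocks) + [parity])


blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(blocks)

# A peer holding blocks 0 and 2 plus the parity block can rebuild block 1.
rebuilt = recover_missing([blocks[0], blocks[2]], parity)
print(rebuilt == blocks[1])
```

The point of the analogy is that a coded block is useful to any peer that is missing any one of the inputs, instead of only to peers missing that specific block.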
From the reading I've done, it looks like traditional P2P (random block exchange) is surprisingly good, as long as the peers are chosen well. I actually studied some P2P protocols at work as a potential method for distributing files to an enterprise with thousands of sites, but I couldn't find any at the time that could be embedded into a larger system as a module and could make intelligent peer choices. Most of the ones I tried kept chewing up WAN bandwidth instead of preferring local LAN bandwidth. Note that uTorrent does detect peers on the same subnet, but not adjacent subnets.
I can't find the link any more, but I remember that there was a paper recently that proved mathematically that certain types of data transfer mechanisms like client-server can never utilize more than a certain percentage of the total theoretical bandwidth of a network like the internet, but P2P does a lot better, and P2P with network coding is basically optimal, or close to it, but still not 100%.
I agree. I see other peers on the same ISP as me, and others on another ISP that is also based in the same city as me, and yet it doesn't connect to them.
Coding a way so you can manually prioritise that peer or domain would be easy to do.
I see this all the time too. It shits me to no end that I could be connecting to users with 10Mbit uplinks in the same city, but uTorrent blindly connects to peers in places like Hungary which is almost precisely the furthest possible distance from me.
Your post is precisely what is wrong: It's all about what you get out of an individual download.
Not everything in the world is about directly maximizing YOUR OWN PERSONAL BENEFIT. More importantly, you can actually improve your own personal speeds by cooperating. Read up on the Tragedy of the Commons. If many players all blindly try to maximize their personal utilization of a common resource, they all suffer as an aggregate.
This is particularly true of peer-to-peer protocols, which are ideally placed to utilize otherwise wasted local bandwidth. I've read papers that show that an efficiently designed P2P protocol can actually maximize the return on investment of a switched network, a feat that essentially no other type of protocol can achieve, largely because a well designed P2P protocol can minimize the amount of data flowing through inter-network or international links.
Why would anyone support this kind of behavior?
Because I do not want to make the job of the MAFIAA any easier.
You do realise that the RIAA and the MPAA represent Recording Artists and Movie Producers, respectively, right?
Neither group is an ISP. Neither invests billions into internet infrastructure.
If peer-to-peer users would just play nice and use the ISP infrastructure efficiently, then maybe the ISPs wouldn't be so inclined to side with the content producers.
You might even find that if digital content distribution is done right, then the ISPs might start to push for laws similar to the "copyright tax" on writeable media to allow their users to legally download content.
Keeping traffic completely local would make it much easier to snag a bunch of file sharers in a massive "three strikes and you're out" campaign, don't you think? Since mere use of torrent software seems to be associated with illicit activity in the minds of the ignorant (ie. the authoRIAAties), I'm not sure that "I was just downloading the latest Ubuntu ISO" would be enough to avoid being threatened by the ISP. Lots of local inter-ISP torrent traffic might also cause them to alert local law enforcement to take a closer look. This could increase one's risk significantly, particularly if any 'infringing' content is ever shared (by an occasional, less enlightened, user of the connection, for example). Seems safer to not have to worry about local/non-local bandwidth, to be honest. Might be smarter to prefer connections that are as non-local and non-concentrated as possible. It's not always just about data transfer speed and bandwidth saving - there are other factors to consider.
[citation needed]
Keep in mind that in large part, the motivation of ISPs for monitoring or throttling bittorrent is not concerns over copyright violations, but the impact to their bottom line. All ISPs have three classes of links: Internal, peered, and external. They have a strong preference to maximize the utilization of the former over the latter, as internal links are effectively free and often underutilized, while external links are often very expensive and overloaded.
If torrent traffic utilized internal connections much more than external connections, ISPs wouldn't be financially motivated to monitor at all, because monitoring equipment is expensive. Right now, monitoring and throttling is worth it, because bittorrent tends to use external links the majority of the time.
In effect, improving efficiency would improve security.
It doesn't have to be reliable, it just has to be better than "totally random". Even a very bad peer selection policy would be a HUGE improvement over what they have now.
Prefix bits do not indicate location. 2 Class C's can be a long way from each other geographically. Even if the entire Internet was broken down into Class C spaces, and you prioritised addresses in your Class C, I don't think you would see many hits. I mean, there may be 50k people on the torrent, but how many of them are in the same neighbourhood as you?
That's why the Vuze plugin uses an IP->location mapping database.
True, but it's still better than random. Many countries were allocated IP blocks from large ranges. Most of Australia's IP addresses start with prefixes around 200-something, for example. Similarly, most ISPs have large blocks allocated to them like /8 ranges or the like. Some ISPs are big enough that torrent users could have 10 or more connections to peers in the same ISP for reasonably common files like TV shows, and only need 1 or 2 to the outside world.
Still, you're correct, adding even a simple country database would help a lot. There are easily obtainable databases of "AS" numbers that map IP ranges to organizations and/or countries, and embedding that into the client would also be a fairly simple exercise.
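As a rough idea of what an embedded lookup could look like, here is a Python sketch. The table is entirely hypothetical: the prefixes are reserved documentation ranges, the AS numbers come from the documentation range, and the country codes are arbitrary. A real client would load a table scraped from public registry data, as suggested above.

```python
import ipaddress

# Hypothetical table for illustration only: (prefix, (AS number, country)).
PREFIX_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), ("AS64500", "AU")),
    (ipaddress.ip_network("198.51.100.0/24"), ("AS64501", "AU")),
    (ipaddress.ip_network("203.0.113.0/24"), ("AS64502", "HU")),
]


def lookup(ip):
    """Return (asn, country) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, info) for net, info in PREFIX_TABLE if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


def locality_score(my_ip, peer_ip):
    """2 = same organization (AS), 1 = same country, 0 = unknown or elsewhere."""
    mine, theirs = lookup(my_ip), lookup(peer_ip)
    if mine is None or theirs is None:
        return 0
    if mine[0] == theirs[0]:
        return 2
    if mine[1] == theirs[1]:
        return 1
    return 0


print(locality_score("192.0.2.10", "192.0.2.99"))    # same AS
print(locality_score("192.0.2.10", "198.51.100.5"))  # same country
print(locality_score("192.0.2.10", "203.0.113.7"))   # elsewhere
```

A linear scan is fine for a sketch; a table with thousands of prefixes would want a sorted list or a trie instead.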
THEIR arrogance is astounding? How about yours? They are working FOR FREE. You are merely complaining. Get your hands dirty and start doing some work yourself.
You can suggest things all you want, but once you start insulting someone for their free work, you've crossed a line. Nobody is forced to use their client. There are dozens of decent clients and probably hundreds of open source ones.
As for their choices, they will work on what's more important to them, I'm sure. Since they don't need this 'local' feature, they haven't got much incentive to actually work on it.
First of all, they're not working for 'free', uTorrent is owned by BitTorrent Inc, a for-profit company. Initially it was free, but it's now developed by a corporation. Those devs are salaried employees.
More importantly, uTorrent depends on and uses infrastructure that is not free, by any stretch of the imagination. International links are $billions expensive.
So by your logic, just because a user can download their client for free, it gives BitTorrent Inc carte blanche to do anything at all they want, including shit all over the internet infrastructure?
How the fuck does it make sense for a company whose product uses something like 30% of the total internet bandwidth to not make an hour's worth of effort to minimize their impact on said infrastructure? Their product in its present state is so harmful that ISPs are buying millions of dollars worth of equipment to throttle it, and with good reason.
Read up on the Tragedy of the Commons and get a clue.
Compare their behavior to the largely free, open, and volunteer efforts of the dedicated people who worked on the early Internet protocols like DNS and NNTP. These were systems designed to scale, use bandwidth efficiently, and 'play nice'.
What happened since then? Why is it acceptable now to design a protocol that is maximally inefficient? Why would anyone support this kind of behavior?
There's a much bigger issue with uTorrent that the developers seem to refuse to solve, or even acknowledge.
In essence, uTorrent connects to clients randomly, and makes no attempt to prioritize "nearby" clients. This may not be a huge issue for Americans, but everywhere else, you know, like the rest of the fucking planet, this is hugely inefficient, for both the end users, and most importantly, ISPs. This is why they're throttling bittorrent: because it tends to make connections to peers outside the ISP's internal network, which costs ISPs money. In Australia for example, international bandwidth is extremely limited and very expensive, but local bandwidth, even between ISPs, is essentially unlimited, high-speed, and often free or 'unmetered'.
What do you think is going to be faster: connecting to your neighbour through the same fucking router, or some kid's home PC in Kazakhstan over 35 hops away? Even connections from here to America have to go through thousands of miles of fiber optic cable over an ocean.
Note that some other clients like Azureus have already implemented weighted peer choices, where peers with similar IP addresses are preferred over other peers. It's not hard. Heck, it's a trivial change to make, as no changes need to be made to the protocol itself. A reasonably competent programmer could implement this in an hour: simply take the user's own IP address, and then sort the IPs of potential peers by the number of prefix bits in common, then do a random selection from that list, weighted towards the best-matching end. How hard is that?
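Here is roughly what I mean, as a Python sketch. This is my own illustration of the idea and not code from Azureus or any other client; it handles IPv4 only, and the linear rank-based weighting is an arbitrary choice.

```python
import ipaddress
import random


def common_prefix_bits(ip_a, ip_b):
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    return 32 - (a ^ b).bit_length()


def choose_peers(my_ip, candidates, count, rng=random):
    """Pick up to `count` peers at random, weighted towards close prefixes."""
    # Sort so that the best-matching peers come first.
    pool = sorted(candidates,
                  key=lambda ip: common_prefix_bits(my_ip, ip),
                  reverse=True)
    chosen = []
    while pool and len(chosen) < count:
        # Rank-based weights: the first entry is the most likely pick, but
        # every peer keeps a non-zero chance so the swarm stays well mixed.
        weights = [len(pool) - rank for rank in range(len(pool))]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        chosen.append(pick)
    return chosen


peers = ["203.0.113.9", "203.0.113.200", "198.51.100.7", "192.0.2.44"]
print(choose_peers("203.0.113.5", peers, count=2))
```

No protocol change is needed, because this only affects which of the already-known peers the client tries to connect to first.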
The arrogance of the uTorrent devs is simply staggering. They're a group of developers who could, with an hour's effort, reduce international bandwidth usage by double-digit percentages and improve torrent download speeds by an order of magnitude, but they just... don't.
I love exams like that, where there is no simple 'right or wrong' answer, but it's a competition against the rest of the class.
At the University of New South Wales, I had a similar experience in my first year of studying Computer Science. We had this great professor who was a bit more inventive than usual with exams and projects.
We started the first year studying a pure functional language called Haskell. A mere 4 weeks into the semester, just when we had figured out what 'functional' meant, and some of us could write a recursive loop, we got our first project: write a program to do optical character recognition (OCR). My jaw hit the floor when I heard that. Our marks were decided largely by the recognition rate on a test set generated from randomly sized characters picked from a huge set of fonts. Mind you, the problem was substantially simplified from a 'real' OCR program, but still, getting above 85% was hard. If your program could only recognize less than 50% of the test characters, you got that as your mark, which was the first step towards failing the course.
The second project was even better: we were told to write an AI to play the card game Hearts. As a part of the project materials, we were given a "game server" that could play a set of AIs against each other, so we could trial our AIs ourselves. The marking was evil: the professor ran our AIs randomly against each other for a large number of games, and ranked them by the number of wins. The AI that won the most games received 100%, and the lowest received under 50%.
I heard about other projects that he did for other years. I think one group had to write an AI to play a game similar to Monopoly, again with 4 AIs per game.
No, ADSL in Australia is not > 10Mbit/sec to most people.
Whirlpool's latest survey showed that half the connections are running at less than 10Mbit/sec.
Yeah, but that's today. The national broadband system won't be rolled out for at least a decade, by which time that will have improved. A lot. Telstra is about to start rolling out DOCSIS 3.0 for their cable broadband, which starts at around 40 Mbps, and can go much higher.
Ha! I'll believe that when I'm connected to it.
Sounds about right... the Australian government is notorious for under delivering. Expect this roll-out to complete in a decade, by which time the average consumer will have 10 gigabits wired directly into their brain.
The Tesla C1060 processor boards sound like a very efficient way of packing in compute power, but unless they're neglecting to mention it, the 4GB of GDDR3 RAM each has on board has no error correction. Given the rates of correctable errors observed e.g. here, I could never recommend using it for computing simulations that matter. A flipped bit in a floating point number can have a disproportionate effect on the outcome of calculations that rely upon it, and short of running the whole simulation a second or third time, one couldn't be confident that such an error did not occur.
Large compute-intensive simulations can take weeks, and are used to justify engineering and business decisions that involve the disposition of large amounts of money and other resources — it is important that the computational part of the process can be relied upon.
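To show how disproportionate a single flipped bit can be, here is a small Python sketch that flips one bit in the IEEE 754 double-precision encoding of a number. It only models the bit flip itself, not GDDR3 or any real memory error rate, and the helper name is made up.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of a 64-bit IEEE 754 double."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))
    (flipped,) = struct.unpack(">d", struct.pack(">Q", raw ^ (1 << bit)))
    return flipped


x = 1.0
print(flip_bit(x, 0))   # lowest mantissa bit: a tiny relative error
print(flip_bit(x, 52))  # lowest exponent bit: the value is halved
print(flip_bit(x, 62))  # highest exponent bit: the value becomes infinity
print(flip_bit(x, 63))  # sign bit: the value is negated
```

The same physical event is either harmless noise or a total wipe-out of the result, depending purely on which bit it hits.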
Which is why the upcoming NVIDIA "Fermi" GPU based boards will support 4GB of ECC memory. Also, they'll have about 2 TFLOPS of single-precision power, and you can stack 4 of them in a box = 8 TFLOPS beside your desk.
I can't wait until the US government starts banning these things because they could be used by terrorists to design nuclear weapons or something. 8)
"Some years ago, there was talk of building some huge fiber-optic ring around the Pacific, connecting a bunch of countries. The only telco in Australia at the time that could afford to buy into the project was Telstra. One of the VPs of Telstra was quoted as saying "we have sufficient bandwidth right now". Think about it: the VP of a telco couldn't quite understand the need to maintain exponential growth in bandwidth right when broadband was taking off. Thanks to morons like that overpaid suit, Australia has been bandwidth-starved for a decade, which is why you don't see that many truly "unlimited" plans or free WiFi access points like in other countries."
A good example of incompetence, but a bad example as far as a solution to your problem goes. A fiber ring would help little for communications confined within a continent.
It was going around the Pacific rim, as in the ocean, not Australia, the continent.
From a quick Google - it's based on the ARM core (an easily licensable CPU core).
Must be a coincidence, but I was just thinking a week ago why nobody's tried to make a many-core CPU by doing a cookie-cutter job and just replicating a simple ARM core a bunch of times... looks like someone has!
I know somebody who works on network infrastructure for Telstra. I suggested to him that a lot of traffic which currently goes through wireless and wired LANs will soon run through the cellular networks. He was horrified at the idea. Apparently TCP/IP traffic from 3G cells has to go all the way back to the internet backbone, so anything resembling P2P still saturates the links between the base stations and the back end. That's a minor issue just now, but in addition the links to the 3G cells are only just keeping up with demand right now.
I pointed to the European environment where 3G data is much cheaper and more bandwidth is available. He says that we don't do that kind of investment here. So at the end of the day it's a money problem. Lots of profit being taken while they can get away with it.
Yeah, I love the lack of forward planning by Telcos in Australia.
Some years ago, there was talk of building some huge fiber-optic ring around the Pacific, connecting a bunch of countries. The only telco in Australia at the time that could afford to buy into the project was Telstra. One of the VPs of Telstra was quoted as saying "we have sufficient bandwidth right now". Think about it: the VP of a telco couldn't quite understand the need to maintain exponential growth in bandwidth right when broadband was taking off. Thanks to morons like that overpaid suit, Australia has been bandwidth-starved for a decade, which is why you don't see that many truly "unlimited" plans or free WiFi access points like in other countries.
Wouldn't be the first time, except maybe for AT&T.
I don't think that it's limited to just AT&T - I am in Australia, so have never even had to deal with them, but I am finding that in the vast majority of Australian companies as well, simple back to basics work quality is plummeting. Everything seems to be about making everything as cheap as possible - whether or not it even functions the way it is supposed to. That also goes for the majority of customer service dealings as well.
It seems that the "Do it once but do it properly" mentality is limited to very few people and businesses. I work as a business analyst, and the amount of arguing I have to do on each project to get extra money spent to do things properly is considerable. The majority of the time it saves money in the long run for other projects anyhow (and that's without even taking the maintenance and support savings into account), yet I seem to always have to fight the same battles over and over.
There's a simple reason for that: money is trivial to measure. Quality is much harder to measure. For example, failure rates like MTBFs often don't translate directly into straight dollars and cents, but into a small percentage chance that something might cost a large but unknown amount at some point in the future. This kind of thing confuses people, so they stick to the simple stuff. In an Excel spreadsheet, the solution that costs fewer up-front dollars is just "better" in the world view of most people.
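Even a crude expected-cost estimate changes the picture. The Python sketch below uses entirely made-up numbers (prices, failure probabilities and outage cost are all hypothetical), and it assumes independent years with a fixed yearly failure probability, which is a big simplification.

```python
def expected_total_cost(upfront, annual_failure_prob, failure_cost, years):
    """Up-front price plus the expected cost of failures over the lifetime."""
    return upfront + years * annual_failure_prob * failure_cost


# Hypothetical figures for illustration only.
cheap = expected_total_cost(upfront=10_000, annual_failure_prob=0.05,
                            failure_cost=200_000, years=5)
solid = expected_total_cost(upfront=25_000, annual_failure_prob=0.01,
                            failure_cost=200_000, years=5)

print(f"cheap option: ~{cheap:,.0f} expected over 5 years")
print(f"solid option: ~{solid:,.0f} expected over 5 years")
```

The spreadsheet that only has the up-front column picks the first option every time.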
I've had a conversation recently with the CIO of a major business who didn't quite understand why backups were worthwhile. He said something along the lines of "how does this help the business sell more widgets?".
I see the same thing, but often much worse, in big government or big bureaucracies. Project management is complex, so to simplify things, they just ignore the rest of the business or potential future requirements like they don't even exist. In the past, I've tried to point out that, say, with an additional 10% spend on one project they could halve the cost of a dozen future projects, but that's basically crazy talk to a project manager that has to minimize the cost of this project, right now. I've given up trying, and I bet a lot of other people have too.
Microsoft can really change things around if they decided to port Win7 to ARM, instead of offering only Windows CE.
But considering monopolies, I wouldn't expect that any time soon.
People generally use Windows on PCs because they have x86 Windows software they need to run.
How many people have a stack of ARM software to run on ARM Windows? If you're going to need new software anyway, why would anyone in their right mind pick Windows to run it on?
Because 6 months before you can even buy "Windows 8 - ARM Edition", Microsoft will have released a Visual Studio patch that enables "ARM" as a target alongside the existing x86/x64/Itanium platforms. Both .NET and Java will have runtimes ported as well. Converting 32-bit code from one CPU to another is much easier than going from 32-bit to 64-bit, so it wouldn't take very long for vendors to update their software for it. Also, Microsoft strongarms ISVs into compatibility. For example, it's often hard (or harder) to get "Windows Logo" certifications for software unless it works on various platforms.
By the time an ARM-compatible Windows is released, there would be thousands of titles compatible with it.
Seriously. I actually like iTunes, but damn is it a resource hog. Sometimes it will chew up 90%+ of CPU for no apparent reason. It will often be unresponsive to clicks for a couple seconds. I am not sure what is so complicated about a music player that causes this.
And then every time it asks me for an upgrade, it insists on installing Quicktime and other things that I don't want on my PC.
I don't use Macs, but wonder if all of Steve's apps behave this way...
I actually need and use iTunes (to talk to my iPhone), but one thing that shits me to no end is that every time I get a point-release update of iTunes, it installs two hidden "on startup" items. I have to use the 'msconfig' tool to get rid of them every bloody time.
Programs should really stop the habit of silently installing background processes that mostly do nothing except slow down the computer's boot time.
For example, since Vista, Windows has had a great task scheduler API that lets developers schedule system tasks like "check for update" on lots of complex criteria, such as "30 minutes after the PC goes idle". That way, the processes are only run once per machine (not user), don't slow down the boot, and can close to conserve memory after the check is done.
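As a sketch of the idea, an installer could register an idle-triggered task through the `schtasks` command line tool instead of dropping a startup item. The Python below only builds the command; the task name and program path are hypothetical, and actually running it needs Windows and an elevated prompt. It uses the command line front end rather than the API itself, so take it as an approximation of what the API allows.

```python
def idle_update_task_command(task_name, program, idle_minutes=30):
    """Build a schtasks command that runs `program` once the PC has been idle."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,          # task name (hypothetical in this example)
        "/TR", program,            # program to run
        "/SC", "ONIDLE",           # trigger when the machine goes idle
        "/I", str(idle_minutes),   # minutes of idle time before starting
        "/RU", "SYSTEM",           # one task per machine, not per user
    ]


command = idle_update_task_command("ExampleUpdateCheck",
                                   r"C:\Example\update_check.exe")
print(" ".join(command))
# On Windows this list could be passed to subprocess.run(command, check=True).
```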
And don't get me started with the hideous piece-of-s*** that is Bonjour, which is a system service installed by iTunes that intercepts and modifies DNS requests. It opens your computer to vulnerabilities and has broken some apps. A music player has absolutely no business fucking around with system-wide DNS.
Every time someone complains that their machine is 'slow', it's either a virus, or I just use msconfig to disable the 50 startup processes installed by crap like iTunes. Miraculously, it turns out that there was nothing wrong with their hardware after all.
I had the same experience, but it took me a lot less time & money to reproduce it!
Some guy at work got a deal on Sennheiser open headphones, and I picked up a HD 595 for about AUD 300 back when they were about 450 in stores. I used a Sony amp I already had, and now my PC is set up so that the output from the PC goes to the amp over digital optics, and then I just plug the headphone into the amp. That eliminates the static and noise from the PC motherboard, otherwise the only function of the amp is that it provides a convenient volume control knob.
I used to have 'proper' speakers and amps for my PC, but the headphones are a night & day difference, even to my untrained ear. I immediately noticed subtle details in audio, and I can now easily hear compression artifacts that I couldn't detect before.
At the time, I was playing Diablo II, and I noticed little things in the audio I couldn't make out before. For example, the blacksmith in the second stage throws her hammer up in the air and catches it. There's a tiny little 'slap' sound when she catches the handle, which I just couldn't hear before.
I figured then that audio in some good quality games is a lot like the visuals: if you play a game designed for 24-bit color on a 16-bit display, you're going to be missing the intention of the artist. The same goes for sound, artists would be using good equipment, and the sounds will have subtle nuances you can't hear without at least decent audio equipment. If you use the $5 speakers that came with your PC, you just won't have the same experience.