HyperSCSI Examined
An anonymous reader writes "Eugenie Larson of byteandswitch.com has published a brief article that reviews the HyperSCSI protocol, which, like iSCSI, allows for an IP-based SAN. The twist of HyperSCSI is that it's open source and runs over raw Ethernet, avoiding the overhead of TCP/IP. The article has some comments from early adopters of HyperSCSI, as well as some comments from top vendors in the iSCSI industry."
I read somewhere that it's something like 5 times faster than SCSI over TCP/IP. Is that true? And how great is the sacrifice of not using TCP/IP? I mean, what doesn't support Ethernet these days?
TCP is a layer above IP. Hence, the two are not mutually exclusive.
You can have both "without the overhead of TCP/IP" and "IP based". All IP gets you is an address format and ARP-type standards; it's not a lot of overhead.
If you go to the MCSA site and look at the HyperSCSI FAQ, you'll see that it does implement reliability and flow control, just not in the same manner as TCP.
The only technical downside I can see (at this time) is that because the implementation isn't over IP, you can't traverse a router. This usually isn't a problem, but it could cause some inflexibility in larger deployments.
and
Obviously, it's so that when Linux does support it, legions of slashbots can complain about the duplicate stories!
From the article:
Read, L
Remember that there are other ways to do error recovery besides TCP. This system could detect errors by sending a CRC of the total packet/sector sent, and the receiving end would compute the same CRC and compare. Read-after-write would also detect bad data.
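A minimal sketch of the CRC idea, assuming a 512-byte sector and using CRC-32 from Python's standard `zlib` module. The function names are hypothetical and this is not HyperSCSI's actual wire format; it only shows that the receiver can recompute the checksum and catch a flipped bit without any help from TCP.

```python
import zlib

SECTOR_SIZE = 512  # assumed sector size, for illustration only

def send_sector(sector: bytes):
    """Sender side (hypothetical): ship the sector together with its CRC-32."""
    return sector, zlib.crc32(sector)

def sector_ok(sector: bytes, received_crc: int) -> bool:
    """Receiver side: recompute CRC-32 over what arrived and compare."""
    return zlib.crc32(sector) == received_crc

data, crc = send_sector(bytes(range(256)) * 2)   # 512 bytes of sample data
print(sector_ok(data, crc))                       # True

damaged = bytearray(data)
damaged[100] ^= 0x01                              # flip a single bit in transit
print(sector_ok(bytes(damaged), crc))             # False: CRC-32 catches every single-bit error
```

A failed check would then trigger a retransmit request, which is the job the protocol layer has to do in place of TCP.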
eric
The article does an abysmal job of covering this, but the homepage for HyperSCSI has a nice PDF presentation that covers just this topic. In short, it goes something like this:
The SCSI protocols already provide error checking.
The HyperSCSI layer adds flow control and retransmits.
Ethernet provides certain other checks.
So, in total, you have the same reliability as iSCSI and Fibre Channel with less overhead (i.e. the protocols overlap significantly in terms of error detection/correction).
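To make "flow control and retransmits" concrete, here is a generic sliding-window sketch in Python. It is an assumption-laden toy, not HyperSCSI's actual mechanism (the PDF describes its own scheme): the window caps how many unacknowledged blocks may be in flight, and any block that is not acknowledged is simply sent again on the next round. The `drop` loss model is hypothetical.

```python
from typing import Callable, List, Tuple

def send_blocks(
    blocks: List[bytes],
    window: int = 4,
    drop: Callable[[int, int], bool] = lambda seq, attempt: False,
    max_attempts: int = 5,
) -> Tuple[List[bytes], int]:
    """Deliver blocks over a lossy link with a bounded send window.

    window       -- max unacknowledged blocks in flight (sender-side flow control)
    drop         -- hypothetical loss model: drop(seq, attempt) -> True if lost
    max_attempts -- give up after this many transmissions of one block
    Returns the blocks as the receiver saw them (in order) and the total
    number of transmissions, including retransmits.
    """
    received = {}    # seq -> data that arrived (and was ACKed)
    in_flight = {}   # seq -> transmissions so far for a not-yet-ACKed block
    next_seq = 0
    transmissions = 0
    while len(received) < len(blocks):
        # Flow control: admit new blocks only while the window has room.
        while next_seq < len(blocks) and len(in_flight) < window:
            in_flight[next_seq] = 0
            next_seq += 1
        # Send every outstanding block; anything not ACKed is resent next round.
        for seq in list(in_flight):
            attempt = in_flight[seq]
            if attempt >= max_attempts:
                raise RuntimeError(f"block {seq} lost {max_attempts} times")
            in_flight[seq] = attempt + 1
            transmissions += 1
            if not drop(seq, attempt):
                received[seq] = blocks[seq]
                del in_flight[seq]   # the ACK frees a window slot
    return [received[i] for i in range(len(blocks))], transmissions

# Lose the first copy of block 2; it is retransmitted on the next round.
data, sent = send_blocks([b"a", b"b", b"c", b"d", b"e", b"f"],
                         drop=lambda seq, attempt: seq == 2 and attempt == 0)
print(sent)  # 7: six blocks plus one retransmit
```

Error detection itself is left to the layers below and above (Ethernet's FCS, SCSI's own checks), which is the overlap the presentation is describing.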
Oh, was that my outside voice?
USB maximum cable length: 5 m
Serial ATA maximum cable length: 1 m
Fibre Channel maximum cable length: 10,000 m
SATA is killing SCSI. Yeah. That's right, I just replaced my 30-drive dual-channel Ultra320 array with... what, three 12-channel SATA-150 controllers?
Come back and troll when SATA has doubled in speed, and when I can plug at least 15 drives into a card.
Or hell, just stay out of the game, since Fibre Channel runs at 2 Gbit/s (roughly 200 MB/s of usable bandwidth, still faster than SATA-150, slower than Ultra320) and supports 255 devices, with multiple host access over a SAN, which can be set up redundantly. And that ignores the point of this article: SCSI over normal networking equipment.
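A quick check of the Fibre Channel figure. Dividing 2 Gbit/s by 8 overstates the usable rate, because both 2 Gbit Fibre Channel (2.125 Gbaud line rate) and SATA-150 (1.5 Gbit/s) use 8b/10b line coding, where every 10 bits on the wire carry 8 data bits. The sketch ignores framing and protocol overhead; Ultra320 is a parallel bus rated at 320 MB/s and is not computed this way.

```python
def payload_mb_per_s(line_rate_gbaud: float, data_bits: int = 8, coded_bits: int = 10) -> float:
    """Usable MB/s after line coding (8b/10b by default), before framing overhead."""
    data_bits_per_s = line_rate_gbaud * 1e9 * data_bits / coded_bits
    return data_bits_per_s / 8 / 1e6   # bits -> bytes -> megabytes

print(f"2 Gbit FC: {payload_mb_per_s(2.125):.1f} MB/s")  # 2 Gbit FC: 212.5 MB/s
print(f"SATA-150:  {payload_mb_per_s(1.5):.1f} MB/s")    # SATA-150:  150.0 MB/s
```

After framing overhead, 2 Gbit Fibre Channel is usually quoted as 200 MB/s per direction, which still sits between SATA-150 and Ultra320.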
So, to reiterate:
SATA - Lower speed. Lower capacity (# drives). Single host access. (Lower drive warranties too.) Cheap.
SCSI - Higher speed. Higher capacity (# drives). Multiple computer access via FC, iSCSI, HyperSCSI. (Longer drive warranties.) Expensive.
So, for the home user, cheap is good.
For the average financial institution, which I'd estimate has roughly 1 TB of information that needs to be available to everyone all of the time, well, they'll get what they pay for.
Not quite reinventing, just re-engineering. To keep the analogy, it's like reformulating the rubber in racing tires to provide better "grip": great for flat, dry tracks, but not great for inclement weather. In this case, they are redesigning TCP to remove all of the stuff that is unnecessary for this particular purpose:
But keep the stuff that IS needed:
This is a lot like using a GPU: the general-purpose CPU in your computer can do all of the same things, but a processor streamlined for such a specific task is much more efficient.
They are not the same, and increasing throughput doesn't necessarily increase latency.
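One way to see the distinction, under a deliberately idealized model (a fixed per-request latency plus serialization time, no queueing; all numbers are illustrative assumptions): keeping more requests in flight raises throughput up to the link's bandwidth, while the time any single request takes stays the same.

```python
def request_latency_ms(size_bytes: int, base_latency_ms: float, bandwidth_mb_s: float) -> float:
    """Fixed per-request latency plus the time to serialize the data onto the link."""
    return base_latency_ms + size_bytes / (bandwidth_mb_s * 1e6) * 1e3

def throughput_mb_s(size_bytes: int, base_latency_ms: float,
                    bandwidth_mb_s: float, in_flight: int) -> float:
    """Idealized: `in_flight` requests overlap; the link bandwidth is the ceiling."""
    per_request_s = request_latency_ms(size_bytes, base_latency_ms, bandwidth_mb_s) / 1e3
    offered = in_flight * size_bytes / per_request_s / 1e6
    return min(offered, bandwidth_mb_s)

# 64 KiB requests, 0.5 ms fixed latency, 100 MB/s link (illustrative numbers).
for depth in (1, 8):
    print(depth,
          f"{request_latency_ms(65536, 0.5, 100):.3f} ms",
          f"{throughput_mb_s(65536, 0.5, 100, depth):.1f} MB/s")
```

Real links add queueing delay once they saturate, so the model only holds below that point.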
Unroutable is definitely the way to go here.
You're gonna want to go with big packets and such. And you don't want to go through an entire stack just to get your sector.
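Rough arithmetic behind "big packets", assuming a 64 KiB request and comparing the standard 1500-byte Ethernet payload with a 9000-byte jumbo frame (a common but non-standard size). Per-frame wire overhead is the Ethernet II preamble/SFD, header, FCS and inter-frame gap. Any storage-protocol header inside the payload is ignored. The bigger win is fewer frames, and therefore less per-frame processing, rather than raw byte efficiency.

```python
import math

# Bytes per frame on the wire that carry no payload (Ethernet II):
# 8 preamble/SFD + 14 header + 4 FCS + 12 inter-frame gap.
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12

def frames_needed(request_bytes: int, mtu: int) -> int:
    """Frames needed to carry one request, ignoring upper-layer headers."""
    return math.ceil(request_bytes / mtu)

def wire_efficiency(mtu: int) -> float:
    """Fraction of wire time spent on payload for full-size frames."""
    return mtu / (mtu + ETH_WIRE_OVERHEAD)

for mtu in (1500, 9000):
    print(mtu, frames_needed(64 * 1024, mtu), f"{wire_efficiency(mtu):.3f}")
# 1500 44 0.975
# 9000 8 0.996
```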
Think of it this way, unroutable means never having to ARP.
No, Ethernet is also a layer 2 protocol. HyperSCSI runs as a layer 3 protocol over Ethernet's layer 2. Remember, Ethernet is both a layer 1 protocol (the physical side) and a layer 2 protocol (data link).
IP is Layer 3. HyperSCSI is Layer 3.
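To illustrate "layer 3 directly over Ethernet's layer 2", this sketch builds an Ethernet II frame whose payload is a non-IP protocol, so no IP header sits between the MAC header and the command. It uses the IEEE local experimental EtherType 0x88B5 as a stand-in. The 7-byte command layout is hypothetical and is neither the real HyperSCSI header nor a SCSI CDB. An IP packet would instead carry EtherType 0x0800 followed by at least 20 bytes of IP header.

```python
import struct

ETHERTYPE_EXPERIMENTAL = 0x88B5  # IEEE local experimental EtherType, a stand-in here

def mac(addr: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in addr.split(":"))

def ethernet_frame(dst: str, src: str, ethertype: int, payload: bytes) -> bytes:
    """Layer 2 header (dst MAC, src MAC, EtherType) followed directly by the payload."""
    header = struct.pack("!6s6sH", mac(dst), mac(src), ethertype)
    if len(payload) < 46:                     # pad to Ethernet's 46-byte minimum payload
        payload += bytes(46 - len(payload))
    return header + payload                   # the NIC normally appends the 4-byte FCS

# Hypothetical toy command: opcode 0x28 (SCSI READ(10)), LBA, block count.
cmd = struct.pack("!BIH", 0x28, 2048, 8)
frame = ethernet_frame("02:00:00:00:00:01", "02:00:00:00:00:02",
                       ETHERTYPE_EXPERIMENTAL, cmd)
print(len(frame))          # 60: 14-byte header + 46-byte padded payload
print(frame[12:14].hex())  # 88b5
```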
"You've got an invalid haircut" -Warren Zevon - Life'll Kill Ya
If you read the docs, you will note that they mention running over UDP/IP as a possible outgrowth. But they point out that, since that's only really useful for storage WANs (and it does add some overhead), why not just use iSCSI? HyperSCSI is designed for switched networks; iSCSI is somewhat more flexible, if also somewhat slower.
For what they want, UDP offers nothing that straight Ethernet frames don't, and UDP has more overhead.
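The header cost in numbers, assuming a 1500-byte Ethernet payload and minimum-size headers (IPv4 20 bytes, UDP 8, TCP 20, no options). The storage protocol's own header is not counted. The byte savings are small; the larger cost of a full stack is per-packet processing, which this arithmetic does not capture.

```python
MTU = 1500  # standard Ethernet payload size in bytes

# Header bytes consumed inside each Ethernet payload (minimum sizes, no options).
STACKS = {
    "raw Ethernet": 0,
    "UDP/IPv4": 20 + 8,
    "TCP/IPv4": 20 + 20,
}

def payload_per_frame(stack: str, mtu: int = MTU) -> int:
    """Bytes left for storage data in one frame after the stack's headers."""
    return mtu - STACKS[stack]

for name in STACKS:
    print(f"{name:13} {payload_per_frame(name)} bytes per frame")
# raw Ethernet  1500 bytes per frame
# UDP/IPv4      1472 bytes per frame
# TCP/IPv4      1460 bytes per frame
```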
We recently built a 1.6 TB SATA file server for our (ahem) institution. We used a 3ware 8500-12 controller (which looks to the OS like a single SCSI device) and 12 disks of 160 GB each (10 active, 1 parity, 1 hot spare). Redundant everything. The speed limiters turned out to be the filesystem (ext3, probably not the best choice for small files: writing directly to the device gives about 60 MB/s, to the filesystem typically 10-20 MB/s) and the network connections. Our users haven't noticed a speed difference between it and the NetApp it replaced. But for about 1/10 the price, they sure noticed the 15x extra capacity!
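Where the 1.6 TB comes from, assuming the array is RAID 5 (the post says "1 parity" but does not name the level) and decimal gigabytes. In RAID 5 the parity is spread across all members but costs one disk's worth of space, and the hot spare sits idle until a member fails, so it adds nothing.

```python
def raid5_usable_gb(disks_in_array: int, disk_gb: int) -> int:
    """RAID 5 gives up one disk's worth of capacity to (distributed) parity."""
    if disks_in_array < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks_in_array - 1) * disk_gb

TOTAL_DISKS = 12
HOT_SPARES = 1   # not part of the array until a member fails
DISK_GB = 160

usable = raid5_usable_gb(TOTAL_DISKS - HOT_SPARES, DISK_GB)
print(usable)  # 1600
```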
IP is layer 3. IP rides the raw data link. IP is at the same layer as HyperSCSI. The raw data link is layer 2.
Ethernet is both a layer 1 topology and a layer 2 data link protocol. That's why you can push Ethernet frames over dissimilar topologies (like 100BASE-FX and LANE over ATM).
OSI Layer 1 is Physical (Ethernet is here)
OSI Layer 2 is Datalink (Ethernet is also here)
OSI Layer 3 is Network (IP and HyperSCSI live here)
OSI Layer 4 is Transport (TCP, UDP and the SCSI side of HyperSCSI live here)