iSCSI Moves Toward Standard
EyesWideOpen writes "The iSCSI technology, which allows computers to connect to hard drives over a network connection such as a company Ethernet network or the Internet, requires only minor changes before the Internet Engineering Task Force endorses it as a formal version 1.0 standard. A final round of comments has been completed on the technology according to the Storage Networking Industry Association, the subgroup that led the creation of the iSCSI, and as a result companies now can start building iSCSI products."
sPh
One example that is in my face is SANs and the office that I am in. There are 14 offices around the world, and having one centralized data center would make things so much easier for local office staff, and reduce costs for storage maintenance. Less cost for more skilled people in the remote offices.
My $0.02
www.oobersworld.com - For those that ride.
...I give it a week or two before someone buys a patent for "Accessing digital storage devices via a network" and sues.
Jeesh.
El riesgo vive siempre!
I work at a mid-size hosting facility, and we've done quite a bit of experimentation with iSCSI. In my opinion it's not ready yet. Either that or it's just a bad idea, full stop.
We do quite a bit with our SAN -- there are a coupla IBM 2105 ESS ("Shark") boxen in the back of the data center with many terabytes of disk online. It's all about Fibre Channel. At least as fast as SCSI, effectively faster when you have all sorts of cache running on the storage side, and you have the flexibility to define exactly how much disk goes to what server, and you can add more dynamically without a power down, etc.
Unfortunately, Fibre Channel is expensive. It requires expensive host bus adapters and even more expensive switches. And of course it runs over fiber optic cable, which isn't exactly penny kit. So the industry decided to try running SCSI over Ethernet instead.
Now there are iSCSI-to-Fibre gateways, such as Cisco's 5420 Storage Router (which we've evaluated), but there are just problems in general with running block level storage over a TCP/IP network...That's why our iSCSI stuff is just sitting around doing nothing right now.
The only place I can see iSCSI being used at this time is for really temporary quick-and-dirty setups, such as a programmer needing another 100 GB online for a one-week project. But even then, NAS seems like a better idea.
I applaud all such efforts. If it doesn't work, fine, we won't use it. But if it works, it could easily become yet another technology that is excellent for its uses. Think about this technology a little more deeply. With a bit of work, it would change the name of the game in file servers. All operating systems that support iSCSI and the FS would be able to share the hard drive. I can see some savings down the line in terms of maintenance, and reduced downtime. I hope I'm right. Now, we just need to figure out exactly how to use this technology.
If everyone had fiber into their homes, I can at the very least see hard drive upgrades without ever opening the box. Wouldn't that be nice, folks?
Stop the brainwash
The difference is very simple:
With a file server (current buzzword is "NAS" for Network-Attached Storage) the server maintains the file system, and multiple clients connect to it to read and write files. It's a shared *file system*.
With a SAN (Storage Area Network) a bunch of raw disks is made available over a network. Currently this is normally Fibre Channel; iSCSI will bring standard Ethernet to SANs, making it much cheaper. No file system is mandated by the SAN; a machine connected to the SAN gets access to one or more raw disks and can use them any way it wants. Typically, the unit of allocation is one disk, though some systems (EMC) allow disks to be subdivided and the sub-disks handed out separately. While the storage pool on the whole is shared, each disk (or sub-disk) is only connected to one machine at a time.
A SAN provides a centrally managed pool of local disk, so you don't have to run around upgrading individual servers. This is a *big* win for large corporations.
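To make the file-level versus block-level distinction concrete, here is a toy sketch in Python. Both classes and their method names are made up for illustration and are not part of any real NAS or iSCSI implementation; the 512-byte block size is also just an assumption.

```python
# Toy contrast between file-level (NAS) and block-level (SAN) access.
# Both classes are hypothetical illustrations, not real protocol code.

BLOCK_SIZE = 512  # bytes per block; an assumed, typical sector size


class ToyFileServer:
    """NAS-style: the server owns the file system; clients name files."""

    def __init__(self):
        self._files = {}

    def write_file(self, path, data):
        self._files[path] = bytes(data)

    def read_file(self, path):
        return self._files[path]


class ToyBlockDevice:
    """SAN-style: the target exports numbered blocks and nothing else."""

    def __init__(self, num_blocks):
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def write_block(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("block writes must be exactly BLOCK_SIZE bytes")
        start = lba * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = data

    def read_block(self, lba):
        start = lba * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])


# The NAS client asks for a file by name.
nas = ToyFileServer()
nas.write_file("/reports/q1.txt", b"hello")

# The SAN client only sees block numbers; any notion of "files" has to
# come from a file system (or database) running on the client itself.
disk = ToyBlockDevice(num_blocks=8)
disk.write_block(3, b"hello".ljust(BLOCK_SIZE, b"\0"))
```

The point of the sketch is only that the block device has no idea what a path or a file is, which is why each disk normally belongs to one machine at a time.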
Well, the article is useless, but this white paper clarifies some points.
One exquisite use would be for someone maintaining a lab: imagine remotely partitioning and ghosting hundreds of computers from a single console through Gigabit Ethernet, or being able to repartition a colocated server.
One aspect that is disappointing is that it just looks like SCSI over IP. None of the peer to peer aspects of Firewire were mentioned, such as target-disk mode that newer Macintoshes support. It's really nice to be able to reboot, hold 't' and plug my laptop into another Mac and have its hard disk appear on the desktop as though it was an external Firewire disk.
We're starting to see PCs ship with 10/100/Gig ethernet standard. Within a year or two, it won't be unreasonable to run GigE to every desktop in the building.
Now consider what iSCSI offers the system admins. You can use the network boot option on the desktop systems and run them diskless. This means you can centralize your storage. No longer do you face the daily panic of a user desperate to recover a file they only saved on their local hard drive. If someone is having trouble with their system, you just give them a fresh boot image; if the problem persists, it's hardware. If I were a sysadmin, I would be pushing hard for iSCSI.
And from the technology standpoint of iSCSI vs. Fibre Channel, I expect that ethernet speeds will outpace Fibre Channel speeds; it's a larger market, so the R&D investment will go there first.
[Disclaimer: I work for a data storage company, but everything stated here is based on general observations and opinions, not insider information.]
I don't understand why it is necessary to tunnel a low level protocol like scsi over ethernet (other than to trick legacy software into remote storage). There are protocols for remote storage, why not use these?
Jilles
Nor have I ever understood the difference between a "Storage Area Network" and a "pre-packaged Novell file server with all permissions set to RWX", except that the SAN is priced 10 times higher!
Would you like to?
There are basically two types of SANs. The two types are not mutually exclusive; they can coexist on the same network.
The first type is exclusive access to shared storage. Let's say you have a big enterprise storage system, like an IBM Shark or an HDS 9960 or an EMC Symmetrix. These devices are basically giant RAIDs with fibre channel switches built right in. You can connect one computer-- PC, Unix system, supercomputer, whatever-- to each fibre channel port on the storage system, then use the storage system's software to carve it up into LUNs. Let's say the Windows server gets 5 TB, and the Oracle cluster gets 20 TB, and the compute server gets 1 TB. You create RAID sets using the storage system's control software, then assign each set (5 TB, 20 TB, 1 TB) to a fibre channel port. Each machine thinks it has a directly attached storage device, when actually it's just getting a piece of the big storage device in the basement. The point is that you can put all your eggs in one exceptionally good basket, reducing maintenance costs, and you can reconfigure things on the fly without moving any cables around. It's handy, especially in a big data center environment. You can also take advantage of some cleverness inside the storage system this way, using features like point-in-time snapshots, serverless backup, or filesystem mirroring. One data center I work with has two HDS 9960 systems, one in one city and another in another city, connected by some big pipe (OC-3? OC-12? I forget.) They run some special Hitachi software on the two storage systems that keeps the two devices in sync all the time. Basically, an atomic bomb could take out the entire data center and the city around it, but the data would be safe.
So that's one type of SAN. It's about centralizing exclusive access to shared storage. These kinds of SANs make a ton of sense under some circumstances. You generally have to have at least dozens of servers, each with their own storage requirements, before it makes sense to bother with this kind of thing.
The other type of SAN is about shared access to shared storage. This requires a special type of filesystem, like Centravision CVFS or SGI CXFS. (There are some hybrid solutions out there, like Sanergy. I haven't worked with Sanergy myself, but I've heard bad things about it.) With these SANs, each client has read-write access to the same filesystem. It's kind of like what you described-- a server with wide-open file permissions-- but without the server. Access to the filesystem is at fibre channel wire speeds, 100 MB per second or more, with really low latency. This kind of system has serious drawbacks, though. SAN or cluster filesystems are complex, and that makes them more prone to failure of some kind. Heterogeneous host support is also a challenge. Finally, SANs like this just don't scale, because of contention. If you have a hundred clients reading data from a server, the server will put the IO requests in a queue and cache them intelligently. Read some data from A, cache it and stream it out the network interface while reading some data from B, and so on. You can sustain relatively high data transfer efficiency that way, as long as your server is beefy enough. But with a shared-access SAN, there's no caching request arbitrator in the middle. There's just your computer and that other computer, giving the disks conflicting instructions. Even with the biggest, smartest RAID controller, you're still going to run into disk access contention issues pretty quickly. I've seen a shared-access filesystem grind to a halt when as few as four computers were all hitting the disks at once. The heads were spending more time seeking than they were spending reading. That's kind of a bad example, though, because that system used a really shitty RAID controller for its storage device. But it proves the principle of what I'm saying.
Because of these drawbacks, shared-access SANs really work best for server clustering. If you have a parallel cluster of servers all accessing the same database-- particularly if they're just query servers and the database is read-only-- then it makes sense to consider putting the tables on a shared-access SAN to keep storage costs low. Especially if you have ten servers and a 10 TB database; you can save 90 TB of disk by using a shared-access SAN.
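The 90 TB figure is just the difference between every server holding its own copy and all servers sharing one. A quick sanity check, with a function name invented for this example:

```python
def disk_saved_tb(servers, database_tb):
    """Disk saved by sharing one copy instead of one copy per server."""
    replicated_tb = servers * database_tb  # every server has a private copy
    shared_tb = database_tb                # one copy on the shared-access SAN
    return replicated_tb - shared_tb


print(disk_saved_tb(servers=10, database_tb=10))  # 90
```

This ignores RAID overhead and any spare capacity, so treat it as the best case.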
So yeah, there's a huge difference between a SAN and a file server with wide-open permissions. They're different tools, and you should use them for different sorts of jobs. Anybody who tries to tell you, though, that a SAN can replace a file server in a typical network-attached storage environment doesn't know what he's talking about.
Fiber holds some promise, but can't supply the electrical power that some cabling systems do. If you try to create a cable that has everything for everyone, it gets expensive to manufacture (try comparing the price between phone wiring, cat 5 ethernet and optical; I don't even know of a cable that has copper and optical in it).
science is a religion
There are WAY too many people that are not bright enough to know where to hook up the cable. You will have SCSI devices being plugged into a floppy port. CD-ROM drives into sound cards. You see my point?
We see your point, but I think you missed the OP's point. (Or at the very least, his implication.)
In a magical happy land with gumdrop houses on lollypop lane, it wouldn't matter where these bits and pieces got plugged in. Your computer would have one or more Ports on the back. Got a monitor? Plug it into a Port. Got an external drive? Plug it into a Port. Got network access? Plug it into a Port. All the Ports are the same, and figuring out which device does what is handled in software. So it doesn't matter where you plug things in.
I agree with you that it won't happen. I'm not completely sure I agree that it shouldn't. I think it probably could, but like many things, the expense and overhead seem disproportionate to the scale of the problem.
Apple already has an economy system known as the iMac, so wouldn't it be viable that they will also be using iSCSI for their systems?! See, iMac and iSCSI will work really well together because both names begin with the same character.
(a 64-bit 1 Gbps network adapter is often as fast as disk anyway, practically speaking)
If you're lucky-- without serious tweaking, I mean-- you can get 50 MB/s over gigabit ethernet. That's what I get using FTP between two SGI boxes using the SGI-approved 64-bit card and jumbo frames. Yes, this is faster than the ATA hard drive in your laptop, Chaz.
Using a single fibre channel loop, each of my lab systems gets about 95 MB/s from its RAID. (Small RAID, with [I think] 8 drives.)
Using multiple fibre channel loops, my servers pull about 400 MB/s off their RAIDs. And that's using 1 Gbps FC. If we decided to upgrade to 2 Gbps FC, we could get twice that performance, because the disks are capable of it.
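For anyone who wants the numbers above side by side, here is a rough back-of-the-envelope comparison. The 125 MB/s ceiling for gigabit Ethernet is simply 1000 Mbps divided by 8 with no protocol overhead counted, and the 100 MB/s ceiling for a 1 Gbps Fibre Channel loop is the round figure usually quoted for it; both are assumptions for this sketch, not measurements.

```python
def mb_per_s(megabits_per_s):
    """Convert a raw line rate in Mbps to MB/s, ignoring all overhead."""
    return megabits_per_s / 8


def utilisation(observed_mb_s, ceiling_mb_s):
    """Fraction of the assumed ceiling that was actually achieved."""
    return observed_mb_s / ceiling_mb_s


GIGE_CEILING = mb_per_s(1000)  # 125.0 MB/s, assumed raw ceiling
FC_1G_CEILING = 100.0          # MB/s, commonly quoted nominal figure

print(f"GigE FTP:    {utilisation(50, GIGE_CEILING):.0%}")   # 40%
print(f"1 Gbps FC:   {utilisation(95, FC_1G_CEILING):.0%}")  # 95%
```

So on these figures a single FC loop is running close to its nominal rate, while the untuned gigabit Ethernet transfer uses well under half of the wire.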
There's the rub, right there. It's trivial to put a second FC adapter in your system and double your storage performance; just map a second LUN to the other port and stripe your disk accesses across both LUNs. How can you do that over iSCSI? That'd be a routing nightmare.
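Striping across two LUNs just means alternating fixed-size chunks between them. A minimal sketch of the address mapping follows; the function name and the 128-block stripe size are assumptions for illustration, and real volume managers do considerably more than this.

```python
def stripe_location(lba, num_luns=2, stripe_blocks=128):
    """Map a logical block address to (lun_index, block_within_lun).

    Chunks of `stripe_blocks` blocks are dealt out to the LUNs in
    round-robin order, RAID-0 style.
    """
    stripe_index, offset = divmod(lba, stripe_blocks)
    lun_index = stripe_index % num_luns
    lun_lba = (stripe_index // num_luns) * stripe_blocks + offset
    return lun_index, lun_lba


print(stripe_location(0))    # (0, 0)
print(stripe_location(128))  # (1, 0)
print(stripe_location(256))  # (0, 128)
```

A large sequential read therefore touches both LUNs, and with one LUN per adapter, both adapters.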
Anytime you read that IETF is about ready to approve something as a standard, take it with a grain of salt unless it comes from the IETF chair or the area director responsible for that group. Such statements are usually propaganda from people who are trying to encourage premature adoption, or at best they are wishful thinking. It's not unusual for working groups to produce drafts which they think are ready for approval, but which actually contain serious technical problems that need to be resolved. Fixing those problems can require months or even years.
In particular, the fact that The Storage Networking Industry Association has completed its comments on the draft doesn't have any bearing whatsoever on IETF standardization.
Someone mentioned the security issue. I haven't followed the iSCSI discussions but security is definitely an issue that was identified before the group was formed, and one which is particularly difficult to solve for iSCSI because of performance concerns. I'll be interested to see how they've addressed it. I'd consider it extremely unlikely for the IETF to approve the standard without due consideration of security. And saying "it's going to be behind a firewall, so it doesn't have to be secure" has traditionally not been considered sufficient.
(FWIW, I'm a former IETF area director)
According to this article at lwn.net (scroll down past SSSCA discussion to get to iSCSI discussion), the possibility exists that iSCSI could not be used by free operating systems because of patent encumbrances. Were these issues resolved since then?
--Lawrence Lessig for Congress!
There are protocols for remote storage, why not use these?
I agree that for most network storage, low-level SAN protocols are pointless - higher-level abstractions of remote disk such as smb/nfs/etc are much better as they enforce proper filesystem semantics, and run on top of a physical filesystem. You get all the advantages of having a filesystem in the first place - locking, sane disk space allocation algorithms, journaling, that sort of thing.
However, some applications - big databases particularly - prefer to have raw access to the storage medium, with no filesystem in the way to slow them down. These applications implement their own locking, sharing and space allocation semantics which are optimized for their own particular storage use patterns.
Classic file sharing protocols don't cut it for these big databases because there's no way to get raw disk access over the network with them. Which is why these lower-level SAN protocols exist - they provide the raw disk access that the big databases want, over a network. This means you can have your database spread over multiple physical locations to minimize the risk of your whole database going up in smoke, without taking the performance hit that running the database over smb/nfs would have.
You won't see iSCSI hardware making it into bog-standard file server hardware any time soon, but I can see it being huge in big-iron database servers, where it should be considerably cheaper and easier than Fibre Channel, the current best solution.
Admittedly, there are big questions over whether raw disk access is really necessary for databases - modern general-purpose filesystems are a LOT quicker than they used to be, and MySQL, for instance, which doesn't use raw disk IO but is still blazingly fast, is turning some of the performance assumptions on their head. But the big guys - Oracle, DB2 and so forth - still prefer it, so this is why iSCSI is here.
In fact there is a standard for this, USB. (Universal) It supports almost everything that can be added, although currently does not support the bandwidth requirements for some peripherals.
You're reading my mind. I was thinking that in the instant that I hit the "Submit" button. USB does seem to have a lot of the characteristics of a universal port, with some exceptions. It has nowhere near the signal bandwidth necessary to drive a monitor, for example.
There is an important difference between my SCSI chain and an IP network - you won't find many SCSI chains with the kinds of security threats that are quite common on networks these days. Remember that block devices live below the OS permissions level - it's deeper than root access.
I hope that iSCSI has good security measures *enabled by default*. I remember some discussion on iSCSI mailing lists about using SRP and potential intellectual property problems. I hope it's in the final standard.
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
That's why you have a tablet that runs RDP/X over 802.11b. Or a. The only problem is games and video . . . well, it was a nice idea. I don't foresee video being able to be carried over anything but multichannel audio/signalled digital/fiber cable in the near future.
funny munging
iSCSI? Yes. I have a friend who works for a company developing iSCSI devices. He uses Linux exclusively and claims that Linux has the most robust support for iSCSI.
You know what, maybe it wasn't the fastest but it worked!!!!! You could even boot diskless systems which would carry on running quite happily using the remote disks as though they were local. In effect, all you did was to boot a system image that used a RAM disk to start itself. This still works on Linux and many other Unix-like systems. Many systems have ways of booting from RO media. Once the NI is loaded, you can network mount the remote disks and dismount the RAM disk.
Digital effectively split up disk access using something called MSCP. It was somewhat more general than the Linux SCSI 3-layer model but it effectively split the disk access by a program or file system from a device driver. It became a trivial matter to split the communication between the levels via the net. Of course, getting a disk mounted by more than one system led to some real fun on the file system side, but that eventually worked too. You know, sometimes, you need a pool of storage that isn't mega-high speed, but where you can store a lot.
As for your comments about Gigabit Lans, well that becomes less of an issue than switching.
Ok, these days HP/Compaq/Digital use Fibre Channel for their high-performance systems. However, the price is far from cheap. Last I heard, the NI-based clusters still work very well, and as the network performance was increased, so was the remote mounted disk throughput.
I don't know how well the iSCSI people are doing, but as long as they realise that they need to fix a few other details, they should be all right (a standard network lock protocol would be really cool to allow two disparate systems to coordinate access).
See my journal, I write things there
Why would it be a routing nightmare? Just assign a second IP to the second LUN and network adapter, easy as can be. The fact is that very few machines really need much more than 50-100 MB/s because the clients aren't going to be able to get data much more quickly than that anyway. There are obvious exceptions like DB servers, but they are the minority. Most of the time management of disk space is much more important than speed of disk access.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
With a SAN (Storage Area Network) a bunch of raw disks is made available over a network. Currently this is normally Fibre Channel; iSCSI will bring standard Ethernet to SANs, making it much cheaper.
Bingo. Cheap stock gig-E cards and a driver hack on top of a classic IP stack and you can build a mainframe-reliable file server / disk farm out of commodity boxes from the local PC store.
But that network better not be connected to anything BUT the disks and the file servers' private disk-interface LAN(s), and the file servers better not have IP forwarding enabled (or have a good filter). Else one carefully corrupted packet destroys one file system. (Maybe two or so for RAID.)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Um, doesn't FireWire already have the capacity to transmit uncompressed digital video?
It has the bandwidth (SDI only requires 270 Mbps; FireWire is 400 Mbps) but I don't know if anybody has used it for uncompressed video. People use it all the time for DV-compressed video, of course.
But, as you noted, that's merely TV-resolution data. DVI, on the other hand, can handle up to 5 Gbps, if I remember correctly. That's a big difference.
What we really need is something like SGI's XIO...
No, I don't think so. XIO uses a hundred pins. No hundred-pin interface could ever be that reliable. What we really need is a super-fast serial connection, like FireWire-only-a-lot-faster. With the price of fiber optics coming down steadily, I wonder whether it would be practical to try to design a rugged two-strand cable with roughly the same diameter as a FireWire cable, or less. That effectively removes the bandwidth problem from the connector and puts it into the transceivers, where it ought to be.
Why would it be a routing nightmare? Just assign a second IP to the second LUN and network adapter, easy as can be.
Can your OS handle two IPs on the same network segment? None of the ones I know of can. You see, you can only have one route to a given network. So you might have two interfaces on the same network, but all your traffic is going to go through just one of them. The other one sits there and does nothing at all.
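Here is a deliberately simplified model of that behaviour. It assumes a routing table where the most specific matching route wins and ties go to whichever route is listed first; the interface names and addresses are made up, and real stacks vary (some can bond interfaces or do multipath routing), so this is a sketch of the argument rather than of any particular OS.

```python
import ipaddress


def pick_interface(routes, destination):
    """Return the interface of the most specific matching route.

    `routes` is a list of (network, interface_name) pairs. On a tie,
    the first route listed wins, which is the assumption being
    illustrated here.
    """
    dest = ipaddress.ip_address(destination)
    matches = [(net, ifname) for net, ifname in routes if dest in net]
    if not matches:
        return None
    # max() returns the first maximal element, so earlier routes win ties.
    return max(matches, key=lambda route: route[0].prefixlen)[1]


# Two NICs on the same segment produce two identical routes.
routes = [
    (ipaddress.ip_network("192.168.10.0/24"), "eth0"),
    (ipaddress.ip_network("192.168.10.0/24"), "eth1"),
]

used = {pick_interface(routes, f"192.168.10.{host}") for host in range(1, 255)}
print(used)  # {'eth0'}
```

Under that model every destination on the segment resolves to the same interface, and the second one never carries outbound traffic.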
The fact is that very few machines really need much more than 50-100 MB/s because the clients aren't going to be able to get data much more quickly than that anyway.
Depends on your situation. In some cases, 640 K really is enough for anybody. For the rest of us, though....
If it really takes off, how about using iSCSI internally instead of raw SCSI? Then, all your disk interfaces could be the same.
Does the extra hardware for NICs still cost too much? (Last I heard, even raw SCSI was considered too expensive for the consumer market, so I'm probably off my rocker again.)
"Provided by the management for your protection."
You just have to ensure that, on a particular machine, each of its NICs gets an address from a different IP subnet.
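A small check for that rule, using Python's standard `ipaddress` module. The function name, interface names, and addresses are all hypothetical.

```python
import ipaddress
from itertools import combinations


def nics_on_distinct_subnets(nic_addresses):
    """True if no two NICs on this machine share or overlap a subnet.

    `nic_addresses` maps an interface name to an address in CIDR form,
    for example {"eth0": "10.0.1.5/24"}.
    """
    networks = [
        ipaddress.ip_interface(addr).network
        for addr in nic_addresses.values()
    ]
    return not any(a.overlaps(b) for a, b in combinations(networks, 2))


print(nics_on_distinct_subnets({"eth0": "10.0.1.5/24", "eth1": "10.0.2.5/24"}))  # True
print(nics_on_distinct_subnets({"eth0": "10.0.1.5/24", "eth1": "10.0.1.6/24"}))  # False
```

The storage target then needs an address on each of those subnets as well, so that each LUN can be reached through a different NIC.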
Can somebody please tell me how this relates to iSCSI being easier to manage than SCSI over Fibre Channel? Running two separate subnets and two Ethernet drops to each client on the network sounds like a terrible way to scale.