iSCSI Moves Toward Standard
EyesWideOpen writes "The iSCSI technology, which allows computers to connect to hard drives over a network connection such as a company Ethernet network or the Internet, requires only minor changes before the Internet Engineering Task Force endorses it as a formal version 1.0 standard. A final round of comments has been completed on the technology according to the Storage Networking Industry Association, the subgroup that led the creation of the iSCSI, and as a result companies now can start building iSCSI products."
sPh
One example that is in my face is SANs and the office that I am in. There are 14 offices around the world, and having one centralized data center would make things so much easier for local office staff, and reduce costs for storage maintenance. Less cost for more skilled people in the remote offices.
My $0.02
www.oobersworld.com - For those that ride.
...I give it a week or two before someone buys a patent for "Accessing digital storage devices via a network" and sues.
Jeesh.
El riesgo vive siempre!
I've noticed that the convergence of data transfer cables seems to be increasing. We have Serial ATA, iSCSI, 10/100/1000 Ethernet, USB, FireWire and also HyperTransport. These are all attempts to simplify the cabling while increasing speed, with the exception that the current HyperTransport implementations are all hardwired on motherboards. Personally, I think it would make life easier if there was one thin 2 to 4 wire cable that was useable by all electronic devices both external and internal. *Sigh* It will probably be another 10 years before it's actually a single cable, if ever.
Nor have I ever understood the difference between a "Storage Area Network" and a "pre-packaged Novell file server with all permissions set to RWX", except that the SAN is priced 10 times higher!
I suspect that is because you don't know what a SAN is.
Try NetBSD... safe,straightforward,useful.
I work at a mid-size hosting facility, and we've done quite a bit of experimentation with iSCSI. In my opinion it's not ready yet. Either that or it's just a bad idea, full stop.
We do quite a bit with our SAN -- there are a coupla IBM 2105 ESS ("Shark") boxen in the back of the data center with many terabytes of disk online. It's all about Fibre Channel. At least as fast as SCSI, effectively faster when you have all sorts of cache running on the storage side, and you have the flexibility to define exactly how much disk goes to what server, and you can add more dynamically without a power down, etc.
Unfortunately, Fibre Channel is expensive. It requires expensive host bus adapters and even more expensive switches. And of course it runs over fiber optic cable, which isn't exactly penny kit. So the industry decided to try running SCSI over Ethernet instead.
Now there are iSCSI-to-Fibre gateways, such as Cisco's 5420 Storage Router (which we've evaluated), but there are just problems in general with running block level storage over a TCP/IP network...That's why our iSCSI stuff is just sitting around doing nothing right now.
The only place I can see iSCSI being used at this time is for really temporary quick-and-dirty setups, such as a programmer needing another 100 GB online for a one-week project. But even then, NAS seems like a better idea.
After reading the CNet article, I still couldn't figure out why this was necessarily a great thing. So I went over to SNIA's website and read the white paper.
Anyway, it makes more sense now, and I can definitely see benefits. What we're talking about here is network-accessible storage with a very low barrier to entry, both in cost and in expertise to set up. In a way it reminds me of the Filer (a machine with 1 TB of filespace that we used by mounting NFS shares from it) I had at my last job, but much, much less expensive and much, much easier to run.
Interesting stuff, at any rate.
I applaud all such efforts. If it doesn't work, fine, we won't use it. But if it works, it could easily become yet another technology that is excellent for its uses. Think about this technology a little more deeply. With a bit of work, it would change the name of the game in file servers. All operating systems that support iSCSI and the FS would be able to share the hard drive. I can see some savings down the line in terms of maintenance, and reduced downtime. I hope I'm right. Now, we just need to figure out exactly how to use this technology.
If everyone had fiber into their homes, I can at the very least see hard drive upgrades without ever opening the box. Wouldn't that be nice, folks?
Stop the brainwash
The difference is very simple:
With a file server (current buzzword is "NAS" for Network-Attached Storage) the server maintains the file system, and multiple clients connect to it to read and write files. It's a shared *file system*.
With a SAN (Storage Area Network) a bunch of raw disks is made available over a network. Currently this is normally Fibre Channel; iSCSI will bring standard Ethernet to SANs, making it much cheaper. No file system is mandated by the SAN; a machine connected to the SAN gets access to one or more raw disks and can use them any way it wants. Typically, the unit of allocation is one disk, though some systems (EMC) allow disks to be subdivided and the sub-disks handed out separately. While the storage pool on the whole is shared, each disk (or sub-disk) is only connected to one machine at a time.
A SAN provides a centrally managed pool of local disk, so you don't have to run around upgrading individual servers. This is a *big* win for large corporations.
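To make the file-versus-block distinction concrete, here is a toy Python sketch. The class names, the 512-byte block size and the method names are all made up for illustration; this is not how any real NAS or SAN product exposes its storage, just the shape of the two interfaces.

```python
# Toy illustration only: hypothetical classes, not any real NAS or SAN API.

BLOCK_SIZE = 512  # bytes per block; an assumed value for this sketch


class RawDisk:
    """SAN-style target: numbered fixed-size blocks, no notion of files."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, lba):
        return self.blocks[lba]

    def write_block(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block device only accepts whole blocks")
        self.blocks[lba] = bytes(data)


class FileServer:
    """NAS-style server: it owns the file system; clients only see names."""

    def __init__(self):
        self.files = {}

    def write_file(self, name, data):
        self.files[name] = bytes(data)

    def read_file(self, name):
        return self.files[name]


# NAS client: asks for a file by name; on-disk layout is the server's business.
nas = FileServer()
nas.write_file("report.txt", b"hello")

# SAN client: gets raw blocks and must bring its own file system or database.
disk = RawDisk(num_blocks=8)
disk.write_block(3, b"hello".ljust(BLOCK_SIZE, b"\0"))
```

The point of the sketch is only that the SAN client decides what block 3 means, while the NAS client never sees a block at all.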
Well, the article is useless, but this white paper clarifies some points.
One exquisite use would be for someone maintaining a lab: imagine remotely partitioning and ghosting 100's of computers from a single console through Gigabit Ethernet, or being able to repartition a colocated server.
One aspect that is disappointing is that it just looks like SCSI over IP. None of the peer to peer aspects of Firewire were mentioned, such as target-disk mode that newer Macintoshes support. It's really nice to be able to reboot, hold 't' and plug my laptop into another Mac and have its hard disk appear on the desktop as though it was an external Firewire disk.
You suspect wrong, dude. I have been cutting through marketroid speak for more than 20 years.
sPh
I think you meant NAS. SAN is not really anything like what you describe.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
You suspect wrong, dude. I have been cutting through marketroid speak for more than 20 years.
:-)
That's great - but do you not think you might have cut a little too far?
Damn, I should change my name&nick then?
We're starting to see PCs ship with 10/100/Gig ethernet standard. Within a year or two, it won't be unreasonable to run GigE to every desktop in the building.
Now consider what iSCSI offers the system admins. You can use the network boot option on the desktop systems and run them diskless. This means you can centralize your storage. No longer do you face the daily panic of a user desperate to recover a file they only saved on their local hard drive. If someone is having trouble with their system, you just give them a fresh boot image; if the problem persists, it's hardware. If I were a sysadmin, I would be pushing hard for iSCSI.
And from the technology standpoint of iSCSI vs. Fibre Channel, I expect that ethernet speeds will outpace Fibre Channel speeds; it's a larger market, so the R&D investment will go there first.
[Disclaimer: I work for a data storage company, but everything stated here is based on general observations and opinions, not insider information.]
I don't understand why it is necessary to tunnel a low-level protocol like SCSI over Ethernet (other than to trick legacy software into using remote storage). There are protocols for remote storage, why not use these?
Jilles
I can see this as being a possibility for workgroups and small to medium businesses looking to get into SAN tech, but the bandwidth would be pathetic. Unless you had an Ethernet segment dedicated to your iSCSI, the latency would be terrible. With FC, you have a dedicated full-duplex pipe at 1 Gb/sec minimum on the front side. With iSCSI, even using Gigabit Ethernet, the best bandwidth you would see is 0.3 Gb/sec, shared. I do not see this tech ever making it as a permanent large-scale solution.
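For anyone comparing the Gb/sec figures here with the MB/s figures quoted elsewhere in the thread, a quick conversion sketch. It assumes decimal units (1 Gb = 1000 Mb) and ignores all protocol and encoding overhead, so the results are rough upper bounds and not measurements. The function name is made up for the example.

```python
def gbit_to_mbyte_per_s(gbit_per_s):
    """Convert a line rate in Gb/s to MB/s (decimal units, zero overhead assumed)."""
    return gbit_per_s * 1000 / 8


# The two rates come from the comment above; the labels are just for printing.
for label, rate in [("FC front side", 1.0), ("claimed iSCSI best case", 0.3)]:
    print(f"{label}: {rate} Gb/s is about {gbit_to_mbyte_per_s(rate):.1f} MB/s")
```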
Ultimately, this WILL be the wave of the future.
What iSCSI will allow is a single topology for how information is transferred (persistent storage, transactions, peer-to-peer linkages, etc.).
In large datacenters, you currently have Fibre Channel, FICON (a form of Fibre Channel), and SCSI in your mix of persistent storage communications. This is in addition to your already large networks of Ethernet, FDDI, ATM, etc.
Each requires its own expensive switches.
What iSCSI will eventually provide is a single fabric for all your data traffic. This will result in a substantial cost savings both in equipment investment AND maintenance, which affects the total cost of ownership and return on investment.
This WILL need substantial rework on many sides of the IP network, such as HBAs (host bus adapters) that fully implement the IP stack for performance reasons. Gigabit Ethernet takes a substantial amount of your CPU just doing normal transactional data.
I look forward to the long term implementation of this.
Nor have I ever understood the difference between a "Storage Area Network" and a "pre-packaged Novell file server with all permissions set to RWX", except that the SAN is priced 10 times higher!
Would you like to?
There are basically two types of SANs. The two types are not mutually exclusive; they can coexist on the same network.
The first type is exclusive access to shared storage. Let's say you have a big enterprise storage system, like an IBM Shark or an HDS 9960 or an EMC Symmetrix. These devices are basically giant RAIDs with fibre channel switches built right in. You can connect one computer-- PC, Unix system, supercomputer, whatever-- to each fibre channel port on the storage system, then use the storage system's software to carve it up into LUNs. Let's say the Windows server gets 5 TB, and the Oracle cluster gets 20 TB, and the compute server gets 1 TB. You create RAID sets using the storage system's control software, then assign each set (5 TB, 20 TB, 1 TB) to a fibre channel port. Each machine thinks it has a directly attached storage device, when actually it's just getting a piece of the big storage device in the basement. The point is that you can put all your eggs in one exceptionally good basket, reducing maintenance costs, and you can reconfigure things on the fly without moving any cables around. It's handy, especially in a big data center environment. You can also take advantage of some cleverness inside the storage system this way, using features like point-in-time snapshots, serverless backup, or filesystem mirroring. One data center I work with has two HDS 9960 systems, one in one city and another in another city, connected by some big pipe (OC-3? OC-12? I forget.) They run some special Hitachi software on the two storage systems that keeps the two devices in sync all the time. Basically, an atomic bomb could take out the entire data center and the city around it, but the data would be safe.
So that's one type of SAN. It's about centralizing exclusive access to shared storage. These kinds of SANs make a ton of sense under some circumstances. You generally have to have at least dozens of servers, each with their own storage requirements, before it makes sense to bother with this kind of thing.
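The carving-up described above can be pictured with a small Python sketch. Everything in it is hypothetical: the `StorageArray` class, the port names and the 30 TB total are invented for illustration, and real arrays are driven by the vendor's own control software, not by anything like this.

```python
class StorageArray:
    """Hypothetical model of one big array carved into exclusively owned LUNs."""

    def __init__(self, capacity_tb):
        self.capacity_tb = capacity_tb
        self.luns = {}  # port name -> LUN size in TB

    def free_tb(self):
        return self.capacity_tb - sum(self.luns.values())

    def assign(self, port, size_tb):
        # Exclusive access: one LUN per port, never shared between hosts.
        if port in self.luns:
            raise ValueError(f"{port} already has a LUN mapped")
        if size_tb > self.free_tb():
            raise ValueError("not enough free capacity")
        self.luns[port] = size_tb

    def grow(self, port, extra_tb):
        if extra_tb > self.free_tb():
            raise ValueError("not enough free capacity")
        self.luns[port] += extra_tb


array = StorageArray(capacity_tb=30)  # 30 TB total is an assumed figure
array.assign("port0-windows", 5)
array.assign("port1-oracle", 20)
array.assign("port2-compute", 1)
array.grow("port2-compute", 2)        # reconfigure without moving any cables
```

Each host only ever sees its own slice, which is what lets it believe the storage is directly attached.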
The other type of SAN is about shared access to shared storage. This requires a special type of filesystem, like Centravision CVFS or SGI CXFS. (There are some hybrid solutions out there, like Sanergy. I haven't worked with Sanergy myself, but I've heard bad things about it.) With these SANs, each client has read-write access to the same filesystem. It's kind of like what you described-- a server with wide-open file permissions-- but without the server. Access to the filesystem is at fibre channel wire speeds, 100 MB per second or more, with really low latency. This kind of system has serious drawbacks, though. SAN or cluster filesystems are complex, and that makes them more prone to failure of some kind. Heterogeneous host support is also a challenge. Finally, SANs like this just don't scale, because of contention. If you have a hundred clients reading data from a server, the server will put the IO requests in a queue and cache them intelligently. Read some data from A, cache it and stream it out the network interface while reading some data from B, and so on. You can sustain relatively high data transfer efficiency that way, as long as your server is beefy enough. But with a shared-access SAN, there's no caching request arbitrator in the middle. There's just your computer and that other computer, giving the disks conflicting instructions. Even with the biggest, smartest RAID controller, you're still going to run into disk access contention issues pretty quickly. I've seen a shared-access filesystem grind to a halt when as few as four computers were all hitting the disks at once. The heads were spending more time seeking than they were spending reading. That's kind of a bad example, though, because that system used a really shitty RAID controller for its storage device. But it proves the principle of what I'm saying.
Because of these drawbacks, shared-access SANs really work best for server clustering. If you have a parallel cluster of servers all accessing the same database-- particularly if they're just query servers and the database is read-only-- then it makes sense to consider putting the tables on a shared-access SAN to keep storage costs low. Especially if you have ten servers and a 10 TB database; you can save 90 TB of disk by using a shared-access SAN.
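The 90 TB figure works out as follows, assuming (as the comment implies) that without the SAN each of the ten servers would keep its own full copy of the 10 TB database. The variable names are just for the example.

```python
servers = 10
database_tb = 10

private_copies_tb = servers * database_tb  # every server holds its own copy
shared_tb = database_tb                    # one copy on the shared-access SAN
saved_tb = private_copies_tb - shared_tb

print(f"{private_copies_tb} TB private vs {shared_tb} TB shared: {saved_tb} TB saved")
```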
So yeah, there's a huge difference between a SAN and a file server with wide-open permissions. They're different tools, and you should use them for different sorts of jobs. Anybody who tries to tell you, though, that a SAN can replace a file server in a typical network-attached storage environment doesn't know what he's talking about.
Fiber holds some promise, but can't supply the electrical power that some cabling systems do. If you try to create a cable that has everything for everyone, it gets expensive to manufacture (try comparing the price between phone wiring, cat 5 ethernet and optical; I don't even know of a cable that has copper and optical in it).
science is a religion
iDunno, you tell me.
example.org - powered by Linux!
Yeah, wait until Apple develops a corporate project management system called iTeam. The catchphrase, of course, would be "There's no 'i' in iTeam! ..no, wait..."
Apple already has an economy system known as the iMac, so wouldn't it be viable that they will also be using iSCSI for their systems?! See, iMac and iSCSI will work really well together because both names begin with the same character.
Could this technology be used with other SCSI devices like Scanners and optical drives? For me, on more than one occasion, it would have been nice to share a scanner over the network.
(a 64-bit 1 Gb/s network adapter is often as fast as disk anyway, practically speaking)
If you're lucky-- without serious tweaking, I mean-- you can get 50 MB/s over gigabit ethernet. That's what I get using FTP between two SGI boxes using the SGI-approved 64-bit card and jumbo frames. Yes, this is faster than the ATA hard drive in your laptop, Chaz.
Using a single fibre channel loop, each of my lab systems gets about 95 MB/s from its RAID. (Small RAID, with [I think] 8 drives.)
Using multiple fibre channel loops, my servers pull about 400 MB/s off their RAIDs. And that's using 1 Gbps FC. If we decided to upgrade to 2 Gbps FC, we could get twice that performance, because the disks are capable of it.
There's the rub, right there. It's trivial to put a second FC adapter in your system and double your storage performance; just map a second LUN to the other port and stripe your disk accesses across both LUNs. How can you do that over iSCSI? That'd be a routing nightmare.
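Striping itself is simple address arithmetic, whatever transport sits underneath. A minimal sketch, assuming two LUNs and a stripe unit of 128 blocks; both values and the function name `locate` are made up for illustration.

```python
STRIPE_BLOCKS = 128  # blocks per stripe unit; an assumed value


def locate(logical_block, num_luns=2, stripe_blocks=STRIPE_BLOCKS):
    """Map a logical block number to (LUN index, block number on that LUN)."""
    stripe_index, offset = divmod(logical_block, stripe_blocks)
    lun = stripe_index % num_luns                         # round-robin over LUNs
    lun_block = (stripe_index // num_luns) * stripe_blocks + offset
    return lun, lun_block


# Consecutive stripe units alternate between the two LUNs (and so the two ports).
for block in (0, 128, 256, 300):
    print(block, locate(block))
```

Whether the two paths actually carry traffic in parallel over iSCSI is the networking question raised above; the sketch only shows the block mapping.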
Anytime you read that IETF is about ready to approve something as a standard, take it with a grain of salt unless it comes from the IETF chair or the area director responsible for that group. Such statements are usually propaganda from people who are trying to encourage premature adoption, or at best they are wishful thinking. It's not unusual for working groups to produce drafts which they think are ready for approval, but which actually contain serious technical problems that need to be resolved. Fixing those problems can require months or even years.
In particular, the fact that The Storage Networking Industry Association has completed its comments on the draft doesn't have any bearing whatsoever on IETF standardization.
Someone mentioned the security issue. I haven't followed the iSCSI discussions but security is definitely an issue that was identified before the group was formed, and one which is particularly difficult to solve for iSCSI because of performance concerns. I'll be interested to see how they've addressed it. I'd consider it extremely unlikely for IETF to approve the standard without due consideration of security. And saying "it's going to be behind a firewall, so it doesn't have to be secure" has traditionally not been considered sufficient.
(FWIW, I'm a former IETF area director)
According to this article at lwn.net (scroll down past SSSCA discussion to get to iSCSI discussion), the possibility exists that iSCSI could not be used by free operating systems because of patent encumbrances. Were these issues resolved since then?
--Lawrence Lessig for Congress!
In one of my later posts on this topic I said, "You should build your network based on requirements and budget." This is well within our requirements and would save us some money overall (10-15%).
There are protocols for remote storage, why not use these?
I agree that for most network storage, low-level SAN protocols are pointless - higher-level abstractions of remote disk such as smb/nfs/etc are much better as they enforce proper filesystem semantics, and run on top of a physical filesystem. You get all the advantages of having a filesystem in the first place - locking, sane disk space allocation algorithms, journaling, that sort of thing.
However, some applications - big databases particularly - prefer to have raw access to the storage medium, with no filesystem in the way to slow them down. These applications implement their own locking, sharing and space allocation semantics which are optimized for their own particular storage use patterns.
Classic file sharing protocols don't cut it for these big databases because there's no way to get raw disk access over the network with them. Which is why these lower-level SAN protocols exist - they provide the raw disk access that the big databases want, over a network. This means you can have your database spread over multiple physical locations to minimize the risk of your whole database going up in smoke, without taking the performance hit that running the database over smb/nfs would have.
You won't see iSCSI hardware making it into bog-standard file server hardware any time soon, but I can see it being huge in big-iron database servers, where it should be considerably cheaper and easier than Fibre Channel, the current best solution.
Admittedly, there are big questions over whether raw disk access is really necessary for databases - modern general-purpose filesystems are a LOT quicker than they used to be, and MySQL, for instance, which doesn't use raw disk IO but is still blazingly fast, is turning some of the performance assumptions on their head. But the big guys - Oracle, DB2 and so forth - still prefer it, so this is why iSCSI is here.
There is an important difference between my SCSI chain and an IP network - you won't find many SCSI chains with the kinds of security threats that are quite common on networks these days. Remember that block devices live below the OS permissions level - it's deeper than root access.
I hope that iSCSI has good security measures *enabled by default*. I remember some discussion on iSCSI mailing lists about using SRP and potential intellectual property problems. I hope it's in the final standard.
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
iSCSI? Yes. I have a friend who works for a company developing iSCSI devices. He uses Linux exclusively and claims that Linux has the most robust support for iSCSI.
You know what, maybe it wasn't the fastest but it worked!!!!! You could even boot diskless systems which would carry on running quite happily using the remote disks as though they were local. In effect, all you did was to boot a system image that used a RAM-disk to start itself. This still works on Linux and many other Unix like systems. Many systems have ways of booting from RO media. Once the NI is loaded, you can network mount the remote disks and dismount the RAM disk.
Digital effectively split up disk access using something called MSCP. It was somewhat more general than the Linux SCSI 3-layer model but it effectively split the disk access by a program or file system from a device driver. It became a trivial matter to split the communication between the levels via the net. Of course, getting a disk mounted by more than one system led to some real fun on the file system side, but that eventually worked too. You know, sometimes, you need a pool of storage that isn't mega-high speed, but where you can store a lot.
As for your comments about Gigabit Lans, well that becomes less of an issue than switching.
Ok, these days HP/Compaq/Digital use Fibre Channel for their high-performance systems. However, the price is far from cheap. Last I heard, the NI-based clusters still work very well, and as network performance increased, so did the throughput of remotely mounted disks.
I don't know how well the iSCSI people are doing, but I hope they realise that they need to fix a few other details (a standard network lock protocol would be really cool to allow two disparate systems to coordinate access).
See my journal, I write things there
Why would it be a routing nightmare? Just assign a second IP to the second LUN and network adapter, easy as can be. The fact is that very few machines really need much more than 50-100 MB/s because the clients aren't going to be able to get data much more quickly than that anyway. There are obvious exceptions like DB servers, but they are the minority. Most of the time management of disk space is much more important than speed of disk access.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
What about a big-honking server running NFS...
I mean it is a software solution, but it does work.
With a SAN (Storage Area Network) a bunch of raw disks is made available over a network. Currently this is normally Fibre Channel; iSCSI will bring standard Ethernet to SANs, making it much cheaper.
Bingo. Cheap stock gig-E cards and a driver hack on top of a classic IP stack and you can build a mainframe-reliable file server / disk farm out of commodity boxes from the local PC store.
But that network better not be connected to anything BUT the disks and the file servers' private disk-interface LAN(s), and the file servers better not have IP forwarding enabled (or have a good filter). Else one carefully corrupted packet destroys one file system. (Maybe two or so for RAID.)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
I was working for IBM on this when we got hit by layoffs.. arg!
;-)
anyone have any job openings for a linux-iSCSI-nas programmer??
Why would it be a routing nightmare? Just assign a second IP to the second LUN and network adapter, easy as can be.
Can your OS handle two IPs on the same network segment? None of the ones I know of can. You see, you can only have one route to a given network. So you might have two interfaces on the same network, but all your traffic is going to go through just one of them. The other one sits there and does nothing at all.
The fact is that very few machines really need much more than 50-100 MB/s because the clients aren't going to be able to get data much more quickly than that anyway.
Depends on your situation. In some cases, 640 K really is enough for anybody. For the rest of us, though....
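The one-route-per-network argument can be modelled in a few lines. This is a deliberately simplified sketch of the behaviour described above, not a claim about how any specific OS routes packets; the `RouteTable` class, the interface names and the addresses are all hypothetical.

```python
import ipaddress


class RouteTable:
    """Simplified model: at most one outgoing interface per directly attached network."""

    def __init__(self):
        self.routes = {}  # ip_network -> interface name

    def add_interface(self, name, cidr):
        network = ipaddress.ip_interface(cidr).network
        # The first interface on a network keeps the route; later ones are ignored.
        self.routes.setdefault(network, name)

    def outgoing_interface(self, dest):
        addr = ipaddress.ip_address(dest)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        # Longest-prefix match among the networks that contain the destination.
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


# Two NICs on the same subnet: every packet leaves through eth0.
same = RouteTable()
same.add_interface("eth0", "192.168.1.10/24")
same.add_interface("eth1", "192.168.1.11/24")
print(same.outgoing_interface("192.168.1.50"))

# Two NICs on different subnets: each network gets its own route.
split = RouteTable()
split.add_interface("eth0", "192.168.1.10/24")
split.add_interface("eth1", "192.168.2.11/24")
print(split.outgoing_interface("192.168.2.50"))
```

In this model the second NIC only does useful work once it sits on its own subnet, which is why spreading storage traffic over two adapters ends up being an addressing exercise.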
This is a great piece of tech for the right application (shared storage). Unfortunately, it's only half the solution. It can make devices cheaper by using cheap off-the-shelf networking gear instead of expensive FC switching gear (funny that Cisco supports each of them on the same frame, though), but you're still only getting access to a raw disk or, more hopefully, a RAID of raw disks. Now, for a few things this makes a ton of sense: a clustered solution with redundant data centers and a big pipe (think cheap leased fiber), where they can locate mirrored storage and redundant servers. Things that perform better with raw disk IO (read: Oracle) inside a clustered environment are going to love this, especially since data sets are getting larger by the moment while actual data use in real time is down. There is a LOT of data lying on disk that is not accessed very much, and going to an HSM system is becoming less and less desirable as disks get cheaper (IDE drives are just about cheaper than tape right now, and that trend is increasing).
Now, for Joe User this tech is pretty useless. None of the major OSes support a multiple-reader/writer FS on top of a block device (SGI has one that's part of their FS, but it doesn't look to be part of the Linux port yet; I may be wrong). Windows definitely doesn't have anything for this out of the box. There are solutions to do it, but they are generally more complicated than it's worth for a small installation, or they require some big external hardware and drivers (EMC's "solution") to redirect the actual block IO of a network mount to a block device (generally FC hardware, or SCSI on some smaller setups; FC is a lot more reliable though, IMNSHO). This is all a TCO reduction movement that doesn't make a whole lot of sense. Block devices get sped up by using large buffers wherever you can shove them; microcontrollers are great at doing back-to-back IO, while servers have other things to do. FC has latency issues, as it's really just serial SCSI. You can put hardware on two coasts and make it work, but it's generally not pretty. iSCSI HBAs should be a lot more tolerant of latency.
No sir, I don't like it.
FYI, HyperSCSI does roughly the same as iSCSI and claims to address some of its shortcomings.
..to you're beowulf cluster of furbies...
the above is my personal opinion and does not necessarily reflect that of the little voices in my head
This tech looks like it might make diskless stations a lot more feasible around the house. Nice!
I agree that maintenance issues are greatly reduced if you can put all the drives into one hot-swappable array somewhere and still get decent performance. The downside is that the drives would probably have to be spinning 24/7 in a lot of cases, which might reduce their service life. It wouldn't be a big deal for something like the Elite-23's, but they are really the exception; most drives probably wouldn't last a year.
Clickety Click
If it really takes off, how about using iSCSI internally instead of raw SCSI? Then, all your disk interfaces could be the same.
Does the extra hardware for NICs still cost too much? (Last I heard, even raw SCSI was considered too expensive for the consumer market, so I'm probably off my rocker again.)
"Provided by the management for your protection."
Can your OS handle two IPs on the same network segment? None of the ones I know of can. You see, you can only have one route to a given network. So you might have two interfaces on the same network, but all your traffic is going to go through just one of them. The other one sits there and does nothing at all.
I'm no expert, but Linux does support channel-bonding, which I think is what the poster was talking about.
Hardly. Any company seriously considering shipping a product that supports iSCSI has already been working on it for the past year and a half. I worked on a development team making an iSCSI target and initiator for Linux, and we had to suffer through major, non-backwards compatible draft releases as we tried to make iSCSI work. I guess that's why they have that disclaimer on them saying you're not supposed to use them for anything serious...
Anyway, I don't work there anymore, but I'd imagine there would only be small changes required for them to ship a fully standards-compliant iSCSI product.
--It's all fun and games, 'till someone loses an eye. Then it's one-eyed fun!--
You just have to ensure that, on a particular machine, each of its NICs gets an address from a different IP subnet.
Can somebody please tell me how this relates to iSCSI being easier to manage than SCSI over Fibre Channel? Running two separate subnets and two Ethernet drops to each client on the network sounds like a terrible way to scale.