Phase Change Memory vs. Storage As We Know It
storagedude writes "Access to data isn't keeping pace with advances in CPU and memory, creating an I/O bottleneck that threatens to make data storage irrelevant. The author sees phase change memory as a technology that could unseat storage networks. From the article: 'While years away, PCM has the potential to move data storage and storage networks from the center of data centers to the periphery. I/O would only have to be conducted at the start and end of the day, with data parked in memory while applications are running. In short, disk becomes the new tape.'"
CD-R is a phase change memory. It revolutionized things, but even DVD-Rs and BD-Rs aren't that spectacular these days. Seems holographic discs have more potential if the cost barrier comes down.
... the death of x tech here, it will eventually die once the groundwork has been laid to migrate to a better system.
When you can pick up 4GB of RAM memory for a song, why not load the whole OS into memory? As long as you don't suffer a system crash, you can unload it back to disk when you're done.
How soon we forget. The article is speculative, sure, but the hardware is not only real, it's in mass production by Samsung: http://hardware.slashdot.org/article.pl?sid=09/09/28/1959212
Just looking at the numbers, the article is a bit overblown. Phase change memory will first be a good replacement for flash memory, not DRAM. It's still considerably slower than DRAM. But it eliminates the erasable-by-page-only problem that has plagued SSDs, especially Intel SSDs, and the article does mention SSDs as a bright spot in the storage landscape. PCM should make serious inroads into SSDs very quickly because manufacturers can eliminate a whole blob of difficult code. With Samsung's manufacturing muscle behind it, prices per megabyte should be reasonable right out of the gate and as Samsung gets better at it, prices should plummet even faster than flash memory did.
The I/O path between storage and the CPU will get an upgrade, and it could very well be driven by PCM. Flash memory SSDs are already very fast and PCM is claimed to be 4X faster. That saturates the existing I/O paths (barring 16-lane PCIe cards sitting next to the video card in an identical slot). Magnetic hard drives haven't come anywhere close to saturation. Development concentrated for a decade (or two?) on increasing capacity, for which we are thankful, but the successes in capacity development have outrun improvements in I/O speed. In turn, that meant that video cards were the driver behind I/O development, not storage. Now that there's a storage tech in the same throughput class as a video card, I expect there to be a great deal of I/O standards development to deal with it.
But hard drives == tape? Not for a long long time. The development concentration on increasing capacity will pay off for many years to come. PCM arrays with capacities matching modern hard drives (2 TB in a 3.5" half height case. Unreal!) are undoubtedly a long ways off.
Hopefully there are no lurking patent trolls under the PCM bridge...
> disk becomes the new tape
Well they got this right even if it was not to be accomplished with the mentioned technology.
I think that in the medium-to-long term this will undoubtedly come true.
I mean, would any /. reader bet on hard drives ever coming on par with today's memory access speeds, even with zillions of years of technological advancement?
Everything I write is lies, read between the lines.
Ya, we had that back in the stone age, and Multics would have been the poster child for this type of thinking, but it was a *bitch* and made portability problematic. I think VMS has some of this type of capability with their Files-11 support; any VMS people care to comment? Unix (and most current OSes) sees everything as a stream of bytes, in most cases, and this is much simpler.
An OS cannot be everything to all people all the time...
It must have been something you assimilated. . . .
Numonyx announced some good advances in PCM a few months back:
http://www.pcper.com/comments.php?nid=7930
Allyn Malventano
Storage Editor, PC Perspective
this sig was brought to you by the letter
I was tempted to stop reading right there, but I kept reading. While his point about POSIX improvements is not bad, the rest of the article is ridiculous. It essentially amounts to: Imagine if we had pretty much exactly what we have today, but we used different words to describe the components of the system! We already have slower external storage (networked drives / SANs, local hard disk), and incremental means of making data available locally more quickly by degrees (local memory, L2 cache, L1 cache, etc.). We already get that speed at the expense of the data's accessibility to CPUs farther away. It turns out I probably should have stopped when I first felt the urge, at the first sentence of the article: "Data storage has become the weak link in enterprise applications, and without a concerted effort on the part of storage vendors, the technology is in danger of becoming irrelevant." I can't wait to answer with that one next time and watch jaws drop:
...
Boss: Where and how are we storing our database, how do we ensure database availability, and how are we handling backups?
me: You're behind the times Boss. That is now irrelevant!
Yeah. That's the ticket
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
I imagine that is a generous characterization.
There seem to be plenty of not-even-computer-related engineers and students here (and others too!), if someone reads me in the wrong direction.
Nerd rage is the funniest rage.
posix_fadvise() and POSIX_FADV_SEQUENTIAL exist in POSIX. Not sure how well different OSes like Linux or BSD use the hints; I know that some of it's been broken because of bad past implementations.
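For anyone who wants to poke at the sequential hint, here is a minimal sketch using Python's os.posix_fadvise wrapper. The helper names read_sequentially and demo_fadvise are made up for illustration, and whether the kernel actually does anything useful with the hint is exactly the open question above, so treat this as a way to try it rather than proof of a speedup.

```python
import os
import tempfile


def read_sequentially(path, chunk_size=64 * 1024):
    """Read a file front to back, hinting sequential access first."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # os.posix_fadvise only exists on some Unix platforms, so guard it.
        if hasattr(os, "posix_fadvise"):
            try:
                # Offset 0 with length 0 covers the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                pass  # the hint is advisory; carry on without it
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total


def demo_fadvise():
    """Write a throwaway 200,000-byte file and read it back with the hint."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 200_000)
        path = tmp.name
    try:
        return read_sequentially(path)
    finally:
        os.remove(path)
```

The read loop is the same with or without the hint, which is the point: the application only states its intent, and the OS is free to ignore it.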
You are a god amongst men but you waste all your knowledge upon this tribe.
A loop, by its nature, continues. If that didn't make sense, start reading this sentence again.
From TFA:
> Ya, we had that back in the stone age, and Multics would have been the poster child for this type of thinking, but it was a *bitch* and made portability problematic.
No, Multics would have been the poster child for "there's no I/O, there's just paging" - file system I/O was done in Multics by mapping the file into your address space and referring to it as if it were memory. ("Multi-segment files" were just directories with a bunch of real files in them, each no larger than the maximum size of a segment. I/O was done through read/write calls, but those were implemented by mapping the file, or the segments of a multi-segment file, into the address space and copying to/from the mapped segment.)
> I think VMS has some of this type of capability with their Files-11 support; any VMS people care to comment? Unix (and most current OSes) sees everything as a stream of bytes, in most cases, and this is much simpler.
"Seeing everything as a stream of bytes" is orthogonal to "a hint that the file will be read sequentially". See, for example, fadvise() in Linux, or some of the FILE_FLAG_ options in CreateFile() in Windows (Windows being another OS that shows a file as a seekable stream of bytes).
equals phase change memory
First off, most non-volatile RAM isn't nearly as fast as DRAM. So let's assume you mean "what if everything were in DRAM, and that was non-volatile, it would be so much faster". Well, again, not really. Faster, but there are far more bottlenecks than just disk I/O. You can go buy ramdisks now, or you could make them in your current RAM, copy the OS there, and run off that after you boot. Go try it. Firefox isn't going to render quicker, your mail isn't going to load any faster, and YouTube isn't going to lag any less. If you work with large photos, most software is already going to exhaust your RAM, so (given you have sufficient quantities) you're already not losing anything.
In short, because of modern hard disk and OS caching, the ridiculous quantities of RAM these days, and a current reliance on the network for most tasks, a pure ramdisk system isn't likely to be that much better for most people. If you put a large database or maybe compile there, you would see improvement. But that's not common for most people.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
We have it today. TFA's on crack.
It's called madvise
In Linux there is also fadvise()
Of course... reading from a file (from an app point of view) is really nothing more than accessing data in a mapped memory area. Oh.. I suppose unless you actually use the POSIX mmap call to map the file into memory for reading, you won't have an easy ability to provide the advice.
And it makes portability a bitch regardless, as not all OSes are POSIX, and not all OSes have mmap().
Nevertheless, it's not fair to say it is impossible for an app to provide hints. Whether giving the hints or not actually has a useful effect (usually) may be a matter of debate.
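A rough sketch of the mmap-plus-madvise route, in Python for brevity. It assumes Python 3.8 or later for mmap.madvise, and the helper names count_lines_mapped and demo_madvise are invented for the example. The hasattr checks are the portability tax mentioned above: on a platform without madvise the code simply skips the hint.

```python
import mmap
import os
import tempfile


def count_lines_mapped(path):
    """Count newline bytes in a file through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # madvise() needs Python 3.8+ and OS support, so check for both.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                try:
                    m.madvise(mmap.MADV_SEQUENTIAL)
                except OSError:
                    pass  # advisory only; the scan below works either way
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count


def demo_madvise():
    """Write 1000 short lines to a throwaway file and count them."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"line\n" * 1000)
        path = tmp.name
    try:
        return count_lines_mapped(path)
    finally:
        os.remove(path)
```

No read() call appears anywhere in the scan; the data arrives through page faults on the mapping, which is the "no I/O, just paging" view of a file.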
Troll detected.
Eh. I've worked in the electronics industry long enough to know that plenty of engineers are also fuckin' morons. It's bad enough that floor scum like me had to save my company a shitload of effort by recommending something as simple and common-sense as the notion of spare connectors also being connector-savers during test. Previously they'd had to shut the whole station down weekly because of pushed pins before they replaced the backplane.
The proles have an excuse - being uneducated. You engineers shouldn't be making these basic common-sense fuckups and yet I continue to see it time and time again. What do you do all day, anyway? We all know you made it through calculus 5 and quantum physics...and yet you still continue to overlook the most obvious shit that a lowly machinist or even the fucking coffee boy would've caught.
There was a recent article about engineers being more likely to turn to terror. Maybe it all comes to the bitterness of having missed out on all that pussy in college. But you da man, Mr. Elitist, you da man.
The real question is whether we need something other than read/write/seek to deal with the various forms of solid-state memory. The usual options are 1) treat it as disk, reading and writing in big blocks, and 2) treat it as another layer of RAM cache, in main memory space. Flash, etc. though have much faster "seek times" than hard drives, and the penalty for reading smaller blocks is thus much lower. Flash also has the property that writing is slower than reading, while for disk the two are about the same. For small I/O operations, the operating system overhead for the operation takes more time than the actual data access.
For most end users, permanent storage is for storing big sequential files, audio or video. There are interfaces that would make databases faster (one could have flash devices that implemented a key/value store, with onboard lookup), but nobody would notice when playing video. The trend in databases is already to get enough RAM to keep all the indices in RAM, so we're already doing the "read it in the morning" thing suggested in the article. So the payoff for building flash devices to help with that is modest.
There are interesting things to do in this space, but improving reliability in the RAID sense is probably more important than speeding up non-sequential small accesses.
> Access to data isn't keeping pace with advances in CPU and memory, creating an I/O bottleneck that threatens to make data storage irrelevant.
Data storage. Irrelevant. I.. see. The new year is not yet 14 hours old but I feel a certain confidence that this will be the single most vacuous thing I encounter in 2010 - and I've already seen Entertainment Tonight this year.
Windows is more closely tied to the whole "separate levels of RAM memory and hard disk memory" than Linux is. I could really see Linux getting more traction if all systems went to PCM tomorrow.
Tsukasa: All I really want, is to be left alone...
> There was a recent article about engineers being more likely to turn to terror. Maybe it all comes to the bitterness of having missed out on all that pussy in college. But you da man, Mr. Elitist, you da man.
Whoa, methinks someone struck a nerve.
sic transit gloria mundi
Maybe these guys ought to ask someone that was around in the days BEFORE there were SANs. Managing storage back then absolutely sucked. Every server had its own internal storage with its own RAID controller OR had to be within 9m (the max distance of LVD SCSI) of a storage array.
There was no standardization; every OS had its own volume managers, firmware updates, patches etc etc etc. Plus compare the number of management points when using a SAN vs internal storage. An enterprise would have thousands of servers connecting through a handful of SAN switches to a handful of arrays. Server admins have more important things to do than replace dead hard drives.
Want to replace a hot spare on a server? What a pain, as you had to understand the volume manager or unique RAID controller in that specific server. I personally like how my arrays 'call home' and an HDS/EMC engineer shows up with a new drive, replaces the failed one and walks out the door, without me having to do anything about it.
Two words: Low Utilization. You'd buy an HP server with two 36GB drives and the OS+APP+data would only require 10GB of space. So you'd have this land locked storage all over the place.
Moving the storage to the edge? Even if you replace spinning platters with solid state, putting all the data on the edge is a 'bad thing.'
"But Google does it!"
Maybe so, but then again they don't run their enterprise based upon Oracle, Exchange, SAP, CIFS/NFS based home directories etc like almost all other enterprises do.
It's because data storage will ALWAYS be relevant (talk to any Alzheimer's patient if you don't believe me) that access speeds are a concern.
The SAN argument is that your storage is so precious it must not be stranded. If you're paying $50K/TB with drives, controllers, FC switches, service, software, support, installation and all that jazz then that's absolutely true. If you're doing something like OpenFiler clusters on BackBlaze 90TB 5U Storage Pods for $90/TB and 720 TB/rack you have a different point of view. As for somebody showing up to replace a drive, I think I could ask Jimmy to put his jacket on and shuffle down to the server room to swap out a few failed drives every couple months - that's what hot and cold spares are for and he's just geeking on MyFace anyway. Low utilization? Use as much or as little as you like - at $90/TB we can afford to buy more. We can afford to overbuy our storage. We can afford to mirror our storage and back it up too. In practice the storage costs less than the meeting where we talk about where to put it or the guy that fills it. If you want to pay for the first tier OEM, it's available but costs 10x as much because first tier OEMs also sell SANs.
Openfiler does CIFS/NFS and offers iSCSI shared storage for Oracle, Exchange and SAP. If you need support, they offer it. OpenFiler is nowhere near the only option for this. If you want to pay license fees you could also just run Windows Server clustered. There are BSD options and others as well. Solaris and OpenSolaris are well spoken of, and ZFS is popular, though there are some tradeoffs there. Nexenta is gaining ground. There's also Lustre, which HP uses in its large capacity filers. Since you're building your own solution you can use as much RAM for cache as you like - modern dual socket servers go up to 192GB per node but 48GB is the sweet spot.
Now that we've moved redundancy into the software and performance into the local storage architecture, moving storage to the edge is exactly what we want to do: put it where you need it and if you need a copy for data mining then mirror it to the mining storage cluster. We still need some good dedicated fiber links to do multisite synchronous replication for HA, but that's true of SAN solutions also. We're about 20 years past when we should have ubiquitous metro fiber connections, and that's annoying. Right now without the metro fiber the best solution is to use application redundancy: putting a database cluster member server in the DR site with local shared storage.
Oh, and if you need a lot of IOPS then you choose the right motherboard and splurge on the 6TB of PCIe attached solid state storage per BackBlaze pod for over a million IOPS over 10GigE. If you need high IOPS and big storage you can use adaptor brackets and 2.5" SSDs or mix in an array of The Colossus, though you're reaching for a $6K/TB price point there and cutting density in half, but then the SSD performance SAN has an equal multiple and some serious capacity problems. If you go with the SSD drives you would want to cut down the SAS expanders to five drives per 4x SAS link because those bad boys can almost saturate a 3Gbps link, while normal consumer SATA drives you can multiply 3:1.
If you're more compute focused then a BackBlaze node with fewer drives and a dual-quad motherboard with 4 GPGPUs is a better answer. At the high end you're paying almost as much for the network switches as you are for the media. If you're into the multipath SAS thing then buy 2x the controllers and buy the right backplanes for that - but
Help stamp out iliturcy.
I don't think the author knows much about the purpose of a SAN. A SAN is not just a disk array giving you faster access to disks. Local storage that is faster does not help you with concurrent access (clusters), rollback capability (snapshots, mirror copies, point-in-time server recovery), site recovery (off-site mirrors) or substantial data compression gain through technologies like deduplication.
As for speed, my SAN is giving me write performance in the range of 600 MB/s per client. I access my storage over a 10Gbit Ethernet backbone. Certainly suboptimal, but my blades have a pair of NICs and no disks. It's cheap, very fast and I have 3-4 rollback points for my ESX cluster. That's around 200 VMs in two sites, active-active, cross recoverable.
The SAN is not going away.
(In case any of you are designing and want the part list, I'm talking about a Cisco Nexus 5020 10GbE backbone, BlueArc Mercury 100 cluster with disks slung on an HDS USP-VM. 64GB cache depth on each path and a few hundred TB of disk. Servers are HP BL495 G6s, with Chelsio cards. Chassis has BNT (HP) 10GbE switches. I haven't even started with jumbo frames yet; I can do better, but this is pretty good for now. All up it was just over a mil AUD.)
What's this? It's a faster storage device. That's a fairly small part of a SAN.
I mean, what's the advantage of phase change memory in this scenario? If you lose power to your CPU or your system crashes, you will have effectively lost your memory content anyhow. So you might as well open your files with mmap and have lots of RAM. The system will automagically figure out what to swap to disk if RAM isn't enough, and it will regularly back up the contents to disk.
Is anyone working on micromachines (MEMS) that set vast arrays of very tiny storage discs into very tiny radio transmitters, each disc transceiving on its own very narrow frequency band? A 1cm^2 chip, perhaps stacked a dozen (or more) layers thick, delivering a couple hundred million discs per layer, each holding something like 32bits per microdisc and a GB per layer, streaming something like 2-200Tbps per layer, seek time 10ns, consuming a few centiwatts per layer.
Or skip the radio and just max out a multimode fiber's throughput. Parallelizing data transfer should leave stored data transferable entirely in under 250ms.
--
make install -not war
> It's because data storage will ALWAYS be relevant (talk to any Alzheimers' patient if you don't believe me) that access speeds are a concern.
I think he means, if RAM is persistent and you have the equivalent of a hard drive in bytes, why would you need to store anything that's already in memory??
You misread my intent.
Nerd rage is the funniest rage.
Data storage gives a nice place to keep everything in sync. It's NOT just about storing any old data.
Also, it simply doesn't scale. Not with the way that individuals today are consuming gigabytes every day. It only provides a benefit if multiple users are hitting the same data sources - same as any other caching scheme - and then we again run into the problem of keeping all these edge caches in sync. It absolutely doesn't scale, and will generate substantially more network traffic than hitting a central server would - and you'd have to hit the central server anyway, to maintain data coherency.
It also assumes something that is absolutely false - that there is a *need* for this. Back when 40 megs of hd space was $500, sure ... but cheap terabyte drives remove the need for a lot of the centralized stuff - there's no reason that peer-to-peer collaborative efforts can't be better for things like two or more people working on the same project/document, or distributing files (bittorrent has certainly proven THAT), or anything else.
The nature of the problem has changed, and centralized servers will not be the future ... the cloud is just another re-working of client-server, and when all devices can talk directly to each other (hello, IPv6 where ARE YOU?) the cloud will vanish. In 2020 we'll be thinking of cloud computing the same way we thought of those brick cell phones from the 80s.
IPv6, good-enough computing, wide-spread broadband, cheap storage - it's this confluence of events that is a real game-changer. Those hooked on the concept of cloud computing aren't thinking very far ahead.
What the author fails to realize is that the limiting factor on a SAN is most often the host itself, not the disk. A single disk may not have the I/O, but an array most certainly does (depending on the array). A standard 33 MHz, 32-bit PCI bus can only transfer 133 MB/s (theoretical max). Even faster buses still do not match the I/O speed or throughput of a SAN.
The limiting factor on a PC is that southbridge chip, not the storage. The vast majority of the systems typically connected simply cannot push the I/O fast enough out of their ports. They are not waiting on disk; they are waiting on the I/O of the bridge chip and bus. Of course putting it on a RAM disk is faster. RAM sits off the north bridge and therefore has better throughput to the CPU.
This is more a limit of bridge chips and PC architecture than the speed of a SAN.
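For the curious, the PCI number falls out of simple arithmetic. A quick sketch, assuming the classic 32-bit bus at its nominal 33.33 MHz clock and one transfer per clock; the function names are invented for the example. The 10 Gbit figure is the raw line rate of the kind of Ethernet backbone mentioned elsewhere in this thread, with protocol overhead ignored.

```python
def peak_mb_per_s(clock_mhz, width_bits):
    """Peak parallel-bus rate in MB/s: one transfer of width_bits per clock."""
    return clock_mhz * width_bits / 8


def link_mb_per_s(gbit_per_s):
    """Raw line rate of a serial link in MB/s, ignoring protocol overhead."""
    return gbit_per_s * 1000 / 8


pci = peak_mb_per_s(33.33, 32)   # classic PCI, assumed 32 bits wide
ten_gbe = link_mb_per_s(10)      # one 10 Gbit/s link

print(f"PCI 33 MHz x 32-bit: {pci:.0f} MB/s")
print(f"10 Gbit link:        {ten_gbe:.0f} MB/s")
```

So a single 10 Gbit link can in principle carry several times what that old bus could accept, which is the host-side bottleneck being described.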
> who buys desktops these days?
People who want a full-size keyboard, video, and mouse, and who don't want to pay for a duplicate mini-keyboard, mini-monitor, and mini-mouse built into a laptop. They either A. drive to work and thus never have enough time as a passenger on mass transit to make using the computer away from the docking station worth it, or B. have a smartphone, a handheld gaming device, an e-book reader, or even a paperback book to pass the time.
Or people who use video games, certain kinds of CAD software, or other software requiring more performance than is available in an affordable laptop. It's like asking why anybody buys a PS3 and PS3 games instead of a PSP and PSP games, or a Wii and Wii games instead of a DS and DS games, or an Xbox 360 and Xbox 360 games instead of a Windows Mobile smartphone and Windows Mobile games.
Or people who use peripherals for which the external (USB) version carries a significant price premium. For example, when I bought a video digitizer a few years back, I chose the PCI version (ATI TV Wonder VE) over the USB version because USB video digitizers were twice as expensive and required a USB 2.0 card.
> Best thing, they have built in UPS ;)
For the price of a replacement battery for a lot of laptops, I could almost buy a new computer.
> Superfetch? You're kidding, right? Real VMs were doing this long before MS figured it out.
NT has always had a disk cache. SuperFetch of Windows 6.x just extends it to files that haven't been opened yet, as in Lord Byron II's suggestion of loading more of the operating system into RAM at startup.
Actually the Amiga uses the better technology; the architecture has been here for a while. Intel bought that out. The Itanium and PS3 are clones of the Amiga. So yes, a better architecture/system does exist, and the real problem is the x86 architecture: 1947 technology invented by Motorola, left in 1969. The x86 (north and south bridge) is the problem, not the storage.
VMS allows a process to map a file into its address space and use the paging mechanism to do the disk i/o. It is kind of like a private page file. You get fast random access. You get persistence if you close the file/map properly when you are finished with it.
It has been a long time since I looked at this stuff, but I think you could share the mapped file with other local processes. You had to roll your own atomic access with mutexes.
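Not VMS, but the same idea can be sketched with POSIX-style mmap from Python: map a file, treat it as memory, and let the paging machinery carry the bytes to disk. The names bump_counter and demo_persistent_counter and the 8-byte counter layout are made up for illustration, and this single-process sketch does no locking, so the roll-your-own mutex caveat above still applies if several processes share the file.

```python
import mmap
import os
import struct
import tempfile


def bump_counter(path):
    """Increment a 64-bit counter stored in the first 8 bytes of a file."""
    with open(path, "r+b") as f:
        # Default access is a shared, writable mapping of the first 8 bytes.
        with mmap.mmap(f.fileno(), 8) as m:
            (value,) = struct.unpack("<Q", m[0:8])
            m[0:8] = struct.pack("<Q", value + 1)
            m.flush()  # push the dirty page back to the file
            return value + 1


def demo_persistent_counter():
    """Bump the counter three times, remapping the file on every call."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(8))  # the counter starts at zero
        path = tmp.name
    try:
        for _ in range(3):
            last = bump_counter(path)
        return last
    finally:
        os.remove(path)
```

Each call opens and maps the file from scratch, so the value only keeps climbing because the previous mapping's write reached the file: persistence by way of the pager, with no explicit write() call.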
RMS/Files-11 is a whole different, overly-complex issue. At least FIVE different formats for a simple sequential text file. A whole multi-indexed database in a single file.
--
has taken on a sense of urgency. We're losing Terabytes of zeros a day people!!
But seriously, I'm not sure about their claim that a PCM cell works better the smaller it gets. At some point the material will be subject to statistical mechanical fluctuations which could wipe your memory.