Slashdot Mirror


Replacing Traditional Storage, Databases With In-Memory Analytics

storagedude writes "Traditional databases and storage networks, even those sporting high-speed solid state drives, don't offer enough performance for the real-time analytics craze sweeping corporations, giving rise to in-memory analytics, or data mining performed in memory without the limitations of the traditional data path. The end result could be that storage and databases get pushed to the periphery of data centers and in-memory analytics becomes the new critical IT infrastructure. From the article: 'With big vendors like Microsoft and SAP buying into in-memory analytics to solve Big Data challenges, the big question for IT is what this trend will mean for the traditional data center infrastructure. Will storage, even flash drives, be needed in the future, given the requirement for real-time data analysis and current trends in design for real-time data analytics? Or will storage move from the heart of data centers and become merely a means of backup and recovery for critical real-time apps?'"

4 of 124 comments

  1. Totally inane by MrAnnoyanceToYou · · Score: 5, Insightful

    Discarding data is something that, as a programmer, I don't often do. Too often I will need it later. Real-time analytics are not going to change this. As long as hard drive storage continues to get cheaper, there's going to be more data stored, partly because the easier it is to store large blocks, the more likely I am to store bigger packets. I'd LOVE to store entire large XML blocks in databases sometimes, but we decide not to because of space issues. So, yeah, no. Datacenters aren't going anywhere. Things just get more complicated on the hosting side.

    Note that the article writer is a strong stakeholder in his earthshattering predictions coming true.

  2. The cutting edge is in high frequency trading by Animats · · Score: 5, Informative

    For the cutting edge in this area, see what the "high frequency traders" are doing. Computers aren't fast enough for that any more. The trend is toward writing trading algorithms in VHDL and compiling them into FPGAs, so the actual trading decisions are made in special-purpose hardware. Transaction latency (from trade data in on the wire to action out) is dropping below 10 microseconds. In the high-frequency trading world, if you're doing less than 1000 trades per second, you're not considered serious.

    More generally, we have a fundamental problem in the I/O area: UNIX. UNIX I/O has a very simple model, which is now used by Linux, DOS, and Windows. Everything is a byte stream, and byte streams are accessed by making read and write calls to the operating system. That was OK when I/O was slower. But it's a terrible way to do inter-machine communication in clusters today. The OS overhead swamps the data transfer. Then there's the interaction with CPU dispatching. Each I/O operation usually ends by unblocking some thread, so there's a pass through the scheduler at the receive end. This works on "vanilla hardware" (most existing computers), which is why it dominates.
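
    A minimal sketch of the read/write model described above, in Python, to make the per-call cost concrete. It reads the same file twice through `os.read`, once with a small buffer and once with a large one, and counts how many calls each pass needs. The 1 MiB payload, the 64-byte and 64 KiB buffer sizes, and the helper names `count_read_calls` and `make_payload_file` are assumptions chosen for illustration; none of them come from the comment. It only counts calls and does not measure latency or scheduler behaviour.

```python
import os
import tempfile


def count_read_calls(path, chunk_size):
    """Read the whole file with os.read and return (calls, total_bytes)."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    total = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # one trip into the OS per iteration
            calls += 1
            if not data:  # empty bytes object means end of file
                break
            total += len(data)
    finally:
        os.close(fd)
    return calls, total


def make_payload_file(size):
    """Create a temporary file of `size` bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path


PAYLOAD_SIZE = 1 << 20  # 1 MiB, an arbitrary size for this example

path = make_payload_file(PAYLOAD_SIZE)
try:
    small_calls, small_total = count_read_calls(path, 64)        # tiny buffer
    large_calls, large_total = count_read_calls(path, 1 << 16)   # 64 KiB buffer
finally:
    os.remove(path)

print("calls with 64-byte buffer:", small_calls)
print("calls with 64 KiB buffer:", large_calls)
```

    The same bytes arrive either way; what changes is how many times the program has to cross into the operating system to get them, which is the overhead the comment is complaining about.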

    Bypassing the read/write model is sometimes done by giving one machine remote direct memory access ("RDMA") into another. This is usually too brutal, and tends to be done in ways that bypass the MMU and process security. So it's not very general. Still, that's how most Ethernet packets are delivered, and how graphics units talk to CPUs.

    The supercomputer interconnect people have been struggling with this for years, but nothing general has emerged. RDMA via Infiniband is about where that group has ended up. That's not something a typical large hosting cluster could use safely.

    Most inter-machine operations are of two types - a subroutine call to another machine, or a queue operation. Those give you the basic synchronous and asynchronous operations. A reasonable design goal is to design hardware which can perform those two operations with little or no operating system intervention once the connection has been set up, with MMU-level safety at both ends. When CPU designers have put in elaborate hardware of comparable complexity, though, nobody uses it. 386 and later machines have hardware for rings of protection, call gates, segmented memory, hardware context switching, and other stuff nobody uses because it doesn't map to vanilla C programming. That has discouraged innovation in this area. A few hardware innovations, like MMX, caught on, but still are used only in a few inner loops.
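
    The two operation types named above can be sketched in a few lines of Python. This is only a model of their shape: the "other machine" is a thread in the same process, and the `Peer` class with its `call` and `enqueue` methods is a hypothetical name invented for this example. It does nothing to avoid operating system involvement, which is the hardware goal the comment actually describes.

```python
import queue
import threading


class Peer:
    """Stands in for a remote machine; here it is just a thread in this process."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.log = []  # results of queue operations nobody waited for
        self.thread = threading.Thread(target=self._serve)
        self.thread.start()

    def _serve(self):
        while True:
            item = self.inbox.get()
            if item is None:  # shutdown sentinel
                break
            func, args, reply = item
            result = func(*args)
            if reply is None:
                self.log.append(result)  # queue operation: no caller is blocked
            else:
                reply.put(result)        # subroutine call: hand the result back

    def call(self, func, *args):
        """Synchronous: block until the peer returns a result."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((func, args, reply))
        return reply.get()

    def enqueue(self, func, *args):
        """Asynchronous: hand off the work and return immediately."""
        self.inbox.put((func, args, None))

    def close(self):
        self.inbox.put(None)
        self.thread.join()


peer = Peer()
peer.enqueue(pow, 2, 10)            # fire and forget
answer = peer.call(sum, [1, 2, 3])  # wait for the reply
peer.close()

print("call returned:", answer)
print("queued results:", peer.log)
```

    Because the inbox is first-in, first-out and served by a single thread, the queued `pow` has already run by the time the later `call` returns.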

    It's not that this can't be done. It's that unless it's supported by both Intel and Microsoft, it will only be a niche technology.

  3. Can we please stop already? by mwvdlee · · Score: 5, Insightful

    I'm getting sick and tired of hearing about yet another hype in IT-land where everything has to be done in yet another new way.

    All developers understand that different problems require different solutions. Will the managers who shove this crap up our asses please stop doing so? It's not productive; you're not going to get a better solution by forcing it to be implemented in whatever buzzword falls off the last bandwagon of an ever-growing parade of buzzwords.

    "In-memory analytics" is what we started out with before databases, and guess what: it's never gone away. We've never stopped using it. Now just tell us what problem you have and let us developers decide how to solve it.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?

  4. Re:Goodbye Orwell by quanticle · · Score: 5, Informative

    You're misinterpreting the post. No one said anything about long term data storage being marginalized or eliminated. Instead, the author is talking about the difference between persistent and non-persistent storage. He's saying that existing database technologies that rely on persistent storage are being marginalized as the speed difference between spinning disks and RAM widens, and the low cost of RAM makes it practical to hold large data sets entirely in memory. According to the author, data processing and analysis will increasingly move towards in-memory systems, while traditional databases will be relegated to a "backup and restore" role for these in-memory systems.
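
    A rough sketch of that division of labour, in Python: the working data set lives in memory and is queried there, while the disk is touched only to take a snapshot and to restore from one. The `InMemoryTable` class, its method names, the JSON snapshot format and the sample rows are all assumptions made up for this example; the article does not describe any particular product's design.

```python
import json
import os
import tempfile


class InMemoryTable:
    """Rows are held in a plain list; analysis never touches the disk."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def total(self, field):
        """A stand-in for 'analytics': sum one field across all rows."""
        return sum(row[field] for row in self.rows)

    def backup(self, path):
        """Persistent storage in its reduced role: write a snapshot."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.rows, f)

    @classmethod
    def restore(cls, path):
        """Rebuild the in-memory table from a snapshot after a restart."""
        table = cls()
        with open(path, encoding="utf-8") as f:
            table.rows = json.load(f)
        return table


table = InMemoryTable()
table.insert({"sku": "a", "qty": 3})
table.insert({"sku": "b", "qty": 4})

with tempfile.TemporaryDirectory() as tmp:
    snapshot = os.path.join(tmp, "snapshot.json")
    table.backup(snapshot)
    recovered = InMemoryTable.restore(snapshot)

print("total before backup:", table.total("qty"))
print("total after restore:", recovered.total("qty"))
```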

    --
    We all know what to do, but we don't know how to get re-elected once we have done it