Slashdot Mirror


Replacing Traditional Storage, Databases With In-Memory Analytics

storagedude writes "Traditional databases and storage networks, even those sporting high-speed solid state drives, don't offer enough performance for the real-time analytics craze sweeping corporations, giving rise to in-memory analytics, or data mining performed in memory without the limitations of the traditional data path. The end result could be that storage and databases get pushed to the periphery of data centers and in-memory analytics becomes the new critical IT infrastructure. From the article: 'With big vendors like Microsoft and SAP buying into in-memory analytics to solve Big Data challenges, the big question for IT is what this trend will mean for the traditional data center infrastructure. Will storage, even flash drives, be needed in the future, given the requirement for real-time data analysis and current trends in design for real-time data analytics? Or will storage move from the heart of data centers and become merely a means of backup and recovery for critical real-time apps?'"

11 of 124 comments

  1. Totally inane by MrAnnoyanceToYou · · Score: 5, Insightful

    Discarding data is something that, as a programmer, I don't often do. Too often I will need it later. Real time analytics are not going to change this. As long as hard drive storage continues to get cheaper, there's going to be more data stored. Partially because the easier it is to store large blocks the more likely I am to store bigger packets. I'd LOVE to store entire large XML blocks in databases sometimes, and we decide not to because of space issues. So, yeah, no. Datacenters aren't going anywhere. Things just get more complicated on the hosting side.

    Note that the article writer is a strong stakeholder in his earthshattering predictions coming true.

    1. Re:Totally inane by fuzzyfuzzyfungus · · Score: 3, Insightful

      Also, it isn't really all that earthshattering. The fact that RAM is faster and offers lower latency than just about anything else in the system has been true more or less forever. Essentially all OSes of remotely recent vintage already opportunistically use RAM caching to make the apparent speed of disk access suck less (nicer RAID controllers will often have another block of RAM for the same purpose). Programs, at the individual discretion of their creators, already hold on to the stuff that they will need to chew over most often in RAM, and only dump to disk as often as prudence requires.

      The idea that, as advances in semiconductor fabrication make gargantuan amounts of RAM cheaper, high-end users will do more of their work in RAM just doesn't seem like a very bold prediction...

    2. Re:Totally inane by Kilrah_il · · Score: 3, Funny

      As advances in semiconductor fabrication make gargantuan amounts of RAM cheaper, high-end users will do more of their work in RAM.

      Now you have a bold prediction.
      Sincerely,
      me

      --
      Whenever in an argument, remember this.
    3. Re:Totally inane by tomhudson · · Score: 3, Insightful
      Good one - except that in this case, a lot of the so-called "work" is BS, consumers are pushing against being data-mined, regulators are getting into the act, and if your business model is so dependent on being a rude invasive pr*ck, perhaps you deserve to die ...

      And the same thing will happen when revenue-strapped governments slap a transfer tax and/or minimum hold periods on stocks - something that should have been done a long time ago.

    4. Re:Totally inane by quanticle · · Score: 3, Informative

      I didn't really see the author mention anything about discarding data. Rather, it seems like he's saying that existing databases (which attempt to commit data to persistent storage as soon as possible) will be marginalized as the speed gap between persistent storage and RAM widens. Instead, business applications are going to hold data in RAM, and rely on redundancy to prevent data loss when a system fails before its data has been backed up to the database.
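A rough sketch of the pattern described above: writes land in RAM on two nodes, and the persistent database only sees them at the next backup. Everything here is hypothetical (the `ReplicatedMemoryStore` class, its method names, and the use of SQLite as the "traditional database"); it illustrates the idea, not any vendor's product.

```python
import sqlite3


class ReplicatedMemoryStore:
    """Hypothetical in-memory store: writes go to RAM on a primary and a replica,
    and reach the persistent database only when flush() is called."""

    def __init__(self, db_path=":memory:"):
        self.primary = {}
        self.replica = {}   # stands in for a second machine's RAM
        self.dirty = set()  # keys not yet backed up to the database
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        # Redundancy instead of an immediate disk commit: two copies in memory.
        self.primary[key] = value
        self.replica[key] = value
        self.dirty.add(key)

    def get(self, key):
        return self.primary[key]

    def flush(self):
        """Back up all unsaved keys to the database; return how many were written."""
        rows = [(k, self.primary[k]) for k in sorted(self.dirty)]
        self.db.executemany("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", rows)
        self.db.commit()
        self.dirty.clear()
        return len(rows)

    def fail_over(self):
        """Promote the replica after the primary's memory has been lost."""
        self.primary = dict(self.replica)
```

If the primary dies between `put()` and `flush()`, `fail_over()` recovers the data from the replica; the database is only needed if both copies are lost.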

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
  2. The cutting edge is in high frequency trading by Animats · · Score: 5, Informative

    For the cutting edge in this area, see what the "high frequency traders" are doing. Computers aren't fast enough for that any more. The trend is toward writing trading algorithms in VHDL and compiling them into FPGAs, so the actual trading decisions are made in special-purpose hardware. Transaction latency (from trade data in on the wire to action out) is dropping below 10 microseconds. In the high-frequency trading world, if you're doing less than 1000 trades per second, you're not considered serious.

    More generally, we have a fundamental problem in the I/O area: UNIX. UNIX I/O has a very simple model, which is now used by Linux, DOS, and Windows. Everything is a byte stream, and byte streams are accessed by making read and write calls to the operating system. That was OK when I/O was slower. But it's a terrible way to do inter-machine communication in clusters today. The OS overhead swamps the data transfer. Then there's the interaction with CPU dispatching. Each I/O operation usually ends by unblocking some thread, so there's a pass through the scheduler at the receive end. This works on "vanilla hardware" (most existing computers), which is why it dominates.
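To make the read/write model concrete, here is a small sketch that counts the system calls needed to walk a file chunk by chunk, next to a memory-mapped version that touches the same bytes without a call per chunk. The function names and the 4096-byte chunk size are illustrative assumptions, and it is not a benchmark; `mmap` also refuses empty files, which this sketch does not handle.

```python
import mmap
import os
import tempfile


def make_sample_file(data):
    """Write data to a temporary file and return its path (caller removes it)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def sum_with_read_calls(path, chunk_size=4096):
    """Sum all bytes using one read() system call per chunk; returns (total, calls)."""
    total = 0
    calls = 0
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        while True:
            chunk = os.read(fd, chunk_size)  # each call crosses into the kernel
            calls += 1
            if not chunk:  # an empty result means end of file
                break
            total += sum(chunk)
    finally:
        os.close(fd)
    return total, calls


def sum_with_mmap(path):
    """Sum all bytes through a read-only memory mapping of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # memoryview yields ints and must be released before the map closes.
            with memoryview(view) as mv:
                return sum(mv)
```

For a 16 KB file the first function makes at least five `read()` calls (four with data, one that reports end of file), while the mapped version lets the MMU page the data in.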

    Bypassing the read/write model is sometimes done by giving one machine remote direct memory access ("RDMA") into another. This is usually too brutal, and tends to be done in ways that bypass the MMU and process security. So it's not very general. Still, that's how most Ethernet packets are delivered, and how graphics units talk to CPUs.

    The supercomputer interconnect people have been struggling with this for years, but nothing general has emerged. RDMA via Infiniband is about where that group has ended up. That's not something a typical large hosting cluster could use safely.

    Most inter-machine operations are of two types - a subroutine call to another machine, or a queue operation. Those give you the basic synchronous and asynchronous operations. A reasonable design goal is to design hardware which can perform those two operations with little or no operating system intervention once the connection has been set up, with MMU-level safety at both ends. When CPU designers have put in elaborate hardware of comparable complexity, though, nobody uses it. 386 and later machines have hardware for rings of protection, call gates, segmented memory, hardware context switching, and other stuff nobody uses because it doesn't map to vanilla C programming. That has discouraged innovation in this area. A few hardware innovations, like MMX, caught on, but still are used only in a few inner loops.
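The two operation types named above can be sketched in a few lines. This is only a software illustration of the semantics, with a thread standing in for the other machine; the `RemoteNode` class and its methods are hypothetical, and the point of the comment is that real hardware support would avoid exactly the OS involvement this sketch relies on.

```python
import queue
import threading


class RemoteNode:
    """Hypothetical stand-in for another machine; a thread plays the remote CPU."""

    def __init__(self, handler):
        self._handler = handler
        self._inbox = queue.Queue()
        self.results = []  # outputs of asynchronous (queued) operations
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            arg, reply = self._inbox.get()
            result = self._handler(arg)
            if reply is None:
                self.results.append(result)  # queue operation: nobody is waiting
            else:
                reply.put(result)            # subroutine call: wake the caller
            self._inbox.task_done()

    def call(self, arg):
        """Synchronous: block until the remote side returns a value."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((arg, reply))
        return reply.get()

    def enqueue(self, arg):
        """Asynchronous: hand the work over and return immediately."""
        self._inbox.put((arg, None))

    def drain(self):
        """Wait until every item handed over so far has been processed."""
        self._inbox.join()
```

Usage: `RemoteNode(lambda x: x * 2).call(21)` blocks and returns 42, while `enqueue()` returns at once and the result shows up in `results` later.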

    It's not that this can't be done. It's that unless it's supported by both Intel and Microsoft, it will only be a niche technology.

    1. Re:The cutting edge is in high frequency trading by Gorobei · · Score: 3, Interesting

      Yep, the article is 10-20 years out of date.

      HFT has been using statistical synchronization of dbs for years.

      Big financial shops switched to in-memory dbs decades ago. With co-lo on the compute farms.

      I don't know why he's even talking about 32G boxes as servers. That's a desktop; real db hosts are an order of magnitude bigger.

      His "push the disks to the edge of the network?" Um, that's already happened - it's called tier 2. Tier 1 is the terabytes of solid-state storage we keep just in case.

      This is a blast from the 1990s.

    2. Re:The cutting edge is in high frequency trading by Rich0 · · Score: 3, Insightful

      There is another simple solution to optimizing HFT - just aggregate and execute all trades once per minute, with the division between each minute taking place in UTC plus/minus a random offset (a few seconds on average - with 98% of divisions being within 5 seconds either way).

      Boom, now there is no need to spend huge amounts of money coming up with lightning-fast implementations that don't actually create real value for ordinary people.

      Business ought to be about improving the lives of ordinary people. Sure, sometimes the link isn't direct, and I'm fine with that. However, we're putting far too much emphasis on optimizing what amounts to numbers games that do nothing to produce real things of value for anybody...
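The once-per-minute batching with a randomized boundary proposed above can be sketched as follows. The comment does not name a distribution, so this assumes a normal one: 98% of a normal distribution lies within about 2.326 standard deviations of the mean, which puts the standard deviation at roughly 2.15 seconds. The function names are hypothetical.

```python
import random

# Assumption: offsets are normally distributed, so "98% within 5 seconds
# either way" fixes the standard deviation at 5 / 2.326 seconds.
Z_98 = 2.326
SIGMA_SECONDS = 5.0 / Z_98


def next_boundary(minute_start, rng):
    """Return the randomized close of the batch that nominally ends at minute_start + 60."""
    return minute_start + 60.0 + rng.gauss(0.0, SIGMA_SECONDS)


def batch_orders(orders, boundaries):
    """Group (timestamp, order) pairs into one batch per boundary.

    boundaries must be ascending; an order joins the first batch whose
    boundary is later than its timestamp. Orders arriving after the last
    boundary are not placed in any batch.
    """
    batches = [[] for _ in boundaries]
    for ts, order in sorted(orders, key=lambda pair: pair[0]):
        for i, end in enumerate(boundaries):
            if ts < end:
                batches[i].append(order)
                break
    return batches
```

Because nobody knows the exact boundary in advance, shaving microseconds off order submission buys nothing; all orders in a batch execute together.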

  3. Can we please stop already? by mwvdlee · · Score: 5, Insightful

    I'm getting sick and tired of hearing about yet another hype in IT-land where everything has to be done in yet another new way.

    All developers understand that different problems require different solutions. Will the managers who shove this crap up our asses please stop doing so? It's not productive, you're not going to get a better solution by forcing it to be implemented in whatever buzzword falls off the last bandwagon of an ever-growing parade of buzzwords.

    "In-memory analytics" is what we started out with before databases, and guess what; it's never gone away. We've never stopped using it. Now just tell us what problem you have and let us developers decide how to solve it.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    1. Re:Can we please stop already? by Desert Raven · · Score: 3

      Agreed, someone comes up with something new to solve a very specific issue, and all of a sudden someone's predicting how it will completely replace everything else in the next month.

      Grow up.

      Physical storage and relational databases aren't going anywhere anytime soon. In-memory this and non-relational that are all well and good for the specific problems they were designed for, but physically stored and relational data fits the needs of 90% of data storage and retrieval. I sure as HECK don't want my bank storing my financial data purely in memory.

      So keep yelling to yourselves about how the sky is falling on traditional techniques. Meanwhile the rest of us have real work to do.

  4. Re:Goodbye Orwell by quanticle · · Score: 5, Informative

    You're misinterpreting the post. No one said anything about long term data storage being marginalized or eliminated. Instead, the author is talking about the difference between persistent and non-persistent storage. He's saying that existing database technologies that rely on persistent storage are being marginalized as the speed difference between spinning disks and RAM widens, and the low cost of RAM makes it practical to hold large data sets entirely in memory. According to the author, data processing and analysis will increasingly move towards in-memory systems, while traditional databases will be relegated to a "backup and restore" role for these in-memory systems.

    --
    We all know what to do, but we don't know how to get re-elected once we have done it