Replacing Traditional Storage, Databases With In-Memory Analytics



Posted by Soulskill on Saturday January 1, 2011 @06:02AM from the neuralization-of-data-centers dept.

storagedude writes "Traditional databases and storage networks, even those sporting high-speed solid state drives, don't offer enough performance for the real-time analytics craze sweeping corporations, giving rise to in-memory analytics, or data mining performed in memory without the limitations of the traditional data path. The end result could be that storage and databases get pushed to the periphery of data centers and in-memory analytics becomes the new critical IT infrastructure. From the article: 'With big vendors like Microsoft and SAP buying into in-memory analytics to solve Big Data challenges, the big question for IT is what this trend will mean for the traditional data center infrastructure. Will storage, even flash drives, be needed in the future, given the requirement for real-time data analysis and current trends in design for real-time data analytics? Or will storage move from the heart of data centers and become merely a means of backup and recovery for critical real-time apps?'"

124 comments

Goodbye Orwell by schmidt349 · 2011-01-01 06:07 · Score: 1, Interesting

The marginalization of long-term data storage can only be a good thing -- the big advertising and other firms get the analytical data that actually matters to their bottom line, and to the extent that the average joe's privacy is being invaded at the very least the fruits of that invasion will become increasingly accessible.
1. Re:Goodbye Orwell by quanticle · 2011-01-01 07:18 · Score: 5, Informative
  
  You're misinterpreting the post. No one said anything about long term data storage being marginalized or eliminated. Instead, the author is talking about the difference between persistent and non-persistent storage. He's saying that existing database technologies that rely on persistent storage are being marginalized as the speed difference between spinning disks and RAM widens, and the low cost of RAM makes it practical to hold large data sets entirely in memory. According to the author, data processing and analysis will increasingly move towards in-memory systems, while traditional databases will be relegated to a "backup and restore" role for these in-memory systems.
  
  --
  We all know what to do, but we don't know how to get re-elected once we have done it
2. Re:Goodbye Orwell by postbigbang · 2011-01-01 07:26 · Score: 1
  
  Mod parent up.
  The post asks an 'or' question which is plainly stupid and demonstrates a lack of knowledge on the part of the poster. Analytics are but one part of organizational asset deployments. In and of themselves, analytics initiatives don't really change storage. There are occasions where outputs are transient, but audit/compliance necessitate storing enough that whatever needs to be constructed can, and what can be legally/ethically discarded will be.
  So data center storage needs don't really change-- they're growing like crazy 24/7. Cool analytics are just another production method.
  
  --
  ---- Teach Peace. It's Cheaper Than War.
3. Re:Goodbye Orwell by hairyfeet · 2011-01-01 10:15 · Score: 2
  
  Exactly. It really doesn't matter if you have the slowest (and thus longer lasting and cheaper to operate) HDD on the planet if all the important data is in RAM and kept there. RAM since DDR has gotten so ridiculously fast that NO SSD has a snowball's chance of catching up anytime soon, if at all, and the economies of scale have made RAM one of the cheapest if not the cheapest upgrades you can add to any system.
  Even in the consumer market falling RAM prices and changes to OS design make the hard drive pretty much a backup and long term storage medium more than anything else. I advise my customers on new builds to go ahead and let me install 4GB, because with Superfetch after a week of Windows 7 learning their usage patterns all of their apps are preloaded into RAM making launching and using instantaneous, and with suspend to RAM booting is pretty much a thing of the past. It cost less than $100 to add 8GB to mine and now everything I use is ALWAYS preloaded, making the speed just insane. Everyone that comes by the shop is always amazed at how I can launch a half a dozen apps while another 4 or 5 are doing various jobs and it is always instantaneous. But with 6GB reserved by the OS for Superfetch all the apps I use are simply waiting for me in RAM.
  So I have to agree with TFA. With the prices of RAM cheap and only getting cheaper having data you actually use often needing to swap in and out of the HDD or SSD is just nuts. And then if you have it all in RAM you can use the lower speed and less power hungry "green" drives for persistent backup instead of using SSDs which haven't come anywhere near the GB per $ ratio of spinning platters yet, although their speed is incredible. But if everything is already in RAM, do you really need to spend the crazy $$$ for the large SSD?
  
  --
  ACs don't waste your time replying, your posts are never seen by me.
4. Re:Goodbye Orwell by dintech · 2011-01-01 10:39 · Score: 1
  
  I'm a KDB developer at a large financial institution. Most banks using KDB store today's stock market data and an on disk store of everything before today. The theory goes that there is the most to be gained by manipulating the most important data in memory, namely today's data. You need the history but the speed of the on-disk partition is always going to be slower.
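
  A rough sketch of that split, for anyone who hasn't seen it: today's ticks live in an in-memory list and get rolled out to a per-day file at end of day. This is plain Python, not kdb+; the class name, method names, and JSON file layout are all made up for illustration.

```python
import json
import os
import tempfile


class TickStore:
    """Toy two-tier store: today's ticks in RAM, older days on disk.

    Hypothetical names; this is not kdb+ and it ignores concurrency.
    """

    def __init__(self, history_dir):
        self.history_dir = history_dir
        self.today = []  # hot tier: plain in-memory list

    def append(self, symbol, price):
        self.today.append({"symbol": symbol, "price": price})

    def end_of_day(self, date):
        # Roll the in-memory day out to an on-disk partition.
        path = os.path.join(self.history_dir, f"{date}.json")
        with open(path, "w") as f:
            json.dump(self.today, f)
        self.today = []

    def prices(self, symbol, date=None):
        # date=None reads the hot tier; a date reads the slower disk partition.
        if date is None:
            rows = self.today
        else:
            with open(os.path.join(self.history_dir, f"{date}.json")) as f:
                rows = json.load(f)
        return [r["price"] for r in rows if r["symbol"] == symbol]


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = TickStore(d)
        store.append("ABC", 10.0)
        store.append("XYZ", 5.5)
        store.end_of_day("2010-12-31")
        store.append("ABC", 10.25)
        # First result comes from RAM, second from the on-disk partition.
        return store.prices("ABC"), store.prices("ABC", "2010-12-31")
```

  Queries against today never open a file; anything older pays the disk read.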
5. Re:Goodbye Orwell by More_Cowbell · 2011-01-01 15:13 · Score: 2
  
  I work for a large (global) web hosting company, and I'd just like to counter the 'low cost of RAM' idea... Yes, most RAM is cheap, but when you start looking at 'large data sets', cheap is a relative term.
  For example, the HP DL580 G7 can hold a Terabyte of RAM, but to do so it uses 16GB DIMMs, at $1000 each. http://h30094.www3.hp.com/product/sku/5100299/mfg_partno/500666-B21
  When you add that up, it's $64,000 just for RAM in ONE server. And we don't sell it to you, (in fact we only lease it from HP ourselves) we add a ridiculous additional monthly charge to your bill, well above what it costs us. Also keep in mind, anybody spending that kind of money has a multiple times redundant system... So, no, I would not call it 'low cost'.
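
  The arithmetic behind that figure, using only the numbers quoted above (16GB DIMMs at $1000 each) and taking a terabyte as 1024 GB:

```python
# Figures from the comment: 1 TB of RAM built from 16 GB DIMMs at $1000 each.
TOTAL_GB = 1024
DIMM_GB = 16
DIMM_PRICE_USD = 1000

dimm_count = TOTAL_GB // DIMM_GB              # number of DIMMs needed
ram_cost_usd = dimm_count * DIMM_PRICE_USD    # total RAM cost for one server
cost_per_gb = ram_cost_usd / TOTAL_GB         # dollars per gigabyte

print(dimm_count, ram_cost_usd, cost_per_gb)
```

  That works out to 64 DIMMs and roughly $62.50 per GB, before any redundancy.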
  
  --
  Experience teaches only the teachable. -AH
6. Re:Goodbye Orwell by TheTyrannyOfForcedRe · 2011-01-01 17:21 · Score: 1
  
  RAM since DDR has gotten so ridiculously fast that NO SSD has a snowball's chance of catching up anytime soon, if at all,
  Apparently you haven't heard of PRAM.
  http://en.wikipedia.org/wiki/Phase-change_memory
  If PRAM doesn't pan out there are other nonvolatile as-fast-or-faster-than-DRAM technologies in the works as well.
  
  --
  "Liechtenstein is the world's largest producer of sausage casings, potassium storage units, and false teeth."
7. Re:Goodbye Orwell by antifoidulus · 2011-01-01 19:35 · Score: 1
  
  But low cost is always relative to:
  
  a: the competition and
  b: the opportunity cost sacrificed by using slower systems
  
  Even if the absolute cost is high, if it's lower than the cost of either going with a competing product or going with a slower solution, it's still cheap.
  
  --
  Monstar L
Totally inane by MrAnnoyanceToYou · 2011-01-01 06:14 · Score: 5, Insightful

Discarding data is something that, as a programmer, I don't often do. Too often I will need it later. Real time analytics are not going to change this. As long as hard drive storage continues to get cheaper, there's going to be more data stored. Partially because the easier it is to store large blocks the more likely I am to store bigger packets. I'd LOVE to store entire large XML blocks in databases sometimes, and we decide not to because of space issues. So, yeah, no. Datacenters aren't going anywhere. Things just get more complicated on the hosting side.
Note that the article writer is a strong stakeholder in his earthshattering predictions coming true.

--
My little site.
1. Re:Totally inane by hedwards · 2011-01-01 06:19 · Score: 1
  
  Indeed. Some information is useful in the short term, but most information is quite useful for long periods of time. I'm personally in the middle of archiving my audio CDs to disk, scanning my photos and sorting my digital images. On top of that I've got emails to hang onto.
  
  The bigger issue isn't storage space, it's finding a way of keeping track of it all. Deleting the things that you don't need or aren't allowed to store beyond a certain point and keeping track of the other files you do want or need to store.
2. Re:Totally inane by fuzzyfuzzyfungus · 2011-01-01 06:21 · Score: 3, Insightful
  
  Also, it isn't really all that earthshattering. The fact that RAM is faster and offers lower latency than just about anything else in the system has been true more or less forever. Essentially all OSes of remotely recent vintage already opportunistically use RAM caching to make the apparent speed of disk access suck less (nicer RAID controllers will often have another block of RAM for the same purpose). Programs, at the individual discretion of their creators, already hold on to the stuff that they will need to chew over most often in RAM, and only dump to disk as often as prudence requires.
  
  The idea that, as advances in semiconductor fabrication make gargantuan amounts of RAM cheaper, high-end users will do more of their work in RAM just doesn't seem like a very bold prediction...
3. Re:Totally inane by Anonymous Coward · 2011-01-01 06:38 · Score: 0
  
  The idea that, as advances in semiconductor fabrication make gargantuan amounts of RAM cheaper, high-end users will do more of their work in RAM just doesn't seem like a very bold prediction...
  Indeed. However, where the premise in the summary is incorrect (this is /. - I didn't read TFA) is that just because you're doing things in RAM doesn't mean you're storing less on disc either. All you do is increase the dataset within the RAM and reduce the network traffic to the database. But all this data needs to be stored somewhere and larger working datasets generally mean a larger general data pool from which to work. Databases aren't going anywhere.
4. Re:Totally inane by Kilrah_il · 2011-01-01 06:42 · Score: 3, Funny
  
  As advances in semiconductor fabrication make gargantuan amounts of RAM cheaper, high-end users will do more of their work in RAM.
  Now you have a bold prediction.
  Sincerely,
  me
  
  --
  Whenever in an argument, remember this.
5. Re:Totally inane by Anonymous Coward · 2011-01-01 06:53 · Score: 0
  
  Yup, the author missed the point entirely. In-memory analytics is no threat to storage media (HDD, SSD, etc..). It drives more storage purchase, not less. It just makes data at rest, and data in use more clear. It's no different than tiered storage which already exists in spades today, and is not lowering any HDD sales.
6. Re:Totally inane by tomhudson · 2011-01-01 07:09 · Score: 3, Insightful
  
  Good one - except that in this case, a lot of the so-called "work" is BS, consumers are pushing against being data-mined, regulators are getting into the act, and if your business model is so dependent on being a rude invasive pr*ck, perhaps you deserve to die ...
  And the same thing will happen when revenue-strapped governments slap a transfer tax and/or minimum hold periods on stocks - something that should have been done a long time ago.
7. Re:Totally inane by davester666 · 2011-01-01 07:10 · Score: 0
  
  Well, obviously, the person believes nobody else ever said something similar.
  They probably were thinking back, and the only quote that came to mind was something about 640k being enough for everybody.
  
  --
  Sleep your way to a whiter smile...date a dentist!
8. Re:Totally inane by Kilrah_il · 2011-01-01 07:13 · Score: 0
  
  It isn't? What's wrong with you people?
  
  --
  Whenever in an argument, remember this.
9. Re:Totally inane by MrAnnoyanceToYou · 2011-01-01 07:21 · Score: 2
  
  There must be some way to solve a problem like that, where you have a series of pointers to files, if not the files themselves as well, with the ability to add markers of some kind to each of those pointers. (maybe we can call them, "Records!!!" like CD's used to be called) And then! Then! We can disguise how the management of these 'records' is organized from the user, so they don't have to think about it. And give them a simple, logical way to get data about those 'records' out of the big, organized whole. It'd be, like, a whole new basic way to store our records! We could easily find what we wanted in our basic data storage. I can't believe no one's thought of it before. ;)
  My point here isn't that you should use a database to store your data about your files, (unfortunately, a unified markup system for files doesn't exist yet; it would be nice, but all that stuff is in the OS right now) my point is that the author of the article is missing that even if in-memory data systems do become extremely large, the underlying theory of the technology will not change much.
  And the underlying theory relies heavily on caching, limiting how much of your overall dataset is currently relevant, and so on. While I will admit it's possible many databases' useful data size will eventually be outgrown by RAM-style memory storage, when that happens market forces will probably make it comparatively expensive to hold all your data in memory at once. Partially because clean, concise code is generally far more expensive to produce than sloppy crap that chews through your data storage.
  
  --
  My little site.
10. Re:Totally inane by quanticle · 2011-01-01 07:22 · Score: 3, Informative
  
  I didn't really see the author mention anything about discarding data. Rather, it seems like he's saying that existing databases (which attempt to commit data to persistent storage as soon as possible) will be marginalized as the speed gap between persistent storage and RAM widens. Instead, business applications are going to hold data in RAM, and rely on redundancy to prevent data loss when a system fails before its data has been backed up to the database.
  
  --
  We all know what to do, but we don't know how to get re-elected once we have done it
11. Re:Totally inane by Tablizer · 2011-01-01 07:24 · Score: 1
  
  Because I crave pizza, I have an italics prediction...
  
  --
  Table-ized A.I.
12. Re:Totally inane by Anonymous Coward · 2011-01-01 08:02 · Score: 0
  
  Not really... most customers don't give a shit where their data comes from and goes to, as long as they don't have to pay a subscription fee. If Facebook started asking for social security numbers and bank account numbers, people would type them in for access.
  Governments? Give me a break. Show me one government that has the spine to stand up to ad agencies, either the snarfers at the front line like Phorm, or the data-miners. Ain't gonna happen. Even the EU is running scared and has backed down, showing that they pretty much have zero interest in privacy, even though the lessons in privacy were taught very brutally during WWII.
  Oh, revenue transfer tax... also not going to happen. Especially with the Tea Party here in the US having a stranglehold on the government this year. Expect to see government just give a rubber stamp to any business practices, no matter how unethical.
13. Re:Totally inane by davester666 · 2011-01-01 08:08 · Score: 1
  
  Hey, I'm happy with my Commodore 64, but I am considering getting an Amiga.
  
  --
  Sleep your way to a whiter smile...date a dentist!
14. Re:Totally inane by Fulcrum+of+Evil · 2011-01-01 08:10 · Score: 1
  
  and a lot of it is fraud detection (say, at Visa) and large internet sites deciding what sorts of products to show you when you log in based on your purchase history/similar users' history.
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
15. Re:Totally inane by hedwards · 2011-01-01 09:03 · Score: 1
  
  My point here isn't that you should use a database to store your data about your files, (unfortunately, a unified markup system for files doesn't exist yet; it would be nice, but all that stuff is in the OS right now) my point is that the author of the article is missing that even if in-memory data systems do become extremely large, the underlying theory of the technology will not change much.
  I realize that, but it's a related issue. Back in the 80s, it didn't do you a damned bit of good to know that the file was saved if you had to spend 10 hours sorting through disks to find it. In the modern era that's a much smaller concern for most people as a 1tb disk is quite affordable and there's a number of products to search it efficiently.
  
  It's something which has been talked about before. The discussion I best remember was in terms of back up systems. (Backup & Recovery if you're curious)
  
  The basic idea was to move backups from faster, but more readily accessible media to slower and harder to get media as the files got older and less frequently used. The main reason for individuals to do that is so that they've got a copy on some sort of WORM so as to make it more difficult to make fat fingered mistakes.
  
  There are a few products out there that do that, but they aren't particularly universal and I haven't personally found one that I like for my typical files. And with the rate at which disk space is expanding, most people aren't going to need them, unless they're responsible for enterprise file management.
16. Re:Totally inane by epine · 2011-01-01 09:21 · Score: 1
  
  The fact that RAM is faster and offers lower latency than just about anything else in the system has been true more or less forever.
  This is the problem when the article is so poor to begin with, if you're not careful, you're pulled down to the same inane level. Since my brain isn't working well after reading that tripe, let me add that GaAs has been faster than silicon more or less forever. OK, I'm better now.
  Let's not go too far down that road, or we'll run into the truism that the quickest man for the job is the man with the smallest dataset (and the fattest wallet).
  The more I think about that article, the further I drift away from the cognitive on switch.
17. Re:Totally inane by Belial6 · 2011-01-01 09:42 · Score: 1
  
  Then you will be interested in the company who makes them.
  
  Personally, I'm thinking about getting a C64 myself.
18. Re:Totally inane by hairyfeet · 2011-01-01 10:33 · Score: 1
  
  Hey! Quit making us greybeards feel old! I still remember holding my first 1GB drive in my hand and thinking "How in the hell am I ever gonna use this much space?" and then in what seemed like the blink of an eye I'm holding a 40GB and thinking "Now what? Even if I install every app and game I ever liked I'm STILL not gonna be able to fill this thing!". Now I have dual 500GB drives onboard, with another 1TB USB for backups, and instead of marveling at the space I'm just waiting for the 2TB drives to come down to yank the 500GB. And this flash stick, no bigger than a stick of gum, at 8GB has more space than my first 8 drives put together. My first flash was $100 for 64MB and I thought "Wow, dozens of floppies in my pocket! What will I do with it all?" Man, times they do change. Now get off my lawn!
  
  --
  ACs don't waste your time replying, your posts are never seen by me.
19. Re:Totally inane by tomhudson · 2011-01-01 10:38 · Score: 1
  Governments? Give me a break. Show me one government that has the spine to stand up to ad agencies, either the snarfers at the front line like Phorm, or the data-miners. Ain't gonna happen. Even the EU is running scared and has backed down, showing that they pretty much have zero interest in privacy, even though the lessons in privacy were taught very brutally during WWII.
  Jennifer Stoddart, Canada's Privacy Commissioner. She's the one who forced Facebook to change their procedures the last time, and she's got them in her sights again.
  And at $11,000 per incident (page view), it would quickly send Facebook into Chapter 11.
  Especially since the last time, the Europeans quickly joined in.
  
  Oh, revenue transfer tax... also not going to happen. Especially with the Tea Party here in the US having a stranglehold on the government this year. Expect to see government just give a rubber stamp to any business practices, no matter how unethical.
  Several states and many local governments won't be able to roll over their bonds. Likely candidates include California, Nevada, New York, Michigan, etc. At that point, Uncle Sam has 4 choices:
  
  let them default - for individual states, this actually has a very high probability, since individual states cannot be petitioned into bankruptcy. They'll just pay with state-issued IOUs.
  
  guarantee their loans - with more than 40 out of 50 states with problems, this is one of those "too big to bail out" scenarios. The US credit rating is already under review - this would guarantee a quick downgrade.
  
  bail them out - yeah, with what money?
  
  Taxes have to go up. Either before a US credit downgrade, or after. Before is less painful.
20. Re:Totally inane by tomhudson · 2011-01-01 10:41 · Score: 1
  Fraud detection doesn't need microsecond timing. Fraud detection is based on good data, not "fast data"
  
  Behavioral tracking is illegal in several countries. Expect to see more governments giving advertisers a choice - stop, have all behavioral tracking stripped at the borders, be sued into bankruptcy, or just be blocked.
21. Re:Totally inane by Anonymous Coward · 2011-01-01 12:20 · Score: 0
  
  Instead, business applications are going to hold data in RAM, and rely on redundancy to prevent data loss when a system fails before its data has been backed up to the database
  Most commercial memory database offerings provide the same reliability WRT persistent storage as a normal database. I.e., for a successful commit, all writes are flushed to disk in a memory database, without exception, the same as in a traditional database.
  The reason you get higher performance with a memory database is that a normal RDBMS is optimized for pulling data from spinning platters with huge seek/random read penalties. Memory databases are optimized for resolving queries from random access memory, which has no such limit.
  It is NOT about the lack of persistent storage... merely the selection of different internal data structures. Writes are still bound by the performance constraints of spinning disks or SSDs.
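
  A minimal sketch of that idea, assuming nothing about any particular product: reads are served from an in-memory dict, while every write is appended to a log and fsync'd before it is applied, so a restart can replay the log. The class name and the one-JSON-line-per-write log format are invented for the example.

```python
import json
import os
import tempfile


class DurableDict:
    """Toy in-memory key/value store backed by an append-only log on disk."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        # Recovery: rebuild the in-memory state by replaying the log.
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value
        self.log = open(log_path, "a")

    def put(self, key, value):
        # Writes stay bound by disk speed: log, flush, fsync, then apply.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def get(self, key):
        # Reads never touch the disk.
        return self.data.get(key)

    def close(self):
        self.log.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "commit.log")
        db = DurableDict(path)
        db.put("balance", 100)
        db.put("balance", 75)
        db.close()  # simulate a restart
        recovered = DurableDict(path)
        value = recovered.get("balance")
        recovered.close()
        return value
```

  The speedup is all on the read side; `put` is exactly as slow as the disk underneath it.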
22. Re:Totally inane by Johnny+Mnemonic · 2011-01-01 12:24 · Score: 1
  
  And at $11,000 per incident (page view), it would quickly send Facebook into Chapter 11.
  
  Sure it would. Facebook would simply exit Canada. Users would complain, but who gives a shit about them, right? But advertisers would also complain that they don't have access to that market anymore. And advertisers are just another word for business. Stoddart may really be anti-business, but I wonder if her bosses are, or if her new bosses would be.
  
  Don't kid yourself. Facebook isn't going anywhere, not until the users stop using it.
  
  --
  $tar -xvf .sig.tar
23. Re:Totally inane by tomhudson · 2011-01-01 12:56 · Score: 1
  
  And at $11,000 per incident (page view), it would quickly send Facebook into Chapter 11. Sure it would. Facebook would simply exit Canada. Users would complain, but who gives a shit about them, right? But advertisers would also complain that they don't have access to that market anymore. And advertisers are just another word for business. Stoddart may really be anti-business, but I wonder if her bosses are, or if her new bosses would be.
  Don't kid yourself. Facebook isn't going anywhere, not until the users stop using it.
  There are always other companies ready to fill in the gap. That's the nature of the beast, and Facebook knows it - just like they know that their user statistics are totally cooked.
  You can buy facebook followers at the rate of 5 for a penny. The only ones who would be impacted are the "social media directors" who would be shown to be totally superfluous.
  .. and that can't happen soon enough. Them and the "SEO" scammers.
  -- Barbie
24. Re:Totally inane by Firehed · 2011-01-01 13:19 · Score: 2
  
  Fraud detection doesn't need microsecond timing. Fraud detection is based on good data, not "fast data"
  Sorry, but that's just wrong. Fraud analysis on credit transactions needs to be performed extremely quickly (and payment sites that process ACH need to do that quickly as well) in order for the networks to be usable. So while it requires good data, it also needs fast data - and a lot of it. At a minimum, it often looks at the user's complete payment history, the history on that credit card (did the user suddenly change? if so, the card number was probably stolen) not specific to the user, the activity at that IP address and other IPs that user has logged in from (which may include many other users and/or cards), etc. There's a lot of work to be done in less than a second or two.
  
  --
  How are sites slashdotted when nobody reads TFAs?
25. Re:Totally inane by tomhudson · 2011-01-01 13:43 · Score: 1
  
  The actual indicators of fraud don't need micro-second timing.
  To the contrary, accepting some delay makes fraud harder.
  Example - rather than writing balances to temporary storage, then reconciling them with persistent storage, accepting a second or two while the canonical database is doing its thing means that you can't "re-play" a credit card transaction.
  If you think you need faster than that, you're looking at the problem wrong.
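
  To make the replay point concrete, here is a toy version under the assumption that each transaction carries a unique id (the `Ledger` class and the id scheme are hypothetical). Because the duplicate check and the balance update hit the same authoritative state, a replayed id is simply refused.

```python
class Ledger:
    """Toy canonical ledger: every transaction id is accepted at most once."""

    def __init__(self, balance):
        self.balance = balance
        self.seen = set()

    def charge(self, txn_id, amount):
        # The check and the update run against the same authoritative
        # state, so a replayed id or an overdraft is rejected outright.
        if txn_id in self.seen or amount > self.balance:
            return False
        self.seen.add(txn_id)
        self.balance -= amount
        return True


ledger = Ledger(balance=100)
first = ledger.charge("txn-1", 60)    # accepted
replay = ledger.charge("txn-1", 60)   # same id again: refused
```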
26. Re:Totally inane by icebraining · 2011-01-02 07:33 · Score: 1
  
  Nepomuk (there are versions for the three main OSes) and other semantic desktop technologies are working on that. All you need is a tracker to index them and a RDF database.
  
  --
  Dilbert RSS feed
27. Re:Totally inane by chgros · 2011-01-02 12:39 · Score: 1
  
  I'd LOVE to store entire large XML blocks in databases sometimes, and we decide not to because of space issues.
  Wait, do you mean that XML takes *less* space than a database? What kind of data do you have in there? I find that a binary format gzipped in a DB is way more efficient (time and space-wise) than XML.
28. Re:Totally inane by Decker-Mage · 2011-01-02 15:23 · Score: 1
  
  Actually the approach I'm in the middle of implementing will use NexentaStor in a virtual machine with the heaviest duty search engine [probably dtSearch and/or Windows Search with all the plugins] I can lay my hands on to keep things nicely indexed. The community edition of NexentaStor is good to 18 TB which should do for a while as I'm a text junkie not a music or video junkie. It also uses ZFS so it is single-instance storage right out of the box plus additional goodness. I've been planning this for a while and the last pieces are on their way now.
  
  --
  "[I]t is a wise man who admits the limits of his knowledge or skill, and that pretending either causes harm." --Terry Go
Questions? by Anonymous Coward · 2011-01-01 06:16 · Score: 0

Will storage, even flash drives, be needed in the future, given the requirement for real-time data analysis and current trends in design for real-time data analytics?
Of course storage will be needed in the future! It was needed in the past and it's needed in the present. What kind of question is that?

Or will storage move from the heart of data centers and become merely a means of backup and recovery for critical real-time apps?
Oy-yoy-yoy.
I'm getting another drink.
1. Re:Questions? by fuzzyfuzzyfungus · 2011-01-01 07:11 · Score: 1
  
  You'd better just bring the whole bottle. Somebody just used the word "merely" in front of the phrase "backup and recover for critical real-time apps".
  
  The remainder of the bottle will, depending on whether you work for that somebody or not, either enable a heartwarming humanitarian gesture, or be your only friend during the days of hair-raising stress and thankless toil that could strike at any second...
Re:Just dump your data into the hole by tikram · 2011-01-01 06:23 · Score: 0

Just... wow... goatse in 2011? Are you a time traveler from 1999?
The cutting edge is in high frequency trading by Animats · 2011-01-01 06:32 · Score: 5, Informative

For the cutting edge in this area, see what the "high frequency traders" are doing. Computers aren't fast enough for that any more. The trend is toward writing trading algorithms in VHDL and compiling them into FPGAs, so the actual trading decisions are made in special-purpose hardware. Transaction latency (from trade data in on the wire to action out) is dropping below 10 microseconds. In the high-frequency trading world, if you're doing less than 1000 trades per second, you're not considered serious.
More generally, we have a fundamental problem in the I/O area: UNIX. UNIX I/O has a very simple model, which is now used by Linux, DOS, and Windows. Everything is a byte stream, and byte streams are accessed by making read and write calls to the operating system. That was OK when I/O was slower. But it's a terrible way to do inter-machine communication in clusters today. The OS overhead swamps the data transfer. Then there's the interaction with CPU dispatching. Each I/O operation usually ends by unblocking some thread, so there's a pass through the scheduler at the receive end. This works on "vanilla hardware" (most existing computers), which is why it dominates.
Bypassing the read/write model is sometimes done by giving one machine remote direct memory access ("RDMA") into another. This is usually too brutal, and tends to be done in ways that bypass the MMU and process security. So it's not very general. Still, that's how most Ethernet packets are delivered, and how graphics units talk to CPUs.
The supercomputer interconnect people have been struggling with this for years, but nothing general has emerged. RDMA via Infiniband is about where that group has ended up. That's not something a typical large hosting cluster could use safely.
Most inter-machine operations are of two types - a subroutine call to another machine, or a queue operation. Those give you the basic synchronous and asynchronous operations. A reasonable design goal is to design hardware which can perform those two operations with little or no operating system intervention once the connection has been set up, with MMU-level safety at both ends. When CPU designers have put in elaborate hardware of comparable complexity, though, nobody uses it. 386 and later machines have hardware for rings of protection, call gates, segmented memory, hardware context switching, and other stuff nobody uses because it doesn't map to vanilla C programming. That has discouraged innovation in this area. A few hardware innovations, like MMX, caught on, but still are used only in a few inner loops.
It's not that this can't be done. It's that unless it's supported by both Intel and Microsoft, it will only be a niche technology.
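
The two operation types mentioned above (a synchronous call to another machine, and an asynchronous queue operation) can be modeled in a few lines. This is only an in-process stand-in: a thread plays the remote machine, and `remote_call` / `remote_enqueue` are hypothetical names, not any real RDMA or cluster API.

```python
import queue
import threading


def server(requests):
    """Stand-in for the remote machine: handles items until it sees None."""
    while True:
        item = requests.get()
        if item is None:
            break
        func, arg, reply = item
        result = func(arg)
        if reply is not None:  # synchronous call: the caller is waiting
            reply.put(result)


def remote_call(requests, func, arg):
    """Synchronous: block until the other side returns a result."""
    reply = queue.Queue(maxsize=1)
    requests.put((func, arg, reply))
    return reply.get()


def remote_enqueue(requests, func, arg):
    """Asynchronous: hand the work over and return immediately."""
    requests.put((func, arg, None))


log = []
requests = queue.Queue()
worker = threading.Thread(target=server, args=(requests,))
worker.start()
remote_enqueue(requests, log.append, "queued")       # fire and forget
answer = remote_call(requests, lambda x: x * 2, 21)  # wait for the reply
requests.put(None)                                   # shut the worker down
worker.join()
```

Every `put` and `get` here goes through locks and the scheduler, which is the kind of per-operation overhead the comment argues hardware support could remove.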
1. Re:The cutting edge is in high frequency trading by Simon80 · 2011-01-01 06:41 · Score: 1
  
  If Intel tried to market its tools to mainstream and OSS developers (yes, open source the tools), then maybe the stuff would catch on better. They are quite capable of making stuff user-friendly for the average developer, but they only seem to market to the HPC market, because that's where the high margin CPUs sell. I think if they spent more time increasing general awareness of anything, it would be easier to get people to use them in their target markets, which would help them sell high end CPUs anyway.
2. Re:The cutting edge is in high frequency trading by Gorobei · 2011-01-01 06:48 · Score: 3, Interesting
  
  Yep, the article is 10-20 years out of date.
  HFT has been using statistical synchronization of dbs for years.
  Big financial shops switched to in-memory dbs decades ago. With co-lo on the compute farms.
  I don't know why he's even talking about 32G boxes as servers. That's a desktop, real db hosts are an order of magnitude bigger.
  His "push the disks to the edge of the network?" Um, that's already happened - it's called tier 2. Tier 1 is the terabytes of solid-state storage we keep just in case.
  This is a blast from the 1990s.
3. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-01 07:11 · Score: 3, Insightful
  
  There is another simple solution to optimizing HFT - just aggregate and execute all trades once per minute, with the division between each minute taking place in UTC plus/minus a random offset (a few seconds on average - with 98% of divisions being within 5 seconds either way).
  Boom, now there is no need to spend huge amounts of money coming up with lightning-fast implementations that don't actually create real value for ordinary people.
  Business ought to be about improving the lives of ordinary people. Sure, sometimes the link isn't direct, and I'm fine with that. However, we're putting far too much emphasis on optimizing what amounts to numbers games that do nothing to produce real things of value for anybody...
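A minimal sketch of the batching rule proposed above. The comment does not say which distribution the random offset follows, so a normal distribution is assumed here, with its spread chosen so that 98% of cutoffs fall within 5 seconds of the minute boundary. The names `batch_cutoffs` and `assign_batches` are hypothetical.

```python
import random
from statistics import NormalDist

# Assumption: offsets are normally distributed around the minute boundary.
# 98% within +/-5 s means 5 s sits at the 99th percentile of the offset.
SIGMA = 5.0 / NormalDist().inv_cdf(0.99)  # roughly 2.15 seconds


def batch_cutoffs(n_minutes, rng):
    """Return one randomized cutoff time (seconds since start) per minute."""
    return [60.0 * (i + 1) + rng.gauss(0.0, SIGMA) for i in range(n_minutes)]


def assign_batches(order_times, cutoffs):
    """Map each order timestamp to the first cutoff at or after it.

    Orders arriving after the last cutoff are left out (they would wait for
    a later batch that this sketch does not model).
    """
    batches = {i: [] for i in range(len(cutoffs))}
    for t in sorted(order_times):
        for i, cutoff in enumerate(cutoffs):
            if t <= cutoff:
                batches[i].append(t)
                break
    return batches


rng = random.Random(2011)
cutoffs = batch_cutoffs(3, rng)
batches = assign_batches([1.0, 59.9, 60.1, 118.0, 170.0], cutoffs)
```

Because the cutoff is not known in advance, an order sent at 59.9 s may or may not make the first batch, which is the property the proposal relies on.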
4. Re:The cutting edge is in high frequency trading by BitZtream · 2011-01-01 07:38 · Score: 2, Informative
  
  So I'm guessing you've never actually done any development?
  The 'byte stream' model is not from UNIX, it's just the way the hardware is laid out physically.
  IPC happens in an entirely different way unless you're using something simplistic like pipes.
  RDMA is pretty much a staple of high speed cluster computing, however it's DMA that allows pretty much everything in your PC to work without slowing the processor down. Even your keyboard controller uses DMA to get the characters into somewhere useful.
  As far as what you're calling RDMA via Infiniband, I've seen massive clusters (some of the largest in the world) using it ... safely.
  If you think nothing uses the protections provided by the x86 family I'd like to know what shitty OS you're using? Not only does everyone actually use it on the x86, they do it in ... get this ... C! Perhaps you should take a look at a few open source OSes and notice that while there is some assembly in specific places for speed and the required lowest level libraries ... you'll be surprised by the fact that all of that memory management stuff is written in ... C and utilized by .... C programs.
  I guess you're also ignoring the fact that Intel and AMD added more protection hardware to the x86 architecture JUST FOR VIRTUALIZATION ... I suppose you think the fact that modern hypervisors won't work without these features present is just a silly little annoyance that the software vendors throw in to make us buy new hardware to pad their bank accounts?
  I'm not sure what development you do, but my standard C library uses MMX for many functions that require me to do nothing to take advantage of their speedup.
  You really have no clue do you?
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
5. Re:The cutting edge is in high frequency trading by Anonymous Coward · 2011-01-01 07:58 · Score: 0
  
  You may want to read on what HFT actually means, because you seem to think it has to do with normal trading orders - there is a reason there is an F in that acronym.
6. Re:The cutting edge is in high frequency trading by fuzzyfuzzyfungus · 2011-01-01 08:02 · Score: 1
  
  While I like your future better, I'm guessing that the real one will look more like "A solid ball of hyper-computronium wrapped around the NYSE, tended by robots and powered by a Dyson sphere capturing the entire output of the sun"...
  
  Sure, the only surviving life forms will be extremophilic bacteria in the wastelands and investment bankers in the Suburbidomes(tm); but think of how high the GDP per capita will be!
7. Re:The cutting edge is in high frequency trading by Bill,+Shooter+of+Bul · 2011-01-01 08:25 · Score: 2
  
  You really do not understand the domain in question. The whole idea behind HFT is to analyze real time data and make a near instantaneous stock trade that capitalizes on that data analysis *before* anyone else does. Waiting a second is too long in this case. The value they add to their customers: Cold hard cash. The value to the stock market: liquidity (fair argument if it's too much liquidity).
  
  --
  Well.. maybe. Or Maybe not. But Definitely not sort of.
8. Re:The cutting edge is in high frequency trading by LordNacho · 2011-01-01 08:26 · Score: 1
  
  Are you sure this won't simply create a different game?
9. Re:The cutting edge is in high frequency trading by Gorobei · 2011-01-01 08:45 · Score: 1
  
  Right. We can have the banks just trade once a minute or once a day.
  End users can go back to using Travellers Cheques: sure you spend a few hours of your foreign vacation either getting ripped off or waiting in line at a bank, but hey, at least global trading is now leisurely.
  Stocks are just as good: you paid 3% to trade, but hey, it's a long term investment!
  Commodities? You need a supply of tin? Just buy a tin mine.
  People proposing slowing down trading speeds are like people proposing slowing down computer clock speeds. Sure, you save some energy, but so what? Everyone has to use a 6502 based iPad because you think that would be better?
10. Re:The cutting edge is in high frequency trading by Anonymous Coward · 2011-01-01 09:24 · Score: 0
  
  What is the criterion for "better"?
  I think the parent was proposing something that attempts to minimize the size of the financial sector of the economy while preserving the benefits of what they do.
  Yes, we can probably move from trading once a day to trading once a minute to trading once a millisecond to trading once a microsecond to trading once a nanosecond to trading once a picosecond, but how is the welfare of society improved by this change?
11. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-01 09:34 · Score: 1
  
  Actually, I'd prefer once per day at midnight, with a blackout on company announcements after 5PM. That would go even further towards leveling the playing field.
  What value does a bot generate when all it does is capitalize on the tiniest fluctuations in stock price? It isn't like it makes the stock any more efficient - the price would certainly adjust itself. The only difference is that some investment bank can't make a fortune solely based on its ping time.
12. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-01 09:36 · Score: 1
  
  Yeah, I know exactly what it is. My proposal basically is to get rid of it by making it useless. It provides no real benefit to the economy, so nobody will be hurt if it goes away...
13. Re:The cutting edge is in high frequency trading by Gorobei · 2011-01-01 09:42 · Score: 1
  
  I thought I gave some examples - FX, equity, commodity prices get better as frequency increases.
  Less cost and fuss for consumers and importers/exporters, etc. A few people spend their lives making prices tighter, and millions of people get better prices on vacations, on their mortgages, etc. Why begrudge them for pocketing a few percent off the top?
  International trade on high-tech products becomes possible: you can get a firm offer on 20 inputs you need in 1 hour. In the old days, that level of co-ordination was impossible - you had to BUY the suppliers.
  I take "better" to mean "you get better stuff at a better price."
14. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-01 09:46 · Score: 1
  
  Uh, I understand exactly what it is, and who benefits, which would not be the economy at large.
  The point in aggregating trades is to entirely negate the advantage of HFT, thus eliminating it from the market. It isn't like there wouldn't still be liquidity - you'll just have to wait 1-2 minutes to have an order filled. The average person making a trade usually has a lag of hours between an event happening and getting to make a trade anyway.
15. Re:The cutting edge is in high frequency trading by Gorobei · 2011-01-01 10:10 · Score: 1
  
  So you would be happy if Google could only adjust its search algorithm once a day? It would be a more level playing field, and then search companies couldn't make a fortune based solely on their ping times.
16. Re:The cutting edge is in high frequency trading by Bill,+Shooter+of+Bul · 2011-01-01 10:21 · Score: 1
  
  Oh, so your solution to the technical problem is to get rid of the industry which experiences it?
  Ok, I guess. I'm really more here on slashdot to discover some sweet techniques for solving immensely difficult technical problems.
  I didn't get that from your first post. Maybe because you started out with the technical part? I don't know exactly. I'm not knowledgeable enough in the field of trading to make an intelligent comment about the result of banning HFT. The market does need liquidity, that much I do know.
  
  --
  Well.. maybe. Or Maybe not. But Definitely not sort of.
17. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-01 11:07 · Score: 1
  
  The market does need liquidity, that much I do know.
  The market had plenty of liquidity before the invention of HFT. I'm just suggesting limiting liquidity to a few minutes, rather than a few nanoseconds. Will it really hurt the economy if it takes a stock 10 minutes to plunge 50% rather than a few seconds, with only a few big well-connected institutions getting out in time?
  I'm all for technology that solves real-world problems. However, HFT is a case where technology and a lack of regulation have actually created real-world problems. Improving HFT actually makes those problems worse.
18. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-01 11:14 · Score: 1
  
  Yes, but Google's search algorithms help ordinary people find information they need, and they help real businesses that produce real things to do so more efficiently, which makes the cost of everything you consume a little cheaper.
  A better HFT algorithm just ensures that some big banker makes a few hundred million more dollars at the expense of any ordinary person who has a retirement account.
  I have nothing against progress. However, most of the financial industry just shuffles numbers around manufacturing money out of nothing, and occasionally turning money back into nothing in astronomical quantities. Did you notice how gas prices plummeted from $4/gallon to about $2.50 in a few weeks after the hedge fund meltdowns? Now, tell me how much value all those funds trading oil futures were creating?
  I have nothing against financial instruments that actually create more efficient markets. If an airline needs to buy 50 million gallons of fuel next year I'm fine with them hedging the price of oil to keep their ticket prices stable. My problem is when the market in the actual commodity becomes secondary to playing financial games. The oil market should be about running cars, or environmental controls, or whatever - not about 100 day traders making $40 on the trade of a $50 barrel of oil.
19. Re:The cutting edge is in high frequency trading by SuricouRaven · 2011-01-01 11:37 · Score: 1
  
  The old PS/2 keyboards used interrupts, not DMA. USB I'm not sure about.
20. Re:The cutting edge is in high frequency trading by Gorobei · 2011-01-01 12:30 · Score: 1
  
  99% of the fuel market is not about day traders scalping a dollar or two on a few thousand barrels of oil. It's more like:
  1. Geeks building code to track every tanker, tender, barge, pipe, and hub in the world to estimate oil availability.
  2. Traders yelling "lease me a tanker" and having people on call to figure the time and cost to get it moving oil from A to B.
  3. Full time meteorologists predicting short-term weather.
  4. Geeks building models based on the above.
  5. Geeks pricing out the cost of refineries, catalytic crackers, etc, to figure how to optimize profits.
  This is a multi-billion dollar industry, not a few day-traders making bets in their pajamas.
  It's not surprising that the experts in the field make a lot of money.
21. Re:The cutting edge is in high frequency trading by aaarrrgggh · 2011-01-01 12:33 · Score: 1
  
  The liquidity HFT provides should be at arbitrage margins, not the insane profits the players are making. If it makes sense at 0.001%, then go for it. At 0.1%, they are raping the system for the 'value' they provide.
22. Re:The cutting edge is in high frequency trading by Anonymous Coward · 2011-01-01 13:20 · Score: 0
  
  ignoring your weird mixing of ping time and search algorithms,
  the difference is that google would be screwing competitors;
  the big investment banks are screwing anyone who invests
  who doesn't have a data center in manhattan.
  i think the right question to ask is, should we prevent entities
  with special positions in the market (insider knowledge, many
  orders of magnitude faster data) from using this sort of advantage.
  in the past the answer has generally been "yes".
23. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-01 13:57 · Score: 2
  
  Then, why did the price of gasoline drop $1.50 in a few weeks from record highs when the hedge funds dried up?
  I am sure that lots of effort goes into the logistics of oil distribution, etc. That is all effort well-spent.
  The part I don't like is when people buy oil futures speculating that prices will rise without any intention to take delivery of the oil. That just results in people bidding up the price.
  I'm certainly not the only one suggesting that needless speculation drives up the cost of commodities. Just look at housing prices with the mortgage bubble/etc.
  Again, my issue isn't with markets - it is with people buying derivatives purely on speculation, without any interest in actually dealing with the product that is being tracked. Corn growers who want to hedge the value of their crops are fine. Airlines that want to hedge the value of oil are also fine. However, this should be limited to the value of the actual material being traded.
24. Re:The cutting edge is in high frequency trading by carnalforge · 2011-01-01 16:54 · Score: 2
  
  [...]
  More generally, we have a fundamental problem in the I/O area: UNIX. UNIX I/O has a very simple model, which is now used by Linux, DOS, and Windows. Everything is a byte stream, and byte streams are accessed by making read and write calls to the operating system. That was OK when I/O was slower. But it's a terrible way to do inter-machine communication in clusters today. The OS overhead swamps the data transfer. Then there's the interaction with CPU dispatching. Each I/O operation usually ends by unblocking some thread, so there's a pass through the scheduler at the receive end. This works on "vanilla hardware" (most existing computers), which is why it dominates.
  This is true, though you're underestimating "modern" OSes. Think of it as defensive planning. Who knew ~20+ years ago that we would have solid state disks? Who knew we would have 10Gb NICs? SATA?
  But the fundamental design of I/O streams works and is easily adapted to new devices. Add to that the simplicity of /dev and the whole concept of input and output in UNIX. Think about it.
  
  [...]
  The supercomputer interconnect people have been struggling with this for years, but nothing general has emerged.
  RDMA via Infiniband is about where that group has ended up. That's not something a typical large hosting cluster could use safely.
  Add to that fibrechannel. And NUMA is an old and tried technology.
  
  Most inter-machine operations are of two types - a subroutine call to another machine, or a queue operation. Those give you the basic synchronous and asynchronous operations. A reasonable design goal is to design hardware which can perform those two operations with little or no operating system intervention once the connection has been set up, with MMU-level safety at both ends. When CPU designers have put in elaborate hardware of comparable complexity, though, nobody uses it. 386 and later machines have hardware for rings of protection, call gates, segmented memory, hardware context switching, and other stuff nobody uses because it doesn't map to vanilla C programming. That has discouraged innovation in this area. A few hardware innovations, like MMX, caught on, but still are used only in a few inner loops.
  At the cost of my mood points or whatever, now i call bullshit.
  Rings protection? Used at least in linux.
  Call gates? You mean Sysenter? Used at least in linux from ~2002 if im not wrong
  Segmented memory? Hello 32bits? Is that what you mean? Correct me if im wrong, but i thought it was a thing of the past.
  Hardware context switching? You mean VMX (Intel) or SVM (AMD)? At least on Linux those instructions are used.
  C is the limiting on this? Please.
  MMX? SSE/2 etc?
  gcc -mmmx -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2
  (talking about gcc because that is what I know, though I'm sure other compilers can use those instructions too)
  
  It's not that this can't be done. It's that unless it's supported by both Intel and Microsoft, it will only be a niche technology.
  yep right.
  
  --
  :wq!
25. Re:The cutting edge is in high frequency trading by pla · 2011-01-02 03:26 · Score: 0
  
  The value they add to their customers: Cold hard cash. The value to the stock market: liquidity
  
  One problem there - Moving "cold hard cash" around doesn't create value. It makes parasites fatter.
  
  And in some cases, it can destroy value. When people around the world starve to death as red spring wheat sits rotting in storage, the system has a serious problem.
26. Re:The cutting edge is in high frequency trading by kartracer_66 · 2011-01-02 03:48 · Score: 1
  
  I think what you're suggesting is having a call auction every minute. There may be exchanges that do this already, but there'd still be advantages to being high frequency (i.e. waiting until the last possible nanosecond to submit your order and take advantage of whatever you can find in the order book before the matching engine does its thing...or submitting early if you have other information, there is little liquidity on one side or the other and the time/size order priority is in play).
  I think the HFTs are pretty easy scapegoats these days, but on closer examination, any criticism of them is a criticism of capital markets in general. People/robots/algorithms with more information are always going to outperform the retail investor. If you really want to curb HFTs a transaction tax is the only semi-effective thing I can think of.
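To make the "call auction" idea concrete, here is a rough sketch of how a matching engine might pick one clearing price for everything collected during an interval. It is a simplified illustration, not any exchange's actual rule: prices are integer cents, the tie-break (lowest price wins) is arbitrary, and the function name `clearing_price` is hypothetical. It also ignores how fills are allocated among orders at the same price, which is where the time/size priority mentioned above would come in.

```python
def clearing_price(bids, asks):
    """Pick the single price that maximizes matched volume.

    bids and asks are lists of (limit_price, quantity) tuples. A bid trades
    at any price at or below its limit; an ask at any price at or above its
    limit. Ties go to the lowest candidate price (an arbitrary choice).
    """
    candidates = sorted({p for p, _ in bids} | {p for p, _ in asks})
    best_price, best_volume = None, 0
    for price in candidates:
        demand = sum(q for p, q in bids if p >= price)
        supply = sum(q for p, q in asks if p <= price)
        volume = min(demand, supply)
        if volume > best_volume:
            best_price, best_volume = price, volume
    return best_price, best_volume


# Prices in cents, so 1002 means $10.02.
bids = [(1002, 300), (1001, 200), (999, 500)]
asks = [(998, 100), (1000, 250), (1003, 400)]
price, volume = clearing_price(bids, asks)
```

In this sketch the arrival order within the interval has no effect on the clearing price; only the limit prices and sizes do.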
27. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-02 07:38 · Score: 1
  
  Well, my proposal to randomize the exact time that trades are executed was intended to accomplish the purpose of preventing last-nanosecond orders from coming in.
  Insider trading will always be a problem that has no technical solution. However, even lower-frequency trades, like once a day, might help equalize access to the markets.
28. Re:The cutting edge is in high frequency trading by h4rm0ny · 2011-01-02 09:39 · Score: 1
  
  Then, why did the price of gasoline drop $1.50 in a few weeks from record highs when the hedge funds dried up?
  That question is clearly one designed as a counterpoint, but only to those who know what your supposed answer would be. For those of us not well up on the markets, can you explain what significance this has / you believe it has. (Not sarcastic - genuinely ignorant person here).
  
  --
  
  Aide-toi, le Ciel t'aidera - Jeanne D'Arc.
29. Re:The cutting edge is in high frequency trading by Rich0 · 2011-01-02 14:07 · Score: 2
  
  Simple - the kinds of people who were up to their eyeballs in hedging the price of oil futures, were also up to their eyeballs in hedging the prices of real-estate, mortgage-backed securities, and credit-default swaps. They lost their shirts, and for a little while they couldn't afford to keep buying oil futures. Suddenly the price of oil plummeted tremendously, and now ordinary people who buy oil for the purpose of actually burning it and not trading it can afford to do so.
  Derivatives can serve a legitimate purpose in stabilizing markets. However, they are out of control today and if anything tend to destabilize markets as a result.
30. Re:The cutting edge is in high frequency trading by Animats · 2011-01-02 17:36 · Score: 1
  
  I usually don't reply to people this stupid. But it's a slow night.
  The 'byte stream' model is not from UNIX, it's just the way the hardware is laid out physically.
  No, the hardware isn't laid out that way. The byte stream model is a software-implemented convenience to hide things like disk blocks and packet sizes. There's overhead associated with that, in several senses. You usually have to impose some protocol on top of the stream just to define the boundaries between items. There have been non-UNIX systems where files were record-oriented, rather than stream-oriented. UDP is transaction-oriented, although the transactions aren't reliable. QNX messaging is transaction-oriented and reliable, yet a transaction maps to one network packet if possible.
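The point about needing a protocol on top of the stream to mark item boundaries can be illustrated with length-prefixed framing. This is only a sketch of one common convention; the 4-byte big-endian header and the names `frame` and `deframe` are choices made for this example, not anything from the systems named above.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (arbitrary choice)


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(message)) + message


def deframe(buffer: bytes):
    """Split a received byte stream into complete messages.

    Returns (messages, leftover), where leftover holds the bytes of a
    message that has not fully arrived yet.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if len(buffer) - start < length:
            break  # partial message: wait for more bytes
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]


# Two whole messages followed by the first 5 bytes of a third.
stream = frame(b"BUY 100") + frame(b"SELL 50") + frame(b"CANCEL")[:5]
messages, leftover = deframe(stream)
```

A record- or message-oriented transport delivers each item with its boundary intact, so the receiver would not need this extra parsing and buffering layer.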
  RDMA is pretty much a staple of high speed cluster computing, however it's DMA that allows pretty much everything in your PC to work without slowing the processor down. Even your keyboard controller uses DMA to get the characters into somewhere useful.
  Actually, DMA is part of the problem. The trouble is that most DMA is applied to real memory addresses; it doesn't pass through the MMU. This is a historical artifact of the minicomputer and PC world, which for cost reasons didn't have channel controllers like mainframes. As a result, DMA has to be managed by the OS. On IBM mainframes, the OS could give an application hardware access to a dedicated raw device, and that didn't allow the application to write outside its process, because the channel controller's memory mapping was set up by the OS.
  Now that transistor counts in I/O controllers are no longer an issue, it's worth rethinking this. With the right hardware, two applications on different machines should be able to communicate safely without OS intervention.
  As far as what you're calling RDMA via Infiniband, I've seen massive clusters (some of the largest in the world) using it ... safely.
  Current Linux support for RDMA exists but has problems. RDMA and paging do not play well together. There's a proposal to put support for something like that into Linux, but it's really ugly. It's called "ummunotify", which is intended to notify processes when their MMU state is being changed by the kernel. This is so they can coordinate with the other machine that has RDMA access into their address space.
  Personally, I think it's time to get rid of paging. Historically, paging systems at best yield the effect of having twice as much RAM, and RAM is so cheap today as a fraction of system cost that it's a nonissue. If you don't have to worry about page fault delays, performance is far more repeatable.
31. Re:The cutting edge is in high frequency trading by h4rm0ny · 2011-01-02 22:59 · Score: 1
  
  Suddenly the price of oil plummeted tremendously, and now ordinary people who buy oil for the purpose of actually burning it and not trading it can afford to do so.
  Which is a good thing! I get your point now, thank you.
  
  --
  
  Aide-toi, le Ciel t'aidera - Jeanne D'Arc.
32. Re:The cutting edge is in high frequency trading by Anonymous Coward · 2011-01-04 12:15 · Score: 0
  
  Some zeroes and ones are faster than others, news at 11...
Terabyte RAM? by sunderland56 · 2011-01-01 06:35 · Score: 1

Even a single consumer hard drive is a terabyte of storage.... how many servers at any cost have a terabyte of RAM?
1. Re:Terabyte RAM? by Simon80 · 2011-01-01 06:47 · Score: 1
  
  I think you're missing the point. If the data is analyzed in a single pass as it is received, 1TB of RAM is not necessary.
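As a sketch of what single-pass analysis can look like, the class below keeps a running mean and variance using Welford's online algorithm, so memory use stays constant no matter how much data streams through. The class name `RunningStats` and the sample values are made up for illustration.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


stats = RunningStats()
for price in (10.0, 12.0, 11.0, 13.0):  # stand-in for an incoming feed
    stats.update(price)
```

This only works for analyses that can be expressed as incremental updates; anything that needs to revisit earlier records still needs them stored somewhere.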
2. Re:Terabyte RAM? by Anonymous Coward · 2011-01-01 06:47 · Score: 0
  
  http://www.oracle.com/us/products/servers-storage/servers/x86/sun-fire-x4800-server-077287.html
  I think this will provide a terabyte of RAM if you can afford Oracle's prices.
3. Re:Terabyte RAM? by Anonymous Coward · 2011-01-01 06:59 · Score: 1
  
  I think you are missing the point here. If the data to analyze is so small, then why the fuss? If the data fits in memory, leave it in memory, if not, store it and retrieve it later. Guess what, the place to store your data is probably a database with storage attached. Unless of course, you are one of those young kids (disclaimer, I'm 28), that reinvent the wheel all the time and write that part themselves, because databases are out.
  So, let's say to analyze your incoming data of size 1MB, you also need to reference 100MB of other data. Fits in memory, right? Perfect. Now let's say your incoming data size is 10MB and you need the other 5TB of data to properly analyze it. Unless you have that much RAM, you need to store that data somewhere. Probably a database ... blah blah, see above... However, if your data of size 100MB is incoming and you don't need reference data, well, fits in RAM, analyze it right away and store the data in your database and forget about it, as you won't need to reference it later.
  It's just BS... either your working set is small enough to fit in memory or it is not. There are two things you can do, buy enough memory to fit it in there or store and retrieve when necessary. Database caches take care of "hot" data.
4. Re:Terabyte RAM? by fuzzyfuzzyfungus · 2011-01-01 07:00 · Score: 2
  
  1TB is still in the realm of rather specialized; but 512GB systems(while not inexpensive) are actually pretty available. A quick glance at Dell shows that(even without the benefits of a rep, volume pricing, or any sort of negotiation), a 2U R815 with 512GB of RAM can be yours for a hair under $40,000. Kitted out with the specs you actually want, of course, it might run you another $20k above that. If AMD isn't your flavor, the intel-based but otherwise similar R810 will run five to ten thousand more than the R815 with otherwise similar options...
  
  At those prices, I'd venture to say that Flash still has a reasonably bright future ahead of it in the high-speed/low-latency storage market(not to mention the volatility issue); but(especially if your problem can handle being broken up across multiple systems with only modestly fast interconnects) the cost of enormous amounts of RAM has dropped pretty significantly.
  
  Now, if you can't deal with the limitations of commodity cluster interconnects, and have to have more than a half terabyte of RAM in a single memory space, I get the impression that your options get more expensive pretty fast. Phrases like "up to 16TB shared global memory" and "single system image", are generally your cue to hold on to your wallet and run... If that is what you want, though, you can buy it.
5. Re:Terabyte RAM? by Rich0 · 2011-01-01 07:15 · Score: 1
  
  the cost of enormous amounts of RAM has dropped pretty significantly
  Uh, your example was 512GB, and you're comparing $40k for RAM to about $40 for a hard drive. That's around 1000:1!
  Sure, RAM is only getting cheaper, but so are hard drives. A few years ago I got 2GB of RAM for about the same price as 320GB of hard drive. So, if anything the relative cost of RAM has gone UP, and not down...
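The arithmetic behind this comparison, worked per gigabyte. The figures come from the thread; treating the $40 drive as a 1 TB unit is an assumption based on the earlier "consumer hard drive is a terabyte" remark, and the $40,000 is the price of the whole server rather than the RAM alone, so the first ratio overstates the cost of RAM.

```python
def cost_per_gb(price_usd, capacity_gb):
    """Price divided by capacity, in dollars per gigabyte."""
    return price_usd / capacity_gb


# 2011 figures quoted in the thread (drive capacity of 1 TB is assumed).
ram_now = cost_per_gb(40_000, 512)   # whole 512GB server, not RAM alone
hdd_now = cost_per_gb(40, 1_000)
ratio_now = ram_now / hdd_now

# "2GB of RAM for about the same price as 320GB of hard drive":
# equal prices, so the per-GB ratio is just the capacity ratio.
ratio_then = 320 / 2
```

Under these assumptions the per-GB gap is roughly 1950:1 now versus 160:1 earlier, which supports the claim that RAM has become relatively more expensive per gigabyte even as its absolute price fell.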
6. Re:Terabyte RAM? by fuzzyfuzzyfungus · 2011-01-01 07:27 · Score: 1
  
  Oh, RAM isn't even close to HDDs, nor is there any reason to expect that it will ever be, if you care about storage space. Only if latency and IOPS are at issue does RAM become a relevant competitor. When it comes to I/O operations, particularly highly random ones scattered across the storage area, RAM will(unsurprisingly, given what its name stands for) absolutely wipe the floor with anything with moving parts. To even touch the I/O performance, you would probably be talking multiple racks jammed full of top of the line 15kRPM monsters(a proposition unlikely to be achieved for $40k...)
  
  Plus, while the actual hardware is of pretty niche interest, it is pretty impressive(looking at the history of component costs and sizes in computing) that you can now get a half-terabyte of RAM, in a package that a single person of average strength can move, that will run from reasonably ordinary household wiring, for approximately the US per-capita GDP.
7. Re:Terabyte RAM? by imsabbel · 2011-01-01 07:42 · Score: 1
  
  We bought a machine for FEM a few weeks ago (there was budget left for 2010).
  4*12 core opteron, 256GByte ram. 12k.
  Which is peanuts, pretty much.
  So I have little doubt that 1TB of RAM is quite affordable nowadays if you have big-iron level money available.
  
  --
  HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
8. Re:Terabyte RAM? by Tablizer · 2011-01-01 07:45 · Score: 1
  
  At those prices, I'd venture to say that Flash still has a reasonably bright future
  Unlike your puns ;-)
  
  --
  Table-ized A.I.
9. Re:Terabyte RAM? by fuzzyfuzzyfungus · 2011-01-01 07:51 · Score: 2
  
  That one wasn't even intentional, unfortunately. My love of puns has, apparently, seeped directly into whatever part of my brain is responsible for day-to-day verbal and written work...
10. Re:Terabyte RAM? by Anonymous Coward · 2011-01-01 12:58 · Score: 2, Interesting
  
  I think, perhaps, that you're missing the point, at least of the article. It has nothing to do with whether to store information in memory or in the database and everything to do with the current trend of using dedicated analytics products (i.e. OLAP) to do data analysis. Whereas we used to use the same relational databases to store, retrieve and analyze all data with SQL as the Swiss Army knife that enabled it all, we're moving towards a model where the relational database is responsible for storage and retrieval of information only and dedicated analytics products have their own cache of the information for reporting and analysis purposes.
  The point is that relational databases are being marginalized and one of their major selling points (i.e. the ability to analyze data based on the relationship between different types of data) is increasingly less relevant. Once you're limiting your RDBMS usage to simple CRUD operations, the rationale for choosing an RDBMS (especially an expensive one like Oracle and its ilk) over NoSQL options or open source databases with limited support for power-user options starts to disappear. MySQL may lack a lot of the features that experienced DBAs consider mandatory, but it can do INSERTs, UPDATEs and DELETEs as well as anything and it has no problems with SELECTs based on keyed columns. Similarly, Cassandra, Voldemort and such can also easily support that limited subset of functionality.
  That is why RDBMSs are becoming marginalized. Applications are increasingly being designed to either avoid an RDBMS back-end or to use it as simple "dumb" storage and rely on a separate analytics product to accomplish all the complicated logic that previously would be accomplished with complicated SQL and stored procedures. Beyond that, OLAP concepts allow the data-mining interface to require less development effort. It's simple to write an interface around (an) OLAP cube(s) and allow the user to choose the dimensions and measures and allow the user to pivot, drill-down and such. In fact, most analytics products do this stuff out of the box without any development necessary. With a SQL database, an interface needs to be created that will translate the user's instructions into SQL, which can often become very complex and requires significant effort to ensure that the resulting SQL will perform well.
  This isn't about RDBMSs becoming unnecessary, it's about them now being best served in a much more limited role than they've previously occupied in the application architecture.
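  To make the "choose the dimensions and measures" idea concrete, here is a minimal sketch of an in-memory cube in Python. The fact rows, the `aggregate` helper and the column names are all made up for illustration; no real analytics product is being shown, and a real one would precompute and index far more than this.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure (revenue).
FACTS = [
    {"region": "east", "product": "widget", "quarter": "Q1", "revenue": 100},
    {"region": "east", "product": "gadget", "quarter": "Q1", "revenue": 40},
    {"region": "west", "product": "widget", "quarter": "Q1", "revenue": 70},
    {"region": "west", "product": "widget", "quarter": "Q2", "revenue": 30},
]

def aggregate(facts, dimensions, measure):
    """Sum one measure, grouped by whichever dimensions the user picked."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# Top-level view: revenue by region.
by_region = aggregate(FACTS, ["region"], "revenue")

# "Drill-down": same call with one more dimension, no new query text to write.
by_region_product = aggregate(FACTS, ["region", "product"], "revenue")
```

  The point of the sketch is that pivoting or drilling down is just a different list of dimensions passed to the same function, whereas the SQL route needs a layer that generates and tunes a new GROUP BY query for each combination.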
11. Re:Terabyte RAM? by Anonymous Coward · 2011-01-01 23:35 · Score: 1
  
  All valid points, but where in the article do you see all this? "All" the article talks about is analytics products using memory instead of storage, as the I/O path is too slow.
  There's nothing about SQL in there at all. And the I/O problem is nothing "new", if your data is too large, buy better computers or be more intelligent about using what you have. If you are Google and want an up-to-date index to be competitive, well you buy that hardware. If you are not and it's OK to take a full day to analyze your data, you don't buy that hardware and keep the old hardware.
  Using OLAP or not has nothing to do with the hardware side. Guess what, SQL operations are blazingly fast on a memory only table and can take forever on slow disk drives. Using an OLAP product with a SQL backend on slow disks and no caching is just as slow.
Funny by roman_mir · 2011-01-01 06:40 · Score: 1

It's funny that only today I chatted with some folks on the PostgreSQL IRC support channel about this, asking whether it is at all possible to have 2 postmasters running at the same time, one to do in memory SQL against an all-in-memory database, and the other to write to the database (and no, they think that it is not possible to have 2 postmasters talking to the same database this way, they believe it will corrupt the data). The suggestion was just to increase shared_buffers and file system block buffer size. I am thinking that maybe also it's useful to try and set up the streaming replication (xlog shipping) to another PostgreSQL database store/instance and use the other database as read only, then increase shared_buffers and OS disk block buffers.
Don't really know whether there is any significant advantage of one approach over another (except for having 2 databases, of course, so they become spares).

--
You can't handle the truth.
1. Re:Funny by Anonymous Coward · 2011-01-01 07:15 · Score: 0
  
  If you are running out of memory capacity on your machine, you are running out of memory capacity on your machine ... it doesn't matter whether you have 2 postmasters competing for the memory on one machine or just give it all to one postmaster, right?
2. Re:Funny by roman_mir · 2011-01-01 07:20 · Score: 1
  
  But but but, you are missing the point. Can 2 postmasters access the same disk, one to read from it only and the other one to do writes?
  If that was possible, then 2 postmasters could be on one machine, each on its own processor/memory or on 2 machines with the data directory mapped to both. The answer from the PostgreSQL guys in the IRC channel was that it's not possible, because all postmasters end up writing SOMETHING to the data directory, maybe those are just XLOGs, but they will write something and will screw each other up.
  That's why the answer to this question is to replicate the database and have one for read only with huge RAM and the other for writing, and stream-replicate the write DB to read only DB through XLOG file pushing.
  
  --
  You can't handle the truth.
All well and good until... by dg41 · 2011-01-01 06:45 · Score: 1

This is all well-and-good until someone accidentally knocks out the power. Then all of that stuff needs to be recomputed if it's not stored to disk.
Re:Just dump your data into the hole by Anonymous Coward · 2011-01-01 06:50 · Score: 1

Are you really sure you want them to come up with something new?
Can we please stop already? by mwvdlee · 2011-01-01 07:06 · Score: 5, Insightful

I'm getting sick and tired of hearing about yet another hype in IT-land where everything has to be done in yet another new way.
All developers understand that different problems require different solutions. Will the managers who shove this crap up our asses please stop doing so? It's not productive, you're not going to get a better solution by forcing it to be implemented in whatever buzzword falls off the last bandwagon of an ever-growing parade of buzzwords.
"In-memory analytics" is what we started out with before databases, and guess what; it's never gone away. We've never stopped using it. Now just tell us what problem you have and let us developers decide how to solve it.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
1. Re:Can we please stop already? by macslas'hole · 2011-01-01 07:14 · Score: 1
  
  Exactly, "in-memory analytics" sounds like more marketing BS, just another way to sell some unneeded software or service.
  
  --
  Life's a tale told by an idiot, full of sound and fury, signifying nothing.
2. Re:Can we please stop already? by Desert+Raven · 2011-01-01 07:24 · Score: 3
  
  Agreed, someone comes up with something new to solve a very specific issue, and all of a sudden someone's predicting how it will completely replace everything else in the next month.
  Grow up.
  Physical storage and relational databases aren't going anywhere anytime soon. In-memory this and non-relational that are all well and good for the specific problems they were designed for, but physically stored and relational data fits the needs of 90% of data storage and retrieval. I sure as HECK don't want my bank storing my financial data purely in memory.
  So keep yelling to yourselves about how the sky is falling on traditional techniques. Meanwhile the rest of us have real work to do.
3. Re:Can we please stop already? by AllenNg · 2011-01-01 08:47 · Score: 2
  
  I think you're missing a few evolutionary pieces. Most data analytics systems that I'm aware of are not currently relational. Long ago, the data lived in memory, but memory was expensive, so everything was moved to disk. The relational model added the formalisms of normalization (to cut down on space, among other reasons), but the types of multi-dimensional queries used by the analytics apps required too many joins for this to work. So the data was de-normalized (eg. OLAP) to improve performance. As memory prices came down, people started putting the OLAP indexes and aggregates into memory to get a performance boost. Moving the data back to memory and returning to a normalized, relational model isn't so much "drastic new thing" as it is "logical next step". For me, the upsetting thing is that just as I'm getting good at the data warehousing thing, it seems we're going to be switching to being relational again.
4. Re:Can we please stop already? by pinkushun · 2011-01-01 08:59 · Score: 1
  
  It also ticks me off how they redesign these existing practices, to the point where they stop making sense, and you have to relearn the new and better (read: rephrased) technology. Almost like they want you to rewrite all those tests...
  CAPTCHA: KISSUBAI - keep it simple stupid, unless buzzwords are involved
5. Re:Can we please stop already? by Anonymous Coward · 2011-01-01 14:16 · Score: 0
  
  No, in-memory analytics is a vague reference to OLAP solutions which really do offer a new way of doing things. OLAP allows developers to give users a ton of power to choose dimensions/measures, drill-down and pivot by simply creating the OLAP cube for the data their application creates. Beyond creating the cubes, the out-of-the-box tools do everything else. This is a lot less effort than creating an interface on top of a SQL database.
Re:Just dump your data into the hole by fuzzyfuzzyfungus · 2011-01-01 07:09 · Score: 1

Dear internet: Set your photoshops to "Goatse Tron Guy" and you will glimpse mankind's unutterably horrible future!
In Memory Analytics... by Anonymous Coward · 2011-01-01 07:23 · Score: 0

...are not usually applied to "big data". I'm not sure what technologies are being referred to, but a few billion rows is the limit to what I've seen. This is NOT what I would call "big data".
Free "in memory" analytics app Qlikview by egork · 2011-01-01 07:24 · Score: 1

Download a free (as in beer) app http://www.qlikview.com/us/explore/experience/free-download and see for yourself what current commercial software can do. I load as much as a hundred GB into RAM for analytics with this application. Just keep in mind that a star schema works best for this software. Get your tables from an existing database as flat files, load them "as is" and start analysis immediately.

--
...a stunned silence fell upon the hall.
But puting data in system ram = harder reboots by Joe+The+Dragon · 2011-01-01 07:36 · Score: 1, Interesting

But putting data in system RAM = harder reboots, as you need to dump it to a disk. Also, what about UPSes? You need one that has the power to last for the time it takes to do that as well.
1. Re:But puting data in system ram = harder reboots by TooMuchToDo · 2011-01-01 08:52 · Score: 1
  
  And god forbid your system halts and you lose any data you haven't already committed to persistent storage.
2. Re:But puting data in system ram = harder reboots by Anonymous Coward · 2011-01-01 11:20 · Score: 1
  
  You should be using stable operating systems and diesel backups. You should also be using clusters with the same data so a loss of one system isn't catastrophic.
3. Re:But puting data in system ram = harder reboots by Joe+The+Dragon · 2011-01-01 12:46 · Score: 2
  
  What help is diesel when the main power room with the transfer switch is on fire and the UPSes don't have the power to run the systems for a long time, as they are set up just to be there for the time it takes for the diesel to start up?
I know am being your stereotypical anarchist but.. by Nrrqshrr · 2011-01-01 07:46 · Score: 2

Decentralization is the way.
Hell no by HalAtWork · 2011-01-01 07:47 · Score: 1

You will always want that data so you can manipulate it in some other manner that wasn't taken into account by the in-memory analysis, or even the scope of your project. These marketing blokes sure like to seize the day, don't they?

--
Twinstiq, game news
Re:Just dump your data into the hole by Anonymous Coward · 2011-01-01 07:54 · Score: 0

How do I do my restores from that? All I seem to find are core dumps, and remnants of memory leaks.
It's a matter of use and optimisation. by rawler · 2011-01-01 08:04 · Score: 1

Hard-drives aren't really as slow as people think. The problem is that mechanical hard-drives are slow at seeking, but if seeking can be eliminated, you can quite easily saturate your CPU on even a moderately complex calculation.
Case in point: http://www.youtube.com/watch?v=WQw7c-PliB4
1. Re:It's a matter of use and optimisation. by SuricouRaven · 2011-01-01 11:45 · Score: 1
  
  Mechanical drives can sustain a read of between 50MB/s and 80MB/s, depending how much you want to spend.
An addition not a replacement by McDee · 2011-01-01 08:28 · Score: 1

In-memory data storage is fine as long as it isn't primary data storage. Yes it's faster but there are a lot of downsides as well. The most important is that it isn't easy to share between servers (a close second is that it's hard to replicate to a remote site for disaster recovery purposes) so each server needs to have its own copy of the data and there needs to be some way of keeping all that data in sync.
The alternative is to have good old "traditional" storage sitting where it always sits and when the servers boot up or start their processing they load in the appropriate data set from the storage into memory. This gives you all of the benefits of the fast in-memory processing without worrying about all of the downsides you create by using it as primary storage. So the memory isn't storage, it's cache.
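A rough sketch of that "disk is primary, memory is cache" pattern, using Python's bundled `sqlite3` module purely as a stand-in for whatever database you actually run. The `sales` table, the file name and the `load_into_memory` helper are all invented for the example.

```python
import os
import sqlite3
import tempfile

def load_into_memory(disk_path):
    """Copy an on-disk SQLite database into a fresh in-memory one."""
    disk = sqlite3.connect(disk_path)
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)  # page-by-page copy from the file into RAM
    disk.close()
    return mem

# Build a tiny on-disk "primary" store (hypothetical sales table).
path = os.path.join(tempfile.mkdtemp(), "primary.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
disk.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 20), ("east", 5)],
)
disk.commit()
disk.close()

# At "boot", pull the data set into memory and run the analysis there.
cache = load_into_memory(path)
totals = dict(cache.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
))
```

If the process dies, nothing is lost except the warm copy: the file on disk is still the source of truth and the cache is rebuilt on the next start. Keeping several servers' copies in sync is the hard part, and this sketch does not attempt it.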
So the real battle that will take place is not between hard disks and memory, it will be between RAM and SSDs.
Open-Source VoltDB by geoffrobinson · 2011-01-01 09:31 · Score: 1

I believe VoltDB (http://voltdb.org) uses in-memory and MPP if anyone is interested in giving it a test-spin. It's from Michael Stonebraker of various databases (Ingres, Vertica, etc.).
They've been doing a number of presentations on the topic you can probably find on the site.

--
Except for ending slavery, the Nazis, communism, & securing American independence, war has never solved anything.
Global-scale analytics != standard IT load by drdrgivemethenews · 2011-01-01 09:31 · Score: 2

Although TFA doesn't say so explicitly, I think it's talking about the race to get the best targeted advertising analytics in place for global applications like eBay, FB etc. These applications don't have the same database requirements as traditional business apps. It makes sense to talk about new ways of doing things for them, but TFA's author and a lot of other people make the mistake of thinking or implying that these new techniques will apply directly to traditional business apps as well. Sorry, not.

----------

Happy New Year, may it suck less for ya than the last one.
map your data by wrench+turner · 2011-01-01 09:54 · Score: 1

Most OSes and programming languages will let you map your in-memory data structure to a contiguous disk file so your disk IO is performed at paging speeds. The file system is only touched when the file is mapped (opened). Your system can then be configured to choose to what degree your data is in memory vs. on disk.
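For anyone who hasn't done this, a minimal sketch using Python's standard `mmap` module. The file name, the record layout (one 64-bit integer per record) and the record count are arbitrary choices for the example.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<q")  # one little-endian 64-bit integer per record
COUNT = 1000

# Create a file of the right size to back the mapping.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * RECORD.size * COUNT)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as view:
        # Writes go to memory pages; the OS decides when they hit the disk.
        for i in range(COUNT):
            RECORD.pack_into(view, i * RECORD.size, i * i)
        # Reads look like ordinary memory access, with no read() calls.
        value = RECORD.unpack_from(view, 10 * RECORD.size)[0]
        view.flush()  # ask for dirty pages to be written back

# Confirm the data really ended up in the file.
with open(path, "rb") as f:
    f.seek(10 * RECORD.size)
    on_disk = RECORD.unpack(f.read(RECORD.size))[0]
```

How much of the mapping stays resident is then up to the OS page cache and whatever memory limits you configure, which is the "to what degree" knob mentioned above.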
Heard it all before by rrohbeck · 2011-01-01 10:04 · Score: 1

Remember when the first 64-bit machines became commercially available?
"zOMG, now we can keep whole databases in RAM with the 4GB limit gone!"
This is just CS101. Memory hierarchy - you keep your data in the fastest memory it'll fit in (that you can afford.)
Now we can afford more RAM so we can do more per unit time because we don't have to wait for IO. Duh.

--
thegodmovie.com - watch it
Re:I know am being your stereotypical anarchist bu by hazem · 2011-01-01 11:14 · Score: 2

Decentralization is the way.
If you're a consultant and find a client working in a centralized way, you sell decentralization as the way to solve all their woes. If you find them working in a decentralized way, you sell them on centralizing to solve all their woes.
There are only two constants here: 1) every business has woes, regardless of structure; 2) consultants extract lots of value by shifting those woes around
Re:Just dump your data into the hole by SuricouRaven · 2011-01-01 11:28 · Score: 1

Your plan failed! I was curious if that site still exists, so I defocused my eyes before looking. All I saw was a vague blur with a red blur in the middle.
And along w/the rest of the "Duh's" by FlyingGuy · 2011-01-01 13:39 · Score: 1

WTF, is Soulskill still drunk?
This is SO nothing new, nor is it even interesting.
In memory DB's are nothing new, they are simply prone to failure and this is why hardware storage be it spinning drives or Flash will always be around.
All it takes is one hiccup by the memory logic or an interrupt controller or DMA channel and all your in memory data is toast forcing a reload from the last checkpoint which can take quite a while when you are talking say a terabyte of information.
Clifford Hersh and Jeffery Spirn cooked up the ANTS database a few years back. It was BLAZINGLY fast. It outran all of them, including TimesTen, and it never got any traction, and it was a fully in-memory database.
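To put a number on "quite a while" for the terabyte reload: a back-of-the-envelope sketch, assuming a single mechanical drive streaming sequentially at the 50 to 80 MB/s quoted elsewhere in this thread, decimal units, and no striping or parallel loading. The `reload_hours` helper is just made up for the arithmetic.

```python
def reload_hours(dataset_bytes, mb_per_second):
    """Hours needed to stream a data set at a sustained sequential rate."""
    seconds = dataset_bytes / (mb_per_second * 1_000_000)
    return seconds / 3600

TERABYTE = 10**12  # decimal terabyte

slow = reload_hours(TERABYTE, 50)  # about 5.6 hours
fast = reload_hours(TERABYTE, 80)  # about 3.5 hours
```

An array of drives read in parallel cuts that roughly in proportion to the number of spindles, but it is still hours rather than seconds unless you throw a lot of hardware at it.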

--
Hey KID! Yeah you, get the fuck off my lawn!
Re:"pwufessuh haiwypheet" of ITT Tech BLOWN AWAY 6 by Kilrah_il · 2011-01-01 16:43 · Score: 1

You are one sick puppy! I mean, you had an argument about a fucking HOSTS file and you didn't agree. What do you do? Do you go back to your private real-world life and ignore the other person's comment? No, you find out when he posts regarding a completely unrelated topic and flame him there.
Get a life, man.
Oh, and you still didn't find the time to register a username on /. (or you really are a coward). Sweet.

--
Whenever in an argument, remember this.
Too bad businesses are typically... by GNU_Suit · 2011-01-01 17:34 · Score: 1

Too bad businesses are typically run by dolts who don't have the slightest idea how to interpret the data. I've been in charge of "medical informatics" for a large firm and spent a startling amount of time having to explain the difference between a mean and a median to high-level executives.
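For the record, the mean/median distinction in about ten lines, with invented claim costs (not real data) where one outlier drags the mean far away from the typical case:

```python
import statistics

# Hypothetical claim costs in dollars; the last one is a single outlier.
claim_costs = [40, 45, 50, 55, 1000]

mean_cost = statistics.mean(claim_costs)      # pulled up by the outlier
median_cost = statistics.median(claim_costs)  # the middle value, unaffected
```

Here the mean comes out at 238 while the median is 50, and reporting the first as the "typical" cost is exactly the mistake being described.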
pfft I remember when it was called datamining by Billly+Gates · 2011-01-01 18:23 · Score: 1

Yes it is the wave of the ... present not future.
Walmart already knows when you buy a product how old you are, marriage classification, gender, sexual preference, criminal background, and what food and/or products you statistically will buy in the next 3 to 6 months, and offers coupons based on them. They get this data from the credit card companies and from sharing data with other suppliers.
This is a marketer's dream and the wave of the future. Statistical analysis is why Oracle and DB2 are still huge despite MySQL. It is because the databases and their APIs support these functions and you can help beat the competition by being statistically significant in all you do and knowing trends. Forecasting is another big one based on datamining errr memory analytics.
I.T. is called information technology for a reason. Using it to produce and predict rather than look up makes it much more useful.

--
http://saveie6.com/
Will storage [...] be needed in the future? by RichiH · 2011-01-02 00:55 · Score: 1

No, we will simply keep everything powered on and start anew if we lose power.
Also, we will all be mega-corps, even at home. No one will start with datasets under a few Petabytes. Not even for photos and text.
If you don't like the game... by Decker-Mage · 2011-01-02 15:01 · Score: 1

"The number of flash drives or PCIe flash devices needed to achieve the performance of main memory is not cost-effective, given the number of PCIe buses needed, the cost of the devices and the complexity of using them compared to just using memory."

... "Even if flash device latency improves, it still has to go through the OS and PCIe bus compared to a direct hardware translation to address memory."

... "Knowledge of how to do I/O efficiently is limited because I/O programming is not taught in schools."

... "The cost of I/O in terms of operating system interrupt overhead, latency and the path through the I/O stack is another limitation."

If you don't like the game, change the rules! The problem here is the multiple hardware and software layers between the flash memory (for now) and the processor. Take the bus and extend it to a box, if necessary, that has a ton of directly addressable flash memory, IOW Flash RAM is your system memory. DRAM, if any, should be used as level 3 cache. As your database grows, you add more flash RAM. All the quotes above only make the self-imposed complexity even more ridiculous. Stupid humans!

--
"[I]t is a wise man who admits the limits of his knowledge or skill, and that pretending either causes harm." --Terry Go