IBM Speeds Storage With Flash: 10B Files In 43 Min
CWmike writes "With an eye toward helping tomorrow's data-deluged organizations, IBM researchers have created a super-fast storage system capable of scanning in 10 billion files in 43 minutes. This system handily bested their previous system, demonstrated at Supercomputing 2007, which scanned 1 billion files in three hours. Key to the increased performance was the use of speedy flash memory to store the metadata that the storage system uses to locate requested information. Traditionally, metadata repositories reside on disk, access to which slows operations. (See IBM's whitepaper.)"
But how big was each file? 1kb? 1mb? 1gb?
Did anyone else read that as "10 byte files"? That seemed mighty slow lol
http://searchstorage.techtarget.com/news/1381297/Isilon-Systems-clustered-NAS-adds-solid-state-drives-SSDs-10-GbE-connectivity
Feb 2010: "Isilon senior product manager Gautam Mehandru said Isilon has added the solid-state drives to its nodes for a specific purpose: metadata storage and management. 'The OneFS file system will automatically identify metadata and place it on the SSD capacity of the cluster,' he said. 'Regular data will remain on hard disk drives – this will allow faster namespace operations for design and simulation workflows to accelerate replication and performance in server virtualization environments.'"
That's very slow.
Also, please, better technical expertise writing the articles.
-Woof woof woof!
I read the article, but I don't really understand what they mean by "scanning in" 10 billion files in 43 minutes. Is this just copying? Is this "scanning in" in the traditional sense like from paper? Maybe I missed something reading through it I guess.
Hard to be impressed otherwise.
How big are these files and what are they scanning them for?
It were nice if there were some text here but there isn't.
Help stamp out iliturcy.
Traditional filesystems hold their metadata on disc? Ermmm... Exactly what do you think the 'sync' command does? Traditionally, metadata is held in memory and periodically written to disc for storage.
......Is this kind of performance in scanning in high demand?
http://www.awfullybigmoustache.com
They noted that while solid-state storage can cost 10 times as much as traditional disks, it can offer a 100 percent performance boost.
So you get 2 times the performance for 10 times the price? I'd say that's still 5 times as expensive. What would be the performance boost with a RAID of 5 disks?
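The "5 times as expensive" figure can be checked with a quick sketch. It assumes the article's "100 percent performance boost" means twice the throughput, and it uses relative units only (disk = 1); the function name is made up for illustration.

```python
def cost_per_performance(relative_cost, relative_performance):
    """Cost paid per unit of performance, in relative units."""
    return relative_cost / relative_performance


# Assumption: a 100 percent boost means 2x the performance of disk.
disk = cost_per_performance(relative_cost=1.0, relative_performance=1.0)
ssd = cost_per_performance(relative_cost=10.0, relative_performance=2.0)

# How many times more you pay per unit of performance with flash.
ssd_vs_disk = ssd / disk
print(f"SSD costs {ssd_vs_disk:.1f}x as much per unit of performance")
```

This says nothing about how a RAID of five disks would actually scale; that depends on the workload and is not given in the article.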
The Tao of math: The numbers you can count are not the real numbers.
Some filesystems allow you to store the journal on a different disk, such as an SSD.
Now, some of my maths might be (a little) off, but ...
I've just spent half the day processing financial files ... 133KB average file size and processed (by process, I mean every byte is 'looked' at in C++ code) 4000 per second. I did this on a single file (compressed tar.gz) that when expanded is 7857 files and just over 1GB in size. The compressed file is temporarily stored in /dev/shm. The parallelisation is simple: one thread processes the file on the ram drive while the other thread copies the next file (1GB uncompressed, 65MB compressed) from a 5400rpm notebook drive (Thinkpad X60) to the ram drive.
Now, this latest in file processing by a giant of the industry has 'achieved' 3.55 million files 'processed' per second (and by processed it is never said what - but I'll assume the same as me), with files that are 650 bytes in size (the PDF says the dataset was 6.5 TB).
I was processing on a notebook that is about 7 years old architecturally and achieved 544 MB processed per second, and the best the latest IBM system can do is 2.3GB per second.
Is this a *big* step forward? I should log into our cluster and do a test on memory a little more advanced and see how their numbers stack up.
I guess what I'm saying is, there is just no substitute for writing software properly.
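For anyone who wants to redo the sums, here is a rough sketch of the arithmetic using only the figures quoted in this thread. It assumes 6.5 TB means 6.5 x 10^12 bytes and 133KB means 133 x 1024 bytes. Straight division gives roughly 3.9 million files per second and 2.5 GB per second for the IBM run, a little above the 3.55 million and 2.3GB quoted above.

```python
# Figures quoted in the summary and in the comment above.
IBM_FILES = 10_000_000_000
IBM_DATASET_BYTES = 6.5e12       # assumption: "6.5 TB" is decimal terabytes
IBM_ELAPSED_S = 43 * 60          # 43 minutes

NOTEBOOK_FILES_PER_S = 4000
NOTEBOOK_AVG_FILE_BYTES = 133 * 1024   # assumption: "133KB" is 133 KiB

ibm_files_per_s = IBM_FILES / IBM_ELAPSED_S
ibm_avg_file_bytes = IBM_DATASET_BYTES / IBM_FILES
ibm_bytes_per_s = IBM_DATASET_BYTES / IBM_ELAPSED_S
notebook_bytes_per_s = NOTEBOOK_FILES_PER_S * NOTEBOOK_AVG_FILE_BYTES

print(f"IBM:      {ibm_files_per_s / 1e6:.2f} million files/s, "
      f"{ibm_avg_file_bytes:.0f} bytes/file, "
      f"{ibm_bytes_per_s / 1e9:.2f} GB/s")
print(f"Notebook: {NOTEBOOK_FILES_PER_S} files/s, "
      f"{notebook_bytes_per_s / 1e6:.0f} MB/s")
```

The bytes-per-second comparison only holds if the IBM scan reads every byte of every file, which the summary does not say.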
.
What does it mean by "scan"?
time sudo ls -lAR / | grep -E '^[ld\-]+' | wc -l
It should give you the number of files on your filesystem and the time it took to "scan" them all.
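A rough Python equivalent of that one-liner, as a sketch only: it walks a directory tree and counts entries from directory metadata without opening any file. Whether this matches what IBM means by "scan" is an assumption; the function names are made up for illustration.

```python
import os
import time


def count_entries(root):
    """Count files, directories and symlinks under root using metadata only."""
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    count += 1
                    # Descend into real directories, but never follow symlinks.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable or vanished directory: skip it and carry on.
            continue
    return count


def timed_scan(root):
    """Return (entry count, elapsed seconds) for a metadata walk of root."""
    start = time.perf_counter()
    total = count_entries(root)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    total, seconds = timed_scan(".")
    print(f"{total} entries scanned in {seconds:.2f} s")
```

Point it at "/" (with enough privileges) to get a number comparable to the ls pipeline.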
not that impressive
I'm assuming that the files are 4KB in size.
This is just 1 percent the capacity of the human brain. I challenge IBM to make a machine with 100 times the performance.
IBM throws a lot of hardware at a problem; problem gets solved.
I have a vague memory of Sun producing an NFS accelerator about 20 years ago. This worked by caching remote file data in non-volatile memory.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
How big is a file on average and what constitutes "scanning" that file?
let me correct that...
doing nothing for 43 minutes without crashing is a new achievement for Adobe Flash...
DECADES ago, circa 2000-2002 @ MS' Tech-Ed in fact, I increased DB performance by MANY ORDERS OF MAGNITUDE, & simply by using software based RamDisks/RamDrives for putting DB devices into RAM (before they began doing it "natively" in SQLServer), or if the DB's were too large, just their indexes &/or Temp/Scratch tables!
AND?
I was also doing "temp/scratch" table work the SAME WAY on smaller DB engines like Access &/or DBase III before it, circa 1991-1999 as well before that...
Why? Because, it works.
I.E./E.G.-> Lower 'seek/access' for starters (which is step #1 of the File Open/Read-Write/Flush/Close I/O cycle), & NO std. HDD read/write mechanical head-movements latencies being another.
Between the 2 of those alone, alongside B-Tree indexing?
You have a "HAUL A$$" DB engine...
This can also be applied to DB driven websites (or not), Terminal Servers, & far, Far, FAR MORE also! Creativity's your ONLY limitation really!
* Yes - The future IS doing Ramdisk/Ramdrives folks, & that "future IS now"...
(Albeit I was doing that decades ago, & only now are you seeing it as more "mainstream", & imo, only REALLY mainly due to co$ts of course - because there were SSD's that worked, Quantum had them iirc, rushmore drives iirc but they cost a fortune!)
I'll also admittedly state that "accomplishment" @ Tech-Ed for myself & EEC Systems/SuperSpeed.com isn't exactly "brain surgery" to figure out that using a faster media along with good algorithms on the datasets you have will yield a better, faster, & more efficient way of doing things!
HOWEVER? Hey - Nobody else did it before we did that I knew of & received a good deal of "notoriety/press/ink" for it @ least...
I later moved on to actual "TRUE SSD's" as I call them, not based on FLASH RAM:
---
1.) Gigabyte IRAM 4gb DDR2-RAM + PCI-Express x4 bus & SATA II 300MB/sec access circuit
or
2.) CENATEK "RocketDrive" 2gb PC-133 SDRAM + PCI 2.2 bus 133MB/sec. access circuit
---
They'll do the SAME for DB's, WebSites, Terminal Servers, & far, Far, FAR more also... but faster on writes typically than FLASH was initially @ least (that's changed, but these have better longevity).
For home use/performance-gains? I use them for:
---
A.) Pagefile.sys placement (1/2 of 4gb IRAM in own partition)
B.) WebBrowser cache, history, & actual browser program placements
C.) Print Spooler location
D.) %Comspec% location
E.) %TEMP% and %TMP% ops for OS + Apps
F.) Operating System & Application Event Loggings & logging in general
... and more!
---
SSD's?
Hey - They truly are, "The good stuff"... period!
( I've known & actually used them, & right after software-based Ramdisks/Ramdrives, for ages, & simply because THEY WORK for practical & better, noticeable, and effective performance gains (mostly)).
APK
P.S.=> I just like seeing & knowing that ideas myself & others used decades ago & we were often laughed at by the "wannabe's" in this art & science of computing are only NOW becoming "the performance wave of the future" in the mainstream... funny that, eh? Not...
... apk
Well, it was a good read. The researchers spent a lot of precious time building this, and it's a valid piece of work.
I see a lot of negativity on Slashdot, and the real tragedy is that the funny posts appear before the insightful ones.
People, what's the matter? Whenever Apple, Nokia, Google, MS, or IBM does something, you respond with cowardly jokes every time.
/
fuck face.
God. Really?
10 billion files at one byte each is a transfer rate of ten gigabytes per 43 minutes. Slow.
10 billion files at one gigabyte each is a transfer rate of ten exabytes in 43 minutes. Incredibly fast.
Do you see why it's important to include within the summary the average file size they used?
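To put numbers on that, here is a small sketch of how the implied transfer rate swings with average file size. It assumes, as this comment does, that a "scan" moves every byte of every file; the summary does not actually say so. Sizes are decimal (1 GB = 10^9 bytes).

```python
FILES = 10_000_000_000
ELAPSED_S = 43 * 60  # 43 minutes


def total_bytes(avg_file_bytes):
    """Total data volume if every file has the given average size."""
    return FILES * avg_file_bytes


def transfer_rate(avg_file_bytes):
    """Implied bytes per second for the whole 43-minute run."""
    return total_bytes(avg_file_bytes) / ELAPSED_S


for label, size in [("1 byte", 1), ("1 KB", 10**3), ("1 MB", 10**6), ("1 GB", 10**9)]:
    print(f"{label:>6} files: {total_bytes(size):.1e} bytes total, "
          f"{transfer_rate(size):.2e} bytes/s")
```

At one byte per file that is under 4 MB/s; at one gigabyte per file it is ten exabytes in total, which is why the file size matters.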
I was wondering what "10B files" means... OK, the article talks of 10 billion files... But is 1 billion 10^9 or 10^12? So if you have to use a symbol, use a sensible one... What about 10G files? :D
Isn't it below the required minimum for one to drink at a fraternity party?
IBM seems to lag behind standards.