It is worth noting that the row hammer issue isn't new. It has been known about for some time, including in this old Slashdot post:
http://hardware.slashdot.org/s...
There has been an implementation of row hammer testing in MemTest86 V6.0 for over 6 months now as well. MemTest86 implements just the single-sided hammer, whereas Google used a double-sided hammer.
http://www.memtest86.com/
While the double-sided hammer might produce more RAM errors, this pattern of memory accesses isn't very likely to occur in real life software. So it is of limited use as a RAM reliability test.
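To make the single-sided versus double-sided distinction concrete, here is a minimal sketch. It only models which DRAM rows get hammered for a chosen victim row. It is not MemTest86's or Google's code, and the function names and the assumption that physically adjacent rows have consecutive row numbers are made up for illustration.

```python
from typing import List


def single_sided_aggressors(victim_row: int) -> List[int]:
    """Hammer only one neighbour of the victim row.

    Real tests typically also touch a second, distant row so the row
    buffer has to be reopened on every access; that is omitted here.
    """
    return [victim_row + 1]


def double_sided_aggressors(victim_row: int) -> List[int]:
    """Hammer both neighbours, so the victim sits between two aggressors."""
    return [victim_row - 1, victim_row + 1]


def activations_next_to(victim_row: int, aggressors: List[int], rounds: int) -> int:
    """Count row activations that land directly beside the victim row."""
    adjacent = [row for row in aggressors if abs(row - victim_row) == 1]
    return len(adjacent) * rounds
```

For the same number of rounds the double-sided pattern puts twice as many activations right next to the victim, which is one way to see why it tends to produce more flips while also being a less natural access pattern for ordinary software.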
What is new in this report is the fact that they manipulated the RAM bit flips to turn them into an exploit, something that was previously speculated on but considered too hard to implement.
What they didn't show, however, are any bit flips on desktop machines. All the flips they reported were on laptops. In fact they state, "We also tested some desktop machines, but did not see any bit flips on those". So the problem isn't as grave as it might at first appear. They speculate that ECC RAM blocks the bit flips, and this has also been the experience with MemTest86: most (but not all) of the flips are single bit flips, which ECC would correct.
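As a rough illustration of why single bit flips are recoverable while multi-bit flips are not, here is a toy single-error-correct, double-error-detect (SECDED) code. It is a sketch only: it uses Hamming(7,4) plus one overall parity bit, whereas real ECC DIMMs typically protect 64 data bits with 8 check bits, and the function names are invented for this example.

```python
from typing import List, Optional, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as an 8-bit word.

    Index 0 holds the overall parity bit; indexes 1..7 hold a
    Hamming(7,4) codeword with check bits at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    word[1] = d1 ^ d2 ^ d4
    word[2] = d1 ^ d3 ^ d4
    word[4] = d2 ^ d3 ^ d4
    word[0] = sum(word[1:]) % 2
    return word


def decode(word: List[int]) -> Tuple[str, Optional[List[int]]]:
    """Return (status, data); data is None when the word cannot be repaired."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall = sum(word) % 2          # 0 for an intact word
    fixed = list(word)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips, assumed to be one: the syndrome names the bit.
        # A syndrome of 0 means the overall parity bit itself flipped.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return "uncorrectable", None
    return status, [fixed[3], fixed[5], fixed[6], fixed[7]]
```

Flipping any one bit of an encoded word still decodes to the original data, while flipping two bits is detected but cannot be repaired, which matches the "most (but not all)" caveat above.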
Disclaimer: I'm one of the MemTest86 developers.
+1 for The C Programming Language, by K&R
(Brian W. Kernighan, Dennis M. Ritchie)
Because if you have never programmed in C (or assembler) then you really don't know how a computer works.
After a bunch of anecdotal reports we did some measurements of radio interference caused by LED lighting (and the power supplies included in these globes).
Most were OK, but there are a bunch that spray out a large amount of broadband interference. Some spectrum graphs showing a few lights in their on and off states are here:
http://www.ledbenchmark.com/fa...
Interference was seen in the digital radio bands, FM radio, DAB bands, everywhere really. So the only thing surprising about this post is the lack of publicity the problem has been given to date.
Summary appears to be wrong.
"...were able to deduce that Voyager was traveling through a less dense medium — i.e. interstellar space."
Interstellar space is apparently 40 times more dense than space in the solar system. The solar wind pushes the particles back to the edge of the solar system, making the plasma more dense at the edge (not less dense).
To quote from NASA
http://www.jpl.nasa.gov/news/news.php?release=2013-277
"Voyager 1's plasma wave instrument detected the movement. The pitch of the oscillations helped scientists determine the density of the plasma. The particular oscillations meant the spacecraft was bathed in plasma more than 40 times denser than what they had encountered in the outer layer of the heliosphere. Density of this sort is to be expected in interstellar space."
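The link between pitch and density in that quote is the electron plasma frequency, roughly f_p ≈ 8980 × sqrt(n_e) Hz with n_e in electrons per cubic centimetre, so density goes up with the square of the frequency. A minimal sketch of that relationship follows; the function names are my own and the example frequency in the test values is illustrative, not a measurement taken from the NASA release.

```python
import math

# f_p [Hz] ~= 8980 * sqrt(n_e [electrons per cm^3])
PLASMA_FREQ_COEFF_HZ = 8980.0


def electron_density_cm3(freq_hz: float) -> float:
    """Electron density implied by a plasma oscillation frequency."""
    return (freq_hz / PLASMA_FREQ_COEFF_HZ) ** 2


def density_ratio(freq_a_hz: float, freq_b_hz: float) -> float:
    """How many times denser the plasma at frequency A is than at frequency B."""
    return (freq_a_hz / freq_b_hz) ** 2


def frequency_ratio_for(density_factor: float) -> float:
    """Frequency ratio needed to see a given density factor."""
    return math.sqrt(density_factor)
```

So a plasma 40 times denser only needs oscillations about 6.3 times higher in pitch, and the higher pitch means more dense, not less.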
Old news, they have had wireless devices in coconuts for years. Maybe they are expecting better antenna diversity from the rough end of the pineapple, I dunno.
See, http://goo.gl/VoirWo
Can your smartphone stand the rigors of launch and lunar environment?
Yes, in all probability.
The $150 Edge-of-Space Camera: MIT Students Beat NASA On Beer-Money Budget.
http://www.wired.com/gadgetlab/2009/09/the-150-space-camera-mit-students-beat-nasa-on-beer-money-budget/
I don't get it.
Why spend $375 million sending a camera to the moon only to return such poor quality images?
I looked at dozens of them; they all seem small, grainy, out of focus and black and white (of course, the moon being mostly grey might explain this last point).
Couldn't they afford a better camera? My smartphone would have done a better job.
We have done the actual benchmarks, and the original post matches our experience.
PHP gives processing times of around 1 second (for a search function) and C++ code via CGI gave times of 0.1 sec. A ten times improvement.
Graphs and numbers are here,
http://www.wrensoft.com/zoom/benchmarks.html
Further, when we switched to FastCGI we saw another 5 fold improvement, after optimising the code for FastCGI.
So I would believe a 50 fold improvement should be possible by going from PHP to FastCGI (and rewriting the code to suit FastCGI).
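The 50 fold figure is just the two speedups multiplied together. A small sketch of the arithmetic, using the rounded timings quoted above (the helper name is made up for this example, and the resulting per-search time is an estimate, not a measured number):

```python
def compound_speedup(*factors: float) -> float:
    """Independent speedup factors multiply rather than add."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


php_seconds = 1.0                      # approximate PHP search time from the post
cgi_seconds = 0.1                      # approximate C++ CGI search time from the post
cgi_speedup = php_seconds / cgi_seconds
fastcgi_speedup = 5.0                  # further gain reported after moving to FastCGI

total_speedup = compound_speedup(cgi_speedup, fastcgi_speedup)
estimated_fastcgi_seconds = php_seconds / total_speedup
```

On these numbers a search that took about a second in PHP would come in at roughly 0.02 seconds.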
We are the developers of the Zoom search engine.
http://www.wrensoft.com/zoom/
We have spent some time recently looking at the problem of indexing large amounts of data; see,
http://www.wrensoft.com/zoom/support/faq_large_sites.html
Many people above have recommended using external appliances, or external hardware. This doesn't make sense in our opinion. Using an external indexer that crawls your files means that 1) You are loading up your network, 2) You are limited to network bandwidth speeds (rather than SATA or SCSI data transfer speeds) 3) You have the overhead of the HTTP protocol.
What makes sense is to run the indexer on the server that is hosting the files and index them directly off the disk. Don't spider them, and don't do it across a network. This can save you many days of indexing time.
But with this much data, I don't think there is any really quick solution. Whatever you decide to do is going to take some setup effort.
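To show what "index them directly off the disk" means in practice, here is a minimal sketch that walks a directory tree and builds a word index with no HTTP or network hop involved. It is not how Zoom works internally; it only handles plain text, and a real indexer would need format-specific text extraction for Office files and the like. The function name is an assumption for this example.

```python
import os
import re
from collections import defaultdict
from typing import DefaultDict, Set


def index_directory(root: str) -> DefaultDict[str, Set[str]]:
    """Map each word to the set of file paths containing it, reading from local disk."""
    index: DefaultDict[str, Set[str]] = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the run
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add(path)
    return index
```

Run on the file server itself, something like this is limited by disk read speed rather than by network bandwidth or per-request HTTP overhead.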
As pointed out in this new benchmark, Chrome doesn't perform so well when using an independent benchmark. One of the most popular and commonly used test suites is SunSpider. It is worth noting that this is developed by the WebKit team (WebKit being the rendering engine in Safari and now Google Chrome). So the benchmark being used was created by the developers of the JS engine, and it is hardly surprising that they do well in their own benchmark.
Chrome roundly beats IE in terms of memory resource usage. All previous versions of IE had a fairly limited JavaScript engine, in terms of the allowed memory usage and limits on the size of statically declared arrays. There is another example here of how IE fails completely under high resource use, while Chrome and Firefox were able to handle much larger data sets. Also, IE is much slower in the benchmarks above, up to 4 times slower than Chrome.
People have made a lot of good suggestions.
My suggestion is the Zoom Search Engine.
But I am way biased, as I wrote half the code.
Some other things to consider.
1) Some of the solutions are Linux or Windows only. And some of the Linux solutions can't index Office documents. (Linux modules to extract text from all Office documents are not always available.)
2) Don't forget about the new Office 2007 document formats (the compressed XML formats). They are really different from the Office 2003 formats.
3) You stated that you wanted to index Access databases. In this case you will probably need to expose the content of the database via web pages, to allow the spider to spider them. For example,
http://www.yourwebsite.com/AccessDBRecord.php?id=1
http://www.yourwebsite.com/AccessDBRecord.php?name=Project1
http://www.yourwebsite.com/AccessDBRecord.php?name=Project2
etc.
4) You might need to manually edit the meta data on some documents. If the document is read only and can't be regenerated, then you might need a method, like Zoom's .desc files, to add meta data to read only Office files.
5) Get a native code solution. The search time benchmarks we did show that compiled C++ code will outperform PHP and other scripting languages by 10 times or more.
We wrote the same search engine code in 4 languages, PHP, ASP, C++ & JavaScript. The results are published here, http://www.wrensoft.com/zoom/benchmarks.html
In summary, C++ was 4 times faster than PHP, and in turn PHP was 3 times faster than ASP, and JavaScript was truly appalling. I can't think of many applications that wouldn't benefit from being 4 to 12 times faster.
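On point 3 above, one way to give a spider something to crawl is to generate one URL per database record and use that list as the crawl start points. A minimal sketch, assuming the records have already been read out of the Access database into plain dictionaries (how you read them is out of scope here); the base URL is the placeholder from the example above and the function name is invented:

```python
from typing import Dict, List
from urllib.parse import urlencode

# Placeholder address taken from the example URLs in the post.
BASE_URL = "http://www.yourwebsite.com/AccessDBRecord.php"


def record_urls(records: List[Dict[str, object]], key: str = "id") -> List[str]:
    """Build one crawlable URL per record, keyed on the chosen column."""
    return [f"{BASE_URL}?{urlencode({key: record[key]})}" for record in records]


# Hypothetical rows standing in for the real database contents.
rows = [{"id": 1, "name": "Project1"}, {"id": 2, "name": "Project2"}]
start_points = record_urls(rows, key="name")
```

Using urlencode keeps record names with spaces or punctuation safe to put in a query string.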
See,
http://www.wrensoft.com/forum/viewtopic.php?t=871
"Through a new technique known as Compression by Recursive Annulus Primes, huge volumes of data from the web can now be compressed into tiny index files. Using this revolutionary technology it will now be possible, for the first time, to carry around your own personal copy of the internet on a device such as high capacity USB thumb drive."