Slashdot Asks: What's Your View On Benchmark Apps?

Posted by msmash on Friday April 29, 2016 @07:48AM from the reliability-of-benchmarking-apps dept.

There's no doubt that benchmark apps help you evaluate different aspects of a product, but do they paint a complete picture? Should we rely entirely on benchmark apps to assess the performance and quality of a product or service? Vlad Savov of The Verge makes an interesting point. He notes that DxOMark (a hugely popular benchmark for testing cameras) rates the HTC 10's camera sensor as equal to that of Samsung's Galaxy S7; in real-life shooting, however, the Galaxy S7's shooter offers a far superior result. "I've used both extensively and I can tell you that's simply not the case -- the S7 is outstanding whereas the 10 is merely good." He offers another example: If a laptop or a phone does well in a web-browsing battery benchmark, that only gives an indication that it would probably fare decently when handling bigger workloads too. But not always. My good friend Anand Shimpi, formerly of AnandTech, once articulated this very well by pointing out how the MacBook Pro had better battery life than the MacBook Air -- which was hailed as the endurance champ -- when the use changed to consistently heavy workloads. The Pro was more efficient in that scenario, but most battery tests aren't sophisticated or dynamic enough to account for that nuance. It takes a person running multiple tests, analyzing the data, and adding context and understanding to achieve the highest degree of certainty. The problem is that, more often than not, gadget reviewers treat these values as the most important signal when judging a product, which in turn influences many readers' opinions. What's your take on this?

50 comments

You also have to consider cheaters by dargaud · 2016-04-29 07:54 · Score: 2

Case in point: ADSL line speed. I've had several different ADSL providers, and living somewhat far out, the speed is consistently bad, sometimes awful. But if I try one of the many 'ADSL speed test' websites, the results are always in line with the promised speed. I once routed one of those through a proxy, just for the name change, and the speed was one tenth; same if I accessed it simply by the IP number! Benchmarks are too easy to cheat. Wasn't it Intel who was caught doing that a few years ago?

--
Non-Linux Penguins ?
1. Re:You also have to consider cheaters by Actually,+I+do+RTFA · 2016-04-29 08:32 · Score: 1
  
  ... same if I accessed it simply by the IP number
  Wait, how would that work. I mean, all the name->IP translation happens locally, and only IP addresses are sent out... unless they do deep packet inspection. Which seems like a high cost.
  I suppose they could parse the HTTP request headers... or listen for the DNS queries?
  
  --
  Your ad here. Ask me how!
2. Re:You also have to consider cheaters by ArmoredDragon · 2016-04-29 08:45 · Score: 2
  
  Case in point: ADSL line speed. I've had several different ADSL providers, and living somewhat far out, the speed is consistently bad, sometimes awful. But if I try one of the many 'ADSL speed test' websites, the results are always in line with the promised speed.
  Most places you visit will not fully saturate your downstream link. They might have the bandwidth to be capable of doing so, but they ration it on a per-session (sometimes per-IP) basis so that everybody who happens to access the site can get a reasonable speed. (By the way, this is the principle that so-called "download accelerators" take advantage of -- they combine multiple sessions into one download. But they won't work against per-IP rationing unless you are able to do something like multipath TCP.)
  On a gigabit link that's just mine and only mine, few sites have saturated my downstream. Steam is probably the fastest one, which pumps out 33MByte/s but no more than that. The only way I can saturate my link without a bandwidth benchmarking site is with bittorrent on a torrent that has plenty of seeders, so you might try that.
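The rationing argument above can be put into numbers. This is only a rough sketch: the 50 Mbit/s per-session cap is an invented figure for illustration, and the function names are hypothetical. The 33 MByte/s Steam figure and the gigabit link are the ones quoted in the comment.

```python
def effective_rate_mbit(link_mbit, per_session_cap_mbit, sessions):
    """Throughput when a server caps each session separately.

    The per-session caps add up until the subscriber's own link
    becomes the bottleneck.
    """
    return min(link_mbit, per_session_cap_mbit * sessions)


def mbyte_to_mbit(mbyte_per_s):
    """Convert MByte/s to Mbit/s (8 bits per byte)."""
    return mbyte_per_s * 8


# Hypothetical cap of 50 Mbit/s per session on a 1000 Mbit/s link.
one_session = effective_rate_mbit(1000, 50, 1)
four_sessions = effective_rate_mbit(1000, 50, 4)

# The Steam figure from the comment, as a share of a gigabit link.
steam_mbit = mbyte_to_mbit(33)
steam_share = steam_mbit / 1000

print(one_session, four_sessions, steam_mbit, steam_share)
```

Under those assumptions a single session sees 50 Mbit/s and four parallel sessions see 200 Mbit/s, which is the effect a download accelerator relies on. With per-IP rationing the extra sessions would share one cap, so the model above would not apply.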
3. Re:You also have to consider cheaters by Anonymous Coward · 2016-04-29 14:17 · Score: 0
  
  The article has been reposted every few months for the last 30 goddamn years, for every kind of software and hardware that has existed over that time. The point is always, always the same... benchmarks are not the whole story, but people tend to treat them like they are. We get it, already.
4. Re:You also have to consider cheaters by unrtst · 2016-04-29 16:27 · Score: 1
  
  ... same if I accessed it simply by the IP number
  Wait, how would that work. I mean, all the name->IP translation happens locally, and only IP addresses are sent out...
  When you go to http://www.google.com/, your browser sends a header saying:
  Host: www.google.com
  When you go to http://206.111.13.26/, the header carries the IP address instead (Host: 206.111.13.26), so the server never sees the name.
  I suspect the speedtest site was something like HisProvidersName.speedtest.net, and maybe it faked it if it got a connection from an IP within that provider.
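As a rough illustration of the point about request headers, the sketch below builds the text of a minimal HTTP/1.1 GET request for each of the two URLs mentioned. HTTP/1.1 requires a Host field on every request, so for the IP-literal URL the field holds the address rather than a name. The helper name `build_get_request` is made up for this example, and nothing here shows what any real speed-test site does with the field.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",  # whatever was typed: a name or an IP
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines)


for url in ("http://www.google.com/", "http://206.111.13.26/"):
    # Line 0 is the request line, line 1 is the Host field.
    print(build_get_request(url).splitlines()[1])
```

A server (or anything in the path that reads plain HTTP) can branch on that field, which is one way traffic to a named speed-test host could be told apart from a request made by bare IP.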
The Benchmark Lifecycle by Prien715 · 2016-04-29 08:01 · Score: 3, Insightful

A good benchmark -- in cameras, CPUs, GPUs, cars, anything really -- is ideally a set of tests which contains a random sampling of real-world scenarios. In the beginning, the benchmark is good precisely because the vendors are unaware of it and don't spend a bunch of time trying to optimize for it specifically.
Once a benchmark becomes popular, companies try to make their product better for the benchmark ("See PHB! I increased our PCBench score by 10%!") but CAN ultimately end up doing so in a custom way that doesn't represent real-world performance (e.g. Volkswagen). Because the company is now specifically trying to optimize for a specific use-case, the benchmark is no longer random and thus no longer representative of real-world use.
Enter a new benchmark, which is really good, and better mirrors real-world performance and the cycle begins anew.

--
-- Political fascism requires a Fuhrer.
1. Re:The Benchmark Lifecycle by Actually,+I+do+RTFA · 2016-04-29 08:38 · Score: 1
  
  I have a friend working on.... a popular web browser. They test JS performance (of theirs and competitors') all the time against benchmarks. In theory, those benchmarks are derived from looking at the 1000 most popular sites (according to some site ranking algorithm). If that's true, then that seems to be a valid(ish) benchmark. I mean, those 1000 sites probably account for the vast majority of traffic, and other sites probably model themselves after those 1000 sites.
  
  --
  Your ad here. Ask me how!
2. Re:The Benchmark Lifecycle by Anonymous Coward · 2016-04-29 08:48 · Score: 0
  
  I view benchmarks like I view performance review numbers. You cannot show improvement if you cannot compare to past metrics. So you collect metrics even if they are poor choices. For example, you can measure a software engineer against SLOC. It is not a great measure of productivity (and many people can attest to why), but it is a measure that is readily available by looking at SCM. Having a bad measure is better than having no measure. Over the years, the SLOC measure may get tweaked in terms of how it is calculated to prevent software engineers from gaming the system too much. Perhaps credit will be added for code reviews and penalties for build breaking. Eventually you will have a number that reflects some level of work done by the software engineer, but not necessarily a linearly scalable number that can accurately reflect productivity. But it is still better than no number.
  You have to also account for how much time is required to get the metric. You can get a bad metric from your software engineers in moments, or you can take them away from software engineering tasks for one hour per day to get a more reliable measure, or you can take them away from software engineering tasks for half of every day for good metrics. Most managers would pick the easy options so that the engineers can spend more time solving problems than reporting metrics.
3. Re:The Benchmark Lifecycle by DigiShaman · 2016-04-29 09:36 · Score: 1
  
  Benchmarks have to be consistent from platform to platform for them to hold any scientific validity. However, would it be possible to inject random inert code into the binaries during the benchmark install process, basically making it polymorphic enough to throw off any vendor's attempt to detect and optimize against it?
  
  --
  Life is not for the lazy.
4. Re:The Benchmark Lifecycle by beelsebob · 2016-04-29 11:27 · Score: 1
  
  I view benchmarks like I view performance review numbers. You cannot show improvement if you cannot compare to past metrics. So you collect metrics even if they are poor choices. For example, you can measure a software engineer against SLOC. It is not a great measure of productivity (and many people can attest to why), but it is a measure that is readily available by looking at SCM. Having a bad measure is better than having no measure. Over the years, the SLOC measure may get tweaked in terms of how it is calculated to prevent software engineers from gaming the system too much. Perhaps credit will be added for code reviews and penalties for build breaking. Eventually you will have a number that reflects some level of work done by the software engineer, but not necessarily a linearly scalable number that can accurately reflect productivity. But it is still better than no number.
  Complete fallacy.
  No number is almost always better than an uninformative number. SLOC is a great example of this. You actively do not want engineers to be contributing lots of lines of code - that's how you end up with the Facebook app - 17000 classes doing... basically nothing much, and no one who understands how any of it works.
  You actively do not want to use that kind of measure, because it bears exactly 0 correlation to an engineer being productive and/or useful.
5. Re:The Benchmark Lifecycle by unrtst · 2016-04-29 16:39 · Score: 1
  
  Once a benchmark becomes popular, companies try to make their product better for the benchmark ("See PHB! I increased our PCBench score by 10%!") ...
  
  Slight tangent from this, when management of any kind starts running the benchmarks / tests / security scanners / etc, watch out! Suddenly, there's a huge red flag that must be fixed immediately, and it's just an internal only static site with a self signed cert.
dumb by Anonymous Coward · 2016-04-29 08:04 · Score: 0

I just want to know I get what I pay for.
1. Re:dumb by GrumpySteen · 2016-04-29 08:28 · Score: 1
  
  Then you'd best stick with $20 crack whores. Sure, you're paying for regret and an STD, but you can be pretty sure that you're getting what you paid for.
Only two benchmarks are important by fustakrakich · 2016-04-29 08:05 · Score: 1

Boot up time and Photoshop filters. Use a bittorrent client to measure internet speeds. "Speed test" web sites are bogged down by traffic.

--
“He’s not deformed, he’s just drunk!”
tl;dr by Anonymous Coward · 2016-04-29 08:05 · Score: 0

Just buy one of each at Best Buy and then return the ones you didn't like. Or return them all and buy the one you do like from Newegg.
They are exactly that... by Anonymous Coward · 2016-04-29 08:06 · Score: 0

Benchmark tools are exactly that... a benchmark of that specific machine at that specific time under those specific stresses. Not all that much different than the ACT, SAT or similar standardized tests. Other than computer tests tend to be a bit more accurately repeatable.
As with published machine specs, and user and industry reviews, they can be helpful to size up one machine over another. That is about it.
It's like getting in a religious argument and saying 'My Ghod can beat up your Ghod!'
Not any different than buying a car, washing machine, or cow.
Look for what you need it to do.
Look for what features you need, then extras you would like to have / can afford.
Determine which manufacturer you are willing to shell out your shekels for.
Roll the dice and take your chances along with the rest of us sheep.
FredInIT
Build to the benchmark by MetaKey · 2016-04-29 08:07 · Score: 2

An unpleasant side effect of benchmarking is when manufacturers start building products to do well on the benchmark to the detriment of other, also important, specs. So, while the product may kick ass on the common benchmarks, it may not be so great because other important stuff gets neglected. The benchmark process starts steering the design "committee".
1. Re:Build to the benchmark by Actually,+I+do+RTFA · 2016-04-29 08:19 · Score: 1
  
  You get what you measure. Unfortunately, my use cases and the majority's are not the same.
  
  --
  Your ad here. Ask me how!
2. Re:Build to the benchmark by somenickname · 2016-04-29 08:30 · Score: 2
  
  Companies have been known to take this even further. You can probably find plenty of compilers that have something like, "if(this_looks_like_benchmark_x) emit_special_code_for_benchmark_x". I know for a fact that the old Sun compiler could detect a matrix multiply and would emit hand tuned, parallelized assembly when it detected it.
  Vendors will always play games with benchmarks and customers will always read things into benchmarks that aren't true. That's not to say that benchmarks aren't useful but, if you are making decisions based on benchmarks, you really need to understand what is being benchmarked, who did the benchmarking and what (if anything) the benchmark results mean.
3. Re:Build to the benchmark by Anonymous Coward · 2016-04-29 12:20 · Score: 0
  
  I know for a fact that the old Sun compiler could detect a matrix multiply and would emit hand tuned, parallelized assembly when it detected it.
  That just sounds like a proper optimization...
4. Re:Build to the benchmark by somenickname · 2016-04-29 12:53 · Score: 1
  
  Maybe. But, it's very dishonest. A simple matrix multiply is a triple nested loop, and when the compiler detected that loop with a certain stride through memory, it dropped in the fast stuff. The exact same loop with a different stride through memory didn't trigger any special optimization and, as expected, the performance dropped by at least an order of magnitude. So, in the context of benchmarks, it's cheating: The benchmark does not represent the capabilities of the machine or compiler on any workload that doesn't look like the benchmark (not even on workloads that are almost identical to the benchmark). It represents the compiler writers' ability to detect the benchmark.
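To make the "same loop, different stride" point concrete, here is a small sketch of one matrix multiply written with two loop orders. It is not the Sun compiler's pattern matcher, and plain Python will not show the cache effects described. It only shows that two loop nests can compute identical results while walking memory differently, so a compiler that recognizes one shape could plausibly miss the other. The function names are invented for this example.

```python
def matmul_ijk(a, b):
    """Textbook order: the inner loop walks down a column of b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_ikj(a, b):
    """Reordered: the inner loop walks along a row of b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_ijk(a, b) == matmul_ikj(a, b))
```

A benchmark score earned by special-casing the first shape would say nothing about code written in the second shape, which is the complaint being made.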
5. Re:Build to the benchmark by Cederic · 2016-04-29 21:09 · Score: 1
  
  Umm. I _want_ my compiler writers to cheat. Ok, I may not get the full benefits if I don't know all the cheats, but I'll trigger enough of them to make the system faster.
  A compiler that knows how to make code execute faster? Sounds fucking ideal to me.
6. Re:Build to the benchmark by sjames · 2016-04-30 07:42 · Score: 1
  
  Actually, no. What you'll do is get the compiler and processor that are super fast for a tiny fraction of your code and slow as a log truck going up hill the rest of the time instead of the cpu/compiler that is twice as fast for 100% of your code. You will lose big on that deal.
Bench Marking Tools by Anonymous Coward · 2016-04-29 08:07 · Score: 0

Bench Marking Tools. I usually use a mechanical pencil myself.
Benchmark tools? by xxxJonBoyxxx · 2016-04-29 08:07 · Score: 1

>> What's Your View On Benchmark Tools?

So..who are the "tools" - the shysters creating the benchmarks or the rubes consuming them?
Benchmarks are useless in reviews by Anonymous Coward · 2016-04-29 08:07 · Score: 1

That's the conclusion I've mostly come to, at least for complete consumer products.
When I look at the latest Dell, Apple, etc desktop or laptop I already see the figures available from the maker, and often there's at least a few choices in terms of CPU, RAM, or SSD options. The only way performance from one item to another would be considerably different would be if one OEM made a major error.
On the other hand there are things that are hard to tell from the spec sheet that make a huge difference for me:
Is the keyboard any good?
What kind of fan does this system use?
How bad is the glare on the included screen?
Does the case feel like it will fall apart on the first tweak?
More so for laptops and tablets...
How long can the system maintain a safe temperature while running a stress test, or a lesser stress of an intense game?
1. Re:Benchmarks are useless in reviews by danbob999 · 2016-04-29 08:21 · Score: 1
  
  Does the case feel like it will fall apart on the first tweak?
  I've bought dozens of PCs, for myself and others. I have carried a laptop for years on bicycle, including on snow/ice and fell multiple times.
  I've never replaced a desktop, and not even a laptop, because of a broken case. Even the so-called "cheap plastic" laptops are more than durable enough for a lifespan of 3-10 years. And even in the unlikely case of a case break, the laptop will most likely continue to work just fine, and therefore the problem would be only cosmetic.
  Too old and too slow, or broken display/hard drive/RAM/fan/whatever are good reasons to replace/upgrade a PC. But a broken case, really?
2. Re:Benchmarks are useless in reviews by CAIMLAS · 2016-04-29 08:52 · Score: 1
  
  Not sure which laptops you've bought or how they've dropped, but apparently you've not worked on others' stuff much - people break shit in some really horrible ways. Cracks in the case around the display, particularly near the hinge, are notably problematic, as are around the keyboard. It doesn't take much of a crack for things to start not working properly.
  
  --
  ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
3. Re:Benchmarks are useless in reviews by danbob999 · 2016-05-01 02:54 · Score: 1
  
  Not sure which laptops you've bought
  Mostly cheap ones.
  
  or how they've dropped,
  That's my point, they haven't. Or they were in their protective bag when it happened.
Buy what you can afford by Anonymous Coward · 2016-04-29 08:08 · Score: 0

And leave the epeen contests to the hipsters. If you can't afford pimp, it's irrelevant anyway.
My view is: by QuietLagoon · 2016-04-29 08:14 · Score: 2

Benchmark tools do well when they are used for what they are designed to measure. Benchmark tools go off the rails when they are seen and interpreted as some kind of all-purpose suitability tester.
Reviews by 110010001000 · 2016-04-29 08:21 · Score: 2

"however, in real life shooting, the Galaxy S7's shooter offers a far superior result."
Says who? The reviewer's "objective" opinion? These are the same guys that say a $10,000 audio cable produces "warmer" sounds than a $5 one.
1. Re:Reviews by PopeRatzo · 2016-04-29 08:26 · Score: 1
  
  These are the same guys that say a $10,000 audio cable produces "warmer" sounds than a $5 one.
  Just don't say nothing bad about my $200 Shakti Electromagnetic Stabilizer Stone.
  http://www.musicdirect.com/p-7...
  
  --
  You are welcome on my lawn.
2. Re:Reviews by Anonymous Coward · 2016-04-29 08:40 · Score: 0
  
  Exactly. Let's not forget that all cameras heavily post-process the image, and this is a big part of making pictures look good. Straight from the sensor all photos look bland. If you can't get raw sensor data, you have to know what artifacts to look for to be able to judge whether you're just enjoying the post processing algorithms or getting great sensor output. I cringe when I see how some people configure their TVs. Yet they would certainly rate their configuration higher than a properly calibrated screen that is capable of showing the full range of inputs without blowing out colors and highlights or drowning out the shadows.
It's a benchmark, not God's Score by gurps_npc · 2016-04-29 08:22 · Score: 2

You are not looking at God's manual for existence, to check a score, like some kind of video game.
It's just the results from a test - helpful, but not perfect. Luck, design for the test, and many other factors may affect it.
If all you do is look at the benchmark, you deserve to be screwed over. Doing so is like looking at new lawyers' grades in law school and making the one with the highest score a partner right off the bat.

--
excitingthingstodo.blogspot.com
The limitations of testing by MobyDisk · 2016-04-29 08:23 · Score: 1

If you want to use a test result, you must first understand what the test is measuring. It isn't ever going to be as simple as "Laptop A got 536 and laptop B got 642, therefore laptop B is better at everything." This same thing applies to medical diagnostic tests, or academic tests, or product quality tests. Unfortunately, this is hard. Because statistics is hard. And science is hard.
Sorry. :-(
dxomark vs verge by Anonymous Coward · 2016-04-29 08:24 · Score: 0

As someone who has recently been through evaluation at DxOMark and with the tech media, I can say that the tech media (e.g., The Verge) really don't have much of an idea what they're talking about. Evaluating camera performance is such a subjective topic (and one that people become very passionate about -- see iPhone), that what one reviewer likes, another reviewer may hate.
The benchmarks at least provide a repeatable set of tests which are designed to measure specific capabilities.... not just "it doesn't take pictures of my cat as good as the iPhone". For example: sharpness. I would rather rely on a review that measured the sharpness to a quantifiable number, rather than see a photo on a review site that gives no context (were you walking? was the subject moving? did you use a tripod? etc.)
The more troubling aspect of a review lab like DxO is that the same company also sells imaging consulting services to device manufacturers.
Tools measure what they were designed to measure by Anonymous Coward · 2016-04-29 08:32 · Score: 0

If I compared a pencil to a thin laptop, there would be many overlapping capabilities and features. Some areas the pencil would excel (subjectively and measurably) and others where the laptop would "win". If a test was only designed to compare thickness, both items may be seen as equal. Color could also be identical, as well as temperature in standby mode. Benchmarks should include a number of tests, and some factors remain subjective, especially when it comes to user preference...but the final choice is up to the customer/consumer.
Mostly useless by Anonymous Coward · 2016-04-29 08:38 · Score: 0

Benchmark apps do nothing except paint a picture of performance of your current hardware in the current configuration. Unless you are willing to do the legwork and RTFM to figure out the optimal configuration, then you are just wasting time with the benchmark.
Most people that are technically savvy enough to build their own PC are also savvy enough to manually do the math and figure out the optimal configuration. Benchmark software is a tool for beginners.
Benchmark = a standard or point of reference by Proudrooster · 2016-04-29 08:43 · Score: 2

Benchmark: a standard or point of reference against which things may be compared or assessed.
Yes, benchmarks do a good job of comparing two pieces of hardware, especially tests which involve the entire system. I use benchmarks all the time for hardware comparison and system optimization/overclock comparison. Without benchmark tools we couldn't effectively compare changes to settings or in hardware speed, specifically raw CPU, raw GPU, raw RAM, and raw DISK I/O speeds.
Benchmark tools also help determine system stability by pushing the hardware to the limit and taking it to its thermal throttling speed.
So people ship custom hardware to vendors to cheat on benchmarks? Yes.
Will these cheats show up in the reviews on NewEgg, Amazon, and Tom's Hardware when they can't be replicated? Yes.
So please, benchmark away. Publish the results. Keep the data in a table for all to view. Benchmarks keep everyone honest in the end.
1. Re:Benchmark = a standard or point of reference by somenickname · 2016-04-29 11:34 · Score: 1
  
  Yes, benchmarks do a good job of comparing two pieces of hardware, especially tests which involve the entire system.
  No, they usually don't. Doing a "full system test" is almost certainly not going to give you useful information. How do you weigh individual results into a final result? How do you know the vendor hasn't included special cheat modes into the hardware/software to skew the benchmark? How do you know the benchmark is even testing what it claims to be testing?
  
  Without benchmark tools we couldn't effectively compare changes to settings or in hardware speed, specifically raw CPU, raw GPU, raw RAM, and raw DISK I/O speeds.
  Comparing "raw" anything is probably not useful either. Discovering that increasing the CPU speed by 10% increases a benchmark score by 10% is almost certainly meaningless unless the benchmark is the intended workload of the machine. And, that's the key: A benchmark only has meaning if it accurately represents the intended workload of the machine. Most benchmarks do not.
  
  Benchmark tools also help determine system stability by pushing the hardware to the limit and taking it to its thermal throttling speed.
  So does "while(true);". That doesn't make it a useful benchmark.
  
  Keep the data in a table for all to view. Benchmarks keep everyone honest in the end.
  Actually, when a benchmark becomes popular, it does the opposite of "keep everyone honest". Vendors start to design towards a benchmark and, in many cases, detect the benchmark and enter into a special mode of operation (or emit canned benchmark assembly) to cheat the benchmark. This is a very common thing to do and quickly turns benchmark results into, "Who can cheat the benchmark in the most clever way" instead of giving meaningful information about what you are trying to benchmark.
2. Re:Benchmark = a standard or point of reference by Proudrooster · 2016-04-29 13:57 · Score: 1
  
  So does "while(true);". That doesn't make it a useful benchmark.
  This actually just gets put in the L1/L2 cache of the CPU.
  In general, if I use a benchmark like Cinebench it correlates to real world performance in programs like Final Cut, Adobe Premiere, and After Effects for video rendering.
  In all my years of benchmarking and overclocking, I have not found anything suspicious. Years ago there was the whole Intel vs. AMD benchmark brouhaha where benchmarks favored Intel due to compiler optimization favoring Intel hardware, but the CPU wars are long over. AMD lost and now Intel is going out of business. Without competition things stagnate. Come on DDR4 memory!
This has been an issue since forever by CAIMLAS · 2016-04-29 08:47 · Score: 1

Systematic review is very important; however, in most cases, the system used to review is not complex enough to effectively qualify what's being reviewed.
It's like any system used to summarize data: fundamentally you're going to get a flawed diagnosis, because it's summarized. Unless you're dealing with a huge amount of data, and the analysis thereof, the answer is almost always "it depends".
And then there is the 'bias review' introduced in a lot of these benchmark tools. It's why open source benchmark methodology has arisen over the years for desktops, by and large.
It wasn't long ago when many of the popular benchmark systems biased against AMD, versus Intel, or one GPU over another. There was also no way you could do anything about it - the Intel C compiler at one point would "do shit compilation" when a non-Intel CPU was detected.

--
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Trust by wickedsteve · 2016-04-29 08:58 · Score: 3, Insightful

I can't trust benchmarks unless they are actually doing what my device is for. I have a gaming pc so I trust a benchmark tool that actually renders scenes like the games I play. The benchmark records things that apply to my enjoyment of games like frames per second under various settings. If a tool just gives me a grade on some arbitrary scale then it is no use to me.
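One way to act on this is to time the workload you actually care about rather than read a synthetic score. The sketch below is a minimal harness under stated assumptions: `time_workload` and `example_workload` are invented names, and the sum-of-squares loop is only a stand-in for a real task such as rendering a scene from a game.

```python
import statistics
import time


def time_workload(workload, repeats=5):
    """Run workload several times and summarize the wall-clock times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "samples": samples,
    }


def example_workload():
    # Stand-in for the real task; replace with the thing the device is for.
    return sum(i * i for i in range(50_000))


result = time_workload(example_workload)
print(f"median {result['median_s']:.4f} s over {len(result['samples'])} runs")
```

Reporting a median over several runs, alongside the raw samples, makes it easier to see whether a difference between two machines is larger than the run-to-run noise.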
DxOMark is indeed a perfect example by Ecuador · 2016-04-29 09:00 · Score: 1

DxOMark is indeed a perfect example of elaborate benchmarking and what can go wrong with it. To make a streamlined and objective test they only measure the few things that are the easiest to measure objectively over various cameras. In the end they seem to just combine these test scores and come up with a number that makes no sense if you look at real life performance, since not only do they not measure a multitude of things that also affect performance, but in addition, the way they combine the things they measure is not helping things. For example they measure color depth in bits and they actually say that "22 bits is excellent, differences under 1 bit are hard to detect" and yet, since it is only one of the few things that they can measure, if a camera has a bit depth of 22.9 and another 22.4, with other things being equal the former will get a much higher score, even though they do admit both are excellent and their difference is not perceptible! And in their scores from what I can gather, they don't seem to include resolution, since I see cameras with the same stats getting the same overall score despite one managing exactly the same noise/bit depth etc at twice the resolution!
So, what good is DxOMark? Like every benchmark it is perfect for what it measures. In fact from the posted SNR curves people have calculated things like the read noise and even the quantum efficiency of sensors with what seems to be good accuracy (if you compare the values for the few sensors which have a published or otherwise calculated QE), but the "overall score" is rubbish. The car analogy is of course being given cars to benchmark when you have excellent facilities to measure the braking distance, the acceleration and the engine noise with great accuracy. You make them into "scores" and add them to give the "overall score" of a car! And to make the analogy more accurate, you measure two basically inaudible electric engines and your sensitive equipment gives you a few dB difference in SPL, but it translates to a higher score for one car even though it is as inaudible to humans as the other, and because it is 1/3rd of the score, it becomes an "overall better" car.
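The complaint about summed scores can be shown with a toy calculation. This is a sketch only: DxOMark's actual formula is not given here, so the weights, the dynamic range values, and the 0.5 EV threshold are all invented. The 22.9 and 22.4 bit figures and the "under 1 bit is hard to detect" threshold are the ones cited above.

```python
def overall_score(measurements, weights):
    """Naive composite: a weighted sum of the individual measurements."""
    return sum(weights[name] * value for name, value in measurements.items())


def perceptible_differences(cam_a, cam_b, thresholds):
    """List the metrics whose difference reaches the stated threshold."""
    return [
        name
        for name in cam_a
        if abs(cam_a[name] - cam_b[name]) >= thresholds[name]
    ]


# Hypothetical weights and thresholds, chosen only for illustration.
weights = {"color_depth_bits": 4.0, "dynamic_range_ev": 2.0}
thresholds = {"color_depth_bits": 1.0, "dynamic_range_ev": 0.5}

camera_a = {"color_depth_bits": 22.9, "dynamic_range_ev": 12.0}
camera_b = {"color_depth_bits": 22.4, "dynamic_range_ev": 12.0}

print(overall_score(camera_a, weights), overall_score(camera_b, weights))
print(perceptible_differences(camera_a, camera_b, thresholds))
```

With these made-up weights the first camera scores about two points higher, yet the list of perceptible differences is empty. The composite number ranks the cameras on a gap that the benchmark's own guidance says a viewer would struggle to see.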
The CPU performance benchmarks are also the same thing. They will tell you how well a CPU runs the benchmark. Hence, when my CS lab for example wanted to get a new processing cluster I made benchmarks out of the projects that would be running on the cluster and Dell, HP and Sun gave us access to sample units to run them. If you don't have access of course things get a lot harder...

--
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
Buggy Piece of Shit by Anonymous Coward · 2016-04-29 09:29 · Score: 0

It can be the case that you have a horrible product that outperforms another. For instance it might be a buggy piece of shit. Or in other cases there may be a lesser performing product that is more powerful in other ways, such as capabilities being added as the result of source code being released under a free software license. I wouldn't even begin to consider hardware that was dependent on proprietary components. No performance test is going to rate that and I'm going to be a bit sceptical of someone recommending a Windows laptop that performs well in Windows when I run GNU/Linux. Plus different drivers on each OS can impact such things. It might be the Windows drivers are better today, but what about tomorrow? I want to know that my experience is smooth. That upgrades don't result in loss of support for hardware and other such issues.
Necessary but not sufficient by tlhIngan · 2016-04-29 09:54 · Score: 1

Benchmarks are a necessary, but not sufficient, way to test things.
The reason for benchmarks is simple - you want a scientifically repeatable test that can be used to compare things with each other. This limits the benchmark's utility as a real-world test because it's inherently limited in what it can test. All it gives is how your thing measures up to all the other things out there. And yes, benchmarks will be gamed, doesn't matter the field (see VW, Mitsubishi and everyone else with diesel engines). However, that doesn't mean their utility is null - it's a comparison tool. Just like how your fuel consumption figures are based on somewhat unrealistic test scenarios, they're like that because they have to be repeatable and comparable.
But on devices which are complex, a benchmark will never cover all the use cases and will never cover everything.
(E.g., an audio amplifier is really simple and a benchmark can cover everything because its job is to increase signal strength, so all you need as a benchmark is how far the output waveform deviates from the input waveform. But a preamp that, say, corrects for room deficiencies cannot be tested by benchmarks alone because it's too complex).
So for complex measurements (or things not fully quantifiable, e.g., "image looks better" or "clearer" or "faster"), you need more tests.
Although, for the imaging system a benchmark should be good enough - as all it needs to do is take a photo of a calibration chart and measure the final photo output for errors. Other aspects like lag can be measured as well. In which case if they produce the same results, they should be just as good. (This will be analyzing the image itself internally or via the same screen). If the S7 images are "more vibrant" then perhaps it's the screen itself since OLEDs are known to oversaturate and produce nice looking, but completely color inaccurate photos.
Norton SI by CanEHdian · 2016-04-29 13:43 · Score: 1

Oh the old days where you could rate a system relative to IBM/XT... and even then people had the same discussion.

--
When the copyright term is "forever minus a day", live every day like it's the last.
Which benchmark? by drinkypoo · 2016-04-29 16:04 · Score: 1

3dmark? pretty pictures
iobench? now that's useful

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
There is a silver lining by wkwilley2 · 2016-05-01 23:51 · Score: 1

Benchmarks are great tools since they are repeatable and give you a picture of what your hardware, phone, etc is capable of.
However I've learned never to rely on the benchmarks alone as they normally don't mimic real world usage scenarios.
Tl;dr, great for reference and stress test, bad for real world usage.

--
Have you ever fallen asleep at the keybhanusdiog?