Baidu Forced To Withdraw Last Month's ImageNet Test Results
elwinc writes: Back in mid-May, Baidu, a computer research and services organization in Mainland China, announced impressive results on the ImageNet "Large Scale Visual Recognition Challenge," besting results posted by Google and Microsoft.
Turns out, Baidu gamed the system, creating 30 accounts and running far more than the 2 tests per week allowed in the contest.
Having been caught cheating, Baidu has been banned for a year from the challenge. I believe all competitors are using variations on the convolutional neural network, AKA deep network. Running the test dozens of times per week might allow a competitor to pre-tune parameters for the particular problem, thus producing results that might not generalize to other problems. All of which makes it quite ironic that a Baidu scientist crowed "Our company is now leading the race in computer intelligence!"
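The concern about running the test dozens of times can be illustrated with a small simulation. This is only a sketch under stated assumptions: every "submission" is modelled as an equally skilled classifier whose per-image correctness is an independent coin flip, and the function name, test size, and accuracy value are all hypothetical, not anything taken from the contest or from Baidu's system.

```python
import random

def best_of_n_test_accuracy(n_submissions, test_size=1000, true_accuracy=0.5, seed=0):
    """Best test-set accuracy observed over n_submissions equally skilled models.

    Assumption: each model gets each test image right independently with
    probability true_accuracy, so any spread in scores is pure noise.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_submissions):
        # Score one submission against the fixed-size test set.
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        best = max(best, correct / test_size)
    return best

if __name__ == "__main__":
    for n in (2, 30, 200):
        print(n, "submissions -> best observed accuracy", best_of_n_test_accuracy(n))
```

None of the simulated models is actually better than the others, yet the best reported score can only stay the same or rise as more submissions are allowed, which is why the reported number stops being a fair estimate of how well a method generalizes.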
That's what I always say.... (/sarcasm)
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
Chinese company caught cheating? NO WAY!
Seriously though, raise your hand if you're surprised.
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
This reminds me of some Chinese ball screws (rotary to linear motion components) my company once ordered from Alibaba. The companies had pictures and drawings and put together quotes for parts, but then delivered samples that were just totally useless. Some of these 'precision' parts looked like they had been made with a file. It just didn't make any sense that they would waste their time and ours on such clearly incompetent products.
But when you go there you realise the problem. It is basically an economy in a state of hyper competition. There is so much competition that people will just try anything to get ahead, completely oblivious to the wider problem or goal they are trying to solve. You can see that in how the government had to rationalise the solar industry because nobody could make any money. They are just really really crazy competitive.
The trouble though is that there are now many good Chinese engineers who know what they are doing but are still hyper competitive. I really don't know how we westerners with our 40hr work weeks, healthcare and pensions are going to eventually compete with that until we too are faced with the desperation of trying to escape from abject poverty along with 1 billion other people.
People who grow up under oppressive governments have far fewer qualms about cheating, because cheating the government is fair game. It rubs off, and the attitude is quickly extended to non-governmental institutions, large ones and even smaller ones.
This is not "racism"; ex-Soviets like myself often have the same problem... A cheating Western student fears (or used to fear) the shame of being exposed. A Chinese or Soviet student fears merely getting caught. It is like a speeding ticket: there is no shame in driving fast, only in being stopped by "the bear".
China today uses drones to catch cheaters; America has not felt the need for such measures. Perhaps that was a foolish attitude, because we immigrants bring all our traits to the "wonderful tapestry of diversity", not just the good ones...
Anybody dealing with Chinese companies (or Russian ones, if you can find any), ought to be careful and not depend merely on trust.
In Soviet Washington the swamp drains you.
They'll just go in and steal the research from another competitor and call it their own. Cheating and espionage are familiar bedfellows.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Such cheats have been used for years in the field. Before ImageNet, there was the Pascal VOC challenge with about the same rules, and I'm pretty sure all the winners were optimizing the hyperparameters of their submissions on the test dataset.
Seriously, as long as computer vision benchmarks are based on a single train/test split, there will be such abuses. If there were several splits with meaningful statistics computed on them, I would be less worried by the overfitting you get by optimizing the hyperparameters.
But hey, you're never gonna make it to CVPR without tuning your method so as to fool reviewers that it performs much better than the state of the art. 0.1% for a good idea, 99.9% for engineering tricks.
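The idea of scoring on several splits rather than one could look roughly like the sketch below. It is not how ImageNet or Pascal VOC are run; `repeated_split_scores` and the toy `majority_class_score` baseline are hypothetical names made up for illustration, and the 20% test fraction is an assumption.

```python
import random
import statistics

def repeated_split_scores(examples, train_and_score, n_splits=5, test_fraction=0.2, seed=0):
    """Score a method on several random train/test splits.

    examples: list of (features, label) pairs.
    train_and_score: callable(train, test) -> accuracy in [0, 1].
    Returns (mean, sample standard deviation); n_splits must be at least 2.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(train_and_score(train, test))
    return statistics.mean(scores), statistics.stdev(scores)

def majority_class_score(train, test):
    """Toy baseline: always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(label == majority for _, label in test) / len(test)

if __name__ == "__main__":
    data = [(i, 0) for i in range(80)] + [(i, 1) for i in range(20)]
    mean, spread = repeated_split_scores(data, majority_class_score)
    print(f"accuracy {mean:.3f} +/- {spread:.3f} over 5 splits")
```

Reporting the spread alongside the mean makes it easier to see whether a small claimed improvement is larger than the split-to-split noise.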
Video of some good progressive thrash music
Maybe that's appropriate punishment for children, but these are professional scientists. The only reason nobody has the brass to ban them for life is because their country owns us.
Baidu isn't just "a computer research and services organization", they're the Chinese version of Google. They're a massive company with eight billion USD in revenue last year. The headline is either misleading or completely clueless.
Just because I can hook a shark from a boat, I do not offer to wrestle it in the water.
Surely you jest!
I am very small, utmostly microscopic.
Message from the team in question:
Dear ILSVRC community,
Recently the ILSVRC organizers contacted the Heterogeneous Computing team to inform us that we exceeded the allowable number of weekly submissions to the ImageNet servers (~ 200 submissions during the lifespan of our project).
We apologize for this mistake and are continuing to review the results. We have added a note to our research paper, Deep Image: Scaling up Image Recognition, and will continue to provide relevant updates as we learn more.
We are staunch supporters of fairness and transparency in the ImageNet Challenge and are committed to the integrity of the scientific process.
Ren Wu – Baidu Heterogeneous Computing Team
So, while they deserve the year ban, the apology is nice. It's a shame we can never know what results a fair competition could have yielded ... and an even bigger shame that the media misreported Baidu as overpowering Google. I suppose the damage is done and the ILSVRC has made the right choice.
...
Perhaps I'm misunderstanding the classification problem, but why isn't this run like most other classification problems (like Netflix and many other data challenges), where you get ~80% of the data for training and the remaining 20% is held back for the final testing and scoring? Is the tagged data set too small to do this? It seems like Wikimedia would contain a wealth of ripe public domain images for this purpose.
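For what a held-back scoring set with a submission cap might look like on the organizer's side, here is a minimal sketch. It is not the ILSVRC server: the `HoldoutScorer` class is hypothetical, the cap of 2 simply echoes the "2 tests per week" figure in the summary, and the weekly reset is left out for brevity.

```python
class HoldoutScorer:
    """Keeps held-back labels private and limits scored submissions per team."""

    def __init__(self, hidden_labels, max_submissions=2):
        self._hidden_labels = list(hidden_labels)
        self._max_submissions = max_submissions
        self._used = {}  # team name -> number of scored submissions

    def score(self, team, predictions):
        used = self._used.get(team, 0)
        if used >= self._max_submissions:
            raise PermissionError(f"{team} has used all {self._max_submissions} submissions")
        if len(predictions) != len(self._hidden_labels):
            raise ValueError("predictions must cover every held-out example")
        self._used[team] = used + 1
        correct = sum(p == y for p, y in zip(predictions, self._hidden_labels))
        return correct / len(self._hidden_labels)

if __name__ == "__main__":
    scorer = HoldoutScorer(hidden_labels=[1, 0, 1, 1])
    print(scorer.score("team_a", [1, 0, 0, 1]))
```

A per-account counter like this does nothing against one team registering many accounts, which is what the summary says happened here, so a cap only helps if accounts can be tied back to teams.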
My work here is dung.