Can Machine Learning Replace Focus Groups?
itwbennett writes "In a blog post, Steve Hanov explains how 20 lines of code can outperform A/B testing. Using an example from one of his own sites, Hanov reports a green button outperformed orange and white buttons. Why don't people use this method? Because most don't understand or trust machine learning algorithms, mainstream tools don't support it, and maybe because bad design will sometimes win."
I have read the synopsis 4 (four) times and I didn't get shit.
Of course, TFA sheds some light on the whole thing, but really... work on your short version, guys, because what's in here makes no sense.
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
about nothing
Wake me up when they produce banner-ad algorithms that beat adblock, noscript, etc.
The only possible benefit I can see from this is *maybe* adjusting a site's color scheme or layout to be more intuitive and easy to navigate, i.e. making the "add to cart" button easier to find without being obnoxious about it. But then again, if I decide to add something to my cart, I'm confident I'll find the button even if it's 1.2% less optimized. And visual optimization can be done by any 1st year graphic design student.
This signature is false.
So that you don't have to click through the slashvertisement, I have read TFA for you.
Here is a summary: Let's say you have several different designs for a web interface that you want to test to find out which one works the best.
One method is to have a "testing period" in which you show each person one of the designs at random and record how well it works for that person. Then, once you've shown 1,000 people each of the designs, you figure out which one is the best on average. Now the "testing period" is over, and the best design is shown to everyone from that point forward. That is the "old" method.
The "new" method is to dispense with the testing period. Instead, you show the first person one design at random. If it works (e.g. they click on the ad), it gets bonus points. If it doesn't work, it gets a penalty. At any time, you show the design with the most points; if it is bad, it will lose points over time and eventually stop being shown.
The goal of the "new" method is to hopefully avoid showing bad designs to 2000 people just to figure out which one is the best.
If you care about the details then you should probably read the article. This summary is just an approximation for those who can't be bothered or who object to slashvertisements on principle.
Just get the last answer and repeat it over and over.
Such a machine will be just as good as any focus group.
NO !!!
AccountKiller
Is this a Turing test?
This is not "machine learning" substituting for human A/B testing. It's just changing the ratio of the number of visitors exposed to the "new" feature to be tested from 50% to 10%, while keeping the rest (90%) of the visitors using the "best so far" feature. There's also a bit of randomness thrown in when choosing which new feature the 10% of visitors get to test.
In this scheme, the human visitors are still doing the A/B testing, it's just that determination of which human is testing which feature dynamically adapts over time.
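For anyone who wants to see the 10%/90% split spelled out, here is a minimal sketch of that kind of scheme (usually called epsilon-greedy). It is not the code from TFA: the class name, the option names, the click rates in the simulation, and the choice to treat never-shown options as having a 0% rate are all assumptions made for illustration.

```python
import random


class EpsilonGreedy:
    """Hypothetical helper that decides which design each visitor sees."""

    def __init__(self, options, epsilon=0.1, rng=None):
        self.epsilon = epsilon              # share of visitors used for exploring
        self.rng = rng or random.Random()
        self.views = {o: 0 for o in options}
        self.clicks = {o: 0 for o in options}

    def rate(self, option):
        # Observed click-through rate; assumed 0.0 until the option has been shown.
        views = self.views[option]
        return self.clicks[option] / views if views else 0.0

    def choose(self):
        options = list(self.views)
        if self.rng.random() < self.epsilon:
            # Explore: pick any option at random (this may be the current best).
            return self.rng.choice(options)
        # Exploit: show the "best so far" option.
        return max(options, key=self.rate)

    def record(self, option, clicked):
        # The human visitor is still the one doing the testing.
        self.views[option] += 1
        if clicked:
            self.clicks[option] += 1


def simulate(true_rates, visitors=5000, seed=1):
    """Simulated visitors with made-up click rates, for illustration only."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(true_rates, rng=rng)
    for _ in range(visitors):
        shown = bandit.choose()
        bandit.record(shown, rng.random() < true_rates[shown])
    return bandit
```

Calling `simulate({"orange": 0.1, "green": 0.5, "white": 0.2})` and looking at `.views` shows how the traffic ends up distributed. The rates there are invented, so it says nothing about how the method does on a real site.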
Now, if this guy had substituted human A/B testing completely with a machine learning technology that could somehow determine which feature is better without any input from humans, then I'd be impressed. That's kind of what the summary and article imply. But that's not what he's done. He's just being a bit more sophisticated regarding which humans get to test which feature.
He's also made a big fat claim regarding the effectiveness of his method with zero evidence to back it up. Theoretical results regarding multi-armed bandit problems are a far cry from real-world results regarding website feature selection. I'm looking forward to seeing some results of the proposed method on the latter.
Throwing up banner ads with different color schemes and automatically re-weighting them based on click-through % is something I was doing well over ten years ago. This can't really be news, can it?
I live ze unknown. I love ze unknown. I am ze unknown.
A/B focus testing is about observing how customers or users choose between two alternatives based on their qualitative sense of aesthetics. ML is about classifying data based on quantifying the data into defined classes or toward optimal values.
Predicting the outcome of a focus group is a completely different problem from multi-armed slot machines. In focus groups there is no objective metric, so focus group problems are not amenable to machine learning unless your machine can define, measure, and perhaps predict aesthetic criteria.
Now THAT I'd like to see.
It's a 'good-enough' approximation to an optimal selection process.
The probability of someone clicking on option A, B or C is unknown, but is expected to be constant when averaged over the population. Given the ratio of clicks versus views on any given option, the posterior distribution of that probability can be modelled as a Beta distribution. The experimental question is then: given the current estimates, which option should be presented to maximise the utility of the test?
For simply ranking the options, the utility may be the Shannon information. In this case though, the utility also has to incorporate the expected benefit of a click-through. One could set up a utility function which is weighted between the two outcomes, possibly varying over time.
In practice, though, Beta distributions with different means tend to converge to separate peaks quite quickly, so taking a possible 10% hit on the current best-estimate click-through outcome seems an entirely plausible approximation. Bayesian experimental design could also tell you when to stop testing and stick with the winner.
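A rough sketch of the Beta-posterior idea, for the curious. It assumes a uniform Beta(1, 1) prior and estimates by Monte Carlo sampling how likely each option is to have the highest click probability. That estimate is only one possible ingredient for a stopping rule, not a full Bayesian experimental design, and the function names and example counts are invented for illustration.

```python
import random


def beta_posterior(clicks, views, prior=(1.0, 1.0)):
    """Beta parameters after seeing `clicks` out of `views` (uniform prior assumed)."""
    a0, b0 = prior
    return a0 + clicks, b0 + (views - clicks)


def prob_best(stats, draws=20000, seed=0):
    """Estimate P(option has the highest click probability) for each option.

    `stats` maps an option name to a (clicks, views) pair.
    """
    rng = random.Random(seed)
    params = {name: beta_posterior(c, v) for name, (c, v) in stats.items()}
    wins = {name: 0 for name in stats}
    for _ in range(draws):
        # Draw one plausible click probability per option from its posterior.
        sample = {n: rng.betavariate(a, b) for n, (a, b) in params.items()}
        wins[max(sample, key=sample.get)] += 1
    return {n: w / draws for n, w in wins.items()}


# Invented counts, purely for illustration.
example = {"orange": (20, 1000), "green": (35, 1000), "white": (25, 1000)}
chances = prob_best(example)
```

One could stop testing once some option's estimated chance of being best passes a chosen threshold (say 95%, an arbitrary figure here) and stick with that winner.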
To be valid, the last step (of which the author makes no mention) should be to compare the three groups to see if their differences are statistically significant. With tens of thousands of clicks, it's likely that they are, but the percentages were awfully close in the 2-3% range.
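Something like the following would do as a first check. It is a sketch of a standard chi-square test of homogeneity on clicks versus non-clicks for exactly three options. With three groups there are 2 degrees of freedom, and for that case the p-value has the closed form exp(-chi2 / 2), so nothing beyond the standard library is needed. The function name and the counts in the usage note are made up; TFA's real numbers are not reproduced here.

```python
import math


def chi_square_three_groups(counts):
    """Chi-square test that three options share one click-through rate.

    `counts` maps an option name to a (clicks, views) pair. Assumes the
    pooled click rate is strictly between 0 and 1.
    """
    if len(counts) != 3:
        raise ValueError("the closed-form p-value below only holds for three groups")
    total_clicks = sum(c for c, _ in counts.values())
    total_views = sum(v for _, v in counts.values())
    pooled = total_clicks / total_views      # click rate if all options were equal
    chi2 = 0.0
    for clicks, views in counts.values():
        expected_clicks = views * pooled
        expected_misses = views * (1 - pooled)
        chi2 += (clicks - expected_clicks) ** 2 / expected_clicks
        chi2 += ((views - clicks) - expected_misses) ** 2 / expected_misses
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value
```

For example, `chi_square_three_groups({"orange": (200, 10000), "green": (300, 10000), "white": (250, 10000)})` returns the statistic and the p-value for those invented counts. A small p-value suggests the click rates really differ, although it does not say which pair differs.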
I do it even better with my Accelerated Market Research, which is based on Bayesian reasoning.
http://oyhus.no/AcceleratedMarketResearch.html
The multi-armed bandit problem is a problem in which you simultaneously try to optimize your overall reward and still explore. As a consumer, I face that problem (switch brands or stick with the tried-and-true). However, for focus groups, maximizing rewards for participants doesn't matter; it's all about finding the best solution for the organizer of the focus group. The participants already get the products for free. That means that it is not a multi-armed bandit problem, and algorithms for solving such problems are the wrong algorithms to use for focus groups.
There are mathematically more efficient ways of doing this kind of testing. But there are other constraints when testing with human beings as well, such as dependencies on the order in which you test. A/B testing is probably a pretty good compromise.
He isn't trying to use ML to predict the outcome of a focus group.
ML is about many things. One thing it is about is how a learner should explore an environment in order to maximize what he learns. It is one of those techniques that Hanov refers to, and it's a good idea in principle. But he picked the wrong algorithm for focus groups.
The algorithm he points to is the right one for online testing of different web page designs, where you stick with your current design 99% of the time but show visitors different designs 1% of the time and see whether those work better or worse.
and suddenly the button with the racial epithet on it becomes the most popular one and you lose all your real customers.
Idiocy rewarded!
In the UK, most places will serve you a medium if you ask for medium rare, simply because most folk who ask for medium rare will send it back to the kitchen because it's "not cooked properly". We're not good with our steaks here.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'