Can Machine Learning Replace Focus Groups?
itwbennett writes "In a blog post, Steve Hanov explains how 20 lines of code can outperform A/B testing. Using an example from one of his own sites, Hanov reports a green button outperformed orange and white buttons. Why don't people use this method? Because most don't understand or trust machine learning algorithms, mainstream tools don't support it, and maybe because bad design will sometimes win."
I have read the synopsis 4 (four) times and I didn't get shit.
Of course, TFA sheds some light on the whole thing, but really... work on your short version, guys, because what's in here makes no sense.
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
So that you don't have to click through the slashvertisement, I have read TFA for you.
Here is a summary: Let's say you have several different designs for a web interface that you want to test to find out which one works the best.
One method is to have a "testing period" in which you show each person one of the designs at random and record how well it works for that person. Then, once you've shown 1,000 people each of the designs, you figure out which one is the best on average. Now the "testing period" is over, and the best design is shown to everyone from that point forward. That is the "old" method.
The "new" method is to dispense with the testing period. Instead, you show the first person one design at random. If it works (e.g. they click on the ad), it gets bonus points. If it doesn't work, it gets a penalty. At any time, you show the design with the most points; if it is bad, it will lose points over time and eventually stop being shown.
The goal of the "new" method is to hopefully avoid showing bad designs to 2000 people just to figure out which one is the best.
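For anyone who wants to see the "new" method as code, here is a rough sketch of the points scheme as I summarized it above. This is my own illustration, not the code from TFA: the function names, the bonus and penalty sizes, and the random tie-break are all assumptions.

```python
import random

def pick_design(points):
    """Return the design with the most points, breaking ties at random."""
    best = max(points.values())
    return random.choice([d for d, p in points.items() if p == best])

def record_result(points, design, clicked, bonus=1.0, penalty=0.1):
    """Add `bonus` for a click; subtract `penalty` for a view with no click.

    The sizes of bonus and penalty are assumed values, not taken from TFA.
    """
    points[design] += bonus if clicked else -penalty

# Every design starts level, so the first pick is effectively random.
points = {"green": 0.0, "orange": 0.0, "white": 0.0}
shown = pick_design(points)
record_result(points, shown, clicked=False)  # no click: that design drops behind
```

Note that a pure "most points wins" rule like this never revisits a design that had an unlucky start, which is presumably why the real thing mixes in some randomness.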
If you care about the details then you should probably read the article. This summary is just an approximation for those who can't be bothered or who object to slashvertisements on principle.
Just get the last answer and repeat it over and over.
Such a machine will be just as good as any focus group.
NO !!!
AccountKiller
This is not "machine learning" substituting for human A/B testing. It's just changing the ratio of the number of visitors exposed to the "new" feature to be tested from 50% to 10%, while keeping the rest (90%) of the visitors using the "best so far" feature. There's also a bit of randomness thrown in when choosing which new feature the 10% of visitors get to test.
In this scheme, the human visitors are still doing the A/B testing, it's just that determination of which human is testing which feature dynamically adapts over time.
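To make the 10%/90% split concrete, here is a minimal sketch of that kind of allocation (usually called epsilon-greedy). It is an illustration under my own assumptions, not the article's code: the `stats` layout, the function names, and scoring untried options as 0 are all mine.

```python
import random

def choose(stats, epsilon=0.1, rng=random):
    """Pick an option. `stats` maps option -> [views, clicks].

    With probability epsilon (10% here) any option is shown at random;
    otherwise the option with the best click rate so far is shown.
    """
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    # Untried options are scored 0.0 here; that choice is an assumption.
    return max(stats, key=lambda o: stats[o][1] / stats[o][0] if stats[o][0] else 0.0)

def update(stats, option, clicked):
    """Record one view of `option` and whether the visitor clicked."""
    stats[option][0] += 1
    stats[option][1] += int(clicked)
```

The humans are still the ones producing every click that goes into `update`; the code only decides who sees what.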
Now, if this guy had substituted human A/B testing completely with a machine learning technology that could somehow determine which feature is better without any input from humans, then I'd be impressed. That's kind of what the summary and article imply. But that's not what he's done. He's just being a bit more sophisticated regarding which humans get to test which feature.
He's also made a big fat claim regarding the effectiveness of his method with zero evidence to back it up. Theoretical results regarding multi-armed bandit problems are a far cry from real-world results regarding website feature selection. I'm looking forward to seeing some results of the proposed method on the latter.
The only possible benefit I can see from this is *maybe* adjusting a site's color-scheme or layout to be more intuitive and easy to navigate.
Well, for those of us who do use testing and usability reporting on a daily basis, or have jobs that *require* us to know what is easiest for people to navigate (read: any and all web designers), this is pretty nice, and I intend to use the concept heavily.
Bits of code, random ramblings: jakimfett.com
Throwing up banner ads with different color schemes and automatically re-weighting them based on click-through % is something I was doing well over ten years ago. This can't really be news, can it?
I live ze unknown. I love ze unknown. I am ze unknown.
A/B focus testing is about observing how customers or users choose between two alternatives based on their qualitative sense of aesthetics. ML is about classifying data based on quantifying the data into defined classes or toward optimal values.
Predicting the outcome of a focus group is a completely different problem from multi-armed slot machines. In focus groups there is no objective metric, so focus group problems are not amenable to machine learning unless your machine can define, measure, and perhaps predict aesthetic criteria.
Now THAT I'd like to see.
Don't forget the sales pitch. It could help you choose between different text. Real world trials are far better than gut feel on that.
if I decide to add something to my cart, I'm confident I'll find the button even if it's 1.2% less optimized.
That's very well and good for you, but marketing and layout-optimization people are more interested in the question of whether one site or the other makes you more likely to decide to add something to your cart, and not whether you'll succeed once you've decided to do so.
DRM: Terminator crops for your mind!
For most people, myself included, I'd imagine the deciding factor is not website layout, but something much more obvious.
Money, dear boy. (spoken with an English accent, ofc)
Plus a variety of other factors like shipping speeds, general reputation of the sites, ease of RMA, etc... Whether the "buy" button is Green, Orange or White is quite simply the last on my list of priorities, and pulling metrics on it without examining the other factors will net faulty results.
This signature is false.
Ah, I see. You're one of those few people whose every decision is the logical, incontrovertible result of sober factual considerations.
"Psychology" is merely the study of what forces mold the choices of everyone's mind but yours.
DRM: Terminator crops for your mind!
It's a 'good-enough' approximation to an optimal selection process.
The probability of someone clicking on option A, B or C is unknown, but is expected to be constant when averaged over the population. Given the ratio of clicks versus views on any given option, the posterior distribution of that probability can be modelled as a Beta distribution. The experimental question is then: given the current estimates, which option should be presented to maximise the utility of the test?
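As a small sketch of that posterior: with a uniform Beta(1, 1) prior (my assumption; any Beta prior works the same way), observing `clicks` out of `views` gives a Beta(1 + clicks, 1 + views - clicks) posterior. The counts below are made up for illustration.

```python
def beta_posterior(clicks, views, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) for a click probability, given clicks out of views.

    Returns (a, b, posterior mean, posterior variance).
    """
    a = prior_a + clicks
    b = prior_b + (views - clicks)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, var

# Made-up numbers: 30 clicks in 1000 views.
a, b, mean, var = beta_posterior(clicks=30, views=1000)
print(f"Beta({a:.0f}, {b:.0f}): mean={mean:.4f}, sd={var ** 0.5:.4f}")
```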
For simply ranking the options, the utility may be the Shannon information. In this case though, the utility also has to incorporate the expected benefit of a click-through. One could set up a utility function which is weighted between the two outcomes, possibly varying over time.
In practice though, Beta distributions with different means tend to converge to separate peaks quite quickly, so taking a possible 10% hit on the current best estimate click-through outcome seems an entirely plausible approximation. Bayesian experimental design though could also tell you when to stop testing and stick with the winner.
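One simple way to act on the "when to stop" idea is sketched below. It is a Monte Carlo shortcut rather than full Bayesian experimental design: draw from each option's Beta posterior many times and count how often each option comes out on top. The uniform prior, the 0.95 threshold, and the counts are all my own assumptions.

```python
import random

def prob_best(data, draws=20000, rng=None):
    """Estimate P(option has the highest click rate) by posterior sampling.

    `data` maps option -> (clicks, views); a uniform Beta(1, 1) prior is assumed.
    """
    rng = rng or random.Random()
    wins = dict.fromkeys(data, 0)
    for _ in range(draws):
        sample = {o: rng.betavariate(1 + c, 1 + v - c) for o, (c, v) in data.items()}
        wins[max(sample, key=sample.get)] += 1
    return {o: w / draws for o, w in wins.items()}

# Made-up counts, for illustration only.
data = {"green": (60, 1000), "orange": (30, 1000), "white": (25, 1000)}
p = prob_best(data, rng=random.Random(0))
winner = max(p, key=p.get)
stop = p[winner] > 0.95  # hypothetical threshold for sticking with the winner
```

When the posteriors have separated into distinct peaks, one option wins almost every draw and the test can end; while they still overlap, the probabilities stay spread out and you keep testing.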
To be valid, the last step (of which the author makes no mention) should be to compare the three groups to see if their differences are statistically significant. With tens of thousands of clicks, it's likely that they are, but the percentages were awfully close in the 2-3% range.
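A rough sketch of such a check for any two of the groups, using a standard pooled two-proportion z-test. The function name and the counts in the usage line are my own, not figures from the article. With three groups you would run it pairwise and tighten the threshold for multiple comparisons, or use a chi-square test on the full table instead.

```python
import math

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference between two click-through rates.

    Assumes large samples and at least one click overall.
    """
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Made-up counts: 3.0% vs 2.5% click-through on 10,000 views each.
z, p_value = two_proportion_z(300, 10000, 250, 10000)
```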
I do it even better with my Accelerated Market Research, which is based on Bayesian reasoning.
http://oyhus.no/AcceleratedMarketResearch.html
Sorry marketing and sales department, you're fired.
You can thank jxander for proving your jobs were never useful in the first place.
But don't feel bad; since competitor A offers the same service for 0.01% less, we'll soon be bankrupt anyway.
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
The multiarmed bandit problem is a problem in which you simultaneously try to optimize your overall reward and still explore. As a consumer, I face that problem (switch brands or stick with the tried-and-true). However, for focus groups, maximizing rewards for participants doesn't matter; it's all about finding the best solution for the organizer of the focus group. The participants already get the products for free. That means that it is not a multiarmed bandit problem, and algorithms for solving such problems are the wrong algorithms to use for focus groups.
There are mathematically more efficient ways of doing this kind of testing. But there are other constraints when testing with human beings as well, such as dependencies on the order in which you test. A/B testing is probably a pretty good compromise.
He isn't trying to use ML to predict the outcome of a focus group.
ML is about many things. One thing it is about is how a learner should explore an environment in order to maximize what he learns. It is one of those techniques that Hanov refers to, and it's a good idea in principle. But he picked the wrong algorithm for focus groups.
The algorithm he points to is the right one for online testing of different web page designs, where you stick with your current design 99% of the time but show visitors different designs 1% of the time and see whether those work better or worse.
and suddenly the button with the racial epithet on it becomes the most popular one and you lose all your real customers.
I lol'd.
DRM: Terminator crops for your mind!
Clever retort, sir. However, might I interest you in a long-forgotten theory of economics: that something bought or sold might possibly have greater value than the mechanism by which it is sold?
Which is why advertising and marketing are such underfunded spheres of public endeavour....
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
Epsilon-greedy is one of the most well-known algorithms in machine learning. I'd heard of it before but didn't know how it works (I dropped AI after 2nd year); I do now.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
Oh FFS -- the use of button colours was what is known in technical jargon as an "example". The point of the article applies to all variables. And while you may think "layout" is less important than "shipping speeds", how do you find out shipping speeds? You have to look for them. If you can't find them, you walk. If you can't find them, chances are it's because of something we call in technical jargon "site design", which includes details such as "layout".
It's easy when you're designing something (I'm guessing you've never had to design anything for the public) to make lots of assumptions without even realising. You might put your "checkout" button where it is on your favourite webshop, but that might actually be the least obvious place to anyone who doesn't already share your shopping habits. Or maybe you think it's a wonderful shade of green, but what you don't realise (as someone with normal sight and no understanding of ocular defects) is that it's actually invisible against your chosen background to about 5% of the global population.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
In the UK, most places will serve you a medium if you ask for medium rare, simply because most folk who ask for medium rare will send it back to the kitchen because it's "not cooked properly". We're not good with our steaks here.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'