Can Machine Learning Replace Focus Groups?

OK, so... by war4peace · 2012-05-31 10:44 · Score: 5, Insightful

I have read the synopsis 4 (four) times and I didn't get shit.
Of course, TFA sheds some light on the whole thing, but really... work on your short version, guys, because what's in here makes no sense.

--
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)

Re:OK, so... by jakimfett · 2012-05-31 11:03 · Score: 1

I have read the synopsis 4 (four) times and I didn't get shit.
Read this AC-submitted summary. It may (or may not) enlighten you.

--
Bits of code, random ramblings: jakimfett.com
Re:OK, so... by WrongSizeGlass · 2012-05-31 12:52 · Score: 4, Funny

I have read the synopsis 4 (four) times and I didn't get shit.
Of course, TFA sheds some light on the whole thing, but really... work on your short version, guys, because what's in here makes no sense.
If you had just clicked the green button the machine would have understood it for you.
Re:OK, so... by Alex+Belits · 2012-05-31 13:24 · Score: 0

The article doesn't make any sense, either. Who, other than scammers, cares about trivial shit like one button being pressed by a random person that wandered to some web page? People write software that users use to accomplish some work. You can't recruit random people to perform random actions on a randomly changing user interface, and then collect statistics on what they accomplished.
To think of it, if someone did that, the "best" interface would look just like GNOME3... Oh shit...

--
Contrary to the popular belief, there indeed is no God.
Re:OK, so... by Tarsir · 2012-05-31 14:42 · Score: 5, Insightful

You know, I read the summary without understanding it, and just clicked through to read the article, but only after reading your comment did I realize just how little sense the summary really made.

In a blog post, Steve Hanov explains how 20 lines of code can outperform A/B testing.
It starts off talking about a nobody who did something that is apparently so trivial that it can be outdone by 20 lines of code. You might think that the following sentence will answer at least one of the questions raised by this sentence: Who is Steve Hanov? What is A/B testing? What do Steve's 20 lines of code do? But you'd be wrong.

Using an example from one of his own sites, Hanov reports a green button outperformed orange and white buttons.
Because the next sentence jumps to a topic whose banality and seeming irrelevance to the matter at hand defies belief. Three coloured buttons, one of which 'outperformed' the others, with nary a hint as to what these buttons do, or how one can outperform the others.

Why don't people use this method?
The third sentence appears to pick up where the first left off. Why don't people use the A/B testing method? Or are we talking about the three coloured buttons method?

Because most don't understand or trust machine learning algorithms, mainstream tools don't support it, and maybe because bad design will sometimes win.
The final sentence is a tour-de-force of disjointed confusion. It skips from machine learning algorithms that haven't been discussed, to tools with unknown purpose, to the design of something which was never specified.
It's like the summary is some kind of abstract art installation whose purpose is to be as uninformative as possible. It is literally the opposite of informative: Not only does it provide no information, it raises questions which you can't even be sure relate to the purported topic at hand, because you don't know what the topic at hand is.
It is either a bizarrely confused summary or one of the most artful trolls ever to grace Slashdot's front page.
Re:OK, so... by Anonymous Coward · 2012-05-31 16:44 · Score: 0

Elegantly put. Please mod up.
capcha: Grander
Re:OK, so... by Anonymous Coward · 2012-05-31 17:28 · Score: 0

The summary was generated by a machine learning program that automatically learned to generate summaries of articles.
Re:OK, so... by artor3 · 2012-05-31 18:04 · Score: 1

Sadly, it learned to generate summaries by reading Slashdot :-(
Re:OK, so... by mwvdlee · 2012-05-31 18:34 · Score: 1

Anybody who wants their users to take a certain action?
Think of websites (as stated in TFS) or focus group testing (also stated in TFS).
A lot of user interface testing is basically looking at how a user interacts with a UI. Things like automated testing could show you that people more easily recognize the functionality of the [OK] button over a functionally identical [Well, might as well try and go ahead with doing what I wanted to do] button.
As for websites; even on my open source project websites I prefer people press the [Download] button instead of browsing to a different site. Imagine how it is when commercial interests are at stake (even sites like /. want you to give them money).

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:OK, so... by Alex+Belits · 2012-06-01 00:43 · Score: 1

Anybody who wants their users to take a certain action?
Think of websites (as stated in TFS) or focus group testing (also stated in TFS).
My response to that is identical to the comment you are replying to.

--
Contrary to the popular belief, there indeed is no God.
Re:OK, so... by Anonymous Coward · 2012-06-01 02:31 · Score: 0

I'm pretty sure this summary was what the 20 lines of code generated when the article was used as input.
Re:OK, so... by littlewink · 2012-06-01 02:58 · Score: 0

RTFA
I understood it the first time I read it yesterday.
Today, listening to your complaint, I read it again and still understand it.
Maybe you're not bright enough to either a) read it carefully or b) understand it. No problema - there are plenty of jobs as janitors and car salesmen.
Re:OK, so... by littlewink · 2012-06-01 03:00 · Score: 0

In the time you took to complain you could have RTFA. I understood it the first time I read it yesterday. Today, listening to complaints, I read it again and still understand it. Maybe you're not bright enough to either a) read it carefully or b) understand it. No problema - there are plenty of jobs as janitors and car salesmen.
Re:OK, so... by lurker1997 · 2012-06-01 04:26 · Score: 1

That is probably the best post I have ever read here. Extremely insightful and hilariously written. I was in tears laughing through most of it.
Re:OK, so... by war4peace · 2012-06-01 05:08 · Score: 1

Smart. Very smart. You should be proud of yourself, being part of an elite that has the inherent right of stomping less-gifted people. Gratz!

--
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
Re:OK, so... by Tarsir · 2012-06-01 05:09 · Score: 1

In the time you took to complain you could have RTFA.
Reread my post. I clicked through and read the article before posting my comment.

I understood it the first time I read it yesterday.
No you didn't. As the summary contains no actual information, you filled it in with your own prejudices and preconceptions, no doubt because you are not in the habit of reading things carefully. cf My first point.

Today, listening to complaints, I read it again and still understand it.
What is this supposed to prove? Of course you "still" understand it after having read the full article, unless you think people habitually lose all knowledge of their previous experiences after sleeping for eight hours.

Maybe you're not bright enough to either a) read it carefully
Ha! cf My first point again.

or b) understand it.
cf My second point.

No problema - there are plenty of jobs as janitors and car salesmen.
What do you do that's so prestigious and intellectually demanding?

Much ado ... by jxander · 2012-05-31 10:52 · Score: 0

about nothing

Wake me up when they produce banner-ad algorithms that beat adblock, noscript, etc.

The only possible benefit I can see from this is *maybe* adjusting a site's color-scheme or layout to be more intuitive and easy to navigate, i.e. making the "add to cart" button easier to find without being obnoxious about it. But then again, if I decide to add something to my cart, I'm confident I'll find the button even if it's 1.2% less optimized. And visual optimization can be done by any 1st year graphic design student.

--
This signature is false.

Re:Much ado ... by jakimfett · 2012-05-31 11:05 · Score: 1

The only possible benefit I can see from this is *maybe* adjusting a site's color-scheme or layout to be more intuitive and easy to navigate.
Well, for those of us who do use testing and usability reporting on a daily basis, or have jobs that *require* us to know what is easiest for people to navigate (read: any and all web designers), this is pretty nice, and I intend to use the concept heavily.

--
Bits of code, random ramblings: jakimfett.com
Re:Much ado ... by Anonymous Coward · 2012-05-31 11:18 · Score: 0

I intend to use the concept heavily.
After paying your patent license fees!
Re:Much ado ... by BasilBrush · 2012-05-31 11:28 · Score: 1

Don't forget the sales pitch. It could help you choose between different text. Real world trials are far better than gut feel on that.
Re:Much ado ... by spazdor · 2012-05-31 11:50 · Score: 1

if I decide to add something to my cart, I'm confident I'll find the button even if it's 1.2% less optimized.
That's very well and good for you, but marketing and layout-optimization people are more interested in the question of whether one site or the other makes you more likely to decide to add something to your cart, and not whether you'll succeed once you've decided to do so.

--
DRM: Terminator crops for your mind!
Re:Much ado ... by jxander · 2012-05-31 12:09 · Score: 1

For most people, myself included, I'd imagine the deciding factor is not website layout, but something much more obvious.
Money, dear boy. (spoken with an English accent, ofc)
Plus a variety of other factors like shipping speeds, general reputation of the sites, ease of RMA, etc... Whether the "buy" button is Green, Orange or White is quite simply the last on my list of priorities, and pulling metrics on it without examining the other factors will net faulty results.

--
This signature is false.
Re:Much ado ... by spazdor · 2012-05-31 12:32 · Score: 1

Ah, I see. You're one of those few people whose every decision is the logical, incontrovertible result of sober factual considerations.
"Psychology" is merely the study of what forces mold the choices of everyone's mind but yours.

--
DRM: Terminator crops for your mind!
Re:Much ado ... by Anonymous Coward · 2012-05-31 16:35 · Score: 0

Clever retort, sir. However, might I interest you in a long-forgotten theory of economics: that something bought or sold might possibly have greater value than the mechanism by which it is sold? Tsk, tsk, I apologize, sir, I hadn't originally noticed the little billboard for the red crawfish tattooed on your arm. Well, I suppose I'm off for lunch, anything but seafood I suppose.
Re:Much ado ... by mwvdlee · 2012-05-31 18:44 · Score: 1

Sorry marketing and sales department, you're fired.
You can thank jxander for proving your jobs were never useful in the first place.
But don't feel bad; since competitor A offers the same service for 0.01% less, we'll soon be bankrupt anyway.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Much ado ... by spazdor · 2012-06-01 07:06 · Score: 1

I lol'd.

--
DRM: Terminator crops for your mind!
Re:Much ado ... by Anonymous Coward · 2012-06-01 08:32 · Score: 0

What are you talking about, the first thing I look for on any website is if the checkout button is green or red, that is the determining factor for if I purchase there, if the button is not Pale_Golden_Rod I do not even give it a second thought, that is unless it is Pea_Green. /Sarcasm.
Re:Much ado ... by Half-pint+HAL · 2012-06-01 11:39 · Score: 1

Clever retort, sir. However, might I interest you in a long-forgotten theory of economics: that something bought or sold might possibly have greater value than the mechanism by which it is sold?
Which is why advertising and marketing are such underfunded spheres of public endeavour....

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
Re:Much ado ... by Half-pint+HAL · 2012-06-01 11:41 · Score: 1

Epsilon-greedy is one of the most well-known algorithms in machine learning. I'd heard of it before, but I didn't know how it works (I dropped AI after 2nd year), but I do now.

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
Re:Much ado ... by Half-pint+HAL · 2012-06-01 11:52 · Score: 1

Oh FFS -- the use of button colours was what is known in technical jargon as an "example". The point of the article applies to all variables. And while you may think "layout" is less important than "shipping speeds", how do you find out shipping speeds? You have to look for them. If you can't find them, you walk. If you can't find them, chances are it's because of something we call in technical jargon "site design", which includes details such as "layout".
It's easy when you're designing something (I'm guessing you've never had to design anything for the public) to make lots of assumptions without even realising. You might put your "checkout" button where it is on your favourite webshop, but that might actually be the least obvious place to anyone who doesn't already share your shopping habits. Or maybe you think it's a wonderful shade of green, but what you don't realise (as someone with normal sight and no understanding of ocular defects) is that it's actually invisible against your chosen background to about 5% of the global population.

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'

Translation by Anonymous Coward · 2012-05-31 10:56 · Score: 5, Informative

So that you don't have to click through the slashvertisement, I have read TFA for you.

Here is a summary: Let's say you have several different designs for a web interface that you want to test to find out which one works the best.

One method is to have a "testing period" in which you show each person one of the designs at random and record how well it works for that person. Then, once you've shown each of the designs to 1,000 people, you figure out which one is the best on average. Now the "testing period" is over, and the best design is shown to everyone from that point forward. That is the "old" method.

The "new" method is to dispense with the testing period. Instead, you show the first person one design at random. If it works (e.g. they click on the ad), it gets bonus points. If it doesn't work, it gets a penalty. At any time, you show the design with the most points; if it is bad, it will lose points over time and eventually stop being shown.

The goal of the "new" method is to hopefully avoid showing bad designs to 2000 people just to figure out which one is the best.
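
For the curious, the point scheme above fits in a few lines. This is only a rough sketch of the simplified version in this summary, not the code from TFA; the names `pick_design` and `record_result` and the +1/-1 scoring are made up for illustration.

```python
import random

def pick_design(points, rng=random):
    """Return a design with the most points, breaking ties at random."""
    best = max(points.values())
    return rng.choice([name for name, score in points.items() if score == best])

def record_result(points, design, worked, bonus=1, penalty=1):
    """Give bonus points if the design worked, a penalty if it did not."""
    points[design] += bonus if worked else -penalty

# Hypothetical starting state: three designs, no evidence yet.
points = {"green": 0, "orange": 0, "white": 0}
shown = pick_design(points)                 # first visitor gets a random design
record_result(points, shown, worked=False)  # it failed, so it drops below the others
```

A design that keeps failing sinks below the others and stops being picked, which is the whole idea.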

If you care about the details then you should probably read the article. This summary is just an approximation for those who can't be bothered or who object to slashvertisements on principle.

Re:Translation by jakimfett · 2012-05-31 11:01 · Score: 1

...thank you for saving me the work of slogging through it on my own.

--
Bits of code, random ramblings: jakimfett.com
Re:Translation by mark-t · 2012-05-31 11:18 · Score: 1, Interesting

The "new" method has the problem of immediately favoring the first design to get a positive response.
My own experience with focus groups is that they were more interested in _WHY_ you chose something the way you did, rather than in just what you chose. I'm not entirely sure how this algorithm will determine that.

--
File under 'M' for 'Manic ranting'
Re:Translation by spazdor · 2012-05-31 11:42 · Score: 3, Informative

The "new" method has the problem of immediately favoring the first design to get a positive response.
No it doesn't. The designs are ranked according to what percentage of responses have been positive so far, not by the total number of positive responses. The first design to get a positive response will get shown more, and thus it will get more positive responses, and more negative responses.

--
DRM: Terminator crops for your mind!
Re:Translation by Anonymous Coward · 2012-05-31 12:12 · Score: 0

You didn't read the article, did you?
Re:Translation by spazdor · 2012-05-31 12:14 · Score: 2

More people will inevitably vote it down (unless it is indeed the best option), because it's getting more exposure.
Unless you're saying that display frequency will actually affect click-through rate. Are you suggesting that, for instance, a design which only gets shown 300 times and gets 100 positive responses, if it were shown 3000 times instead it should be expected to get more than 1000 positive responses? This seems unlikely if successive tests are causally independent (and given that successive tests are most likely completely different site users, at different computers, who have never met each other, that seems a fair assumption.)

--
DRM: Terminator crops for your mind!
Re:Translation by mark-t · 2012-05-31 12:32 · Score: 1

My remark was on the algorithm that the poster above had presented... not the article.

--
File under 'M' for 'Manic ranting'
Re:Translation by WrongSizeGlass · 2012-05-31 12:55 · Score: 2

Is there any way they can apply this to summaries and stories on /.? I'd be willing to read that summary ... and maybe even that story.
Re:Translation by swillden · 2012-05-31 12:59 · Score: 5, Informative

No.... I'm suggesting that the algorithm presented above, which only ever displays the single highest scoring design, is biased against designs that haven't yet had a chance to be viewed by anybody, and thus have not had an opportunity to get a positive response, when people are already showing some favor towards others.
What you're missing is the implied assumption that all of the options will fail most of the time, and that all options are initialized with maximum scores. The goal is to find the design that best motivates the user to take some action (e.g. click a link), and the assumption is that most of the time the user will not take that action. By starting all of the choices at a high value, they will all gradually converge downward to their true effectiveness rate, at which point the most effective will be chosen nearly all of the time. During the convergence process, the "leader" may change, but if the current leader isn't the true best, as it gets driven towards its true rate, it will eventually dip under one of the others.
If, by chance, a more effective option has a really bad run early on and gets pushed below the true effectiveness rate of another option, it would never recover -- which is why the author includes an occasional randomly-selected choice. If there is a large difference between the effectiveness of the options this is really unlikely to happen, but in the rare event it happens the randomization will eventually fix it. The author also covers a method of handling the case where the audience preferences drift over time, by including the ability to "forget" old input via simple exponential decay.
The only really bad thing about this approach is that it assumes you don't have a lot of repeat visitors. If you do, they'll be annoyed by seeing different versions, apparently at random (from their perspective).
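
Here is roughly what that looks like in code. Treat it as a sketch under assumptions rather than the author's actual 20 lines: the class and method names are invented, `epsilon=0.1` is the 10% random share mentioned elsewhere in this thread, and multiplying both counters by a `decay` factor is just one simple way to "forget" old input.

```python
import random

class EpsilonGreedy:
    """Sketch of epsilon-greedy with optimistic starting scores."""

    def __init__(self, options, epsilon=0.1, decay=1.0, rng=None):
        self.epsilon = epsilon          # share of visitors shown a random option
        self.decay = decay              # 1.0 = never forget; < 1.0 = fade old data
        self.rng = rng or random.Random()
        # Optimistic start: 1 success in 1 trial, i.e. a 100% rate.
        self.successes = {o: 1.0 for o in options}
        self.trials = {o: 1.0 for o in options}

    def rate(self, option):
        return self.successes[option] / self.trials[option]

    def choose(self):
        options = list(self.trials)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)           # explore
        best = max(self.rate(o) for o in options)     # exploit the current leader
        return self.rng.choice([o for o in options if self.rate(o) == best])

    def update(self, option, success):
        # Shrink the old evidence, then add the new observation.
        self.successes[option] *= self.decay
        self.trials[option] *= self.decay
        self.trials[option] += 1
        if success:
            self.successes[option] += 1
```

Call `choose()` for each visitor and `update()` once you know whether they took the action. With `decay=1.0` nothing is ever forgotten.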

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:Translation by DerekLyons · 2012-05-31 15:05 · Score: 1

The "new" method has the problem of immediately favoring the first design to get a positive response.
Only if you're stupid enough to only show the design with the highest score. Something as simple as choosing randomly among the top .75n results (where n=number of designs under test) fixes that.
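
Something like this, as a quick sketch. The name `pick_from_top` is made up, and rounding .75n up with `ceil` is my assumption since the rounding isn't specified.

```python
import math
import random

def pick_from_top(scores, fraction=0.75, rng=random):
    """Pick at random among the top `fraction` of designs, ranked by score."""
    k = max(1, math.ceil(fraction * len(scores)))   # always keep at least one
    ranked = sorted(scores, key=scores.get, reverse=True)
    return rng.choice(ranked[:k])

# Hypothetical scores for four designs: only the worst one ("d") is excluded.
scores = {"a": 4, "b": 3, "c": 2, "d": 1}
choice = pick_from_top(scores)
```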
Re:Translation by Anonymous Coward · 2012-05-31 18:50 · Score: 0

Reading the article would make you appear less stupid. The choices are initialised to have a 100% success rate, not 0%, and so those will automatically become the "highest scoring" if the others fail even one test.
Re:Translation by mwvdlee · 2012-05-31 19:00 · Score: 1

Take a piece of paper and try to run down some scenarios. Try to find a scenario that disproves your own theory, then figure out why.
I'm sure there are edge cases where this "new" method fails, but there are also edge cases where classical focus group testing fails.
Since my job involves some A/B testing, I did the above and found some edge cases. But they're far less likely and with some job-specific optimizing (we have relatively long feedback delays) these edge cases can be mitigated.
Most interesting issue I found is when the positive feedback to each of the choices is near 100%. Not much of a problem unless having 100% positive feedback is somehow negative.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Translation by Anonymous Coward · 2012-05-31 21:15 · Score: 0

So you mean he proposes to show the users of his website a 'randomly' selected design each time they visit...
And I was just thinking people were complaining about Facebook for constantly changing their UI!
Re:Translation by spazdor · 2012-06-01 04:57 · Score: 1

designs that haven't yet had a chance to be viewed by anybody,
There are no such designs in this model, owing to the fact that 10% of all visitors are shown a design at random, unweighted by previous measurements.
Seriously, the algorithm presented in TFA anticipates and addresses your objection perfectly. You'd do well to check it out; AC's summary up there was good but incomplete.

--
DRM: Terminator crops for your mind!
Re:Translation by Half-pint+HAL · 2012-06-01 11:56 · Score: 1

The only really bad thing about this approach is that it assumes you don't have a lot of repeat visitors. If you do, they'll be annoyed by seeing different versions, apparently at random (from their perspective).
What he doesn't discuss is what "one" instance of the site is. If you've got tracking cookies switched on, then you can assign one version of the site to the user at first visit and have it persist across browsing sessions.
An oversight on the author's part, but not a huge leap of logic.
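
A sketch of the sticky assignment, assuming you already have some visitor ID from a cookie. `design_for`, the `assignments` dict and the `choose` callback are all hypothetical names; in real life the mapping would live in the cookie itself or in a database.

```python
def design_for(visitor_id, assignments, choose):
    """Return the visitor's stored design, assigning one on the first visit."""
    if visitor_id not in assignments:
        assignments[visitor_id] = choose()   # only ask the algorithm once
    return assignments[visitor_id]

# Usage: the second visit reuses the first answer instead of choosing again.
assignments = {}
first = design_for("cookie-123", assignments, lambda: "green")
again = design_for("cookie-123", assignments, lambda: "orange")
```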

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'

Of course it can. by thoughtspace · 2012-05-31 10:58 · Score: 1

Just get the last answer and repeat it over and over.
Such a machine will be equally as good as any focus group.

Re:Of course it can. by mwvdlee · 2012-05-31 19:09 · Score: 1

Imagine you have 3 buttons...
First user sees button 1, clicks it.
Next user sees button 1 (because repeat), doesn't click it.
Next user sees button 2, doesn't click it.
Next user sees button 3, doesn't click it.
Next user sees button 1, clicks it.
Next user sees button 1 (because repeat), doesn't click it.
Next user sees button 2, doesn't click it.
Next user sees button 3, doesn't click it.
...repeat...
Even though button 1 has a 50% success rate and the other buttons 0% (and is thus infinitely better), it's only shown 50% of the time.
In this example, you'd want to show button 1 ~100% of the time, since it's the only button that ever gets clicked.
Just repeating the last answer produces sub-optimal results.
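
If you want to check the arithmetic, here is a small tally of one pass through the cycle above. The variable names are just for illustration.

```python
from collections import Counter
from fractions import Fraction

# One pass through the repeating pattern: (button shown, was it clicked?)
cycle = [(1, True), (1, False), (2, False), (3, False)]

shown = Counter(button for button, _ in cycle)
clicks = Counter(button for button, clicked in cycle if clicked)

# Share of visitors who saw each button, and each button's click rate.
share_shown = {b: Fraction(n, len(cycle)) for b, n in shown.items()}
click_rate = {b: Fraction(clicks[b], shown[b]) for b in shown}

# Clicks per visitor under "repeat the last answer".
overall = Fraction(sum(clicks.values()), len(cycle))
```

Button 1 is shown to half the visitors and converts half of those, so the site gets one click per four visitors. Showing button 1 to everyone would double that.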

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Of course it can. by Anonymous Coward · 2012-06-01 05:10 · Score: 0

Okay, let's imagine that each button is set so that 0/0 represents a 100% rate. Here's what I'm pretty sure would happen

First user sees button 1, clicks it.
1(1/1), 2(0/0), 3(0/0)

Next user sees button 1 (because repeat), doesn't click it.
1(1/2), 2(0/0), 3(0/0)

Next user sees button 2, doesn't click it.
1(1/2), 2(0/1), 3(0/0)

Next user sees button 3, doesn't click it.
1(1/2), 2(0/1), 3(0/1)

Next user sees button 1, clicks it.
1(2/3), 2(0/1), 3(0/1)

Next user sees button 1 (because repeat), doesn't click it.
1(2/4), 2(0/1), 3(0/1)
From this point, everyone sees button 1. I don't get why button 2 would ever be shown again except in the 10% random case.
Re:Of course it can. by Half-pint+HAL · 2012-06-01 12:08 · Score: 1

And that is precisely why they don't set it to 0/0 = 100%, instead initialising everything to 1:1 = 100%
1(1:1) 2(1:1) 3(1:1)
First user sees 1, clicks it:
1(2:2) 2(1:1) 3(1:1)
At this point, the algorithm could still pick any of the three.
Say it picks 1 again, and this is not clicked:
1(2:3) 2(1:1) 3(1:1)
So say it picks 2 for the next user, but the user doesn't click it:
1(2:3) 2(1:2) 3(1:1)
Well this time it has to pick 3 (unless the 10% random kicks in). Let's assume that's unsuccessful.
1(2:3) 2(1:2) 3(1:2)
OK, so 1 is now favoured, but one more "no click" on 1 levels us off 2:4 = 1:2.
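
The trace above can be replayed mechanically. This is only a sketch: `record` and `leaders` are names I made up, and it leaves out the 10% random case.

```python
from fractions import Fraction

# [clicks, shows] per button, all initialised to 1:1 (a 100% rate).
counts = {1: [1, 1], 2: [1, 1], 3: [1, 1]}

def record(button, clicked):
    """Add one show, plus one click if the user clicked."""
    counts[button][0] += int(clicked)
    counts[button][1] += 1

def leaders():
    """Buttons currently tied for the best click rate."""
    rates = {b: Fraction(c, s) for b, (c, s) in counts.items()}
    best = max(rates.values())
    return sorted(b for b, r in rates.items() if r == best)

# The steps from the walkthrough, plus the extra "no click" on button 1.
history = []
for button, clicked in [(1, True), (1, False), (2, False), (3, False), (1, False)]:
    record(button, clicked)
    history.append(leaders())

print(history)  # [[1, 2, 3], [2, 3], [3], [1], [1, 2, 3]]
```

After the last step every button sits at 1:2 again, so the greedy pick is back to a three-way tie.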

There will never be a true zero probability in the epsilon-greedy algorithm, and it can only approximate zero after accumulating an awful lot of evidence...

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'

Can Machine Learning Replace Focus Groups? by dgharmon · 2012-05-31 10:59 · Score: 1

NO !!!

--
AccountKiller

Re:Can Machine Learning Replace Focus Groups? by Half-pint+HAL · 2012-06-01 12:09 · Score: 1

Of course not. The whole point of a focus group is for the facilitator to lead the group to the conclusion he or she wants. Management can't manipulate machine learning algorithms -- only developers can.

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'

What the...? by Anonymous Coward · 2012-05-31 11:02 · Score: 0

Is this a Turing test?

This is not exclusively machine learning by Anonymous Coward · 2012-05-31 11:04 · Score: 5, Insightful

This is not "machine learning" substituting for human A/B testing. It's just changing the ratio of the number of visitors exposed to the "new" feature to be tested from 50% to 10%, while keeping the rest (90%) of the visitors using the "best so far" feature. There's also a bit of randomness thrown in when choosing which new feature the 10% of visitors get to test.

In this scheme, the human visitors are still doing the A/B testing, it's just that determination of which human is testing which feature dynamically adapts over time.

Now, if this guy had substituted human A/B testing completely with a machine learning technology that could somehow determine which feature is better without any input from humans, then I'd be impressed. That's kind of what the summary and article imply. But that's not what he's done. He's just being a bit more sophisticated regarding which humans get to test which feature.

He's also made a big fat claim regarding the effectiveness of his method with zero evidence to back it up. Theoretical results regarding multi-armed bandit problems are quite a far cry from real-world results regarding website feature selection. I'm looking forward to seeing some results of the proposed method on the latter.

Re:This is not exclusively machine learning by BasilBrush · 2012-05-31 11:51 · Score: 1

So you want to do A/B testing on whether this algorithm is better than A/B testing?
It'd probably be better to use the epsilon-greedy method when deciding whether the A/B testing or epsilon-greedy algorithm is better.
Or maybe not. We'll have to test that too.
It's testing all the way down.
Re:This is not exclusively machine learning by tgv · 2012-05-31 18:36 · Score: 2

Indeed, this has no relation to machine learning, whatsoever. The summary is once again ... deceptive.
And I'm sure the proof that the best one gets chosen doesn't exist. I'm also sure that this way of choosing an interface has a high probability of choosing the preferred one, but there is also a big difference with A/B testing: you'll never know how big the difference between the two is. In straight-forward testing with two groups (which is not really A/B, by the way: that is alternating between A and B and then asking the subject to choose the best one; it has its origins in perceptual testing, where ABX testing is preferred), you can find out the difference in scores. Here you can't.
Re:This is not exclusively machine learning by khipu · 2012-05-31 20:57 · Score: 1

Both Hanov and you are mixing up a couple of things. A/B testing is done with focus groups, not live visitors. When you test with focus groups, you don't run a live web server, and you're willing to pay for completion of some test design.
Algorithms for use with the multiarmed bandit are already widely used in live testing. Those algorithms properly belong to the field of machine learning (reinforcement learning), but it turns out that very simple algorithms or strategies are hard to beat. You're right that it's not "exclusively" machine learning because the simple algorithms were already known before machine learning even existed, but these algorithms are still primarily studied in machine learning.
As for whether these methods are effective, that's easy: they are, and they are widely used. The part that's hard isn't to decide which versions of a page to present how often, but instead to figure out which change was responsible for the better outcome you were interested in.
Re:This is not exclusively machine learning by Cederic · 2012-06-01 01:31 · Score: 1

You can A/B test with live visitors. Works well too.
I think his approach has merit, but it's really just an automatically applied implementation of the outcome of the test - at some point you'd want to switch off A or B completely anyway.
Of course, far more interesting would be understanding why people chose A or B and offering the appropriate one based on what you know of the person involved. That's more sophisticated, but already done by people like Amazon: My amazon.co.uk web page will be very different to yours, in terms of content.
Re:This is not exclusively machine learning by khipu · 2012-06-01 02:18 · Score: 1

You can A/B test with live visitors. Works well too.
It's still not a multi-armed bandit situation. The multi-armed bandit situation specifically means that you present either A or B, not an A/B choice. There are other machine learning techniques for optimizing A/B tests, just not the ones in the article.
Re:This is not exclusively machine learning by Half-pint+HAL · 2012-06-01 12:12 · Score: 1

Indeed, this has no relation to machine learning, whatsoever.
Is there an algorithm? Does the machine use the algorithm to obtain the optimum result? Just because the machine uses humans as its test subjects doesn't stop it being machine learning.

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
Re:This is not exclusively machine learning by tgv · 2012-06-01 20:55 · Score: 1

So ... sorting is machine learning? MS Word is machine learning? Don't think so.
Nowhere did I nor the GP claim that machines have to be involved. And the machine doesn't use humans in this case, it just uses their choices as its data. So your rebuttal is somewhat unfounded.
Machine learning is learning in the first place, through algorithm: a machine can learn to do a task on its own. Not: a machine assists in a task where someone else learns. In this case, the machine doesn't learn anything. It just acts as a biased dice. The outcome of the process might be called "learned", but the knowledge is in the head of the one that runs the experiment and overlooks the outcome, not in the machine. And the "learning" doesn't generalize, so it doesn't help in improving performance on any other task than selecting between these two designs.
That's why it's not machine learning.
Re:This is not exclusively machine learning by Half-pint+HAL · 2012-06-02 20:09 · Score: 1

So ... sorting is machine learning? MS Word is machine learning? Don't think so.
Nowhere did I nor the GP claim that machines have to be involved. And the machine doesn't use humans in this case, it just uses their choices as its data. So your rebuttal is somewhat unfounded.
Machine learning is learning in the first place, through an algorithm: a machine can learn to do a task on its own. Not: a machine assists in a task where someone else learns. In this case, the machine doesn't learn anything. It just acts as a biased die. The outcome of the process might be called "learned", but the knowledge is in the head of the one who runs the experiment and oversees the outcome, not in the machine. And the "learning" doesn't generalize, so it doesn't help in improving performance on any task other than selecting between these two designs.
That's why it's not machine learning.
A hell of a lot of machine learning is based around giving the computer an equation and letting it work out the particular coefficients that give the best possible answer. There are very few machine learning tasks that don't have some sort of experimenter assumptions built in, and no machine learning algorithm is ever 100% generalisable (otherwise machine learning would be a pretty small field, as there would only be one machine learning algorithm!)
The reason that this is classed as a machine learning problem and sorting isn't is that a sorting algorithm runs once and gives you a definite answer. But with epsilon-greedy, the computer maintains a theory that approximates the "correct" answer, and over time the answer gets better and better without direct operator control.
Yes, it's a simple algorithm. Yes, you could do a similar thing on paper with a human controller. But that doesn't stop the computer implementation qualifying as machine learning.
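To make the "maintains a theory that gets better over time" point concrete, here is a minimal epsilon-greedy sketch. It is not the code from TFA: the class name, the `simulate` helper, the 10% exploration rate and the click rates are all assumptions made up for illustration.

```python
import random


class EpsilonGreedy:
    """Tracks views and clicks per option (hypothetical names throughout)."""

    def __init__(self, options, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.views = {o: 0 for o in options}
        self.clicks = {o: 0 for o in options}

    def rate(self, option):
        # Observed click-through rate; 0.0 until the option has been shown.
        shown = self.views[option]
        return self.clicks[option] / shown if shown else 0.0

    def choose(self):
        # Explore with probability epsilon, otherwise show the current best.
        if self.rng.random() < self.epsilon:
            option = self.rng.choice(list(self.views))
        else:
            option = max(self.views, key=self.rate)
        self.views[option] += 1
        return option

    def reward(self, option):
        # Call this when the visitor actually clicks.
        self.clicks[option] += 1


def simulate(true_rates, rounds=20000, epsilon=0.1, seed=1):
    """Feed the bandit simulated visitors with made-up click probabilities."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(list(true_rates), epsilon=epsilon, rng=rng)
    for _ in range(rounds):
        shown = bandit.choose()
        if rng.random() < true_rates[shown]:
            bandit.reward(shown)
    return bandit
```

Something like `simulate({"orange": 0.05, "green": 0.10, "white": 0.40}).views` should show the view counts drifting toward the best option with nobody steering. The only state the machine keeps is two counters per option.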

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'

This Is News? by hondo77 · 2012-05-31 11:09 · Score: 2

Throwing up banner ads with different color schemes and automatically re-weighting them based on click-through % is something I was doing well over ten years ago. This can't really be news, can it?
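For anyone wondering what "re-weighting based on click-through %" might look like, a rough sketch follows. It is a guess at the general idea, not the poster's actual code; `pick_banner`, the stats layout and the add-one smoothing are all assumptions.

```python
import random


def pick_banner(stats, rng=random):
    """stats maps banner name -> (views, clicks); returns one banner name.

    Each banner is drawn with probability proportional to its smoothed
    click-through rate (an illustrative choice, not a known original).
    """
    names = list(stats)
    # Add-one smoothing keeps unseen or unclicked banners at a nonzero weight.
    weights = [(clicks + 1) / (views + 2) for views, clicks in stats.values()]
    return rng.choices(names, weights=weights, k=1)[0]
```

Unlike epsilon-greedy, this never commits to a single best banner: a banner with five times the click-through rate simply gets shown about five times as often.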

--
I live ze unknown. I love ze unknown. I am ze unknown.

Re:This Is News? by BasilBrush · 2012-05-31 11:34 · Score: 1

Maybe. Given that most sites aren't doing it, it comes under "stuff that matters".
Re:This Is News? by hondo77 · 2012-06-01 12:23 · Score: 1

I meant that I can't believe this is news because I assumed people had been doing this for years.

--
I live ze unknown. I love ze unknown. I am ze unknown.
Re:This Is News? by Anonymous Coward · 2012-06-03 03:53 · Score: 0

I meant that I can't believe this is news because I assumed people had been doing this for years.
Good for you. Too bad you didn't patent it. And I'm not being sarcastic.
As for the method employed: Wired had a big article on A/B testing last month, which is probably why the summary was crap. It was written from the perspective of someone who already knew what A/B testing was.
Although the summary was poorly written, since I had seen the previous Wired article I wanted to see what the new new hotness was.
Now to add to the actual discussion, my understanding from the Wired article was that they give all candidates an even test run (of, say, a few thousand page views) and then select one based on performance, instead of dynamically weighting them immediately after the first performance feedback comes in. This would make sense for 2 reasons.
1) The computational expense of deciding which page to show on the fly based on its most recent popularity is higher than getting a static sample run
2) Different pages may perform well in different demographics. If you introduce a new style and it gets downvoted to oblivion by the 10am crowd, you may never find out that it would have blown away all candidates with the 10pm crowd.
There are probably more reasons and I'm just pulling those out of my ass but there you go.
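Point 2 is easy to show with a toy tally. The sketch below assumes you log each page view as a hypothetical `(segment, design, clicked)` tuple; the function name and the segment labels are invented for the example.

```python
from collections import defaultdict


def best_per_segment(events):
    """events: iterable of (segment, design, clicked) tuples.

    Returns {segment: (best_design, click_rate)} so that a design that
    loses overall can still be spotted as the winner for one audience.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for segment, design, clicked in events:
        views[(segment, design)] += 1
        clicks[(segment, design)] += int(clicked)

    best = {}
    for (segment, design), shown in views.items():
        rate = clicks[(segment, design)] / shown
        if segment not in best or rate > best[segment][1]:
            best[segment] = (design, rate)
    return best
```

With an even test run every design gets enough views in every segment to fill in this table. With immediate re-weighting, a design buried by the 10am crowd may barely be shown to the 10pm crowd at all.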

The article's premise is entirely wrong by RandCraw · 2012-05-31 11:15 · Score: 5, Insightful

A/B focus testing is about observing how customers or users choose between two alternatives based on their qualitative sense of aesthetics. ML is about classifying data based on quantifying the data into defined classes or toward optimal values.

Predicting the outcome of a focus group is a completely different problem than multi arm slot machines. In focus groups there is no objective metric, so focus group problems are not amenable to machine learning unless your machine can define, measure, and perhaps predict aesthetic criteria.

Now THAT I'd like to see.

Re:The article's premise is entirely wrong by Anonymous Coward · 2012-05-31 11:26 · Score: 0

A/B focus testing is about observing how customers or users choose between two alternatives based on their qualitative sense of aesthetics. ML is about classifying data based on quantifying the data into defined classes or toward optimal values.
Predicting the outcome of a focus group is a completely different problem than multi arm slot machines. In focus groups there is no objective metric, so focus group problems are not amenable to machine learning unless your machine can define, measure, and perhaps predict aesthetic criteria.
Now THAT I'd like to see.
If you read TFA, you'll see that humans input the data to the machine. Then, the machine "learns" what is statistically best. The browser user chooses to click based on aesthetic criteria and the machine counts the votes for that link. So, it is really like a double-blind focus group.
Re:The article's premise is entirely wrong by BasilBrush · 2012-05-31 11:39 · Score: 1

Neither the article nor the summary says anything about A/B focus testing, nor does either mention focus groups at all. It refers to A/B testing, where 2 different websites are offered to customers, and the better one is found according to how objectively successful it has been (by sales, clicks or whatever numerical measure).
Re:The article's premise is entirely wrong by retchdog · 2012-05-31 11:59 · Score: 2

i don't know what the fuck a "double-blind" focus group is, since the user is clearly not blind to the design (this is the entire point).
and the reason why this is "like" a focus group, is that it is a focus group. all the information is coming from humans; it's just being used in a not-completely-idiotic way.
it's such an obvious idea it's surprising that no one has done this yet. oh, wait: http://m6d.com/about/about-us/
"Because the approach is rooted in machine learning, it continuously updates advertising decisions based on real-time signals from a marketer’s customer base. That feedback loop allows us to improve advertising performance over time."

--
"They were pure niggers." – Noam Chomsky
Re:The article's premise is entirely wrong by RandCraw · 2012-05-31 13:22 · Score: 1

You're right. My criticism was misdirected. The article is fine; it's not about ML or focus groups but minimizing trial size.
It was the Slashdot summary that somehow saw it as 'ML Replaces Focus Groups'. Thee-a-culpa.
Re:The article's premise is entirely wrong by Hognoxious · 2012-05-31 19:38 · Score: 1

Somebody in the chain, probably the submitter, thinks "user trials" and "focus groups" are synonyms.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:The article's premise is entirely wrong by Half-pint+HAL · 2012-06-01 12:19 · Score: 1

No, it's not a focus group. A focus group is a bunch of people talking about what they like/don't like. However, humans are very poor at judging what they like. Most living room (en_US "lounge") chairs are uncomfortable. People buy them because when they sit down on them in the showroom, they appear comfortable. Because they encourage poor posture, they take the strain off the sitting muscles. This gives the illusion of relaxation, and tricks people into believing the uncomfortable is comfortable.
A related issue is the fact that the majority of people claim to like their steaks "medium rare". Not because they like them medium rare, but because that's what they hear on the TV.
Focus groups are more often than not a total waste of time.

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
Re:The article's premise is entirely wrong by retchdog · 2012-06-01 16:22 · Score: 1

yeah i know what a real focus group is, but it's a reasonable metonymic usage imho. welcome to today's internet, where you're never more than a statistic, unless someone actually notices you, in which case god help you.
medium rare: well, it's also what i'd personally recommend to someone... it's a good starting point. imho anything more than medium is a waste of decent steak, so medium-rare is in the middle of acceptable. personally, i go for rare at most if i'm at a good place (which is none-too-often, sadly), or if i'm cooking.

--
"They were pure niggers." – Noam Chomsky

Bayesian modelling and experiment design by HalfFlat · 2012-05-31 13:43 · Score: 2

It's a 'good-enough' approximation to an optimal selection process.

The probability of someone clicking on option A, B or C is unknown, but is expected to be constant when averaged over the population. Given the ratio of clicks versus views on any given option, the posterior distribution of that probability can be modelled as a Beta distribution. The experimental question is then: given the current estimates, which option should be presented to maximise the utility of the test?

For simply ranking the options, the utility may be the Shannon information. In this case though, the utility also has to incorporate the expected benefit of a click-through. One could set up a utility function which is weighted between the two outcomes, possibly varying over time.

In practice though, Beta distributions with different means tend to converge to separate peaks quite quickly, so taking a possible 10% hit on the current best estimate click-through outcome seems an entirely plausible approximation. Bayesian experimental design though could also tell you when to stop testing and stick with the winner.
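As a rough illustration of the Beta-posterior view, here is a small standard-library sketch. The uniform Beta(1, 1) prior, the function names and the Monte Carlo comparison are assumptions added for the example; nothing here comes from TFA.

```python
import random


def posterior_params(clicks, views, prior_a=1.0, prior_b=1.0):
    # A Beta(prior_a, prior_b) prior plus binomial click data gives
    # a Beta(prior_a + clicks, prior_b + non-clicks) posterior.
    return prior_a + clicks, prior_b + (views - clicks)


def prob_a_beats_b(a_stats, b_stats, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B).

    a_stats and b_stats are (clicks, views) pairs; the two posteriors
    are treated as independent.
    """
    rng = random.Random(seed)
    a1, b1 = posterior_params(*a_stats)
    a2, b2 = posterior_params(*b_stats)
    wins = sum(
        rng.betavariate(a1, b1) > rng.betavariate(a2, b2)
        for _ in range(draws)
    )
    return wins / draws
```

Calling `prob_a_beats_b((300, 10000), (200, 10000))` gives a number very close to 1, which is the "separate peaks" effect: at those sample sizes the two posteriors barely overlap. A value stuck near 0.5 suggests the test has not yet told the options apart. Drawing one sample per option and showing whichever draw is largest would be Thompson sampling, which is a different allocation rule from the fixed 10% exploration discussed here.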

Re:Bayesian modelling and experiment design by ShieldW0lf · 2012-05-31 14:05 · Score: 1

If you used this type of algorithm to rotate a selection of different-but-good style sheets on a website, you'd be able to go past "which one is best at the time the test was devised" and actually build sites that pre-emptively and reactively stay "fashionable", "trendy" and "cool".

--
-1 Uncomfortable Truth
Re:Bayesian modelling and experiment design by shadowrat · 2012-05-31 16:44 · Score: 1

An algorithm like this isn't going to always pick a trendy and fashionable design. It's going to pick the least bad design you have. If you make 15 designs now, they will probably all be tired in 2 years. Sure the algorithm will say design 7 is the best 2 years from now, but it's probably not as good as whatever your designer would come up with at that time. It's probably better to plan on your designer making the 15 designs over the span of the 2 years. That way you know you are submitting designs made under the influence of the current culture and tastes.
Re:Bayesian modelling and experiment design by ShieldW0lf · 2012-06-01 03:19 · Score: 1

You're not wrong... but, there are scenarios where, for example, a designer comes up with 4 proposed designs, all of which are good, and someone needs to make a decision as to which one to go with without any meaningful way to differentiate. This algorithm allows all 4 to be approved as "functional and not embarrassing" and put into place.
And yes, 2 years later, you might decide it's a good idea to hire a designer to freshen things up, and have them deliver you a few more designs. But, with a pattern like this, you don't need to discard the old ones... you can add the new ones in amongst the old and have the algorithm elevate the one that is popular.
But the real gem would be to find out that the design that was least popular 4 years ago is actually in better sync with what is stylish now, more so than the ones you paid for 6 months ago, and have that dusty old design automatically leap to the front of the queue without you even having to think about it.

--
-1 Uncomfortable Truth
Re:Bayesian modelling and experiment design by martas · 2012-06-01 09:15 · Score: 1

For simple non-critical things like web design what parent describes is all well and good, but please don't use any similar method for a problem with serious consequences, be it in medicine or science or anything like that. There are statistically sound ways of doing experimental design, e.g. for deciding when to stop an experiment, and they are not Bayesian (usually).

--
weinersmith
Re:Bayesian modelling and experiment design by HalfFlat · 2012-06-02 17:32 · Score: 1

I am honestly curious: why should Bayesian experimental design not be used for serious work?
Re:Bayesian modelling and experiment design by martas · 2012-06-04 12:14 · Score: 1

Put simply, because it is the wrong tool. Frequentist methods for problems like hypothesis testing and confidence set estimation were designed based on some simple assumptions that probably never really hold in the real world, but probably aren't very far from the truth. Bayesian methods rely on assumptions (and definitions of what kind of error is to be avoided) that are not suitable for many problems in science and medicine. E.g. Bayesian confidence interval estimation will tell you that "on average" over the random distribution of the unknown parameter you're estimating (i.e. the prior distribution that you pulled out of your ass) you won't be off by more than a certain amount. But clearly if what you're estimating is, for example, the safe dose of radiation for workers at a nuclear power plant, there is no random distribution over that amount. There is just a single maximal amount that is safe. Hence, the guarantee you need is that in the worst case over all possible unknown values of the quantity to estimate, you won't be off by more than some amount. This is exactly the kind of guarantee that frequentist methods give you.

Hope that explanation isn't complete gibberish to you...

--
weinersmith

Er, how about statistical significance? by blach · 2012-05-31 14:04 · Score: 2

To be valid, the last step (of which the author makes no mention) should be to compare the three groups to see if their differences are statistically significant. With tens of thousands of clicks, it's likely that they are, but the percentages were awfully close in the 2-3% range.
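One common way to do that comparison for a pair of options is a two-proportion z-test. The sketch below is a generic textbook version using only the standard library; the function name and argument order are my own, and it assumes samples large enough for the normal approximation.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for H0: both options share one click rate."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that A and B are identical.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With three options you would run this on each pair and tighten the threshold for the multiple comparisons (a Bonferroni correction is the simplest choice), or use a single chi-squared test across all three. Note also that the bandit shows the options unequal numbers of times, so the less popular ones carry wider error bars.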

Even better by Kim0 · 2012-05-31 18:03 · Score: 1

I do it even better with my Accelerated Market Research, which is based on Bayesian reasoning.

http://oyhus.no/AcceleratedMarketResearch.html

wrong algorithm by khipu · 2012-05-31 20:40 · Score: 1

The multiarmed bandit problem is a problem in which you simultaneously try to optimize your overall reward and still explore. As a consumer, I face that problem (switch brands or stick with the tried-and-true). However, for focus groups, maximizing rewards for participants doesn't matter; it's all about finding the best solution for the organizer of the focus group. The participants already get the products for free. That means that it is not a multiarmed bandit problem, and algorithms for solving such problems are the wrong algorithms to use for focus groups.

There are mathematically more efficient ways of doing this kind of testing. But there are other constraints when testing with human beings as well, such as dependencies on the order in which you test. A/B testing is probably a pretty good compromise.

you got it wrong too by khipu · 2012-05-31 20:46 · Score: 1

Predicting the outcome of a focus group is a completely different problem than multi arm slot machines.

He isn't trying to use ML to predict the outcome of a focus group.

ML is about classifying data based on quantifying the data into defined classes or toward optimal values.

ML is about many things. One thing it is about is how a learner should explore an environment in order to maximize what he learns. It is one of those techniques that Hanov refers to, and it's a good idea in principle. But he picked the wrong algorithm for focus groups.

The algorithm he points to is the right one for online testing of different web page designs, where you stick with your current design 99% of the time but show visitors different designs 1% of the time and see whether those work better or worse.

Then /b finds your site... by GrumpySteen · 2012-06-01 02:59 · Score: 1

and suddenly the button with the racial epithet on it becomes the most popular one and you lose all your real customers.

Too Dumb to Understand, Therefore "5,Insightful" by Anonymous Coward · 2012-06-01 08:00 · Score: 0

Idiocy rewarded!

OT: steaks by Half-pint+HAL · 2012-06-01 20:24 · Score: 1

In the UK, most places will serve you a medium if you ask for medium rare, simply because most folk who ask for medium rare will send it back to the kitchen because it's "not cooked properly". We're not good with our steaks here.

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'

Re:OT: steaks by retchdog · 2012-06-04 14:04 · Score: 1

that's a shame, but in line with the stereotypes of english food i suppose.
by the way, i've only read about and seen pictures of beef wellington, but it seems to me to be the culinary equivalent of an orgy, and would be, in and of itself, a total redemption of british cuisine. am i wrong here?

--
"They were pure niggers." – Noam Chomsky

Slashdot Mirror

93 comments