Incorporating Machine Learning into Firefox 2.0?
blakeross asks: "I will be doing research this summer at Stanford with Professor Andrew Ng about how we can incorporate machine learning into Firefox. As we work to finish up Firefox 1.0, we're also seeking ideas that will make Firefox 2.0 blow every other browser out of the water. People who come up with the best 3-5 ideas that involve the use of machine learning will win Gmail accounts, and if we implement your idea you'll be acknowledged in both our paper and in Firefox credits. Your idea will also be appreciated by the millions of people who use Firefox. We'll also entertain Thunderbird proposals. See my weblog post for more details; I'll read all comments posted in response to this story or to my weblog."
Here are the best five ideas incorporating machine learning:
1. Based on the user's browsing habits, automatically bookmark the most frequently visited sites, and automatically put them into *multiple* categories (not just one category) to make them easy to find.
2. Create a full-text index in real time of every page that has been browsed. When the user visits any web page, display a sidebar of "Related previously-viewed pages."
3. A Google-News-like consolidation feature for the user's most frequently visited news site, automatically highlighting stories of interest based on ones they've previously viewed.
4. Allow the user to select "Fewer images like this," "More images like this," "Less text like this," or "More text like this," and, using Bayesian or other similar filters, automatically block or highlight content. This could be used for blocking advertisements or highlighting certain key passages.
5. Allow the user to browse their own hard drive, and categorize content automatically ("this is a document about lambs" ... "this is a picture of a sunflower") and let them group and search for items, e.g. "Pictures like this" or "Documents about cats."
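Idea 2 above can be prototyped with a classic information-retrieval technique. The Python sketch below (the language and every class and method name are my own hypothetical choices) keeps an in-memory tf-idf index of visited pages and ranks them by cosine similarity to the current page. A real browser feature would need persistent storage and HTML-to-text extraction, both of which are assumed away here.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric words; a real index would also strip markup and stop words.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


class HistoryIndex:
    """Hypothetical in-memory index of previously viewed pages (URL -> term counts)."""

    def __init__(self):
        self.pages = {}            # url -> Counter of terms on that page
        self.doc_freq = Counter()  # term -> number of indexed pages containing it

    def add_page(self, url, text):
        if url in self.pages:
            return  # keep the sketch simple: each URL is indexed once
        terms = Counter(tokenize(text))
        self.pages[url] = terms
        self.doc_freq.update(terms.keys())

    def _weights(self, terms):
        n = len(self.pages)
        # tf-idf: term count times a smoothed inverse document frequency
        return {t: c * math.log((n + 1) / (self.doc_freq[t] + 1)) for t, c in terms.items()}

    def related(self, text, k=3, exclude=None):
        """Return up to k indexed URLs whose tf-idf vectors are most cosine-similar to text."""
        query = self._weights(Counter(tokenize(text)))
        qnorm = math.sqrt(sum(w * w for w in query.values()))
        scored = []
        for url, terms in self.pages.items():
            if url == exclude:
                continue
            vec = self._weights(terms)
            dot = sum(w * vec.get(t, 0.0) for t, w in query.items())
            norm = qnorm * math.sqrt(sum(w * w for w in vec.values()))
            if norm > 0 and dot > 0:
                scored.append((dot / norm, url))
        scored.sort(reverse=True)
        return [url for _, url in scored[:k]]
```

On each page load you would call `add_page(url, text)` and then `related(text, exclude=url)` to fill the sidebar. Recomputing every page vector per query is fine for a demo but would need caching for years of history.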
Please give my Gmail accounts to Gmail for the troops.
.... considering how much Google gave out to drop the prices on eBay.
:p
I suggest better prizes. Y'know, like a girlfriend? I'm sure lots of us Slashdotters would rather have one than a Gmail account.
Founder of Mirror Moon - Tsukihime Game Trans
- Pop-up management in modern browsers that provide this feature, although more efficient than in the past, is still not perfect. Adapt to which pop-ups a person normally uses.
- Content highlighting (especially in news sites). Learn what types of news articles / subjects a user is interested in, and highlight titles in news pages that suit the user.
- Accelerator for narrowband connections. Predict which pages the user is most likely to visit next, and start loading them while the user is still reading the previous page.
- Efficiently recognise scam sites? Protect users from fraudsters?
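The narrowband accelerator could start from something as simple as a first-order Markov model of the user's own clicks: count which page followed which, and prefetch the most frequent successors. This is a hedged Python sketch with hypothetical names; `min_prob` is an assumed cut-off so that unlikely pages don't waste a slow link.

```python
from collections import Counter, defaultdict


class NextPagePredictor:
    """First-order Markov model over the user's own navigation history (a sketch)."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # page -> Counter of pages visited next

    def observe(self, history):
        # Count each consecutive (from, to) pair in one browsing session.
        for current, nxt in zip(history, history[1:]):
            self.transitions[current][nxt] += 1

    def prefetch_candidates(self, page, k=2, min_prob=0.2):
        """Up to k likely next pages whose estimated probability is at least min_prob."""
        counts = self.transitions.get(page)
        if not counts:
            return []
        total = sum(counts.values())
        return [url for url, c in counts.most_common(k) if c / total >= min_prob]
```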
PS: Not machine learning, but the sole requirement I have for a browser (dunno if it's done in Firefox now, as I haven't used it for a long time): open a new tab by default rather than a new window, or at least provide the option.
Make it so you can open all links on a page in new tabs, and the browser will sort them by content.
Also, it would be awesome if using the internet were more like playing Fallout. That was a great game.
Since when has this country used intellectual elite as a pejorative term?
Make it so when the user hits the Page Down key, a horizontal line appears for a few seconds where the old bottom of the page was, then fades away. So when you're reading long sections of text and hit Page Down, your eye can quickly scan to where you left off.
Sick of people knocking on Gentoo's greatness in completely unrelated
(Undisclaimer: I do machine learning research at BYU.)
:)
Machine learning, in general, is getting computers to generalize based on data instances. The two main flavors are classification (inferring classifications of data instances based on previous instances) and regression (inferring a function based on input/output pairs).
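To make the two flavors concrete, here is a toy Python sketch (my choice of language; the comment names no specific algorithms): a one-nearest-neighbor classifier and an ordinary least-squares line fit. Both are standard textbook methods picked for illustration, and the function names are hypothetical.

```python
def classify_1nn(examples, x):
    """Classification: give x the label of the closest training instance."""
    # examples: list of (feature_vector, label) pairs
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(examples, key=lambda ex: dist2(ex[0], x))
    return label


def fit_line(pairs):
    """Regression: least-squares line y = a*x + b through (x, y) pairs; returns (a, b)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    a = sxy / sxx
    return a, my - a * mx
```

For example, `fit_line([(0, 1), (1, 3), (2, 5)])` recovers the slope 2 and intercept 1 from the input/output pairs.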
A lot of people incorporate artificial intelligence into the category "machine learning," though it's not strictly correct. Machine learning is more a branch of AI than anything. One way to keep them straight is to think AI = deduction, ML = induction. (That's vastly simplifying, but it helps to classify them roughly.)
I wonder which way the author leans? Could he possibly post to clarify his meaning?
You can do an awful lot with machine learning that you can't do with conventional techniques. You can often get great results for otherwise NP-hard problems. Slashdot had a story a while back about using machine learning to do mesh compression, in which their algorithm comes up with a close approximation to the real answer to an NP-hard problem in polynomial time.
I'm currently using it to interpolate 2D images, and kicking bicubic B-spline interpolation all to heck. (Paper pending...) The machine learning algorithm infers shapes from the pixels, and keeps edges sharp.
If I come up with an idea, I'll post it later. In the meantime: isn't Firefox supposed to be lean and mean?
I got my Linux laptop at System76.
There. Your most important feature that browsers never had: searchable bookmarks. Doesn't get much simpler than that. Am I the only one who thinks it's something every browser should have had a long time ago?
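Searchable bookmarks need no machine learning at all. A minimal Python sketch, assuming each bookmark is a dict with hypothetical `title`, `url`, and optional `folder` keys, matches a bookmark only if every query word appears somewhere in those fields:

```python
def search_bookmarks(bookmarks, query):
    """Return bookmarks whose title, URL, or folder contains every query word (case-insensitive)."""
    words = query.lower().split()
    hits = []
    for bm in bookmarks:
        haystack = " ".join((bm["title"], bm["url"], bm.get("folder", ""))).lower()
        if all(w in haystack for w in words):
            hits.append(bm)
    return hits
```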
Your pizza just the way you ought to have it.
Damn right!
Looks like these guys are just looking for a place to dump their thesis after they finish.
Thanks but no thanks.
A browser doesn't really need machine learning as far as I'm concerned.
If you want to waste a shitload of resources and bloat up some app, add machine learning to Emacs or something, but not my browser!
I hate it when anything software tries to "predict". I don't want it. Please make sure it has an OFF button. Seriously. Thank you.
The Firefox download manager should scan downloads for malicious spyware, stop the bad download(s) and warn the user of the danger posed by the file(s).
Now, because this has a lot of discussions, when I start typing basebal... I get a lot of URLs in the autocompletion field like http://www.baseballthinkfactory.org/files/primer/oracle/ or even unrelated baseball sites. So it's not uncommon for me to have to press the down arrow several times. A very useful application of machine learning would be to order the autocompletion possibilities so that my average number of down-arrow presses is minimized.
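The down-arrow objective can be written down directly. In this Python sketch (hypothetical names), probabilities are estimated from past visit counts of URLs containing the typed text, and the cost model assumes the first suggestion takes one press, the second two, and so on. Under that model, listing URLs in descending order of estimated probability minimizes the expected number of presses, because moving a more likely URL ahead of a less likely neighbor can only lower the sum.

```python
def match_probabilities(visit_counts, typed):
    """Estimate P(user wants url) from visit counts of URLs containing the typed text."""
    matches = {url: c for url, c in visit_counts.items() if typed in url}
    total = sum(matches.values())
    return {url: c / total for url, c in matches.items()} if total else {}


def expected_presses(ordering, probs):
    """Expected down-arrow presses when the entry at 0-based position i costs i + 1 presses."""
    return sum((i + 1) * probs[url] for i, url in enumerate(ordering))


def best_ordering(probs):
    # Most likely first: this ordering minimizes expected_presses under the cost model above.
    return sorted(probs, key=probs.get, reverse=True)
```

With probabilities `{"a": 0.1, "b": 0.6, "c": 0.3}`, the best ordering `["b", "c", "a"]` costs 1.5 presses on average, versus 2.2 for alphabetical order.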
Here's an example:
Bayesian filtering
Thunderbird wouldn't be the same without it. Does it drag your system to a halt? Nope.
I'd be awfully surprised if anything real CPU intensive would ever be installed into Firefox by default. Give these guys some credit.
Ironically, the word ironically is often used incorrectly.
Currently, if I start typing a URL in the address bar, it matches URLs alphabetically. This gets very annoying at times, especially if you accidentally type giigle.com instead of google.com and then it keeps on matching giigle.com for weeks when I type "g".
This problem can be fixed by using a frequency count with some time decay. For example, if I went to google.com 100 times within the last week and once to giigle.com, then match google.com on "g". If, however, I went to giigle.com 5 times recently, then match giigle.com.
While one might argue that this makes the algorithm unpredictable from the user's standpoint, in my experience people keep on typing until they see the correct match. So, this way they'll see the right match sooner on average.
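One common way to get a "frequency count with some time decay" is an exponentially decaying visit score, where each visit contributes 0.5 raised to (age / half-life). The Python sketch below assumes visit times measured in days and a seven-day half-life; both are my assumptions rather than values from the comment, and the function names are hypothetical.

```python
HALF_LIFE_DAYS = 7.0  # assumed tuning constant, not from the comment


def decayed_score(visit_times, now, half_life=HALF_LIFE_DAYS):
    """Sum over visits of 0.5 ** (age / half_life); times are in days."""
    return sum(0.5 ** ((now - t) / half_life) for t in visit_times)


def best_match(typed, history, now):
    """history maps URL -> list of visit times (days). Best-scoring URL starting with typed, or None."""
    candidates = [u for u in history if u.startswith(typed)]
    return max(candidates, key=lambda u: decayed_score(history[u], now), default=None)
```

A single typo visit decays away in a few weeks, while a burst of recent visits can overtake a larger number of old ones.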
"You mortals are so obtuse." -Q
(Clippo, from Office, featured in Firefox...) Clippo: It looks like you're browsing pornography. You also appear to be typing with your left hand. Would you like to enable the spellchecker?
Speaking of Bayesian filtering, I'd like some form of cleverer guessing as to where my next bookmark goes in my eclectic collection of bookmarks. Sample relatively unique keywords in pages as they are bookmarked, weight them toward bookmark folder baskets, bingo.
Avoid more sophisticated algorithms that infer a sorting methodology the same as the developer's, however. Maybe I have a Programming folder which has C in it, and so you'd infer that all characteristics of matches to Programming are inherited by C, if that's the sort of sorter you are; that fits with you, me, and program-think, so that's right? Right? Except perhaps I'm a university student who has a University folder, and I'm studying Java, whose extrinsic attribute prioritizes sorting it into that group... so you'd end up with a word-weighting argument between the superclass Programming, which is wrong, and Java, which is right.
Let me be clear. This suggests nothing at all about helping the user organize their bookmarks - everyone has their own system (although perhaps a Bayesian category guesser would be a separate fun feature). This suggestion is merely better guessing of first suggested folder when I CTRL-D.
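The Ctrl-D folder guess is a natural fit for multinomial naive Bayes, the same family of technique as Bayesian spam filtering. This Python sketch is illustrative only: keyword extraction is assumed to happen elsewhere, and the class and method names are hypothetical. Each folder is treated as an independent label, so nothing is inherited from a parent folder, which sidesteps the Programming-versus-University problem described above.

```python
import math
from collections import Counter, defaultdict


class FolderGuesser:
    """Multinomial naive Bayes over page keywords; suggests a folder for Ctrl-D (a sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> keyword counts
        self.bookmarks_in = Counter()            # folder -> number of bookmarks filed there
        self.vocab = set()

    def learn(self, folder, keywords):
        self.word_counts[folder].update(keywords)
        self.bookmarks_in[folder] += 1
        self.vocab.update(keywords)

    def guess(self, keywords):
        """Most probable folder for a page with these keywords, or None if nothing learned yet."""
        if not self.bookmarks_in:
            return None
        total = sum(self.bookmarks_in.values())
        v = len(self.vocab)

        def log_posterior(folder):
            counts = self.word_counts[folder]
            n = sum(counts.values())
            score = math.log(self.bookmarks_in[folder] / total)  # folder prior
            # Laplace (add-one) smoothing so an unseen keyword doesn't zero out a folder
            for w in keywords:
                score += math.log((counts[w] + 1) / (n + v))
            return score

        return max(self.bookmarks_in, key=log_posterior)
```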
Often masses of information are broken into multi page presentations.
Somewhere on the page you have buttons named things like Next, Previous, or Page: 1 2 3 4 5 6.
There may be good design rules for positioning these elements but often they are not followed.
I've found many instances where I have to scroll up or down just to find the Next button so that I can click it.
It should be possible to learn for a given site (or sub-tree of a site) what the Next and Previous buttons are just from user behavior and the nearly identical layout of say page 2 to page 3. I think this could be done without parsing any of the html or gifs associated with the buttons.
If Firefox could learn and extract multi-page navigation then these functions could be bound to buttons up on the menu bar, or assigned to keys, and the whole problem of scrolling to find a Next would go away.
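A hedged Python sketch of the behavioral approach: count, per site, which link the user clicks when the next page comes back with a nearly identical layout, and promote the most frequent one to a toolbar button once it has enough votes. How `link_id` (for example, the link's position on the page) and `layout_similarity` are computed is left as an assumption; both names, and the thresholds, are hypothetical.

```python
from collections import Counter, defaultdict


class NextLinkLearner:
    """Per-site vote on which link acts as "Next" (a sketch; layout_similarity is assumed given)."""

    def __init__(self, similarity_threshold=0.9, min_votes=2):
        self.votes = defaultdict(Counter)  # site -> Counter of link identifiers
        self.similarity_threshold = similarity_threshold
        self.min_votes = min_votes

    def observe_click(self, site, link_id, layout_similarity):
        # Only clicks that land on a near-identical layout count as paging clicks.
        if layout_similarity >= self.similarity_threshold:
            self.votes[site][link_id] += 1

    def next_link(self, site):
        """The learned "Next" link for this site, or None until it has min_votes votes."""
        counts = self.votes.get(site)
        if not counts:
            return None
        link_id, n = counts.most_common(1)[0]
        return link_id if n >= self.min_votes else None
```

"Previous" could be learned the same way from clicks that lead back to an already-seen page with the same layout.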
There are three types of sites in the world:
1. Those that use Flash for ads
2. Those that use Flash for content
3. Those that stay the hell away from Flash
Right now, Firefox has no way to tell the difference between 1 and 2. But I can; I can clearly see whether it's an ad or not. On every Flash ad, give me the option to tell the browser whether it's good Flash or bad Flash, and have it intelligently learn which sites have bad Flash ("sites" also being defined by study of the URLs: if I say www.bob.com/~jimbo/whatever.htm, www.john.com/~jimbo/howie.htm and www.curly.com/~jimbo/marthastewart.html are bad, it should figure out that there is a commonality in the ~jimbo part and apply my preference). It should then block Flash content on those sites, presenting me instead with a button that allows that content to load.
It should use a number of pieces of information: the URL of the page, the URL of the Flash animation, the size of the animation, the name of the animation, the server the page is being served from, etc.
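The good-Flash/bad-Flash feedback loop maps onto an online linear classifier. The Python sketch below uses a plain perceptron, which is my choice of technique rather than anything the comment specifies, over string features drawn from the list above. Page path segments become features, so the ~jimbo commonality can carry over to hosts the user never labeled. The feature names and class are hypothetical.

```python
from urllib.parse import urlparse


def features(page_url, flash_url, width, height):
    """Turn one Flash embed into a set of string features (hypothetical feature set)."""
    page, flash = urlparse(page_url), urlparse(flash_url)
    feats = {
        "page_host:" + page.hostname,
        "flash_host:" + flash.hostname,
        "size:%dx%d" % (width, height),
        "flash_name:" + flash.path.rsplit("/", 1)[-1],
    }
    # Directory segments such as "~jimbo" let the model generalize across different hosts.
    feats.update("seg:" + s for s in page.path.split("/")[:-1] if s)
    return feats


class FlashFilter:
    """Online perceptron: label bad=True for ad Flash, bad=False for content Flash."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, feats):
        return self.bias + sum(self.weights.get(f, 0.0) for f in feats)

    def is_bad(self, feats):
        return self.score(feats) > 0

    def learn(self, feats, bad):
        y = 1 if bad else -1
        # Perceptron rule: update only when the current prediction is wrong or undecided.
        if y * self.score(feats) <= 0:
            for f in feats:
                self.weights[f] = self.weights.get(f, 0.0) + y
            self.bias += y
```

Unknown Flash scores zero and is allowed by default; a "load this content" button would still be shown for anything classified as bad.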
"1. Keep track of how users enlarge/reduce the font size: if sites that use a 10 point font are repeatedly enlarged to 14 or 16 point then it is fairly safe to assume that the user has poor eyesight and all sites with tiny text should automatically be sized up."
This is a good concept in several ways.
First, what most people with eyesight limitations do is adjust the really severe problem text and put up with the less severe sorts, so if they enlarge 10 point to 16 consistently, they enlarge 12 to 16 only late in a browsing session, and just put up with 14 point type even though it's a bit smaller than optimum for them. People will make an effort only when the threshold of discomfort is crossed and the problem gets their conscious attention, and many people will put up with a problem beyond that.
Second, it's a clearly quantifiable area, making it the sort of thing machines can excel at. If it turns out to have unexpected complexities, we will get a warning about how much harder other tasks, such as adjusting web sites based on the user's color preferences or aesthetic criteria (no plaid backgrounds), will be.
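A minimal Python sketch of the font-size idea, under assumptions of my own: a size counts as habitually enlarged once it has been seen at least three times and enlarged more than half the time, and any size at or below the largest habitually enlarged size is bumped to the median size the user chose. Sizes the user merely puts up with, like the 14-point case above, stay untouched. The class name and thresholds are hypothetical.

```python
from collections import defaultdict
from statistics import median


class FontSizeLearner:
    """Learns which base font sizes the user habitually enlarges (a sketch)."""

    def __init__(self, min_observations=3, enlarge_rate=0.5):
        self.seen = defaultdict(int)      # original size (pt) -> times observed
        self.targets = defaultdict(list)  # original size (pt) -> sizes the user enlarged it to
        self.min_observations = min_observations
        self.enlarge_rate = enlarge_rate

    def record(self, original_pt, chosen_pt):
        # chosen_pt == original_pt means the user left the text alone
        self.seen[original_pt] += 1
        if chosen_pt > original_pt:
            self.targets[original_pt].append(chosen_pt)

    def _habitually_enlarged(self, size):
        n = self.seen[size]
        return n >= self.min_observations and len(self.targets[size]) / n > self.enlarge_rate

    def suggest(self, original_pt):
        """Size to render text originally at original_pt."""
        habitual = [s for s in self.seen if self._habitually_enlarged(s)]
        if not habitual or original_pt > max(habitual):
            return original_pt  # sizes the user tolerates are left alone
        chosen = [t for s in habitual for t in self.targets[s]]
        return max(original_pt, median(chosen))
```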
Who is John Cabal?
Don't ask me if I want to remember a username/password combo until AFTER the login has been successful.
Spoon not. Fork, or fork not. There is no spoon.