Incorporating Machine Learning into Firefox 2.0?
blakeross asks: "I will be doing research this summer at Stanford with Professor Andrew Ng about how we can incorporate machine learning into Firefox. As we work to finish up Firefox 1.0, we're also seeking ideas that will make Firefox 2.0 blow every other browser out of the water. People who come up with the best 3-5 ideas that involve the use of machine learning will win Gmail accounts, and if we implement your idea you'll be acknowledged in both our paper and in Firefox credits. Your idea will also be appreciated by the millions of people who use Firefox. We'll also entertain Thunderbird proposals. See my weblog post for more details; I'll read all comments posted in response to this story or to my weblog."
Here are the best five ideas incorporating machine learning:
1. Based on the user's browsing habits, automatically bookmark the most frequently visited sites, and automatically put them into *multiple* categories (not just one category) to make them easy to find.
2. Create a full-text index in real-time of every page that has been browsed. When the user visits any web page, display a sidebar of "Related previously-viewed pages."
3. A Google-News-like consolidation feature for the user's most-frequently visited news site, automatically highlighting stories of interest based on ones they've previously viewed.
4. Allow the user to select "Fewer images like this" or "More images like this" or "Less text like this" and "More text like this" and, using Bayesian or other similar filters, automatically block or highlight content. This could block advertisements or highlight certain key passages.
5. Allow the user to browse their own hard drive, and categorize content automatically ("this is a document about lambs", "this is a picture of a sunflower") and let them group and search for items. E.g. "Pictures like this" or "Documents about cats."
Please give my Gmail accounts to Gmail for the troops.
- Pop-up management in the modern browsers that provide it is more efficient than in the past, but still not perfect. Adapt to which pop-ups a person normally uses.
- Content highlighting (especially in news sites). Learn what types of news articles / subjects a user is interested in, and highlight titles in news pages that suit the user.
- Accelerator for narrowband connections. Predict which pages the user is most likely to visit next, and start loading them while the user is still reading the previous page.
- Efficiently recognise scam sites? Protect users from fraudsters?
PS: Not machine learning, but my sole requirement for a browser (dunno if it's done in Firefox now as I haven't used it for a long time): Open a new tab by default rather than a new window, or at least provide the option.
Make it so when the user hits the Page Down key, a horizontal line appears for a few seconds where the old bottom of the page was, then fades away. So when you're reading long sections of text and hit Page Down, your eye can quickly scan to where you left off.
Sick of people knocking on Gentoo's greatness in completely unrelated
There. Your most important feature that browsers never had. Searchable bookmarks. Doesn't get much simpler than that. Am I the only one who thinks it's something every browser should have had a long time ago?
Your pizza just the way you ought to have it.
How about this: How about browsing the filesystem using tabs?
So for example, in one Firefox window you see the contents of your hard drive (or network folders) pretty much the same way as the Windows Explorer or Gnome/KDE/MacOSX show it to you today BUT if you click the middle mouse button on a directory (or select "open in new tab") you end up with the new directory being open in a new tab.
Think about it: how many windows do you usually have open browsing your filesystem? With this thing you have ALL those windows in the same window organized by tabs, PLUS you also have all your websites on tabs right alongside the filesystem tabs!
And here's another kicker: You can bookmark a group of filesystem browser tabs and later go back to them. You can even drag the group of bookmark tabs to the desktop so that when you double-click on it Firefox opens up all of them at once.
This should all be done with host filesystem integration so that you can drag-and-drop files between the Firefox filesystem view and the normal host OS desktop.
I don't know if this has been posted, but I'll give it a shot...
I've accumulated well over a thousand bookmarks and have been much too lazy to organize them into folders. If you could automatically cluster bookmarks (http://vivisimo.com/ does this with web results) I would be eternally grateful.
One more suggestion is to learn usage patterns in a particular website. For example, when I go to http://www.nytimes.com, I generally click on the opinion section. If the browser could anticipate that I typically go to the opinion section, it could start to preload it before I click on it.
I realize the latter suggestion is much easier to implement than the former, but the clustering would be very useful for lazy surfers like me.
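To make the preloading idea a bit more concrete, here is a minimal sketch that just counts which page the user opens after each page and only suggests a preload when one destination clearly dominates. The class name, the thresholds and the opinion-page URL are all made up for illustration; nothing here is Firefox code.

```python
from collections import Counter, defaultdict


class NextPagePredictor:
    """Counts observed page-to-page clicks and predicts the likeliest next page."""

    def __init__(self, min_visits=3, min_share=0.5):
        # Both thresholds are arbitrary assumptions, not tuned values.
        self.transitions = defaultdict(Counter)  # current URL -> Counter of next URLs
        self.min_visits = min_visits
        self.min_share = min_share

    def record(self, current_url, next_url):
        """Remember that the user went from current_url to next_url."""
        self.transitions[current_url][next_url] += 1

    def predict(self, current_url):
        """Return a URL worth preloading, or None if no habit is clear yet."""
        counts = self.transitions.get(current_url)
        if not counts:
            return None
        total = sum(counts.values())
        url, hits = counts.most_common(1)[0]
        if total >= self.min_visits and hits / total >= self.min_share:
            return url
        return None


# Hypothetical history: four clicks to an (invented) opinion URL, one elsewhere.
predictor = NextPagePredictor()
for _ in range(4):
    predictor.record("http://www.nytimes.com", "http://www.nytimes.com/opinion")
predictor.record("http://www.nytimes.com", "http://www.nytimes.com/sports")
suggestion = predictor.predict("http://www.nytimes.com")  # the opinion URL
```

Requiring a minimum number of visits and a minimum share keeps the browser from wasting narrowband bandwidth on guesses it has little evidence for.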
If you're a fan of women, add me to your friends list.
The firefox download manager should scan downloads for malicious spyware, stop the bad download(s) and warn the user of the danger posed by the file(s).
Now, because this has a lot of discussions, when I start typing basebal... I get a lot of urls in the autocompletion field like http://www.baseballthinkfactory.org/files/primer/oracle/ or even unrelated baseball sites. So it's not uncommon for me to have to press downarrow several times. A very useful application of machine learning would be to order the autocompletion possibilities so that my average number of downarrow presses is minimized.
Currently, if I start typing a URL in the address bar, it matches URLs alphabetically. This gets very annoying at times, especially if I accidentally type giigle.com instead of google.com and then it keeps on matching giigle.com for weeks when I type "g".
This problem can be fixed by using a frequency count with some time decay. For example, if I went to google.com 100 times within the last week and once to giigle.com, then match to google.com on "g". If, however, I went to giigle.com 5 times recently, then match to giigle.com.
While one might argue that this makes the algorithm unpredictable from the user's standpoint, in my experience people keep on typing until they see the correct match. So, this way they'll see the right match sooner on average.
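A rough sketch of what "frequency count with some time decay" could look like: every past visit contributes a weight that halves after a fixed number of days, and the address bar offers the matching host with the highest total. The one-week half-life, the function names and the sample history are assumptions for illustration only.

```python
HALF_LIFE_DAYS = 7.0  # assumption: a visit loses half its weight every week


def decayed_score(visit_ages_days, half_life=HALF_LIFE_DAYS):
    """Sum of per-visit weights, where a visit `age` days old counts 0.5 ** (age / half_life)."""
    return sum(0.5 ** (age / half_life) for age in visit_ages_days)


def best_match(prefix, history, half_life=HALF_LIFE_DAYS):
    """Return the host starting with `prefix` that has the highest decayed score.

    `history` maps a host name to a list of visit ages in days.
    """
    candidates = [host for host in history if host.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda host: decayed_score(history[host], half_life))


# 100 visits to google.com spread over the last week, one typo visit 3 days ago.
recent_habit = {
    "google.com": [i % 7 for i in range(100)],
    "giigle.com": [3.0],
}
first = best_match("g", recent_habit)  # "google.com"

# The google.com visits are now two months old, and giigle.com was visited 5 times today.
changed_habit = {
    "google.com": [60.0] * 100,
    "giigle.com": [0.0] * 5,
}
second = best_match("g", changed_habit)  # "giigle.com"
```

The half-life is the knob that decides how quickly an old habit gives way to a new one, so it is the part most worth tuning against real histories.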
"You mortals are so obtuse." -Q
Speaking of Bayesian filtering, how about some form of cleverer guessing as to where my next bookmark in my eclectic collection of bookmarks goes? Sample relatively unique keywords in pages as they are bookmarked, weight them towards bookmark folder baskets, bingo.
Avoid more sophisticated algorithms that infer a sorting methodology the same as the developer's, however. Maybe I have a Programming folder which has C in it, and so you'd infer that all characteristics of matches to Programming are inherited by C, if that's the sort of sorter you are, and that fits with you, me, and program-think, so that's right? Right? Except perhaps I'm a university student who has a University folder, and I'm studying Java, whose extrinsic attribute prioritizes sorting it into that group... so you'd end up with a word-weighting argument between superclass Programming, which is wrong, and Java, which is right.
Let me be clear. This suggests nothing at all about helping the user organize their bookmarks - everyone has their own system (although perhaps a Bayesian category guesser would be a separate fun feature). This suggestion is merely better guessing of the first suggested folder when I CTRL-D.
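For what it's worth, the keyword-weighting idea fits a plain naive Bayes classifier over the words of pages already filed in each folder. The sketch below is only that: the class name, the toy word lists and the add-one smoothing are assumptions, and it treats folders as flat labels, so it makes no guess about how the user nests them.

```python
import math
from collections import Counter, defaultdict


class FolderGuesser:
    """Naive Bayes guess of which existing folder a new bookmark belongs in."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.bookmark_counts = Counter()         # folder -> number of bookmarks
        self.vocabulary = set()

    def train(self, folder, words):
        """Record the keywords of a page the user filed under `folder`."""
        self.bookmark_counts[folder] += 1
        self.word_counts[folder].update(words)
        self.vocabulary.update(words)

    def guess(self, words):
        """Return the folder with the highest log-probability, or None if untrained."""
        total = sum(self.bookmark_counts.values())
        if total == 0:
            return None
        best_folder, best_score = None, -math.inf
        for folder, n in self.bookmark_counts.items():
            score = math.log(n / total)  # prior: how often this folder is used
            folder_words = sum(self.word_counts[folder].values())
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not zero out a folder.
                seen = self.word_counts[folder][word]
                score += math.log((seen + 1) / (folder_words + len(self.vocabulary)))
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder


# Toy training data, invented for illustration.
guesser = FolderGuesser()
guesser.train("Programming", ["c", "pointers", "compiler"])
guesser.train("University", ["java", "lecture", "assignment"])
suggested = guesser.guess(["java", "assignment"])  # "University"
```

Because the model only learns from where this user actually filed things, a Java page lands in University for the student above and would land in Programming for someone who files it there.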
Often masses of information are broken into multi-page presentations.
Somewhere on the page you have buttons named things like Next, Previous, or Page: 1 2 3 4 5 6.
There may be good design rules for positioning these elements but often they are not followed.
I've found many instances where I have to scroll up or down just to find the Next button so that I can click it.
It should be possible to learn for a given site (or sub-tree of a site) what the Next and Previous buttons are just from user behavior and the nearly identical layout of, say, page 2 and page 3. I think this could be done without parsing any of the HTML or GIFs associated with the buttons.
If Firefox could learn and extract multi-page navigation then these functions could be bound to buttons up on the menu bar, or assigned to keys, and the whole problem of scrolling to find a Next would go away.
There are three types of sites in the world:
Those that use flash for ads
Those that use flash for content
Those that stay the hell away from flash
Right now, Firefox doesn't have any way to tell the difference between 1 and 2. But I do: I can clearly see if it's an ad or not. On every flash ad, give me the option to tell the browser it's good flash or bad flash, and have it intelligently learn which sites have bad flash and block flash content on those sites, instead presenting me with a button to load that content. "Sites" should also be defined by study of the URLs: if I say www.bob.com/~jimbo/whatever.htm and www.john.com/~jimbo/howie.htm and www.curly.com/~jimbo/marthastewart.html are bad, it should figure out there is a commonality in the ~jimbo part and apply my preference.
It should use a number of pieces of information: the URL of the page, the URL of the flash animation, the size of the animation, the name of the animation, the server the page is being served from, etc.
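One way this might be sketched: turn each flash animation into a set of features (page host, flash host, path parts, file name, size), keep good/bad vote counts per feature, and block when the summed log-odds lean bad. The feature names, the scoring rule, the .swf URLs and the sizes are invented for the example; only the three ~jimbo page addresses come from the post above.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def features(page_url, flash_url, width, height):
    """Describe one flash animation as a set of string features."""
    page, flash = urlparse(page_url), urlparse(flash_url)
    flash_name = flash.path.rsplit("/", 1)[-1]
    feats = {
        f"page_host:{page.hostname}",
        f"flash_host:{flash.hostname}",
        f"flash_name:{flash_name}",
        f"size:{width}x{height}",
    }
    # Each path segment of the page URL (such as "~jimbo") is its own feature.
    feats.update(f"path_part:{part}" for part in page.path.split("/") if part)
    return feats


class FlashJudge:
    """Keeps per-feature good/bad votes and scores new animations by summed log-odds."""

    def __init__(self):
        self.bad = Counter()
        self.good = Counter()

    def mark(self, feats, is_bad):
        """Record the user's verdict for every feature of one animation."""
        (self.bad if is_bad else self.good).update(feats)

    def looks_bad(self, feats):
        """True when the features lean bad; unseen features contribute nothing."""
        score = sum(math.log((self.bad[f] + 1) / (self.good[f] + 1)) for f in feats)
        return score > 0


judge = FlashJudge()
marked_bad = [
    ("http://www.bob.com/~jimbo/whatever.htm", "http://www.bob.com/~jimbo/a.swf", 468, 60),
    ("http://www.john.com/~jimbo/howie.htm", "http://www.john.com/~jimbo/b.swf", 120, 600),
    ("http://www.curly.com/~jimbo/marthastewart.html", "http://www.curly.com/~jimbo/c.swf", 728, 90),
]
for page_url, flash_url, w, h in marked_bad:
    judge.mark(features(page_url, flash_url, w, h), is_bad=True)

# A host never seen before, but it shares the "~jimbo" path part.
blocked = judge.looks_bad(
    features("http://www.moe.com/~jimbo/index.html", "http://www.moe.com/~jimbo/d.swf", 300, 250)
)  # True
```

Anything the judge blocks would still get the click-to-load button, so a wrong guess costs the user one click rather than the content.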
Smarter Front Page: Making use of the first thing that the user sees when starting up.
How about creating an interface for the default page for Firebird? Instead of pointing to the Mozilla.org homepage, create a default front page designed to evolve with the habits of the user. Whatever way you want to utilize machine learning, you will need a centralized location to access the results, so why not use "Home"? That is, create a simple interface (XUL, not HTML) that points the user to their most visited bookmarks, or a categorised and searchable list of their bookmarks (or the internet), or tells the user when their most visited sites are updated, aggregating information from sites based on their own browsing habits in a single interface when the browser starts up.
Also, if the user uses Thunderbird or Sunbird, the front page could notify the user of new e-mails and new appointments. It would be a front page that is customizable to the needs of the user, that avoids the clutter and ads of commercial sites, and that is local on the user's computer and not centralized on a website. Most importantly, it would make the individual user's own data most intuitively accessible to themselves, and evolve to fit the individual user.
"1. Keep track of how users enlarge/reduce the font size: if sites that use a 10 point font are repeatedly enlarged to 14 or 16 point then it is fairly safe to assume that the user has poor eyesight and all sites with tiny text should automatically be sized up."
This is a good concept in several ways.
First, what most people with eyesight limitations do is adjust the really severe problem text and put up with the less severe sorts, so if they enlarge 10 point to 16 consistently, they enlarge 12 to 16 only late in a browsing session, and just put up with 14 point type even though it's a bit smaller than optimum for them. People will go to an effort only when the threshold of discomfort is crossed and the problem gets their conscious attention, and many people will put up with a problem beyond that.
Second, it's a clearly quantifiable area, making it the sort of thing machines can excel at. If it turns out to have unexpected complexities, we will get a warning about how much worse other tasks, such as adjusting web sites based on the user's color preference or aesthetic criteria, will be (no plaid backgrounds).
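As a small illustration of how quantifiable this is, the sketch below takes the sizes the user has enlarged text to, uses their median as a learned comfortable minimum, and then bumps any smaller text up to it, including the 14 point text people tend to just put up with. The function names, the minimum sample count and the use of the median are assumptions, not anything Firefox does.

```python
from statistics import median


def preferred_min_size(adjustments, min_samples=3):
    """Estimate a comfortable minimum point size from (original_pt, chosen_pt) pairs.

    Only enlargements count as evidence; returns None until there are enough of them.
    """
    enlarged = [chosen for original, chosen in adjustments if chosen > original]
    if len(enlarged) < min_samples:
        return None
    return median(enlarged)


def display_size(original_pt, min_size):
    """Size to render text at: never below the learned minimum, never shrunk."""
    if min_size is None:
        return original_pt
    return max(original_pt, min_size)


# Invented history: 10 point text enlarged to 16 or 14, 12 point enlarged to 16 once.
history = [(10, 16), (10, 16), (12, 16), (10, 14)]
learned = preferred_min_size(history)  # 16.0
bumped = display_size(14, learned)     # 16.0, although the user never bothered to enlarge 14 point
untouched = display_size(18, learned)  # 18
```

The median is used so that a single unusually large or small adjustment does not drag the learned size around.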
Who is John Cabal?
Here's my idea: Use machine learning to figure out the most-often-used code paths, and thereby allow users to optimize their browsers by removing or unloading from memory the least-needed functions, or rearranging the code to allow fastest access to those functions... Maybe you could still leave stubs in the UI so that code could be dynamically loaded if someone eventually decided to use something they rarely used. A self-optimizing browser that sheds the code you don't give a crap about. Now that would rock.
I would want a credibility rating on web pages.
There is a lot of information on the web but almost no way to verify the data. I would like a way for people to report the credibility of the information contained on a web page.
This is especially important with news reporting and double-extra-especially important in times of war.
It is also all too simple for politicians, journalists and other people of power to repeat the same old lies over and over.
The memory of media is short. A Truthalizer would help make it a bit longer.