Microsoft's Search Engine Plans
prostoalex writes "Andy Beal from SearchEngineGuide.com interviews Robert Scoble from Microsoft. Scoble tells the audience what current search technologies Microsoft is working on as part of its Longhorn/WinFS development as well as in the field of Internet. Scoble also discusses current problems with local drive and Internet searching, such as absence of metadata for a lot of files out there: "When I take pictures off of my Nikon, they have some metadata (for instance, inside the file is the date it was taken, along with the exposure information) but that metadata isn't useful for most human searches. For instance, how about if I wanted to search for "my wedding photos?" Neither X1, nor Windows XP's built in search would find your wedding photos. Why? Because they have useless names like DSC0001.jpg and there's no metadata that says they are wedding photos.""
I can get around searching for "wedding photos" because I remember the date. 3 special days, and hundreds of wedding photos appear.
It's part of being human that we don't necessarily remember the phrase "wedding photos" but we may remember many other tiny pieces of data about a shoot that are unique to us, and the time and date are among those. I can be certain the post-9pm photos taken on those days are pretty embarrassing.
Just concentrating on "Wedding Photos" is useful if someone else is searching my picture archive, but that's not useful to me.
nude geekgrrls
I think they already are...
For a long time my site wasn't ranked on Google, MSN, or Yahoo! search. Then one day I was on the first page for Google. Amazingly enough, I was in exactly the same place on MSN and Yahoo searches. They all supposedly have their own crawlers, so why was it not until I was listed on Google that I was listed on the rest? Just a theory I have... it probably means nothing.
"Wisdom is not a product of schooling but of the life-long attempt to acquire it." -Albert Einstein
Even easier than putting them into directories is using a portfolio-type application, like Picasa (the original version of Apple's iPhoto btw), which allows simple drag-and-drop library creation. You can have pictures in multiple libraries; it takes just a few moments to drop photos into their correct places, and they are sorted as need be. If you want wedding photos, look in there; if you want photos of Janine, Kate or Benson, look in their respective folders.
It doesn't need to be a morass of embedded folder after folder either, as humans have mental acuity, unlike a computer. You may have Uncle Bob, who is photographed a lot, and Auntie Beryl, who isn't, but you may know that all the photos of Beryl also contain Bob. We can store a surprising amount of information, and perhaps 5 to 10 libraries is all you will need for most people's collections.
Special occasions get their own. It just takes moments after downloading the photos.
Phil Greenspun has a similar idea and is looking for help on how to accomplish this on a personal level with the existing Windows XP filesystem. Check out his blog post for details. There's already an interesting discussion taking place in the comments for that post.
And even better, many photo programs allow batch renames. So while you're putting them in the wedding folder, rename them all to wedding####.jpg and let the program automatically append numbers.
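For anyone whose photo program lacks a batch rename, here is a minimal Python sketch of the same idea. The function name `batch_rename`, the `wedding` prefix, and the four-digit counter are just illustrative assumptions; it refuses to overwrite a file that already has the target name.

```python
from pathlib import Path

def batch_rename(folder, prefix="wedding", digits=4, ext=".jpg"):
    """Rename every `ext` file in `folder` to prefix + zero-padded counter."""
    folder = Path(folder)
    # Sort by name so the numbering follows the camera's order (DSC0001, DSC0002, ...).
    photos = sorted(p for p in folder.iterdir() if p.suffix.lower() == ext)
    renamed = []
    for number, photo in enumerate(photos, start=1):
        target = folder / f"{prefix}{number:0{digits}d}{ext}"
        # Stop rather than clobber a different file that already uses this name.
        if target.exists() and target != photo:
            raise FileExistsError(target)
        photo.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling `batch_rename("photos/wedding")` on a folder of DSC####.jpg files would leave them as wedding0001.jpg, wedding0002.jpg, and so on; files with other extensions are left alone.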
Reminds me of Scotty's line, "The more they overthink the plumbing, the easier it is to stop up the drains." They've developed a complex solution for a simple problem that already had a simple solution.
While a database driven file system with the ability to let users define their own metadata fields in the database sounds really, really cool, I won't be using Microsoft's first or second version for anything I value.
So what's the status of the *nix version of a database file system?
The closest thing to a workable scheme is Gelernter's Lifestream stuff -- where your system knows that you got married on a certain date (even if you have trouble remembering it) and that documents (JPEGs, Word files, GnuCash transactions) from that time probably pertain to it.
What I'm listening to now on Pandora...
Judging from the interview, the "innovative" Longhorn seems to allow you to add metadata in a slightly user-friendly way. But virtually nobody will use it, except maybe to mark a few important files which you have stored in a special place anyway.
So what would be a better solution then? My idea is that metadata should be added automatically. For instance, a human will recognize most wedding photos for what they are. Getting a computer to recognize this is not trivial, but lots of research is currently invested in this. Already computers can easily recognize general categories ("groups of people", "nature", "animal", "portrait"). My guess is that it is already possible to implement a system that you can train to let the computer recognize your particular brand of photos.
I don't expect Microsoft to go down this path of innovation. They will probably wait until an entrepreneur develops it and then copy it or buy them out.
There should be no way to just click the "OK" button without having entered something. Or you could make Photi come back every 5 minutes saying "Lizten man, if you don't giff me ze names right now, I'll notify the authorities!! We haff ways to make you talk!!!"
I store them by date photographed, using ThumbsPlus to view thumbnails and metadata stored in a database. So far, it's worked out for the 45Gb of photos I've taken in the past 5 years.
--Mike--
PS: Yes, I'll chat with and give ideas to anyone who wants to make this better... even Microsoft.
Hmm... the story's up for a few minutes, and of course the Slashdot luddites have piped up with comments saying "i just need to put them in a folder. stupid microsoft."
The point is folders only allow a single hierarchy of data. Sure you can make a Wedding Photos folder. But what if you also want a folder with all the pictures of Uncle Bob from multiple events, a folder with 5-star photos from multiple events, a folder with night photos, a folder with wild partying photos, and a folder with photos of centerpieces?
The Longhorn WinFS will allow you to make queries saying "show me all the photos with Uncle Bob (from my mom's side) and Aunt Jane (from my dad's) that were taken in daylight at formal special events in the last two years that I've rated with 4 stars or more." This cannot be done with modern file systems (unless you want to use some stupid non-standard awkward file naming system that you think covers every possibility), although it can be done with other software (e.g. Photoshop Album). That assumes you maintain the metadata, which in Photoshop Album, for example, is a simple drag-and-drop operation.
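To make that kind of query concrete, here is a rough sketch using Python's built-in sqlite3 module. This is not the WinFS API (nothing here comes from Microsoft); the `photo` and `person_tag` tables, their columns, and the function names are all made up for illustration, and dates are assumed to be stored as ISO strings so they compare correctly as text.

```python
import sqlite3

def build_catalog(photos, tags):
    """photos: (path, taken, rating, daylight, event) rows; tags: (path, person) rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE photo (path TEXT PRIMARY KEY, taken TEXT, "
               "rating INTEGER, daylight INTEGER, event TEXT)")
    db.execute("CREATE TABLE person_tag (path TEXT, person TEXT)")
    db.executemany("INSERT INTO photo VALUES (?, ?, ?, ?, ?)", photos)
    db.executemany("INSERT INTO person_tag VALUES (?, ?)", tags)
    return db

def find_photos(db, people, since, min_rating=4, event="formal"):
    """Daylight photos of `event` type since `since` that show ALL the given people."""
    placeholders = ",".join("?" for _ in people)
    sql = f"""
        SELECT p.path FROM photo p
        JOIN person_tag t ON t.path = p.path
        WHERE t.person IN ({placeholders})
          AND p.daylight = 1 AND p.event = ? AND p.rating >= ? AND p.taken >= ?
        GROUP BY p.path
        HAVING COUNT(DISTINCT t.person) = ?   -- every requested person is tagged
        ORDER BY p.path
    """
    args = [*people, event, min_rating, since, len(people)]
    return [row[0] for row in db.execute(sql, args)]
```

Something like `find_photos(db, ["Uncle Bob", "Aunt Jane"], since="2003-01-01")` then answers the Uncle Bob and Aunt Jane question, which a single folder hierarchy cannot do without duplicating files.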
The trick is that incorporating it into the file system means you don't have to reinvent the wheel. The metadata technology used for the photos can be used when you're writing, say, a music cataloging application (artist, genre, rating, keywords, composer, publication date, length) or a document repository (client, project, document type, importance, length) or a cataloging application for the terabytes of video files we're all going to have one day.
It is, needless to say, a good idea and where file systems are heading in the future. People who want to defeat Microsoft would be well advised to see the benefits instead of sticking their heads in the sand.
How many people have trouble finding files on their hard drive using the most basic search criteria? People who are so unorganized as to lose files on their hard drive are probably not sophisticated enough to use advanced search methods successfully.
Research shows that 67% of those who use the term "research shows", are just making shit up.
Ever look at the properties page of an MS Office file? There are enough metadata tags in there to keep you busy for hours.
Does anyone really fill those in? Rarely.
Is there a method to search on them? Never looked.
Sometimes it's interesting to browse the properties page to see who really created a spreadsheet or document. For example, people who shamelessly "borrow" templates from former employers and either aren't smart enough or are too lazy to do just a little cleanup. But that's about it.
Doesn't storing your photos in hierarchical folders labeled appropriately count as metadata? I know it's not very flexible or powerful, but it's metadata of a sort. Store your wedding photos in a wedding folder in a photos folder.
Now, if you're talking about a database of metadata about files, then that's something else.
--Rick "If it isn't broken, take it apart and find out why."
Maybe the photo software could check with your calendar, see that a certain date/time was "my wedding," and assign that metadata to photos as they are downloaded. Most photos already have time/date metadata.
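A rough Python sketch of how that calendar lookup might work. It assumes the capture times have already been read out of the photos and that the calendar is just a list of (start, end, label) entries; the function name and data shapes are hypothetical, not any real photo or calendar program's API.

```python
from datetime import datetime

def tag_from_calendar(photo_times, events):
    """Map each photo to the labels of calendar events covering its capture time.

    photo_times: {filename: datetime the photo was taken}
    events: list of (start, end, label) tuples from the user's calendar
    """
    tags = {}
    for name, taken in photo_times.items():
        # A photo gets every label whose time window contains its timestamp.
        tags[name] = [label for start, end, label in events
                      if start <= taken <= end]
    return tags
```

A photo taken on an afternoon the calendar calls "my wedding" picks up that label; photos that fall outside every event get an empty list and still need manual tagging.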
Spoon not. Fork, or fork not. There is no spoon.
Yeah, I can't wait to download stuff from the internet full of its own metadata. Isn't it true that search engines don't rely on metadata as much because of false data? The OS having its own contacts list might seem like a good idea, but I can see many people trying to hack into it and mass-mail all your friends.
Mark
Apple has a solution to this, which has trade-offs, but seems pretty functional.
Essentially, each of their iLife apps is a replacement for the Finder. Do we really need music search integrated with file search? Or is it sufficient to build independent metadata (ID3) and file structure (playlists) just for music? That's really the brilliance of iTunes: it never takes you back to your HD file structure. You can even ask it to maintain the HD file structure to reflect the metadata structure, so it'll keep everything in an artist/album/song structure, naming things as needed.
iPhoto is set up the same way, but it's pretty apparent that the iPhoto guys are the 'B' team, since they haven't gotten it nearly as slick as iTunes yet, but it also has the equivalent of content metadata, playlists, and smart playlists. So, yes, I can easily find my wedding photos. The trade-off is that you can't search for 'Wedding' in the Finder and get wedding photos, wedding songs, etc. Maybe that's upcoming, but I'm not totally convinced of the value.
The iTunes organizational structure does carry into iPhoto, so if you want to select a song for a slideshow in iPhoto, you can see your iTunes playlists, and filter against metadata. It also carries into iMovie, etc.
Other posters have clearly identified the problems with metadata. File organization is generally only useful if you are willing to symlink across all of your metadata; otherwise your photos of your mom and your wedding photos are disjoint, since some should be in both places. The single biggest problem with metadata is putting it in to begin with. iPhoto now allows you to do that during photo import, using a slide-show type UI.
I think MS's tendency to do everything in one place is interesting, but tends to not come off so well. Having everything in SQL could eliminate one of the shortcomings in Apple's implementation, which is that they need to maintain an XML intermediate structure for music files, photos, etc. While somewhat handy, its main function is to join file metadata and the FS, which means that it is somewhat fragile.
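For what the symlink approach looks like in practice, here is a small Python sketch. The function name and the category folder layout are assumptions for illustration; it needs a filesystem and account that allow symlinks (on Windows that usually means extra privileges).

```python
from pathlib import Path

def link_into_categories(photo, categories, root):
    """Expose one photo under several category folders via symlinks."""
    photo = Path(photo).resolve()
    links = []
    for category in categories:
        folder = Path(root) / category
        folder.mkdir(parents=True, exist_ok=True)
        link = folder / photo.name
        # Leave any existing entry with this name alone; otherwise create the link.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(photo)
        links.append(link)
    return links
```

With this, one file can show up under both `wedding` and `mom` without being copied, but every new photo still has to be linked by hand, which is the data-entry problem again.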
--
The closest thing to a workable scheme is Gelernter's Lifestream stuff -- where your system knows that you got married on a certain date
--
That's fine for personal photos, but what about MP3s or other acquired media which has no direct association with personal life events?
Pros would love this; often you want to search some big image archive for pictures of a specific location. Tourists would find their photos self-organizing.
Lookup can then be by address, or using a map or globe. Think MapQuest.
This offers the possibility of a new (and totally legitimate) peer-to-peer application - location based picture-sharing. See the pictures others took of tourist locations.
I think it's cool that Microsoft is taking cues from the iApps - interesting that they want to integrate it so much into the operating system. Whereas so far Apple is stressing an application-centered solution on top of a more general-purpose filesystem, Microsoft is getting deeper into the integration game, getting into file metadata a la BeOS, and tracking files according to thematic relevance a la relational databases.
If the "smart desktop" idea catches on it will be interesting to see the response from developers on Mac OS X and Linux, as far as offering intelligent activity tracking. Somehow I see a twisty maze of documents and activities, all alike.
Should operating systems do all the work of organizing users' files for them, concealing the filesystem behind a database veneer, or behind a purely task-oriented veneer? Should this kind of thing be left to application developers, like the maker of Path Finder?
Wouldn't Windows be more useful if it was a truly modular system that could be configured simply by stripping away unwanted components? Isn't that what makes Darwin so healthy in the enterprise market today?
-- thinkyhead software and media
What I'd like to see come out of Google is an add-in that will categorize and search my local drives using the Google search algorithm. They have Google appliances that businesses can buy and use internally. I'd like to see a home-based, and home-priced, version of that application. Maybe have it search the internet as well and present the results separately. So if I'm looking for a file containing the words "efficient search keywords" (or something like that), it shows me files on my local system (including network shares, maybe) as well as results on the internet.
"For a successful technology, honesty must take precedence over public relations for nature cannot be fooled." -Feynman