Examining Mac OS X 10.4's Spotlight
Ton writes "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'. The really interesting part is that metadata will be playing a big role in Spotlight while just a few years ago people were afraid metadata in Mac OS X was going the way of the dodo."
My Windows XP search (at work) is very odd. It will not find text in assembly files (*.S) that I know is there. I've played around with turning the indexing thing on and off, to no avail. That and other strange behaviour led me to find Visual Grep, which is well worth whatever I paid for it (50 USD?). Still, something like that should work in a real OS.
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Anyone who has used the instantly updated searches in Mail.app or iTunes will have a feel for how useful a system-wide approach could be. However I too am concerned about resource usage. I think I'll wait and see how big the metadata index tends to get and how big the CPU/memory hit is.
I believe, though, that the indexing is done during saves, so you won't notice a general system slowdown. What you will notice is a slowdown on file saves.
From reading the article, I think Hans Reiser has been right about the need for reiser4 on mainstream Linux.
He saw all this stuff coming from way back. If you read the LKML, you will remember that he warned us.
It's a pity no one listens to him.
I read about Beagle for Linux; it seems to be very similar in functionality. http://www.gnome.org/projects/beagle/
-- My site
I'm waiting for Tiger so that I can try out Automator. This promises to be a point-n-click version of scripting. Hopefully it will be easy enough that even my parents, and maybe even my boss, will be able to use it.
The first thing I'll do is try making an Automator to create thumbnails. Currently I'm using a bash script I wrote on my Linux box to do this. This will be the first time I've paid for an OS upgrade since Win98, so I hope it's worth it.
Vote for global prefs bug
Spotlight's datastore isn't SQLite.
The DB for it was custom designed for fast Unicode text searches. As far as I know, Apple isn't going to document the DB format but will be providing a C-based API to search it.
Does the world need another DB file format? We'll see....
Three words: Automated boobie detection
It would be cool if it didn't suck.
Coming from a Windows XP background, some things I've noticed so far:
- Clicking the 'X' doesn't actually close the application. This annoyed me to start with, but I've slowly gotten used to it.
- Having to select the application window before I can quit it using the application menu. Or I have to right click on the dock icon to quit. Annoying still.
- Love the dock. It's just ..... right.
- Most of the file system is hidden from you, which I like. Put my data where I want it and ignore the rest.
- The ability to access the underlying BSD OS easily. Love it.
- Everything looks and feels 'polished'. That's what I always hated about KDE/Gnome when I tried them: the features were there, but no one had taken the time to step back and polish the entire thing off so it all looks and feels together.
- Every time I boot the Mac, my TFT display is 'wavy' until I have the monitor do an autoadjust. Don't really know whose fault this is, though it's fine under Windows and Linux.
So, final conclusion? I love it, so much that I have already placed an order for a G5 iMac. And in the meantime, I've purchased a G4 upgrade for this little baby, just to help it along.
I'm a PC (Win/Lin) user, and I'm thinking about changing over to Mac.... lol, I'm not that cliche. But I might consider learning more about them. They are nice powerful beasts within. They'd be nice to have on a Folding Farm. :D
As much as I like Apple, those were all things that someone else did but just didn't do too well. QuickTime certainly wasn't groundbreaking technology when it came out.
Unless you used BeOS in the past!
This really is a big deal, much bigger than Microsoft's feeble attempts at full text search, or Google's desktop search. In many ways this is much, much more useful than full-text search, especially for developers.
At home I have about 6,000 MP3s, 1,000 photos, 500 scientific articles in PDF format and hundreds of Word files that I need to juggle. Each type has its own metadata database, and none of them are updated in real time.
Databases:
MP3 - WinAmp & AudioTron
Photos - Photoshop
PDFs - Acrobat Indexer
Word files - MS Indexer
That doesn't include any of the other data that is stored entirely in databases and would have been easier to store in the file system, like email, guitar tab files and god knows what else.
A properly implemented global meta-data store (that works at the filesystem level, not as an iterative service) profoundly changes how one uses the system, making sorting and finding data actually almost pleasurable.
+--------------------- You idiot! I told you we were facing the wrong way!
The brain behind Spotlight is Dominic Giampaolo, the same guru that wrote the fantastic BeFS for BeOS.
Which explains why it's tied to the filesystem rather than using a general hook at the vnode layer to allow the same functionality to be implemented regardless of the filesystem in use. Having the filesystem support it would make it more efficient on HFS+ but it should be possible on UFS, ISO 9660 CDs, or even over NFS or SMB.
In fact, the way it's described... with one metadata store per filesystem rather than per file, and user-level metadata provided by applications... this is something that FreeBSD or Linux could implement right now, over any file system: all they would need would be a mechanism for the vnode layer to send messages to a usermode daemon that tracked inode operations (eg, creation, deletion, maybe mode changes or date changes, and renames) in a name-inode database (any database, including Postgres or MySQL) and updated any associated metadata in the background.
This could be done with negligible slowdown for file operations: the index can be updated asynchronously, because it can always be recreated in the background after a crash, so the vnode operation won't ever have to wait for the daemon to respond... and changes to the metadata are all in userspace.
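To make the asynchronous part concrete, here is a rough Python sketch of the usermode daemon side. Everything in it is an assumption for illustration: the class name AsyncNameIndex, the notify() hook standing in for messages from the vnode layer, the three event names, and SQLite as the "any database". It is not how Spotlight or any real kernel does it; it only shows that the caller enqueues an event and returns at once while a background thread keeps the name-inode table current.

```python
import queue
import sqlite3
import threading


class AsyncNameIndex:
    """Toy user-mode indexer: file operations only enqueue events; a
    background thread applies them to a name-inode table."""

    def __init__(self):
        self._events = queue.Queue()
        self._lock = threading.Lock()
        # In-memory SQLite stands in for "any database" (an assumption).
        self._db = sqlite3.connect(":memory:", check_same_thread=False)
        self._db.execute(
            "CREATE TABLE names (inode INTEGER PRIMARY KEY, path TEXT)"
        )
        threading.Thread(target=self._run, daemon=True).start()

    def notify(self, op, inode, path=None):
        """Hypothetical vnode-layer hook: queue the event and return at once."""
        self._events.put((op, inode, path))

    def _run(self):
        # Background loop: apply queued events one at a time.
        while True:
            op, inode, path = self._events.get()
            with self._lock:
                if op in ("create", "rename"):
                    self._db.execute(
                        "INSERT OR REPLACE INTO names VALUES (?, ?)",
                        (inode, path),
                    )
                elif op == "delete":
                    self._db.execute(
                        "DELETE FROM names WHERE inode = ?", (inode,)
                    )
                self._db.commit()
            self._events.task_done()

    def flush(self):
        """Block until every queued event has been applied."""
        self._events.join()

    def lookup(self, inode):
        """Return the indexed path for an inode, or None if unknown."""
        with self._lock:
            row = self._db.execute(
                "SELECT path FROM names WHERE inode = ?", (inode,)
            ).fetchone()
        return row[0] if row else None
```

Usage would be along the lines of idx.notify("create", 1, "/a.txt") from the file operation and idx.lookup(1) from a search client after idx.flush(). Rebuilding after a crash would just mean walking the filesystem and replaying "create" events.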
Devon
What hasn't been mentioned is that smart folders will always keep your directories up to date. No more dragging and dropping files after I download them.
The question is will I be able to make smart folders based on permissions I give on my files so that I can share them on my network.
weo
#=-weo-=#
Check out Mor Naaman at Stanford, who is working on adding GPS metadata to photographs. Once he has the GPS coordinates, he uses them to get information such as time of day, lighting, weather, elevation, temperature, etc. This allows you to create metadata searches for "all early morning images in clear weather in Las Vegas", etc.
You can try the system out here with a collection of almost 4k images.
Unfortunately, Microsoft chose not to expose this functionality to Windows users. Which is odd, considering IFilters for Office documents are installed on every Windows (2000+) machine, as is Indexing Service.
Metadata is gathered by a sort of 'plug-in' for each different file type.
Apple has had a few developer kitchens on writing Spotlight importers. The idea is that any given app developer might have his own ideas as to what constitutes the interesting searching criteria for his file types. Apple has importers for common image formats, plain text, rich text, mail messages, etc.
If you were a photographer, for example, and you had a fancy camera that puts a lot of info into the EXIF tags of the image files it generates, you could search for "all images I made using this particular lens with an f-stop setting between 2.5 and 3", or if you're looking through files from a music notation program, you could search for "all files in 5/8 time in the key of G minor".
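As a rough illustration of the per-file-type plug-in idea and the two example queries, here is a small Python sketch. It is not Apple's importer API: the registry, the decorator, the ".score" extension, and the attribute names (lens, f_stop, meter, key) are all made up for the example, and the "files" are already-parsed dictionaries rather than real EXIF or notation data.

```python
import os

# Hypothetical registry mapping a file extension to its importer function.
IMPORTERS = {}


def importer(*extensions):
    """Register a function as the metadata importer for some extensions."""
    def register(func):
        for ext in extensions:
            IMPORTERS[ext] = func
        return func
    return register


@importer(".jpg", ".jpeg")
def import_image(record):
    # 'record' stands in for EXIF data that a real importer would parse.
    return {"kind": "image", "lens": record["lens"], "f_stop": record["f_stop"]}


@importer(".score")
def import_score(record):
    # Made-up music notation format with a meter and a key.
    return {"kind": "score", "meter": record["meter"], "key": record["key"]}


def build_index(files):
    """Run the matching importer over each file; skip unknown types."""
    index = {}
    for path, record in files.items():
        ext = os.path.splitext(path)[1].lower()
        handler = IMPORTERS.get(ext)
        if handler is not None:
            index[path] = handler(record)
    return index


def search(index, **criteria):
    """Return sorted paths whose metadata satisfies every criterion.

    A criterion value is either a literal to compare against or a
    one-argument predicate (used here for range tests)."""
    def matches(meta):
        for attr, want in criteria.items():
            if attr not in meta:
                return False
            value = meta[attr]
            ok = want(value) if callable(want) else value == want
            if not ok:
                return False
        return True
    return sorted(path for path, meta in index.items() if matches(meta))


FILES = {
    "a.jpg": {"lens": "50mm", "f_stop": 2.8},
    "b.jpg": {"lens": "50mm", "f_stop": 8.0},
    "c.JPG": {"lens": "85mm", "f_stop": 2.8},
    "waltz.score": {"meter": "3/4", "key": "C major"},
    "odd.score": {"meter": "5/8", "key": "G minor"},
    "notes.xyz": {"text": "no importer registered for this type"},
}
INDEX = build_index(FILES)
```

With that index, search(INDEX, lens="50mm", f_stop=lambda f: 2.5 <= f <= 3) picks out the one matching photo and search(INDEX, meter="5/8", key="G minor") picks out the one matching score. The file with no registered importer simply never shows up, which is the whole "depends on what importers you have installed" point.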
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
I'm not a fan of a DB being my filesystem, but maybe I'm old-fashioned. I can see where M$ could benefit from it, though. Their file system is crap; and not so much the file system as how SMB plays with it. On the other hand, seeing as how NTFS and Oracle fubar each other, I wonder how this will play in WinFS.
Does anyone know how they are going to deal with security? Will the indexed information inherit the same security attributes as the underlying files? Do the indexers run as root?
Word on the street is that it doesn't index the ~/Library folder. Combine that with Safari's new 'private browsing' mode, and you should be safe enough from casual snoops.
In the Swedish language it is not uncommon to use "her" when talking about a human. Think "mother earth".
28 days, 6 hours, 42 minutes and 12 seconds... that is when the world will end.
I am the poster and the link was included in the post. Actually, the whole post was about the specific link to the Apple Developer site. Why the editors removed that link is absolutely beyond me...
How well this system works will in part depend upon how many data format plug-ins are provided. For example, take something like the SID audio format. It's relatively unknown, but has an officially registered MIME type with IANA giving it a status above many other file format types, and it is used to provide background sounds on some web sites. Will it make the cut?
This is just one file format chosen at random. There are thousands out there, some of which are used pretty heavily for documentation in certain circles. How about all of the OpenOffice file formats, or the AbiWord format?
I can see this feature being hugely useful if Apple does a good job of providing plug-ins, and making it easy for third-parties to add more.
"...how often does one do a blind search of the whole system anyway?" well, you've got to realize that this will be the big topic for the next boring couple of years. even google, not to mention apple, MS and every 'nix flavor are working on solutions. managing your information.
;)
i have to admit that i have crapola all over my harddrive that i will never go back to -- the files just keep getting buried and copied over to my newest computer. even if spotlight is kinda flawed, engineers have to start looking for better ways to manage information.
and besides, it gives MS something to do besides f-ing up browser standards
for what it's worth, OS X now comes with the option to use a case-sensitive HFS+.
i speak for myself and those who like what i say.
I'm not convinced yet that Apple is going to get Spotlight right, i.e. truly revolutionary. It has potential (smart Finder folders are on the right path) but at the moment it seems they are more interested in simply trying to duplicate Quicksilver/LaunchBar technology, which is the wrong way to do this.
I'm tired of Apple ripping off ideas from developers without (A) giving them credit or (B) developing something equivalent, so the new is at least as feature-full as the old. Based on Apple's history, the first version of Spotlight will likely be a horribly dumbed-down version of LaunchBar in terms of tech, since Apple is obsessed with "ease of use": i.e. a three year old has to be able to work it.
Rant aside, there are a few key pieces I think Apple is missing:
(1) User-created metadata. I should be able to tag anything I want with any metadata I want so the organization system follows ME and MY preferences, instead of the system determining it for me. Apple should be thinking about taking the insanely wonderful metadata system they created in iTunes and applying that to the Finder. It is essential that you be able to tag metadata in, because you don't always access the same objects for the same purposes.
(2) Flexible file system. This is a concept I've developed which basically says that the file system should be dynamic and adaptable to match the thought flow of the user (only possible with a good metadata file system). If you've ever seen this app on the PC, think: "The Brain". What that means is that if Apple does #(2) right, it should be easy as hell to tag things, and then basically I can create relationships which let me "flow" through my files by navigating CONCEPTS instead of folder hierarchy. A good app that does this is Devonthink. Devonthink will grab the contents out of your files, and when you do a search, you can see not only your search term but "related" search terms. Click on a new search term and you get a new listing. So as you come up with ideas about what you want to do, you can easily and naturally branch off into other parts of your file system. This methodology models the way the human brain actually works: thinking in concepts and spatial organization, rather than structure. (The "flexible" comes because the system takes your tags and adapts the search around them, allowing you to change how the "flow" works, depending upon what topics are most important to you.)
(3) The next level after metadata search is a new way of visually interpreting the metadata and the relationships within it. Which means a NEW FINDER. I can't believe Steve actually threw this comment out after demoing Spotlight: "With this, you probably won't even need to use the Finder any more." Well then why even have the Finder at all, Steve?! There IS a reason for the Finder, which is why it's stayed around all these years, and that is that people think SPATIALLY. People are creatures of habit, and one way we remember where things are is knowing where to look for them and finding them always in the same place. Which means there needs to be a visual grounding to the above dynamic file system, to give people a sure footing in all of this. I'm talking about things like a window that always stays in the same spot and always performs the same task, like showing you what new files have been added to the system, or actively updating your list of Word documents wherever they are. Right now in the Finder, a window is a window is a window. That shouldn't be. If a search is applied to a window, then that window isn't just showing you files, it's performing an active function. The Finder needs to evolve to take on the new roles and responsibilities it should have in the context of a metadata file system. Spotlight shouldn't replace the Finder: the two should work together seamlessly.
The good news is that Spotlight is built into the system, so even if Apple screws up the implementation (likely), the next generation of 3rd party apps will hopefully be able to fill in the gaps.
This sounds like an attempt by Apple to do on HFS what they've done for years on the Newton, er, *did* for years on the Newton. On the Newt there is no file system: there's only a database system, and each application maintains its own database of entries. When you issue a search, the operating system queries each of the applications in turn, asking them to search their entries in an appropriate fashion looking for a particular string or whatnot. Then it assembles the entries and the user can choose them and launch the application opening the entry. Nice.
On the Mac, that'd be expensive. Querying all the apps means running the apps. So instead Apple has lightweight app proxies (the "plugins") which provide metadata information rather than directly searching the files. Blah.
Maybe they are too stupid to realize what they have? It's been in there since XP shipped. It's an extension of the same mechanism that lets Windows Explorer look inside of .cab files and .zip files as though they are regular folders.
Note that their own Indexing Service, which has been built in since Windows 2000, also has plug-ins for parsing different kinds of documents, and you can add more.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_912d.asp
Of course, like most MS things, its interface sucks, so it's not nearly as useful as, say, the Google one, but fixing the interface would be very small work since the base system already works.
I agree with you, Apple usually gets it closer to right but expect MS to shoot back since they already have the base tech in there. They just need to get off their asses and give it a useful interface.
I'm very fuzzy on the details, but I know that Apple played a leadership role, back in the mid-90s, in lobbying the FCC for the radio spectrum allocations for what we now call WiFi.
Spotlight can support arbitrary file types, entirely dependent on what an application developer decides to supply and you decide to install. Google is limited to the file types Google implements.
WinFS is an overly complicated pile of steaming pooh, that Microsoft are having trouble delivering.
fixing the interface would be very small work since the base system already works.
Heh heh heh... [insert reference to The Inmates are Running the Asylum, here]
Apple usually gets it closer to right but expect MS to shoot back since they already have the base tech in there. They just need to get off their asses and give it a useful interface.
Apple is the R&D wing of Microsoft, so by waiting for Spotlight to come out, they are doing just that.
Ah. Thanks for the heads-up. Didn't Apple hire the BeOS filesystem guys?
AFAIK they use some kind of search on the lexicon for the inverted index. For instance, the string "nut" is matched to "nutmeg", "donut", etc., and the document lists for those terms are merged together. Phrase search would also be done using all matching words, e.g. "nut hol" would expand to phrase searches like "donut hole", "peanut holder", etc.
The exact method for matching the search string to the lexicon isn't clear. It could be a suffix tree, but it may be as simple as grep-like scanning of the words, since there aren't that many relative to the text size.
Looking at Mail.app, it seems to do this process on each keystroke. It's not terribly fast, but it gets the job done.
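For anyone who wants to see the shape of that, here is a minimal Python sketch of the "scan the lexicon, then merge the posting lists" guess. To be clear, this is my reading of the description above and not Apple's actual implementation: the function names are invented, the lexicon scan is the simple grep-like variant rather than a suffix tree, and phrase search is left out.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def expand(index, fragment):
    """Grep-like scan of the lexicon: every term containing the fragment."""
    fragment = fragment.lower()
    return sorted(term for term in index if fragment in term)


def search(index, fragment):
    """Merge the document lists of all lexicon terms matching the fragment."""
    hits = set()
    for term in expand(index, fragment):
        hits |= index[term]
    return hits


DOCS = {
    1: "nutmeg cake recipe",
    2: "glazed donut shop",
    3: "apple pie recipe",
}
INDEX = build_inverted_index(DOCS)
```

Here expand(INDEX, "nut") scans the handful of distinct words and finds "donut" and "nutmeg", and search(INDEX, "nut") merges their document lists into documents 1 and 2. Since the lexicon is small relative to the text, rerunning this on every keystroke is plausible, if not terribly fast.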
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
Plus, you should be able to combine the things (aka a compound query) to do really neat stuff.
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Of course, this metadata will be so much cooler when something like Spotlight is there to take advantage of it...
Die Menschen verhoehnen was sie nicht verstehen. -- Goethe.
While schematized semi-structured DAGs of data may be overkill for many applications, you might be surprised how often something like this is needed, and how few developers actually have the skill to build it when it is necessary.
It is not uncommon for Windows developers to use a Jet database as their "file format", and just rename the extension to something else. Right off the top of my head I can think of three [1,2,3] apps that do this. CityDesk and ContentSaver would both be much better served by something like WinFS, as their data are not particularly relational in nature. Jet is also not easily fulltext searchable, doesn't give you eventing, is not scalable past 2GB...
The team behind Chandler (Mitch Kapor et al) have probably spent at least a man-year or two working on a repository with similar features to those intended for WinFS. From what I've heard, it's a nice piece of work, and they're hoping other developers will use it (i.e. not just for Chandler).
I myself spent much of last year working on a similar repository for version 1.0 of my company's application. It was an expensive task, but the result was well worth it, as our 2.0 product adds very different functionality and yet was easily built on the same storage foundation.
You can bet many others have tackled subsets of WinFS functionality for their applications. (Sleepycat's customer list would probably lead you to many of them.) The problem with everyone doing this on their own is not only duplication of effort, but it essentially closes the door on interoperability, since each implementation is in effect another proprietary file format. Not to mention that some of these problems are truly subtle and difficult, such as allowing concurrent access to sub-file-level items (fine-grained locks), replication and synchronization, etc.
[1] Diebold GEMS - http://www.diebold.com/dieboldes/GEMS.htm
[2] Fog Creek CityDesk - http://www.fogcreek.com/CityDesk/index.html
[3] Macropool ContentSaver - http://www.macropool.com/en/index.html
Does anyone know how this will work with backups/restores? OS X backup programs have enough problems with resource files, let alone this additional data.
Also, how about remote file systems (NFS, for example)? Resource files are mapped as regular files with a ._ prefix. Will the metadata be usable on an NFS-mounted filesystem?
Sure search engines are killer apps for the Internet but that's because the web is intrinsically disorganised and distributed.
Is search really so relevant for a single computer and the average desktop user? Most people already organise their files in a somewhat structured way, and generally know where to find stuff. (Especially if they use OS X)
Sure, powerful file search might be useful occasionally, but I don't see it as the huge issue that companies like M$ think it is.