Examining Mac OS X 10.4's Spotlight
Ton writes "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'. The really interesting part is that metadata will be playing a big role in Spotlight while just a few years ago people were afraid metadata in Mac OS X was going the way of the dodo."
Spotlight is basically a SQLite db that holds data about documents and files on your system. Metadata is gathered by a sort of 'plug-in' for each different file type.
A typical use will be making queries such as: Show me everything agent dero sent me between Tuesday and Thursday last week. Mails, IM-transferred images, you name it... Best of all, since this is metadata-based, it's supposed to be lightning fast.
You could envision a plugin that would Spotlightify slashdot threads you read, in theory, and apply the power of a database to it.
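To make the idea concrete, here is a minimal sketch that takes the description above at face value: a SQLite table of file metadata, filled by one importer "plug-in" per file type, then queried by sender and date range. It is not how Spotlight itself is built. The importer functions, the `.eml` and `.imimg` extensions, the `items` table, and the ISO date strings are all assumptions made up for illustration.

```python
import os
import sqlite3


def import_email(text):
    """Hypothetical importer: read 'From' and 'Date' headers from a mail message."""
    attrs = {"kind": "mail"}
    for line in text.splitlines():
        if line.startswith("From: "):
            attrs["author"] = line[len("From: "):]
        elif line.startswith("Date: "):
            attrs["date"] = line[len("Date: "):]
    return attrs


def import_chat_image(text):
    """Hypothetical importer: 'text' is a made-up 'sender|date' record for an IM image."""
    sender, date = text.split("|")
    return {"kind": "image", "author": sender, "date": date}


# One importer per file type, looked up by extension (assumed extensions).
IMPORTERS = {".eml": import_email, ".imimg": import_chat_image}


def open_store():
    """Create an in-memory metadata database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items (path TEXT PRIMARY KEY, kind TEXT, author TEXT, date TEXT)"
    )
    return db


def index_file(db, path, text):
    """Run the matching importer, if any, and store what it extracted."""
    importer = IMPORTERS.get(os.path.splitext(path)[1].lower())
    if importer is None:
        return  # no plug-in for this file type, so nothing is indexed
    attrs = importer(text)
    db.execute(
        "INSERT OR REPLACE INTO items (path, kind, author, date) VALUES (?, ?, ?, ?)",
        (path, attrs.get("kind"), attrs.get("author"), attrs.get("date")),
    )


def sent_by_between(db, author, start, end):
    """Everything from one author in a date range (ISO dates compare as strings)."""
    rows = db.execute(
        "SELECT path FROM items WHERE author = ? AND date BETWEEN ? AND ? ORDER BY path",
        (author, start, end),
    )
    return [row[0] for row in rows]
```

The query never opens the files themselves; it only reads the small metadata table, which is where the claimed speed would come from.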
but really, you should RTFA
Exercise caution when modding this message up: the author acts like a jerk when his karma is excellent.
The linked article is shit.
You want this one instead, it's got loads more info on what it does and how it works, plus some code examples for the gimps.
http://developer.apple.com/macosx/tiger/spotlight
Shitram Brown, PhD
Professor of Mathematics
Anyone who has used the instantly updated searches in Mail.app or iTunes will have a feel for how useful a system-wide approach could be. However I too am concerned about resource usage. I think I'll wait and see how big the metadata index tends to get and how big the CPU/memory hit is.
I believe though that the indexing is done during saves, so you'll not notice a general system slowdown. What you will notice is a slowdown on file saves.
People who have used it report no performance degradation. And no, it's nothing like Windows search, which Mac OS has also had since System 8 or earlier.
For one, it doesn't take half an hour, it shows you the results as you type, instantaneously.
Secondly, via plugins it can understand *any* file, such as an image metadata importer that uses OCR so you can search for words, or a Flesh-tone detector so you can search for all your porn that way.
Shitram Brown, PhD
Professor of Mathematics
From reading the article, I think Hans Reiser has been right about the need for reiser4 on mainstream linux.
He saw all this stuff coming from way back. If you read the LKML, you will remember that he warned us.
It's a pity no one listens to him.
The post links to the Apple Spotlight page that has been there for months. Is THIS the "discussion" that is being referred to in the post?
>>> "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'.
What's really funny is that there's no link to the actual published discussion... but anyway...
http://developer.apple.com/macosx/tiger/spotlight
I read about Beagle for Linux; it seems to be very similar in functionality. http://www.gnome.org/projects/beagle/
The radical difference is that Spotlight generates the metadata itself rather than you having to tag stuff yourself. It has content handlers to intelligently tag all kinds of different "stuff" so it "knows" what a Word document is and what a web page is and what a .png file is etc etc.
Bad analogies are like waxing a monkey with a rainbow.
Already the differences in FAT32/NTFS versus HFS+ (the Mac filesystem) yield significantly faster searches before Spotlight is introduced. Sit down at a Mac running OS X and notice that an entire search of the HD is actually a fast operation, not the many-minute waiting exercise that it is on Windows.
Now since Spotlight is built into the core of the system, and isn't just a tack-on service like the Windows indexer is, there are significant speed advantages. Updating the SQL database when files are modified, added, etc. is incredibly light on the CPU, and is equivalent to doing something like changing the file name.
What Spotlight isn't, and this might be where you are getting confused, is a spider that crawls from folder to folder cataloguing information about each file. That is what the Windows indexer was doing, and it is why it was resource intensive: it was busy checking files and folders that you had possibly not made any changes to.
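The difference between the two approaches can be sketched in a few lines. This is only an illustration of the argument above, not Spotlight's or the Windows indexer's actual code; the class, the function names, and the `read_metadata` callback are hypothetical.

```python
class EventDrivenIndex:
    """Index that is updated only when it is told a specific file changed."""

    def __init__(self, read_metadata):
        self._read_metadata = read_metadata  # callable: path -> dict of attributes
        self.entries = {}
        self.files_examined = 0

    def on_saved(self, path):
        # Called once per save; touches only the file that changed.
        self.entries[path] = self._read_metadata(path)
        self.files_examined += 1

    def on_deleted(self, path):
        # Drop the entry; no other file is looked at.
        self.entries.pop(path, None)


def crawl(all_paths, read_metadata):
    """Spider-style refresh: re-examine every file, changed or not."""
    entries = {}
    files_examined = 0
    for path in all_paths:
        entries[path] = read_metadata(path)
        files_examined += 1
    return entries, files_examined
```

With a thousand files on disk and one save, the event-driven index examines one file, while a crawl to pick up the same change examines all thousand.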
As a counter to the 'Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist' claim: Microsoft, Google and Apple would disagree. Having an up-to-date catalogue without the CPU strain is a must-have; go figure, MS have been trying to implement it since NT 4.0.
I still have to be convinced that full-content indexing is a good idea. I very rarely need to search for something in the contents of a group of files, and when I do it's usually such a small group that the time saved would not outweigh the disk space used by such large indexes. On the other hand, this problem should get better over time, since the largest files are usually video, and have little indexable content, meaning that the index is likely to get relatively smaller over time (until someone writes a plug-in that can interpret objects in images, and applies this to every frame in a movie. Fortunately, I think this is still a long way off).
I am TheRaven on Soylent News
Okay - I'll bite
* Desktop-metaphor based GUI for a personal computer
* WYSIWYG publishing with a laser printer
* PDAs via Newton
* AppleLink (err, AOL now)
* QuickTime (movies, QTVR, 3D, etc)
We could go on and on. Give Apple props where due, huh?
And please consider modding the troll down...
Metadata is not new. It's not radical. But MS apparently can't make it work. So Apple gets to use it first, 5 percent of the computer population go wow! 95 percent ask why can't we have this, and Longhorn SP1 will get it and proclaim it as a great new radical technology.
Just a small bit of info. The brain behind Spotlight is Dominic Giampaolo, the same guru that wrote the fantastic BeFS for BeOS.
uhm. No. It is not continually indexing the data. If you read the article you'll see it only updates the meta-data for items when they're saved - you can write custom plug-ins for new data types, or just go with the bundled ones for standard file types like images, text etc.
Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist.
On the contrary, this is a *better* solution to a very basic problem that has plagued computers since they were invented.
The problem
How do I organise and access the data I use every day (emails, letters, images, music etc)?
The old solution
You can put your files in folders (one per file). You can name the files with a short description, ending with a cryptic 3 letter code to denote the file type. Files *must* be in one category/folder only at a time. Limited meta-data (date modified, file-type etc) may be stored.
The new solution
You add meta-data to files (often automatically) saying who created them, what project it's under, whether it's 'to do' or 'unfinished' or whatever. You'd do this in a save dialog for the application, as you saved the file. All other applications which use Spotlight will update their view of this stuff for free, in real time.
When you want to work on a project, you click on the live project folder, and immediately you see all the files, emails, images etc for that project, no more, no less, regardless of where they are on the disk and what other projects they're shared with.
Want to see all the stuff to do with John, 5 months ago? On this project? Containing the word gizmo? That sort of query will be easy to make.
If you have an image editing application, it can show you all the images taken in Paris in 2002, without having to build a database application into it. This makes adding this kind of feature to applications trivial.
Ideally adding meta-data tags like 'project-1', and 'To do' should be as easy as choosing them in the save dialog or applying them like a label in the Finder. It's not quite at that stage yet, but that should come later. Some of these ideas are quite old (Be), but they are long overdue in a desktop operating system.
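A rough sketch of the "live folder" idea described above: the folder is a saved query over metadata rather than a place on disk, so one file can show up in several folders at once. The `store` layout, the attribute names (`projects`, `author`, `text`), the sample paths, and `make_live_folder` are all hypothetical and only serve to illustrate the point.

```python
def make_live_folder(store, project=None, person=None, word=None):
    """Return a function that re-runs the saved query every time it is called."""

    def current_contents():
        hits = []
        for path, attrs in store.items():
            if project is not None and project not in attrs.get("projects", ()):
                continue
            if person is not None and person != attrs.get("author"):
                continue
            if word is not None and word.lower() not in attrs.get("text", "").lower():
                continue
            hits.append(path)
        return sorted(hits)

    return current_contents


# Made-up metadata store: path -> attributes. A file may belong to many projects.
store = {
    "/Mail/1.eml": {"projects": {"project-1"}, "author": "John", "text": "The gizmo is late"},
    "/Pictures/plan.png": {"projects": {"project-1", "project-2"}, "author": "Ann", "text": ""},
}

project_1 = make_live_folder(store, project="project-1")
project_2 = make_live_folder(store, project="project-2")
john_gizmo = make_live_folder(store, project="project-1", person="John", word="gizmo")
```

Because each folder is evaluated on demand, anything added to the store later appears in every matching folder without being moved or copied.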
Coming from a Windows XP background, some things I've noticed so far:
- Clicking the 'X' doesn't actually close the application. This annoyed me to start with, but I've slowly gotten used to it.
- Having to select the application window before I can quit it using the application menu. Or I have to right click on the dock icon to quit. Annoying still.
- Love the dock. It's just ..... right.
- Most of the file system is hidden from you, which I like. Put my data where I want it and ignore the rest.
- The ability to access the underlying BSD OS easily. Love it.
- Everything looks and feels 'polished'. That's what I always hated about KDE/Gnome when I tried them: the features were there, but no one had taken the time to step back and polish the entire thing off so it all looks and feels together.
- Every time I boot the Mac, my TFT display is 'wavy' until I have the monitor do an autoadjust. Don't really know whose fault this is, though it's fine under Windows and Linux.
So, final conclusion? I love it, so much that I have already placed an order for a G5 iMac. And in the meantime, I've purchased a G4 upgrade for this little baby, just to help it along.
The reason Windows XP does not do full text search correctly is that it uses a specific registry handler entry for each type of file (*.txt, *.rtf etc). It uses a different handler for different types of files.
However it only comes with a few configured filetype settings, and no way to set a default "When no search filter is available, treat as plain text" setting.
I stressed and strained about this when XP came out initially. The only way I found to get the expected results was to build myself a scanner.
It searches through a drive and identifies EVERY file extension.
It then looks through the registry to see which extensions have linked handlers.
It generates a reg file containing stub links for every unmatched filetype.
It's a bit shotgun, but it allowed me to continue using the text search in XP.
Microsoft have released their own shotgun registry pack, for more info see here:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q309173
(I have since moved to using my own full search tool, but at least the XP search doesn't miss files which are clearly within visible range).
[Now for the science part..]
Take a file, something like "PunchTheMonkey.asp".
Make sure you have it open in notepad, and make sure there is a certain text string - for instance "spyware".
Open the Windows XP search in that folder, tell it to search *.ASP, and give it the phrase "spyware".
Windows XP will NOT find this file.
-----
The Windows .TXT flat text handler is identified by using a registry key:
[HKEY_CLASSES_ROOT\.txt\PersistentHandler]
@="{5e941d80-bf96-11cd-b579-08002b30bfeb}"
Adding an entry like the one above for each required filetype will restore the full text search functionality.
So, I add the following entry into the correct .ASP place:
[HKEY_CLASSES_ROOT\.ASP\PersistentHandler]
@="{5e941d80-bf96-11cd-b579-08002b30bfeb}"
After I have logged off/rebooted, I try the same again, and XP will now identify the file.
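For anyone wanting to try the same trick, the scanner described above might look roughly like this. It is a sketch, not the original tool: the function names are made up, and the registry lookup is left as a `has_handler` callback you would have to supply yourself (on Windows it could be written with the standard `winreg` module). The handler GUID is the one quoted in this comment.

```python
import os

# GUID of the plain-text persistent handler, as quoted in the comment.
PLAIN_TEXT_HANDLER = "{5e941d80-bf96-11cd-b579-08002b30bfeb}"


def collect_extensions(root):
    """Walk a directory tree and return the set of lowercase file extensions found."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext:
                found.add(ext)
    return found


def build_reg_file(extensions, has_handler):
    """Return .reg text mapping every extension without a handler to the plain-text one.

    has_handler is a caller-supplied function (hypothetical here) that says
    whether an extension already has a PersistentHandler entry.
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for ext in sorted(extensions):
        if has_handler(ext):
            continue  # leave existing handlers alone
        lines.append("[HKEY_CLASSES_ROOT\\%s\\PersistentHandler]" % ext)
        lines.append('@="%s"' % PLAIN_TEXT_HANDLER)
        lines.append("")
    return "\n".join(lines)
```

The output is only text; nothing touches the registry until you review the file and import it yourself, which seems the safer way round for a shotgun fix like this.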
liqbase
Unless you used BeOS in the past!
This really is a big deal, much bigger than Microsoft's feeble attempts at full text search, or Google's desktop search. In many ways this is much, much more useful than full-text search, especially for developers.
At home I have about 6,000 MP3s, 1,000 photos, 500 scientific articles in PDF format and hundreds of Word files that I need to juggle. Each one has its own metadata database, and none of them are updated in real time.
Databases:
MP3 - WinAmp & AudioTron
Photos - Photoshop
PDFs - Acrobat Indexer
Word files - MS Indexer
That doesn't include any of the other data that is stored completely in databases and would have been easier to store in the file system - like email, guitar tab files and god knows what else.
A properly implemented global meta-data store (that works at the filesystem level, not as an iterative service) profoundly changes how one uses the system, making sorting and finding data actually almost pleasurable.
+--------------------- You idiot! I told you we were facing the wrong way!
What's up with Apple and German tanks? First the Panther (http://www.achtungpanzer.com/pz4.htm#panther) and now the Tiger (http://www.achtungpanzer.com/tigerp.htm). What's next, the Leopard? When Apple releases Mac OS 1x.x Leopard II, then I'm buying a Macintosh!
You've got to give them credit for product design as well. Nobody makes more desirable-looking software and hardware. Is it any wonder that Apple's fiercest supporters are graphic designers?
I've tried Spotlight and suggest that when it comes out, every time you step away from your computer make sure to lock your screen. All someone has to do is type 'porn' into the little search toolbar and within seconds it's all nicely listed.
Perhaps Apple needs to add a feature to turn off indexing for certain directories.
I am the poster and the link was included in the post. Actually, the whole post was about the specific link to the Apple Developer site. Why the editors removed that link is absolutely beyond me...
Hmmm. This sounds a little dangerous. It will make it much too easy for the wife to find your porn collection, your AOL-IM sessions with that weird Goth chick, the draft of your divorce papers, etc. I AM NOT UPGRADING TO TIGER.
Which explains why it's tied to the filesystem rather than using a general hook at the vnode layer to allow the same functionality to be implemented regardless of the filesystem in use.
Wow. Check it out. Everything you said here is completely 100% wrong.
Spotlight is filesystem-independent. It runs as a set of daemons and stores its metadata database in a hidden directory called ".Metadata" at the root level of the volume.
All your "could be" talk is basically a summary of how Spotlight works.
I write in my journal