Examining Mac OS X 10.4's Spotlight
Ton writes "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'. The really interesting part is that metadata will be playing a big role in Spotlight while just a few years ago people were afraid metadata in Mac OS X was going the way of the dodo."
Spotlight is basically a SQLite db that holds data about documents and files on your system. Metadata is gathered by a sort of 'plug-in' for each different file type.
A typical use will be making queries such as: show me everything agent dero sent me between Tuesday and Thursday last week. Mails, images transferred over IM, you name it... Best of all, since this is metadata based, it's supposed to be lightning fast.
You could envision a plugin that would Spotlightify slashdot threads you read, in theory, and apply the power of a database to it.
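To make the idea concrete, here is a rough sketch of a metadata index in SQLite with one importer per file type, plus the "everything dero sent me between Tuesday and Thursday" query. The table layout, the importer functions and the sample records are all made up for illustration; this is not Spotlight's actual schema or API.

```python
import os
import sqlite3

# Hypothetical schema: one metadata row per indexed item.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (path TEXT PRIMARY KEY, kind TEXT, sender TEXT, received TEXT)"
)

# Hypothetical per-type importers ("plug-ins"): each one turns a raw record
# into a metadata row for the shared table.
def import_mail(path, raw):
    return (path, "mail", raw["from"], raw["date"])

def import_image(path, raw):
    return (path, "image", raw["sent_by"], raw["date"])

IMPORTERS = {".eml": import_mail, ".png": import_image}

def index(path, raw):
    """Index one item; returns False when no importer knows the file type."""
    importer = IMPORTERS.get(os.path.splitext(path)[1].lower())
    if importer is None:
        return False
    conn.execute("INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?)", importer(path, raw))
    return True

def sent_by_between(sender, start, end):
    """Everything a sender sent in a date range, whatever the file type."""
    rows = conn.execute(
        "SELECT path FROM items WHERE sender = ? AND received BETWEEN ? AND ? "
        "ORDER BY received",
        (sender, start, end),
    )
    return [row[0] for row in rows]

index("inbox/1.eml", {"from": "dero", "date": "2004-07-13T09:00:00"})
index("chat/cat.png", {"sent_by": "dero", "date": "2004-07-14T18:30:00"})
index("inbox/2.eml", {"from": "dero", "date": "2004-07-19T08:00:00"})
index("inbox/3.eml", {"from": "someone_else", "date": "2004-07-14T10:00:00"})

results = sent_by_between("dero", "2004-07-13T00:00:00", "2004-07-15T23:59:59")
```

The query never opens the files themselves; it only touches the metadata rows, which is why this kind of search can stay fast.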
but really, you should RTFA
Exercise caution when modding this message up: the author acts like a jerk when his karma is excellent.
You must have a different version of locate to me. I can't get mine to index my emails, it has no idea about the metadata entries in common document types and can't tell the difference between an image and a movie file.
Could you send me the source for the version you have installed that does that?
The linked article is shit.
You want this one instead, it's got loads more info on what it does and how it works, plus some code examples for the gimps.
http://developer.apple.com/macosx/tiger/spotlight.html
Shitram Brown, PhD
Professor of Mathematics
People who have used it report no performance degradation. And no, it's nothing like Windows search, which Mac OS has also had since System 8 or earlier.
For one, it doesn't take half an hour, it shows you the results as you type, instantaneously.
Secondly, via plugins it can understand *any* file, such as an image metadata importer that uses OCR so you can search for words, or a Flesh-tone detector so you can search for all your porn that way.
Shitram Brown, PhD
Professor of Mathematics
makewhatis.cron can be a pain on Linux as well, if it is on a workstation which is mostly switched off.
Unfortunately for Windows boxes, they do tend to be left shut down a lot of the time, so more of their runtime is spent rebuilding the search database while the machine is being used for something, rather than in the middle of the night, which is the preferred way.
http://michaelsmith.id.au
The post links to the Apple Spotlight page that has been there for months. Is THIS the "discussion" that is being referred to in the post?
>>> "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'.
What's really funny is that there's no link to the actual published discussion... but anyway...
http://developer.apple.com/macosx/tiger/spotlight.html
The radical difference is that Spotlight generates the metadata itself rather than you having to tag stuff yourself. It has content handlers to intelligently tag all kinds of different "stuff" so it "knows" what a Word document is and what a web page is and what a .png file is etc etc.
Bad analogies are like waxing a monkey with a rainbow.
Already the differences between FAT32/NTFS and HFS+ (the Mac filesystem) yield significantly faster searches before Spotlight is introduced. Sit down at an OS X Mac and notice that a search of the entire HD is actually a fast operation, not the many-minute wait that it is on Windows.
Now, since Spotlight is built into the core of the system and isn't just a tacked-on service like the Windows indexer, there are significant speed advantages. Updating the SQL database when files are modified, added, etc. is incredibly light on the CPU, and is equivalent to doing something like changing the file name.
What Spotlight isn't, and this might be where you are getting confused, is a spider that crawls from folder to folder cataloguing information about each file. That is what the Windows indexer was doing, which is why it was resource intensive: it was busy checking files and folders that you may not have changed at all.
As a counter to the 'Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist': Microsoft, Google and Apple would disagree. Having an up-to-date catalogue without the CPU strain is a must-have; go figure, MS have been trying to implement it since NT 4.0.
As mentioned, I think it's the plugin architecture that makes it special. That makes it possible to search for anything that you can imagine. For example, you could write plugins for your logfiles, movie subtitles, internet cache, etc. It's basically your imagination that sets the limit.
To my knowledge, other metadata-based search systems have not had a similar degree of extensibility. Please correct me if I'm wrong.
Apple are well known for optimising their software to be significantly faster with each pre-release build. Having had the opportunity to test the developer tester of 10.4 with Spotlight on a 12" PowerBook (which was bogged down with various applications at the time), I can assure you that Spotlight remained snappy, and definitely true to the 'instant' claim (I've noticed Apple are quite careful not to over-advertise their products, as it causes more problems than sales and a bad image). After using Microsoft products we become very used to how slow a process can be. Apple's advantage is clear: they know their target hardware, and like video-card driver writers they can optimise any part of their OS to fit their hardware for optimum speed. Additionally the G4/G5 chipsets have some quite useful registers for performing these sorts of searches (think sort of like MMX for x86, except with developers actually utilising them outside of games).
I still have to be convinced that full-content indexing is a good idea. I very rarely need to search for something in the contents of a group of files, and when I do it's usually such a small group that the time saved would not outweigh the disk space used by such large indexes. On the other hand, this problem should get better over time, since the largest files are usually video, and have little indexable content, meaning that the index is likely to get relatively smaller over time (until someone writes a plug-in that can interpret objects in images, and applies this to every frame in a movie. Fortunately, I think this is still a long way off).
I am TheRaven on Soylent News
Okay - I'll bite
* Desktop-metaphor based GUI for a personal computer
* WYSIWYG publishing with a laser printer
* PDAs via Newton
* AppleLink (err, AOL now)
* QuickTime (movies, QTVR, 3D, etc)
We could go on and on. Give Apple props where due, huh?
And please consider modding the troll down...
Just a small bit of info: the brain behind Spotlight is Dominic Giampaolo, the same guru who wrote the fantastic BeFS for BeOS.
Uhm. No. It is not continually indexing the data; if you read the article you'll see it only updates the metadata for items when they're saved. You can write custom plug-ins for new data types, or just go with the bundled ones for standard file types like images, text etc.
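A minimal sketch of that update-on-save model, as opposed to a crawler: importers register themselves by extension, and a save event re-imports only the one file that changed. The `importer` decorator, `MetadataStore` and the two sample importers are hypothetical names invented for this example, not the real Tiger plug-in interface.

```python
import os

# Hypothetical registry of importers, keyed by file extension.
IMPORTERS = {}

def importer(ext):
    """Decorator that registers a function as the importer for one extension."""
    def register(fn):
        IMPORTERS[ext] = fn
        return fn
    return register

@importer(".txt")
def text_importer(path, data):
    text = data.decode("utf-8", errors="replace")
    return {"kind": "text", "words": len(text.split())}

@importer(".log")
def log_importer(path, data):
    lines = data.decode("utf-8", errors="replace").splitlines()
    return {"kind": "log", "errors": sum("ERROR" in line for line in lines)}

class MetadataStore:
    def __init__(self):
        self.records = {}
        self.imports_run = 0  # counts how much importing work was done

    def on_saved(self, path, data):
        """Called once per save; only this one file is re-imported."""
        fn = IMPORTERS.get(os.path.splitext(path)[1].lower())
        if fn is None:
            return  # no plug-in for this type, nothing to record
        self.records[path] = fn(path, data)
        self.imports_run += 1

store = MetadataStore()
store.on_saved("notes.txt", b"hello spotlight world")
store.on_saved("app.log", b"ok\nERROR disk\nERROR net\n")
store.on_saved("notes.txt", b"hello again")  # re-save touches one record only
```

The work done is proportional to the number of saves, not to the number of files on the disk, which is the point being made against crawl-style indexers.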
Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist.
On the contrary, this is a *better* solution to a very basic problem that has plagued computers since they were invented.
The problem
How do I organise and access the data I use every day (emails, letters, images, music etc)?
The old solution
You can put your files in folders (one per file). You can name the files with a short description, ending with a cryptic 3 letter code to denote the file type. Files *must* be in one category/folder only at a time. Limited meta-data (date modified, file-type etc) may be stored.
The new solution
You add meta-data to files (often automatically) saying who created them, what project they're under, whether they're 'to do' or 'unfinished' or whatever. You'd do this in a save dialog for the application, as you saved the file. All other applications which use Spotlight will update their view of this stuff for free, in real time.
When you want to work on a project, you click on the live project folder, and immediately you see all the files, emails, images etc for that project, no more, no less, regardless of where they are on the disk and what other projects they're shared with.
Want to see all the stuff to do with John, 5 months ago? On this project? Containing the word gizmo? That sort of query will be easy to make.
If you have an image editing application, it can show you all the images taken in Paris in 2002, without having to build a database application into it. This makes adding this kind of feature to applications trivial.
Ideally adding meta-data tags like 'project-1', and 'To do' should be as easy as choosing them in the save dialog or applying them like a label in the Finder. It's not quite at that stage yet, but that should come later. Some of these ideas are quite old (Be), but they are long overdue in a desktop operating system.
Both links say quite a bit. I guess the kernel gurus know better, but I think the SQL plugin for a FS would be a cool thing to show off with at the very least.
http://kerneltrap.org/node/view/3727
http://lwn.net/Articles/100148/
The reason Windows XP does not do full text search correctly is that it uses a specific registry handler entry for each type of file (*.txt, *.rtf etc.), with different handlers for different types of files.
However, it only comes with settings for a few configured file types, and there is no way to set a default "When no search filter is available, treat as plain text" setting.
I stressed and strained about this when XP first came out. The only way I found to get the expected results was to build myself a scanner.
It searches through a drive and identifies EVERY file extension.
It then looks through the registry to see which extensions have linked handlers.
It generates a reg file containing stub links for every unmatched file type.
It's a bit shotgun, but it allowed me to continue using the text search in XP.
Microsoft have released their own shotgun registry pack; for more info see here:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q309173
(I have since moved to using my own full search tool, but at least the XP search doesn't miss files which are clearly within visible range).
[Now for the science part..]
Take a file, something like "PunchTheMonkey.asp".
Make sure you have it open in notepad, and make sure there is a certain text string - for instance "spyware".
Open the Windows XP search in that folder, tell it to search *.ASP, and give it the phrase "spyware".
Windows XP will NOT find this file.
-----
The Windows .TXT flat text handler is identified by using a registry key:
[HKEY_CLASSES_ROOT\.txt\PersistentHandler]
@="{5e941d80-bf96-11cd-b579-08002b30bfeb}"
Adding an entry like the one above for each required filetype will restore the full text search functionality.
So, I add the following entry into the correct .ASP place:
[HKEY_CLASSES_ROOT\.ASP\PersistentHandler]
@="{5e941d80-bf96-11cd-b579-08002b30bfeb}"
After I have logged off/rebooted, I try the same again, and XP will now identify the file.
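For anyone wanting to try the same trick, the scanner described above could be sketched roughly like this. It is not the original tool: the function names are invented, and the "does this extension already have a handler?" check is passed in as a plain callable so the sketch runs anywhere (on a real Windows box you would presumably answer it by looking under HKEY_CLASSES_ROOT, for instance with Python's `winreg` module). The GUID is the plain text handler quoted in the post.

```python
import os
import tempfile

# Plain text PersistentHandler GUID, as quoted in the post.
PLAIN_TEXT_HANDLER = "{5e941d80-bf96-11cd-b579-08002b30bfeb}"

def collect_extensions(root):
    """Walk a directory tree and return every file extension found (lowercased)."""
    exts = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext:
                exts.add(ext)
    return exts

def build_reg_file(extensions, has_handler):
    """Return .reg text with a plain-text stub for each extension lacking a handler.

    has_handler is a caller-supplied callable (an assumption of this sketch).
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for ext in sorted(extensions):
        if has_handler(ext):
            continue  # leave already-configured types alone
        lines.append("[HKEY_CLASSES_ROOT\\%s\\PersistentHandler]" % ext)
        lines.append('@="%s"' % PLAIN_TEXT_HANDLER)
        lines.append("")
    return "\n".join(lines)

# Demo on a throwaway directory instead of a real drive.
with tempfile.TemporaryDirectory() as root:
    for name in ("readme.txt", "PunchTheMonkey.asp", "notes.TXT", "Makefile"):
        open(os.path.join(root, name), "w").close()
    found = collect_extensions(root)

known = {".txt"}  # pretend only .txt has a handler configured
reg_text = build_reg_file(found, lambda ext: ext in known)
```

As the post says, this is a shotgun approach: every unmatched type gets treated as plain text, including binary formats where that makes little sense, so you may want to filter the extension list before importing the result.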
liqbase
Clicking the 'X' doesn't actually close the application. This annoyed me to start with, but I've slowly gotten used to it.
If you want to quickly quit a load of apps or switch application, hit cmd-Tab, and then cycle through the apps with the tab key.
However you have one gig of RAM on the system. You have no need to quit the programs when switching between them. They'll be paged out to disk as necessary if you manage to fill the available RAM. Multi-tasking works very well as processes aren't in general allowed to hog the processor.
I think this is a common thing amongst people who're used to Windows - the windows in OS X represent documents, not applications, so that's why they can be closed without quitting the application. You will find Apple managed to balls this up by being inconsistent though - some applications DO quit on closing the window, but in theory they're applications which only have one window, and are utilities, like the Address Book.
Be sure to try Expose as well, though I doubt it'd work well on that older system.
http://www.apple.com/macosx/features/expose/
This has already been done to some extent in Quicksilver.
http://quicksilver.blacktree.com/
It's an app that indexes parts of your file system and supports plugins to index application data. The best part is that it is keyboard based. For example, type command-space "slash" enter and it fires off Safari opening /.
I'm not sure how Apple will improve on this.
The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.
Having to select the application window before I can quit it using the application menu. Or I have to right click on the dock icon to quit. Annoying still.
OK, use Splat-Tab (Apple/Command/Cloverleaf, call it what you will) to switch between apps. When you get to the one you want, hold down Splat and press Q. It quits the application. Press H instead and it Hides it. There's more of these...
Hope this helps.. It seems this is OS X 10.3 only, so you might want to check out LiteSwitch X which does the same thing.
Mark
Liked this comment? Why not buy me something nice
Apple has had this type of search engine before, they called it V Twin and it was a basic part of Copland. This is what Sherlock used in Classic and why it was so fast. The idea is even older, it's from a conceptual computer interface Apple dubbed the Knowledge Navigator. All this appears to be is V Twin running on SQLite instead of a proprietary method.
The interesting part to me is the focus on metadata. I loved this feature in BFS that metadata was king. This is going to lead the way to better file management. Hopefully the Finder will integrate it.
Not since Marie-Antoinette played milkmaid has looking simple and honest been so fake and complicated.
As someone replied earlier, this is a new paradigm in app management: the top menu controls the application, and the window menu controls the window. More importantly, OSX apps are designed to be left open -- keep them open, close or hide their windows, and they'll use virtually no resources, but will start significantly faster the next time you use them.
Having to select the application window before I can quit it using the application menu. Or I have to right click on the dock icon to quit. Annoying still.
Learn your keyboard shortcuts. Take the ten minutes to learn them, and you'll regain hours of your time. Cmd-Q is the shortcut for quit, for example. If you're used to Windows machines, you can switch the cmd key with the Windows key.
Love the dock. Its just ..... right.
Check out Quicksilver, from http://quicksilver.blacktree.com . Once you get used to it [and once it gets used to you], it's phenomenally faster than the Dock.
The ability to access the underlying BSD OS easily. Love it.
iTerm, from http://iterm.sourceforge.net , is a great OSX terminal app.
Here's a list of favorite OSX apps I posted a while back. Most are free/OSS, and they're all some of the best apps for any platform.
http://unxmaal.com
Apple looked like it was abandoning, or at least deprecating the concept in OS X.
Well, Apple *did* deprecate the old file type and creator tags, and the resource forks in files that the Mac file system had always had since 1984. There were a lot of problems with the metadata in the original MFS, not the least of which was that each file on the Mac was actually two files.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
No, they will not (or at least the Quicksilver developers won't). They correctly identify Spotlight as an addition to and not as a replacement of their respective applications. In fact, the QS site even hints at Tiger being a requirement for the final version.
-- The plural of 'anecdote' is not 'data'.
Which explains why it's tied to the filesystem rather than using a general hook at the vnode layer to allow the same functionality to be implemented regardless of the filesystem in use.
Wow. Check it out. Everything you said here is completely 100% wrong.
Spotlight is filesystem-independent. It runs as a set of daemons and stores its metadata database in a hidden directory called ".Metadata" at the root level of the volume.
All your "could be" talk is basically a summary of how Spotlight works.
I write in my journal
I have a volume with nothing on it except 60 GB of AAC files. The metadata folder for that volume is 14 MB.
I write in my journal
What's radical is that it does all the above, plus some. The way I remember Jobs introducing it is something like this.
You have a program called iTunes that creates a database of your music so you can search for a song by any one of a number of tags, including genre, play time, title, author, etc plus any of the keywords the user adds and how they rated it.
You have another program called iPhoto that does the same for image files because iPhoto understands the internal tags in a jpg (or other image) file.
You have another program called Finder that indexes based on file data. It knows what size the mp3 is, but not how long the song is -which iTunes does know.
You have all these separate programs for dealing with different kinds of files because they all contain different kinds of metadata and internal tags.
Spotlight puts all these kinds of searches in one place, and allows you to combine them. So with the appropriate plug-in filter, it can search any file type and take advantage of any internal tags in the file to speed up the search. It's much faster and more accurate than searching based on the entire contents of the file.
So Spotlight combines metadata it generates itself (file content), with basic file metadata (file size, creation date...) and file type specific metadata (image dimensions or song duration).
Then, IIRC, you can save your search and the results will be updated in real time as files are added or deleted.
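Here is one way the "saved search that updates itself" idea could work, as a toy model only. `Index` and `SavedSearch` are invented names, and the metadata keys (`size`, `duration`, `genre`) just stand in for the basic and type-specific attributes described above; nothing here is Apple's actual API.

```python
class SavedSearch:
    """A stored predicate plus the set of paths currently matching it."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.results = set()

    def file_changed(self, path, meta):
        # Re-check just this one file against the stored predicate.
        if self.predicate(meta):
            self.results.add(path)
        else:
            self.results.discard(path)

    def file_deleted(self, path):
        self.results.discard(path)

class Index:
    """Toy metadata index that notifies saved searches on every change."""

    def __init__(self):
        self.files = {}
        self.searches = []

    def save_search(self, predicate):
        search = SavedSearch(predicate)
        for path, meta in self.files.items():
            search.file_changed(path, meta)  # seed with what already exists
        self.searches.append(search)
        return search

    def put(self, path, **meta):
        self.files[path] = meta
        for search in self.searches:
            search.file_changed(path, meta)

    def delete(self, path):
        self.files.pop(path, None)
        for search in self.searches:
            search.file_deleted(path)

index = Index()
index.put("a.mp3", kind="music", size=4_000_000, duration=250, genre="jazz")

# Combines basic file metadata (size) with type-specific metadata (duration, genre).
long_jazz = index.save_search(
    lambda m: m.get("genre") == "jazz"
    and m.get("duration", 0) > 200
    and m.get("size", 0) < 5_000_000
)

index.put("b.mp3", kind="music", size=3_000_000, duration=300, genre="jazz")
index.put("c.jpg", kind="image", size=1_000_000, width=800)
snapshot_before = sorted(long_jazz.results)
index.delete("a.mp3")
snapshot_after = sorted(long_jazz.results)
```

The saved search is never re-run from scratch; each add, change or delete only re-checks the one file involved, so the result set stays current without a full rescan.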