Slashdot Mirror


Examining Mac OS X 10.4's Spotlight

Ton writes "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'. The really interesting part is that metadata will be playing a big role in Spotlight while just a few years ago people were afraid metadata in Mac OS X was going the way of the dodo."

15 of 440 comments

  1. Re:Radical by dJOEK · · Score: 5, Informative

    Spotlight is basically a SQLite db that holds data about documents and files on your system. Metadata is gathered by a sort of 'plug-in' for each different file type.

    A typical use will be making queries such as: "Show me everything agent dero sent me between Tuesday and Thursday last week." Mail, images transferred over IM, you name it... Best of all, since this is metadata-based, it's supposed to be lightning fast.
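    To make the "agent dero" query concrete, here is a minimal sketch using Python's standard sqlite3 module. It assumes a made-up single-table schema (path, kind, author, received date) and made-up sample rows; it is not Spotlight's real on-disk format, only an illustration of answering such a query from indexed metadata instead of opening every file.

```python
import sqlite3
from datetime import date

# Hypothetical sample data: (path, kind, author, received as ISO date).
SAMPLE_ROWS = [
    ("~/Mail/inbox/42.eml", "mail", "agent dero", "2004-06-29"),
    ("~/Pictures/from_im.png", "image", "agent dero", "2004-07-01"),
    ("~/Mail/inbox/43.eml", "mail", "someone else", "2004-06-30"),
    ("~/Documents/old.txt", "text", "agent dero", "2004-06-01"),
]

def build_store(rows):
    """Create an in-memory metadata table (hypothetical schema, not Spotlight's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (path TEXT, kind TEXT, author TEXT, received TEXT)")
    # Index the attributes we filter on, so the query never touches file contents.
    conn.execute("CREATE INDEX idx_author_received ON items (author, received)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    return conn

def items_from(conn, author, start, end):
    """Everything from `author` received between start and end, inclusive."""
    cur = conn.execute(
        "SELECT path, kind FROM items "
        "WHERE author = ? AND received BETWEEN ? AND ? ORDER BY received",
        (author, start.isoformat(), end.isoformat()),
    )
    return cur.fetchall()
```

    For example, `items_from(build_store(SAMPLE_ROWS), "agent dero", date(2004, 6, 29), date(2004, 7, 1))` returns the mail and the IM image but not the older text file. ISO date strings sort in date order, which is why plain `BETWEEN` works here.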

    In theory, you could envision a plug-in that would "Spotlightify" the Slashdot threads you read and apply the power of a database to them.

    But really, you should RTFA.

    --
    Exercise caution when modding this message up: the author acts like a jerk when his karma is excellent.
  2. Re:Radical by Professor+S.+Brown · · Score: 5, Informative

    The linked article is shit.

    http://developer.apple.com/macosx/tiger/spotlight.html

    You want this one instead; it's got loads more info on what it does and how it works, plus some code examples for the gimps.

    --
    Shitram Brown, PhD
    Professor of Mathematics
  3. Re:Sounds like Windows, actually by Professor+S.+Brown · · Score: 5, Informative

    People who have used it report no performance degradation. And no, it's nothing like Windows search, which Mac OS has also had since System 8 or earlier.

    For one, it doesn't take half an hour; it shows you the results instantaneously, as you type.
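    One plausible reason results can appear as you type is that each keystroke becomes a lookup in a prebuilt index rather than a scan of the disk. The sketch below is a toy under that assumption (a sorted list of lowercase terms searched with Python's `bisect` module, with a hypothetical `PrefixIndex` class); it is not Spotlight's actual index structure.

```python
import bisect

class PrefixIndex:
    """Toy search-as-you-type index over a sorted list of lowercase terms."""

    def __init__(self, terms):
        self.terms = sorted({t.lower() for t in terms})

    def complete(self, prefix, limit=10):
        """Return up to `limit` indexed terms starting with `prefix`."""
        prefix = prefix.lower()
        # Binary search jumps straight to the first candidate, so the cost
        # depends on the number of matches, not on re-reading any files.
        i = bisect.bisect_left(self.terms, prefix)
        results = []
        while i < len(self.terms) and len(results) < limit and self.terms[i].startswith(prefix):
            results.append(self.terms[i])
            i += 1
        return results
```

    Re-running `complete` on every keystroke ("s", then "sp", then "spo") simply narrows the same sorted range, which is cheap enough to do interactively.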

    Second, via plug-ins it can understand *any* file type: for example, an image metadata importer that uses OCR so you can search for words in images, or a flesh-tone detector so you can search for all your porn that way.
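    The plug-in idea can be sketched as a registry mapping file types to importer functions, each returning a dictionary of metadata attributes. Everything here (the `importer` decorator, `extract_metadata`, the attribute names) is hypothetical and not Apple's importer API; the PNG importer only reads the image size from the standard IHDR header, standing in for something fancier like the OCR importer mentioned above.

```python
import os
import struct
from typing import Callable, Dict

# Hypothetical registry: lowercase file extension -> importer function.
Importer = Callable[[bytes], Dict[str, object]]
_IMPORTERS: Dict[str, Importer] = {}

def importer(*extensions: str):
    """Decorator registering a function as the importer for the given extensions."""
    def register(func: Importer) -> Importer:
        for ext in extensions:
            _IMPORTERS[ext.lower()] = func
        return func
    return register

@importer(".txt", ".text")
def plain_text(data: bytes) -> Dict[str, object]:
    text = data.decode("utf-8", errors="replace")
    return {"kind": "text", "words": len(text.split())}

@importer(".png")
def png_image(data: bytes) -> Dict[str, object]:
    # A PNG starts with an 8-byte signature followed by the IHDR chunk;
    # width and height are big-endian 32-bit integers at offsets 16 and 20.
    if len(data) < 24 or data[:8] != b"\x89PNG\r\n\x1a\n":
        return {"kind": "unknown", "size": len(data)}
    width, height = struct.unpack(">II", data[16:24])
    return {"kind": "image", "width": width, "height": height}

def extract_metadata(filename: str, data: bytes) -> Dict[str, object]:
    """Dispatch to the registered importer, or fall back to generic metadata."""
    ext = os.path.splitext(filename)[1].lower()
    handler = _IMPORTERS.get(ext)
    if handler is None:
        return {"kind": "unknown", "size": len(data)}
    return handler(data)
```

    Supporting a new type is then just one more `@importer(...)` function; nothing else in the system has to change.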

    --
    Shitram Brown, PhD
    Professor of Mathematics
  4. Is THIS the discussion? by siliconjunkie · · Score: 5, Informative

    The post links to the Apple Spotlight page that has been there for months. Is THIS the "discussion" that is being referred to in the post?

  5. the actual discussion/article by Anonymous Coward · · Score: 5, Informative

    >>> "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'.

    What's really funny is that there's no link to the actual published discussion... but anyway...

    http://developer.apple.com/macosx/tiger/spotlight.html

  6. Re:Radical by CountBrass · · Score: 5, Informative

    The radical difference is that Spotlight generates the metadata itself rather than you having to tag stuff yourself. It has content handlers to intelligently tag all kinds of different "stuff", so it "knows" what a Word document is, what a web page is, what a .png file is, and so on.
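    "Knowing" what a file is usually starts with recognizing it. A minimal sketch, assuming detection from a file's leading bytes (its "magic number") rather than its name; the signatures below are real, but this is a tiny subset, `sniff_kind` is a hypothetical helper, and this is not necessarily how Spotlight's content handlers decide.

```python
def sniff_kind(data: bytes) -> str:
    """Guess a file's kind from its first bytes, ignoring its extension."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png image"
    if data.startswith(b"%PDF-"):
        return "pdf document"
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        # OLE2 compound file: used by legacy Word .doc, but also by
        # Excel and PowerPoint, so this alone cannot tell them apart.
        return "office document (OLE2)"
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "web page"
    return "unknown"
```

    A content handler chosen this way still works when a file has the wrong extension or none at all.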

    --
    Bad analogies are like waxing a monkey with a rainbow.
  7. Re:Sounds like Windows, actually by catwh0re · · Score: 5, Informative
    Actually it's quite different from the index search.

    Even before Spotlight is introduced, the differences between FAT32/NTFS and HFS+ (the Mac filesystem) yield significantly faster searches. Sit down at an OS X Mac and notice that searching the entire hard disk is actually a fast operation, not the many-minute wait it is on Windows.

    Now, since Spotlight is built into the core of the system and isn't just a tacked-on service like the Windows indexer, there are significant speed advantages: updating the SQL database when files are added, modified, etc. is incredibly light on the CPU, roughly equivalent to something like renaming a file.

    What Spotlight isn't (and this might be where you are getting confused) is a spider that crawls from folder to folder cataloguing information about each file. That is what the Windows indexer does, which is why it is so resource-intensive: it is busy checking files and folders that you may not have changed at all.
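    The difference between the two approaches can be sketched as follows. This is a toy model, not Spotlight's implementation: it assumes some source of per-file change notifications (not modeled here) calls `on_change`, and compares that with re-crawling a directory tree with `os.walk`. All class and function names are hypothetical.

```python
import os
import tempfile
from typing import Callable, Dict, Optional, Tuple

Extractor = Callable[[str], dict]

def size_only(path: str) -> dict:
    """Trivial stand-in for a real metadata importer."""
    return {"size": os.path.getsize(path)}

def crawl(root: str, extract: Extractor) -> Tuple[Dict[str, dict], int]:
    """The 'spider' approach: re-read every file under root, changed or not."""
    entries: Dict[str, dict] = {}
    work = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            entries[path] = extract(path)
            work += 1
    return entries, work

class EventDrivenIndex:
    """Index kept current by change notifications instead of re-crawling."""

    def __init__(self, extract: Extractor, entries: Optional[Dict[str, dict]] = None):
        self.extract = extract
        self.entries = dict(entries or {})
        self.work = 0  # number of files read since construction

    def on_change(self, path: str) -> None:
        # Called for a created or modified file: touch only that one entry.
        self.entries[path] = self.extract(path)
        self.work += 1

    def on_delete(self, path: str) -> None:
        self.entries.pop(path, None)

def demo(n_files: int = 100) -> Tuple[int, int, int]:
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for i in range(n_files):
            path = os.path.join(root, f"file{i}.txt")
            with open(path, "w") as f:
                f.write("x" * i)
            paths.append(path)
        initial, _ = crawl(root, size_only)       # one-time initial population
        index = EventDrivenIndex(size_only, initial)
        with open(paths[7], "a") as f:            # modify a single file
            f.write("more")
        index.on_change(paths[7])                 # event-driven: one file read
        _, recrawl_work = crawl(root, size_only)  # spider: every file read again
        return recrawl_work, index.work, index.entries[paths[7]]["size"]
```

    `demo()` returns `(100, 1, 11)`: the re-crawl read all 100 files, while the event-driven index read only the one that changed.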

    As for the claim that "Filesystem metadata is great, but 'instantly' updated search indexes sounds like a solution to a problem that doesn't really exist": Microsoft, Google and Apple would disagree. Having an up-to-date catalogue without the CPU strain is a must-have; no wonder MS has been trying to implement it since NT 4.0.

  8. Re:Radical by TheRaven64 · · Score: 5, Informative
    Actually, the plug-in architecture was also present in BeOS. BeOS R5 shipped with a plug-in that would convert ID3 tags to filesystem metadata. The only novel things about Spotlight are that the plug-ins are invoked automatically in the background (in BeOS they had to be invoked explicitly, usually from the Tracker, BeOS's Finder) and the full-content indexing.
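    For illustration, ID3v1, the simplest kind of ID3 tag, is a 128-byte block at the end of an MP3 that starts with "TAG" and is followed by fixed-width fields. Here is a minimal sketch that turns it into attribute-style key/value pairs; the "Audio:..." attribute names are hypothetical, the function returns a dict instead of writing real BeFS attributes, and ID3v2 is ignored entirely.

```python
def id3v1_to_attributes(data: bytes) -> dict:
    """Map an ID3v1 tag (last 128 bytes of an MP3) to attribute-style keys."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return {}
    tag = data[-128:]

    def field(start: int, length: int) -> str:
        # Fields are NUL- or space-padded ISO-8859-1 text.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    attrs = {
        "Audio:Title": field(3, 30),
        "Audio:Artist": field(33, 30),
        "Audio:Album": field(63, 30),
        "Audio:Year": field(93, 4),
        "Audio:Comment": field(97, 30),
        "Audio:Genre": tag[127],  # numeric genre code
    }
    # Drop empty text fields, as there is nothing worth storing for them.
    return {key: value for key, value in attrs.items() if value != ""}
```

    A background importer would run something like this whenever an MP3 is saved, which is exactly the "invoked automatically" part the comment says BeOS lacked.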

    I still have to be convinced that full-content indexing is a good idea. I very rarely need to search the contents of a group of files, and when I do, it's usually such a small group that the time saved would not outweigh the disk space used by such large indexes. On the other hand, this problem should get better over time: the largest files are usually video, which has little indexable content, so the index is likely to get relatively smaller over time (until someone writes a plug-in that can recognize objects in images and applies it to every frame of a movie; fortunately, I think this is still a long way off).
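    To see why index size tracks indexable text rather than file size, here is a minimal full-content inverted index (word to set of files). It assumes the text has already been extracted by some importer and that files such as video yield an empty string; the function names are hypothetical, and this is a generic textbook structure, not a description of Spotlight's on-disk index.

```python
import re
from collections import defaultdict
from typing import Dict, Set

WORD = re.compile(r"[a-z0-9]+")

def build_inverted_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """documents maps path -> extracted text ('' for e.g. video files)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, text in documents.items():
        # Each distinct word adds one posting, however large the file is.
        for word in set(WORD.findall(text.lower())):
            index[word].add(path)
    return index

def search_all(index: Dict[str, Set[str]], *words: str) -> Set[str]:
    """Paths containing every one of `words` (a simple AND query)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()
```

    A multi-gigabyte movie with no extracted text adds nothing to this index, while a short memo adds one entry per distinct word.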

    --
    I am TheRaven on Soylent News
  9. Re:Radical by jtrascap · · Score: 5, Informative

    Okay, I'll bite.

    * Desktop-metaphor based GUI for a personal computer
    * WYSIWYG publishing with a laser printer
    * PDAs via Newton
    * AppleLink (err, AOL now)
    * QuickTime (movies, QTVR, 3D, etc)

    We could go on and on. Give Apple props where due, huh?
    And please consider modding the troll down...

  10. FYI... by jonr · · Score: 5, Informative

    Just a small bit of info: the brain behind Spotlight is Dominic Giampaolo, the same guru who wrote the fantastic BeFS for BeOS.

  11. a problem that doesn't really exist by guet · · Score: 5, Informative

    Uhm, no. It is not continually indexing the data; if you read the article, you'll see it only updates the metadata for items when they're saved. You can write custom plug-ins for new data types, or just go with the bundled ones for standard file types like images, text, etc.

    Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist.

    On the contrary, this is a *better* solution to a very basic problem that has plagued computers since they were invented.

    The problem:
    How do I organise and access the data I use every day (emails, letters, images, music etc)?

    The old solution:
    You can put your files in folders (one folder per file). You can name the files with a short description ending in a cryptic three-letter code that denotes the file type. Files *must* be in only one category/folder at a time. Limited metadata (date modified, file type, etc.) may be stored.

    The new solution:
    You add metadata to files (often automatically) saying who created them, which project they belong to, whether they're "to do" or "unfinished", or whatever. You'd do this in the application's save dialog as you saved the file. All other applications that use Spotlight will update their view of this information for free, in real time.

    When you want to work on a project, you click on the live project folder, and immediately you see all the files, emails, images etc for that project, no more, no less, regardless of where they are on the disk and what other projects they're shared with.

    Want to see all the stuff to do with John, 5 months ago? On this project? Containing the word gizmo? That sort of query will be easy to make.
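    Queries like these map naturally onto a saved predicate that is re-run whenever the "live" folder is opened. A minimal sketch with hypothetical `Item` records and a hypothetical `live_folder` function; note that one item can belong to several projects at once, which a single folder hierarchy cannot express.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set

@dataclass
class Item:
    """One file, mail or image, with hypothetical metadata attributes."""
    path: str
    people: Set[str]
    projects: Set[str]  # multi-valued: an item may belong to several projects
    modified: date
    text: str = ""

def live_folder(items: List[Item], person: Optional[str] = None,
                project: Optional[str] = None, since: Optional[date] = None,
                until: Optional[date] = None, contains: Optional[str] = None) -> List[str]:
    """Evaluate a saved query against the current items; re-run on every open."""
    results = []
    for item in items:
        if person is not None and person not in item.people:
            continue
        if project is not None and project not in item.projects:
            continue
        if since is not None and item.modified < since:
            continue
        if until is not None and item.modified > until:
            continue
        if contains is not None and contains.lower() not in item.text.lower():
            continue
        results.append(item.path)
    return results

SAMPLE_ITEMS = [
    Item("spec.doc", {"John"}, {"project-1", "project-2"}, date(2004, 2, 10), "gizmo spec"),
    Item("mail/17.eml", {"John"}, {"project-1"}, date(2004, 2, 12), "lunch?"),
    Item("photo.jpg", {"Ann"}, {"project-2"}, date(2004, 3, 1)),
]
```

    Because nothing is moved on disk, `live_folder(SAMPLE_ITEMS, project="project-2")` returns both `spec.doc` and `photo.jpg`, even though `spec.doc` also shows up in the project-1 folder, and any newly tagged item appears the next time the query runs.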

    If you have an image editing application, it can show you all the images taken in Paris in 2002, without having to build a database application into it. This makes adding this kind of feature to applications trivial.

    Ideally, adding metadata tags like "project-1" and "To do" should be as easy as choosing them in the save dialog or applying them like a label in the Finder. It's not quite at that stage yet, but that should come later. Some of these ideas are quite old (Be), but they are long overdue in a desktop operating system.

  12. Re:Sounds like Windows, actually by LiquidCoooled · · Score: 5, Informative

    Windows XP does not do full-text search correctly because it relies on a specific registry handler entry for each type of file (*.txt, *.rtf, etc.), with a different handler for each type.

    However, it ships with only a few file types configured, and there is no way to set a default along the lines of "when no search filter is available, treat the file as plain text".

    I stressed and strained about this when XP first came out. The only way I found to get the expected results was to build myself a scanner.
    It searches through a drive and identifies EVERY file extension.
    It then looks through the registry to see which extensions have linked handlers.
    It generates a .reg file containing stub links for every unmatched file type.

    It's a bit shotgun, but it allowed me to keep using XP's text search.
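    The three scanner steps can be sketched in Python as follows. This is an illustration, not the poster's actual tool: step 2 (checking the registry) is replaced by a caller-supplied set of already-handled extensions, since reading the real registry requires Windows (for example via the `winreg` module), and the function only returns the .reg text. Saving it and importing it with regedit, including whatever file encoding regedit expects, is left out. The GUID is the plain-text handler quoted later in this post; `scan_extensions` and `make_reg_file` are hypothetical names.

```python
import os
from collections import Counter
from typing import Iterable

# The plain-text PersistentHandler GUID quoted in the post.
PLAIN_TEXT_HANDLER = "{5e941d80-bf96-11cd-b579-08002b30bfeb}"

def scan_extensions(root: str) -> Counter:
    """Step 1: walk a drive or folder and count every file extension seen."""
    counts: Counter = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext:
                counts[ext] += 1
    return counts

def make_reg_file(extensions: Iterable[str], handled: Iterable[str]) -> str:
    """Step 3: emit stub PersistentHandler entries for unmatched extensions.

    `handled` stands in for step 2, the registry lookup (not shown here).
    """
    missing = {e.lower() for e in extensions} - {e.lower() for e in handled}
    lines = ["Windows Registry Editor Version 5.00", ""]
    for ext in sorted(missing):
        lines.append(f"[HKEY_CLASSES_ROOT\\{ext}\\PersistentHandler]")
        lines.append(f'@="{PLAIN_TEXT_HANDLER}"')
        lines.append("")
    return "\r\n".join(lines)  # .reg files conventionally use CRLF line endings
```

    Typical use would be `make_reg_file(scan_extensions("C:\\"), handled_extensions)`, where `handled_extensions` comes from whatever registry check you trust.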

    Microsoft have released their own shotgun registry pack; for more info, see here:
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q309173

    (I have since moved on to using my own full-text search tool, but at least the XP search no longer misses files that are clearly within range.)

    [Now for the science part...]

    Take a file, something like "PunchTheMonkey.asp".

    Open it in Notepad and make sure it contains a particular text string, for instance "spyware".

    Open the Windows XP search in that folder, tell it to search *.ASP, and give it the phrase "spyware".

    Windows XP will NOT find this file.

    -----

    The Windows .TXT plain-text handler is identified by a registry key:

    [HKEY_CLASSES_ROOT\.txt\PersistentHandler]
    @="{5e941d80-bf96-11cd-b579-08002b30bfeb}"

    Adding an entry like the one above for each required file type will restore full-text search.

    So I add the following entry under the .ASP key:

    [HKEY_CLASSES_ROOT\.ASP\PersistentHandler]
    @="{5e941d80-bf96-11cd-b579-08002b30bfeb}"

    After logging off or rebooting, I try the same search again, and XP now finds the file.

    --
    liqbase :: faster than paper
  13. Re:Im very interested... by Anonymous Coward · · Score: 5, Informative

    Clicking the 'X' doesn't actually close the application. This annoyed me to start with, but I've slowly gotten used to it.

    If you want to quickly quit a load of apps or switch applications, hit Cmd-Tab, then keep holding Cmd and cycle through the apps with the Tab key.

    However, you have a gig of RAM on the system, so there's no need to quit programs when switching between them. They'll be paged out to disk as necessary if you manage to fill the available RAM. Multitasking works very well, as processes generally aren't allowed to hog the processor.

    I think this is a common complaint among people who are used to Windows: the windows in OS X represent documents, not applications, which is why they can be closed without quitting the application. Apple did manage to balls this up by being inconsistent, though. Some applications DO quit when you close the window, but in theory those are single-window utilities, like the Address Book.

    Be sure to try Exposé as well, though I doubt it'd work well on that older system.
    http://www.apple.com/macosx/features/expose/

  14. Re:Im very interested... by Unxmaal · · Score: 5, Informative
    Clicking the 'X' doesn't actually close the application. This annoyed me to start with, but I've slowly gotten used to it.

    As someone replied earlier, this is a new paradigm in app management: the top menu controls the application, and the window menu controls the window. More importantly, OS X apps are designed to be left open: keep them open, close or hide their windows, and they'll use virtually no resources, but will start significantly faster the next time you use them.

    Having to select the application's window before I can quit it using the application menu, or having to right-click the Dock icon to quit: still annoying.

    Learn your keyboard shortcuts. Take the ten minutes to learn them, and you'll regain hours of your time. Cmd-Q is the shortcut for Quit, for example. If you're used to Windows machines, you can swap the Cmd key with the Windows key.

    Love the Dock. It's just... right.

    Check out Quicksilver, from http://quicksilver.blacktree.com. Once you get used to it (and once it gets used to you), it's phenomenally faster than the Dock.

    The ability to access the underlying BSD OS easily. Love it.

    iTerm (http://iterm.sourceforge.net) is a great OS X terminal app.

    Here's a list of favorite OSX apps I posted a while back. Most are free/OSS, and they're all some of the best apps for any platform.

    --
    http://unxmaal.com
  15. Re:How to do the hard part easily on Linux or BSD. by Twirlip+of+the+Mists · · Score: 5, Informative

    Which explains why it's tied to the filesystem rather than using a general hook at the vnode layer to allow the same functionality to be implemented regardless of the filesystem in use.

    Wow. Check it out. Everything you said here is completely 100% wrong.

    Spotlight is filesystem-independent. It runs as a set of daemons and stores its metadata database in a hidden directory called ".Metadata" at the root level of the volume.

    All your "could be" talk is basically a summary of how Spotlight works.

    --

    I write in my journal