Examining Mac OS X 10.4's Spotlight
Ton writes "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'. The really interesting part is that metadata will be playing a big role in Spotlight while just a few years ago people were afraid metadata in Mac OS X was going the way of the dodo."
Can someone please explain a little more as to how Spotlight using metadata is a "radical" new thing?
I haven't seen any mainstream implementations (WinFS?) of it, but I didn't know it was a brand new concept.
Error 407 - No creative sig found
Mac OS X thread, not Mac OS 9 thread, silly.
You must have a different version of locate to me. I can't get mine to index my emails, it has no idea about the metadata entries in common document types and can't tell the difference between an image and a movie file.
Could you send me the source for the version you have installed that does that?
My windows XP search (at work) is very odd. It will not find text in assembly files (*.S) that I know is there. I've played around with turning the indexing thing on and off to no avail. That and other strange behaviour led me to find Visual Grep which is well worth whatever I paid for it (50 USD?). Still something like that should work in a real OS.
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
I've been using computers through my whole childhood, school, work, etc. I've been doing all sorts, from playing games to hardware design and real time data analysis on the computer. And never, ever have I had the need to search for my files.
;)
What is with this search thing that everybody is so hot about?
Not that it doesn't look cool though. Anything on a Mac looks cool
um...
no
locate is extremely primitive compared to Spotlight.
you goose.
um...
no
Perhaps you don't have the imagination to see how it is different. Imagine being able to type in "dog" and get everything on your drive that has dog in it anywhere. In the title, the text, the metadata, within pdfs......
another goose
Anyone who has used the instantly updated searches in Mail.app or iTunes will have a feel for how useful a system-wide approach could be. However I too am concerned about resource usage. I think I'll wait and see how big the metadata index tends to get and how big the CPU/memory hit is.
I believe though that the indexing is done during saves, so you'll not notice a general system slow down. What you will notice is a slow down on file saves.
People who have used it report no performance degradation. And no, it's nothing like Windows search, which Mac OS has also had since System 8 or earlier.
For one, it doesn't take half an hour, it shows you the results as you type, instantaneously.
Secondly, via plugins it can understand *any* file, such as an image metadata importer that uses OCR so you can search for words, or a Flesh-tone detector so you can search for all your porn that way.
Shitram Brown, PhD
Professor of Mathematics
From reading the article, I think Hans Reiser has been right about the need for reiser4 on mainstream linux.
He saw all this stuff coming from way back. If you read the LKML, you will remember that he warned us.
It's a pity no one listens to him.
>You must have a different version of locate to me.
That might be the case.
There are no atheists when recovering from tape backup.
Or should it be co-operative since it's a mac thread
Twentieth century called, they want their trolls back.
I think you're supposed to switch from searching 'documents' to 'files and directories'. If that's accurate, I do not know - my prime exposure to Windows is in magazines.
~phil
makewhatis.cron can be a pain on Linux as well, if it is on a workstation which is mostly switched off.
Unfortunately for windows boxes, they do tend to be left shut down a lot of the time, so more of their runtime is spent rebuilding the search database when the machine is being used for something, rather than in the middle of the night, which is the preferred way
http://michaelsmith.id.au
I had the very same problem with XP, and it's supposed to be like that, but you can change the standard behaviour.
This link addresses the issue:
http://www.pcmag.com/article2/0,1759,1206399,00.asp
The post links to the Apple Spotlight page that has been there for months. Is THIS the "discussion" that is being referred to in the post?
>>> "Apple has published a discussion of Spotlight, the radical systemwide search technology that will be part of Mac OS X 10.4 'Tiger'.
What's really funny is that there's no link to the actual published discussion... but anyway...
http://developer.apple.com/macosx/tiger/spotlight.html
> People who have used it report no performance degradation.
What's the baseline? How bad was the previous search engine?
"Users reported increased productivity do to less problems with the operating system after upgrading to Windows XP." Yeah, but what are they comparing against??
I read about Beagle for Linux; it seems to be very similar in functionality. http://www.gnome.org/projects/beagle/
-- My site
Already the differences in Fat32/NTFS versus HFS+ (the mac filesystem) yield significantly faster searches before spotlight is introduced. Sit down on an OSX apple and notice that an entire search of the HD is actually a fast operation, not the waiting many-minute exercise that it is on windows.
Now since Spotlight is built into the core of the system, and isn't just a tack-on service like the Windows indexer is, there are significant speed advantages. Updating the SQL database when files are modified, added, etc. is incredibly light on the CPU, and is equivalent to doing something like changing the file name.
What spotlight isn't, and this might be where you are getting confused, spotlight isn't a spider that crawls from folder to folder cataloguing information about each file, which is what the windows indexer was doing, hence why it was resource intensive, as it was busy checking files and folders that you have possibly not made any changes to.
As a counter to the 'Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist.' Microsoft, google and apple would disagree. Having an up-to-date catalogue without the CPU strain is a must have, go figure MS have been trying to implement it since NT4.0.
Apple are well known for optimising their software to be significantly faster with each pre-release build. Having had the opportunity to test the developer tester of 10.4 with Spotlight on a 12" PowerBook (which was bogged down with various applications at the time), I can assure you that Spotlight remained snappy, and definitely true to the 'instant' claim (I've noticed Apple are quite careful about not over-advertising their products, as it causes more problems than sales and a bad image). After using Microsoft products we become very used to how slow a process can be. Apple's advantage is clear: they know their target hardware, and like video-card driver writers they can optimise any part of their OS to fit their hardware for optimum speed. Additionally the G4/G5 chipsets have some quite useful registers for performing these sorts of searches (think sort of like MMX for x86, except with developers actually utilising them outside of games).
I am sure there are differences - but seems like in concept they all work the same way - anyone like to comment and contrast on the differences here ?
..And the people bowed and prayed, To the neon gods they made.
I'm waiting for Tiger so that I can try out Automator. This promises to be a point-n-click version of scripting. Hopefully this will be easy enough to use even my parents and maybe even my boss will be able to use it.
The first thing I'll do is try making an Automator to create thumbnails. Currently I'm using a bash script I wrote on my Linux box to do this. This will be the first time I've paid for an OS upgrade since Win98, so I hope it's worth it.
Vote for global prefs bug
Will this sucker have an opt-out? I don't want it.
> Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist.
Agreed - but they're racing to the finish line anyway.
Why?
Eh, the other day I couldn't get Windows 2000 to locate a file which I knew existed. It found two other instances, but not the exact file I was looking at in an Explorer window.
Hey wanker, you dropped something.
Actually it's more like getting my ancient AA C compiler to work in DOSEMU and my USB EPROM programmer to work with Linux or *BSD so I could just get over it.
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
As far as I can tell, that locate doesn't do other metadata than privileges and ownerships.
Nice of you to announce you would support the Mac platform with your Google search, but as you can see Apple has it covered.
We are Mac users, we are spoiled.
come see the pretty pictures
That doesn't work, tragically.
I was hacking an irritating module for Mambo the other day, that I knew had specified somewhere in one of the files a background colour, rather than using the overall Mambo template like its supposed to. Peeking at the HTML source I found the hex code for the colour. Figured rather than opening each document and doing a find, I'd do a search for all 'files and directories' that contained that HEX string. No joy. As far as WinXP could figure, that string didn't exist, even though I've got indexing turned on on the drive.
In the end I opened all the files in Textpad and used its 'find in all files' method to dig out the annoying line of code.
"Joy is not in things; it is in us." Richard Wagner
So what do you use, big boy? *wink*
Just a small info. The brain behind Spotlight is Dominic Giampaolo, the same guru that wrote the fantastic BeFS for BeOS.
Windows search does not search in .cpp, .h, .html, .xml and other files no one uses, to increase search speed. Seriously.
Content Search Does Not Search All File Types for the Specified String
uhm. No. It is not continually indexing the data, if you read the article you'll see it only updates the meta-data for items when they're saved - you can write custom plug-ins for new data types, or just go with the bundles ones for standard file types like images, text etc.
> Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist.
On the contrary, this is a *better* solution to a very basic problem that has plagued computers since they were invented.
The problem
How do I organise and access the data I use every day (emails, letters, images, music etc)?
The old solution
You can put your files in folders (one per file). You can name the files with a short description, ending with a cryptic 3 letter code to denote the file type. Files *must* be in one category/folder only at a time. Limited meta-data (date modified, file-type etc) may be stored.
The new solution
You add meta-data to files (often automatically) saying who created them, what project they're under, whether they're 'to do' or 'unfinished' or whatever. You'd do this in a save dialog for the application, as you saved the file. All other applications which use Spotlight will update their view of this stuff for free, in real time.
When you want to work on a project, you click on the live project folder, and immediately you see all the files, emails, images etc for that project, no more, no less, regardless of where they are on the disk and what other projects they're shared with.
Want to see all the stuff to do with John, 5 months ago? On this project? Containing the word gizmo? That sort of query will be easy to make.
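To make that kind of query concrete, here is a minimal sketch of it as a plain SQL lookup over a metadata table. This is not Spotlight's schema or API; the table, the column names and the sample rows are all made up for illustration, and it uses Python's built-in sqlite3 module only because it is handy.

```python
import sqlite3

def build_demo_store():
    """Create an in-memory metadata table with a few made-up rows (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items (path TEXT, author TEXT, project TEXT, "
        "modified TEXT, body TEXT)"
    )
    rows = [
        ("/Users/me/mail/42.eml", "John", "project-1", "2004-05-10", "the gizmo spec is late"),
        ("/Users/me/docs/plan.rtf", "John", "project-1", "2004-09-01", "schedule only"),
        ("/Users/me/pics/gizmo.jpg", "Anna", "project-2", "2004-05-12", "gizmo photo"),
    ]
    db.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", rows)
    return db

def find_items(db, author, project, start, end, word):
    """Return paths matching every criterion, wherever the files live on disk."""
    cur = db.execute(
        "SELECT path FROM items WHERE author = ? AND project = ? "
        "AND modified BETWEEN ? AND ? AND body LIKE ? ORDER BY path",
        (author, project, start, end, f"%{word}%"),
    )
    return [row[0] for row in cur]
```

Calling find_items(build_demo_store(), "John", "project-1", "2004-05-01", "2004-05-31", "gizmo") picks out the one email, regardless of which folder it sits in. The point is only that once the metadata is in one store, the "John, 5 months ago, this project, gizmo" question is a single lookup.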
If you have an image editing application, it can show you all the images taken in Paris in 2002, without having to build a database application into it. This makes adding this kind of feature to applications trivial.
Ideally adding meta-data tags like 'project-1', and 'To do' should be as easy as choosing them in the save dialog or applying them like a label in the Finder. It's not quite at that stage yet, but that should come later. Some of these ideas are quite old (Be), but they are long overdue in a desktop operating system.
No, it's just broke... it won't find some things that are there sometimes.
VS6 and VS.NET use the same algorithm - you can search for things and have it miss things you know are there (sometimes things that are on the screen in front of you).
Both links say quite a bit. I guess the kernel gurus know better, but I think the SQL plugin for a FS would be a cool thing to show off with at the very least.
http://kerneltrap.org/node/view/3727
http://lwn.net/Articles/100148/
Coming from a Windows XP background, some things I've noticed so far:
- Clicking the 'X' doesn't actually close the application. This annoyed me to start with, but I've slowly gotten used to it.
- Having to select the application window before I can quit it using the application menu. Or I have to right click on the dock icon to quit. Annoying still.
- Love the dock. It's just ..... right.
- Most of the file system is hidden from you, which I like. Put my data where I want it and ignore the rest.
- The ability to access the underlying BSD OS easily. Love it.
- Everything looks and feels 'polished'. That's what I always hated about KDE/Gnome when I tried them: the features were there, but no one had taken the time to step back and polish the entire thing off so it all looks and feels together.
- Every time I boot the Mac, my TFT display is 'wavy' until I have the monitor do an autoadjust. Don't really know whose fault this is, though it's fine under Windows and Linux.
So, final conclusion? I love it, so much that I have already placed an order for a G5 iMac. And in the meantime, I've purchased a G4 upgrade for this little baby, just to help it along.
Turning on the indexing service appears to make no discernible difference in search times either. It just means your machine periodically grinds away for an hour building up a table. I have no idea what it's doing but searches seem to take as long whether it's on or off.
I think in all, I'd prefer something simpler like slocate, that builds up a file index but doesn't attempt to read the contents. Even slocate takes a while to index, but at least it works as designed.
Meta info sounds better, but even that could be fraught. The idea failed miserably for the web. I can well imagine, if the idea catches on for local files, that overzealous apps could start stuffing their files full of useless meta info so you're continuously getting false hits.
I'm a PC (Win/Lin) user, and I'm thinking about changing over to Mac.... lol, I'm not that cliche. But I might consider learning more about them. They are nice powerful beasts within. They'd be nice to have on a Folding Farm. :D
The reason Windows XP does not do full text search correctly is because it uses a specific registry handler entry for each type of file (*.txt, *.rtf etc). It uses a different handler for different types of files.
However it only comes with a few configured filetype settings, and no way to set a default "When no searchFilter available, treat as plain text" setting.
I stressed and strained about this when XP came out initially. The only way I found to do it so I got expected results was to build myself a scanner.
It searches through a drive, and identifies EVERY file extension.
It then looks through the registry to see which extensions have linked handlers.
It generates a reg file containing stub links for every unmatched filetype.
It's a bit shotgun, but allowed me to continue using the text search for XP.
Microsoft have released their own shotgun registry pack, for more info see here:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q309173
(I have since moved myself into using my own full search tool, but at least the XP search doesn't miss files which are clearly within visible range).
[Now for the science part..]
Take a file, something like "PunchTheMonkey.asp".
Make sure you have it open in notepad, and make sure there is a certain text string - for instance "spyware".
Open the Windows XP search in that folder, tell it to search *.ASP, and give it the phrase "spyware".
Windows XP will NOT find this file.
-----
The Windows .TXT flat text handler is identified by using a registry key:
[HKEY_CLASSES_ROOT\.txt\PersistentHandler]
@="{5e941d80-bf96-11cd-b579-08002b30bfeb}"
Adding an entry like the one above for each required filetype will restore the full text search functionality.
So, I add the following entry into the correct .ASP place:
[HKEY_CLASSES_ROOT\.ASP\PersistentHandler]
@="{5e941d80-bf96-11cd-b579-08002b30bfeb}"
After I have logged off/rebooted, I try the same again, and XP will now identify the file.
liqbase
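For anyone who wants to try the scanner idea from the post above, here is a rough sketch of it, not the poster's actual tool. It walks a directory tree to collect extensions and writes .reg text that points every unhandled extension at the plain text handler GUID quoted above. The function names are made up, and the list of extensions that already have a handler is passed in by hand; reading it out of the registry is left out so the sketch runs on any OS.

```python
import os

# GUID of the plain text handler, as quoted in the post above.
PLAIN_TEXT_HANDLER = "{5e941d80-bf96-11cd-b579-08002b30bfeb}"

def collect_extensions(root):
    """Walk a directory tree and return every file extension found (lowercased)."""
    exts = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext:
                exts.add(ext.lower())
    return exts

def build_reg_file(found_extensions, handled_extensions):
    """Return .reg text adding the plain text handler to each extension without one."""
    handled = {ext.lower() for ext in handled_extensions}
    missing = sorted({ext.lower() for ext in found_extensions} - handled)
    lines = ["Windows Registry Editor Version 5.00", ""]
    for ext in missing:
        lines.append(f"[HKEY_CLASSES_ROOT\\{ext}\\PersistentHandler]")
        lines.append(f'@="{PLAIN_TEXT_HANDLER}"')
        lines.append("")
    return "\n".join(lines)
```

Something like build_reg_file(collect_extensions("C:\\src"), [".txt"]) gives text you could save as a .reg file and inspect before importing. As the post says, this is a shotgun approach: binary formats would get treated as plain text too, so review the list rather than importing it blindly.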
Does anybody have an informed idea about how they do the full-text instant search results trick? I use Opera's mail client, and it does the same trick when searching your mail. It is pretty impressive, algorithmically speaking. To pull this off, standard inverted files used by search engines are probably too slow. I personally suspect they use some variant of a suffix tree. Maybe some of you know for a fact how they do it?
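Nobody here seems to know what Spotlight or Opera actually use, so take this only as a sketch showing that a plain inverted index can answer as-you-type prefix queries if the term list is kept sorted and searched with binary search. The class and method names are made up; it is not a claim about either product's internals, and it does not handle matches in the middle of a word the way a suffix tree would.

```python
import bisect
import re
from collections import defaultdict

class PrefixIndex:
    """Toy inverted index: term -> document ids, plus a sorted term list for prefix lookup."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.terms = []                   # sorted list of distinct terms

    def add(self, doc_id, text):
        """Index the lowercase alphanumeric words of one document."""
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            if term not in self.postings:
                bisect.insort(self.terms, term)
            self.postings[term].add(doc_id)

    def search(self, prefix):
        """Return ids of documents containing any term that starts with prefix."""
        prefix = prefix.lower()
        hits = set()
        i = bisect.bisect_left(self.terms, prefix)
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            hits |= self.postings[self.terms[i]]
            i += 1
        return hits
```

Each extra keystroke is one binary search plus a short scan over the matching terms, so the cost depends on the vocabulary size, not on how much text is on the disk.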
Being well balanced is overrated. -- John Carmack
Probably very similar to Search Kit which currently does the same thing, but has to be manually set up. You can choose the type of index it creates, inverted, vector or both together.
What if I want to find files from male colleagues?
Monkey! Go read the comment I made a few items below yours.
It shows clearly how to rectify the EXACT problem you are having.
This is a huge mistake from Microsoft.
Here's a quick link:
http://slashdot.org/comments.pl?sid=128854&cid=10753280
liqbase
Man, you actually paid for visual grep? :p
Seriously, though. This adds one more item to my list of things that make me wonder how people can ever work with Windows. You said it yourself, you need a Real OS. Or just grep for Windows, or a whole bunch of utilities while you're at it.
HTH
Please correct me if I got my facts wrong.
Unless you used BeOS in the past!
This really is a big deal, much bigger than Microsoft's feeble attempts at full text search, or Google's desktop search. In many ways this is much, much more useful than full-text search, especially for developers.
At home I have about 6,000 MP3s, 1,000 photos, 500 scientific articles in PDF format and hundreds of Word files that I need to juggle. Each one has its own metadata database, and none of them are updated in real time.
Databases:
MP3 - WinAmp & AudioTron
Photos - Photoshop
PDFs - Acrobat Indexer
Word files - MS Indexer
That doesn't include any of the other data that is stored completely in databases and would have been easier to store in the file system - like email, guitar tab files and god knows what else.
A properly implemented global meta-data store (that works at the filesystem level, not as an iterative service) profoundly changes how one uses the system, making sorting and finding data actually almost pleasurable.
+--------------------- You idiot! I told you we were facing the wrong way!
This has already been done to some extent in Quicksilver.
http://quicksilver.blacktree.com/
It's an app that indexes parts of your file system and supports plugins to index application data. The best part is that it is keyboard based. For example, type command-space "slash" enter and it fires off Safari opening /.
I'm not sure how Apple will improve on this.
The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.
Still, I see the twisted Microsoft logic, I'll have to go home and try this on OS X and see what happens.
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
What's up with apple and German tanks? First the Panther (http://www.achtungpanzer.com/pz4.htm#panther) and now the Tiger (http://www.achtungpanzer.com/tigerp.htm). What's next, the Leopard? When apple releases Mac OS 1x.x Leopard II, then I'm buying a Macintosh!
> The brain behind Spotlight is Dominic Giampaolo, the same guru that wrote the fantastic BeFS for BeOS.
Which explains why it's tied to the filesystem rather than using a general hook at the vnode layer to allow the same functionality to be implemented regardless of the filesystem in use. Having the filesystem support it would make it more efficient on HFS+ but it should be possible on UFS, ISO 9660 CDs, or even over NFS or SMB.
In fact, the way it's described... with one metadata store per filesystem rather than per file, and user-level metadata provided by applications... this is something that FreeBSD or Linux could implement right now, over any file system: all they would need would be a mechanism for the vnode layer to send messages to a usermode daemon that tracked inode operations (eg, creation, deletion, maybe mode changes or date changes, and renames) in a name-inode database (any database, including Postgres or MySQL) and updated any associated metadata in the background.
This could be done with negligible slowdown for file operations: the index can be updated asynchronously, because it can always be recreated in the background after a crash, so the vnode operation won't ever have to wait for the daemon to respond... and changes to the metadata are all in userspace.
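A minimal sketch of that asynchronous design, under stated assumptions: a queue stands in for the vnode-layer messages, a background thread stands in for the user-mode daemon, and a dict stands in for the name-inode database. None of this is how Spotlight or any real kernel does it; the class and the event names are invented for illustration.

```python
import queue
import threading

class AsyncIndexer:
    """Hypothetical user-mode daemon: file operations enqueue events and return at once."""

    def __init__(self):
        self.events = queue.Queue()
        self.index = {}  # inode -> path, a stand-in for a real database
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def notify(self, op, inode, path=None):
        """Called on the file-operation path; it only enqueues, never waits for the index."""
        self.events.put((op, inode, path))

    def _run(self):
        """Background loop that applies queued events to the index."""
        while True:
            op, inode, path = self.events.get()
            if op in ("create", "rename"):
                self.index[inode] = path
            elif op == "delete":
                self.index.pop(inode, None)
            self.events.task_done()

    def flush(self):
        """Block until every queued event has been applied (for tests or shutdown)."""
        self.events.join()
```

The file operation only pays for a queue insert. If the daemon dies, the index is stale rather than corrupt, and as the post says it can be rebuilt in the background.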
Yes, but my biggest concern is how they are going to handle mounted file systems. If they don't index them then Spotlight will not help my users much. If they do, when does the indexing happen and how often? And when you work with terabytes of data, how long is that going to take? Also, where does the database live: is it on the server where all the clients can use the data, or is it on a per-machine basis where all of my 200-some client machines will have to index all the network storage?
Well I guess we will just have to wait and see...
-S
It is said that a child learns wisdom from the parent,
but the truly wise parent learns joy from the child
Devon
What hasn't been mentioned is that smart folders will always keep your directories up to date. No more dragging and dropping files after I download them.
The question is will I be able to make smart folders based on permissions I give on my files so that I can share them on my network.
weo
#=-weo-=#
Quicksilver is a pretty nifty program, and I used to use it as a free alternative to Launchbar (which used to perform roughly the same tasks). Both programs learn what you want certain shortcuts to mean, and both use Command-Space to activate them. For me, entering 'FX' is Firefox, 'PS' is photoshop, and so on.
However Launchbar has since updated to a 4.0 beta release, and in doing so has pre-empted Spotlight, as it does (right now, in 10.3) index system-wide metadata. So now you can cue up songs by entering MP3 names, open any kind of file by entering keywords for filename or type, open websites, perform Google searches, Google image searches and so on.
It's worth trying out as an alternative to Quicksilver.
Uhm. Every file system in OSX is "mounted", so I'm not sure what your point is.
Also: we're talking about a desktop system here: how many desktops have terabytes of data: or were you just trolling?
As to the network scenario: why would you build a separate index on each machine covering the entire network? That would be arse-achingly stupid. Use the Spotlight database on the machine where the data is held. Pretty obvious really.
Bad analogies are like waxing a monkey with a rainbow.
If I recall correctly, MacOS X has a disk indexing service that defaults to 'on'.
I think XP has one but it defaults to 'off'.
I strongly suspect that may be part of the reason for the significant differences you're seeing in search times. It's a little like comparing using 'find / -name blah' on OS/X to 'locate' on Linux and using that to say that your Linux filesystem of choice is faster - it doesn't make much sense.
That said, these 'notification-triggered' indexers like Spotlight sound interesting, and are much nicer than the disk crawlers found on all major OSes at present. I'll be watching with interest to see where this and similar efforts go.
stat file.jpg
in linux. Would be nice in linux to beef up on metadata too.
I hope that Spotlight will also work if you have a Linux partition exported to the Mac via NFS. Will file information of NFS-mounted systems also be stored in the database?
Having linux and OS X working together is already now not without issues. If you have a file Test.jpg and test.jpg in your Linux partition and you copy both to the same place in OSX, the finder (on the mac) complains, because the two files are considered the same.
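If you want to find such clashes on the Linux side before copying, a rough check is easy to sketch. This assumes simple case folding is a good enough approximation of what the Mac filesystem considers "the same name" (the real comparison rules are more involved, e.g. Unicode normalization); the function name is made up.

```python
from collections import defaultdict

def case_collisions(names):
    """Group file names that differ only by letter case; return groups with more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Feeding it the output of os.listdir() for one directory shows which files would collide if copied into the same folder on the Mac.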
I think they were trying to prevent freezing whilst searching through a massive data files or archives and the other such dark web type things.
Their mistake wasn't creating different filters for the searches, but that there was no way to set a default.
liqbase
Check out Mor Naaman at Stanford, who is working on adding GPS metadata to photographs. Once he has the GPS coordinates he uses them to get information such as time of day, lighting, weather, elevation, temperature, etc... This allows you to create metadata searches for "All early morning images in clear weather in Las Vegas", etc...
You can try the system out here with a collection of almost 4k images.
Apple has had this type of search engine before, they called it V Twin and it was a basic part of Copland. This is what Sherlock used in Classic and why it was so fast. The idea is even older, it's from a conceptual computer interface Apple dubbed the Knowledge Navigator. All this appears to be is V Twin running on SQLite instead of a proprietary method.
The interesting part to me is the focus on metadata. I loved this feature in BFS that metadata was king. This is going to lead the way to better file management. Hopefully the Finder will integrate it.
Not since Marie-Antoinette played milkmaid has looking simple and honest been so fake and complicated.
This article provides a little bit more to the discussion:
http://www.appleinsider.com/article.php?id=733/
I think what will be most interesting is that you can save your query for use anytime. All of a sudden it becomes easier to manage projects that are constantly being updated.
it won't work properly in Longhorn and you have to wait three more OS revisions for Microsoft to get it right.
Sounds funny, but it's more likely to be true.
That's just it - a desktop system can mount a drive/share that's 200GB, 500GB, 1TB.
So the question is - after you've mounted the drive, does the indexer wait till a low-usage CPU state and start indexing?
Does it drop the index in a .dot file at the top of the mounted drive/share's directory tree that is world read/writeable?
Assume the directory tree is such that it's
/Volumes/bigfileshare
../johns stuff
../don's stuff
If both subdirs are chmod 700, and Don does a Spotlight search, will John's stuff show up? (I assume that the file system would prevent an indexer process, when John's logged in, from indexing Don's data).
However, if the metadata index is shared (ie the .dot file dropping) - does it still adhere to the file system's privileges when showing results?
And what if I'm logged in as root on the machine and do a Spotlight search - will that turn up hits to either user's data when root can't really read the FS of either user's directories?
http://slashdot.org/~tf23/journal
Charmed.. I'm sure. If you weren't being so idiotically quick to answer, and so stupidly quick to put people down, I was replying to the comment above, not stating that there wasn't a solution to the overall issue.
"Joy is not in things; it is in us." Richard Wagner
From now on, I think every digital camera should have GPS interface.
I could always keep a GPS receiver on me, and then update the EXIF data from the GPS data with some clever programming... (hmm... a project is born)
J.
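The "clever programming" part of the GPS idea above mostly comes down to matching each photo's timestamp to the nearest point in the GPS track log. Here is a sketch of just that step, assuming the track is already parsed into (unix_time, lat, lon) tuples sorted by time and that both clocks agree; the function name and the 60 second cutoff are arbitrary choices, and writing the result into the EXIF data is left out.

```python
import bisect

def nearest_fix(track, photo_time, max_gap=60):
    """Return (lat, lon) of the track point closest in time to the photo, or None.

    track: list of (unix_time, lat, lon) tuples sorted by time.
    max_gap: largest acceptable time difference in seconds.
    """
    times = [t for t, _lat, _lon in track]
    i = bisect.bisect_left(times, photo_time)
    # Only the points just before and just after the photo time can be the closest.
    candidates = track[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    t, lat, lon = min(candidates, key=lambda fix: abs(fix[0] - photo_time))
    return (lat, lon) if abs(t - photo_time) <= max_gap else None
```

Photos taken while the receiver had no fix come back as None rather than getting a wildly wrong location.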
I would believe that the indexing is handled as a server on the machine that hosts the files. This would be a logical choice. The user searches on his machine, his Spotlight server sends a request to the Spotlight of the mounted file system, passing credentials (as would have to be done in a local search), and presents the results.
For removable media (like a removable FireWire drive), it would probably have to be indexed on a single host and rescanned if it has been written to by someone else.
Or maybe they would just ignore removable media.
it handles pdfs! Yippee!
-- I speak only for myself
# find / -type f > /biglist
# chmod 644 /biglist
# grep -iE '\.(txt|html?)$' /biglist | tr '\n' '\0' | xargs -0 grep -il "mach is not bsd" | less
Try Google Desktop Search, and see how completely the resource consumption problem has been solved in 2004.
Documents are indexed as files are saved. The performance hit is during document saving. There is no need for "background indexing".
Apps need to be made "Spotlight-aware" in order to invoke the Spotlight indexing on save.
> Already the differences in Fat32/NTFS versus HFS+ (the mac filesystem) yield significantly faster searches before spotlight is introduced. Sit down on an OSX apple and notice that an entire search of the HD is actually a fast operation, not the waiting many-minute exercise that it is on windows.
Your experience is different from mine. Full text search from Mac OS X finder is a very slow operation with a painfully bad user interface. On the contrary, full text search a subfolder in W2K is much easier.
> Now since spotlight is built into the core of the system, and isn't just a tack-on service like the windows indexer is, there are significant speed advantages, updating the SQL database when files are modified, added, etc is incredibly light on the CPU, and is equivalent to doing something like changing the file name.
This is a ridiculous claim. Adding an email to a 1GB file will require re-parsing the whole file. And comparing a SQL transaction involving hundreds of inserts to a file name change only shows your lack of understanding.
> Having an up-to-date catalogue without the CPU strain is a must have...
Agreed. But I am a bit afraid of the side effects. Like compilations that copy around thousands of header files. Xcode is already painfully slow...
Spotlight is easy to do in the simple case. Even the plugins to extract content are very similar to NeXT Librarian.app filters (but those were not real time).
The hard cases involve duplication of file hierarchies, generation of temporary indexable documents, and handling of server-mounted volumes and removable media.
I've tried Spotlight and suggest that when it comes out, every time you step away from your computer make sure to lock your screen. All someone has to do is type 'porn' into the little search toolbar and within seconds it's all nicely listed.
Perhaps Apple needs to add a feature to turn off indexing for certain directories.
And without a proper search tool, how is it, exactly, that we're supposed to keep track of our fellow insurgents, plans for sneaky attacks, plots to undermine the "powers that be" and means of crippling the status quo?
;)
A disorganized revolution is just a waste of time.
Keeping one's data organized is a priority, bucko.
#SickNotWeak
Who did I put down?
:)
You clearly stated in your post that you had come across the problem we were discussing.
I'll try not to help out folks in future
liqbase
Does anyone know how they are going to deal with security? Will the indexed information inherit the same security attributes as the underlying files? Do the indexers run as root?
How well this system works will in part depend upon how many data format plug-ins are provided. For example, take something like the SID audio format. It's relatively unknown, but has an officially registered MIME type with IANA giving it a status above many other file format types, and it is used to provide background sounds on some web sites. Will it make the cut?
This is just one file format chosen at random. There are thousands out there, some of which are used pretty heavily for documentation in certain circles. How about all of the OpenOffice file formats, or the AbiWord format?
I can see this feature being hugely useful if Apple does a good job of providing plug-ins, and making it easy for third-parties to add more.
"...how often does one do a blind search of the whole system anyway?" well, you've got to realize that this will be the big topic for the next boring couple of years. even google, not to mention apple, MS and every 'nix flavor are working on solutions. managing your information.
;)
i have to admit that i have crapola all over my harddrive that i will never go back to -- the files just keep getting buried and copied over to my newest computer. even if spotlight is kinda flawed, engineers have to start looking for better ways to manage information.
and besides, it gives MS something to do besides f-ing up browser standards
No, apps do not need to be made "spotlight-aware." The indexing happens whenever the kernel notes a file write; even command-line changes to files will cause the index to be updated.
properly implemented global meta-data store (that works at the filesystem level, not as an iterative service)
This is an iterative service, with some hooks into various libraries so that it can capture disk writes and update the db. Ergo, there are potential concurrency issues.
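To make the "capture disk writes and update the db" idea concrete, here is a minimal Python sketch of an index that is updated one file at a time, per write event, instead of by a periodic crawl. The names `IncrementalIndex` and `on_file_written` are hypothetical and are not Spotlight's API; how Tiger actually receives write notifications is not described in this thread.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index updated per write event (hypothetical names)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of paths containing it
        self.words_by_path = {}           # path -> words seen at last index

    def on_file_written(self, path, text):
        """Assumed to be called whenever a write to `path` is noticed."""
        new_words = set(text.lower().split())
        old_words = self.words_by_path.get(path, set())
        # Drop postings for words that disappeared, add the new ones.
        for word in old_words - new_words:
            self.postings[word].discard(path)
        for word in new_words - old_words:
            self.postings[word].add(path)
        self.words_by_path[path] = new_words

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


index = IncrementalIndex()
index.on_file_written("/tmp/a.txt", "dog walks home")
index.on_file_written("/tmp/b.txt", "cat sleeps home")
index.on_file_written("/tmp/a.txt", "dog runs")  # re-save: only a.txt is touched
```

In this sketch, two write events arriving at the same time would need a lock around `on_file_written`, which is the kind of concurrency issue raised above.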
I wonder if Dominic has been sold on separating the indexing from the filesystem, or if this was the design that emerged after a need to keep HFS+ around for a while.
KDE has had a pluggable metadata framework for years -- KFileMetaInfo. There's also a similar layer in GNOME VFS if I'm not mistaken.
They haven't yet been used for indexing in anything that's presently released, but that's coming.
The kind of metadata that was almost deprecated by Apple isn't quite the same thing as the "modern" concept of metadata. The classical HFS metadata covered concepts like file type, file creator, and "Finder bits" that aren't handled at the file system level in other OSes. This, combined with the Mac OS's historical use of resource forks for storing developer-defined data records, made preserving such data difficult or impossible in heterogeneous environments like the Internet. It's really a shame; I've always thought this concept was the most elegant attempt to solve the problem of "rich data" associated with data files without requiring the data in the file itself to have some form of universal container format.
The metadata concept used by Spotlight is going to be based in part on a plug-in system that allows the Mac OS to reconstruct metadata information from the data within files themselves, rather than just using the metadata facilities provided by HFS and Mac OS resource forks. That means that each different kind of file, from Word documents to PDFs to PostScript jobs, needs its own special kind of processing to read its own format of storing such data. It's less elegant and more processor intensive than just using the historical HFS system, but it's more likely to be useful for extracting metadata from files provided by Windows and other Unix variant users.
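The per-file-type processing described above can be pictured as a registry of importers keyed by file type. The following Python sketch is only an illustration under that assumption: `IMPORTERS`, `importer`, and `extract_metadata` are made-up names, the two toy formats stand in for real ones like PDF or Word, and nothing here reflects Apple's actual plug-in interface.

```python
import os

IMPORTERS = {}  # file extension -> function(bytes) -> dict of metadata


def importer(extension):
    """Decorator that registers a (hypothetical) importer for one extension."""
    def register(func):
        IMPORTERS[extension] = func
        return func
    return register


@importer(".txt")
def import_text(data):
    text = data.decode("utf-8", errors="replace")
    return {"kind": "text", "word_count": len(text.split())}


@importer(".csv")
def import_csv(data):
    lines = data.decode("utf-8", errors="replace").splitlines()
    columns = lines[0].split(",") if lines else []
    return {"kind": "table", "columns": columns,
            "row_count": max(len(lines) - 1, 0)}


def extract_metadata(filename, data):
    """Dispatch to the importer registered for the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    handler = IMPORTERS.get(ext)
    if handler is None:
        # No plug-in for this format: nothing can be read from the contents.
        return {"kind": "unknown"}
    return handler(data)
```

A format with no registered importer simply yields no content metadata here, which is why the number of available plug-ins matters so much.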
Those who complain about affect & effect on
I'm not convinced yet apple is going to get Spotlight right, i.e. truly revolutionary. It has potential (smart Finder folders are on the right path) but at the moment, it seems they are more interested in simply trying to duplicate Quicksilver/Launchbar technology, which is the wrong way to do this.
I'm tired of apple ripping off ideas from developers without (A) giving them credit or (B) developing something equivalent so the new is at least as feature-full as the old. Based on apple's history, the first version of Spotlight will likely be a horribly dumbed down version of Launchbar in terms of tech, since apple is obsessed with "ease of use": i.e. a three year old has to be able to work it.
Rant aside, there are a few key pieces I think apple is missing:
(1) User-created metadata. I should be able to tag anything I want with any metadata I want so the organization system follows ME and MY preferences, instead of the system determining it for me. Apple should be thinking about taking the insanely wonderful metadata system they created in iTunes and applying that to the finder. It is essential you be able to tag metadata in, because you don't always access the same objects for the same purposes.
(2) Flexible file system. This is a concept I've developed which basically says that the file system should be dynamic and adaptable to match the thought flow of the user (only possible with a good metadata file system). If you've ever seen this app on the PC, think: "The Brain". What that means is that if apple does #(2) right, it should be easy as hell to tag things, and then basically I can create relationships which let me "flow" through my files by navigating CONCEPTS instead of folder hierarchy. A good app that does this is Devonthink. Devonthink will grab the contents out of your files, and when you do a search, you can not only see your search term but "related" search terms. Click on a new search term and you get a new listing. So as you come up with ideas about what you want to do, you can easily and naturally branch off into other parts of your file system. This methodology models the way the human brain actually works: thinking in concepts and spatial organization, rather than structure. (The "flexible" comes because the system takes your tags and adapts the search around them, allowing you to change how the "flow" works, depending upon what topics are most important to you.)
(3) The next level after metadata search is a new way of visually interpreting the metadata and the relationships between them. Which means a NEW FINDER. I can't believe Steve actually threw this comment out after demoing Spotlight: "With this, you probably won't even need to use the finder any more." Well then why even have the Finder at all, Steve?! There IS a reason for the finder, which is why it's stayed around all these years, and that is that people think SPATIALLY. People are creatures of habit, and one way we remember where things are is if we know where to look for them and they're always in the same place. Which means there needs to be a visual grounding to the above dynamic file system, to give people a sure footing in all of this. I'm talking about things like a window that always stays in the same spot and always performs the same task, like showing you what new files have been added to the system, or actively updating your list of Word documents wherever they are. Right now in the finder, a window is a window is a window. That shouldn't be. If a search is applied to a window, then that window isn't just showing you files, it's performing an active function. The finder needs to evolve to take on the new roles and responsibilities it should have in the context of a metadata file system. Spotlight shouldn't replace the finder: the two should work together seamlessly.
The good news is that Spotlight is built into the system, so even if apple screws up the implementation (likely), the next generation of 3rd party apps will hopefully be able to fill in the gaps.
Filesystem metadata is great, but "instantly" updated search indexes sounds like a solution to a problem that doesn't really exist.
Doesn't exist *for you* perhaps. Perhaps you don't have a lot of user data, or you have taken time to sort it into useful folders. I'd say it's about as useful as the incremental search in iTunes is. Sure I could remember what artist did a track, and access a track by scrolling down to that artist, then finding the track. Or I could scroll down the list of thousands of track names, remembering my alphabet ordering, and locate the track that way. Assuming I've remembered the exact wording of the track name. But I've always found it easier to type whatever word comes to mind first from artist or track into the search box.
And so it is with documents. Even if I do remember the file name and folder that a particular piece of information is stored in, I still need to navigate there. Most times it will be quicker just to type in whatever it is you remember about the data you want into a search box - even if you know where the data is stored.
The drawback of putting the metadata inside the FS / letting the FS handle it ... is: interoperability with other FS's and/or other computers/users.
It's all fine and dandy, if you copy from one HFS+ FS to another, or send an Apple .dmg file to another Apple user (this is btw, how Apple seems to have chosen to deal with the problem... but it only works for Apple-to-Apple users, as IIRC there isn't an open standard implementation...). But say you want to send that .txt file of yours to me, you're on Mac OS X and I'm on Linux ... you read the file off the FS while you leave the metadata on the FS, and then you send it to me. So all the metadata you've had about that file stored in your FS stays there, while I only get the content of the file and know when I received it from you.
So unless the metadata is stored inside the files, like many file types have started to do, and everyone has adopted the same standard - we are going to hit a snag. I mean 85-90% of the world doesn't do 'Apple's implementation' ... they do Zip/tar.gz/Tar.bz2/etc..etc.. and IIRC none of them stores the resource forks/metadata and whatever else Apple does store in the .dmg file that is stored in their HFS+ FS...
Put in simple terms, metadata-in-FS implementation(s) aren't FS agnostic - while a 'metadata in file' approach is.
I don't claim I know more than I know, and if you know you know more than I know, then by all means, let me know.
This sounds like an attempt by Apple to do on HFS what they've done for years on the Newton, er, *did* for years on the Newton. On the Newt there is no file system: there's only a database system, and each application maintains its own database of entries. When you issue a search, the operating system queries each of the applications in turn, asking them to search their entries in an appropriate fashion looking for a particular string or whatnot. Then it assembles the entries and the user can choose them and launch the application opening the entry. Nice.
On the Mac, that'd be expensive. Querying all the apps means running the apps. So instead Apple has lightweight app proxies (the "plugins") which provide metadata information rather than directly searching the files. Blah.
Whether this means that non HFS+ volumes won't be searchable with Spotlight, or that they get a slow non-indexed search I don't know. I'd suspect the former.
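For anyone who never used a Newton, the search model described in this comment (the OS asks each application's store in turn, then assembles the hits) looks roughly like the Python sketch below. `AppStore` and `system_search` are invented names for illustration only; they are not Newton OS or Spotlight interfaces.

```python
class AppStore:
    """One application's private database of entries (hypothetical)."""

    def __init__(self, app_name, entries):
        self.app_name = app_name
        self.entries = entries

    def find(self, text):
        # Each app searches its own entries "in an appropriate fashion";
        # here that is just a case-insensitive substring match.
        needle = text.lower()
        return [entry for entry in self.entries if needle in entry.lower()]


def system_search(stores, text):
    """Query every store in turn and assemble (app, entry) results."""
    results = []
    for store in stores:
        for entry in store.find(text):
            results.append((store.app_name, entry))
    return results


stores = [
    AppStore("Notes", ["Buy dog food", "Call the bank"]),
    AppStore("Names", ["Dogbert", "Alice"]),
]
hits = system_search(stores, "dog")
```

The cost the comment points out is visible in the loop: every store has to be live and answer the query, which is cheap on a Newton and expensive if each "store" is a full desktop application.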
Linux has Dashboard for background-search metadata relations among apps - data autointegration.
--
make install -not war
I'm very fuzzy on the details, but I know that Apple played a leadership role, back in the mid-90s, in lobbying the FCC for the radio spectrum allocations for what we now call WiFi.
Either until you code it up, or you buy a Mac next year?
GPL Deconstructed
"* WYSIWYG publishing with a laser printer
Xerox invented that one too.
Xerox invented laser printing, Apple invented Publishing."
To nitpick, Adobe with postscript made that possible.
Perhaps then we should ask: Why doesn't anybody listen to Hans Reiser?
What's the answer? I don't know it, but your statement certainly makes me wonder, especially since I'm familiar with his good work.
Simpy
I was recently
reading
There was a great piece of software that did this back in 1990 called OnLocation, from On Software. It didn't use file system meta-data, but did index your entire hard disk, supported plugins for various file types, and was very fast.
Shareware CD-ROMs used to come with the OnLocation index files pre-installed, which was a pretty nice way to find something on a slow 1x CD drive loaded with shareware.
Everything that's old...is new again.
How do I turn off that damn magnifying glass icon that takes up like 50 pixels of the menu bar, 2/3 of which is gratuitous white space? I didn't use Sherlock, I didn't use Find By Content, and I don't expect that I'll want to use this, either. I installed the 10.4 beta a few months ago and couldn't find any obvious way to turn it off.
--
"Open source is good." - Steve Jobs
"Open source is evil." - Microsoft
OK, one question here. I've had a Mac OS X system since early last year. One of the few things that I still consider better in the MS world are the keyboard shortcuts. In MS Win, (almost) everything in a standard Windows dialog can be reached without a mouse. Example: flipping through tabs with CTRL-TAB. I still haven't figured out how to do this in OS X.
Also, the keyboard shortcuts for typing are more consistent in MS Win. Example: CTRL-Forward Arrow brings you forward one word. On the Mac, though, these shortcuts are not consistent, which is weird. And in many applications, HOME, END, PGUP and PGDN don't work.
Are there any settings to modify this stuff system-wide?
The technologies are barely related; Apple is not ripping off QS/LB in the least here. Spotlight is a technology for searching through files based on their content and metadata. QS/LB are utilities for finding files based on easily typed mnemonics. You are looking at one aspect of Spotlight's appearance (the dropdown search pane in the corner) and assuming it's a ripoff based on some similarity to the appearance of the other utilities.
In fact, the Spotlight indexing technology will be a boon to the utilities, as they will be able to leverage this newly available metadata to execute even more powerful searches. Quicksilver is already invaluable to me, and I expect it to just get better.
It would be great if Reiser4 mounted from OS X could use the same spotlight plugins to provide metadata to Lucene, some kind of bridge...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I think then there needs to be a new app created called Blacklight that installs a plugin which undoes any indexing done by other plugins...
Thus you could hide common terms from search results, perhaps even only for specific directories. Actually that might be nice for non-nefarious uses, as you wouldn't necessarily want it pulling up stuff in whole trees of the filesystem like system files.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I've been really excited about Tiger since I saw the live QT feed of the WWDC 2004 Keynote. Spotlight provides comprehensive system-wide searching as quickly as iTunes searches your music, Automator offers easy scripting (it lets you automate common tasks), and Core Image does some amazing things with video and pictures. (To see Tiger on the linked video skip to about half way through.)
Well we mount all SMB for cross platform work with our window users. So that's too bad spotlight will not work in our current set up. Maybe that will change down the road.
-S
It is said that a child learns wisdom from the parent,
but the truly wise parent learns joy from the child
Sorry I should have said networked mounted file system. Hadn't had enough coffee before I posted.
My basic concern is that when you mount the file system it will start scanning and you could get a big hit in performance when mounting a lot of very big network file systems.
-S
It is said that a child learns wisdom from the parent,
but the truly wise parent learns joy from the child
True, it is iterative, but it's not confined to being iterative like most everything else. In most people's experience, building metadata dbs is either done by a daemon that runs once a day (like the Windows Indexer), or a startup time crawler (like iTunes). Either way, it requires an additional process that doesn't monitor in real time, meaning it's not possible to do the sorts of really cool queries BeOS did.
+--------------------- You idiot! I told you we were facing the wrong way!
Yes. System Preferences > Keyboard & Mouse > Keyboard Shortcuts. Check the box marked "Turn on full keyboard access".
This allows you to tab between gui elements. Ctrl-F2 activates the menus for keyboard access. And you can edit shortcuts for every application you have.
Veteran, Bermuda Triangle Expeditionary Force, 1992-1951
"As someone replied earlier, this is a new paradigm in app management: the top menu controls the application, and the window menu controls the window."
Actually, this behavior is not a new paradigm as it has been a feature of the Mac OS back before it was Mac OS -- all the way back to The Beginning.
There are a few reasons for this behavior, but the most important one is that in good UI design, each widget should serve a clear purpose. On a Mac, the "close window" widget closes windows and that's it (unless the app has only one possible window). Aside from making the app appear to "launch" faster, it's a cleaner UI implementation that leaves little room for ambiguity. Plus, let's say you are downloading a big file in your web browser but don't want the display or the Dock cluttered with windows. On the Mac, you can close all the windows but still not quit the browser and keep the download active and out of sight.
Per Square Mile, a blog about density
So now Apple's given my wife a way to INSTANTLY find all my porn.
I guess I now have to go back to a "download as needed then delete" paradigm.
Sheesh, I wish they'd think these things through.
However, from what I've seen, that's not the sort of thing Spotlight is about. The plugins we're talking about make use of intrinsic metadata - information extracted from the datastream itself. Many common file types include some descriptive information: EXIF data in pictures, MP3 tags in audio files, meta tags in HTML files, and so on. Spotlight is a way of extracting and using that data.
The practical differences include, OTTOMH:
- Spotlight's information won't be lost when files get stored on other file systems, sent over email, processed on other platforms, &c.
- Spotlight uses information that's already in the files - you won't have to set it up manually.
- You can use existing tools to see and edit the metadata - MP3 taggers, photo editors, whatever. And you can do so on any machine and OS.
This is probably one of those rare cases when that foul word 'leverage' might be appropriate -- Spotlight should allow you to make much better use of an existing resource. As such, it sounds like a jolly neat idea!
Ceterum censeo subscriptionem esse delendam.
There's been reference from the beginning of the computer revolution to this solution we've all been waiting for... and credit to evolutionary steps taken by apps such as Quicksilver, Launchbar, BeOS, etc... but one application that predates them AND most closely matches the feature set is:
Simson Garfinkel's "Sbook.app" from NeXT in the 90's.
Sbook.app's ability to add tokens to a flat file for instantaneous searches let people apply it outside the address-book role it was originally designed for.
Abstracting its functionality and interoperating at the kernel level is pure Apple polish on the brand. Until people start using "Spotlight", the verdict will be out on adoption across the platform.
I will venture it will be one of the defining characteristics of the Mac platform into the future.
You don't have to click on the application to quit. Even when the application is in the background, you can right-click on the dock icon and choose Quit. Click-and-hold does the same thing on a one-button mouse.
Of course, this metadata will be so much cooler when something like spotlight is there to take advantage of it...
Die Menschen verhoehnen was sie nicht verstehen. -- Goethe.
One can make the argument that using alternating genders for examples is more inclusive. If you look at usage on the Apple pages, you'll find that examples alternate between he and she. One finds arguments all over the place about why aren't there more women in tech/blogs/games, etc. Perhaps it is because the language rejects them. While it was certainly proper grammar 50 years ago to use 'he' as the collective third person singular, there was a movement in the 70's to figure out how to correct this issue. Using 'they' is not correct. It is a matter of style, not grammar. Plus, it made you ask a question. Why did you have such a severe reaction to it? Maybe you could take a look at that and wonder how you would feel if the entire language used "she" and "her" as the collective third person singular. Would you feel included in the discussion? Now, put yourself in a woman's shoes and wonder how much women feel included in the discussion when "he" and "him" are used.
Man, you ever heard of Compuserve? Apple "borrowed" much of its technology from other companies, as do other companies in this industry. And I'm the proud owner of four G4's. I'm just not blinded by platform religion/zealotry.
Is this going to do exactly what the Google Desktop Search for OS X would do?
I heard that Google was supposed to be developing that software, but now that this will be part of the OS, and probably implemented more elegantly, is Google going to abandon the idea?
What does everyone think?
Those who know, do not speak. Those who speak, do not know. ~Lao Tzu
How is it that iTunes isn't real-time? I know that if I change some metadata using it, the song file gets updated immediately. Do you mean that it doesn't update if you already have iTunes open and then edit the tags with some other program?
If so, why is that a problem? I haven't seen any program that's so good it beats iTunes (+ Applescript), so I would just use that and not have a problem. It's like at the doctor: "Doc, it hurts when I do this!" Doc: "Well, don't do that anymore, then!"
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Does anyone know how this will work with Backups/Restores? OS X backup programs have enough problems with resource files, let alone this additional data.
Also, how about remote file systems (NFS for example)? Resource files are mapped as regular files with a ._ prefix. Will the metadata be usable on an NFS-mounted filesystem?
not bad at all unless you think that results-as-you-type is a bad search engine.
OS X's current search is much faster than Windows XP's search.
I am the Alpha and the Omega-3
but it will not be flawed
I am the Alpha and the Omega-3
- Case sensitive
- 7-bit ASCII only
- Negligible metadata storage
Spotlight:
- Case insensitive, AFAIK (HFS is)
- Unicode
- Arbitrarily large metadata storage
See the problem with your logic?
Check out my sci-fi/humor trilogy at PatriotsBooks.
he was speculating. that guy does not know crap about what spotlight will be able to do.
I am the Alpha and the Omega-3
I bought a G3 iMac DV for $150 on EBay, added some RAM, and fired up Panther. It actually worked pretty good, but would chug on my 6 MP digital photos.
A week later, I bought an eMac refurb for only $550 from Apple.com (I sold my PC, as well as the iMac on EBay for a profit.) I bought a DVD burner for $50, and a 120 GB hard drive for $60. My PC was kinda old and worked fine for me, so it doesn't matter that it's not a G5. Later I'm going to overclock it to 1.6 Ghz to get a little peppier performance anyway.
Sure search engines are killer apps for the Internet but that's because the web is intrinsically disorganised and distributed.
Is search really so relevant for a single computer and the average desktop user? Most people already organise their files in a somewhat structured way, and generally know where to find stuff. (Especially if they use OS X)
Sure powerful file search might be useful occasionally, but i don't see it as a huge issue that companies like M$ think it is.
One more thing to note about the Spotlight Store: There is one content index and one meta-data store per file system. This keeps the content indexes and meta-data stores with the files they belong to--crucial when using external FireWire drives that travel from Mac to Mac.
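A rough way to picture "one store per file system" is to route each path to the store belonging to the volume it lives on, so that the records travel with the drive. The Python below is only a sketch under that assumption: `VolumeStore`, `store_for`, and `add_record` are hypothetical names, and the mount points are example values, not anything taken from Apple's documentation.

```python
class VolumeStore:
    """Metadata records kept with one mounted volume (hypothetical)."""

    def __init__(self, mount_point):
        self.mount_point = mount_point
        self.records = {}  # path relative to the volume -> metadata dict


def store_for(path, stores):
    """Pick the store whose mount point is the longest prefix of `path`."""
    candidates = [
        s for s in stores
        if path.startswith(s.mount_point.rstrip("/") + "/")
    ]
    return max(candidates, key=lambda s: len(s.mount_point), default=None)


def add_record(path, metadata, stores):
    """File a record in the owning volume's store, keyed by relative path."""
    store = store_for(path, stores)
    if store is None:
        raise ValueError("no mounted volume contains " + path)
    relative = path[len(store.mount_point.rstrip("/")):]
    store.records[relative] = metadata
    return store


boot = VolumeStore("/")
external = VolumeStore("/Volumes/FireWire")
stores = [boot, external]

add_record("/Volumes/FireWire/photos/cat.jpg", {"width": 800}, stores)
add_record("/Users/me/notes.txt", {"word_count": 12}, stores)
```

Because the external drive's records are keyed by paths relative to its own root, they would still make sense if the drive were mounted somewhere else on another machine.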
On Windows, it was only a matter of time after including the "continually chew up CPU time indexing the hard disk" in Windows 95 that Microsoft was forced to add in a menu item to turn that off. It just consumed too many I/O resources and CPU cycles to continually update the system search feature.
Adding more "disable the semi-broken foobaz feature" checkboxes is one approach. Or they could have fixed it. (Based on what I've heard, I'm guessing that Apple made it work.)
Yes, the searches were blazingly fast when executed, and they were excruciatingly slow when the indexing was removed. However, how often does one do a blind search of the whole system anyway?
Not very often, if they're excruciatingly slow.
("Why would anybody ever need a car? How often does one need to travel 20 miles in half an hour, anyway?")
If you have a mac with a ton of files, various "Previous System Folders" etc...follow along :)
I have smart folders for pdfs, avis, mpgs, and wmvs
I have these sorts of files *all over the place*...movie clips, test files, you name it.
I go to the finder, "open" the Windows Media Files folder, and they are all "there"
Or all the "archive" files (zip, rar, sit/sitx etc) i've collected and not erased in the last year...
or all of the emails i've received from japanese users...
it goes on and on.
To me, it's like the whole Star Trek "Computer..find all of the blah blah blah for sector Whatever"
It concentrates on the "what you want" as opposed to the current paradigm of where did i put it/what app did i use, etc
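Conceptually, a smart folder like the ones described above is a saved query rather than a place on disk. Here is a small Python sketch of that idea; `SmartFolder`, the `FILES` catalogue, and its paths are all made up for illustration and say nothing about how Tiger stores or evaluates smart folders.

```python
FILES = [  # stand-in for the system-wide catalogue (example data)
    {"path": "/Users/me/Movies/demo.wmv", "type": "wmv"},
    {"path": "/Previous System Folder/manual.pdf", "type": "pdf"},
    {"path": "/Users/me/Desktop/clip.avi", "type": "avi"},
    {"path": "/Users/me/Downloads/old.zip", "type": "zip"},
]


class SmartFolder:
    """A named, saved query over file metadata (hypothetical)."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def contents(self, catalogue):
        # Evaluated on every "open", so newly added matches just show up.
        return [f["path"] for f in catalogue if self.predicate(f)]


windows_media = SmartFolder("Windows Media Files",
                            lambda f: f["type"] == "wmv")
archives = SmartFolder("Archives",
                       lambda f: f["type"] in {"zip", "rar", "sit", "sitx"})
```

Opening `windows_media` lists every .wmv regardless of which directory it sits in, which is the "what you want" versus "where did I put it" distinction.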
Isn't Mac OS X 10.4 kind of redundant? I mean, isn't "X" the roman numeral for 10? That's what I thought it meant, at least. Perhaps X.4 would do the trick?
Real programmers can write assembly code in any language. -- Larry Wall
What are the chances that using the name Spotlight is in conflict with Quest Software's registered trademark here
Blame application developers who think they don't need to follow conventions. By convention, with scrolling text documents:
Command+left is supposed to move to the start of a line. Command+right is supposed to move to the end of a line. Command+up is supposed to move to the start of the document. Command+down is supposed to move to the end of the document.
Option+left is supposed to move to the start of a word, or back one word if you're already at the start. Option+right is supposed to move forward one word. Option+up is supposed to move you to the start of the screen, or up one screen if you're already at the start. Option+down is supposed to move you to the bottom of the screen, or down one screen if you're already at the bottom.
Add shift to any of these to cause the selection to be extended.
Home is supposed to scroll to the top, but not move the insertion point. End is supposed to scroll to the bottom, but not move the insertion point. Page up and page down are supposed to scroll one page up and down, respectively, but not move the insertion point. Recently, I've noticed a trend to act like Windows shortcuts if a modifier is down, which seems like a good idea as long as the modifier isn't the shift key, but that doesn't seem consistent yet.
Control+arrow keys aren't standard, but the best use I've seen for control+left and control+right is sub-word navigation. The best use I've seen for control+up and control+down is to scroll the document without moving the insertion point.
Apple's recap of this is not quite as complete, but more generalized. It's available on Apple's developer site. Not everyone follows it. It's worth noting that not everyone follows the Windows conventions on Windows, either. Control+Tab for tab navigation, for instance, is not automatic on Windows. It has to be added to applications.
Am I the only one who makes this connection?
It was a little bit annoying at this year's WWDC. I couldn't get the song out of my head...
It is the puppyfoot key.
Thank you.
Actually, apps do not need to be made spotlight aware - spotlight hooks into the filesystem and when a file has been modified/saved/created/copied, it will update the index appropriately.
oops, you are right... but for spotlight to index new file types, it needs a plug-in to know how to extract metadata from the file. so, in a way, spotlight may need to be made aware of your app.
It seems like unless you are actively creating relevant metadata for your files Spotlight could be pretty worthless.
Spotlight is not really like ReiserFS -- with Spotlight all metadata is automatically extracted from the file each time it's updated (using file type specific importers).
So you can't store arbitrary metadata for your files (like with ReiserFS and BeFS) -- Spotlight is a uniform way to access metadata already in the file format, e.g. you can query the width of an image, but you can't attach a location comment to your images if the file format doesn't already support it.
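As a concrete case of "metadata already in the file format", the width of a PNG image can be read straight from the file's bytes, because the PNG format puts width and height in the IHDR chunk right after the 8-byte signature. The Python sketch below shows that extraction; the function name `png_dimensions` is made up, and this is only an illustration of intrinsic metadata, not how any Spotlight importer is actually written.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) from a PNG's IHDR chunk, or None if not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Just enough of a PNG header to exercise the function (not a complete image).
header = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">II", 640, 480))
dimensions = png_dimensions(header)
```

A free-form "location comment", by contrast, has no such fixed slot to be read from, which is the limitation the comment above describes.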
It *is* going the way of the dodo! Siracusa's the ultimate dodo, and it's going his way!