Metadata in Vista Could Be Too Helpful
linumax writes "Windows Vista will improve search functionality on a PC by letting users tag files with metadata, but those tags could cause unwanted and embarrassing information disclosure, Gartner analysts have warned. Search and organization capabilities are among the primary features of Windows Vista, the successor to Windows XP due out late in 2006. While building those features, Microsoft is not paying enough attention to managing the descriptive information, or metadata, that users can add to files to make it easier to find and organize data on a PC, according to Gartner. 'This opens up the possibility of the inadvertent disclosure of this metadata to other users inside and outside of your organization,' Gartner analysts Michael Silver and Neil MacDonald wrote in a research note published on Thursday."
Isn't this like saying airbags are too safe? I thought the whole point of metadata is to make it easier to search and find data. How can it be *too* helpful?
The greatest experience we can have is the mysterious.
- Albert Einstein
is to make the metadata attached to document files viewable only on the Vista installation it was created on. Perhaps it would be possible to have the operating system strip the data off files that are being copied or moved to other network locations, as a precursor to each respective process. In that case, they would also have to work some kind of functionality into the next iteration of Outlook, so that the problem could be stemmed from the email side of things.
What 3rd party vendors would do to accommodate this is anyone's guess.
We never send any raw documents out to customers. We always print them to PDF first. Looking back, I wonder if there is still a chance private data could be leaked, if PDF somehow layers the hidden stuff underneath and someone could peel back the top.
But this will just be an extension to that policy to check for any meta data.
Even the much-vaunted Google Desktop had information disclosure issues.
As this type of technology comes to the mainstream, it's to be expected that the early stuff may have a bug or two (see: Google Desktop).
And here they are slamming Microsoft for a new feature people are asking for, and telling them how to do it, when they have no idea how hard this kind of thing is to do from a software engineering perspective.
I mean, sheesh, the product is in BETA. File a bug report with Microsoft as a beta tester if you find a bug.
I mean, Windows Vista has a lot of very new stuff under the hood, which is very cool. Much of it affects security and stability, which is a good thing.
-Nex6
For example, a user might use "good customers" and "bad customers" as keywords on contract files. If such a contract is sent to the customer with the keyword still attached, it could cause embarrassment or even loss of business, the analysts wrote.
Wait a minute... Since the tags in question are an OS feature, wouldn't the OS have to store them somewhere else in the filesystem, outside the file, since it can't know how to stuff them inside a file of an arbitrary format? And when you send someone a file, isn't it only the content of the file that is sent, along with the filename of course? Ergo, isn't it impossible to inadvertently send someone a file with Vista's tags still attached, since they're not in the file itself?
<slashdot-editor-mode> Does this mean that Gartner analysts are simply FUD-mongering without a clue? </slashdot-editor-mode>
The Mac OS (offering previews of the next Windows OS since 1984) already suffers from this problem and so far there are no graceful solutions. Namely, Spotlight gathers sensitive info in ways I wish it would not. To be specific, I deal with a lot of confidential e-mail that can include personnel problems of employees. At the same time it's got all my project info on it. When an employee comes to talk about a project I will often search for terms related to the project, or sometimes by the employee's name, in Spotlight while they sit around my screen. Spotlight pulls up the docs and the e-mails onto the same search results screen. Seeing titles of certain e-mails or possibly just the addresses can reveal confidential information or be embarrassing.
As a result I no longer have Spotlight index my e-mails. And of course that's a pain in the ass since it means Mail.app's search feature is busted. While I can figure out how to work around that (e.g. don't use Mail.app, which would be a pity), the story does not end there. Unfortunately, Spotlight indexes my backup volumes too, and it can blunder across old mail there and index it.
Now you might think I could also turn off indexing the backup volumes, but there's the rub. First, I might not want to. Second, you can't always do it. Spotlight has some bugs in how it handles logical partitions on disks, and in particular it sometimes ignores being told not to index a volume if another partition is being indexed.
Anyhow, eventually there will be more fine-grained control on privacy, but then the interface will become more kludgy too. In fact that may just kill the whole fine-grained control effort, since most folks don't worry about this sort of thing and would prefer simplicity.
It's perhaps worth noting that Windows dropped making the filesystem a database (for now). That might be a smart move, since making it a wrapper like Spotlight means they are less locked into a single search design. Problems like this will emerge slowly and flexibility to plug problems will be needed.
Some drink at the fountain of knowledge. Others just gargle.
Um... did you forget about that other option? Keep metadata specific to the computer. In fact, never have it directly attached to the file data. One simple way to visualize this: say you have a file access table. This table is essentially an array with one column for the file name, one column for its beginning sector, one column for file size, and now you just add another column for the start of the metadata and its size. Essentially, treat the metadata like a separate file that is pointed to by the real file's table entry. When you copy the file through the shell (including Explorer), the program doing the copying could go out of its way to copy the metadata with the file, but by default the metadata should not be moved with the original file, no matter what. Now when you upload a file, the browser or email client will by default not send the metadata. Pretty simple concept really; in fact I convoluted it quite a bit in the above explanation. Essentially, keep metadata local and unattached to the file. Just because you send me a picture doesn't mean I want the metadata to say "my children", because they are actually your children, not mine, and I'd have it say "my nephews and nieces" or something like that. Metadata is nothing more than the user's personal opinion and idea of what is in the file, so there is no need to send it around by default. I can understand why some companies would like metadata to stay (i.e. labeling documents as various customers and roles, etc...) but for once Microsoft should start off by making the smart and safe choice of defaulting to no, and let the damn company create a policy to allow metadata in certain files to persist.
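A rough sketch of that idea in Python, purely as an illustration: the `TagStore` class, its method names and the `keep_tags` flag are all made up here and say nothing about how Vista actually stores tags. The tags live in a separate per-machine table, and a copy only carries them along when explicitly asked to.

```python
import shutil
import tempfile
from pathlib import Path


class TagStore:
    """Hypothetical per-machine tag table, kept apart from file contents."""

    def __init__(self):
        self._tags = {}  # resolved path -> set of tags

    def tag(self, path, *tags):
        self._tags.setdefault(Path(path).resolve(), set()).update(tags)

    def tags(self, path):
        return set(self._tags.get(Path(path).resolve(), set()))

    def copy(self, src, dst, keep_tags=False):
        """Copy the file contents; carry tags only when explicitly asked."""
        shutil.copyfile(src, dst)
        if keep_tags:
            self.tag(dst, *self.tags(src))


def demo():
    store = TagStore()
    with tempfile.TemporaryDirectory() as tmp:
        contract = Path(tmp) / "contract.txt"
        contract.write_text("Terms and conditions")
        store.tag(contract, "bad customers")

        outgoing = Path(tmp) / "outgoing.txt"
        store.copy(contract, outgoing)  # default: tags stay behind

        archived = Path(tmp) / "archive.txt"
        store.copy(contract, archived, keep_tags=True)  # policy opt-in

        return store.tags(outgoing), store.tags(archived), outgoing.read_text()
```

Calling `demo()` copies the same contract twice: the default copy arrives with no tags, while the opt-in copy keeps "bad customers". The file contents are identical either way, since the tags were never inside the file.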
Regards,
Steve
How is that scary? It's just indexing data that is already on your computer. The fact that a file is "hidden" in a subdirectory 10 levels deep in an odd file format doesn't make it any more secure, just harder to find. Security by obscurity doesn't work. If a hacker has access to your machine, he can just as easily index your files from the outside as he can by looking in the Google Desktop index file.
The same goes for these Vista metatags of course. If you have a file called cc-num.txt and tag it with "This is an unencrypted text file containing all my credit card numbers and ATM passwords" the problem is not with the tag that makes it slightly easier for someone else to find the information, but the fact that you have such a file on your computer in the first place! If it's confidential information, then encrypt it. Thinking other people can't find a file because you don't index or tag it is only deceiving yourself.
Support Right To Repair Legislation.
Another problem with metadata is the generation of metadata. If people generated their own metadata they could control what goes into it. But the problem here is that you just don't do it normally. Plus, as documents change, get copied, get modified and so on, it gets out of sync unless you keep modifying it. The last thing most people would want is some rigorous change control protocol for every document and e-mail.
Which of course means automated metadata scraping. This leads to the problem of confidential info disclosure. That's obvious. But it also leads to another problem that's annoying. When do you update the metadata? When the file is created or modified? After a small lag? Or in batch overnight?
On Macs you can force a batch overnight search. But the default is for instant updates. If you add a search term to a document WHILE a search is being performed in another window, it will find it! Amazing, and very useful too. And it assures that things like computers that sleep at night and detachable drives stay indexed.
But it's also amazingly annoying when you stop doing conventional desktop activities and start doing more Unix-like things. Take for example untarring a 30 GB archive with twenty thousand small files in it, or something that is generating transient files in rapid-fire fashion. Well, you start untarring and for the first few files it zips along. Then suddenly throughput nosedives. Why? You look at your processes and you see MDL, the indexing program, chewing up your disk access.
You can work around this if you can control the file names and make sure they are ones it will not index. But that's not assured or always possible, and will vary from computer to computer.
So anyhow, there's lots of fine tuning needed on these ubiquitous metadata systems: fine-grained privacy control, and fine-grained operation modes so it's live in desktop application mode and lags in Unix/high-performance modes.
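To make the live-versus-lagged distinction concrete, here is a minimal Python sketch. `LaggedIndexer` and its `quiet_seconds` setting are invented for illustration and have nothing to do with how Spotlight is really built. With a quiet period of zero it indexes every change immediately; with a longer one it holds everything back until the burst of file activity has stopped.

```python
class LaggedIndexer:
    """Hypothetical indexer: live when quiet_seconds is 0, lagged otherwise."""

    def __init__(self, quiet_seconds=0.0):
        self.quiet_seconds = quiet_seconds
        self.pending = []        # changed paths not yet indexed
        self.indexed = []        # paths indexed so far, in order
        self.last_change = None  # time of the most recent change

    def file_changed(self, path, now):
        if path not in self.pending:
            self.pending.append(path)
        self.last_change = now
        if self.quiet_seconds == 0:
            self.flush()  # live (desktop) mode: index right away

    def tick(self, now):
        """Called periodically; index only once the disk has gone quiet."""
        if self.pending and now - self.last_change >= self.quiet_seconds:
            self.flush()

    def flush(self):
        self.indexed.extend(self.pending)
        self.pending.clear()
```

In the untar case, every extracted file pushes `last_change` forward, so a lagged indexer stays out of the way until the extraction is done and then indexes the whole batch in one go.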
Some drink at the fountain of knowledge. Others just gargle.
Gee, if anybody needs to be lectured about not storing metadata inline, it's the designers of Unix. Special files, directories with special names, using "From" as a message separator in mail files.
Unix stores what little metadata it natively supports in the inode, not in the file data blocks.
Special files have nothing to do with metadata, but with the Unix philosophy of "everything is a file", which works great and allows you to reduce the number of necessary system calls considerably.
I know no directories with special names. There are many names "by convention", but if I want I can have my ~tom in /var/weirdstuff/homedirs/tom instead of /home/tom, and neither /dev nor /etc, /proc, /sys are special because of their name. The name is by convention and does not carry metadata.
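As a small illustration of metadata living in the inode rather than in the data blocks, here is a Python sketch that assumes a Unix-like system. The helper names `inode_metadata` and `demo` are made up for this example; `os.stat` and the `stat` module are standard library.

```python
import os
import stat
import tempfile


def inode_metadata(path):
    """Return a few fields the kernel keeps about a file, outside its data."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,
        "size": info.st_size,
        "mode": stat.filemode(info.st_mode),
        "is_regular": stat.S_ISREG(info.st_mode),
        "modified": info.st_mtime,
    }


def demo():
    # Write five bytes of data, then look at what stat() reports about them.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"hello")
        path = handle.name
    try:
        with open(path, "rb") as fh:
            data = fh.read()
        return inode_metadata(path), data
    finally:
        os.remove(path)
```

The file's contents are exactly the five bytes that were written. Size, permissions and timestamps come back from `os.stat` without reading any of the file's data.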
Assorted stuff I do sometimes: Lemuria.org