RDF For Desktop Metadata?
claes writes "There is an article "Metadata for the desktop" that suggests that RDF should be used to describe data in desktop environments. This is an interesting idea. RDF is already used by Creative Commons to attach license metadata to its works. Mozilla also supports it.
RDF was designed for the web, but can it also find its way to the desktop? And what metadata is most important to describe?"
is porn!
Suppose today I want to see shaved asian hardcore action. Now provided that metadata searches are integrated into the OS (like they will be in Tiger), all I need to do is a quick metadata search on my hard drive and boom, there is what I am looking for.
I mean, provided there was a decent standard (a porn standards body would rule!) and good regex capabilities built into the OS, I would be willing to pay for porn. I know that there are comments built into the JPEG standard, but there are all sorts of porn file formats, and it would be helpful to have a universal standard across them. It saves time, and beats trying to search on Google and going through a lot of crap just to get to something good. I am a man on the run, I have places to go, I can't be bogged down by my porn. Plus, think of the people that get to categorize this stuff (well, the fun stuff anyway, not goatse). What an awesome job that would be!
I should probably post AC, but I figure this post is bound to earn me at least one fan and/or freak.
Why don't Slashdotters define what metadata is in the first place? Google's define:metadata lists no fewer than 20 definitions. Are we talking about "data about data"?
I am a big fan of implicit filesystem feedback. This can support all kinds of services from file sharing to most recently accessed search requests. Even fine tuning access controls in an RSBAC security policy.
The big concern is keeping this data protected and private. You don't want to share all of your metadata with everyone, so the security of these systems should be looked at carefully.
Are there any filesystems left that use forked files? Resource, Data and Metadata forks? Any at all?
While MacOS was at a disadvantage being one of the only ones to use it, wouldn't it have been an excellent advantage for ALL filesystems to be forked?
(I don't know the answer to this - anyone who knows more about filesystems, give your thoughts)
Why does the document complain about the lack of integration, then mention that Microsoft, Apple, the ReiserFS people, etc. are coming up with solutions, and then add a completely new one? Shouldn't they just be supporting Apple's or ReiserFS's efforts?
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Sure. I have no objection to a more extensive use of metadata. In fact I crave it - must have it.
But why oh why do people think that XML-based solutions are the way to go? An RDF solution would be bloat beyond belief. OK, so it's not that bad for a few files, but when we get down to it, we don't have just a few files. We have plenty of them.
So why not use something smaller? A simpler protocol?
We can still have RDF-frontends for those that crave their daily XML-fix. Get real.
Since most of us are advanced computer users or even computer experts, I think we largely know how to search for content.
For one thing, I always give my filenames relevant titles, not things like document06.doc.
Also, I already know how to search through files for content using basic grep or advanced Windows searching.
I mean, sure, meta data like ID3 tags for MP3s that I steal offline are important because my Nomad mp3 player indexes based on that info, but in general I'd say meta data is not quite as important as some may suspect.
If ever there was an appropriate thread for him to post in, this is it! :D
I've heard the NTFS file system is designed to allow the system to add any number of properties (besides the obvious filename, last access time and permissions) to any stored file. This is likely to be exploited by Longhorn, which is planned to be capable of appending metadata to newly created files (for example, if you download a file from the Internet, the system would likely append an Originated-From-URL property to it).
What I wonder is, is there any filesystem in the FOSS world that supports something like this, or are there plans to make it supported before 20??, when Longhorn hits the stores? I see this as a critical feature that must be made available by non-Windows OSes.
Last time I checked you can pick up HD storage space for $0.70 a GB.
Who
What
Where
When
Why
and possibly How...
I'm mostly wondering if the new Spotlight feature of MacOS X 10.4 is going to be based on this, or a proprietary technology. I've been itching for cross-platform metadata file support for years now...
for when I can just throw out the whole desktop in favor of a "cloud" of data... using google-like interfaces to find my stuff. I think it would be interesting to figure out how to tell a compiler where to find stuff...
meh
Danny Ayers has some interesting discussion on his blog about winfs and rdf. There's also discussion of Jon Udell's Questions about Longhorn.
Well, I have a file called DocumentNo5.mp3, but it's a rip of an R.E.M. album.
A group at MIT is using RDF for an integrated data management system. It's sorta like Outlook (or Kontact, if you prefer ;-) on steroids. It's called Haystack.
I have to say, their ideas are intriguing, but after using it... I think the big shortcoming is that it's tough to come up with a generalized user interface for manipulating any data thrown at it. Haystack tries at this and, I think, fails at providing any kind of cues or context that tell you what you are dealing with. In Haystack, every task and piece of information you deal with looks very much like every other piece of data because, as a design choice, every piece of data in Haystack has the same rank as every other piece of data.
Having different applications for different types of data usually makes sense, if only to limit the amount of options presented to the user so they can make an intelligent decision about what action they want to perform. See this article on Slashdot about how users need limited options, since too many make decision-making too difficult psychologically.
Inevitably, discussions around RDF and metadata always devolve into hand-wavy discussions on how the computer will be able to "magically" do smart things based on the metadata. But it really isn't magic and it isn't automatic at all. Equivalencies and mappings have to be created by humans along with the rules about what to do.
RDF uses many concepts from AI research. Anybody who has read about this branch of computer science knows that the discipline has pretty much given up on creating AI in the 'sci-fi' sense as an impractical dream. That's what makes the Loebner prize so controversial. I don't expect that computers will be intelligent enough to relieve users of much of the burden of assigning metadata.
RDF is a promising approach, but if you read the article, it makes a lot of assumptions about what needs to happen to make the benefits real. Among them are establishing standards for what metadata fields apply to different types of objects: photos, people, music, etc. That kind of standardization won't happen overnight, if at all.
The computer also needs to know what to do when it encounters that kind of data. The article mentions MIME and browsers and, in effect, says the browser can make a rational decision even if it hasn't seen a particular MIME type before. That isn't really true: you have to install a plugin that tells the browser what to do, or have a registry that someone has put together where the browser can install the right plugin at the right time.
That said, KDE's unification of contact information and passwords does show some of the promise of metadata efforts. And Apple's Spotlight looks like a good solution as far as it goes. I guess I'm just trying to make the point that the magic of metadata needs to be taken with a fairly large hunk of salt.
But when I tried to publish one article at Kuro5hin, the RDF code, which took the form of HTML comments, was displayed literally in the visible body of my article. That is, all the tags had been turned into entities so the tags appeared literally in the rendered text.
I think Kuro5hin's Scoop content management system doesn't permit HTML comments. Maybe it's not trying to suppress comments, and it simply didn't occur to Scoop's developers to allow them.
RDF on the web would likely be much more popular if one could count on publication sites allowing it in the submitted markup.
Another problem I had is that Creative Commons' recommended way to apply a license to a web page is not permitted by any of the community sites I frequent. CC-licensed web pages usually have a small banner that links to the license text. But for obvious reasons, sites like Slashdot and Kuro5hin don't permit images in article or comment submissions.
The result is that, even for the copies of my articles on my own website, I use neither RDF nor the CC banner, because I want to make it easy for others to copy my CC-licensed articles to sites that don't permit RDF or graphics.
The way I apply the license is the much-less-cool method recommended for plain text files. I have the following text appear in the body of my articles:
Request your free CD of my piano music.
There's no need to provide definitions for terms which are so easily looked up. Metadata isn't an obscure concept, as you found when you did your search.
Hi,
A bit of a shameless plug, but none the less: I think that folks who
liked the ideas in Edd's article might also be interested in my
project, libferris.
Ferris allows metadata to be extracted from files and presented through
a uniform interface. It supports inference on metadata and has the
ability to index that metadata in many ways (e.g. Berkeley DB, ODBC,
LDAP). Note that the metadata index can be used to index anything
libferris can mount (XML, ODBC, RDF, LDAP, http, ftp...)
A cool thing related to Edd's piece is that you can read an inferred
attribute "as-rdf" to obtain all the metadata that libferris knows
about for a file as a single RDF/XML file.
The vast majority are very small files. How much more space would be required to give each one some RDF? And remember that disk space is allocated in terms of sectors, or sometimes in blocks of several sectors, so small files waste proportionately more space.
And that's just on the Windows installation for my PC. I also have Slackware Linux and BeOS on other partitions. Quite likely there are very nearly a million files on my PC alone.
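To put rough numbers on the block-allocation point above, here is a minimal sketch. It assumes a 4096-byte cluster and about 500 bytes of RDF per file; both figures are illustrative guesses rather than measurements, and real filesystems vary (some store very small files or attributes inline).

```python
CLUSTER = 4096    # assumed cluster size in bytes
RDF_BYTES = 500   # assumed size of one file's RDF description


def allocated(size, cluster=CLUSTER):
    """Bytes of disk consumed by a file of `size` bytes, rounded up to whole clusters."""
    return -(-size // cluster) * cluster  # ceiling division


def extra_if_embedded(file_size):
    """Extra disk space used if the RDF is stored inside the file itself."""
    return allocated(file_size + RDF_BYTES) - allocated(file_size)


def extra_if_sidecar():
    """Extra disk space used if the RDF lives in its own small file."""
    return allocated(RDF_BYTES)
```

Under these assumptions, embedding the RDF in a 200-byte file costs nothing extra, because the slack in its single cluster absorbs it, while a separate RDF file costs a full cluster each time. A million such sidecar files would take 1,000,000 * 4096 bytes, roughly 4 GB.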
Of course you need a few extra things, like a universal schema database so everyone can use the same schemas and a way to organise them, plus ratings, a recommendation engine, etc. I can say, as an AC, that it is nearly here: Windows to begin with (the core is cross-platform C++; only the GUI is Windows-specific) before transitioning to a Mozilla-based cross-platform app. The possibilities are far greater than the article discusses: a final unity of metadata and, more importantly, search and aggregation across systems, datatypes, and languages (human and machine).
Interesting? The Voynich Manuscript is interesting. The Linux kernel is interesting. A guy who can't be bothered to read more than 3 words of 1 definition of a word isn't interesting. This post should be at -1, Troll. Might as well get a good mod on this post too: That and installing Gentoo on my home box.
Read Da Fucking what?
But that's the problem! If it's not fun to organize items into folders, how is it any more fun to add metadata to a file? I'm not talking about text files. Text files are easy, because you can pull the metadata out of them automatically (in fact, you can do this now with search tools). I'm talking about files that have to be explicitly tagged with metadata, like pictures. How is adding metadata to each picture file to categorize your vacation pictures any less laborious than placing the vacation pictures into their own directory?
That's the problem as I see it. You still end up being a filing clerk! If people don't even organize their folders now, are people going to use metadata when it's available? Will improved search capabilities make users want to be clerks?
In a nutshell, isn't it the same problem?
CC is interested in desktop metadata developments. See this CC weblog post from a few days ago.
Having different applications for different types of data usually makes sense, if only to limit the amount of options presented to the user so they can make an intelligent decision about what action they want to perform.
I agree wholeheartedly that unifying desktop applications into one nebulous interface isn't a very useful way to give users access to their data. Mail clients make good mail clients, but they make lousy photo gallery browsers.
That said, what I do wish we'd see more of is an effort for different applications to share the same information, because the dividing line between which application to use is much clearer than the dividing line between which application should be the keeper of particular types of data.
I don't want to have to open my web browser to see if I've bookmarked a URI that somebody mentioned in an IRC channel. I also don't want to have to open my PIM to find the phone number of somebody who I'm talking to in that IRC channel.
These are the sorts of data access issues I'd like to see resolved, and I do see RDF as a possible, even attractive, approach to solving the problem. However, as you've pointed out, we can't simply modify our applications to all spit out RDF, and expect everything to fall into place. Some degree of consensus about how to represent data is required. Rather than writing new applications like Haystack, or looking for new approaches to managing one's information, I'd rather see efforts to modify existing applications to share data sources more effectively.
I've yet to see a real-world example of how to use RDF that wasn't for research purposes (i.e. to prove RDF works). Most of the projects listed for the semantic web are purely research, toy projects, or completely unproven. I know of several companies that have tried, but they usually end up extending the hell out of RDF to make it practical and useful. That makes me think RDF is flawed.
1. Ditto to the post which said, "Separate the apps, not the data." The current proliferation of app-specific formats is absurd and counter-productive.
2. I file hundreds of docs &/or URLs per day. I need something which offers some degree of assistance in immediate auto-categorization (e.g. Bayes) with feedback, while still allowing user-defined hierarchy. "Yes, thank you for intelligently recognizing that this new info is about device interrupts; but now I need to tell you that it's about kernel-coding vs. crash-debugging vs. performance-analysis."
3. One poster calls the article, "self-masturbation on the part of bored jobless software engineers that aren't solving any problems that need solving".
Speak for yourself. Yeah, I'm a developer, but most of my minute-to-minute usage of my desktop isn't all that different from "lusers" or PHBs, i.e. massaging info.
Get some perspective. Your statement is like saying, "Cars are really primarily made for mechanics and automotive engineers, not for soccer moms and commuters."
4. Forked-data: sure, as long as it's restricted to the app-specific stuff. Take that table the user just created: use forked-data for the meta-data which is specific to the spreadsheet or WP app, but leave the table data as ASCII data which anyone can read.
5. Someone said, "a file-name should be enough". Speak for yourself; a lot of my needs go waaayy beyond that. If the metadata goes beyond your needs, then your course is clear: just don't use it. It costs you nothing to architecturally allow for its use by other people.
6. re: "clouds", there are times when I'd really like to know -- what app created this file? what OS? which host? which user? what other files had been opened (e.g., stdin)? what was the original volume label? etc.
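For point 2, here is a rough sketch of what Bayes-style auto-categorization with user feedback might look like. It is a bare-bones naive Bayes classifier with add-one smoothing; the class name, the categories, and the training phrases are all made up for illustration, and a real tool would need much better tokenization than splitting on whitespace.

```python
import math
from collections import Counter, defaultdict


class TinyBayes:
    """Hypothetical minimal naive Bayes text categorizer."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> documents seen

    def train(self, text, category):
        """Record one document under a category (also used for user corrections)."""
        self.doc_counts[category] += 1
        self.word_counts[category].update(text.lower().split())

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus log likelihood of each word, with add-one smoothing.
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


clf = TinyBayes()
clf.train("irq handler interrupt kernel module", "kernel-coding")
clf.train("oops crash dump stack trace interrupt", "crash-debugging")
guess = clf.classify("kernel module interrupt")
```

The feedback loop is just another call to `train`: when the guess is wrong, the user's chosen category is recorded for that text and later guesses shift accordingly. A user-defined hierarchy could sit on top by using paths such as "kernel/coding" as category names, though that is only one possible design.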
Knowledge representation via "is-a" links has been tried, and it breaks down rather quickly. Read "Artificial Intelligence meets Natural Stupidity", by Drew McDermott, for a 20 year old critique of this concept. It's overkill for searching, and not powerful enough for reliable automated question answering.
The Cyc debacle illustrates how much work you have to put into tagging to get very little out. After twenty years of that money sink, it's still useless.
I noticed the article made no mention of Pike (also the name of a fish - see language logo). Pike's a fine C-like scripting language ...that I know extremely poorly myself, but anyway..
From Pike's official homepage (at the University of Linköping, Sweden):
Worth downloading and checking out for other reasons than "just" RDF & OWL. Free software, available under LGPL, GPL, and MPL (Mozilla Public License).
the fact that Mac OS X uses .plist files to represent creator code and application information in .APP bundles.
When looking into metadata, people should probably be sure to check out XMP
It's from Adobe, and whereas RDF just says how to format metadata, XMP addresses what to include in your RDF, and how to place it into different types of files. They have free libraries, but it's simple enough to follow even with your own code. And... given that it's how all Adobe products are doing metadata, at least in the publishing world it will probably stay something to pay attention to.
Creative Commons has addressed this, and I first hit it in researching implementing metadata support for Inkscape.
The more things play nice together, the more users are likely to adopt using them.
This "metadata" is actually called an "NTFS stream" and has been around since at least NT4.
If you move the file around the NTFS drive, or from one NTFS drive to another, then yes, the metadata goes with it. If you move it to a FAT volume though, the metadata is lost forever. Not a huge deal as NTFS is getting more and more users nowadays.
XP uses these metadata streams to some degree, actually. Some of the things in the properties page for a file are actually NTFS streams.
Longhorn will make more extensive use of them, I'm certain.
Did anyone else read RDF and think... Reality Distortion Field (Steve Jobs)?
Good, yet another format to use/suffer!
No matter how good those formats are (XML/RDF/etc) they all fail at the simplicity norm, the KISS principle.
In the article's example, by not using a simple text-oriented format they unnecessarily complicate access to these values by any program, and that leads to the second point.
The computational cost involved in parsing and validating all those formats: the day our CPUs can process hundreds or thousands of simultaneous parses without a noticeable impact on performance, it could start to make sense to popularize their usage. Until then, they are a luxury and as such restricted to limited (specialized) usage.
In the RDF case, metadata is data; the 'meta' part is a human ability and can be used wherever we want, with no need for a special format. By trying to formalize the 'metadata' concept we are just defining a new stream format, and considering how wide the 'meta' concept is, it seems difficult to limit it to a simple ontology. The result? The need for another international consortium to establish a reasonable set of vocabularies. Big deal!
I think there are better ways to spend our CPU cycles than parsing verbose formats, but who knows?
Maybe it can replace Windows' crappy Desktop.ini file, allowing for complex Desktop setups. One use I see for it is to have links to all the icons, the programs they start, etc.
Nobody questions your ability to index and search your own data anyway; it's when you start to cooperate with other people that metadata becomes really useful. You might not name your document document06.doc, but someone else might. And not everything is grepable: pictures, music files, and binary data files all make great use of searchable metadata.
Anyway, I think it would be great to have a unified system of metadata, so that you would not need a specific system for every category of files, like ID3 tags for MP3s.
If programs would be read like poetry, most programmers would be Vogons.
I would say that RDF, or any XML format, is unacceptably wasteful for metadata. Besides, many filesystems already support extended attributes. Why not use the mechanisms that were developed exactly for this purpose, instead of introducing a new and inferior one?
Forgive my zeal, I just really hate the XML for everything mentality.
Please correct me if I got my facts wrong.
If you use a Mac, you might be interested in DEVONthink (and its little brother, DEVONnote). It does (2), and very well.
It text indexes all supported Mac document files (Web, RTF, text, PDF, etc.), and can store anything (links, movies, PDFs, whatever). You can then do very fast search.
Have a look.
I think there is too little metadata about installed applications in a regular Linux system. There is metadata in the packages (RPM for example), and there is metadata in the .desktop files. But the package metadata is on the package level, and does not describe each individual application it contains. The .desktop files are very sparse, and describe only things that fit on one line of text or less. This makes it hard to write new kinds of user interfaces. I can't find any way to make a Freshmeat-like user interface acting on the software that is actually installed in the system, since there are no descriptions longer than one sentence in a .desktop file.
Firstly, RDF is not XML; its canonical exchange format encodes to XML, but there are plenty of other representations.
Secondly, please explain how the implicitly-described files in your NTFS streams can be seamlessly shared over the Web in a composable way.
The point of RDF on the desktop is that it does statement-level meta-data very well, and is Web-integrated.
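As a small illustration of the "RDF is not XML" point, here is a sketch that writes two statements in N-Triples, one of the line-based, non-XML RDF syntaxes. The file URI and the values are hypothetical, the `Literal` helper is my own invention, and the serializer only handles URIs and plain string literals.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


class Literal(str):
    """Marks a value as a plain string literal rather than a URI."""


def term(value):
    """Render one subject, predicate, or object in N-Triples syntax."""
    if isinstance(value, Literal):
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


doc = "file:///home/user/report.txt"  # hypothetical file
statements = [
    (doc, DC + "title", Literal("Quarterly report")),
    (doc, DC + "creator", Literal("claes")),
]
output = to_ntriples(statements)
```

Each line of `output` is a complete statement on its own, which is what makes statement-level metadata easy to merge from several sources.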
Hmm, perhaps I should read up on RDF more, but everything I have seen that had anything to do with RDF was in XML. Saying that RDF is "Web-integrated" also says "XML" to me.
As for sharing metadata over the web (I am not talking about NTFS here, because until today I didn't even know it supported extended attributes), I think HTTP headers fit this purpose perfectly; they are metadata, after all. Just encode every attribute in an HTTP header.
Besides, the main use I see for metadata is to improve organizing and finding objects. This works from localhost upwards; first, you slap meaningful metadata on your files, then you can use your (local) search functionality to efficiently find your files, and finally, with suitable protocols, others can find objects on your system, too.
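A minimal sketch of the "one attribute per HTTP header" idea. The `X-Meta-` prefix, the function names, and the attribute names are all chosen for illustration only; nothing here is an existing convention that servers or clients are known to honor.

```python
def metadata_to_headers(metadata, prefix="X-Meta-"):
    """Turn a flat {name: value} mapping into a sorted list of (header, value) pairs."""
    headers = []
    for name, value in sorted(metadata.items()):
        value = str(value)
        # Refuse line breaks so a value cannot smuggle in extra headers.
        if "\r" in value or "\n" in value:
            raise ValueError(f"illegal line break in value for {name!r}")
        headers.append((prefix + name, value))
    return headers


def render(headers):
    """Format header pairs the way they would appear on the wire."""
    return "".join(f"{name}: {value}\r\n" for name, value in headers)


example = metadata_to_headers({
    "Originated-From-URL": "http://example.org/report.pdf",  # hypothetical values
    "Author": "claes",
})
```

Note that this only carries flat name/value pairs about the one object being transferred, which covers the find-my-files use case described above but not statements about other resources.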
Isn't this what kSpaces does today? Granted, kSpaces is still a little rough around the edges, but it seems like a start.
From the website:
kSpaces is a metadata-driven, distributed knowledge management platform. It was designed to be lightweight, transparent and extensible. The kSpaces proof-of-concept allows files to be described with arbitrary RDF metadata. These descriptions can then be easily shared with and queried by other nodes in the system. Finally, kSpaces-managed files can be made available to all other nodes participating in the same kSpace.
kSpaces employs file system monitoring and auto-tagging technologies in order to achieve almost full transparency to the user. Its lightweight, plugin-based design allows for maximum extensibility.
The kSpaces reference implementation was written using Java for the server and C# for the client (on Microsoft's .NET platform). All client-server communication is done via SOAP, and client-client data transfers are made via HTTP.
The kSpaces Node software works by monitoring a directory (My kSpace in the My Documents folder) and managing metadata about any files in that directory. Subdirectories are not supported.
kSpaces automatically tags files through the use of plugins. The two autotagging plugins that are included analyze a file's ID3 and EXIF headers, and then generate the appropriate RDF metadata.
Metadata associated with a file can be viewed and edited through the kSpaces Node application, supported by editor plugins. Five editor plugins have been included in the proof-of-concept, four of which are read only. These plugins allow the management of a subset of Dublin Core metadata, EXIF metadata, ID3 metadata and kSpaces-specific metadata. The Raw RDF plugin shows the raw RDF metadata associated with a knowledge asset.
The metadata that is stored about knowledge assets in a kSpace can be queried using RDQL. In end-user applications, RDQL could be generated by using natural language processing or other technologies.
Finally, the kSpaces node allows the kSpace contents to be viewed using browsing plugins. In this proof-of-concept, only a very basic one has been provided, which shows all assets present in the kSpace. Writing additional browsing plugins will allow users to see the kSpace assets from different facets that can be tailored to the user's needs.
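The auto-tagging and querying steps in the description above can be sketched roughly as follows. This is not kSpaces code, and the query helper is not RDQL. It reads an ID3v1 tag (the fixed 128-byte block at the end of an MP3 file), maps a few fields to Dublin Core properties (the mapping is my assumption), and answers simple triple-pattern queries. The file name and tag values are invented.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def read_id3v1(data):
    """Parse an ID3v1 tag from the last 128 bytes of `data`; return None if absent."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def field(start, length):
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
        "year": field(93, 4),
    }


def id3_to_triples(subject, fields):
    """Map ID3 fields to (subject, predicate, object) triples; the mapping is assumed."""
    mapping = {"title": DC + "title", "artist": DC + "creator", "year": DC + "date"}
    return [(subject, mapping[k], v) for k, v in fields.items() if k in mapping and v]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def pad(text, length):
    """Encode text and pad it with NUL bytes to a fixed-width ID3v1 field."""
    return text.encode("latin-1").ljust(length, b"\x00")


# A fake MP3: some dummy bytes followed by a hand-made ID3v1 tag.
sample = (b"\xff\xfb" + b"\x00" * 16 + b"TAG" + pad("Example Song", 30)
          + pad("Example Artist", 30) + pad("Example Album", 30)
          + b"2004" + pad("", 30) + bytes([255]))

fields = read_id3v1(sample)
triples = id3_to_triples("file:///home/user/song.mp3", fields)
by_artist = match(triples, p=DC + "creator", o="Example Artist")
```

A real query language adds variables, joins across several patterns, and namespaces, but the wildcard match above is the basic building block.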
Bah.. you mean like DTP is irrelevant to anyone who can use a typewriter?
Anyway, the grandparent has it exactly wrong: "normal" users who won't correctly name things, and who store them in a badly thought-out directory tree, will not be using metadata. We power users, on the other hand, can use it to make our own systems far more useful to us.
Since most of us are advanced computer users or even computer experts, I think we largely know how to search for content.
But what good is your know-how if you can't find what you're looking for? Let's face it: most of us have hundreds, even thousands (or tens of thousands if you count all your pictures and mp3s), of files on our computer(s), and finding what you want nowadays depends greatly on your memory.
With Spotlight-type (metadata driven) search you can narrow the results, query by query. You don't have to remember where you stored the file or with what name. Even if you know how to archive your files, I think these metadata based technologies are going to help us a lot in the future.
And on a second note, let's remember that if you know how to find stuff on your computer, you belong to a minority of computer users. Most users have no idea how a computer works, and therefore they don't know how to make efficient searches or how to archive their files practically. Metadata will help these people find their way better on a machine they are not familiar with.
Nah, in large corporations, most of the metadata is stored by global information systems that usually have a process in place which won't let you proceed unless you fill in the metadata. The point is that name-and-directory organization is as limited as hierarchical databases for all purposes, whereas metadata is comparable in usability to relational databases.
There's been work on adding Dublin Core metadata support to Inkscape, for its next release.
The need for the metadata support is entirely practical in this case: the Open Clip Art Library requires all SVG submissions have proper metadata embedded, to ensure licensing and authorship correctness. Also, there is an SVG Clip Art Browser that uses the metadata info for its display.
One interesting question that's come up recently and is being discussed on the lists is what happens when you embed several pieces of clipart into a larger document: how do you access the RDF of the individual bits in Inkscape?