Ask Slashdot: Software To Organise a Heterogeneous Mix of Files?
BertieBaggio writes "I am a medical student at the end of an academic year trying to get my notes organised. I'm looking for a software document organisation system to organise a mix of text notes, journal articles, diagrams and scans. Ideally such a system would permit full-text and metadata search, multiple categorisations (eg tags), preserve the underlying files and be cross-platform (Linux/Windows/OS X). While I'm not averse to paying for such a complex solution, ideally the software would be FOSS so that extension or migration are possible if necessary. Desktop search (eg Google Desktop) probably does 90% of what I want apart from multiple categorisations, which is the feature I'm most interested in. Searching turned up a similar question over at 43folders which pointed me in the direction of Papers and DevonThink, but these are OS X only and seem to be aimed more at academic paper organisation. What recommendations does the Slashdot community have for categorising and organising a heterogeneous mix of files?"
Zotero - is awesome - Firefox plugin
http://zotero.org/
Personal Brain.
Best Slashdot Co
does what you need and then some.
I have a Mac and it's not the greatest OS, but I love the search. I search all my old emails and a horde of other documents all the time.
I'm sure other computers can do this just fine, but I was never satisfied with a desktop search implementation until OS X. And I used to be a search index consultant.
Take off every 'sig' !!
Sounds like you're describing a directory tree. Search with grep, or any similar utility. Put files in multiple categories (appropriately named directories) using ln. It's cross platform, timeless, and seems to do what you describe. I feel like I'm missing something though.
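For anyone wanting to try the directory-tree idea without typing ln by hand, here is a minimal Python sketch. It assumes the category directories live on one filesystem (hard links cannot cross filesystems), and the function names `categorise` and `grep` are made up for illustration; they are not part of any existing tool.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def categorise(source: Path, root: Path, categories: list[str]) -> list[Path]:
    """Hard-link `source` into one directory per category under `root`."""
    links = []
    for category in categories:
        target_dir = root / category
        target_dir.mkdir(parents=True, exist_ok=True)
        link = target_dir / source.name
        if not link.exists():
            # A hard link is a second directory entry for the same file data.
            os.link(source, link)
        links.append(link)
    return links


def grep(root: Path, needle: str) -> list[Path]:
    """Return files under `root` whose text contains `needle`, ignoring case."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle.lower() in text.lower():
                hits.append(path)
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        inbox = root / "inbox"
        inbox.mkdir()
        note = inbox / "malaria.txt"
        note.write_text("Plasmodium falciparum life cycle", encoding="utf-8")
        categorise(note, root, ["parasitology", "tropical-medicine"])
        for hit in grep(root, "plasmodium"):
            print(hit.relative_to(root))
```

Note that a search reports the same note once per category it is linked into, which is one practical cost of this approach compared with real tags.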
I love that program. You can get it from www.thebrain.com. It may sound like sarcasm, but it isn't. It's allowed me to organize a myriad of loosely-related information many times. I even bought the full version with my own money ($250).
Any guest worker system is indistinguishable from indentured servitude.
Emacs Org-Mode. I've learned a little Emacs syntax just to use that package, after being a Vim user for over 15 years.
Free Manning, jail Obama.
My sister has worked for various doctors over the years, and a doctor's office is a prime candidate for something that will organize information: forms, papers, x-rays (photos), scanned documents, etc.
Many times she has spoken to me of the failures of the information-organization software that they have tried. Some would reach a certain capacity and choke, others had terrible OCR, and so on. In fact she even asked me about building a better application to do that; but I was too busy trying to put food on the table to take on a large project that probably would not pay off for a couple of years at least.
If there is an application that does this well, I would like to know about it too. (One person has already mentioned Evernote, but that's a "cloud-based" application for some unknown reason, and I would have privacy concerns.)
A document management system? You have a lot of competition there.
There's nothing wrong with "organise."
Dilbert RSS feed
Learn that spelling varies between different countries, and you may find maturity.
Have you checked out Oracle Text? As I understand it, it is now a standard part of an Oracle database, and it can index text documents - according to rumours, it should be able to index not just words as they occur in the documents, but also their "meaning", whatever that means, and it should understand several doc formats. I haven't used it myself, though.
You can download it for free for development purposes - get the enterprise edition for your OS plus the very, very (VERY!) comprehensive documentation, and install; now you just need a handy front-end :-)
Hate to say it but it works. I have used it for several years and based on your requirements it would be a perfect fit. It can store several types of documents in document libraries tagged with meta data. You can create views on the meta data with the document libraries. You can perform full text and meta data searches. It has out of box workflows and can create complex workflows with developer tools.
It has a lot of what you would need. It also is not cheap. Sharepoint Services or Sharepoint Foundation is free (part of the server software) but the advanced features will require licenses.
There's an app for that.
It is called a file system. You can put it on Dropbox, Jungledisk, or even better, use unison for synchronization. Use folders to group files together. Use filenames to remind yourself what the content is. Use file suffixes to show the file type. To search, use spotlight or mdfind on OS X, locate on linux, and... go kill yourself on windows (disclaimer: I've only tried the search function on XP and older windozes). Metadata works great with spotlight; I don't know any solutions to that on linux or windows, but someone else probably does.
This is actually what OneNote -- the oft overlooked/maligned offering from MS -- is designed to do, and it does it pretty well believe it or not. Technically it's aimed at collaboration, but there's no reason it can't work equally well for self-organization.
https://www.eff.org/https-everywhere
I use Evernote, and so do a lot of my med students. It is cross platform, the free version is quite functional and stores PDFs, rich text and graphics. It is searchable and shareable.
I'm not sure what happened to it or why they stopped developing it, but Google Wave was an awesome tool for doing just this. It was on the net, so it was cross compatible. It handled a wide array of file formats. It was searchable and it had a collaboration element that held on to revisions quite nicely. The API was open so others could develop their own solutions. It's a shame it went the way of the buggy
Why does it have to be a heterogeneous mix of files?
I use Mendeley Desktop for this purpose. It integrates well with Microsoft Word, and provides easy citations and reference organization. It is FOSS, and works under Windows and Linux. http://www.mendeley.com/ It also has an iPhone app, but I've never used it, so I can't vouch for its usefulness.
OpenKM will do most of what you want. We've deployed it for clients who have been happy with it. It does not preserve the underlying filesystem, but you can upload a ZIP file of documents.
It's a tomcat app - that used to be heavy-duty - if it is today depends on what kind of machine you're using.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Oh, yes, this. Brings back fond memories of a troll extraordinaire.
A successful API design takes a mixture of software design and pedagogy.
Would be a file system. Run something like Beagle for full text/metadata search. Use hard links to keep a single file under multiple folders.
Give me Classic Slashdot or give me death!
Also, to elaborate on your specific requirement for "multiple categorisations", and so that I might save myself from a "smart-ass" mod, here's a possible suggestion: http://www.tagsistant.net/ It's a tagging file system. You didn't specify which operating system you were on, but this works with Linux/BSD. Not quite mature, but I could see it potentially going places. At the very least, the idea of implementing tags directly in the filesystem might trickle up to extfs or NTFS or hfs+ eventually.
Hello, as far as I know Google Desktop doesn't allow full-text indexing of files. Is there a desktop search engine that would allow full-text indexing of files?
Google Docs works OK for something like this. You can add as many categories as you like to each doc for easy sorting. You can add descriptions, etc. for metadata search ...and of course content search works well for known file types.
[citation-needed]
char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
Check out EndNote (http://www.endnote.com/eninfo.asp), also. Your school may already have a site license like mine where it is free to all students.
Evernote gets my vote. I purchased the annual so no worries about space. The Optical Character Recognition (OCR) has been outstanding. I took a photo of a roadside marker, and dang if Evernote didn't have the whole text indexed by the time I got home. Evernote saves to the cloud, no backup worries. You can forward a message with a .pdf or .doc file attached and it gets indexed. You can set up folders and selectively share part of your stuff. I send credit card statements to a finance folder. I set up an owner's manual folder for the literature on all the gadgets and gizmos I've bought. My budget will include the annual fee (about $45) for the rest of my years and my backup / search worries are over, forever.
So I logged in after being tagged 'anonymous coward'. There, That was RickRack talking about Evernote.
OSX only again, but does what you want, including handling all files.
invented the web for this didn't he? How about putting your files on a webserver with something like a Lucene index?
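To make the "something like a Lucene index" part concrete, here is a toy inverted index in Python. This is not Lucene and does not use its API; it is only a sketch of the underlying idea (map each word to the documents that contain it), and the sample documents and function names are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    """Return the names of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Made-up sample notes, keyed by file name.
    docs = {
        "malaria.txt": "Malaria is transmitted by the Anopheles mosquito vector.",
        "cardiology.txt": "Risk factors for heart disease include hypertension.",
        "dengue.txt": "Dengue is spread by the Aedes mosquito.",
    }
    index = build_index(docs)
    print(sorted(search(index, "mosquito")))
    print(sorted(search(index, "mosquito vector")))
```

A real engine adds stemming, ranking and on-disk storage, but the lookup step works on the same principle.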
Korma: Good
Zotero is free. Where is the profit in marketing that?
I don't read your sig. Why are you reading mine?
In the Open Source world, DSpace is probably the document management system to beat.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Learn how to spell in English, not just American.
It's ok to take seriously. It'll be done by 2150, when the Daleks will invade the Earth. (The Daleks are too smart to invade a disorganized planet.)
called secretaries, now they want to be known as administrative assistants.....
I use VisiCalc because I'm down like that...
Still, it's a little weird to suggest that this is some brand new type of technology.
No actually it went from 'good' to 'excellent'.
What did he spell incorrectly? You're not getting all twisted over the American/British spellings of organi(s|z)e, are you?
Still there for me. Not that I need it with Adblock Plus, but still..
We're looking at ResourceSpace http://www.resourcespace.org/ "Free and Open Source Digital Asset Management". Web based and multi-user. Works with images, video, pdfs, .doc and other types of files. So far it looks like it could be pretty handy.
http://lunarfrog.com/taggedfrog/ Does not do full text searches but can use the tags to categorize and also the favorites function. It's light and easy on resources....
Many thanks for all the informative replies so far. I've had a quick glance at Evernote, thebrain, Nepomuk (I'm loving KDE4 so far after switching a week or so ago), OpenKM and FreeMind and these seem promising. I've still to look at emacs' org-mode, and when I do I will try to put my vi prejudices aside ;-) Some of the other suggestions are rather good but aren't really what I'm looking for as they are either fully cloud-based (eg Google Docs, Wave) or one platform only (eg Sharepoint) or too expensive (hire a secretary :-P).
I like the idea of some of the "roll your own" ideas, eg directories + hard links, serving from a web server or wiki. The problem is as I progress through the medical degree, I am likely to have decreasing amounts of time to tinker with things if they have shortcomings; and to be honest they probably will as I am unlikely to have thought through the problem fully! Plus third-party solutions will definitely have substantially more polish than anything hastily dreamt up by me!
A shared wiki for my cohort / medical school / country may be an option on top of whatever comes out of this discussion, but I'd like something personal as... ah, let's say medical students have wildly varying standards of what is acceptable for notes ;-)
My supplementary questions for anyone still wanting to chip in:
Thanks again for the helpful replies, Slashdot. You continue to impress me - I doubt I would have gotten such a useful variety of responses elsewhere. I hope this discussion is useful to other folk looking for something similar.
If all you have is a grenade, pretty soon every problem looks like a foxhole -- MightyYar
Evernote saves to the cloud, no backup worries.
You are foolish if you trust your data to the "cloud". Please understand: THE CLOUD IS NOT, BY ITSELF, A REPLACEMENT FOR BACKUPS.
http://www.knowledgetree.org/Main_Page Works rather well and has a *LOT* of features. Perhaps a bit too much for what you're looking for?
Do you Gentoo!?
Everything in your Evernote account is also replicated to your machine, in a set of files you can include in a local backup if you wish.
A friend recommended this to me: http://www.mekentosj.com/ I've played with it a bit, and it's very academic-focused.
There's probably a need for a more general metadata-integrated information management tool, that makes use of Mac OS X facilities for metadata definition and management. Do Linux and Windows 7 have similar OS level facilities to support metadata creation and management?
A key consideration is the ability to store metadata in the file, rather than alongside. Some EXIF tools support this for photographs and I think the various Office formats (Microsoft and otherwise) support metadata within the file.
The OP specified cross platform compatibility. Linux/BSD is probably not going to be enough. However, I'd imagine that it would be easy enough to get it running under OSX. I'm not sure how easy it would be to get this running on Windows though.
Personally, this looks like something I'm going to have to keep my eye on as my main OSes are Linux and FreeBSD.
library scientist
Eh????
"I don't know, therefore Aliens" Wafflebox1
Tagsistant is a tags based (multiple categories) filesystem working on top of a traditional filesystem.
You might want to set up a metadata server with descriptions of your documents. Think of a library card catalog, the basic equivalent of a metadata server. The ESRI Geoportal will do everything you want and it is FOSS.
I agree that Zotero is awesome. I just wish it wasn't tied so closely to Firefox.
There is an alpha standalone version of Zotero now, but that's not what I really want. What I want is one that works with all browsers and shares a common database.
But my major gripe with Zotero is that it does not work with Pages.app. The Zotero folks seem to get inexplicably indignant when asked why they don't support footnotes in Pages.app (other than the useless RTF conversion method). They say it's because there is no public API, but any doofus can examine the XML and it's obvious how to extend it from a few minutes' contemplation. It would be perfectly simple to add new tags that contained the footnotes. I've actually tested that for other reasons, so I know it can be done from a syntactic point of view. It's the integration of that into Zotero that is beyond me. I wish they didn't have a Stallmanesque purity test for their extensions. After all, they support MS Word.
So I struggle on with the lameness of EndNote. And Zotero goes unused despite being a better approach.
Some drink at the fountain of knowledge. Others just gargle.
Wuala is a really great client for this, and also comes with other nice-to-haves you may not have considered such as sharing, security and previous version access.
Linux / Windows / Mac / Android / iPhone / Web (Java Applet)
Cloud storage
Sync Locally / Backup jobs
Secure
-Encrypted and your PW never leaves your computer
-Files are in cloud, unless synced, so logout and files are inaccessible
-Files are chunked, meaning the assembled file does not exist anywhere
Many ways to get more storage space (some free)
Share files with friends, groups, or public web access
Previous version access
File tagging
File comments
File search
No filesize limit
Explorer integration
I've been evaluating Amazon Cloud Drive and it's very much like Google Docs but a bit more generic file-wise. That may be an option for you.
If you insist on keeping your notes (think first about how often you will look at them), then the above tools may work for you.
(recoll is based on xapian and works very well if you have big static archives which you don't need to index often.)
Save yourself - go on a date.
Why not use OneNote with either a local folder, a personal server or even a cloud-based Skydrive? It will gobble up everything you have and allow you to search, organize, etc.
Ken
Weird, certainly. You could sort, search and store documents in the way described using Gopher or WAIS long before HTML was even invented.
You might also want to look at an xml normalizing tool like Xena - automagically converts all your docs, files, etc into open formats whose content can be searched by open tools.
I am a researcher. I want to add my vote for "file system." The less interaction you do with most of this material, the better off you'll be. For me, important or useful material goes into a reference manager. Those files get tagged in the reference manager. At this point in my career--only four years in--that's just under 600 articles with accompanying pages of notes. Other stuff goes into folders based on broad categories. I don't do any tagging on these because find-by-content always does the job just fine. Avoid the extra work. You're not paid to be a secretary. And most of the organizing won't pay off, will become an end in itself.
Thanks, it's good to be aware of the payoff of pre-organising things (a la putting in frameworks in programs that will never need it) before embarking on something like this. You also reminded me of something I wanted to put in my follow up: The reason I wanted multiple categorisations or tags is that some things do fall under more than one category. I'm a medical student, so I'll give you an example: Malaria. Does that come under:
I'm not sure what your research area is, but you must have come across something similar. I've (very briefly) worked in the area of antioxidants, and even there it would have been useful to pull out, say, everything on copper chelators in rat models, clinical trials of antioxidants and so forth. Basically, I'm concerned that while a filesystem-based approach is good for retrieving one or a few results for a specific query (ie where was that paper by Doe et al from 2007 on the effects of foo in vivo), it is less good at including things that fall under multiple categories (eg malaria, heart disease). Symlinks go some way towards alleviating this, but they take significantly more effort than tagging and I'm concerned about cross-platform symlink handling.
Of course, you may be right in that I may be putting *too* much thought into this, but then again I'm a medical student and we worry about not knowing things fully ;-)
(OT but ordered/unordered lists aren't showing up for me in preview. Weird.)
... quit dicking around trying to find software and start studying. Use whatever break you have before your classes start again to get your organization system in shape. Frankly, you're about nine months late on this quest and your faith in computers and your own tagging abilities are, if not misguided, pathetically touching. Good luck with your medical career.
That is all.
Have you looked at http://www.razuna.org/ http://www.opensourcedigitalassetmanagement.org/
+1 iusethistoo
Evernote has clients for OSX, IOS, Android, Blackberry, Windows Mobile, Palm, and the web. OCRs images you put in it. Syncs between your devices transparently. All clients are free but there's a max of like 60M/mo transfer; if you want to exceed that it's $45/year.
I haven't put alllll my data in it and probably never will but for reference material and general notes? It's kinda where everything goes now.
egypt urnash minimal art.
Anybody ever heard of Foldera? It sounded like it was going to be great (collaborative workspace software-as-a-service) until they never actually made a public release, oops :)
simple, fast homepage with your links: http://www.ngumbi.com/
This isn't your ideal solution—as you've noted, DTP is currently OS X only—but it does work pretty well for me, especially when I'm thinking about a general topic and need to find information on it. I even wrote a post about the similarities between Joyce's method of composition / finding material and how Johnson uses DTP.
Tracker ( http://projects.gnome.org/tracker/index.html ) is relevant to your needs. It stores relational information about your data. So for example, if you have an id3 tagged mp3 album, it will store the artist as an entity x, and the album as an entity y, and it will store that each of the mp3 files are a part of y, by x. In the same way, it stores authors and publishers as entities for pdf's. You can address this data store in the ordinary `desktop search' way (there are some search GUI's available), but more interestingly, you can use sparql to query the data store (allowing you for example to ask for all documents dated before 2000, by an author whose full name contains "Knuth").
Tracker comes with some data miners that crawl your data (e.g. using full-text search and metadata extractors) to build up the data store, but in your case it might make sense to enter your organization of the data into the data store yourself (using RDF). This would allow you to use the Tracker infrastructure to access your data afterwards.
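The entity-and-relation model described above can be illustrated with a tiny in-memory triple store in Python. To be clear, this is not Tracker and not SPARQL: the predicate names (`fullName`, `creator`, `created`) and the sample data are invented for the sketch, and Tracker's real ontology uses its own vocabulary. It only shows how the "documents before 2000 by an author whose name contains Knuth" question becomes a join over (subject, predicate, object) facts.

```python
from datetime import date

# Made-up sample facts as (subject, predicate, object) triples.
TRIPLES = {
    ("author:1", "fullName", "Donald E. Knuth"),
    ("author:2", "fullName", "Jane Doe"),
    ("doc:old-paper.pdf", "creator", "author:1"),
    ("doc:old-paper.pdf", "created", date(1974, 6, 1)),
    ("doc:new-paper.pdf", "creator", "author:1"),
    ("doc:new-paper.pdf", "created", date(2005, 3, 15)),
    ("doc:other.pdf", "creator", "author:2"),
    ("doc:other.pdf", "created", date(1990, 1, 1)),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


def documents_before_by(triples, year, name_fragment):
    """Documents created before `year` by an author whose name contains the fragment."""
    authors = {
        s for s, _, name in match(triples, predicate="fullName")
        if name_fragment in name
    }
    docs = set()
    for doc, _, author in match(triples, predicate="creator"):
        if author not in authors:
            continue
        for _, _, created in match(triples, subject=doc, predicate="created"):
            if created.year < year:
                docs.add(doc)
    return docs


if __name__ == "__main__":
    print(documents_before_by(TRIPLES, 2000, "Knuth"))
```

Entering your own organization "using RDF", as suggested, amounts to adding more facts of this shape, for example a hypothetical `("doc:old-paper.pdf", "topic", "algorithms")`.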
I'm not saying it does *today* - but isn't this a really common use case for a typical Linux user - geek or not?
you had me at #!
The reason I wanted multiple categorisations or tags is that some things do fall under more than one category. I'm a medical student, so I'll give you an example: Malaria. Does that come under:
I'm a linguist, not in medicine, but linguistics presents similar challenges of multiple categorization. One thing I've quite liked about The Brain is the ability to create both hierarchical (A contains B, C is contained by B) and sideways relationships (D is related to E, but I don't have to say how, or I can notate the link to describe the relation). These can also be redone on the fly, and this flexibility is wonderful when reorganizing things as you learn more about your problem space.
The Brain also allows many-to-one or many-to-many relationships, something that I don't think FreeMind allows. Using your example of malaria, it would be quite easy to create multiple higher-level topics/nodes/ideas such as "parasitic diseases" / "tropical medicine" / etc. and then have the "malaria" node be a child of all of the relevant ones.
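For what it's worth, the many-parents idea is easy to model outside any particular tool. Below is a rough Python sketch of a topic graph where a node may have several parents; the class and method names are made up, and this says nothing about how The Brain stores things internally.

```python
from collections import defaultdict


class TopicGraph:
    """Topics where any node may have several parents (unlike a strict tree)."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def add_child(self, parent, child):
        """Record that `child` belongs under `parent`; repeat for more parents."""
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def descendants(self, topic):
        """Return every topic reachable below `topic`."""
        seen = set()
        stack = [topic]
        while stack:
            node = stack.pop()
            for child in self.children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


if __name__ == "__main__":
    graph = TopicGraph()
    graph.add_child("parasitic diseases", "malaria")
    graph.add_child("tropical medicine", "malaria")
    graph.add_child("malaria", "antimalarial drugs")
    print(sorted(graph.parents["malaria"]))
    print(sorted(graph.descendants("tropical medicine")))
```

Browsing from either "parasitic diseases" or "tropical medicine" reaches the same "malaria" node, which is the behaviour a single-parent tree cannot give you.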
A number of other mind mapping tools I've looked at in the past enforce a single-parent tree model that doesn't fit various kinds of semantic relationships very well, whereas the relations available in The Brain and the ability to choose any arbitrary node as the "root" node for viewing the model, or even the "root" node for overall organization, allows for a more organic web of association. I seem to recall that you can also tie two independent mind maps together, but I haven't tried that to date.
NB: I haven't bought The Brain just yet, but the demo impressed me, and this discussion has reminded me, so I think I will later today...
HTH,
"What in the name of Fats Waller is that?"
"A four-foot prune."
BAAAWWWW Everything should be free. Everything should run on antique equipment. Bawwwww
VUE - from http://vue.tufts.edu/ might be a helpful mix of directory tree and mindmap. It allows for content tagging and linking (local or on the Internet). Search by hierarchical (or not) ontologies is possible.
Get yourself a copy of Microsoft's OneNote. It is exactly what you want - easy to use, allows for searching within documents, and allows for mixed media storage - voice, documents, screen shots, handwriting, all mix freely. It has basic OCR and voice recognition skills that help it search through non-typed material, though sometimes it's a bit hit or miss here.
I use Evernote, and so do a lot of my med students. It is cross platform, the free version is quite functional and stores PDFs, rich text and graphics. It is searchable and shareable.
Evernote does not work on Linux, so "multi-platform" might be more correct than "cross-platform."
Via RDBMS, I've done similar things with info I had to use extensively. I parsed the files, tracked the paths, classified sections (as best a machine could), added category tagging, word indexing, etc. I'm not called "Tablizer" without a reason.
Note that it was for personal use, but could be extended to others with a fancier UI and less abbreviations.
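A stripped-down version of the path-tracking and category-tagging part of this might look like the following, using Python's built-in sqlite3 module. The schema, table names and helper functions are assumptions made for the example, not the poster's actual design, and the parsing, section classification and word indexing are left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE file_tags (
    file_id INTEGER NOT NULL REFERENCES files(id),
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def open_db(path=":memory:"):
    """Create the tables in a new database (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def tag_file(conn, path, *tag_names):
    """Attach any number of tags to a file path; the file itself is untouched."""
    conn.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    file_id = conn.execute(
        "SELECT id FROM files WHERE path = ?", (path,)
    ).fetchone()[0]
    for name in tag_names:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO file_tags (file_id, tag_id) VALUES (?, ?)",
            (file_id, tag_id),
        )
    conn.commit()


def files_with_all_tags(conn, *tag_names):
    """Return paths of files that carry every one of the given tags."""
    wanted = sorted(set(tag_names))
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    rows = conn.execute(
        f"""SELECT f.path FROM files f
            JOIN file_tags ft ON ft.file_id = f.id
            JOIN tags t ON t.id = ft.tag_id
            WHERE t.name IN ({placeholders})
            GROUP BY f.id
            HAVING COUNT(DISTINCT t.id) = ?
            ORDER BY f.path""",
        (*wanted, len(wanted)),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_db()
    tag_file(conn, "notes/malaria.txt", "malaria", "parasitology", "tropical medicine")
    tag_file(conn, "papers/doe2007.pdf", "malaria", "clinical trials")
    print(files_with_all_tags(conn, "malaria"))
    print(files_with_all_tags(conn, "malaria", "parasitology"))
```

Because the database only stores paths, the underlying files stay where they are, which was one of the original requirements; the trade-off is that moved or renamed files need their rows updated.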
Table-ized A.I.
I think it satisfies all your other requirements, though
library scientist
Eh????
In case it wasn't sarcasm but genuine surprise, yes, there are such beasts: http://en.wikipedia.org/wiki/Library_science
You might consider exploring http://www.knowledgetree.com/ (commercial open source) or another DMS like the following:
Alfresco - Open Source Enterprise Content Management (CMS) ... - alfresco.com
OpenKM - powerful, easy to use, web-based scalable electronic ... - openkm.com
Epiware - Document Management Solutions for Everyone. A powerful ... - epiware.com
Document Management Software - Your Search for Document ... - ademero.com
http://www.opendocman.com/ (PHP)
http://en.wikipedia.org/wiki/Category:Document_management_systems
James Fallows from The Atlantic magazine writes periodically about these kinds of software programs, everything from the ancient Lotus Agenda to many of the programs mentioned above. One very powerful tool that appears to be missing from my skimming through the posts is Zoot! (only on Windows) - soon to be Zoot-XT in July: http://www.zootsoftware.com/index.html
KDE's Dolphin file manager, coupled with Akonadi and Strigi (built-in, and seamlessly integrated) does everything that you are asking for. It runs best on Linux, but there are KDE Windows and Mac ports. Of course, that means that you must install all of KDE on that Winbox or Mac.
Note that in the past there was much criticism of Akonadi due to resource usage, but that has been taken care of for at least two major version numbers (KDE 4.5 and 4.6). Let us know how it works out for you, you are really going to enjoy the tools that KDE and Dolphin offer for file management and organization.
It is dangerous to be right when the government is wrong.
I've only started playing with it, so not sure it would suit your needs, but it is free and cross platform. From their site -
"About Compendium
Many people use Compendium to manage their personal digital information resources, since you can drag+drop in any document, website, email, image, etc, organise them visually, and then connect ideas, arguments and decisions to these. Compendium thus becomes the 'glue' that allows you to pool and make sense of disparate material that would otherwise remain fragmented in different software applications. You can assign your own keyword 'tags' to these elements (icons), create your own palettes of icons that have special meanings, overlay maps on top of background images, and place/edit a given icon in many different places at once: things don't always fit neatly into just one box in real life."
http://compendium.open.ac.uk/institute/about.htm
Have a look at www.dmoz.org, the so-called Open Directory. Not that I suggest you post your notes on that site, but it is a good example of how a directories-and-links homebrew solution could work. You could even have the actual notes sitting outside the index tree.
What I'd like on top of this is a mechanism to track files as they get moved around in the file system, which does happen occasionally. Also to keep track of their copyright and confidentiality status, so I can avoid releasing that which shouldn't be.
Comparison of reference management software
http://en.wikipedia.org/wiki/Comparison_of_reference_management_software
Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
In case it wasn't sarcasm but genuine surprise
Having been to College and seen it in the course book, it was sarcasm.
Still doesn't explain what the Science is in Library Science.
I've only had time to skim a lot of these comments, so forgive any redundancy, but one of the first questions to ask is: Will you really spend time tagging or organizing things as you add them? Many people think they will, but then don't follow through. Perhaps make sure you have full-text indexing and search - it is costly to implement at times, but might automate some of the work of getting the stuff ready to search.
Evernote does not work on Linux, so "multi-platform" might be more correct than "cross-platform."
There is an unofficial Evernote client for Linux called "Nevernote" (http://nevernote.sourceforge.net/). Haven't tried it, but apparently it's pretty faithful.
For the personal wiki thing I use my version of TiddlyWiki, FiddlyWiki, synced with Dropbox. The downside is no WYSIWYG yet, but it is HTML underneath, so it is cross platform and easy to get stuff out. Tags work great and Google Desktop picks it up, so it is searchable, unlike Zotero, which I also use, just not so much for content any more. http://way.net/FiddlyWiki