Ask Slashdot: Software To Organise a Heterogeneous Mix of Files?
BertieBaggio writes "I am a medical student at the end of an academic year trying to get my notes organised. I'm looking for a software document organisation system to organise a mix of text notes, journal articles, diagrams and scans. Ideally such a system would permit full-text and metadata search, multiple categorisations (eg tags), preserve the underlying files and be cross-platform (Linux/Windows/OS X). While I'm not averse to paying for such a complex solution, ideally the software would be FOSS so that extension or migration are possible if necessary. Desktop search (eg Google Desktop) probably does 90% of what I want apart from multiple categorisations, which is the feature I'm most interested in. Searching turned up a similar question over at 43folders which pointed me in the direction of Papers and DevonThink, but these are OS X only and seem to be aimed more at academic paper organisation. What recommendations does the Slashdot community have for categorising and organising a heterogeneous mix of files?"
Zotero - is awesome - Firefox plugin
I have a Mac and it's not the greatest OS, but I love the search. I search all my old emails and a horde of other documents all the time.
I'm sure other computers can do this just fine, but I was never satisfied with a desktop search implementation until OS X. And I used to be a search index consultant.
Take off every 'sig' !!
Sounds like you're describing a directory tree. Search with grep, or any similar utility. Put files in multiple categories (appropriately named directories) using ln. It's cross platform, timeless, and seems to do what you describe. I feel like I'm missing something though.
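For anyone who wants to try the directory-tree idea before committing to it, here is a minimal sketch in Python of the same approach: hard links stand in for `ln`, and a small text scan stands in for `grep`. The directory names, the sample note and the helper names (`file_under`, `grep`) are made up for illustration. It assumes everything lives on one filesystem, since hard links cannot cross filesystems, and it only searches plain text, so PDFs and scans would need something more.

```python
import os
import tempfile


def file_under(root, source, *categories):
    """Hard-link `source` into one directory per category under `root`."""
    links = []
    for category in categories:
        folder = os.path.join(root, category)
        os.makedirs(folder, exist_ok=True)
        target = os.path.join(folder, os.path.basename(source))
        if not os.path.exists(target):
            os.link(source, target)  # same effect as `ln source target`
        links.append(target)
    return links


def grep(root, needle):
    """Return sorted paths of files under `root` whose text contains `needle`."""
    hits = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            with open(path, errors="ignore") as handle:
                if needle.lower() in handle.read().lower():
                    hits.append(path)
    return sorted(hits)


# Hypothetical example data: one note filed under two categories.
root = tempfile.mkdtemp()
inbox = os.path.join(root, "inbox")
os.makedirs(inbox)
note = os.path.join(inbox, "beta_blockers.txt")
with open(note, "w") as handle:
    handle.write("Beta blockers reduce heart rate.\n")

links = file_under(root, note, "cardiology", "pharmacology")
hits = grep(root, "heart rate")
print(hits)
```

The search finds the note once per directory it is linked into, because every hard link is the same file under another name.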
Emacs Org-Mode. I've learned a little Emacs syntax just to use that package, after having been a Vim user for over 15 years.
Free Manning, jail Obama.
My sister has worked for various doctors over the years, and a doctor's office is a prime candidate for something that will organize information: forms, papers, x-rays (photos), scanned documents, etc.
Many times she has spoken to me of the failures of the information-organization software that they have tried. Some would reach a certain capacity and choke, others had terrible OCR, and so on. In fact she even asked me about building a better application to do that; but I was too busy trying to put food on the table to take on a large project that probably would not pay off for a couple of years at least.
If there is an application that does this well, I would like to know about it too. (One person has already mentioned Evernote, but that's a "cloud-based" application for some unknown reason, and I would have privacy concerns.)
What specific file types can't you load? I have used Sharepoint 2007 since it rolled out and you can load any file type you want. There are no restrictions. We have non Office documents loaded in document libraries as well as Office libraries.
Sharepoint would do what you are looking for but the advanced features will cost you.
I use Evernote, and so do a lot of my med students. It is cross platform, the free version is quite functional and stores PDFs, rich text and graphics. It is searchable and shareable.
I use Mendeley Desktop for this purpose. It integrates well with Microsoft Word, and provides easy citations and reference organization. It is FOSS, and works under Windows and Linux. http://www.mendeley.com/ It also has an iPhone app, but I've never used it, so I can't vouch for its usefulness.
OpenKM will do most of what you want. We've deployed it for clients who have been happy with it. It does not preserve the underlying filesystem, but you can upload a ZIP file of documents.
It's a Tomcat app. That used to mean heavy-duty; whether it still does depends on what kind of machine you're using.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
But, it can't search any of those files.
It has a search function, but it's almost completely useless. I can even put the exact file name I'm looking for and it won't even be in the top 10 results.
The only advantage Sharepoint has over a simple shared file directory is some crude revision control and the ability to create calendars.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Zotero is free. Where is the profit in marketing that?
I don't read your sig. Why are you reading mine?
Heh, the only way I've ever been able to tolerate having to use any version of Sharepoint is to open a document library in IE, and then click on some dropdown to change it to explorer view, and then create and right click on a folder and select explore in a new window. Then it opens up in File Explorer, where I bookmark/favorite it so I don't have to deal with the atrocious "information blackhole" Sharepoint web interface, and I can easily drag / drop / delete entire folders using the File Explorer interface, and the URLs I send to coworkers are a lot more sane-looking and consistent. (At least in older versions, Sharepoint URLs would seldom get the user to where they wanted to go (way to break the internet there!), leading to long entertaining prose as people attempted to describe how to "navigate" to some random place in Sharepoint.)
And maybe the search works better now, but I often couldn't find files amidst all the junk that shows up, even if I knew and specified their filenames.
Makes it much easier to use a local revision control thing too, I've lost work a few times trying to use Sharepoint's revision tracker doodad.
Indeed, I found Zotero extremely valuable to manage papers (especially because you can add them directly from the web site with a single click), but I don't use it for anything else (my LaTeX files are under version control and organized using traditional directories, my notes are mostly in Tomboy [except for those more complex which I do in LaTeX, and of course those I do on paper], my mails are on the mail server [which I don't access through the web interface if I don't need to], any self-written programs are of course also on disk and under version control, data produced by those programs also lives on disk with directory organization [but not under version control; a data file is not supposed to be changed after generation], ...)
The Tao of math: The numbers you can count are not the real numbers.
As a SharePoint admin for three years, I can definitely say, without any kind of reservation, that it is utter crap.
Now don't get me wrong, the idea of SharePoint is great. But it is badly designed (the users can't find any document they need) and badly implemented (losing data is unacceptable).
If you need a document management system, I advise everyone to use SharePoint for a few months and then switch to another system. You'll appreciate your new system so much more that way...
Many thanks for all the informative replies so far. I've had a quick glance at Evernote, thebrain, Nepomuk (I'm loving KDE4 so far after switching a week or so ago), OpenKM and FreeMind and these seem promising. I've still to look at emacs' org-mode, and when I do I will try to put my vi prejudices aside ;-) Some of the other suggestions are rather good but aren't really what I'm looking for as they are either fully cloud-based (eg Google Docs, Wave) or one platform only (eg Sharepoint) or too expensive (hire a secretary :-P).
I like some of the "roll your own" ideas, eg directories + hard links, serving from a web server or wiki. The problem is that as I progress through the medical degree, I am likely to have decreasing amounts of time to tinker with things if they have shortcomings; and to be honest they probably will as I am unlikely to have thought through the problem fully! Plus third-party solutions will definitely have substantially more polish than anything hastily dreamt up by me!
A shared wiki for my cohort / medical school / country may be an option on top of whatever comes out of this discussion, but I'd like something personal as... ah, let's say medical students have wildly varying standards of what is acceptable for notes ;-)
My supplementary questions for anyone still wanting to chip in:
Thanks again for the helpful replies, Slashdot. You continue to impress me - I doubt I would have gotten such a useful variety of responses elsewhere. I hope this discussion is useful to other folk looking for something similar.
If all you have is a grenade, pretty soon every problem looks like a foxhole -- MightyYar
I'm working on a Sharepoint add-on to allow paper documents to be processed in the same manner as electronic documents. In its current form it's a paper shredder with a Sharepoint logo taped to it.
Any insufficiently advanced magic is indistinguishable from technology.
I am also a Sharepoint server admin and I would never recommend it to anyone.
As has been said already, it has a lot of really good ideas, all executed terribly. Search is so important, yet Sharepoint is very bad at it. Yes, you can drag and drop a whole hierarchy of files to add them to Sharepoint, but woe to you if one of those files has a name that Sharepoint does not like: http://blogs.msdn.com/b/joelo/archive/2007/06/27/file-name-length-size-and-invalid-character-restrictions-and-recommendations.aspx
If you insist on keeping your notes (think first about how often you will actually look at them), then the above tools may work for you.
(recoll is based on xapian and works very well if you have big static archives which you don't need to index often.)
Save yourself - go on a date.
Weird, certainly. You could sort, search and store documents in the way described using Gopher or WAIS long before HTML was even invented.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
You might also want to look at an XML normalizing tool like Xena, which automagically converts all your docs, files, etc. into open formats whose content can be searched by open tools.
Interesting. We've run a pretty large SP farm (100,000+ users, 7M+ items in the search index) for about the same time, and I can't recall it actually losing any documents/items without direct deletion by a user. Ever.
Can't find something? Poor design of sites/libraries/lists.
For a simple at-home SP implementation, a couple of doc libraries with appropriate metadata would allow for easy searching.
Oh, and SP Foundation can run on Win7. Doesn't strictly need Server2008.
If you're well tied into the whole MS ecosystem, then SharePoint may be the way to go. If not, try somewhere else.
Have you looked at http://www.razuna.org/ or http://www.opensourcedigitalassetmanagement.org/ ?
This isn't your ideal solution—as you've noted, DTP is currently OS X only—but it does work pretty well for me, especially when I'm thinking about a general topic and need to find information on it. I even wrote a post about the similarities between Joyce's method of composition / finding material and how Johnson uses DTP.
Tracker ( http://projects.gnome.org/tracker/index.html ) is relevant to your needs. It stores relational information about your data. So for example, if you have an ID3-tagged MP3 album, it will store the artist as an entity x, and the album as an entity y, and it will store that each of the MP3 files is a part of y, by x. In the same way, it stores authors and publishers as entities for PDFs. You can address this data store in the ordinary `desktop search' way (there are some search GUIs available), but more interestingly, you can use SPARQL to query the data store (allowing you for example to ask for all documents dated before 2000, by an author whose full name contains "Knuth").
Tracker comes with some data miners that crawl your data (e.g. using full-text search and metadata extractors) to build up the data store, but in your case it might make sense to enter your organization of the data into the data store yourself (using RDF). This would allow you to use the Tracker infrastructure to access your data afterwards.
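To make the "entities and relations" idea concrete, here is a toy sketch in plain Python. It is not Tracker's API, its ontology or SPARQL; the triples, the predicate names (`fullName`, `creator`, `year`) and the file names are all invented sample data. It only shows the shape of the store and how the "before 2000, author contains Knuth" question becomes a join over relations.

```python
# Toy store of (subject, predicate, object) triples.
# Everything here is hypothetical sample data, not Tracker's real ontology.
triples = [
    ("author:1", "fullName", "Donald E. Knuth"),
    ("author:2", "fullName", "Some Other Author"),
    ("doc:old_paper.pdf", "creator", "author:1"),
    ("doc:old_paper.pdf", "year", 1997),
    ("doc:new_paper.pdf", "creator", "author:1"),
    ("doc:new_paper.pdf", "year", 2011),
    ("doc:unrelated.pdf", "creator", "author:2"),
    ("doc:unrelated.pdf", "year", 1986),
]


def objects(subject, predicate):
    """All objects stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def documents_before(year, name_fragment):
    """Documents dated before `year` whose author's full name contains `name_fragment`."""
    matches = []
    for doc, predicate, author in triples:
        if predicate != "creator":
            continue
        names = objects(author, "fullName")
        years = objects(doc, "year")
        if any(name_fragment in n for n in names) and any(y < year for y in years):
            matches.append(doc)
    return sorted(matches)


result = documents_before(2000, "Knuth")
print(result)
```

Because the author is an entity of its own rather than a string on each file, correcting the name in one place fixes it for every document that points at it.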
Via RDBMS, I've done similar things with info I had to use extensively. I parsed the files, tracked the paths, classified sections (as best a machine could), added category tagging, word indexing, etc. I'm not called "Tablizer" without a reason.
Note that it was for personal use, but could be extended to others with a fancier UI and fewer abbreviations.
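For the curious, a rough sketch of what the RDBMS route can look like, using SQLite from Python. This is not my actual schema; the table names, the `add_file` and `find` helpers and the sample rows are invented for illustration, and the "parser" is just a regex that splits text into lowercase words.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
    CREATE TABLE tags  (file_id INTEGER REFERENCES files(id), tag TEXT,
                        UNIQUE (file_id, tag));
    CREATE TABLE words (file_id INTEGER REFERENCES files(id), word TEXT,
                        UNIQUE (file_id, word));
""")


def add_file(path, text, tags):
    """Record a file's path, its category tags and a word index of its text."""
    file_id = db.execute("INSERT INTO files (path) VALUES (?)", (path,)).lastrowid
    db.executemany("INSERT OR IGNORE INTO tags VALUES (?, ?)",
                   [(file_id, t) for t in tags])
    words = set(re.findall(r"[a-z]+", text.lower()))
    db.executemany("INSERT OR IGNORE INTO words VALUES (?, ?)",
                   [(file_id, w) for w in words])
    return file_id


def find(word, tag):
    """Paths of files that contain `word` and carry `tag`."""
    rows = db.execute("""
        SELECT f.path FROM files f
        JOIN words w ON w.file_id = f.id
        JOIN tags  t ON t.file_id = f.id
        WHERE w.word = ? AND t.tag = ?
        ORDER BY f.path
    """, (word.lower(), tag))
    return [row[0] for row in rows]


# Hypothetical sample notes; a file may carry any number of tags.
add_file("notes/ecg.txt", "Atrial fibrillation on ECG", ["cardiology", "exam"])
add_file("papers/af_trial.txt", "Anticoagulation in atrial fibrillation", ["cardiology"])
add_file("notes/asthma.txt", "Inhaled steroids for asthma", ["respiratory"])

cardio_hits = find("fibrillation", "cardiology")
exam_hits = find("fibrillation", "exam")
print(cardio_hits, exam_hits)
```

The files themselves stay where they are on disk; the database only holds paths, tags and the index, so dropping it later loses nothing but the organization.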
Table-ized A.I.
You might consider exploring http://www.knowledgetree.com/ (commercial open source) or another DMS like the following:
Alfresco - Open Source Enterprise Content Management (CMS) ... - alfresco.com
OpenKM - powerful, easy to use, web-based scalable electronic ... - openkm.com
Epiware - Document Management Solutions for Everyone. A powerful ... - epiware.com
Document Management Software - Your Search for Document ... - ademero.com
http://www.opendocman.com/ (PHP)
http://en.wikipedia.org/wiki/Category:Document_management_systems
James Fallows from The Atlantic magazine writes periodically about these kinds of software programs, everything from the ancient Lotus Agenda to many of the programs mentioned above. One very powerful tool that appears to be missing from my skimming through the posts is Zoot! (only on Windows) - soon to be Zoot-XT in July: http://www.zootsoftware.com/index.html
KDE's Dolphin file manager, coupled with Akonadi and Strigi (built-in, and seamlessly integrated) does everything that you are asking for. It runs best on Linux, but there are KDE Windows and Mac ports. Of course, that means that you must install all of KDE on that Winbox or Mac.
Note that in the past there was much criticism of Akonadi due to resource usage, but that has been taken care of for at least two major version numbers (KDE 4.5 and 4.6). Let us know how it works out for you. You are really going to enjoy the tools that KDE and Dolphin offer for file management and organization.
It is dangerous to be right when the government is wrong.
Have a look at www.dmoz.org, the so-called Open Directory. Not that I suggest you post your notes on that site, but it is a good example of how a directories-and-links homebrew solution could work. You could even have the actual notes sitting outside the index tree.
What I'd like on top of this is a mechanism to track files as they get moved around in the file system, which does happen occasionally. Also to keep track of their copyright and confidentiality status, so I can avoid releasing that which shouldn't be.
Comparison of reference management software
http://en.wikipedia.org/wiki/Comparison_of_reference_management_software
Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
I've only had time to skim a lot of these comments, so forgive any redundancy, but one of the first questions to ask is: Will you really spend time tagging or organizing things as you add them? Many people think they will, but then don't follow through. Perhaps make sure you have full-text indexing and search - it is costly to implement at times, but might automate some of the work of getting the stuff ready to search.