Slashdot Mirror


Ask Slashdot: Software To Organise a Heterogeneous Mix of Files?

BertieBaggio writes "I am a medical student at the end of an academic year trying to get my notes organised. I'm looking for a software document organisation system to organise a mix of text notes, journal articles, diagrams and scans. Ideally such a system would permit full-text and metadata search, multiple categorisations (eg tags), preserve the underlying files and be cross-platform (Linux/Windows/OS X). While I'm not averse to paying for such a complex solution, ideally the software would be FOSS so that extension or migration are possible if necessary. Desktop search (eg Google Desktop) probably does 90% of what I want apart from multiple categorisations, which is the feature I'm most interested in. Searching turned up a similar question over at 43folders which pointed me in the direction of Papers and DevonThink, but these are OS X only and seem to be aimed more at academic paper organisation. What recommendations does the Slashdot community have for categorising and organising a heterogeneous mix of files?"

39 of 254 comments

  1. Quick Answer by Anonymous Coward · · Score: 5, Informative

    Zotero - is awesome - Firefox plugin

    1. Re:Quick Answer by dm42 · · Score: 2

      I really like Zotero for web research - and if everything comes from the web, it can be great. What I don't like is that, outside of Zotero itself, there's no really good way to access the files from other software (for example, opening Adobe Acrobat and searching through the tagged files in Zotero to find that PDF that I know I have). In order to access stuff, I need to go into Zotero as my file manager first. Then there are the backup issues, data migration issues, and what happens when my Zotero database is over 4 GB and can't be written to a DVD anymore? Or, what happens if Zotero disappears? Or what about corrupted databases?

      Don't get me wrong, Zotero is an EXCELLENT research tool - I've used it and I like it in many respects. But if I'm thinking about LONG TERM storage, resilience against corruption and future accessibility - I'm not sure it's the best tool for the job.

      What I would really like to see/have is a system that allows me to tag files within my filesystem either as I write out the file from my application (i.e., FILE-->Save FILENAME: Basefilename:tag1:tag2:tag3.ext ) or by renaming a file already within my filesystem.
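      A minimal sketch of how a name in the Basefilename:tag1:tag2:tag3.ext form could be split back into a plain filename and its tags. The function name split_tags and the sample filename are made up for illustration; this is not code from any existing tool. Note that ':' is not a legal filename character on Windows, so a cross-platform variant would need a different separator.

```python
import os


def split_tags(filename):
    """Split 'base:tag1:tag2.ext' into ('base.ext', ['tag1', 'tag2'])."""
    stem, ext = os.path.splitext(filename)
    # Everything before the first ':' is the base name, the rest are tags.
    base, *tags = stem.split(":")
    # Drop empty tags produced by doubled or trailing separators.
    return base + ext, [tag for tag in tags if tag]


if __name__ == "__main__":
    print(split_tags("cardiology-notes:heart:year2:exam.pdf"))
    # -> ('cardiology-notes.pdf', ['heart', 'year2', 'exam'])
```

      Keeping the tags in the name means they survive copies and backups without any database, at the cost of long filenames.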

      I USED to be in the tech sector as my livelihood and knew about some tools that might help... lo and behold, with a combination of FUSE, Perl, and SQLite, I was able to cobble together something for my purposes... It may or may not be what you're looking for. It's basically a tagging overlay filesystem written in Perl using FUSE and SQLite. The difference between this and others (it reuses some code and concepts from StratusFS) is that it is hierarchical. Your files are not all dumped into one big directory; they are tagged within the file system hierarchy underneath.

      Since I'm no longer in the tech sector, my dev time is limited and it certainly isn't a "PRODUCT" in the conventional sense. Rather, it's one of those things that "works for me but your mileage may vary". If you're interested, you can find the project at http://code.google.com/p/htaggingolfs/ . I use it to organize a bunch of stuff. It ain't perfect but is a good start - or at least "proof of concept." If any dev types are interested in contributing... drop me a line.

    2. Re:Quick Answer by spectro · · Score: 2

      dude, you make some good points but DVDs have been obsolete for like 10 years. A 32 GB thumb drive goes for like 30 bucks these days.

      --
      HTML is obsolete. It's time for a new, simpler and richer markup language.
  2. OS X - Plain old search by FormOfActionBanana · · Score: 2, Interesting

    I have a Mac and it's not the greatest OS, but I love the search. I search all my old emails and a horde of other documents all the time.

    I'm sure other computers can do this just fine, but I was never satisfied with a desktop search implementation until OS X. And I used to be a search index consultant.

    --
    Take off every 'sig' !!
  3. an age old solution by Akatosh · · Score: 3

    Sounds like you're describing a directory tree. Search with grep, or any similar utility. Put files in multiple categories (appropriately named directories) using ln. It's cross platform, timeless, and seems to do what you describe. I feel like I'm missing something though.
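    A rough sketch of the directory-plus-links approach, assuming a filesystem that supports hard links. The helper names categorise and search, and the sample file and category names, are hypothetical; os.link plays the role of ln and the second function is a very crude stand-in for grep.

```python
import os
import tempfile


def categorise(source, root, categories):
    """Hard-link `source` into root/<category>/ for each category."""
    links = []
    for category in categories:
        target_dir = os.path.join(root, category)
        os.makedirs(target_dir, exist_ok=True)
        link = os.path.join(target_dir, os.path.basename(source))
        if not os.path.exists(link):
            os.link(source, link)  # same effect as `ln source link`
        links.append(link)
    return links


def search(root, needle):
    """Return paths of files under `root` whose text contains `needle`."""
    hits = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, errors="ignore") as handle:
                if needle in handle.read():
                    hits.add(path)
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        note = os.path.join(root, "ecg.txt")
        with open(note, "w") as handle:
            handle.write("atrial fibrillation")
        categorise(note, root, ["cardiology", "exam"])
        for path in search(root, "fibrillation"):
            print(path)
```

    Every link shows up as a separate search hit, so a real script would probably want to de-duplicate results by inode.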

    1. Re:an age old solution by oh_my_080980980 · · Score: 4, Insightful

      Because that's not a proper document management solution. Directories/Folders are not a substitute for documents tagged with metadata. Not to mention you can't create views.

    2. Re:an age old solution by Hatta · · Score: 2

      Directories/Folders are not a substitute for documents tagged with metadata

      Why not? It works for me. It's pretty easy to have a script parse, e.g., your MP3s' ID3 tags and link them to the appropriate directory.

      Not to mention you can't create views

      That's what 'find' is for.

      --
      Give me Classic Slashdot or give me death!
  4. Emacs Org-Mode by he-sk · · Score: 3, Interesting

    Emacs Org-Mode. I've learned a little Emacs syntax just to use that package after being a Vim user for over 15 years.

    --
    Free Manning, jail Obama.
    1. Re:Emacs Org-Mode by complex_pi · · Score: 2

      Emacs Org-Mode. I've learned a little Emacs syntax just to use that package after being a Vim user for over 15 years.

      A bit more: Org-mode lets you define text documents with smart headings and lists. You can insert links and equations, and store files attached to a heading. It is cross-platform and you can export your documents to, among other options, HTML or PDF via LaTeX. You can flag items as TODO or assign them a "done" time or a "todo" time.

    2. Re:Emacs Org-Mode by he-sk · · Score: 2

      Org-Mode files are plain text, but there is extensive support for inline tables/spreadsheets and images and even code blocks. For example, you can keep data from an experiment in a table, do some analysis for which one would normally use Excel, and then plot and display a graph based on the data in one document. That takes care of a lot of files right there, because many tasks can be incorporated into the org file organically.

      Another great feature is the Org-Agenda, which defaults to displaying date-based information (what is due or scheduled for today), but can be used to create filters (stored searches if you will) across any kind of data in all your org files.

      I guess it takes a certain hacker mentality, but the great thing about Org-Mode is that it allows you to organize your files and system organically to exactly suit your needs as you discover them. Of course, the major disadvantage is that you have to buy into using Emacs.

      --
      Free Manning, jail Obama.
  5. Does a good one actually exist? by Jane+Q.+Public · · Score: 2

    My sister has worked for various doctors over the years, and a doctor's office is a prime candidate for something that will organize information: forms, papers, x-rays (photos), scanned documents, etc.

    Many times she has spoken to me of the failures of the information-organization software that they have tried. Some would reach a certain capacity and choke, others had terrible OCR, and so on. In fact she even asked me about building a better application to do that; but I was too busy trying to put food on the table to take on a large project that probably would not pay off for a couple of years at least.

    If there is an application that does this well, I would like to know about it too. (One person has already mentioned Evernote, but that's a "cloud-based" application for some unknown reason, and I would have privacy concerns.)

  6. Re:Gunna hate this BUT by oh_my_080980980 · · Score: 2

    What specific file types can't you load? I have used Sharepoint 2007 since it rolled out and you can load any file type you want. There are no restrictions. We have non Office documents loaded in document libraries as well as Office libraries.

    Sharepoint would do what you are looking for but the advanced features will cost you.

  7. Evernote by dmr001 · · Score: 2

    I use Evernote, and so do a lot of my med students. It is cross platform, the free version is quite functional and stores PDFs, rich text and graphics. It is searchable and shareable.

  8. Mendeley Desktop by burningcpu · · Score: 2

    I use Mendeley Desktop for this purpose. It integrates well with Microsoft Word, and provides easy citations and reference organization. It is free to use (though not open source), and works under Windows and Linux. http://www.mendeley.com/ It also has an iPhone app, but I've never used it, so I can't vouch for its usefulness.

  9. OpenKM by bill_mcgonigle · · Score: 2

    OpenKM will do most of what you want. We've deployed it for clients who have been happy with it. It does not preserve the underlying filesystem, but you can upload a ZIP file of documents.

    It's a Tomcat app. That used to mean heavy-duty; whether it still does today depends on what kind of machine you're using.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  10. Re:Gunna hate this BUT by Shatrat · · Score: 2

    But, it can't search any of those files.
    It has a search function, but it's almost completely useless. I can even put the exact file name I'm looking for and it won't even be in the top 10 results.
    The only advantage Sharepoint has over a simple shared file directory is some crude revision control and the ability to create calendars.

    --
    09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
  11. Re:T-Ball Anyone? by mspohr · · Score: 2

    Zotero is free. Where is the profit in marketing that?

    --
    I don't read your sig. Why are you reading mine?
  12. Re:Gunna hate this BUT by rwa2 · · Score: 2

    Heh, the only way I've ever been able to tolerate having to use any version of Sharepoint is to open a document library in IE, and then click on some dropdown to change it to explorer view, and then create and right click on a folder and select explore in a new window. Then it opens up in File Explorer, where I bookmark/favorite it so I don't have to deal with the atrocious "information blackhole" Sharepoint web interface, and I can easily drag / drop / delete entire folders using the File Explorer interface, and the URLs I send to coworkers are a lot more sane-looking and consistent. (At least in older versions, Sharepoint URLs would seldom get the user to where they wanted to go (way to break the internet there!), leading to long entertaining prose as people attempted to describe how to "navigate" to some random place in Sharepoint.)

    And maybe the search works better now, but I often couldn't find files amidst all the junk that shows up, even if I knew and specified their filenames.

    Makes it much easier to use a local revision control thing too, I've lost work a few times trying to use Sharepoint's revision tracker doodad.

  13. Re:Zotero by maxwell+demon · · Score: 2

    Indeed, I found Zotero extremely valuable to manage papers (especially because you can add them directly from the web site with a single click), but I don't use it for anything else (my LaTeX files are under version control and organized using traditional directories, my notes are mostly in Tomboy [except for those more complex which I do in LaTeX, and of course those I do on paper], my mails are on the mail server [which I don't access through the web interface if I don't need to], any self-written programs are of course also on disk and under version control, data produced by those programs also lives on disk with directory organization [but not under version control; a data file is not supposed to be changed after generation], ...)

    --
    The Tao of math: The numbers you can count are not the real numbers.
  14. Re:Gunna hate this BUT by Anonymous Coward · · Score: 4, Interesting

    As a SharePoint admin for three years, I can definitely say, without any kind of reservation, that it is utter crap.

    Now don't get me wrong, the idea of SharePoint is great. But it is badly designed (the users can't find any document they need) and badly implemented (losing data is unacceptable).

    If you need a document management system, I advise everyone to use SharePoint for a few months and then switch to another system. You'll appreciate your new system so much more that way...

  15. OP here by BertieBaggio · · Score: 3, Interesting

    Many thanks for all the informative replies so far. I've had a quick glance at Evernote, thebrain, Nepomuk (I'm loving KDE4 so far after switching a week or so ago), OpenKM and FreeMind and these seem promising. I've still to look at emacs' org-mode, and when I do I will try to put my vi prejudices aside ;-) Some of the other suggestions are rather good but aren't really what I'm looking for as they are either fully cloud-based (eg Google Docs, Wave) or one platform only (eg Sharepoint) or too expensive (hire a secretary :-P).

    I like some of the "roll your own" ideas, eg directories + hard links, serving from a web server or wiki. The problem is as I progress through the medical degree, I am likely to have decreasing amounts of time to tinker with things if they have shortcomings; and to be honest they probably will as I am unlikely to have thought through the problem fully! Plus third-party solutions will definitely have substantially more polish than anything hastily dreamt up by me!

    A shared wiki for my cohort / medical school / country may be an option on top of whatever comes out of this discussion, but I'd like something personal as... ah, let's say medical students have wildly varying standards of what is acceptable for notes ;-)

    My supplementary questions for anyone still wanting to chip in:

    • Do Evernote and thebrain/personalbrain have an "offline" mode? That is, can I keep things nicely organised locally instead of having to upload to the cloud? Evernote seems to suggest this is the case but at a very cursory perusal both seem to stop short of actually saying "you can use this to organise the files on your hard drive".
    • Do any other docs / medical students want to chip in and say what they use? I suspect each solution will have practical considerations that may affect my decision or not. Sharing practical experience is very useful, even if it's just "I use Evernote and find it useful" as some have said above.
    • Similarly, does anyone have experience of Nepomuk? Any drawbacks?

    Thanks again for the helpful replies, Slashdot. You continue to impress me - I doubt I would have gotten such a useful variety of responses elsewhere. I hope this discussion is useful to other folk looking for something similar.

    --
    If all you have is a grenade, pretty soon every problem looks like a foxhole -- MightyYar
    1. Re:OP here by supercrisp · · Score: 3, Interesting

      I am a researcher. I want to add my vote for "file system." The less interaction you do with most of this material, the better off you'll be. For me, important or useful material goes into a reference manager. Those files get tagged in the reference manager. At this point in my career--only four years in--that's just under 600 articles with accompanying pages of notes. Other stuff goes into folders based on broad categories. I don't do any tagging on these because find-by-content always does the job just fine. Avoid the extra work. You're not paid to be a secretary. And most of the organizing won't pay off, will become an end in itself.

  16. Re:Gunna hate this BUT by jd2112 · · Score: 4, Interesting

    I'm working on a Sharepoint add-on to allow paper documents to be processed in the same manner as electronic documents. In its current form it's a paper shredder with a Sharepoint logo taped to it.

    --
    Any insufficiently advanced magic is indistinguishable from technology.
  17. Re:Gunna hate this BUT by mbenzi · · Score: 2

    I am also a Sharepoint server admin and I would never recommend it to anyone.

    As has been said already, it has a lot of really good ideas, all executed terribly. Search is so important, yet Sharepoint is very bad at it. Yes, you can drag and drop a whole hierarchy of files to add them to Sharepoint, but woe to you if one of those files has a name that Sharepoint does not like: http://blogs.msdn.com/b/joelo/archive/2007/06/27/file-name-length-size-and-invalid-character-restrictions-and-recommendations.aspx

  18. OCR,pdf annotations,text/html,pdflatex + recoll by drolli · · Score: 2

    If you insist on keeping your notes (think beforehand about how often you will actually look at them), then the above tools may work for you.

    (recoll is based on xapian and works very well if you have big static archives which you don't need to index often.)

  19. Shoebox 2.0 by peterofoz · · Score: 2

    For class notes, the shoebox is the perfect bucket sort container for paper you'll probably never reference again. Works especially well for utility bills, bank statements. Mark it by year and possibly by broad topic so you know if you have to shred the contents or just toss it.

    Save yourself - go on a date.

  20. Re:In The Works by jd · · Score: 2

    Weird, certainly. You could sort, search and store documents in the way described using Gopher or WAIS long before HTML was even invented.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  21. Xena by spasm · · Score: 2

    You might also want to look at an xml normalizing tool like Xena - automagically converts all your docs, files, etc into open formats whose content can be searched by open tools.

  22. Re:Gunna hate this BUT by YrWrstNtmr · · Score: 2

    Interesting. We've run a pretty large SP farm (100,000+ users, 7M+ items in the search index) for about the same time, and I can't recall it actually losing any documents/items, without direct deletion by a user. Ever.
    Can't find something? Poor design of sites/libraries/lists.

    For a simple at-home SP implementation, a couple of doc libraries with appropriate metadata would allow for easy searching.

    Oh, and SP Foundation can run on Win7. Doesn't strictly need Server2008.
    If you're well tied into the whole MS ecosystem, then SharePoint may be the way to go. If not....try somewhere else.

  23. Two projects to look at by dbosman · · Score: 2
  24. How to use Devonthink Pro by ThousandStars · · Score: 2

    I don't have a perfect answer for you, but I can tell you that I use Devonthink Pro as described here by Steven Berlin Johnson. In addition, I have a large "random" folder that consists mostly of snippets of text found in articles on the Internet.

    This isn't your ideal solution—as you've noted, DTP is currently OS X only—but it does work pretty well for me, especially when I'm thinking about a general topic and need to find information on it. I even wrote a post about the similarities between Joyce's method of composition / finding material and how Johnson uses DTP.

  25. Tracker by mugurel · · Score: 2

    Tracker ( http://projects.gnome.org/tracker/index.html ) is relevant to your needs. It stores relational information about your data. So for example, if you have an ID3-tagged MP3 album, it will store the artist as an entity x, and the album as an entity y, and it will store that each of the MP3 files is a part of y, by x. In the same way, it stores authors and publishers as entities for PDFs. You can address this data store in the ordinary `desktop search' way (there are some search GUIs available), but more interestingly, you can use SPARQL to query the data store (allowing you, for example, to ask for all documents dated before 2000, by an author whose full name contains "Knuth").

    Tracker comes with some data miners that crawl your data (e.g. using full-text search and metadata extractors) to build up the data store, but in your case it might make sense to enter your organization of the data into the data store yourself (using RDF). This would allow you to use the Tracker infrastructure to access your data afterwards.

  26. Database by Tablizer · · Score: 2

    Via RDBMS, I've done similar things with info I had to use extensively. I parsed the files, tracked the paths, classified sections (as best a machine could), added category tagging, word indexing, etc. I'm not called "Tablizer" without a reason.

    Note that it was for personal use, but could be extended to others with a fancier UI and fewer abbreviations.
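    A minimal sketch of the tracked-paths-plus-category-tagging part of such a setup, using SQLite as a stand-in RDBMS. The table layout and the function names (open_catalogue, add_file, files_with_all) are assumptions made for illustration, not the schema described above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE tags (
    file_id INTEGER NOT NULL REFERENCES files(id),
    tag     TEXT NOT NULL,
    UNIQUE (file_id, tag)
);
"""


def open_catalogue(location=":memory:"):
    """Create a fresh catalogue database (in memory by default)."""
    db = sqlite3.connect(location)
    db.executescript(SCHEMA)
    return db


def add_file(db, path, tags):
    """Record a file path and attach any number of tags to it."""
    db.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    (file_id,) = db.execute(
        "SELECT id FROM files WHERE path = ?", (path,)
    ).fetchone()
    db.executemany(
        "INSERT OR IGNORE INTO tags (file_id, tag) VALUES (?, ?)",
        [(file_id, tag) for tag in tags],
    )
    db.commit()


def files_with_all(db, tags):
    """Return paths of files carrying every one of the given tags."""
    tags = sorted(set(tags))
    if not tags:
        return []
    marks = ",".join("?" * len(tags))
    rows = db.execute(
        "SELECT f.path FROM files f JOIN tags t ON t.file_id = f.id "
        f"WHERE t.tag IN ({marks}) "
        "GROUP BY f.id HAVING COUNT(DISTINCT t.tag) = ? "
        "ORDER BY f.path",
        (*tags, len(tags)),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_catalogue()
    add_file(db, "notes/ecg.txt", ["cardiology", "exam"])
    add_file(db, "papers/af.pdf", ["cardiology"])
    print(files_with_all(db, ["cardiology", "exam"]))
```

    The files stay where they are on disk; only paths and tags live in the database, so losing the database loses the categorisation but not the documents.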

  27. KT, Alfresco, OpenDocMan, various other DMSs by QuietRiot · · Score: 2

    You might consider exploring http://www.knowledgetree.com/ (commercial open source) or another DMS like the following:

    Alfresco - Open Source Enterprise Content Management (CMS) ... - alfresco.com
    OpenKM - powerful, easy to use, web-based scalable electronic ... - openkm.com
    Epiware - Document Management Solutions for Everyone. A powerful ... - epiware.com
    Document Management Software - Your Search for Document ... - ademero.com

    http://www.opendocman.com/ (PHP)

    http://en.wikipedia.org/wiki/Category:Document_management_systems

  28. Zoot on Windows by rhover · · Score: 2

    James Fallows from The Atlantic magazine writes periodically about these kinds of software programs, everything from the ancient Lotus Agenda to many of the programs mentioned above. One very powerful tool that appears to be missing from my skimming through the posts is Zoot! (only on Windows) - soon to be Zoot-XT in July: http://www.zootsoftware.com/index.html

  29. Dolphin by dotancohen · · Score: 2

    KDE's Dolphin file manager, coupled with Akonadi and Strigi (built-in, and seamlessly integrated) does everything that you are asking for. It runs best on Linux, but there are KDE Windows and Mac ports. Of course, that means that you must install all of KDE on that Winbox or Mac.

    Note that in the past there was much criticism of Akonadi due to resource usage, but that has been taken care of for at least two major version numbers (KDE 4.5 and 4.6). Let us know how it works out for you, you are really going to enjoy the tools that KDE and Dolphin offer for file management and organization.

    --
    It is dangerous to be right when the government is wrong.
  30. something like dmoz? by hendrikboom · · Score: 2

    Have a look at www.dmoz.org, the so-called Open Directory. Not that I suggest you post your notes on that site, but it is a good example of how a directories-and-links homebrew solution could work. You could even have the actual notes sitting outside the index tree.

    What I'd like on top of this is a mechanism to track files as they get moved around in the file system, which does happen occasionally. Also to keep track of their copyright and confidentiality status, so I can avoid releasing that which shouldn't be.
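    One possible way to follow files as they move, sketched under the assumption that a file's content stays unchanged while its path changes: take a snapshot of content hashes before and after, then compare. The function names (fingerprint, snapshot, moved) are hypothetical.

```python
import hashlib
import os
import tempfile


def fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map content hash -> path relative to `root` for every file."""
    seen = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            seen[fingerprint(path)] = os.path.relpath(path, root)
    return seen


def moved(before, after):
    """Return {old_path: new_path} for content found at a new path."""
    return {
        before[key]: after[key]
        for key in before
        if key in after and before[key] != after[key]
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "w") as handle:
            handle.write("renal physiology notes")
        before = snapshot(root)
        os.makedirs(os.path.join(root, "year2"))
        os.rename(os.path.join(root, "a.txt"),
                  os.path.join(root, "year2", "a.txt"))
        print(moved(before, snapshot(root)))
```

    Two limits of this sketch: files with identical content collapse into a single entry, and an edited file looks like a brand-new one. Copyright or confidentiality status could be stored against the same hash key.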

  31. Wikipedia, "reference management software" by OldHawk777 · · Score: 2
    --
    Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
  32. Storing/Indexing/Tagging vs. Searching/Finding by TheLoneGundam · · Score: 2

    I've only had time to skim a lot of these comments, so forgive any redundancy, but one of the first questions to ask is: Will you really spend time tagging or organizing things as you add them? Many people think they will, but then don't follow through. Perhaps make sure you have full-text indexing and search - it is costly to implement at times, but might automate some of the work of getting the stuff ready to search.
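    To make the full-text indexing idea concrete, here is a toy inverted index, assuming the document text has already been extracted into plain strings (which is the hard part for scans and PDFs). The names build_index and lookup and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index


def lookup(index, query):
    """Return names of documents that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        "ecg.txt": "Atrial fibrillation on ECG",
        "renal.txt": "Renal physiology and atrial natriuretic peptide",
    }
    index = build_index(docs)
    print(lookup(index, "atrial"))
    print(lookup(index, "atrial fibrillation"))
```

    Nothing here needs manual tagging, which is the point: the index is rebuilt from the files themselves whenever they change.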