Organizing and Analyzing Mounds of Research Text?
Andrew Green asks: "Four years ago, I stopped working on my Master's thesis in Social Anthropology. Now, I'm getting back to it again, and I find there's a _lot_ of text to deal with. I have a 350-page field diary, a dozen transcripts of recorded interviews, lists of books and articles, extensive notes about the books and articles, and the books and articles themselves in dead-tree or electronic format. I want to organize _all_ of it. I need to keep track of all the different text files on my computer (most are in MS formats, though I now use GNU/Linux), which ideally means keeping personalized sets of metadata about each file, linking files to other files and to entries in bibliography lists, and having some sort of version control. I'd also like to be able to do a free-text search on all texts on my local hard drive. And, most important of all, I need to build a hierarchical list of topics that are relevant to my thesis and relate specific sections of all these texts (not just whole files) to different topics. Any ideas? I know there are proprietary solutions out there, but I don't want to use them. What free applications can best deal with some or all of my needs? Would I be better off building something myself?"
Well, one I've played with is DEVONthink.
It's really cool and great for exactly what you describe: storing a bunch of loosely-connected information that you need to search and cluster into categories.
You just add your text and it will automatically classify it using semantic analysis.
But alas, it is for Mac only and is not Free.
If anybody knows about anything like this for Linux, and Free, I'd love to hear about it....
TWiki has many of the features you mentioned, and is a web-based app that you could publish with later. ;]
http://www.twiki.org/
"... I need to build a hierarchical list of topics that are relevant to my thesis and relate specific sections of all these texts"
"keeping personalized sets of metadata about each file, linking files to other files and to entries in bibliography lists, and having some sort of version control."
Each TWiki page can have custom searchable metadata in forms. Pages are linked to other pages by WikiWords. Version control and access lists are on every page.
"I'd also like to be able to do a free-text search"
Text searching is integrated. You can arrange TWiki pages into hierarchies by giving each page a parent topic, and there is automatic cross-referencing.
TWiki uses a pure text format with some simple markup, *bold* for example. HTML can be used as well.
I'd suggest you check it out. No database required.
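For anyone weighing the "build something myself" option, the core of the topic requirement (a hierarchy of topics, with sections of texts rather than whole files attached to them) can be sketched in a few lines of Python. This is only an illustration and has nothing to do with how TWiki stores its pages; the `Topic` and `Excerpt` classes, the file names, and the idea of identifying a section by a line range are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Excerpt:
    """A section of a text file, identified by a line range (hypothetical scheme)."""
    path: str
    first_line: int
    last_line: int


@dataclass
class Topic:
    """A node in the topic hierarchy, holding excerpts tagged with this topic."""
    name: str
    children: dict = field(default_factory=dict)
    excerpts: list = field(default_factory=list)

    def subtopic(self, name):
        """Return the child topic with this name, creating it if needed."""
        if name not in self.children:
            self.children[name] = Topic(name)
        return self.children[name]

    def tag(self, topic_path, excerpt):
        """Attach an excerpt to the topic at a path such as 'Kinship/Marriage'."""
        node = self
        for part in topic_path.split("/"):
            node = node.subtopic(part)
        node.excerpts.append(excerpt)

    def collect(self):
        """Return the excerpts of this topic and of all its descendants."""
        found = list(self.excerpts)
        for child in self.children.values():
            found.extend(child.collect())
        return found


# Hypothetical file names and line ranges, purely for illustration.
root = Topic("Thesis")
root.tag("Kinship/Marriage", Excerpt("field_diary.txt", 120, 134))
root.tag("Kinship", Excerpt("interview_03.txt", 40, 52))
root.tag("Economy/Markets", Excerpt("field_diary.txt", 300, 310))

# Everything filed under "Kinship", including its subtopics.
kinship = root.subtopic("Kinship").collect()
```

Asking a parent topic for its excerpts also pulls in everything filed under its subtopics, which is probably the behaviour you want from a hierarchy. Line ranges go stale as soon as a file is edited, so a real tool would need a sturdier way to anchor a section.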
Try Multicentrix. It goes beyond hierarchies and recognizes 5 basic types of relationships between information. Every piece of information is an object, and unlike other systems, the relationship between objects is an object! It has XML support and even allows a database backend.
Multicentrix is over $600 although there is apparently a Lite version for $72 - don't know how crippled that version is. Sounds like an interesting product, tho...
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
I can thoroughly recommend Circus Ponies Notebook. It's what I use for all my programming notes and other research notes.
It looks like a notebook. It works like an outliner. You can organize work page by page, or use long scrolling pages. It has dividers with tabs for different sections, and customizable page styles. It has highlighters and stickies for annotation. It sets up system-wide clipping services to let you pull snippets of information in from any application. And the best bit is that it automatically indexes everything you put in it at the back, builds a dynamic table of contents at the front, date-stamps every outline entry, and has super-fast search including search by highlighter color and search of stickies. It imports and exports XML and RTF.
But unfortunately for you, it's one of the many great reasons to get a Mac.
Here are some links to indexing and searching software. There is a lot of stuff oriented towards providing search functionality on web pages, but you may want something that just searches your local drive.
http://www.asksam.com/ You have probably come across this software -- it is not free and is Windows-based, but it does address many of your needs.
I ran into this little program called "Namazu", which is kind of a simple (web-accessible) search engine, and kind of a really neat version of "grep".
Basically, it can run full-text searches against text that you have on your local HD and build up indexes. Reasonably fast, from what I can tell. It also has support for DOC/PDF, through some interesting method. If it doesn't, look for "catdoc", which will let you (usually) get plain text from a Word .doc. Mmmmm, proprietary file formats.
apt-get install namazu catdoc, for those who are enlightened.
--Robert
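To make "build up indexes" a bit more concrete, here is a toy inverted index in Python. It is a sketch of the general idea only, not a description of how Namazu works internally; the function names and the sample documents are invented for the example, and a real setup would read files from disk (after converting .doc files to plain text with something like catdoc).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical in-memory documents standing in for files on disk.
documents = {
    "diary.txt": "Visited the market; talked about marriage customs.",
    "interview_01.txt": "She described the market prices and trade routes.",
    "notes.txt": "Reading notes on kinship and marriage.",
}

index = build_index(documents)
market_hits = search(index, "market")
both_hits = search(index, "marriage market")
```

The index is built once and each query is then just a few set intersections, which is why indexed search stays fast where running grep over every file would not.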
Pico is open, but not Free.
Try GNU Nano instead.
Just link nano to pico and never look back!
-Peter