Organizing and Analyzing Mounds of Research Text?

Posted by Cliff on Sunday June 15, 2003 @10:34AM from the putting-it-all-together dept.

Andrew Green asks: "Four years ago, I stopped working on my Master's thesis in Social Anthropology. Now, I'm getting back to it again, and I find there's a _lot_ of text to deal with. I have a 350-page field diary, a dozen transcripts of recorded interviews, lists of books and articles, extensive notes about the books and articles, and the books and articles themselves in dead-tree or electronic format. I want to organize _all_ of it. I need to keep track of all the different text files on my computer (most are in MS formats, though I now use GNU/Linux), which ideally means keeping personalized sets of metadata about each file, linking files to other files and to entries in bibliography lists, and having some sort of version control. I'd also like to be able to do a free-text search on all texts on my local hard drive. And, most important of all, I need to build a hierarchical list of topics that are relevant to my thesis and relate specific sections of all these texts (not just whole files) to different topics. Any ideas? I know there are proprietary solutions out there, but I don't want to use them. What free applications can best deal with some or all of my needs? Would I be better off building something myself?"

28 comments

working on it by ghostlibrary · 2003-06-15 10:41 · Score: 2, Interesting

I blended a UML tool with a tabbed editor to make a setup that (independent of where files are) lets you create arbitrary drawing/relationships between files, and then just click on the drawing to actually call up said file into the tabbed editor.

If others were interested, I'd happily post the diffs to do it. (The UML drawer is 'ldsdraw', which isn't under a license yet, so while it is openly available, distributing my 'fork' wouldn't be proper and I can only provide the 'diffs'. The author is open to licensing/sourceforging it, but that's awaiting an excess of free time.) The editor is also modded, but both sets of 'diffs' are really short. I love Tcl/Tk.

--
A.
the open source app for your needs: by Tumbleweed · 2003-06-15 10:42 · Score: 3, Funny

pico
1. Re:the open source app for your needs: by pete-classic · 2003-06-16 03:09 · Score: 2, Informative
  
  Pico is open, but not Free.
  
  Try GNU Nano instead.
  
  Just link nano to pico and never look back!
  
  -Peter
2. Re:the open source app for your needs: by cloudmaster · 2003-06-16 04:35 · Score: 1
  
  Just link nano to pico and never look back!
  
  Or, link vim to pico and let the users' confused looks entertain you. :)
3. Re:the open source app for your needs: by pete-classic · 2003-06-16 07:14 · Score: 2, Funny
  
  Amateur. If you want to generate confused looks you link ed to their favorite editor.
  
  A typical "ed" session usually goes like this:
  
  hello
  ?
  asdf
  ?
  quit
  ?
  ZZ
  ?
  !wq
  sh: wq: command not found
  !
  ^C
  ?
  
  and so on.
  
  (Stolen from http://entropy.brni-jhu.org/unix-benefits.html.)
  
  -Peter
Some things that can help by ObviousGuy · 2003-06-15 11:03 · Score: 1, Funny

You'll probably need at least one of these. And lots of these.

--
I have been pwned because my /. password was too easy to guess.
Dude. by dynoman7 · 2003-06-15 11:13 · Score: 3, Insightful

Four years ago, I stopped working on my Master's thesis in Social Anthropology. Now, I'm getting back to it again, and I find there's a _lot_ of text to deal with. I have a 350-page field diary, a dozen transcripts of recorded interviews, lists of books and articles, extensive notes about the books and articles, and the books and articles themselves in dead-tree or electronic format. I want to organize _all_ of it. I need to keep track of all the different text files on my computer (most are in MS formats, though I now use GNU/Linux), which ideally means keeping personalized sets of metadata about each file, linking files to other files and to entries in bibliography lists, and having some sort of version control.

Dude. Quit procrastinating and write the damn thing...your teachers have no clue who you are and it will be a surprise when you dump said "thesis" on their desks.

--
Blarf.
1. Re:Dude. by tengwar · 2003-06-15 12:21 · Score: 3, Insightful
  
  'Fraid I've got to agree with parent. If you know the stuff, a paper index is enough. If you don't, you've just got to read it until you've internalised it before you can draw any valid conclusions.
  If you insist on linking the documents, use plain hand-written HTML - I've done it before while getting in to a subject, but don't expect to need it after the first couple of weeks.
wait, FREE solutions? by Anonymous Coward · 2003-06-15 11:20 · Score: 3, Informative

well, one I've played with is:

DEVONthink

It's really cool and great for exactly what you describe: storing a bunch of loosely-connected information that you need to search and cluster into categories.

You just add your text and it will automatically classify it using semantic analysis.

But alas, it is for Mac only and is not Free.

If anybody knows about anything like this for Linux, and Free, I'd love to hear about it....
InfoSelect by falsification · 2003-06-15 11:22 · Score: 3, Funny

As an advocate of free software wherever possible, I can confidently recommend that you format your hard drive, install Windows, and then install InfoSelect. You will thank me later.
http://www.miclog.com/
(Or keep Linux and try to run InfoSelect with WINE. I don't know if that would work.)
Try TWiki by Demosthenex · 2003-06-15 12:11 · Score: 4, Informative

TWiki has many of the features you mentioned, and is a web based app that you could publish with later. ;]

http://www.twiki.org/

keeping personalized sets of metadata about each file, linking files to other files and to entries in bibliography lists, and having some sort of version control.

Each TWiki page can have custom searchable metadata in forms. Pages are linked to other pages by WikiWords. Version control and access lists are on every page.

I'd also like to be able to do a free-text search ... I need to build a hierarchical list of topics that are relevant to my thesis and relate specific sections of all these texts

Text searching is integrated. You can arrange TWiki pages into hierarchies with parent topic, and there is automatic crossreferencing.

TWiki uses a pure text format with some simple markup, *bold* for example. HTML can be used as well.

I'd suggest you check it out. No database required.
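
The parent-topic idea above can be sketched outside any wiki. The following is a minimal Python sketch, not TWiki's actual data model: the topic names, file names, and line ranges are all hypothetical. It assumes each topic records its parent and each excerpt (a line range in a file) is tagged with one topic, so that asking for a topic also returns excerpts filed under its subtopics.

```python
from collections import defaultdict

# Hypothetical data: each topic names its parent (None marks a root),
# loosely mirroring the "parent topic" idea described above.
PARENTS = {
    "Thesis": None,
    "Kinship": "Thesis",
    "Ritual": "Thesis",
    "Marriage": "Kinship",
}

# Hypothetical excerpts: (file name, first line, last line, topic).
EXCERPTS = [
    ("diary.txt", 120, 134, "Marriage"),
    ("interview03.txt", 10, 42, "Ritual"),
    ("diary.txt", 300, 310, "Kinship"),
]


def build_children(parents):
    """Invert the child -> parent map into parent -> sorted children."""
    children = defaultdict(list)
    for topic, parent in parents.items():
        children[parent].append(topic)
    for kids in children.values():
        kids.sort()
    return children


def excerpts_under(topic, parents, excerpts):
    """Return excerpts tagged with `topic` or any of its descendants."""
    children = build_children(parents)
    wanted, stack = set(), [topic]
    while stack:
        current = stack.pop()
        wanted.add(current)
        stack.extend(children.get(current, []))
    return [e for e in excerpts if e[3] in wanted]


def outline(parents, root=None, depth=0, children=None):
    """Render the hierarchy as a list of indented lines."""
    children = children or build_children(parents)
    lines = []
    for topic in children.get(root, []):
        lines.append("  " * depth + topic)
        lines.extend(outline(parents, topic, depth + 1, children))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(PARENTS)))
    for name, start, end, topic in excerpts_under("Kinship", PARENTS, EXCERPTS):
        print(f"{topic}: {name} lines {start}-{end}")
```

Storing only the parent link keeps each topic record small; the tree is rebuilt on demand, which is fine at thesis scale.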

Demo
If you want a real tool for organizing knowledge by zhiwenchong · 2003-06-15 12:19 · Score: 1, Informative

Try Multicentrix. It goes beyond hierarchies and recognizes 5 basic types of relationships between information. Every piece of information is an object, and unlike other systems, the relationship between objects is an object! More information here. Has XML support and even allows a database backend.
Re:If you want a real tool for organizing knowledg by Master of Transhuman · 2003-06-15 12:44 · Score: 2, Informative

Multicentrix is over $600 although there is apparently a Lite version for $72 - don't know how crippled that version is. Sounds like an interesting product, tho...

--
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
Circus Ponies! by metamatic · 2003-06-15 13:16 · Score: 2, Informative

I can thoroughly recommend Circus Ponies Notebook. It's what I use for all my programming notes and other research notes.

It looks like a notebook. It works like an outliner. You can organize work page by page, or use long scrolling pages. It has dividers with tabs for different sections, and customizable page styles. It has highlighters and stickies for annotation. It sets up system-wide clipping services to let you pull snippets of information in from any application. And the best bit is that it automatically indexes everything you put in it at the back, builds a dynamic table of contents at the front, date-stamps every outline entry, and has super-fast search including search by highlighter color and search of stickies. It imports and exports XML and RTF.

But unfortunately for you, it's one of the many great reasons to get a Mac.

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Nothing will think for you by RGRistroph · 2003-06-15 13:28 · Score: 3, Informative

Nothing will think for you. But if you are the type of person who studies by accumulating a pile of books, reading random pages and then looking up the interesting terms in the indexes of several other books, then you may be able to do that with electronic documents.

Here are some links to indexing and searching software. There is a lot of stuff oriented towards providing search functionality on web pages, but you may want something that just searches your local drive.

- MG (it is not necessary to buy the book just to use it).
- DesktopDig; nice graphical interface, I had trouble installing it.
- Clucene, a C++ version of Lucene. Stay away from Lucene, it's in Java.
some have tried asksam by Rory Drum · 2003-06-15 15:23 · Score: 2, Informative

http://www.asksam.com/ You have probably come across this software -- it is not free and is Windows-based, but it does address many of your needs.
3 ring binder anyone? by foniksonik · 2003-06-15 17:13 · Score: 4, Insightful

Well, it's not exactly open source or even software, but for your one-time purpose you might consider just categorizing your data into sections and putting it into one or more 3-ring binders, with subdividers with little tabs on them for easy lookup.

After that you will have a great start on digitizing the whole thing if you still feel like it later (presumably after you've actually gone ahead and written your paper).

Seriously, why take all the time to digitize this stuff when you're only going to use it once in your life as reference. When and if you get to publish it later you can spend the time inputting or maybe even hire someone to do it for you.

--
A fool throws a stone into a well and a thousand sages can not remove it.
Namazu? by Ramses0 · 2003-06-15 17:28 · Score: 2, Informative

I ran into this little program called "Namazu", which is kind of a simple (web accessible) search engine, and kind of a really neat version of "grep".

Basically can run full-text searches against text that you have on your local HD and build up indexes. Reasonably fast, from what I can tell. Also has support for DOC/PDF, through some interesting method. If it doesn't, look for "catdoc", which will let you (usually) get plain text from a word .doc. Mmmmm, proprietary file-formats.

apt-get install namazu catdoc, for those who are enlightened.

--Robert
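
For readers wondering what "build up indexes" means in practice, here is a minimal Python sketch of an inverted index, the general technique behind full-text search tools. It is not Namazu's implementation or file format; the function names and the sample documents are made up for illustration, and it assumes the texts have already been converted to plain text (for example with catdoc).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sample texts standing in for converted files.
    docs = {
        "diary.txt": "Met the elders; discussed marriage customs.",
        "interview01.txt": "She described the wedding feast and marriage rules.",
        "notes.txt": "Reading list for kinship theory.",
    }
    idx = build_index(docs)
    print(search(idx, "marriage"))
    print(search(idx, "marriage feast"))
```

The index is built once and each query is then a few set intersections, which is why an indexed search stays fast as the pile of text grows, whereas a plain scan rereads every file.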
Use Page Ranking ? by Ed Almos · 2003-06-15 22:09 · Score: 2, Interesting

A few days ago I seem to remember a Slashdot article on running page ranking systems on your own PC, something similar to Google.

Why not use a system similar to this to index all your information ?

Ed Almos

--
The more corrupt the state, the more numerous the laws. - Tacitus, 56-120 A.D.
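
As a rough illustration of the page-ranking idea, here is a minimal Python sketch of PageRank computed by repeated iteration over a small link graph. It is the textbook algorithm, not necessarily what the article the poster remembers described. The graph, the file names, and the damping value of 0.85 are assumptions; it presumes your documents link to or cite one another in some way you can extract.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores for a dict of node -> list of linked nodes."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share on each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: which notes refer to which other files.
    graph = {
        "notes.txt": ["diary.txt"],
        "interview01.txt": ["diary.txt"],
        "diary.txt": ["notes.txt"],
    }
    for name, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{score:.3f}  {name}")
```

Whether this helps depends on how densely the texts reference each other; with few links between files, the scores will say little.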
Or, AskSam by reallocate · 2003-06-16 00:08 · Score: 1

Along the same lines, AskSam is a Windows product that's been around for years. Not cheap, but has a good rep.

--
-- Slashdot: When Public Access TV Says "No"
problem with flat text? by Anonymous Coward · 2003-06-16 01:16 · Score: 0

How about vi and grep the way God intended?
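
For comparison, the grep approach fits in a few lines of Python. This is a minimal sketch of a recursive, case-insensitive search over plain-text files, assuming the documents have already been converted out of MS formats; the function name `grep_tree`, the `.txt` filter, and the `research` directory are hypothetical choices.

```python
import os
import re


def grep_tree(root, pattern, extensions=(".txt",)):
    """Yield (path, line number, line) for each matching line under root."""
    regex = re.compile(pattern, re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            # Skip files whose names do not end in one of the extensions.
            if not filename.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # "research" is a placeholder for wherever the text files live.
    for path, number, line in grep_tree("research", r"kinship"):
        print(f"{path}:{number}: {line}")
```

This rereads every file on every query, so it suits a few hundred files; beyond that, an indexing tool is the usual next step.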
Get it into your brain, not into your computer... by Anonymous Coward · 2003-06-16 04:17 · Score: 1, Insightful

Spend the time you'd otherwise spend scanning, acquiring software, in just reading the stuff. Underline, put stickies in to mark places, make up notecards, etc.

This ISN'T too much data to put in your head, and in your head is where it needs to be. Soak in it. Give your right brain a chance to have a crack at it. Then one day you'll wake up from a sound sleep at 3 a.m. in the morning and something will suddenly be clear that hadn't been clear before.

Admittedly, it will then take you about thirty minutes to find the supporting documentation (fifth pile from the left, half-an-inch down, with a lime-colored post-it on it; and that article from the journal with the buff cover). But if you spend the time computerizing, you'll be able to find everything--but have no idea what you need to find.

Computers do have their uses in "augmenting" human intelligence (Engelbart's term) but if you can't master the data you've collected yourself on a deep, intimate, _topographic_ level, computerizing it probably isn't going to help that much.

Just my $0.02.
What you need is.... by Ismene · 2003-06-16 04:48 · Score: 1

an archivist who would deal with all that for you. ;-)
However, that is an expensive route to follow. As for a hierarchical database system, I have yet to find a cheap one or "free" one, however, I have only been searching for one in the context of archival purposes, so that is probably the problem. I believe you could probably manipulate Inmagic's DB/Textworks to do what you want. But I believe it isn't a "hierarchical" database. Good luck!
Wimp. by Dthoma · 2003-06-17 05:26 · Score: 1

Real nerds use ed, grep and mkdir to organise and analyse text!

--
Note to M1-ers: a curt but otherwise insightful message is not "Flamebait" or "Troll".
DOS is the answer by boulding · 2003-06-17 07:05 · Score: 1

Actually two great old Lotus DOS apps would work:

- Magellan: would organize the text info in place through its multi-level viewer/indexing system. Easy to use. You can create "meta" files that link to other files using a special viewer, and do indexed concept-based searching.
- Agenda: would require text import/entry, but has amazing tools for cutting through it with different "views" and categories. Hard to learn, but powerful when you do.

These are getting harder to find nowadays... try a Google search or use eBay. Run in a DOS emulator under Linux ... no need to reformat!
one suggestion by g4dget · 2003-06-17 15:06 · Score: 1

Convert the Word stuff to HTML, PDF, or RTF, put it all into a directory tree, and index it with one of the open source intranet web search engines (htdig works OK for me).
Book on Apps for Anthropologists by Anonymous Coward · 2003-06-17 15:30 · Score: 0

http://www.amazon.com/exec/obidos/ASIN/0534171664/qid%3D1055906670/sr%3D11-1/ref%3Dsr%5F11%5F1/002-3941920-3501664

I looked at this while in grad school (around '96). Mostly stat analysis and kinship stuff as I remember. If I were you, I would look for some programs to assist you in classifying. I use POPFile (see SourceForge) to help me classify my emails, and have been thinking about using the Perl modules it is based on to help suggest categories to users of a doc management system I am building.
chip42@yahoo.com
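
To make the classification suggestion concrete, here is a minimal Python sketch of a naive Bayes text classifier, the kind of technique POPFile uses for sorting mail. It is not POPFile's code or the Perl modules mentioned above; the class name, the training snippets, and the category labels are all hypothetical, and the smoothing is plain add-one. It assumes at least one non-empty training text has been supplied before `classify` is called.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training texts
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def train(self, text, category):
        """Record the words of one training text under a category."""
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return a log-probability score per category (higher is better)."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for word in self.tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[category] = score
        return result

    def classify(self, text):
        """Return the category with the highest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("marriage cousin lineage descent", "kinship")
    nb.train("ceremony offering feast dance", "ritual")
    print(nb.classify("cousin marriage rules"))
```

A classifier like this only suggests a category; for a thesis the useful part is probably reviewing the suggestions it gets wrong, since those passages tend to sit between topics.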