Are Digital "Margin Notes" Possible Yet?
Stavo asks: "I'm looking for a robust, reliable personal knowledge management solution. As a professional researcher, I read a lot of text-based content. I prefer to mark up content, by underlining or adding margin notes. I also need to retrieve and search content. The low tech solution is printing the text and using a pen to mark up, then filing the papers. If I want to quote a source, I have to type the quote. With the advent of Tablet PCs and similar tech, I'd like to find a way to keep the content digital. In other words, if I download a journal article in PDF or HTML, how can I mark it up, save it, and later search/retrieve it? Shouldn't computers provide a better solution than voluminous file cabinets filled with dead trees?"
You're describing something that I have wanted to build ever since my advisor started handing me papers to read left, right and center. Unfortunately (or not, depending on how you look at it), I haven't had enough time to do more than think, "wow, what I really want is a database that can hold these papers, do some kind of semi-intelligent indexing, keep notes, and figure out what the BibTeX entry should be."
From the little bit of research that I've done, a lot of the pieces for this are already out there (i.e. APIs for manipulating PDFs, database engines, indexing engines, etc.) but I just haven't had the time to put any of it together. Anyways, if anyone does have an answer please let me know about it ;-)
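For what it's worth, the "database that holds papers, keeps notes, and spits out BibTeX" idea can be sketched in a few lines. This is only a rough illustration using Python's built-in sqlite3 module; the table layout and the function names (open_library, add_paper, add_note, search, bibtex) are made up for this example, and the "indexing" is a naive substring match rather than anything semi-intelligent.

```python
import sqlite3

def open_library(path=":memory:"):
    """Create (or open) a tiny paper database. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS papers (
            id INTEGER PRIMARY KEY,
            bibkey TEXT UNIQUE,
            author TEXT,
            title TEXT,
            year INTEGER,
            body_text TEXT
        );
        CREATE TABLE IF NOT EXISTS notes (
            id INTEGER PRIMARY KEY,
            paper_id INTEGER REFERENCES papers(id),
            quote TEXT,
            note TEXT
        );
    """)
    return db

def add_paper(db, bibkey, author, title, year, body_text):
    """Store one paper and return its row id."""
    cur = db.execute(
        "INSERT INTO papers (bibkey, author, title, year, body_text) "
        "VALUES (?, ?, ?, ?, ?)",
        (bibkey, author, title, year, body_text),
    )
    return cur.lastrowid

def add_note(db, paper_id, quote, note):
    """Attach a margin note (and the passage it refers to) to a paper."""
    db.execute(
        "INSERT INTO notes (paper_id, quote, note) VALUES (?, ?, ?)",
        (paper_id, quote, note),
    )

def search(db, term):
    """Return bibkeys of papers whose text, quotes, or notes mention term."""
    pattern = f"%{term}%"
    rows = db.execute(
        """
        SELECT DISTINCT p.bibkey
        FROM papers p LEFT JOIN notes n ON n.paper_id = p.id
        WHERE p.body_text LIKE ? OR n.quote LIKE ? OR n.note LIKE ?
        ORDER BY p.bibkey
        """,
        (pattern, pattern, pattern),
    )
    return [row[0] for row in rows]

def bibtex(db, bibkey):
    """Format a minimal @article entry from the stored metadata."""
    author, title, year = db.execute(
        "SELECT author, title, year FROM papers WHERE bibkey = ?", (bibkey,)
    ).fetchone()
    return (
        f"@article{{{bibkey},\n"
        f"  author = {{{author}}},\n"
        f"  title = {{{title}}},\n"
        f"  year = {{{year}}}\n"
        f"}}"
    )
```

Getting the text out of the PDFs and guessing the metadata are the hard parts, and this sketch does neither; it just assumes you already have them.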
In Adobe Acrobat with PDF files
"Eve of Destruction", it's not just for old hippies anymore...
Unless I'm missing something, the full version of Adobe Acrobat can do all that. Annotations in text, voice, file attachments, etc. and a file indexing service "Adobe Catalog". Any PostScript output can be turned into a PDF, there are even free tools to do this on Linux. But if you're using Macintosh or Windows, you can print directly to PDF format. Acrobat 5 can even render web pages into PDF format, preserving links. IIRC Adobe also has a fully functional time limited demo available.
Now, getting those dead-tree file cabinets into PDF format is another problem altogether. Possibly using overseas data-entry companies?
Yep, head on over to www.adobe.com and research.
use "Track Changes"
(oh crap! this is slashdot...wait a minute, don't use MS Word!)
It would be wonderful to have some open source standard whereby meta-information could be overlaid on HTML, using document position to 'anchor' your commentary in the context where you created it, so that you could keep your marked-up copy as such. It should automatically 'wrap' the appropriate citation info around any selected text, and then 'carry' that citation into its appropriate location in your source doc.
I've been a tech writer for years and entities from Sun to the local universities and utility companies all fail to implement systems of this sort for various reasons...often technical, more often political and financial.
I do believe it's possible, however...
STOP. You're being farmed.
Annotea is a W3C project. To quote from the site:
It provides annotation capabilities for HTML documents, and maybe XML documents, delivered in a web browser or similar UA.
Anonzilla is a project for providing Annotea capabilities for Mozilla. Check it out!
HTH
/mike
-- "So, what's the deal with Auntie Gerschwitz et all?"
Short answer: No
Longer answer: Nope
It's not enough to bash in heads, you've got to bash in minds. - Captain Hammer
It's an app used primarily in the legal industry. You can scan in hard copies or import text/doc files. Once the file's been imported into the system, you can highlight bits of text and do the things that you need to do. Used it a lot when I was a legal assistant, mostly for summarizing deposition or trial transcripts.
Check 'em out here at http://www.summation.com
-- anthony
is "marginalia."
...mosaic 1.0 and before?
US Citizen living abroad? Register to vote!
It's definitely possible; as others have said, Adobe Acrobat already does this. I happen to own a copy of Acrobat, so I've had the opportunity to play around with its capabilities.
Short answer -- it works pretty damn well. But not with a mouse. A mouse just isn't suited to making marginal notes (i.e., checking an important idea, underlining a particular phrase, or circling an important passage). A tablet device with a stylus, however - that holds promise.
Other things to note: Acrobat provides two types of commenting systems. First, notations -- you can highlight, underline, circle, or freestyle directly onto the document. Second, "sticky-note" style comments. One very cool thing about the sticky-notes is that they render translucent so that you can still read the text underneath the note.
Also, as far as I can tell, the commenting systems appear to be embedded into the document as PDF code. Specifically, gv is able to render notations (highlighting, underlines, etc). gv is not able to render the sticky-notes, however. I don't know if that's because gv simply can't handle the sticky-notes or because the sticky-notes are in some type of proprietary format. xpdf doesn't render either form of comments.
So, if you're using Windows, are comfortable with proprietary software, and can afford $250, you're more or less set (assuming that pen computing lives up to its promise).
Things get a bit more tricky if you're looking for free-software solutions. As far as I know, there's nothing out there as of yet. And I don't know how difficult it would be to implement (I do know that it's way beyond my capabilities, however). But because it appears that Acrobat embeds the comments as native PDF code, it should be possible. The question is whether or not anyone's willing to take up the cause...
Acrobat Reader can.
gv, ggv, and gsview cannot.
Come to think of it, the Open Source world has seriously dropped the ball in general when it comes to PDF documents. Open Source PDF viewers suck. In every single Open Source PDF viewer I've used, I've run into documents where the renderer has the orientation wrong -- and not just the orientation, but the "orientation of the bounding box" being different from the "orientation of the drawn data on the bounding box", so that the top and bottom of the drawn data are lopped off, and there's a ton of white space to the left and right.
The only Open Source PDF viewer I've used that can (gasp) search for text is gsview, and it's *really* flaky and doesn't highlight found text. Nothing like trying to read through a page of text to find the one word you're looking for.
I've never used an Open Source PDF viewer that can antialias embedded bitmap images, which makes things look awful and unreadable.
Finally, Acrobat Reader for Linux is completely awful, and leaks memory like a sieve. I have a friend with about a gig of RAM that Acrobat Reader sucked through in about six minutes of dragging and scrolling the document.
Since PDF-viewing is one of the major office activities (along with word processing and email), this is an enormous impediment to the use of Linux (or any UNIX) in a desktop environment.
It's extremely embarrassing to say something nice about Linux, have a friend use it, and then realize how truly much Linux software sucks at handling PDFs. "You mean I have to read through this thing manually instead of searching?" "Why does this print turned sideways? It works fine on Windows!" "Why does this look so bad?"
I predict Linux will not take off on the office desktop until (a) OpenOffice doesn't look and work completely differently from every app out there, and is free of cosmetic bugs *and* handles MS Office documents almost flawlessly, and (b) PDF viewing doesn't suck.
And for home use, (c) until the Linux sound architecture doesn't completely suck. Right now, the only way to obtain software mixing is through a dropout-prone, non-real-time-scheduled sound server with lousy latency. They usually don't share the sound device very nicely, either. Many sound systems can't do hardware mixing. Linux doesn't have a single way to do software mixing fallback, where a user out of hardware channels will automatically do real-time-scheduled software mixing. Pretty lame. Oh, and at least esd has truly awful resampling. Usually, when new users come to Linux, I hear "why is my sound dropping out when it doesn't on Windows", "why is there lag between something happening and a sound playing", "why does my sound sound so bad" (this when resampling is occurring), or "why can't I hear ICQ sounds when xmms is playing?"
May we never see th
is write on your screen with a dry erase marker, then wipe it off when you close the file. Or maybe... use the highlighting tool in your editor of choice. (Most WYSIWYG HTML editors will allow you to highlight text in some fashion: bold, change text color, change background color.)
There once was a plugin or tool you could download that allowed multiple people to annotate (more like put graffiti on) a web page.
Other people with the same tool could then view the annotations.
does this still exist?
comment directly in my journal
You might want to take a look at the excellent bibliography management software EndNote. It has a lot of functionality that might serve as a foundation for what you want to accomplish.
CMU's AUIS (or whatever they took to calling the expanded Andrew Tool Kit stuff) included a tool very similar to this, ~10 years ago. Students could use a word-processor-like program to write papers (or import them from other sources), then submit them on-line to a central server. Teachers/TAs/whatnot could read, edit, and markup (that is, either change the document or annotate it) on-line, and share the results with other teacher/TA/whatnots and/or the students. It was a pretty useful idea, used for a couple MIT classes, but the implementation was pretty flawed (the server crashed often, and was prone to losing files), so they stopped.
Come to think of it, that's pretty much the experience with AUIS/ATK in general. In significantly less-nice terms, a friend of mine once said:
like all CMU code: way cool design, implementation like wet camel shit.
At this point, I believe that AUIS is pretty much defunct, so I doubt anyone cares. The code is probably available under OSS license if anyone cares (I believe it was old-style BSD (with attribution)).
- Local storage of the document you wish to mark up
- A Uniform Resource Name for referring to said document; this URN must uniquely specify the document and not allow modifications (which would mangle or destroy the markup layers on top of it)
- A markup language which used the URN and relative markups, thus allowing multiple layers of markup
- A tool that understands and can write to all of the above
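A rough sketch of how those four pieces could fit together, with the caveat that everything here is my own assumption: the "urn:sha256:" prefix is not a registered URN namespace, the JSON layer format is invented for illustration, and the names document_urn, make_layer and apply_layer are hypothetical. The point is only that naming the document by a hash of its bytes gives you the "no modifications" property for free, and each layer can live in its own file.

```python
import hashlib
import json

def document_urn(content: bytes) -> str:
    """Name the document by its own bytes, so any edit yields a new name."""
    return "urn:sha256:" + hashlib.sha256(content).hexdigest()

def make_layer(content: bytes, author: str, marks: list) -> str:
    """Serialize one markup layer.

    marks is a list of (start, end, note) tuples, where start and end are
    character offsets into the UTF-8 decoded document text.
    """
    text = content.decode("utf-8")
    layer = {
        "target": document_urn(content),
        "author": author,
        "marks": [
            {"start": s, "end": e, "quote": text[s:e], "note": n}
            for s, e, n in marks
        ],
    }
    return json.dumps(layer, indent=2)

def apply_layer(content: bytes, layer_json: str) -> list:
    """Return (quoted text, note) pairs for a layer.

    Refuses to apply the layer if the local copy no longer matches the URN
    the layer was written against.
    """
    layer = json.loads(layer_json)
    if layer["target"] != document_urn(content):
        raise ValueError("document changed; layer does not apply")
    text = content.decode("utf-8")
    return [(text[m["start"]:m["end"]], m["note"]) for m in layer["marks"]]
```

Because a layer only refers to the document by name and offsets, any number of layers from different people can sit on top of the same stored copy without touching it.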
I'd like to see this soon, but knowing the nature of the internet, open source, etc... I'm not going to hold my breath. If you want to work on it with me, that would be great. My email address should be obvious. Using Acrobat is not an option; real markup of things on the web needs to be the goal.
You're not alone... I hope that's comforting.
--Mike--
Perhaps I am simply luckier, but I have never had xpdf get the bounding box wrong in either fashion you describe.
Regardless of luck, I search for text in xpdf without trouble* at least several times a week, for months.
Check it out, at freshmeat, for example.
* by `without trouble', I don't count xpdf's nearly overbearing ugliness as `trouble'. :-)
Take the md5 hash of a selected piece of text - the text you want to annotate. Since most document formats support inline comments, you then put a "marker" in the comment, and the md5 hash. At the end of the document, in another comment, put the annotation with the hash of the annotated text.
Then, all you need to do is search for the "markers" and match the md5 hash with the comment. If anyone fancies implementing this, give me a shout. I think it would be quite easy to do.
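To make the idea concrete, here is a minimal sketch for HTML, where the "inline comment" is an ordinary <!-- --> comment. The marker and note formats (mark:HASH and note:HASH:TEXT) and the function names annotate and read_annotations are invented for this example, it only annotates the first occurrence of the selected text, and it assumes the note itself never contains "-->".

```python
import hashlib
import re

# Hypothetical comment formats: a marker right after the annotated text,
# and a note at the end of the document carrying the same hash.
MARK_RE = re.compile(r"<!--mark:([0-9a-f]{32})-->")
NOTE_RE = re.compile(r"<!--note:([0-9a-f]{32}):(.*?)-->", re.S)

def annotate(doc: str, selection: str, note: str) -> str:
    """Insert a marker after the first occurrence of selection and append the note."""
    pos = doc.find(selection)
    if pos < 0:
        raise ValueError("selection not found in document")
    digest = hashlib.md5(selection.encode("utf-8")).hexdigest()
    end = pos + len(selection)
    marked = doc[:end] + f"<!--mark:{digest}-->" + doc[end:]
    return marked + f"\n<!--note:{digest}:{note}-->"

def read_annotations(doc: str) -> list:
    """Return (hash, note) pairs for every marker that has a matching note."""
    notes = dict(NOTE_RE.findall(doc))  # hash -> note text
    return [
        (m.group(1), notes[m.group(1)])
        for m in MARK_RE.finditer(doc)
        if m.group(1) in notes
    ]
```

One thing the sketch makes obvious: the hash alone tells you that a note belongs to a marker, but not how far back the annotated text extends, so a real implementation would probably want a start marker as well as an end marker.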
There's a short paper explaining this system at http://zesty.ca/crit/yee-crit-cscw2002-demo.pdf.
For articles that you scan from the printed world and store as scanned images, I would recommend something like PaperPort Deluxe. In addition to offering nice folder-based scanned image management and editing, it also allows annotating these scans via virtual sticky notes, text boxes, free-form drawing and highlighting, etc. All these annotations are stored in a separate layer but can be permanently "burned" into the underlying raster image at any time. In addition it also offers background full-text indexing (after OCR-ing on the fly) and searching. It's quite space efficient with scans, especially when scanning at 300 dpi lineart (which is most useful for archiving printed articles, since they can be printed again at decent quality), with the average magazine article page taking up only about 30-40KB as a compressed TIFF image.
Anyway, while it also lets you manage Word, Excel, PDF etc. files and web pages (and view them within its interface), unfortunately it won't let you annotate those. That would indeed be a very nice extra feature, maybe it should be suggested to ScanSoft. But still, scans of printed articles do make up a very substantial subset of research articles (my wife also does research and has to deal with this same issue), so PaperPort's features are still very useful. Plus it's a very inexpensive product, often included for free with $40 scanners.
Track changes (or more accurately the "new comment" feature) does work very well in the Office XP incarnation of Word. Unlike previous versions, the inserted note shows up on the side of the document just like a margin note, with a line pointing to the referenced text in the document. It does look like they were trying to emulate exactly the idea of margin notes.
I have used this extensively when reading other people's documents and sending back suggestions. The only limitation I've seen vs. real margin notes is that you can't control the size of the font for the margin note (unless there is a way I'm not aware of).
Here's a (not so great looking) screenshot of the comment feature in word.
Also, look into the new beta OneNote from Microsoft. I have not yet seen it; I just found it on the MS website while looking for a Word comment screenshot. It looks like it's geared toward the TabletPC, but I can't tell from my brief reading if it can annotate existing documents or if it's just a glorified notepad with its own file format.
Plenty of tools available now to do this, although not so many for the subset of software that most Slashdot readers use: open-source, written in C, and using fairly traditional and limited GUI toolkits.
.GIF, found on pre-PDF ubiquity on some old ZIP disks of mine.
*sigh*
As some have mentioned, you can do this with Adobe Acrobat. You can get it on Unix. And no, it is unfortunately non-Free. Call me crazy, but in my moral scheme, I place a much higher importance on reducing the amount I waste than on avoiding the occasional hunk of proprietary (although free) software. Killing something that was alive comes before the GPL. I know, I must be nuts.
That said, I luckily do not have to use proprietary software for doing annotation. I have a little tool written in Squeak Smalltalk for annotating documents. Namely, I can annotate HTML, PostScript and PDF right now. You can add text (less storage space) or a drawing. There's even a handy little button where you can enable and disable the annotation marks.
In PS and PDF, I cannot resave as a PS or PDF with the new layers, but I can save in a format I can later open up and read. I can also do a fresh export to GIF or PostScript (and could then use ps2pdf if I wanted to share as PDF).
The app in question would run on any platform (Squeak is actually cross-platform; don't equate this with Java), except that the current version makes some calls out to libraries in OS X, namely the AppKit. This isn't really absolutely necessary; with more work, it could be written to work with both the AppKit and GhostScript. Someone is making progress on a pure Squeak PDF renderer, so if that becomes even more usable soon, I could ditch the usage of Mac OS X's class library and just use that...
It can also annotate a "stack" of images (PNG, JPG, GIF), but you don't often come by documents in such way. However, it was super easy to add, so I did- and there are some docs I've come across in this format, e.g., a bunch of books where each page is a
And yes, this tool is completely open source and Free. I don't have it online for download, but it was such an easy thing to write, I assumed it was not something hard to come by. If people are interested, I could prepare it for such distribution...
For the PDA...
Also, the Newton can do it. Every eBook reader on the Newton I've used (PaperBack and Newt's Cape) can do annotation. Just tap the annotation button, and it interprets what you write as a drawing to annotate. To my knowledge, neither lets you do pure text annotation, which would be nice I guess, but it's better than nothing!
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
OK, first of all: I'm not all that familiar with this, but here goes anyway. There is a W3C project called Annotea. It is implemented in Amaya as annotations, which apparently can be stored on a remote server. It uses an RDF annotation schema, and the annotations are stored on a remote annotation server (see the annotation server howto).
When you have created an annotation for a piece of text, there is a pencil icon next to it. Click it and the annotation appears as a popup. It appears to be a very nice concept, but I've not used it much. I assume that the annotations could be presented inline in the document.
[Science] is one of the very few things that raises human life a little above farce and gives it the grace of tragedy.
Amaya has annotations built in.
While I'm not about to start using Amaya to surf the web and post to /., it's fun to play with things like the annotations. They can be stored on a remote server and shared, apparently, but I've not tried that yet. All in all, I think it looks just like what the poster was asking for.
I just attached some annotations to the parent post, to demonstrate how it works. Check 'em out either with mozilla, or Anonzilla.