Slashdot Mirror


How Do You Keep Track of Your Web-Based Research?

time961 asks: "I use the Web extensively to research a wide variety of topics (weird, huh?). However, much of the time I end up printing out web pages and filing them on paper, because that's the easiest way I know to say 'OK, that was interesting, I'll hold on to it until I actually do something about this topic'. Often, I'll run across something that seems relevant to a long-term project or interest and just want to grab it without even reading the details. Paper is OK for reading, browsing, and scribbling, but it's hard to search, it's heavy, and it's wasteful (and I yearn for a day when browsers can reliably print what's on the screen, instead of cutting it off at the margin because some designer doesn't understand layout!). How do others deal with organizing the results of browsing? Bookmarks and histories aren't the answer: they're not very good for searching, the UI isn't very good for, say, adding notes, and they don't work offline. Also, stale URLs are a huge problem: a key advantage of paper is that it doesn't randomly fade out in a few days (or decades), so a good solution would have to keep copies, not just references. I imagine something like a FireFox plug-in with a 'Remember This' button and some options for category, keywords, annotations, etc., but I'll bet there are more creative approaches, too."

150 comments

  1. You want Google Notebook and a PDF printer by Anonymous Coward · · Score: 0

    That's it.

  2. PDF by daeg · · Score: 4, Informative

    First off, install a good PDF printer.

    1. Re:PDF by endianx · · Score: 1

      Wow. Can't believe I never thought of that. Can you recommend one for Linux? And probably one for Windows as well. Thanks.

    2. Re:PDF by TripMaster+Monkey · · Score: 2, Informative

      For Windows, I can recommend the following free solutions:


      Hope this helps...
      --
      ____

      ~ |rip/\/\aster /\/\onkey

    3. Re:PDF by Hognoxious · · Score: 1

      Should I buy it from Santa Claus, or the tooth fairy?

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    4. Re:PDF by Phil+John · · Score: 2, Informative

      For Windows there's either the paid route (Adobe Acrobat Suite), or you can use PDFCreator which uses ghostscript. GS used to produce really nasty looking output years ago on Windows (circa the late 90's), but that's not the case anymore.

      For Linux, print to PostScript, then use something like ps2pdf (Ghostscript once again).
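
      For a folder full of saved .ps files, a minimal Python sketch of that step might look like the following. It assumes Ghostscript's ps2pdf is on your PATH; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def ps2pdf_command(ps_path: Path) -> list[str]:
    """Build the ps2pdf call that writes a PDF next to the PostScript file."""
    return ["ps2pdf", str(ps_path), str(ps_path.with_suffix(".pdf"))]


def convert_all(folder: Path) -> list[Path]:
    """Convert every .ps file in `folder`; needs Ghostscript's ps2pdf installed."""
    converted = []
    for ps_path in sorted(folder.glob("*.ps")):
        # check=True raises CalledProcessError if ps2pdf reports a failure
        subprocess.run(ps2pdf_command(ps_path), check=True)
        converted.append(ps_path.with_suffix(".pdf"))
    return converted
```

      Calling convert_all(Path.home() / "saved-pages") would then leave a .pdf beside each .ps file (the folder name is just an example).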

      --
      I am NaN
    5. Re:PDF by BruceCage · · Score: 2, Informative

      Under Ubuntu: sudo apt-get install cups-pdf

      That's it, the printer stores the PDF files by default to ~/PDF but you can change this location in /etc/cups/cups-pdf.conf.

      --
      Perfect is the enemy of done.
    6. Re:PDF by gEvil+(beta) · · Score: 1

      I know this is heresy around these parts, but the other requirements of notes/annotations are met with the full version of Acrobat. I'm sure some of the free PDF readers also support notes and comments, but Acrobat is what I have here (comes lumped in with the Adobe Creative Suite).

      --
      This guy's the limit!
    7. Re:PDF by daeg · · Score: 1

      Others recommended a few good ones. If you're in the research arena, print to PDF, and care about getting your layout precisely correct, I suggest you check out PrinceXML: full XML+CSS to PDF printing. For printing random web pages, you may get away with running the HTML/SGML through Tidy to produce valid XHTML, which you can pipe through PrinceXML, but something like cups-pdf will probably be easier for you.

    8. Re:PDF by GiMP · · Score: 1

      In Firefox, on Linux, you can check 'print to file' in the print dialog. This will save a postscript file, which is similar to PDF, and can be easily converted.

    9. Re:PDF by thc69 · · Score: 1

      http://sourceforge.net/projects/pdfcreator/ is open source and Just Plain Works (tm). No hassle.

      --
      Procrastination -- because good things come to those who wait.
    10. Re:PDF by thc69 · · Score: 1

      I'm pretty sure that Foxit Reader supports notes and comments. It is free (as in beer; I don't think it's open source). It is very small and independent; I don't install it, I just put the 4 MB "Foxit Reader.exe" in C:\Program Files and associate it with PDF files. It's fast, too.

      Okay, I just checked, and it allows you to type over the PDF and save it but the free version leaves watermarks.

      --
      Procrastination -- because good things come to those who wait.
    11. Re:PDF by Anonymous Coward · · Score: 0

      Wow, man! That Firefox truly is a brilliant polymath!
      </sarcasm>

    12. Re:PDF by munpfazy · · Score: 1

      I often find myself printing to postscript and saving the files as well.

      But, in general, creating pictures of html pages (whether on actual paper or as postscript files) seems like a really bad idea. HTML files are small, searchable, portable, and easily transformed into other formats. Postscript files are none of these things.

      It may be less wasteful than generating paper (given some assumption about the environmental and economic costs associated with hard drive space), but an html based solution seems like a much better idea.

    13. Re:PDF by GiMP · · Score: 1

      HTML files are small, searchable, portable, and easily transformed into other formats. Postscript files are none of these things.


      Actually, PostScript and PDF are all of those things. Unfortunately, some applications will create a static image embedded in a PS/PDF, but they can certainly contain formatted text that can be searched, indexed, compressed, and transformed. In practice, PostScript and PDF have a lot in common with HTML. Although the languages are vastly different, they're functionally similar.

      1. It is not difficult to extract strings from PostScript or PDF for search or extraction.
      2. PostScript/PDF are easily (and highly) compressed. I wouldn't doubt that it is similar in ratio to HTML. Remember, PostScript embeds images, while HTML has them separated, so keep that in mind when making the comparison.
      3. In terms of transformation, PostScript or PDF are more easily and more exactly converted to image formats, while HTML has "results that vary".
      4. Postscript is just as, if not more portable than HTML. While it might not be hard to make incomplete parsers for HTML, it is nearly impossible to write a complete one. In comparison, it is relatively easy to write a COMPLETE PostScript interpreter.

      That said, they're purpose-built. PS/PDF files require a complete implementation and exact formatting, while HTML easily degrades for limited capability devices. These are design goals of the respective technologies and have their own benefits.
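
      As a rough illustration of point 1, here is a Python sketch that pulls the (...) string literals out of PostScript source. It is deliberately simplified: it ignores % comments and hex strings, treats every backslash escape as the literal next character, and won't recover readable text from files that use re-encoded fonts.

```python
def postscript_strings(source: str) -> list[str]:
    """Return the contents of top-level (...) string literals in PostScript text."""
    strings, current, depth = [], [], 0
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\" and depth > 0 and i + 1 < len(source):
            current.append(source[i + 1])  # keep the escaped character as-is
            i += 2
            continue
        if ch == "(":
            depth += 1
            if depth > 1:
                current.append(ch)  # nested parenthesis is part of the string
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                strings.append("".join(current))
                current = []
            else:
                current.append(ch)
        elif depth > 0:
            current.append(ch)
        i += 1
    return strings
```

      Feeding it the text of a .ps file gives you a list of strings you can grep or hand to an indexer.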

      It may be less wasteful than generating paper (given some assumption about the environmental and economic costs associated with hard drive space), but an html based solution seems like a much better idea.


      I agree that, in general, when archiving HTML, maintaining the original format is preferable. After all, there is little to gain by converting HTML into PostScript/PDF other than the fact that images become embedded, which is sometimes a benefit. Today, I use PS/PDF for static data such as receipts from e-stores, as I just want a fast, easy, simple, reliable, single-file solution for that, but I use the Firefox Scrapbook extension for just about everything else.

      I should additionally note that the PS/PDF "everything in one file" approach is nice when you're looking to email and/or GPG/PGP sign the document. With HTML + images, you need to create a zip or tar, sign and send that, and have the person on the other end extract the archive, while with PS/PDF you can simply sign the document.
  3. Media Server by Anonymous Coward · · Score: 4, Funny

    Just save your 'research' to a nice media server or something and then you can do the 'hands on' stuff once the missus has left for work innit.

    1. Re:Media Server by Anonymous Coward · · Score: 0

      +1 Funny-cos-it's-true

    2. Re:Media Server by scuba_steve_1 · · Score: 1

      The downside with this and other approaches mentioned is that they do not seem to provide a way to easily visualize, associate, correlate, cross-reference, etc.

      I have a friend who is an attorney and who performs extensive research against a wide array of source material...including the web...and he swears by Microsoft OneNote:

      http://office.microsoft.com/en-us/onenote/default.aspx

      Yes, a Microsoft product...let the flames begin.

  4. Errrr by HawkingMattress · · Score: 0

    Ever heard of bookmarks?
    Of course, one problem with them is that the pages can disappear or change between the moment you save them and the moment you use them. The obvious answer is to save a local copy (with wget, or whatever), which will be easier to search than paper. And you can still print it if you need to.
    Then you can easily search through all the pages you downloaded for the one that holds the information you need, which probably takes you a long time with paper.
    Of course all those things are bloody obvious, and I don't understand how they can make an Ask Slashdot headline. Or maybe I didn't understand something in your problem?
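
    If you'd rather script it than run wget by hand, the save-then-search idea can be sketched in a few lines of Python. This is only a sketch using the standard library: the hashed file naming is an arbitrary choice, and it grabs the HTML alone, not the images or stylesheets.

```python
import hashlib
import urllib.request
from pathlib import Path


def save_page(url: str, archive: Path) -> Path:
    """Download `url` and store the raw bytes under a name derived from the URL."""
    archive.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    target = archive / name
    with urllib.request.urlopen(url) as response:
        target.write_bytes(response.read())
    return target


def search_archive(archive: Path, phrase: str) -> list[Path]:
    """Return saved pages whose text contains `phrase`, ignoring case."""
    needle = phrase.lower()
    hits = []
    for page in sorted(archive.glob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            hits.append(page)
    return hits
```

    Call save_page() whenever you would have hit Print, and search_archive() when you want the page back.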

    1. Re:Errrr by BosstonesOwn · · Score: 1

      Bookmarks are so '90s. I use Post-it notes; oddly enough, I feel the need to listen to repetitive music and jump over barrels. http://games.slashdot.org/article.pl?sid=07/04/14/2347229/ Admittedly I do a lot of web research.

      --
      This package Does Not Contain a Winner
    2. Re:Errrr by Atzanteol · · Score: 1
      From the post:

      Bookmarks and histories aren't the answer

      You:

      Ever heard of bookmarks

      You lose at slashdot!

      --
      "Ignorance more frequently begets confidence than does knowledge"

      - Charles Darwin
    3. Re:Errrr by iknownuttin · · Score: 1
      Ever heard of bookmarks ?

      I have a shit load of bookmarks. The trouble is that after a while I forget about them. There are many times when I want to look something up, Google it, and then bookmark it. When I look in my bookmarks, I'll have the exact same page bookmarked multiple times. Maybe something in Firefox one day that'll tell you that you're bookmarking something again? A utility to weed out dupes? Of course, in some cases I cross-bookmark items because they fit in a couple of categories, such as 'MySQL' info in "Programming/Database" and in "Web Development".
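
      A dupe-weeding utility could be as small as this Python sketch. It assumes you have already exported your bookmarks into (folder, URL) pairs somehow; the normalization rules (lowercase host, drop the fragment and any trailing slash) are my own guess at what counts as "the same page".

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparison key: lowercase host, no fragment, no trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def find_duplicates(bookmarks: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each URL saved more than once to the folders it appears in."""
    folders_by_url = defaultdict(list)
    for folder, url in bookmarks:
        folders_by_url[normalize(url)].append(folder)
    return {url: folders for url, folders in folders_by_url.items() if len(folders) > 1}
```

      Because it reports the folders instead of deleting anything, deliberate cross-bookmarks stay your call.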

      --
      I prefer Flambe as apposed flamebait.
    4. Re:Errrr by XenoPhage · · Score: 1

      You lose at slashdot!

      Lose? I think not.. He's taken the slashdot evolution to the next level and doesn't even bother to read the summary..

      --
      XenoPhage
      Technological Musings
    5. Re:Errrr by XenoPhage · · Score: 3, Informative

      Maybe something in Firefox one day that'll tell you that you're bookmarking something again?

      Ask and ye shall receive!

      http://bookmarkdd.mozdev.org/

      Or the Mozilla Addons page for it :

      https://addons.mozilla.org/en-US/firefox/addon/1553

      --
      XenoPhage
      Technological Musings
    6. Re:Errrr by brusk · · Score: 1

      What do you mean "next level"? We already have lots of slashdotters whose approach is:

      1. Don't bother to read article title.
      2. Make lame inside joke.
      3. Get modded +1 funny for no reason.
      4. In Soviet Russia, ??? profits from you!

      The "next level" beyond that would be replying to comments without even reading them. Oh wait, people already do that all the time.

      --
      .sig withheld by request
    7. Re:Errrr by B'Trey · · Score: 1

      What I want is a tool that indexes every page I bookmark. (Better yet, one that indexes every page I visit, or puts an "Index" button on the toolbar, or, best, makes this user configurable.) Then I could search through only the pages I've visited to find information I know I've seen but can't remember where. It doesn't seem like this would be overly difficult to implement as a FF extension.
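
      The indexing half, at least, is simple. Here is a toy in-memory version in Python, just to show the shape of it; the class name is made up, and a real tool would need to strip HTML, persist the index, and hook into the browser.

```python
import re
from collections import defaultdict


class PageIndex:
    """Tiny in-memory inverted index: word -> set of URLs containing it."""

    def __init__(self) -> None:
        self._urls_by_word = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        """Record every word of `text` as occurring on `url`."""
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self._urls_by_word[word].add(url)

    def search(self, query: str) -> set[str]:
        """Return the URLs that contain every word of the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return set()
        result = set(self._urls_by_word.get(words[0], set()))
        for word in words[1:]:
            result &= self._urls_by_word.get(word, set())
        return result
```

      Call add() with the page text each time the "Index" button is pressed, and search() only ever looks at pages you have actually seen.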

      --

      "The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.

    8. Re:Errrr by arachnoprobe · · Score: 1

      5. People who waste their time complaining about it instead of applying custom-made moderation rules.

    9. Re:Errrr by JustNilt · · Score: 1

      Damn. Funny AND insightful. We need the ability to mix points ... (+.5 Funny and +.5 Insightful).

      --
      You know the thing about UDP jokes? I don't care if you get it or not.
    10. Re:Errrr by jp10558 · · Score: 1

      AmDeadlink is useful here in general, with cleaning up bookmarks.

      --
      Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
    11. Re:Errrr by DeanPentcheff · · Score: 1

      In theory, Beagle does this for you. To quote: "Beagle is a Linux desktop-independent service which transparently and unobtrusively indexes your data in real-time."

      In practice, when I tried it out (quite some time ago, though), I found the indexing to be a bit quirky and inconsistent. There's been lots of time for improvement since then, so I'd recommend giving it a try.

    12. Re:Errrr by B'Trey · · Score: 1

      It appears that Beagle only indexes local files (your home directory by default.) The FAQ includes this:

      Does Beagle support Mozilla Thunderbird?

      Beagle is no longer built with default Thunderbird support, as of version 0.2.15. (Support for indexing email, news, RSS, and addresses had originally been added in Beagle 0.2.8, but was removed due to memory issues.)


      I'm currently looking at setting up a homebrew system using htdig. I run my own server, so in theory I should be able to run htdig on it, and run a custom service which accepts a URL as an argument and calls htdig to index that page. Then all that's necessary is to create a simple Firefox extension to pass the current URL on to the server ... Shouldn't be too difficult to hack out something like that, right? Damn, there goes my weekend.

      --

      "The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.

  5. Hey, it's the 21st century by edittard · · Score: 1

    Hey, it's the 21st century - put them all in your blog.

    --
    At the bottom of the /. main page it says 'Yesterday's News'. Well they got that right.
  6. Basket by auxsvr · · Score: 1

    You can use basket from the KDE PIM package. It allows you to organize bookmarks, text, images, and other data effectively and consistently. It's like a sticky notes program, though with much more functionality, and it allows you to store and retrieve information very quickly. It also saves automatically. It may have a few very disturbing bugs (I think the major one is a Qt bug), yet it's definitely worth enough for me to use it every day.

    1. Re:Basket by bcmm · · Score: 1

      What bug is this?

      --
      # cat /dev/mem | strings | grep -i llama
      Damn, my RAM is full of llamas.
    2. Re:Basket by auxsvr · · Score: 1

      It crashes kontact as soon as I create a basket or a sub-basket (basket-0.6.0-26 on openSUSE 10.2), and it has several problems with the focus of the boxes that contain the information. Some text also appears twice at the bottom: I'm able to scroll past the window contents, and instead of a blank area, text from the previous box appears there, which makes things very confusing.

  7. Zotero by Fruny · · Score: 3, Interesting

    I imagine something like a FireFox plug-in with a 'Remember This' button and some options for category, keywords, annotations, etc.
    Sounds like Zotero is what you're looking for.
    1. Re:Zotero by pragma_x · · Score: 1

      Thank you. I was hoping to find some nifty gems in this thread.

    2. Re:Zotero by Coan_teen · · Score: 2, Informative

      Zotero was developed at my alma mater, and we were the guinea pigs for it. The program has improved quite a bit since its early stages. It still sometimes has trouble recognizing that something is research, but in the instances where Zotero doesn't automatically give you the choice to copy the citation you can make a snapshot of the page. It's a nifty little add-on. The only problem with it is that you can't carry your research history from one machine to another like you can with the Google utility. The solution suggested to us by the Zotero Evangelist (yes, that's his job title, I love it) was that we install Firefox on a flash drive and carry the whole program around with us. They're working on a more viable option.

      --
      A Sherman can give you a very nice...edge.
  8. Re:PDF (printer) by Jeff+DeMaagd · · Score: 1

    I agree. It's very handy to have. I keep records of my online bill payments that way too. It might not do away with the formatting problems though.

  9. Bookmarks by knewter · · Score: 1

    I myself have an extensive 'drawer' of bookmarks. I've installed the TinyMenu extension for firefox, and placed the bookmarks toolbar folder on the same row as the menu was on prior to that extension. I've then got top-level folders across the entire browser, which each contain a highly nested / hierarchical structure. A sampling of my top level folders: Make (for hardware hacking related stuff), tools (where I keep various 'useful every three weeks' links), Queue (stuff I need to get to at some point), studies (links to lots of OpenCourseWare courses, in areas I want a refresher), Development (links to the various useful sites I've found. For instance, deep in there is a folder for all the Rails Plugins that I want to keep an eye on). I then use Deskbar (GNOME, or Launchy on Windows, or Quicksilver on OS X) to give me keyword access to these bookmarks. So when I want to upload photos to my flickr account, I "Alt+F3 Upload [Enter]" and I'm at the multi-upload page.

    A PDF Printer is important for longevity of articles, but I think a proper bookmarking system has to be in place first, and I think most people get this horribly wrong.

    --
    -knewter
  10. Seriously? by pla · · Score: 2, Informative

    File -> "save page as" -> "web page, complete".

    You can either keep what you save in some sort of logical arrangement, or trust your handy desktop search engine to find it for you later (though that seems to reduce the problem back to finding the info in the first place, at least you don't need to worry about the content going offline at some future date).

    1. Re:Seriously? by WuphonsReach · · Score: 1

      I have two places I save information.

      I either e-mail it to myself (or my working group) or I'll blog about it. It's a very low-down and dirty system, but the search tools in my mail clients are good enough to let me find things as long as I know the year (I archive by year).

      Although the Scrapbook extension for Firefox sounds very intriguing.

      --
      Wolde you bothe eate your cake, and have your cake?
  11. Easy by aadvancedGIR · · Score: 5, Funny

    Just write to your ISP pretending to work for one **AA and you'll immediately get a complete list of your activities. As a bonus, you can also use that to terminate your subscription without the two months' notice.

  12. What's wrong with Bookmarks? by Andy_R · · Score: 0, Flamebait

    ...or 'favourites' if you haven't switched to Firefox yet.

    Use folders and subfolders to organise them, and if you really honestly have an unmanageably large number of bookmarks that you couldn't possibly just google again later, cut and paste from the bookmarks file to any kind of saveable text document.

    --
    A pizza of radius z and thickness a has a volume of pi z z a
    1. Re:What's wrong with Bookmarks? by MyLongNickName · · Score: 1

      Errrrm... pages change... sites become unavailable. I think the poster wants to make sure he can access the exact same information he came across two years ago.

      --
      See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
  13. Google Desktop, PDF, directory organization by rfunches · · Score: 1

    Google Desktop Search and PDF. GDS does the indexing, PDF preserves the original page.

    A good use of directories for organizing helps to avoid "lost" files from floating around. I use this for research papers and projects.

    1. Re:Google Desktop, PDF, directory organization by WuphonsReach · · Score: 1

      Google Desktop Search and PDF. GDS does the indexing, PDF preserves the original page.

      The last time I tried Google Desktop Search, I found it to be useless if you had more than a few hundred items to be indexed. (Think in the range of tens of thousands of files.)

      --
      Wolde you bothe eate your cake, and have your cake?
  14. Copy paste by sherriw · · Score: 1

    I often copy all the relevant text from several sites on a topic I'm researching... paste it into a text document then save it to my hard drive. I save pics that way too. Instead of grabbing the whole page. Makes for easy printing if I want a hard copy.

  15. PDF! by megabyte405 · · Score: 3, Insightful

    Any time someone mentions how they don't like having papers around but want a hard copy, my response is immediately: print it to PDF! Your operating system should be able to do this :) In Firefox on Linux, print to the generic printer, to a file named something.ps, then run ps2pdf on it; in just about every other GNOME app, PDF support is built in to the print dialog. On Mac OS X, well, you already knew you could save a PDF (or save the preview, same diff) from your print dialog. On Windows, www.sf.net/projects/pdfcreator is your friend; just don't install their toolbar (the existence of which makes me rather sad). Then you've got the page (or whatever) archived in a nice, portable, paper-like file, and when desktop search is ready for the masses (if you're not on a Mac), you'll even be able to search it. Much better than paper!

    --
    I recognize people by their sigs. Is that a bad thing?
    1. Re:PDF! by i.r.id10t · · Score: 1

      Or set up a fake printer script that takes a postscript stream and converts to pdf automatically. Had a bookmark, had to go to the wayback machine, but here it is...

      http://web.archive.org/web/20011217172330/http://www.linuxgazette.com/issue72/bright.html

      --
      Don't blame me, I voted for Kodos
    2. Re:PDF! by megabyte405 · · Score: 1

      That looks like an interesting network solution for Windows, though I'm not sure if it has any advantages over a locally installed PDFCreator. On Linux, to get around the Firefox weirdness, I think "CupsPDF" does the trick too.

      --
      I recognize people by their sigs. Is that a bad thing?
    3. Re:PDF! by i.r.id10t · · Score: 1

      Neat thing about the network printer is that it can also send faxes, since faxes are simply PS/TIFF images that are sent via modem. Did some neat shell scripting, etc. to set up a fax server for a local insurance office: 2 years, 65 thousand faxes in and out, no issues except when they lose power and forget to turn the machine back on...

      --
      Don't blame me, I voted for Kodos
  16. Mindmapping by jslalleman · · Score: 1

    I would use a mind-mapping program (FreeMind on Linux, for instance, or Mindjet MindManager on Windows). This lets you organize websites, notes, scans, and other sources in one overview. For more info, see the following links: http://en.wikipedia.org/wiki/Mindmap http://www.mapyourmind.com/index.htm http://www.mind-mapping.org/mindmapping-and-you/basic-introduction-to-mindmapping.html Good luck!

  17. Have you tried Acrobat? by griffon666 · · Score: 1

    Acrobat has a feature called "convert web pages to PDF" from within Acrobat that is quite useful to archive websites digitally while preserving the formatting and keeping things searchable (with OS X Spotlight, for example). When you install Acrobat, Internet Explorer 6.0 (or higher) even gains an Adobe PDF toolbar that you can use to generate PDF from within IE. I guess most Slashdotters use different browsers, but at least on OS X you can easily print to PDF natively.

    1. Re:Have you tried Acrobat? by Constantine+XVI · · Score: 1

      Or better, get the PDFCreator program for Windows, which installs a PDF printer to "print" PDF from anything. As an added bonus, it's GPL, which should please the OSS zealots here. If you're not on Windows, GNOME, KDE, and, as parent mentioned, OS X have native PDF facilities.

      --
      "I think an etch-a-sketch with an ethernet port would beat IE7 in web standards compliance."
  18. Research trails by nenya · · Score: 1

    As a member of the legal profession, I do a large amount of research online. My site of choice is Westlaw, produced by the largest legal publisher in the country (and thus probably the world). They have a feature on their website they call "Research Trails", which keeps a record of your navigation each time you log in. The list is fully linked, so you can access any document on the list easily. You can not only see what you looked at but see the order in which you looked at it, which helps in reconstructing thought patterns. This is a dramatically helpful feature for any research site, and they are to be commended for implementing it.

    Their major competitor, LexisNexis, has a similar feature.

    I know this doesn't help much for general-purpose multi-site research, but I can't say just how useful this feature is for a single site. I would recommend other site developers to create similar functionality as soon as convenient. Imagine how useful this could be for Wikipedia.

    On second thought, scratch that. Sometimes I really don't want to know how I got from point A to point B on that site.

  19. PDF/Annotating by xtracto · · Score: 1

    I will have to agree with other people saying that PDF *is* the way to save web pages for future reference (I used to use MHT, but it is proprietary and you can't add notes).

    For the annotations, I would suggest the Foxit PDF reader (free) plus the Pro Pack (US$40), which will allow you to add annotations and mark the text, among other things. It is one of the few pieces of software I have found useful enough, and well priced enough, to actually buy.

    I will use this post to ask if anyone knows of an open source alternative to the Pro Pack that lets you add comments, markings, and other basic editing features. I would think that is something a *lot* of people want.

    --
    Ubuntu is an African word meaning 'I can't configure Debian'
    1. Re:PDF/Annotating by smurfsurf · · Score: 1

      Skim PDF reader is a free reader that allows annotations, highlighting and other markings in a PDF. Works well. http://www.tuaw.com/2007/04/02/skim-pdf-reader/

  20. Save Page as ... by ShelfWare · · Score: 1
    Find somewhere to file it locally and do a save page as. Most browsers will save all the images, links, etc, intact.

    Then just get Google Desktop or something similar to index those.

    1. Re:Save Page as ... by vrmlguy · · Score: 1

      Saving pages isn't automated. Instead, set up a personal proxy server that never purges its cache, and use it to surf the web. Google Desktop (or something similar) can be used to index the proxy's cache in case you can't remember where you saw something. Finally, rdist the cache to a central location so you aren't tied to a single computer, and to protect against disk crashes.

      --
      Nothing for 6-digit uids?
  21. Recommend good free PDF printer? by sherriw · · Score: 1

    Can anyone recommend a good PDF printer driver application so people who can't afford Acrobat can still print to PDF?

    1. Re:Recommend good free PDF printer? by patelbhavesh · · Score: 3, Informative

      PDFCreator is a free open source pdf printer http://www.pdfforge.org/products/pdfcreator

    2. Re:Recommend good free PDF printer? by MyLongNickName · · Score: 2, Informative
      --
      See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
    3. Re:Recommend good free PDF printer? by porcupine8 · · Score: 1

      Acrobat is $300 - you can get an older Mac running 10.1-10.3 for less than that. :) (Yes, OS X can print anything to PDF.)

      --
      Warning: Apple/Nintendo fangirl. Likes her electronics cute & cuddly. May be rabid.
    4. Re:Recommend good free PDF printer? by WoTG · · Score: 1

      Ditto. I've been using pdfcreator for a few years now. Big bonus being that it's open source. It's also got a network server version that you can share with Windows and it can put the resulting PDF in each users home folder. Quite nice.

  22. Take a look at the ScrapBook Firefox extension by BruceCage · · Score: 5, Informative

    I imagine something like a FireFox plug-in with a 'Remember This' button and some options for category, keywords, annotations, etc., but I'll bet there are more creative approaches, too."
    ScrapBook is a Firefox extension created by Gomita (some Japanese fella), it allows you "capture" web pages, creating a locally stored cache and offers the ability to easily remove content from the captured web page, mark sections or add notes. It also has a whole bunch of tools such as full text search and a pretty intuitive interface.

    You can find all the features in a nice list at the official homepage with tons of pretty screenshots. There's even a 50 page manual (PDF) created by Andrew Giles-Peters.

    Even though development has seemingly halted since December 2005, it's still one of the most well rounded extensions for Firefox I've come across yet.
    --
    Perfect is the enemy of done.
    1. Re:Take a look at the ScrapBook Firefox extension by brusk · · Score: 2, Interesting

      I second this. I MUCH prefer Scrapbook to PDF saves, which I used earlier, because Scrapbook preserves all the original HTML and the format of images (whereas PDF converts them and makes them hard to separate out), is also searchable/indexable by whatever indexing program you want, and can be highlighted, annotated, etc.

      Let's just hope they keep developing, at least enough to ensure that it continues to work with future releases of Firefox. My sense is that they are, given that the developers' blog at http://www.xuldev.org/blog/ is active and indicates that they're looking at Firefox 3 issues.

      --
      .sig withheld by request
    2. Re:Take a look at the ScrapBook Firefox extension by Jonah+Hex · · Score: 1

      ScrapBook 1.2.0.8 - Released on Dec 15, 2006
      You're only a year off ;)

      Jonah HEX

    3. Re:Take a look at the ScrapBook Firefox extension by GiMP · · Score: 2, Insightful

      I also second this (me too!)

      In years past, I used PDFs, but since 2003, I have been using scrapbook.

      Personally, I use it for vacations and business trips. When I'm on the road, I just 'scrapbook' important pages (like Google map directions) and when I need to pull something up, I just open the laptop. Now, on the other hand, it's a lot easier to pull the PDF files over to your PDA...

      Nowadays, I use this less frequently due to the rise of high-speed cellular internet, but it's still extremely useful for times that I leave my coverage area.

    4. Re:Take a look at the ScrapBook Firefox extension by BruceCage · · Score: 1

      In all the excitement I made a couple of errors in my post (to "capture" web pages, a locally stored cached version), I saved the post in ScrapBook and corrected them but alas it has no use :) May the grammar and spelling Nazis have mercy on me today.

      --
      Perfect is the enemy of done.
    5. Re:Take a look at the ScrapBook Firefox extension by DavidTC · · Score: 1

      I was sitting here reading all the comments waiting to see if someone had mentioned that, because I was going to if they hadn't. Scrapbook is amazing, and what wasn't mentioned is that you can use file shares to put your scrapbook on, and have them accessible on multiple computers.

      Something that wasn't mentioned: The pages are normal HTML pages, and stored wherever you want, so they get indexed by whatever search tool you have on your computer, like GDS...and if you open them back up in Firefox, even without using the Scrapbook interface, you get all the editing tools and the original URL and all the stuff you need. (Also, it comes with the ability to search itself.)

      It not only lets you capture a page with images and stuff, it lets you crawl sites and capture all of them. And you can also capture tiny parts of pages.

      And if you have multiple pages, say four pages of a single article, you can merge them together, which, along with the aforementioned removal of elements, can make nice single page articles for reference later, and, of course, Scrapbook remembers where they all are from, even in merged pages.

      It was designed for people in school to be able to find information later, to be able to take a web page, trim it down to the information they need, and then cite its original URL, but I'm not in school and I find it very handy.

      It eventually results in a context shift in bookmarking: You start scrapbooking information you need later, and only bookmarking places that you think will have useful information in the future. I run across a 'How to do something complicated in PHP', I don't bookmark it, I scrapbook it.

      It doesn't have any internet sync, just the ability to keep a store on a network drive, which seems to work okay with offline sync in Windows XP Pro. There's an extension for box.net that lets you import and export to box.net, but it isn't automatic and not incredibly useful.

      That, and the ability to check if certain pages had changed, would be very useful, but other than that it is awesome.

      --
      If corporations are people, aren't stockholders guilty of slavery?
  23. Word Processor by 0311 · · Score: 1

    Today's batch of Word Processors (not your simple notepad and editor software) is a pretty good bunch, by and large, and most, if not all, will take the HTML page and nearly faithfully reproduce its content. Then store it in a topic named hierarchy of folders. Now it is organized, searchable and backup-able. Furthermore, all of the modern word processors I am aware of allow you to annotate the content and track your changes. Voila!! Simple solutions are really best.

  24. Scrapbook by nahgoe · · Score: 1

    You need the Firefox extension https://addons.mozilla.org/en-US/firefox/addon/427 .
    Helps you to save Web pages and organize the collection.

  25. Randomly enough by Anonymous Coward · · Score: 1, Insightful

    Microsoft has the answer to this one. One Note. It's absolutely magnificent for stuff like that. There are some other programs which take similar stabs at the same problem, Treepad, Infomagic, and, of course, Google Notebook. But One Note wins this one walking away.

    1. Re:Randomly enough by adonoman · · Score: 1

      I'll second this - OneNote is just awesome for grabbing stuff from everywhere, and organizing/searching it. It'll even OCR any images so that you can search them.

    2. Re:Randomly enough by Anonymous Coward · · Score: 0

      As much as I hate to say this - I have to agree with the two previous posters on this one. One Note is one of the most useful apps for the dollar that I have found in a long time!

  26. I wget it! by VE3OGG · · Score: 3, Informative

    wget is probably one of my favourite Linux command-line tools. All I need to do is wget -r http://www.doodahdoo.com/ and it saves a directory called doodahdoo.com and all the pages in it, as well as the images, and any embedded video and such. This is very handy, not only for getting a huge number of files (say my http backup server), but also for getting entire sites that I might have a use for in future.

    At the moment, I have on the order of 10GB just of websites, radio clips, and what have you that I have used for previous research. Not only that but I can also maintain a simple directory structure and never have to worry that that "firefox plugin" will still be compatible with version 4.765.

    Another neat function is that you can specify just a particular file (www.whatever.com/pic.jpg), or all the files with a particular extension (*.jpg), or only the files in that directory. You can also use it to spider (limited) all the links on a site. Though be kind and don't do this too often, as I am sure it eats a lot of bandwidth.

    The last (and greatest) thing, is it remains in a well-known and easily editable format.

    Alternatively, I have also used a MediaWiki setup so that I could drop down notes for classes, or other interesting things in it, but this required substantially more overhead than wget.
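
    The options described above (a single recursive fetch, an extension filter, staying inside one directory, a limited spider) can be sketched as a small helper that only assembles a wget command line. The function name and its defaults are assumptions made for illustration; the flags are standard GNU wget long options, and the pause between requests is added here as one way to "be kind" to the server.

```python
import shlex


def wget_command(url, depth=None, accept=None, stay_below=True, wait_seconds=1):
    """Build (but do not run) a wget command line for mirroring a site."""
    args = ["wget", "--recursive"]
    if depth is not None:
        args.append(f"--level={depth}")              # limit how many links deep the spider goes
    if accept:
        args.append("--accept=" + ",".join(accept))  # keep only files with these extensions
    if stay_below:
        args.append("--no-parent")                   # never climb above the starting directory
    if wait_seconds:
        args.append(f"--wait={wait_seconds}")        # pause between requests to spare the server
    # Also fetch images/CSS needed by each page and rewrite links for offline viewing.
    args += ["--page-requisites", "--convert-links", url]
    return args


if __name__ == "__main__":
    # Print the command so it can be inspected before running it by hand.
    print(shlex.join(wget_command("http://www.doodahdoo.com/", depth=2, accept=["jpg"])))
```

    Printing the command rather than executing it keeps the sketch harmless; pass the list to subprocess.run if you actually want to start the download.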

    1. Re:I wget it! by Blakey+Rat · · Score: 1

      I don't get the Linux connection. There are a thousand utilities that can do this on Mac and Windows as well. They're called "offline browsers," and they've been around since the mid-90s. My personal favorite at the moment is SiteSucker for Mac, although the older non-WebKit version seems to do a better job than the newest version, go figure.

      In any case, it's not a flawless method because there are many sites that can't be downloaded in whole, due to them using Javascript links or dynamic content that confuses the downloader and will miss files. Of course, you're really no worse-off than printing, since browsers suck at that as well.

    2. Re:I wget it! by huckda · · Score: 2, Insightful

      good luck if it's a ruby on rails site...and possibly any other database driven site dynamically created
      you pretty much only get what is in the 'public' directory
      stylesheets, javascript, images...

      --
      "Just Smile and Nod." --Huck
    3. Re:I wget it! by Blakey+Rat · · Score: 1

      Also, just to be a jerk, I'll mention that it's bone-stupid to use WGET to get a single page considering every browser on earth has some kind of "save complete page" option right there in the File menu that also localizes links, downloads images/swfs/etc. In Safari, it's called "Web Archive." In Firefox, it's "Webpage (Complete)". In IE, it's "Web Archive, single file".

      Since you're already viewing the site in a browser, why would you LEAVE the browser to go to a CLI to do something the browser already has built-in? Goofy.

    4. Re:I wget it! by ksheff · · Score: 1

      The person could be using a browser that doesn't have that capability and doesn't want to upgrade.

      --
      the good ground has been paved over by suicidal maniacs
  27. A personal wiki? by AlHunt · · Score: 2, Interesting

    I've been struggling with this myself, to a point. How about a personal wiki, such as Didiwiki, that runs locally?

    I also save web pages as "Web Page, Complete". It now occurs to me that I should make a specific directory for those pages.

    --
    1 in 4 Maine children in struggle with hunger.
    1. Re:A personal wiki? by stonertom · · Score: 1

      The other personal wiki that I know of but forgot the name of, search for notepad on crack. Sorry about lack of name, maybe whoever had it in their sig could reply?

      --
      Shameless plugs and inaccessible site design FTW! - www.mistletoestreetmusic.com
    2. Re:A personal wiki? by Anonymous Coward · · Score: 0

      This is exactly what I do, I run Mediawiki on my personal webspace and put all my personal research on there.

    3. Re:A personal wiki? by Anonymous Coward · · Score: 0
    4. Re:A personal wiki? by AlHunt · · Score: 1

      Zulupad, I think. It's a windows and mac application only. http://www.gersic.com/zulupad/

      --
      1 in 4 Maine children in struggle with hunger.
    5. Re:A personal wiki? by mingrassia · · Score: 1


      >> How about a personal wiki, such as Didiwiki, that runs locally?

      I have been using TiddlyWiki for a while now and absolutely love it. No server or special setup required, just load the single file in my browser and start using it. I have several private wikis that I use regularly to keep track of multiple projects (both personal and for work). Best part is that I can move from working on my Linux box to OS X to (gasp) Windows and always have my information available.

      --
      OS X, Linux, Tivo, Amiga, my fascination with cult-like technologies would intrigue any psychiatrist.
  28. DEVONthink by Finque · · Score: 2, Informative

    http://www.devon-technologies.com/products/devonthink/

    Using a good PDF exporter (I'm on OS X, so look elsewhere for free & easy ways to do this on Windows), DEVONthink will pretty much keep everything organized like a digital filing cabinet.

    'Course, the cheapest version costs $39.95, but I can attest to the fact that this software WORKS (I got it heavily discounted in the MacHeist 2006 bundle).

    1. Re:DEVONthink by Finque · · Score: 1

      ...And now as I belatedly read the product webpage, I see it's Mac OS X only. Sorry if that doesn't work for you =\

      To anyone using OS X and looking for a solution to submitter's problem however, I highly recommend this software. Try before you buy, they give 150 hours of runtime for the app during trial.

    2. Re:DEVONthink by TripMaster+Monkey · · Score: 1

      Free & easy ways to do this on Windows:


      Hope this helps...
      --
      ____

      ~ |rip/\/\aster /\/\onkey

  29. Yojimbo by smurfsurf · · Score: 2, Interesting

    I quite like Yojimbo http://www.barebones.com/products/yojimbo/

    You can either save a "web archive", which is the web page incl. all graphics/css/etc., or a PDF of the page (nicely integrated into print services). Both document types are rendered inside the app and are searchable. Yojimbo also has tags and folders to keep things organised. And you can also save regular notes (formatted and with images). Covers all bases.

    When it comes to pure PDF, YEP http://www.yepthat.com/ is an excellent alternative. Kind of the iPhoto of PDF.

    1. Re:Yojimbo by jpkunst · · Score: 1

      I second the Yojimbo recommendation. A few more points: A third item type for dealing with web material in Yojimbo is regular bookmarks, which can have the same tags and/or labels as every other Yojimbo item. Yojimbo uses an SQLite database to store all its data, so you can't use the Finder to get at Yojimbo's PDFs or Web Archives.

      JP

  30. The problem is the media by Aaron_Pike · · Score: 1

    The problem is that the Web is stored electronically. Oh, if only there were some way of storing electronic data in some sort of non-volatile format. If only we could take a File that is a web page and Save page as... something.

  31. CutePDF vs PDFCreator? by sherriw · · Score: 1

    I personally lean toward the open source option... but how do CutePDF and PDFCreator stack up against each other in terms of stability, features and bugginess?

    Thanks for the suggestions by the way!

    1. Re:CutePDF vs PDFCreator? by jp10558 · · Score: 1

      Personally, at work we used to use CutePDF and switched to PDFCreator. One reason was that PDFCreator didn't have a link asking us to buy an upgrade, but also because it supported encryption/PDF limitations, and a pseudo print queue so you could combine prints from multiple apps into one PDF file. Of course, I added on PDFTK Builder for PDF Split/Merge as well.

      --
      Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
  32. Suggestions by Arab · · Score: 1

    I used to email myself a link to a page when I found something interesting, but the email account I used for that got so clogged up that I had to stop using it. Now that I've installed the del.icio.us plugin for Firefox, I just use that: you tag pages by topic, so you can look through all the pages you have tagged with a particular topic.

    On the subject of PDF printing, I used to do that too, but my hard drive got clogged up with a bunch of stuff I would never get round to reading. CutePDF is free for Windows; in Linux, print to file and use pstopdf or a similar tool. I'm sure there is a print-to-PDF tool as well, though I've never used one...

  33. Pathway by smurfsurf · · Score: 1

    > I would recommend other site developers to create similar functionality as soon as convenient. Imagine how useful this could be for
    > Wikipedia.

    Pathway does just that for Wikipedia, it is great :-) http://pathway.screenager.be/about-pathway/

  34. New: Google Notebook by kestasjk · · Score: 5, Informative
    Something that recently came out of Google and is ideal for this task; Google Notebook. You find sites with Google, now you can take notes from them with Google, and it integrates nicely into Google search. Unlike bookmarks you can search the notes you take and have the URLs ready and waiting, etc.

    1. Why would I want to use Google Notebook?

    With Google Notebook, you can browse, clip, and organize information from across the web in a single online location that's accessible from any computer. Planning a trip? Researching a product? Just add clippings to your notebook. You won't ever have to leave your browser window.

    2. How do I get started?

    Simple. Just sign in to the Google Notebook homepage with your Google Accounts username and password, then download the Google Notebook browser extension (if you haven't already). As soon as you restart your browser, you'll see a Google Notebook icon in the bottom-right corner of your browser window. Click on this icon to open your mini Google Notebook, where you can save all the clips of content you want.
    --
    // MD_Update(&m,buf,j);
    1. Re:New: Google Notebook by Anonymous Coward · · Score: 1, Interesting

      Am I the only one getting a bit sick of Google Everything?

    2. Re:New: Google Notebook by XenoPhage · · Score: 1

      Google Notebook is on-line and depends on an outside source. So I have no real control over it... Nor can I access it offline.

      That said, it does look pretty interesting, so I'm gonna give it a whirl.. :)

      --
      XenoPhage
      Technological Musings
    3. Re:New: Google Notebook by biohack · · Score: 2, Informative

      I was surprised not to see Google Notebook as one of the first answers, as it indeed works very well for organizing material found on the web. I guess, Slashdot is less of a Google fan club than many people assume it to be!

      The FF extension makes saving "permanent" pages easy via a right-click option. For pages that may become inaccessible over time, the content of interest can be copy-pasted directly into the Notebook entry. And Google search options coupled with the possibility of creating multiple Notebooks (and sections within each Notebook) make sorting and reorganizing notes very straightforward.

    4. Re:New: Google Notebook by ReidMaynard · · Score: 2, Funny

      My browser must be borken, I could not locate the Google Everything page.

      --
      -- www.globaltics.net

      Political discussion for a new world

    5. Re:New: Google Notebook by xtracto · · Score: 1

      For quickly annotating pages while browsing I use an extension called InterNote. It is one of the small details that mean I can't switch from Firefox to anOther PossiblE browseR Anyday.

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    6. Re:New: Google Notebook by aussie_a · · Score: 1

      I discovered it a few days ago, I haven't found I needed it yet unfortunately. I thought it would be a great solution to a problem, but it seems the problem hasn't surfaced since I installed it.

    7. Re:New: Google Notebook by kramulous · · Score: 1

      Used it, didn't like it.
       
      This has been shitting me off for some time and I'm not an organised person. The important stuff I still use Endnote for. It is the only one that is freely available for me.
       
      After saying that, I *have to* stop using Thunderbird because the Uni I work for is in bed with Microsoft. Must now use Outlook.
       
      So please disregard the Endnote comment :(
       
      Current state of disgruntle over.

      --
      .
    8. Re:New: Google Notebook by kestasjk · · Score: 1

      Sick of fresh, professionally designed, platform-independent, free Web 2.0 applications?

      --
      // MD_Update(&m,buf,j);
    9. Re:New: Google Notebook by jp10558 · · Score: 1

      I'm guessing it's because you need a Google account to try it out/use it... Plus it looks like it doesn't work in Opera, so count me out anyway.

      --
      Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
    10. Re:New: Google Notebook by NickFitz · · Score: 1

      I was surprised not to see Google Notebook as one of the first answers

      Given that the very first response has the title "You want Google Notebook and a PDF printer", I'm surprised at your surprise.

      (OT, but why did the guy who created Slashdot's CSS format inline <quote> as a block element? That's what <blockquote> is for.)

      --
      Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
    11. Re:New: Google Notebook by Anonymous Coward · · Score: 0

      Sick of fresh, professionally designed, platform-independent, free Web 2.0 applications?

      Yes, frankly. It's tiresome.

    12. Re:New: Google Notebook by Anonymous Coward · · Score: 0

      Not to worry then, Microsoft has you covered for slow, shitty, proprietary, Windows-only Web 0.1 web applications. Enjoy!

  35. Webforia Organizer .... by XenoPhage · · Score: 1

    Back in the day there was this cool little program called Webforia Organizer. I somehow wound up on the Beta team for it and got to use it extensively. This program was really cool, it clipped pages, kept local copies, was searchable, etc. I loved it. Unfortunately, it was built on IE 5, but then again, Firefox wasn't released back then...

    Apparently Webforia went out of business some time ago and the software no longer works.. I believe it had limited functionality with IE 6, but not enough to make it worthwhile.. No clue if it would even work with IE 7...

    I still have my copies... I really wish it worked. I had amassed a huge database of research that's basically useless now.. (although, since it clipped them as web pages, I suppose I can, technically, view them... But the names were based off GUIDs, so identifying the pages is a little rough...)

    --
    XenoPhage
    Technological Musings
  36. Bookmarks plus a whiteboard by Metasquares · · Score: 1

    I'm very careful about managing my bookmarks, only adding what I'm actually interested in at a given moment and removing the link once it's gone. Since "the literature" required for my research primarily consists of journal and conference publications, the locations of which are fairly immutable, I don't usually worry about the URLs becoming invalid.

    If I get any "aha" ideas while reading these papers, I record them in a whiteboard or notebook. Eventually, I have the paper distilled to three or four of these and I no longer need to read the paper to think about the ideas presented therein.

    Basically, if you manage your bookmarks well and take good notes, that's all you need :)

    I'm a Ph. D. student in Computer Science with an INTJ MBTI type. YMMV, depending on profession ("research" means different things to different people) and personality ('P' types tend to organize themselves differently).

  37. Evernote by blighter · · Score: 2, Informative
    I use Evernote: http://www.evernote.com/.

    It's a program that allows you to easily save a copy of just about anything (certainly anything on the web...) with links to the original and everything else. The notes are automatically stored in chronological order for browsing. You can also apply tags to your liking and it has full search capabilities as well. It's free for the regular version, if you want to import handwritten notes and have them be searchable as well there's a charge.

    It's awesome and I think fits your needs exactly, or at least I use it to meet the needs you described and I've had no problems with it.

    Now if I could just force myself to go back and do something with the research later...

    P.S. There's a writer in The Atlantic named James Fallows who has a column on useful technology tools. That's where I first learned of Evernote. He had several other suggestions to fit the bill in that column and more generally, he's usually worth a read.

  38. Dual support by advid.net · · Score: 1
    • http://del.icio.us/ for sites of interest
    • GMail drafts for urls along with results and saved documents
  39. Firefox Bookmarks by Tronster · · Score: 2, Interesting

    Bookmarks and histories aren't the answer -- they're not very good for searching, the UI isn't very good for, say, adding notes, and they don't work offline. Also, stale URLs are a huge problem
    I agree with all of the shortcomings time961 posted, but despite these I have personally found bookmarking to work rather well for my projects. The pipeline is like this...

    In my bookmarks folder I have a "Projects" folder.
    Within my "Projects" folder I have an alphabetic listing of folders with each project's name.
    If the project is small, I fill it directly with book marks. I do take the time to add notes, because if the URL does go stale, the notes will let me know what I'm now missing. More often than not, missing information can be replaced in the future with another URL that has the same or more up-to-date information. Additionally Google Desktop searches my bookmarks file, so I just double-click ctrl and can search via keywords that way.

    This whole setup is a bit of a hack, but it's worked. I'm hoping either Firefox 3.0 will have a fantastic bookmark manager or a plug-in author creates something truly wonderful for the existing bookmark system.
  40. Opera Notes by Gorgeous+Si · · Score: 2, Insightful

    In Opera you can select some text in a webpage, then right-click and select "Copy to note" (Shift-Ctrl-C). Notes are stored in a panel, and double clicking a note will load the webpage it came from. Handy.

  41. Zotero by titchy · · Score: 1

    great research manager, just type zotero in google.

  42. Re:PDF (Mod up parent) by Drubber · · Score: 1

    This looks great. Thanks for posting.

  43. MediaWiki, plus my own pico-Google by pestie · · Score: 1

    I run a full-blown install of MediaWiki on a small server behind my firewall. I wanted to learn MediaWiki markup and I thought it would be a useful tool for organizing and annotating all the crap I come across on the web that I'm going to want to find later.

    I also wrote a sort of pico-Google in PHP/MySQL a couple years back, and I still use that regularly. It's a sort of searchable bookmark database. I feed it a URL, it goes out to the page and sucks down all the text, normalizes it, and breaks it into keywords. It then stores the keywords in the database. It's got well over 3,000 pages in it at this point and even on my little 1 GHz machine with 512M of RAM, it hauls ass. I used to have a separate component that went out and checked each link every night to see if it had moved or changed, but I gave up on that part when I decided the whole thing needed a rewrite anyway. And, as is typical for these hobby projects, I haven't yet gotten around to it. I want to implement multi-word text-string searching (i.e. searching for "a string of words in quotes"), a few Google-esque functions like inurl:, and make the interface not look like total crap the way it does now. Maybe someday...

    So, at this point, if there's information I consider especially worth saving or looking at, I dump it into my personal Wiki. If it's something I just think I might want to use later for some reason, I throw it in the bookmark database.
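
    The pipeline described above (take a page, pull out the text, normalize it, break it into keywords, store the keywords) can be sketched roughly as follows. This is not the PHP/MySQL tool itself: it uses Python with SQLite instead, every function and table name is made up for illustration, and fetching the page over the network is left out, so the caller passes in HTML that has already been downloaded.

```python
import re
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def keywords(html):
    """Normalize a page to a set of lowercase alphanumeric keywords."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.parts).lower()))


def open_index(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS keyword "
               "(word TEXT, url TEXT, PRIMARY KEY (word, url))")
    return db


def add_page(db, url, html):
    """Replace any earlier keywords for this URL with the current ones."""
    with db:
        db.execute("DELETE FROM keyword WHERE url = ?", (url,))
        db.executemany("INSERT INTO keyword (word, url) VALUES (?, ?)",
                       [(word, url) for word in keywords(html)])


def search(db, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    marks = ",".join("?" * len(words))
    rows = db.execute(
        f"SELECT url FROM keyword WHERE word IN ({marks}) "
        "GROUP BY url HAVING COUNT(DISTINCT word) = ? ORDER BY url",
        (*words, len(set(words))),
    )
    return [row[0] for row in rows]
```

    Because only individual keywords are stored, this sketch shares the limitation mentioned above: it cannot answer a quoted multi-word phrase search, which would need word positions or the full text.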

  44. Google Everything?! by Anonymous Coward · · Score: 0

    Where do I get THAT?

  45. Clever cut&paste by january · · Score: 1

    The most annoying part of web-based research was for me always copy & paste. Each month I am doing a literature digest from my scientific field, which requires me to copy titles, abstract, urls of selected articles. And each journal has another format / layout, furthermore, you sometimes need more than this information, so that manual copying is necessary. Copy, switch to the editor window, paste, switch to the browser window, where the hell am I, copy, ...

    Therefore, I have written myself a small tool to record all copy operations automatically. Essentially, anything that I mark (since this means "copy" in Linux) gets *added* to a clipboard. I am not going to publish it, though, because it was written in perl/tk and seems to work only with particular versions of perl/tk, but as an idea it greatly improved the process of storing my web searches. I tried to find a ready tool that does just that, but I could not find anything.

    Cheers,
    j.
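
    The idea of adding every selection to a running clipboard instead of overwriting it can be sketched independently of any GUI toolkit. This is not the Perl/Tk tool described above; the class and function names are hypothetical, and whatever actually reads the current selection is passed in as a plain function.

```python
class ClipLog:
    """Accumulate every distinct selection instead of overwriting the last one."""

    def __init__(self):
        self.entries = []

    def record(self, text):
        """Append a selection; ignore empty text and immediate repeats."""
        text = text.strip()
        if not text or (self.entries and self.entries[-1] == text):
            return False
        self.entries.append(text)
        return True

    def dump(self, separator="\n\n"):
        """Join everything collected so far, ready to paste into an editor."""
        return separator.join(self.entries)


def poll(read_selection, log, rounds):
    """Call read_selection() `rounds` times and record whatever it returns."""
    for _ in range(rounds):
        log.record(read_selection())
```

    Skipping immediate repeats matters because a polling loop will see the same selection many times before the user marks something new.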

  46. del.icio.us by finkployd · · Score: 2, Informative

    I'm normally not a web 2.0 bandwagon type of person, but del.icio.us is probably the most useful thing for this that I have ever run across.

    pros:
    -tagging
    -descriptions
    -accessible from anywhere
    -really simple to add to (with firefox plugin)
    -searchable

    cons:
    -web pages are ephemeral
    -del.icio.us itself could go away someday, and I'm not sure how to back it up locally

    The best way to address the issue of web pages being ephemeral is to, as others have said, print to pdf. You mac people have it nice in this regard, but it is not hard to set up on windows or *ix.

    I also mentioned that del.icio.us was searchable, but only the tags, titles, and descriptions. I fully expect Google to someday roll out a similar service that lets you search through the pages you have tagged. That would be very useful.

    I also like the suggestion of a personal wiki, but more for keeping track of little "tips and tricks" that I stumble upon rather than entire web pages.

    Finkployd

    1. Re:del.icio.us by tweek · · Score: 1

      or as I mentioned previously, you could run Insipid ( http://www.neuro-tech.net/insipid/ ) on a webserver of your own. The snapshotting feature alone makes it worth while.

      --
      "Fighting the underpants gnomes since 1998!" "Bruce Schneier knows the state of schroedinger's cat"
    2. Re:del.icio.us by finkployd · · Score: 1

      Very nice, I was not aware of this. I'm going to have to try it out.

      Thank you

  47. Text by value_added · · Score: 1

    How do others deal with organizing the results of browsing?

    I do this as regularly as anyone.

            lynx -dump URL > ~/docs/filename

    or if you're organised

            lynx -dump URL | add_to_database_script

    What's important to me is the content itself, not the "web content", so an attorney, for example, would take a very different approach (typically a hard copy that can be filed, duplicated, etc.). Note that unless you work for a law firm or a well-run business, managing paper is like a dog walking on its hind legs: it's rare to see it done, and when you do, it's not done very well. The same applies to bookmarks, which have the additional problem of referencing pages that may get moved or simply disappear at any time.

    In my experience PDF and HTML are like cousins who should refrain from getting too close to each other. By comparison, processing simple text is straightforward.
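
    The comment names an add_to_database_script without showing it. One guess at what such a script might look like is below: it reads a plain-text dump from standard input and stores it in SQLite under a label. The table layout, the function names, and the command-line arguments are all assumptions for illustration.

```python
import sqlite3
import sys
from datetime import datetime, timezone


def open_notes(path):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS dump "
               "(id INTEGER PRIMARY KEY, label TEXT, saved_at TEXT, body TEXT)")
    return db


def add_dump(db, label, body):
    """Store one text dump with a UTC timestamp and return its row id."""
    saved_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with db:
        cur = db.execute(
            "INSERT INTO dump (label, saved_at, body) VALUES (?, ?, ?)",
            (label, saved_at, body),
        )
    return cur.lastrowid


def find(db, phrase):
    """Case-insensitive substring search over the stored text."""
    rows = db.execute(
        "SELECT label FROM dump WHERE instr(lower(body), lower(?)) > 0 ORDER BY id",
        (phrase,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: <text dump on stdin> | python add_to_database.py notes.db some-label
    add_dump(open_notes(sys.argv[1]), sys.argv[2], sys.stdin.read())
```

    Since the stored body is plain text, a simple substring match is enough for searching, which fits the point above that processing simple text is straightforward.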

  48. Google Notebook by finkployd · · Score: 1

    I fully expect Google to someday roll out a similar service that lets you search through the pages you have tagged. That would be very useful.

    Guess I should have read all of the comments in this story before replying. I would have learned about Google Notebook which looks like exactly what I was thinking of.

    Finkployd

  49. Email by kramulous · · Score: 1

    For web surfing I right-click and use 'send to'. I use the 'from' (me) and 'subject' to set appropriate filters on Thunderbird. This is then sent to various archive folders in the Thunderbird client and I archive occasionally. Also, highly searchable ... sometimes write a small blurb in the body.

    I stick to Endnote for papers. It is a little more time consuming, but it is better in the long term.

    --
    .
  50. Personal wiki by Average · · Score: 1

    Like a couple of other people suggested, I have a personal installation of MediaWiki. Actually, several installs. One for my own personal info. One for thesis research (shared with a couple fellow students and my advisors), one for my sideline web-development biz, one with work documentation. Lots of uploaded files, too. When I get a new gadget, the manual (PDF hopefully, but scanned if I have to), a scan of the receipt, and my setup notes all go to a page on it. Random piece of software I hadn't heard about before, but don't have time to play with? Gets its own page, and then a link from a "software I should check out someday" page.

    One critical thing is to be able to throw in just about any little bit of text information with no setup, from anywhere with a net connection. Unlike more rigid information management systems, it usually doesn't matter that there isn't a template for this kind of information. The other thing is searchability. The MediaWiki/MySQL text search isn't great, but it's enough.

    Now, there's lots of cruft in my wiki. My old airline flight schedules. Meeting notes. But, unlike a raft of little paper notes, a lot of unnecessary wiki pages are pretty harmless if you've got lots of server hard drive space.

  51. Why on earth would you want to print a webpage? by hcdejong · · Score: 1

    Instead save the page to disk. Much more accessible:
    - full-text search, on one or multiple files
    - text and other elements can be copied off the page
    - links still work

    Pity Windows doesn't attach comments to a file: in Mac OS 9 at least, if you saved a Web page, the page URL would end up as a comment (viewable by doing Get Info on the file).
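
    Where the file system does not record the source URL, one workaround is to write it into a small sidecar file next to the saved page. The sketch below is only an illustration of that idea: the function names and the JSON layout are assumptions, not a feature of any browser or operating system.

```python
import json
from datetime import date
from pathlib import Path


def save_page(directory, name, html, url, note=""):
    """Write the page plus a JSON sidecar recording where it came from."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    page = directory / f"{name}.html"
    page.write_text(html, encoding="utf-8")
    meta = {"url": url, "saved": date.today().isoformat(), "note": note}
    # The sidecar has the same base name as the page, with a .json extension.
    page.with_suffix(".json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return page


def source_of(page):
    """Look up the original URL of a page saved with save_page()."""
    sidecar = Path(page).with_suffix(".json")
    return json.loads(sidecar.read_text(encoding="utf-8"))["url"]
```

    Keeping the metadata in a separate plain file means the saved page itself stays untouched and the pair survives a copy to any operating system.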

  52. WikidPad by AnonymousDot · · Score: 1
    Wikidpad is a stand-alone wiki that works well for this. I'm using it every day to record all my researches. It's written in Python, opensource, and supports Graphviz (create graphs and orgchart on-the-fly with just text).

    Wikipedia info on WikidPad.

    Official download page

    Productivity note: Create your wikis in Original Sqlite and setup Google Desktop Search to scan .wiki files (with Larry's Any Text File Indexer).

  53. Zoot by hb253 · · Score: 2, Informative

    Zoot http://www.zootsoftware.com/ may meet your needs.

    --
    Self awareness - try it!
  54. Cheap dig. by mattpointblank · · Score: 1

    "(and I yearn for a day when browsers can reliably print what's on the screen, instead of cutting it off at the margin because some designer doesn't understand layout!)"


    This is so unfair. Are you a webdesigner? Are you even a designer at all? If you've ever done both print design and web design, you will appreciate how much more challenging web design is. Imagine designing for a canvas that is completely unique for every viewer, rather than, say, 10,000 identical copies of a newspaper or print ad. You have to allow for every possible screen resolution, browser platform, colour depth, operating system, javascript status, flash version, etc etc etc, and now you're insulting us because on top of that we "don't understand layout" because we can't make it print prettily as well? Thanks a bunch.
    1. Re:Cheap dig. by Anonymous Coward · · Score: 0

      "Designers" who don't understand the limitations and potentials of the web should stick with paper.

      There are a lot of jobs out there for hardcopy designers. Stay with what you know.

      Web design is a completely different realm; there is almost nothing from paper design that crosses into web design without major changes. Even the color theory is different.

      There is a word for a highly trained person who attempts to apply the principles of traditional hardcopy design to web pages. He is called an idjit.

  55. OneNote 2007 by DragonWriter · · Score: 2, Informative

    At risk of getting modded down for recommending a Microsoft product here, you might want to look into OneNote 2007 (or one of the versions of Office 2007 that include it.)

    It comes with a "print to..." driver so you can print to your OneNote notebook, and provides a good framework for organizing your notes, and you don't need to kill as many trees as printing to paper.

    Another possibility is to get a PDF printer; you can either just organize your notes with file system folders, or if you want something a little bit more useful to track relations between different items, you can use something like PersonalBrain for organization.

  56. Google Notebook by magus_melchior · · Score: 1

    In the same vein, Google Notebook is also quite nice, and comes with a Firefox extension as well. No images, however (I personally don't use them), but fairly simple, and you don't store everything locally. Worth a look, IMO.

    --
    "We are Microsoft. You shall be assimilated. Competition is futile."
  57. soon to launch: SwitchBooks by Anonymous Coward · · Score: 0

    There's a startup in Santa Barbara called "SwitchBook" that has a browser plugin which is all about supporting complex searches. I expect it to be available quite soon. It allows you to cut and paste content from pages, placing them into a scrapbook sidebar, and as you build references an internal search engine starts finding more info based upon the content you're grabbing. Should be a winner...

  58. KDissert by electr01nik · · Score: 1

    You might want to check out the kdissert program. It runs in KDE, but if you have the proper libraries and dependencies, you should be able to run it on any WM.

    The description follows (taken from Ubuntu 7.04, I'm sure the description is the same for other distros as well)

    kdissert is a mindmapping tool for supporting the creation of complex documents: dissertations, theses, presentations, and reports. It supports pictures and features several document generators: LaTeX reports, LaTeX slides (based on Prosper and Beamer), OpenOffice.org documents, HTML, and plain text.

    A mindmap is a multicolored and image centered radial diagram that represents semantic or other connections between portions of learned material. For example, it can graphically illustrate the structure of a thesis outline, a project plan, or the government institutions in a state. Mindmaps have many applications in personal, family, educational, and business situations. Possibilities include note-taking, brainstorming, summarizing, revising and general clarifying of thoughts.

    Though this application shares some similarities with general-purpose mindmapping tools like FreeMind or Vym, the very first goal of kdissert is to create general-purpose documents, not mindmaps.

    The kdissert website is located here. The program was designed to manage and organize dissertations, which from what you described, is probably very similar to the work you're doing.

    If you're looking for a tool more oriented towards 'mindmapping', there is Vym (website), which seems very interesting, and FreeMind (website), written in Java, though I have no experience with it.

    From what you described, and the solutions others are offering, it sounds like you are more interested in a 'general-purpose' document where you can list your sources, and if needed, map links, connections, and references to the various sources you're using. Vym might be more to your taste, since the layout provides a great deal of information in a very (imho) visually appealing format, with the ability to link objects together in complex ways (such as documenting various reference sources in a paper, where they appear and/or are referenced in other works, etc.) Tools like Vym and KDissert are really only limited by your own mind, though the programs differ enough that each one should be evaluated individually, since all three accomplish similar goals in very different ways.

    ~ow3n

  59. What I use... by tweek · · Score: 1

    is called Insipid - http://www.neuro-tech.net/insipid/

    It's basically a delicious clone but the feature I love the most is the snapshotting one. That way I never have to worry about the information going missing. It's been very useful for things that are hosted on university servers that disappear when the student leaves. Some of my bookmarks are private while others are public. They provide a javascript snippet you can put in your toolbar to bookmark the current page.

    It requires a server of your own to host it but it works for me.

    --
    "Fighting the underpants gnomes since 1998!" "Bruce Schneier knows the state of schroedinger's cat"
    1. Re:What I use... by phaggood · · Score: 1

      > Insipid ... snapshotting feature

      Page Save by Pearl Crescent has a nifty page-saving feature that you could use in Nautilus for an at-a-glance view of multiple web pages.

  60. BibTeX managers, and other bibliography by cretog8 · · Score: 1
    I'm writing academic papers using LaTeX, and finally reorganized my reference management. Most of the references I use these days are available electronically, and I've started dumping them all in the same directory, "bibliography". Then I use BibDesk (I'm on a Mac) to categorize and link to each file.

    For me using LaTeX, this is especially handy given that I'll want to cite many of these in actual papers. However, even for things I'm not going to cite, it helps a good bit in organization. You can search by authors, keywords, dates, whatever. I use keywords to tag whatever subjects it refers to (as far as my interests identify subjects), and an extra keyword if I have a specific project/paper in mind for it.

    If you don't use a Mac, there are similar tools on other platforms.

  61. Copy URL + helps a lot by rduke15 · · Score: 2, Informative

    I use a text file and the Firefox Copy URL + extension:

    Copy URL + :: Firefox Add-ons
    "The Copy URL+ extension enables you to copy to the clipboard the current
    document's address along with additional information such as the document's
    title, the current selection or both."
    https://addons.mozilla.org/en-US/firefox/addon/129

    It installs a context-menu, allowing you to copy any or all of page title, URL, and most importantly: the text currently selected.
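
    The text-file half of this workflow is easy to script as well. A rough sketch, assuming you paste or pipe the title, URL, and selected text into a small helper; format_entry and append_entry are hypothetical names and have nothing to do with the extension's own code:

```python
from pathlib import Path


def format_entry(title: str, url: str, selection: str = "") -> str:
    """Build one plain-text note: title, URL, then the quoted selection."""
    lines = [title, url]
    if selection:
        # Prefix each selected line with "> " so quotes stand out from notes.
        lines.extend("> " + line for line in selection.splitlines())
    # A blank line separates this entry from the next one.
    return "\n".join(lines) + "\n\n"


def append_entry(notes_file: Path, title: str, url: str, selection: str = "") -> None:
    """Append the formatted note to the end of the notes file."""
    with notes_file.open("a", encoding="utf-8") as fh:
        fh.write(format_entry(title, url, selection))
```

    Keeping everything in one append-only text file means grep (or any editor's search) covers titles, URLs, and quoted text at once.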

    At other times, I use bookmarks in a new folder specific to the subject. You can add keywords to bookmarks in FF.

  62. JumpKnowledge - web-based capture & annotation by yaakovsash · · Score: 1
    You can use JumpKnowledge located at http://jkn.com/ to capture and annotate any web page. It is the perfect tool for research, online collaboration, and for quickly emailing a web page to your colleague with your comments inside.

    In a nutshell:
    • JKN is free, web-based, cross-browser, and registration-optional.
    • JKN supports frame-sets, secure web pages (https), and multi-web pages.
    • With the optional FireFox add-on, you can annotate password-protected web pages.
    • The resulting Annotation can be emailed, blogged, saved, bookmarked (including delicious) and printed.
    • Annotations can be saved as public or private.
    Here is an annotation example of this very discussion with my (somewhat) insightful comments:

    http://jkn.com/View?j=814255.882997363546&t=03/

    Full disclosure: I am the founder and chief rabble-rouser for JKN.
    --
    Founder, JumpKnowledge - www.jkn.com
  63. File -Save As in IE7 (...so they tell me) by andcal · · Score: 1

    In IE 7.0, if you click File -> Save As, you can save the page as a .mht file. IE refers to it as a "Web Archive, Single file". It includes the HTML and graphics in a single file, and doesn't seem to munge the page up (at least not any worse than IE does in the first place, that I have noticed).

    Er,uh, at least, that is what I hear, from some losers I know who use IE7 (What was I thinking? I must have forgotten where I am).

    --
    --something witty
  64. Websticky by msmiffy · · Score: 1

    This is something I wrote so that I could do away with Firefox bookmarks altogether and have the same bookmarks on my laptop and desktop.

    It's server-based and works with a Firefox toolbar (there's also an ancient and crude IE toolbar too) or just straight through the Web.

    My wife and I have been using it for over a year now, I never got around to doing anything like a public release. You're more than welcome to give it a go, if you wish.

    http://websticky.net/

    Let me know if you'd like an "invitation" (yeah, got that idea from a certain Search provider turned Mail provider). I'm not entirely sure that the invite system survived the change of hosting providers.

  65. the badness of bookmarks. by sowth · · Score: 1

    Worse yet, sometimes I'll bookmark a page and go back to it, and the page will be gone, the site down, or changed to something which isn't useful to me.

  66. Furl or Spurl by fcc3 · · Score: 1

    I use http://www.furl.net/ which satisfies the requirement mentioned: "a good solution would have to keep copies, not just references". Furl lets you save the text of the page you have visited, as well as the link. It saves them on the furl server, so you can furl from any machine. I notice that Furl has become less popular. I don't know whether people moved to http://www.spurl.net/ instead. I think the pdf solution may be best for the long term.

  67. File....System! by bill_mcgonigle · · Score: 1

    You can either keep what you save in some sort of logical arrangement,

    That's the best idea. The filesystem is the most robust database I've found and hierarchies work well for me.

    This story would be saved in 'computers > networking > internet > sites > slashdot.org > stories > ask > obvious', for instance.
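
    A hierarchy like that maps directly onto a directory path. A small sketch, assuming the categories arrive as a list of strings; archive_path and the "archive" root folder are illustrative names only:

```python
from pathlib import Path


def archive_path(root: Path, categories: list[str], filename: str) -> Path:
    """Turn a category hierarchy into a nested path under root."""
    # Replace path separators so a category name cannot escape its folder.
    parts = [c.strip().replace("/", "_") for c in categories if c.strip()]
    return root.joinpath(*parts, filename)


def ensure_parent(path: Path) -> Path:
    """Create the folders leading up to path and return path unchanged."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

    For example, splitting 'computers > networking > internet' on '>' and passing the pieces in yields archive/computers/networking/internet/ plus the file name.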

    Anything I come up with goes on the blog. In theory, then somebody could do the same (were there people who cared to read what I write).

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  68. Just googling around, or doing proper research? by Bud · · Score: 1

    It's not worth archiving everything. But what you do archive, you should archive properly and carefully. All interesting information falls into two groups:

    1. Stuff to index and archive - Reports, newspaper articles, HOWTOs, manuals, books, specifications and other valuable and above all referencable stuff should be indexed and archived using the tool of your choice. Unfortunately, this is hard to automate and takes several minutes of your valuable time. Fortunately, this means that you won't have the time to archive all the useless crap, which naturally belongs in the second category...
    2. Stuff to leave to Google - Just remember the terminology or jot down the relevant keywords in an e-mail to yourself. Then use Google again when necessary. No need to store the URLs, Google does it for you.

    Indexing and archiving and maintaining their own bibliography is what researchers do to stay on top of things and there is no way around it (unless you are in a position to use your underlings for this purpose.) You may want to capture the author(s), date, title, URL, some keywords, a clickable or copyable path to your local PDF copy, and perhaps the abstract or executive summary.
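
    To make the list of fields concrete, here is one possible shape for such an index entry. It is only a sketch: the Reference class and find_by_keyword are hypothetical, and any real bibliography tool will have its own format:

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """One archived document, with the fields suggested above."""
    title: str
    authors: list[str]
    date: str                     # e.g. "2007-06-01"; kept as text for simplicity
    url: str
    keywords: list[str] = field(default_factory=list)
    local_pdf: str = ""           # path to the local PDF copy, if any
    abstract: str = ""


def find_by_keyword(refs: list[Reference], keyword: str) -> list[Reference]:
    """Return every reference tagged with keyword (case-insensitive)."""
    wanted = keyword.lower()
    return [r for r in refs if wanted in (k.lower() for k in r.keywords)]
```

    Even a flat list of records like this, searched by keyword, covers most of what the index needs to do; the time goes into filling in the fields, not into the tooling.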

    Over time you will learn what to archive yourself and what to ignore. You will also find that for every handful of documents, only one is important and insightful and worth archiving while the others simply reference or paraphrase the original document. This is basically the 80/20 rule and it holds regardless of the subject area. -- Murphy's law states that you will only find this document after you've indexed all the others... but that's life for you. ;)

    --Bud

  69. which bug by Krishnoid · · Score: 1

    KDE bug 140983 .