I'll second Zotero as well. I'm also a doctoral student and use it to manage my research. As for copying between computers, I've never had a problem: I set it up to store everything under a specific folder rather than the default one under the Firefox profile, and I copy that folder between computers.
As for citation formats, I did have a few problems until I figured out that the easiest thing to do was to leave everything at the defaults. I was exporting to BibTeX format, importing it into LyX, and wanting to change the style slightly. I could have created a Zotero XML-based export style, but since all I wanted was a small change to the default one, it was easier to leave it as it was.
That brings up the one thing I would like changed or added: use or import normal BibTeX and/or EndNote styles rather than creating another way of doing the same thing.
There may be another solution that fits your needs if all you want is to keep track of a lot of files, but if you also make notes on them, tag them, say which is related to which, and even create snapshots of webpages etc., it's just great. I'm also using it as a simple store for various piles of open source code I may want to find, use, or refer to later.
First off, the 25X compression is either pure marketing hype or an average for some 'real world' scenario that they dreamed up. Either way, it's not a hard and fast figure.
Having said that, I believe this is similar to another backup data compression algorithm I saw a presentation on a couple of days ago. There are two parts:
1) A database of unique chunks of data.
2) A blueprint of index numbers that define how the data fits together.
It looks at the data stream in X-bit chunks. If a chunk is unique, it stores the chunk in the database along with an index pointer to it; if the chunk has been seen before, it just stores the index pointer.
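A minimal sketch of those two parts, not the vendor's actual implementation. It assumes fixed-size 8KB chunks and uses a SHA-256 digest as the lookup key for "seen before"; the names `dedup`, `restore`, `store`, and `blueprint` are made up for illustration, and treating equal digests as equal chunks is itself an assumption.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed fixed chunk size (8KB)


def dedup(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and return (store, blueprint).

    store     -- list of unique chunks (the "database")
    blueprint -- one index into store per input chunk, in input order
    """
    store = []      # unique chunks, in first-seen order
    seen = {}       # chunk digest -> position in store (assumed lookup structure)
    blueprint = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        key = hashlib.sha256(chunk).digest()
        index = seen.get(key)
        if index is None:
            # First time this chunk appears: store it and remember its index.
            index = len(store)
            seen[key] = index
            store.append(chunk)
        # Seen or not, the blueprint only ever records the index.
        blueprint.append(index)
    return store, blueprint


def restore(store, blueprint) -> bytes:
    """Rebuild the original stream by following the blueprint."""
    return b"".join(store[i] for i in blueprint)
```

With five input chunks of which only two are distinct, `store` ends up holding two chunks and `blueprint` holds five small integers, which is where the saving comes from.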
Obviously, as this index gets bigger it takes longer to search through, but there is also a better chance that an incoming chunk has been seen before. If this is done in a Disk-2-Disk-2-Tape situation, it can take the backup of the server(s) onto a HDD and then run this algorithm at its leisure to get the compressed version for tape. I'm assuming, as they are marketing this for TB levels of data, that they have this one worked out - at least for this level of data.
Another issue is that you get less and less compression as your index number takes up more bits (i.e. as you store more and more unique chunks). This isn't going to be a practical problem in the near future, as the one I was looking at was taking 8KB chunks. For the index to be the same size as the data it's replacing (8KB, or 65536 bits), you need at a minimum 2^65536 unique chunks, which is on the order of 10^19714 exabytes of unique data. This is simplified, but even if you allow a couple of orders of magnitude as a fudge factor for overhead in storing the index numbers, you aren't going to run into this problem soon.
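As a back-of-envelope check on that break-even point, here is a rough calculation. It assumes the index is a plain binary number just wide enough to count the unique chunks and that an exabyte is 10^18 bytes; `index_bits` is a hypothetical helper, and the 1 PiB workload is only an illustration.

```python
import math

CHUNK_BYTES = 8 * 1024        # 8KB chunk, as in the post
CHUNK_BITS = CHUNK_BYTES * 8  # 65536 bits


def index_bits(unique_bytes: int, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Smallest index width (in bits) that can number every unique chunk."""
    n_chunks = max(1, -(-unique_bytes // chunk_bytes))  # ceiling division
    return max(1, (n_chunks - 1).bit_length())


# Index width for 1 PiB of entirely unique data (hypothetical workload).
pib_bits = index_bits(2 ** 50)

# Break-even: an index as wide as a chunk can number 2**CHUNK_BITS chunks.
log10_chunks = CHUNK_BITS * math.log10(2)
# Unique data = chunks * bytes per chunk, expressed in exabytes (10**18 bytes).
log10_exabytes = log10_chunks + math.log10(CHUNK_BYTES) - 18

print(f"index width for 1 PiB of unique chunks: {pib_bits} bits")
print(f"break-even unique chunks: about 10^{log10_chunks:.1f}")
print(f"break-even unique data: about 10^{log10_exabytes:.1f} exabytes")
```

Even a full pebibyte of unique 8KB chunks needs only a 37-bit index against a 65536-bit chunk, so the index width is nowhere near eating the savings.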
There is also a problem if you don't have many repeating chunks. In fact, there may be an *increase* in file size in that case, as you now have the overhead of the database and its index pointers to worry about.
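A rough size model makes the overhead concrete. This is only a sketch: it assumes 8KB chunks and a fixed 8-byte index entry per input chunk (the real format is unknown), and `deduped_size` and `original_size` are hypothetical names.

```python
def original_size(total_chunks: int, chunk_bytes: int = 8 * 1024) -> int:
    """Bytes in the raw stream."""
    return total_chunks * chunk_bytes


def deduped_size(total_chunks: int, unique_chunks: int,
                 chunk_bytes: int = 8 * 1024, index_bytes: int = 8) -> int:
    """Bytes stored: every unique chunk once, plus one index entry per input chunk.

    index_bytes=8 is an assumed fixed-width (64-bit) index entry.
    """
    return unique_chunks * chunk_bytes + total_chunks * index_bytes


def ratio(total_chunks: int, unique_chunks: int) -> float:
    """Compression ratio under this model (values below 1.0 mean the output grew)."""
    return original_size(total_chunks) / deduped_size(total_chunks, unique_chunks)
```

Under these assumptions, 1000 chunks that are all unique go from 8,192,000 bytes to 8,200,000 bytes, a small increase, while 1000 chunks with only 40 unique ones come out a little over 24 times smaller.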
So what's the answer to the scoffers' 'can you feed its output to itself?'
The answer would probably be yes, but each time through you have fewer repeating chunks and therefore more unique ones, so the database overhead eventually becomes a problem. That is, you keep running its output through itself and it eventually comes near the theoretical minimum and oscillates, getting bigger, then smaller, then bigger again.