File Organization — How Do You Do It In 2011?
siddesu writes "After 30 years of being around computers, I have, like everyone else, amassed a huge amount of files in huge amount of formats about a huge amount of topics. And it isn't only me: the family now has a ton of data that they want managed and easily accessible. Keeping all that information in order has always been a pain, but it has gotten harder as the storage has increased and people and files and sizes have multiplied. What do you folks use to keep your odd terabyte of document, picture, video and code files organized (that is, relatively uniformly tagged, versioned, searchable and ultimately findable) without 50 duplicates over your 50 devices and without typing arcane commands in a terminal window? I found this discussion from 2003 and this tangentially relevant post from 2006. How have things changed for you in 2011? And how satisfied is your extended family with the solution you have unleashed upon them?"
.. seriously.. they still work for me.
I’ve got a 12TB file server (~6TB filled). It’s arranged as follows:
documents/
incoming_downloads/ (before you ask.. yes.. _legit_ downloads)
media/
media/video/
media/video/movies/
media/video/tv_shows/
media/video/tv_shows/some_tv_show/
media/video/standup
media/video/etc..
media/music/
media/images/
media/images/various_subfolders/
code/
virtual_machines/
tmp/
backup_links/
backups/
That’s always been enough for me. Never got into all this tagging/meta data stuff. If there’s anything I’d ever want to search on... I put it in the file name. Indexed every night via slocate.
backup_links is part of my hacked together backup system.
The thing is RAID 6, set up so two drives can fail without loss of data. I see this as adequate “backup” for stuff that is replaceable (the large portion of my media is rips of DVDs I own... so although it would be a huge pain in the ass to re-rip them all... it’s not impossible). Stuff that is irreplaceable, I back up to separate hard drives (via hot swap trays).
I leave one backup drive plugged into the machine, and keep the other elsewhere. I periodically swap these drives. I have a script that just rsyncs the files and directories pointed to in backup_links (the irreplaceable ones) to the currently plugged in drive (and yes I verified that I’m not getting a backup of my links ;p). This way I always have one drive that has a pretty recent backup (runs nightly), and one drive that has at most a month or so old backup if the plugged in one fails for some reason.
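For anyone who wants to try the backup_links idea, here is a minimal Python sketch of the "follow the links, then rsync the real data" step. It is not the poster's actual script: the function names and the example destination are hypothetical, and it only builds the rsync command rather than running it.

```python
import os

def resolve_backup_links(links_dir):
    """Return the sorted real paths that the symlinks in links_dir point to."""
    targets = []
    with os.scandir(links_dir) as entries:
        for entry in entries:
            if entry.is_symlink():
                real = os.path.realpath(entry.path)
                # Skip dangling links so the backup run does not fail on them.
                if os.path.exists(real):
                    targets.append(real)
    return sorted(targets)

def build_rsync_command(targets, dest):
    """Build (but do not run) an rsync invocation for the resolved targets."""
    # The resolved paths are passed as sources, so rsync sees the real
    # files and directories instead of the links inside backup_links.
    return ["rsync", "-a", *targets, dest]

# Hypothetical usage from a nightly job:
#   cmd = build_rsync_command(resolve_backup_links("/srv/backup_links"),
#                             "/mnt/backup_drive/")
#   subprocess.run(cmd, check=True)
```

Resolving the links first is one way to make sure the backup contains the data rather than a copy of the links themselves.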
backups is backed up files from other machines.
Keeping everything in one place helps with the organization I think. Most of the other machines on this network are basically just OS installs. All the real files are on the file server. My desktop runs off a small SSD, which is not even half filled.
I think you left a directory out. ;)
What's wrong with typing arcane commands at the terminal? And who said I unleashed that upon my extended family?
Fap and non-fap, and must-not-fap.
i use windows home server and fill the shares with nicely named subfolders. around 6tb by now.
one could use linux for that, too. just make sure it's something that hides physical disks away from you.
If it hasn't been accessed in 3 years then it gets deleted [porn files 6 months].
For personal documents and such, I don't even worry about it any more. Just make sure that I don't put anything important in /tmp, and then let Spotlight find it for me. The only problem is when you have several versions of the same document, and they all hit on the Spotlight search. So you just need to discipline yourself to not have a zillion copies of them (let Time Machine do that for you).
Directories are fine until you forget what your taxonomy is, plus the "let's build an arbitrary genus on the fly" syndrome.
Nah. Ad hoc organizing when I feel like it, but Spotlight finds them for me no matter how badly I mess it up.
Hierarchical named directory structures are how I organize things. I've actually been relatively conservative with the data I keep around, and have about 600g of it, with maybe 100g being irreplaceable. Everything is organized via an appropriately named directory with appropriately named sub-directories, sub-sub-directories and so on. The files themselves are also named with an appropriate name as to contents. I was doing this long before libraries and "tags" and stuff came along, and I've just always kept doing it that way since I just don't have the time to go back and "tag" thousands if not tens of thousands of files. For me, this named-directory approach has been best due to its simplicity - the structure is easily transferable to any OS, and easily understood by anyone that sees it. It requires no application to handle it or interpret it either. I can't see myself deviating from this method even with 10 times more data as it would continue to be effective regardless of the amount of data I collected.
I've tried forcing myself to use various schemes including relying completely on metadata and search. The last couple of years this is how I've ended up setting things up:
"Public" network storage
This is for data that should be accessible to the entire network at home. NFS mounted on all my machines, stored on ZFS volume on my file server.
Private network storage
I use my home directory on the file server (also on the ZFS volume) for storing personal files and mirroring home directories from client machines in ~/Backup/homes/.
Local storage
On individual client machines I generally try to stick with whatever the operating system tries to make me use with an rsync script that syncs everything to the file server (automatically for desktops, run manually on portable machines).
This is what works for me. I would probably have stuck to the "just use metadata" approach if most user interfaces didn't seem to try and make it a major chore to edit and view metadata...
Greylisting is to SMTP as NAT is to IPv4
I have recently found an incredibly fast search tool called Everything. We're talking about Google-like searching where the results pop up as you type. It must be something on the order of a fifth of a second for my 1.5 million files. This kind of technology should be widespread - it makes searches actually *pleasant* to do. Anyway thanks to Everything, I worry less now about where I store my files, and I also try to pack keywords into the filename.
Anyway, this kind of program is just a glimpse of what a future OS would look like. Imagine a system where everything is stored in tags and where folders become obsolete or used far less often. What you have then is a database or metadata file-system. The relatively new Haiku OS uses such a system, and I wrote about the massive advantages on this old page:
http://www.skytopia.com/project/articles/filesystem.html
Honestly, we'll all be better off the sooner we switch.
Why OpalCalc is the best Windows calc
consider using software like Zotero, Jabref, BibDesk or others. But from your problem description it sounds like you don't have the problem that you need to refer to documents quickly and keep a lot of pertinent meta data about them. For managing a few videos and photos for other people directories should be enough. And for anything resembling source code and configuration files you can also use directories plus a distributed revision control system like git.
Simple: Delete stuff.
Do you need all those installation files for 10 year old shareware? Do you really need gigabytes of movies you will never watch again? A music collection so big that your playlist is months in length? Irrelevant TV shows? More ebooks than you can possibly read?
What you really need to keep are personal files - photos, home video, documents. Those can easily be managed - tag by occasion, file under year/month. Done. (They do not take that much space either, and people get tired of documenting everything sooner or later.)
-- Technology for the sake of technology is as pathetic as eschewing technology because it's technology.
I have my local folder structure the same as the project server, so that anyone who might need to, can get something off of it, and it's easier for me to make backups.
I also still use a similar directory structure, but I've made one change in the past few years that makes it much easier to manage: I keep the special, personal, irreplaceable stuff in a separate hierarchy.
This negates the need for something like a backup_links directory, and makes it much easier to just share the "normal" media directory with everyone/thing on my home network and then handle permissions on the personal stuff with more granularity. It's also much easier when I know I'm looking for a photo I've taken or a document I've made that it'll be in the personal hierarchy under those categories rather than the main ones.
It's a small change, but keeping a separation between stuff I've made and the easily replaceable stuff I've acquired has gone a long way to making my personal data and treasures more secure--both from loss and accidental sharing.
For my media files (on a separate server) I use Rhythmbox for audio and XBMC for video.
And I back up everything somewhere else, just in case. I don't have terabytes of stuff though. Close to a terabyte.
I gave my son his own computer and, like many IT strategies, told him I'd back up what he asked me to. I made him responsible for his own collection, as am I. They may duplicate but hardware is so cheap. When we watch recorded TV shows sometimes we are both interested in keeping a copy, and that's ok. A gig here or there really doesn't matter when I can add 2TB for $100.
That's very different from the scenario we faced when his brothers were kids. A 100MB hard drive was then pretty significant. I had to consider floppies and temp spaces. Now I'm more concerned with the age of the hard drive.
I don't think I'm the best one to decide how he might like to find his information - who knows what innovation might bring. I DO care that the systems are stable and reliable. That means repairable, at least to me.
My main file server, where anything not in immediate use is stored, is organized mostly for human convenience. That is, a tree-hierarchy of folders.
media
media/video
media/video/movies
media/video/tv
media/video/shorts
media/video/educational
media/audio
media/audio/music
media/audio/drama
media/audio/comedy
media/audio/educational
media/pictures
media/pictures/family (with various subfolders like "zoo", "picnic", "christmas 2010", etc.)
documents
documents/work/[person's name]
documents/school/[person's name]
documents/misc
web/[site name]
programming/[person's name]/project
family history/
misc/
At the end of the year, or when I do a mass data import, I spend more time getting the meta-data and tags correct than anything else. All of my audio and video are properly tagged. Ditto for any documents.
Almost all audio and video is accessed with "smart" programs, like Amarok or XBMC, which automatically pull in things like lyrics, trailers, cover art, etc. That stuff is almost never accessed thru the directory tree. The interfaces on the programs are way too good -- assuming the stuff is properly tagged.
The web and programming folders are basically .tar.gz files that are backed up and copied over (drag-n-drop via smb mounted share). They're archives of whatever project someone is working on their local system. I've set up cron/scheduled tasks to update those daily on everyone's PCs, even the kids.
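A daily job like that could be sketched in Python with the standard tarfile module. This is only an illustration under assumptions: the function name and paths are hypothetical, and the archive is simply overwritten on each run instead of being versioned.

```python
import os
import tarfile

def archive_project(project_dir, share_dir):
    """Pack project_dir into <share_dir>/<project name>.tar.gz and return the path."""
    name = os.path.basename(os.path.normpath(project_dir))
    dest = os.path.join(share_dir, name + ".tar.gz")
    # "w:gz" writes a gzip-compressed tar file, replacing any older archive.
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the project folder.
        tar.add(project_dir, arcname=name)
    return dest

# Hypothetical usage from cron or a scheduled task:
#   archive_project("/home/kid/website", "/mnt/fileserver/web/kid")
```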
Most media folders are read-only, to prevent accidental deletion. My account is the master and I can upload stuff there, but I don't want accidents from people wanting to just watch a movie. 600+ DVDs/BluRays, including movies, educational & television shows all on a 2 TB file server in h.264 format. All *music* is FLAC format, with Amarok auto-transcoding if people want to transfer to an iPod. All other audio, like drama/comedy/educational is 128 Kbps MP3 for ease of streaming. And old comedy albums aren't exactly THX-quality to begin with.
Learning HOW to think is more important than learning WHAT to think.
In spite of what MS promised, we still have no SQL filesystems.. I'd love one of those by now. I have terabytes of data, photos, code, php, javascript, movies, chat logs all scattered throughout different disks backed up when needed, double copies everywhere. I want something to manage this properly! Any advice?
Quack damn you!
Who has the time to hand-pick all the relevant tags for every file they download? Yeah, me neither.
Finding time to put things in their own directory, and not dumping them all in "downloads", is a great accomplishment.
However finding a meaningful, hierarchical structure is non-trivial. I'm still working on it.
For the most part, I use directory structure coupled with hard drive provisions that organize the content based on media type.
I also use http://bulkfilemanager.codeplex.com for managing names/relocation of mass files, makes life a lot easier. Especially in download sets where a uniform naming scheme for the media is nonexistent, and having one implemented would be useful.
I have media drives that hold the bulk and they are easily organized into games/pictures/books/movies/tv/music. Smaller document/coding directories are on my C drive for source/text/spreadsheets I make myself.
I don't tag anything. For my pictures, I simply name the directories Year_date_mainContent (e.g. 2010_12_25_Xmas). Media names are self evident, but I also run XBMC for video, so I guess that has internal tagging. But it's still easy to find video outside of XBMC, which I only use about 50% of the time.
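If you want to generate folder names in that Year_date_mainContent style instead of typing them, a small Python helper would do. The function name and the cleanup rule for the label are my own assumptions, not part of the scheme described above.

```python
import re
from datetime import date

def event_folder_name(day, label):
    """Build a sortable folder name such as 2010_12_25_Xmas."""
    # Collapse spaces and punctuation in the label into single underscores.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")
    return f"{day:%Y_%m_%d}_{cleaned}"

# event_folder_name(date(2010, 12, 25), "Xmas") returns "2010_12_25_Xmas"
```

Because the date comes first with zero-padded fields, a plain alphabetical listing of the picture directory is also a chronological one.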
I almost never even use search to find things, because the layout is very logical and it is pretty much obvious where everything is.
Everything is online and in my computer, multiple TB drives. No raid.
For backup I simply use external esata multiple TB drives and FreeFileSync, that I run once/week.
Search, don't sort.
..don't panic
This problem keeps popping up more and more often, as people collect more and more data... I don't have the answer to the question on how to keep everything indexed and searchable, but I do have the answer to the question on how to safely version control and store/backup such large amounts of data... A little project I have called Boar. I quote from the project front page:
"BOAR aims to be the perfect way to make sure your most important digital information, like pictures, movies and documents, are stored safely.
* BOAR prevents data loss due to human or machine error
* BOAR makes it possible for you to restore any or all of your files from any point in time.
* BOAR makes it easy to maintain verified backups of your data, including file history.
* BOAR will make it much more likely for your digital heirlooms to reach your grandchildren some day.
If you are familiar with vcs software such as Subversion, you might think of boar as "version control for large binary files". But keep reading, because there is more to it."
Please check it out at google code: http://code.google.com/p/boar/
Where you keep all your valuable data, so if we ever hack into your computer, we know where to steal (or at least make copies of) your pron collection
I'm working in a Mac OS X environment, but this should work for Linux too: I have groups for the various classes of stuff, e.g. photos, household files (like taxes and Christmas letters), etc. Each group has a group home associated with it, and I mount those from my server as needed. (The server's a RAID 5 box). Irreplaceable stuff like photos are copied a couple of times, once to a disk on a separate machine and periodically to a portable USB drive that I keep at a friend's house. (I have 2 of them and rotate them.) An advantage of the group-based approach is that I can use group privileges to limit access if required (e.g. my work related stuff is not readable by the rest of the family. Photos are updatable by my wife and I and readable by everyone else, etc.)
For sensitive materials, I actually use a Mac OS X encrypted disk image in the group home directory. One of these days I'll work out how to get whole-drive encryption on my Mac OS X Mini Server.
For my photos, I'm experimenting with various keyword Digital Asset Management schemes, inspired by "The DAM Book" http://oreilly.com/catalog/9780596523589/
And as a side note, I'm seeing -50% failure rate- on Seagate 3.5" 1TB drives that are about 1-2 years old. The RAID enclosure is running Toshiba 1TB drives. One of my 2 USB backups (with a Seagate drive) failed, so I'll replace that with a Toshiba or WD drive. I'm really disgusted with Seagate reliability!
1. Post it on the web, or run your own apache instance.
2. Use Google to find your data again.
3. ?
4. Let others also profit from your data.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
I had great success with Google Desktop Search (on windoze) for a while. It would index my mail, files, and web history (if instructed to) - and the best part was hitting one key to get an instant, minimalist search box with auto-preview. From there, you could jump straight to what you were looking for, or open a further page to narrow the search.
Sadly, it doesn't work with Thunderbird 3.0, and Google doesn't appear to care, or even to be supporting it anymore. So now I'm on a hodgepodge of GDS, Windows built-in search, and the sucky T-bird search bar.
I honestly can't believe that nobody has duplicated this Spotlight-esque functionality yet. I realize there are other desktop search options, but none of the ones I've come across have that one-key mini search that goes away as easily as it is called up. For an operation that I'm performing dozens of times daily, that's pretty crucial. It even replaced the file browser for me - much easier to call up the GDS box & type a couple letters than to grab the mouse and drill down into some directory structure - even if I know exactly where I'm going.
Well thought out is well thought out, regardless of whatever system you use.
sticky labels on each floppy disk.
"We live in a global world" - Harvey Pitt, former Securities and Exchange Commission Chairman
It/is/all/junk/then/you/die
Score & Karma: SASA: Slashdot Approval Seekers Anonymous
I have a huge NUMBER of files, in a huge NUMBER of formats about a huge NUMBER of topics....
Here is my highly effective file organization methodology:
And that's pretty much all there is to it. This system hasn't failed me yet. Plus, it will stimulate the economy in approximately 0 to 60 years, because the investigator who has the pleasure of snooping through my computers after I croak will have job security for years.
One big disk, or multiple ones if you want redundancy for your data. It's the only way you'll keep anything.
Switching out media is a pain. There are people who do it. If you were one of those anal retentive types, you'd hardly be asking about it on Slashdot. So I'll assume you're a normal person.
So one big storage space. There are a number of ways to do that, from a USB or eSata drive, to a network share. Whatever choice you make, if you don't like it, change in another year or so. At the rate of data space increases, you won't run out anytime soon.
The real key is to keep your media up to date. Disks decay, drives break down. Yeah, you can buy a 5 1/4" drive, but is it worth the bother? Even 3 1/2" are fading, and it won't be long before some others hit the dust.
Of course, specifics are up to you, different people like different organizations, and who knows what you really have a use for? I remember some old games I've played once or twice. Should I keep them, or just forget about them because if I really cared, I'd still know their names and not just have to look for them.
I'm pretty much a "have a lot of structured directories" guy myself; I don't see your complaint about rising file sizes, or even total number of files. They've pretty much increased linearly in number while the speed of the linux "locate" command has gone up exponentially with Moore's Law. It's the other way around from management trouble - with TB hard drives, I have so much space I leave around TV shows and other media files I'll likely never watch again, "just in case".
At work, the search problems are harder, because I've got quite the multi-tasking job where I may spend just minutes on some problem, then be asked for an update months later, totally skeptical that I ever addressed the issue. And my favourite file-management with that is the most insane-sounding of all: one big directory. I sort it by date and rely on the fact that I take time to write out helpful file names like "downtown_condition_assessment_newmall_4_ernie.xlsx" (not actually that long, I use abbrevs in RL). Only files that have a whole lot of subject-matter friends get their own subdirectory; lonely "one-off" files go in the Big Pile.
The "sort the directory by date" uses the theory behind "lifestreams" promoted by Eric Freeman and David Gelernter at Yale. It really is the best thing I've found (same 30 years) to stimulate the memory - seeing the names of other things you did at the same time; you can actually sense yourself getting close to the file as you remember, "Oh yeah, I worked on that in the spring".
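The "Big Pile sorted by date" view is easy to reproduce outside a file manager. A minimal Python sketch, with a hypothetical function name, that lists the plain files in one directory by modification time:

```python
import os
from datetime import datetime

def files_by_date(directory, newest_first=True):
    """Return (YYYY-MM-DD, filename) pairs for the files in directory, ordered by mtime."""
    stamped = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only the loose files count; subdirectories are skipped.
            if entry.is_file():
                stamped.append((entry.stat().st_mtime, entry.name))
    stamped.sort(reverse=newest_first)
    return [(datetime.fromtimestamp(mtime).strftime("%Y-%m-%d"), name)
            for mtime, name in stamped]
```

Printing the result gives the "what else was I doing around then" listing that the lifestream idea relies on.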
An additional word of Fear & Loathing for "document management systems" like LiveLink by Formark. Required to use this by work (shared directories are strictly for 'short-term' storage), it's awful. Terribly slow, the search function approaches useless, and it's hard (and slow, did I mention slow) to even re-sort a directory (sorry, that's a 'filter down' in Livelink's vocab) by name or date or whatever. After promising that photos would be displayed with thumbnails by the great new Version 4 for two years, it came, broke some stuff that was working, and did not provide thumbnails - all media files are unsearchable in any way. I suspect for long-term archiving, putting documents in a database would have advantages, but for active business usage, it's been crippling.
These are what I've come up with.
For Windows, I create C:\Software and C:\Hardware. Drivers, DirectX updates, and such all go in Hardware. Any software I install goes in Software. Games are the reason to use Windows, and are huge consumers of hard drive space, so they rate their own subdirectory, C:\Software\Game. (I've also decided to drop plurals from directory names I create. Was getting annoying having "pic", "pics", "pictures", "images", etc.) It doesn't have to be "Software", all it has to be is not C:\Program Files. That way I can tell at a glance what I put on there, and what else is there. Back in the days of dial up BBSes, I used C:\LOAD\DOWN and C:\LOAD\UP. When I installed Windows, I'd have it install into C:\W, figuring that would make various configuration files ever so slightly smaller.
For UNIX, of course I have /home mounted on its own partition. Makes upgrading and backing up a lot easier. I use 'u' (for "user") for my primary user name. (Some distros, such as SuSE, won't allow single char user names, so it's "u1" for those.) Besides keeping it as simple and short as possible, it also heads off any possibility of my real name being easily discovered from my chosen user name. As more and more crap has been stored in the home directories of users (directories like .mozilla, .gnome, .gnome2, Desktop, Documents, Downloads), I've recently taken to putting all my stuff in /home/u/own/, so I can easily tell them apart. I could live with it as long as they kept to hidden names, but when the desktop environments started pushing in with subdirs like /home/u/Documents, I decided to do something. Same idea with C:\HOME\U on Windows, when I have anything there. C:\My Whatever attracts too much junk from programs that take it upon themselves to save their ever so valuable configuration info there.
And lastly, I save configuration tweaks, with full path names, in /home/localconfig/. If I change, say, /etc/hosts.deny, I save the changed copy (not the original) in /home/localconfig/etc/hosts.deny. Really helps when I'm trying to remember what I had to do to get sshd, CUPS, XWindows, or whatever to work, or where the window manager du jour stores its global configuration and menus, or where the heck they moved DIRCOLORS functionality this time. Of course there is no user named "localconfig".
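The localconfig habit can be automated in a few lines. Here is a Python sketch that assumes POSIX-style paths; the function names are hypothetical and /home/localconfig is just the default taken from the description above.

```python
import os
import shutil

def mirror_path(config_path, mirror_root="/home/localconfig"):
    """Map /etc/hosts.deny to /home/localconfig/etc/hosts.deny."""
    absolute = os.path.abspath(config_path)
    # Drop the leading separator so the full path nests under mirror_root.
    return os.path.join(mirror_root, absolute.lstrip(os.sep))

def save_tweak(config_path, mirror_root="/home/localconfig"):
    """Copy the changed config file into the mirror, creating directories as needed."""
    dest = mirror_path(config_path, mirror_root)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    # copy2 also keeps the modification time, which shows when the tweak was made.
    shutil.copy2(config_path, dest)
    return dest
```

Run save_tweak right after editing a file and the mirror doubles as a log of everything changed on that box.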
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
Nexus One / Nexus S / Folio (Android Tablett = iPad, cheaper)
Then : Chrome / Picasa / Google Power
Because the Invisible Garfield @french_matt ... pushes Innovation instead of just staying @ home ;)
in two places, everything.
All your database are belong to U.S.
Everything is what I use on the PC to quickly find any file I am looking for.
On the Mac I use Spotlight.
While it would be nice to be completely organized, these tools let me find my files anywhere they are located on my PC. I try to keep things organized into folders, but I am always falling behind so these are what I can use in the interim.
Another option is to split data over machines. Use a media server, and keep media there. Maybe pictures and personal videos can be on your normal laptop as well, but your MP3 collection, movies and tv-shows don't need to be on your laptop.
Then I use one old desktop with Ubuntu on it as backup-machine. I use Crashplan for this. It has a free option to back up to your own machine, or to back up to a friend's machine. I back up several machines (from my parents as well) to this one machine. Then this one machine can be backed up online for $5/month, no limit. (You have unlimited storage, only limited by the upload speed. But as we've seen with Mozy, that can change very quickly.)
porn/
notporn/
I don't use any tools. I just have all the content on two sets of external disks (copies of each other; I use external disks because I don't have a large enough computer and I don't like the idea of everything being powered on at all times). It's a pain to manage. I think Linux (or your favorite OS) desperately needs a 2-tier backup system with deduplication (but still making sure you have enough copies for recovery) and a good user interface.
Ideally, I would say, in the file manager, "unarchive this file for me", and it would look for the file, let me mount the proper disks/CDs required to get it, and then copy it to some cache area on the main hard disk. Here I could play with the files (change, sort, rename, tag, whatever), and then it would automatically back them up again once I was no longer working with them.
What happened to Beagle for Linux? It used to work pretty well for me, and now it seems to have been abandoned.
I put everything on the desktop. When there is no more place on the desktop, I create a subfolder named "temp" and then put everything in it, including the last "temp" subfolder.
All files on your desktop with subfolders - along with one or two random virus .exe files.
All emails sent to me have subject lines with "Re:" followed by some completely unrelated subject.
And all VBA code is commented at a ratio of 1 commented line for every 600 lines of code.
That should do it.
We need a smart metadata filesystem. It needs to be a simple and automatic system in which file extensions and file headers are used to create the base level tags. Other tags could be added for items like music and video, but the 'bread and butter' of the tag system needs to come from obvious information in the file and filename.
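To make the "base level tags" idea concrete, here is a rough Python sketch that derives tags from the extension, the words in the filename, and a few well-known file signatures. The tag prefixes (ext:, name:, type:) and the function name are invented for illustration, and the signature table is deliberately tiny.

```python
import os
import re

# A few well-known leading byte signatures; a real system would know many more.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF", "pdf"),
    (b"GIF8", "gif"),
]

def base_tags(path):
    """Derive automatic tags from the filename and the first bytes of the file."""
    tags = set()
    stem, ext = os.path.splitext(os.path.basename(path))
    if ext:
        tags.add("ext:" + ext[1:].lower())
    # Every alphanumeric chunk of the filename becomes a searchable tag.
    for word in re.split(r"[^A-Za-z0-9]+", stem):
        if word:
            tags.add("name:" + word.lower())
    with open(path, "rb") as fh:
        head = fh.read(16)
    for signature, kind in MAGIC:
        if head.startswith(signature):
            tags.add("type:" + kind)
            break
    return tags
```

Nothing here needs the user to type a tag, which is the whole point: hand-made tags for music or video would be layered on top of this automatic set.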
(waiting for someone to say 'CLOUD')
Not folders, "libraries", and sure as hell no tags (I tried that w/ Picasa for my pictures a few years ago; made a mess; deleted Picasa -- returned to sensible dir structure "pictures/TOPIC/year/month" and I'm fine). And I separate code by language ("code/4th/TOPIC" or "code/c++/TOPIC" ..I even keep a /fortran dir though I haven't used it since 80s, and a /48sx— some of my favorite code even though it's essentially unusable). Homebrew backup across local network drives and [for pics/video and code] solid state.
evil: I keep an /mp3 dir that's root-accessible only, with no subdirs whatsoever. ONE time I had a problem with this, and for that system alone I made a few subdirs. Learned hard way w/ iTunes. I despise programs that rearrange your files for you, make ridiculous subdirs w/out permissions, etc. I have to use iTunes, but I look forward to the day when I can get rid of anything apple and/or adobe. Hell, not even MS forces directories on you (not incl. the OS itself, I guess).
Interesting, but most people need (at least I do) more of a data de-duplication tool than anything else. That, and a subject-based whereis tool, would deal with 95+% of the problems most people (and organizations) face in this realm. JMHO, but then I only have about 40 years experience in this field... :-)
Sometimes, real fast is almost as good as real-time.
I finally dealt with this problem once and for all in the following way. I found the best personal wiki out there (Zim: http://zim-wiki.org/), and wrote a simple python script (http://www.inrim.it/~magni/zimDMS.htm) that scans nightly my folder structure, keeping up-to-date my wiki. My wiki, therefore, is a perfect mirror of my folder structure, with the added bonuses that I can navigate to each folder, comment it, describe its content, insert images, insert links to other folders, and finally by a single click I can open it in the file manager. My ~ 15000 folders are managed perfectly...
Arcane commands?? Terminal window?
Certainly you are confused, grasshopper.
I have a mix of online and offline files. Online files are stored across 20+ machines, but most live on a file server that runs 'updatedb' nightly. That means 'locate' can be used to find any file on that system efficiently.
For media files stored off line, it is all about building a text DB. Those offline files are (usually) stored on numbered optical media and the contents are stored with the equivalent of 'ls -lR' > nnnn.txt. If certain types of files are included in the media, additional information may be pulled from the internet and placed into another text DB with "additional" information. Egrep is used to find anything and the optical media number is shown in the results.
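The same text-DB approach can be sketched in Python for anyone without ls and egrep handy. This is an illustration only: the function names, the four-digit catalog naming and the "size, tab, path" line format are assumptions, not the poster's exact layout.

```python
import os
import re

def write_catalog(media_root, disc_number, catalog_dir):
    """Write one 'size<TAB>relative path' line per file on the disc to NNNN.txt."""
    catalog_path = os.path.join(catalog_dir, f"{disc_number:04d}.txt")
    with open(catalog_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(media_root):
            dirnames.sort()  # walk subdirectories in a stable order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, media_root)
                out.write(f"{os.path.getsize(full)}\t{rel}\n")
    return catalog_path

def search_catalogs(catalog_dir, pattern):
    """Return (disc number, matching line) pairs, much like egrep over the catalogs."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for catalog in sorted(os.listdir(catalog_dir)):
        if not catalog.endswith(".txt"):
            continue
        disc = os.path.splitext(catalog)[0]
        with open(os.path.join(catalog_dir, catalog), encoding="utf-8") as fh:
            for line in fh:
                if regex.search(line):
                    hits.append((disc, line.rstrip("\n")))
    return hits
```

Run write_catalog once while the disc is mounted; after that, searching needs only the small text files.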
On Windows, you can use locate32 for similar capabilities to the UNIX 'locate' command. I think it will look inside files too, so doing the equivalent of the egrep search to find which media disc a file is stored on would be easy.
I like that it is all TEXT files for efficiency, trivial access, and maintenance.
I've created web interfaces ... never use them. They just get in the way. Wife and kids use those, but only with limited searches based on filename. I search the additional metadata when that is desired.
Remember when you learned how to create a dictionary using a large text file as input?
The old ways are not necessarily bad.
Your project is very interesting. I will try it.
`echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
Everything I have is on an UnRaid box, so it's organized into shares (virtual drives)
Personal Files
Photos (actually, it might be called Images)
Music Archive (all my originals)
iTunes (my working volume/set, compressed to mp4)
Applications
Home Movies (Raw, In Process, & Finished folders)
Files (everything else goes here, organized by what it is - health, rockets, cooking, guitar...etc)
Work Files
My business stuff
Video
Separated by type (DVD, HD, Recorded TV, Youtube)
This lets me back up my Personal and Work volumes with very little fanfare - I use LiveDrive remote storage and local external drives with SyncBack Pro. I don't back up the Video directory. Sure, it'd be a bummer to have to re-rip 400 discs, and lose some old TV shows, but it wouldn't be the end of the world, and it's not worth building a separate box (or two) with 4TB of storage to back up what I own on commercial media. That may change in the next year or two, and when it does, I'll back up with SyncBack just like the others.
The files are organized enough that most files I need can be narrowed down to 2-4 folders, tops. (side note...I have another 2-3 levels under my Files share..but that's just a second level of organization) All my images are cataloged using Picasa. I don't subscribe to the "dump it in a folder and tag it for searching." Even in Evernote I'm relatively organized. I've found that you can never remember the tag you used 4 years ago to search for your stuff. I keep a running log of my work jobs - about 1200+ in the past 8 years - and I still have problems finding specific jobs from too long ago.
Occasionally I'll clean stuff out and reorganize, and having the folders makes it easy. The biggest thing is that the master set is in ONE spot, and all my other machines sync to that spot. Sadly, the LiveDrive engineers are a bunch of useless hacks with an inflated view of their servers, so I don't sync any of my personal stuff there. I use their sync for work because it's the only service that seems to work reliably for less than $200/yr, and I've modified my workflow to use their ass-backwards system. As a result, I have to manually sync things to and from my server if I go remote for a while, but that's rare for my personal stuff; usually if I'm away from home and not on business, I want to be away from technology.
Is it just my observation, or are there way too many stupid people in the world?
If you don't have a series of nested folders: "Old Laptop," "Desktop," "Laptop Files - to be sorted," "Laptop," "Desktop Documents" with counterparts for pictures, you're not storing files properly.
Yes, I hear that a lot... True deduplication seems to be important for lots of people, and boar will gain "true" data deduplication quite soon, it's on the roadmap. (Even if I doubt that most people actually need it... what do they do? Do they obsessively edit their exif headers all day?)
In the meantime, boar already has a trivial form of deduplication in that it only stores identical files once. This is actually quite useful, as it allows for the cheap creation of "views" of the data. I can keep all my pictures in one place, including the blurry ones, and then have another directory containing only the nice ones, ready for a slideshow. Also, I have often experienced the problem that I have two _almost_ identical copies of some large file tree. Which one to keep? Often, I'd just keep both and wait for a rainy day to sort it out (which never comes). This was one of the things that drove me to create boar. Nowadays, I will just import both trees into my boar repo and delete the originals. Duplicate files will only be stored once in the repo, so the benefit will be almost as good as if I had taken the time to look through all those files, but with zero effort. Maybe I'll merge those similar directories some day, but until then, they'll at least not clutter up my hard drive.
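To illustrate the "identical files are stored once" idea in general terms, here is a minimal Python sketch of a content-addressed store. It is not boar's actual implementation or repository format; the function name import_tree and the blobs directory layout are made up for the example.

```python
import hashlib
import shutil
from pathlib import Path


def import_tree(src, repo):
    """Copy every file under src into repo/blobs, once per distinct content.

    Returns a manifest mapping relative path -> content hash, so two nearly
    identical trees can be imported and the shared files cost nothing extra.
    """
    src = Path(src)
    blobs = Path(repo, "blobs")
    blobs.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        # Reads the whole file into memory; fine for a sketch, not for huge files.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        blob = blobs / digest
        if not blob.exists():  # identical content is stored only once
            shutil.copyfile(path, blob)
        manifest[path.relative_to(src).as_posix()] = digest
    return manifest
```

Importing two almost identical trees this way yields two manifests but only one stored copy of each shared file, which is the effect described above.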
All_Photos
raw
2009 (here is a comment you can ignore)
Winter
Spring (each quarter contains whole months, while seasons do not. Do not be afraid)
Summer
Autumn (the postercomment compression filter sucks)
Oct
Nov (I'm afraid of whitespace)
Dec
2009_12_01
DCIM_102345.jpg
2009_12_31
DCIM_103456.JPG
(what the fuck is wrong with whitespace you piece of shit)
All_Photos (repeated for clarity) ...
2009
jan
feb
dec
2009_12_23_solstice_party_01.jpg
backup ...
incoming
apps
firefox_20_install.exe
photoshop_cs5_install.dmg
license_keys
photoshop_cs5.txt
outgoing
(e.g. calendar_2010, slideshow_birthday_party, send_to_parents)
Has anyone tried to organize their files in a tag-based system? Something like: my C project for a microprocessor class at my university (UFRGS) would have the tags Programming, C, UFRGS and microprocessors. My C projects at work would have the tags Programming, C, (*company) and (*project). Any idea how to do it?
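One way to approach the tagging question is a small index that maps each tag to the set of files carrying it, so a query for several tags is just a set intersection. The sketch below is only an assumption about how this could be structured; the TagIndex class and the example paths are hypothetical, and nothing is persisted to disk.

```python
from collections import defaultdict


class TagIndex:
    """In-memory tag index: each tag maps to the set of paths carrying it."""

    def __init__(self):
        self._paths_by_tag = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags (case-insensitive) to a path."""
        for name in tags:
            self._paths_by_tag[name.lower()].add(path)

    def find(self, *tags):
        """Return the sorted paths that carry every one of the given tags."""
        if not tags:
            return []
        sets = [self._paths_by_tag.get(name.lower(), set()) for name in tags]
        return sorted(set.intersection(*sets))


index = TagIndex()
index.tag("uni/micro/main.c", "Programming", "C", "UFRGS", "microprocessors")
index.tag("work/proj/main.c", "Programming", "C", "company", "project")
print(index.find("C", "UFRGS"))
```

A real system would also need to store the index somewhere and cope with files being moved or renamed, which is where most of the difficulty lies.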
- irreplaceable "live" files, backed up daily (about 200 megs): files I created myself, ie mails, docs, code... It's the "My Documents" folder.
- irreplaceable archives, backed up monthly to dvd (about 4 gigs): mainly photos and home videos: "My Pictures" and "\Archives", plus my "live" files.
- everything, backed up monthly to an external HD: mainly my painstakingly ripped CDs and DVDs, plus all of the above. "Flacs", "MP3s", "Films", "Series", "XXX"
The Cloud - because you don't care if your apps and data are up in the air.
Being a system architect specializing in SharePoint tech, the choice was rather easy for me: I set up a fileserver and WSS3, created a document library in WSS with an attached event handler that stuck any uploads larger than 500 MB or of specific media types (music, video and so on) onto the file server. Everything's nicely tagged and deeply indexed and easily searchable either through the search center or using a connected desktop search (or it will be next week when I get around to upgrading to SharePoint 2010 Foundation), and the videos and music are shared using media sharing.
The family's not aware that there are two separate locations for storage, they just toss their stuff into the library and access it mainly through search or windows media player.
Well, I'll have to check it out in more detail. The name is suckage however! JMHO... Sounds too much like "boaring" (sic).
Sometimes, real fast is almost as good as real-time.
That's the FOSS version of the venerable Andrew File System (Debian packages available). I use it together with MIT Kerberos V and OpenLDAP. It may not be the easiest system to set up and maintain, but what you get for your efforts -- a distributed file system -- is pretty cool even on a single server.
The problem with NFS and Samba solutions is that the manner in which servers and hard disks are organized has too much influence on directory structure: as disks and servers are changed over time, the structure of the file system usually changes as well. In addition, many users have different drive mappings, which further increases confusion and the risk of files eventually getting lost.
This is not the case with AFS; its namespace (the AFS file system directory structure) is not influenced by disk structure and all users always see the same directory structure. So, when disks and/or servers are added or removed from a cell, which is what an AFS administrative unit is called, its namespace remains unaffected. In this respect it is less likely that anything will ever be lost.
Applying the Infinite Monkey Theorem I put everything into one folder, assigning each file a pseudo-random name. Although there's only one of me, in time, I'm confident that a pattern will emerge...
It must have been something you assimilated. . . .
Spotlight and fuhgeddaboutit.
Basically I don't worry about it any longer. Spotlight lets me search the entire file system, and subsets of it like emails, and OS X is pretty good about automatically generating metadata. Good enough for most circumstances, anyway.
And since you can search the contents of files, this makes looking for that PDF Joe Blow sent you last week dead easy.
Honestly, I have ~/Documents and a few subfolders, and that's about it. Between Spotlight and Quicksilver I don't have to worry about directory hierarchies any more.
The hierarchical file system is a nice structure when dealing with floppy disks, was adequately suited for a system where people were creating a few dozen documents, but it's felt a little dated for a while.
I'd just like a new paradigm that doesn't rely on backwards compatibility, and uses metadata as a primary means of associating files with each other.
yllacitebahpla nrop ruoy tros tsuJ
The solution is to consolidate all the data you care about under one drive, one folder, so whatever you look for you only have one place to look under, one place to back up.
If you are using Windows, consolidate everything under c:\[some_folder]. Even if you run out of space on the C drive, it does not matter. The trick is to use "mklink" (Windows 7 and Vista only; for XP use linkd.exe from the resource kit). It is like "link" in Unix: you can create a symbolic link or hard link to anywhere else on your system, and the target can be located on an internal/external hard drive, or even on network drives. And later, if you move some data from d: to e:, your data will still be found at the same location under c:. You don't ever have to reconfigure any app to point to different folders. They will remain at the same location for the next decade or so.... Also, put the commands to set up the symbolic links in one batch file, so you can easily recreate all the links when you set up a new computer.
If you are using unix, then you already know link, so no need to say more.
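The "one batch file that recreates all the links" idea can also be sketched as a small Python script. The function name recreate_links and the example folder names are assumptions for illustration; on Windows, creating symbolic links may additionally require elevated rights or developer mode.

```python
from pathlib import Path


def recreate_links(root, links):
    """(Re)create symbolic links under root, one per entry in links.

    links maps a name under root to the directory that really holds the data,
    which may be on another drive. Existing links are re-pointed; real files
    or directories are never touched.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name, target in links.items():
        link = root / name
        if link.is_symlink():
            link.unlink()  # drop the stale link so it can point somewhere new
        elif link.exists():
            raise FileExistsError(f"{link} is a real file or directory")
        link.symlink_to(target, target_is_directory=True)
```

After moving data from one drive to another, you would change the target in the mapping, for example {"photos": "E:/photos"}, and run the script again; every application keeps using the same path under the consolidated folder.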
through a wisely ordered directory structure, though indexed with tracker. i have hundred thousands of files on my disks, so tags would drive me crazy and are imho insufficient. backups and syncing with rsync. i tend also to buy drives with the same block count, so i can backup and clone directly with dd. pretty oldschool, nothing fancy, for my data is sacred and should not get lost through insecure technologies like version control, binary diffs or distributed file systems. i really wish there'd be a standard for metadata and everybody using it appropriately.
Directories! Occasionally, desktop search provided by Nepomuk if I can't find it in less than two minutes or so of manual searching. Once in a blue moon, good ol' find piped to sort piped to less.
I ran into problems with duplicates over the years from copying files off my laptop before installing a new OS and for other reasons. I used dupmerge, which identifies the duplicate files and uses hardlinks to keep only a single copy. Freed up quite a bit of space for me.
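For the curious, the general technique (find files with identical content, keep one copy, hard-link the rest to it) looks roughly like the Python sketch below. This is not dupmerge's actual algorithm; the function name is made up, it assumes everything lives on one filesystem that supports hard links, and it should be tried on a scratch copy first.

```python
import hashlib
import os
from pathlib import Path


def hardlink_duplicates(root):
    """Replace files with identical content by hard links to the first copy.

    Returns the number of bytes freed. Note that hard-linked files share
    content: editing one "copy" later changes all of them.
    """
    first_seen = {}
    freed = 0
    for path in sorted(Path(root).rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        # Reads whole files into memory; acceptable for a sketch.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        original = first_seen.setdefault(digest, path)
        if original is path or os.path.samefile(original, path):
            continue  # first copy, or already linked to it
        size = path.stat().st_size
        tmp = path.with_name(path.name + ".linktmp")
        os.link(original, tmp)  # create the link under a temporary name
        os.replace(tmp, path)   # then swap it in place of the duplicate
        freed += size
    return freed
```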
I would lose my mind without Directory Opus. I would call it the swiss army knife of file system tools. Been using it since the Amiga days.
If you have a decent sense of order and something like Directory Opus - it goes a long way in maintaining your sanity.
Category and Date.
Overall, I like date-based folders (2010, 2009, etc.).
But first, I use a couple of high-level categories based on the type.
MyFiles
...2010
......ProjectA
MyMedia
...Pics
......2010
...Music
......Artist A
MyFinance
...2010
......BankA
MyFiles
...2010
......ProjectB
Bucket
...random files...
The categories are:
MyMedia / pics -> large files, changes slowly, not replaceable if lost. #1 backup priority
MyFinance -> bank statements, etc. -> small files, old stuff not replaceable, new stuff replaceable . #2 backup priority
MyFiles -> stuff I create / want to keep. -> small files, not replaceable. #3 backup priority
MyMedia / music -> large files, changes slowly, replaceable for $ or time. #4 backup priority
Bucket -> temporary files, downloads, etc. -> mix of large and small files. Not backed up.
This helps me organize my backups.
Nightly backup job copies everything to internal hard disk #2, and an offsite backup (crashplan) backs up offsite categories 1-3.
I don't tag. There's no universal format, so I stick with directories and filenames only.
Of everyone who posted their fancy choice of directory structure *nobody* told us where they keep their ~/.pr0n
I don't want to start an argument about which system is best for this task, but rather I have some ideas about how you could do this if you are on a Mac. I have over 20TB of storage in my home studio, with hundreds of thousands of files. It's one thing to have that much data, but it's another to have it well organized. To that end, here are the apps I currently use on the Mac to organize my data:
iTunes - 2TB of media on NAS drives
iPhoto - family and personal pics
Aperture - handles DSLR semi pro collections
iMovie - all family DV and HDV files
Suitcase Fusion - handles fonts very well
Final Cut Server - Media files for studio
Time Machine - local Mini server running OS X server backs up all local machines
It is also important to organize your data, and these work great for me.
Bento - personal database stores serial numbers, passwords, insurance info, and much more. Local network sharing
Yojimbo - stores and syncs news, photos, notes, and other media.
Things - amazing to do list
After years of trying many choices, these are the apps I use on a regular basis. Good luck...
Be kind, for everyone you meet is fighting a difficult battle. - Plato
Everything is on 2 separate hard drives at the same time. Some of the downloaded things like drivers and things that will be updated are deleted. A logfile where I just write notes and some subdirectories for rough sorting.
How does it compare to bup?
Nerd rage is the funniest rage.
... are condemned to repeat it.
I understand the problem. At work and at home, I use my homebrew hierarchy to manage my files. But looking at this thread, I felt a wave of nostalgia for Usenet's alt.barney.dinosaur.die.die.die.
The worst "hierarchies" I've encountered at my work are:
1. Save everything on the Desktop. (Good Heavens, woman! Do you even have wallpaper?!)
2. New Folder. New Folder2. New Folder3. Etc.
Honesty. Loyalty. Kindness. Laughter. Generosity. Magic!
Max two directories deep.
Reverse date system
Everything gets a folder (no unfiled files)
eg:
Music/Pink Floyd - Dark Side of the Moon
Photos/2008-11 Europe Vacation
I'll see your hokum and raise you a boondoggle.
Bup seems to be a great backup program, while Boar aims to be a full vcs. There is certainly some overlap between these types of applications, but in general, backups will sooner or later be replaced by more recent backups, and it is up to you to make sure that you don't accidentally delete any files, because you will have a limited amount of file history. Now, bup seems to allow for efficient backups, so perhaps you don't need to (cannot?) purge old backups. However, I like the vcs-like workdir concept, with commit and update commands to keep workdirs synchronized between different computers. Also, boar allows for efficiently making or updating verified copies of the repository, which makes it easy to maintain good backups. But granted, the distinction in general between a good backup tool and a vcs is blurry.
Check the price of data storage.
Redundancy is cheap, data loss is expensive.
Yup, distribute across the LAN, and use frequent backups and sensible write permissions.
I have a tree of directories with mostly sensible names on the server (2TB) for stuff that's worth filing away, with automated nightly backup to one of two external 2TB disks. Another 2TB disk contains videos and movies which are supplied on demand to the TV, and a 1TB disk contains music and audio lectures (we've spent a fortune on stuff from the Teaching Company, for instance). The latter two disks are not in the backup cycle, as they are mirrors of disks on two of the workstations at home.
The kids have been instructed that only deletable stuff should be saved locally on a workstation - anything to be retained should be in their directories on the server. The kids have read access to almost all of the server (a few "special" directories are off-limits) but write access only to their own home directories. They know I'll copy the server's stuff to new disks or a new server when the time comes, but workstation disks are just wiped...
Once upon a time, I used to make file links if a particular file reasonably belonged in more than one directory in our tree. However, I stopped this since it was extra work without tangible benefit: locate has never let me down on finding a file.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Organizing your directories and files is just the beginning. I have found I need to use a couple of tools to have the power I need to find the specific document, phrase, number, image (w/tags), etc. I use the following: (1) Google Desktop (not all file formats supported); (2) A database software to catalog and LINK the document (full document path) for those very special items I KNOW I will need to find again [Software program: DBTextworks by Inmagic]; (3) Indexing Software - similar to Google Desktop - more robust and can handle MOST Professional Software Formats [Software Program: ISYS by Odyssey based in Australia]. Yes, I use all of these methods in combination WITH organizing directories and files. Whether you use multiple hard drives, a server system, or a single PC, these tools can make your file organization much, much easier. I use this power tool combination professionally and at home. I can spend more of my time working on the task at hand, instead of searching for "lost" documents. These programs run on PC (Microsoft operating systems) and were not designed for Mac OS.
For Media:
- Photos are organized in Lightroom and iPhoto under a ~/Pictures folder.
- Music is organized by iTunes under ~/Music.
- There is a ~/Movies folder, but that's only for downloads. I don't keep movies around after watching them. Waste of space.
I have a ~/unix directory which is the prefix for manually compiled software. My own code also lives under ~/unix/src and is installed into ~/unix/bin. Oh, because of Eclipse there's also a ~/Workspace folder lying around. I should move that somewhere else.
Then there's ~/org for my org-mode system. This keeps track of notes, todos, appointments, and more and more project files, including my own scripts and stuff I do for Uni.
Mail lives under ~/Mail which I access via mutt and search/index with maildir-utils. I dump all mail into an =Archive mailbox and clean out my =Inbox every day. If a mail requires further action I capture it within org-mode to get it into my system and move it to =Archive with one key. mutt is also integrated with the OS X address book. (The integration of the address book with org-mode is an open loop.)
Downloads (except movies) from the internet go straight to ~/Desktop which I also try to clean out every day. Occasionally, I will check out a project folder and keep it around longer, which brings me to...
~/git. In there I keep (bare) git repositories for my org-system and e-mail (checked in daily), my code, and other large projects. I also use these repositories to sync stuff across different machines.
Oh, and then there's a 3-year-old, 30-GB-large encrypted disk image from my last computer lying around which contains stuff that I haven't brought over yet. (20 GB of that are photos. Most of the rest are caches and data cruft from programs.)
Last time I checked, my whole digital life (minus photos) fits into 6 GB. That includes a 15-year-old e-mail archive and all the other stuff I've kept since then. I have around 30 GB in music lying around but I don't count it because I find it completely replaceable and there are good online options available now, such as Soundcloud and others. (I'm also lucky that there are two excellent radio stations available here: Fritz and Motor.FM.)
Backups are handled by TimeMachine and I regularly swap out the drive and take it to my parents' place for offsite storage.
I generally try to avoid folders for organization and prefer to access the files through a software layer using folders only as a backend. For media there's ready-made stuff (iTunes, iPhoto) and Lightroom allows me to impose my preferred structure. My own stuff I usually manage through org and git. I also try to go through my stuff once a year and clean up the cruft. That keeps everything nice and clean.
Free Manning, jail Obama.
docs, img, programs, code, music + grep + find. Definitely everything has become very easy since I don't watch porn anymore.
Open Source Network Inventory for the masses! Kuwaiba
If you are on Windows you might want to give Nemo Documents a try. It gives a time based view and allows one to use tags. Disclaimer: author posting ;-)
private/backup/
private/documents/
private/pictures/
private/source/
private/videos/
public/books/
public/games/
public/images/
public/movies/
public/music/
public/software/
downloads/complete/
downloads/incomplete/
downloads/torrents/
No dependencies, requirements, maintenance, or bugs. Easy.
- Photos: Smugmug account + local NAS.
- _General_ personal files: organized by directory with Faubackup backups to local NAS (no offsite backup)
- Music & Video (i.e. entertainment, not personal): local NAS.
- Smaller more important files (code, latex docs, text): git + github account
- Contacts, addresses, agenda: Google cloud
- General notes: Remember The Milk "pro" account
I recently got a laptop for the first time since I spend most of my time away from my main computer now. It's easy enough organising files in one location, but keeping them synchronised and secure is another matter.
The solution I came up with for myself is making use of an archive.
I have five main folders: Audio, Documents, Images, Software and Videos. I also have the same hierarchy in my archive folder on my main computer. This allows me to move the bulk of my files into the archive and keep my other folders synchronised with Unison.
Harddrive space is cheap, but:
- file transfers are slow,
- RAID isn't very portable,
- network access isn't always available and
- you don't need access to *everything* all the time.
My wife used to share an iTunes share on the network, but I started hauling my computer to work so our collections diverged and she now has things I don't and vice versa. In some cases, we'll have both imported the same cd, and one will have read better. I have no idea how I'm supposed to figure out exact duplicates or better copies, etc.
I concur with many of the others, directories are still a good way to organize your files :)
I have a three terabyte raid-5 Ubuntu file server I use to serve up the media in my house. I stick with a simple directory structure trying to keep it as shallow as possible:
This seems to work well for me, but just keep moving stuff around and you'll figure it out. File management is easy with Samba or your favorite SFTP client. I've also found that a Blu-ray burner is good for stuff I want to keep, but in reality am never going to look at again.
Good Luck!
It's been proposed to have a database whose keys are metadata tags, whether improvised algorithmically, found within files of certain types, or hand-assigned. This would be fine when you want to find a file yourself. But many programs want to use file names. A Linux user-mode file system could query the database and provide access to the files it finds. Designing the right notation would involve serious insight. The file name will likely need components addressing routing, status (do you want a developer version or a stable one?), content, and other attributes.
It's really a simple answer. Most stuff you'll never need again. Really. Some guy storing .iso images? Burn it to a CD/DVD and keep the freaking disk. If it's not important enough to keep up with retention, chances are it's not important enough to hold onto. Photos? Do you really need pics of your girlfriend thrice removed (unless she's naked, of course, in which case why haven't you already uploaded it to the internet to share with us)? Old emails? If they're really important, print them out and file them. If not, see above - most of the crap you're hauling about just isn't that important.
The rest of the stuff that you actually do reference frequently will fit just fine in a hierarchical file system. That stuff that you really need and can't, for the life of you, find, you can probably get from one of your friends - take the time to reconnect to them as a person rather than with just their information. Finally, not remembering some things can be more precious than remembering. It brings refound treasures and retold stories back into your world. And, trust me, those are a lot more important than any file you've misplaced...
That is all.
Redundancy can be done the wrong way (50 copies on 50 media) or the smart way - one copy on a redundant storage with regular backups off that. You get the benefit of cheap media and no management drawbacks, noob.
To get to the smart way, you need organization.
Having just started cleaning my house, this story comes close to my heart. Looking around, I have 6 boxes of old “documents”. What to do with them?
First to cover the common areas:
Video:
I have two TIVO boxes, one is high definition, both recording constantly.
I have one system with 8TB of storage to sort/organize the incoming TIVO recording.
I’m setting up two 60TB servers for my “movies and TV shows”. (Each will handle 26 hard drives). I use the term “setting up” as I’ve run into some issues with these systems.
Binary:
I have a 2TB system set up for binary files. (This would be development, OS, drivers, patches and the like). You never know when you will need a DOS bootable disc.
Music:
I have one system (with 2TB storage) to handle my MP3’s. (Still need to sort/organize/remove duplicates). Currently this one also houses my image collection, important documents and the like. It is acting as kind of a catchall for everything else.
Data:
I’ve recently set up a system to handle “data” (document based), with 130 GB of space. I’m using “Home Document Manager”. Though it is not mature, its developers are more amenable to fixing the problems.
And now to the point: Organization.
Overview
The first – glaring issue is lack of a good storage house. Most management systems sort a single file in a single location, sometimes with tags. A good example of the problem that I found: what if I have a Medical Bill, which is being kept for Legal reasons, which I will need at Tax time? What if I have a MP3, Music Video and Movie that I would like to tie together (or heaven forbid multiple playlists)? Or Movie props that I’ve purchased off eBay.
I would not like to keep the medical bill after 3 years, but for legal reason would like to keep it for seven. I don’t want to delete the “item”, but I no longer need to be reminded about the “bill”. I don’t want to have multiple copies of the same item, which makes searching a nightmare. And “tags” are a start, but are not granular enough.
Video organization:
Extreme Movie Manager. Ok, it has some bugs, but it does a VERY good job. With its multiple views, and multiple ways of keeping track of movies, it is the best one that I’ve seen.
Music: Currently I’m (just) using Media Monkey and MS Media Player. Media Monkey has a severe limitation in that it does not handle video (read music videos-Watch "Vertical Lines" by Leather Hands to get the point). I attempted to use an “automated sorting” system, however it has significant issues, the biggest being it took MP3s from a known group (1970’s for example), and moved them to “Unknown”, “Unknown”. Can’t use that. I also used Clone Master, and found that I have almost 2500 duplicate (MP3) files. Unfortunately, most of the time it “guesses” the wrong one as the file to delete.
Binary is actually the most straightforward: a simple file structure.
Other issues:
Video Servers: I’m also running into hard drive selection issues with the video servers. The problem is: Enterprise class SATA drives are expensive, “small” (only 2TB), and fast (as such they use a lot more energy). “Green” drives are cheap and plentiful and use a lot less power (and generate a lot less heat), however they are not compatible with the RAID controllers needed.
Video Playback: I have a decent system to handle the Blu-ray, high def requirements. However the software also has problems: in/with high def you can’t read the “default” fonts displayed.
It's surprising that no one seems to have mentioned hashing files or writing checksums to file yet. Besides organizing the files it's probably the most important thing you can do. There don't seem to be very many graphical interface tools for it, but here are two commands that work:
Within the top directory:
$ find . -type f -exec sha512sum '{}' + | tee hashes.sha
To verify:
$ sha512sum -c --quiet hashes.sha | tee verify.txt
Strongly recommend running this at least once. It's a good idea to hash before copying to a new hard drive then verify afterward. I was able to detect and prevent more data corruption issues by doing this on a Samsung Spinpoint F4 hard drive which had a (now resolved) firmware bug for example. Verification tasks can be split up over different operations by cutting lines out of the file and placing them elsewhere, so that one directory is checked while others aren't (due to how time-consuming the process is).
Also, while organizing personal data onto many hard drives, keep in mind encryption options that are very convenient (especially on Ubuntu). Truecrypt hard drives look like random binary data initially which makes it harder to know if a drive has the wrong password or is simply uninitialized. Cryptsetup works and prompts for passwords on Ubuntu:
$ sudo cryptsetup luksFormat /dev/sda1
$ sudo cryptsetup luksOpen /dev/sda1 securebackup
$ sudo mkfs.ext4 -m 1 -L securebackup /dev/mapper/securebackup
Now, after unplugging the drive and plugging it back in, Ubuntu should automatically detect the presence of an encrypted volume.
Even with multiple computers it's possible to keep all the files in one location this way. A proper boot hard drive isn't even needed because the Ubuntu live CD, whether on a physical disk or on a (faster) thumb drive includes the cryptsetup software. Or, for added security, install Ubuntu using the alternate installer with encrypted system partitions.
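Going back to the checksum part: for those without the GNU tools at hand, the hash-then-verify workflow can be approximated in Python. This is a sketch under assumptions, not a drop-in replacement for sha512sum: the function names are made up, the manifest uses the simple "hash, two spaces, ./path" layout, and file names containing newlines or backslashes are not handled. The only_under argument stands in for the trick of cutting lines out of the hash file to check one directory at a time.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-512 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(top, manifest="hashes.sha"):
    """Hash every file under top and write the result to top/manifest."""
    top = Path(top)
    lines = []
    for path in sorted(top.rglob("*")):
        if path.is_file() and path.name != manifest:
            rel = path.relative_to(top).as_posix()
            lines.append(f"{file_digest(path)}  ./{rel}")
    (top / manifest).write_text("\n".join(lines) + "\n")
    return len(lines)


def verify(top, manifest="hashes.sha", only_under=""):
    """Return relative paths that are missing or whose hash no longer matches.

    only_under limits the check to one subdirectory, e.g. "photos".
    """
    top = Path(top)
    prefix = only_under.rstrip("/") + "/" if only_under else ""
    bad = []
    for line in (top / manifest).read_text().splitlines():
        if not line:
            continue
        expected, name = line.split("  ", 1)
        rel = name[2:] if name.startswith("./") else name
        if not rel.startswith(prefix):
            continue
        target = top / rel
        if not target.is_file() or file_digest(target) != expected:
            bad.append(rel)
    return bad
```

Run write_manifest on the source drive before copying, copy the tree together with hashes.sha, then run verify on the destination; an empty list means every file arrived intact.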
For my personal photos and movies I sub categorise them using international date format in sub folders named like so:
"1998-03-30 - X's birthday party" (A single day, so full date.)
"1999-06 - Travel to Europe" (A few weeks, so just putting the month in.)
etc.
That way, I can sort alphabetically yet still find events in a sort of timeline.
Or for tax records... (submitted quarterly)
"BAS - 2009-Q4"
"BAS - 2010-Q1"
All the BAS submissions are now in order, but grouped together.
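If you generate such folder names from a script, the naming scheme above could be sketched like this in Python. The helper names are hypothetical, and the quarter helper assumes plain calendar quarters (Q1 = January to March), which may not match how your tax authority numbers them.

```python
from datetime import date


def event_folder(when, title, whole_month=False):
    """Name like '1998-03-30 - Title', or '1999-06 - Title' for longer events."""
    stamp = when.strftime("%Y-%m" if whole_month else "%Y-%m-%d")
    return f"{stamp} - {title}"


def quarter_folder(prefix, when):
    """Name like 'BAS - 2009-Q4', assuming calendar quarters."""
    quarter = (when.month - 1) // 3 + 1
    return f"{prefix} - {when.year}-Q{quarter}"


print(event_folder(date(1998, 3, 30), "X's birthday party"))
print(event_folder(date(1999, 6, 1), "Travel to Europe", whole_month=True))
print(quarter_folder("BAS", date(2009, 11, 5)))
```

Because the year comes first and months and days are zero-padded, a plain alphabetical sort of these names is also a chronological sort.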
I use fdupes, http://netdial.caribe.net/~adrian2/fdupes.html
and FSLint looks interesting http://www.pixelbeat.org/fslint/
Of course, it would be nice to be able to replace some of the dupes with links
"amassed a huge amount of files in [a] huge amount of formats about a huge amount of topics"
Files, formats and topics are all numerable, unlike, say, wheat, so you want "number" rather than "amount".
30 years of attention to English would have been awesome!
My filing system is still under development off in the Someday Isles, as in:
Someday I'll get around to sorting and organizing all my stuff, but not right now.
This space unintentionally left blank.
Although I have by no means figured out the file problem, I've learned what I believe to be the fundamental formula of data management:
Backups != Archives
When I first got a computer with a CD/DVD burner, I was thrilled, as I would finally have limitless and cheap data storage. When I filled up my disk, instead of having to delete files to create new ones, I could "back up" the files that were taking a lot of space.
It came to pass that I literally had hundreds of CDs and DVDs with a variety of backups using multiple indexing schemes. AVI_SET_1 through AVI_SET_5. LS000 through LS038. DVD±RWs 0 through 8. Many of my discs remain unlabeled. Most of them contain a hodgepodge of file types, many of which are outdated.
I have hundreds of gigabytes of data, but most of it is inaccessible. When I want that project file from five years ago, I can't just search my hard drive for it. I have to dig through disc after disc, insert it, look at the contents, and occasionally write a memo of some sort concerning the contents of the disc (often losing track of the memo in mere months). It would be a lot easier if it were all on one monster hard drive (better yet, two hard drives in mirrored RAID configuration). If I actually knew how to organize the files as well, they might not even take that much space, and I wouldn't have needed today's hard drive prices to replace compact discs for the type of storage I was doing.
I failed to understand what a backup is for and what an archive is for. A backup is for disaster recovery. An archive is for shoving files out of sight until I need them in the future. I was using backup technology (compact discs) for archiving.
Let's say that, of all the data one has, 99% of it is junk that you hang on to "just in case", and 1% of it is used routinely (these figures vary wildly in real life). Backups target the 1%, while archives target the 99%. Fortunately, for most people, a DVD is more than sufficient to store the 1% at any given moment in time (operating system files aside).
The bottom line is, an ideal (and cheap) arrangement to keep archives and backups in their respective places is:
On my computer (Linux), I have gobs of directories full of applications that I use, and I archive the shell scripts I create to build more complicated bits. If it takes 20 packages to build a major package like ffmpeg, use a script to do the job, and it gets done much faster, since it builds all packages, configures all pieces, moves all subdirectories, sets all permissions, and modifies all configuration files from one place, with one command (to rule them all). Other than the script, the source files can be fetched from the net (usually in newer versions), so don't archive them. System files are untouched. I use rsync to do backups/incremental backups. It does a first 'big, everything' backup, and every subsequent run is incremental (unless you delete the big one, in which case it makes another big one). Mail, pictures, movies, etc., all get backed up to a second drive. Every year or two, create a DVD of anything more than 2 years old (keep it secret, keep it safe), and toast it from the drive. If you are nervous about the DVD, make two. 4 GB of data is quite a bit of data. Done.
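For anyone who wants to try the rsync approach, here is a minimal sketch of how such a run might be assembled. The paths and the function name rsync_command are made up for illustration; the only rsync options used are -a (archive mode) and -n (dry run). With a plain mirror like this, the first run copies everything and later runs only transfer what changed, which matches the behaviour described above. It only builds and prints the command, so nothing is copied until you run it yourself.

```python
import shlex


def rsync_command(source, destination, dry_run=False):
    """Build an rsync command line that mirrors source into destination.

    Hypothetical helper: the name and the example paths are assumptions,
    not anything from the original comment.
    """
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps times and permissions)
    if dry_run:
        cmd.append("-n")  # -n: show what would be transferred without copying
    # A trailing slash on the source means "copy the contents of this directory".
    cmd.append(source.rstrip("/") + "/")
    cmd.append(destination)
    return cmd


# Print the command so it can be inspected before being run by hand or via cron.
print(shlex.join(rsync_command("/home/me", "/mnt/backup/home", dry_run=True)))
```

Dropping dry_run=True gives the command for the real run; passing the list to subprocess.run would execute it.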
In my opinion, computing has become much more simple in recent years. Most major operating systems are of good quality and have worked out the major BSOD kinks. Furthermore, advancements in interface design and process flow have made computer use much more intuitive.
From all that, I've found that I actually hoard far fewer files these days. I have one folder that contains my documents (with some subfolders to organize topics, but not more than one layer of subfolders). My music sits in iTunes (yes, I know where it really is). Photos sit in iPhoto (same story). I really don't track that many files all together. I think it could fit all on one 4 GB USB stick.
Has anyone else had this experience lately?
Timely article, I am in the process of re-doing my setup at the moment. Here is the plan, most of which I intend to implement over the next 2-3 months:
1. A NAS box, with either 8 or 12 TB of storage in JBOD configuration as a media server. Media files originate with my main PC, and are transferred to the NAS box. As this is done the converted files are duplicated to a USB external drive (stored offsite), and also burned to DVDs (stored in the garage, separate from the house, with numbered disks and an index file on the PC for recovery). Original disks are stored in a safe cool place - re-rippable in extremis. About 300 CDs and 1,400 DVDs (some purchased, some recorded off cable on a DVD recorder) still to process - tedious! Emulator software, copies of ebooks, my photos etc will also go here. Organised in a logical directory structure (/media/TV/documentaries, /photos/2010_03_15_Fred's_Party/ etc). As well as my main network, a pair of WD TV Lives will be wirelessly linked to this. For photo directories etc, I will include a small text file describing the contents.
2. A second NAS box with 2x2TB drives in RAID 1 configuration. This is for the stuff I am working on and that has value to me & isn't easily replaced. This will be backed up nightly to a USB drive. Two external USB drives will be used for the backing up, one kept at home, the other locked in my drawer at work, and rotated weekly. All three of my machines will be backed up to this box. These will be encrypted.
As disk capacities continue to grow exponentially, and my storage needs will as well, I will just keep migrating the whole kit and caboodle to NAS boxes with ever larger capacity, but with the same structure intact.
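The "numbered disks and an index file on the PC" part of the plan above could look something like the following. This is only a rough sketch under my own assumptions: the CSV layout (disc label, relative path, size in bytes) and the function names index_disc and find_in_index are invented for illustration, not the poster's actual setup.

```python
import csv
import os


def index_disc(disc_label, mount_point, index_file):
    """Append one CSV row per file found under mount_point.

    Each row is (disc_label, path relative to the disc, size in bytes).
    Returns the number of files recorded.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rows.append((disc_label, os.path.relpath(full, mount_point),
                         os.path.getsize(full)))
    # Append mode, so every disc that gets indexed adds to the same file.
    with open(index_file, "a", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
    return len(rows)


def find_in_index(index_file, needle):
    """Return the index rows whose path contains needle, ignoring case."""
    with open(index_file, newline="", encoding="utf-8") as fh:
        return [row for row in csv.reader(fh)
                if needle.lower() in row[1].lower()]
```

Run index_disc once per disc as it is burned, with the label written on the disc, and find_in_index will later tell you which disc to pull out of the garage.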
Have quite an international taste in movies, so I like the following folder format: $countryOrRegion/$YearOfRelease/$MovieName.$YearOfRelease.avi
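If you wanted to generate that layout automatically when filing a new movie, a small helper along these lines would do it. It is a sketch only: the function name movie_path and the example values are assumptions, and the extension is a parameter because not everything will be an .avi.

```python
from pathlib import PurePosixPath


def movie_path(country_or_region, year_of_release, movie_name, ext="avi"):
    """Build $countryOrRegion/$YearOfRelease/$MovieName.$YearOfRelease.ext."""
    filename = f"{movie_name}.{year_of_release}.{ext}"
    return PurePosixPath(country_or_region) / str(year_of_release) / filename


# Example with made-up values:
print(movie_path("Japan", 1954, "Seven.Samurai"))
```

Repeating the year in the file name means the file still identifies itself if it ever gets copied out of its folder.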
At least I know I have a lot of company with my "stuff", "sort", "move_me", "misc", "etc" folders.
I had the same issue with music genres, but genres translated to user space (i.e. what use does it have to me) as what was I in the mood for. So, I just started organizing music by color. Red and blue being the most obvious. Still, bands that have really speedy songs and ballads? Well, I class them by the "majority".
I just realized, I don't have a "black" folder. I am more optimistic than I previously thought. Wait.. is that right? Maybe black can be for doubt, so Nickelback can go there as it shows there is little hope for humanity.
(As stated in another comment: Nickelback would just be porn)
Atlas Shrugged : Thematic Story
I have over a million files and even though I do put them into folder hierarchies, I tend not to browse for files but simply use Spotlight to quickly get to them. Spotlight is amazingly usable, fast and always there (and it's available from the CLI as well). Queries are instantly updated as you type, can be saved as dynamic folders (views that update as more files match criteria) etc.
So for example, interested in all pdf books that mention python in file name? Type kind:pdf filename:python
Interested in all pictures you took at aperture f4 and where you used flash? Type kind:image fnumber:4 flash:1
And of course you can always put additional metadata on your files to find interesting binary files.
As the island of our knowledge grows, so does the shore of our ignorance.
A sensible directory structure never goes out of style. Human-readable, dependent on no external anything, eminently portable and transferable, and altogether future-proof. Metadata's as good as whatever the standard is. If you're pouring time into creating metadata for some gimmicky piece of proprietary garbage, you're only hurting yourself in the long run.
Without going into superfluous detail on my directory structure (lots of other people have discussed that here), I will say that the one area where I do use metadata religiously is in id3 tags for audio files. My audio files are sorted sensibly, a la /audio/artist/year - album/tracknumber - trackname, with all metadata as close to perfect as I can possibly make it. When I've got new music to add, it goes into the unsorted directory outside my audio path, where it sits until I get around to tagging and bagging it (I've been using Easytag for years, but lately I've been experimenting a bit with kid3), getting the names and directories uniform, et cetera, when it gets included in the audio structure. This is good for all kinds of reasons, especially human-readability and the ability to just copy shit over to an external drive or mp3 player (or import into an application database or whatever I want to do today) with no muss, fuss, or bother. It Just Works.
I learned this the hard way about six years ago when I first decided to organize my absolute shitty mess of audio files, which at the time was around 60GB. That process took months, but when I was finally through it it was perfect. Everything's named correctly, tagged correctly, all the cover art's there and named correctly, everything is perfect. I am determined to never have to go through that again, thus the unsorted directory regime.
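To make the /audio/artist/year - album/tracknumber - trackname layout concrete, here is a rough sketch of turning already-read tag values into a destination path. The function name audio_path, the zero-padded track number, the file extension, and the replacement of "/" inside names are all my own assumptions; reading the id3 tags themselves is left to whatever tagger you use.

```python
from pathlib import PurePosixPath


def audio_path(artist, year, album, track_number, track_name,
               ext="mp3", root="/audio"):
    """Build root/artist/year - album/NN - trackname.ext from tag values."""

    def clean(text):
        # A "/" inside a name would create an extra directory level.
        return text.replace("/", "-").strip()

    album_dir = f"{year} - {clean(album)}"
    filename = f"{track_number:02d} - {clean(track_name)}.{ext}"
    return PurePosixPath(root) / clean(artist) / album_dir / filename


# Example with made-up tag values:
print(audio_path("Rush", 1981, "Moving Pictures", 1, "Tom Sawyer"))
```

Because the path is derived entirely from the tags, fixing the tags first and moving the files second keeps the two in step.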
Hey, I finally got my first freak! Took you long enough!
The limit to any organisational scheme that you're going to end up using is where, after drilling down through whatever directory-structure you come up with, you end up in a folder with a bunch of files in it, and then the *particular* file that you want is somewhere in that folder--sitting right in front of you, along with all of the other files that ended up in the same group, and the question at that point is how long it's going to take you to recognise the one among all the others.
Different types of organisation-schemes basically try to minimise this problem by reducing the number of not-easily-distinguishable items that you have to deal with at any given time.
The obvious approach is to try splitting-up large, flat collections into smaller collections and adding levels of indirection when a given tier has too many items in it to be manageable, so the first task is to figure out what `too many items to be manageable' is--and you want to avoid splitting things up too much beyond the point of `small enough to manage', because there's really a trade-off going on: in order to reduce the complexity at each particular level of your structured collection of stuff, you add some navigational complexity.
So, while that will help you tackle the `number' part of the problem, there has actually been some interesting (and useful!) work done toward figuring out ways of making larger collections more manageable without splitting them (i.e.: tackling the `not-easily-distinguishable' part); one of the more notable ones is a scheme for fixing the homogeneity of file-icons--because it's significantly easier to recognise a thing when it actually appears distinct from its surroundings; J.P. Lewis et al. published an essay on this, a while back, called "VisualIDs: Automatic Distinctive Icons for Desktop Interfaces", and included the results of their user testing; Lewis has a website (with the title, "VisualIDs: Scenery for Data Worlds") that's worth looking at:
http://scribblethink.org/Work/VisualIDs/visualids.html
There's even a link, at the bottom of that page, to a reference implementation--and even patches to integrate VisualIDs into Nautilus.
-rozzin.
All of the comments I read put the focus on hardware and folder trees. I simply use the default directory tree of Mac OS X ([user]/[folder by type of data]); the rest comes naturally, and I retrieve files through Finder or Spotlight.
The most interesting thing is how storage and backup of important files work... in fact all my data is backed up to my Dropbox account in the cloud.
At least, the next step should be to store all our data in a personal space in the cloud, more reliable and more accessible.
What do you think of such a possibility?
This posting inspired me to blog about the topic:
http://scienceblogs.com/gregladen/2011/02/how_to_organize_your_stuff.php#c3298223
Two categories (Pr0n and not-Pr0n)
Nickelback goes in my "Canadian Bands that aren't Rush, Loverboy, or Killer Dwarfs" folder. Extra points to anyone who knows who Killer Dwarfs are.
I haven't thought about organizing by color. Does red mean hot and blue mean cool? If so, are your red songs big hits and your blue songs sleepers? I listen to my music on MiniDisc, as such, I'm still basically making mix tapes in 2011. I want to organize my songs in a way that reflects mood, but I don't want to abandon my genre->performer->album format because I find that's still best for casual browsing.
The Music Genome Project's work is really interesting to me, and I would like to take advantage of genetic attributes in my own music searches because "hard rock" isn't a very worthwhile tag. Hard rock can mean everything from Deep Purple to Van Halen to Poison to Evanescence. That's a lot of difference. What is better for my needs are genome attributes like "Feel" and "Roots". I would benefit from odd custom attributes like "miami vice" and "belongs on an '80s sci-fi/horror movie soundtrack"
Does anyone know of a project to produce a quality audio player that has built-in search and allows you to add custom tags? Do FLAC and APE allow custom tags?
We do it in the road. I mean, really, no one will be watching us.