rpwoodbu · Slashdot Mirror

Subversion with a touch of bash on How Do You Sync & Manage Your Home Directories? · 2009-06-23 10:43 · Score: 4, Informative

I have found that using Subversion (svn) with the aid of a bash script that is run manually actually works really well and provides a number of special advantages. Here's how I have it constructed:

First, I don't actually make my whole home directory a svn checkout. I have a subdirectory in it that is the checkout, and my bash script ensures there are symlinks into it for the things I want sync'd. This makes it easy to have some differences between locations. In particular, I can have a different .bashrc for one machine than another, but keep them both in svn as separate files; it is just a matter of making the symlink point to the one I want to use in each location. My bash script will make the symlink if the file doesn't exist, and warn if the file does exist but isn't a symlink. It does this for a number of files.
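The symlink behavior described above can be sketched roughly as follows. This is not the actual mysync script; the function name `ensure_link` and the example paths in the comments are hypothetical, and the only behavior assumed is what the paragraph states: create the link when nothing is there, warn when a regular file is in the way.

```sh
#!/bin/sh
# Hypothetical helper (not from the original script): make sure $2 is a
# symlink to $1. Creates the link if nothing exists at $2, and only warns
# if something that is not a symlink is already there.
ensure_link() {
    target=$1   # file inside the checkout, e.g. ~/homedir/bashrc.laptop
    link=$2     # where it should appear, e.g. ~/.bashrc

    if [ -L "$link" ]; then
        # Already a symlink; leave it alone so each machine can point
        # at whichever variant it wants.
        return 0
    fi
    if [ -e "$link" ]; then
        # A real file or directory is in the way; never overwrite it.
        echo "warning: $link exists but is not a symlink" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

Calling it once per managed file (for example `ensure_link "$HOME/homedir/vimrc" "$HOME/.vimrc"`) keeps the per-machine choice of target in one place.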

Another benefit of this method is that I don't put all my files in one checkout. The core files I'll want in all my home directories (e.g. .bashrc, .vimrc, ssh config and public keys, etc.) go in a checkout called "homedir". But my documents go elsewhere. And my sensitive files (e.g. private keys) go somewhere else still. I choose what is appropriate to install at each location (usually just the "homedir" checkout on boxes I don't own). My bash script detects which checkouts I have and does the appropriate steps.

The bash script not only sets up the symlinks but it also does an "svn status" on each checkout so I'll know if there are any files I've created that I haven't added, or any files I've modified that I haven't committed. I prefer not to automate adds and commits. I'll definitely see any pending things when I run my sync script, and can simply do an "svn add" or "svn commit" as necessary.
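A rough sketch of the "detect the checkouts, then report what is pending" step might look like this. The function names are hypothetical, and it assumes that a working copy can be recognized by a `.svn` directory at its top level; nothing here adds or commits anything.

```sh
#!/bin/sh
# Print each immediate subdirectory of $1 that looks like a Subversion
# working copy (assumption: it has a .svn directory at its top level).
list_checkouts() {
    for dir in "$1"/*/; do
        if [ -d "${dir}.svn" ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
    return 0
}

# Show unadded and uncommitted files in every checkout that was found.
# Adds and commits are deliberately left to the user.
report_pending() {
    list_checkouts "$1" | while IFS= read -r checkout; do
        echo "== $checkout"
        svn status "$checkout"
    done
}
```

Running `report_pending "$HOME"` would then print one `svn status` listing per checkout that happens to be installed on that machine.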

I also prefer not to automate the running of the sync script. I like being in control of my bandwidth usage, especially when connected via slow links (e.g. Verizon EV-DO, AT&T GPRS). Plus dealing with conflicts is much easier when it is interactive (although I can usually avoid that scenario). It also simplifies authentication to run it from my shell, as it can just use my ssh agent (which I forward, which is set up in my sync'd ssh config).

The sync bash script takes care of a few other edge-case issues, like dealing with files in ~/.ssh that have to have certain permissions and whatnot. And I've taken care to ensure that the script doesn't just blow away files; it will warn if things don't look right, and leaves it to me to fix it.
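As an illustration of the "warn, don't fix" approach to ~/.ssh permissions, here is a minimal sketch. The expected modes (700 for the directory, 600 for private keys) and the `id_*` key naming pattern are assumptions about a typical OpenSSH setup, not details from the original script, and the function names are made up.

```sh
#!/bin/sh
# Print the permission string for a path, e.g. "drwx------".
perm_string() {
    ls -ld "$1" | cut -c1-10
}

# Warn, without changing anything, when permissions look wrong.
# Assumed policy: directory is 700, private keys (id_*, not *.pub) are 600.
check_ssh_perms() {
    ssh_dir=$1
    rc=0
    if [ "$(perm_string "$ssh_dir")" != "drwx------" ]; then
        echo "warning: $ssh_dir should be mode 700" >&2
        rc=1
    fi
    for key in "$ssh_dir"/id_*; do
        [ -f "$key" ] || continue
        case $key in
            *.pub) continue ;;   # public keys may be world-readable
        esac
        if [ "$(perm_string "$key")" != "-rw-------" ]; then
            echo "warning: $key should be mode 600" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

The function returns non-zero when anything looks off, so a calling script can decide whether to stop or just carry on after printing the warnings.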

Using Subversion has another big advantage: it is likely to be installed already in many places. So when I'm given an account on someone's computer, I can usually get my environment just the way I like it in a few short steps:

    svn co svn+ssh://my.server.tld/my/path/to/svn/trunk/homedir ~/homedir
    ~/homedir/bin/mysync   # This is my bash script to do the syncing
    # Correct any complaints about .bashrc not being a symlink and whatnot
    ~/homedir/bin/mysync
    # Log out and back in, or source .bashrc

No fuss, no muss. No downloading some sync package and building it just to get your .bashrc or .vimrc on a random box, or asking the admin to install something. Subversion is usually there, and even if it isn't, most admins are happy to install it. Subversion deals well with binary files, and even large files. For bulk things (like a music library), I'm more likely to rsync it, partly because it is bulk, partly because it doesn't benefit from versioning, and partly because it only needs to be a unidirectional sync. I could easily add that to my sync script.

I am simply in the habit of typing "mysync" from time to time (my .bashrc puts ~/bin/ in my $PATH). This works for me very nicely. Some people may prefer a little more automation, and of course my script could automatically do adds and commits, and even skip the log messages. But I prefer a bit more process; after all, this is my data we're talking about!

If there is interest, I may post my sync script.

Use Jabber on Internal Instant Messaging Client / Server Combo? · 2009-04-08 11:12 · Score: 1

Jabber should solve your needs. It is free and open. There are many client and server implementations. Almost every Jabber client and server supports SSL. There are servers that do server-side logging. You will want to prevent connection to external Jabber servers by use of a firewall rule. However, servers can exist on non-standard ports, and the only complete way to prevent access to that is to restrict the client's configuration (not sure which clients make that easy), and restrict your users from running software on their computers not installed by an administrator; you have to decide if it is worth being so Draconian.

Visit www.jabber.org for a long list of servers and clients. Evaluate them to see which fit your needs. My recommendation for a client in Windows is Psi, as it is good, easy to use, flexible, and only talks to Jabber. I have experience with ejabberd and jabberd 1.x, and I've heard decent things about jabberd 2.x and Openfire; you'll need to evaluate them yourself to get the one that gives you the features you need.

Re:If they can do this... on Python-to-C++ Compiler · 2006-06-15 07:25 · Score: 5, Insightful

It is worth mentioning that one of the original implementations of C++ (if not the very first) was "cfront", a C++-to-C converter. I see this as a much easier way to get a new language implemented quickly, as you can take advantage of the common functionalities already implemented in the target language of the converter. Although Python is not a new language, using it as a compiled language is new, and thus I believe it is comparable to being a new language for this argument. C++ and Python have a lot in common, which makes C++ a very suitable target language for a Python-to-[compiled_language] converter.

If this converter proves to be successful, I believe that a GCC frontend will be written eventually. There are probably potential optimizations that would be difficult or impossible to implement any other way.

Some may think that the dynamic nature of Python may preclude its inclusion in GCC. Technically, all that would need to be done is to have a runtime to handle dynamic things, similar to how Objective-C (for which there is GCC support) has a runtime to handle message passing and late binding. However, a large portion of the potential efficiency of a compiled version of the language would be lost to these dynamic capabilities; luckily, a compiler can detect when things are implicitly static (in fact, this converter is limited to implicitly static constructs), and optimise them to be truly static at compile-time.

We've come full-circle, sort of on Why Emails Are Misunderstood · 2006-05-15 09:43 · Score: 1

Disclaimer: I haven't read the article.

However, I have considered for some time now that e-mails are like a return to the days of old where people over distances communicated primarily by written letters. The significant technical difference is that letters took much longer to deliver, and of course there was a postage fee. Thus people would take great time and care composing a letter. These days, e-mails are so commonplace that people take precious little care composing and reading them.

I also feel that people today are much less capable of expressing themselves clearly and eloquently. One can write an unambiguous letter if one avoids writing like they speak (e.g. colloquialisms, slang). Writing should not be the same as speaking by and large, for all the reasons that I'm sure the article addresses. Inflection is hard to achieve, so the words literally need to speak for themselves on their own merits.

When I read letters written by the likes of Thomas Jefferson, it is hard to believe that there is anything inherently wrong or limiting with the written word. In fact, it is quite the opposite. The beauty of the written word is that it empowers the composer to scrutinize his work to ensure the message is clear and concise; it gives him the chance to retract a statement that might have been inflammatory, or add a statement that wins the argument.

If anything, e-mails should enable greater understanding. The medium is too often misused.

Multiple conversions still necessary on Data Centers And DC Power · 2005-11-11 15:17 · Score: 1

Regardless of whether AC or DC is used in the distribution, voltage conversions are going to be needed in order for the distribution to be practical.

To use telco as an example, they use 48VDC as their distribution. This is a convenient voltage for a few reasons: it is high enough to keep line loss and cable size within reason, and it is high enough to power most equipment without trouble. However, most devices do not operate on 48VDC directly; they tend to want 12 or 5 or 3.3 or [insert CPU voltage du jour], or quite often all of these at once. It is impractical to distribute these voltages over more than a few tens of feet, as you would need really big wires to avoid excessive line loss due to the higher current draw (ever see what happens to badly installed low-voltage outdoor lighting?). Plus it is impractical to distribute lots of different voltages. So the end devices must have their own power supplies to downconvert the DC voltages into whatever is needed for that device.
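To put rough numbers on the line-loss point, here is a small worked example. The 480 W load and the 0.01 ohm cable resistance are made-up illustrative figures, and the load is assumed to draw the same power at every voltage; the only physics used is I = P / V and loss = I^2 * R.

```sh
#!/bin/sh
# Cable loss when delivering a fixed power at a given voltage.
# Usage: line_loss WATTS VOLTS OHMS  -> prints "current_in_amps loss_in_watts"
line_loss() {
    awk -v p="$1" -v v="$2" -v r="$3" 'BEGIN {
        i = p / v                          # current drawn by the load
        printf "%.1f %.1f\n", i, i * i * r # I^2 * R dissipated in the cable
    }'
}

# Same hypothetical 480 W load over the same hypothetical 0.01 ohm cable.
for volts in 48 12 5; do
    printf '%s V: ' "$volts"
    line_loss 480 "$volts" 0.01
done
# prints:
#   48 V: 10.0 1.0
#   12 V: 40.0 16.0
#   5 V: 96.0 92.2
```

With these figures, dropping from 48 V to 5 V pushes the cable loss from about 1 W to about 92 W for the same wire, which is why the low voltages are generated at the device instead of being distributed.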

So we're not eliminating the PSU at each device. But the good news is that these PSUs can be considerably simpler and probably more efficient.

Some people are also confused as to why power entering the datacenter might be converted to DC and back to AC before it makes it to the PSU on any equipment. Beyond the need to charge the large batteries that support the UPS (which requires DC), some installations use what is often called "online UPS". This means that all power is converted to DC, then inverted back to AC 100% of the time. It is like you are always running off of the UPS. You enjoy a higher level of control over the power quality through this arrangement. Unfortunately, these systems cost a great deal of power efficiency.

A fully DC infrastructure would give you the benefit of an online UPS without the power costs. In a DC infrastructure, you have one large bank of DC power supplies that connect directly to the batteries and to the equipment. While the utility power is present, the equipment is being powered from the DC power supply and the batteries are charged. When the utility power goes out, the batteries are able to handle the load of the equipment. No switch-over equipment is needed; the magic of electricity makes this a rather passive system. This is what the telcos have been doing for decades. The only significant difference between a DC infrastructure and an online AC-based UPS is that the online UPS needs a large bank of power inverters to recreate the AC, which chews up more power.

With an "offline UPS" (like the one under your desk), there is always the chance that it won't switch over; the online system eliminates this point of failure altogether. A DC system eliminates both the switch-over equipment and the inverters -- a win-win!

PC and server manufacturers need to get on the ball and develop DC-DC power supplies for their equipment. It really isn't hard at all. I built one for an old desktop PC that I installed in the car to play MP3s. I just took an old dead AT power supply, gutted it, and replaced it with (albeit inefficient) linear power regulators for +12 and +5, and one low-current DC-DC converter (5 to 12) to give me the electrical isolation needed to create -12. I did this all with parts I had lying about. IBM/Dell/HP/et al. could do this in their sleep, and do it with much more efficient means.

Make sure your sales reps know that you want DC! If enough of us bring it up, they'll build it.

Re:It's time for Jabber on AIM's New Terms Of Service · 2005-03-11 18:41 · Score: 1

YESSSS!!!

Jabber offers so much, including:

Potential for full control of message path using an open and extensible protocol.
Ability to carry messages over a secure connection (i.e. SSL); this is well supported.
Flexibility to use different clients and servers, all of which interoperate without the worry of a protocol change specifically designed to break 3rd party clients. There is no concept of a 3rd party client.
Support for cross-communication to those other chat services with those awful EULAs, just as a stop-gap until the world becomes fully enlightened. This does NOT require a multi-protocol client... it is called a "transport", and it lives on a server. One login, full communication... that's easy!

There are a number of freely usable Jabber servers, so you can begin enjoying it right away, without setting up a server yourself. Just because you're using one server doesn't mean you can't talk to users on another. Your Jabber ID is in the form username@server, just like an e-mail address, so this ability is intrinsic to the design of Jabber. This is the beauty of a decentralized model.
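As a tiny illustration of the username@server form, the two halves of a Jabber ID can be pulled apart with plain shell parameter expansion. The function names and the example address are hypothetical, and this sketch only handles the simple form described above.

```sh
#!/bin/sh
# Split a Jabber ID of the form username@server into its two parts.
jid_user()   { printf '%s\n' "${1%%@*}"; }  # everything before the first "@"
jid_server() { printf '%s\n' "${1#*@}"; }   # everything after the first "@"

jid="alice@jabber.example.org"   # made-up address for illustration
echo "user:   $(jid_user "$jid")"
echo "server: $(jid_server "$jid")"
# prints:
#   user:   alice
#   server: jabber.example.org
```

The server part is what tells any other server where to deliver a message, the same way the domain part of an e-mail address does.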

An excellent Windows client is Exodus. A popular cross-platform client is Psi (based on Qt). Even the ubiquitous GAIM has support for Jabber. And very soon, iChat in Mac OS X will support Jabber! I've even considered making my own cross-platform Jabber client; isn't it great that we have that option? For more information on Jabber in general, visit jabber.org

The most widely used Jabber server software is jabberd 1.4. It is usable in Linux and Windows. For a concise comparison of open-source servers, click here. For a comprehensive list of Jabber servers (both open and commercial), click here.

NOW HEAR THIS -- Start using Jabber!

Re:This is a very flawed logic on Experiences w/ Software RAID 5 Under Linux? · 2004-11-07 08:03 · Score: 1

Yes, I agree that it all "depends". If you have plenty of spare bus time, then it won't be a major issue. And indeed, there are things you can do to spread it all out. And yes, sometimes using a controller card instead of the built-in chipset controller can move data that was on a dedicated internal PCI bus onto the more crowded PCI bus for external devices. There are no hard and fast rules in this; good judgement must prevail.

However, I must disagree with your assertion that parity will be a small amount of additional data. Again, it all depends. For a RAID5 using 3 drives, the parity consumes an extra 50% of data (e.g. if three drives give you two drives' worth of storage, the other third must be parity, hence 50% more). While "small amount" is a subjective measure, 50% seems a bit steep to consider small. If you increase the number of drives in your array, you decrease the proportion of parity data in the array, which would eventually reach a "small amount" by anyone's standard. In reality, most RAID5 implementations use three drives. In the case of this Slashdot story, he is using many more drives, thus the parity data would indeed be a small amount. Therefore, a combination of his chipset controllers and add-on ATA controller(s) on separate busses could spread the load quite effectively in the case of an array with many drives.
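The proportion argument above is easy to check numerically. In an n-drive RAID5, one drive's worth of parity accompanies n-1 drives' worth of data, so the extra traffic relative to the "real" data is 1/(n-1). The function name and the sample drive counts below are just for illustration.

```sh
#!/bin/sh
# Parity written per unit of real data in an n-drive RAID5, as a percentage.
parity_overhead_pct() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 100 / (n - 1) }'
}

for n in 3 8 12; do
    printf '%s drives: %s%%\n' "$n" "$(parity_overhead_pct "$n")"
done
# prints:
#   3 drives: 50.0%
#   8 drives: 14.3%
#   12 drives: 9.1%
```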

Re:This is a very flawed logic on Experiences w/ Software RAID 5 Under Linux? · 2004-10-31 06:49 · Score: 1

No, there will be no difference between the recoverability of an equivalent software or hardware RAID. Now, some solutions might offer more options in the recovery department, such as trying harder to get data off of a failing drive before giving up, but that is not a function of whether the solution is based in software or hardware.

Read up on how the various RAID solutions work. In short, RAID5 is an "n-1" solution, meaning that you get the total storage capacity of n-1 drives, where n is the number of drives in the array. So in a simplified view, one drive is used as the parity drive (i.e. redundant data), so only one drive can be lost without invalidating the array. If the implementation is RAID5, this is the way it will work, regardless of whether it is done in hardware or software.
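To see why exactly one lost drive is recoverable, here is a toy sketch of the parity idea. RAID5 parity is conventionally a bytewise XOR across the data blocks in a stripe; the single-byte "blocks" and their values below are invented purely for illustration, and real implementations rotate parity across the drives.

```sh
#!/bin/sh
# XOR all integer arguments together and print the result.
xor_all() {
    result=0
    for value in "$@"; do
        result=$(( $result ^ $value ))
    done
    echo "$result"
}

# Three data "blocks" (one byte each) from a hypothetical 4-drive array.
d1=165
d2=60
d3=15
parity=$(xor_all "$d1" "$d2" "$d3")

# Pretend the drive holding d2 died: XOR the survivors with the parity.
recovered=$(xor_all "$d1" "$d3" "$parity")
echo "parity=$parity recovered=$recovered"
# prints: parity=150 recovered=60
```

Lose two of the values and there is one equation with two unknowns, which is why a plain RAID5 cannot survive a second failure no matter whether the XOR is done in hardware or software.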

Bear in mind that you can do things like hot spares and mirroring that can give you an extra level of protection in addition to RAID5. But it does mean committing more drives without getting more data capacity.

This is a very flawed logic on Experiences w/ Software RAID 5 Under Linux? · 2004-10-30 18:46 · Score: 4, Informative

This logic doesn't hold. Let's first talk about the performance.

"Also, on any reasonably modern system, the software RAID will be faster. You just have a much faster processor to do the RAID processing for you. The added overhead of the RAID5 processing is nothing compared to a 1-2GHz processor."

The actual RAID processing is relatively easy, and any RAID solution, be it hardware or software, that is worth anything will not have any trouble doing the logic (perhaps the cards mentioned are indeed not worth anything). The processing isn't your limiting factor; it is data throughput. This is where hardware shines. A lot of extra data has to be shipped in and out to maintain and validate the RAID. This can easily saturate busses. A hardware solution allows the computer to communicate only the "real" data between itself and the hardware device, and then allows that device to take the burden of communicating with the individual drives on their own dedicated busses. Sure, that device can become overwhelmed, but I submit to you that if it does, it was poorly designed.

I am not saying that one shouldn't consider software RAID solutions. Just don't consider them because you think the performance will be better.

Now let's talk about data recovery.

"I've lost 4 drives out of a 12 drive system at the same time, and Linux has let me piece the RAID back together and I've lost nothing. Was the machine down? Yes. Did I lose data? No. Compare that with a 3ware hardware RAID system where I lost 2 drives. Even thought I probably could have salvaged 99% of the data off that array, the 3ware just would not let me work with that failed array."

Let us be clear: we are talking about RAID5. In RAID5, you simply cannot lose more than one drive without losing data integrity. And it isn't like you can get back some of your files; the destruction will be evenly distributed over your entire logical volume(s) as a function of the striping methodology. So it is quite impractical to recover from this scenario. I don't know what kind of system was being employed with this 12-drive array that can withstand a 1/3 array loss, but it certainly wasn't a straight RAID5. I can come up with some solutions that would allow such massive failure, but then we aren't comparing apples to apples. I'd be very interested in knowing what the solution was in this example case. It should also be noted that we don't know how many drives were in the system that lost 2 drives, much less what kind of RAID configuration was being used. No conclusion can be derived from the information provided.

As an aside, more often than not, when we as individuals want a large cheap array, we are less concerned about performance than reliability. We put what we can into the drives, and we hope to maximize our data/$ investment while minimizing our chances for disaster. A software RAID5 is a good solution. Some posts have said that if you can spend so much on the drives, what's stopping you from spending on a nice hardware controller? I submit that perhaps he's broke now! And besides, a controller that can RAID5 8 drives is quite the expensive controller indeed. This has software RAID written all over it.

Re:RUNT Linux on Essential Software for Thumbdrives? · 2004-09-06 11:55 · Score: 1

RUNT is perfect for this purpose. It stands for Resnet USB Network Tester, so the name says it all. After all, when you think about it, sometimes a Linux distro is actually the best utility. In the past I would often use a bootable CD distro like Knoppix as a diagnostic and rescue utility. But it means having the disc on hand, which I don't always have. But since RUNT works from a USB Flash drive, I can keep it on my keychain at all times. If the system supports USB boot, you're golden! But even if it doesn't, you can make a boot floppy from the image on the USB drive, and you can boot the USB drive on any i386 computer that has USB, in spite of BIOS limitations.

A fully bootable system, be it on CD or USB Flash, has real advantages in the field. You don't have to depend on the installed OS to be working, and let's face it, we spend a lot of time working on Windows boxes, because they spend a lot of time not working! It is great to get a shell prompt under a Linux that has drivers for lots of network devices. That means you can go online and find that Windows network card driver even when the only computer at your disposal is the very same Windows box with no network card driver! It is also handy to be able to rescue files off of an ailing NTFS system that isn't booting. Plus it is great to prove out the hardware when you're having weird problems that may be either hardware or driver related.

RUNT uses the UMSDOS filesystem, which makes it easy to install, customize, and coexist with other data. For those that don't know, UMSDOS uses a DOS-style FAT filesystem, but it mounts and works like a fully Linux-happy filesystem, storing the extra metadata (e.g. permissions) in special files that are hidden while mounted in Linux. The practical upshot is that the drive is still 100% compatible with other OSes (i.e. it is still FAT, not ext2 or something like that), so you can still use it for other data storage. Plus installation is as simple as unzipping the distro onto the flash. And making it directly bootable (i.e. no boot floppy needed) is pretty easy, and is going to be even easier when the new version is released soon.

If you don't put anything else on a USB Flash drive intended for diagnostics, put on RUNT!

Give Ghost another chance on Experiences w/ Drive Imaging Software? · 2003-11-12 09:44 · Score: 1

I have been using Ghost for many years now (even before it carried the Norton name), and it has worked when other things have failed. I have done it all, from being a developer to doing PC repair, and Ghost has been there for me. I have never wanted to look into another imaging program, because it did what I needed it to do. It has even gotten data off of bad drives when nothing else would.

That being said, you must know how to use the program effectively. When things are beautiful, it just simply works and works quickly. But it is rare that you are using it when life is beautiful. If you have a clean source image, and you are getting a system back up and running from it, it is pretty easy. But if you have a drive that is failing, or a filesystem that is hosed, things are a bit more tricky. Yes, it does like to give some nondescript error messages. But here are some ways to deal with it:

Make sure the source filesystem is clean (i.e. run chkdsk). If you are worried about chkdsk causing further damage (it has happened to me), or if that doesn't help, then read on.
Use Sector Copy mode. This is much slower and doesn't have the ability to resize the partition, but it will get everything bit-for-bit.
If you still get weird errors, your source drive may be going bad. You can tell Ghost to ignore sector read errors. You may not end up with a perfect copy, but this is the first thing I do when I suspect the drive is flaking out. On more than one occasion, using this feature at the first sign of failure has given me a workable backup just before the drive finally bit it for good. Oh, if my customers only understood how close to the brink they stood. And if I could just convince them to back up their data...

My two cents, anyway.

Article inaccurate and uninformed on Hard Drive Capacity Confusion, Lucidly Explained · 2003-10-07 20:44 · Score: 3, Informative

The basic point of the article is accurate: that HDD manufacturers use "standard" metric prefixes and OSes use "computer-ese" "metric-esque" prefixes, thus the confusion. However, the article notably lacks in these areas (and perhaps less notably in others):

It uses terms like "binary math" versus "decimal math". Last I checked, they were both equally viable ways of doing math, and as any viable method of doing math should be, they both always get the same answer! See section 3.5 if you want to get really mad! It isn't that the math is different that is causing a problem, it is that the algorithm is different. It just so happens that the algorithm was inspired by a number which is convenient when dealing with binary because it is an exact power of 2 (1024 = 2^10).
There is no discussion of why HDD makers use normal math while OS makers use "computer-ese". It isn't wholly discountable that HDD makers are interested in making their drives look as big as possible against the competition, and if one manufacturer says a Gigabyte is 10^9 bytes then they all have to. And he paints the 1024-byte KiloByte basically as a stupid idea, which it isn't (albeit confusing).
The explanation (such as it is) for how much data is lost to OS overhead is inaccurate at best. He got his info for the Mac from the Drive Utility (akin to Disk Management or fdisk in MS-land), but got his WinXP info probably from the explorer. Fdisk will not report any filesystem size considerations, just the partition sizes, so neither should the Drive Utility. I'm betting the 1026 "lost" bytes are the partition table. This makes it look like the Mac loses 1026 bytes, while Windows tosses about 11 MB out the door. While I'm not trying to advocate for Windows, that simply isn't fair. He goes on to say that he has "no explanation for these variations", which brings me to my next point.
He can't explain the size variations between OSes, yet he makes this statement:
"We note that operating systems take a portion of drive capacity for use as file tables. A typical drive utilizes 70MegaBytes for this function, which is not significant on a drive with a capacity of 120GB."
So now he's trying to explain it, and not doing a very good job. First of all, the FS overhead will vary roughly proportionally to the size of the partition, so giving out a number like 70 MB and saying that a "typical drive" loses this much is careless at best. Secondly, I'm not convinced that he doesn't actually have 70 MB of data on that drive. There's no accounting for the 11 MB that aren't showing up as "used", which sounds like FS metadata to me. I don't have a drive handy to format, so I don't know if Windows shows "0 used" on a clean NTFS drive or not (oh, is he using NTFS or FAT32... the world may never know). The bottom line: he should have used the Disk Management tool to compare apples to apples (no pun intended).
And the bottom bottom line is that he's in the storage business, and shouldn't be so ignorant. He's got a degree in mathematics for crying out loud!
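For what it is worth, the prefix mismatch itself takes one line of arithmetic to show. The sketch below converts an advertised size in decimal gigabytes (10^9 bytes) into the 2^30-byte gigabytes an OS typically reports; the function name is made up, and the 120 figure is the drive capacity mentioned in the quote above.

```sh
#!/bin/sh
# Convert advertised gigabytes (10^9 bytes each) into the binary
# gigabytes (2^30 bytes each) that an OS typically displays.
advertised_to_reported() {
    awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 1e9 / 2^30 }'
}

advertised_to_reported 120
# prints: 111.76
```

Same number of bytes, same math, two different definitions of the prefix: roughly 8 "GB" of apparent difference on a 120 GB drive before any filesystem overhead enters the picture.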

I appreciate that this needs to be explained, and I know all too well that the average computer user (read average American) can hardly count, much less do it in binary, so a simple explanation is good. But I never think things should be simplified to the point of gross inaccuracy. This is just further compounded with the obvious lack of a clue. Someone write a better (and perhaps shorter) account for this, please!

Don't use folders; use Categories on How Do You Organize Your Data? · 2003-09-02 20:54 · Score: 2, Informative

Categories are a feature of MS Outlook; it probably exists in other clients as well (Evolution?), but my experience is with Outlook.

Outlook allows you to assign any number of categories to any object. An object can be an e-mail, task, contact, appointment, etc. Outlook comes with a list of "common sense" categories out of the box, but the user can make up categories as he/she sees fit.

If you keep all your messages in one folder and assign them to categories, you can use Outlook views to sort through the data however is most applicable at the time. One of the built-in views is "By Category". Items are grouped by category, then further sorted by whichever field you prefer within each category. If an item is in more than one category, it will be displayed multiple times in the list, inside the appropriate category grouping. It is better than folders, I assure you!

You can assign categories to objects multiple ways:

Entering the category in the field at the bottom of the dialog where the object is created/edited
Dragging and dropping the object into a category grouping when viewing by categories (does not allow multiple categories this way)
Using rules to automatically assign categories to messages as they arrive
Right-clicking an object and selecting "Categories..."
...and more, I'm sure

I find categories particularly useful for contacts and appointments, as they quite often fall into multiple categories. For example, a contact might be a family member, but also a member of my local LUG (Linux Users Group), and also works at a certain company where I have several business contacts. Folders simply won't do in this situation; I have no desire to maintain three separate contact entries, but I want the contact to show in all three groups. But with categories, happiness ensues.

User: rpwoodbu

Comments · 13