Guaranteed Transmission Protocols For Windows?
Michael writes "Part of our business at my work involves transferring mission critical files across a 2 mbit microwave connection, into a government-run telecommunications center with a very dodgy internal network and then finally to our own server inside the center. The computers at both ends run Windows. What sort of protocols or tools are available to me that will guarantee to get the data transferred across better than a straight Windows file system copy? Since before I started working here, they've been using FTP to upload the files, but many times the copied files are a few kilobytes smaller than the originals."
Clearly you're looking for UDP. Next question.
"Anyone who [rips a CD] is probably engaging in copyright infringement." - David O. Carson
Create checksums? MD5 or SFV, for example.
SFTP should do since the communications are encrypted, if something changes along the way it should be rejected by the other end. HTTPS and any other protocol-over-SSL should do.
FTP is a plain-text protocol, so if something changes along the way it won't report any problem.
Custom electronics and digital signage for your business: www.evcircuits.com
Or I guess that would be WWCP. WWJD?
There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
Jesus is awesome.
Automatic hashing of transferred pieces, ability to verify the entire package after transfer, higher performance if hosting on multiple machines...
I'm sure you can set up an ad hoc private setup... Maybe even uTorrent between two machines.
The summary states that with FTP, the downloaded files were of the wrong size. Can anyone explain why TCP's efforts to deal with unreliable networks, such as the retransmission of unacknowledged packets and their reassembly in proper order, would not already deal with this? I am familiar with the concepts involved but I think I lack the low-level understanding of how you would get the kind of results the story is reporting.
It is a miracle that curiosity survives formal education. - Einstein
Robocopy? http://technet.microsoft.com/en-us/magazine/2006.11.utilityspotlight.aspx
Background Intelligent Transfer Service (BITS) can be used to transfer files between Windows servers. It is the technology behind Windows Update. We use it in our company to transfer files across a low-bandwidth satellite connection. The great thing is that it can automatically resume a transfer even after both machines have been rebooted. SharpBits offers a nice .NET API. You can find it here: http://www.codeplex.com/sharpbits
I love it! Haha... that's probably one of the better tags I've seen.
I'd say BitTorrent -- with firewall rules or some other measure so random people can't see your microscopic swarm. It uses SHA-1 hashes of chunks, so if a torrent client says a file downloaded successfully it's pretty much guaranteed to be true.
________
Entranced by anime since late summer 2001 and loving it ^_^
Kermit has a reputation for being robust, and there's an implementation for Windows. I'm not speaking from experience, though.
hi there,
why don't you get cygwin on both the systems and then do a rsync ?
Within your own network, you might want to use robocopy (http://en.wikipedia.org/wiki/Robocopy).
BR,
~A
Should work fine across a WAN and then just file-copy.
You can run over an SSL link. Plain-old FTP would be the worst choice as anyone could sniff your traffic.
Wasn't TCP designed for just this? Guaranteed transmission?
Similes are like metaphors
Cygwin + SFTP maybe? Not sure if that performs better. Easy to set up though. May get better grade of service off the network, depending on the rules, of course.
they've been using FTP to upload the files, but many times the copied files are a few kilobytes smaller than the originals
Twenty bucks says you're converting from Windows line endings (\r\n) to Linux line endings (\n).
Use binary mode and you'll be fine.
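To put a number on the line-ending theory, here is a minimal sketch. It assumes the receiving end stores bare LF line endings, and the helper name ascii_mode_size is made up for illustration. It only models the size change; it does not talk to a real FTP server.

```python
def ascii_mode_size(data: bytes) -> int:
    """Size of the data after every CRLF pair is rewritten as a bare LF.

    This models what an ASCII-mode transfer can do when the receiving
    side stores LF-only line endings (an assumption of this sketch).
    """
    return len(data.replace(b"\r\n", b"\n"))


# Three CRLF-terminated lines: one byte is lost per line ending.
sample = b"line one\r\nline two\r\nline three\r\n"
print("original size:", len(sample))
print("after ASCII-mode conversion:", ascii_mode_size(sample))
```

A file with a few thousand CRLF pairs (or a binary file that happens to contain those two bytes in sequence) would come out a few kilobytes short, which matches the symptom in the summary.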
it's not just for Linux.
You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
Rsync over ssh and then a script to md5 at source and destination.
The last part may be tricky and/or slow depending on your filesize, but it will do the job for free.
NO SIG
It's crazy but it just might work. Not very quickly though.
There are no guarantees when it comes to the protocols and the internet .... it is always a "best effort" system. Many forget that it is always a best effort system because the internet has come to the point where for all intents and purposes, there are virtually no failures. I would probably use a tried and true protocol like FTP or maybe even SCP. Both work very well. I would think your best bet is to try to work with the government to improve their "dodgy" internal network. SCP has the advantage of securing the transmission as well as excellent error correction and recovery.
Probably tape drives, or hard drives if you prefer. Encrypt with a shared key. I think microwave is LOS already, so your distances can't be that large. It would certainly solve your "flaky" bandwidth and security considerations. You would "packetize" the data, e.g. tapes are brought over in serial succession; if a tape went missing, you delete the key that encrypted its contents and request a resend of the contents of that tape. That verifies its receipt.
Not sexy, but it's probably the best solution. Since you're a government contractor, I'll now insult you to suggest that you need a project for which you can charge a lot more money, like a carrier pigeon training program, including pigeon consultants, a pigeon breeding program, and a pigeon habitat designer. But that's what you get for asking Slashdot to do your job for you, especially one with an obvious, non-sexy, non-technical solution.
--
$tar -xvf
"Guaranteed ... mission critical files ... microwave connection ... government-run ... very dodgy internal network"
A transactional store and source integrity verification at the destination point. Or something in between that and what you have now, depending on your requirements. I don't know of a tool that does that out of the box though.
... is what you want. Yes, you can use it with Windows (with or without cygwin bloat). Use -c and a short --timeout and you're good to go. If you're using it over ssh you're looking at three layers of integrity (rsync checksums, ssh and TCP), two of them quite strong even against malicious attacks, not just ordinary corruption. Put it in a script with a short --timeout: if anything is wrong with the link your ssh session will freeze completely, and as soon as your --timeout is reached rsync will die and your script can respawn a new one (which will resume the transfer using whatever chunks with good checksums you have already transferred and will again checksum the whole file when it finishes).
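A rough sketch of the respawn script described above, assuming rsync and ssh are on the PATH. The function names, the retry limits, and the addition of --partial (so an interrupted file is kept for the next attempt) are my own assumptions, not something the post specifies.

```python
import subprocess
import time


def rsync_command(src, dest, timeout=30):
    """Build the rsync invocation: checksums (-c), keep partial files,
    and give up after `timeout` seconds of I/O silence."""
    return ["rsync", "-c", "--partial", f"--timeout={timeout}",
            "-e", "ssh", src, dest]


def sync_until_done(src, dest, max_attempts=100, retry_delay=10,
                    run=subprocess.run):
    """Respawn rsync until it exits cleanly; return the attempt count.

    `run` is injectable so the loop can be exercised without a network.
    """
    cmd = rsync_command(src, dest)
    for attempt in range(1, max_attempts + 1):
        if run(cmd).returncode == 0:
            return attempt
        time.sleep(retry_delay)  # give the link a moment to come back
    raise RuntimeError(f"rsync still failing after {max_attempts} attempts")
```

Usage would be something like sync_until_done("C:/outbox/", "user@host:/inbox/"), where both paths are placeholders.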
Line ending was my first thought too. I've used FTP scripts in Windows to and from *NIX machines with no trouble at all. I can't vouch for how well it works for Windows-Windows transfers because in that case I've always just used shared folders. That worked fine too. Unless the data is sensitive, there's really no need for scp or anything fancy.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Take an MD5 hash of the data or something, then send it. If it comes back changed, you've got data loss. If it comes back the same, and the files are still a few kb smaller, then either you're the Wizard of File Hashes or you're reading off on-disk size instead of actual data size.
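For anyone who wants to script that check, a minimal sketch using Python's standard hashlib. The function name file_md5 and the block size are arbitrary choices. Run it on both ends and compare the two hex strings.

```python
import hashlib


def file_md5(path, block_size=1 << 20):
    """Return the MD5 hex digest of a file, read in blocks so that
    large files do not have to fit in memory."""
    digest = hashlib.md5()
    # Binary mode matters: text mode could translate line endings.
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

If the digests match, the contents are the same for practical purposes, and any remaining size difference comes from how the size is being reported.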
No kidding!!! What do you say at this point?
Ritchie's Law - assume you have screwed something up *first*, before blaming the tool...
I had this same problem but with cable modem internet on one end and DSL on the other.
I am guessing you are using the Windows ftp command in a batch file. The problem is, if the line is interrupted, ftp doesn't care and carries on in your batch file as if the transfer completed 100%.
I purchased one copy of WS_FTP Pro by Ipswitch and, using the batch scripting function, created a script that will loop until the file is sent entirely. It's low cost and works great.
It's not just for *nix any more.
DVD burner, FedEx.
--- Worst tagline ever.
It can even copy when the cableco sends spoofed FIN packets to reset connections.
I, too, am a bit surprised that FTP is failing. It has been my experience that if there are network problems the transfer may slow to a crawl, but unless the network is dropping 10% of the packets, I would be surprised if it failed. Have you tried FTPing the same file back to see if there is really stuff missing, or if it is just technical differences in storage of the file? As an aside - some of the old modem protocols might work for this. The problem is likely the microwave connection coming and going. I have seen MW drop in and out - it's maddening. FTP will definitely fail in that scenario. You could also write your own little protocol that breaks the file into small pieces, transfers a chunk at a time, and waits for an ack checksum. If the connection is interrupted, automatically stop and try to reconnect - then resume. You're reinventing the wheel, but then you know exactly how the process works.
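The "little protocol" idea can be sketched in a few lines. This is a simulation only: the channel callable stands in for the unreliable link and is purely hypothetical, and SHA-1 is just one possible choice for the per-chunk checksum.

```python
import hashlib
from typing import Callable


def send_in_chunks(data: bytes, chunk_size: int,
                   channel: Callable[[bytes], bytes],
                   max_retries: int = 5) -> bytes:
    """Send data one chunk at a time and resend any chunk whose
    checksum does not match. Returns what the receiver assembled."""
    received = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        expected = hashlib.sha1(chunk).hexdigest()
        for _ in range(max_retries):
            arrived = channel(chunk)  # hypothetical flaky link
            # The receiver's "ack checksum" is the hash of what arrived.
            if hashlib.sha1(arrived).hexdigest() == expected:
                received.extend(arrived)
                break
        else:
            raise IOError(f"chunk at offset {start} failed "
                          f"{max_retries} times")
    return bytes(received)
```

A real version would also need reconnect logic and a way to remember the last good offset across restarts, which is where the wheel reinvention starts to cost real effort.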
Why don't you try rsync. That should do the trick nicely.
Several posters have mentioned that TCP is a reliable transmission protocol, but it doesn't guarantee anything above the actual network layer. If what you're looking for is guaranteed "once-and-only-once" transmission at the level of each message or file or whatever that you're transmitting, you need a transactional message queue -- something like "mqueue series." These are basically network-transparent queues, where you can put something on the queue at one end of the network and pull it off the queue on the other end. It's guaranteed to make it to the other side once and only once. The protocols are built on TCP, but give you transactional guarantees at a higher level.
Think of this transfer model like a car, the further it goes, the more bytes are burned up. they just need to be added back in with a network filling station. I would look to google for a government approved provider.
Also look into Windows DFS.
We use it to sync webfarm filesystems in Server 2008 and it works perfectly. At least in 2008 only file changes are sent across so it is very efficient, even for WAN scenarios.
Best regards
http://www.mulesource.org/display/MULE/Home
Using it purely as a file transport is a bit like using a sledgehammer to open a walnut, but it will do the job and do it well.
DVD = 1, Burner = 2, FedEx = 3
This is my sig.
Leechget is a cool program for this. I think it does data verification during or at the end of the download and supports pausing and resuming. It is primarily a download accelerator/manager but it also installs a right click context menu of "copy here using leechget" for all local file transfers. So go that over the network and you'll not only get it there correctly but it'll go at max speed because it opens multiple connections at once and sends parts of the file then re-joins them.
Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
You should look at the EDIINT AS2 protocol, AKA RFC 4130. This is a widely-used e-commerce protocol built over HTTP/S.
AS2 provides cryptographic signatures for authentication of the file at reception, non-repudiation and message delivery confirmation (if no confirmation is returned, the transfer is considered a failure), and is geared towards files. There is even an open-source implementation available.
More complex than FTP/SFTP but entirely worth it if your data is mission-critical and/or confidential. Plus, passes through most networks because it is based on HTTP.
You're not old until regret takes the place of your dreams.
Even on reliable connections, using .complete files is a great idea.
It works this way: if you're pushing, open ftp and upload. After the transfer completes, check the remote file size; if it matches the local file size, also ftp a 0 size .complete file (or a $filename.complete file containing the md5 checksum, if you want to be extra paranoid).
Any app that reads that file will first check whether the .complete file is there.
If the remote file size is less, you resume the upload. If the remote file size is more than the local one, you wipe out the remote file and restart.
Same idea for the reverse side (if you're pulling the file, instead of pushing).
You can also setup scripts to run every 5 minutes, and only stop retrying once .complete file is written (or read).
Note that the above would work even if the connection was interrupted and restarted a dozen times during the transmission. [we use this in $bigcorp to transfer hundreds of gigs of financial data per day... seems to work great; never had to care for maintenance windows, 'cause in the end, the file will get there anyway (scripts won't stop trying until data is there)].
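The decision step of the .complete scheme is small enough to sketch. The function name, the action strings, and the use of None for "remote file not there yet" are all made up for illustration; the actual FTP calls are left out.

```python
from typing import Optional


def next_action(local_size: int, remote_size: Optional[int]) -> str:
    """Decide what a push script should do on each retry pass.

    remote_size is None when the remote file does not exist yet.
    """
    if remote_size is None or remote_size < local_size:
        return "resume"        # upload the missing tail
    if remote_size > local_size:
        return "restart"       # remote copy is garbage: wipe and resend
    return "write_marker"      # sizes match: upload the .complete file


# A script run every few minutes would loop on this until the marker exists.
for remote in (None, 4096, 10000, 12000):
    print(remote, "->", next_action(10000, remote))
```

Comparing sizes alone cannot detect corruption that preserves length, which is presumably why the post suggests putting an md5 checksum in the marker file for the paranoid case.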
"If anything can go wrong, it will." - Murphy
Seriously !!!!
That must be the highest-overhead and least reliable method available.
Hire an IT person to set up and implement one of the following:
* rsync
* SSH (SCP/SFTP)
* HTTPS
* FTPS
And stop transferring binary files in ASCII mode !!!!
How about creating a SHA1 checksum and then transferring the data using netcat? You could split the files into pieces, run them through sha1, and finally send them over netcat using udp and retransmit at will. Or if the files don't change too much you could try rsync.
These are all unix-centric solutions, so you'd have to install cygwin, unless there exists a python library that does all that.
Back in the very old days, we had slow modems with noisy lines. We used things like Zmodem and other protocols to handle this problem. It might be just the thing to solve your problem now.
is Snail Mail.
Yours In Capitalism,
Kilgore Trout
mission critical files ... microwave connection ... government
They use FTP? Hopefully only through ipsec or something.
I used to have a similar problem over another connection, where even more advanced file copy utilities would say the file was copied, but a 2-4k chunk would be missing. What I did to solve the problem was to use an archiving utility that supported adding ECC records and install it on both endpoints. Then, I'd just archive the files I need, send them over the faulty link, and usually the ECC records were able to correct any errors that did crop up during the transfer when extracted on the destination machine.
I did this manually, but I don't think it would be too difficult to make a scheduled task that would check for files, use the command line of an archive utility to generate a temp archive, sling the archive across, then the machine on the other side of the link extract the files and if the corruption was too great for the ECC records in the archive, to give some type of warning or notice to someone.
Of course, this is not fixing anything on the network layer, so maybe running either PPP over SSH or a VPN link directly from one machine to another might help.
'hash' is also a nice feature...
I telecommute and need to reliably get install images from my office down to my desktop. I have used GatherBird's Copy Large Files utility for several years, and it has worked out very well. No problems. http://www.gatherbird.com/GBMain.aspx
If you are also utilizing SQL Server (2005/2008), one option would be Service Broker, as it has guaranteed message delivery. And as long as at least one of the endpoints is using the Standard Edition or better, the rest can use the free express version.
Set up a Linux box on the same network next to the Windows box that is at the 'remote' end of the transfer (i.e., not the end the transfer is initiated from).
Use ssh from the 'local' end to transfer the file to the linux box. Then run something appropriate (ftpd? apache? samba?) on the linux box that makes the files directly available to the windows box.
Alternatively, rip the Windows crap out and replace both ends with a real OS.
Establish an OpenVPN tunnel over UDP. All network traffic tunneled through it will be encrypted, with integrity ensured. If your wireless network link is too unstable, you will have plenty of OpenVPN disconnections and reconnections, but communications going through the tunnel will be intact.
For example, one day I connected home over openvpn and used nfs to transfer a couple of big files over a slow wireless link. After some time going on, I closed my laptop and put it into suspend and moved into another place and connected to another wireless link. After resuming the laptop, openvpn re-established the connection to home and nfs continued to copy the file.
I would reckon that something based on the Bittorrent protocol (or a subset of it) might be an exceptionally reliable way of, while running in the background, sending files from one machine to another one.
The protocol comes with built-in file splitting/recombination, block validation and you can get several GUIs (and I believe at least one command line implementation) for it. It might be a bit overkill though - pretty much everything in the protocol related to dealing with managing communications with multiple peers (most of the protocol really) is unlikely to be useful in your situation.
That said, your own private tracker listing one and only one peer all the time (the machine where the files are being read from) might be the only tricky bit you need to do to use any bittorrent implementation out there.
Xcopy /v over a network share? using the at command?
Command line robocopy available from the various Windows administration kits for free.
It can run on a schedule, via the task scheduler, or only when X amount of changes have been made. You can set interpacket gaps to limit speed. It logs transfers, has switches for retries and wait times, can restart interrupted copies, and has different methods for opening locked files. You can maintain NTFS file permissions or not, and it can mirror a tree, including deleting files in the destination. It is a single exe file with nothing to install. It can be scripted, run manually, or run in a batch file with options.
Simple, easy, and it works.
http://en.wikipedia.org/wiki/Robocopy
Does your file have to be transferred in-synchro?
Otherwise you might want to look into Message Oriented Middleware, things like MQ Series or in worst case scenario, even Microsoft MQ. There are plenty of options.
This would allow you to put policies on the messages, handle routing (in case you need to deliver to different recipients), guarantee delivery at least once, do type conversions/transformations etc.
http://www.columbia.edu/kermit/
It was designed to work under the worst conditions and with any type of machine on the planet.
It is old but still in use so it probably works really well.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
You are kidding about this, aren't you?
Let me get the facts straight:
- you have "mission critical files", and the network you're transferring them over is so incredibly badly managed that it doesn't support reliable data transfer
- you want a technical workaround for this brokenness.
If this is the case, you don't have a technical problem on your hands; you have a political one.
"Mission critical" has a meaning: it means critical to the success of the operation. I.e. without these files, your operation or someone else's operation will fail.
If your management believes that your files are "mission critical", and you're facing a problem of this sort, you need to document the difficulties you're having, along with measurements to support your claims, and then make a clear statement that as long as your network path is completely broken, you are absolving yourself of responsibility for the correct transmission of these files.
If your management doesn't do anything about this, then the files are not "mission critical".
Is the destination computer hard drive formatted the same as the source hard drive? If not then this could be as simple as the extra bits for each sector on the drive reporting the correct space for the drive.
If the files are often the wrong size, it's probably something like CR/LF translation, or shifting reported file size due to filesystem block size. What do you see if you compare the files? Do they have the same checksum on both systems? My guess is that you don't have a tools problem (FTP, lame as it is, is probably not having as many transmission errors as you describe).
In terms of file copying tools, the options that come to mind are:
FTP doesn't check data integrity, but it's fairly reliable nonetheless, due to the TCP layer, which does retransmission, etc. But if there are transmission errors you may not be notified (i.e. you won't know if there's corruption, but you will be told if the connection dropped). So don't use it.
SCP does check data integrity, so if there's a problem in transmission it'll detect it and tell you, but it won't correct the error - you'll have to script re-sending if there's an error.
rsync checks the integrity of each delivered file, which is great. That being said, rsync is really designed to duplicate a directory tree from one machine to another, not copy individual files, so you may need to play with it a bit to do what you want.
Pando (http://www.pando.com) gives you the above plus data validation and retransmission, which might help in your situation. I've used it when sending very large files over unreliable connections (e.g. dial, weak wireless) and it'll keep hammering away until the data gets through. It's primarily a GUI app (Windows, Mac, Linux) so it's more of a consumer tool, but it can be scripted, etc., so depending on your application it might be what you want.
Enable 3D printed prosthetics!
that this is an issue with the size of your hash... I think.
Check the message boards for documentation. Others have had this same issue.
It's not FTP that's the problem. It's the implementation. Try others.
Switching to a new protocol, especially one like UDP that has no built in error correction is going to be complicated and costly.
Especially if you can fix the issue with FTP; which, I think you can.
This signature has Super Cow Powers
Create a small set of PAR2 files. At the destination compare it. If it needs repair, then get the associated repair blocks. This solves 2 problems...
1. Don't have to retransmit a corrupted file. What's the good in knowing it's bad if you'll have to redownload it again.
2. Fixes the errors for minimal bandwidth. Due to the fact you mentioned the bandwidth limitation, I assume this is a major obstacle to just resending the file again when it's corrupt.
With SFTP and SCP, if there is noise or the connection fails (which will look exactly the same), you'll end up with no file at all on the receiving side, not a corrupted file. FTP will create a corrupted file.
Anyway, from what I've read, he's looking for rsync.
Rethinking email
Ummm... try using the "BINARY" command in the FTP client first?
I had a similar issue once, and kicked around the idea of making it database-centric instead of file centric and use something like ODBC. Files are first "parsed" into a database format and un-parsed back into files at the other end. The downside is that it may not work well for open-ended documents, but is primarily designed for CSV-like (delimited) or fixed-column ascii data files. The plan's schema somewhat resembled:
table: lines
-----
line_ID // for internal processing
file_Ref // foreign key
line_sequence // sequence in file
line_text
line_check_sum

table: files // first transfer pass
------
file_ID // or name
file_check_sum
line_count
After a transmission session, the missing lines can be known using the line count, sequence number, and check-sums. Check-sum is merely the sum of all ascii character ordinal values. The algorithm is roughly:
save(requestFileInfo());
save(requestFileContent());
while (lineList = determineBadLines()) {
for lineID = each in lineList {
statusMessage("retrieving line " . lineID);
requestResendLine(lineID);
}
}
statusMessage("done.");
You may also need a file info check-sum(s) to make sure the file list is good. You can try it without a database, but it's hard to get "random access" to the problem lines without it.
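A rough Python sketch of the determineBadLines step, using the same sum-of-ordinals checksum described above. The data shapes (a list of expected checksums in sequence order, and a dict of the lines that actually arrived) are my own assumptions for illustration.

```python
def line_checksum(text: str) -> int:
    """Checksum as described: the sum of all character ordinal values."""
    return sum(ord(ch) for ch in text)


def determine_bad_lines(expected_sums, received):
    """Return the sequence numbers that must be re-requested.

    expected_sums: checksums from the sender, indexed by line sequence.
    received: dict mapping line sequence -> text that actually arrived.
    """
    bad = []
    for seq, want in enumerate(expected_sums):
        text = received.get(seq)
        if text is None or line_checksum(text) != want:
            bad.append(seq)  # missing or damaged line
    return bad


sent = ["id,name", "1,alice", "2,bob"]
sums = [line_checksum(line) for line in sent]
arrived = {0: "id,name", 2: "2,bpb"}  # line 1 lost, line 2 damaged
print(determine_bad_lines(sums, arrived))
```

One caveat: a plain sum of ordinals cannot see two swapped characters, so a stronger per-line hash would be a safer choice if the link really mangles data.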
Table-ized A.I.
Try using FedEx
Give you reliable connectionless (UDP) or connected (TCP) options respectively.
you had me at #!
If you're stuck on windows, have no control of the link, need to guarantee delivery, need encryption options, are stuck in specific delivery timeframes, and need it scriptable with triggering etc. there's quite a few commercial options.
The 2 I've used the most are the following:
Aspera FASP - http://www.asperasoft.com/
Sterling Connect:Direct http://www.sterlingcommerce.com/products/managed-file-transfer/connect-direct/
Connect:Direct is geared more towards highly distributed delivery channels, where you've got lots of ingress/egress points to different locations (think multi site, multi customer distribution) so it could be a bit overkill, but Aspera is great at utilizing a link to its peak (it'll udp flood a pipe and guarantee delivery), in fact if you don't watch what you're doing you can pretty much storm everyone else off the link, but your data will get there >;)
01:36AM up 426 days, 2:46, 1 user, load average: 0.14, 0.11, 0.05
I know I'm late on this, but I haven't seen it mentioned yet. Doubletake is fantastic, and I've yet to have it fail me when moving stuff from point a to b on Windows systems. You didn't mention cost as a factor in the summary but I should note that it is most certainly not free, but if you have to make sure it gets there it may be worth it.
It would explain a difference in file size.
First off, I second the recommendation for rsync. This is what it's for. It's also easy to do over an SSH tunnel or similar.
As far as your size discrepancy goes, it's possible that such a small variation could be accounted for by different block sizes on the source and destination volumes. The data may in fact be identical, but the two different volumes may need to allocate a slightly different amount of disc space to store it.
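The block-size effect is simple arithmetic. A minimal sketch, assuming the filesystem allocates whole clusters and nothing else (real filesystems may also store tiny files inline or compress them, which this ignores); the function name is made up.

```python
def size_on_disk(data_bytes: int, cluster_bytes: int) -> int:
    """Space allocated for a file when storage is handed out in
    whole clusters: round the data size up to a cluster multiple."""
    clusters = -(-data_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


# The same 10,000-byte file on two volumes with different cluster sizes.
print(size_on_disk(10_000, 4096))  # 12288
print(size_on_disk(10_000, 512))   # 10240
```

So two byte-identical copies can show "size on disk" figures a couple of kilobytes apart, while the plain "size" figure should still match exactly.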
rsync does hash checks as part of its normal operation, but if you really wanted to be methodical, you could have a script rsync to three different directories and then compare hashes on all copies. This may or may not be overkill, depending on your mission.
It sounds like there's a good bit of work to be done.
The first would be to address the problems with the internal network. There's no excuse to have problems on a LAN. Upgrade away from hubs and consumer grade "switches". This is an easy and affordable fix. There are plenty of old Cisco switches available on eBay for very little money.
I upgraded an office with 6 Cisco Catalyst 2924-M-XL-EN switches and 6 WS-X2924-XL-V 100baseFX fiber modules, which cost $300 from eBay. I used fiber to connect all the suites, which cost about $150. That $450 got rid of all the problems that the entire staff had been complaining about for years.
I'd also recommend having a look at the radio link. Why isn't it working properly? Is it a line of sight problem (a tree grew in the way, maybe?), a signal or interference problem that could be resolved with a better antenna or reconfiguration?
For a reliable protocol, you can use rsync and rsyncd. I use this between Windows clients and Linux backup server. rsync provides a very reliable protocol, ensuring the data was received correctly or retrying. FTP isn't a good protocol for mission critical data.
Depending on the link, what's on each end, etc., you might consider putting a Linux machine at each site and running a PPP over SSH link. It's easy, free, and very reliable. :) It doesn't take reinventing the wheel, nor playing with VPN servers and clients. rsync over this connection will work very well, and due to compression and encryption by both rsync and ssh, you'll find that it ends up being faster. That may seem counter-intuitive, but I've found it to be true in reality.
For a while, I was on an unreliable home connection, which apparently did a good bit of traffic shaping and some port blocking (ahh, gotta love residential providers). I frequently saw packet loss. After setting up a PPP over SSH connection, I routed every machine on my home LAN through it, and all of our traffic left through one of my datacenters. Despite adding this extra route and hops, all Internet traffic from home was faster. Because of the magic of SSH, it actually got rid of all my packet loss. :) It was still there, but SSH retried to keep the tunnel working with no losses. Since the provider couldn't see anything that was going over the line other than encrypted data, they couldn't do any sort of traffic shaping. I used a non-standard port number, so they couldn't even be sure of what the encrypted data was. I had already planned on changing ports, if they should start slowing down my traffic, but they never did. It was unidentifiable, so they left it alone. For several things, I enjoyed 10x the normal speed, just because of the PPP over SSH tunnel. Doing the same transfers via HTTP or FTP over the same route (but not over the SSH connection) resulted in very slow speeds.
Serious? Seriousness is well above my pay grade.
Economy, eschmonomy.
Seeing this problem and some of the hilariously incorrect answers assures me that, regardless of this downturn, I'd have job prospects if I were out of work.
Thanks ignorance!
Take a note from the newsgroup file sharers. If it's good enough to share GB+ data on an old school discussion forum, it's good enough for your business.
1) Split the file into tiny rars, 1-10MB in size.
2) PAR2 the files with a decent amount of redundancy.
3) Send. Resume any broken files
4) Fix.
If you really wanted to get fancy, run a news server on each end of the line. Upload files locally to the newsgroup and then grab them from the other end. The pirate community has a ton of tools for making this as painless as possible. (I heart hellanzb)
It's built in and it works. You can get something that's theoretically more efficient with a lot of work, but this is quick, cheap, and simple. Best of all, it actually works.
The client and server each run a checksum against what's on either end after a "successful" transfer. If the xcrc fails, delete and re-send. It's really that easy. If efficiency is an issue, just enable resuming on the client end.
What a simple question. Your problem is solved with about 20 minutes of work/setup/testing and a budget equal to whatever you get paid for 20 minutes worth of work.
-- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
I worked on a project that had similar deployment requirements, and we found that using Microsoft Message Queuing (MSMQ) as the transport mechanism took care of all these issues.
MSMQ itself only provides the transport mechanism, and there's no front end interface to send files- you'd have to code something up. However, it's the best "guaranteed delivery" system that I've seen on the Microsoft platform. Persistent across reboots, security controlled, FIFO queuing, very robust.
You may not be looking to code something up, but if you are, have a look at that.
This same problem happened to us when I was doing "service". Let's just say that something had a computer with an Access database, and the mdb had to be sent over a microwave connection to something else (that also had a computer) which took mdb files from many somethings and did some other things. Well, anyway, at first they were using FTP, but I got to script a solution that zipped the file into several small ones (50MB each), then sent them over SCP, made a checksum on each file and then rebuilt the mdb file.
I never trust windows copy for large file transfers from disc to disc, on the same machine. One error and BAM! You're left with an inconsistent file system copy. The only RELIABLE way to resolve this is to break out RSYNC and have it finish the copy. But then one wonders, why not start with rsync in the first place. Unlike windows copy, you can have it ignore & LOG errors. Then when the large batch is done, fix the errors and repeat the same command, and RSYNC won't copy the already-copied information.
It is a total joy, aside from the command line!
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
Use MD5 or some other relatively strong checksum (CRC-32) too, just to be _absolutely_ sure that you didn't get any errors. It's a paranoia indulgence though if you are using TCP and rsync.
Absolute statements are never true
Maybe think outside the box for a second and really look at what you are asking...
You plan to transfer _mission critical files_ over an unreliable link. Maybe you should provide a _mission critical link_? You know, a leased line with a backup? If you can't justify the cost, it probably isn't mission critical.
The right tools for the job people... Jez!
GPLv2: I want my rights, I want my phone call! DRM: What use is a phone call, if you are unable to speak?
If you have access to the file at the remote end I would run it through QuickPar (http://www.quickpar.org.uk/). At that point you can split the file into a group of smaller files (less chance of any one file being mangled), plus add checksum files to the set. That way, if one of the files is mangled in transport, it's 1. identifiable and 2. automatically fixed by reassembling the par files on your end of the connection.
>
> they've been using FTP to upload the files, but many times the copied files are a few kilobytes smaller than the originals
>
This sounds like operator error, probably transferring binary files in ASCII mode. TCP/IP already takes care of the reliable delivery.
... this isn't a file copying problem, but rather a verification problem. If a file sometimes winds a few KB short could it be that someone or some process at the destination grabbed the file before it finished copying? If that is what's happening then any file copying protocol that copies the file to a temp location and then links the file into place at completion would solve this problem.
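A short Python sketch of the "copy to a temp location, then link into place" idea. The function name and the `.part` suffix are made up for illustration. `os.replace` is documented as atomic on POSIX; on Windows it does overwrite an existing target, but treat the atomicity there as an assumption to check for your filesystem.

```python
import os
import tempfile


def write_then_publish(data, final_path):
    """Write data to a temp file next to final_path, then rename into place.

    A process watching the destination directory never sees a partial file
    under the final name: the name only appears once the content is complete.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial temp file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The consumer at the destination then only has to ignore `*.part` files, or simply never look for them.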
Are you just looking at the file size on the disk, or have you done a data comparison?
If you are just looking at the size as it appears in Windows by right-clicking and looking at the properties, you are woefully underqualified to do anything remotely like this and need training.
There is no reason Windows in and of itself can't do this with FTP.
What are you not telling us?
The Kruger Dunning explains most post on
On some level, there isn't much difference between an application and a protocol. In fact, if you ever take a networking theory course, you'll see that each protocol layer in the network stack is, in fact a "protocol machine" (i.e., an application), which does the little protocol dance that makes functions at that layer happen.
But I digress. What the user is running into here is a fundamental problem with TCP over lossy networks. It really was not designed with really lossy networks in mind. E.g., the congestion control mechanism in TCP ("exponential backoff") makes the assumption that there is a wire sitting there and that certain parameters (like bandwidth) are not going to change. If you need certain QoS guarantees on a wireless link, TCP may be hard-pressed to deliver, because TCP's [limited] QoS mechanisms may make the problem worse. There is a HUGE amount of overhead on 802.11 networks to make sure that TCP doesn't suck.
I don't know how this person's microwave link is configured, but they might be better served by thinking about the QoS guarantees in the various layers in their network stack. I know a previous poster was joking when they said UDP might be a good option, but look, part of the problem on wireless is TCP's retransmission mechanism. With UDP it is up to the user/application to ask for a retransmit. Bittorrent works exactly like this, so something like Bittorrent, where each small file chunk gets its own hash, and those hashes are checked upon receipt, might not be a bad idea. I like rsync as well (because it has a rolling checksum feature), but again, you have TCP in the mix, and if I recall correctly, rsync will not retry automatically on failures, which is what you want.
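To make the per-chunk hashing idea concrete, here is a rough Python sketch. It is not BitTorrent's wire protocol, only the bookkeeping: the sender publishes one hash per fixed-size piece, and the receiver works out which pieces to ask for again. The function names and the choice of SHA-1 are assumptions for illustration.

```python
import hashlib


def piece_hashes(data, piece_len):
    """SHA-1 hex digest of each fixed-size piece of data."""
    return [
        hashlib.sha1(data[i:i + piece_len]).hexdigest()
        for i in range(0, len(data), piece_len)
    ]


def damaged_pieces(received, expected_hashes, piece_len):
    """Indices of pieces whose hash differs from the sender's list.

    Pieces missing from the end of a short file are reported as damaged too.
    """
    got = piece_hashes(received, piece_len)
    return [
        i for i, want in enumerate(expected_hashes)
        if i >= len(got) or got[i] != want
    ]
```

Only the pieces returned by `damaged_pieces` need to cross the link again, which is the practical win over re-sending a whole file after one bad block.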
If you need guaranteed, fault tolerant, one-time message delivery you could use IBM's Websphere MQ. It is expensive and bloated, but it gets the job done and will tolerate intermittent connectivity problems. It runs on every platform and there are client APIs for many languages.
Place an MQ server at each end, then have the client enqueue the message at one end and a listener dequeue at the other end. If the link or other host is down, the clients can still send messages which are stored and delivered when the comm. link comes back up. Transactions and rollback are supported, and large files are automatically segmented and reconstructed by the system.
You can configure it to send message receipts, prioritize messages, and report any failures to a dead letter queue.
There may be other brands of middleware that do this just as well for free. WMQ is just the one I'm most familiar with (I don't work for IBM).
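For anyone who hasn't used message queuing, the store-and-forward behaviour described above boils down to something like the toy below. This is not the WebSphere MQ API: the class and the `deliver` callback are invented for illustration, and it keeps messages in memory where a real queue manager persists them to disk and handles duplicates.

```python
from collections import deque


class StoreAndForwardQueue:
    """Toy sender-side queue: hold messages until the link accepts them."""

    def __init__(self, deliver):
        # deliver(msg) is a hypothetical callable that returns True once the
        # remote side has acknowledged the message, and False otherwise.
        self._deliver = deliver
        self._pending = deque()

    def enqueue(self, msg):
        """Always succeeds locally, whether or not the link is up."""
        self._pending.append(msg)

    def flush(self):
        """Send pending messages in FIFO order; stop at the first failure.

        Returns the number of messages delivered in this call.
        """
        sent = 0
        while self._pending:
            if not self._deliver(self._pending[0]):
                break  # link is down: keep the message for the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

A message is only removed after `deliver` reports success, so a dropped link costs a delay rather than data.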
FTP and TCP cannot "Drop" packets or bytes. You need to learn-up on TCP and FTP.
FTP _does_ replace DOS end-of-line sequences (carriage-return followed by new-line -- 2 bytes) with Unix end-of-line sequences (just new-line -- 1 byte). So your files may become shorter by as many bytes as they contain lines.
The solution is to tell FTP to not treat the file as text, but as binary image information in which new-line characters are treated with no special processing. Traditionally, FTP called this "file type I" and the command to set it is "bin" as in "binary":
C:\Documents and Settings\fred>ftp abc.net
Connected to abc.net.
220 ProFTPD 1.3.1 Server (ABC Global Enterprise Group) [10.13.131.34]
User (abc.net:(none)): freddy
331 Password required for freddy
Password:
230 User freddy logged in
ftp> bin
200 Type set to I
ftp>
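If you want to check whether ASCII mode explains your missing bytes, a quick Python sketch like this predicts the shrinkage. It assumes the translation is a plain CR LF to LF rewrite, as described above; the function names are made up.

```python
def ascii_mode_shrinkage(data):
    """Bytes lost if every CR LF pair in data is rewritten as a lone LF."""
    return data.count(b"\r\n")


def simulate_ascii_transfer(data):
    """What a CR LF -> LF translating transfer would leave on disk."""
    return data.replace(b"\r\n", b"\n")
```

Run `ascii_mode_shrinkage` over one of the original files: if the number matches the size difference you see at the far end, the link is innocent and the transfer mode is the culprit. Even a purely binary file contains the byte pair 0D 0A by chance now and then, so large binaries lose bytes too.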
Binary versus text mode?
1-to-the-01101
1101101
1001000
0101101!
Word!
Bow-ties are cool.
The transmission system is irrelevant. All that matters is that you know you have received whatever was sent.
Just make sure you send a checksum and that the received file matches.
oh wait... Windows scripting...
Deleted
With some sort of CRC checking going on...
Tsukasa: All I really want, is to be left alone...
Bittorrent?
Also, 2 mbit microwave wtf.
Run a cable for shit's sake.
Just get a big cannon/artillery gun/icbm and fire tapes of data, or flash drives even. Just stuff them into nerf balls. It'll work great, like a sneakernet but with explosions. :D
Use something like Netmotion, although it was designed for mobile wireless usage, primarily to maintain TCP session persistence while simultaneously providing an encrypted VPN, it works wonders at keeping a packet-loss-free connection over a lossy wireless network link too.
Interestingly, it uses a UDP stream through which it tunnels its encrypted protocol.
It's old but highly reliable and can be secured.
FTP rides over TCP so it isn't really possible to "lose" data. However the default FTP transfer mode for Windows is ASCII. This means that if you're transferring binary data, some of it might not make it through as expected.
Change the FTP transfer mode to binary for the transfers and you won't have a problem. The command is "bin" once you have the FTP client open (assuming you're in interactive mode).
because microwave apparently is just as prone to loss.
http://www.rfc-editor.org/rfc/rfc1149.txt
.' Windows 98 crashed I am the blue screen of death no one hears your screams... `.
Check out unison.
Directory synchronization over several protocols, brilliant include/exclude syntax, failure protection & rollback, rsync style 1-way or 2-way block synchronization and intelligent file change detection, Unix and Windows support, open source... It's what you need.
A government is a body of people notably ungoverned - AC
Uhh... Easy.... Teracopy
http://www.codesector.com/teracopy.php
-Josh
Look for a very old communications package from what used to be Communications Research Group, later was Blast, Inc - and somewhere in between I think was owned by U.S. Robotics.
Anyway, the software package was called "Blast", and it had a file transfer protocol that ran over modems and TCP/IP.
If the file transfer completed, the file was *guaranteed* to be correct on the recipient's side.
Robocopy is commonly used for this problem. Otherwise Cygwin and rsync.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
My sentiment exactly. It seems that you could use Torrents to share the files with more than one server for fail-over security.
Different allocation unit sizes between disks can cause a file size to appear different across two machines...
Set up a BSD lpd queue under Cygwin, something like:
sendit:lp=/spool/null:sd=/spool:if=/spool/sendit.sh:sf:sh:mx#0:
Have the sendit.sh script do whatever it is you want with the file. To send a file: lpr -Psendit filename
Configuration of the network queue left as an exercise for the student. (Hint - queue pathnames locally.)
If it's a Windows-Windows, Robocopy is what you want. It will do everything you need. It's free too and made for Windows.
Have you checked that the differences in files sizes aren't due to the differences in cluster size?
You can verify content by simply comparing hash values. If you need more assurance, then Sterling Managed File Transfer, or SFTP.
If you are talking DoD, DoE, or anything above a confidential level... for the love of God, call the NSA and request to use Assured File Transfer (AFT) or some other managed system. They should help you with this.
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
How about Microsoft Data Protection Manager?
It was part of Windows 2003 R2, and thus probably requires a server in each end.
I guess you're able to run it atop an SSL connection or encrypted in some other way.
If you're worried about encryption you might want to make an IPSEC tunnel between the two nodes, then you can use whatever you want inside that tunnel.
It's been moved to System Center - http://www.microsoft.com/systemcenter/dataprotectionmanager/en/us/default.aspx - which might require other licenses than for Windows 2003 R2.
Also, BitTorrent, as suggested elsewhere, would be a possibility. Only you need to ensure that you can reliably transfer the .torrent files first....
But if I were to choose I'd probably suggest using DFS replication since it's very simple - afaik it was improved in Windows 2003 R2 to be block based instead of file based, but I might be wrong on this....
Mod parent "-1 dull as dishwater and half as useful".
Infuriate left and right
Rename it hot_chick.avi and it'll be there in no time. And since it's Windows, you could just make it a payload of some malware and let it spread there naturally.
Do you mean a transfer protocol like kermit?
Here are the UDP-based file transfer systems that I prefer:
Kencast has been the leader in multicast IP satellite file transfer (where all kinds of weird things can happen in Ku band). Now they have a system called BlazeBand built for point-to-point IP connections that uses their FAZZT Forward Error Correction technology, validation algorithm, and missed packet collector algorithms. I've used FAZZT over satellite, but haven't tried BlazeBand yet.
Aspera is also widely used as a point-to-point UDP file transfer system in the entertainment industry. I've seen it used to move large video files for network television programming.
Most modern networks expect a highly reliable data link with congestion as the primary concern, and are tuned accordingly. You have a highly unreliable data link with little concern about congestion (assuming the microwave link rarely if ever has more than one simultaneous user). The problem is that when a packet is corrupted or lost, the transmitter assumes a crowded pipe and starts slowing down, when you really want it to retransmit as soon as possible. Since it's doing the exact opposite of what's needed to fix the problem, it can compound on itself out of control, eventually causing it to just give up the file transfer altogether. Therefore, you want a short retransmission timeout for TCP, a longer timeout for FTP (or whatever better application layer many people have recommended you try), and a smaller MTU so there is a lower probability of any one given packet getting corrupted or lost.
Of course, you can go too far the other way too, and the ideal settings are going to be highly dependent on your individual network.
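The MTU argument is easy to put numbers on. The Python sketch below assumes independent bit errors at a fixed rate, which is a simplification (radio links tend to lose bits in bursts), and the function name is made up.

```python
def packet_survival_probability(packet_bytes, bit_error_rate):
    """Chance that every bit of one packet arrives intact.

    Assumes independent bit errors, which real radio links only approximate.
    """
    return (1.0 - bit_error_rate) ** (8 * packet_bytes)
```

With an assumed bit error rate of 1e-5, a 1500-byte packet gets through intact roughly 89% of the time and a 576-byte packet roughly 95% of the time. Smaller packets also carry proportionally more header overhead, which is the "too far the other way" part.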
This space intentionally left blank.
Use WinRAR to compress and archive a group of files into one file, and turn on the recovery record so that if corruption occurs during transmission, the recovery record can repair it.
Then use ftp/etc to transmit your file across
SyncToy uses the Microsoft Sync Framework - we've been using it for database backups for months and *love* it. You just install, set up your source/dest folders through the UI, then run them. For doing so on a schedule, just add a scheduled task for SyncToyCmd.exe -R
http://en.wikipedia.org/wiki/SyncToy
http://www.microsoft.com/downloads/details.aspx?familyid=c26efa36-98e0-4ee9-a7c5-98d0592d8c52&displaylang=en
http://en.wikipedia.org/wiki/Microsoft_Sync_Framework
If you just wanted "copy with retries" you could go with something simpler like Robocopy, but I think you'll be quite happy with SyncToy!
http://en.wikipedia.org/wiki/Robocopy
Use any common P2P file-sharing system, preferably one that doesn't require a central server. Gnutella would work fine, I know people do this using Shareaza and Limewire, and probably any other of the clients will work. DC would work. I used WASTE for a while to do this. Probably some other P2P software would work, too. They almost all use hash checks to ensure accurate transmission. Set up a private file sharing network, using one folder on each end, and your files should get transferred 100% error-free between your computers. Should be fairly simple, just don't share C: :) The only downside I can see (outside of the IT people freaking out because you've got a file-sharing program on your computer) is that they're "pull" oriented, rather than "push", so the recipient has to retrieve the files. See also Friend-to-friend and Private P2P
Like others have said, find out why your network is breaking data in transmission and fixing that would be the more direct fix, but I'm guessing OP doesn't have control over that part.
"mission critical" can mean different levels of paranoia depending on the mission.
Perhaps this guy's mission is backing up his pr0n - and his current backup approach is almost good enough for that mission, though it's a bit frustrating.
Other missions might demand digging a concrete trench and laying your own fiber between the computers he's trying to back up (say, if your mission is so important it has to survive the collapse of the major phone companies).
In most real businesses, though, a failed copy of mission critical data often results in a dissatisfied customer and the resulting customer service damage control -- for which most of the advice on this slashdot thread (rsync, etc) seem appropriate levels of protection.
I use sshfs file mounts for all office document file sharing and such, not just one time transfers. SSH encryption security, with the ability to open and edit files over the network. No goofing around with samba or windows file sharing. Regardless, some sort of ssh or sftp at least.
Not sure about getting it to work on windows, but there should be some options.
Living in Chile
The poster didn't say anything about scripting but I have noticed that the pro version of teracopy has an original/destination CRC value listed next to each file copied. I don't actually have the pro version and you'd have to get creative with script host (or whatever) to script it but this could very well fulfill your requirements.
Also, XCOPY has a "verify" flag...better than nothing I suppose...
"UNIX is very simple, it just needs a genius to understand its simplicity." -Dennis Ritchie
You can use Metalinks for downloads without errors. It can use whole file checksums, or partial file checksums to repair errors.
I would think that you would want something more secure than FTP or windows copy; even rsync should be run across a secure connection.
I killed da wabbit -Elmer Fudd
Posted earlier AC...
http://www.blast.com/cgi-bin/blast.com/view_services.cgi?request=show_aisle_names&dept_id=1
As I stated earlier - guaranteed file transfers - if it says it's completed, the file is guaranteed to be there and be correct.
Who is general failure, and why is he reading my hard drive?
What you will want to do if you have a Windows machine on each end, is to secure each machine first. That would mean over-writing the Windows operating system with a Unix/Unix-like system (Linux, BSD, Solaris, Hack-Mac). Then you have (most likely) built-in SFTP servers and clients. That is what I would do. Someone would eventually hack into your Windows system. That is the way Windows is, so that MS or an MS-partner can sell you the antidote to your virus woes (even after you cure your woes, they will come up with another virus so as to sell you another cure and get your money). Get security right from the start instead of as an add-on. Go with Unix or a Unix clone.
This guy is dissin Microsoft. How dare he? FLAIMBAIT. RELGIOUS NUT! OUTDATED idiotic thinking! Throw chairs at him! Microsoft owns the WORLD. It is THE ONLY solution you should ever use. BAWHAWHAHAHAHAHAHAHAW!
Counter-Anti-Microsoft post paid for by Microsoft
a mission critical application under Windows... That in itself is the start of something going bad. TCP does guarantee transmission. The actual copy software will have to do some sort of MD5 check at both ends, in order to verify that content was copied properly. I would suggest you write your own copy script, any copy command would work. For government installation, I would suggest an encrypted copy process, scp, sftp, etc...
TOP DSLR Cameras Reviews of the top DSLRs
Look into Microsoft DFSR. If the servers are 2003 you just need to install the R2 update and configure it. Setup a source folder on your server and a target folder on the destination server. Excellent on a low bandwidth connection.
I'm sure the dodgy government network would love that....
RTFM is not a radio station.
Microsoft Message Queue is a guaranteed data delivery service for unreliable networks that can be configured to use encryption and authentication.
MQ-Series for guaranteed transactions.
CONNECT:Direct for guaranteed file transfers.
These are what telecoms, insurance companies and banks use.
You can play around with other solutions and validation/retry scripts if you like. At home, I do this. At work, I use things that work, every time and recover from failures.
The OP is confused.
He already has a reliable file transfer program (FTP) and he doesn't need anything new.
How did this question get past the gate to be published to the site? Doesn't anyone check the questions before they are put up?
The OP is either not switching to binary transfer or the source file changed during the copy.
Par2 while not a transmission protocol provides forward error correction and validation and should prevent the need to retransmit files when corrupted in transit.
http://www.ietf.org/rfc/rfc4838.txt
I used to work for this company (Kencast). Their specialty (back in the late 90's) was encoding a file with redundant data (user specified %) such that any portion of the file could be lost in transmission and the whole file could still be recovered as long as the % loss did not exceed the % redundancy. This was designed for true satellite broadcasting, with no return traffic.
http://www.kencast.com/
Example:
1 GB source file
Encode 10% additional data into the file
1.1 GB encoded file
Broadcast the encoded file
Receive the encoded file on the other end (possibly many recipients)
Decode the file
As long as no more than 10% of the packets were lost in transmission, each decoded copy of the file matches the original exactly.
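The general idea (send extra data so the receiver can rebuild what was lost without asking for it again) can be shown with the simplest possible erasure code, a single XOR parity block. This is not Kencast's algorithm and it is far weaker: it repairs exactly one lost block per group, where a real FEC scheme tolerates loss up to the configured redundancy. The function names are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(blocks):
    """Return the data blocks plus one XOR parity block."""
    return list(blocks) + [xor_blocks(blocks)]


def recover(encoded):
    """Rebuild the data blocks when at most one block is None (lost)."""
    missing = [i for i, b in enumerate(encoded) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    if missing:
        survivors = [b for b in encoded if b is not None]
        encoded = list(encoded)
        # XOR of all surviving blocks equals the one that is missing.
        encoded[missing[0]] = xor_blocks(survivors)
    return encoded[:-1]  # drop the parity block
```

With ten data blocks per group the overhead is 10%, as in the example above, but the protection is only "one block in eleven", which is why broadcast systems use much stronger codes.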
You raise questions on various levels, so let me traverse the stack in reverse order.
Link: just how reliable is it? I assume quality is unpredictable and varies (for instance during mobile and in-theatre deployment), which suggests you need to check for transmission errors in pretty small windows and force an error retransmit ASAP (if you have that capability on your specific type of link).
Protocol: I won't question using TCP/IP, but I would suggest you may want to ensure you tune the stack to small window sizes, and use UDP as that appears to match your transmission quality. MANET could help as well as that's made for mobile use, but I don't really know anything about it - it just may be an option worth checking.
FTP and TX security: I'm not sure of how sensitive this data is, but a microwave link does have stray signal issues, and FTP transmits UID and password more or less in cleartext. IMHO not quite desirable, but it depends what you do. In addition, FTP defaults in Windows to ASCII mode which makes a mess of data that is not of Windows origin or is binary. You MUST set to "binary" mode first before you start transmission, which others have already mentioned. In addition, do some tests with checksummed data so when you find differences you can work out if it's your own interpretation or a real problem.
Personally, I'd grab the PuTTY set and run a SSH session. You can find a server at FreeSSH. Also brutally easy to automate - I expect you're not that much in need of employment that you must generate your own opportunities to watch paint dry :-).
Good luck...
Insert
Crappy connection? Resumable transfers? Slow connections? Sounds like the good old BBS days!
Z-modem is your answer.
We've seen that too. We had a really slow connection to hosts in Singapore, and they were our only windows boxes. They set up cygwin and rsync, and we had crashes and incomplete files.
The answer in our case was to move those services to a real server and not the crufty old former desktops that they were.
I'd rather boot those machines with LiveCDs and mount the disks read-only before cygwin/rsync on windows. What a pain.
-B
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
I was once in a startup trying to deliver streaming video to tablets/laptops at sporting events via 802.11 wireless networks. Within 30-50 feet of an access point, it worked great, but any further than that and packet loss was too high to decode the video stream. Before our investors pulled the plug, we looked into technology from a company called Digital Fountain. They use a type of forward error correction called raptor codes to make data transmission highly resilient to packet loss on unreliable networks. It's a commercial product, but I assume the OP's employer can afford it if they're transmitting "mission critical" files for the government.
Am I the only one to question whether there is a real problem? Did I read it wrong (ADD so it's possible)
What if the file allocation size is just different on the two drives? Then the exact same file can, and WILL, be different sizes on disk, no harm, no foul, but maybe fowl in the way of the transmission route.
For instance, if you write a 6k file to a HDD w/ a 4096b allocation size, it's going to take up 8k(ish) worth of space, despite the file being 6k.
Now if you write that same file to a system w/ 512b allocation size, that 6k file will take about 6k.
And it can get smaller by 4-xkb in a situation where a newer system with a larger allocation size is sending a file to a (usually) older system w/ a smaller allocation size.
Then again, maybe I failed math. but I thought that's why there's "Size" then "Size on disk"
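The arithmetic, as a tiny Python sketch. It assumes plain whole-cluster allocation and ignores things like compression, sparse files, or very small files stored inside filesystem metadata; the function name is made up.

```python
def size_on_disk(file_size, cluster_size):
    """Space a file occupies when stored in whole clusters."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size
```

`size_on_disk(6 * 1024, 4096)` gives 8192 and `size_on_disk(6 * 1024, 512)` gives 6144, matching the 8k versus 6k example. The "Size" figure is the same in both cases, and that is the one to compare.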
How much is your data worth? Back it up now.
You want IBM's MQ software. Two MQ Servers to talk to one another.
It's a bit pricy, but guaranteed delivery.
You need a Disruption Tolerant Network (DTN) solution. There are no commercial DTNs out there that I know of, but the DAKNet people at MIT were working on something and a group at U of Waterloo has an implementation. http://www.firstmilesolutions.com/ http://blizzard.cs.uwaterloo.ca/tetherless/index.php/KioskNet
Now, let's assume that it's not a CR/LF problem but that instead for some unknown reason the ftp transfers get aborted and thus the file size mismatches.
Okay, first of all, if you want to guarantee that a file that departed from one system is the very same file after its arrival on another system it is not wise to use the file size for verification, as the two files could have the same length but different contents. Therefore typically md5sum is used. Or better yet, use both MD5 and SHA-1 hashes so nobody could probably ever produce meaningful collisions for both of them at the same time.
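Computing both hashes costs little extra, since the file only has to be read once. A minimal Python sketch, with a made-up function name:

```python
import hashlib


def md5_and_sha1(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for a file, reading it only once."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the same chunk to both hash objects.
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Run it on both ends and compare the two pairs. Two files of identical length but different contents produce different pairs, which is exactly the case a size check misses.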
Now, what programs should be used for the transmission itself? Well, that depends on your requirements: Is confidentiality important or is it really just about integrity and availability? Is speed or link saturation a topic? Like, if your current pipe is like 80% full, you probably cannot afford to encrypt your data. Otherwise, of course you should, except if, say, an IPS/IDS maintainer wants to be able to scan the contents. Let's take a look at both possibilities:
Basically suitable is every TCP data transfer application that does not by itself meddle with the data. So this kind of excludes FTP, as it can substitute LF (Unix) line breaks with CR/LF (Windows) ASCII text line breaks while transferring data from Unix to Windows and vice versa. But then again, you can use FTP just fine if used in binary mode. However, even the Swiss army knife of network transmissions can be easily used for the purpose of reliably transmitting files from A to B: netcat.
nc or nc.exe is available for both Windows and Unix and is often used in the forensics world in manual combination with MD5 and/or SHA-1 hashes to transmit forensic evidence from e.g. a suspect drive to the examiner's workstation. Here the chain of evidence would be maintained by recording a hash of the data on the suspect drive, recording a hash of the data on the examiner's workstation after arrival and recording the date, time and contents of the transmission. Note that it might be vital to have a log of what has been transferred when, so that you can prove you sent some data the other party claims never to have received. So, recapping, e.g. netcat, FTP, SMB/CIFS shares, HTTP and any other TCP based file transfer utility could be used. HTTP and FTP could even be easily scanned for viruses/malware during transit. UDP based file transfer utilities could be used as well, as long as the implementation takes care of the integrity. As most likely a short script would be used in order to generate logs containing MD5 and SHA-1 hashes on both sides, the time and date of the transfer and the filename, this script could just as easily handle data retransfers in the case of packet loss.
Sorry, this posting by now bores me. So, the recap:
Use SSH (SCP), cryptcat (used among others in forensics for the chain of evidence when confidentiality is an issue), HTTPS, SMIME or any other encrypted transfer tool, really. Hell, you could even generate an encrypted PGP file or whatever with a script and pipe it through whatever data transfer application you want. (Like ftp in binary mode ;) )
So, overall, what are needed here are two small scripts that do something like this:
On the sending side:
10 compute SHA-1 / MD5 hash of a file to be transferred (and optionally compress it)
20 send file
30 receive a SHA-1 / MD5 hash of the transferred file from the receiver
40 compare the hashes
50 complete transaction including logging the date, time, filename and hash, if hashes match
60 else goto 20
On the receiving side: 10 receive
Zmodem baby. Restartable transfers, auto-start by the sender, an expanded 32-bit CRC, and control character quoting. Just make sure you have a 16550 UART.
"Much better to only transfer ZIPs and check them at the other end if you only have control over the endpoints" - by samkass (174571) on Tuesday June 30, @01:45PM (#28531775) Homepage
I'd have to say that WinRar rar'd files MAY be a better option, because WinRAR offers 4 things that help 'guarantee' good files on delivery when you compress a file using it... they are:
1.) Create SOLID Archive
2.) Put authenticity information
3.) Put recovery record
4.) Lock Archive
(For options WinRAR offers for .RAR file, for the sake of file integrity, that you can put into place on files you send, along with MAXIMUM COMPRESSION (for smaller files to transmit/receive, & I generally find RAR files end up tinier than .zip files do, typically, even @ "max compression ratio" for both))...
Also, increasing the values of these IP settings here -> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters & the default settings for these:
A.) TcpMaxConnectResponseRetransmissions
B.) TcpMaxConnectRetransmissions
C.) TcpMaxDataRetransmissions
D.) TcpMaxRetransmissionAttempts
Can help as well! They're decently documented ALL OVER the place online, & going to "the horses' mouth" @ Microsoft is your best option, so you understand them, fully, so you can apply them for your case, specifically (I am not sure if ALL of them help, but some should)!
(That is, if you are experiencing "broken connects" & files coming back 'phunny' due to bad connections online)...
APK
P.S.=> Secure FTP (SFTP) or SSL might be of assist also... in fact, I'd recommend SFTP over SSL actually, because of what it is you are doing: Sending/receiving FILES, specifically here! OTHERWISE though? Hell of a GOOD post samkass... apk
You are looking for SkyPipe by AOS: http://www.aosusa.com/products_and_technologies/skypipe.html It works in exactly this scenario.
Bit Torrent protocol would be the simplest .
I'm not really a Windows guy, but I seem to remember server 2003 having a Distributed File System (DFS) that might work for this. I think 2k3R2 has an improved version that will only sync the changes to the files as well.
I work at a company with thousands of servers - Solaris, Windows, mainframe, even VMS. Connect:Direct runs on all of them. It costs, and it's not the easiest to script, but when you want the file to get there, and want to know exactly when it got there, it's the way to go.
(I have no financial interest or connection to Sterling Software.)
Look into NDM / Connect:Direct-- ( not the p2p, the software by sterling ). It's what banks and government agencies use to move files between systems. Auto restart, checkpoints during transfer, compression, etc....
I'd say DECnet.
See XTP: http://en.wikipedia.org/wiki/Xpress_Transport_Protocol Devices that use XTP to help transfer files over satellite links exist. For example: Mentat/Packeteer/Blucoat's SkyX product. (Disclaimer: I used to work for Packeteer, though not on the SkyX product.)
See subject-line above, per my last post!
(Upon my re-reading your original reply that I replied to? You were quite generic in terms of using ZIP, & did not note an OS or hardware platform, so, I am just letting you know that my reply is mostly based on a Windows "Point-Of-View" is all... as to be honest, I am not certain IF the .rar format in FULL function is on those other OS & hardware platforms, & I consider myself an "amateur/professional connoisseur (sp?)" of Win32 Softwares over time!)
Again though - good post on your part, samkass.
APK
Since before I started working here, they've been using FTP to upload the files, but many times the copied files are a few kilobytes smaller than the originals.
Are you sure you are using BINARY transfers? FTP allows "text" transfers which can transform the CR+LF pair into 1 byte (LF, I think). On large files, you could end up with several KB missing. This transformation actually depends on client and server, not OS (although it was created because of Windows - Unix conventions).
Regarding what others have said, I don't see how TCP checksumming could affect the file unless a specific attack is made. Even then, it's hard to block the original TCP packet, and altering it will make the receiver transmit a "retransmission request". (I said hard, not impossible)
over ssh tunnel, it is secure, etc...
over openvpn tunnel, it uses UDP
ftp is over complicated and prone to trouble.
Whatever happened to ZModem? It's an amazingly resilient protocol that was used to great effect to transfer files over virtually any phone line, no matter how noisy. It accomplished this by an adaptive algorithm, whereby the block transfer size was either doubled or halved after every few blocks, depending on whether the previous blocks got through the line ok. Where's the equivalent in the modern internet world?? There are millions of people with dodgy or unreliable internet connections -- they would benefit from clients / servers that implemented a modern equivalent.
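The doubling/halving idea is easy to sketch. This is not ZModem itself, just the adaptive rule described above, simplified to adjust after every block instead of every few blocks. The size limits and function names are assumptions for illustration.

```python
def next_block_size(size: int, ok: bool, lo: int = 64, hi: int = 8192) -> int:
    """Double the block size after a good block, halve it after a bad one."""
    if ok:
        return min(size * 2, hi)
    return max(size // 2, lo)


def simulate(outcomes, start: int = 1024):
    """Return the block size used for each transfer attempt."""
    sizes = []
    size = start
    for ok in outcomes:
        sizes.append(size)
        size = next_block_size(size, ok)
    return sizes


if __name__ == "__main__":
    # Two clean blocks, then the line gets noisy.
    print(simulate([True, True, False, False, False]))
```

On a clean line the blocks grow and the per-block overhead shrinks; on a noisy line they shrink so that each retransmission costs less.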
char*f="char*f=%c%s%c;main(){printf(f,34,f,34);}";main(){printf(f,34,f,34);}
http://www.samba.org/rsync/
Or something like it in Windows.
It will check the delivered files, retry chunks if needed.
Much better than just FTP
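For anyone wondering what "check the delivered files, retry chunks" boils down to, here is a rough sketch. It is not rsync's actual algorithm (rsync uses a rolling checksum plus a stronger hash); it only compares fixed-size chunk digests. The chunk size and function names are assumptions.

```python
import hashlib

CHUNK = 64 * 1024  # assumed chunk size, not an rsync default


def chunk_digests(data: bytes, chunk: int = CHUNK) -> list:
    """SHA-256 digest of each fixed-size chunk, in order."""
    return [
        hashlib.sha256(data[i:i + chunk]).hexdigest()
        for i in range(0, len(data), chunk)
    ]


def chunks_to_resend(original: bytes, remote_digests: list, chunk: int = CHUNK) -> list:
    """Indexes of chunks the far end is missing or holds in damaged form."""
    local = chunk_digests(original, chunk)
    return [
        i for i, digest in enumerate(local)
        if i >= len(remote_digests) or remote_digests[i] != digest
    ]
```

The receiver runs `chunk_digests` on what it has and sends the list back; the sender resends only the chunks that `chunks_to_resend` reports, so a truncated or damaged copy gets repaired without starting over.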
If it's truly mission critical (and if it is, it sounds like your mission is in real danger if you keep dropping bytes!), you could do worse than look at Connect:Direct from Sterling Software. It's the standard transmission software for bits of the core financial transaction world in the UK and with good reason.
Sure it's "only" a secure transmission and there's plenty of free alternatives, but this is one time when I would recommend paying out for the certainty you need... Others will no doubt disagree, but having used a variety of things for mission critical file transmission, C:D is a safe choice.
There is an entire class of software out there called "Message Queueing" systems.
MQ systems have guaranteed delivery of your message once and only once over any network conditions.
There are several big name vendors that supply this stuff.
They usually cost around $5K per copy (please check pricing on your own).
They run on virtually all operating systems and platforms.
They work great!
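The "once and only once" part usually comes from two pieces: the sender retries until it sees an acknowledgement, and the receiver remembers message IDs so a retry is never delivered twice. A toy in-memory sketch follows; it is not any vendor's API, and real MQ products also persist this state to disk.

```python
class Receiver:
    """Delivers each message ID at most once, but acknowledges every copy."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def handle(self, msg_id, body):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.delivered.append(body)
        return msg_id  # the acknowledgement


def send_until_acked(receiver, msg_id, body, ack_lost_on_attempts=(), max_attempts=5):
    """Resend until an ack arrives; returns the attempt number that succeeded."""
    for attempt in range(1, max_attempts + 1):
        receiver.handle(msg_id, body)
        if attempt in ack_lost_on_attempts:
            continue  # simulate the ack getting lost on the way back
        return attempt
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

Even when the first two acknowledgements are lost and the message is sent three times, the receiver's `delivered` list holds it exactly once.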
All the other suggestions you got are crap.
FTP? Please guys...give me a break.
Mike M
I've been doing things like this for a living for the last 9 years.
When you say Mission Critical and Corrupt Files in the same sentence, I just have to reply.
There are a lot of firewalls and FTP clients that actually trash FTP connections. It all depends on what you are using.
I would suggest that you try to find out why the files get trashed. But if the files really are mission critical like you say, you need some way to make _sure_ they are transferred. It's called Transaction Safety.
If you cannot figure out what's going wrong with your connections (why bytes are cut), then you need to change protocol.
Changing to an encrypted protocol might work, because if the data in the transmission gets trashed, the encryption would also be trashed and the protocol would cancel the operation. Add transaction safety to this and you would get resends when things blow up. This should ensure that your file gets through.
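A bare-bones sketch of what transaction safety means on the receiving side, assuming the sender supplies a SHA-256 digest alongside the file: nothing appears under the final name unless the digest matches, and a mismatch triggers a resend. Function names are hypothetical, and this is not how any particular product does it.

```python
import hashlib
import os
import tempfile


def commit_if_intact(payload: bytes, expected_sha256: str, dest_path: str) -> bool:
    """Put the file in place only when its digest matches; otherwise leave no trace."""
    if hashlib.sha256(payload).hexdigest() != expected_sha256:
        return False  # caller should ask for a resend
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        os.replace(tmp_path, dest_path)  # a single rename on the same volume
        return True
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def receive_with_retry(fetch, expected_sha256: str, dest_path: str, attempts: int = 3) -> bool:
    """Call fetch() until a copy verifies or the attempts run out."""
    for _ in range(attempts):
        if commit_if_intact(fetch(), expected_sha256, dest_path):
            return True
    return False
```

Here `fetch` stands in for whatever actually moves the bytes (FTP, SCP, a file copy). Whoever picks up the files at the destination then never sees a short copy, only a complete one or none at all.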
This feels like spam, and it is... but it fits so well that I actually have to write it down for you:
Our product transfers data from machine to machine over different protocols in a transaction-safe way.
It might be a bit overkill for what you are describing, but we generally talk about Mediation to describe what you are trying to solve.
We can schedule transfers over different protocols, with the ability to add business logic (translating data from one format to another on the fly).
Off the top of my head, I think any of the following would work for you: FTP, SFTP, SCP (we also talk to web services, X.25, databases, etc.)
If this sounds interesting, you can contact us (digitalroute.com) for sales material... but I do not have any :-) I'm just a developer here. /Kristoffer
How about a version of waste like http://waste.sourceforge.net/ or http://wasteagain.sourceforge.net/ ? It's encrypted and if the network connection drops, the transfer just restarts when the connection comes up again.
Fer christs sake!
WinSCP using either SCP or SFTP protocol. End of story. Can't believe this is being asked.
SFTP [...] SSL [...] FTP is a plain-text protocol
Wrong. FTP has a binary mode.
When we're talking in crypto-mode---as we are, made evident by the references to SFTP and SSL---the words "plain-text" refer not to ascii vs. base64 vs. bin2hex encoding, but whether the data is encrypted or not.
AFAIK, FTP doesn't have any provisions for encryption.
Your point is orthogonal to that, so your "Wrong" is wrong: your parent is not wrong. Your point is true, however: FTP does have issues with line endings etc.; rule of thumb: use binary for everything and recode newlines on the client side if need be.
Using SFTP over an already secure network will only slow things down greatly.
Sending data unencrypted through the air is not what I consider "secure". So, while true, this point is not particularly relevant (I think).
try the Windows command line FTP
I vote AYE on this! Everyone knows that BSD code is rock solid ;-)
TCP is so horrible. I wish HTTP used UDP by default so I wouldn't have the pro
Kidnapped by CJ too, I take it?
If only certain transfers from point to point are commonly failing then you probably have wiring issues. In either a hardware or medium case, you need to be fixing the network instead of finding workarounds.
Wiring issues? It's wireless. The summary stated that the physical layer is a microwave radio. Radio will always have noise. How many network admins have the domain knowledge to troubleshoot that?
Try opening a newlined file with notepad, for example.
Gedit on my Ubuntu laptop saves with LF newlines, and Windows Notepad can't read them because it expects CRLF newlines. But I add one to Notepad and all is well. In fact, pretty much every text editor but Windows Notepad can handle the differences between UNIX and Windows.
As for VMS, how much VMS is used in world-facing applications as of June 2009? Even HP, the owner of copyright in VMS after having bought Digital's parent company, uses HP-UX, Linux, or Windows Server on its popular public websites. Even HP's site about VMS was found to use HP-UX. Netcraft confirms it: VMS is dead.
And if FTP translates oddball operating systems' conventions for text/plain files, why doesn't it do so for image files (.ppm vs. .bmp), audio files (.au vs. .wav), or other MIME types?
Or 'WebSphere MQ' for you young'uns.
http://en.wikipedia.org/wiki/IBM_WebSphere_MQ
This transformation actually depends on client and server, not OS (although it was created because of Windows - Unix conventions).
OK, I guess technically FTP isn't *quite* older than Windows, but I really don't think there were many people using Windows 1.0 in 1985. :)
Here's some of the operating systems in use at the time:
OS/360, VM/CMS, etcetera: IBM operating systems stored text as 80 column card images, padded with spaces.
VMS and RSX-11 used a one or two byte record length, an optional binary line number, an optional Fortran carriage control character, followed by the text, with null padding added when a line would overlap a block boundary.
CP/M used carriage return or carriage-return line-feed, depending on the application, with a ^Z character to indicate end of file because files only had block counts, not byte lengths.
They'd have KILLED to only have to worry about CRLF. :)
I'm assuming you want something scriptable, but as a regular GUI replacement for Windows' file copy stuff, TeraCopy (http://www.codesector.com/teracopy.php) is quite nice. It's sort of like a GUI version of Robocopy.
FYI, the 32-bit version integrates perfectly with the Windows shell, but the 64-bit version's integration was a complete pile the last time I tried it (a while ago). It didn't work automatically after the installation, and even the manual integration instructions didn't get it working. TeraCopy is much less useful when you specifically have to open the app and select the source and destination to start the copy.
Damn, I picked the wrong RFC.
FTP dates back to 1980. That's older than MS-DOS.
TeraCopy is a great program for copying files inside Windows. It does error checking and supports resume if your connection drops.
You can download the home edition at http://www.terracopy.com/ to give it a try.
I had issues with copying about 300gb of backups around different servers until I installed that, works great.
We have similar needs, only we're exchanging files across dodgy Internet connections (e.g., satellite links to sites in the developing world). Our requirements include operation over low-bandwidth connections and the ability to suspend and resume transfers. We settled on Windows Live Sync, since it works on Mac OS X in addition to Windows, and because it required no additional software development effort on our part. Had Live Sync not been available, we would have developed our own wrapper around BITS. Because BITS is an extension to HTTP, it degrades gracefully into something interoperable with non-Windows clients. (BITS would also work over a private network, but that wasn't a feature we required.)
I'm proud of my Northern Tibetian Heritage
... is not the tool you use to remove the creases from your shirts.
IANAL but write like a drunk one.
What do you suggest? That he introduces optical fiber to the middle of Nowhereville?
Sometimes you have to deal with less than ideal situations for your job. You will certainly have to explain that there are constraints, and yes, sometimes you may have to say that something is not possible, but mission critical does not mean what you think it does....
IANAL but write like a drunk one.
Connect:Direct is what the banks use.
It is cross platform and relatively easy to implement.
IANAL but write like a drunk one.
Using modern encryption like SSH does guarantee that things *have to add up* since keeping what you start with a secret
When you first wrote this it went against what (little) I knew about encryption. I'm very weak on the math, but I know that some encryption algorithms use a rotor model, meaning that they're just a software implementation of the rotor encryption machines used during WW II (Enigma being the most famous.) So it just doesn't make sense that a transmission error would screw up the process.
I still can't say that no encryption algorithm will choke if there's a transmission error, but I now know for a fact that 3DES (the encryption SSH uses by default) won't. And yes, 3DES is a rotor algorithm.
I decided to get my hands dirty with the DES software on Linux. (3DES is just DES with bigger keys.) Took a text file, encrypted, changed a single bit, decrypted. That one-bit change turned 10 bytes into garbage! Rest of the file was fine.
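The experiment is easy to repeat. Python's standard library has no DES, so this sketch swaps in XTEA, another cipher with 64-bit blocks, run in CBC mode. That is an assumption on my part, not the setup used above, and the key and IV below are made up. Flipping one ciphertext bit garbles the 8-byte block it sits in and flips one bit in the following block, so at most 9 plaintext bytes change and the rest decrypts fine.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9
ROUNDS = 32


def _encrypt_block(block: bytes, key) -> bytes:
    v0, v1 = struct.unpack(">2I", block)
    total = 0
    for _ in range(ROUNDS):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
    return struct.pack(">2I", v0, v1)


def _decrypt_block(block: bytes, key) -> bytes:
    v0, v1 = struct.unpack(">2I", block)
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
        total = (total - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
    return struct.pack(">2I", v0, v1)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(plaintext: bytes, key, iv: bytes) -> bytes:
    if len(plaintext) % 8:
        raise ValueError("this sketch expects a multiple of 8 bytes (no padding)")
    out, prev = [], iv
    for i in range(0, len(plaintext), 8):
        prev = _encrypt_block(_xor(plaintext[i:i + 8], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(ciphertext: bytes, key, iv: bytes) -> bytes:
    out, prev = [], iv
    for i in range(0, len(ciphertext), 8):
        block = ciphertext[i:i + 8]
        out.append(_xor(_decrypt_block(block, key), prev))
        prev = block
    return b"".join(out)


def damaged_offsets(plaintext: bytes, key, iv: bytes, bit_index: int) -> list:
    """Flip one ciphertext bit and list the plaintext offsets that come back wrong."""
    ct = bytearray(cbc_encrypt(plaintext, key, iv))
    ct[bit_index // 8] ^= 1 << (bit_index % 8)
    recovered = cbc_decrypt(bytes(ct), key, iv)
    return [i for i, (a, b) in enumerate(zip(plaintext, recovered)) if a != b]


if __name__ == "__main__":
    key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # made-up key
    print(damaged_offsets(bytes(range(64)), key, bytes(8), bit_index=8 * 20))
```

So the cipher on its own does not choke on a flipped bit; it just hands back a small patch of garbage. As far as I know, SSH catches the damage with a separate integrity check (a MAC) on each packet, not with the encryption itself.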
SSH has an option to use Blowfish instead of 3DES. Don't understand that algorithm well enough to say how it would handle transmission errors, and don't have time to set up a test.
I recently worked on a government program that had some of the same requirements that you describe. Lockheed-Martin proposed that they create a custom protocol to do this at the cost of about a zillion dollars. Some research turned up SCTP (http://en.wikipedia.org/wiki/Stream_Control_Transmission_Protocol), which is now available in Windows. We were using an implementation for Solaris.
If you're just sending files to yourself, then you can probably choose FTP and still live a productive and meaningful life.
However, if you have users, I strongly recommend against using FTP.
I had to set up a FTP server so that some business partners could send us regular bulk updates.
Setting up the firewalls was a nightmare. Did you know that FTP uses 2 socket connections? One for command and another for data. I didn't. The ports used for the second data socket are arbitrary. I know almost nothing about TCP/IP, networks, etc. and had zero control over the network administration. Imagine trying to resolve firewall issues alongside other know-nothings.
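For anyone else who didn't know: the command connection normally goes to port 21, and in passive mode the server announces the data port in its reply to PASV as six numbers, the last two being the high and low byte of the port. A small parser makes the "arbitrary port" problem visible. The reply text and addresses in the example are made up.

```python
import re


def parse_pasv_reply(reply: str):
    """Extract (host, port) from a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a PASV reply: %r" % reply)
    numbers = [int(group) for group in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]  # high byte, low byte
    return host, port


if __name__ == "__main__":
    # Example reply with a hypothetical server address.
    print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,195,80)"))
```

The firewall has to let the client reach whatever port comes out of that reply, which is why FTP servers are usually configured with a fixed passive port range that the firewall admin can open.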
The FTP protocol has evolved over time, making configuration a nightmare. Being a legacy protocol, the client and server aren't smart enough to negotiate the protocol settings. No, you have to tell the user exactly how to configure their client. Some fun. Most of the users were domain experts, not even IT monkeys or devs. "It doesn't work" was about the extent of the feedback I'd get.
Eventually, I ended up surveying the FTP clients the customers were using, installing them all on my machine, figuring out how to get all those fuckers configured, documenting (with screen captures) all the settings, and hoping the users followed my instructions exactly. This process took months.
Don't even think about using FTP+SSH. Every client and server handles those things differently. If you use a self-signed certificate, the clients freak out with warnings, so then the users think they're being hacked. I used FileZilla, which wasn't horrible. But getting that fucker to use my certificate was a chore.
My recommendation is to use secure copy over SSH. Putty is a pretty simple, if butt ugly, client.
Dealing with a fixed-record length file structure is OS dependent
I can create fixed-record-length files in UNIX, Windows, Windows Mobile, or any other operating system that supports ANSI C. Writing a record looks something like the following (untested):
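The snippet itself is missing from this copy of the post. As a rough stand-in, here is the same idea sketched in Python rather than ANSI C: every record is packed to the same size, so record N always lives at byte offset N times the record size. The record layout (a 4-byte id, a 16-byte name, an 8-byte float) is invented for illustration.

```python
import io
import struct

# Hypothetical layout: little-endian 4-byte id, 16-byte name, 8-byte double.
RECORD = struct.Struct("<I16sd")


def write_record(f, index: int, rec_id: int, name: str, value: float) -> None:
    """Write one record at its fixed slot; short names are NUL-padded by struct."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(rec_id, name.encode("ascii"), value))


def read_record(f, index: int):
    """Read the record stored in slot `index`."""
    f.seek(index * RECORD.size)
    rec_id, raw_name, value = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\0").decode("ascii"), value


if __name__ == "__main__":
    f = io.BytesIO()  # stands in for a file opened in binary mode
    write_record(f, 0, 1, "alpha", 1.5)
    write_record(f, 1, 2, "beta", 2.5)
    print(RECORD.size, read_record(f, 1))
```

Nothing here depends on the operating system, as long as the file is opened in binary mode so nothing rewrites the bytes on the way to disk.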
But after reading this file I'm starting to see how deep the rabbit hole goes. A "stream" text file under VMS is a familiar '\n'-delimited list of strings, but a "non-stream" text file is a list of 16-bit-aligned Pascal strings. Likewise, classic Mac OS had the '\r'-delimited list of strings (TEXT files and TEXT resources) and the list of Pascal strings (STR# resources), but I never thought of STR# as a "text file" as much as a "list of distinct strings".
It costs money, but buy the file transfer edition. You get reliable and encrypted data transmission.
Simple :)
If you ever drop your keys into a river of molten lava, let'em go, because, man, they're gone.
$ cygrunsrv -I cron -p /usr/sbin/cron -a -D
$ net start cron
Or... install some linux distro and get the real thing
Kermit is your friend. You could also use zmodem with sliding windows :)