Slashdot Mirror


Multi-Threaded SSH/SCP

neo writes "Chris Rapier has presented a paper describing how to dramatically increase the speed of SCP transfers. It appears that because SCP relies on a single thread in SSH, the crypto can sometimes be the bottleneck instead of the wire speed. Their new implementation (HPN-SSH) takes advantage of multi-thread-capable systems, dramatically increasing the speed of securely copying files. They are currently looking for potential users with very high bandwidth to test the upper limits of the system."

23 of 228 comments

  1. Re:Alternative solution for a trusted LAN by upside · · Score: 4, Informative

    You can also use a cheaper cipher. From the ssh manpage:

    -c blowfish|3des|des
                              Selects the cipher to use for encrypting the session. 3des is
                              used by default. It is believed to be secure. 3des (triple-des)
                              is an encrypt-decrypt-encrypt triple with three different keys.
                              blowfish is a fast block cipher, it appears very secure and is
                              much faster than 3des. des is only supported in the ssh client
                              for interoperability with legacy protocol 1 implementations that
                              do not support the 3des cipher. Its use is strongly discouraged
                              due to cryptographic weaknesses.

    --
    I'm sorry if I haven't offended anyone
  2. Re:To *have* such problems... by dm(Hannu) · · Score: 5, Informative

    They claim that the first bottleneck is actually flow control of buffers, which prevents utilizing full network bandwidth in normal gigabit connections. The threads will help only after this first bottleneck has been cleared. They have patches to fix both problems. The Slashdot summary was therefore a bit inaccurate, and reading TFA certainly helps.

  3. Re:Alternative solution for a trusted LAN by Anonymous Coward · · Score: 1, Informative

    Or just compile from source and enable the 'none' "cipher".

    I surely missed having that option when copying files between hosts on my LAN. I don't need to hide data from myself. If someone else connects and encrypting data is a concern, I'll simply not use the 'none' "cipher".

  4. Pretty much totally incorrect summary by Eunuchswear · · Score: 5, Informative
    Almost all the improvements they talk about come from optimising TCP buffer usage. The summary of the fixes:

    HPN-13 A la Carte
    • Dynamic Windows and None Cipher
      This is a basis of the HPN-SSH patch set. It provides dynamic window in SSH and the ability to switch to a NONE cipher post authentication. Based on the HPN12 v20 patch.
    • Threaded CTR cipher mode
      This patch adds threading to the CTR block mode for AES and other supported ciphers. This may allow SSH to make use of multiple cores/cpus during transfers and significantly increase throughput. This patch should be considered experimental at this time.
    • Peak Throughput
      This patch modifies the progress bar to display the 1 second throughput average. On completion of the transfer it will display the peak throughput through the life of the connection.
    • Server Logging
      This patch adds additional logging to the SSHD server including encryption used...
    So the main part of the patch set is "It provides dynamic window in SSH and the ability to switch to a NONE cipher post authentication" and the only part that has to do with threading is marked "This patch should be considered experimental at this time".

    By the way, does anybody else think "the ability to switch to a NONE cipher post authentication" is pretty dodgy?
    --
    Watch this Heartland Institute video
    1. Re:Pretty much totally incorrect summary by Arimus · · Score: 2, Informative

      By the way, does anybody else think "the ability to switch to a NONE cipher post authentication" is pretty dodgy?


      Not really. For some of the stuff I do via SSH, e.g. logging into my webhost to untar a patch and apply it, the only part of the transaction I want to be secure is my initial password/key exchange. Post-authentication, I really don't give a stuff who sees me type

      cd ~/www
      tar xvfz ~/patch.tar.gz
      or any of the other commands I type in. However, it should be down to the admin of the system in the first place to decide whether to allow the NONE downgrade (either system-wide or on a per-user/session basis) and then down to me as a user to decide whether to take advantage.

      --
      --- Users are like bacteria -> Each one causing a thousand tiny crises until the host finally gives up and dies.
    2. Re:Pretty much totally incorrect summary by rapier1 · · Score: 2, Informative

      Where the performance boost comes from is going to depend a lot on the characteristics of the network path. If it's a high bandwidth-delay product (BDP) path then the majority of the performance increase may very well come from the dynamic window sizing (this is application-layer windowing, by the way). However, if the path has a low BDP and you are CPU bound then either the NONE cipher switch or the multi-threading may provide more of a performance increase than the window sizing. Alternatively, in some high-BDP paths the windowing patch may speed up the transfer enough that it becomes CPU bound, at which point the threading and/or NONE cipher switch will allow the user to make full use of the network capacity. One of our test paths is a transatlantic GigE pipe. With the window patch and full encryption we were able to get around 300Mb/s and then we were CPU bound. When we used either the NONE cipher switch or the multi-threaded AES-CTR mode cipher our throughput increased to 700Mb/s.
      As for the dodgy aspect of the NONE cipher switching, I'll be the first to admit that it's not a perfect solution. The authentication remains fully encrypted and you can't use the NONE switch in an interactive session, which obviates some of the problems. However, it is still, in some ways, counter to the idea of SSH, which is why we came up with the threaded cipher. If you are willing to accept the NONE cipher then you can use that, but if you want full encryption then you can use the threaded AES-CTR mode cipher.
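
      To put rough numbers on the bandwidth-delay product argument, here is a small Python sketch. The link speed, round-trip time, and 64 KiB window are made-up illustrative values, not figures from the HPN-SSH tests, and the model ignores loss and congestion control.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_bits_per_s * rtt_s / 8


def window_limited_throughput(window_bytes: float, rtt_s: float) -> float:
    """Upper bound in bits/s when at most one window can be in flight per round trip."""
    return window_bytes * 8 / rtt_s


# Hypothetical path: 1 Gb/s link with a 100 ms round-trip time.
link_bits_per_s = 1e9
rtt_s = 0.100

print(f"BDP: {bdp_bytes(link_bits_per_s, rtt_s) / 1e6:.1f} MB")

# Hypothetical fixed 64 KiB application-layer window on that same path.
cap = window_limited_throughput(64 * 1024, rtt_s)
print(f"A 64 KiB window caps throughput at about {cap / 1e6:.2f} Mb/s")
```

      Under these assumptions the window, not the CPU or the wire, sets the ceiling, which is the case where dynamic window sizing would be expected to matter most.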

    3. Re:Pretty much totally incorrect summary by rapier1 · · Score: 4, Informative

      As a note: while the NONE cipher switch turns off data encryption, we do *not* disable the message authentication code (MAC). So each data packet is still signed and authenticated. If any in-transit modification of a packet is detected, the connection is immediately dropped.
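
      The idea of integrity without confidentiality can be sketched with Python's standard hmac module. This is only an illustration of the principle: the key, the payload, and the choice of HMAC-SHA256 are assumptions, and the real SSH packet format (which also covers a sequence number) is not modeled here.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    """Compute a MAC tag over a cleartext payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; False means the payload was altered."""
    return hmac.compare_digest(sign(key, payload), tag)


key = b"hypothetical-session-mac-key"
packet = b"cleartext file data"
tag = sign(key, packet)

print(verify(key, packet, tag))                  # True: packet arrived untouched
print(verify(key, b"cleartext file dat4", tag))  # False: modified in transit
```

      An eavesdropper can read the payload, but cannot change it without the receiver noticing, as long as the MAC key stays secret.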

  5. Re:Alternative solution for a trusted LAN by KiloByte · · Score: 5, Informative

    Actually, it appears that (at least on Debian) AES is already the default. Selecting 3des gives a tremendous slowdown; blowfish is somewhat slower than AES.

    Copying 100MB of data over 100Mbit Ethernet to a P2 350MHz box (the slowest I've got) gives:
    * 3des 1.9MB/s
    * AES 4.8MB/s
    * blowfish 4.4MB/s

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  6. Re:Must be why rsync over ssh is much faster by AnyoneEB · · Score: 3, Informative
    Unless your servers are running rsh, rsync is probably going to get routed through ssh, in which case it gets encrypted just like scp. ref:

    Secure Shell - The security-conscious among you will like this, and you should all be using it. The stream from rsync is passed through the ssh protocol to encrypt your session instead of rsh, which is also an option (and required if you don't use ssh - enable it in your /etc/inet.d and restart your inet daemon if you disabled it for security).
    --
    Centralization breaks the internet.
  7. Re:Alternative solution for a trusted LAN by Diomidis+Spinellis · · Score: 2, Informative
    Nc is useful, but it still involves the overhead of copying the data through it (once at the client and once at the server). Nowadays, in most settings this overhead can be ignored. But, given the fact that a well-behaved application will work with a socket exactly as well as with a pipe or a file descriptor, I thought it would be very elegant to be able to connect two instances of (say) tar through a socket. Hence the implementation of socketpipe. Socketpipe sets up the plumbing and then just waits for the programs to finish.

    This is the setup using nc:

    tar --pipe--> nc --socket--> nc --pipe--> tar

    and this is the setup that socketpipe arranges:

    tar --socket--> tar
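
    The "socket works as well as a pipe" point can be tried out with a rough Python sketch. This is not how socketpipe itself is implemented; it uses a local socket pair rather than a network connection, assumes a POSIX system, and the helper name run_through_socket is made up for illustration.

```python
import socket
import subprocess
import sys


def run_through_socket(producer_code: str, consumer_code: str) -> str:
    """Run two child Python processes whose stdout/stdin are the two ends of a socket pair."""
    left, right = socket.socketpair()
    producer = subprocess.Popen(
        [sys.executable, "-c", producer_code], stdout=left.fileno()
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", consumer_code],
        stdin=right.fileno(),
        stdout=subprocess.PIPE,
    )
    # The parent only sets up the plumbing; closing its copies lets the
    # consumer see end-of-file once the producer exits.
    left.close()
    right.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode()


print(run_through_socket(
    "print('hello from producer')",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
))
```

    No data passes through the parent process: once both children are started it simply waits, which is the same division of labour described above.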
  8. Re:Hardware acceleration by neumayr · · Score: 2, Informative

    There were crypto acceleration cards, but I think the market was fairly small. They made sense for sites with lots of https traffic, but nowadays general-purpose CPUs are blazingly fast compared to back then.
    So I guess they disappeared.

    --
    Truth arises more readily from error than from confusion. -Francis Bacon
  9. Re:To *have* such problems... by egede · · Score: 5, Informative

    The limit on scp transfer rates is often the round-trip time spent waiting for acknowledgement of received packets. This is a serious issue for transfers from Europe to the US West Coast (around 200 ms) or to Australia (around 400 ms). Having several parallel TCP streams can solve this problem and has been in use for many years for transfer of data in High Energy Physics. An example of such a solution is GridFTP http://www.globus.org/toolkit/docs/4.0/data/gridftp/.
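
    As a back-of-the-envelope sketch of why parallel streams help, the following Python snippet estimates how many window-limited streams it would take to fill a link at the round-trip times mentioned above. The 1 Gb/s link and 1 MiB per-stream window are assumed values chosen for illustration, and the model is idealized (no packet loss, no congestion control effects).

```python
import math


def per_stream_throughput(window_bytes: float, rtt_s: float) -> float:
    """Bits/s one stream can carry if it sends one window per round trip."""
    return window_bytes * 8 / rtt_s


def streams_needed(link_bits_per_s: float, window_bytes: float, rtt_s: float) -> int:
    """Number of parallel window-limited streams required to fill the link."""
    return math.ceil(link_bits_per_s / per_stream_throughput(window_bytes, rtt_s))


# Assumed values: 1 Gb/s link, 1 MiB window per stream.
for rtt_ms in (200, 400):
    n = streams_needed(1e9, 1024 * 1024, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms: about {n} parallel streams to fill the link")
```

    Doubling the round-trip time doubles the number of streams needed, which is why long-haul transfers are the ones that benefit most from this approach.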

  10. Re:Sweet! by Per+Wigren · · Score: 3, Informative

    Use NX instead of plain old remote DISPLAY or ssh's X11 forwarding or even VNC! It's silly fast! You get a perfectly usable desktop even on slow, high latency connections. The free edition is free as in GPL.

    --
    My other account has a 3-digit UID.
  11. Re:Alternative solution for a trusted LAN by Neil+Watson · · Score: 2, Informative

    Actually, it depends upon the SSH protocol. Both Debian and Cygwin have this to say:

            -c cipher_spec
                              Selects the cipher specification for encrypting the session.

                              Protocol version 1 allows specification of a single cipher. The
                              supported values are "3des", "blowfish", and
                              "des". 3des (triple-des) is an encrypt-decrypt-encrypt
                              triple with three different keys. It is believed to be secure.
                              blowfish is a fast block cipher; it appears very secure and is
                              much faster than 3des. des is only supported in the ssh client
                              for interoperability with legacy protocol 1 implementations that
                              do not support the 3des cipher. Its use is strongly
                              discouraged due to cryptographic weaknesses. The default
                              is "3des".

                              For protocol version 2, cipher_spec is a comma-separated list of
                              ciphers listed in order of preference. The supported ciphers are:
                              3des-cbc, aes128-cbc, aes192-cbc, aes256-cbc, aes128-ctr,
                              aes192-ctr, aes256-ctr, arcfour128, arcfour256, arcfour,
                              blowfish-cbc, and cast128-cbc. The default is:

                                          aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour128,
                                          arcfour256,arcfour,aes192-cbc,aes256-cbc,aes128-ctr,
                                          aes192-ctr,aes256-ctr

  12. Re:Must be why rsync over ssh is much faster by rapier1 · · Score: 3, Informative

    As a note, the changes are actually to SSH itself and not just SCP. So any application that uses SSH as a transport mechanism can see a performance boost. This isn't to say *every* user will. This is mainly geared towards high bandwidth delay product networks (greater than 1MB) or GigE LANs.

  13. Re:Why not loopback? by Mr+Z · · Score: 2, Informative

    BDP is the bandwidth-delay product. BDP is one of the main things these patches address. Loopback has very, very little delay. You could, I suppose, add artificial delay over loopback, but now you're diverging further from the actual deployment scenario.

    The other thing is that when sender and receiver are the same host, you don't engage the full network stack (no ethernet queuing, for example, no dropped packets, etc. etc.), so you don't find out all the curve balls that TCP/IP will throw you.

    And yet another thing is that sender and receiver will compete for the same CPUs, and so whatever upper CPU bound you have with separate sender and receiver, you'll be at roughly half that (assuming send and receive are balanced) when both are on the same machine.

    --Joe
  14. Re:Must be why rsync over ssh is much faster by gazbo · · Score: 2, Informative

    SSH can, of course, be configured to compress automatically.

  15. Re:To *have* such problems... by rapier1 · · Score: 2, Informative

    > if the CPU is the bottleneck, how could adding more threads possibly help?

    This is actually a great question. On single-core systems it's very unlikely that the multi-threading aspect of our patch will be of much use to you. The stock version of SSH is, because of its design, unable to use more than one core regardless of how many cores you actually happen to have. This means that you could have one core that's pegged by SSH and other cores that are essentially running idle (if you look at the presentation, we go into that after we address the window issues). What we've done is allow SSH to offload the heavy work (the encryption) onto other cores in order to make full use of the CPU resources available.

  16. Re:Must be why rsync over ssh is much faster by ComputerSlicer23 · · Score: 2, Informative

    tar cfpz - . | ssh user@host '( cd /destination ; tar xfpvz - )'

    I'd use a "." instead of *; it avoids shell line length problems and will also copy hidden files... as someone who has learned this the hard way. Also, in my experience, on anything faster than 10MB, don't bother with compression (it's really a CPU to network speed ratio; on transfers I did regularly that was the rule of thumb with P4 2.2GHz Xeons). Also, I removed the "v" from the source tar, as it prints every file name twice and can be hard to read. I can't remember if ssh or tar had better compression; I know I tested both. It really just changed the tipping point of the CPU speed. I also used to use blowfish for the cipher as it was easier on the CPU if you were running out of CPU instead of network. On a Gigabit network, I always ran out of CPU first.

    I normally use -C instead of a subshell, but that's merely a matter of taste. I also use the technique in reverse quite often so I can untar on the destination machine as root.

    Kirby
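
    The "." versus "*" difference is easy to check without a remote host. The Python sketch below builds an in-memory archive both ways with the standard tarfile module; glob follows the same rule as the shell in skipping dotfiles for "*". The file names and the helper archive_names are invented for the demonstration.

```python
import glob
import io
import os
import tarfile
import tempfile


def archive_names(src_dir: str, use_dot: bool) -> list[str]:
    """Return the sorted member names of an in-memory tar.gz built from src_dir."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        if use_dot:
            # Roughly: tar cfz - .   (archives the directory itself, recursively)
            tar.add(src_dir, arcname=".")
        else:
            # Roughly: tar cfz - *   (the glob never matches names starting with ".")
            for path in glob.glob(os.path.join(src_dir, "*")):
                tar.add(path, arcname=os.path.basename(path))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
        return sorted(member.name for member in tar.getmembers())


with tempfile.TemporaryDirectory() as src:
    for name in ("visible.txt", ".hidden"):
        with open(os.path.join(src, name), "w") as handle:
            handle.write("data\n")
    print("with *:", archive_names(src, use_dot=False))
    print("with .:", archive_names(src, use_dot=True))
```

    Only the "." variant picks up the hidden file, which matches the advice above.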

  17. Some comments from one of the authors by rapier1 · · Score: 5, Informative
    First off, thank you for taking the time to read down this far. There have been some very interesting and useful comments so far. Second, I need to point out that both Ben Bennett of PSC and Michael Stevens of CMU were instrumental in getting this patch written. Without them there would be no HPN-SSH patch. I also highly suggest that interested people go to http://www.psc.edu/networking/projects/hpn-ssh and read about what we've done. There is a lot of good material in the papers and presentations section as well as the FAQ.

    A couple of notes about the multi-threading: The main goal was to allow SSH to make use of multiple processing cores. The stock OpenSSH is, by design, limited to using one core. As such, a user can encounter situations where they have more network capacity and more compute capacity but are unable to exploit them. The goal of this patch was to allow users to make full use of the resources available to them. The upshot of this is that it's best suited for high performance network and compute environments (the HPN in HPN-SSH stands for High Performance Networking). This doesn't mean it won't be useful to home users, only that they might not see the dramatic performance gains someone in a higher capacity environment might see. It's really going to depend on the specifics of their environment.
    Based on our research we decided the most effective way to do this would be to make the AES-CTR mode cipher multi-threaded. The CTR mode is well suited to threading because there is no inter-block dependency and, even better, the resulting cipher stream is indistinguishable from a single-threaded CTR mode cipher stream. As a result, we retain full compatibility with other implementations of SSH: you don't need to have HPN-SSH on both sides of the connection. Of course, you won't see the same improvements unless you do.
    We still see this as somewhat experimental because we've not yet implemented a way to allow users to choose between single-threaded and multi-threaded AES-CTR mode. As such, users on single-core machines may see a decrease in performance if they use AES-CTR. We suggest those users just make use of the AES-CBC mode instead (which is the default anyway). Also, you need to be able to support POSIX threads.
    Future work will involve pipelining the MAC routine and that should provide us with another 30% or so improvement in throughput.

    Also, it's important to keep in mind that these improvements are *not* just for SCP but for SSH as a whole. People using HPN-SSH as a transport mechanism for rsync, tunnels, pipes, and so forth may also see considerable performance improvements. Additionally, the windowing patches don't necessarily require HPN-SSH to be installed on both ends of the connection. As long as the patch is installed on the receiving side (the data sink) you may (assuming you were previously window limited) see a performance gain.

    We welcome any comments, suggestions, ideas, or problem reports you might have regarding the HPN-SSH patch. Go to the website mentioned above and use the email address there to get in touch with us. This is a work in progress and we are doing what we can to enable line-rate, easy-to-use, fully encrypted communications. We've a lot more to do but I hope what we've done so far is of use and value to the community.
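
    The claim that CTR mode has no inter-block dependency, so a threaded implementation produces the same stream as a single-threaded one, can be illustrated with a toy Python sketch. This is not the HPN-SSH code and it is not a secure cipher: SHA-256 over key and counter stands in for the AES block operation (Python's standard library has no AES), and all names here are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 32  # bytes per keystream block (SHA-256 digest size)


def keystream_block(key: bytes, counter: int) -> bytes:
    """Stand-in for AES(key, counter): depends only on the key and the counter."""
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()


def xor(data: bytes, stream: bytes) -> bytes:
    """XOR data with the keystream; zip() trims the stream to the data length."""
    return bytes(a ^ b for a, b in zip(data, stream))


def ctr_sequential(key: bytes, data: bytes) -> bytes:
    nblocks = -(-len(data) // BLOCK)  # ceiling division
    stream = b"".join(keystream_block(key, i) for i in range(nblocks))
    return xor(data, stream)


def ctr_threaded(key: bytes, data: bytes, workers: int = 4) -> bytes:
    nblocks = -(-len(data) // BLOCK)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each block is computed independently; map() yields results in
        # counter order no matter which thread finishes first.
        stream = b"".join(pool.map(lambda i: keystream_block(key, i), range(nblocks)))
    return xor(data, stream)


key = b"hypothetical key"
message = b"x" * 1000

# Same bytes on the wire either way, so the peer cannot tell the difference.
assert ctr_threaded(key, message) == ctr_sequential(key, message)
# Applying the keystream twice recovers the plaintext.
assert ctr_sequential(key, ctr_threaded(key, message)) == message
```

    The sketch only demonstrates the independence property; in CPython the threads would not be expected to give a real speedup for work this small. A chained mode such as CBC cannot be split this way for encryption, because each block needs the previous ciphertext block as input.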

  18. Re:A likely story by dodobh · · Score: 2, Informative

    This one perhaps? : Threads.pdf

    --
    I can throw myself at the ground, and miss.
  19. Re:FUNNY?! That's not funny, try for TRUE by empaler · · Score: 2, Informative

    No worries, she'll be right.

  20. Re:Must be why rsync over ssh is much faster by timeOday · · Score: 2, Informative

    tar cpzf - * | ssh user@host " cd /destination ; tar xpzf - "
    You don't need a quoted pair of commands, just use tar's -C option

    ssh user@host.com tar -C /remote/path -cpzf - remotefile1 remotefile2 | tar -C /local/path -xvzpf -