Writing Linux Kernel Functions In CUDA With KGPU
An anonymous reader writes "Until today, GPGPU computing was a userspace privilege because of NVIDIA's closed-source policy and AMD's semi-open state. KGPU is a workaround to enable Linux kernel functionality written in CUDA. Instead of figuring out GPU specs via reverse-engineering, it simply uses a userspace helper to do CUDA-related work for kernelspace requesters. A demo in its current source repository is a modified eCryptfs, which is an encrypted filesystem used by Ubuntu and other distributions. With the accelerated performance of a GPU AES cipher in the Linux kernel, eCryptfs can get a 3x uncached read speedup and near 4x write speedup on an Intel X25-M 80G SSD. However, both the GPU cipher-based eCryptfs and the CPU cipher-based one are changed to use ECB cipher mode for parallelism. A CTR, counter mode, cipher may be much more secure, although the real vanilla eCryptfs uses CBC mode. Anyway, GPU vendors should think about opening their drivers and computing libraries, or at least providing a mechanism to make it easy to do GPU computing inside an OS kernel, given the fact that GPUs are so widely deployed and the potential future of heterogeneous operating systems."
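The summary's point about choosing a mode "for parallelism" can be illustrated with a small sketch. This is not KGPU code: the keystream function below is a toy stand-in (HMAC-SHA256 truncated to 16 bytes) for an AES block encryption, and the names `keystream_block`, `ctr_sequential`, and `ctr_parallel` are hypothetical. It only shows that in counter mode each block depends on nothing but its own index, so blocks can be handed to independent workers, much as a GPU would assign one thread per block.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # bytes per block, matching the AES block size


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Toy stand-in for AES_encrypt(key, nonce || counter). NOT real AES.
    msg = nonce + counter.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK]


def ctr_block(key: bytes, nonce: bytes, index: int, chunk: bytes) -> bytes:
    # Each block needs only its own index, never a neighbouring block.
    ks = keystream_block(key, nonce, index)
    return bytes(a ^ b for a, b in zip(chunk, ks))


def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ctr_sequential(key: bytes, nonce: bytes, data: bytes) -> bytes:
    chunks = split_blocks(data)
    return b"".join(ctr_block(key, nonce, i, c) for i, c in enumerate(chunks))


def ctr_parallel(key: bytes, nonce: bytes, data: bytes, workers: int = 4) -> bytes:
    # Blocks are processed by a worker pool; map() returns results in input order.
    chunks = list(enumerate(split_blocks(data)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        out = pool.map(lambda ic: ctr_block(key, nonce, ic[0], ic[1]), chunks)
        return b"".join(out)
```

Because counter mode is an XOR with a keystream, calling `ctr_parallel` again on the ciphertext with the same key and nonce recovers the plaintext. CBC encryption, by contrast, needs the previous ciphertext block before it can start the next one, which is why it does not split up this way.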
Wonder how this compares in performance to AES-NI, because it sure as hell sounds a lot more complex and fragile.
..... through the summary??? Sorry, but I had to read it three times for it to sink in. As a geek myself, I find this just far too geeky! Sorry. (hands back geek card!)
Hand off encryption routines to a closed source black box. Brilliant.
(I have never written kernel level code, and the statement that follows is only from listening to what other people are doing)
I thought that a tiny bit of kernel code reflecting calls into a user level process was old news, and has become established as the preferred development model. Is there a reason that it's undesirable?
Because the summary makes it sound like we're sad to be following this model, and we're only doing it because we can't pull NVIDIA's driver source into the Linux kernel.
You are awash in a sea of fiercely stated opinions. Obvious exits are: 'File->Quit', 'Reply', and 'Page Down'.
GTFO!
This is what should be on slashdot, not stories about the latest iphone.
Until they open-source drivers, I refuse to buy them. Stuff like this is typically a nightmare to install and keep running anyway.
I came to read a discussion of writing kernel functions in CUDA and a discussion about the vagaries of encryption methods broke out.
I wonder if this would be any faster than an implementation that took advantage of the hardware AES on the newer Intel CPUs? Latency should be lower for the CPU-based version, as would memory bandwidth usage.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
It wasn't too geeky, but it was written as if by someone with ADD. Perhaps no surprise?
Poorly written maybe, but not that geeky. If you were that confused, I'm not certain you ever had a geek card to hand back.
The hardware that is so brilliantly made nobody is allowed to know how it works. And as a result it actually doesn't work. I say congratulations.
Proofs of concept are nice, but this will make more sense once the GPU is firmly planted in the CPU. The PCI bus can be a bottleneck in these types of situations. AMD Fusion is a great example of this idea.
HPC for Primates. Read Cluster Monkey
Wow, the fragility of an encrypted file system plus the instability of a GPU, implemented in the kernel. Do not even read TFA without doing a full backup of your system.
As someone who's doing a lot of the same work, this is pretty spectacular! I'm surprised they get > 100MB/sec in software - but I guess that's due to using ECB mode vs. CBC. I think the real I/O limit here is probably in the user/kernel mem copies - context switch weight can be optimized with good buffer alignments.
We did a lot of testing with CUDA under OpenSSL 3-4 years ago, and in the end it was better to just stick with software. The latencies are the real killers.
I said no... but I missed and it came out yes.
Completely off-topic, but I've been looking for a decent ssh client for my crapberry -- thanks!
Be relentless!
Is it a good idea for the protected kernel to rely on unprotected code for critical functions such as filesystem operations? I know that user-space code cannot directly interfere with the kernel, but it also doesn't have to do anything the kernel requests of it. Unless the kernel is designed to treat such user-space code as altogether untrustworthy, it seems to me a bad idea for the kernel to rely on user-space code in this manner.
"In prison you just have to shut your eyes and take it. Here you have to shut your eyes and give it."
I hope this is just a proof-of-concept design because ECB mode should not be used for this purpose. Wikipedia provides a pretty obvious example of the weakness of ECB mode:
"The disadvantage of this method is that identical plaintext blocks are encrypted into identical ciphertext blocks; thus, it does not hide data patterns well. In some senses, it doesn't provide serious message confidentiality, and it is not recommended for use in cryptographic protocols at all. A striking example of the degree to which ECB can leave plaintext data patterns in the ciphertext is shown below; a pixel-map version of the image on the left was encrypted with ECB mode to create the center image, versus a non-ECB mode for the right image."
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Initialization_vector_.28IV.29
it doesn't obscure patterns in your input data. Please take a look at the Tux images here: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Electronic_codebook_.28ECB.29 (it may be faster, but it doesn't f---ing work.)
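The pattern leak described above is easy to reproduce. The sketch below uses a toy 4-round Feistel cipher built on SHA-256 as a stand-in for AES (an assumption made so the example needs only the standard library; it is not secure and is not what eCryptfs or KGPU use). All function names are hypothetical. It encrypts eight identical plaintext blocks under ECB and under counter mode and counts the distinct ciphertext blocks.

```python
import hashlib

BLOCK = 16  # bytes per block, matching the AES block size
HALF = BLOCK // 2


def _round(key: bytes, r: int, half: bytes) -> bytes:
    # Round function of the toy Feistel network (illustration only).
    return hashlib.sha256(key + bytes([r]) + half).digest()[:HALF]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # 4-round Feistel network: an invertible keyed permutation on 16-byte blocks.
    left, right = block[:HALF], block[HALF:]
    for r in range(4):
        mixed = bytes(a ^ b for a, b in zip(left, _round(key, r, right)))
        left, right = right, mixed
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # ECB: every block is encrypted the same way, independent of position.
    assert len(data) % BLOCK == 0
    return b"".join(toy_encrypt_block(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def ctr_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # CTR: XOR each block with the encryption of (8-byte nonce || block index).
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter_block = nonce + (i // BLOCK).to_bytes(8, "big")
        ks = toy_encrypt_block(key, counter_block)
        out += bytes(a ^ b for a, b in zip(data[i:i + BLOCK], ks))
    return bytes(out)


def distinct_blocks(data: bytes) -> int:
    return len({data[i:i + BLOCK] for i in range(0, len(data), BLOCK)})


plaintext = b"A" * BLOCK * 8            # eight identical blocks
key, nonce = b"demo-key", b"8bytenon"   # nonce must be 8 bytes here
print(distinct_blocks(ecb_encrypt(key, plaintext)))          # 1
print(distinct_blocks(ctr_encrypt(key, nonce, plaintext)))   # 8
```

Under ECB all eight ciphertext blocks are identical, so an observer learns that the plaintext repeats. Under counter mode every block is different, while each block can still be computed independently of the others.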
They should go with OpenCL; then there would be a chance that at some point one could use it with free drivers (and other hardware). But I guess that's the price you pay for a graduate fellowship from NVIDIA.
Imagine a GPU-accelerated MySQL database...
GPU-accelerated routers, GPU acceleration of anti-virus software.
The use of GPUs to accelerate search engines.
Sure... but the summary is still badly written. Read TFA, and it makes a lot more sense for us illiterate folks.
Instead, one should use OpenCL. It's Platform Agnostic for a reason, but don't let Linux's chance to be hypocritical step in the way.
In former times, people made sure you knew they used Slackware, then LFS, then Gentoo, now Ubuntu.
Distributions are like a penis and religion...
Anyway, get off my lawn.
4x speedup is nothing. Using the GPU correctly should bring much higher speedups.
That kind of gain could simply be obtained by optimizing the CPU code.
There are plenty of architecture-specific vector instruction sets on the CPU that the kernel could be taking advantage of instead; for example, SSE and AltiVec for x86 and PPC respectively.
For the last ~8 years I've needed extremely fast encryption (and compression) in the project I use. A few years ago when CUDA began to gain traction, I got all excited and actually decided to see what was necessary to make it work and see how fast it was.
Well, at the time, I discovered that CUDA-enabled encryption is quite fast. The problem is that copying the data segment to the GPU, doing the encryption and then copying the result back is painful. The copies and setup/interrupt/etc. add so much latency that it runs at roughly the same speed as just doing the operation on the CPU. Adding a couple of user/kernel space crossings probably makes the problem even worse. So during this timeframe we used dedicated compression/encryption boards for the customers that needed it fast, and everyone else just got a couple of extra CPUs dedicated to the effort. Now, with AES-NI, dedicated boards generally aren't necessary. Sure, you have to buy a machine specifically with AES-NI right now, but I suspect that with all these instruction set extensions, within a couple of years it will be widespread.
To patch the kernel to support such an ugly hack would be quite stupid. AES is already fairly respectable (~100MB/sec or so per CPU), and anyone that needs it faster could use Blowfish, or find a CPU with AES-NI.
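The latency argument above can be put into a rough cost model. This is a back-of-the-envelope sketch, not a measurement: it assumes a fixed per-request overhead (setup, interrupts, user/kernel crossings), one copy to the GPU and one copy back over the bus, and constant throughputs. The function names are hypothetical, and apart from the ~100MB/sec CPU figure quoted above, every number in the usage lines is a made-up placeholder.

```python
def gpu_time(nbytes, overhead_s, bus_bytes_per_s, gpu_bytes_per_s):
    # Fixed overhead + copy in + copy out + the actual GPU work.
    return overhead_s + 2 * nbytes / bus_bytes_per_s + nbytes / gpu_bytes_per_s


def cpu_time(nbytes, cpu_bytes_per_s):
    # The CPU path has no transfer cost in this simplified model.
    return nbytes / cpu_bytes_per_s


def break_even_bytes(overhead_s, bus_bytes_per_s, gpu_bytes_per_s, cpu_bytes_per_s):
    # Request size at which the GPU path starts to win, or None if it never does.
    per_byte_gain = (1 / cpu_bytes_per_s
                     - 2 / bus_bytes_per_s
                     - 1 / gpu_bytes_per_s)
    if per_byte_gain <= 0:
        return None
    return overhead_s / per_byte_gain


# Placeholder numbers for illustration only (not measured):
OVERHEAD = 100e-6   # 100 microseconds of fixed cost per request
BUS = 4e9           # bus throughput in bytes/sec
GPU = 2e9           # GPU cipher throughput in bytes/sec
CPU = 100e6         # ~100MB/sec software AES, the figure quoted above

print(break_even_bytes(OVERHEAD, BUS, GPU, CPU))
print(gpu_time(4096, OVERHEAD, BUS, GPU) > cpu_time(4096, CPU))
```

In this model, small requests such as a single 4KB page are dominated by the fixed overhead and lose to the CPU, while large batched requests amortize it. If the CPU throughput is raised far enough (as AES-NI would do), `break_even_bytes` returns `None` and the offload never pays off.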
Excellent, glad it helps! Look for some updates coming in the fairly near future...
"Anyway, GPU vendors should think about opening their drivers and computing libraries, or at least providing a mechanism to make it easy to do GPU computing inside an OS kernel"
Actually they shouldn't. There's always debate about this kind of thing, but in my humble opinion, adding large and complex systems to the kernel that don't have to be there is not a good thing. For this, a cryptfs userspace crypto shim is a clean solution; it would allow for adding arbitrary new crypto systems too. Regarding "the chicken and the egg": if you have an encrypted root filesystem, a lot of distros already build an initramfs -- basically a preloaded RAM disk. This is how all the SATA and SCSI drivers can be built as modules while the right ones are still loaded before the system tries to mount your hard disk. So any extra cryptfs stuff can be handled there.