Remote Direct Memory Access Over IP

Posted by timothy on Sunday April 27, 2003 @06:22AM from the ping-to-break-your-train-of-thought dept.

doormat writes "Accessing another computer's memory over the internet? It might not be that far off. Sounds like a great tool for clustering, especially considering that the new motherboards have gigabit ethernet and a link directly to the northbridge/MCH."

Remote shared memory by sql*kitten · 2003-04-27 06:29 · Score: 4, Informative

This feature has been available for a while now, but using a dedicated link rather than IP. Sun call it Remote Shared Memory and it's mainly used for database clusters.
1. Re:Remote shared memory by sql*kitten · 2003-04-27 06:55 · Score: 5, Informative
  
  Doesn't MOSIX already do this? Or does MOSIX just migrate a single process from one host to another without 'sharing' memory?
  
  I'm not familiar with MOSIX, but Oracle uses RSM on the theory that the high-speed RSM link is always faster than accessing the physical disk. So if you have 2 nodes sharing a single disk array, and Oracle on one node knows that it needs a particular block (it can know this because in Oracle you can calculate the physical location of a block from rowid as an offset from the start of the datafile - that's how indexes work) then the first thing it will do is ask the other node if it has it. This is called "cache fusion". If it has, then it is retrieved. Previous versions of Oracle had to do a "block ping" - notify the other node that it wanted the block, the block would then be flushed to disk, and the first node would load it. This guaranteed consistency, but was slow. With RSM, the algorithms that manage the block buffer cache can be applied across the cluster, which is very fast and efficient.
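
The lookup order described above (local buffer cache, then the other node's cache over the fast link, then the shared disk) can be illustrated with a minimal sketch. This is not Oracle's implementation; the `Node` class, its `peers` list, and the dictionary standing in for the disk array are all hypothetical names invented for illustration.

```python
class Node:
    """Hypothetical cluster node with a local block buffer cache."""

    def __init__(self, name, disk):
        self.name = name
        self.cache = {}    # block_id -> block data held in this node's memory
        self.disk = disk   # stand-in for the shared disk array (a plain dict)
        self.peers = []    # other nodes reachable over the fast link

    def get_block(self, block_id):
        """Return (data, source), trying the cheapest source first."""
        # 1. Local memory.
        if block_id in self.cache:
            return self.cache[block_id], "local"
        # 2. Ask the other nodes before touching the disk.
        for peer in self.peers:
            if block_id in peer.cache:
                self.cache[block_id] = peer.cache[block_id]
                return self.cache[block_id], "peer"
        # 3. Fall back to the shared disk.
        data = self.disk[block_id]
        self.cache[block_id] = data
        return data, "disk"


disk = {1: b"row data", 2: b"more rows"}
a, b = Node("a", disk), Node("b", disk)
a.peers, b.peers = [b], [a]
print(b.get_block(1)[1])  # first access comes from disk
print(a.get_block(1)[1])  # node a gets it from node b's memory
```

The sketch ignores consistency (locking, dirty blocks), which is the hard part that the real mechanism has to solve.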
  
  Speaking of process migration, there is a feature of Oracle called TAF, Transparent Application Failover. Say you are doing a big select, retrieving millions of rows, connected to one node of a cluster, and that machine fails in the middle of the query. Your connection will be redirected to a surviving node, and your statement will resume from where it left off. I'm unaware of an open-source database that can do either of these.
2. Re:Remote shared memory by mindstrm · 2003-04-27 07:21 · Score: 2, Informative
  
  No, mosix just migrates the user context of a task to the remote machine.
  
  There is some primitive distributed shared memory support in OpenMOSIX; no idea how stable it is, though. Normal openmosix/mosix won't migrate tasks requiring shared memory (i.e., threads).
not necessary for 90% of distributed computing by eenglish_ca · 2003-04-27 06:32 · Score: 2, Informative

Sharing memory is not necessary in distributed programming if the variables are kept mostly local and each computer works mainly with what is stored in its own memory. This applies well to render farms, where the acceleration scheme itself suits distributed rendering: a method such as the grid subdivides the scene into cells, each of which can be stored and evaluated on a single computer using its local memory. Only a central computer is needed to control these nodes and store the output, which is of very limited size and needs little computation.
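
As a rough sketch of that idea: cells are handed out to nodes, each node evaluates only the cells it holds, and only the small per-cell results travel back to the controller. The round-robin policy, the `samples` field, and the function names are assumptions made up for illustration, and the nodes are simulated in a single process.

```python
from collections import defaultdict


def assign_cells(cells, num_nodes):
    """Round-robin assignment of grid cells to nodes (hypothetical policy)."""
    assignment = defaultdict(list)
    for i, cell in enumerate(cells):
        assignment[i % num_nodes].append(cell)
    return dict(assignment)


def evaluate_cell(cell):
    """Stand-in for per-cell work; it reads only data stored in the cell."""
    return sum(cell["samples"])


def render(cells, num_nodes):
    """Controller: distribute cells, then collect one small result per cell."""
    results = {}
    for node, node_cells in assign_cells(cells, num_nodes).items():
        # On a real farm this loop body would run on machine `node`,
        # using nothing but that machine's local memory.
        for cell in node_cells:
            results[cell["id"]] = evaluate_cell(cell)
    return results


cells = [{"id": i, "samples": [i, i + 1]} for i in range(4)]
print(render(cells, num_nodes=2))
```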

--
Checking out my form of escapism.
Infiniband has excellent support for this by rdorsch · 2003-04-27 06:50 · Score: 4, Informative

Servers will very soon be equipped with Infiniband (http://www.infinibandta.org/). Infiniband has dedicated support for RDMA, including efficient key mechanisms that minimize operating system involvement (which would otherwise cost a context switch each time) and keep latency low. The bandwidth available right now is 2.5 Gbit/s, and higher bandwidth can be anticipated very soon.
Re:Intel's VI Architecture by jonsmirl · 2003-04-27 06:51 · Score: 3, Informative

Here is an article that explains VI vs RDMA, etc.
RDMA article
FreeBSD's firewire already can do this by imp · 2003-04-27 06:55 · Score: 4, Informative

FreeBSD already supports gdb over firewire using
the firewire bridge ability to DMA to/from any
location of memory. Very handy for remote kernel
debugging.
No one read the article or looked at the spec. by nerdwarrior · 2003-04-27 06:55 · Score: 5, Informative

This technology is not what the headline claims.
First, what the headline would have you believe has been invented is making the RAM of one machine appear as though it were the RAM of another machine. That technology has been in use in the clustered/distributed/parallel computing communities since at least the 1980s.
If you look at a brief summary of the spec, http://www.rdmaconsortium.org/home/PressReleaseOct 30.pdf, you'll find that all that's happening is that more of the network stack's functionality has been pushed into the NIC. This prevents the CPU from hammering both memory and the bus as it copies data between buffers for various layers of the networking stack.
I'll also note that the networking code in the Linux kernel was extensively redesigned to do minimal (and usually no) copying between layers, which leaves very little advantage to pushing this into hardware.
Please, folks, don't drink and submit!
NUMA by TheRealRamone · 2003-04-27 06:55 · Score: 5, Informative

This article defines NUMA as
"an acronym for Non-Uniform Memory Access. As its name implies, it describes a class of multiprocessors where the memory latency to different sections of memory are visible to the programmer or operating system, and the placement of pages are controlled by software. This is in contrast to shared memory systems where the memory latency is uniform or appears to be uniform. ...may be further subdivided into subtypes. For example, local/remote and local/global/remote architectures. Local/remote machines have two types of memory: local (fast) and remote (slow). Local/global/remote machines add one more type of memory, global, which is between the local and remote memories in speed."
which seems to cover all of this.
Re:Bah, old stuff by C32 · 2003-04-27 07:08 · Score: 2, Informative

whoops, foot-in-mouth alert! -_-, ms = milli, not micro.. sorry.
That doesn't invalidate my point about networking latency though...
Re:Also by Rufus211 · 2003-04-27 07:11 · Score: 2, Informative

Erm, read the FAQ. As a previous person said, why would network access to DMA be any worse than local DMA? I mean you could open it straight up and have no memory checks or anything (*cough* win98 *cough*), but why on earth would you do that? Here's what their FAQ says:

Some Objections to RDMA
Security concerns about opening memory on the network
- Hardware enforces application buffer boundaries
  Makes it no worse than existing security problem with a 3rd party inserting data into the TCP data stream
- Buffer ID for one connection must not be usable by another connection
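
The two checks in that FAQ excerpt (buffer boundaries, and a buffer ID that is only valid on the connection it was registered for) can be sketched in software as below. Real adapters enforce this in hardware; the `SketchNic` class and its methods are hypothetical and do not correspond to any actual RDMA API.

```python
class SketchNic:
    """Toy model of a NIC that guards registered application buffers."""

    def __init__(self):
        self._buffers = {}   # buf_id -> (owning connection, memory)
        self._next_id = 1

    def register(self, conn_id, size):
        """Expose `size` bytes to one connection and return the buffer ID."""
        buf_id = self._next_id
        self._next_id += 1
        self._buffers[buf_id] = (conn_id, bytearray(size))
        return buf_id

    def rdma_write(self, conn_id, buf_id, offset, data):
        """Place `data` into a registered buffer if both checks pass."""
        entry = self._buffers.get(buf_id)
        # Check 1: the buffer ID must belong to this connection.
        if entry is None or entry[0] != conn_id:
            raise PermissionError("buffer ID not valid for this connection")
        _, mem = entry
        # Check 2: the write must stay inside the registered buffer.
        if offset < 0 or offset + len(data) > len(mem):
            raise ValueError("write outside registered buffer")
        mem[offset:offset + len(data)] = data

    def read_local(self, buf_id):
        """Local view of the buffer, for inspecting the result."""
        return bytes(self._buffers[buf_id][1])


nic = SketchNic()
buf = nic.register("conn-1", 8)
nic.rdma_write("conn-1", buf, 2, b"hi")
print(nic.read_local(buf))
```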
plan9's had this since it started by DrSkwid · 2003-04-27 07:28 · Score: 4, Informative

The proc device serves a two-level directory structure. The first level contains numbered directories corresponding to pids of live processes; each such directory contains a set of files representing the corresponding process.

The mem file contains the current memory image of the process. A read or write at offset o, which must be a valid virtual address, accesses bytes from address o up to the end of the memory segment containing o. Kernel virtual memory, including the kernel stack for the process and saved user registers (whose addresses are machine-dependent), can be accessed through mem. Writes are permitted only while the process is in the Stopped state and only to user addresses or registers.

The read-only proc file contains the kernel per-process structure. Its main use is to recover the kernel stack and program counter for kernel debugging.

The files regs, fpregs, and kregs hold representations of the user-level registers, floating-point registers, and kernel registers in machine-dependent form. The kregs file is read-only.

The read-only fd file lists the open file descriptors of the process. The first line of the file is its current directory; subsequent lines list, one per line, the open files, giving the decimal file descriptor number; whether the file is open for read (r), write, (w), or both (rw); the type, device number, and qid of the file; its I/O unit (the amount of data that may be transferred on the file as a contiguous piece; see iounit(2)), its I/O offset; and its name at the time it was opened.

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Re:XFree86 and DRI by soccerisgod · 2003-04-27 07:53 · Score: 2, Informative

That would be possible, but do you think it's faster than GLX? Both go over a network first... and I guess the security implications of such an attempt are more serious than the ones present in GLX implementations....

--
If a train station is a place where a train stops, what's a workstation?
not pushing 10 Gb...yet by soldack · 2003-04-27 11:15 · Score: 2, Informative

I have seen Dell 2650s hit over 800 megabytes per second (6.4 Gbit/s) running MPI over InfiniBand using large buffer sizes. The limit is pretty much the PCI-X 133 MHz interface we are on. I suspect that with PCI-X DDR and PCI Express, we will be able to get a lot closer to 10 Gbit/s.
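
For readers who want to check the arithmetic, here is a small sketch of the unit conversion and of how the observed rate compares with the bus. It assumes a 64-bit wide PCI-X bus, decimal units (1 Gbit = 10^9 bits), and a theoretical peak with no protocol overhead; none of those figures come from the post itself.

```python
def mbytes_to_gbits(mbytes_per_s):
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mbytes_per_s * 8 / 1000


observed_gbits = mbytes_to_gbits(800)

# Assumption: PCI-X moves 64 bits per clock at 133 MHz, ignoring overhead.
pcix_peak_gbits = 133e6 * 64 / 1e9

fraction_of_bus = observed_gbits / pcix_peak_gbits

print(f"observed:  {observed_gbits:.1f} Gbit/s")
print(f"bus peak:  {pcix_peak_gbits:.2f} Gbit/s")
print(f"bus usage: {fraction_of_bus:.0%}")
```

Under those assumptions the measured rate already uses roughly three quarters of the bus's theoretical peak, which fits the claim that the bus, not the link, is the limit.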

--
-- soldack