Java IO Faster Than NIO
rsk writes "Paul Tyma, the man behind Mailinator, has put together an excellent performance analysis comparing old-school synchronous programming (java.io.*) to Java's asynchronous programming (java.nio.*) — showing a consistent 25% performance deficiency with the asynchronous code. As it turns out, old-style blocking I/O with modern threading libraries like Linux NPTL and multi-core machines gives you idle-thread and non-contending thread management for an extremely low cost; less than it takes to switch-and-restore connection state constantly with a selector approach."
Of course old school techniques are faster. We don't drop old school because we want better performance, we drop it because we're lazy, and want easier ways to get the job done!
I need trepanation like I need a hole in the head.
JDK7 will bring a new IO API that underneath uses epoll (Linux) or completion port (Windows). High performance servers will be possible in Java too.
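For anyone who hasn't looked at the JDK 7 API yet, here is a minimal sketch of what it looks like, assuming the AsynchronousServerSocketChannel and AsynchronousSocketChannel classes in java.nio.channels. The class name AsyncEchoSketch and the method echoOnce are made up for illustration, and which OS mechanism sits underneath is left to the JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: echoes one short message over loopback using NIO.2.
final class AsyncEchoSketch {

    static String echoOnce(String msg) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (AsynchronousServerSocketChannel server =
                 AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(loopback, 0))) {

            // Server side: nothing blocks here; the handlers run when the I/O completes.
            server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
                @Override public void completed(AsynchronousSocketChannel ch, Void unused) {
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    ch.read(buf, null, new CompletionHandler<Integer, Void>() {
                        @Override public void completed(Integer n, Void u) {
                            buf.flip();
                            // Single write, assumed enough for a short message.
                            // Real code would also close ch when it is done.
                            ch.write(buf);
                        }
                        @Override public void failed(Throwable t, Void u) { }
                    });
                }
                @Override public void failed(Throwable t, Void unused) { }
            });

            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            // Client side: uses the Future-returning variants and waits, to keep the demo linear.
            try (AsynchronousSocketChannel client = AsynchronousSocketChannel.open()) {
                client.connect(new InetSocketAddress(loopback, port)).get(5, TimeUnit.SECONDS);
                byte[] out = msg.getBytes(StandardCharsets.UTF_8);
                ByteBuffer toSend = ByteBuffer.wrap(out);
                while (toSend.hasRemaining()) {
                    client.write(toSend).get(5, TimeUnit.SECONDS);
                }
                ByteBuffer in = ByteBuffer.allocate(out.length);
                while (in.hasRemaining() && client.read(in).get(5, TimeUnit.SECONDS) >= 0) {
                    // keep reading until the whole echo has arrived or the peer closes
                }
                return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling AsyncEchoSketch.echoOnce("ping") should hand back the same string. The point is only that the server side is written as completion callbacks instead of a selector loop; it says nothing about speed.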
Look at the timestamp of this presentation :) It's a bit of old news.
It was discussed here: http://www.theserverside.com/news/thread.tss?thread_id=48449
And it mostly shows that NIO is deficient. I encountered similar problems in my tests. Solved them by using http://mina.apache.org/ .
I never really used the NIO API, and never saw a reason to. It's not even included in the SCJP exam, which is the basis for all Sun certification.
Now we just need a Sun spokesperson to have a press conference and show that old-school blocking I/O with modern threading libraries affects all languages, not just Java.
Trolling is a art,
I'm not sure where / when NIO got equated to lower latency. The primary benefit of NIO (from my understanding, having designed and deployed both IO and NIO based servers) is that it allows you to have better concurrency on a single box, i.e. you can service many more calls / transactions on a single machine since you aren't limited by the number of threads you can spawn on that box (and you aren't limited as much by memory, since each thread consumes a fair number of resources on the box).
For the most part (and from my experimentation), NIO actually has slightly higher latency than standard IO (especially on heavily loaded boxes).
The question you need to ask yourself is... do you require higher concurrency and fewer boxes (cheaper to run / maintain) at the expense of slightly higher latency (which would work well for most web sites), or are your transactions latency sensitive / real-time, in which case using standard IO would work better (at the cost of requiring more hardware and support).
Java NIO is actually rather old at this point and is slated for a huge overhaul with NIO.2 which will be released with Java 7 this Fall.
This looks like polling vs. pending, and if it is, pending won that war about 40 years ago.
The entire point of asynchronous is to acknowledge you will be waiting for IO, and to try to do something else useful rather than just wait. Asynchronous will obviously end up taking more time because of the overhead of managing states and performing the switches, but the tradeoff is that something useful gets done while you wait a little longer for IO, instead of nothing getting done while you wait for the IO to complete. Which method is best is completely application specific.
I wonder how much slower nio2 will be...
You'll laugh, hysterically.
On Windows, the fastest way to do multithreaded I/O with a producer/consumer queue pattern is IO Completion Ports.
The fastest way to write a bunch of buffers to disk is WriteFileGather. The fastest way to read a bunch of data from disk is ReadFileScatter.
SQL Server uses these APIs to scale.
When I used to work at MS in evangelism, there was a big debate about how Unix does things one way, and Microsoft does it a COMPLETELY different way that you just can't #define away - it's just different. A guy named Michael Parkes said "I cannot go to these clients and say REPENT! and use IO completion ports! They do thread per client, because they have fork()".
When you listen to the technical explanations, the Microsoft way actually IS better - it's just that it's totally incompatible with everything else.
Learn IOCP and watch your context switches drop.
If you have multiple cores that otherwise do nothing (which is how all benchmarks happen to act), multithreading will use them and asynchronous nonblocking I/O won't, so the maximum transfer rate for static data in memory over a low-latency network will always be higher for blocking threads.
In real-life applications, if you always have enough work to distribute between cores/processors, your nonblocking I/O process or thread will only depend on the data production and transfer rate, not the raw throughput of the combination of syscalls that it makes. If output buffers are always empty, and input buffers are empty every time a transaction happens, then data transfer speed is maxed out, and adding more threads that perform I/O simultaneously will only increase overhead. If it is not maxed out, the same applies to queued data before/after processing -- that is, if there is processing. So if worker threads/processes do more than copying data, then giving additional cores to them is more useful than spending those cores on I/O.
Contrary to the popular belief, there indeed is no God.
This presentation is actually from 2008 (as indicated by every single slide in the PDF -- and thanks for the PDF warning, BTW). Aside from being old, is there any indication that it's still true?
https://www.eff.org/https-everywhere
IIRC, even in Novell NetWare the old blocking I/O calls which prevented the client from doing anything else while it waited for a response from the server were much faster than the new non-blocking I/O...
I've abandoned my search for truth; now I'm just looking for some useful delusions.
So, you're touting .NET's cross platform friendlyness against Java's?
Really?
Mod me down, my New Earth Global Warmingist friends!
I was given the task of handling 10k concurrent connections on a $500 Linux box. Standard IO (multi-threaded) quite simply could not cut it - the OS context-switching overhead is prohibitive after a certain number of threads. NIO can handle 10k concurrent connections easily, and keep scaling. At lower volume, even up to 2000 concurrent connections, regular IO will be fine. But for higher volume, NIO is a must.
This may be true for Java.
It isn't true for C/C++.
With C/C++ and NPTL, the many-thread blocking IO style yields slightly lower latency at low IO rates, but shows significant latency variability and sharply decreased throughput at higher IO rates.
It seems that the Linux scheduler is much to blame for this: the number of times that a thread is scheduled on a different CPU increases dramatically with more threads, and this thrashes the caches.
I've seen order-of-magnitude decreases in performance and order-of-magnitude increases in latency as a result of what appears to be the cache thrashing.
PropJavelin, for the HYDRA.
Which XNA version lets the public program for the Wii or Playstation?
You do not have a moral or legal right to do absolutely anything you want.
I... just... I don't even know what to say to this other than to echo the other comment.
Really? That's your argument?
The best way to write IO is to use one thread or process per CPU core and in that thread use non-blocking IO. I thought everyone knew this.
That's your argument?
Allow me to rephrase: If you're trying to bring an application to a given platform, sometimes the platform forces the language choice. On iPhone, it's Objective-C++; on Windows Phone 7, it's a .NET managed language. I agree that Java is useful on platforms with a JVM, but you may need to reconsider use of Java if your business plan includes expanding to a platform that lacks one.
Which XNA version lets the public program for the Wii or Playstation?
Nintendo and Sony have made the choice not to let the public develop for their platforms. (Even on PS3, the latest firmware will erase your Other OS.) Microsoft is the only game console maker with a public SDK, and the only way to use Java on its platform is through J#, which Microsoft appears to be phasing out.
So, you're touting .NET's cross platform friendlyness against Java's?
It doesn't matter how many platforms language X and library Y run on if the set of such platforms doesn't include platform P, and platform P is the only platform that remotely matches the business plan for your application.
it won't matter, they'll rename nio2 back to nio after realizing that nobody can figure out the java numbering system.
What really pains me is when people decide to do the "easy" threaded I/O and then evolve strange monstrosities like thread pools to try to deal with the scaling problems. Allocating N*k worker threads to N processors and then doing select-style polling in each of them is just so ugly. Either the OS and language runtime should provide high performance async I/O or unlimited threading via smarter compilers and schedulers (or both). Having every application evolve from "trivial but slow" (simplistic async I/O) or "trivial but fragile" (simplistic threaded I/O) into "very complex but usually fast and scalable" (weird hybrids with thread pooling and application level work dispatchers) is just a terrible waste of software engineering resources.
shut up and mod me up.
I've done similar benchmarking. On average, Java socket IO takes 3000 cpu cycles to handle a packet, including kernel time. That isn't bad for TCP which is a complex protocol.
Java NIO implementation easily adds another 1000 cycles on top of that. It tries to provide too much abstraction and it does excessive locking.
Therefore, Java NIO is about 25% slower, it looks like, in a naive saturation test.
But it's meaningless. 1000 CPU cycles is nothing in a Java program. Let's do the simplest non-trivial thing with the data, say, xor each byte. That's about 2 cycles per byte, and 2000-3000 per packet. That's another 50% slowdown. A real application needs to do much much more than that, and the overhead of Java NIO becomes negligible.
NIO is better if there are a lot of connections blocked on read/write. But this is odd if you really think about it! Why couldn't a threaded implementation achieve the same thing? As of today, the only reason is that thread stack size is fixed and quite large. Java's default stack size is 256K. A few thousand threads will consume all the memory on board. If you maintain connection state yourself, it's far smaller than that, and it pays to not create one thread for every connection.
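On the stack size point: Java does let you request a per-thread stack size through the four-argument Thread constructor, though the JVM is free to treat the value as a hint, round it, or ignore it. A minimal sketch, assuming a made-up helper SmallStackThreads.runParked that starts a batch of parked threads (standing in for connections blocked on read) with a requested 64K stack:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: starts `count` threads that park immediately, each with a requested stack size.
final class SmallStackThreads {

    static int runParked(int count, long stackBytes) {
        CountDownLatch allStarted = new CountDownLatch(count);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger ran = new AtomicInteger();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            // Fourth argument is the requested stack size in bytes; the JVM may treat it as a suggestion.
            threads[i] = new Thread(null, () -> {
                ran.incrementAndGet();
                allStarted.countDown();
                try {
                    release.await(); // stand-in for a thread blocked on a socket read
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "conn-" + i, stackBytes);
            threads[i].start();
        }
        try {
            allStarted.await();      // every thread exists at the same time here
            release.countDown();     // let them all finish
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ran.get();
    }
}
```

For example, SmallStackThreads.runParked(200, 64 * 1024) returns the number of threads that actually ran. How much memory this saves in practice depends on the JVM and OS, so treat it as an experiment and not a fix.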
You use threads when your app is CPU-bound and async I/O (it's actually sync, it's just multiplexed, but the name stuck) when it's I/O bound. Using threaded/synchronous models works better when you have a small number of connections, because the OS scheduler can cope with them. Going to 10k connections over the same model is downright stupid. Your software would spend more time switching contexts than performing useful I/O. Hybrid models, with per-thread I/O loops, are a good approach. IIRC, Mina does the same thing, moving connection objects from one thread to another when their state changes, while doing async I/O in each thread. I'm not entirely sure on this, I'm not a Java guy, but I remember reading this in a paper at some point.
FYI: Java will run on platforms that support C. Please see GNU gcj and the Classpath project.
So from a different perspective, Microsoft had to kill off Java to get anyone to use XNA, and this is supposed to be evidence of XNA's superiority?
...But I digress...
I don't think you quite got my point. Let's try a few more examples:
As should be painfully obvious by now, placing arbitrary restrictions on a comparison makes the comparison meaningless. Your original statement was that comparisons are null if the target systems aren't equal. Limiting the discussion to a single case where you know the comparison is flawed makes the comparison useless.
Instead, let's simply compare where the two technologies can be used. Java can be targeted for many systems. XNA can run on four. For any randomly-selected non-PC target platform, it seems the chance of Java working is significantly higher than XNA (or anything else, for that matter).
A more equally-weighted comparison is Java vs. .NET. Both are based on publicly-available specifications, and both offer similar functionality. I'd argue that neither is any better than the other in theory, though in practice Java has better support.
Java will run on platforms that support C.
XNA Game Studio does not support C or Standard C++. It supports a largely incompatible C++ dialect called "C++/CLI with /clr:safe".
Please see GNU gcj and the Classpath project.
Does gcj compile Java into C or directly into object code? The iPhone developer program requires that Xcode and only Xcode compile your program's source code, which must be written in Objective-C (of which C is a subset) or Objective-C++ (of which Standard C++ is a subset).
I don't think NIO ever claimed to be faster than old IO. Its primary selling point is (and always has been, IMO), its ability to handle tens of thousands of connections on a single thread. With old IO, ten thousand connections would require ten thousand threads. I don't care how good your context switcher is -- that is a high price to pay, especially if the machine has other high-thread-count processes running as well.
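To make the "many connections on a single thread" point concrete, here is a minimal sketch of a selector loop, assuming a plain echo protocol on loopback. SelectorEchoServer and its demo method are hypothetical names for illustration, and the write path is simplified.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical demo class: one thread, one Selector, many connections.
final class SelectorEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    SelectorEchoServer() {
        try {
            selector = Selector.open();
            server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    int port() { return server.socket().getLocalPort(); }
    void stop() { running = false; selector.wakeup(); }

    @Override public void run() {
        try {
            while (running) {
                selector.select(200); // waits until some channel is ready or the timeout expires
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client == null) continue;
                        client.configureBlocking(false);
                        // Per-connection state is a small buffer, not a thread stack.
                        client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                    } else if (key.isReadable()) {
                        echo(key);
                    }
                }
            }
            for (SelectionKey key : selector.keys()) key.channel().close();
            selector.close();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    private void echo(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        int n;
        try { n = client.read(buf); } catch (IOException e) { n = -1; }
        if (n < 0) { key.cancel(); client.close(); return; }
        buf.flip();
        while (buf.hasRemaining()) client.write(buf); // simplification: real code would wait for OP_WRITE
        buf.clear();
    }

    // Hypothetical driver: holds `clients` blocking sockets open at once; returns the number of correct echoes.
    static int demo(int clients, String msg) {
        SelectorEchoServer srv = new SelectorEchoServer();
        Thread loop = new Thread(srv, "selector-loop");
        loop.setDaemon(true);
        loop.start();
        byte[] out = msg.getBytes(StandardCharsets.UTF_8);
        Socket[] sockets = new Socket[clients];
        int ok = 0;
        try {
            for (int i = 0; i < clients; i++) {
                sockets[i] = new Socket(InetAddress.getLoopbackAddress(), srv.port());
            }
            for (Socket s : sockets) {
                s.setSoTimeout(5000);
                s.getOutputStream().write(out);
                byte[] in = new byte[out.length];
                new DataInputStream(s.getInputStream()).readFully(in);
                if (new String(in, StandardCharsets.UTF_8).equals(msg)) ok++;
            }
            for (Socket s : sockets) s.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            srv.stop();
        }
        return ok;
    }
}
```

SelectorEchoServer.demo(25, "hello") opens 25 connections and serves all of them from the one "selector-loop" thread. It shows the shape of the approach only; it is not a benchmark and says nothing about the 25% figure in the story.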
A more equally-weighted comparison is Java vs. .NET.
If your game's back-end (physics and AI) is written in any 100% Pure .NET language, an XNA front-end (input, graphics, and sound) for this game can be created. But the only way to share the back-end between Java front-ends and .NET front-ends appears to be J#, and I don't know how long Microsoft plans to maintain that.
So when you're pushing data as fast as you can through a socket, the old read(byte[]) or write(byte[]) are faster? Wow, no kidding.
You do NOT use java.nio (like Jetty's SelectChannelConnector) for maximum throughput. You use it to handle persistent connections, like all those long polling requests via AJAX which return on an event or timeout after a minute. This article is like recommending Apache with its hard limits on how many requests it can serve concurrently over newer, asynchronous servers like Nginx for static media servers with keep alive enabled.
The slides even mention the C10K problem, but what they don't do is mention when to use either technology - async IO for concurrency and endless scaling, and synchronous IO for pushing a 10G Ethernet link to the limits. No wait, the nio setup can do that too, 700MB/s or 5.6Gbit/sec per core on 2008 hardware should be enough to max out anything you can buy now. It's great that synchronous IO can hit 1GB/s, a whopping 30% faster, but useful? I'd say no.
For most users, you don't use either API. Let's be honest here, writing highly concurrent software is hard, why reinvent the wheel when you can get off-the-shelf software that can do it better? You use Jetty and choose between the SelectChannelConnector or SocketConnector, or choose between Apache or Lighttpd/Nginx depending on the traffic pattern. What you do write is the bit that accepts a whole HTTP request and returns an HTTP response, everything before and after is magic.
Unless you're a file server, each 50K HTTP response will require enough work that you run out of CPU or disk IO long before you hit even the 100Mb/s ceiling in most rack switches. Even if your app is fast, 16 cores each finishing one 50K response every 100ms (160 responses per second) is only about 62 Mbit/s. Not 5600.
But if you need to scale in concurrent client count, there's no way around async IO. The latest name to watch is Netty. In Plurk Comet: Handling 100,000+ Concurrent Connections with Netty, it scales up to 100000 concurrent connections on a quad core server with 20% CPU load.
Just stop worrying about sockets already, and start worrying about your SQL server suffering a meltdown. Even if you manage to grow into a Facebook, it's not like using synchronous IO will save you from deploying 30000 servers, it's the application code that's slow. Zero copy, one copy, "string concatenation style twenty copies response building" socket writes don't matter at all, memcpy is cheap compared to a few lines of interpreted code, servers are cheap compared to developers, and never mind the cost of the programming gods giving these presentations.
Same facts, different story:
If your game's back-end (physics and AI) is written in any 100% Pure JVM language, a Java front-end (input, graphics, and sound) for this game can be created. But the only way to share the back-end between Java front-ends and .NET front-ends appears to be J#, or COM or JNI, through one of many supported forms of integration, or use the good old IPC systems the past 40 years have given us.
It seems .NET is the limiting factor here. Perhaps it'd be better just to get rid of it. This is 2010. There's no good reason to be locked into a language (or platform) anymore. As programmers, we should be thinking about creating the future. We should not be concerned with the ridiculous artificial limitations corporations impose on us. With XNA, Microsoft is giving you a small sandbox, for which they'll charge $99 whenever you try to build something interesting in it. To them, you're not an innovator. You're not a developer. You're a source of income.
or COM or JNI
.NET has P/Invoke, a rough counterpart to Java's JNI, but in practice, it works only on PC. Non-PC platforms tend to require bytecode to pass the verifier, and they reject all native code not signed by the device maker even if it is compiled for PowerPC or ARM as applicable. Both the Java platform and the .NET framework have this problem. I'll admit I was a bit narrow-minded in claiming that compatibility with a video game console is a notable advantage of .NET over Java.
The slides are dated 2008. Was there any follow up? Were these experiments repeated and/or confirmed anywhere else?
Thou shalt not begin a subject line or post with the word "Umm".
Technically, you can't run C anywhere. You can run machine instructions that are translated from assembly that was compiled from C, but not C directly.
Now Brainfuck, on the other hand...
You don't 'run' C, you compile it.
And from the gcj page:
"It can compile Java source code to Java bytecode (class files) or directly to native machine code, and Java bytecode to native machine code."
The compiled native code would then be dl'd to the device you describe, and executed directly. No VM involved.
"Gold still represents the ultimate form of payment in the world." - Alan Greenspan, 1999
My understanding is that it is not supposed to be faster. It is non-blocking and asynchronous which serves a different need.
Actually, Paul published this on his blog probably a bit over a year ago. I remember it well, it's a very nice presentation.
Move sig!
Is it me or is Mailinator a blatant rip-off of http://spamgourmet.com/?
You do understand that you just blew your own pro-.NET comment, don't you?
PS: No one ever retracted the Right Tool For The Job principle.
That's not a very good argument. I would say there are more platforms that support Java than support the .NET Framework. AFAIK, only Windows and the Xbox have a complete .NET Framework (Windows of all kinds, including WinMo). I have used Mono and it's very good, but not yet complete.
Only Microsoft can control what is allowed on their devices. When the PS3 could still run Linux, it had Java. Sony has control of the PS3 platform and removed that ability.
http://soylentnews.org/~tibman
I don't really care how Java does it but I would expect they wouldn't always require one thread per socket. This quickly becomes a bad approach on a server application that has more than a few connections.
Short answer: java.util.concurrent.ThreadPoolExecutor. It has been available in the Standard Edition since Java 5.0 and is very simple to use.
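A minimal sketch of that, assuming a fixed-size pool with an unbounded queue. PooledHandlers and serve are made-up names, and the task body is only a stand-in for the blocking read/process/write you would do per connection:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: many "connections" handled by a small, fixed set of worker threads.
final class PooledHandlers {

    // Returns { tasks handled, distinct worker threads used }.
    static int[] serve(int connections, int poolSize) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize,                  // core and maximum pool size
                30, TimeUnit.SECONDS,                // keep-alive for idle threads above core size
                new LinkedBlockingQueue<Runnable>()  // work waits here while all workers are busy
        );
        Set<String> workers = ConcurrentHashMap.newKeySet();
        AtomicInteger handled = new AtomicInteger();

        for (int i = 0; i < connections; i++) {
            pool.execute(() -> {
                workers.add(Thread.currentThread().getName());
                handled.incrementAndGet(); // stand-in for blocking socket work on one connection
            });
        }

        pool.shutdown(); // accept no new work, finish what is queued
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { handled.get(), workers.size() };
    }
}
```

With PooledHandlers.serve(100, 4), all 100 tasks get handled by at most 4 threads. Note that a pooled worker stuck on a slow blocking read is still occupied for that whole time, so this caps the thread count without making blocking IO scale like a selector.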
You do understand that you just blew your own pro-.NET comment, don't you?
OK, I admit I was narrow-minded when I emphasized the job of video games. There are jobs where Java is the right tool and jobs where .NET is the right tool. Video games happen to favor .NET over Java at the moment.
NIO provides three additional capabilities to Java:
* select style I/O - managing a bunch of connections with one or more threads (instead of requiring a thread per I/O)
* direct buffers - allow Java to use raw byte buffers from the native OS without having to copy all of the data into and out of the virtual machine for every operation
* channels - which have better defined characteristics for interrupting I/O
Select style I/O is an architectural change to your app that may or may not improve performance depending on your design. Direct buffers are an implementation improvement that will generally improve performance for simple copy operations.
Pat Niemeyer
Author of Learning Java, O'Reilly & Associates
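As an illustration of the direct buffer and channel items in the list above, here is a minimal sketch of a file copy through a direct ByteBuffer. DirectBufferCopy, copy and roundTrip are hypothetical names, the 64K buffer size is an arbitrary assumption, and FileChannel.open and java.nio.file.Files are the later Java 7 conveniences used here for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: copies a file channel-to-channel through one direct buffer.
final class DirectBufferCopy {

    static long copy(Path from, Path to) {
        // Allocated outside the Java heap, so the OS can fill and drain it directly.
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        long total = 0;
        try (FileChannel in = FileChannel.open(from, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(to, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (in.read(buf) != -1) {
                buf.flip();                    // switch from filling to draining
                while (buf.hasRemaining()) {
                    total += out.write(buf);
                }
                buf.clear();                   // ready to be filled again
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Writes data to a temp file, copies it, and returns what landed in the copy.
    static byte[] roundTrip(byte[] data) {
        try {
            Path src = Files.createTempFile("direct-src", ".bin");
            Path dst = Files.createTempFile("direct-dst", ".bin");
            try {
                Files.write(src, data);
                copy(src, dst);
                return Files.readAllBytes(dst);
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether the direct buffer actually beats a heap byte[] for a given workload is something to measure; the sketch only shows where the buffer sits in the copy loop.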