Slashdot Mirror


User: LunaticLeo

LunaticLeo's activity in the archive.

Stories
0
Comments
128

Comments · 128

  1. Re:What were they expecting? on School Regrets Swapping Laptops For iPads · · Score: 0

    What do you mean "Based to your UID, you're probably already bald." ? :)

    BTW, I have a full head of hair.

  2. AGPL makes Opa dead on arrival on Announcing Opa: Making Web Programming Transparent · · Score: 2

    I don't mind making any and all my code available to others. And I understand the "give-back" qualities of GPL. But a GPL language that makes every program written in it have the GPL License?!? Good grief. I am laid back about the whole License Wars, but AGPL gives me pause.

    I probably won't even bother learning a little bit of this language to understand its good qualities. Plus I've never cared for indentation defining blocks.

  3. his prediction was not quite correct on Dvorak Says Apple Move to Intel Will Harm Linux · · Score: 4, Informative

    He said in 12-18 months and that was almost 27 months ago. This is something of a nit, but you can't say "Windows will be less than 50% of market share in the next 5 years" then 20 years later say "I told you so" when it actually happens.

  4. Re:Upgrade? Hell, you're already massively over-sp on x86 Commodity-Hardware Router? · · Score: 2, Informative

    Uh, the PCI bus has a maximum throughput of roughly 133 Mega-BYTES per second. That is about 1 Giga-BIT per second. And that is just for the standard 32-bit at 33MHz speeds. There are plenty of Intel-based servers with 64-bit and 66MHz PCI variations.
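
    As a rough check of the figures above, the theoretical peak can be computed from bus width and clock rate. This is a minimal sketch, assuming one transfer per clock and nominal 33.33 MHz and 66.66 MHz clocks; the function names are illustrative only. For the 32-bit/33MHz case it lands close to 1 Gbit/s.

```python
# Theoretical peak bandwidth of a parallel PCI bus, assuming one transfer per clock.
def pci_peak_bytes_per_sec(bus_width_bits: int, clock_hz: float) -> float:
    return bus_width_bits / 8 * clock_hz


def to_gigabits_per_sec(bytes_per_sec: float) -> float:
    return bytes_per_sec * 8 / 1e9


base = pci_peak_bytes_per_sec(32, 33.33e6)  # standard 32-bit, 33 MHz PCI
wide = pci_peak_bytes_per_sec(64, 66.66e6)  # 64-bit, 66 MHz variant

for name, rate in (("32-bit/33MHz", base), ("64-bit/66MHz", wide)):
    print(f"{name}: {rate / 1e6:.0f} MB/s, {to_gigabits_per_sec(rate):.2f} Gbit/s")
```

    Real sustained throughput is lower than this peak because of arbitration and protocol overhead, so the point about a single gigabit link nearly saturating a plain PCI bus still holds.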

  5. Re:Headline: Linux Makes Bad Code Look Better on Kernel Comparison: Web Serving On 2.4 And 2.6 · · Score: 2, Insightful

    Thats not informative, thats just dumb.

    I think I'll disagree with you on that.

    How do you think things like I/O completion are implemented?

    I've heard that they have a thread waiting for each completion port, sorta like the aio_* implementation on Irix. But you might be thinking I am some Microsoft stooge; just so you know, I don't do Windows. Sometimes FreeBSD (cuz it has some cool stuff), but nearly exclusively Linux for the last 10 years.

    Crappy threading performance on Linux (and in unix in general) has historically been because of crappy threading libraries, and because process creation is relatively cheap so people tended to just fork children instead of spinning threads.

    I think I have used those very words myself.

    Just because you're doing it in kernel threads instead of userspace threads doesn't mean that it's not threaded.

    I never suggested user space threads. They don't require kernel intervention, but they don't utilize multiple CPUs. And M:N threading libraries are out of fashion. Apparently all those smart people realized that two levels of schedulers were hard to make fair and fast at the same time. NGPT lost in favor of 1:1 in NPTL (LinuxThreads just blew chunks). Solaris 9 now makes the 1:1 threading library the default, where Sun used to trumpet the glories of their then new, now old, M:N thread library (now if they would just shoot mtmalloc in the head and be done with it).

    And I'm _really_ not sure why you're showing us a frigging Python framework as some sort of example of super-performant network programming. Python's great and all but a performance monster it is not. "Yeah, boss, we use a runtime-compiled interpreter for all our performance-critical code, but by God we avoid context switching!"

    As I said in another post in this thread: scalability comes from design, not from optimization. If Perl or Python or Java are a constant factor slower, but I use a better algorithm, I can beat C/C++ hands down. I saw someone's sig that said something like "If it can scale, I can buy performance." So while scalability != performance, there is a relationship. And with a bunch of cheap PCs running Linux, I can crush someone else's Apache/JBoss/Websphere "enterprise" app running on some Sun E15K monstrosity.

  6. Re:Headline: Linux Makes Bad Code Look Better on Kernel Comparison: Web Serving On 2.4 And 2.6 · · Score: 1

    Thanks for your reply. Here is my best response.

    Try to remember I was replying to an article about how the performance of a web-based enterprise application is affected by the new Linux kernel. That means many network connections to clients, middleware, and even databases, with high transaction rates. I am not talking about an FTP server downloading big static files to a few users.

    Unless you are writing an eBay or Amazon.com or Yahoo.com, a threaded/blocking-IO approach will work fine.

    /me smiles

    To be completely optimal you wouldn't want a pure implementation of EITHER model, but some sort of hybrid

    Did you check out SEDA? That is an event-thread hybrid model. Unfortunately, it is not as allergic to context switches as I think it should be.

    The big win is that explicitly maintaining state for connections allows you to avoid context switches. The threading part of a hybrid model is for two things: one, taking advantage of multiple CPUs, and two, allowing for the use of blocking APIs when it can't be avoided. Hence, you learn a new rule: threads should scale with the number of CPUs, not with the number of data structures you are working on (aka the implicit "connection state structure").
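
    That rule can be sketched in a few lines. This is a minimal illustration, not a real server: the connection table is an in-memory dict, and blocking_work is a hypothetical stand-in for an unavoidable blocking API call.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Explicit per-connection state, kept in a plain dict instead of on a thread's stack.
connections = {
    conn_id: {"stage": "request", "payload": f"req-{conn_id}"}
    for conn_id in range(100)
}


def blocking_work(payload: str) -> str:
    # Hypothetical stand-in for a blocking call that cannot be avoided.
    return payload.upper()


# The pool size follows the CPU count, not the number of connections.
workers = os.cpu_count() or 1
with ThreadPoolExecutor(max_workers=workers) as pool:
    futures = {
        cid: pool.submit(blocking_work, st["payload"])
        for cid, st in connections.items()
    }
    for cid, fut in futures.items():
        connections[cid]["response"] = fut.result()
        connections[cid]["stage"] = "done"

print(f"{len(connections)} connections served by {workers} worker thread(s)")
```

    The hundred connections never get a thread each; they are just rows of state that a handful of workers pass over.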

    There was a time when GUI framework designers thought every widget in an application would have its own thread. Notice how that hasn't occurred? Ever wondered why? GUIs are event driven programs with a sprinkling of threads where they make sense; sorta like what I am talking about.

    Maintainability is consistently underrated.

    I couldn't agree more, but ... always a but ... event driven network programming requires a natural breakdown of your app into discrete events, aka request-do_work-response. This is how network servers necessarily operate. Hence, programming in an event driven paradigm (eek I said it) is eminently maintainable and clear. Further, the point of using a framework is to let the framework writer do all the dirty work while you just express the higher level operations, hence clarity and maintainability.

    Lastly, there is a distinction between Design, which leads to performance, and Optimizations. I am suggesting that in whatever reasonably up-to-date language you choose, be it Perl, Python, Java, C++ or C, the design of your program determines the scalability of your program. Optimizations take a lot of work, and don't affect the scalability of your program by orders of magnitude. If an "optimization" does affect your program by that much, you are probably removing brain damage rather than genuinely tweaking your code.

  7. Re:Headline: Linux Makes Bad Code Look Better on Kernel Comparison: Web Serving On 2.4 And 2.6 · · Score: 2, Informative

    what has your two memory management lines got to do with anything?

    When the kernel needs to get free memory pages, it looks on some sort of free page list or it has to find a victim page. There are lots of strategies to find victim pages. The reverse page table mappings allow the kernel to scan only pages in memory for victims, and not have to scan the virtual mappings for pages that are in memory AND satisfy some "victimizable" criteria.

    Secondly, reverse page mappings allow you to know more about the page, such as whether it is shared by several processes. The quick access to additional information allows you to make better choices about which page to swap out.
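
    The idea can be shown with a toy data structure. This is only a sketch under stated assumptions: the process names, page numbers, and the "fewest mappers first" policy are made up for illustration and are not the kernel's actual structures or eviction policy.

```python
from collections import defaultdict

# Forward mappings: per-process page tables, virtual page -> physical frame.
page_tables = {
    "proc_a": {0: 10, 1: 11},
    "proc_b": {0: 11, 5: 12},
}

# Reverse map: physical frame -> set of (process, virtual page) that map it.
rmap = defaultdict(set)
for proc, table in page_tables.items():
    for vpage, frame in table.items():
        rmap[frame].add((proc, vpage))


def pick_victim(resident_frames):
    # Toy policy (an assumption): prefer the frame with the fewest mappers,
    # so pages shared by several processes are evicted last.
    return min(resident_frames, key=lambda f: (len(rmap[f]), f))


victim = pick_victim(list(rmap))
print(victim, sorted(rmap[victim]))
```

    With the reverse map, the victim search walks resident frames directly and sees at a glance that frame 11 is shared, without scanning every process's page table.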

    I don't pretend to know everything about virtual memory systems. However, I do read LKML and these are the arguments others have made which to some degree I am parroting here (but I do get the gist of it).

  8. Headline: Linux Makes Bad Code Look Better on Kernel Comparison: Web Serving On 2.4 And 2.6 · · Score: 5, Informative

    The new Linux kernel is great, but the reason this particular kernel results in better performance ("5 times better") is because the application framework it is testing is horrible.

    All of the "enterprise" applications in this test have several performance-crippling features in common: socket-per-thread connections, fundamental reliance on threads, and a massive memory footprint. Apache has one thread/process (the diff is a stack) per connection. Java requires a sizable multiple of the memory used by most other application languages (C/C++ obviously, but also Perl, Python, and PHP). J2EE is an inherently thread driven programming framework.

    So yes, Linux 2.6 ameliorates the downsides of unnecessary use of threading. It makes thread creation and context switching even faster on the Linux platform.

    And yes, Linux 2.6 memory management is fundamentally better. Reverse Page Table Entry mappings make finding victim pages better; and it is designed to avoid victimizing active pages better.

    But could you all imagine if people were designing fundamentally better application frameworks? Event driven application architectures like TwistedPython and POE, or event-thread hybrid systems like SEDA.

    The performance stats given in that article are shit, complete utter shit. I know. In the proprietary world I work in, I code faster programs on the same Linux platform on a daily basis; orders of magnitude faster.

    All the accomplishments of Linux 2.6 can be used for true performance programming. I plead with you all, stop using Threads until you know what they are good for. Stop using the stack to maintain your program state. Throw off the shackles and learn to program network servers.

  9. Re:I'm busy tonight on Elegant Universe Airs Tonight on PBS · · Score: 1

    FYI, the federal contribution to PBS thru CPB (Corporation for Public Broadcasting) is less than 14% (see this link).

    There are also State contributions to individual PBS-affiliated public broadcast stations. I don't know what the aggregate State contribution to all of Public Broadcasting in the USA is.

  10. Recipe for a computer room on How Would You Build a Datacenter? · · Score: 3, Informative

    I assume you are not going to build a "datacenter", but rather build out a computer room. Given that, here is what I have to say.

    Don't build a computer room or datacenter. Find a commercial hosting service. Rent some cages and contract for reserving contiguous cages.

    If you don't like the commercial hosting service here are the things I did to build out a computer room.

    Power: Contract with a commercial electrician to get many more 20-amp drops. The electrical contractor will know how to deal with the owners of your building to arrange the additional circuits. For most two-processor Intel boxes you can estimate 3 amps per box.

    You can calculate the required volt-amps of your UPSs with this approximation: UPS volt-amps = Volts * Amps * 0.7. Computer volt-amps is really less than the volts times amps, due to complex impedance. Disk arrays are closer to 1.0 scaling. Don't skimp on power for disk arrays.
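
    The approximation above is easy to turn into a quick sizing script. This is a minimal sketch: the 120 V circuits, the count of twenty boxes, and the 8 amp disk array are assumed example numbers, not figures from a real room.

```python
def ups_volt_amps(volts: float, amps: float, scaling: float = 0.7) -> float:
    # Rule of thumb from the text: VA = volts * amps * scaling factor.
    return volts * amps * scaling


# Assumed example load: twenty two-processor boxes at 3 A each on 120 V
# circuits, plus one disk array drawing 8 A (scaling 1.0 for disk arrays).
servers_va = 20 * ups_volt_amps(120, 3)
array_va = ups_volt_amps(120, 8, scaling=1.0)

print(
    f"servers: {servers_va:.0f} VA, disk array: {array_va:.0f} VA, "
    f"total: {servers_va + array_va:.0f} VA"
)
```

    Leave headroom above the computed total when choosing a UPS model; the estimate is only as good as the per-box amp figure you feed it.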

    Get rackmounted UPSs spec'ed out for the hardware connected to them. Don't skimp out here either.

    Cooling: You can purchase "portable" air conditioners and put them in your computer room. They will drop the excess heat into your office ceilings; assuming you are in one of those buildings with popup ceiling tiles. Office buildings recycle heat this way so it is OK. Find out if your building turns off AC on the weekends and nights. I was at a place that did that, and it sucked working weekends and it sucked worse for our computers. If they do cut AC on the weekends, then you will need more BTU cooling from your portable air conditioning.

    If you are really going to build a datacenter, contract with an appropriate architecture firm. In my mind a "datacenter" is a basement or whole building with full on-site diesel power generators and raised floors or overhead wire guides. That is probably not something required for up to 100 hosts. Over 100 hosts is where that might be a good idea.

    Did I mention that commercial hosting service? You may grow out of your office space with employees and want to move. A commercial hosting service provides far greater quality computer and network capacity, and they don't tie you down too much.

  11. Re:When will it end??? on AMD to debut multi-core CPUs in 2005 · · Score: 1

    Ironically, these deficits in the x86, a non-orthogonal instruction set and a paltry 4 "General Purpose" registers, have forced x86 cpu makers to develop advanced techniques.

    One thing most people recognize is that CISC design allowed mem to mem streaming copies. And that has a performance advantage such that even RISC designers added similar instructions, even though it violates the LOAD/STORE doctrine.

    Another issue where CISC design benefits performance is that the instruction stream is smaller. Given the growing difference between cpu and memory speeds, saving instruction fetch times helps. You can think of CISC instructions as a customized compression of the instructions, and the ubiquitous conversion of x86 to "micro-ops" (which are very RISC-like) in AMD and Intel cpus as the decompression phase.

    Lastly, the dearth of registers forced x86 cpu designers to create and optimize speculative execution with "shadow" register sets. If you only have 4 registers it is relatively (to 32 registers) cheap to create separate register sets you can switch back to if a speculative branch fails. RISC developers are behind in speculative execution techniques because shadowing 32 registers was an expensive technique to develop. Now with 16 64-bit registers in x86-64, which can be individually addressed as 32 32-bit registers, x86-64 doesn't have the register starvation problem. I read some study that showed that you reach diminishing returns performance-wise with more than 20-30 registers.

    It is ironic that a shitty ISA like x86 has become a performance advantage 20+ years after its original design.

  12. Use SCP with the 'none' block cypher on Sending Files w/o Sending Clear Passwords? · · Score: 1

    You need to compile openssh with the 'none' cypher. This is not compiled in by default.

    The 'none' block cypher will transfer your data in the clear. This gives a near-ftp speed transfer of your data. However, the good thing is that you get the full SSH authentication with passwords encrypted.

    If you can't convince your Sysadmin to compile and install an SSH with the 'none' cypher, the next best thing is to use the 'blowfish' cypher. It impacts cpu usage and transfer speed less than any other cypher I have tested.

    BTW, the usage is as follows:

    scp -c none file remote.ip:/dest/

    or

    scp -c blowfish file remote.ip:/dest/

    Good luck.

  13. Re:Question for Java and Perl developers on Eye on Java performance Improvements · · Score: 2, Informative

    I think Perl. Use POE; see poe.perl.org. Fast, well structured code. POE is God's gift to Perl programmers (or at least Rocco Caputo's gift).

    Otherwise use java.nio. Unfortunately, since it is a new api there is only one shitty application framework built around it, called SEDA. At first, I thought SEDA was cool, then I used it, found problems, tried to report problems, got no response, noticed there have been no updates in nearly a year. Fuckers.

    If you like Python there is a feature rich, event loop style app framework called TwistedPython. Haven't used it but it looks good. Check out www.twistedmatrix.com .

  14. Re:Yes, but... on Linus Moves To OSDL, Will Work On Kernel Full-Time · · Score: 1

    WooHoo!!! I got one (firewall stuff).

    I am not a windows guy. Went from DRDOS to OS/2 to Linux. Not really a religious thing. So I was unaware of some of the newer things in Windows.

    "From your comments, you appear unclear as to what NUMA is", ummm yeah, nooo, I sorta do. Thanks for the condescension though. I was referring to the dinosaur companies that make MS Windows NUMA machines. I was unsure whether that was a product by MS or an enhanced version of Windows by the computer manufacturer.

    FYI, NUMA is a controversial implementation for "Big Computers". Don't get stuck on it. I think it is for lazy programmers who don't want to use explicit message passing (as opposed to the implicit message passing in NUMA). Also, NUMA is for hardware manufacturers who want to sell big ticket items. Not to be one of these linux-beowulf-weenies, but cheap-"er" hardware with high bandwidth low-latency interconnects is IMHO a better buy.

    hot-swap RAM? No, I don't think so. What is the PC standard for hot-swapping RAM? You have to drain the ram and reprogram the memory controller to redirect bus requests for non-relocatable memory locations to other RAM. What Wintel box has a reprogrammable memory controller? This sounds not like a MS feature, but rather a hardware manufacturer's code. Maybe MS has a hook for it though.

    hot-swap PCI? Yes.

    usable async I/O? (one of my beefs too) aio coming in 2.6 to a kernel near you. I am more impressed with FreeBSD's kqueue api. BTW, who the fuck uses select(2)? Even linux implements sys_select as a wrapper around do_poll. Point taken though.

    kernel debugger? Yes, but as a patch; Linus has issues with debuggers as crutches, and interfering with code paths, yada yada yada.

  15. Re:Yes, but... on Linus Moves To OSDL, Will Work On Kernel Full-Time · · Score: 2, Informative

    You are making a good point, but I think I can name three:
    [ Note: I am only comparing the MS Product WinXP or Windows Server 2003. If that is too restrictive I imagine you'll correct me. Also I am only thinking about kernel level features.]

    - Very robust full featured stateful packet munging, filtering, notifying thing (aka firewall).

    - IPv6

    - Support for 64bit address spaces and CPUs. (Where is the ia64 or x86-64 Windows on this?)

    - NUMA (Does some version of Windows support Non-Uniform Memory Access? Maybe something from Wang or some other dinosaur company).

    - I am sure there are some esoteric network protocols linux supports natively. But I am not so impressed by that.

    - Ether-switching (aka bridging; plus some stateful inspection).

    This is from the top of my head. NFS is probably another, but MS has that LanManager file system, CIFS.

  16. Re:um on QNX: When an OS Really, Really Has to Work · · Score: 1

    Mac OSX derives from NeXT OS and sure, it uses a fork of the very old Mach 2.5 (I'm pretty sure) kernel. But it runs the entire OS in a single server, last I heard. So Mac OSX is really just using Mach as a portability layer.

    QNX is a true multi-server micro-kernel OS.

    Even linux has a single-server Mach micro-kernel implementation called MkLinux. But that doesn't really make linux a Mach micro-kernel based OS.

  17. Re:As I said before on What Subnotebooks Work Best w/ Linux? · · Score: 1

    I have one as well.

    Do you have Linux installed on it with ACPI (aka suspend/resume mem or disk) working?

  18. Is it me.. on Why Nerds Are Unpopular · · Score: 1

    I've never felt like a "Nerd". No one picked on me in school, or they learned not to. And we had a big group of friends. Maybe it was because I went to a large high school, with a graduating class of around 550. I didn't even know most of the people in my class. The "smart" kids were pretty much segregated into the Honors classes. I grew up thinking the "Nerd" stereotype was dead, like a bad 70s kick. There were the "preppy" smart kids, and the "alternative" smart kids, and even the ROTC smart kids, oh yeah and "BandFags" but they were a very tight clique.

  19. Re:anybody use this? on POE 0.25 Released · · Score: 4, Informative

    Yeah, I use it to write high performance network servers. I think it is foolish that Rocco hasn't bothered to call it 1.0 yet. It is pretty stable. I love POE; it is powerful and smart.

    Here are some observations:
    - On a 1GHz linux box POE can execute 5000 no-op events per second. I conclude from that that the overhead of POE is pretty small.
    - Network programming is easy under POE. Network programming is inherently asynchronous (i.e. event driven). Hence any other paradigm, like blocking read/write threads, is a mismatch that undermines performance.
    - Breaking code into logical events makes restructuring and refactoring code very easy.
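
    To give a feel for "breaking code into logical events", here is a toy single-threaded event queue. It is written in Python as a stand-in and is not POE's API; the Kernel class, event names, and handlers are all hypothetical.

```python
from collections import deque


class Kernel:
    """Toy single-threaded event queue, loosely in the spirit of an event framework."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}

    def on(self, event, handler):
        self.handlers[event] = handler

    def post(self, event, *args):
        self.queue.append((event, args))

    def run(self):
        dispatched = 0
        while self.queue:
            event, args = self.queue.popleft()
            self.handlers[event](self, *args)
            dispatched += 1
        return dispatched


log = []


def on_request(kernel, text):
    kernel.post("do_work", text)  # each step is a small, separate event


def on_work(kernel, text):
    kernel.post("response", text[::-1])


def on_response(kernel, text):
    log.append(text)


kernel = Kernel()
kernel.on("request", on_request)
kernel.on("do_work", on_work)
kernel.on("response", on_response)
kernel.post("request", "abc")
kernel.post("request", "xyz")
count = kernel.run()
print(count, log)
```

    Because each handler is small and only talks to the others through posted events, moving or replacing a step means re-registering one handler rather than untangling a long blocking function.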

    My advice is to learn POE as soon as you can. It is a conceptual change in how you write code. Once you are over the learning curve, you will have a powerful new tool in writing Perl. Further, it will lead you to a new (proper) way of thinking about programming in any language.

    P.S. My favorite Alan Cox quote:
    "A computer is a state machine. Threads are for people who can't program state machines"

  20. Re:Ace HW needs a clue on Scaling Server Performance · · Score: 1

    But you failed to mention that for N-way SMP servers Event-driven + few threads == BEST.

    Agreed. My defense is two fold:
    1) Event-driven IO + a few worker threads still requires the fundamental paradigm shift (ugh...yes I said it) to an event-driven, state maintaining framework.
    2) In my post, I had already written a lot and I just wanted to make a point about the fundamental design choice that leads to performance.

    SEDA is very cool, but forces a particular granularity of messaging via the queues that I find annoying and can cause unnecessary work when it is hard to break processing of events into discrete stages.

    I think you are misreading the SEDA design. Breaking a pipeline into logical stages is good design, but it doesn't need to be a deep pipeline and the stages don't need to be broken into even chunks (in terms of CPU usage). As an example, Apache is broken into approx 7 stages. They are something like url parsing, authentication, authorization, content handling, logging, and send response. The content handling stage is by far the biggest. In SEDA this fat stage would potentially accumulate more threads in its thread pool.

    I think you've pronounced the death of thread-per-connection badness too early. Java just recently got the ability to do asynchronous IO in version 1.4. And nearly every book on C++ and Java teaches handling concurrent sockets with "Just create a thread and ...".

    BTW, utilizing the performance available in a multiprocessor high performance network server with a single threaded event-loop program is not that hard. First, you start two event-loop programs. If you want to have only one TCP listen socket, instead of using an external IP load balancer, you can have one of the processes do the listening and pass accepted connections to the other process via a Unix Domain Socket. BTW, Apache does this. However, I still consider symmetric multiple processes a variation on a single multi-threaded process (or vice versa).
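
    The descriptor hand-off over a Unix domain socket can be sketched as follows. Assumptions: Python 3.9+ on a Unix system (for socket.send_fds and socket.recv_fds), and for brevity both "processes" live in one script, with a socket pair standing in for a connection returned by accept().

```python
import socket

# Unix domain socket pair linking the "listener" and the "worker".
# In a real server these two ends would live in separate processes.
listener_side, worker_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Stand-in for an accepted client connection (normally from accept()).
client, accepted = socket.socketpair()

# Listener: hand the accepted connection's descriptor to the worker.
socket.send_fds(listener_side, [b"conn"], [accepted.fileno()])
accepted.close()  # the listener no longer needs its own copy

# Worker: receive the descriptor and wrap it in a socket object again.
msg, fds, _flags, _addr = socket.recv_fds(worker_side, 1024, 1)
conn = socket.socket(fileno=fds[0])

client.sendall(b"GET / HTTP/1.0\r\n\r\n")
request = conn.recv(1024)
conn.sendall(b"handled by worker")
reply = client.recv(1024)
print(request, reply)

for s in (client, conn, listener_side, worker_side):
    s.close()
```

    The worker ends up owning a live connection it never accepted itself, which is what lets a single listen socket feed several event-loop processes.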

  21. Ace HW needs a clue on Scaling Server Performance · · Score: 3, Informative

    Ace's Hardware needs to research real servers before talking about their "scalable" servers. Their numbers are really saying that their box performs like a dog.

    For those of you interested in this topic here is a few pointers and words of wisdom.

    Server scalability and performance has three basic metrics: throughput (urls/sec), simultaneous connections, and performance while overloaded. Of course, you could add latency, but I'd argue that with the correct design latency is directly proportional to the real work you are doing; bad design inserts arbitrary waits.

    I know of an HTTP Proxy by a large ISP that does user authentication & URL authorization (re: database), header manipulation, and on-the-fly text compression at 3000 urls/sec for 2000-4000 simultaneous connections and maintains that performance under load by shedding connections, all this on a dual 1GHz Intel PIII box running an Open Source OS that starts with "L". That is a maximum of 260 Million URL/day, three orders of magnitude greater performance than Ace's Hardware stats.
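
    The daily figure follows directly from the per-second rate, assuming the proxy sustains 3000 urls/sec around the clock:

```python
SECONDS_PER_DAY = 24 * 60 * 60

urls_per_sec = 3000  # rate quoted in the text
urls_per_day = urls_per_sec * SECONDS_PER_DAY
print(f"{urls_per_day:,} URLs/day")  # 259,200,000, i.e. roughly 260 million
```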

    The simple answer to the question "How do I create a scalable fast network server?" is Event-driven GOOD & Threads BAD. Event driven network communication is two to three orders of magnitude better performing than thread/thread-pool based network communications. See Dan Kegel's C10K web page. That means you must use non-blocking IO to client sockets and databases. Once you accomplish that small feat, dynamic content just consumes CPU; with 2.8 GHz Xeon processors you have plenty of cycles for parsing HTML markup or whatever. Threads cause cache thrashing and context switching. While thread programmers don't see the cost in their code, just read the kernel code and you'll see how much work HAS TO BE DONE to switch threads. Event driven programming just takes some state lookups (array manipulation) and a callback (push some pointers onto the stack and jump to a function pointer).
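
    The "state lookup plus callback" pattern looks roughly like this. It is a minimal sketch using Python's standard selectors module; in-process socket pairs stand in for real client connections, and the uppercase echo is a placeholder for real work.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
state = {}  # per-connection state: a lookup instead of a thread stack


def on_readable(conn):
    data = conn.recv(4096)
    st = state[conn.fileno()]
    if data:
        st["bytes_seen"] += len(data)
        conn.send(data.upper())  # the "real work", kept trivial here
    else:  # peer closed its side: drop state and socket
        sel.unregister(conn)
        del state[conn.fileno()]
        conn.close()


# Two in-process socket pairs stand in for real client connections.
clients = []
for _ in range(2):
    client, server_side = socket.socketpair()
    server_side.setblocking(False)
    state[server_side.fileno()] = {"bytes_seen": 0}
    sel.register(server_side, selectors.EVENT_READ, on_readable)
    clients.append(client)

clients[0].sendall(b"hello")
clients[1].sendall(b"world")
for c in clients:
    c.shutdown(socket.SHUT_WR)  # signal end of request

# The event loop: wait for readiness, look up the callback, call it.
while state:
    for key, _mask in sel.select(timeout=1):
        key.data(key.fileobj)

replies = [c.recv(4096) for c in clients]
print(replies)
```

    One thread serves both connections; adding more connections only adds entries to the state table and the selector, not stacks to switch between.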

    Design is FAR MORE IMPORTANT than which runtime you use (execution tree, byte code, or straight assembly). I have done some very high load network programming with Perl using POE.

    Python has Twisted Python

    Java has java.nio and the brilliant event/thread hybrid library SEDA by Matt Welsh.

    I am also looking into the programming language Erlang, which builds concurrency and event driven programming into the language. Further, Erlang is used by some big telco manufacturers to great effect (high performance and claimed 99.9999999% nine-nines reliability on a big app).

  22. Misconception of CatB on The Cathedral In The Bazaar? · · Score: 5, Insightful

    I am a bit annoyed by the constant mis-reading of the Cathedral and the Bazaar. ESR was originally exploring why the Linux kernel (aka the "Bazaar") progressed so fast without any published plan or explicit patch inclusion process, and with a chaotically open mailing list. This was compared with FSF projects like gcc (aka the "Cathedral"), which had explicit project goals and schedules, an elite group of patch committers, and a closed mailing list.

    CatB is not about Open vs. Commercial (usually closed). I do grant that Commercial is nearly always closed "Cathedral"-like development. But not always. The programming language Rexx was developed in IBM in a process like the "Bazaar". This worked because the IBM mainframe community was pretty big and had good communications.

    So MySQL having GPL and Proprietary licensing policies does not make it both Cathedral and Bazaar. It has nearly always been Bazaar-like (though clearly they had fixed committers and some planning about features).

    Sheesh!

  23. Re:But separate processes ARE better. on Information for Managers - Understanding pthreads? · · Score: 2

    under Win32, thread creation is *much* faster than process creation, for a number of reasons (and COW is only part of that)

    I am aware of the init-fork model versus VMS/Win32 model of spawning processes without any inheritance.

    I would love to know what makes spawn() or CreateProcess() (I think those are the respective calls) so much slower than thread creation? I realize creating the task struct and initializing its parts is more work than thread init, but is it the same order of magnitude of slowness as on old non-COW operating systems?

  24. Re:But separate processes ARE better. on Information for Managers - Understanding pthreads? · · Score: 2

    you have to realize that sharing data structures amongst processes are often times going to require using similar algorithms

    Your point is correct. I really wanted to make the point about debugging code using shared resources, not writing the code. I meant "effective" in the sense of limiting opportunity for bugs, and quickly killing the ones that show up.

    It is easier to debug processes than threads (re: availability and maturity of tools). It is conceptually easier to focus on debugging known limited shared resources.

    A point I didn't originally make was that context switching is most important when you are switching a lot (duh!). First, you don't want to be switching a lot; that is called "load" and you are thrashing your caches whether it is threads or processes. However, context switching does naturally occur when you are message passing between two processes/threads (another reason micro-kernels are inevitably slow). I think message passing is really good, but the rate and size of messages can really blow performance with context switching, cache thrashing, and marshalling/unmarshalling. I still like it. :)

  25. Re:But separate processes ARE better. on Information for Managers - Understanding pthreads? · · Score: 3, Informative

    I must disagree with your advantages/disadvantages list. It was mostly accurate a decade ago and still true on some poorly designed operating systems (like Solaris).

    > Advantages (Thread vs. Process):
    > - Much quicker to create a thread than a process.
    "Much" is the nit I am picking here. The main difference between a process and a thread is that threads share the VM. Modern operating systems do Copy-On-Write (COW) for process creation. Hence, the diff between threads and processes is the creation of Page Table Entries (PTEs) in the process example (but that isn't as dramatically slow as copying the pages). Given that VM issues are the biggest performance issue in process creation versus thread creation, with COW you should only see a minor to moderate speed hit for process creation versus thread creation.

    > -Much quicker to switch between threads than to switch between processes.
    That is again an implementation issue, and not true in the general case. Solaris makes context switching between processes slow compared to context switching between threads. Don't think of Solaris as a good example of a performance operating system (they have other things, just not performance). An old benchmark comparison I saw (circa 1998) ran lmbench on Linux and Solaris 2.5.1 on the same UltraSPARC hardware. The bottom line is that Linux's process context switching was faster than Solaris' thread (LWP) context switching.

    > -Threads share data easily

    Urr! I know you meant that it is easy because you can just pass pointers versus using some data sharing API like SysV shared memory. But I think it is difficult to share memory between threads _effectively_. Either you lock shared resources, or you get clever with thread safe algorithms/data structures. And that stuff gets real HARD real FAST.
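
    Even the simplest shared resource needs care. Here is a minimal sketch of the "lock shared resources" option, using a shared counter; the names and thread counts are illustrative only.

```python
import threading

shared = {"count": 0}  # state visible to every thread
lock = threading.Lock()


def add_many(n: int) -> None:
    for _ in range(n):
        with lock:  # without the lock, concurrent increments can be lost
            shared["count"] += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["count"])
```

    This is the easy case: one lock, one integer. With several locks and richer data structures, ordering and deadlock questions show up quickly, which is the "real HARD real FAST" part.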

    I just realized that you were including user space threading packages in your definition of threads. Yeah they can be pretty fast.

    If you want to expand your mind beyond this simpleton's game of debating threads, check out event driven systems and co-routines. There are different tradeoffs for sure, but they are easier to debug than threads.
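
    For a taste of co-routines, here is a minimal sketch using Python generators with a hand-rolled round-robin scheduler; worker and run_round_robin are made-up names for illustration.

```python
from collections import deque


def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # explicit switch point; nothing preempts the code in between


def run_round_robin(coroutines):
    ready = deque(coroutines)
    while ready:
        co = ready.popleft()
        try:
            next(co)
            ready.append(co)  # still has work: back of the queue
        except StopIteration:
            pass  # finished: drop it


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

    Because a switch can only happen at a yield, the interleaving is the same on every run, which is much of why this style is easier to debug than preemptive threads.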