Faster Chips Are Leaving Programmers in Their Dust
mlimber writes "The New York Times is running a story about multicore computing and the efforts of Microsoft et al. to try to switch to the new paradigm: "The challenges [of parallel programming] have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips — processors with more than eight cores, possible as soon as 2010 — will transform the world of personal computing.... Engineers and computer scientists acknowledge that despite advances in recent decades, the computer industry is still lagging in its ability to write parallel programs." It mirrors what C++ guru and now Microsoft architect Herb Sutter has been saying in articles such as his "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software." Sutter is part of the C++ standards committee that is working hard to make multithreading standard in C++."
....it wants its article back.
Seriously - any developer writing modern desktop or server applications that doesn't know how to do multi-threaded programming effectively deserves to be on EI anyway. It is not that difficult.
just start a multithread process: 1 core for the program itself, the remaining 7 for the bugs...
II hhaavvee aann XX22 pprrocceessssoor? Ii ccaann ggooeess TTWWIICCEE aass ffaasstt nnooww?
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
I remember learning to write software for OS/2 back in the early '90s. Multi-threaded programming was *the* model there, and had it been more popular, it would be pretty much standard practice today, making scaling to multiple cores pretty effortless, I'd think. It's a shame that the single-threaded model became so ingrained in everything, including Linux. For an example that comes to mind, why do I need to wait for my mail program to download all headers from the IMAP server before I can compose a new message on initial startup? Same with a lot of things in Firefox.
Does anybody remember DeScribe?
Thank god that Java, C# and other piles of shit I hate do this quite intuitively and easily.
/me closes his eyes and embraces C++ for the last time before the inevitable doom
Guess I had it coming.
Bot Assisted Blogging
How many languages have multithread support already?
Java, C#(?), Fortran(?)...
I haven't programmed in those languages for some time, so I'm just curious. My current main language (Igor Pro) uses all the cores automatically. How many languages do multithreading this way? Matlab(?), Octave(?).
There is a spark in every single flame bait point.
Wirth's law, even though he's not the one who came up with it.
Some algorithms are inherently not amenable to parallelization. If you have eight cores instead of one, then the performance boost you can get can be anywhere from eight times faster to none at all.
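The "eight times faster to none at all" range is what Amdahl's law describes. A minimal sketch, assuming the work splits cleanly into a serial part and a perfectly parallel part (the function name is made up for illustration):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel.

    Assumption: the parallel part scales perfectly and there is no
    synchronization overhead, so real programs will do worse than this.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

With eight cores, a fully parallel job gets the full 8x, a fully serial job gets 1x, and a job that is half serial tops out below 2x.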
So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.
In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.
The cake is a pie
MSFT had C Omega. What did they do? Crapify it into a .NET library to use via C#, which is not the best language to use in a parallel world.
C Omega was nice and abstracted a lot of the parallelism but no they canned that project.
Look at LINQ, they crapified C# with VARIANT types and EXTENSIONS. Totally Crapping on OOP.
http://www.rense.com/general79/wdx1.htm
For now the biggest advantage of multiple cores is the ability to run multiple applications with each running at full speed. Within each application the problems get a lot more complex; using current algorithms, many tasks are not easily subdivided. With data that is inherently parallelizable it's pretty easy: each pixel on your display is relatively independent of the others and draws on a common dataset. However, the majority of other areas are not so easy. Generally, how do you take an algorithm and divide it in such a way that step A is separate from step B? Especially if the input of step B depends on the output of step A. Now that multicores are becoming common, more research will be done on coming up with a fundamentally new approach to algorithms themselves, but two cores absolutely do not mean a twofold speed improvement. Some algorithms simply cannot be divided with our current level of understanding.
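To make the contrast concrete, here is a rough sketch with two toy workloads that are my own picks, not anything from the article: brightening pixels (each one independent) and a hash chain (each round needs the previous round's output).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel: int) -> int:
    """Per-pixel work: depends only on its own input."""
    return min(pixel + 40, 255)


def brighten_image(pixels, workers=4):
    """Independent items can simply be handed out to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))


def hash_chain(seed: bytes, rounds: int) -> bytes:
    """Step B needs step A's output, so the rounds must run one after another."""
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest
```

The first function can be spread over as many workers as there are pixels; no number of cores shortens the second.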
Shh.
If you think your laptop doesn't need to run 8 programs at the same time you really should look under the hood more frequently :-)
Most of the jobs being created are not for achieving maximum speed but standards compliance. Companies want software which is easy to maintain & portable, but not necessarily the fastest. If it were still 1997 there would probably be ubiquitous implementations for SMP & vectored assembly language, but that's not the focus anymore.
The reason that parallel programming is so hard is that we're still using the same computing model that English mathematician Charles Babbage pioneered 150 years ago. It's time to change. To understand the problem, read, Parallel Programming, Math, and the Curse of the Algorithm.
...while all the clever folks have already started writing their scalable applications in something reasonable, like Erlang? No offense to anybody using C++, but I think that C++ would first profit from some serious weight reduction dieting, before they start trying to develop better concurrency concepts.
;-))
(Not that this concerns me too much, I'm going to stay with Common Lisp... yeah, I know, it might suffer from the same issues (sometimes even vague semantics) in some places, but I probably just have that strange incurable parenthetical personality disorder that suddenly broke out at the beginning of the '80s.)
Ezekiel 23:20
Just be glad I didn't upgrade to the X4 yet! :)
"...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
There is currently no working concurrency model for standard C++. You want to make an atomic access to an object? Hope and pray that you have bug free system libraries and a compiler that doesn't optimize away your locking wrappers and do inappropriate speculative stores. Apparently the next C++ standard will address it, but it seems rather foolish to start a transition to massively multithreaded code without an actual standard.
"But seriously, isn't the OS responsible for the heavy lifting with regards to task scheduling and concurrency? Oh, wait, this is Microsoft, right? Perhaps this is similar to their take on Security being somebody else's problem."
Huhhh?
My guess is that you never wrote any code.
Linux doesn't do any more heavy lifting for you than Windows does. I doubt that OS X does.
So what are you talking about?
An OS will never figure out what part of your program is going to need to be in which thread. A compiler MAY at some time do it but they are just now doing a good job with vectors.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
"processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing"
Exactly what areas of "personal computing" are requiring this horsepower? The only two that come to mind are games and encoding video. The video encoding part is already covered - that scales nicely to multiple threads, and even free encoders will use the extra cores to their full potential. That leaves gaming, which is basically proprietary. The game engine must be designed so that AI, physics, and other CPU-bound algorithms can be executed in parallel. This has already been addressed.
So this begs the question, exactly how will the average consumer benefit from an OS and software that can make optimum use of multiple cores, when the performance issues users complain about are not even CPU-bound in the first place?
Dan East
Better known as 318230.
Well then you're not remembering very well. There was some crazy statistic floating around that a Prescott at ~25 GHz would put out as much heat per cm^2 as the surface of the sun.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
I agree with some of the previous posters that have faulted programmers for "the state of today." My feeling is that the divide between knowledge of hardware and knowledge of software is far too wide. In my experience, I have witnessed many programmers who spent more time organizing the readability of their code than analyzing the actual effectiveness of it: i.e. whitespace use vs algorithm optimization (be it processor method + instruction or i/o improvement). The end result: bloaty-pooh.
I feel that by making threading a C++ standard, or at least making the threading model a predominant one, the overall "state of today" will improve in the near future simply because more programmers will be aware of it. Parallel processing really does require some training -- it cannot be adapted to every task.
Take for example a simple thumbnail-generating program (some form of which is used by everyday users). If the program is written in the traditional linear model, it will not take advantage of multiple processors or cores (unless you run multiple instances of it or otherwise manipulate it in an unplanned fashion). However, if the program uses threading, it can scale without requiring any intervention. Knowledge of hardware, and not necessarily relying on the compiler to optimize your code, just might help.
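A minimal sketch of the threaded thumbnail idea, under some loud assumptions: images are plain nested lists of pixel values, the "thumbnail" just keeps every other row and column, and both function names are made up. A real program would use an image library, and in CPython a process pool rather than threads would be needed to get real CPU parallelism.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def make_thumbnail(image, step=2):
    """Shrink a 2D grid of pixel values by keeping every `step`-th row and column."""
    return [row[::step] for row in image[::step]]


def make_thumbnails(images, step=2):
    """Hand each image to a worker; the pool size follows the core count."""
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda img: make_thumbnail(img, step), images))
```

The point is only that nothing in the calling code mentions how many cores there are; the same program spreads over two cores or eight.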
As someone who got a master's in computer science with a focus in high performance computing / parallel processing, and has taught on the subject, I'd say *yes*, it does take a bit of work to wrap one's mind around the concept of parallel processing, and to correctly write code with concurrency. But *no*, it's not really that hard. Once you get used to the idea of having computation and communication cycles over a processor geometry, it becomes only a little more difficult to write parallel code than serial.
It's kind of like when people see recursive functions for the first time. If they don't understand the base condition and inductive step, then they can easily fall into infinite loops or write bugs. Parallel code is the same way... just a bit more tricky.
Why should programmers be knocked for only using the tools they have?
It's a lot easier to develop for something if you can actually get your hands on it. When this "nifty but underutilized" sort of hardware gets out there where everyone can use it, perhaps the problem will sort itself out. Not everyone has the resources to test their ideas in this area.
If you're going to knock anyone, knock the professors for ignoring this area of research and not capturing the attention of their students who are now "substandard practitioners".
A Pirate and a Puritan look the same on a balance sheet.
Full disclosure: I am a Qt developer (user); I do not work for Trolltech.
The new Qt4.4 (due 1Q2008) has QtConcurrent, a set of classes that make multi-core processing trivial.
From the docs:
The QtConcurrent namespace provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as mutexes, read-write locks, wait conditions, or semaphores. Programs written with QtConcurrent automatically adjust the number of threads used according to the number of processor cores available. This means that applications written today will continue to scale when deployed on multi-core systems in the future.
QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications:
* QtConcurrent::map() applies a function to every item in a container, modifying the items in-place.
* QtConcurrent::mapped() is like map(), except that it returns a new container with the modifications.
* QtConcurrent::mappedReduced() is like mapped(), except that the modified results are reduced or folded into a single result.
* QtConcurrent::filter() removes all items from a container based on the result of a filter function.
* QtConcurrent::filtered() is like filter(), except that it returns a new container with the filtered results.
* QtConcurrent::filteredReduced() is like filtered(), except that the filtered results are reduced or folded into a single result.
* QtConcurrent::run() runs a function in another thread.
* QFuture represents the result of an asynchronous computation.
* QFutureIterator allows iterating through results available via QFuture.
* QFutureWatcher allows monitoring a QFuture using signals-and-slots.
* QFutureSynchronizer is a convenience class that automatically synchronizes several QFutures.
* QRunnable is an abstract class representing a runnable object.
* QThreadPool manages a pool of threads that run QRunnable objects.
This makes multi-core programming almost a no-brainer.
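For anyone who doesn't have Qt handy, the same style can be sketched with Python's standard thread pool. To be clear, this is not the QtConcurrent API: `mapped` and `filtered_reduced` below are hypothetical helpers that only imitate the idea of the functions listed above.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def mapped(func, items, workers=4):
    """Apply func to every item in parallel and return a new list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


def filtered_reduced(keep, combine, items, initial, workers=4):
    """Evaluate the filter in parallel, then fold the survivors into one value.

    Assumption: `items` is a list (it is walked twice) and the fold itself
    runs sequentially, which is a simplification of a real filter-reduce.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(keep, items))
    kept = [item for item, flag in zip(items, flags) if flag]
    return reduce(combine, kept, initial)
```

For example, `mapped(lambda x: x * x, [1, 2, 3])` squares each item, and `filtered_reduced(lambda x: x % 2 == 0, lambda a, b: a + b, [1, 2, 3, 4], 0)` sums the even ones. No mutexes or semaphores appear in the calling code, which is the selling point.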
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
What a ridiculous idea. The application developer's free lunch is over, now she needs to think concurrently? Ha, she probably has difficulty with a single thread of thought...
I think this sentence makes most sense if you imagine it being read in Comic Book Guy voice.
Toronto-area transit rider? Rate your ride.
It's not easy... especially since things sort of halted at 4 GHz. What on earth am I typing about? Well... picture this: limitations. Yes, they do exist, and sometimes it's important to think beyond what lies straight ahead (such as the next cycle speed) and think into a second, maybe even a third, dimension to expand your communication speed. For over 6 years I have been thinking of a 3D processor that cross-communicates over a diagonal matrix instead of the traditional serial and parallel communication model. Imagine this, folks: what if your code could "walk" across a matrix of 10 x 10 x 10 instead of just 8 x 8, or 64 x 64 if you want? Get the picture? Imagine that your data could communicate on a three-dimensional axis. Imagine that you had 10 stacks of cores on top of each other, and instead of just connecting the communication bus in a parallel or serial model, they could in fact communicate on a diagonal basis. This would make it possible to send commands, data, etc. in a 3D space rather than just a "queue". This would of course demand a different "mindset" of coding. Everything would have to be written from scratch, but the benefits would be tremendous: you could increase existing computational speed tenfold by increasing the communication across processor cores... maybe even more! Even by today's technology standards. OK, OK, sounds far-fetched to you, doesn't it? Well, get this: this was my invention 6 years ago (maybe even 9 years ago; I am getting older so I don't really care. I do care for freedom of information and sharing, not so much wealth, so listen on). The theory of what I just wrote here on Slashdot (which has more implications for your life in the future than you will ever be capable of comprehending... yes, I am full of myself, ain't I? Who cares? You don't know me)... the point is...
There was once a missing brick in the idea of diagonal cross-matrix computing. With yesteryear's technology it just would not be feasible to do it, but if you have ANY understanding of what I write here (yes, I am not kidding: this may change history as we know it, and I am drunk right now, and I don't want to keep a lid on it anymore)... here we go. Please think about what I just wrote, and look up Frances Hellman's lecture on magnetic materials in semiconductors, and you WILL have your 4th link in the 3-B-E-C (Base, Emitter, Collector) construction to make the Cross Matrix Processor possible. Just understand this: JoOngle invented this, Frances made it possible, and YOU read it from a drunk nobody on Slashdot.org. Now... go make it real!
What this world is coming to - is for you and me to decide.
Not that it takes massive (by today's PC standards) compute power to do decent speech recognition, but it's definitely worth dedicating a core or two.
And then with Vista, you might need one or two cores dedicated to handling UAC events ("The user tried to breathe again: Cancel or Allow?").
processors with more than eight cores, possible as soon as 2010 -- will transform the world of personal computing....
Translation:
Code will get even more inefficient / bloated and require faster hardware to do the same thing you are doing now. While I'm all for better / faster computer hardware, most if not all Jane and Joe Sixpack users never need Super Computer power to surf the net, read e-mail and watch videos.
"I bow to no man" - Riddick
Oddly enough, I just watched a presentation about this very topic, with an emphasis on Erlang's model for concurrency. The slides are available here:
http://www.algorithm.com.au/downloads/talks/Concurrency-and-Erlang-LCA2007-andrep.pdf
The presentation itself (OGG Theora video available here) included an interesting quote from Tim Sweeney, creator of the Unreal Engine: "Shared state concurrency is hopelessly intractable."
The point expounded upon in the presentation is that when you have thousands of mutable objects, say in a video game, that are updated many times per second, and each of which touches 5-10 other objects, manual synchronization is hopelessly useless. And if Tim Sweeney thinks it's an intractable problem, what hope is there for us mere mortals?
The rest of this presentation served as an introduction to the Erlang model of concurrency, wherein lightweight threads have no shared state between them. Rather, thread communication is performed by an asynchronous, nothing-shared message passing system. Erlang was created by Ericsson and has been used to create a variety of highly scalable industrial applications, as well as more familiar programs such as the ejabberd Jabber daemon.
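For those who haven't seen the style, here is a very rough imitation in Python. It is not Erlang and none of it is from the talk: the "process" is an ordinary thread, the mailbox is a standard-library queue, and all names are invented. What it tries to show is that the running total is private to one worker and everyone else can only send it messages.

```python
import queue
import threading

STOP = object()  # sentinel message telling the worker to finish


def counter_process(mailbox, replies):
    """Owns its state; the only way to affect it is to send a message."""
    total = 0  # private, never touched by any other thread
    while True:
        message = mailbox.get()
        if message is STOP:
            replies.put(total)
            return
        total += message


def run_counter(values):
    """Send each value as a message, then ask the worker for its result."""
    mailbox, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=counter_process, args=(mailbox, replies))
    worker.start()
    for value in values:
        mailbox.put(value)
    mailbox.put(STOP)
    result = replies.get()
    worker.join()
    return result
```

There is no lock anywhere in user code, because there is nothing shared to protect; the queues do the only synchronization.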
This type of concurrency really looks to be the way forward to efficient utilization of multi-core systems, and I encourage everyone to at least play with Erlang a little to gain some perspective on this style of programming.
For a stylish introduction to the language from our Swedish friends, be sure to check out Erlang: The Movie.
Real men think in parallel.
A guy who's on the C++ standards committee AND works for Microsoft.
Actually, according to the latest Dr. Dobb's, Herb is the *chair* of the ISO C++ Standards committee. (He had an article on lock hierarchies being used to avoid deadlock.)
He's really going to know what he's talking about, then.
As chair of the committee, I'd say there's a pretty fair chance that he *does*.
I really love people who bash things just because Microsoft is involved. Contrary to what seems to be a popular belief here, they have some incredibly intelligent people who are very good at what they do there.
Everything I need to know I learned by killing smart people and eating their brains.
...richie - It is a good day to code.
But I sure like my newfound ability to compile multiple source files at once and finish a 5-files-changed compile in a few seconds.
If there's anyone I hate more than stupid people, it's intellectuals.
I've always thought that, didn't realise there was a law for it. People used to optimise everything way back when, but now I suspect that most people just let the faster processor take care of things rather than trying to squeeze every nanosecond of performance out of their apps :( At least graphics are still getting faster just because they're adding more parallel processors to the chips.
which is totally what she said
This is very, very wrong. Data-set partitioning is certainly one way of achieving parallelism in programming, but it is hardly the only way- nor is it applicable to all domains, as many problems have solutions with too many inter-cell data dependencies. In addition, threads provide a wealth of benefits to application developers by allowing multiple unrelated tasks to be performed simultaneously.
There is, and will always be, overhead associated with parallelization. It may sound great to say "oh, we can farm out parts of this data set to other cores!", but that requires a lot of start-up and tear-down synchronization. It's not at all uncommon for overall performance to be improved by doing something *unrelated* at the same time, requiring less synchronization overhead.
Are threads perfect for everything? No. But calling them the second worst thing to happen to computing is, at best, disingenuous.
The ringing of the division bell has begun... -PF
The bigger problem is that which the article mentioned: That programmers don't know how to take advantage of the parallel cores. There are two major parts to this:
1) Just because a given algorithm can't be implemented multi-threaded, doesn't mean there isn't another algorithm that does the same thing that can. So part of it is learning new ways of doing old things, or inventing new ways of doing things (we haven't discovered every possible algorithm).
2) Rethinking program design so that even though a given algorithm may be a single thread, many of them can run in parallel. As a simple example, say you have a program that processes audio. Rather than having it process one track completely, then move on to the next, then mix them all when it is done, you have it process each track in a different thread at the same time, then hand off the mixing to yet another thread (most DAWs work this way).
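The audio example in point 2 can be sketched roughly like this. The assumptions are mine: tracks are lists of float samples of equal length, the per-track "processing" is just a gain change, and the function names are made up. Real DAWs are far more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_gain(track, gain=0.5):
    """Stand-in for per-track processing; touches only its own track."""
    return [sample * gain for sample in track]


def mix(tracks):
    """Sum the tracks sample by sample."""
    return [sum(samples) for samples in zip(*tracks)]


def process_and_mix(tracks, gain=0.5):
    """Process every track concurrently, then mix once all of them are done."""
    with ThreadPoolExecutor(max_workers=max(1, len(tracks))) as pool:
        futures = [pool.submit(apply_gain, track, gain) for track in tracks]
        processed = [future.result() for future in futures]
    return mix(processed)
```

Each track is still processed by a plain single-threaded routine; the parallelism comes entirely from running several of them side by side.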
Nobody is saying it is easy (or at least nobody who understands it), but that also doesn't mean it is impossible. I fully agree, there are things where each step is dependent on the previous step and there is simply no way to do two steps in parallel. However, I bet there are far fewer of those than you might first think, especially in the scheme of a whole program and not just a single algorithm in a program.
Thus programmers face the task of learning how to deal with this, in terms of program design, new algorithms, and, hopefully, better compilers to help. It seems as though multi-core is the way of the future, at least for a while, so just saying "Well we can't make this parallel," may not be an option.
The fact is that programming by and large has gotten lazy, shiftless and sloppy over time, not better or faster. Programmers really did rely on processing and memory architectures getting faster to overcome their coding bottlenecks. The words "optimized code" have little or no significance in today's programming shops because of budgets. Because of the push to get stuff out the door as quickly as possible, corners are cut all over the place.
There once was a time when debugging was part of your job. Now someone else does that, and at most the better coders do some unit testing to ensure their code snippet does what it is supposed to. There generally isn't any "standard" with regard to processes, except in some houses that follow *recommended coding guidelines*, but these are few and far between. Old-school coders had a process in mind to fit a project as a whole and could see the end running program. Many times now, you are asked to code an algorithm without any regard for or concept of how it might be used. A lot of strange stuff is going on out there in the business world with this!
If there is a fundamental change in the base for C++, et al., it may well have a detrimental effect on the employment market, as there will be many who cannot conceptualize multi-threading methodologies, much less model existing processing in this paradigm, and who will leave the market.
I left the programming markets because of the clash of bean counters vs quality, and maybe this will have a telling change in that curve. I always did enjoy some coding over the years and maybe this would make an interesting re-introduction. I have personally not coded in a multi-threading project but have the concepts down. Might be fun!
All content in this message is copyright (c) 2008. All rights reserved. RIAA is prohibited here.
For the very first time, CPUs with 8 or more cores will enable Windows users, exclusively, to run a whole extensive botnet spitting out spam... all of it running on a single multicore CPU.
thank you, Microsoft !
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
No. Read the article and/or the thread. This isn't about interfacing with the hardware; it's about how you designed your algorithm in the first place. All that MS could do in the OS, they have pretty much done already. Now it's the application developer's job to work out how to split their program up into little pieces that can all run at once.
That said, there is some mileage in compilers figuring out the parallelism and doing it themselves; but I've seen some attempts at it and they really didn't help all that much, because the real parallelism win is generally at a coarser level than a compiler can easily pick out (i.e. it's beyond the bounds of one function). Compiler technology is going to have to change fairly radically (with a lot of intelligence moved to the linker) if we are really going to see a significant payoff from automatic compiler-driven parallelism. I think it could happen, but it's probably a way off yet.
Methinks you have no idea what you're talking about. Perhaps you should google for Herb Sutter?
It's the OS Stupid, Not Parallel Programming !!!
Just because the latest and greatest release of a New OS by a certain vendor is dog slow doesn't mean it's time to start blaming Programmers and calling them LAME.
There are several good Operating Systems out there that handle multiple threads on multi-core machines just fine. They even do this in the basic scripting languages native to those Operating Systems, and many have been doing so since the '70s.
There are techniques out there that handle work just fine in Parallel Program/Core Environments. On a side note, Data Encapsulated Object Oriented techniques are not always the best way to handle performance issues. A look back in time has several answers to this question and more. (Lest We Forget)
--- Old engineers never die, they just build away. (By deweycheetham) ---
Well, I'd guess he's thinking about the classic scheduling and concurrency problem of many processes trying to run on few cores. While it's surely interesting, and OSes handle it slightly differently, it's a known problem dating back at least to the first multi-user Unix machines. Now you have the problem of one big task and many cores, which isn't really new (you had Beowulf clusters, SMP machines etc. before) and which has always required the programmer to specifically design for the work to be broken down and reassembled. The OS will never do that for you. In the general case it's a really ugly problem anyway: at which point is the overhead so large that you'd rather do it locally? If I'm sorting a list of ten items it may not make sense at all. A million and I might split it across all cores. A billion and I'd send it off to a server farm, if I have the bandwidth. Asking programmers to answer conditionals like that (and the answers may easily change at any moment) is just begging for trouble. But no, the OS doesn't have a clue either.
Live today, because you never know what tomorrow brings
Two words: Clippy Core
In early CD players there was a shift away from 16-, 24- and 48-bit DACs when someone came up with the idea of using a 1-bit DAC running 16 times as fast. The same thing is happening now with cores on a CPU. Somewhere, someone is going to think of a CPU with 4,096, 16,384, and yes, even 32,768 discrete 1-bit cores (!!). There WILL be thousands of cores in a single CPU within the next ten years, but it will require a rethink of the kernel scheduler, and of programmers' habits, to take advantage of it.
So, what you're saying is the sun is a Beowulf cluster of 25 GHz Pentiums?
So the OS is supposed to magically know how to poke its fingers into a process and divide the workload of that process among multiple cores/CPUs without incurring performance loss or data corruption? If you know how, many people would sure be happy to know your secret.
If there's anyone I hate more than stupid people, it's intellectuals.
I have little hope for the C++ standards committee. It's dominated by people who think really l33t templates are really cool. Everything has to be a template feature. They're fooling around with a proposal for declaring variables atomic through something like atomic<int> n; This allows really l33t programmers to write really l33t code using really l33t lockless programming. But without the proofs of correctness needed to make that actually work reliably.
It's also long been Stroustrup's position that concurrency is a library problem. As long as the OS provides threads and locking, it's not a language problem. This isn't good enough.
The fundamental problem is that, as currently defined, a C++ compiler has no idea which variables are shared between threads, and which are never shared. The compiler has no notion of critical sections. Fixing this requires some fundamental changes to the language. It's known what to do; Modula, Ada, and Java all have synchronization and isolation built into the language. But there's nothing like that in C++, and the designers of C++ don't want to admit their mistakes.
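A small sketch of what "the programmer, not the language, knows what is shared" means in practice. Python is used here only because it is short; as in the C++ case described above, nothing in the language marks `_value` as shared, and the lock around it is pure convention. The class and function names are invented.

```python
import threading


class SharedCounter:
    """A value touched by several threads, guarded by an explicit lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()  # the critical section is our job to define

    def increment(self):
        with self._lock:  # read-modify-write must not interleave
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def count_in_parallel(threads=4, increments=10_000):
    """Have several threads hammer one counter and return the final value."""
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

Forget the `with self._lock:` line and nothing warns you: the code still runs, and updates can be silently lost. That absence of any check is the complaint.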
It's not just a C++ problem. Python has a similar issue. Python as a language doesn't deal with concurrency adequately. The main implementation, CPython, has a "global interpreter lock" that slows the thing down to single-CPU speed.
I didn't notice myself knocking anybody for using any tools. It's just that even now, C++ seems to be complicated enough with quite a lot of dark places even without trying to adopt some concurrency model. That might qualify as knocking the C++ committee :-), but perhaps I'm not the only one to do so. I have a friend who, in the past few years, publicly displayed quite a lot of C++ zealotry. As he was doing quite a lot of hard-core programming in almost anything you can imagine for the past twenty years or so, his love for C++ stunningly contrasts with the insults I've recently heard from him, addressed to the C++ committee folks. I would very much like to see the qualities of C++ preserved and its pitfalls removed or compensated, but I fear this need not necessarily happen in the near future.
Ezekiel 23:20
I agree that threads are a crappy way of doing parallel processing. It's better to just extend C/C++ with some message passing interface. That way if a programmer actually knows parallel processing, they can incorporate it into their code in a portable manner. And people that don't know any parallel programming can still write an inefficient version of the same thing. For all the hype, parallel programming isn't THAT difficult. After all, something like VHDL or Verilog is basically just parallel programming.
I know that languages like Erlang and Haskell are better for concurrent programming than more traditional languages. However, so far they have not been as popular as more traditional languages.
Will the new world of concurrency cause a shift in language popularity? Or will traditional languages remain more popular, perhaps with some enhancements? C++ is gaining concurrency enhancements; C++, Python, and many other languages work well with map/reduce systems like Google MapReduce; and even with no enhancements to the language, you can decompose larger systems into multiple threads or multiple processes to better harness concurrency.
If you know Haskell and Erlang, please comment: do those languages bring enough power or convenience for concurrency that they will rise in popularity? People grow very attached to their familiar languages and tools; to displace the entrenched languages, alternative languages need to not just be better, they need to be a lot better.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
The OS needs to be non-blocking. I have multi-core servers running 2000/2003/XP on which I've written programs that can take advantage of the multiple cores. I'm not doing anything fancy: I have a parent job with multiple child jobs. The parent is responsible for giving the children work. The children all run independently of each other. Thus, by the nature of Windows, the load is spread across the cores.
The problem I have is that I use some Windows APIs to enumerate things such as all the machines in the network, shares on a remote machine, netbios info, etc. I've found that many of these calls are blocking. So when I attempt one of these calls, the entire machine slows down to a crawl and APIs in other programs will not complete until the call has finished doing its thing. For example, one of these calls prevents you from unlocking the PC until it's complete.
I'm not an expert in this topic; these are empirical observations on my part. Perhaps Vista & 2008 server will resolve this.
... but I'm going to reply anyway.
Software that needs to be multithreaded today easily can be, and usually already is. There is no issue in the industry.
Please move along, nothing to see here.
The heat transfer problem has not been solved for thick, 3D processors with millions of transistors.
There's an alternative to multi-threading. Will we see MPI used in more than just supercomputing applications? The message-passing vs. shared memory arguments are as old as the hills in high-end computing, with most (but not all) applications using message passing. As the memory hierarchy on personal computers looks more and more like NUMA, message passing starts to look really attractive, even when the parallelism is across a single node.
Instead of developing single-core chips with better performance, chip makers are now making multicore machines and expecting developers to provide the extra performance.
Without the work of developers, multi-core chips will be like the extra transistors in transistor radios in the 1960s: good for marketing but functionally useless.
People used to optimise everything way back when, but now I suspect that most people just let the faster processor take care of things rather than trying to squeeze every nanosecond of performance out of their apps :(
Thank God for that.
I'm glad that coders today can use high-level tools and languages without having to spend half their time on performance tweaking.
Take as an example a game like Halo (or Guitar Hero, or World of Warcraft, or whatever your favorite modern game is). If the developers of these titles had to exercise the same amount of care in optimization as developers did on the Atari 2600 -- where often, the author had to unroll simple countdown loops because they could not afford the overhead of DEC and BEQ instructions -- yes, the game kernel would probably run twice as fast. But on the other hand, each game would take a decade to complete!
I'd happily trade some (but not all) efficiency in program execution for an increase in efficiency in program authoring. And that's exactly what we've done.
Coarse-grained and/or multi-threading can help a program's design and performance if done well. But many people simply run one or two apps at a time (or, more commonly, switch between them) and all the cores in the world aren't really going to help things that much for most things. Gaming, certain computations (SETI, etc) perhaps, but not the day-to-day applications.
As a point of reference, I did some C and FORTRAN scientific research programming on an Intel box with 1024 processors (running Unix) back in the day (early 80s). A little skill, practice and a good compiler were the tools of the trade. Funny, even with all those processors, VI didn't run any faster :-)
It must have been something you assimilated. . . .
Fine grained (spread your for loops across processors) and coarse grained parallelism (different independent actors exchanging messages and working on tasks separately) are two completely different approaches, though they generally use the same mechanisms. Everybody always focuses on the fine grained and how that affects algorithms, but I personally believe that personal computing yields more benefit from coarse grained parallelism, where nothing in your program blocks because every task that it's performing is independent. Having modal, sequential operations that you have to wait for your computer to perform before you get control back for an unrelated task in the same program is absolutely absurd in this day and age.
The few instances where a personal application does spend significant time in a single task (media manipulation, mostly) could use fine grained parallelism, but that is not the common case. Stop whining about algorithm parallelism and get your system/application design broken out into independent components and tasks properly.
Besides, as others have said, neither is particularly difficult to do properly. It's when you try to hack in threaded shared access without having properly contained the mutable data that you shoot yourself in the foot.
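One way to picture the coarse-grained style described above: a slow task runs in its own thread and hands results back as messages, so the foreground loop never waits on it and no mutable data is shared directly. The names (`fetch_headers`, `run_ui`) and the simulated delay are invented for this sketch; a real GUI toolkit would supply its own event loop.

```python
import queue
import threading
import time


def fetch_headers(results, count):
    """Background task: pretend to download `count` mail headers slowly."""
    for i in range(count):
        time.sleep(0.01)              # stands in for network latency
        results.put(f"header {i}")    # hand each result over as a message
    results.put(None)                 # sentinel: nothing more to come


def run_ui(count=5):
    """Foreground loop: stays free to do other work while headers arrive."""
    results = queue.Queue()
    worker = threading.Thread(target=fetch_headers, args=(results, count))
    worker.start()

    headers, ticks = [], 0
    while True:
        try:
            item = results.get(timeout=0.001)   # brief poll, never a long block
        except queue.Empty:
            ticks += 1                # the UI could repaint or take input here
            continue
        if item is None:
            break
        headers.append(item)
    worker.join()
    return headers, ticks


if __name__ == "__main__":
    headers, ticks = run_ui()
    print(headers, "idle ticks:", ticks)
```

The only object both threads touch is the queue, which is the "properly contained" part: the list of headers belongs to the foreground loop alone.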
Wait a second! Have you ever coded in C++? Even if threads are not in the standard library, you have boost, you have Intel's TBB (Threading Building Blocks), besides the native threading library. Do you trust your library in Java? What if the VM screws everything up. As for the compiler "optimizing" everything, there is a little keyword: volatile, that just tells the compiler not to optimize memory access for that variable. I think the real problem is working in a new programming paradigm: have a problem with sharing variables? Code everything using pure functions.
I agree with MOBE2001. Back when I was a sysop, I remember even Microsoft's techs saying that multi-threading just doesn't assemble the code that was run on separate cores very efficiently; it was all due to the re-assembly algorithm in the OS.
It is arguable that the time-spent-on measurement is something of a red herring. Why might a programmer spend a lot of time on whitespace? The predominant reason is that it has to be readable. Economically it's a bad idea to spend your time on reducing your consumption of something plentiful (CPU cycles) at the cost of something that is not plentiful (programmer-hours, which, btw = $$$).
With most projects - by which I mean most business projects, because that is where most development occurs; in a business - looking at 'simple' programs is not as clear an indicator of what path you should take as you might like. There are certainly tradeoffs - writing nice, elegant, compact code that utilizes multithreading may, in fact, improve your bottom line. But it's currently a riskier proposition than I think good business sense indicates. The first and only measure is functionality. Only where it can improve functionality will these concerns be addressed.
That is until the research institutions can pathfind to a better way of doing things. But it's not as simple as deciding multithreading is the wave of the future - even if it is.
[Ego]out
Nothing is more embarrassing than doing math in public, but here's a go:
Surface area of the Sun:
Diameter = 1,390,000 km. 4 * pi * r ^2 = 6,069,871,166,000.839 km^2
That's roughly 6*10^12 km^2, which is 6*10^22 cm^2.
Total energy output of the Sun: "386 billion billion megawatts" (per second)
That is 386*10^24 W.
386*10^24 W / 6*10^22 cm^2 = 38600 W / 6 cm^2 = (roughly) 6433 W/cm^2
Prescott CPU = 112 mm^2 = 1.12 cm^2, energy output = 103 W (TDP @ 3.4GHz).
103 W / 1.12 cm^2 = 91.9643 W/cm^2 (@ 3.4GHz) * (25/3.4) = 676.208 W/cm^2 (@25GHz)
So, _IF_ my math is right (approx .01% chance of that), then a 25GHz Prescott would only output 1/10th the power of the Sun on a /cm^2 basis.
(Note: Lameness filter sucks.)
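The arithmetic in this comment is easy to re-run. The sketch below uses only the figures quoted in the comment and keeps its assumption that power density scales linearly with clock speed; the constant and function names are made up here.

```python
import math

SUN_DIAMETER_KM = 1_390_000
SUN_POWER_W = 386e24              # "386 billion billion megawatts"
PRESCOTT_AREA_CM2 = 1.12          # 112 mm^2
PRESCOTT_TDP_W = 103              # quoted TDP at 3.4 GHz
CM2_PER_KM2 = 1e10                # (1e5 cm per km) squared


def sun_flux_w_per_cm2():
    """Power leaving each square centimetre of the Sun's surface."""
    radius_km = SUN_DIAMETER_KM / 2
    area_cm2 = 4 * math.pi * radius_km ** 2 * CM2_PER_KM2
    return SUN_POWER_W / area_cm2


def prescott_flux_w_per_cm2(clock_ghz):
    """Power density scaled linearly with clock, as the comment assumes."""
    return PRESCOTT_TDP_W / PRESCOTT_AREA_CM2 * (clock_ghz / 3.4)


if __name__ == "__main__":
    sun = sun_flux_w_per_cm2()
    chip = prescott_flux_w_per_cm2(25)
    print(f"Sun: {sun:.0f} W/cm^2, chip at 25 GHz: {chip:.0f} W/cm^2, ratio {chip / sun:.3f}")
```

Without rounding the surface area to 6*10^22 cm^2, the Sun's figure comes out near 6360 W/cm^2 rather than 6433, and the ratio stays close to one tenth.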
my current major language (Igor pro) will use all the cores automatically, and how many languages do multithread this way? Matlab(?), Octave(?)
LabVIEW, by its very nature [which is graphical - based on "G" - the "Graphical" programming language] is kinda/sorta topologically self-threading: If a piece of LabVIEW code sits off in its own connected component, then [more or less] it gets its own thread.
Of course, all your ".h" & ".c" [or ".cc"] files [& their innards] might very well break down into little distinct connected components which are ripe for running their own threads, it's just that you can't - unless you're some sort of a super genius - you can't readily visualize all those connected components as they exist in your code.
Now you and your colleagues could try to anticipate the connected components a priori, during the "planning" phase: You could draw huge pictures on the dry-erase board, and everyone could yell and scream at each other about the topological structure which the code should ultimately embody, and then everyone would have to promise - Scout's Honor! - that they would stick to the blueprint [which they might very well resent as having been shoved down their throats by some pointed-headed suit who didn't have any clue what he was talking about] - but the beauty of LabVIEW is that THE CODE IS THE BLUEPRINT [which I think is a point that Jack Reeves used to make].
There's actually a Slashdotter, MOBE2001, who maintains a blog called Rebel Science News, who's got some pretty interesting ideas here - he seems to be leaning towards a graphical approach to this [realizing that the fundamental nature of the problem tends to be topological, rather than anything which we (YET!) would recognize as semantic], but his program is very, very ambitious [if I had a couple of spare lifetimes, I might just throw one in that general direction].
Another line of thought which everyone should keep an eye on is the discipline of Petri nets - it's kinduva big graphical/topological approach to state machines, which [if someone were to put the necessary elbow grease into it] might prove to be very useful in squeezing the most bang for the buck out of these massively-multicore CPU's.
Most "personal computing" software can only *really* benefit from two or maybe three threads. One for the GUI, and another for any task that can take more than ~100ms. This is already common practise in some languages (eg. Java), and it should become standard. But there's no reason for your average mail reader to need 8 threads.
I could see a tabbed browser using a thread for each tab, but then, you're really running multiple instances, and using your program as a "tabbed window manager" - which is more the exception than the rule.
"Knowledge is the only instrument of production that is not subject to diminishing returns" -Journal of Political Econom
No need for parallel computing all cores are already used.
:-)
Core one: For the OS
Core two: Anti-virus
Core three: Anti-Spyware / Windows Defender
Core four: Firewall
Core five: Windows update notifications and installations
Core six: Windows Genuine advantage checks
Core seven: Eye Candy (Vista) with XP you get a bonus CPU
Core eight: What ever the user wants to run, except when you get a virus, then
you have to share it with the SPAM bot.
Guess we will be waiting for 16 core CPU's.
Oh and don't start me on memory requirements
Threads aren't harmful - what an asinine thing to say. Bad programming is harmful. BUGS are harmful. The article you reference is about map reduce, which is actually more about distributed computing than threads. If you're really so concerned about concurrency programming errors, just use executors in java. Even a monkey could do it. Want to control threads manually but you're having problems debugging? Check out the multi-threaded testing framework: http://code.google.com/p/multithreadedtc/
Oh, damned, ignore my previous post - I messed up the threads. (How fitting for this article... ;-))
Ezekiel 23:20
When I first started programming, in BASIC on an Apple ][ (not IIe), I remember being baffled by the fact that the computer did not operate with multiple concurrent streams. To me, this seemed the point of making something that was "more than a calculator," and the only way we would be able to do the really interesting stuff with it.
When I first started writing object-oriented code, I was somewhat dismayed to find that OO was an extension to the same ol' linear programming. It seemed to me that objects should be able to exist as if alive and react freely, but really, they were just a fancy interface to the linear runtime. Color me disappointed yet again.
It's an important paradigm shift to recognize parallel computing. Maybe when the world realizes the importance of parallel computing, and parallel thinking, we'll have that singularity that some writers talk about. People will no longer think in such basic terms and be so ignorant of context and timing. That in itself must be nice.
Sutter's article hits home with all of this. His conclusion is that efficient programming, and elegant programming that takes advantage of, not conforms to, the parallel model is the future. Judging by the chips I see on the market today, he was right, 2.5 years ago. He will continue to be right. The question is whether programmers step up to this challenge, and see it as being as fun as I think it will be.
technical writing / development
Higher level constructs are great, but I also wonder, why is everyone so hot to use multiple threads? For many problems, multiple processes work fine, allowing you to use your multiple cores with bulletproof separation of memory spaces (a basic advance we've had since the '60s).
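As a small illustration of the multi-process route, the sketch below splits a sum into chunks and gives each chunk to a separate interpreter process, collecting the answers over pipes. The helper names and the toy workload are invented for the example; each child has its own address space, so nothing is shared except the text it prints.

```python
import subprocess
import sys

# Code each child process runs: sum the squares in [lo, hi) and print the result.
CHILD_CODE = (
    "import sys; lo, hi = map(int, sys.argv[1:3]); "
    "print(sum(n * n for n in range(lo, hi)))"
)


def sum_squares_in_processes(limit, workers=4):
    """Split range(limit) into chunks and give each chunk to its own OS process."""
    step = max(1, -(-limit // workers))   # ceiling division, never zero
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, str(lo), str(min(lo + step, limit))],
            stdout=subprocess.PIPE,
            text=True,
        )
        for lo in range(0, limit, step)
    ]
    # All children are already running; now collect each one's answer from its pipe.
    total = 0
    for child in children:
        out, _ = child.communicate()
        total += int(out)
    return total


if __name__ == "__main__":
    print(sum_squares_in_processes(1_000_000))
```

Starting a process costs far more than starting a thread, so this only pays off when each chunk is large compared with the startup cost.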
"Not an actor, but he plays one on TV."
But Windows doesn't provide any less help with multi-threaded programming than Linux.
An OS could provide sort libraries and implement multi threaded solutions but I haven't seen Linux provide them anymore than Windows does.
I will say this much. If any effort is put into the average Linux distro to improve multi-threaded support please let it be in X-Windows. As far as I know X is still single threaded.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
That's exactly what it sounds like. Parallel processors are good for some things, but just an excuse for selling more processors for most.
God spoke to me.
I think you're thinking along the right track.... but people keep forgetting that we already DO run quite a few applications at the same time on the typical PC.
Users are often quick to say "I can only really work with one program at a time anyway. Anything else that's open is just sitting idle, minimized to my taskbar."
But that thinking doesn't consider anti-virus/anti-spyware software running in the background, or perhaps an anti-spam email filter. It doesn't account for system maintenance tasks that might be kicked off in the background (disk defragmenting, etc.).
It also neglects to consider a future where multiple PCs on a LAN might share their available processor time with an app that needs them, running on just one of those multiple PCs.
In the typical office environment, most PCs are doing little more than running someone's word processor or spreadsheet, but there may well be an administrator trying to compact a large database, compile an application, or even transcode some video - who would stand to benefit if the OS on all those systems allowed this kind of functionality.
(Apple's xgrid in OS X has promise in this area ... but few programs seem to even be aware of it, and it often requires an OS X Server in the mix to co-ordinate the whole process. That makes it even more of a "niche" function.)
All those viruses and adware will chew up those extra cores in no time.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
I was pulling the speed number out my ass, but I do recall such a comparison. Besides, I'll submit that even 1/10th the surface temp of the sun per cm^2 is enough to melt a mobo...
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
Windows 7 will be out then.
throw new NoSignatureException();
> This makes multi-core programming almost a no-brainer.
What uttermost and complete crap.
We are nowhere near multi-core programming being a no-brainer.
Here's what we know right now:
1. We know how to manually create threads to perform specialized tasks. This comes nowhere near the ideal which is loading all the CPUs roughly the same, taking into account CPU affinity for some tasks in order to keep the caches warm and work well on NUMA architectures.
2. We know how to exploit data parallelism in those cases where we have large quantities of data.
Other than that we are still trying to find any paradigm that would make arbitrary systems scale well on a massive number of cores. Some of them are based on pi calculus, some on join calculus, some on more practical foundations.
At this point some things are obvious:
1. CPU threads are useless except as part of the foundation on which other abstractions are built. All really scalable systems use either lightweight threads/processes or smaller tasks which are scheduled in user space.
2. Native stacks are evil.
3. Thread affinity, as implemented by Windows USER and GDI modules and STAs is evil. Don't know how this works under Linux as I never did any GUI work there but I assume many components have similar limitations.
4. Any solution that exposes locks to the user instead of hiding them in the infrastructure is evil. Locks are not composable and are very error-prone in real-world scenarios.
Dejan
I'm not a programmer by any means, but wouldn't a solution much like mesh networking or P2P software be effective in parallel programming? Again I'm not a programmer...
http://www.ni.com/multicore/
All day long... Our CMS does a lot of stuff like: cat file.txt |grep -e garbage |sed -e s/^foo/bar/g |cut -f1-3 -d\&
See, it is really easy to add more threads by just piping them together. Our server needs at least four processors to perform adequately with 10 web visitors, because of our excellent programming practices.
If like me your idea of Personal Computing is getting news on the Internet and making insightful comments, you should welcome eight processor computers which would result in four times more insight than this message from a two core processor.
my insights may be modded Funny, but at least some of my jokes are modded Insightful
The problem with OS threads is that the things that benefit the most from parallel processing are the finest grained, but OS threads are only usable for the coarsest-grained problems. So, OS threads are generally only useful for concurrency and not for parallel execution. That is, OS threads can let you do two mostly different 'tasks' at the same time (repainting the GUI while the data is being processed), but are really bad at actually making a single task run faster.
You can, sometimes, with incredible effort make os threads run one task faster. But that doesn't change the fact that they are a really really bad solution for this.
I don't think that multicore processors will help me once the antivirus starts scanning my hard disk, or the system does a disk defrag. Most of the slowdowns in those situations are due to waiting for access to the hard drive. I think the same thing is true in about 99% of cases where I'm waiting on my computer. I frequently transcode video and don't notice any problems when trying to run my word processor. That's because the transcoding is run in low priority mode. It doesn't take much longer because most of the time my PC is idle, even when I'm using it, but the rest of the computer is much more responsive because it has priority for the processor available for that .25 seconds when it actually needs it.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
... on your desktop?
In my experience, unless you are a developer who needs to build a large application on their desktop, or you are playing some game that requires computation of some sort that cannot be offloaded to a graphics processor, or you are trying to invert the universe (or a small part of it), a desktop with a > 2GHz processor is a waste of money.
I'm sure there is the rare exception, but I've never seen an office desktop at 100% CPU utilization which is doing something legitimate. You'd be better off sinking your money into maxing the memory out, and getting a good malware scrubber package to ensure your machine utilizes its resources properly.
For a server, it's a different story altogether. In many cases, work that runs on servers tends to be quite parallelizable already - for example each request that is accepted by a shopping cart application usually is serviced by a different thread or process (which can be scheduled by the OS to run on a processor that is not doing meaningful work). Scaling involves adding another box to the server farm, and load balancing it, or adding more CPUs to a DB server's backplane. These are scalable as usage increases, where each individual request is not a CPU hog.
I think that the CPU problem really pertains to the problem space where any one of those server threads is always requiring the CPU (versus doing I/O or waiting for another process, human intervention on a HID, etc) - the space of parallelizable CPU-bound and non-parallelizable CPU-bound type work.
Things like weather forecasting (solving partial differential equations), with processes that can be broken up into discrete areas and then recombined, are parallelizable via reprogramming. Others are, but only to an extent. For example, computing something like the Mandelbrot set is parallelizable over each coordinate. You can kick off a bunch of threads to work through the points, one thread for each processor. But at each point, the iterative process produces intermediate results such that each one depends on the one before it. Those individual point computations are not parallelizable.
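The Mandelbrot case can be sketched as follows: the loop inside `escape_time` has to run in order because every step needs the previous value of `z`, while the points themselves are independent and can be handed to a pool. The grid size, iteration cap, and function names are arbitrary choices for this example, and the thread pool only shows the structure; on standard CPython a process pool would be needed to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def escape_time(c, max_iter=100):
    """Iterate z -> z*z + c; each step needs the previous z, so this loop is serial."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render(width=40, height=20, workers=4):
    """Compute every grid point independently; points can be farmed out freely."""
    points = [
        complex(-2.0 + 3.0 * x / width, -1.0 + 2.0 * y / height)
        for y in range(height)
        for x in range(width)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(escape_time, points))
    return [counts[row * width:(row + 1) * width] for row in range(height)]


if __name__ == "__main__":
    for row in render():
        print("".join("#" if n == 100 else "." for n in row))
```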
Of course, if you're running spyware or adware on your server farm, you have other problems to solve too.
"640K ought to be good enough for anyone"
we all make fun of bill gates for that, but you are essentially saying the same thing
there are plenty of reasons that joe blow will need 32 cores in 2030. why? i'm not so arrogant as you as to pretend to know what he may be doing. but at the same time, i don't think you can listen to an mp3 and load webpages with flash and preview your 10,000 personal jpegs with 640K. so i do know that more power will be used, somehow
in other words, don't be a fool and think that we have reached anywhere near the limits of personal processing power usage. whatever the limit is, people will find constructive ways to fill that power. what those constructive ways are, i don't know, but whatever they are, people will think of them as vital and essential, like they do with their mp3 collection today, and with which they wouldn't have the faintest clue about in 1986
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Interesting historical synthesis. Too bad you were modded down for the link. Marketing is, after all, what MS does best, and regardless of technical merit, or legality, they *win*
This article, and the ensuing discussion about whether or not massively parallel computing is even necessary for general-purpose computing, reminded me of a short Sci-Fi story. I don't remember the name, and I couldn't find it by googling. The story is about serial beings that, because they "think" serially, can transmit themselves across the universe easily. They visit earth, and find the massively parallel structure of animal brains to be very wasteful. However they find promise in our computing devices (of course the story was written before the new multi-core craze).
Anyone know the name of this story?
Dan East
Better known as 318230.
There are a zillion thread classes for C++, I've written three myself. Just pick one and go with it, or just use pthreads directly, if you want/need to use C++. You don't have to wait for the next standard. Java's threads suck for a lot of things, particularly if you need lightweight threads. If you're using Mac OS X, the Cocoa NSThread class rocks, it's fast and easy. I don't really get what the big deal is here. People have been writing multithreaded code for a couple of decades.
-bill gates
to which we all laugh and guffaw at here on slashdot
and here you are, committing the same sin of poor foresight
personal processing power use will increase, and be filled by joe blow computer user, in myriad vital ways. i am not arrogant enough to prognosticate what those uses might be. but i don't think bill gates thought about a mp3 player running while a flash webpage ran while you watched the preview for the upcoming batman film (which, strangely enough, is true of 1987 and 2007: here comes a new batman movie. heh)
the point is, predicting that we've maxed out on memory usage/ processing power usage/ hard drive capacity usage has been a loser's game for decades now. so don't play that game
"Joe Sixpack users never need Super Computer power to surf the net"
oh, like you know what kind of apps over what kind of protocol will be delivered over fibre-to-premises in the year 2030. he will need a super computer. just like he uses a super computer today, in 1987 terminology, to look at football scores, and it seems like a slow computer (whatever the OS)
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
I wonder how he'd like it if every GUI program he ran was only a UI that then spawned other processes which would send UI updates via IPC. Things would be an absolute mess. Threads, if only for this one task (a separate thread to ensure a responsive GUI), are absolutely necessary.
You can say that threads are over-used by programmers who don't understand the reasons why you'd use a separate process instead, but I don't think you can say that threads don't have areas in programming where they're almost essential.
Good calc - aside from the questionable linearity of heat vs clock speed. However, I have to nitpick one statement you made:
Total energy output of the Sun: "386 billion billion megawatts" (per second)
The power output is in megawatts - period. Not megawatts per second. The energy output would be 386 billion billion megajoules per second, though. Energy is not equal to power.
But you used the figure for power correctly.
Microsoft funds some well-respected research labs. Why would they work elsewhere when they're getting paid to do pure research?
Erlang! Erlang has been around for 10 years and is designed for the current generation of multi-core/multi-cpu machines. It has concurrency built in as a core part of the language. If you want to get the most out of your multi-core machine, start learning Erlang today!
agree. Multi threading is easy in idea and hard in practice. You get logical errors and sometimes those errors don't occur all the time because of the way the OS/chip handles threads.
If I remember right, the power consumed by the clock speed is something like x * I^2, where I is current. x is changeable, but since we're assuming the same Prescott core, it wouldn't matter.
If my math and everything is right, the Prescott would take up 7.35 times more power than you predicted, probably within the margin of error given numerous people's fuzzy remembering of math and the original poster not being sure about 25 GHz.
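The 7.35 here appears to be 25/3.4 applied a second time. Taking the comment's premise at face value (power growing with the square of clock speed, which is this poster's recollection and not a figure established elsewhere in the thread), the earlier per-area estimate would work out roughly as:

```latex
\[
\frac{25}{3.4} \approx 7.35, \qquad
91.96\ \mathrm{W/cm^2} \times \left(\frac{25}{3.4}\right)^{2} \approx 4972\ \mathrm{W/cm^2},
\qquad
\frac{4972}{6433} \approx 0.77
\]
```

Under that assumption the chip would sit at about three quarters of the Sun's 6433 W/cm^2 instead of one tenth.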
Regardless, there are several fundamental problems with increasing the speed; heat is just one. The fact that the speed of light is only so fast means that eventually the signal can only travel so far before the next clock cycle, so the surface area of the chip has to get increasingly smaller, complicating heat problems.
Multi-core, specialized processing units, or a different technology is the only way we'd get around the fact that clock speed gets harder and harder to push the higher it gets.
I am not an expert. If I am misled in something, please correct me.
It depends on the work you're doing. If you're working on a common set of data, threads are far more efficient. If there's very little in common, processes give you a bit more safety and more memory space.
In general, Windows programs tend to use threads because starting processes is expensive. On Linux, starting processes is trivial, so it gets used a lot more often. There are exceptions however, e.g. Microsoft Visual C++ 2005 spawns multiple processes to do parallel compilation.
Wait a second! Have you ever coded in C++? Even if threads are not in the standard library, you have boost, you have Intel's TBB (Threading Building Blocks), besides the native threading library. Do you trust your library in Java? What if the VM screws everything up.
C++ Compilers currently can reorder instructions at will, for scheduling & optimization purposes, and that can affect concurrency unless you use memory barriers.
Which are hacky, nasty, and hard to grok.
Coming soon - pyrogyra
Yeah, my mistake for misquoting the website I sto^H^H^H borrowed the information from. It listed "...joules (386 billion billion megawatts) per second".
Sloppy job on my part.
I don't know where that statistic came from, but there are all sorts of devices (stoves, soldering irons, etc) that are designed explicitly for turning electricity into heat and don't come anywhere near that.
Hope and pray? eh.
The issue with a lack of threading primitives in C++ is that the calls are not standardised, so each implementation can do it its own way, usually by doing it the OS way (eg Windows' CreateThread() call). Once C++ gets standard threading, then all C++ compilers will provide calls to access these facilities in a common way, and portable code becomes easier. Until then, you either use a library or code for that particular OS, which is what people have been doing for many, many years.
It happens as fast as the bus can multiplex requests from multiple CPUs.
http://en.wikipedia.org/wiki/Occam_programming_language I remember using this language back in the late 80s. Very simple parallel _constructs_ available in the language itself, backed by machine level support available on the Transputer chips. One idea that comes to mind is to write a T800 emulator that can exploit today's multicore-processor capabilities.
Multicore just means more apps/threads can run in a multitasking environment without impacting each other. The title is misleading. Though the article is right that not many programmers program for parallel processing, not all applications can make use of it.
from 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
to 45 2F 6E 40 3C DF 10 71 4E 41 DF AA 25 7D 31 3F
SETI@Home works by breaking up the problem into billions of small chunks. As far as I know, no effort was made to make the work units themselves run parallel. If you want to do that, run more than one on your box at a time. It took some effort to make SETI@Home work, though. This idea works for searching large number spaces - for primes, and such. This idea is basically "Hungry puppies". If you can get your app to work this way, then you can scale up to nearly any practical limit. (SETI@Home ran into bandwidth limits, sending out units - they eventually required work units to do more work).
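The "hungry puppies" pattern can be sketched on one machine with a shared queue of independent work units that idle workers pull from. This is an illustration only: the prime search, unit size, and names (`puppy`, `search_primes`) are invented, and threads stand in for what SETI@Home did with separate computers.

```python
import queue
import threading


def is_prime(n):
    """Trial division; stands in for whatever one work unit computes."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def puppy(work, found, lock):
    """Worker: keep grabbing the next unit until the bowl is empty."""
    while True:
        try:
            lo, hi = work.get_nowait()
        except queue.Empty:
            return
        primes = [n for n in range(lo, hi) if is_prime(n)]
        with lock:                     # results list is the only shared state
            found.extend(primes)


def search_primes(limit, unit_size=100, workers=4):
    """Cut range(limit) into independent work units and let workers pull them."""
    work = queue.Queue()
    for lo in range(0, limit, unit_size):
        work.put((lo, min(lo + unit_size, limit)))
    found, lock = [], threading.Lock()
    threads = [threading.Thread(target=puppy, args=(work, found, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(found)


if __name__ == "__main__":
    print(len(search_primes(10_000)), "primes below 10000")
```

Because units are pulled and not assigned up front, a slow worker simply takes fewer of them. Making `unit_size` larger cuts the hand-out overhead, which is the same trade the comment describes for bandwidth.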
Another approach is SIMD - Single Instruction, Multiple Data. Thinking Machines had hardware and software that used this approach. A single host sent out "instructions" on broadcast to thousands of processors. Each processor had its own segment of the data to work on, but did the same instructions as all the others. This can be done completely in software on a MIMD machine, like a modern multichip, multicore SMP machine. Thinking Machines had two languages to support SIMD - C* (C star) and Lisp* (Lisp star). They had operations that could be sent out.
A down side of SIMD also affects vector machines. The non-vectorizable part of the code has to be run on the single processor host. Even if the vector processor is infinitely fast, if 30% of the application doesn't run on it, then you must wait at least that long. In this case you could only triple your total speed.
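The 30% figure is an instance of what is usually called Amdahl's law: if a fraction s of the run time stays serial and the rest is sped up by a factor p, the overall speedup is 1 / (s + (1 - s) / p). A small Python sketch of that arithmetic (the function name is just for illustration):

```python
def amdahl_speedup(serial_fraction, parallel_speedup):
    """Overall speedup when only the non-serial part is accelerated."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


if __name__ == "__main__":
    # 30% of the program cannot use the vector unit / extra processors.
    for p in (2, 8, 64, float("inf")):
        print(p, round(amdahl_speedup(0.30, p), 2))
```

With an infinitely fast parallel part the second term vanishes and the result is 1/0.3, so no amount of extra hardware gets past roughly 3.3x for this program.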
Again, on a modern MIMD system, a hybrid approach might work. The SIMD controller might itself run on multiple threads. There might be several instruction streams broadcast to processors.
-- Stephen.
The cure for all parallel programming problems (deadlocks, priority inversion etc) is the Actor model: each object is a separate thread, and calling a method does not invoke code, it only puts a request in the message queue of the called object. Then the thread behind the object wakes up and processes the requests.
If an object wants a result from another object, then it obtains a future value that represents the result of the computation when it will be ready. When the caller wants the actual value, it blocks until the result is available.
Of course, blocking on a result would cause a deadlock in recursive algorithms...therefore, objects don't wait for a result, they simply enter a new message loop at the position they wait for a result. When the result is ready, the callee wakes up the caller by putting a 'terminate current loop' message in the caller's message loop after the result is computed.
The Actor model, implemented as described above, not only solves the problems of classical parallel programming (deadlocks, priority inversion, etc), but it also exposes whatever parallelism is there in a program.
Synchronization is performed only in two places:
1) when inserting/removing elements in an object's queue.
2) when adding the current thread into the waiting list of a future value.
Both synchronizations are implemented via spinlocks. In the case of the queue, there is no need to synchronize on all the queue, just on the edges.
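A rough Python sketch of the scheme described above, for readers who want to see the moving parts. This is not the C++ demo mentioned in this comment, and it simplifies in two ways that should be stated loudly: the mailbox is a standard queue.Queue (which uses ordinary locks internally, not spinlocks), and a caller that wants a result simply blocks on the future instead of entering a nested message loop. The class names Future, Actor and Counter are hypothetical.

```python
import queue
import threading


class Future:
    """Placeholder for a result that another thread fills in later."""

    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def set(self, value):
        self._value = value
        self._ready.set()

    def get(self):
        self._ready.wait()  # block until the result is available
        return self._value


class Actor:
    """An object with its own thread; method calls become queued messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, method_name, *args):
        """Queue a request and return immediately with a future."""
        future = Future()
        self._mailbox.put((method_name, args, future))
        return future

    def stop(self):
        self._mailbox.put(None)  # sentinel: finish queued work, then exit
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            method_name, args, future = message
            # Error handling is omitted to keep the sketch short.
            future.set(getattr(self, method_name)(*args))


class Counter(Actor):
    def __init__(self):
        self.total = 0  # state is only ever touched by the actor's thread
        super().__init__()

    def add(self, amount):
        self.total += amount
        return self.total


if __name__ == "__main__":
    counter = Counter()
    futures = [counter.send("add", i) for i in range(1, 101)]
    print(futures[-1].get())
    counter.stop()
```

Because only the actor's own thread ever runs add, the counter needs no lock of its own; the only shared structure is the mailbox.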
I have made a demo in C++, using Boehm's garbage collector (it is a quite complex system, it needs gc), and it works beautifully. With this model, there is no need to use mutexes, semaphores, wait conditions, or any other synchronization primitive.
I chose C++ because:
1) operator overloading allows future values to be treated naturally like non-future values.
2) when waiting for a result, the waiting thread puts itself in the waiting list of the future. The nodes of the list are allocated on the stack; only c/c++ can do this, and it is crucial, because it minimizes allocation.
Another advantage of this system is that tail recursion comes for free: when you call a method which you don't want the result of, the local stack is not exhausted, because there is no call, only a message placed in a queue.
Patterns like the producer/consumer pattern come for free: one object simply invokes the other.
Data parallelism comes for free: invoking a computation on an array of objects will execute the computations in parallel, on each element of the array. For example, increasing the elements of an array can take O(N) with one CPU and O(1) with N cpus.
Of course, it is much slower on two or even four cores than the same sequential code. But given 10 or more cores, programs start to exhibit linear increase in performance, depending on algorithm of course.
The system is much like the nervous system of an animal: signals are transmitted slowly from one nerve to another, but processing is parallel, so the organism can do many things at the same time.
Another similarity between this system and the nervous system of an animal is that when a nerve wants to transmit an electrical signal to another nerve, the nerves must synchronize, much like there should be synchronization when an object puts a message in the object of another thread.
Occasionally I'll do something that needs a few seconds of CPU time, such as recalculating a spreadsheet or having Word re-juggle large parts of a document, or doing public-key encryption setting up a VPN tunnel, but normally my CPU's running at about 5%, and the only time it really burns a lot of CPU is when Mozilla gets cranky about something and decides to burn all the CPU it can get, probably running some badly-written Flash or Javascript dancing ad banner, so a multicore CPU would just lose one core to Mozilla instead of the whole thing. (And yeah, maybe a newer Mozilla version would help, and I could run Adblock, but basically that would just mean my CPU would be idle even more of the time.) Once in a while I'll watch movies on the PC, which can burn a bit of CPU but is still basically limited by download speed.
Obviously servers are an entirely different case, but for the most part they're either running work that's easily parallelized (serving lots of web pages or other kinds of sessions) or else they're running a database application that gets lots of transactions dumped into it, so the DBMS needs to have its parallelism written carefully but everything else is still multiple applications. So they'd do well switching to multicore, but otherwise it's really just the gamers who need the extra horsepower.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
...or is that a lie?
I know that the author of the NY Times article is well known, but there seems to be nothing new in his article or mlimber's abstract of it. Perhaps people just needed a parallel computing discussion, but is this "news?"
The basic flawed assumption that you (and hundreds of other people in this discussion) are making is that this is all about "speed," in the sense of raw computation throughput.
Speeding up intensive computations by splitting them up across threads that run in parallel in separate execution units isn't the only application of concurrent programming. Here's another one: making interactive or real-time programs more responsive, by making the order in which the program performs its computations more flexible, and allowing "important" events to preempt less important computations. Using concurrency to make software more responsive, in fact, often makes programs slower in raw throughput (since they have to pay the cost of complex scheduling and synchronization of all their threads), but it's very often OK to make a program objectively slower if it makes it feel subjectively faster to the end users.
Concurrency has had the potential to improve software since long before multi-core processors became commonplace. The recent mainstreaming of multi-core processors, contrary to what the story here would have us believe, really isn't all that relevant when you evaluate the industry's failure to develop and market advanced solutions for developing concurrent software. The absence of advanced tools for concurrent programming has been a problem since long before, and the cost is in usability of software.
Are you adequate?
Exactly, before we tackle putting 8 core processors to full use, can we solve the spinning beach ball of death? I hardly ever use my dual core system at peak capacity, but no matter how fast my machine is, sometimes it still just locks up.
"But Windows doesn't provide any less help with multi-threaded programming than Linux."
I suspect the OP is talking about things like scheduler performance, task switch latency, per-thread overhead, etc.
"I suspect the OP is talking about things like scheduler performance, task switch latency, per-thread overhead, etc."
But that doesn't make a lot of sense. You don't put that into a programming language at all.
And if you want to start a fight over scheduler performance Linux is probably the best of all possible OSes to do it on.
It was an off topic flame at Microsoft. I am not a big windows fan but this is just dumb.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Yes, you can just take a standard sort algorithm and run it multithreaded. Have you heard of the Quicksort and Mergesort algorithms? Did you learn linear algebra in University?
Both Quicksort and Mergesort are divide-and-conquer algorithms. They are recursively defined to split the data-set and sort each part independently. In fact, most any recursively defined algorithm will run multithreaded.
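As a hedged illustration of the divide-and-conquer point, here is a small Python merge sort that hands one half of the data to a second thread for the first few levels of recursion. The depth parameter and function names are my own choices, not anything standard. Note that in CPython the threads mostly take turns because of the global interpreter lock, so this shows the structure rather than a real speedup; the same shape in C++ or Java would use the cores.

```python
import threading


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def parallel_merge_sort(items, depth=2):
    """Sort items; spawn a thread per split for the first `depth` levels."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    if depth <= 0:
        # Below the cut-off, recurse sequentially in the current thread.
        left = parallel_merge_sort(items[:mid], 0)
        right = parallel_merge_sort(items[mid:], 0)
        return merge(left, right)

    result = {}

    def sort_left():
        result["left"] = parallel_merge_sort(items[:mid], depth - 1)

    worker = threading.Thread(target=sort_left)
    worker.start()                       # left half sorts in another thread
    right = parallel_merge_sort(items[mid:], depth - 1)
    worker.join()                        # wait before merging
    return merge(result["left"], right)


if __name__ == "__main__":
    print(parallel_merge_sort([5, 3, 8, 1, 9, 2, 7]))
```

The two halves never touch the same data, which is why no locking is needed until the join before the merge.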
Linear algebra lends itself to multithreading. A matrix product is several independent dot products, which each can run on separate cores.
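A minimal sketch of that decomposition in Python, assuming plain lists of lists as matrices (the names dot and parallel_matmul are illustrative). Each output row is a set of dot products that depends only on one row of the first matrix and on the second matrix, so the rows can be handed to separate workers with no coordination between them.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def parallel_matmul(a, b, workers=4):
    """Multiply matrices a (n x k) and b (k x m), one task per output row."""
    columns = list(zip(*b))  # columns of b, computed once and only read

    def row_task(row):
        # Each entry of the output row is an independent dot product.
        return [dot(row, col) for col in columns]

    # In CPython a process pool or a native library would be needed for a
    # real speedup; threads are used here only to show the decomposition.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_task, a))


if __name__ == "__main__":
    print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```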
Modern Fortran compilers already support parallel matrix operations, without any code change. There is no reason why another language's libraries couldn't also have hidden parallelism.
Not to mention that, from what I've seen and heard from people who have worked there, they pay pretty well with decent benefits, the place is relatively relaxed, and they actually encourage their people to have a life outside of work.
How horrible and terrible that must be for them. Can't imagine why *anyone* would want to work there (for the less observant of you, this was meant as sarcasm)
I hate to burst the gp's bubble, but they do actually make a lot of good things there (I, for one, am quite fond of Visual Studio. It makes my life a great deal easier). Like every company, there are good products and products that need some work, but on the whole their stuff is pretty good.
Also, as you said, a lot of the *really* smart people are in their R&D labs, working on things that are 5+ years in the future.
Everything I need to know I learned by killing smart people and eating their brains.
I see a lot of comments indicating that all a programmer needs to do to scale to more cores is just multithread your algorithms. If only that were true! Unfortunately, memory access patterns become extremely important for getting good performance, and that requires some pretty sophisticated knowledge about the hardware and proper tuning is almost a black art. Once large numbers of cores are in use, scaling your software optimally is going to be very difficult. Don't delude yourself. Talented programmers are going to be very much in demand, and I suggest starting to learn everything you can about it now. For starters, Ulrich Drepper has written an incredibly detailed and helpful article available at http://people.redhat.com/drepper/cpumemory.pdf which should really help dispel any notions that this change to computing is going to be easy!
Even a child's toy like the Nintendo DS from 2004 has two cores. Developers need to remember it isn't the early 1990s anymore and that they will have to deal with multiprocessor machines.
You seem to have a very rose-tinted view of garbage collection. It has numerous advantages, but performance is not (yet) one of them.
I am TheRaven on Soylent News
We bash in attempt to convince those smart people to leave MS and work in a more open way.
Perhaps I am the only person who thinks this, but it seems to me that threads are not a very good low-level primitive for concurrent programming. They inherently assume that whatever is running on the different processors is independent. As a result, writing a tightly coupled parallel algorithm is "hard".
I would much rather the operating system switch 4 or 16 synchronized cores completely over to me. Add prefixes to the assembly instructions so that I can explicitly execute instructions on processor 1, 2, 3, etc, in a shared memory model. Add logic similar to simultaneous multithreading to keep unused cores saturated with instructions from other threads when possible. This would help the programmer extract parallelism from tightly coupled algorithms. There seems to be no real multithreaded analogue to assembly language, and I think that is a big part of the problem. If we had such a thing it would be much easier to write tightly coupled parallel code, and higher level parallelization (from compilers) would follow inevitably.
Of course I'm not saying this is some sort of magic bullet. We would still need to split up computations and use threads as best as possible, but I think this is an obvious tool that we are missing.
In Soviet America the banks rob you!
We bash in attempt to convince those smart people to leave MS and work in a more open way.
In doing so, you prove yourself a fool. It is a childish action that only hurts your cause, and Microsoft (as well as most people with any business or social sense) knows it.
You see Microsoft as some great evil to be overcome without seeing that a large part of your problem is yourself.
Companies see people like you bash anything that isn't open source or "free" and they quite rightly think that you haven't really thought things out or lack the business acumen to realize why all of the world can't work that way. (Not to mention the extreme lack of social skills that it shows)
I like open source, I use it, I occasionally write it, and I've championed the cause in a sane way.
What you are missing is that Microsoft is giving a lot of people and companies what they want - software that is relatively easy to use and which everyone else is already using ("best" doesn't matter most of the time, which a lot of you have problems understanding).
At the same time, they treat their employees well, paying them well with good benefits (from what I've heard from people I know who work there), and maintain well-respected research labs.
You do not draw good people from a good environment by telling them it's not a good environment because they don't make everything open source. You draw good people by being a better environment in terms of pay, benefits, culture, work-life balance etc *and* appealing to their sensibilities.
If you can't do that, and instead simply bash anyone for associating with "the enemy", you are doomed to fail because, at best, people will work on it as a hobby. The lion's share of good open source software is done by people being paid to do it. Bashing the company of people you want to work for you does not help.
Not all of the world cares about open source, and many of us who do are not fanatical about it and realize that, while it is good for some things, it is absolutely horrible for other things from a business standpoint. We like working on things that we see as important, but we also like being able to pay our bills and having a life outside of work.
Everything I need to know I learned by killing smart people and eating their brains.
I tried looking for that reference...
But the googles, they do nothing!
-- Terry
Perhaps multi-cores will become specialized, along the lines of videocards, diskdrives and ADSL modems (which have their own CPUs these days); and the nascent physics card. This is a natural effect, and occurs in biology (organs in your body), and the size of commercial organizations.
Perhaps we can further divide up the tasks?
Eg. caching, in the right way, can yield incredible performance benefits - so, perhaps a predictive caching processor?
ANYTHING but parallelizing the unparallelizable.
Threads are a superset of a message passing interface. With threads, you can trivially implement a message passing interface if that's what works best. The thing is, you don't have to use message passing if it's not.
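To show how little it takes, here is a sketch of a message channel built from nothing but a lock, a condition variable and a queue, written in Python. The Channel class and the squarer/run_pipeline helpers are invented for this example; the two threads share no data except the channels themselves.

```python
import threading
from collections import deque


class Channel:
    """Unbounded FIFO channel built from shared-memory primitives."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def send(self, message):
        with self._cond:
            self._items.append(message)
            self._cond.notify()          # wake one waiting receiver

    def receive(self):
        with self._cond:
            while not self._items:       # loop guards against spurious wakeups
                self._cond.wait()
            return self._items.popleft()


def squarer(inbox, outbox):
    """Worker: reply with the square of each number; None means stop."""
    while True:
        n = inbox.receive()
        if n is None:
            outbox.send(None)
            return
        outbox.send(n * n)


def run_pipeline(numbers):
    requests, replies = Channel(), Channel()
    worker = threading.Thread(target=squarer, args=(requests, replies))
    worker.start()
    for n in numbers:
        requests.send(n)
    requests.send(None)
    results = []
    while True:
        reply = replies.receive()
        if reply is None:
            break
        results.append(reply)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Going the other way, building shared memory out of a pure message-passing system, is much more awkward, which is the sense in which threads are the more general tool.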
FYI, no programming language compiler (except maybe Ada) has ever fully complied with any particular version of any international standard of any programming language. Programming language standards are just a convenient way to document what a compiler does: say it's a C++ compiler, it compiles the C++ standard, already written, to which you add your own delta, which can usually be written up more concisely. Or a guide to new implementations. Or a guide to drafting programs that will be relatively easily portable across platforms.
Microsoft just has a past history of being a little bit more officially disdainful of the standard, in particular C++90, claiming that compliance was not a feature asked for by their users, of which there are a lot. Technically it's true. But look at C# : not only did they do a very open standard unlike Java, but also an elegant and concise one at that. And very closely implemented by their own compilers of course...
This is a silly argument. We've had a stable POSIX threads API for 10 years. No standard says the computer can't refuse to run your application entirely.
I really don't think that CPU is the bottleneck in most cases, but rather the hard-drive. When my machine is slow (outside of networking issues), usually I see the hard-drive light flashing like a UFO on crack.
Table-ized A.I.
Not sure if the following belongs in a MS- and C++ article on parallel programming, and also not wanting to sound like a fanboy, but there is some interesting stuff going on in this area in the OS X / Cocoa camp. A nice introduction can be found on Apple's developer website as well as the marketing literature for Leopard. I'm sure that the Java folks and .Net folks have something also, but this is worth noting for the sake of completeness.
Not sure if the following belongs in a MS- and C++ article on parallel programming, and also not wanting to sound like a fanboy
Don't sweat it. It's always nice to be able to compare with people in other groups. It lets you know what you're doing right that others may or may not be, shows you places where you might be deficient, and generally gives you the ability to look at the issues from a different angle.
Those, in my opinion, are positive things because they're one of the ways we learn to do what we do better.
Everything I need to know I learned by killing smart people and eating their brains.
The problem is that our brains have been doing linear thinking for a long time. We really don't have the ability to multi-thread, and I don't know if it is possible to train our brains to think "in threads". An algorithm is essentially a single thread model, and so are many of the control statements (if, for and while). I don't think it is sufficient to make a loop run in 100 threads for faster time; the loop itself needs to be changed. It is a special ability, and we might just end up abusing multi-threading by incorporating it into our linear brains. Still, it might be a start, and I really think it would do a lot of good for the future of computing.
http://monkeynesianeconomics.blogspot.com/
"Multi-CPU systems started becoming common in the mid 1990s so developers being a decade behind the times is a little embarrassing and there are many situations where the task is not completely serial."
So after a decade of poor adoption on the part of software developers, the chip makers have ignored the fact that the wisdom of the (programming) mob indicates that multi-processing is not an attractive solution. Chip makers have known for more than two decades that they were going to run into physical limits eventually using the current technology, but opted for milking the 1970's model as long as possible rather than developing new technologies that might lead to much better single-core performance.
If I can put my $0.02 in, I'd say to be on the lookout for more apps developed using the concurrency runtime in MSRS. Speaking as someone who works with it every day, I have to admit it took a while to get used to the task/message port paradigm. Once you grok it, though, it's an extremely powerful and elegant way to write parallelized code. Imagine writing multithreaded code without your trusty mutex primitives and condition variables, and where in most cases C#'s lock {} is useless. Objects implement service contracts, and mutual exclusion is enforced by defining interleaves over message handlers. Tasks (very light-weight execution contexts) can be instantiated using anonymous delegates, and the runtime automagically instantiates as many threads as you need, depending on your hardware.
So long, and thanks for all the Phish
Referential transparency + memoization + automatic load balancer in a truly distributed system = no threads to care for = give me 64 cores
I was about to say 13256278887989457651018865901401704640, but it appears this number is private property.
Some of the major computational hurdles that the average Joe runs into just aren't amenable to parallelization. Video games are probably the biggest example. A traditional graphics engine is not a ridiculously parallelizable business. If your bottleneck is your ability to generate polygons to pump into the graphics card, then more cores aren't going to do you a lot of good since actually pushing things to the graphics card needs to happen in a synchronized fashion anyway.
If you have a lot of raw computation that needs doing, then more cores can be great depending on the particular algorithm. However, most people aren't running into raw computational bottlenecks, but bottlenecks that have to do with latency between the CPU and harddrive, or the CPU and memory, or network latency. Other issues include having user processes compete with background tasks. Vista and to a lesser degree XP were both pretty bad in terms of constantly reading or writing from disk for some random reason, which will really screw any other IO intensive applications you are using.
Really, modern personal computers still seem slow in many ways, but those ways are largely due to IO limitations, and poor management of competition for IO by the operating system. Until these bottlenecks are addressed, there isn't that much incentive to make the CPU intensive part of applications, which are already fast, faster by using parallel programming techniques.
It's not misleading at all, if you have ever studied the subject you would see how things are.
First of all, allocating an object on the stack can work only if we are sure that the address of this object will not escape the stack frame it is declared in. If we are not sure (perhaps because the algorithm we have at our hands is complex), then we have no choice but to allocate the object on the heap.
Secondly, by adding garbage collection to c++, it does not mean that you can't allocate objects on the stack any more. Garbage collection for c++ exists (Boehm's gc), but it is way more inefficient and problematic than it ought to be.
Thirdly, raw data copying is a very quick operation. Even the raw PCI bus can move 133 MB/second...this means that, for most programs where the actual live set is very small (usually under 64 MB), copying takes very little time to be executed.
Fourthly, c++ has the advantage over languages that do not have stack allocation and templates. In c++, the number of heap objects is much smaller.
As the situation is right now, Java comes on top in many benchmarks against c++, and one of the advantages is garbage collection. The execution of a Java program might be slower than the corresponding c++ program, but overall Java wins because of optimizations applied in run-time, and the collector helps in that.
Finally, the greatest advantage of garbage collection is that it saves tremendous amounts of money off debugging memory management issues. There is no c++ application that has not crashed, at least once, from a memory management problem. In Java, there is no such thing.
Tell 2005 to get off the phone FFS, and let 1984 get a word in
Actually, programmers have been doing multitasking for decades. One could argue that it's taken hardware a long time to catch up. There are very few applications that really need to max out 2 cores, never mind 8, 256, or 4096. Take a look at the number of processes running on any modern OS though; multiple processes are pretty commonplace, even just for what would normally be considered core "kernel" stuff.
Yep, that level of optimisation is getting a little silly (especially when it can be done rather easily by the compiler anyway) for most applications, but in general we just don't even need to think about memory requirements or efficiency these days for most apps. It's nice to be able to do that, but for example it would be nice if the lovely chaps down at Microsoft tried to minimise the amount of resources their flippin OS takes up and leave more for the actual 'important' bits - the applications. Oops that almost turned into an MS rant..
which is totally what she said
Funny thing, but this discussion reminds me of decades-old rambling on assembler programming techniques. IMHO, the whole "paradigm" will get rendered obsolete by smarter CLIs being able to automagically ;-) interpret parts of code to fit multicore architecture. This is, after all, one of the features of the CLI layer anyway (besides portability) - to conform to underlying hardware with minimum or no programming effort.
I guess it will be a dumb question but:
:-)
Why can't a Java virtual machine take the burden of the multi-core adaptation?
They have promised "write once run anywhere"!
Lazy coder
There is more to it than volatile. For one, you still need explicit memory barriers. One example when that is needed is when you need to implement singleton via double checked locking correctly.
Yeah, and in Soviet Russia, the Sun melts YOU!
So say we all
that's n log(n) comparisons in a comparison sort; we can get around the limitation by, for example, using radix sort, where we do something different from comparison sorts, or by using many many processors, thereby spreading the comparisons around in clever ways. if we go the route of multiple processors, we can achieve very nifty speedups if we have, like, n of them, but even if we only have a few processors, we can get nice speedup from a not-big-oh perspective.
so, basically, n log(n) is not the final word on sorting.
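for the curious, here is a small least-significant-digit radix sort in python, as a sketch of the "something different from comparison sorts" route. it never compares two elements with each other; it only drops each value into a bucket by one digit at a time. the function name and the base=256 default are just choices made for this example, and it only handles non-negative integers.

```python
def radix_sort(values, base=256):
    """LSD radix sort for non-negative integers; no element comparisons."""
    values = list(values)
    if not values:
        return values
    if min(values) < 0:
        raise ValueError("this sketch only handles non-negative integers")

    largest = max(values)
    place = 1
    while place <= largest:
        # Stable bucketing by the current digit keeps earlier passes intact.
        buckets = [[] for _ in range(base)]
        for v in values:
            buckets[(v // place) % base].append(v)
        values = [v for bucket in buckets for v in bucket]
        place *= base
    return values


if __name__ == "__main__":
    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

each pass is linear in the number of values, and the number of passes depends on how many digits the largest value has, not on n log(n) comparisons. the bucketing pass is also easy to split across processors, since each processor can bucket its own slice of the input.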
the privacy of one's mind is important.
you do have something to hide.
Communication channels are the right way to tackle this. Bell Labs had the right idea. See http://swtch.com/~rsc/thread/ and the slides at http://www.cs.kent.ac.uk/teaching/07/modules/CO/6/31/slides/ if you have an inquisitive mind. For the slides, read them in this order: motivation.pdf -- just pages 1-39, basics.pdf, applying.pdf, choice.pdf, replicators.pdf, protocol.pdf, shared-etc.pdf.
it want's it's apostrophe's back. 's.
Escher was the first MC and Giger invented the HR department.
CPU design has progressed steadily from static to dynamic optimisation. Complex memory-based indirect addressing modes have been replaced by simple loads and stores, complex multi-function instructions were broken down to simpler basic operations - in both cases, this allowed better optimisation simply because there is more flexibility. The problem with VLIW, Itanium included, has always been that it's a step back from dynamic to static decisions - EPIC just got rid of the no-ops by encoding the VLIWs a bit better.
Itanium does have a lot of other really nifty things about it, but those were not new. Its speed comes from the DSP world, and almost all of its features are just to compensate for the problems DSPs have trying to deal with non-DSP issues like interrupts and exceptions, caches (DSPs hate non-uniform memory), function calls, and so on. That ends up making it almost as fast as a much simpler CPU with a DSP coprocessor, for only a few times more cost.
Parallel New World: http://www.jnthn.net/papers/2007-fpw-parallelism-slides.pdf
See also http://dev.perl.org/perl6/
Further, large numbers of C++ programs already exist. Such a fundamental change in philosophy would cause all these programs to have to be rewritten. All to support a dubious programming style. It is not going to happen. The people that ask for it do not understand C++ and its problem space. It is just annoying.
Right. And right again: you can't change the complexity (even if some compilers try), but you can try to design the complexity (simplicity) yourself. See the development of sorting algorithms and parallelism since the '60s. Not advertising, but Syncsort was and maybe still is one very good example. US radar tracking built highly parallelized sorting systems for incoming radar data at that time; it is still a viable technology and can be found (maybe) in several thesis papers. Parallel processing has been around longer than most think. Now, to get the most out of parallel processing, don't think of just one algorithm, think systems. Even a lot of SQL access today can use a lot of sorts and parallel I/O, and does not always need the results sequentially but only needs to combine them later when they are needed: a perfect parallel task while other processing is done. Think compilations: several programs compiled and later combined into one object or library. What's better than compiling in parallel? If you can break a task (big or small: a report, a transaction, a matrix manipulation, a loop, etc.) into smaller pieces, it can often be parallelized. It is not always easy, because something like XML was not originally designed for that, but if you need performance (and flexibility), don't design one huge XML document but several with references, so that even if each needs serializing they can be parsed or created in parallel. This really is less a coding problem than a design problem, and a little OT because the article was about CPUs, but there is much more to computers, systems and parallelism than just the number of CPUs.
Yes and yes. You explained it better than I usually do. Actually there are such things for software also; it is called multi-tasking. There is a one-time cost to load, and then forever after it is just normal call cost (almost, with hw help) any time you give the task more work (or, as the name says, something to do). If a task or a thread is independent, feeding the system without control once started and not receiving new "orders", like reader tasks, I would call it multi-programming, but that's just semantics. As someone already mentioned, it gets even more interesting between SIMD or MIMD machines (AP (attached) and MP (multi) for oldtimers), and if it involves any external resources such as I/O, memory, etc. which can be shared (and/or cached) between processors or separated between them. Or hybrids, 2+2, etc. It is an interesting subject, trying to make things fast in parallel without problems, but probably way off from "normal" development; it is just good to remember when trying to make applications run faster. Also, I agree that on a "normal" desktop more than two processors are mostly not a big benefit, but games, CAD, rendering, statistics, modeling / simulation, pick your own, if done right, can benefit greatly from multiple processors. Clusters help, but fast connections between systems are not cheap. And a piece of advice on the start-up / tear-down philosophy: why tear down if it is not a one-time task? Write re-entrant code and reuse. And if you have two processors, why create more than maybe three equal threads? Are you sure that the resources they use are not serialized? Even if threads are in pools there is a cost to start / stop.
Correct and a good answer, can only be used when only certain type of people around. This is one of the basic problems not too often explained to CS students, throughput and response times are only loosely related. Actually latency plays even bigger role today because of networks (any kind) internet, buses, execution path, even SOA between services, etc and often overlooked. Now of course there may be some turnaround (setup) time which comes to play but with computers it's getting faster, I don't know about humans?
You definitely should have more points, if I had.. You answer so many questions so well that there is no point to comment those except because you have FORTRAN there. Kind of related to parallel processing everyone should know how the vector processing and specialized vector FORTRAN did/does it. The problem today compared to earlier systems are the week links, the compiler really can not analyze everything in system, maybe the data used in thread is used by another but because it is not known at the compile time there is nothing the compiler can do. Kind of have to agree with just in time optimization, the programs can learn and the longer they run the more optimized they get. Most relational database systems do that (if you let them), the longer they run, the more efficient they get (to certain point, even they can not fix logic errors.) And some of them span tens of nodes with hundreds of cpus and even more.
Thank you, you really had to bring APL into this when I'm having flashbacks! APL was a beautiful language, and because we were one of the first AP (attached processor) systems I got to work on that. Whoever wrote that interpreter was good, except in assembler, so I ended up fixing several bugs in the code when we built (mid-70s? I don't really remember the year) our first real applications, not just math but an HR system based on relational ideas (System R, anybody?). And once you get the keyboard, it is no more difficult than using all the weird ESC, etc. sequences in your favorite editor. Actually it grows on you but, agreed, a little math background is needed. In the end the system ran well on the AP and was fast and easy (the user didn't actually need an APL keyboard), but after one look at the code our developers ran away (screaming!). So that project didn't catch on, but it was lots of fun.
The truth, in my humble experience, is that threading is rarely a smart thing to do. There is the occasional rare case, but much more frequently people think they have a rare case when they don't.
"Not an actor, but he plays one on TV."
Multi-Threading is more efficient than Multi-Processing (as in running multiple processes, not processors) precisely because it has shared memory. Programs can be that much more efficient if they don't have to copy large buffers around all the time. Multi-Processing's advantage is that it DOESN'T have shared memory, so it's much tighter from a security and reliability point of view.
;)
They're different approaches for different uses. Neither is better than the other overall.
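A rough Python sketch of the trade-off, with one loud assumption: instead of spawning a real second process, the pickle round-trip below stands in for the process boundary, since serializing and rebuilding the data is roughly what happens when a buffer is handed to a separate process. The names fill, buffer and copy are made up for illustration.

```python
import pickle
import threading

buffer = bytearray(1_000_000)   # a large buffer owned by the main thread

def fill(buf, value):
    # Overwrites the buffer in place with a single repeated byte.
    buf[:] = bytes([value]) * len(buf)

# Thread: works directly on the caller's memory, nothing is copied.
t = threading.Thread(target=fill, args=(buffer, 7))
t.start()
t.join()
thread_saw_same_buffer = buffer[0] == 7

# Stand-in for a process boundary (assumption): the data is serialized
# and rebuilt as an independent copy on the other side.
copy = pickle.loads(pickle.dumps(buffer))
fill(copy, 9)
original_untouched = buffer[0] == 7   # changes to the copy never reach the original

print(thread_saw_same_buffer, original_untouched, copy is buffer)
```

The thread gets the efficiency (no megabyte copied), while the copy gets the isolation (nothing it does can corrupt the original). That is the same trade described above.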
As for an automatic/manual choke, my current car has an automatic choke, and in the current cold weather it uses it. It's really nice to not have to figure out how much choke you need in order to start the car. However, if I push the accelerator down too slowly when pulling off, the choke disengages automatically too early and the car stalls. A manual choke doesn't have this problem.
(Obviously this last paragraph isn't serious, it likely just needs adjusting a little)