Multithreading - What's it Mean to Developers?
sysadmn writes "Yet another reason not to count Sun out: Chip Multithreading. CMT, as Sun calls it, is the use of hardware to assist in the execution of multiple simultaneous tasks - even on a single processor. This excellent tutorial on Sun's Developer site explains the technology, and why throughput has become more important than absolute speed in the enterprise.
From the intro: Chip multi-threading (CMT) brings to hardware the concept of multi-threading, similar to software multi-threading. ... A CMT-enabled processor, similar to software multi-threading, executes many software threads simultaneously within a processor on cores. So in a system with CMT processors, software threads can be executed simultaneously within one processor or across many processors. Executing software threads simultaneously within a single processor increases a processor's efficiency as wait latencies are minimized. "
How long has hyperthreading been available on Intel CPUs?
I am a developer, mainly in C, and I did a lot of programming on QNX4 with multi-threading (even if the QNX4 implementation is not *really* threads); now I am doing it in Precise/MQX.
Multi-threading comes with synchronization, semaphores, mutexes, etc. Once you know how to deal with them, it's easy.
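A minimal sketch of the two primitives named above, assuming Python's `threading` module as a stand-in for a C threads library: a mutex that prevents lost updates on a shared counter, and a counting semaphore that caps how many workers can be inside a section at once. `Counter`, `run_workers` and the limit of 2 are illustrative names and values, not anything from the post.

```python
import threading


class Counter:
    """Shared counter whose read-modify-write is guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the updates would be lost.
        with self._lock:
            self.value += 1


def run_workers(n_threads=4, n_increments=1000, max_concurrent=2):
    """Run n_threads workers; at most max_concurrent are inside the gate at once.

    Returns (final counter value, peak number of workers seen inside the gate).
    """
    counter = Counter()
    gate = threading.BoundedSemaphore(max_concurrent)  # counting semaphore
    stats_lock = threading.Lock()
    inside = 0
    peak = 0

    def worker():
        nonlocal inside, peak
        with gate:
            with stats_lock:
                inside += 1
                peak = max(peak, inside)
            for _ in range(n_increments):
                counter.increment()
            with stats_lock:
                inside -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value, peak
```

With the defaults, the counter always ends at 4000, and the semaphore keeps the peak at 2 or fewer.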
from Intel's hyperthreading?
Nobody has anything like this and it will probably take competitors at least -2 years to catch up to sun.
I don't mean to look a gift horse in the mouth..
..but wouldn't it be even better if it was hyper-multi-threading?
air and light and time and space
this makes me wonder what the effect would be on something like Stackless Python?
the whole state pickling concept is pretty cool, and kind of throws threads all over..
anime+manga together at last.. in real time.
anything i write usually maxes out the processor at 100% for days at a time (i deal with huge data conversions)
so yeah i'd also like to know: what does it mean to me?
This is Sun's Niagara Design. The more I learn about it, the more I think that it's nothing that exciting.
From the lack of non-Sun-supplied buzz regarding this technology, it would appear that many people aren't finding it very exciting.
I'm a big tall mofo.
Can somebody explain to me how this differs from Intel's hyperthreading technology?
Is this just a fancy name for sticking multiple cores on the same die?
What's the real story here?
Online Starcraft RPG? At
Dietary fiber is like asynchronous IO-- Non-blocking!
"What's" is a common contraction of "What does." The fact that it is used, heavily, in American speech is evidence enough of this. In British English, who knows.
I haven't read all the technical docs, so I'm not sure what the difference is between Sun's technology and Hyperthreading, but I'm sure there is a difference, with Sun's technology probably being more complete. Not sure, but perhaps this technology would be better integrated with multi-core processors, to give not only multiple virtual processors, but also multiple simultaneous threads on each core. After all, if they want to compete against the Cell, they have to go multi-core. -d
"Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
I would have had first post but I was reloading Slashdot using only a single thread!
It means we're going to have to learn to program in parallel. We're going to have to parallelize our data processing and we're going to have to learn synchronization and locking methods.
This is nothing new. The diminishing returns and impending limits of single-threaded processing have been on the horizon for a long time now.
Start Running Better Polls
This is kind of a trivial optimization! Basically, you extend your pthreads library so all the threads within a single shared memory application schedule themselves on cores on the same chip. Big deal! Now if it could figure out how to schedule processes on "adjacent" cpus to optimize their common memory accesses, I'd be more impressed.
Not enough teeth and too many showings of Dukes of Hazzard and Heart Like a Wheel.
Can I still use INKEY in my basic programs? Will multi-threading make it more efficient? Can I actually run a second program on my DOS PC without having to force it as a TSR?
In informal British English speech it's fine. In informal writing it's probably okay, and in formal writing contractions shouldn't be used.
Decode these
And how is this different from hyperthreading?
Seriously, is there a difference, or is this just a marketing name to differentiate the two?
Never ask for directions from a two-headed tourist! -Big Bird
HyperThreading is simply an implementation of SMT, it isn't exactly an Intel invention. I think a lot of the tech was designed into an Alpha processor that never saw general release, IIRC it was a 4 thread core. HyperThreading is a naff 2 thread variant that sometimes reduces overall processing power, lol.
Throughput computing maximizes the throughput per processor and per system. So a processor with multiple cores will be able to increase the throughput by the number of cores per processor. This increase in performance comes at a lower cost, fewer systems, reduced power consumption, and lower maintenance and administration, with increase in reliability due to fewer systems. (from TFA, emphasis mine)
So it seems they invented a way to linearly scale performance. WOW! But maybe I misunderstood and the thing is over my head.
CC.
TaijiQuan (Huang, 5 loosenings)
... their continued use of the word "Enterprise." What does this mean anyhow?
Not sure I buy that this "increases a processor's efficiency as wait latencies are minimized". It seems to me that decreasing latency reduces efficiency because you spend a greater percentage of your cycles changing state (overhead) instead of doing useful work. This is why real-time OSes aren't the norm: they reduce latencies to critical maximums, but at the cost of overall throughput.
If you'd bothered to RTFA, you'd see that CMP = multicore, *not* CMT. CMT uses "logical processors" in exactly the same way as HyperThreading.
Get a clue in general before posting crap comments, please.
It means "Difficult to reproduce bugs".
It worries me how many people just say "it means faster programs and doesn't take much more work". That mindset leads to lazy programmers who A - Can't optimize to save their jobs; and B - Don't actually understand what multithreading really does.
If you consider it easy, you've either just thrown great big global locks on most of your code, in which case your code doesn't actually parallelize well; or you've written what I refer to in my first sentence: bugs that take an immense effort just to reproduce, never mind track down and fix.
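One common middle ground between a single global lock and lock-free code is lock striping: split shared data into partitions, each with its own lock, so threads touching different partitions do not serialize. A hedged Python sketch; `StripedCounter` and `count_in_parallel` are hypothetical names, and under CPython's global interpreter lock it shows the locking structure rather than a real speedup. Passing `n_stripes=1` degenerates to the "great big global lock" case.

```python
import threading


class StripedCounter:
    """Word counter whose keys are spread over several locks (lock striping).

    Threads updating keys in different stripes do not contend with each
    other; with n_stripes=1 this degenerates to one global lock.
    """

    def __init__(self, n_stripes=16):
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._buckets = [{} for _ in range(n_stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def add(self, key, amount=1):
        i = self._stripe(key)
        with self._locks[i]:  # lock only the stripe that owns this key
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + amount

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)


def count_in_parallel(words, n_threads=4, n_stripes=16):
    counts = StripedCounter(n_stripes)

    def worker(chunk):
        for w in chunk:
            counts.add(w)

    threads = [
        threading.Thread(target=worker, args=(words[i::n_threads],))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

The counts come out the same for any stripe count; what changes is how often threads wait on each other.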
1.3 Simultaneous Multi-Threading
Simultaneous multi-threading [15],[16],[17] uses hardware threads layered on top of a core to execute instructions from multiple threads. The hardware threads consist of all the different registers to keep track of a thread execution state. These hardware threads are also called logical processors. The logical processors can process instructions from multiple software thread streams simultaneously on a core, as compared to a CMP processor with hardware threads where instructions from only one thread are processed on a core.
SMT processors have an L1 cache per logical processor while the L2 and L3 cache is usually shared. The L2 cache is usually on the processor with the L3 off the processor. SMT processors usually have logic for ILP as well as TLP. The core is not only usually multi-issue for a single thread, but can simultaneously process multiple streams of instructions from multiple software threads.
1.4 Chip Multi-Threading
Chip multi-threading encompasses the techniques of CMP, CMP with hardware threads, and SMT to improve the instructions processed per cycle. To increase the number of instructions processed per cycle, CMT uses TLP [8] (as in Figure 6) as well as ILP (see Figure 5). ILP exploits parallelism within a single thread using compiler and processor technology to simultaneously execute independent instructions from a single thread. There is a limit to the ILP [1],[12],[18] that can be found and executed within a single thread. TLP can be used to improve on ILP by executing parallel tasks from multiple threads simultaneously [18],[19].
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
Er, HyperThreading but with added hardware "task switching", which is what the OS takes care of on HT processors.
An article on Hyperthreading (which is SMT) and CMT (the original CMT, not Sun's new acronym) is at:
http://www.realworldtech.com/page.cfm?ArticleID=RWT122600000000
It's dated a while ago, I think before hyperthreading came out (and Alpha was still being developed). The other two parts of the series are also interesting, and explain some of the possibilities with hardware processor threading. I think the first part has more explanation, but I couldn't find it quickly.
The forums on the site are also good, better in a technical sense than ars-technica or aceshardware and especially slashdot.
I skimmed through the article, and it seems this is just multiple-cores-on-one-die (hyperthreading style), but they also add hardware context-switches. So, you can feed the processor 10 different threads and it'll take care of switching contexts as soon as a cache misses or such, without invoking the OS.
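The switch-on-miss behaviour described above can be illustrated with a toy cycle count. This is a simplified model under stated assumptions (one op issued per cycle, a fixed miss penalty, no pipeline or cache modelling, round-robin choice of the next ready context); it is not how any particular Sun or Intel core is built, and `simulate` is a hypothetical name.

```python
def simulate(programs, miss_penalty):
    """Toy model of one core holding several hardware thread contexts.

    Each program is a sequence of ops: "c" is a 1-cycle compute op, "m" is a
    load that misses the cache (issues in 1 cycle, then that context waits
    miss_penalty cycles). The core keeps running the current context until
    it misses, then switches round-robin to the next ready context without
    OS involvement.
    Returns (total cycles until every context is done, cycles that issued an op).
    """
    n = len(programs)
    pcs = [0] * n        # per-context program counter
    ready_at = [0] * n   # first cycle at which each context may issue again
    current = 0
    cycle = 0
    busy = 0
    while any(pcs[i] < len(programs[i]) for i in range(n)):
        chosen = None
        for k in range(n):  # look for a ready context, starting at `current`
            i = (current + k) % n
            if pcs[i] < len(programs[i]) and ready_at[i] <= cycle:
                chosen = i
                break
        if chosen is not None:
            op = programs[chosen][pcs[chosen]]
            pcs[chosen] += 1
            busy += 1
            if op == "m":
                ready_at[chosen] = cycle + 1 + miss_penalty
                current = (chosen + 1) % n  # switch away on the miss
            else:
                ready_at[chosen] = cycle + 1
                current = chosen
        cycle += 1  # an idle cycle if nothing was ready
    # A trailing miss still has to wait for its data to come back.
    return max([cycle] + ready_at), busy
```

With one context running "cmc" and a 10-cycle miss, the core needs 13 cycles and sits idle for 10 of them; with two such contexts, the second one's work fills part of the gap and both finish in 15 cycles instead of 26 back to back.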
Is this all there is to it? I mean, with just one L1 cache per core, this is not going to work very well, is it?
But that's beside the point - in the new world of very many cheap rackmount servers clustered together, loose coupling has taken over. Maybe if the world had turned out differently and been dominated by big servers, threading would have caught on.
As many others have already pointed out, Intel has had Hyperthreading available in Pentium 4 and Xeon CPUs for a couple of years now, which does exactly what the article is talking about.
I was skeptical at first, and read some of those articles showing that some applications could actually run slower. But then I tried it for myself, and I have to admit I've been impressed. My main box is a dual-Xeon, each with Hyperthreading turned on. It appears to Linux as if I have four independent CPUs. A few numerical tasks saturate the processors if I have just two of them running in parallel, but several tasks do fine with four or more copies. My favorite is "make -j 4" - starting four gcc processes in parallel works surprisingly well. How long does it take you to compile the Linux kernel?
The real issue is how large each thread can be (in terms of memory) before it has to access data that is external to the thread. It may mean a lot for gamers running realistic games and also for those doing massive calculations.
The important thing is that developers have to be aware of the possibilities and limitations of this technology. Otherwise it would be like throwing a V8 into a Model T Ford: it is possible, but you would never be able to utilize the full power.
Another thing is that today's programming languages are limited. C (and C++) are advanced macro assemblers (not really bad, but they require a lot of the programmer). Java has thread support, but it's still the programmer (in most cases) who has to decide. Java is not very efficient either, which of course depends on which platform it's running on in combination with general optimizations. C# is Microsoft's bastard of Java and C++, with the same drawbacks as Java.
There are other languages, but most of them are either too obscure (like Erlang or Prolog) or too unknown.
The point is that a compiler should be able to break out separate threads and/or processes whenever possible to improve performance. It is of course necessary for the programmer to hint to the compiler where it may do this and where it shouldn't, but otherwise it should keep the programmer blissfully unaware of the details. The details may depend on the actual system where the application is running, i.e. if the system is busy serving a bunch of users, then splitting the application into a bunch of threads is not really what you want, but if you are running alone (or almost alone), then the application should be permitted to allocate more resources. The key is that the allocation has to be dynamic.
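The dynamic-allocation part of this idea can be approximated at run time in application code. A minimal sketch under an assumed heuristic (logical CPUs minus the 1-minute load average, never below 1); `pick_worker_count` is a hypothetical helper, and `os.getloadavg()` is only available on Unix-like systems, so the sketch falls back to treating the machine as idle elsewhere.

```python
import os


def pick_worker_count(max_workers=None):
    """Pick a worker count from the machine's current load.

    Heuristic (an assumption, not a standard rule): logical CPUs minus the
    1-minute load average, capped by max_workers and never below 1.
    """
    cpus = os.cpu_count() or 1
    if max_workers is not None:
        cpus = min(cpus, max_workers)
    try:
        load_1min = os.getloadavg()[0]  # Unix-only
    except (AttributeError, OSError):
        load_1min = 0.0  # no load information: treat the machine as idle
    return max(1, cpus - int(load_1min))
```

On a busy shared server this shrinks toward a single worker; on an otherwise idle box it grows to the CPU count.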
Anybody knowing of any better languages?
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
If the point is not clear: one should program to get the maximum instructions per cycle (IPC). This is essentially what multithreading provides, i.e. the ability to execute more instructions in a given cycle.
"What's"
Can mean any of the following:
"What is"
"What does"
"What has"
So the title of this post can validly be read
"What is it Mean to Developers?"
So the answer can validly be stated as
"Yeah, it's real mean to developers".
Go on, look it up.
"It's not your information. It's information about you" - John Ford, Vice President, Equifax
This is what it means for me: http://www.cs.bell-labs.com/who/rsc/thread/
Also see Brian W. Kernighan's "A Descent into Limbo" and Dennis M. Ritchie's "The Limbo Programming Language".
And of course Hoare's classic: Communicating Sequential Processes.
Now you can enjoy the power and beauty of the CSP model in Linux and other Unixes thanks to plan9port including libthread and Inferno; yes, it's all Open Source.
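The concurrent prime sieve is a classic illustration of the CSP style: processes connected only by channels, with no shared state. Below is a hedged Python approximation, not libthread's API. `queue.Queue(maxsize=1)` stands in for a channel, although unlike a CSP or libthread channel it buffers one value instead of forcing a rendezvous. All names are illustrative.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def channel():
    # Approximates a CSP channel; it buffers one value instead of
    # forcing sender and receiver to meet.
    return queue.Queue(maxsize=1)


def generate(out, limit):
    for n in range(2, limit + 1):
        out.put(n)
    out.put(DONE)


def sieve_stage(inp, primes):
    """First value received is a prime; forward only non-multiples of it."""
    prime = inp.get()
    if prime is DONE:
        primes.put(DONE)
        return
    primes.put(prime)
    out = channel()
    threading.Thread(target=sieve_stage, args=(out, primes)).start()
    while True:
        n = inp.get()
        if n is DONE:
            out.put(DONE)
            return
        if n % prime:
            out.put(n)


def primes_up_to(limit):
    source = channel()
    primes = queue.Queue()  # unbounded collector for results
    threading.Thread(target=generate, args=(source, limit)).start()
    threading.Thread(target=sieve_stage, args=(source, primes)).start()
    result = []
    while (p := primes.get()) is not DONE:
        result.append(p)
    return result
```

Each stage spawns its successor the first time it sees a number that survives its filter. The chain grows one thread per prime, and the end-of-stream sentinel shuts the stages down in order.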
"When in doubt, use brute force." Ken Thompson
Yet another reason not to count Sun out...
Who has ever counted them out?
In my experience, parallel makes on a hyperthreading CPU don't run faster. Did you time the total build each way? Are you not running -pipe, so that you are really just parallelizing access to a disk? To do the experiment correctly, compile with make -j 4 and make -j 2, each with and without hyperthreading, getting 4 measurements in all.
Actually, the "best" way to implement the design is to split the thread state from the processing elements, then use locking on the elements. If two threads use independent processor elements, they should be simultaneously executable.
By having many instances of the more common processing elements, you would have many of the benefits of "multi-core" (in that you'd have parallel execution in the general case) but the design would be much simpler because you're working at the element level, not the core level.
Yes, none of this is really any different from hyperthreading, multi-core, or any other parallel schemes. All parallel schemes work in essentially the same way, because they all need to preserve states and lock resources.
Personally, I think REAL Parallel Processing CPUs that can handle multiple threads efficiently are already well-enough understood, they just have to become reasonably mainstream.
For myself, I am much more interested in AMD's HyperTransport bus technology, which looks like it could supplant most of the other bus designs out there.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Since I mostly work on J2EE stuff, I let the container take care of the threading for me. The one exception is J2EE Connector Architecture (JCA) bits that use the work manager. Even there, however, most of my work is simply putting a thin JCA layer in place between the outside world and the J2EE stack.
For me, these new chips simply mean increased performance for deployed apps, without any modification to the app code.
Beauty!
668: Neighbour of the Beast
http://en.wikipedia.org/wiki/Simultaneous_multith
------ Take away the right to say fuck and you take away the right to say fuck the government.
CMT is nothing more than multi-core processors. Sun is using the marketing idea of CMT to hide the fact that the UltraSparc IV is nothing more than two UltraSparc III cores on one chip.
One way to look at this is Sun maximizing their existing engineering efforts. However, by marketing it as some revolutionary feature advance, they're implying that they've done something new and exciting, as opposed to something that IBM is already doing and AMD and Intel are working on.
Beyond that, Sun and Fujitsu have a co-manufacturing and R&D deal now, confirming something those in the enterprise space have been saying for a long time - Fujitsu was making better Sun servers than Sun.
Plus Sun killed plans for the UltraSparc V, leaving only the Niagara. They have the Opteron line pushing up from below, and rapidly evaporating sales at the high end. They're resorting to marketing gibberish to add new features to the product line, while simultaneously offloading R&D and manufacturing to a partner.
Remind me again why Sun is in the hardware business?
Thanks,
Matt
me@mzi.to
As many others have already pointed out, Intel has had Hyperthreading available in Pentium 4 and Xeon CPUs for a couple of years now, which does exactly what the article is talking about.
As many others know, you know exactly nothing about what you are talking about. HT basically has two sets of registers, so that during a cache miss, which would cause a bubble, the chip switches to the other set and doesn't sit idle. Sun's chips, on the other hand, actually have multiple cores physically doing work at the same time. In fact, were it not for Intel's hideously flawed NetBurst architecture, the hideous hack that is HyperThreading would not provide any performance increase at all (in fact it doesn't so much provide an increase as negate a decrease...). For evidence, consider how many Pentium Ms have HT on them... Now, I may not be fully correct, but I didn't volunteer a comment; I only posted to prevent the misinformation of others. You'll find more on ArsTechnica. I'd link to the article, but I can't find anything on their redesigned site.
Your CPU is not doing anything else, at least do something.
Yes it *sometimes* reduces it. Not always, though. For processor-bound tasks (such as crunching numbers), it would reduce performance as you can only do so much at one time. For normal tasks, it tends to increase performance as most tasks usually are waiting on memory or disk leaving a lot of free CPU time. Also normal tasks tend to have a lot of waste due to things like branch mis-prediction. The hyperthreading concept fills in those gaps with other running threads rather than always flushing the pipeline or causing a stall on the entire thing.
Hexy - a strategy game for iPhone/iPod Touch
Try make -j5 or -j6. Tends to have better results than the -j4 on my dual Xeon rig. And yes, I have benchmarked it.
My blog. Good stuff (when I remember to update it). Read it.
Sun's upcoming "Niagara" chips are supposed to have eight cores, each core being able to execute four threads. So that allows up to 32 threads executing at once -- on one physical chip.
And we're not talking about "HyperThreading" where one of the CPUs is virtual. It's a real execution unit.
And Intel and AMD are talking about dual-cores?
This should help save space and energy (both in the power needed to run the box, and in running the cooling system).
Of course - all things are better when they're hyper*. Of course they tend to jump from A to B so quickly everything becomes blurry. Besides, jumping into hyper-multi-threading [isn't] like dusting crops, boy!
* See Compu-Global-Hyper-Mega-Net.
I want to drag this out as long as possible. Bring me my protractor.
there are still some applications where raw CPU speed matters.
We have been at the "throughput is good enough" point for several years. In truth, this is old news. I've got IRIX servers doing lots of things plenty fast, clipping along at a brisk 400MHz. There is not much you can't do with that, particularly when running a nice NUMA box.
I assume the same holds true for SUN gear. (I think their NUMA performance is a bit lower than the SGI, but I also don't think it matters for a lot of enterprise stuff.)
One application I have running, NUMA style, is MCAD. It's cool in that I have one copy of the software serving about 25 users, running on a nice NUMA server that never breaks. Admin is almost zero, except for the little things that happen from time to time --mostly user related.
However, I'm going to have to migrate this to a win32 platform. (And yes, it's gonna suck.) Why? The peak CPU power available to me is not enough for very large datasets and I cannot easily make the data portable for roaming users. (If there were more MCAD on Linux, I could do this, alas...)
Love it or hate it, the hot running, inefficient Intel / AMD cpu delivers more peak compute than any high I/O UNIX platform does. And it's cheap.
Sun is stating the obvious with the whole I/O thing, IMHO. In doing so, they avoid a core problem; namely, peak compute is not an option under commercial UNIX, and it needs to be. (And where it is, there are no applications, or the cost is just too high...)
This is where Linux is really important. It runs on the fast CPUs, but is also plenty UNIXey to allow smart admins to capture the benefits multi-user computing can provide.
Linux rocks, so does Solaris, IRIX, etc... The difference is that I can get IRIX & Solaris applications.
WISH THAT WOULD CHANGE FASTER THAN IT CURRENTLY IS.
Blogging because I can...
"Intel has had Hyperthreading available in Pentium 4 and Xeon CPUs for a couple of years now, which does exactly what the article is talking about"
You are wrong. Period. Sun's CMT is several independent CPU cores on the same die with a huge bandwidth interconnect on-die. Intel's Hyperthreading is a gimmicky technology that has a very small real-world impact on performance.
And your personal "benchmarks" cite no numbers. I be trolled!
-- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
Now all we need is a revolution in memory. A lot of software design is memory management: from GCs that solve the "running into each other" memory pool problem, to simply shoving memory contents in and out of processors to set something really simple, e.g. #11111111b*
*Yes DMA helps with some of this, but...
While conceptually unrelated, I put threads into the same mental category as untyped pointers. They are extremely powerful, but a complete PITA to debug if anything goes wrong, even more so if you are maintaining someone else's void* or pthread_create filled application.
What I've always done is code extremely defensively:
1. make the various threads data-independent enough to be free-running and only co-ordinate at the start and finish of a thread's activity. If necessary, re-architect everything in sight to make this possible.
2. when interaction is required, get a nice big coarse-grained lock and do everything that needs to be done and get it over with. profile it; there's a good chance it'll be over with quickly enough that it won't erase gains from parallelism or at least you can see what's taking so long and move it outside the lock.
3. do TONS of load testing with lots of big files and random data. thread-related bugs can often hide for years in your code. Unlike divide by zero or null pointer references, a thread bug won't necessarily give any kind of hardware fault or exception. You have to go hunt for the bugs, they won't just pop up and say hi here i am.
4. If you have multiple people of various technical abilities working on the code, you should add a grep/sed script to your makefile to check for accidental introduction of mt-unsafe library calls (strtok, ctime, etc). Flag new monitors and locks for review. Warn about dumb things like using static or global variables.
5. Last trick is to use a layer to allow your program to be compiled for fork/wait, pthread_create/pthread_join, or just plain old co-routine execution (esp if there is a socket you can set to non-blocking). In addition to being able to test your code for correctness in various situations, you also have a baseline to see if the multithreading is an actual improvement.
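Steps 1, 2 and 5 can be combined in a small sketch. Assumptions: Python threads stand in for pthread_create/pthread_join, and a `mode` switch plays the role of the compile-time layer. Only "serial" and "threads" are implemented here; a fork/wait variant is left out. `run_partitioned` is a hypothetical name. Workers share nothing while they run and take one coarse-grained lock only to publish their results.

```python
import threading


def run_partitioned(items, work, n_workers=4, mode="threads"):
    """Apply `work` to independent slices of `items` and collect the results.

    mode="threads" runs one thread per slice; mode="serial" runs the same
    slices one after another, as a correctness and timing baseline.
    Results are returned sorted so both modes can be compared directly.
    """
    slices = [items[i::n_workers] for i in range(n_workers)]
    results = []
    results_lock = threading.Lock()  # one coarse-grained lock for shared output

    def task(chunk):
        local = [work(x) for x in chunk]  # no shared state touched here
        with results_lock:                # coordinate only at the finish
            results.extend(local)

    if mode == "serial":
        for chunk in slices:
            task(chunk)
    elif mode == "threads":
        threads = [threading.Thread(target=task, args=(c,)) for c in slices]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(results)
```

Running both modes on the same input and comparing the results gives the correctness check. Timing both gives the baseline for judging whether the threading actually helps.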
With the obvious exceptions for embarrassingly parallel algorithms, I've found that humdrum client/server or middleware stuff:
(a) gets only marginal gains from multithreading
(b) you have to work for it--profiling and tuning are still required to get top-notch performance
(c) efficient scaling beyond a handful of threads is the exception, not the rule. If you have more threads than CPUs, it's a simple fact that some of them are going to be waiting, and then your scaling is done.
More like none of Sun's competitors have anything which comes remotely close.
Notice how nearly a year after Sun announced this, Intel finally admitted that clock frequency (i.e. gigahertz) isn't everything and that they'd be bringing out dual core processors?
Niagara has 8 cores each capable of 0-clock cycle latency switching between 4 different thread contexts.
Who else has working hardware and an OS to go that can do this?
Stick Men
http://www.dcs.ed.ac.uk/home/stg/pub/P/par_alg.ht
http://www.informit.com/articles/article.asp?p=36
http://static.cray-cyber.org/Documentation/Vector
Troll, go back to where you came from.
I think Sun is suggesting that they'll be canceled just as fast.
When Intel ships a CPU that can run 32-threads simultaneously, then you should ask if Intel is inventing what Sun already did.
Intel is more marketing than substance. Hyperthreading on Pentium is like 1/16th of Niagara, at best.
What that means from a programming POV is that you really need to exploit multithreading. And scalability becomes much more important. Things that scale well with 1 or 2 processors aren't going to scale with 16 or 32 processors. Lock-free synchronization will become more important since it scales better (11 on a scale of 1 to 10).
That said, I think it will be a while before lock-free becomes important, which is why I put my project on the back burner. I think what will happen instead is that Sun will position Niagara and Rock as a commodity solution, cheaper than a bunch of cheap PCs running disparate tasks. But Sun in the commodity market?
On the plus side, whatever you might think of CMT technology, the description given demonstrates the opportunity CMT brings for redundancy:
"...the execution of multiple simultaneous tasks - even on a single processor."
"Chip multi-threading (CMT) brings to hardware the concept of multi-threading, similar to software multi-threading..."
"A CMT-enabled processor, similar to software multi-threading..."
"...CMT processors, software threads can be executed simultaneously..."
"...executes many software threads simultaneously within a processor..."
"Executing software threads simultaneously within a single processor..."
I could be wrong, but I get the idea that CMT allows you to perform multiple simultaneous software threads, even within a single processor!
If I RTFA too, do I have a FTA (Fault-Tolerant Article), or am I simply creating a RAIABSF (Redundant Array of Independent Articles and Buzzword Sentence Fragments)?
Tim
Since the Pentium 4 according to Intel, but it's not a good question as that's Intel's trademarked term for their two-thread implementation of simultaneous multithreading:
By contrast, Niagara is implementing Chip-level multiprocessing:
In other words, Niagara implements in hardware, at greater scale, what Pentium 4 offers as an emulation feature. In theory one could do SMP on top of CMP chipsets for even greater throughput. If you find the Sun article too hard, the Wikipedia references I have cited will probably prove much easier to understand.
"As far as threading is concerned, one of the few languages I've dealt with that makes mutexes, semaphores, etc. easy to deal with is Java."
So does Erlang.
Wings3D is written in Erlang.
It makes good sense to fix the bottleneck, because that's where the problem lies. Improving other parts which don't have problems, according to Amdahl, is A Bad Idea (:-))
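Amdahl's law makes the point quantitative. If a fraction p of the work can run in parallel on n processors, the best possible speedup is 1 / ((1 - p) + p / n). A short Python helper (the name is illustrative) shows why the 32 hardware threads mentioned for Niagara do not mean 32x. With p = 0.9 the bound is about 7.8, and it can never exceed 10 however many threads are added.

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup when a fraction p of the work is spread over n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part (1 - p) is untouched; only the parallel part shrinks.
    return 1.0 / ((1.0 - p) + p / n)
```

For example, `amdahl_speedup(0.9, 2)` is about 1.82 and `amdahl_speedup(0.9, 32)` is about 7.80. Shrinking the serial fraction, which is the bottleneck, pays off far more than adding threads.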
davecb@spamcop.net
It would be nice to have more than hype. IIRC the Intel hyperthreading documents were mostly hype, plus a few very unimpressive benchmarks. When benchmarks by the original company are borderline, a little bell should go off. So now Sun has something similar. We're supposed to buy their new proprietary hardware and rewrite our programs and introduce concurrency bugs? And for what, a few percent improvement? Hmmmm.... Pass..
Delphi's standard library includes very convenient wrappers around such things as threads. Writing a new thread is as easy as descending from TThread and overriding the protected Execute method.
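For comparison, Python's standard library follows the same pattern: subclass `threading.Thread` and override `run()`, the counterpart of Delphi's `Execute`. This is a hedged sketch, not Delphi code, and `SumThread` is an illustrative name.

```python
import threading


class SumThread(threading.Thread):
    """Analog of a TThread descendant: subclass Thread and override run()."""

    def __init__(self, numbers):
        super().__init__()
        self.numbers = list(numbers)
        self.result = None

    def run(self):
        # Executed in the new thread after start(), like TThread.Execute.
        self.result = sum(self.numbers)
```

Usage: create `SumThread(range(10))`, call `start()`, then `join()`, after which `result` holds 45.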
Unfortunately, D7 was the last gasp of a fantastic language and environment. Between the Borland->Inprise->Borland fiasco, terrible marketing, and C#, Delphi will never be recognized for the fantastic language & product that it was.
You claim that C# is a bastardization of Java, but it is in fact a combination of Java (which, by the way, simply employs a 100% object-oriented architecture) and Delphi. The lead designer, Anders Hejlsberg, had headed up the object-orientation of Pascal as Delphi. Many Delphi developers are very comfortable in C# due to the similarities in design and structure.
yeah, i'm flamebait....
actually i was saying how cool beos was.
just because there's caps doesn't mean it's flamebait.
thanks mods.
"Martha Stewart can lick my Scrotum......do i have a scrotum?" -- Sharon Osbourne
Now we have bad grammar in the headlines.
In fact I do know a better language, Ada95/2005.
It's simply meant for threading and unconventional compiler optimizations (through the enforcement of constraints), while still being imperative and having a familiar syntax. And it's meant to be compiled, unlike Java.
Here's a site about Ada and here's another one.
A good (alas not perfect) Ada95 compiler is included in GCC 3.4.
So aye, we are ready for the CMT systems.
You are correct in saying that there is an important difference between Intel's hyperthreading and actually having independent CPU cores on the same die. But you're wrong in claiming that Intel's hyperthreading has a very small real-world impact on performance; the win can be substantial for apps with lots of memory traffic (though the win disappears in number-crunching applications), and this is a common case. For "make -j" on a large project, the Xeon win is significant.
Tcl creator John Ousterhout's paper 'Why Threads Are A Bad Idea (for most purposes)'
On the plus side, you'll have loader code that auto-routes around bad cells. --Mike--
Freaking impossible to debug!!
For "make -j" on a large project, the Xeon win is significant.
Just how significant? All the reviews when Hyperthreading came out showed 4% at best. It was kind of a joke at the time.
Actually, when you think about it, an improved threading model would strongly benefit well-programmed games. Why? Because there are a lot of semi-related processes occurring. Sound, graphics, physics, etc etc... they're all part of the game but work in very different ways.
Now if you're working with a multithreaded CPU, one processor can be handling your CPU-bound graphics work (much of this is handed off to the video card anyhow), another can be doing sound/surround mixing, etc.
In an FPS with complicated AI, you could theoretically hand that off to CPU #2 while #1 is handling different things. Your graphics engine might not have ugly-mofo-alien #235 onscreen to render, but meanwhile he's watching you and looking for a boulder that will offer him good cover to snipe you from, instead of just sitting like a drone waiting for a computer-accurate headshot.
Now let's say that PCs go multi-CPU. Maybe you don't need a single superpowerful processor, just a video card and a few lower-powered processors. Processor #1 is handing off the environmental data, #2 is prepping it for rendering and shovelling your GPU full of vertices, #3 is playing pinpoint surround for that cricket chirping behind the rock on your far left, and #4 is doing AI for ugly alien mofo #287.
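Here is a sketch of the per-frame split described above. It assumes that each subsystem reads the same snapshot of the world taken at the start of the frame and returns its own updates, which are merged once all of them finish. The subsystem functions and snapshot fields are hypothetical. In CPython a thread pool only overlaps these tasks rather than running pure-Python code on several cores at once.

```python
from concurrent.futures import ThreadPoolExecutor
from math import hypot


def step_physics(snap):
    x, y = snap["player_pos"]
    vx, vy = snap["player_vel"]
    dt = snap["dt"]
    return {"player_pos": (x + vx * dt, y + vy * dt)}


def mix_audio(snap):
    # Toy inverse-distance attenuation for a single sound source.
    px, py = snap["player_pos"]
    sx, sy = snap["cricket_pos"]
    return {"cricket_volume": 1.0 / (1.0 + hypot(px - sx, py - sy))}


def update_ai(snap):
    # The enemy picks the cover point nearest to itself.
    ex, ey = snap["enemy_pos"]
    cover = min(snap["cover_points"], key=lambda c: hypot(c[0] - ex, c[1] - ey))
    return {"enemy_target": cover}


SUBSYSTEMS = (step_physics, mix_audio, update_ai)


def run_frame(snap, pool):
    """Run every subsystem on the same snapshot concurrently, then merge."""
    futures = [pool.submit(system, snap) for system in SUBSYSTEMS]
    updates = {}
    for fut in futures:  # result() waits: this is the end-of-frame join
        updates.update(fut.result())
    return updates
```

Because no subsystem writes to the shared snapshot, no locks are needed during the frame. The only synchronization point is the join at the end, where the updates are merged into the next frame's state.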
When I think about how games are advancing a lot can come down to interprocess communications and/or bandwidth limitations. The GPU still handles much of the video stuff so your CPU isn't really a bottleneck there in many cases, but as internet connections speed up then you're going to have MMORPGs, FPS's, and more chock full of "actors" that make up sight, sound, physics, and AI that could very well benefit from more CPU's rather than extra ticks on your overclocked single processor.
After all, eye-candy is only a part of realism. True realism is also very much about a multitude of things happening at once.
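A minimal sketch of the "one subsystem per CPU" idea, assuming hypothetical subsystem names and placeholder work: each subsystem runs on its own thread, the main loop hands every thread the current frame, and a simple frame barrier waits for all of them before moving on. In CPython the global interpreter lock keeps pure-Python threads from running truly in parallel, so this shows the structure only, not the speedup.

```python
import queue
import threading

# Hypothetical subsystem names; each stands in for real per-frame work.
SUBSYSTEMS = ("physics", "sound", "ai")

def subsystem_worker(name, inbox, results):
    """Handle frame numbers from inbox until a None sentinel arrives."""
    while True:
        frame = inbox.get()
        if frame is None:
            return
        # Placeholder for physics/mixing/AI: just report what was done.
        results.put((frame, name))

def run_frames(num_frames):
    inboxes = {name: queue.Queue() for name in SUBSYSTEMS}
    results = queue.Queue()
    workers = [
        threading.Thread(target=subsystem_worker, args=(name, inboxes[name], results))
        for name in SUBSYSTEMS
    ]
    for w in workers:
        w.start()
    log = []
    for frame in range(num_frames):
        for inbox in inboxes.values():
            inbox.put(frame)  # every subsystem works on this frame concurrently
        # Frame barrier: wait for one result per subsystem before moving on.
        log.append(sorted(results.get() for _ in SUBSYSTEMS))
    for inbox in inboxes.values():
        inbox.put(None)  # shut the workers down
    for w in workers:
        w.join()
    return log
```

The per-frame barrier keeps subsystems in lockstep. A real engine might let AI run several frames behind rendering instead, which is exactly where the interprocess-communication cost mentioned above comes in.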
Make a comment and ask a question and get marked as troll.
Go figure.
Hexy - a strategy game for iPhone/iPod Touch
I think the most interesting part of the article was when it said "Processor speed has increased many times -- it doubles every two years, while memory is still very slow, doubling every six years."
So maybe it would be more efficient for people to stop screwing around with new processor design ideas for a while and put a little effort into doubling the speed of memory access (and I don't mean by adding yet another level of cache). Selling motherboards with a faster memory bus would be easy: just give it a cool-sounding name, kind of like Sega's "Blast Processing". Let's call it "HyperRAM Technology!"
Losing faith in humanity one person at a time.
In reality, games tend not to be I/O bound (except for graphics, which tend not to be interrupt-driven anyway), but many other workloads (think heavy, HEAVY networking) are I/O bound, and they are bound in the interrupt processing. Freeing one CPU to handle just interrupt processing lets the other CPU do the important user-level work.
I have mod points and I am not afraid to use them
How about something as simple as a search through an array of data.
You could have a CPU do a sort-based search or a linear search. Or you could have two CPUs divide and conquer your list independently and tackle it on a first-there basis.
For example, take a list of names. If threading became commonplace (if multi-CPU machines were common), a thread function might look like this:
Take a list of 10000 clients in which you are looking for client "Doug Ellis".
CPU #1 takes items 1 through 5000 (assuming the list is sorted alphabetically by last name), and CPU #2 takes items 10000 down to 5001.
The race begins: CPU #1 hits the result first, and CPU #2's process can be terminated. OK, so this probably isn't much faster than your standard search.
But how about "Al Sanders"? A single CPU would need more divides to reach that name, repeatedly splitting the list and checking whether the target comes before or after the middle name. A linear search would take longer.
Either way, a second CPU starting from the end of the list would tag that name first, ending the search faster. Of course, a single CPU that was truly 2x faster than either of the duals would do nicely as well... but once we've reached a point where more MHz isn't so easily forthcoming, dual CPUs handle the situation nicely indeed.
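The two-CPU race can be sketched as a linear scan from both ends, where the first thread to find the name sets a flag and the other stops early. This is a minimal sketch assuming an unsorted list and a single match; on a sorted list of 10000 names, a plain binary search needs at most 14 comparisons, so racing two threads buys very little there. CPython threads also share one interpreter lock, so this illustrates the control flow rather than a real 2x speedup.

```python
import threading

def race_search(items, target):
    """Scan from both ends at once; return the index the first thread finds, or None."""
    found = threading.Event()
    lock = threading.Lock()
    result = []

    def scan(indices):
        for i in indices:
            if found.is_set():
                return  # the other thread already won; stop scanning
            if items[i] == target:
                with lock:
                    if not found.is_set():
                        result.append(i)
                        found.set()
                return

    mid = len(items) // 2
    # "CPU #1" scans 0 .. mid-1 upward; "CPU #2" scans the end down to mid.
    front = threading.Thread(target=scan, args=(range(0, mid),))
    back = threading.Thread(target=scan, args=(range(len(items) - 1, mid - 1, -1),))
    front.start()
    back.start()
    front.join()
    back.join()
    return result[0] if result else None
```

For example, `race_search(names, "Sanders, Al")` on a 10000-entry list returns whichever index the back-end thread reaches first when the name sits near the end.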
http://www.annexia.org/tmp/multithreading.ps
Rich.
libguestfs - tools for accessing and modifying virtual machine disk images
all you need is the ability to run processes... which I do right here.... on this abacus...
-pyrrho
Bruce
Bruce Perens.
I'm not certain, but I thought Intel's HT processors still had only one set of execution units. They just have two fetch-and-decode streams and fast context switching between them.
Sure this 8 core chip won't be good for everything. But when you've got a lot of similar processes in some server environment then it should do very well.
My re-entry into programming as a hobbyist was via BeOS circa 1998 (I did a lot of CS in college but eventually decided to go to medical school). BeOS had a design philosophy that everything should be multithreaded as much as possible. It made for a very user-responsive system, but it also made almost all apps susceptible to race conditions. The BeOS API was very "fun" and easy, but I think it was a little deceptive. My two fairly usable apps were a MineSweeper clone and a front-end for a single-player chess program. In both cases, I had to deal with synchronization issues to avoid bizarre behavior (e.g. making sure the chessboard displayed each piece once and only once).
There was a great discussion around the time Be broke up involving some of the Be engineers where it was acknowledged that the "pervasive multithreading" idiom really made it exceptionally difficult to write bug-free apps, and also imposed extraordinary demands on the OS with respect to messaging. (Usually, we think of the OS as managing memory allocation, processor scheduling, and disk I/O, but under BeOS the app_server had an additional, highly critical role in handling messages between and within threads).
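The chessboard bug described above, where a piece could appear twice or not at all, is the classic case of a compound update racing a reader. A minimal sketch, assuming a toy board stored as a dict: the move (remove from one square, add to another) and the display's snapshot both take the same lock, so the display never sees a half-finished move. The class and square names are hypothetical and not from the BeOS API.

```python
import threading

class Board:
    """Toy chessboard; one lock makes each move atomic with respect to readers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._squares = {"e2": "P", "g1": "N"}  # square -> piece (toy subset)

    def move(self, src, dst):
        # Pop and re-insert under one lock so no reader sees the piece
        # on both squares, or on neither, in the middle of a move.
        with self._lock:
            piece = self._squares.pop(src)
            self._squares[dst] = piece

    def snapshot(self):
        # Copy under the same lock so the display draws a consistent board.
        with self._lock:
            return dict(self._squares)
```

Without the lock in `snapshot`, a drawing thread could copy the dict between the `pop` and the assignment and render only one of the two pieces.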
Well, it means you should be pestering the Qt developers to make Qt thread-safe; without a thread-safe toolkit, your hands are tied.
(Qt is GPL, so there's no reason a bunch of developers couldn't get together and make a thread-safe version of Qt.)
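A common workaround when a GUI toolkit is not thread-safe is to confine every toolkit call to one thread and have worker threads post results to it through a queue. The sketch below assumes no particular toolkit: the list `label_lines` is a hypothetical stand-in for a widget, and the calling thread plays the UI thread. It does not show Qt's own cross-thread mechanisms.

```python
import queue
import threading

def background_job(job_id, ui_updates):
    result = job_id * job_id  # stand-in for slow work done off the UI thread
    # Never touch widgets here; hand the result to the UI thread instead.
    ui_updates.put(f"job {job_id} -> {result}")

def run_ui(num_jobs):
    """The calling thread plays the UI thread: it alone applies updates."""
    ui_updates = queue.Queue()
    workers = [threading.Thread(target=background_job, args=(i, ui_updates))
               for i in range(num_jobs)]
    for w in workers:
        w.start()
    label_lines = []  # hypothetical stand-in for a widget's contents
    for _ in range(num_jobs):
        # A real toolkit would drain this queue from its event loop.
        label_lines.append(ui_updates.get())
    for w in workers:
        w.join()
    return sorted(label_lines)
```

The trade-off is that all UI work is serialized through one thread, which is exactly what a thread-safe toolkit would let you avoid.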
thank God the internet isn't a human right.
Multithreading - What's it Mean to Developers?
Just another word to confuse your boss...
buffering...
Personally, I would be more pissed at Oracle. Very few software vendors are taking this approach. MS SQL Server doesn't charge per core. Oracle even charges 2x for a hyperthreaded x86. It's just plain gouging by Oracle. Another slimy thing they do is charge you for the Enterprise edition based on the number of cores you can put into your server: not the number in there, but the number it CAN have. So if you happened to buy a 4-CPU SPARC in the UltraSPARC III days, you could pay for Oracle "Standard". But now, since it's possible to put dual-core UltraSPARC IVs in, the box is no longer allowed to use the "Standard" edition because it is capable of holding 8 cores. How is THAT Sun's fault?
Oracle sales is the used car lot of the software industry.
If you don't happen to be running Oracle, then everything is usually fine. Resin charges per server, so use that instead of BEA. Use DB2, SQL Server, or PostgreSQL instead of Oracle. Oracle's cool, but the licensing insanity makes it less than useful in practice.
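The per-socket versus per-core difference in the 4-CPU SPARC example comes down to simple arithmetic. This is an illustrative sketch with a hypothetical function name; it does not reproduce any vendor's actual price list, multipliers, or "maximum capacity" rules.

```python
def license_units(sockets, cores_per_socket, policy):
    """Licences needed for a box under a simple per-socket or per-core policy.

    Illustrative only: real price lists add multipliers and special cases.
    """
    if policy == "per_socket":
        return sockets
    if policy == "per_core":
        return sockets * cores_per_socket
    raise ValueError(f"unknown policy: {policy!r}")

# A 4-socket box, first with single-core, then with dual-core processors.
for cores in (1, 2):
    print(cores, license_units(4, cores, "per_socket"), license_units(4, cores, "per_core"))
```

Swapping single-core for dual-core parts leaves the per-socket count at 4 but doubles the per-core count from 4 to 8, which is the jump the comment above complains about.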
Either way, that's a good word: a warning that it's a simplification, some taking of responsibility for the errors that will come from the simplification, the implication that the simplification might reach the level of a lie, and a little fun with language.
Thanks for putting on the feedbag. Thanks for going all out. Thanks for showing me your Swiss Army knife.
I just tested it with GCC 2.95.3, 3.2.1, 3.3, and 3.4.2, and it works fine. Of course, GCC is just ignoring the #pragma. I didn't know about OpenMP before this, but it does look like a good way to "optimize later" and have your code still compile with gcc. And you don't have to write and maintain two different versions separated by #ifdef, #else, #endif.
My other first post is car post.
It's not really a language, but it does allow the programmer to say that sections of code could be parallelized, and it will handle the number of threads, the forking, dividing up the work, and joining the threads. The programmer still has to make sure that the code is really parallelizable.
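OpenMP itself works through compiler directives in C, C++, and Fortran, such as `#pragma omp parallel for` on a loop. A loose Python analogue of "mark the loop, let the runtime pick the threads and split the work" is `concurrent.futures`. This sketch is not OpenMP. Like a compiler that ignores the pragma, it falls back to plain serial execution (here when `workers=1`), and the function passed in must be independent per item. For CPU-bound pure-Python work, `ProcessPoolExecutor` rather than threads would be needed to get a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    # Each call is independent: no shared mutable state, which is the same
    # requirement a loop must meet before you mark it parallel.
    return x * x

def parallel_map(func, items, workers=None):
    """Apply func to items; workers=1 runs serially, None lets the pool choose."""
    if workers == 1:
        return list(map(func, items))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map splits the work across threads and returns results in order.
        return list(pool.map(func, items))
```

Both `parallel_map(work, range(5))` and `parallel_map(work, range(5), workers=1)` give the same list, so the code keeps one version rather than two separated by `#ifdef`.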
I guess a few years ago you would have said: With Sun, IBM, Intel, and AMD all going SMP, companies like Oracle are just milking it for all its worth before they have to cave in and charge per machine. It is inevitable.
What is so special about a socket? Why is per-socket pricing legitimate, but per-core is not?
Oops, it looks like I was wrong...what the article is talking about is definitely not the same as hyperthreading, but goes far beyond this! Sorry, my bad....
However, I was certainly not "trolling" when I claimed that Hyperthreading really is impressive. As a quick demonstration, I compiled the latest version of ImageMagick (6.2.0) on my dual-Xeon (3.0 GHz) with Hyperthreading on, with 1, 2, 4, and 6 threads (make -j):
make -j 1: 6:26
make -j 2: 4:09
make -j 4: 2:54
make -j 6: 2:48
Anyone have a dual-CPU box without hyperthreading to compare this to? When I've tried it in the past without Hyperthreading, you don't get nearly that much of a speed boost from 4 jobs instead of 2.
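The posted timings can be turned into speedups relative to `make -j 1`. This sketch only re-does the arithmetic on the numbers above; it adds no new measurements.

```python
def to_seconds(mmss):
    """Convert an "m:ss" wall-clock string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Wall-clock times from the comment above: make -j N on a dual Xeon with HT.
TIMINGS = {1: "6:26", 2: "4:09", 4: "2:54", 6: "2:48"}

def speedups(timings):
    """Speedup of each job count relative to the single-job build."""
    base = to_seconds(timings[1])
    return {jobs: base / to_seconds(t) for jobs, t in timings.items()}
```

That works out to roughly 1.55x at `-j 2`, 2.22x at `-j 4`, and 2.30x at `-j 6`. Going from 2 to 4 jobs alone is about 1.43x (249 s down to 174 s). Part of that gain may come from overlapping disk I/O with compilation, which happens with or without HT, so the non-HT dual-CPU comparison asked for above is the right control.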
I always thought Microsoft's Visual "X" developer products did the same thing. That is, they lead to lazy (and often unskilled) programmers who A) can't optimize to save their jobs, and B) can't debug to save their jobs either.
Think Deeply.
Have you never heard "what does" shortened to "what's"? Come on, don't jerk me off here.
MS per-CPU licensing is per physical CPU socket, not per CPU core.
OK, let's face it: 98.9% of the time these decisions are made by the OS, the compiler, or the VM. Very few programmers out there are really capable of making these decisions, and even fewer work in an environment where they are allowed to make them.
This is of interest to the OS developers, the compiler developers, and people who work on Beowulf clusters.
Users who are running an application mix with a lot of different threads (or processes) will see a benefit as well.
Infant, come back for a spanking. Or don't you know any other words except poser nerdspeak?
--
make install -not war
Not sure I agree with that. The thing is if your box doesn't have enough speed after you have optimised the application and the database, there's not much you can do about it. If it doesn't have enough throughput, you can add more boxes.
For example, I've been working on a nontrivial system which has a 200 ms response-time ceiling and huge data volumes. That is hard to achieve (much harder than on a previous project I worked on, which had a 100 µs response-time requirement). The system with the 200 ms requirement will also have large transaction volumes. However, the response time is the scary bit. Adding more servers will deal with the total volumes, but if we can't handle each individual request within 200 ms, we're in real trouble.
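The throughput-versus-latency point can be shown with back-of-envelope arithmetic. This is a simplified sketch that assumes no queueing delay and evenly spread requests; the 250 ms service time is a hypothetical figure, and only the 200 ms ceiling comes from the comment.

```python
def capacity_rps(servers, service_time_s):
    """Requests per second a pool can sustain if each request takes service_time_s.

    Ignores queueing delay and assumes requests spread evenly over servers.
    """
    return servers / service_time_s

def meets_ceiling(service_time_s, ceiling_s):
    # The server count does not appear here: more boxes do not make
    # any single request finish sooner.
    return service_time_s <= ceiling_s

# Hypothetical numbers: each request needs 250 ms of work; the ceiling is 200 ms.
for servers in (10, 100):
    print(servers, capacity_rps(servers, 0.25), meets_ceiling(0.25, 0.2))
```

Ten times the servers gives ten times the capacity (40 to 400 requests per second), but every request still misses the 200 ms ceiling. Only faster per-request processing fixes that.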
And yes, you're right. Throwing hardware at the problem is expensive. We now have more kit installed than I have ever seen in one place before. In fact, at a rough estimate, more Sun boxes than I have eaten pizzas, and I've been working in IT for 15 years...
For a developer, the OS provides the interface for handling threads. Yes, we may have to handle locks on some platforms, but the OS still does most of the heavy lifting. What Sun describes would mainly apply to the interface between the hardware and the part of the OS that schedules processes and/or threads (some OSes, like Linux, treat threads and processes as contexts and don't differentiate between them for scheduling), rather than something a developer can make direct use of.
Sure, Intel uses the concept of logical processors for Hyperthreading, but the main thing that does is allow backward compatibility with the APIs available in current OSes. I'm sure that, at some point in the future, OSes may provide some other great way to use the capability.
I don't see that Sun has offered anything significantly different. Yes, they're handling caching differently, and their solution may actually work better, but the overall concept and net impact look very similar. A developer will still use a threading API, so for a given hardware/OS/development environment, there should be little difference. Future API developments and thread-safe libraries will make the biggest difference to the developer.
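From the application side this looks roughly as follows: the program asks the OS how many logical processors there are and sizes its thread pool to match, and the OS scheduler decides which core or hardware thread runs each thread. Python's `os.cpu_count()` reports logical CPUs, so Hyperthreading siblings are counted alongside physical cores. The helper names below are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def logical_cpus():
    # os.cpu_count() reports logical processors (hardware threads included)
    # and may return None when the count cannot be determined.
    return os.cpu_count() or 1

def run_tasks(tasks):
    """Run zero-argument callables on a pool sized to the logical CPU count."""
    with ThreadPoolExecutor(max_workers=logical_cpus()) as pool:
        # The OS, not this code, decides which hardware thread runs each worker.
        return list(pool.map(lambda task: task(), tasks))
```

Nothing here changes whether the logical CPUs are HT siblings, CMT strands, or separate chips, which is the point made above about the API staying the same.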
Moderation -1
100% Offtopic
I point out that HW threading support of SW threading techniques needs a mapping app, which would be gcc. How is that offtopic? And why the flame?
This is not college. Slashdot does not start with "there are no stupid questions". There are, you asked one, AND it was already more covered than the genitals in a Tyrolean soft-sex movie.
For Wikipedia? ;D
So basically I can run a bunch of threads slowly simultaneously, and pay 10x the cost of AMD. Thanks but no thanks.
Vote for Pedro
Multithreading is really cool. Maybe it's about time programmers took a look at CSP, CCS, the pi-calculus, and other formal models and languages for parallel programming.
:D
Maybe the transputer and occam will even return.
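CSP and occam structure a program as processes that share no state and talk only over channels. A minimal sketch of that style in Python, assuming `queue.Queue(maxsize=1)` as a stand-in for a channel: occam channels are unbuffered rendezvous, which Python's queue cannot express exactly (a `maxsize` of 0 means unbounded), so this only approximates the semantics. `None` is a hypothetical end-of-stream marker.

```python
import queue
import threading

def producer(out_ch, n):
    for i in range(n):
        out_ch.put(i)  # blocks while the one-slot channel is full
    out_ch.put(None)  # end-of-stream marker

def squarer(in_ch, out_ch):
    # No shared variables: this process only reads one channel and writes another.
    while (item := in_ch.get()) is not None:
        out_ch.put(item * item)
    out_ch.put(None)

def pipeline(n):
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    stages = [threading.Thread(target=producer, args=(a, n)),
              threading.Thread(target=squarer, args=(a, b))]
    for s in stages:
        s.start()
    results = []
    while (item := b.get()) is not None:
        results.append(item)
    for s in stages:
        s.join()
    return results
```

Because each stage touches only its own channels, there are no locks to get wrong, which is much of CSP's appeal for the debugging complaints raised elsewhere in this thread.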
The technique you are looking for is called "cache coloring". If you search for those two terms in CiteSeer, you'll get about 60 papers back.
Effectively, multicore architectures are morally very similar to ccNUMA. In both cases you're talking a hierarchy of execution units.
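The core arithmetic of page coloring: in a physically indexed cache, pages whose frame numbers are equal modulo the number of colors compete for the same cache sets, where the number of colors is cache size divided by (associativity × page size). An allocator can give different threads pages of different colors so they don't evict each other in a shared cache. This sketch uses hypothetical cache parameters (1 MiB, 4-way, 4 KiB pages) and ignores details such as hashed set indexing.

```python
def num_colors(cache_bytes, associativity, page_bytes):
    """How many distinct page-sized slices fit in one cache way."""
    way_bytes = cache_bytes // associativity
    return max(1, way_bytes // page_bytes)

def page_color(phys_addr, cache_bytes, associativity, page_bytes):
    """Color of the page containing phys_addr (physically indexed cache assumed)."""
    frame = phys_addr // page_bytes
    return frame % num_colors(cache_bytes, associativity, page_bytes)

# Hypothetical cache: 1 MiB, 4-way set associative, 4 KiB pages -> 64 colors.
CACHE, WAYS, PAGE = 1 << 20, 4, 4096
print(num_colors(CACHE, WAYS, PAGE), page_color(0x12345000, CACHE, WAYS, PAGE))
```

Two pages exactly 64 frames apart get the same color, so an allocator that hands one thread only colors 0 to 31 and another only 32 to 63 keeps their working sets in disjoint cache sets.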
http://www-1.ibm.com/servers/eserver/iseries/perf
http://www.cs.washington.edu/research/smt/
It uses fast task switching (2, 3, or 4 cycles per switch) among many software tasks on only one real core (more cores are better!).
open4free ©
Imagine that the core has 1 active task and many sleeping tasks ...
And the special fast task switch for SMT consumes only 2 cycles.
Without SMT, when the first task stalls on a miss (for example, a cache miss), the remaining time goes unused.
With SMT, when the first task stalls on a miss, the remaining time is used by the fast SMT task switch and the 2nd, 3rd, 4th, ... tasks in its scheduling list!
open4free © it's the best IPC. More cores are better!!!
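The switch-on-miss scheme described above can be estimated with a simple model. The model is an assumption and does not describe any particular chip. Every thread computes for `compute_cycles`, then stalls for `miss_cycles`. The core pays `switch_cycles` (2, per the comment) each time it moves to another thread. A thread cannot run again until its own miss has been served. The cycle counts in the example are hypothetical.

```python
def core_utilization(threads, compute_cycles, miss_cycles, switch_cycles):
    """Fraction of cycles doing useful work under switch-on-miss threading (toy model)."""
    if threads == 1:
        # No other thread to switch to: the core idles through every miss.
        return compute_cycles / (compute_cycles + miss_cycles)
    # One round: each thread runs once with a switch after it, but the round
    # cannot be shorter than one thread's compute-plus-miss time.
    period = max(threads * (compute_cycles + switch_cycles),
                 compute_cycles + miss_cycles)
    return threads * compute_cycles / period

# Hypothetical workload: 20 cycles of work per 100-cycle miss, 2-cycle switch.
for n in (1, 4, 8):
    print(n, round(core_utilization(n, 20, 100, 2), 3))
```

With these numbers a single thread keeps the core busy only about 17% of the time, four threads about 67%, and eight threads about 91%. Past that point the switch overhead, not the misses, sets the ceiling.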
Microsoft is winning these datacenters one at a time.
Vendor = UGS in this case.
Top reason: Which Linux do we support?
Of course I tell them to just pick one and let their users sort it out. No dice. They believe expectations are too hard to manage, and they worry about what happens when their particular Linux dies.
Dorks.
Blogging because I can...
Why they don't see this is beyond me. All they need to do is spec what their config is and the rest can be handled by the users, VAR, whatever.
PTC does it, why can't the others?