Apple Freezes Snow Leopard APIs
DJRumpy writes in to alert us that Apple's new OS, Snow Leopard, is apparently nearing completion. "Apple this past weekend distributed a new beta of Mac OS X 10.6 Snow Leopard that altered the programming methods used to optimize code for multi-core Macs, telling developers they were the last programming-oriented changes planned ahead of the software's release. ...Apple is said to have informed recipients of Mac OS X 10.6 Snow Leopard build 10A354 that it has simplified the... APIs for working with Grand Central, a new architecture that makes it easier for developers to take advantage of Macs with multiple processing cores. This technology works by breaking complex tasks into smaller blocks, which are then... dispatched efficiently to a Mac's available cores for faster processing."
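For anyone wondering what "breaking complex tasks into smaller blocks" and dispatching them might look like, here is a rough sketch of the general pattern. It is not Grand Central's API: it swaps in Python's standard thread pool, and `split_into_blocks`, `process_block` and `run_dispatched` are made-up names for illustration only.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(items, block_size):
    """Break one large job into small, independent blocks of work."""
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]


def process_block(block):
    """Hypothetical unit of work: sum the squares in one block."""
    return sum(x * x for x in block)


def run_dispatched(items, block_size=1000):
    blocks = split_into_blocks(items, block_size)
    # Size the worker pool to the number of cores the OS reports.
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool, not the caller, decides which worker runs each block.
        partial_sums = list(pool.map(process_block, blocks))
    # Combine the per-block results into the final answer.
    return sum(partial_sums)


if __name__ == "__main__":
    print(run_dispatched(list(range(10_000))))
```

In CPython the global interpreter lock keeps these threads from running Python bytecode on several cores at once, so this shows the dispatch pattern rather than a real speed-up.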
Why... is there... there so much... punctionations in the summary?
Because the summary is directly quoting the article and using ellipses to indicate that certain parts of the quotation have been omitted. Usually there would be a space on either side of the ellipsis when this was done, but this is /. so I'll let this one slide.
Because it's a quote. You see, there are rules to any language, and one of them in English concerns quoting. When you quote a source, the text you write must match the source word for word. When the quote contains text that is unnecessary to the topic at hand, you cut that part out and replace it with three periods. This indicates that a piece is missing from the original quote, in case, e.g., someone questions the quote. So you see, quoting is not interpreting, and must, at all times, match the source word for word.
Turn to side B for the next lesson.
I am the lawn!
The problem is shared memory, not multiple processors or cores as such. Graphics cards have dedicated memory or reserve a chunk of main memory.
It is true, because they privilege immutable data structures which are safe to access concurrently.
No, an ellipsis is not a change to the text but a deletion from the text.
I'm one of the seed testers, and even posting anonymously, I am concerned not to violate Apple's NDA. So, I'll put it like this: I have 2 PPC machines and an Intel machine. I have only been able to get the SL builds to work on the Intel machine due, I'm pretty sure, to no fault of my own.
I'm by no means a multiprocessing expert, but I suspect the problem with your approach is in the overhead. Remember that the hardest part of multiprocessing, as far as the computer is concerned, is making sure that all the right bits of code get run in time to provide their information to the other bits of code that need it. The current model of multi-CPU code (as I understand it) is to have the programmer mark the pieces that are capable of running independently (either because they don't require outside information, or because they never run at the same time as other pieces that need the information they access/provide), and tell the program when to spin off these modules as separate threads and where it will have to wait for them to return information.
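To make that model concrete, here is a minimal sketch of the "programmer marks the independent pieces" approach, using Python's standard `threading` module. The functions `word_count`, `char_count` and `analyze` are invented for illustration; the point is only where the threads are spun off and where the program has to wait.

```python
import threading


def word_count(text, results):
    # Independent piece: reads only its input, writes only its own slot.
    results["words"] = len(text.split())


def char_count(text, results):
    # Independent piece: never touches the "words" slot.
    results["chars"] = len(text)


def analyze(text):
    results = {}
    # The programmer has decided these two pieces can run independently.
    t1 = threading.Thread(target=word_count, args=(text, results))
    t2 = threading.Thread(target=char_count, args=(text, results))
    # Spin both off as separate threads.
    t1.start()
    t2.start()
    # The next step needs both answers, so wait here for both to finish.
    t1.join()
    t2.join()
    # This piece depends on the other two and must run after them.
    return results["chars"] / max(results["words"], 1)


if __name__ == "__main__":
    print(analyze("breaking complex tasks into smaller blocks"))
```

Nothing here is discovered automatically: the start and join points are exactly the programmer's claims about which pieces are independent.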
What you're talking about would require the program to break out small chunks of itself, more or less as it sees fit, whenever it sees an opportunity to save some time by running in parallel. This first requires the program to have some level of analytical capability for its own code. (Let's say we have two if statements, one right after the other. Can they be run concurrently, or does the result of the first influence the second? What about two function calls in a row?) The program will also have to erect mutex locks around each piece of data it uses, just to be sure that it doesn't corrupt that data through race conditions if it misjudges whether two particular pieces of code can in fact run simultaneously.
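A small sketch of why shared data needs guarding when pieces run at the same time, again with Python's standard `threading` module. The `account` dictionary and the `deposit_many` and `run` functions are hypothetical, chosen only to show several threads updating the same value.

```python
import threading


def deposit_many(account, lock, times):
    for _ in range(times):
        # "balance += 1" is a read, an add and a write. The lock keeps
        # another thread from slipping in between those steps.
        with lock:
            account["balance"] += 1


def run(threads=4, times=10_000):
    account = {"balance": 0}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=deposit_many, args=(account, lock, times))
        for _ in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return account["balance"]


if __name__ == "__main__":
    print(run())
```

Every update pays for taking and releasing the lock, which is part of the overhead a program would take on if it had to guard everything just in case.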
It also seems to me (again, I'm not an expert) that you'd spend a lot of time moving data between CPUs. As I understand it, one of the things you want to avoid in parallel programming is having a thread "move" to a different CPU. This is because all of the data for the thread has to be moved from the cache of the first CPU to the cache of the second, a relatively time-consuming task. Multicore CPUs share level 2 cache, I think, which might alleviate this, but the stuff in level 1 still has to be moved around, and if the move is off die, to another CPU entirely, then it doesn't help. In your solution I see a lot of these moves being forced. I also see a lot of "Chunk A and Chunk B provided data to Chunk C. Chunk A ran on CPU1, Chunk B on CPU2, and Chunk C has to run on CPU3, so it has to get the data out of the cache of the other two CPUs".
Remember that data access isn't a flat speed. L1 is faster than L2, which is much faster than RAM, which is MUCH faster than I/O buses. Anytime data has to pass through RAM to get to a CPU, you lose time. With lots of little chunks running around getting processed, the chances of having to move data between CPUs go up a lot. I think you'd lose more time on that than you'd gain over letting the bits all run on the same CPU.
I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
Unfortunately that's not the issue at hand. You're referring to the video card using system RAM as its own, but the issue they're talking about (which only occurs in the 32-bit world, not 64-bit, due to the MMU) is that to address the memory on the video card, that memory has to be mapped into the same 32-bit addressable block as the RAM, which cuts into how much of the RAM can be addressed, rather than the card physically using it. At least, that's how I understand it works.
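The arithmetic behind that can be sketched in a few lines. This is a deliberately simplified model: it assumes one flat address space shared by RAM and video memory, and it ignores other device mappings and tricks such as PAE or firmware memory remapping. The function name `usable_ram` and the 512 MiB card are made up for the example.

```python
GIB = 2**30
MIB = 2**20


def usable_ram(installed_ram, video_memory, address_bits=32):
    """RAM that still fits once video memory is mapped into the same space."""
    address_space = 2**address_bits
    # Video memory takes addresses, not RAM chips, but those addresses
    # can no longer be used to reach installed RAM.
    left_for_ram = address_space - video_memory
    return min(installed_ram, left_for_ram)


if __name__ == "__main__":
    # 4 GiB installed, hypothetical 512 MiB video card, 32-bit addressing.
    print(usable_ram(4 * GIB, 512 * MIB) // MIB)  # 3584
    # Same hardware with 64-bit addressing: nothing is squeezed out.
    print(usable_ram(4 * GIB, 512 * MIB, address_bits=64) // MIB)  # 4096
```

With less than about 3.5 GiB installed in this example, the mapping costs nothing, which is why the problem only shows up on machines with RAM close to the 4 GiB limit.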
Snow Leopard is going to be the first version of Mac OS X that only runs on Intel Macs, so I'm afraid you're going to be stuck on plain old Leopard.