Hannibal_Ars · Slashdot Mirror

Re:The benchmarks are bogus on Preliminary OS X & PPC 970 Benchmarks · 2003-05-05 14:57 · Score: 2, Insightful

What benchmarks? There are no publicly available benchmarks for the 970 that show anything, aside from whatever numbers IBM's PR dept. released as "projected" SPEC numbers and whatever numbers some Mac rumor site cooks up to drive up traffic.

Re:The benchmarks are bogus on Preliminary OS X & PPC 970 Benchmarks · 2003-05-05 12:05 · Score: 4, Informative

Ok, I see that I should've clarified a bit more, but I was in a hurry to get out of the house so I just fired off a quick post. First off, see this thread:

http://arstechnica.infopop.net/OpenTopic/page?q=Y&a=tpc&s=50009562&f=8300945231&m=3470943335&p=56

Check out pages 52 onward for some detailed discussion of these issues in advance of my article.

Now for a preliminary explanation (let's see if I can condense this):

In a very small nutshell, the 970 has less general-purpose integer hardware than either the G4e or the P4. It has two general-purpose ALUs (arithmetic logic units, which do integer computation) that are mostly symmetric. This means that both ALUs handle almost all types of integer ops with a two-cycle latency. There are some differences, though; more on that in a sec.

The G4e, on the other hand, has one complex integer unit and three simple integer units. The three simple integer units have a one-cycle latency and handle all the basic types of integer instructions (add, subtract, logical ops, etc.). Longer, more complex multi-cycle instructions (multiply, divide, and the like), which are few in number and show up statistically less often than the fast integer ops, are handled in the complex ALU.

So a basic comparison of ALU hardware shows you that the G4e has slightly more integer hardware that's more specialized and hence potentially faster. (Think of a supermarket with two general-purpose checkout lanes vs. a supermarket with three express lanes and one general-purpose checkout lane.)
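A back-of-the-envelope sketch of why those latencies matter. It uses only the figures quoted in this post (two general-purpose ALUs with two-cycle latency on the 970; three one-cycle simple units on the G4e, ignoring its complex unit) and assumes idealized, fully pipelined units with no dispatch, rename, or cache effects, so treat the numbers as illustrative, not as predictions.

```python
import math

# Toy parameters taken from the post: (simple-integer units, latency in cycles).
# Assumption: units are fully pipelined and nothing else limits throughput.
CORES = {
    "970": (2, 2),
    "G4e": (3, 1),  # the three simple units only; the complex unit is ignored
}


def dependent_chain_cycles(n_ops: int, latency: int) -> int:
    """Each op waits for the previous result, so latencies add up serially."""
    return n_ops * latency


def independent_ops_cycles(n_ops: int, units: int, latency: int) -> int:
    """Independent ops issue up to `units` per cycle; the last batch still
    needs `latency` cycles to finish."""
    if n_ops == 0:
        return 0
    return math.ceil(n_ops / units) - 1 + latency


for name, (units, latency) in CORES.items():
    print(name,
          dependent_chain_cycles(12, latency),
          independent_ops_cycles(12, units, latency))
# 970 24 7
# G4e 12 4
```

In this toy model the G4e comes out ahead on both dependent and independent integer code, which is the intuition behind the supermarket analogy; the real picture also depends on dispatch and scheduling constraints.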

This doesn't tell you the whole story. First, the good: The 970 handles CR logical operations in a separate unit, the CR logical unit. These types of ops are done on the G4e in the complex integer unit. So this bit of specialization helps the 970 out just a bit, but only a bit because CR ops are relatively rare.

Now for the bad, which is a killer: the 970's group dispatching scheme dictates that one ALU is fed from dispatch slots 0 and 3, while the other is fed from dispatch slots 1 and 2. (If you don't know what a dispatch slot is, reread my first 970 article.) So of the four possible integer ops that can be dispatched in parallel on any given cycle to the 970's ALU issue queues, two are constrained to go to one unit and two to the other. This sort of partitioning scheme makes code scheduling critical. If there's a mix of integer ops and other types of ops (e.g. loads, stores, etc.), the other ops may happen to push all the integer ops into one particular pair of dispatch slots (i.e. either 0 and 3 or 1 and 2), so one ALU's issue queue(s) could be oversubscribed while the other's languishes.

Now, this is potentially bad enough already. But when you factor in the fact that the ALUs are not symmetrical, and that certain types of ops can only go to one ALU and hence MUST go into one of only two dispatch slots, you get a recipe for further choking of dispatch bandwidth.
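A minimal sketch of the slot-to-ALU steering described above, assuming each dispatch group is a list whose positions are slots 0 to 3. The op labels are hypothetical: "int" is an integer op either ALU can run, "int_a" is an integer op restricted to the ALU fed by slots 0 and 3, and anything else (e.g. "load") goes to another unit. The post doesn't spell out what the hardware does when a restricted op would land in the wrong slot pair, so the sketch only counts such placements rather than modeling the fix-up.

```python
from collections import Counter

# Slots 0 and 3 feed one ALU ("A"); slots 1 and 2 feed the other ("B").
SLOT_TO_ALU = {0: "A", 1: "B", 2: "B", 3: "A"}


def alu_load(groups):
    """Return (integer ops steered to each ALU, count of restricted ops that
    landed in a slot pair feeding the wrong ALU)."""
    load = Counter()
    misplaced = 0
    for group in groups:
        for slot, op in enumerate(group[:4]):  # only slots 0-3 are modeled
            if not op.startswith("int"):
                continue  # loads, stores, etc. go to other units
            alu = SLOT_TO_ALU[slot]
            load[alu] += 1
            if op == "int_a" and alu != "A":
                misplaced += 1
    return load, misplaced


# Loads that happen to sit in slots 1 and 2 push every integer op onto ALU A.
skewed = [["int", "load", "load", "int"]] * 4
print(alu_load(skewed))    # (Counter({'A': 8}), 0)

# Same mix, different slot order: the integer ops split evenly.
balanced = [["int", "int", "load", "load"]] * 4
print(alu_load(balanced))  # (Counter({'A': 4, 'B': 4}), 0)
```

The same instruction mix produces either a lopsided or an even load depending purely on slot order, which is why the scheme puts so much weight on code scheduling.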

Ok, I've probably managed to confuse anyone who's read this far, but so be it. You asked for an explanation. Read that thread I linked above for more discussion, or just wait for my article (it should be finished any day now) for a more user-friendly explanation with nice color diagrams and such.

The end result is that the 970's ALU hardware is weaker than that of the G4e, the P4, and the Athlon, so its clock-for-clock integer performance will be worse. At least, that's what I'm predicting. We'll see if I'm right.

Now, this really isn't too big a deal to my mind, because most people care more about floating point and vector ops for the types of desktop and workstation apps that run on a Mac.

More worrisome is the inferior Altivec (or, as IBM calls it, VMX) hardware. The G4e has a superior and more robust SIMD implementation, but it's severely hobbled by a lack of FSB and memory subsystem bandwidth. I'm sure that IBM will improve the SIMD situation in future releases of the chip, though. Right n

The benchmarks are bogus on Preliminary OS X & PPC 970 Benchmarks · 2003-05-05 08:18 · Score: 5, Insightful

We've had some discussion of these in the Ars Mac forum, and the consensus is that they're bogus. I'm currently wrapping up part II of my 970 article, and I'm pretty certain that these numbers are made up.

Here's how it will break down clock-for-clock:

Floating-point: The 970 will spank the G4e.
Integer: The G4e will spank the 970.
Vector: It's a tie, even though the 970's Altivec hardware is inferior to the G4e's. What gives the 970 a boost is dual-channel DDR400 and a real FSB. If you were to put the G4e in a similar system, it would outperform the 970 clock-for-clock pretty handily.
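The bandwidth point can be made concrete with peak-bandwidth arithmetic: bus width in bytes, times transfers per second, times channels. This is a theoretical ceiling, not a benchmark. The DDR400 figure follows from a 64-bit channel at 400 million transfers per second; the second configuration is a hypothetical 64-bit, single-data-rate bus at 167 MHz, included only for scale and not as a claim about any particular machine.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float,
                        channels: int = 1) -> float:
    """Theoretical peak in GB/s (10**9 bytes/s); sustained bandwidth is lower."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


ddr400_dual = peak_bandwidth_gb_s(64, 400, channels=2)
hypothetical_sdr_bus = peak_bandwidth_gb_s(64, 167)  # illustrative assumption only

print(f"dual-channel DDR400: {ddr400_dual:.1f} GB/s")                 # 6.4 GB/s
print(f"64-bit SDR bus @ 167 MHz: {hypothetical_sdr_bus:.2f} GB/s")  # 1.34 GB/s
```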

Anyway, I could elaborate more, but I'd rather work on my article.

Re:Embedding an oscillator on Intel's Anti-Overclocking Technology Simplified · 2003-04-11 08:01 · Score: 2, Informative

This is not about embedding an oscillator on the chip. The article linked above implies this, but it's wrong. Please read the patent before making any claims about it. The reference pulse is generated by a special circuit in the chipset. This circuit uses either a ring oscillator or a quartz crystal.

The article is wrong on Intel's Anti-Overclocking Technology Simplified · 2003-04-11 07:58 · Score: 5, Informative

I don't think that the author of this article actually understands the patent in question. Specifically, the reference signal is absolutely not generated on the CPU die, as the author claims. Intel's new scheme is still dependent on the chipset's cooperation.

Anyway, I won't go into any more detail here, because I explain the patent and its implications for overclocking in the following Ars news post:

http://arstechnica.com/archive/news/1048630320.html

Re:Well, it works. on Intel Patents Anti-Overclocking Technology · 2003-03-25 09:19 · Score: 1

This isn't actually how the patent works. The reference crystal is on the chipset.

I just made a post at Ars that will hopefully clear up any confusion:

http://arstechnica.com/archive/news/1048630320.html

Re:Yet another commercial on Video Capturing Guide at Ars Technica · 2003-03-20 07:25 · Score: 1

Ok, sorry I went off on you like that. I'm just a little sensitive about things that look like accusations of commercialism on our part, because I do have a day job and I do participate in Ars for largely non-commercial reasons. Of course, I'm not saying that we don't make any money off of it, but it's not making us rich, or even letting us quit our day jobs. So we do Ars for the same kinds of intangible reasons that people do open source.

But I share your disappointment in Slashdot's poor editorial standards. And it doesn't help that the "editors" seem to think that such poor standards are a virtue, in that they mark them as "hardcore" and "in the trenches." To my mind, being hardcore and in the trenches means coming home from a long day at your regular job, sitting down, and working like hell to produce, tweak, refine, edit, and polish an article or a news post so that it's the best and most accurate piece of journalism you can produce. As someone who does this, at some cost to my personal life and other career ambitions, it rankles me to see laziness and carelessness passed off as a positive thing.

Re:Yet another commercial on Video Capturing Guide at Ars Technica · 2003-03-20 04:05 · Score: 4, Informative

"Hey, I'm a complete moron who, in between late-night one-handed surfing sessions, likes to make inane and ignorant slashdot posts bashing anyone with the initiative to make a contribution to the online tech community. I didn't actually read the article, so I wouldn't know that it was put together by a group of volunteers who donated their time and effort so that people like me can have easy access to technical information."

"Oh, by the way, I also have no clue about Ars in general, so I wouldn't know that the entire site (with the exception of the forums) runs on a single server, and that the guys who own it, run it, and contribute to it have day jobs in order to support themselves so that they can spend their precious free time creating high-quality web content that they give away for free. I would get a life, but it's just too easy to sit back and fire off a cynical post to Slashdot, hoping someone will mod me up and I'll have my very own flaccid little moment of poseur fame--a moment that, unlike the folks who contributed to the article I'm bashing, I didn't actually have to do any work for."

Re:Reasons for not subscribing. on Slashdot Subscribers Now See The Future · 2003-03-06 08:16 · Score: 1

I'm sorry, Taco, but your statements to this effect have always mystified me. This is an absolutely juvenile stance to take, it's counterproductive, and it's mildly insulting to your audience.

It doesn't take but a few seconds to run a simple spell check, or to look over a post for things like run-ons and whatnot. We at Ars occasionally post dupes, and we occasionally have typos, but it's relatively rare. Why is it rare? Because we care enough about what we do and about what our readers think of us to go the extra mile. (Actually, it's more like an extra 10 yards or so.)

And before you protest that Ars doesn't post as many stories as /., I'll say that there have been times when our news volume was as high as it is here. And even then we were able to control dupes and typos quite effectively.

If you're asking people to pay for a service, and they respond with a perfectly reasonable request for a little professionalism and respect on your part, then it's in your best interest to try to make them happy. Being professional is not synonymous with selling out, and being unprofessional is not synonymous with being "hardcore" and "in the trenches." This is a puerile fantasy more appropriate to adolescent suburban males who grudgingly work in the service sector stocking groceries or flipping burgers; it's not at all appropriate for someone who runs a business.

When people see that you care about your work and that you think enough of them to give them your very best, then not only do they not mind supporting you, but they're glad to do so.

Re:This again? on Understanding Moore's Law · 2003-02-20 09:39 · Score: 4, Informative

If you're referring to that recent Red Herring article, my article was indeed "inspired" by it in the sense that I thought it was sensationalistic crap and I just couldn't take it anymore. For more info, see the news blurb that announces the article:

http://arstechnica.com/archive/news/1045747027.html

Re:That's not right.... on Understanding Moore's Law · 2003-02-20 09:36 · Score: 2, Interesting

LOL! It was about 3am when I wrote that line, and I was completely fried and just wanted to be done with the article. You're right, though, that sentence (and some of the other parts of the intro) is completely overwritten... or something.

Oh well, at least the rest of the article (hopefully) doesn't appear to take itself quite as seriously as the intro :0)

Jon

Re:Just a few problems on Understanding the Microprocessor · 2002-12-04 08:11 · Score: 5, Informative

"First of all, I think it would have been beneficial to examine a really stupid CPU (like the 8086 perhaps) before launching into stuff like SIMD."

Did you read the article, or did you just skim it? Nowhere do I launch into a discussion of SIMD. The only reason the term is present is because I used a diagram from a previous article.

"Second, the first two instruction types given are arithmetic and load/store. Unfortunately something like half the instructions (or more) in a program are usually arithmetic and branch instructions (conditional jumps in fact.) So those are definitely the things to discuss first, before load/store, if you're going to do it that way. I personally would bring all three types of operation to the front right away and then delve into how they work, but that's a personal decision."

Yes, it's a "personal decision," and I opted to go a different route. I think the order in which I introduced the concepts works. Other orders are, of course, possible.

"Speaking of branching instructions he describes forward and backward branches. This is silly. There are two kinds of branches, relative (offset) and absolute. You can jump to a location which is +/- however far from your current position, or you can jump to a specific address."

Once you're done with your little intro to ASM, chief, you might stick around for some more advanced courses. In them, you'll learn that what branch prediction algorithms care about is whether a branch is forward or backward, because this tells you whether to assume it's part of a loop condition. I won't explain further, though, because a. I've covered the topic in previous articles, and b. I don't like to feed trolls any more than I have to.
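To illustrate why direction matters, here is a minimal sketch of the classic static "backward taken, forward not taken" heuristic. It is one well-known example, not a description of any specific CPU's predictor, and the addresses are made up. It also shows that the relative/absolute distinction doesn't settle the question: a relative branch's target is just pc + offset, and the prediction only looks at whether the target lies before or after the branch.

```python
def relative_target(pc: int, offset: int) -> int:
    """Target of a PC-relative branch: current address plus signed offset."""
    return pc + offset


def predict_taken(pc: int, target: int) -> bool:
    """Backward branches (including a branch to itself) usually close loops,
    so guess taken; forward branches usually skip code, so guess not taken."""
    return target <= pc


loop_pc = 0x1040
print(predict_taken(loop_pc, relative_target(loop_pc, -0x40)))  # True: back to loop top
print(predict_taken(loop_pc, relative_target(loop_pc, 0x20)))   # False: skips ahead
print(predict_taken(loop_pc, 0x2000))  # False: absolute target that happens to be forward
```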

"I thought that this article was going to talk about how it was actually done. Maybe I'm just special (where's my helmet?) but I've got most of this material (in this article) out of previous ars technica articles."

Maybe if you'd read the intro a little more closely, you'd know that I made it clear that everything in that article was covered in more depth in previous Ars articles. This article was intended as background for those articles.

"If you ignore every other point I've made in this, consider the possibility that it is a big mistake to start talking about heavily pipelined CPUs."

I don't discuss heavily pipelined CPUs, or pipelining in general, in this article. I do refer back to previous articles on the P4, but those are recommended as further reading. I'll cover pipelining in a future article (a point that I made clear in the conclusion). And yes, I know that PC = IP in x86 lingo. Thank you. Now we all know that you know, too. Here's a cookie.

"Finally, is it just me or is it amusing that we're supposed to understand this before hammer arrives but every page has a gigantic animated Pentium IV ad? Up yours, ars adsica."

I made one reference to Hammer in the intro, along with a reference to Itanium2, Yamhill, etc. Let it go, man. This article doesn't pretend to have much of anything specific to do with AMD.

Re:did anyone notice? on Final Fantasy Movie Interview · 2001-07-31 06:30 · Score: 4

Regarding the voice synch issue, see my comment in the Discussion Link attached to the front page post. Basically, it's going to be an issue for quite a while with CG, photorealistic movies. It goes much deeper than just a standard synching or tech problem.

Stellar investigative reporting on Read To Your Children, Go To Jail (Not Really) · 2000-12-14 01:00 · Score: 5

If you actually download and install a beta copy of the eBook reader, you'll find that the "Read Aloud" permission setting has nothing to do with whether or not the book can be read aloud to your children. In fact, the setting refers to a function of the reader software, which you can use to have a synthesized voice read the book aloud to you if the book comes with that permission. The book pictured does not, so the top button on that bottom line of buttons on the left only says "Read". Were the "read aloud" permission enabled, that button would say "Read Aloud", and a synthesized voice would read the contents of the book through your speakers. Yeah, it's stupid and maybe even slightly ominous, but it's not nearly what it has been made out to be here.

Exactly on Rambus and DDR RAM writeup · 2000-08-28 03:54 · Score: 1

As I said in the conclusion, I'll be stepping back and forming some opinions closer to the end of the piece. And as jazzyfox has pointed out, the article is neither a review nor a performance comparison. It's a technical explanation of how two technologies work, their individual advantages, and their individual drawbacks. Whenever I make a comparison between the two, it's usually for didactic purposes and not necessarily so I can make a blanket call as to which one is "better."

And as for which one is "better," it all depends on the situation. (I say as much in my intro to the Rambus section.) Individual technologies are "better" or "worse" for _particular applications_. Depending on the constraints that you're operating under (cost, latency, bandwidth, availability, granularity, etc.) one solution will fit your needs better than another one.

Yeah, sometimes it's easy to make a clear call on which of two similar technologies is better for 99% of the applications out there, like if you're comparing FPM RAM to EDO RAM. But Rambus is complex enough and different enough from DDR SDRAM that it's not always a black-and-white issue. Rambus has advantages that make it better for certain applications, and DDR DRAM has advantages that make /it/ better for other applications.

But again, there are no benchmarks in the article, nor will there be. There are plenty of places where you can find out how an RDRAM system configured a certain way stacks up against a similarly configured DDR DRAM system running a certain set of application benchmarks. I'd suggest you check out one of those to see which technology best suits your particular needs. If you're just curious about how it all works, though, I hope my article can be of some help to you.


Comments · 40