Reverse-engineered KNI Documentation
Clive Turvey has continued his investigation into KNI and has documented the instructions he has found so far. MMX coders will be happy to see that KNI contains a 1-cycle shuffle instruction, as AltiVec does but MMX does not. As expected, it has some instructions specifically for 3D (1/x and 1/sqrt(x)).
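For readers who haven't used a shuffle, here is a plain-Python model of how Intel's manual describes SHUFPS: the two low result lanes are picked from the destination register and the two high lanes from the source, each chosen by a 2-bit field of an 8-bit immediate. This is an emulation of the semantics only and says nothing about timing. The `shuffle_imm` helper is hypothetical and lists the lowest lane first, which is the opposite order from Intel's `_MM_SHUFFLE` macro.

```python
def shufps(dest, src, imm8):
    """Emulate SHUFPS on two 4-lane vectors (lists of 4 floats).

    Result lanes 0-1 come from dest, lanes 2-3 from src; each 2-bit
    field of imm8 (lowest field first) is a lane index 0-3.
    """
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    sel = [(imm8 >> shift) & 0b11 for shift in (0, 2, 4, 6)]
    return [dest[sel[0]], dest[sel[1]], src[sel[2]], src[sel[3]]]


def shuffle_imm(a, b, c, d):
    """Hypothetical helper: pack four lane indices, lowest result lane first."""
    return a | (b << 2) | (c << 4) | (d << 6)


x = [1.0, 2.0, 3.0, 4.0]
# Broadcast lane 2 into all four lanes (same register used as dest and src).
print(shufps(x, x, shuffle_imm(2, 2, 2, 2)))  # [3.0, 3.0, 3.0, 3.0]
# Reverse the lane order.
print(shufps(x, x, shuffle_imm(3, 2, 1, 0)))  # [4.0, 3.0, 2.0, 1.0]
```

Using the same register as both operands is what turns SHUFPS into a general one-register permute or broadcast.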
Update: 02/24 12:11 by S: As reader Christopher Thomas points out, Intel has released a manual that includes a functional description of KNI, but I can't find any instruction timings in it. It weighs a hefty 5.5 MB.
Or maybe not. I'm sure Intel will just impose its will on the innocent masses. I noticed Visual Studio 6.0 is mysteriously lacking an "optimize for 3DNow" checkbox while having both an "optimize for KNI" and an "optimize for MMX" box. Does Intel have some secret deal with Microsoft that I'm not aware of?
Actually, I found an incredible deal: this place in Cali was selling 'em for the same price most other places were charging for PIIs. I landed 'em for under $500 each, including warranty and heatsink fan.
PDG--"I don't like the Prozac, the Prozac likes me"
"Where is my mind?"
The damn components of Visual Studio can't even pass integer arrays reliably between them! You really expect this to work?
-bleh
DIR !!!!
That was obvious sarcasm!!!
A lot of that coding is well over my head, but one thing I do agree with is that MMX is a lot of hype. There is only one game I can think of where you can tell the difference between MMX and non-MMX, and that is Rainbow 6, if you can ever get it to work in the first place.
This came as no surprise to those of us who actually went to their web site and read the I2O SIG charter, rather than relying on misinformation from Bruce Perens and incompetent Wired News pseudo-journalists.
I have heard that Nvidia is coming out with a card with a geometry processor, code-named NV10, but right now this is just a rumor; I don't have any hard proof yet.
Well, then write a QT parser for Linux. It's not hard, yet I hear all of these "OSS" advocates bitching that QT isn't on Linux. That gives me the utmost confidence in the contributions of many OSS people, oh yeah.
"You've got to be willing to read other people's code, then write your own, then have other people review your code." -- Bill Gates
That's great! Whence did it come?
A small subset of these SIMD instructions are useful in what you might consider the kernel, or the core libraries for packet handling and copying blocks of memory. Don't write them off completely.
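To make the block-copy point concrete, here is a hedged sketch of the usual shape of a SIMD copy loop: move 16 bytes (one 128-bit KNI register's worth) per iteration, then finish any leftover tail byte by byte. Python slice assignment stands in for a 16-byte load/store pair, and `chunked_copy` is a hypothetical name; this illustrates the structure only, not real performance.

```python
CHUNK = 16  # bytes held by one 128-bit register


def chunked_copy(dst: bytearray, src: bytes) -> int:
    """Copy src into dst in 16-byte chunks, then copy the tail bytewise.

    Returns the number of full 16-byte chunks moved.
    """
    if len(dst) < len(src):
        raise ValueError("destination too small")
    n = len(src)
    full = n // CHUNK
    for i in range(full):
        off = i * CHUNK
        # One "load 16 bytes, store 16 bytes" step.
        dst[off:off + CHUNK] = src[off:off + CHUNK]
    for off in range(full * CHUNK, n):
        # Leftover bytes that don't fill a whole register.
        dst[off] = src[off]
    return full


src = bytes(range(40))
dst = bytearray(40)
print(chunked_copy(dst, src))  # 2 full chunks, plus an 8-byte tail
print(bytes(dst) == src)       # True
```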
Great, when will g77_KNI come out?
Remember the U.S. Congress deemed reverse engineering ILLEGAL. Duh, I'm telling the feds on j00.
Keep up the good work!
samedi@disinfo.yadda.net
Why reverse engineer publicly available information? Does this guy just have way too much time on his hands?
Daniel
Hurry up and jump on the individualist bandwagon!
I suggest you go to www.anandtech.com and check out the benchmarks for the Celerons and Pentium II/IIIs etc., and then you can see how the Celeron shapes up.
BTW, what's the rush? If cache and bus are so important, why not wait for the K7? The problem is that currently it doesn't make much difference.
-kojak
Hey, I just grabbed myself a pair of PIIIs for my dual-CPU server, where I plan to do some massive graphics streaming and reproduction. Put them together with 500 MB of RAM and this thing is gonna scream like the devil.
When are we gonna get some optimizations for 3DNow? I have a K6-2 300 and would love to see someone enhance the Voodoo2 drivers in Linux for 3DNow. I don't know that much about 3DNow, but would it help to include 3DNow in the kernel? I would love to see this. Anyway, I just think it would be nice for people to write their apps and include 3DNow enhancements! I'm not gonna purchase a PIII; it's too expensive and the K3 looks better (plus it's cheaper).
NaTaS
natas777@geocities.com
Natas of
-=Pedophagia=-
http://www.mp3.com/pedophagia
Also Admin of
http://loki.linuxgames.com
Would you rather go through the PDF hell of getting Intel's file, or see this guy's stuff fast and easy? Also, give the guy credit for hacking this chip to scratch his itch, in a format I find easy to read; not that I think I'm ever going to use this stuff. Allez Clive. You have my support!!
"...the KNI instructions won't ever get
used because all the rendering is passed
off to the graphics card."
Is incorrect. All a 3D accelerator card does is speed up the DISPLAY of 3D objects; the processor still has to figure out where to PUT those objects, hence the usefulness of KNI (or SIMD, as Intel wants to call it, or even AMD's 3DNow).
See below for a more in-depth explanation.
As for 2D? Again, the only thing the card does is DISPLAY pictures; it does not do ANY calculations on where anything goes. In your standard windowing OS, the processor has to figure out what is going to be displayed and where it is going to go. After it figures all this out, it passes it on to the video card, which displays it. The difference between accelerated (2D) and non-accelerated is that with a non-accelerated card the processor has to tell it where every pixel goes, whereas an accelerated card can be passed commands like "draw box" or "put this text in box" instead.
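A toy model of that difference: without acceleration the CPU writes every pixel of a filled box itself, while with acceleration it queues one short command and the card does the writes. The framebuffer-as-a-list and the `("FILL_RECT", ...)` command tuple are made up for illustration; no real driver interface is implied.

```python
def software_fill(fb, width, x, y, w, h, color):
    """CPU fills a w*h box pixel by pixel; returns the number of pixel writes."""
    writes = 0
    for row in range(y, y + h):
        for col in range(x, x + w):
            fb[row * width + col] = color
            writes += 1
    return writes


def accelerated_fill(queue, x, y, w, h, color):
    """CPU queues one hypothetical command; the card would do the pixel writes."""
    queue.append(("FILL_RECT", x, y, w, h, color))
    return 1


WIDTH, HEIGHT = 640, 480
fb = [0] * (WIDTH * HEIGHT)
print(software_fill(fb, WIDTH, 10, 10, 100, 50, 7))  # 5000 pixel writes by the CPU
cmds = []
print(accelerated_fill(cmds, 10, 10, 100, 50, 7))    # 1 command
```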
3D explanation:
There are four steps in displaying a real-time 3d scene (I'll use a first person shooter for examples):
1. Process the scene for objects (walls, ceiling, floor, polygons for bad guys)
2. Plot exactly where those objects are going to be, including removing hidden lines
3. Apply textures and light sources to polygons
4. Display it on the monitor.
In a 3D-accelerated computer, the accelerator does steps 3 and 4. That leaves the processor to figure out where everything goes (which is where about 50% of the processing power is needed).
This is why a P2-450 without a 3D accelerator is approximately equal to a P-200MMX with one: the P2 is about twice as fast, so it finishes twice the work in the same time.
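The arithmetic behind that claim, taking the comment's own rough figures as assumptions (steps 1 and 2 are about half of the per-frame work W, and the P2-450 runs about twice the speed s of the P-200MMX):

```latex
\begin{aligned}
T_{\text{P-200MMX, with card}} &= \frac{0.5\,W}{s} \\
T_{\text{P2-450, no card}}     &= \frac{W}{2s} = \frac{0.5\,W}{s}
\end{aligned}
```

Both machines spend the same CPU time per frame. This ignores the card's own rasterization time and any bus transfer cost, so it is a back-of-the-envelope check, not a benchmark.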
On VERY, VERY high end video cards ($2000+ workstation OpenGL cards) they can also do step 2, and THAT would take the hard work away from the processor.
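Step 2 is mostly small, regular floating-point arithmetic, which is exactly what KNI's four-wide registers target. A minimal sketch of that per-vertex work, assuming a row-major 4x4 matrix and a plain perspective divide; the matrix values and function names are illustrative, not taken from any particular engine. Each output component is four multiply-adds, a natural fit for four lanes at once.

```python
def transform(m, v):
    """Multiply a row-major 4x4 matrix m by a homogeneous vertex v = (x, y, z, w)."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def project(v):
    """Perspective divide: map (x, y, z, w) to (x/w, y/w, z/w)."""
    x, y, z, w = v
    inv_w = 1.0 / w  # the kind of 1/x that KNI's reciprocal instruction approximates
    return (x * inv_w, y * inv_w, z * inv_w)


# Illustrative matrix: copies z into w, so points farther away shrink.
M = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

print(project(transform(M, [2.0, 4.0, 2.0, 1.0])))  # (1.0, 2.0, 1.0)
```

A real engine would run this over thousands of vertices per frame, which is where doing four lanes per instruction pays off.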
Another non-functioning site was "uncertainty.microsoft.com."
The purpose of that site was not known.
Intel might not have directly offered you the instructions, but hundreds of developers who are signed up with Intel received information ages ago. Some of them also got beta machines several months ago.
So could you please clarify what you mean by 'Independent developer'??
Jurjen Katsman
nix@knoware.nl
Okay, first a disclaimer: the list of instructions that I read doesn't tell the whole story. (There are surely instructions that aren't documented there, right?)
Comparing the 3DNow set to KNI, it seems like KNI is actually INFERIOR to 3DNow as far as performance is concerned! One thing I would still need to read up on is how many of each kind of pipeline are present in the P3. Just for some figures, though:
(Remember that KNI registers are twice the size of 3DNow registers, so divide the KNI numbers below by two.)
Operation             KNI        3DNow
Add                   2          1
Subtract              2          1
Multiply              2          1
Approx 1/x            2          1
Approx 1/SQRT(x)      2          1
Accurate 1/x          15-115*    4
Accurate 1/SQRT(x)    31-249*    5
Accurate divide       15-115*    6
Accurate SQRT(x)      16-134*    7
* There don't appear to be any instructions in KNI that extend an approximated value to full precision, unlike 3DNow. Even if they did exist, why would Intel provide long-hand all-in-one alternatives? On KNI, accurate 1/x is calculated by dividing 1 by x, and accurate 1/SQRT by dividing 1 by SQRT(x). Accurate divide uses a native instruction, as does accurate SQRT. It should also be pointed out that KNI approximations are only accurate to 12 bits, whereas 3DNow approximations are accurate to 14 bits for 1/x and 15 bits for SQRT(x). I hope you KNI guys don't need the extra accuracy. (These seem to be drop-in replacements for the x87 SQRT and divides.)
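For what it's worth, a low-precision estimate can still be refined in software with a Newton-Raphson step built from multiplies and subtracts, both of which KNI has in packed form: x1 = x0 * (2 - a * x0) for 1/a, and y1 = y0 * (1.5 - 0.5 * a * y0 * y0) for 1/sqrt(a). Each step roughly doubles the number of correct bits. The sketch below fakes a 12-bit estimate by rounding the exact value's mantissa, which is an assumption standing in for the hardware's real estimate, and it runs in Python's double precision, so it shows the convergence rather than exact single-precision results.

```python
import math


def approx(value, bits=12):
    """Keep only `bits` mantissa bits of value (stand-in for a hardware estimate)."""
    m, e = math.frexp(value)  # value = m * 2**e, with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)


def refine_recip(a, x0):
    """One Newton-Raphson step toward 1/a."""
    return x0 * (2.0 - a * x0)


def refine_rsqrt(a, y0):
    """One Newton-Raphson step toward 1/sqrt(a)."""
    return y0 * (1.5 - 0.5 * a * y0 * y0)


def rel_err(got, exact):
    return abs(got - exact) / abs(exact)


a = 3.0
x0 = approx(1.0 / a)
x1 = refine_recip(a, x0)
print(rel_err(x0, 1.0 / a), rel_err(x1, 1.0 / a))  # about 1e-4, then about 1e-8

y0 = approx(1.0 / math.sqrt(a))
y1 = refine_rsqrt(a, y0)
print(rel_err(y0, 1.0 / math.sqrt(a)), rel_err(y1, 1.0 / math.sqrt(a)))
```

This is the same idea 3DNow's dedicated refinement instructions implement in hardware; on KNI it simply costs a few extra instructions.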
...And what about the extra instructions that KNI has? 3DNow uses MMX instructions to load and manipulate data, while KNI needs its own special instructions for that. KNI also has scalar (single-element) versions of most of its SIMD instructions (pretty useless, IMHO). So obviously I can't really say which one is better until I know how the pipelines are arranged. I did hear a rumor that the PIII has only one KNI pipeline, which would mean that it majorly sucks after all. (One pipeline means that it is automatically half the speed of 3DNow.)
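On the scalar versions: in Intel's manual the packed form (e.g. ADDPS) works on all four lanes, while the scalar form (e.g. ADDSS) works only on the lowest lane and passes the destination's upper three lanes through unchanged. A plain-Python model of that difference, with lists standing in for registers and function names that simply mirror the mnemonics:

```python
def addps(dest, src):
    """Packed add: all four single-precision lanes."""
    return [d + s for d, s in zip(dest, src)]


def addss(dest, src):
    """Scalar add: lane 0 only; lanes 1-3 of dest are kept as-is."""
    return [dest[0] + src[0]] + list(dest[1:])


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
print(addps(a, b))  # [11.0, 22.0, 33.0, 44.0]
print(addss(a, b))  # [11.0, 2.0, 3.0, 4.0]
```

One plausible use for the scalar forms is ordinary one-value float math in the same registers, without going back to the x87 stack.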
Actually, the hard way is to get a Slot One Celeron and drill out a pin, and solder a wire or two on the chip board thing. The easy way is here and it doesn't void your warranty on the Intel chip. It may void a warranty on a $10 "Slotket" adapter, but WHO CARES?
- - - - - - - - - - - - - - - - -
I run BeOS. The rules don't apply.
MMX should make any algorithm using a Discrete Cosine Transform run about 4 times faster, e.g. JPEG, MPEG, and most videoconferencing. KNI, on the other hand, only enhances certain graphics operations that can only be done in floating point, e.g. ray tracing. Which basically means, unless you're doing computer animation, the PIII is all of 1% faster than a PII at the same clock rate, which hardly justifies the additional money they're asking for it. I predict the PIII will be an abysmal flop for Intel -- sell your Intel shares and buy AMD now!
Is it safe to assume this is meant sarcastically?
"Genius may have its limitations, but stupidity is not thus handicapped." --Elbert Hubbard (1856-1915)
Well, it would be nice if gcc would implement a "use KNI instructions" option. Hopefully it'll happen in less time than it took GNU to implement MMX instructions.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
3DNow code is available, as well as the docs. I know mpg123 can be compiled with 3DNow optimizations, if you have the correct version of binutils.
Oh well..
Playing games at a fast speed is fun..
but coding your engine to go 1 frame/sec faster than someone else is funnier...
Hell.. next they'll put WHOLE GAMES into one single processor instruction..
Does gcc have a command-line option to produce 3dNow! inline instructions in the executable? If not, it should.
Geometry acceleration will almost certainly start showing up by 1H00, when 0.18 is in full swing and another round of cards show up, and may be showing up in 2H99 with the current round of new cards. Which you believe depends on the veracity of the rumour mill and the competence of the card manufacturers.
Putting geometry acceleration on the cards is a Good Thing from the graphics card manufacturer's point of view. This frees up the processor in CPU-limited cases, and frees up the bus if some of your models can stay resident on the card and just be transformed instead of reloaded. Card manufacturers have stated for a while that they want to put geometry acceleration on cards, and since the middle of last year have been saying that it will show up Real Soon Now (tm). Whichever manufacturers _do_ manage to get good geometry acceleration out there with good drivers will decimate the members of the competition who put out cards without acceleration, at least in the short term. It is in their interests to implement this as quickly as possible.
Now, the rumour mill. Rumours include, but are not limited to:
They have the beginnings of this on the G200 already. Two Matrox cards are rumoured to be ready to ship, but tied up in NEC's fab lines as they work on the Dreamcast. One is the G400, rumoured to be a pair of G200s at 0.25 micron. One is the G300, which is an unknown quantity.
This wouldn't hurt sales of their high-end cards, which are multi-chip, have vast amounts of RAM, and are in general designed to be better cards than anything on the consumer end.
As opposed to just being a shrink of the TNT to 0.25 micron. I'm doubtful of this one.
Take your pick, but it wouldn't surprise me if at least one of these turned out to be true, and these aren't the only cards coming out this year.
This also doesn't address the fact that for doing certain things (weighted meshes, physics, ...) a geometry accelerator isn't going to help.
Quite true. However, there is enough geometry grunt work being done in most cases that a geometry accelerator would certainly help.
No consumer-level 3D geometry hardware (3DLabs' Gamma chip doesn't count) has been announced by any of the major players, so if you expect to see anything shipping in '99, I think you are dreaming. Just look at how long it took 3DLabs, NVidia, ATI, ... to ship their latest chips after they were announced. This also doesn't address the fact that for doing certain things (weighted meshes, physics, ...) a geometry accelerator isn't going to help.
I thought the only way you could get a Celeron to do SMP was by soldering on some extra wires and stuff?
I'm out of my tree just now but please feel free to leave a banana.
I don't care if they added a bazillion new instructions. I still think Intel's P-III is a load of crap. I heard that they are using recycled Pepsi can parts. hmm... sounds kinda fishy to me.
You must have got 450 MHz PIIIs. Price Watch's lowest price for a 500 MHz PIII is $708; for a 450 MHz PIII it is $504. That is still a lot more than the price of a Celeron, which can be overclocked to run at 450 MHz...
>if the OpenGL and DirectX driver authors decide to write KNI-optimized drivers
That's the problem, though. When MMX first came out it was the same old thing: it was going to be really useful whenever the authors decided to use it. Now they are moving to KNI, and again we will have to wait and see if our money will be well spent. Although this time around I think it is slightly different from MMX, in that people have their eyes open and hopefully realize that it will be some time before there is software available that uses KNI to its max potential. Most people don't realize that KNI and MMX won't enhance their current apps.
It most certainly would, which is why the full instruction set manual is up on their web page in plain view for anyone who wants to look at it.
Ideally they'd have released it before the PIII launch, but now that the PIII has been officially released, it's definitely publically available.
The URL is http://developer.intel.com/design/pentiumiii/manuals/.
Hmm. Well, it seems strange that they didn't include 3DNow support in their VS and they ALREADY have KNI support.
For real-time 3D multimedia, people who are
interested in performance will probably
invest in a 3D accelerator card anyway.
So the KNI instructions won't ever get
used because all the rendering is passed
off to the graphics card.
And for the 2D stuff, everyone already
HAS graphics acceleration anyway. Since
when is the main system CPU doing any
multimedia calculations in a modern
high-end PC? (Except for video decoding)
It seems to me that the only time these
KNI instructions would help speed up a
3D app would be when you are rendering
in non-realtime, such as when you use
POV to raytrace an image. But that
certainly doesn't seem to be the market
Intel is aiming for.
Someone please correct me if I have made
some mistake in my brief analysis.
-- Bret
KNI and MMX are cool because all you have to do is download the latest service pack for Visual Studio, load your code, go to project options, click the "optimize for MMX" and "optimize for KNI" checkboxes, recompile, and like magic you're done.
To take best advantage of it you probably want to hand-code in assembly, but when you don't have time for that, remember the checkbox!
Actually, there is no difference between the Celeron core and a Pentium II, and its cache is actually an *improvement* over the Pentium II's unless it is used for mundane, repetitive business applications. Also, because its cache runs at CPU clock speed, watch it *surpass* the PII at speeds higher than 450 MHz, which is why the Celeron line will continue. I'm really sorry you bought into the hype and wasted your money, but that's how Intel makes its money.
-kojak
Microsoft's release of Windows KNI..
.WHEEEE!!
run to your local dealer..
You mean you already blew a few thousand dollars without stopping to see if there was a better solution first. I have one thing to say: Alpha. Everyone knows the drill, 64-bit and all the rest.
I assure you I am very biased but that is only because I know I am right
Did I miss something, or did Intel completely
neglect to publish any timing information for
the new streaming SIMD instructions?
As someone who's done a bit of 3D/game development, I have to say that KNI is a big step forward. Four parallel floating-point divisions in one instruction? Finding an average or max/min of multiple integers in one instruction? Hell, if used right, this should result in order-of-magnitude speed improvements!
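To spell out the operations mentioned there, here is a plain-Python model of DIVPS (four single-precision divides at once) and two of the new integer instructions, PAVGB (rounded unsigned-byte average, (a + b + 1) >> 1 per lane) and PMAXUB (unsigned-byte maximum per lane). Note that these work lane by lane across two registers: PAVGB averages corresponding bytes of two operands rather than averaging all the bytes of one register. The 8-lane integer examples match the 64-bit MMX registers these integer additions use on the PIII; the function names just mirror the mnemonics.

```python
def divps(a, b):
    """Four single-precision divides, lane by lane."""
    return [x / y for x, y in zip(a, b)]


def pavgb(a, b):
    """Rounded unsigned-byte average per lane: (x + y + 1) >> 1."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]


def pmaxub(a, b):
    """Unsigned-byte maximum per lane."""
    return [max(x, y) for x, y in zip(a, b)]


print(divps([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 8.0, 16.0]))  # [0.5, 0.5, 0.375, 0.25]
p = [0, 1, 100, 200, 255, 10, 20, 30]
q = [0, 2, 101, 100, 255, 11, 40, 29]
print(pavgb(p, q))   # [0, 2, 101, 150, 255, 11, 30, 30]
print(pmaxub(p, q))  # [0, 2, 101, 200, 255, 11, 40, 30]
```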
I sent an e-mail to AMD about this, and they pointed me to their docs for the 3DNow instruction set. Unfortunately I am not skilled enough to do anything constructive (like making a 3DNow asm target...).
They were quite nice about it, and not at all hostile to Linux; they'd presumably be only too happy for someone to do something about it, since it could only increase their sales...
Here are the relevant links:
Dear David,
For the 3Dnow code instructions, check the Data book:
http://www.amd.com/K6/k6docs/index.html
For general information on 3Dnow, consult the following site:
http://www.amd.com/products/cpg/3dnow/index.htm
Choice of masters is not freedom.
Part of our friend the CPU ID instruction was a thermally-based random number generator that was going to be used to encrypt the CPU serial number. Does anyone know if this random number generator will be exposed to the user instruction set?
A true random number generator would be really handy to have, if only to seed a traditional pseudorandom number generator. The thermally based random number generator on Katmai sounds about as close as we're going to get.
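The seeding idea can be sketched in a few lines. Here `os.urandom` (the operating system's entropy source) stands in for a hardware generator such as the rumored Katmai one; that substitution is an assumption, and nothing below talks to a CPU RNG. The entropy is used only once, to seed an ordinary pseudorandom generator, which stays reproducible if you record the seed.

```python
import os
import random


def seed_from_entropy(nbytes=8):
    """Read nbytes of OS entropy and turn them into an integer seed."""
    return int.from_bytes(os.urandom(nbytes), "big")


seed = seed_from_entropy()
rng = random.Random(seed)     # fast, deterministic PRNG (Mersenne Twister)
replay = random.Random(seed)  # same seed, same sequence
print([rng.randrange(100) for _ in range(5)] ==
      [replay.randrange(100) for _ in range(5)])  # True
```

Python's `random.Random` is not suitable for cryptography; for keys you would use the entropy source directly.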
I ate two sticks of celery at the same time once, and I did get all those wiry things in my teeth. That's about all the good it did me, though; probably about the same good a dual Celeron system would do anyone.
Do an ftpsearch for binutils and download the latest version. :D
ftpsearch.lycos.com
and I just think it would be nice if you wrote those apps and include 3dnow enhancements
Are we without sarcasm and humor this day, friend?
That's super, but how do I get Windows to run my KNI app (or any app for that matter) without collapsing in a pile of shit on the floor?
Grow up.
You are getting a peek at pre-release documentation just by signing up, information that will be released to the general public after release, but oh no, you are too stubborn to show some goodwill, and afterwards you complain you don't get to look at the specs.
I can't seem to find those checkboxes in my version of gcc....
IIRC, not too long ago, primarily due to Intel's being part of I2O and also an investor in RedHat, the I2O specs were opened up, at least to a few interested parties like RedHat without requiring that they be a "member" of the I2O group first. (i.e. pay big bux.)
Blew a few thousand? I think not. But aside from buying an SGI or an even more Unix system, I think building this system has been rather affordable. Besides, the software is cheaper, easier to find, and my bank account isn't UNlimited.
Re. 3D graphics, you are most certainly correct. As of roughly 2H99, enough 3D graphics cards with geometry acceleration will exist to make SSE useless to the gamers who would otherwise have cared about it.
Re. 2D graphics, the situation here is a bit odder. 2D acceleration has existed for a while, but most image processing programs use software filtering for better control over the output. Filtering is one thing that AltiVec will be good at, so I expect to see a horde of Mac users proclaiming that the G4 is the ultimate in computing because it runs Photoshop five times faster than a PII. If your main use of your computer is image processing in Photoshop, I guess that's a good point. If your main use is Quake, then it will be less relevant.
I've been dabbling in rendering and ray-tracing for a while, but would be more interested in a cheap 8-way SMP system than a SSE system for the time being (regrettably, these don't seem to exist).
What an informed, objective opinion...
And then you can watch the compiler produce the wrong assembly...
What we need are cheap geometry chips. The bottleneck on most 3D cards right now is the CPU and/or the bus: they just can't feed the card enough triangles to keep it happy. This would be solved if there were cheap geometry chips that could be put on the 3D card. Then your CPU could be 50% idle while running Quake :) I mean, think about it: the only reasons people buy these powerful chips (these x86 chips) are games, multimedia processing (get an SGI, which are, I know, x86, but at least they're different), and maybe server stuff (get a Sun or an Alpha). With cheap geometry processors, you knock out a LOT of Intel's market and KNI becomes practically useless.
Whoa! So much for formatting. The first number is for KNI and the second one is for 3DNow! Okay, kiddies? And don't bother to post that KNI is really named SEMDS or whatever the hell it is. ;)
Of course, if you have more information on the whole deal..........
While we are on the subject:
Is there a real random number generator connected to the Web someplace? Wouldn't it be cool to have a net of servers that listen on a certain port and provide a truly random 32-bit integer (or whatever) that could be used as a seed for your PRNG? Kind of like how time servers work today.
I'm going to get an Altivec-enabled chip when they're available - if you're going to have SIMD instructions, you might as well have them done correctly.
Altivec and QuickTime on Linux would be very compelling.
The ironic truth is that when Intel released the 166MMX processor, it was faster than even the regular 200 (without MMX apps). The dirty little secret was that they doubled the L1 cache.
Ok, show me that magic checkbox which will make Visual Studio generate MMX instructions (I don't even talk about PIII). MMX instructions are not compiler friendly, so AFAIK there are no compilers that generate them for portable ANSI C code (pgcc guys said they'll do that ... maybe).
Didn't Carmack say that Quake 3 gets a 20-25% boost from SSE? Anyway, it will rock when TNT SLI comes out:
http://www2.sharkyextreme.com/hardware/metabyte_tntsli_p/
Bottom line: if everyone wants to tout Celerons and AMD, then find me a motherboard that can handle TWO PROCESSORS, plus Ultra Wide SCSI-2, and still handle major video streaming and encoding.
I'm not quite clear here... if Intel wants KNI to become a new processor standard, and wants everybody to write software for it, wouldn't it behoove them to publish the instruction set themselves, not leave it to hackers to reverse engineer it???
I've been researching 3D cards a bit, because
I'm considering applying for a job doing
OpenGL programming. Anyway, any card which
implements OpenGL will do the coordinate
transformations as well as the actual
rendering.
Perhaps the KNI was a good idea which is
about to be eclipsed by the next generation
of 3D video cards.
BTW, the Voodoo2 chipset does OpenGL and
Direct3D as well as 3Dfx.
-- Bret