Eh.. what kind of logic? Are you talking about classical logic? Even classical logic is tainted with flaws. Let me give you an example. "If P then P and P", where P is some proposition, is quite legal in classical logic [P => (P /\ P)]. Put P to mean "I have a dollar", and you can easily see that we're in for some trouble: one dollar has just become two. [Girard's linear logic, which essentially got going in the late 1980s - not too long ago - began addressing these issues.]
It is important not to get too hung up on the mechanics of proof theory, just as it is folly to think that one can squeeze concepts such as geometry into the constricting confines of axiomatic theory. How incredibly inelegant and off the point is the standard epsilon-delta proof for continuity! Logic should be taught, but there is no need to grind into the students the boring mechanisms underlying our logic systems.
Oh, and please do read the article!
Hold on a second there.
You misunderstand the concept of "works". It "works" means that it works automatically. The user should not have to read the fine manual (that's what RTFM means). It should just work!
Linux has gotten a lot better over the years, but until the installation process is so easy that anyone can do it, you have to agree that Linux is still not good enough.
PS: I personally have no problems figuring these things out, after all I have contributed code to the Linux kernel, but it still bugs me that I have to.
Unrolling on CELL is slightly more involved than on x86 machines. I took the liberty of trying to code the above C-loop in SPU assembly, and I got the following results, assuming a large byte-array: 0.1132 cycles/byte!! Beat that!
Assume data is unsigned bytes, and that we have lots of data. First thing to notice is that the SPU has a SUMB instruction, that sums the bytes in a quadword. I.e.: If we have two registers R0 and R1 containing 16 bytes:
R0 = [a0,b0,c0,d0; e0,f0,g0,h0; i0,j0,k0,l0; m0,n0,o0,p0],
R1 = [a1,b1,c1,d1; e1,f1,g1,h1; i1,j1,k1,l1; m1,n1,o1,p1],
then
SUMB R2, R0, R1
gives the 8 half-words
R2 = [a0+b0+c0+d0, a1+b1+c1+d1, e0+f0+g0+h0, e1+f1+g1+h1,
i0+j0+k0+l0, i1+j1+k1+l1, m0+n0+o0+p0, m1+n1+o1+p1].
This instruction has a throughput of 1 cycle but a latency of 4, and it is an even instruction.
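To make the SUMB description concrete, here is a rough scalar C model of it. It follows the half-word ordering given in this post rather than the ISA manual, and the name sumb_model is made up for illustration.

```c
#include <stdint.h>

/* Scalar model of SUMB as described in the post (ordering not checked
   against the ISA manual): for each of the four 4-byte groups, the byte
   sum taken from r0 lands in the even half-word and the byte sum taken
   from r1 lands in the odd half-word. sumb_model is a hypothetical name. */
void sumb_model(uint16_t r2[8], const uint8_t r0[16], const uint8_t r1[16])
{
    for (int w = 0; w < 4; ++w) {
        uint16_t s0 = 0, s1 = 0;
        for (int b = 0; b < 4; ++b) {
            s0 += r0[4 * w + b];
            s1 += r1[4 * w + b];
        }
        r2[2 * w]     = s0;   /* a0+b0+c0+d0, e0+f0+g0+h0, ... */
        r2[2 * w + 1] = s1;   /* a1+b1+c1+d1, e1+f1+g1+h1, ... */
    }
}
```

Each half-word is at most 4 * 255 = 1020, so a single SUMB result can never overflow its 16-bit slot.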
We then have to add the individual half-words together, R2 = [h0,...h7]. This can be done by interpreting R2 as 4 32-bit words, masking off the high and low parts of R2 and adding them into R3:
R3 = (R2 >> 16) + (R2 & 0xFFFF), for each 32-bit word in R2.
However, since we are even-bound, we can instead use two shuffles and an add. We can use this as a balancing technique.
R3 will contain:
R3 = [h0 + h1, h2 + h3, h4 + h5, h6 + h7] = [w0, w1, w2, w3].
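In plain C the shift-and-mask version of that reduction might look like the sketch below. It assumes each 32-bit word holds one pair of half-word sums, with the even-numbered half-word in the upper 16 bits; reduce_halfwords is a made-up name.

```c
#include <stdint.h>

/* Half-word to word reduction, R3 = (R2 >> 16) + (R2 & 0xFFFF), applied
   to each of the four 32-bit words. Assumption: word i packs h(2i) in its
   high half and h(2i+1) in its low half. Hypothetical helper name. */
void reduce_halfwords(uint32_t r3[4], const uint32_t r2[4])
{
    for (int i = 0; i < 4; ++i)
        r3[i] = (r2[i] >> 16) + (r2[i] & 0xFFFFu);
}
```

The sum of two 16-bit values always fits in 32 bits, so nothing is lost in this step.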
We will accumulate all of the sums into these 4 components, and then when the loop is over, add them together to make the final tally. Notice that we could have skipped the half-word to word reduction, but we want to make sure that we can count up to more than 65535.
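As a sanity check on the scheme, here is a scalar C sketch of the same bookkeeping: 32 bytes per step, four 32-bit lane accumulators, one tally at the end. It is not the SPU code, just the arithmetic; sum_bytes_lanes and the scalar tail loop are my own additions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums n unsigned bytes using four 32-bit lane accumulators, mirroring
   the SUMB + reduce + accumulate scheme in scalar code. The function name
   and the tail handling are hypothetical, added for illustration. */
uint64_t sum_bytes_lanes(const uint8_t *data, size_t n)
{
    uint32_t acc[4] = {0, 0, 0, 0};
    size_t i = 0;

    for (; i + 32 <= n; i += 32) {
        const uint8_t *r0 = data + i;        /* first quadword  */
        const uint8_t *r1 = data + i + 16;   /* second quadword */
        for (int w = 0; w < 4; ++w) {
            uint32_t h_even = 0, h_odd = 0;  /* the two half-word sums */
            for (int b = 0; b < 4; ++b) {
                h_even += r0[4 * w + b];
                h_odd  += r1[4 * w + b];
            }
            acc[w] += h_even + h_odd;        /* word lane w0..w3 */
        }
    }
    for (; i < n; ++i)                       /* leftover bytes, if any */
        acc[0] += data[i];

    /* Final tally: add the four lanes together. */
    return (uint64_t)acc[0] + acc[1] + acc[2] + acc[3];
}
```

On the overflow point: a half-word lane gains at most 4 * 255 = 1020 per SUMB, so in the worst case a 16-bit accumulator is only safe for 64 SUMBs (64 * 1020 = 65280), which is why the word lanes are worth the extra instructions.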
The final loop would look something like this (w/branch hint penalty on the last loop). It reads in 256 bytes worth of data, (unrolled 8 times), and adds them together into a single vector accumulator. Since there are 29 even/odd pairs of instructions, this loop should take 29 cycles per 256 bytes, i.e., give a speed of 0.1132 cycles/byte. Of course, there are setup penalties and a penalty for missing the last branch. Still, this should show the power of the SPU.
Loop:
{e2} a s0, l0, h0 {o4} shufb h4, s4, s4, m_hiwords
{e2} a s1, l1, h1 {o4} shufb h5, s5, s5, m_hiwords
{e2} a s2, l2, h2 {o4} shufb h6, s6, s6, m_hiwords
{e2} a s3, l3, h3 {o4} shufb h7, s7, s7, m_hiwords
{e2} a s4, l4, h4 {o6} lqx r2, base0x20, index
{e2} a s5, l5, h5 {o6} lqx r3, base0x30, index
{e2} a s6, l6, h6 {o6} lqx r4, base0x40, index
{e2} a s7, l7, h7 {o6} lqx r5, base0x50, index
{e2} a s0, s0, s1 {o6} lqx r6, base0x60, index
{e2} a s1, s2, s3 {o6} lqx r7, base0x70, index
{e2} a s2, s4, s5 {o6} lqx r8, base0x80, index
{e2} a s3, s6, s7 {o6} lqx r9, base0x90, index
{e2} a s0, s0, s1 {o6} lqx ra, base0xa0, index
{e2} a s1, s2, s3 {o6} lqx rb, base0xb0, index
{e2} a acc, acc, s0 {o6} lqx rc, base0xc0, index
{e4} sumb s0, r0, r1 {o6} lqx rd, base0xd0, index
{e2} a
People told you to walk away? Jeez...
What piece of advice would you give yourself now?
Walk away?
Dude, as much as I hate people blaming "someone else", or "something that happened to me" - I have to tell you - Stand up for yourself! They picked on you because you let them! Kids are weak. Size has nothing to do with anything. A bigger kid kicked the shit out of you? Revenge it! Everyone has a weak spot.
I'm sorry, I'm just sick of this "oh, poor me, my parents were terrible to me, and that's why everything goes wrong with me". Change things!
Roggie.
I'm a nerd - my wife will testify to that. I was also a jock, though no longer (my belly will attest to that). And I was a brute - my fellow classmates will agree on that.
:-)
I would shout "what did you get on your math test, Bob?" for everyone to hear. "A" was the reply. "What?!? Just an A???". My classmates hated that.
Why didn't anyone kick the shit out of me? Because I was smarter than them, because I was stronger and bigger than them and because I didn't act like a loser. "Wanna fight me? Better bring all your buddies, if you have some."
I was popular among the girls - not so much with the boys, but a lot of the kids liked the fact that I didn't care who they were (in the pecking order). In fact, I treated everyone pretty much the same - hmm, well I was rather mean to a couple of bullies... but they deserved it.
I really don't think it matters if you are a nerd or not. You just have to be a little bit smarter and unafraid. The best advice I got through this time was that all kids are really just insecure and afraid little monsters. With this in mind you should be able to put your social life in the right place so that you can forget about it and study/read whatever you want.
I will never forget this classmate of mine telling me she really started studying when I had snorted: "Hah, you will _never_ get an 'A' in maths." She got so pissed off she started reading - and guess what - she succeeded.
Ah well, the States might be different from my home country...
Roggie.