Slashdot Mirror


User: Aart+Bik

Aart+Bik's activity in the archive.

Stories: 0
Comments: 6

Comments · 6

  1. Re:Vectorization example on Open Watcom 1.0 Released · Score: 1

    You may want to try adding the option -Qipo (if you have not done so already), since this can help C++ code that makes heavy use of templates (and MSVC already does the equivalent of -ip under -O2). If you are willing to contact me personally and provide some more details, I may be able to help you improve the performance of your application(s).

  2. Re:Vectorization example on Open Watcom 1.0 Released · Score: 1

    I regret to see that you are again trying to change the subject by presenting results for an application without vectorizable loops (at the very least, you could have provided more information on the compiler switches and target architecture used). If you call someone's work a "joke" but are subsequently unwilling to back up your claim due to time constraints, then I can only hope that other readers will take your claim for what it really was... If you ever have real issues with vectorization, please do not hesitate to contact me directly.

    Aart

  3. Re:Vectorization example on Open Watcom 1.0 Released · Score: 1

    Seriously, please try one of our latest versions! Version 7.0 has no problem vectorizing this loop:

    [C:/cmplr/temp] icl /Fa /QxW sl2.cpp
    sl2.cpp(4) : (col. 1) remark: LOOP WAS VECTORIZED.

    Because of the large constant, the compiler (in combination with some alignment optimizations) is also able to see that a so-called streaming store is useful to minimize cache pollution:

    ...
    Back:
        movntps XMMWORD PTR [ebp+edx*4], xmm0
        movntps XMMWORD PTR [ebp+edx*4+16], xmm0
        add     edx, 8
        cmp     edx, ecx
        jb      .B1.8
    ...

    How much speedup vectorization actually yields depends heavily on the context in which this fragment is used in your application.
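    For readers unfamiliar with streaming stores, here is a minimal hand-written sketch of the same idea using SSE intrinsics in C. It is not the compiler's output and not the original sl2.cpp (which was never posted in this thread). The function name `fill_streaming` is hypothetical, and the code assumes an x86 target with SSE and a 16-byte-aligned destination. `_mm_stream_ps` compiles to `movntps`, a store with a non-temporal hint that avoids pulling the destination lines into the cache.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: fill n floats at dst with value using streaming stores.
 * Assumes x86 with SSE and that dst is 16-byte aligned. */
void fill_streaming(float *dst, size_t n, float value)
{
    const __m128 v = _mm_set1_ps(value);   /* broadcast value into all 4 lanes */
    size_t i = 0;

    /* Two 16-byte non-temporal stores per iteration, advancing 8 floats,
     * like the unrolled movntps pair in the listing above. */
    for (; i + 8 <= n; i += 8) {
        _mm_stream_ps(dst + i, v);
        _mm_stream_ps(dst + i + 4, v);
    }

    /* Scalar remainder for the last n % 8 elements. */
    for (; i < n; ++i)
        dst[i] = value;

    /* Streaming stores are weakly ordered; sfence orders them before
     * any later stores (matters when another thread reads the buffer). */
    _mm_sfence();
}
```

    The payoff depends on size: for a small buffer that is read again soon, ordinary cached stores are usually better, which is consistent with the compiler basing its choice on the large constant.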

  4. Vectorization on Open Watcom 1.0 Released · Score: 1

    Since you called Intel's vectorization a "joke", I was specifically interested in examples where the Intel compiler fails to vectorize loops.
    Incidentally, your example contains one initialization loop that is nicely vectorized (so I rest my case):

    => icl sl.cpp ...
    sl.cpp(28) : (col. 1) remark: LOOP WAS VECTORIZED.

    I realize that this does not address your performance concerns (which we can discuss offline), but your example did not provide what was requested.
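    To make the distinction in this exchange concrete, here is a small hypothetical C example (not taken from the sl.cpp discussed above) that puts a loop that vectorizes readily next to one that typically does not. Whether a particular compiler actually vectorizes either loop depends on its version and switches. The compiler's own vectorization remarks, like the icl output quoted in these comments, are the reliable way to check.

```c
/* Hypothetical example functions, not from the thread. */

/* Each a[i] is computed from i alone, so the iterations are independent
 * and several elements can be produced per SIMD instruction. */
void init_linear(float *a, int n)
{
    for (int i = 0; i < n; ++i)
        a[i] = 2.0f * (float)i;
}

/* Loop-carried dependence: iteration i reads a[i - 1], which iteration
 * i - 1 has just written, so the iterations cannot simply run side by
 * side in SIMD lanes. */
void running_sum(float *a, int n)
{
    for (int i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
```

    After `init_linear(a, 8)` the array holds 0, 2, 4, ..., 14. `running_sum` then turns it into the prefix sums 0, 2, 6, 12, ..., 56, where each step needs the previous result.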

  5. Re:No, actually on Open Watcom 1.0 Released · Score: 1

    No, it is "a", not "the", although seeing Intel's vectorization called a "joke" did become rather personal :-) As for any future plans, I am the wrong person to ask. I am just vectorizing loops here...

  6. Re:No, actually on Open Watcom 1.0 Released · Score: 1

    Would you mind sharing some examples of code where the Intel compiler misses obvious opportunities for vectorization? I find your claim rather strong, especially considering that you have not even tried version 6.0 yet (version 7.0 is already out now). A recent article with programming guidelines for vectorizing compilers that you may find useful can be found at: http://www.cuj.com/articles/2003/0302/0302c/0302c.htm?topic=articles

    Privately, I maintain a web page with more in-depth information on vectorization for SSE/SSE2. See: http://www.aartbik.com

    --
    Aart Bik, Senior Staff Engineer, Intel Corporation
    email: aart.bik@intel.com
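    Here is one illustration of the kind of guideline such articles give. It is a common rule for vectorizing compilers in general, not a summary of the linked article: a loop is easier to vectorize when the compiler knows its arrays do not overlap. The minimal C99 sketch below uses a hypothetical `scale` function and the `restrict` qualifier to make that promise.

```c
#include <stddef.h>

/* Hypothetical example. Without restrict, the compiler must allow for dst
 * overlapping src, which can block vectorization or force a runtime overlap
 * check. The restrict qualifiers promise that, inside this function, the
 * two arrays are not accessed through each other. */
void scale(float *restrict dst, const float *restrict src, float k, size_t n)
{
    /* Unit stride, no function calls, no early exit: a simple loop shape
     * that vectorizing compilers handle well. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * src[i];
}
```

    Calling `scale` with overlapping `dst` and `src` would break the `restrict` promise and is undefined behavior, so the qualifier belongs only where callers can guarantee disjoint arrays.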