Caktus · Slashdot Mirror

Re:CBE Performance on Cell Hits 45nm, PS3 Price Drop Likely to Follow · 2008-02-07 11:15 · Score: 1

Also, it seems to take a lot of select, shift, and shuffle instructions to make efficient use of the quadword (SIMD) instructions. With Xeon and Opteron, use of the quadword instructions seems to require far fewer other additional cycles.

These instructions are only required for scalar code. Vectorizable code does not generally need extra selects, shifts and shuffles unless the compiler cannot guarantee that the data is aligned. Since every SPU instruction is SIMD, however, scalar code forces the compiler to emit those extra instructions to hide the fact that loads, stores and all other operations work on whole quadwords. That makes scalar code slow and bloated, but not SIMD code.
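To make the overhead concrete, here is a minimal Python model (not SPU code) of a machine whose only memory operations move aligned 16-byte quadwords. It assumes, as on the SPU, that a scalar operand must sit in the leftmost word of a register (the "preferred slot"); all names are hypothetical. A scalar load costs an extra rotate, and a scalar store turns into a load, a merge and a store:

```python
QUAD = 16  # the toy machine only moves aligned 16-byte quadwords


class LocalStore:
    """Toy local store: only aligned quadword loads and stores exist."""

    def __init__(self, size: int = 256 * 1024):
        self.mem = bytearray(size)

    def load_quad(self, addr: int) -> bytes:
        base = addr & ~(QUAD - 1)  # the low 4 address bits are dropped
        return bytes(self.mem[base:base + QUAD])

    def store_quad(self, addr: int, quad: bytes) -> None:
        base = addr & ~(QUAD - 1)
        self.mem[base:base + QUAD] = quad


def rotate_left(quad: bytes, n: int) -> bytes:
    """Byte-rotate a quadword left by n bytes (the extra rotate step)."""
    n %= QUAD
    return quad[n:] + quad[:n]


def load_word(ls: LocalStore, addr: int) -> int:
    """Scalar 32-bit load = quadword load + rotate into the preferred slot."""
    assert addr % 4 == 0, "word must be 4-byte aligned"
    quad = ls.load_quad(addr)                     # 1. load the whole quadword
    quad = rotate_left(quad, addr % QUAD)         # 2. move the word to bytes 0-3
    return int.from_bytes(quad[:4], "big")        # 3. use the preferred slot only


def store_word(ls: LocalStore, addr: int, value: int) -> None:
    """Scalar 32-bit store = read-modify-write of the containing quadword."""
    assert addr % 4 == 0, "word must be 4-byte aligned"
    quad = bytearray(ls.load_quad(addr))          # 1. load the old quadword
    off = addr % QUAD
    quad[off:off + 4] = value.to_bytes(4, "big")  # 2. merge in the new word
    ls.store_quad(addr, bytes(quad))              # 3. write the quadword back


# Two neighbouring scalar stores into the same quadword must not clobber
# each other, which is why step 1 of store_word is needed.
ls = LocalStore()
store_word(ls, 0x104, 42)
store_word(ls, 0x108, 7)
print(load_word(ls, 0x104), load_word(ls, 0x108))  # prints: 42 7
```

A loop that processes whole aligned quadwords would skip the rotate and merge steps entirely, which is the point about vectorizable code.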

Re:"Cell" on Supercomputer On-a-Chip Prototype Unveiled · 2007-06-29 23:55 · Score: 1

Well, my point was that you were not clearly separating what happens inside a chip from what happens between chips. For example, the EIB bandwidth figures you mentioned are the aggregate for the whole chip. Chip-to-chip communication has much less bandwidth.

As for unified memory, PPU-to-PPU access is coherent (plus load/store queues), but inside the chip the SPU instructions do not run on the same memory space unless you resort to some tricks, and even then it is still not a unified memory architecture. This is one of the main benefits of the architecture: SPUs operate on local data, free of the effects of false sharing, cache-line ping-ponging, etc.

Re:"Cell" on Supercomputer On-a-Chip Prototype Unveiled · 2007-06-29 11:32 · Score: 1

I think you are mixing up chips and cores.

And the Cell is designed for scalable multicore/chip parallelism.

The Cell is a heterogeneous multicore design with very good bandwidth between its cores, but that does not mean that it has been designed for scalable multichip parallelism. In fact, there is a paper showing that the bandwidth between chips is not that great.

Its main magic is its coherent, superfast "elements" bus, which retains coherency even at 1.6Tbps across multiple cores and chips.

Again, the Element Interconnect Bus has quite a lot of bandwidth, but it is only available between the cores of a single chip. Interchip communication must go through the I/O port, which has much less bandwidth.

IBM has 4-core chips in pairs already deployed in public, and 128-core chips in the lab, where a massive new top-predator supercomputer is being built on the new architecture.

That's interesting. Could you provide a link to that information, please?

The Cell has builtin allocation facilities, so app code doesn't have to schedule or otherwise closely manage the fast SPEs, just send tasks to a generic pool.

The Cell does not have those facilities in hardware. All of that is implemented in software.
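Since the pool lives in software, the idea can be sketched as a shared work queue that workers pull from. In this minimal Python sketch, threads merely stand in for SPEs; `run_pool` is a hypothetical name, nothing here is Cell SDK API, and the default of six workers is arbitrary:

```python
import queue
import threading


def run_pool(tasks, n_workers=6):
    """Toy software task pool. Workers (standing in for SPEs) pull
    (function, argument) tasks from one shared queue, so the producer
    never has to decide which worker runs what. Results are returned
    in submission order."""
    work = queue.Queue()
    results = [None] * len(tasks)

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                return
            idx, fn, arg = item
            results[idx] = fn(arg)  # each index is written by exactly one worker

    for idx, (fn, arg) in enumerate(tasks):
        work.put((idx, fn, arg))
    for _ in range(n_workers):
        work.put(None)  # one sentinel per worker, queued after all tasks

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


squares = run_pool([(lambda x: x * x, i) for i in range(10)])
print(squares)  # prints: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Queuing every task before starting the workers keeps the sketch short; a real runtime would feed tasks continuously and, on the Cell, would also have to move each task's data into the SPE's Local Store.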

Which SPEs just DMA into a unified memory model.

That is a bit confusing. The PPE operates on main memory, which is accessible to the SPEs, but only through DMA operations. The SPEs operate on their own memory (the Local Store, in the literature). I consider that a non-unified model.
Nevertheless, this model can be altered by memory-mapping the SPE Local Stores into the PPE's address space. But that still does not allow the SPEs to operate directly on main memory.
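The non-unified model can be illustrated with a minimal Python sketch in which the "SPE" computes only on a small local buffer and moves data to and from main memory through explicit copies. `dma_get` and `dma_put` are hypothetical blocking stand-ins; real MFC DMA commands are asynchronous and have size and alignment restrictions that are not modeled here, and the chunk size is arbitrary:

```python
CHUNK = 4096  # bytes per transfer; an arbitrary choice for this sketch


def dma_get(local: bytearray, main: bytearray, ea: int, size: int) -> None:
    """Hypothetical blocking copy: main memory -> local store."""
    local[:size] = main[ea:ea + size]


def dma_put(local: bytearray, main: bytearray, ea: int, size: int) -> None:
    """Hypothetical blocking copy: local store -> main memory."""
    main[ea:ea + size] = local[:size]


def spe_increment(main: bytearray, ea: int, length: int) -> None:
    """Add 1 (mod 256) to every byte of main[ea:ea+length], touching main
    memory only through dma_get/dma_put, like an SPE working in its LS."""
    local = bytearray(CHUNK)  # stands in for a buffer in the Local Store
    for off in range(0, length, CHUNK):
        size = min(CHUNK, length - off)
        dma_get(local, main, ea + off, size)  # bring the data in
        for i in range(size):                 # compute on local data only
            local[i] = (local[i] + 1) & 0xFF
        dma_put(local, main, ea + off, size)  # write the results back


main_memory = bytearray(i % 256 for i in range(10_000))
spe_increment(main_memory, 100, 9000)
```

Real SPE code usually overlaps transfers and computation (double buffering), which this blocking version deliberately leaves out.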

That kind of simplicity makes Cell programming harder than, say, PowerPC programming, but much easier than other parallel programming, without losing its speed. Once there are some basic libraries for programming "common" new parallel tasks on the Cell, it won't be considered any harder than it was to program x86 "Protected Mode", Extended vs Expanded Memory, word alignment, etc.

I think that, in general, programming for the Cell is much more complicated than programming for an SMP, and in some cases even than programming with MPI.

There is very little storage on the SPE side (256 KB of Local Store per SPE), which must be shared by the code, the data and the stack.
The SPEs have no memory protection on their Local Store, so a stack that grows into your data or code is neither detected nor handled automatically.
The SPEs have a pure vector ISA, which forces the programmer to vectorize SPE code in order to obtain good performance. In fact, a pure vector ISA forces the compiler to emit lots of additional instructions (rotates and masks) for non-vectorized code (compared to scalar ISAs), which makes the LS space limitations even tougher.
The PPU, although multithreaded, is not as powerful as a traditional PPC (e.g. it has no out-of-order execution), which in practice means that you cannot spend too many cycles scheduling work for the SPEs, or else the SPEs will be starved.

Without the help of tools and libraries that hide those low level details from the programmer, programming the Cell can be quite hard.
I think that programming any non-embedded processor should be simple, and for that reason libraries, compilers and other tools are going to be as important for the Cell processor as the compiler is for Itanium.

Re:Fantastic on Intel Pushes Back with Xeon 5100 · 2006-06-29 09:01 · Score: 1

Wouldn't it make more sense if he asked if it was Celsius or Kelvin?

Re:Nice summary. on Microsoft's Bold Patent Move · 2005-08-12 04:38 · Score: 1

But then, would you trust the summary or would you read the patent application to be sure?

Re:timely and focused PR on Building The MareNostrum COTS Supercomputer · 2005-02-16 10:35 · Score: 1

This is an all-in-one reply to several posts.
First of all, the comments in this post are my personal comments and not those of the parties involved (IBM, BSC and the Spanish Government).
The final destination has always been Barcelona, but they put the machine together in Madrid because the final building was not ready in time to run the Top500 benchmark. Even then, they didn't have enough time to set up all the nodes, so the Top500 result had to be obtained with fewer nodes than the fully assembled machine has. I believe that SGI had also submitted data for its machine before it was fully assembled, but sent the results for the full machine after the deadline and had them accepted. When the final building in Barcelona was ready, they moved the machine to its final destination.
Regarding the limits of scalability, I think that such a configuration presents new challenges for the computer science researchers who work at the center. Having such a machine at our disposal will give us a very interesting opportunity to improve the scalability of our parallelization techniques.
Regarding the memory configuration, the login nodes have 4GB of RAM, and I believe the rest of the nodes have the same configuration.
And finally, the file systems are currently mounted over NFS, but they are expected to move to GPFS soon.

Re:vaporware? only for now. it's the right step. on Microsoft Renovates Office Suite as a Web Service · 2004-08-22 07:04 · Score: 1

I can already see which ISP will host their servers giving their clients optimum performance.

I think the full plan has been laid out by now.

Re:Build dll's using cl.exe on Free Optimizing C++ Compiler from Microsoft · 2004-04-18 04:03 · Score: 1

DLL Dynamic Link Library
SO Shared Object
LIB Library
A Archive

Re:Gamers are Awful on On Gay Characters In Videogames · 2004-03-18 09:09 · Score: 1

I also know bisexual people who do that, and homosexual people who laugh at the "jokes". But they also pretend that they are not gay, talk about their "girlfriends", and so on.
I call them hypocrites. As a gay man myself, I am sickened by them. I think that tons of gay bashing comes from gay people who are ashamed of themselves. That's tragic. The world hasn't changed as much as people want you to believe.

Silent Mice for Silent PCs on Silent Mice for Silent PCs? · 2003-12-13 03:16 · Score: 4, Funny

My mouse, on the other hand, makes a very audible *click* each time I use it, and while providing a pleasant tactile feedback, it keeps my girlfriend awake during my late-night work sessions.

You don't snore, do you?

Re:Use open grid computing standards instead on Enter Warriors In GridWars Interactive · 2003-10-21 09:14 · Score: 1

BTW, I vote "between 0m4.000s and 0m3.001s".

Re:Use open grid computing standards instead on Enter Warriors In GridWars Interactive · 2003-10-21 09:11 · Score: 1

In my experience, Globus is very slow even when there is no file transfer.
I propose a new slashdot poll:

time globus-job-run machine /bin/true

less than 0m1.000s
between 0m2.000s and 0m1.001s
between 0m3.000s and 0m2.001s
between 0m4.000s and 0m3.001s
between 0m5.000s and 0m4.001s
between 0m10.000s and 0m5.001s
between 0m30.000s and 0m10.001s
more than 0m30.000s
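For anyone answering the poll, a small Python helper can time a command and map the elapsed wall-clock time to the options above. The bucket edges are taken from the poll; the helper names are made up, and treating exactly 1.000s as "less than 0m1.000s" is an assumption, since the poll leaves that value uncovered:

```python
import subprocess
import time

# (upper bound in seconds, label) for each bucket of the poll above.
# A time of exactly 1.000 s is counted in the first bucket.
BUCKETS = [
    (1.000, "less than 0m1.000s"),
    (2.000, "between 0m2.000s and 0m1.001s"),
    (3.000, "between 0m3.000s and 0m2.001s"),
    (4.000, "between 0m4.000s and 0m3.001s"),
    (5.000, "between 0m5.000s and 0m4.001s"),
    (10.000, "between 0m10.000s and 0m5.001s"),
    (30.000, "between 0m30.000s and 0m10.001s"),
]


def poll_answer(seconds: float) -> str:
    """Map an elapsed wall-clock time to the matching poll option."""
    for upper, label in BUCKETS:
        if seconds <= upper:
            return label
    return "more than 0m30.000s"


def wall_time(argv) -> float:
    """Run a command (list of arguments) and return its elapsed
    wall-clock time in seconds, like the 'real' line of time(1)."""
    start = time.perf_counter()
    subprocess.run(argv, check=False)
    return time.perf_counter() - start
```

Usage: `poll_answer(wall_time(["globus-job-run", "machine", "/bin/true"]))`, where `machine` is the placeholder host name from the poll.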

Re:please create on SCO Targets US Government, TiVo · 2003-08-06 19:53 · Score: 1

Bad idea. Do you want to give them a reason to sue slashdot for trademark infringement?

Misleading story title on Animal Crossing+ Japanese Details Revealed · 2003-06-17 08:04 · Score: 1

Am I the only one who read that title as "Details on a crossing between an animal and a Japanese"?

Re:An infinite loop is not a bug in the applicatio on HTML Rendering Crashes IE · 2003-05-02 22:36 · Score: 1

Even if it is a bug in the document, the browser should never crash.

Re:Why BitTorrent? on Mozilla and BitTorrent? · 2003-04-29 04:53 · Score: 1

OK, so it's a protocol, like say ftp or http, but different. So it seems, as per the bugzilla discussion, that the problem should be solved by creating a mozilla plugin to handle URL's written torrent://domain.name/localpath/file.torrent .

The real problem is that it doesn't use just one protocol; it uses two. The first is the traditional protocol (http, ftp, email, whatever) you use to download the .torrent file, which describes where to get the actual file. The second is the torrent protocol itself. Using a single URL for two different protocols is not very clean, IMHO.

I can think of three solutions: (a) eliminate the first protocol by putting the necessary data in the URL, e.g. torrent://server/enough_data_to_begin_the_transfer; (b) always assume that a torrent://server/path/file.torrent URL is downloaded using http or some other fixed protocol; or (c) let a plugin or another application handle the .torrent file.
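Option (b) amounts to a fixed rewrite rule. A minimal Python sketch using the standard `urllib.parse` module, assuming (hypothetically) that the fixed protocol is http and that server and path carry over unchanged:

```python
from urllib.parse import urlsplit, urlunsplit


def torrent_url_to_http(url: str) -> str:
    """Option (b): treat torrent://server/path/file.torrent as 'fetch the
    .torrent metadata over HTTP from the same server and path'. The
    torrent:// scheme and this mapping are hypothetical conventions."""
    parts = urlsplit(url)
    if parts.scheme != "torrent":
        raise ValueError(f"not a torrent:// URL: {url!r}")
    if not parts.path.endswith(".torrent"):
        raise ValueError("expected a path ending in .torrent")
    return urlunsplit(("http", parts.netloc, parts.path, parts.query, parts.fragment))


print(torrent_url_to_http("torrent://example.org/pub/file.torrent"))
# prints: http://example.org/pub/file.torrent
```

The handler would still pass the downloaded .torrent file to a BitTorrent client, which is where option (c) comes in.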

Re:So what? on Crack Windows XP With... Windows 2000 · 2003-02-16 01:05 · Score: 2, Insightful

Don't use the password as the encryption key; just keep the encryption key in a file encrypted with your password.

Re:horror, horror, look at the keyboard! on New Tadpole SPARCbook RSN · 2002-11-02 00:39 · Score: 1

Or how about the key between Fn and Alt? Yes, that's the infamous "diamond key".

Do you mean the meta key?

Re:A bad omen for the NBMers on Microsoft Anti-Trust Rulings Due Tomorrow · 2002-10-31 20:17 · Score: 2, Insightful

Please, dear moderators, moderate the parent post up. I'm sick of seeing that all posts that are +5 are "Funny". Please, give a chance to interesting and insightful posts.

The moderation ability of a person is based on how his posts have been moderated. When most +5 posts are "funny", we get moderators who only know what funny is, not what insightful or interesting is.

Please remember that this is not segfault.org.

Re:Reiser4 on Linux 3.0 · 2002-10-21 08:06 · Score: 1

I realise that handling the contents of files using simple commands and virtual directories is very interesting, but it has very important consequences.

What happens to old applications? They should work as before with no changes. Imagine the result of creating a tar archive if it didn't behave as before.

For which file types will Reiser4 be able to create virtual directories to access their contents? Or, put another way, where do we stop?

How is it implemented: as kernel modules or as a userspace library? If it is implemented as kernel modules, then either they will bloat the kernel, or the kernel will waste a lot of time and fragment memory loading and unloading modules. If it is implemented as a library, why not implement it fully in userspace, independently of the chosen FS?

Is there a userspace alternative? For many cases the answer is yes. For example, the password file does not have to be a plain text file. It can be a db file if the system is configured accordingly.

IMHO, accessing the contents of files through virtual directories has no advantage over userspace solutions. If a file requires database-like functionality, it should be in a db or equivalent format. Implementing that functionality at the kernel level is just overkill and doesn't provide any benefit, just added complexity. What's missing (or what I am missing ;) are the proper tools to deal with those files as easily as in the example given before.
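As an illustration of the userspace approach, Python's standard `dbm` module is enough to give a passwd-style table keyed lookups, independently of the filesystem. This is a minimal sketch; the helper names, file name and record layout are made up, and it is not how any particular system's db-backed password lookup is laid out:

```python
import dbm
import os
import tempfile


def build_user_db(path, passwd_lines):
    """Index passwd-style lines ('name:x:uid:gid:gecos:home:shell') by user name."""
    with dbm.open(path, "n") as db:  # "n": always create a new, empty database
        for line in passwd_lines:
            db[line.split(":", 1)[0]] = line  # str keys/values are stored as bytes


def lookup_user(path, name):
    """Return the record for `name`, or None if there is no such user."""
    with dbm.open(path, "r") as db:
        return db[name].decode() if name in db else None


# Example with a throwaway database in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    users = os.path.join(tmp, "users")
    build_user_db(users, ["root:x:0:0:root:/root:/bin/bash",
                          "alice:x:1000:1000::/home/alice:/bin/sh"])
    print(lookup_user(users, "alice"))
```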

Re:Why aren't Oopses dumped to swap? on Linux 3.0 · 2002-10-21 06:52 · Score: 1

Oopses are not normally dumped to swap because swap may not be available or reliable in that state; in fact, writing to it could produce more corruption. I think that the best solution is to leave things as they are: put a mark somewhere in memory, reboot, and let the boot code deal with it (assuming the system is in a sane state after the reboot).

Re:The change I want to see... on RandR Support on XFree86 4.3 · 2002-10-20 23:00 · Score: 1

You can do that. I think that the requirements are higher than you suspect. Look at this page:

The server software runs on UltraSPARC servers supported by the Solaris 2.6, Solaris 7, or Solaris 8 Operating Environments. The suggested server configuration for most installations includes at least two processors, about 25 active sessions per CPU, 20-40 Mbyte random access memory or more for each active session, and about 50-100 Mbyte of swap space per session.
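Plugging the quoted per-session figures into a quick back-of-the-envelope calculation shows why. The function below is just that arithmetic (its name is made up), using only the numbers from the quote:

```python
def sunray_estimate(cpus, sessions_per_cpu=25, ram_mb=(20, 40), swap_mb=(50, 100)):
    """Server sizing from the per-session figures quoted above.
    Returns (sessions, (min, max) RAM in MB, (min, max) swap in MB)."""
    sessions = cpus * sessions_per_cpu
    ram = (sessions * ram_mb[0], sessions * ram_mb[1])
    swap = (sessions * swap_mb[0], sessions * swap_mb[1])
    return sessions, ram, swap


print(sunray_estimate(2))  # prints: (50, (1000, 2000), (2500, 5000))
```

So even the suggested minimum of two processors, carrying about 50 active sessions, calls for roughly 1-2 GB of RAM and 2.5-5 GB of swap, and the quote says "or more" for the RAM figure.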

IMHO it is not cost effective for most cases.

I don't think that the load balancing is possible yet.

Re:The anti-pro-X debate is missing the point! on RandR Support on XFree86 4.3 · 2002-10-20 22:20 · Score: 3, Interesting

But neither one will get any design awards

Of course the x86 won't get a design award. The x86 wasn't created with extensibility in mind; it has been handicapped from the very beginning. It wasn't designed with the idea that it might need more registers in the future, or that any register might end up being usable for any purpose. In contrast, the X system has been designed for extensibility, network transparency, multiuser systems and isolation from the kernel.

Extensibility allows adding functionality to the system. The common example is the RENDER extension, but that is just a small example. One could even have created a widget set as an extension to reduce network traffic (not that it would be a good idea). The problem with extensions is not the extensions themselves but standardization. A non-standardized extension is useless.

Network transparency lets you use any machine (that uses X) from a single location. You can have a desktop with several apps from different machines, and you can move to another location and use the same machine you used before.

Multiuser systems allow several users to be logged into the same machine at the same time without interfering with one another, except for some competition for resources. This reduces the number of systems to be configured and maintained.

Isolation from the kernel allows the X server to run in a separate process. As a result, if any operation causes a crash, it crashes only the X server, not the entire operating system.

I understand that these features are of no use to the average computer user. On the other hand, they are completely transparent to the user. I like its design very much. What problems do people find in X?

Re:unfortunately on RandR Support on XFree86 4.3 · 2002-10-20 21:32 · Score: 1

most people do need the ability to change resolution and color depth on their desktops easily

Why? I configured my screen resolution and depth when I installed the OS. Why should I need to change it again?

This is not intended to be a troll. I just don't get it.

Re:The change I want to see... on RandR Support on XFree86 4.3 · 2002-10-20 21:28 · Score: 1, Redundant

Sun Ray terminals are similar in concept to VNC terminals (if such things existed). The terminals connect to a server using a proprietary protocol. The X server runs on the server, and the terminal is just a framebuffer with keyboard, mouse, USB sockets and audio. When you start a session, you start an X server. When you switch terminals, you disconnect from your X server and reconnect to it from another terminal. This method requires a dedicated *monster* server with enough memory for every framebuffer, enough CPU power to draw into the framebuffers, and dedicated networks with enough bandwidth to the terminals.

What would be really interesting is migrating applications from one X server to another transparently, using just the X protocol, without dedicating a server to the task or having to migrate the entire session.


User: Caktus

Comments · 68