Re:Shoehorn on Java vs .NET · Score: 2, Interesting
Just more capacity. The NT4 box was just for convenience; it was what the developers had on their desks. The Linux boxes were our first production environment, but turned out not to be powerful enough to handle everything once we got more traffic and the app got more complex. When we went to the E6500s it was actually a mixed environment: some things ran on the E6500s, some stayed on the Linux boxes. Mind you, this could have been avoided by better planning, but Java's portability let us do a bit of exploring first.
Re:Shoehorn on Java vs .NET · Score: 5, Interesting
I'll agree with client side stuff, but I've been in a situation in the past where we developed a server side web app (in J2EE) on a single NT4 server (purely for ease of access). Then when we went gold, capacity grew and it moved to a cluster of Linux servers. Then when the business picked up and capacity grew in large amounts, we moved to a couple of Sun E6500s. .NET wouldn't have allowed us to migrate like this.
For a while I've wanted to use a Crusoe chip in various little embedded devices, simply because they're so easy to run fanless. And I know there are companies that make small form factor (5.25" or smaller) boards like these. Problem is, they're impossible to order unless you're getting them in huge quantities. These things could be a hobbyist's dream, much better (lower heat, smaller) than the mini-ITX C3 boards a lot of people are using; you just can't find them. I've only found one place that sells them, and the prices they charge are ridiculous: easily twice what an equivalent C3 board costs (only the Crusoe has a tiny little heat sink, while the C3 has a heat sink that spans the whole board).
Does anyone know where you can find these things for a reasonable amount of money?
This doesn't do anything on servers. He's referring to websites putting a link to a Java applet in their pages. This applet does computations on the client side.
I've been very happy with SuSE on my Opteron system, but there's one thing that keeps a 32 bit installation on another partition. ATI, though they've made several press releases about how they "fully support the Opteron", has not felt the need to release 64 bit versions of their drivers (either for Linux or Windows), and the open source radeon driver doesn't support the Radeon 9700 Pro that I own. I'm almost tempted to get an nVidia card simply because they have 64 bit drivers, even though this generation of cards just isn't as good as ATI's.
Sigh, this comes up every time someone mentions Transmeta. Yes, the "translator code" (it's actually called Code Morphing) is cool. Yes, it takes x86 and converts it to the Crusoe's native instruction set; underneath, the chip is actually a 4-way VLIW processor. No, that was not done to run multiple instruction sets. It was done so that some of the complexity of the chip lives in software instead of silicon, making the chip smaller and less power hungry. In fact they've repeatedly said that while it's theoretically possible to code morph other instruction sets, they've designed the underlying, real instruction set to run x86 code effectively, just in a simpler and more efficient manner. The whole hype about multiple instruction sets came from people speculating about what could be done with this cool new code morphing thing, and then others looking at the comments and assuming it was already planned. Transmeta themselves never contributed to that hype in the slightest.
Most distributions do. With Red Hat you can subscribe to the Red Hat Network, and with Debian, its package manager, apt-get, has this built in. Both of these, however, are dependent on the distro maintainers actually putting the new version in and resolving any dependency issues that might arise.
On the other hand, unlike Windows Update, you don't need to reboot every time you update something like this (the only time you ever need to reboot is if you update the kernel).
That's really not what he means. This is using a completely non-Unix-oriented system (a mainframe) running a VM (which is not an emulator; virtualization is built into mainframes) to run many instances of Linux (which isn't emulated either; Linux runs natively on mainframes).
It's not "one of the original pure OO languages", it's THE original OO language. OO as a concept was demonstrated by making Smalltalk.
Top (and ps and anything that uses the standard psacct method of getting process information) reports a program's memory usage as all the memory it has mapped. For one thing, in a heavily multithreaded app like nautilus, each thread looks like it has its own memory map. Too bad they all actually share every bit of memory. Second, some pieces of read only memory are mapped into multiple processes, most notably libraries. When a program loads a shared library it maps it as read only memory. But since read only memory isn't going to change, the next program that uses that library just has that same block of memory added to its memory map. And GNOME has a ton of libraries. Most apps only use one or two, but nautilus makes heavy use of almost all of them. So it's sharing a bit of memory with all the GNOME apps on your system. Combine that with the fact that it probably speeds up its graphics by using a lot of X11 shm pixmaps (blocks of memory shared between an app and the X server, to speed up communication) and you've got even more memory showing up in two places. Not so coincidentally, this is one of the main reasons everyone thinks X is bloated: it's actually sharing lots of memory with every app using it. Altogether, top just isn't all that accurate when it comes to memory usage.
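To put rough numbers on the double counting, here is a minimal sketch. It assumes a reasonably modern Linux kernel that exposes `/proc/<pid>/smaps`, which breaks each mapping into shared and private pages; that file is not what top reads, and the function name and the sample figures are made up for illustration.

```python
# Sketch only: sums the per-mapping fields of smaps-style text.
# SAMPLE is hypothetical data, not output captured from a real process.
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /usr/bin/example
Rss:                  40 kB
Pss:                  10 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
7f0000000000-7f0000021000 rw-p 00000000 00:00 0 [heap]
Rss:                 100 kB
Pss:                 100 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       100 kB
"""

SHARED = ("Shared_Clean", "Shared_Dirty")
PRIVATE = ("Private_Clean", "Private_Dirty")


def summarize_smaps(text):
    """Return totals in kB: resident, proportional, shared and private."""
    totals = {"Rss": 0, "Pss": 0, "Shared": 0, "Private": 0}
    for line in text.splitlines():
        parts = line.split()
        # Field lines look like "Rss:  40 kB"; mapping headers do not.
        if len(parts) != 3 or parts[2] != "kB":
            continue
        name, value = parts[0].rstrip(":"), int(parts[1])
        if name in ("Rss", "Pss"):
            totals[name] += value
        elif name in SHARED:
            totals["Shared"] += value
        elif name in PRIVATE:
            totals["Private"] += value
    return totals


if __name__ == "__main__":
    print(summarize_smaps(SAMPLE))
```

In the made-up sample, 40 kB of the 140 kB resident total is library text that other processes map too, so only 100 kB is really this process's own. On a Linux box you could feed the function `open("/proc/self/smaps").read()` instead of the sample.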
Your "Is -mmmmx and such worth it?" guide is a little unfair. Thing is, you notice that only -O3 really made much of a difference. Well, that's because each of your tests is just one big loop, and -O3 does heavy loop unrolling. You basically chose the absolute optimal case for -O3 to win (short of also having a small function call inside that loop so that -O3 could inline it). And you didn't do any floating point multiplies or divides, which is where -mmmx/-msse and -m3dnow would help you.
If anything you were also using a relatively small dataset. If you get a large enough data set (or code size) -O3 might actually hurt you (loop unrolling and function inlining will bloat both code and data size and make it much more likely to have a cache miss).
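For anyone who hasn't seen what unrolling does to a loop, here is a hand-written sketch of the transformation. It is an illustration in Python, not GCC output, and Python itself gains nothing from it; the function names are made up.

```python
def sum_rolled(values):
    """One add and one loop test per element."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same result, but four adds per loop test: fewer branches, more code."""
    total = 0
    n = len(values)
    main = n - (n % 4)  # largest multiple of 4 that fits
    i = 0
    while i < main:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Cleanup loop for the 0 to 3 elements left over.
    for j in range(main, n):
        total += values[j]
    return total


if __name__ == "__main__":
    data = list(range(10))
    print(sum_rolled(data), sum_unrolled_by_4(data))
```

The unrolled version does the same work with a quarter of the loop tests, but the body is several times longer, which is exactly the code size growth that can start costing you cache misses.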
Anyways, synthetic benchmarks are one thing, but yours is so synthetic as to be ridiculous.
That would be true, but we're not moving from a 32 bit bus to a 64 bit bus. Both the G4 and the PPC970 (as well as x86) already had 128 bit buses. The 64 bit nature of the processor isn't dependent on this.
No. You don't transfer twice as much data. You do integer operations on 64 bit integers instead of 32 bit integers, which is nice if you need to work with large numbers. An add of a number > 2^32 can be done in 1 clock cycle, as opposed to several on a 32 bit machine (where you have to emulate it in software with multiple 32 bit integers). However, the vast majority of numbers most programs will ever have to work with aren't > 2^32, so for all of those there is absolutely no benefit. In fact it's almost a detriment, because for all of those numbers < 2^32 you're still fetching 8 bytes from main memory, unlike the 32 bit machine, which only fetches 4 bytes. The faster FSB might help, but only in some cases.
Basically, for those applications that could use the extra precision in calculations (engineering, possibly some high end video and graphics work), the 64 bits are immensely useful. For those that don't need it (99.9% of the apps the average consumer would ever use), the best you can hope for is that the faster FSB will mitigate the cost incurred by the larger fetches.
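A rough sketch of what "emulating it in software with multiple 32 bit integers" means for an add. Python integers are arbitrary precision, so the masks stand in for 32 bit registers and the shift stands in for the hardware carry flag; the function name is made up for illustration.

```python
import struct

MASK32 = 0xFFFFFFFF


def add64_with_32bit_ops(a, b):
    """Add two unsigned 64 bit values using 32 bit halves; wraps mod 2**64."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    # First add: the low words. A real CPU would leave the overflow
    # in its carry flag; here it is simply bit 32 of the sum.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32
    # Second add: the high words plus the carry from the first.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo


if __name__ == "__main__":
    print(add64_with_32bit_ops(5_000_000_000, 7_000_000_000))
    # Storage cost of the two integer widths, in bytes.
    print(struct.calcsize("<I"), struct.calcsize("<Q"))
```

That is two dependent adds plus the carry handling where a 64 bit machine does one, and the last line shows the other side of the trade: every 64 bit value costs 8 bytes to fetch instead of 4, whether or not the number needs them.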
As far as I know, pop is mostly a Midwest thing. I came from DC (where everyone says soda) and moved to the Northwest, where there's kind of a competition between Midwest influence making people say pop and California influence making people say soda.
It's a badly written line of C. Anyone who insists on one letter variables (and one letter structs with one letter members even!) should be shot.
They're referring to the fact that IBM has been promoting running Linux on its mainframes lately.
Except for species. We do have a hard definition of a species, and that's any group of life forms that can reproduce together (at least with sexually reproducing forms, not sure about asexually reproducing ones).
A fellow Seattleite, I presume?
No, I am referring to those optimizations, but remember, the X server is in userspace. It accesses the video card by mmapping /dev/mem.
The one other issue I've heard of is that since acceleration is implemented in userspace, the server can't block on a hardware interrupt and so ends up doing a bit more busy waiting than it should. Though this is only an issue on a machine with a pegged CPU.
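To give a feel for what mmapping buys a userspace program, here is a small sketch. It maps an ordinary temporary file rather than /dev/mem (which needs root and real hardware), and the tiny 8 bit per pixel "screen" layout and the function names are made up for illustration.

```python
import mmap
import os
import tempfile

WIDTH, HEIGHT = 16, 8  # hypothetical screen, one byte per pixel


def draw_pixel(fb, x, y, value):
    """Plain memory store into the mapping; no syscall per pixel."""
    fb[y * WIDTH + x] = value


def demo_mmap():
    """Map a stand-in 'framebuffer' file, poke one pixel, return the bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, WIDTH * HEIGHT)  # zero-filled backing store
        with mmap.mmap(fd, WIDTH * HEIGHT) as fb:
            draw_pixel(fb, 3, 2, 0xFF)
            fb.flush()
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, WIDTH * HEIGHT)
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    data = demo_mmap()
    print(data[2 * WIDTH + 3])
```

Once the mapping exists, drawing is just writing bytes at computed offsets. The flip side is the one mentioned above: a process that only has a memory mapping has no way to sleep until the hardware raises an interrupt.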
Oops. I wasn't very awake when I wrote that. Things like that tend to slip by me when I'm tired.
Sendmail is port 25. Port 80 is HTTP.
A "direct rendering" method as you describe it is where the functionality to draw a specific graphic object is in the application (or in a library linked to the application), and all the system does is map video memory into the application's address space for that application to do bit by bit operations on. That makes sense in a case like 3D rendering, where everything is a completely custom graphic and the only predefined operation is transforming a polygon description into a rasterized image, which is done by a few very low level syscalls.
Windows doesn't use this method. Windows has the entire widget set in the kernel (or the HAL, or wherever they draw the line these days), and there is a syscall for each individual widget.
The idea behind X was to figure out just how high-level 2D primitives need to be for acceleration to pay off, and draw the line right there. So there is a call (in the form of an X protocol request) for each accelerated primitive like draw line and bitblt, and widgets are built out of these and live in application side libraries.
And no, local X does not go over "essentially a network protocol". Unix domain sockets are extremely simple and extremely fast. When an X application makes a request, it sends X protocol information over the socket by packing a request data structure and writing that structure to the socket. Since it's a local socket, the kernel just memcpys that data structure into a buffer in the X server, which is waiting on a read. No network stack, no network hardware, no processing. Just a memcpy. And as for the formatting of the data structure, a shared memory system would do that too. In a shared memory system, an app would pack a data structure into some piece of shared memory and then trigger a semaphore. The display server/framebuffer system/whatever would then memcpy that data structure out and interpret the request. There really is almost no difference.
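A minimal sketch of the pack, write, read cycle described above. The request layout here is invented for illustration and is not the real X11 wire format; the opcode value and function names are assumptions too.

```python
import socket
import struct

# Hypothetical request layout (NOT the real X protocol):
# opcode (1 byte), pad (1 byte), total length (2 bytes), x1, y1, x2, y2 (2 bytes each).
REQUEST = struct.Struct("<BxH4h")
OP_DRAW_LINE = 1  # made-up opcode


def send_draw_line(sock, x1, y1, x2, y2):
    """Client side: pack the structure and write it to the socket."""
    sock.sendall(REQUEST.pack(OP_DRAW_LINE, REQUEST.size, x1, y1, x2, y2))


def recv_request(sock):
    """Server side: read exactly one fixed-size request and unpack it."""
    data = b""
    while len(data) < REQUEST.size:
        chunk = sock.recv(REQUEST.size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the socket mid-request")
        data += chunk
    return REQUEST.unpack(data)


def demo_socket():
    # socketpair() defaults to AF_UNIX where the platform has it.
    client, server = socket.socketpair()
    try:
        send_draw_line(client, 0, 0, 100, 50)
        return recv_request(server)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo_socket())
```

The whole request is twelve bytes handed from one local process to another through the kernel. A shared memory design would pack the same twelve bytes; it would just signal a semaphore instead of calling write.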
Please stop bashing X until you've looked at how it actually works.
People keep mentioning that link. What they fail to realize is that both of these articles were written by the same person.
There is no system for any media player where the same plugin binary works on multiple processors. You'd need something akin to Java to do that, and Java is too slow to write a codec in.