Yup, but it's not specific keywords... it's the "Mozilla Application Suite" product (see list here). It may get changed to SeaMonkey when someone at the Mozilla Foundation has time to reorganize the Mozilla Application Suite components into the SeaMonkey component setup we would like.
The "Mozilla Suite" under that name is no more... the Mozilla Foundation isn't doing any more releases (well, security updates to 1.7, but that's all). However, a community group is continuing its development under the name SeaMonkey. It contains all the core improvements that went into Firefox 1.5 (pretty error pages, SVG, canvas, performance improvements) plus some new features of its own. Not every change to Firefox goes into the suite - SeaMonkey doesn't aim to be exactly like Firefox.
If you're interested in it, we'll be shipping the 1.0 alpha very soon now (based on the code that would have been Mozilla 1.8 beta4), and nightlies are available here (you want the -mozilla1.8 directories at the bottom). We're hoping to ship within the next week or two (there's just an installer bug we need to fix before release).
An automatic (smart) pointer type, nsCOMPtr, handles refcounting. Unfortunately, there are still situations where a manual AddRef is required, and these are some of the things that can lead to leaks. The JS engine uses an ordinary mark-and-sweep garbage collector.
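Here is a toy comparison of the two schemes in Python. It is not Mozilla's code, and every name in it is made up for illustration. Two objects whose reference counts never drop to zero stay alive forever under pure refcounting. In this example that happens because of a cycle, and an AddRef with no matching Release has the same effect. A tracing mark-and-sweep collector doesn't care about counts: anything it can't reach from the roots gets swept.

```python
class Obj:
    """A toy heap object: a name, outgoing references, and a refcount."""
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.refcount = 0

def add_ref(holder, target):
    # Storing a reference bumps the target's count, like a manual AddRef.
    holder.refs.append(target)
    target.refcount += 1

def mark_and_sweep(heap, roots):
    """Return the objects reachable from roots; everything else is garbage."""
    marked = set()
    stack = list(roots)
    while stack:                      # mark phase: walk the graph from the roots
        obj = stack.pop()
        if id(obj) not in marked:
            marked.add(id(obj))
            stack.extend(obj.refs)
    return [o for o in heap if id(o) in marked]   # sweep phase: keep marked objects

root, a, b, c = Obj("root"), Obj("a"), Obj("b"), Obj("c")
add_ref(root, a)
add_ref(b, c)
add_ref(c, b)          # b and c keep each other's count at 1, so refcounting never frees them
heap = [root, a, b, c]
survivors = mark_and_sweep(heap, roots=[root])
print([o.name for o in survivors])   # ['root', 'a']
print(b.refcount, c.refcount)        # 1 1
```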
Register renaming helps (otherwise Intel wouldn't have invented it).
I don't recall who first implemented it, but the MIPS R10000 implemented it about a year before Intel did in the PPro.
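For anyone unfamiliar with the technique, here is a minimal sketch of what renaming does. Every write to an architectural register gets a fresh physical register, and a table maps each architectural name to its latest physical copy. That removes false (WAR/WAW) dependences and keeps the true ones. The register names are hypothetical, and the physical pool here is unbounded and never freed, which real hardware obviously can't do.

```python
def rename(program, num_arch_regs):
    """Rename (dest, [srcs]) instructions onto physical registers.

    Simplification: physical registers are never freed, so the pool is unbounded.
    """
    rat = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}  # register alias table
    next_free = num_arch_regs
    renamed = []
    for dest, srcs in program:
        phys_srcs = [rat[s] for s in srcs]   # read the current mappings first
        rat[dest] = f"p{next_free}"          # every write gets a fresh register
        next_free += 1
        renamed.append((rat[dest], phys_srcs))
    return renamed

program = [
    ("r1", ["r2", "r3"]),   # r1 = r2 + r3
    ("r2", ["r4", "r5"]),   # WAR hazard on r2 with the first instruction
    ("r1", ["r6", "r7"]),   # WAW hazard on r1 with the first instruction
    ("r3", ["r1", "r2"]),   # true (RAW) dependences on the two previous writes
]
for insn in rename(program, num_arch_regs=8):
    print(insn)
```

After renaming, the first three instructions write p8, p9 and p10, so they no longer conflict and can execute in any order. The last instruction still reads p10 and p9, so its real data dependences are preserved.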
Sure, Intel may have originally hoped to migrate the world to IA64, but given the wild success of AMD64 in bringing 64-bit to the x86 world, it doesn't look like that's happening. The Itanium chips Intel is releasing are obviously not aimed at tasks that could be handled by a 386 with some SCSI drives ("fax server"? a file server?)... who is going to use a multi-thousand-dollar CPU for anything other than database|web|high-end server anyway?
I've been playing with coLinux today (I run Windows, but the SeaMonkey Project needs nightly Linux builds for users to test, and our resources are pretty limited), and I have to say it's pretty cool. I recompiled gcc in less than 30 minutes, and it doesn't feel slow at all to use. It's significantly faster than the Cygwin process that's used to build Windows versions of SeaMonkey (and Firefox).
I was thinking it might be possible to set up some sort of coLinux-based package that lets people run GNOME, OpenOffice.org, Gaim, etc. You can access your host system's data, so "where are my MP3s?" problems go away. You're running native binaries (I'm running an ordinary Debian setup), so you get exactly the look and feel you'd get running Linux directly. You can install packages just as you normally would.
At this point, there are some challenges with sound, and you need an X server or vnc client on the Windows host for GUI apps, but the package could take care of all that stuff so users don't have to worry about it.
You could offer various levels of transition, ranging from launching apps from the Start menu (I can think of easy ways to get that sort of integration) all the way to hiding Explorer and using GNOME full time, with Windows skinned to look like GNOME for any Windows apps the person still uses. The end result should be users who aren't afraid of Linux desktops just because they look different, and who aren't worried they'll have to hack text files or learn the command line (things you shouldn't have to do in any decent end-user-oriented distro).
Just a thought.
TFA claims the new chips will be in PCs in 18 months - given the incredibly long design times of modern processors, that means they've probably been working on it for at least a couple years.
If you use Mozilla (well, "SeaMonkey" now), you can do something similar through the UI: the cookie manager lets you do things like "disallow persistent cookies for all sites except the ones I list". It's a lot more convenient than changing file permissions and editing text files. Plus, since per-session cookies are still allowed, stupid sites that depend on them don't break.
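The policy is simple enough to sketch. This is a hypothetical model, not SeaMonkey's actual cookie code: the class, function and site names are invented. It also assumes that persistent cookies from unlisted sites are downgraded to session cookies rather than rejected, which is the behavior that keeps session-dependent sites working.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Cookie:
    host: str
    name: str
    expires: Optional[float]  # None = session cookie, discarded when the browser closes

def host_matches(host: str, site: str) -> bool:
    # "www.example.org" matches "example.org"; "badexample.org" does not.
    return host == site or host.endswith("." + site)

def apply_policy(cookie: Cookie, persistent_ok: set[str]) -> Cookie:
    """Keep persistent cookies only for listed sites; downgrade the rest to session cookies."""
    if cookie.expires is None or any(host_matches(cookie.host, s) for s in persistent_ok):
        return cookie
    return replace(cookie, expires=None)

allowed = {"example.org"}   # hypothetical list of sites trusted with persistent cookies
print(apply_policy(Cookie("www.example.org", "login", 2e9), allowed))   # kept as-is
print(apply_policy(Cookie("ads.example.net", "uid", 2e9), allowed))     # expires=None
```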
Yes, with actual math.
I don't know how much power is dissipated in the wires, but I would think the bigger factor would be that the decreased interconnect capacitance lets you use smaller transistors, which would definitely save significant power.
I suggest taking a look at this paper which discusses theoretical limits on the binary switching model.
Were you using a Gecko-based browser (Firefox, Mozilla, Camino, K-Meleon)?
Especially when combined with XUL.
7.0 is definitely a pretty big quake. The Northridge Earthquake killed more than 50 people in the Los Angeles area, and it was "only" a 6.7. There was some pretty significant damage. Of course, its epicenter was in an urban area.
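Magnitude is logarithmic, so the gap between 6.7 and 7.0 is bigger than it sounds. This quick sketch uses the standard Gutenberg-Richter energy relation, log10(E) = 1.5 M + constant, and the constant cancels out in a ratio. Energy released isn't the same thing as damage, though, which is exactly why the epicenter location matters.

```python
def energy_ratio(m_big, m_small):
    # log10(E) = 1.5 * M + const, so the ratio depends only on the magnitude difference.
    return 10 ** (1.5 * (m_big - m_small))

print(round(energy_ratio(7.0, 6.7), 2))   # about 2.82x the energy of Northridge
print(round(energy_ratio(7.0, 6.0), 1))   # about 31.6x per whole magnitude step
```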
Most processors are at least a little bit faulty.
The OS has some threads managing I/O and performing housekeeping operations, and you're probably also listening to some music, and you probably have some other apps running that occasionally need a little computation. So none of that stuff will impede your compile.
This is true, but if you look at how much CPU time that stuff actually uses, it's negligible. Over 1 week, with Winamp playing ~12 hours/day, it has used 20 minutes of CPU time. "System" (where drivers live) has used 1 hour, as has my virus scanner. If we assume all background tasks combined used 4 hours of CPU time (that seems high, looking at Task Manager... probably closer to 3), then out of the 168 hours in a week, only about 2% of the time goes to these tasks. If you add a whole extra processor core just for that 2%, you're going to waste a lot of computational power.
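Here is the arithmetic behind that "about 2%". It assumes the machine was running for the whole 168-hour week. If it was only on part of the time, the share would be somewhat higher, but still small.

```python
HOURS_PER_WEEK = 7 * 24  # 168

def background_share(cpu_hours, wall_hours=HOURS_PER_WEEK):
    """Fraction of one core's time consumed by background work."""
    return cpu_hours / wall_hours

for hours in (3, 4):
    share = background_share(hours)
    # A core dedicated to this work would sit idle the rest of the time.
    print(f"{hours} h/week -> {share:.1%} busy, {1 - share:.1%} idle")
```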
However when we move to a 4GHz machine that requires 400 cycles to access main memory, 25 cycles to access L2 cache and 4 cycles to access L1 cache, the difference between OOP and in-order starts to fall away.
Actually, a major point of out-of-order (OoO) execution is to hide small delays. With the window sizes of modern processors, you can easily hide L1 latency, possibly hide L2 latency, and only pay severely for accesses to main memory. It's the in-order core that really loses performance as L1 and L2 take a few extra cycles.
Also, just because the processor is in order doesn't mean a memory, FP and integer instruction can't all be run in parallel, depending on how it's designed (however, they must be retired in order).
They're retired in order in out-of-order processors too... they just execute out of order. To run n instructions in parallel on an in-order processor, you need n consecutive instructions in the program that all have their dependencies met at the start of the cycle. I doubt that happens often. Plus, you said yourself L1 is 4 cycles away... that means at every memory access you can't issue any new instructions for 4 cycles. Read some assembly code: memory instructions make up a HUGE portion of the instructions. In-order execution is a big sacrifice, and you need to find a lot more parallelism to make up for its loss.
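The difference shows up even in a tiny toy scheduler. Both models below issue one instruction per cycle, and only true (read-after-write) dependences are tracked, as if registers had been renamed. The in-order core stalls whenever the next instruction in program order is waiting on an operand, and everything behind it waits too. The OoO core has an unbounded window and issues the oldest ready instruction each cycle. The 4-cycle load latency comes from the quoted figure. The instruction mix is made up.

```python
def producers(program):
    """For each instruction, the indices of earlier instructions it reads from (RAW deps)."""
    last_writer = {}
    deps = []
    for i, (dest, srcs, _lat) in enumerate(program):
        deps.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i
    return deps

def in_order_cycles(program):
    """Single issue, strictly in program order: a stalled instruction blocks all younger ones."""
    deps = producers(program)
    done = [0] * len(program)
    next_slot = 0
    for i, (_dest, _srcs, lat) in enumerate(program):
        issue = max([next_slot] + [done[j] for j in deps[i]])
        done[i] = issue + lat
        next_slot = issue + 1
    return max(done)

def out_of_order_cycles(program):
    """Single issue, unbounded window: each cycle, issue the oldest instruction whose inputs are ready."""
    deps = producers(program)
    done = [None] * len(program)
    cycle = 0
    while None in done:
        for i, (_dest, _srcs, lat) in enumerate(program):
            if done[i] is None and all(done[j] is not None and done[j] <= cycle for j in deps[i]):
                done[i] = cycle + lat
                break
        cycle += 1
    return max(done)

program = [
    ("r1", ["r0"], 4),         # load with a 4-cycle L1 hit latency
    ("r2", ["r1", "r1"], 1),   # uses the loaded value immediately
    ("r3", ["r0"], 1),         # three independent ALU ops
    ("r4", ["r0"], 1),
    ("r5", ["r0"], 1),
]
print(in_order_cycles(program), out_of_order_cycles(program))  # 8 5
```

The OoO model fills the load shadow with the three independent adds. The in-order model sits idle behind the dependent add.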
The reason that the simulation results coming from the original UWashington research on the subject - http://www.cs.washington.edu/research/smt/ - looked far better was their use of unreasonably large caches in their simulations, and that they completely ignored the OS overhead of enabling SMT - which is non-negligible - and is a thing that has been pointed out often on the Linux Kernel mailing list as well.
I didn't read most of the Princeton paper... but you're arguing that caches need to be big to get any gains, and that Intel's HT chips show SMT doesn't offer anything. The Intel chips have ridiculously small L1 caches - only 8KB. A quick sampling of Washington papers shows they simulate machines with 64-128KB L1 caches, which are entirely reasonable - all AMD processors since the Athlon have had 64KB L1 caches. Both companies are increasing L2 sizes, and 1-2MB is not unreasonable either.
I don't know anything about OS overhead, but section 2.3 of the Princeton paper argues SMP kernels (which SMT requires) are slower, and thus you pay extra overhead when using SMT vs a non-multithreaded single processor. However, they themselves don't make the same claim for multiprocessors (because you have to pay the OS overhead anyway), and with the introduction of dual core processors at the consumer level, everybody will soon be using the SMP kernels anyway. This point is [rapidly becoming, if not already] moot.
Their analysis in section 3.3 implied that the memory subsystem becomes the bottleneck in multiprocessor systems with SMT enabled, but before you take that and argue SMT offers nothing, I again point out problems with the Intel implementation: their memory bus is shared among all CPUs, so the per-CPU bandwidth drops as you add CPUs, and per-thread bandwidth is halved again. AMD's Opterons don't suffer from this problem because of their NUMA configuration, so a 2-CPU system running 2 SMT threads per CPU with an Opteron-like memory system would get the same per-thread memory bandwidth as a 2-CPU non-SMT Xeon system, while supporting twice as many threads.
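Here is that bandwidth comparison as an idealized calculation. It assumes each Opteron-style CPU's own memory controller delivers as much bandwidth as the whole shared Xeon front-side bus. It also ignores remote (cross-node) accesses and bus contention, so treat the numbers as a back-of-the-envelope illustration only.

```python
def per_thread_bandwidth(bw, cpus, threads_per_cpu, shared_bus):
    """Idealized memory bandwidth per hardware thread.

    bw: bandwidth of one memory interface (arbitrary units).
    shared_bus=True models one front-side bus shared by every CPU;
    shared_bus=False models one memory controller per CPU (NUMA).
    """
    per_cpu = bw / cpus if shared_bus else bw
    return per_cpu / threads_per_cpu

xeon_no_smt = per_thread_bandwidth(1.0, cpus=2, threads_per_cpu=1, shared_bus=True)   # 0.5
xeon_smt = per_thread_bandwidth(1.0, cpus=2, threads_per_cpu=2, shared_bus=True)      # 0.25
numa_smt = per_thread_bandwidth(1.0, cpus=2, threads_per_cpu=2, shared_bus=False)     # 0.5
print(xeon_no_smt, xeon_smt, numa_smt)
```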
I'll misquote Fred Weigel and suggest that the next problem is branching: Samba code seems to generate 5 instructions between branches, so suspending the process and running something else until the branch target is in I-cache seems like A Good Thing (;-)).
That's close to average... normal integer applications tend to be about 20% branches. That's why branch predictors are important, and why so much research goes into improving them. You can get around 95% accuracy without too much difficulty (higher with the really fancy predictors). If the branch target isn't in the I-cache, it's not a branch problem, it's a cache problem. You need either a good L1 prefetcher, a bigger L1, or code that isn't so bloated ;). (Of course, applications like OLTP tend not to fit in ANY size cache, so you're screwed no matter what, but fortunately they parallelize well, so you can just throw more slow CPUs at them.)
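Even the simplest textbook predictor gets a lot of this right. The sketch below is a bimodal predictor: a table of 2-bit saturating counters indexed by the low bits of the branch address. It isn't any particular chip's design. On a hypothetical loop branch that is taken 9 times and then falls through, it misses only the loop exit, which gives 90% accuracy. The fancier predictors mentioned above do better by also looking at branch history.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by the low bits of the branch address."""
    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        self.counters = [2] * (1 << table_bits)  # start every entry at "weakly taken"

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2  # True means "predict taken"

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

def accuracy(predictor, trace):
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)

# Hypothetical trace: one loop branch at 0x400, taken 9 times and then not taken, 10 times over.
trace = [(0x400, i % 10 != 9) for i in range(100)]
print(accuracy(BimodalPredictor(), trace))  # 0.9
```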
For simple linear devices (like resistors), power = V^2 / R, i.e. it changes with the square of the voltage. Halving the voltage quarters your power.
For CPUs, a better model is P = C*V^2*F (capacitance times voltage squared times frequency). If you halve the voltage and halve the frequency, the [dynamic] power drops by a factor of 8. Unfortunately, modern transistors leak, so you probably won't actually see that much of a drop, but the point is that underclocking even a little, together with the lower voltage it allows, can result in huge power savings.
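Here are both formulas in normalized units. The last line assumes the voltage can be lowered in proportion to the frequency, which is an idealization. Leakage is ignored entirely, as noted above.

```python
def resistive_power(v, r):
    """P = V^2 / R for a simple linear load such as a resistor."""
    return v * v / r

def dynamic_power(c, v, f):
    """P = C * V^2 * F: switching (dynamic) power only; leakage is not modeled."""
    return c * v * v * f

base = dynamic_power(1.0, 1.0, 1.0)
print(resistive_power(0.5, 1.0) / resistive_power(1.0, 1.0))  # 0.25: half the voltage, a quarter of the power
print(dynamic_power(1.0, 0.5, 0.5) / base)                    # 0.125: half V and half F
print(dynamic_power(1.0, 0.9, 0.9) / base)                    # ~0.729: a 10% underclock saves ~27%
```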
Gecko (the engine used in Firefox and Mozilla) probably won't be passing Acid2 any time soon. See Robert O'Callahan's blog entry here. This means Firefox 1.1 won't pass Acid2.
I guess nobody read the next article? "Shattered Mac illusions"...
Now, let's review: This was a brand-new machine, the system detected no problems and iPhoto hadn't been used before, but handling just less than 15,000 images made it blow up. And I thought Mac applications were generally considered to be better than Windows applications. Evidently this is not the case.
The arbitrary remote code execution hole is not exploitable in Mozilla by default. You can still steal cookies, but I personally find the cookie-stealing exploit an order of magnitude less serious, since an attacker has to know which sites I visit in order to steal my cookies for them.
On Saturday, the Mozilla Update team, plus some Mozilla devs, took steps that stopped all the published exploits we'd found from working. On Sunday, Mozilla Update was moved to an untrusted URL; as a result, users who have not added other sites to their whitelist should now be safe from the remote code execution attack.
While the hole exists in Mozilla, Mozilla by default ships with an empty whitelist, making it non-exploitable.