JollyFinn · Slashdot Mirror

Re:Since this is a dupe on Inside Intel's Next Generation Microarchitecture · 2006-04-07 09:53 · Score: 1

Well, the problem you need to fix is called physics. RC delay increases with process scaling.
Basically, in every process generation you have to reduce the length of each wire by 0.7x, or have half as many wires, to keep the wire delay the same. Since RC delay increases when wires are scaled smaller, the latency of moving data around increases all the time.
Your transistor budget may go up, but the area you can reach in one cycle at a reasonable clock speed goes down.
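A minimal sketch of where the 0.7 factor comes from, under a deliberately simple model that is an assumption, not process data: an unrepeated wire whose width and thickness both shrink by the linear factor s = 0.7 per generation, with capacitance per mm roughly unchanged. Resistance per mm then grows by 1/s^2 (about 2x), and because distributed RC delay grows with the square of length, the wire must get about 0.7x shorter to keep the same delay. The function names and the 0.38 coefficient (the usual 50%-point estimate for a distributed RC line) are illustrative.

```python
# Hypothetical first-order model: unrepeated wire, width and thickness both
# shrink by the linear factor s each generation, capacitance per mm unchanged.

def resistance_per_mm(r0: float, s: float, generations: int) -> float:
    """Wire resistance per mm after `generations` shrinks by linear factor s."""
    # R per length = rho / (width * thickness), so each shrink multiplies it by 1/s^2.
    return r0 / (s ** 2) ** generations


def wire_delay(r_per_mm: float, c_per_mm: float, length_mm: float) -> float:
    """Delay of a distributed RC line, roughly 0.38 * r * c * L^2."""
    return 0.38 * r_per_mm * c_per_mm * length_mm ** 2


def length_for_same_delay(length_mm: float, s: float, generations: int) -> float:
    """Length that keeps wire delay constant: delay ~ r * L^2 and r grows by 1/s^2."""
    return length_mm * s ** generations


s = 0.7
r_old, c = 1.0, 1.0                      # arbitrary units
r_new = resistance_per_mm(r_old, s, 1)   # about 2.04x the old value
old_delay = wire_delay(r_old, c, 1.0)                                # 1 mm wire, old process
new_delay = wire_delay(r_new, c, length_for_same_delay(1.0, s, 1))   # 0.7 mm wire, new process
print(round(r_new, 2), abs(old_delay - new_delay) < 1e-12)           # 2.04 True
```

Real designs insert repeaters and do not always thin the upper metal layers, so this overstates the per-generation penalty for long global wires; it only illustrates the direction of the trend.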

Here's a hint: even in good conditions, if you put all your resources in one core, there would be a 16-cycle distance between the furthest parts of the core. That means the branch misprediction penalty is the pipeline flush, which is probably 40 cycles or more, plus the extra 16 cycles it takes to move the information about the misprediction to the front end.
The extra buffers needed to drive the data across the core also take extra die area and power, and the communication between processing units suffers.
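The penalty arithmetic above, sketched as a tiny calculation. The 40-cycle flush and 16-cycle travel time come from the comment; the 20% branch frequency and 5% misprediction rate are hypothetical workload numbers chosen only to show how the extra cycles turn into lost throughput (extra CPI = branches per instruction x misprediction rate x penalty).

```python
def mispredict_penalty(flush_cycles: int, wire_cycles: int) -> int:
    """Cycles lost per mispredicted branch: pipeline refill plus travel time to the front end."""
    return flush_cycles + wire_cycles


def mispredict_cpi(branch_fraction: float, mispredict_rate: float, penalty: int) -> float:
    """Extra cycles per instruction = branches/instruction * misprediction rate * penalty."""
    return branch_fraction * mispredict_rate * penalty


flush_only = mispredict_penalty(40, 0)    # 40 cycles
with_wires = mispredict_penalty(40, 16)   # 56 cycles in the huge core
print(flush_only, with_wires)                                  # 40 56
# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
print(round(mispredict_cpi(0.20, 0.05, flush_only), 2),
      round(mispredict_cpi(0.20, 0.05, with_wires), 2))        # 0.4 0.56
```

Under these assumed rates, the 16 wire cycles alone add 0.16 cycles to every instruction, which a wide core has to win back with extra ILP before it breaks even.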

As for widening a processor for SMT, there was a design that did exactly that, called EV8. It would have been a great CPU, but it wouldn't have been the most efficient one for multithreaded workloads, since its core was 5x larger than an EV6 core in the same process. It was the ultimate SMT processor: it had double the resources of the EV6 core, but 5x the area. It was cancelled due to company politics, but it was still a great project.
Now here comes the rebuttal of your first fallacy: the amount of resources available per unit of die area isn't constant across CPUs of different widths. The smaller the core, the more efficiently it uses power and die area.

Scaling makes an EV8-style core *LESS* feasible in a 45 nm process than it was at its 90 nm target, since transistor delay scales by 0.7 while wire delay worsens every generation. [Scaling down the wires makes them slower.] So overall, cores have to either become smaller, or else we start DECREASING clock speed as we improve transistor density.
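A rough sketch of why wires get relatively worse, assuming the same hypothetical first-order model as before: per generation, gate delay scales by 0.7, wire resistance per mm by 1/0.7^2, capacitance per mm stays constant, and unrepeated RC delay grows as r * L^2. It compares a straight shrink of the same core with a core that keeps its physical size to spend the extra transistors. The function name and model are assumptions for illustration only.

```python
def relative_wire_cost(generations: int, s: float = 0.7, shrink_length: bool = True) -> float:
    """Delay of a core-spanning wire measured in gate delays, relative to today.

    Hypothetical first-order model per generation: gate delay scales by s,
    wire resistance per mm by 1/s^2, capacitance per mm stays constant, and
    unrepeated RC delay grows as r * L^2.
    """
    gate_delay = s ** generations
    r_per_mm = (1 / s ** 2) ** generations
    # A straight shrink of the same core has wires s times shorter each
    # generation; a core that keeps its physical size keeps its wire lengths.
    length = s ** generations if shrink_length else 1.0
    wire_delay = r_per_mm * length ** 2
    return wire_delay / gate_delay


# 90 nm -> 65 nm -> 45 nm is two generations.
print(round(relative_wire_cost(2), 2))                       # 2.04: same design, shrunk
print(round(relative_wire_cost(2, shrink_length=False), 2))  # 8.5: same die size, more transistors
```

Even a plain shrink of the same design sees its wires cost about twice as many gate delays after two generations in this model, and a core that grows to fill the same area fares far worse, which is the trade-off between smaller cores and lower clocks described above.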

You don't want to pit a 16-wide 1 GHz core against 8 cores that are 4-wide and running at 4 GHz. That's why people don't put all their transistors in a single core anymore. Read this 3-page article and you'll get my point: iacoma.cs.uiuc.edu/CS497/PIM2b.pdf
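The comparison above in peak-throughput terms, as a back-of-the-envelope sketch: peak issue rate is cores x issue width x clock. These are peak figures only; they assume enough threads to keep all 8 cores busy and ignore how rarely real code fills every issue slot.

```python
def peak_instructions_per_ns(cores: int, width: int, ghz: float) -> float:
    """Peak issue rate = cores * issue width * clock (GHz = cycles per ns)."""
    return cores * width * ghz


wide_core = peak_instructions_per_ns(cores=1, width=16, ghz=1.0)
many_cores = peak_instructions_per_ns(cores=8, width=4, ghz=4.0)
one_narrow = peak_instructions_per_ns(cores=1, width=4, ghz=4.0)
print(wide_core, many_cores, one_narrow)   # 16.0 128.0 16.0
```

Note that even a single 4-wide 4 GHz core matches the 16-wide 1 GHz core's peak, and it needs far less ILP to get there, since few programs can fill 16 slots per cycle.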

The communication latencies between cores are mostly due not to how they are connected, but to the area they consume as a whole. And you don't want the latencies that come from moving data around happening everywhere inside a core.
Also, the extra buffering and moving data across large areas consume more power than moving it within a smaller area, so smaller cores consume disproportionately less power than a large core.

Re:No Surprise... on Cleaner Air Adds To Global Warming · 2006-04-07 08:36 · Score: 1

Pollution = Greenhouse Effect
Greenhouse Effect = Increase in global temperature
Increase in temperature = More water evaporating
Water vapour = Greenhouse gas.
More clouds = Sunlight warms more of the local atmosphere instead of the ground.

Clouds have another effect: they warm the nights by blocking the earth from releasing heat to space at night. What we have seen in Finland is that at any time other than summer, cloudy weather means warmer nights and relatively warmer days than clear skies.

Greenhouse effect -> Tundra melting
Tundra melting -> More CO2
More CO2 -> Greenhouse effect

Greenhouse effect -> Sea warms up
Sea warms up -> Sea can hold less CO2
More CO2 to atmosphere -> Greenhouse effect

The thing is that after a tipping point there is no going back, unless we can somehow collect all the CO2 that we put in the atmosphere, AND what nature put there as the greenhouse effect released it.

Re:Don't agree with global warming on Cleaner Air Adds To Global Warming · 2006-04-07 07:46 · Score: 1

Yes, that is a good idea, let's start with Muslims.

Re:Damn that's a lot of Data on AT&T Forwarding All Internet Traffic to NSA? · 2006-04-07 05:23 · Score: 1

And they got your IP address from the AT&T network and stored it in a database, which they can search to see who posted this piece of information.

Lots of n^2 was changed to n in submission. on Inside Intel's Next Generation Microarchitecture · 2006-04-06 23:36 · Score: 1

The submission changed n^2 complexities to n complexities.
It's register renaming, choosing which instruction goes next, etc., that grow as n^2 when the core gets wider.
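One concrete n^2 structure is the dependency check inside a rename group: every source operand of every instruction must be compared against the destination of every older instruction renamed in the same cycle. A minimal software sketch, assuming a hypothetical (destination, sources) encoding with two sources per instruction; hardware does all comparisons in parallel, so the counter stands in for the number of comparators.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (destination register, tuple of source registers).
Instr = Tuple[int, Tuple[int, ...]]


def rename_group_deps(group: List[Instr]) -> Tuple[List[List[Optional[int]]], int]:
    """For each source operand, find the youngest older instruction in the same
    rename group that writes it (None means: read the rename table instead).

    The returned count is the number of register comparisons, which grows as O(n^2).
    """
    comparisons = 0
    deps: List[List[Optional[int]]] = []
    for i, (_, sources) in enumerate(group):
        producers: List[Optional[int]] = []
        for src in sources:
            producer: Optional[int] = None
            for j in range(i):                 # every older instruction in the group
                comparisons += 1
                if group[j][0] == src:
                    producer = j               # later matches overwrite: youngest writer wins
            producers.append(producer)
        deps.append(producers)
    return deps, comparisons


for width in (4, 8, 16):
    group = [(i, (i + 1, i + 2)) for i in range(width)]   # two sources per instruction
    print(width, rename_group_deps(group)[1])            # 4 12, 8 56, 16 240
```

With two sources per instruction the count is n(n-1), so doubling the width from 8 to 16 roughly quadruples the comparator count.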

Re:Since this is a dupe on Inside Intel's Next Generation Microarchitecture · 2006-04-06 23:33 · Score: 2, Interesting

It is only natural to extend this idea to the sharing of all resources on the chip. This is accomplished by putting them all in one big core and adding multicore functionality via symmetric multi-threading (SMT), a.k.a. hyperthreading. The secret is designing a processor for SMT from the start, not bolting it on a processor designed for single-threading as happened with the P4. I strongly believe that such a design would outperform any strict-separation multicore design with a similar transistor budget.

Too bad it doesn't work that way. Lots of structures in a CPU are n^2 in complexity, where n is the width of the processor. Also, when moving information across the die takes more than 10 cycles, you need smaller structures, because otherwise the travel time increases instruction latencies.

Here's one example: the bypass network needs to connect the load port and every integer unit to every integer unit, so there are (n*n) connections between units, and the number of stages needed to select an input eventually hampers the clock speed. There is a practical limit on core size: if we go bigger, the clock speed penalties and extra latencies will cost more performance than the added core resources gain. SMT also hurts the cache hit rate, which penalizes per-thread performance too. When you add more execution units, the maximum distance between them grows, so the time needed per cycle increases due to the delay of moving data between execution units. Execution units are *NOT* where widening hurts most, but they are still the easiest to explain. So you either use 2-cycle latencies, go for much lower clock speeds, or increase the voltage; but power consumption is proportional to V^2, so no matter what, the efficiency goes down.
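A small sketch of the two scaling effects in that paragraph. The bypass count assumes a full bypass network where every result source (each ALU plus one load port, a hypothetical default) must reach both operand inputs of every ALU. The power function is the standard first-order CMOS switching-power relation P = C * V^2 * f; the 10% voltage bump is a hypothetical example.

```python
def bypass_paths(alus: int, load_ports: int = 1, operands_per_alu: int = 2) -> int:
    """Wires in a full bypass network: every result source (each ALU and load port)
    must reach every operand input of every ALU."""
    sources = alus + load_ports
    inputs = alus * operands_per_alu
    return sources * inputs


def dynamic_power(c: float, v: float, f: float) -> float:
    """First-order CMOS switching power P = C * V^2 * f (activity folded into C)."""
    return c * v ** 2 * f


for n in (4, 8, 16):
    print(n, bypass_paths(n))   # 4 40, 8 144, 16 544

# Raising voltage 10% (hypothetical) to hold the clock costs about 21% more power.
print(round(dynamic_power(1.0, 1.1, 1.0) / dynamic_power(1.0, 1.0, 1.0), 2))  # 1.21
```

Going from 8 to 16 ALUs nearly quadruples the bypass wiring, and each of those longer wires also sits on the critical path that sets the cycle time.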

I believe SMT isn't completely dead; it could make a comeback in Intel machines at some point, with SOME additional per-core resources. But from now on, there will be multiple cores.

To make it clear: the transistor budget right now is so large that putting it all in a single core isn't efficient, due to the need to move data inside the core and the n^2 complexities.

Re:Interesting. on Into the Core - Intel's New Core CPU · 2006-04-06 22:54 · Score: 1

No, it's called bytecode.

Re:Core has OOOE? on Into the Core - Intel's New Core CPU · 2006-04-06 07:42 · Score: 1

That would be due to several "lessons learned" as Intel developed Itanium.

1. The instruction overhead due to extra hint bits, etc, means Itanium instructions are much larger than x86 32/64 instructions. With the addition of poor branch performance (read: more wasted instruction bandwidth), the need for large, high-bandwidth caches makes Itanium expensive.

No, what makes Itanium expensive is that they target a niche that only buys Itaniums priced between 2000 and 4000 USD; the lower-end Itaniums don't sell. The extra cache is cheap. The instruction cache bandwidth is cheap. It's the decoding and scheduling that's expensive.

2. The compilers have not caught up. EPIC lacks OOOE, and has poor dynamic branch prediction hardware, so it is at the mercy of the compiler.

The strength of dynamic branch prediction has nothing to do with EPIC, except that EPIC has more ways to avoid branching completely; there is no reason to settle for a weaker branch predictor than on x86. The pipeline is wider and shorter, so the branch misprediction penalty in terms of instructions is about the same.

1. x86 is hard to decode (takes more silicon), but it takes less bandwidth than other instruction formats. Bandwidth is even more expensive than the cost of more complex decoders, just look how expensive it was for Intel to add full-speed cache to the original Pentium Pro, and how pricey the Itanium is with huge, fast on-chip cache.

You got it totally wrong. The original Pentium Pro was expensive because the CORE was huge due to x86 support, and they couldn't test the core before packaging it together with the huge cache; when the core failed, they lost the silicon area of both the cache and the core. Itanium has a relatively small core compared to x86. For yield purposes, cache these days counts as at most 1/4 of its area, probably even less, because redundancy is used there. In addition, they now get 4 times as much silicon per dollar as when the Pentium Pro was made. And 60000 mm^2 of silicon costs Intel under $3000, so they sell their high-end Itaniums for over 50 times the silicon cost of Montecito (the next-gen, HUGE Itanium). Montecito will also be made in the 90 nm process that is being phased out of x86 production, so die area comparisons are not equivalent: that process has already been yielding well on x86 for over a year, while 65 nm has twice the transistor density but lower yields on big dies than the old process.
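The cost argument above as plain arithmetic, using only the figures given in these comments: 60000 mm^2 of silicon for under $3000 (treated here as an upper bound) and a roughly 600 mm^2 Montecito die (from the Vega 2 comment). It deliberately ignores yield loss, edge waste, test and packaging, all of which raise the real cost per good die and shrink the multiple.

```python
# Figures from the comments: 60000 mm^2 of silicon for under $3000, and a
# ~600 mm^2 Montecito die. Yield, test and packaging are ignored.
SILICON_AREA_MM2 = 60_000
SILICON_COST_USD = 3_000       # upper bound ("under $3000")
MONTECITO_DIE_MM2 = 600


def silicon_cost(die_area_mm2: float) -> float:
    """Raw silicon cost of one die at the quoted cost per mm^2."""
    return die_area_mm2 * SILICON_COST_USD / SILICON_AREA_MM2


def price_to_silicon(price_usd: float, die_area_mm2: float = MONTECITO_DIE_MM2) -> float:
    """How many times the raw silicon cost a given selling price represents."""
    return price_usd / silicon_cost(die_area_mm2)


print(silicon_cost(MONTECITO_DIE_MM2))                                # 30.0
print(round(price_to_silicon(2000)), round(price_to_silicon(4000)))   # 67 133
```

At the $2000 to $4000 price range mentioned earlier, the raw silicon is roughly 1/67 to 1/133 of the price, consistent with the "over 50 times" claim.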

2. OOOE + Branch Prediction + internal RISC is king. One reason the original Pentium never performed well is because it could RARELY execute more than one instruction per cycle. Thus, it performed like a fast 486 unless the code was recompiled as Pentium optimized. The P6 was designed to avoid the reliance on compilers to improve performance, as it could optimize code in any condition. Funny, we didn't start seeing Pentium-optimized code on the market until the P6 started taking over.

This doesn't really bring anything new to the table; there is a big difference between that and the Itanium situation. Nowadays there is something called Java: all Java code gets optimized for new Itaniums when the virtual machine is improved. People also recompile, since most of the time an Itanium is running either the HP-UX shipped with the machine or Linux. All Itanium generations, from the first through the current one to the next, have been and will be 6 instructions wide. The improvements have been, and will be, in the cache systems and in more functional units to allow more relaxed issue rules. Compare that to the jump from the 1-instruction-wide 486 to the 2-instruction-wide Pentium, at a time when performance-critical code was written in assembler. So basically all newer Itaniums run code optimized for older Itaniums faster, simply because the improvements were made that way. SOME improvement is left on the table without recompilation, but not huge amounts as in the Pentium's case. We don't see a transition from 6-wide to 12-wide execution, which would be the equivalent of 486 -> Pentium.

Well from the article. on How Bill Gates Works · 2006-04-06 01:15 · Score: 2, Interesting

The guy is 100% manager these days. He has some filtering that decides whether an email is read by his assistant or by him personally.
I think that's best for a guy like him. If he got all the emails sent to him, he would spend all his time on unimportant ones; now there is an assistant who checks what the filter catches, in case there is some email he should get.
He has three screens, but they aren't the 30" Dells.
He has such a huge amount of information to go through and manage that he needs a search application to keep it in order.

You need no stinking business expert! on Should the Computer Science Guy Be CEO? · 2006-04-05 18:38 · Score: 2, Interesting

Let's get this straight: here is what needs to be done in the business.
1st, you need to develop the thing.
2nd, you need to sell the thing.
3rd, you need to do some paperwork.
4th, discuss with investors if you cannot do the above without pouring more money into it.

The 2nd part happens after the most important risks of the business have already been taken. The 3rd part isn't a big deal until you start hiring people. The 4th part only matters if you plan to hire or cannot support yourself for the entire development time.
So basically, if your thing isn't ready, the business person doesn't add value to your business, and you are already getting ripped off by giving him 50% of it. And if it is ready, the business person should invest money at least equal to 5 times the salary you would have taken while developing the thing, in order to match your investment in the business.

I'd say read Eric Sink's articles, beginning here. They teach the part of the business side that geeks need to know. Basically, the business part is easy to learn if you need to know it. And a computer guy is far better at the helm of a software company than a business person, since a software person understands what's possible and what isn't, and the proper technical trade-offs.

Of course, if he can do development too and his domain expertise is needed to make the product, then it wouldn't be obvious who should get the bigger part. Oh, and in a 50%/50% deal someone ALWAYS gets ripped off, since people don't invest equal amounts of time in the business.

http://software.ericsink.com/bos/Geeks_Rule.html

Okay I realized what they did... on New 25x Data Compression? · 2006-04-05 09:41 · Score: 1

They are doing CVS-style backups. For instance, instead of storing 100 copies of the system state, you store 1 system state and 100 diffs against it, with some compression applied to the base state and the diffs. It looks like they also compress across multiple machines. So they are just applying compression at a scale and location where it isn't normally done: you normally don't compress across multiple backup generations, nor across multiple workstations. When considering 30 backups of 25 developer workstations, the dataset has so much redundancy that I'd be surprised if the compression ratio were only 25x. Here's a good one: how much do multiple backups help after that compressor? Perhaps they help if you need to get back to a specific state to undo things that happened after a certain backup. There is also the problem that if ONE set goes bad, *ALL* backups on all workstations go bad. The good news is that they probably have some redundancy, a duplicate RAID1-style system, below this compression layer. And taking tape backups of the compressed dataset every now and then would make it reasonable to have a tape backup of ALL data on 100+ workstations at the end of every day, depending on the amount of data that differs between workstations and the amount of change that happens on each workstation.
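A minimal sketch of compressing across backup generations and workstations. Note the swap: the comment describes a base state plus diffs, while this sketch uses fixed-size, content-addressed chunk deduplication, a related technique that exploits the same cross-backup redundancy. It is not the vendor's method; `ChunkStore`, the 4096-byte chunk size and the synthetic 64 KiB image are all hypothetical.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Content-addressed backup store: identical chunks are stored once, no matter
    how many backup generations or workstations contain them."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size          # hypothetical fixed chunk size
        self.chunks: Dict[str, bytes] = {}    # sha256 hex digest -> chunk bytes
        self.logical_bytes = 0                # total size of all backups as taken

    def add_backup(self, data: bytes) -> List[str]:
        """Store one backup and return its recipe (ordered list of chunk digests)."""
        recipe: List[str] = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first copy is kept, so a single corrupted chunk
            # breaks every backup on every workstation that references it.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def restore(self, recipe: List[str]) -> bytes:
        return b"".join(self.chunks[d] for d in recipe)

    def ratio(self) -> float:
        stored = sum(len(c) for c in self.chunks.values())
        return self.logical_bytes / stored if stored else 0.0


# 25 workstations x 30 nightly backups of a shared 64 KiB image plus a small unique tail.
base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2048))
store = ChunkStore()
recipes = [store.add_backup(base + f"ws{w}-day{d}".encode())
           for w in range(25) for d in range(30)]
print(round(store.ratio()))
```

With nearly identical workstations the ratio lands in the hundreds rather than 25x, and the comment in `add_backup` shows the flip side raised above: shared chunks are a single point of failure unless the layer below is redundant.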

The importance of proper exercise. on Health Problems Related to the Geek Lifestyle · 2006-04-05 08:54 · Score: 1

Exercising 45 minutes every other day decreases my need for sleep by 1 hour each day. [I don't normally use an alarm.]
The second benefit is that it makes me more alert during the daytime. The end result is that with routine exercise I can spend more time with computers than if I don't exercise, and I can concentrate more on the task at hand, since with exercise I don't feel sleepy so often before it's time to sleep.

This is more than malicious code. on The 2006 Underhanded C Contest Begins · 2006-04-05 07:22 · Score: 1

It's the basics of benchmarking. Producing a benchmark that performs a given task and whose results show one system inferior to another is REALLY easy. Too bad I don't have one of the old Mac minis to show my Athlon64 the superiority of the RISC architecture. We all know very well that RISC is 1000 times faster than the CISC dinosaur.

Finally. on The 2006 Underhanded C Contest Begins · 2006-04-05 06:30 · Score: 1

There is a good way to measure the real difference between distributions!

Compiler says on The 2006 Underhanded C Contest Begins · 2006-04-05 06:08 · Score: 1

: undefined reference to `LaunchMissles'
collect2: ld returned 1 exit status

Re:That's invalid on The 2006 Underhanded C Contest Begins · 2006-04-05 06:05 · Score: 1

I'd say it could be valid if Wine ran it MUCH FASTER than Windows ;-)

GREAT on RIAA Recommends Students Drop out of College · 2006-04-05 05:11 · Score: 1

Now we are slashdotting MIT. What will that do to the future of American engineers?

First post!!! on Slow Starters Have Higher IQ? · 2006-04-04 20:02 · Score: 1

Some people get bad answers very fast.
Some people get correct answers on time.
I get best answers too late.

How government could work for the environment. on Americans Gearing up to Fight Global Warming · 2006-04-04 06:56 · Score: 1

A 100% tax on oil. It would double the gas price, which would still leave it slightly cheaper than in Europe.
Similar tax on coal.
Grant permits for building nuclear plants.
Start building a federal railroad network, just like the federal highway network.
The federal government should handle the tracks [a natural monopoly], while private companies handle the trains.
Give private companies the chance to hand their networks over to the government, i.e. offload the expense of upgrading and maintaining the tracks to the government, which probably has more interest in doing so.

Increase the amount people can earn without paying income tax, to compensate for the tax on oil, if there is no deficit after these changes. Oh, and get rid of the SUV tax loophole.

Wow.... on Look Ma, No-Hands Fasteners! · 2006-04-02 20:46 · Score: 1

Well. This is great news. Intelligent fasteners. You know how great it would be if the cockpit filled with an airbag while all the seats came loose, including the pilot's seat? With proper timing, such a manoeuvre could result in lots of crushed meat. Think of it: all the airplane seats, with people sitting in them, falling freely until the free fall stops...

The barbieOS... on OMG BARBIE LINUX LOL!!1!!!! · 2006-04-01 20:42 · Score: 2, Interesting

Well, if it weren't reasonable to assume this is an April Fools' joke, this would be something I'd expect to happen at some point.
Most girls seem to like the small Linux games and don't like the big, large-budget Windows games, at least the ones who tried what I had on my computer. The other things they seem to do with computers are IRC, IM and web surfing, writing school essays, watching movies, and listening to music.
Movie playback may be Linux's only weakness [getting past some DRM to play the thing], but if it's stable in the sense that they don't have to do anything and it just works, then there is no reason there couldn't be such a distribution aimed at girls.

Re:Time to move... on Unmanned Aerial Drones Coming Soon Above U.S. · 2006-03-29 21:16 · Score: 1

Leave the country until it collapses or someone cleans it up.

I have one question. In 1935, if you were a dissident in Germany, where would you have fled to?

Austria
Poland
France
Hungary
Greece
Philippines
????

There are plenty of places where you shouldn't have gone at that time. But how would you recognize which place would be a "safe" place to go now? Today a safe place has no oil reserves, is not an English-speaking country nor near one, does not have a large Muslim population, etc. The United States has such military hegemony that if it went 100% totalitarian, with people somehow ACCEPTING wartime casualties, no place would be safe from that machine. And with that militarism there would be a military build-up elsewhere, and weaker nations would either join bigger nations or be invaded by them by force.

Re:We can't control our own borders... on Unmanned Aerial Drones Coming Soon Above U.S. · 2006-03-29 20:46 · Score: 1

If you meant the U.S., then I'm afraid a republic is a form of democracy.

No it's not. Democracy eliminates corruption and waste completely, and you can set the tax rate to 100%. The downside is that every aggressive unit makes TWO citizens unhappy, instead of only one under a republic.

Professional Dress Code in Oracle consulting... on Sandals and Ponytails Behind Slow Linux Adoption · 2006-03-29 03:11 · Score: 1

Here's a consulting dress code for professional Oracle Database consultants. It could teach OSS people something about how to look in front of a customer, and how important it is to look good when the customer is *NOT* seeing you.

Re:Size on 48 Core Vega 2 in the Making · 2006-03-29 02:52 · Score: 1

TFA does not mention anything about this new processor's die size. But, if we scale up the Cell processor's transistor density, the Vega processor, with 812 million transistors, would result in a die size of about 800 mm^2, which is more than one square inch. In the processor industry, that kind of die size is just plain ridiculous. I wonder what the yields are?

If those transistors are mostly cache, then the yields are pretty good. If they are logic, then more clarification is needed. First, they probably have some sort of redundancy there, probably something like a few spare cores in case one of them has a defect. As for yield, divide the area dedicated to cache by 4. What they are doing is quite different from Cell, so they might have a larger portion of the area in cache, which brings transistor density up and puts your area estimate way off. They might even have fewer logic transistors on *ALL* their cores combined than in a SINGLE Athlon core, with the rest as cache.

There is a reason why they would put in lots of cache instead of logic transistors. A simple ARM is 30k transistors, an ARM9 is 110k transistors, and a Jazelle unit is 12k transistors. That is the number of transistors it takes to execute most Java bytecode, without floating-point support, directly on hardware designed to run a RISC ISA. It could be even simpler than that; adding FPU support probably costs some of the simplicity, but I seriously doubt each core would have large execution resources trying to find ILP. Now what about scaling to 48 cores or so? At that point most of the need is memory bandwidth, and cache bandwidth on the shared cache levels. It should be obvious that increasing the L1 hit rate is important to avoid stalling on shared memory bandwidth, and increasing the L2 hit rate is important to avoid wasting memory bandwidth. The difference between Cell and Azul cores comes partly from Cell's in-core parallelism and very high clock speed target, which add more logic transistors to the Cell design, and of course from the large disparity in the number of cores. If they choose not to execute bytecode directly but to use a JIT, then VLIW or simple RISC would still be a good choice.
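A back-of-the-envelope sketch of why hit rates dominate at 48 cores: every L1 miss pulls one cache line from the shared L2, and every L2 miss pulls one line from memory. The 0.5 memory references per core per nanosecond, the 64-byte line and the hit rates are hypothetical numbers, not Vega 2 figures.

```python
def shared_bandwidth_gbs(cores: int, refs_per_ns: float, l1_hit: float,
                         l2_hit: float, line_bytes: int = 64):
    """Bandwidth (GB/s) that L1 misses demand from the shared L2, and that
    L2 misses demand from memory. One cache line moves per miss; bytes/ns == GB/s."""
    l1_misses_per_ns = cores * refs_per_ns * (1 - l1_hit)
    l2_traffic = l1_misses_per_ns * line_bytes
    memory_traffic = l1_misses_per_ns * (1 - l2_hit) * line_bytes
    return l2_traffic, memory_traffic


# Hypothetical numbers: 48 cores, 0.5 memory references per core per ns.
for l1 in (0.95, 0.975):
    l2_bw, mem_bw = shared_bandwidth_gbs(48, 0.5, l1_hit=l1, l2_hit=0.90)
    print(l1, round(l2_bw, 1), round(mem_bw, 2))   # 0.95 76.8 7.68 / 0.975 38.4 3.84
```

Halving the L1 miss rate halves the traffic on the shared levels, which is why spending transistors on cache rather than per-core logic makes sense in a design like this.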

Montecito has 1.65 billion transistors in a 90 nm process and a die size of 600 mm^2. That's a dual-core Itanium. At its transistor density, the Vega2 would be about 300 mm^2, with most of the die area spent on cache, and the yield wouldn't be bad at all.
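The density estimate above, worked out. It uses the Montecito figures as given in the comment (1.65 billion transistors, about 600 mm^2) and the 812 million Vega transistor count from the quoted post. Average density mixes cache and logic, so the result only holds if Vega 2 is similarly cache-heavy and built in a comparable process; both are assumptions.

```python
def area_at_reference_density(transistors: float, ref_transistors: float,
                              ref_area_mm2: float) -> float:
    """Die area a chip would need at a reference chip's *average* transistor density."""
    density = ref_transistors / ref_area_mm2     # transistors per mm^2
    return transistors / density


# Montecito figures as given in the comment: 1.65 billion transistors, ~600 mm^2.
vega2_mm2 = area_at_reference_density(812e6, 1.65e9, 600)
print(round(vega2_mm2))   # 295
```

That is roughly a third of the 800 mm^2 estimate derived from Cell's logic-heavy density.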


Comments · 516