The solution to the bandwidth problem is to launch more satellites. Spot beams and frequency reuse make such a system scalable.
That turns out not to be the case, for reasons which I pointed out in my original message.
Unless you have a really, really _huge_ dish on the satellite, you just can't focus a microwave beam very finely, due to diffraction effects. I'd already been assuming lots of low-orbiting satellites using spot beams when quoting my original figures. If, say, all of Manhattan fits in a spot, it doesn't help much.
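To put numbers on the diffraction argument, here is a minimal Python sketch. It uses the standard circular-aperture approximation θ ≈ 1.22 λ / D for the half-angle to the first null. The 12 GHz frequency and the dish sizes are illustrative assumptions, not figures from the original discussion. The 300 km altitude is the low-orbit figure used in the original message. The sketch ignores Earth's curvature and assumes the satellite is directly overhead.

```python
import math

SPEED_OF_LIGHT = 3.0e8  # m/s


def spot_diameter_m(freq_hz: float, dish_diameter_m: float,
                    altitude_m: float) -> float:
    """Ground spot diameter, to the first null of the diffraction pattern,
    for a circular dish directly overhead (Earth's curvature ignored)."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    half_angle = 1.22 * wavelength / dish_diameter_m  # radians
    return 2.0 * altitude_m * math.tan(half_angle)


# Illustrative values: 12 GHz downlink, 300 km altitude, several dish sizes.
for dish in (1.0, 2.0, 4.0):
    km = spot_diameter_m(12e9, dish, 300e3) / 1000
    print(f"{dish:.0f} m dish: spot ~ {km:.1f} km across")
```

With these assumed values, a 1 m dish lights up a spot roughly 18 km across. Even a 4 m dish still covers several kilometres, so a dense city still ends up sharing a handful of beams.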
Satellite 'net access is a very cool toy, and is extremely useful for areas that don't offer cable/*DSL (everywhere except large cities, in other words).
However, don't expect satellites to replace cable any time soon. There are difficulties when you try to scale up to the silly bandwidth levels required:
Area served by one satellite is large. Your satellite is at least 300km up, and making microwave beams really parallel is tricky (unless you want to use a huge dish, which adds weight and cost to the satellite). This means that, even pulling tricks like having multiple fairly-narrow-angle transceivers per satellite, you still get everyone within a few tens of kilometres sharing the same uplink. Fine for low-density areas, but not for cities.
Bandwidth is bad compared to fiber. Microwave beams have a data bandwidth comparable to their frequency - a few Gbits/sec. at most. This is the maximum _shared_ bandwidth per uplink region, and the maximum bandwidth of the pipes between the satellites. Bump the frequency up to get more bandwidth, and you start getting blocked by light cloud cover and thin walls (and ceilings). While you could do something like have an optical link from satellite to satellite, your uplink/downlink bandwidth is going to be pretty crummy compared to, say, a fiber backbone serving the same area.
I'm not trying to bash satellite data services - as mentioned above, they do have their uses. I'm just trying to stave off the inevitable flood of "Wow! Now everyone in the city can get cable bandwidth from their palm-pilots!" messages.
Some of the cool stuff some researchers are doing is integrating a laser onto a normal ASIC.... [...] Now all we need is a way of producing RAM and peripherals that keep pace with the speed....
For the RAM, at least, the answer is straightforward. Keep latency at its current range, but _heavily_ interleave RAM both on a bank level and a chip level. You now have RAM that can get 100 cache row requests and service all of them with a batch latency of 7 ns (or 5 ns or [etc]).
This would let you, say, put 8 or 16 cores on a die without worrying about cache misses slowing you down (as long as you have a deep miss buffer).
This would also be useful for transferring vast amounts of data with good locality in a known pattern (for instance, triangle or texture data) from RAM to a peripheral.
This is probably what busses will look like in a decade or two, as it's much easier to eliminate cross-talk and interference on an optical bus than on an electrical one.
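Here is a toy model of the interleaving idea, as a minimal Python sketch. It assumes three things: each bank services one request per fixed latency, banks work fully in parallel, and rows map to banks by a simple modulo. These are simplifying assumptions, not a description of any real memory controller. The 7 ns latency is the figure quoted above.

```python
from collections import Counter


def batch_latency_ns(rows, num_banks: int, bank_latency_ns: float) -> float:
    """Time to finish a batch of row requests when row r lives in bank
    r % num_banks. Each bank serves its own requests one after another;
    different banks work in parallel, so the busiest bank sets the time."""
    per_bank = Counter(row % num_banks for row in rows)
    busiest = max(per_bank.values(), default=0)
    return busiest * bank_latency_ns


batch = range(100)  # 100 consecutive cache-row requests (hypothetical)
print(batch_latency_ns(batch, num_banks=1, bank_latency_ns=7))    # 700
print(batch_latency_ns(batch, num_banks=32, bank_latency_ns=7))   # 28
print(batch_latency_ns(batch, num_banks=128, bank_latency_ns=7))  # 7
```

The same model also shows the catch. Take a stride equal to the bank count, for example `range(0, 12800, 128)` with 128 banks. Every request lands in one bank, and the batch takes 700 ns again. That makes address-to-bank mappings more elaborate than a plain modulo worth considering.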
Stupid question time - what is the maximum switching frequency of a plain old phototransistor?
Exotic technologies are very neat, but I'm wondering if more conventional technologies might already work.
If phototransistor switching speed is comparable to ordinary transistor switching speed, you could probably build an optical transceiver more cheaply by using closely packed frequency channels with bandwidth comparable to the switching speed, and a prism or diffraction grating to split them for parallel reading.
"...IBM has developed a proprietary technique to build chips using silk, a low-k dielectric material that is commercially available from the Dow Chemical Co. (DOW.N)."
Probably an acronym of SIlicon, Low-K. The approaches I've read about to date mostly involve "foaming" whatever material is used, as vacuum or whatever gas ends up in the cavities doesn't have nearly as high a dielectric constant.
Parasitic capacitance is directly related to the dielectric constant of the insulating material.
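The parallel-plate approximation C = ε0·k·A/d is enough to show the proportionality. The wire geometry and the low-k value of 2.7 below are hypothetical. The k ≈ 3.9 value is the usual figure for silicon dioxide. If wire resistance stays the same, RC delay drops by the same ratio as the capacitance.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m


def plate_capacitance_f(k: float, area_m2: float, gap_m: float) -> float:
    """Parallel-plate approximation: C = epsilon_0 * k * A / d."""
    return EPSILON_0 * k * area_m2 / gap_m


# Hypothetical neighbouring wires: 1 mm run, 0.5 um tall facing walls, 0.5 um apart.
area = 1e-3 * 0.5e-6
gap = 0.5e-6
c_oxide = plate_capacitance_f(3.9, area, gap)  # silicon dioxide, k ~ 3.9
c_low_k = plate_capacitance_f(2.7, area, gap)  # hypothetical low-k film
print(f"SiO2: {c_oxide * 1e15:.1f} fF, low-k: {c_low_k * 1e15:.1f} fF")
print(f"capacitance ratio: {c_low_k / c_oxide:.2f}")
```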
This is a wonderful accomplishment - now we can get started on the main problem.
Having a complete map of a creature's DNA tells us, in principle, all of the proteins that it can synthesize throughout its lifetime. This gives us the building blocks that the creature uses to build things, and the chemical signals that it uses to direct internal operations.
This is wonderful, and essential. To use an analogy, this is like a Victorian scientist, after years of studying a 1999 notebook computer, managing to deduce how transistors and the wires that connect them work.
He still needs to deduce a lot about capacitance, resistance, and inductance to tell how signals will propagate and influence each other, and needs to build up from scratch all of the disciplines involved in integrated circuit design before he can understand how it works, but it's a start.
Similarly, we can now move on to the next step in understanding biological creatures - trying to figure out what all of the proteins do, and how the systems built from them operate and interact with other such systems.
This would not be an easy task under the best of circumstances. It's made worse by the fact that evolution puts little value on modularity - the systems will interact with each other to such a degree that it will be difficult to even define individual systems within the chaos that is an evolved being.
I wish them luck. They have opened the door, and made available for study the vast landscape of interacting systems that we'll have to understand to truly understand how living creatures work.
For decades, we've more or less all been aware that environmental protection makes economic sense; I don't think anyone doubts that. Unfortunately, it rarely makes immediate- or short-term sense, which is why we see millions of hectares of rainforest disappearing annually. Basically, you can't rely on human beings to act in their own self-interest in anything but the short term.
As mentioned in the article, though, owners of potential pollution sources (like landfills) are liable for any pollution that they cause down the road. This gives them a strong incentive to take measures _now_ to prevent themselves from having to shell out a lot of money later.
Tax incentives also help with this, again by providing a short-term reason to do things, but can be politically difficult to implement (best way is probably just to let companies write off plantings like this as a tax expense, which reduces their after-tax cost by their marginal tax rate; at a rate near 50%, that roughly halves it).
You can never have authentication portability to untrusted networks, because, by definition, they are untrusted. OTOH, it should be possible to set up (with existing tools) a globally distributed LDAP or NIS+ tree to accomplish the reasonable part of what you're trying to do here.
There's a clunky way that I can think of to handle validation over untrusted networks. It adds complexity but seems fairly robust.
Give each person a portable computer. This computer serves as their access point to systems they come in contact with. It has hardware-assisted strong encryption, and can digitally sign anything (ideally everything) that it sends out.
This tells every system in contact with the user (direct or via an untrusted network) who the user is in an unforgeable manner (the public key distribution problem is left as an exercise to the reader). The untrusted system can't forge the signature, and can't even read the packet if it's encrypted for the intended recipient.
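Here is a deliberately toy sketch of the sign-and-verify step, using textbook RSA with Python's built-in modular arithmetic. Several choices make it unsuitable for real use: the parameters are two Mersenne primes, the SHA-256 digest is simply reduced modulo N, and there is no padding. A real device would use a vetted library and a standard signature scheme. The only point here is that verifying needs just the public values (N, E), while producing a valid signature needs the private exponent D.

```python
import hashlib

# TOY PARAMETERS ONLY: two well-known Mersenne primes. Far too small and too
# structured for real use; they just keep the arithmetic readable.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """SHA-256 of the message, reduced into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Needs the private exponent D, which never leaves the user's device."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Needs only the public values (N, E), so any system can check it."""
    return pow(signature, E, N) == digest(message)


packet = b"GET /home/foo/mail"
sig = sign(packet)
print(verify(packet, sig))                 # True
print(verify(packet, (sig + 1) % N))       # False: altered signature
print(verify(b"GET /home/bar/mail", sig))  # False: altered packet
```

Changing either the packet or the signature makes verification fail. That is the property the portable-computer scheme relies on.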
Now, you run into serious problems with this when trying to run applications on the untrusted network as opposed to just using it as a communications link. I'd be interested in hearing about anyone's ideas on letting the untrusted network run software that processes private data without being able to see the data...
Network computing, where one can log in anywhere and have one's environment and files preserved and run applications without caring where they're hosted?
Sounds a lot like any university's internal LAN. Which probably runs Unix (Solaris at my university).
Unix was already designed to address many of the issues that come up with network computing. I only see a few things that need to be fine-tuned:
Iron-clad authentication/validation, everywhere. Portability of user accounts across the entire network, checking of permissions/licenses for running applications, etc. This is already pretty much here; you just have to know what you're doing to set it up. The challenge is to make sure that all applications on your system work fine for all users, everyone can do what they're supposed to and have access to what they need to, but to make sure that nobody can do anything they shouldn't be able to.
Portability of user authentication and of services to other (untrusted) networks. If you are a validated user at one university, you would ideally be able to log into another university with guest privileges, and have it recognize you as a specific user ("foo at bar university"). Similarly, I'd like a validated user on my personal LAN to be able to access someone else's service while keeping an individual identity. Or through another network access their "home" network's services with their full privileges. The idea being that identities and permissions carry over robustly and securely over heterogeneous and possibly untrusted networks.
Support for running applications on a distributed virtual computer. This ties into the whole "the network is the computer" idea. If I could just use the collective computing power of all of my hypothetical LAN terminals to distribute tasks, I might not need a central server at all (assuming that my tasks have low communications overhead). Similarly, it would be nice to be able to farm off tasks to another "friendly" network. Protocols and support for this are in development, but would need to be standardized for true "network computing" to come into its own.
Again - by and large, these are capabilities that already exist, or exist at least in part. They continue to be developed - and chances are, that development is happening under Unix.
Hundreds of times a second? Please, we're moving to optical because of the bandwidth. Communications at the speed of light (in glass or plastic). We're talking multi-gigabits per second. And we want to switch packets at that speed with switches that can switch hundreds of times a second? Who can ponder the packet sizes required; megabytes to be sure.
That's not what this is designed for. You'd actually use it for things like this:
Routing around damaged backbone nodes. If a backbone node goes down, it's not going to go back up a microsecond later. You want to switch _all_ traffic to an alternate route, and then switch it back a few minutes or hours later when the node goes up again.
Dynamically adjusting bandwidth for backbone pipes. Think of this switch as acting something like a crossbar bus, connecting pipes point-to-point. Need more bandwidth between point a and point b? Allocate an additional pipe connecting them. Not using all bandwidth? Remove a pipe and allocate it to another pair of servers. Load patterns vary over minutes or hours, not microseconds, so this works fine.
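Working out the alternate route when a node goes down is ordinary graph search. The slow switch then just has to be reconfigured to match. Below is a minimal Python sketch of that search. It assumes a hypothetical five-node backbone and a fewest-hops criterion; a real network would more likely weight links by capacity or cost.

```python
from collections import deque


def shortest_route(links, src, dst, down=frozenset()):
    """Breadth-first search for a fewest-hops path from src to dst,
    skipping any node listed in `down`. Returns None if no path exists."""
    if src in down or dst in down:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in previous and neighbour not in down:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical backbone: A reaches D either via B, or via C and E.
backbone = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "E": ["C", "D"],
    "D": ["B", "E"],
}
print(shortest_route(backbone, "A", "D"))              # ['A', 'B', 'D']
print(shortest_route(backbone, "A", "D", down={"B"}))  # ['A', 'C', 'E', 'D']
```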
Now, a purely-optical switch that _could_ work on the microsecond or nanosecond level would be very nice; however, a slower switch is still very useful.
What I wanna know is, does this also work with red grapes? Do seeded grapes produce a different effect? And what kind of wine would result from the fermentation of said grapes? Charred-donnay?
It should work with red grapes too, or anything else conducting and about the right size and sliced so that it has a thin membrane in the middle of the "antenna".
Disclaimer: I haven't tried this.
Seeds shouldn't make much difference, though if one heats up enough to go *poof!*, that could be mildly interesting.
"Charred-donnay" is a good guess for the wine result. Mmm. Carbon.
I'd heard about the grape trick a few years ago, but the screenshots here are nice.
What happens is that the sliced, somewhat-conducting grape is just about the right size to act as a microwave antenna. Large currents are set up in the grape, which cause the (very thin) junction between the halves to go *poof*.
You could probably enhance this effect by sprinkling the open surfaces with salt to make them more conducting.
Digging through Sony's pages, I can't find anything about this in the North American page. The Japanese pages, naturally, I can't read :). If someone with knowledge of the language could confirm that there's an actual announcement there, it would be much appreciated.
Once in December of last year (by CmdrTaco), and once even earlier. It was shot down then, and I don't see why it'd fly now.
This is quite correct. Many of the claims made were shady, and while the technology itself may be feasible after a lot of engineering, the article cited here is certainly a lot more questionable.
Logic errors picked at random, because I'm too tired to cut it to ribbons thoroughly right now:
You won't magically make the layer-selection problem go away here. Previous layers will still fluoresce as your UV beam shines through them - just not as brightly. However, they will fluoresce over a larger area, conserving total luminosity. Therefore, you'd better have extremely good selectivity in your readout optics if you don't want stray light mucking things up. Depend on the previous layer bits averaging out? Bet I can find special cases that still cause problems. Summary: This is not magically superior for layering.
Using dots of light vs. pits. The problems facing multi-layer pit surfaces are exactly the same as those facing multi-layer fluorescent surfaces as described above. No better, and no worse (well, a few implementation differences in error correction, but you get the idea).
"...store data in a way that is embarrassingly similar to Thomas Edison's old gramophone records" Shady support. Wheels have been around for thousands of years. Does this mean that they are obsolete now that we have alternatives? Analogy, as well as logic, is stretched a bit thin here. Data layout is similar, readout scheme is unrelated. FMD, by coincidence, would use very similar layouts in any spinning-disc devices (I have yet to see a convincing description of how they'd make a credit-card sized solid-state device with this technology).
Short version: Technology is mildly interesting but nothing spectacular. DVD technology has the same potential; neither is much easier to implement. Article itself is vapour, heavy on hype and short on actual thought.
Just about anything can be crammed into a rackmount case, but can you BUY it that way? Otherwise you end up spending too much time and money; buy the box, buy a new case, move components from old case to new... You get the idea?!?
Back-of-the-envelope cost analysis:
Doing it yourself:
order(10) boards at order($1000) each: order($10,000).
order(10)-board rackmount enclosure at order($1000): order($1000).
Value of assembly time: order(10) hours at order($100) per hour: order($1000).
Total cost is dominated by the cost of the boards you're putting into the rack. Both the cost of the rackmount enclosure *and* the cost of fiddling with the boards to put them into the rack are irrelevant compared to that, even at a very high dollar-per-hour cost for effort.
Disclaimer: This is a Fermi estimate, not a detailed cost analysis.
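Here is the same estimate as a few lines of Python, using the order-of-magnitude inputs above. The helper `order_of` is hypothetical; it rounds to the nearest power of ten in log space. The per-item breakdown shows how little the enclosure and assembly time matter next to the boards.

```python
import math


def order_of(x: float) -> float:
    """Hypothetical helper: round x to the nearest power of ten (in log space)."""
    return 10 ** round(math.log10(x))


# Order-of-magnitude inputs from the estimate above (dollars).
costs = {
    "boards": 10 * 1000,     # ~10 boards at ~$1000 each
    "enclosure": 1000,       # ~10-board rackmount enclosure
    "assembly": 10 * 100,    # ~10 hours at ~$100/hour
}

total = sum(costs.values())
print(f"total ~ ${order_of(total):,}")
for item, cost in costs.items():
    print(f"{item:>9}: {cost / total:.0%} of total")
```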
It costs time and effort to learn what you need to know to build a system, to pick out components, and to actually put the thing together. And when you sum it all up, it's not such a small amount of time and effort. So unless you're in the business of PC hardware support and thus have to possess all this knowledge anyways, then you're getting a *lousy* deal.
It takes me five minutes looking over a parts sheet to decide what I want in an x86 system. I go down to the store, and say "build this for me". I come back a couple of days later, take it home, spend another five minutes attaching cables, and it goes (well, then there's Linux installation, but if I was feeling masochistic I could get Windows pre-loaded).
I used to build my own machines from parts as a hobby. If it's fun, it isn't "cost". I switched to paying for pre-assembly when it became less fun. Cost increase is minimal.
Current Macs use IDE, SDRAM, PCI, AGP and several dozen other acronyms.
I've never heard of Macs with AGP ports, and until going back to university I was working for a graphics driver development company. We would have been overjoyed to have AGP Macs to write drivers for.
Maybe because 350MHz is TWO YEARS OLD TECHNOLOGY in the x86 World????
Clock speed will often vary quite widely between architectures; this does not directly affect performance (look at SPARC chips, for example: similar SPECmarks to x86 chips at much lower clock rates).
Performance is based both on clock rate and on how much work is done per clock. This in turn is affected by how pipelining on the chip was set up, and many other things.
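This relationship is usually written as the CPU performance equation: time = instruction count × cycles per instruction ÷ clock rate. The two chips below are hypothetical, with made-up CPI values chosen so that the lower-clocked design wins. In a real comparison the instruction count also differs between architectures. That is one more reason to compare benchmark results rather than clock rates.

```python
def run_time_s(instructions: float, cycles_per_instruction: float,
               clock_hz: float) -> float:
    """CPU performance equation: time = instructions * CPI / clock rate."""
    return instructions * cycles_per_instruction / clock_hz


# Hypothetical chips running the same 1e9-instruction program.
high_clock = run_time_s(1e9, cycles_per_instruction=1.5, clock_hz=600e6)
low_clock = run_time_s(1e9, cycles_per_instruction=0.8, clock_hz=350e6)
print(f"600 MHz, CPI 1.5: {high_clock:.2f} s")  # 2.50 s
print(f"350 MHz, CPI 0.8: {low_clock:.2f} s")   # 2.29 s
```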
A good reference on the subject is "Computer Architecture: A Quantitative Approach", by Hennessy and Patterson (published by Morgan Kaufmann).
Um... not quite. Ever since the PPC was invented, it's always been faster than whatever Intel-based chip was on the market at the time, assuming identical clock rate. The exception was the 601, because you couldn't get equal clock rates; the minimum speed for the 601 was 60 MHz; 486's never went that fast and the Pentium didn't come out for a few months after the 601 did.
Actually, a couple of points here are not quite right (though your figures are more accurate than the previous figures quoted).
The 486 went up to 100 MHz core clock (3:1 CPU to bus ratio). These were sold as "DX4-100" chips (the "4" is the product of marketing). Actually, IIRC AMD offered a DX4-120 (their best offerings in those days ran on a 40 MHz bus). Time frame almost certainly overlaps heavily with the 601; I would have to do more research to quote date/clock frequency points for either line.
Performance-wise, I've mainly relied on SPEC benchmarks (www.spec.org). These are pretty much the canonical measures of performance for workstation CPUs (and desktop CPUs as well, which are asymptotically approaching workstation-class). By insisting on the same tests (compiled with the tester's choice of compiler) and on full disclosure of the test systems used, they are as close to vendor-neutral as we're likely to get.
x86 and PPC based machines benchmark at roughly the same speed at any given time in SPEC history. Clock frequencies aren't the same, but that's irrelevant - performance is what matters. While PPC was certainly fast, and definitely has a cleaner architecture than x86, it failed to substantially outperform x86 (and conversely, x86 failed to outperform PPC).
Where things get interesting is the G3 and G4. There has been a suspicious dearth of SPEC information from Apple in recent months/years, and a strong outpouring of questionable benchmarks quoted by their marketing departments (most bizarre was the "1.5 clocks/pixel vs. 200 clocks/pixel" filtering quote, debunked on Slashdot by a few people who provided far faster x86 code). The G3 and G4 are most certainly excellent processors, but Apple has failed to put believable numbers behind them when quoting benchmarks.
What I'd really like to see is an independent testing of SPEC marks. This would be do-able on any of the *NIX variants currently running on Gx, and would be quite straightforward on MacOS X. The problem with independent benchmarking is that Apple is best qualified to produce a compiler for the G3/G4. If a PPC based *NIX group tried it, their numbers would most likely be lower than optimal because the compiler wouldn't optimize as well as it might be able to. It would still be interesting as a data point, though.
Before anyone objects that SIMD instructions (like AltiVec and SSE) are difficult to compile to, I'd like to point out that loop unrolling optimizations take you halfway there already.
Summary: In the past, advocates of both architectures have failed to prove that their architecture trounces the other. IMO, current _meaningful_ bickering is hampered by a lack of SPECmarks.
This gives 80 clocks per pixel written to the screen
Erk. For one colour component. Add another 12x4 clocks per component for the additional components. Total is (8+12+12+12)*4 per pixel, or about 180 clocks. 80 clocks would be with SSE doing the component calculations in parallel.
This seems slow. I'm going to have to try coding this to see what I can really get. An UberCoder, as mentioned, can certainly get a factor of two or three by using superscalar execution to issue several operations at once.
Take a gigahertz X86 processor and toss 256x256 texture bilinear filtering at it, and it's gonna choke. Take a Voodoo 1 that has entire gate arrays devoted to doing nothing else *but* filtering 256x256 textures scaled to arbitrary sizes, and it'll do just fine.
While I agree that custom hardware will dramatically outperform general purpose hardware at dedicated tasks, I question a couple of your statements here.
Firstly, there is a world of difference between a custom integrated circuit and an FPGA. FPGAs have very hefty overhead. While you can build custom logic with them, the slowdown just from using an FPGA will offset the advantage from this in the vast majority of cases (not all, but most). Click on "user info" above and see my previous post on the topic for a more detailed discussion.
Secondly, a 1 GHz general purpose processor can handle texture filtering just fine. Work through the number of operations required.
Nastiest case: each pixel drawn to the screen is from a different region of the texture.
Operations needed:
- Extraction of texture coordinates.
Left as an exercise, since you seem to be focusing on filtering. In the absolute worst case, this involves four multiplications and four additions (one 2-element vector subtraction and one 2x2 vector-by-matrix multiplication to convert from screen coordinates to texel coordinates). Finding the origin vector and matrix in the first place is irrelevant, as that's done once for the whole polygon. Total operations needed: 4 fast, 4 medium (multiplication is slow compared to addition, but fast compared to division).
- Filtering of the texture. Here's a naive algorithm that still works pretty well:
Step 1: Split the texture coordinates into integer parts (which select the four texels) and fractional parts. Call the fractional parts p and q. Operations required: Two fast operations.
Step 2: Blend pixel values, for each colour component. Formula is: result = q(ap + b(1-p)) + (1-q)(cp + d(1-p)). Operations required: Extracting (1-p) and calculating it only once, this needs 4 fast operations and 6 medium operations (multiplication is slow compared to addition, but fast compared to division). You also need four texel fetches.
Total number of operations required: 6 fast, 6 medium.
Total operations for both steps: 10 fast, 10 medium.
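Here are the two filtering steps written out as a minimal Python sketch for a single colour channel. The texel names (a, b, c, d) follow the Step 2 formula with one assumed layout: d and c sit on row y0 (columns x0 and x0+1), and b and a sit on row y0+1. Edge handling is left out, so coordinates must stay at least one texel inside the texture.

```python
import math


def bilinear(texture, u, v):
    """Bilinear filtering of a single-channel texture (list of rows), following
    result = q(ap + b(1-p)) + (1-q)(cp + d(1-p)). Assumes 0 <= u < width-1
    and 0 <= v < height-1, so no edge wrapping or clamping is done."""
    x0, y0 = math.floor(u), math.floor(v)
    p, q = u - x0, v - y0                 # Step 1: fractional parts
    d = texture[y0][x0]                   # Step 2: four texel fetches...
    c = texture[y0][x0 + 1]
    b = texture[y0 + 1][x0]
    a = texture[y0 + 1][x0 + 1]
    one_minus_p = 1.0 - p                 # ...with (1-p) computed only once
    return q * (a * p + b * one_minus_p) + (1.0 - q) * (c * p + d * one_minus_p)


tex = [
    [0.0, 100.0],
    [200.0, 300.0],
]
print(bilinear(tex, 0.5, 0.5))   # 150.0, the average of the four texels
print(bilinear(tex, 0.25, 0.0))  # 25.0, a quarter of the way from 0 to 100
```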
Fast operations happen once per clock. Medium operations would issue once every 3-4 clocks without pipelining, but can also issue once per clock with pipelining. We should have enough filler instructions for bit-twiddling and loads/saves to avoid stalls. Speaking of which, quadruple the instruction count; we need to shift, mask, and convert to floating-point for each colour component of each texel (whole texel is loaded as a 32-bit word, once and only once).
Some of these instructions can be issued simultaneously thanks to superscalar execution, but there are also a handful of other instructions for loop control and so forth, so we'll call it even. Memory latency should be completely masked - we only do four fetches and one store per pixel drawn to the screen.
This gives 80 clocks per pixel written to the screen, or a fill rate of about 12 megapixels per second in the worst case using naive algorithms, for your 1 GHz processor. This gets you around 30 FPS at 640x480, assuming an overdraw of 1.3. Not beautiful, but pretty decent, given that this is non-optimized code doing bilinear filtering.
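The clock and frame-rate arithmetic can be checked with a few lines of Python, using the operation counts above and one operation per clock. The counts are 8 for coordinates and 12 per colour component, multiplied by 4 for the bit-twiddling allowance. The `components` parameter also covers the correction quoted earlier: 80 clocks only pays for one colour component.

```python
def clocks_per_pixel(components: int = 1, coord_ops: int = 8,
                     filter_ops_per_component: int = 12,
                     expansion: int = 4) -> int:
    """Coordinate ops plus per-component filter ops, times the 4x allowance
    for shifting, masking and int-to-float conversion (1 op per clock)."""
    return (coord_ops + filter_ops_per_component * components) * expansion


def frames_per_second(clock_hz: float, clocks: int, width: int = 640,
                      height: int = 480, overdraw: float = 1.3) -> float:
    fill_rate = clock_hz / clocks  # pixels per second
    return fill_rate / (width * height * overdraw)


for components in (1, 3):
    clocks = clocks_per_pixel(components)
    print(f"{components} component(s): {clocks} clocks/pixel, "
          f"{1e9 / clocks / 1e6:.1f} Mpixel/s, "
          f"{frames_per_second(1e9, clocks):.1f} FPS")
```

One component gives 80 clocks, 12.5 Mpixel/s and about 31 FPS. Three components give 176 clocks and about 14 FPS, before any of the hand-optimization or SIMD gains discussed next.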
A real game coder could easily produce a loop that does bilinear filtering in half the time that my illustration version does.
A real game coder working with SSE could produce a loop that gets a factor of three speed gain over *that*, as all of the colour components could be filtered in parallel.
To conclude, while I agree with the gist of your argument, I think you might want to re-check your numbers:).
This is a good observation. However, what is the speed of a typical FPGA these days? I find it hard to believe they'd compete with the processes Intel et al. are using. Wouldn't it be smarter to have a chip with lots of different specialized function units on it?
It turns out that there are a lot of trade-offs here.
FPGAs are indeed much slower than custom integrated circuits. There are a variety of reasons for this, which are beyond the scope of this reply. Typically the speed difference is very substantial (FPGA is typically 5x-10x slower). You also have serious density problems (FPGA has 5x-10x fewer effective gates than a custom IC). This is enough to make FPGAs slower for almost all problems. A few very specialized problems might be better solved on a large FPGA array than on a custom (or even general-purpose) IC, but most problems aren't in this category.
Where FPGAs _are_ useful is in quick prototyping and validation of new IC designs, use as "glue logic" on boards, and for processing where performance isn't critical and you aren't shipping very many product units (custom ICs are only practical in lots of 10,000+).
Now, there is the question of extending general-purpose processors to contain "lots of specialized functional units". This is useful... to an extent. It depends on what you're doing, and how many units you try to add.
Remember that a functional unit that isn't being used is dead weight - adding silicon and cost. If you have a dozen types of functional unit, each used for a few of the tasks you use your computer for, but each sitting idle most of the time - then I would argue that you have a computer that's 5x or 10x more expensive than it needs to be. While a custom functional unit is considerably faster than a general-purpose processor emulating the same operations, this speed difference isn't huge. On a good nicely-optimized superscalar processor, it might only be a factor of 2 or 3 (unless you're doing something really ugly, like emulating FP in integer or emulating quote notation numbers). So, for most cases, you'd be better off emulating the desired functions, paying 5x less, and living with a processor that was 2x slower for specialized tasks and just fine at everything else.
Exceptions exist. SSE and 3DNow are good examples. While most general-purpose operations don't benefit from SIMD floating-point operations, several common applications do (most notably games). If enough demand exists for a feature, it becomes practical to add a functional unit to handle it. You just have to be careful not to let this get out of hand.
Summary: Adding many new functional units would not be cost-effective, but adding one or two that make sense works very well. FPGAs can't compete with this, though they're useful for other things.
Re: Space travel won't solve overpopulation (on "On to Mars")
On the other hand, I think your premise is somewhat flawed. Obviously space exploration isn't the only thing we should be working on (birth control in particular makes sense to me), but it doesn't need to be the only thing.
Oh, I'm most certainly in favour of space exploration and colonization - I just don't think that it will help overpopulation, which is what the original poster was proposing.
Also, if we can get a significant portion of the population up, their children will be born in space - not on earth - to begin with. I'm looking at this from the point of view that it may take 1000 years to truly accomplish, but I don't think 'never' is a very good answer.
More people being born in space will not cause fewer people to be born on earth. If it helps, rephrase my statement as, "is it _possible_ to evacuate Earth, if Earth's birth rate substantially exceeds its death rate?".
The rate at which you remove people from the planet must be greater than the rate at which the population of the planet is increasing for the answer to be "yes". This can happen; there are just constraints on the rate of population increase for which evacuation is still possible.
Hmm. Trying back-of-the-envelope calculations:
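- Assume a doubling time of about 50 years, with exponential population growth.
- Assume a current population of 6.0e9.
- Ergo, the population t seconds from now is about 6.0e9 * exp(t / 2.0e9). (50 years is about 1.6e9 s; dividing by ln 2 gives a time constant of about 2.3e9 s, rounded here to 2.0e9 s.)
- Ergo, the rate of change of population now is about 6.0e9 / 2.0e9 = 3 new people per second.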
Best possible spacecraft is almost all cargo, and all energy goes into overcoming cargo's GPE. Escape energy from Earth's surface is about 6.0e7 J/kg. Assume 100 kg/person, including carry-on baggage, or about 6.0e9 J per person. Power required for our perfect spaceships to launch 3 people per second, balancing population growth: about 18 GW.
Ok, maybe attainable, if there are several *large* spaceports devoted to this purpose.
Now, chemical rockets. Assume that the best possible reusable chemical rocket imparts 10% of the energy it uses to its cargo. This gives about 180 GW, requiring a very large industrial infrastructure supporting the spaceports and producing vast amounts of fuel.
Actual costs will be at least an order of magnitude higher, due to manufacturing/repair costs, additional infrastructure, etc., but this might be do-able. If we start evacuating now, and devote all of our efforts to doing so. And have somewhere immediately ready to evacuate to.
From the equations, you can see how it becomes much, much easier if the rate of population growth is reduced. Reduction to no net growth has the handy side effect of eliminating the problem.
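The back-of-the-envelope chain can be reproduced in Python under the same assumptions: 100 kg per person, about 6.0e7 J/kg escape energy, and exponential growth. The `efficiency` parameter is the fraction of launch energy that ends up in the cargo: 1.0 for the perfect ship, 0.1 for the chemical-rocket case. Unlike the text, this sketch uses the unrounded time constant.

```python
import math

SECONDS_PER_YEAR = 3.156e7
ESCAPE_ENERGY_J_PER_KG = 6.0e7  # roughly v_escape^2 / 2, rounded as in the text


def evacuation_power_w(population, doubling_time_years,
                       kg_per_person=100.0, efficiency=1.0):
    """Power needed to launch people as fast as an exponentially growing
    population adds them."""
    time_constant_s = doubling_time_years * SECONDS_PER_YEAR / math.log(2)
    people_per_second = population / time_constant_s
    return people_per_second * kg_per_person * ESCAPE_ENERGY_J_PER_KG / efficiency


for label, eff in [("perfect ship", 1.0), ("10% chemical", 0.1)]:
    watts = evacuation_power_w(6.0e9, 50, efficiency=eff)
    print(f"{label}: {watts / 1e9:.0f} GW")
```

The unrounded time constant is about 2.3e9 s, which gives roughly 2.6 people per second. That comes to about 16 GW for the perfect ship and about 160 GW for chemical rockets. The power is proportional to population divided by doubling time, so doubling the doubling time halves the power required.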
The solution to the bandwidth probelm is to launch more satellites. Spot beams and frequency reuse make such a system scalable.
That turns out not to be the case, for reasons which I pointed out in my original message.
Unless you have a really, really _huge_ dish on the satellite, you just can't focus a microwave beam very finely, due to diffraction effects. I'd already been assuming lots of low-orbiting satellites using spot beams when quoting my original figures. If, say, all of Manhattan fits in a spot, it doesn't help much.
However, don't expect satellites to replace cable any time soon. There are difficulties when you try to scale up to the silly bandwidth levels required:
Your satellite is at least 300km up, and making microwave beams really parallel is tricky (unless you want to use a huge dish, which adds weight and cost to the satellite). This means that, even pulling tricks like having multiple fairly-narrow-angle transcievers per satellite, you still get everyone within a few tens of kilometres sharing the same uplink. Fine for low-density areas, but not for cities.
Microwave beams have a data bandwidth comparable to their frequency - a few Gbits/sec. at most. This is the maximum _shared_ bandwidth per uplink region, and the maximum bandwidth of the pipes between the satellites. Bump the frequency up to get more bandwidth, and you start getting blocked by light cloud cover and thin walls (and ceilings). While you could do something like have an optical link from satellite to satellite, your uplink/downlink bandwidth is going to be pretty crummy compared to, say, a fiber backbone serving the same area.
I'm not trying to bash satellite data services - as mentioned above, they do have their uses. I'm just trying to stave off the inevitable flood of "Wow! Now everyone in the city can get cable bandwidth from their palm-pilots!" messages.
Some of the cool stuff some researchers are doing is integrating a laser onto a normal ASIC....
[...]
Now all we need is a way of producing RAM and peripherals that keep match with the speed....
For the RAM, at least, the answer is straightforward. Keep latency at its current range, but _heavily_ interleave RAM both on a bank level and a chip level. You now have RAM that can get 100 cache row requests and service all of them with a batch latency of 7 ns (or 5 ns or [etc]).
This would let you, say, put 8 or 16 cores on a die without worrying about cache misses slowing you down (as long as you have a deep miss buffer).
This would also be useful for transferring vast amounts of data with good locality in a known pattern (for instance, triangle or texture data) from RAM to a peripheral.
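A toy model of the interleaving idea, as a Python sketch. The 64-byte line size, the bank counts, and the rule that each bank services one request per 7 ns while all banks run in parallel are assumptions; real memory controllers also have to deal with bus transfer time, refresh, and row-buffer conflicts, which this ignores.

```python
from collections import Counter

LINE_BYTES = 64  # assumed cache-line (row request) size in bytes

def bank_of(addr: int, num_banks: int) -> int:
    """Low-order interleaving: consecutive cache lines land in consecutive banks."""
    return (addr // LINE_BYTES) % num_banks

def batch_time_ns(addrs, num_banks: int, access_ns: float = 7.0) -> float:
    """Banks work in parallel and each handles its own queue one request at a time,
    so the batch finishes when the busiest bank does."""
    per_bank = Counter(bank_of(a, num_banks) for a in addrs)
    return max(per_bank.values()) * access_ns

requests = [i * LINE_BYTES for i in range(100)]  # 100 consecutive cache lines
print(batch_time_ns(requests, num_banks=128))     # one request per bank
print(batch_time_ns(requests, num_banks=8))       # up to 13 requests queue on one bank
```

With 128 banks the whole batch of 100 requests finishes in a single 7 ns access time; with only 8 banks the busiest bank has 13 requests queued and the batch takes 91 ns. A stride that maps every request to the same bank would serialize the batch completely.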
This is probably what busses will look like in a decade or two, as it's much easier to eliminate cross-talk and interference on an optical bus than on an electrical one.
Stupid question time - what is the maximum switching frequency of a plain old phototransistor?
Exotic technologies are very neat, but I'm wondering if more conventional technologies might already work.
If phototransistor switching speed is comparable to ordinary transistor switching speed, you could probably build an optical transceiver more cheaply by using closely packed frequency channels with bandwidth comparable to the switching speed, and a prism or diffraction grating to split them for parallel reading.
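A rough capacity estimate for the wavelength-channel idea, as a Python sketch. The band (1530 to 1565 nm, roughly the telecom C-band), one bit per switching period, a channel spacing of twice the per-channel rate, and an ideal prism or grating are all assumptions.

```python
C = 299_792_458.0  # speed of light, m/s

def band_width_hz(short_nm: float, long_nm: float) -> float:
    """Optical bandwidth between two wavelengths (f = c / lambda)."""
    return C / (short_nm * 1e-9) - C / (long_nm * 1e-9)

def wdm_capacity(band_hz: float, per_channel_bps: float, spacing_factor: float = 2.0):
    """Number of channels that fit in the band, and their combined bit rate.
    spacing_factor is an assumed guard ratio between channel spacing and bit rate."""
    spacing_hz = per_channel_bps * spacing_factor
    channels = int(band_hz // spacing_hz)
    return channels, channels * per_channel_bps

band = band_width_hz(1530, 1565)
for rate in (1e8, 1e9):  # hypothetical phototransistor speeds: 100 MHz and 1 GHz
    n, total = wdm_capacity(band, rate)
    print(f"{rate / 1e6:6.0f} Mbit/s per channel -> {n} channels, {total / 1e12:.2f} Tbit/s total")
```

With 1 GHz detectors this gives a couple of thousand channels and an aggregate of roughly 2 Tbit/s. How closely the channels can actually be packed would depend on how finely the grating separates wavelengths, which this sketch does not model.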
From http://www.dow.com/dow_news/corporate/20000403a.html:
The Dow Chemical Company is supplying IBM with SiLK* semiconductor dielectric resin
This is an artificial polymer with a low dielectric constant. Not the silk used for cloth.
The article also gives a moderately technical description of why a low-k dielectric is a Good Thing.
I gotta admit, I haven't read the article yet, but plain jane silk certainly isn't the most durable substance on earth.
Read the article. This is a low-K dielectric with the trade name "SILK" (probably an acronym).
"...IBM has developed a proprietary technique to build chips using silk, a low-k dielectric material that is commercially available from the Dow Chemical Co. (DOW.N)."
Probably an acronym of SIlicon, Low-K.
The approaches I've read about to date mostly involve "foaming" whatever material is used, as vacuum or whatever gas ends up in the cavities doesn't have nearly as high a dielectric constant.
Parasitic capacitance is directly proportional to the dielectric constant of the insulating material.
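The proportionality follows from the parallel-plate formula C = k·ε0·A/d. A short Python sketch, assuming two neighbouring wires modelled as plates (1 mm long, 0.5 µm tall, 0.5 µm apart, fringing ignored), k = 3.9 for silicon dioxide, and a hypothetical k = 2.7 for a low-k polymer. The foaming estimate is a crude volume-weighted average of solid and void, which is an upper bound rather than a real porous-film model.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def plate_capacitance_f(k: float, area_m2: float, gap_m: float) -> float:
    """Parallel-plate capacitance: proportional to the dielectric constant k."""
    return k * EPS0 * area_m2 / gap_m

def foamed_k(k_solid: float, porosity: float) -> float:
    """Crude upper-bound estimate for a foamed dielectric: volume-weighted
    average of the solid and of the voids (k = 1 for vacuum or air)."""
    return (1 - porosity) * k_solid + porosity * 1.0

AREA = 1e-3 * 0.5e-6  # wire length x height, m^2 (assumed)
GAP = 0.5e-6          # wire spacing, m (assumed)

for name, k in (("SiO2", 3.9), ("low-k polymer (assumed)", 2.7),
                ("same polymer, 40% voids", foamed_k(2.7, 0.4))):
    c_ff = plate_capacitance_f(k, AREA, GAP) * 1e15
    print(f"{name:>24}: k = {k:.2f}, C = {c_ff:.1f} fF")
```

Dropping k from 3.9 to 2.7 cuts the coupling capacitance of the same wire pair by about 30%, and the RC delay along with it.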
This is a wonderful accomplishment - now we can get started on the main problem.
Having a complete map of a creature's DNA tells us, in principle, all of the proteins that it can synthesize throughout its lifetime. This gives us the building blocks that the creature uses to build things, and the chemical signals that it uses to direct internal operations.
This is wonderful, and essential. To use an analogy, this is like a Victorian scientist, after years of studying a 1999 notebook computer, managing to deduce how transistors and the wires that connect them work.
He still needs to deduce a lot about capacitance, resistance, and inductance to tell how signals will propagate and influence each other, and needs to build up from scratch all of the disciplines involved in integrated circuit design before he can understand how it works, but it's a start.
Similarly, we can now move on to the next step in understanding biological creatures - trying to figure out what all of the proteins do, and how the systems built from them operate and interact with other such systems.
This would not be an easy task under the best of circumstances. It's made worse by the fact that evolution puts little value on modularity - the systems will interact with each other to such a degree that it will be difficult to even define individual systems within the chaos that is an evolved being.
I wish them luck. They have opened the door, and made available for study the vast landscape of interacting systems that we'll have to understand to truly understand how living creatures work.
For decades, we've more or less all been aware that environmental protection makes economic sense; I don't think anyone doubts that. Unfortunately, it rarely makes immediate- or short-term sense, which is why we see millions of hectares of rainforest disappearing annually. Basically, you can't rely on human beings to act in their own self-interest in anything but the short term.
As mentioned in the article, though, owners of potential pollution sources (like landfills) are liable for any pollution that they cause down the road. This gives them a strong incentive to take measures _now_ to prevent themselves from having to shell out a lot of money later.
Tax incentives also help with this, again by providing a short-term reason to do things, but they can be politically difficult to implement (the best way is probably just to let companies write off plantings like this as a tax expense, which at a roughly 50% tax rate halves their after-tax cost).
You can never have authentication portability to untrusted networks, because, by definition, they are untrusted. OTOH, it should be possible to set up (with existing tools) a globally distributed LDAP or NIS+ tree to accomplish the reasonable part of what you're trying to do here.
There's a clunky way that I can think of to handle validation over untrusted networks. It adds complexity but seems fairly robust.
Give each person a portable computer. This computer serves as their access point to systems they come in contact with. It has hardware-assisted strong encryption, and can digitally sign anything (ideally everything) that it sends out.
This tells every system in contact with the user (direct or via an untrusted network) who the user is in an unforgeable manner (the public key distribution problem is left as an exercise to the reader). The untrusted system can't forge the signature, and can't even read the packet if it's encrypted for the intended recipient.
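The sign-and-verify part can be illustrated with textbook RSA over a SHA-256 digest, in Python. This is a toy under loud assumptions: the key is built from two known Mersenne primes and is far too small to be secure, and there is no padding scheme; a real portable signing device would use a vetted library and a standard scheme such as RSA-PSS or Ed25519. The point is only that the untrusted network sees the signature but cannot produce one for a different message without the private exponent.

```python
import hashlib

# Toy key from two known Mersenne primes. INSECURE: illustration only.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)

def digest(message: bytes) -> int:
    """SHA-256 of the message, reduced into the RSA range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Only the holder of D (the user's portable computer) can compute this."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)

msg = b"user foo at bar university requests access"
sig = sign(msg)
print(verify(msg, sig), verify(b"user mallory requests access", sig))
```

Changing the message, or any bit of the signature, makes verification fail.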
Now, you run into serious problems with this when trying to run applications on the untrusted network as opposed to just using it as a communications link. I'd be interested in hearing about anyone's ideas on letting the untrusted network run software that processes private data without being able to see the data...
Sounds a lot like any university's internal LAN. Which probably runs Unix (Solaris at my university).
Unix was already designed to address many of the issues that come up with network computing. I only see a few things that need to be fine-tuned:
Portability of user accounts across the entire network, checking of permissions/licenses for running applications, etc. This is already pretty much here; you just have to know what you're doing to set it up. The challenge is to make sure that all applications on your system work for all users, and that everyone can do what they're supposed to and access what they need, while nobody can do anything they shouldn't be able to.
If you are a validated user at one university, you would ideally be able to log into another university with guest privileges, and have it recognize you as a specific user ("foo at bar university"). Similarly, I'd like a validated user on my personal LAN to be able to access someone else's service while keeping an individual identity. Or through another network access their "home" network's services with their full privileges. The idea being that identities and permissions carry over robustly and securely over heterogeneous and possibly untrusted networks.
This ties into the whole "the network is the computer" idea. If I could just use the collective computing power of all of my hypothetical LAN terminals to distribute tasks, I might not need a central server at all (assuming that my tasks have low communications overhead). Similarly, it would be nice to be able to farm off tasks to another "friendly" network. Protocols and support for this are in development, but would need to be standardized for true "network computing" to come into its own.
Again - by and large, these are capabilities that already exist, or exist at least in part. They continue to be developed - and chances are, that development is happening under Unix.
That's not what this is designed for. You'd actually use it for things like this:
If a backbone node goes down, it's not going to go back up a microsecond later. You want to switch _all_ traffic to an alternate route, and then switch it back a few minutes or hours later when the node goes up again.
Think of this switch as acting something like a crossbar bus, connecting pipes point-to-point. Need more bandwidth between point a and point b? Allocate an additional pipe connecting them. Not using all bandwidth? Remove a pipe and allocate it to another pair of servers. Load patterns vary over minutes or hours, not microseconds, so this works fine.
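A minimal Python model of that allocation scheme, assuming a fixed pool of pipes that are patched between node pairs and re-patched on a timescale of minutes. The class and method names are hypothetical; the model only tracks how many pipes each node pair holds, and reclaims every pipe touching a node when that node fails so the pipes can be handed to alternate routes.

```python
from collections import Counter

class PipeCrossbar:
    """Hypothetical model of a slow optical crossbar: a fixed pool of pipes,
    each patched point-to-point between two nodes."""

    def __init__(self, total_pipes: int):
        self.free = total_pipes
        self.links = Counter()  # frozenset({a, b}) -> number of pipes allocated

    def allocate(self, a: str, b: str, n: int = 1) -> None:
        """Add n pipes between a and b (more bandwidth for that pair)."""
        if n > self.free:
            raise RuntimeError("not enough free pipes")
        self.free -= n
        self.links[frozenset((a, b))] += n

    def release(self, a: str, b: str, n: int = 1) -> None:
        """Return n pipes from the a-b pair to the free pool."""
        key = frozenset((a, b))
        if self.links[key] < n:
            raise ValueError("releasing more pipes than allocated")
        self.links[key] -= n
        self.free += n

    def fail_node(self, node: str) -> int:
        """Reclaim every pipe touching a failed node; returns how many were freed."""
        freed = 0
        for key in [k for k in self.links if node in k]:
            freed += self.links.pop(key)
        self.free += freed
        return freed

xbar = PipeCrossbar(total_pipes=8)
xbar.allocate("A", "B", 3)
xbar.allocate("B", "C", 2)
xbar.allocate("A", "C", 1)
print(xbar.fail_node("B"), xbar.free)  # pipes freed by B failing, pipes now free
xbar.allocate("A", "C", 3)             # re-patch freed capacity onto the alternate route
```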
Now, a purely-optical switch that _could_ work on the microsecond or nanosecond level would be very nice; however, a slower switch is still very useful.
What I wanna know is, does this also work with red grapes? Do seeded grapes produce a different effect? and What kind of wine would result from the fermentation of said grapes? Charred-donnay?
It should work with red grapes too, or anything else conducting and about the right size and sliced so that it has a thin membrane in the middle of the "antenna".
Disclaimer: I haven't tried this.
Seeds shouldn't make much difference, though if one heats up enough to go *poof!*, that could be mildly interesting.
"Charred-donnay" is a good guess for the wine result. Mmm. Carbon.
I'd heard about the grape trick a few years ago, but the screenshots here are nice.
What happens is that the sliced, somewhat-conducting grape is just about the right size to act as a microwave antenna. Large currents are set up in the grape, which cause the (very thin) junction between the halves to go *poof*.
You could probably enhance this effect by sprinkling the open surfaces with salt to make them more conducting.
Wasn't there just an article about the Playstation 2 being banned for export from Japan?
o rt/index.html
:). If someone with knowledge of the language could confirm that there's an actual announcement there, it would be much appreciated.
This is correct. Tom's Hardware had a link to this article about it:
http://headline.gamespot.com/news/00_03/01_vg_imp
Digging through Sony's pages, I can't find anything about this in the North American page. The Japanese pages, naturally, I can't read.
URL for the Japanese playstation pages is:
http://www.scei.co.jp/index-n.html
This is quite correct. Many of the claims made were shady, and while the technology itself may be feasible after a lot of engineering, the article cited here is certainly a lot more questionable.
Logic errors picked at random, because I'm too tired to cut it to ribbons thoroughly right now:
Previous layers will still fluoresce as your UV beam shines through them - just not as brightly. However, they will fluoresce over a larger area, conserving total luminosity. Therefore, you'd better have extremely good selectivity in your readout optics if you don't want stray light mucking things up. Depend on the previous layer bits averaging out? Bet I can find special cases that still cause problems. Summary: This is not magically superior for layering.
The problems facing multi-layer pit surfaces are exactly the same as those facing multi-layer fluorescent surfaces as described above. No better, and no worse (well, a few implementation differences in error correction, but you get the idea).
Shady support. Wheels have been around for thousands of years. Does this mean that they are obsolete now that we have alternatives?
Analogy, as well as logic, is stretched a bit thin here. Data layout is similar, readout scheme is unrelated. FMD, by coincidence, would use very similar layouts in any spinning-disc devices (I have yet to see a convincing description of how they'd make a credit-card sized solid-state device with this technology).
Short version: Technology is mildly interesting but nothing spectacular. DVD technology has the same potential; neither is much easier to implement. Article itself is vapour, heavy on hype and short on actual thought.
Back-of-the-envelope cost analysis:
Doing it yourself:
Total cost is dominated by the cost of the boards you're putting into the rack. Both the cost of the rackmount enclosure *and* the cost of fiddling with the boards to put them into the rack are irrelevant compared to that, even at a very high dollar-per-hour cost for effort.
Disclaimer: This is a Fermi estimate, not a detailed cost analysis.
It costs time and effort to learn what you need to know to build a system, to pick out components, and to actually put the thing together. And when you sum it all up, it's not such a small amount of time and effort. So unless you're in the business of PC hardware support and thus have to possess all this knowledge anyways, then you're getting a *lousy* deal.
It takes me five minutes looking over a parts sheet to decide what I want in an x86 system. I go down to the store, and say "build this for me". I come back a couple of days later, take it home, spend another five minutes attaching cables, and it goes (well, then there's Linux installation, but if I was feeling masochistic I could get Windows pre-loaded).
I used to build my own machines from parts as a hobby. If it's fun, it isn't "cost". I switched to paying for pre-assembly when it became less fun. Cost increase is minimal.
Current Macs use IDE, SDRAM, PCI, AGP and several dozen other acronyms.
I've never heard of Macs with AGP ports, and until going back to university I was working for a graphics driver development company. We would have been overjoyed to have AGP Macs to write drivers for.
Maybe because 350MHz is TWO YEARS OLD TECHNOLOGY in the x86 World????
Clock speed will often vary quite widely between architectures, and clock speed alone does not determine performance (look at SPARC chips, for example: similar SPECmarks to x86 chips at much lower clock rates).
Performance is based both on clock rate and on how much work is done per clock. This in turn is affected by how pipelining on the chip was set up, and many other things.
A good reference on the subject is "Computer Architecture: A Quantitative Approach", by Hennessy and Patterson (published by Morgan Kaufmann).
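Hennessy and Patterson write this relationship as the CPU performance equation: CPU time = instruction count × cycles per instruction (CPI) / clock rate. A Python sketch with two hypothetical chips shows how a lower clock rate can be fully offset by doing more work per clock. The clock rates and CPI values are made up for illustration, and the instruction count is held equal even though in practice it also differs between instruction sets.

```python
def cpu_time_s(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count * cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz

WORKLOAD = 1e9  # instructions in some benchmark (hypothetical)

# Hypothetical chips: a slow clock with good pipelining (low CPI),
# and a fast clock with more cycles per instruction.
chips = {"slow clock, low CPI": (400e6, 0.8), "fast clock, high CPI": (700e6, 1.4)}
for name, (clock_hz, cpi) in chips.items():
    print(f"{name}: {cpu_time_s(WORKLOAD, cpi, clock_hz):.2f} s")
```

Both finish the workload in 2 seconds despite a 75% difference in clock rate.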
Um... not quite. Ever since the PPC was invented, it's always been faster than whatever Intel-based chip was on the market at the time, assuming identical clock rates. The exception was the 601, because you couldn't get equal clock rates; the minimum speed for the 601 was 60 MHz; 486's never went that fast and the Pentium didn't come out for a few months after the 601 did.
Actually, a couple of points here are not quite right (though your figures are more accurate than the previous figures quoted).
The 486 went up to 100 MHz core clock (3:1 CPU to bus ratio). These were sold as "DX4-100" chips (the "4" is the product of marketing). Actually, IIRC AMD offered a DX4-120 (their best offerings in those days ran on a 40 MHz bus). Time frame almost certainly overlaps heavily with the 601; I would have to do more research to quote date/clock frequency points for either line.
Performance-wise, I've mainly relied on SPEC benchmarks (www.spec.org). These are pretty much the canonical measures of performance for real CPUs (and desktop CPUs as well, which are asymptotically approaching workstation-class). By insisting on the same tests (compiled with the tester's choice of compiler) and on full disclosure of the test systems used, they are as close to vendor-neutral as we're likely to get.
x86 and PPC based machines benchmark at roughly the same speed at any given time in SPEC history. Clock frequencies aren't the same, but that's irrelevant - performance is what matters. While PPC was certainly fast, and definitely has a cleaner architecture than x86, it failed to substantially outperform x86 (and conversely, x86 failed to outperform PPC).
Where things get interesting is the G3 and G4. There has been a suspicious dearth of SPEC information from Apple in recent months/years, and a strong outpouring of questionable benchmarks quoted by their marketing departments (most bizarre was the "1.5 clocks/pixel vs. 200 clocks/pixel" filtering quote, debunked on Slashdot by a few people who provided far faster x86 code). The G3 and G4 are most certainly excellent processors, but Apple has failed to put believable numbers behind them when quoting benchmarks.
What I'd really like to see is an independent testing of SPEC marks. This would be do-able on any of the *NIX variants currently running on Gx, and would be quite straightforward on MacOS X. The problem with independent benchmarking is that Apple is best qualified to produce a compiler for the G3/G4. If a PPC based *NIX group tried it, their numbers would most likely be lower than optimal because the compiler wouldn't optimize as well as it might be able to. It would still be interesting as a data point, though.
Before anyone objects that SIMD instructions (like AltiVec and SSE) are difficult to compile to, I'd like to point out that loop unrolling optimizations take you halfway there already.
Summary: In the past, advocates of both architectures have failed to prove that their architecture trounces the other. IMO, current _meaningful_ bickering is hampered by a lack of SPECmarks.
This gives 80 clocks per pixel written to the screen
Erk. For one colour component.
Add another 12x4 clocks per component for the additional components. Total is (8+12+12+12)*4 = 176 clocks per pixel, or about 180. 80 clocks would be with SSE doing the component calculations in parallel.
This seems slow. I'm going to have to try coding this to see what I can really get. An UberCoder, as mentioned, can certainly get a factor of two or three by using superscalar execution to issue several operations at once.
Take a gigahertz X86 processor and toss 256x256 texture bilinear filtering at it, and it's gonna choke. Take a Voodoo 1 that has entire gate arrays devoted to doing nothing else *but* filtering 256x256 textures scaled to arbitrary sizes, and it'll do just fine.
:).
While I agree that custom hardware will dramatically outperform general purpose hardware at dedicated tasks, I question a couple of your statements here.
Firstly, there is a world of difference between a custom integrated circuit and an FPGA. FPGAs have very hefty overhead. While you can build custom logic with them, the slowdown just from using an FPGA will offset the advantage from this in the vast majority of cases (not all, but most). Click on "user info" above and see my previous post on the topic for a more detailed discussion.
Secondly, a 1 GHz general purpose processor can handle texture filtering just fine. Work through the number of operations required.
Nastiest case: each pixel drawn to the screen is from a different region of the texture.
Operations needed:
- Extraction of texture coordinates.
Left as an exercise, since you seem to be focusing on filtering. In the absolute worst case, this involves four multiplications and four additions (one 2-element vector subtraction and one 2x2 vector-by-matrix multiplication to convert from screen coordinates to texel coordinates). Finding the origin vector and matrix in the first place are irrelevant, as that's done once for the whole polygon.
Total operations needed: 4 fast, 4 medium (multiplication is slow compared to addition, but fast compared to division).
- Filtering of the texture.
Here's a naive algorithm that still works pretty well:
Step 1: Split the texture coordinates into integer and fractional parts. The integer parts select the four texels to fetch; call the fractional parts p and q.
Operations required: Two fast operations.
Step 2: Blend pixel values, for each colour component. Formula is:
result = q(ap + b(1-p)) + (1-q)(cp + d(1-p))
Operations required: Computing (1-p) only once, this needs 4 fast operations and 6 medium operations. You also need four texel fetches.
Total number of operations required: 6 fast, 6 medium.
Total operations for both steps: 10 fast, 10 medium.
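The two stages can be written out in Python for a single colour component. This is a sketch, not optimized code: the texture is assumed to be a list of rows of floats, the origin vector and 2x2 matrix are assumed to have been computed once per polygon, and there is no clamping or wrapping at the texture edges. The texel labels follow the formula above: d is the texel at the truncated coordinates, c is one texel along in x, b is one row further in y, and a is diagonally opposite d.

```python
import math

def screen_to_texel(sx, sy, origin, m):
    """Texture-coordinate extraction: one 2-element vector subtraction and one
    2x2 matrix-vector multiply (4 multiplies, 4 additions/subtractions)."""
    dx, dy = sx - origin[0], sy - origin[1]
    return m[0][0] * dx + m[0][1] * dy, m[1][0] * dx + m[1][1] * dy

def bilinear(tex, u, v):
    """Bilinear filter using result = q(ap + b(1-p)) + (1-q)(cp + d(1-p))."""
    x0, y0 = math.floor(u), math.floor(v)
    p, q = u - x0, v - y0                          # fractional parts
    d, c = tex[y0][x0], tex[y0][x0 + 1]            # row y0
    b, a = tex[y0 + 1][x0], tex[y0 + 1][x0 + 1]    # row y0 + 1
    one_minus_p = 1 - p                            # computed once
    return q * (a * p + b * one_minus_p) + (1 - q) * (c * p + d * one_minus_p)

texture = [[0.0, 10.0],
           [20.0, 30.0]]
u, v = screen_to_texel(101, 51, origin=(100, 50), m=((0.5, 0.0), (0.0, 0.5)))
print(u, v, bilinear(texture, u, v))
```

Here the sample lands in the middle of the four texels, so the result is their average, 15.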
Fast operations happen once per clock. Medium operations would take 3-4 clocks each without pipelining, but with pipelining they can also issue once per clock. We should have enough filler instructions for bit-twiddling and loads/saves to avoid stalls. Speaking of which, quadruple the instruction count: we need to shift, mask, and convert to floating-point for each colour component of each texel (the whole texel is loaded as a 32-bit word, once and only once).
Some of these instructions can be issued simultaneously thanks to superscalar execution, but there are also a handful of other instructions for loop control and so forth, so we'll call it even. Memory latency should be completely masked - we only do four fetches and one store per pixel drawn to the screen.
This gives 80 clocks per pixel written to the screen, or a fill rate of about 12 megapixels per second in the worst case using naive algorithms, for your 1 GHz processor. This gets you around 30 FPS at 640x480, assuming an overdraw of 1.3. Not beautiful, but pretty decent, given that this is non-optimized code doing bilinear filtering.
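The fill-rate arithmetic in Python, so the clock budget can be varied. The 1 GHz clock, 640x480 resolution, and 1.3 overdraw come from the text; the second case uses the (8+12+12+12)*4 = 176 clocks per pixel figure for three colour components from the follow-up correction above.

```python
def fill_rate_and_fps(clock_hz, clocks_per_pixel, width, height, overdraw):
    """Pixels per second the CPU can filter, and the resulting frame rate."""
    pixels_per_s = clock_hz / clocks_per_pixel
    pixels_per_frame = width * height * overdraw
    return pixels_per_s, pixels_per_s / pixels_per_frame

for clocks in (80, 176):
    rate, fps = fill_rate_and_fps(1e9, clocks, 640, 480, 1.3)
    print(f"{clocks:3d} clocks/pixel: {rate / 1e6:5.1f} Mpixel/s, {fps:4.1f} FPS")
```

80 clocks gives 12.5 Mpixel/s and about 31 FPS, matching the estimate above; 176 clocks drops that to about 5.7 Mpixel/s and 14 FPS.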
A real game coder could easily produce a loop that does bilinear filtering in half the time that my illustration version does.
A real game coder working with SSE could produce a loop that gets a factor of three speed gain over *that*, as all of the colour components could be filtered in parallel.
To conclude, while I agree with the gist of your argument, I think you might want to re-check your numbers.
This is a good observation. However, what is the speed of a typical FPGA these days? I find it hard to believe they'd compete with the processes Intel et al. are using. Wouldn't it be smarter to have a chip with lots of different specialized function units on it?
It turns out that there are a lot of trade-offs here.
FPGAs are indeed much slower than custom integrated circuits. There are a variety of reasons for this, which are beyond the scope of this reply. The speed difference is usually very substantial (an FPGA is typically 5x-10x slower). You also have serious density problems (an FPGA has 5x-10x fewer effective gates than a custom IC). This is enough to make FPGAs slower for almost all problems. A few very specialized problems might be better solved on a large FPGA array than on a custom IC (or even a general-purpose one), but most problems aren't in this category.
Where FPGAs _are_ useful is in quick prototyping and validation of new IC designs, use as "glue logic" on boards, and for processing where performance isn't critical and you aren't shipping very many product units (custom ICs are only practical in lots of 10,000+).
Now, there is the question of extending general-purpose processors to contain "lots of specialized functional units". This is useful... to an extent. It depends on what you're doing, and how many units you try to add.
Remember that a functional unit that isn't being used is dead weight - adding silicon and cost. If you have a dozen types of functional unit, each used for a few of the tasks you use your computer for, but each sitting idle most of the time - then I would argue that you have a computer that's 5x or 10x more expensive than it needs to be. While a custom functional unit is considerably faster than a general-purpose processor emulating the same operations, this speed difference isn't huge. On a good nicely-optimized superscalar processor, it might only be a factor of 2 or 3 (unless you're doing something really ugly, like emulating FP in integer or emulating quote notation numbers). So, for most cases, you'd be better off emulating the desired functions, paying 5x less, and living with a processor that was 2x slower for specialized tasks and just fine at everything else.
Exceptions exist. SSE and 3DNow are good examples. While most general-purpose operations don't benefit from SIMD floating-point operations, several common applications do (most notably games). If enough demand exists for a feature, it becomes practical to add a functional unit to handle it. You just have to be careful not to let this get out of hand.
Summary: Adding many new functional units would not be cost-effective, but adding one or two that make sense works very well. FPGAs can't compete with this, though they're useful for other things.
On the other hand, I think your premise is somewhat flawed. Obviously space exploration isn't the only thing we should be working on (birth control in particular makes sense to me), but it doesn't need to be the only thing.
Oh, I'm most certainly in favour of space exploration and colonization - I just don't think that it will help overpopulation, which is what the original poster was proposing.
Also, if we can get a significant portion of the population up, their children will be born in space - not on earth - to begin with.
I'm looking at this from the point of view that it may take 1000 years to truly accomplish, but I don't think 'never' is a very good answer.
More people being born in space will not cause fewer people to be born on earth. If it helps, rephrase my statement as, "is it _possible_ to evacuate Earth, if Earth's birth rate substantially exceeds its death rate?".
The rate at which you remove people from the planet must be greater than the rate at which the population of the planet is increasing for the answer to be "yes". This can happen; there are just constraints on the rate of population increase for which evacuation is still possible.
Hmm. Trying back-of-the-envelope calculations:
- Assume doubling time of about 50 years, exponential population growth.
- Assume current population of 6.0e9.
- Ergo population t seconds from now is about 6.0e9 * exp(t / 2.0e9).
- Ergo rate of change of population now is about 3 new people per second.
Best possible spacecraft is almost all cargo, and all energy goes into overcoming cargo's GPE. Escape energy from Earth's surface is about 6.0e7 J/kg. Assume 100 kg/person, including carry-on baggage. Power required for our perfect space ships for rate of evacuation to balance population growth: 3 people/s × 100 kg × 6.0e7 J/kg, or about 18 GW.
OK, maybe attainable, if there are several *large* spaceports devoted to this purpose.
Now, chemical rockets. Assume that the best possible reusable chemical rocket imparts 10% of its energy to its cargo. This gives about 180 GW, which means a very large industrial infrastructure supporting the spaceports and producing vast amounts of fuel.
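The envelope arithmetic as a Python sketch, so the inputs can be varied. The 6.0e9 population, 50-year doubling time, 100 kg per person, and 10% rocket efficiency come from the text; the escape energy is computed as GM/R from standard values for Earth (GM ≈ 3.986e14 m³/s², R ≈ 6.371e6 m), and the seconds-per-year constant is an approximation.

```python
import math

SECONDS_PER_YEAR = 3.156e7
GM_EARTH = 3.986e14   # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6     # mean Earth radius, m

def growth_rate_per_s(population: float, doubling_years: float) -> float:
    """People added per second right now, for exponential growth
    with the given doubling time (e-folding time = doubling time / ln 2)."""
    e_folding_s = doubling_years * SECONDS_PER_YEAR / math.log(2)
    return population / e_folding_s

def evacuation_power_w(people_per_s: float, kg_per_person: float,
                       j_per_kg: float, efficiency: float = 1.0) -> float:
    """Power needed to lift people off Earth as fast as they are added."""
    return people_per_s * kg_per_person * j_per_kg / efficiency

escape_j_per_kg = GM_EARTH / R_EARTH  # specific energy to escape from the surface
rate = growth_rate_per_s(6.0e9, 50)
print(f"escape energy: {escape_j_per_kg:.2e} J/kg, growth: {rate:.2f} people/s")
print(f"ideal craft: {evacuation_power_w(rate, 100, escape_j_per_kg) / 1e9:.0f} GW")
print(f"10% efficient rockets: {evacuation_power_w(rate, 100, escape_j_per_kg, 0.1) / 1e9:.0f} GW")
```

With the rounded inputs (3 people/s, 100 kg, 6.0e7 J/kg) the ideal case is 1.8e10 W. Using the unrounded e-folding time of about 2.3e9 s instead of 2.0e9 s brings the growth rate down to about 2.6 people/s and the ideal power to roughly 16 GW, so the order of magnitude holds.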
Actual costs will be at least an order of magnitude higher, due to manufacturing/repair costs, additional infrastructure, etc., but this might be do-able. If we start evacuating now, and devote all of our efforts to doing so. And have somewhere immediately ready to evacuate to.
From the equations, you can see how it becomes much, much easier if the rate of population growth is reduced. Reduction to no net growth has the handy side effect of eliminating the problem.