"NUMA" stands for "Non-Uniform Memory Architecture". It's one approach to dealing with system memory on a machine with a large number of processors.
The idea is that each processor module has its own dedicated RAM, which can be accessed locally by that module and remotely by the other processor modules across the interconnect. System memory as a whole is the aggregate of the local memory banks on all of the processor modules. While this is all in one physical address space, access time will vary depending on whether you're accessing a local or non-local bank. Hence, "Non-Uniform".
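A rough way to see why the local/non-local split matters is to work out the average access time for a given mix of accesses. This is only a sketch: the function name and the latency figures are invented for illustration and are not measurements from any real machine.

```python
def average_access_ns(local_fraction, local_ns=100.0, remote_ns=400.0):
    """Expected memory access time for a given fraction of local accesses.

    The default latencies are placeholder values chosen for illustration.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * local_ns + remote_fraction * remote_ns


# Compare placements that keep different shares of the data local.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} local: {average_access_ns(fraction):.0f} ns average")
```

With numbers like these, moving from half-local to mostly-local placement changes the average a great deal, which is the whole argument for NUMA-aware memory management.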
There are undoubtedly extensions to NUMA that do more complicated things with system memory; this is just the version that I was told about at university.
Does Linux currently have NUMA-aware memory management? The article states that IBM is developing their own patches for this, but it would be interesting to know what else is out there.
Optimizing for local vs. nonlocal memory accesses can have a big performance impact, as (if memory serves) Sun found a while back.
There seem to be a lot of viruses coming out these days. How immune are the Linux/Unix systems and what can we do to prevent these kinds of viruses from causing us trouble? How would the current viruses need to be configured to bother us?
The short answer is that most flavours of Unix, including Linux, don't have much to worry about from the current crop of viruses. This may change in the future, but due to the architecture of Unix it is more difficult for viruses to propagate or to really damage a system.
The long answer is "it depends". Details as follows.
Macro Viruses
Viruses and trojans that are embedded in Word documents, Visual Basic scripts, or the like have no effect under Unix, because most Unix systems don't process Word macros or Visual Basic scripts. Thus, most of the crud that has been affecting Windows users has been completely unnoticed by Unix users.
Bombs and Trojans
If you are sent an executable, or fetch an executable yourself, and run it, it can modify anything that you have permission to modify, even under Unix. This means that a trojan executable, if you run it, could quite easily destroy all of your files - but not the files of anyone else using the machine, and not the operating system files. In principle, a trojan could also access any facilities that you have access to; this means that a sufficiently clever trojan could mail itself to other people from your account. However, it would have a harder time finding addresses to send itself to (maybe scan ~/mail and /var/spool/mail/username for addresses). So damage is limited, and nobody's bothered implementing effective propagation so far (though it could be done).
True, Infecting Viruses
A true virus is capable of infecting arbitrary executables, which themselves will contain the virus and infect other executables. While in principle this could be done under Unix, the virus would again be limited only to executables that you have permission to modify. System tools would not be affected - you couldn't infect "cp" or "ls", for instance. Distribution would also be curtailed, as you don't usually send executables to your friends; you send them a source tarball, or point them to where they can download an executable. So, while something like this could be done, it wouldn't be as devastating as it is under Windows or DOS.
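The "only what you have permission to modify" limit can be sketched with the classic owner/group/other write bits. This is a simplified model, not how any particular kernel implements it: it ignores ACLs, capabilities, and read-only mounts, and the file list, uids, and the can_modify name are all hypothetical.

```python
import stat


def can_modify(file_uid, file_gid, mode, user_uid, user_gids):
    """Simplified Unix write-permission check for a regular file.

    Ignores ACLs, capabilities, and read-only mounts.
    """
    if user_uid == 0:
        return True  # root bypasses the ordinary permission bits
    if user_uid == file_uid:
        return bool(mode & stat.S_IWUSR)
    if file_gid in user_gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Hypothetical files: path -> (owner uid, group gid, permission bits).
files = {
    "/bin/ls": (0, 0, 0o755),
    "/home/alice/report.txt": (1000, 1000, 0o644),
    "/home/bob/notes.txt": (1001, 1001, 0o644),
}

# What could a trojan run by alice (uid 1000, group 1000) overwrite?
alice_targets = [
    path
    for path, (uid, gid, mode) in files.items()
    if can_modify(uid, gid, mode, user_uid=1000, user_gids={1000})
]
print(alice_targets)
```

The same check with user_uid=0 returns True for every file, which is why running untrusted code as root removes the protection entirely.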
Social Engineering
Social engineering remains one of the biggest threats under Unix. It means, simply, convincing a user to do something harmful. In the case of email viruses, the virus must convince the user to open the attachment. Heaven help us when inexperienced users have root access; a virus could simply tell you to "su to root and run this install script" to have devastating impact. This will probably be one of the biggest threats in terms of viruses under Unix.
The idea of a Linux email worm is so interesting that I'm tempted to write one. Must... stay... good...:).
My primary goal in coding is to have fun. The projects that I seek out will be those that interest me and look like fun to work on, as opposed to projects that fall into any given category or serve any given goal.
My major programming interests include graphics, simulation, and complex data processing, but I've dabbled in AI, drivers, user interfaces, and many other things. I especially enjoy being able to define my own implementation structure when solving a problem, but I still enjoy extending others' structures as long as what I'm adding to is already clean and elegant. I also enjoy making things elegant and easy to work with; rewriting code for modularity, simplicity, and style, and writing documentation and manuals (yes, I actually produce documentation).
Summary: There is no one set of interests that drives me, and I suspect that true gurus have similarly broad collections of tastes.
The thing about CISC is that they have a habit of using microcode to translate all of the complex instructions into smaller ones for the core of the processor (AFAIK). The time it takes to decode instructions is considerable at this stage, with several hundred instructions.
Trust me on this one - it's at most one clock, and possibly less. This goes for the x86 instruction set especially; each byte of the variable-length instruction has a fixed purpose, instead of being completely random. To process this, you need to prefetch several bytes ahead and have three or four shifters and a MUX to select and align the next instruction while you're processing the current one, giving a throughput of one instruction fetched and aligned per clock (or more, if you add more silicon). Once the instruction is aligned, you read out each of the fields that might possibly be there, and use a MUX to select them. You have a combinational logic block that processes the opcode and tells you if this is an instruction that can be processed atomically or that needs to be broken down, and a lookup table giving a series of RISCian stages for multi-clock instructions. If the "multi-clock" flag is set, you stall the instruction fetch unit and route in instructions from the lookup table instead.
It should be noted that processing the argument-location byte may insert memory load/store instructions too. However, as these are very predictable, you don't need a table lookup for them (just stalls and instruction register preloads at appropriate times).
Lots of extra silicon, but little extra time. You do much the same thing in a RISC processor, except without the alignment stage or the lookup table.
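As a toy model of the "multi-clock flag plus lookup table" arrangement described above, here is a sketch in Python. The instruction names and micro-op sequences are invented for illustration and do not correspond to the real x86 encoding or to any actual processor's microcode.

```python
# Hypothetical instruction names and micro-op sequences, for illustration only.
MICROCODE = {
    "PUSH": ["SUB sp, 4", "STORE [sp], src"],
    "ADD_MEM": ["LOAD tmp, [addr]", "ADD dst, tmp"],
}


def decode(instructions):
    """Yield (micro_op, fetch_stalled) pairs, one per clock.

    Single-clock instructions pass straight through. Multi-clock
    instructions are replaced by their table entries, and the fetch unit
    is stalled until the last micro-op of the sequence has been issued.
    """
    for instruction in instructions:
        opcode = instruction.split()[0]
        sequence = MICROCODE.get(opcode)
        if sequence is None:
            yield instruction, False
        else:
            for position, micro_op in enumerate(sequence):
                is_last = position == len(sequence) - 1
                yield micro_op, not is_last


program = ["MOV a, b", "PUSH a", "ADD a, b"]
trace = list(decode(program))
for micro_op, stalled in trace:
    print(f"{micro_op:20s} fetch stalled: {stalled}")
```

The point of the model is that the core still issues one operation per clock; the table only changes where that operation comes from.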
And how can these CISC processors outstrip Sun's RISCs? I don't care how many MHz it runs at, the two just work differently.
There are efficient ways to implement CISC; the most common is to break down CISC instructions into smaller RISC-ian operations that are easily pipelined. You might have an extra clock's latency for the decoding, or you might not, depending on architecture.
The functional units that perform operations are the same. The cache and memory subsystems are the same. Processors tend to be bound by design tradeoffs in these or in the motherboard or in things like the register file size as opposed to by the instruction set.
Short version: RISC and CISC are different ways of telling a processor to do the same things. CISC used to make scheduling harder. Now it doesn't.
What makes the Xeon better? The cache is larger, yes? Everyone else on Slashdot knocks L2 cache saying it's overrated and underutilized as is.
For most personal systems, adding cache would give diminishing returns fairly quickly. However, a personal system usually has one or at most two major tasks draining resources. A server has many cpu- and memory-intensive tasks running at once. A small cache would thrash on every context switch. A larger one wouldn't. Of course, how much of a performance gain this gives depends on how long the timeslice is, but if the timeslice is short enough, it's relevant.
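The thrashing effect is easy to reproduce in a toy simulation. The sketch below assumes an LRU cache shared by two tasks that take turns, each walking its own working set; the sizes, the replacement policy, and the miss_rate name are all assumptions made for the example rather than a model of any real processor.

```python
from collections import OrderedDict


def miss_rate(cache_lines, working_set, timeslice, switches):
    """Miss rate of an LRU cache shared by two tasks that alternate.

    Each task repeatedly walks its own working set of `working_set`
    distinct lines, touching `timeslice` lines per turn.
    """
    cache = OrderedDict()
    positions = [0, 0]  # where each task is in its working set
    misses = accesses = 0
    for switch in range(switches):
        task = switch % 2
        for _ in range(timeslice):
            line = (task, positions[task])
            positions[task] = (positions[task] + 1) % working_set
            accesses += 1
            if line in cache:
                cache.move_to_end(line)  # mark as most recently used
            else:
                misses += 1
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return misses / accesses


small = miss_rate(cache_lines=64, working_set=64, timeslice=64, switches=100)
large = miss_rate(cache_lines=128, working_set=64, timeslice=64, switches=100)
print(f"small cache miss rate: {small:.2f}, large cache miss rate: {large:.2f}")
```

In this setup the small cache holds only one task's data, so every context switch starts from scratch, while the large cache keeps both working sets resident and only misses on the first pass.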
Anyone have any insight as to what makes the Xeon a good choice in the server arena?
Probably price. A Xeon system is cheap compared to its competition, if I understand correctly. It doesn't scale as well as a Sun box, and won't give you the floating-point performance of an Alpha box, and doesn't have the memory integration of an SGI box, but as an inexpensive mediocre solution it can be a good buy if you don't really need something cutting-edge.
But I'd really like to know: do potatoes taste as good when they've had all their electricity taken out?
Probably not. After use, you'll have oxides and salts of zinc and copper in your potato, which probably won't taste very good.
The potato is actually just acting as an electrolyte and semipermeable membrane - the power comes from the zinc and copper.
There's also the possibility of sponsorship here. If it were powered by burgers instead of fries, they could put up one of those 'one billion served' banners.
A neat idea, but it probably wouldn't work. Burger grease wouldn't make a very good electrolyte.
Can anyone with inside information explain to me the reasoning behind the Duron's cache setup? It just seems that having an L2 cache smaller than the L1 cache wouldn't increase performance much over L1 with no L2 cache, while it does increase die size. Is there a reason why 64/64/64 was chosen over 64/64/0 or 32/32/128?
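#!/usr/bin/perl
#
# Really simple dialectifying Perl script.
# Written by Christopher Thomas
#
# This can be transformed into a CGI very easily. Sample code for this
# has been included but commented out.
#
# This code may be freely used, distributed, and modified.
#

#
# Libraries.
#

# use whatever_cgi_library; # CGI
use strict;

#
# Translation tables.
#
# Make up a set of these tables for each dialect you want to produce.
# Remember that this is case-sensitive.
#

#
# Jar-Jar table.
#
# I've set this translation table up to mimic the "Jar-Jargonizer" that
# was featured in Quickies a while back.
# I've forgotten many of the suffix translations the original script did.
# No great loss.
#

# Translate prefixes of words.
my (%jarjar_prefix_trans);
%jarjar_prefix_trans = ( );

# Translate suffixes of words.
my (%jarjar_suffix_trans);
%jarjar_suffix_trans = ( "'re"=>"sa be" );

# Translate whole words.
# NOTE: the entries of this table are missing from this copy of the script.
# It is declared empty here only so that the reference below resolves.
my (%jarjar_word_trans);
%jarjar_word_trans = ( );

my (%jarjar_table);
%jarjar_table = (
  "prefix"=>\%jarjar_prefix_trans,
  "suffix"=>\%jarjar_suffix_trans,
  "word"=>\%jarjar_word_trans );

#
# Master table indexing hook tables.
#

my (%dialect_tables);
%dialect_tables = ( "jarjar"=>\%jarjar_table );

#
# Utility functions.
#

#
# This function translates the specified list according to the specified
# dialect rules.
#
# Arg 1 is an array reference pointing to the text to modify.
# Arg 2 is a hash reference pointing to a dialect hook table.
#
# Returns 0 if successful and an error code if not.

sub translate
{
  my ($text_p, $htable_p);
  my ($prefix_p, $suffix_p, $word_p);
  my ($errcode);
  my ($this_line);
  my ($index);
  my ($key);

  # NOTE: the rest of this function is missing from this copy of the script.
  # What follows is a guess at a minimal body that matches the description
  # above; it is an assumption, not the original code.

  ($text_p, $htable_p) = @_;
  $errcode = 0;

  if (!(defined $text_p) || !(defined $htable_p))
  {
    # Missing argument.
    $errcode = 1;
  }
  else
  {
    $prefix_p = $$htable_p{"prefix"};
    $suffix_p = $$htable_p{"suffix"};
    $word_p = $$htable_p{"word"};

    if (!(defined $prefix_p) || !(defined $suffix_p) || !(defined $word_p))
    {
      # Incomplete hook table.
      $errcode = 2;
    }
    else
    {
      for ($index = 0; defined $$text_p[$index]; $index++)
      {
        $this_line = $$text_p[$index];

        # Whole words first, then word beginnings, then word endings.
        foreach $key (keys %$word_p)
        { $this_line =~ s/\b\Q$key\E\b/$$word_p{$key}/g; }
        foreach $key (keys %$prefix_p)
        { $this_line =~ s/\b\Q$key\E/$$prefix_p{$key}/g; }
        foreach $key (keys %$suffix_p)
        { $this_line =~ s/\Q$key\E\b/$$suffix_p{$key}/g; }

        $$text_p[$index] = $this_line;
      }
    }
  }

  return $errcode;
}

#
# Main program.
#

# NOTE: the start of the main program is also missing from this copy.
# Taking the dialect name and URL from the command line is an assumption.
my ($dialect_choice, $target_URL);
my ($hook_p);
my (@page_text);
my ($index);

($dialect_choice, $target_URL) = @ARGV;
if (!(defined $dialect_choice) || !(defined $target_URL))
{
  die "Usage: $0 <dialect> <URL>\n";
}

# Get a reference to the hook table for this dialect.
$hook_p = $dialect_tables{$dialect_choice};

# Sanity check.
if (!(defined $hook_p))
{
  # No such dialect.
  print "ERROR: Can't find dialect \"$dialect_choice\".\n";
}
else
{
  # Try to read the specified web page.
  @page_text = `lynx -source $target_URL`;

  # Blithely assume that this worked.

  # Translate the page.
  translate(\@page_text, $hook_p);

  # Dump the page contents to standard output.
  for ($index = 0; defined $page_text[$index]; $index++)
  {
    # Don't add newlines; we've kept the old ones.
    print $page_text[$index];
  }

  # Print a trailing newline, just in case the web source didn't have one
  # and a human is viewing the output.
  print "\n";
}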
But in all seriousness, releasing the code to the dialectizer would allow us to enjoy it without putting rinkworks at risk of lawsuit, and without overloading rinkworks' server.
This is actually very straightforward to clone. Back when Quickies featured the "jar-jargonizer", I spent half an hour fiddling with Perl and posted a clone script as a comment.
I'll do the same here.
My point is that for a program this simple, you may be better off simply writing your own version rather than asking for source, if the simple clone will be adequate for your needs.
DDoS is the direct result of sloppy upstream administrators. If I were in your shoes, I would be suing every person upstream for at least a few hops for passing those 10.0.0.0 packets along for gross negligence.
Um, no.
DDOS simply requires that a lot of compromised boxes be able to send you packets. Spoofing to nonexistent return addresses is an orthogonal issue. You reply that it's used to mask the source boxes? Any _valid_ address could also be used for that, so filtering would gain you nothing against that.
I agree that filtering of reserved addresses should be done, but that would not hinder a DDOS attack.
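To make the point concrete, here is a sketch of a reserved-address filter using Python's standard ipaddress module. The filter function and the sample addresses are made up for the example; 198.51.100.7 comes from a documentation range and simply stands in for "some valid address that isn't the attacker's".

```python
import ipaddress

# RFC 1918 private ranges, which should never appear as a source address
# on the public Internet.
RESERVED = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def passes_reserved_filter(source):
    """Return True if a packet with this source address would be forwarded."""
    address = ipaddress.ip_address(source)
    return not any(address in network for network in RESERVED)


# Forged source addresses an attacker might put on outgoing packets.
spoofed_sources = ["10.1.2.3", "192.168.0.9", "198.51.100.7"]
forwarded = [s for s in spoofed_sources if passes_reserved_filter(s)]
print(forwarded)
```

The filter drops the packets forged with reserved addresses, but the one forged with a routable address goes straight through, so the flood itself is not prevented.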
That was a good account of what happened, but in part two, we want to hear what you are doing to track the bastards down.
Unfortunately, if I understand correctly, that can only be reliably done by manual traffic analysis by the sysadmins of the various routers en route. The origins and possibly routes of the incoming packets will have been forged, so you have to actually go from router to router looking for unusual traffic.
Disclaimer: I am not a networking guru.
Various modifications to routing software have been proposed that would make tracking easier (see the recent slashdot article). However, at present you're in for a lot of work and still probably out of luck.
Why are you installing a Unix-based firewall in front of some Unix-based public servers? Why not secure the servers in the first place?
Having a firewall in place to filter invalid packets and other crud thrown at the servers means that more of the servers' time is spent generating slashdot pages. Also, the simpler the Unix box, the easier it is to secure - hence, securing a stripped down firewall instead of a big, complex slashdot server.
Secondly, I assume you understand the purpose of the "acknowledgement", which is essentially the "hey, I'm done with the previous result, I'm ready for the next set of inputs"? The acknowledgement along with the normal properties of CMOS prevent any "race condition" from occurring (I assume that is your fear?).
My fear isn't a race condition; it's a spurious signal emitted from a previous output stage causing processing to begin before it should in the following stage, with invalid data. Spurious signals like this occur all of the time, and are called "glitches"; they result when multiple paths through a logic block have different lengths. The canonical solution is to ignore all outputs until enough time has passed for them to stabilize. Glitches can also be minimized by adding redundant logic terms.
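A textbook example of such a glitch is the static hazard in f = a*b + (not a)*c. The sketch below is a unit-delay simulation written for this comment, with every gate assumed to take exactly one time step; real gate delays are not uniform, so treat it as an illustration of the mechanism only.

```python
def simulate(a_values, b=1, c=1, redundant_term=False):
    """Unit-delay simulation of f = a*b + (not a)*c.

    Every gate takes one time step, so the (not a)*c path is one gate
    longer than the a*b path. Returns the output value at each step.
    """
    # Start in the steady state for the first value of a.
    a_prev = a_values[0]
    not_a = 1 - a_prev
    and_ab = a_prev & b
    and_nac = not_a & c
    # b and c are held constant, so the redundant b*c term never changes.
    and_bc = (b & c) if redundant_term else 0

    outputs = []
    for a_now in a_values:
        # Each gate output is computed from the previous step's signals.
        out = and_ab | and_nac | and_bc
        and_ab, and_nac, not_a = a_prev & b, not_a & c, 1 - a_prev
        a_prev = a_now
        outputs.append(out)
    return outputs


a_falls = [1, 0, 0, 0, 0, 0]
print(simulate(a_falls))                       # output dips to 0 for one step
print(simulate(a_falls, redundant_term=True))  # redundant term masks the dip
```

With b = c = 1 the output should stay at 1 throughout, but the a*b term switches off one step before the (not a)*c term switches on. Adding the redundant b*c term covers the gap, which is the "redundant logic terms" fix mentioned above.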
Again, I'm not exactly following you here, but remember, we're not using traditional NAND, etc... gates in CMOS, but NCL gates (e.g., 3 of 5) and they are designed specifically for NCL and the acknowledgement flow.
However, your NCL gates are still composed of transistors set up using CMOS logic rules (or any of a variety of dynamic schemes that accomplish the same thing). This winds up giving effects similar to those you would see with standard boolean logic circuits. As far as I can tell from the documentation, in actual implementation NCL isn't so much a departure from boolean logic as a layer of meta-logic on top of it that makes it self-clocking. The actual physical signal encoding on individual lines is boolean (the lines are just grouped in interesting ways).
Thus, while the gates are self-clocked, they seem to be as vulnerable to glitching as any other combinational logic blocks.
Information regarding "orphans" noted. It's interesting, but doesn't relate to my question.
Again, I do *NOT* speak for Theseus Logic and there are much better individuals here who can clear up any questions. Feel free to fire off some questions to the address(es) on the web site.
Noted; thanks for posting the link, btw. This is a very interesting approach to asynchronous circuitry.
The ideas presented in the papers on the Theseus Logic site are interesting. However, the True/False/Null logic scheme defined seems to be vulnerable to glitches in gate inputs. A brief transition to a valid state on all inputs as the previous stage's logic settled would be interpreted as a new input datum by the gate in question, possibly resulting in unwanted output being produced. In other words, using T/F/N logic seems to place stricter timing requirements on input signals than clocked logic with edge-triggered registers.
Is this correct, or am I missing something? I realize that glitching can be reduced by careful logic design, but this seems to be an issue that is addressed neither in your post nor in the papers on the Theseus site.
And I'm still trying to figure out why asynchronous smaller bandwidth (number of lines) buses are faster than synchronous parallel (more data lines).
They aren't; what asynchronous logic in an IC context deals with is reducing power consumption by not clocking all parts of the chip all of the time.
In a synchronous microprocessor, the system clock is distributed to all functional units, and the functional units even when not in use usually wind up having some kind of internal state change every clock cycle. This results in a lot of heat production, because every time the state of a bit in a register or of a bus line changes, heat is dissipated (by nature of the way the parasitic capacitances are charged and discharged).
In a truly asynchronous microprocessor, there is no master system clock distributed to the functional units of the chip. Instead, actions in a functional unit take place when input data changes (i.e. new input data arrives). This results in only the state of units being used changing, which in turn means much less power dissipation if only one or two units are being used at a given time.
In practice, real systems don't fit into either category. Fully synchronous circuits burn a lot of power, but truly asynchronous circuits are difficult to design and are very sensitive to certain types of process variation. An often-used compromise is to use gated clocks - a synchronous clock is propagated, but only to the functional units that are being used. This principle is extended within the functional units themselves; internal clocks and data are propagated only when they need to be for the operation being performed. This results in a circuit that is much easier to design and fabricate than a truly asynchronous circuit, and that is almost as good from a power consumption point of view.
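For a feel for the numbers, the usual first-order estimate of CMOS switching power is activity * C * V^2 * f. The sketch below applies it to a made-up chip with four equal functional units, one of which is busy; the capacitance, voltage, clock rate, and activity factor are placeholder values, and treating an ungated idle unit as switching just as much as a busy one is a simplifying assumption.

```python
def dynamic_power_watts(capacitance_f, voltage_v, clock_hz, activity):
    """First-order CMOS switching power: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * clock_hz


def chip_power(units, gated):
    """Total switching power for a list of (capacitance, busy) units.

    Without gating every unit is clocked; with gating only busy units
    are. Voltage, clock rate, and activity are placeholder values.
    """
    total = 0.0
    for capacitance_f, busy in units:
        if busy or not gated:
            total += dynamic_power_watts(capacitance_f, 2.0, 500e6, 0.5)
    return total


# Four hypothetical functional units of 1 nF each; only the first is busy.
units = [(1e-9, True), (1e-9, False), (1e-9, False), (1e-9, False)]
print(f"ungated: {chip_power(units, gated=False):.1f} W")
print(f"gated:   {chip_power(units, gated=True):.1f} W")
```

In this idealized case gating gets the same saving a fully asynchronous design would, which is why it is such a popular compromise.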
I hope this clarifies what the debate over asynchronous computing is about.
There are actually 3 options for B to understand A's code. Option #3 is: the code is so simple that anyone can understand it. This is actually very common in the real world for itches that start out small.
In my experience, this doesn't happen much. Even the toy utilities that I've written (such as the document-formatter I used to produce the LML33 guide) are complex enough to be non-trivial. For a more realistic example, look at the LML33 driver itself. It's relatively cleanly written, but something like my guide helps a *lot*.
Similarly, documentation and clean packaging are pretty essential to avoid scenario #1 for almost all projects.
On a totally unrelated sidenote: I'm very interested in the LML33 for a home project I'd like to do, but I'm having trouble figuring some things out (pre-purchase). I was actually at your page last night and downloaded your paper. Would it be OK if I emailed you and asked some questions?
Certainly, though my knowledge grows increasingly out of date. I'd also suggest researching the Video4Linux driver for Zoran cards. Parts of the LML33 driver still need considerable work.
>My point is that its hard to maintain open source packages.
Which is why YOU shouldn't. Here's how Open Source works:
1) Programmer A has an itch.
2) Programmer A scratches until the itch stops.
3) Programmer B tries to use the same program to scratch his itch, but finds himself only half-scratched. He patches until the itch is fully scratched.
You seem to be overlooking the fundamental problem with releases - programmer B has to *UNDERSTAND* the code that programmer A wrote.
This involves one of two things.
Option #1: Programmer B has to spend weeks digging through leftover cruft that programmer A didn't bother to delete and reverse-engineering code that programmer A didn't bother to document and finding specs that programmer A didn't bother to write or point to.
Option #2: Programmer A has to spend a few person-weeks of effort and clean _up_ the code, _write_ documentation, and attend to all of the other details of packaging the release that allow other programmers to modify it without pain.
This takes a LOT of effort, and if it has to be done in one-hour chunks while you work at your day-job or take courses full-time or what-have-you, it will take a long time to do.
I've done this myself. I wrote a series of extensions to the driver for the Linux Media Labs 33 video capture card and posted packaged results to the web. http://www-ug.eecg.toronto.edu/~thomasc/lml33.
Please make sure that you have full knowledge of the effort involved in a task before belittling someone else's efforts with it.
Perhaps it would then be better to begin pushing for more efficient programming usage of multi-processors on a single system? From what I know, which isn't a lot, we could make more use of parallel processing than most systems currently allow for, or am I completely wrong?
It turns out that, for several reasons, multiprocessors aren't likely to dominate desktops for a few years yet.
The first reason is that systems with multiple _discrete_ processors are more expensive. You need to pay for multiple processor modules, and the motherboard needs a more complex chipset. Joe Average Gamer is better off spending the same amount of money getting a top-of-the-line video card, and a new single processor six months later. Joe Average Non-Gamer doesn't need a multiprocessor for email and office apps.
The second reason is that writing good parallel code is much more difficult than writing good sequential code. Race conditions, interprocess communication, and so forth add plenty of complexity, and compiler tools won't save you - parallelism is designed in at a higher level than compilers deal with.
The third reason is that interprocessor communications bandwidth and memory coherency overhead are *big* problems for multi-processor systems, and they keep on getting bigger as more processors are added. Something like a Starfire, for instance, isn't a large set of processors and memory with a bus tacked on - it's the Bus Network of the Gods with processors and memory tacked on as an afterthought. It has to be, to handle supercomputer communications loads. This means that a lot of the money you spend on a parallel system *won't* be on processing power. If, on the other hand, you're willing to wait another design generation, you can get a comparable processor for a much lower price.
The fourth reason is that while we could indeed integrate many old cores on a new die, we get better performance by doing other things. Adding more cache, for instance, or adding fast, complicated FP units that would have taken too much silicon to add before. Making a bigger translation lookaside buffer (important with a 64-bit address space). Improving branch prediction (a big source of stalls). Adding deeply pipelined load/store units (another big source of stalls). Or adding whatever other performance-enhancing widgets are invented over the next five years. Multiple cores are an interesting idea, but at _present_ aren't the most effective way of increasing performance.
All of these factors mean that parallel processing isn't used except by those who really, *really* need it (dual-processor doesn't count).
Now, the caveat.
Once cache performance saturates - and it eventually will - we'll have a lot of silicon to play with when moving to finer linewidths. At the same time, we'll also have to break chips into asynchronous pieces to solve the clock skew problem. We may also be reaching limits to superscaling (scheduling is an NP-complete problem, and approximations reach diminishing returns eventually). At this point, it starts to make sense to put multiple cores in a chip, along with the coherency logic and communications pathways needed.
However, I don't see your desktop machine running a processor like that for 5-15 years, for the reasons mentioned above.
This is the single worst comment in the Linux kernel, and I don't even have to look at the rest of them to know it.
Fast code does not have to be messy.
Actually, if I remember the story correctly, that was a comment in the context-switching code, and so has little bearing on the current optimization threads. The writer didn't feel like explaining how context switching worked on the architecture in question.
wire of the same length as in a previous chip would be slower in the new chip because of the reduced driver sizes, thinner wires (increased resistance), and the relatively unchanging capacitance. (The capacitance per unit length stays about the same at smaller sizes because of fringing effects.)
Capacitance is still (AFAIK) dominated by the diffusion capacitance of transistor sources/drains connected to the wire. Second contributor, IIRC, was gate capacitance. Both of these go down with feature size.
You might point out that gate and drain area will only go down in one dimension, as I'll be sticking more devices on the bus, but they'll still go down.
Wire resistance similarly isn't a huge contributor AFAIK. In all of the sets of parameters that I've seen, even a long bus wire would have resistance lower than the effective resistance of a MOSFET in saturation mode.
Lastly, while your drivers get smaller, the W/L ratio of the gates remains the same. This means that, should you be inclined to melt down your circuit, you could still pass the same amount of current through a smaller MOSFET.
Now, as far as using intelligence is concerned... Most of the cynicism I've seen expressed both towards coding and towards IC design has been put forward by people who aren't doing coding or IC design (in general; I don't know what your personal qualifications are). The fact remains that while boneheaded code gets written and while boneheaded ICs are most likely designed, there are still companies that do it right. These gain market share, grow complacent, and fall to the next group that does it right, continuing the grand cycle.
My point being that you aren't likely to get order-of-magnitude performance improvements by "using intelligence". The people you're competing against already are.
As far as the ultimate limits of communication on smaller, faster chips are concerned, I doubt this will become a serious problem. Designers will simply focus more on pipelining and asynchronous operation of modules to relax system-wide signal timing constraints.
Solid state photonics is coming, and there's nothing you can do about it.
Solid state photonics will still have its feature size limited by the wavelength of the light used within its devices. _Current_ integrated circuit chips use feature sizes that are much smaller - by the time photonics matures, it will already be left in the dust as far as density is concerned.
Use smaller wavelengths of light? Not unless you want to destroy your material by photoionization.
Your next logical argument is to point out that most proposed photonic devices are three-dimensional. My logical counterargument is to point out that you can build three-dimensional electrical devices too. It's just currently cheaper to shrink 2D fabrication processes.
Your next probable point is to make noise about propagation delay in electrical circuits. It turns out that these aren't the limiting issue in conventional ICs - heat dissipation is.
Your next likely point is to say that a photonic circuit would have less heat dissipation. My response is that I'll believe it when I see it. Absorption happens, and whatever diode lasers are pumping this device won't be perfectly efficient either.
Lastly, I'd like to point out that most of the effort that goes into designing integrated circuits goes into designing the logic, not the fabrication processes. Computer engineers would still be employed in your hypothetical universe. Electrical engineers design motherboards and specialized analog ICs, both of which would still exist, so they wouldn't be out of work either.
Summary: Photonics is not the magic wand you hold it out to be.
>Satellites at any altitude take *no* fuel to stay in orbit
Actually no. Theoretically yes. But earth's atmosphere actually extends way way up there.
...As was clearly stated in my previous post. However, orbital decay due to atmospheric friction even in low orbit takes years or decades - the atmosphere's density drops off exponentially (not as one exponential, but as a piecewise-exponential curve). It's extremely tenuous and gets even more so as you go up.
Let's use figures from the ionosphere data you cite for density - about 1.0e12 particles per cubic metre at the most dense layer. Let's also assume low earth orbit, which has an orbit length of about 4.0e7 metres. This gives collisions with 4.0e19 particles per square metre per orbit.
By comparison, water has about 3.3e28 particles (molecules) per cubic metre. Air has about 2.4e25 particles per cubic metre at one atmosphere of pressure. Much, much denser.
Let's assume that you have a satellite with a mass of one tonne and a cross-sectional area of 5 square metres (solar panels and all). Assume it's in LEO with an orbital period of about 90 minutes. How long will it take before it loses 0.5% of its orbital velocity (enough to lower its orbit by about 70 km)?
Let's assume that all collisions are inelastic - that is, mass that the spacecraft collides with sticks and accretes. This will give an approximately correct answer and is easy to calculate.
The spacecraft's mass needs to increase by about 0.5% for inelastic collisions to lower its velocity by 0.5%. It needs to collide with about 5 kg of matter to do this. With an area of 5 square metres, that means 1 kg of matter per square metre.
Let's assume that air has a density of around 1.1 kg/m^3 at one atmosphere (I may be off by 0.1 or so, but that's close enough for these purposes). This gives a mass of 1.1 kg * (4.0e19 / 2.4e25) = 1.8e-6 kg per square metre per orbit.
Collecting 1 kg per square metre therefore takes about 5.5e5 orbits. At 90 minutes per orbit, this means 4.9e7 minutes for orbital velocity to be reduced by 0.5%, or about 90 years. That's considerably longer than the expected lifetime of the satellite's electronics.
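For anyone who wants to check the arithmetic or plug in their own figures, the estimate above can be written out as a short script. It uses only the numbers already given and the same inelastic-collision assumption; the variable names are mine.

```python
# Figures taken from the estimate above.
particles_per_m3 = 1.0e12            # densest ionosphere layer
orbit_length_m = 4.0e7               # one low earth orbit
sea_level_particles_per_m3 = 2.4e25  # air at one atmosphere
sea_level_density_kg_m3 = 1.1        # air at one atmosphere
satellite_mass_kg = 1000.0
cross_section_m2 = 5.0
velocity_loss = 0.005                # 0.5%
orbit_minutes = 90.0

# Mass swept up by each square metre of cross-section in one orbit.
particles_per_m2 = particles_per_m3 * orbit_length_m
mass_per_m2_per_orbit = (
    sea_level_density_kg_m3 * particles_per_m2 / sea_level_particles_per_m3
)

# Inelastic collisions: gaining 0.5% in mass costs about 0.5% in velocity.
mass_needed_per_m2 = satellite_mass_kg * velocity_loss / cross_section_m2

orbits = mass_needed_per_m2 / mass_per_m2_per_orbit
minutes = orbits * orbit_minutes
years = minutes / (60 * 24 * 365.25)

print(f"{mass_per_m2_per_orbit:.1e} kg per square metre per orbit")
print(f"{orbits:.1e} orbits, {minutes:.1e} minutes, about {years:.0f} years")
```

Changing the density figure by a factor of ten changes the answer by the same factor, so the conclusion is sensitive to which atmospheric layer the orbit actually passes through.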
In summary, for anything placed in LEO or higher, it will be something other than orbital decay that determines satellite lifetime.
if a satellite is at 300 km it will take a lot of fuel to keep it in orbit. that is impractical and very expensive
Satellites at any altitude take *no* fuel to stay in orbit. That's the definition of "orbit" in this context. An object in orbit is circling the earth quickly enough that centripetal acceleration in its curved path exactly balances gravity. Newton's Laws keep it circling forever (or at least many, many years, until the wisps of atmosphere at those altitudes cause it to slow down and crash).
regarding palm pilots with satellite links, let's not forget that microwave comm is line of sight. it's not very practical in cities, mountains, tunnels, etc. forget about being inside man-made structures
Your cell phone is operating on microwave frequencies. Microwaves will penetrate a few wavelengths through most substances, and have wavelengths on the order of a few centimetres. This means that they will happily pass through several tens of centimetres of brick, concrete, and what-have-you, which is enough for most locations (line of sight to a satellite that's *not* directly overhead doesn't have to pass through dozens of stories of a building - just the nearest walls).
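The balance between centripetal acceleration and gravity fixes the speed of a circular orbit: v^2 / r = GM / r^2, so v = sqrt(GM / r). A quick sketch for the 300 km case follows; the constants are standard rounded values for the Earth and the function name is mine.

```python
import math

GM_EARTH = 3.986e14       # m^3/s^2, Earth's gravitational parameter
EARTH_RADIUS_M = 6.371e6  # mean radius in metres


def circular_orbit(altitude_m):
    """Return (speed in m/s, period in minutes) for a circular orbit."""
    radius = EARTH_RADIUS_M + altitude_m
    speed = math.sqrt(GM_EARTH / radius)  # from v^2 / r = GM / r^2
    period_min = 2 * math.pi * radius / speed / 60
    return speed, period_min


speed, period = circular_orbit(300e3)
print(f"speed: {speed:.0f} m/s, period: {period:.1f} minutes")
```

No thrust term appears anywhere in that balance; fuel is only needed to make up for drag or to change the orbit.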
Moot point in a city, though, for the reason mentioned above.
you have to be outside and know where your satellite to be able to talk to it
No, you just need a transmitter powerful enough that a satellite 300 km away can see its omnidirectional signals with sensitive detectors. A palm-pilot would have trouble doing this, but not a somewhat larger transceiver in your briefcase (with a lower-power link to the pilot).
Again, though, you seem to be missing the point of my post - that satellite service to cities isn't practical for the bandwidth demands of a city.
The short answer is that most flavours of Unix, including Linux, don't have much to worry about from the current crop of viruses. This may change in the future, but due to the architecture of Unix it is more difficult for viruses to propagate or to really damage a system.
The long answer is "it depends". Details as follows.
Viruses and trojans that are embedded in Word documents, Visual Basic scripts, or the like have no effect under Unix, because most Unix systems don't process Word macros or Visual Basic scripts. Thus, most of the crud that has been affecting Windows users has been completely unnoticed by Unix users.
If you are sent an executable, or fetch an executable yourself, and run it, it can modify anything that you have permission to modify, even under Unix. This means that a trojan executable, if you run it, could quite easily destroy all of your files - but not the files of anyone else using the machine, and not the operating system files. In principle, a trojan could also access any facilities that you have access to; this means that a sufficiently clever trojan could mail itself to other people from your account. However, it would have a harder time finding addresses to send itself to (maybe scan ~/mail and /var/spool/mail/username for addresses). So damage is limited, and nobody's bothered implementing effective propagation so far (though it could be done).
A true virus is capable of infecting arbitrary executables, which themselves will contain the virus and infect other executables. While in principle this could be done under Unix, the virus would again be limited only to executables that you have permission to modify. System tools would not be affected - you couldn't infect "cp" or "ls", for instance. Distribution would also be curtailed, as you don't usually send executables to your friends; you send them a source tarball, or point them to where they can download an executable. So, while something like this could be done, it wouldn't be as devastating as it is under Windows or DOS.
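The permission limit described above is easy to check for yourself. The following is a minimal Python sketch, not a security tool: it asks the operating system which of a list of files the current user could overwrite. The function name and the example paths are my own choices.

```python
import os


def modifiable_by_me(paths):
    """Return the subset of paths that are regular files the current user can write."""
    return [p for p in paths if os.path.isfile(p) and os.access(p, os.W_OK)]


if __name__ == "__main__":
    # Example system tools; the exact paths vary between Unix flavours.
    candidates = ["/bin/ls", "/bin/cp"]
    print("Writable by this account:", modifiable_by_me(candidates))
```

Run from an ordinary account, the list for system tools will normally be empty; run as root, everything shows up, which is the reason the social engineering case is the worrying one.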
Social engineering remains one of the biggest threats under Unix. It means, simply, convincing a user to do something harmful. In the case of email viruses, the virus must convince the user to open the attachment. Heaven help us when inexperienced users have root access; a virus could simply tell you to "su to root and run this install script" to have devastating impact. This will probably be one of the biggest threats in terms of viruses under Unix.
The idea of a Linux email worm is so interesting that I'm tempted to write one. Must... stay... good...
My primary goal in coding is to have fun. The projects that I seek out will be those that interest me and look like fun to work on, as opposed to projects that fall into any given category or serve any given goal.
My major programming interests include graphics, simulation, and complex data processing, but I've dabbled in AI, drivers, user interfaces, and many other things. I especially enjoy being able to define my own implementation structure when solving a problem, but I still enjoy extending others' structures as long as what I'm adding to is already clean and elegant. I also enjoy making things elegant and easy to work with; rewriting code for modularity, simplicity, and style, and writing documentation and manuals (yes, I actually produce documentation).
Summary: There is no one set of interests that drives me, and I suspect that true gurus have similarly broad collections of tastes.
The thing about CISC is that they have a habit of using microcode to translate all of the complex instructions into smaller ones for the core of the processor (AFAIK). The time it takes to decode instructions is considerable at this stage, with several hundred instructions.
Trust me on this one - it's at most one clock, and possibly less. This goes for the x86 instruction set especially; each byte of the variable-length instruction has a fixed purpose, instead of being completely random. To process this, you need to prefetch several bytes ahead and have three or four shifters and a MUX to select and align the next instruction while you're processing the current one, giving a throughput of one instruction fetched and aligned per clock (or more, if you add more silicon). Once the instruction is aligned, you read out each of the fields that might possibly be there, and use a MUX to select them. You have a combinational logic block that processes the opcode and tells you if this is an instruction that can be processed atomically or that needs to be broken down, and a lookup table giving a series of RISCian stages for multi-clock instructions. If the "multi-clock" flag is set, you stall the instruction fetch unit and route in instructions from the lookup table instead.
It should be noted that processing the argument-location byte may insert memory load/store instructions too. However, as these are very predictable, you don't need a table lookup for them (just stalls and instruction register preloads at appropriate times).
Lots of extra silicon, but little extra time. You do much the same thing in a RISC processor, except without the alignment stage or the lookup table.
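The decode flow described above can be modelled with a toy Python sketch. This is only an illustration of the idea (simple instructions pass straight through, multi-clock ones stall fetch while a lookup table supplies RISC-like steps); the instruction names and table contents are invented and do not correspond to any real x86 implementation.

```python
# Hypothetical lookup table: complex instruction -> sequence of simple steps.
MICRO_OP_TABLE = {
    "PUSH": ["SUB sp, 4", "STORE src -> [sp]"],
    "ADD_MEM": ["LOAD tmp <- [addr]", "ADD tmp, reg", "STORE tmp -> [addr]"],
}


def decode(instructions):
    """Return a list of (cycle, operation) pairs, one operation issued per cycle."""
    issued = []
    cycle = 0
    for instruction in instructions:
        # The "multi-clock" flag is modelled as membership in the table.
        steps = MICRO_OP_TABLE.get(instruction, [instruction])
        for step in steps:
            # While a table sequence plays out, fetch is effectively stalled:
            # the next instruction is not looked at until the steps are done.
            issued.append((cycle, step))
            cycle += 1
    return issued


if __name__ == "__main__":
    for cycle, operation in decode(["ADD", "PUSH", "SUB"]):
        print(cycle, operation)
```

The point of the model is that the simple instructions cost one slot each, and only the table-driven ones cost extra slots, which is the same accounting a RISC instruction stream would have for the equivalent work.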
And how can these CISC processors outstrip Sun's RISCs? I don't care how many mhz it runs at, the two just work differently.
There are efficient ways to implement CISC; the most common is to break down CISC instructions into smaller RISC-ian operations that are easily pipelined. You might have an extra clock's latency for the decoding, or you might not, depending on architecture.
The functional units that perform operations are the same. The cache and memory subsystems are the same. Processors tend to be bound by design tradeoffs in these or in the motherboard or in things like the register file size as opposed to by the instruction set.
Short version: RISC and CISC are different ways of telling a processor to do the same things. CISC used to make scheduling harder. Now it doesn't.
What makes the Xeon better? The cache is larger, yes? Everyone else on Slashdot knocks L2 cache saying it's overrated and underutilized as is.
For most personal systems, adding cache would give diminishing returns fairly quickly. However, a personal system usually has one or at most two major tasks draining resources. A server has many cpu- and memory-intensive tasks running at once. A small cache would thrash on every context switch. A larger one wouldn't. Of course, how much of a performance gain this gives depends on how long the timeslice is, but if the timeslice is short enough, it's relevant.
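To make the thrashing argument concrete, here is a small Python simulation. It is a sketch under simplifying assumptions that are mine, not the original post's: a fully associative LRU cache, tasks with equal-sized disjoint working sets, and round-robin scheduling. All names and sizes are illustrative.

```python
from collections import OrderedDict


def miss_rate(cache_lines, num_tasks, working_set, passes_per_slice, num_slices):
    """Simulate round-robin tasks on a shared LRU cache and return the miss rate."""
    cache = OrderedDict()
    misses = 0
    accesses = 0
    for slice_index in range(num_slices):
        task = slice_index % num_tasks
        # During its timeslice a task sweeps its working set several times.
        for _ in range(passes_per_slice):
            for line in range(working_set):
                key = (task, line)
                accesses += 1
                if key in cache:
                    cache.move_to_end(key)          # mark as most recently used
                else:
                    misses += 1
                    cache[key] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)   # evict least recently used
    return misses / accesses


if __name__ == "__main__":
    for size in (64, 256):
        for passes in (1, 10):
            rate = miss_rate(size, num_tasks=4, working_set=64,
                             passes_per_slice=passes, num_slices=40)
            print(f"cache={size} lines, passes per slice={passes}: "
                  f"miss rate {rate:.2f}")
```

With a cache that only holds one task's working set, every context switch refills the whole cache; with a cache that holds all four, the only misses are the initial warm-up. Shortening the timeslice (fewer passes per slice) widens the gap, which is the dependence on timeslice length mentioned above.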
Anyone have any insight as to what makes the Xeon a good choice in the server arena?
Probably price. A Xeon system is cheap compared to its competition, if I understand correctly. It doesn't scale as well as a Sun box, and won't give you the floating-point performance of an Alpha box, and doesn't have the memory integration of an SGI box, but as an inexpensive mediocre solution it can be a good buy if you don't really need something cutting-edge.
But I'd really like to know: do potatos taste as good when they've had all their electricity taken out?
Probably not. After use, you'll have oxides and salts of zinc and copper in your potato, which probably won't taste very good.
The potato is actually just acting as an electrolyte and semipermeable membrane - the power comes from the zinc and copper.
There's also the possiblity of sponsorship here. If it were powered by burgers instead of fries, they could put up one of those 'one billion served' banners.
A neat idea, but it probably wouldn't work. Burger grease wouldn't make a very good electrolyte.
Can anyone with inside information explain to me the reasoning behind the Duron's cache setup? It just seems that having an L2 cache smaller than the L1 cache wouldn't increase performance much over L1 with no L2 cache, while it does increase die size. Is there a reason why 64/64/64 was chosen over 64/64/0 or 32/32/128?
#!/usr/bin/perl
#
# Really simple dialectifying Perl script.
# Written by Christopher Thomas
#
# This can be transformed into a CGI very easily. Sample code for this
# has been included but commented out.
#
# This code may be freely used, distributed, and modified.
#
#
# Libraries.
#
# use whatever_cgi_library; # CGI
use strict;
#
# Translation tables.
#
# Make up a set of these tables for each dialect you want to produce.
# Remember that this is case-sensitive.
#
#
# Jar-Jar table.
#
# I've set this translation table up to mimic the "Jar-Jargonizer" that
# was featured in Quickies a while back.
# I've forgotten many of the suffix translations the original script did.
# No great loss.
#
# Translate prefixes of words.
my (%jarjar_prefix_trans);
%jarjar_prefix_trans =
(
);
# Translate suffixes of words.
my (%jarjar_suffix_trans);
%jarjar_suffix_trans =
(
"'re"=>"sa be"
);
# Translate whole words.
my (%jarjar_word_trans);
%jarjar_word_trans =
(
"me"=>"meesa",
"I"=>"meesa",
"you"=>"yousa",
"am"=>"be",
"I'm"=>"meesa be",
"are"=>"be",
"people"=>"Gungans",
"person"=>"Gungan",
"microsoft"=>"the Sith"
);
#
# Translation table hook pointers.
#
my (%jarjar_table);
%jarjar_table =
(
"prefix"=>\%jarjar_prefix_trans,
"suffix"=>\%jarjar_suffix_trans,
"word"=>\%jarjar_word_trans
);
#
# Master table indexing hook tables.
#
my (%dialect_tables);
%dialect_tables =
(
"jarjar"=>\%jarjar_table
);
#
# Utility functions.
#
#
# This function translates the specified list according to the specified
# dialect rules.
#
# Arg 1 is an array reference pointing to the text to modify.
# Arg 2 is a hash reference pointing to a dialect hook table.
#
# Returns 0 if successful and an error code if not.
sub translate
{
my ($text_p, $htable_p);
my ($prefix_p, $suffix_p, $word_p);
my ($errcode);
my ($this_line);
my ($index);
my ($key);
# Default to success.
$errcode = 0;
# Read arguments.
$text_p = $_[0];
$htable_p = $_[1];
# Sanity check.
if ((!(defined $text_p)) || (!(defined $htable_p)))
{
# Bad arguments.
$errcode = 1;
}
else
{
# Extract hook pointers.
$prefix_p = $$htable_p{prefix};
$suffix_p = $$htable_p{suffix};
$word_p = $$htable_p{word};
# Blithely assume that these hook pointers are valid.
# Translate.
for ($index = 0; defined ($this_line = $$text_p[$index]); $index++)
{
# Pad the string, to make life easier.
$this_line = " $this_line";
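# Replace prefixes.
# (Pattern assumed by symmetry with the suffix rule below.)
foreach $key (keys %$prefix_p)
{
# Take precautions against munching HTML tags.
$this_line =~ s/([^\w\<])$key/$1$$prefix_p{$key}/g;
}
# Replace suffixes.
foreach $key (keys %$suffix_p)
{
# Take precautions against munching HTML tags.
$this_line =~ s/$key([^\w\>])/$$suffix_p{$key}$1/g;
}
# Replace words.
foreach $key (keys %$word_p)
{
# Take precautions against munching HTML tags.
$this_line =~ s/([^\w\<])$key([^\w\>])/$1$$word_p{$key}$2/g;
}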
# Remove padding.
$this_line =~ s/^.//;
# Store this line back in the array.
$$text_p[$index] = $this_line;
}
}
# Return the result.
return $errcode;
}
#
# Main program.
#
my ($target_URL);
my ($dialect_choice);
my (@page_text);
my ($hook_p);
my ($index);
# OUTPUT CONTENT-TYPE HEADER HERE # CGI
# Get arguments.
# CGI READING GOES HERE # CGI
$target_URL = $ARGV[0];
$dialect_choice = "jarjar";
# Get a table to the hook table for this dialect.
$hook_p = $dialect_tables{$dialect_choice};
# Sanity check.
if (!(defined $hook_p))
{
# No such dialect.
print "ERROR: Can't find dialect \"$dialect_choice\".\n";
}
else
{
# Try to read the specified web page.
@page_text = `lynx -source $target_URL`;
# Blithely assume that this worked.
# Translate the page.
translate(\@page_text, $hook_p);
# Dump the page contents to standard output.
for ($index = 0; defined $page_text[$index]; $index++)
{
# Don't add newlines; we've kept the old ones.
print $page_text[$index];
}
# Print a trailing newline, just in case the web source didn't have one
# and a human is viewing the output.
print "\n";
}
But in all seriousness, releasing the code to the dialiectizer would allow us to enjoy it without putting rinkworks at risk of lawsuit, and without overloading rinkworks' server.
This is actually very straightforward to clone. Back when Quickies featured the "jar-jargonizer", I spent half an hour fiddling with Perl and posted a clone script as a comment.
I'll do the same here.
My point is that for a program this simple, you may be better off simply writing your own version rather than asking for source, if the simple clone will be adequate for your needs.
DDoS is the direct result of sloppy upstream administrators. IF I were in your shoes, I would be suing every person upstream for atleast a few hops for passing those 10.0.0.0 packets along for gross negligence.
Um, no.
DDOS simply requires that a lot of compromised boxes be able to send you packets. Spoofing to nonexistent return addresses is an orthogonal issue. You reply that it's used to mask the source boxes? Any _valid_ address could also be used for that, so filtering would gain you nothing against that.
I agree that filtering of reserved addresses should be done, but that would not hinder a DDOS attack.
That was a good account of what happened, but in part two, we want to hear what you are doing to track the bastards down.
Unfortunately, if I understand correctly, that can only be reliably done by manual traffic analysis by the sysadmins of the various routers en route. The origins and possibly routes of the incoming packets will have been forged, so you have to actually go from router to router looking for unusual traffic.
Disclaimer: I am not a networking guru.
Various modifications to routing software have been proposed that would make tracking easier (see the recent slashdot article). However, at present you're in for a lot of work and still probably out of luck.
Why are you installing a Unix-based firewall in front of some Unix-based public servers? Why not secure the servers in the first place?
Having a firewall in place to filter invalid packets and other crud thrown at the servers means that more of the servers' time is spent generating slashdot pages. Also, the simpler the Unix box, the easier it is to secure - hence, securing a stripped down firewall instead of a big, complex slashdot server.
Secondly, I assume you understand the purpose of the "acknowledgement", which is essentially the "hey, I'm with the previous result, I'm ready for the next set of inputs"? The acknowledgement along with the normal properties of CMOS prevent any "race condition" from occurring (I assume that is your fear?).
My fear isn't a race condition; it's a spurious signal emitted from a previous output stage causing processing to begin before it should in the following stage, with invalid data. Spurious signals like this occur all of the time, and are called "glitches"; they result when multiple paths through a logic block have different lengths. The canonical solution is to ignore all outputs until enough time has passed for them to stabilize. Glitches can also be minimized by adding redundant logic terms.
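Here is a minimal Python sketch of the kind of glitch I mean, using the textbook static-hazard circuit f = (a AND b) OR ((NOT a) AND c). It assumes every gate has one unit of delay, which is a simplification of my own, and it has nothing NCL-specific in it; it only shows unequal path lengths producing a spurious output and a redundant term (b AND c) removing it.

```python
def simulate(with_redundant_term, steps=8):
    """Return the output of f over time as input a falls from 1 to 0 at t = 2."""
    b = c = 1
    # Steady state for a = 1: f is held high by the (a AND b) path.
    not_a, and_ab, and_nac, and_bc, out = 0, 1, 0, 1, 1
    trace = []
    for t in range(steps):
        a = 1 if t < 2 else 0
        trace.append(out)
        # Every gate sees its inputs as they were one time unit earlier.
        next_not_a = 1 - a
        next_and_ab = a & b
        next_and_nac = not_a & c
        next_and_bc = b & c
        next_out = and_ab | and_nac | (and_bc if with_redundant_term else 0)
        not_a, and_ab, and_nac, and_bc, out = (
            next_not_a, next_and_ab, next_and_nac, next_and_bc, next_out)
    return trace


if __name__ == "__main__":
    print("without redundant term:", simulate(False))
    print("with redundant term:   ", simulate(True))
```

Logically f should stay at 1 throughout, since b and c are both 1. Without the redundant term the output drops to 0 for one time unit, because the path through the inverter is one gate longer than the other path. A following stage that reacts to any valid-looking input would see that dip as data.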
Again, I'm not exactly following you here, but remember, we're not using traditional NAND, etc... gates in CMOS, but NCL gates (e.g., 3 of 5) and they are designed specifically for NCL and the acknowledgement flow.
However, your NCL gates are still composed of transistors set up using CMOS logic rules (or any of a variety of dynamic schemes that accomplish the same thing). This winds up giving effects similar to those you would see with standard boolean logic circuits. As far as I can tell from the documentation, in actual implementation NCL isn't so much a departure from boolean logic as a layer of meta-logic on top of it that makes it self-clocking. The actual physical signal encoding on individual lines is boolean (the lines are just grouped in interesting ways).
Thus, while the gates are self-clocked, they seem to be as vulnerable to glitching as any other combinational logic blocks.
Information regarding "orphans" noted. It's interesting, but doesn't relate to my question.
Again, I do *NOT* speak for Theseus Logic and there are much better individuals here who can clear up any questions. Feel free to fire off some questions to the address(es) on the web site.
Noted; thanks for posting the link, btw. This is a very interesting approach to asynchronous circuitry.
The ideas presented in the papers on the Theseus Logic site are interesting. However, the True/False/Null logic scheme defined seems to be vulnerable to glitches in gate inputs. A brief transition to a valid state on all inputs as the previous stage's logic settled would be interpreted as a new input datum by the gate in question, possibly resulting in unwanted output being produced. In other words, using T/F/N logic seems to place stricter timing requirements on input signals than clocked logic with edge-triggered registers.
Is this correct, or am I missing something? I realize that glitching can be reduced by careful logic design, but this seems to be an issue that is addressed neither in your post nor in the papers on the Theseus site.
And I'm still trying to figure out why asynchronous smaller bandwidth (number of lines) buses are faster than synchronous parallel (more data lines).
They aren't; what asynchronous logic in an IC context deals with is reducing power consumption by not clocking all parts of the chip all of the time.
In a synchronous microprocessor, the system clock is distributed to all functional units, and the functional units even when not in use usually wind up having some kind of internal state change every clock cycle. This results in a lot of heat production, because every time the state of a bit in a register or of a bus line changes, heat is dissipated (by nature of the way the parasitic capacitances are charged and discharged).
In a truly asynchronous microprocessor, there is no master system clock distributed to the functional units of the chip. Instead, actions in a functional unit take place when input data changes (i.e. new input data arrives). This results in only the state of units being used changing, which in turn means much less power dissipation if only one or two units are being used at a given time.
In practice, real systems don't fit into either category. Fully synchronous circuits burn a lot of power, but truly asynchronous circuits are difficult to design and are very sensitive to certain types of process variation. An often-used compromise is to use gated clocks - a synchronous clock is propagated, but only to the functional units that are being used. This principle is extended within the functional units themselves; internal clocks and data are propagated only when they need to be for the operation being performed. This results in a circuit that is much easier to design and fabricate than a truly asynchronous circuit, and that is almost as good from a power consumption point of view.
I hope this clarifies what the debate over asynchronous computing is about.
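For a feel of the numbers, here is a back-of-the-envelope Python sketch using the standard switching-power relation P = activity * C * V^2 * f. Every figure in it (unit count, capacitance, voltage, clock rate, activity factors) is a made-up illustrative value, not data for any real chip.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """Switching power in watts: activity factor * C * V^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


def chip_power(num_units, busy_units, clock_gating,
               unit_capacitance_f=1e-9, vdd=1.8, freq_hz=500e6,
               busy_activity=0.5, idle_activity=0.2):
    """Total switching power for a chip with some busy and some idle units."""
    idle_units = num_units - busy_units
    # With gated clocks an idle unit sees no clock edges, so nothing toggles.
    effective_idle_activity = 0.0 if clock_gating else idle_activity
    busy = busy_units * dynamic_power(busy_activity, unit_capacitance_f,
                                      vdd, freq_hz)
    idle = idle_units * dynamic_power(effective_idle_activity,
                                      unit_capacitance_f, vdd, freq_hz)
    return busy + idle


if __name__ == "__main__":
    print("always clocked:", chip_power(8, 2, clock_gating=False), "W")
    print("gated clocks:  ", chip_power(8, 2, clock_gating=True), "W")
```

With two of eight units busy, the gated version only pays for the two busy units, which is the saving that both gated clocks and fully asynchronous designs are after.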
There are actually 3 options for B to understand A's code. Option #3 is: the code is so simple that anyone can understand it. This is actually very common in the real world for itches that start out small.
In my experience, this doesn't happen much. Even the toy utilities that I've written (such as the document-formatter I used to produce the LML33 guide) are complex enough to be non-trivial. For a more realistic example, look at the LML33 driver itself. It's relatively cleanly written, but something like my guide helps a *lot*.
Similarly, documentation and clean packaging are pretty essential to avoid scenario #1 for almost all projects.
On a totally unrelated sidenote: I'm very interested in the LML33 for a home project I'd like to do, but I'm having trouble figuring some things out (pre-purchase). I was actually at your page last night and downloaded your paper. Would it be OK if I emailed you and asked some questions?
Certainly, though my knowledge grows increasingly out of date. I'd also suggest researching the Video4Linux driver for Zoran cards. Parts of the LML33 driver still need considerable work.
>My point is that its hard to maintain open source packages.
Which is why YOU shouldn't. Here's how Open Source works:
1) Programmer A has an itch.
2) Programmer A scratches until the itch stops.
3) Programmer B tries to use the same program to scratch his itch, but finds himself only half-scratched. He patches until the itch is fully scratched.
You seem to be overlooking the fundamental problem with releases - programmer B has to *UNDERSTAND* the code that programmer A wrote.
This involves one of two things.
Option #1: Programmer B has to spend weeks digging through leftover cruft that programmer A didn't bother to delete and reverse-engineering code that programmer A didn't bother to document and finding specs that programmer A didn't bother to write or point to.
Option #2: Programmer A has to spend a few person-weeks of effort and clean _up_ the code, _write_ documentation, and attend to all of the other details of packaging the release that allow other programmers to modify it without pain.
This takes a LOT of effort, and if it has to be done in one-hour chunks while you work at your day-job or take courses full-time or what-have-you, it will take a long time to do.
I've done this myself. I wrote a series of extensions to the driver for the Linux Media Labs 33 video capture card and posted packaged results to the web. http://www-ug.eecg.toronto.edu/~thomasc/lml33.
Please make sure that you have full knowledge of the effort involved in a task before belittling someone else's efforts with it.
Perhaps it would then be better to begin pushing for more efficient programming usage of multi-processors on a single system? From what I know which isn't a lot, we could make more use of parallel processing than most systems currently allow for, or am I completely wrong?
It turns out that, for several reasons, multiprocessors aren't likely to dominate desktops for a few years yet.
The first reason is that systems with multiple _discrete_ processors are more expensive. You need to pay for multiple processor modules, and the motherboard needs a more complex chipset. Joe Average Gamer is better off spending the same amount of money getting a top-of-the-line video card, and a new single processor six months later. Joe Average Non-Gamer doesn't need a multiprocessor for email and office apps.
The second reason is that writing good parallel code is much more difficult than writing good sequential code. Race conditions, interprocess communication, and so forth add plenty of complexity, and compiler tools won't save you - parallelism is designed in at a higher level than compilers deal with.
The third reason is that interprocessor communications bandwidth and memory coherency overhead are *big* problems for multi-processor systems, and they keep on getting bigger as more processors are added. Something like a Starfire, for instance, isn't a large set of processors and memory with a bus tacked on - it's the Bus Network of the Gods with processors and memory tacked on as an afterthought. It has to be, to handle supercomputer communications loads. This means that a lot of the money you spend on a parallel system *won't* be on processing power. If, on the other hand, you're willing to wait another design generation, you can get a comparable processor for a much lower price.
The fourth reason is that while we could indeed integrate many old cores on a new die, we get better performance by doing other things. Adding more cache, for instance, or adding fast, complicated FP units that would have taken too much silicon to add before. Making a bigger translation lookaside buffer (important with a 64-bit address space). Improving branch prediction (a big source of stalls). Adding deeply pipelined load/store units (another big source of stalls). Or adding whatever other performance-enhancing widgets are invented over the next five years. Multiple cores are an interesting idea, but at _present_ aren't the most effective way of increasing performance.
All of these factors mean that parallel processing isn't used except by those who really, *really* need it (dual-processor doesn't count).
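As a small illustration of the second reason above (race conditions), here is a Python sketch of the classic shared-counter problem. It shows the locked version only; the names are invented for the example. The increment is a read-modify-write, so without the lock two threads can both read the same old value and one update is lost.

```python
import threading


def count_with_lock(num_threads=4, increments=10000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # The lock makes the read-modify-write of the counter atomic.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())
```

Even in a case this trivial the programmer has to decide what is shared and where the lock goes; no compiler worked that out, which is the sense in which parallelism is designed in at a higher level.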
Now, the caveat.
Once cache performance saturates - and it eventually will - we'll have a lot of silicon to play with when moving to higher linewidths. At the same time, we'll also have to break chips into asynchronous pieces to solve the clock skew problem. We may also be reaching limits to superscaling (scheduling is an NP-complete problem, and approximations reach diminishing returns eventually). At this point, it starts to make sense to put multiple cores in a chip, along with the coherency logic and communications pathways needed.
However, I don't see your desktop machine running a processor like that for 5-15 years, for the reasons mentioned above.
/* You are not expected to understand this */
This is the single worst comment in the Linux kernel, and I don't even have to look at the rest of them to know it.
Fast code does not have to be messy.
Actually, if I remember the story correctly, that was a comment in the context-switching code, and so has little bearing on the current optimization threads. The writer didn't feel like explaining how context switching worked on the architecture in question.
wire of the same length as in a previous chip would be slower in the new chip because of the reduced driver sizes, thinner wires (increased resistance), and the relatively unchanging capacitance. (The capacitance per unit length stays about the same at smaller sizes because of fringing effects.)
Capacitance is still (AFAIK) dominated by the diffusion capacitance of transistor sources/drains connected to the wire. Second contributor, IIRC, was gate capacitance. Both of these go down with feature size.
You might point out that gate and drain area will only go down in one dimension, as I'll be sticking more devices on the bus, but they'll still go down.
Wire resistance similarly isn't a huge contributor AFAIK. In all of the sets of parameters that I've seen, even a long bus wire would have resistance lower than the effective resistance of a MOSFET in saturation mode.
Lastly, while your drivers get smaller, the W/L ratio of the gates remains the same. This means that, should you be inclined to melt down your circuit, you could still pass the same amount of current through a smaller MOSFET.
Now, as far as using intelligence is concerned... Most of the cynicism I've seen expressed both towards coding and towards IC design has been put forward by people who aren't doing coding or IC design (in general; I don't know what your personal qualifications are). The fact remains that while boneheaded code gets written and while boneheaded ICs are most likely designed, there are still companies that do it right. These gain market share, grow complacent, and fall to the next group that does it right, continuing the grand cycle.
My point being that you aren't likely to get order-of-magnitude performance improvements by "using intelligence". The people you're competing against already are.
As far as the ultimate limits of communication on smaller, faster chips are concerned, I doubt this will become a serious problem. Designers will simply focus more on pipelining and asynchronous operation of modules to relax system-wide signal timing constraints.
Solid state photonics is coming, and there's nothing you can do about it.
Solid state photonics will still have its feature size limited by the wavelength of the light used within its devices. _Current_ integrated circuit chips use feature sizes that are much smaller - by the time photonics matures, it will already be left in the dust as far as density is concerned.
Use smaller wavelengths of light? Not unless you want to destroy your material by photoionization.
Your next logical argument is to point out that most proposed photonic devices are three-dimensional. My logical counterargument is to point out that you can build three-dimensional electrical devices too. It's just currently cheaper to shrink 2D fabrication processes.
Your next probable point is to make noise about propagation delay in electrical circuits. It turns out that this isn't the limiting issue in conventional ICs - heat dissipation is.
Your next likely point is to say that a photonic circuit would have less heat dissipation. My response is that I'll believe it when I see it. Absorption happens, and whatever diode lasers are pumping this device won't be perfectly efficient either.
Lastly, I'd like to point out that most of the effort that goes into designing integrated circuits goes into designing the logic, not the fabrication processes. Computer engineers would still be employed in your hypothetical universe. Electrical engineers design motherboards and specialized analog ICs, both of which would still exist, so they wouldn't be out of work either.
Summary: Photonics is not the magic wand you hold it out to be.
>Satellites at any altitude take *no* fuel to stay in orbit
Actually no. Theoretically yes. But earth's atmosphere actually extends way way up there.
...As was clearly stated in my previous post. However, orbital decay due to atmospheric friction even in low orbit takes years or decades - the atmosphere's density drops off exponentially (not as one exponential, but as a piecewise-exponential curve). It's extremely tenuous and gets even more so as you go up.
Let's use figures from the ionosphere data you cite for density - about 1.0e12 particles per cubic metre at the most dense layer. Let's also assume low earth orbit, which has an orbit length of about 4.0e7 metres. This gives collisions with 4.0e19 particles per square metre per orbit.
By comparison, water has about 3.3e28 particles (molecules) per cubic metre. Air has about 2.4e25 particles per cubic metre at one atmosphere of pressure. Much, much denser.
Let's assume that you have a satellite with a mass of one tonne and a cross-sectional area of 5 square metres (solar panels and all). Assume it's in LEO with an orbital period of about 90 minutes. How long will it take before it loses 0.5% of its orbital velocity (enough to lower its orbit by about 70 km)?
Let's assume that all collisions are inelastic - that is, mass that the spacecraft collides with sticks and accretes. This will give an approximately correct answer and is easy to calculate.
The spacecraft's mass needs to increase by about 0.5% for inelastic collisions to lower its velocity by 0.5%. It needs to collide with about 5 kg of matter to do this. With an area of 5 square metres, that means 1 kg of matter per square metre.
Let's assume that air has a density of around 1.1 kg/m^3 at one atmosphere (I may be off by 0.1 or so, but that's close enough for these purposes). This gives a mass of 1.1 kg * (4.0e19 / 2.4e25) = 1.8e-6 kg per orbit.
Accumulating 1 kg per square metre at that rate takes about 5.5e5 orbits. At 90 minutes per orbit, this means 4.9e7 minutes for orbital velocity to be reduced by 0.5%, or about 90 years. That's considerably longer than the expected lifetime of the satellite's electronics.
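For anyone who wants to check the arithmetic or plug in their own figures, here is a rough Python sketch of the same estimate. The function and parameter names are mine and purely illustrative; the default values are the ones used in this post, and the model is the same crude one (every particle hit sticks to the spacecraft).

```python
def orbital_decay_estimate(
    particles_per_m3=1.0e12,          # densest ionosphere layer, from the cited data
    orbit_length_m=4.0e7,             # approximate length of one low orbit
    sea_level_particles_per_m3=2.4e25,
    sea_level_density_kg_m3=1.1,      # rough density of air at one atmosphere
    satellite_mass_kg=1000.0,         # one tonne
    cross_section_m2=5.0,
    velocity_loss_fraction=0.005,     # 0.5% of orbital velocity
    orbit_period_min=90.0,
):
    """Back-of-envelope time for drag to remove a fraction of orbital velocity."""
    # Particles swept up per square metre of cross-section during one orbit.
    particles_per_m2 = particles_per_m3 * orbit_length_m
    # Convert to mass by scaling sea-level density by the particle-count ratio.
    mass_per_m2_per_orbit = (
        sea_level_density_kg_m3 * particles_per_m2 / sea_level_particles_per_m3
    )
    # Inelastic collisions: gaining a fraction f of the mass costs about f of the velocity.
    mass_needed_per_m2 = satellite_mass_kg * velocity_loss_fraction / cross_section_m2
    orbits = mass_needed_per_m2 / mass_per_m2_per_orbit
    minutes = orbits * orbit_period_min
    years = minutes / (60 * 24 * 365.25)
    return mass_per_m2_per_orbit, orbits, minutes, years


if __name__ == "__main__":
    mass, orbits, minutes, years = orbital_decay_estimate()
    print(f"mass swept per m^2 per orbit: {mass:.2e} kg")
    print(f"orbits needed: {orbits:.2e}")
    print(f"time: {minutes:.2e} minutes, roughly {years:.0f} years")
```

Changing `particles_per_m3` is the quickest way to see how sensitive the answer is to the assumed density: the decay time scales inversely with it.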
In summary, for anything placed in LEO or higher, it will be something other than orbital decay that determines satellite lifetime.
if a satellite is at 300 km it will take a lot of fuel to keep it in orbit. that is impractical and very expensive
Satellites at any altitude take *no* fuel to stay in orbit. That's the definition of "orbit" in this context. An object in orbit is circling the earth quickly enough that centripetal acceleration in its curved path exactly balances gravity. Newton's Laws keep it circling forever (or at least many, many years, until the wisps of atmosphere at those altitudes cause it to slow down and crash).
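To put a number on "quickly enough", here is a small Python sketch for a circular orbit. It assumes a spherical earth and no drag; the constants (earth's gravitational parameter and mean radius) are standard textbook values that I'm supplying, and the function name is just illustrative.

```python
import math

GM_EARTH = 3.986004418e14   # m^3/s^2, earth's gravitational parameter (assumed standard value)
EARTH_RADIUS_M = 6.371e6    # mean radius in metres (assumed standard value)


def circular_orbit(altitude_m):
    """Return (speed in m/s, period in s) for a circular orbit at this altitude."""
    r = EARTH_RADIUS_M + altitude_m
    # Gravity supplies the centripetal acceleration: v^2 / r = GM / r^2.
    speed = math.sqrt(GM_EARTH / r)
    period_s = 2 * math.pi * r / speed
    return speed, period_s


if __name__ == "__main__":
    speed, period_s = circular_orbit(300e3)
    print(f"speed: {speed:.0f} m/s, period: {period_s / 60:.1f} minutes")
```

At 300 km this comes out to a bit under 8 km/s with a period of about 90 minutes, in line with the figures used for LEO elsewhere in this thread.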
regarding palm pilots with satellite links, let's not forget that microwave comm is line of sight. it not very practical in cities, mountains, tunnels, etc. forget about being inside man-made structures
Your cell phone is operating on microwave frequencies. Microwaves will penetrate a few wavelengths through most substances, and have wavelengths on the order of a few centimetres. This means that they will happily pass through several tens of centimetres of brick, concrete, and what-have-you, which is enough for most locations (line of sight to a satellite that's *not* directly overhead doesn't have to pass through dozens of stories of a building - just the nearest walls).
Moot point in a city, though, for the reason mentioned above.
you have to be outside and know where your satellite to be able to talk to it
No, you just need a transmitter powerful enough that a satellite 300 km away can see its omnidirectional signals with sensitive detectors. A palm-pilot would have trouble doing this, but not a somewhat larger transceiver in your briefcase (with a lower-power link to the pilot).
Again, though, you seem to be missing the point of my post - that satellite service to cities isn't practical for the bandwidth demands of a city.