Slashdot Mirror


User: pslam

pslam's activity in the archive.

Stories: 0
Comments: 394

Comments · 394

  1. Re:Nobody give a fig about optimizing on Where Have All The Cycles Gone? · · Score: 1
    Well, it's not that nobody cares. It's just that a programmer's time is worth more than a user's time. (Especially if the programmer isn't a user of the same software.) And while on amateur projects, the programmer can afford to take some pride in their work and will want to make it fast, it's totally different in the professional realm, where you just want to get the work over with as soon as possible so you can bill it and get on to the next project.

    I think this is taken too strictly by many people, even (mind-numbingly stupidly) in the embedded world. Here's a real world example: if you can optimise your MP3 player to use 30% less CPU, you get 20% better battery time. Or, you can fit a battery that's 20% smaller, and save $1 per unit sold. Sell a million units and you've saved a million dollars. Now that's expensive!

    Perhaps one day computers will require so much electricity that algorithmic optimisation is suddenly cost effective again :)

    (Or people will realise that optimising code greatly benefits laptops...)
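
    A rough sketch of the arithmetic behind that example, under a simple model that is not part of the original claim: battery life is taken to be inversely proportional to average power draw, and the CPU is assumed to account for a fixed share of that draw. The function name and the 56% CPU share are illustrative assumptions, picked so that a 30% CPU saving comes out at roughly 20% more battery life.

```python
def battery_life_gain(cpu_share, cpu_saving):
    """Fractional battery-life gain when CPU power drops by cpu_saving.

    Modelling assumptions: battery life is inversely proportional to total
    power, and the CPU accounts for cpu_share of total power.
    """
    new_power = 1.0 - cpu_share * cpu_saving
    return 1.0 / new_power - 1.0

# Hypothetical split: CPU is 56% of total draw, optimisation saves 30% of it.
gain = battery_life_gain(cpu_share=0.56, cpu_saving=0.30)
print(f"battery life gain: {gain:.1%}")

# The cost side of the example: $1 saved per unit over a million units.
units_sold = 1_000_000
saving_per_unit_usd = 1
print(f"total saving: ${units_sold * saving_per_unit_usd:,}")
```

    If the CPU is a smaller share of the total draw, the same 30% saving buys proportionally less, so the model only supports the 20% figure on CPU-dominated devices.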

  2. Not to mention power consumption on Where Have All The Cycles Gone? · · Score: 2, Informative
    Nobody except embedded programmers. My biggest project of late runs on an 8-bit, 8 MHz CPU with about 7k of Flash and 192 BYTES of RAM. Not megs, not kilobytes, but bytes. That's equivalent to less than three lines worth of text. And the code's written in C, rather than assembly, so while it's easier to maintain, it takes more effort to make sure it stays efficient.

    I think all programming students should have to code for a system like this. It gives you a MUCH greater appreciation for what the compiler is doing for you, and what the consequences of simple changes can be.

    Indeed - it's quite a pain trying to port bloated code to an embedded environment, even if it's nowhere near as restricted as the one you're describing. And if you're running off battery power, then every clock cycle costs you battery. It's amazing how hard it is to describe to some people how much of a problem that is. This is something that infuriated me about (for example) Vorbis - a lot of the design really doesn't permit a low footprint, so at the end of the day it's actually a rather battery expensive codec to use.

    Still, the old saying that 90% of CPU time is spent in 10% of the code holds just as true for power consumption. In my case, you optimise the MP3/WMA/Vorbis/etc decoder to its limit, and speed up disk reads (to keep the disk spun down as much as possible). It's due to these efforts that stuff we make (see my info) has far better battery life than rival products. And as a bonus - that also makes user visible responsiveness much better.

    Most of it's down to careful design and not micro-optimisation. It really doesn't need a smattering of assembler all over the place (but maybe a couple of functions here and there, e.g. memcpy and friends). Perhaps one day people will realise this also holds true for laptops, and we'll see those cycles getting used a bit more efficiently...

  3. Re:Not necessarily on Samsung's Linux-based Diskless Camcorder · · Score: 1
    In the MMUs I've studied, the memory cache is physically mapped to avoid MMU overhead for the most common case, and there is also a translation cache which keeps recently used mapping entries cached. For every memory access (even ones in the cache), the translation cache is queried so its result will be available without delay if the access misses the local cache and thus requires a bus transaction.

    This is the wrong way around: physically mapped caches are usually less efficient than virtually mapped. For every single memory access, the physical address needs to be looked up in the MMU first before accessing the cache, which adds a clock cycle of latency. In virtual tagged caches, the memory address is used directly to index a cache line, and the TLBs are polled concurrently.

    Virtual tagged caches are always faster in a system where aliasing isn't a problem.

  4. Re:Not necessarily on Samsung's Linux-based Diskless Camcorder · · Score: 1
    Only if you have MMU onboard, but don't use it. Otherwise, changing from user address space to kernel's and back should cost you something in terms of spent CPU cycles and cache thrashing.

    On many processors (ARM included), kernel and user space have the same address mappings, and switching privilege levels can be done in a few clock cycles. There are no tables to switch, or caches to flush. System calls on ARM are ludicrously fast - a "SWI" instruction takes only a few clocks, and you emerge in the system call handler with supervisor privileges.

    A well designed MMU (such as on ARM processors) incurs a negligible penalty.

    If all processes share same address space, many things in kernel become faster and easier.

    Process switching is usually the only thing that's expensive, because it changes address mappings and needs a cache flush (on ARMs with a VIVT cache, which is most common). In an embedded environment, you generally don't have much process contention, or even multiple processes for that matter. I've never found the MMU to be the cause of any measurable performance hit on any ARM-Linux based work I've done.

  5. Not necessarily on Samsung's Linux-based Diskless Camcorder · · Score: 3, Informative
    Since it has no MMU. Without the overhead of actually having to manage the memory, it's got to be faster.

    This is not necessarily true. The difference in speed you'll get with a properly arranged MMU will be negligible. I hate SoC manufacturers who fall for this line of thinking and miss out the MMU "because it's not needed". It just makes development and debugging 10 times harder for a mostly negligible speed and power consumption gain.

    Any SoC designers out there: please stop producing high spec CPUs without MMUs! You aren't doing anyone a favour.

  6. Re:wikipedia article on OFDM on A House Divided: UWB's Double Standards · · Score: 3, Informative
    http://en.wikipedia.org/wiki/OFDM

    After reading this, it seems pretty clear that Motorola backs cdma-based solution just because it has already invested huge amounts in (w)cdma-based technologies, already having lots of patents giving it more royalties, not because of its technological merits.

    The so-called Multiband OFDM Alliance appears to be rather counter to the whole point of OFDM. OFDM is extremely efficient with the frequency band it has to fit in, and doesn't need to be blasted over the whole spectrum to achieve high data rates. There are gigabit wireless networks already in the works (and an ITU standard), aimed as an upgrade to the existing 802.11 stuff.

    Plain ordinary OFDM/COFDM is here today - it actually works right now, is in widespread consumer use, and doesn't mess up band allocation like UWB does. You're right in that the only reason UWB is being proposed is to support a patent regime centered around Intel and the deceptive scumbags at Time Domain (UWB is not a magic system which uses no spectrum - it uses all of it and spectrum allocation is there for a good reason). I'm pretty sure OFDM will have data rates far exceeding even the theoretical maximums of UWB, long before UWB oozes out of the lab and into the unfortunate consumers' hands.

    You should see how little code (relatively speaking) it takes to decode (and encode!) OFDM, and how little spectrum you need.

  7. Re:You hear what you want to hear on Inside the iPod, Past and Present · · Score: 1
    The people who complain about bass must be using 'phones with impedance that doesn't agree with the iPod's headphone jack.

    What they're complaining about is that the capacitors on the headphone outputs are too small, which means if you have a real load (i.e. headphones, not the line-in socket that people are mistakenly measuring responses with), you get a rather early roll-off in bass, at around 100Hz instead of 20Hz. It's more severe with lower impedance headphones.

    Take your pick as to whether it's to save a few cents, or just a design error.
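
    To make the roll-off concrete, here is a minimal sketch using the standard first-order high-pass corner frequency, f = 1/(2*pi*R*C), for a series output capacitor driving a resistive load. The 100 uF capacitor and the load impedances are hypothetical values chosen for illustration, not measured figures from any actual player.

```python
import math

def cutoff_hz(load_ohms, cap_farads):
    """-3 dB corner of a series output capacitor driving a resistive load."""
    return 1.0 / (2.0 * math.pi * load_ohms * cap_farads)

CAP = 100e-6  # hypothetical 100 uF output coupling capacitor

# Hypothetical loads: two headphone impedances and a high-impedance line-in.
for name, ohms in [("16 ohm earbuds", 16.0),
                   ("32 ohm headphones", 32.0),
                   ("10 kohm line-in", 10_000.0)]:
    print(f"{name}: roll-off starts near {cutoff_hz(ohms, CAP):.1f} Hz")
```

    With these assumed values the corner sits just under 100Hz into 16 ohms but well below 1Hz into a line-in, which is why measuring through a line-in socket hides the problem.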

  8. Re:Reasons to use threads on uniprocessor x86 on Comparing Linux To System VR4 · · Score: 1
    When event dispatch is sufficient.

    Yes, that's the term I was grasping for. It's amazing how quickly people dive into a monstrously threaded program without considering that option. I put the blame for a lot of this down to the original (thankfully going obsolete) Java programming model - have a thread for everything that blocks. Many people are unfortunately now raised on this misdesigned concept.

  9. Reasons to use threads on uniprocessor x86 on Comparing Linux To System VR4 · · Score: 5, Informative
    He's obviously quite unqualified to write the article and didn't even bother to ask anyone. A single processor can emulate multiple processors, and this is often a convenient and even efficient programming model. To elaborate:
    • Sometimes it's cheaper in memory and/or clock cycles to use context switching and multiple stacks than scheduling functions off a single thread. This can be true even if the threads aren't concurrent (e.g. coroutines).
    • It's often easier to use multiple threads even when not necessary, despite having to deal with mutexes. The amount of state in some protocols can lead to a mess.
    • When you need low latency, threads are often the only solution.
    • Single threaded apps cannot schedule tasks preemptively. Reason enough right there.
    • When you need prioritisation of preemptive tasks, the kernel is best off doing the scheduling because you might not be the only process with priority needs.
    • A thread is just a process without most of the baggage, and you don't see people arguing that processes don't belong on x86.
    Then again, mindless use of threads does annoy me. So I'll list some "soft" indicators of when you shouldn't use threads:
    • When a single threaded app would be substantially faster.
    • When you don't need preemption.
    • When you're going to be using 8,000 of them. It's at least 4-16KB per thread, and thread switches aren't negligibly cheap. Rewrite with poll().
    • When you cannot say with certainty that you won't deadlock or race.
    • When you don't understand what the previous point means.
    • When your hardware/OS/platform has a hideous thread switching cost. Can't think of any reasonable system these days where this is a show stopper.
    Leave criticism of OS features to those who are qualified, Murphy. Better still, try asking one of them - there's no shortage.
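
    As a sketch of the "rewrite with poll()" advice, the following services many connections from a single thread. It assumes Python's standard selectors module (which picks epoll, kqueue, poll or select depending on the platform) in place of a raw poll() call, and uses socket pairs as stand-ins for real clients; the function name and the count of 50 connections are illustrative.

```python
import selectors
import socket

def serve_once(connections):
    """Read one message from each connection using a single thread.

    `connections` is a list of (reader, writer) socket pairs standing in for
    many clients; a real server would register accepted client sockets.
    """
    sel = selectors.DefaultSelector()
    for index, (reader, _writer) in enumerate(connections):
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=index)

    received = {}
    while len(received) < len(connections):
        # Block until at least one socket is readable; no thread per client.
        events = sel.select(timeout=5.0)
        if not events:
            break  # nothing arrived in time, give up rather than spin
        for key, _mask in events:
            received[key.data] = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)
    sel.close()
    return received

# 50 hypothetical clients, each of which has already sent a short message.
pairs = [socket.socketpair() for _ in range(50)]
for i, (_reader, writer) in enumerate(pairs):
    writer.sendall(f"hello from client {i}".encode())

messages = serve_once(pairs)
print(len(messages), messages[0])

for reader, writer in pairs:
    reader.close()
    writer.close()
```

    The per-connection cost here is one registered file descriptor and a dictionary entry rather than a stack of several kilobytes, which is the point of the 8,000-thread warning.
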
  10. Re:Um, flaw in the film? on A Strange Streak Imaged in Australia · · Score: 5, Interesting
    My guess is a very bright event (the failure of the streetlight, probably) causing CCD overexposure and subsequent temporary ill effects on the rest of the CCD scan line. Any Canon geeks in the house who know about the CCD scanning direction of a Powershot G3 and can compare it with the streak "trajectory" angle?

    I've had all sorts of annoying artifacts like this on my image capture setup at home, but generally overloading the CCD produces horizontal and vertical streaks only, which would follow the layout of the CCD (rows and columns?) The image could still be explained by either:

    • The CCD being deliberately mounted at an angle in the G3 (perhaps to reduce aliasing effects).
    • The bright spot caused lens flaring towards the top level just before the shot, with nearby pixels being dimmed in the image taken very shortly after.

    My theory is the bright flash is actually sunlight reflected off the lamp and either overloading the CCD or causing a lens flare just before the image, resulting in this artifact. I get that a lot with cars going by my camera setup at home, especially at sunrise and sunset. The only difference I get is that they're all perfectly horizontal and/or vertical.

  11. Re:I think I can hear... on A Brief History of the iPod · · Score: 5, Informative
    Also, I find it interesting how many ipod-clones are coming out. I guess it's true what Steve J. once said about "imitation being the greatest form of flattery"

    Apple was not the first to make a hard disk portable player. They were the first to ship one with a 1.8" hard disk, which hardly makes everything else a clone - they just got there second. Nobody was really taken by surprise, and the major MP3 companies were already well into designing their own.

    Apple was also not the first to make a mini hard disk portable. They were the first to ship a 4GB 1" hard disk player, and then only just. They were beaten by many companies to ship a 1" 1.5GB HD player (including where I work) - but they had a supply of 4GB drives before everyone else. In fact, Rio even managed to announce and demonstrate their own 4GB player hours before Jobs's keynote speech. Spot how he deliberately missed the comparison of the Mini iPod to the Rio Nitrus (a 1" HD player), and instead picked a Rio 256MB flash player as a convenient strawman.

    It's slightly irritating that Apple's reality distortion field now makes it possible for everyone to claim that all other players are "clones".

  12. Re:Misleading on Opera Facing Losses While Firefox Usage Grows · · Score: 4, Insightful
    Not for long if Minimo has anything to say about it.

    And they say: The primary focus of Minimo to date has been system with ~32-64 MB of RAM, running Linux and using the GTK toolkit.

    Not to belittle their efforts, but 32-64MB of RAM is more than your average palmtop device has, and GTK is a memory hog. Something that fits in 2-4MB RAM is more like what a portable device needs.

    Still, it's a good start.

  13. Re:Compression is the magic smoke anyway on Bluetooth Plans to Triple Bandwidth · · Score: 1
    "Three times more bandwidth", because it is being compressed typically at 3-to-1.

    I'd have to look deeper into the specs, but I thought there was at least a doubling in data rate by using PSK modulation with multiple (2?) bits per symbol, instead of GFSK with 1 bit per symbol. I bet that impacts signal quality somewhat. It's also not something you can upgrade in firmware. Now that you mention compression is involved, I think I'll delve into the specs some more...

  14. Comment riddled with errors on Bluetooth Plans to Triple Bandwidth · · Score: 4, Informative
    for those who don't know, bluetooth is currently only 10mbps bandwidth. this is about as much as usb 1.0.

    Currently Bluetooth is about 721 kbit/s. EDR will extend it to 2.1 Mbit/s.

    tripling the bandwidth isn't really a good solution either if you ask me. while 30mbps is faster, it's not nearly enough to over take the up and coming wireless usb or wireless firewire. both of which i believe are going to be UWB based (i.e. 400mbps).

    Tripling the bandwidth would allow lossless transmission to stereo headphones, where currently it's (slightly) compressed. It's a relatively small change in spec too - mostly just a change to the modulation scheme.
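
    A quick check of that claim, assuming CD-quality PCM (44.1kHz, 16 bit, stereo) as the uncompressed source and ignoring protocol overhead; the two link rates are the 721 kbit/s and 2.1 Mbit/s figures quoted above.

```python
SAMPLE_RATE_HZ = 44_100   # CD audio, assumed as the lossless source
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Raw bit rate of uncompressed stereo PCM, in kbit/s.
pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

for label, link_kbps in [("Bluetooth today", 721), ("Bluetooth with EDR", 2100)]:
    fits = "fits" if link_kbps >= pcm_kbps else "does not fit"
    print(f"{label}: {link_kbps} kbit/s link, "
          f"raw PCM needs {pcm_kbps:.1f} kbit/s -> {fits}")
```

    Raw PCM at these settings needs about 1.4 Mbit/s, which is roughly double the current rate and comfortably inside the EDR rate before overhead is counted.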

    UWB will likely have a hard time passing regulations (except in the US), because it's a deliberate radiator over a large chunk of everyone else's spectrum. It's also dubious whether it's actually a low power solution, or better than OFDM (802.11g and friends) when power isn't an issue. It also doesn't exist in a useful commercial form, and probably (in my opinion) never will. Or at least, never should.

    one of the interesting design decisions with bluetooth is that it operates at the exact same hz as a cell phone signal. hence the convergence with cell phones and bluetooth, as it was obviously designed with this purpose in mind.

    No, it operates at 2.4GHz, like most other consumer digital wireless stuff.

    maybe we'll get lucky and cell phones will have 1gb+ memory with built in mp3 player support one day, so i won't have to carry so many different damn devices:P

    Because Bluetooth was designed with low power consumption firmly in mind, it's ideal for MP3 players. The transceivers these days are incredibly small. I'm sure you'll see it commonplace soon.

  15. Again with the speculative battery problem... on DS Pre-Orders Stopped as Sales Soar · · Score: 1
    The PSP has been riddled with plausible problems since its inception - a high cost to manufacture, attempting to reach out to a market that's not there (not many older people aside from the hardcore gamers play handheld systems), and low battery life.

    To me it sounds like the battery problems are overstated. Mostly by DS fanboys. Me, I'm buying neither, so I'm looking at these platforms purely on their technical merit. The DS has an enormous battery life, mostly due to its CPU being enormously underpowered - I mean, most new MP3 players are more powerful than both of its CPUs put together. It's a huge step up from the GBA, but that doesn't say much because that couldn't even decode MP3s in real time.

    The PSP has a rather marginal battery life, and depends largely on how much you use the drive. This is very familiar to me, from working on MP3 players. If you don't carefully manage the interval between disk spin-ups, you end up with it having a large impact on your battery life. In practice, you can get caching working so effectively that it only cuts into about 5-10% of your total battery life. The rest is taken up by the CPU, LCD backlight, and headphone amp. (Turn on wireless and you eat batteries like an American eats oil)

    It probably restricts the design of your game somewhat, but it's really not a hard problem to solve in the total system. If you get it right, you have far more capacity than a memory card, at very little extra power consumption.

  16. Re:Technical merits? on DS Pre-Orders Stopped as Sales Soar · · Score: 2, Informative
    Not really. (Actual Nintendo DS developer here, don't want to get fired or sued by N so staying anonymous of course.)

    MP3 player developer here :)

    CPU is way too slow for DivX - main CPU is just a 66MHz ARM, and the second CPU (33MHz ARM) is dedicated to running the OS services. (Really wacky design.)

    The 4MB of RAM is pretty small by PDA standards nowadays. Also, it uses a very proprietary memory card format for its ROM storage, though it has 32-bit addressing, and it'll probably be reverse-engineered quickly anyway. The hardware-level implementation is kinda crappy though, and although there's a basic MMU I don't know if it'll be useful enough for "real" Linux (uCLinux maybe though)

    That's a pretty mediocre system spec. Most of the MP3 player CPUs we look at these days are ARM7-90MHz or ARM9E-133MHz and up. Most of the time they're only running at about 20-30MHz or so for decoding, but the headroom is great for the occasional burst of UI activity or database access etc. And they still manage enormous battery life like that. About 30-100mW is the power consumption you get these days. I find it ridiculous that most MP3 players probably have more CPU power than the DS.

    I'm really quite confused by Nintendo's choice of platform. The only reason I can think of for them using:

    • 66MHz ARM9, when they can usually clock 133-200MHz.
    • 33MHz ARM7, when they can usually clock 90MHz.
    • 4MB RAM, when it's actually really hard to find a single 8 bit SDRAM chip less than 16MB these days, and parts that small aren't priced very well. SDRAM is a very small power drain in the total system, and larger chips don't draw measurably more.
    • A limited MMU, when ARM have one already designed for the purpose that's good enough for general usage (i.e. full Linux). I don't buy arguments about tightly coupled memory being incompatible with an MMU - the development effort with TCM nullifies the minuscule performance (and battery) increase it gives you in reality vs a cached memory system.

    Actually, this reminds me somewhat of a CPU I've worked with *cough* PortalPlayer *cough* that was seemingly designed around being highly efficient and low power. Trouble was it didn't work out efficient in practice and it just made development incredibly difficult instead.

    Here's an example: why didn't they just use a highly integrated ARM9E 133MHz CPU like you can find from many vendors (e.g. Samsung)? Perhaps the answer lies in no-holds-barred cost cutting...

  17. Reality check on C++ In The Linux kernel · · Score: 2, Informative
    Nothing will come out of it, for the simple reason that C++ does not belong in any kernel. In a kernel, all the code needs to be transparent, and you definitely don't want to hide implementation and the usual abstractions.

    The simple reason for that is that otherwise the kernel would be unpredictable. Let's say the error logging function used the string class (which likes to allocate memory behind your back). If the memory allocation function fails and tries to print an error message... you got yourself a kernel crash. This is why the kernel is significantly more difficult to program than, say, a word processor.

    Put the strawman away. You don't absolutely have to use all the features C++ gives you if they aren't right for you. Don't use exceptions or run-time typing in an environment where you don't need it, or don't know if it'll work. Personally I never use either because I strongly disagree with the code they generate. If you don't understand that skilled usage of C++ generates exactly the same code that the equivalent C would (except the C++ took less time to type and checked your syntax better), you don't understand the point of C++ whatsoever. It's just a glorified type checking macro expander. And there's no reason to use plain C when you've got that at your disposal.

    Now have a look at examples of kernels written in C++. One easy example: Red Hat eCos. Just saying that C++ doesn't do kernels well doesn't remove the fact that people do use it in kernels, it works very well, and there are many examples out there.

  18. Re:Different operations on Intel And AMD's Dual-Core CPUs Investigated · · Score: 1
    What is being referred to here is the possibility of having different cores, not just two identical cores on the same silicon. Similarly to how the PowerPC970 has two different branch prediction algorithms which "compete": each calculating which branches should be taken, with a central heuristic keeping track of how well each has been doing lately and choosing which will be used for the next series of branch predictions, a heterogeneously cored chip could offer several differing implementations of the same realestate.

    You see something a bit like this on many embedded CPUs these days, particularly the ones designed to do a mix of vector DSP and complex conditional work (e.g. MP3 players). Most commonly you get an ARM based core (a nice down-to-earth architecture) partnered with a heavy duty DSP (arcane and weird, e.g. 24 bit word size!) communicating via tightly coupled memory. This works fantastically well - often you end up running at much less than half the clock speed you would have needed otherwise, which more than compensates for the increase in chip area. This works because each core is doing only what it does best: bitstream decoding on the ARM, and iMDCTs on the DSP.

    For MP3 players, that saves a ton of battery power, but the win scales to high end servers where you find you can do so much more within the same heat or clock frequency budget. Intel, AMD, IBM and friends have tried to get high bandwidth vector stuff done using SIMD extensions, but they still manage way less than their theoretical peak performance because the instruction set doesn't pack enough information to get the job done without extra "register shuffling" operations - and using the SIMD extensions takes instruction decode bandwidth away from the rest of the stream...

    I'd love to have FPGAs on the side of my CPU, but I'm still a little unconvinced about their practicality. Most of the time you probably just wanted a large array of general purpose multipliers or dividers. In my opinion the only thing FPGAs get around is instruction decoding, and if that's not your bottleneck it doesn't buy you anything. Most DSPs get pretty much their peak theoretical performance in practice, so it's not a problem for them. If your application ends up with lots of time spent not actually pumping data, then an FPGA looks like a big win, though...

    Me, I'd like somebody to put 8 Xilinx Spartan 3E-400 chips on a PCI-Express card. That would be 128 18x18 multipliers and 3 million gates put to whatever use you want :)

  19. Re:Seems an easy tradeoff to me... on FCC Approves BPL Despite Interference Concerns · · Score: 2, Interesting
    BPL will bring communications on par or better than amateur radio to the areas in which it's deployed. You won't need a license to use it, and you can do much more with it.

    So long as it doesn't interfere with emergency freqs, it's a net gain.

    • Broadcast crap all over amateur radio bands.
    • Amateur radio users have to sign up for BPL.
    • BPL ISPs rake in money from all those monthly fees.
    A net gain, but not for the amateur radio users.

  20. Re:Is any BPL being done in the US at the moment? on FCC Approves BPL Despite Interference Concerns · · Score: 3, Informative
    I know that a great deal of work is being done in Europe on it already, and even here in South Africa (with some of the European deployment in Spain being done by an SA firm, which is basically what I know of the global BPL situation:) ). To the best of my knowledge, these implementations are still experimental work though.

    To my knowledge, every single trial of BPL in the UK has been abruptly cut short by legitimate complaints about interference - people not being able to watch digital TV or use portable (DECT) phones, and especially interference with amateur radio. And yet another trial pops up, goldfish memory style, only to be cut short again. I like to call this "proof by exhaustive irrelevance." They think if there's enough evidence, it's proof that it works, even if it's proof that it's an utter failure and everyone's concerns are vindicated.

    I'm horrified that the US regulator has allowed this to happen, because I know the same tactics will be used on the UK regulator, and it'll probably succeed. Guess I can say goodbye to my long range lightning storm radio detection hobby...

  21. Enough with the pure boolean logic on UK Record Industry Sues 'Major Filesharers' · · Score: 2, Insightful
    I just thought I would clear this up, because the babbling of all the braindead asswipes that frequent this place can sometimes confuse newcomers who don't understand what's so hard about "don't take things that aren't yours to take".

    The braindead babble, much like yours, is the result of the naive application of pure boolean logic to situations which aren't black and white. You can't just boil a complex problem down to a small number of truths in this hand-waving manner, as you will notice by the number of well written replies picking large holes in all the assumptions you've made to get there.

    These days, I'm seeing this style of grandstanding boolean logic applied to so many of the world's issues by people with far too much power. It doesn't solve anything, and usually ends with war.

    Have sledgehammer, see every problem as a nail.

  22. Re:Virtual Machine Syndrome on Open Source Speech Recognition - With Source · · Score: 5, Informative
    It is most easily recognized in a release announcement, where for no reason whatsoever the afflicted developer suddenly interjects a statement like "and it's just as fast as C", to the bewilderment of the audience.

    An especially odd statement considering much of speech recognition can be broken down into great big vector operations, which are perfect for hand coding in C. Bet I could quadruple the speed of it in a couple of hours with some hand coded SIMD ops in x86 assembler.

    It's funny because Java is fantastic at JIT compiling code with lots of non-local behaviour (e.g. complex UIs) because it can take into account global behaviour at runtime. But it sucks at tight, heavy computation loops. DSP is a fantastic example of something Java is going to get creamed at when pitched against non-virtual machines.

    Of course, if you have some cross-platform standard API calls for those vector DSP ops, then it's a different argument...

  23. Re:Can someone explain??? on Samsung Demos Future Memory Chips · · Score: 1
    OK. I just RTFA. The FLASH uses a slightly smaller geometry. But I am still surprised by the large (fourfold) increase in the FLASH size compared to the DRAM.

    Possibly they can get away with larger die sizes on FLASH because they can get away with errors. Many blocks (up to 1% I think) of a FLASH die are allowed to be bad (but reallocated) at production time, whereas a single bad bit in SDRAM means that die goes in the bin. I guess that makes it possible to have good yields even with large die areas.

  24. Re:64 bits is awfully big already on ZFS, the Last Word in File Systems? · · Score: 1
    If you're going to use more than 64 bits, the next step might as well be 128.

    Not for stored data - it's trivial for a processor to zero pad up to 128 bits, and truncate down to 96 for storage. You want to store metadata as tightly as possible, otherwise things like directories start to take a long time to transfer. The negligible amount of time it takes to pack/unpack the data is nothing compared to that. Hell, you might as well use 80 bits.
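
    A minimal sketch of that pack/unpack step, assuming a 96-bit little-endian on-disk field; the field width, byte order and function names are illustrative choices, not anything taken from ZFS.

```python
STORED_BYTES = 12  # 96 bits on disk (illustrative width)

def pack_offset(value):
    """Truncate an in-memory offset to 96 bits for storage."""
    if not 0 <= value < 1 << (8 * STORED_BYTES):
        raise ValueError("offset does not fit in 96 bits")
    return value.to_bytes(STORED_BYTES, "little")

def unpack_offset(raw):
    """Zero-extend the stored bytes back to a full-width integer."""
    return int.from_bytes(raw, "little")

offset = (1 << 80) + 12345   # a value that needs more than 64 bits
raw = pack_offset(offset)
print(len(raw), unpack_offset(raw) == offset)
```

    Each stored field costs 12 bytes instead of 16, a quarter less metadata to transfer, in exchange for one range check and a byte copy per field.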

  25. Re:Hmf. on ZFS, the Last Word in File Systems? · · Score: 1
    The point would be to be ready as storage densities increase. In the last 8 years we've gone from a terabyte filling a room to a terabyte on a desktop, and I'm sure there are more density breakthroughs coming.

    Given 1TB drives today, and doubling in capacity every year, it'll be 24 years before drives outgrow 64-bit addressing. As another poster points out, Sun's patents (not to mention Sun itself) will have expired by then :)
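
    The 24-year figure can be checked with a short loop, assuming byte addressing, a binary terabyte (2^40 bytes) as the starting point, and exactly one doubling per year; the function name is illustrative.

```python
def years_until_exceeds(start_bytes, limit_bits):
    """Count yearly doublings until capacity reaches 2**limit_bits bytes."""
    years = 0
    capacity = start_bytes
    while capacity < 1 << limit_bits:
        capacity *= 2
        years += 1
    return years

ONE_TB = 1 << 40  # binary terabyte, assumed for round numbers
print(years_until_exceeds(ONE_TB, 64))
```

    Under the same assumptions a 128-bit limit would take 88 doublings, which is the gap between the two designs expressed in years.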