Embedded Linux 1-Second Cold Boot To QT
An anonymous reader writes "The blog post shows an embedded device cold booting Linux to a QT application all in just one second. This post also includes a link which describes what modifications were made to achieve this."
I assume they mean Qt application, not QuickTime, or whatever.
...speechless!
Ciao!
HOW?
Any details on what they've done? Is it largely hardware-specific initialisation, or have they made changes that would be useful on a traditional desktop?
# cat
Damn, my RAM is full of llamas.
...booted in about 5 seconds, and that was to a general desktop.
And my toy homebrew OS boots to a primitive UI in under 2 seconds after BIOS, and much of that is running interpreted bytecode.
The fact is that a full BIOS + Linux / Windows system is a horrible fucking mess of bloat, but part of it is the price you pay finding and initialising all those millions of third party devices your old/embedded device isn't going to need to worry about.
Still, as always, I won't believe any engineer's claim until I get to test it myself.
It's good to see a fun tech article like this on /. I haven't seen any in a while (maybe it's just me).
I assume that during boot time, the Qt UI and low level hardware modules are loaded immediately. Then other modules and services can be loaded later on such as networking, video capture drivers and other lower priority services. I also assume the UI is not based on X but a Qt implementation that is directly drawing to the frame buffer.
Lately I have been on a bit of an embedded systems kick, playing around with PLCs and embedded microcontrollers. This is a great article.
I have to say, the most impressive/innovative tweak, to me, was the re-ordering of required functions in the compiled binary. Doing so allowed them to reduce load time, by making it that only two blocks had to be demand-read off the flash filesystem, instead of four.
That's some crazy, use-the-drum-spin-as-timing, innovative thinking right there. Serious kudos.
That's actually common practice in profile guided optimization, put commonly used code close together in the image to minimize the number of pages loaded.
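To make the block-count argument concrete, here is a rough sketch of why link order matters under demand paging. It is only a toy model: the 4096-byte block size, the function names, and the sizes are all made up for illustration and are not taken from the article.

```python
# Toy model: count how many fixed-size blocks must be demand-read to run a
# set of "hot" functions, for two different link orders.
# All names, sizes, and the block size are hypothetical.

BLOCK_SIZE = 4096  # assumed block/page size in bytes


def blocks_touched(layout, hot, block_size=BLOCK_SIZE):
    """layout: list of (name, size_in_bytes) in link order.
    Returns the set of block indices that hold any byte of a hot function."""
    touched = set()
    offset = 0
    for name, size in layout:
        if name in hot and size > 0:
            first = offset // block_size
            last = (offset + size - 1) // block_size
            touched.update(range(first, last + 1))
        offset += size
    return touched


sizes = {
    "init_display": 3000, "draw_ui": 3000, "parse_config": 1000,
    "print_help": 4000, "export_report": 4000, "self_test": 4000,
}
hot = {"init_display", "draw_ui", "parse_config"}  # needed during boot

# Hot functions interleaved with code that boot never calls.
scattered_order = ["init_display", "print_help", "draw_ui",
                   "export_report", "parse_config", "self_test"]
# Hot functions moved to the front (sorted() is stable, so order is kept).
grouped_order = sorted(sizes, key=lambda name: name not in hot)

scattered = [(n, sizes[n]) for n in scattered_order]
grouped = [(n, sizes[n]) for n in grouped_order]

print(len(blocks_touched(scattered, hot)))  # 4
print(len(blocks_touched(grouped, hot)))    # 2
```

With these made-up numbers the grouped layout needs two blocks instead of four, which happens to mirror the figure quoted above, but real savings depend entirely on the actual binary and profile.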
This is the year of Linux on the de- wow, that was quick.
"We live in a global world" - Harvey Pitt, former Securities and Exchange Commission Chairman
I have to say, the most impressive/innovative tweak, to me, was the re-ordering of required functions in the compiled binary. Doing so allowed them to reduce load time, by making it that only two blocks had to be demand-read off the flash filesystem, instead of four.
That's some crazy, use-the-drum-spin-as-timing, innovative thinking right there. Serious kudos.
Most assuredly!
I realized these guys were serious when they talked about the bandwidth reading from flash vs. the bandwidth of boot image decompression. Kudos, indeed.
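For anyone wondering what that tradeoff looks like in numbers, here is a back-of-the-envelope sketch. It assumes the image is read completely and then decompressed with no overlap, and every figure in it (image size, compression ratio, bandwidths) is invented for illustration rather than taken from the slides.

```python
# Back-of-the-envelope model of compressed vs. uncompressed boot images.
# Assumption: read first, then decompress, with no overlap between the two.
# All numbers below are hypothetical.


def load_time_uncompressed(image_bytes, flash_bw):
    """Seconds to read the raw image at flash_bw bytes per second."""
    return image_bytes / flash_bw


def load_time_compressed(image_bytes, ratio, flash_bw, decompress_bw):
    """ratio = compressed size / original size.
    decompress_bw is measured in output bytes per second."""
    read_time = (image_bytes * ratio) / flash_bw
    decompress_time = image_bytes / decompress_bw
    return read_time + decompress_time


image = 4_000_000        # 4 MB image (made up)
ratio = 0.5              # compresses to half its size (made up)
decompress = 10_000_000  # CPU produces 10 MB/s of output (made up)

for flash in (2_000_000, 20_000_000):  # slow flash vs. fast flash
    raw = load_time_uncompressed(image, flash)
    packed = load_time_compressed(image, ratio, flash, decompress)
    winner = "compressed" if packed < raw else "uncompressed"
    print(f"flash {flash} B/s: raw {raw:.2f}s, packed {packed:.2f}s -> {winner}")
```

In this model compression only pays off while the flash is slow relative to the decompressor; once the flash can feed data faster than the CPU can unpack it, the uncompressed image wins.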
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
The bleedin' slideshow requires Flash 9, for crying in a bucket! Why the hell do you need Flash to show a sequence of static images in succession!? And to download the slides, you have to have two accounts: one on Facebook, and one on something called Slideshare.
Screw it.
</rant>
(Be glad I'm sparing you my take on Javascript.)
I refuse to believe corporations are people until Texas executes one. -- desert rain on http://www.dailykos.com/user/
Some mighty awesome hacking going on there!
Too bad it won't translate well into desktops.
Another optimization that was common in old Mac compilers was "dead-stripping", where they avoided linking in any functions that were never called. Apparently this isn't commonly done; instead, if a single function in a file is called, then ALL are linked in, at least when I looked into it for Linux a while back.
Um, when he pulls the plug - he inserts it at 1:03, the device is mostly booted by 1:06, but the video feed isn't restored until 1:08. In other words, I'd say at least 3 seconds, I'd call it 5 seconds to fully boot and load the application.
Don't get me wrong, still impressive, but it seems to be a bit more than one second...
On an unrelated note: Why the hell does Slashdot still not work properly with Google Chrome? It's the only site I've ever had any problems with....
Um. This capability has been around since FOREVER. I worked at MS 'beside' the group that created exactly such a tool, called BBT. This was ~2000.
It was the most used, but not the only result of a tool suite called Vulcan. This would allow you to pull apart and re-assemble a binary, either for re-ordering (optimization) or to add instrumentation for other optimization projects (like the one I worked on).
Stop!
If you thought of posting "Who among us needs a fast-booting machine when we just leave our machines on all the time? Nobody reboots!", please choke on a lump of poison, because fast booting is desirable and awesome.
Don't say I'm tilting at windmills, people do post that kind of shit.
No, it's not about the Knights who say Ni. Does this thing have USB support? Does it have Wifi support? Products by Technologic Systems boot to a shell prompt in less than 2 seconds, but if you need to load USB drivers and populate a /dev it drops to about 6-10 seconds. Wifi takes longer.
ahhh damn what was that drum-spin story? I've read it around here before and I've been trying to find it again but with no luck. It's a great story and if you have a link it would be much appreciated!
The Story of Mel! http://foldoc.org/The+Story+of+Mel (That's not the original, that's the "free verse" version which is better IMHO.) It might even be a true story! http://en.wikipedia.org/wiki/Mel_Kaye
I don't think it's all that impressive.
Almost 10 years ago QNX offered a 1.44 MB floppy disk that contained the OS, a nice user desktop, a network stack, a modem driver, and web browser.
It took a little time to load, only because it was running off a Floppy. How long would it take to load 1.44 MB off flash these days?
If you've got hardware that doesn't make you wait a long time on a bios startup, loading up an OS should be able to be quite fast.
You forgot to mention the performance died when marketing made you link in ie.h .
Got Code?
that it took to write this post? For a minute I was thinking this was twitter.
Um. This capability has been around since FOREVER. I worked at MS 'beside' the group that created exactly such a tool, called BBT. This was ~2000.
It was the most used, but not the only result of a tool suite called Vulcan. This would allow you to pull apart and re-assemble a binary, either for re-ordering (optimization) or to add instrumentation for other optimization projects (like the one I worked on).
You deliberately seem to miss one really important point.
These guys do not have to pull apart binaries; they can mod them from source if they feel like it. You have let the cat out of the bag and admitted that Microsoft uses disassembly techniques on other people's binaries all the time. No wonder it was so easy for you guys to clone closed functions in IBM's Lotus Suite and com SQL framework, then pretend that you did not know what was going on when all of a sudden their binaries were slower and less reliable than yours!
So you guys have been doing what you say others have no business doing to your software FOREVER. Face it... considering the fact that embedded Linux runs most TVs, BD players and other really popular devices on the market today, you are just mouthing a party line and not making a real valid argument why MS is losing out in the embedded products market!
But doesn't that rely on linker optimization support to work (e.g. GCC's new LTO)? Normally in the first pass when all the objects are being compiled, there's no way to know which functions are going to be unused (I'd imagine, IANACD).
Wake me up when they get to ~80 milliseconds.
Is there anything better than clicking through Microsoft ads on Slashdot?
http://www.qnx.com/products/reference-design/fastboot-building-automation-kontron-video.html
http://www.youtube.com/watch?v=yTUweJKAUfk
QNX, the little OS that could.
This shows how many resources are wasted in normal operating conditions. Before optimisation, the processor and other things were just idling most of the time... This is a lean approach: focus on removing the waste.
As an engineer in electronics, I really like the "to the metal" approach. Good job!
It redirects to 123 reg. What's happening?
It's called -ffunction-sections, which puts each function in a file in its own section so the linker can get rid of all the unused ones. No need for LTO.
My thoughts exactly, I don't want to use non-free adobe flash and slideshare does not work with gnash. Fortunately a very similar looking PDF does seem to be available at
http://elinux.org/images/f/f7/RightApproachMinimalBootTimes.pdf
The linker's task is to put in the executable only what's needed. Taking only the right object files from archives is logically similar to taking only the right functions from object files. You could call both steps optimizations. If you skip either step, the only result is a larger executable, but it will still work.
The "new" LTO in linkers actually works on statement level, not function level. That's far more complicated because the relation between instructions is far less clear, and far more architecture-dependent. Function-level linking in comparison is so trivial, I'm wondering why it's so uncommon on Unix. I think Visual Studio got it about two decades ago.
Use the drum spin as timing, are we talking Bendix G-15 here?
True, my Acorn Electron didn't boot into a Qt application, but it managed to get to a prompt virtually instantly.
I've been waiting for those days to return.
To do function-level linking you need to compile your sources with -ffunction-sections -fdata-sections and pass --gc-sections to ld. It might be scary to some people because the .o files produced this way are much larger. Also, it does not work on shared libraries, for the obvious reason that you cannot tell what will be unused.
Demand paging in of executables made sense back when we only had a few megs of RAM in a general-purpose computer, but I wonder if it's an idea that is past its time and if we would be better off loading smaller executables in a single read operation. Especially on hard-drive-based systems (flash is better, but still, once you add all the layers on I bet a single large read is faster than many small ones).
Note: I'm known as plugwash most places, but I screwed up registering that here somehow in the past and now can't register.
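Conceptually, what the linker does with those per-function sections is a reachability walk from the entry point: anything not referenced, directly or indirectly, gets dropped. The sketch below is only a model of that idea, not how ld is actually implemented (the real thing works on sections and relocations), and the call graph and function names are invented.

```python
# Conceptual model of section garbage collection: keep only what is
# reachable from the entry point. The call graph is hypothetical.


def gc_sections(calls, entry):
    """calls: dict mapping a function name to the functions it references.
    Returns the set of functions reachable from entry (the ones kept)."""
    keep = set()
    stack = [entry]
    while stack:
        fn = stack.pop()
        if fn in keep:
            continue
        keep.add(fn)
        stack.extend(calls.get(fn, []))
    return keep


calls = {
    "main": ["parse_args", "run"],
    "run": ["log"],
    "parse_args": [],
    "log": [],
    "unused_helper": ["log"],        # never referenced from main
    "debug_dump": ["unused_helper"],  # only reachable from dead code
}

kept = gc_sections(calls, "main")
dropped = set(calls) - kept
print(sorted(kept))     # ['log', 'main', 'parse_args', 'run']
print(sorted(dropped))  # ['debug_dump', 'unused_helper']
```

The same model hints at why this does not help with shared libraries: every exported function is effectively an entry point, so nothing exported can be proven unreachable.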
We won't have our uptime to brag about anymore!
My beliefs do not require that you agree with them.
Function-level linking in comparison is so trivial, I'm wondering why it's so uncommon on Unix
Because static linking is so uncommon. If you are linking statically, you can use -ffunction-sections to put each function into its own section so the linker can skip all the unused ones.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
A program's mainline code is statically linked. At least when I'm writing a program, I often use small utilities I've written over the years. They aren't substantial enough to be put into a shared library, and they contain plenty of functions a given program never calls.
Without demand paging, he would have had to read the entire executable in; far more than just four or two blocks from the filesystem. The issue was I/O bandwidth rather than response time to requests, from what I understood.
Well, if you're really crazy you put the init code together into a contiguous block so you can use it as a data buffer after you're done initializing everything.
Of course we realized that that was still too conservative. The next year we started allocating data buffers inside the init function while we were still executing it. Just follow the instruction pointer down and allocate behind it.
A program's mainline code is statically linked. At least when I'm writing a program, I often use small utilities I've written over the years. They aren't substantial enough to be put into a shared library, and they contain plenty of functions a given program never calls.
I use shared libraries for those.
On Unix systems there's very little reason not to. On Windows it's a bit of work (not a lot, but some) to fix your code up so it can be used as a DLL, but on Unix it's just some compiler and linker flags -- and not even that if you're using a build system that takes care of it for you. The only downsides are (a) possibly a tiny bit of wasted RAM as unused functions that happen to be on the same page as used functions are loaded, but that is typically more than offset if two programs using the library might be running at once, and (b) an extra lib to distribute with the executable. In practice, (b) is a non-issue as the installation and packaging tools can pick up all such dependencies automagically. Well, I guess it increases the size of the distributed package a bit, but that's trivial.
Ummm.. I guess. I guess it all depends on your background and perspective. When I started working for an embedded sys company after high-school that was one of my first contributions... The tradeoff/choice between compression vs no decompression overhead is a pretty typical one made in flash based embedded products for years. There is another parameter that is less important today, with demand loading from NAND more common... The cost of NOR flash and minimizing it. Sometimes compressing RISC code is a big win from a cost perspective. Like I said I used to investigate this as an entry level schlub.
I later moved on to an embedded Linux project, and of course the first thing was bringing the miserable U-Boot + arm-linux boot time down to something not so bad. At that time, the default U-Boot code didn't bother with enabling caches (because there was not even basic MMU setup for the ARM). But I recall getting the Linux and X11 based system booting in a similar timeframe (say 3 to 4 secs) as these guys. I'll note that I was also quite proud of the fact that when the screen came up, everything save for WLAN was actually done (no ongoing background init).
These guys obviously are doing a job, but this isn't rocket surgery... this type of stuff is all in a day's work for embedded grunts. Fast booting is an enabler to doing something useful, which is usually more interesting; it isn't the focus of use... though it can be damn important.
on 300 MHz ARM, NAND flash: http://www.makelinux.com/emb/fastboot/