Firefox Faster In Wine Than Native
An anonymous reader writes "Tuxradar did some benchmarks comparing Firefox's Windows and Linux JavaScript performance. 'We did some simple JavaScript benchmarks of Firefox 3.0 using Windows and Linux to see how it performed across the platforms — and the results are pretty bleak for Linux.' Later on, they tried Wine. 'The end result: Firefox from Mozilla or from Fedora has almost nil speed difference, and Firefox running on Wine is faster than native Firefox.'"
Check the doco
Firefox 3.0 built for Windows was PGOed (Profile Guided Optimisation)
PGO was not yet enabled for Linux builds
Try a newer build.
FAIL
The Singularity is closer than you think
Quant
On the flip side, the pop-unders I get from my local newspaper's site under Firefox don't happen under Linux, only Windows.
"Can't you see that everyone is buying station wagons?"
Seriously, how fast does a web browser *need* to be? I've never been using Firefox on Linux and thought to myself that it was prohibitively or even annoyingly slow.
Reading TFA, in most cases, the differences in times don't seem dramatic, either, so who really cares?
http://en.wikipedia.org/wiki/Profile-guided_optimization
By default Firefox for Linux uses shared system libraries rather than statically linking them all together as the Windows version does. That's bound to have an impact on performance because code and data pages will be all over the place. Type "about:buildconfig" into the browser and it will tell you its build settings.
The Qt rewrite is dead, Jim.
It's a shame; I was really looking forward to the Qt port, but just like the old port, it got done and then dropped. AFAICT there wasn't enough developer interest, and no users were using it because it wasn't quite usable :(.
IranAir Flight 655 never forget!
Profile Guided Optimization (PGO) is where you compile a special "recording" build of a program, then run it just using your core feature set and "ordinary" tasks. You don't perform a full test, or click on all the options or settings, you just go through normal end-user use cases. The special build then records a "profile" of your typical usage. You then feed the source code plus the profile back into the build process to build your production code.
The idea is for the linker to identify the hot spots in memory, and group as many of them together as possible so they live on common pages. This helps keep those pages from being swapped out of memory to disk due to disuse, which greatly reduces the amount of thrashing your end users will see during normal use. Less thrashing == improved performance.
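To make the record-then-relayout idea concrete, here is a toy model in Python. It is only a sketch: the function names, code sizes, workload and the "hot" threshold are all made up, and a real toolchain does far more than reorder functions (with GCC the actual workflow uses the -fprofile-generate and -fprofile-use options). It just counts how many memory pages the frequently used code occupies before and after grouping it together.

```python
from collections import Counter

PAGE_SIZE = 4096  # bytes per page (a common value, assumed here)

# Hypothetical functions in "source order" with made-up code sizes in bytes.
FUNCTIONS = {
    "parse_html": 2048,
    "show_about_dialog": 2048,
    "run_script": 2048,
    "import_bookmarks": 2048,
    "layout_page": 2048,
    "print_preview": 2048,
    "paint_page": 2048,
    "edit_preferences": 2048,
}


def record_profile(workload):
    """Stand-in for the 'recording' build: count calls per function."""
    return Counter(workload)


def layout(order):
    """Place functions back to back and return {name: set of pages used}."""
    pages, offset = {}, 0
    for name in order:
        size = FUNCTIONS[name]
        first = offset // PAGE_SIZE
        last = (offset + size - 1) // PAGE_SIZE
        pages[name] = set(range(first, last + 1))
        offset += size
    return pages


def hot_pages(order, profile, threshold=100):
    """Number of distinct pages occupied by functions called >= threshold times."""
    pages = layout(order)
    touched = set()
    for name, calls in profile.items():
        if calls >= threshold:
            touched |= pages[name]
    return len(touched)


# "Ordinary" end-user session: lots of page rendering, one visit to a dialog.
workload = ["parse_html", "run_script", "layout_page", "paint_page"] * 500
workload.append("show_about_dialog")
profile = record_profile(workload)

source_order = list(FUNCTIONS)
# Profile-guided order: most frequently called first (stable for ties).
guided_order = sorted(FUNCTIONS, key=lambda name: -profile[name])

if __name__ == "__main__":
    print("hot pages, source order:", hot_pages(source_order, profile))
    print("hot pages, guided order:", hot_pages(guided_order, profile))
```

In this model the hot code is spread over every page in source order and packed into half as many pages once the profile is used, which is the effect the linker is after.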
John
... RTFA
Oops, sorry, I didn't answer your "why not?" question directly. My guess is that because it takes a fair amount of additional work to create the profile after each build, the step may have been skipped by the Linux build team. As far as I know, profiles are unique to each build: you can't create a profile under the Windows image and reuse it on the Mac or Linux builds.
That's just a guess, though, I could certainly be wrong about that. I'm sure a PGO expert or perhaps a member of the Firefox build team will chime in here soon to correct me if I am.
John
That's way off base. There are no context switches when making a library call. Context switches occur when you ask the kernel to do something by making a syscall. So memcpy or memcmp don't incur a context switch. Nor do fopen or fread in and of themselves cause context switches. But one will occur when the underlying open and read calls are made.
What's really needed here is a profiler to find where the code is spending the bulk of its time. My guess is that it's a compiler issue. And other comments about the windows build using profile guided optimization tell me my guess is probably right.
You obviously have no idea what a context switch is.
A context switch happens when the scheduler stops one process/thread and gives the CPU to a different one. This has nothing to do with cross-library calls.
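If you want to see this for yourself, here is a rough Python sketch for Unix-like systems. It reads the process's own voluntary context switch counter (ru_nvcsw from getrusage) around two kinds of work: plain user-space copying and comparing, and calls that block in the kernel. The helper names are made up for illustration, and the exact counts depend on the OS and scheduler, so treat the numbers as indicative only.

```python
import resource
import time


def voluntary_switches():
    """Voluntary context switches of this process so far (Unix only)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw


def switches_during(work):
    """Run work() and return how many voluntary switches happened meanwhile."""
    before = voluntary_switches()
    work()
    return voluntary_switches() - before


def library_calls_only():
    """Copy and compare small buffers; this work stays in user space."""
    data = bytearray(4096)
    for _ in range(10000):
        copy = bytes(data)   # memcpy-like copy
        if copy != data:     # memcmp-like comparison
            raise RuntimeError("copy mismatch")


def blocking_calls():
    """Sleep repeatedly; each sleep asks the kernel to suspend the process."""
    for _ in range(20):
        time.sleep(0.001)


if __name__ == "__main__":
    print("switches during library calls:", switches_during(library_calls_only))
    print("switches during blocking calls:", switches_during(blocking_calls))
```

On a typical Linux box the first number should stay near zero while the second grows roughly with the number of sleeps, since only the blocking calls hand the CPU back to the scheduler.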
sig intentionally left blank
My experience of using Ubuntu via Wubi (i.e. a file image stored on an NTFS disk) is that the performance is hit severely by the ntfs-3g process. Run top while performing mild disk activity and you'll see what I mean. If you use it regularly you might want to use dual boot instead.
PGO in GCC: http://gcc.gnu.org/install/build.html#TOC4
Colorless green Cthulhu waits dreaming furiously.
Unless I have a HUGE hole in my dynamic library knowledge, you are wrong.
Linux dynamic libraries, like Windows dynamic libraries, don't force context switches at all. They force a few page faults to generate new data segments, but they do so on both systems.
And in practice ... Linux easier to bugfix? Dream on. Truth be told, as long as it's "high level" stuff, Windows is massively easier to bugfix, due to massively better development tools (sorry, but nothing beats Microsoft's visual set of tools).
It doesn't really have to do with X or Firefox so much as the interaction between X and Firefox. Composition effects and pixmap caching are the two prime issues.
Composition is when you draw an image that blends with what is already on the screen. Right now, a lot of the Xorg code that accelerates composited effects is unfinished. In particular, rendering composited text is painful. The brute force solution of blending with what is on-screen is awful, because reading from video ram is very, very expensive. So optimizing this is pretty non-trivial since the optimization must be that you don't look at what you need to blend with! Progress is happening though.
Pixmaps are used to store images in the X server. Firefox, to get the rendering effects it wants, often uses large pixmaps for application elements. Large pixmaps can cause memory fragmentation issues, making later allocations harder, causing performance to slowly decline over time. Again, this is something being worked on, but in this case, the client is really not behaving very nicely.
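For anyone who hasn't run into fragmentation before, here is a small Python sketch of the general effect. It is a made-up first-fit allocator over a 100-unit arena, not how the X server actually manages pixmap memory; the class and its methods are hypothetical. The point is only that a large request can fail even when plenty of memory is free in total, because no single free run is big enough.

```python
class Arena:
    """Toy first-fit allocator over a fixed range of memory units."""

    def __init__(self, size):
        self.free = [(0, size)]  # sorted list of (start, length) free runs

    def alloc(self, length):
        """Return the start of the first free run that fits, or None."""
        for i, (start, avail) in enumerate(self.free):
            if avail >= length:
                if avail == length:
                    del self.free[i]
                else:
                    self.free[i] = (start + length, avail - length)
                return start
        return None

    def release(self, start, length):
        """Give a block back and merge it with adjacent free runs."""
        merged = []
        for s, l in sorted(self.free + [(start, length)]):
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((s, l))
        self.free = merged

    def total_free(self):
        return sum(length for _, length in self.free)

    def largest_free(self):
        return max((length for _, length in self.free), default=0)


def fragmented_arena():
    """Fill the arena with 10-unit blocks, then free every other one."""
    arena = Arena(100)
    blocks = [arena.alloc(10) for _ in range(10)]
    for start in blocks[::2]:
        arena.release(start, 10)
    return arena, blocks[1::2]


if __name__ == "__main__":
    arena, still_used = fragmented_arena()
    print("total free:", arena.total_free())
    print("largest free run:", arena.largest_free())
    print("alloc(30):", arena.alloc(30))
```

Half the arena is free after the interleaved frees, yet a 30-unit request still fails; it only succeeds once the neighbouring blocks are released and the free runs merge.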
Like I said, progress is being made on these fronts - Xorg's xserver 1.5 and 1.6 are supposed to have some good acceleration improvements. There's been work on a much improved glyph cache for EXA accelerated fonts. I haven't run any of these, since my distro currently ships 1.4, and I don't really plan on upgrading until Debian does. But since it's a pain point for me, and I read the development mailing list, I thought I'd share.
I recently went through a round of attempting to use OpenSolaris on my work laptop (damn I want that ZFS juju)... and there were a couple of things that drove me back to Debian - one of them was the horrible performance of Firefox under OpenSolaris. Under VirtualBox on OpenSolaris host, Firefox was faster on either a Debian or a WinXP guest than it was on the host... the difference between usable and not. The specific application that really showed this was Zimbra (pretty heavily AJAXy). In trying to track this issue down, the general feedback on OpenSolaris forums was "Firefox on OpenSolaris kinda sucks, sorry". My personal experience with Firefox is that under Linux or Windows it's subjectively close enough not to worry about (on a variety of hardware, not just the laptop that I tested OpenSolaris on).
Browser response, not speed, is what annoys most people on Firefox, since version 1.
The cause is the lack of threading: the UI, the rendering engine, and plugins should run in separate threads, with the UI thread having the highest priority.
Konqueror runs Flash player in its own process "nspluginviewer", which I can renice to 19 - just like how IE runs Flash in the lowest priority by default. Still, on Firefox 3, a few tabs running CPU-intensive Flash can still effectively freeze the browser UI.
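As a rough illustration of why a separate process helps, here is a Python sketch for Unix-like systems in which a helper process lowers its own priority the way renice would. The helper is just a stand-in (it is not nspluginviewer or any real plugin host), and the function names are made up.

```python
import os
import subprocess
import sys

# Stand-in for a plugin host: it raises its own niceness by 19 and reports
# the resulting value. os.nice() returns the new niceness of the caller.
HELPER_CODE = "import os; print(os.nice(19))"


def current_niceness():
    """Niceness of this process; os.nice(0) changes nothing."""
    return os.nice(0)


def run_low_priority_helper():
    """Start the helper in its own process and return the niceness it reports."""
    result = subprocess.run(
        [sys.executable, "-c", HELPER_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("parent niceness:", current_niceness())
    print("helper niceness:", run_low_priority_helper())
```

Because the helper is its own process, the scheduler can deprioritise it without touching the parent, which is not possible when the plugin runs inside the browser's own process.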
gcc is primarily interested in x86. They nominally support other architectures, but they don't receive as much attention and end up being outdated, deprecated and removed. OpenBSD resurrected the pcc compiler because gcc dropped support for architectures they support. Apple is working on llvm (among other reasons) because gcc's ARM code generation isn't good enough.
The profile in question here is a profile of what variables and chunks of code the program (in this case Firefox) uses the most, not your Firefox user profile. By knowing this, the compiler knows things like which variables are read or written the most and the least, and which functions should be placed next to other functions used at the same time. This gives it a good idea of where to store things when it compiles the source. For example, the variable containing the user's bookmarks will not get accessed as frequently as the variable containing the current tabs. While this profile could be affected by how the user uses Firefox, it is very unlikely to make a significant difference.
I think that is why GP said the impact of swapping "for an average desktop linux user is almost non-existent" ...because for an average desktop linux user swapping is almost non-existent.
I've run Linux machines (for short periods of time, with no more than normal desktop use loads) without any swap, and they work fine... but when you hit that wall of running out of physical RAM you'll feel it a lot more without swap than you would with a swap file/partition.
Windows on the other hand seems to want to use several hundred megs of swap whether it needs it or not.
From TFA, the first of the "Answers to some predictable comments":
Why didn't you use Firefox 3.1?
We tried using a nightly build of Firefox 3.1 to see how performance might change in the future, but it locked up while running the Dromaeo tests so we opted to leave it for now. To be fair, the browser is still in beta, so it wouldn't really be a good test.
Wine runs Chrome a lot slower than Windows.
Your little theory is disproved by this chart:
http://www.agner.org/optimize/optimizing_cpp.pdf
And scroll down to page 68. GCC does everything MSVC does, and more. The chart says that GCC doesn't implement PGO yet, but currently it does.
The cause of Firefox underperforming on Linux is most certainly that the Linux builds are not using PGO, which is a distribution issue more than a Firefox issue.
Those who would give up liberty to obtain working drivers, deserve neither liberty nor working drivers.
I am not sure how you got modded insightful. Linux, in terms of the kernel, is in fact a monolithic structure but has nothing to do with the API/lib/small packages that can be chained together that the OP was talking about. Linux in the GNU/Linux sense (a distribution), is in fact composed from many small libraries that each perform a specific function well.
Regarding your point about how the app was built: How do you draw a distinction between the Windows, Linux, and UNIX builds of Firefox? I'll help you - each version is a port using libraries on the system that it is ported to. Those dependencies do in fact have an impact on compilation, how the memory map is built, and how well the application performs.
Also - the optimization process differences are significantly more complicated than you implied. I strongly suspect, although I am not positive because I have never built it from source or examined how an RPM was built, that the Linux Firefox build was done with at least -O2 or -O3 flags. The difference that FP was talking about was PGO (Profile Guided Optimization), which is more involved (and thus yields better performance gains) than just turning on the default compiler optimizations.
Actually he's right but in the wrong direction. On Wine many things that would be pure syscalls on Windows do force a context switch into the wineserver, because the emulated "kernel" is actually a separate process. For instance opening a file involves an RPC to the wineserver on Wine, whereas it simply switches into kernel mode on Windows and there's no TLB flush overhead. The fact that Firefox is still faster under Wine than native suggests a serious bottleneck somewhere rather than a general problem - if I had to choose, I'd pick text rendering as my first guess.
cache misses are __WAY__ more expensive than subroutine calls.
It's better to be the foot on the boot than the face on the pavement. ~~ tkx Kadin2048
Also, Wine would need more context switches than either Linux or Windows. If you are running a single process you usually switch between at most two contexts, your userspace process and the kernel. However, since the functionality of the Linux kernel isn't a perfect match for the Windows kernel, Wine needs an additional context/process, the wineserver, to provide this functionality. So context switches wouldn't benefit Wine over plain Linux.
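To get a feel for the cost of routing a request through a separate process, here is a rough Python sketch. The child process is a hypothetical echo-style helper, not wineserver, and pipes stand in for whatever IPC Wine really uses; it only compares an in-process function call with a round trip to another process. Timings vary by machine, so none are quoted.

```python
import subprocess
import sys
import time

# Hypothetical helper process: reads one request per line, replies in upper case.
HELPER_CODE = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def handle_in_process(request):
    """The same 'service' done as a plain function call in this process."""
    return request.upper()


def compare(rounds=200, request="open file\n"):
    """Return (local_reply, remote_reply, local_seconds, remote_seconds)."""
    helper = subprocess.Popen(
        [sys.executable, "-c", HELPER_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            local_reply = handle_in_process(request)
        local_seconds = time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(rounds):
            helper.stdin.write(request)   # send the request to the other process
            helper.stdin.flush()
            remote_reply = helper.stdout.readline()  # block until it answers
        remote_seconds = time.perf_counter() - start
    finally:
        helper.stdin.close()
        helper.stdout.close()
        helper.wait(timeout=5)
    return local_reply, remote_reply, local_seconds, remote_seconds


if __name__ == "__main__":
    local, remote, t_local, t_remote = compare()
    print("same answer:", local == remote)
    print("in-process: %.6f s, via helper process: %.6f s" % (t_local, t_remote))
```

Each round trip makes the caller block until the scheduler has run the helper and switched back, which is the extra hop being described here.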
Firefox appears to be using an inefficient method to render the content to the screen. If I load up a page in Firefox and drag the window around fast, the content inside the window tears and blurs and stays that way for a second after I stop whipping the window around. Konqueror and Opera don't do this.
Context Switch also refers to switching between user and kernel modes via system calls or interrupts. OP is still a raving lunatic though.
Serious question: What is glibc doing that you don't think it should be doing?
This isn't so much a complaint about glibc-as-implementation, but I do think the standard C library design has a lot of crap in it that it just doesn't do well.
In my mind, the main offender is internationalization and localization support. It's a non-trivial problem that the standard library just isn't very well-suited to--I usually end up using a library like ICU for this.
The C people should have stuck to byte string manipulation, math, and basic I/O, but there's no putting the horse back in the barn after that.
It (MSCC) out compiles gcc any day of the week.
GCC has gotten pretty good over the past years. I used to swear by the Intel compiler until recently.
Some non-anecdotal evidence. 1) Apple uses GCC to compile their code on Intel and PPC and whatever CPU the iPhone uses. 2) The performance-driven code where I work is compiled with GCC. Money is no object, and GCC simply is a better option when you consider compatibility and usability, and for our heavy routines we use SSE routines in assembler (so the compiler doesn't matter).
One pet peeve: why do most Linux distributions still set up GCC and compile most of their code for 20-year-old CPUs? My Mac's GCC generates assembler for current hardware. I don't understand why you have to use Gentoo to get a compiled version for modern hardware. /rant
I don't know why, but even under complete OS virtualization FireFox is faster in Windows under VMWare or VirtualBox than it is natively on the same box.
It's basically a compiler benchmark. What this proves is that the Microsoft C++ compiler produces better code than gcc. This isn't surprising. Re-run the benchmark with the Linux code compiled with the Intel compiler. gcc is a good compiler, but it doesn't produce code as tight as some commercial offerings.
To be honest, the quality of generated code between MSVC and gcc is not that different. MSVC tends to do inlining somewhat better (I've personally witnessed it unroll a ~60000-deep call chain - produced by template metaprogramming - into a single statement). On the other hand, gcc is sometimes more tricky with rearranging the code smartly to produce the same effect for less effort, on the level of individual instructions. I do not think that it's what makes a difference here.
Look up gprof for gcc, yes gcc does profiling.
Support my political activism on Patreon.