Nailing the Cause of Recent Linux Power Issues
An anonymous reader writes "For the Linux kernel power regressions that were found a few months ago, and hit in Ubuntu 11.04, Phoronix has found the regression that's still present in the Linux 3.0 kernel. The power regression is caused by a change in ASPM, the Active-State Power Management, for PCI Express support."
Interesting headline. I was trying to figure out how old-school manual construction work would be responsible for tricky power supply problems on Linux machines only.
I am a geek attorney, but not your geek attorney unless you've already retained me. This is not legal advice.
It's due to some buggy BIOSes not properly advertising power-saving features of PCIE cards. Older kernels didn't honor those BIOS hints, and disabled power to unused PCIE cards anyways (causing hangs in rare cases), whereas new kernels do the right thing (causing power wastage in lots of cases). The workaround is to specify pcie_aspm=force on the boot (Grub) command line, to tell the kernel to forge ahead, and just use power management on these cards regardless of the BIOS advice.
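To check which ASPM policy a running kernel has actually settled on, here is a minimal Python sketch. It assumes the kernel exposes the policy at /sys/module/pcie_aspm/parameters/policy, with the active entry shown in square brackets (for example "[default] performance powersave"). The helper name active_aspm_policy is made up for illustration.

```python
from pathlib import Path

# Assumed location of the ASPM policy knob; it is absent on kernels built without ASPM.
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(policy_text):
    """Return the bracketed (active) entry from the policy file's text, or None."""
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    if ASPM_POLICY_PATH.exists():
        print("active ASPM policy:", active_aspm_policy(ASPM_POLICY_PATH.read_text()))
    else:
        print("no ASPM policy file found; the kernel may lack ASPM support")
```

Comparing the result before and after booting with pcie_aspm=force is one way to confirm the option was picked up.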
As bad as some of the Phoronix articles can be, they have contributed a lot to the community. After all, they played a pivotal role in setting up openbenchmarking.org, and are pretty much the only source of Linux hardware reviews.
The article is full of sensationalism like "serious bug", "major regression" to promote Phoronix and its "wonderful test suite". If you read it closely, you'll see they have seen a 10% increase in power consumption on just one of their test laptops that depends on BIOS settings. That particular laptop has a bug in its BIOS where it claims it wants to manage configuration of a particular piece of hardware, and new kernels obey that request. You can even tell the kernel to disregard BIOS and force power settings anyway.
For me, improving power efficiency everywhere but that particular laptop is a major win. If you feel nice, you can even detect this particular buggy BIOS and ignore its request. But then, even after thorough fiddling, the Phoronix guys weren't able to improve power usage by more than 15% even on this laptop, so it's not a big issue anyway.
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Add pcie_aspm=force to your boot options.
Test it by editing grub (which is a temporary edit that will be lost next boot) first and test out suspend, hibernate, etc.
If that works, edit your grub configuration files. For ubuntu users this means editing /etc/default/grub and editing the GRUB_CMDLINE_LINUX_DEFAULT variable. Then call sudo update-grub.
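For anyone who would rather script the permanent edit than do it by hand, here is a rough Python sketch. It assumes the file contains a single-line, double-quoted GRUB_CMDLINE_LINUX_DEFAULT="..." assignment, which is the usual Ubuntu layout but not guaranteed. The function name add_kernel_param is made up, and the sketch only transforms text; writing the result back and running update-grub are left to you.

```python
import re


def add_kernel_param(grub_text, param, var="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Return grub_text with param appended to var's quoted value, if not already there."""
    # Match only a line of the form VAR="..." (optionally followed by spaces or tabs).
    pattern = re.compile(r'^(%s=")([^"]*)(")[ \t]*$' % re.escape(var), re.MULTILINE)

    def _append(match):
        words = match.group(2).split()
        if param not in words:
            words.append(param)
        return match.group(1) + " ".join(words) + match.group(3)

    new_text, count = pattern.subn(_append, grub_text)
    if count == 0:
        raise ValueError("no double-quoted %s line found" % var)
    return new_text


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample, "pcie_aspm=force"))
```

Running it twice on the same text leaves the line unchanged the second time, so it is safe to re-run.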
Machines like mine are probably the cause of the offending commit. Since maverick I had to force ASPM off on it or use lucid kernel because it caused frequent hard locks.
Is it possible that unused PCIE cards waste that much power? On Linux I drain my laptop's battery in under 2 hours, sometimes 1.5. On Win7 it used to take 3+ hours with brightness at 100% (because I was outdoors).
DISCLAIMER: Author of this post is currently using Linux because of superior performance and availability of tools not available on Windows platform.
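A quick back-of-the-envelope check on those numbers, as a Python sketch. It assumes the same battery and the same usable capacity under both systems, so capacity cancels out and the ratio of average power draw is just the inverse ratio of the runtimes. The function name is made up, and the 2 and 3 hour figures are the bounds quoted in the post.

```python
def average_draw_ratio(hours_a, hours_b):
    """How many times more power system A draws than system B on the same battery.

    Average power is capacity / runtime, so the capacity cancels and the
    ratio of draws is the inverse ratio of the runtimes.
    """
    if hours_a <= 0 or hours_b <= 0:
        raise ValueError("runtimes must be positive")
    return hours_b / hours_a


linux_hours = 2.0    # "under 2 hours", taken as an upper bound
windows_hours = 3.0  # "3+ hours", taken as a lower bound
print(average_draw_ratio(linux_hours, windows_hours))  # prints 1.5
```

So under these assumptions Linux is drawing at least 1.5 times the power, which is a much bigger gap than the roughly 10% ASPM figure mentioned elsewhere in this thread. ASPM alone probably does not explain it.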
Every harsh word you utter has the right address. It only sounds harsh because the one on the envelope is the wrong one.
That is an accurate summation of the article; but calling things "right" and "wrong" is a little naive. Windows treats this information very differently to Linux, and BIOS manufacturers are caught between the two. Simply advertising ASPM sounds good, unless it causes Windows to treat cards without ASPM support as if they have it just because the BIOS advertised that the system supported it. Now current versions of Windows might act rationally in this regard, but XP and older are still highly prevalent, particularly amongst corporate clients and governments.
So I guess my point is - it isn't a simple right or wrong/black or white scenario. It is a messy, ugly, undocumented hack, that ultimately leaves nobody happy. Linux will likely wind up having to implement a hack too to fix this, which makes them no better or no worse than the bios manufacturers who did exactly the same thing.
The article points out that there is also a power regression in the scheduler, which is the next thing the writer will look at.
Hard to say without the exact specs of the machine, and probably a bunch of test-probes clipped in awkward places inside the laptop; but the overall trend in hardware does seem to have been toward ever higher theoretical maximum-if-we-felt-like-burning-that-much power draw (remember back when a ~50-80 watt CPU was considered a howling-mad-danger-to-self-and-others overclock/overvolt insanity demanding nerves of steel and custom cooling? Now boring retail CPUs have TDPs in the ~130 watt range); but a corresponding increase in the ability of hardware to throttle various clocks (CPU, GPU, high-speed buses), sometimes cut Vcore as well, and turn off (or very nearly so) unused peripherals.
Exactly where the delta exists vs. Windows seems to be a matter of some confusion; but unless Linux is just plain burning more CPU time for housekeeping purposes (which, one assumes, is the sort of thing that the Big Serious Corporate users of 1000+ node commodity server/compute setups would have noticed by now), it likely rests largely in the hands of a (no doubt alarmingly large and ever changing) set of hardware-specific power throttling stuff whose responsibilities were designed to be divided between the buggy BIOS and the vendor's Windows drivers. If it were Just One Mistake, it'd likely have been quashed by now...
Would it work to have the kernel default to using whatever the BIOS indicated, but also have a database of overrides based on the exact card model?
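A rough Python sketch of what that could look like: trust the BIOS by default, but let a per-device table win when it has an entry. This is only an illustration of the idea, not how the kernel does it. The table name, the function name, and the vendor/device IDs are all placeholders, not real known-good or known-bad hardware.

```python
# Hypothetical override table keyed by PCI (vendor ID, device ID).
# The IDs are placeholders for illustration only.
ASPM_OVERRIDES = {
    (0x1234, 0x0001): True,   # ASPM known to work: enable even if the BIOS says no
    (0x1234, 0x0002): False,  # ASPM known to hang: disable even if the BIOS says yes
}


def aspm_enabled(vendor_id, device_id, bios_allows_aspm, overrides=ASPM_OVERRIDES):
    """Use the per-device override if there is one, otherwise do what the BIOS indicated."""
    return overrides.get((vendor_id, device_id), bios_allows_aspm)


if __name__ == "__main__":
    print(aspm_enabled(0x1234, 0x0001, bios_allows_aspm=False))
    print(aspm_enabled(0xABCD, 0x0001, bios_allows_aspm=False))
```

The hard part is not the lookup but keeping such a table accurate, since the problem described in this thread is in the BIOS as much as in the cards.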
Never upgrade your Linux distribution in place.
Have 2 (or more) OS partitions of about 20GB each.
Install your OS's to partition 1.
Install your upgraded version to partition 2.
Easily switch back and forth.
Oh, and keep a separate /home partition.
I'm not a lawyer, but I play one on the Internet.
Not sure, but is Windows using BIOS or drivers as a first reference for power saving support? As such, could this be yet another case of hardware shipped as known buggy and "cleaned up" via driver code?
comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
I can't stand it and immediately dump it for lilo as soon as I've done an install. I just want the boot loader to load the OS and get the hell out of town. End of. I don't need a boot "environment" thanks.
Hippies. They fight the power.
Vote monkeys into Congress. They are cheaper and more trustworthy.
ACPI: RSDT 00000000bf780000 0003C (v01 7593MS A7593300 20100210 MSFT 00000097)
ACPI: FACP 00000000bf780200 00084 (v01 7593MS A7593300 20100210 MSFT 00000097)
ACPI: DSDT 00000000bf780480 06E90 (v01 A7593 A7593300 00000300 INTL 20051117)
ACPI: APIC 00000000bf780390 000AC (v01 7593MS A7593300 20100210 MSFT 00000097)
ACPI: MCFG 00000000bf780440 0003C (v01 7593MS OEMMCFG 20100210 MSFT 00000097)
ACPI: OEMB 00000000bf78e040 0007A (v01 7593MS A7593300 20100210 MSFT 00000097)
ACPI: HPET 00000000bf78a480 00038 (v01 7593MS OEMHPET 20100210 MSFT 00000097)
Every frigging machine I've owned is the same way. Microsoft-built ACPI, filled with bugs that I have to manually fix.
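To tally which tool built each table, here is a small Python sketch that pulls the second-to-last field out of the parenthesised part of each line. As far as I know that field is the table's creator (ASL compiler) ID, with MSFT being Microsoft's compiler and INTL Intel's, but treat that reading as an assumption. The function names are made up, and the parser only handles lines shaped like the ones above.

```python
from collections import Counter


def creator_id(acpi_line):
    """Extract the creator ID from one 'ACPI: ...' boot log line, or None if it does not parse."""
    start = acpi_line.find("(")
    end = acpi_line.rfind(")")
    if not acpi_line.startswith("ACPI:") or start == -1 or end < start:
        return None
    fields = acpi_line[start + 1:end].split()
    # Assumed layout: revision, OEM ID, OEM table ID, OEM revision, creator ID, creator revision.
    if len(fields) < 2:
        return None
    return fields[-2]


def count_creators(lines):
    """Count how many parsable table lines each creator ID accounts for."""
    ids = (creator_id(line) for line in lines)
    return Counter(i for i in ids if i is not None)


if __name__ == "__main__":
    sample = [
        "ACPI: FACP 00000000bf780200 00084 (v01 7593MS A7593300 20100210 MSFT 00000097)",
        "ACPI: DSDT 00000000bf780480 06E90 (v01 A7593 A7593300 00000300 INTL 20051117)",
    ]
    print(count_creators(sample))
```

Feeding it the ACPI lines from your own boot log shows at a glance which tables came from which compiler.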
That is an accurate summation of the article; but calling things "right" and "wrong" is a little naive. Windows treats this information very differently to Linux, and BIOS manufacturers are caught between the two.
In other cases this has been because Microsoft wrote the tools and designed them to be hostile to Linux, e.g. ACPI. Is there any of that here?
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I'm using a Vaio S that gets 7+hr battery life in Windows, and under 2hr battery life in Fedora. The big problem that I see with this laptop is that Fedora is not utilizing the "hybrid" graphics system, and it is constantly running off of the graphics card instead of the integrated graphics (in windows, this brings the battery life to under 2 hours, as well). It would be nice to be able to switch that permanently to integrated to get the battery life.
Linux does things the way they should be done according to the standard. Windows does things the way they actually are done in the real world. The reason is simple: BIOS vendors noticed Windows doesn't follow the standard well, and made the reasonable assumption that the vast majority of users would run Windows. Thus they deviated from the standard in order to better support it.
That has happened before so many times you can't count.
Under Ubuntu, I'm using the integrated only, and offload to the real GPU using bumblebee, but the battery still drains too quickly.
Make up your mind. Scum or "weazels". If you weren't a retard, you would understand that scum is around the bottom of the food chain, while weasels occupy a niche at the higher end of the food chain. Nothing in common, whatsoever. Things that eat scum, in turn feed other things, which weasels prey upon. I know the concept is difficult to grasp, for one of your limited mental capacities - but please, try to make the effort. You'll be so proud of yourself, and your mommy and daddy will be proud too! Go ahead, put on your big boy pants, along with your thinking cap, and work hard to figure this stuff out, alright?
BTW, I think Linux bashers are closer to the pond scum than they are to the weasels. But, that's just an opinion, with no scientific proof to back it up.
"Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
One nice example is the problem with some laptops that you have to close the lid twice to make a machine suspend under Linux. This is due to an ACPI bug where the lid status remains in state "closed" on resume. Linux power management wants the transition to be exactly "open" -> "closed" for suspend, in Windows simply a lid event with state "closed" is enough.
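The difference between the two behaviours can be shown with a toy Python model. It is a simplification of what is described above, not the actual kernel or Windows logic: "edge-triggered" suspends only on an open-to-closed transition, "level-triggered" suspends on any event reporting "closed". The function name and the event list are made up for illustration.

```python
def count_suspends(lid_events, edge_triggered):
    """Count how many suspends a sequence of reported lid states would trigger."""
    last_state = "open"  # assume the lid starts out open
    suspends = 0
    for state in lid_events:
        if state == "closed" and (not edge_triggered or last_state == "open"):
            suspends += 1
        last_state = state
    return suspends


# Buggy firmware: after the first suspend/resume no "open" is ever reported,
# so the user's next lid close shows up as a second "closed" in a row.
events = ["closed", "closed", "open", "closed"]

if __name__ == "__main__":
    print("edge-triggered suspends:", count_suspends(events, edge_triggered=True))
    print("level-triggered suspends:", count_suspends(events, edge_triggered=False))
```

With the stuck state, the edge-triggered policy misses the second close entirely, which matches the "close the lid twice" symptom.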
If you are skilled, you can also hack the ACPI DSDT and inject the new one on boot. :)
Why does it have to be that they are hostile to Linux? It could just be that they don't give a fuck and never tested their products with Linux. Or maybe some developers on the Linux kernel project are incompetent and stupid? Why do you tin foil nuts always make it out to be some conspiracy against Linux? Or to look at it from another point of view, why should any OEM give a fuck whether their desktop products (which are going to require good suspend/resume/battery support etc) work with OS that is an economically insignificant portion of the market? Linux users are going to format the PC and install Linux anyway. They aren't going to drive any new customers for the trialware shit that ships with the PCs. Something which some (retarded) windows users are likely to do.
causing hangs in rare cases
- On new Sandy Bridge laptops, booting always caused hangs. I couldn't boot Linux (Arch or Ubuntu) if I had the power brick connected. Hanging is just one issue; another is that the network card is unable to connect to the network. Also, if you dual boot, Windows might tweak the card's settings, and when you reboot into Linux it still hangs or simply can't connect to the network. The solution is a complete power-off and rebooting without the power brick connected. I even tried deactivating ASPM (pcie_aspm=off) and it didn't work. The only thing that works, as you mentioned, is pcie_aspm=force, at least for me!
Unrelated to the problems mentioned here, there's still a lot of work to do to allow us to manage our power more efficiently in Linux. See what I posted in the previous discussion.
Non-Linux Penguins ?
Hybrid GPU setups have been supported in the kernel for the last few releases. If you google for something like "linux gpu switcheroo" you should be able to find what I'm talking about. Yes, it was called "Switcheroo" by the original author of the code. The primary way of switching GPUs is through the /sys filesystem unless there are some GUI programs that do that for you.
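For the /sys route, here is a minimal Python sketch. It assumes the switcheroo control file is at /sys/kernel/debug/vgaswitcheroo/switch (debugfs mounted, run as root) and that it accepts the commands ON, OFF, IGD, DIS, DIGD and DDIS. The function names are made up, and whether switching works at all depends on the laptop and driver.

```python
# Assumed location of the vga_switcheroo control file (needs debugfs and root).
SWITCHEROO_PATH = "/sys/kernel/debug/vgaswitcheroo/switch"

# Commands the control file is assumed to accept:
# ON/OFF power the unused GPU up or down, IGD/DIS switch to the integrated or
# discrete GPU, DIGD/DDIS queue the switch until the display server restarts.
VALID_COMMANDS = {"ON", "OFF", "IGD", "DIS", "DIGD", "DDIS"}


def normalize_command(command):
    """Upper-case and validate a switcheroo command, raising ValueError if unknown."""
    normalized = command.strip().upper()
    if normalized not in VALID_COMMANDS:
        raise ValueError("unknown switcheroo command: %r" % command)
    return normalized


def send_command(command, path=SWITCHEROO_PATH):
    """Write a validated command to the switcheroo control file."""
    with open(path, "w") as control:
        control.write(normalize_command(command))
```

For example, send_command("IGD") followed by send_command("OFF") would ask for the integrated GPU and then power down the unused one, assuming the hardware cooperates.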
On Linux I drain my laptop's battery in under 2 hours, sometimes 1.5. On Win7 it used to take 3+ hours with brightness at 100%.
How long ago was that? Maybe your battery is nearing end of life.
Is it possible that unused PCIE cards waste that much power? On Linux I drain my laptop's battery in under 2 hours, sometimes 1.5. On Win7 it used to take 3+ hours with brightness at 100% (because I was outdoors).
DISCLAIMER: Author of this post is currently using Linux because of superior performance and availability of tools not available on Windows platform.
Probably depends strongly on the laptop and the drivers available for its hardware.
On my 7½-year-old laptop (Sony VAIO VGN-A117S[*]) with original battery, the battery typically lasts slightly less than 2 hours, but even with intensive use it lasts more than 1½ hours. It runs Lubuntu 10.04 and it's years since any version of Windows dirtied its disk, so I can't do a direct comparison right now. As far as I recall, it lasted about 2½ hours on Windows XP when it was new (early 2004), and somewhat less when running Warty or Breezy. With subsequent Linux kernels the battery life became almost the same as it had been with Windows, and Windows was ditched completely with Dapper. Considering the age of the battery, I expect most of the shortened life since 2004 is simply age-related degradation of the battery.
[*] This is actually a beautiful laptop, made when Sony had not yet slid into the abyss of evil. Its 17" 1920x1200 LCD was the main reason I bought it, and the main reason it's still in service. It also runs quite nicely with Lubuntu, despite being limited in RAM. We upgraded the disk a month ago (the original was still working, but 80GB seems small nowadays), while the rest of the hardware is original and working perfectly - still no dead/hot/wonky pixels in the display.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Many moons ago there was some deep bitterness from some of the devs at the Ottawa Linux Symposium about the fact that hardware developers weren't actually following the specs but instead implementing their own, then just writing Windows drivers to work around their tweaks.
Since Linux doesn't typically support pluggable hardware drivers from manufacturers (and they often don't care to write them), Linux was trying to communicate using the actual specifications, and failing. This has been a problem for years now, and I have no reason to believe the hardware manufacturers are any less to blame now than they were before.
- Michael T. Babcock (Yes, I blog)
It used to, like, a few months ago. The laptop is practically brand new. I had Win7 as a temporary solution while I was figuring out how to get hybrid graphics working on Linux.
As far as I recall, it lasted about 2½ hours on Windows XP when it was new (early 2004), and somewhat less when running Warty or Breezy.
I'm sure WinXP cannot compare in terms of power consumption to Win7 + latest drivers from hardware vendors. Sadly, in all other aspects, they don't differ by much. I might switch one day if the actual performance becomes on par with Linux. On the other hand, if Linux becomes better in power management, the switch would probably never become an option. (Hm... this reminds me of those Linux vs Windows discussions, with roles slightly reversed.)
Who are these pansies that use a boot loader at all? I enter in the machine code by hand, that's the only way to be sure.
Youngsters today just don't appreciate how toggling in absolute addresses and machine instructions via front panel switches could build character. It especially expanded one's vocabulary of expletives and expressiveness in screaming them. The PDP-8 only had 12-bit words which saved a lot of toggling, so after a little practice it could be booted to having multiple teletype[*] terminals active in less than 10 minutes. Confession: the last minute or two were reading in from magnetic tape, whose drivers were loaded from a hard-coded[**] circuit board in an act of heinous cheating.
Mind you, that's still faster than booting and logging into my dual core XP laptop at work, which is burdened by an awful lot of corporate cruft (policy enforcement, antivirus, spyware and antispyware, security & encryption craplets, etc.) which must be loaded before the desktop is responsive.
[*] Teletype meant a typewriter sort of thing with fewer mechanical hammers than a real typewriter, but which still needed a roll of ink tape and a larger roll of paper for typing on. Any color ink you wanted, since the ink tapes were usually re-inked a few times by soaking them in suitable muck.
[**] Hard-coded in this context meant a mess of thousands of wires sticking out from the board which were selectively snipped with a pliers to make a suitable array of binaries.
In other cases this has been because Microsoft wrote the tools and designed them to be hostile to Linux, e.g. ACPI. Is there any of that here?
This is what he's talking about. You don't have to be a conspiracy theorist to think that Microsoft could have deliberately made ACPI difficult for Linux to implement.
Dewey, what part of this looks like authorities should be involved?
I think you're spot on. Over the last decade I've constantly read articles about broken hardware whose flaws the manufacturer simply hides in their Windows drivers. Chances are extremely high that any power regression is actually a case of extremely broken hardware more dramatically exposed because of bug fixes and/or compliance improvements in the Linux implementation.
Based on what I've read over the last decade, I definitely get the impression hardware bugs, specifically in power management, are fairly common. As a whole, manufacturers just don't give a shit about pumping out broken, non-compliant hardware specifically because 1, they hide their shame in their drivers, and 2, non-windows systems likely represent a fraction of their overall sales. Which means, who cares because who's actually going to know they can't properly follow a specification.
Unless someone has a smoking gun which proves Linux is doing the wrong thing, chances are the regressions are actually shit-poor hardware implementations with full knowledge of the manufacturers.
More like Windows used to do it as per the standard. Then Microsoft realized a good chunk of the crap people buy doesn't support it properly, so they have to add a bunch of hacks and tweaks in order to get it to work "properly".
Honestly, hardware sucks. Between buggy BIOSes and hardware with buggy support for everything, it's amazing something like Windows could even work, or that you can use the same Linux kernel without recompiling for different PCs.
You still find the odd USB devices with crappy descriptors in them these days because the manufacturer can't be half-assed to do it right. "Oh, it works in Windows? Great, ship it".
Sometimes the old adage of "be liberal in what you accept, conservative in what you emit" causes more problems - people do the bare minimum to get stuff working.
Hell, it's one of the reasons why the old Creative Soundblaster Live cards only worked well on Intel machines - they violated the PCI spec. It's just that the Intel chipsets were more forgiving of violations, while other chipsets that adhered more closely to the spec caused random lockups and crashes as the cards locked up the bus. (Only the Audigy line actually fixed the issue...)
remember back when a ~50-80 watt CPU was considered a howling-mad-danger-to-self-and-others overclock/overvolt insanity demanding nerves of steel and custom cooling? Now boring retail CPUs have TDPs in the ~130 watt range
Only if you're still using a Pentium-4. Most of the new i5s have 95W or less TDP and real-world measurements show they rarely go over 60W.
The new i5 server/DVR I'm building should use less power than my old dual-core Atom when idle and only about 40W more under full load.
The reason is simple: BIOS vendors noticed Windows doesn't follow the standard well, and made the reasonable assumption that the vast majority of users would run windows. Thus they deviated from the standard in order to better support it.
I suspect it's more that there are people paid to clean up their turds in software, so companies don't care about crapping out defective hardware with a broken BIOS.
When I was writing video drivers for Windows we'd often have to incorporate workarounds for broken host chipsets; I'm guessing all the other video card manufacturers were doing the same and the chipset manufacturers either didn't realise their AGP bus implementation was a heap of steaming monkey crap or didn't care.
One can only hope that this dirty "hybrid GPU" hack goes away sooner rather than later. I simply can't comprehend why its inventor came up with the idea to _add more_ hardware to decrease power consumption, instead of just fixing the original problem.
Is there... more to that article? All it quotes from Microsoft is this:
- Bill Gates, 1999
Then it goes on to say:
That's IT. That's the entire content of the link. How does that prove anything? At all?
So yes, I'm sorry, but you have not lifted this thread above the "paranoid kook" level. How about some evidence instead of a quote (that could refer to anything, and has nothing to do with sabotaging Linux anyway) and some completely baseless speculation?
Comment of the year
As I understand it, the hack is already implemented (pcie_aspm=force). The good thing about it is that you can enable it when you need it, and not use it when you don't need it. If the same is true of the BIOS manufacturers, then I agree it is no better or worse. However, I suspect that the reason Linux needed to implement a workaround is because there is no way for end-users to make the BIOS do it right. In that case, it is more of an example of how two wrongs (BIOS breaking things to make it work with a broken OS) don't make a right.
Please correct me if I got my facts wrong.
Did you read the linked PDF at all? Here's what the rest of it said:
In summary, Bill Gates explicitly wanted to break ACPI on Linux.
Shame there's no way linux can use the pluggable windows drivers =(
The average TDP and real-world numbers have indeed fallen since the brief reign of the dual-die Prescott parts; those suckers were toasty. High-end i7s (not the 95 watt Sandy Bridge ones, the original QPI-based LGA-1366 ones), though, still quote a 130 watt TDP (though they can also be had as low as 18 watts, and I suspect that the market is vastly larger in the 18-60 range). Xeons are in the same boat. The Nehalem ones, still available, are up to 130 watts TDP, the Sandy Bridge ones up to 95.
Your point is largely correct, in terms of CPUs that people generally buy, most of the 130s are either crazed enthusiast parts or painfully expensive Xeons; but the 130s do exist on the shelf, as do the utterly boring business desktops with cooling hardware that would have made those PIII overclockers mad with jealousy...
It would be nice to be able to switch that permanently to integrated to get the battery life.
I had a similar issue on my ASUS U35JC. Its Gentoo system now uses the Intel based integrated graphics card exclusively. The additional NVIDIA card can be powered down by loading the acpi_call kernel module and then executing:
echo '\_SB.PCI0.PEG1.GFX0._OFF' > /proc/acpi/call
In my case this saves about 6 watts instantly. To avoid problems with suspend/sleep states, scripts automatically restore the NVIDIA card just before suspending and power it down after resuming.
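A rough Python sketch of what such suspend/resume hooks might do. The _OFF path is the one from the command above and is specific to that machine. The matching _ON method, the phase names "pre-suspend" and "post-resume", and the function names are all assumptions for illustration. Writing to /proc/acpi/call requires the acpi_call module to be loaded and root privileges.

```python
ACPI_CALL_PATH = "/proc/acpi/call"

GPU_OFF = r"\_SB.PCI0.PEG1.GFX0._OFF"  # taken from the post; machine specific
GPU_ON = r"\_SB.PCI0.PEG1.GFX0._ON"    # assumption: a matching _ON method exists


def method_for(phase):
    """Pick the ACPI method to call for a (hypothetical) suspend hook phase."""
    if phase == "pre-suspend":
        return GPU_ON    # restore the card just before suspending
    if phase == "post-resume":
        return GPU_OFF   # power it down again after resuming
    raise ValueError("unknown phase: %r" % phase)


def run_hook(phase, path=ACPI_CALL_PATH):
    """Write the chosen method to the acpi_call interface, like the echo command does."""
    with open(path, "w") as call_file:
        call_file.write(method_for(phase) + "\n")
```

How the hook gets invoked (pm-utils, a distribution-specific script directory, or something else) depends on the system and is not covered here.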
I just got a new Sandy Bridge laptop. HP Pavilion dv7t Quad. ($900, Intel-i7 (4/8) CPU, 6G-RAM, 750G-disk, Radeon HD 6770M graphics, bluetooth, etc.)
I'm still testing it, but so far Fedora 15 has handled it just fine. No booting problems. No network card problems. No wireless problems. Slight difficulty with graphics card drivers, but it is a 6770M!
The only real issue I've seen so far was the defective harddrive (with unreadable sectors / media errors). Linux spotted it immediately. Every tool I tried under Win7, including the SMART-check, insisted the drive was fine. (Under Linux, smartctl -a showed over 10,000 errors in the log.) Kudos to HP, who believed me, verified the drive was bad via BIOS tools, shipped me a new one (in just a day or two), and let me replace it myself!
We can for NDIS drivers, it'd be interesting to write another binary wrapper for the Windows drivers but it means trusting more non-open code in the kernel's memory space at run time.
I am happy that this bug is finally getting the attention of the techie people.
On any of my laptops, any Linux distro makes noticeably more fan noise than any Windows version. So far I have tested Win XP, Vista, 7, Ubuntu from 07.04 up to 11.04, and Fedora 13-15, on laptops of various brands: Dell, HP, Lenovo. It's always the same, and in any Ubuntu forum or such, the users (especially the dual-booting ones) have been asking this question again and again for a long time now. There's a noticeable difference. Most of the suggestions focus on the correct (toshiba, hp, dell, [put your own here], etc.) driver. I read somewhere in all those discussions that it might relate to multi-core processing technology and how Linux scales down all of the cores in sync whereas Windows turns off all unnecessary cores (which may be better to prevent heating). There's a lot of guessing and misinformation. Anyway, the consequence for me is: I like Linux, but this high noise and heat just scares me. I don't want my laptops to get damaged too soon, so I'd rather stay in Windows until this gets solved.
Great work! I disabled ASPM on my Dell Laptop and my normal CPU temp has dropped 15C. Heat was killing it.
in a just world driven by karma would be enough to make sure he never gets a Nobel Prize.
What a Dick.
thank god they fixed this....my laptop will actually be able to run for more than an hour now. thanks guys.
People other than apk don't like you as well....
Note: Current processors may have TDPs of up to 130W but this is spread over 4 or more cores. While older processors, like Prescott, were using the same amount of power in just one physical processing core.
IANA HW geek, but I learned this lesson vicariously a long time ago: "Never design to the spec." Chips (TTL in this case) vary in performance, and some of them do better than spec, others do worse. In the particular case, the TTL-based CPU had a stack that was implemented using four chips (IIRC FIFOs, but I don't recall). The timing was based on the spec. As a result, those four 'identical' chips had to be matched - if a slower one came after a faster one, the CPU would crash. The difference in timing was too small to reliably measure, so manufacturing or repairing these boards involved careful testing and trials until you got a set that worked through all the diagnostics.
Of course, that was back in the day when a CPU came on one or more large circuit boards that always carried at least a few little red wires to fix hardware bugs. Even IBM CPUs. It was always gratifying to see the IBM techs at a tradeshow madly going at their server with duck tape and chisel like the rest of us. :)
It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
Under Ubuntu, I'm using the integrated only, and offload to the real GPU using bumblebee, but the battery still drains too quickly.
If you haven't already had a look at this, you can check http://linux-hybrid-graphics.blogspot.com/
I downloaded the code, compiled and installed the module... got an increase of 5 hrs on Fedora 15.
This will soon be fixed if you have an ATi and use the FLOSS drivers. There is no need for 'hybrid graphics'-bla bla bla as you can achieve this purely with software lol.
Some guy from Red Hat is now working on GPU distribution-stuff. That means being able to also run both at the same time. Currently the status on this is that he can now only do it by restarting the X.org server and just one GPU at the time with 'vga-switcheroo'.
Progress is on its way ;)