64-bit x86 Computing Reaches 10th Anniversary
illiteratehack writes "10 years ago, AMD released its first Opteron processor, the first 64-bit x86 processor. The firm's 64-bit 'extensions' allowed the chip to run existing 32-bit x86 code in a bid to avoid the problems faced by Intel's Itanium processor. However, AMD suffered from a lack of native 64-bit software support, with Microsoft's Windows XP 64-bit edition severely hampering its adoption in the workstation market."
But it worked out in the end.
For over twenty years.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Does 64 bits really mean that every program is twice as big as it needs to be? Every time I hear about an innovation that requires things to be bigger, I question the necessity.
Erm, that 'smart memory management' (PAE) has a nice big performance hit. Somewhat bigger than a 3% slowdown.
Also, 64-bit can handle bigger numbers (over 4.3 billion, i.e. 2^32) an awful lot faster than 32-bit can. It doesn't help with small numbers, but 32-bit processes the bigger ones rather inefficiently.
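To make the "bigger numbers" point concrete: a value above 2^32 (about 4.29 billion) doesn't fit in one 32-bit register. 32-bit x86 code therefore adds such values in two halves chained through the carry flag (ADD, then ADC), while x86-64 needs a single 64-bit ADD. The Python sketch below simulates that two-step addition. The function name is hypothetical, and Python ints stand in for registers. Multiplying or dividing 64-bit values on 32-bit x86 takes considerably more than two instructions; 64-bit division is typically a library call. That is where the gap widens.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit register's worth of bits


def add64_with_32bit_ops(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit pieces, mirroring the
    ADD (low halves) + ADC (high halves plus carry) pair a 32-bit compiler emits."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo                      # first 32-bit add
    carry = lo >> 32                      # the carry flag out of the low half
    hi = (a_hi + b_hi + carry) & MASK32   # second add consumes the carry
    return (hi << 32) | (lo & MASK32)     # wraps modulo 2**64, like the hardware


print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))  # 0x100000000
```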
My 32 GB of RAM, absolutely essential for my work, laughs at your "memory management" bullshit.
Life needs more saving throws.
If it's such a success, why does 64-bit software generally only run marginally faster than its 32-bit build? 64-bit binaries are larger and might run at 103% of the speed of 32-bit if you're lucky.
Sure, it helps with the 4GB memory space limit, but so can smart memory management and other approaches.
I could see it being useful for supercomputing things, but in general, there still just doesn't seem to be a point.
Wow, just wow. Do you actually work in the software field???
AMD may have helped create the x86-64 market, but now it's getting killed by it. Soon Intel will be the only major player. The ARM market is AMD's only hope.
Those were x86-based? The title was "64-bit x86 Computing Reaches 10th Anniversary", not "64-bit Computing Reaches 10th Anniversary".
FC Closer
We need 200 versions of Windows.
But it worked out in the end.
Yes, mostly due to the fact that we needed a way to get past the 4GB memory limitation, and not because we gave a damn about whether the processor was native x64 or not. AMD has had some great ideas, but they've almost always shorted themselves on the implementation, leaving the field wide open for Intel to come in with a better offering and take the lion's share of the profit.
#fuckbeta #iamslashdot #dicemustdie
Do you? For average PC applications (browsing the web, e-mail, office documents) 64-bit gives no advantage. For above-average applications (multimedia creation/editing, CADD, running multiple VMs) it's very helpful.
Bitch, the cluster I work with has over 700 GB of RAM in each node.
Dude you really NEED a new Lithium prescription! Really!
And counseling!
And... Oh, for Christ's sake! You NEED to be institutionalized!
They've got TV, video games - ANY one you want! Really!
And ...
Goddamn it Sam!
Guys ...
The parent is with me at Bellevue.
His name is Daniel. He's a Time Lord - that's why you can't get to him and why he's always first post.
Yes, he took me back to 2021 and we trolled President Hillary Clinton and VP Jeb Bush. We're on the run from the Secret Service for that. I wish he wouldn't do shit like this!!
No ...really, BUY GOLD and MORE Facebook!!!
FUCK - they're shooting at us.... gotta go!
*Hurrrrr ...huurrrr ....hurrrr....hurrr wahhhhhhh .... ping*
I can't wait for 128 bit !!!
I'd heard x64 was barely faster than 32-bit, but I wrote this program to find duplicate files on Windows: http://poshcode.org/3377 - it's at least twice as fast in x64 as in 32-bit. Naturally that won't apply to everything, but for certain things x64 is really good.
"...I think the Microsoft hatred is a disease." - Linus Torvalds
It depends on how it's coded. For example, 64-bit MAME runs around 30% faster than the 32-bit version: http://www.mameui.info/Bench.htm
Yes, but I work with embedded software, where the goals are to make it work with the smallest/cheapest hardware footprint possible. I know this differs vastly from the goal of desktop developers to use as much electricity as possible and code for bragging rights about specs and CPU usage.
Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones.
Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.
Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache coherency of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.
Where 64-bit does become really valuable is working with very, very large amounts of sequential data (want to allocate a 10GB array? Can't do that on 32-bit x86, no way no how). That's hardly a typical requirement right now (although I wrote a program a few weeks ago that needed to do it). However, it's getting closer. Additionally, while clever memory mapping can allow a 32-bit process to access over 4GB of RAM (just not all at the same time), there is a (small) performance impact associated with the need to be constantly re-mapping that memory.
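The arithmetic behind "no way no how", as a quick Python check. The 2 GiB figure is 32-bit Windows' default user-mode share; 32-bit Linux traditionally allows 3 GiB. The 2^47-byte figure is an assumption about a typical x86-64 OS, which gives user mode the lower half of a 48-bit virtual address space.

```python
GiB = 2**30

address_space_32 = 2**32        # every byte a 32-bit pointer can name: 4 GiB
default_user_32 = 2 * GiB       # default user-mode share on 32-bit Windows
request = 10 * 10**9            # the "10GB array" from the comment

print(request > address_space_32)  # True: too big for any 32-bit process
print(request > default_user_32)   # True, by a wide margin
# Assumption: a typical x86-64 OS gives user mode 2**47 bytes (128 TiB),
# half of the 48-bit virtual address space current CPUs implement.
print(request < 2**47)             # True
```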
The other area where 64-bit really helps is with security, specifically exploit mitigation. High-entropy ASLR in recent versions of Windows and some other OSes randomly places 64-bit aware executables and their various data regions across their entire 64-bit address space. This not only makes it completely impossible to correctly guess the address of any given bit of code in memory, it also makes spraying (heap spray, JIT spray, etc.) attacks completely infeasible; to cover even a tenth of a percent of the address space, you'd need to spray 16 million gigabytes of data. That's not only quite impractical at modern CPU speeds (even on a blazingly fast CPU and done in parallel, it would take a week or more), it also is far more memory (physical or virtual) than any modern computer will be able to allocate.
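Checking the spray figure: a tenth of a percent of 2^64 bytes is about 17 million GiB. The comment's "16 million gigabytes" is therefore the right order of magnitude, and comes out exactly if 0.1% is rounded to 1/1024. One caveat: current x86-64 CPUs translate only 48-bit virtual addresses, and user-mode allocations live in part of that range. If only a 2^47-byte user space is assumed (an assumption, not a measured ASLR range), the same fraction is far smaller.

```python
GiB = 2**30
tenth_of_a_percent = 1 / 1000

# The comment's assumption: the whole 64-bit space is in play.
full = 2**64 * tenth_of_a_percent / GiB
print(f"{full:,.0f} GiB")        # 17,179,869 GiB
print(2**64 // 1024 // GiB)      # 16777216: "16 million" if 0.1% is rounded to 1/1024

# Assumption: user-mode code on a typical x86-64 OS gets 2**47 bytes.
user_only = 2**47 * tenth_of_a_percent / GiB
print(f"{user_only:,.0f} GiB")   # 131 GiB
```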
There's no place I could be, since I've found Serenity...
I've seen Firefox run into the 2GB user-mode address space / process limit many times... Chrome and (recent) IE don't have this problem due to per-tab processes, but Firefox definitely hits it when you use as many tabs as I do.
I vote roman_mir for best current Slashdot troll.
Was Win XP x64 ever widely used? Nowhere I worked ever formally supported it. Driver support was poor, and Vista x64 was adopted more widely than it, even before Windows 7 arrived.
In fact, I think we quickly migrated the 2 people (out of 20k+) we found using it.
I can't wait for the IPv6 version of this article in 2060.
Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones.
Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.
Most programs don't need more than one floating point pipeline.
Most programs don't need lots of cache.
The pros outweigh the cons.
Not if by node you mean NUMA node.
Bitch, the cluster I work with has over 700 GB of RAM in each node.
Sure, but can you handle that much RAM, or is it mostly hanging idle?
Do you? For average PC applications (browsing the web, e-mail, office documents) 64-bit gives no advantage. For above-average applications (multimedia creation/editing, CADD, running multiple VMs) it's very helpful.
1) Yes, I do.
2) You are so wrong that it's actually funny.
Like other commenters, I disagree with you for the most part, but I have to add this:
If it's such a success, why do so many people still run 32-bit OSes, even on 64-bit CPUs? Why isn't there a 64-bit version of Chrome? Why is there even a 32-bit version of anything?
That makes no sense, as your application "should" be mostly I/O bound.
How did you test this speed difference?
Hello roman_mir.
64-bit binaries are larger and might run at 103% of the speed of 32-bit if you're lucky.
Maybe there is a lot of software written in C that uses int or unsigned when it should have typedef'd a size appropriate for its needs. Programmers need to be mindful of when scaling to the machine is important, and when narrowing to the application is important. I pity the fool who uses an array of unsigned to store values that will never exceed 255.
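Python's array module stores elements as the named C type, so it makes the cost visible. The sketch below holds a million values that never exceed 255 in two ways. The first uses C unsigned int ('I', 4 bytes on mainstream x86 and x86-64 ABIs). The second uses unsigned char ('B', 1 byte). The sample values are made up.

```python
from array import array

values = [17, 200, 255, 3] * 250_000      # one million values, none above 255

as_unsigned = array("I", values)          # C unsigned int underneath
as_bytes = array("B", values)             # C unsigned char: enough for 0..255

print(as_unsigned.itemsize * len(as_unsigned))  # 4000000 where unsigned int is 4 bytes
print(as_bytes.itemsize * len(as_bytes))        # 1000000
```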
XP x64, Microsoft's ginger stepson of an OS. Ignored and dropped like a hot potato as soon as they could.
You couldn't get drivers for half the stuff, even MS didn't provide their own software, and lots of 'free for home, pay for commercial' stuff would detect it as Server 2003 and refuse to run/install.
Somewhat of a shame really as it wasn't a bad OS.
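So those 32 extra bits of memory addressing are nice. But don't forget about that 1 extra bit for identifying registers (the extra register bit in the REX prefix, which is what takes x86-64 from 8 to 16 general-purpose registers)!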
Those who fail to understand communication protocols, are doomed to repeat them over port 80.
The speed improvement depends entirely on whether or not you are taking advantage of the features provided.
Remember that not all things in your address space are "memory".
Games that need to access obscene amounts of data quickly and randomly, for example, are able to map gigs of data into their address space and let the OS deal with caching. This in particular has led to massive speedups for me personally in software I develop.
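Here is a minimal sketch of "map it and let the OS deal with caching", using Python's standard mmap module. The 16 MiB file is a hypothetical stand-in, kept small so the example runs anywhere. A 64-bit process can map files many GiB in size the same way, which a 32-bit process can't do in one piece.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a game's data pack: a 16 MiB file whose last
# five bytes are b"asset".
size = 16 * 2**20
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(size)              # extend to 16 MiB of zero bytes
    f.seek(size - 5)
    f.write(b"asset")
    path = f.name

with open(path, "rb") as f:
    # Map the whole file read-only. Pages are loaded on first touch and
    # cached by the OS; the program just indexes into the mapping.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        tail = view[size - 5:]

os.remove(path)
print(tail)  # b'asset'
```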
More address space == better, for many reasons.
32-bit address space and weak PAE workarounds were just a bigger form of the i386 (real-mode) days with EMS extensions. Crap.
I respectfully submit ShanghaiBill for your consideration.
With 32-bit, on some systems you get something like 2.5-3.7 GB of usable RAM. And yes, video RAM eats from the 4 GB pool.
can you handle that much RAM or is it mostly hanging idle
God made hand size proportional to RAM for a reason.
A 32-bit x86 app has access to 8 32-bit "general purpose" registers - they ain't really all general purpose, because two of them are the stack pointer and the frame pointer (the program counter, EIP, isn't one of the eight).
A 64-bit x86 app has access to 16 64-bit "general purpose" registers. Keep the frame pointer and your app goes from 6 32-bit registers to 14 64-bit registers; optimize away the frame pointer (if you can) and it's 7 versus 15.
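Counting it out in Python: the lists below are the architectural general-purpose register names, and the instruction pointer (EIP/RIP) is not among them. `usable` is a hypothetical helper. It subtracts the stack pointer and, optionally, the frame pointer.

```python
GPRS_32 = ["eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"]
GPRS_64 = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"] + [
    f"r{n}" for n in range(8, 16)  # r8..r15, new in x86-64
]


def usable(regs, keep_frame_pointer):
    """Registers left for values once the stack pointer (and optionally the
    frame pointer) are reserved."""
    reserved = {regs[7]}                 # esp / rsp
    if keep_frame_pointer:
        reserved.add(regs[6])            # ebp / rbp
    return len(regs) - len(reserved)


print(usable(GPRS_32, True), usable(GPRS_64, True))    # 6 14
print(usable(GPRS_32, False), usable(GPRS_64, False))  # 7 15
```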
Of course, when you wrote your app you didn't do stupid brain-dead shit like "gee, size_t is really an unsigned int, so I'll use that to hold this pointer value", now did you?
And for those that want the best of both worlds, there is the x32 ABI, which uses all the good stuff from x86-64 (more registers, better floating-point performance, faster position-independent code shared libraries, function parameters passed via registers, faster syscall instruction... ) while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers.
They're working on porting Linux to the new ABI...kernel and compiler support is there, not sure about all the userspace stuff.
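x32 saves memory mainly in pointer-heavy data. The minimal Python sketch below computes the C layout of a hypothetical linked-list node, `struct node { struct node *next; int value; }`, under i386, x32 and x86-64. The rule that each field aligns to its own size is an assumption. It holds for `int` and pointers on these ABIs.

```python
def c_struct_size(field_sizes):
    """Size of a C struct whose fields are naturally aligned (alignment equal
    to size, true for int and pointers on these ABIs), including tail padding."""
    offset = 0
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # pad up to the field's alignment
        offset += size
    align = max(field_sizes)
    return (offset + align - 1) // align * align     # pad to a multiple of the largest


INT = 4
POINTER = {"i386": 4, "x32": 4, "x86-64": 8}

# struct node { struct node *next; int value; };
for abi, ptr in POINTER.items():
    print(abi, c_struct_size([ptr, INT]))
# i386 8
# x32 8
# x86-64 16
```

On x86-64 the node doubles in size, partly through padding, while x32 keeps the 32-bit footprint and still gets the wider register set.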
If it's such a success, why does 64-bit software generally only run marginally faster than its 32-bit build? 64-bit binaries are larger and might run at 103% of the speed of 32-bit if you're lucky.
Sure, it helps with the 4GB memory space limit, but so can smart memory management and other approaches.
I could see it being useful for supercomputing things, but in general, there still just doesn't seem to be a point.
Wow, just wow. Do you actually work in the software field???
methinks he works in the middle school education field, and not as an instructor
With 32-bit, on some systems you get something like 2.5-3.7 GB of usable RAM. And yes, video RAM eats from the 4 GB pool.
I'm not sure if you're agreeing with the parent or disagreeing. Per-process you get less than 2 GB, for the system as a whole you get somewhat less than 4GB (depending on how much the system has mapped to something else). PAE can hack around the 4GB system limit somewhat.
He IS too stupid to be real.
Uh, no. He never said anything like that. But hey, don't let the facts stop you... just keep repeating that retarded meme.
The program is written in C#. Only MS knows what is going on there.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
I just want all of you to know that you're making this thread very hard for me to score, because I'm not sure who's exactly right and who isn't. That is all. You may carry on now.
In my day, a Beowulf cluster had 128MB of RAM per node.
Uphill. Both ways. In the snow.
I am very small, utmostly microscopic.
Snow?
My first computer required that you toggle in the boot loader binary code from front panel switches!
That has to be the modern equivalent of hand crank started horseless carriages.
In our algorithms lab there were programs that would gain more than 2x when compiled for 64 bit.
A more "real-world" example is when I started in 2005 at my current company. The engineers had 6-month-old P4s @ 3.2 or 3.4GHz, running 32-bit Linux. For a project they used Visual Studio on VMware, and it took over a minute to compile the project. The company allowed engineers to choose their hardware, so I built an Athlon 64 @ 2.2 or 2.4GHz and ran 64-bit SuSE on it. I remember the shock and awe from the first time I tried to compile the project under VMware - a little more than 10 secs - the engineer next to me had his jaw drop. Of course most of the engineers immediately requested to switch to 64-bit machines. I am not sure why it made such a difference in that application - perhaps the 16 general purpose registers come in really handy in this scenario? Of course it didn't help that the P4 was slower in everything (funny how at the time very few reviews really clarified this), but not an order of magnitude slower...
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
I never knew it was supposed to be faster.
That's why I switched to using 1 bit microprocessors. My programs are really small now. I just wrote a database which I can fit in my pocket.
Do you? For average PC applications (browsing the web, e-mail, office documents) 64-bit gives no advantage. For above-average applications (multimedia creation/editing, CADD, running multiple VMs) it's very helpful.
On Debian Linux I can push a stupid Zynga game running in Flash past 3GB of RAM. For multimedia creation/editing you bet your sweet ass 64 bits matters. Then again, Linux doesn't have shit like GCD and quality OpenCL built into the OS, with app suites that can leverage both and welcome 32/64 GB of RAM with open arms. Quality drivers, quality OpenCL/OpenGL etc. are coming with all the hard work at LLVM/Clang, Mesa and more. When that shit lands, you better believe 64-bit matters; any heavy engineering/scientific computing, and Blender modeling/rendering, damn well loves it. So does GIMP.
Kristopeit > ShanghaiBill
The man created dozens upon dozens of accounts due to his numerous bannings, and to this day people still debate whether he was real or a script.
Top. Shelf. Troll.
I think if you understood how truly horrifying PAE is, you would have no doubt at all that 64-bit platforms were the way to go. There's a lot of memory management cruft in the Linux kernel that x86_64 eliminates.
x86_64 also slipped in a few much needed enhancements to the ia32 architecture, including some extra general purpose registers.
http://en.wikipedia.org/wiki/X86-64
I am constantly hitting the 3.3 GB RAM cap on my 32-bit machine at work just having the applications I need to do my job open at the same time. And because the hard drive is fully encrypted, using it for swap space is extremely expensive. I would kill someone for a 64-bit machine at times just for the increased RAM space.
Yeah, he's trolling in real life. This is the email I sent him to refute his garbage. No response. Imagine that.
Are you really serious about having 650 thousand lines in your hosts file? I can't imagine why you'd need that many. It also has a crippling effect on one's computer.
To test this, I created a sample copy of a hosts file with that many entries, using the "0" shorthand for IP address and a randomized hostname of average 32 characters. Total size of this file is 22855 kilobytes, and after an hour the DNS cache had only loaded a third of it in. This is primarily due to the choice of algorithm used by the DNS cache service - it wasn't designed for tens of thousands of hosts file entries, so it uses a rather inefficient method of growing its storage that involves copying huge swathes of data around for each new entry. It also blocked any name lookups while loading the file.
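The comment doesn't say exactly how the DNS cache grows its storage, so here is a generic sketch of why copying everything on each insert hurts. The Python below counts element copies at the 65k-entry size from the test, under two hypothetical policies: grow-by-one and geometric doubling. Neither function is the Windows implementation.

```python
def copies_needed(n: int, grow) -> int:
    """Count element copies made while appending n entries to a table that is
    reallocated, and fully copied, whenever it runs out of room."""
    capacity = length = copies = 0
    for _ in range(n):
        if length == capacity:
            capacity = grow(capacity)
            copies += length          # every existing entry moves
        length += 1
    return copies


def grow_by_one(capacity: int) -> int:
    return capacity + 1


def double(capacity: int) -> int:
    return max(1, capacity * 2)


print(copies_needed(65_000, grow_by_one))  # 2112467500, i.e. n*(n-1)/2
print(copies_needed(65_000, double))       # 65535
```

Under grow-by-one, ten times the entries (650k) means roughly a hundred times the copies. That would be consistent with the jump from minutes to hours described here.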
So instead of this, I tried with only 65k entries, and made three copies of this file. Each had an identical list of hostnames, but used "0", "0.0.0.0" and "127.0.0.1" respectively. The DNS cache now took 1 minute 55 seconds to load each one; the choice of IP address style didn't make any difference to the loading time as the bulk of the processing was in inserting new entries as described in the paragraph above. Name resolution was at normal speed after that, though. Searching in-cache - even for such a large set of data - added no discernible penalty.
I decided to try with the DNS cache disabled. This isn't a good idea, as it forces uncached name resolution to be done for every single lookup. This is indeed what it did, and the original 650,000-entry hosts files added around 3 seconds onto every single name lookup, the amalgamated effect of which slowed general Internet access down considerably. Unlike the DNS cache loading, this time there was a slight difference in loading times between the different hosts files - this was expected, as it was reading the entire file each time, so that became the bottleneck.
Finally, to address your last question: every IPv4 address is stored in the cache using the same size of four bytes, e.g. both "0" and "0.0.0.0" become 00 00 00 00, both "127.1" and "127.0.0.1" become 7F 00 00 01, and so on. This is consistent with the binary format used in the sockets API.
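The same normalization can be seen with the C library's inet_aton(), which Python exposes as socket.inet_aton. With glibc, "0" and "0.0.0.0" both pack to 00 00 00 00, and "127.1" and "127.0.0.1" both pack to 7f 00 00 01. This only illustrates the 4-byte form. It says nothing about how the Windows DNS Client parses hosts files internally.

```python
import socket

# inet_aton() accepts the classic shorthand spellings and packs each one
# into the 4-byte, network-order form used by the sockets API.
for text in ["0", "0.0.0.0", "127.1", "127.0.0.1"]:
    print(f"{text:>9} -> {socket.inet_aton(text).hex(' ')}")
```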
In conclusion, using the hosts file to store tens of thousands of entries has a negative effect on the performance of Windows' name resolution. You should really consider another option to filter all those hostnames.
Ahem, there was no x86 in the title when it was posted.
An awful lot of people run 10-year-old computers, and an awful lot of people also run XP on computers that could handle 64-bit Windows 7 or 64-bit Linux. So you'd better have a 32-bit version of your program (Google Chrome, Google Earth, Firefox, whatever).
Though, it ought to be easier to have a fully 64-bit system (a Linux distro without Wine might do it, if you're careful not to install 32-bit software and if Chromium and/or Firefox are 64-bit there). But the only benefit is not storing and running duplicate 32-bit libraries.
Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache coherency of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.
You're arguing on the correct side, but what you wrote here is badly flawed. Packing multiple 32-bit values into a 64-bit register is near worthless; what is valuable is that amd64 gives you twice as many general-purpose registers (which also happen to be 64 bits wide). A far bigger gain for 64-bit on x86 was the addition of instruction-pointer-relative (RIP-relative) data addressing. In 32-bit code, jumps and calls are already relative, but data references have to use absolute addresses; in 64-bit mode, software can address data relative to the program counter. This helps a great deal with libraries, since instead of needing large relocation tables (or tying up a register as a pointer to the global offset table), they simply use relative references that are valid no matter what address the library is loaded at. With most processors, using 64-bit mode loses performance due to having to shuffle more data around; x86 is about the only one that gains performance.
If the OP compares each file with every file, that would be CPU bound. With a well-chosen hash table it shouldn't be.
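The hash-table approach, sketched in Python. This is not the PowerShell script linked earlier, whose method isn't shown here. Files are bucketed by size, only the sizes that collide get hashed, and the results are grouped by digest, so no file is compared against every other. `find_duplicates` is a hypothetical name. Reading each file whole is a simplification that a real tool would replace with chunked hashing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Return groups of paths under root whose contents are identical."""
    # Pass 1: bucket by size; a file with a unique size can't have a twin.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only the files that share a size, and bucket by digest.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[(size, digest)].append(path)

    return [group for group in by_digest.values() if len(group) > 1]
```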
If it's such a success, why does 64-bit software generally only run marginally faster than its 32-bit build?
32? Heck, I'm still using 16. 32 and 64 are just a market gimmick to keep you buying new hardware.
so you have been properly trolled. i am appoint.
Is this still Slashdot? Nobody mentioned that Linux supported x86-64 in 2001, before it was even released, while Windows was stuck at 32-bit for another four years.
PAE is more or less old school segmentation. You can't say 'it has a 3% slowdown', because it has 0 slowdown if that particular page is already in memory, and if not ... it has the same 'slowdown' as any other paging operation plus a fixed number of cycles. So if you're dealing with tiny amounts of 'more than 2/3GB', then the overhead is a lot higher than if you're mapping out 2GB on every window change. PAE is just another form of paging. It is slower, but you're making numbers up from nothingness.
The integer math performance of the processor has nothing to do with it being 64-bit. Most (all now?) x86-64 processors will internally process two 32-bit numbers in the same span as one 64-bit number if properly optimized by sending the 32-bit values through together. 64-bit code using less than the OS maximum for 32-bit code is actually slower than 32-bit code, due to the increased pointer sizes wasting the processor's registers by filling them with 0s.
You really have no idea how processors work. While nothing you said is illogical, it is still in fact wrong in every account. Under the hood, processors don't work anything like they do on the surface.
Other processors also do other weird things. I have an 8-bit CPU that can handle 32-bit numbers in a single clock cycle, exactly like it does 8-bit numbers ... and the neat thing ... it can do two 16-bit numbers in a single clock cycle! Why? Because the processor as I see it from a software developer's perspective isn't anything like the actual hardware doing the work. Processors have translation units in front of them to provide you with one look while allowing themselves to rewire the backend in all sorts of different ways.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.
Perhaps not most, but a whole awful lot of programs want more than that. I'd say the mean for "large" apps on my laptop is 1.5GB, and the resident size distribution is (to my eye) more or less gaussian. That means that few apps want more than 2GB today, but if the average app grew by 33%, about half of them would be over the 31-bit size limit.
Dewey, what part of this looks like authorities should be involved?
Snow?
My first computer required that you toggle in the boot loader binary code from front panel switches!
That has to be the modern equivalent of hand crank started horseless carriages.
Takes me back to loading those Interdata model 3s with the front buttons so we could load the paper tape. Then we could watch the registers with lights on the front as our code executed. Ah glad those days are over.
Then you did something wrong.
There is no logical reason that an x86-64 processor in 64-bit mode would perform faster than in 32-bit mode unless you are memory constrained. Raw operations are not inherently faster in 64-bit mode than they are in 32-bit mode.
If you are not exceeding 32 bit memory limits, your 64 bit version SHOULD be a tiny little bit slower than the 32 bit version.
Let me guess, you ran it in 32 bit mode, then ran it again immediately after in 64 bit mode ... and then ignored the disk cache completely?
Really? PAE is bad? Have you just learned to completely ignore segmentation unless it's named PAE?
Segmentation on x86 is utter tripe as well, but PAE is nothing but a spec on top of the other mess of bullshit known as segmentation.
My first point was that PAE does have an overhead somewhat larger than the 3% the parent mentioned.
And that overhead increases with the amount of RAM you have. Sure, 32 GB of RAM has very little overhead with PAE. That is, of course, unless you actually use the 32 GB of RAM, and then it will be constantly swapping memory pages around.
Yes I know most people don't use that much RAM. My point is still valid.
Also, my second point was that 64-bit processors handle big numbers faster, not small numbers slower.
Yes different architectures can behave differently. We are talking about x86 though.
Fact: 64bit x86 will process a 64bit number faster than 32bit x86.
Practically speaking, PAE (the NAT of memory addressing) is sufficient for the vast majority of users (and the specialized applications requiring single huge memory spaces are moving to specialized compute nodes). The one desktop use case where an application would require >4GB of memory was a browser with a ton of tabs, but browsers have started moving each tab into its own process and reaping a security gain to boot. I'd much rather the adoption had gone the other way around, with IPv6 becoming commonplace and x64 languishing. Processes on my computer should be using the OS's IPC architecture anyway; nodes on the Internet have bigger benefits from being full hosts.
"Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
My experience with moving applications to 64-bit that didn't need the massive single memory space was that I started paging a lot more, since they were allocating words twice as wide (and while I could address every molecule in the computer separately, the same number of them were still memory). Physical memories have since expanded to compensate, but I'd like to see some statistics on the entropy of the upper 32 bits of the average QWORD.
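The comment asks for statistics it doesn't supply. The helper below is one way someone could measure them over a memory dump; obtaining the dump isn't shown, and the sample values are invented. It reads the bytes as little-endian 64-bit words (x86 byte order) and returns two numbers. The first is the fraction of words whose upper 32 bits are zero. The second is the Shannon entropy of those upper halves, in bits.

```python
import math
import struct
from collections import Counter


def upper_half_stats(dump: bytes):
    """Return (fraction of 64-bit words whose upper 32 bits are zero,
    Shannon entropy in bits of the upper halves)."""
    usable = len(dump) // 8 * 8                      # ignore a trailing partial word
    highs = [word >> 32 for (word,) in struct.iter_unpack("<Q", dump[:usable])]
    if not highs:
        return 0.0, 0.0
    n = len(highs)
    zero_fraction = highs.count(0) / n
    counts = Counter(highs)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return zero_fraction, entropy


# Made-up sample: four small integers and one pointer-like value.
sample = struct.pack("<5Q", 0, 1, 42, 4096, 0x7FFE_1234_5678)
print(upper_half_stats(sample))  # about (0.8, 0.72)
```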
"Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
Maybe no one felt like sucking the Linux dick today.
but PAE is nothing but a spec on top of the other mess of bullshit known as segmentation.
Actually, no, it's a mode that changes the page table format to allow larger physical addresses in page table entries. Nothing to do with segmentation.
Notwithstanding all of that, amd64 also has more registers, so there's less having to move stuff to and from memory, and you can make most function calls by passing parameters in registers instead of on the stack. amd64 provides a worthwhile increase in performance just due to having twice as many general purpose registers (actually, more than twice as many, because there's only really 4 proper general purpose registers on 32-bit x86 - amd64 adds 8 more registers).
Oolite: Elite-like game. For Mac, Linux and Windows
64-bit binaries are larger and might run at 103% of the speed of 32-bit if you're lucky.
Maybe there is a lot of software written in C that uses int or unsigned when it should have typedef'd a size appropriate for its needs.
Software that's written in C, in all of the environments I know of for x86, has 32-bit ints (signed or unsigned) whether compiled 32-bit or 64-bit, so you're presumably not saying that those programs suddenly get 64-bit ints when compiled 64-bit. They will get 64-bit longs on UN*X (but not on Windows), and will get 64-bit pointers in either case.
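The Python check below uses ctypes to ask which data model the running interpreter's C compiler used. ILP32, LP64 and LLP64 are the standard names for the three cases described; the dictionary just summarizes them. The last lines show the pointer-truncation bug mentioned earlier in the thread, with `c_uint32` standing in for an `unsigned int` variable.

```python
import ctypes

# Byte sizes under the three data models: int is 32-bit in all of them;
# long and pointers are what differ.
DATA_MODELS = {
    "ILP32 (32-bit x86)": {"int": 4, "long": 4, "pointer": 4},
    "LP64 (64-bit UN*X)": {"int": 4, "long": 8, "pointer": 8},
    "LLP64 (64-bit Windows)": {"int": 4, "long": 4, "pointer": 8},
}

# The model the running Python interpreter was built for.
native = {
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
    "pointer": ctypes.sizeof(ctypes.c_void_p),
}
matches = [name for name, sizes in DATA_MODELS.items() if sizes == native]
print(native, matches)

# Storing a pointer in a 32-bit unsigned int keeps only the low 32 bits;
# on a 64-bit build that loses the address whenever it lies above 4 GiB.
buf = ctypes.create_string_buffer(16)
address = ctypes.addressof(buf)
truncated = ctypes.c_uint32(address).value
print(hex(address), hex(truncated))
```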
x64 has twice as many registers. That alone means less having to move stuff in and out of memory, so that will improve the speed when compared to 32 bit applications. 32 bit x86 has only 4 truly general purpose registers. x64 adds another 8 64 bit registers.
PAE is more or less old school segmentation.
PAE isn't segmentation at all. It's a mode that changes the page table entry format to support more physical memory. Maybe you're thinking of something else as being "PAE", but Intel's (and AMD's and...) idea of PAE is a Physical Address Extension.
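Here is the page-table point made concrete. PAE changes how a 32-bit virtual address is split during the page walk: legacy paging uses 10 + 10 + 12 bits, while PAE uses 2 + 9 + 9 + 12. The split changes because entries grow from 4 to 8 bytes, so a 4 KiB table holds 512 of them, and the wider entries can hold physical addresses beyond 32 bits (36 bits, i.e. 64 GiB, on the original implementations). No segment registers are involved. The helper names are hypothetical; the bit fields follow the Intel and AMD manuals.

```python
def split_legacy(va: int):
    """Non-PAE 32-bit paging: 10-bit directory, 10-bit table, 12-bit offset."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def split_pae(va: int):
    """PAE paging: 2-bit PDPT index, 9-bit directory, 9-bit table, 12-bit offset."""
    return (va >> 30) & 0x3, (va >> 21) & 0x1FF, (va >> 12) & 0x1FF, va & 0xFFF


va = 0xC0A0_1234
print(split_legacy(va))  # (770, 513, 564)
print(split_pae(va))     # (3, 5, 1, 564)
```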
I frequently have to run a VM on my laptop. When I had 32-bit Win7 running with 4 GB of RAM it was painfully slow, 8-10 minutes between boot and getting a login. After reinstalling with 64-bit Win7 on the exact same hardware with the same VM, boot time went down to 3-4 minutes. With 8 GB of RAM boot time was around 2 minutes, not even enough time to go get a cup of coffee.
"Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
Why would a 64-bit program be slower when modern processors are optimized for 64-bit programs?
If you ignore ACs because they are anonymous - you're an idiot.
Does Windows actually have any middle ground that would allow an application to directly do its own segmented memory management, kind of like 64k realmode on steroids? Say, asking Windows for 9 gigs... 1 handled normally, and used for the program's "normal" runtime objects, variables, data structures, etc... and 8 that are stacked into the same 1-gigabyte address space and virtually bank-switched upon request of the app itself? Or some smaller segment size that happens to better-fit the data being banked? Just to give one example, this would pretty much solve the "Photoshop Problem" (and most video-decompression GOP-buffering problems, too). You could even resurrect some tricks from the old realmode toolbox, like arranging the banks so you can sequentially fetch the layers from the same pointer offset, and just toggle the segment pointer until you're ready to move on to the next chunk of byteplanes.
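Windows does offer something along these lines: Address Windowing Extensions (AWE). With AWE, a process allocates physical pages with AllocateUserPhysicalPages and maps them into a reserved window with MapUserPhysicalPages. The Python below does not call AWE. It is a hypothetical BankedBuffer class that only simulates the bank-switching idea, including the comment's trick of reading every layer at the same offset.

```python
class BankedBuffer:
    """Hypothetical simulation of bank switching: many fixed-size banks of
    backing storage, but only one is visible through the window at a time."""

    def __init__(self, bank_size: int, bank_count: int):
        self.banks = [bytearray(bank_size) for _ in range(bank_count)]
        self.current = 0
        self.switches = 0

    def select(self, bank: int) -> None:
        """Point the window at another bank (AWE would remap pages here)."""
        if bank != self.current:
            self.current = bank
            self.switches += 1

    @property
    def window(self) -> memoryview:
        return memoryview(self.banks[self.current])


# One "layer" per bank, always read back from the same offset.
buf = BankedBuffer(bank_size=4096, bank_count=8)
for layer in range(8):
    buf.select(layer)
    buf.window[100] = layer * 10

values = []
for layer in range(8):
    buf.select(layer)
    values.append(buf.window[100])
print(values, buf.switches)  # [0, 10, 20, 30, 40, 50, 60, 70] 15
```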
Complicated? Of course... but sometimes, you HAVE to touch the bare metal if you want to push the boundaries and redefine what a given piece of hardware can do ;-)
http://www.youtube.com/watch?v=PRrXi411ESA
http://www.youtube.com/watch?v=w6Ge6G9sT9E
Or, put another way... when Future Crew created Second Reality ( http://www.youtube.com/watch?v=XtCW-axRJV8 ) ~20 years ago, it wasn't written in an object-oriented language that ran under a VM.
No Linux fanboys on Slashdot today? That's hard to believe. That's about as likely as not having anyone who just can't admit, after all these years of fail, that their "team", Microsoft, sucks horribly. So bad that not only is Apple spanking them in sales, but a bunch of greasy, toenail-fungus-eating hippies programming in their spare time kicked the crap out of Microsoft for servers, embedded systems, and just about anything that's not attached to a 19" CRT.
And yet I only have a handful of 64-bit programs on this Windows machine.
32 GB Ram High-Five! Seriously, anytime Asus is feeling poor, they can release a Crosshair motherboard that takes 64 GB or perhaps 128 GB of RAM.
I am not through upgrading until I can virtualize the speed and location of every particle in the universe. Then I'm going to see what exactly this Time dimension actually looks like from a different angle. Maybe. I have a few other ideas, but I probably won't be allowed near a computer this powerful if I announce them all at once. =^_^=
I am John Hurt.
He's never written anything that's tested the limits of computing...
Meanwhile, I need only load up my badly coded evolutionary program to see my machine scream at the ~12 GB hit to the RAM. I say badly coded because I have found a few tricks to help get some additional memory savings out of it...also on topic, the aggression level was kind of low, so I imagine future tests might break the 32 GB barrier easily. Currently thinking of giving it an SSD for virtual memory...
Thank you. The people spouting nonsense about 32-bit programming, and how they can't understand why 64-bit computing would be faster (in the x86 world), drive me loony...it's like they missed an entire year's worth of classes where we went over, in detail, the various changes, and why it's faster...and they have the gall to ask for your notebook the night before the final. I mean, it's impressive, that kind of blindness, but they aren't getting the notebook without a pimp slap to go with it (extra baby powder).
It's kind of like watching the functional programming people slowly reinvent OOP...makes me scream inside. "Dude, we've figured out a new way to organize our methods / fields so that it's easier to keep them straight in our heads..." "Please God, let it not be OOP." "*talks for a bit*" "Damn it."
Most programs don't need a GUI...but they tend to function better with one. Most computers don't need an SSD...but they tend to run faster with one, and users tend to agree that you can have your SSD back when you pry it from their cold dead fingers.
You don't have to fly First Class, you're getting there at the same time as the people in Business or Economy class...but it's a lot nicer.
I don't.
It sounds like you were just talking to a very bad functional programmer. You also have the order completely backwards. ANSI Common Lisp was the first standardized OO language. But more importantly, most "OO" concepts come from functional languages to start with.
Design patterns for the most part are actually adaptations of pre-existing functional concepts. For example Chain of Responsibility is really just a slightly simplified monad (input must equal output). The first Iterator pattern was (map fn list). Flyweight is a simplified form of Memoization.
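Here is a rough Python illustration of two of the correspondences claimed above. First, `map` does the job of an explicit iterator loop. Second, memoization (`functools.lru_cache`) shares results the way a Flyweight shares instances. This sketches the analogy only and does not claim the patterns are formally identical. `square`, `Color`, and `make_color` are hypothetical examples.

```python
from functools import lru_cache


def square(x):
    return x * x


# Iterator pattern: drive an explicit iterator object until it is exhausted.
def squares_with_iterator(items):
    it = iter(items)
    out = []
    while True:
        try:
            out.append(square(next(it)))
        except StopIteration:
            return out


# The functional equivalent, (map fn list).
def squares_with_map(items):
    return list(map(square, items))


class Color:
    """Stands in for an expensive object that is safe to share."""

    def __init__(self, name):
        self.name = name


# Flyweight via memoization: identical requests return the same shared instance.
@lru_cache(maxsize=None)
def make_color(name):
    return Color(name)
```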
Packages and namespaces also first appeared in many functional languages. Encapsulation via lexical closures has been around since Scheme was invented in the '70s. Lambda functions? Those little gems, making their way into every OOP language, were invented with Lisp.
You have missed the entire point, though, if you think OOP is about organizing your programs or something. OOP is largely about encapsulating moving parts into logical pieces. Functional code is largely about minimizing or removing "state" (aka moving parts) from your code. E.g. an input to a function should always give the same output. These concepts are not incompatible at all.
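To make these two points concrete, here is a minimal Python sketch. A lexical closure can encapsulate private state much like an object's private field. A pure function keeps no state at all and always maps the same input to the same output. `make_counter` and `add_tax` are hypothetical names.

```python
def make_counter():
    """Encapsulation via closure: `count` is reachable only through the returned functions."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    def current():
        return count

    return increment, current


def add_tax(amount, rate):
    """Pure function: no hidden state, so the same inputs always give the same output."""
    return round(amount * (1 + rate), 2)
```

Each call to `make_counter` creates an independent, private `count`, which is the "moving part" an object would otherwise hold in a field.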
PAE is more or less old school segmentation.
PAE isn't segmentation at all.
I suppose it depends on how you look at it. If you view the page directory as a bank select then it is a sort of segmentation.
PC-relative addressing makes position-independent code significantly faster. This is useful for shared libraries, but also for position-independent executables which, in combination with address space randomisation, add some security.
SSE is guaranteed to exist. This alone accounts for most of the speedup, because compiling for x87 is really hard (crazy hybrid of a stack- and a register-based architecture), so generating SSE ops for floating point, even if you're only doing scalar arithmetic, is a lot more efficient.
More GPRs. x86-32 code ends up with a lot of stack spills because it only has a tiny number of general-purpose registers. x86-64 has 16, which makes it a lot easier to work with.
64-bit registers. On x86-32, 64-bit arithmetic is painful, because you need two registers for each of the operands, and you only have 6 registers to use (two of which must be used for the destination in a lot of ops). On x86-64, it's a lot easier to do sequences of 64-bit arithmetic without spills.
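The cost of 64-bit arithmetic on x86-32 can be sketched by emulating it. Each 64-bit value is held as a (high, low) pair of 32-bit words. An add then needs a low-word add plus a carry into the high word, which is the `add`/`adc` pair a 32-bit compiler emits. Python integers are unbounded, so the masking below models the 32-bit registers. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split an unsigned 64-bit value into (high, low) 32-bit words."""
    return (x >> 32) & MASK32, x & MASK32


def join64(high, low):
    """Recombine two words, masking each back to 32 bits."""
    return ((high & MASK32) << 32) | (low & MASK32)


def add64_on_32bit(a, b):
    """Add two 64-bit values using only 32-bit word operations (mod 2**64)."""
    a_hi, a_lo = split64(a)
    b_hi, b_lo = split64(b)
    lo = a_lo + b_lo              # like ADD on the low words
    carry = lo >> 32              # carry out of bit 31
    hi = a_hi + b_hi + carry      # like ADC on the high words
    return join64(hi, lo)


def add64_native(a, b):
    """On x86-64 this is a single 64-bit ADD."""
    return (a + b) & 0xFFFFFFFFFFFFFFFF
```

Each operand occupies two registers in the 32-bit version, which is where the register pressure and spills described above come from.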
I am TheRaven on Soylent News
There is no logical reason that an x86-64 processor in 64-bit mode would perform faster than in 32-bit mode unless you are memory constrained.
Or you benefit from more registers. Or you benefit from vastly more 64-bit registers. Or you're doing floating point and benefit from the compiler being able to assume SSE is present and never use x87 arithmetic. Or you're using shared libraries, so you benefit from faster position-independent code. But, apart from that, no logical reason at all...
Then this raving nutcase / troll can post his mind spool as much as he likes but its impact will be minimal.
So good in that AMD got the contract. It is money, no question about it, and the console market is not small. Better (for them) they should have it than IBM or someone.
So how is it bad? Low, low margins.
Consoles are very cost driven devices. Often sold at a loss initially, and then little to no profit later. The reason is they want to pack as much hardware as they can in for as cheap as they can. Well the other side of that is they lean on suppliers, hard, to offer very low prices. They don't give their suppliers a lot of profit. They don't force them to take a loss or anything (the suppliers wouldn't agree) but it is just this side of it.
So selling 50 million units for consoles is way less profitable than selling 50 million units for laptops, desktops, servers, that kind of thing.
Hence while it is better than having no sales at all, it is not as good as taking a bigger slice of the computer market.
Simula 67 was standardised in 1968. Common Lisp dates from 1984, and the OO implementation it includes (CLOS) was a relatively recent development at the time. CLOS is also a hack, although Lisp bores try and pretend it isn't by claiming the omissions make it "more powerful".
I agree that 64-bit machines are somewhat niche, but I work in that niche.
If you do anything serious with Java, on Windows, because of the memory layout and the insistence of the HotSpot VM on being allocated contiguous stretches of address space, you're limited to about 1.2GB of heap space. When you have a domain that has object counts in the 3 - 5 million region, that fills up rapidly. This is for a big graph of objects and the queries for them involve lots of graph traversal. The code in question can do set queries in about 0.5s that an RDBMS takes over 5 mins to do, so there's a real value to caching all the objects on the heap.
Yes, I could use another language that doesn't have a stupid VM and have ample overhead in 4GB, although this data set will grow (even if it's not "social network" level of growth). But with working code in Java, it's much cheaper and easier to throw a 64-bit OS and another stick of RAM at it.
A shame that my employer is still tragically stuck in the 90s and thinks 32 bits should be enough for anyone..
He's de-duping files with SHA512, from the listing.
That will get a major boost on 64-bit machines just because of the increased word width. I imagine the hashing step is what is consuming most of the CPU time, and making the code CPU bound instead of I/O bound.
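Here is a minimal sketch of hash-based de-duplication like the one described. It uses Python's `hashlib.sha512` and reads files in chunks. The chunk size and function names are assumptions. SHA-512 is defined on 64-bit words, which is one reason it benefits from 64-bit registers.

```python
import hashlib
import os
from collections import defaultdict


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under `root` by SHA-512; return groups of two or more."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_hash[sha512_of_file(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Disk reads still bound the run on slow storage. Once the data is cached, the per-chunk `update` calls dominate, which is where the wider registers help.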
64-bitness was never about performance. It was always about larger address space. The fact that in some cases there is a performance increase is just a bonus.
You had snow??? All we had was a steady rain of comets and asteroids.
It's kind of like watching the functional programming people slowly reinvent OOP...makes me scream inside. "Dude, we've figured out a new way to organize our methods / fields so that it's easier to keep them straight in our heads..." "Please God, let it not be OOP." "*talks for a bit*" "Damn it."
I find it quite funny when the OO-crowd goes off like this :-)
(In case no one else clues you in: you've got it backward - Functional came first, and gave the world OO. OO now constantly reinvents everything that Lisp had, under the guise of "new and improved".)
I'm a minority race. Save your vitriol for white people.
If you're heavily multitasking, perhaps not so much, but individual applications still only have a 32-bit address space (some of which is reserved). That means even if you have 64GB of RAM, your video editing program that would love some larger buffers is still limited to 3-4GB. So most of the RAM above 4GB will rarely be used, which is not a very efficient use of it.
For anyone interested in learning more about x86-64, Coursera, in conjunction with UWashington, just started a "Hardware/Software Interface" course that focuses on 64-bit processors.
You had it good. Those clay tablets were a bitch to load.
I am very small, utmostly microscopic.
This is simply incorrect about PAE. Why isn't it modded into oblivion?
Seriously, anytime Asus is feeling poor, they can release a Crosshair motherboard that takes 64 GB or perhaps 128 GB of RAM.
How much RAM we can put in our desktops is not really up to motherboard manufacturers like ASUS; it's up to the CPU and RAM manufacturers.
Current Intel mainstream desktop CPUs support four DIMMs and current high-end desktop CPUs support eight DIMMs. Afaict the largest DIMM of desktop memory* currently available is 8GB. So the current limit is 32GB for mainstream desktop and 64GB for high-end desktop. I believe that the high-end desktop stuff theoretically supports 128GB, but no one makes the DIMMs needed to do it yet.
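The limits above come from multiplying the slot count by the largest available module. Here is a trivial Python check of the quoted figures. It assumes 8GB is the largest unregistered DDR3 DIMM, and 16GB for the theoretical 128GB case.

```python
def max_ram_gb(dimm_slots, largest_dimm_gb):
    """Maximum installable RAM = number of DIMM slots x largest supported DIMM."""
    return dimm_slots * largest_dimm_gb


mainstream = max_ram_gb(4, 8)    # mainstream desktop: four 8GB DIMMs
high_end = max_ram_gb(8, 8)      # high-end desktop: eight 8GB DIMMs
theoretical = max_ram_gb(8, 16)  # same eight slots, if 16GB unregistered DIMMs existed
```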
Workstation/server platforms can take a lot more than that both through supporting more DIMMs and through supporting types of DIMM that come in higher capacities. I've seen systems that claim support for up to 2TB of ram.
* DDR3, unregistered non-ecc.
note: I'm known as plugwash most places, but I screwed up registering that here somehow in the past and now can't register
Every 64-bit platform I'm aware of still has a 32-bit int. There may be some software that will waste memory on Unix-like systems when it uses "long" (which is typically 64-bit on 64-bit Unix-like systems) where a 32-bit value is fine, but I doubt that is significant in the grand scheme of things.
The code itself is usually slightly bigger on x86-64 than on x86, which is probably what the GP was referring to. But in the grand scheme of things, code is usually pretty small. The greater efficiency of position-independent code also offsets this, because it reduces the chance of multiple copies of the same code being loaded at once due to load-time relocations.
The real problem is pointer heavy code. If a program uses data structures that are mostly made up of pointers (or integers that could potentially contain a typecasted pointer and therefore need to be pointer-sized) then those data structures will nearly double in size on x64.
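The pointer-size effect can be sketched with a tiny layout calculator. The example is a doubly linked list node holding one `int` and two pointers. It is laid out with natural alignment under ILP32 (x86-32) and LP64 (64-bit Unix-like) sizes. The calculator and node shape are assumptions for illustration. Real compilers follow their platform ABI, which gives the same result for this simple case.

```python
# Field sizes in bytes under two common data models.
ILP32 = {"int": 4, "long": 4, "pointer": 4}
LP64 = {"int": 4, "long": 8, "pointer": 8}


def struct_size(field_types, model):
    """Size of a C struct with natural alignment (each field aligned to its own size)."""
    offset = 0
    max_align = 1
    for t in field_types:
        size = model[t]
        offset = (offset + size - 1) // size * size  # pad up to the field's alignment
        offset += size
        max_align = max(max_align, size)
    # Pad the whole struct so that arrays of it keep every element aligned.
    return (offset + max_align - 1) // max_align * max_align


node = ["int", "pointer", "pointer"]  # struct node { int value; struct node *prev, *next; }
```

The node grows from 12 to 24 bytes: the two pointers double in width, and padding after the `int` eats the rest. A struct of plain `int`s does not grow at all.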
If you do anything serious with Java, on Windows, because of the memory layout and the insistence of the HotSpot VM on being allocated contiguous stretches of address space, you're limited to about 1.2GB of heap space. When you have a domain that has object counts in the 3 - 5 million region, that fills up rapidly. This is for a big graph of objects and the queries for them involve lots of graph traversal. The code in question can do set queries in about 0.5s that an RDBMS takes over 5 mins to do, so there's a real value to caching all the objects on the heap.
Yes, I could use another language that doesn't have a stupid VM and have ample overhead in 4GB...
Actually, I think you're mistaken about how much more heap space you could get outside of Java. On Windows you can tune the amount of your address space taken by the kernel down and thus, possibly, start with as much as 3GB for user space available to your app. On Linux, you can't even do that and are stuck with 2GB to start. Load your libraries and runtime for *any* language and tool set, and you're not going to have all that much more than 1.2GB. Sure, maybe 1.5GB and on Windows maybe even a bit more. But until you go to 64-bit, you're stuck starting out with 1-2GB.
He's never written anything that's tested the limits of computing...
And he's making invalid assumptions about what ordinary users might need. It's not that 32-bit is enough because most users don't need anything that requires 64-bit, it's that most users have never been offered things that require 64-bit, because they didn't have it. You know what needs more heap than you'll get in 32-bit to work well? According to IBM researchers: voice recognition. Yep, their research finds that being able to keep around 2GB+ enables huge, qualitative improvements. In my own work, I've hit the limit trying to do something simpler: provide really good auto-complete suggestions based on the individual user's corpus of work. Then of course there's all sorts of other searching algorithms, there's a huge difference in usability when you can provide a user a "browsable" interface that responds literally at the speed of thought vs requiring the user to compose the entire query and then wait a few seconds.
Meanwhile, I need only load up my badly coded evolutionary program to see my machine scream at the ~12 GB hit to the RAM. I say badly coded because I have found a few tricks to help get some additional memory savings out of it...also on topic, the aggression level was kind of low, so I imagine future tests might break the 32 GB barrier easily. Currently thinking of giving it a SSD for virtual memory...
So is 64-bit with 64GB in your computer not an option for you?
But 64 is twice as big as 32, thus it must be twice as good!
Slashdot still doesnâ(TM)t support Unicode after it was added to the HTML standard in 1997.
Don't forget the obvious benefit of a 64-bit time_t data type when 2038 comes along.
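The 2038 problem follows from the range of a signed 32-bit `time_t`, whose largest value is 2^31 - 1 seconds after the Unix epoch. Here is a short Python check of where that lands. It also estimates how far a signed 64-bit counter reaches, approximately, using 365.2425-day years.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def last_representable_time(max_seconds):
    """Latest instant a signed time_t with this maximum value can represent."""
    return EPOCH + timedelta(seconds=max_seconds)


y2038 = last_representable_time(INT32_MAX)          # 2038-01-19 03:14:07 UTC
years_with_64bit = INT64_MAX / (365.2425 * 86400)   # roughly 2.9e11 years
```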
Current Intel mainstream desktop CPUs support four DIMMs and current high-end desktop CPUs support eight DIMMs
Not exactly. Memory controllers support ranks, not DIMMs. One rank is one fully populated bus width. Standard DDR memory controllers are 64 bits wide, and memory chips are typically 8 bits wide, meaning you have eight chips to a rank. The memory controllers on desktop CPUs typically support two ranks per channel at full speed and four ranks at reduced speed, so with two ranks per double-sided DIMM, that means two DIMMs per channel. On the other hand, if you get high-density quad-rank DIMMs, then you can only add one per channel.
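Here is a sketch of the rank arithmetic described above. A 64-bit data bus built from DRAM chips that are each 8 bits wide (x8) needs eight chips per rank. The per-channel rank limits come from the comment, and the function names are hypothetical.

```python
BUS_WIDTH_BITS = 64  # standard DDR channel data width, without ECC


def chips_per_rank(chip_width_bits, bus_width_bits=BUS_WIDTH_BITS):
    """Chips needed side by side to fill one rank of the bus width."""
    if bus_width_bits % chip_width_bits:
        raise ValueError("chip width must divide the bus width")
    return bus_width_bits // chip_width_bits


def dimms_per_channel(ranks_per_dimm, max_ranks_per_channel):
    """How many DIMMs of a given rank count fit under a channel's rank limit."""
    return max_ranks_per_channel // ranks_per_dimm
```

With a four-rank limit per channel, double-sided (dual-rank) DIMMs fit two per channel, while quad-rank DIMMs fit only one.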
> Afaict the largest DIMM of desktop memory* currently available is 8GB.
> * DDR3, unregistered non-ecc.
Depends how you define "desktop memory"
16 GB and 32 GB sticks are "available" in extremely limited supply:
$360 Kingston 16GB 240-Pin DDR3 SDRAM DDR3 1333 Desktop Memory Model KVR13LR9D4L/16
http://www.newegg.com/Product/Product.aspx?Item=N82E16820239525
$1400 HP 627814-B21 32GB DDR3 SDRAM Memory Module
http://www.newegg.com/Product/Product.aspx?Item=N82E16820326202
Not sure if this counts as desktop memory ... (technically NewEgg lists it as Server Memory)
$1400 IBM 32GB DDR3 ECC Registered DDR3 1066 (PC3 8500)
http://www.newegg.com/Product/Product.aspx?Item=N82E16820135081
> I've seen systems that claim support for up to 2TB of ram.
The HP ProLiant servers support up to 2 TB with 64 DIMM slots. Only $10K for the mobo, the RAM will only cost you $90K :-)
http://h10010.www1.hp.com/wwpc/us/en/sm/WF04a/15351-15351-3328412-241644-3328422.html?dnr=1
But yeah, looks like we have to wait another 10 - 20 years before we start seeing "normal" desktop motherboards support more than 128 GB. The 4 or 8 DIMM sockets will be "good enough" for a long time.
you're limited to about 1.2GB of heap space
That always pissed me off when trying to load large datasets. I remember buying a pricey, brand new dual-core Opteron and 2GB of memory back in 2005, so I could work on some things at home, and having to reboot into Linux to actually make use of it. Even on XP64, 32-bit applications still fell under the same restriction.
Actually, the x87 FPU has always been 80-bit precision, even on old 32-bit processors. There was no significant improvement in floating point performance between K7 and K8, besides clock rate. 32-bit versus 64-bit only holds relevance for integer math and pointers.
Thank you. The people spouting nonsense about 32-bit programming, and how they can't understand why 64-bit computing would be faster (in the x86 world) drive me loony...
To be fair, increased register space is completely independent of 32-bit versus 64-bit processors. It was a much needed architectural improvement that just happened to coincide with the transition. The only direct computational improvement of a 64-bit CPU is when doing 64-bit integer math.
Because various things suddenly take up twice as much memory, and thus require more memory bandwidth to operate at the same speed. In reality, the performance hit is slight, and more than accounted for by the increased register space available to applications properly compiled for x86-64.
Beyond me why they are having this argument at all. Is it even possible to buy a 32-bit PC any more? I have had 64-bit PCs for the last 5 years, running 64-bit software. I don't "need" 64-bit over 32-bit and I am sure that some of what I do, like editing, could be on 8-bit. It's not for any performance gain, 64-bit is just the current standard as far as I am concerned.
Yet there are millions of people out there running 32-bit OSes on 64-bit PCs - why?
Actually, you get however much of that memory is split off to userspace. The default on Windows is a 2GB/2GB split. Linux defaults to a 3GB/1GB split, offering more available to the application. In both cases, that is a user-configurable option.
There is no "31-bit size limit". It's a 32-bit computer, able to access 32-bits, or 4GB, of memory.
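Here is the split arithmetic behind these numbers. A 32-bit virtual address space is 2^32 bytes = 4 GiB, shared between kernel and user space. That leaves the user part at 2 GiB under a 2GB/2GB split and 3 GiB under a 3GB/1GB split. This is a minimal sketch. The table reflects the defaults and the `/3GB` option described in this thread.

```python
GIB = 2**30
ADDRESS_SPACE = 2**32  # bytes addressable with a 32-bit pointer


def user_space_bytes(kernel_gib):
    """Virtual address space left for a process after the kernel's reservation."""
    return ADDRESS_SPACE - kernel_gib * GIB


splits = {
    "Windows x86 default (2GB/2GB)": user_space_bytes(2),
    "Windows x86 with /3GB (3GB/1GB)": user_space_bytes(1),
    "Linux x86 default (3GB/1GB)": user_space_bytes(1),
}
```

Libraries, runtimes, and fragmentation all come out of the user share, which is why usable heaps end up well below these ceilings.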
That 3.3GB cap is likely because you have 512MB towards your video memory, and another couple hundred MB consumed elsewhere. All accessible memory gets lumped into the same 4GB cap. Why not make a separate swap partition independent of your encrypted system partition?
I'm not sure what in my post you think you're referring to. My point about SSE is that all x86-64 chips support it, only some x86-32 chips do. It is part of the basic ISA, not an extension. This means that ABIs use it (for example, the SysV ABI uses SSE registers for parameter passing and value returning). And, because it's always there, the compiler can use it. It is vastly easier to generate code for a machine that has 16 registers, any of which can be used as operands for any instruction, than one where you have 8 registers and most operations can only use 2 and a lot have the side effect of moving all values up or down one register. You often end up with a lot of spills in x87 calculations because register allocation and instruction selection are really hard.
Despite the fact that two of those modules are listed in the "desktop memory" section they are all listed as being "registered". Afaict at least in the intel world "registered" memory can only be used with "server" platforms.
I'm confused. If the first chip supporting x86 64 didn't come out until ten years ago, how did Linux support it two years prior?
You are not alone. This is not normal. None of this is normal.
Dammit, that's two days in a row that a very "interesting" post has gotten my attention enough for me to click the Parent link and it's been this stupid spam post that wasn't useful or current when it first started being posted several years ago.
Just stop replying to this crap! Sure, sure, their parents won't piss in their skull if you don't make the effort, but damn I'm tired of having to scroll past all of that.
This might be the 10th anniversary of 64-bit x86 chips, but it's been 22 years since MIPS released their R4000 chip, and 21 years since DEC released the first Alpha chip. Both are superior architectures to that lowly AMD Athlon 64. But... neither of them ran the "standard" x86 instruction set.
Yet another example of mediocrity beating out a superior technology through better marketing. And the customers -- that would be us -- paid the price by waiting nearly 25 years for the inferior tech. to finally catch the frell up. And this, you want us to celebrate? Um.... no.
PAE is more or less old school segmentation. You can't say 'it has a 3% slow down' because it has 0 slowdown if that particular page is already in memory, and if not ... it has the same 'slowdown' as an other paging operation plus a fixed number of cycles. [..]
You really have no idea how processors work.
Actually, any x86-64 processor in long mode uses PAE with an extra level in the hierarchy. See for yourself:
http://en.wikipedia.org/wiki/Physical_Address_Extension
I don't think you can make an argument that PAE is slower than x86-64 when it is in fact used by x86-64 in long mode.
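This sketch shows how a linear address is carved into page-table indices in the three paging modes discussed, all with 4 KiB pages. Classic 32-bit paging uses 10/10/12 bits over two levels. 32-bit PAE uses 2/9/9/12 over three levels with 64-bit entries. x86-64 long mode uses 9/9/9/9/12 over a 48-bit address, with four levels. The added top level in long mode is the extra level mentioned above. The Python decoder below is a sketch, and its names are hypothetical.

```python
# Field widths from most significant to least significant bit (4 KiB pages).
PAGING_MODES = {
    "32-bit (386)": [("directory", 10), ("table", 10), ("offset", 12)],
    "32-bit PAE": [("pdpt", 2), ("directory", 9), ("table", 9), ("offset", 12)],
    "x86-64 long mode": [("pml4", 9), ("pdpt", 9), ("directory", 9),
                         ("table", 9), ("offset", 12)],
}


def split_address(addr, mode):
    """Break a linear address into per-level table indices for the given paging mode."""
    fields = PAGING_MODES[mode]
    total_bits = sum(width for _, width in fields)
    addr &= (1 << total_bits) - 1  # long mode: drop the sign-extended upper 16 bits
    result = {}
    shift = total_bits
    for name, width in fields:
        shift -= width
        result[name] = (addr >> shift) & ((1 << width) - 1)
    return result
```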
I think if you understand how truly horrifying PAE is, you would have no doubt at all that 64 bit platforms were the way to go. There's a lot of memory management cruft in the Linux kernel that x86_64 eliminates.
x86_64 also slipped in a few much needed enhancements to the ia32 architecture, including some extra general purpose registers.
http://en.wikipedia.org/wiki/X86-64
You do realize that in x86-64, PAE is actually the addressing scheme used, right? See for yourself:
http://en.wikipedia.org/wiki/Physical_Address_Extension
I don't think you can argue that x86-64 is somehow superior to PAE when it in fact always uses PAE addressing in long mode.
Only 256MB are mapped to the vid card even if you have more vid memory (possibly 256MB are reserved no matter what even with a 128MB or 64MB one?). Only having multiple video cards will make you waste more memory.
I suppose it's the slashdot equivalent of being Rick-Rolled.
Microsoft dropped the ball on 64-bit. Linux did too, but because most Linux tools are open source they can just be recompiled, so it isn't as big an issue.
However, compared to the Solaris and Apple transitions from 32-bit to 64-bit, the PC transition has been very sloppy.
We have 32-bit and 64-bit Windows. You would expect that if you have a 64-bit computer, getting the 64-bit OS would be the best choice. Not really: there are too many 32-bit apps out there (not most, but a lot of them) that just will not work, or that cause more issues when they need to talk to each other. .NET could have helped resolve the issue 10 years ago; why else would we have a development platform that compiles to run as slowly as Java but only works on Windows? I figured it would be for an easy transition to 64-bit systems. No, .NET doesn't even do that very well.
I though
Sure the old 16bit apps for Windows 3.1 have finally died, I can get over that, but if you have Office 2007 and Office 2010 apps installed on the same system, you can get into trouble with some other tools that integrate with them.
Working with Solaris during this transition a few years earlier, it was seamless: apps worked as designed and we weren't fighting 32-bit vs 64-bit. Apple also made it transparent. But Microsoft really dropped the ball. They could have let the move to 64-bit happen much earlier, but they were too busy fighting Linux, Apple, and Google instead of trying to make this migration easier.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
I don't know what's more disturbing. That you think someone will bother spending more than couple of seconds even glancing over at that drivel. Or that I did.
I mean what the fucking hell is that crap? Oh well..
"Let me guess, you ran it in 32 bit mode, then ran it again immediately after in 64 bit mode ... and then ignored the disk cache completely?"
Nope, I did dozens of runs for each, ignoring the first result, which was obviously disk I/O bound (because it was much longer). As others have explained, and as I said, some code benefits greatly from x64.
"...I think the Microsoft hatred is a disease." - Linus Torvalds
AMD wouldn't launch a chip they had never booted, so it couldn't really be publicly released BEFORE it had any OS support. So, you need an OS before you release.
In fact, they really wanted to test the instruction set and design before spending big bucks fabbing the chips. Linux had already been 64 bit for five years on Alpha, so 64 bit Linux was well proven. So they created an emulator for the new instruction set and Linux was ported, running on the emulator. Therefore, Linux supported x86_64 before the processor physically existed.
On Windows you can tune the amount of your address space taken by the kernel down
Link? I'm interested to see how that's done.
On Linux, you can't even do that and are stuck with 2GB to start.
Huh? The default 32-bit Linux memory split is 3/1; 3 GiB for userspace, 1 GiB for kernel space. If you compile a custom kernel you can configure this differently.
Did you perhaps swap "Windows" and "Linux" in your comment?
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
I think a good project would be to port OpenVMS to a still surviving RISC such as MIPS, POWER or even SPARC. Both MIPS & POWER have open consortiums, so making a 'VAX' platform based on either, and then porting OVMS to it would elongate its lifetime for those who must have it. Even better would be to re-implement the Alpha 21364 architecture - the netlists and whatever - to today's lithographies w/o changing a thing about frequencies, or anything else. You'll get a lot cooler CPU which runs the same OVMS software w/o any changes, and can then just extend the lives of existing Alphaservers indefinitely. Certainly better than nervously following HP and Itanium to goodness knows where.
Love that idea. Send it here: mailto:feedback@slashdot.org.
"Murphy was an optimist" - O'Toole's commentary on Murphy's Law
I suppose it depends on how you look at it. If you view the page directory as a bank select then it is a sort of segmentation.
If you view the page directory as a bank select then any form of paging with more than two levels of page table is a sort of segmentation, including x86 paging without PAE (all the way back to the 80386), and the form of paging on just about every modern processor.
...typically using a signed int for pointers so that you don't have to have separate code paths for adding and subtracting from an address. The host itself can address 4GB but the OS may not let individual processes go over 2GB. This is the case in Windows x86, for example.
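Here is why treating addresses as signed 32-bit integers caps a process at 2 GB. Every address at or above 0x80000000 has its top bit set and reads as negative. This is a small Python sketch of that reinterpretation using the standard `struct` module. The helper name is hypothetical.

```python
import struct


def as_signed32(addr):
    """Reinterpret an unsigned 32-bit address as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", addr))[0]


LAST_SAFE = 0x7FFFFFFF        # highest address that stays non-negative
FIRST_NEGATIVE = 0x80000000   # 2 GiB: first address that flips sign
```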
Dewey, what part of this looks like authorities should be involved?
Disclaimer: I'm a Mac guy working in OS X (which is different yet), so I'm only generally familiar with Windows & Linux VM details. Anyway:
On Windows you can tune the amount of your address space taken by the kernel down
Link? I'm interested to see how that's done.
It's a /3GB switch in boot.ini. It took me a while to find any decent info about it (because in the Windows world there seem to be a bunch of hosers with blogs who don't know the difference between kernel address space and the page file, and Google doesn't know that their posts are tripe). The normal setting is an even split of 2GB each to kernel and user space; this switch makes it 1GB to kernel and 3GB to user space.
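For reference, on Windows XP and Server 2003 the switch is appended to the OS entry in `boot.ini`, as in the fragment below. On Vista and later, the equivalent is `bcdedit /set increaseuserva 3072`. The ARC path and description string are placeholders that vary per machine. The extra user space only benefits executables linked as large-address-aware (`/LARGEADDRESSAWARE`).

```ini
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect /3GB
```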
On Linux, you can't even do that and are stuck with 2GB to start.
Huh? The default 32-bit Linux memory split is 3/1; 3 GiB for userspace, 1 GiB for kernel space. If you compile a custom kernel you can configure this differently.
Did you perhaps swap "Windows" and "Linux" in your comment?
No, I was simply mistaken about the default split on Linux. (And my comment about "can't do that" referred to user-accessible configuration, not building your own kernel...)
Yes, this is a bit of an oversight on my part. I really should have been discussing the way PAE is implemented on ia32 processors. I had a little bit of difficulty finding information about it online, I'll have to consult my architecture books at home, and will expand on the original post. Here's a bit of the info I could find about PAE weirdness on IA32.
https://www.kernel.org/doc/gorman/html/understand/understand005.html#sec: High Memory
The key quote:
That is NOT the case for Windows x86. That is the case for the default configuration of Windows x86. It could be modified to the users' preference using a boot flag.
How does 32-bit Linux handle video cards with a gig or more of VRAM? Honest curiosity; the last time I ran 32-bit Linux I think my video card had only 256MB. On Windows, I think it would just fail to load the video driver, although I haven't checked (been a long time since I ran 32-bit Windows on raw hardware too).
There's no place I could be, since I've found Serenity...
Aha, thanks for the clarification. I know a few assembly languages, including x86, but had never really read up on the x64 extensions. Doubling the register count *and* doubling the width is definitely a huge improvement, as is being able to do relative jumps with large offsets.
You're embarrassing yourself, Jeremiah Cornelius http://slashdot.org/comments.pl?sid=3581857&cid=43276741 since you posted that using your registered username by mistake (instead of your usual anonymous coward submissions, by the hundreds over the past 2-3 months on Slashdot). That gives away that it's you spamming this forum almost constantly, just as you have in the post I just replied to.
You need to learn a bit more logic I think. There are logically many reasons, but most are not obvious.
No, I was simply mistaken about the default split on Linux
Ah, okay.
And my comment about "can't do that" referred to user-accessible configuration, not building your own kernel...
On Linux building your own kernel is user-accesible configuration :-)
Seriously, on Debian/Ubuntu, it takes a maximum of three commands (including installing all of the required tools), and it may be doable without touching the command line at all. It definitely doesn't require editing any files. It's more time-consuming (due to the time required to download tools and build) but may be easier than finding and modifying boot.ini.
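On 32-bit x86 kernels, the split is a build-time choice under the kernel's "Memory split" option. A `.config` fragment selecting a 2G/2G split might look like the one below. Treat the option names as an assumption to verify against `make menuconfig` for your kernel version. `CONFIG_PAGE_OFFSET` is derived from the chosen split rather than edited by hand.

```
# CONFIG_VMSPLIT_3G is not set
CONFIG_VMSPLIT_2G=y
# CONFIG_VMSPLIT_1G is not set
CONFIG_PAGE_OFFSET=0x80000000
```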
Thing is, I'm positive there were multiple copycats, and with the content Kristopeit usually wrote it was trivial to duplicate. In a sense it made the whole Kristopeit troll even more epic because not only would he (or it?) be writing troll posts but copycats too and it just spiraled into madness.
With APK it's almost impossible for a normal, rational individual to accurately duplicate the unique style of a genuine APK rant. That long copy pasta that's been making the rounds is a pretty good approximation though.
Celebrity worship is a poor substitute for Deity worship and costs more to boot.
Dude. It's not 1993 any more. Being too impressed with a bunch of screensaver effects is definitely a 1993 thing. And nobody sane wants to manually bankswitch memory in and out. It's not going to speed anything up, it's just extra pain, for no good reason given that today's processors are 64-bit.
Speaking of which, the "scene" had an awful lot of people working on 68K platforms (Atari ST, Amiga). The 68000 had no such thing as bank switching, it had a simple linear 32-bit address space. This was not a problem for democoders. In fact, they loved it. Freed them up to focus on the things which mattered.
For that same reason, it was pretty common for advanced DOS games (and probably demos too) to use a DOS extender (either one of the standard ones, or homegrown) so they could enjoy 32-bit mode with a 32-bit flat address space.
(Also... what video decompressor uses even 1 gigabyte of RAM?! It's amusing that you think this is a great example of a problem for which Bank Switching Is The Answer.)
I need to offer you credit; you are right. The issue isn't really PAE, it's how the kernel manages memory on 32 bit x86 architectures with more than 1GB of memory installed. PAE simply exacerbates the problem. Here's an explanation of the complaint:
On ia_32 systems, the kernel splits memory into 3 zones: DMA, NORMAL, and HIGHMEM.
ZONE_DMA is the first 16MB of memory, and is generally avoided unless needed (due to lack of available higher memory, or for DMA mappings.) The kernel tries to reserve this address range for devices that use DMA mapping.
ZONE_NORMAL is directly mapped into the kernel's address space and extends from 16MB to 896MB. Kernel data structures are stored here, including the kernel page tables and the struct page entries that track each physical page. That bookkeeping grows with installed memory, so PAE on ia32 with a lot of RAM can exhaust ZONE_NORMAL and cause out-of-memory problems even when plenty of physical memory is free. User data can be allocated in ZONE_NORMAL, but is preferably placed in ZONE_HIGHMEM to keep ZONE_NORMAL free for kernel data structures.
ZONE_HIGHMEM is memory above the 896MB barrier. It is not permanently mapped into the kernel's address space, so before the kernel can access anything in this zone it must set up a temporary mapping (via kmap() and friends) in a small reserved window of kernel virtual address space. Those mapping slots are scarce, and creating and tearing them down costs performance. User space processes can map these pages directly (handled by the virtual memory manager, of course).
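Generally, user-space memory will be allocated from ZONE_HIGHMEM, then ZONE_NORMAL, and finally ZONE_DMA, in that order of preference. Kernel allocations that need a permanent mapping cannot come from ZONE_HIGHMEM at all.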
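A rough Python model of the ia32 zone layout and the user-page fallback order described above. The boundaries are the default values quoted in this thread (they can differ with kernel configuration), the function names are hypothetical, and the real page allocator also deals with watermarks, lowmem reserves and per-CPU lists, all of which this sketch ignores.

```python
from typing import Dict, Optional

MiB = 1024 * 1024

# Default ia32 zone boundaries (physical addresses), as described above.
DMA_LIMIT = 16 * MiB
NORMAL_LIMIT = 896 * MiB


def ia32_zone(phys_addr: int) -> str:
    """Return the zone a physical address falls into on ia32."""
    if phys_addr < DMA_LIMIT:
        return "ZONE_DMA"
    if phys_addr < NORMAL_LIMIT:
        return "ZONE_NORMAL"
    return "ZONE_HIGHMEM"


# Preference order for a user-space page, most to least preferred.
USER_FALLBACK = ["ZONE_HIGHMEM", "ZONE_NORMAL", "ZONE_DMA"]


def pick_zone(free_pages: Dict[str, int]) -> Optional[str]:
    """Pick the first zone in preference order that still has free pages."""
    for zone in USER_FALLBACK:
        if free_pages.get(zone, 0) > 0:
            return zone
    return None
```

With HIGHMEM exhausted, `pick_zone({"ZONE_HIGHMEM": 0, "ZONE_NORMAL": 5, "ZONE_DMA": 10})` falls back to `"ZONE_NORMAL"`, which is exactly the pressure on kernel-usable memory that the complaint is about.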
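The x86_64 architecture eliminates the need for ZONE_HIGHMEM: the 64-bit kernel address space is large enough to map all of physical memory directly. Apart from ZONE_DMA (and a ZONE_DMA32 covering memory below 4GB, for devices limited to 32-bit DMA), ZONE_NORMAL extends to the end of physical memory. This approach simplifies memory management, improves performance, and is generally more flexible.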
You're correct that there was a major issue with my original post. My memory of the kernel architecture had conflated HIGHMEM with PAE, and I was thinking that PAE required mapping pages above 4GB into lower memory. That would of course impose a huge performance penalty on any process using memory above 4GB. I deserve the downmods for the technical inaccuracy.
Here's a very brief summary of the problems with HIGHMEM:
http://linux-mm.org/HighMemory
Here's a bunch of links used to refresh my memory:
http://www.makelinux.net/ldd3/chp-15-sect-1
https://www.kernel.org/doc/gorman/html/understand/understand005.html
http://unix.stackexchange.com/questions/5143/zone-normal-and-its-association-with-kernel-user-pages
Different AC here. You're reading far too much into that kernel.org doc. PAE is not weird at all. An ia32 CPU with PAE supports page tables which map 32-bit virtual addresses to 36-bit physical addresses. That is, each page table entry (PTE) grows from 32 to 64 bits, and the physical address it holds is 36 bits wide instead of the original 32. Simple.
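For illustration, a Python sketch of how a 32-bit virtual address is split during a PAE page walk (2 bits of page-directory-pointer index, 9 bits of page-directory index, 9 bits of page-table index, 12 bits of page offset, versus 10/10/12 without PAE), and how a 36-bit physical address comes out of a 64-bit PTE. The function names and the sample PTE value are made up for the example; flag bits such as present, writable and NX are simply masked off.

```python
from typing import Tuple

PAGE_SHIFT = 12  # 4 KiB pages


def split_pae(vaddr: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit virtual address into PAE walk indices (2/9/9/12 bits)."""
    if not 0 <= vaddr < 1 << 32:
        raise ValueError("not a 32-bit virtual address")
    pdpt_index = (vaddr >> 30) & 0x3    # 4 page-directory-pointer entries
    pd_index = (vaddr >> 21) & 0x1FF    # 512 entries per page directory
    pt_index = (vaddr >> 12) & 0x1FF    # 512 entries per page table
    offset = vaddr & 0xFFF              # byte within the 4 KiB page
    return pdpt_index, pd_index, pt_index, offset


def physical_address(pte: int, offset: int, phys_bits: int = 36) -> int:
    """Combine the frame address stored in a 64-bit PTE with the page offset."""
    # Keep bits 12..phys_bits-1 of the PTE; drop the low flag bits and high bits like NX.
    frame_mask = ((1 << phys_bits) - 1) & ~((1 << PAGE_SHIFT) - 1)
    return (pte & frame_mask) | offset
```

For instance, the kernel-space address 0xC0801234 splits into indices (3, 4, 1) and offset 0x234, and a PTE of 0x812345003 (frame 0x812345000 plus present/writable flags) yields physical address 0x812345234, well above 4GB even though the virtual address is only 32 bits.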
What the kernel.org doc is talking about is practical limits in ia32 Linux, not problems with PAE. In order to permit fast user/kernelspace transitions, ia32 Linux defaults to a 3/1 split. This means the currently running user process and the kernel share page tables. The user process lives in the lower 3GB of virtual address space, the kernel (and memory mapped IO devices etc.) live in the upper 1GB, hence 3/1. That's key to understanding that doc you quoted. It doesn't say so outright, because to a kernel hacker it's like breathing, but kernel address space on ia32 is a precious resource. Apparently at about 16GB physical RAM the kernel data structures required to track more RAM simply become too large.
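Some back-of-the-envelope arithmetic shows why kernel address space gets tight under the 3/1 split. This Python sketch assumes a 32-byte struct page per 4KB page (an assumption: the real size depends on kernel version and config) and counts only that one array; page tables and other kernel structures come on top, so treat the numbers as a lower bound.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

# Default ia32 Linux 3/1 split: user space below PAGE_OFFSET, kernel above it.
PAGE_OFFSET = 0xC0000000
USER_SPACE = PAGE_OFFSET               # 3 GiB per process
KERNEL_SPACE = 2 ** 32 - PAGE_OFFSET   # 1 GiB for the kernel
LOWMEM = 896 * MiB                     # directly mapped part of physical RAM
RESERVED = KERNEL_SPACE - LOWMEM       # left for vmalloc, kmap windows, etc.

PAGE_SIZE = 4096
STRUCT_PAGE_BYTES = 32  # ASSUMPTION: approximate size of struct page on 32-bit


def mem_map_cost(phys_ram: int) -> int:
    """Lowmem bytes needed for one struct page per 4 KiB page of RAM."""
    return (phys_ram // PAGE_SIZE) * STRUCT_PAGE_BYTES


if __name__ == "__main__":
    print(f"user {USER_SPACE // GiB} GiB, kernel {KERNEL_SPACE // GiB} GiB, "
          f"reserved {RESERVED // MiB} MiB")
    for ram_gib in (4, 16, 64):
        cost = mem_map_cost(ram_gib * GiB)
        print(f"{ram_gib:2d} GiB RAM -> {cost // MiB:3d} MiB of struct page "
              f"({100 * cost / LOWMEM:.0f}% of lowmem)")
```

Under that assumption, 16GB of RAM already costs 128MB of the 896MB lowmem for this one array, and 64GB (the PAE maximum with 36-bit addresses) costs 512MB, more than half of it.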
Other operating systems have alternate design choices. For example, OS X uses a 4/4 split on ia32. User processes get 4G address spaces with no kernel stuff mapped, and the kernel gets its own 4G address space with no user stuff mapped. There's a performance penalty (the system has to flip between user and kernel page tables every time the user process makes a system call), but it's also better at supporting large memory configurations on ia32+PAE because the kernel's address space layout is much less cramped.
It's certainly possible to replumb ia32 Linux to make more than 16GB work. But as the kernel.org doc implies, the kernel community's response to this idea boils down to "here's a nickel, kid, buy yourself a real computer". Which is not actually unreasonable today, given the ubiquity of 64-bit x86, the 64-bit Linux kernel, and 64-bit Linux application software.
Yep, you're right. I corrected myself in another post.
Thanks for the suggestion, I did just that.
If they implement that, I'm going to have to figure out a way to give you a virtual hug.
"Murphy was an optimist" - O'Toole's commentary on Murphy's Law
You fail it, Paul. Your skill is not enough.