You set up the CA Authority - and use it to self-sign your certs - and it's safer than a commercial one.
That depends what you mean by "safer".
It's safer to you. Onto your machines you can install the certificate of your CA, and you'll know everything is peachy.
But if your audience is "the general internet population", e.g. because you're trying to sell stuff to them, it's less secure. Without a trusted or semi-trusted third party (normally served by the default CAs), there is no way to convey the authenticity of your own CA and thus of your own public key to them.
because the CA is just one extra link in the chain to be broken.
No, no it isn't. Not really.
According to this post, this post, and my own intuition, CAs never see your private keys. A CA cannot reveal more information than is known publicly anyway, even if they are thoroughly malicious. The most you could argue about the standard set up is that CAs give a false sense of security.
I can only think of one attack that could occur with CA-signed certificates but not with self-signed certs. If you remove all default CAs from being accepted and just store the fingerprint of the public key (e.g. what happens with SSH), then it becomes impossible for the real amazon.com public key to be silently substituted with an imposter (but malicious-CA-cleared) amazon.com public key. But if you don't clear out your list of CAs, there is no hard benefit to be gained here.
Actual question: do the CAs even ever have access to the private keys?
I'm pretty sure there's no technical reason they need them -- the CAs just need to attest to the public key, which they could do just by signing the public key. But that doesn't mean that's how the system is set up in practice, of course.
The reason I ask is that there is mention in some of the threads in this story's discussion that refer to statistics like this. But an Apache OpenOffice guy makes a good case that those statistics are wholly misleading.
Although the quality of Microsoft Products have risen after gates left. When Gates left XP was just released and getting hammered by security issues. Compared to say Windows 7 and even Windows 8 which runs very stable and is a lot more secure than ever. A big part of that was the "trustworthy computing" initiative that Gates started though. Actually even Vista was very close to being released (Nov 8, 2006) when Gates announced he was reducing his day-to-day role at MS (June 15, 2006).
There have been compatibilitiy problems irking consumers ever since Vista x86_64 hit the market.
What? I never ran Vista x64, but I did run Windows 7 and do run Windows 8 in 64-bit. The only compatibility problems I've ever seen are with 16-bit programs which I cannot run any more. XP 64 had more problems, but I've even had success with that. (Then again, I didn't have to set that one up.)
I don't doubt that there were occasional problems, but there would also have been occasional problems with just Vista, regardless of bitwidth. Almost everyone who's talked about 64-bit Windows says that the issues were basically ironed out in Vista and 7.
So what compatibility problems do you refer to?
And because people on/. seem to "forget" their history, ARM isn't even close to the first non-x86 architecture that Windows has been available for; it's previously supported Alpha (NT 3.1-4.0), MIPS (NT 3.1-4.0), Power (3.51-4.0), and Itanium (XP, Server 2003, and Server 2008).
Yeah, it's way better to screw your users over something that you have a smaller chance of affecting than you have of seeing some pigs aloft over a week-old snowball in hell.
The moral of the story: keep a can of mosquito spray in your trunk (Those things don't exceed 120F right?)
"Those things" = trunks? Ahhhahahahaha.
It's probably easy to exceed 140 degrees. I've personally measured my trunk to be off the scale of the thermometer I used (a combined clock thermometer; I don't know what it's highest temperature is but I saw it at 121 after it had been cooling off inside for a few minutes), and I live in the northern US. I wouldn't be surprised if you could hit 160 in something like Texas or Arizona.
They said that they were offered a revenue-sharing stream of $10K/month for 30 months. The reason they declined that, from what the editors were saying, was that it would leave too little revenue for other stuff. They also say that the current owner is using the revenue stream to fund other, unrelated startups.
If those are true, it's likely they're getting more than $10K/month. You're looking at 3-5 years to break even at the most if that were to continue. Hackaday has been around way longer than that.
You claimed it does not matter how many "dead" objects you have. I claimed: it does. Hence I proposed you think a few hours about how memory allocation and GCs realy work by writing some pseudo code.
But you told me to write pseudocode for a bad, unpopular GC (mark-sweep). And I said even in my original post that yes, you're right that it looks at dead objects. My point was that isn't what's used, and better and more common GCs rarely or never do anything at all with dead objects.
Here, I'll give you pseudocode (in some unholy mix of C and Python I wrote for some reason) for a semispace collector, which is completely unaffected by the number or size of dead objects (except in terms of how often it runs):
const int SPACE_SIZE = 100 * 1024 * 1024;// 100 MiB, just because
struct TypeInformation:
int size// including header
int references[]// offsets into the object that contain pointers to other objects
struct Header:
TypeInformation * type
void * forwarding
struct Block:
Header header
byte data[]
// 'size' includes enough space for the header void * allocate(int size, TypeInformation * type):
if next_alloc + size > end_of_from_space:
do_gc()
if next_alloc + size > end_of_from_space:
panic()// memory is full.
void do_gc():
next_alloc = start_of_to_space
for each pointer p in the root set:
p = move(p)
swap(start_of_from_space, start_of_to_space)
swap(end_of_from_space, end_of_to_space)
next_alloc = start_of_from_space
// Moves 'object' to the to space, and returns the new address. If the object // has already been moved, just returns the address. Block * move(Block * object):
if object->header->forwarding != NULL: // Object was already moved
return object->header->forwarding
// Move its referees
for each offset in new_location->header.type->references:
*(new_location + offset) = move(*(new_location+offset))
return new_location
I don't know for sure it's completely correct, but it's at least the general idea. As it traverses the object graph it moves the objects to the other space. When it's done with that, it's done. No looking at dead objects at all. If there is exactly one live object, it will move that object and then it's done.
That sort of thing is what GCs do now. At the "worst", you have to do a mark-sweep phase and look at dead objects only when you do a full collection, which is a small percentage of total collections for almost all programs, and even that doesn't happen with all algorithms.
There is no single GC algorithm that uses ref counting.
That depends on how you define "garbage collection", and there's not even close to universal agreement about whether refcounting is included. For instance, Aho et al.'s dragon book includes a section "Reference Counting Garbage Collectors". Anyway, I don't really care what definition you use; when I put the parenthetical in "efficient (non ref-counting) GCs" it was less to say "hey I like the broader definition" and more to clarify exactly what I meant.
As to the rest, I just replied to your other response to me.:-)
I suggest you simply pseudocode on paper a mark and sweep algorithm.
OK, so I was going to respond by saying "why are we talking about mark and sweep?", but I checked myself and I'm thinking too much from the viewpoint of the desktop. So let's look at both.
On the desktop, pure mark and sweep isn't very popular, and is mostly used as a component of a more complex GC, like in the final age of a generational collector. The JVM doesn't use it,.Net doesn't use it, PyPy doesn't use it, V8 doesn't use it, Allegro CL doesn't use it, SBCL doesn't use it, Chicken Scheme doesn't use it, Racket doesn't use it, and Glasgow Haskell doesn't use it. The Boehm-Weiser collector for C uses it because it doesn't have a choice. SpiderMonkey uses mark and sweep, but work is underway on other collectors. Perl, (I think) CPython, and Flash use reference counting primarily, with mark-and-sweep to reclaim cycles. Ruby and Lua are the only current systems I could think of that unabashedly use mark-and-sweep. Even counting all of the exceptions, we're still 9 systems with a copying collector vs 7 that use mark-and-sweep. (I would say that dropping Perl/Python/Flash from the count entirely is reasonably fair, which would make it 9-4, and to be 10-3 when SpiderMonkey's generational collector is done.)
So in one sense I want to say that you are off-target when you're talking about mark-and-sweep, because how it behaves as the amount of memory changes is not particularly relevant as the most popular platforms don't use it. What matters is how copying generational collectors (which most of the "don't have it" have some variant of) perform. And for generational collectors, it's absolutely true that they take a lot of memory to perform well. That statement is supported by hard evidence gathered by Matthew Hertz and Emery Berger and published in 2005. If you disagree, I look forward to your study.
(I do see your response here. It's possible that giving too much could lead to further degregation. But 3-5x the memory requirement of a manually-managed version (what people are describing as "what it needs") is supported by the above evidence as well as reasoning about what's going on inside the system.)
On the other hand, that's not the mobile space, which is the primary purpose of the article. For instance, I am not positive about this, but I infer that Davlik uses mark and sweep. (They separate mark bits from the objects because of some memory stuff which I would guess also means they don't move objects.) I can certainly imagine that the GCs in place in mobile apps are less mature and more likely to use mark-and-sweep.
----
However, Hertz and Berger look at mark-and-sweep allocators too, and in overall results mark-and-sweep does worse than the best collector (a 2-generation generational collector with mark-and-sweep old space) and second-best collector (a 2-generation collector with a semispace old space) across the entire range of memory sizes tested. In fact, all collectors tested (5 in total, vs two versions of manual allocation) Get nearly monotically better as you increase available memory; and that result holds even for the individual benchmarks (whereas the relative order of how good the collectors are can vary; in some, mark-and-sweep is the best at low memory overheads).
I'm not going to claim the benchmarks are the best or all-inclusive, but that paper is still by far the best analysis I've seen of performance of manual vs automatic memory management. So when it says that performance gets better as you provide more memory, it'll take rather more than someone asserting on/. to convince me otherwise (no offense).
I accept that (non-moving) mark-and-sweep and generational collection are not mutually exclusive in theory. But I already said that in an aside.
All Wikipedia says is "In order to implement [generational collection], many generational garbage collectors use separate memory regions for different ages of objects". It doesn't provide any suggestion or citation of anything that talks about how you would implement generational collection any other way. What I was really asking is whether you've ever seen such a thing.
It's possible that it's actually not so hard, and it just requires knowing how remembered sets or whatnot are actually implemented to figure out how it works, and I don't have that knowledge.
Congratulations, you have a very specific statement which I can demonstrate is exactly, concretely wrong. Then I'll describe why the rest of your statements, which are more waffley, are also wrong as a result. (Being wrong is good, I'm just in a slightly antagonistic mood now, sorry!)
Because when the GC starts it will have to examine a huge working set (which is meanwhile all dead).
So this reflects a fundamental misunderstanding of how GCs in actual use work. (There's another problem too, but I'll stick with that one.) For purposes of this post, by "GC" I am sticking with the definition that excludes reference counting.
The simplest GC algorithm, mark-and-sweep, does look at all heap blocks, both alive and dead. However, no one uses mark-and-sweep.* Why? It's slow, and the fact that it has to look at dead objects is one reason for this.
Instead, GCs in actual practice are moving collectors. (Usually they're called copying collectors but I prefer this term.) What they do is traverse the reachable object graph, and as they encounter a live object they copy it to another part of memory. (Redirecting references to it is also part of the algorithm that isn't so relevant for this post, but it's why I like "move" instead of "copy" to describe it.)
The simplest copying collector to describe is probably a semispace collector. you divide the heap into two regions, a from half and a to half. You allocate in the from half. When the GC runs, it traces over the object graph starting from the root set, and when it finds an object it hasn't seen before, it moves it from the from half over to the to half. When the collector has completed its search, it switches which half it calls from and to, and it's done.
Note that at no point did the collector do anything with dead objects! Nothing at all! The only performance impact that dead objects have is indirect, in terms of increased GC frequency and reduced cache locality.
GCs in practice aren't usually semispace collectors but even better, but they still follow the same principle: they only do things with live objects and don't do anything at all with dead ones.
However, note that the semispace collector has doubled the amount of memory needed. If you reduce the memory below that, you not only increase collection frequency but you reduce the "don't have to look at dead objects" benefit because you collect more objects that are soon to be dead.
(Incidentally, why is all this moving data around actually not bad in practice? I'll give three mitigating factors. First, it lets you do generational GC, which brings even more benefits. Second, as objects are copied around, the collector actually improves the cache locality -- the cache locality immediately after a semispace collection is likely to be rather better than a manually-managed program! (Of course that mitigating factor is destroyed a bit by what the collector itself does to your cache, but it's still part of the larger factor.) Third, when talking about how GC is slow they often ignore the fact that malloc/free are actually not exactly speedy themselves all the time. In the worst case, malloc might have to spend a fair bit of time looking through the free list for a block, and free might have to do a bit of time coalescing blocks. It's not as expensive as a collection of course, but again it's part of the larger story. By contrast, a copy collector can allocate memory blisteringly fast, with basically just a pointer increment and a check to make sure it didn't go beyond the end of the heap (which would trigger another GC cycle).)
* I've recently learned that there may be a more broad definition of mark-and-sweep, e.g. as described on Wikipedia. I'm sticking with the narrower one, as per Aho et al.'s dragon book, which is basically automating the usual C malloc/free functions. (The moving collectors I describe later work completely differently from malloc/free.)
First, thinking about it a little more, I guess you could argue because architectures usually provide instructions for at least making function calls, that functions aren't an abstraction. I see the point, but I stand by my statement for a few reasons. First, those instructions usually don't really do anything that you couldn't do without them. (Maybe Sparc, with its register windows, is an exception?) It's usually just a shorthand, perhaps ever-so-slightly more efficient, for a store of the instruction pointer into a register or onto a stack, then a jump. Second, while an instruction used for calls is common, specific instructions for returns are less so. For instance, MIPS provides the jal instruction (jump and link) for calls, but I think there is no return instruction -- you just jump to the value in the ra register. Third, there is no additional CPU knowledge beyond the initial and final control transfers: CPUs don't know about the local variables (you could easily implement functions without using the hardware stack -- and some languages do this to provide support for closures and other features), CPUs don't know about the region of code occupied by a function (and often there is no "the code region"), etc.
Second, I thought of another great example of an abstraction that doesn't really cost CPU time: labels. Take goto cleanup;... cleanup:.... The CPU has no idea about cleanup. That has been entirely removed by the compilation toolchain, and turned into a symbolic address. The name cleanup is undoubtedly an abstraction for "the address of here", and yet it costs no CPU time.
Or take C++ objects. C++ objects, unless you explicitly request otherwise, basically have no additional overhead vs C structs and C functions. So are objects bad and structs or functions also bad, or are structs and functions good and objects also at least sometimes good?
I think you may be trolling, but I'll try to respond as if not in the case I can enlighten either you or someone else as to any of your questions. I actually know very little about Rust, and I think I'll deliberately not look up anything about it while responding here so you can see that it's not a matter of TopSpin just throwing around meaningless buzzwords. As a result I'm not evaluating the correctness, just whether or not TopSpin's post makes sense to someone with a fair bit of knowledge about memory management. (Of course the flip side is that if I get a lot wrong here (in terms of what Rust is or does), then that means TopSpin did a bad job after all.:-))
deterministic -- what does this mean?
It means... it's deterministic, i.e. not at the whim of things like what the OS or user decides to do to you?
It's used slightly more specifically meaning though when it comes to memory management. In particular, it is usually meant to mean that a block is freed at the moment it becomes garbage instead of some point in the future.
Suppose I have a Java File object allocated in the heap; it has a finalizer that will close the file when the object is finally collected. When does that happen? No idea, because it depends on when the GC runs. Maybe that's right now. Maybe the user is AFK and the program is idling, and the File won't be collected until the user comes back after her nap in an hour. Maybe you have multiple threads that are each allocating memory, and they interact in nondeterministic ways because of the OS's scheduler, and so the point at which the GC is triggered will vary from run to run.
More to the point: the time is unexpected and potentially distant.
Contrast that with when heap objects are destroyed. You know when they are destroyed: they're destroyed when the currently-executing function returns. In C++, you know that the destructors will run when the block they are a member of exits. True, you may not know exactly when that happens in wall-clock time, but you know when it'll happen relative to the code. This is important because an fstream object that you allocate on the stack in C++ will actually close when the function returns. This is both predictable and probably soon.
(This usage of the term is, however, extremely standard, so it's in some sense it's your ignorance for not taking them seriously because of it, not their weird terminology.)
efficient and safe -- subjective/opinion
Eh, not necessarily. Both are technically spectrums, and so yes, you could say that there's an element of opinion. But if you are in, just to make up a number, the safest 10% of the spectrum or the most efficient 20% of the spectrum, it's pretty hard to argue that you're not safe and efficient.
Especially when it comes to safe, which they are using as an abbreviation for "memory safe" (which TopSpin probably should have used). That has a reasonably specific definition, so there's not a ton of room for interpretation. (For example: is C# memory safe? After all, it has unsafe blocks.)
(Mind you I'm not evaluating the correctness of those claims, just the plausibility.)
endemic use of a garbage collector... -- what does this mean?
I actually don't know.:-)
reference-counted heap objects -- what does this mean?
Probably, C programmers can think of an "object" as synonymous with a heap block.
So, heap blocks are reference counted so they are freed when the last reference is released. (And if you "don't know what reference counting is"... well, then you be trolling sir.)
"exchange" vs "local" heap
They're just names for their two heaps. I would guess that they have something to do with either threads or functions -- the local heap being
And.. wasn't the original point that you could "allocate" memory for huge, sparse objects, but only really use the portion you actually need, with the buffer of paging in case you go over a bit, to keep things from crashing?
I'm fairly certain that it was never intended to allow an application that needs 500MB, to run in 50MB of RAM....
It depends what you mean by "virtual memory", because there are two meanings; but at least I would say "no."
The "real" meaning is just the fact that programs deal in virtual addresses, which are then translated to physical addresses. The address 0x4000 in your process and 0x4000 in my process are completely different.1 This has basically nothing to do with how much space programs need and everything to do with making it so processes can't step on each others toes for both ease of programming and security.
The added layer of indirection2 that was added to support virtual memory also made it easy to do automatic paging of memory to disk, and that's the other definition of virtual memory. But even here I would say there's a lot to argue against your statement that "it was never intended to allow an application that needs 500MB to run in 50MB"; at least if you make that less extreme. Before virtual memory there were overlays, which were basically regions of memory that the program, on its own volition, paged in and out to get at extra resources. I'm not nearly old enough (or insane enough) to have dealt with them, but some systems provided some explicit support for overlays, though still with a lot of programmer burden to designate what needed to go into in the overlays. So dealing with programs that had much larger memory requirements than available memory was already a common problem. I can't imagine that this would have escaped the notice of people designing paging systems. That being said, I can't go back in time and read the thoughts of the people putting in MMUs.:-)
(One disadvantage of the paging vs overlays is, ironically enough, the programmer effort. Because it requires you to think about exactly what goes in each overlay, it means that you can't just haphazardly place stuff in memory and hope it works then whine that it doesn't because it thrashes too much.)
1 shmget blah blah blah 2 "All problems in computer science can be solved by another level of indirection, except for the problem of too many layers of indirection"
The principle of RAII is that you acquire a resource in a constructor and release it in the destructor. When you copy, you copy the resource.
That's an extraordinarily limited definition of RAII in my opinion.
For example, the Wikipedia article on RAII states "Ownership of dynamically allocated objects (memory allocated with new in C++) can also be controlled with RAII, such that the object is released when the RAII object is destroyed. For this purpose, the C++11 standard library defines the smart pointer classes std::unique_ptr for single-owned objects and std::shared_ptr for objects with shared ownership."
It also cites Bjarne Stroustrup as the inventor of the term "RAII". Let's see what he has to say: "The technique for managing resources using local objects is usually referref to as [RAII]" [The C++ Programming Language, special edition, p.366]. Doesn't say that the constructor must copy and the destructor must release exactly -- just that they must manage the resource.
Or let's look at what is probably one of the most widely-regarded C++ books, Herb Sutter's Effective C++ (3rd ed). Sutter defines RAII as "the idea of using objects to manage resources" [p. 63 in Item 13]. Later he says "Item 13 introduces the idea of [RAII]... and it describes how auto_ptr [also doesn't fit your definition] and tr1::shared_ptr are manifestations of this idea for heap-based resources" [page 66 in Item 14]. Then he goes on to describe possibilities for a question "that every RAII class author must confront: what should happen when an RAII object is copied?" [page 67].
That is complete nonsense. It heavy depends on the application (they way the application is doing its own caching e.g.) The less "overhead" memory you grant the application the faster the GC will be, because the less unreferenced objects are there.
Not really. Efficient (non ref-counting) GCs themselves all have substantial memory overheads just because of the way they divide the heap up into spaces and move objects around.
Yes, application performance does affect a lot: a program that is much better about allocations and makes a lot fewer of them will trigger far fewer GC cycles. But unless it just allocates all the memory it needs right away at the start and never calls new again, it still needs the GC overhead.
The "4x vs. what the application needs" is misleading in some sense -- it's not what the application needs, it's relative to a version of the program that uses manual memory management. So it should be "GC needs 4x more memory to be efficient than a manually-managed version." So if you say "I'll just reduce the size of the program's cache to make the GC do better", to get an apples-to-apples comparison you'll also have to do the same for the manually-managed version, and you'll be back to 4x.
That depends what you mean by "safer".
It's safer to you. Onto your machines you can install the certificate of your CA, and you'll know everything is peachy.
But if your audience is "the general internet population", e.g. because you're trying to sell stuff to them, it's less secure. Without a trusted or semi-trusted third party (normally served by the default CAs), there is no way to convey the authenticity of your own CA and thus of your own public key to them.
No, no it isn't. Not really.
According to this post, this post, and my own intuition, CAs never see your private keys. A CA cannot reveal more information than is known publicly anyway, even if they are thoroughly malicious. The most you could argue about the standard set up is that CAs give a false sense of security.
I can only think of one attack that could occur with CA-signed certificates but not with self-signed certs. If you remove all default CAs from being accepted and just store the fingerprint of the public key (e.g. what happens with SSH), then it becomes impossible for the real amazon.com public key to be silently substituted with an imposter (but malicious-CA-cleared) amazon.com public key. But if you don't clear out your list of CAs, there is no hard benefit to be gained here.
Actual question: do the CAs even ever have access to the private keys?
I'm pretty sure there's no technical reason they need them -- the CAs just need to attest to the public key, which they could do just by signing the public key. But that doesn't mean that's how the system is set up in practice, of course.
Source?
The reason I ask is that there is mention in some of the threads in this story's discussion that refer to statistics like this. But an Apache OpenOffice guy makes a good case that those statistics are wholly misleading.
...there are lots of talented people.
There's also this factor. Which if you think about it too hard, is actually really depressing.
Although the quality of Microsoft Products have risen after gates left. When Gates left XP was just released and getting hammered by security issues. Compared to say Windows 7 and even Windows 8 which runs very stable and is a lot more secure than ever.
A big part of that was the "trustworthy computing" initiative that Gates started though. Actually even Vista was very close to being released (Nov 8, 2006) when Gates announced he was reducing his day-to-day role at MS (June 15, 2006).
What? I never ran Vista x64, but I did run Windows 7 and do run Windows 8 in 64-bit. The only compatibility problems I've ever seen are with 16-bit programs which I cannot run any more. XP 64 had more problems, but I've even had success with that. (Then again, I didn't have to set that one up.)
I don't doubt that there were occasional problems, but there would also have been occasional problems with just Vista, regardless of bitwidth. Almost everyone who's talked about 64-bit Windows says that the issues were basically ironed out in Vista and 7.
So what compatibility problems do you refer to?
And because people on /. seem to "forget" their history, ARM isn't even close to the first non-x86 architecture that Windows has been available for; it's previously supported Alpha (NT 3.1-4.0), MIPS (NT 3.1-4.0), Power (3.51-4.0), and Itanium (XP, Server 2003, and Server 2008).
To play devil's advocate for a second:
Yeah, it's way better to screw your users over something that you have a smaller chance of affecting than you have of seeing some pigs aloft over a week-old snowball in hell.
WOOOOwoooooWWOOOOOwwooooo!!!
Alert! Alert!
Joe has push a commit using tabs instead of spaces!
wooooooWOOOOOwooooWOOOOOO
The moral of the story: keep a can of mosquito spray in your trunk (Those things don't exceed 120F right?)
"Those things" = trunks? Ahhhahahahaha.
It's probably easy to exceed 140 degrees. I've personally measured my trunk to be off the scale of the thermometer I used (a combined clock thermometer; I don't know what it's highest temperature is but I saw it at 121 after it had been cooling off inside for a few minutes), and I live in the northern US. I wouldn't be surprised if you could hit 160 in something like Texas or Arizona.
They said that they were offered a revenue-sharing stream of $10K/month for 30 months. The reason they declined that, from what the editors were saying, was that it would leave too little revenue for other stuff. They also say that the current owner is using the revenue stream to fund other, unrelated startups.
If those are true, it's likely they're getting more than $10K/month. You're looking at 3-5 years to break even at the most if that were to continue. Hackaday has been around way longer than that.
Sevier is seeking damages from Apple, but said he we will drop the lawsuit if Apple agrees to sell devices with a 'safe mode.'"
Apple machines do come with a safe mode. Here is the relevant documentation (though primarily geared toward leaving safe mode).
Heh, no worries. It's mostly the editor's job anyway...
Don't blame Timothy for the dupe; no doubt he posted it from his mobile.
But you told me to write pseudocode for a bad, unpopular GC (mark-sweep). And I said even in my original post that yes, you're right that it looks at dead objects. My point was that isn't what's used, and better and more common GCs rarely or never do anything at all with dead objects.
Here, I'll give you pseudocode (in some unholy mix of C and Python I wrote for some reason) for a semispace collector, which is completely unaffected by the number or size of dead objects (except in terms of how often it runs):
I don't know for sure it's completely correct, but it's at least the general idea. As it traverses the object graph it moves the objects to the other space. When it's done with that, it's done. No looking at dead objects at all. If there is exactly one live object, it will move that object and then it's done.
That sort of thing is what GCs do now. At the "worst", you have to do a mark-sweep phase and look at dead objects only when you do a full collection, which is a small percentage of total collections for almost all programs, and even that doesn't happen with all algorithms.
That depends on how you define "garbage collection", and there's not even close to universal agreement about whether refcounting is included. For instance, Aho et al.'s dragon book includes a section "Reference Counting Garbage Collectors". Anyway, I don't really care what definition you use; when I put the parenthetical in "efficient (non ref-counting) GCs" it was less to say "hey I like the broader definition" and more to clarify exactly what I meant.
As to the rest, I just replied to your other response to me. :-)
I suggest you simply pseudocode on paper a mark and sweep algorithm.
OK, so I was going to respond by saying "why are we talking about mark and sweep?", but I checked myself and I'm thinking too much from the viewpoint of the desktop. So let's look at both.
On the desktop, pure mark and sweep isn't very popular, and is mostly used as a component of a more complex GC, like in the final age of a generational collector. The JVM doesn't use it, .Net doesn't use it, PyPy doesn't use it, V8 doesn't use it, Allegro CL doesn't use it, SBCL doesn't use it, Chicken Scheme doesn't use it, Racket doesn't use it, and Glasgow Haskell doesn't use it. The Boehm-Weiser collector for C uses it because it doesn't have a choice. SpiderMonkey uses mark and sweep, but work is underway on other collectors. Perl, (I think) CPython, and Flash use reference counting primarily, with mark-and-sweep to reclaim cycles. Ruby and Lua are the only current systems I could think of that unabashedly use mark-and-sweep. Even counting all of the exceptions, we're still 9 systems with a copying collector vs 7 that use mark-and-sweep. (I would say that dropping Perl/Python/Flash from the count entirely is reasonably fair, which would make it 9-4, and to be 10-3 when SpiderMonkey's generational collector is done.)
So in one sense I want to say that you are off-target when you're talking about mark-and-sweep, because how it behaves as the amount of memory changes is not particularly relevant as the most popular platforms don't use it. What matters is how copying generational collectors (which most of the "don't have it" have some variant of) perform. And for generational collectors, it's absolutely true that they take a lot of memory to perform well. That statement is supported by hard evidence gathered by Matthew Hertz and Emery Berger and published in 2005. If you disagree, I look forward to your study.
(I do see your response here. It's possible that giving too much could lead to further degregation. But 3-5x the memory requirement of a manually-managed version (what people are describing as "what it needs") is supported by the above evidence as well as reasoning about what's going on inside the system.)
On the other hand, that's not the mobile space, which is the primary purpose of the article. For instance, I am not positive about this, but I infer that Davlik uses mark and sweep. (They separate mark bits from the objects because of some memory stuff which I would guess also means they don't move objects.) I can certainly imagine that the GCs in place in mobile apps are less mature and more likely to use mark-and-sweep.
----
However, Hertz and Berger look at mark-and-sweep allocators too, and in overall results mark-and-sweep does worse than the best collector (a 2-generation generational collector with mark-and-sweep old space) and second-best collector (a 2-generation collector with a semispace old space) across the entire range of memory sizes tested. In fact, all collectors tested (5 in total, vs two versions of manual allocation) Get nearly monotically better as you increase available memory; and that result holds even for the individual benchmarks (whereas the relative order of how good the collectors are can vary; in some, mark-and-sweep is the best at low memory overheads).
I'm not going to claim the benchmarks are the best or all-inclusive, but that paper is still by far the best analysis I've seen of performance of manual vs automatic memory management. So when it says that performance gets better as you provide more memory, it'll take rather more than someone asserting on /. to convince me otherwise (no offense).
I accept that (non-moving) mark-and-sweep and generational collection are not mutually exclusive in theory. But I already said that in an aside.
All Wikipedia says is "In order to implement [generational collection], many generational garbage collectors use separate memory regions for different ages of objects". It doesn't provide any suggestion or citation of anything that talks about how you would implement generational collection any other way. What I was really asking is whether you've ever seen such a thing.
It's possible that it's actually not so hard, and it just requires knowing how remembered sets or whatnot are actually implemented to figure out how it works, and I don't have that knowledge.
You didn't say it was a bad idea, you said it wasn't RAII.
(OK, you did kind of say it was a bad idea, but that wasn't what I was replying to.)
Congratulations, you have a very specific statement which I can demonstrate is exactly, concretely wrong. Then I'll describe why the rest of your statements, which are more waffley, are also wrong as a result. (Being wrong is good, I'm just in a slightly antagonistic mood now, sorry!)
So this reflects a fundamental misunderstanding of how GCs in actual use work. (There's another problem too, but I'll stick with that one.) For purposes of this post, by "GC" I am sticking with the definition that excludes reference counting.
The simplest GC algorithm, mark-and-sweep, does look at all heap blocks, both alive and dead. However, no one uses mark-and-sweep.* Why? It's slow, and the fact that it has to look at dead objects is one reason for this.
Instead, GCs in actual practice are moving collectors. (Usually they're called copying collectors but I prefer this term.) What they do is traverse the reachable object graph, and as they encounter a live object they copy it to another part of memory. (Redirecting references to it is also part of the algorithm that isn't so relevant for this post, but it's why I like "move" instead of "copy" to describe it.)
The simplest copying collector to describe is probably a semispace collector. you divide the heap into two regions, a from half and a to half. You allocate in the from half. When the GC runs, it traces over the object graph starting from the root set, and when it finds an object it hasn't seen before, it moves it from the from half over to the to half. When the collector has completed its search, it switches which half it calls from and to, and it's done.
Note that at no point did the collector do anything with dead objects! Nothing at all! The only performance impact that dead objects have is indirect, in terms of increased GC frequency and reduced cache locality.
GCs in practice aren't usually semispace collectors but even better, but they still follow the same principle: they only do things with live objects and don't do anything at all with dead ones.
However, note that the semispace collector has doubled the amount of memory needed. If you reduce the memory below that, you not only increase collection frequency but you reduce the "don't have to look at dead objects" benefit because you collect more objects that are soon to be dead.
(Incidentally, why is all this moving data around actually not bad in practice? I'll give three mitigating factors. First, it lets you do generational GC, which brings even more benefits. Second, as objects are copied around, the collector actually improves the cache locality -- the cache locality immediately after a semispace collection is likely to be rather better than a manually-managed program! (Of course that mitigating factor is destroyed a bit by what the collector itself does to your cache, but it's still part of the larger factor.) Third, when talking about how GC is slow they often ignore the fact that malloc/free are actually not exactly speedy themselves all the time. In the worst case, malloc might have to spend a fair bit of time looking through the free list for a block, and free might have to do a bit of time coalescing blocks. It's not as expensive as a collection of course, but again it's part of the larger story. By contrast, a copy collector can allocate memory blisteringly fast, with basically just a pointer increment and a check to make sure it didn't go beyond the end of the heap (which would trigger another GC cycle).)
* I've recently learned that there may be a more broad definition of mark-and-sweep, e.g. as described on Wikipedia. I'm sticking with the narrower one, as per Aho et al.'s dragon book, which is basically automating the usual C malloc/free functions. (The moving collectors I describe later work completely differently from malloc/free.)
Two followup thoughts:
First, thinking about it a little more, I guess you could argue because architectures usually provide instructions for at least making function calls, that functions aren't an abstraction. I see the point, but I stand by my statement for a few reasons. First, those instructions usually don't really do anything that you couldn't do without them. (Maybe Sparc, with its register windows, is an exception?) It's usually just a shorthand, perhaps ever-so-slightly more efficient, for a store of the instruction pointer into a register or onto a stack, then a jump. Second, while an instruction used for calls is common, specific instructions for returns are less so. For instance, MIPS provides the jal instruction (jump and link) for calls, but I think there is no return instruction -- you just jump to the value in the ra register. Third, there is no additional CPU knowledge beyond the initial and final control transfers: CPUs don't know about the local variables (you could easily implement functions without using the hardware stack -- and some languages do this to provide support for closures and other features), CPUs don't know about the region of code occupied by a function (and often there is no "the code region"), etc.
Second, I thought of another great example of an abstraction that doesn't really cost CPU time: labels. Take goto cleanup; ... cleanup: .... The CPU has no idea about cleanup. That has been entirely removed by the compilation toolchain, and turned into a symbolic address. The name cleanup is undoubtedly an abstraction for "the address of here", and yet it costs no CPU time.
Or take C++ objects. C++ objects, unless you explicitly request otherwise, basically have no additional overhead vs C structs and C functions. So are objects bad and structs or functions also bad, or are structs and functions good and objects also at least sometimes good?
I think you may be trolling, but I'll try to respond as if not in the case I can enlighten either you or someone else as to any of your questions. I actually know very little about Rust, and I think I'll deliberately not look up anything about it while responding here so you can see that it's not a matter of TopSpin just throwing around meaningless buzzwords. As a result I'm not evaluating the correctness, just whether or not TopSpin's post makes sense to someone with a fair bit of knowledge about memory management. (Of course the flip side is that if I get a lot wrong here (in terms of what Rust is or does), then that means TopSpin did a bad job after all. :-))
It means... it's deterministic, i.e. not at the whim of things like what the OS or user decides to do to you?
It's used slightly more specifically meaning though when it comes to memory management. In particular, it is usually meant to mean that a block is freed at the moment it becomes garbage instead of some point in the future.
Suppose I have a Java File object allocated in the heap; it has a finalizer that will close the file when the object is finally collected. When does that happen? No idea, because it depends on when the GC runs. Maybe that's right now. Maybe the user is AFK and the program is idling, and the File won't be collected until the user comes back after her nap in an hour. Maybe you have multiple threads that are each allocating memory, and they interact in nondeterministic ways because of the OS's scheduler, and so the point at which the GC is triggered will vary from run to run.
More to the point: the time is unexpected and potentially distant.
Contrast that with when heap objects are destroyed. You know when they are destroyed: they're destroyed when the currently-executing function returns. In C++, you know that the destructors will run when the block they are a member of exits. True, you may not know exactly when that happens in wall-clock time, but you know when it'll happen relative to the code. This is important because an fstream object that you allocate on the stack in C++ will actually close when the function returns. This is both predictable and probably soon.
(This usage of the term is, however, extremely standard, so it's in some sense it's your ignorance for not taking them seriously because of it, not their weird terminology.)
Eh, not necessarily. Both are technically spectrums, and so yes, you could say that there's an element of opinion. But if you are in, just to make up a number, the safest 10% of the spectrum or the most efficient 20% of the spectrum, it's pretty hard to argue that you're not safe and efficient.
Especially when it comes to safe, which they are using as an abbreviation for "memory safe" (which TopSpin probably should have used). That has a reasonably specific definition, so there's not a ton of room for interpretation. (For example: is C# memory safe? After all, it has unsafe blocks.)
(Mind you I'm not evaluating the correctness of those claims, just the plausibility.)
I actually don't know. :-)
Probably, C programmers can think of an "object" as synonymous with a heap block.
So, heap blocks are reference counted so they are freed when the last reference is released. (And if you "don't know what reference counting is"... well, then you be trolling sir.)
They're just names for their two heaps. I would guess that they have something to do with either threads or functions -- the local heap being
It depends what you mean by "virtual memory", because there are two meanings; but at least I would say "no."
The "real" meaning is just the fact that programs deal in virtual addresses, which are then translated to physical addresses. The address 0x4000 in your process and 0x4000 in my process are completely different.1 This has basically nothing to do with how much space programs need and everything to do with making it so processes can't step on each others toes for both ease of programming and security.
The added layer of indirection2 that was added to support virtual memory also made it easy to do automatic paging of memory to disk, and that's the other definition of virtual memory. But even here I would say there's a lot to argue against your statement that "it was never intended to allow an application that needs 500MB to run in 50MB"; at least if you make that less extreme. Before virtual memory there were overlays, which were basically regions of memory that the program, on its own volition, paged in and out to get at extra resources. I'm not nearly old enough (or insane enough) to have dealt with them, but some systems provided some explicit support for overlays, though still with a lot of programmer burden to designate what needed to go into in the overlays. So dealing with programs that had much larger memory requirements than available memory was already a common problem. I can't imagine that this would have escaped the notice of people designing paging systems. That being said, I can't go back in time and read the thoughts of the people putting in MMUs. :-)
(One disadvantage of the paging vs overlays is, ironically enough, the programmer effort. Because it requires you to think about exactly what goes in each overlay, it means that you can't just haphazardly place stuff in memory and hope it works then whine that it doesn't because it thrashes too much.)
1 shmget blah blah blah
2 "All problems in computer science can be solved by another level of indirection, except for the problem of too many layers of indirection"
That's an extraordinarily limited definition of RAII in my opinion.
For example, the Wikipedia article on RAII states "Ownership of dynamically allocated objects (memory allocated with new in C++) can also be controlled with RAII, such that the object is released when the RAII object is destroyed. For this purpose, the C++11 standard library defines the smart pointer classes std::unique_ptr for single-owned objects and std::shared_ptr for objects with shared ownership."
It also cites Bjarne Stroustrup as the inventor of the term "RAII". Let's see what he has to say: "The technique for managing resources using local objects is usually referref to as [RAII]" [The C++ Programming Language, special edition, p.366]. Doesn't say that the constructor must copy and the destructor must release exactly -- just that they must manage the resource.
Or let's look at what is probably one of the most widely-regarded C++ books, Herb Sutter's Effective C++ (3rd ed). Sutter defines RAII as "the idea of using objects to manage resources" [p. 63 in Item 13]. Later he says "Item 13 introduces the idea of [RAII]... and it describes how auto_ptr [also doesn't fit your definition] and tr1::shared_ptr are manifestations of this idea for heap-based resources" [page 66 in Item 14]. Then he goes on to describe possibilities for a question "that every RAII class author must confront: what should happen when an RAII object is copied?" [page 67].
So no, you're wrong.
Not really. Efficient (non ref-counting) GCs themselves all have substantial memory overheads just because of the way they divide the heap up into spaces and move objects around.
Yes, application performance does affect a lot: a program that is much better about allocations and makes a lot fewer of them will trigger far fewer GC cycles. But unless it just allocates all the memory it needs right away at the start and never calls new again, it still needs the GC overhead.
The "4x vs. what the application needs" is misleading in some sense -- it's not what the application needs, it's relative to a version of the program that uses manual memory management. So it should be "GC needs 4x more memory to be efficient than a manually-managed version." So if you say "I'll just reduce the size of the program's cache to make the GC do better", to get an apples-to-apples comparison you'll also have to do the same for the manually-managed version, and you'll be back to 4x.