That was clearly not the founders' intention. By your interpretation, libel laws, slander laws, and copyright laws are also unconstitutional, despite the last being explicitly written into the Constitution.
MSR *had an implementation* before Russinovich heard about it. Furthermore, their implementation at the time was more complete (except for the detection arms race I mention in a couple other posts) than RootkitRevealer is even now.
There's plenty to bash MS for. You don't have to make up crap.
On another note, it's a darn good thing we didn't send him into Iraq to look for WMDs,... or did we, and that's the real reason we didn't find anything there?
He was over there embedded with some US troops.
Then he aired a description of their troop movements as part of his news broadcast (including a map drawn in the sand), and the Army essentially booted him.
Yes, because creating a solution that you can't distribute because the black hats will find out about it will do everyone who didn't get it a big favor.
int main() {
    C l;
    l.x = 0;
    foo(l);
    std::cout << l.x << "\n";
}
The following should print 0:
class C { public int x; }

class JavaBites {
    // p is a copy of the caller's reference; rebinding it affects only foo.
    public static void foo(C p) {
        C l = new C();
        l.x = 1;
        p = l;
    }

    public static void main(String[] args) {
        C l = new C();
        l.x = 0;
        foo(l);
        System.out.println(l.x); // the caller's object was never touched
    }
}
This is what I mean when I say Java passes a reference by value.
In C++, I don't know that it really makes sense to say that it passes a reference by value... it's possible that people refer to that sometimes (possibly even the standard), but conceptually I don't think it makes too much sense. It's only when you drop to the implementation perspective and realize that references are syntactic sugar for pointers that it makes sense.
But while C++ references are implemented *using* pointers, and are syntactic sugar around pointers, Java references ARE pointers. (The syntax differs from C++ because it uses . instead of ->, and you don't have pointer arithmetic in Java, but other than that the semantics are the same as C++ pointers, and there is no equivalent entity to C++ references.)
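To put the two side by side in C++ terms, here is a minimal sketch. The names rebind, overwrite, afterRebind and afterOverwrite are made up for illustration. Passing a pointer by value behaves like the Java example above, while a real C++ reference is an alias for the caller's object.

```cpp
struct C { int x; };

// Pointer passed by value: the closest C++ analogue of a Java reference.
// Assigning to p changes only this function's copy of the pointer.
constexpr void rebind(C* p, C* replacement) {
    p = replacement;
    (void)p;  // silence "set but not used" warnings
}

// True C++ reference: p is another name for the caller's object, so the
// assignment copies replacement's contents into that object.
constexpr void overwrite(C& p, const C& replacement) {
    p = replacement;
}

// Value of l.x after trying to "replace" l through a pointer passed by value.
constexpr int afterRebind() {
    C l{0};
    C other{1};
    rebind(&l, &other);
    return l.x;
}

// Value of l.x after assigning through a reference parameter.
constexpr int afterOverwrite() {
    C l{0};
    C other{1};
    overwrite(l, other);
    return l.x;
}
```

afterRebind() yields 0, the same result as the Java program, and afterOverwrite() yields 1. Java has nothing that behaves like the second function.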
The big difference: Ghostbuster does a high/low scan with low being a "reboot to trusted media". Rootkit Revealer just uses two different APIs to do the high/low scan.
Strider GhostBuster can do three different types of scans. See http://research.microsoft.com/rootkit/. Only one of them requires a reboot, and the one they call an "inside-the-box scan" is essentially the same thing that RootkitRevealer does. (And yes, even that was described in a paper submission before RootkitRevealer was authored.)
Detect the Ghostbuster scan and report honestly to it but not to anything else, hoping that nobody will notice one extra file with a cryptic name in the system32 directory. This reduces the effectiveness of the rootkit and puts the rootkit author on the wrong side of an arms race if Ghostbuster scans start changing.
And in fact this is already the situation in some sense; it helps a LOT less if you can do a cold scan (because you can do signature checking), but for the "hot" scan (option 2 on the MS site I believe), there's been a mini arms race going on between rootkits and RootkitRevealer on this very issue for some time. I mention it in a couple other posts on this topic; see one of them. (Probably other users have mentioned it by now too.)
Still, it's hard to see the functional improvement over Tripwire on a live CD.
GhostBuster has a couple improvements:
1. You don't really need a known-good configuration. All GB's cold scan does is compare the results from an offline scan to the results from an online scan. If you have a known-good scan I could imagine you could use it in place of the offline scan if you haven't made any changes, but you don't need one. By my limited understanding of Tripwire, without a known-good scan, Tripwire's useless.
2. GhostBuster can produce useful results for most known rootkits even without an offline scan, so there's no need to reboot. Basically what it does is get a lower-level representation of what it's scanning, parse it, and consider that a good scan. For instance, if looking for hidden registry keys, it will parse the registry hives on the file system. To pass this test, either the rootkit has to not modify the API results (returning correct results) when being sent to GhostBuster (this is where the arms race I talked about above actually came into play) or modify the results of the lower-level scan (reading the registry hives) as well. This part is, to my knowledge, completely novel.
The functionality that Tripwire contains is only a part of what Strider GB contains, and only overlaps at the highest level of description with the part that's in RootkitRevealer.
The online hot scan that RootkitRevealer and GB do is more thorough than my understanding of what Tripwire does. My understanding is that Tripwire calculates hashes of a set of files in a known-good configuration, then periodically recomputes these hashes and compares them to the known-good ones. If the hashes change, that means that you might have a problem. Is this correct, or is it more advanced than that?
If you do RR's scan or GB's hot scan, it does a much deeper analysis. For instance, it will query the Windows API, but it will also parse the registry hives in the file system. I think it might be able to do the same thing with directory listings gotten from the Find(First|Next)File Win32 API calls vs. what is found by doing block requests to the disk. The theory behind this is that so far, rootkits are not sophisticated enough to intercept these requests and change them to be wrong in the right way, leading to an inconsistency in the two views. (Making them be correct is non-trivial too, as evidenced by the fact that rather than try to intercept those calls and hide the reg entry/file, they have tried to just *unhide* the view through the API to RR/GB only, thus making its view consistently correct, but still hide from other processes. RR's response was to randomly rename the exe each time it's run, which led black hats to create rootkits that use the same technologies as virus scanners to detect rootkit detectors, so that they can deliberately not hide from them.) Note that this process does NOT require a known-good configuration.
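The comparison step itself is simple; the hard part is getting the two views. A rough sketch of just the diff, assuming both views have already been collected as sorted lists of names. The function name hiddenEntries and the two parameters are hypothetical, and this is not how RR or GB are actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Returns the entries that show up in the low-level view (e.g. parsed
// registry hives or raw disk blocks) but are missing from the API view.
// Both inputs must be sorted.
std::vector<std::string> hiddenEntries(const std::vector<std::string>& apiView,
                                       const std::vector<std::string>& rawView) {
    assert(std::is_sorted(apiView.begin(), apiView.end()));
    assert(std::is_sorted(rawView.begin(), rawView.end()));

    std::vector<std::string> hidden;
    // Everything in rawView that is not also in apiView.
    std::set_difference(rawView.begin(), rawView.end(),
                        apiView.begin(), apiView.end(),
                        std::back_inserter(hidden));
    return hidden;
}
```

A non-empty result is the symptom being looked for: something exists on disk that the API claims is not there. No known-good baseline appears anywhere in the comparison.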
The current PoC can be found by just scanning all memory, and if that could be solved (very difficult)
It's not as difficult as you think. There's a proof of concept rootkit called Shadow Walker which uses a very clever technique taken from PaX's method for preventing stack and heap execution of code without hardware NX support. It's not perfect -- there are a couple avenues of detection that are hard to "solve" -- but it does go a fair way towards achieving that goal.
though this needs an external reference timesource since you can't trust the host
Which means you're subject to network delays and whatnot. This is actually harder than it seems to do well. ;-)
I already have total control over everything in it (provided my user allows me to have it, which is pretty much a given with MS OSs). Why do I need a rootkit?
You don't. It's poor reporting. GhostBuster isn't a rootkit; it's just a rootkit detection program. (Or set of programs.)
The article is misleading if not outright wrong; GhostBuster isn't a rootkit itself, it's just a rootkit detection thing very similar to RootkitRevealer. (GhostBuster came first and is more complete.)
It's closer to anti-virus than it is to a rootkit itself, though the similarities there don't go very far either. (AVs almost universally work by signature matching; GB works by comparing registry entries and files against each other by multiple means of acquiring that information in order to find the symptoms of having a rootkit -- missing information. This assumes that the rootkit is imperfect in hiding. For instance, GB will do a scan of the registry through the standard API calls. But then it will parse the registry hives that are on disk. The assumption is that the rootkit is going to hook the API calls. Hooking the I/O calls is rather more difficult, and it's impossible if you can do a clean boot. (One of the options is to do a diff of a hot scan vs. a known-good scan done from a Windows PE boot.) There are still things that rootkit authors can do though, specifically NOT hide from GB itself. In the case of RootkitRevealer, this has actually turned into a mini arms race of its own. Rootkits started not hiding from rreveal.exe or whatever it's called (so that it wouldn't detect diffs), so RootkitRevealer started randomly renaming itself each time it runs. The state of the art on the black hat side is to carry a signature of RootkitRevealer-like programs and do pattern matching in very much the same way that AV does pattern matching to find viruses.)
2. Rather than perfecting a rootkit, they should be working towards making a rootkit an impossibility in their OS.
If you can run drivers in kernel mode, you can run a rootkit. (Unless you can statically prove everything you let run in kernel space is safe... this may or may not be possible. For what it's worth, my current research is related to model checking drivers.)
Actually, in a rare turn of events, GhostBuster isn't the reincarnation.
MSR has been working on GhostBuster for some time, with a white paper released July 2004. That MSR site says that RootkitRevealer was released Feb 22, 2005. This fact is confirmed by archive.org, where the version archived Feb 22 does not contain RR and the one from Feb 23 does. (Not to mention the front page listed it as Feb 22.)
But it's not just digital TVs that do it. I have about a 15 year old TV. I also have a TV tuner for the computer. Plugged into the same analog signal, the tuner takes a second and a half or two to change channels. Many (analog) TVs I've seen are about the same. But that over-a-decade old TV... changes channels almost instantly. Maybe a quarter second.
I always get very frustrated when on a TV that takes a while to change channels.
And from what you're saying it can take up to 10 seconds on digital? Holy crap... how can you channel surf with that? Maybe I'll just not go digital. Sometimes old tech is better.
I never used Java (tried it a bit and intend to use it soon), but that sounds terrible. I don't like special cases in languages. In C++ you decide for yourself if you want to pass by value or by reference
It's not really a special case though, because Java *isn't* pass by reference. It's pass by value. It just passes references by value.
If you think of reference = pointer then there's no confusion, and no special case.
I'm not saying Java is a great language (I'll take C++ any day), but this part is not as conceptually unclean as it seems.
Except in Vista it doesn't work and on Mac it does.
Based on what? This article? From the comments throughout this story, it seems the article is just FUD. I'll admit that Macs seem to be better, but if you look through you'll see at least a couple saying that even their Macs have problems.
Also, in Vista, it doesn't do hibernate at the time when it goes to sleep. So if you sleep your laptop and remove the battery five minutes later, you're royally fucked without a warning.
Did you even read the post you're replying to:
When chosen, this new "Sleep" mode saves information from the computer's memory to the hibernation file on disk, but instead of turning off the computer, it simultaneously enters Standby mode. After a specified amount of time (3 hours by default), it shuts down (hibernates). If power is lost during Standby mode, the system resumes from the existing hibernate image on disk. Sleep mode, thus, offers the benefits of fast suspend and resume when in Standby mode and reliability when resuming from hibernation, in case of power loss. [emphasis mine]
Some of your previous posts sound like you go to Berkeley.
Hah! I wish. Berkeley was my top choice when I was applying to grad schools, but they didn't like me apparently. I'm at UW Madison now, which is still a fine school, and it's very nice. (Albeit a bit cold for the past couple weeks.)
Do you have a source for this? I've followed the CIL mailing list for awhile, and I thought that project was scrapped awhile ago.
Like I said, I don't go there, so I'm quite removed from what's going on. That said, I'll tell you what I know. I'm working on a project right now that uses Elsa, so I've been in correspondence with the people behind that a little bit. The prof in the class this project is for is a recent Berkeley grad (like a couple years ago) and said that there was some effort on this a while ago, but it was scrapped, so this is probably what you are referring to. I'm looking back at the email that talked about this, and I think saying there's early work might have been a little premature actually; but it's definitely still on the radar there.
First, I should give a little background. Scott McPeak, a (former now I think?) grad student there, has a Generalized LR parser called Elkhound which at least sounds quite fancy. (I don't know enough about parsing theory and grammars and such.) He used that to create a C++ front end called Elsa. This has been sort of slurped up under the umbrella of a project called Oink, which is overseen by Dan Wilkerson and Karl Chen. Elsa development is continuing under that heading, though it seems that Scott McPeak has little time for it now. The idea behind Oink is to provide a platform for different static analysis tools for C++. Right now it's essentially a C++ version of CQual (which is another project out of Berkeley, this time by a guy named Jeff Foster).
[The poster I'm replying to can skip this paragraph.] CIL is an abbreviation for C Intermediate Language, which is in some sense a simplified form of C. (Many kinds of operations are "lowered" into other forms to reduce the number of constructs that one has to analyze. For instance, all loops are changed into "while(1) {... }" with an if statement and a break to leave.) But what it really is is a set of OCaml bindings to reference the above simplified form of a program. Tools can then implement a static analysis off of this (I think CQual falls in this category) or do transformations of the ASTs and then pretty-print it back to source form (CCured is in this category). CCured is a source-to-source C transformation that inserts memory safety checks in locations that it can't statically prove safe. I forget who is behind these projects, but you can Google them as easily as I can. (They are out of Berkeley.;-))
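To make the "lowering" idea concrete, here is a hand-written before/after pair. It is only my illustration of the shape described above, not actual CIL output, and the function names sumFor and sumLowered are invented for the example.

```cpp
// Original form: an ordinary for loop.
constexpr int sumFor(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// Lowered form: the same computation written as while(1) with an
// if statement and a break to leave the loop.
constexpr int sumLowered(int n) {
    int total = 0;
    int i = 0;
    while (1) {
        if (!(i < n)) {
            break;
        }
        total += i;
        ++i;
    }
    return total;
}
```

Both functions compute the same value for any n. An analysis written against the lowered form only has to understand one kind of loop.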
[Finally, to complete the "these are from Berkeley" bit, the project that I'm working on with Elsa is extending the Cooperative Bug Isolation (CBI) project to being able to use C++. CBI is the brain child and (ACM award-winning) dissertation of Ben Liblit, who wrote his C source-to-source transformation using CIL while he was a grad student at Berkeley, and he's now my prof.]
Now... with all that setup, I actually have surprisingly little to say. The point is that the Berkeley crowd is still interested in creating a "CIL++"-like thing. Dan Wilkerson and Matt Harren are starting to put together a way to create OCaml bindings to the ASTs that Elsa generates. At least an eye is cocked towards a CIL++, though I think their aim is bigger than that. From the relevant email:
Several different groups are all interested in an OCaml way of relating to the Elsa AST and possibly Typesystem as well. The Mozilla guys want to use it as an AST query language and here at Berkeley there is a group that wants to do program analysis in OCaml on C++ (think CIL++). I want one solution to all of these problems. Mat Harren, a grad student here, and I just spec-ed out a solution that seems like it will work.
What is a language anyways but a context free grammar?
What? What kind of question is that?
If anything, it's the context free grammar part of languages that is the LEAST interesting and, with the arguable exception of C++, the easiest part!
A language is a mapping of syntactic elements to an actual action that the computer will perform. Saying "x ? y : z" is a legal expression means almost nothing; saying "x ? y : z means that the computer will evaluate x, convert it to a boolean value; if it is true, the computer will evaluate y and that will be the result of the expression, otherwise the computer will evaluate z and that will be the result of the expression" is what a language IS.
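That rule is something you can observe rather than just parse. A small sketch, with the struct Counts and the function choose invented for the purpose: each arm bumps a counter when it is evaluated, so you can see that exactly one of them runs.

```cpp
// Records how often each arm of the conditional was evaluated.
struct Counts {
    int yEvals;
    int zEvals;
    int result;
};

constexpr Counts choose(int x) {
    Counts c{0, 0, 0};
    // x is converted to bool; only the selected arm is evaluated.
    c.result = x ? (++c.yEvals, 10) : (++c.zEvals, 20);
    return c;
}
```

With a nonzero x the result is 10 and only yEvals is incremented; with x equal to 0 the result is 20 and only zEvals is incremented. None of that is visible in the grammar.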
Even ignoring semantics, there's a large number of syntactic rules that can't be specified in a CFG. For instance, "int main() { return x; }" is not a legal C++ program, but there's no way to say that variables must be declared before they are used. "5.4 + "hello world"" is (I hope and think) not a valid C++ expression, but the CFG doesn't capture that.
The language part of the C++ standard is about 300 pages. The context free grammar is about 25. (And that's not doing much to make it compact either; that might be one column of grammar rules per page.)
* If a variable or reference to a variable of primitive type is const, you can't assign to it (and must initialize it)
* If an object or reference to an object is const, you can't change non-mutable data members or call non-const functions
* For pointers to primitive types, there are two notions of constness: whether the address stored in the pointer ("where the pointer points") can be changed, and whether the value at the pointed-to location can change. In the type of a pointer, __1__ <type> __2__ * __3__, a const in slot 1 or 2 refers to the second kind of constness, while a const in slot 3 refers to the first kind.
* For pointers to objects, the same rules apply as in the last point, except that the second notion of constness changes to preventing modification of non-mutable members or calling non-const member functions
* A const on a member function means that you can't modify non-mutable data members or call non-const functions of this
* Implicit conversions will add but never remove const qualifiers
There, that's most of C++ const rules.
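A short sketch that exercises those rules. Widget, pointerDemo and objectDemo are names I made up; the lines that would break a rule are described in comments rather than written out, since they would not compile.

```cpp
#include <cassert>
#include <type_traits>

struct Widget {
    int value = 0;
    mutable int reads = 0;  // mutable: may change even through a const object

    int get() const { ++reads; return value; }  // const member function
    void set(int v) { value = v; }              // non-const member function
};

// Slots 1 and 2 name the same type: pointer to const int.
static_assert(std::is_same<const int*, int const*>::value, "slot 1 == slot 2");
// Slot 3 makes the pointer itself const, which is a different type.
static_assert(!std::is_same<const int*, int* const>::value, "slot 3 differs");

int pointerDemo() {
    int a = 1, b = 2;
    const int* p = &a;  // slot 1: *p is read-only, p may be repointed
    p = &b;             // also shows an implicit conversion adding const
    int* const q = &a;  // slot 3: q is fixed, *q is writable
    *q = 5;
    return *p + *q;     // b + a
}

int objectDemo() {
    const Widget w{};
    int got = w.get();  // allowed: get() is const; w.set(3) would not compile
    assert(w.reads == 1);
    return got + w.reads;
}
```

pointerDemo() returns 7 and objectDemo() returns 1, the latter because the mutable counter changes even though w is const.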
Now, I'm being a little facetious here, because the article does go into a lot more detail about how const is actually used in practice (for instance, passing objects by const reference for efficiency), but I think the article is making more of it than needs be.
For instance, there's no difference between the treatment of const for normal variables, parameters, and return values. In all cases, const means you can't modify it or call non-const functions, yet he devotes a section to each.
The problem is that two different translation units define two different versions of struct A.
Relevant parts from Section 3.2 of the C++ standard: "There can be more than one definition of a class type... in a program provided that each definition appears in a different translation unit, and... each definition of [the name defined more than once] shall consist of the same sequence of tokens..."
In the example provided, two translation units have definitions for struct A. However, they are not identical; in particular, one has members that are ints, the other, shorts.
However: "If the definitions of the [name defined more than once] do not satisfy these requirements, then the behavior is undefined."
In other words, the compiler is not required to diagnose violations of the ODR (One Definition Rule).
In this particular example, the compiler compiled bar as if doprint had a four-byte argument* (two shorts) but then threw out one of the definitions of doprint, leaving the other to treat shorts as if they were ints.
*or maybe an eight-byte argument with misc padding that wasn't cleared
But it's not Darwin unless you kill yourself...
;-)
BTW, your sig is unusually fitting for this story.
Because like oil, the whales, and oxygen, version numbers are a non-renewable resource!
I've posted this a couple times before...
RootkitRevealer postdates the MSR tech report describing the techniques used in RR, as they were developed for Strider GhostBuster, by about 8 months.
Russinovich found out about Strider GhostBuster which wasn't (and still apparently isn't) released, and said "oh, I can write that."
If your OS runs drivers in kernel mode (Windows, Linux, BSD, and MacOS all do), your OS is vulnerable to rootkits. Period.
It might be harder or less hard to get them loaded, but if the user can load code that executes in ring 0, the game's over.
It's a logical extension to the program "rootkit revealer" by sysinternals (who they happened to have bought out).
Which is an interesting comment considering that GhostBuster came first.
Did you even read the post you're replying to?
Or do you have information that this is wrong?
Hah! I wish. Berkeley was my top choice when I was applying to grad schools, but they didn't like me apparently. I'm at UW Madison now, which is still a fine school, and it's very nice. (Albeit a bit cold for the past couple weeks.)
Do you have a source for this? I've followed the CIL mailing list for awhile, and I thought that project was scrapped awhile ago.
Like I said, I don't go there, so I'm quite removed from what's going on. That said, I'll tell you what I know. I'm working on a project right now that uses Elsa, so I've been in correspondence with the people behind that a little bit. The prof in the class this project is for is a recent Berkeley grad (like a couple years ago) and said that there was some effort on this a while ago, but it was scrapped, so this is probably what you are referring to. I'm looking back at the email that talked about this, and I think saying there's early work might have been a little premature actually; but it's definitely still on the radar there.
First, I should give a little background. Scott McPeak, a (former now I think?) grad student there, has a Generalized LR parser called Elkhound which at least sounds quite fancy. (I don't know enough about parsing theory and grammars and such.) He used that to create a C++ front end called Elsa. This has been sort of slurped up under the umbrella of a project called Oink, which is overseen by Dan Wilkerson and Karl Chen. Elsa development is continuing under that heading, though it seems that Scott McPeak has little time for it now. The idea behind Oink is to provide a platform for different static analysis tools for C++. Right now it's essentially a C++ version of CQual (which is another project out of Berkeley, this time by a guy named Jeff Foster).
[The poster I'm replying to can skip this paragraph.] CIL is an abbreviation for C Intermediate Language, which is in some sense a simplified form of C. (Many kinds of operations are "lowered" into other forms to reduce the number of constructs that one has to analyze. For instance, all loops are changed into "while(1) {
[Finally, to complete the "these are from Berkeley" bit, the project that I'm working on with Elsa is extending the Cooperative Bug Isolation (CBI) project to support C++. CBI is the brainchild and (ACM award-winning) dissertation of Ben Liblit, who wrote his C source-to-source transformation using CIL while he was a grad student at Berkeley, and he's now my prof.]
Now... with all that setup, I actually have surprisingly little to say. The point is that the Berkeley crowd is still interested in creating a "CIL++"-like thing. Dan Wilkerson and Matt Harren are starting to put together a way to create OCaml bindings to the ASTs that Elsa generates. At least an eye is cocked towards a CIL++, though I think their aim is bigger than that. From the relevant email:
I
What is a language anyways but a context free grammar?
What? What kind of question is that?
If anything, it's the context free grammar part of languages that is the LEAST interesting and, with the arguable exception of C++, the easiest part!
A language is a mapping of syntactic elements to an actual action that the computer will perform. Saying "x ? y : z" is a legal expression means almost nothing; saying "x ? y : z means that the computer will evaluate x, convert it to a boolean value; if it is true, the computer will evaluate y and that will be the result of the expression, otherwise the computer will evaluate z and that will be the result of the expression" is what a language IS.
Even ignoring semantics, there's a large number of syntactic rules that can't be specified in a CFG. For instance, "int main() { return x; }" is not a legal C++ program, but there's no way to say that variables must be declared before they are used. "5.4 + "hello world"" is (I hope and think) not a valid C++ expression, but the CFG doesn't capture that.
The language part of the C++ standard is about 300 pages. The context free grammar is about 25. (And that's not doing much to make it compact either; that might be one column of grammar rules per page.)
I don't see why you need to say so much.
* If a variable or reference to a variable of primitive type is const, you can't assign to (and must initialize) it
* If an object or reference to an object is const, you can't change non-mutable data members or call non-const functions
* For pointers to primitive types, there are two notions of constness: whether the address stored in the pointer ("where the pointer points") can be changed, and whether the value at the pointed-to location can change. In the type of a pointer, __1__ <type> __2__ * __3__, a const in slot 1 or 2 refers to the second kind of constness, while a const in slot 3 refers to the first kind.
* For pointers to objects, the same rules apply as the last point, except that the second notion of constness changes to preventing modification of non-mutable members or calling non-const member functions
* A const on a member function means that you can't modify non-mutable data members or call non-const functions of this
* Implicit conversions will add but never remove const qualifiers
There, that's most of C++ const rules.
Now, I'm being a little facetious here, because the article does go into a lot more detail about how const is actually used in practice (for instance, passing objects by const reference for efficiency), but I think the article is making more of it than need be.
For instance, there's no difference between the treatment of const for normal variables, parameters, and return values. In all cases, const means you can't modify it or call non-const functions, yet he devotes a section to each.
You don't need to spell out the template argument when calling doprint, because the type can be deduced from the argument to doprint.
(This can be further demonstrated by the fact that making your change doesn't fix the problem.)
The problem is that two different translation units define two different versions of struct A.
Relevant parts from Section 3.2 of the C++ standard:
"There can be more than one definition of a class type ... in a program provided that each definition appears in a different translation unit, and ... each definition of [the name defined more than once] shall consist of the same sequence of tokens ..."
In the example provided, two translation units have definitions for struct A. However, they are not identical; in particular, one has members that are ints, the other, shorts.
However:
"If the definitions of the [name defined more than once] do not satisfy these requirements, then the behavior is undefined."
In other words, the compiler is not required to diagnose violations of the ODR (One Definition Rule).
In this particular example, the compiler compiled bar as if doprint had a four-byte argument* (two shorts) but then threw out one of the definitions of doprint, leaving the other to treat shorts as if they were ints.
*or maybe an eight-byte argument with misc padding that wasn't cleared