Slashdot Mirror


C# Under The Microscope

For anyone not following the story thus far, C# is here, courtesy of Microsoft, Anders Hejlsberg, and a raftload of other languages from which its features are derived (or added to). Napster's Nir Arbel here dissects what C# means to programmers used to other languages, and explains a bit about where it fits into the grand scheme of things. Be warned: Nir takes a pragmatic, low-dogma approach which may be unsettling to some readers. Please watch your head.

To Begin at the Ending

I'm a big fan of programming languages, possibly more than of actual programming. Every once in a while I hear about this new language that is just "brilliant", that "does things differently" and that "takes a whole different approach to programming". I typically then take the necessary time off my regularly scheduled C++ programming, learn enough about the new language to get excited about it, but not enough to actually do anything useful with it, rave about it for a couple of days, and then quietly and without protest go back to my C++ programming.

And so, when I learned of Microsoft's new up-and-comer, C# (pronunciation: whatever), I became duly excited and went forth to learn as much about it as possible.

Last things first: On paper, C# is very interesting. It does very little that's truly new and innovative, but it does do several things differently, and through this paper I hope to explore and present at least some of the more important differences between C# and its obvious influences: C++ and Java. So, skipping the obligatory Slashdot "speaking favorably of Microsoft" apology, let's talk about C#, the language.

How is it like Java/C++?

In the look & feel department, C# feels very much like C++. More so than even Java. While Java syntax borrows much of the C++ syntax, some of the corresponding language constructs have a slightly different form of use. While this is hardly a complaint, it's interesting to note that the designers of C# went a little further in making it look like C++. This is good for the same reason it was good with Java. Being a professional C++ programmer, I use C++ way more than any other language. Eiffel, for instance, has a much cleaner syntax than either C++, C# or Java, and at face value it does seem as though one should bear with new syntax if this is going to lead to cleaner, more easily understandable code, but for an old dog like myself, not having to remember so much new syntax when switching to another language is nothing short of a blessing.

C# borrows much from Java, a debt which Microsoft has not acknowledged, and possibly never will. Just like Java, C# does automatic garbage-collection. This means that, unlike with C and C++, there is no need to track the use of created objects, since the program automatically knows when objects are no longer in use and eventually destroys them. This makes working with large object groups considerably simpler, although there have been a few instances where I was faced with a programming problem where the solution depended on objects *not* being automatically destroyed, as they were supposed to exist separate from the main object hierarchy and would take care of their own destruction when the time was right. Stroustrup's vision for C++ sees automatic garbage-collection as an optional feature, which might make the language more complicated to use, but would allow better performance and increased design flexibility.

One interesting way in which C# deals with the performance issues involved with automatic garbage collection is that of allowing you to define classes whose objects always copy by value, instead of the default copy by reference, which means there is no need to garbage-collect such objects. This is done, confusingly enough, by defining classes instead as structs. This is very different from C++ structs, which are defined in exactly the same way; C++ structs are just classes where members are public by default, instead of private.

Another idea that was lifted directly off Java, and one which turned out to be very controversial, is the handling of multiple inheritance. In what seemed like a step backwards, Java did not allow you to define classes that inherit from more than one class. Java did let you define "interfaces", which work like C++ abstract classes, but were semantically clearer: an interface is a functional contract that declares one or more methods. A class can choose to "sign" such a contract by inheriting it, and providing a working implementation for every method that the interface declares. In Java, you can inherit as many interfaces as you want. The rationale for all this is that inheriting from more than one class raises too many possible problems, most notably those of clashing implementations and repeated inheritance. On a side note, the cleanest separation between interface and implementation that I know of is that of Sather, where classes can provide either implementation or interface, but not both.
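
The contract idea is easy to see in a few lines of Java. What follows is only a sketch: the names Readable, Writable and MemoryBuffer are invented for illustration and come from no real library.

```java
// Hypothetical contracts: each interface declares methods but implements nothing.
interface Readable {
    String read();
}

interface Writable {
    void write(String data);
}

// A class may extend at most one class, but it may "sign" any number of interfaces,
// as long as it supplies a working implementation for every declared method.
class MemoryBuffer implements Readable, Writable {
    private final StringBuilder contents = new StringBuilder();

    @Override
    public String read() {
        return contents.toString();
    }

    @Override
    public void write(String data) {
        contents.append(data);
    }
}
```

A caller holding only a Readable reference can read from the buffer without knowing about Writable at all, which is the separation the contract is meant to buy.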

So what else is new?

One new feature that I mentioned already was that of copy-by-value objects. This seemingly small improvement is a potentially huge performance saver! With C++, one is regularly tempted to describe the simplest constructs as classes, and in so doing make it safer and simpler to use them. For example, a phone directory program might define a phone record as a class, and would maintain one PhoneRecord object per actual record. In Java, each and every one of those objects would be garbage collected! Now, Java uses mark-and-sweep in order to garbage collect. The way that this is done is this: the JVM starts with the program's main object, and starts recursively descending through references to other objects. Every object that is traversed is marked as referenced. When this is done, all of the objects that aren't marked are destroyed. In the phone book program, especially if there are thousands and thousands of phone records, this can drastically increase the time that it takes the JVM to go through the marking phase. In C#, you'd be able to avoid all this by defining PhoneRecord as a struct instead of a class.
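
The marking process described above can be sketched as a toy model. This is an illustration of plain mark-and-sweep under simplifying assumptions (a single root, an explicit list standing in for the heap, hypothetical class names HeapObject and ToyCollector); it makes no claim about how any particular JVM is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an object on the heap: a name plus outgoing references.
class HeapObject {
    final String name;
    final List<HeapObject> references = new ArrayList<>();
    boolean marked = false;

    HeapObject(String name) {
        this.name = name;
    }
}

class ToyCollector {
    // Mark phase: recursively visit everything reachable from the given object.
    static void mark(HeapObject obj) {
        if (obj.marked) {
            return; // already visited; this also stops cycles from looping forever
        }
        obj.marked = true;
        for (HeapObject ref : obj.references) {
            mark(ref);
        }
    }

    // Sweep phase: remove every unmarked object from the heap list,
    // then clear the marks so the next collection starts fresh.
    // Returns the names of the objects that were collected.
    static List<String> collect(HeapObject root, List<HeapObject> heap) {
        mark(root);
        List<String> collected = new ArrayList<>();
        heap.removeIf(o -> {
            if (!o.marked) {
                collected.add(o.name);
                return true;
            }
            return false;
        });
        for (HeapObject o : heap) {
            o.marked = false;
        }
        return collected;
    }
}
```

The mark phase touches every reachable object once, so a phone book with thousands of live records means thousands of visits per collection, which is the cost the paragraph above is worried about.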

Another thing that C# does better than Java is the type-unification system. In Java, all classes are implicitly descendents of the Object class, which supplies several extremely useful services. C# classes are also all eventual descendents of the object class, but unlike Java, primitives such as integers, booleans and floating-point types are considered to be regular classes. Java supplies classes that correspond with primitive types, and mapping an object-value to a primitive value and vice versa is very simple, but C# makes it that much simpler by eliminating that duplication.
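
For comparison, this is roughly what the Java side of that mapping looks like when written out by hand. BoxingDemo is a made-up name; Integer.valueOf and intValue are the standard wrapper-class calls.

```java
class BoxingDemo {
    // Wraps a primitive int in its corresponding class so it can be treated
    // as an Object, then unwraps it back into a primitive.
    static int roundTrip(int value) {
        Object boxed = Integer.valueOf(value);   // primitive -> object
        return ((Integer) boxed).intValue();     // object -> primitive
    }
}
```

The two explicit conversion steps are the bookkeeping that a unified type system makes unnecessary.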

Personally, I found C# support of events to be a very exciting new feature! Whereas an object method operates the object in a certain way, object events let the object notify the outside world of particular changes in its state. A Socket class, for instance, might define a ReadPossible event or a data object might release a DataChanged event. Other objects may then subscribe for such an event so that they'd be able to do some work when the event is released. Events may very well be considered to be "reverse-functions", in the sense that rather than operate the object, they allow the object to operate the outside world, and in my programming experience, events are almost as important as methods themselves.

While you could always implement events in C by taking pointers to functions, or optionally in C++ and Java by taking objects that subclass a corresponding handler type, C# allows you to define class events as regular members. Such event members can be defined to take any delegate type. Delegates are the C# version of function pointers. Whereas a C function pointer consists of nothing but a callable address, a delegate is an object reference as well as a method reference. Delegates are callable, and when called, operate the stored method upon the stored object reference. This design, which may seem less object-oriented than the Java approach of defining a handler interface and having subscribers subclass the interface and instantiate a subscriber, is considerably more straightforward and makes using events nearly as simple as invoking object methods.
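
A delegate's pairing of an object reference with a method reference can be approximated in Java. The sketch below assumes a later Java than the one discussed here (it relies on method references and java.util.function.Consumer, which arrived in Java 8), and Event, DataStore, AuditLog and EventDemo are hypothetical names rather than anything from C# or the Java libraries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// An "event" modelled as a list of callables that subscribers add themselves to.
class Event<T> {
    private final List<Consumer<T>> handlers = new ArrayList<>();

    void subscribe(Consumer<T> handler) {
        handlers.add(handler);
    }

    // Raising the event calls each stored method on its stored object.
    void raise(T payload) {
        for (Consumer<T> handler : handlers) {
            handler.accept(payload);
        }
    }
}

class DataStore {
    final Event<String> dataChanged = new Event<>();
    private String value = "";

    void setValue(String newValue) {
        value = newValue;
        dataChanged.raise(newValue); // notify the outside world of the state change
    }

    String getValue() {
        return value;
    }
}

class AuditLog {
    final List<String> entries = new ArrayList<>();

    void onDataChanged(String newValue) {
        entries.add("changed to " + newValue);
    }
}

class EventDemo {
    static List<String> run() {
        DataStore store = new DataStore();
        AuditLog log = new AuditLog();
        // log::onDataChanged bundles an object (log) with a method (onDataChanged).
        store.dataChanged.subscribe(log::onDataChanged);
        store.setValue("hello");
        return log.entries;
    }
}
```

Note that AuditLog implements no handler interface of its own; it only needs a method with a matching signature, which is the convenience the delegate design is after.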

Events are one example of how C# takes a popular use of pre-existing object-oriented mechanisms and makes it explicit by giving it a name and logic of its own. Properties are another example, even though they're not as much of a labor-saver as events are. It is very commonplace in C++ to provide "getters" and "setters" for private data members, in order to provide controlled access to them. C# treats such "protected" data members as Properties, and the declaration syntax of properties is such that you provide a getter function, a setter function, or both for each property. In fact, properties do not have to correspond to real data members at all! They may very well be the product of some calculation or other operation.
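
In Java the same pattern is spelled out with explicit methods. A minimal sketch, using a hypothetical Rectangle class: width and height are guarded by setters, while the area has no field behind it at all.

```java
class Rectangle {
    private double width;
    private double height;

    Rectangle(double width, double height) {
        setWidth(width);
        setHeight(height);
    }

    double getWidth() {
        return width;
    }

    double getHeight() {
        return height;
    }

    // Setters give controlled access: negative values are rejected.
    void setWidth(double width) {
        this.width = requireNonNegative(width);
    }

    void setHeight(double height) {
        this.height = requireNonNegative(height);
    }

    // A "property" with no backing data member: it is computed on demand.
    double getArea() {
        return width * height;
    }

    private static double requireNonNegative(double value) {
        if (value < 0) {
            throw new IllegalArgumentException("must be non-negative");
        }
        return value;
    }
}
```

C# property syntax lets the caller write something that looks like plain member access instead of a call such as getArea(), but the getter and setter logic behind it plays the same role.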

And then, by far the ugliest, most redundant and hard-to-understand language construct in C# is the Attribute. Attributes are objects of certain types that can be attached to any variable or static language construct. At run-time, practically anything can be queried for the value of attributes attached to it. This sounds like the sort of hack someone would work into a language ten years after it's been in use and there was no other way to do something important without breaking backwards compatibility. Attributes are C#'s version of Java reflection, but with none of the elegance and appropriateness. In general, and especially in light of C#'s overall design, the Attributes feature is out of place, and inexcusable.

What is it missing?

Being an unborn language, there is much that C# does not yet promise to deliver, and for which it can't be criticized. First of all, there is no telling just how well it would perform. Java is, in many ways, the better language but one of the prime reasons it's been avoided is its relatively slow performance, especially compared to corresponding C and C++ implementations. It's not yet clear whether C# programs would need the equivalent of a Java Virtual Machine or whether they could be compiled directly into standalone executables, which might positively affect C#'s performance and possibly even set it as a viable successor to C++, at the very least on Windows. While there is much talk of C# being cross-platform, it is unclear just how feasible implementing C# on non-Windows platforms is going to be. The required .NET framework consists of much that is, at least at the moment, Windows-specific, and C# relies heavily on Microsoft's Component Object Model. All things considered, setting up a proper environment for C# on other platforms should prove to be a massive undertaking, one that perhaps none other than Microsoft can afford.

Furthermore, while there is mention of a provided system library, it's not clear what services such a library would provide. C++ provides a standard library that allows basic OS operations, the immensely useful STL and a powerful stream I/O system with basic implementation for files and memory buffers. The Java core libraries go much further by providing classes for anything from data structures, to communications, to GUI. It is yet to be seen how C#'s system library would fare in comparison.

One thing that's sure to be missing from C#, and very sadly at that, is any form of genericity. Genericity, such as it is implemented in C++, allows one to define "types with holes". Such types, when supplied with the missing information, are used to create new types on the spot, and are therefore considered to be "templates" for types. A good example of a useful type template is C++'s list, which can be used to create linked-lists for values of any type. Unlike a C linked-list that takes in pointers to void or a Java linked list that takes Object references, a list instantiated from the C++ list template is type-safe. That is to say, it would only be able to take in values of the type for which it was instantiated. While it is true that inheritance and genericity are often interchangeable, having both makes for a safer, possibly faster development platform.
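
Java itself later gained generics (in Java 5), which makes it easy to show both styles side by side. The sketch below uses only java.util.List and java.util.ArrayList; GenericsDemo and its method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsDemo {
    // Object-reference style: anything can go in, and a value of the wrong
    // type is only discovered when it is cast on the way out.
    static boolean castFailsOnRead() {
        List<Object> untyped = new ArrayList<>();
        untyped.add("not a number"); // accepted without complaint
        try {
            Integer n = (Integer) untyped.get(0);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Generic style: the "hole" in List<T> is filled with Integer, so a call
    // such as typed.add("not a number") would be rejected by the compiler.
    static int sum(List<Integer> typed) {
        int total = 0;
        for (int n : typed) {
            total += n;
        }
        return total;
    }
}
```

In the first method the mistake surfaces far from where the bad value was inserted; in the second it cannot be written at all.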

The designers of C# have admitted the usefulness of genericity, but also confessed that C# is not going to support genericity on first release. More interestingly, they are unhappy with C++'s approach to genericity, which is based entirely on templates. It would be interesting to see what approach C# would take towards the concept, seeing as templates are pretty much synonymous with genericity at the moment.

To sum it up

Many now refer to C# as a Java-wannabe, and there is much evidence to support this notion. C# doesn't only borrow a number of ideas from Java. It seems to follow up on Java's sense of clean design. It's a somewhat sad observation then that C#, purely as a language, not only provides just a fraction of the innovation and daring that Java did, but also falls just a little behind Java where cleanliness and simplicity are concerned. However, if you're someone like myself, who uses Windows as their primary development platform and needs to use C or C++ because they cannot afford the overhead that Java incurs, it's possible that C# would turn out to be a very beneficial compromise.

6 of 389 comments

  1. Objective-C, NeXTStep, OpenStep, Mac OS X, and C# by plsuh · Score: 5
    It is useful to note that much of what C# provides actually originated in the context of NeXT, the Objective-C language, and the associated operating systems.

    Inheritance and Interfaces

    Another idea that was lifted directly off Java, and one which turned out to be very controversial, is the handling of multiple inheritance. In what seemed like a step backwards, Java did not allow you to define classes that inherit from more than one class. Java did let you define "interfaces", which work like C++ abstract classes, but were semantically clearer: an interface is a functional contract that declares one or more methods.
    Objective-C in the OpenStep/Mac OS X environment has single inheritance from a base class (NSObject), and protocols, which are precise counterparts to Java's interfaces. I have run into situations, however, where multiple inheritance is exactly what is required, and using interfaces meant that I had to rewrite the exact same code more than once, as I was implementing a group of specialized collection classes in Java. There were two axes of differentiation: mutability (immutable, mutable) and ordering (partially ordered, ordered, strictly ordered). There was a lot of code that had to be duplicated that I should have been able to inherit from two abstract superclasses, one for mutability, and one for ordering. (*grumble*)

    Garbage Collection and Memory Management

    One new feature that I mentioned already was that of copy-by-value objects. This seemingly small improvement is a potentially huge performance saver! With C++, one is regularly tempted to describe the simplest constructs as classes, and in so doing make it safer and simpler to use them. For example, a phone directory program might define a phone record as a class, and would maintain one PhoneRecord object per actual record. In Java, each and every one of those objects would be garbage collected! Now, Java uses mark-and-sweep in order to garbage collect. The way that this is done is this: the JVM starts with the program's main object, and starts recursively descending through references to other objects. Every object that is traversed is marked as referenced. When this is done, all of the objects that aren't marked are destroyed. In the phone book program, especially if there are thousands and thousands of phone records, this can drastically increase the time that it takes the JVM to go through the marking phase. In C#, you'd be able to avoid all this by defining PhoneRecord as a struct instead of a class.
    Objective-C provides a semi-automatic reference-counted garbage collection mechanism that is amenable to programmer intervention to increase efficiency, through a construct called an Autorelease Pool. Every object has a retain count, which can be incremented or decremented. The object's retain count starts at one, and when an object's retain count goes down to zero it is garbage collected. Note that this happens the instant that the retain count drops to zero, not during a mark/sweep. However, you may need to pass an object on to another part of your app, but your code does not need/want to retain it. What you do instead is tell the object to auto-release. It is then put into the autorelease pool, and later on during the system's garbage collection each object in the autorelease pool is sent a release message. Some objects that are entered in the autorelease pool still have a retain count (as they are being retained by other objects) and are simply removed from the autorelease pool; others have their retain counts drop to zero and are garbage collected.

    You can fine-tune this mechanism to a high degree, by putting your own autorelease pool in the stack ahead of the system's primary autorelease pool. For instance, suppose you know that you will be allocating a whole bunch of objects for use in a part of your program, and after you exit you will never need them again. Well, you can put your own autorelease pool in for the system's autorelease pool at the start of that section of your code, write normal code, then remove and release your private autorelease pool and put back the system autorelease pool, which releases all of the objects you created in your little section of code. Conversely, if you want an object to stick around, just don't ever release or autorelease it.
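
The retain/release/autorelease cycle described above can be modelled in a few lines. This is a toy written in Java for illustration, not Objective-C's actual runtime: Counted and ToyAutoreleasePool are hypothetical names, and "destroyed" is just a flag standing in for deallocation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an object with a retain count.
class Counted {
    private int retainCount = 1; // a new object starts with a count of one
    private boolean destroyed = false;

    void retain() {
        retainCount++;
    }

    // The object is destroyed the instant its count reaches zero.
    void release() {
        if (destroyed) {
            throw new IllegalStateException("already destroyed");
        }
        retainCount--;
        if (retainCount == 0) {
            destroyed = true;
        }
    }

    boolean isDestroyed() {
        return destroyed;
    }
}

class ToyAutoreleasePool {
    private final List<Counted> pending = new ArrayList<>();

    // Defers one release until the pool is drained; returns the object for convenience.
    Counted autorelease(Counted obj) {
        pending.add(obj);
        return obj;
    }

    // Sends one release per autoreleased object, then forgets them all.
    void drain() {
        for (Counted obj : pending) {
            obj.release();
        }
        pending.clear();
    }
}
```

An object that somebody else has retained survives the drain with its count reduced by one, while an object nobody retained is destroyed at that point. A private pool around a section of code is then just a second ToyAutoreleasePool that is drained when the section ends.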

    However, from a business standpoint, I find that the automated garbage collection and never having to worry about memory allocation issues is a strong point of Java. It allows me to code more complex applications and avoid memory debugging issues that invariably bedevil complex Objective-C and C++ programs. I can get a WebObjects application to a customer much more quickly using Java than using Objective-C, with quicker turnaround and more feedback cycles.

    Events, Notifications, and Delegation

    Personally, I found C# support of events to be a very exciting new feature! Whereas an object method operates the object in a certain way, object events let the object notify the outside world of particular changes in its state. A Socket class, for instance, might define a ReadPossible event or a data object might release a DataChanged event. Other objects may then subscribe for such an event so that they'd be able to do some work when the event is released. Events may very well be considered to be "reverse-functions", in the sense that rather than operate the object, they allow the object to operate the outside world, and in my programming experience, events are almost as important as methods themselves.
    The OpenStep and Mac OS X operating systems (viewed separately from the Objective-C language, as these features are available from Java as well) have long had notifications and delegates. There is a system-wide notification center, objects can define notifications that they will post in response to certain events, and objects can register to receive particular events or classes of events. This mechanism has been in place for a long time.

    Delegation is a bit more tightly tied to Objective-C, as objects in Obj-C can pass messages (i.e. method calls) on to other objects, and objects can "pose as" other objects. An object can register to be the delegate of another object (in Java, the delegator object needs to make special provision for this), and there are "informal protocols" or "informal interfaces" defined that indicate the possible messages a delegate might receive from its delegator. Again, this is not new, and its assembly into a single OS is not new.

    Primitive Types

    C# classes are also all eventual descendents of the object class, but unlike Java, primitives such as integers, booleans and floating-point types are considered to be regular classes.
    This is one feature that I like very much, and wish that Java had. Objective-C, of course, will always have to support native types such as chars and ints, as it is defined as a superset of C. However, Java had the opportunity to remove this artificial distinction, and its failure to do so has caused lots of cursing from yours truly over the past couple of years.

    Compiling to Native Code

    It's not yet clear whether C# programs would need the equivalent of a Java Virtual Machine or whether they could be compiled directly into standalone executables, which might positively affect C#'s performance and possibly even set it as a viable successor to C++, at the very least on Windows.
    I would point out here that compiling to native code may not result in the fastest execution. Review the HP Dynamo project, as written up on Ars Technica, for the reasons why JITC can actually exceed the speed of native code. The whole Transmeta Crusoe architecture is built around this theory of operation, and no one will claim that it's too slow.

    Genericity

    One thing that's sure to be missing from C#, and very sadly at that, is any form of genericity.
    Amen to this. The fact that genericity is missing from Java is a serious gripe of mine, and the fact that it is missing from C# is a serious omission. This business of casting objects coming out of arrays is a pain in the neck, and it is often tough to find out where an object of the wrong type went into an array, although on the cast coming out you get a ClassCastException. Far better to catch the problem when the object goes in, which often gives you a better idea of where your design is broken. One of these days I am really going to have to start using the stuff coming out of the GJ project.


    Conclusions

    Overall, I find that the "new" stuff in C# is really old stuff. Furthermore, this is not the first time that all of this has been pulled together in one place. Almost all of this has been in the NeXTStep/OpenStep/Mac OS X family for a long time, and the implementations there are quite mature. I suspect that the implementations in C# will require several revisions before they reach the levels that programmers can really use.

    Just so everyone knows, I am a Consulting Engineer working for Apple iServices, a part of Apple Computer, specializing in WebObjects development. These opinions are my own, however, and not those of Apple.


    --Paul
  2. This level of language... by Chiasmus_ · Score: 5

    I think this whole "write in languages that are C, but easier" movement that's been going on for decades is a little weird.

    If I want to use a medium-level language because I want absolute control and optimized speed, I'll use C. I don't want an "almost-medium-level-but-a-little-higher-than-that-level language". If I was looking for ease of use and didn't care about optimizing, I'd go with PERL, or, hell, even Quickbasic.

    Granted, there's a need for these "weird-level" languages, and some people love them - but I think that C++ and Java nicely fill the niche. So, my first thought, which is even more valid, I think, in the face of this review, is "Why does Microsoft feel almost obligated to make an M$ version of *everything*??"

    For GUIs and money managers and anything else aimed at "my mom", Microsoft is guaranteed to reign supreme, because "my mom" doesn't really care about performance issues or security or any of that. But my hunch is that, in light of some of the bugs and general ickiness covered in this review, few people are going to want to switch over to C#. I mean, what would be the advantage?? If you already write C++ and/or Java, why would you want to start writing stuff in C#? I just don't understand.

    --
    "Beware he who would deny you access to information, for in his heart he deems himself your master."
  3. C#: Answer to the DOJ? by KFury · Score: 5

    (I know I'll get thrashed for this, but my karma can take it)

    It seems to me that creating a new 'standard' language, which nevertheless relies heavily on COM and .NET ties which only exist on Windows, is in part a tactical method to inhibit migration of Windows products to other platforms.

    Let's say that C# is simply a better language to program for Windows than C++ is. Let's also suppose the hypothetical case where new Windows functionality comes along in future Win versions, and that this functionality is more easily taken advantage of using this new C# language. This gives developers the incentive to code new Windows products in C#. Note that C#'s structures are different enough that porting from C# to C++ would not be trivial.

    Now suppose that Linux (or another OS) starts gaining prominence in the next 2-8 years. As with any new OS, its main barrier to entry is lack of software. (The only reason Linux is viable is because of all the UNIX software it inherits.) In this time, Microsoft's pushing of C# has created a new software base for Windows that is relatively locked into place, unable to be ported to other platforms without significant effort.

    Now I'm not saying this is evil. I'm not saying it's a conspiracy. Often languages built for specific environments are superior tools in those environments specifically because they're specialized.

    It's just something to be aware of.

    Kevin Fox

  4. For good "template" support: try ML by Tom7 · Score: 5

    ML has an excellent implementation of parametric polymorphism (sometimes thought of as "templates"). You can define a function that counts the elements in a list of anythings:

    fun length nil = 0
    | length (h::t) = 1 + (length t)

    which has type: 'a list -> int
    (meaning the function takes a list of anything, and returns an integer).

    Through the mechanism called "functors", you can specialize a generic structure (say "sets", or "mappings", or "arrays") with some types and operations to create a new type. Signatures let you make these types truly abstract (paired with type safety, a very powerful notion).

    All of this is type safe (with proofs). Most of it is accomplished statically too, so there's little run-time overhead. It is indeed scheme with "some work".

  5. Attributes by EAG · Score: 5

    Attributes aren't really the C# version of reflection; reflection lets you look at the normal things you'd expect it to, and it also lets you look for attributes.

    Why should you care?

    Well, attributes are really useful in cases where you want to pass some information about the class somewhere else but you don't want to make it part of the code.

    With attributes, for example, you can specify how a class should be persisted to XML.

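    using System;
    using System.Xml.Serialization;

    // Address and Item are assumed to be defined elsewhere.
    // Each attribute tells the serializer which XML name to use for the member.
    [XmlRoot("Order", Namespace="urn:acme.b2b-schema.v1")]
    public class PurchaseOrder
    {
        [XmlElement("shipTo")] public Address ShipTo;
        [XmlElement("billTo")] public Address BillTo;
        [XmlElement("comment")] public string Comment;
        [XmlElement("items")] public Item[] Items;
        [XmlAttribute("date")] public DateTime OrderDate;
    }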

    At runtime, the XML serializer looks for those attributes on the object it's trying to serialize, and uses them to control how it works.
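
The same pattern exists in later versions of Java as annotations (added in Java 5), which gives a way to sketch what a serializer does with such metadata. Everything named below (XmlName, PurchaseOrderSketch, ToySerializer) is hypothetical; the real .NET XML serializer is far more involved.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation standing in for something like [XmlElement("...")].
// RUNTIME retention keeps it visible to reflection while the program runs.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface XmlName {
    String value();
}

class PurchaseOrderSketch {
    @XmlName("shipTo") public String shipTo = "1 Main St";
    @XmlName("comment") public String comment = "rush";
    public String internalNote = "not serialized"; // no annotation, so it is skipped
}

class ToySerializer {
    // Emits one element per annotated field, named by the annotation's value.
    static String toXml(Object obj) {
        StringBuilder out = new StringBuilder();
        for (Field field : obj.getClass().getDeclaredFields()) {
            XmlName name = field.getAnnotation(XmlName.class);
            if (name == null) {
                continue; // the class carries no metadata for this field
            }
            try {
                field.setAccessible(true);
                out.append('<').append(name.value()).append('>')
                   .append(field.get(obj))
                   .append("</").append(name.value()).append('>');
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return out.toString();
    }
}
```

The class being serialized contains no serialization code of its own; the metadata rides along with the declaration and the serializer finds it through reflection.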

    You can also use attributes to communicate marshalling information, security information, etc.

    The nice thing about attributes is that it's a common mechanism, and it's extensible, so you don't have to invent some new mechanism to do something similar.

    Or, to look at it another way, attributes are just a general mechanism for getting information into the metadata.

  6. A look at C# by _prime · Score: 5

    I recently went to a brief presentation on C#, done by some Comp Sci folks just back from the MS developer conference.

    A few points I recall:

    • It does require a virtual machine. It forms a layer of abstraction called the "Common Language Runtime." The name of the VM itself is "NGWS". Performance questions led to answers along the lines of it being comparable to Java; there were indications that MS engineers believed it would eventually run faster than a compiled language due to the virtual machine's ability to optimize code execution for individual processors and systems.
    • They had a look "under the hood" of the Virtual Machine only to discover that it looked *strangely* just like MS's Java VM. Apparently they changed the variable/function names but the programmer who was taking a look said the code itself looked the same. They commented that they could actually run Java code on the system without problems, providing it didn't refer to any of the special Java class libraries.
    • Visual Studio 7 (?) would be known as Visual Studio .NET and would feature C# and the VM. They remarked that the beta release they received was quite stable.
    • The VM would run on everything from Win98 up (not sure about 95).
    • They went into some detail on MS's new strategy. Basically they are hoping to capture the Middle Tier market with C# services running on the VM (NGWS), accessible remotely via SOAP. On top would be ASP+ (internet) and Win Forms (enterprise).

    Someone asked why we need another language, especially one so close to Java. The presenter(s) explained that MS basically wanted to offer a VM based Java-like language, but was unable to add their own extensions to Java to fit in with their new strategy (remember the lawsuit from Sun?). They remarked that perhaps Sun made a mistake in their desire to keep MS from making non-standard alterations to the MS implementation of the Java VM. MS, as usual, just went ahead and created their own new standard. Now we have another language to pull developers away from Java.