Ask Slashdot: Best Programs To Learn From?
First time accepted submitter camServo writes "I took C++ classes in college and I have played around with some scripting languages. We learned the basics of how to make C++ work with small programs, but when I see large open source projects, I never know where to even start to try and figure out how their code works. I'm wondering if any of you have suggestions for some nice open source projects to look at to get an idea for how programming works in the real world, so I can start giving back to the FOSS community." Where would you start?
The more lines of code the more difficult to get started as a general rule. Just find a small library that provides support for something you have an interest in. Tinker with it.
Two of my imaginary friends reproduced once
Off you go
Oh yeah, Good ol' BSD kernel. The best one in town.
I have often wondered the same thing. People tell me, "read the code and submit patches!" It may sound like hand-holding to experienced developers, but many new coders could really use an introduction to becoming a part of a community around a project.
And then do the opposite.
Works every time.
For C++ I would suggest Qt.
For C I would suggest Minix3.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
If you want examples of 'real world' programming, take a bowl of spaghetti, add some additional ingredients that you wouldn't normally expect to see in spaghetti, and then fling the whole thing against a wall.
That's what the vast majority of modern day code looks like, especially if the organization that wrote the code tried to outsource the development effort 'to save money' at some point during the dev cycle.
Besides looking at the code of others be sure to look at the code you wrote a year ago and haven't looked at since. You should learn something about good comments and documentation. You probably will have ideas on how to better implement things. There is some truth to the notion that programmers don't really like the code they wrote for a project until they have thrown it out and rewritten it from scratch for the third time.
A tip I always give:
1. Start writing something you want. (It'll keep you interested)
2. Google the SMALLER hard parts (String Parsing, data models, misc functions, etc)
3. Use that code. (No one is going to blame you for copypasta on your own project.)
Eventually you'll understand how the copied code works. After a few projects you end up writing your own version because you're better than "that guy you copied from".
Um the "kernel" (by which I assume you mean Linux) is not written in C++.
It should be, but it isn't.
I mean, it's full of objects with derivation and virtual functions, and structs on which constructors and destructors have to be called for everything to remain in one piece. Seems odd not to use a language which is every bit as efficient, has a familiar syntax and yet does a large number of common tasks automatically and without errors.
Oh, and the other thing is that Linux has the vtable inside the classes rather than a vptr, presumably because the syntactic overhead of a vptr is too high. C++ is by default significantly more memory efficient in this regard.
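For anyone who hasn't seen the two layouts side by side, here is a rough sketch of what that comparison means. All the type and function names are made up for illustration, and none of this is taken from the Linux source; it only contrasts a function table embedded in each object, a hand-written pointer to a shared table, and ordinary C++ virtual functions.

```cpp
#include <cassert>

// A C-style function table (hypothetical names throughout).
struct ShapeOpsC {
    double (*area)(const void* self);
    const char* (*name)(const void* self);
};

// Layout 1: every object carries its own copy of the whole table.
struct SquareEmbedded {
    ShapeOpsC ops;
    double side;
};

// Layout 2: every object carries one pointer to a shared table (a manual "vptr").
struct SquarePointer {
    const ShapeOpsC* ops;
    double side;
};

static double square_area(const void* self) {
    const auto* sq = static_cast<const SquarePointer*>(self);
    return sq->side * sq->side;
}
static const char* square_name(const void*) { return "square"; }

// One table shared by all SquarePointer objects.
const ShapeOpsC kSquareOps = {square_area, square_name};

// Layout 3: C++ generates the shared table and the hidden pointer itself.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
    virtual const char* name() const = 0;
};

struct Square final : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    const char* name() const override { return "square"; }
    double side;
};

void demo_layouts() {
    SquarePointer sp{&kSquareOps, 3.0};
    assert(sp.ops->area(&sp) == 9.0);  // manual dispatch through the shared table
    Square sq(3.0);
    const Shape& s = sq;
    assert(s.area() == 9.0);           // compiler-generated dispatch
}
```

With the embedded layout each object pays one pointer slot per function; with the other two it pays a single pointer no matter how many functions the table has.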
SJW n. One who posts facts.
I think you already do.
This is the difference between C and C++: in C, whatever the code of a function says it does, it does; in C++, whatever the code of a function says it does is subject to be changed by templates, operator redefinitions, etc. Because of this it is impossible to make small changes without reading and understanding the entire codebase first.
Basically, if you want to get involved in a large C++ project, you either have a tour guide or very good documentation or make the huge investment of learning the entire superstructure of the program before making any changes to any part of it. It's kinda interesting how C++ encourages this kind of greater dependency between different parts of a program than C.
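A small sketch of the effect being described, with a made-up `Level` type and `combine` template (both hypothetical): the body of the function says `a + b`, but what that does depends on definitions that may live anywhere else in the codebase.

```cpp
#include <cassert>
#include <string>

// Generic code: the text says "a + b" and nothing more.
template <typename T>
T combine(const T& a, const T& b) {
    return a + b;
}

// Hypothetical type whose operator+ does not add: it keeps the larger value.
struct Level {
    int value;
};

Level operator+(const Level& a, const Level& b) {
    return Level{a.value > b.value ? a.value : b.value};
}

void demo_combine() {
    assert(combine(2, 3) == 5);                                       // arithmetic
    assert(combine(std::string("ab"), std::string("cd")) == "abcd");  // concatenation
    assert(combine(Level{2}, Level{3}).value == 3);                   // max, not 5
}
```

Reading `combine` alone tells you nothing about the third case; you have to find the `operator+` for `Level` first.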
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
The special treatment that C++ gives to constructors and destructors makes things harder, though. They don't return any value. If the constructor fails your only option is to throw an exception. But C++ exceptions make code execution slower. Another alternative is to check the object through a method after construction, which a lot of STL objects do, but that's kind of messy.
Don't get me wrong; I program in C++. But this is one of the dilemmas that I've faced and researched, and I still don't know what to do about it.
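For reference, the two options look roughly like this. The class names are invented for the example; the only real library piece is `std::ifstream`, whose `is_open()` is the kind of after-the-fact check mentioned above.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>

// Option 1: the constructor throws if the file cannot be opened.
class ThrowingReader {
public:
    explicit ThrowingReader(const std::string& path) : in_(path) {
        if (!in_.is_open()) {
            throw std::runtime_error("cannot open " + path);
        }
    }
private:
    std::ifstream in_;
};

// Option 2: the constructor does not report failure; the caller has to ask.
class CheckedReader {
public:
    explicit CheckedReader(const std::string& path) : in_(path) {}
    bool ok() const { return in_.is_open(); }
private:
    std::ifstream in_;
};

void demo_readers(const std::string& missing_path) {
    bool threw = false;
    try {
        ThrowingReader r(missing_path);
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);

    CheckedReader c(missing_path);
    assert(!c.ok());  // this is the check that is easy to forget
}
```

Neither version is free: the first one forces a try/catch somewhere up the stack, the second leaves a half-usable object lying around if nobody calls `ok()`.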
the most important thing is to have techniques that allow you to find your way. the language doesn't actually matter, but it does definitely definitely help if the code is documented in some fashion. i tried, for example, to work on fontforge with my usual techniques, and the code was so incredibly dense and uncommented that it was absolutely impossible to understand. but, exceptions aside, here's a starting point for getting into large projects:
* use vi. do not use graphical editors. do not use emacs.
* get a damn big monitor (or 2 monitors). open xterms at 80x60, as many wide as you can get.
* use a multi-window desktop manager (i use fvwm2 and i run a 6x4 grid: that's 24 desktops).
* be prepared to open (and background) up to 200 simultaneous files, across multiple windows.
* make sure that you open the files from the *root* of the project.
* open the files "by name", explicitly, so that you can do "jobs | grep {filename}"
* run "ctags -R" - it is your friend. then use ctrl-] on a function you don't know, and read about it.
* remember to use :e # to go _back_ to the file you were originally editing (after using ctrl-])
* be prepared to print out the ENTIRE codebase, and flip through it, off-line, very very quickly.
* be prepared to do page-down, page-down, very very quickly, through as many files as you can stand
the main thing to do is to get a vague map of the code into your subconscious, as quickly as possible. then you will go "i've seen that before..." and you stand a chance of being able to hunt for it and find it.
you *don't* have to memorise the entire codebase - you *don't* have to even understand all of it. but you *do* need to at least have the techniques which will allow you to jump to wherever it is that you want to go.
ultimately, though, you need a goal. what, exactly, is it that you want to achieve? if you have no goal, you are pissing in the wind.
i added NT Domains Security to freedce - that's a good, simple goal. FreeDCE is 250,000 lines of code, and very well laid-out. it was therefore quite straightforward to add 6,000 lines of code to do NTLMSSP. took a couple of weeks.
i added python bindings to webkit - that's a good simple goal (ok, it was horrendous, requiring over 12 different skillsets, including c, c++, python, perl, autoconf, gtk, python c modules, IDL files parsing - the list just went on and on). webkit is a massive project, and also very well laid-out and structured. the first version of the python bindings took about 8 weeks, and the 2nd (faster, better) version took only 2. the reason why the 2nd version took only 2 weeks is because i hunted down the mozilla xulrunner IDL file parser, hunted down python-gobject's code generator, adapted the xulrunner IDL file parser to understand the webkit IDL file-format (2 days), then spent the rest of the time hacking codegen.py to spew out the data types from webkit, and to create a standard python c module.
so you say "you don't know how to get familiar with a free software project", well, i am not - i wasn't familiar with webkit, but that didn't stop me. i wasn't familiar with xulrunner, but that didn't stop me. i wasn't familiar with python-gobject's codegen, but that didn't stop me. i just got on with it, and just trusted that the surrounding code would do its job, and trusted that the bit of code that i picked up could be adapted.
so in many ways, tackling a large codebase is more about overcoming your own fear and feelings of inadequacy. sometimes not even i can do that, and sometimes i can.
I finally found you. I hate you. Not personally, but this kind of thinking. A TCP class raises a disconnected exception, the stream class raises an interrupted exception, the object class raises an error exception, and the application says "There was an error." What kind, and how do I fix it?
OOP error handling and code reuse can be done well, but it generally is not. The basic idea of a "return code", giving some sort of information or context about the error, is very important. Even if it's just preserving the exception information to bubble up.
I've collected probably a hundred Microsoft-specific error messages that don't mean what they say they mean. They add helpful text to say what you might fix, but that's a red herring. There is an underlying error which is caught but not bubbled up, and it leaves the user with little or no idea what to do.
You have to have the idea, if not the implementation, of returning something to the user.
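One way to keep the underlying error while it bubbles up, sketched in C++ with the standard `std::throw_with_nested` and `std::rethrow_if_nested`. The three layer functions are hypothetical stand-ins for the TCP, stream and object classes in the example above.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical layers, named after the ones in the post.
void tcp_read() {
    throw std::runtime_error("tcp: peer disconnected");
}

void stream_read() {
    try {
        tcp_read();
    } catch (...) {
        // Add context but keep the original exception attached.
        std::throw_with_nested(std::runtime_error("stream: read interrupted"));
    }
}

void load_object() {
    try {
        stream_read();
    } catch (...) {
        std::throw_with_nested(std::runtime_error("object: load failed"));
    }
}

// Walk the chain from outermost to innermost and join the messages.
std::string describe(const std::exception& e) {
    std::string text = e.what();
    try {
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        text += " <- " + describe(inner);
    } catch (...) {
        text += " <- (non-standard exception)";
    }
    return text;
}

// What the application could show instead of "There was an error."
std::string report() {
    try {
        load_object();
    } catch (const std::exception& e) {
        return describe(e);
    }
    return "ok";
}

void demo_report() {
    assert(report() ==
           "object: load failed <- stream: read interrupted <- tcp: peer disconnected");
}
```

The application still gets one exception to catch, but the root cause is no longer thrown away on the way up.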
And, I take exception to your assertion that exceptions don't make the code slower. Each class wraps its code in try/catch and has to deal with fairly complicated Exception objects in many cases. Did the file open? fopen() returns null, and you can get more information if you want it. OOP says you have to make an exception object and run catch code, and go up the stack and to the exceptions there.
Code that experiences no exceptions will not be noticeably slower, but code that relies on exception processing to try alternate methods or re-try will be a lot slower. This from someone who looks at C++ code at the (dis)-assembler level.
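The return-code style being described, as a minimal sketch. `OpenResult`, `open_for_read` and `explain` are made-up names; `std::fopen`, `errno` and `std::strerror` are the real pieces. It assumes a POSIX system, where `fopen` sets `errno` on failure (ISO C does not require that).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Result of an open attempt: the handle (or nullptr) plus the saved errno.
struct OpenResult {
    std::FILE* file;
    int error;  // 0 on success, otherwise the errno value seen after fopen
};

OpenResult open_for_read(const char* path) {
    errno = 0;
    std::FILE* f = std::fopen(path, "r");
    return OpenResult{f, f ? 0 : errno};
}

// Turn the saved code into text only when someone actually wants it.
std::string explain(const OpenResult& r) {
    return r.file ? "ok" : std::strerror(r.error);
}

void demo_open(const char* missing_path) {
    OpenResult r = open_for_read(missing_path);
    assert(r.file == nullptr);     // did the file open? a plain null check
    assert(!explain(r).empty());   // more information, if you want it
}
```

No exception object is built and no stack is unwound; the cost of the failure path is a null check and, optionally, a string lookup.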
Um the "kernel" (by which I assume you mean Linux) is not written in C++. It should be, but it isn't.
There are reasons the kernel doesn't have any c++ in it (link is about git, but same deal for the kernel).
If "the real world" means the corporate world, do this. Take an application you don't care about and don't know how to use, assign yourself a bug to fix, and give yourself a deadline pulled from an RNG.
Code is developed this way:
Start development
Shrink the team
Fire all but 1 guy who does all the maintenance
Bring in contractors
Add a few people
Shrink the team
Fire the 1 guy who knows everything
Scramble to find someone who knows the application
Bring in contractors
Select any step above at random
I'm not trying to be funny. End result is quirks, inconsistencies, inexplicable code blocks, bugs, performance issues, and all kinds of other bad things.
And if the code that *has* to be called to make the object valid fails, how do you prevent that object from being used when it fails? Can't use a return code as the programmer may simply ignore the return code and then blithely try to use the object that is now failing its invariants. With the constructor throwing an exception, at least the code block that the variable was declared in will be exited, causing that variable to no longer be in scope, and thus cannot be accidentally used.
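A sketch of that scoping argument, with two hypothetical buffer classes: one uses a separate `init()` whose return code can be ignored, the other throws from its constructor so the declaration never completes.

```cpp
#include <cassert>
#include <stdexcept>

// Two-phase style: init() reports failure, but nothing forces a check.
class TwoPhaseBuffer {
public:
    bool init(int size) {
        if (size <= 0) return false;
        size_ = size;
        return true;
    }
    int size() const { return size_; }  // 0 if init() failed or was skipped
private:
    int size_ = 0;
};

// Throwing style: an object that exists has a valid size.
class Buffer {
public:
    explicit Buffer(int size) : size_(size) {
        if (size <= 0) throw std::invalid_argument("size must be positive");
    }
    int size() const { return size_; }
private:
    int size_;
};

// Returns true only if the code after the declaration was reached.
bool reached_use_after_bad_construction() {
    try {
        Buffer b(-1);    // throws; b never comes into existence
        (void)b.size();  // not reached
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}

void demo_buffers() {
    TwoPhaseBuffer t;
    t.init(-1);                // return code silently ignored
    assert(t.size() == 0);     // object is still usable, in a state nobody intended
    assert(!reached_use_after_bad_construction());
}
```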
Exception enabled code used to be slow, especially in GCC. These days it is much faster.
Yes, exception-enabled code is fast, until you actually have to process an exception. Merely enabling exceptions typically also doubles the footprint of code, without a single actual exception handler or exception thrown, simply because the compiler has to emit cleanups to unwind each and every stack frame, from each and every scope, in case an exception *is* thrown somewhere. Code to actually work with exceptions then adds on top of this.
Given the outrageous expense of processing exceptions, anything that resembles normal or non-exceptional should never be handled as an exception. This includes things like TCP FIN (*all* connections end with it), EOF, poll timeouts, event processing (yes, I've seen it done!), not to mention regular C library and syscall error returns. I've seen people wrap things like mkdir() with an error check that throws an exception if it returns -1. Well, the program that used the wrapper of course used mkdir() all over the place just to make sure a directory existed (and to create it if it didn't). So it got an exception in the TYPICAL case. Every call point was then wrapped in an exception handler that, well, did nothing! I consider that borderline incompetent. But I digress.
On the other hand, the things that really ARE exceptional - heap corruption, out of memory, deadlock detection, out of file descriptors, thread creation failure, unmounted root fs, kernel resource errors, etc, etc - there's nothing to be done about. And any attempt to do anything at all will likely aggravate the problem. In the case of say a heap corruption, or stack overflow, it's unlikely that attempting to process an exception is going to do anything more than crash. And serve only to make it harder to debug because it crashed somewhere in a runtime routine that walks a table with links to procedures to unwind the stack, not the place where it was actually first discovered. You're IMO better off simply calling a panic routine that stops then and there rather than attempt to do anything else that would only aggravate the problem further (possibly leading to real data loss - "oh, a corrupt heap... let's try to save the document before bailing" or similar brilliance).
Between the two, and given the footprint overhead and the inevitable abuse in the absence of adult supervision, it's best not to use them at all.
The sliver of cases between the two where exceptions are useful is so narrow that it's no longer meaningful formalism.
Note that you typically don't get away from the error check of a return value. Instead you move it further down the tree, to the position where you conditionally throw the exception, instead of at the return of the function. The difference is there, but trivial.
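To make the mkdir() story concrete, here is a rough sketch of both wrappers. The function names are invented, and it assumes a POSIX system (`mkdir` and `stat` from `<sys/stat.h>`). Note that the throwing version still contains the return-value check; it has only moved inside the wrapper.

```cpp
#include <cassert>
#include <cerrno>
#include <stdexcept>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// The kind of wrapper criticised above: "already exists" becomes an
// exception, although that is the common case for an "ensure" call.
void mkdir_or_throw(const std::string& path) {
    if (::mkdir(path.c_str(), 0755) == -1) {  // the error check is still here
        throw std::runtime_error("mkdir failed: " + path);
    }
}

// Return-code alternative: an existing directory counts as success.
// Returns 0 on success, otherwise an errno value.
int ensure_dir(const std::string& path) {
    if (::mkdir(path.c_str(), 0755) == 0) return 0;
    if (errno != EEXIST) return errno;
    struct stat st;
    if (::stat(path.c_str(), &st) != 0) return errno;
    return S_ISDIR(st.st_mode) ? 0 : ENOTDIR;
}

void demo_ensure_dir(const std::string& path) {
    assert(ensure_dir(path) == 0);  // creates it, or finds it already there
    assert(ensure_dir(path) == 0);  // typical case: exists, no error, no throw

    bool threw = false;
    try {
        mkdir_or_throw(path);       // same typical case, now an exception
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```

With `ensure_dir` the call sites need no handler at all for the typical case, and a real failure (permissions, missing parent) still comes back as a code the caller can act on.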