Ask Slashdot: Best Programs To Learn From?
First time accepted submitter camServo writes "I took C++ classes in college and I have played around with some scripting languages. We learned the basics of how to make C++ work with small programs, but when I see large open source projects, I never know where to even start to try and figure out how their code works. I'm wondering if any of you have suggestions for some nice open source projects to look at to get an idea for how programming works in the real world, so I can start giving back to the FOSS community." Where would you start?
The more lines of code the more difficult to get started as a general rule. Just find a small library that provides support for something you have an interest in. Tinker with it.
Two of my imaginary friends reproduced once
Nothing more to it: the gradual expansion of your own project will teach you the techniques you need... or you'll drown.
Off you go
Oh yeah, good ol' BSD kernel. The best one in town.
I have often wondered the same thing. People tell me, "read the code and submit patches!" It may sound like hand-holding to experienced developers, but many new coders could really use an introduction to becoming a part of a community around a project.
And then do the opposite.
Works every time.
I suggest diving into Node. It is written in a very competent way; it's fast, small, efficient and nicely documented, and it does I/O correctly, so there are no messy blocking function calls and no thread-synchronization madness. It is also pretty young, so the code base is not too big for one person to understand. Thanks to npm it is also very easy to write modules that are small, clean and have minimal boilerplate, so it's not like writing Java. There is a lot of code to be written, so you may find yourself writing and publishing your own useful modules pretty soon. Good luck!
Karma: Positive (probably because of superior intellect)
For C++ I would suggest Qt.
For C I would suggest Minix3.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
If you want examples of 'real world' programming, take a bowl of spaghetti, add some additional ingredients that you wouldn't normally expect to see in spaghetti, and then fling the whole thing against a wall.
That's what the vast majority of modern day code looks like, especially if the organization that wrote the code tried to outsource the development effort 'to save money' at some point during the dev cycle.
Besides looking at the code of others be sure to look at the code you wrote a year ago and haven't looked at since. You should learn something about good comments and documentation. You probably will have ideas on how to better implement things. There is some truth to the notion that programmers don't really like the code they wrote for a project until they have thrown it out and rewritten it from scratch for the third time.
A tip I always give:
1. Start writing something you want. (It'll keep you interested)
2. Google the SMALLER hard parts (String Parsing, data models, misc functions, etc)
3. Use that code. (No one is going to blame you for copypasta on your own project.)
Eventually you'll understand how the copied code works. After a few projects you end up writing your own version because you're better than "that guy you copied from".
Step through the code with a debugger too, of course. I find that "ok, I'm gonna try to make the code do this", i.e. starting with a specific goal, setting breakpoints, and stepping through the code is the best way to get comfortable with an unfamiliar codebase, no matter its size.
Very true. With very large projects, this really is your only option.
Um the "kernel" (by which I assume you mean Linux) is not written in C++.
It should be, but it isn't.
I mean, it's full of objects with derivation and virtual functions, and structs on which constructors and destructors have to be called for everything to remain in one piece. It seems odd not to use a language which is every bit as efficient, has a familiar syntax, and yet does a large number of common tasks automatically and without errors.
Oh, and the other thing is that Linux has the vtable inside the classes rather than a vptr, presumably because the syntactic overhead of a vptr is too high. C++ is by default significantly more memory efficient in this regard.
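A minimal sketch of the memory argument above, assuming a typical ABI where a function pointer and a vptr are each one machine word (the C++ standard itself does not mandate vtables). All type names are made up for illustration, and the "inline" layout is the one the comment describes, not a verified picture of any particular kernel struct. The sketch also shows that C can reach the same footprint as C++ by pointing at a shared table of function pointers, which is essentially what the compiler-generated vptr does.

```cpp
#include <cstddef>

// Layout the comment describes: every instance carries its own copy
// of every function pointer.
struct CDeviceInline {
    int (*open)(CDeviceInline*);
    int (*read)(CDeviceInline*, char*, std::size_t);
    int (*close)(CDeviceInline*);
    int id;
};

// Hand-rolled "vptr" in C: each instance stores one pointer to a
// table of function pointers that all instances share.
struct CDeviceShared;
struct CDeviceOps {
    int (*open)(CDeviceShared*);
    int (*read)(CDeviceShared*, char*, std::size_t);
    int (*close)(CDeviceShared*);
};
struct CDeviceShared {
    const CDeviceOps* ops;
    int id;
};

// C++ equivalent: the compiler builds one vtable per class and each
// object holds a single hidden pointer to it.
class Device {
public:
    explicit Device(int id) : id_(id) {}
    virtual ~Device() = default;
    virtual int open() = 0;
    virtual int read(char* buf, std::size_t n) = 0;
    virtual int close() = 0;
protected:
    int id_;
};

class NullDevice final : public Device {
public:
    using Device::Device;
    int open() override { return 0; }
    int read(char*, std::size_t) override { return 0; }
    int close() override { return 0; }
};
```

On x86-64 with GCC or Clang you would typically see 32 bytes for CDeviceInline and 16 bytes for both CDeviceShared and NullDevice; the exact numbers are ABI-dependent.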
SJW n. One who posts facts.
1. Find a program that interests you.
2. Find something you want to change about it.
3. Hack away.
4. Find a different program that interests you.
5. Goto 2.
Trying to understand all of the code in a large project may be an impossible task, and it's frequently not necessary if you just want to make a simple change.
how programming works in the real world
There is no such thing. Each project will have its own structure and idiosyncrasies, and even after looking at 10 of them you will only understand those 10, not "the real world" in general.
http://www.aosabook.org/en/index.html (And no, I'm not affiliated - just a fan)
I think you already do.
This is the difference between C and C++: in C, whatever the code of a function says it does, it does; in C++, what the code of a function appears to do can be changed by templates, operator overloading, etc. Because of this it is impossible to make small changes without reading and understanding the entire codebase first.
Basically, if you want to get involved in a large C++ project, you either have a tour guide or very good documentation or make the huge investment of learning the entire superstructure of the program before making any changes to any part of it. It's kinda interesting how C++ encourages this kind of greater dependency between different parts of a program than C.
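To make the operator-overloading point concrete, here is a hypothetical type (Tally) whose `+` does more than add. Nothing at the call site in combine() hints at the side effect; you only discover it by finding the overload. This illustrates the commenter's claim and is not evidence that real projects abuse overloading this way.

```cpp
#include <string>
#include <vector>

// Hypothetical type for illustration only.
struct Tally {
    int value = 0;
    std::vector<std::string>* log = nullptr;  // hidden side channel
};

// Looks like plain addition, but also appends to a log.
Tally operator+(const Tally& a, const Tally& b) {
    if (a.log) a.log->push_back("added " + std::to_string(b.value));
    return Tally{a.value + b.value, a.log};
}

int combine(const Tally& x, const Tally& y) {
    // In C, `+` here could only be built-in arithmetic; in C++ it is
    // whatever operator+ overload matches the operand types.
    return (x + y).value;
}
```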
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
The Linux kernel is mostly C, with a little assembly in critical parts. I think some modules are written in C++, though, and if you want to do this I think the kernel is not a bad choice. I recommend starting with a simple text driver; you can find examples for this around the net. Start with a simple module written in C, learn how to build it correctly and fit it into the kernel, then adapt it to C++, then publish the source on some web site. Presto: you've just given back. Then tackle a real task, maybe a USB driver for some fob or doohickey. Move on from there to ... i dunno, contributing to some OSS robotics thingy.
Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
The special treatment that C++ gives to constructors and destructors makes things harder, though. They don't return any value. If the constructor fails your only option is to throw an exception. But C++ exceptions make code execution slower. Another alternative is to check the object through a method after construction, which a lot of STL objects do, but that's kind of messy.
Don't get me wrong; I program in C++. But this is one of the dilemmas that I've faced and researched, and I still don't know what to do about it.
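For reference, this is the "check after construction" style as the standard library does it with std::ifstream: the constructor never reports failure, so the caller has to test the stream afterwards. A minimal sketch; read_first_line() is a hypothetical helper, not a library function.

```cpp
#include <fstream>
#include <string>

// Returns false if the file cannot be opened or has no first line.
bool read_first_line(const std::string& path, std::string& out) {
    std::ifstream in(path);
    if (!in.is_open())   // no exception by default, only a state flag
        return false;
    return static_cast<bool>(std::getline(in, out));
}
```

Streams can be switched into throwing mode with their exceptions() member, which trades this explicit check for a try/catch at the call site.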
The Linux kernel is a piece of well-crafted code in comparison with most desktop projects. This is mainly due to working community processes, which Gnome or Freedesktop.org lack. Before someone marks this as flamebait: there is and was a lot of bad communication between parts of the Gnome community, e.g. between Ubuntu and Gnome, between users of Gnome and the GnomeShell team, etc. And arguments like "It is open source, take it or leave it" were sent out, which is nonsense, as software is not only designed by the programmers and designers but also shaped by user needs. So the feedback of users is important. That's why Ubuntu became so popular in the beginning. Lately that has changed a bit, and therefore they are losing their user base.
However, there are projects with a good community model, such as the Linux kernel (even if Linus is sometimes a little harsh). The Apache projects seem to have a good process too, and Apache HTTP is a smaller project, so it might be a good starting point.
The best thing right now to help the desktop FOSS community would be real community building. There are some efforts, such as common conferences for Gnome and KDE, but there are still big issues with users being heard by the developers. So helping to build community processes is even more important than coding.
the most important thing is to have techniques that allow you to find your way. the language doesn't actually matter, but it definitely helps if the code is documented in some fashion. i tried, for example, to work on fontforge with my usual techniques, and the code was so incredibly dense and uncommented that it was absolutely impossible to understand. but, exceptions aside, here's a starting point for getting into large projects:
* use vi. do not use graphical editors. do not use emacs.
* use :e # to go _back_ to the file you were originally editing (after using ctrl-])
* get a damn big monitor (or 2 monitors). open xterms at 80x60, as many wide as you can get.
* use a multi-window desktop manager (i use fvwm2 and i run a 6x4 grid: that's 24 desktops).
* be prepared to open (and background) up to 200 simultaneous files, across multiple windows.
* make sure that you open the files from the *root* of the project.
* open the files "by name", explicitly, so that you can do "jobs | grep {filename}"
* run "ctags -R" - it is your friend. then use ctrl-] on a function you don't know, and read about it.
* remember to use
* be prepared to print out the ENTIRE codebase, and flip through it, off-line, very very quickly.
* be prepared to do page-down, page-down, very very quickly, through as many files as you can stand
the main thing to do is to get a vague map of the code into your subconscious, as quickly as possible. then you will go "i've seen that before..." and you stand a chance of being able to hunt for it and find it.
you *don't* have to memorise the entire codebase - you *don't* have to even understand all of it. but you *do* need to at least have the techniques which will allow you to jump to wherever it is that you want to go.
ultimately, though, you need a goal. what, exactly, is it that you want to achieve? if you have no goal, you are pissing in the wind.
i added NT Domains Security to freedce - that's a good, simple goal. FreeDCE is 250,000 lines of code, and very well laid-out. it was therefore quite straightforward to add 6,000 lines of code to do NTLMSSP. took a couple of weeks.
i added python bindings to webkit - that's a good simple goal (ok, it was horrendous, requiring over 12 different skillsets, including c, c++, python, perl, autoconf, gtk, python c modules, IDL files parsing - the list just went on and on). webkit is a massive project, and also very well laid-out and structured. the first version of the python bindings took about 8 weeks, and the 2nd (faster, better) version took only 2. the reason why the 2nd version took only 2 weeks is because i hunted down the mozilla xulrunner IDL file parser, hunted down python-gobject's code generator, adapted the xulrunner IDL file parser to understand the webkit IDL file-format (2 days), then spent the rest of the time hacking codegen.py to spew out the data types from webkit, and to create a standard python c module.
so you say you don't know how to get familiar with a free software project. well, neither did i: i wasn't familiar with webkit, but that didn't stop me. i wasn't familiar with xulrunner, but that didn't stop me. i wasn't familiar with python-gobject's codegen, but that didn't stop me. i just got on with it, trusted that the surrounding code would do its job, and trusted that the bit of code that i picked up could be adapted.
so in many ways, tackling a large codebase is more about overcoming your own fear and feelings of inadequacy. sometimes not even i can do that, and sometimes i can.
Check out WiBit.net. It's not really an open source project, but it is a site that, for free, teaches programming including C++ (also C, Objective-C, soon Java and C#). We have a forum where users can help each other learn. It's not a big thing, but that's one way to give back to the community. Not just on WiBit, but helping others learn what you have learned is a great way to give back overall :-)
Also by getting into a learning site you can meet others who are like you: they know a bit, but want to be involved with something bigger. You can get your own effort moving and maybe create your own open source project. We have a guy in our forums working on an open source game with other users of the site. Check that out here.
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
The special treatment that C++ gives to constructors and destructors makes things harder, though. They don't return any value. If the constructor fails your only option is to throw an exception. But C++ exceptions make code execution slower. Another alternative is to check the object through a method after construction, which a lot of STL objects do, but that's kind of messy.
Well, bear in mind that what was true 5 years ago is not necessarily true now.
The C and C++ non-RAII ones are pretty similar in structure:
struct foo f;
if (!foo_init(&f, params))
    return fail;
versus
Foo f(params);
if (f.fail())
    return fail;
I'm not convinced the C++ one is messier than the C one.
Now for the RAII one, there is an interesting thing about speed. The C version has the error logic mixed in with the control logic. Modern C++ compilers put the unwinding logic in other pages. The C++ one can be faster if errors are rare, but slower if errors are common.
Exception enabled code used to be slow, especially in GCC. These days it is much faster.
Maybe you're doing too much in a constructor? You could offload anything that can fail into a method that can return a value.
That's kind of the point of a constructor. If it *has* to be called to make the object valid then putting in the constructor makes it impossible to forget.
But I do Java/C#, so I'd throw the exception and tell the customer to buy more hardware.
Not even that. Whole classes of bugs simply won't happen, leaving more time to optimize the rest of the code.
Aspirations of tackling a large coding project are usually less than imagined. Most successful projects start as the kernel of one programmer who strikes the landscape and envisions how things will work. Then they flesh it out to a degree and move on, or otherwise become involved in managing the code base. At some point they leave or move on. So the majority of new or junior programmers end up interpreting that code base, usually with little or no documentation. In the best of all worlds they attempt to internalize or get inside the mind of the missing programmer.. and "figure things out" for themselves and extend it.. often making assumptions and mistakes the original programmer might laugh at or never intended.. but that is how the code lives on. Programming libraries, APIs and application SDKs are all formalized examples of this.. sometimes with better documentation, or supplemental documentation, and work even better as learning tools. In the distant past language standards served the same purpose before formal libraries were common, and to a degree still do until new libraries for new languages are built up.
There are two fundamental reasons for wanting to work on "big" coding projects anyway. One is that you're curious or have the need to extend or work with an existing product, application or code base.. that probably isn't well documented, or doesn't have "learners notes" or "examples" of how to get things done.
The other is the perception that it is career enhancing or will give one an edge over less experienced programmers.
For the first, the best way is to tackle intimidation with small projects and just going for it.. experience good or bad breeds confidence and eventually success.
For the second, a whole lot has to do with personality and experience and confidence, part of which can come from working with big projects and code.. but the personality is better worked on in a coffee shop and basically communicating with people in a sociable way.. even on the internet through email.
Open source projects are a great way to collaborate and work on all of the above in a safe, non-litigious environment, and it's high profile.. and you can find people willing to help.
Schools and colleges are a more traditional way, but cost a lot and there are some barriers to entry put in place by convention and societal expectations.
The interface between the real world and the digital hasn't changed since Woz and Steve Jobs invented the Apple computer or Bill Gates commercialized the free software disk in the back of a book. How can this tool, software, help me do things that I couldn't do before?
In the early days it was a lack of hardware that forced us to reach with our minds and know the capacity of the code better than the designers of the language.. more with less. That world is gone.
Today there is a different economy: an embarrassment of hardware riches, tons of RAM and lots of hard disk space.. but a limited human lifespan.. which causes us to re-evaluate the old rule of "re-invent it yourself -- you'll learn something".. we just have the perception that to make something worthwhile.. it has to be totally and completely unique. Rather I think we should "get over it" and use libraries or examples as if they were hardware building blocks and move on.. if you need a smaller code base then "that is the project": build from something that's already working.. and don't try to place magic crystal bricks in perfect celestial harmony.. before you get started.
There is an example of this happening before.. when we used to pontificate over RTL versus TTL: which circuits should I use to build a flip-flop to build my microcomputer.. now we're moving into the age when the mobile computer replaces the microcomputer and the data cloud replaces the database. What is a mobile computer? It's not a Commodore PET anymore.. I don't even think I know.. it's still being made.. but the effects on Apple and Microsoft are being felt.. safety seems to be in the cloud.. but maybe not.. what about distributed emergent data stores? Like bittorrent, dropbox, and programs that aren't programs but organize our lives like calendars? eh..
I finally found you. I hate you. Not personally, but this kind of thinking. A TCP class raises a disconnected exception, the stream class raises an interrupted exception, the object class raises an error exception, and the application says "There was an error." What kind, and how do I fix it?
OOP error handling and code reuse can be done well, but it generally is not. The basic idea of a "return code", giving some sort of information or context about the error, is very important. Even if it's just preserving the exception information to bubble up.
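One way to keep the "what kind" information asked for above is to let each layer add context while preserving the original exception. A sketch using the standard std::throw_with_nested and std::rethrow_if_nested (C++11); the layer functions tcp_receive(), stream_read() and app_run() and their messages are hypothetical stand-ins for the TCP, stream and application classes in the comment.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Lowest layer: the real cause.
void tcp_receive() {
    throw std::runtime_error("tcp: connection reset by peer");
}

// Middle layer: adds its own context but keeps the original exception
// attached as a nested exception instead of replacing it.
void stream_read() {
    try {
        tcp_receive();
    } catch (...) {
        std::throw_with_nested(std::runtime_error("stream: read interrupted"));
    }
}

// Walks the chain of nested exceptions and joins every message.
std::string describe(const std::exception& e) {
    std::string msg = e.what();
    try {
        std::rethrow_if_nested(e);   // rethrows the inner exception, if any
    } catch (const std::exception& inner) {
        msg += " <- " + describe(inner);
    } catch (...) {
        msg += " <- (unknown exception)";
    }
    return msg;
}

// Top layer: still says "There was an error", but now also says which.
std::string app_run() {
    try {
        stream_read();
        return "ok";
    } catch (const std::exception& e) {
        return "There was an error: " + describe(e);
    }
}
```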
I've collected probably a hundred Microsoft-specific error messages that don't mean what they say they mean. They add helpful text to say what you might fix, but that's a red herring. There is an underlying error which is caught but not bubbled up, and it leaves the user with little or no idea what to do.
You have to have the idea, if not the implementation, of returning something to the user.
And, I take exception to your assertion that exceptions don't make the code slower. Each class wraps its code in try/catch and has to deal with fairly complicated Exception objects in many cases. Did the file open? fopen() returns null, and you can get more information if you want it. OOP says you have to make an exception object, run catch code, and go up the stack to the exception handlers there.
Code that experiences no exceptions will not be noticeably slower, but code that relies on exception processing to try alternate methods or re-try will be a lot slower. This from someone who looks at C++ code at the (dis)-assembler level.
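For comparison, a sketch of the C-style path described above: on POSIX systems fopen() sets errno when it fails, so the caller gets a null pointer plus a reason code without any exception object being built. OpenResult, open_for_reading() and explain() are hypothetical names; note that ISO C alone does not require fopen() to set errno.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

struct OpenResult {
    std::FILE* file;  // null on failure; caller must fclose() on success
    int error;        // errno captured at the failure point, 0 on success
};

OpenResult open_for_reading(const char* path) {
    errno = 0;
    std::FILE* f = std::fopen(path, "r");
    return OpenResult{f, f ? 0 : errno};
}

// "More information if you want it": turn the code into a message.
std::string explain(const OpenResult& r) {
    return r.file ? std::string("opened")
                  : std::string("open failed: ") + std::strerror(r.error);
}
```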
Um the "kernel" (by which I assume you mean Linux) is not written in C++. It should be, but it isn't.
There are reasons the kernel doesn't have any c++ in it (link is about git, but same deal for the kernel).
If "the real world" means the corporate world, do this. Take an application you don't care about and don't know how to use, assign yourself a bug to fix, and give yourself a deadline pulled from an RNG.
Code is developed this way:
Start development
Shrink the team
Fire all but 1 guy who does all the maintenance
Bring in contractors
Add a few people
Shrink the team
Fire the 1 guy who knows everything
Scramble to find someone who knows the application
Bring in contractors
Select any step above at random
I'm not trying to be funny. End result is quirks, inconsistencies, inexplicable code blocks, bugs, performance issues, and all kinds of other bad things.
There are reasons the kernel doesn't have any c++ in it (link is about git, but same deal for the kernel).
Those reasons are so terrible. They amount to:
It's all FUD, BS and lies.
Forget C++, it's becoming outdated and too complicated. More platforms are using Java and HTML5 now; it's the future of apps for PCs and tablets.
-- By all means let's be open-minded, but not so open-minded that our brains drop out.
And if the code that *has* to be called to make the object valid fails, how do you prevent that object from being used? You can't use a return code, as the programmer may simply ignore it and then blithely try to use an object that is now failing its invariants. With the constructor throwing an exception, at least the code block that the variable was declared in will be exited, so that variable is no longer in scope and cannot be accidentally used.
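A minimal sketch of that argument, assuming a hypothetical RAII File class wrapped around std::fopen(): the constructor throws if the file can't be opened, and because `f` is declared inside the try block, no code path can touch a File that failed to construct. peek_first_char() is likewise a made-up helper.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Either a usable object exists, or the constructor threw.
class File {
public:
    explicit File(const std::string& path) : f_(std::fopen(path.c_str(), "r")) {
        if (!f_) throw std::runtime_error("cannot open " + path);
    }
    ~File() { std::fclose(f_); }  // runs only for fully constructed objects
    File(const File&) = delete;   // exactly one owner per handle
    File& operator=(const File&) = delete;

    int first_char() { return std::fgetc(f_); }

private:
    std::FILE* f_;
};

// Returns -2 if the file could not be opened (EOF is -1).
int peek_first_char(const std::string& path) {
    try {
        File f(path);             // `f` exists only inside this block
        return f.first_char();
    } catch (const std::runtime_error&) {
        return -2;                // no half-built File is reachable here
    }
}
```

Because a destructor never runs for an object whose constructor threw, ~File() can call fclose() without a null check.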
Look at the libstdc++ for GCC and some of the boost project code.
That code is production quality and is written in a style that actually utilizes c++. Be aware that c++ recently gained quite a few new features that are not yet used much in libstdc++ and boost, so you may want to read up on those separately.
There is an *excellent* FAQ on most of the fine-grained aspects of c++ at http://www.parashift.com/c++-faq-lite/
In general, stay away from tutorials on the web; they are mostly written by people who have little or no experience and think they should teach the world about for loops or whatever because they just made one that doesn't crash.
As a side note: that goes doubly for javascript; a much better search term to find quality code is ecmascript. Unfortunately there is no equally good discriminating search word for c++.
SLOGEN [ http://ungdomshus.nu : Sebastian cover music]
Exceptions actually add a lot of overhead (it used to be 20%+, I haven't benchmarked in a few years). And while they're useful in some circumstances, they can also lead to a lot of spaghetti code. They also require a very specific form of programming- everything needs to be a smart pointer, or you will leak. In addition, they aren't really all that readable- they're a glorified goto where the label can be multiple levels up. In general they should be avoided, and cases where they are used should be evaluated carefully.
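The smart-pointer point can be shown with a small sketch (Widget, raw_version() and smart_version() are hypothetical names): a live-object counter reveals that the raw new/delete pair leaks when an exception skips the delete, while std::unique_ptr releases the object during stack unwinding.

```cpp
#include <memory>
#include <stdexcept>

// Counts live instances so a leak becomes observable.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

void might_throw(bool fail) {
    if (fail) throw std::runtime_error("boom");
}

// Raw pointer: if might_throw() throws, `delete w` never runs.
void raw_version(bool fail) {
    Widget* w = new Widget;
    might_throw(fail);
    delete w;
}

// unique_ptr: its destructor runs during unwinding, so nothing leaks.
void smart_version(bool fail) {
    auto w = std::make_unique<Widget>();
    might_throw(fail);
}
```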
I still have more fans than freaks. WTF is wrong with you people?
There was some discussion of this on Reddit a while back.
I second Mike Pall's comments. The Lua codebase is relatively small, and your puny brain can probably understand all of it from top to bottom. Other systems, like GCC and GHC, would be much more challenging to understand completely.
Exception enabled code used to be slow, especially in GCC. These days it is much faster.
Yes, exception-enabled code is fast, until you actually have to process an exception. Merely enabling exceptions typically also doubles the footprint of code, without a single actual exception handler or exception thrown, simply because the compiler has to emit cleanups to unwind each and every stack frame, from each and every scope, in case an exception *is* thrown somewhere. Code to actually work with exceptions then adds on top of this.
Given the outrageous expense of processing exceptions, anything that resembles normal or non-exceptional should never be handled as an exception. This includes things like TCP FIN (*all* connections end with it), EOF, poll timeouts, event processing (yes, I've seen it done!), not to mention regular C library and syscall error returns. I've seen people wrap things like mkdir() with an error check that throws an exception if it returns -1. Well, the program that used the wrapper of course used mkdir() all over the place just to make sure a directory existed (and to create it if it didn't). So it got an exception in the TYPICAL case. Every call point was then wrapped in an exception handler that, well, did nothing! I consider that borderline incompetent. But I digress.
On the other hand, for the things that really ARE exceptional (heap corruption, out of memory, deadlock detection, out of file descriptors, thread creation failure, unmounted root fs, kernel resource errors, etc.), there's nothing to be done, and any attempt to do anything at all will likely aggravate the problem. In the case of, say, a heap corruption or stack overflow, it's unlikely that attempting to process an exception is going to do anything more than crash. And it serves only to make the problem harder to debug, because the crash happens somewhere in a runtime routine that walks a table of links to stack-unwinding procedures, not at the place where the problem was first discovered. You're IMO better off simply calling a panic routine that stops then and there, rather than attempting anything else that would only aggravate the problem further (possibly leading to real data loss: "oh, a corrupt heap... let's try to save the document before bailing" or similar brilliance).
Between the two, and given the footprint overhead and the inevitable abuse in the absence of adult supervision, it's best not to use them at all.
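A sketch of the non-exception alternative for the mkdir() case, assuming C++17 std::filesystem: the error_code overload of create_directories() returns false without setting an error when the directory already exists, so the typical "make sure it's there" call is an ordinary return value. Only genuine failures such as permission errors set `ec`, and the final is_directory() check catches a non-directory already sitting at that path. ensure_directory() is a hypothetical helper, not part of the library.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Returns true if `dir` is a directory when the call finishes.
// `created` reports whether this call had to create it; the common
// "already exists" case is not treated as an error.
bool ensure_directory(const fs::path& dir, bool& created, std::error_code& ec) {
    created = fs::create_directories(dir, ec);  // non-throwing overload
    return !ec && fs::is_directory(dir, ec);
}
```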
The sliver of cases between the two where exceptions are useful is so narrow that it's no longer meaningful formalism.
Note that you typically don't get away from the error check of a return value. Instead you move it further down the tree, to the position where you conditionally throw the exception, instead of at the return of the function. The difference is there, but trivial.
Here's some more FUD, BS and lies from other people who, like Linus Torvalds, don't know what they are talking about and have zero credibility.
Fascism should more properly be called corporatism because it is the merger of state and corporate power. -- Mussolini
It is extremely well documented what every function in there is supposed to do, and most of the functions are actually written in C. Though I suggest you avoid printf. You'll learn things about C that you really can't learn any other way.
Some of the other standard unix utilities are also pretty good. I seem to recall that reading the source to awk and vi were very enlightening.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
If "being uninitialized" is a well-defined state for the object, then I agree. For example, you can have an empty container, an empty string, or a disconnected TCP "connection" (fd==-1 is a natural state, and instead of init() you have connect() or bind()). OTOH, I frequently discourage my co-workers from reusing value-objects. It's just as easy to say homeAddress=StreetAddress(street,unit,city,state,zip) as it is to say homeAddress.init(street,unit,city,state,zip). I have seen the latter pattern contribute to bugs when somebody added a member variable and didn't think to add it to the init() method, whereas nobody seems to accidentally omit it from the constructor.
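The init() hazard described above is easy to reproduce. A sketch with a trimmed-down, hypothetical StreetAddress (three fields instead of five) where a `note` member was added later and never reset in init(): reusing the object keeps stale data, while assigning a freshly constructed object starts every member from a clean state.

```cpp
#include <string>
#include <utility>

struct StreetAddress {
    std::string street, city, zip;
    std::string note;  // imagine this member was added after init() was written

    StreetAddress() = default;
    StreetAddress(std::string s, std::string c, std::string z)
        : street(std::move(s)), city(std::move(c)), zip(std::move(z)) {}

    // Reuse-style re-initialisation: nobody remembered to reset `note`.
    void init(std::string s, std::string c, std::string z) {
        street = std::move(s);
        city = std::move(c);
        zip = std::move(z);
    }
};
```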