Seeking Multi-Platform I/O Libraries?
An Anonymous Coward asks: "I'm just getting ready to plunge into a new project, and joy of joys have been given complete freedom when it comes to the implementation language - so long as the program will build and run on both x86 Linux and Windows. Now, I don't need a GUI, this is systems stuff only (processing binary executables in fact, so lots of bitfiddling and big nasty algorithms over hairy data structures) so pretty much all I need are standard IO libraries. C is currently at the top of my list..but what other language should I be looking at? I'm happy to learn a new one, and have the go ahead to do it..like I say, they want absolute speed. Can someone suggest a better language? C++ is out, it does come with a speed hit (using C++ properly anyway, not as a souped-up C). If I'm gonna take the speed hit, I may as well consider something like Ocaml which might let me claw the speed back with better algorithms and data structures.."
[ in my best announcer voice ]:
Let's get ready to RUMBLE!!!
--
Evan
"$30 for the One True Ring. $10 each additional ring!" -- JRR "Bob" Tolkien
You will likely find that algorithmic improvements will gain you more speed than IO library efficiency, as long as you avoid VB. Heck, I'd even strongly look at Java with a good JIT. Don't write off anything 'till you've tried it.
meh.
I know it's a bit of a stretch, but consider Python. Prototype the heck out of the system in Python, profile the application, then recode the bottlenecks in C. Use SWIG to generate your interfaces. Easier to program, easier to extend, easier to read/maintain. Shorter programming time, too.
You'll be happier, your fellow programmers will be happier, your successor programmers will be happier, and the chewy parts of your code will still be really fast. Think about it.
Hi, yes! I have a similar question: What is the best language for What I Want To Do? It needs to be able to handle floating point numbers. I don't want to use perl, because it's slower than C but messier than Smalltalk. Also, should I use vi or emacs to edit my source?
But seriously.. Every language provides standard support for file IO, unless it's totally half-assed*.
If you actually want to get helpful answers, you might provide a little more information. For instance: How much analysis will you be doing on the files? How much data are you dealing with? Is this probably going to be blocking on input all the time, or does run speed actually matter? How large and/or complicated will your program be? Does cost of deployment really matter?
* Or halfway totally assed, or whatever.
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!
Yes and no. Python is a nice language, and potentially useful. However, where I work we have a "legacy" system written in Python with SWIG (interacting with C++).
The problem is that under large workloads (which are normal for us) you end up with Python spending more of its time marshalling and unmarshalling objects. It's a PITA. I blame this mostly on SWIG (which I am NOT a fan of. Don't get me started on what the maintainers consider good development practice.)
Python's a great choice if you can do it all natively. It's also a great language to prototype in and then "translate" to another language like C++ or C or Java (depending on task and preference). But I wouldn't do the Python+SWIG thing.
[Note: I'm only posting anonymously to protect my identity. There are certain political factions at work that read /. and would be very unhappy with me discussing this.]
Yes, I would really recommend O'Caml. Here's why:
If you just write the same program you would have written in C, the speed will be quite good, probably about 20% CPU-slower than C. (And if your program is IO-heavy, you might not notice this at all.)
If you have any sort of limited time or interest (as most projects do), you'll be able to write a much better program in O'Caml than you would in C, because:
- Because it's safe, you won't need to ever spend time tracking down or debugging core dumps or memory leaks. Because it's statically typed, a large percentage of bugs are caught at compile-time.
- If your program is interacting with the network, you won't need to worry about buffer overflows, format string bugs, or most of the common security problems.
- O'Caml has a much richer core language than C, with support for algebraic datatypes, pattern matching, higher-order functions, threads, modules, and objects. You can do a lot of great stuff with these.
- O'Caml has a nicer (though not as nice as, say, SML) module system, which keeps your program from getting unmanageable, and helps isolate faults to a particular module.
And by better, I also mean faster -- development wisdom says that algorithms and data structures are what matter most, not just the instruction-level efficiency of your code.
Of course, if you don't know the language, then it will have a higher startup cost for you. But I think it's worth it; you'll learn a different programming style that can help you think in new ways even when you're writing code in Old School languages. =)
Of course, it has a portable IO lib - just because the corresponding module for more low level stuff is called "Unix" doesn't mean that it isn't available on Windows as well, with some restrictions.
Programming can be fun again. Film at 11.
I'm really very curious why you decided that c++ is out. I understand that the common (mis)perception is that c++ is slower - but let me ask this: Have you ever benchmarked it? If not, then I strongly suggest that you don't discount c++ out of hand. It has the cross-platform io facility of which you speak (streams), already has all the (completely debugged) algorithms and advanced data structures. Look, nothing is going to be faster than c (except for hand-tuned assembly) - If you absolutely need every little bit of performance, then don't bother with a language other than c. But, if you're looking for a language nearly as fast, with a complete template and streams library, that's portable, then you ought to seriously consider c++. (btw, I've written extensive projects in c++ (25000+ lines) - There isn't much performance difference, and the benefits to using it far outweigh any other penalties.)
If you know C best, use C. If you know Java best, use Java. Ditto for Perl.
Really.
The better you know a language, the faster you will be able to write your app, the more optimized it will be, fewer bugs, etc. This is common sense.
(I was going to have a really smart-assed comment on Logo, but I'll reserve that for later....)
This is more than just a language question. It looks like you're starting to get the standard responses already for Java, C++, etc.
But all of these opinions presume that you're fairly experienced in these languages. Ignore them.
Language experience/familiarity is THE factor here, so don't discount it. Someone who has been eating and breathing Java would likely produce speedier code than someone who is just learning C, for example.
Your employer/client wants SPEED. This project involves hairy and complicated bit fiddling. I would suggest NOT using this project to learn a new language, for the risks outweigh the rewards in this situation.
If you choose to use a new language for this critical job, you're setting yourself up for disappointment. Do not forget that you're going to have to go through all the growing pains associated with a new language. You're going to spend weekends tracking down (and learning from) all the newbie mistakes one makes with a new language. You are going to encounter new and unfamiliar bugs at all levels - logical design, physical design, semantic, syntactic.
Do you really want to spend your nights and weekends figuring out what the heck is throwing some particular Java exception seemingly at random? Or why your C++ function template specialization is being ignored?
Learning a new language is exhilarating, but that will quickly turn to FRUSTRATION when you run into that weekend-long show-stopper bug.
With your product being measured by performance, and with deadlines looming... When it comes down to crunch-time, I think the choice is OBVIOUS!!
Choose a different, fun project to learn a new language. But for this product you're delivering, I would encourage you to stick with the tools you know and love.
Best,
Captain Abstraction
#include <stdio.h>
Says it all really.
Cheers,
Ian
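To make the stdio suggestion a bit more concrete, here is a minimal sketch of portable binary input in plain C. The helper names read_le32 and read_prefix are made up for illustration and do not come from any post in this thread. The two points it tries to show: open the file with "rb" (on Windows, text mode translates line endings and will corrupt binary data), and assemble multi-byte fields from individual bytes so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: decode a 32-bit little-endian value from 4 bytes.
 * Building the value byte by byte keeps it independent of host endianness. */
static uint32_t read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read up to cap bytes from the start of a file.
 * Returns the number of bytes read, or 0 if the file could not be opened.
 * The "rb" mode is what keeps this safe on Windows as well as Linux. */
static size_t read_prefix(const char *path, unsigned char *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return 0;
    n = fread(buf, 1, cap, f);
    fclose(f);
    return n;
}
```

A caller would typically use read_prefix to pull a header into a buffer and then pick fields out of it with read_le32 at known offsets; nothing here is specific to any one executable format.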
Ok, so Java isn't the greatest at performance, but it is cross-platform.
Apache 2.0 is based on an excellent platform independent IO library (and many other cross platform data types, data structures, etc), the Apache Portable Runtime. It's written in C, and it's fast.
http://apr.apache.org/
Your "speed" priority, and the binary processing bit, got me almost sold, and then I saw O'Caml!!
You quiche eating wanker, how COULD you forget assembly? Isn't that what programming is all about? And WHY are you comparing C, a fine assembly macro language, to O'Caml, a shitty ML dialect used by equally hard-wanking mathematicians and abstractly thinking creatures? If these wankmaticians knew how the world operated, they would not have invented recursion, let alone APPROVED of induction as a sane, cornerstone principle in their so called "art". Induction is only possible as long as the "counter" register can hold your index, and recursion is the crackwhore narcissistic twin sister of iteration (there is nothing she does that iteration can't do with a well placed label and a jump).
Listen to me son, read Quine, Boole and DeMorgan, get the manual to your processor, and "script" at the level of the ONE TRUE ABSTRACTION LAYER.
How can you use it improperly? C++ is an object capable language, not a strict object oriented language. If you want to use objects, then fine. If not, then please don't.
Object oriented development is a tremendous thing, useful for many things, and a marvel of overcoming complexity through abstraction.
BUT, OOP is not the solution for everything. There are many problems that don't need an object structure, and should be written another way. Above all, drop the notion that C++ should be used only a certain way to be proper. The latest cool feature of C++, the Standard Template Library, isn't even object oriented - it's GENERIC, because that type of programming just was the right thing to do for that library.
If tits were wings it'd be flying around.
Have you actually tried using Python? If you have, it was probably not for long enough, or with the wrong tools.
1. Using indentation instead of braces kills the religious "coding convention" wars before they have a chance to start. It's easy to read, it makes what you read and what the parser read consistent (Never chased a mismatched indentation/braces case, have you?), and it just plain works. Where did that function start? Any editor worth its while can tell you that, most of them already have a macro that does this. If you ever used Scintilla/SciTE you'd probably never go back to "find matching" only style editors unless you were forced to - collapsing functions makes a lot of sense even in the curly brace world (more so in Python's indentation world).
2. There are add-ons that can enforce that, but that would be missing the point. The Python interpreter and language specification go to some length to catch this kind of error, and although it's a long way from e.g. C or Java, it caters for the common cases. Typos in long variable names may create annoying bugs, but ones that are _always_ easy to identify and fix. True, they wait for run time rather than compile time; personally, the number of bugs of this kind that I get is consistently low enough for this not to matter (and, since Python code tends to be an order of magnitude shorter than any other language except Lisp or APL, it's more than worth it. Plus, there's a Lint for Python if you insist). Variable declarations are NOT free documentation. "Object my_object = new Blah();" is not more informative than "my_object = Blah()". It's the variable's name that's the documentation, rarely its type.
3. Oh jesus. C++, Java, SmallTalk, LISP and just about any other language does this too. What language are you using? Plus, try scintilla and you'll be amazed at what a GOOD language sensitive editor can do (for any of the above languages).
Try TCL.
For me, using TCL my performance increased by 60% (especially when using its [Incr TCL] OO Extension).
TCL works on most unices, Windows, Mac, VMS, Palm Pilot...
The Tk graphical library is so successful that other languages (Perl, Prolog, Python) are using it.
If all this should have a reason, we would be the last to know.
QBasic.
A smart C++ programmer can use template metaprogramming in a library like Blitz++ to automatically build code optimised for the job. To write the equivalent code in C is possible but it's much more laborious and harder to maintain.
There are good reasons not to use C++. Performance isn't one of them.
-- SIGFPE
You can choose to use a general-purpose language which has a good spread of capabilities, or you can go with a best of breed language in the area you are trying to work in.
For general projects, I use a mix of Python and C++. I'd say the best of breed languages for text would be Perl, math would be Haskell, and for getting down to the metal would be Assembler.
For what you are trying to do, the no-brainer choice would be souped-up C, i.e. C which uses a few C++ features to make your life easier.
"Well, put a stake in my heart and drag me into sunlight."
Some of the K programming maxims are that memmap is better than read/write (the native file I/O is memmap), operating over bulk data is better than scalar data (the language is built around bulk operators), and terse code is good.
There is a warning, though. K is very elite and may be too elite for you (it was for me at first), but it is very easy to learn.
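For anyone wondering what "memmap instead of read/write" looks like outside of K, here is a rough sketch in C using the POSIX mmap call. This is not K's implementation, just an illustration of the idea; the function name sum_bytes_mapped is invented for the example. It is also POSIX-only: on Windows the same idea goes through CreateFileMapping and MapViewOfFile, so a build for both platforms would need a small wrapper around the two.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sum every byte of a file by mapping it into memory
 * and walking it as one array, instead of looping over read() calls.
 * Returns 0 on success and -1 on failure. */
static int sum_bytes_mapped(const char *path, uint64_t *sum)
{
    struct stat st;
    const unsigned char *p;
    uint64_t total = 0;
    off_t i;
    int fd;

    assert(sum != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects a zero-length mapping */
        close(fd);
        *sum = 0;
        return 0;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    for (i = 0; i < st.st_size; i++)
        total += p[i];              /* the file is just an array now */
    munmap((void *)p, (size_t)st.st_size);
    *sum = total;
    return 0;
}
```

Whether this actually beats buffered reads depends on the access pattern and the OS, so it is worth measuring before committing to it.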
No one's mentioned Borland's tools, but I think they'd fit the bill. Borland has great compiler technology, and it will compile and run cleanly across Linux and Windows (possibly with a few {$IFDEF}s). It has an I/O library that's as capable as C's (maybe a bit more wordy sometimes). Developing and debugging in Kylix is *much* quicker, in my experience, than using gcc/gdb. It's truly compiled, the compiler is lightning fast, and the integrated debugger is quite a bit more efficient than gdb based solutions.