I think it could work. Why don't you implement it and let us know how it turns out? The basic implementation is not that complicated. I did something similar for a class project this semester, although I used boosting with decision stumps instead of "Bayesian" learning. The tricky part is selecting the features. There isn't as much text, so you'd also want to look at things like position in the web page, image size, accompanying tags, etc.
I can totally believe that the SBC is using dirty tricks to sabotage muni broadband. So I checked out the TriCityBroadBand site. Without cable, they will offer 2 Mb/s for $50/month, assuming their market share projections come true. They don't say if there are extra equipment or regulatory fees.
Here in the Bay Area I'm getting the same deal from Comcast. (I think the service is officially rated at 1.5 Mb/s for $50/month, but I get 270kB/s downloads.) I don't think the arguments that private enterprise is inefficient or charges high prices because of the need for profits are valid. Apparently, the argument that government services are inefficient isn't true either, at least in this case.
The TriCityBroadband site says that DSL is currently 192 kb/s for $40/month, so their proposal is definitely a great deal for them. I only wonder why some entrepreneur hasn't already installed fiber there if it's so much better than DSL. I notice that they are using a bond to finance the project. Muni bonds tend to be at lower rates because they are tax-free. Is that enough of an advantage to make the difference?
A commonly held notion, but not really well thought through. Sloppy programmer accesses through bad pointer in C. OS traps task. Sloppy programmer accesses beyond array bounds in MySafeLanguage. Runtime system traps tasks. In either case, your program "crashes".
Only sloppy programmers write bugs, huh? Anyway, that is all true, but there's also this situation:
Human, imperfect programmer accesses memory through bad pointer in C. Has no effect during testing. Causes crash/data loss/security hole in production at customer site.
Human, imperfect programmer access memory through bad pointer in safe language. Causes crash during testing and is fixed before release. If it isn't caught, when it crashes in production it will cause a controlled, traceable shutdown (assuming no bad exception handlers!): less chance of data loss or security holes.
We'll give 5 bonus points to whoever successfully answers the question "What is INTERCAL?"
The most obfuscated programming language ever. Don't tell me they're letting kids out of high school without having read the Jargon File these days.:-) (Actually, I didn't read it until I was a freshman in college.)
Its amusing that there is a mathematical way to prove that a piece of code (or any program) has absolutly no errors. However, it takes exponential time to do.
No, it's actually undecidable. For any nontrivial property P, there is no general algorithm for checking programs for P.
There are algorithms that work on most programs, (on others they fail to terminate) and as you point out, it takes too long to do a full verification of a full program. But you can often prove the absence of certain specific kinds of errors, like null pointer deferences. It's still slow, but doable.
You also have the problem of generating the specification (and a full specification of Word would be huge). Do you run a specification-checker to prove that it's correct? And what about the specification for that?...
Software crashes because it's complex, yes, but that's just part of it. Jets are complex too. So is the Space Shuttle. Cruise ships. CARS are pretty complex. While all these things do suffer catastrophic failure from time to time, it is far from the norm. Defective cars get recalled. Space shuttles ALL get grounded at the mere possibility of defect.
If your car shut down unexpectedly, it could kill you. It's perfectly sensible to put much more effort into debugging a car than debugging a PC.
Also, the average car now has 50 computers. And they don't crash. Compared to the software on those computers, the rest of the car is buggy and unreliable!
I remmeber years ago having a conversation with an IT manager at IBM. We were talking about the inability of computer programmers to make their code foolproof. His point was that we don't see problems like this with proprietary hardware.
Hardware definitely has bugs. CPUs always have bugs, although only a few of them have been a really big deal. Hardware tends to have fewer bugs because it is simpler than software. Also, it's much more expensive to patch hardware than software, so there's more incentive to get it right before shipping.
Yes, NT5+ is very stable. MS is working on the driver problem. SLAM is a tool for verifying drivers. Given a requirement, e.g., after acquiring a kernel lock the driver must release it exactly once on all control paths, and some driver source code, SLAM can find all the ways the driver can fail the requirement. They have specifications for various driver types and are using them to test some drivers. It's a research project by the Software Development Tools group in MSR, but they're working on getting it stable and powerful enough to verify more drivers. If they can get it to work well enough, they'll supply it to hardware vendors.
You just reminded me of some more basic economics. The point of a monopoly or cartel is to make more profits by creating artificial scarcity. Intentionally writing slow code would be a way of doing that, but it only works if you have limited competition. Which may be the case with your bigger examples. Hard to say from my armchair. Certainly could have been true back when IBM was the only major manufacturer.
I tried it. Memory usage held steady. The JVM was doing a lot of collections, though, so it might be running slower than it could. Even in C, though, you wouldn't call malloc() in a tight loop if you wanted optimal performance.
What opportunistic garbage collection of C/C++? You mean delete and free? Get real! Personally, I wouldn't trust the average programmer to even collect garbage correctly more than half the time, and that doesn't cut it.
Just to support what you're saying, I've heard that it's a common practice for C++ server/web app development shops to forbid the use of delete. Instead they restart the app periodically so it doesn't run out of memory.
Incrementing the ref count is generally more than one instruction because you have to get its address first. You also have memory taken up by the ref counts, increased register and cache pressure for the increments, etc. And when you decrement it, you have to test for zero. I think ref counting would work well if you had a lot of short-lived local objects, but not so well if you had a lot of long-lived objects passed as method parameters. GCs don't maintain ref counts, btw.
I've heard that GC can work pretty well with C++, but you can't guarantee that there won't be memory leaks because you have to assume that a declared int might really be a pointer. It would be great with a type-safe variant of C++, though.
Java does virtual method inlining. That, alone, makes it potentially faster than C++. A slightly more intelligent garbage collector designed to decrease page-faults later, and presto, C++ is the slow language.
There's nothing stopping a C++ compiler from doing virtual method inlining. And many big apps benefit from custom memory managers, which are easier to put together in C++ than Java. Personally I don't like C++, but I don't think it will be the slow language any time soon.
As you indicated, the more time you are willing to sacrafice at load-time and at "just-in-time" time, the more optimizations you can do based on that runtime information. This just isn't an option with pure "static compilation."
Actually, you can do quite well by feeding profile information into an static compiler. There are still a few things you can't do that way, like inlining calls to DLLs. But the bigger problem with profile-driven optimization is that from what I've heard, not many developers use it. Something about not trusting really aggressive optimizers or not wanting to gather the profile information.
I have a feeling that as machines become faster, enough time will be able to be devoted to JIT work to make it as fast as, if not faster than static compilation.
That's a good point. I hadn't thought of that. Another thing is that the biggest problem for JITs is the startup time. Adaptive recompilation can help get good startup time while also aggressively optimizing hot spots.
I reran my cheesy prime number benchmark today and found it was actually faster in C# than in C++. MS has some really good compiler people working for them, so I'm only mildly surprised.
1. I'm sure I haven't fully optimized the code in all the languages. That's a fair test for the kind of applications I have experience with, where it's more important to write reasonably maintainable code fast than to fully optimize the source code.
2. True. My claim is that for normal code within methods, Java is almost as fast as C++. If you have lots of memory allocations and method calls, Java is definitely slower. According to the Programming Language Shootout, JDK 1.3 is 7x slower than g++ on heapsort. I'm actually surprised it's that bad. I wonder if it's better with 1.4.
3. I don't know much about scientific computing, but I understand that Fortran is still popular because the optimizers are really good for scientific programs. I don't expect Java to meet their needs any time soon.
I lost the code, but it didn't take too long to recreate. I did C# instead of VB this time. Results are here.
If you took a Java benchmark and ported it "directly" to C (e.g., virtual methods turn into function pointers) I think that would diminish the speed advantage of C. You'd still get speedup from having structs, optimizers that have 30 years of experience behind them instead of 10, etc. But if you rewrote the thing to an equivalent hand-optimized C program, I think C would still be a lot faster.
That makes a lot of sense. But what about a company that has a strategy of attracting talent by allowing their employees a lot of freedom, and has a policy of not monitoring or restricting employees' use of their computers? What should the company do if a tech reports one of the employees for illegal files?
Given that the tech saw the files, he did the right thing, but he also violated, or at least gave the appearance of having violated, the company's policy and hurt its business. It seems to me that the company can legitimately fire the tech in this case. (Of course, that would make it harder to hire good techs!)
This hypothetical example isn't necessarily relevant to the situation in the article. But it could be. Maybe the faculty at other schools will not want their school to hire this company because of what happened.
The only revolution it would start would be the one to create a police state to execute all those child-ogling, cop-shooting psychos. And 20 million innocent people. If it saves just one child, after all. Count me out.
I don't think it works that way, though. People who have "deviant" urges tend to start with small things and gradually escalate. In this case, it could go thoughts -> looking at normal pics of kids -> hanging around kids -> child porn -> sexual advances toward kids, etc. (I don't want to disgust myself by taking the sequence any further.) It's easier to take small steps, and having child porn available makes there be more (and smaller) steps.
Of course, not all who look at child porn take the next steps, and the law allows punishing people only for actually harming someone, not taking a step that might lead them to in the future.
If you wrote a program with mostly static methods, primitive types, and arrays, minimizing object creation and virtual method calls, it would run almost as fast as the equivalent C++ program. A couple years back, I implemented Sieve of Eratosthenes in C++, Java, and VB, and the speeds were comparable. IIRC, on Windows they were within 10% of each other. On Solaris, bizarrely, C++ kicked Java's ass, but Java was still only about 50% slower.
But if you write an object-oriented program, it will be slower. OOP tends to be higher-level, and thus faster to write but slower to run. All those memory allocations and virtual method calls take time, and they're difficult to optimize. Also, the standard libraries are kind of slow, because they try to be really general (e.g., synchronized collections).
It was somewhat shocking to me, but back in the VAX days I learned that software made by hardware manufacturers is as slow as they can get the customer to accept. That makes customers buy more hardware.
I've heard that too, but I don't think it's true. Basic economics says that if a product is more useful, customers will pay a higher price for it. That's why manufacturers try to make the hardware faster. Shouldn't faster software be just as good from their point of view?
However, that's only basic economics. I could believe that there were some strange economic conditions affecting DEC business strategy. Or maybe that DEC was just stupid. That would explain their nonexistence. But Intel spends a lot of money maintaining an optimizing compiler and helping software vendors optimize their code. So it doesn't seem to be true now.
JIT compilation, also called "dynamic compilation" has proven to be faster than normal (or static) compilation in an increasing number of cases.
Can you tell me about those cases? As a grad student in PL, I'd be interested in hearing about them.
What I've seen is that in many cases JIT systems can get within a few percent of static compilers, which is usually good enough. The advantage of JIT is that you have all the runtime information, e.g., target architecture, library code, etc, so you can do some optimizations that are impossible at compile-time. The downside is that the JIT has to compile really fast, so it can't do as many optimizations as a static compiler.
You have good intellectual company. In his famous talk "You and Your Research", Richard Hamming said,
You have to change. You get tired after a while; you use up your originality in one field...I mean within your field you should shift areas so that you don't go stale. You couldn't get away with forcing a change every seven years, but if you could, I would require a condition for doing research, being that you will change your field of research every seven years with a reasonable definition of what it means, or at the end of 10 years, management has the right to compel you to change. I would insist on a change because I'm serious. What happens to the old fellows is that they get a technique going; they keep on using it. They were marching in that direction which was right then, but the world changes. There's the new direction; but the old fellows are still marching in their former direction.
What I really want to know is why (so I can avoid settling in to dogma, or at least do it gracefully.) Is it psychological: scientists get attached to the ideas they work on; rational: they reason correctly that they will get more done by continuing less important work on what they know than trying something completely different; irrational: they reason that incorrectly; physical: brains just get less flexible...?
Hi, David. Congratulations on your win. Don't worry about Slashdotters, they're like this to everybody, out of envy. Hell, everyone who knows me considers me an excellent programmer, I'm not interested in contests, and I'm not an asshole, but I still felt urges to put you down. I'm just mature enough to recognize the source of those urges and suppress them.
I'm sure you know that the skills you demonstrated in the contest are only a small part of the skills needed by a professional coder. I'm also pretty sure you'd make a great coder. On the other hand, based on what I've heard about you, I suspect you're too smart to be just a game programmer or something like that, and that better things are in store for you. Of course, coding is still a lot of fun, good preparation for management or research, and a decent profession. Whatever you decide to do, thanks for stopping by on Slashdot, and best wishes.
I think it could work. Why don't you implement it and let us know how it turns out? The basic implementation is not that complicated. I did something similar for a class project this semester, although I used boosting with decision stumps instead of "Bayesian" learning. The tricky part is selecting the features. There isn't as much text, so you'd also want to look at things like position in the web page, image size, accompanying tags, etc.
Here in the Bay Area I'm getting the same deal from Comcast. (I think the service is officially rated at 1.5 Mb/s for $50/month, but I get 270kB/s downloads.) I don't think the arguments that private enterprise is inefficient or charges high prices because of the need for profits are valid. Apparently, the argument that government services are inefficient isn't true either, at least in this case.
The TriCityBroadband site says that DSL is currently 192 kb/s for $40/month, so their proposal is definitely a great deal for them. I only wonder why some entrepreneur hasn't already installed fiber there if it's so much better than DSL. I notice that they are using a bond to finance the project. Muni bonds tend to be at lower rates because they are tax-free. Is that enough of an advantage to make the difference?
Only sloppy programmers write bugs, huh? Anyway, that is all true, but there's also this situation:
Human, imperfect programmer accesses memory through bad pointer in C. Has no effect during testing. Causes crash/data loss/security hole in production at customer site.
Human, imperfect programmer access memory through bad pointer in safe language. Causes crash during testing and is fixed before release. If it isn't caught, when it crashes in production it will cause a controlled, traceable shutdown (assuming no bad exception handlers!): less chance of data loss or security holes.
The most obfuscated programming language ever. Don't tell me they're letting kids out of high school without having read the Jargon File these days. :-) (Actually, I didn't read it until I was a freshman in college.)
No, it's actually undecidable. For any nontrivial property P, there is no general algorithm for checking programs for P.
There are algorithms that work on most programs, (on others they fail to terminate) and as you point out, it takes too long to do a full verification of a full program. But you can often prove the absence of certain specific kinds of errors, like null pointer deferences. It's still slow, but doable.
You also have the problem of generating the specification (and a full specification of Word would be huge). Do you run a specification-checker to prove that it's correct? And what about the specification for that? ...
If your car shut down unexpectedly, it could kill you. It's perfectly sensible to put much more effort into debugging a car than debugging a PC.
Also, the average car now has 50 computers. And they don't crash. Compared to the software on those computers, the rest of the car is buggy and unreliable!
Hardware definitely has bugs. CPUs always have bugs, although only a few of them have been a really big deal. Hardware tends to have fewer bugs because it is simpler than software. Also, it's much more expensive to patch hardware than software, so there's more incentive to get it right before shipping.
Yes, NT5+ is very stable. MS is working on the driver problem. SLAM is a tool for verifying drivers. Given a requirement, e.g., after acquiring a kernel lock the driver must release it exactly once on all control paths, and some driver source code, SLAM can find all the ways the driver can fail the requirement. They have specifications for various driver types and are using them to test some drivers. It's a research project by the Software Development Tools group in MSR, but they're working on getting it stable and powerful enough to verify more drivers. If they can get it to work well enough, they'll supply it to hardware vendors.
You just reminded me of some more basic economics. The point of a monopoly or cartel is to make more profits by creating artificial scarcity. Intentionally writing slow code would be a way of doing that, but it only works if you have limited competition. Which may be the case with your bigger examples. Hard to say from my armchair. Certainly could have been true back when IBM was the only major manufacturer.
I tried it. Memory usage held steady. The JVM was doing a lot of collections, though, so it might be running slower than it could. Even in C, though, you wouldn't call malloc() in a tight loop if you wanted optimal performance.
Just to support what you're saying, I've heard that it's a common practice for C++ server/web app development shops to forbid the use of delete. Instead they restart the app periodically so it doesn't run out of memory.
I've heard that GC can work pretty well with C++, but you can't guarantee that there won't be memory leaks because you have to assume that a declared int might really be a pointer. It would be great with a type-safe variant of C++, though.
Not quite. Since the compiler has more type information it can safely omit most of the casts.
There's nothing stopping a C++ compiler from doing virtual method inlining. And many big apps benefit from custom memory managers, which are easier to put together in C++ than Java. Personally I don't like C++, but I don't think it will be the slow language any time soon.
Actually, you can do quite well by feeding profile information into an static compiler. There are still a few things you can't do that way, like inlining calls to DLLs. But the bigger problem with profile-driven optimization is that from what I've heard, not many developers use it. Something about not trusting really aggressive optimizers or not wanting to gather the profile information.
I have a feeling that as machines become faster, enough time will be able to be devoted to JIT work to make it as fast as, if not faster than static compilation.
That's a good point. I hadn't thought of that. Another thing is that the biggest problem for JITs is the startup time. Adaptive recompilation can help get good startup time while also aggressively optimizing hot spots.
I reran my cheesy prime number benchmark today and found it was actually faster in C# than in C++. MS has some really good compiler people working for them, so I'm only mildly surprised.
2. True. My claim is that for normal code within methods, Java is almost as fast as C++. If you have lots of memory allocations and method calls, Java is definitely slower. According to the Programming Language Shootout, JDK 1.3 is 7x slower than g++ on heapsort. I'm actually surprised it's that bad. I wonder if it's better with 1.4.
3. I don't know much about scientific computing, but I understand that Fortran is still popular because the optimizers are really good for scientific programs. I don't expect Java to meet their needs any time soon.
If you took a Java benchmark and ported it "directly" to C (e.g., virtual methods turn into function pointers) I think that would diminish the speed advantage of C. You'd still get speedup from having structs, optimizers that have 30 years of experience behind them instead of 10, etc. But if you rewrote the thing to an equivalent hand-optimized C program, I think C would still be a lot faster.
Given that the tech saw the files, he did the right thing, but he also violated, or at least gave the appearance of having violated, the company's policy and hurt its business. It seems to me that the company can legitimately fire the tech in this case. (Of course, that would make it harder to hire good techs!)
This hypothetical example isn't necessarily relevant to the situation in the article. But it could be. Maybe the faculty at other schools will not want their school to hire this company because of what happened.
The only revolution it would start would be the one to create a police state to execute all those child-ogling, cop-shooting psychos. And 20 million innocent people. If it saves just one child, after all. Count me out.
Of course, not all who look at child porn take the next steps, and the law allows punishing people only for actually harming someone, not taking a step that might lead them to in the future.
If you wrote a program with mostly static methods, primitive types, and arrays, minimizing object creation and virtual method calls, it would run almost as fast as the equivalent C++ program. A couple years back, I implemented Sieve of Eratosthenes in C++, Java, and VB, and the speeds were comparable. IIRC, on Windows they were within 10% of each other. On Solaris, bizarrely, C++ kicked Java's ass, but Java was still only about 50% slower.
But if you write an object-oriented program, it will be slower. OOP tends to be higher-level, and thus faster to write but slower to run. All those memory allocations and virtual method calls take time, and they're difficult to optimize. Also, the standard libraries are kind of slow, because they try to be really general (e.g., synchronized collections).
I've heard that too, but I don't think it's true. Basic economics says that if a product is more useful, customers will pay a higher price for it. That's why manufacturers try to make the hardware faster. Shouldn't faster software be just as good from their point of view?
However, that's only basic economics. I could believe that there were some strange economic conditions affecting DEC business strategy. Or maybe that DEC was just stupid. That would explain their nonexistence. But Intel spends a lot of money maintaining an optimizing compiler and helping software vendors optimize their code. So it doesn't seem to be true now.
Can you tell me about those cases? As a grad student in PL, I'd be interested in hearing about them.
What I've seen is that in many cases JIT systems can get within a few percent of static compilers, which is usually good enough. The advantage of JIT is that you have all the runtime information, e.g., target architecture, library code, etc, so you can do some optimizations that are impossible at compile-time. The downside is that the JIT has to compile really fast, so it can't do as many optimizations as a static compiler.
I'm sure you know that the skills you demonstrated in the contest are only a small part of the skills needed by a professional coder. I'm also pretty sure you'd make a great coder. On the other hand, based on what I've heard about you, I suspect you're too smart to be just a game programmer or something like that, and that better things are in store for you. Of course, coding is still a lot of fun, good preparation for management or research, and a decent profession. Whatever you decide to do, thanks for stopping by on Slashdot, and best wishes.