Oooooooooooooooooooooor, you could replace all of the stuff you said by not using a shitty RDBMS, making sure your hardware is acceptable, running the index and statistics optimiser tool of whatever RDBMS you picked, and watching as even a query with 50 joins on a terabyte of data runs within 0.1 second! WEEEE!
All of the above could have been done in the time it took you to TYPE YOUR POST. That's -exactly- what the article was talking about. You need to know enough to avoid totally dumb mistakes, but you seriously don't need to start pulling out all of that to work with a database.
The worst of it all? Modern RDBMS query optimisers and statistics analysers will make sure that the SQL you're optimising is NOT running the way you think it will. So looking at the query and analysing how it SHOULD run is pointless: it will be rewritten internally (which is why RDBMSes without good table statistics support are slow as hell).
Obviously this works for you, but I can't believe this is really cost effective... I mean, just all the talk about the transactions... the RDBMS will optimise all of it internally, so whether you batch things in transactions or not usually won't even affect how it is done internally, since a modern database will use in-memory snapshots of the data, so even if you submit a transaction, it doesn't even mean the data is flushed to disk! So the I/O cost completely depends on what the engine thinks it should do... it cannot really be predicted... thus, until you see how things behave in action, you may be in for a lot of surprises...
Ever come across the n+1 selects problem in Hibernate?
Common issue indeed, and actually a problem with Hibernate... in this day and age, there are algorithms that can be implemented in object-relational mappers to avoid at least the common scenario where this happens in Hibernate or LINQ to SQL/Entity Framework... I'm not sure why it never gets fixed.
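For anyone who hasn't run into it, here's a rough sketch of what the n+1 selects pattern looks like next to a single joined query. This is plain Python with the standard sqlite3 module, not Hibernate itself, and the author/book tables and function names are made up for the illustration.

```python
import sqlite3

# Hypothetical schema, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1'), (4, 3, 'C1');
""")


def titles_n_plus_one(conn):
    """One query for the parents, then one extra query per parent."""
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """The same data fetched in one round trip with a join."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, b.title FROM author a "
        "LEFT JOIN book b ON b.author_id = a.id ORDER BY a.id, b.id"
    ).fetchall()
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without books still get an empty list
            result[name].append(title)
    return result, 1


slow, slow_queries = titles_n_plus_one(conn)
fast, fast_queries = titles_single_join(conn)
print(slow == fast, slow_queries, fast_queries)
```

With 3 authors the loop version issues 4 queries and the join issues 1; with 50,000 parents the gap is what people complain about.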
That being said, if you read the article (I know, I know, Slashdot), they're talking about premature optimisation. Basically, things like avoiding Hibernate completely because of its overhead, or optimising every single query as much as possible (even if performance is acceptable) to save every last bit of juice, so that your app can run on 10 megs of RAM instead of 100. They're not, in any way, talking about using shitty programmers; they advocate using GOOD developers' time more efficiently (solving real problems, instead of spending too much time on performance).
Almost everyone who replied saying it was stupid brought up either how programming mistakes can screw things up, or how it's possible to make a system that doesn't scale at all. All those people missed the point.
You can make a system that is slower, but still scales and is still correctly done, a LOT (and I mean a LOT) faster if you don't go nitpicky and try to optimise everything as you go. You simply avoid doing something totally dumb, code according to best practices, etc., but you're not going to rewrite your system in C to avoid garbage collection, you're not going to rewrite the framework's data structures to squeeze out 1% more performance, and you won't avoid Hibernate (completely) to avoid the mapping overhead. You can tap into these time-saving paradigms just by upgrading your hardware. You still need competent developers!!! But those competent developers can do more in less time.
Another reason why I say "Magic object persistence layer" instead of ORM is that they would call a generated GetXXXByName method 50,000 times instead of writing a query that used the "in" clause and adding an index
Ouch, yeah that's bad =P -Especially- since NHibernate does support the correct ways of doing things like that... it's not like GetXXXByName is the only thing it gives you... good thing you got rid of these people :) Even if you know NOTHING about databases, and just know how to use the "Magic object persistence layer", you wouldn't do stuff like that... even from an OOP perspective...
These guys probably needed to be reminded to breathe.
I understand and I agree, but what you're describing has nothing to do with relational theory, which is what confused me. It's purely I/O and RDBMS considerations. Even in the realm of the "magic object persistence layer" (what you're referring to is called an ORM though, Object Relational Mapper), there are pure OOP ways of doing things that don't even require loading all the objects in RAM... That said, on a large enough table, you ARE better off running 1 update statement for the 1000 objects in batch rather than running 1000 updates :). 2 queries + 1 mapping job, vs 1000 queries going back and forth... the I/O overhead will be less (but I understand what your original point was).
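To make the "in" clause and the batch update concrete, here's a small sketch using Python's standard sqlite3 module. The item table, the index name and the data are all invented for the example; a real ORM would generate something along these lines for you rather than you writing it by hand.

```python
import sqlite3

# Hypothetical table and data, made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO item (name, price) VALUES (?, ?)",
    [(f"item{i}", 10.0) for i in range(1000)],
)
# The index that makes lookups by name cheap.
conn.execute("CREATE INDEX idx_item_name ON item (name)")

wanted = [f"item{i}" for i in range(0, 1000, 100)]  # 10 names to look up

# The GetXXXByName style: one round trip per name.
one_by_one = [
    conn.execute("SELECT id FROM item WHERE name = ?", (name,)).fetchone()[0]
    for name in wanted
]

# One query with an IN clause; placeholders keep the values parameterised.
placeholders = ", ".join("?" for _ in wanted)
in_clause = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM item WHERE name IN ({placeholders}) ORDER BY id", wanted
    )
]

# One set-based UPDATE instead of one UPDATE per object.
updated = conn.execute(
    f"UPDATE item SET price = price * 2 WHERE name IN ({placeholders})", wanted
).rowcount

doubled = conn.execute("SELECT COUNT(*) FROM item WHERE price = 20.0").fetchone()[0]
print(len(in_clause), updated, doubled)
```

Both lookups return the same ids; the difference is 10 round trips versus 1, and the update touches all 10 rows in a single statement.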
In any case, LLBLGen Pro is better than NHibernate anyway, but that's a whole other story, I suppose.
Actually (and I realise this varies widely from company to company), all the retail companies I worked for (including some of the size you're talking about) had server-based PoS systems. It was required for real-time inventory and sales tracking (I realise this isn't the main focus of a PoS, but it was all interconnected)... We just had backup procedures for when the connectivity was down (which simply involved a local mode with synchronisation... not much to it).
All the article is saying is that premature optimisation is a bad idea. Often, making your code maintainable and optimising it to the maximum are mutually exclusive goals. If you can throw hardware at the performance issue, you can make sure your code is as maintainable as possible. If it's highly maintainable, then when you end up with a bottleneck, THEN you optimise it... but only if it is cost effective.
That's all there is to it. They're not saying you should hire code monkeys. They're saying to still hire good programmers, but to shift their focus away from performance and toward everything ELSE that matters, to throw hardware at it, and if that doesn't work, THEN you go back and optimise.
A running joke here is that the first thing you hear when you hire someone out of school is them asking which type of sort algo they should use to order a 10 element dropdown list. That's what you want to avoid :)
I agree with most of your post, but what does understanding relational theory have to do with anything? There's no RDBMS out there that is theoretically sound, and even the most complex stored procedure won't need any relational algebra. That stuff is only useful if you're -writing the RDBMS itself-, not if you're using it. Oh, you need to know the basics of normalisation (IF you're designing the database...) and know how keys work... indexes aren't part of the theory and are a purely pragmatic concept... By port of Hibernate, you probably mean NHibernate (thus the C# reference above), and that's used by thousands of companies, including extremely large projects with high levels of success (and is actually growing in adoption).
The thing is, that theory straight out of the school book might be cute and pretty, but in the real world, unless you're working for Oracle or Nvidia, needing it is -extremely- uncommon. Why? Because anything where you'd need to CARE about it is wrapped in a 3rd party library, or was designed by the ONE computer scientist or functional analyst on the team.
The programmer needs to know just enough to avoid the obvious cases. Anything else will show up in the load tests (which, if you're not working for eBay, will put your server under significantly higher load than anything it will see in the real world). Then the profiler will find the ONE instance where the "lesser" dev screwed up, you add more CPU and RAM to be safe on the rest, and you're good to go.
The thing is, modern software development is usually (I'm talking quantity here) integration. Getting systems to talk to each other, getting the business rules right so the data goes correctly into the database, implementing the interface so you can add some functionality to the 3rd party package... None of this can even BE O(N^2), it's all I/O considerations, which aren't solved with better algorithms; they're solved with better caching strategies and better hardware, even if you throw PhDs at it.
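As a rough idea of what a "caching strategy" for I/O-bound code can look like, here's a minimal sketch with Python's functools.lru_cache. The fetch_from_backend function is a hypothetical stand-in for a slow database or service call, and the counter is only there to show how many times the backend actually gets hit.

```python
from functools import lru_cache

# Counts how often the (pretend) slow backend is actually called.
calls = {"count": 0}


def fetch_from_backend(key):
    """Hypothetical stand-in for a slow I/O call to a database or remote service."""
    calls["count"] += 1
    return f"value-for-{key}"


@lru_cache(maxsize=128)
def cached_fetch(key):
    # Repeated keys are served from memory instead of going back to the backend.
    return fetch_from_backend(key)


results = [cached_fetch(k) for k in ["a", "b", "a", "a", "b"]]
print(calls["count"], cached_fetch.cache_info().hits)
```

Five lookups, two backend calls. A real setup also has to worry about invalidation and stale data, which this sketch ignores entirely.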
There are parts of many higher-end systems that will require the fancy algorithm implementations... for that, you get a fraction of your team to have that as their primary responsibility.
If you're making the next eBay or the next Nvidia drivers, this is obviously not true at all... but those cases are a tiny little fraction of the real world.
Note that this doesn't mean you can hire a bunch of idiots. It just means that your SKILLED developers shouldn't spend -too much- time on optimisation. Optimise the OBVIOUS cases, avoid doing something completely stupid... But don't do things like "Oh, I should inline this function call manually because the compiler won't in this case, and making an instance of an object will add a 0.00000001% overhead". That, you throw hardware at. It's also 99.9% of the cases.
And the article isn't advocating that you should be using code monkeys and compensating with hardware. It's just saying that you're better off throwing hardware at a problem than hiring an army of PhDs. Those are two extremes, with the middle being the best... you actually agree, you just disagreed on the extremes :)
That's why a system for this type of environment would usually be server side, with only the UI on the client... a web application or something in Flex/XBAP/Silverlight/whatever... or even a thick client, but something that does all the heavy processing server side.
So even if you have 12974170240192740971249071290 locations, you'll still only need to upgrade your servers, unless you decided your UI was going to be in OpenGL.
Depends how you read into it. If I take a web application written in Java, .NET or PHP... let's say... Facebook? And I throw a LOT more developer resources at it, rewrite the bottlenecks in raw C and the algorithms in pure assembly optimised for the hardware, get some PhDs in research and development to develop new, more advanced operations research algos that cut out the edge cases to improve performance... I'll get a LOOOOOOOOOOT more performance.
Is it worth it though? Probably not.
The whole "throwing hardware instead of devs at a problem" idea doesn't necessarily mean you're comparing $100k devs vs $30k code monkeys. It may mean you take fewer $100k devs and give them better tools that have some overhead.
That's correct. You still need someone who can tell bubble sort from quicksort.
What people who say we should use hardware over programmers usually mean, though, is that you don't need to go all out. Take a framework like Java or .NET. They have data structures built in. Sets, hashtables, linked lists... These are generic implementations meant for everyday use. You could throw a computer science guru (well, more like someone who remembers their data structures class...) at them and make them better for a specific business model. Or you can upgrade your CPU.
Both ways will work fine. It's just that with one of them, you'd better hope the programmer types fast for it to be cost effective.
Hardware will not allow you to trade a competent developer for a code monkey. It does, however, allow you to get a competent developer to do the job of two, by not having to optimise every single line of code. As long as the code scales somewhat, there's some caching for the edge cases, and the dev remembers to use libraries when available instead of a sketchy hand-made implementation that's poorly optimised, RAM and new CPUs will pretty much solve 90% of your problems. Just make sure the dev is competent enough to spot the last 10%, which is a LOT easier than having to SOLVE each and every problem.
Case in point: I have a fairly strong CS and algo background. From over a decade ago, that is... by now, I've forgotten most of it. I remember enough to not do something completely stupid, and I know when to use the cache and all the best practices, but that's really it. Yet I have systems in production getting hammered by hundreds of thousands of requests, growing every day, and the servers haven't so much as hiccuped.
When I have some scalability issue, I can just take the good old profiler, and I'll spot it within minutes. A couple of years ago, this wouldn't have been possible.
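To show the "just run the profiler" workflow in miniature, here's a sketch with Python's built-in cProfile and pstats. The functions are invented for the example: slow_lookup does list membership checks (the kind of thing the "lesser" dev might write), fast_lookup does the same work with a set, and the report makes it obvious which one is eating the time.

```python
import cProfile
import io
import pstats


def slow_lookup(items, targets):
    # Membership test against a list scans the list on every check.
    return sum(1 for t in targets if t in items)


def fast_lookup(items, targets):
    # Same result, but set membership is a hash lookup.
    item_set = set(items)
    return sum(1 for t in targets if t in item_set)


def handle_request():
    """Hypothetical request handler that happens to call both versions."""
    items = list(range(5000))
    targets = list(range(0, 10000, 5))
    return slow_lookup(items, targets), fast_lookup(items, targets)


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Render the ten most expensive entries by cumulative time into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

In the printed report, slow_lookup should sit well above fast_lookup in cumulative time, which is the whole point: you find the one hot spot and fix that, not everything else.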
If that's true, then we know right there why the benchmarks were favoring Linux... -server is superior in basically every way, shape and form for synthetic benchmarks, since -client is for the feeling of responsiveness, not raw performance, and the article said nothing about forcing the VMs to run in the same mode in both cases... not very fair to compare benchmarks that don't use the same mode...
1 -> I don't know when they ran the benchmark, but when the article was published, even on Linux they were using one update behind. They say in the article they were using the latest... and I don't know what Sun's release schedule is or whether they release for Linux first, but I -heavily- doubt the Windows version was 3 releases behind... and revision 10 was a very, very important revision in terms of performance, so even if it wasn't released on Windows, it was a very dubious time to benchmark things... right after one of the fastest JVM revisions came out for one of the OSes but not the other (though I doubt that was the case)
especially since the Sun Java should be built off the same codebase...
They're not. They're using different builds of Java for the 2. The benchmark with Ubuntu was using a newer version of the JVM that has some of the more significant performance-related modifications in the history of the platform.
Don't be surprised. The benchmark is comparing a JVM version that recently got some of the biggest performance improvements in Java's history vs the version that preceded it... You'd get similar numbers if you did the same and compared Windows vs Windows or Linux vs Linux.
Because retailers will not sell an AO rated game. The original version of Manhunt 2 got an AO rating, so stores like Walmart wouldn't carry it. Kinda hard to justify the expenses on a game you can't sell in most places.
They -are- different JVM builds, so it's possible (as is common in the JVM's history) that some bug fixes improve performance wildly... Not across the board though, so something's wrong, either with the JVM or with Windows itself... but something is seriously messed up.
The problem is that if you roll out a patch to home users, then hackers have the blueprints for "How to exploit the corporate ones".
It's still totally dumb IMO, but MS is between a rock and a hard place on that one... look what happens when they don't give people what they want (Vista). This is what the people who pay "want", ugh....
Err? When there are updates, I can cherry-pick which ones I want from a list with checkboxes, click "Install", and do so ONLY for the ones I want. Some non-security updates are irrelevant to me, so I left them to rot for months... I can even hide them so it never asks me about them again.
Pretty much all of Microsoft's products (with one or two exceptions) update via Windows Update, from Internet Explorer to SQL Server to MS Office. Even SQL Server's Books Online and some built-in games update via Windows Update.
I don't know about the newer models without the backward compatibility and stuff, but previously, they were definitely sold at a net loss. The -last- thing Sony wants is to sell a million PS3s with a 0 attach rate. Of course, those numbers would still count to impress developers, and may be a catalyst, but....
Google has been aggressively advertising Chrome lately, so I doubt they'll be pulling the plug on it.
That's -all- the author was advocating.
Really depends on requirements, I suppose.
I do agree with everything else though.
If you wanted to do that, then you'd want to test on WinServer 2008, not Vista, in which case it definitely WOULD scale.
Are you talking about something else, or..?
They're talking about Acid 3. Acid 2 is old school now!