Using Oracle RDBMS, I see that for queries on very large data sets, a hash join causes lots of disk activity (lots of paging, going to swap).
Though hash functions are fast, that speed depends on probing a hash table that's fully mapped in memory.
Once your hash table gets too big for the available memory, you start using disk space (unindexed, sequential full reads).
Isn't this a bottleneck in a distributed database that relies on hash functions?
Wouldn't you want to have a distributed DB based on a distributed version of a B-Tree descendant (B+Tree, B*Tree, B**Tree)
that would use memory AND storage and scale out beyond just the available memory on all your nodes?
Not only that, but you'd likely have better performance on range scans.
Just thinking...
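To make the range-scan point concrete, here is a minimal Python sketch. It is only an illustration, not how Oracle or any particular distributed database works: a dict stands in for hash-organized data, a sorted list stands in for the leaf level of a B+Tree, and the names rows, range_scan_hash and range_scan_sorted are made up for this example.

```python
import bisect

# Hypothetical data set: 1,000 rows keyed by an integer id (0, 10, ..., 9990).
rows = {key: f"row-{key}" for key in range(0, 10_000, 10)}


def range_scan_hash(table, lo, hi):
    """Hash-organized keys carry no ordering, so every key must be tested."""
    examined = 0
    hits = []
    for key in table:
        examined += 1
        if lo <= key <= hi:
            hits.append(key)
    return sorted(hits), examined


# A sorted key list stands in for the ordered leaf level of a B+Tree.
sorted_keys = sorted(rows)


def range_scan_sorted(keys, lo, hi):
    """Binary-search to the first key >= lo, then read neighbours up to hi."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    hits = keys[start:end]
    # Count only the keys read inside the range; the binary search itself
    # adds roughly log2(len(keys)) extra comparisons.
    return hits, len(hits)


hash_hits, hash_examined = range_scan_hash(rows, 2_000, 2_100)
tree_hits, tree_examined = range_scan_sorted(sorted_keys, 2_000, 2_100)

print(hash_examined, tree_examined)
```

Both scans return the same keys, but the hash version has to look at every key in the table, while the ordered version only reads the keys that fall inside the range.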
Disclaimer: I'm making sweeping generalizations below. There are exceptions to the rule.
Stakeholders who make the business decision to go with open source software fall into two groups:
1. Brilliant techs who run the show and eschew proprietary software
2. Clueless PHBs who have a minuscule budget and go cheap across the board (try not to pay for anything)
The first type tends to go bankrupt after running out of venture capital (because brilliant techs
are usually bad at business and gauging the market). Google and Facebook are examples
that buck this trend.
Stakeholders who go with proprietary software are usually PHBs who go with the
"industry standard" because of a Google search or reading "industry rags".
Usually, every workplace has a healthy mix of everything.
Bottom line is that programmers need to get work done, and if
the sanctioned M$ system doesn't cut it, they will download and
install a system that will do the job. PHBs are clueless and tend
not to go against programmer recommendations, so they tend to approve
free tools.
In this harsh economy, getting a CS degree and IT certifications
without having an internship or job already lined up is foolish.
It is not practical to invest in learning a set of skills ahead of time.
Find an opening, get a job, and try to get the employer to pay for your education / certifications / training.
If they really need a skill set, they will pay for you to learn it.
Try not to think of your career as a step-by-step process.
For example:
Step one: Invest in training
Step two: Find an internship
Step three: Turn internship into full-time job
There is too much that can go wrong in step-by-step thinking.
If you don't find an internship, or your internship does not turn
into a full-time job, what then?
Instead, start with the goal in mind:
Get a job that pays well, is interesting and where you'll learn
things that you can use to get an even better, more interesting job
or that you can use to open a business.
If you take $8/hr, you are not valuing your time.
Think about your investment:
You've completed some of your course load for a Bachelor's in CS.
You've gotten some certifications.
You've invested time and money in training yourself.
Don't sell yourself short.
Try to find a better-paid internship or a job.
What company?
This is happening on a major scale in IT operations.
If you installed a third-party client application
with a DB backend and are hosting the database locally,
chances are the vendor is working on getting that out of
your server room DB server and into their cloud data center.
Your users will probably access the new system
through a web interface.
Anecdotally, ADP has done this with their eTIMEsheet
application.
That which can be off-hosted will be.
The reasoning is that it frees up personnel and cuts overhead.
How it pans out in real life (cloud outages, data loss, etc.) is
another story.
I would not use a cellular-to-phone-socket device
(e.g. Dock-N-Talk, Cell Socket, Hellodirect Cell Docking Station, MyXLink, etc.).
Air time is expensive.
If you have the time, build a Skype server for your home phone system.
http://www.linuxjournal.com/article/8592/
It's cheap (under $200) and you get to pay Skype rates instead
of your cellular telco's rates.