Ask Slashdot: Changing Career From OLTP To OLAP Dev
First time accepted submitter xby2_arch writes "After spending over 12 years writing OLTP applications (Java EE/JDBC/ORMs), I decided to dabble in the OLAP world. I had decent DB skills, considering most of my previous projects had involved data modeling and coding using Stored Procs, etc. Yet I hadn't designed or implemented any dimensional databases. Luckily for me, I had enough relevant domain knowledge to land a developer job in a data warehousing project. The work was enjoyable enough that it motivated me to spend that extra time and effort I needed to cope with the different dynamics of coding in the OLAP realm. In my past life, data volumes weren't the primary concern (instead, transaction volumes were), here, everything was about data. ETL/Integrations present another set of problems you generally skirt in a typical web/app-tier developer role. All in all, it turned out to be a non-trivial, yet worthwhile transition. I am certain that there are plenty of seasoned developers out there who plan to make a similar move (or have made already), who see data as the next chapter in their careers evolving toward becoming Enterprise Architects. I want to hear what's holding them back, or what helped them move forward. What should be considered a prerequisite to make this switch, and what are the risks, etc.?"
You know, we can always check Wikipedia for definitions:
https://en.wikipedia.org/wiki/OLTP
https://en.wikipedia.org/wiki/OLAP
It might have been nice if the editors had included these links in the summary, but it is not as though you do not know how to use Wikipedia.
Palm trees and 8
OLTP: Online Transaction Processing: you buy a ticket, you want it immediatly. The seller types it in the computer and prints your ticket, a database checked if there were free seats and immediatly reserves one. "Immediatly" is the important word, the customer is not waiting. OLAP: Online Analytical Processing: How many seats from the US to EMEA have been sold by that kind of sellers with such produyt code. Managements wants the results by the end of the month, it is OK if the query runs a couple of hours/days. Many real life systems contain two databases, one tuned for speed (containing only the current tickets) and one for reference later (containing all tickets sold in the past n months/years). The difference between them is database tuning and SQL tuning. All the rest (such as path to architect: yeah, the more different systems you know, the more chance you will have to design new ones) is hype.
/. refugees on Usenet: news:comp.misc
The difference between them is database tuning and SQL tuning. All the rest (such as path to architect: yeah, the more different systems you know, the more chance you will have to design new ones) is hype.
That's like saying web coding is the same as application coding and the main difference is just languages used. There is a huge difference in the goals and methodology of OLTP and OLAP. DB and SQL tuning is a small part of the larger scheme of things. Take for instance your example of tickets; you can get an OLTP to show how many seats from US to EMEA by seller. It may not be very easy especially as you add conditions (by quarter, by day of week of travelled, by day of week sold, etc). In OLAP these questions become important and you have to design your DB structure to accommodate them and other queries. It also matters how this information is reported by tool as not all tools can take advantage of DB features. After all that then DB and SQL tuning comes into play.
Well, there's spam egg sausage and spam, that's not got much spam in it.
The prerequisites to making the switch is first and most importantly having an appropriate business case for OLAP. The second prerequisite is that you've tried doing analytics in a traditional RDMS, perhaps jumped on to the NoSQL bandwagon, and you've failed at it (i.e. success for a little while but then your data eventually brings your queries down to its knees). Don't worry, failure isn't necessarily wrong, it's just you and your team needed the experience before you could make the next leap.
The risks are a knowledge jump in to an OLAP mindset from a traditional SQL mindset. Invest in you and your fellow developer's knowledge. Push back on management and sales when they want more immediate results and let them know that it will take 3-5 months to replace your current system. Do your proper technology evaluations. Learn FoodMart and Adventureworks and let them guide you down the path of good fact and dimension design. Don't snub your nose at Microsoft as they absorbed the company in the 80's that basically pioneered this stuff and made billions, but also don't take their stuff too literally as there are several products out there and some that do things better.
Read The Data Warehouse Toolkit thoroughly and practice using Mondrian which is an open source Java OLAP engine that can sit on top of PostgreSQL, MySQL, and others. Find a good ETL tool rather than trying to write your own at first and don't be afraid to force your internal users to use this tool to create their facts. Don't worry if you don't get it the first time, but keep trying and keep discussing with your fellow developers as it takes a team to work out all the kinks. Later on you'll probably end up seeing how you did things wrong, but hopefully you can get most things right in the beginning.
If it's on the main page the target audience is pretty general, so you really shouldn't have to check Wikipedia. You should ALWAYS tell people what acronyms mean before using them exclusively. And if you have time enough to be an arrogant *&$# while posting links to Wikipedia you could also take the time to spell out what the acronyms mean. At the very least give the long form, e.g. Online Transaction Processing (OLTP), Online Analytical Processing (OLAP). See it's not that fucking hard.
And since I am taking the time to rant (and because any technical article in Wikipedia gets hijacked by propellor heads who like to inject as much as of their industry specific double speak so that they sound important and a layman can't get at least an understandable overview in the summary):
OLTP - Online Transaction Processing. An application/database designed for larger, equal, or nearly equal volumes of database inserts, updates, and deletes, as there are reads. The database is generally more (and often highly) normalized, meaning that there is less data duplication across tables and/or within tables.
OLAP - Online Analytical Processing. Essentially a data warehouse designed for larger volumes of reads than there will be inserts, updates, or deletes (often relatively very, very few deletes). Less normalized meaning that there may be duplicated data across and/or within tables in order to increase query speed at the cost of possible consistency issues due to the data duplication.
OLTP - Online Transaction Processing. An application/database designed for larger, equal, or nearly equal volumes of database inserts, updates, and deletes, as there are reads. The database is generally more (and often highly) normalized, meaning that there is less data duplication across tables and/or within tables.
OLAP - Online Analytical Processing. Essentially a data warehouse designed for larger volumes of reads than there will be inserts, updates, or deletes (often relatively very, very few deletes). Less normalized meaning that there may be duplicated data across and/or within tables in order to increase query speed at the cost of possible consistency issues due to the data duplication.
Mark this informative comment the hell up!
And smash my Karma if you want to, but the (four letter) acronyms really didn't help the article much.
A=Analytical: the problems of creating useful reports from vast amounts of mainly read-only data.
T=Transactional: the problems of highly concurrent read-write transactions.
Wikipedia simply states that the terms are derived from each other, the techniques they embody are very different.
If you only have a small amount of data and a small number of transactions then you don't care about the difference, but once you have a large number of one or other (or worse both) then it is a very significant distinction.