Ask Slashdot: Choosing a Data Warehouse Server System?
New submitter puzzled_decoy writes: The company I work for has decided to get in on this "big data" thing. We are trying to find a good data warehouse system to host and run analytics on, you guessed it, a bunch of data. Right now we are looking into MSSQL and a company called Domo, and Oracle has contacted us. Google BigQuery may be another option. At its core, we need to be able to query huge amounts of data in sometimes rather odd ways. We need a strong ETL layer, and hopefully we can put some nice visual reporting service on top of wherever the data is stored. So, what is your experience with "big data" servers and services? What would you recommend, and what are the pitfalls you've encountered?
Microsoft doesn't win the real "Big Data" contracts, but there are many medium data contracts with delusions of grandeur. I work with a TB-size (as in, >1 TB...) database, and while it's certainly no longer small data, it's not "Big Data". It fits in a traditional RDBMS. When we get past the buzzwords, what our users want are the fairly traditional cubes/reports with drilldown that OLAP systems provide. If Microsoft is bad, the alternatives like Oracle, SAS, SAP or IBM are worse. Looking at an open source stack, replacing the database is actually the easy bit; I'm sure we'd do fine running on PostgreSQL or MariaDB. Reporting tools on par with Reporting Services are also easy to come by. On the data flow side, which we use a lot, I've seen nothing as user-friendly as Integration Services, but I guess we could use it with foreign sources and destinations too.
Probably the biggest gap on the data warehouse side is an open source OLAP server. The Wikipedia page lists two. One is Palo/Jedox, which is a very limited marketing version of their commercial product. The other is Mondrian, which on closer inspection seems to just translate MDX to SQL and let the RDBMS do the aggregation. I suppose that is okay for small data sets, but it will choke on any significant volume. Basically it comes down to all the Microsoft tools being "good enough" and working nicely together, while the rest ends up being a mix of different pieces from here and there. Either that or you're looking at a whole different stack, and I've got lots of requirements that'd make a NoSQL solution squirm.
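To make the "translate the query to SQL and let the RDBMS aggregate" idea concrete, here is a rough sketch in Python with SQLite. This is not Mondrian's code and there is no MDX in it; the sales table, its columns and the rollup_sql helper are all made up for illustration. It only shows the general shape: each cube-style request (totals by year, then a drilldown to year and region) becomes a GROUP BY over the fact table.

```python
import sqlite3


def rollup_sql(table, dimensions, measure):
    """Build a GROUP BY query for the requested dimensions.

    Hypothetical helper: it stands in for whatever a ROLAP layer
    generates from a cube query. Names are trusted, not sanitized.
    """
    dims = ", ".join(dimensions)
    return (
        f"SELECT {dims}, SUM({measure}) AS total "
        f"FROM {table} GROUP BY {dims} ORDER BY {dims}"
    )


# Tiny in-memory fact table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2013, "EU", "widget", 100.0),
        (2013, "EU", "gadget", 50.0),
        (2013, "US", "widget", 70.0),
        (2014, "EU", "widget", 120.0),
        (2014, "US", "gadget", 30.0),
    ],
)

# Top-level report: totals per year.
by_year = conn.execute(rollup_sql("sales", ["year"], "amount")).fetchall()

# Drilldown: the same measure, one dimension deeper.
by_year_region = conn.execute(
    rollup_sql("sales", ["year", "region"], "amount")
).fetchall()

print(by_year)
print(by_year_region)
```

Note that every report and every drilldown step rescans the fact table unless the database has suitable indexes or summary tables, which is the scaling worry with this approach. A server that stores precomputed aggregates does that work up front instead.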