Slashdot Mirror


Ask Slashdot: Choosing a Data Warehouse Server System?

New submitter puzzled_decoy writes The company I work for has decided to get in on this "big data" thing. We are trying to find a good data warehouse system to host and run analytics on, you guessed it, a bunch of data. Right now we are looking into MSSQL and a company called Domo, and Oracle contacted us. Google BigQuery may be another option. At its core, we need to be able to query huge amounts of data in sometimes rather odd ways. We need a strong ETL layer, and hopefully we can put some nice visual reporting service on top of wherever the data is stored. So, what is your experience with "big data" servers and services? What would you recommend, and what are the pitfalls you've encountered?

5 of 147 comments

  1. First step by Anonymous Coward · · Score: 5, Insightful

    The first step is to ask Slashdot a really vague question about a highly technical and expensive undertaking.

  2. Re:Dear Slashdot, by Sesostris III · · Score: 5, Insightful

    Maybe. However I would also be interested in any answer (especially any answer involving FLOSS software). Interested not because it's my job or my company is looking to use such software, but because I'm curious and like to expand my knowledge.

    In general I don't mind such questions on Slashdot, as they're usually interesting and informative to the rest of us. And if they're not, then I (we) don't read the article!

    --
    You never know what is enough unless you know what is more than enough. - Blake
  3. we need more details on this "big data thing" by Anonymous Coward · · Score: 5, Informative

    Big data is an entire field of study. This is not "should I use vi or emacs or nano", and even that requires a shitload of context and will be the source of flame wars until the end of time.

    Think about your budget, your audience, and the value that you can add by spending time and money on this.

    MapReduce (Hadoop) is awesome and open source, you can run it in house or on multiple cloud offerings, and it has a tremendous community. BUT it sucks at relationships (foreign keys), graph calculations, and other things.
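
To make the MapReduce point concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps in plain Python. It is not the Hadoop API; the function names and the sales records are made up for illustration. Each record is handled on its own and only meets other records that share its key, which is roughly why foreign-key joins and graph traversals tend to be awkward in this model.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs: here, (region, amount) for each sale."""
    for region, amount in records:
        yield region, amount

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single result: here, a sum."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input: (region, sale amount)
sales = [("north", 10), ("south", 5), ("north", 7), ("east", 3)]
totals = reduce_phase(shuffle(map_phase(sales)))
print(totals)
```

On a real cluster the map and reduce steps would run on many machines at once and the shuffle would move data over the network; the shape of the computation is the same.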

    Graph databases can make connections between things that are impossible in other systems, but are only good for graph relationships.

    OLAP data stored in n-dimensional cubes allows reporting and analysis in familiar tools that many analysts (not programmers) think are the cat's pajamas.
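
As a rough illustration of what a cube does, the sketch below rolls a handful of fact rows up along chosen dimensions in plain Python. The dimensions, the rows, and the `roll_up` helper are all hypothetical; a real OLAP server precomputes and stores these aggregates rather than scanning rows on every request.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, revenue)
facts = [
    (2013, "north", "widget", 100),
    (2013, "south", "widget", 40),
    (2014, "north", "gadget", 70),
    (2014, "north", "widget", 30),
]
DIMENSIONS = ("year", "region", "product")

def roll_up(rows, keep):
    """Sum revenue over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cube = defaultdict(int)
    for row in rows:
        cube[tuple(row[i] for i in idx)] += row[-1]
    return dict(cube)

by_year = roll_up(facts, ["year"])
by_region_product = roll_up(facts, ["region", "product"])
print(by_year)
print(by_region_product)
```

An analyst's pivot table in a spreadsheet front end is essentially asking for slices like `by_year` or `by_region_product` without writing any code.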

    Your best bet is to slow down and talk to your users, while reading Seven Databases in Seven Weeks:
    https://pragprog.com/book/rwdata/seven-databases-in-seven-weeks
    And then realize that you probably need to hire a consultant so you have somebody to fire when the whole thing goes south.

  4. Re:Skip Oracle. by RuffMasterD · · Score: 5, Informative

    Just from a technical and financial point of view, I wouldn't recommend Oracle either. Oracle Advanced Analytics just seems to be a very expensive way to get R.

    Financially - R is open source and free (as in both free as a bird, and free beer), so you don't need to buy it from Oracle. No doubt Oracle will make you buy their DBMS as well to work with Advanced Analytics, and a big server to run it on, plus support to get it up and running.

    Technically - Oracle make a good DBMS for sure, but you don't need all the advanced features their DBMS is good at, such as record-level locking, three-phase commit, redo logs, conflict resolution etc. You need that sort of stuff to maintain data integrity on transaction processing systems, but not for analysis. For analysis you just need a giant de-normalised table, and maybe indexes if you want to pick out specific subsets of records without full table scans.
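
The "giant de-normalised table plus an index" idea can be tried out with nothing more than Python's built-in `sqlite3` module. This is only a sketch under assumptions: the `sales_flat` table, its columns, and the generated rows are invented, and SQLite stands in for whatever engine you actually pick. It compares the query plan before and after adding an index on the filter column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One wide table: customer and product attributes are repeated on every
# row instead of living in separate, joined tables (hypothetical schema).
conn.execute(
    "CREATE TABLE sales_flat ("
    "order_id INTEGER, customer_name TEXT, region TEXT, "
    "product TEXT, amount REAL)"
)
regions = ["north", "south", "east", "west"]
rows = [
    (i, f"cust{i % 50}", regions[i % 4], "widget", float(i))
    for i in range(1000)
]
conn.executemany("INSERT INTO sales_flat VALUES (?, ?, ?, ?, ?)", rows)

query = "SELECT SUM(amount) FROM sales_flat WHERE region = 'north'"

def plan(sql):
    """Return SQLite's query plan text (the detail column) for `sql`."""
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_sales_region ON sales_flat (region)")
plan_after = plan(query)   # the planner can now search via the index

total = conn.execute(query).fetchone()[0]
print(plan_before)
print(plan_after)
print(total)
```

The exact plan wording varies between SQLite versions, but the first plan should report a scan of the table and the second should mention `idx_sales_region`.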

    Personally I use SAS. It's not sexy, but I have never found a dataset too large to handle. It will thrash the hard drive all night if it has to in order to get a result, but it won't crash. SPSS will definitely crap itself with even moderate datasets. Stata does OK, but even that can't handle the larger datasets. I haven't pushed R hard enough to find its limit.

    --
    Human Rights, Article 12: Freedom from Interference with Privacy, Family, Home and Correspondence
  5. Re: Skip Oracle. by Livius · · Score: 5, Informative

    There was a crime, and Oracle was a willing accomplice.