Ask Slashdot: Which NoSQL Database For New Project?

Do you need a database? by tubs · 2014-04-08 21:17 · Score: 2, Insightful

Do you need a database to do what you're trying to do? Why not just write the information to a text file (CSV or tab-separated?), and use other programs to query the data?

--

try to make ends meet, you're a slave to money, then you die

Re:Do you need a database? by Anonymous Coward · 2014-04-08 21:20 · Score: 5, Funny

Excel Spreadsheet, maybe?
Re:Do you need a database? by Anonymous Coward · 2014-04-08 21:27 · Score: 1, Insightful

Definitely use a CSV or tab-separated file. A NoSQL database is wayyyy overkill. Even a SQL database is overkill for what you're trying to do.
Re:Do you need a database? by mwvdlee · 2014-04-08 21:49 · Score: 5, Insightful

Basically the question is: what's the expected volume of records, and how many fields per record?
A solution for 100 records a week with 4 fields each would be different from 1000 records per second with 30 fields each.
1000 records/sec with 4 fields would be yet another solution.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Do you need a database? by Richard_at_work · 2014-04-08 22:00 · Score: 3, Interesting

There's probably an element of multithreaded access that needs to be taken into consideration here: writing to a single text file may get you into trouble if the receiving webserver is multithreaded, meaning the threads will either have to queue for write locks, or write to different files.
Database engines don't have this issue, so while it may be overkill, there may be reasons to have one irregardless.
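To make the "queue for write locks" point concrete, here is a minimal sketch, assuming a single multithreaded process appending to one CSV file. The file location, the field layout, and the `append_record` helper are hypothetical, not anything the submitter described.

```python
import csv
import os
import tempfile
import threading

# Hypothetical file location and row layout, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "records.csv")
write_lock = threading.Lock()

def append_record(fields):
    """Append one row; the lock makes concurrent threads take turns."""
    with write_lock:
        with open(log_path, "a", newline="") as handle:
            csv.writer(handle).writerow(fields)

def worker(worker_id, count):
    """Simulate one request-handling thread writing `count` rows."""
    for i in range(count):
        append_record([worker_id, i, "payload"])

threads = [threading.Thread(target=worker, args=(n, 50)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Read everything back to check that no row was lost or interleaved.
with open(log_path, newline="") as handle:
    rows = list(csv.reader(handle))
```

Note that `threading.Lock` only coordinates threads inside one process. Several server processes, or several machines, would need OS-level file locking or a database, which is the point being made above.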
Re:Do you need a database? by FyRE666 · 2014-04-08 22:07 · Score: 5, Insightful

Please don't do this (use a flat file) to store data for a web app that's likely to be accessed by more than one device at a time. Unless you implement your own file locking mechanism, you'll eventually end up with corrupt entries. Even if you do implement your own locking scheme, it's probably not going to be as efficient as using a DB. It's a 5 minute job to set up a new MySQL DB and associated query to push data in, then you can filter and report on it much more easily. It's something DBs are very good at!
Unless you have a specific need to scale horizontally, it's generally better to stick with a SQL DB for web apps. I've used MySQL, PostgreSQL and Oracle for this. MySQL is by far the easiest to work with, hence its popularity. I don't actually know of any advantage to using PostgreSQL; it doesn't perform any better, and is (or at least used to be) much less user friendly.

--
Code, Hardware, stuff like that.
Re:Do you need a database? by DarkOx · 2014-04-08 22:31 · Score: 4, Informative

I disagree, he is concerned about scaling. The last thing in the world he should do is use a bunch of flat files, unless he really just needs to store the data, but he already said he needs to do reports and totals on it.
Also he is working in Ruby. The smart thing for him to do IMHO is write his program against ruby/DBI. It isn't the prettiest database API, but it supports plenty of different backend options, and it does not sound like his program needs especially complex database operations or queries. He can start working with something like SQLite as the database "server", and move up to something else, perhaps PostgreSQL (which can be every bit as fast as the NoSQL solutions unless you are getting highly, highly custom) without needing to alter his program.

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Re:Do you need a database? by Raumkraut · 2014-04-08 22:31 · Score: 1

For storing and querying arbitrarily-structured data, which is what the submitter seems to be wanting, a traditional relational SQL database is not necessarily the best way to do it.
And if anything, MongoDB is easier to start using than any relational database, IME. No need to create databases, schemas, or tables (collections) beforehand - you just install MongoDB, start writing data, and it gets stored.
Re:Do you need a database? by Richard_at_work · 2014-04-08 22:43 · Score: 5, Insightful

I think many people get stuck in thinking "one single database, that's it, my initial decision condemns me forever", when in fact there's no shame in having many databases.
Stick the raw data into one database, and choose the database that suits that.
Transform the data from the raw database into something you can use day to day, that's well structured etc., and choose the database for that.
Transform the data from the day-to-day schemas into something that's more suitable for archiving and long-term reporting, and again choose the database for that.
You don't have to have one single database type. Every one has its strengths, so use them!
Re:Do you need a database? by Lennie · 2014-04-08 22:45 · Score: 1

There are a whole lot of areas where PostgreSQL was less user friendly, but the developers take their time and keep improving it in a consistent way. It has many, many features.
Personally I really like PostgreSQL. It scales really well.
There might still be things missing that some people want.
But I think you'll find they will be added in the next 3 releases. 9.4 is now in development:
- upsert/merge in 9.4
- basis of logical replication in 9.4 (has been available in out-of-tree tools for many years); upcoming versions will build on that.
I'm not sure what people still need once those are done, other than multi-master. And this is where logical replication can really help. We don't know if the developers will implement it, of course. These things take effort and time.

--
New things are always on the horizon
Re:Do you need a database? by nctritech · 2014-04-08 23:28 · Score: 2

Create a table, get a POST, Insert contents of POST into table...I don't really see how this isn't the best way to do it.
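A rough sketch of that three-step flow, assuming Python's built-in `sqlite3` as a stand-in for whatever SQL server is chosen. The `readings` table, its columns, and the `handle_post` helper are made up for illustration; the submitter never described the actual fields.

```python
import sqlite3
from urllib.parse import parse_qs

# In-memory database so the sketch runs anywhere; a real app would
# point at a file or a server instead.
conn = sqlite3.connect(":memory:")

# Step 1: create a table (hypothetical columns).
conn.execute(
    """CREATE TABLE readings (
           id INTEGER PRIMARY KEY,
           received_at TEXT DEFAULT CURRENT_TIMESTAMP,
           device TEXT NOT NULL,
           value REAL NOT NULL)"""
)

def handle_post(body):
    """Steps 2 and 3: parse a form-encoded POST body and insert it."""
    form = parse_qs(body)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO readings (device, value) VALUES (?, ?)",
            (form["device"][0], float(form["value"][0])),
        )

handle_post("device=phone-1&value=3.5")
handle_post("device=phone-2&value=4.0")
handle_post("device=phone-1&value=1.5")

# The reporting side is then a plain aggregate query.
totals = conn.execute(
    "SELECT device, SUM(value) FROM readings GROUP BY device ORDER BY device"
).fetchall()
```

The parameterized `?` placeholders keep the POST contents out of the SQL text, which matters as soon as the data comes from devices you do not control.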
Re:Do you need a database? by jythie · 2014-04-09 00:29 · Score: 2

*gasp* a sensible solution using readily available mature tools? *faints*
Re:Do you need a database? by jythie · 2014-04-09 00:32 · Score: 1

No need to develop your own locking system, just use whatever logging functionality the server has.
Re:Do you need a database? by Anonymous Coward · 2014-04-09 00:59 · Score: 2, Informative

>For storing and querying arbitrarily-structured data, which is what the submitter seems to be wanting
I dunno. I read TFS and it looks more like he wants rows of tabular data. Were this a STX site, I'd vote to close as too broad since he hasn't actually said anything useful about what he's storing.
So default answer to "Which NoSQL database should I use?" is always "Don't use NoSQL."
Re:Do you need a database? by DorianGre · 2014-04-09 01:12 · Score: 3, Informative

We are looking at 99% incoming data, 10-12 fields, 1000-2000 per session per week, X as many users as we can get.
Re:Do you need a database? by boristdog · 2014-04-09 01:19 · Score: 4, Insightful

As someone who is currently trying to convert a 20 year-old, multi-million-entry flat files DB into a real DB for a major corporation without bringing the corporation to its knees I heartily concur with NOT using flat files if there is ANY chance of this growing beyond a few hundred entries.
By now hundreds of applications are using the old flat file DB; I have so much re-coding to do that I will probably retire before it is all complete.
Re:Do you need a database? by squiggleslash · 2014-04-09 01:21 · Score: 5, Insightful

Then perhaps he should use a real database, rather than embrace a fad started by people who don't like databases?

--
You are not alone. This is not normal. None of this is normal.
Re:Do you need a database? by DorianGre · 2014-04-09 01:35 · Score: 1

I quit a gig within the last year where the company was on DB2 (8) and the data was scattered. Their daily processes were pushing 22 hours to complete, and their chosen solution was just to delete historical data, so that they couldn't even tell their customers what happened the previous month. Of course, the same team had been building their PL/I code since the 1980s, so there was no way to get them unstuck without some executive decisiveness, and that wasn't happening. They wanted a data warehouse for business intelligence in their Oracle system. I signed a contract for GoldenGate with implementation and then walked. Really, the place was a mess: WebFocus for reporting, an Excel reporting team because they couldn't get data from WebFocus, a Java team where the last 2 architects had quit within a year, a stealth JasperReports team that had been working on the same goal for years with no deliverable, etc. Really, the only thing in the entire place that worked at all was the DB2, and on that alone they were making money hand over fist. It was an amazing scene.
Re:Do you need a database? by DorianGre · 2014-04-09 01:43 · Score: 1

Scaling is the #1 issue we are concerned about. The reports are not complex, but they do need to happen. We are also not stuck on Ruby (it is a pig, processor-wise), but the application is such that it is easy to scale the front end horizontally.
Re:Do you need a database? by Art3x · 2014-04-09 01:44 · Score: 1

"Think of SQLite not as a replacement for Oracle but as a replacement for fopen()" --- About
Re:Do you need a database? by oh_my_080980980 · 2014-04-09 02:07 · Score: 1

NoSQL is a flat file, so it's the same thing. He's not going to be organizing the data in any meaningful way with NoSQL, it's just a dumping ground.
Re:Do you need a database? by oh_my_080980980 · 2014-04-09 02:08 · Score: 1

Amen brother.
Re:Do you need a database? by oh_my_080980980 · 2014-04-09 02:11 · Score: 1

Which is why he might as well use a flat file. If he has structure, then an RDBMS is what he should use. If he's not going to bother to organize the information, then a flat file would be perfect because all you are after is junk anyways.
Re:Do you need a database? by NatasRevol · 2014-04-09 02:12 · Score: 2

So, 10-20 thousand data points, per customer, per week?
Or, at 100 customers, 50-100 million data points per year?
Get a real database. And some real horsepower.

--
There are two types of people in the world: Those who crave closure
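The estimate in this post can be spelled out as a small calculation. This is only a sketch under an assumption: that a "data point" is one field of one record, and that DorianGre's figures mean 10-12 fields and 1000-2000 records per customer per week. The thread never pins that down, so `yearly_points` and its inputs are illustrative.

```python
WEEKS_PER_YEAR = 52

def yearly_points(fields, records_per_week, customers):
    """Data points per year, assuming one point per field per record."""
    return fields * records_per_week * WEEKS_PER_YEAR * customers

# Low and high ends of the stated ranges, at 100 customers.
low_100 = yearly_points(10, 1000, 100)     # 52,000,000
high_100 = yearly_points(12, 2000, 100)    # 124,800,000

# The same ranges at 20,000 customers.
low_20k = yearly_points(10, 1000, 20_000)  # 10,400,000,000
high_20k = yearly_points(12, 2000, 20_000) # 24,960,000,000
```

Under that reading, 100 customers lands at roughly 50-125 million points per year, in line with the figure above, and 20,000 customers lands in the tens of billions.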
Re:Do you need a database? by tubs · 2014-04-09 02:13 · Score: 2

When I read the post the first thought that came to me was "log files" - you mention date & time, a "number" of fields and "few" fields for reporting. It still sounds like a log file from everything that is said. Indeed, just change from POST to GET and you can use the web server logs :-)
But, why not build into the design that you may change the "backend" database without having to worry about what is at the backend?

--
try to make ends meet, you're a slave to money, then you die
Re:Do you need a database? by K.+S.+Kyosuke · 2014-04-09 02:17 · Score: 1

> I've used MySQL, PostgreSQL and Oracle for this. MySQL is by far the easiest to work with, hence its popularity.
What about Firebird? Actual transactions (even transactional lazy schema updates), single-file databases, reasonable tools, almost invisible maintenance, everything virtually idiot-proof. Even LibreOffice wants to switch to embedded Firebird for its native database engine. I can't imagine MySQL being anything other than a PITA compared to Firebird.

--
Ezekiel 23:20
Re:Do you need a database? by Anonymous Coward · 2014-04-09 02:26 · Score: 2, Informative

> "Irregardless" is not a word.
Merriam-Webster:
irregardless
adverb \ir-i-gärd-ls\
Definition of IRREGARDLESS
Usage Discussion of IRREGARDLESS
Irregardless originated in dialectal American speech in the early 20th century. Its fairly widespread use in speech called it to the attention of usage commentators as early as 1927. The most frequently repeated remark about it is that “there is no such word.” There is such a word, however. It is still used primarily in speech, although it can be found from time to time in edited prose. Its reputation has not risen over the years, and it is still a long way from general acceptance. Use regardless instead.
Re:Do you need a database? by funwithBSD · 2014-04-09 02:39 · Score: 1

Way overkill for the project, way underkill for the CV builder.

--
Never answer an anonymous letter. - Yogi Berra
Re:Do you need a database? by funwithBSD · 2014-04-09 02:42 · Score: 4, Funny

You ain't supposed to use it.

--
Never answer an anonymous letter. - Yogi Berra
Re:Do you need a database? by DorianGre · 2014-04-09 02:46 · Score: 1

Low cost solution for underserved and emerging markets. What happens when we hit 20,000 customers? (22.4 billion data points per year)
Re:Do you need a database? by aoteoroa · 2014-04-09 03:39 · Score: 1

> We are looking at 99% incoming data, 10-12 fields, 1000-2000 per session per week, X as many users as we can get.

Our company's accounting system uses Mongo on the backend. With about 30 users and a 7 GB database, Mongo performs well, and it sounds like it would fit your application.
Having said that I agree with other posters who have suggested that if you want to plan for future growth you would be wise to consider a real database from the start. We are planning a migration to PostgreSQL this year.
Re:Do you need a database? by brainboyz · 2014-04-09 03:56 · Score: 1

Gah, the day I don't have mod points.
Re:Do you need a database? by Archangel+Michael · 2014-04-09 04:06 · Score: 1

My first Grammar Nazi post

irregardless
IS not a word. Stop using it. Thank you.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:Do you need a database? by Grishnakh · 2014-04-09 04:09 · Score: 1

MySQL's normal DB engine isn't ACID, so if you care about data integrity, PostgreSQL is a better choice. MySQL's InnoDB engine is ACID, but doesn't perform as well as Postgres. At least that's my understanding of the situation. I honestly don't see the point at all of using a DB that isn't ACID.
Re:Do you need a database? by flipperdo · 2014-04-09 04:38 · Score: 2

The problem with choosing the best database (or technology in general) for each corner of each application is that before long you've got yourself a maintenance/support nightmare. Better to stick with what you know, provided it's sufficient for the job at hand. Only when there's a compelling reason should you bring in something new. For example, there aren't many use cases for which PostgreSQL isn't sufficient...
Re:Do you need a database? by jopsen · 2014-04-09 05:03 · Score: 1

> We are looking at 99% incoming data, 10-12 fields, 1000-2000 per session per week, X as many users as we can get.
Hmm... I would consider Azure Table Storage or, if the data points are big or complicated query features are needed, maybe MongoDB.

When you say 99% incoming, I suspect you don't need to query much. Hence, querying by scanning entire partitions of the database might be acceptable.
In this case, storing the data as one file per customer per week might be a good solution, if data collection is your primary goal, with cost and scalability as primary concerns.

But unless you're looking at multi-terabyte loads anytime soon, you'll probably do fine with any solution you pick :)
So consider going with what is easiest to develop with and optimize for scalability later...
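As a rough illustration of the "one file per customer per week" idea above, the sketch below assumes CSV files laid out as `<customer>/<ISO year>-W<week>.csv` under a base directory. The layout, the helper names, and the customer id are all hypothetical.

```python
import csv
import os
import tempfile
from datetime import date

# Temporary base directory so the sketch is self-contained.
base_dir = tempfile.mkdtemp()

def partition_path(customer_id, day):
    """Map a customer and a date to that customer's weekly file."""
    year, week = tuple(day.isocalendar())[:2]
    return os.path.join(base_dir, str(customer_id), f"{year}-W{week:02d}.csv")

def append(customer_id, day, fields):
    """Append one record to the partition it belongs to."""
    path = partition_path(customer_id, day)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow([day.isoformat(), *fields])

def scan(customer_id, day):
    """Query by scanning the whole partition for that customer and week."""
    path = partition_path(customer_id, day)
    if not os.path.exists(path):
        return []
    with open(path, newline="") as handle:
        return list(csv.reader(handle))

append("cust-1", date(2014, 4, 7), [1, 2])
append("cust-1", date(2014, 4, 9), [3, 4])
append("cust-1", date(2014, 4, 14), [5, 6])  # lands in the next week's file

same_week = scan("cust-1", date(2014, 4, 8))
```

This keeps writes cheap and makes old weeks easy to archive, but it has no protection against concurrent writers to the same file, which other posters in this thread warn about.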
Re:Do you need a database? by FyRE666 · 2014-04-09 05:05 · Score: 1

To be honest, the OP's use case doesn't require ACID compliance. There's no need for a transaction when performing a single insert. It's also debatable to claim PostgreSQL offers better performance, at least without a qualifier. True, it's (currently) faster in some areas, and (currently) equal or slower in others. As I say, I've used PG, MySQL and Oracle, although I haven't used PG for a few years now, I'll admit. But it was pretty damning that I actually preferred using the Oracle command line client to PG's version! It's piqued my interest in trying it out again though :)
Most distros either come with a LAMP stack installed now, or an easy way to install one in a couple of minutes, all working out of the box. For the sake of convenience it makes sense. I'm not sure if there's an equivalent turnkey LAPP stack? I'll have to look it up!

--
Code, Hardware, stuff like that.
Re:Do you need a database? by Zmobie · 2014-04-09 05:29 · Score: 1

Yea I definitely wouldn't use NoSQL or any kind of flat file data storage for that amount of data. If you're rather averse to having a large complicated DB, SQLite is probably a good starting point, especially because if you find you do need to scale up to a more robust platform it converts very easily. If you expect it to scale up quickly (hitting the 6 figure range or higher for data points), look at your standard MySQL and other related flavors imho. MongoDB I have heard very good things about (don't have any first hand experience unfortunately; I work mostly with MS SQL, Oracle SQL and MySQL since I do enterprise level work).
Re:Do you need a database? by Zmobie · 2014-04-09 05:40 · Score: 1

The data access architecture will get overly complicated using any kind of flat storage like that with a web app. Asynchronous access to flat storage becomes very complicated and only has an advantage over even a weakly structured DB if your scale is <1000 data points really.
I wrote an application that used flat storage back in college and when even as few as 2 or 3 different access points had to be accommodated for data writing I had to modify a lot of logic just to keep data from becoming corrupt. There was a bit of a performance advantage to doing it, which is why I never went with an actual SQL database, but it was only because of the very small data volume.
SQLite or MongoDB sound like good options for this kind of need, especially because they can be transitioned into much more robust platforms fairly easily. Hell, doing some XML structure embedded within the columns could probably help with a lot of the expandability needed in the structure, but there are just too many headaches in using flat file storage imho.
Re:Do you need a database? by angel'o'sphere · 2014-04-09 05:46 · Score: 1

Operating systems are able to serialize access to files just fine.

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Do you need a database? by angel'o'sphere · 2014-04-09 05:48 · Score: 1

Since when is a NoSQL database not a real database?

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Do you need a database? by nullchar · 2014-04-09 06:08 · Score: 1

Once you have done trend reporting, can you just store that aggregate info instead of all the data from the beginning?
RRDB style (round robin data base) where you store daily stats, weekly, monthly, yearly, etc.
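A minimal sketch of that roll-up idea, assuming raw samples are simple (date, value) pairs. It is not how any particular round-robin database tool stores data; the sample values and the `roll_up` helper are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw samples: (day, value).
raw = [
    (date(2014, 4, 7), 3.0),
    (date(2014, 4, 7), 5.0),
    (date(2014, 4, 8), 2.0),
    (date(2014, 4, 14), 10.0),
]

def roll_up(samples, key):
    """Collapse samples into count/total buckets chosen by `key`."""
    buckets = defaultdict(lambda: {"count": 0, "total": 0.0})
    for day, value in samples:
        bucket = buckets[key(day)]
        bucket["count"] += 1
        bucket["total"] += value
    return dict(buckets)

# Daily buckets keyed by ISO date string, weekly by (ISO year, ISO week).
daily = roll_up(raw, key=lambda d: d.isoformat())
weekly = roll_up(raw, key=lambda d: tuple(d.isocalendar())[:2])
```

Keeping count and total (rather than only an average) lets coarser buckets be rebuilt from finer ones, so the raw rows can be dropped once the daily figures exist. Whether that is acceptable depends on whether the reports ever need the individual records again.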
Re:Do you need a database? by Archangel+Michael · 2014-04-09 07:05 · Score: 1

Dictionary.com says it best ...

Usage note
Irregardless is considered nonstandard because of the two negative elements ir- and -less. It was probably formed on the analogy of such words as irrespective, irrelevant, and irreparable. Those who use it, including on occasion educated speakers, may do so from a desire to add emphasis.

and

irregardless an erroneous word that, etymologically, means the exact opposite of what it is used to express, attested in non-standard writing from at least 1870s

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:Do you need a database? by funwithBSD · 2014-04-09 08:33 · Score: 1

That would be an "accent" not a use of a word that is recognized as a "word" but not considered common or proper english.

--
Never answer an anonymous letter. - Yogi Berra
Re:Do you need a database? by cheesybagel · 2014-04-09 08:47 · Score: 1

I thought it was a simple key-value data store.
Re:Do you need a database? by cheesybagel · 2014-04-09 08:53 · Score: 1

IMO NoSQL will continue being used but will become less hip and may go under the radar. The fact is for a lot of applications you do not need a relational database model as it is needlessly complicated. OODBs were the opposite as they were more complex than regular SQL back when they were proposed.
Re:Do you need a database? by Hewligan · 2014-04-09 09:04 · Score: 1

> I thought it was a simple key-value data store.
Not really. The term NoSQL is used to describe a whole bunch of very different models for storing data, each of which has its own pros and cons.

--
"If God created us in his own image, we have more than reciprocated"
Re:Do you need a database? by denmarkw00t · 2014-04-09 11:44 · Score: 1

Need I remind you: common table expressions? It's rare that the need for one arises, but boy have I been sore a few times from MySQL lacking them.
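For anyone who has not met them: a common table expression is a named subquery introduced with `WITH`, which the rest of the statement can reference more than once. The sketch below runs one through Python's `sqlite3` module, purely as a convenient engine that accepts the syntax; the `samples` table and its data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (session_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("a", 1.0), ("a", 2.0), ("b", 10.0), ("c", 3.0)],
)

# session_totals is defined once and used twice: as the row source
# and inside the subquery that computes the average total.
query = """
WITH session_totals AS (
    SELECT session_id, SUM(value) AS total
    FROM samples
    GROUP BY session_id
)
SELECT session_id, total
FROM session_totals
WHERE total > (SELECT AVG(total) FROM session_totals)
ORDER BY session_id
"""
above_average = conn.execute(query).fetchall()
```

Without a CTE, the grouped subquery would have to be written out twice or materialized in a temporary table.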
Re:Do you need a database? by denmarkw00t · 2014-04-09 11:46 · Score: 1

I do prefer Postgres, but damn was Oracle fun (mostly) to work with, mainly because listening to the DBAs explain things felt like I was being imparted with some ancient wisdom.
Re:Do you need a database? by DarkOx · 2014-04-10 00:05 · Score: 1

In that case I say all the more reason to stick with Ruby but use an abstract database API like DBI. You can keep throwing additional front end processors at the problem, with good horizontal scaling on the front end, so Ruby's CPU-heavy nature won't be an issue; raw compute is still getting cheaper faster than I/O. So I think it makes total sense to keep using tools like Ruby and Python that enable efficient development even at a hit to execution.
DBI will let you change database later with as little rework as possible, if you keep your database use to just storage, and keep your usage to basic table and constraint feature sets widely supported across all database engines. The RDBMS will take care of plumbing around locking and ACID considerations across multiple front ends for you. It will also allow you to run your reporting jobs or data warehouse ETLs without having to either take your main system offline or tightly integrate them with the front end.
Back on the scaling front, using the more traditional database engines will give you the last 40 years of developed talent pool, and case studies on what works where scaling is concerned. The tools exist to build these things out for almost any use case. The tools exist on the NoSQL side too, but they are more tools for building tools; it's still immature and very much DIY.
Ultimately what you have to decide here is where you are going to get the most value for your time. NoSQL *might* offer you some better back end performance down the line, so if you think the data volume is going to get real big real fast, give it a look. It will certainly mean you will spend more energy working on the plumbing, and force you into dealing with many more unknowns. An RDBMS will provide almost all the plumbing for you, meaning you focus on the front end.

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
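The "abstract database API" argument can be sketched in Python, whose DB-API plays a role similar to Ruby's DBI. This is an illustration under assumptions: the `RecordStore` class, the `events` table, and the `placeholder` argument are all hypothetical, and only the `sqlite3` back end is exercised here.

```python
import sqlite3

class RecordStore:
    """Hypothetical thin layer: the app talks to this, not to a driver.

    `connect` is any zero-argument callable returning a DB-API
    connection. `placeholder` is the driver's parameter marker, which
    differs between drivers ("?" for sqlite3, "%s" for some others).
    """

    def __init__(self, connect, placeholder="?"):
        self._conn = connect()
        self._ph = placeholder
        cur = self._conn.cursor()
        # Deliberately plain SQL: basic types, no engine-specific features.
        cur.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "session_id VARCHAR(64) NOT NULL, value REAL NOT NULL)"
        )
        self._conn.commit()

    def add(self, session_id, value):
        cur = self._conn.cursor()
        cur.execute(
            f"INSERT INTO events (session_id, value) "
            f"VALUES ({self._ph}, {self._ph})",
            (session_id, value),
        )
        self._conn.commit()

    def total_for(self, session_id):
        cur = self._conn.cursor()
        cur.execute(
            f"SELECT COALESCE(SUM(value), 0) FROM events "
            f"WHERE session_id = {self._ph}",
            (session_id,),
        )
        return cur.fetchone()[0]

# Start on SQLite; swapping engines means passing a different
# connect callable (and placeholder), not rewriting the callers.
store = RecordStore(lambda: sqlite3.connect(":memory:"))
store.add("s1", 2.5)
store.add("s1", 1.5)
store.add("s2", 7.0)
s1_total = store.total_for("s1")
```

The portability only holds as long as the SQL stays within the common subset, which is exactly the caveat in the post above.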
Re:Do you need a database? by rioki · 2014-04-10 02:12 · Score: 1

> So default answer to "Which NoSQL database should I use?" is always "Don't use NoSQL."
This! When people come to me and ask what NoSQL database they need, I let them describe the data and the requirements they have. In almost all cases the data is highly structured and needs strong query capabilities. In almost all cases it turns out that the problem is not that SQL DBs will not do the job, but rather a poor understanding of DBs in general and a koowl new hawtness vibe. I am not saying that MongoDB, CouchDB or Redis are not interesting tools, but for all their advantages, they have strong drawbacks.
Basically, if you look at the available options and do not know which to pick, you did not do your homework. You should research your requirements and the different options, including the classical SQL databases. In my experience, a clear option will normally jump out at you once you have invested sufficient time and effort.
Re:Do you need a database? by whitroth · 2014-04-10 04:53 · Score: 1

Flat files? What was the world's *largest* database, at least as of 6 or 8 years ago, is Daytona, with trillions of records. It's Bellcore, it's flat files, they write queries in C... and it's the record of every phone call ever made, back to "Come here, Mr. Watson, I need you".
mark
Re:Do you need a database? by thirdender · 2014-04-10 05:55 · Score: 1

I recently posed the following question in #mysql:
Wordpress stores posts (pages) in one table, and individual fields of data in another table. As more fields are added, the field table grows geometrically.
Drupal stores entities (often, but not always, pages) in one table, and each field in separate tables. JOINs are used to query the data.
So which is more optimal? One large table with all field data, or separate tables and JOINs? The response was that both CMSs use MySQL "wrong". How I understood that was, MySQL is built for large tables of data. Each table should have a defined structure where each field is given its own column. This is in fact how some CMS systems manage their database (afaik, Expression Engine uses this structure). However, this is less flexible in terms of multi-value fields and, eventually, modifying the table structure will become slow as the amount of data in the system grows.
Given that programmers have no foreknowledge of the fields that will be created by CMS users, there needs to be a level of flexibility that's not immediately possible with a relational database. Does NoSQL not fill that need?
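For reference, the "one table of posts, one table of fields" layout described above looks roughly like this. It is a simplified sketch in SQLite via Python, not WordPress's or Drupal's actual schema; the table names, columns, and `load_post` helper are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- One row per (post, field name): new fields need no ALTER TABLE.
    -- The primary key here allows a single value per field per post.
    CREATE TABLE post_fields (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        name TEXT NOT NULL,
        value TEXT,
        PRIMARY KEY (post_id, name)
    );
    """
)
conn.execute("INSERT INTO posts (id, title) VALUES (1, 'Hello')")
conn.executemany(
    "INSERT INTO post_fields VALUES (?, ?, ?)",
    [(1, "author", "ann"), (1, "color", "red")],
)
conn.commit()

def load_post(post_id):
    """Reassemble one post from its base row plus its field rows."""
    title = conn.execute(
        "SELECT title FROM posts WHERE id = ?", (post_id,)
    ).fetchone()[0]
    fields = dict(
        conn.execute(
            "SELECT name, value FROM post_fields WHERE post_id = ?",
            (post_id,),
        )
    )
    return {"title": title, **fields}

# A CMS user invents a new field at runtime: just another row.
conn.execute("INSERT INTO post_fields VALUES (1, 'rating', '5')")
conn.commit()
post = load_post(1)
```

The flexibility comes at a cost that the #mysql answer hints at: every value is untyped text, and filtering on several fields at once means one self-join per field.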

Use PostgreSQL by Anonymous Coward · 2014-04-08 21:17 · Score: 5, Informative

If you need to store less than a few hundred million rows just use PostgreSQL.
It supports JSON and transactions.

Re:Use PostgreSQL by Lennie · 2014-04-08 21:37 · Score: 4, Insightful

Yes, that is what I wanted to point out too.
Also in PostgreSQL 9.4 it has jsonb which is, in certain tests less than a year ago, faster than MongoDB.

--
New things are always on the horizon
Re:Use PostgreSQL by Lennie · 2014-04-08 21:39 · Score: 2

Also if you want a key/value store, there is also http://symas.com/mdb/ from a company of some of the OpenLDAP developers.
Which really seems to have the fastest read performance of them all.

--
New things are always on the horizon
Re:Use PostgreSQL by Anonymous Coward · 2014-04-08 22:29 · Score: 1

Object languages like PHP and relational databases have impotence mismatch.
Re:Use PostgreSQL by zauberberg51 · 2014-04-09 00:58 · Score: 1

impotence => impedence for those who are not too impudent
Re:Use PostgreSQL by fsagx · 2014-04-09 01:33 · Score: 1

idempotence is important!
Re:Use PostgreSQL by NatasRevol · 2014-04-09 02:16 · Score: 1

Unfortunately, the submitter gave details later.
Each customer is about 500k data points per year.
Thousands of customers is a few hundred million rows, per year.

--
There are two types of people in the world: Those who crave closure
Re:Use PostgreSQL by Anonymous Coward · 2014-04-09 02:48 · Score: 1

Longtime PG user and huge fan. Especially having been bitten in the ass by mysql about 2001...
But... be careful with what you recommend.
It does scale... well. But not as fast and as cheaply as the NoSQL platforms.
That it comes with transactions is great. UNTIL the moment you actually try to use them.
You can read as fast as you want over the pgpool/cluster/slony, but zombie-jesus help you if you are trying to run 5,000 tps without transaction batching -- it has staggered every time we've tried at multiple companies.
And its key-value store, hstore.... ? Yeah, we had that freak the fuck out and lose all its data twice at not even a hundred million records...
I love the parts of it that work. But the parts that don't... well... it's a database. They should be clearly identified.
Re:Use PostgreSQL by rtaylor · 2014-04-09 02:48 · Score: 1

Right. So 5 years from requiring a NoSQL DB, and hardware/software advancements in that period will likely give another 3 years of easy growth with just a basic Pg installation.
If it was 10m text/blob records per day, that would be a different animal; but it's probably 1/10th of that.

--
Rod Taylor
Re:Use PostgreSQL by rycamor · 2014-04-09 03:29 · Score: 1

A few hundred million rows is no trouble to PostgreSQL, if configured right. And if you go beyond that there are some great ways to deal with the problem:
1. Partitioning: Make a large table composed of smaller subset tables. This is a great way to deal with what is primarily historical data, since you can partition by month, quarter, or whatever time period makes sense for your application. Then, when it comes time to archive or delete old data, all you have to do is migrate that month's table to the archive location, or just drop it. MUCH less expensive than a DELETE with a WHERE clause.
2. BigSQL: if you want the power of NoSQL but the querying ability of PostgreSQL, check out this package.
3. If you are starting to get serious data, hopefully you are making serious money. There are scores of commercial entities that can help you get a lot more performance out of PostgreSQL. Some of them have add-ons for performance, or have just gotten a lot of experience and good ideas on how to design a solution.
These steps may sound like a pain, but NoSQL brings all sorts of pain with it, also. Limited querying ability, many extra measures required for data integrity, stability issues... bizarre limitations in some areas... Think these things through carefully, and don't fall for anyone's hype.
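The partitioning point (item 1 above) can be illustrated without a PostgreSQL server. The sketch below only imitates the idea with one SQLite table per month, created by hand from Python. PostgreSQL's own partitioning is configured differently, and the table naming scheme and helper functions here are hypothetical.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")

def partition_name(day):
    """Table name for the month containing `day`, e.g. events_2014_04."""
    return f"events_{day.year:04d}_{day.month:02d}"

def insert_event(day, value):
    """Route a row to its month's table, creating the table on first use."""
    table = partition_name(day)  # built from integers only, safe to inline
    with conn:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(day TEXT NOT NULL, value REAL NOT NULL)"
        )
        conn.execute(
            f"INSERT INTO {table} (day, value) VALUES (?, ?)",
            (day.isoformat(), value),
        )

def drop_month(year, month):
    """Retire a whole month at once instead of DELETE ... WHERE."""
    with conn:
        conn.execute(
            f"DROP TABLE IF EXISTS {partition_name(date(year, month, 1))}"
        )

def partitions():
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'events_%' ORDER BY name"
    )
    return [row[0] for row in rows]

insert_event(date(2014, 2, 10), 1.0)
insert_event(date(2014, 3, 5), 2.0)
insert_event(date(2014, 4, 8), 3.0)
drop_month(2014, 2)  # archive or discard the oldest month
remaining = partitions()
```

Dropping a table is a metadata operation, whereas a large `DELETE` has to visit every matching row, which is why the post calls it much less expensive.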
Re:Use PostgreSQL by angel'o'sphere · 2014-04-09 05:52 · Score: 1

And why should he migrate his data in 5 or 8 years when his problem is so simple that he can code it right away in a day or two for a NoSQL database?

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Use PostgreSQL by nullchar · 2014-04-09 08:38 · Score: 1

Was your 5000 tps using normal insert/update/delete statements or using the COPY statement? (I guess it's a form of batching: meaning, you issue large COPY statements instead of many INSERT statements, if your application can load data that way.)
Also, was your hstore experience with 9.3+ or what version(s) had problems?
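The batching being asked about can be shown in general terms: group many rows into one transaction rather than committing each row on its own. The sketch uses Python's `sqlite3` and `executemany` as a stand-in; it is not PostgreSQL's COPY, and the `samples` table, the batch size, and `insert_batched` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (session_id TEXT, value REAL)")

# 5000 hypothetical incoming rows.
rows = [(f"s{i % 10}", float(i)) for i in range(5000)]

def insert_batched(batch, size=500):
    """Insert rows in chunks, one transaction (one commit) per chunk."""
    for start in range(0, len(batch), size):
        chunk = batch[start:start + size]
        with conn:  # a single commit covers the whole chunk
            conn.executemany("INSERT INTO samples VALUES (?, ?)", chunk)

insert_batched(rows)
count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

Here 5000 rows cost 10 commits instead of 5000. The trade-off is that a failure loses the whole chunk in flight, so the batch size is a choice between throughput and how much data can be re-sent.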
Re:Use PostgreSQL by cavebison · 2014-04-10 15:44 · Score: 1

> impotence => impedence for those who are not too impudent
Impedance actually. :)

Sounds like you need a database by Anonymous Coward · 2014-04-08 21:23 · Score: 5, Insightful

You might want to consider a SQL database.

Re:Sounds like you need a database by wiredlogic · 2014-04-09 03:29 · Score: 1

But he needs something that's webscale. Probably will need sharding too.

--
I am becoming gerund, destroyer of verbs.
Re:Sounds like you need a database by labradore · 2014-04-09 03:49 · Score: 1

Box, DropBox, Etsy, Twitter and Facebook and Amazon all rely on MySQL at "webscale".
Perhaps this will also work for our friend.
Re:Sounds like you need a database by angel'o'sphere · 2014-04-09 05:54 · Score: 1

ROFL, facebook uses cassandra, twitter a self crafted NoSQL database ...

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Sounds like you need a database by cmr-denver · 2014-04-09 08:48 · Score: 1

For those of you who don't get the joke: https://www.youtube.com/playli...

Please specify a better scenario by prefec2 · 2014-04-08 21:25 · Score: 2

Based on your information no one can give you solid advice. It highly depends on the load you expect and on the data model you will use. For a simple Twitter-style feed, you can use a log file, or any NoSQL technology. If you only have a few transactions and not billions of entries, you could use PostgreSQL or even MySQL. However, PostgreSQL scales better. If you want to make complex interpretations on graph-like data you may consider Neo4j as a graph DB.

Re:Please specify a better scenario by OzPeter · 2014-04-08 23:00 · Score: 4, Insightful

Based on your information no one can give you solid advice.
IMHO the question is deliberately designed to be vague. iPhones and Android devices, PHP and Ruby On Rails .. that is such a shotgun blast of specifications that are totally unrelated to the DB use on the back end that the entire question smells of click bait to me.

--
I am Slashdot. Are you Slashdot as well?
Re:Please specify a better scenario by khchung · 2014-04-09 00:07 · Score: 5, Insightful

Based on your information no one can give you solid advice.
IMHO the question is deliberately designed to be vague. iPhones and Android devices, PHP and Ruby On Rails .. that is such a shotgun blast of specifications that are totally unrelated to the DB use on the back end that the entire question smells of click bait to me.
Either that, or the OP simply has no idea how databases work at all.
If the OP had any idea how a database (any database, not just relational) works, he would be talking about data and transaction volumes, access patterns, transactional requirements, data integrity constraints, retention and housekeeping requirements, etc.
Instead, as you said, he talked about device platforms, communication protocols, language and runtime environment, which are all irrelevant to choosing a database. (OK, the last may be a bit relevant depending on which database is used.)

--
Oliver.
Re:Please specify a better scenario by jythie · 2014-04-09 00:35 · Score: 1

And here I am out of mod points.

At first reading something seemed off about the question, and I think you summed it up nicely.

To me it comes across a bit as the OP asking 'I need some vaguely authoritative sounding reasons for a sexy solution, look at my keywords and tell me what is "in" with that community'
Re:Please specify a better scenario by DorianGre · 2014-04-09 01:06 · Score: 1

Not bait, simply dipping my toe into the NoSQL waters. I have been a developer for almost 20 years now and can spin this up with a SQL database in under an hour. The big thing here is that it be highly scalable (thus the iPhone/Android - you never know how big these will get, or how fast) and that we are able to get some kind of structured time-based reports out on the back end.
Re:Please specify a better scenario by OzPeter · 2014-04-09 01:45 · Score: 1

I have been a developer for almost 20 years now and can spin this up with a SQL database in under an hour.

If you have been a developer for 20 years then you should know that people will be skeptical of any question that lets them play and win Buzzword Bingo from a single sentence.

--
I am Slashdot. Are you Slashdot as well?
Re:Please specify a better scenario by DorianGre · 2014-04-09 01:50 · Score: 1

Sorry about that. I just wanted you to have an understanding of what our complete stack looks like and what our connectivity issues would be. I am sure that some NoSQL has better support for PHP, and others have better support for Java. Also, just indicating that this is to support mobile apps, and the inherent unknown scaling issues that come from that.
Re:Please specify a better scenario by oh_my_080980980 · 2014-04-09 02:20 · Score: 1

Why don't I believe you? If you can spin this up with a SQL database in under an hour, then you have your answer. The fact that you repeat "scalability" and "reporting" leads me to believe you do not understand what databases, in particular SQL databases, can do.
Re:Please specify a better scenario by DorianGre · 2014-04-09 03:02 · Score: 1

POC is already running. It is heavily write intensive just in testing and, having been on the receiving end of a firehose of data before, we really just wanted to investigate the options. The easy scaling is to just split customers across multiple copies of the database and link them for aggregate queries, but it seems like such a kludge. In the past I have logged everything to flat files and then imported those into the DB every 5 minutes or so, which helps with web layer scaling, but creates a lot of unintended issues managing the flat files. Thus, we are looking at the other options out there.
Re:Please specify a better scenario by nullchar · 2014-04-09 08:51 · Score: 1

Instead of "sharding" (splitting customers across multiple copies of the database) you should try a NoSQL solution to handle the flood of writes as the first layer. Then a recurring process can query the data in your NoSQL object store (by timestamp) and aggregate it into an SQL database for reporting. You could archive those processed entries, or wait until they get old, to another object store for your "data warehouse" -- basically just an archive in case you need to do different aggregate reporting in the future (depending on storage size of course).
I must ask, do you really need to store each full piece of information written by these clients at such a high volume?
Depending on your use of the data, you could even just store the results in memory for X hours/minutes, and then aggregate-process that and write the results to your SQL DB. A single DB with many application servers would be fine in this condition, with writes every X hours/minutes. (You are probably already flat-file logging the incoming requests; that is an archive if you *really* need to go back.) If you cannot afford memory loss if an app server dies, solutions like EhCache (java) will persist the memory to disk, in case of hardware/software failure.
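
The "hold it in memory, then write the aggregate" idea above could be sketched roughly like this. It is only an illustration under assumptions: the `Aggregator` class, the `event_totals` table and the `product_id` key are all made up for the example, SQLite stands in for whatever SQL database you run, and nothing here survives an app server crash between flushes.

```python
import sqlite3
from collections import Counter


class Aggregator:
    """Hypothetical helper: counts events in memory, flushes totals to SQL in one batch."""

    def __init__(self, conn):
        self.conn = conn
        self.counts = Counter()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS event_totals ("
            "product_id TEXT PRIMARY KEY, total INTEGER NOT NULL)"
        )

    def record(self, product_id):
        # Called once per incoming request; touches memory only.
        self.counts[product_id] += 1

    def flush(self):
        # Called every X minutes/hours; one transaction for the whole batch.
        with self.conn:
            for product_id, n in self.counts.items():
                cur = self.conn.execute(
                    "UPDATE event_totals SET total = total + ? WHERE product_id = ?",
                    (n, product_id),
                )
                if cur.rowcount == 0:
                    self.conn.execute(
                        "INSERT INTO event_totals (product_id, total) VALUES (?, ?)",
                        (product_id, n),
                    )
        self.counts.clear()


conn = sqlite3.connect(":memory:")
agg = Aggregator(conn)
for pid in ["A", "B", "A"]:
    agg.record(pid)
agg.flush()
```

The database sees one write per distinct key per flush interval instead of one write per request, which is the whole point of the suggestion.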

NoSQL? by aaaaaaargh! · 2014-04-08 21:29 · Score: 5, Insightful

I would like to start with a NoSQL solution for scaling

And there it is, the proverbial premature optimization ...

Re:NoSQL? by louaish88 · 2014-04-08 21:49 · Score: 1

But, but, but NoSQL is Webscale!
Re:NoSQL? by mwvdlee · 2014-04-08 21:52 · Score: 2

Being able to scale from 1 billion records a day to 10 billion a day does not a premature optimization make.
The simple fact is that there's not enough information to give any reasonable advice.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:NoSQL? by gnoshi · 2014-04-08 23:14 · Score: 4, Funny

Shards! It has shards!
Re:NoSQL? by Sarten-X · 2014-04-09 00:00 · Score: 5, Insightful

As an expert (relative to most of Slashdot) in NoSQL databases, with a significant amount of experience in Hadoop and HBase systems, I agree wholeheartedly.
NoSQL solutions can be ridiculously fast and scale beautifully over billions of rows. Under a billion rows, though, and they're just different from normal databases in various arguably-broken ways. By the time you need a NoSQL database, you'll be successful enough to have a well-organized team to manage the transition to a different backend. For a new project, use a RDBMS, and enjoy the ample documentation and resources available.

--
You do not have a moral or legal right to do absolutely anything you want.
Re:NoSQL? by tigersha · 2014-04-09 00:04 · Score: 1

Thank you. Someone who talks sense around here.

--
The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
Re:NoSQL? by VortexCortex · 2014-04-09 00:09 · Score: 3, Funny

Shards! It has shards!
Heal The Dark Crystal, Gelfling!
Only then can the two be made one!
Re:NoSQL? by Anonymous Coward · 2014-04-09 00:29 · Score: 2, Insightful

As an expert (relative to most of Slashdot) in NoSQL databases, with a significant amount of experience in Hadoop and HBase systems, I agree wholeheartedly.
NoSQL solutions can be ridiculously fast and scale beautifully over billions of rows. Under a billion rows, though, and they're just different from normal databases in various arguably-broken ways. By the time you need a NoSQL database, you'll be successful enough to have a well-organized team to manage the transition to a different backend. For a new project, use a RDBMS, and enjoy the ample documentation and resources available.
Agreed. I used a NoSQL database on a project I'm working on at the moment, and stick by that decision even though I don't even have millions of rows, but my situation is somewhat different to the OP's: my data model is very difficult to map to SQL (I have hundreds of different entity types, each of which has different field storage requirements, and need to be able to associate between entities of different types according to a variety of rules, meaning that some entity types may have hundreds of different types of entity associated with them; SQL quite simply sucks for this kind of data, but thankfully applications where you end up with this kind of data are few and far between). OP's data sounds like an ideal candidate for storage in a relational database; he has one basic entity type, no need to make any kind of connection between entities, and apparently no complicating factors at all.
Re:NoSQL? by Wootery · 2014-04-09 00:56 · Score: 1

It's true!
Re:NoSQL? by DorianGre · 2014-04-09 01:09 · Score: 1

We don't know how big to scale. A few thousand users, a few million?? Apps in the wild are like this sometimes. New to NoSQL and really just wanted a good place to start with a platform that would let us scale. I don't want the Oracle on million dollar hardware problem again.
Re:NoSQL? by DorianGre · 2014-04-09 01:10 · Score: 1

Thanks. This is how we were going, but as we have a blank canvas at the moment, this was a why not sort of decision to look at NoSQL solutions.
Re:NoSQL? by NatasRevol · 2014-04-09 02:26 · Score: 1

Reports on big dbs are always the choke point.
Lots of people, DBAs included, seem to miss this.

--
There are two types of people in the world: Those who crave closure
Re:NoSQL? by Sarten-X · 2014-04-09 03:15 · Score: 4, Interesting

"Why not" is because the cost/benefit analysis is not in NoSQL's favor. NoSQL's downsides are a steeper learning curve (to do it right), fewer support tools, and a more specialized skill set. Its primary benefits don't apply to you. You don't need ridiculously fast writes, you don't need schema flexibility, and you don't need to run complex queries on previously-unknown keys. Rather, you have input rates limited by an external connection, only a few entity types, and you know your query keys ahead of time.

--
You do not have a moral or legal right to do absolutely anything you want.
Re:NoSQL? by rjstanford · 2014-04-09 03:34 · Score: 1

Are you reporting across customers? If not, then sharding totally takes care of your problem. If so, then a combination of sharding and some meaningful aggregation may.
It really sounds like you've already decided on a solution and are looking for affirmation rather than advice. I've regularly inserted millions of rows into a simple 3-node MySQL cluster (unsharded) every day for years... if you don't like SQL, that's fine, but what you're asking for sure sounds like a problem that a halfway competently set up SQL system can handle without breaking a sweat, and almost all of the problems have been encountered, documented, and solved already.

--
You're special forces then? That's great! I just love your olympics!
Re:NoSQL? by DorianGre · 2014-04-09 03:35 · Score: 1

All correct statements. Thanks.
Re:NoSQL? by avandesande · 2014-04-09 04:38 · Score: 1

You will also have a hard time hiring developers to work on it, if you ever get to that point.

--
love is just extroverted narcissism
Re:NoSQL? by Sarten-X · 2014-04-09 05:14 · Score: 1

In my experience, it's not much harder than finding developers with any other specialized skill set.
Hadoop and HBase are exposed as libraries with well-documented APIs. If you're trying to hire developers who can't read an API doc, you have bigger problems than database choice. If you want to hire someone who already knows what they're doing, then your prospects are similar to finding a dev who already knows a particular 3D engine, or kernel development, et cetera.
If you're hiring competent documentation-reading developers anyway, and are willing to pay the expenses while they learn the idioms of this particular library, then there's no additional difficulty in the hiring process.

--
You do not have a moral or legal right to do absolutely anything you want.
Re:NoSQL? by avandesande · 2014-04-09 05:18 · Score: 1

If they use a RDBMS, they don't need to hire someone with a specialized skill set, at least for that part of the application.

--
love is just extroverted narcissism
Re:NoSQL? by aicrules · 2014-04-09 05:42 · Score: 1

Of course not, you're a boy!
Re:NoSQL? by Sarten-X · 2014-04-09 07:29 · Score: 1

I take it you've never seen a DBA in a code review.
The ability to write well-formed SQL queries that are efficient and correct is also a specialized skill. It may not be one you recognize, presumably because you've had it for so long, but the majority of applicants I've encountered are not suited for doing production SQL work. They might be able to write a simple query, but finding someone who understands keys, indexes, views, and all of the other efficiency-improving features is a rarity indeed.

--
You do not have a moral or legal right to do absolutely anything you want.
Re:NoSQL? by rjstanford · 2014-04-09 10:29 · Score: 1

The ability to write well-formed SQL queries that are efficient and correct is also a specialized skill. It may not be one you recognize, presumably because you've had it for so long, but the majority of applicants I've encountered are not suited for doing production SQL work. They might be able to write a simple query, but finding someone who understands keys, indexes, views, and all of the other efficiency-improving features is a rarity indeed.
And yet SQL has been around for decades and has a massively greater installed-base than even the most popular NoSQL tools. How many people out there do you think really understand MongoDB's nuances at scale (remember, if we're not talking billions of rows then it really doesn't matter what tool is being used, including bog-standard MySQL).
All of your arguments - and they are real, and well-reasoned - apply to the NoSQL space far more than the SQL space.

--
You're special forces then? That's great! I just love your olympics!
Re:NoSQL? by swillden · 2014-04-09 14:16 · Score: 1

Shards! It has shards!
:-)
It's worth pointing out that you can also shard SQL databases. Sharded MySQL is used pretty widely at Google, for example, especially for data that needs transactional consistency guarantees, not just eventual consistency, and/or lower long-tail latency. I've seen sharded MySQL scale to petabytes just fine.
The downsides, of course, are that you have to pick a value to shard on, you have to implement the sharding, and if you need to do a lot of cross-shard joins in interactive time frames, you're sunk. For batch processes, there's mapreduce, or for interactive queries that can stand update latency you can pump the data into a different database structured for that purpose.
In the short term, though, just use a regular, unsharded, SQL database. By the time you grow enough to need to shard it, or to move to a NoSQL solution, you'll be able to afford a team of engineers to build and migrate to the new solution.
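
For anyone wondering what "pick a value to shard on" and "implement the sharding" look like in the simplest case, here is a minimal sketch. It is an assumption-laden illustration, not how Google or anyone else does it: the shard names are hypothetical, the shard key is assumed to be a customer ID, and a plain hash-modulo scheme is used.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection settings.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(customer_id, shards=SHARDS):
    """Map a customer ID to one shard, the same one every time."""
    # Use a stable hash; Python's built-in hash() varies between processes.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


chosen = shard_for("customer-42")
```

Every query for one customer goes to one database, which is why per-customer reports stay cheap and cross-shard joins are the painful part. Note that with plain modulo, changing the number of shards remaps most keys, so real deployments usually add a lookup table or consistent hashing on top.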

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:NoSQL? by Bogtha · 2014-04-09 21:15 · Score: 1

Being able to scale from 1 billion records a day to 10 billion a day does not a premature optimization make.

It does when you are currently averaging zero records per day.

--
Bogtha Bogtha Bogtha

2 comments, both useless by Anonymous Coward · 2014-04-08 21:29 · Score: 1

To answer the question "Which NoSQL Database For New Project?" there are 2 comments:
- A relational database
- A plain text file

The user gave an argument: "I would like to start with a NoSQL solution for scaling"

NoSQL is a good solution for horizontal scaling, CSV and SQL DB are not.

I would recommend MongoDB if the transactional aspect is not important for your purpose: easy to learn, easy to use.

Re:2 comments, both useless by Anonymous Coward · 2014-04-09 00:41 · Score: 2, Informative
NoSQL is a good solution for horizontal scaling, CSV and SQL DB are not.
I'd like to dispute this. Based on the OP's description of his application, two things come to mind:
- His application is mostly-write-only. He probably does not need instant query ability, but may need to be able to handle a very large number of inserts per second (assuming he's justified in his assertion that he needs scalability). For this kind of application, logging your incoming data to a plain text file (or sequentially-appended binary data file, or any other write-only plain file approach) can be a significant performance improvement. These files can then periodically (e.g. every hour, every minute, whatever time frame suits) be pulled off local storage, merged, and inserted into a central database as a batch from which read queries are performed. Single batched updates are much more efficient than large numbers of small updates.
- His queries are easily parallelized. He needs to perform only two operations: selecting data based on simple criteria, simple numerical summarization. Both of these are trivially scaled horizontally by using systems with local SQL databases and a simple service running on the machines as nodes in a map/reduce architecture.
Blanket statements like yours above can't really be made without reference to the intended application, as some applications scale much more easily than others, and OP's sounds like it's one of the easy kind.
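
A rough sketch of the append-then-batch-load approach from the first point, for illustration only. The file layout, the `events` table and the function names are assumptions, SQLite stands in for the central database, and concurrent writers would still need one file per process or proper locking.

```python
import csv
import os
import sqlite3
import tempfile


def append_event(path, device_id, timestamp, value):
    """Web layer: append one record to a local flat file, nothing else."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([device_id, timestamp, value])


def load_batch(path, conn):
    """Periodic job: rotate the file, then insert its rows in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (device_id TEXT, ts TEXT, value REAL)"
    )
    # Rename first so records appended during the load go to a fresh file.
    batch_path = path + ".loading"
    os.replace(path, batch_path)
    with open(batch_path, newline="") as f:
        rows = [(d, ts, float(v)) for d, ts, v in csv.reader(f)]
    with conn:  # a single commit for the whole batch
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    os.remove(batch_path)
    return len(rows)


log_path = os.path.join(tempfile.mkdtemp(), "events.csv")
append_event(log_path, "dev-1", "2014-04-09T00:41:00", 1.5)
append_event(log_path, "dev-2", "2014-04-09T00:41:01", 2.5)
conn = sqlite3.connect(":memory:")
loaded = load_batch(log_path, conn)
```

The batch insert is where the efficiency claim comes from: one transaction and one commit per interval rather than one per incoming request.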

MongoDB by timkofu · 2014-04-08 21:31 · Score: 2

These guys are committed, meaning mongo has a future. 2.6 that came out the other day has some nice new features and many bug fixes.

Re:MongoDB by Anonymous Coward · 2014-04-09 00:53 · Score: 1

Plus it's web scale. You just plug it in and it scales right up.
Re:MongoDB by DorianGre · 2014-04-09 01:16 · Score: 1

Thanks. We are not trying to make this deliberately difficult. If we go with NoSQL, it seems MongoDB is the one people are leaning towards.
Re:MongoDB by CauseBy · 2014-04-09 05:03 · Score: 1

I took a little online Mongo class and I actually liked it a lot. I thought it was pretty well designed and easy to use, although totally different than all the traditional databases I'd ever used. There's a lot of nerd-hate out there for Mongo and I don't really get it, I think those people can't even stretch the boundaries of their mental jail cells a little bit.
If you use Mongo, use JSON everywhere in your app. That will make your life easier.

light by invictusvoyd · 2014-04-08 21:33 · Score: 3, Insightful

SQLite is a relational database management system contained in a C programming library. In contrast to other database management systems, SQLite is not a separate process that is accessed from the client application, but an integral part of it.
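
To make "an integral part of the application" concrete, here is a small sketch using the `sqlite3` module from Python's standard library. The file name and the `notes` table are made up for the example; the point is only that there is no server to install or connect to, just a file.

```python
import os
import sqlite3
import tempfile

# The whole database is this one file; the path is arbitrary.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")

# "Connecting" just opens the file inside the current process.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES (?)", ("hello",))
conn.commit()
conn.close()

# Reopen later (or from another run of the program) and the data is still there.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT body FROM notes").fetchall()
conn.close()
```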

Database Scaleability. by tonywestonuk · 2014-04-08 21:34 · Score: 5, Insightful

"I'll need to be able to pull by date or by a number of key fields"

So, in other words, you have already decided on key fields. If you use a database, it has things called indexes, which can search billions of rows for a key field in a fraction of a second.
If you don't use something with indexes then you can't do this.

Where has this idea that databases can't scale come from? - The world runs on databases, for heaven's sake. Do you think when you take money out of an ATM, it's going to MONGODB? - And yet there are millions of ATMs and you can take money out of your VISA account in almost all of them anywhere in the world. That is called scale.
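
A quick way to see what an index changes is to ask the database for its query plan before and after creating one. This is a small sketch using SQLite from Python's standard library; the `scans` table, its columns and the sample data are invented for the illustration, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (id INTEGER PRIMARY KEY, upc TEXT, scanned_on TEXT)")
conn.executemany(
    "INSERT INTO scans (upc, scanned_on) VALUES (?, ?)",
    [(f"{i % 1000:012d}", f"2014-04-{(i % 28) + 1:02d}") for i in range(10000)],
)


def plan(sql):
    """Return the human-readable part of SQLite's query plan for a statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT COUNT(*) FROM scans WHERE upc = '000000000042'"

before = plan(query)   # no index yet: the plan is a scan of the whole table
conn.execute("CREATE INDEX idx_scans_upc ON scans (upc)")
after = plan(query)    # now the plan is a search that names idx_scans_upc

print(before)
print(after)
```

Without the index the engine reads every row; with it, the lookup goes through a B-tree whose cost grows with the logarithm of the table size, which is why a key-field search stays fast at billions of rows.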

Re:Database Scaleability. by cyber-vandal · 2014-04-08 21:35 · Score: 4, Insightful

Where has this idea that Databases can't scale come from?
Salesmen
Re:Database Scaleability. by Anonymous Coward · 2014-04-08 21:43 · Score: 1

Also known as "Scalesmen"
Re:Database Scaleability. by korgitser · 2014-04-08 21:55 · Score: 1

b.bb...but mongodb is webscale!

--
FCKGW 09F9 42
Re:Database Scaleability. by Raumkraut · 2014-04-08 22:27 · Score: 3, Informative

MongoDB has indexes.
MongoDB also lets you store and query arbitrary data, in addition to any "key fields", without having to pre-define all the possible fields. Which it seems is what the submitter asked for.
Where has this idea that "NoSQL" means "not a database" come from?
Re:Database Scaleability. by janoc · 2014-04-08 23:15 · Score: 5, Insightful

Databases don't scale for people who don't understand SQL, don't understand data normalization, indexing and want to use them as flat files. Unfortunately, a way too common anti-pattern :(
The second group are too-cool-to-learn kids using the latest development tool fad on the market to build yet another Facebook/Twitter/Instagram/whatever clone ...
Re:Database Scaleability. by TheDarkMaster · 2014-04-08 23:47 · Score: 1

We have a winner here. When I saw the number of buzzwords in the article, I already thought the worst too.

--
Religion: The greatest weapon of mass destruction of all time
Re:Database Scaleability. by wvmarle · 2014-04-09 00:24 · Score: 1

I've mis-used databases just as you describe. And continue to do so. That's fine, I'm an amateur, and I never needed to handle databases larger than a couple thousand rows. I could probably get away with tens or hundreds of thousands of rows before running into problems.
Now if I were to develop something that needed a billion rows - that's a different story, and I do know my current approach won't work and I'd have to learn a lot about databases to pull it off. And submitter is obviously trying to do that (or at least something that needs a few rows and hoping it grows larger than Facebook and Google combined, so he needs scalability). Also I believe submitter doesn't really know what he's talking about.
If you really need to be able to handle data sets of that kind, and have even just a subset of the skills needed, you don't come to Slashdot for advice. You'd know who to ask - a friend or colleague who does just that.
So submitter may have big dreams, but he almost certainly doesn't have the skills to have even a fighting chance of making it. And by that I don't mean the actual database management skills, but the skills of knowing where your weaknesses are, knowing who can fill those gaps, and asking those people (maybe by having a discussion over a beer, or by hiring them outright).
Re:Database Scaleability. by Anonymous Coward · 2014-04-09 00:53 · Score: 1

The ATM using MONGODB would explain why I never have any money in my account.
Re:Database Scaleability. by CadentOrange · 2014-04-09 01:17 · Score: 1

What if you have to use PostgreSQL? I've seen no evidence that it can scale or run multi-master.
Are you high? Instagram (200 million users) uses PostgreSQL. PostgreSQL is web scale :)
Re:Database Scaleability. by DorianGre · 2014-04-09 01:18 · Score: 1

From what I have read, the indexing tools in MongoDB are superior to CouchBase. Can someone confirm this?
Re:Database Scaleability. by DorianGre · 2014-04-09 01:46 · Score: 1

Experience. I've personally been in the "we're doing 4 billion transactions a day and replicating that over multiple data centers" thing. Don't want to do that again. It's crazy expensive in hardware and effort.
Re:Database Scaleability. by DorianGre · 2014-04-09 01:56 · Score: 1

Unfortunately, I am stuck in backwater USA, and my contacts from my silicon valley days have mostly cashed out or I've lost touch. We expect a few hundred thousand rows of data a day, but you never know. I've hit the hard limits on databases before, so want to avoid that up front if possible.
Re:Database Scaleability. by mbourgon · 2014-04-09 02:31 · Score: 2

And Developers. Anything to keep those damn DBAs away.
(Yes, I'm a DBA)

--
"Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
Re:Database Scaleability. by DorianGre · 2014-04-09 02:42 · Score: 1

Sorry, didn't mean to make buzzword soup. Here is what we have: Mobile apps -> PHP rest apis -> some datastore (Currently MySQL) -> PHP web for reporting. We are dealing with tracking physical products. We will keep the MySQL for user management and primary tables of product information (UPC, description, weight, etc), but storing information on everything else is where we are not sure.
Re:Database Scaleability. by mlk · 2014-04-09 02:53 · Score: 1

Is that required for your current project? Will the disadvantages of NoSQL-based databases hit you (for example, are cross-document consistency or cross-document transactions important)?
Given what you have posted in the OP I don't see a good reason to select NoSQL over relational over object over flat files, and without more information all you are going to get is a list of personal preferences. So let's jump in with my personal preference... look into the data you plan to store and what queries you want to do on it. Then match that against the general styles (relational, object, KV, tabular, graph). Once you have done that I'd likely just look at the leaders for that style of data store and spike it out.

--
Wow, I should not post when knackered.
Re:Database Scaleability. by rthille · 2014-04-09 03:00 · Score: 1

Where has this idea that Databases can't scale come from?
the CAP theorem
Consistency, Availability, Partition tolerance. Choose any two.

--
Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
Re:Database Scaleability. by TheDarkMaster · 2014-04-09 03:07 · Score: 1

Okay, I assume you are the original author of the topic. Looking at the whole situation, I guess your primary problem is the ability to handle a large number of simultaneous users, correct? Databases like Postgres support this type of work; only if you had an operation the size of Facebook would you begin to have problems. However, remember that the database is only part of the chain. You will also need the application itself to perform well (Ruby and performance are mutually exclusive). As an example, I have an application where, although the client-server communication uses HTTP, the server is a highly specialized application that only pretends to be a "web server", receiving commands over HTTP but executing them in a specialized way and communicating directly with the database without intermediate frameworks. A bit strange, but it works very well.

--
Religion: The greatest weapon of mass destruction of all time
Re:Database Scaleability. by nevermindme · 2014-04-09 03:25 · Score: 1

If you have hit limits before (MySQL?), use a very mature platform that operates effectively when the DB does get larger than available memory for indexes or dataset, such as DB2, Oracle or MSSQL; strongly type inputs and normalize your data set in the first place; and use language-native data connector APIs, pointers and record locking. Just about anything important to the application needs to go through a stored procedure, because the database should not trust anything coming directly from the application to be suitable for a query at the data layer. As always, set yourself up on a platform with service accounts that are RO and RW. In other words, don't set yourself up outside the big data mainstream.

If your product is an unexpected hit you will have what seems like zero moments to fix the database and program design without business impact. And start with the DB and the application/web layer as load-balanced VMs and the database in some sort of cluster. Have the entire apps platform packaged to go to an outsourced data center or Amazon on day one, so that at 1000x user growth per day you have a plan. This is sort of Web Application Architecture and Design 101, as any code monkey can put data into NoSQL and hope to get it back through just the right query.
Re:Database Scaleability. by Bacon+Bits · 2014-04-09 03:33 · Score: 4, Insightful

God forbid someone make them think about their data structures and how the end user might need to query them with their own reports.

--
The road to tyranny has always been paved with claims of necessity.
Re:Database Scaleability. by Kjella · 2014-04-09 04:13 · Score: 2

Where has this idea that databases can't scale come from? - The world runs on databases, for heaven's sake. Do you think when you take money out of an ATM, it's going to MONGODB? - And yet there are millions of ATMs and you can take money out of your VISA account in almost all of them anywhere in the world. That is called scale.
Of course you can with lots of money in hardware and software and top notch database administrators, architects and query designers but it's a lot of hard work and expensive. The sales pitch for NoSQL is that it's built for horizontal scale-out by design, just throw more servers at it - mainstream servers, not the extremely expensive high-end servers and it'll scale almost indefinitely without having to rework everything. There's a lot of people in the "when we go viral we must be ready for it" category, with highly variable degrees of realism. And social media has been the big buzzword lately where social media feeds of various forms are almost ideal for NoSQL, nobody cares if the feed is perfectly consistent or updated with the last two seconds of posts from your friends. To the tech-unsavvy, "They did that why can't we?"

--
Live today, because you never know what tomorrow brings
Re:Database Scaleability. by danomac · 2014-04-09 07:01 · Score: 1

Where has this idea that Databases can't scale come from?
MS Access.

MariaDB by Anonymous Coward · 2014-04-08 21:39 · Score: 2, Insightful

I would consider using the latest release of MariaDB.

You can use it as a standard MySQL server, but they also have Cassandra NoSQL as an engine for it now (since the release of 10)... So you would be easily able to play with things on different database types and see what suits your situation better.

MongoDB obviously... by kryps · 2014-04-08 21:39 · Score: 1

... since it is web scale. ;-)

https://www.youtube.com/watch?v=b2F-DItXtZs

Re:MongoDB obviously... by jythie · 2014-04-09 00:47 · Score: 1

I was hoping someone would post that ^_^ always good for a laugh.

Elastic Search by Anonymous Coward · 2014-04-08 21:42 · Score: 1

If you're going to need search at some point you should just opt for Elastic Search from the start. Yeah, it's a search engine, but it's also a rather good key/value store.

Re:Elastic Search by beerbear · 2014-04-08 22:31 · Score: 1

I second this. Easy to set up, easy to use.

--
Hold my beer and watch this!

Short Intro by emblemparade · 2014-04-08 21:51 · Score: 5, Informative

It's a mistake to think that "NoSQL" is a silver bullet for scalability. You can scale just fine using MySQL (FlockDB) or PostgreSQL if you know what you're doing. On the other hand, if you don't know what you're doing, NoSQL may create problems where you didn't have them.

An important advantage of NoSQL (which has its costs) is that it's schema-free. This can allow for more rapid iteration in your development cycle. It pays off to plan document structures carefully, but if you need to make changes at some point (or just want to experiment), you can handle it at the code level. You can also support older "schemas" if you plan accordingly: for example, adding a version tag or something similar that can tell your code how to handle it. So, even ignoring the dubious potential of better scalability, NoSQL can still be beneficial for your project.
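
The version-tag idea can be sketched in a few lines. This is only an illustration with invented field names: documents are plain dicts as a document store would return them, version 1 is assumed to have a single `name` field, and version 2 is assumed to split it in two.

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Return the document in the current shape, whatever version was stored."""
    # Hypothetical convention: documents without a tag are version 1.
    version = doc.get("schema_version", 1)
    if version == 1:
        # Assumed change: v1 kept one "name" string, v2 splits it in two.
        first, _, last = doc["name"].partition(" ")
        rest = {k: v for k, v in doc.items() if k not in ("name", "schema_version")}
        doc = {"schema_version": 2, "first_name": first, "last_name": last, **rest}
    return doc


old_doc = {"name": "Ada Lovelace", "upc": "012345678905"}
new_doc = upgrade(old_doc)
```

Reads go through `upgrade`, so old and new documents can sit side by side in the same collection and get rewritten lazily (or never). The cost is that the "schema" now lives in your code, which is the trade-off mentioned above.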

More so than SQL databases, NoSQL databases are designed for different kinds of applications, and have different strengths:

MongoDB is a really good backend engine that gives programmers a lot of control over performance and its costs: if you need faster writes, you can allow for eventual integrity, or if you need faster reads, you can allow for data not being the absolute freshest. For many massive multiuser applications, not having immediately up-to-date data is a reasonable compromise. It also offers an excellent set of atomic operations, which in my experience compensate well for the lack of transactions. Furthermore, MongoDB is by far the most feature-rich of these, supporting aggregate queries and map-reduce, which again can make up for the lack of joins. It also offers good sharding tools, so if you do need to scale, you can. Again, I'll emphasize that you need a good understanding of how MongoDB works in order to properly scale. For example, map-reduce locks the database, so you don't want to rely on it too much. The bottom line is that MongoDB can offer similar features to SQL databases (though they work very differently), so it's good for first-timers.

Couchbase is very good at dispersed synchronization. For example, if parts of your database live in your clients (mobile applications come to mind), it does a terrific job at resynching itself and handling divergences. This is also "scalable," but in a quite different meaning of the term than in MongoDB.

I would also take a look at OrientDB: it's not quite as feature-rich as MongoDB (and has no atomic operations), but it can work in schema-mode, and generally offers a great set of tools that can make it easy to migrate from SQL. Its query language, for example, looks a lot like SQL.

The above are all "document-oriented" databases, where your data is not opaque: the database actually does understand how your data is structured, and can allow for deep indexing and updating of your documents. Cassandra and REDIS (and Tokyo Cabinet, and BerkeleyDB) are key-value stores: much simpler databases offering fewer querying features, where your data is simply a blob as far as the engine is concerned. I would be less inclined to recommend them unless your use case is very specific. Where appropriate, of course simpler is better. With these kinds of databases, there are actually very few ways in which you can create an obstacle for scalability, simply because they don't do very much from a programming perspective.

There are also in-between databases that are sometimes called "column-oriented": Google and Amazon's hosted big data services are both of this type. Your data is structured, but the structure is flat. Generally, I would prefer full-blown "document-oriented" databases, such as MongoDB and OrientDB. However, if you're using a hosted service, you might not have a choice.

It's also entirely possible to mix different kinds of databases. For example, use MongoDB for your complex data and use REDIS for a simple data store. I've even seen sophisticated deployments that very smartly archive data from one DB to another, and migrate it back again when necessary.

Re:Short Intro by St.Creed · 2014-04-09 00:26 · Score: 1

Any relational database can also do "schemaless" models, by using the EAV (anti-)pattern. Mainly this conveys a lack of understanding of your data and a lack of planning and design in your datamodel, but hey, it happens. The fun thing is that you still get all those nice database features like parallel processing, concurrency, SQL, ACID transactions if you want them, security and maintenance tooling, etc.
And if you use a modern database like SQL Server 2014 or Oracle's latest, you will get column-based compression (okay, it still sucks in SQL Server 2014, but it's a start), so the whole issue with extending sparse schemas is moot. If you use the 6th normal form it's not an issue anyway since that implements column-based compression by modeling it.
What you say is of course correct. It's just that for people who have a nice toolbox with all kinds of data models, relational databases go a lot further than most people think.
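
For anyone who hasn't met the EAV (entity-attribute-value) pattern, here is a rough sketch using Python's built-in sqlite3 module. The table name, column names and sample call records are all invented for illustration; a real deployment would look different:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (entity, attribute) pair instead of one column per attribute.
conn.execute(
    "CREATE TABLE eav ("
    " entity INTEGER, attribute TEXT, value TEXT,"
    " PRIMARY KEY (entity, attribute))"
)
rows = [
    (1, "caller", "555-0100"),
    (1, "duration", "42"),
    (2, "caller", "555-0199"),
    (2, "voicemail", "yes"),  # entity 2 has an attribute entity 1 lacks
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)


def load_entity(conn, entity):
    """Reassemble one 'schemaless' record from its attribute rows."""
    cur = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ? ORDER BY attribute",
        (entity,),
    )
    return dict(cur.fetchall())


print(load_entity(conn, 1))
print(load_entity(conn, 2))
```

Note that every value ends up as text and the database can no longer enforce per-attribute types or constraints, which is a big part of why it gets called an anti-pattern.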

--
Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
Re:Short Intro by mrpoundsign7072 · 2014-04-09 00:58 · Score: 3, Insightful

And any text file can be transactional if you write your code right. We can keep going down this road about how you don't /need/ X technology, but nobody wins. It's really OK to see the good in different technologies.
Re:Short Intro by St.Creed · 2014-04-09 01:12 · Score: 1

I agree that that road isn't productive (otherwise we'd still write machine code since we can do everything in machine code), but the hint of "it's going to be on internet so I can't use an RDBMS" in the original question is silly, and that's what I react to.
Given 3 trillion users your options are pretty much limited to horizontal scaling, no SQL etc. but most people never get that far with their applications and in that case, storing the data in a noSQL database and then getting actionable information out of it (which is the hardest part IMO) is a lot of effort spent for something much cheaper and easier done with an RDBMS.

--
Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
Re:Short Intro by emblemparade · 2014-04-09 04:06 · Score: 1

I agree, entirely. Even more interesting are the recent "noSQL" features added to Postgres. The fervor of "noSQL" is too often "anti-SQL," in a ridiculously technically uninformed way.
Re:Short Intro by St.Creed · 2014-04-09 08:09 · Score: 1

Oh, didn't know about Postgres's new features. Well, that gives me something to do tonight :) Thanks.

--
Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)

Just Use SQL by Anonymous Coward · 2014-04-08 21:52 · Score: 5, Insightful

I just felt I had to comment on this. So many developers start with the phrase "I need NoSQL so I can scale" and almost all of them are wrong. The chances are your project will never ever ever scale to the kind of size where the NoSQL design decision will win. It's far more likely that the NoSQL design choice will cause far more problems (performance etc.) than the theoretical scaling issues.

Take for example two systems I've been involved with for managing WiFi access to large scale networks (100,000+ concurrent users, 1000's of APs): one uses MongoDB, the other is based on PostgreSQL. The MongoDB based solution has very real performance problems: its reporting takes a very long time to run and uses very large amounts of system RAM (24G in some cases), that performance is only degrading as the system grows, and there are many other performance issues. These issues are not just Mongo issues but simply that NoSQL is not well suited to the task. The system has been rewritten using an SQL backend and now works much better, but importantly it's also scaling better. Growth in the system is no longer degrading performance, and the points where we need hardware upgrades or extra servers are now much more predictable, so we can predict cost base growth in relation to user growth.

NoSQL does not guarantee scaling; in many cases it scales worse than an SQL based solution. Work out what your scaling problems will be for your proposed application, work out when they will become a problem, and ask whether you will ever reach that scale. Being on a bandwagon can be fun, but you would be in a better place if you really think through any potential scaling issues. NoSQL might be the right choice, but in many places I've seen it in use it was the wrong choice, and it was chosen based on one developer's faith that NoSQL scales better rather than by thinking through the scaling issues.

PostgreSQL or Riak by imbaczek · 2014-04-08 22:27 · Score: 1

Postgres might carry you further than you imagine with hstore and json extensions. I'd also try Riak if you really want NoSQL.

hyperdex by fredan · 2014-04-08 22:48 · Score: 1

take a look at hyperdex if you are looking for a NoSQL DB: http://www.hyperdex.org/

Re:hyperdex by DorianGre · 2014-04-09 01:20 · Score: 1

Thanks. That is a new one to me.

Big mistake by msobkow · 2014-04-08 22:58 · Score: 5, Insightful

Telecommunications data is eminently suitable to schema table storage in any relational database, which with a little work, will let you index by the keys you intend to query by.

NoSQL solutions are better for unstructured data that doesn't come in predictable formats or value sets.

You need to take a step back and look at the problem before you decide on a solution. Don't be one of those idiots who tries to use a hammer to drive a screw.

--
I do not fail; I succeed at finding out what does not work.

Re:JUST USE POSTGRES by Tanaka · 2014-04-08 22:59 · Score: 1

I like Postgres, and I like MongoDB too. Both have their strengths. Best tool for the job I say.

The great thing about MongoDB is you can install two or three servers in different datacenters, and have redundancy out of the box. It's really simple. And you can scale horizontally if you need to without any downtime.

The last time I looked at Postgres, to do the same, you had to use third party solutions, and the client side drivers didn't support it. Is it any better now?

SQLite by jchevali · 2014-04-08 23:05 · Score: 1

SQLite

Re:S3 better than files on disk by xelah · 2014-04-08 23:21 · Score: 2

Now scale that. Or just lock it properly.

If you want simple, scalable and low sysadmin overhead and all you need are key -> value lookups then Amazon's S3 can be an excellent choice. You don't need to manage it, you don't need to work out how to add servers and it's well proven at extremely large scales.

However, like a lot of other posters, I'm very sceptical that NoSQL is the place to start. SQL databases can do a LOT for you, are very robust and can scale very considerably. As your requirements grow you might find yourself wanting things like indexes, transactions, referential integrity, the ability to manually inspect and edit data using SQL and the ability to store and access more complex structures. You're likely to give yourself a lot of pain if you go straight for NoSQL, and even if you DO need to scale later combining existing SQL and new NoSQL data stores can be a useful way to go.

Which luxury yacht after my new project? by BlackPignouf · 2014-04-08 23:24 · Score: 5, Funny

"I'm working on a new independent project. It will soon become the new Facebook, and I'll be billionaire next quarter. The only problem is that I don't know which luxury yacht to buy with all this money. I've been looking at Lady Moura, Christina O, Pelorus, Venus and others. What do you recommend? What problems have you run into with the ones you've tried?"

Re:Which luxury yacht after my new project? by coofercat · 2014-04-09 00:51 · Score: 5, Funny

Pff! All that soon-to-have money and yet no imagination, huh? Buy an old diesel Navy submarine and have it refitted. Maybe cut some windows into the hull - that'll mean you can only go down to maybe 50 metres instead of 350, but that's still plenty, and if you get lost you can just look out of the windows to see where you are without having to worry about using sonar.
I'd imagine surfacing your submarine in Monaco's marina will turn far more heads than your ridiculous yacht moored a mile offshore ;-) (besides, a submarine is phallically shaped, so works better in metaphorical dick measuring competitions)
Oh, and be sure to use Postgres or MySQL for your on-board systems - it'll scale plenty well for a long time before you need to go all 'web scale' with a NoSQL DB.

Two words by ledow · 2014-04-08 23:57 · Score: 2

Premature Optimisation.

It's a TRAP by Anonymous Coward · 2014-04-09 00:05 · Score: 1

Don't tell the NSA how to record calls into a database! I guess they've been typing it into Excel all this time.

Re:JUST USE POSTGRES by VortexCortex · 2014-04-09 00:19 · Score: 1

Seriously - JUST USE POSTGRES - there is virtually nothing that it can't do.

Indeed. With its native JSON type and HStore Key/Value store it has NoSQL features. Given PostgreSQL's ability to cluster, pool, and replicate it also scales quite well. IMO, it doesn't make sense to abandon all relational DB features in a NoSQL only solution (especially right off the bat) when you can have both. PostgreSQL may just be the droids you are looking for.

Re:CouchBase by grcumb · 2014-04-09 00:21 · Score: 2

CouchBase/CouchDB is probably the easiest and most available one out there. It's particularly well suited for app backends too, as both the backend and mobile apps can talk to the same database, in theory eliminating the need for the backend to handle data syncing.

Those are good reasons, and it's also true that CouchDB will use a lot less resource overhead than a full-bore RDBMS under load. Depending on the use case, it might also prove decidedly easier to scale.

But the place where NoSQL really shines is storing amorphous or heterogeneous data. Because you have no constraints about what goes into a given record, you can record more or less name/value pairs at your whim. As with Perl, though, freedom comes at the cost of potential disorder.

But honestly, with the tiny amount of detail provided, it seems like it's really six of one and half a dozen of the other. If it's just call data being recorded, and the same call data every time, it won't make a huge difference if you use a full-blown RDBMS or a NoSQL database. Either one has its costs (individual PUTs and POSTs in CouchDB, for example, can be expensive, whereas queuing and write contention might cause headaches at extreme scales in Postgres or Oracle).

Both an RDBMS and a NoSQL database will deal with replication fairly well, though my personal inclination is to prefer the simplicity of replication in CouchDB right up until the noise level gets out of hand.

--
Crumb's Corollary: Never bring a knife to a bun fight.

Re:JUST USE POSTGRES by VortexCortex · 2014-04-09 00:38 · Score: 1

The great thing about MongoDB is you can install two or three servers in different datacenters, and have redundancy out of the box. It's really simple. And you can scale horizontally if you need to without any downtime.

I've never had to use 3rd party solutions to implement horizontal scaling, replication, pooling, clustering, etc. with Postgresql. I have often had to demand changes of 3rd party vendor-lockin-ware, or add a kludge myself to fit a business's needs. RTFM application used to be far more common, but seems to have fallen out of fashion of late as more programmers and DBAs are increasingly discovered not to be hackers. Did you know Postgresql supports NoSQL features via HStore and JSON?

Much experience has shown that it's better to look well before leaping rather than hop on the buzz-wagon then try adding wings on the fly. The problem with one-size-fits-all methodology is that when one designs a system with everyone in mind, one has actually designed it for no one at all. What happens when that "simple" redundancy solution meets a more complex problem space is that you're left with folks who didn't understand the issue in the first place trying to fix the problems they've caused.

Develop for a design, not a technology. by generic_screenname · 2014-04-09 01:16 · Score: 1

What is your architecture? Answer that question first, then decide what kind of data store to use. What are you storing, and why are you storing it? How will you use that later?

Stock inventory? by biodata · 2014-04-09 01:17 · Score: 1

Is this for your stock inventory project? If you want to do anything that involves keeping track of any goods or money or anything of value, then NoSQL is not necessarily the way to go. NoSQL is designed to keep track of value-less things like Twitter messages and Facebook postings, where it doesn't matter if you lose a few thousand transactions here or there. People keeping track of things with actual monetary value usually use SQL for the transactions, from what I've seen.

--
Korma: Good

Re:Stock inventory? by DorianGre · 2014-04-09 02:08 · Score: 1

This is exactly for the stock inventory project. We currently have a MySQL backend and it seems to be working well. The current plan is to migrate to PostgreSQL in the next few months. We are expanding the project to have consumer-facing iphone/android, so scaling is ??? The consumer facing app will query data looking for availability and then report to the retailers info on those queries (18 people looked for this product in your area last week and you were out). We are also starting to import data from 3rd party inventory systems. The long play here is automated stock and reorder management, but we are starting where there is less competition in the space.

HBase by scorp1us · 2014-04-09 01:20 · Score: 2

First, everyone who is pointing out your premature optimization is probably right. You can get a lot of scalability out of existing databases, particularly if you optimize your data schema with indexes. Even if you store all possible 9,999,999,999 phone numbers, the log base-2 of that is 34. So a binary tree over them would be 34 levels deep, and a real b-tree, which packs many keys into each block, is much shallower. That's big, real big, but b-trees are fast. Worst case you are reading 34 blocks from disk, which is ~16kB.
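
The depth arithmetic above is easy to check. This is a back-of-envelope sketch only: the fan-out of 100 keys per node is an assumption picked for illustration (real index pages vary with key size and page size), and `levels` is just a helper name made up here:

```python
def levels(n, fanout):
    """Smallest depth d such that fanout**d >= n (pure integer math)."""
    depth, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        depth += 1
    return depth


PHONE_NUMBERS = 9_999_999_999

print(levels(PHONE_NUMBERS, 2))    # 34: a plain binary tree
print(levels(PHONE_NUMBERS, 100))  # 5: a tree with an assumed fan-out of 100
```

Either way the number of reads per lookup grows with the logarithm of the row count, which is why a properly indexed table stays fast long after it stops fitting in your head.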

Next, don't choose databases by name. Choose them by their features because you use features, not names. That said, HBase is probably what you want. It's a blend of distributable hadoop and tables. You don't need atomicity (it doesn't sound like) which is one thing you give up when leaving SQL behind.

--
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.

Perhaps you should abstract your persistence model by Assmasher · 2014-04-09 01:31 · Score: 2

...so that you simply write an adapter for pushing/pulling data.

Then you don't have to worry so much about making what appears to be an extremely premature optimization.

In other words, have your backend web services (presuming you're using them and not manually POSTing from a socket yourself to your own socket server) instantiate an instance of iMyDBAdapter and use it.

Later, when you find out that you actually do need MongoDB, PostgreSQL, sharded MariaDB, whatever, you can simply write another adapter class that simply has to satisfy the iMyDBAdapter interface.

The reason this works so well is that it will force you to separate your business logic from your underlying DB implementation (which requires a lot of discipline to do otherwise, especially when you just want to get something 'done'.)

Also, as another poster pointed out, you're much more likely to suffer from other issues relating to scaling (and issues better solved elsewhere) than a modern database.

My advice, stick rigidly to the interface/adapter mechanism and implement an adapter for whichever DB you're most comfortable with right now.

--
Loading...

Solution looking for a problem by luis_a_espinal · 2014-04-09 01:35 · Score: 5, Insightful

I would like to start with a NoSQL solution for scaling,

This is a solution looking for a problem. Or more precisely, you are looking for an excuse to use a piece of technology or paradigm. Don't get me wrong, your systems requirements might indeed be best served using a NoSQL solution, but what exactly has your analysis shown regarding this?

Scaling is not just a technical feature (NoSQL, SQL, Jedi mind-meld tricks). Scaling is a function of your architecture. You can NoSQL the shit out of your solution, but if your software and system architecture is not scalable, then having NoSQL will mean chicken poop as solutions go.

and ideally it would be dead simple if possible.

If you want simple, put a simple RDBMS schema (a properly normalized one) in place, and have your code use a simple, technology-agnostic persistence layer that maps your domain-level artifacts to database artifacts. If you ever had to replace the back-end, then you can do so with minimal changes to the API that domain-level artifacts use to persist themselves with the persistence layer.

Design your domain solution around domain-specific artifacts. Persistence technology is typically a low-level design/implementation detail, an important one obviously (and a critical one for some classes of systems).

But for what you are describing, the choice shouldn't even be coming into the picture without first having an architectural notion of your solution.

Re:Solution looking for a problem by DorianGre · 2014-04-09 03:39 · Score: 1

I think we were just looking for an excuse to play with NoSQL solutions, rather than needing it. Through segmenting customers across multiple databases, we will do scaling just fine. Hoping for an elegant solution, but Copy, Rename, Repeat seems to work fine too.
Re:Solution looking for a problem by avandesande · 2014-04-09 04:54 · Score: 1

After working on dozens of large projects I would suggest that if you want whatever you are doing to be successful you need to be ruthless about removing any kind of complexity or opacity from your solution.

--
love is just extroverted narcissism
Re:Solution looking for a problem by cavebison · 2014-04-10 16:06 · Score: 1

Jedi mind-meld tricks
That's definitely a mix of incompatible technologies right there.
Re:Solution looking for a problem by luis_a_espinal · 2014-04-11 01:51 · Score: 1

Jedi mind-meld tricks
That's definitely a mix of incompatible technologies right there.
I know!!(10+1) ;)

Re:JUST USE POSTGRES by Assmasher · 2014-04-09 01:37 · Score: 1

I would have to agree about PostgreSQL, it is surprisingly flexible and powerful. I've used it for small business systems and recently on a 'big data' (oh, that overused buzzword...) project (millions of devices reporting dozens of times per day) and it has been fantastic.

Wish I'd gotten on the bandwagon 10 years ago.

--
Loading...

Re:S3 better than files on disk by DorianGre · 2014-04-09 01:37 · Score: 1

I think we are leaning toward SQL as it is something we all know. However, the alternatives needed to be investigated.

If you have to ask ... by ehiris · 2014-04-09 01:59 · Score: 2

It means you don't have any big data requirements so you're better off sticking with MySQL or something easier to manage at a small scale.
If growth is high or you have a lot of data to analyze, you can look into importing data into Hadoop using sqoop and query it with Hive and HBase. But you most likely won't need that for at least a couple of years.

Files, flys and fries by WaffleMonster · 2014-04-09 02:34 · Score: 1

Create a separate folder for each type of 'key' copying 'POST' data to files in these folders using filename as key for ... umm... lightning fast retrieval.

U should then totally think about creating other directories full of symbolic links rather than files enabling you to have many keys for reference or even generate materialized views without duplicating data.

Since you would be using a query language that is not SQL it is guaranteed to scale to infinity and beyond... (inodes sold separately)

One of which was even spelled correctly ;-) by Scotland · 2014-04-09 02:49 · Score: 1

I've never seen a post with 50% of its words spelled incorrectly. Unless it's in French? -- in which case, I guess your keyboard doesn't support accents.

--

Not a grammar nazi. Just couldn't resist on this one.

Make it fast, don't marry first... by NotesSensei · 2014-04-09 02:52 · Score: 2

and get to know it later :-). Fast here: your prototype creation, not primarily the database I/O. The general comments are right: there is no one-fits-all solution and the database might change. It looks very much like you also haven't decided on the server platform: Ruby, PHP... you could look at node.js or vert.x too - server side JavaScript is at least neat for prototyping (I'm not making a statement that it is *only* neat for prototyping - that's a completely different discussion). We did a number of super rapid prototypes with datasets roughly in the range you describe using CouchDB (not CouchBase!). There we took advantage of CouchApps - the ability to store the application itself inside the database - works like a charm when replicating data, and all you need is an HTTP server (Apache, nginx) for the URL mapping (which is already kind of optional) and CouchDB. You can authenticate with OAuth or via the webserver and it replicates - so you can have local data easily (gold for testing). Since you can specify the direction I usually replicate all data from the server into local, but not the design. So I can try new app features locally against the live dataset. It also does Map-Reduce using JavaScript. Give it a shot. If it can handle the data from CERN you also have quite a growth path. One fun project we did: run it on a Raspberry Pi to collect weather data from Arduinos all mounted in a small sail boat (the Pi in the cabin, the Arduinos on the masts). Occasionally, when the Wifi or 3G shield picks up a network, it replicates back to a cloud server.

Re:JUST USE POSTGRES by WuphonsReach · 2014-04-09 03:16 · Score: 1

Wish I'd gotten on the bandwagon 10 years ago.

Mmm, 10 years ago you would have been using 7.3 or 7.4. Which was not all that fast unless heavily tuned. It wasn't until the 8.x series in 2006-2008 (roughly) where they started focusing a bit more on performance. These days it is quite powerful and a definite competitor to the high-end paid offerings.

There was also the issue that 7.x was a PITA to run on top of a Microsoft Windows system. The 8.x and 9.x series run natively and integrate far better with the Windows O/S, which makes it easier for desktop developers to get their feet wet.

(We ran PostgreSQL on a Windows server for the first year or two once 8.0 came out, then migrated over to running it on Linux.)

--
Wolde you bothe eate your cake, and have your cake?

MongoDB by GameMaster · 2014-04-09 03:42 · Score: 2

Use MongoDB, it's web-scale. They produce kick-ass benchmarks by piping all your data to /dev/null.

--

Rules of Conduct:
#1 - The DM is always right.
#2 - If the DM is wrong, see rule #1

Can you separate data collection from reporting? by Anonymous Coward · 2014-04-09 03:45 · Score: 2, Interesting

If the goal really is just to amass data and then do offline reports on it (not completely clear from the question) then I can report that at my company we've been doing this at scale for over five years. Here's how:

* A bunch of web servers accept data and append it to a local disk file.
* Every hour, that "log" is pushed from each host into HDFS and a new log file started. (HDFS as in the Hadoop Distributed Filesystem)
* Querying is done later, using Hive with a custom deserializer that natively understands our on-disk format. (You could also just make sure your on-disk format is the delimited text format Hive natively understands, of course. We had some unique needs here.)
* An hourly task runs a small set of Hive aggregation queries (Hive presents a SQL-like interface to defining and running MapReduce jobs) on the raw "table" to produce some smaller datasets that can return aggregate-based results faster than the raw data, including copying some of the smaller aggregates into a MySQL database for online access via some reporting applications.
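
To make the shape of that pipeline concrete, here is a toy, single-machine sketch. It swaps HDFS and Hive for a plain local file and a Python function, so it shows only the "append now, aggregate later" split and nothing about the real cluster setup; the file name, field layout and event names are all made up:

```python
import collections
import os
import tempfile


def append_record(log_path, fields):
    """Append one tab-delimited record; this is all a collector does per request."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write("\t".join(fields) + "\n")


def aggregate(log_path):
    """Count records per first field (a stand-in for an hourly aggregation query)."""
    counts = collections.Counter()
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            counts[line.rstrip("\n").split("\t")[0]] += 1
    return dict(counts)


log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "collector-hour-00.log")  # hypothetical naming scheme

for fields in [("checkout", "u1"), ("search", "u2"), ("search", "u3")]:
    append_record(log_path, fields)

summary = aggregate(log_path)
print(summary)
```

The write path never parses, indexes or locks anything beyond a file append, which is what lets the collectors stay simple; all the expensive work is deferred to the batch step.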

At this point our daily dataset is a few terabytes in size, when considering the sum of all of the collecting servers across all of the hours. (There are some peak hours due to the nature of our business, so the volume isn't even across the whole day.)

The only thing we've ever disliked about this system is the delay between data arriving and it being available to query. For a little while we experimented with using Apache Storm for realtime log streaming, and produced a working prototype that was shown to work for a one-tenth sample of the data, but ultimately we concluded that the need for faster data wasn't strong enough to justify the additional complexity and stuck with the above design. Therefore I can't speak to how far that solution would scale, but if real-time analysis isn't a requirement -- and scaling up in data size is -- then I can certainly recommend the above design.

Re:What is this? Stack Overflow? HackerNews thread by DorianGre · 2014-04-09 04:31 · Score: 1

Really, I just want to know if you were starting a new site that was mostly incoming data and needed to possibly scale quickly, what choice would you make at the outset to make your future life more bearable.

Re:One of which was even spelled correctly ;-) by ledow · 2014-04-09 04:37 · Score: 1

Welcome to English.

The language you copied, fucked with, and then claimed to have the definitive version of.

Pretty much if we end a word with -our (colour, flavour, honour) or -ise (optimise, etc.) then we're right.

If your only tool is a (NoSQL) hammer... by akubot · 2014-04-09 04:45 · Score: 2

... then everything looks like a (NoSQL) nail. Who says you need NoSQL? Nothing against using cool, newish stuff, but as others have pointed out, you didn't describe the scale of your project. Don't blindly pick trendy technology just because you want to sit with the cool kids at lunchtime. If this is an alpha or beta product with under 1 billion records, use a regular database and be done with that. Move on to the interesting parts of your project and fix the plumbing later if you need to.

MySQL + SSD by Leolo · 2014-04-09 04:59 · Score: 1

NoSQL was only necessary because traditional SQL's table joins are slow. Table joins are slow because hard disks are slow. But if your table data is on SSD, disk access stops being slow, joins stop being slow and NoSQL stops being necessary.

I saw a great rant about this a few years ago. I've lost the link though.

Why only NoSQL? by nepoznatn · 2014-04-09 05:31 · Score: 1

There are a number of solutions on the market that support the best of both worlds. Oracle and Postgres both have support for NoSQL datatypes. Informix went even further: it gives you the ability to mix classic relational tables with NoSQL collections in the same database. You can write a query that will access data from both a table and a collection at the same time. You can use compression and timeseries, and it also supports the MongoDB API, so you can write an application that will connect to Informix using MongoDB drivers, and of course you can shard as much as you want with no pain. Just google hybrid sql.

Re:Why only NoSQL? by DorianGre · 2014-04-09 06:23 · Score: 1

Thanks. I at least got some pointers to things I wouldn't have otherwise considered through this thread, even if my original question was poorly thought out. I haven't looked at hybrids.
Re:Why only NoSQL? by nepoznatn · 2014-04-09 20:15 · Score: 1

There are no pure NoSQL or pure SQL projects/solutions; sooner or later you will end up using both SQL and NoSQL, and with NoSQL it can be even worse. I have seen projects that are using two or more different NoSQL solutions. For example, MongoDB for documents and neo4j as a graph database and some other database for relational data. A hybrid approach simplifies all this; you just have to plan and choose wisely.

Re:One of which was even spelled correctly ;-) by Bacon+Bits · 2014-04-09 06:05 · Score: 1

The language you copied, fucked with, and then claimed to have the definitive version of.

Isn't that how English came about in the first place?

--
The road to tyranny has always been paved with claims of necessity.

Re:Perhaps you should abstract your persistence mo by angel'o'sphere · 2014-04-09 06:14 · Score: 1

You forget the fact that modelling for a NoSQL database usually works completely differently than for a relational database, hence the code using one or the other is completely different.

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:Ooooh shiny by DorianGre · 2014-04-09 06:25 · Score: 1

I work with a SQL database every day. I optimize weekly. I haven't done a project since mid 1990's that didn't have some sort of SQL DB attached to it. I just wanted to see if there was something better. Apparently not.

Re:Perhaps you should abstract your persistence mo by Assmasher · 2014-04-09 06:26 · Score: 1

Why would you be modeling anything relating to entities at the interface level?

iMyDatabase would have methods such as:

ValidateConnection
GetSomething
StoreSomething

It shouldn't know anything about how that data is stored, where it is stored, how your object is serialized/deserialized from a DB entity, et cetera...
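
A rough sketch of what such an interface could look like, using the three method names listed above. Everything else is an assumption made for illustration: the class names, the choice of an in-memory dict and a SQLite table as the two back-ends, JSON as the serialization format, and the `record_call` business function:

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class MyDatabase(ABC):
    """The only thing business logic is allowed to know about storage."""

    @abstractmethod
    def validate_connection(self) -> bool: ...

    @abstractmethod
    def store_something(self, key: str, thing: dict) -> None: ...

    @abstractmethod
    def get_something(self, key: str) -> Optional[dict]: ...


class InMemoryAdapter(MyDatabase):
    """Stand-in for a document store: keeps whole objects keyed by id."""

    def __init__(self):
        self._docs = {}

    def validate_connection(self):
        return True

    def store_something(self, key, thing):
        self._docs[key] = dict(thing)

    def get_something(self, key):
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None


class SqliteAdapter(MyDatabase):
    """Stand-in for an RDBMS: serialization is hidden inside the adapter."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS things (key TEXT PRIMARY KEY, body TEXT)"
        )

    def validate_connection(self):
        return self._conn.execute("SELECT 1").fetchone() == (1,)

    def store_something(self, key, thing):
        self._conn.execute(
            "INSERT OR REPLACE INTO things VALUES (?, ?)", (key, json.dumps(thing))
        )
        self._conn.commit()

    def get_something(self, key):
        row = self._conn.execute(
            "SELECT body FROM things WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def record_call(db: MyDatabase, call_id, caller, seconds):
    """Business logic: depends only on the interface, never on a back-end."""
    if not db.validate_connection():
        raise RuntimeError("database unavailable")
    db.store_something(call_id, {"caller": caller, "seconds": seconds})
    return db.get_something(call_id)


for adapter in (InMemoryAdapter(), SqliteAdapter()):
    print(type(adapter).__name__, record_call(adapter, "c1", "555-0100", 42))
```

In fairness to the objection above, the SQLite adapter here cheats by storing a JSON blob; a properly normalized relational adapter would have more work to do inside `store_something`, but that work would still stay behind the interface.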

--
Loading...

Re:Ooooh shiny by plopez · 2014-04-09 06:56 · Score: 1

" if configured and used correctly"
There's the rub. Configuration takes knowledge and work and most developers think there is a magic way to avoid it. There isn't. If you don't need data consistency or atomic operations but do need throughput, you also have the option of turning off logging. That gives you a mature, proven DB engine with much faster throughput.

"If you've designed a bulletproof database schema, optimized all your queries to the bone, created every possible index on every possible table, partitioned your database files and even thrown hardware at it"
In other words, done everything a good DBA should.

Sacrificing data integrity is OK if it is a happy little mobile game. Go for it. If it has anything to do with human life, e.g. medical records, you had better think long and hard about sacrificing data integrity.

--
putting the 'B' in LGBTQ+

Re:Ooooh shiny by maestroX · 2014-04-09 07:34 · Score: 1

I work with a SQL database every day. I optimize weekly. I haven't done a project since mid 1990's that didn't have some sort of SQL DB attached to it. I just wanted to see if there was something better. Apparently not.

In what role do you work?
Your questions do not show any experience, it's just a soup of popular IT terms
sendmail fits the bill

Depends on the situation by samwhite_y · 2014-04-09 07:47 · Score: 2

I have used Oracle, MySQL, and Mongo in prod situations. I have looked at Cassandra for evaluating it for potential usage in prod.

I can imagine situations where I could recommend any of the above. For example, if you are a large financial company with billions of rows, I would go with Oracle. If you have smarts but not money and don't need somebody to sue if something goes wrong, then maybe Postgres would do. If I had a simple web-based app with simple form submits, I would go with MySQL. If I had complex, unpredictable data blobs and unpredictable needs to do certain types of queries against the data, I might recommend Mongo. If I had large amounts of data on which I wanted to do analytics, I would use Cassandra.

Cassandra wins when you have a lot of data and not a lot of complex real time queries against it. It is especially good at scaling up on cheap data storage (think 100s of terabytes). It also has an unreal "write" throughput (important for certain types of analytics which write out complex intermediate results) though that is not relevant for the case described.

The problem generally with NoSQL solutions is that they increase the amount of storage needed to store the equivalent amount of information. You are essentially redundantly storing the schema design with each "record" that you store. This matters more than some might suspect, because when you can put an entire collection into memory, the read performance is much higher. You usually need 1/5th to 1/10th as much RAM to do the job with a traditional relational database (especially since MySQL and its brethren handle getting data in and out of memory better than Mongo). This isn't so much the case for Cassandra because of its distributed storage nature, but it really isn't usable for real-time transactions.
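
A rough way to see the "schema stored with each record" effect, sketched in Python. The sample records and the helper names `document_bytes` and `tabular_bytes` are made up for illustration. JSON lines and CSV are only stand-ins for a document store and a table. Real engines add their own encodings, indexes, and compression, so this says nothing about the 1/5th to 1/10th figure above.

```python
import csv
import io
import json

# Hypothetical form submissions; names and values are invented for this sketch.
records = [
    {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com", "age": 36},
    {"first_name": "Alan", "last_name": "Turing", "email": "alan@example.com", "age": 41},
]


def document_bytes(rows):
    """Bytes used when every record carries its own field names (JSON lines)."""
    return sum(len(json.dumps(r).encode("utf-8")) + 1 for r in rows)


def tabular_bytes(rows):
    """Bytes used when field names are stored once, as a CSV header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


if __name__ == "__main__":
    many = records * 1000
    print("document-style bytes:", document_bytes(many))
    print("tabular-style bytes: ", tabular_bytes(many))
```

The gap grows with the length of the field names relative to the values, which is why some document-store users shorten their keys.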

My recommendation: use a traditional database. If you are in a Microsoft shop, use SQL Server; otherwise I like Postgres or MySQL. If, however, you have complex data storage needs that a NoSQL solution is perfect for, then I would go with that. If you are into back-end analytics, copy the data as it comes in and put it into Cassandra (or one of its similar brethren) as well.

Postgres clusters? by Anonymous Coward · 2014-04-09 07:53 · Score: 1

People are saying Postgres clusters without third-party software... yet it does not.

Synchronous replication != clustering

If your master dies (and your only master), your application cannot automagically recover.

You have to change a slave to a master, which requires a config change/restart of the slave

So now your master has gone down, and you have to restart at least one slave which becomes the new master

Tell me how the out-of-the-box solutions can be considered clustered? When people say that, they mean it's HA, and Postgres certainly is not.

Don't get me wrong, I love Postgres, but don't hype up core features it doesn't have (I sure wish it did)

Re:Perhaps you should abstract your persistence mo by angel'o'sphere · 2014-04-09 08:41 · Score: 1

You will model something like: findMyStuffByTimeStamp().

Your suggestion sounded as if you wanted to put the layout of the DB into an abstraction layer.

If you simply talk about method signatures, then I wonder why you brought it up :)

And what exactly does ValidateConnection mean? Either you have a connection, or you have not, just an idea ....

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:Perhaps you should abstract your persistence mo by Assmasher · 2014-04-09 08:59 · Score: 1

"...so that you simply write an adapter for pushing/pulling data" makes you think the abstraction layer would have the DB layout in?

Let me be perfectly clear then, the abstraction layer would simply know about the business logic side of things and that you can store and retrieve those things in some fashion most likely represented by some criteria associated with them.

If you simply talk about method signatures, then I wonder why you brought it up

I don't know what you mean.

And what exactly does ValidateConnection mean? Either you have a connection, or you have not, just an idea ....

What? Who/what would already "have a connection" to another server or memory mapped file or process or socket?

ValidateConnection, in this example, would simply ensure that the backend persistence mechanism both exists and (as is required in most cases) that you have valid credentials.

--
Loading...

Re:Perhaps you should abstract your persistence mo by angel'o'sphere · 2014-04-09 10:17 · Score: 1

What about "connect(user, credential)"? That is how it works in the real world.

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:Perhaps you should abstract your persistence mo by Assmasher · 2014-04-09 10:36 · Score: 1

Well, in the real world, when you abstract things properly you don't expose a "connect" method. The code behind your interface - the adapter - would use connect and disconnect internally.

In the real world, when you abstract things, you expose a method that validates that the persistence layer is functioning/configured/usable as a normal part of the application/service/component's life-cycle. I called it ValidateConnection in this scenario because of the way he described his issue.
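
A rough sketch of that arrangement in Python, using the standard-library `sqlite3` module as a stand-in backend. The class name `SqliteEventStore`, the `events` table, and the method names are hypothetical. Opening a connection per call keeps the sketch short; a real adapter would more likely hold or pool connections.

```python
import sqlite3
from contextlib import closing
from typing import Optional


class SqliteEventStore:
    """Hypothetical adapter: callers see validate/store/get, never connect()."""

    def __init__(self, path: str) -> None:
        self._path = path

    def _connect(self) -> sqlite3.Connection:
        # Connection handling stays private to the adapter.
        conn = sqlite3.connect(self._path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (id TEXT PRIMARY KEY, body TEXT)"
        )
        return conn

    def validate(self) -> bool:
        """Check, e.g. at startup, that the store is configured and usable."""
        try:
            with closing(self._connect()) as conn:
                conn.execute("SELECT 1")
            return True
        except sqlite3.Error:
            return False

    def store(self, event_id: str, body: str) -> None:
        # The inner "conn" context commits on success; closing() then closes it.
        with closing(self._connect()) as conn, conn:
            conn.execute(
                "INSERT OR REPLACE INTO events (id, body) VALUES (?, ?)",
                (event_id, body),
            )

    def get(self, event_id: str) -> Optional[str]:
        with closing(self._connect()) as conn:
            row = conn.execute(
                "SELECT body FROM events WHERE id = ?", (event_id,)
            ).fetchone()
        return row[0] if row else None
```

The application would call `validate()` once during its life-cycle and fail early if it returns False; errors during later calls would still need handling, which is the point raised in the replies.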

--
Loading...

Re:Perhaps you should abstract your persistence mo by angel'o'sphere · 2014-04-09 12:02 · Score: 1

Sorry,
you have no idea about the real world.

You connect to a DB or open a File or open a Socket and either "it just works" or you get an exception. There is no need to "validate" your connection object after you have created it, either you have it and it is "valid" or you don't have it or any subsequent method call results in an exception (which you have to handle anyway).

The title of your post is "Perhaps you should abstract your persistence mo".

After I answered to you, you suddenly talk about abstracting the business level.

So either you made a mistake in choosing the right words or headline or you simply are mixing stuff up and now try to weasel out of it ;D

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:Perhaps you should abstract your persistence mo by Assmasher · 2014-04-09 12:42 · Score: 1

Sorry, you have no idea about the real world.

Funny, just a few years ago I was the chief software architect for a company purchased for more than 60 million dollars entirely for our enterprise product. One of the primary reasons this company was acquired instead of its competitors was that we were pioneering open standards in our market verticals and supporting those open standards with public integration points that 3rd party companies, including our competitors, wrote integrations to.

This system had a persistence model that had to scale, not just horizontally, but in 'swim lane' fashion - or if you prefer the actual fashion we used, in AKF cube fashion. It handled tens of millions of persisted logic events daily and integrated with many different back end databases - all supported through this EXACT same facade/proxy system implemented with adapters. This pattern was used for all of the integration points and was how 3rd parties wrote integrations with our system.

So, whatever it is you do, you can rest assured that I write enterprise software in the "real world" and quite successfully.

You connect to a DB or open a File or open a Socket and either "it just works" or you get an exception.

You really just can't seem to understand abstraction.

After I answered to you, you suddenly talk about abstracting the business level.

Not at all. Again you demonstrate that you don't understand what abstraction is. By hiding the details of the persistence model, which means (so that you understand) that people using the abstraction interface don't know if it is a DB, or a file, or a web service, or a pipe, or a local process, or a remote process, the business logic simply deals with business objects.

If I was talking about abstracting the "business level" (presumably you mean business logic) I would be talking about an interface exposed to a view or consumer that didn't need to know any details about how the business logic operated. I was clearly not talking about that at all.

So either you made a mistake in choosing the right words or headline or you simply are mixing stuff up and now try to weasel out of it ;D

I'm willing to bet that you end up in a lot of 'arguments' where you bring out this line. It's okay, maybe some day you'll get it.

--
Loading...

any good isam engine by rewindustry · 2014-04-10 04:59 · Score: 1

enough said.

Re:Perhaps you should abstract your persistence mo by angel'o'sphere · 2014-04-10 06:35 · Score: 1

Your first post I answered to certainly was not clear about "abstracting away persistence issues", and your naming examples like ValidateConnection or CheckConnection are certainly bad choices as an example. On top of that, that post made no contribution to the question the poster asked.

I'm a real programmer, not a manager.

Abstracting away the fact that a Service is remote and not local leads to all forms of problems. It is very often no good idea.

I rather assume you get in lots of arguments, or you are too lazy to use the correct words/concepts to make clear what you want to talk about.

Sorry to say so, but the post I'm just answering to does not look like the person who wrote it had any clue or real-life experience in software engineering at all. It is not meant as an offense.

And no, I don't use that line often, actually I don't remember if I had used it already once.

Well, the application I'm working on right now mainly uses custom-written persistence (MongoDB and kryo), as we have to persist millions of events per hour (worst case) and perform analyses that need response times of less than a second (usually working on a time frame, so only a relatively small amount of data has to be fetched from the backing storage).

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:Perhaps you should abstract your persistence mo by Assmasher · 2014-04-10 07:44 · Score: 1

Your first post I answered to certainly was not clear about "abstracting away persistence issues"

Are you an escaped inmate from a Guatemalan insane asylum?

The entire first post, including the title of the post, is explicitly about abstracting your persistence model.

"In other words, have your backend web services (presuming you're using them and not manually POSTing from a socket yourself to your own socket server) instantiate an instance of iMyDBAdapter and use it."

Maybe you don't find that clear, but that's because you apparently don't understand abstraction...

your naming examples like ValidateConnection or CheckConnection are certainly bad choices as an example.

The stupidity of your statement really cannot be overstated. You dislike ValidateConnection because you claim you will simply catch an exception when you connect; ergo, you are either connected or you are not. This, alone, is proof that you do not understand abstraction.

I'm a real programmer, not a manager.

And you'll apparently never get any further, because you'll need to understand abstraction before you can be an architect. I'm also not a programmer, I'm a software engineer (there's a difference that you're not aware of), a software architect, a founder, a co-founder, and I also perform technical due diligence for multiple venture capital firms.

Abstracting away the fact that a Service is remote and not local leads to all forms of problems. It is very often no good idea.

Actually, this is EXACTLY what you should abstract away. Yet again you demonstrate your lack of basic understanding of the purpose of abstraction. You think that abstracting away 'locality' is bad and leads to problems? Why on earth would it do that? LOL. Your abstraction layer should satisfy the requirements of the business logic, if locality is an issue (i.e. for performance) then your adapter implementation must account for that. The only time anyone using your abstraction layer should ever know anything about locality would be if that knowledge would be required so that the business logic could make a decision - otherwise, that sort of information should be encapsulated totally.

And no, I don't use that line often, actually I don't remember if I had used it already once.

Sure, I believe you, and you understand abstraction too.

Well, the application I'm working on right now...

Great, I hope you have a competent architect.

--
Loading...

Re:Perhaps you should abstract your persistence mo by angel'o'sphere · 2014-04-10 08:17 · Score: 1

I'm a competent architect.

That is why I work there.

Sorry, I don't get your rants. You seem to have had 3 bad days in a row or something. Your talking about abstractions really makes not much sense, so I pray for the entrepreneurs you consult, good luck.

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:Perhaps you should abstract your persistence mo by Assmasher · 2014-04-10 08:56 · Score: 1

Your talking about abstractions really makes not much sense, so I pray for the entrepreneurs you consult, good luck.

I'm sure it doesn't, because you have demonstrated quite clearly that you don't understand abstraction.

How can you be a competent architect when you don't understand abstraction? LOL.

In any case, you're a programmer, right?

--
Loading...

Re:Perhaps you should abstract your persistence mo by angel'o'sphere · 2014-04-10 09:59 · Score: 1

Sigh, what is your problem?
Do you have a mental illness?
I for my part did not talk about abstraction at all, hence you have no basis to judge if I know anything about abstraction.

Have a good day (and once again I wonder why /. has no ignore feature).

In any case: I'm a requirements engineer, a software architect, a systems architect, a developer in about 20 programming languages; I do everything from training, coaching, developing, testing, analysis, design, implementation, test. I do internet applications with a few dozen million users, desktop applications, embedded development in the automotive and aircraft industries. I do everything that is interesting ... do I need to continue?

Since over 30 years. But you are the guy who sold a company for a few millions ... wow, I really wonder what I do wrong.

I for my part don't make the mistake (anymore) to accuse someone about "he does not know X" ... can be a grave mistake sometimes.

If you indeed did anything you claimed in the previous posts, I strongly suggest you improve your communication skills, and for that matter: your manners.

Sorry to half insult you again: your previous five posts sound like you are a complete idiot and a superb moron.

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.

Re:Perhaps you should abstract your persistence mo by Assmasher · 2014-04-10 10:20 · Score: 1

Last word, lol

--
Loading...
