Ask Slashdot: Which NoSQL Database For New Project?
DorianGre writes: "I'm working on a new independent project. It involves iPhones and Android phones talking to PHP (Symfony) or Ruby/Rails. Each incoming call will be a data element POST, and I would like to simply write that into the database for later use. I'll need to be able to pull by date or by a number of key fields, as well as do trend reporting over time on the totals of a few fields. I would like to start with a NoSQL solution for scaling, and ideally it would be dead simple if possible. I've been looking at MongoDB, Couchbase, Cassandra/Hadoop and others. What do you recommend? What problems have you run into with the ones you've tried?"
Do you need a database to do what you're trying to do? Why not just write the information to a text file (CSV or tab-separated?), and use other programs to query the data?
try to make ends meet, you're a slave to money, then you die
If you need to store less than a few hundred million rows just use PostgreSQL.
It supports JSON and transactions.
CouchBase/CouchDB is probably the easiest and most available one out there. It's particularly well suited for app backends too, as both the backend and mobile apps can talk to the same database, in theory eliminating the need for the backend to handle data syncing.
One caveat, though: the last time I used Couch (which was a few years back now) I encountered problems with its map/reduce implementation. Specifically, you can't (or at least couldn't at the time) chain map/reduces, which severely limits how you can query your data. With the requirements you listed, you should be fine though.
You might want to consider a SQL database.
Based on your information no one can give you solid advice. It highly depends on the load you expect and on the data model you will use. For something like a simple Twitter, you can use a log file or any NoSQL technology. If you only have a few transactions and not billions of entries, you could use PostgreSQL or even MySQL. However, PostgreSQL scales better. If you want to run complex analyses on graph-like data, you may consider Neo4J as a graph DB.
I would like to start with a NoSQL solution for scaling
And there it is, the proverbial premature optimization ...
To answer the question "Which NoSQL Database For New Project?" there are 2 comments:
- A relational database
- A plain text file
The user gave an argument: "I would like to start with a NoSQL solution for scaling"
NoSQL is a good solution for horizontal scaling, CSV and SQL DB are not.
I would recommend MongoDB if the transactional aspect is not important for your purpose: easy to learn, easy to use.
These guys are committed, meaning mongo has a future. 2.6 that came out the other day has some nice new features and many bug fixes.
SQLite is a relational database management system contained in a C programming library. In contrast to other database management systems, SQLite is not a separate process that is accessed from the client application, but an integral part of it.
"I'll need to be able to pull by date or by a number of key fields"
So, in other words, you have already decided on key fields. If you use a database, it has things called indexes that can search billions of rows for a key field in a fraction of a second.
If you don't use something with indexes, then you can't do this.
Where has this idea that databases can't scale come from? The world runs on databases, for heaven's sake. Do you think when you take money out of an ATM, it's going to MongoDB? And yet there are millions of ATMs, and you can take money out of your VISA account in almost all of them anywhere in the world. That is called scale.
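A minimal sketch of the index point, using Python's built-in sqlite3 module. The table name `events`, its columns, the index names and the sample rows are all made up for illustration; nothing here comes from the original poster's project, and any SQL database with indexes would serve the same purpose.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " device_id TEXT NOT NULL,"
    " created_on TEXT NOT NULL,"   # ISO date string, e.g. 2014-04-10
    " amount INTEGER NOT NULL)"
)
# Indexes on the fields we expect to query by.
conn.execute("CREATE INDEX idx_events_created_on ON events (created_on)")
conn.execute("CREATE INDEX idx_events_device_id ON events (device_id)")

rows = [
    ("phone-a", "2014-04-09", 5),
    ("phone-b", "2014-04-09", 7),
    ("phone-a", "2014-04-10", 3),
    ("phone-a", "2014-04-11", 4),
]
conn.executemany(
    "INSERT INTO events (device_id, created_on, amount) VALUES (?, ?, ?)", rows
)


def events_between(start, end):
    """Pull rows by date range (inclusive on both ends)."""
    return conn.execute(
        "SELECT device_id, created_on, amount FROM events"
        " WHERE created_on BETWEEN ? AND ? ORDER BY created_on, id",
        (start, end),
    ).fetchall()


def daily_totals():
    """Trend report: total amount per day."""
    return conn.execute(
        "SELECT created_on, SUM(amount) FROM events"
        " GROUP BY created_on ORDER BY created_on"
    ).fetchall()


def plan_for_device(device_id):
    """Ask SQLite how it would run a lookup on a key field."""
    return conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM events WHERE device_id = ?",
        (device_id,),
    ).fetchall()
```

Calling `plan_for_device("phone-a")` is a quick way to check whether a lookup is served by the index rather than a full table scan.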
Brave Sir Robin ran away. ("No!") Bravely ran away away. ("I didn't!")
I would consider using the latest release of MariaDB.
You can use it as a standard MySQL server, but they also have Cassandra NoSQL as an engine for it now (since the release of 10)... So you would be easily able to play with things on different database types and see what suits your situation better.
... since it is web scale. ;-)
https://www.youtube.com/watch?v=b2F-DItXtZs
If you're going to need search at some point you should just opt for Elasticsearch from the start. Yeah, it's a search engine, but it's also a rather good key/value store.
Must be nice to be dice!
If your question is relatively simple, aimed at the general Slashdot crowd, then the answer is that you need to hire someone who knows database implementations.
If the question is complex enough for an experienced database implementer, he/she would know where to post that question. And it is not here.
As you can read above, the answer is simple. The problem is non-existent for experienced implementers.
Why are other peoples sig's always more witty ???
It's a mistake to think that "NoSQL" is a silver bullet for scalability. You can scale just fine using MySQL (FlockDB) or PostgreSQL if you know what you're doing. On the other hand, if you don't know what you're doing, NoSQL may create problems where you didn't have them.
An important advantage of NoSQL (which has its costs) is that it's schema-free. This can allow for more rapid iteration in your development cycle. It pays off to plan document structures carefully, but if you need to make changes at some point (or just want to experiment), you can handle it at the code level. You can also support older "schemas" if you plan accordingly: for example, adding a version tag or something similar that can tell your code how to handle it. So, even ignoring the dubious potential of better scalability, NoSQL can still be beneficial for your project.
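To make the version-tag idea concrete, here is a small sketch in plain Python. The field name `schema_version` and the example change (splitting a `name` field in two) are invented for illustration; a real project would pick its own tag and its own migrations.

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Return a copy of doc converted to the current document layout."""
    doc = dict(doc)
    version = doc.get("schema_version", 1)  # untagged documents count as v1
    if version == 1:
        # Hypothetical change: v1 stored one "name" string, v2 splits it.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"] = first
        doc["last_name"] = last
        version = 2
    if version != CURRENT_VERSION:
        raise ValueError(f"unknown schema_version: {version}")
    doc["schema_version"] = version
    return doc
```

The code that reads documents calls `upgrade` first, so old and new layouts can sit side by side in the same collection until you get around to rewriting them.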
More so than SQL, NoSQL databases are designed for different kinds of applications, and have different strengths:
MongoDB is a really good backend engine that gives programmers a lot of control over performance and its costs: if you need faster writes, you can allow for eventual integrity, or if you need faster reads, you can allow for data not being the absolute freshest. For many massive multiuser applications, not having immediately up-to-date data is a reasonable compromise. It also offers an excellent set of atomic operations, which from my experience compensate well for the lack of transactions. Furthermore, MongoDB is by far the most feature-rich of these, supporting aggregate queries and map-reduce, which again can make up for the lack of joins. It also offers good sharding tools, so if you do need to scale, you can. Again, I'll emphasize that you need a good understanding of how MongoDB works in order to properly scale. For example, map-reduce locks the database, so you don't want to rely on it too much. The bottom line is that MongoDB can offer similar features to SQL databases (though they work very differently), so it's good for first-timers.
Couchbase is very good at dispersed synchronization. For example, if parts of your database live in your clients (mobile applications come to mind), it does a terrific job at resynching itself and handling divergences. This is also "scalable," but in a quite different meaning of the term than in MongoDB.
I would also take a look at OrientDB: it's not quite as feature rich as MongoDB (and has no atomic operations), but it can work in schema-mode, and generally offers a great set of tools that can make it easy to migrate from SQL. Its query language, for example, looks a lot like SQL.
The above are all "document-oriented" databases, where your data is not opaque: the database actually does understand how your data is structured, and can allow for deep indexing and updating of your documents. Cassandra and REDIS (and Tokyo Cabinet, and BerkeleyDB) are key-value stores: much simpler databases offering fewer querying features, where your data is simply a blob as far as the engine is concerned. I would be less inclined to recommend them unless your use case is very specific. Where appropriate, of course, simpler is better. With these kinds of databases, there are actually very few ways in which you can create an obstacle for scalability, simply because they don't do very much from a programming perspective.
There are also in-between databases that are sometimes called "column-oriented": Google and Amazon's hosted big data services are both of this type. Your data is structured, but the structure is flat. Generally, I would prefer full-blown "document-oriented" databases, such as MongoDB and OrientDB. However, if you're using a hosted service, you might not have a choice.
It's also entirely possible to mix different kinds of databases. For example, use MongoDB for your complex data and use REDIS for a simple data store. I've even seen sophisticated deployments that very smartly archive data from one DB to another, and migrate it back again when necessary.
Seriously - JUST USE POSTGRES - there is virtually nothing that it can't do.
I just felt I had to comment on this. So many developers start with the phrase "I need NoSQL so I can scale" and almost all of them are wrong. The chances are your project will never ever ever scale to the kind of size where the NoSQL design decision will win. It's far more likely that the NoSQL design choice will cause far more problems (performance etc.) than the theoretical scaling issues.
Take, for example, two systems I've been involved with for managing WiFi access to large-scale networks (100,000+ concurrent users, 1000s of APs): one uses MongoDB, the other is based on PostgreSQL. The MongoDB-based solution has very real performance problems: its reporting takes a very long time to run and uses very large amounts of system RAM (24G in some cases), and that performance is only degrading as the system grows. There are also many other performance issues. These issues are not just Mongo issues but simply that NoSQL is not well suited to the task. The system has been rewritten using an SQL backend and now works much better, but more importantly it's scaling better. Growth in the system is no longer degrading performance, and the points where we need hardware upgrades or extra servers are now much more predictable, so we can predict cost-base growth in relation to user growth.
NoSQL does not guarantee scaling; in many cases it scales worse than an SQL-based solution. Work out what your scaling problems will be for your proposed application, when they will become a problem, and whether you will ever reach that scale. Being on a bandwagon can be fun, but you would be in a better place if you really think through any potential scaling issues. NoSQL might be the right choice, but in many places I've seen it in use it was the wrong choice, and it was chosen based on one developer's faith that NoSQL scales better rather than on thinking through the scaling issues.
Fundamentally the single-key document store databases are built on the compare-and-swap primitive. This means that the data structure being implemented, i.e. the one that must support the application's write cases, must be designed up front and won't be amenable to incremental development. Not to mention that designing such a data structure is far more difficult than laying down some CREATE TABLE statements and figuring out what it is exactly that the application prototype is supposed to do.
But also avoid MySQL. It's not good at all. SQLite will also lead you astray.
It's lightweight, fast and supports reasonably complicated queries. Not sure why you need a NoSQL database when you clearly need to Query by key fields.
Postgres might carry you further than you imagine with hstore and json extensions. I'd also try Riak if you really want NoSQL.
If you're going to be doing analysis and totalling, then a traditional SQL database may be the better option.
Which is why the question is just technological masturbation.
Take a look at HyperDex if you are looking for a NoSQL DB: http://www.hyperdex.org/
Telecommunications data is eminently suitable to schema table storage in any relational database, which with a little work, will let you index by the keys you intend to query by.
NoSQL solutions are better for unstructured data that doesn't come in predictable formats or value sets.
You need to take a step back and look at the problem before you decide on a solution. Don't be one of those idiots who tries to use a hammer to drive a screw.
I do not fail; I succeed at finding out what does not work.
SQLite
Now scale that. Or just lock it properly.
If you want simple, scalable and low sysadmin overhead, and all you need are key -> value lookups, then Amazon's S3 can be an excellent choice. You don't need to manage it, you don't need to work out how to add servers, and it's well proven at extremely large scales.
However, like a lot of other posters, I'm very sceptical that NoSQL is the place to start. SQL databases can do a LOT for you, are very robust and can scale very considerably. As your requirements grow you might find yourself wanting things like indexes, transactions, referential integrity, the ability to manually inspect and edit data using SQL and the ability to store and access more complex structures. You're likely to give yourself a lot of pain if you go straight for NoSQL, and even if you DO need to scale later combining existing SQL and new NoSQL data stores can be a useful way to go.
"I'm working on a new independent project. It will soon become the new Facebook, and I'll be billionaire next quarter. The only problem is that I don't know which luxury yacht to buy with all this money. I've been looking at Lady Moura, Christina O, Pelorus, Venus and others. What do you recommend? What problems have you run into with the ones you've tried?"
Look at a disk-backed Redis configuration.
Premature Optimisation.
Don't tell NSA how to record calls into a database! I guess they've been typing it to a excel all this time.
You can't access any phone functions or text message functions via code on an iPhone. Unless you intend this for jail broken phones you're dead in the water. You're probably dead in the water anyway as only an idiot would load an app that tracks calls. There's a very good reason Apple locked that stuff out...security.
To find the right DB I wrote this checklist:
http://nosql-database.org/select-the-right-database.html
Nevertheless I love ArangoDB because of:
* K/V + JSON + Graph = 3 models available!
* Speaks server-side JavaScript with embedded V8 Server!
* FOXX GUI can talk directly to Database
* Multicore ready
* Advanced indexing plus geo, skip-list, n-gram!
* Tunable durability + transactions
* AQL = SQL + JSONiq + CYPHER (I do not know of a better graph+SQL language out there...)
* quasi MVCC => SSD ready
* capped collections
* Replication + sharding
* management GUI
* and tons more
not having query or joins or ACID is so cool . everyone is doing it
From your requirements
1) You require logging of information. If the 'back-end' system goes offline, what would you like to happen in the front-end? Using a filesystem for storage would remove the requirement of a 24/7 back-end database.
2) Using filesystem for storage would likely be a single file per POST. What will be the usage? If > 50k a week, you might want timestamped daily directory.
3) Trends. How often do you want these trends, immediate? More immediate, more likely move from file system to repository. And what value would you want your trend analysis to provide? As mentioned above, splunk is wonderful for basic trend reporting. Do you want deep statistical analysis, searching and querying?
4) For searching and querying I would suggest Postgres. For stats, how about R? This means you will need an ETL (extract, transform, load) step to separate SILOS (yeah, one day this will be solved, but not by NoSQL). If you do not know what you are collecting, or it will change often, now we might move to NoSQL. No Schema = NoSQL.
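Points 1 and 2 above (one file per POST, in timestamped daily directories) might look roughly like this in Python. The function names, the file naming scheme and the JSON encoding are assumptions made for the sketch, not something the comment specifies.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def store_post(root, payload, now=None):
    """Write one POST body to <root>/<YYYY-MM-DD>/<HHMMSS>-<random>.json."""
    now = now or datetime.now(timezone.utc)
    day_dir = Path(root) / now.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)
    # The random suffix keeps two POSTs in the same second from colliding.
    name = f"{now.strftime('%H%M%S')}-{uuid.uuid4().hex}.json"
    path = day_dir / name
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def posts_for_day(root, day):
    """Load every stored POST for one day, e.g. day='2014-04-10'."""
    day_dir = Path(root) / day
    if not day_dir.is_dir():
        return []
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(day_dir.glob("*.json"))
    ]
```

The front end only needs a writable disk to keep accepting data; anything that reads the directories later (a loader into Postgres, a reporting script) can run on its own schedule.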
What is your architecture? Answer that question first, then decide what kind of data store to use. What are you storing, and why are you storing it? How will you use that later?
Is this for your stock inventory project? If you want to do anything that involves keeping track of any goods or money or anything of value, then NoSQL is not necessarily the way to go. NoSQL is designed to keep track of value-less things like Twitter messages and Facebook postings, where it doesn't matter if you lose a few thousand transactions here or there. People keeping track of things with actual monetary value usually use SQL for the transactions, from what I've seen.
Korma: Good
First, everyone who is pointing out your premature optimization is probably right. You can get a lot of scalability out of existing databases, particularly if you optimize your data schema with indexes. Even if you store all possible 9,999,999,999 phone numbers, the log base-2 of that is 34. So you'll need a b-tree 34 levels deep. That's big, real big, but b-trees are fast. Worst case you are reading 34 blocks from disk, which is ~16kB.
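The arithmetic can be checked in a few lines of Python. The figure of 34 corresponds to a binary tree (two children per node); real B-trees pack many keys into each node, so they end up far shallower. The fan-out of 256 below is an assumed round number for illustration, not a property of any particular database.

```python
def levels_needed(n_keys, fanout):
    """Smallest depth d such that fanout**d >= n_keys (integer math only)."""
    depth, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        depth += 1
    return depth


N = 9_999_999_999  # every possible 10-digit phone number

binary_depth = levels_needed(N, 2)    # 34, matching the figure above
wide_depth = levels_needed(N, 256)    # 5 with an assumed fan-out of 256
```

So the worst case quoted above is, if anything, pessimistic: with a wide fan-out a lookup touches only a handful of blocks.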
Next, don't choose databases by name. Choose them by their features because you use features, not names. That said, HBase is probably what you want. It's a blend of distributable hadoop and tables. You don't need atomicity (it doesn't sound like) which is one thing you give up when leaving SQL behind.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
...so that you simply write an adapter for pushing/pulling data.
Then you don't have to worry so much about making what appears to be an extremely premature optimization.
In other words, have your backend web services (presuming you're using them and not manually POSTing from a socket yourself to your own socket server) instantiate an instance of iMyDBAdapter and use it.
Later, when you find out that you actually do need MongoDB, PostgreSQL, sharded MariaDB, whatever, you can simply write another adapter class that simply has to satisfy the iMyDBAdapter interface.
The reason this works so well is that it will force you to separate your business logic from your underlying DB implementation (which requires a lot of discipline to do otherwise, especially when you just want to get something 'done'.)
Also, as another poster pointed out, you're much more likely to suffer from other issues relating to scaling (and issues better solved elsewhere) than a modern database.
My advice, stick rigidly to the interface/adapter mechanism and implement an adapter for whichever DB you're most comfortable with right now.
I would like to start with a NoSQL solution for scaling,
This is a solution looking for a problem. Or more precisely, you are looking for an excuse to use a piece of technology or paradigm. Don't get me wrong, your systems requirements might indeed be best served using a NoSQL solution, but what exactly has your analysis shown regarding this?
Scaling is not just a technical feature (NoSQL, SQL, Jedi mind-meld tricks). Scaling is a function of your architecture. You can NoSQL the shit out of your solution, but if your software and system architecture is not scalable, then having NoSQL will mean chicken poop as solutions go.
and ideally it would be dead simple if possible.
If you want simple, put a simple RDBMS schema (a properly normalized one) in place, and have your code use a simple, technology-agnostic persistence layer that maps your domain-level artifacts to database artifacts. If you ever have to replace the back-end, then you can do so with minimal changes to the API that domain-level artifacts use to persist themselves with the persistence layer.
Design your domain solution around domain-specific artifacts. Persistence technology is typically a low-level design/implementation detail, an important one obviously (and a critical one for some classes of systems).
But for what you are describing, the choice shouldn't even be coming into the picture without first having an architectural notion of your solution.
I think we are leaning toward SQL as it is something we all know. However, the alternatives needed to be investigated.
I wouldn't touch Mongo; transactions in Mongo don't work the way you think they do.
I wouldn't touch MySQL/Maria for exactly the same reason.
Go with PostgreSQL, then you will have a solid infrastructure that will support relations when you need them.
No workarounds, no compromises, no lazy amateur BS.
It means you don't have any big data requirements so you're better off sticking with MySQL or something easier to manage at a small scale.
If growth is high or you have a lot of data to analyze, you can look into importing data into Hadoop using sqoop and query it with Hive and HBase. But you most likely won't need that for at least a couple of years.
whatever wordpress comes with.
Create a separate folder for each type of 'key' copying 'POST' data to files in these folders using filename as key for ... umm... lightning fast retrieval.
U should then totally think about creating other directories full of symbolic links rather than files enabling you to have many keys for reference or even generate materialized views without duplicating data.
Since you would be using a query language that is not SQL it is guaranteed to scale to infinity and beyond... (inodes sold separately)
I've never seen a post with 50% of its words spelled incorrectly. Unless it's in French? -- in which case, I guess your keyboard doesn't support accents.
--
Not a grammar nazi. Just couldn't resist on this one.
and get to know it later :-). Fast here: your prototype creation, not primarily the database I/O. The general comments are right: there is no one-fits-all solution and the database might change. It looks very much like you also haven't decided on the server platform: Ruby, PHP... you could look at node.js or vert.x too - server side JavaScript is at least neat for prototyping (I'm not making a statement that it is *only* neat for prototyping - that's a completely different discussion).
We did a number of super rapid prototypes with datasets roughly in the range you describe using CouchDB (not CouchBase!). There we took advantage of CouchApps - the ability to store the application itself inside the database - works like a charm when replicating data and you need an HTTP server (Apache, Nginx) for the URL mapping (which is already kind of optional) and CouchDB. You can authenticate with OAuth or via the webserver and it replicates - so you can have local data easily (gold for testing). Since you can specify the direction I usually replicate all data from the server into local, but not the design. So I can try new app features locally against the live dataset. It also does Map-Reduce using JavaScript.
Give it a shot. If it can handle the data from CERN you also have quite a growth path. One fun project we did: run it on a Raspberry Pi to collect weather data from Arduinos all mounted in a small sail boat (the Pi in the cabin, the Arduinos on the masts). Occasionally, when the WiFi or 3G shield picked up a network, it replicated back to a cloud server.
With all of the NoSQL DBs that you've mentioned, you won't be able to use specific fields in the data blob to reduce the data set. Most only care about the key and don't have much access to the data associated with it. HyperDex does allow for use of the fields in the data object, but what you're looking at doing sounds like you want to run analytical queries against the data you have. You may want to look at using a multidimensional database, which can better be used for reporting. I'm sure there are many other solutions as well, but this may be worth looking into as a possible solution.
Use MongoDB, it's web-scale. They produce kick-ass benchmarks by piping all your data to /dev/null.
Rules of Conduct:
#1 - The DM is always right.
#2 - If the DM is wrong, see rule #1
If the goal really is just to amass data and then do offline reports on it (not completely clear from the question) then I can report that at my company we've been doing this at scale for over five years. Here's how:
* A bunch of web servers accept data and append it to a local disk file.
* Every hour, that "log" is pushed from each host into HDFS and a new log file started. (HDFS as in the Hadoop Distributed Filesystem)
* Querying is done later, using Hive with a custom deserializer that natively understands our on-disk format. (You could also just make sure your on disk-format is the delimited text format Hive natively understands, of course. We had some unique needs here.)
* An hourly task runs a small set of Hive aggregation queries (Hive presents a SQL-like interface to defining and running MapReduce jobs) on the raw "table" to produce some smaller datasets that can return aggregate-based results faster than the raw data, including copying some of the smaller aggregates into a MySQL database for online access via some reporting applications.
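The shape of that pipeline (append locally, aggregate later in a batch) can be sketched in plain Python, with ordinary files standing in for HDFS and a small function standing in for the Hive queries. The JSON-lines format and the field names `ts` and `bytes` are assumptions for the sketch; the commenter's actual on-disk format is custom and not described.

```python
import json
from collections import defaultdict


def append_event(log_path, event):
    """Collection step: append one event as a JSON line; no database involved."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def hourly_totals(log_paths, field):
    """Batch step: sum `field` per hour across all collected log files."""
    totals = defaultdict(int)
    for path in log_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                event = json.loads(line)
                hour = event["ts"][:13]  # e.g. '2014-04-10T09'
                totals[hour] += event[field]
    return dict(totals)
```

The small aggregate returned by `hourly_totals` is the kind of result that would then be copied into a regular database for the reporting applications, while the raw logs stay where they are.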
At this point our daily dataset is a few terabytes in size, when considering the sum of all of the collecting servers across all of the hours. (There are some peak hours due to the nature of our business, so the volume isn't even across the whole day.)
The only thing we've ever disliked about this system is the delay between data arriving and it being available to query. For a little while we experimented with using Apache Storm with realtime log streaming, and produced a working prototype that was shown to work for a one-tenth sample of the data, but ultimately we concluded that the need for faster data wasn't strong enough to justify the additional complexity and stuck with the above design. Therefore I can't speak to how far that solution would scale, but if real-time analysis isn't a requirement -- and scaling up in data size is -- then I can certainly recommend the above design.
Lucene and solr.
Not sure about the trending though.
What a total n00b question... I'm not sure if you should actually be messing with this stuff if you cannot figure out what Database technology to use, and you ask /.
And I agree with the others, you are asking for problems....
EOM.
If you're a DBA or dev, go with Mongo and let the sysadmins handle the scaling and expansion. If you're a sysadmin, go with SQL and let the devs and DBA's handle the scaling. If you're a DBA......well I'm sorry but you're screwed.
I didn't know that slashdot turned into stack overflow.... Is this a new beta feature that I just missed? ;)
Like other posters have pointed out, this whole article smells fishy. Being vague as hell but asking for a specific solution?
The point is that there is no be-all, end-all answer to this question. It depends on a lot of intangible requirements that the OA seems to miss on purpose. I call troll!
Really, I just want to know if you were starting a new site that was mostly incoming data and needed to possibly scale quickly, what choice would you make at the outset to make your future life more bearable.
Welcome to English.
The language you copied, fucked with, and then claimed to have the definitive version of.
Pretty much if we end a word with -our (colour, flavour, honour) or -ise (optimise, etc.) then we're right.
Look up Neo4J. If you're after something that will be the new "RDBMS-killer", this is it!
Forget MongoDB, Hadoop, MapReduce, etc. By the time you've learned it, something new comes along. Find something generic. So if an RDBMS makes sense (Postgres, MariaDB, etc.), use that.
So it depends what YOU want, basically, since I guess you will support it afterwards?
... then everything looks like a (NoSQL) nail. Who says you need NoSQL? Nothing against using cool, newish stuff, but as others have pointed out, you didn't describe the scale of your project. Don't blindly pick trendy technology just because you want to sit with the cool kids at lunchtime. If this is an alpha or beta product with under 1 billion records, use a regular database and be done with that. Move onto the interesting parts of your project and fix the plumbing later if you need to.
NoSQL was only necessary because traditional SQL's table joins are slow. Table joins are slow because hard disks are slow. But if your table data is on SSD, disk access stops being slow, joins stop being slow and NoSQL stops being necessary.
I saw a great rant about this a few years ago. I've lost the link though.
There are a number of solutions on the market that support the best of both worlds. Oracle and Postgres both have support for NoSQL datatypes. Informix went even further: it gives you the ability to mix classic relational tables with NoSQL collections in the same database. You can write a query that will access data from both a table and a collection at the same time. You can use compression and timeseries; it also supports the MongoDB API, so you can write an application that connects to Informix using MongoDB drivers, and of course you can shard as much as you want with no pain. Just google hybrid SQL.
Isn't that how English came about in the first place?
The road to tyranny has always been paved with claims of necessity.
You can spin this up with a SQL database, but the real question is how well can you do it? Do you know how to design a properly structured database with good indices? Or are you just an app designer that expects the database to somehow know automagically how you want to query the data?
I know from experience that Postgres, without much effort, can handle queries over hundreds of thousands of rows in parallel with a few hundred updates/sec. The limiting factor will be the IO of your disks, whatever solution you use.
I've been using Postgres for some time with rather kinky data structures and it has never failed to perform. Honestly, the only use I have for NoSQL is in-memory temporary caches of objects. I can fathom that something like Cassandra might perform very well if I don't need data integrity.
You forget the fact that modelling for a NoSQL database usually works completely differently than for a relational database, hence the code using one or the other is completely different.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
I'm with the guys claiming premature optimization. I think you've been duped by the legion of NoSQL evangelists into thinking RDBMS are old, slow and obsolete and if you put a few thousand rows into a few tables they'll bog down and take minutes to query it. Here's a spoiler: they're wrong. I think you'd be amazed just how robust and fast Postgres, MySQL and MS SQL can be if configured and used correctly. The question NoSQL actually answers is "would you sacrifice atomicity and some consistency for a much higher data throughput?" I work on projects that have to manage sizable chunks of data every day and in my experience NoSQL is only an option after you've exhausted all other avenues. If you've designed a bulletproof database schema, optimized all your queries to the bone, created every possible index on every possible table, partitioned your database files and even thrown hardware at it and you still have issues, then NoSQL might be your salvation. Otherwise, stick to what everybody else is using.
Why would you be modeling anything relating to entities at the interface level?
iMyDatabase would have methods such as:
ValidateConnection
GetSomething
StoreSomething
It shouldn't know anything about how that data is stored, where it is stored, how your object is serialized/deserialized from a DB entity, et cetera...
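A rough Python sketch of that interface/adapter idea, using the three method names listed above. The class names `MyDBAdapter`, `MemoryAdapter` and `SQLiteAdapter`, the key/value signatures and the `kv` table are all invented for illustration; the original comments only name the interface and its methods.

```python
import sqlite3
from abc import ABC, abstractmethod


class MyDBAdapter(ABC):
    """Storage-agnostic interface; callers never see the backend."""

    @abstractmethod
    def validate_connection(self): ...

    @abstractmethod
    def store_something(self, key, value): ...

    @abstractmethod
    def get_something(self, key): ...


class MemoryAdapter(MyDBAdapter):
    """Throwaway backend, handy for tests and early prototypes."""

    def __init__(self):
        self._data = {}

    def validate_connection(self):
        return True

    def store_something(self, key, value):
        self._data[key] = value

    def get_something(self, key):
        return self._data.get(key)


class SQLiteAdapter(MyDBAdapter):
    """Same interface, backed by a SQL table (hypothetical 'kv' table)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )

    def validate_connection(self):
        return self._conn.execute("SELECT 1").fetchone() == (1,)

    def store_something(self, key, value):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get_something(self, key):
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def record_and_read(adapter):
    """Business logic that depends only on the interface."""
    adapter.store_something("greeting", "hello")
    return adapter.get_something("greeting")
```

`record_and_read` runs unchanged against either adapter, which is the whole point: swapping in a MongoDB or PostgreSQL adapter later means writing one more class, not touching the callers.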
I have used Oracle, MySQL, and Mongo in prod situations. I have looked at Cassandra for evaluating it for potential usage in prod.
I can imagine situations where I could recommend any of the above. For example, if you are a large financial company with billions of rows, I would go with Oracle. If you have smarts but not money and didn't need somebody to sue if something went wrong, then maybe Postgres would do. If I were a simple web based app with simple form submits, I would go with MySQL. If I had complex unpredictable data blobs and unpredictable needs to do certain types of queries against the data, I might recommend Mongo. If I have large amounts of data on which I want to do analytics I would use Cassandra.
Cassandra wins when you have a lot of data and not a lot of complex real time queries against it. It is especially good at scaling up on cheap data storage (think 100s of terabytes). It also has an unreal "write" throughput (important for certain types of analytics which write out complex intermediate results) though that is not relevant for the case described.
The general problem with NoSQL solutions is that they increase the amount of storage needed to hold the equivalent amount of information. You are essentially storing the schema design redundantly with each "record". This matters more than some might suspect, because when you can fit an entire collection into memory, read performance is much higher. You usually need 1/5th to 1/10th as much RAM to do the job with a traditional relational database (especially since MySQL and its brethren handle getting data in and out of memory better than Mongo). This isn't so much the case for Cassandra because of its distributed storage nature, but it really isn't usable for real-time transactions.
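To make the "schema stored with each record" point concrete, here is a toy comparison in Python. It serializes the same records once as self-describing documents (field names repeated every time) and once as bare rows with the column names kept aside. The record layout and function names are made up, and real engines use binary formats, compression and padding, so treat the ratio as an illustration of the effect and not as a measurement of any particular database.

```python
import json


def document_bytes(records):
    """Size when every record carries its own field names (document style)."""
    return sum(
        len(json.dumps(r, separators=(",", ":")).encode("utf-8")) for r in records
    )


def row_bytes(records, columns):
    """Size when only the values are stored and the columns are declared once."""
    return sum(
        len(json.dumps([r[c] for c in columns], separators=(",", ":")).encode("utf-8"))
        for r in records
    )


# Hypothetical sample data: 1000 small events with three fields each.
COLUMNS = ["device_id", "posted_on", "amount"]
SAMPLE = [
    {"device_id": "d%d" % (i % 50), "posted_on": "2014-01-01", "amount": i}
    for i in range(1000)
]

if __name__ == "__main__":
    docs = document_bytes(SAMPLE)
    rows = row_bytes(SAMPLE, COLUMNS)
    print("document style:", docs, "bytes")
    print("row style:     ", rows, "bytes")
```

The shorter the values are relative to the field names, the bigger the gap gets, which is why some people abbreviate field names in document stores.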
My recommendation: use a traditional database. If you are in a Microsoft shop, use SQL Server; otherwise I like Postgres or MySQL. If, however, you have complex data storage needs that a NoSQL solution is perfect for, then I would go with that. If you are into back-end analytics, copy the data as it comes in and put it into Cassandra (or one of its brethren) as well.
People are saying Postgres clusters without third-party software... yet it does not.
Synchronous replication != clustering
If your master dies (and it is your only master), your application cannot automagically recover.
You have to change a slave to a master, which requires a config change/restart of the slave.
So now your master has gone down, and you have to restart at least one slave, which becomes the new master.
Tell me how the out-of-the-box solutions can be considered clustered? When people say that, they mean it's HA, and Postgres certainly is not.
Don't get me wrong, I love Postgres, but don't hype up core features it doesn't have (I sure wish it did)
Google makes Android. Google makes a NoSQL option (https://cloud.google.com/products/cloud-datastore/). Google makes it easy to glue the two together: https://developers.google.com/appengine/docs/java/endpoints/
You will model something like: findMyStuffByTimeStamp().
Your suggestion sounded as if you wanted to put the layout of the DB into an abstraction layer.
If you simply talk about method signatures, then I wonder why you brought it up :)
And what exactly does ValidateConnection mean? Either you have a connection, or you have not, just an idea ....
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
A problem with SQL DBs is the stupid patents on things like trimming whitespace and common SQL patterns. I'm not a huge fan of NoSQL DBs but I'm even less of a fan of patent trolls.
"...so that you simply write an adapter for pushing/pulling data" makes you think the abstraction layer would have the DB layout in?
Let me be perfectly clear then, the abstraction layer would simply know about the business logic side of things and that you can store and retrieve those things in some fashion most likely represented by some criteria associated with them.
If you simply talk about method signatures, then I wonder why you brought it up
I don't know what you mean.
And what exactly does ValidateConnection mean? Either you have a connection, or you have not, just an idea ....
What? Who/what would already "have a connection" to another server or memory mapped file or process or socket?
ValidateConnection, in this example, would simply ensure that the backend persistence mechanism both exists and (as is required in most cases) that you have valid credentials.
What about "connect(user, credential)"? That is how it works in the real world.
Well, in the real world, when you abstract things properly you don't expose a "connect" method. The code behind your interface - the adapter - would use connect and disconnect internally.
In the real world, when you abstract things, you expose a method that validates that the persistence layer is functioning/configured/usable as a normal part of the application/service/component's life-cycle. I called it ValidateConnection in this scenario because of the way he described his issue.
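As a rough Python sketch of what that looks like: the adapter below opens and closes its own connections internally and exposes a validate_connection check that the application can call during start-up. sqlite3 is used only because it ships with Python; the class name, table layout and method signatures are invented for the example and are not taken from any system discussed here.

```python
import sqlite3
from contextlib import closing


class SqliteEventAdapter:
    """Hypothetical adapter; callers never see connect or disconnect."""

    def __init__(self, path):
        self._path = path

    def _connect(self):
        # Internal detail: how and where we connect is hidden from callers.
        return sqlite3.connect(self._path)

    def validate_connection(self):
        """Return True if the backend can be opened and prepared for use."""
        try:
            with closing(self._connect()) as conn:
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS events (key TEXT PRIMARY KEY, body TEXT)"
                )
                conn.commit()
            return True
        except sqlite3.Error:
            return False

    def store_something(self, key, body):
        with closing(self._connect()) as conn:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT OR REPLACE INTO events (key, body) VALUES (?, ?)",
                    (key, body),
                )

    def get_something(self, key):
        with closing(self._connect()) as conn:
            row = conn.execute(
                "SELECT body FROM events WHERE key = ?", (key,)
            ).fetchone()
        return row[0] if row else None
```

Calls made after a successful validation can still fail and raise, so the check is a start-up health probe and does not replace error handling on the individual calls.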
Sorry,
you have no idea about the real world.
You connect to a DB or open a File or open a Socket and either "it just works" or you get an exception. There is no need to "validate" your connection object after you have created it, either you have it and it is "valid" or you don't have it or any subsequent method call results in an exception (which you have to handle anyway).
The title of your post is "Perhaps you should abstract your persistence mo".
After I answered to you, you suddenly talk about abstracting the business level.
So either you made a mistake in choosing the right words or headline or you simply are mixing stuff up and now try to weasel out of it ;D
Sorry, you have no idea about the real world.
Funny, just a few years ago I was the chief software architect for a company purchased for more than 60 million dollars entirely for our enterprise product. One of the primary reasons this company was acquired instead of its competitors was that we were pioneering open standards in our market verticals and supporting those open standards with public integration points that 3rd party companies, including our competitors, wrote integrations to.
This system had a persistence model that had to scale, not just horizontally, but in 'swim lane' fashion - or if you prefer the actual fashion we used, in AKF cube fashion. It handled tens of millions of persisted logic events daily and integrated with many different back end databases - all supported through this EXACT same facade/proxy system implemented with adapters. This pattern was used for all of the integration points and was how 3rd parties wrote integrations with our system.
So, whatever it is you do, you can rest assured that I write enterprise software in the "real world" and quite successfully.
You connect to a DB or open a File or open a Socket and either "it just works" or you get an exception.
You really just can't seem to understand abstraction.
After I answered to you, you suddenly talk about abstracting the business level.
Not at all. Again you demonstrate that you don't understand what abstraction is. By hiding the details of the persistence model, which means (so that you understand) that people using the abstraction interface don't know if it is a DB, or a file, or a web service, or a pipe, or a local process, or a remote process, the business logic simply deals with business objects.
If I was talking about abstracting the "business level" (presumably you mean business logic) I would be talking about an interface exposed to a view or consumer that didn't need to know any details about how the business logic operated. I was clearly not talking about that at all.
So either you made a mistake in choosing the right words or headline or you simply are mixing stuff up and now try to weasel out of it ;D
I'm willing to bet that you end up in a lot of 'arguments' where you bring out this line. It's okay, maybe some day you'll get it.
Firebase? Sounds like you can get away with no infrastructure
enough said.
Your first post I answered to certainly was not clear about "abstracting away persistence issues" and your naming examples like ValidateConnection or CheckConnection are certainly bad choices as an example. On top of that, that post made no contribution to the question the poster asked.
I'm a real programmer, not a manager.
Abstracting away the fact that a Service is remote and not local leads to all forms of problems. It is very often not a good idea.
I rather assume you get in lots of arguments, or you are too lazy to use the correct words/concepts to make clear what you want to talk about.
Sorry to say so, but the post I'm just answering does not look like the person who wrote it had any clue or real-life experience in software engineering at all. It is not meant as an offense.
And no, I don't use that line often, actually I don't remember if I had used it already once.
Well, the application I'm working on right now mainly uses custom written persistence (MongoDB and kryo) as we have to persist millions of events per hour (worst case) and perform analyses that need response times of less than a second (usually working on a time frame, so only a relatively small amount of data has to be fetched from the backing storage).
Your first post I answered to certainly was not clear about "abstracting away persistence issues"
Are you an escaped inmate from a Guatemalan insane asylum?
The entire first post, including the title of the post, is explicitly about abstracting your persistence model.
"In other words, have your backend web services (presuming you're using them and not manually POSTing from a socket yourself to your own socket server) instantiate an instance of iMyDBAdapter and use it."
Maybe you don't find that clear, but that's because you apparently don't understand abstraction...
your naming examples like ValidateConnection or CheckConnection are certainly bad choices as an example.
The stupidity of your statement really cannot be overstated. You dislike ValidateConnection because you claim you will simply catch an exception when you connect; ergo, you are either connected or you are not. This, alone, is proof that you do not understand abstraction.
I'm a real programmer, not a manager.
And you'll apparently never get any further, because you'll need to understand abstraction before you can be an architect. I'm also not a programmer, I'm a software engineer (there's a difference that you're not aware of), a software architect, a founder, a co-founder, and I also perform technical due diligence for multiple Venture Capital firms.
Abstracting away the fact that a Service is remote and not local leads to all forms of problems. It is very often not a good idea.
Actually, this is EXACTLY what you should abstract away. Yet again you demonstrate your lack of basic understanding of the purpose of abstraction. You think that abstracting away 'locality' is bad and leads to problems? Why on earth would it do that? LOL. Your abstraction layer should satisfy the requirements of the business logic, if locality is an issue (i.e. for performance) then your adapter implementation must account for that. The only time anyone using your abstraction layer should ever know anything about locality would be if that knowledge would be required so that the business logic could make a decision - otherwise, that sort of information should be encapsulated totally.
And no, I don't use that line often, actually I don't remember if I had used it already once.
Sure, I believe you, and you understand abstraction too.
Well, the application I'm working on right now...
Great, I hope you have a competent architect.
I'm a competent architect.
That is why I work there.
Sorry, I don't get your rants. You seem to have had 3 bad days in a row or something. Your talking about abstractions really makes not much sense, so I pray for the entrepreneurs you consult, good luck.
Your talking about abstractions really makes not much sense, so I pray for the entrepreneurs you consult, good luck.
I'm sure it doesn't, because you have demonstrated quite clearly that you don't understand abstraction.
How can you be a competent architect when you don't understand abstraction? LOL.
In any case, you're a programmer, right?
Sigh, what is your problem?
Do you have a mental illness?
I for my part did not talk about abstraction at all, hence you have no basis to judge if I know anything about abstraction.
Have a good day (and once again I wonder why /. has no ignore feature).
In any case: I'm a requirements engineer, a software architect, a systems architect, a developer in about 20 programming languages; I do everything from training, coaching, developing, testing, analysis, design, implementation, test. I do internet applications with a few dozen millions of users, desktop applications, embedded development in the automotive and aircraft industries. I do everything that is interesting ... do I need to continue?
For over 30 years. But you are the guy who sold a company for a few millions ... wow, I really wonder what I do wrong.
I for my part don't make the mistake (anymore) to accuse someone about "he does not know X" ... can be a grave mistake sometimes.
If you indeed did anything you claimed in the previous posts, I strongly suggest you improve your communication skills, and for that matter: your manners.
Sorry to half insult you again: your previous five posts sound like you are a complete idiot and a superb moron.
Last word, lol
The approach bears some similarity to what Nathan Marz describes in his book on big data.