An RDBMS for CTI System?
cpt_koloth asks: "The company I work for are currently in the process of designing a custom CTI system for a client. A small part of the system is implemented, mostly to familiarize the development team with the telephony API (in our case TSAPI, since the client uses an Ericsson PBX) as a simple click 'n dial application. The main issue is the database system which will be used. We need a database is fast so that it can assign the calls without delays. The present system uses MySQL and is doing great but the numbers of requests will increase exponentially once the 'main' parts of the system are implemented (we have about 60000 requests per day currently most of them being cross table queries but finally they should be seven or eight times this number). Another aspect is the reporting agent, which will operate on the same database and also needs to be fast. We are currently thinking on a system with two databases one for the 'calling part', and one for the reporting part, and we cannot decide on the RDBMS to be used with the way the data will be updated between the two databases. Keep in mind that cost matters a lot. Does Slashdot have any insight to offer?"
I would recommend a RTFK running on a SMKPR, with a stock GHCCF, and several KKDFL's.
As you can see, this would drastically cut UIUER, and lead to greater LUD which, of course, is what every management team wants!
Sounds like a good fit for MySQL Cluster.
Hey,
If mysql is working for you now, you should look at mysql scaling options. For example, if you are worried about reporting queries, replicate the database to a second machine for running the reports against. Mysql replication works great for this sort of application. Also, if your dialer application is only performaning read queries, you can spread those across replicas too.
Knowing the current 'size' of your database would help -- if it's a dual processor box with 1 or 2 gb of ram, there are still a few affordable forklift upgrades before you need to worry too much about one box or mysql's performance (assuming your indexes are set right).
Also, MySQL Cluster was designed by/for the telecomm industry -- the original commissioners were performaning analysis on call records or something of the such.
MySQL can definitely do whatever you want it to. Why switch?
-- DrZaius - Minister of Sciences and Protector of the Faith
You have Cost, Performance/Features, and Time. When requesting anything you pick two and the other is determined by what you pick. You have picked all three. You want low cost, fast performance, and in a reasonable time. It doesn't exist. Pick two and try again. There's nothing essentially wrong with MySQL running in a high capacity environment. It would need to by properly maintained, and database design will matter a lot. Speaking of DB design, wanting to separate the reporting and dialing databases, which are functionaly connected, sounds like it rubs against the grain a bit. I don't know much about your specific environment, so take my opinion with a large grain of salt.
FreeBSD: The Power to Serve!
Just Wow!
From the sounds of it, you are using a RDMBS as a queue. If not, then ignore my message :D
I've worked on a number of systems similar to yours: The end result is that databases aren't meant to be queueing systems, they are meant to *STORE* data. ACID compliance (even the little that MySQL has) is not meant for a queueing system. It's meant for long term data reliability. Replicating databases is generally useless, as you need to delete a queue element once it is complete, and that has to be done on *every* single db server. Replication only helps for select.
My advice? The only thing I've ever seen work are custom controllers. Load up a bunch of id's into memory, work the data in there. Write your own memory allocator, and you'll be surprised at how many elements you can pack into memory when the data resides in a database (but accessible with a one table primary index query) and you keep an id reference in memory. From there, you can implement concurrency, priority, load scaling AND if you really need more performance, you can have different databases, with the controller knowing what resides where.
I see if far too often, the point is a database is NOT a queueing system.
Josh
Open Your Mind. Open Your Source.
MySQL is fast. If the queries you are doing on it now it can handle, "correctly" (i.e. with indexes.. use EXPLAIN SELECT....) then MySQL will almost definitely be as fast or faster then anything else for reads. Well, on a single machine. And Ill cover even money bets on when there is clustering, when all the data is everywhere. The likes of DB2 and Oracle may be able to do something with complex clustering, e.g. partitioning the data up on different machines. Without knowing about your app, I would think that the working data of any client is going to be about the same as any other client. (Well, the non-reporting parts, anyway). That, and your somewhat confused use of "different databases" indicates to me that you are not talking about that level of clustering.
Again, for reads across a reasonable number of tables, MySQL is blazingly fast. Your app doesn't seem like it would be beyond these current MySQL limits. Some analysis, testing, optimization, and a resulting good design, MySQL will beat the "enterprise" systems. With a shitty design, the enterprise systems will be shit, too.
Is there a problem with MySQL now? Or do you just have an (irrational) fear that it wont scale up to your level of requirements"?
MySQL? Be sure you have a robust database first and work around that. I don't know if MySQL can handle high loads gracefully now, but in the past it's been known to flake out and corrupt the database. Not a good thing if that takes out your phones.
Performance is always an issue, but there are some standard techniques for that. E.g., connection pooling (where a single connection is reused for 100-ish queries or 15 seconds before being discarded), caching read-mostly data, pushing as much of your logic into stored procedures as possible, etc.
(Remember that it's always far cheaper to do things in the database itself than it is to push data to your application and then push the results back. I have some updates that take ~15 minutes in my app optimized to minimize queries... and ~10 seconds when done via a loop in a stored procedure even though the latter uses a less efficient algorithm!)
BTW I like PostgreSQL. Solid referential integrity, triggers, stored procedures, etc. Even support for user-defined types and functions written in native C.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
You're anticipating 500,000 requests a day, so let's bump it up to a million to give you a bit of wiggle-room. Assuming a worst-case scenario of those queries being shoe-horned into a typical 8-hour working day, that equates to an average of 35 queries a second, multiplied by some factor to account for peak usage. I have no experience of MySQL specifically but that doesn't sound like an unreasonable ball-park to me - with a decent server, proper indexing and well-written queries I don't see why you should struggle.
(This is obviously subject to the size and structure of your database, but I'm guessing that you don't want to join dozens of tables all containing millions of records).
Before you run out and buy an Oracle licence I would attempt to run some tests on your current set-up. Presumably you have, or can acquire, an idea of the ultimate database structure and data volumes? You may find that you just need to optimise your database and spend a few grand on a better server. As for the reporting question: if you don't need to report on real-time data you can perhaps perform a daily load to your reporting server in the middle of the night when usage is low.
And in any case, your application will hopefully be portable between different database systems without massive effort, so you may be able to defer a final decision until you've got a better idea of what your database needs to be able to do.
You have ONLY 60000 query's per day? That's on a 12 hour working cycle about 1,3 per second. ANY RDBMS should be able to handle this on today's hardware (even on yesterday's hardware).
If you're looking for data safety and recovery etc. you better make sure you use decent table types and optimized queries and that your programmers use database-side transactions (Yes, I've seen programmers implement the transactions on the program side, not a good idea btw) with whatever database system they are using. Check out http://www.developer.com/db/article.php/2235521 for the different table types and the pro/con's about them on MySQL.
Also make sure your hardware is decent. Especially with database systems, you do not want to have downtime because you saved $200 on the hardware. Use RAID5 or even RAID6 if possible, look at optimizing your server with the documentation from the merchant (MySQL AB has some good documentation). Another issue I recently walked into: don't use cheap SuperMicro hardware enclosures you built yourself. I got a hard drive stuck the other day because a power cable slipped under while sliding it in and had to bring the whole machine down to 'operate' it. I have good experience with Apple's hardware (the server hardware), it falls apart quit easily and is easy to maintain without downtime (up to replacing fans and power), you do not have to keep Mac OS X however if you don't want to.
MySQL is definitely an industry-grade solution, especially their latest versions. And they're relatively cheap (free if you want) and have a very good commercial support plan and staff (if you go not-so-free).
Custom electronics and digital signage for your business: www.evcircuits.com
i'm a dba in a sql2000 shop and we have servers that serve over 1 million hits a day with very little problems
database design is your biggest challenge no matter what you use. you can spend $1 million on orcale, but if you build it wrong then you are wasting your money.
build a read/write design where your writable servers don't serve reporting or similar queries. A long select can cause locks for people trying to update data. force people to use the reporting servers for report and other data generation
use the right data type for the data. don't use bigint's where an int will do since it will cause your storage needs to grow and the data involved in each query will increase as well
build the hardware right. don't use RAID5 on files where you are going to have a lot of writes
"(we have about 60000 requests per day currently most of them being cross table queries but finally they should be seven or eight times this number)."
MySQL can very happily handle >800 queries per second on a single machine with good indexes. if you've got complicated data in big datasets, go with cluster or reader/writer replication where you can, and have a read-replica for reporting.
Like others have said - any respectable db should be able to do this without problem.
--onyx--
Anyway, hopefully the mods will see this and -1 mod you and your fanboyism. PGSQL is fine, I'm using it now for a major project, but you needn't attempt to trash talk other solutions. It's simply counterproductive to the original poster's question.
MySQL Cluster may be just what you are looking for. Just keep your database schema very simple since there are some gotchas with the cluster back end.
In the end, your real-time call database may need to be mysql cluster for speed and call processing, but other database functions can probably be farmed off to an inno back end or another database entirely (pgsql, sql server, etc.).
. 62,400 repetitions make one truth -- Brave New World, Aldous Huxley
If the supported limits are enough for your app:
e /xe/pdf/dbxe_faq.pdf
Max 4 gb of user data
Using at most a single processor
Using max 1 gb RAM
Oracle its a pretty decent option, and this version its free (as in beer, not as in speech).
Faq here: http://www.oracle.com/technology/products/databas
"Does Slashdot have any insight to offer?"
No.
You control the app, so you can easily log the response time of the database, and measure it with a small but credible test load.
If you send the database a hundred or so requests at a rate that's deliberately slow, you will get an average response time that's a good estimate of the actual internal response time of the database. Let's say it's 1/10 second. With that number (alone) you can predict and plan for the performance you need.
On a uniprocessor, you will get a maximum of 10 transactions per second before a queue starts to build and you start seeing delays due to queuing, so for 1 through 10 TPS, the response time will be 1/10 second.
In fact, as you get closer and closer to 10 TPS, there is a higher and higher probability that two transactions will come in at the same time, and the queue will start to build before you hit 10...
After the queue builds up, every additional request that comes in will need to wait before it get processed. T queue length is calculated using Little's law, Q=XR, where X is offered load and R is response time. A load of 50 requests would yield a queue length of 50* 1/10 = 5, and the average response time at that load would be (40*0.1 + 0.1) = 4.1 seconds
Voila! For a load of 50 TPS and a target time of 1/10 second, you need a five-processor system, easily achieved with 3 dual-core AMDs.
Feel free to send me mail: I have a copy of Neil Gunther's "pdq" queue solver and can easily compute what-ifs based on your measurements and needs. See "Analyzing Computer System Performance with Perl::PDQ", at http://www.perfdynamics.com/iBook/ppa_new.html
--dave
davecb@spamcop.net
If you're comfortable with two databases, consider LDAP (OpenLDAP is free) for the call-routing part and an RDBMS (MySQL is already working) for the reporting part.
With a proper OpenLDAP install, you should be able to handle over 15000 queries per second (PDF reference). For redundancy, or if you need more capacity, LDAP replication is straightforward.
As far as keeping the two databases synchronized, it's hard to say without knowing a lot more about your requirements. It may be as simple as periodically dumping the call-routing data from the LDAP server and loading it into the MySQL reporting environment.
Geez, you aren't asking for much. You have a development team that is inexperienced in the technology, a large anticipated growth factor, complex queries, requirements for high performance on both the incoming call and reporting ends, and low cost. Have you ever heard the phrase, "good, cheap, quick -- pick two"? In this case you have a complex problem domain and high performance requirements, and you want low cost, i.e., "good" and "cheap". Then you're going to have to accept "not quick". I hope you don't expect to deploy this sucker any time soon.
--Paul
How important is reliability to you? Do you do lots of writes, and if so are they to several tables?
My gut reaction is that this is the kind of situation where you want a "real" database with ACID. But if reliability isn't that important (or you can, for instance, control power down tightly) then maybe something less can do.
From what I read, MySQL is usually considered to be good for situations with lots of reads and a few simple writes (e.g. websites). Is that your situation or not?
For every expert, there is an equal and opposite expert. - Arthur C. Clarke
I'm sorry to tell this to you, but you will be better off buying CTI software such as Altitude (disclaimer: I work for Altitude). You will pay your developers much more implementing all the different CTI possibilities:
dial, answer, hold, extend, retrieve, conference, transfer, blind transfer, etc, etc.
Also you want to do predictive dialing in the future perhaps, multimedia (web chat, e-mail handling), IVR? It is all in there when you buy it. When you need to write it - a huge amount of work.
When your client wants to switch from Ericsson to Avaya or Cisco? No problem for us, switch a driver and that's it. For you - huge problems.
Convince your client that they buy Altitude + a real database (Oracle or MS SQL Server) and 300K database requests is something we laugh about with the proper hardware.
There still is programming, set up etc to do, but it will be less. And you will end up with a happy client.
Writing your own CTI when you don't have any experience is a recipe for disaster and unhappy clients.
Mark
Assuming a worst-case scenario of those queries being shoe-horned into a typical 8-hour working day, that equates to an average of 35 queries a second, multiplied by some factor to account for peak usage. I have no experience of MySQL specifically but that doesn't sound like an unreasonable ball-park to me - with a decent server, proper indexing and well-written queries I don't see why you should struggle.
I'd also say from experience that either of the main Free/open source options could handily manage the load you've described here. Besides making sure you have robust hardware there are application design and requirements to consider in the choice of RDBMS as well that relate to "quality and not quantity"
Personally I'd choose PostgreSQL 8.x over MySQL because (again from direct experience) is is head and shoulders above any other Free solution out there when it comes to concurrency and its transactional support is a fair bit more mature than MySQL. The main consideration is how update-heavy your application is. The article poster said that these transactions would be done concurrently with reporting and data analysis. If the transactions are essentially a series of INSERT statements then there wouldn't be much impact on performance in MySQL and I'd say stick with it because it is what you have now (if it ain't broke, don't fix it). The performance gap has closed quite a bit in the last few years, but MySQL still has a bit of an edge when it comes to single-table queries and basic INSERTs. If that is what you are doing most then it is great.
However, if you are doing multi-statement transactions with UPDATEs on one or more existing records, and/or INSERT and UPDATE operations that rely on data retrieved from some moderate to complex SELECT statement and this is within a single transaction then you have to be a fair bit more careful with MySQL than with PostgreSQL to avoid concurrency problems. If one client runs such a transaction on a MySQL database there are more situations where the transaction will lock rows and/or tables and all other clients will e blocked until the transaction is finished. Generally it is good practice (regardless of the RDBMS used) to avoid long-running transactions as much as possible, however at times an expensive transaction may be unavoidable
and that is where PostgreSQL's MVCC handles things much better.
Another consideration is how your business logic is handled. I prefer to make use of the database to make sure business rules are followed as it offers, almost without exception, the best performance by far. It is basically impossible to use tools/languages like PHP, Perl, Ruby or Python to handle ground-floor business logic faster or better than the database can. MySQL's heritage is simplicity and performance and thus there has been little emphasis on data integrity. It is pretty slow and tedious to have to rely on higher-level layers to do things like make sure '0000-00-00' and '2007-02-30' are not accepted as valid dates, or ensure referential integrity, or simulate triggers. If you know the data going into the database is already of high quality (it is validated at a higher-level, or it is acquired through automation such as barcodes or RFIDs or industrial controllers) and you are doing straight SELECTs without processing at the DBMS level then MySQL will present no problems and its performance and small footprint might be worth more than the unused capabilities of a heavier database back-end.
However, if you must validate human-entered data at the DBMS level or need to create queries that do calculations or other data manipulations (aggregate functions, concatenations, getting a timespan from different dates, etc) then MySQL is VERY INFURIATING to a seasoned database admin or programmer. This is because while MySQL behaves consistently, it is often consistently WRONG behaviour (by that I don't mean it spits out 2+2=5 or it works differently from howit was designed--I mean wrong in that it is different from how most people thin
You do not use RAID5 for anything other than file systems with large files that need only sequential access.
For databases nothing will hurt performance more than RAID5.
A database will update 13 bits here and there, with RAID5 you need to read all of the (potentially multi-MB) stripe into memory to recompute the parity and then write both the data and the parity.
For a database you will need something without parity, but with redundancy, than means: Mirrors.
If you want more space and more performance you add disks in stripesets and then mirror the stripesets (or stripe mirrors).
You enforce your argument about todays hardware being more than fast enough with that RAID5 crazy talk, but only to a point, because if you had worked with databases too large to fit in memory then you'd know that RAID5 is wrong.
I'd rather say that if you have enough money to fit all of your data into memory then you should do so, after you run out of RAM buying cash then you can start adding nice 15k RPM disks in monsterous RAID0+1 configurations.
MySQL, however, is a nice replacement for a csv file, but it's a piece of shit when compared to any RDBMS that takes ACID seriously.
If you can solve your problem with MySQL then feel free to do so, but invest in a UPS and take frequent backups, because you are going to need both.
-- To dream a dream is grand, but to live it is divine. -- Leto ][
Several rules to follow: 1) NEVER mix batch and production 2) NEVER mix reporting and production 3) Build on a platform that can scale 4) Do not share your CTI instance with ANY other activity. 5) Explore active:active deployment (such as Oracle 10G RAC) with transactional replication between the active nodes (such as shareplex). Our production CTI database uses a 32CPU cluster with 90GB ram, SRDF/S replicated disk between the sites between 2 facilities and BCV snapshots for performing Backup & Recovery (BUR). We are also taking snapshots hourly to maintain RPO/RTO. YMMV.
Armaments, 2-9-21 And Saint Attila raised the hand grenade up on high, saying, 'O Lord, bless this Thy hand grenade' N
No-one ever seems to notice that Ingres (not to be confused with Postgres) is now Open Source (GPL) software. As someone who has been using it for several years, I'm somewhat surprised. It's a mature and powerful RDBMS that can scale to very large systems. As an example, we supply systems capable of supporting upwards of 800 users running multiple complex queries on databases that are 80GB+. Of course, this is running on a 4 CPU Sun Fire V890 with 16GB RAM, but the point is that Ingres scales and is used in production for real mission critical applications.
Oh yeah, the obligatory link: www.ingres.com