ArijitMukherji · Slashdot Mirror


User: ArijitMukherji

ArijitMukherji's activity in the archive.

Stories: 0
Comments: 2
First seen: 2015-08-10
Last seen: 2015-08-11

Comments · 2

Re: Simplify the problem, use a metrics based appr on Ask Slashdot: Capacity Planning and Performance Management? · 2015-08-11 03:13 · Score: 1

Exactly. That is one of the things we consider in the blog.
Simplify the problem, use a metrics based approach on Ask Slashdot: Capacity Planning and Performance Management? · 2015-08-10 08:38 · Score: 3, Informative
This is exactly the situation we ran into when we launched our SaaS platform, SignalFx, to general availability. Internally it is composed of 15-20 different microservices, which made capacity planning a big challenge. We blogged about our experience here: Metrics based approach to capacity planning. SignalFx is a metrics-based monitoring platform, so, in a meta way, we used SignalFx to capacity-plan for SignalFx's launch.
TL;DR version of our lessons and suggestions:
1. Design your architecture to be loosely coupled, so that you can capacity-plan for each sub-component independently. Break one complex problem into N simpler ones.
2. Identify the 'limiting system resource' for each component individually (i.e. what will hit the wall first: CPU, memory, network, etc.). You can do this through a combination of experimentation and plain, simple reasoning based on an understanding of how the component works.
3. Identify a business metric that correlates with the utilization of the limiting resource (e.g. API calls per second, number of logged-in users, or whatever fits your system).
4. Use analytics/math to project the capacity of the system and how much free capacity you have. Make sure to leave enough buffer; e.g. most services won't run very well at 99.99% CPU.
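Steps 2 through 4 can be sketched roughly as follows. This is a minimal illustration, not SignalFx's actual method: it assumes each resource's utilization grows linearly with the business metric, fits that line with ordinary least squares, and treats the resource that reaches a chosen utilization ceiling (here a hypothetical 90%, i.e. a 10% buffer) at the lowest metric value as the limiting one. The function names and sample numbers are made up for illustration.

```python
# Hypothetical sketch: find the limiting resource for one component by
# fitting utilization = intercept + slope * business_metric per resource.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def limiting_resource(metric_samples, utilization_samples, ceiling=0.9):
    """Return (resource, max_metric) for the resource that hits `ceiling` first.

    metric_samples: observed business-metric values (e.g. API calls/s).
    utilization_samples: resource name -> utilization fractions (0..1),
        observed at the same moments as metric_samples.
    ceiling: highest utilization we are willing to run at (assumed buffer).
    """
    best = None
    for resource, utils in utilization_samples.items():
        intercept, slope = fit_line(metric_samples, utils)
        if slope <= 0:
            continue  # does not grow with the metric, so it is not the limit
        max_metric = (ceiling - intercept) / slope
        if best is None or max_metric < best[1]:
            best = (resource, max_metric)
    return best


# Made-up measurements for a single component.
api_calls = [1000, 2000, 3000, 4000]
utilization = {
    "cpu": [0.15, 0.25, 0.35, 0.45],
    "memory": [0.40, 0.42, 0.44, 0.46],
}
resource, max_rate = limiting_resource(api_calls, utilization)
print(f"{resource} limits first, at about {max_rate:.0f} API calls/s")
# prints "cpu limits first, at about 8500 API calls/s"
```

Memory starts higher here but grows slowly, so CPU saturates first; a real system would also need to check that the linear fit actually holds near the ceiling.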
At the end, you'll have something like this for each component of the system: "If I'm CPU-bound on component X, CPU on X goes up linearly with API calls/s, and I'm currently at 5000 API calls/s at 50% CPU, then I have total capacity for 9000 API calls/s (with a 10% buffer) and free capacity for another 4000 API calls/s."
Now divide and conquer: give each component owner the responsibility for managing the capacity of their system, based on the business needs provided to them.
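The arithmetic in the component X example works out as below. This sketch assumes, as the example implies, that utilization is proportional to the metric (zero load means zero CPU), so 50% CPU at 5000 calls/s extrapolates to 100% at 10000 calls/s, and the 10% buffer caps usable CPU at 90%. The function name is hypothetical.

```python
# Hypothetical helper: capacity of a component whose limiting resource
# scales proportionally (a line through the origin) with a business metric.

def capacity_from_linear_model(current_rate, current_util, buffer=0.10):
    """Return (total_capacity, free_capacity) in units of the business metric.

    current_rate: current value of the business metric (e.g. API calls/s).
    current_util: current utilization of the limiting resource (0..1).
    buffer: fraction of the resource kept in reserve.
    """
    max_util = 1.0 - buffer                          # e.g. 90% usable CPU
    total = current_rate * max_util / current_util   # proportional scaling
    free = total - current_rate                      # headroom left today
    return total, free


total, free = capacity_from_linear_model(5000, 0.50, buffer=0.10)
print(f"total={total:.0f} API calls/s, free={free:.0f} API calls/s")
# prints "total=9000 API calls/s, free=4000 API calls/s"
```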