Slashdot Mirror


User: ArijitMukherji

ArijitMukherji's activity in the archive.

Stories: 0
Comments: 2

Comments · 2

  1. Re: Simplify the problem, use a metrics based appr on Ask Slashdot: Capacity Planning and Performance Management? · Score: 1

    Exactly. That is one of the things we consider in the blog.

  2. Simplify the problem, use a metrics based approach on Ask Slashdot: Capacity Planning and Performance Management? · Score: 3, Informative
    This is exactly the situation we ran into when we launched our SaaS platform, SignalFx, to general availability. Internally it is composed of 15-20 different microservices, which made capacity planning a big challenge. We blogged about our experience in "Metrics based approach to capacity planning". SignalFx is a metrics-based monitoring platform, so, in a meta way, we used SignalFx to capacity-plan for SignalFx's launch.

    TL;DR version of our lessons and suggestions:

    1. Design your architecture to be loosely coupled, so that it is possible to capacity-plan for each sub-component independently. Break one complex problem into N simpler ones.
    2. Identify the 'limiting system resource' for each component individually (i.e. what will hit the wall first: CPU, memory, network, etc.). You can do this through a combination of experimentation and plain, simple reasoning based on an understanding of how the component works.
    3. Identify a business metric that correlates with the utilization of the limiting resource (e.g. API calls per second, number of logged-in users, or whatever fits).
    4. Use analytics/math to project the total capacity of the system and how much free capacity you have. Make sure to leave enough buffer; e.g. most services won't run very well at 99.99% CPU.
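
    Steps 3 and 4 can be sketched in Python. This is a minimal illustration, not the method from the blog post: it assumes an ordinary least-squares linear fit is adequate, and the sample numbers and the `fit_linear` and `load_at_ceiling` helpers are hypothetical. Pearson's r gives a rough check of how closely the business metric tracks the limiting resource, and the fitted line is then solved for the load at which utilization reaches a chosen ceiling (90% here).

```python
import math

def fit_linear(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r close to 1.0 means the metric tracks the limiting resource well.
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def load_at_ceiling(ceiling, slope, intercept):
    """Business-metric value at which utilization reaches `ceiling` (e.g. 0.90)."""
    return (ceiling - intercept) / slope

# Hypothetical samples for one component: API calls/sec vs. CPU utilization (0-1).
api_calls_per_sec = [1000, 2000, 3000, 4000, 5000]
cpu_utilization = [0.11, 0.19, 0.31, 0.40, 0.49]

slope, intercept, r = fit_linear(api_calls_per_sec, cpu_utilization)
max_load = load_at_ceiling(0.90, slope, intercept)
print(f"r = {r:.3f}; about {max_load:.0f} API calls/sec at 90% CPU")
```

    A weak r would suggest picking a different business metric (step 3) before trusting the projection.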

    At the end, you'll have something like this for each component of the system: "If I'm CPU-bound on component X, the CPU usage of X goes up linearly with API calls/sec, and I'm currently at 5000 API calls/sec at 50% CPU, then I have total capacity for 9000 API calls/sec (with a 10% buffer) and free capacity for another 4000 API calls/sec."
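
    The arithmetic in that example can be written as a small function. It is a sketch that assumes utilization is proportional to load (the line passes through zero) and reads "10% buffer" as leaving 10% of the limiting resource unused, which is the reading that yields 9000; `capacity_plan` is a hypothetical name.

```python
def capacity_plan(current_load, current_util, buffer=0.10):
    """Project total and free capacity for one component.

    Assumes utilization of the limiting resource grows linearly with load
    and is ~0 at zero load, so load at 100% = current_load / current_util.
    `buffer` is the fraction of the resource deliberately left unused.
    """
    if not 0 < current_util <= 1:
        raise ValueError("current_util must be in (0, 1]")
    if not 0 <= buffer < 1:
        raise ValueError("buffer must be in [0, 1)")
    load_at_full = current_load / current_util
    total = load_at_full * (1 - buffer)
    free = total - current_load
    return total, free

# The worked example: 5000 API calls/sec at 50% CPU, 10% buffer.
total, free = capacity_plan(5000, 0.50, buffer=0.10)
print(f"total: {total:.0f} API calls/sec, free: {free:.0f} API calls/sec")
# total: 9000 API calls/sec, free: 4000 API calls/sec
```

    The same function applies to each component with its own limiting resource and business metric; a negative `free` value means the component is already past its buffered ceiling.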

    Now divide and conquer: give each component owner the responsibility to manage the capacity of their system, based on the business needs provided to them.