Slashdot Mirror


Ask Slashdot: Building a Web App Scalable To Hundreds of Thousands of Users?

AleX122 writes "I have an idea for a web app. Things I know: I am not the first person with a brilliant idea. Many other 'inventors' failed and it may happen to me, but without trying the outcome will always be failure. That said, the project will be huge if successful. However, I currently do not have the money needed to hire developers. I have pretty solid experience in Java, GWT, HTML, Hibernate/EclipseLink, SQL/PL/SQL/Oracle. The downside is the nature of the project. All applications I've developed to date were hosted on a single server or in a small cluster (2 Tomcats with fail-over). The application, if I succeed, will have to serve thousands of users simultaneously. The userbase will come from all over the world. (Consider infrastructure requirements similar to a social network.) My questions: What technologies should I use now to ensure easy scaling for a future traffic increase? I need distributed processing and data storage. I would like to stick to open standards, so Google App Engine or a similar proprietary cloud solution isn't acceptable. Since I do not have the resources to hire a team of developers and I will be the first coder, it would be nice if the technology used is Java-related. However, when you have a hammer, everything looks like a nail, so I am open to technologies unrelated to Java."

11 of 274 comments

  1. Show me the users! by Anonymous Coward · · Score: 5, Insightful

    Before going all-out to reinvent the wheel on yet-another-next-big-thing web app, why not roll out a proof-of-principle version letting someone else competent do the "heavy lifting" back-end work. Use an existing cloud/hosting service like Amazon EC2 (they'll do a lot better on the basic back-end stuff than your "I'm incompetent but building a cloud app anyway" approach). After you get your first hundred thousand users, and have investment rolling in by the gazillions, then you hire your own crack team of cloud experts to design your own custom back-end solution (or just sell out for a couple hundred million to whatever group of suckers thinks your zero-dollar-per-user profit model will start paying off once they hit the million-user mark).

    1. Re:Show me the users! by ATMAvatar · · Score: 5, Interesting

      This. The submitter has made an assumption that there will be hundreds of thousands of users. There might not. The only sure thing is that if he spends all his time trying to build a platform capable of serving hundreds of thousands of users right out of the gate, the project will probably fail before a single user sees it.

      Remember: not even Facebook, Twitter, or eBay started off with platforms capable of handling their current load. They all started with something quick and built things out as their respective user bases grew.

      --
      "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
  2. Silly priorities by Anonymous Coward · · Score: 5, Interesting

    YouTube was a lame app with a basic MySQL setup. Same with Facebook. When it took off, they hired gold people and fixed the scalability issues. Twitter didn't exactly put scalability first either.

    So get real. Don't worry about "hundreds of thousands of users", but about getting something decent out there for users to try. If users come, you'll get scalability sorted out.

    1. Re:Silly priorities by Anonymous Coward · · Score: 5, Informative

      "They then offloaded parts of the infrastructure to Scala of all things.

      http://blog.redfin.com/devblog/2010/05/how_and_why_twitter_uses_scala.html [redfin.com]

      Scala is interesting and has some good paradigms built in to the language for the things Twitter needs to do. Not sure if it is really fundamentally better than Java though - after all it runs on the same JVM."

      Disclaimer: I was a developer at Twitter until last year.

      From the point of view of scalability, Scala is so much more advanced than Java it's not even funny. Ultimately, this boils down to the adoption of immutability as a core concept of the language. In particular, Scala's approach to concurrency is a decade or more ahead of what's in use in Java. Finagle, Twitter's async RPC system, simply wouldn't have been deliverable in a language that makes the use of Futures as difficult as Java does.

      "Plus I like static typing."

      Scala is statically typed.

  3. Start smaller by bfandreas · · Score: 5, Insightful

    Do not plan for hundreds of millions of concurrent users at once right off the bat. That's the very common error a lot of startups make. You do not have such a large userbase. It will take some time until you have.
    Think smaller and scale up when your idea takes off. Set yourself concurrent-user milestones at which you rethink your architecture. You will also have to rethink the iron your stuff runs on, and that may dictate what kind of technology you use once you reach your hundreds-of-millions goal.

    Technology is interchangeable. It's a tool and you choose the best tool for the job and at the moment you have no users and might as well start off with the usual suspects. JSP/Struts, JSF, whatever you are most comfortable with. If in the long run you do find that this is not sustainable and you need to shift to another technology then you can hopefully afford to hire people who know it.

    You really, really should set yourself userbase milestones, plan ahead for reaching them and be prepared when you reach them. For that you need a lot of information. Log how much time users spend on what functionality you offer because this also has an impact on your UI design when you go big. It also has impact on what technology(-ies) you use.
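
    As a rough illustration of the logging suggested above, here is a minimal Python sketch that accumulates time spent per feature. The class name, the feature names and the idea of feeding it (feature, seconds) pairs are all assumptions made for the example; a real app would derive these from its own request or client-side events.

```python
from collections import defaultdict


class UsageLog:
    """Accumulates time spent per feature (hypothetical helper, not a real library)."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.visits = defaultdict(int)

    def record(self, feature: str, seconds: float) -> None:
        # One call per completed visit to a feature.
        self.seconds[feature] += seconds
        self.visits[feature] += 1

    def report(self) -> list:
        # Most-used features first, as (feature, total seconds, visits).
        return sorted(
            ((f, self.seconds[f], self.visits[f]) for f in self.seconds),
            key=lambda row: row[1],
            reverse=True,
        )


log = UsageLog()
log.record("search", 12.5)    # made-up feature names and durations
log.record("profile", 40.0)
log.record("search", 7.5)
for feature, total, visits in log.report():
    print(f"{feature}: {total:.1f}s over {visits} visits")
```

    Even something this crude can show which functionality deserves attention before the next userbase milestone.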


    I usually bill big when I give advice such as this and help set up a plan for when to do what. Your problem is less one of technology than one of business. Think like a businessman first and like a techie second.

    --
    20 minutes into the future
  4. Re:Heroku by Baby+Duck · · Score: 5, Informative

    OpenStack. You can start with a hosting provider like Rackspace that has a faithful implementation of it. I know they were recently pinged for some incompatibility, but they have vowed to fix that. If you still can't stomach it, choose a different OpenStack provider. OpenStack is the key.

    When you get really big, then you can work on running your own datacenter or paying someone to host the hardware for you (again, Rackspace, DreamHost, etc.). Then you can put your own implementation of OpenStack on the hardware with all the customization specific to your needs. This will naturally build on top of your years of investment with the vanilla OpenStack when you were smaller. The progression path is laid out for you.

    I'm replying to this parent because Heroku is also an excellent choice for scaling where you pay as you grow. I'm just not sure if you can later fork Heroku to suit your needs with the datacenter supplier of your choice.

    --

    "Love heals scars love left." -- Henry Rollins

  5. Re:Cultivate Teams, Not Ideas by crutchy · · Score: 5, Funny

    teams are much better at solving problems than individuals

    even this slashdot forum could be thought of as a sort of team, in that many people are coming together to address a problem

    ok there is no leadership and it's full of trolls, shills and idiots... maybe it's not really a team... more like a committee... ok so you're probably doomed

  6. Some research material by leonardop · · Score: 5, Insightful

    I salute you for your ambition and determination. I hope you get to realize your vision.

    Now, as I read your question, I remembered an interview I saw a few days ago with Ben Kamens, one of the engineers working at Khan Academy, talking about scalability and things like how they manage their operation and the spikes of growth they have experienced in the past. It's a little light in technical details, but you may find it interesting: Root Access: How to Scale your Startup to Millions of Users.

    One thing I'd like to mention is that when you hear someone else talk about the things they've done and how they have done it, it's easy to see it as an advertisement for a particular technology platform (AppEngine and other Google machinery in the previous video, for example), but that's not the thing to focus on. Whatever choices other people have made, the good thing is that their advice can be useful no matter what choices you end up taking. I know this seems like such a trivial thing to say, but evidence suggests that a number of people miss this basic concept, and then discussions quickly degenerate into pointless noise about concrete technologies, instead of the ideas.

    I'd also recommend that you pay a visit to the Google Developers YouTube channel and type something like "scale" or "scalability" in the little channel search box. You might learn a few things from some really smart people who have confronted very real situations regarding scalability.

    Best of luck to you, my friend.

  7. Re:Premature optimization by UnknownSoldier · · Score: 5, Interesting

    Agreed. This guy doesn't really understand scalability.

    The OP needs to read how Plenty of Fish started off:
    http://highscalability.com/plentyoffish-architecture

    * PlentyOfFish (POF) gets 1.2 billion page views/month, and 500,000 average unique logins per day. The peak season is January, when it will grow 30 percent.
    * POF has one single employee: the founder and CEO Markus Frind.
    * 30+ Million Hits a Day (500 - 600 pages per second).
    * 1.1 billion page views and 45 million visitors a month.
    * Has 5-10 times the click through rate of Facebook.
    * 2 load balanced web servers, each with 2 Quad Core Intel Xeon X5355 (@ 2.66 GHz), 8 Gigs of RAM (using about 800 MBs), 2 hard drives, runs Windows x64 Server 2003.
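
    To put those figures in per-second terms, here is a quick back-of-the-envelope calculation in Python. The 30-day month and the even split across the two web servers are assumptions; the traffic numbers are the ones quoted in the list above.

```python
# Back-of-the-envelope check of the traffic figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month

page_views_per_month = 1_200_000_000
hits_per_day = 30_000_000
web_servers = 2  # assumption: load is split evenly between them

avg_pages_per_second = page_views_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)
avg_hits_per_second = hits_per_day / SECONDS_PER_DAY
avg_pages_per_server = avg_pages_per_second / web_servers

print(f"average page views/s:        {avg_pages_per_second:.0f}")  # 463
print(f"average hits/s:              {avg_hits_per_second:.0f}")   # 347
print(f"page views/s per web server: {avg_pages_per_server:.0f}")  # 231
```

    The averages come out below the quoted 500 - 600 pages per second, which would be consistent with that figure being a peak rate, although the list does not say so. Either way, it is a few hundred pages per second per box, not thousands.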

    And also about NginX:
    http://www.aosabook.org/en/nginx.html

    If you "need" multiple servers when you are first _starting_ out you're probably focusing on solving the wrong problems.

  8. Re:OT: "why not" by Anonymous Coward · · Score: 5, Funny

    This. I have found it's best to avoid phrases like "why not shut the fuck up," "why not eat shit and die," and "why not stick your unsolicited advice up your ass." People do not react to these phrases as positive suggestions as intended, and they immediately go on the defensive. Instead, try sarcasm.

  9. Start with scalable technologies! by MarkRose · · Score: 5, Insightful

    As someone who has written an application that scales to over 1 billion requests per day, let me offer my thoughts.

    Scaling your application should be as trivial as launching more application server nodes. If you can't add/remove application nodes painlessly, you've probably done something wrong, like keeping state on them (this includes sessions).
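
    One common way to keep sessions off the application nodes (not necessarily what the poster used) is to hand the client a signed token that any node can verify on its own. The Python sketch below uses only the standard library; `SECRET_KEY`, `issue_token` and `read_token` are hypothetical names for this example, and a real deployment would also want expiry and key rotation.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared secret; every node would load the same value from configuration.
SECRET_KEY = b"replace-me"


def issue_token(session: dict) -> str:
    """Serialize the session and append an HMAC so it can travel with the client."""
    payload = base64.urlsafe_b64encode(json.dumps(session, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def read_token(token: str):
    """Return the session dict if the signature checks out, otherwise None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


token = issue_token({"user_id": 42})
print(read_token(token))        # any node holding SECRET_KEY can read this
print(read_token(token + "0"))  # a tampered token is rejected
```

    Because no node remembers anything between requests, nodes can be added or removed without users noticing. The alternative with the same effect is a shared session store that all nodes talk to.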

    Don't worry about scaling your application layer at all (within reason). You can always throw more machines at the application side in a pinch, and for a long while it will be cheaper to add servers than to hire someone. When your application servers are costing you more than a salary, hire someone to find the hotspots in the code and make them faster. Until then it's a waste of your time.

    Scaling state, aka your datastores, is where the challenge lies. You need to spend a large amount of time sitting down and analysing every operation you plan to do with your data. SQL is great for a lot of things, but you will eventually run into a point where heavy updates make SQL difficult to scale. Mind you, decent hardware (lots of cores, RAM, and SSD) running MySQL should scale to several thousand active users if your queries are not expensive. The Galera patches to MySQL (incorporated into Percona XtraDB Cluster and MariaDB) can give you true high-availability, but you will still have write-throughput limitations.

    I would also highly recommend you look into Cassandra (especially 1.2+, with CQL 3), which was built from the ground up to scale across thousands of low-end machines that often fail (if you can't tolerate hardware failure, you messed up). Cassandra is more limited in the kinds of queries you can execute, more relaxed with data consistency, and more thought is needed ahead of time. On the other hand, it can also be used for global replication, which is something you are interested in. At the very least, having a good understanding of its data and query model will open your mind to the kinds of tradeoffs that must be made to enable scaling.
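
    To make the "more limited in the kinds of queries" point concrete, here is a toy Python model of a partitioned store. It is not Cassandra's API and not its real placement scheme (Cassandra maps a hash of the partition key onto a token ring; the sketch just uses hash-modulo). The class and method names are invented for illustration.

```python
import hashlib
from collections import defaultdict


class TinyPartitionedStore:
    """Toy model: each row lives on the node chosen by hashing its partition key."""

    def __init__(self, node_count: int):
        self.nodes = [defaultdict(list) for _ in range(node_count)]

    def _node_for(self, partition_key: str) -> int:
        # Simplified placement: hash modulo node count (an assumption of this sketch).
        digest = hashlib.md5(partition_key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def insert(self, partition_key: str, row: dict) -> None:
        self.nodes[self._node_for(partition_key)][partition_key].append(row)

    def read_partition(self, partition_key: str) -> list:
        # Cheap: exactly one node is consulted.
        return list(self.nodes[self._node_for(partition_key)].get(partition_key, []))

    def scan_where(self, predicate) -> list:
        # Expensive: every node and every partition is consulted.
        return [row
                for node in self.nodes
                for rows in node.values()
                for row in rows
                if predicate(row)]


store = TinyPartitionedStore(node_count=4)
store.insert("alice", {"post": "hello", "tag": "intro"})
store.insert("alice", {"post": "scaling notes", "tag": "tech"})
store.insert("bob", {"post": "cassandra tips", "tag": "tech"})

print(store.read_partition("alice"))                    # lookup by partition key
print(store.scan_where(lambda r: r["tag"] == "tech"))   # query by anything else
```

    Designing for this model mostly means arranging the data so that the queries you actually run look like `read_partition`, not `scan_where`.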

    Contrary to what others are saying, you are correct to think about scaling now before you even start! Doing a rewrite is costly in both money and time. Why set yourself up for that? Before you start is the best time to plan for scale! If you start with a scalable datastore like Cassandra, and structure all your queries to work within its model, it is no more work than doing things in SQL, and you're way ahead of the game!

    The most important part is spending time modeling how you will access your data. Think about how you'll avoid hot spots (which make scaling writes difficult), and think about how to make reads fast by reading as little as possible. Think about caching, and how you'll invalidate the cache of a piece of your data without having to invalidate caches for things that didn't change. (Think about updating on data ingestion instead of running statistics later.) Until you can avoid hot spots, make only small reads, and cache each piece independently, you are not done.
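
    Two of these ideas can be sketched in a few lines of Python. The first class spreads writes to one logical counter over several rows so that no single row becomes a hot spot, and keeps the count up to date at ingestion time. The second caches each piece of data under its own key so that one change invalidates one entry. Both are in-memory stand-ins with invented names, not a real datastore or cache client.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Spreads increments for one counter over several rows to avoid a hot row."""

    def __init__(self, shards: int = 8):
        self.shards = shards
        self.rows = defaultdict(int)  # stands in for rows in a datastore

    def increment(self, name: str) -> None:
        # Updated at ingestion time; each write lands on a random shard.
        self.rows[(name, random.randrange(self.shards))] += 1

    def total(self, name: str) -> int:
        # A small, bounded read: one row per shard.
        return sum(self.rows.get((name, s), 0) for s in range(self.shards))


class PerKeyCache:
    """Caches each item under its own key so invalidation stays fine-grained."""

    def __init__(self, load):
        self._load = load      # function that fetches one item from the datastore
        self._entries = {}
        self.loads = 0         # how often the datastore was actually hit

    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self._load(key)
            self.loads += 1
        return self._entries[key]

    def invalidate(self, key) -> None:
        self._entries.pop(key, None)


views = ShardedCounter()
for _ in range(1000):
    views.increment("front-page")
print(views.total("front-page"))  # 1000

cache = PerKeyCache(load=lambda key: f"profile of {key}")
cache.get("alice")
cache.get("bob")
cache.invalidate("alice")   # only alice's entry is dropped
cache.get("alice")
cache.get("bob")            # still served from cache
print(cache.loads)          # 3
```

    The trade-off in the counter is that reads touch several rows instead of one, which is usually a good deal when writes far outnumber reads.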

    Good luck!

    --
    Be relentless!