Domain: danga.com
Stories and comments across the archive that link to danga.com.
Comments · 80
-
Quizilla.com
I run quizilla.com, a pseudo-entertainment site that does 60-70 million pages a month, at least two-thirds of them dynamic and database-backed.
The site FAQ has the gritty details, but basically everything is running on 8 web servers with a cluster of 4 database servers. Mod_perl is used for the most heavily trafficked pages, though some less-used pages are still plain CGIs.
For the way I have it set up, this farm has reached its limit: the web servers get pegged pretty constantly during peak hours, and the database servers aren't far behind (mostly due to lack of RAM).
The site makes heavy use of memcached, as well as a homebrew ghetto load-balancing system based on Apache mod_rewrite and some ancillary code.
If I had my druthers, I'd keep the number of machines but make the web heads 2.8-3 GHz Xeons or Opterons with 1.5 GB of RAM each, and the database servers dual 1.8 GHz Xeons with at least 3 GB of RAM each. Ideally memcache would get at least 2 GB, but more is always better. My guess is that a setup like that would serve my site at 100 million quick pages a month, instead of what happens now, where pages often take 5 seconds or more.
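For a rough sense of what those monthly figures mean per second, here is a back-of-envelope sketch. It assumes a 30-day month and traffic spread evenly across the 8 web heads; the 3x peak factor is a placeholder assumption, not a number from the comment.

```python
# Back-of-envelope conversion from monthly page views to request rates.
# Assumptions: a 30-day month, perfect load balancing, and a hypothetical
# peak factor of 3x the average rate (the real ratio is not stated).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_pages_per_second(pages_per_month: float) -> float:
    """Mean request rate if traffic were spread evenly over the month."""
    return pages_per_month / SECONDS_PER_MONTH


def per_server_peak(pages_per_month: float, servers: int, peak_factor: float = 3.0) -> float:
    """Peak rate each web server must absorb, assuming even load balancing."""
    return average_pages_per_second(pages_per_month) * peak_factor / servers


if __name__ == "__main__":
    for monthly in (65_000_000, 100_000_000):  # roughly today vs. the goal
        avg = average_pages_per_second(monthly)
        peak = per_server_peak(monthly, servers=8)
        print(f"{monthly:,} pages/month: {avg:.1f} pages/s average, "
              f"~{peak:.1f} pages/s per web server at an assumed 3x peak")
```

At 100 million pages a month this works out to roughly 39 pages per second on average, or about 14.5 per web head at the assumed peak, which makes a 5-second page render look very expensive.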
One big thing that you don't really notice until you try to build things at this scale is that optimization is king: optimize the hell out of your code. A stray regex might not look expensive, but when it's running twenty times a second on every machine, it quickly adds up.
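To make "it quickly adds up" concrete, here is the arithmetic for the regex example, using the twenty runs per second and the 8 web servers mentioned above. The 0.5 ms per run is a made-up illustrative cost, not a measurement.

```python
# How a small per-request cost adds up across a cluster.
# From the comment: ~20 runs/second on each of 8 web servers.
# Assumption: 0.5 ms of CPU per regex run (illustrative only).
SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_day(runs_per_second_per_machine: int, machines: int) -> int:
    """Total executions across the whole cluster in one day."""
    return runs_per_second_per_machine * machines * SECONDS_PER_DAY


def cpu_hours_per_day(runs: int, ms_per_run: float) -> float:
    """CPU time those executions consume, in hours."""
    return runs * ms_per_run / 1000 / 3600


if __name__ == "__main__":
    runs = runs_per_day(20, 8)
    print(f"{runs:,} regex runs/day, ~{cpu_hours_per_day(runs, 0.5):.1f} CPU-hours/day")
```

That is about 13.8 million runs and roughly 1.9 CPU-hours a day for one innocent-looking line, which is why profiling the hot paths and replacing a regex with a plain string operation where one suffices tends to pay off.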
Code is almost always the weakest link in a big cluster, in that things are seldom planned sufficiently. I've had huge growing pains because I never planned on scaling past one machine, so moving to 2, 3, 4 and then 8 machines has been a real hassle, making things work "right" in a massive cluster. Plan for clustering from the get-go if you have even the slightest inkling the site will see high traffic volumes. -
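Following up on "plan for clustering from the get-go" in the Quizilla comment: one way to do that is to make sure no code ever opens "the" database directly, and every query goes through one function that maps a user to a database. The sketch below is hypothetical (the class and method names are made up, and in-memory SQLite databases stand in for separate servers), but it shows the idea: growing from 1 machine to 8 changes the mapping, not every call site.

```python
import sqlite3


class ClusterRouter:
    """Hypothetical per-user database routing.

    Every query for a user goes through db_for_user(), so growing from one
    database to several changes this mapping instead of every call site.
    In-memory SQLite databases stand in for separate database servers.
    """

    def __init__(self, num_clusters: int):
        self.clusters = [sqlite3.connect(":memory:") for _ in range(num_clusters)]
        for conn in self.clusters:
            conn.execute("CREATE TABLE posts (user_id INTEGER, body TEXT)")
        # Explicit user -> cluster table (rather than user_id % N), so a
        # user can later be moved to another cluster by updating one entry.
        self.assignment: dict[int, int] = {}

    def db_for_user(self, user_id: int) -> sqlite3.Connection:
        if user_id not in self.assignment:
            # Place new users on the cluster with the fewest users so far.
            counts = [0] * len(self.clusters)
            for cluster in self.assignment.values():
                counts[cluster] += 1
            self.assignment[user_id] = counts.index(min(counts))
        return self.clusters[self.assignment[user_id]]

    def add_post(self, user_id: int, body: str) -> None:
        conn = self.db_for_user(user_id)
        conn.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, body))
        conn.commit()

    def posts_for(self, user_id: int) -> list[str]:
        conn = self.db_for_user(user_id)
        rows = conn.execute(
            "SELECT body FROM posts WHERE user_id = ? ORDER BY rowid", (user_id,)
        )
        return [body for (body,) in rows]
```

In a real deployment the user-to-cluster table would itself live in a small shared database rather than in process memory.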
Take a look at livejournal's setup
Akamai for static content, and take a look at LiveJournal's setup for dynamic content (master-master replication based on MySQL).
Other people are much more qualified than I to answer the number of servers questions though. -
Re:How to Suck in 21 days!
You can also speed up dynamic websites with caching - with the memcached software.
Slashdot, Livejournal, and other sites use that tool.
-
Re:You answered my question b4 I asked, partially
Memcached. I think you'll find that is actually the way it should be.
-
Memcached
Or you could use something like Memcached. Works with pretty much any language, and tonnes less hassle. (Thanks LJ ;)) -
Re:Application scalingmemcached
Wow. I see. Thanks a lot.
-
Re:Application scaling
That's exactly what memcached fixes. memcached pools memory across a cluster of machines: you just fill up all your servers with cheap memory and add them to the memcached pool, which leaves you with gigs and gigs of ultra-quickly accessible data.
This is what powers slashdot as well
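The "pool" works because the memcached servers don't talk to each other: the client hashes each key to pick exactly one server, so adding a box adds its memory to one big logical cache. A minimal sketch of that client-side choice, assuming a simple hash-modulo scheme (the server addresses are made up; 11211 is memcached's default port):

```python
import zlib
from collections import Counter


class ServerPool:
    """Client-side key distribution over a pool of cache servers."""

    def __init__(self, servers):
        self.servers = list(servers)

    def server_for(self, key: str) -> str:
        # CRC32 gives the same answer in every process (unlike Python's
        # built-in hash() for strings, which is randomized per process).
        return self.servers[zlib.crc32(key.encode("utf-8")) % len(self.servers)]


if __name__ == "__main__":
    pool = ServerPool(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
    spread = Counter(pool.server_for(f"user:{i}") for i in range(3000))
    print(spread)  # roughly a third of the keys on each server
```

The catch with plain modulo is that adding or removing a server remaps most keys, causing a burst of cache misses; consistent hashing is the common fix for that.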
-
Re:Um... No?
Just about every single hit is DB backed. At peak times, they get over 1000 hits per second. Read about the infrastructure. It's interesting: http://www.danga.com/words/2004_oscon/oscon2004.pdf -
Re:./ed !!!! Server Reboot Time?
You can read Brad's presentations on LiveJournal's setup. The LISA one is the most recent, I think.
-
Re:./ed !!!!
The Alexa link was the only tangible example I could find. I distinctly recall seeing a post by Brad himself mentioning how much more traffic LJ handles, but obviously I can't link to it at the moment.
Anyway, as of Google's last crawl of the stats page (shortly before the outage), there were almost 6 million LJ users, a little under half of those "active." I don't know if /. has any stats available, but skimming through this page, the highest UID I see is in the 800,000 range. I'm not going to even attempt to guess what the relative activity level of LJ users is compared to /., or which has bigger pages or whatever, but I would offhand say that LJ probably handles more image traffic (user pictures, and now the in-testing photo hosting service). I know they used to use Akamai for that, but I seem to recall that fairly recently they switched over to doing something else. (I think they handle it themselves again, but I'm not sure.) There's also the audio files from phone posts. I'd say there's little question that LJ is the more heavily trafficked site.
Besides, a lot of the DB load on Slashdot is eased tremendously by Memcached, developed by... Danga Interactive, i.e. LJ. Wikipedia uses it too, and just started using Perlbal. (And I do mean "just") Ditto for Audioscrobbler/Last.fm. So /. isn't in much of a position to pooh-pooh the technical ability of Brad/LJ. -
Um... No?
Do not pass go, do not collect $200.
Look, Perl rubs me the wrong way. I loathe it, and it makes me wanna hurl. More than that - it's Postgres that rocks my DB world. But personally, I think I'd at least read up on LJ's infrastructure before bashing it.
I mean they've got what? 2.5 million active users?
And how many hits are DB-backed?
Sweet fuck, man. How many servers do you think they're wasting? Assuming no redundancy (ha!), right now they're sitting at an approximate ratio of about 25,000 users per server! What morons they must be to not be squeezing more out of them. (And yes, I know that I'm way oversimplifying, but... really?) -
Good for SA
This move will be good for SA, because LiveJournal has some excellent thinkers and programmers. Okay, their users might tend to be a bit juvenile, but LiveJournal's architecture is pretty amazing. It's great what the team have managed to do with limited resources, they've developed some really hot technologies, like memcached, which even Slashdot uses now.
I just hope technology migrates from LJ to SA's products, rather than the other way round. No TypeKey or comment spam on LJ please! -
LiveJournal is more interesting than you think
Most geeks seem to react to hearing "LiveJournal" with something along the lines of "haha, livejournal sucks! it's just a bunch of 12-year-old girls complaining about their parents!" However, the service is quite interesting from a geek perspective: They run a pretty huge web application (700-800 pageviews per second at peak, most of them database-backed), and Brad has written quite a bit about the challenges and solutions they've come up with. They've also written several very interesting open source infrastructure applications like memcached (used by Slashdot) and perlbal. Thus, while the service may not be all that interesting, the tech behind it certainly is (at least to this geek).
-
Don't forget postgresql.
I'm using LAPP (Linux + Apache + PHP + Postgresql) for http://www.coku.com/ ,
maybe I should also add memcached: http://www.danga.com/memcached/
D Moon
-
Memcached?
Your sharedance software is interesting. I don't know if you are aware of memcached (http://www.danga.com/memcached/, by the LiveJournal guys), but if so, did it lack something that prompted you to write your own?
-
Re:ipvs, LAMP
Look into what Slashdot does for high performance. I forget the name of the software, but it's a distributed caching type system; Linux Journal had an article about it and it looked very interesting.
I think you are referring to memcached. -
Learn from LiveJournal.com
Hi OP,
you may want to read this from the creator of LiveJournal.com: http://www.danga.com/words/2004_oscon/oscon2004.pdf Good Luck! -
LJ - Memcached - Wikipedia
Some may find it interesting that Wikipedia (covered earlier today on Slashdot) uses some code that came out of LiveJournal for caching: memcached.
-
Livejournal Backup..
Hrrm.. I imagine that would only ever have happened as a mistake, never as an unannounced deliberate action. I cannot imagine Brad being as unrepentant and arrogant as Dave here. (Another /.er has said that Dave apparently has quite a reputation for arrogance.)
LJ is a completely different level of outfit; their scale is huge. They also created and released the open-source memcached, now a standard way of accelerating databases on very heavily trafficked sites.
Anyway, there is finally a LiveJournal backup program: it downloads your LJ to your local computer. -
Re:Already in use
For those others reading this, Memcached rocks as a general-purpose distributed memory cache.
-
MediaWiki and other wikis
Also take a look at MediaWiki, the open source wiki that runs Wikipedia. It was developed specifically for that purpose, but is now also used by our spin-off projects Wiktionary, Wikiquote and Wikibooks (the latter is an attempt to create free textbooks for use in education, and has already made some good progress). All of these projects are organized under the non-profit Wikimedia Foundation. More projects, such as a wiki news site, are on the horizon.
MediaWiki is also used by non-Wikimedia projects. Among the more interesting ones is Disinfopedia, an encyclopedia of propaganda, and Wikitravel, a travel guide. Star Trek fans will want to take a look at Memory Alpha.
Because of Wikipedia's constant server problems, MediaWiki has been refined to be very scalable. It caches almost everything and uses Livejournal's memcached to keep important data in memory. It also has support for Squid proxy servers. Aside from that MediaWiki comes with a huge set of features, many of which are found in few other wikis:
- section editing - edit not a whole page, but just a small subsection of it (great for large pages)
- automatic image rescaling
- LaTeX support for mathematical formulas
- message transclusion - create messages that can be used
- namespaces to separate article content, user pages, image descriptions and discussions; message notification for user-to-user messages
- plenty of query functions to examine the relationships between articles (articles which have many links to them but don't exist, articles which have no links to them, very long/short articles etc.)
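The first two of those reports are simple graph queries. A minimal sketch, assuming the link structure is available as a dictionary mapping each existing article to the set of titles it links to (MediaWiki itself keeps links in database tables; this is not its implementation):

```python
from collections import Counter


def link_reports(links: dict[str, set[str]]) -> tuple[list[tuple[str, int]], list[str]]:
    """Return (wanted, orphans) for a wiki link graph.

    wanted:  titles that are linked to but don't exist, most-linked first
    orphans: existing articles that no other article links to
    """
    existing = set(links)
    incoming = Counter(
        target
        for source, targets in links.items()
        for target in targets
        if target != source  # ignore self-links
    )
    wanted = sorted(
        ((title, n) for title, n in incoming.items() if title not in existing),
        key=lambda pair: (-pair[1], pair[0]),
    )
    orphans = sorted(title for title in existing if incoming[title] == 0)
    return wanted, orphans
```

For example, with `{"Paris": {"France", "Seine"}, "France": {"Paris", "Europe"}, "Lyon": {"France", "Seine"}}`, the wanted list is `[("Seine", 2), ("Europe", 1)]` and the only orphan is `"Lyon"`.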
-
Re:Joke in Topic!
And for caching we all know about memcached, right? Slashdot uses it, and so does LiveJournal.
-
Re:This article is intended to be read by humans
Using memcached (which slashdot does), they have a few options at hand:
- Store gzipped content in the memcached daemon, pulling it out when needed
- Use the time saved (saved in the sense that you no longer need to pull the comments from a database every single time) to gzip content and send it to the user
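The first option might look something like this: compress the rendered HTML once, keep the compressed bytes in the cache, and hand them straight to clients that accept gzip. A plain dictionary stands in for memcached here, and `render_comments()` is a hypothetical placeholder for the expensive rendering step.

```python
import gzip

cache = {}  # stand-in for a memcached client


def render_comments(story_id: int) -> str:
    """Hypothetical expensive step: build the comment HTML from the database."""
    return "<ul>" + "".join(
        f"<li>Comment {i} on story {story_id}</li>" for i in range(200)
    ) + "</ul>"


def comments_response(story_id: int, client_accepts_gzip: bool):
    """Return (body_bytes, extra_headers) for the comments page."""
    key = f"comments:gz:{story_id}"
    compressed = cache.get(key)
    if compressed is None:
        # Render and compress once; every later hit reuses these bytes.
        compressed = gzip.compress(render_comments(story_id).encode("utf-8"))
        cache[key] = compressed
    if client_accepts_gzip:
        return compressed, {"Content-Encoding": "gzip"}
    # Clients without gzip support get a decompressed copy.
    return gzip.decompress(compressed), {}
```

Storing compressed bytes also stretches the cache's memory further, and helps large pages stay under memcached's default 1 MB item size limit.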
-
Re:Some facts
PHP provides persistent connections, so the system does not need to reconnect to the database on each hit. It also has client extensions for caching tools like memcached. Charts and graphs are typically handled with third-party libraries. It's not exactly hard.
-
Re:What is slashdot doing?
Last I recall this being discussed, they said they had a beast of a quad-CPU MySQL 4 server as primary, with slaved replicas for read-only things.
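A primary with read-only slaves usually means the application decides per query where to send it: writes to the primary, plain reads spread across the replicas. A hypothetical sketch of that routing decision (strings stand in for real database connections, and the class name is made up):

```python
import itertools


class ReplicatedDB:
    """Route writes to the primary and spread reads over read-only replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._rotation = itertools.cycle(self.replicas)  # simple round-robin

    def connection_for(self, sql: str):
        statement = sql.strip().upper()
        is_plain_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if is_plain_read and self.replicas:
            return next(self._rotation)
        # Writes, locking reads, and everything else go to the primary.
        return self.primary
```

One caveat: replicas lag the primary slightly, so a read that must see a write the same request just made should also go to the primary.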
They're using InnoDB tables more and more (see the slashcode), along with memcached: http://www.danga.com/memcached/ -
Re:Probably a dumb question
One problem for Slashdot has been all the comments, even if the page that is output is cached somewhat. One solution that livejournal.com created, now being investigated by /. and SourceForge, is memcached, from Danga, the 'parent company' of LiveJournal. It caches comments and user information in memory so they don't have to be dragged out of a database, even one as quick as MySQL. -
Re:Isn't this idea saturated??
There's a difference between LJ, the codebase, and LJ.com, the site. LJ.com is owned by a for-profit business called Danga Interactive, Inc. LJ also has a community to discuss how to make money from LJ.
(I have no connection with LJ other than keeping a journal there-- the above is mostly just stuff I found with Google.)