Large Scale Web Apps Built on Open Source
prostoalex writes "Brad Fitzpatrick presented at OSCON with an overview of his little project. Interesting facts about the evolution of the LiveJournal back-end architecture."
It's all LAMP.
OMG! Today I had CEREAL!!!!!
With MILK!!!! OMG!!
Uh, like, you mean the Web itself? That's large scale, certainly was built, and is most certainly built on open source.
So, yeah, I reckon it can be done. I'm using the proof-of-concept to submit this comment.
Are you serious?
On the off chance that you are, it's one of the OpenOffice.org formats, inherited from StarOffice... it's supposed to be their answer to MS PowerPoint.
-- [insert sig here]
...right here.
It's powered by GForge, so it's backed by PHP and PostgreSQL.
There are a bunch of other sites running GForge listed here...
The Army reading list
We are using Fedora, Postgres, and PHP for what I consider a rather large-scale application. It is a storage and query system for research on a few million patients. We could have gone with Oracle and Java (...shiver...), or even MSSQL and a Windows server, but why waste money? The only real headache I've had is figuring out that Apache2 is threaded and Postgres/PHP sits on top of some low-level linux code that is not. I could use Apache instead of Apache2 to fix the problem, but I fixed the non-threaded code instead.
The previous comment is purposely vague and generalized, but all of the facts are completely true.
Maypole is a Perl framework for MVC-oriented web applications, similar to Jakarta's Struts. Maypole is designed to minimize coding requirements for creating simple web interfaces to databases, while remaining flexible enough to support enterprise web applications.
Ok, so most of the Journals lack even a scrap of entertainment value... but the data feeds are normally fun. Is there anyone left that hasn't wasted a few bytes on the following url?
http://www.livejournal.com/stats/latest-img.bml
Hint - its a constantly updating list of all the new images posted to journals. After a while you give up waiting for a hot chick to post and decide crazy survey graphics are as good as it gets. And then some hot chick posts her birthday party pictures, but she's only 14 and suddenly you wish you'd spent the day doing something else.
0daymeme.com: Great stuff.
I thought the P meant any or all of the P languages: PHP, Python, Perl.
Back in the .com days, I worked at a huge (now defunct) porn site. We had about 50,000 active hosted sites, 500,000 hit counters and a bunch of other stuff. We were getting tens of millions of page views daily, maxing out two 100 megabit circuits at times. It was all FreeBSD, a little Redhat, Perl, mysql, squid, apache, mod_perl and C. The only real closed stuff we used were BigIPs and traffic monitoring software.
Actually, LAMP can refer to Perl and Python as well as PHP.
The web is really a mixed bag that allows a mix of open standards, and proprietary software. To claim it is all open source is misleading. It is a dynamic network that allows development on multiple layers.
The most important aspect of the web is that the interfaces between the different layers were well defined and exposed... not that each line of code in the different layers is exposed.
It's a pervasive belief among the suddenly famous. IBM, MS, or Sun doesn't need this. It's the small website with a bright idea that is all of a sudden gaining popularity which goes through almost each of the stages described in this document.
This is for people with absolutely no budget and infinite traffic. This is how to live through that and come out winning like Brad apparently has.
If you are looking for scalable OSS solutions, also look into Zope with Zope Enterprise Objects (ZEO).
A little harsh considering the guy's starting point, but it is true that most people / companies don't think things through. I put in a lot of startup web sites in the 90's, and used to give lectures on, among other things, why replicating databases doesn't scale. Looks like people still think that replicating databases is a solution, almost ten years later. It makes me glad I opted out of the e-com performance world, or I'd still be solving exactly the same problems.
Simple lessons:
-replicating databases all over the place doesn't work
-adding lots of servers doesn't work unless the apps are designed to work that way
-object-relational and object databases are useful for a narrow class of problems, and Do Not Scale
-java/perl/etc. are great, but you have to learn some SQL, because sorting data in code is stupid when the database is 10x faster doing it on retrieval
There's the material I used to get $2000 for, for a 1-hour lecture. Share and enjoy.
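For anyone who wants to see the last point concretely, here is a minimal sketch of letting the database do the sorting and trimming instead of pulling every row and sorting in application code. It uses Python's built-in sqlite3 module; the `entries` table, its columns, and the function names are made up for illustration and have nothing to do with any real site's schema.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (title TEXT, views INTEGER)")
    # An index on the sort column lets the database avoid a full sort.
    conn.execute("CREATE INDEX idx_entries_views ON entries (views)")
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?)",
        [("a", 10), ("b", 30), ("c", 20)],
    )
    conn.commit()
    return conn


def top_entries(conn, limit):
    """Ask the database for the top rows, already sorted and limited."""
    cur = conn.execute(
        "SELECT title, views FROM entries ORDER BY views DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(top_entries(make_demo_db(), 2))
```

The application only ever receives the rows it needs, which is the point being made: the alternative is to fetch the whole table and call a sort in code.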
Not large enough scale to survive a Slashdotting...
You need to get over your favorite language/technology/term you read in the trade-rag you read last week. And then you need to get over yourself.
Give it up slashdot crowd. mod_perl is not a valid technology for a large scale website! Perl was designed for a task, and that task was NOT enterprise application development.
Spoken like someone who has never had to build a very large site (doing "real" work) completely in Perl/mod_perl. I can tell you that it most certainly can scale to enterprise needs. Did this guy do it right? I don't think so either but he most certainly learned a valuable lesson. Hopefully other people will study what he has done and improve their own systems based on his work.
For the record, Java wasn't built for enterprise application development either. As with Perl, people discovered that Java had a future there and here we are today.
A properly designed website with n-tier separation will be able to handle a large load and scale infinitely. You'll note that large websites that actually do real things besides logging people's daily problems don't use mod_perl and a thousand servers. There's a reason for this.
You're assuming two dangerous things... (1) That you can't have n-tier and Perl. And (2) that large mod_perl sites require lots of servers. To believe either of these things is to demonstrate your horrific misunderstanding of computer science in general. I pity the company that lets you design their architecture. Wait, no I don't.... I'll gladly take their money for fixing your mistakes.
Oh yeah, and let us not forget some other languages that are showing promise... specifically Python+Zope. In fact, I know of several people implementing n-tier applications with PHP on the front, Python in the middle and PostgreSQL in the back with much success.
And for the record, here are some large companies and sites heavily using mod_perl.
Want more?
Most of the time those numbers are four or more times that high. It's early in the afternoon, this isn't a peak time.
...
Anyway, those are only the number of entries being posted. For every entry being posted, there are a ton of inserts actually going on:
* log2 table to contain some metadata about the entry
* logtext2 table to contain the actual text
* logprop2 table (multiple rows, 3-5) containing other metadata about entry
So, four times the traffic, about 6 inserts each, 2400 updates per second--and that's just for posting entries. We get a lot more traffic from people posting comments (which also do 3 or 4 update/inserts each comment), plus people editing their userinfo, uploading new userpics,
While LiveJournal definitely isn't a huge site, it's not a lightweight, and it's definitely doing pretty well for having around 80 machines and doing 30-40 million fully dynamic page views a day.
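To make the "one post, several inserts" point concrete, here is a rough sketch of what posting a single entry could look like. The table names (log2, logtext2, logprop2) come from the comment above; the columns, the `post_entry` function, and the use of sqlite3 are assumptions for illustration only, not the actual LiveJournal schema or code.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical versions of the three tables named above."""
    conn.executescript(
        """
        CREATE TABLE log2 (entry_id INTEGER PRIMARY KEY,
                           user_id INTEGER, posted_at TEXT);
        CREATE TABLE logtext2 (entry_id INTEGER, body TEXT);
        CREATE TABLE logprop2 (entry_id INTEGER, prop TEXT, value TEXT);
        """
    )


def post_entry(conn, user_id, posted_at, body, props):
    """Write one entry and return how many rows were inserted."""
    with conn:  # commits on success, rolls back if any insert fails
        cur = conn.execute(
            "INSERT INTO log2 (user_id, posted_at) VALUES (?, ?)",
            (user_id, posted_at),
        )
        entry_id = cur.lastrowid
        conn.execute(
            "INSERT INTO logtext2 VALUES (?, ?)", (entry_id, body)
        )
        conn.executemany(
            "INSERT INTO logprop2 VALUES (?, ?, ?)",
            [(entry_id, name, value) for name, value in props.items()],
        )
    # One metadata row, one text row, plus one row per property.
    return 2 + len(props)
```

With four properties on an entry, that is six row writes for one post, which lines up with the "about 6 inserts each" figure above.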
As a paying subscriber of Livejournal, I can say the only reason I even have an account is because of the friends that I have who use it. I would never use it as a case study for any technology. It's got huge performance problems, data loss issues, and usability issues. This may not be the fault of using OSS, but it definitely doesn't help it look good.
There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
Some may find it interesting that Wikipedia (covered earlier today on Slashdot) uses some code that came out of LiveJournal for caching: memcached.
Simpy
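For those who haven't seen the pattern, the general idea behind putting a cache like memcached in front of a database can be sketched as below. This is not memcached's client API: `DictCache` is a plain in-process dict standing in for a real cache server, and `cached_lookup` and the loader function are hypothetical names used only to show the read path.

```python
class DictCache:
    """Stand-in for a cache client; keeps values in a local dict."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss, like many cache clients do.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached_lookup(cache, key, load_from_db):
    """Return the cached value, loading and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the expensive call we want to avoid
        cache.set(key, value)
    return value
```

The second request for the same key never touches the database, which is where the savings on a read-heavy site come from. A real deployment also has to deal with expiry and invalidation on writes, which this sketch leaves out.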
It's somewhat amusing that in the first load balancing example, one of the points of failure was Kenny. Especially since Kenny ALWAYS DIES.
Karma: It's all a bunch of tree-huggin' hippy crap!