Domain: danga.com
Stories and comments across the archive that link to danga.com.
Comments · 80
-
Don't worry
Just some things I've learned over the years while working on high and low volume websites:
* Spend your energy coming up with the product and figuring out your customers' needs. Chances are you won't run into scaling problems until later. Your first goal is to get that far.
* What you think will be the bottleneck when you start out will probably not be it. The ugly part is that you won't know what it is until it hits you.
* Read through some of Brad Fitzpatrick's presentations at http://danga.com/words/ (They're mostly variations on the same theme, pick one of the later ones). Yes it's 6 years old at this point, but little has changed. OK, maybe schemaless datastores. But look at what livejournal did on commodity technology.
* Don't fall into the temptation of using sexy technology because it solves a problem you don't have yet. You can do a heck of a lot with MySQL and Postgres.
* Your choice of technology isn't as important as your development practices. Automate your testing. Automate your deploys. Automate your testing. Stick with the languages you know.
* Measure. Something like New Relic will help you spot your problems and fix them. -
Re:My God...
Amazon's S3 is based off of MogileFS (the concept, not the code): http://danga.com/mogilefs/
And if you want to run an S3 compliant system internally, you'll use openstack.org's object storage system:
http://www.openstack.org/projects/storage/
Ability to provide object storage services at multi-petabyte scale
Free open source software, no licensing fees, ‘open-core,’ or ‘freemium’ model
Written in python; easy to differentiate your offering with extensions and modifications
Compatibility and established ecosystem with industry standard OpenStack API
Support for Amazon S3 API for easy inbound migration
Completely multi-tenant, with billing integration hooks
Pluggable authentication mechanism for SSO integration
Integrated reseller model allows for resale of services -
Re:SparkleShare uses SVN for the dirty work...
Argh. It should be using a light version of MogileFS or Walrus for storage.
-
Re:Umm... no.
The author is pulling numbers out of his ass and has no clue about what uses most time (waiting for database results mostly), about PHP accelerators and about caching systems like memcached.
He's comparing the performance of a PHP script running on a raw PHP installation versus a C++ version of the same script, doing calculations that almost never apply to real world scenarios. I don't see how any company would use C++ to develop their whole systems except maybe for some CGI scripts. Not even Google does it; afaik they use Perl and Python a lot.
Anyway, the number of servers has no direct correlation to the programming language. Out of those thousands of systems, lots of them are read only database servers in a cluster, lots are only serving static files (thumbnails, images used in CSS files on people's pages and so on), some servers are used solely for memcached instances and content used very rarely, some are load balancers....
Basically, the author has no clue.
I always found Livejournal's presentation about scaling very insightful, especially as it's a pretty big site, just like Facebook and other big time sites. The second link gives a lot of details about how they fine tune mysql and other parts of the system, which just goes to show how the apparent speed improvement of C++ versus PHP can overall be actually insignificant.
http://video.google.com/videoplay?docid=-8953828243232338732&ei=3VUuS5-hLaKi2ALXqanJBQ&q=livejournal#
http://www.danga.com/words/2004_mysqlcon/mysql-slides.pdf -
Unfortunate for Hadoop
I've been in the market for a distributed, clustered file system for some time. Unfortunately, Hadoop is not really what I'm looking for. What I'm looking for:
1) Redundancy - no single point of failure.
2) Suitable for standard-sized file I/O.
3) Performance that doesn't completely suck ass.
4) Graceful re-integration when bringing a cluster portion back online.
5) Accessible through standard interfaces. (EG: Posix F/S)
6) Doesn't require a PHD in the technology to administer.
7) Doesn't require insane quantities of cash to build.
8) Stable.
There are clustered file systems that have some of these qualities. None that I've found so far have *all* of these qualities.
Hadoop fails on #1, #2, and #6. It has a single nameserver commanding the cluster, so if it goes down, well... (shrug) It also does poorly for "normal" sized files; somehow having a 10 GB file is the norm for Google. And setting a multiple node cluster up is definitely non-trivial.
Of all that I've reviewed, GlusterFS did the best but even in that case, I ran into severe over-serialization that brought my 6-node cluster to its knees. I tried three times to roll it out, and had to roll back all three times. I fiddled with the brick setup and caches for days before finally throwing in the towel.
Now I get by with rsyncing program files, and a homegrown data distribution setup using network sockets and xinetd. Not optimal to be sure, but so far it's scaled linearly and provides decent performance, at the price of a PHD in said technology. I guess you could compare our technology to MogileFS, only our scheme
A) uses DNS records to coordinate the cluster so that it scales up,
B) has a richer "where is the file" schema than the simple flat keys used by Mogile, and
C) has the ability to execute programs against files for performance. (EG: grep for searching text files, tar/gzip for compress/uncompress, virus scans, etc)
D) has the ability to "hang open" for activities like logging.
So far, this has held up well with about 500,000 file operations and millions of log entries per business day with an average file size of about 1-3 megabytes and every sign that growth can continue by simply stacking on more hardware. No, I'm not talking about massive throughput, but I *am* talking about the need for high availability systems that scale nicely without bottlenecks and exorbitant expense. Yes, it works pretty well, but we've had to invest significant programming time to do this.
Guess it's like the old engineering saw: Convenient, Cheap, Quality: pick any two!
-
Shilling - not always obvious
It's not always obvious when an account is a shill on twitter.
For instance, did you know that the twitter account memcached is a shill for a company named Gear6 rather than an official twitter by the memcached team or Danga Interactive's owner, Sixapart?
-
Re:1000+ a day is trivial have you thought of amaz
Perlbal is still going strong too.
-
caching dynamic content
There is a common understanding that a single server can serve static data at volumes many orders of magnitude beyond the scenario described.
But for dynamic content that triggers database queries, memcached is a must. -
Re:ah, stupid.
It is fair to say, "Any key/value database will be at least as fast as any relational database", since one degenerates to the other. However, I find it quite easy to believe that there are a good number of optimizations that can be applied to a key/value database that don't apply to relational systems with foreign key integrity. There are more constraints, and more constraints usually lead to a more efficient implementation. For an example, check out memcached.
-
Re:Can't take recommendations seriously
It's worth noting that the performance of the sites you list is probably a better example of memcached performance (which I believe is used by all the sites you give, though I stand to be corrected) than of MySQL per se, though certainly the database is an important part of the equation for a massive public online deployment.
-
Re:Color Me Confused
Mod parent up!
This question is one that appears to not yet have been raised in the OpenID security discussion. In these times of phishing attacks on OpenID this should bear heavy on the mind.
For more information, this article is a good jumping off point. -
Try a distributed filesystem
The first thing you need to know about RAID5 is that it's pretty unreliable; if you lose one device (and subsequently replace it) then the array has to read every sector from every other device in order to rebuild the data. Any unrecoverable sector error on any device will result in a corrupt sector in your rebuilt array.
RAID1 duplicates devices. Although your storage requirement is now 2x the quantity of data being stored (as opposed to, say, 1.25x), the chance of error on rebuild is a lot smaller.
However, all inexpensive RAID solutions suffer from the problem that your devices are on a single server - they're a single point of failure, and if, for example, your server's power supply fails and fries the parts in the case, all copies of your data may be destroyed.
To mitigate that problem you could try a distributed filesystem. Your files would actually be distributed among multiple servers and the filesystem would ensure replication. MogileFS is one such; although it does not provide a POSIX filesystem view, it is nevertheless pretty easy to use. There are various distributed filesystem projects around, including Ceph, Kosmos, and Venti.
Although these projects are at varying stages of completeness and you may need to be a bit brave to trust them with your important data, the promise of distributed filesystems is high availability and extensibility.
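The overhead and rebuild-risk figures in the comment above can be checked with a little arithmetic. This is only a rough sketch: the 5-disk array (which is what yields 1.25x), the 500 GB disk size, and the unrecoverable read error rate of one per 1e14 bits are assumptions for illustration, not numbers from the comment.

```python
import math

def raid5_overhead(n_disks):
    """Raw capacity needed per unit of stored data on RAID5: n / (n - 1)."""
    return n_disks / (n_disks - 1)

def rebuild_error_probability(n_disks, disk_bytes, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading every
    surviving disk in full during a RAID5 rebuild.

    ure_per_bit is an assumed per-bit error rate, treated as independent."""
    bits_read = (n_disks - 1) * disk_bytes * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

print("RAID5 overhead, 5 disks:", raid5_overhead(5))
print("RAID1 overhead:", 2.0)
print("Rebuild error chance, 5 x 500 GB:", rebuild_error_probability(5, 500e9))
```

Under these assumptions the rebuild has to read about 1.6e13 bits, so the chance of hitting at least one bad sector is far from negligible, which is the point the comment makes.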
-
Re:He's missing real world experience
How do you sell it to the business if their budget is getting squeezed (I know this applies to physical servers as well, but if you have capacity in the virtual cluster to fit their app in it's a lot harder to say no).
Oh, you're looking at it from a salesman's point of view, rather than a customer's. That can't be good for your customer. Since Xen is an open source project RedHat's new approach using KVM could prove more interesting.
1 dual processor/8 core server running Oracle with in-memory cache option and support: roughly $200,000.
50 dual processor/8 core servers each running several VM's of postgresql with pgpool-II and memcached: roughly $200,000. The freedom to PXEBoot a blank box into a replicant node faster than you can rack a box: priceless.
Depending on your customer's workload, one of these choices might be better than the other and vice versa. Now, which one are you going to recommend in every case?
-
Re:why
Consider an in-memory database. OK. Instead, you'd like at most only partitions of the data where massive working-sets reside on each partition and do inter-data operations. Got it. Can't find a link, but I'm thinking specifically the hashing mechanism. Given a key, I can find which node should be caching that key. Thus for certain problems that do not nicely break down into small messages, you are indeed limited to single-memory-space hardware. I'm not sure I've seen such a problem. For example, the CPU cache alone is an example of what happens when you break a problem down into smaller chunks.
I can see where a single memory space might do better, though. a simultaneous 700 thread application is NOT hard to write in java at all. Once you know how, I suppose. Consider that most programmers who use threads find ways to deadlock on one or two cores.
The reason I'm drawn to message-passing systems is that pretty much any higher-level abstraction is a Good Thing, as far as threads are concerned. I've come to believe that threads are as harmful as GOTOs. Sure, we'll use them under the hood, but we really need something more structured on top of them.
Also: Message-passing and shared memory are not mutually exclusive. If the message is being passed between, say, two Erlang "processes" on the same machine, I see no reason the contents of that message need to be copied, even if those "processes" are in different OS threads. -
Re:Why have physical storage at all?
> Why not just bounce chunks of data around forever on the Internet?
Yeah, great idea, because SRAM buffers in network switches and routers are so much cheaper per GB than hard drives or memory sticks. If you want to do that, build your own damn Internet that you can clog up without bothering the people who are trying to use the current one. Networks are already the bottleneck for a lot of things.
It is true that high-speed networks can have a lot of data in-flight, though. Esp. over long-distance high-speed fiber-optic links. The speed of light is not _that_ high compared to the switching speed.
A more sensible idea would be a distributed data-replication network, where people offer storage in return for being able to store their own data off site. (encrypted of course.) I think I've heard of projects like that. I think I'm thinking of freenet, if that's what it's called, though. Where you put a file on the net, and it's sent to hubs that request it. So unpopular stuff doesn't get replicated. So I guess I haven't heard of anything quite like that for a distributed network. There is MogileFS, though, for when all the machines are trusted (I think), and form a single filesystem. -
Re:Will it be used?
Stability isn't critical for my applications. Raw speed is however. A decrease in speed would be rather bad.
Memcached? -
As one of the comments on the blog ...
...entry says;
"You seem to not have noticed that mapreduce is not a DBMS."
Exactly. These are the same sort of criticisms that you hear around memcached - the feature set is smaller, etc - and they make the same mistake. It's not a DBMS, and it's not supposed to be. But it does what it does quite well nonetheless! -
Re:Speed and Protection
Ruby is too slow for what I need to do... I know, I know, do more caching do more magic, get more computers, build a server farm, etc.
Or, you could keep your C#, do more caching magic, get more computers, build a server farm, and still win the speed game. Besides, adding an object cache is generally trivial in any language, if you have access to something like memcached, and adding an HTTP cache (sending 304 Not Modified) is as simple as checking a few timestamps.
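The "checking a few timestamps" step for 304 Not Modified might look roughly like the sketch below. The function name `conditional_get` and its return shape are made up for illustration; a real framework will have its own hooks for this.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(if_modified_since, last_modified):
    """Decide between 200 and 304 for a resource.

    if_modified_since: raw If-Modified-Since header value, or None.
    last_modified: timezone-aware UTC datetime of the last change.
    Returns (status_code, headers)."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_time = None  # unparseable header: fall through to a full response
        if client_time is not None:
            if client_time.tzinfo is None:
                # Header carried no usable zone; HTTP dates are defined as GMT
                client_time = client_time.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds
            if last_modified.replace(microsecond=0) <= client_time:
                return 304, headers
    return 200, headers

changed = datetime(2022, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_get("Sat, 01 Jan 2022 12:00:00 GMT", changed)[0])
print(conditional_get(None, changed)[0])
```

On a 304 the server skips generating and sending the body, which is where the saving comes from.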
-
Just another contribution
FWIW, lots of the powerful bits that make Movable Type great have been GPL'ed for some time: Data::ObjectDriver, XML::Atom, memcached. And of course, OpenID has been an open standard for a while now, too.
-
Re:Cache what you can
Frameworks like Hibernate allow you to cache the results of SQL calls so that if the same SQL is reissued (even between different users) the cache reads the result, not the database. Usually you can pick which calls are cached versus which ones have to be live.
Also consider looking at something like memcache, which is a very fast distributed caching mechanism. You can use it to cache more than just SQL queries, too.
-
Re:Can anyone compare this to Jonathan Lemons Kque
The big ones that come to mind are broken support for TCP_NOPUSH:
http://lists.danga.com/pipermail/memcached/2006-March/002024.html
..and the fact that it can't be used with stdin:
http://lists.apple.com/archives/darwin-dev/2006/Apr/msg00072.html
The latter in particular is super irritating, at least to me.
I have yet to see any decent-sized project using events that didn't need Mac OS-specific configuration. -
Re:Prior art - memcached
I can think of some very similar products/etc, for example memcached:
http://www.danga.com/memcached/
You can have multiple memcached servers servicing multiple front ends (just ask wikipedia.org!)
Or just put an SLB in front of a bunch of web services boxen that connect to a NAS/SAN. The SLB distributes the requests across the boxen and tracks which request went where to keep sessions from breaking. It's all been in use for many years, and sounds about like their "One Click" patent:
1. combine widely used or obvious computer systems or "internets technology"
2. Patent this combination
3. ?
4. Profit!(TM)
-
Re:Prior art - memcached
I would've thought it was more like MogileFS (http://www.danga.com/mogilefs/) if anything. (MogileFS is the distributed file store that LiveJournal uses for image storage).
-
Prior art - memcached
I can think of some very similar products/etc, for example memcached:
http://www.danga.com/memcached/
You can have multiple memcached servers servicing multiple front ends (just ask wikipedia.org!)
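Several front ends can share several cache servers because each client works out, from the key alone, which server should hold that key. The sketch below is a simplified illustration of the idea using a plain hash-and-modulo scheme. Real memcached client libraries use their own hash functions, and the server addresses here are placeholders.

```python
import hashlib

def pick_server(key, servers):
    """Map a cache key to one server from the list, deterministically.

    Every front end that uses the same server list and the same function
    will pick the same server for the same key, with no coordination."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

# Placeholder addresses, for illustration only
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
for key in ("user:42", "page:/index", "session:abc"):
    print(key, "->", pick_server(key, servers))
```

One drawback of plain modulo hashing is that adding or removing a server remaps most keys, which is why many clients offer consistent hashing as an alternative.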
-
Try out MogileFS
We've been using MogileFS on commodity Linux servers for a few months now and it's been working great. The MogileFS community/mailing list is very active, so it's actually been fun to implement.
Right now we have 22.8 TB spread across six 2U servers using a mix of 400 and 500 GB SATA drives. The great thing is that we can lose an entire file server (or two) with no downtime or loss of data.
Another reason to like MogileFS is that it removes the need to maintain RAID arrays. A RAID-5 array made of 750 GB disks is very risky. A high-end controller will still take many hours to rebuild a degraded array, during which time you could lose another disk and be largely screwed. (This actually happened to us very early on and we lost 0.02% of our data after restoring from backup, which still hurt.)
-
Re:This is spot on -- I did some benchmarking, too
Memcached is an excellent suggestion -- especially since it is a distributed cache. Of course, there is overhead; some time ago, I did a series of tests using ApacheBench, trying to establish just how big the performance penalty is.
In purely local tests (i.e. ignoring network overhead) performance dropped 40% with the introduction of memcached. Over a rather slow 10 Mbps LAN, the performance degradation was only 10%. Note that memcached was still local to the server in the second series of tests -- only the request from ApacheBench went over the LAN.
Much of the memcached overhead was due to marshaling; in the non-memcached version, all objects lived entirely in memory. Increasing the number of objects in memcached 10 times resulted in a massive 70% performance drop for the LAN-based tests.
So, if you cache few objects (or plain strings), and there is little communication between the machines in your server farm, memcached performance will be close to pure in-memory performance. On the other hand, if there is lots of local I/O to handle a request, and you maintain a complex set of objects in memcached, you will take quite a hit.
The above is not the fault of memcached -- it is just that when designing distributed systems, you are actually trying to reduce the amount of communication inside the cluster. This is similar to multi-threaded design, where you must try to reduce the number of threads and especially their interaction with each other (I covered this in one of my articles for O'Reilly).
In my system (FlightFeather) I do maintain quite a lot of in-memory state to improve performance. For example, a special subclass of the float type (this is in Python) helps create a fast session cache. In addition, the operating system helps when you are using files directly, by maintaining a cache on its own. The most important thing, however, is to try to generate static content for frequently accessed material. The authors of memcached already know that static content is "boring, easy" (PDF; see page 30). If you want reliability and performance, boring and easy is where you want to go :-) -
Re:This is spot on -- I did some benchmarking, too
You're probably using it, but if you're dealing with caching (as most Web programmers should), look at http://www.danga.com/memcached/
-
Re:Multiple different storage engines....
Whether OSTG uses a "customized version of MySQL" has absolutely nothing to do with what happened with the threaded comment posting being disabled a few months ago. That was a schema issue, plain and simple.
I've never heard they used *anything* custom in MySQL. InnoDB, MyISAM, replication and multiple masters on some beefy hardware is the general consensus. For caching, it's mainly memcached, just like a lot of the bigger sites. -
How automakers contribute to global warming...
[Danga] / wcmtools / memcached / autogen.sh 1.6
autoconf sucks
autoconf sucks
autoconf sucks
autoconf sucks
autoconf sucks
autoconf sucks
autoconf sucks
That's 105 bytes to say what could have been said in 15 (including whitespace)!
-
Interesting question
I run a community website which is written in Perl with a MySQL back end.
Despite having just under 5000 users I had 3 million hits last month, and shifted 13 Gb of traffic. Not bad for a single (dedicated) host!
There are two things that I'd suggest above all:
- Minimize database queries
- Caching, caching, and more caching
I use Danga's memcached which has a perl interface, but there are PHP ones too. This allows me to sensibly cache database queries (don't forget to test things to make sure you expire the cache appropriately!)
A combination of minimising queries and caching has kept me going even under a slashdotting.
If you have written the site code yourself I'd urge you to add a test suite. My site runs a full test suite every day, and I run it manually whenever I make changes - this allows me to be sure that I'm not breaking things when I make changes.
Of course the standard development model of having a "live" site and a "test" site helps here too. I develop the code on a laptop and store it under version control (CVS in my case, but it doesn't matter which system you use as long as you pick one) and only when it has passed the test suite do I push it to the live site.
Adding extra hardware can be an option for bigger sites, but I'm not at that point now. I had my biggest strain when the site reached around 1000 users, since then things keep ticking over nicely, and although it is growing it isn't growing terribly quickly which suits me fine. (There are a lot of users who visit the site via google searches and never register/return; I'd like to fix that, but I don't mind too much!)
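The query-caching pattern described above, including the reminder to expire the cache, can be sketched as follows. `TinyCache` is a hypothetical in-process stand-in so the example runs without a server; with a real memcached client the get/set/delete calls would go over the network instead. `cached_query` and its parameters are likewise illustrative names.

```python
import time

class TinyCache:
    """In-process stand-in for a memcached client (get/set/delete with expiry)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # entry is stale: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)

    def delete(self, key):
        self._data.pop(key, None)

def cached_query(cache, key, run_query, ttl=60):
    """Return the cached result for key, running the query only on a miss.

    Note: a result of None is never cached in this sketch."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.set(key, result, ttl)
    return result

cache = TinyCache()
print(cached_query(cache, "recent_posts", lambda: ["post 1", "post 2"]))
# After writing to the database, drop the key so readers see fresh data
cache.delete("recent_posts")
```

The expiry time bounds how stale a cached result can get, and the explicit delete after a write is the part worth covering in the test suite.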
-
Re:CPAN!
two identical setups, one that uses mod_php and the other that uses PHP CGI will run identically.
If they are identical setups, sure.
I've run into minor/annoying problems when upgrading PHP on hosts though. The defaults of the language will change (eg. RegisterGlobals), and the available extensions can get broken in upgrades.
There are a lot of times when taking a working PHP script from one host to another will result in errors and require a fiddle to get sorted. By contrast perl is simpler to use since the defaults rarely change, and installing dependency modules is usually simple.
Personally most of my code is written in Perl with Danga's Memcached used to cache database results.
-
Re:Scheduling Priority is for sissys
Slashdot has a *lot* more users than either though. Although some times it can seem otherwise, the good comments can show though... you just need to browse at +4 and ignore anything posted <= 25 minutes after a story is posted. :)
Hey, I browse at +4 already. 90% of my foes are those MMLM people with "free" iPods in their sigs. That has gotten rid of most of the college kids. I'm a subscriber and give bonus points to friends, friend of friends, interesting, informative, yada yada. I've been reading slashdot before it was slashdot, AKA chips and dips. I really like slashdot for the discussions. I wish that there was a more professional side to it. Personally, this topic about nice is way too low for me. I would love to see discussions about software trends, especially things like learning software, cool new APIs or libraries or things like memcached which drives slashdot and other high volume DB based sites. I've used memcached successfully and really like it. I would like to see Linux topics like the preemptive kernel patches. I like the MySQL/PostgreSQL/Oracle debates, but even those are not led by very informed people. Basically, I would like a more experienced, professional twist to the discussions. I guess the saying that "Knowledge shared is power lost" is true. The people that know keep it to themselves, while the people that know next to nothing will tell you all about it, with confidence and conviction, yet no working experience is behind the wisdom. -
Re:Doesn't scale?
You're basically praising memcached; which isn't MySQL specific at all. Read Danga's documentation.
Memcached is effectively its own simple associative-array style database, stored purely in memory. It is expected you use another database for the real, persistent storage and then cache the results of database queries using memcached. -
Re:Waiting
Not true in taller buildings when you're going to/from the upper floors and you have to stop at every. damn. floor. on the way up/down.
Yup. The worst is when you're on the ground floor, want to go to the 3rd or 4th, and someone hops into the elevator as the door is closing, and pushes the 2 button.
Stairs are next to elevators. I believe only service and freight elevators should stop on each floor. For people, stop on every other and walk up or down the stairs to the next floor. People do that many times a day in a 2 story house, why can't they go up or down one flight at work? Actually, the same would apply if you live in a highrise as well.
Check this out from the FTA:
In time, the new Fujitec system becomes even more efficient at grouping passengers by learning elevator-use patterns, said Rennekamp, whose team of engineers pioneered the software for the system. It does this by considering historical information to learn traffic variances in the building.
"The predictive logic in our software acts like neurons in our body, parking (the elevators) at certain floors, knowing where the demand might be at certain times."
They call it wrong though. It's not predictive, it's learned from the past. This is where computing is going.
Google "learns" the misspellings through context and usage, it is not fed the dictionary. Slashcode, "learns" what is in the database for a while. It does that via http://www.danga.com/memcached/ I believe that is correct.
What the memcache enables is a larger "working memory" like you do when you repeat a phone number so you won't forget it. -
Re:Start with a scalable pipe
Some good comments there :)
I can't see your diagram, but I'd certainly echo the use of Danga's memcached. I use it upon my site, and found that I save a lot of database access via the caching.
There's a brief introduction to memcached with perl I wrote to explain it for newcomers, but bindings are available for PHP, and many many other languages.
Secondly, I'd look at cheap clustering with pound. This is much better than using round-robin DNS, as another poster mentioned, since it avoids clients getting sent to "dead" hosts. It also allows you to redirect visitors to specific backends for particular requests.
Using dedicated machines for serving static content and images may be useful since it frees your primary server(s) to concentrate on the heavyweight CGI stuff.
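The advantage over round-robin DNS mentioned above comes from the balancer knowing which backends are alive. The sketch below illustrates that idea only: it is not how pound is implemented, and `BackendPool` and its methods are hypothetical names.

```python
import itertools

class BackendPool:
    """Round-robin over backends, skipping any that are marked dead."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._alive = set(self._backends)
        self._cycle = itertools.cycle(self._backends)

    def mark_dead(self, backend):
        # Would be called by a health check that failed
        self._alive.discard(backend)

    def mark_alive(self, backend):
        if backend in self._backends:
            self._alive.add(backend)

    def next_backend(self):
        # Look at each backend at most once per request
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if candidate in self._alive:
                return candidate
        raise RuntimeError("no live backends")

pool = BackendPool(["web1", "web2", "web3"])
pool.mark_dead("web2")
print([pool.next_backend() for _ in range(4)])
```

Plain round-robin DNS has no equivalent of `mark_dead`, so some clients keep being handed the address of a host that is down.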
-
Re:Cache, cache and CACHE
And Memcached!
-
Memory Cache
For those interested the livejournal people released a while ago their source code for memory cache.
memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
-
MogileFS
Check out MogileFS http://www.danga.com/mogilefs/ . It is open source and might meet your requirements.
-
Here's a couple to look at
Compete File System at http://www.python.org/pycon/2005/papers/46/CompeteFileSystem.pdf.
MogileFS at http://www.danga.com/mogilefs/ -
MogileFS from livejournal
Livejournal developed their own distributed filesystem:
http://www.danga.com/mogilefs/
It's scalable and has nice reliability features, but is all userspace and doesn't have all the features/operations of a true POSIX filesystem, so it may not suit your needs. -
Re:Caching
If somebody could show me some in process caching that'd take me a long way to ditching php. I don't mean caching to disk either. Caching to disk is much slower than memory. Ideas?
idea. -
Re:Caching
Try memcached: http://www.danga.com/memcached/ and php-mcache http://www.klir.com/~johnm/php-mcache/
memcached is a lightweight, fast distributed cache that you communicate with over TCP. php-mcache is a C extension for PHP which allows you to interface with it. There is some overhead to talking over TCP instead of in process, but the benefit is that you can move the cache to a separate server or cluster of servers; keeping the cache in process only scales so far. -
Re:high availability of the service
If you find yourself running a bunch of servers all with similar spec/config, you should consider removing the disks from them and netbooting off a single image on another server (or a single image available on 2 other servers just in case). Disks are far more likely to break than any other component imo, far more likely than fans or PSUs if you ask me.
As for RAID5, it's not always practical, but bear in mind if you buy all your disks from the same mfg at the same time, your chances of concurrent failure are increased. (the batch the disks came from may be suspect). You could buy disks from different manufacturers. Hot spares are always handy too.
The webservers for last.fm are all diskless and boot off a single debian image. Makes it helluvalot easier to upgrade/update them. We use Perlbal (from the LiveJournal crew) as a reverse-proxy load balancer, which works nicely. -
Re:PHP != Crap Code
Grr...
I never claimed that Jedit is an IDE for PHP. I simply said I used Jedit. There are IDEs for PHP though. More than one. Get that through your thick skull. Stop saying what you know to be false. There are IDEs for PHP.
You said this in the previous thread, "Right. ANd PHP has IDEs. Stop putting this strawman up. There are IDEs for PHP. I happen to use and like Jedit." So you literally said, "There are IDEs for PHP. I like Jedit." Now maybe you didn't mean to imply that Jedit is an IDE, but it sure as hell sounded like it. (And JEdit's code complete is very basic... it can complete for loops, but it can't introspect packages for methods that are in the object [or maybe it now can, it couldn't a year ago])
Finally IDEs have nothing to do with the scalability of PHP.
They do in terms of development.
How does that make your application more scalable? I told you that my application scaled to 20 hits per second without breaking a sweat. I calculated that it could probably do 100 hits per second but we never even came close to that.
They do in terms of development. I wasn't arguing performance scalability at that point, I was arguing reinventing the wheel every time (validating code, parameter check code, database connection pooling code, insert sql statements for object code, caching code.)
This has got to be the most insane conversation I have ever had. How can you simultaneously claim that PHP is unable to cache and then list one free and one paid product that enable you to cache?
Sorry... I assumed you knew what those products were. They're not parts of PHP (certainly memcached isn't... it's AN ENTIRELY SEPARATE DAEMON APPLICATION that can be used in ANY language.) They're other APPLICATIONS that you can use with PHP to store stuff in resident memory (since PHP can't DO THAT by itself without a bytecode compiler.) Your response is insane. It's like saying "Man, PHP can totally play Sam & Max because I can run SCUMMVM in Unix and Php runs in Unix too!"
That's just a flat out lie. By the time you get your classpath figured out and your XML written the PHP application will be halfway to done. How? Just go visit the PEAR library and see for yourself.
I know java very well thank you.
Ok you don't know Java. It takes about 1 second to resolve a class path in a modern development environment (like Eclipse). It takes about 1 minute to make an ANT build script that contains the line: "<classpath><pathelement path="${classpath}"/></classpath>"? How hard is it to copy your jars to the common/lib directory of tomcat or WEB-INF/lib directory of your webapp? You're telling me you can write half a PHP application faster than I can type "cp *.jar $TOMCAT_HOME/common/lib"? Shit, you are good.
Again, coding a little System.out.println("Hello World") application does not mean you know Java.
That application will not be more scalable than PHP. The only time Java is more scalable than PHP is when you have to distribute your application server amongst multiple servers.
It will certainly be more scalable than PHP without some bytecode optimizations or some in-memory cache products like memcached. -
Re:just what the world needed
Their site is cumbersome, but it is useful when it works. I don't identify with much of the content, but I do enjoy how easy it is to communicate with my pals.
The site is complex and high volume. A LOT of information is being tracked and served. I believe that much of it was originally done in Cold Fusion whereas now all of the sections are subdomains with pages brought to you by Windows. I think they would benefit from studying livejournal's memcached approach to their problem. -
Re:Cache
You can also cache the database queries if you're happy to mess with code.
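A minimal sketch of what caching a query in code can look like, in Python. The `DictCache` class is a hypothetical stand-in for a memcached client (only `get` and `set` are modeled), `cached_query` is an illustrative helper name, and SQLite stands in for whatever database is in use. The query text and parameters are hashed into a short key because memcached keys are limited in length and cannot contain whitespace.

```python
import hashlib
import sqlite3


class DictCache:
    """Hypothetical stand-in for a memcached client; only get/set are modeled."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss, like typical memcached clients.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cached_query(cache, conn, sql, params=()):
    """Return rows for a query, hitting the database only on a cache miss."""
    digest = hashlib.sha1((sql + repr(params)).encode("utf-8")).hexdigest()
    key = "sql:" + digest
    rows = cache.get(key)
    if rows is None:
        rows = conn.execute(sql, params).fetchall()
        cache.set(key, rows)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES ('hello')")

cache = DictCache()
query = "SELECT title FROM posts WHERE id = ?"
first = cached_query(cache, conn, query, (1,))   # miss: reads the database
conn.execute("DELETE FROM posts")
second = cached_query(cache, conn, query, (1,))  # hit: served from the cache
```

The second call returns the cached rows even though the table is now empty, which is the trade-off: the code that writes to the database also has to expire or overwrite the matching cache entries.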
This introduction to memcached gives an overview of using memcached, which is used by /., LiveJournal, etc. -
Memcached
-
Problems with OpenId
I've expounded on why OpenID is insecure and I believe it is unnecessarily complicated.
Problems with OpenID
I put off reading the OpenID spec because I thought it was probably flawed. Now I just feel like applying my head to my desk.
OpenID is led by this philosophy: "The point of OpenID is to be dead simple, short-comings and all, so it's actually adopted."
The above is taken from a discussion of vulnerabilities. The problem with this lowest common denominator approach is that it's horribly broken. OpenID is currently no better than just giving the URL of your blog.
The number one problem is the complete lack of integrity checking. Everything in OpenID seems to be perfectly happy to let their requests be modified in transit. I think the problems with this are pretty damn obvious: nothing can be trusted. Fortunately, fixing this is pretty simple: use TLS. In today's shared hosting environment, you probably want to require support for server name indication.
Another brilliant idea: transmit the key that you'll use for signing later in plaintext.
"Yes, you can ask for DH-SHA1 encryption and get back a plaintext secret. If this troubles you, don't use the handle and instead use dumb mode with that server. (and if somebody sniffed the plaintext secret, it won't matter, since you'll never accept queries using that assoc_handle). If the server can't do DH, it's probably limited in some way, but using dumb mode is still safe, if not a little slower."
I believe "limited in some way" means "completely insecure." "Dumb mode" is not safe because there's no key associated with the server, so there's no way to ensure you're talking to the same one or that someone isn't tampering.
I also don't see much point in using a symmetric key for speed and security when you're just encrypting a short string. It's so tiny that both improvements are similarly small.
Perhaps the biggest problem with OpenID is its reliance on sending a user to another page to log in. It's just too easy to spoof a page and fool most people. Even better, you can open a window using Javascript and hide the location bar. Even if you normally use TLS, most people probably won't notice if it's missing or the certificate is different. Also, most sites (including LiveJournal) include a completely insecure assurance that you're secure. For example, LiveJournal says "LiveJournal Secure Site"
A simpler and more secure alternative
The only way to fix this is (gasp) get users to carry their own keys. If you stored your key in a bookmarklet or extension, you could sign something with it. This is completely feasible because Javascript cryptography implementation is done. You could submit your public key with the signed comment. If you wanted to associate yourself with a URL, all you need to do is link to a page with the public key. If the same public key can be used for the signature... That's right, no special identity server is needed. The public key could be submitted directly or it can be linked to. It might be a pain to write out the entire URL to the key, so perhaps autodiscovery-from-HTML should be supported:
<link rel="openpgp.key" href="http://www.livejournal.com/pubkey.bml?user=atrustheotaku" />
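A rough sketch of how a site could do that autodiscovery, in Python with only the standard library. The `rel="openpgp.key"` value is the one proposed above; the `KeyLinkFinder` and `find_key_urls` names and the example.com URL are made up for illustration. It only locates the key URL. Fetching the key and verifying the signature are left out.

```python
from html.parser import HTMLParser


class KeyLinkFinder(HTMLParser):
    """Collects href values from <link rel="openpgp.key" ...> tags."""

    def __init__(self):
        super().__init__()
        self.key_urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here
        # by HTMLParser's default handle_startendtag.
        attributes = dict(attrs)
        if (
            tag == "link"
            and attributes.get("rel") == "openpgp.key"
            and attributes.get("href")
        ):
            self.key_urls.append(attributes["href"])


def find_key_urls(html):
    """Return every advertised public key URL found in the page source."""
    finder = KeyLinkFinder()
    finder.feed(html)
    return finder.key_urls


# Hypothetical page; the URL is a placeholder, not a real key location.
page = (
    "<html><head>"
    '<link rel="openpgp.key" href="http://example.com/pubkey.asc" />'
    "</head><body>my blog</body></html>"
)
urls = find_key_urls(page)
```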
Note that no TLS is needed. The signature is secure in and of itself. If you want to support all the fanciness (e.g. revocation) of OpenPGP (spec), then you just need the -
Re:does RR have ....
The sessions in Rails are actually provided by the library underlying Rails (cgi.rb), which by default dumps the sessions into the filesystem, but you can replace that backend with anything you like, e.g. memcached.
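To illustrate the idea of a replaceable session backend, here is a small Python sketch. It is not the Rails or cgi.rb API: `SessionManager`, `MemoryStore`, and the `load`/`save` methods are names invented for this example. The point is only that the session code talks to a tiny storage interface, so a filesystem store, a database store, or a memcached-backed store could be dropped in without touching the rest.

```python
import json
import secrets


class MemoryStore:
    """Hypothetical backend; a memcached-backed store would expose the same two methods."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return self._data.get(session_id)

    def save(self, session_id, blob):
        self._data[session_id] = blob


class SessionManager:
    """Serializes session data and delegates storage to whatever backend it is given."""

    def __init__(self, store):
        self.store = store

    def create(self):
        session_id = secrets.token_hex(16)
        self.store.save(session_id, json.dumps({}))
        return session_id

    def read(self, session_id):
        blob = self.store.load(session_id)
        # Unknown or expired sessions come back as an empty session.
        return json.loads(blob) if blob is not None else {}

    def write(self, session_id, data):
        self.store.save(session_id, json.dumps(data))


sessions = SessionManager(MemoryStore())
sid = sessions.create()
sessions.write(sid, {"user": "alice"})
current = sessions.read(sid)
```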