Getting a Grip on Google Code
netbuzz writes "Niall Kennedy reports on his blog that Guido van Rossum, author of the Python programming language, has begun showing off his first project since joining Google last year. 'Mondrian is a Web-based code-review system built on top of a Perforce and BigTable backend with a Python-powered front-end,' Kennedy writes. 'Mondrian is a pretty impressive system and is currently in use across Google.' Kennedy's description of Google's current code-review system sure makes it sound like it was in need of an upgrade. 'The Mondrian tool creates a much better workflow by creating task-specific dashboards, in-line commenting, well-tracked statistics, and more,' he writes. 'The application is built on top of Python open source libraries such as the Django framework, smtpd.py mail service, and the wsgiref Web server software.'"
8-bit characters are exactly what it /does/ support. It's multi-byte characters that are often seen as the problem, although UTF-8 is also supported (Unicode generally, however, is a different matter). Ruby can also support load balancing and HTTPS, although since those aren't relevant to a programming language per se, it's intriguing why you bring them up (unless I've fallen for a troll, in which case... well done ;-))
Uh, UTF-8 is a Unicode Transformation Format; that (or UTF-16) is usually what people are talking about when they mention "Unicode". And Ruby definitely sucks at anything outside the ASCII character space, be it inside or at the boundaries (interfacing with the outer world).
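For anyone following the Unicode versus UTF-8 back-and-forth, here is a minimal sketch in Python (picked only because it is the language the story is about; the sample string is arbitrary). Unicode assigns code points to characters, and UTF-8 is one way of encoding those code points as bytes: one byte for ASCII, several for everything else.

```python
# Arbitrary sample: ASCII letters, one accented Latin letter, two CJK characters.
text = "na\u00efve \u65e5\u672c"

encoded = text.encode("utf-8")

code_points = len(text)    # number of Unicode code points in the string
byte_count = len(encoded)  # number of bytes once encoded as UTF-8

# How many bytes each character needs in UTF-8.
widths = [len(ch.encode("utf-8")) for ch in text]

print(code_points, byte_count, widths)
```

The mismatch between the two lengths is the "multi-byte" problem: code that assumes one byte per character miscounts or splits characters as soon as the text leaves ASCII.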
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
And actually good documentation is written in Latin, or Hieroglyphs, or Sanskrit.
?? What's wrong with Perforce?
Good idea, building on a closed-source SCMS that's (barely!) a mid-level player in the market. I can understand not wanting ClearCase, but what's wrong with CVS or Subversion? Hell, even Monotone or GNU Arch...
Oh well, could be worse: they could have gone with StarTeam, PVCS or MKS Source Integrity...
Just junk food for thought...
Codestriker does the same thing. Except it is in Perl + GPL, on SourceForge.
I'm not sure how you decided Perforce is a "barely mid-level player" in the SCM market. Adobe, Google, and Microsoft all use Perforce as their primary source code management solution. (Though Microsoft has highly modified it and calls it something else internally... but my contacts there tell me it's still Perforce underneath.) Perforce does have its problems with scalability, but in terms of merging, collaborating, viewing history, keeping branches, etc, etc, etc, it's pretty awesome.
Totally OT, but your comment reminded me of this. A great piece of history from the Multics group about an error code that never was meant to see the light of day, yet, through circumstances, did show up once during an upgrade.
I use both Subversion and Perforce. There's one major feature still lacking from Subversion: merge tracking. There's work underway to design, implement, and document this feature, but it's not done yet. This is a huge deal for anyone with lots of branches.
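To show what "merge tracking" buys you, here is a toy Python sketch. It is not how Subversion or Perforce implement it; the `MergeTracker` class, the branch names, and the revision numbers are all made up for illustration. The idea is just that the tool remembers which source revisions a branch has already received, so a repeated merge only picks up the new ones.

```python
from collections import defaultdict


class MergeTracker:
    """Toy model: remembers which source revisions each target branch already merged."""

    def __init__(self):
        # (source_branch, target_branch) -> set of merged revision numbers
        self._merged = defaultdict(set)

    def eligible(self, source, target, source_revisions):
        """Return the source revisions not yet merged into target, in order."""
        done = self._merged[(source, target)]
        return [r for r in sorted(source_revisions) if r not in done]

    def record_merge(self, source, target, revisions):
        """Remember that these revisions have been merged from source into target."""
        self._merged[(source, target)].update(revisions)


tracker = MergeTracker()
trunk_revs = [101, 102, 105]  # hypothetical trunk revisions

# First merge: everything on trunk is new to the release branch.
first = tracker.eligible("trunk", "release", trunk_revs)
tracker.record_merge("trunk", "release", first)

# Later, trunk gains one more revision; only that one is still eligible.
trunk_revs.append(109)
second = tracker.eligible("trunk", "release", trunk_revs)

print(first, second)
```

Without that record, the person doing the merge has to work out by hand which revisions were already brought over, which is where the pain with lots of branches comes from.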
Not that it's all roses with Perforce. My impression is that it doesn't scale very well. Most operations simply lock the entire database. I think it's a reader/writer lock, but it means that (for example) while the hour-long checkpointing pre-backup process happens every night, you can't do any write operations. (And there's a way to do an offline checkpoint, but it's not documented or supported, and is difficult to get right, with bad consequences if you don't.)
Let me be a little more specific: while the hour-long checkpointing process is happening, you can't even open files for edit. In addition to having really coarse locking, Perforce has more write operations than most version control systems. Subversion's CVS-style working copy means the only write operations are commits and revpropsets.
The main reason for starting SVN was that a lot of things were wrong with CVS. Arguably SVN (never mind Monotone, Arch) has only recently approached Perforce's level of stability, scalability and functionality. They probably needed something workable at least five years earlier. ClearCase is clearly not a Google-style solution.
Looks like a good choice to me.
> Adobe, Google, and Microsoft all use Perforce as their primary source code management solution.
Amazon does too.
You can't check out files with both Unix and Windows line-endings. See http://smithii.com/perforce_bugs for the ugly details.
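If you want to know whether a file is affected before it goes anywhere near the server, a small Python sketch like this can spot mixed line endings. It does not reproduce or work around the Perforce behavior described at that link; the function names and the sample bytes are hypothetical.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CRLF (Windows) and bare LF (Unix) line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # every CRLF also contains an LF
    return {"crlf": crlf, "lf": lf}


def has_mixed_endings(data: bytes) -> bool:
    """True when the data contains both Windows and Unix line endings."""
    counts = line_ending_counts(data)
    return counts["crlf"] > 0 and counts["lf"] > 0


sample = b"unix line\nwindows line\r\nanother unix line\n"
print(line_ending_counts(sample), has_mixed_endings(sample))
```

Reading the file in binary mode matters here, since text mode would normally translate the line endings before you get to count them.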
Blargh! Mondrian is already an open-source OLAP engine! Seriously, a casual google search could tell you that. And it's not some sf.net abandonware, it's a mature and powerful OLAP Cube engine used by some big-name corps!
Oh, and just to rant a bit more: Python WAS ALREADY THE NAME of the Lisp Compiler used in the CMUCL Common Lisp implementation and lately SBCL. And was relatively well known in computing science at the time Guido was naming python because it is a snazzy type inferencing lisp compiler!
Guido's some sort of naming-dick. What'll he call his next python project? Glibc? Mesa? Gimp?
This is why working at google is awesome. Internal code reviewer is big news.
I use the tool in question, it's good.
Also I've used Perforce at a previous company. Generally most people who talk about SCMs and reference CVS as a potential replacement/alternative to P4 really do not know what they are talking about. P4 has its problems, granted, but if you are looking to maintain a massive code base, there really are few choices. Atomic change lists, they are fantastic.
> Not that it's all roses with Perforce. My impression is that it doesn't scale
> very well. Most operations simply lock the entire database.
I agree - the backup solution described and recommended by Perforce works well for small installations, but doesn't scale very well in my experience. It's disappointing given that Perforce use scalability as a selling feature (http://www.perforce.com/perforce/products.html).
I went out on a limb and made an alternative way to do checkpoints/backups for exactly the reason you describe - it's difficult to get right and seriously bad if you get it wrong. The write-up of what I do is here:
http://www.mcternan.co.uk/PerforceBackup/
In my opinion it would be simple for Perforce to implement some small changes to help large-scale backups (e.g. make p4d -jj -c "cmd" work), and I've suggested it to their support staff, some of whom I've met in person at various times. However, I haven't heard or seen any indication that they are going to do this... I'm still hopeful, but less so these days.
I also believe that Perforce only does locking at the table level (using flock()), which is most likely why the server often sees poor concurrency, especially with write operations as you describe. The more recent versions of the server are apparently better (2006.x), although I've yet to upgrade. The server itself is based on Sleepycat Berkeley DB tables, which Oracle recently took over and looks to have improved (http://www.oracle.com/database/berkeley-db/db/index.html). So maybe future versions of the Perforce server will benefit too. I hope.
-- Mike
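For readers wondering why table-level flock() locking hurts concurrency, here is a minimal Python sketch. It is Unix-only, it says nothing about what the Perforce server actually does internally, and the temporary file standing in for a "table" is purely an assumption for illustration. It shows that readers can share a whole-file lock, but a single writer is refused for as long as any reader holds on.

```python
import fcntl
import os
import tempfile

# A temporary file stands in for one database table (hypothetical).
fd, path = tempfile.mkstemp()
os.close(fd)

# Separate open() calls, so each handle holds its own flock() lock.
reader_a = open(path, "rb")
reader_b = open(path, "rb")
writer = open(path, "r+b")

# Two readers can hold shared locks on the same file at once.
fcntl.flock(reader_a, fcntl.LOCK_SH)
fcntl.flock(reader_b, fcntl.LOCK_SH | fcntl.LOCK_NB)

# A writer needs an exclusive lock on the whole file, so it is refused
# while any reader still holds its shared lock.
try:
    fcntl.flock(writer, fcntl.LOCK_EX | fcntl.LOCK_NB)
    writer_blocked = False
except BlockingIOError:
    writer_blocked = True

# Once every reader releases, the writer gets through.
fcntl.flock(reader_a, fcntl.LOCK_UN)
fcntl.flock(reader_b, fcntl.LOCK_UN)
fcntl.flock(writer, fcntl.LOCK_EX | fcntl.LOCK_NB)
writer_acquired = True

for handle in (reader_a, reader_b, writer):
    handle.close()
os.remove(path)

print(writer_blocked, writer_acquired)
```

Swap the short-lived readers for an hour-long checkpoint holding the shared lock and you get the symptom described in the thread: every write operation waits until the checkpoint finishes.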
It could be better, but it's not that bad:
Interesting! I'll have to look it over more later.
For comparison, I've put the latest (not yet deployed) version of our offline checkpoint process here. (It's a NetVault backup script; pre locks and does the checkpoint, post touches a file signalling success to our monitoring and releases the lock). It's a procedure outlined by Perforce, though they didn't mention error handling...
FYI - Guido built this system to work within the existing Google infrastructure; he didn't choose Perforce for the project. Guido also wants to eventually refactor it to work with many SCMs, including Subversion, CVS, etc. BTW, Perforce is used at many very-large-software-companies, so while it is not perfect, it is still very useful.
Microsoft dogfoods most everything, including Exchange Server (for @microsoft.com and @hotmail.com), SQL Server (for *.live.com), and internal Office betas (with pushes out to everyone, including admins). But for source control, their own products just don't scale up to 60,000 employees. I've heard there are dogfood initiatives for VSS, but developers in Windows Client and Office vehemently oppose them. I'm not sure if VSS is used in other areas of the company, like perhaps the Live.com groups and Xbox/Zune. And I'm not sure if there are efforts to expand VSS's capabilities to support Microsoft's own requirements, since there are very few companies that need that kind of scalability and most are Microsoft competitors.
Microsoft is using Perforce for source code control? Why aren't they using their own product--Visual SourceSafe? Does it suck or something?
"Not an actor, but he plays one on TV."
It took you until now to see an early warning sign?