Getting a Grip on Google Code
netbuzz writes "Niall Kennedy reports on his blog that Guido van Rossum, author of the Python programming language, has begun showing off his first project since joining Google last year. 'Mondrian is a Web-based code-review system built on top of a Perforce and BigTable backend with a Python-powered front-end,' Kennedy writes. 'Mondrian is a pretty impressive system and is currently in use across Google.' Kennedy's description of Google's current code-review system sure makes it sound like it was in need of an upgrade. 'The Mondrian tool creates a much better workflow by creating task-specific dashboards, in-line commenting, well-tracked statistics, and more,' he writes. 'The application is built on top of Python open source libraries such as the Django framework, smtpd.py mail service, and the wsgiref Web server software.'"
Mmm, no. What Google knows is that it's about "the tool that fits the task best".
In this case -- the creator of the Python language having to build a webby app -- the obvious tool was much more likely to be Python than, say, Java. Or even actually good languages like Smalltalk, or Forth, or Erlang.
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
8-bit characters are exactly what it /does/ support. It's multi-byte characters that are often seen as the problem, although UTF-8 is also supported (Unicode generally, however, is a different matter). Ruby can also support load balancing and HTTPS, although since those aren't relevant to a programming language per se, it's intriguing that you bring them up (unless I've fallen for a troll, in which case... well done ;-))
AFAIK stuff like load balancing or HTTPS is handled by HTTP servers/balancers/front ends (or at least TCP balancers for load balancing), not by the language used to create the site... If you're going at it another way, you're likely to be doing it very wrong.
And apart from that, modern Ruby webapps actually have fairly good deployment solutions, e.g. Mongrel.
Unicode in Ruby still sucks though
Uh, UTF-8 is a Unicode Transformation Format; that (or UTF-16) is usually what people mean when they say "Unicode". And Ruby definitely sucks at anything outside the ASCII character space, be it inside the language or at the boundaries (interfacing with the outside world).
And actually good documentation is written in Latin, or Hieroglyphs, or Sanskrit.
?? What's wrong with Perforce?
Good idea, building on a closed-source SCMS that's (barely!) a mid-level player in the market. I can understand not wanting ClearCase, but what's wrong with CVS or Subversion? Hell, even Monotone or GNU Arch...
Oh well, could be worse: they could have gone with StarTeam, PVCS or MKS Source Integrity...
Just junk food for thought...
Am I missing something here? Why don't they use Subversion?
Codestriker does the same thing, except it is written in Perl, GPL-licensed, and hosted on SourceForge.
Try giving a deep, succulent "yard of tongue down your throat" soul kiss to the suspected troll. Watch in the closet door mirror: If their spine lights up IT'S A CYLON! FOR GOD'S SAKE RUN! Otherwise it's probably a troll.
If you want to go ahead and have sex with the Cylon before you run, that's your business.
Disclaimer: I've yet to work with Perforce, having not yet graduated from CVS, but at least I'm not using VSS.
Anyone who's used perforce will understand. I read this and it made me very, very jealous.
See you, space cowboy...
I'm not sure how you decided Perforce is a "barely mid-level player" in the SCM market. Adobe, Google, and Microsoft all use Perforce as their primary source code management solution. (Though Microsoft has highly modified it and calls it something else internally... but my contacts there tell me it's still Perforce underneath.) Perforce does have its problems with scalability, but in terms of merging, collaborating, viewing history, keeping branches, etc, etc, etc, it's pretty awesome.
Ouch.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Totally OT, but your comment reminded me of this. A great piece of history from the Multics group about an error code that never was meant to see the light of day, yet, through circumstances, did show up once during an upgrade.
Why would the Linux kernel maintainers have used a proprietary SCMS (BitKeeper) all these years if it weren't simply the best-suited tool for the purpose?
I use both Subversion and Perforce. There's one major feature still lacking from Subversion: merge tracking. There's work underway to design, implement, and document this feature, but it's not done yet. This is a huge deal for anyone with lots of branches.
Not that it's all roses with Perforce. My impression is that it doesn't scale very well. Most operations simply lock the entire database. I think it's a reader/writer lock, but it means that (for example) while the hour-long checkpointing pre-backup process happens every night, you can't do any write operations. (And there's a way to do an offline checkpoint, but it's not documented or supported, and is difficult to get right, with bad consequences if you don't.)
Let me be a little more specific: while the hour-long checkpointing process is happening, you can't even open files for edit. In addition to having really coarse locking, Perforce has more write operations than most version control systems. Subversion's CVS-style working copy means the only write operations are commits and revpropsets.
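To make the complaint concrete, here is a toy, single-threaded model of one database-wide reader/writer lock. It is only a sketch of the behaviour described above, assuming the checkpoint holds the lock as a reader; the class and method names are made up for illustration and have nothing to do with Perforce's actual internals.

```python
class CoarseLock:
    """Toy model of a single reader/writer lock covering a whole database."""

    def __init__(self):
        self.readers = 0      # number of read holders
        self.writer = False   # True while one writer holds the lock

    def try_read(self):
        # Readers may share the lock unless a writer holds it.
        if self.writer:
            return False
        self.readers += 1
        return True

    def end_read(self):
        self.readers -= 1

    def try_write(self):
        # A writer needs the lock exclusively: no readers, no other writer.
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def end_write(self):
        self.writer = False


db = CoarseLock()
db.try_read()                    # the nightly checkpoint starts (assumed reader)
can_read = db.try_read()         # another read-only operation still gets in
can_edit = db.try_write()        # "open for edit" is a write, so it is refused
db.end_read()
db.end_read()                    # checkpoint and the other reader finish
can_edit_later = db.try_write()  # now the write goes through
print(can_read, can_edit, can_edit_later)
```

With a working copy that only writes on commit, far fewer operations would ever reach the try_write() path in the first place.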
I could care less if they managed their code using punch cards.
Where's the 0xBEEF
Microsoft uses a modified version of the Perforce SCM for all of their internal code management. I personally use Subversion on a daily basis at work and on my SourceForge projects, but I guess if it's good enough for Microsoft's development teams, it's good enough for Google to build a custom version too.
I think the problem with Ruby is that it's making the same mistake as Smalltalk. Everything has to be an object, but that isn't always the right model. Some things are better modeled with generic functions or other functional-style paradigms. Python is picking from both worlds. I think my ideal language would be a blend of Haskell and Python in some form. OCaml comes close, but has its own weirdness, with '+' and '+.' etc.
IE:
Why do numbers need to know about iteration? From Smalltalk:
1 to: 3 do: [ Something ]
There are several other examples as well of this kind of impedance mismatch. Objects get CLUTTERED with cruft, because that's the only place you can stick behaviour.
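For contrast, here is roughly what the generic-function version looks like in Python. This is just a sketch using functools.singledispatch from the standard library; the function name each() and the choice to treat an int n as the sequence 1..n are my own assumptions for illustration.

```python
from functools import singledispatch


@singledispatch
def each(source, action):
    """Generic function: call action on every element of source."""
    raise TypeError(f"don't know how to iterate over {type(source).__name__}")


@each.register(int)
def _(source, action):
    # An int n is treated as 1..n, like Smalltalk's `1 to: n do: [...]`,
    # but the looping behaviour lives here, not on the number itself.
    for i in range(1, source + 1):
        action(i)


@each.register(list)
def _(source, action):
    # Lists get their own implementation of the same generic function.
    for item in source:
        action(item)


seen = []
each(3, seen.append)
each(["a", "b"], seen.append)
print(seen)  # [1, 2, 3, 'a', 'b']
```

The integer class never learns about iteration; new behaviour is added by registering another implementation on the function.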
Objects are nice, but not everything fits in that paradigm. I think blended languages are the way of the future. Perl developers flock to Ruby because it has objects with a few Perlisms, but Smalltalk was rocking that world 20 years ago.
There are some exciting developments on the horizon, but they aren't mature yet.
The main reason for starting SVN was that a lot of things were wrong with CVS. Arguably SVN (never mind Monotone or Arch) has only recently approached Perforce's level of stability, scalability and functionality. They probably needed something workable at least five years earlier. ClearCase is clearly not a Google-style solution.
Looks like a good choice to me.
> Adobe, Google, and Microsoft all use Perforce as their primary source code management solution.
Amazon does too.
You can't check out files with both Unix and Windows line-endings. See http://smithii.com/perforce_bugs for the ugly details.
I find your sig really annoying.
UTF-8 is the de facto standard encoding of Unicode characters, but in the interests of not having someone respond with "but Unicode != UTF-8" I thought I'd add the proviso (although it opened me to complaints of the opposite nature, of course ;-))
Blargh! Mondrian is already an open-source OLAP engine! Seriously, a casual google search could tell you that. And it's not some sf.net abandonware, it's a mature and powerful OLAP Cube engine used by some big-name corps!
Oh, and just to rant a bit more: Python WAS ALREADY THE NAME of the Lisp Compiler used in the CMUCL Common Lisp implementation and lately SBCL. And was relatively well known in computing science at the time Guido was naming python because it is a snazzy type inferencing lisp compiler!
Guido's some sort of naming-dick. What'll he call his next python project? Glibc? Mesa? Gimp?
Interesting you say it has its problems with scalability. I've seen scalability sucking under Perforce too. However, their website (http://www.perforce.com/perforce/products.html) says:
"The Perforce Software Configuration Management System features comprehensive SCM capabilities built around a scalable client/server architecture."
Certainly that reads like BS in my experience...
Subversion is based on changesets as well. Its model is virtually identical to Perforce's, and quite far from CVS's.
ZING!
This is why working at google is awesome. Internal code reviewer is big news.
I use the tool in question, it's good.
Also, I've used Perforce at a previous company. Generally, most people who talk about SCMs and reference CVS as a potential replacement/alternative to P4 really do not know what they are talking about. P4 has its problems, granted, but if you are looking to maintain a massive code base, there really are few choices. Atomic change lists are fantastic.
> Not that it's all roses with Perforce. My impression is that it doesn't scale
> very well. Most operations simply lock the entire database.
I agree - the backup solution described and recommended by Perforce works well for small installations, but doesn't scale very well in my experience. It's disappointing given that Perforce uses scalability as a selling point (http://www.perforce.com/perforce/products.html).
I went out on a limb and made an alternative way to do checkpoints/backups for exactly the reason you describe - it's difficult to get right and seriously bad if you get it wrong. The write-up of what I do is here:
http://www.mcternan.co.uk/PerforceBackup/
In my opinion it would be simple for Perforce to implement some small changes to help large-scale backups (e.g. make p4d -jj -c "cmd" work), and I've suggested it to their support staff, some of whom I've met in person at various times. However, I haven't heard or seen any indication that they are going to do this... I'm still hopeful, but less so these days.
I also believe that Perforce only does locking at the table level (using flock()), which is most likely why the server often sees poor concurrency, especially with write operations as you describe. The more recent versions of the server are apparently better (2006.x), although I've yet to upgrade. The server itself is based on Sleepycat Berkeley DB tables, which Oracle recently took over and looks to have improved (http://www.oracle.com/database/berkeley-db/db/index.html). So maybe future versions of the Perforce server will benefit too. I hope.
-- Mike
It could be better, but it's not that bad:
It don't scale. Once the DB exceeds the disk cache, it grinds :(
Worse in this case, Mondrian is also the name of a statistical and plotting program (available since at least 2002).
:-)
In biology, when somebody accidentally names something as a homonym (i.e. the name already exists for some other species), then some other person who notices it often names it after the person who made the mistake. Having a species named after you is therefore either an honor, or it means you goofed.
So, perhaps he should call it "Guido"
With good administrators, Perforce is comfortably scalable up to 500 to 1,000 developers. You just start running into lag problems once you get above that. Certain operations like branch integrates can lock up a depot for minutes, which doesn't sound bad until you realize that no developers on that depot can check out files or make other Perforce client changes that whole time. Technically you'd run into the same problems with just 2 developers, but it gets really noticeable when there are 3,000 developers who can all do integrates, and anyone's integration will lock out the other 3,000 users. Microsoft's definitely running into this problem, plus they're running into sandbox issues. You can probably think of other companies with the same problem.
Interesting! I'll have to look it over more later.
For comparison, I've put the latest (not yet deployed) version of our offline checkpoint process here. (It's a NetVault backup script; pre locks and does the checkpoint, post touches a file signalling success to our monitoring and releases the lock). It's a procedure outlined by Perforce, though they didn't mention error handling...
Why does MS not use the tools they sell to the rest of the world? For example, my organization does not use perforce but TFS for version control. Is TFS inadequate for large-scale development?
I do not have much experience with lots of different versioning tools. I have only used TFS, CVS, and SourceGear.
-- Posted from my parent's basement
FYI - Guido built this system to work within the existing Google infrastructure, he didn't choose Perforce for the project. Guido also wants to eventually refactor it to work with many SCM including Subversion, CVS, etc. BTW, Perforce is used at many very-large-software-companies, so while it is not perfect, it is still very useful.
Microsoft dogfoods most everything, including Exchange Server (for @microsoft.com and @hotmail.com), SQL Server (for *.live.com), and internal Office betas (with pushes out to everyone, including admins). But for source control, their own products just don't scale up to 60,000 employees. I've heard there are dogfood initiatives for VSS, but developers in Windows Client and Office vehemently oppose them. I'm not sure if VSS is used in other areas of the company, like perhaps the Live.com groups and Xbox/Zune. And I'm not sure if there are efforts to expand VSS's capabilities to support Microsoft's own requirements, since there are very few companies that need that kind of scalability and most are Microsoft competitors.
Merge tracking is already "implemented". Use svk, which does quite a nice job.
Actually, merge tracking is useful enough that it makes sense to use svk even in completely online settings.
yacc
> I use both Subversion and Perforce. There's one major feature still lacking from Subversion: merge tracking. There's work underway [tigris.org] to design, implement, and document this feature, but it's not done yet. This is a huge deal for anyone with lots of branches.
Aye, the SVN team is definitely not resting on their laurels (yet) after finally hitting the 1.0 release a while back. They made significant improvements in 1.4 and have more up their sleeves for the upcoming 1.5 release, with even more planned down the road.
(We made the switch to SVN over the summer, but we're just a tiny little shop with average needs.)
Wolde you bothe eate your cake, and have your cake?
Actually, you can keep checkpoint downtime down to .. oh.. a few seconds, by using some simple techniques. "Offline checkpointing" is what it's called. In fact, you don't technically have to suffer any downtime for administrative tasks.
Google Mondrian: web-based code review and storage
Guido van Rossum unveiled his first Google project, Mondrian, tonight during a Python tech talk at the Google campus in Mountain View. Mondrian is a web-based code review system built on top of a Perforce and BigTable backend with a Python-powered front-end. Mondrian is a pretty impressive system and is currently in use across Google.
Shared Development Environment
Google uses a company-wide Perforce depot with almost no developer branches. Each developer has their own NFS workspace readable by anyone in the company, including automated processes. An administrative process takes snapshots of each developer workspace including local development environments accessed over SSH. Files within these snapshots can be compared to checked-in data, encrypted, and archived.
Previous methods of review
Prior to Mondrian, code review was conducted largely over e-mail using Google command-line wrappers built on top of Perforce. A developer could initiate a code review from within the g4 mail tool, which would fire off an e-mail and begin a review thread. Once the developer received a response of "looks good to me," or lgtm for short, they could proceed to check in. Changes could be compared using tkdiff.
Design-level reviews are often conducted by e-mailing around Word documents or editing a team wiki. Recently some design reviews have moved onto an internal version of Google Docs.
Web-based collaboration meets code review
The Mondrian tool creates a much better workflow by creating task-specific dashboards, in-line commenting, well-tracked statistics, and more. The application is built on top of Python open source libraries such as the Django framework, smtpd.py mail service, and the wsgiref web server software.
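For readers who have not used wsgiref, the following is a minimal sketch of what serving a page with it looks like. It is not Mondrian's code; the handler, its response text, and the port number are assumptions made purely for illustration.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI handler that echoes the requested path."""
    body = f"review page for {environ['PATH_INFO']}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_once(path="/"):
    """Invoke the app in-process, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve(port=8000):
    """Run the reference server until interrupted (not called here)."""
    with make_server("localhost", port, app) as httpd:
        httpd.serve_forever()


print(call_once("/change/12345"))
```

Calling serve() would start the standard library's reference server on localhost; call_once() exercises the same handler without any networking.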
Code reviews can be initiated and completed from within the Mondrian interface. A developer requests a review from another user or a group of users to kick off the process. Each invited reviewer can add comments directly underneath a line of code or reference the entire file. You can request and diff the file against previous versions as well. It's a pretty slick interface, lightly highlighting each line of code as you hover, and popping open a comment box in response to a double-click. Comments can be saved as a draft and shared at a later time.
Putting the entire code review process online means you never have to worry about referencing the most recent version of a file or losing e-mails. Mondrian captures every outgoing e-mail related to the workflow, looks for key data such as revision numbers, and updates a to-do list accordingly.
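As a rough illustration of the idea of scanning workflow e-mail for key data, here is a small sketch using Python's standard email and re modules. The message format, the "change NNNNN" wording, and the to-do structure are all assumptions; the talk did not describe how Mondrian actually does this.

```python
import re
from email import message_from_string

# Hypothetical outgoing review mail; the real format is not public.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Review request for change 12345

Please take a look at change 12345.
"""

# Assumed convention: the subject mentions "change <number>".
CHANGE_RE = re.compile(r"\bchange (\d+)\b", re.IGNORECASE)


def update_todo(todo, raw_mail):
    """Add the change number found in the subject to the recipient's list."""
    msg = message_from_string(raw_mail)
    match = CHANGE_RE.search(msg["Subject"] or "")
    if match:
        todo.setdefault(msg["To"], set()).add(int(match.group(1)))
    return todo


todo = update_todo({}, RAW)
print(todo)  # {'bob@example.com': {12345}}
```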
More on BigTable
Mondrian uses BigTable as backend storage for user data. More specifically, it's used to store:
* Change metadata such as a description or list of files
* Comments entered through the web interface or via e-mail
* Encrypted file snapshots taken from user workspaces
* Per-user data such as active changes or last view dates
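One way to picture those four kinds of data is as rows in a sparse table keyed by row and "family:qualifier" column, which is the general shape of a BigTable-style store. The sketch below is an in-memory stand-in only; every row key, column family, and value in it is hypothetical and not taken from Mondrian.

```python
from collections import defaultdict

# table[row_key][ "family:qualifier" ] -> value  (in-memory stand-in)
table = defaultdict(dict)


def put(row_key, family, qualifier, value):
    """Store one cell under a row key and a family:qualifier column."""
    table[row_key][f"{family}:{qualifier}"] = value


def columns(row_key, family):
    """Return all cells of one column family for a row, keyed by qualifier."""
    prefix = family + ":"
    return {
        name[len(prefix):]: value
        for name, value in table[row_key].items()
        if name.startswith(prefix)
    }


# Hypothetical layout mirroring the four bullet points above.
put("change/12345", "meta", "description", "Fix off-by-one in parser")
put("change/12345", "meta", "files", ["parser.py"])
put("change/12345", "comment", "parser.py:42", "Should this be <= ?")
put("change/12345", "snapshot", "parser.py", b"<encrypted bytes>")
put("user/alice", "state", "active_changes", [12345])

print(columns("change/12345", "meta"))
```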
Summary
The Mondrian web code review system is pretty impressive. Guido estimates he has spent about 25% of his work time on the project since joining Google in December 2005. Mondrian served as Guido's introduction to Google technologies and processes with the help of a few other Googlers treating it as a side-project. The application is so deeply intertwined with Google technologies it's not likely to be available as open source until Subversion and a backend such as SQLite can be supported.
Guido's full talk, including a demo of Mondrian, should be available on Google Video sometime in the future.
It's just as free as CVS/SVN for two users. They offer a completely free version that never expires: http://www.perforce.com/perforce/evaldemo.html
This is a nice design win for Django as a web framework. I wonder how much of the stack he ended up using and whether he used the ORM layer at all.
Microsoft is using Perforce for source code control? Why aren't they using their own product--Visual Source Safe? Does it suck or something?
"Not an actor, but he plays one on TV."
CVS is shit. I mean this respectfully, of course, but it doesn't even have atomic commits or changesets, which are bare minimums for modern VC. SVN isn't much better; it is still too CVS-derived for its own good.
-- pending
VSS was used internally at microsoft by some groups for a while, but it's been replaced by their version of Perforce, which is in turn slowly being replaced by microsoft's own TFS which scales quite well to large projects/teams. check out dogfood stats here
at least it's not source unsafe, harvest, or PVCS.
clearcase 4 ever!
PHP is the solution of choice for relaying mysql errors to web users.
Sure it can be done, but Perforce leaves you to implement this yourself, and it's not something you want to get wrong.
So what you are saying is that it stops scaling at about 500 to 1000 developers.
For fanboys and lackeys. And people who feel important because they know somebody who knows somebody who thinks g is great or something, but can't write any code of their own :(
The entire google team in other words!
Our product Crucible http://www.cenqua.com/crucible provides online web-based code review including inline commenting, workflow etc. Crucible is currently in Beta release and supports CVS, SVN and Perforce. Free licenses for Open Source projects are available.
Cheers,
-Brendan
Oh really? I'll change it then
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
Hypothetically, because the developer was a friend of Torvalds's, and he talked him into it.
My company, Smart Bear Software, has developed a commercial tool for peer code review called Code Collaborator. We support a wide variety of SCM's, including CVS, Subversion, Perforce, Clear Case, and soon Team Foundation Server.
Using our tool, we also performed the largest case study of peer code review ever published and have made it available as a free book. It includes data from 2500 reviews of 3.2 million lines of source code at Cisco Systems. To get your free copy, just sign up on our website.
2.8M — Clear Case
2.8M — CVS
2.4M — Visual Source Safe
1.8M — Subversion
1.1M — CCC/Harvest (now CA AllFusion Harvest)
900K — RCS
665K — Perforce
536K — PVCS
378K — Aegis
376K — Monotone
186K — BitKeeper
154K — StarTeam
101K — AllChange
68K — GNU Arch
29K — Continuus
16K — MKS Source Integrity
If we assume that # of hits == market share[1], then Perforce has 4.65% of the market. I realize this isn't the case, but I thought the results would be at least roughly indicative of relative ranking. This may be a Really Bad Idea (SM), and I'd happily fall on my sword if anyone could post some real figures (or even some from Gartner). I found some during my searches, but they were all several years old, so I didn't really consider them.
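For anyone who wants to check the arithmetic, here it is as a short script. The hit counts are copied from the list above (in thousands); the only assumption is the same one already flagged, that hits stand in for market share.

```python
# Search hits in thousands, copied from the list above.
hits = {
    "Clear Case": 2800, "CVS": 2800, "Visual Source Safe": 2400,
    "Subversion": 1800, "CCC/Harvest": 1100, "RCS": 900,
    "Perforce": 665, "PVCS": 536, "Aegis": 378, "Monotone": 376,
    "BitKeeper": 186, "StarTeam": 154, "AllChange": 101,
    "GNU Arch": 68, "Continuus": 29, "MKS Source Integrity": 16,
}

total = sum(hits.values())
perforce_share = round(100 * hits["Perforce"] / total, 2)

# Rank by hit count, largest first (1 = most hits).
ranking = sorted(hits, key=hits.get, reverse=True)
perforce_rank = ranking.index("Perforce") + 1

print(total, perforce_share, perforce_rank)  # 14309 4.65 7
```

So Perforce comes out seventh of the sixteen tools listed, with 665K of roughly 14.3M total hits.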
I'm sure these results are well worth the price you paid for them. Now if I could only get reimbursed for the fifteen minutes or so it took me to compile them...
---
[1] I could probably get a patent on this idea, it's so good...
Valve also uses Perforce.