Perl Migrates To the Git Version Control System

Posted by Soulskill on Sunday January 4, 2009 @04:53AM from the git-on-up dept.

On Elpeleg writes "The Perl Foundation has announced they are switching their version control systems to git. According to the announcement, Perl 5 migration to git would allow the language development team to take advantage of git's extensive offline and distributed version support. Git is open source and readily available to all Perl developers. Among other advantages, the announcement notes that git simplifies commits, producing fewer administrative overheads for integrating contributions. Git's change analysis tools are also singled out for praise. The transformation from Perforce to git apparently took over a year. Sam Vilain of Catalyst IT 'spent more than a year building custom tools to transform 21 years of Perl history into the first ever unified repository of every single change to Perl.' The git repository incorporates historic snapshot releases and patch sets, which is frankly both cool and historically pleasing. Some of the patch sets were apparently recovered from old hard drives, notching up the geek satisfaction factor even more. Developers can download a copy of the current Perl 5 repository directly from the perl.org site, where the source is hosted."

que the unreadability jokes by Dan667 · 2009-01-04 04:55 · Score: 4, Funny

but this is fantastic. I use perl every day and love it.
1. Re:que the unreadability jokes by Dystopian+Rebel · 2009-01-04 05:21 · Score: 4, Funny
  
  Dear Perl Monger,
  TMTOWTDI in Perl. However, in English spelling:
  - Que Publishing is a publisher of computer books
  - Q was an arrogant but powerful character in ST:TNG who liked to annoy the crew of the Enterprise because they didn't copulate anywhere near as much as James... T... Kirk.
  - queue, noun, a line or series of people or things; verb, to form a line or series.
  - cul, noun, French, the buttocks
  - cue, noun, a signal or indication; verb, to signal, to indicate, to move to position.
  2009: The Year Of The Truly Helpful Slashdot Grammar Nazi
  
  --
  Rich And Stupid is not so bad as Working For Rich And Stupid.
2. Re:que the unreadability jokes by CODiNE · 2009-01-04 05:49 · Score: 5, Funny
  
  Qué?
  
  --
  Cwm, fjord-bank glyphs vext quiz
3. Re:que the unreadability jokes by Anonymous Coward · 2009-01-04 06:09 · Score: 5, Funny
  
  Here is the script used to migrate perl to the git version control system:
  #! /bin/perl
  $T=s/Y9/s/0YT-sx^*%fr86%8 ^% v%^* %^* 8*R%^*f vR%^ print @V^58 *$$%&^*7890JH87gV7 65&ygtyR$KLJi"'"%$44:H{"['J{]09'[u"JOPu9)P{"Y8yghO*HYgT*gtO""i'G{*(#h'oiHIO*UYF&d97c 567F&Olf*(Up[;yh['
  "[]i
  O}{];{:}{;';}
  jpJhi8[9
  89ouyfo8tIGUYf65D 54$4$edc%$
Re:But... is Perl now historical only? by berend+botje · 2009-01-04 05:00 · Score: 5, Insightful

I take it you have volunteered to help finish P6?
Can't get there from here by djupedal · 2009-01-04 05:20 · Score: 4, Funny

$ git clone git://perl5.git.perl.org/perl.git

-bash: git: command not found
1. Re:Can't get there from here by eggnet · 2009-01-04 07:20 · Score: 5, Informative
  
  The joke is that git depends on perl.
Re:I'd rather seen they moved to Subversion by SanityInAnarchy · 2009-01-04 05:23 · Score: 4, Informative

There are significant advantages of Git over Subversion. RTFS for some.
Just to add insult to injury -- often, a Git checkout, which includes all history, takes up less space than a Subversion checkout for the same project, which doesn't even include recent commit log messages.
But think about this -- you're saying they should use a big, slow, central server, as a single point of failure, crippling offline development, complicating branches (especially merges), and several orders of magnitude slower for just about every operation, just so you don't have to learn a "weird" tool?

--
Don't thank God, thank a doctor!
Re:Darcs vs. Git by SanityInAnarchy · 2009-01-04 05:25 · Score: 4, Insightful

I would guess it's ubiquity and featureset.
Git is built of a patchwork of C and scripts, meaning it's something Perl6 could be a part of someday, and it's also something that's going to be quite familiar to all Perl developers, not just the Pugs guys.
And, Git seems to be quickly becoming the Subversion of DVCS -- fast, open source, everyone has it, everyone knows it, and the alternatives really don't have much compelling to offer.

--
Don't thank God, thank a doctor!
Re: Perl not historical only by Dystopian+Rebel · 2009-01-04 05:31 · Score: 4, Insightful

Fixed the subject line for you.
Last year, I completed two important Perl-based projects for my employer. I also use Perl at least once a week to run analyses of my Web server logs. I prototype Web applications in Perl and often just put the prototype into production because it works well. I'm still using Perl that I wrote over 10 years ago, with NO changes, on several OSs. And I use Ubuntu Debian, of which Perl is an integral component.
Perl is great. If I want what it doesn't have, I use a different language. But when I want regular expressions, CPAN, quick and secure CGI, analysis of large data sets and general parsing, easy database integration, and efficient portability from server all the way down to embedded systems, Perl is the first language I consider. Ruby might be ready for the real world one day. And Python is good for other things, but it is not a replacement for Perl.

--
Rich And Stupid is not so bad as Working For Rich And Stupid.
Re:Darcs vs. Git by Johnny+Loves+Linux · 2009-01-04 05:31 · Score: 5, Interesting
"I can understand the advantage of using distributed version control. But given all the Haskell people involved (who came in via Pugs) I'm surprised they went with Git vs. Darcs.
Does anyone know if speed is as large of an issue as it is for Linux kernel or was there another reason?"
Actually, you might not know this, but the Haskell folks already moved over to git from darcs a while ago. They were having scalability issues and did a 6 month survey to determine which distributed version control they should go with and determined that git was the best of the breed. Here are the links:
1. Announcement: http://article.gmane.org/gmane.comp.lang.haskell.glasgow.user/14819 [gmane.org]
2. Comparisons: http://hackage.haskell.org/trac/ghc/wiki/DarcsEvaluation [haskell.org]
Re:I'd rather seen they moved to Subversion by timeOday · 2009-01-04 06:10 · Score: 5, Insightful

"you're saying they should use a big, slow, central server, as a single point of failure, crippling offline development..."
I am intrigued by git, and adoption by a major project like Perl is a big endorsement, so please don't take this as a rhetorical question: isn't centralization the heart of source code management? As a project lead, I'm reluctant to have repositories sprouting like mushrooms everywhere, with everybody having their own little "trunk" and developers arguing over who should have to merge with whom before each release. Is this reluctance totally unfounded, or easily solved administratively, or a valid concern with a peer-to-peer SCM model?
it depends on the size, I think by Trepidity · 2009-01-04 06:37 · Score: 4, Informative

These distributed models work best if it's a large team, which potentially has more than one level of hierarchical structure.
You do typically have a canonical central repository managed by the project lead (in the Linux kernel's case, Linus's tree). But then sub-section leads might have their own canonical repository for that sub-section, and merge in their team members' changes into a stable state that they approve of before asking for those changes to get merged into the central branch. Or they might bundle up some particularly important set of changes for early merging "upstream", making sure they cleanly apply against the current central repo. That's all a nightmare to manage in SVN, which conceives of branches as something you do occasionally and keep around for a while, not as a hierarchical project-management tool.
On the other hand, if you have a relatively small or flat team, or one where the sub-sections break down really cleanly so each one can have its own central repo, it might not buy you much. I'm working on a small project with 4 people at the moment, and SVN is perfectly fine, and I can't really imagine what I'd do with a distributed version control system (I'd just use it like a centralized one, pushing everything to the one repo everyone pulls from).
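A rough sketch of that hierarchy, using throwaway local repositories. The directory names (central, lead, member) and the files are made up for illustration; a real project would use network remotes rather than local paths.

```shell
#!/bin/sh
# Hypothetical three-level setup: project lead -> sub-section lead -> team member.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# The canonical central repository, managed by the project lead.
git init -q central
cd central
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
echo core > core.txt
git add core.txt
git commit -q -m "core"
cd "$work"

# The sub-section lead clones central; a team member clones from the lead.
git clone -q central lead
git clone -q lead member

# The team member commits a change locally.
cd member
echo fix > fix.txt
git add fix.txt
git commit -q -m "member fix"
cd "$work"

# The sub-section lead fetches the member's work and merges it.
cd lead
git remote add member ../member
git fetch -q member
git merge -q --ff-only member/master
cd "$work"

# Once the lead is happy, the project lead pulls the result "upstream".
cd central
git pull -q --ff-only ../lead master
git log --oneline
```

Each level only ever talks to the level next to it, which is the part that is awkward to model with SVN branches.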

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
1. Re:it depends on the size, I think by n+dot+l · 2009-01-04 08:43 · Score: 4, Informative
  
  Imagine doing this (half-asleep, bear with incorrect/silly git commands):
    git pull master
    git branch mystuff
  Two weeks go by, many, many changes have been made in master, and you do a ton of work in mystuff:
    git checkout master
    git pull master
    git checkout mystuff
    git merge master
    //whatever the git command to forcefully set master to point at mystuff is, I'm barely awake here...
  Or alternately:
    git checkout master
    git pull master
    git merge mystuff
  Oh, and in both cases, the merge turns into a single commit without so much as a pointer to the history. You can put that in the commit message yourself, if you like, that's your only option.
  That's essentially all you can do with SVN. You don't have the ability to rebase or cherry-pick or otherwise fiddle with your commit history so as to get a clean straight-forward merge in SVN, and because of that merges are slow and usually painful. So while you could regularly merge to keep things sync'd and simple, nobody actually does that in practice. It leads to what many SVN based teams call "merge day" where it literally takes a day to merge in a feature branch and work out all of the conflicts.
  The other issue is that branches have to be made on the server and then checked out separately, which makes them expensive. First you're polluting the global history, and second you have to do whatever build environment setup you do once you checkout your branch (you can "switch" your branch over which works in place, but that gets flakey as the things you're switching between diverge - maybe this has been fixed). So for any small-to-mid sized task you're (in practice) going to avoid creating a branch. You'll work right in trunk (SVN parlance for master). You won't make commits as you go, as those go straight to the server where they can never be erased, so the only "oops, undo that" feature you have is the undo buffer in your text editor, or picking and choosing lines from a diff, or sheer memory. And of course you *can't* make that one commit until you've pulled down all the changes that happened since your last update (yes, while your changes are sitting uncommitted in your working files), and those changes get dumped into your working copy as changes, indistinguishable from the code you just typed in diffs (except for conflicts, those get marked in the usual manner)...
  All of those problems go away when you can easily merge, because then branches cease to be painful - but then I've found that the best merges Git makes are the ones you get from rebasing or cherry-picking, which SVN cannot do.
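For anyone who hasn't seen them, here is a minimal sketch of rebase and cherry-pick on a throwaway repository. The branch names (mystuff, hotfix) and the files are invented for the demo, and the commits are chosen so that nothing conflicts.

```shell
#!/bin/sh
# Demo of rebase and cherry-pick; all names here are illustrative.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Two commits of work on a local branch.
git checkout -q -b mystuff
echo a > a.txt
git add a.txt
git commit -q -m "add a"
echo b > b.txt
git add b.txt
git commit -q -m "add b"

# Meanwhile master moves on.
git checkout -q master
echo m > m.txt
git add m.txt
git commit -q -m "master moves on"

# Rebase: replay the mystuff commits one at a time on top of the new master.
git checkout -q mystuff
git rebase -q master

# Cherry-pick: copy only the "add b" commit onto another branch cut from master.
git checkout -q -b hotfix master
git cherry-pick mystuff >/dev/null

git log --oneline mystuff
```

After the rebase, mystuff is a straight line on top of master with no merge commit, and hotfix carries b.txt without a.txt.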
This story was a surprise to me by qazwart · 2009-01-04 07:13 · Score: 5, Interesting

My first response was "Do they still develop Perl?"
When I first started with Perl 3.0 many, many years ago, I fell in love with the language. It was flexible, powerful, and could do all sorts of amazing things. Version 5.0 brought in objects, but the way they worked was a little kinked. Defining classes in Perl is not easy, and I always have to go back to the manpage to make sure I've got all the incantations. Many times, I simply use object oriented structures and forgo the object definitions.
Perl 6 was supposed to fix everything. It would improve the way class definitions worked. Perl 6 would be a better object oriented language while still allowing you to hack out quick scripts like you would in shell script.
Well, Perl 6 was announced almost a decade ago, and it still isn't released. Meanwhile, Python has become the de facto scripting language for the OSS world. Even I, a Perl fanatic who makes most Mac fanatics look mellow, have to admit that, and learn Python. I hate Python. Its use of indents for flow control is a throwback to Fortran. Its lack of regular expressions in the string object (you have to import a special "re" object) makes it maddening. Why oh why does Python use "pop" for arrays, but not "push"? What were the designers on when they decided "exists" is not a member function of hashes -- excuse me -- dictionaries and arrays? Why this syntactic distortion, overturning over 50 years of computer programming?
But, I am now a good Python developer writing all of my stuff in Python. I am used to the cryptic error messages that don't really explain the problem (after all, Python has only been around for a bit over a decade). I am used to the fact that basic structures of the language change from implementation to implementation. I even like the fact that "numbers" are divided into multiple types although you really can't declare a number to be a specific type. It does allow you to experience the fun of your division suddenly not working because it is INTEGER division. (And, of course, Python 3.0 will change this very basic part of implementation and break everyone's Python script!)
Perl could have been the language of the web. After all, even Perl has fewer syntactic quirks than PHP, but it is PHP that is the glue behind server side webpages. While the Perl gurus were redesigning Perl, PHP got incorporated into an Apache module and added the syntactic sugar needed to run sessions and keep variables between PHP scripts.
So, Perl, the glue that used to keep the Internet flowing, has become a niche language. Almost all of the younger developers I know never bother to learn it, and fewer and fewer jobs are interested in it. It is Python that everyone wants. It is PHP that runs the message boards and CMS pages. Perl is simply no longer in the picture.
Every few years, something I've learned becomes obsolete. It's the field. One time, I knew how to set up a UUCP network. One time, I could set up a Gopher site. I also learned all the quirks of HTML 3.2 and had to lose that to learn CSS. I used to know C shell programming, and of course I was a C developer and an expert in the curses library. I've usually given up these technologies without too many problems.
Perl is different. I've been a Perl developer for over a decade. I've always loved the language, and I've solved many, many issues with it. One place where I worked was a .NET development shop when they suddenly realized that some major component of their software couldn't retrieve the information from the network. It would take weeks to fix! I wrote a Perl script in four hours that took care of the problem.
Another place I worked had damage in a customer's database. They had everyone in the company searching for problems and re-inputing the information by hand into a clean database. A Perl script I wrote in a couple of hours did the job. Perl made me the expert. I was the wunderkind. Perl allowed me to do the impossible. It was quick, hackish, yet could also be used to build powerful programs
Re:I'd rather seen they moved to Subversion by n+dot+l · 2009-01-04 07:25 · Score: 5, Informative
We use it at work and it works much better than SVN did.
Apart from everybody's local copies, we keep a repository sitting on a central server. That repo's "master" branch is our release code and, since I'm responsible for the final product, I'm responsible for this branch. Our workflow is fairly simple:
1. Developer pulls down a copy of the master branch (this either creates a local copy or brings an existing copy up to date).
2. Developer hacks away, creating, deleting, and merging local branches as is convenient for them.
3. Developer finishes task.
4. Developer pulls down an update, bringing their local master in sync with the central master.
5. Developer git-rebases their code on the new master. What this does is it takes all of the changes they made since their code diverged from the master and applies them to the new master. Git will apply commits one at a time, pausing if it runs into non-trivial merges or anything else that needs to be dealt with by hand. This has proven to be a massive improvement over the old SVN approach of having the updates in trunk blindly dumped on top of your work as the conflicts tend to be smaller, clearer, and much more manageable. Not to mention that the developer who wrote (and understands) the code is doing the merge.
6. Developer tests their code.
7. If the code is bad, goto step 2. Otherwise the dev will collapse their many little "work in progress" commits into a single "feature implemented/bug fixed" commit.
8. Developer pushes their cleaned up commit as a new branch on the central server and alerts me to its presence.
9. I review the diff (practically a nop for trusted senior coders, for the rest, well, I'd be reviewing their stuff anyway).
10. If I don't like it I send it back, else I merge it onto the central master (guaranteed to be a trivial merge since they did the work of rebasing onto the latest code - Git calls these a "fast-forward" and I automatically reject anything that hasn't been properly rebased) and delete their branch from the central server.
11. Developer pulls down new master, deletes temporary local branches, rebases any other work in progress (or puts this step off, up to them, I don't give a damn as long as I get high quality patches in the end).
12. goto 1
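A rough sketch of one pass through the steps above, run against throwaway local repos. The names (central.git, dev, reviewer, feature-x) are invented for the demo, and the WIP commits are collapsed with `git reset --soft` only because it runs unattended; an interactive `git rebase -i` is the more common way to do that by hand.

```shell
#!/bin/sh
# One pass through the workflow; all repo and branch names are illustrative.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Seed a bare "central" repo with one commit on master.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo base > file.txt
git add file.txt
git commit -q -m "initial"
cd "$work"
git clone -q --bare seed central.git

# Steps 1-3: the developer clones, then hacks on a local branch.
git clone -q central.git dev
cd dev
git checkout -q -b feature-x
echo one >> feature.txt
git add feature.txt
git commit -q -m "wip 1"
echo two >> feature.txt
git commit -q -am "wip 2"
cd "$work"

# Meanwhile somebody else's change lands on the central master.
cd seed
echo other > other.txt
git add other.txt
git commit -q -m "other change"
git push -q ../central.git master
cd "$work"

# Steps 4-5: bring local master up to date, rebase the work onto it.
cd dev
git checkout -q master
git pull -q --ff-only origin master
git checkout -q feature-x
git rebase -q master

# Step 7: collapse the WIP commits into a single commit.
git reset -q --soft master
git commit -q -m "feature x implemented"

# Step 8: push the cleaned-up commit as a new branch on the central server.
git push -q origin feature-x
cd "$work"

# Steps 9-10: the reviewer accepts only a fast-forward, then deletes the branch.
git clone -q central.git reviewer
cd reviewer
git diff --stat master origin/feature-x
git merge -q --ff-only origin/feature-x
git push -q origin master
git push -q origin --delete feature-x
```

`git merge --ff-only` refuses anything that is not a fast-forward, which matches the "reject anything that hasn't been properly rebased" rule in step 10.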
Note that pushing to master doesn't break anybody else, ever, until they decide they're ready to deal with integrating their patch. Nobody ever does the, "Are you gonna commit first or should I?" thing anymore. Developers that are collaborating on a patch sync via a branch on the central server, or directly to each other's machines, or via emailed patches, whatever they want to do. Git doesn't care and neither do I.
It sounds like a lot of tedious work, but Git is just stupid-fast. In the common case the whole update master, rebase, cleanup commits, push cycle takes about as long as SVN used to take to update and then scan for changes and actually commit anything. In the uncommon case where there's a non-trivial merge, the merges tend to come out a lot cleaner since Git is trying to make your changes to the new master one commit at a time, rather than dumping all of the changes in master on top of your stuff (though it can also do that, if you happen to enjoy pain).
And while I prefer the manual approval approach (which scales by appointing trusted lieutenants to take over some of the work) since it keeps me in the loop and keeps everyone else honest, there's no reason you couldn't automate it. Some projects give everyone push access, but disallow anything but fast-forward (trivial nothing-to-merge) pushes to the central server, others I've heard of have people push to a staging branch and a bot on the server grabs the code, runs the test suite, and merges it if it's good. Access is ssh-based, and there are hooks all over the place, you can set up all sorts of schemes when it comes to control of the canonical central repo.
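As a sketch of the "fast-forward only" variant: Git has a server-side setting, `receive.denyNonFastForwards`, that makes the repository refuse non-fast-forward pushes even when they are forced. The repo and user names below (central.git, alice, bob) are made up for the demo.

```shell
#!/bin/sh
# Demo of a central repo that only accepts fast-forward pushes.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo a > f.txt
git add f.txt
git commit -q -m "one"
cd "$work"

# The central repo, configured to deny non-fast-forward updates.
git clone -q --bare seed central.git
git --git-dir=central.git config receive.denyNonFastForwards true

git clone -q central.git alice
git clone -q central.git bob

# Alice pushes first.
cd alice
echo alice >> f.txt
git commit -q -am "alice change"
git push -q origin master
cd "$work"

# Bob's master has now diverged; the server refuses even a forced push.
cd bob
echo bob > g.txt
git add g.txt
git commit -q -m "bob change"
if git push -q --force origin master 2>/dev/null; then
  result=accepted
else
  result=rejected
fi
echo "forced non-fast-forward push: $result"

# After rebasing onto the current master, Bob's push is a plain fast-forward.
git pull -q --rebase origin master
git push -q origin master
```

The staging-branch-plus-bot scheme would hang off a server-side hook instead; that part is not sketched here.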
The thing we've found is that because we've all
Re:I'd rather seen they moved to Subversion by Lord+Bitman · 2009-01-04 07:59 · Score: 4, Informative

git makes branching and merging easy enough that the question of "where is the central line?" isn't really an issue- developers can easily work on their own branches without worrying about other branches, and you can still push your developer branch to the central repository so that the question of "Where is this change? Is it in Steve's branch? Do I need to connect to his repository?" is also not an issue- Steve's branch can easily be in the central repository, Steve just needs to push changes in, just like he'd normally need to commit changes. Git's primary difference there is that "Steve's repository" is pretty much just a robust staging area for changes.
However, if you're used to centralized version control, you may miss things switching to git:
- Pick whether you want all or nothing in advance. You can either have "shallow" checkouts, which leave you with a crippled, broken, and useless copy that has no access to history functions, or you can have every change ever made. Once you've made this choice, you can only change your mind by cloning again. There is no way to gracefully get history as it is required.
- This means: no partial checkouts. This is a problem if you're used to versioning large binary files, or have large files which you won't care about for anything other than auditing reasons after a certain time.
- Which also implies: no "modules". This is a problem if you have lots of small related projects, which together make up one massive pool of code. You can have one massive project which everyone uses all of, or you can choose not to track the origin of files which you copy from one project to another. Having a "common" project shared by several others is not possible.
- Unless you try the "submodule" support, which is a broken hack that can devour changes far too easily to trust it to end-users. And submodule support does NOT allow copies from one "submodule" to another, or to your main project. Not while retaining history, anyway.
This is really all one flaw, re-stated five times. Fix this and git will be able to replace any centralized system. Without the change, I can't recommend it to anyone who is involved in a centralized project- at least not when there is a reason for being centralized.
Git is, despite proponents' claims, great for small projects which don't actually need to talk to anyone else and don't need to interface with any other projects. If your project involves other "projects" where the line between one and another is the least bit blurry, avoid git.
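To make the all-or-nothing point concrete, here is a small sketch comparing a full clone with a shallow one on a throwaway repository (the names src, full and shallow are invented). The final step uses `git fetch --unshallow`, which exists in more recent Git releases than the one this comment was written against, so treat it as a note on later behaviour and not as a rebuttal.

```shell
#!/bin/sh
# Full vs. shallow clone of a tiny demo repository; names are illustrative.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q src
cd src
git symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "change $n"
done
cd "$work"

# --depth is ignored for plain local paths, hence the file:// URL.
git clone -q "file://$work/src" full
git clone -q --depth 1 "file://$work/src" shallow

full_count=$(git -C full rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "full clone sees $full_count commits, shallow clone sees $shallow_count"

# Newer Git versions can fetch the missing history after the fact.
git -C shallow fetch -q --unshallow
deepened_count=$(git -C shallow rev-list --count HEAD)
echo "after --unshallow: $deepened_count commits"
```

Partial checkouts of a subdirectory and the submodule complaints are separate issues that this does not address.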

--
-- 'The' Lord and Master Bitman On High, Master Of All
Re:Tortoise ? by Tanaka · 2009-01-04 08:33 · Score: 5, Informative

http://code.google.com/p/tortoisegit/
Re:Git momentum by petermgreen · 2009-01-04 08:51 · Score: 4, Informative

Roughly
Linus was very resistant to version control at all and could always find a reason (or excuse) not to use each version control system that came along.
Eventually someone decided to listen to every demand from Linus and create a VCS that met all of them: BitKeeper. The catch was it was not FOSS and the gratis version had some pretty obnoxious terms. Things reached a head after someone at OSDL reverse engineered the protocol and Linus was basically forced to either scrap BitKeeper or quit his job at OSDL.
However the period with BitKeeper had convinced Linus that version control was a good idea. But all the alternatives he could find were either too centralised or too slow. So he hacked together git.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
I volunteer ! by Anonymous Coward · 2009-01-04 09:07 · Score: 5, Funny

There are too few developers working on Perl 6, adding a few would actually speed it up. There is a lot of work to be done, and people are spread too thin.
I don't have a clue regarding perl, but looking at some perl scripts, I think I can do it. I mean all you have to do is type #! /bin/perl and roll your face over the keyboard.