[packagers] Infrastructure - migration to GitHub?

Yury V. Zaytsev yury at shurup.com
Mon Apr 4 22:34:28 CEST 2011


Hi!

So, now that the choice of the version control software has been
discussed and the opinions are mostly positive, I want to bring up a
more controversial topic. 

The topic of hosting this version control repository and setting up a
development hub with a version control hosting / browser, wiki, issue
tracker and a web page. The need for all these things have been
discussed already and recently, Karanbir brought up the last one.

* The system needs to be should be simple to use and reliable.

* Preferably, it should impose minimal administrative overhead. 

* Finally, neutrality would be nice, in a sense that several persons
have full control over it at the same time. 

The solution that I would like to suggest is to migrate to GitHub,
because it addresses all (or most) of these points. 

- You can create a GitHub organization for free as long as you don't
need any *private* repositories (the storage limit is 0.3 Gb per
repository, whereas 7 years worth of our history took approximately 70
Mb, so it should be largely enough). 

- GitHub provides a convenient and elaborate interface: repository
browser, tree download facility, pull request management, external
repository event notifications etc.

- Several users can be administrators of the organization at the same
time, which addresses the redundancy concern.

Services offered by GitHub
--------------------------

1) GitHub provides a nice repo browser, git, ssh and https access (for
those behind corporate firewalls) and even access to the repository via
svn (!), allowing you to check out only a part of the repository if
needed right now, without waiting for the sparse checkout feature to get
into git (hey!) and even use familiar tools to commit (!!!). 

This means that in the first approximation the build system will
actually require zero changes :-)

I don't know if it is so important though... I guess flipping the switch
is actually not as complicated as that and can be done on the build
system side to immediately benefit from the new features of git.

2) GitHub provides you with a facility to host static pages. Actually,
gives you a little bit more than that. You can create a repository and
put your pages written in markdown, textile or something else in there
and on every commit it will regenerate your static website out of it.

I think this addresses Karanbir's point. Nice static pages in a
repository, just change the DNS and the website is up, zero
administration hassle etc. ...

3) GitHub provides you with a wiki and an issue tracker. Those can be
linked to the repository. Wiki has a web-interface or can be accessed as
a git repository.

Now, I think that GitHub is not going away anytime soon, but every
serious project has to have some kind of contingency policy. 

If GitHub vanishes leaving no trace
-----------------------------------

1) As for the version control we don't care so much. Every person has a
full copy of the project's history. Additionally, it is trivial to set
up a nightly off-site pull from github, or a weekly pull, or something
that keeps last X backups and rotates them every day etc.

What we can readily do, for instance, is to set up a free mirror at
repo.or.cz. It costs nothing and it's already one continuous off-site
backup in place. I am willing to volunteer to set up such a mirror. 

P.S. Trick question: what is the backup policy for the svn in place
right now? I mean not only the checkouts, but also the history?

2) Website and wiki are in fact git repositories, so they can be dealt
with in a same fashion.

3) Issues are a little bit more complicated. Currently, they can only be
backed up using the API, but I guess sooner or later a similar access
facility will be implemented for them as well.

There are already Ruby backup scripts around, so if anyone well-versed
with Ruby enough is willing to help to set up a backup script, this
would solve this problem.

Now bringing up a git server is easy. I have a local gitolite install on
my private server and in the case if a disaster happens a new instance
can be brought up on a VPS in a matter of hours.

I am under the impression that finding a temporary VPS to host the
repository is really not a big deal, now that high-quality paid-for Xen
hosts start at $20 per month, this can be tolerated for a few month in
the worst case before a sustainable solution is implemented.

Tentative migration plan
------------------------

I think the best bet would be to start of migrating the repository with
RPM SPEC files first and see how it goes from there. If we like it at
GitHub, nothing prevents us from going forward and getting the rest of
the infrastructure there.

If it turns out to be a terrible experience, then again we can get a
VPS, set up our own gitolite instance and gitweb, settle with that and
from there see, if, for instance, we can get back to the Trac idea.

Later on, a wiki and a web-pages repository can be set up and the
content can be imported from our various abandoned venues that we have
in place right now (docs folder in the svn repo, dead wiki at
rpmforge.net, devastated Trac instance etc.)

I can do part of this work, BUT at this point I'd really like to avoid
committing myself to it with a probability of 100%. The coming months
and the summer are probably going to be quite hard on me, but if we
waited for few years, a delay of few months can probably be tolerated.

In what concerns the implementation as such, I see it as a sequence of
the following steps: 

1) Create repoforge organization at GitHub
2) Collect the GitHub logins from everyone involved
3) Settle the repository migration day with Dag
4) Switch subversion into the R/O mode
5) Perform a final svnsync with the old repo
6) Start the conversion (takes around two hours on my hardware)
7) Perform few required manual adjustments and push it to GitHub
8) Set up a GitHub hook to send commit notifications to the list
9) Test that everything works (i.e. people are able to commit)
10) Switch the build system over to the git checkout
11) Post an email detailing what to do and what to not to do with git
12) PROFIT!!!

Basically, in order to get this done one weekend is needed to account
for unpredictable circumstances. I can make time in the coming month
provided that we all agree on that.

It can be the week after next, or in three weeks. It doesn't matter so
much, there is no rush and we all want this to go smoothly if at all.

>From there on, we can get back to the question of slowly and thouroughly
re-thinking the organization of the project and the buildsystem
accordingly, organize a solid backbone for further growth (important!)
as Dag mentioned earlier today etc.

I am of course willing to contribute in designing the new project,
assessing what is out there in terms of build systems and how it fits to
our needs etc., but it will take time... :-)

However, getting one thing done will benefit all of us right now. And
hopefully, by that time, the lowered contribution threshold would drive
in new maintainers, things will become a little bit easier and light
gonna sparkle at the end of the tunnel...

Opinions?
 
-- 
Sincerely yours,
Yury V. Zaytsev




More information about the packagers mailing list