[packagers] Infrastructure - version control

Yury V. Zaytsev yury at shurup.com
Sun Apr 3 23:03:15 CEST 2011


I have been working in the background on implementing a better version
control for RPMForge. I now think I have got to the point when I have
something to show. So let us jump straight to the results and if you
want to know about the theory, please just keep on reading.

I have converted the most substantial part of the subversion repository
that we have now to git. You can clone it this way (use gitk to browse
through the history):

git clone git://github.com/zyv/rpmforge.git

The GitHub page is here:


I have figured out the complete list of committers so far along with
their e-mail addresses and imported all branches and tags. The initial
1.3 Gb svnsync got compressed to ~70 Mb.

The upsides: 

1) The process of obtaining the new changes now is really a matter of
seconds: git pull and that's it. I am sick and tired of waiting for 15
minutes until my svn working copy is fully updated.

2) It is impossible to get a collision when you have been working on
some SPECs for some time and then you commit and overwrite someone
else's changes. This has happened to me and Steve in the past already.
Git just won't let you commit until you resolve the conflicts and ensure
that the whole tree is consistent.

3) You can have private branches for free (for nothing). This mostly
addresses Dag's point about him being unwilling to commit things before
they are done. Just create a private branch that nobody will be able to
see in a matter of milliseconds and commit there. When it is done,
cherry-pick or merge.

That's it. I want to know if everybody is happy with the results of this
conversion. Please ack or tell me why not. Once we settle this, the next
post will be on where to host it, so let's first close this and then
discuss all the rest. Please, one thing at a time.


Now, the extended part of the e-mail:

Last time we had quite a bit of discussion about that with Karanbir and
he raised some very valid points. I tried to consider them in as far as
I was able to and after finishing Pro Git,  evaluating Fedora's
experience etc. I decided to settle with migrating the package tree as
it is in a top-level repository and moving all the other stuff around in
the separate repositories.

The reasons for it are as follows:

0) Why git and not hg, bzr, monotone or you name it <distributed vcs>?

- Mainly because one needs to make a decision. It is quite clear that we
need a distributed system. I often work offline or on the road and I
need the history. By now, however, most distributed systems are more or
less on par in terms of features. So, unless there is a strong
preference for system X, it's just a matter of taking the fastest and
the most popular.

- I like git, I am used to it, I am familiar with it :-) and I did most
efforts on the migration, so I took what I took.

1) Ok, so why taking out the RPMS folder in a top-level repository
instead of breaking it up in 5000 repositories, one per package?

What works for Fedora will not work for us. They have more than 10K
repositories and around *1K* contributors (!!!). This is the main reason
why its perfectly reasonable for them to chose the multi-repo scheme and
why we shouldn't do it.

What they usually have is like ~10 packages per contributor and mostly
they absolutely don't care about any other packages than those that they
maintain. What we have is 5 active packagers (OK, if we are very
optimistic, 10 or even 15 in the future). That is 1K packages per
packager. And obviously I do care about the whole tree to a certain
extent, and so do Dag, Steve and Christoph. If I see an update that is
not too complicated to perform, I will do it even if I personally don't
use this package.

Fedora folks probably want to be able to make packagers as independent
as possible from each other and by no means force them to download and
operate on the whole tree. Also, the per-package scheme works with koji
quite well.

We don't really care, as long as there's a way to set up ACLs to give a
packager the access only to the part of the tree (not that it is really
needed right now, but might come in handy in the future and it can be
done easily with gitolite).

2) So you want to force people to check out the whole repository if they
only care about few packages?

a) As I suggested (and the experience confirmed), the size of the bare
repository is around 70 Mb + 10 Mb worth of checked out files. The size
of my svn checkout (full repo) is right now about 400 Mb. Come on,
what's 70 Mb's nowadays, guys? It's nothing as in *nothing*.

Besides, updating the whole repository will take neglectable amount of
bandwidth (few kilobytes) and will be much quicker than using

b) If one really-really needs only one SPEC and is absolutely not
interested in the rest, downloading a file from the repository browser
is always an option.

P.S. So far I haven't seen anyone submitting svn diff patches, in best
case it's a manual diff -Naur against the expanded SPEC found in
SRPM :-)

3) You can come up with a set of tools to manage 1 repo per package and
it's actually doable.

It is true, that it is possible to come up with a set of tools to manage
5000 repositories (one per package). You can automate creation /
deletion / ACLs with tools like gitosis or gitolite, you can come up
with bunch of scripts to mass-update repositories, create RPMs etc., but
this is a significant overhead which is just not justified for such a
low number of packagers.

Also, do not forget, that not only the administrator of the system has
to be familiar with these, but also all of the packagers will forcibly
have to use them as well.

The threshold required to contribute is already high enough... why make
it even higher? I, for instance, won't be able to oversee the tree

P.S. An educative essay:

So, that's more or less it. I hope I have pro-actively answered most, if
not all questions :) 
Sincerely yours,
Yury V. Zaytsev

More information about the packagers mailing list