15 October 2017

This post serves as documentation of the steps to migrate the remaining CVS repositories of the Firebird project from SourceForge to GitHub.

General steps

Some steps for the migration are inspired by (or plainly copied from) https://sourceforge.net/p/forge/documentation/CVS/

The most important step, the migration itself, is explicitly not taken from the SourceForge documentation, as this turned out to be lossy (several branches were not included in the migration for unclear reasons).

The migration will be done on Windows 10 using the Windows Subsystem for Linux with Ubuntu, but these instructions should also work on a 'real' Linux install.

Tools to install

Using sudo apt-get install:

  • cvs (to be able to process the CVS repository)

  • rcs (for parsing the log files using rlog)

  • git (obviously)

  • make (to 'install' cvs2svn)

In a suitable working directory, install the latest development version of cvs2svn (the last released version might run into problems with multi-line commit messages).

# in ~/repomigration
git clone https://github.com/mhagger/cvs2svn.git
cd cvs2svn
sudo make install

Contrary to its name, cvs2svn also provides a conversion tool called cvs2git.
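Before continuing, it can help to confirm that everything listed above is actually on the PATH. The following is only a minimal sketch: the function name missing_tools is made up for this post, and the tool list in the example comment is simply the one from this section.

```shell
# Print the name of every command from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example: missing_tools cvs rlog git make cvs2git
```

If the function prints nothing, all tools were found.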

Retrieving and updating the CVS repository

Initial retrieval of CVS repository:

# in ~/repomigration
mkdir cvsrepo
rsync -av firebird.cvs.sourceforge.net::cvsroot/firebird/* ~/repomigration/cvsrepo

Subsequent updates can be retrieved using just:

# in ~/repomigration/cvsrepo
rsync -av firebird.cvs.sourceforge.net::cvsroot/firebird/* ~/repomigration/cvsrepo
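As a rough sanity check after an rsync run, the number of RCS history files (the files ending in ,v) can be counted and compared between runs. This is a sketch, not part of the original procedure; count_rcs_files is a hypothetical helper name.

```shell
# Count the RCS history files (*,v) below a directory.
count_rcs_files() {
  find "$1" -type f -name '*,v' | wc -l | tr -d ' '
}

# Example: count_rcs_files ~/repomigration/cvsrepo
```

A count of zero would suggest that the copy did not end up where expected.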

Authors

Git identifies committers by a name and an email address, while CVS uses just a username. We will first need to obtain all usernames from the repository, and then associate email addresses (and, if possible, real names) with them.

For this migration we will associate the original SourceForge usernames with their username@users.sourceforge.net email address. If users want to associate these commits with their GitHub user account, they will need to add this email address to their GitHub account (as a secondary address).

Get usernames from CVS logs:

# find passes the ,v files directly to rlog, so paths containing spaces are handled correctly
find /home/mark/repomigration/cvsrepo -name '*,v' -exec rlog {} + \
  | sed -nr 's/^date:.* author: ([^;]+).*/\1/p' \
  | sort -u >~/repomigration/cvs-author-names
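To see what the sed expression extracts, it can be run against a single line shaped like the per-revision header that rlog prints. The sample line and the username alice are made up for illustration.

```shell
# One line shaped like the per-revision header printed by rlog (made-up data).
sample='date: 2004/03/29 10:12:45;  author: alice;  state: Exp;  lines: +2 -1'

# Same expression as used above; -n together with the p flag prints only matching lines.
echo "$sample" | sed -nr 's/^date:.* author: ([^;]+).*/\1/p'
# prints: alice
```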

Remove the user root if present, and then transform to email addresses and add name information from SourceForge profile in a format that can be applied in the cvs2git options file:

for uname in $(cat ~/repomigration/cvs-author-names); do
  # -s suppresses curl's progress output
  json=$(curl -s "https://sourceforge.net/rest/u/$uname/profile")
  # extract the "name" field from the profile JSON
  fname=$(echo "$json" | sed -nr 's/\{"username": "[^"]+", "name": "([^"]+)".*/\1/p')
  echo "    '$uname' : ('$fname', '$uname@users.sourceforge.net'),"
done >~/repomigration/authors.txt

Review authors.txt and make changes where necessary (eg some users may have indicated they want their commits associated with another email address, real names may be missing from the SourceForge profile, etc).
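Two of the manual fixes mentioned above (removing root and handling a missing real name) could also be scripted. The sketch below is an assumption about how one might do that, not what was used for the actual migration; drop_root and author_entry are hypothetical helper names, and falling back to the username as display name is just one possible choice.

```shell
# Drop the CVS pseudo-user "root" from a list of usernames read from stdin.
drop_root() {
  grep -vx 'root' || true   # grep exits 1 when no lines remain; not an error here
}

# Print one author_transform line; fall back to the username if no real name is known.
author_entry() {
  uname=$1
  fname=${2:-$uname}
  printf "    '%s' : ('%s', '%s@users.sourceforge.net'),\n" "$uname" "$fname" "$uname"
}

# Example: drop_root <~/repomigration/cvs-author-names
# Example: author_entry alice 'Alice Example'
```

Names containing a single quote would still need manual escaping, because the output is pasted into a Python options file.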

Conversion of a module

Before conversion, make sure the local copy of the repository is up-to-date (using rsync). The description below assumes we are migrating the OdbcJdbc module to the git project firebird-odbc-driver.

The cvs2git tool can only do per module conversions. To make migration easier, it is advisable to use an options file, as documented on http://cvs2svn.tigris.org/cvs2svn.html and http://cvs2svn.tigris.org/cvs2git.html.

For our purpose we took a copy of the cvs2git-example.options from the cvs2svn folder created in Tools to install, and made the following modifications. Most of these changes can be used for conversion for all modules, but some settings are per module (or may need some tuning per module).

  1. (optional) Set ctx.tmpdir to a name specific to the module being converted (eg r'/home/mark/repomigration/cvs2git-OdbcJdbc')

  2. Copy the contents of authors.txt to the author_transform list.

  3. (optional) Change the entry 'cvs2git' : 'cvs2git <admin@example.com>' to the domain of your project (in our case I changed the email address to firebird@firebirdsql.org)

  4. In run_options.set_project replace r'test-data/main-cvsrepos' with the path to the module in the repository copy (eg r'/home/mark/repomigration/cvsrepo/OdbcJdbc')

  5. In ctx.cvs_log_decoder uncomment 'latin1' (and maybe 'utf-8') and fallback_encoding='ascii' (especially if you receive warnings about log parsing)

  6. (optional) Change ctx.symbol_info_filename from None to (for example) 'symbol-info.txt'; this may help in analyzing and fixing problems with name conflicts between tags and branches

  7. (optional) Enable changeset_database.use_mmap_for_cvs_item_to_changeset_table (but read the warning in the options file!)

  8. (optional) If you are missing branches, comment out ExcludeTrivialImportBranchRule(). As an example, the OdbcJdbc module had a branch that was equal to the original initial commit of the CVS repository, and was therefore excluded by this heuristic

  9. Download http://www.apache.org/dev/svn-eol-style.txt and http://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf/mime.types and make changes if necessary.

    1. Uncomment the AutoPropsPropertySetter (and related lines) and point it to svn-eol-style.txt

    2. Uncomment the MimeMapper and point it to mime.types

    3. Uncomment the EOLStyleFromMimeTypeSetter

    4. Add from cvs2svn_lib.svn_run_options import SVNEOLFixPropertySetter to the import list at the start, and add SVNEOLFixPropertySetter() at the end of ctx.file_property_setters.extend, to normalize line endings (test carefully if you really want to do this)

Be sure to read through the options file documentation; there are some settings you might want to tune further (eg the settings in ctx.file_property_setters.extend for line endings, etc). The cvs2git default behavior leaves the content as originally stored in the CVS repository (aka 'treat everything as binary').

The settings of step 9 have no effect if SVNEOLFixPropertySetter() isn’t added. Be aware that adding it can introduce issues further down the road, like line-ending changes between commits depending on the platform and configuration of the contributor. This applies especially to files that are prone to require specific line endings (eg Windows .bat files). It might be advisable to add a .gitattributes after migration and update affected files as described on https://www.git-scm.com/docs/gitattributes/.

To convert (replace firebird-odbc-driver.options with your options file):

cvs2git --options=firebird-odbc-driver.options

Conversion can take a while.

Then perform (replace the firebird-odbc-driver and cvs2git-OdbcJdbc with your specific names):

git init firebird-odbc-driver.git
cd firebird-odbc-driver.git
cat ../cvs2git-OdbcJdbc/git-blob.dat ../cvs2git-OdbcJdbc/git-dump.dat | git fast-import
# might fail if this branch doesn't exist
git branch -D TAG.FIXUP
python ~/repomigration/cvs2svn/contrib/git-move-refs.py
# delete branches prefixed `unlabeled-` (old branches that had their name deleted from CVS)
# -r (GNU xargs) skips the delete when no such branch exists
git branch --list 'unlabeled-*' | xargs -r git branch -D
git gc --prune=now
git repack -a -d -f --depth=50 --window=250

Deleting the unlabeled-* branches may lead to loss of history if those branches were never fully merged back into a still-existing branch. However, as they had their name deleted in CVS, it is likely that these branches were no longer important. Weigh your options carefully, and keep a backup of the CVS repository!
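To help weigh that decision, one could list which unlabeled-* branches have a tip that is not contained in any other branch, before deleting anything. This is a sketch under the assumption that "not contained in any named branch" is a good enough proxy for "history would be lost"; unmerged_unlabeled is a hypothetical helper name, and it must be run inside the converted repository.

```shell
# List unlabeled-* branches whose tip commit is not reachable from any
# branch with a different (non unlabeled-) name.
unmerged_unlabeled() {
  git for-each-ref --format='%(refname)' 'refs/heads/unlabeled-*' |
  while read -r ref; do
    # all branches containing this tip, minus the unlabeled-* ones
    others=$(git for-each-ref --format='%(refname)' --contains "$ref" refs/heads |
             grep -v '^refs/heads/unlabeled-' || true)
    [ -n "$others" ] || echo "${ref#refs/heads/}"
  done
}
```

Branches printed by this function are the ones where deletion would drop commits that no other branch carries.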

Verifying contents of repository (replace the firebird-odbc-driver and cvs2git-OdbcJdbc with your specific names):

mkdir /tmp/compare-firebird/
python ~/repomigration/cvs2svn/contrib/verify-cvs2svn.py \
    --git \
    ~/repomigration/cvsrepo/OdbcJdbc/ \
    ~/repomigration/firebird-odbc-driver.git/ \
    --tmp=/tmp/compare-firebird/ \
    --diff

Publishing the repository to GitHub

Create an empty repository on GitHub, then run the following in the converted repository (replace PROJECT and REPOSITORY with the right values):

git remote add origin git@github.com:PROJECT/REPOSITORY.git
git config branch.master.remote origin
git config branch.master.merge refs/heads/master
git push origin --mirror

Make sure your SSH key for GitHub is loaded.
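After the push, it may be worth checking that every branch and tag arrived on the remote. The sketch below compares the local refs with what the remote reports; the function names are made up for this post, and comparing only branches and tags is an assumption about what matters for this migration.

```shell
# Print "<sha> <refname>" for every local branch and tag, sorted.
local_refs() {
  git for-each-ref --format='%(objectname) %(refname)' refs/heads refs/tags | sort
}

# Print the same for a remote (name or URL); --refs hides peeled tag entries.
remote_refs() {
  git ls-remote --heads --tags --refs "$1" | tr '\t' ' ' | sort
}

# Succeed only if both lists are identical.
refs_match() {
  [ "$(local_refs)" = "$(remote_refs "$1")" ]
}

# Example: refs_match origin && echo "remote is complete"
```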

Repository specific steps

OdbcJdbc to firebird-odbc-driver

Add the line NetfraRemote.lib = svn:eol-style=CRLF to svn-eol-style.txt in an attempt to fix up an incorrect binary file.

manual to firebird-documentation

After migration and publication to GitHub, check out the repository on Windows and perform the following steps.

In the branches master and B_Release do:

  1. Create .gitignore with content:

    lib/*
    !lib/_readme_libs.txt
    
    tools/*
    !tools/_readme_tools.txt
    !tools/get_tools_linux.sh
    
    dist/
    inter/

    And then do

    git add .
    git commit -m "Add .gitignore"
    git push

    This will ignore the files and folders populated for the build process.

  2. Create .gitattributes with content:

    * text=auto
    *.xml           text
    *.xsl           text
    *.docbook       text
    *.css           text
    *.bat           text    eol=crlf
    *.sh            text
    
    *.bmp           binary
    *.gif           binary
    *.ico           binary
    *.jar           binary
    *.jpg           binary
    *.jpeg          binary
    *.png           binary

    And then do

    git read-tree --empty
    git add .
    git commit -m "Add .gitattributes and update affected files"
    git push
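To check that the rules in .gitignore and .gitattributes behave as intended, git can be asked directly how it treats a given path. The helpers below are a sketch with made-up names (is_ignored, attr_value), and the file names in the examples are hypothetical; run them from the root of the working copy.

```shell
# Succeed if git would ignore the given path under the current .gitignore rules.
is_ignored() {
  git check-ignore -q -- "$1"
}

# Print the value git assigns to an attribute for a path, eg "crlf" or "unset".
attr_value() {
  git check-attr "$1" -- "$2" | sed 's/^.*: //'
}

# Example: is_ignored lib/some.jar && echo "ignored"
# Example: attr_value eol build.bat
```

For a .bat file the eol attribute should come back as crlf, and for the image types the text attribute should come back as unset.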