The following tips and tricks have been collected from various
users of fast-import, and are offered here as suggestions.
Use One Mark Per Commit
When doing a repository conversion, use a unique mark per commit
(mark :<n>) and supply the --export-marks option on the command
line. fast-import will dump a file which lists every mark and the
Git object SHA-1 that corresponds to it. If the frontend can tie
the marks back to the source repository, it is easy to verify the
accuracy and completeness of the import by comparing each Git
commit to the corresponding source revision.
Coming from a system such as Perforce or Subversion, this should
be quite simple, as the fast-import mark can also be the Perforce
changeset number or the Subversion revision number.
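The verification described above could be sketched as follows.
This is only an illustration: it assumes the exported marks file
holds one ":<n> <object name>" pair per line and that the frontend
used source revision numbers as marks. The function names
parse_marks and missing_revisions are hypothetical.

```python
def parse_marks(text):
    """Map each mark number to the object name recorded for it.

    Assumes one ':<n> <object name>' pair per line.
    """
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, object_name = line.split()
        marks[int(mark.lstrip(":"))] = object_name
    return marks


def missing_revisions(marks, source_revisions):
    """Return the source revisions that have no imported commit."""
    return sorted(set(source_revisions) - set(marks))
```

A frontend could read the exported file, pass its contents to
parse_marks, and then compare each listed commit against the
matching source revision.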
Freely Skip Around Branches
Don't bother trying to optimize the frontend to stick to one
branch at a time during an import. Although doing so might be
slightly faster for fast-import, it tends to increase the
complexity of the frontend code considerably.
The branch LRU built into fast-import tends to behave very well,
and the cost of activating an inactive branch is so low that
bouncing around between branches has virtually no impact on
import performance.
Handling Renames
When importing a renamed file or directory, simply delete the old
name(s) and modify the new name(s) during the corresponding
commit. Git performs rename detection after the fact, rather than
explicitly during a commit.
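As a minimal sketch, a frontend might emit a rename inside a
commit as a D command followed by an M command. The helper name
rename_commands is hypothetical, and the sketch assumes the paths
need no quoting and that the file content was already sent as a
blob with a known mark.

```python
def rename_commands(old_path, new_path, mode, dataref):
    """Return the file-change lines for a rename as delete + modify.

    Assumes the paths need no quoting and that `dataref` names
    content already known to fast-import (for example ':7').
    """
    return [
        "D {}".format(old_path),
        "M {} {} {}".format(mode, dataref, new_path),
    ]
```

For example, rename_commands("old.c", "new.c", "100644", ":7")
yields the lines "D old.c" and "M 100644 :7 new.c".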
Use Tag Fixup Branches
Some other SCM systems let the user create a tag from multiple
files that are not from the same commit/changeset, or create
tags that are a subset of the files available in the repository.
Importing these tags as-is in Git is impossible without making at
least one commit which 'fixes up' the files to match the content
of the tag. Use fast-import's reset command to reset a dummy
branch outside of your normal branch space to the base commit for
the tag, then commit one or more file fixup commits, and finally
tag the dummy branch.
For example, since all normal branches are stored under
refs/heads/, name the tag fixup branch TAG_FIXUP. This way it is
impossible for the fixup branch used by the importer to have
namespace conflicts with real branches imported from the source
(the name TAG_FIXUP is not refs/heads/TAG_FIXUP).
When committing fixups, consider using merge to connect the
commit(s) which are supplying file revisions to the fixup branch.
Doing so will allow tools such as git blame to track through the
real commit history and properly annotate the source files.
After fast-import terminates, the frontend will need to do
rm .git/TAG_FIXUP to remove the dummy branch.
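The reset, fixup commit, and tag sequence might be generated
along these lines. This is a sketch under stated assumptions, not
a reference implementation: the helper names, the commit and tag
messages, and the use of a mark on the fixup commit are all
illustrative choices, and only one fixup commit is produced.

```python
def data_block(text):
    """Format a 'data' command; the count is the length in bytes."""
    return "data {}\n{}\n".format(len(text.encode("utf-8")), text)


def tag_fixup_stream(tag, base, fixup_mark, ident, when, merges, changes):
    """Build the stream for one tag fixup (hypothetical helper).

    base    -- commit-ish the tag is based on, e.g. ':10'
    ident   -- 'Name <email>' used as committer and tagger
    when    -- date in fast-import's raw format, e.g. '1700000000 +0000'
    merges  -- commit-ishes that supply the fixed-up file revisions
    changes -- ready-made file-change lines such as 'M 100644 :5 a.c'
    """
    parts = [
        "reset TAG_FIXUP\n",
        "from {}\n".format(base),
        "\n",
        "commit TAG_FIXUP\n",
        "mark :{}\n".format(fixup_mark),
        "committer {} {}\n".format(ident, when),
        data_block("Fix up files for tag {}".format(tag)),
    ]
    parts += ["merge {}\n".format(m) for m in merges]
    parts += [change + "\n" for change in changes]
    parts += [
        "\n",
        "tag {}\n".format(tag),
        "from :{}\n".format(fixup_mark),
        "tagger {} {}\n".format(ident, when),
        data_block("Tag {}".format(tag)),
    ]
    return "".join(parts).encode("utf-8")
```

The returned bytes can be written to fast-import's standard input
between ordinary commits.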
Import Now, Repack Later
As soon as fast-import completes, the Git repository is completely
valid and ready for use. Typically this takes only a very short
time, even for considerably large projects (100,000+ commits).
However, repacking the repository is necessary to improve data
locality and access performance. It can also take hours on
extremely large projects (especially if -f and a large --window
parameter are used). Since repacking is safe to run alongside
readers and writers, run the repack in the background and let it
finish when it finishes. There is no reason to wait to explore
your new Git project!
If you choose to wait for the repack, don't try to run benchmarks
or performance tests until repacking is completed. fast-import
outputs suboptimal packfiles that are simply never seen in real
use situations.
Repacking Historical Data
If you are repacking very old imported data (e.g. older than the
last year), consider expending some extra CPU time and supplying
--window=50 (or higher) when you run git repack. This will take
longer, but will also produce a smaller packfile. You only need
to expend the effort once, and everyone using your project will
benefit from the smaller repository.
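A frontend wrapper could start the repack in the background once
the import finishes, roughly as sketched here. The function names
are hypothetical, and passing -a -d to git repack is an assumption
about the desired invocation; only -f and --window come from the
tips above.

```python
import subprocess


def repack_command(window=None, recompute_deltas=False):
    """Build a 'git repack' argument list (-a -d is an assumed default)."""
    cmd = ["git", "repack", "-a", "-d"]
    if recompute_deltas:
        cmd.append("-f")
    if window is not None:
        cmd.append("--window={}".format(window))
    return cmd


def start_background_repack(repo_dir, **options):
    """Start the repack without waiting; the caller may poll or ignore it."""
    return subprocess.Popen(repack_command(**options), cwd=repo_dir)
```

For historical data, start_background_repack(repo_dir, window=50,
recompute_deltas=True) would run git repack -a -d -f --window=50
while the repository remains usable.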
Include Some Progress Messages
Every once in a while have your frontend emit a progress message
to fast-import. The contents of the messages are entirely
free-form, so one suggestion would be to output the current month
and year each time the current commit date moves into the next
month. Your users will feel better knowing how much of the data
stream has been processed.
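The month-by-month suggestion could look like the sketch below.
It assumes the frontend has each commit time as seconds since the
epoch and interprets it in UTC; the name progress_lines and the
YYYY-MM message format are illustrative choices.

```python
import datetime


def progress_lines(commit_times):
    """Yield a progress command whenever the commit date enters a new month.

    commit_times -- iterable of commit times in seconds since the epoch
    """
    current = None
    for seconds in commit_times:
        when = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
        month = (when.year, when.month)
        if month != current:
            current = month
            yield "progress {}\n".format(when.strftime("%Y-%m"))
```

The frontend would write each yielded line to the stream just
before the first commit of that month.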