Backend for fast Git data importers
PACKFILE OPTIMIZATION
When packing a blob, fast-import always attempts to deltify
against the last blob written. Unless specifically arranged for
by the frontend, this will probably not be a prior version of the
same file, so the generated delta will not be the smallest
possible. The resulting packfile will be compressed, but will not
be optimal.
Frontends which have efficient access to all revisions of a
single file (for example reading an RCS/CVS ,v file) can choose
to supply all revisions of that file as a sequence of consecutive
blob commands. This allows fast-import to deltify the different
file revisions against each other, saving space in the final
packfile. Marks can be used to later identify individual file
revisions during a sequence of commit commands.
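The ordering described above can be sketched as follows. This is a minimal illustration, not a complete frontend: the function names, the committer identity, the timestamp, and the branch name are all hypothetical. It emits every revision of one file as consecutive blob commands, each with a mark, and only then emits the commit commands that refer back to those marks.

```python
from typing import List

# Hypothetical committer identity and timestamp, used only for illustration.
COMMITTER = b"Importer <importer@example.invalid>"
WHEN = b"1700000000 +0000"


def blob_command(mark: int, content: bytes) -> bytes:
    """Return one blob command that assigns `mark` to `content`."""
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(content), content)


def commit_command(ref: bytes, path: bytes, blob_mark: int, message: bytes) -> bytes:
    """Return one commit command that sets `path` to an already marked blob."""
    return (
        b"commit %s\n" % ref
        + b"committer %s %s\n" % (COMMITTER, WHEN)
        + b"data %d\n%s\n" % (len(message), message)
        + b"M 100644 :%d %s\n" % (blob_mark, path)
        + b"\n"
    )


def build_stream(path: bytes, revisions: List[bytes],
                 ref: bytes = b"refs/heads/main") -> bytes:
    """Build a stream with all revisions of `path` as consecutive blobs."""
    parts = []
    # Blobs first, back to back and oldest to newest, so that each one is
    # written immediately after the previous revision of the same file.
    for mark, content in enumerate(revisions, start=1):
        parts.append(blob_command(mark, content))
    # Commits afterwards; each one names its file revision by mark.
    for mark in range(1, len(revisions) + 1):
        message = b"revision %d of %s" % (mark, path)
        parts.append(commit_command(ref, path, mark, message))
    return b"".join(parts)
```

The bytes returned by build_stream can be written to the standard input of git fast-import. A frontend handling many files would repeat the blob run once per file before emitting its commits, at the cost of keeping track of which mark belongs to which revision.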
The packfile(s) created by fast-import do not encourage good disk
access patterns. This is caused by fast-import writing the data
in the order it is received on standard input, while Git
typically organizes data within packfiles to make the most recent
(current tip) data appear before historical data. Git also
clusters commits together, speeding up revision traversal through
better cache locality.
For this reason it is strongly recommended that users repack the
repository with git repack -a -d after fast-import completes,
allowing Git to reorganize the packfiles for faster data access.
If blob deltas are suboptimal (see above) then also adding the -f
option to force recomputation of all deltas can significantly
reduce the final packfile size (30-50% smaller can be quite
typical).
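As a rough sketch of how an import script might run this step, the helper below builds the git repack command line and adds -f only when the caller asks for deltas to be recomputed. The function names and the repo_dir parameter are hypothetical; only the git repack options themselves come from the text above.

```python
import subprocess
from typing import List


def repack_args(recompute_deltas: bool = False) -> List[str]:
    """Return the git repack command line to run after an import."""
    args = ["git", "repack", "-a", "-d"]
    if recompute_deltas:
        # -f forces recomputation of all deltas; slower, but it can shrink
        # the pack when the imported blob deltas were suboptimal.
        args.append("-f")
    return args


def repack(repo_dir: str, recompute_deltas: bool = False) -> None:
    """Run git repack inside repo_dir and raise if the command fails."""
    subprocess.run(repack_args(recompute_deltas), cwd=repo_dir, check=True)
```

A frontend that could not group the revisions of each file together would typically call repack(repo_dir, recompute_deltas=True) once fast-import has exited.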
Instead of running git repack you can also run git gc --aggressive,
which will also optimize other things after an import (e.g. pack
loose refs). As noted in the "AGGRESSIVE" section in git-gc(1) the
--aggressive option will find new deltas with the -f option to
git-repack(1). For the reasons elaborated on above using
--aggressive after a fast-import is one of the few cases where it's
known to be worthwhile.