Backend for fast Git data importers
MEMORY UTILIZATION
Several factors affect how much memory fast-import requires to
perform an import. Like critical sections of core Git, fast-import
uses its own memory allocators to amortize the overhead associated
with malloc. In practice, because it allocates memory in large
blocks, fast-import reduces this malloc overhead to nearly zero.
per object
fast-import maintains an in-memory structure for every object
written in this execution. On a 32 bit system the structure is 32
bytes; on a 64 bit system it is 40 bytes (due to the larger
pointer size). Objects in the table are not deallocated until
fast-import terminates. Importing 2 million objects on a 32 bit
system will therefore require approximately 64 MB (about 61 MiB)
of memory.
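The object-table figures above are plain multiplication. A minimal
Python sketch, using only the per-object sizes quoted in this
section (the names OBJECT_ENTRY_BYTES and object_table_bytes are
hypothetical, not fast-import identifiers):

```python
# Per-object structure sizes quoted above for fast-import's object table.
OBJECT_ENTRY_BYTES = {32: 32, 64: 40}
MIB = 1024 * 1024


def object_table_bytes(num_objects: int, pointer_bits: int = 32) -> int:
    """Estimate object-table memory; entries are never freed before exit."""
    return num_objects * OBJECT_ENTRY_BYTES[pointer_bits]


if __name__ == "__main__":
    for bits in (32, 64):
        total = object_table_bytes(2_000_000, bits)
        print(f"{bits} bit: {total:,} bytes = {total / MIB:.1f} MiB")
```

Two million objects come to 64,000,000 bytes (about 61 MiB) on a
32 bit system and 80,000,000 bytes (about 76 MiB) on a 64 bit
system. The sketch counts only the per-object structures and
ignores any additional fixed overhead of the hashtable itself.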
The object table is actually a hashtable keyed on the object name
(the unique SHA-1). This storage configuration allows fast-import
to reuse an existing or already written object and avoid writing
duplicates to the output packfile. Duplicate blobs are
surprisingly common in an import, typically due to branch merges
in the source.
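To illustrate why keying the table on the object name avoids
duplicate writes, here is a minimal Python sketch. It names blobs
the way Git does (SHA-1 over a "blob <size>\0" header plus the
content) and skips the write when that name is already present.
The ObjectTable class and its store_blob method are hypothetical,
and the output packfile is replaced by a plain list.

```python
from __future__ import annotations

import hashlib


def blob_name(data: bytes) -> str:
    """Return the SHA-1 object name Git assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ObjectTable:
    """Hypothetical stand-in for fast-import's object table (keyed on SHA-1)."""

    def __init__(self) -> None:
        self.entries: dict[str, int] = {}  # object name -> position in "pack"
        self.pack: list[bytes] = []        # stand-in for the output packfile

    def store_blob(self, data: bytes) -> tuple[str, bool]:
        """Write the blob unless an identical one exists; return (name, written)."""
        name = blob_name(data)
        if name in self.entries:
            return name, False             # duplicate: reuse the existing object
        self.entries[name] = len(self.pack)
        self.pack.append(data)
        return name, True


table = ObjectTable()
table.store_blob(b"int main(void) { return 0; }\n")
table.store_blob(b"int main(void) { return 0; }\n")  # same file from a merge
print(len(table.pack))  # 1
```

The second store_blob call finds the name already in the table, so
only one copy reaches the stand-in packfile.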
per mark
Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
bytes, depending on pointer size) per mark. Although the array is
sparse, frontends are still strongly encouraged to use marks
between 1 and n, where n is the total number of marks required
for this import.
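The advice to number marks densely follows from how a sparse array
spends memory. The Python sketch below is an illustrative model,
not fast-import's actual data structure: it allocates fixed-size
blocks of pointer slots on demand, so memory grows with the number
of distinct blocks touched rather than with the number of marks.
The class name SparseMarks, the 1024-slot block size, and the dict
used as the top level are all assumptions made for the example.

```python
from __future__ import annotations


class SparseMarks:
    """Hypothetical sparse mark array: blocks of pointer slots allocated on demand."""

    BLOCK = 1024  # slots per block (illustrative choice)

    def __init__(self, pointer_size: int = 8) -> None:
        self.pointer_size = pointer_size         # 4 on 32 bit, 8 on 64 bit
        self.blocks: dict[int, list] = {}        # block index -> list of slots

    def set(self, mark: int, obj: object) -> None:
        index, slot = divmod(mark, self.BLOCK)
        block = self.blocks.get(index)
        if block is None:                        # first mark in this block
            block = self.blocks[index] = [None] * self.BLOCK
        block[slot] = obj

    def get(self, mark: int):
        index, slot = divmod(mark, self.BLOCK)
        block = self.blocks.get(index)
        return None if block is None else block[slot]

    def slot_bytes(self) -> int:
        """Memory spent on pointer slots, whether used or not."""
        return len(self.blocks) * self.BLOCK * self.pointer_size


dense = SparseMarks()
for mark in range(1, 3001):                      # marks 1..n, as recommended
    dense.set(mark, f"obj{mark}")

scattered = SparseMarks()
for mark in (1, 1_000_001, 2_000_001):           # three marks far apart
    scattered.set(mark, f"obj{mark}")

print(dense.slot_bytes(), scattered.slot_bytes())  # 24576 24576
```

In this model, three thousand consecutive marks and three widely
scattered marks occupy the same 24 KiB of slots, which is why
numbering marks from 1 to n uses memory most efficiently.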
per branch
Branches are classified as either active or inactive. The memory
usage of the two classes is significantly different.
Inactive branches are stored in a structure which uses 96 or 120
bytes (32 bit or 64 bit systems, respectively), plus the length
of the branch name (typically under 200 bytes), per branch.
fast-import will easily handle as many as 10,000 inactive
branches in under 2 MiB of memory.
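The inactive-branch estimate can be checked the same way. This
sketch uses the 96/120-byte structure sizes quoted above; the
function name and the 40-byte branch names are hypothetical.

```python
# Fixed per-branch structure size quoted above, keyed by pointer width.
INACTIVE_BRANCH_BYTES = {32: 96, 64: 120}
MIB = 1024 * 1024


def inactive_branches_bytes(name_lengths, pointer_bits=64):
    """Sum the fixed structure size plus the name length for each inactive branch."""
    fixed = INACTIVE_BRANCH_BYTES[pointer_bits]
    return sum(fixed + n for n in name_lengths)


# 10,000 hypothetical branches, each with a 40-byte name.
total = inactive_branches_bytes([40] * 10_000, pointer_bits=64)
print(f"{total:,} bytes = {total / MIB:.2f} MiB")  # 1,600,000 bytes = 1.53 MiB
```

Since 2 MiB spread over 10,000 branches is about 210 bytes each,
the under-2 MiB figure corresponds to names averaging roughly 90
bytes or less on a 64 bit system (about 113 bytes on a 32 bit
system).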
Active branches have the same overhead as inactive branches, but
also contain copies of every tree that has been recently modified
on that branch. If subtree 'include' has not been modified since
the branch became active, its contents will not be loaded into
memory, but if subtree 'src' has been modified by a commit since
the branch became active, then its contents will be loaded into
memory.
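The lazy-loading rule can be modeled with a tree node that keeps
only the subtree's name until something beneath it is modified. In
this hypothetical Python sketch (LazyTree, load_entries, modify and
the in-memory REPO dict are illustrative stand-ins, not fast-import
internals), a commit touching 'src' loads that subtree, while
'include' stays an unexpanded reference.

```python
from __future__ import annotations

# Stand-in object store: tree name -> {entry name: blob name or tree name}.
REPO = {
    "root": {"include": "tree-include", "src": "tree-src"},
    "tree-include": {"git.h": "blob-1"},
    "tree-src": {"main.c": "blob-2"},
}


class LazyTree:
    """A tree whose entries are read from the store only when first needed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: dict[str, str | LazyTree] | None = None  # None = not loaded

    def load_entries(self) -> dict[str, str | LazyTree]:
        if self.entries is None:
            # Subtrees become unloaded LazyTree placeholders; blobs stay as names.
            self.entries = {
                k: LazyTree(v) if v in REPO else v
                for k, v in REPO[self.name].items()
            }
        return self.entries

    def modify(self, path: list[str], blob: str) -> None:
        """Apply a file change, loading only the trees along the path."""
        entries = self.load_entries()
        if len(path) == 1:
            entries[path[0]] = blob
            return
        child = entries[path[0]]
        assert isinstance(child, LazyTree), "path component is not a directory"
        child.modify(path[1:], blob)


root = LazyTree("root")
root.modify(["src", "main.c"], "blob-3")        # a commit touches src/main.c
print(root.entries["src"].entries is not None)  # True: 'src' was loaded
print(root.entries["include"].entries is None)  # True: 'include' was not
```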
As active branches store metadata about the files contained on
that branch, their in-memory footprint can grow considerably (see
below).
fast-import automatically moves active branches to inactive
status based on a simple least-recently-used algorithm. The LRU
chain is updated on each 'commit' command. The maximum number of
active branches can be increased or decreased on the command line
with --active-branches=<n>.
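The demotion policy can be modeled with an ordered dictionary. In
this hypothetical Python sketch, ActiveBranches, commit and
max_active are illustrative names, with max_active merely playing
the role of the --active-branches limit. Each commit moves its
branch to the most-recently-used end; when the limit is exceeded,
the least recently used branch is demoted and its loaded trees are
dropped.

```python
from __future__ import annotations

from collections import OrderedDict


class ActiveBranches:
    """Hypothetical model of an LRU that demotes branches to inactive status."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active                        # like --active-branches
        self.active: OrderedDict[str, dict] = OrderedDict()  # branch -> loaded trees
        self.inactive: set[str] = set()

    def commit(self, branch: str) -> None:
        """Record a commit on branch, making it the most recently used."""
        if branch in self.active:
            self.active.move_to_end(branch)
            return
        self.inactive.discard(branch)
        self.active[branch] = {}              # trees would be loaded lazily here
        if len(self.active) > self.max_active:
            victim, _trees = self.active.popitem(last=False)  # least recently used
            self.inactive.add(victim)         # its loaded trees are released


lru = ActiveBranches(max_active=2)
for b in ["main", "topic", "main", "release"]:
    lru.commit(b)
print(list(lru.active), sorted(lru.inactive))  # ['main', 'release'] ['topic']
```

Because 'main' was touched again before 'release' arrived, 'topic'
is the least recently used branch and is the one demoted.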
per active tree
Trees (aka directories) use just 12 bytes of memory on top of the
memory required for their entries (see 'per active file entry'
below). The cost of a tree is virtually zero, as its overhead
amortizes out over the individual file entries.
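A quick calculation shows why the per-tree overhead amortizes away.
This sketch combines the 12-byte tree cost with the per-entry sizes
given under 'per active file entry'; the function name and the
20-entry directory are hypothetical, and pooled name strings are
not counted.

```python
TREE_BYTES = 12                   # per tree, from the text
ENTRY_BYTES = {32: 52, 64: 64}    # per file or subtree entry, from the text


def active_tree_bytes(num_entries: int, pointer_bits: int = 64) -> int:
    """Memory for one loaded tree: fixed overhead plus its entries."""
    return TREE_BYTES + num_entries * ENTRY_BYTES[pointer_bits]


total = active_tree_bytes(20)
print(total, f"{TREE_BYTES / total:.1%}")  # 1292 0.9%
```

For a 20-entry directory on a 64 bit system, the tree itself
accounts for 12 of 1,292 bytes, under 1% of the total.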
per active file entry
Files (and pointers to subtrees) within active trees require 52
or 64 bytes (32/64 bit platforms) per entry. To conserve space,
file and tree names are pooled in a common string table, allowing
the filename 'Makefile' to use just 16 bytes (after including the
string header overhead) no matter how many times it occurs within
the project.
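Pooling names means each distinct name is stored once and every
entry holds only a reference to it. A minimal Python sketch of such
an interning table (the NamePool class is hypothetical, and
Python's own string objects have different overheads, so byte
counts are not modeled):

```python
from __future__ import annotations


class NamePool:
    """Hypothetical interning table: one shared copy per distinct name."""

    def __init__(self) -> None:
        self._pool: dict[str, str] = {}

    def intern(self, name: str) -> str:
        # Return the stored copy if present; otherwise store this one.
        return self._pool.setdefault(name, name)

    def __len__(self) -> int:
        return len(self._pool)


pool = NamePool()
paths = ["Makefile", "src/Makefile", "doc/Makefile", "src/main.c"]
entries = [pool.intern(p.rsplit("/", 1)[-1]) for p in paths]  # file names only
print(len(entries), len(pool))  # 4 2
```

All three 'Makefile' entries refer to the same stored string, so
the name costs its storage once no matter how often it appears.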
The active branch LRU, when coupled with the filename string pool
and lazy loading of subtrees, allows fast-import to efficiently
import projects with 2,000+ branches and 45,114+ files in a very
limited memory footprint (less than 2.7 MiB per active branch).