GNU project C and C++ compiler
Options in detail
Control Optimization - 2
-fipa-cp
Perform interprocedural constant propagation. This
optimization analyzes the program to determine when values
passed to functions are constants and then optimizes
accordingly. This optimization can substantially increase
performance if the application has constants passed to
functions. This flag is enabled by default at -O2, -Os and -O3. It is also enabled by -fprofile-use and -fauto-profile.
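As a hypothetical illustration (not taken from the GCC manual; the function names are invented), a call pattern that can benefit from -fipa-cp might look like this:
/* Every call site passes scale == 4, so interprocedural constant
   propagation may treat scale as the constant 4 inside apply_scale
   and fold the multiplication into a shift. */
static int apply_scale (int x, int scale)
{
  return x * scale;
}

int f (int a) { return apply_scale (a, 4); }
int g (int b) { return apply_scale (b, 4); }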
-fipa-cp-clone
Perform function cloning to make interprocedural constant
propagation stronger. When enabled, interprocedural constant
propagation performs function cloning when externally visible
function can be called with constant arguments. Because this
optimization can create multiple copies of functions, it may
significantly increase code size (see --param ipcp-unit-growth=value). This flag is enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.
-fipa-bit-cp
When enabled, perform interprocedural bitwise constant
propagation. This flag is enabled by default at -O2 and by -fprofile-use and -fauto-profile. It requires that -fipa-cp is enabled.
-fipa-vrp
When enabled, perform interprocedural propagation of value
ranges. This flag is enabled by default at -O2. It requires that -fipa-cp is enabled.
-fipa-icf
Perform Identical Code Folding for functions and read-only
variables. The optimization reduces code size and may disturb unwind stacks by replacing a function with an equivalent one that has a different name. The optimization works more
effectively with link-time optimization enabled.
Although the behavior is similar to the Gold Linker's ICF optimization, GCC ICF works on different levels and thus the optimizations are not the same: there are equivalences that are found only by GCC and equivalences found only by Gold.
This flag is enabled by default at -O2 and -Os.
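As a hypothetical illustration (not from the GCC manual; names invented), two functions that compile to identical bodies are candidates for folding:
/* The two functions below are structurally identical, so -fipa-icf
   may keep a single body and turn the other symbol into an alias or
   thunk that refers to it. */
int sum_a (const int *p, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += p[i];
  return s;
}

int sum_b (const int *p, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += p[i];
  return s;
}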
-flive-patching=level
Control GCC's optimizations to produce output suitable for
live-patching.
If the compiler's optimization uses a function's body or
information extracted from its body to optimize/change
another function, the latter is called an impacted function
of the former. If a function is patched, its impacted
functions should be patched too.
The impacted functions are determined by the compiler's
interprocedural optimizations. For example, a caller is
impacted when inlining a function into its caller, cloning a
function and changing its caller to call this new clone, or
extracting a function's pureness/constness information to
optimize its direct or indirect callers, etc.
Usually, the more IPA optimizations enabled, the larger the
number of impacted functions for each function. In order to
control the number of impacted functions and more easily
compute the list of impacted functions, IPA optimizations can
be partially enabled at two different levels.
The level argument should be one of the following:
inline-clone
Only enable inlining and cloning optimizations, which include inlining, cloning, interprocedural scalar replacement of aggregates and partial inlining. As a result, when patching a function, all its callers and its clones' callers are impacted and therefore need to be patched as well.
-flive-patching=inline-clone disables the following optimization flags: -fwhole-program -fipa-pta -fipa-reference -fipa-ra -fipa-icf -fipa-icf-functions -fipa-icf-variables -fipa-bit-cp -fipa-vrp -fipa-pure-const -fipa-reference-addressable -fipa-stack-alignment
inline-only-static
Only enable inlining of static functions. As a result,
when patching a static function, all its callers are
impacted and so need to be patched as well.
In addition to all the flags that -flive-patching=inline-clone disables, -flive-patching=inline-only-static disables the following additional optimization flags: -fipa-cp-clone -fipa-sra -fpartial-inlining -fipa-cp
When -flive-patching is specified without any value, the default value is inline-clone.
This flag is disabled by default.
Note that -flive-patching is not supported with link-time optimization (-flto).
-fisolate-erroneous-paths-dereference
Detect paths that trigger erroneous or undefined behavior due
to dereferencing a null pointer. Isolate those paths from
the main control flow and turn the statement with erroneous
or undefined behavior into a trap. This flag is enabled by
default at -O2 and higher and depends on -fdelete-null-pointer-checks also being enabled.
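A hypothetical sketch of such a path (the function is invented):
/* On the path where flag is nonzero, p is NULL and the load below is
   undefined; -fisolate-erroneous-paths-dereference may split that
   path off and replace the load on it with a trap. */
int load_or_trap (int *p, int flag)
{
  if (flag)
    p = 0;
  return *p;
}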
-fisolate-erroneous-paths-attribute
Detect paths that trigger erroneous or undefined behavior due
to a null value being used in a way forbidden by a
"returns_nonnull" or "nonnull" attribute. Isolate those
paths from the main control flow and turn the statement with
erroneous or undefined behavior into a trap. This is not
currently enabled, but may be enabled by -O2
in the future.
-ftree-sink
Perform forward store motion on trees. This flag is enabled
by default at -O
and higher.
-ftree-bit-ccp
Perform sparse conditional bit constant propagation on trees
and propagate pointer alignment information. This pass only
operates on local scalar variables and is enabled by default
at -O1 and higher, except for -Og. It requires that -ftree-ccp is enabled.
-ftree-ccp
Perform sparse conditional constant propagation (CCP) on
trees. This pass only operates on local scalar variables and
is enabled by default at -O
and higher.
-fssa-backprop
Propagate information about uses of a value up the definition
chain in order to simplify the definitions. For example,
this pass strips sign operations if the sign of a value never
matters. The flag is enabled by default at -O
and higher.
-fssa-phiopt
Perform pattern matching on SSA PHI nodes to optimize
conditional code. This pass is enabled by default at -O1 and higher, except for -Og.
-ftree-switch-conversion
Perform conversion of simple initializations in a switch to
initializations from a scalar array. This flag is enabled by
default at -O2
and higher.
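A hypothetical example of a switch this pass targets (names invented):
/* Each case only assigns a constant, so the switch may be converted
   into a load from a small constant lookup array indexed by d. */
int day_hours (int d)
{
  int h;
  switch (d)
    {
    case 0: h = 8; break;
    case 1: h = 8; break;
    case 2: h = 6; break;
    case 3: h = 8; break;
    case 4: h = 4; break;
    default: h = 0; break;
    }
  return h;
}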
-ftree-tail-merge
Look for identical code sequences. When found, replace one
with a jump to the other. This optimization is known as tail
merging or cross jumping. This flag is enabled by default at
-O2 and higher. The compilation time in this pass can be limited using the max-tail-merge-comparisons parameter and the max-tail-merge-iterations parameter.
-ftree-dce
Perform dead code elimination (DCE) on trees. This flag is
enabled by default at -O
and higher.
-ftree-builtin-call-dce
Perform conditional dead code elimination (DCE) for calls to
built-in functions that may set "errno" but are otherwise
free of side effects. This flag is enabled by default at -O2
and higher if -Os
is not also specified.
-ftree-dominator-opts
Perform a variety of simple scalar cleanups (constant/copy
propagation, redundancy elimination, range propagation and
expression simplification) based on a dominator tree
traversal. This also performs jump threading (to reduce
jumps to jumps). This flag is enabled by default at -O and higher.
-ftree-dse
Perform dead store elimination (DSE) on trees. A dead store
is a store into a memory location that is later overwritten
by another store without any intervening loads. In this case
the earlier store can be deleted. This flag is enabled by
default at -O
and higher.
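A minimal hypothetical example of a dead store:
/* The first store to *p is overwritten by the second one with no
   intervening load, so -ftree-dse may delete it. */
void set_twice (int *p)
{
  *p = 1;
  *p = 2;
}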
-ftree-ch
Perform loop header copying on trees. This is beneficial
since it increases effectiveness of code motion
optimizations. It also saves one jump. This flag is enabled
by default at -O and higher. It is not enabled for -Os, since it usually increases code size.
-ftree-loop-optimize
Perform loop optimizations on trees. This flag is enabled by
default at -O
and higher.
-ftree-loop-linear
-floop-strip-mine
-floop-block
Perform loop nest optimizations. Same as -floop-nest-optimize. To use this code transformation, GCC has to be configured with --with-isl to enable the Graphite loop transformation infrastructure.
-fgraphite-identity
Enable the identity transformation for graphite. For every
SCoP we generate the polyhedral representation and transform
it back to gimple. Using -fgraphite-identity
we can check
the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE
transformation. Some minimal optimizations are also
performed by the code generator isl, like index splitting and
dead code elimination in loops.
-floop-nest-optimize
Enable the isl based loop nest optimizer. This is a generic
loop nest optimizer based on the Pluto optimization
algorithms. It calculates a loop structure optimized for
data-locality and parallelism. This option is experimental.
-floop-parallelize-all
Use the Graphite data dependence analysis to identify loops
that can be parallelized. Parallelize all the loops that can
be analyzed to not contain loop carried dependences without
checking that it is profitable to parallelize the loops.
-ftree-coalesce-vars
While transforming the program out of the SSA representation,
attempt to reduce copying by coalescing versions of different
user-defined variables, instead of just compiler temporaries.
This may severely limit the ability to debug an optimized
program compiled with -fno-var-tracking-assignments. In the
negated form, this flag prevents SSA coalescing of user
variables. This option is enabled by default if optimization
is enabled, and it does very little otherwise.
-ftree-loop-if-convert
Attempt to transform conditional jumps in the innermost loops
to branch-less equivalents. The intent is to remove control-
flow from the innermost loops in order to improve the ability
of the vectorization pass to handle these loops. This is
enabled by default if vectorization is enabled.
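A hypothetical loop of the kind this transformation targets:
/* The conditional store may be if-converted into a branch-less
   select (roughly a[i] = a[i] < 0 ? 0 : a[i];), making the loop a
   better candidate for vectorization. */
void clamp_negatives (int *a, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] < 0)
      a[i] = 0;
}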
-ftree-loop-distribution
Perform loop distribution. This flag can improve cache
performance on big loop bodies and allow further loop
optimizations, like parallelization or vectorization, to take
place. For example, the loop
DO I = 1, N
   A(I) = B(I) + C
   D(I) = E(I) * F
ENDDO
is transformed to
DO I = 1, N
   A(I) = B(I) + C
ENDDO
DO I = 1, N
   D(I) = E(I) * F
ENDDO
This flag is enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.
-ftree-loop-distribute-patterns
Perform loop distribution of patterns that can be code
generated with calls to a library. This flag is enabled by
default at -O3, and by -fprofile-use and -fauto-profile.
This pass distributes the initialization loops and generates
a call to memset zero. For example, the loop
DO I = 1, N
   A(I) = 0
   B(I) = A(I) + I
ENDDO
is transformed to
DO I = 1, N
   A(I) = 0
ENDDO
DO I = 1, N
   B(I) = A(I) + I
ENDDO
and the initialization loop is transformed into a call to
memset zero. This flag is enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.
-floop-interchange
Perform loop interchange outside of graphite. This flag can
improve cache performance on loop nest and allow further loop
optimizations, like vectorization, to take place. For
example, the loop
for (int i = 0; i < N; i++)
  for (int j = 0; j < N; j++)
    for (int k = 0; k < N; k++)
      c[i][j] = c[i][j] + a[i][k]*b[k][j];
is transformed to
for (int i = 0; i < N; i++)
  for (int k = 0; k < N; k++)
    for (int j = 0; j < N; j++)
      c[i][j] = c[i][j] + a[i][k]*b[k][j];
This flag is enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.
-floop-unroll-and-jam
Apply unroll and jam transformations on feasible loops. In a
loop nest this unrolls the outer loop by some factor and
fuses the resulting multiple inner loops. This flag is
enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.
-ftree-loop-im
Perform loop invariant motion on trees. This pass moves only
invariants that are hard to handle at RTL level (function
calls, operations that expand to nontrivial sequences of
insns). With -funswitch-loops
it also moves operands of
conditions that are invariant out of the loop, so that we can
use just trivial invariantness analysis in loop unswitching.
The pass also includes store motion.
-ftree-loop-ivcanon
Create a canonical counter for number of iterations in loops
for which determining number of iterations requires
complicated analysis. Later optimizations then may determine
the number easily. Useful especially in connection with
unrolling.
-ftree-scev-cprop
Perform final value replacement. If a variable is modified
in a loop in such a way that its value when exiting the loop
can be determined using only its initial value and the number
of loop iterations, replace uses of the final value by such a
computation, provided it is sufficiently cheap. This reduces
data dependencies and may allow further simplifications.
Enabled by default at -O
and higher.
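A hypothetical sketch of final value replacement (function invented):
/* The loop only counts i up to n, so the value of i on loop exit is
   known in closed form; the use after the loop may be replaced by
   that value, and the now-empty loop can then be removed. */
int count_up (int n)
{
  int i = 0;
  while (i < n)
    i++;
  return i;   /* may effectively become: return n > 0 ? n : 0; */
}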
-fivopts
Perform induction variable optimizations (strength reduction,
induction variable merging and induction variable
elimination) on trees.
-ftree-parallelize-loops=n
Parallelize loops, i.e., split their iteration space to run
in n threads. This is only possible for loops whose
iterations are independent and can be arbitrarily reordered.
The optimization is only profitable on multiprocessor
machines, for loops that are CPU-intensive, rather than
constrained e.g. by memory bandwidth. This option implies
-pthread, and thus is only supported on targets that have support for -pthread.
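A hypothetical loop whose iterations are independent and could be split across threads by this option (for example with -ftree-parallelize-loops=4):
/* Each iteration writes a distinct element of c and reads only a[i]
   and b[i], so the iterations can be reordered or run in parallel. */
void vector_add (double *restrict c, const double *restrict a,
                 const double *restrict b, long n)
{
  for (long i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}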
-ftree-pta
Perform function-local points-to analysis on trees. This
flag is enabled by default at -O1 and higher, except for -Og.
-ftree-sra
Perform scalar replacement of aggregates. This pass replaces
structure references with scalars to prevent committing
structures to memory too early. This flag is enabled by
default at -O1 and higher, except for -Og.
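A hypothetical example of an aggregate this pass can scalarize:
/* The local struct never has its address taken, so -ftree-sra may
   replace p.x and p.y with two independent scalar variables and never
   materialize the struct in memory. */
struct point { int x, y; };

int manhattan (int a, int b)
{
  struct point p;
  p.x = a < 0 ? -a : a;
  p.y = b < 0 ? -b : b;
  return p.x + p.y;
}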
-fstore-merging
Perform merging of narrow stores to consecutive memory
addresses. This pass merges contiguous stores of immediate
values narrower than a word into fewer wider stores to reduce
the number of instructions. This is enabled by default at
-O2 and higher as well as -Os.
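A hypothetical example of stores this pass can merge:
/* Four adjacent byte stores of constants may be combined into a
   single wider store on targets that allow it. */
struct rgba { unsigned char r, g, b, a; };

void clear_pixel (struct rgba *p)
{
  p->r = 0;
  p->g = 0;
  p->b = 0;
  p->a = 255;
}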
-ftree-ter
Perform temporary expression replacement during the
SSA->normal phase. Single use/single def temporaries are
replaced at their use location with their defining
expression. This results in non-GIMPLE code, but gives the
expanders much more complex trees to work on resulting in
better RTL generation. This is enabled by default at -O and higher.
-ftree-slsr
Perform straight-line strength reduction on trees. This
recognizes related expressions involving multiplications and
replaces them by less expensive calculations when possible.
This is enabled by default at -O
and higher.
-ftree-vectorize
Perform vectorization on trees. This flag enables -ftree-loop-vectorize and -ftree-slp-vectorize if not explicitly specified.
-ftree-loop-vectorize
Perform loop vectorization on trees. This flag is enabled by
default at -O3 and by -ftree-vectorize, -fprofile-use, and -fauto-profile.
-ftree-slp-vectorize
Perform basic block vectorization on trees. This flag is
enabled by default at -O3 and by -ftree-vectorize, -fprofile-use, and -fauto-profile.
-fvect-cost-model=model
Alter the cost model used for vectorization. The model
argument should be one of unlimited, dynamic or cheap. With the unlimited model the vectorized code-path is assumed to be profitable, while with the dynamic model a runtime check guards the vectorized code-path to enable it only for iteration counts that will likely execute faster than when executing the original scalar loop. The cheap model disables vectorization of loops where doing so would be cost prohibitive, for example due to required runtime checks for data dependence or alignment, but otherwise is equal to the dynamic model. The default cost model depends on other optimization flags and is either dynamic or cheap.
-fsimd-cost-model=model
Alter the cost model used for vectorization of loops marked
with the OpenMP simd directive. The model argument should be one of unlimited, dynamic or cheap. All values of model have the same meaning as described in -fvect-cost-model, and by default a cost model defined with -fvect-cost-model is used.
-ftree-vrp
Perform Value Range Propagation on trees. This is similar to
the constant propagation pass, but instead of values, ranges
of values are propagated. This allows the optimizers to
remove unnecessary range checks like array bound checks and
null pointer checks. This is enabled by default at -O2 and higher. Null pointer check elimination is only done if -fdelete-null-pointer-checks is enabled.
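A hypothetical example of a check VRP can remove (function invented):
/* After the first test, i is known to lie in [0, 15], so the second
   range check is always false and may be deleted by -ftree-vrp. */
extern int table[16];

int lookup (int i)
{
  if (i < 0 || i > 15)
    return -1;
  if (i >= 16)
    return -1;
  return table[i];
}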
-fsplit-paths
Split paths leading to loop backedges. This can improve dead
code elimination and common subexpression elimination. This
is enabled by default at -O3
and above.
-fsplit-ivs-in-unroller
Enables expression of values of induction variables in later
iterations of the unrolled loop using the value in the first
iteration. This breaks long dependency chains, thus
improving efficiency of the scheduling passes.
A combination of -fweb
and CSE is often sufficient to obtain
the same effect. However, that is not reliable in cases
where the loop body is more complicated than a single basic
block. It also does not work at all on some architectures
due to restrictions in the CSE pass.
This optimization is enabled by default.
-fvariable-expansion-in-unroller
With this option, the compiler creates multiple copies of
some local variables when unrolling a loop, which can result
in superior code.
-fpartial-inlining
Inline parts of functions. This option has an effect only when inlining itself is turned on by the -finline-functions or -finline-small-functions options.
Enabled at levels -O2, -O3, -Os.
-fpredictive-commoning
Perform predictive commoning optimization, i.e., reusing
computations (especially memory loads and stores) performed
in previous iterations of loops.
This option is enabled at level -O3. It is also enabled by -fprofile-use and -fauto-profile.
-fprefetch-loop-arrays
If supported by the target machine, generate instructions to
prefetch memory to improve the performance of loops that
access large arrays.
This option may generate better or worse code; results are
highly dependent on the structure of loops within the source
code.
Disabled at level -Os.
-fno-printf-return-value
Do not substitute constants for known return value of
formatted output functions such as "sprintf", "snprintf",
"vsprintf", and "vsnprintf" (but not "printf" of "fprintf").
This transformation allows GCC to optimize or even eliminate
branches based on the known return value of these functions
called with arguments that are either constant, or whose
values are known to be in a range that makes determining the
exact return value possible. For example, when -fprintf-return-value is in effect, both the branch and the body of the "if" statement (but not the call to "snprintf") can be optimized away when "i" is a 32-bit or smaller integer, because the return value is guaranteed to be at most 8.
char buf[9];
if (snprintf (buf, sizeof buf, "%08x", i) >= sizeof buf)
  ...
The -fprintf-return-value option relies on other optimizations and yields best results with -O2 and above. It works in tandem with the -Wformat-overflow and -Wformat-truncation options. The -fprintf-return-value option is enabled by default.
-fno-peephole
-fno-peephole2
Disable any machine-specific peephole optimizations. The
difference between -fno-peephole and -fno-peephole2 is in how they are implemented in the compiler; some targets use one, some use the other, a few use both.
-fpeephole is enabled by default. -fpeephole2 is enabled at levels -O2, -O3, -Os.
-fno-guess-branch-probability
Do not guess branch probabilities using heuristics.
GCC uses heuristics to guess branch probabilities if they are
not provided by profiling feedback (-fprofile-arcs). These
heuristics are based on the control flow graph. If some
branch probabilities are specified by "__builtin_expect",
then the heuristics are used to guess branch probabilities
for the rest of the control flow graph, taking the
"__builtin_expect" info into account. The interactions
between the heuristics and "__builtin_expect" can be complex,
and in some cases, it may be useful to disable the heuristics
so that the effects of "__builtin_expect" are easier to
understand.
It is also possible to specify expected probability of the
expression with "__builtin_expect_with_probability" built-in
function.
The default is -fguess-branch-probability at levels -O, -O2, -O3, -Os.
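A hypothetical sketch of the two built-ins mentioned above:
/* The first hint marks the error branch as unlikely; the second
   states an explicit expected probability (here, x > 0 is expected
   to be true about 90% of the time). */
int process (int x)
{
  if (__builtin_expect (x < 0, 0))
    return -1;
  if (__builtin_expect_with_probability (x > 0, 1, 0.90))
    return x * 2;
  return 0;
}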
-freorder-blocks
Reorder basic blocks in the compiled function in order to reduce the number of taken branches and improve code locality.
Enabled at levels -O, -O2, -O3, -Os.
-freorder-blocks-algorithm=algorithm
Use the specified algorithm for basic block reordering. The
algorithm argument can be simple, which does not increase code size (except sometimes due to secondary effects like alignment), or stc, the "software trace cache" algorithm, which tries to put all often executed code together, minimizing the number of branches executed by making extra copies of code.
The default is simple at levels -O, -Os, and stc at levels -O2, -O3.
-freorder-blocks-and-partition
In addition to reordering basic blocks in the compiled function in order to reduce the number of taken branches, this option partitions hot and cold basic blocks into separate sections of the assembly and .o files, to improve paging and cache locality performance.
This optimization is automatically turned off in the presence
of exception handling or unwind tables (on targets using
setjump/longjump or target specific scheme), for linkonce
sections, for functions with a user-defined section attribute
and on any architecture that does not support named sections.
When -fsplit-stack
is used this option is not enabled by
default (to avoid linker errors), but may be enabled
explicitly (if using a working linker).
Enabled for x86 at levels -O2, -O3, -Os.
-freorder-functions
Reorder functions in the object file in order to improve code
locality. This is implemented by using special subsections
".text.hot" for most frequently executed functions and
".text.unlikely" for unlikely executed functions. Reordering
is done by the linker so object file format must support
named sections and linker must place them in a reasonable
way.
This option isn't effective unless you either provide profile
feedback (see -fprofile-arcs for details) or manually
annotate functions with "hot" or "cold" attributes.
Enabled at levels -O2, -O3, -Os.
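A hypothetical sketch of the manual annotation mentioned above:
/* "hot" asks GCC to place the function with other frequently executed
   code; "cold" marks a rarely executed function, which is typically
   placed in ".text.unlikely". */
__attribute__ ((hot)) int fast_path (int x)
{
  return x + 1;
}

__attribute__ ((cold)) void report_failure (const char *msg);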
-fstrict-aliasing
Allow the compiler to assume the strictest aliasing rules
applicable to the language being compiled. For C (and C++),
this activates optimizations based on the type of
expressions. In particular, an object of one type is assumed
never to reside at the same address as an object of a
different type, unless the types are almost the same. For
example, an "unsigned int" can alias an "int", but not a
"void*" or a "double". A character type may alias any other
type.
Pay special attention to code like this:
union a_union {
  int i;
  double d;
};

int f() {
  union a_union t;
  t.d = 3.0;
  return t.i;
}
The practice of reading from a different union member than
the one most recently written to (called "type-punning") is
common. Even with -fstrict-aliasing, type-punning is
allowed, provided the memory is accessed through the union
type. So, the code above works as expected. However, this
code might not:
int f() {
  union a_union t;
  int* ip;
  t.d = 3.0;
  ip = &t.i;
  return *ip;
}
Similarly, access by taking the address, casting the
resulting pointer and dereferencing the result has undefined
behavior, even if the cast uses a union type, e.g.:
int f() {
  double d = 3.0;
  return ((union a_union *) &d)->i;
}
The -fstrict-aliasing option is enabled at levels -O2, -O3, -Os.
-falign-functions
-falign-functions=n
-falign-functions=n:m
-falign-functions=n:m:n2
-falign-functions=n:m:n2:m2
Align the start of functions to the next power-of-two greater
than n, skipping up to m-1 bytes. This ensures that at least
the first m bytes of the function can be fetched by the CPU
without crossing an n-byte alignment boundary.
If m is not specified, it defaults to n.
Examples: -falign-functions=32
aligns functions to the next
32-byte boundary, -falign-functions=24
aligns to the next
32-byte boundary only if this can be done by skipping 23
bytes or less, -falign-functions=32:7
aligns to the next
32-byte boundary only if this can be done by skipping 6 bytes
or less.
The second pair of n2:m2 values allows you to specify a
secondary alignment: -falign-functions=64:7:32:3
aligns to
the next 64-byte boundary if this can be done by skipping 6
bytes or less, otherwise aligns to the next 32-byte boundary
if this can be done by skipping 2 bytes or less. If m2 is
not specified, it defaults to n2.
Some assemblers only support this flag when n is a power of
two; in that case, it is rounded up.
-fno-align-functions and -falign-functions=1 are equivalent and mean that functions are not aligned.
If n is not specified or is zero, use a machine-dependent
default. The maximum allowed n option value is 65536.
Enabled at levels -O2, -O3.
-flimit-function-alignment
If this option is enabled, the compiler tries to avoid
unnecessarily overaligning functions. It attempts to instruct
the assembler to align by the amount specified by
-falign-functions, but not to skip more bytes than the size
of the function.
-falign-labels
-falign-labels=n
-falign-labels=n:m
-falign-labels=n:m:n2
-falign-labels=n:m:n2:m2
Align all branch targets to a power-of-two boundary.
Parameters of this option are analogous to the
-falign-functions option. -fno-align-labels and -falign-labels=1 are equivalent and mean that labels are not aligned.
If -falign-loops or -falign-jumps are applicable and are greater than this value, then their values are used instead.
If n is not specified or is zero, use a machine-dependent
default which is very likely to be 1, meaning no alignment.
The maximum allowed n option value is 65536.
Enabled at levels -O2, -O3.
-falign-loops
-falign-loops=n
-falign-loops=n:m
-falign-loops=n:m:n2
-falign-loops=n:m:n2:m2
Align loops to a power-of-two boundary. If the loops are
executed many times, this makes up for any execution of the
dummy padding instructions.
Parameters of this option are analogous to the
-falign-functions option. -fno-align-loops and -falign-loops=1 are equivalent and mean that loops are not aligned. The maximum allowed n option value is 65536.
If n is not specified or is zero, use a machine-dependent
default.
Enabled at levels -O2, -O3.
-falign-jumps
-falign-jumps=n
-falign-jumps=n:m
-falign-jumps=n:m:n2
-falign-jumps=n:m:n2:m2
Align branch targets to a power-of-two boundary, for branch
targets where the targets can only be reached by jumping. In
this case, no dummy operations need be executed.
Parameters of this option are analogous to the
-falign-functions option. -fno-align-jumps and -falign-jumps=1 are equivalent and mean that branch targets are not aligned.
If n is not specified or is zero, use a machine-dependent
default. The maximum allowed n option value is 65536.
Enabled at levels -O2, -O3.
-funit-at-a-time
This option is left for compatibility reasons.
-funit-at-a-time has no effect, while -fno-unit-at-a-time implies -fno-toplevel-reorder and -fno-section-anchors.
Enabled by default.
-fno-toplevel-reorder
Do not reorder top-level functions, variables, and "asm"
statements. Output them in the same order that they appear
in the input file. When this option is used, unreferenced
static variables are not removed. This option is intended to
support existing code that relies on a particular ordering.
For new code, it is better to use attributes when possible.
-ftoplevel-reorder is the default at -O1 and higher, and also at -O0 if -fsection-anchors is explicitly requested. Additionally -fno-toplevel-reorder implies -fno-section-anchors.
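As a hypothetical sketch of the attribute-based alternative, the "used" attribute keeps one specific unreferenced static object without affecting the ordering of the whole file (the variable name is invented):
/* "used" forces the otherwise unreferenced static variable to be
   emitted, without resorting to -fno-toplevel-reorder. */
static const char build_tag[] __attribute__ ((used)) = "build-1234";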
-fweb
Constructs webs as commonly used for register allocation purposes and assigns each web an individual pseudo register.
This allows the register allocation pass to operate on
pseudos directly, but also strengthens several other
optimization passes, such as CSE, loop optimizer and trivial
dead code remover. It can, however, make debugging
impossible, since variables no longer stay in a "home
register".
Enabled by default with -funroll-loops.
-fwhole-program
Assume that the current compilation unit represents the whole
program being compiled. All public functions and variables
with the exception of "main" and those merged by attribute
"externally_visible" become static functions and in effect
are optimized more aggressively by interprocedural
optimizers.
This option should not be used in combination with -flto. Instead, relying on a linker plugin should provide safer and more precise information.
-flto[=n]
This option runs the standard link-time optimizer. When
invoked with source code, it generates GIMPLE (one of GCC's
internal representations) and writes it to special ELF
sections in the object file. When the object files are
linked together, all the function bodies are read from these
ELF sections and instantiated as if they had been part of the
same translation unit.
To use the link-time optimizer, -flto
and optimization
options should be specified at compile time and during the
final link. It is recommended that you compile all the files
participating in the same link with the same options and also
specify those options at link time. For example:
gcc -c -O2 -flto foo.c
gcc -c -O2 -flto bar.c
gcc -o myprog -flto -O2 foo.o bar.o
The first two invocations to GCC save a bytecode
representation of GIMPLE into special ELF sections inside
foo.o and bar.o. The final invocation reads the GIMPLE
bytecode from foo.o and bar.o, merges the two files into a
single internal image, and compiles the result as usual.
Since both foo.o and bar.o are merged into a single image,
this causes all the interprocedural analyses and
optimizations in GCC to work across the two files as if they
were a single one. This means, for example, that the inliner
is able to inline functions in bar.o into functions in foo.o
and vice-versa.
Another (simpler) way to enable link-time optimization is:
gcc -o myprog -flto -O2 foo.c bar.c
The above generates bytecode for foo.c and bar.c, merges them
together into a single GIMPLE representation and optimizes
them as usual to produce myprog.
The important thing to keep in mind is that to enable link-
time optimizations you need to use the GCC driver to perform
the link step. GCC automatically performs link-time
optimization if any of the objects involved were compiled
with the -flto
command-line option. You can always override
the automatic decision to do link-time optimization by
passing -fno-lto
to the link command.
To make whole program optimization effective, it is necessary
to make certain whole program assumptions. The compiler
needs to know what functions and variables can be accessed by
libraries and runtime outside of the link-time optimized
unit. When supported by the linker, the linker plugin (see
-fuse-linker-plugin) passes information to the compiler about
used and externally visible symbols. When the linker plugin
is not available, -fwhole-program
should be used to allow the
compiler to make these assumptions, which leads to more
aggressive optimization decisions.
When a file is compiled with -flto without -fuse-linker-plugin, the generated object file is larger than a regular object file because it contains GIMPLE bytecodes and the usual final code (see -ffat-lto-objects). This means
that object files with LTO information can be linked as
normal object files; if -fno-lto
is passed to the linker, no
interprocedural optimizations are applied. Note that when
-fno-fat-lto-objects
is enabled the compile stage is faster
but you cannot perform a regular, non-LTO link on them.
When producing the final binary, GCC only applies link-time
optimizations to those files that contain bytecode.
Therefore, you can mix and match object files and libraries
with GIMPLE bytecodes and final object code. GCC
automatically selects which files to optimize in LTO mode and
which files to link without further processing.
Generally, options specified at link time override those
specified at compile time, although in some cases GCC
attempts to infer link-time options from the settings used to
compile the input files.
If you do not specify an optimization level option -O at link time, then GCC uses the highest optimization level used when
compiling the object files. Note that it is generally
ineffective to specify an optimization level option only at
link time and not at compile time, for two reasons. First,
compiling without optimization suppresses compiler passes
that gather information needed for effective optimization at
link time. Second, some early optimization passes can be
performed only at compile time and not at link time.
There are some code generation flags preserved by GCC when
generating bytecodes, as they need to be used during the
final link. Currently, the following options and their
settings are taken from the first object file that explicitly
specifies them: -fPIC, -fpic, -fpie, -fcommon, -fexceptions, -fnon-call-exceptions, -fgnu-tm and all the -m target flags.
Certain ABI-changing flags are required to match in all
compilation units, and trying to override this at link time
with a conflicting value is ignored. This includes options
such as -freg-struct-return and -fpcc-struct-return.
Other options such as -ffp-contract, -fno-strict-overflow, -fwrapv, -fno-trapv or -fno-strict-aliasing are passed through to the link stage and merged conservatively for conflicting translation units. Specifically -fno-strict-overflow, -fwrapv and -fno-trapv take precedence; and for example -ffp-contract=off takes precedence over -ffp-contract=fast. You can override them at link time.
When you need to pass options to the assembler via -Wa or -Xassembler make sure to either compile such translation units with -fno-lto or consistently use the same assembler
options on all translation units. You can alternatively also
specify assembler options at LTO link time.
If LTO encounters objects with C linkage declared with
incompatible types in separate translation units to be linked
together (undefined behavior according to ISO C99 6.2.7), a
non-fatal diagnostic may be issued. The behavior is still
undefined at run time. Similar diagnostics may be raised for
other languages.
Another feature of LTO is that it is possible to apply
interprocedural optimizations on files written in different
languages:
gcc -c -flto foo.c
g++ -c -flto bar.cc
gfortran -c -flto baz.f90
g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran
Notice that the final link is done with g++
to get the C++
runtime libraries and -lgfortran
is added to get the Fortran
runtime libraries. In general, when mixing languages in LTO
mode, you should use the same link command options as when
mixing languages in a regular (non-LTO) compilation.
If object files containing GIMPLE bytecode are stored in a
library archive, say libfoo.a, it is possible to extract and
use them in an LTO link if you are using a linker with plugin
support. To create static libraries suitable for LTO, use gcc-ar and gcc-ranlib instead of ar and ranlib; to show the symbols of object files with GIMPLE bytecode, use gcc-nm. Those commands require that ar, ranlib and nm have been compiled with plugin support. At link time, use the flag -fuse-linker-plugin to ensure that the library participates in the LTO optimization process:
gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo
With the linker plugin enabled, the linker extracts the
needed GIMPLE files from libfoo.a and passes them on to the
running GCC to make them part of the aggregated GIMPLE image
to be optimized.
If you are not using a linker with plugin support and/or do
not enable the linker plugin, then the objects inside
libfoo.a are extracted and linked as usual, but they do not
participate in the LTO optimization process. In order to
make a static library suitable for both LTO optimization and
usual linkage, compile its object files with -flto -ffat-lto-objects.
Link-time optimizations do not require the presence of the
whole program to operate. If the program does not require
any symbols to be exported, it is possible to combine -flto
and -fwhole-program to allow the interprocedural optimizers to use more aggressive assumptions which may lead to improved optimization opportunities. Use of -fwhole-program is not needed when the linker plugin is active (see -fuse-linker-plugin).
The current implementation of LTO makes no attempt to
generate bytecode that is portable between different types of
hosts. The bytecode files are versioned and there is a
strict version check, so bytecode files generated in one
version of GCC do not work with an older or newer version of
GCC.
Link-time optimization does not work well with generation of
debugging information on systems other than those using a
combination of ELF and DWARF.
If you specify the optional n, the optimization and code
generation done at link time is executed in parallel using n
parallel jobs by utilizing an installed make program. The environment variable MAKE may be used to override the program
used. The default value for n is 1.
You can also specify -flto=jobserver to use GNU make's job server mode to determine the number of parallel jobs. This is useful when the Makefile calling GCC is already executing in parallel. You must prepend a + to the command recipe in the parent Makefile for this to work. This option likely only works if MAKE is GNU make.
-flto-partition=alg
Specify the partitioning algorithm used by the link-time
optimizer. The value is either 1to1 to specify a partitioning mirroring the original source files, or balanced to specify partitioning into equally sized chunks (whenever possible), or max to create a new partition for every symbol where possible. Specifying none as an algorithm disables partitioning and streaming completely. The default value is balanced. While 1to1 can be used as a workaround for various code ordering issues, the max partitioning is intended for internal testing only. The value one specifies that exactly one partition should be used while the value none bypasses partitioning and executes the link-time optimization step directly from the WPA phase.