GNU project C and C++ compiler
Options in detail
Control Optimization - 3
-flto-odr-type-merging
Enable streaming of mangled type names of C++ types and
their unification at link time. This increases the size of
LTO object files, but enables diagnostics about One
Definition Rule violations.
-flto-compression-level=n
This option specifies the level of compression used for the
intermediate language written to LTO object files, and is
only meaningful in conjunction with LTO mode (-flto). Valid
values are 0 (no compression) to 9 (maximum compression).
Values outside this range are clamped to either 0 or 9. If
the option is not given, a default balanced compression
setting is used.
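For example (the file name here is only illustrative), maximum
compression of the LTO bytecode can be requested as follows:
gcc -flto -flto-compression-level=9 -c foo.c
The compressed intermediate language is written into foo.o as
usual.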
-fuse-linker-plugin
Enables the use of a linker plugin during link-time
optimization. This option relies on plugin support in the
linker, which is available in gold or in GNU ld 2.21 or
newer.
This option enables the extraction of object files with
GIMPLE bytecode out of library archives. This improves the
quality of optimization by exposing more code to the link-
time optimizer. This information specifies what symbols can
be accessed externally (by non-LTO object or during dynamic
linking). Resulting code quality improvements on binaries
(and shared libraries that use hidden visibility) are similar
to -fwhole-program. See -flto for a description of the
effect of this flag and how to use it.
This option is enabled by default when LTO support in GCC is
enabled and GCC was configured for use with a linker
supporting plugins (GNU ld 2.21 or newer or gold).
-ffat-lto-objects
Fat LTO objects are object files that contain both the
intermediate language and the object code. This makes them
usable for both LTO linking and normal linking. This option
is effective only when compiling with -flto and is ignored
at link time.
-fno-fat-lto-objects improves compilation time over plain
LTO, but requires the complete toolchain to be aware of LTO.
It requires a linker with linker plugin support for basic
functionality. Additionally, nm, ar and ranlib need to
support linker plugins to allow a full-featured build
environment (capable of building static libraries etc.).
GCC provides the gcc-ar, gcc-nm and gcc-ranlib wrappers to
pass the right options to these tools. With non-fat LTO,
makefiles need to be modified to use them.
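For example, an existing makefile-driven build can often be
pointed at the wrappers without editing the makefile itself,
assuming it honors the conventional AR/NM/RANLIB variables (an
assumption about the particular build system):
make CC=gcc AR=gcc-ar NM=gcc-nm RANLIB=gcc-ranlib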
Note that modern binutils provide a plugin auto-load
mechanism. Installing the linker plugin into
$libdir/bfd-plugins has the same effect as using the command
wrappers (gcc-ar, gcc-nm and gcc-ranlib).
The default is -fno-fat-lto-objects on targets with linker
plugin support.
-fcompare-elim
After register allocation and post-register allocation
instruction splitting, identify arithmetic instructions that
compute processor flags similar to a comparison operation
based on that arithmetic. If possible, eliminate the
explicit comparison operation.
This pass only applies to certain targets that cannot
explicitly represent the comparison operation before register
allocation is complete.
Enabled at levels -O, -O2, -O3, -Os.
-fcprop-registers
After register allocation and post-register allocation
instruction splitting, perform a copy-propagation pass to try
to reduce scheduling dependencies and occasionally eliminate
the copy.
Enabled at levels -O, -O2, -O3, -Os.
-fprofile-correction
Profiles collected using an instrumented binary for multi-
threaded programs may be inconsistent due to missed counter
updates. When this option is specified, GCC uses heuristics
to correct or smooth out such inconsistencies. By default,
GCC emits an error message when an inconsistent profile is
detected.
This option is enabled by -fauto-profile.
-fprofile-use
-fprofile-use=path
Enable profile feedback-directed optimizations, and the
following optimizations, many of which are generally
profitable only with profile feedback available:
-fbranch-probabilities -fprofile-values -funroll-loops
-fpeel-loops -ftracer -fvpt -finline-functions -fipa-cp
-fipa-cp-clone -fipa-bit-cp -fpredictive-commoning
-fsplit-loops -funswitch-loops -fgcse-after-reload
-ftree-loop-vectorize -ftree-slp-vectorize
-fvect-cost-model=dynamic -ftree-loop-distribute-patterns
-fprofile-reorder-functions
Before you can use this option, you must first generate
profiling information.
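A minimal sketch of the typical two-step build (file names and
workload are illustrative):
gcc -O2 -fprofile-generate foo.c -o foo
./foo < representative-input    # run a representative workload; writes foo.gcda
gcc -O2 -fprofile-use foo.c -o foo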
By default, GCC emits an error message if the feedback
profiles do not match the source code. This error can be
turned into a warning by using -Wno-error=coverage-mismatch.
Note this may result in poorly optimized code. Additionally,
by default, GCC also emits a warning message if the feedback
profiles do not exist (see -Wmissing-profile).
If path is specified, GCC looks at the path to find the
profile feedback data files. See -fprofile-dir.
-fauto-profile
-fauto-profile=path
Enable sampling-based feedback-directed optimizations, and
the following optimizations, many of which are generally
profitable only with profile feedback available:
-fbranch-probabilities -fprofile-values -funroll-loops
-fpeel-loops -ftracer -fvpt -finline-functions -fipa-cp
-fipa-cp-clone -fipa-bit-cp -fpredictive-commoning
-fsplit-loops -funswitch-loops -fgcse-after-reload
-ftree-loop-vectorize -ftree-slp-vectorize
-fvect-cost-model=dynamic -ftree-loop-distribute-patterns
-fprofile-correction
path is the name of a file containing AutoFDO profile
information. If omitted, it defaults to fbdata.afdo in the
current directory.
Producing an AutoFDO profile data file requires running your
program with the perf utility on a supported GNU/Linux
target system. For more information, see
<https://perf.wiki.kernel.org/>.
E.g.
perf record -e br_inst_retired:near_taken -b -o perf.data \
-- your_program
Then use the create_gcov tool to convert the raw profile
data to a format that can be used by GCC. You must also
supply the unstripped binary for your program to this tool.
See <https://github.com/google/autofdo>.
E.g.
create_gcov --binary=your_program.unstripped --profile=perf.data \
--gcov=profile.afdo
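The compiler can then consume that file directly, for example
(continuing the example above; your_program.c is an
illustrative source file name):
gcc -O2 -fauto-profile=profile.afdo your_program.c -o your_program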
The following options control compiler behavior regarding
floating-point arithmetic. These options trade off between speed
and correctness. All must be specifically enabled.
-ffloat-store
Do not store floating-point variables in registers, and
inhibit other options that might change whether a floating-
point value is taken from a register or memory.
This option prevents undesirable excess precision on machines
such as the 68000 where the floating registers (of the 68881)
keep more precision than a "double" is supposed to have.
Similarly for the x86 architecture. For most programs, the
excess precision does only good, but a few programs rely on
the precise definition of IEEE floating point. Use
-ffloat-store for such programs, after modifying them to
store all pertinent intermediate computations into
variables.
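As a hedged illustration of the kind of code that is sensitive
to excess precision (the exact outcome depends on the target
FPU and optimization level; on x87 hardware the comparison
below can fail unless -ffloat-store is used or the intermediate
is explicitly stored first):
#include <stdio.h>

int main (void)
{
  volatile double num = 1.0, den = 3.0;
  double q = num / den;   /* stored into q: rounded to 64-bit double */
  /* Without -ffloat-store, the fresh quotient below may be kept in
     an 80-bit x87 register, so it need not compare equal to the
     rounded value already stored in q.  */
  if (q != num / den)
    printf ("excess precision observed\n");
  else
    printf ("no excess precision observed\n");
  return 0;
}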
-fexcess-precision=style
This option allows further control over excess precision on
machines where floating-point operations occur in a format
with more precision or range than the IEEE standard and
interchange floating-point types. By default,
-fexcess-precision=fast is in effect; this means that
operations may be carried out in a wider precision than the
types specified in the source if that would result in faster
code, and it is unpredictable when rounding to the types
specified in the source code takes place. When compiling C,
if -fexcess-precision=standard is specified then excess
precision follows the rules specified in ISO C99; in
particular, both casts and assignments cause values to be
rounded to their semantic types (whereas -ffloat-store only
affects assignments). This option is enabled by default for
C if a strict conformance option such as -std=c99 is used.
-ffast-math enables -fexcess-precision=fast by default
regardless of whether a strict conformance option is used.
-fexcess-precision=standard is not implemented for languages
other than C. On the x86, it has no effect if -mfpmath=sse
or -mfpmath=sse+387 is specified; in the former case, IEEE
semantics apply without excess precision, and in the latter,
rounding is unpredictable.
-ffast-math
Sets the options -fno-math-errno, -funsafe-math-optimizations,
-ffinite-math-only, -fno-rounding-math, -fno-signaling-nans,
-fcx-limited-range and -fexcess-precision=fast.
This option causes the preprocessor macro "__FAST_MATH__" to
be defined.
This option is not turned on by any -O option besides -Ofast
since it can result in incorrect output for programs that
depend on an exact implementation of IEEE or ISO
rules/specifications for math functions. It may, however,
yield faster code for programs that do not require the
guarantees of these specifications.
-fno-math-errno
Do not set "errno" after calling math functions that are
executed with a single instruction, e.g., "sqrt". A program
that relies on IEEE exceptions for math error handling may
want to use this flag for speed while maintaining IEEE
arithmetic compatibility.
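For instance, error checking of the following style relies on
the default -fmath-errno behavior (on a typical glibc system)
and stops reporting the domain error when -fno-math-errno is in
effect; a minimal sketch:
#include <errno.h>
#include <math.h>
#include <stdio.h>

int main (void)
{
  volatile double x = -1.0;   /* volatile forces a run-time call */
  errno = 0;
  double r = sqrt (x);        /* domain error: result is NaN */
  /* With -fmath-errno (the default) errno is set to EDOM here; with
     -fno-math-errno the call may be expanded to a bare square-root
     instruction and errno is left untouched.  */
  if (errno == EDOM)
    printf ("domain error reported via errno\n");
  else
    printf ("errno not set, r = %f\n", r);
  return 0;
}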
This option is not turned on by any -O option since it can
result in incorrect output for programs that depend on an
exact implementation of IEEE or ISO rules/specifications for
math functions. It may, however, yield faster code for
programs that do not require the guarantees of these
specifications.
The default is -fmath-errno.
On Darwin systems, the math library never sets "errno".
There is therefore no reason for the compiler to consider the
possibility that it might, and -fno-math-errno is the
default.
-funsafe-math-optimizations
Allow optimizations for floating-point arithmetic that (a)
assume that arguments and results are valid and (b) may
violate IEEE or ANSI standards. When used at link time, it
may include libraries or startup files that change the
default FPU control word or other similar optimizations.
This option is not turned on by any -O option since it can
result in incorrect output for programs that depend on an
exact implementation of IEEE or ISO rules/specifications for
math functions. It may, however, yield faster code for
programs that do not require the guarantees of these
specifications. Enables -fno-signed-zeros,
-fno-trapping-math, -fassociative-math and
-freciprocal-math.
The default is -fno-unsafe-math-optimizations.
-fassociative-math
Allow re-association of operands in series of floating-point
operations. This violates the ISO C and C++ language
standard by possibly changing the computation result. NOTE:
re-ordering may change the sign of zero as well as ignore
NaNs and inhibit or create underflow or overflow (and thus
cannot be used on code that relies on rounding behavior like
"(x + 2**52) - 2**52"). It may also reorder floating-point
comparisons and thus may not be used when ordered
comparisons are required. This option requires that both
-fno-signed-zeros and -fno-trapping-math be in effect.
Moreover, it doesn't make much sense with -frounding-math.
For Fortran the option is automatically enabled when both
-fno-signed-zeros and -fno-trapping-math are in effect.
The default is -fno-associative-math.
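The rounding idiom mentioned above looks like the following
sketch; compiling it with -fassociative-math may remove the
additions entirely:
double round_to_integer (double x)
{
  /* For non-negative x smaller than 2^52, adding and then subtracting
     2^52 forces x to be rounded to an integer in the current rounding
     mode.  Under -fassociative-math the compiler may re-associate
     (x + shift) - shift into x + (shift - shift), i.e. plain x,
     defeating the rounding.  */
  const double shift = 4503599627370496.0;   /* 2^52 */
  return (x + shift) - shift;
}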
-freciprocal-math
Allow the reciprocal of a value to be used instead of
dividing by the value if this enables optimizations. For
example "x / y" can be replaced with "x * (1/y)", which is
useful if "(1/y)" is subject to common subexpression
elimination. Note that this loses precision and increases
the number of flops operating on the value.
The default is -fno-reciprocal-math.
-ffinite-math-only
Allow optimizations for floating-point arithmetic that assume
that arguments and results are not NaNs or +-Infs.
This option is not turned on by any -O option since it can
result in incorrect output for programs that depend on an
exact implementation of IEEE or ISO rules/specifications for
math functions. It may, however, yield faster code for
programs that do not require the guarantees of these
specifications.
The default is -fno-finite-math-only.
-fno-signed-zeros
Allow optimizations for floating-point arithmetic that ignore
the signedness of zero. IEEE arithmetic specifies the
behavior of distinct +0.0 and -0.0 values, which then
prohibits simplification of expressions such as x+0.0 or
0.0*x (even with -ffinite-math-only). This option implies
that the sign of a zero result isn't significant.
The default is -fsigned-zeros.
-fno-trapping-math
Compile code assuming that floating-point operations cannot
generate user-visible traps. These traps include division by
zero, overflow, underflow, inexact result and invalid
operation. This option requires that -fno-signaling-nans be
in effect. Setting this option may allow faster code if one
relies on "non-stop" IEEE arithmetic, for example.
This option should never be turned on by any -O option since
it can result in incorrect output for programs that depend
on an exact implementation of IEEE or ISO
rules/specifications for math functions.
The default is -ftrapping-math.
-frounding-math
Disable transformations and optimizations that assume default
floating-point rounding behavior. This is round-to-zero for
all floating point to integer conversions, and round-to-
nearest for all other arithmetic truncations. This option
should be specified for programs that change the FP rounding
mode dynamically, or that may be executed with a non-default
rounding mode. This option disables constant folding of
floating-point expressions at compile time (which may be
affected by rounding mode) and arithmetic transformations
that are unsafe in the presence of sign-dependent rounding
modes.
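A minimal sketch of the kind of program this option is meant
for, using the C99 <fenv.h> interface (linking with -lm may be
required, and strictly conforming code would also use the
"FENV_ACCESS" pragma mentioned below):
#include <fenv.h>
#include <stdio.h>

int main (void)
{
  volatile double num = 1.0, den = 3.0;
  /* The rounding mode is changed at run time, so the divisions below
     must not be constant-folded or transformed under the assumption
     of round-to-nearest; -frounding-math tells GCC to respect that.  */
  fesetround (FE_UPWARD);
  double up = num / den;
  fesetround (FE_DOWNWARD);
  double down = num / den;
  fesetround (FE_TONEAREST);
  printf ("up = %.20f, down = %.20f\n", up, down);
  return 0;
}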
The default is -fno-rounding-math.
This option is experimental and does not currently guarantee
to disable all GCC optimizations that are affected by
rounding mode. Future versions of GCC may provide finer
control of this setting using C99's "FENV_ACCESS" pragma.
This command-line option will be used to specify the default
state for "FENV_ACCESS".
-fsignaling-nans
Compile code assuming that IEEE signaling NaNs may generate
user-visible traps during floating-point operations. Setting
this option disables optimizations that may change the number
of exceptions visible with signaling NaNs. This option
implies -ftrapping-math.
This option causes the preprocessor macro "__SUPPORT_SNAN__"
to be defined.
The default is -fno-signaling-nans.
This option is experimental and does not currently guarantee
to disable all GCC optimizations that affect signaling NaN
behavior.
-fno-fp-int-builtin-inexact
Do not allow the built-in functions "ceil", "floor", "round"
and "trunc", and their "float" and "long double" variants, to
generate code that raises the "inexact" floating-point
exception for noninteger arguments. ISO C99 and C11 allow
these functions to raise the "inexact" exception, but ISO/IEC
TS 18661-1:2014, the C bindings to IEEE 754-2008, does not
allow these functions to do so.
The default is -ffp-int-builtin-inexact, allowing the
exception to be raised. This option does nothing unless
-ftrapping-math is in effect.
Even if -fno-fp-int-builtin-inexact is used, if the
functions generate a call to a library function then the
"inexact" exception may be raised if the library
implementation does not follow TS 18661.
-fsingle-precision-constant
Treat floating-point constants as single precision instead of
implicitly converting them to double-precision constants.
-fcx-limited-range
When enabled, this option states that a range reduction step
is not needed when performing complex division. Also, there
is no checking whether the result of a complex multiplication
or division is "NaN + I*NaN", with an attempt to rescue the
situation in that case. The default is
-fno-cx-limited-range, but it is enabled by -ffast-math.
This option controls the default setting of the ISO C99
"CX_LIMITED_RANGE" pragma. Nevertheless, the option applies
to all languages.
-fcx-fortran-rules
Complex multiplication and division follow Fortran rules.
Range reduction is done as part of complex division, but
there is no checking whether the result of a complex
multiplication or division is "NaN + I*NaN", with an attempt
to rescue the situation in that case.
The default is -fno-cx-fortran-rules.
The following options control optimizations that may improve
performance, but are not enabled by any -O options. This
section
includes experimental options that may produce broken code.
-fbranch-probabilities
After running a program compiled with -fprofile-arcs, you
can compile it a second time using -fbranch-probabilities,
to improve optimizations based on the number of times each
branch was taken. When a program compiled with
-fprofile-arcs exits, it saves arc execution counts to a
file called sourcename.gcda for each source file. The
information in this data file is very dependent on the
structure of the generated code, so you must use the same
source code and the same optimization options for both
compilations.
With -fbranch-probabilities, GCC puts a REG_BR_PROB note on
each JUMP_INSN and CALL_INSN. These can be used to improve
optimization. Currently, they are only used in one place:
in reorg.c, instead of guessing which path a branch is most
likely to take, the REG_BR_PROB values are used to exactly
determine which path is taken more often.
Enabled by -fprofile-use and -fauto-profile.
-fprofile-values
If combined with -fprofile-arcs, it adds code so that some
data about values of expressions in the program is gathered.
With -fbranch-probabilities, it reads back the data gathered
from profiling values of expressions for usage in
optimizations.
Enabled by -fprofile-generate, -fprofile-use, and
-fauto-profile.
-fprofile-reorder-functions
Function reordering based on profile instrumentation
collects the time of first execution of each function and
orders these functions in ascending order.
Enabled with -fprofile-use.
-fvpt
If combined with -fprofile-arcs, this option instructs the
compiler to add code to gather information about values of
expressions.
With -fbranch-probabilities, it reads back the data gathered
and actually performs the optimizations based on them.
Currently the optimizations include specialization of
division operations using the knowledge about the value of
the denominator.
Enabled with -fprofile-use and -fauto-profile.
-frename-registers
Attempt to avoid false dependencies in scheduled code by
making use of registers left over after register allocation.
This optimization most benefits processors with lots of
registers. Depending on the debug information format adopted
by the target, however, it can make debugging impossible,
since variables no longer stay in a "home register".
Enabled by default with -funroll-loops.
-fschedule-fusion
Performs a target-dependent pass over the instruction stream
to schedule instructions of the same type together, because
the target machine can execute them more efficiently if they
are adjacent to each other in the instruction flow.
Enabled at levels -O2, -O3, -Os.
-ftracer
Perform tail duplication to enlarge superblock size. This
transformation simplifies the control flow of the function
allowing other optimizations to do a better job.
Enabled by -fprofile-use and -fauto-profile.
-funroll-loops
Unroll loops whose number of iterations can be determined at
compile time or upon entry to the loop. -funroll-loops
implies -frerun-cse-after-loop, -fweb and
-frename-registers. It also turns on complete loop peeling
(i.e. complete removal of loops with a small constant number
of iterations). This option makes code larger, and may or
may not make it run faster.
Enabled by -fprofile-use and -fauto-profile.
-funroll-all-loops
Unroll all loops, even if their number of iterations is
uncertain when the loop is entered. This usually makes
programs run more slowly. -funroll-all-loops implies the
same options as -funroll-loops.
-fpeel-loops
Peels loops for which there is enough information that they
do not roll much (from profile feedback or static analysis).
It also turns on complete loop peeling (i.e. complete removal
of loops with small constant number of iterations).
Enabled by -O3, -fprofile-use, and -fauto-profile.
-fmove-loop-invariants
Enables the loop invariant motion pass in the RTL loop
optimizer. Enabled at level -O1 and higher, except for -Og.
-fsplit-loops
Split a loop into two if it contains a condition that's
always true for one side of the iteration space and false for
the other.
Enabled by -fprofile-use and -fauto-profile.
-funswitch-loops
Move branches with loop invariant conditions out of the loop,
with duplicates of the loop on both branches (modified
according to result of the condition).
Enabled by -fprofile-use and -fauto-profile.
-fversion-loops-for-strides
If a loop iterates over an array with a variable stride,
create another version of the loop that assumes the stride is
always one. For example:
for (int i = 0; i < n; ++i)
x[i * stride] = ...;
becomes:
if (stride == 1)
for (int i = 0; i < n; ++i)
x[i] = ...;
else
for (int i = 0; i < n; ++i)
x[i * stride] = ...;
This is particularly useful for assumed-shape arrays in
Fortran where (for example) it allows better vectorization
assuming contiguous accesses. This flag is enabled by
default at -O3. It is also enabled by -fprofile-use and
-fauto-profile.
-ffunction-sections
-fdata-sections
Place each function or data item into its own section in the
output file if the target supports arbitrary sections. The
name of the function or the name of the data item determines
the section's name in the output file.
Use these options on systems where the linker can perform
optimizations to improve locality of reference in the
instruction space. Most systems using the ELF object format
have linkers with such optimizations. On AIX, the linker
rearranges sections (CSECTs) based on the call graph. The
performance impact varies.
Together with linker garbage collection (the linker
--gc-sections option) these options may lead to smaller
statically-linked executables (after stripping).
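For example, a size-oriented build might combine these flags
with linker garbage collection as in the following sketch (file
names are illustrative; the exact linker option spelling
depends on the linker in use):
gcc -Os -ffunction-sections -fdata-sections -c foo.c bar.c
gcc -Os -Wl,--gc-sections foo.o bar.o -o prog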
On ELF/DWARF systems these options do not degrade the
quality of the debug information. There could be issues
with other object files/debug info formats.
Only use these options when there are significant benefits
from doing so. When you specify these options, the assembler
and linker create larger object and executable files and are
also slower. These options affect code generation. They
prevent optimizations by the compiler and assembler using
relative locations inside a translation unit since the
locations are unknown until link time. An example of such an
optimization is relaxing calls to short call instructions.
-fbranch-target-load-optimize
Perform branch target register load optimization before
prologue / epilogue threading. The use of target registers
can typically be exposed only during reload, thus hoisting
loads out of loops and doing inter-block scheduling needs a
separate optimization pass.
-fbranch-target-load-optimize2
Perform branch target register load optimization after
prologue / epilogue threading.
-fbtr-bb-exclusive
When performing branch target register load optimization,
don't reuse branch target registers within any basic block.
-fstdarg-opt
Optimize the prologue of variadic argument functions with
respect to usage of those arguments.
-fsection-anchors
Try to reduce the number of symbolic address calculations by
using shared "anchor" symbols to address nearby objects.
This transformation can help to reduce the number of GOT
entries and GOT accesses on some targets.
For example, the implementation of the following function
"foo":
static int a, b, c;
int foo (void) { return a + b + c; }
usually calculates the addresses of all three variables, but
if you compile it with -fsection-anchors, it accesses the
variables from a common anchor point instead. The effect is
similar to the following pseudocode (which isn't valid C):
int foo (void)
{
register int *xr = &x;
return xr[&a - &x] + xr[&b - &x] + xr[&c - &x];
}
Not all targets support this option.
--param name=value
In some places, GCC uses various constants to control the
amount of optimization that is done. For example, GCC does
not inline functions that contain more than a certain number
of instructions. You can control some of these constants on
the command line using the --param option.
The names of specific parameters, and the meaning of the
values, are tied to the internals of the compiler, and are
subject to change without notice in future releases.
In order to get the minimal, maximal and default value of a
parameter, one can use the --help=param -Q options.
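For example (the parameter and value here are only
illustrative; max-unroll-times is described later in this
list):
gcc --help=param -Q             # print the minimal, maximal and default values
gcc -O2 --param max-unroll-times=4 -c foo.c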
In each case, the value is an integer. The allowable choices
for name are:
predictable-branch-outcome
When a branch is predicted to be taken with probability
lower than this threshold (in percent), it is considered
well predictable.
max-rtl-if-conversion-insns
RTL if-conversion tries to remove conditional branches
around a block and replace them with conditionally
executed instructions. This parameter gives the maximum
number of instructions in a block which should be
considered for if-conversion. The compiler will also use
other heuristics to decide whether if-conversion is
likely to be profitable.
max-rtl-if-conversion-predictable-cost
max-rtl-if-conversion-unpredictable-cost
RTL if-conversion will try to remove conditional branches
around a block and replace them with conditionally
executed instructions. These parameters give the maximum
permissible cost for the sequence that would be generated
by if-conversion depending on whether the branch is
statically determined to be predictable or not. The
units for this parameter are the same as those for the
GCC internal seq_cost metric. The compiler will try to
provide a reasonable default for this parameter using the
BRANCH_COST target macro.
max-crossjump-edges
The maximum number of incoming edges to consider for
cross-jumping. The algorithm used by -fcrossjumping is
O(N^2) in the number of edges incoming to each block.
Increasing values mean more aggressive optimization,
making the compilation time increase with probably small
improvement in executable size.
min-crossjump-insns
The minimum number of instructions that must be matched
at the end of two blocks before cross-jumping is
performed on them. This value is ignored in the case
where all instructions in the block being cross-jumped
from are matched.
max-grow-copy-bb-insns
The maximum code size expansion factor when copying basic
blocks instead of jumping. The expansion is relative to
a jump instruction.
max-goto-duplication-insns
The maximum number of instructions to duplicate to a
block that jumps to a computed goto. To avoid O(N^2)
behavior in a number of passes, GCC factors computed
gotos early in the compilation process, and unfactors
them as late as possible. Only computed jumps at the end
of a basic block with no more than
max-goto-duplication-insns are unfactored.
max-delay-slot-insn-search
The maximum number of instructions to consider when
looking for an instruction to fill a delay slot. If more
than this arbitrary number of instructions are searched,
the time savings from filling the delay slot are minimal,
so stop searching. Increasing values mean more
aggressive optimization, making the compilation time
increase with probably small improvement in execution
time.
max-delay-slot-live-search
When trying to fill delay slots, the maximum number of
instructions to consider when searching for a block with
valid live register information. Increasing this
arbitrarily chosen value means more aggressive
optimization, increasing the compilation time. This
parameter should be removed when the delay slot code is
rewritten to maintain the control-flow graph.
max-gcse-memory
The approximate maximum amount of memory that can be
allocated in order to perform the global common
subexpression elimination optimization. If more memory
than specified is required, the optimization is not done.
max-gcse-insertion-ratio
If the ratio of expression insertions to deletions is
larger than this value for any expression, then RTL PRE
inserts or removes the expression and thus leaves
partially redundant computations in the instruction
stream.
max-pending-list-length
The maximum number of pending dependencies scheduling
allows before flushing the current state and starting
over. Large functions with few branches or calls can
create excessively large lists which needlessly consume
memory and resources.
max-modulo-backtrack-attempts
The maximum number of backtrack attempts the scheduler
should make when modulo scheduling a loop. Larger values
can exponentially increase compilation time.
max-inline-insns-single
Several parameters control the tree inliner used in GCC.
This number sets the maximum number of instructions
(counted in GCC's internal representation) in a single
function that the tree inliner considers for inlining.
This only affects functions declared inline and methods
implemented in a class declaration (C++).
max-inline-insns-auto
When you use -finline-functions (included in -O3), a lot
of functions that would otherwise not be considered for
inlining by the compiler are investigated. To those
functions, a different (more restrictive) limit compared
to functions declared inline can be applied.
max-inline-insns-small
This is the bound applied to calls which are considered
relevant with -finline-small-functions.
max-inline-insns-size
This is the bound applied to calls which are optimized for
size. Small growth may be desirable to anticipate
optimization opportunities exposed by inlining.
uninlined-function-insns
Number of instructions accounted by inliner for function
overhead such as function prologue and epilogue.
uninlined-function-time
Extra time accounted by the inliner for function overhead
such as the time needed to execute the function prologue and
epilogue.
uninlined-thunk-insns
uninlined-thunk-time
Same as --param uninlined-function-insns and --param
uninlined-function-time but applied to function thunks.
inline-min-speedup
When estimated performance improvement of caller + callee
runtime exceeds this threshold (in percent), the function
can be inlined regardless of the limit on --param
max-inline-insns-single and --param max-inline-insns-auto.
large-function-insns
The limit specifying really large functions. For
functions larger than this limit after inlining, inlining
is constrained by --param large-function-growth. This
parameter is useful primarily to avoid extreme
compilation time caused by non-linear algorithms used by
the back end.
large-function-growth
Specifies the maximal growth of a large function caused by
inlining, in percent. For example, parameter value 100
limits large function growth to 2.0 times the original size.
large-unit-insns
The limit specifying a large translation unit. Growth
caused by inlining of units larger than this limit is
limited by --param inline-unit-growth. For small units this
might be too tight. For example, consider a unit consisting
of function A that is inline and B that just calls A three
times. If B is small relative to A, the growth of the unit
is 300% and yet such inlining is very sane. For very large
units consisting of small inlineable functions, however, the
overall unit growth limit is needed to avoid exponential
explosion of code size. Thus for smaller units, the size is
increased to --param large-unit-insns before applying
--param inline-unit-growth.
inline-unit-growth
Specifies maximal overall growth of the compilation unit
caused by inlining. For example, parameter value 20
limits unit growth to 1.2 times the original size. Cold
functions (either marked cold via an attribute or by
profile feedback) are not accounted into the unit size.
ipcp-unit-growth
Specifies maximal overall growth of the compilation unit
caused by interprocedural constant propagation. For
example, parameter value 10 limits unit growth to 1.1
times the original size.
large-stack-frame
The limit specifying large stack frames. While inlining,
the algorithm tries not to grow past this limit too much.
large-stack-frame-growth
Specifies the maximal growth of large stack frames caused by
inlining, in percent. For example, parameter value 1000
limits large stack frame growth to 11 times the original
size.
max-inline-insns-recursive
max-inline-insns-recursive-auto
Specifies the maximum number of instructions an out-of-
line copy of a self-recursive inline function can grow
into by performing recursive inlining.
--param max-inline-insns-recursive applies to functions
declared inline. For functions not declared inline,
recursive inlining happens only when -finline-functions
(included in -O3) is enabled;
--param max-inline-insns-recursive-auto applies instead.
max-inline-recursive-depth
max-inline-recursive-depth-auto
Specifies the maximum recursion depth used for recursive
inlining.
--param max-inline-recursive-depth applies to functions
declared inline. For functions not declared inline,
recursive inlining happens only when -finline-functions
(included in -O3) is enabled;
--param max-inline-recursive-depth-auto applies instead.
min-inline-recursive-probability
Recursive inlining is profitable only for functions having
deep recursion on average, and can hurt functions having
little recursion depth by increasing the prologue size or
the complexity of the function body for other optimizers.
When profile feedback is available (see -fprofile-generate)
the actual recursion depth can be guessed from the
probability that the function recurses via a given call
expression. This parameter limits inlining only to call
expressions whose probability exceeds the given threshold
(in percent).
early-inlining-insns
Specify growth that the early inliner can make. In
effect it increases the amount of inlining for code
having a large abstraction penalty.
max-early-inliner-iterations
Limit of iterations of the early inliner. This basically
bounds the number of nested indirect calls the early
inliner can resolve. Deeper chains are still handled by
late inlining.
comdat-sharing-probability
Probability (in percent) that C++ inline functions with
comdat visibility are shared across multiple compilation
units.
profile-func-internal-id
A parameter to control whether to use function internal
id in profile database lookup. If the value is 0, the
compiler uses an id that is based on function assembler
name and filename, which makes old profile data more
tolerant to source changes such as function reordering
etc.
min-vect-loop-bound
The minimum number of iterations under which loops are
not vectorized when -ftree-vectorize is used. The number
of iterations after vectorization needs to be greater
than the value specified by this option to allow
vectorization.
gcse-cost-distance-ratio
Scaling factor in calculation of maximum distance an
expression can be moved by GCSE optimizations. This is
currently supported only in the code hoisting pass. The
bigger the ratio, the more aggressive code hoisting is
with simple expressions, i.e., the expressions that have
cost less than gcse-unrestricted-cost. Specifying 0
disables hoisting of simple expressions.
gcse-unrestricted-cost
Cost, roughly measured as the cost of a single typical
machine instruction, at which GCSE optimizations do not
constrain the distance an expression can travel. This is
currently supported only in the code hoisting pass. The
lesser the cost, the more aggressive code hoisting is.
Specifying 0 allows all expressions to travel
unrestricted distances.
max-hoist-depth
The depth of search in the dominator tree for expressions
to hoist. This is used to avoid quadratic behavior in the
hoisting algorithm. The value of 0 does not limit the
search, but may slow down compilation of huge functions.
max-tail-merge-comparisons
The maximum number of similar basic blocks to compare a
basic block with. This is used to avoid quadratic behavior
in tree tail merging.
max-tail-merge-iterations
The maximum number of iterations of the pass over the
function. This is used to limit compilation time in tree
tail merging.
store-merging-allow-unaligned
Allow the store merging pass to introduce unaligned
stores if it is legal to do so.
max-stores-to-merge
The maximum number of stores to attempt to merge into
wider stores in the store merging pass.
max-unrolled-insns
The maximum number of instructions that a loop may have
to be unrolled. If a loop is unrolled, this parameter
also determines how many times the loop code is unrolled.
max-average-unrolled-insns
The maximum number of instructions biased by
probabilities of their execution that a loop may have to
be unrolled. If a loop is unrolled, this parameter also
determines how many times the loop code is unrolled.
max-unroll-times
The maximum number of unrollings of a single loop.
max-peeled-insns
The maximum number of instructions that a loop may have
to be peeled. If a loop is peeled, this parameter also
determines how many times the loop code is peeled.
max-peel-times
The maximum number of peelings of a single loop.
max-peel-branches
The maximum number of branches on the hot path through
the peeled sequence.
max-completely-peeled-insns
The maximum number of insns of a completely peeled loop.
max-completely-peel-times
The maximum number of iterations of a loop to be suitable
for complete peeling.
max-completely-peel-loop-nest-depth
The maximum depth of a loop nest suitable for complete
peeling.
max-unswitch-insns
The maximum number of insns of an unswitched loop.
max-unswitch-level
The maximum number of branches unswitched in a single
loop.
lim-expensive
The minimum cost of an expensive expression in the loop
invariant motion.
iv-consider-all-candidates-bound
Bound on number of candidates for induction variables,
below which all candidates are considered for each use in
induction variable optimizations. If there are more
candidates than this, only the most relevant ones are
considered to avoid quadratic time complexity.