Guide to the Linux Manual




gcc(1)

GNU project C and C++ compiler


Options detail



Control Optimization - 2

       -fipa-cp
           Perform interprocedural constant propagation.  This
           optimization analyzes the program to determine when values
           passed to functions are constants and then optimizes
           accordingly.  This optimization can substantially increase
           performance if the application has constants passed to
           functions.  This flag is enabled by default at -O2, -Os and
           -O3.  It is also enabled by -fprofile-use and -fauto-profile.
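
           As a minimal sketch of the kind of code this pass looks for,
           consider a helper that is always called with the same constant.
           The names scale and sum_scaled are hypothetical, and whether GCC
           actually specializes the helper depends on the optimization
           level and its heuristics.

```c
#include <stddef.h>

/* Every call in this translation unit passes factor == 4, so
   interprocedural constant propagation may treat factor as a
   known constant inside scale(). */
static int scale(int x, int factor)
{
    return x * factor;
}

int sum_scaled(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += scale(v[i], 4);   /* constant argument at the call site */
    return total;
}
```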

       -fipa-cp-clone
           Perform function cloning to make interprocedural constant
           propagation stronger.  When enabled, interprocedural constant
           propagation performs function cloning when an externally
           visible function can be called with constant arguments.  Because this
           optimization can create multiple copies of functions, it may
           significantly increase code size (see --param
           ipcp-unit-growth=value).  This flag is enabled by default at
           -O3.  It is also enabled by -fprofile-use and -fauto-profile.

       -fipa-bit-cp
           When enabled, perform interprocedural bitwise constant
           propagation. This flag is enabled by default at -O2 and by
           -fprofile-use and -fauto-profile.  It requires that -fipa-cp
           is enabled.

       -fipa-vrp
           When enabled, perform interprocedural propagation of value
           ranges. This flag is enabled by default at -O2. It requires
           that -fipa-cp is enabled.

       -fipa-icf
           Perform Identical Code Folding for functions and read-only
           variables.  The optimization reduces code size and may
           disturb unwind stacks by replacing a function with an
           equivalent one that has a different name.  The optimization
           works more effectively with link-time optimization enabled.

           Although the behavior is similar to the Gold Linker's ICF
           optimization, GCC ICF works on different levels and thus the
           optimizations are not the same: there are equivalences that
           are found only by GCC and equivalences found only by Gold.

           This flag is enabled by default at -O2 and -Os.

       -flive-patching=level
           Control GCC's optimizations to produce output suitable for
           live-patching.

           If the compiler's optimization uses a function's body or
           information extracted from its body to optimize/change
           another function, the latter is called an impacted function
           of the former.  If a function is patched, its impacted
           functions should be patched too.

           The impacted functions are determined by the compiler's
           interprocedural optimizations.  For example, a caller is
           impacted when inlining a function into its caller, cloning a
           function and changing its caller to call this new clone, or
           extracting a function's pureness/constness information to
           optimize its direct or indirect callers, etc.

           Usually, the more IPA optimizations enabled, the larger the
           number of impacted functions for each function.  In order to
           control the number of impacted functions and more easily
           compute the list of impacted functions, IPA optimizations can
           be partially enabled at two different levels.

           The level argument should be one of the following:

           inline-clone
               Only enable inlining and cloning optimizations, which
               includes inlining, cloning, interprocedural scalar
               replacement of aggregates and partial inlining.  As a
               result, when patching a function, all its callers and its
               clones' callers are impacted, therefore need to be
               patched as well.

               -flive-patching=inline-clone disables the following
               optimization flags: -fwhole-program  -fipa-pta
               -fipa-reference  -fipa-ra -fipa-icf  -fipa-icf-functions
               -fipa-icf-variables -fipa-bit-cp  -fipa-vrp
               -fipa-pure-const  -fipa-reference-addressable
               -fipa-stack-alignment

           inline-only-static
               Only enable inlining of static functions.  As a result,
               when patching a static function, all its callers are
               impacted and so need to be patched as well.

               In addition to all the flags that
               -flive-patching=inline-clone disables,
               -flive-patching=inline-only-static disables the following
               additional optimization flags: -fipa-cp-clone  -fipa-sra
               -fpartial-inlining  -fipa-cp

           When -flive-patching is specified without any value, the
           default value is inline-clone.

           This flag is disabled by default.

           Note that -flive-patching is not supported with link-time
           optimization (-flto).

       -fisolate-erroneous-paths-dereference
           Detect paths that trigger erroneous or undefined behavior due
           to dereferencing a null pointer.  Isolate those paths from
           the main control flow and turn the statement with erroneous
           or undefined behavior into a trap.  This flag is enabled by
           default at -O2 and higher and depends on
           -fdelete-null-pointer-checks also being enabled.

       -fisolate-erroneous-paths-attribute
           Detect paths that trigger erroneous or undefined behavior due
           to a null value being used in a way forbidden by a
           "returns_nonnull" or "nonnull" attribute.  Isolate those
           paths from the main control flow and turn the statement with
           erroneous or undefined behavior into a trap.  This is not
           currently enabled, but may be enabled by -O2 in the future.

       -ftree-sink
           Perform forward store motion on trees.  This flag is enabled
           by default at -O and higher.

       -ftree-bit-ccp
           Perform sparse conditional bit constant propagation on trees
           and propagate pointer alignment information.  This pass only
           operates on local scalar variables and is enabled by default
           at -O1 and higher, except for -Og.  It requires that
           -ftree-ccp is enabled.

       -ftree-ccp
           Perform sparse conditional constant propagation (CCP) on
           trees.  This pass only operates on local scalar variables and
           is enabled by default at -O and higher.

       -fssa-backprop
           Propagate information about uses of a value up the definition
           chain in order to simplify the definitions.  For example,
           this pass strips sign operations if the sign of a value never
           matters.  The flag is enabled by default at -O and higher.

       -fssa-phiopt
           Perform pattern matching on SSA PHI nodes to optimize
           conditional code.  This pass is enabled by default at -O1 and
           higher, except for -Og.

       -ftree-switch-conversion
           Perform conversion of simple initializations in a switch to
           initializations from a scalar array.  This flag is enabled by
           default at -O2 and higher.
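
           A sketch of a switch that fits this description, where every
           case only assigns a constant to the same variable.  The function
           name digit_weight is hypothetical; GCC decides on its own
           whether a table lookup is worthwhile for a given switch.

```c
/* Each case is a simple initialization of w, so the whole switch
   may be replaced by a load from a compiler-generated constant
   array indexed by code, plus a range check for the default. */
int digit_weight(int code)
{
    int w;
    switch (code) {
    case 0:  w = 1;    break;
    case 1:  w = 10;   break;
    case 2:  w = 100;  break;
    case 3:  w = 1000; break;
    default: w = 0;    break;
    }
    return w;
}
```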

       -ftree-tail-merge
           Look for identical code sequences.  When found, replace one
           with a jump to the other.  This optimization is known as tail
           merging or cross jumping.  This flag is enabled by default at
           -O2 and higher.  The compilation time in this pass can be
           limited using the max-tail-merge-comparisons and
           max-tail-merge-iterations parameters.

       -ftree-dce
           Perform dead code elimination (DCE) on trees.  This flag is
           enabled by default at -O and higher.

       -ftree-builtin-call-dce
           Perform conditional dead code elimination (DCE) for calls to
           built-in functions that may set "errno" but are otherwise
           free of side effects.  This flag is enabled by default at -O2
           and higher if -Os is not also specified.

       -ftree-dominator-opts
           Perform a variety of simple scalar cleanups (constant/copy
           propagation, redundancy elimination, range propagation and
           expression simplification) based on a dominator tree
           traversal.  This also performs jump threading (to reduce
           jumps to jumps). This flag is enabled by default at -O and
           higher.

       -ftree-dse
           Perform dead store elimination (DSE) on trees.  A dead store
           is a store into a memory location that is later overwritten
           by another store without any intervening loads.  In this case
           the earlier store can be deleted.  This flag is enabled by
           default at -O and higher.
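
           A minimal illustration of a dead store in C (set_flag is a
           hypothetical name):

```c
/* The first store is dead: the same location is overwritten by the
   second store and nothing reads it in between, so only the second
   store has to be emitted. */
void set_flag(int *flag)
{
    *flag = 0;   /* dead store */
    *flag = 1;   /* live store */
}
```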

       -ftree-ch
           Perform loop header copying on trees.  This is beneficial
           since it increases effectiveness of code motion
           optimizations.  It also saves one jump.  This flag is enabled
           by default at -O and higher.  It is not enabled for -Os,
           since it usually increases code size.

       -ftree-loop-optimize
           Perform loop optimizations on trees.  This flag is enabled by
           default at -O and higher.

       -ftree-loop-linear
       -floop-strip-mine
       -floop-block
           Perform loop nest optimizations.  Same as
           -floop-nest-optimize.  To use this code transformation, GCC
           has to be configured with --with-isl to enable the Graphite
           loop transformation infrastructure.

       -fgraphite-identity
           Enable the identity transformation for graphite.  For every
           SCoP we generate the polyhedral representation and transform
           it back to gimple.  Using -fgraphite-identity we can check
           the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE
           transformation.  Some minimal optimizations are also
           performed by the code generator isl, like index splitting and
           dead code elimination in loops.

       -floop-nest-optimize
           Enable the isl based loop nest optimizer.  This is a generic
           loop nest optimizer based on the Pluto optimization
           algorithms.  It calculates a loop structure optimized for
           data-locality and parallelism.  This option is experimental.

       -floop-parallelize-all
           Use the Graphite data dependence analysis to identify loops
           that can be parallelized.  Parallelize all the loops that can
           be analyzed to not contain loop carried dependences without
           checking that it is profitable to parallelize the loops.

       -ftree-coalesce-vars
           While transforming the program out of the SSA representation,
           attempt to reduce copying by coalescing versions of different
           user-defined variables, instead of just compiler temporaries.
           This may severely limit the ability to debug an optimized
           program compiled with -fno-var-tracking-assignments.  In the
           negated form, this flag prevents SSA coalescing of user
           variables.  This option is enabled by default if optimization
           is enabled, and it does very little otherwise.

       -ftree-loop-if-convert
           Attempt to transform conditional jumps in the innermost loops
           to branch-less equivalents.  The intent is to remove control
           flow from the innermost loops in order to improve the ability
           of the vectorization pass to handle these loops.  This is
           enabled by default if vectorization is enabled.

       -ftree-loop-distribution
           Perform loop distribution.  This flag can improve cache
           performance on big loop bodies and allow further loop
           optimizations, like parallelization or vectorization, to take
           place.  For example, the loop

                   DO I = 1, N
                     A(I) = B(I) + C
                     D(I) = E(I) * F
                   ENDDO

           is transformed to

                   DO I = 1, N
                      A(I) = B(I) + C
                   ENDDO
                   DO I = 1, N
                      D(I) = E(I) * F
                   ENDDO

           This flag is enabled by default at -O3.  It is also enabled
           by -fprofile-use and -fauto-profile.
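
           The same transformation can be sketched in C.  The function
           names are hypothetical, and the restrict qualifiers are an
           assumption added here to state that the arrays do not overlap,
           which is what makes splitting the loop safe in this sketch.

```c
#include <stddef.h>

/* Original form: one loop updates two unrelated arrays. */
void fused(double *restrict a, const double *restrict b, double c,
           double *restrict d, const double *restrict e, double f,
           size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + c;
        d[i] = e[i] * f;
    }
}

/* Distributed form: each loop touches fewer arrays, which can help
   cache behavior and leaves two simpler loops for later passes. */
void distributed(double *restrict a, const double *restrict b, double c,
                 double *restrict d, const double *restrict e, double f,
                 size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c;
    for (size_t i = 0; i < n; i++)
        d[i] = e[i] * f;
}
```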

       -ftree-loop-distribute-patterns
           Perform loop distribution of patterns that can be code
           generated with calls to a library.  This flag is enabled by
           default at -O3, and by -fprofile-use and -fauto-profile.

           This pass distributes the initialization loops and generates
           a call to memset to zero the array.  For example, the loop

                   DO I = 1, N
                     A(I) = 0
                     B(I) = A(I) + I
                   ENDDO

           is transformed to

                   DO I = 1, N
                      A(I) = 0
                   ENDDO
                   DO I = 1, N
                      B(I) = A(I) + I
                   ENDDO

           and the initialization loop is transformed into a call to
           memset that zeroes the array.
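
           A C sketch of the end result, written by hand for illustration.
           The function names are hypothetical and the restrict
           qualifiers are an added assumption that the arrays do not
           overlap.

```c
#include <stddef.h>
#include <string.h>

/* Original form: initialization and use share one loop. */
void init_fused(int *restrict a, int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = 0;
        b[i] = a[i] + (int)i;
    }
}

/* After distribution the zeroing loop is a recognizable pattern
   and can be expressed as a single library call. */
void init_distributed(int *restrict a, int *restrict b, size_t n)
{
    memset(a, 0, n * sizeof *a);
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] + (int)i;
}
```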

       -floop-interchange
           Perform loop interchange outside of graphite.  This flag can
           improve cache performance on loop nests and allow further loop
           optimizations, like vectorization, to take place.  For
           example, the loop

                   for (int i = 0; i < N; i++)
                     for (int j = 0; j < N; j++)
                       for (int k = 0; k < N; k++)
                         c[i][j] = c[i][j] + a[i][k]*b[k][j];

           is transformed to

                   for (int i = 0; i < N; i++)
                     for (int k = 0; k < N; k++)
                       for (int j = 0; j < N; j++)
                         c[i][j] = c[i][j] + a[i][k]*b[k][j];

           This flag is enabled by default at -O3.  It is also enabled
           by -fprofile-use and -fauto-profile.

       -floop-unroll-and-jam
           Apply unroll and jam transformations on feasible loops.  In a
           loop nest this unrolls the outer loop by some factor and
           fuses the resulting multiple inner loops.  This flag is
           enabled by default at -O3.  It is also enabled by
           -fprofile-use and -fauto-profile.

       -ftree-loop-im
           Perform loop invariant motion on trees.  This pass moves only
           invariants that are hard to handle at RTL level (function
           calls, operations that expand to nontrivial sequences of
           insns).  With -funswitch-loops it also moves operands of
           conditions that are invariant out of the loop, so that we can
           use just trivial invariantness analysis in loop unswitching.
           The pass also includes store motion.

       -ftree-loop-ivcanon
           Create a canonical counter for the number of iterations in
           loops for which determining the number of iterations requires
           complicated analysis.  Later optimizations may then determine
           the number easily.  This is especially useful in connection
           with unrolling.

       -ftree-scev-cprop
           Perform final value replacement.  If a variable is modified
           in a loop in such a way that its value when exiting the loop
           can be determined using only its initial value and the number
           of loop iterations, replace uses of the final value by such a
           computation, provided it is sufficiently cheap.  This reduces
           data dependencies and may allow further simplifications.
           Enabled by default at -O and higher.
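
           A sketch of what final value replacement amounts to, using
           hypothetical function names.  Unsigned arithmetic is used so
           that both forms wrap around identically.

```c
/* The value of x on loop exit depends only on its initial value
   (10) and the iteration count n. */
unsigned final_value_loop(unsigned n)
{
    unsigned x = 10;
    for (unsigned i = 0; i < n; i++)
        x += 3;
    return x;
}

/* Equivalent closed form that could replace uses of the final value. */
unsigned final_value_closed_form(unsigned n)
{
    return 10u + 3u * n;
}
```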

       -fivopts
           Perform induction variable optimizations (strength reduction,
           induction variable merging and induction variable
           elimination) on trees.
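
           Strength reduction, one of the transformations listed above,
           can be sketched by hand as follows.  The function names are
           hypothetical, and both functions assume that base points to at
           least rows * stride elements.

```c
#include <stddef.h>

/* As written: each iteration computes the index with a multiplication. */
void fill_indexed(int *base, size_t rows, size_t stride, int value)
{
    for (size_t i = 0; i < rows; i++)
        base[i * stride] = value;
}

/* Strength-reduced form: the multiplication is replaced by a
   pointer that advances by stride elements per iteration. */
void fill_strength_reduced(int *base, size_t rows, size_t stride, int value)
{
    int *p = base;
    for (size_t i = 0; i < rows; i++) {
        *p = value;
        p += stride;
    }
}
```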

       -ftree-parallelize-loops=n
           Parallelize loops, i.e., split their iteration space to run
           in n threads.  This is only possible for loops whose
           iterations are independent and can be arbitrarily reordered.
           The optimization is only profitable on multiprocessor
           machines, for loops that are CPU-intensive, rather than
           constrained e.g. by memory bandwidth.  This option implies
           -pthread, and thus is only supported on targets that have
           support for -pthread.

       -ftree-pta
           Perform function-local points-to analysis on trees.  This
           flag is enabled by default at -O1 and higher, except for -Og.

       -ftree-sra
           Perform scalar replacement of aggregates.  This pass replaces
           structure references with scalars to prevent committing
           structures to memory too early.  This flag is enabled by
           default at -O1 and higher, except for -Og.
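
           A small example of an aggregate that is a candidate for scalar
           replacement.  The names point and manhattan are hypothetical.

```c
struct point {
    int x;
    int y;
};

/* p is local and its address never escapes, so its two fields can
   be kept as independent scalars (typically in registers) instead
   of being written to a structure in memory. */
int manhattan(int x, int y)
{
    struct point p;
    p.x = x < 0 ? -x : x;
    p.y = y < 0 ? -y : y;
    return p.x + p.y;
}
```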

       -fstore-merging
           Perform merging of narrow stores to consecutive memory
           addresses.  This pass merges contiguous stores of immediate
           values narrower than a word into fewer wider stores to reduce
           the number of instructions.  This is enabled by default at
           -O2 and higher as well as -Os.
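
           A sketch of stores that fit this description.  The structure
           layout and names are hypothetical; whether the stores are
           merged depends on the target and on alignment.

```c
#include <stdint.h>

struct header {
    uint8_t tag;
    uint8_t version;
    uint8_t flags;
    uint8_t reserved;
};

/* Four one-byte stores of immediate values to consecutive
   addresses; they may be combined into a single 32-bit store. */
void init_header(struct header *h)
{
    h->tag = 0x47;
    h->version = 1;
    h->flags = 0;
    h->reserved = 0;
}
```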

       -ftree-ter
           Perform temporary expression replacement during the
           SSA->normal phase.  Single use/single def temporaries are
           replaced at their use location with their defining
           expression.  This results in non-GIMPLE code, but gives the
           expanders much more complex trees to work on resulting in
           better RTL generation.  This is enabled by default at -O and
           higher.

       -ftree-slsr
           Perform straight-line strength reduction on trees.  This
           recognizes related expressions involving multiplications and
           replaces them by less expensive calculations when possible.
           This is enabled by default at -O and higher.

       -ftree-vectorize
           Perform vectorization on trees. This flag enables
           -ftree-loop-vectorize and -ftree-slp-vectorize if not
           explicitly specified.

       -ftree-loop-vectorize
           Perform loop vectorization on trees. This flag is enabled by
           default at -O3 and by -ftree-vectorize, -fprofile-use, and
           -fauto-profile.

       -ftree-slp-vectorize
           Perform basic block vectorization on trees. This flag is
           enabled by default at -O3 and by -ftree-vectorize,
           -fprofile-use, and -fauto-profile.

       -fvect-cost-model=model
           Alter the cost model used for vectorization.  The model
           argument should be one of unlimited, dynamic or cheap.  With
           the unlimited model the vectorized code-path is assumed to be
           profitable while with the dynamic model a runtime check
           guards the vectorized code-path to enable it only for
           iteration counts that will likely execute faster than when
           executing the original scalar loop.  The cheap model disables
           vectorization of loops where doing so would be cost
           prohibitive, for example due to required runtime checks for
           data dependence or alignment, but otherwise is equal to the
           dynamic model.  The default cost model depends on other
           optimization flags and is either dynamic or cheap.

       -fsimd-cost-model=model
           Alter the cost model used for vectorization of loops marked
           with the OpenMP simd directive.  The model argument should be
           one of unlimited, dynamic or cheap.  All values of model have
           the same meaning as described in -fvect-cost-model and by
           default a cost model defined with -fvect-cost-model is used.

       -ftree-vrp
           Perform Value Range Propagation on trees.  This is similar to
           the constant propagation pass, but instead of values, ranges
           of values are propagated.  This allows the optimizers to
           remove unnecessary range checks like array bound checks and
           null pointer checks.  This is enabled by default at -O2 and
           higher.  Null pointer check elimination is only done if
           -fdelete-null-pointer-checks is enabled.
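
           An illustration of a range check that value range information
           makes redundant.  The table and the function name lookup are
           hypothetical.

```c
static const int table[8] = {10, 11, 12, 13, 14, 15, 16, 17};

int lookup(unsigned i)
{
    unsigned idx = i & 7u;   /* idx is known to be in the range [0, 7] */
    if (idx >= 8)            /* can never be true, so it may be removed */
        return -1;
    return table[idx];
}
```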

       -fsplit-paths
           Split paths leading to loop backedges.  This can improve dead
           code elimination and common subexpression elimination.  This
           is enabled by default at -O3 and above.

       -fsplit-ivs-in-unroller
           Enables expression of values of induction variables in later
           iterations of the unrolled loop using the value in the first
           iteration.  This breaks long dependency chains, thus
           improving efficiency of the scheduling passes.

           A combination of -fweb and CSE is often sufficient to obtain
           the same effect.  However, that is not reliable in cases
           where the loop body is more complicated than a single basic
           block.  It also does not work at all on some architectures
           due to restrictions in the CSE pass.

           This optimization is enabled by default.

       -fvariable-expansion-in-unroller
           With this option, the compiler creates multiple copies of
           some local variables when unrolling a loop, which can result
           in superior code.

       -fpartial-inlining
           Inline parts of functions.  This option has an effect only
           when inlining itself is turned on by the -finline-functions
           or -finline-small-functions options.

           Enabled at levels -O2, -O3, -Os.

       -fpredictive-commoning
           Perform predictive commoning optimization, i.e., reusing
           computations (especially memory loads and stores) performed
           in previous iterations of loops.

           This option is enabled at level -O3.  It is also enabled by
           -fprofile-use and -fauto-profile.

       -fprefetch-loop-arrays
           If supported by the target machine, generate instructions to
           prefetch memory to improve the performance of loops that
           access large arrays.

           This option may generate better or worse code; results are
           highly dependent on the structure of loops within the source
           code.

           Disabled at level -Os.

       -fno-printf-return-value
           Do not substitute constants for the known return value of
           formatted output functions such as "sprintf", "snprintf",
           "vsprintf", and "vsnprintf" (but not "printf" or "fprintf").
           This transformation allows GCC to optimize or even eliminate
           branches based on the known return value of these functions
           called with arguments that are either constant, or whose
           values are known to be in a range that makes determining the
           exact return value possible.  For example, when
           -fprintf-return-value is in effect, both the branch and the
           body of the "if" statement (but not the call to "snprintf")
           can be optimized away when "i" is a 32-bit or smaller integer
           because the return value is guaranteed to be at most 8.

                   char buf[9];
                   if (snprintf (buf, sizeof buf, "%08x", i) >= sizeof buf)
                     ...

           The -fprintf-return-value option relies on other
           optimizations and yields best results with -O2 and above.  It
           works in tandem with the -Wformat-overflow and
           -Wformat-truncation options.  The -fprintf-return-value
           option is enabled by default.
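
           A self-contained variant of the case described above, as a
           sketch.  The function name format_hex is hypothetical, and
           uint32_t is used so that the 32-bit assumption is explicit.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* "%08" PRIx32 always produces exactly 8 characters for a 32-bit
   value, so the return value is known and the truncation branch
   below can be optimized away. */
int format_hex(char buf[9], uint32_t value)
{
    int n = snprintf(buf, 9, "%08" PRIx32, value);
    if (n >= 9)          /* would indicate truncation */
        return -1;
    return n;
}
```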

       -fno-peephole
       -fno-peephole2
           Disable any machine-specific peephole optimizations.  The
           difference between -fno-peephole and -fno-peephole2 is in how
           they are implemented in the compiler; some targets use one,
           some use the other, a few use both.

           -fpeephole is enabled by default.  -fpeephole2 is enabled at
           levels -O2, -O3, -Os.

       -fno-guess-branch-probability
           Do not guess branch probabilities using heuristics.

           GCC uses heuristics to guess branch probabilities if they are
           not provided by profiling feedback (-fprofile-arcs).  These
           heuristics are based on the control flow graph.  If some
           branch probabilities are specified by "__builtin_expect",
           then the heuristics are used to guess branch probabilities
           for the rest of the control flow graph, taking the
           "__builtin_expect" info into account.  The interactions
           between the heuristics and "__builtin_expect" can be complex,
           and in some cases, it may be useful to disable the heuristics
           so that the effects of "__builtin_expect" are easier to
           understand.

           It is also possible to specify expected probability of the
           expression with "__builtin_expect_with_probability" built-in
           function.
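
           A sketch of how "__builtin_expect" is commonly used.  The
           likely and unlikely macros and the function sum_positive are
           hypothetical names chosen for this example.

```c
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums the positive elements of v; returns -1 for a null pointer. */
long sum_positive(const int *v, size_t n)
{
    long total = 0;
    if (unlikely(v == NULL))       /* hint: error path is rare */
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (likely(v[i] > 0))      /* hint: most elements are positive */
            total += v[i];
    }
    return total;
}
```

           The hints only affect the probabilities the compiler assumes;
           the result of the function is the same with or without them.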

           The default is -fguess-branch-probability at levels -O, -O2,
           -O3, -Os.

       -freorder-blocks
           Reorder basic blocks in the compiled function in order to
           reduce number of taken branches and improve code locality.

           Enabled at levels -O, -O2, -O3, -Os.

       -freorder-blocks-algorithm=algorithm
           Use the specified algorithm for basic block reordering.  The
           algorithm argument can be simple, which does not increase
           code size (except sometimes due to secondary effects like
           alignment), or stc, the "software trace cache" algorithm,
           which tries to put all often executed code together,
           minimizing the number of branches executed by making extra
           copies of code.

           The default is simple at levels -O, -Os, and stc at levels
           -O2, -O3.

       -freorder-blocks-and-partition
           In addition to reordering basic blocks in the compiled
           function, in order to reduce number of taken branches,
           partitions hot and cold basic blocks into separate sections
           of the assembly and .o files, to improve paging and cache
           locality performance.

           This optimization is automatically turned off in the presence
           of exception handling or unwind tables (on targets using
           setjmp/longjmp or a target-specific scheme), for linkonce
           sections, for functions with a user-defined section attribute
           and on any architecture that does not support named sections.
           When -fsplit-stack is used this option is not enabled by
           default (to avoid linker errors), but may be enabled
           explicitly (if using a working linker).

           Enabled for x86 at levels -O2, -O3, -Os.

       -freorder-functions
           Reorder functions in the object file in order to improve code
           locality.  This is implemented by using special subsections
           ".text.hot" for most frequently executed functions and
           ".text.unlikely" for unlikely executed functions.  Reordering
           is done by the linker, so the object file format must support
           named sections and the linker must place them in a reasonable
           way.

           This option isn't effective unless you either provide profile
           feedback (see -fprofile-arcs for details) or manually
           annotate functions with "hot" or "cold" attributes.
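
           A sketch of the manual annotation mentioned above.  The
           function names are hypothetical; the attributes are only hints
           and do not change what the functions compute.

```c
#include <stdio.h>

/* Rarely executed: a candidate for the ".text.unlikely" subsection. */
__attribute__((cold)) int report_failure(const char *what)
{
    fprintf(stderr, "failed: %s\n", what);
    return -1;
}

/* Frequently executed: a candidate for the ".text.hot" subsection. */
__attribute__((hot)) int checked_div(int a, int b)
{
    if (b == 0)
        return report_failure("division by zero");
    return a / b;
}
```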

           Enabled at levels -O2, -O3, -Os.

       -fstrict-aliasing
           Allow the compiler to assume the strictest aliasing rules
           applicable to the language being compiled.  For C (and C++),
           this activates optimizations based on the type of
           expressions.  In particular, an object of one type is assumed
           never to reside at the same address as an object of a
           different type, unless the types are almost the same.  For
           example, an "unsigned int" can alias an "int", but not a
           "void*" or a "double".  A character type may alias any other
           type.

           Pay special attention to code like this:

                   union a_union {
                     int i;
                     double d;
                   };

                   int f() {
                     union a_union t;
                     t.d = 3.0;
                     return t.i;
                   }

           The practice of reading from a different union member than
           the one most recently written to (called "type-punning") is
           common.  Even with -fstrict-aliasing, type-punning is
           allowed, provided the memory is accessed through the union
           type.  So, the code above works as expected.  However, this
           code might not:

                   int f() {
                     union a_union t;
                     int* ip;
                     t.d = 3.0;
                     ip = &t.i;
                     return *ip;
                   }

           Similarly, access by taking the address, casting the
           resulting pointer and dereferencing the result has undefined
           behavior, even if the cast uses a union type, e.g.:

                   int f() {
                     double d = 3.0;
                     return ((union a_union *) &d)->i;
                   }

           The -fstrict-aliasing option is enabled at levels -O2, -O3,
           -Os.
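
           When the bytes of one type must be reinterpreted as another, a
           widely used alternative to the pointer casts shown above is to
           copy the bytes with memcpy.  This is a different technique from
           the union examples, shown as a sketch.  The function name
           float_bits is hypothetical, and the values in the usage note
           assume IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f as a 32-bit integer
   without accessing f through an incompatible pointer type. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "float must be 32 bits wide");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

           On a target with IEEE 754 floats, float_bits(1.0f) yields
           0x3f800000.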

       -falign-functions
       -falign-functions=n
       -falign-functions=n:m
       -falign-functions=n:m:n2
       -falign-functions=n:m:n2:m2
           Align the start of functions to the next power-of-two greater
           than or equal to n, skipping up to m-1 bytes.  This ensures that
           at least the first m bytes of the function can be fetched by the
           CPU without crossing an n-byte alignment boundary.

           If m is not specified, it defaults to n.

           Examples: -falign-functions=32 aligns functions to the next
           32-byte boundary, -falign-functions=24 aligns to the next
           32-byte boundary only if this can be done by skipping 23
           bytes or less, -falign-functions=32:7 aligns to the next
           32-byte boundary only if this can be done by skipping 6 bytes
           or less.

           The second pair of n2:m2 values allows you to specify a
           secondary alignment: -falign-functions=64:7:32:3 aligns to
           the next 64-byte boundary if this can be done by skipping 6
           bytes or less, otherwise aligns to the next 32-byte boundary
           if this can be done by skipping 2 bytes or less.  If m2 is
           not specified, it defaults to n2.

           Some assemblers only support this flag when n is a power of
           two; in that case, it is rounded up.

           -fno-align-functions and -falign-functions=1 are equivalent
           and mean that functions are not aligned.

           If n is not specified or is zero, use a machine-dependent
           default.  The maximum allowed n option value is 65536.

           Enabled at levels -O2, -O3.

       -flimit-function-alignment
           If this option is enabled, the compiler tries to avoid
           unnecessarily overaligning functions. It attempts to instruct
           the assembler to align by the amount specified by
           -falign-functions, but not to skip more bytes than the size
           of the function.

       -falign-labels
       -falign-labels=n
       -falign-labels=n:m
       -falign-labels=n:m:n2
       -falign-labels=n:m:n2:m2
           Align all branch targets to a power-of-two boundary.

           Parameters of this option are analogous to the
           -falign-functions option.  -fno-align-labels and
           -falign-labels=1 are equivalent and mean that labels are not
           aligned.

           If -falign-loops or -falign-jumps are applicable and are
           greater than this value, then their values are used instead.

           If n is not specified or is zero, use a machine-dependent
           default which is very likely to be 1, meaning no alignment.
           The maximum allowed n option value is 65536.

           Enabled at levels -O2, -O3.

       -falign-loops
       -falign-loops=n
       -falign-loops=n:m
       -falign-loops=n:m:n2
       -falign-loops=n:m:n2:m2
           Align loops to a power-of-two boundary.  If the loops are
           executed many times, this makes up for any execution of the
           dummy padding instructions.

           Parameters of this option are analogous to the
           -falign-functions option.  -fno-align-loops and
           -falign-loops=1 are equivalent and mean that loops are not
           aligned.  The maximum allowed n option value is 65536.

           If n is not specified or is zero, use a machine-dependent
           default.

           Enabled at levels -O2, -O3.

       -falign-jumps
       -falign-jumps=n
       -falign-jumps=n:m
       -falign-jumps=n:m:n2
       -falign-jumps=n:m:n2:m2
           Align branch targets to a power-of-two boundary, for branch
           targets where the targets can only be reached by jumping.  In
           this case, no dummy operations need be executed.

           Parameters of this option are analogous to the
           -falign-functions option.  -fno-align-jumps and
           -falign-jumps=1 are equivalent and mean that jump targets are
           not aligned.

           If n is not specified or is zero, use a machine-dependent
           default.  The maximum allowed n option value is 65536.

           Enabled at levels -O2, -O3.

       -funit-at-a-time
           This option is left for compatibility reasons.
           -funit-at-a-time has no effect, while -fno-unit-at-a-time
           implies -fno-toplevel-reorder and -fno-section-anchors.

           Enabled by default.

       -fno-toplevel-reorder
           Do not reorder top-level functions, variables, and "asm"
           statements.  Output them in the same order that they appear
           in the input file.  When this option is used, unreferenced
           static variables are not removed.  This option is intended to
           support existing code that relies on a particular ordering.
           For new code, it is better to use attributes when possible.

           -ftoplevel-reorder is the default at -O1 and higher, and also
           at -O0 if -fsection-anchors is explicitly requested.
           Additionally -fno-toplevel-reorder implies
           -fno-section-anchors.

       -fweb
           Constructs webs as commonly used for register allocation
           purposes and assigns each web an individual pseudo register.
           This allows the register allocation pass to operate on
           pseudos directly, but also strengthens several other
           optimization passes, such as CSE, loop optimizer and trivial
           dead code remover.  It can, however, make debugging
           impossible, since variables no longer stay in a "home
           register".

           Enabled by default with -funroll-loops.

       -fwhole-program
           Assume that the current compilation unit represents the whole
           program being compiled.  All public functions and variables
           with the exception of "main" and those merged by attribute
           "externally_visible" become static functions and in effect
           are optimized more aggressively by interprocedural
           optimizers.

           This option should not be used in combination with -flto.
           Relying on a linker plugin instead should provide safer and
           more precise information.

       -flto[=n]
           This option runs the standard link-time optimizer.  When
           invoked with source code, it generates GIMPLE (one of GCC's
           internal representations) and writes it to special ELF
           sections in the object file.  When the object files are
           linked together, all the function bodies are read from these
           ELF sections and instantiated as if they had been part of the
           same translation unit.

           To use the link-time optimizer, -flto and optimization
           options should be specified at compile time and during the
           final link.  It is recommended that you compile all the files
           participating in the same link with the same options and also
           specify those options at link time.  For example:

                   gcc -c -O2 -flto foo.c
                   gcc -c -O2 -flto bar.c
                   gcc -o myprog -flto -O2 foo.o bar.o

           The first two invocations to GCC save a bytecode
           representation of GIMPLE into special ELF sections inside
           foo.o and bar.o.  The final invocation reads the GIMPLE
           bytecode from foo.o and bar.o, merges the two files into a
           single internal image, and compiles the result as usual.
           Since both foo.o and bar.o are merged into a single image,
           this causes all the interprocedural analyses and
           optimizations in GCC to work across the two files as if they
           were a single one.  This means, for example, that the inliner
           is able to inline functions in bar.o into functions in foo.o
           and vice-versa.

           Another (simpler) way to enable link-time optimization is:

                   gcc -o myprog -flto -O2 foo.c bar.c

           The above generates bytecode for foo.c and bar.c, merges them
           together into a single GIMPLE representation and optimizes
           them as usual to produce myprog.

           The important thing to keep in mind is that to enable link-
           time optimizations you need to use the GCC driver to perform
           the link step.  GCC automatically performs link-time
           optimization if any of the objects involved were compiled
           with the -flto command-line option.  You can always override
           the automatic decision to do link-time optimization by
           passing -fno-lto to the link command.

           To make whole program optimization effective, it is necessary
           to make certain whole program assumptions.  The compiler
           needs to know what functions and variables can be accessed by
           libraries and runtime outside of the link-time optimized
           unit.  When supported by the linker, the linker plugin (see
           -fuse-linker-plugin) passes information to the compiler about
           used and externally visible symbols.  When the linker plugin
           is not available, -fwhole-program should be used to allow the
           compiler to make these assumptions, which leads to more
           aggressive optimization decisions.

           When a file is compiled with -flto without
           -fuse-linker-plugin, the generated object file is larger than
           a regular object file because it contains GIMPLE bytecodes
           and the usual final code (see -ffat-lto-objects).  This means
           that object files with LTO information can be linked as
           normal object files; if -fno-lto is passed to the linker, no
           interprocedural optimizations are applied.  Note that when
           -fno-fat-lto-objects is enabled the compile stage is faster
           but you cannot perform a regular, non-LTO link on the
           resulting object files.

           When producing the final binary, GCC only applies link-time
           optimizations to those files that contain bytecode.
           Therefore, you can mix and match object files and libraries
           with GIMPLE bytecodes and final object code.  GCC
           automatically selects which files to optimize in LTO mode and
           which files to link without further processing.

           Generally, options specified at link time override those
           specified at compile time, although in some cases GCC
           attempts to infer link-time options from the settings used to
           compile the input files.

           If you do not specify an optimization level option -O at link
           time, then GCC uses the highest optimization level used when
           compiling the object files.  Note that it is generally
           ineffective to specify an optimization level option only at
           link time and not at compile time, for two reasons.  First,
           compiling without optimization suppresses compiler passes
           that gather information needed for effective optimization at
           link time.  Second, some early optimization passes can be
           performed only at compile time and not at link time.

           There are some code generation flags preserved by GCC when
           generating bytecodes, as they need to be used during the
           final link.  Currently, the following options and their
           settings are taken from the first object file that explicitly
           specifies them: -fPIC, -fpic, -fpie, -fcommon, -fexceptions,
           -fnon-call-exceptions, -fgnu-tm and all the -m target flags.

           Certain ABI-changing flags are required to match in all
           compilation units, and trying to override this at link time
           with a conflicting value is ignored.  This includes options
           such as -freg-struct-return and -fpcc-struct-return.

           Other options such as -ffp-contract, -fno-strict-overflow,
           -fwrapv, -fno-trapv or -fno-strict-aliasing are passed
           through to the link stage and merged conservatively for
           conflicting translation units.  Specifically
           -fno-strict-overflow, -fwrapv and -fno-trapv take precedence;
           and for example -ffp-contract=off takes precedence over
           -ffp-contract=fast.  You can override them at link time.

           When you need to pass options to the assembler via -Wa or
           -Xassembler make sure to either compile such translation
           units with -fno-lto or consistently use the same assembler
           options on all translation units.  You can alternatively also
           specify assembler options at LTO link time.

           If LTO encounters objects with C linkage declared with
           incompatible types in separate translation units to be linked
           together (undefined behavior according to ISO C99 6.2.7), a
           non-fatal diagnostic may be issued.  The behavior is still
           undefined at run time.  Similar diagnostics may be raised for
           other languages.

           Another feature of LTO is that it is possible to apply
           interprocedural optimizations on files written in different
           languages:

                   gcc -c -flto foo.c
                   g++ -c -flto bar.cc
                   gfortran -c -flto baz.f90
                   g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran

           Notice that the final link is done with g++ to get the C++
           runtime libraries and -lgfortran is added to get the Fortran
           runtime libraries.  In general, when mixing languages in LTO
           mode, you should use the same link command options as when
           mixing languages in a regular (non-LTO) compilation.

           If object files containing GIMPLE bytecode are stored in a
           library archive, say libfoo.a, it is possible to extract and
           use them in an LTO link if you are using a linker with plugin
           support.  To create static libraries suitable for LTO, use
           gcc-ar and gcc-ranlib instead of ar and ranlib; to show the
           symbols of object files with GIMPLE bytecode, use gcc-nm.
           Those commands require that ar, ranlib and nm have been
           compiled with plugin support.  At link time, use the flag
           -fuse-linker-plugin to ensure that the library participates
           in the LTO optimization process:

                   gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo

           With the linker plugin enabled, the linker extracts the
           needed GIMPLE files from libfoo.a and passes them on to the
           running GCC to make them part of the aggregated GIMPLE image
           to be optimized.

           If you are not using a linker with plugin support and/or do
           not enable the linker plugin, then the objects inside
           libfoo.a are extracted and linked as usual, but they do not
           participate in the LTO optimization process.  In order to
           make a static library suitable for both LTO optimization and
           usual linkage, compile its object files with -flto
           -ffat-lto-objects.

           Link-time optimizations do not require the presence of the
           whole program to operate.  If the program does not require
           any symbols to be exported, it is possible to combine -flto
           and -fwhole-program to allow the interprocedural optimizers
           to use more aggressive assumptions which may lead to improved
           optimization opportunities.  Use of -fwhole-program is not
           needed when linker plugin is active (see
           -fuse-linker-plugin).

           The current implementation of LTO makes no attempt to
           generate bytecode that is portable between different types of
           hosts.  The bytecode files are versioned and there is a
           strict version check, so bytecode files generated in one
           version of GCC do not work with an older or newer version of
           GCC.

           Link-time optimization does not work well with generation of
           debugging information on systems other than those using a
           combination of ELF and DWARF.

           If you specify the optional n, the optimization and code
           generation done at link time is executed in parallel using n
           parallel jobs by utilizing an installed make program.  The
           environment variable MAKE may be used to override the program
           used.  The default value for n is 1.

           You can also specify -flto=jobserver to use GNU make's job
           server mode to determine the number of parallel jobs. This is
           useful when the Makefile calling GCC is already executing in
           parallel.  You must prepend a + to the command recipe in the
           parent Makefile for this to work.  This option likely only
           works if MAKE is GNU make.

       -flto-partition=alg
           Specify the partitioning algorithm used by the link-time
           optimizer.  The value is either 1to1 to specify a
           partitioning mirroring the original source files, balanced
           to specify partitioning into equally sized chunks (whenever
           possible), or max to create a new partition for every symbol
           where possible.  The default value is balanced.  While 1to1
           can be used as a workaround for various code ordering
           issues, the max partitioning is intended for internal
           testing only.  The value one specifies that exactly one
           partition should be used, while the value none disables
           partitioning and streaming completely and executes the
           link-time optimization step directly from the WPA phase.