Linux/Compiler Options
Overview
Proper use of compiler options
For Firefox 3 builds, please use --enable-optimize without flags.
Our testing has shown that different parts of Mozilla run faster at different optimization levels. For example, cairo, pixman and sqlite are compiled at -O2 because they are fastest at that level, while the JS engine is fastest at -Os. [3] Don't use --enable-optimize as a place to pass in random compile flags. It is a global switch that turns on optimization throughout the source tree, and the level it applies differs depending on the module being compiled.
If you still need to pass other, non-optimization flags to the compiler, please use CFLAGS and CXXFLAGS instead of passing them to --enable-optimize.
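As a rough illustration of the advice above, the sketch below writes out what a Firefox 3 mozconfig could look like. The file name mozconfig-ff3.example and the -pipe flag are placeholders chosen only for this example, and setting CFLAGS/CXXFLAGS with export inside the mozconfig is an assumption about how you pass them; adjust to your own setup.

```shell
# Hypothetical file name, used only for this example.
cat > mozconfig-ff3.example <<'EOF'
# Let the tree pick its own per-module optimization levels:
# no flags are passed to --enable-optimize.
ac_add_options --enable-optimize

# Non-optimization flags go through CFLAGS/CXXFLAGS instead.
# -pipe is only a stand-in for whatever extra flag you need.
export CFLAGS="-pipe"
export CXXFLAGS="-pipe"
EOF
```

The point of the split is that --enable-optimize stays a bare switch, so the per-module levels chosen in the tree are not overridden.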
For Firefox 2 builds, you probably want to set a default optimization level.
The default optimization level on the 1.8 branch (a.k.a. Firefox 2) is -O3, which is too aggressive and trades off a lot of space for not much speed. So you probably want to use --enable-optimize="..." for this release.
If you're using gcc 4.1.x you should use -O2 to make things go as fast as possible. This will result in about a 2MB code size hit. If you want to avoid that code size hit you can specify "-Os -finline-limit=100" which gives back most of the performance without too much code size growth. See the notes below.
For gcc 4.3.x you can use -O2 for your builds. The size hit is smaller because of visibility changes in that release of the compiler.
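For a Firefox 2 (1.8 branch) build, the recommendations above could be written into a mozconfig roughly as follows. This is a sketch only: the file name mozconfig-ff2.example is hypothetical, and you would keep exactly one of the two --enable-optimize lines depending on your compiler and how much code size you can afford.

```shell
# Hypothetical file name, used only for this example.
cat > mozconfig-ff2.example <<'EOF'
# gcc 4.1.x (fastest, about a 2MB code size hit) or gcc 4.3.x:
ac_add_options --enable-optimize="-O2"

# gcc 4.1.x, smaller alternative that gives back most of the
# performance; use this line instead of the one above:
# ac_add_options --enable-optimize="-Os -finline-limit=100"
EOF
```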
If you want to change optimization levels, please do it per-module.
As we discovered, the best optimization settings are per-module. If your testing shows that changes to a particular module improve performance, please let us know by filing a bug against Firefox/Build Config so we can evaluate it and get it into the tree.
Compilers
Notes from dwitte on gcc 4.3 vs. 4.1.2. [4] Also see the original post about possible ways to make gcc 4.1.2 faster by using -Os and -finline-limit.
gcc 4.1.2 notes
it turns out that gcc 4.1.2 on linux, at our default optimization setting "-Os -freorder-blocks -fno-reorder-functions", avoids inlining even trivial functions (where the cost of doing so is less than even the fncall overhead). this is bad news for things like nsTArray, nsCOMPtr etc, which can result in many layers of wrapper calls if not inlined sensibly.
gcc has an option to control inlining, "-finline-limit=n", which will (roughly) inline functions up to length n pseudo-instructions. to give some sense for numbers, the default value of n at -O2 is 600. i ran some tests and found that with our current settings and -finline-limit=50 on a 32-bit linux build, which is enough to inline trivial (one or two line) wrapper methods but no more, we can get a codesize saving of 225kb (2%), a Ts win of 3%, a Txul win of 18%, and a Tp2 win of about 25% (!).
i also compared this to plain -O2: Txul is unchanged, Ts improves 3%, and Tp2 improves about 4%. however, codesize jumps 2,414kb (19%). maybe we can increase the inline limit at -Os to get back a bit of this perf, without exploding codesize. (we originally moved from -O2 to -Os on gcc 3.x, because it gave a huge codesize win and also a perf win of a few percent on Ts, Txul, and Tp. so, it seems gcc4.x behaves quite differently.)
gcc 4.3 notes
i've tested gcc 4.3 a bit. to summarize, it looks like this pathological -Os behavior is specific to 4.1 branch, and possibly just 4.1.2. also, there are some substantial perf and codesize wins to be had with gcc 4.3.
gory details: tested with gcc 4.3 (20080104 pull). "stock configuration" is "-Os -freorder-blocks -fno-reorder-functions".
some Tp2 numbers (the gcc 4.1.2 column is for comparison with previous results, comment 0):
Configuration | gcc 4.3 | gcc 4.1.2 |
---|---|---|
stock | 142.78 ms (baseline) | 199 ms (+39%) |
stock, with -finline-limit=50 | 146.89 ms (+2.9%) | 149.33 ms (+4.6%) |
-O2 | 131.56 ms (-7.9%) | 142.67 ms (even) |
size libxul.so:
Configuration | gcc 4.3 | gcc 4.1.2 |
---|---|---|
stock | 12,387kb | 13,249kb (+862kb) |
stock, with -finline-limit=50 | 12,325kb (-62kb) | 13,025kb (+638kb) |
-O2 | 15,061kb (+2,674kb) | 15,440kb (+3,053kb) |
a few points from this data:
1) -Os is very sane on 4.3 by default.
2) on 4.3, relative to -Os, -O2 has improved a lot (8% Tp win, although at a 2.7Mb codesize cost).
3) 4.3 is 5 - 8% faster on Tp2 than 4.1.2, depending on -Os/-O2.
4) 4.3 gives an 400-800k codesize saving over 4.1.2.
3 & 4) are probably the same thing - a result of the hidden visibility propagation improvements introduced in gcc 4.2. these are a major win for us.
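The percentages and size differences in these notes appear to be measured against the gcc 4.3 stock configuration (142.78 ms Tp2, 12,387kb libxul.so). That reading is an assumption, not something the notes state outright, but it can be checked with a few lines of shell. The helper names pct_delta and kb_delta below are made up for this sketch.

```shell
# Percentage change of a value relative to a baseline, one decimal place.
pct_delta() {
    awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%+.1f%%\n", (val - base) / base * 100 }'
}

# Signed size difference in kb relative to a baseline (integer arithmetic).
kb_delta() {
    printf '%+d\n' "$(( $2 - $1 ))"
}

# Tp2, relative to gcc 4.3 stock (142.78 ms)
pct_delta 142.78 146.89   # gcc 4.3, stock with -finline-limit=50
pct_delta 142.78 131.56   # gcc 4.3, -O2
pct_delta 142.78 149.33   # gcc 4.1.2, stock with -finline-limit=50

# libxul.so size, relative to gcc 4.3 stock (12,387kb)
kb_delta 12387 12325      # gcc 4.3, stock with -finline-limit=50
kb_delta 12387 15061      # gcc 4.3, -O2
kb_delta 12387 13249      # gcc 4.1.2, stock
```

Run as written, this should reproduce the quoted +2.9%, -7.9% and +4.6% for Tp2 and -62, +2674 and +862 for the size differences, which supports reading every delta as relative to the gcc 4.3 stock build.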
Distributions
Name | GCC Version | Last Build |
---|---|---|
Ubuntu 7.10 | gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2) | 2.0.0.11+2nobinonly-0ubuntu0.7.10 (2008-01-07) |
gcc flags | | |
Fedora 8 | gcc version 4.1.2 20070925 (Red Hat 4.1.2-33) | firefox-2.0.0.10-3.fc8 (2008-01-04) |
gcc flags | | |
CentOS 5.1 | gcc version 4.1.2 20070626 (Red Hat 4.1.2-14) | firefox-1.5.0.12-7.el5.centos (2008-01-07) |
gcc flags | | |