C++ performance optimization notes - 7 - Optimization in the compiler - 1 - How the compiler optimizes code

How does the compiler optimize

Modern compilers can make many changes to the code to improve performance. It is useful for developers to understand what the compiler can and cannot do. The following sections describe some compiler optimizations that developers need to know.

Function inlining

The compiler can replace a function call with the body of the called function. Example:

// Example 8.1a
float square (float a) {
      return a * a;
}

float parabola (float x) {
      return square(x) + 1.0f;
}

The compiler can replace the call to square with the code in square:

// Example 8.1b
float parabola (float x) {
      return x * x + 1.0f;
}

The benefits of function inlining are:

  • It eliminates the overhead of call and return and parameter passing.
  • Code caching works better because the code becomes contiguous.
  • If only one call is made to an inline function, the code becomes smaller.
  • Function inlining can create opportunities for other optimizations, as explained below.

The disadvantage of function inlining is that the code becomes larger if the inlined function is large and is called from multiple places. If a function is small, or if it is called from only one or a few places, the compiler is likely to inline it.

Constant folding and constant propagation

An expression or subexpression that contains only constants is replaced by its computed result. Example:

// Example 8.2a
double a, b;
a = b + 2.0 / 3.0;

The compiler replaces it with

// Example 8.2b
a = b + 0.666666666666666666667;

This is actually quite convenient: writing 2.0 / 3.0 is easier than calculating the value and writing it out in decimal. It is recommended to enclose such a subexpression in parentheses to make sure the compiler recognizes it as a subexpression. For example, b * 2.0 / 3.0 is calculated as (b * 2.0) / 3.0 rather than b * (2.0 / 3.0), unless you put the constant subexpression in parentheses.
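
For illustration, a small sketch of the effect of the parentheses (the variable names are chosen arbitrarily):

double b, y1, y2;
y1 = b * 2.0 / 3.0;      // evaluated as (b * 2.0) / 3.0; nothing can be folded
y2 = b * (2.0 / 3.0);    // 2.0 / 3.0 is folded to a constant, leaving one multiplication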

Constants can be propagated through a series of calculations:

// Example 8.3a
float parabola (float x) {
      return x * x + 1.0f;
}

float a, b;
a = parabola (2.0f);
b = a + 1.0f;

The compiler replaces it with

// Example 8.3b
a = 5.0f;
b = 6.0f;

If the expression contains functions that cannot be inlined or evaluated at compile time, constant folding and constant propagation are impossible. For example:

// Example 8.4
double a = sin(0.8);

The function sin is defined in a separate function library, and you cannot expect the compiler to inline it and calculate it at compile time. Some compilers are able to evaluate the most common mathematical functions, such as sqrt and pow, at compile time, but not more complex functions such as sin.
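
As a hedged sketch of typical behavior (it varies by compiler and optimization level):

#include <cmath>

double f() { return std::sqrt(2.0); }   // often folded to the constant 1.4142135...
double g() { return std::sin(0.8); }    // usually left as a call to the library sin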

Pointer elimination

A pointer or reference can be eliminated if the target it points to is known. Example:

// Example 8.5a
void Plus2 (int * p) {
     *p = *p + 2;
}

int a;
Plus2 (&a);

The compiler replaces it with

// Example 8.5b
a += 2;

Common subexpression elimination

If the same subexpression occurs more than once, the compiler may evaluate it only once. Example:

// Example 8.6a
int a, b, c;
b = (a+1) * (a+1);
c = (a+1) / 4;

The compiler replaces it with

// Example 8.6b
int a, b, c, temp;
temp = a+1;
b = temp * temp;
c = temp / 4;

Register variables

The most commonly used variables are stored in registers (see the section on register storage).

The maximum number of integer register variables is about 6 on 32-bit systems and 14 on 64-bit systems.
The maximum number of floating-point register variables is 8 on 32-bit systems and 16 on 64-bit systems (32 if the AVX512 instruction set is available). On 32-bit systems, some compilers have difficulty making floating-point register variables unless the SSE2 (or newer) instruction set is enabled.

The compiler selects the most commonly used variables as register variables. This includes pointers and references, which can be stored in integer registers. Typical candidates for register variables are temporary intermediate results, loop counters, function parameters, pointers, references, the this pointer, common subexpressions, and induction variables (see below).

A variable cannot be kept in a register if a pointer or reference to it exists. Therefore, avoid taking pointers or references to variables that could benefit from register storage.
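
A hypothetical sketch of this effect (the function Accumulate is invented for the example):

void Accumulate(int * p, int x);     // defined elsewhere; receives a pointer

int SumList(const int x[], int n) {
     int sum = 0;                    // would otherwise be a good register candidate
     for (int i = 0; i < n; i++) {
          Accumulate(&sum, x[i]);    // the address escapes, so sum must live in memory
     }
     return sum;
}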

Lifetime analysis

The lifetime of a variable is the range of code in which the variable is used. An optimizing compiler can use the same register for multiple variables if their lifetimes do not overlap, or if they are certain to have the same value. This is useful when the number of available registers is limited. Example:

// Example 8.7
int SomeFunction (int a, int x[]) {
     int b, c;
     x[0] = a;
     b = a + 1;
     x[1] = b;
     c = b + 1;
     return c;
}

In this example, a, b and c can share the same register because their lifetimes do not overlap. If c = b + 1 is changed to c = a + 2, a and b cannot share registers because their lifetimes now overlap.

Compilers usually cannot use this principle for objects stored in memory. They cannot use the same memory area for different objects, even if their lifetimes do not overlap. How to create different objects sharing the same memory area will be discussed later.

Merging identical branches

The code can be made more compact by merging identical pieces of code. Example:

// Example 8.8a
double x, y, z; bool b;

if (b) {
     y = sin(x);
     z = y + 1.;
}
else {
     y = cos(x);
     z = y + 1.;
}

The compiler replaces it with

// Example 8.8b
double x, y, z; bool b;

if (b) {
     y = sin(x);
}
else {
     y = cos(x);
}
z = y + 1.;

Jump elimination

A jump can be avoided by copying the code that it jumps to. Example:

// Example 8.9a
int SomeFunction (int a, bool b) {
     if (b) {
          a = a * 2;
     }
     else {
          a = a * 3;
     }
     return a + 1;
}

This code has a jump from a = a*2; to return a+1;. The compiler can eliminate this jump by copying the return statement:

// Example 8.9b
int SomeFunction (int a, bool b) {
     if (b) {
          a = a * 2;
          return a + 1;
     }
     else {
          a = a * 3;
          return a + 1;
     }
}

If the condition can always be reduced to true or false, the branch can be eliminated:

// Example 8.10a
if (true) {
     a = b;
}
else {
     a = c;
}

Can be reduced to:

// Example 8.10b
a = b;

A branch can also be eliminated if its condition is known from a previous branch. Example:

// Example 8.11a
int SomeFunction (int a, bool b) {
     if (b) {
          a = a * 2;
     }
     else {
          a = a * 3;
     }

     if (b) {
          return a + 1;
     }
     else {
          return a - 1;
     }
}

The compiler can reduce this to:

// Example 8.11b
int SomeFunction (int a, bool b) {
     if (b) {
          a = a * 2;
          return a + 1;
     }
     else {
          a = a * 3;
          return a - 1;
     }
}

Loop unrolling

When a high degree of optimization is requested, some compilers unroll loops (see the section on loop unrolling). This can be beneficial if the loop body is very small, or if it opens opportunities for further optimization. Loops with a small repeat count can be fully unrolled to avoid the loop overhead. Example:

// Example 8.12a
int i, a[2];
for (i = 0; i < 2; i++) a[i] = i+1;

The compiler reduces this to:

// Example 8.12b
int a[2];
a[0] = 1; a[1] = 2;

Unfortunately, some compilers unroll too much. Excessive unrolling is not optimal because it takes up too much space in the code cache and overflows the loop buffer that some microprocessors have. In some cases it may be useful to turn off loop unrolling in the compiler.
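
A hedged sketch of typical ways to limit unrolling; option and pragma names vary by compiler and should be checked in its documentation:

// GCC/Clang command-line option: -fno-unroll-loops
void Fill(int a[], int n) {
#pragma clang loop unroll(disable)   // Clang: request that this loop not be unrolled
     for (int i = 0; i < n; i++) {
          a[i] = i + 1;
     }
}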

Loop-invariant code motion

If a calculation does not depend on the loop counter, it can be moved out of the loop. Example:

// Example 8.13a
int i, a[100], b;
for (i = 0; i < 100; i++) {
     a[i] = b * b + 1;
}

The compiler may change this to:

// Example 8.13b
int i, a[100], b, temp;
temp = b * b + 1;
for (i = 0; i < 100; i++) {
     a[i] = temp;
}

Induction variables

An expression that is a linear function of the loop counter can be calculated by adding a constant to its previous value. Example:

// Example 8.14a
int i, a[100];
for (i = 0; i < 100; i++) {
     a[i] = i * 9 + 3;
}

To avoid multiplication, the compiler may change it to:

// Example 8.14b
int i, a[100], temp;
temp = 3;
for (i = 0; i < 100; i++) {
     a[i] = temp;
     temp += 9;
}

Induction variables are often used to calculate the addresses of array elements. Example:

// Example 8.15a
struct S1 {double a; double b;};
S1 list[100]; int i;
for (i = 0; i < 100; i++) {
     list[i].a = 1.0;
     list[i].b = 2.0;
}

In order to access an element of list, the compiler must calculate its address. The address of list[i] equals the start address of list plus i*sizeof(S1). This is a linear function of i that can be calculated with an induction variable. The compiler can use the same induction variable for accessing list[i].a and list[i].b. It can also eliminate i as a loop counter when the final value of the induction variable can be calculated in advance. This reduces the code to:

// Example 8.15b
struct S1 {double a; double b;};
S1 list[100], *temp;
for (temp = &list[0]; temp < &list[100]; temp++) {
     temp->a = 1.0;
     temp->b = 2.0;
}

The factor sizeof(S1) = 16 is hidden behind the C++ syntax in example 8.15b. The integer representation of &list[100] is (int)(&list[100]) = (int)(&list[0]) + 100 * 16, and temp++ actually adds 16 to the integer value of temp.

The compiler does not need an induction variable to calculate the addresses of array elements of simple types, because the CPU has hardware support for addresses that can be expressed as a base address plus a constant plus an index multiplied by a factor of 1, 2, 4 or 8; other factors are not supported. If a and b in example 8.15a were float instead of double, then sizeof(S1) would be 8 and no induction variable would be needed, because the CPU has hardware support for multiplying the index by 8.

Common compilers do not create induction variables for floating-point expressions or for more complex integer expressions. How to use induction variables for calculating polynomials will be discussed later.
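
As a hedged sketch, the induction-variable idea can be applied by hand to a floating-point linear expression, which most compilers will not do themselves; note that repeated addition can accumulate a small rounding error compared with multiplying each time:

float a[100];
float value = 0.0f;              // manual induction variable for i * 0.25f
for (int i = 0; i < 100; i++) {
     a[i] = value;
     value += 0.25f;             // add the step instead of multiplying
}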

Scheduling

The compiler can rearrange instructions so that they can execute in parallel. Example:

// Example 8.16
float a, b, c, d, e, f, x, y;
x = a + b + c;
y = d + e + f;

The compiler can interleave the two calculations in this example: first calculate a+b, then d+e, then add c to the first sum, add f to the second sum, store the first result in x, and finally store the second result in y. The purpose is to help the CPU perform multiple calculations in parallel. Modern CPUs can rearrange instructions themselves without help from the compiler, but the compiler can make this rearrangement easier for the CPU.
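
A sketch of that interleaved order written as source code, purely for illustration (the temporaries t1 and t2 are introduced only for exposition):

float a, b, c, d, e, f, x, y;
float t1 = a + b;      // start the first sum
float t2 = d + e;      // start the second sum; independent of t1, so it can overlap
x = t1 + c;            // finish the first sum
y = t2 + f;            // finish the second sum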

Algebraic simplification

Most compilers can simplify simple algebraic expressions using basic algebraic rules. For example, the compiler changes the expression -(-a) to a.

I don't think programmers often write expressions like -(-a), but such expressions can arise from other optimizations such as function inlining. Reducible expressions are also quite common as a result of macro expansion.

However, programmers do often write expressions that can be reduced, either because the unreduced expression better explains the logic of the program, or because they have not thought of the algebraic reduction. For example, programmers tend to write if (!a && !b) rather than the equivalent if (!(a || b)), even though the latter has one operator fewer. Fortunately, all compilers are able to reduce this case.

You cannot expect the compiler to reduce complicated algebraic expressions. For example, few compilers can reduce (a*b*c)+(c*b*a) to a*b*c*2. Implementing the many possible algebraic rules in a compiler is quite difficult. For Boolean algebra it is possible to implement a general algorithm that can reduce any expression (for example Quine-McCluskey or Espresso), but the compilers I have tested do not seem to do so.

Compilers are better at reducing integer expressions than floating-point expressions, even though the algebraic rules are the same. This is because algebraic manipulation of floating-point expressions can have unintended effects, as the following examples demonstrate:

// Example 8.17
char a = -100, b = 100, c = 100, y;
y = a + b + c;

According to algebraic rules, you can write:

y = c + b + a;

This may be advantageous if the subexpression c+b can be reused elsewhere. Now imagine that a is a large negative number and b and c are large positive numbers, so that c+b overflows. Signed integer overflow wraps around, giving a negative value. Fortunately, the overflow in c+b is cancelled out by an underflow when a is then added, so a+b+c gives the same result as c+b+a even though the latter involves an overflow and an underflow while the former does not. (In example 8.17, assuming 8-bit wraparound: c+b = 200 wraps to -56, and -56 + (-100) wraps back to 100, which equals a+b+c = 100.) This is why algebraic manipulation of integer expressions is safe (except with the <, <=, > and >= operators).

The same conclusion does not apply to floating-point expressions. Floating-point variables do not wrap around on overflow and underflow. The range of floating-point variables is so large that we rarely have to worry about overflow and underflow except in special mathematical applications, but we do have to worry about loss of precision. Let's repeat the example above with floating-point values:

// Example 8.18
float a = -1.0E8, b = 1.0E8, c = 1.23456, y;
y = a + b + c;

The calculation here gives a+b = 0, and then 0 + 1.23456 = 1.23456. But if we change the order of the operands and add b and c first, we do not get the same result: b+c = 100000001.23456. The float type holds about 7 significant digits, so b+c is rounded to 100000000. When we then add a to this value, we get 0 instead of 1.23456.

The conclusion is that changing the order of floating-point operands risks losing precision. The compiler will not do it unless you specify an option that allows floating-point calculations to lose precision (for example -ffast-math on GCC and Clang, or /fp:fast on MSVC). Even with all relevant optimization options turned on, the compiler will not make such a seemingly obvious reduction as 0/a = 0, because the result would be wrong if a is 0, infinite, or NaN (not a number). Different compilers behave differently depending on which precision-relaxing options are enabled.

You cannot rely on the compiler to do any algebraic reductions on floating-point code. It is safer to do the reductions manually.
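
A hedged sketch of such a manual reduction (the function and variable names are invented here): repeated division by a loop-invariant divisor is replaced with multiplication by its reciprocal. A compiler will normally not do this by itself because it can change the result slightly; doing it by hand makes that small loss of precision an explicit choice.

void Scale(float a[], int n, float divisor) {
     float reciprocal = 1.0f / divisor;    // computed once, outside the loop
     for (int i = 0; i < n; i++) {
          a[i] *= reciprocal;              // instead of a[i] /= divisor
     }
}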

Devirtualization

An optimizing compiler can bypass the virtual table lookup for a virtual function call if it knows which version of the virtual function is needed. Example:

// Example 8.19. Devirtualization
class C0 {
public:
     virtual void f();
};

class C1 : public C0 {
public:
     virtual void f();
};

void g() {
     C1 obj1;
     C0 * p = & obj1;
     p->f(); // Virtual call to C1::f
}

Without optimization, the program needs to look in the virtual table to see whether the call p->f() should go to C0::f or C1::f. An optimizing compiler, however, will see that p always points to an object of class C1, so it can call C1::f directly without using the virtual table. Unfortunately, only a few compilers perform this optimization.
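
A related hedged sketch: declaring an overriding function final tells the compiler that no further override can exist, which can make devirtualization possible even when it cannot trace the pointer back to a concrete object (the class C2 and function h are invented for this example):

class C2 : public C0 {
public:
     void f() final;              // no class derived from C2 can override f
};

void h(C2 * p) {
     p->f();                      // even without knowing the object, the call must go to C2::f
}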
