Is it polymorphic to call virtual functions in constructors and destructors?

In object-oriented programming languages, the constructor and destructor of a class are two special member functions, used to create and destroy objects. Because they are tied to the object's lifetime, the compiler treats them differently from ordinary member functions: when compiling them, it inserts additional code to handle special tasks. This article looks at one such case, virtual function calls made from constructors and destructors, and at the mechanism behind it.

As we know, C++ provides virtual functions to support object-oriented polymorphism. In a typical implementation, each class with virtual functions has a virtual function table (vtbl) that stores the entry address of each of its virtual functions. When an object is created, it is given a virtual function pointer (vptr) that points to its class's vtbl. When a virtual function is called through a reference or pointer of base class type, the program follows the vptr to the vtbl, fetches the function pointer, and calls it. Although the declared type of the reference is the base class, the function of the actual (derived) type is what gets called. Put differently, when a base class member function calls a virtual function through the this pointer of a derived class object, the vptr of that object points to the derived class's vtbl, so the derived class's version is selected at run time. Because the selection happens while the program runs, it is called dynamic binding.

When an ordinary member function calls another virtual function of its class, the call is dynamically bound. Does the same hold when the call is made from the two special functions, the constructor and the destructor?

An example

Here is a small program to test:

#include <cstdio> // puts

// Base class with a virtual destructor and one virtual function, func1
class A {
public:
	A() {
		puts("A::A()");
		func1(); //Calling virtual function in constructor
	}

	virtual ~A() {
		puts("A::~A()");
		func1(); // Calling virtual functions in destructor 
	}
	
	// Ordinary function call virtual function 
	void foo() {
		puts("A::foo()");
		func1();
	}

	virtual void func1() {
		puts("A::func1() virtual");
	}
};

// Derived class 
class B : public A {
public:
	B() {
		puts("B::B()");
	}

	~B() {
		puts("B::~B()");
	}

	virtual void func1() override {
		puts("B::func1() virtual");
	}
};

Two classes are defined: A is the base class, which defines a virtual function func1, and B is a derived class of A, which overrides it. Both the constructor and the destructor of class A call func1.

The test function makes a pointer of declared type A * point to an object whose actual type is B; that is, the pointer is declared as a pointer to the base class but actually points to a derived class object.

// Test function 
void test() {
	puts("construct B object"); 
	A *p = new B(); // p points to class B objects
	puts("invoke virtual function via base class pointer");
	p->func1();
	puts("destruct B object");
	delete p;
	puts("end.");
}

The log output by the program is as follows:

construct B object
A::A() // First call the constructor of base class A of derived class B
A::func1() virtual // The base class still calls its own virtual function
B::B()
invoke virtual function via base class pointer
B::func1() virtual // A virtual function of a derived class object is called
destruct B object
B::~B() // The derived class B part is destroyed first
A::~A() // Then the base class A part is destroyed
A::func1() virtual // The base class still calls its own virtual function
end.

The log shows that even though the object being created or destroyed is actually a B, the virtual function called from the constructor and destructor of A is always A's own version, never B's override. In other words, polymorphism does not take effect for these calls.

Cause analysis

Why? The answer lies in the order in which the parts of a derived class object are constructed and destroyed in C++.

When a derived class object is constructed, the base class part is constructed first and the derived class part afterwards. While the base class constructor runs, the derived class part has therefore not been initialized yet. For a polymorphic class, each object also stores a vptr that points to the class's vtbl, and this vptr is set as part of construction. At this point it has not yet been set to the derived class's vtbl, and even if it had been, a derived class override could read data members that have not been initialized. The derived version therefore cannot safely be called during base class construction, and C++ selects the version that belongs to the class whose constructor is currently running, here the base class.

Similarly, when a derived class object is destroyed, the derived class part is destroyed first and the base class part afterwards. By the time the base class destructor is called, the derived class part has already been destroyed, so a derived class override could touch members that no longer exist. In this case too, the base class version of the virtual function is selected.

Strictly speaking, these are the base class part and the derived class part of a single object: creating a derived class object creates only one object. For convenience, the two parts are often loosely called the base class object and the subclass object.

Mechanism behind

So how does the compiler implement this? It turns out that when compiling constructors and destructors, the compiler quietly inserts some code at the beginning of the function. This code sets the object's vptr to point to the vtbl of the class the constructor or destructor belongs to. Any virtual call made after that and dispatched through the vptr therefore reaches that class's own virtual functions, until the vptr is updated again. When the derived class constructor continues after the base class part has been constructed, it sets the vptr to point to the derived class's vtbl, so later virtual calls are bound to the derived class's functions.

Let's look at the assembly code generated by the compiler to see what it actually does when compiling constructors and destructors.

1. Assembly code of constructor of base class A:

   0x0000000000422370 <+0>:	push   rbp
   0x0000000000422371 <+1>:	mov    rbp,rsp
   0x0000000000422374 <+4>:	sub    rsp,0x20
   0x0000000000422378 <+8>:	mov    QWORD PTR [rbp+0x10],rcx // rcx holds the this pointer
   0x000000000042237c <+12>:	mov    rax,QWORD PTR [rbp+0x10] // rax = this, the address where the vptr is stored
   0x0000000000422380 <+16>:	lea    rdx,[rip+0x6e269]        # 0x4905f0 <_ZTV1A+16> // rdx = entry address of A's vtbl: 0x4905f0
   0x0000000000422387 <+23>:	mov    QWORD PTR [rax],rdx // Set the vptr to point to the vtbl of this class
=> 0x000000000042238a <+26>:	lea    rcx,[rip+0x65c70]        # 0x488001 <_ZStL19piecewise_construct+1>
   0x0000000000422391 <+33>:	call   0x419d08 <puts>
   0x0000000000422396 <+38>:	mov    rcx,QWORD PTR [rbp+0x10]
   0x000000000042239a <+42>:	call   0x4222d0 <A::func1()>
   0x000000000042239f <+47>:	nop
   0x00000000004223a0 <+48>:	add    rsp,0x20
   0x00000000004223a4 <+52>:	pop    rbp
   0x00000000004223a5 <+53>:	ret    

As can be seen, in the constructor the compiler first adds code that initializes the object's vptr (lines 4-7 of the assembly code), followed by the code generated from the function body.

2. Assembly code of constructor of derived class B:

   0x00000000004224c0 <+0>:	push   rbp
   0x00000000004224c1 <+1>:	push   rbx
   0x00000000004224c2 <+2>:	sub    rsp,0x28
   0x00000000004224c6 <+6>:	lea    rbp,[rsp+0x80]
   0x00000000004224ce <+14>:	mov    QWORD PTR [rbp-0x40],rcx // rcx holds the this pointer
   0x00000000004224d2 <+18>:	mov    rax,QWORD PTR [rbp-0x40] // rax = this, the address where the vptr is stored
   0x00000000004224d6 <+22>:	mov    rcx,rax
   0x00000000004224d9 <+25>:	call   0x422370 <A::A()> // Call the constructor of the base class
   0x00000000004224de <+30>:	mov    rax,QWORD PTR [rbp-0x40]
   0x00000000004224e2 <+34>:	lea    rdx,[rip+0x6e147]        # 0x490630 <_ZTV1B+16> // rdx = entry address of B's vtbl: 0x490630
   0x00000000004224e9 <+41>:	mov    QWORD PTR [rax],rdx // Set the vptr to point to the vtbl of this class
=> 0x00000000004224ec <+44>:	lea    rcx,[rip+0x65b4c]        # 0x48803f <_ZStL19piecewise_construct+63>
   0x00000000004224f3 <+51>:	call   0x419d08 <puts>
   0x00000000004224f8 <+56>:	jmp    0x422515 <B::B()+85>
   0x00000000004224fa <+58>:	mov    rbx,rax
   0x00000000004224fd <+61>:	mov    rax,QWORD PTR [rbp-0x40]
   0x0000000000422501 <+65>:	mov    rcx,rax
   0x0000000000422504 <+68>:	call   0x422430 <A::~A()>
   0x0000000000422509 <+73>:	mov    rax,rbx
   0x000000000042250c <+76>:	mov    rcx,rax
   0x000000000042250f <+79>:	call   0x40f5d0 <_Unwind_Resume>
   0x0000000000422514 <+84>:	nop
   0x0000000000422515 <+85>:	add    rsp,0x28
   0x0000000000422519 <+89>:	pop    rbx
   0x000000000042251a <+90>:	pop    rbp
   0x000000000042251b <+91>:	ret   

As can be seen, because B inherits from A, the constructor of class B first calls the constructor of base class A (line 8). After A's constructor returns, it updates the vptr to point to the vtbl of class B (lines 9-11 of the assembly code, using the this pointer saved in line 5), and finally runs the code of the function body.

3. Assembly code of destructor of base class A:

   0x0000000000422430 <+0>:	push   rbp
   0x0000000000422431 <+1>:	mov    rbp,rsp
   0x0000000000422434 <+4>:	sub    rsp,0x20
   0x0000000000422438 <+8>:	mov    QWORD PTR [rbp+0x10],rcx // rcx holds the this pointer
   0x000000000042243c <+12>:	mov    rax,QWORD PTR [rbp+0x10] // rax = this, the address where the vptr is stored
   0x0000000000422440 <+16>:	lea    rdx,[rip+0x6e1a9]        # 0x4905f0 <_ZTV1A+16> // rdx = entry address of A's vtbl: 0x4905f0
   0x0000000000422447 <+23>:	mov    QWORD PTR [rax],rdx // Set the vptr to point to the vtbl of this class
=> 0x000000000042244a <+26>:	lea    rcx,[rip+0x65bb7]        # 0x488008 <_ZStL19piecewise_construct+8>
   0x0000000000422451 <+33>:	call   0x419d08 <puts>
   0x0000000000422456 <+38>:	mov    rcx,QWORD PTR [rbp+0x10]
   0x000000000042245a <+42>:	call   0x4222d0 <A::func1()>
   0x000000000042245f <+47>:	mov    eax,0x0
   0x0000000000422464 <+52>:	test   eax,eax
   0x0000000000422466 <+54>:	je     0x422472 <A::~A()+66>
   0x0000000000422468 <+56>:	mov    rcx,QWORD PTR [rbp+0x10]
   0x000000000042246c <+60>:	call   0x470cf0 <_ZdlPv>
   0x0000000000422471 <+65>:	nop
   0x0000000000422472 <+66>:	add    rsp,0x20
   0x0000000000422476 <+70>:	pop    rbp
   0x0000000000422477 <+71>:	ret  

Similarly, on entering the destructor, the compiler-generated code first resets the object's vptr to the vtbl of this class (lines 4-7 of the assembly code), and then runs the code of the destructor body.

4. Assembly code of destructor of derived class B:

   0x0000000000422550 <+0>:	push   rbp
   0x0000000000422551 <+1>:	mov    rbp,rsp
   0x0000000000422554 <+4>:	sub    rsp,0x20
   0x0000000000422558 <+8>:	mov    QWORD PTR [rbp+0x10],rcx // rcx holds the this pointer
   0x000000000042255c <+12>:	mov    rax,QWORD PTR [rbp+0x10]  // rax = this, the address where the vptr is stored
   0x0000000000422560 <+16>:	lea    rdx,[rip+0x6e0c9]        # 0x490630 <_ZTV1B+16> // rdx = entry address of B's vtbl: 0x490630
   0x0000000000422567 <+23>:	mov    QWORD PTR [rax],rdx // Set the vptr to point to the vtbl of this class
=> 0x000000000042256a <+26>:	lea    rcx,[rip+0x65ad5]        # 0x488046 <_ZStL19piecewise_construct+70>
   0x0000000000422571 <+33>:	call   0x419d08 <puts>
   0x0000000000422576 <+38>:	mov    rax,QWORD PTR [rbp+0x10]
   0x000000000042257a <+42>:	mov    rcx,rax
   0x000000000042257d <+45>:	call   0x422430 <A::~A()>
   0x0000000000422582 <+50>:	mov    eax,0x0
   0x0000000000422587 <+55>:	test   eax,eax
   0x0000000000422589 <+57>:	je     0x422595 <B::~B()+69>
   0x000000000042258b <+59>:	mov    rcx,QWORD PTR [rbp+0x10]
   0x000000000042258f <+63>:	call   0x470cf0 <_ZdlPv>
   0x0000000000422594 <+68>:	nop
   0x0000000000422595 <+69>:	add    rsp,0x20
   0x0000000000422599 <+73>:	pop    rbp
   0x000000000042259a <+74>:	ret    

As can be seen, the destructor of class B first sets the vptr to point to the vtbl of class B (lines 4-7 of the assembly code), runs its body, and then calls the destructor of base class A (line 12). The destructor of class A in turn sets the vptr to point to the vtbl of class A.

Based on the above analysis, the same result holds when a constructor or destructor calls a virtual function indirectly rather than directly. For example, suppose the constructor calls a non-virtual function, such as foo in the test code, and that function calls a virtual function. The ordinary function still ends up calling the virtual function of the class whose constructor is running, because the vptr points to that class's vtbl at that moment, so polymorphism still does not take effect.

Static binding

Although the constructor and destructor call a virtual function, the compiler in this example does not use dynamic binding for those calls but static binding. Because the constructor sets the vptr to point to its own class's vtbl, a virtual function bound through the vptr would be the one from the same class as the constructor, and the compiler already knows its entry address at compile time. Dynamic binding would produce the correct call, but the compiler can save the cost of the extra pointer indirection by calling the function directly, so dynamic binding is unnecessary. The same applies to virtual function calls in destructors. For example, in the instruction "call 0x4222d0 <A::func1()>" in the assembly code above, 0x4222d0 is the entry address of the virtual function func1 of base class A. The call target is encoded directly in the instruction, which shows that the call is statically bound.

However, if the constructor or destructor calls the virtual function indirectly, the call is still dynamically bound, even though polymorphism does not take effect. In the example above, suppose the func1() call in the constructor of A is replaced by a call to foo(), which in turn calls the virtual function func1. Now func1 is dynamically bound through the vptr (at least in an unoptimized build like the one shown here), although the version selected is still that of class A.

Conclusion

In short, in C++, when a virtual function is called from a constructor or destructor, polymorphism does not take effect: the version selected is the one belonging to the class whose constructor or destructor is running. If the virtual function is called directly, the compiler can bind the call statically; if it is called indirectly through another member function, the call is bound dynamically.

Java language
However, not all object-oriented languages behave this way, and Java is one that does not. The following Java code calls the overridable method foo in the constructor of the parent class Base, and foo is overridden by the subclass Derived. In main, constructing the Derived object first runs the constructor of its parent class Base. The call to foo inside that constructor is dynamically bound, so Derived's foo is the one that runs:

public class BaseInvokeVirtual {
    static class Base {
        String x;

        public Base(String s) {
            x = s;
            foo();
        }

        protected void foo() {
            System.out.println("Base::foo");
        }
    }

    static class Derived extends Base {
        String m;

        public Derived(String s) {
            super(s);
            m = s;
        }

        @Override
        protected void foo() {
            System.out.println("Derived::foo->" + m);
        }
    }

    static public void main(String[] args) {
        Base b = new Derived("1234");
    }
}

If you run this code, the following log information will be output:

Derived::foo->null

As can be seen, when the method foo is called in the constructor of Base, the Derived part of the object has not been initialized yet, so its String field m is still null. If foo called a method on m, a NullPointerException would be thrown. Java programmers are therefore better off not calling overridable methods from constructors, to avoid such surprises.

Tags: Java C++

Posted on Wed, 10 Nov 2021 13:34:21 -0500 by Bmen