In object-oriented programming languages, the constructor and destructor of a class are two special member functions, used to create and destroy objects. Because they are tied to an object's lifetime, the compiler treats them differently from ordinary member functions and inserts additional code into them when compiling. This article looks at one such case, virtual function calls made from constructors and destructors, and the mechanism behind it.
To support polymorphism, C++ provides virtual functions. In typical implementations, each class with virtual functions has a virtual function table, vtbl, which stores the entry address of each of its virtual functions. When an object is created, it is given a virtual function pointer, vptr, which points to the vtbl of its class. When a virtual function is called through a reference of base class type, the function pointer is looked up in the table that vptr points to, and the call goes through that pointer. So although the declared type of the reference is the base class, the function of the referenced subclass is the one actually called. In other words, when a base class virtual function is called through the this pointer of a subclass object, the real type of this is the subclass, its vptr points to the subclass's vtbl, and the subclass's virtual function is selected at run time. Because the selection happens while the program runs, it is called dynamic binding.
When a member function of a class calls another virtual function of that class, the call is dynamically bound. Is the same true when the call is made from one of the two special functions, the constructor or the destructor?
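The lookup described above can be modelled by hand. The following is a minimal sketch, not what any particular compiler emits: the names Object, VTable, call_name and the two dispatch helpers are hypothetical and exist only for illustration.

```cpp
#include <string>

struct Object;  // forward declaration so the table can mention it

// One table per "class": one slot for each virtual function.
struct VTable {
    std::string (*name)(const Object*);
};

// Every object carries a pointer to the table of its class.
struct Object {
    const VTable* vptr;
};

std::string base_name(const Object*)    { return "Base::name"; }
std::string derived_name(const Object*) { return "Derived::name"; }

const VTable base_vtbl{&base_name};
const VTable derived_vtbl{&derived_name};

// Roughly what a virtual call through a reference becomes:
// load vptr, load the slot, call through the function pointer.
std::string call_name(const Object& ref) {
    return ref.vptr->name(&ref);
}

// The same call site reaches different functions depending on vptr.
std::string dispatch_base()    { Object o{&base_vtbl};    return call_name(o); }
std::string dispatch_derived() { Object o{&derived_vtbl}; return call_name(o); }
```

Calling dispatch_base() and dispatch_derived() goes through the identical call_name code, and only the table that vptr points to decides which function runs.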
An example
Here is a small program to test:
#include <cstdio>

// Base class with two virtual functions (the destructor and func1)
class A {
public:
    A() {
        puts("A::A()");
        func1();  // virtual function called in the constructor
    }
    virtual ~A() {
        puts("A::~A()");
        func1();  // virtual function called in the destructor
    }
    // Ordinary member function that calls a virtual function
    void foo() {
        puts("A::foo()");
        func1();
    }
    virtual void func1() { puts("A::func1() virtual"); }
};

// Derived class
class B : public A {
public:
    B() { puts("B::B()"); }
    ~B() { puts("B::~B()"); }
    virtual void func1() override { puts("B::func1() virtual"); }
};
Two classes are defined: A is the base class, which defines the virtual function func1, and B is a derived class of A, which overrides it. func1 is called in both the constructor and the destructor of class A.
The test function makes a pointer with declared type A * point to an object whose actual type is B. That is, the declared type of the pointer is the base class, but it actually points to a derived class object.
// Test function
void test() {
    puts("construct B object");
    A *p = new B();  // p points to an object of class B
    puts("invoke virtual function via base class pointer");
    p->func1();
    puts("destruct B object");
    delete p;
    puts("end.");
}
The log output by the program is as follows:
construct B object
A::A()                  // The constructor of base class A runs first
A::func1() virtual      // The base class calls its own virtual function
B::B()
invoke virtual function via base class pointer
B::func1() virtual      // The virtual function of the derived class is called
destruct B object
B::~B()                 // The derived class B part is destructed first
A::~A()                 // Then the base class A part is destructed
A::func1() virtual      // The base class again calls its own virtual function
end.
The log shows that even when the pointer's declared type is the base class and the object it points to is a subclass object, a virtual function called inside the base class constructor or destructor is not dispatched polymorphically: the version belonging to the base class itself is called. That is, the call is not bound to the derived class's override.
Cause analysis
Why? The answer lies in the order in which derived class objects are constructed and destroyed in C++.
When an object of a derived class is constructed, the base class part is constructed first and the derived part afterwards, so while the base class constructor runs, the derived part has not yet been initialized. For polymorphic classes, each object also stores a virtual function pointer vptr that points to the virtual function table of its class, and this vptr is set during construction. While the base class constructor is running, the vptr has not yet been set to the derived class's table, so the derived override cannot be found through it. Even if it could, the override would operate on derived members that have not been constructed. Polymorphic dispatch to the derived class is therefore impossible at this point, and the call resolves to the base class version of the virtual function.
Similarly, destruction proceeds in the reverse order: the derived part is destroyed first, then the base class part. By the time the base class destructor runs, the derived part has already been destroyed and the vptr can no longer be used to reach the derived class's table, so again only the base class version of the virtual function can be selected.
Strictly speaking, these are the base class part and the derived class part of a single object: creating a derived class object creates only one object. For convenience, the two parts are often loosely called the base class object and the subclass object.
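One way to observe this from inside the language is to ask for the dynamic type of *this at each stage. The sketch below is only an illustration under assumed names (Base, Derived, seen_as_base and run_probe are hypothetical); it records whether typeid(*this) equals typeid(Base) in the constructor, in an ordinary member function, and in the destructor.

```cpp
#include <typeinfo>
#include <vector>

// Hypothetical log: one entry per stage, true if *this is seen as a Base.
static std::vector<bool> seen_as_base;

struct Base {
    Base()          { seen_as_base.push_back(typeid(*this) == typeid(Base)); }
    virtual ~Base() { seen_as_base.push_back(typeid(*this) == typeid(Base)); }
    void probe()    { seen_as_base.push_back(typeid(*this) == typeid(Base)); }
};

struct Derived : Base {};

std::vector<bool> run_probe() {
    seen_as_base.clear();
    {
        Derived d;   // Base::Base() runs first, before the Derived part exists
        d.probe();   // called on the fully constructed object
    }                // Base::~Base() runs last, after the Derived part is gone
    return seen_as_base;
}
```

run_probe() yields true, false, true: the object is treated as a Base while the base constructor and destructor run, and as a Derived only in between.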
Mechanism behind
So how does the compiler implement this? It turns out that when compiling constructors and destructors, the compiler quietly inserts some code at the beginning of the function. This code sets the vptr of the object to point to the virtual function table vtbl of the class being compiled. Any later virtual call that is bound dynamically goes through this vptr, so it reaches that class's own virtual functions until the vptr is updated. When the derived class constructor runs after the base class part has been constructed, it in turn sets the vptr to point to the vtbl of the derived class. From then on, virtual calls are bound to the derived class's virtual functions.
Let's look at the assembly generated by the compiler to see what it does when compiling constructors and destructors.
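The sequence of vptr stores can be imitated with plain structs and functions. This is a sketch under stated assumptions, not compiler output: Obj, VTable, A_ctor, B_ctor and construct_b_then_call are hypothetical names, and the call inside A_ctor deliberately goes through the table, whereas a real compiler may call the function directly.

```cpp
#include <string>
#include <vector>

struct Obj;
struct VTable { void (*func1)(Obj*); };

struct Obj {
    const VTable* vptr = nullptr;
    std::vector<std::string> log;   // hypothetical record of which func1 ran
};

void A_func1(Obj* self) { self->log.push_back("A::func1"); }
void B_func1(Obj* self) { self->log.push_back("B::func1"); }

const VTable A_vtbl{&A_func1};
const VTable B_vtbl{&B_func1};

// Models A::A(): the inserted vptr store comes first, then the body.
void A_ctor(Obj* self) {
    self->vptr = &A_vtbl;       // the store the compiler inserts
    self->vptr->func1(self);    // body: func1() resolves through A's table
}

// Models B::B(): base constructor first, then the vptr store, then the body.
void B_ctor(Obj* self) {
    A_ctor(self);
    self->vptr = &B_vtbl;       // from here on, calls reach B's override
}

std::vector<std::string> construct_b_then_call() {
    Obj o;
    B_ctor(&o);
    o.vptr->func1(&o);          // ordinary virtual call after construction
    return o.log;
}
```

construct_b_then_call() returns "A::func1" followed by "B::func1": the first call happened while vptr still pointed at A's table, the second after B_ctor had overwritten it.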
1. Assembly code of constructor of base class A:
   0x0000000000422370 <+0>:   push rbp
   0x0000000000422371 <+1>:   mov rbp,rsp
   0x0000000000422374 <+4>:   sub rsp,0x20
   0x0000000000422378 <+8>:   mov QWORD PTR [rbp+0x10],rcx   // rcx holds the this pointer
   0x000000000042237c <+12>:  mov rax,QWORD PTR [rbp+0x10]   // rax = this; the vptr is stored at [rax]
   0x0000000000422380 <+16>:  lea rdx,[rip+0x6e269]   # 0x4905f0 <_ZTV1A+16>   // rdx = entry address of the vtbl: 0x4905f0
   0x0000000000422387 <+23>:  mov QWORD PTR [rax],rdx        // set vptr to point to the vtbl of this class
=> 0x000000000042238a <+26>:  lea rcx,[rip+0x65c70]   # 0x488001 <_ZStL19piecewise_construct+1>
   0x0000000000422391 <+33>:  call 0x419d08 <puts>
   0x0000000000422396 <+38>:  mov rcx,QWORD PTR [rbp+0x10]
   0x000000000042239a <+42>:  call 0x4222d0 <A::func1()>
   0x000000000042239f <+47>:  nop
   0x00000000004223a0 <+48>:  add rsp,0x20
   0x00000000004223a4 <+52>:  pop rbp
   0x00000000004223a5 <+53>:  ret
In the constructor, the compiler first adds code that initializes the vptr of the object (lines 4-7 of the assembly code), followed by the code generated for the function body.
2. Assembly code of constructor of derived class B:
   0x00000000004224c0 <+0>:   push rbp
   0x00000000004224c1 <+1>:   push rbx
   0x00000000004224c2 <+2>:   sub rsp,0x28
   0x00000000004224c6 <+6>:   lea rbp,[rsp+0x80]
   0x00000000004224ce <+14>:  mov QWORD PTR [rbp-0x40],rcx   // rcx holds the this pointer
   0x00000000004224d2 <+18>:  mov rax,QWORD PTR [rbp-0x40]   // rax = this; the vptr is stored at [rax]
   0x00000000004224d6 <+22>:  mov rcx,rax
   0x00000000004224d9 <+25>:  call 0x422370 <A::A()>         // call the constructor of the base class
   0x00000000004224de <+30>:  mov rax,QWORD PTR [rbp-0x40]
   0x00000000004224e2 <+34>:  lea rdx,[rip+0x6e147]   # 0x490630 <_ZTV1B+16>   // rdx = entry address of the vtbl: 0x490630
   0x00000000004224e9 <+41>:  mov QWORD PTR [rax],rdx        // set vptr to point to the vtbl of this class
=> 0x00000000004224ec <+44>:  lea rcx,[rip+0x65b4c]   # 0x48803f <_ZStL19piecewise_construct+63>
   0x00000000004224f3 <+51>:  call 0x419d08 <puts>
   0x00000000004224f8 <+56>:  jmp 0x422515 <B::B()+85>
   0x00000000004224fa <+58>:  mov rbx,rax
   0x00000000004224fd <+61>:  mov rax,QWORD PTR [rbp-0x40]
   0x0000000000422501 <+65>:  mov rcx,rax
   0x0000000000422504 <+68>:  call 0x422430 <A::~A()>
   0x0000000000422509 <+73>:  mov rax,rbx
   0x000000000042250c <+76>:  mov rcx,rax
   0x000000000042250f <+79>:  call 0x40f5d0 <_Unwind_Resume>
   0x0000000000422514 <+84>:  nop
   0x0000000000422515 <+85>:  add rsp,0x28
   0x0000000000422519 <+89>:  pop rbx
   0x000000000042251a <+90>:  pop rbp
   0x000000000042251b <+91>:  ret
In the constructor of class B, because B inherits from A, the constructor of base class A is called first (line 8). After A's constructor returns, vptr is updated to point to the virtual function table vtbl of class B (lines 9-11, using the this pointer saved at lines 5-6), and finally the code of the function body runs.
3. Assembly code of destructor of base class A:
   0x0000000000422430 <+0>:   push rbp
   0x0000000000422431 <+1>:   mov rbp,rsp
   0x0000000000422434 <+4>:   sub rsp,0x20
   0x0000000000422438 <+8>:   mov QWORD PTR [rbp+0x10],rcx   // rcx holds the this pointer
   0x000000000042243c <+12>:  mov rax,QWORD PTR [rbp+0x10]   // rax = this; the vptr is stored at [rax]
   0x0000000000422440 <+16>:  lea rdx,[rip+0x6e1a9]   # 0x4905f0 <_ZTV1A+16>   // rdx = entry address of the vtbl: 0x4905f0
   0x0000000000422447 <+23>:  mov QWORD PTR [rax],rdx        // set vptr to point to the vtbl of this class
=> 0x000000000042244a <+26>:  lea rcx,[rip+0x65bb7]   # 0x488008 <_ZStL19piecewise_construct+8>
   0x0000000000422451 <+33>:  call 0x419d08 <puts>
   0x0000000000422456 <+38>:  mov rcx,QWORD PTR [rbp+0x10]
   0x000000000042245a <+42>:  call 0x4222d0 <A::func1()>
   0x000000000042245f <+47>:  mov eax,0x0
   0x0000000000422464 <+52>:  test eax,eax
   0x0000000000422466 <+54>:  je 0x422472 <A::~A()+66>
   0x0000000000422468 <+56>:  mov rcx,QWORD PTR [rbp+0x10]
   0x000000000042246c <+60>:  call 0x470cf0 <_ZdlPv>
   0x0000000000422471 <+65>:  nop
   0x0000000000422472 <+66>:  add rsp,0x20
   0x0000000000422476 <+70>:  pop rbp
   0x0000000000422477 <+71>:  ret
Similarly, on entry to the destructor, the compiler first sets the vptr of the object (lines 4-7 of the assembly code), and then executes the code of the destructor body.
4. Assembly code of destructor of derived class B:
   0x0000000000422550 <+0>:   push rbp
   0x0000000000422551 <+1>:   mov rbp,rsp
   0x0000000000422554 <+4>:   sub rsp,0x20
   0x0000000000422558 <+8>:   mov QWORD PTR [rbp+0x10],rcx   // rcx holds the this pointer
   0x000000000042255c <+12>:  mov rax,QWORD PTR [rbp+0x10]   // rax = this; the vptr is stored at [rax]
   0x0000000000422560 <+16>:  lea rdx,[rip+0x6e0c9]   # 0x490630 <_ZTV1B+16>   // rdx = entry address of the vtbl: 0x490630
   0x0000000000422567 <+23>:  mov QWORD PTR [rax],rdx        // set vptr to point to the vtbl of this class
=> 0x000000000042256a <+26>:  lea rcx,[rip+0x65ad5]   # 0x488046 <_ZStL19piecewise_construct+70>
   0x0000000000422571 <+33>:  call 0x419d08 <puts>
   0x0000000000422576 <+38>:  mov rax,QWORD PTR [rbp+0x10]
   0x000000000042257a <+42>:  mov rcx,rax
   0x000000000042257d <+45>:  call 0x422430 <A::~A()>
   0x0000000000422582 <+50>:  mov eax,0x0
   0x0000000000422587 <+55>:  test eax,eax
   0x0000000000422589 <+57>:  je 0x422595 <B::~B()+69>
   0x000000000042258b <+59>:  mov rcx,QWORD PTR [rbp+0x10]
   0x000000000042258f <+63>:  call 0x470cf0 <_ZdlPv>
   0x0000000000422594 <+68>:  nop
   0x0000000000422595 <+69>:  add rsp,0x20
   0x0000000000422599 <+73>:  pop rbp
   0x000000000042259a <+74>:  ret
In the destructor of class B, vptr is first set to point to the vtbl of class B (lines 4-7 of the assembly code), and then the destructor of base class A is called (line 12). Inside the destructor of class A, vptr is set again, this time to the vtbl of class A.
Based on the above analysis, the same conclusion holds when the virtual function is called from a constructor or destructor indirectly rather than directly. For example, suppose the constructor calls a non-virtual function, such as foo in the test code, which in turn calls a virtual function. The virtual function that runs is still the version of the class whose constructor is executing, because vptr points to that class's vtbl at that moment, so there is still no polymorphic dispatch to the derived class.
Static binding
Although constructors and destructors can call virtual functions, the C++ compiler does not bind direct calls dynamically. It binds them statically. Because the constructor has set vptr to the virtual function table of its own class, a call dispatched through vptr would reach the function of that same class anyway, and the compiler already knows that function's entry address at compile time. Dynamic binding would produce the correct call, but it is unnecessary: binding statically saves one level of pointer indirection and reduces the cost of the call. The same applies to virtual functions called in destructors. For example, in the instruction "call 0x4222d0 <A::func1()>" in the assembly code, 0x4222d0 is the entry address of the virtual function func1 of base class A. The call targets a fixed address directly, which shows that it is statically bound.
However, if the constructor or destructor calls the virtual function indirectly, the virtual function is still dynamically bound, even though there is no polymorphic dispatch to the derived class. In the example above, suppose the statement calling func1() in the constructor of A is replaced by a call to foo(), and foo calls the virtual function func1. In that case func1 is dynamically bound, although the version selected is still that of class A.
Conclusion
In short, when a virtual function is called in a C++ constructor or destructor, there is no polymorphic dispatch to the derived class: the version selected is the one belonging to the class whose constructor or destructor is running. A direct call is bound statically, while an indirect call is bound dynamically.
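The indirect case can be checked with a small variation of the test program. This is a sketch rather than the author's code: the global calls log and the function run_indirect are hypothetical additions so the sequence can be inspected, and the classes are trimmed to the parts that matter.

```cpp
#include <string>
#include <vector>

static std::vector<std::string> calls;   // hypothetical log of which func1 ran

struct A {
    A()          { foo(); }              // indirect: constructor -> foo -> func1
    virtual ~A() { foo(); }              // indirect: destructor -> foo -> func1
    void foo()   { func1(); }            // compiled once, dispatches through the vptr
    virtual void func1() { calls.push_back("A::func1"); }
};

struct B : A {
    void func1() override { calls.push_back("B::func1"); }
};

std::vector<std::string> run_indirect() {
    calls.clear();
    {
        B b;        // A::A() -> foo() -> A::func1
        b.foo();    // complete object -> B::func1
    }               // A::~A() -> foo() -> A::func1
    return calls;
}
```

run_indirect() returns "A::func1", "B::func1", "A::func1". The same foo body reaches different versions of func1 depending on when it is called, which is consistent with a dynamically bound call whose target follows the current vptr.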
Java language
However, not all object-oriented languages behave this way, and Java is one that does not. The following Java code calls the virtual method foo in the constructor of the parent class Base, and foo is overridden by the subclass Derived. In main, constructing the Derived object first runs the constructor of its parent class Base. The call to foo inside that constructor is bound dynamically, so the Derived version of foo is called:
public class BaseInvokeVirtual {
    static class Base {
        String x;
        public Base(String s) {
            x = s;
            foo();
        }
        protected void foo() {
            System.out.println("Base::foo");
        }
    }

    static class Derived extends Base {
        String m;
        public Derived(String s) {
            super(s);
            m = s;
        }
        @Override
        protected void foo() {
            System.out.println("Derived::foo->" + m);
        }
    }

    static public void main(String[] args) {
        Base b = new Derived("1234");
    }
}
If you run this code, the following log information will be output:
Derived::foo->null
When the virtual method foo is called in the constructor of Base, the Derived part of the object has not been initialized yet, so its String field m is still null. If foo called a method on m, a NullPointerException would be thrown. Java programmers are therefore better off not calling overridable methods from constructors, so as to avoid such accidents.
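For comparison, here is a rough C++ counterpart of the Java program, written as a sketch. The seen member and the function foo_seen_during_construction are hypothetical additions that capture what foo reported instead of printing it.

```cpp
#include <string>
#include <utility>

struct Base {
    std::string x;
    std::string seen;   // hypothetical: records which foo ran during construction
    explicit Base(std::string s) : x(std::move(s)) { foo(); }
    virtual ~Base() = default;
    virtual void foo() { seen = "Base::foo"; }
};

struct Derived : Base {
    std::string m;
    // The base part is initialized first, then m.
    explicit Derived(std::string s) : Base(s), m(std::move(s)) {}
    void foo() override { seen = "Derived::foo->" + m; }
};

std::string foo_seen_during_construction() {
    Derived d("1234");
    return d.seen;
}
```

Here foo_seen_during_construction() returns "Base::foo": the C++ version never runs Derived::foo while m is still unconstructed, whereas the Java version does run the override and sees m as null.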