A memory fragmentation analysis of an E-Commerce Oriented crawler on. NET

1: Background

1. Tell a story

Last month, a friend wx came to me and said that there was a memory leak in his program. How to solve it? As shown in the figure below:

As can be seen from the screenshot, this friend is still familiar with the operation of windbg. He may lack some practical experience, so he doesn't know how to troubleshoot after using several commands.

Now that you have found me, continue to analyze and find the culprit on his dump based on my personal experience. Don't talk much. Talk to windbg.

2: Windbg analysis

1. Is there a memory leak?

Friends who pursue this series should know that I use it countless times! address -summary and! eeheap -gc these two commands to determine whether the current memory leak belongs to the managed layer or unmanaged layer.

0:000> !address -summary

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                    358     7dfc`67f60000 ( 125.986 TB)           98.43%
<unknown>                              1087      203`88b6e000 (   2.014 TB)  99.99%    1.57%
Image                                  1532        0`09f11000 ( 159.066 MB)   0.01%    0.00%
Heap                                    249        0`03453000 (  52.324 MB)   0.00%    0.00%
Stack                                    66        0`01fc0000 (  31.750 MB)   0.00%    0.00%
Other                                    10        0`001d1000 (   1.816 MB)   0.00%    0.00%
TEB                                      22        0`0002c000 ( 176.000 kB)   0.00%    0.00%
PEB                                       1        0`00001000 (   4.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_MAPPED                              183      200`00d06000 (   2.000 TB)  99.30%    1.56%
MEM_PRIVATE                            1252        3`8d479000 (  14.207 GB)   0.69%    0.01%
MEM_IMAGE                              1532        0`09f11000 ( 159.066 MB)   0.01%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                358     7dfc`67f60000 ( 125.986 TB)           98.43%
MEM_RESERVE                             749      200`28a9b000 (   2.001 TB)  99.33%    1.56%
MEM_COMMIT                             2218        3`6f5f5000 (  13.740 GB)   0.67%    0.01%

0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00000026DA8DA928
generation 1 starts at 0x00000026DA7FC348
generation 2 starts at 0x00000024C4691000
ephemeral segment allocation context: none
         segment             begin         allocated              size
00000024C4690000  00000024C4691000  00000024D468FF28  0xfffef28(268431144)
00000024EECF0000  00000024EECF1000  00000024FECF0000  0xffff000(268431360)
000000248D6F0000  000000248D6F1000  000000249D6EFEF8  0xfffeef8(268431096)
...
00000026D66D0000  00000026D66D1000  00000026DBA3CA30  0x536ba30(87472688)
Large object heap starts at 0x00000024D4691000
         segment             begin         allocated              size
00000024D4690000  00000024D4691000  00000024DC67C318  0x7feb318(134132504)
00000024E60F0000  00000024E60F1000  00000024EE0637C8  0x7f727c8(133638088)
0000002482140000  0000002482141000  000000248A08F338  0x7f4e338(133489464)
00000024A6770000  00000024A6771000  00000024AE76F6C0  0x7ffe6c0(134211264)
...
000000278E6D0000  000000278E6D1000  000000279635F2D0  0x7c8e2d0(130605776)
00000029233E0000  00000029233E1000  000000292AF672F8  0x7b862f8(129524472)
000000292B3E0000  000000292B3E1000  0000002931A5ED60  0x667dd60(107470176)
000000299B3E0000  000000299B3E1000  00000029A20095B0  0x6c285b0(113411504)
000000281E6D0000  000000281E6D1000  0000002825CD3F58  0x7602f58(123744088)
00000028266D0000  00000028266D1000  000000282D5CAD50  0x6ef9d50(116366672)
000000282E6D0000  000000282E6D1000  0000002833CA0880  0x55cf880(89979008)
00000029A33E0000  00000029A33E1000  00000029A684D300  0x346c300(54969088)
Total Size:              Size: 0x353f96d88 (14293757320) bytes.
------------------------------
GC Heap Size:    Size: 0x353f96d88 (14293757320) bytes.

From the output, the current process occupies MEM_COMMIT=13.7G, the memory occupation of managed heap is 14293757320 = 13.3G. Obviously, this is a managed memory leak in the simple mode. According to experience, there may be any large objects on the managed heap. Use it here! dumpheap -stat command.

0:000> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
00007ff9ed6ea268  3956842     94964208 System.Collections.Generic.Dictionary`2+KeyCollection[[System.String, System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue, Serilog]]
00007ff9ed5e6d28  3842435    166405016 Serilog.Parsing.MessageTemplateToken[]
00007ff9ed5e5e28  3842434    184436832 Serilog.Events.MessageTemplate
00007ff9ecccf090  4011012    203304420 System.Int32[]
00007ff9ed647078  3956849    253238336 Serilog.Events.LogEvent
00007ff9ed6e7b48  3956849    284893128 System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue, Serilog]]
00007ff9ed5e74e8  9259598    296307136 Serilog.Parsing.TextToken
00007ff9ed6471b0 12551808    301243392 Serilog.Events.ScalarValue
00007ff9ed6e8308  3956849    729078048 System.Collections.Generic.Dictionary`2+Entry[[System.String, System.Private.CoreLib],[Serilog.Events.LogEventPropertyValue, Serilog]][]
00007ff9eccb1e18 16546412   3987811940 System.String
00000024c3b8faf0    82904   7382993568      Free

I'll go. The largest object in the managed heap is free, which takes up about 7.3G. That's it 🐂👃 For better understanding, let's briefly explain the free object.

2. Briefly explain free

In fact, in short, a free object is an object that is marked by gc as a recycled free block but not submitted to the operating system for release. How can we prove it? You can first find a free block on the managed heap.

0:000> !dumpheap -type Free
         Address               MT     Size
00000024c4691000 00000024c3b8faf0       24 Free
00000024c46a2448 00000024c3b8faf0       24 Free
00000024c46b26d8 00000024c3b8faf0       40 Free
00000024c47e4418 00000024c3b8faf0       40 Free
00000024c4925680 00000024c3b8faf0       40 Free
00000024c49284a8 00000024c3b8faf0       64 Free
00000024c4947a90 00000024c3b8faf0      192 Free
00000024c4951f70 00000024c3b8faf0       48 Free         
000000249d6ea3a8 00000024c3b8faf0      640 Free
  1. Use it! The do command checks to see if it is marked as a free block
0:000> !do 000000249d6ea3a8
Free Object
Size:        640(0x280) bytes
  1. Use dc to see if there is any trace of gc recovery in the content of this free block.
0:000> dc 000000249d6ea3a8 000000249d6ea3a8+0x280
00000024`9d6ea3a8  c3b8faf0 00000024 00000268 00000000  ....$...h.......
00000024`9d6ea3b8  9d6ea6d0 00000024 00000000 00000000  ..n.$...........
00000024`9d6ea3c8  ed3ae2b8 00007ff9 9d6ea3a8 00000024  ..:.......n.$...
00000024`9d6ea3d8  00000000 00000000 edcc9118 00007ff9  ................
00000024`9d6ea3e8  00000000 00000000 00000000 00000000  ................
00000024`9d6ea3f8  00000000 00000000 00000000 00000000  ................
00000024`9d6ea408  eeb07a50 00007ff9 9d6ea3c8 00000024  Pz........n.$...
00000024`9d6ea418  00000000 00000000 ef292ee8 00007ff9  ..........).....
00000024`9d6ea428  9d6ea408 00000024 00000000 00000000  ..n.$...........
...
00000024`9d6ea4a8  eeb0a158 00007ff9 9d6ea420 00000024  X....... .n.$...
00000024`9d6ea4b8  00000000 00000000 00000000 00000000  ................
00000024`9d6ea4c8  ef293818 00007ff9 9d6ea4a8 00000024  .8).......n.$...
00000024`9d6ea4d8  00000000 00000000 ee8357e0 00007ff9  .........W......
...
00000024`9d6ea508  eed37b40 00007ff9 00000000 00000000  @{..............
00000024`9d6ea518  00000000 00000000 00000000 00000000  ................
00000024`9d6ea528  c4699b48 00000024 00000000 00000000  H.i.$...........
00000024`9d6ea538  00000000 07000440 00000001 00000000  ....@...........
00000024`9d6ea548  00000000 00000000 00000000 00000000  ................
00000024`9d6ea558  00000000 00000000 ef2af6e0 00007ff9  ..........*.....
00000024`9d6ea568  00000000 00000000 00000000 00000000  ................
00000024`9d6ea578  00000000 00000000 c4699b48 00000024  ........H.i.$...
00000024`9d6ea588  00000000 00000000 00000000 07000400  ................
...
00000024`9d6ea628  ef2afd08                             ..*.

You can see that there are some residual content characters in this free block. Well, after we have a basic understanding of the free block, let's continue to explore.

3. What exactly prevents the merging of free blocks?

According to normal logic, most free blocks will merge memory after gc recycling, and the emptied segment s will be released by the operating system, but this dump does not, which means that there is something here that prevents the merging of free blocks. What is it? Experienced friends will say that you can observe the handle table of gc. The command is! gchandles -stat .

0:000> !gchandles -stat
Statistics:
              MT    Count    TotalSize Class Name
...
00007ff9ed15c0f0     1008        72576 System.Reflection.Emit.DynamicResolver
00007ff9ecbf6618       38       409344 System.Object[]
Total 1784 objects

Handles:
    Strong Handles:       233
    Pinned Handles:       16
    Async Pinned Handles: 18
    Ref Count Handles:    1
    Weak Long Handles:    1327
    Weak Short Handles:   144
    Dependent Handles:    45

From the output, there is nothing suspicious here. What should I do? If you have a lot of practical experience, there are still some experiences worth sharing. For example, if you observe the layout characteristics of free on heap, you often have important discoveries.

4. View the layout features of the free block

In order to simplify the output results, I limit the range to a segment on the heap, such as 000000 29233e0000 000000 292af672f8 here, so the command is! dumpheap 00000029233E0000 000000292AF672F8

0:000> !dumpheap 00000029233E0000  000000292AF672F8
         Address               MT     Size
00000029233e1000 00000024c3b8faf0  8291896 Free
0000002923bc9638 00007ff9eccb1e18   108448     
0000002923be3dd8 00000024c3b8faf0 29931248 Free
000000292586f4c8 00007ff9eccb1e18   301328     
00000029258b8dd8 00000024c3b8faf0 41384784 Free
0000002928030928 00007ff9eccb1e18   301328     
000000292807a238 00000024c3b8faf0  2542664 Free
00000029282e6e80 00007ff9eccb1e18   108448     
0000002928301620 00000024c3b8faf0 29915032 Free
0000002929f88db8 00007ff9eccb1e18   301328     
0000002929fd26c8 00000024c3b8faf0  2746688 Free
000000292a271008 00007ff9eccb1e18   291304     
000000292a2b81f0 00000024c3b8faf0  1019600 Free
000000292a3b10c0 00007ff9eccb1e18   108448     
000000292a3cb860 00000024c3b8faf0 10601048 Free
000000292ade7ab8 00007ff9eccb1e18   301328     
000000292ae313c8 00000024c3b8faf0   280808 Free
000000292ae75cb0 00007ff9eccb1e18   280854     
000000292aeba5c8 00000024c3b8faf0   416584 Free
000000292af20110 00007ff9eccb1e18   291304     

Statistics:
              MT    Count    TotalSize Class Name
00007ff9eccb1e18       10      2394118 System.String
00000024c3b8faf0       10    127130352      Free
Total 20 objects

It's really a shock to see that free and object are alternating, which is the real reason why free blocks can't be merged. To be honest, this textbook memory fragmentation dump is really hard to find. Next, take a few object objects between free to see what references lead to gc recycling.

5. Find the reference chain of the object

To see who the object is referenced by, you can use! gcroot command, I'll take two here.

0:000> !gcroot 0000002923bc9638
Thread 1878:
    00000024C39BE4B0 00007FFA4C0B3522 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
        rbp+10: 00000024c39be520
            ->  00000024C48AD6E0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
            ->  00000024C48AD200 Microsoft.Extensions.Hosting.Internal.Host
            ->  00000024C48AC538 Microsoft.Extensions.Logging.Logger`1[[Microsoft.Extensions.Hosting.Internal.Host, Microsoft.Extensions.Hosting]]
            ->  00000024C48AC658 Microsoft.Extensions.Logging.Logger
            ->  00000024C48AC680 Microsoft.Extensions.Logging.LoggerInformation[]
            ->  00000024C48AC6E0 Serilog.Extensions.Logging.SerilogLogger
            ->  00000024C48AC808 Serilog.Core.Logger
            ->  00000024C48AC760 Serilog.Core.Logger
            ->  00000024C47AD468 Serilog.Core.Logger
            ->  00000024C47ABF08 Serilog.Core.Sinks.SafeAggregateSink
            ->  00000024C47ABF20 Serilog.Core.ILogEventSink[]
            ->  00000024C479C548 Serilog.Sinks.Grafana.Loki.LokiSink
            ->  00000024C479C778 Serilog.Sinks.Grafana.Loki.Infrastructure.BoundedQueue`1[[Serilog.Events.LogEvent, Serilog]]
            ->  00000024C479C7B8 System.Collections.Generic.Queue`1[[Serilog.Events.LogEvent, Serilog]]
            ->  00000026E8C1A800 Serilog.Events.LogEvent[]
            ->  00000026148D3308 Serilog.Events.LogEvent
            ->  00000026148D4EF0 Serilog.Events.MessageTemplate
            ->  0000002923BC9638 System.String

Found 1 unique roots (run '!gcroot -all' to see all roots).

0:000> !gcroot 000000292586f4c8
Thread 1878:
    00000024C39BE4B0 00007FFA4C0B3522 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2922]
        rbp+10: 00000024c39be520
            ->  00000024C48AD6E0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
            ->  00000024C48AD200 Microsoft.Extensions.Hosting.Internal.Host
            ->  00000024C48AC538 Microsoft.Extensions.Logging.Logger`1[[Microsoft.Extensions.Hosting.Internal.Host, Microsoft.Extensions.Hosting]]
            ->  00000024C48AC658 Microsoft.Extensions.Logging.Logger
            ->  00000024C48AC680 Microsoft.Extensions.Logging.LoggerInformation[]
            ->  00000024C48AC6E0 Serilog.Extensions.Logging.SerilogLogger
            ->  00000024C48AC808 Serilog.Core.Logger
            ->  00000024C48AC760 Serilog.Core.Logger
            ->  00000024C47AD468 Serilog.Core.Logger
            ->  00000024C47ABF08 Serilog.Core.Sinks.SafeAggregateSink
            ->  00000024C47ABF20 Serilog.Core.ILogEventSink[]
            ->  00000024C479C548 Serilog.Sinks.Grafana.Loki.LokiSink
            ->  00000024C479C778 Serilog.Sinks.Grafana.Loki.Infrastructure.BoundedQueue`1[[Serilog.Events.LogEvent, Serilog]]
            ->  00000024C479C7B8 System.Collections.Generic.Queue`1[[Serilog.Events.LogEvent, Serilog]]
            ->  00000026E8C1A800 Serilog.Events.LogEvent[]
            ->  0000002614BB7AC8 Serilog.Events.LogEvent
            ->  0000002616D3CC40 Serilog.Events.MessageTemplate
            ->  000000292586F4C8 System.String

Found 1 unique roots (run '!gcroot -all' to see all roots).

From the reference chain, these objects are sent to Grafana.Loki through the serial log component. Through tracking the reference chain objects, I found that there are major problems in System.Collections.Generic.Queue. The screenshot is as follows:

Yes, the queue has a backlog of 395w. What is the backlog that can be used! wdo take a look at the string content.

It seems that this is a failed retry. After analysis, the idea is probably clear, that is, there is something wrong with Grafana.Loki or Serilog components, which leads to the log not being sent to Loki or the sending speed is too slow, and then it is constantly overstocked. Next, we communicated the analyzed information with our friends. The screenshot is as follows:

3: Summary

The main reason for this memory fragmentation is the backlog of 395w queue s generated in the process of connecting Serilog to Loki, but I can only analyze it here. As for why there is backlog, friends have to further debug and analyze it. I believe this problem will be solved soon 💪💪💪

Tags: C#

Posted on Sun, 10 Oct 2021 20:53:55 -0400 by pck76