A Guide to Building an APM Monitoring System

APM stands for Application Performance Monitoring: monitoring and managing the performance and availability of software applications. Application performance management is critical to keeping an application running continuously and stably. This article therefore looks at how to accurately monitor and report data, from the perspective of iOS app performance management.

App performance is one of the most important factors affecting user experience. Performance issues include crashes, network request errors or timeouts, slow UI response, main-thread stutter, high CPU and memory usage, high power consumption, and so on. Most of these problems come from misuse of thread locks, system functions, coding conventions, data structures, and the like. The key to solving them is to detect and locate the problem as early as possible.

This article focuses on the causes behind the issues APM monitors and on how to collect the data. The collected APM data is handed to a data reporting mechanism, which uploads it to the server according to a given strategy. The server consumes this information and produces reports. Read this together with the companion article, which summarizes how to build a flexible, configurable, and powerful data reporting component.

1. Stutter Monitoring

Stutter is the problem of the main thread failing to respond to user interaction. It directly affects what the user experiences, so stutter monitoring is an important part of APM for an app.

FPS (frames per second) is the number of frame refreshes per second. The ideal value is 60 on iPhone and 120 on some iPad models, and it serves as one reference parameter for stutter monitoring. Why only a reference? Because it is not accurate. First, how to obtain FPS: CADisplayLink is a system timer that fires at the same rate as the display refreshes, created with [CADisplayLink displayLinkWithTarget:self selector:@selector(###:)]. As for why it is not accurate, look at the following sample code:

_displayLink = [CADisplayLink displayLinkWithTarget:self selector:@selector(p_displayLinkTick:)];
[_displayLink setPaused:YES];
[_displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSRunLoopCommonModes];

As the code shows, the CADisplayLink object is added to a particular Mode of a given RunLoop, so what it measures is still activity at the CPU level. Perceived stutter, however, is the result of the whole image rendering pipeline: CPU + GPU. Read on.
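
To make the FPS figure concrete, here is a minimal sketch in plain C of how a value could be derived from display-link ticks: count the ticks in a sampling window and divide by the elapsed time. The names fps_counter_t and fps_counter_tick are hypothetical and not part of any Apple API; timestamps are assumed to be in seconds, and the p_displayLinkTick: callback above would be the place to feed them in.

```c
#include <assert.h>

/* Hypothetical helper: accumulates display-link ticks and reports FPS
   once per sampling window. Timestamps are in seconds. */
typedef struct {
    double window_start;  /* timestamp of the first tick in the window */
    int    ticks;         /* ticks seen since window_start */
    int    started;       /* 0 until the first tick arrives */
} fps_counter_t;

/* Feed one tick. Returns the measured FPS once at least `window` seconds
   have elapsed since the window started, otherwise returns -1.0. */
double fps_counter_tick(fps_counter_t *c, double timestamp, double window) {
    assert(c != 0 && window > 0.0);
    if (!c->started) {
        c->started = 1;
        c->window_start = timestamp;
        c->ticks = 0;
        return -1.0;
    }
    c->ticks++;
    double elapsed = timestamp - c->window_start;
    if (elapsed < window) {
        return -1.0;
    }
    double fps = c->ticks / elapsed;
    c->window_start = timestamp;  /* start a new window */
    c->ticks = 0;
    return fps;
}
```

A value well below the display's refresh rate only says that fewer ticks reached the RunLoop in that window; it says nothing about GPU work, which is why the figure is treated as a reference only.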

1. Screen Drawing Principles

Let's start with how old CRT monitors work. The CRT's electron gun scans line by line from top to bottom; when the scan is complete the display shows one frame, and the gun returns to its initial position for the next scan. To synchronize the display with the system's video controller, the display (or other hardware) uses a hardware clock to generate a series of timing signals. When the gun moves to a new line and is ready to scan, the display emits a horizontal synchronization signal (HSync); when a frame has been drawn and the gun has returned to its initial position, before the next frame is drawn, the display emits a vertical synchronization signal (VSync). Displays usually refresh at a fixed frequency, which is the frequency at which VSync signals are generated. Although today's displays are basically all LCDs, the principle remains the same.

Usually, displaying a picture on screen is a collaboration between the CPU, the GPU, and the display. The CPU computes what needs to be displayed based on the code the engineer wrote (view creation, layout calculation, image decoding, text drawing, etc.) and submits the result to the GPU. The GPU handles layer compositing and texture rendering, then submits the rendered result to the frame buffer. The video controller then reads the frame buffer line by line in step with the VSync signal and passes the data to the display, possibly after digital-to-analog conversion.

With only one frame buffer, reading and refreshing the frame buffer is inefficient. To solve this, display systems introduce two buffers: the double-buffering mechanism. The GPU pre-renders a frame into one buffer for the video controller to read; once the next frame has been rendered, the GPU points the video controller directly at the second buffer. This improves efficiency.

Double buffering improves efficiency but introduces a new problem: if the video controller has not finished reading, that is, the screen content is only partly displayed, when the GPU submits a newly rendered frame to the other frame buffer and points the video controller at it, the video controller draws the lower part of the screen from the new frame's data, and the picture tears.

To solve this, GPUs usually provide a mechanism called vertical synchronization (V-Sync). With V-Sync enabled, the GPU waits until the video controller sends a V-Sync signal before rendering a new frame and updating the frame buffer. This solves tearing and makes the picture smoother, but it needs more computing resources.

Clearing up a question

Some readers, seeing "With V-Sync enabled, the GPU waits until the video controller sends a V-Sync signal before rendering a new frame and updating the frame buffer", might wonder: if the GPU only renders a new frame and updates the frame buffer after receiving V-Sync, doesn't that make double buffering pointless?

Imagine the display showing the first and second frames. With double buffering, the GPU first renders a frame and stores it in a frame buffer, and the video controller is pointed at that buffer to display the first frame. When the first frame has been fully displayed, the video controller sends a V-Sync signal; the GPU receives it, renders the second frame, and points the video controller at the second frame buffer.

It looks as though the second frame is only rendered after the video controller sends the V-Sync signal that follows the first frame being displayed. Is that really the case? Of course not; otherwise double buffering would be pointless.

Here is the answer. See the figure below.

When the first V-Sync signal arrives, a frame is rendered and placed in a frame buffer, but not displayed. When the second V-Sync signal arrives, the result of the first rendering is read (the video controller points to the first frame buffer) while a new frame is rendered and stored in the second frame buffer. When the third V-Sync signal arrives, the content of the second frame buffer is read (the video controller points to the second frame buffer) while rendering of the third frame starts and is written into the first frame buffer. The cycle repeats in this way.
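
The alternation described above can be traced with a small model. This is only a sketch in plain C under simplifying assumptions (rendering always finishes within one interval, and frames are identified by a counter); double_buffer_t and its functions are hypothetical names, not a real graphics API.

```c
#include <assert.h>

/* Hypothetical model of two frame buffers. Each slot holds the id of the
   frame rendered into it (0 = nothing rendered yet). */
typedef struct {
    int buffers[2];
    int back;        /* index the GPU renders into next */
    int next_frame;  /* id of the next frame to render */
} double_buffer_t;

void double_buffer_init(double_buffer_t *db) {
    db->buffers[0] = 0;
    db->buffers[1] = 0;
    db->back = 0;
    db->next_frame = 1;
}

/* Called once per V-Sync. The video controller switches to the buffer that
   was completed during the previous interval, and the GPU renders the next
   frame into the other buffer. Returns the id of the frame now on screen
   (0 while nothing has been rendered yet). */
int double_buffer_on_vsync(double_buffer_t *db) {
    assert(db != 0);
    int front = 1 - db->back;                  /* finished last interval */
    int displayed = db->buffers[front];
    db->buffers[db->back] = db->next_frame++;  /* render the next frame */
    db->back = front;                          /* swap roles */
    return displayed;
}
```

Calling double_buffer_on_vsync repeatedly shows nothing on the first signal and then frames 1, 2, 3 in order, each displayed one interval after it was rendered, which is the behavior the paragraph describes.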

For more detail, see the following reference (a VPN may be needed to reach it): Multiple buffering

2. Causes of Stutter

When the VSync signal arrives, the system graphics service notifies the app through mechanisms such as CADisplayLink, and the app's main thread starts computing the display content on the CPU (view creation, layout calculation, image decoding, text drawing, etc.). The computed content is then submitted to the GPU, which transforms, composites, and renders the layers and submits the result to the frame buffer, where it waits for the next VSync signal before being displayed. With vertical synchronization, if the CPU or GPU does not finish its work within one VSync interval, that frame is dropped and has to wait for the next opportunity to be displayed; meanwhile the screen keeps showing the previously rendered image. This is the cause of UI stutter at the CPU and GPU level.
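
The dropped-frame rule can be illustrated with a little arithmetic. The following sketch, in plain C, assumes a simplified model in which each frame's work starts on a V-Sync and nothing is pipelined; count_dropped_frames is a hypothetical helper, and the durations are made-up sample values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each frame's CPU + GPU work starts on a V-Sync and the
   finished frame is shown on the first V-Sync after the work is done. A
   frame that needs more than one interval leaves the previous image on
   screen for the extra intervals; those are counted as dropped frames. */
size_t count_dropped_frames(const double *work_ms, size_t n, double interval_ms) {
    assert(interval_ms > 0.0);
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        size_t intervals = 1;
        /* Count the V-Sync intervals this frame's work spans. */
        while ((double)intervals * interval_ms < work_ms[i]) {
            intervals++;
        }
        dropped += intervals - 1;
    }
    return dropped;
}
```

With a 16.67 ms interval (60 Hz), work times of 10, 20, 40 and 16 ms give 0, 1, 2 and 0 dropped frames respectively.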

Currently, iOS devices use double buffering, and some use triple buffering. Android now uses triple buffering; in its early days it used single buffering. An example of the iOS triple-buffering mechanism:

CPU and GPU resources get consumed for many reasons: frequent object creation, property adjustment, file reading, view hierarchy adjustment, layout calculation (with AutoLayout, the more views there are, the harder the system of linear equations is to solve), image decoding (optimizing reads of large images), image drawing, text rendering, database reads (optimistic versus pessimistic locking for read-heavy or write-heavy scenarios), lock usage (for example, improper use of spin locks wastes CPU), and so on. Developers find the best solution based on their own experience (this is not the focus of this article).

3. How APM Monitors and Reports Stutter

CADisplayLink is definitely not the tool for this; its FPS is for reference only. In general there are two approaches to stutter monitoring: observing RunLoop state callbacks, and pinging the main thread from a child thread.

3.1 Monitoring RunLoop state

The RunLoop is responsible for monitoring input sources and dispatching them. Examples include the network, input devices, periodic or delayed events, asynchronous callbacks, and so on. A RunLoop receives two kinds of sources: input sources, which deliver asynchronous messages from another thread or from a different application, and timer sources, which deliver events at a scheduled time or at a repeating interval.

The RunLoop goes through the following steps:

Step 1: Notify Observers that the RunLoop is about to enter the loop, then enter the loop

if (currentMode->_observerMask & kCFRunLoopEntry )
    // Notify Observers that RunLoop is about to enter the loop
    __CFRunLoopDoObservers(rl, currentMode, kCFRunLoopEntry);
// Enter loop
result = __CFRunLoopRun(rl, currentMode, seconds, returnAfterSourceHandled, previousMode);

Step 2: Start the do-while loop that keeps the thread alive; notify Observers that the RunLoop is about to trigger Timer callbacks and Source0 callbacks, then execute the blocks that have been added

 if (rlm->_observerMask & kCFRunLoopBeforeTimers)
    //  Notify Observers that RunLoop is about to trigger a Timer callback
    __CFRunLoopDoObservers(rl, rlm, kCFRunLoopBeforeTimers);
if (rlm->_observerMask & kCFRunLoopBeforeSources)
    //  Notify Observers that RunLoop is about to trigger a Source callback
    __CFRunLoopDoObservers(rl, rlm, kCFRunLoopBeforeSources);
// Execute joined block
__CFRunLoopDoBlocks(rl, rlm);

Step 3: After the Source0 callbacks have fired, if a Source1 is ready, the RunLoop jumps to handle_msg to process the message.

//  If a Source1 (port-based) is ready, handle it directly by jumping to the message handling
if (MACH_PORT_NULL != dispatchPort && !didDispatchPortLastTime) {
#if DEPLOYMENT_TARGET_MACOSX || DEPLOYMENT_TARGET_EMBEDDED || DEPLOYMENT_TARGET_EMBEDDED_MINI
    msg = (mach_msg_header_t *)msg_buffer;
    
    if (__CFRunLoopServiceMachPort(dispatchPort, &msg, sizeof(msg_buffer), &livePort, 0, &voucherState, NULL)) {
        goto handle_msg;
    }
#elif DEPLOYMENT_TARGET_WINDOWS
    if (__CFRunLoopWaitForMultipleObjects(NULL, &dispatchPort, 0, 0, &livePort, NULL)) {
        goto handle_msg;
    }
#endif
}

Step 4: After the callbacks have fired, notify Observers that the RunLoop is about to go to sleep

Boolean poll = sourceHandledThisLoop || (0ULL == timeout_context->termTSR);
// Notify Observers that the RunLoop's thread is about to go to sleep
if (!poll && (rlm->_observerMask & kCFRunLoopBeforeWaiting)) __CFRunLoopDoObservers(rl, rlm, kCFRunLoopBeforeWaiting);
__CFRunLoopSetSleeping(rl);

Step 5: After going to sleep, the RunLoop waits for a mach_port message to wake it up again. Only the following four things can wake it:

  • A port-based source event
  • A timer firing
  • The RunLoop timing out
  • An explicit wake-up by a caller

do {
    if (kCFUseCollectableAllocator) {
        // objc_clear_stack(0);
        // <rdar://problem/16393959>
        memset(msg_buffer, 0, sizeof(msg_buffer));
    }
    msg = (mach_msg_header_t *)msg_buffer;
    
    __CFRunLoopServiceMachPort(waitSet, &msg, sizeof(msg_buffer), &livePort, poll ? 0 : TIMEOUT_INFINITY, &voucherState, &voucherCopy);
    
    if (modeQueuePort != MACH_PORT_NULL && livePort == modeQueuePort) {
        // Drain the internal queue. If one of the callout blocks sets the timerFired flag, break out and service the timer.
        while (_dispatch_runloop_root_queue_perform_4CF(rlm->_queue));
        if (rlm->_timerFired) {
            // Leave livePort as the queue port, and service timers below
            rlm->_timerFired = false;
            break;
        } else {
            if (msg && msg != (mach_msg_header_t *)msg_buffer) free(msg);
        }
    } else {
        // Go ahead and leave the inner loop.
        break;
    }
} while (1);

Step 6: On wake-up, notify Observers that the RunLoop's thread has just been woken

// Notify Observers that the RunLoop's thread has just been woken up
if (!poll && (rlm->_observerMask & kCFRunLoopAfterWaiting)) __CFRunLoopDoObservers(rl, rlm, kCFRunLoopAfterWaiting);
// Process the message
handle_msg:;
__CFRunLoopSetIgnoreWakeUps(rl);

Step 7: After the RunLoop wakes up, process the message that woke it

  • If a timer is due, trigger that timer's callback
  • If a block was dispatched to the main queue, execute the block
  • If it is a Source1 event, handle that event

#if USE_MK_TIMER_TOO
        // If a timer is due, trigger that timer's callback
        else if (rlm->_timerPort != MACH_PORT_NULL && livePort == rlm->_timerPort) {
            CFRUNLOOP_WAKEUP_FOR_TIMER();
            // On Windows, we have observed an issue where the timer port is set before the time which we requested it to be set. For example, we set the fire time to be TSR 167646765860, but it is actually observed firing at TSR 167646764145, which is 1715 ticks early. The result is that, when __CFRunLoopDoTimers checks to see if any of the run loop timers should be firing, it appears to be 'too early' for the next timer, and no timers are handled.
            // In this case, the timer port has been automatically reset (since it was returned from MsgWaitForMultipleObjectsEx), and if we do not re-arm it, then no timers will ever be serviced again unless something adjusts the timer list (e.g. adding or removing timers). The fix for the issue is to reset the timer here if CFRunLoopDoTimers did not handle a timer itself. 9308754
            if (!__CFRunLoopDoTimers(rl, rlm, mach_absolute_time())) {
                // Re-arm the next timer
                __CFArmNextTimerInMode(rlm, rl);
            }
        }
#endif
        //  If there is dispatch to main_queue block, execute block
        else if (livePort == dispatchPort) {
            CFRUNLOOP_WAKEUP_FOR_DISPATCH();
            __CFRunLoopModeUnlock(rlm);
            __CFRunLoopUnlock(rl);
            _CFSetTSD(__CFTSDKeyIsInGCDMainQ, (void *)6, NULL);
#if DEPLOYMENT_TARGET_WINDOWS
            void *msg = 0;
#endif
            __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__(msg);
            _CFSetTSD(__CFTSDKeyIsInGCDMainQ, (void *)0, NULL);
            __CFRunLoopLock(rl);
            __CFRunLoopModeLock(rlm);
            sourceHandledThisLoop = true;
            didDispatchPortLastTime = true;
        }
        // Handle an event if a Source1 (port-based) event is issued
        else {
            CFRUNLOOP_WAKEUP_FOR_SOURCE();
            
            // If we received a voucher from this mach_msg, then put a copy of the new voucher into TSD. CFMachPortBoost will look in the TSD for the voucher. By using the value in the TSD we tie the CFMachPortBoost to this received mach_msg explicitly without a chance for anything in between the two pieces of code to set the voucher again.
            voucher_t previousVoucher = _CFSetTSD(__CFTSDKeyMachMessageHasVoucher, (void *)voucherCopy, os_release);

            CFRunLoopSourceRef rls = __CFRunLoopModeFindSourceForMachPort(rl, rlm, livePort);
            if (rls) {
#if DEPLOYMENT_TARGET_MACOSX || DEPLOYMENT_TARGET_EMBEDDED || DEPLOYMENT_TARGET_EMBEDDED_MINI
		mach_msg_header_t *reply = NULL;
		sourceHandledThisLoop = __CFRunLoopDoSource1(rl, rlm, rls, msg, msg->msgh_size, &reply) || sourceHandledThisLoop;
		if (NULL != reply) {
		    (void)mach_msg(reply, MACH_SEND_MSG, reply->msgh_size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL);
		    CFAllocatorDeallocate(kCFAllocatorSystemDefault, reply);
		}
#elif DEPLOYMENT_TARGET_WINDOWS
                sourceHandledThisLoop = __CFRunLoopDoSource1(rl, rlm, rls) || sourceHandledThisLoop;
#endif

Step 8: Decide whether to enter the next loop based on the current RunLoop state. If the loop was forcibly stopped from outside or has timed out, do not continue; otherwise enter the next loop

if (sourceHandledThisLoop && stopAfterHandle) {
    // The caller asked to return as soon as one event has been handled
    retVal = kCFRunLoopRunHandledSource;
} else if (timeout_context->termTSR < mach_absolute_time()) {
    // The timeout passed in by the caller has elapsed
    retVal = kCFRunLoopRunTimedOut;
} else if (__CFRunLoopIsStopped(rl)) {
    // Forcibly stopped by an external caller
    __CFRunLoopUnsetStopped(rl);
    retVal = kCFRunLoopRunStopped;
} else if (rlm->_stopped) {
    rlm->_stopped = false;
    retVal = kCFRunLoopRunStopped;
} else if (__CFRunLoopModeIsEmpty(rl, rlm, previousMode)) {
    // No sources or timers are left in the mode
    retVal = kCFRunLoopRunFinished;
}

The complete, annotated RunLoop code can be found here. Source1 is what the RunLoop uses to handle system events arriving on a Mach port, and Source0 is used to handle user events. In essence, after a Source1 system event is received, it is the Source0 event handler that ends up being called.

The 6 RunLoop states

typedef CF_OPTIONS(CFOptionFlags, CFRunLoopActivity) {
    kCFRunLoopEntry = (1UL << 0),          // About to enter the loop
    kCFRunLoopBeforeTimers = (1UL << 1),   // About to trigger Timer callbacks
    kCFRunLoopBeforeSources = (1UL << 2),  // About to trigger Source0 callbacks
    kCFRunLoopBeforeWaiting = (1UL << 5),  // About to wait for a mach_port message
    kCFRunLoopAfterWaiting = (1UL << 6),   // Just received a mach_port message
    kCFRunLoopExit = (1UL << 7),           // About to exit the loop
    kCFRunLoopAllActivities = 0x0FFFFFFFU  // All state changes of the loop
};
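
Because the activities are bit flags, an observer's interest is tested with a bitwise AND, exactly as the _observerMask checks in the steps above do. The sketch below mirrors the flag values locally so it builds without CoreFoundation; the ACT_ names, activity_name and observer_wants are hypothetical helpers for illustration, for example to label activities in a monitoring log.

```c
#include <assert.h>
#include <string.h>

/* Local mirror of the CFRunLoopActivity bit values, so the sketch builds
   without CoreFoundation. On Apple platforms use the real constants. */
enum {
    ACT_ENTRY          = 1 << 0,
    ACT_BEFORE_TIMERS  = 1 << 1,
    ACT_BEFORE_SOURCES = 1 << 2,
    ACT_BEFORE_WAITING = 1 << 5,
    ACT_AFTER_WAITING  = 1 << 6,
    ACT_EXIT           = 1 << 7,
    ACT_ALL            = 0x0FFFFFFF
};

/* Returns non-zero when an observer registered with `mask` should be
   notified about `activity`. */
int observer_wants(unsigned long mask, unsigned long activity) {
    return (mask & activity) != 0;
}

/* Maps a single activity flag to a readable name for logging. */
const char *activity_name(unsigned long activity) {
    assert(activity != 0);
    switch (activity) {
        case ACT_ENTRY:          return "kCFRunLoopEntry";
        case ACT_BEFORE_TIMERS:  return "kCFRunLoopBeforeTimers";
        case ACT_BEFORE_SOURCES: return "kCFRunLoopBeforeSources";
        case ACT_BEFORE_WAITING: return "kCFRunLoopBeforeWaiting";
        case ACT_AFTER_WAITING:  return "kCFRunLoopAfterWaiting";
        case ACT_EXIT:           return "kCFRunLoopExit";
        default:                 return "unknown";
    }
}
```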

If the work the RunLoop does before going to sleep takes so long that it cannot go to sleep, or the work done after the thread wakes up takes so long that it cannot move on to the next step, the thread is blocked. If that thread is the main thread, the user sees stutter.

Once the RunLoop is found to stay in the kCFRunLoopBeforeSources state (before sleeping) or the kCFRunLoopAfterWaiting state (after waking) without changing for longer than the configured time threshold, a stutter can be declared. At that point, dump the stack to reconstruct what was happening and then resolve the stutter.

Start a child thread that loops continuously, checking whether the main thread is stalled. A stutter is recorded only after the threshold has been exceeded n times in a row. The stack is then dumped and reported (the data handling mechanism is covered in the next part).
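
The "n times over the threshold" rule can be isolated from the threading code as a small piece of state. The sketch below is plain C and only models the decision; stall_detector_t and stall_detector_feed are hypothetical names, and the two activity values are local copies of the kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the relevant CFRunLoopActivity flag values. */
enum {
    ACT_BEFORE_SOURCES = 1 << 2,
    ACT_BEFORE_WAITING = 1 << 5,
    ACT_AFTER_WAITING  = 1 << 6
};

typedef struct {
    int timeouts;  /* consecutive timed-out waits so far */
    int required;  /* n: timed-out waits needed before reporting */
} stall_detector_t;

/* Feed the outcome of one wait on the monitoring thread. `timed_out` is true
   when the wait reached the threshold without the observer signalling;
   `activity` is the last RunLoop activity the observer recorded.
   Returns true when a stutter should be reported. */
bool stall_detector_feed(stall_detector_t *d, bool timed_out, unsigned long activity) {
    assert(d != 0 && d->required > 0);
    bool busy = (activity == ACT_BEFORE_SOURCES || activity == ACT_AFTER_WAITING);
    if (!timed_out || !busy) {
        d->timeouts = 0;  /* the main thread made progress: start over */
        return false;
    }
    if (++d->timeouts < d->required) {
        return false;     /* not enough consecutive timeouts yet */
    }
    d->timeouts = 0;
    return true;
}
```

Requiring several consecutive timeouts filters out one-off delays, at the cost of reporting a real stall a little later.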

The WatchDog timeout differs between states:

  • Launch: 20s
  • Resume: 10s
  • Suspend: 10s
  • Quit: 6s
  • Background: 3 minutes (before iOS 7 each request granted 10 minutes; since then each request grants 3 minutes, and repeated requests can extend it to at most 10 minutes)

The stutter threshold is set with the WatchDog mechanism in mind. The threshold inside an APM system needs to be smaller than the WatchDog values, so it falls in the range [1, 6] seconds; the industry typically chooses 3 seconds.

The function long dispatch_semaphore_wait(dispatch_semaphore_t dsema, dispatch_time_t timeout) is used to determine whether the main thread is blocked. It returns zero on success, or non-zero if the timeout occurred; a non-zero return therefore means the main thread was blocked for longer than the timeout.

Many people may wonder why, out of so many RunLoop states, kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting were chosen. The reason is that most stutter occurs between kCFRunLoopBeforeSources and kCFRunLoopAfterWaiting, for example while handling app-internal events of the Source0 type.

The flow for detecting stutter with the RunLoop is as follows:

The key code is as follows:

// Setting up the running environment of Runloop observer
CFRunLoopObserverContext context = {0, (__bridge void *)self, NULL, NULL};
// Create Runloop observer object
_observer = CFRunLoopObserverCreate(kCFAllocatorDefault,
                                    kCFRunLoopAllActivities,
                                    YES,
                                    0,
                                    &runLoopObserverCallBack,
                                    &context);
// Add the new observer to the runloop of the current thread
CFRunLoopAddObserver(CFRunLoopGetMain(), _observer, kCFRunLoopCommonModes);
// Create Signal
_semaphore = dispatch_semaphore_create(0);

__weak __typeof(self) weakSelf = self;
// Monitor duration on child threads
dispatch_async(dispatch_get_global_queue(0, 0), ^{
    __strong __typeof(weakSelf) strongSelf = weakSelf;
    if (!strongSelf) {
        return;
    }
    while (YES) {
        if (strongSelf.isCancel) {
            return;
        }
        // Record one stutter only after the wait has timed out N times in a row at threshold T
        long semaphoreWait = dispatch_semaphore_wait(strongSelf->_semaphore, dispatch_time(DISPATCH_TIME_NOW, strongSelf.limitMillisecond * NSEC_PER_MSEC));
        if (semaphoreWait != 0) {
            if (strongSelf->_activity == kCFRunLoopBeforeSources || strongSelf->_activity == kCFRunLoopAfterWaiting) {
                if (++strongSelf.countTime < strongSelf.standstillCount){
                    continue;
                }
                // Dump the stack here and hand it to the data reporting mechanism, which uploads it to the server according to its strategy. Stack dumping is explained below; data reporting is described in "Creating powerful, flexible and configurable data reporting components" (./1.80.md)
            }
        }
        strongSelf.countTime = 0;
    }
});

3.2 Pinging the main thread from a child thread

Start a child thread and create a semaphore with an initial value of 0 and a Boolean flag with an initial value of YES. Dispatch a task to the main thread that sets the flag to NO. The child thread sleeps for the threshold time; when it wakes up, it checks whether the main thread managed to set the flag (value NO). If it did not, the main thread is considered to be stalled. At that point, dump the stack and hand it to the data reporting mechanism, which uploads the data to the server according to its strategy. Data reporting is covered in "Creating powerful, flexible and configurable data reporting components".

while (self.isCancelled == NO) {
        @autoreleasepool {
            __block BOOL isMainThreadNoRespond = YES;
            
            dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
            
            dispatch_async(dispatch_get_main_queue(), ^{
                isMainThreadNoRespond = NO;
                dispatch_semaphore_signal(semaphore);
            });
            
            [NSThread sleepForTimeInterval:self.threshold];
            
            if (isMainThreadNoRespond) {
                if (self.handlerBlock) {
                    self.handlerBlock(); // External dump stack inside the block (described below), Data Report
                }
            }
            
            dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);
        }
    }

4. Stack dump

Getting the method stack is a hassle, so let's sort out the approach. [NSThread callStackSymbols] returns the call stack of the current thread. But when a stutter is detected, it cannot give us the main thread's stack: there is no way to reach the main thread's stack from another thread this way. Let's start with a review of the background.

In computer science, the call stack is a stack data structure that stores information about the active subroutines of a computer program. This kind of stack is also called the execution stack, program stack, control stack, runtime stack, or machine stack. The call stack is used to track the point to which each active subroutine should return control when it finishes executing.

Here is a picture and an example of a call stack found on Wikipedia. The diagram represents a stack divided into several stack frames, each corresponding to one function call. The blue part at the bottom represents the DrawSquare function, which calls DrawLine during execution; DrawLine is represented by the green part.

A stack frame consists of three parts: function parameters, the return address, and local variables. For example, when DrawLine is called inside DrawSquare: first, the parameters DrawLine needs are pushed onto the stack; second, the return address is pushed (control information; for example, if function A calls function B, the address of the instruction following the call to B is the return address); third, the function's local variables are also stored on the stack.

The Stack Pointer indicates the current top of the stack. On most operating systems the stack grows downward, so the Stack Pointer holds the lowest address in use. The location the Frame Pointer points to holds the saved Frame Pointer of the calling frame, with the return address stored next to it.

On most operating systems, each stack frame also stores the Frame Pointer of the previous stack frame. So once the Stack Pointer and Frame Pointer of the current stack frame are known, you can walk back step by step and recover every frame down to the bottom of the stack.

The next step is therefore to get the Stack Pointer and Frame Pointer of every thread, and then backtrace to reconstruct what was happening.
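
Walking the chain of saved frame pointers can be sketched as follows. This is plain C operating on an in-memory model rather than on a live thread: the frame_record_t layout (saved caller frame pointer followed by return address) mirrors the common arm64 and x86_64 convention and is an assumption of this sketch, and backtrace_frames is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One frame record: the caller's saved frame pointer, then the return
   address. Assumed layout; see the note above. */
typedef struct frame_record {
    const struct frame_record *previous;
    uintptr_t return_address;
} frame_record_t;

/* Walk the chain starting at `fp`, storing up to `max` return addresses in
   `out`, innermost frame first. Stops at a NULL frame pointer.
   Returns the number of addresses stored. */
size_t backtrace_frames(const frame_record_t *fp, uintptr_t *out, size_t max) {
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        fp = fp->previous;
    }
    return n;
}
```

The max argument bounds the walk in case the chain is corrupted. Code that reads another thread's stack for real would also need to check that each address is readable before dereferencing it.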

5. Mach Task Knowledge

Mach task:

A running app corresponds to one Mach task, and the task may have several threads executing at the same time. "OS X and iOS Kernel Programming" describes a Mach task as a container object through which virtual memory space and other resources, including devices and other handles, are managed. In short, a Mach task is a machine-independent abstraction of the execution environment of threads.

Role: a task can be thought of as a process that contains a list of its threads.

API: task_threads stores all the threads of target_task in the act_list array; the number of entries is act_listCnt

kern_return_t task_threads
(
  task_t target_task,
  thread_act_array_t *act_list,         // list of threads
  mach_msg_type_number_t *act_listCnt   // number of threads
);

thread_info:

kern_return_t thread_info
(
  thread_act_t target_act,
  thread_flavor_t flavor,
  thread_info_t thread_info_out,
  mach_msg_type_number_t *thread_info_outCnt
);

How to get stack data for a thread:

The system function kern_return_t task_threads(task_inspect_t target_task, thread_act_array_t *act_list, mach_msg_type_number_t *act_listCnt); returns all threads, but the threads it returns are the lowest-level Mach threads.

For each thread, kern_return_t thread_get_state(thread_act_t target_act, thread_state_flavor_t flavor, thread_state_t old_state, mach_msg_type_number_t *old_stateCnt); retrieves its full register state, which is filled into a parameter of type _STRUCT_MCONTEXT. Two of the parameters of this function vary with the CPU architecture, so macros are needed to hide the differences between CPUs.

The _STRUCT_MCONTEXT structure stores the Stack Pointer and Frame Pointer of the thread's top stack frame, from which the thread's whole call stack can be backtraced.

However, the functions above give us kernel threads, while the information we need is about NSThread, so we need to map kernel threads to NSThread.

The p in pthread is short for POSIX, which stands for Portable Operating System Interface. Every operating system was designed with its own threading model, and different systems expose different APIs for operating on threads. The purpose of POSIX is therefore to provide abstract pthreads and related APIs. These APIs are implemented differently on different operating systems, but they do the same thing.

The task_threads and thread_get_state functions provided by the system operate on kernel threads; each kernel thread is uniquely identified by an id of type thread_t. A pthread is uniquely identified by a value of type pthread_t. Converting between a kernel thread and a pthread (thread_t and pthread_t) is easy, because pthreads were designed as an abstraction over kernel threads.

The callback function passed to pthread_create when a thread is created is nsthreadLauncher.

static void *nsthreadLauncher(void* thread)  
{
    NSThread *t = (NSThread*)thread;
    [nc postNotificationName: NSThreadDidStartNotification object:t userInfo: nil];
    [t _setName: [t name]];
    [t main];
    [NSThread exit];
    return NULL;
}

NSThreadDidStartNotification is actually the string @"_NSThreadDidStartNotification".

<NSThread: 0x...>{number = 1, name = main}  

To match an NSThread to its kernel thread, the only option is to match them one-to-one by name. The pthread API pthread_getname_np can also get the name of a kernel thread. np stands for non-portable (not part of POSIX), so it cannot be used across platforms.

The idea is summarized as follows: save the NSThread's original name, change the name to a random value (a timestamp), then iterate over the kernel threads, reading each one's pthread name; a name match links the NSThread to its kernel thread. Once found, restore the thread's original name. For the main thread, pthread_getname_np cannot be used, so its thread_t is captured in the + load method of this code and then matched.

static mach_port_t main_thread_id;  
+ (void)load {
    main_thread_id = mach_thread_self();
}

2. App Startup Time Monitoring

1. App Startup Time Monitoring

App startup time is one of the most important factors affecting user experience, so we need to quantify how fast an app starts. Startup is divided into cold start and hot start.

Cold start: the app is not yet running, so the whole app must be loaded and built and its initialization completed. Cold start leaves a lot of room for optimization. Cold-start time is measured from the application:didFinishLaunchingWithOptions: method, where an app typically performs the basic initialization of its various SDKs and of the app itself.

Hot start: the app is already running in the background (a common scenario: the user presses the Home button while using the app and later reopens it) and some event brings it back to the foreground. The app receives the entering-foreground event in the applicationWillEnterForeground: method.

The idea is simple, as follows:

  • First, record the current time in the load method of the monitoring class
  • Listen for UIApplicationDidFinishLaunchingNotification, which is posted once the app has finished launching
  • Record the current time when the notification is received
  • The difference between the times from steps 1 and 3 is the app's startup time.

mach_absolute_time is a CPU/bus-dependent function that returns a count of CPU clock ticks. It does not advance while the system is asleep. The value is nanosecond-scale. The difference between the two readings, before and after, has to be converted to seconds, which requires a reference tied to system time; it is obtained via mach_timebase_info.

mach_timebase_info_data_t g_cmmStartupMonitorTimebaseInfoData = {0};
mach_timebase_info(&g_cmmStartupMonitorTimebaseInfoData);
uint64_t timelapse = mach_absolute_time() - g_cmmLoadTime;
double timeSpan = (timelapse * g_cmmStartupMonitorTimebaseInfoData.numer) / (g_cmmStartupMonitorTimebaseInfoData.denom * 1e9);
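
The conversion in the last line can be checked in isolation. The sketch below is plain C with the timebase passed in as arguments, so it runs without the Mach headers; ticks_to_seconds is a hypothetical helper, and numer and denom stand for the two fields of mach_timebase_info_data_t. The values in the usage note are examples only.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick delta to seconds. One tick lasts numer/denom nanoseconds,
   where numer and denom come from mach_timebase_info(). */
double ticks_to_seconds(uint64_t ticks, uint32_t numer, uint32_t denom) {
    assert(denom != 0);
    double nanos = (double)ticks * (double)numer / (double)denom;
    return nanos / 1e9;
}
```

For instance, with an example timebase of 125/3, a delta of 24,000,000 ticks is 1,000,000,000 ns, that is 1 second. Doing the arithmetic in double also avoids overflowing the 64-bit multiplication for very large deltas.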

2. Online startup-time monitoring is good, but startup also needs to be optimized during development

To optimize startup time, you first need to know what happens during startup and then plan against the current state of affairs.

The pre-main phase is defined as the period from the app starting to launch until the main function is called; the main phase is defined as the period from entering the main function until viewDidAppear of the main UI framework.

App startup process:

  • Parse Info.plist: load related information such as the launch screen; set up the sandbox and check permissions;
  • Mach-O loading: if the binary is a fat binary, find the part that matches the current CPU architecture; load all dependent Mach-O files (recursively calling the Mach-O loading routine); resolve internal and external pointer references such as strings and functions; load methods in categories; load C++ static objects, call ObjC + load methods, and run C functions declared with __attribute__((constructor));
  • Program execution: call main(); call UIApplicationMain(); call applicationWillFinishLaunching();

Pre-Main phase

Main Stage

2.1 Load Dylib

For each dynamic library it loads, dyld needs to:

  • Resolve the dynamic libraries it depends on
  • Locate the Mach-O file of the dynamic library
  • Open the file
  • Verify the file
  • Register the file signature with the kernel
  • Call mmap() on each segment of the dynamic library

Optimize:

  • Reduce dependency on non-system libraries
  • Use static libraries instead of dynamic libraries
  • Merge non-system dynamic libraries into one dynamic library

2.2 Rebase && Binding

Optimize:

  • Reduce the number of ObjC classes and selectors, and delete unused classes and functions
  • Reduce the number of C++ virtual functions
  • Switch to Swift structs (in essence, this reduces the number of symbols)

2.3 Initializers

Optimize:

  • Use +initialize instead of +load
  • Instead of explicitly marking a function as an initializer with __attribute__((constructor)), run the initialization at the point where it is first needed, for example with dispatch_once(), pthread_once() or std::call_once(). In other words, initialize on first use so that part of the work is deferred, and try not to use C++ static objects
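Initialization on first use with pthread_once() can be sketched in plain C as follows. The names config_init, config_get and the counter are hypothetical and exist only for illustration; the point is that the setup runs when the value is first requested rather than before main().

```c
#include <pthread.h>

/* Hypothetical expensive setup that would otherwise run in a
 * __attribute__((constructor)) function before main(). */
static int g_config_value;
static int g_init_calls;               /* counts how often setup really ran */
static pthread_once_t g_config_once = PTHREAD_ONCE_INIT;

static void config_init(void)
{
    g_init_calls++;
    g_config_value = 42;               /* stand-in for the real work */
}

/* Accessor: the first caller pays the initialization cost; later callers
 * (from any thread) only pay for the pthread_once check. */
static int config_get(void)
{
    pthread_once(&g_config_once, config_init);
    return g_config_value;
}
```

Calling config_get() any number of times runs config_init() exactly once, and nothing runs at load time.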
2.4 pre-main stage influencing factors

  • The more dynamic libraries are loaded, the slower the startup.
  • The more ObjC classes and functions there are, the slower the startup.
  • The larger the executable, the slower the startup.
  • The more C constructor functions there are, the slower the startup.
  • The more C++ static objects there are, the slower the startup.
  • The more ObjC +load methods there are, the slower the startup.

Optimization measures:

  • Reduce dependency on unnecessary libraries, whether dynamic or static; convert dynamic libraries to static libraries where possible; and if dynamic libraries are unavoidable, merge multiple non-system dynamic libraries into one
  • Check whether each framework should be set to optional or required: if the framework exists in every iOS version the app currently supports, set it to required; otherwise set it to optional. Optional incurs some additional checks
  • Merge or delete some ObjC classes and functions. To clean up unused classes in a project, AppCode's code inspection can find classes that are not used in the current project (you can also analyze the linkmap file, but that is less accurate). There is also an open source project called FUI that does a good job of finding classes that are no longer used, with high accuracy. Its only limitation is that it cannot handle classes provided by dynamic or static libraries, nor C++ class templates.
  • Delete some useless static variables
  • Delete methods that are never called or are obsolete
  • Move work that must be done from +load to +initialize, and try not to use C++ virtual functions (creating virtual function tables has a cost)
  • Class and method names should not be too long: in iOS every class and method name is stored as a string in the __cstring section, so the length of class and method names also affects the size of the executable. Because Objective-C is dynamic and looks up classes and methods by name through reflection when invoking them, its object model keeps the class and method name strings.
  • Replace all __attribute__((constructor)) functions, C++ static object initialization, and ObjC +load methods with dispatch_once();
  • Compressing images as far as the designer will accept can have a surprisingly large effect. Why does compressing images speed up startup? Because it is normal to load a dozen or two images of various sizes at startup; the smaller the images, the less I/O there is and the faster the startup. TinyPNG is a fairly reliable compression tool.

2.5 main phase optimization

  • Reduce the amount of work done in the startup flow. Lazy-load whatever can be lazy-loaded, initialize in the background whatever can be initialized in the background, and defer whatever can be deferred. Do not block the main thread during startup, and delete the code of retired features outright
  • Optimize code logic. Remove unnecessary logic and code to reduce the time each step takes
  • Use multithreading for initialization during startup to make full use of the CPU
  • Use pure code instead of xib or storyboard to describe the UI, especially the main UI framework such as the TabBarController. A xib or storyboard still has to be parsed into code before the page can be rendered, which is one extra step.

3. CPU Usage Monitoring

1. CPU architecture

CPU stands for Central Processing Unit. The mainstream architectures on the market include ARM (arm64), Intel (x86), AMD, and so on. Intel uses CISC (Complex Instruction Set Computer) and ARM uses RISC (Reduced Instruction Set Computer). The difference lies in their CPU design philosophies and approaches.

Early CPUs were all CISC designs, which aim to accomplish a computing task with as few machine instructions as possible. Take multiplication on a CISC CPU: a single instruction, MUL ADDRA, ADDRB, multiplies the values stored at memory addresses ADDRA and ADDRB and stores the result in ADDRA. Reading the data at ADDRA and ADDRB into registers and writing the product back to memory are all handled inside the CPU, so the CISC architecture increases the complexity of the CPU and the demands on the CPU manufacturing process.

The RISC architecture requires software to specify each step of an operation. For example, the multiplication above is implemented as MOVE A, ADDRA; MOVE B, ADDRB; MUL A, B; STR ADDRA, A;. This architecture reduces CPU complexity and allows more powerful CPUs to be produced at the same process level, but it places higher demands on compiler design.

Today most iPhones are based on the arm64 architecture. In addition, the ARM architecture consumes less power.

2. Get thread information <a name="threadInfo"></a>

With the architectures distinguished, here is how to monitor CPU usage:

  • Start a timer and run the following logic repeatedly at the configured interval
  • Get the current task, then get all thread information (thread count, thread array) from it
  • Traverse all threads to check whether any thread's CPU usage exceeds the configured threshold
  • Dump the stack if a thread's usage exceeds the threshold
  • Assemble and report the data

Thread Information Structures

struct thread_basic_info {
	time_value_t    user_time;      /* user run time */
	time_value_t    system_time;    /* system run time */
	integer_t       cpu_usage;      /* scaled cpu usage percentage (upper bound 1000) */
	policy_t        policy;         /* scheduling policy in effect */
	integer_t       run_state;      /* run state (see below) */
	integer_t       flags;          /* various flags */
	integer_t       suspend_count;  /* suspend count for thread */
	integer_t       sleep_time;     /* number of seconds that thread
	                                 *  has been sleeping */
};
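The cpu_usage field is scaled to an upper bound of 1000 rather than 100, so it has to be converted before it is compared against a percentage threshold. The following is a minimal, portable sketch of the threshold scan described in the steps above. The cmm_thread_sample type and cmm_count_busy_threads function are hypothetical; on a device the samples would be filled from thread_basic_info.

```c
#include <stddef.h>

#define CMM_USAGE_SCALE 1000   /* mirrors TH_USAGE_SCALE in <mach/thread_info.h> */

/* Hypothetical per-thread sample; on a device these fields would be
 * copied out of thread_basic_info. */
typedef struct {
    int cpu_usage;   /* scaled: 0..CMM_USAGE_SCALE */
    int is_idle;     /* non-zero if TH_FLAGS_IDLE was set */
} cmm_thread_sample;

/* Returns how many non-idle threads exceed threshold_percent. */
static size_t cmm_count_busy_threads(const cmm_thread_sample *samples,
                                     size_t count, int threshold_percent)
{
    size_t busy = 0;
    for (size_t i = 0; i < count; i++) {
        if (samples[i].is_idle) {
            continue;               /* idle threads are not interesting */
        }
        /* Convert the 0..1000 scale to a whole percentage. */
        int percent = samples[i].cpu_usage * 100 / CMM_USAGE_SCALE;
        if (percent > threshold_percent) {
            busy++;
        }
    }
    return busy;
}
```

A monitor built this way would dump the stacks and report only when the returned count is non-zero.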

This code was covered in the stack restoration section; if you have forgotten it, refer to the analysis above.

thread_act_array_t threads;
mach_msg_type_number_t threadCount = 0;
const task_t thisTask = mach_task_self();
kern_return_t kr = task_threads(thisTask, &threads, &threadCount);
if (kr != KERN_SUCCESS) {
    return ;
}
for (mach_msg_type_number_t i = 0; i < threadCount; i++) {
    thread_info_data_t threadInfo;
    thread_basic_info_t threadBaseInfo;
    // thread_info reads this as the buffer capacity, so it must be initialized
    mach_msg_type_number_t threadInfoCount = THREAD_INFO_MAX;
    
    kern_return_t kr = thread_info((thread_inspect_t)threads[i], THREAD_BASIC_INFO, (thread_info_t)threadInfo, &threadInfoCount);
    
    if (kr == KERN_SUCCESS) {
        
        threadBaseInfo = (thread_basic_info_t)threadInfo;
        // Skip idle threads (TH_FLAGS_IDLE)
        if (!(threadBaseInfo->flags & TH_FLAGS_IDLE)) {
            integer_t cpuUsage = threadBaseInfo->cpu_usage / 10;
            if (cpuUsage > CPUMONITORRATE) {
                
                NSMutableDictionary *CPUMetaDictionary = [NSMutableDictionary dictionary];
                NSData *CPUPayloadData = [NSData data];
                
                NSString *backtraceOfAllThread = [BacktraceLogger backtraceOfAllThread];
                // 1. Assemble the meta information for the CPU report
                CPUMetaDictionary[@"MONITOR_TYPE"] = CMMonitorCPUType;
            
                // 2. Assemble the payload for the CPU report (a JSON object whose key is the agreed STACK_TRACE and whose value is the base64-encoded stack information)
                NSData *CPUData = [SAFE_STRING(backtraceOfAllThread) dataUsingEncoding:NSUTF8StringEncoding];
                NSString *CPUDataBase64String = [CPUData base64EncodedStringWithOptions:0];
                NSDictionary *CPUPayloadDictionary = @{@"STACK_TRACE": SAFE_STRING(CPUDataBase64String)};
                
                NSError *error;
                // NSJSONWritingOptions must be 0: the server splits its input on \n, so the generated JSON string must not contain \n
                NSData *parsedData = [NSJSONSerialization dataWithJSONObject:CPUPayloadDictionary options:0 error:&error];
                if (error) {
                    CMMLog(@"%@", error);
                    return;
                }
                CPUPayloadData = [parsedData copy];
                
                // 3. Data reporting is described in [Creating powerful, flexible and configurable data reporting components](./1.80.md)
                [[PrismClient sharedInstance] sendWithType:CMMonitorCPUType meta:CPUMetaDictionary payload:CPUPayloadData]; 
            }
        }
    }
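}
// task_threads allocates the thread array in our address space; release it when done
vm_deallocate(thisTask, (vm_address_t)threads, sizeof(thread_t) * threadCount);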

4. OOM Issues

1. Basic knowledge preparation

Hard disk: also known as a disk, used to store data. The songs, pictures and videos you save all live on the hard disk.

Memory: because the hard disk is slow to read, efficiency would suffer greatly if the CPU read all its data directly from the hard disk while running a program. So the data needed to run the program is first read from the hard disk into memory, and the CPU then computes on and exchanges data with memory. Memory is volatile storage (its contents disappear when power is lost). RAM modules sit inside the computer (on the motherboard) and hold the intermediate data and results of CPU operations. Memory is the bridge between programs and the CPU: data is read from the hard disk, or programs are run, and supplied to the CPU.

Virtual memory is a memory management technique in computer systems. It lets a program believe it has contiguous available memory, while in reality that memory is usually split into multiple physical memory fragments, some of which may be temporarily stored on external disk storage (and swapped back into memory when needed). Windows calls this "virtual memory"; Linux/Unix calls it "swap space".

Does iOS not support swap space? Not only iOS: most mobile systems do not support it. Mass storage on a mobile device is usually flash memory, which is limited in capacity and in read/write performance and endurance. Even if a phone used swap space, the flash storage would hold it back rather than improve performance, so swap space is not used.

2. iOS memory knowledge

Memory (RAM), like the CPU, is one of the scarcest resources in the system and is easily contended for. An application's memory use is directly related to its performance. iOS has no swap space as a fallback, so memory is particularly important.

What is OOM? It is short for out-of-memory, which literally means the memory limit has been exceeded. It is divided into FOOM (foreground OOM) and BOOM (background OOM). It is an atypical crash caused by the iOS Jetsam mechanism and cannot be captured by monitoring schemes based on signals.

What is the Jetsam mechanism? It can be understood as a management mechanism the system uses to keep memory from being overused. Jetsam runs on its own, separately from app processes; every process has a memory threshold, and once a process exceeds it Jetsam kills that process immediately.

Why design the Jetsam mechanism? Because device memory is limited, memory is a precious resource, and system processes and other running apps compete for it. Since iOS does not support swap space, once a low-memory event is triggered Jetsam frees as much memory as possible. This is why, when iOS runs short of memory, an app gets killed by the system, which shows up as a crash.

Two scenarios trigger OOM: overall system memory usage is high, so the system kills lower-priority apps according to its priority policy; or the current app reaches its "high water mark" (it exceeds the system's memory limit for a single app), and the system kills the current app.

Reading the source (xnu/bsd/kern/kern_memorystatus.c) shows that there are two memory-kill mechanisms, as follows:

highwater processing -> our app may not use more memory than the single-process limit

  1. Loop over the processes in the priority list
  2. Check whether the process meets the p_memstat_memlimit restriction
  3. DiagonoseActive and FREEZE filtering
  4. Kill the process; exit if successful, otherwise continue the loop

memorystatus_act_aggressive processing -> memory usage is high, kill by priority

  1. Load jld_bucket_count according to the policy; it is used to decide whether a process gets killed
  2. Start killing from JETSAM_PRIORITY_ELEVATED_INACTIVE
  3. old_bucket_count and memorystatus_jld_eval_period_msecs decide whether killing starts
  4. Kill from low to high priority until memorystatus_avail_pages_below_pressure

Several cases of excessive memory

  • The app uses little memory and other apps also manage memory well, so even after switching to another app, our app stays "alive" and keeps the user's state. The experience is good.
  • The app uses little memory, but other apps consume too much (either through poor memory management or because they are inherently resource-hungry, such as games). Then every app except the foreground one is killed by the system, which reclaims the memory for the active process.
  • The app uses a lot of memory. After switching to another app, even if that app requests little memory, the system kills the apps that use the most memory first because memory is short. When the user comes back after leaving the app in the background, the app has to reload and start again.
  • The app uses so much memory that the system kills it while it is running in the foreground, which the user sees as a sudden crash.

When memory runs short, the system follows a strategy to free up more room. A common approach is to move some low-priority data to disk, which is called page out. When that data is accessed again, the system is responsible for moving it back into memory, which is called page in.

A memory page is the smallest unit of memory management and is allocated by the system. A page may hold several objects, and a large object may span several pages. A page is usually 16KB in size, and there are three types of pages.

  • Clean Memory

    Clean memory consists of three categories: memory that can be paged out, memory-mapped files, and frameworks used by the app (each framework has a __DATA_CONST segment, which is usually clean but becomes dirty when runtime swizzling is used).

    Freshly allocated pages start out clean (except for object allocations on the heap) and become dirty once the app writes data to them. Files read into memory from disk are also read-only, clean pages.

  • Dirty Memory

    Dirty memory consists of four categories: memory the app has written data to, all heap-allocated objects, image decoding buffers, and frameworks (a framework's __DATA and __DATA_DIRTY segments are dirty memory).

    Using frameworks produces dirty memory. Using singletons or global initializers can help reduce dirty memory (a singleton is not destroyed once created and stays in memory, and the system does not count it as dirty memory).

  • Compressed Memory

    Because of the capacity and read/write limitations of flash storage, iOS has no swap space mechanism; instead, the memory compressor was introduced in iOS 7. When memory is tight, it compresses memory objects that have not been used recently, which frees up more pages. The memory compressor decompresses them again when they are needed. This saves memory and improves responsiveness.

    For example, an app uses a framework that stores data in an internal NSDictionary property occupying 3 pages of memory. When the dictionary has not been accessed recently, the memory compressor compresses it to 1 page; when it is used again, it is restored to 3 pages.

App running memory = pageNumbers * pageSize. Because compressed memory counts as dirty memory, Memory footprint = dirtySize + CompressedSize.
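As a small worked illustration of the two formulas, the footprint can be computed from page counts. This is only a sketch: cmm_footprint_bytes is a hypothetical helper, and the page counts are inputs here rather than values read from the system.

```c
#include <stdint.h>

/* Footprint in bytes for the given page counts. Compressed pages are
 * counted on top of dirty pages, following the formula in the text. */
static uint64_t cmm_footprint_bytes(uint64_t dirty_pages,
                                    uint64_t compressed_pages,
                                    uint64_t page_size)
{
    return (dirty_pages + compressed_pages) * page_size;
}
```

With 16KB pages, 3 dirty pages plus 1 compressed page give a footprint of 4 * 16384 = 65536 bytes.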

Memory limits differ between devices. The limit is higher for apps and lower for extensions, and exceeding it results in a crash with EXC_RESOURCE_EXCEPTION.

Next, let's look at how to obtain the memory limit and how to detect that the app was force-killed for using too much memory.

3. Get memory information

3.1 Calculate memory limits from the JetsamEvent log

When an app is killed by the Jetsam mechanism, the phone generates a system log. It can be found under Settings > Privacy > Analytics & Improvements > Analytics Data, where you can see logs whose names start with JetsamEvent, such as JetsamEvent-2020-03-14-161828.ips. These JetsamEvent logs are left behind when the iOS kernel forcibly kills apps that have a low priority (idle, frontmost, suspended) and whose memory use exceeds the system limit.

The log contains the app's memory information. There is a pageSize field near the top of the log; find the process entry whose reason is per-process-limit and read its rpages field. The OOM threshold is then rpages * pageSize.

The largestProcess field in the log gives the name of the app; the reason field gives the reason the memory was reclaimed; and the states field gives the app's state at the time (idle, suspended, frontmost...).

To test the accuracy of the data, I quit all apps completely on two devices (iPhone 6s plus/13.3.1, iPhone 11 Pro/13.3.1) and ran only a demo app to test the memory threshold. The app requests memory in a loop; the ViewController code is as follows:

- (void)viewDidLoad {
    [super viewDidLoad];
    NSMutableArray *array = [NSMutableArray array];
    for (NSInteger index = 0; index < 10000000; index++) {
        UIImageView *imageView = [[UIImageView alloc] initWithFrame:CGRectMake(0, 0, 100, 100)];
        UIImage *image = [UIImage imageNamed:@"AppIcon"];
        imageView.image = image;
        [array addObject:imageView];
    }
}

The data for the iPhone 6s plus/13.3.1 are as follows:

{"bug_type":"298","timestamp":"2020-03-19 17:23:45.94 +0800","os_version":"iPhone OS 13.3.1 (17D50)","incident_id":"DA8AF66D-24E8-458C-8734-981866942168"}
{
  "crashReporterKey" : "fc9b659ce486df1ed1b8062d5c7c977a7eb8c851",
  "kernel" : "Darwin Kernel Version 19.3.0: Thu Jan  9 21:10:44 PST 2020; root:xnu-6153.82.3~1\/RELEASE_ARM64_S8000",
  "product" : "iPhone8,2",
  "incident" : "DA8AF66D-24E8-458C-8734-981866942168",
  "date" : "2020-03-19 17:23:45.93 +0800",
  "build" : "iPhone OS 13.3.1 (17D50)",
  "timeDelta" : 332,
  "memoryStatus" : {
  "compressorSize" : 48499,
  "compressions" : 7458651,
  "decompressions" : 5190200,
  "zoneMapCap" : 744407040,
  "largestZone" : "APFS_4K_OBJS",
  "largestZoneSize" : 41402368,
  "pageSize" : 16384,
  "uncompressed" : 104065,
  "zoneMapSize" : 141606912,
  "memoryPages" : {
    "active" : 26214,
    "throttled" : 0,
    "fileBacked" : 14903,
    "wired" : 20019,
    "anonymous" : 37140,
    "purgeable" : 142,
    "inactive" : 23669,
    "free" : 2967,
    "speculative" : 2160
  }
},
  "largestProcess" : "Test",
  "genCounter" : 0,
  "processes" : [
  {
    "uuid" : "39c5738b-b321-3865-a731-68064c4f7a6f",
    "states" : [
      "daemon",
      "idle"
    ],
    "lifetimeMax" : 188,
    "age" : 948223699030,
    "purgeable" : 0,
    "fds" : 25,
    "coalition" : 422,
    "rpages" : 177,
    "pid" : 282,
    "idleDelta" : 824711280,
    "name" : "com.apple.Safari.SafeBrowsing.Se",
    "cpuTime" : 10.275422000000001
  },
  // ...
  {
    "uuid" : "83dbf121-7c0c-3ab5-9b66-77ee926e1561",
    "states" : [
      "frontmost"
    ],
    "killDelta" : 2592,
    "genCount" : 0,
    "age" : 1531004794,
    "purgeable" : 0,
    "fds" : 50,
    "coalition" : 1047,
    "rpages" : 92806,
    "reason" : "per-process-limit",
    "pid" : 2384,
    "cpuTime" : 59.464373999999999,
    "name" : "Test",
    "lifetimeMax" : 92806
  },
  // ...
 ]
}

The OOM critical value for iPhone 6s plus/13.3.1 mobile phone is: (16384*92806)/(1024*1024)=1450.09375M

The data for iPhone 11 Pro/13.3.1 are as follows:

{"bug_type":"298","timestamp":"2020-03-19 17:30:28.39 +0800","os_version":"iPhone OS 13.3.1 (17D50)","incident_id":"7F111601-BC7A-4BD7-A468-CE3370053057"}
{
  "crashReporterKey" : "bc2445adc164c399b330f812a48248e029e26276",
  "kernel" : "Darwin Kernel Version 19.3.0: Thu Jan  9 21:11:10 PST 2020; root:xnu-6153.82.3~1\/RELEASE_ARM64_T8030",
  "product" : "iPhone12,3",
  "incident" : "7F111601-BC7A-4BD7-A468-CE3370053057",
  "date" : "2020-03-19 17:30:28.39 +0800",
  "build" : "iPhone OS 13.3.1 (17D50)",
  "timeDelta" : 189,
  "memoryStatus" : {
  "compressorSize" : 66443,
  "compressions" : 25498129,
  "decompressions" : 15532621,
  "zoneMapCap" : 1395015680,
  "largestZone" : "APFS_4K_OBJS",
  "largestZoneSize" : 41222144,
  "pageSize" : 16384,
  "uncompressed" : 127027,
  "zoneMapSize" : 169639936,
  "memoryPages" : {
    "active" : 58652,
    "throttled" : 0,
    "fileBacked" : 20291,
    "wired" : 45838,
    "anonymous" : 96445,
    "purgeable" : 4,
    "inactive" : 54368,
    "free" : 5461,
    "speculative" : 3716
  }
},
  "largestProcess" : "Hangzhou Xiaoliu",
  "genCounter" : 0,
  "processes" : [
  {
    "uuid" : "2dd5eb1e-fd31-36c2-99d9-bcbff44efbb7",
    "states" : [
      "daemon",
      "idle"
    ],
    "lifetimeMax" : 171,
    "age" : 5151034269954,
    "purgeable" : 0,
    "fds" : 50,
    "coalition" : 66,
    "rpages" : 164,
    "pid" : 11276,
    "idleDelta" : 3801132318,
    "name" : "wcd",
    "cpuTime" : 3.430787
  },
  // ...
  {
    "uuid" : "63158edc-915f-3a2b-975c-0e0ac4ed44c0",
    "states" : [
      "frontmost"
    ],
    "killDelta" : 4345,
    "genCount" : 0,
    "age" : 654480778,
    "purgeable" : 0,
    "fds" : 50,
    "coalition" : 1718,
    "rpages" : 134278,
    "reason" : "per-process-limit",
    "pid" : 14206,
    "cpuTime" : 23.955463999999999,
    "name" : "Hangzhou Xiaoliu",
    "lifetimeMax" : 134278
  },
  // ...
 ]
}

The OOM critical value for iPhone 11 Pro/13.3.1 mobile phone is: (16384*134278)/(1024*1024)=2098.09375M
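The two calculations above follow the same rule, which can be written as a small helper. This is a sketch under the assumption that rpages and pageSize have already been extracted from the log; cmm_jetsam_limit_mib is a hypothetical name, and parsing the .ips file is out of scope here.

```c
#include <stdint.h>

/* Per-process memory limit in MiB from a JetsamEvent entry:
 * rpages is the resident page count of the killed process,
 * page_size is the top-level "pageSize" field (in bytes). */
static double cmm_jetsam_limit_mib(uint64_t rpages, uint64_t page_size)
{
    return (double)(rpages * page_size) / (1024.0 * 1024.0);
}
```

Feeding in the values from the two logs (92806 and 134278 pages of 16384 bytes) reproduces 1450.09375 and 2098.09375.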

How does the iOS system discover Jetsam?

macOS/iOS is a BSD-derived system whose kernel is Mach, but the interfaces exposed to upper layers are generally BSD-layer wrappers around Mach. Mach is a microkernel architecture in which the real virtual memory management takes place, while BSD provides the upper-level interface for memory management. Jetsam events are also generated by BSD. The bsd_init function is the entry point, where essentially every subsystem is initialized, such as virtual memory management.

// 1. Initialize the kernel memory allocator; this initializes the BSD memory zones, which are built on top of Mach kernel zones
kmeminit();

// 2. Initialize background freezing, an iOS-specific feature: a resident monitoring thread that freezes memory and processes
#if CONFIG_FREEZE
#ifndef CONFIG_MEMORYSTATUS
    #error "CONFIG_FREEZE defined without matching CONFIG_MEMORYSTATUS"
#endif
    /* Initialise background freezing */
    bsd_init_kprintf("calling memorystatus_freeze_init\n");
    memorystatus_freeze_init();
#endif

// 3. iOS-specific: Jetsam (a resident monitoring thread for low-memory events)
#if CONFIG_MEMORYSTATUS
    /* Initialize kernel memory status notifications */
    bsd_init_kprintf("calling memorystatus_init\n");
    memorystatus_init();
#endif /* CONFIG_MEMORYSTATUS */

Its main job is to start two highest-priority threads that monitor the memory condition of the whole system.

When CONFIG_FREEZE is enabled, the kernel freezes processes rather than killing them. Freezing is performed by the kernel's memorystatus_freeze_thread, which calls memorystatus_freeze_top_process to do the freezing when it receives a signal.

iOS starts a highest-priority thread, vm_pressure_monitor, to monitor the system's memory pressure, and maintains all app processes in a stack. iOS also maintains a memory snapshot table that records the memory page consumption of each process. The Jetsam (memorystatus) logic can be found in kern_memorystatus.h and kern_memorystatus.c in the XNU project.

Before iOS kills an app for excessive memory use, there are at least 6 seconds available for priority evaluation, and the JetsamEvent log is also generated within those 6 seconds.

As mentioned above, iOS has no swap space, which is why the MemoryStatus mechanism (also known as Jetsam) was introduced. It frees as much memory as possible for the current app to use. In terms of priority, it force-kills background applications first; if memory is still insufficient, it force-kills the current application. On macOS, MemoryStatus only kills processes marked as idle-exit.

The MemoryStatus mechanism starts a thread named memorystatus_jetsam_thread, which is only responsible for killing apps and writing logs. It does not send notifications, so the memory pressure monitoring thread cannot learn that an app has been killed.

When the monitoring thread finds that an app is under memory pressure, it notifies that app, which runs its didReceiveMemoryWarning delegate method. At that point we still have a chance to release memory, which may keep the app from being killed by the system.

Looking at the problem from the source code

The iOS kernel has an array dedicated to maintaining priorities. Each element of the array is a structure that contains a linked list of processes. The structure is as follows:

#define MEMSTAT_BUCKET_COUNT (JETSAM_PRIORITY_MAX + 1)

typedef struct memstat_bucket {
    TAILQ_HEAD(, proc) list;
    int count;
} memstat_bucket_t;

memstat_bucket_t memstat_bucket[MEMSTAT_BUCKET_COUNT];

The priority information can be seen in kern_memorystatus.h:

#define JETSAM_PRIORITY_IDLE_HEAD                -2
/* The value -1 is an alias to JETSAM_PRIORITY_DEFAULT */
#define JETSAM_PRIORITY_IDLE                      0
#define JETSAM_PRIORITY_IDLE_DEFERRED		  1 /* Keeping this around till all xnu_quick_tests can be moved away from it.*/
#define JETSAM_PRIORITY_AGING_BAND1		  JETSAM_PRIORITY_IDLE_DEFERRED
#define JETSAM_PRIORITY_BACKGROUND_OPPORTUNISTIC  2
#define JETSAM_PRIORITY_AGING_BAND2		  JETSAM_PRIORITY_BACKGROUND_OPPORTUNISTIC
#define JETSAM_PRIORITY_BACKGROUND                3
#define JETSAM_PRIORITY_ELEVATED_INACTIVE	  JETSAM_PRIORITY_BACKGROUND
#define JETSAM_PRIORITY_MAIL                      4
#define JETSAM_PRIORITY_PHONE                     5
#define JETSAM_PRIORITY_UI_SUPPORT                8
#define JETSAM_PRIORITY_FOREGROUND_SUPPORT        9
#define JETSAM_PRIORITY_FOREGROUND               10
#define JETSAM_PRIORITY_AUDIO_AND_ACCESSORY      12
#define JETSAM_PRIORITY_CONDUCTOR                13
#define JETSAM_PRIORITY_HOME                     16
#define JETSAM_PRIORITY_EXECUTIVE                17
#define JETSAM_PRIORITY_IMPORTANT                18
#define JETSAM_PRIORITY_CRITICAL                 19

#define JETSAM_PRIORITY_MAX                      21

It is clear that the background app priority JETSAM_PRIORITY_BACKGROUND is 3 and the foreground app priority JETSAM_PRIORITY_FOREGROUND is 10.
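To make the effect of these bands concrete, here is a simplified, portable sketch of choosing the next process to kill by scanning the buckets from the lowest priority upward. It is not the XNU implementation: the cmm_ types and functions are hypothetical stand-ins, and a plain singly linked list replaces the kernel's TAILQ.

```c
#include <stddef.h>

#define CMM_PRIORITY_MAX 21                     /* JETSAM_PRIORITY_MAX */
#define CMM_BUCKET_COUNT (CMM_PRIORITY_MAX + 1)

/* Simplified stand-in for struct proc: only what the sketch needs. */
typedef struct cmm_proc {
    int pid;
    struct cmm_proc *next;
} cmm_proc;

/* One singly linked list of processes per priority band. */
typedef struct {
    cmm_proc *head;
    int count;
} cmm_bucket;

/* Put a process at the head of its band; out-of-range bands are ignored. */
static void cmm_bucket_add(cmm_bucket *buckets, int priority, cmm_proc *p)
{
    if (priority < 0 || priority >= CMM_BUCKET_COUNT) {
        return;
    }
    p->next = buckets[priority].head;
    buckets[priority].head = p;
    buckets[priority].count++;
}

/* Walk the bands from lowest to highest priority and return the first
 * process found, i.e. the next kill candidate; NULL if all are empty. */
static const cmm_proc *cmm_next_victim(const cmm_bucket *buckets)
{
    for (int band = 0; band < CMM_BUCKET_COUNT; band++) {
        if (buckets[band].head != NULL) {
            return buckets[band].head;
        }
    }
    return NULL;
}
```

With one process in band 10 (foreground) and one in band 3 (background), the scan returns the background process first, which matches the kill order described earlier.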

The priority rules are: kernel thread priority > operating system priority > app priority. Foreground apps have a higher priority than background apps, and among threads with the same priority, the one using more CPU is given a lower priority.

Possible reasons for OOM can be seen in kern_memorystatus.c:

/* For logging clarity */
static const char *memorystatus_kill_cause_name[] = {
	""								,		/* kMemorystatusInvalid							*/
	"jettisoned"					,		/* kMemorystatusKilled							*/
	"highwater"						,		/* kMemorystatusKilledHiwat						*/
	"vnode-limit"					,		/* kMemorystatusKilledVnodes					*/
	"vm-pageshortage"				,		/* kMemorystatusKilledVMPageShortage			*/
	"proc-thrashing"				,		/* kMemorystatusKilledProcThrashing				*/
	"fc-thrashing"					,		/* kMemorystatusKilledFCThrashing				*/
	"per-process-limit"				,		/* kMemorystatusKilledPerProcessLimit			*/
	"disk-space-shortage"			,		/* kMemorystatusKilledDiskSpaceShortage			*/
	"idle-exit"						,		/* kMemorystatusKilledIdleExit					*/
	"zone-map-exhaustion"			,		/* kMemorystatusKilledZoneMapExhaustion			*/
	"vm-compressor-thrashing"		,		/* kMemorystatusKilledVMCompressorThrashing		*/
	"vm-compressor-space-shortage"	,		/* kMemorystatusKilledVMCompressorSpaceShortage	*/
};

The key code in memorystatus_init that initializes the Jetsam threads:

__private_extern__ void
memorystatus_init(void)
{
	// ...
  /* Initialize the jetsam_threads state array */
	jetsam_threads = kalloc(sizeof(struct jetsam_thread_state) * max_jetsam_threads);
  
	/* Initialize all the jetsam threads */
	for (i = 0; i < max_jetsam_threads; i++) {

		result = kernel_thread_start_priority(memorystatus_thread, NULL, 95 /* MAXPRI_KERNEL */, &jetsam_threads[i].thread);
		if (result == KERN_SUCCESS) {
			jetsam_threads[i].inited = FALSE;
			jetsam_threads[i].index = i;
			thread_deallocate(jetsam_threads[i].thread);
		} else {
			panic("Could not create memorystatus_thread %d", i);
		}
	}
}
/*
 *	High-level priority assignments
 *
 *************************************************************************
 * 127		Reserved (real-time)
 *				A
 *				+
 *			(32 levels)
 *				+
 *				V
 * 96		Reserved (real-time)
 * 95		Kernel mode only
 *				A
 *				+
 *			(16 levels)
 *				+
 *				V
 * 80		Kernel mode only
 * 79		System high priority
 *				A
 *				+
 *			(16 levels)
 *				+
 *				V
 * 64		System high priority
 * 63		Elevated priorities
 *				A
 *				+
 *			(12 levels)
 *				+
 *				V
 * 52		Elevated priorities
 * 51		Elevated priorities (incl. BSD +nice)
 *				A
 *				+
 *			(20 levels)
 *				+
 *				V
 * 32		Elevated priorities (incl. BSD +nice)
 * 31		Default (default base for threads)
 * 30		Lowered priorities (incl. BSD -nice)
 *				A
 *				+
 *			(20 levels)
 *				+
 *				V
 * 11		Lowered priorities (incl. BSD -nice)
 * 10		Lowered priorities (aged pri's)
 *				A
 *				+
 *			(11 levels)
 *				+
 *				V
 * 0		Lowered priorities (aged pri's / idle)
 *************************************************************************
 */

You can see that threads of user-mode applications cannot be given a higher priority than operating system and kernel threads. Thread priority assignments also differ between user-mode applications; for example, a foreground application has a higher priority than a background one. The application with the highest priority on iOS is SpringBoard. In addition, a thread's priority is not fixed: Mach adjusts it dynamically based on the thread's utilization and the overall system load. A thread's priority is lowered if it consumes too much CPU and raised if it is starved of CPU. However it changes, a program cannot leave the priority range its threads belong to.

You can see that, based on the kernel boot parameters and device performance, the system starts max_jetsam_threads jetsam threads (1 in general, 3 in special cases) with a priority of 95, that is, MAXPRI_KERNEL. (Note that 95 here is a thread priority, and XNU thread priorities range from 0 to 127. The macro definitions above are process priorities, with a range of -2 to 19.)
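
To make the priority table above easier to check against a concrete value such as 95, here is a minimal sketch that maps a thread priority to the band named in the comment. The helper priority_band_name is hypothetical and not part of XNU; it only restates the ranges listed in the comment.

```c
#include <stddef.h>

/* Hypothetical helper (not an XNU function): maps a thread priority in the
 * 0-127 range to the band named in the "High-level priority assignments"
 * comment. Returns NULL for values outside that range. */
static const char *priority_band_name(int priority)
{
    if (priority < 0 || priority > 127) return NULL;
    if (priority >= 96) return "Reserved (real-time)";
    if (priority >= 80) return "Kernel mode only";
    if (priority >= 64) return "System high priority";
    if (priority >= 52) return "Elevated priorities";
    if (priority >= 32) return "Elevated priorities (incl. BSD +nice)";
    if (priority == 31) return "Default (default base for threads)";
    if (priority >= 11) return "Lowered priorities (incl. BSD -nice)";
    return "Lowered priorities (aged pri's)";
}
```

Calling priority_band_name(95) yields the "Kernel mode only" band, which is where the jetsam threads sit.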

Next, let's analyze the memorystatus_thread function, which is what each jetsam thread runs once it has been started.

static void
memorystatus_thread(void *param __unused, wait_result_t wr __unused)
{
  //...
  while (memorystatus_action_needed()) {
		boolean_t killed;
		int32_t priority;
		uint32_t cause;
		uint64_t jetsam_reason_code = JETSAM_REASON_INVALID;
		os_reason_t jetsam_reason = OS_REASON_NULL;

		cause = kill_under_pressure_cause;
		switch (cause) {
			case kMemorystatusKilledFCThrashing:
				jetsam_reason_code = JETSAM_REASON_MEMORY_FCTHRASHING;
				break;
			case kMemorystatusKilledVMCompressorThrashing:
				jetsam_reason_code = JETSAM_REASON_MEMORY_VMCOMPRESSOR_THRASHING;
				break;
			case kMemorystatusKilledVMCompressorSpaceShortage:
				jetsam_reason_code = JETSAM_REASON_MEMORY_VMCOMPRESSOR_SPACE_SHORTAGE;
				break;
			case kMemorystatusKilledZoneMapExhaustion:
				jetsam_reason_code = JETSAM_REASON_ZONE_MAP_EXHAUSTION;
				break;
			case kMemorystatusKilledVMPageShortage:
				/* falls through */
			default:
				jetsam_reason_code = JETSAM_REASON_MEMORY_VMPAGESHORTAGE;
				cause = kMemorystatusKilledVMPageShortage;
				break;
		}

		/* Highwater */
		boolean_t is_critical = TRUE;
		if (memorystatus_act_on_hiwat_processes(&errors, &hwm_kill, &post_snapshot, &is_critical)) {
			if (is_critical == FALSE) {
				/*
				 * For now, don't kill any other processes.
				 */
				break;
			} else {
				goto done;
			}
		}

		jetsam_reason = os_reason_create(OS_REASON_JETSAM, jetsam_reason_code);
		if (jetsam_reason == OS_REASON_NULL) {
			printf("memorystatus_thread: failed to allocate jetsam reason\n");
		}

		if (memorystatus_act_aggressive(cause, jetsam_reason, &jld_idle_kills, &corpse_list_purged, &post_snapshot)) {
			goto done;
		}

		/*
		 * memorystatus_kill_top_process() drops a reference,
		 * so take another one so we can continue to use this exit reason
		 * even after it returns
		 */
		os_reason_ref(jetsam_reason);

		/* LRU */
		killed = memorystatus_kill_top_process(TRUE, sort_flag, cause, jetsam_reason, &priority, &errors);
		sort_flag = FALSE;

		if (killed) {
			if (memorystatus_post_snapshot(priority, cause) == TRUE) {
				post_snapshot = TRUE;
			}

			/* Jetsam Loop Detection */
			if (memorystatus_jld_enabled == TRUE) {
				if ((priority == JETSAM_PRIORITY_IDLE) || (priority == system_procs_aging_band) || (priority == applications_aging_band)) {
					jld_idle_kills++;
				} else {
					/*
					 * We've reached into bands beyond idle deferred.
					 * We make no attempt to monitor them
					 */
				}
			}

			if ((priority >= JETSAM_PRIORITY_UI_SUPPORT) && (total_corpses_count() > 0) && (corpse_list_purged == FALSE)) {
				/*
				 * If we have jetsammed a process in or above JETSAM_PRIORITY_UI_SUPPORT
				 * then we attempt to relieve pressure by purging corpse memory.
				 */
				task_purge_all_corpses();
				corpse_list_purged = TRUE;
			}
			goto done;
		}
		
		if (memorystatus_avail_pages_below_critical()) {
			/*
			 * Still under pressure and unable to kill a process - purge corpse memory
			 */
			if (total_corpses_count() > 0) {
				task_purge_all_corpses();
				corpse_list_purged = TRUE;
			}

			if (memorystatus_avail_pages_below_critical()) {
				/*
				 * Still under pressure and unable to kill a process - panic
				 */
				panic("memorystatus_jetsam_thread: no victim! available pages:%llu\n", (uint64_t)memorystatus_available_pages);
			}
		}
			
done:	

}

You can see that it runs a loop that keeps freeing memory, with memorystatus_action_needed() as the loop condition.

static boolean_t
memorystatus_action_needed(void)
{
#if CONFIG_EMBEDDED
	return (is_reason_thrashing(kill_under_pressure_cause) ||
			is_reason_zone_map_exhaustion(kill_under_pressure_cause) ||
	       memorystatus_available_pages <= memorystatus_available_pages_pressure);
#else /* CONFIG_EMBEDDED */
	return (is_reason_thrashing(kill_under_pressure_cause) ||
			is_reason_zone_map_exhaustion(kill_under_pressure_cause));
#endif /* CONFIG_EMBEDDED */
}

It uses the memory pressure reported by vm_pageout to determine whether memory is currently tight. There are several scenarios: frequent paging in and out (is_reason_thrashing), Mach zone exhaustion (is_reason_zone_map_exhaustion), and the number of available pages dropping to or below the memorystatus_available_pages_pressure threshold.

Continuing with memorystatus_thread: when memory is tight, a high-water type of OOM is triggered first, that is, an OOM that occurs when a process exceeds its own memory limit, the high water mark. In memorystatus_act_on_hiwat_processes(), memorystatus_kill_hiwat_proc() looks up the lowest-priority process in the priority array memstat_bucket. If that process's memory is within its limit (footprint_in_bytes <= memlimit_in_bytes), it moves on to the process with the next-lowest priority, until it finds a process that uses more memory than its limit and kills it.
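
The search described above can be sketched in a few lines of portable C. This is a simplified illustration, not the kernel code: proc_entry_t and find_hiwat_victim are hypothetical names, and the array is assumed to be already sorted from lowest to highest priority.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a process entry. The real kernel
 * walks proc structures stored in the memstat_bucket priority array. */
typedef struct {
    int      pid;
    int32_t  priority;            /* lower value = considered earlier */
    uint64_t footprint_in_bytes;  /* current memory footprint */
    uint64_t memlimit_in_bytes;   /* high water mark for this process */
} proc_entry_t;

/* procs must be sorted by ascending priority. Returns the pid of the first
 * process whose footprint exceeds its limit, or -1 if every process is
 * within its limit. */
static int find_hiwat_victim(const proc_entry_t *procs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (procs[i].footprint_in_bytes <= procs[i].memlimit_in_bytes) {
            continue; /* within its limit: move on to the next process */
        }
        return procs[i].pid;
    }
    return -1;
}
```

A return value of -1 corresponds to the case where no process can be ended this way and the kernel has to fall back to other strategies.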

Usually it is hard for a single App to hit its high water mark. If no process can be ended this way, execution eventually reaches memorystatus_act_aggressive, which is where most OOMs occur.

static boolean_t
memorystatus_act_aggressive(uint32_t cause, os_reason_t jetsam_reason, int *jld_idle_kills, boolean_t *corpse_list_purged, boolean_t *post_snapshot)
{
	// ...
  if ( (jld_bucket_count == 0) || 
		     (jld_now_msecs > (jld_timestamp_msecs + memorystatus_jld_eval_period_msecs))) {

			/* 
			 * Refresh evaluation parameters 
			 */
			jld_timestamp_msecs	 = jld_now_msecs;
			jld_idle_kill_candidates = jld_bucket_count;
			*jld_idle_kills		 = 0;
			jld_eval_aggressive_count = 0;
			jld_priority_band_max	= JETSAM_PRIORITY_UI_SUPPORT;
		}
  //...
}

The code above shows that the decision whether to actually kill is evaluated over a time window, with the condition jld_now_msecs > (jld_timestamp_msecs + memorystatus_jld_eval_period_msecs). That is, the kill inside the condition only happens once memorystatus_jld_eval_period_msecs has elapsed.

/* Jetsam Loop Detection */
if (max_mem <= (512 * 1024 * 1024)) {
	/* 512 MB devices */
	memorystatus_jld_eval_period_msecs = 8000;	/* 8000 msecs == 8 second window */
} else {
	/* 1GB and larger devices */
	memorystatus_jld_eval_period_msecs = 6000;	/* 6000 msecs == 6 second window */
}

Here memorystatus_jld_eval_period_msecs is at least 6 seconds, so we have 6 seconds in which to do something.
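
The two snippets above can be combined into a small, self-contained sketch of the window logic. The function names jld_eval_period_msecs and jld_window_elapsed are hypothetical; only the constants and the comparison come from the kernel excerpts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Picks the evaluation window the same way the excerpt does:
 * 8 seconds on devices with 512 MB or less, 6 seconds otherwise. */
static uint64_t jld_eval_period_msecs(uint64_t max_mem_bytes)
{
    return (max_mem_bytes <= 512ULL * 1024 * 1024) ? 8000 : 6000;
}

/* Mirrors the condition
 *   jld_now_msecs > (jld_timestamp_msecs + memorystatus_jld_eval_period_msecs)
 * from memorystatus_act_aggressive. */
static bool jld_window_elapsed(uint64_t now_msecs, uint64_t timestamp_msecs,
                               uint64_t period_msecs)
{
    return now_msecs > timestamp_msecs + period_msecs;
}
```

For a 2 GB device the period is 6000 ms, so jld_window_elapsed(6000, 0, 6000) is still false and only becomes true from 6001 ms onward.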

3.2 Values compiled by developers

There is a compilation on Stack Overflow of the OOM critical values for various devices:

| Device | Crash amount (MB) | Total amount (MB) | Percentage of total |
| --- | --- | --- | --- |
| iPad1 | 127 | 256 | 49% |
| iPad2 | 275 | 512 | 53% |
| iPad3 | 645 | 1024 | 62% |
| iPad4 (iOS 8.1) | 585 | 1024 | 57% |
| iPad Mini 1st Generation | 297 | 512 | 58% |
| iPad Mini retina (iOS 7.1) | 696 | 1024 | 68% |
| iPad Air | 697 | 1024 | 68% |
| iPad Air 2 (iOS 10.2.1) | 1383 | 2048 | 68% |
| iPad Pro 9.7" (iOS 10.0.2 (14A456)) | 1395 | 1971 | 71% |
| iPad Pro 10.5" (iOS 11 beta4) | 3057 | 4000 | 76% |
| iPad Pro 12.9" (2015) (iOS 11.2.1) | 3058 | 3999 | 76% |
| iPad 10.2 (iOS 13.2.3) | 1844 | 2998 | 62% |
| iPod touch 4th gen (iOS 6.1.1) | 130 | 256 | 51% |
| iPod touch 5th gen | 286 | 512 | 56% |
| iPhone4 | 325 | 512 | 63% |
| iPhone4s | 286 | 512 | 56% |
| iPhone5 | 645 | 1024 | 62% |
| iPhone5s | 646 | 1024 | 63% |
| iPhone6 (iOS 8.x) | 645 | 1024 | 62% |
| iPhone6 Plus (iOS 8.x) | 645 | 1024 | 62% |
| iPhone6s (iOS 9.2) | 1396 | 2048 | 68% |
| iPhone6s Plus (iOS 10.2.1) | 1396 | 2048 | 68% |
| iPhoneSE (iOS 9.3) | 1395 | 2048 | 68% |
| iPhone7 (iOS 10.2) | 1395 | 2048 | 68% |
| iPhone7 Plus (iOS 10.2.1) | 2040 | 3072 | 66% |
| iPhone8 (iOS 12.1) | 1364 | 1990 | 70% |
| iPhoneX (iOS 11.2.1) | 1392 | 2785 | 50% |
| iPhoneXS (iOS 12.1) | 2040 | 3754 | 54% |
| iPhoneXS Max (iOS 12.1) | 2039 | 3735 | 55% |
| iPhoneXR (iOS 12.1) | 1792 | 2813 | 63% |
| iPhone11 (iOS 13.1.3) | 2068 | 3844 | 54% |
| iPhone11 Pro Max (iOS 13.2.3) | 2067 | 3740 | 55% |

3.3 Trigger the high water mark of the current App

We can write a timer that keeps allocating memory and reads the memory currently in use through phys_footprint. Allocating without pause is bound to trigger the Jetsam mechanism and get the App killed, so the last memory usage printed is the memory limit on the current device.

timer = [NSTimer scheduledTimerWithTimeInterval:0.01 target:self selector:@selector(allocateMemory) userInfo:nil repeats:YES];

- (void)allocateMemory {
    UIImageView *imageView = [[UIImageView alloc] initWithFrame:CGRectMake(0, 0, 100, 100)];
    UIImage *image = [UIImage imageNamed:@"AppIcon"];
    imageView.image = image;
    [array addObject:imageView];
    
    memoryLimitSizeMB = [self usedSizeOfMemory];
    if (memoryWarningSizeMB && memoryLimitSizeMB) {
        NSLog(@"----- memory warning:%dMB, memory limit:%dMB", memoryWarningSizeMB, memoryLimitSizeMB);
    }
}

- (int)usedSizeOfMemory {
    task_vm_info_data_t taskInfo;
    mach_msg_type_number_t infoCount = TASK_VM_INFO_COUNT;
    kern_return_t kernReturn = task_info(mach_task_self(), TASK_VM_INFO, (task_info_t)&taskInfo, &infoCount);

    if (kernReturn != KERN_SUCCESS) {
        return 0;
    }
    return (int)(taskInfo.phys_footprint/1024.0/1024.0);
}

3.4 Method available on iOS 13

Starting with iOS 13, size_t os_proc_available_memory(void); in <os/proc.h> lets you check the currently available memory.

Return Value

The number of bytes that the app may allocate before it hits its memory limit. If the calling process isn't an app, or if the process has already exceeded its memory limit, this function returns 0.

Discussion

Call this function to determine the amount of memory available to your app. The returned value corresponds to the current memory limit minus the memory footprint of your app at the time of the function call. Your app's memory footprint consists of the data that you allocated in RAM, and that must stay in RAM (or the equivalent) at all times. Memory limits can change during the app life cycle and don't necessarily correspond to the amount of physical memory available on the device.

Use the returned value as advisory information only and don't cache it. The precise value changes when your app does any work that affects memory, which can happen frequently.

Although this function lets you determine the amount of memory your app may safely consume, don't use it to maximize your app's memory usage. Significant memory use, even when under the current memory limit, affects system performance. For example, when your app consumes all of its available memory, the system may need to terminate other apps and system processes to accommodate your app's requests. Instead, always consume the smallest amount of memory you need to be responsive to the user's needs.

If you need more detailed information about the available memory resources, you can call task_info. However, be aware that task_info is an expensive call, whereas this function is much more efficient.

if (@available(iOS 13.0, *)) {
	return os_proc_available_memory() / 1024.0 / 1024.0;
}

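The App memory information APIs can be found in the Mach layer. The mach_task_basic_info structure stores memory usage information for a Mach task, where resident_size is the resident physical memory size and virtual_size is the virtual memory size. The phys_footprint field, the physical memory footprint of the application, is found in task_vm_info_data_t (the TASK_VM_INFO flavor) rather than in this structure.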

#define MACH_TASK_BASIC_INFO     20         /* always 64-bit basic info */
struct mach_task_basic_info {
    mach_vm_size_t  virtual_size;       /* virtual memory size (bytes) */
    mach_vm_size_t  resident_size;      /* resident memory size (bytes) */
    mach_vm_size_t  resident_size_max;  /* maximum resident memory size (bytes) */
    time_value_t    user_time;          /* total user run time for
                                            terminated threads */
    time_value_t    system_time;        /* total system run time for
                                            terminated threads */
    policy_t        policy;             /* default policy for new threads */
    integer_t       suspend_count;      /* suspend count for task */
};

So the code to get the footprint is

task_vm_info_data_t vmInfo;
mach_msg_type_number_t count = TASK_VM_INFO_COUNT;
kern_return_t kr = task_info(mach_task_self(), TASK_VM_INFO, (task_info_t)&vmInfo, &count);

if (kr != KERN_SUCCESS) {
    return ;
}
CGFloat memoryUsed = (CGFloat)(vmInfo.phys_footprint/1024.0/1024.0);

You may wonder: shouldn't the resident_size field give the memory usage? Initial testing showed that resident_size differs considerably from what Xcode measures, while phys_footprint is close to the result Xcode gives. This can also be verified in the WebKit source.

So on iOS 13 we can use os_proc_available_memory to get the currently available memory and phys_footprint to get the App's current memory usage. The sum of the two is the memory limit for the App on the current device; exceeding it triggers the Jetsam mechanism.

- (CGFloat)limitSizeOfMemory {
    if (@available(iOS 13.0, *)) {
        task_vm_info_data_t taskInfo;
        mach_msg_type_number_t infoCount = TASK_VM_INFO_COUNT;
        kern_return_t kernReturn = task_info(mach_task_self(), TASK_VM_INFO, (task_info_t)&taskInfo, &infoCount);

        if (kernReturn != KERN_SUCCESS) {
            return 0;
        }
        return (CGFloat)((taskInfo.phys_footprint + os_proc_available_memory()) / (1024.0 * 1024.0));
    }
    return 0;
}

Currently available memory: 1435.936752 MB; memory currently used by the App: 14.5 MB; critical value: 1435.936752 MB + 14.5 MB = 1450.436 MB. This matches the memory threshold obtained with the method in 3.1: "OOM critical value for an iPhone 6s Plus on 13.3.1: (16384*92806)/(1024*1024) = 1450.09375 MB".
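
The arithmetic in the quoted figure is a page count multiplied by the page size, converted to megabytes. A minimal sketch, assuming 92806 is the page count and 16384 the page size in bytes (the helper name pages_to_megabytes is hypothetical):

```c
#include <stdint.h>

/* Converts a number of memory pages to megabytes (1 MB = 1024 * 1024 bytes). */
static double pages_to_megabytes(uint64_t page_count, uint64_t page_size_bytes)
{
    return (double)(page_count * page_size_bytes) / (1024.0 * 1024.0);
}
```

pages_to_megabytes(92806, 16384) evaluates to 1450.09375, which is within about 0.35 MB of the 1450.436 MB obtained by adding available memory and footprint.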

3.5 Get memory limit via XNU

XNU has functions and macros dedicated to getting the memory limit. Through the memorystatus_priority_entry structure you can get the priority and memory limit of every process.

typedef struct memorystatus_priority_entry {
  pid_t pid;
  int32_t priority;
  uint64_t user_data;
  int32_t limit;
  uint32_t state;
} memorystatus_priority_entry_t;

priority is the priority of the process and limit is its memory limit. However, this requires root privileges, and I have not tried it because I do not have a jailbroken device.

The relevant code can be found in the kern_memorystatus.h file. The function needed is int memorystatus_control(uint32_t command, int32_t pid, uint32_t flags, void *buffer, size_t buffersize);

/* Commands */
#define MEMORYSTATUS_CMD_GET_PRIORITY_LIST            1
#define MEMORYSTATUS_CMD_SET_PRIORITY_PROPERTIES      2
#define MEMORYSTATUS_CMD_GET_JETSAM_SNAPSHOT          3
#define MEMORYSTATUS_CMD_GET_PRESSURE_STATUS          4
#define MEMORYSTATUS_CMD_SET_JETSAM_HIGH_WATER_MARK   5    /* Set active memory limit = inactive memory limit, both non-fatal	*/
#define MEMORYSTATUS_CMD_SET_JETSAM_TASK_LIMIT	      6    /* Set active memory limit = inactive memory limit, both fatal	*/
#define MEMORYSTATUS_CMD_SET_MEMLIMIT_PROPERTIES      7    /* Set memory limits plus attributes independently			*/
#define MEMORYSTATUS_CMD_GET_MEMLIMIT_PROPERTIES      8    /* Get memory limits plus attributes					*/
#define MEMORYSTATUS_CMD_PRIVILEGED_LISTENER_ENABLE   9    /* Set the task's status as a privileged listener w.r.t memory notifications  */
#define MEMORYSTATUS_CMD_PRIVILEGED_LISTENER_DISABLE  10   /* Reset the task's status as a privileged listener w.r.t memory notifications  */
#define MEMORYSTATUS_CMD_AGGRESSIVE_JETSAM_LENIENT_MODE_ENABLE  11   /* Enable the 'lenient' mode for aggressive jetsam. See comments in kern_memorystatus.c near the top. */
#define MEMORYSTATUS_CMD_AGGRESSIVE_JETSAM_LENIENT_MODE_DISABLE 12   /* Disable the 'lenient' mode for aggressive jetsam. */
#define MEMORYSTATUS_CMD_GET_MEMLIMIT_EXCESS          13   /* Compute how much a process's phys_footprint exceeds inactive memory limit */
#define MEMORYSTATUS_CMD_ELEVATED_INACTIVEJETSAMPRIORITY_ENABLE 	14 /* Set the inactive jetsam band for a process to JETSAM_PRIORITY_ELEVATED_INACTIVE */
#define MEMORYSTATUS_CMD_ELEVATED_INACTIVEJETSAMPRIORITY_DISABLE 	15 /* Reset the inactive jetsam band for a process to the default band (0)*/
#define MEMORYSTATUS_CMD_SET_PROCESS_IS_MANAGED       16   /* (Re-)Set state on a process that marks it as (un-)managed by a system entity e.g. assertiond */
#define MEMORYSTATUS_CMD_GET_PROCESS_IS_MANAGED       17   /* Return the 'managed' status of a process */
#define MEMORYSTATUS_CMD_SET_PROCESS_IS_FREEZABLE     18   /* Is the process eligible for freezing? Apps and extensions can pass in FALSE to opt out of freezing, i.e.,

Pseudocode

// NUM_ENTRIES and state_to_text() are placeholders assumed to be defined elsewhere
struct memorystatus_priority_entry memStatus[NUM_ENTRIES];
size_t count = sizeof(struct memorystatus_priority_entry) * NUM_ENTRIES;
// The loop below treats rc as the number of bytes returned in memStatus
int rc = memorystatus_control(MEMORYSTATUS_CMD_GET_PRIORITY_LIST, 0, 0, memStatus, count);
if (rc < 0) {
  NSLog(@"memorystatus_control");
  return;
}

int entry = 0;
for (; rc > 0; rc -= sizeof(struct memorystatus_priority_entry)) {
  printf("PID: %5d\tPriority:%2d\tUser Data: %llx\tLimit:%2d\tState:%s\n",
         memStatus[entry].pid,
         memStatus[entry].priority,
         memStatus[entry].user_data,
         memStatus[entry].limit,
         state_to_text(memStatus[entry].state));
  entry++;
}

The for loop prints the pid, Priority, User Data, Limit, and State of each process (that is, each App). From the log, find the process with priority 10, which is the App running in the foreground. Why 10? Because of #define JETSAM_PRIORITY_FOREGROUND 10. Our goal is to get the memory limit of the foreground App.
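
Instead of reading the log by eye, the foreground entry can be picked out in code. The sketch below works on a plain array and does not call memorystatus_control; priority_entry_t is a hypothetical local mirror of memorystatus_priority_entry, and foreground_memory_limit is a made-up helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define FOREGROUND_PRIORITY 10 /* same value as JETSAM_PRIORITY_FOREGROUND */

/* Hypothetical local mirror of memorystatus_priority_entry, so the sketch
 * compiles without the private kernel header. */
typedef struct {
    int32_t  pid;
    int32_t  priority;
    uint64_t user_data;
    int32_t  limit;
    uint32_t state;
} priority_entry_t;

/* Returns the limit of the first entry running at foreground priority,
 * or -1 if no such entry exists. */
static int32_t foreground_memory_limit(const priority_entry_t *entries, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (entries[i].priority == FOREGROUND_PRIORITY) {
            return entries[i].limit;
        }
    }
    return -1;
}
```

On a real device the entries would come from the buffer filled by the privileged call shown in the pseudocode.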

4. How to determine if OOM has occurred

Is an App guaranteed to receive a low memory warning before an OOM crash?

Two comparative experiments were performed:

// Experiment 1
NSMutableArray *array = [NSMutableArray array];
for (NSInteger index = 0; index < 10000000; index++) {
  NSString *filePath = [[NSBundle mainBundle] pathForResource:@"Info" ofType:@"plist"];
  NSData *data = [NSData dataWithContentsOfFile:filePath];
  [array addObject:data];
}
// Experiment 2
// ViewController.m
- (void)viewDidLoad {
    [super viewDidLoad];
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSMutableArray *array = [NSMutableArray array];
        for (NSInteger index = 0; index < 10000000; index++) {
            NSString *filePath = [[NSBundle mainBundle] pathForResource:@"Info" ofType:@"plist"];
            NSData *data = [NSData dataWithContentsOfFile:filePath];
            [array addObject:data];
        }
    });
}
- (void)didReceiveMemoryWarning
{
    NSLog(@"2");
}

// AppDelegate.m
- (void)applicationDidReceiveMemoryWarning:(UIApplication *)application
{
    NSLog(@"1");
}

Phenomena:

  1. When too much memory is consumed in viewDidLoad, that is, on the main thread, the system does not issue a low memory warning and the App simply crashes: memory grows too fast and the main thread is busy.
  2. In the multithreaded case, the App receives a low memory warning because memory grows too fast. applicationDidReceiveMemoryWarning in AppDelegate runs first, followed by didReceiveMemoryWarning in the current VC.

Conclusion:

Receiving a low memory warning does not necessarily mean a crash, because the system has a 6-second evaluation window and no crash occurs if memory drops within those 6 seconds. Conversely, an OOM is not necessarily preceded by a low memory warning.

5. Memory information collection

To locate the problem accurately, you need to dump all objects and their memory information. When memory approaches the system memory limit, collect and record the required information, then upload it to the server under some data reporting mechanism for analysis and fixing.

You also need to know in which function each object was created, so that the scene can be reconstructed.

According to the source code (libmalloc/malloc), the memory allocation functions malloc and calloc use nano_zone by default. nano_zone handles allocations smaller than 256 B; allocations larger than 256 B are handled by scalable_zone.

We mainly monitor large memory allocations. The malloc function uses malloc_zone_malloc, and calloc uses malloc_zone_calloc.

Functions that allocate memory through scalable_zone all call the malloc_logger function, because the system wants a single place dedicated to counting and managing memory allocations. This design also follows the "single entry point" principle.

void *
malloc(size_t size)
{
	void *retval;
	retval = malloc_zone_malloc(default_zone, size);
	if (retval == NULL) {
		errno = ENOMEM;
	}
	return retval;
}

void *
calloc(size_t num_items, size_t size)
{
	void *retval;
	retval = malloc_zone_calloc(default_zone, num_items, size);
	if (retval == NULL) {
		errno = ENOMEM;
	}
	return retval;
}

Let's first look at what this default_zone is. The code is as follows

typedef struct {
	malloc_zone_t malloc_zone;
	uint8_t pad[PAGE_MAX_SIZE - sizeof(malloc_zone_t)];
} virtual_default_zone_t;

static virtual_default_zone_t virtual_default_zone
__attribute__((section("__DATA,__v_zone")))
__attribute__((aligned(PAGE_MAX_SIZE))) = {
	NULL,
	NULL,
	default_zone_size,
	default_zone_malloc,
	default_zone_calloc,
	default_zone_valloc,
	default_zone_free,
	default_zone_realloc,
	default_zone_destroy,
	DEFAULT_MALLOC_ZONE_STRING,
	default_zone_batch_malloc,
	default_zone_batch_free,
	&default_zone_introspect,
	10,
	default_zone_memalign,
	default_zone_free_definite_size,
	default_zone_pressure_relief,
	default_zone_malloc_claimed_address,
};

static malloc_zone_t *default_zone = &virtual_default_zone.malloc_zone;

static void *
default_zone_malloc(malloc_zone_t *zone, size_t size)
{
	zone = runtime_default_zone();
	
	return zone->malloc(zone, size);
}


MALLOC_ALWAYS_INLINE
static inline malloc_zone_t *
runtime_default_zone() {
	return (lite_zone) ? lite_zone : inline_malloc_default_zone();
}

You can see that default_zone is initialized in this way

static inline malloc_zone_t *
inline_malloc_default_zone(void)
{
	_malloc_initialize_once();
	// malloc_report(ASL_LEVEL_INFO, "In inline_malloc_default_zone with %d %d\n", malloc_num_zones, malloc_has_debug_zone);
	return malloc_zones[0];
}

The subsequent call chain is _malloc_initialize -> create_scalable_zone -> create_scalable_szone. Eventually an object of type szone_t is created and, through a type cast, becomes our default_zone.

malloc_zone_t *
create_scalable_zone(size_t initial_size, unsigned debug_flags) {
	return (malloc_zone_t *) create_scalable_szone(initial_size, debug_flags);
}

void *malloc_zone_malloc(malloc_zone_t *zone, size_t size)
{
  MALLOC_TRACE(TRACE_malloc | DBG_FUNC_START, (uintptr_t)zone, size, 0, 0);
  void *ptr;
  if (malloc_check_start && (malloc_check_counter++ >= malloc_check_start)) {
    internal_check();
  }
  if (size > MALLOC_ABSOLUTE_MAX_SIZE) {
    return NULL;
  }
  ptr = zone->malloc(zone, size);
  // After the zone has allocated memory, record the allocation through malloc_logger
  if (malloc_logger) {
    malloc_logger(MALLOC_LOG_TYPE_ALLOCATE | MALLOC_LOG_TYPE_HAS_ZONE, (uintptr_t)zone, (uintptr_t)size, 0, (uintptr_t)ptr, 0);
  }
  MALLOC_TRACE(TRACE_malloc | DBG_FUNC_END, (uintptr_t)zone, size, (uintptr_t)ptr, 0);
  return ptr;
}

Its allocation goes through zone->malloc, which, based on the analysis above, is the corresponding malloc implementation in the szone_t struct object.

After the szone is created, a series of initializations is performed, as follows.

// Initialize the security token.
szone->cookie = (uintptr_t)malloc_entropy[0];

szone->basic_zone.version = 12;
szone->basic_zone.size = (void *)szone_size;
szone->basic_zone.malloc = (void *)szone_malloc;
szone->basic_zone.calloc = (void *)szone_calloc;
szone->basic_zone.valloc = (void *)szone_valloc;
szone->basic_zone.free = (void *)szone_free;
szone->basic_zone.realloc = (void *)szone_realloc;
szone->basic_zone.destroy = (void *)szone_destroy;
szone->basic_zone.batch_malloc = (void *)szone_batch_malloc;
szone->basic_zone.batch_free = (void *)szone_batch_free;
szone->basic_zone.introspect = (struct malloc_introspection_t *)&szone_introspect;
szone->basic_zone.memalign = (void *)szone_memalign;
szone->basic_zone.free_definite_size = (void *)szone_free_definite_size;
szone->basic_zone.pressure_relief = (void *)szone_pressure_relief;
szone->basic_zone.claimed_address = (void *)szone_claimed_address;

Other functions that allocate memory through scalable_zone work in a similar way, so large memory allocations, however the outer function is wrapped, eventually call the malloc_logger function. We can therefore use fishhook to hook this function, record memory allocations, and upload them to the server under some data reporting mechanism for analysis and fixing.
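
The mechanism is easier to see in isolation: an allocation wrapper that reports through a global function pointer, and a recorder installed into that pointer. The sketch below is a portable simulation under stated assumptions; it does not use fishhook or touch libmalloc, and alloc_logger_t, alloc_logger, logged_malloc, LOG_TYPE_ALLOCATE, and record_large_allocations are all hypothetical names modelled on the excerpts above.

```c
#include <stdint.h>
#include <stdlib.h>

#define LOG_TYPE_ALLOCATE     1u   /* hypothetical flag value */
#define LARGE_ALLOC_THRESHOLD 256  /* only record allocations at least this big */

/* Hypothetical logger signature, modelled on malloc_logger_t. */
typedef void (alloc_logger_t)(uint32_t type, uintptr_t zone,
                              uintptr_t size, uintptr_t result);

/* Hypothetical global hook that plays the role of malloc_logger. */
static alloc_logger_t *alloc_logger = NULL;

/* Stand-in for malloc_zone_malloc: allocate first, then report via the hook. */
static void *logged_malloc(size_t size)
{
    void *ptr = malloc(size);
    if (alloc_logger) {
        alloc_logger(LOG_TYPE_ALLOCATE, 0, (uintptr_t)size, (uintptr_t)ptr);
    }
    return ptr;
}

/* Example recorder: counts successful allocations at or above the threshold. */
static size_t large_alloc_count = 0;
static size_t large_bytes_recorded = 0;

static void record_large_allocations(uint32_t type, uintptr_t zone,
                                     uintptr_t size, uintptr_t result)
{
    (void)type;
    (void)zone;
    if (result != 0 && size >= LARGE_ALLOC_THRESHOLD) {
        large_alloc_count++;
        large_bytes_recorded += size;
    }
}
```

Setting alloc_logger = record_large_allocations and then calling logged_malloc(16) and logged_malloc(1024) records only the second allocation. A production recorder would also capture the call stack at this point.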

// For logging VM allocation and deallocation, arg1 here
// is the mach_port_name_t of the target task in which the
// alloc or dealloc is occurring. For example, for mmap()
// that would be mach_task_self(), but for a cross-task-capable
// call such as mach_vm_map(), it is the target task.

typedef void (malloc_logger_t)(uint32_t type, uintptr_t arg1, uintptr_t arg2, uintptr_t arg3, uintptr_t result, uint32_t num_hot_frames_to_skip);

extern malloc_logger_t *__syscall_logger;

When the malloc_logger and __syscall_logger function pointers are not NULL, memory allocations and releases such as malloc/free and vm_allocate/vm_deallocate notify the upper layer through these two pointers. This is also how the memory debugging tool malloc stack is implemented. With these two function pointers it is easy to record memory allocation information (including allocation size and allocation stack) for the objects currently alive. The allocation stack can be captured with the backtrace function, but the captured addresses are virtual memory addresses, so symbols cannot be resolved from the dSYM symbol table directly. You therefore also need to record the slide of each image when it is loaded, so that symbol table address = stack address - slide.

Small tips:

ASLR (Address Space Layout Randomization), also translated as address space configuration randomization or address space layout randomization, is a computer security technique that prevents memory corruption vulnerabilities from being exploited. By randomizing the placement of key data areas in a process's address space, it prevents an attacker from reliably jumping to a specific location in memory to exploit functions. Modern operating systems generally have this mechanism.

Function address add: the real implementation address of the function;

Function virtual address: vm_add;

ASLR: slide is the random offset applied to a function's virtual address when the image is loaded into process memory, and it differs for each Mach-O. vm_add + slide = add. That is: *(base + offset) = imp.
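
The two relations above (vm_add + slide = add, and symbol table address = stack address - slide) are inverse operations. A minimal sketch with hypothetical helper names and made-up example addresses:

```c
#include <stdint.h>

/* add = vm_add + slide: where a function actually lives once the image
 * has been loaded with a random slide. */
static uint64_t loaded_address(uint64_t vm_address, uint64_t slide)
{
    return vm_address + slide;
}

/* symbol table address = stack address - slide: undo the slide so the
 * address can be looked up in the dSYM. */
static uint64_t symbol_table_address(uint64_t stack_address, uint64_t slide)
{
    return stack_address - slide;
}
```

For example, with a slide of 0x4a38000, a captured stack address of 0x104a3c120 maps back to 0x100004120 for symbol lookup. The slide must be the one recorded for the image that contains the address.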

Tencent has also open-sourced its own OOM locating solution, OOMDetector. Since a ready-made wheel exists, it makes sense to use it well. So the approach to memory monitoring is: find the memory limit the system gives the App, and when usage approaches that limit, dump the memory state and assemble it with basic data into a well-formed report. The report is sent to the server according to a data reporting strategy; the server consumes the data and produces analysis reports, and client engineers analyze problems based on them. Data from different projects is sent to each project's owner and developers by email, SMS, Enterprise WeChat, and so on. (In serious cases the developer is phoned directly, and the supervisor is kept updated on the result of each step.) After the problem is analyzed, either a new version or a hot fix is released.

6. What can we do about memory during development

  1. Image scaling

    WWDC 2018 Session 416, iOS Memory Deep Dive: when scaling images, using UIImage directly consumes memory for reading and decoding the file, and the intermediate bitmap it generates also consumes a lot of memory. ImageIO has neither drawback; it only takes up the memory of the final image size.

    Two comparative experiments were performed, each displaying an image in the App:

    // Method 1: 19.6M
    UIImage *imageResult = [self scaleImage:[UIImage imageNamed:@"test"]
                                    newSize:CGSizeMake(self.view.frame.size.width, self.view.frame.size.height)];
    self.imageView.image = imageResult;
    
    // Method 2: 14M
    NSData *data = UIImagePNGRepresentation([UIImage imageNamed:@"test"]);
    UIImage *imageResult = [self scaledImageWithData:data
                                            withSize:CGSizeMake(self.view.frame.size.width, self.view.frame.size.height)
                                               scale:3
                                         orientation:UIImageOrientationUp];
    self.imageView.image = imageResult;
    
    - (UIImage *)scaleImage:(UIImage *)image newSize:(CGSize)newSize
    {
        UIGraphicsBeginImageContextWithOptions(newSize, NO, 0);
        [image drawInRect:CGRectMake(0, 0, newSize.width, newSize.height)];
        UIImage *newImage = UIGraphicsGetImageFromCurrentImageContext();
        UIGraphicsEndImageContext();
        return newImage;
    }
    
    - (UIImage *)scaledImageWithData:(NSData *)data withSize:(CGSize)size scale:(CGFloat)scale orientation:(UIImageOrientation)orientation
    {
        CGFloat maxPixelSize = MAX(size.width, size.height);
        CGImageSourceRef sourceRef = CGImageSourceCreateWithData((__bridge CFDataRef)data, nil);
        NSDictionary *options = @{(__bridge id)kCGImageSourceCreateThumbnailFromImageAlways : (__bridge id)kCFBooleanTrue,
                                  (__bridge id)kCGImageSourceThumbnailMaxPixelSize : [NSNumber numberWithFloat:maxPixelSize]};
        CGImageRef imageRef = CGImageSourceCreateThumbnailAtIndex(sourceRef, 0, (__bridge CFDictionaryRef)options);
        UIImage *resultImage = [UIImage imageWithCGImage:imageRef scale:scale orientation:orientation];
        CGImageRelease(imageRef);
        CFRelease(sourceRef);
        return resultImage;
    }
    

    You can see that using ImageIO takes up less memory than scaling directly with UIImage.

  2. Reasonable use of autorelease pools

We know that objects in the autorelease pool are normally not released until the end of the current RunLoop iteration. Under ARC, if we keep allocating memory, for example in a loop, we need to add autorelease pools manually to avoid a sudden memory spike that leads to OOM.

Comparative experiment

// Experiment 1
NSMutableArray *array = [NSMutableArray array];
for (NSInteger index = 0; index < 10000000; index++) {
  NSString *indexStrng = [NSString stringWithFormat:@"%zd", index];
  NSString *resultString = [NSString stringWithFormat:@"%zd-%@", index, indexStrng];
  [array addObject:resultString];
}

// Experiment 2
NSMutableArray *array = [NSMutableArray array];
for (NSInteger index = 0; index < 10000000; index++) {
  @autoreleasepool {
    NSString *indexStrng = [NSString stringWithFormat:@"%zd", index];
    NSString *resultString = [NSString stringWithFormat:@"%zd-%@", index, indexStrng];
    [array addObject:resultString];
  }
}

Experiment 1 consumed 739.6M of memory and Experiment 2 587M of memory.

  3. UIGraphicsBeginImageContext and UIGraphicsEndImageContext must appear in pairs, otherwise the context leaks. Xcode's Analyze can also catch this.

  4. Whether you open a web page or execute JS, you should use WKWebView. UIWebView consumes a lot of memory, which raises the chance of an OOM in the App, whereas WKWebView is a multi-process component: network loading and UI rendering run in other processes, so it uses less memory than UIWebView.

  5. For SDKs or Apps, if the scenario is cache-related, try to use NSCache instead of NSMutableDictionary. NSCache allocates purgeable memory, which the system can free automatically. Combining NSCache with NSPurgeableData lets the system reclaim memory as appropriate and also remove objects during memory cleanup.

    Other development habits are not listed one by one. Good development habits and code awareness have to be built up in day-to-day work.

5. App Network Monitoring

The mobile network environment has always been complex: Wi-Fi, 2G, 3G, 4G, 5G, and so on. Users may switch between these network types while using the App, which is one difference between mobile networks and traditional networks, known as "Connection Migration". There are also problems such as slow DNS resolution, high failure rates, and carrier hijacking. When users have a poor experience with the App for such reasons, you need a clear means of monitoring the network before you can improve it.

1. App network request process

App typically goes through the following key steps when sending a network request:

  • DNS Resolution

    The Domain Name System (DNS) is essentially a distributed database that maps domain names and IP addresses to each other, making the Internet easier to access. The local DNS cache is queried first; if that lookup fails, DNS servers are queried, and the recursive and iterative queries may pass through many nodes. Carriers may misbehave here. One case is carrier hijacking: you open a web page inside the App and see advertisements unrelated to its content. Another is that your request is routed to a very distant base station for DNS resolution, so resolution takes a long time and the App's network performance suffers. Apps typically adopt an HTTPDNS scheme to take DNS resolution into their own hands.

  • TCP three-way handshake

    For why the TCP handshake takes three steps rather than two or four, see this article.

  • TLS handshake

    HTTPS requests also require a TLS handshake, which is the key negotiation step.

  • Send Request

    Once the connection is established, you can send the request. This is the point at which to record the request start time.

  • Waiting for a response

    Wait for the server to return a response. This time mainly depends on the size of the resource and is also the most time-consuming phase in the network request process.

  • Return Response

    The server returns the response to the client. Based on the status code in the HTTP headers, the client determines whether the request succeeded, whether the cache is used, and whether a redirect is required.
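The phases above can be turned into durations once a timestamp is recorded at each boundary. The following is a minimal sketch in plain C, not the author's implementation. The struct `request_timeline`, its field names, and the helper functions are hypothetical. Timestamps are assumed to be milliseconds since an arbitrary epoch, and a negative value stands for a boundary that was never recorded (for example, no DNS or TCP phase exists when a connection is reused).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timeline: one millisecond timestamp per phase boundary.
 * A negative value means "not recorded", e.g. no DNS lookup or TCP
 * handshake happens when an existing connection is reused. */
typedef struct {
    int64_t dns_start, dns_end;           /* DNS resolution */
    int64_t connect_start;                /* TCP handshake begins */
    int64_t tls_start, tls_end;           /* TLS handshake (HTTPS only) */
    int64_t connect_end;                  /* connection ready, TLS included */
    int64_t request_start, request_end;   /* request bytes sent */
    int64_t response_start, response_end; /* response bytes received */
} request_timeline;

/* Duration of one phase in ms, or -1 if a boundary is missing
 * or the boundaries are out of order. */
static int64_t phase_ms(int64_t start, int64_t end) {
    if (start < 0 || end < 0 || end < start) {
        return -1;
    }
    return end - start;
}

int64_t dns_ms(const request_timeline *t) {
    assert(t != NULL);
    return phase_ms(t->dns_start, t->dns_end);
}

/* TCP handshake only: it stops where TLS begins when the request is HTTPS. */
int64_t tcp_ms(const request_timeline *t) {
    assert(t != NULL);
    int64_t end = (t->tls_start >= 0) ? t->tls_start : t->connect_end;
    return phase_ms(t->connect_start, end);
}

int64_t tls_ms(const request_timeline *t) {
    assert(t != NULL);
    return phase_ms(t->tls_start, t->tls_end);
}

/* Waiting for a response: last request byte sent to first response byte. */
int64_t wait_ms(const request_timeline *t) {
    assert(t != NULL);
    return phase_ms(t->request_end, t->response_start);
}

/* Receiving the response: first response byte to last response byte. */
int64_t download_ms(const request_timeline *t) {
    assert(t != NULL);
    return phase_ms(t->response_start, t->response_end);
}
```

Returning -1 for a missing phase, rather than 0, keeps "this phase did not happen" distinguishable from "this phase was instantaneous" when the data is later aggregated on the server.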

2. Monitoring principle

| Name | Description |
| --- | --- |
| NSURLConnection | Deprecated. Simple to use |
| NSURLSession | Available since iOS 7.0; more powerful |
| CFNetwork | The layer underneath NSURL; pure C implementation |

The iOS networking frameworks are layered as follows:

The iOS networking stack is composed of four layers: BSD Sockets and SecureTransport at the bottom; CFNetwork above them; NSURLSession, NSURLConnection, and WebView, which are implemented in Objective-C and call into CFNetwork; and application-layer frameworks such as AFNetworking, which are built on NSURLSession and NSURLConnection.

At present, the industry monitors networks in two main ways: through NSURLProtocol, or through hooks. The following describes several approaches to monitoring network requests, each with advantages and disadvantages.

2.1 Option 1: NSURLProtocol monitors App network requests <a name="network-2.1"></a>

NSURLProtocol is a simple, high-level interface, but it belongs to the URL Loading System, which supports only a limited set of application-layer protocols such as FTP, HTTP, and HTTPS. Requests using other protocols cannot be monitored, which is a limitation. Monitoring at the level of the underlying CFNetwork library has no such limit.

The specific approach for NSURLProtocol is described in this article: subclass the abstract class and implement its methods, then issue the network request yourself so that it can be monitored.

Starting with iOS 10, a new delegate method was added to NSURLSessionTaskDelegate:

/*
 * Sent when complete statistics information has been collected for the task.
 */
- (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didFinishCollectingMetrics:(NSURLSessionTaskMetrics *)metrics API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0));

Network metrics can be obtained from NSURLSessionTaskMetrics. Its interface is as follows:

@interface NSURLSessionTaskMetrics : NSObject

/*
 * transactionMetrics array contains the metrics collected for every request/response transaction created during the task execution.
 */
@property (copy, readonly) NSArray<NSURLSessionTaskTransactionMetrics *> *transactionMetrics;

/*
 * Interval from the task creation time to the task completion time.
 * Task creation time is the time when the task was instantiated.
 * Task completion time is the time when the task is about to change its internal state to completed.
 */
@property (copy, readonly) NSDateInterval *taskInterval;

/*
 * redirectCount is the number of redirects that were recorded.
 */
@property (assign, readonly) NSUInteger redirectCount;

- (instancetype)init API_DEPRECATED("Not supported", macos(10.12,10.15), ios(10.0,13.0), watchos(3.0,6.0), tvos(10.0,13.0));
+ (instancetype)new API_DEPRECATED("Not supported", macos(10.12,10.15), ios(10.0,13.0), watchos(3.0,6.0), tvos(10.0,13.0));

@end

Here, taskInterval is the total time from task creation to task completion: the creation time is when the task is instantiated, and the completion time is when the task's internal state is about to change to completed. redirectCount is the number of times the task was redirected. The transactionMetrics array contains the metrics collected for each request/response transaction during task execution, with the following properties:

/*
 * This class defines the performance metrics collected for a request/response transaction during the task execution.
 */
API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0))
@interface NSURLSessionTaskTransactionMetrics : NSObject

/*
 * Represents the transaction request.
 */
@property (copy, readonly) NSURLRequest *request;

/*
 * Represents the transaction response. Can be nil if error occurred and no response was generated.
 */
@property (nullable, copy, readonly) NSURLResponse *response;

/*
 * For all NSDate metrics below, if that aspect of the task could not be completed, then the corresponding "EndDate" metric will be nil.
 * For example, if a name lookup was started but the name lookup timed out, failed, or the client canceled the task before the name could be resolved -- then while domainLookupStartDate may be set, domainLookupEndDate will be nil along with all later metrics.
 */

/*
 * When the client starts requesting, whether from the server or from the local cache
 * fetchStartDate returns the time when the user agent started fetching the resource, whether or not the resource was retrieved from the server or local resources.
 *
 * The following metrics will be set to nil, if a persistent connection was used or the resource was retrieved from local resources:
 *
 *   domainLookupStartDate
 *   domainLookupEndDate
 *   connectStartDate
 *   connectEndDate
 *   secureConnectionStartDate
 *   secureConnectionEndDate
 */
@property (nullable, copy, readonly) NSDate *fetchStartDate;

/*
 * domainLookupStartDate returns the time immediately before the user agent started the name lookup for the resource. (DNS resolution start time)
 */
@property (nullable, copy, readonly) NSDate *domainLookupStartDate;

/*
 * domainLookupEndDate returns the time after the name lookup was completed. (DNS resolution end time)
 */
@property (nullable, copy, readonly) NSDate *domainLookupEndDate;

/*
 * connectStartDate is the time immediately before the user agent started establishing the connection to the server.
 *
 * For example, this would correspond to the time immediately before the user agent started trying to establish the TCP connection. Time when client and server begin to establish TCP connection
 */
@property (nullable, copy, readonly) NSDate *connectStartDate;

/*
 * If an encrypted connection was used, secureConnectionStartDate is the time immediately before the user agent started the security handshake to secure the current connection. (Start time of the TLS handshake for HTTPS)
 *
 * For example, this would correspond to the time immediately before the user agent started the TLS handshake. 
 *
 * If an encrypted connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSDate *secureConnectionStartDate;

/*
 * If an encrypted connection was used, secureConnectionEndDate is the time immediately after the security handshake completed. (End time of the TLS handshake for HTTPS)
 *
 * If an encrypted connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSDate *secureConnectionEndDate;

/*
 * connectEndDate is the time immediately after the user agent finished establishing the connection to the server, including completion of security-related and other handshakes. The time when the TCP connection between the client and the server is completed, including the TLS handshake time
 */
@property (nullable, copy, readonly) NSDate *connectEndDate;

/*
 * requestStartDate is the time immediately before the user agent started requesting the source, regardless of whether the resource was retrieved from the server or local resources.
 * (The time the client request begins, i.e. when the first byte of the HTTP request header starts being transmitted.)
 *
 * For example, this would correspond to the time immediately before the user agent sent an HTTP GET request.
 */
@property (nullable, copy, readonly) NSDate *requestStartDate;

/*
 * requestEndDate is the time immediately after the user agent finished requesting the source, regardless of whether the resource was retrieved from the server or local resources.
 * (The time the client request ends, i.e. when the last byte of the HTTP request has been transmitted.)
 *
 * For example, this would correspond to the time immediately after the user agent finished sending the last byte of the request.
 */
@property (nullable, copy, readonly) NSDate *requestEndDate;

/*
 * responseStartDate is the time immediately after the user agent received the first byte of the response from the server or from local resources.
 The time the client receives the first byte of the response from the server
 *
 * For example, this would correspond to the time immediately after the user agent received the first byte of an HTTP response.
 */
@property (nullable, copy, readonly) NSDate *responseStartDate;

/*
 * responseEndDate is the time immediately after the user agent received the last byte of the resource. (When the client received the last byte of the response from the server)
 */
@property (nullable, copy, readonly) NSDate *responseEndDate;

/*
 * The network protocol used to fetch the resource, as identified by the ALPN Protocol ID Identification Sequence [RFC7301].
 * E.g., h2, http/1.1, spdy/3.1.
 Network protocol name, such as http/1.1, spdy/3.1
 *
 * When a proxy is configured AND a tunnel connection is established, then this attribute returns the value for the tunneled protocol.
 *
 * For example:
 * If no proxy were used, and HTTP/2 was negotiated, then h2 would be returned.
 * If HTTP/1.1 were used to the proxy, and the tunneled connection was HTTP/2, then h2 would be returned.
 * If HTTP/1.1 were used to the proxy, and there were no tunnel, then http/1.1 would be returned.
 *
 */
@property (nullable, copy, readonly) NSString *networkProtocolName;

/*
 * This property is set to YES if a proxy connection was used to fetch the resource.
	Whether the connection uses a proxy
 */
@property (assign, readonly, getter=isProxyConnection) BOOL proxyConnection;

/*
 * This property is set to YES if a persistent connection was used to fetch the resource.
 * (Whether an existing connection was reused)
 */
@property (assign, readonly, getter=isReusedConnection) BOOL reusedConnection;

/*
 * Indicates whether the resource was loaded, pushed or retrieved from the local cache.
 * (How the resource was fetched)
 */
@property (assign, readonly) NSURLSessionTaskMetricsResourceFetchType resourceFetchType;

/*
 * countOfRequestHeaderBytesSent is the number of bytes transferred for request header.
 Bytes of Request Header
 */
@property (readonly) int64_t countOfRequestHeaderBytesSent API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfRequestBodyBytesSent is the number of bytes transferred for request body.
 * (Number of bytes in the request body)
 * It includes protocol-specific framing, transfer encoding, and content encoding.
 */
@property (readonly) int64_t countOfRequestBodyBytesSent API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfRequestBodyBytesBeforeEncoding is the size of upload body data, file, or stream.
 * (Size of the upload body data, file, or stream)
 */
@property (readonly) int64_t countOfRequestBodyBytesBeforeEncoding API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfResponseHeaderBytesReceived is the number of bytes transferred for response header.
 Number of bytes in response header
 */
@property (readonly) int64_t countOfResponseHeaderBytesReceived API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfResponseBodyBytesReceived is the number of bytes transferred for response body.
 * (Number of bytes in the response body)
 * It includes protocol-specific framing, transfer encoding, and content encoding.
 */
@property (readonly) int64_t countOfResponseBodyBytesReceived API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * countOfResponseBodyBytesAfterDecoding is the size of data delivered to your delegate or completion handler.
 * (Size of the data delivered to the delegate method or completion handler)
 */
@property (readonly) int64_t countOfResponseBodyBytesAfterDecoding API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * localAddress is the IP address string of the local interface for the connection.
  Local interface IP address under current connection
 *
 * For multipath protocols, this is the local address of the initial flow.
 *
 * If a connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSString *localAddress API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * localPort is the port number of the local interface for the connection.
 Local port number under current connection
 
 *
 * For multipath protocols, this is the local port of the initial flow.
 *
 * If a connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSNumber *localPort API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * remoteAddress is the IP address string of the remote interface for the connection.
 Remote IP address under current connection
 *
 * For multipath protocols, this is the remote address of the initial flow.
 *
 * If a connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSString *remoteAddress API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * remotePort is the port number of the remote interface for the connection.
  Remote port number under current connection
 *
 * For multipath protocols, this is the remote port of the initial flow.
 *
 * If a connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSNumber *remotePort API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * negotiatedTLSProtocolVersion is the TLS protocol version negotiated for the connection.
  TLS protocol version number for connection negotiation
 * It is a 2-byte sequence in host byte order.
 *
 * Please refer to tls_protocol_version_t enum in Security/SecProtocolTypes.h
 *
 * If an encrypted connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSNumber *negotiatedTLSProtocolVersion API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * negotiatedTLSCipherSuite is the TLS cipher suite negotiated for the connection.
 TLS cipher suite for connection negotiation
 * It is a 2-byte sequence in host byte order.
 *
 * Please refer to tls_ciphersuite_t enum in Security/SecProtocolTypes.h
 *
 * If an encrypted connection was not used, this attribute is set to nil.
 */
@property (nullable, copy, readonly) NSNumber *negotiatedTLSCipherSuite API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * Whether the connection is established over a cellular interface.
 Is the connection established over a cellular network
 */
@property (readonly, getter=isCellular) BOOL cellular API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * Whether the connection is established over an expensive interface.
 Whether connections are made through expensive interfaces
 */
@property (readonly, getter=isExpensive) BOOL expensive API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * Whether the connection is established over a constrained interface.
 Whether connections are made through restricted interfaces
 */
@property (readonly, getter=isConstrained) BOOL constrained API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));

/*
 * Whether a multipath protocol is successfully negotiated for the connection.
 Whether the multipath protocol was successfully negotiated for connection
 */
@property (readonly, getter=isMultipath) BOOL multipath API_AVAILABLE(macos(10.15), ios(13.0), watchos(6.0), tvos(13.0));


- (instancetype)init API_DEPRECATED("Not supported", macos(10.12,10.15), ios(10.0,13.0), watchos(3.0,6.0), tvos(10.0,13.0));
+ (instancetype)new API_DEPRECATED("Not supported", macos(10.12,10.15), ios(10.0,13.0), watchos(3.0,6.0), tvos(10.0,13.0));

@end

A simple implementation of network monitoring:

// Monitor basic information
@interface  NetworkMonitorBaseDataModel : NSObject
// Requested URL address
@property (nonatomic, strong) NSString *requestUrl;
//Request Header
@property (nonatomic, strong) NSArray *requestHeaders;
//Response Header
@property (nonatomic, strong) NSArray *responseHeaders;
//Request parameters for GET methods
@property (nonatomic, strong) NSString *getRequestParams;
//HTTP methods, such as POST
@property (nonatomic, strong) NSString *httpMethod;
//Protocol name as reported by networkProtocolName, such as http/1.1 or h2
@property (nonatomic, strong) NSString *httpProtocol;
//Whether to use proxy
@property (nonatomic, assign) BOOL useProxy;
//DNS resolved IP address
@property (nonatomic, strong) NSString *ip;
@end

// Monitoring Information Model
@interface  NetworkMonitorDataModel : NetworkMonitorBaseDataModel
//Time at which the client initiated the request, in ms since 1970
@property (nonatomic, assign) UInt64 requestDate;
//Wait time from the start of the request to the start of DNS resolution, in ms
@property (nonatomic, assign) int waitDNSTime;
//DNS resolution time, in ms
@property (nonatomic, assign) int dnsLookupTime;
//TCP three-way handshake time, in ms
@property (nonatomic, assign) int tcpTime;
//SSL/TLS handshake time, in ms
@property (nonatomic, assign) int sslTime;
//Total time for a complete request, in ms
@property (nonatomic, assign) int requestTime;
//HTTP response status code
@property (nonatomic, assign) NSUInteger httpCode;
//Number of bytes sent
@property (nonatomic, assign) UInt64 sendBytes;
//Number of bytes received
@property (nonatomic, assign) UInt64 receiveBytes;
@end


// Error Information Model
@interface  NetworkMonitorErrorModel : NetworkMonitorBaseDataModel
//Error Code
@property (nonatomic, assign) NSInteger errorCode;
//Number of Errors
@property (nonatomic, assign) NSUInteger errCount;
//Exception Name
@property (nonatomic, strong) NSString *exceptionName;
//Exception Details
@property (nonatomic, strong) NSString *exceptionDetail;
//Exception Stack
@property (nonatomic, strong) NSString *stackTrace;
@end

  
// Subclass the NSURLProtocol abstract class, implement the required methods, and proxy the network request
@interface CustomURLProtocol () <NSURLSessionTaskDelegate>

@property (nonatomic, strong) NSURLSessionDataTask *dataTask;
@property (nonatomic, strong) NSOperationQueue *sessionDelegateQueue;
@property (nonatomic, strong) NetworkMonitorDataModel *dataModel;
@property (nonatomic, strong) NetworkMonitorErrorModel *errModel;

@end

//Request network using NSURLSessionDataTask
- (void)startLoading {
    NSURLSessionConfiguration *configuration = [NSURLSessionConfiguration defaultSessionConfiguration];
    // Serial queue on which the session delivers its delegate callbacks.
    // It must exist before the session is created so it can be passed in.
    self.sessionDelegateQueue = [[NSOperationQueue alloc] init];
    self.sessionDelegateQueue.maxConcurrentOperationCount = 1;
    self.sessionDelegateQueue.name = @"com.networkMonitor.session.queue";
    NSURLSession *session = [NSURLSession sessionWithConfiguration:configuration
                                                          delegate:self
                                                     delegateQueue:self.sessionDelegateQueue];
    self.dataTask = [session dataTaskWithRequest:self.request];
    [self.dataTask resume];
}

#pragma mark - NSURLSessionTaskDelegate
- (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didCompleteWithError:(NSError *)error {
    if (error) {
        [self.client URLProtocol:self didFailWithError:error];

        // Record the failed request in the error model
        NSURLRequest *request = task.currentRequest;
        if (request) {
            self.errModel.requestUrl = request.URL.absoluteString;
            self.errModel.httpMethod = request.HTTPMethod;
            self.errModel.getRequestParams = request.URL.query;
        }
        self.errModel.errorCode = error.code;
        self.errModel.exceptionName = error.domain;
        self.errModel.exceptionDetail = error.description;
        // Upload network data to the data reporting component; see [Create a powerful, flexible and configurable data reporting component](./1.80.md)
    } else {
        [self.client URLProtocolDidFinishLoading:self];
    }
    self.dataTask = nil;
}


- (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didFinishCollectingMetrics:(NSURLSessionTaskMetrics *)metrics {
    // @available must be the only condition of its `if`; combined with && it does not guard availability
    if (@available(iOS 10.0, *)) {
        if ([metrics.transactionMetrics count] == 0) {
            return;
        }
        [metrics.transactionMetrics enumerateObjectsUsingBlock:^(NSURLSessionTaskTransactionMetrics *_Nonnull obj, NSUInteger idx, BOOL *_Nonnull stop) {
            if (obj.resourceFetchType == NSURLSessionTaskMetricsResourceFetchTypeNetworkLoad) {
                if (obj.fetchStartDate) {
                    self.dataModel.requestDate = [obj.fetchStartDate timeIntervalSince1970] * 1000;
                }
                if (obj.fetchStartDate && obj.domainLookupStartDate && obj.domainLookupEndDate) {
                    self.dataModel.waitDNSTime = ceil([obj.domainLookupStartDate timeIntervalSinceDate:obj.fetchStartDate] * 1000);
                    self.dataModel.dnsLookupTime = ceil([obj.domainLookupEndDate timeIntervalSinceDate:obj.domainLookupStartDate] * 1000);
                }
                if (obj.connectStartDate) {
                    // With TLS, the TCP handshake ends where the TLS handshake begins;
                    // without TLS, it ends at connectEndDate
                    if (obj.secureConnectionStartDate) {
                        self.dataModel.tcpTime = ceil([obj.secureConnectionStartDate timeIntervalSinceDate:obj.connectStartDate] * 1000);
                    } else if (obj.connectEndDate) {
                        self.dataModel.tcpTime = ceil([obj.connectEndDate timeIntervalSinceDate:obj.connectStartDate] * 1000);
                    }
                }
                if (obj.secureConnectionEndDate && obj.secureConnectionStartDate) {
                    self.dataModel.sslTime = ceil([obj.secureConnectionEndDate timeIntervalSinceDate:obj.secureConnectionStartDate] * 1000);
                }

                if (obj.fetchStartDate && obj.responseEndDate) {
                    self.dataModel.requestTime = ceil([obj.responseEndDate timeIntervalSinceDate:obj.fetchStartDate] * 1000);
                }

                self.dataModel.httpProtocol = obj.networkProtocolName;

                if ([obj.response isKindOfClass:NSHTTPURLResponse.class]) {
                    NSHTTPURLResponse *response = (NSHTTPURLResponse *)obj.response;
                    self.dataModel.httpCode = response.statusCode;
                    // expectedContentLength is -1 when the length is unknown
                    if (response.expectedContentLength > 0) {
                        self.dataModel.receiveBytes = response.expectedContentLength;
                    }
                }

                // The keys below are private and undocumented; they may change or be flagged in review.
                // On iOS 13+, the public remoteAddress and countOf...Bytes properties listed above are the supported alternative.
                if ([obj respondsToSelector:@selector(_remoteAddressAndPort)]) {
                    self.dataModel.ip = [obj valueForKey:@"_remoteAddressAndPort"];
                }

                if ([obj respondsToSelector:@selector(_requestHeaderBytesSent)]) {
                    self.dataModel.sendBytes = [[obj valueForKey:@"_requestHeaderBytesSent"] unsignedIntegerValue];
                }
                if ([obj respondsToSelector:@selector(_responseHeaderBytesReceived)]) {
                    // Add the header bytes to the body length recorded above instead of overwriting it
                    self.dataModel.receiveBytes += [[obj valueForKey:@"_responseHeaderBytesReceived"] unsignedIntegerValue];
                }

                self.dataModel.requestUrl = [obj.request.URL absoluteString];
                self.dataModel.httpMethod = obj.request.HTTPMethod;
                self.dataModel.useProxy = obj.isProxyConnection;
            }
        }];
        // Upload network data to the data reporting component; see [Create a powerful, flexible and configurable data reporting component](./1.80.md)
    }
}

2.2 Option 2: NSURLProtocol monitors App network requests, the black magic edition <a name="network-2.2"></a>

As section 2.1 above shows, NSURLSessionTaskMetrics is not a complete solution for network monitoring because of compatibility issues: it is only available from iOS 10. While researching, however, I came across an article whose author, analyzing WebView network monitoring, found the following code in the WebKit source:

#if !HAVE(TIMINGDATAOPTIONS)
void setCollectsTimingData()
{
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        [NSURLConnection _setCollectsTimingData:YES];
        ...
    });
}
#endif

That is, NSURLConnection itself has an API for collecting TimingData, but it is not exposed to developers; Apple uses it internally. The runtime headers reveal two such NSURLConnection APIs, _setCollectsTimingData: and _timingData (available from iOS 8).

For NSURLSession before iOS 9, calling _setCollectsTimingData: is enough to make TimingData available.

Note:

  • Because this is a private API, obfuscate the selector name when using it, for example [[@"_setC" stringByAppendingString:@"ollectsT"] stringByAppendingString:@"imingData:"].
  • Private APIs are not recommended. An APM SDK usually belongs to a shared infrastructure team. Even if your SDK achieves its network monitoring goal this way, the gain is not worth it if it causes a business line's App to run into problems during App Store review. Tricks like this, which are never 100% safe, are best confined to toy projects.
@interface _NSURLConnectionProxy : DelegateProxy

@end

@implementation _NSURLConnectionProxy

- (BOOL)respondsToSelector:(SEL)aSelector
{
    if ([NSStringFromSelector(aSelector) isEqualToString:@"connectionDidFinishLoading:"]) {
        return YES;
    }
    return [self.target respondsToSelector:aSelector];
}

- (void)forwardInvocation:(NSInvocation *)invocation
{
    [super forwardInvocation:invocation];
    if ([NSStringFromSelector(invocation.selector) isEqualToString:@"connectionDidFinishLoading:"]) {
        __unsafe_unretained NSURLConnection *conn;
        [invocation getArgument:&conn atIndex:2];
        SEL selector = NSSelectorFromString([@"_timin" stringByAppendingString:@"gData"]);
        NSDictionary *timingData = [conn performSelector:selector];
        [[NTDataKeeper shareInstance] trackTimingData:timingData request:conn.currentRequest];
    }
}

@end

@implementation NSURLConnection(tracker)

+ (void)load
{
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        Class class = [self class];
        
        SEL originalSelector = @selector(initWithRequest:delegate:);
        SEL swizzledSelector = @selector(swizzledInitWithRequest:delegate:);
        
        Method originalMethod = class_getInstanceMethod(class, originalSelector);
        Method swizzledMethod = class_getInstanceMethod(class, swizzledSelector);
        method_exchangeImplementations(originalMethod, swizzledMethod);
        
        NSString *selectorName = [[@"_setC" stringByAppendingString:@"ollectsT"] stringByAppendingString:@"imingData:"];
        SEL selector = NSSelectorFromString(selectorName);
        [NSURLConnection performSelector:selector withObject:@(YES)];
    });
}

- (instancetype)swizzledInitWithRequest:(NSURLRequest *)request delegate:(id<NSURLConnectionDelegate>)delegate
{
    if (delegate) {
        _NSURLConnectionProxy *proxy = [[_NSURLConnectionProxy alloc] initWithTarget:delegate];
        objc_setAssociatedObject(delegate ,@"_NSURLConnectionProxy" ,proxy, OBJC_ASSOCIATION_RETAIN_NONATOMIC);
        return [self swizzledInitWithRequest:request delegate:(id<NSURLConnectionDelegate>)proxy];
    }else{
        return [self swizzledInitWithRequest:request delegate:delegate];
    }
}

@end

2.3 Option 3: Hook

There are two kinds of hook techniques in iOS: NSProxy and method swizzling (isa swizzling).

2.3.1 Method 1

When writing an SDK, you cannot manually intrude into business code (you do not have permission to commit to the product's codebase), so both APM and codeless event tracking are implemented through hooks.

Aspect-oriented programming (AOP) is a programming paradigm in computer science that separates cross-cutting concerns from business entities to improve the modularity of program code. It adds functionality to a program dynamically without modifying the source code. Its core idea is to separate business logic (the core concerns, the main function of the system) from common functionality (the cross-cutting concerns, such as a logging system), which reduces complexity and preserves the modularity, maintainability, and reusability of the system. It is commonly used in scenarios such as logging, performance statistics, security control, transaction processing, and exception handling.

The implementation of AOP in iOS is based on the Runtime mechanism, and there are currently three approaches: Method Swizzling, NSProxy, and FishHook (mainly for hooking C code).
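The idea shared by all three approaches is to redirect a call through a wrapper that does some extra work and then forwards to the original implementation. The sketch below shows that idea in plain C with a function pointer. It is neither fishhook's API (fishhook rebinds symbol pointers inside the Mach-O binary) nor the Objective-C runtime, and `send_request`, `real_send`, `install_hook`, and `hook_call_count` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature of the function being hooked. */
typedef int (*send_fn)(const char *url);

/* Stand-in for the original implementation: returns the URL length. */
static int real_send(const char *url) {
    return (int)strlen(url);
}

/* All callers go through this pointer, so rebinding it hooks every call. */
send_fn send_request = real_send;

static send_fn original_send = NULL; /* saved original, set by install_hook */
int hook_call_count = 0;             /* monitoring data gathered by the hook */

/* Wrapper: record the call, then forward to the saved original. */
static int hooked_send(const char *url) {
    assert(original_send != NULL);
    hook_call_count++;
    return original_send(url);
}

/* Install the hook once. A second call is a no-op, so the wrapper
 * never ends up forwarding to itself. */
void install_hook(void) {
    if (original_send != NULL) {
        return;
    }
    original_send = send_request;
    send_request = hooked_send;
}
```

Business code keeps calling `send_request` unchanged; after `install_hook()` runs, every call is counted before reaching the original function. Saving the original pointer and guarding against double installation are the same two concerns that apply when exchanging method implementations.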

Section 2.1 above covers an approach that meets most requirements. NSURLProtocol intercepts network requests made through NSURLConnection and NSURLSession; after taking over a request it issues the request itself and can obtain information such as the request start time, request end time, and headers. It cannot, however, obtain very detailed network performance data, such as when DNS resolution started, how long resolution took, when the response started to arrive, and how long it took to arrive. Starting with iOS 10, NSURLSessionTaskDelegate has the delegate method - (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didFinishCollectingMetrics:(NSURLSessionTaskMetrics *)metrics API_AVAILABLE(macosx(10.12), ios(10.0), watchos(3.0), tvos(10.0));, which provides accurate network data, but it has compatibility limits. Section 2.2 above describes how the private methods _setCollectsTimingData: and _timingData, found via the WebKit source, can be used to obtain TimingData.

However, neither approach is enough if you need to monitor all network requests. After some research, we found that Alibaba has an APM solution, which led to Option 3. For network monitoring, the following handling is needed:

You may be unfamiliar with CFNetwork, so let's look at its layering and basic usage.

CFNetwork is based on CFSocket and CFStream.

CFSocket: Sockets are the foundation of network communication, allowing two socket endpoints to exchange data. The most common socket abstraction in iOS is the BSD socket. CFSocket is a Core Foundation wrapper around BSD sockets that implements almost all BSD socket functionality and additionally integrates with the RunLoop.

CFStream: Provides a device-independent way to read and write data. It can stream data from memory, files, or the network (using sockets) without having to load all the data into memory. The CFStream API provides abstractions for two CFType objects: CFReadStream and CFWriteStream. It is also the foundation of CFHTTP and CFFTP.
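To make the streaming idea concrete, the following sketch reads a source in fixed-size chunks so that only one small buffer is held at a time. It is plain C over an in-memory source, not CFReadStream. `mem_stream`, `mem_stream_read`, and `drain_stream` are hypothetical names, and the read function only mimics the usual return convention of a read call (number of bytes read, 0 at end of stream).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream: a byte source plus a read position. */
typedef struct {
    const unsigned char *data;
    size_t length;
    size_t offset;
} mem_stream;

/* Copy up to `capacity` bytes into `buffer`.
 * Returns the number of bytes copied, or 0 at end of stream. */
size_t mem_stream_read(mem_stream *s, unsigned char *buffer, size_t capacity) {
    assert(s != NULL && buffer != NULL);
    size_t remaining = s->length - s->offset;
    size_t n = remaining < capacity ? remaining : capacity;
    memcpy(buffer, s->data + s->offset, n);
    s->offset += n;
    return n;
}

/* Drain the stream chunk by chunk. Returns the total number of bytes
 * consumed and stores the number of non-empty reads in *chunks. */
size_t drain_stream(mem_stream *s, size_t *chunks) {
    unsigned char buffer[4]; /* deliberately tiny so chunking is visible */
    size_t total = 0;
    size_t n;
    assert(chunks != NULL);
    *chunks = 0;
    while ((n = mem_stream_read(s, buffer, sizeof(buffer))) > 0) {
        total += n;
        (*chunks)++;
    }
    return total;
}
```

A 10-byte source read through the 4-byte buffer is consumed in three reads (4, 4, and 2 bytes), which is the same read-until-empty loop shape used with a real stream and a larger buffer.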

Simple Demo

- (void)testCFNetwork
{
    CFURLRef urlRef = CFURLCreateWithString(kCFAllocatorDefault, CFSTR("https://httpbin.org/get"), NULL);
    CFHTTPMessageRef httpMessageRef = CFHTTPMessageCreateRequest(kCFAllocatorDefault, CFSTR("GET"), urlRef, kCFHTTPVersion1_1);
    CFRelease(urlRef);
    
    CFReadStreamRef readStream = CFReadStreamCreateForHTTPRequest(kCFAllocatorDefault, httpMessageRef);
    CFRelease(httpMessageRef);
    
    CFReadStreamScheduleWithRunLoop(readStream, CFRunLoopGetCurrent(), kCFRunLoopCommonModes);
    
    CFOptionFlags eventFlags = (kCFStreamEventHasBytesAvailable | kCFStreamEventErrorOccurred | kCFStreamEventEndEncountered);
    CFStreamClientContext context = {
        0,
        NULL,
        NULL,
        NULL,
       NULL
    } ;
    // Assigns a client to a stream, which receives callbacks when certain events occur.
    CFReadStreamSetClient(readStream, eventFlags, CFNetworkRequestCallback, &context);
    // Opens a stream for reading.
    CFReadStreamOpen(readStream);
}
// callback
void CFNetworkRequestCallback (CFReadStreamRef _Null_unspecified stream, CFStreamEventType type, void * _Null_unspecified clientCallBackInfo) {
    CFMutableDataRef responseBytes = CFDataCreateMutable(kCFAllocatorDefault, 0);
    CFIndex numberOfBytesRead = 0;
    do {
        UInt8 buffer[2014];
        numberOfBytesRead = CFReadStreamRead(stream, buffer, sizeof(buffer));
        if (numberOfBytesRead > 0) {
            CFDataAppendBytes(responseBytes, buffer, numberOfBytesRead);
        }
    } while (numberOfBytesRead > 0);
    
    
    CFHTTPMessageRef response = (CFHTTPMessageRef)CFReadStreamCopyProperty(stream, kCFStreamPropertyHTTPResponseHeader);
    if (responseBytes) {
        if (response) {
            CFHTTPMessageSetBody(response, responseBytes);
        }
        CFRelease(responseBytes);
    }
    
    // close and cleanup
    CFReadStreamClose(stream);
    CFReadStreamUnscheduleFromRunLoop(stream, CFRunLoopGetCurrent(), kCFRunLoopCommonModes);
    CFRelease(stream);
    
    // print response
    if (response) {
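        CFDataRef responseBodyData = CFHTTPMessageCopyBody(response);
        CFRelease(response);
        
        // CFHTTPMessageCopyBody can return NULL, and CFRelease(NULL) crashes.
        if (responseBodyData) {
            printResponseData(responseBodyData);
            CFRelease(responseBodyData);
        }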
    }
}

void printResponseData (CFDataRef responseData) {
    CFIndex dataLength = CFDataGetLength(responseData);
    UInt8 *bytes = (UInt8 *)malloc(dataLength);
    CFDataGetBytes(responseData, CFRangeMake(0, CFDataGetLength(responseData)), bytes);
    CFStringRef responseString = CFStringCreateWithBytes(kCFAllocatorDefault, bytes, dataLength, kCFStringEncodingUTF8, TRUE);
    CFShow(responseString);
    CFRelease(responseString);
    free(bytes);
}
// console
{
  "args": {}, 
  "headers": {
    "Host": "httpbin.org", 
    "User-Agent": "Test/1 CFNetwork/1125.2 Darwin/19.3.0", 
    "X-Amzn-Trace-Id": "Root=1-5e8980d0-581f3f44724c7140614c2564"
  }, 
  "origin": "183.159.122.102", 
  "url": "https://httpbin.org/get"
}

Using NSURLSession, NSURLConnection, or CFNetwork involves calling a series of setup methods, then setting a delegate object and implementing its delegate methods. So the first idea for monitoring is to use runtime hooks to intercept these methods. However, the delegate methods themselves cannot be hooked directly, because we do not know in advance which class the delegate object belongs to. Instead, hook the step that sets the delegate, replace the delegate with an object of a class we designed, and have that class implement the NSURLConnection, NSURLSession, and CFNetwork delegate methods. Inside each of these methods, call the original delegate's implementation. This meets our needs: in the corresponding methods we can collect monitoring data such as request start time, end time, status code, and content size.

The hooks for NSURLSession and NSURLConnection are as follows.

The industry also has APM schemes for CFNetwork, summarized below:

CFNetwork is implemented in C. Hooking C code requires a dynamic loader hook library: fishhook.

The dynamic loader (dyld) binds symbols by updating pointers stored in the Mach-O file. fishhook uses this to modify the pointer behind a C function call at runtime. How fishhook works: it traverses the symbols in the __nl_symbol_ptr and __la_symbol_ptr sections of the __DATA segment and, by combining the Indirect Symbol Table, Symbol Table, and String Table, finds the function to be replaced, thereby achieving the hook.
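The pointer-rebinding idea can be illustrated without any Mach-O parsing. The following is only a simplified sketch in plain C, not fishhook's actual code: the table g_slots stands in for a lazy symbol pointer section, and every name in it (symbol_slot, rebind_symbol, monitored_read, and so on) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Signature of the function being hooked (a stand-in for a stream read). */
typedef long (*read_fn)(const char *src, char *dst, long cap);

/* One slot of the simulated symbol pointer table: a symbol name plus the
   pointer that every call site jumps through. */
struct symbol_slot {
    const char *name;
    read_fn     target;
};

/* The "real" implementation: copies at most cap bytes and returns the count. */
long real_read(const char *src, char *dst, long cap) {
    long n = (long)strlen(src);
    if (n > cap) n = cap;
    memcpy(dst, src, (size_t)n);
    return n;
}

struct symbol_slot g_slots[] = { { "stream_read", real_read } };

read_fn g_original_read = NULL; /* saved original, filled in by rebind_symbol */
long    g_bytes_seen    = 0;    /* monitoring data collected by the hook */

/* The replacement: calls through to the original, then records the size. */
long monitored_read(const char *src, char *dst, long cap) {
    long n = g_original_read(src, dst, cap);
    if (n > 0) g_bytes_seen += n;
    return n;
}

/* Finds the slot by name, saves the old pointer, installs the replacement.
   Returns 0 on success, -1 if no slot has that name. */
int rebind_symbol(const char *name, read_fn replacement, read_fn *original) {
    for (size_t i = 0; i < sizeof(g_slots) / sizeof(g_slots[0]); i++) {
        if (strcmp(g_slots[i].name, name) == 0) {
            if (original) *original = g_slots[i].target;
            g_slots[i].target = replacement;
            return 0;
        }
    }
    return -1;
}

/* A call site: it never calls real_read directly, only through the table. */
long call_stream_read(const char *src, char *dst, long cap) {
    return g_slots[0].target(src, dst, cap);
}
```

After rebind_symbol("stream_read", monitored_read, &g_original_read), every call made through call_stream_read passes through the monitoring function first, while the caller's code is unchanged. That is the property an APM SDK relies on when it hooks a C function such as CFReadStreamRead.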

/* Returns the number of bytes read, or -1 if an error occurs preventing any
   bytes from being read, or 0 if the stream's end was encountered.
   It is an error to try and read from a stream that hasn't been opened first.
   This call will block until at least one byte is available; it will NOT block
   until the entire buffer can be filled. To avoid blocking, either poll using
   CFReadStreamHasBytesAvailable() or use the run loop and listen for the
   kCFStreamEventHasBytesAvailable event for notification of data available. */
CF_EXPORT
CFIndex CFReadStreamRead(CFReadStreamRef _Null_unspecified stream, UInt8 * _Null_unspecified buffer, CFIndex bufferLength);

CFNetwork uses CFReadStreamRef to pass data and a callback function to receive the server's response. When the callback function is subjected to

The steps and key code are as follows, using NSURLConnection as an example:

  • Write a method swizzling utility class, because there are many places to hook

    #import <Foundation/Foundation.h>
    
    NS_ASSUME_NONNULL_BEGIN
    
    @interface NSObject (hook)
    
    /**
     Hooks an instance method.
    
     @param originalSelector The original instance method to hook
     @param swizzledSelector The instance method that replaces it
     */
    + (void)apm_swizzleMethod:(SEL)originalSelector swizzledSelector:(SEL)swizzledSelector;
    
    /**
     Hooks a class method.
    
     @param originalSelector The original class method to hook
     @param swizzledSelector The class method that replaces it
     */
    + (void)apm_swizzleClassMethod:(SEL)originalSelector swizzledSelector:(SEL)swizzledSelector;
    
    @end
    
    NS_ASSUME_NONNULL_END
    
    // .m
    #import <objc/runtime.h>
    
    // Swaps the implementations of two instance methods on the given class.
    // Defined before the category so that the methods below can call it.
    static void class_swizzleInstanceMethod(Class class, SEL originalSEL, SEL replacementSEL)
    {
        Method originMethod = class_getInstanceMethod(class, originalSEL);
        Method replaceMethod = class_getInstanceMethod(class, replacementSEL);
        if (!originMethod || !replaceMethod) {
            return;
        }
    
        // If the class only inherits originalSEL, add it to this class first so the superclass is not modified.
        if (class_addMethod(class, originalSEL, method_getImplementation(replaceMethod), method_getTypeEncoding(replaceMethod))) {
            class_replaceMethod(class, replacementSEL, method_getImplementation(originMethod), method_getTypeEncoding(originMethod));
        } else {
            method_exchangeImplementations(originMethod, replaceMethod);
        }
    }
    
    @implementation NSObject (hook)
    
    + (void)apm_swizzleMethod:(SEL)originalSelector swizzledSelector:(SEL)swizzledSelector
    {
        class_swizzleInstanceMethod(self, originalSelector, swizzledSelector);
    }
    
    + (void)apm_swizzleClassMethod:(SEL)originalSelector swizzledSelector:(SEL)swizzledSelector
    {
        // Class methods are stored in the metaclass, so a class method is effectively an instance method of the metaclass.
        // Pass in the metaclass and the rest of the logic is the same as for instance methods.
        Class metaClass = object_getClass(self);
        class_swizzleInstanceMethod(metaClass, originalSelector, swizzledSelector);
    }
    
    @end
    
  • Create a class that inherits from the NSProxy abstract class and implement the corresponding method.

    #import <Foundation/Foundation.h>
    
    NS_ASSUME_NONNULL_BEGIN
    
    // Forwards delegate callbacks for NSURLConnection, NSURLSession, and CFNetwork
    @interface NetworkDelegateProxy : NSProxy
    
    + (instancetype)setProxyForObject:(id)originalTarget withNewDelegate:(id)newDelegate;
    
    @end
    
    NS_ASSUME_NONNULL_END
    
    // .m
    @interface NetworkDelegateProxy () {
        id _originalTarget;
        id _NewDelegate;
    }
    
    @end
    
    
    @implementation NetworkDelegateProxy
    
    #pragma mark - life cycle
    
    + (instancetype)sharedInstance {
        static NetworkDelegateProxy *_sharedInstance = nil;
    
        static dispatch_once_t onceToken;
    
        dispatch_once(&onceToken, ^{
            _sharedInstance = [NetworkDelegateProxy alloc];
        });
    
        return _sharedInstance;
    }
    
    
    #pragma mark - public Method
    
    + (instancetype)setProxyForObject:(id)originalTarget withNewDelegate:(id)newDelegate
    {
        NetworkDelegateProxy *instance = [NetworkDelegateProxy sharedInstance];
        instance->_originalTarget = originalTarget;
        instance->_NewDelegate = newDelegate;
        return instance;
    }
    
    - (void)forwardInvocation:(NSInvocation *)invocation
    {
        if ([_originalTarget respondsToSelector:invocation.selector]) {
            [invocation invokeWithTarget:_originalTarget];
            [((NSURLSessionAndConnectionImplementor *)_NewDelegate) invoke:invocation];
        }
    }
    
    - (nullable NSMethodSignature *)methodSignatureForSelector:(SEL)sel
    {
        return [_originalTarget methodSignatureForSelector:sel];
    }
    
    @end
    
  • Create an object that implements the NSURLConnection, NSURLSession, and NSInputStream delegate methods

    // NetworkImplementor.m
    
    #pragma mark - NSURLConnectionDelegate
    - (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
        NSLog(@"%s", __func__);
    }
    
    - (nullable NSURLRequest *)connection:(NSURLConnection *)connection willSendRequest:(NSURLRequest *)request redirectResponse:(nullable NSURLResponse *)response {
        NSLog(@"%s", __func__);
        return request;
    }
    
    #pragma mark - NSURLConnectionDataDelegate
    - (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
        NSLog(@"%s", __func__);
    }
    
    - (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
       NSLog(@"%s", __func__);
    }
    
    - (void)connection:(NSURLConnection *)connection   didSendBodyData:(NSInteger)bytesWritten
     totalBytesWritten:(NSInteger)totalBytesWritten
    totalBytesExpectedToWrite:(NSInteger)totalBytesExpectedToWrite {
        NSLog(@"%s", __func__);
    }
    
    - (void)connectionDidFinishLoading:(NSURLConnection *)connection {
        NSLog(@"%s", __func__);
    }
    
    #pragma mark - NSURLConnectionDownloadDelegate
    - (void)connection:(NSURLConnection *)connection didWriteData:(long long)bytesWritten totalBytesWritten:(long long)totalBytesWritten expectedTotalBytes:(long long) expectedTotalBytes {
        NSLog(@"%s", __func__);
    }
    
    - (void)connectionDidResumeDownloading:(NSURLConnection *)connection totalBytesWritten:(long long)totalBytesWritten expectedTotalBytes:(long long) expectedTotalBytes {
        NSLog(@"%s", __func__);
    }
    
    - (void)connectionDidFinishDownloading:(NSURLConnection *)connection destinationURL:(NSURL *) destinationURL {
        NSLog(@"%s", __func__);
    }
    // Record whichever data items you need to monitor
    
  • Add a category to NSURLConnection that hooks the initializer and replaces the delegate object

    // NSURLConnection+Monitor.m
    @implementation NSURLConnection (Monitor)
    
    + (void)load
    {
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            @autoreleasepool {
                [[self class] apm_swizzleMethod:@selector(initWithRequest:delegate:) swizzledSelector:@selector(apm_initWithRequest:delegate:)];
            }
        });
    }
    
    - (_Nonnull instancetype)apm_initWithRequest:(NSURLRequest *)request delegate:(nullable id)delegate
    {
        /*
         1. Replace the delegate when it is set.
         2. To collect data inside each delegate method, the delegate methods must be hooked.
         3. When an original delegate method runs, the new delegate object also receives the forwarded call.
         */
        NSString *traceId = @"traceId";
        NSMutableURLRequest *rq = [request mutableCopy];
        NSString *preTraceId = [request.allHTTPHeaderFields valueForKey:@"head_key_traceid"];
        if (preTraceId) {
            // The implementations are swapped, so this calls the original initializer and returns the NSURLConnection
            return [self apm_initWithRequest:rq delegate:delegate];
        } else {
            [rq setValue:traceId forHTTPHeaderField:@"head_key_traceid"];
    
            NSURLSessionAndConnectionImplementor *mockDelegate = [NSURLSessionAndConnectionImplementor new];
            [self registerDelegateMethod:@"connection:didFailWithError:" originalDelegate:delegate newDelegate:mockDelegate flag:"v@:@@"];
    
            [self registerDelegateMethod:@"connection:didReceiveResponse:" originalDelegate:delegate newDelegate:mockDelegate flag:"v@:@@"];
            [self registerDelegateMethod:@"connection:didReceiveData:" originalDelegate:delegate newDelegate:mockDelegate flag:"v@:@@"];
    
            [self registerDelegateMethod:@"connectionDidFinishLoading:" originalDelegate:delegate newDelegate:mockDelegate flag:"v@:@"];
            [self registerDelegateMethod:@"connection:willSendRequest:redirectResponse:" originalDelegate:delegate newDelegate:mockDelegate flag:"@@:@@"];
            delegate = [NetworkDelegateProxy setProxyForObject:delegate withNewDelegate:mockDelegate];
    
            // The implementations are swapped, so this calls the original initializer and returns the NSURLConnection
            return [self apm_initWithRequest:rq delegate:delegate];
        }
    }
    
    - (void)registerDelegateMethod:(NSString *)methodName originalDelegate:(id<NSURLConnectionDelegate>)originalDelegate newDelegate:(NSURLSessionAndConnectionImplementor *)newDelegate flag:(const char *)flag
    {
        if ([originalDelegate respondsToSelector:NSSelectorFromString(methodName)]) {
            IMP originalMethodImp = class_getMethodImplementation([originalDelegate class], NSSelectorFromString(methodName));
            IMP newMethodImp = class_getMethodImplementation([newDelegate class], NSSelectorFromString(methodName));
            if (originalMethodImp != newMethodImp) {
                [newDelegate registerSelector: methodName];
                NSLog(@"");
            }
        } else {
            class_addMethod([originalDelegate class], NSSelectorFromString(methodName), class_getMethodImplementation([newDelegate class], NSSelectorFromString(methodName)), flag);
        }
    }
    
    @end
    

In this way, network information can be monitored. The data is then handed to the data reporting SDK, which reports it according to the reporting strategy.

2.3.2 Method 2

In fact, there is another way to meet the above requirements: isa swizzling.

That is, after hooking the delegate objects of NSURLConnection, NSURLSession, and NSInputStream as above, forwarding the delegate methods with NSProxy is not the only option. The other approach is isa swizzling.

  • Method swizzling principle

    struct old_method {
        SEL method_name;
        char *method_types;
        IMP method_imp;
    };
    

    The improved method swizzling version is as follows

    Method originalMethod = class_getInstanceMethod(aClass, aSEL);
    IMP originalIMP = method_getImplementation(originalMethod);
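    const char *cd = method_getTypeEncoding(originalMethod);
    IMP newIMP = imp_implementationWithBlock(^(id self) {
      // Cast the saved IMP to its real function type, then call through to the original.
      void (*tmp)(id self, SEL _cmd) = (void (*)(id, SEL))originalIMP;
      tmp(self, aSEL);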
    });
    class_replaceMethod(aClass, aSEL, newIMP, cd);
    
  • isa swizzling

    /// Represents an instance of a class.
    struct objc_object {
        Class _Nonnull isa  OBJC_ISA_AVAILABILITY;
    };
    
    /// A pointer to an instance of a class.
    typedef struct objc_object *id;
    
    

Let's analyze why modifying isa can achieve the goal.

  1. The person writing the APM monitoring cannot know in advance what the business code looks like
  2. It is not feasible to write custom classes for APM monitoring and require business developers to use them instead of the system's NSURLSession and NSURLConnection classes

Think about how KVO works, in combination with the figure above:

  • Create a subclass of the observed object's class
  • Override the getters and setters of the properties in the subclass
  • Point the observed object's isa pointer to the newly created subclass
  • Intercept changes in the subclass's getters and setters and notify observers that the value changed
  • Restore the observed object's isa after observation ends

In the same way, we can dynamically create subclasses in the load methods of NSURLConnection and NSURLSession and override methods in the subclass, such as - (nullable instancetype)initWithRequest:(NSURLRequest *)request delegate:(nullable id)delegate startImmediately:(BOOL)startImmediately;, and then point the isa of the NSURLSession and NSURLConnection objects to the dynamically created subclass. After these methods have been processed, restore the original isa pointer.

However, isa swizzling still only addresses method swizzling. The delegate object is still unknown in advance, so NSProxy is still needed to handle it dynamically.

As for how to modify isa, I wrote a simple demo that simulates KVO:

- (void)lbpKVO_addObserver:(NSObject *)observer forKeyPath:(NSString *)keyPath options:(NSKeyValueObservingOptions)options context:(nullable void *)context {
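    // Build the subclass name
    NSString *className = NSStringFromClass(self.class);
    NSString *currentClassName = [@"LBPKVONotifying_" stringByAppendingString:className];
    // 1. Create the subclass at runtime, reusing it if it was already registered
    //    (objc_allocateClassPair returns Nil when the name is already taken)
    Class myclass = NSClassFromString(currentClassName);
    if (!myclass) {
        myclass = objc_allocateClassPair(self.class, [currentClassName UTF8String], 0);
        // A newly allocated class pair must be registered before it can be used
        objc_registerClassPair(myclass);
    }
    
    // 2. Override the method in the subclass ("v@:" = returns void, takes only self and _cmd)
    class_addMethod(myclass,@selector(say) , (IMP)say, "v@:");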
    
//    class_addMethod(myclass,@selector(setName:) , (IMP)setName, "v@:@");
    //3. Modify isa
    object_setClass(self, myclass);
    
    //4. Save the observer inside the current object
    objc_setAssociatedObject(self, "observer", observer, OBJC_ASSOCIATION_ASSIGN);
    
    //5. Bind the passed context to the current object
    objc_setAssociatedObject(self, "context", (__bridge id _Nullable)(context), OBJC_ASSOCIATION_RETAIN);
}


void say(id self, SEL _cmd)
{
   // Call parent method one
    struct objc_super superclass = {self, [self superclass]};
    ((void(*)(struct objc_super *,SEL))objc_msgSendSuper)(&superclass,@selector(say));
    NSLog(@"%s", __func__);
// Call parent method two
//    Class class = [self class];
//    object_setClass(self, class_getSuperclass(class));
//    objc_msgSend(self, @selector(say));
}

void setName (id self, SEL _cmd, NSString *name) {
    NSLog(@"come here");
    // Switch to the parent class, send setName:, then switch back to the subclass
    // 1. Switch to the parent class
    Class class = [self class];
    object_setClass(self, class_getSuperclass(class));
    // 2. Call the parent class's setName: (objc_msgSend must be cast to the exact function type; needs <objc/message.h>)
    ((void (*)(id, SEL, NSString *))objc_msgSend)(self, @selector(setName:), name);
    
    // 3. Notify the observer
    id observer = objc_getAssociatedObject(self, "observer");
    id context = objc_getAssociatedObject(self, "context");
    if (observer) {
        ((void (*)(id, SEL, NSString *, id, NSDictionary *, void *))objc_msgSend)(observer, @selector(observeValueForKeyPath:ofObject:change:context:), @"name", self, @{@"new": name, @"kind": @1 }, (__bridge void *)context);
    }
    // 4. Switch back to the subclass
    object_setClass(self, class);
}

@end

2.4 Scenario 4: Monitor common App network requests

For cost reasons, the networking layer of most projects today is built on AFNetworking, so the network monitoring described in this article can be implemented quickly.

AFNetworking posts notifications when a task starts and when it completes: AFNetworkingTaskDidResumeNotification and AFNetworkingTaskDidCompleteNotification. Network information can be obtained from the parameters carried by these notifications.

__weak __typeof(self) weakSelf = self;
self.didResumeObserver = [[NSNotificationCenter defaultCenter] addObserverForName:AFNetworkingTaskDidResumeNotification object:nil queue:self.queue usingBlock:^(NSNotification * _Nonnull note) {
    // start
    __strong __typeof(weakSelf)strongSelf = weakSelf;
    NSURLSessionTask *task = note.object;
    NSString *requestId = [[NSUUID UUID] UUIDString];
    task.apm_requestId = requestId;
    [strongSelf.networkRecoder recordStartRequestWithRequestID:requestId task:task];
}];

self.didCompleteObserver = [[NSNotificationCenter defaultCenter] addObserverForName:AFNetworkingTaskDidCompleteNotification object:nil queue:self.queue usingBlock:^(NSNotification * _Nonnull note) {
    
    __strong __typeof(weakSelf)strongSelf = weakSelf;
    
    NSError *error = note.userInfo[AFNetworkingTaskDidCompleteErrorKey];
    NSURLSessionTask *task = note.object;
    if (!error) {
        // Success: look the request up by the ID assigned when the task resumed
        [strongSelf.networkRecoder recordFinishRequestWithRequestID:task.apm_requestId task:task];
    } else {
        // Failure
        [strongSelf.networkRecoder recordResponseErrorWithRequestID:task.apm_requestId task:task error:error];
    }
}];

Assemble the data in the networkRecoder's methods, hand it to the data reporting component, and report it when the reporting strategy decides the time is right.

Because a network request is asynchronous, each request needs a unique identifier assigned when it starts. When the request completes, the identifier is used to determine how long the request took and whether it succeeded. So the solution is to add a category to NSURLSessionTask and use the runtime to add a property to it: the unique identifier.
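The matching of start and completion by identifier can be sketched in plain C. This is only an illustration under assumptions, not the recorder used above: the fixed-size table, the struct apm_request type, and the functions apm_record_start and apm_record_finish are all hypothetical names, and the timestamps are passed in as seconds.

```c
#include <string.h>

#define APM_MAX_INFLIGHT 32
#define APM_ID_LEN       37  /* a 36-character UUID string plus the NUL byte */

/* One in-flight request: its unique ID and the time it started. */
struct apm_request {
    char   id[APM_ID_LEN];
    double start_time;  /* seconds */
    int    in_use;
};

struct apm_request g_inflight[APM_MAX_INFLIGHT];

/* Records the start of a request.
   Returns 0 on success, -1 if the ID is too long or the table is full. */
int apm_record_start(const char *id, double now) {
    if (strlen(id) >= APM_ID_LEN) return -1;
    for (int i = 0; i < APM_MAX_INFLIGHT; i++) {
        if (!g_inflight[i].in_use) {
            strcpy(g_inflight[i].id, id);
            g_inflight[i].start_time = now;
            g_inflight[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Matches a completion to its start by ID, frees the slot, and returns the
   elapsed seconds. Returns -1.0 when no request with that ID is in flight. */
double apm_record_finish(const char *id, double now) {
    for (int i = 0; i < APM_MAX_INFLIGHT; i++) {
        if (g_inflight[i].in_use && strcmp(g_inflight[i].id, id) == 0) {
            g_inflight[i].in_use = 0;
            return now - g_inflight[i].start_time;
        }
    }
    return -1.0;
}
```

Requests may finish in a different order than they started, which is why the lookup is by identifier rather than by position. A real recorder would also need locking, since the two notifications can arrive on different threads.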

Pay attention to the naming of a category and of its properties and methods. What happens if you don't? Suppose you want to add a method to NSString that masks the middle digits of an ID card number. Veteran developer A adds a method to NSString called getMaskedIdCardNumber, and his requirement is to mask the four characters in the range [9, 12]. A few days later colleague B, also a veteran, meets a similar requirement and adds an NSString method that is also called getMaskedIdCardNumber, but his requirement is to mask the four characters in the range [8, 11]. After integrating it into the project, he finds that the output is not what he expected and the unit test written for the method fails. He assumes his substring logic is wrong and checks it several times before discovering that the project already contains another NSString category method with the same name. A real pitfall.

The following example is for an SDK, but the same applies to everyday development.

  • Category name: it is recommended to use the SDK name abbreviation as a prefix, followed by an underscore and the function of the category, that is, ClassName+SDKAbbreviation_FunctionName. For example, if the SDK is called JuhuaSuanAPM, then the NSURLSessionTask category file is named NSURLSessionTask+JuhuaSuanAPM_NetworkMonitor.h
  • Category property name: it is recommended to use the SDK name abbreviation as a prefix, followed by an underscore and the property name, that is, SDKAbbreviation_PropertyName. For example, JuhuaSuanAPM_requestId
  • Category method name: it is recommended to use the SDK name abbreviation as a prefix, followed by an underscore and the method name, that is, SDKAbbreviation_MethodName. For example, - (BOOL)JuhuaSuanAPM_isGzippedData

Examples are as follows:

#import <Foundation/Foundation.h>

@interface NSURLSessionTask (JuhuaSuanAPM_NetworkMonitor)

@property (nonatomic, copy) NSString* JuhuaSuanAPM_requestId;

@end

#import "NSURLSessionTask+JuhuaSuanAPM_NetworkMonitor.h"
#import <objc/runtime.h>

@implementation NSURLSessionTask (JuhuaSuanAPM_NetworkMonitor)

- (NSString*)JuhuaSuanAPM_requestId
{
    return objc_getAssociatedObject(self, _cmd);
}

- (void)setJuhuaSuanAPM_requestId:(NSString*)requestId
{
    objc_setAssociatedObject(self, @selector(JuhuaSuanAPM_requestId), requestId, OBJC_ASSOCIATION_COPY_NONATOMIC);
}
@end

2.5 iOS traffic monitoring

2.5.1 HTTP Request, Response Data Structure

HTTP Request Message Structure

Structure of response message

  1. HTTP messages are formatted blocks of data. Each message consists of three parts: a start line describing the message, a header block containing attributes, and an optional body containing data.
  2. The start line and the headers are ASCII text separated into lines, each line ending with a two-character line terminator (a carriage return followed by a line feed).
  3. The entity body, or message body, is an optional block of data. Unlike the start line and the headers, the body can contain text or binary data, or it can be empty.
  4. The HTTP headers should always end with an empty line, even if there is no entity body. The browser sends a blank line to tell the server that it has finished sending the headers.

Request message format

<method> <request-URI> <version>
<headers>

<entity-body>

Format of response message

<version> <status> <reason-phrase>
<headers>

<entity-body>
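Given these formats, the size of an HTTP/1.x message on the wire is the sum of its parts. The following is a minimal sketch in plain C under that assumption (with HTTP/2, header compression and binary framing change the numbers). The struct http_header type and the three functions are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One header field, for example { "Host", "httpbin.org" }. */
struct http_header {
    const char *name;
    const char *value;
};

/* Start line (request line or status line) plus its CRLF terminator. */
size_t http_start_line_size(const char *start_line) {
    return strlen(start_line) + 2;
}

/* Each header is sent as "Name: value\r\n": the name, a colon and a space,
   the value, then the CRLF terminator. */
size_t http_headers_size(const struct http_header *headers, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += strlen(headers[i].name) + 2 + strlen(headers[i].value) + 2;
    }
    return total;
}

/* Whole message: start line, headers, the empty line that ends the headers,
   and the body bytes as transmitted. */
size_t http_message_size(const char *start_line,
                         const struct http_header *headers, size_t count,
                         size_t body_len) {
    return http_start_line_size(start_line)
         + http_headers_size(headers, count)
         + 2
         + body_len;
}
```

For the request line GET /get HTTP/1.1 with the headers Host: httpbin.org and Accept: */* and no body, this gives 19 + 32 + 2 + 0 = 53 bytes. The start line and header parts are exactly what a monitor misses if it counts only the body.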

Below is the request information shown in Chrome when opening the Geek Time (极客时间) web page. It includes the response line, response headers, response body, and so on.

Below is a complete request and response viewed with curl in the terminal.

In HTTP communication, response data is usually compressed with gzip or another algorithm. Monitoring schemes such as NSURLProtocol receive the body as NSData after decompression, so computing traffic from the NSData length gives inaccurate results: the body that was actually transmitted was compressed, so the size measured from NSData is larger than the real traffic.

2.5.2 Question
  1. Request and Response do not necessarily exist in pairs

    For example, the network may be disconnected or the app may suddenly crash, so Request and Response should not be stored in a single record

  2. Request traffic is not calculated accurately

    The main reasons are:

    • The monitoring scheme ignores the size of the request line and the request headers
    • The monitoring scheme ignores the size of the Cookie part
    • The monitoring scheme uses HTTPBody.length directly when calculating the size of the request body, which is inaccurate
  3. Response traffic is not calculated accurately

    The main reasons are:

    • The monitoring scheme ignores the size of the status line and the response headers
    • The monitoring scheme calculates the size of the body from expectedContentLength, which is inaccurate
    • The monitoring scheme ignores that the response body is gzip-compressed. In real network communication, the Accept-Encoding field in the request header lists the compression methods the client supports (meaning the client can decode such data). The server compresses the data according to what the client accepts and what the server itself supports, and the Content-Encoding field in the response header states which compression method the server actually used.
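Two small checks help decide whether a body still needs its compressed size accounted for. The sketch below is plain C and only illustrative: apm_is_gzipped and apm_content_encoding_is_gzip are hypothetical names. The first relies on the fact that a gzip stream begins with the bytes 0x1f 0x8b, and the second compares a Content-Encoding value case-insensitively.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 when the data begins with the gzip magic number 0x1f 0x8b,
   0 otherwise (including when fewer than two bytes are available). */
int apm_is_gzipped(const unsigned char *data, size_t len) {
    return len >= 2 && data[0] == 0x1f && data[1] == 0x8b;
}

/* Returns 1 when a Content-Encoding header value is exactly "gzip",
   compared case-insensitively. NULL (header absent) returns 0. */
int apm_content_encoding_is_gzip(const char *value) {
    const char *want = "gzip";
    if (value == NULL) return 0;
    for (; *want != '\0'; value++, want++) {
        if (tolower((unsigned char)*value) != *want) return 0;
    }
    return *value == '\0';
}
```

If the response header says gzip but the bytes the monitor holds are not gzipped, the body has already been decoded, and its length overstates the traffic that crossed the network.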
2.5.3 Technology Implementation

The fifth part describes the principles and technical schemes of network interception. Here NSURLProtocol is used to implement traffic monitoring (hook mode). Knowing from the above what data we need, we can implement it step by step.

2.5.3.1 Request section
  1. First, use the NSURLProtocol-based network monitoring scheme to take over the app's network requests

  2. Record the required parameters within each method (NSURLProtocol cannot analyze the data size and time consumed by the connection handshake and teardown, but it is sufficient for normal API traffic analysis; anything lower level requires the socket layer)

    @property(nonatomic, strong) NSURLConnection *internalConnection;
    @property(nonatomic, strong) NSURLResponse *internalResponse;
    @property(nonatomic, strong) NSMutableData *responseData;
    @property (nonatomic, strong) NSURLRequest *internalRequest;
    
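    - (void)startLoading
    {
        NSMutableURLRequest *mutableRequest = [[self request] mutableCopy];
        // Create the buffer up front; appending to a nil NSMutableData silently does nothing.
        self.responseData = [NSMutableData data];
        self.internalConnection = [[NSURLConnection alloc] initWithRequest:mutableRequest delegate:self];
        self.internalRequest = self.request;
    }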
    
    - (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response
    {
        [self.client URLProtocol:self didReceiveResponse:response cacheStoragePolicy:NSURLCacheStorageNotAllowed];
        self.internalResponse = response;
    }
    
    - (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data 
    {
        [self.responseData appendData:data];
        [self.client URLProtocol:self didLoadData:data];
    }
    
  3. Status Line section

NSURLResponse has no property or interface for the Status Line, nor for the HTTP version, so to get the Status Line we need to drop down to the CFNetwork layer. This turns out to be possible through a private API.

Idea: convert the NSURLResponse to a CFTypeRef through _CFURLResponse, convert the CFTypeRef to a CFHTTPMessageRef, then obtain the Status Line from the CFHTTPMessageRef with CFHTTPMessageCopyResponseStatusLine.

Add a category on NSURLResponse that provides the ability to read the Status Line.

// NSURLResponse+cm_FetchStatusLineFromCFNetwork.h
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface NSURLResponse (cm_FetchStatusLineFromCFNetwork)

- (NSString *)cm_fetchStatusLineFromCFNetwork;

@end

NS_ASSUME_NONNULL_END

// NSURLResponse+cm_FetchStatusLineFromCFNetwork.m
#import "NSURLResponse+cm_FetchStatusLineFromCFNetwork.h"
#import <dlfcn.h>


#define SuppressPerformSelectorLeakWarning(Stuff) \
do { \
    _Pragma("clang diagnostic push") \
    _Pragma("clang diagnostic ignored \"-Warc-performSelector-leaks\"") \
    Stuff; \
    _Pragma("clang diagnostic pop") \
} while (0)

typedef CFHTTPMessageRef (*CMURLResponseFetchHTTPResponse)(CFURLRef response);

@implementation NSURLResponse (cm_FetchStatusLineFromCFNetwork)

- (NSString *)cm_fetchStatusLineFromCFNetwork
{
    NSString *statusLine = @"";
    NSString *funcName = @"CFURLResponseGetHTTPResponse";
    CMURLResponseFetchHTTPResponse originalURLResponseFetchHTTPResponse = dlsym(RTLD_DEFAULT, [funcName UTF8String]);
    
    SEL getSelector = NSSelectorFromString(@"_CFURLResponse");
    if ([self respondsToSelector:getSelector] && NULL != originalURLResponseFetchHTTPResponse) {
        CFTypeRef cfResponse;
        SuppressPerformSelectorLeakWarning(
            cfResponse = CFBridgingRetain([self performSelector:getSelector]);
        );
        if (NULL != cfResponse) {
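            CFHTTPMessageRef messageRef = originalURLResponseFetchHTTPResponse(cfResponse);
            // The private function may return NULL, so check before reading the status line.
            if (NULL != messageRef) {
                statusLine = (__bridge_transfer NSString *)CFHTTPMessageCopyResponseStatusLine(messageRef) ?: @"";
            }
            CFRelease(cfResponse);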
        }
    }
    return statusLine;
}

@end
  4. Convert the obtained Status Line to NSData and calculate the size

    - (NSUInteger)cm_getLineLength {
        NSString *statusLineString = @"";
        if ([self isKindOfClass:[NSHTTPURLResponse class]]) {
            NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)self;
            statusLineString = [self cm_fetchStatusLineFromCFNetwork];
        }
        NSData *lineData = [statusLineString dataUsingEncoding:NSUTF8StringEncoding];
        return lineData.length;
    }
    
  5. Header section

    allHeaderFields returns an NSDictionary. Join each entry into a string in key: value form, then convert it to NSData to calculate the size

    Note: there is a space after the colon in key: value. You can confirm this with curl or the Chrome Network panel.

    - (NSUInteger)cm_getHeadersLength
    {
        NSUInteger headersLength = 0;
        if ([self isKindOfClass:[NSHTTPURLResponse class]]) {
            NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)self;
            NSDictionary *headerFields = httpResponse.allHeaderFields;
            NSString *headerString = @"";
            for (NSString *key in headerFields.allKeys) {
                headerString = [headerString stringByAppendingString:key];
                headerString = [headerString stringByAppendingString:@": "];
                if ([headerFields objectForKey:key]) {
                    headerString = [headerString stringByAppendingString:headerFields[key]];
                }
                headerString = [headerString stringByAppendingString:@"\n"];
            }
            NSData *headerData = [headerString dataUsingEncoding:NSUTF8StringEncoding];
            headersLength = headerData.length;
        }
        return headersLength;
    }
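The same arithmetic can be illustrated outside Foundation. The following is a minimal C sketch, not the article's implementation: `header_field` and `headers_wire_length` are hypothetical names, and the line terminator length is a parameter so the result can either mirror the `\n` appended by the method above (1 byte) or the CRLF that HTTP/1.x uses on the wire (2 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-C stand-in for one entry of allHeaderFields. */
typedef struct {
    const char *key;
    const char *value;
} header_field;

/* Bytes taken by "key: value" plus a line terminator for every field.
 * terminator_len is 1 to mirror the "\n" used in the method above,
 * or 2 for the CRLF that HTTP/1.x puts on the wire. */
size_t headers_wire_length(const header_field *fields, size_t count,
                           size_t terminator_len)
{
    assert(fields != NULL || count == 0);
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += strlen(fields[i].key);
        total += 2; /* ": " (the colon and the space after it) */
        total += strlen(fields[i].value);
        total += terminator_len;
    }
    return total;
}
```

Keeping the terminator length explicit makes it easy to compare the number reported by the monitor with what curl or a packet capture shows.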
    
  3. Body Part

    The Body size cannot be taken directly from expectedContentLength: the official documentation states that the value is inaccurate and should be treated as advisory only. The Content-Length value in allHeaderFields is not accurate enough either.

    /*!
        @abstract Returns the expected content length of the receiver.
        @discussion Some protocol implementations report a content length
        as part of delivering load metadata, but not all protocols
        guarantee the amount of data that will be delivered in actuality.
        Hence, this method returns an expected amount. Clients should use
        this value as an advisory, and should be prepared to deal with
        either more or less data.
        @result The expected content length of the receiver, or -1 if
        there is no expectation that can be arrived at regarding expected
        content length.
    */
    @property (readonly) long long expectedContentLength;

    • HTTP 1.1 specifies that when Transfer-Encoding: chunked is present, the header should not carry Content-Length; if it does, Content-Length is ignored.
    • In HTTP 1.0 and earlier, the Content-Length field is optional.
    • In HTTP 1.1 and later, a keep-alive connection must use either Content-Length or chunked. Without keep-alive it behaves like HTTP 1.0, and Content-Length is optional.

    What is Transfer-Encoding: chunked?

    The data is sent as a series of chunks, and the Content-Length header is not sent in this case. Each chunk starts with the length of the current chunk in hexadecimal, followed by \r\n, then the chunk data itself, followed by another \r\n. The terminating chunk is a regular chunk, except that its length is 0.
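To make the framing concrete, here is a minimal C sketch that walks a chunked body and sums the chunk sizes. It is an illustration under stated assumptions, not part of the monitoring component: `chunked_payload_length` is a hypothetical name, the whole body is assumed to be in one buffer, and chunk extensions and trailers are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Walks a chunked body held in buf[0..len) and returns the number of payload
 * bytes (the sum of all chunk sizes), or -1 if the framing is malformed or
 * truncated. Chunk extensions and trailers are not handled in this sketch. */
long chunked_payload_length(const char *buf, size_t len)
{
    assert(buf != NULL);
    size_t pos = 0;
    long total = 0;

    for (;;) {
        /* 1. Chunk size: one to seven hex digits (capped to avoid overflow). */
        size_t size = 0;
        size_t digits = 0;
        while (pos < len && hex_value(buf[pos]) >= 0) {
            size = size * 16 + (size_t)hex_value(buf[pos]);
            pos++;
            digits++;
            if (digits > 7) return -1;
        }
        if (digits == 0) return -1;

        /* 2. The size line ends with \r\n. */
        if (len - pos < 2 || buf[pos] != '\r' || buf[pos + 1] != '\n') return -1;
        pos += 2;

        /* 3. A zero-length chunk marks the end of the body. */
        if (size == 0) return total;

        /* 4. Chunk data, followed by \r\n. */
        if (len - pos < size + 2) return -1;
        pos += size;
        if (buf[pos] != '\r' || buf[pos + 1] != '\n') return -1;
        pos += 2;
        total += (long)size;
    }
}
```

The difference between the buffer length and the returned value is the framing overhead, which is why a monitor that only looks at the decoded data undercounts chunked responses slightly.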

    We already record the received data in an NSMutableData, so the Body size can be calculated in the stopLoading method. The steps are as follows:

    • Keep appending data in didReceiveData

      - (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
      {
          [self.responseData appendData:data];
          [self.client URLProtocol:self didLoadData:data];
      }
      
    • In the stopLoading method, get the allHeaderFields dictionary and read the value of the Content-Encoding key. If it is gzip, compress the NSData with gzip in stopLoading and then calculate the size. (gzip-related functions can use this tool)

      The length of the extra blank line must also be counted.

      - (void)stopLoading
      {
          [self.internalConnection cancel];
      
          PCTNetworkTrafficModel *model = [[PCTNetworkTrafficModel alloc] init];
          model.path = self.request.URL.path;
          model.host = self.request.URL.host;
          model.type = DMNetworkTrafficDataTypeResponse;
          model.lineLength = [self.internalResponse cm_getStatusLineLength];
          model.headerLength = [self.internalResponse cm_getHeadersLength];
          model.emptyLineLength = [self.internalResponse cm_getEmptyLineLength];
          if ([self.internalResponse isKindOfClass:[NSHTTPURLResponse class]]) {
              NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)self.internalResponse;
              NSData *data = self.responseData;
              if ([[httpResponse.allHeaderFields objectForKey:@"Content-Encoding"] isEqualToString:@"gzip"]) {
                  data = [self.responseData gzippedData];
              }
              model.bodyLength = data.length;
          }
          model.length = model.lineLength + model.headerLength + model.bodyLength + model.emptyLineLength;
          NSDictionary *networkTrafficDictionary = [model convertToDictionary];
          [[PrismClient sharedInstance] sendWithType:CMMonitorNetworkTrafficType meta:networkTrafficDictionary payload:nil];
      }
      
2.5.3.2 Request section
  1. First, use the NSURLProtocol-based network monitoring scheme to take over the App's network requests

  2. Record the required parameters within each method (NSURLProtocol cannot measure the data size and time spent on things like the TCP handshake and teardown, but it is sufficient for normal interface traffic analysis; anything lower level requires working at the Socket layer)

    @property(nonatomic, strong) NSURLConnection *internalConnection;
    @property(nonatomic, strong) NSURLResponse *internalResponse;
    @property(nonatomic, strong) NSMutableData *responseData;
    @property (nonatomic, strong) NSURLRequest *internalRequest;
    
    - (void)startLoading
    {
        NSMutableURLRequest *mutableRequest = [[self request] mutableCopy];
        self.internalConnection = [[NSURLConnection alloc] initWithRequest:mutableRequest delegate:self];
        self.internalRequest = self.request;
    }
    
    - (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response
    {
        [self.client URLProtocol:self didReceiveResponse:response cacheStoragePolicy:NSURLCacheStorageNotAllowed];
        self.internalResponse = response;
    }
    
    - (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data 
    {
        [self.responseData appendData:data];
        [self.client URLProtocol:self didLoadData:data];
    }
    
  3. Status Line section

    Unlike NSURLResponse, NSURLRequest offers no way to retrieve its first line. The fallback is to construct one manually based on its structure. A response Status Line has the structure: protocol version + space + status code + space + status text + line break. The request counterpart built below is: method + space + path + space + protocol version + line break.

    Add a category to NSURLRequest dedicated to getting this line.

    // NSURLRequest+cm_FetchStatusLineFromCFNetwork.m
    - (NSUInteger)cm_fetchStatusLineLength
    {
      NSString *statusLineString = [NSString stringWithFormat:@"%@ %@ %@\n", self.HTTPMethod, self.URL.path, @"HTTP/1.1"];
      NSData *statusLineData = [statusLineString dataUsingEncoding:NSUTF8StringEncoding];
      return statusLineData.length;
    }
    
  4. Header section

    When an HTTP request is issued, the client first checks whether a cached response exists, then performs DNS resolution to obtain the IP address of the server for the requested domain. A TCP connection to the server is then established using that IP address; if the protocol is HTTPS, a TLS handshake is required as well. Once the connection is established, the browser builds the request line, request headers, and so on, attaches data such as the cookies associated with the domain to the request headers, and then sends the assembled request to the server.

    So a network monitor that does not take cookies into account would be seriously incomplete.

    I've read some articles saying that NSURLRequest cannot provide the complete request header information. In practice this is not a big problem; a few pieces of information simply cannot be obtained. The point of the monitoring scheme is to see whether an interface consumes an unusual amount of data in different versions or under certain conditions, and whether WebView resource requests are too large, similar to the idea of the control variable method.

    So once you get allHTTPHeaderFields from NSURLRequest, add the cookie information to calculate the full Header size.

    // NSURLRequest+cm_FetchHeaderWithCookies.m
    - (NSUInteger)cm_fetchHeaderLengthWithCookie
    {
        NSDictionary *headerFields = self.allHTTPHeaderFields;
        NSDictionary *cookiesHeader = [self cm_fetchCookies];
    
        if (cookiesHeader.count) {
            NSMutableDictionary *headerDictionaryWithCookies = [NSMutableDictionary dictionaryWithDictionary:headerFields];
            [headerDictionaryWithCookies addEntriesFromDictionary:cookiesHeader];
            headerFields = [headerDictionaryWithCookies copy];
        }
    
        NSString *headerString = @"";
    
        for (NSString *key in headerFields.allKeys) {
            headerString = [headerString stringByAppendingString:key];
            headerString = [headerString stringByAppendingString:@": "];
            if ([headerFields objectForKey:key]) {
                headerString = [headerString stringByAppendingString:headerFields[key]];
            }
            headerString = [headerString stringByAppendingString:@"\n"];
        }
        NSData *headerData = [headerString dataUsingEncoding:NSUTF8StringEncoding];
        NSUInteger headersLength = headerData.length;
        return headersLength;
    }
    
    - (NSDictionary *)cm_fetchCookies
    {
        NSDictionary *cookiesHeaderDictionary;
        NSHTTPCookieStorage *cookieStorage = [NSHTTPCookieStorage sharedHTTPCookieStorage];
        NSArray<NSHTTPCookie *> *cookies = [cookieStorage cookiesForURL:self.URL];
        if (cookies.count) {
            cookiesHeaderDictionary = [NSHTTPCookie requestHeaderFieldsWithCookies:cookies];
        }
        return cookiesHeaderDictionary;
    }
    
  5. Body Part

    The HTTPBody of a request may be unavailable, for example for ajax requests issued in a WebView. In that case, calculate the body size by reading the stream from HTTPBodyStream.

    - (NSUInteger)cm_fetchRequestBody
    {
        NSDictionary *headerFields = self.allHTTPHeaderFields;
        NSData *bodyData = self.HTTPBody;

        // HTTPBody can be nil (e.g. ajax in a WebView); fall back to the body stream
        if (bodyData == nil && self.HTTPBodyStream != nil) {
            uint8_t d[1024] = {0};
            NSInputStream *stream = self.HTTPBodyStream;
            NSMutableData *data = [[NSMutableData alloc] init];
            [stream open];
            while ([stream hasBytesAvailable]) {
                NSInteger len = [stream read:d maxLength:1024];
                if (len <= 0 || stream.streamError != nil) {
                    break; // end of stream or read error
                }
                [data appendBytes:(void *)d length:len];
            }
            [stream close];
            bodyData = [data copy];
        }

        // When a Content-Encoding is declared, measure the compressed size
        if ([headerFields objectForKey:@"Content-Encoding"]) {
            return [[bodyData gzippedData] length];
        }
        return bodyData.length;
    }
    
  6. Report the data in the - (NSURLRequest *)connection:(NSURLConnection *)connection willSendRequest:(NSURLRequest *)request redirectResponse:(NSURLResponse *)response method. Data reporting itself is covered in "Create powerful, flexible and configurable data reporting components".

    -(NSURLRequest *)connection:(NSURLConnection *)connection willSendRequest:(NSURLRequest *)request redirectResponse:(NSURLResponse *)response
    {
        if (response != nil) {
            self.internalResponse = response;
            [self.client URLProtocol:self wasRedirectedToRequest:request redirectResponse:response];
        }
    
        PCTNetworkTrafficModel *model = [[PCTNetworkTrafficModel alloc] init];
        model.path = request.URL.path;
        model.host = request.URL.host;
        model.type = DMNetworkTrafficDataTypeRequest;
        model.lineLength = [connection.currentRequest cm_fetchStatusLineLength];
        model.headerLength = [connection.currentRequest cm_fetchHeaderLengthWithCookie];
        model.bodyLength = [connection.currentRequest cm_fetchRequestBody];
        model.emptyLineLength = [self.internalResponse cm_getEmptyLineLength];
        model.length = model.lineLength + model.headerLength + model.bodyLength + model.emptyLineLength;
    
        NSDictionary *networkTrafficDictionary = [model convertToDictionary];
        [[PrismClient sharedInstance] sendWithType:CMMonitorNetworkTrafficType meta:networkTrafficDictionary payload:nil];
        return request;
    }
    
   
   

 



## 6. Power Consumption

Power consumption on mobile devices has always been a sensitive issue. If users find that an App consumes a lot of power and makes the phone hot, they are likely to uninstall it right away. So energy consumption needs attention during the development phase.

Generally speaking, when we encounter high power consumption, we immediately wonder whether location services are in use, whether network requests are too frequent, or whether something is being done in a continuous loop.

In the development phase this is generally manageable: we can use the `Energy Log` tool in `Instruments` to locate problems. Online problems, however, require code that monitors power consumption, which can be one of the APM's capabilities.



### 1. How to get the power

In iOS, `IOKit` is a private framework used to obtain detailed information about hardware and devices; it is also the underlying framework for communication between hardware and kernel services. So we can obtain hardware information, including battery information, through `IOKit`. The steps are as follows:

- First, find [IOPowerSources.h](https://opensource.apple.com/source/IOKitUser/IOKitUser-647.6/ps.subproj/IOPowerSources.h.auto.html) and [IOPSKeys.h](https://opensource.apple.com/source/IOKitUser/IOKitUser-647.6/ps.subproj/IOPSKeys.h) in Apple's open source code. Then locate `IOKit.framework` through Xcode's `Package Contents`. The path is `/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/System/Library/Frameworks/IOKit.framework`
- Then import IOPowerSources.h, IOPSKeys.h, and IOKit.framework into the project
- Set UIDevice's batteryMonitoringEnabled to true
- The battery level obtained this way has a precision of 1%

```objective-c
- (double)fetchBatteryCostUsage
{
    // returns a blob of power source information in an opaque CFTypeRef
    CFTypeRef blob = IOPSCopyPowerSourcesInfo();
    // returns a CFArray of power source handles, each of type CFTypeRef
    CFArrayRef sources = IOPSCopyPowerSourcesList(blob);
    double percentage = -1.0;
    // returns the number of values currently in an array
    CFIndex numOfSources = sources ? CFArrayGetCount(sources) : 0;
    if (numOfSources == 0) {
        NSLog(@"Error in CFArrayGetCount");
    }

    // calculating the remaining energy; the first readable source is used
    for (CFIndex i = 0; i < numOfSources; i++) {
        // returns a CFDictionary with readable information about the specific power source
        CFDictionaryRef pSource = IOPSGetPowerSourceDescription(blob, CFArrayGetValueAtIndex(sources, i));
        if (!pSource) {
            NSLog(@"Error in IOPSGetPowerSourceDescription");
            break;
        }

        int curCapacity = 0;
        int maxCapacity = 0;
        const void *psValue = CFDictionaryGetValue(pSource, CFSTR(kIOPSCurrentCapacityKey));
        if (psValue) {
            CFNumberGetValue((CFNumberRef)psValue, kCFNumberSInt32Type, &curCapacity);
        }
        psValue = CFDictionaryGetValue(pSource, CFSTR(kIOPSMaxCapacityKey));
        if (psValue) {
            CFNumberGetValue((CFNumberRef)psValue, kCFNumberSInt32Type, &maxCapacity);
        }

        // avoid dividing by zero when the capacity could not be read
        if (maxCapacity > 0) {
            percentage = (double)curCapacity / (double)maxCapacity * 100.0;
        }
        NSLog(@"curCapacity : %d / maxCapacity: %d , percentage: %.1f ", curCapacity, maxCapacity, percentage);
        break;
    }

    // blob and sources come from Copy functions, so they must be released
    if (sources) { CFRelease(sources); }
    if (blob) { CFRelease(blob); }
    return percentage;
}
```

2. Locating the problem

Even after we have fixed many problems with the Energy Log in Instruments, power consumption problems that appear once the App is released still have to be handled through APM. The culprit may be a second-party library, a third-party library, or a colleague's code.

The idea is: after detecting abnormal power consumption, first find the problematic thread, then dump its stack to reconstruct the scene.

From the section above we know the structure of the thread information: thread_basic_info has a cpu_usage field that records the CPU usage percentage. So we can traverse the current threads, identify the one with high CPU usage, and then dump its stack to locate the code that consumes power. See section 3.2 for details.
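The selection step can be sketched as a pure function. This is an illustrative C sketch, not the article's code: `thread_sample`, `cpu_percent`, and `busiest_thread_index` are hypothetical names, and the samples are assumed to have been filled from thread_info() with the THREAD_BASIC_INFO flavor on a device. The only Mach detail relied on is that cpu_usage is scaled by TH_USAGE_SCALE (1000), which is redefined locally so the sketch compiles without the Mach headers.

```c
#include <assert.h>
#include <stddef.h>

/* cpu_usage in thread_basic_info is scaled: TH_USAGE_SCALE (1000) means 100%.
 * Redefined here so the sketch compiles without the Mach headers. */
#define USAGE_SCALE 1000

/* Hypothetical snapshot of one thread. */
typedef struct {
    unsigned int thread_id; /* stand-in for the Mach thread port */
    int cpu_usage;          /* scaled value as reported by thread_basic_info */
} thread_sample;

/* Converts the scaled value to a percentage. */
double cpu_percent(int cpu_usage)
{
    return cpu_usage * 100.0 / USAGE_SCALE;
}

/* Returns the index of the busiest thread whose usage exceeds
 * threshold_percent, or -1 if no thread crosses the threshold. */
int busiest_thread_index(const thread_sample *samples, size_t count,
                         double threshold_percent)
{
    assert(samples != NULL || count == 0);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (cpu_percent(samples[i].cpu_usage) <= threshold_percent) continue;
        if (best < 0 || samples[i].cpu_usage > samples[best].cpu_usage) {
            best = (int)i;
        }
    }
    return best;
}
```

The thread found this way is the one whose stack would be dumped; the threshold itself is a tuning choice that the article does not fix.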

3. What can we do in the development phase to address power consumption

CPU-intensive computing is the main cause of power consumption, so we need to be careful with CPU usage and avoid making the CPU do useless work. For complex operations on large amounts of data, the capabilities of the server and the GPU can be leveraged. If the design requires the computation to run on the CPU, you can use GCD: call dispatch_block_create_with_qos_class(<#dispatch_block_flags_t flags#>, <#dispatch_qos_class_t qos_class#>, <#int relative_priority#>, <#^(void)block#>) and specify QOS_CLASS_UTILITY as the QoS. For tasks submitted as blocks in QOS_CLASS_UTILITY mode, the system optimizes power consumption for heavy data computation.

Besides heavy CPU work, I/O operations are another main cause of power consumption. A common approach in the industry is to defer writing fragmented data to disk: aggregate it in memory first, and then store it on disk in one go. For the in-memory aggregation of fragmented data, iOS provides the NSCache object.

NSCache is thread-safe. When the preset cache limit is reached, NSCache evicts objects and triggers the - (void)cache:(NSCache *)cache willEvictObject:(id)obj; callback. Performing the I/O on the data inside this method achieves deferred, aggregated I/O. Less I/O means less power consumption.
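Independently of NSCache, the aggregate-then-flush idea can be sketched in plain C. This is a minimal illustration under assumptions: `write_aggregator`, `aggregator_append`, `aggregator_flush`, and the 64-byte capacity are all hypothetical, and the flush callback stands in for the disk write that a real component would perform (here it only counts calls and bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AGG_CAPACITY 64 /* hypothetical threshold; real code would use kilobytes */

/* Called when buffered data must be persisted; real code would write to disk. */
typedef void (*flush_fn)(const char *data, size_t len, void *ctx);

typedef struct {
    char buf[AGG_CAPACITY];
    size_t used;
    flush_fn flush;
    void *ctx;
} write_aggregator;

void aggregator_init(write_aggregator *a, flush_fn flush, void *ctx)
{
    assert(a != NULL && flush != NULL);
    a->used = 0;
    a->flush = flush;
    a->ctx = ctx;
}

/* Hands the buffered bytes to the flush callback in a single call. */
void aggregator_flush(write_aggregator *a)
{
    if (a->used > 0) {
        a->flush(a->buf, a->used, a->ctx);
        a->used = 0;
    }
}

/* Buffers a small record; flushes first if it would not fit. Records larger
 * than the whole buffer bypass it and go straight to the callback. */
void aggregator_append(write_aggregator *a, const char *data, size_t len)
{
    if (len > AGG_CAPACITY - a->used) {
        aggregator_flush(a);
    }
    if (len > AGG_CAPACITY) {
        a->flush(data, len, a->ctx);
        return;
    }
    memcpy(a->buf + a->used, data, len);
    a->used += len;
}

/* Example callback that only counts calls and bytes instead of doing I/O. */
typedef struct {
    size_t calls;
    size_t bytes;
} flush_stats;

void count_flush(const char *data, size_t len, void *ctx)
{
    (void)data;
    flush_stats *stats = ctx;
    stats->calls++;
    stats->bytes += len;
}
```

Ten 10-byte records appended to this buffer result in two flushes instead of ten writes, which is the saving the paragraph above describes.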

For an example of NSCache in use, see SDWebImage, an image loading framework. For its image read cache it does not read files from disk (I/O) directly, but goes through the system's NSCache.

- (nullable UIImage *)imageFromMemoryCacheForKey:(nullable NSString *)key {
    return [self.memoryCache objectForKey:key];
}

- (nullable UIImage *)imageFromDiskCacheForKey:(nullable NSString *)key {
    UIImage *diskImage = [self diskImageForKey:key];
    if (diskImage && self.config.shouldCacheImagesInMemory) {
        NSUInteger cost = diskImage.sd_memoryCost;
        [self.memoryCache setObject:diskImage forKey:key cost:cost];
    }

    return diskImage;
}

You can see that the main logic is to read the image from disk first. If the configuration allows memory caching, the image is saved to NSCache and read from NSCache the next time it is used. NSCache's totalCostLimit and countLimit properties, together with the - (void)setObject:(ObjectType)obj forKey:(KeyType)key cost:(NSUInteger)g; method, set the caching conditions. So we can apply the same strategy when writing files to disk and memory to optimize power consumption.

6. Crash Monitoring

1. Review of exception-related knowledge

1.1 Mach Layer Handling Exceptions

Mach implements its own exception handling on top of message passing. The design of Mach exception handling takes the following into account:

  • A single exception handling facility with consistent semantics: Mach provides one exception handling mechanism for all types of exceptions, including user-defined, platform-independent, and platform-specific exceptions. Exceptions are grouped by type, and specific platforms can define specific subtypes.
  • Clear and concise: the exception handling interface relies on Mach's well-defined message and port architecture and is therefore elegant (without compromising efficiency). This allows debuggers and external handlers to be added, and in theory even network-based exception handling.

In Mach, exceptions are handled through the kernel's basic facility, the messaging mechanism. An exception is not much more complex than a message: it is thrown by the faulting thread or task (via msg_send()) and then caught by a handler (via msg_recv()). The handler can handle the exception, clear it (mark it as completed and continue), or decide to terminate the thread.

Unlike other exception handling models, where the exception handler runs in the context of the faulting thread, Mach's exception handler runs in a different context: the faulting thread sends a message to a predefined exception port and waits for a reply. Each task can register an exception port, which takes effect for all threads in that task. In addition, each thread can register its own exception port through thread_set_exception_ports(<#thread_act_t thread#>, <#exception_mask_t exception_mask#>, <#mach_port_t new_port#>, <#exception_behavior_t behavior#>, <#thread_state_flavor_t new_flavor#>). By default, the exception ports of tasks and threads are NULL, meaning that exceptions are not handled. Once created, an exception port can be handed over to other tasks or other hosts, just like any other port in the system. (With ports, applications on other hosts could handle exceptions over the network, for example via UDP.)

When an exception occurs, it is first delivered to the thread's exception port, then to the task's exception port, and finally to the host's exception port (the default port registered by the host). If no port returns KERN_SUCCESS, the entire task is terminated. In other words, Mach does not provide the exception handling logic itself, only a framework for passing exception notifications.
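The delivery order can be modeled in a few lines of portable C. This is only a simulation of the lookup order described above, not the Mach API: `exception_ports`, `deliver_exception`, and the sample handlers are hypothetical, and a handler returning nonzero plays the role of a port replying with KERN_SUCCESS.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LEVEL_THREAD, LEVEL_TASK, LEVEL_HOST, LEVEL_NONE } port_level;

/* A handler returns nonzero when it handled the exception
 * (the analogue of replying with KERN_SUCCESS). */
typedef int (*exception_handler)(int exception_type);

typedef struct {
    exception_handler handlers[3]; /* indexed by port_level; NULL = no port */
} exception_ports;

/* Tries thread, then task, then host. Returns the level that handled the
 * exception, or LEVEL_NONE, in which case the task would be terminated. */
port_level deliver_exception(const exception_ports *ports, int exception_type)
{
    assert(ports != NULL);
    for (int level = LEVEL_THREAD; level <= LEVEL_HOST; level++) {
        exception_handler handler = ports->handlers[level];
        if (handler != NULL && handler(exception_type)) {
            return (port_level)level;
        }
    }
    return LEVEL_NONE;
}

/* Two sample handlers for trying the function out. */
int always_decline(int exception_type) { (void)exception_type; return 0; }
int always_handle(int exception_type)  { (void)exception_type; return 1; }
```

With no thread-level port and a task-level handler that accepts, `deliver_exception` stops at LEVEL_TASK, which mirrors why a task-level port registered by a crash reporter sees exceptions from every thread.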

Exceptions are first caused by processor traps.To deal with traps, every modern kernel has a trap handler installed.These underlying functions are inserted by the assembly portion of the kernel.

1.2 BSD Layer Handling Exceptions

The BSD layer is the primary XNU interface used by user space, and it exposes an interface that complies with the POSIX standard. Developers can use all the functionality of a UNIX system without needing to know the details of the Mach layer implementation.

Mach already provides the underlying trap handling through its exception mechanism, and BSD builds a signal handling mechanism on top of it. Signals generated by hardware are captured by the Mach layer and then converted to the corresponding UNIX signals. To keep the mechanism unified, signals generated by the operating system and by users are first converted to Mach exceptions and then to signals.

All Mach exceptions are converted to the corresponding UNIX signals by ux_exception at the host layer, and delivered to the faulting thread via threadsignal.

2. Crash collection method

Apple's Crash Reporter, which comes with iOS, records Crash logs that can be viewed in Settings. Let's first look at a Crash log

Incident Identifier: 7FA6736D-09E8-47A1-95EC-76C4522BDE1A
CrashReporter Key:   4e2d36419259f14413c3229e8b7235bcc74847f3
Hardware Model:      iPhone7,1
Process:         CMMonitorExample [3608]
Path:            /var/containers/Bundle/Application/9518A4F4-59B7-44E9-BDDA-9FBEE8CA18E5/CMMonitorExample.app/CMMonitorExample
Identifier:      com.Wacai.CMMonitorExample
Version:         1.0 (1)
Code Type:       ARM-64
Parent Process:  ? [1]

Date/Time:       2017-01-03 11:43:03.000 +0800
OS Version:      iOS 10.2 (14C92)
Report Version:  104

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread:  0

Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[__NSSingleObjectArrayI objectForKey:]: unrecognized selector sent to instance 0x174015060'

Thread 0 Crashed:
0   CoreFoundation                  0x0000000188f291b8 0x188df9000 + 1245624 (<redacted> + 124)
1   libobjc.A.dylib                 0x000000018796055c 0x187958000 + 34140 (objc_exception_throw + 56)
2   CoreFoundation                  0x0000000188f30268 0x188df9000 + 1274472 (<redacted> + 140)
3   CoreFoundation                  0x0000000188f2d270 0x188df9000 + 1262192 (<redacted> + 916)
4   CoreFoundation                  0x0000000188e2680c 0x188df9000 + 186380 (_CF_forwarding_prep_0 + 92)
5   CMMonitorExample                0x000000010004c618 0x100044000 + 34328 (-[MakeCrashHandler throwUncaughtNSException] + 80)

You will find that the Exception Type entry in the Crash log consists of two parts: Mach exception + Unix signal.

So Exception Type: EXC_CRASH (SIGABRT) indicates that an EXC_CRASH exception occurred in the Mach layer, was converted to a SIGABRT signal at the host layer, and was delivered to the faulting thread.

Question: Both capturing Mach-layer exceptions and registering Unix signal handlers can catch a Crash. How do you choose between the two approaches?

Answer: Prefer intercepting Mach-layer exceptions. From the description in 1.2 above, we know that Mach-layer exception handling happens earlier. If the Mach-layer exception handler exits the process, the Unix signal will never occur.
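For comparison, the Unix signal route looks roughly like the following POSIX sketch. It is a simplified illustration, not KSCrash's implementation: `install_signal_handler`, `last_caught_signal`, and the globals are hypothetical names, and only one signal is tracked. It records the signal and then forwards to whatever handler was installed before, so an earlier registrant is not silently replaced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static struct sigaction g_previous_action;        /* handler installed before ours */
static volatile sig_atomic_t g_caught_signal = 0; /* last signal we observed */

/* Our handler: record the signal, then forward to the previous handler. */
static void crash_signal_handler(int signo, siginfo_t *info, void *context)
{
    g_caught_signal = signo;
    if (g_previous_action.sa_flags & SA_SIGINFO) {
        if (g_previous_action.sa_sigaction != NULL) {
            g_previous_action.sa_sigaction(signo, info, context);
        }
    } else if (g_previous_action.sa_handler != SIG_DFL &&
               g_previous_action.sa_handler != SIG_IGN) {
        g_previous_action.sa_handler(signo);
    }
}

/* Installs the handler for one signal and saves the previous action.
 * Returns 0 on success, -1 on failure (as sigaction does). */
int install_signal_handler(int signo)
{
    assert(signo > 0);
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_sigaction = crash_signal_handler;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    return sigaction(signo, &action, &g_previous_action);
}

/* Accessor so callers do not touch the global directly. */
int last_caught_signal(void)
{
    return (int)g_caught_signal;
}
```

A real crash reporter would register fatal signals such as SIGABRT or SIGSEGV, write its report inside the handler using only async-signal-safe calls, and then restore the default action and re-raise the signal so the process still terminates.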

There are many open source projects in the industry for collecting crash logs, such as KSCrash and plcrashreporter, as well as one-stop services such as Bugly and Umeng. We typically build on an open source project to develop a bug collection tool that meets our internal needs. After comparing them, we chose KSCrash. Why KSCrash was chosen is not the focus of this article.

KSCrash is fully functional and can capture the following types of Crash

  • Mach kernel exceptions
  • Fatal signals
  • C++ exceptions
  • Objective-C exceptions
  • Main thread deadlock (experimental)
  • Custom crashes (e.g. from scripting languages)

Therefore, analyzing the Crash collection scheme on iOS side is to analyze the implementation principle of KSCrash's Crash monitoring.

2.1. Mach layer exception handling

The general idea is: create an exception handling port, request rights for it, set it as the exception port, then create a new kernel thread that loops waiting for exceptions. However, to prevent our own Mach-layer exception handling from preempting the logic set up by other SDKs or business-line developers, we need to save the other exception handling ports at the start, and hand the exception over to them once our own logic has run. After the Crash information is collected, the data is assembled and written to a JSON file.

The flowchart is as follows:

For Mach exception capture, you can register an exception port that listens on all threads of the current task.

Here's a look at the key code:

Code that registers the Mach-layer exception listener

static bool installExceptionHandler()
{
    KSLOG_DEBUG("Installing mach exception handler.");

    bool attributes_created = false;
    pthread_attr_t attr;

    kern_return_t kr;
    int error;
    // Get the current process
    const task_t thisTask = mach_task_self();
    exception_mask_t mask = EXC_MASK_BAD_ACCESS |
    EXC_MASK_BAD_INSTRUCTION |
    EXC_MASK_ARITHMETIC |
    EXC_MASK_SOFTWARE |
    EXC_MASK_BREAKPOINT;

    KSLOG_DEBUG("Backing up original exception ports.");
    // Get the registered exception port on this Task
    kr = task_get_exception_ports(thisTask,
                                  mask,
                                  g_previousExceptionPorts.masks,
                                  &g_previousExceptionPorts.count,
                                  g_previousExceptionPorts.ports,
                                  g_previousExceptionPorts.behaviors,
                                  g_previousExceptionPorts.flavors);
    // If the lookup failed, jump to the failed path
    if(kr != KERN_SUCCESS)
    {
        KSLOG_ERROR("task_get_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }
    // KSCrash's exception port has not been created yet, so create it
    if(g_exceptionPort == MACH_PORT_NULL)
    {
        KSLOG_DEBUG("Allocating new port with receive rights.");
        // Request exception handling port
        kr = mach_port_allocate(thisTask,
                                MACH_PORT_RIGHT_RECEIVE,
                                &g_exceptionPort);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("mach_port_allocate: %s", mach_error_string(kr));
            goto failed;
        }

        KSLOG_DEBUG("Adding send rights to port.");
        // Add a send right to the exception handling port: MACH_MSG_TYPE_MAKE_SEND
        kr = mach_port_insert_right(thisTask,
                                    g_exceptionPort,
                                    g_exceptionPort,
                                    MACH_MSG_TYPE_MAKE_SEND);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("mach_port_insert_right: %s", mach_error_string(kr));
            goto failed;
        }
    }

    KSLOG_DEBUG("Installing port as exception handler.");
    // Set exception handling port for this Task
    kr = task_set_exception_ports(thisTask,
                                  mask,
                                  g_exceptionPort,
                                  EXCEPTION_DEFAULT,
                                  THREAD_STATE_NONE);
    if(kr != KERN_SUCCESS)
    {
        KSLOG_ERROR("task_set_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }

    KSLOG_DEBUG("Creating secondary exception thread (suspended).");
    pthread_attr_init(&attr);
    attributes_created = true;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    // Create the monitoring thread
    error = pthread_create(&g_secondaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadSecondary);
    if(error != 0)
    {
        KSLOG_ERROR("pthread_create_suspended_np: %s", strerror(error));
        goto failed;
    }
    // Get the Mach kernel thread that backs the pthread
    g_secondaryMachThread = pthread_mach_thread_np(g_secondaryPThread);
    ksmc_addReservedThread(g_secondaryMachThread);

    KSLOG_DEBUG("Creating primary exception thread.");
    error = pthread_create(&g_primaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadPrimary);
    if(error != 0)
    {
        KSLOG_ERROR("pthread_create: %s", strerror(error));
        goto failed;
    }
    pthread_attr_destroy(&attr);
    g_primaryMachThread = pthread_mach_thread_np(g_primaryPThread);
    ksmc_addReservedThread(g_primaryMachThread);
    
    KSLOG_DEBUG("Mach exception handler installed.");
    return true;


failed:
    KSLOG_DEBUG("Failed to install mach exception handler.");
    if(attributes_created)
    {
        pthread_attr_destroy(&attr);
    }
    // Restore the previous exception registration port to restore control
    uninstallExceptionHandler();
    return false;
}

Logic for handling exceptions and assembling the crash information

/** Our exception handler thread routine.
 * Wait for an exception message, uninstall our exception port, record the
 * exception information, and write a report.
 */
static void* handleExceptions(void* const userData)
{
    MachExceptionMessage exceptionMessage = {{0}};
    MachReplyMessage replyMessage = {{0}};
    char* eventID = g_primaryEventID;

    const char* threadName = (const char*) userData;
    pthread_setname_np(threadName);
    if(threadName == kThreadSecondary)
    {
        KSLOG_DEBUG("This is the secondary thread. Suspending.");
        thread_suspend((thread_t)ksthread_self());
        eventID = g_secondaryEventID;
    }
    // Loop, waiting for a message on the registered exception port
    for(;;)
    {
        KSLOG_DEBUG("Waiting for mach exception");

        // Wait for a message.
        kern_return_t kr = mach_msg(&exceptionMessage.header,
                                    MACH_RCV_MSG,
                                    0,
                                    sizeof(exceptionMessage),
                                    g_exceptionPort,
                                    MACH_MSG_TIMEOUT_NONE,
                                    MACH_PORT_NULL);
        // Receiving a message means a Mach-level exception has occurred: break out of the loop and assemble the crash data
        if(kr == KERN_SUCCESS)
        {
            break;
        }

        // Loop and try again on failure.
        KSLOG_ERROR("mach_msg: %s", mach_error_string(kr));
    }

    KSLOG_DEBUG("Trapped mach exception code 0x%x, subcode 0x%x",
                exceptionMessage.code[0], exceptionMessage.code[1]);
    if(g_isEnabled)
    {
        // Suspend all threads
        ksmc_suspendEnvironment();
        g_isHandlingCrash = true;
        // Notify that an exception has occurred
        kscm_notifyFatalExceptionCaptured(true);

        KSLOG_DEBUG("Exception handler is installed. Continuing exception handling.");


        // Switch to the secondary thread if necessary, or uninstall the handler
        // to avoid a death loop.
        if(ksthread_self() == g_primaryMachThread)
        {
            KSLOG_DEBUG("This is the primary exception thread. Activating secondary thread.");
// TODO: This was put here to avoid a freeze. Does secondary thread ever fire?
            restoreExceptionPorts();
            if(thread_resume(g_secondaryMachThread) != KERN_SUCCESS)
            {
                KSLOG_DEBUG("Could not activate secondary thread. Restoring original exception ports.");
            }
        }
        else
        {
            KSLOG_DEBUG("This is the secondary exception thread. Restoring original exception ports.");
//            restoreExceptionPorts();
        }

        // Fill out crash information
        // Assemble the on-site context information needed to describe the exception
        KSLOG_DEBUG("Fetching machine state.");
        KSMC_NEW_CONTEXT(machineContext);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        crashContext->offendingMachineContext = machineContext;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
        if(ksmc_getContextForThread(exceptionMessage.thread.name, machineContext, true))
        {
            kssc_initWithMachineContext(&g_stackCursor, 100, machineContext);
            KSLOG_TRACE("Fault address 0x%x, instruction address 0x%x", kscpu_faultAddress(machineContext), kscpu_instructionAddress(machineContext));
            if(exceptionMessage.exception == EXC_BAD_ACCESS)
            {
                crashContext->faultAddress = kscpu_faultAddress(machineContext);
            }
            else
            {
                crashContext->faultAddress = kscpu_instructionAddress(machineContext);
            }
        }

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeMachException;
        crashContext->eventID = eventID;
        crashContext->registersAreValid = true;
        crashContext->mach.type = exceptionMessage.exception;
        crashContext->mach.code = exceptionMessage.code[0];
        crashContext->mach.subcode = exceptionMessage.code[1];
        if(crashContext->mach.code == KERN_PROTECTION_FAILURE && crashContext->isStackOverflow)
        {
            // A stack overflow should return KERN_INVALID_ADDRESS, but
            // when a stack blasts through the guard pages at the top of the stack,
            // it generates KERN_PROTECTION_FAILURE. Correct for this.
            crashContext->mach.code = KERN_INVALID_ADDRESS;
        }
        crashContext->signal.signum = signalForMachException(crashContext->mach.type, crashContext->mach.code);
        crashContext->stackCursor = &g_stackCursor;

        kscm_handleException(crashContext);

        KSLOG_DEBUG("Crash handling complete. Restoring original handlers.");
        g_isHandlingCrash = false;
        ksmc_resumeEnvironment();
    }

    KSLOG_DEBUG("Replying to mach exception message.");
    // Send a reply saying "I didn't handle this exception".
    replyMessage.header = exceptionMessage.header;
    replyMessage.NDR = exceptionMessage.NDR;
    replyMessage.returnCode = KERN_FAILURE;

    mach_msg(&replyMessage.header,
             MACH_SEND_MSG,
             sizeof(replyMessage),
             0,
             MACH_PORT_NULL,
             MACH_MSG_TIMEOUT_NONE,
             MACH_PORT_NULL);

    return NULL;
}

Restoring the original exception ports and handing control back:

/** Restore the original mach exception ports.
 */
static void restoreExceptionPorts(void)
{
    KSLOG_DEBUG("Restoring original exception ports.");
    if(g_previousExceptionPorts.count == 0)
    {
        KSLOG_DEBUG("Original exception ports were already restored.");
        return;
    }

    const task_t thisTask = mach_task_self();
    kern_return_t kr;

    // Reinstall old exception ports.
    // Iterate over the exception ports that were saved before KSCrash installed itself and register each one again
    for(mach_msg_type_number_t i = 0; i < g_previousExceptionPorts.count; i++)
    {
        KSLOG_TRACE("Restoring port index %d", i);
        kr = task_set_exception_ports(thisTask,
                                      g_previousExceptionPorts.masks[i],
                                      g_previousExceptionPorts.ports[i],
                                      g_previousExceptionPorts.behaviors[i],
                                      g_previousExceptionPorts.flavors[i]);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("task_set_exception_ports: %s",
                        mach_error_string(kr));
        }
    }
    KSLOG_DEBUG("Exception ports restored.");
    g_previousExceptionPorts.count = 0;
}

2.2. Signal exception handling

The operating system converts Mach exceptions into the corresponding Unix signals, so developers can also handle them by registering a signal handler.
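To make the Mach-to-signal conversion concrete, here is a minimal sketch of the conventional mapping for the most common exception types. The `DEMO_` constants and the function name are hypothetical stand-ins (their values are assumed to match Darwin's `EXC_*` and `KERN_*` definitions so that the sketch compiles without the Mach headers), and only a few cases are covered; it is not KSCrash's `signalForMachException` itself.

```c
#include <signal.h>

/* Hypothetical local copies of a few Mach constants (assumed values). */
enum
{
    DEMO_EXC_BAD_ACCESS      = 1,
    DEMO_EXC_BAD_INSTRUCTION = 2,
    DEMO_EXC_ARITHMETIC      = 3,
    DEMO_EXC_BREAKPOINT      = 6
};

enum
{
    DEMO_KERN_INVALID_ADDRESS    = 1,
    DEMO_KERN_PROTECTION_FAILURE = 2
};

/* Return the Unix signal that usually corresponds to a Mach exception,
 * or 0 when this simplified table has no entry for it. */
int demo_signalForMachException(int exceptionType, int code)
{
    switch(exceptionType)
    {
        case DEMO_EXC_BAD_ACCESS:
            /* An unmapped address becomes SIGSEGV, other bad accesses SIGBUS. */
            return code == DEMO_KERN_INVALID_ADDRESS ? SIGSEGV : SIGBUS;
        case DEMO_EXC_BAD_INSTRUCTION:
            return SIGILL;
        case DEMO_EXC_ARITHMETIC:
            return SIGFPE;
        case DEMO_EXC_BREAKPOINT:
            return SIGTRAP;
        default:
            return 0;
    }
}
```

This also shows why the Mach handler above rewrites KERN_PROTECTION_FAILURE to KERN_INVALID_ADDRESS for a stack overflow: with the corrected code the crash is reported as SIGSEGV rather than SIGBUS.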

KSCrash's logic here is easiest to follow from the key code:

Installing the signal handlers:

static bool installSignalHandler()
{
    KSLOG_DEBUG("Installing signal handler.");

#if KSCRASH_HAS_SIGNAL_STACK
    // Allocate a block of memory on the heap for the alternate signal stack
    if(g_signalStack.ss_size == 0)
    {
        KSLOG_DEBUG("Allocating signal stack area.");
        g_signalStack.ss_size = SIGSTKSZ;
        g_signalStack.ss_sp = malloc(g_signalStack.ss_size);
    }
    // Run the signal handler on this alternate stack instead of sharing the stack area with the rest of the process.
    // sigaltstack() takes two stack_t pointers. The first describes the location and attributes of the new alternate signal stack; the second (old_sigstack) returns information about the previously established alternate stack, if any.
    KSLOG_DEBUG("Setting signal stack area.");
    // The second parameter may be NULL; if it is not, information about the old alternate signal stack is stored in it. sigaltstack() returns 0 on success and -1 on failure.
    if(sigaltstack(&g_signalStack, NULL) != 0)
    {
        KSLOG_ERROR("signalstack: %s", strerror(errno));
        goto failed;
    }
#endif

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();

    if(g_previousSignalHandlers == NULL)
    {
        KSLOG_DEBUG("Allocating memory to store previous signal handlers.");
        g_previousSignalHandlers = malloc(sizeof(*g_previousSignalHandlers)
                                          * (unsigned)fatalSignalsCount);
    }

    // Build the struct sigaction that is passed as the second argument of sigaction()
    struct sigaction action = {{0}};
    // Setting SA_ONSTACK in sa_flags tells the kernel to build the signal handler's stack frame on the alternate signal stack.
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
#if KSCRASH_HOST_APPLE && defined(__LP64__)
    action.sa_flags |= SA_64REGSET;
#endif
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &handleSignal;

    // Iterate over the array of fatal signals that need to be handled
    for(int i = 0; i < fatalSignalsCount; i++)
    {
        // Install action as the handler for each signal, saving the previously registered handler in g_previousSignalHandlers
        KSLOG_DEBUG("Assigning handler for signal %d", fatalSignals[i]);
        if(sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]) != 0)
        {
            char sigNameBuff[30];
            const char* sigName = kssignal_signalName(fatalSignals[i]);
            if(sigName == NULL)
            {
                snprintf(sigNameBuff, sizeof(sigNameBuff), "%d", fatalSignals[i]);
                sigName = sigNameBuff;
            }
            KSLOG_ERROR("sigaction (%s): %s", sigName, strerror(errno));
            // Try to reverse the damage
            for(i--;i >= 0; i--)
            {
                sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
            }
            goto failed;
        }
    }
    KSLOG_DEBUG("Signal handlers installed.");
    return true;

failed:
    KSLOG_DEBUG("Failed to install signal handlers.");
    return false;
}

Recording thread and other context information while a signal is being handled:

static void handleSignal(int sigNum, siginfo_t* signalInfo, void* userContext)
{
    KSLOG_DEBUG("Trapped signal %d", sigNum);
    if(g_isEnabled)
    {
        ksmc_suspendEnvironment();
        kscm_notifyFatalExceptionCaptured(false);
        
        KSLOG_DEBUG("Filling out context.");
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForSignal(userContext, machineContext);
        kssc_initWithMachineContext(&g_stackCursor, 100, machineContext);
        // Record the context information for this signal
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));
        crashContext->crashType = KSCrashMonitorTypeSignal;
        crashContext->eventID = g_eventID;
        crashContext->offendingMachineContext = machineContext;
        crashContext->registersAreValid = true;
        crashContext->faultAddress = (uintptr_t)signalInfo->si_addr;
        crashContext->signal.userContext = userContext;
        crashContext->signal.signum = signalInfo->si_signo;
        crashContext->signal.sigcode = signalInfo->si_code;
        crashContext->stackCursor = &g_stackCursor;

        kscm_handleException(crashContext);
        ksmc_resumeEnvironment();
    }

    KSLOG_DEBUG("Re-raising signal for regular handlers to catch.");
    // This is technically not allowed, but it works in OSX and iOS.
    raise(sigNum);
}

Restoring the previous signal handlers once KSCrash has finished its own handling:

static void uninstallSignalHandler(void)
{
    KSLOG_DEBUG("Uninstalling signal handlers.");

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();
    // Iterate over the fatal signals and restore the handler that was saved for each one
    for(int i = 0; i < fatalSignalsCount; i++)
    {
        KSLOG_DEBUG("Restoring original handler for signal %d", fatalSignals[i]);
        sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
    }
    
    KSLOG_DEBUG("Signal handlers uninstalled.");
}

Explanation:

  1. First, a block of memory is allocated on the heap and registered as the alternate signal stack, so that the signal handler runs on this memory instead of sharing a stack with the rest of the process.

    Why? A process may have many threads, each with its own work, and a failure in any one thread brings down the entire process. For the signal handler to run reliably, it needs its own separate stack space. A typical case is runaway recursion exhausting the thread's default stack: because the signal handler runs on memory that was allocated on the heap in advance rather than on the exhausted default stack, it can still do its work.

  2. int sigaltstack(const stack_t * __restrict, stack_t * __restrict): both parameters are pointers to a stack_t structure, which stores information about an alternate signal stack (its base address, its length, and its state). The first parameter describes the location and attributes of the new alternate signal stack. The second parameter returns information about the previously established alternate stack, if any.

    _STRUCT_SIGALTSTACK
    {
    	void            *ss_sp;         /* signal stack base */
    	__darwin_size_t ss_size;        /* signal stack length */
    	int             ss_flags;       /* SA_DISABLE and/or SA_ONSTACK */
    };
    typedef _STRUCT_SIGALTSTACK     stack_t; /* [???] signal stack */
    

    For a newly created alternate signal stack, ss_flags must be set to 0. The SIGSTKSZ constant is defined to be large enough for most alternate signal stacks.

    /*
     * Structure used in sigaltstack call.
     */
    
    #define SS_ONSTACK      0x0001  /* take signal on signal stack */
    #define SS_DISABLE      0x0004  /* disable taking signals on alternate stack */
    #define MINSIGSTKSZ     32768   /* (32K)minimum allowable stack */
    #define SIGSTKSZ        131072  /* (128K)recommended stack size */
    

    The sigaltstack system call tells the kernel that an alternate signal stack has been established.

    An ss_flags value of SS_ONSTACK indicates that the process is currently executing on the alternate signal stack; attempting to establish a new alternate stack at that point fails with EPERM. SS_DISABLE indicates that no alternate stack is currently established; passing it in the first parameter disables the alternate stack.

  3. int sigaction(int, const struct sigaction * __restrict, struct sigaction * __restrict);

    The first parameter is the signal number to handle. It cannot be SIGKILL or SIGSTOP: these two signals cannot be caught, blocked, or ignored, because they give the superuser a reliable way to terminate or stop a process.

    The second and third parameters are pointers to a sigaction structure. If the second parameter is not NULL, it specifies the new action (handler) for the signal; if the third parameter is not NULL, the previous action is saved through it. If the second parameter is NULL and the third is not, the call only queries the current action.

    /*
     * Signal vector "template" used in sigaction call.
     */
    struct  sigaction {
    	union __sigaction_u __sigaction_u;  /* signal handler */
    	sigset_t sa_mask;               /* signal mask to apply */
    	int     sa_flags;               /* see signal options below */
    };
    

    The sa_flags field of struct sigaction must have the SA_ONSTACK flag set, which tells the kernel to build the signal handler's stack frame on the alternate signal stack.

2.3. C++ Exception Handling

C++ exception handling relies on the standard library's std::set_terminate(CPPExceptionTerminate) function.

Some functionality in an iOS project may be implemented in C, C++, and so on. When a C++ exception is thrown and it can be converted to an NSException, it goes through the Objective-C exception capture mechanism. If it cannot, it continues through the C++ exception flow and ends up in default_terminate_handler. This default terminate function calls abort_message internally, which finally calls abort() and produces a SIGABRT signal.

After a C++ exception is thrown, the system wraps it in a try...catch... layer to determine whether it can be converted to an NSException, and then re-throws the C++ exception. By that point the call stack at the original throw site is gone, so an upper layer that only captures the SIGABRT signal cannot reconstruct the scene in which the exception occurred: the exception stack is missing.

Why? The try...catch... statement internally calls __cxa_rethrow() to re-throw the exception, and __cxa_rethrow() in turn calls unwind. Unwinding can be thought of as the reverse of the function calls: it cleans up the local variables created by each function on the call chain, up to the function containing the outermost catch statement, and then transfers control to that catch. This is why the stack of a C++ exception disappears.

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            initialize();

            ksid_generate(g_eventID);
            g_originalTerminateHandler = std::set_terminate(CPPExceptionTerminate);
        }
        else
        {
            std::set_terminate(g_originalTerminateHandler);
        }
        g_captureNextStackTrace = isEnabled;
    }
}

static void initialize()
{
    static bool isInitialized = false;
    if(!isInitialized)
    {
        isInitialized = true;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
    }
}

void kssc_initCursor(KSStackCursor *cursor,
                     void (*resetCursor)(KSStackCursor*),
                     bool (*advanceCursor)(KSStackCursor*))
{
    cursor->symbolicate = kssymbolicator_symbolicate;
    cursor->advanceCursor = advanceCursor != NULL ? advanceCursor : g_advanceCursor;
    cursor->resetCursor = resetCursor != NULL ? resetCursor : kssc_resetCursor;
    cursor->resetCursor(cursor);
}

static void CPPExceptionTerminate(void)
{
    ksmc_suspendEnvironment();
    KSLOG_DEBUG("Trapped c++ exception");
    const char* name = NULL;
    std::type_info* tinfo = __cxxabiv1::__cxa_current_exception_type();
    if(tinfo != NULL)
    {
        name = tinfo->name();
    }
    
    if(name == NULL || strcmp(name, "NSException") != 0)
    {
        kscm_notifyFatalExceptionCaptured(false);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));

        char descriptionBuff[DESCRIPTION_BUFFER_LENGTH];
        const char* description = descriptionBuff;
        descriptionBuff[0] = 0;

        KSLOG_DEBUG("Discovering what kind of exception was thrown.");
        g_captureNextStackTrace = false;
        try
        {
            throw;
        }
        catch(std::exception& exc)
        {
            strncpy(descriptionBuff, exc.what(), sizeof(descriptionBuff));
        }
#define CATCH_VALUE(TYPE, PRINTFTYPE) \
catch(TYPE value)\
{ \
    snprintf(descriptionBuff, sizeof(descriptionBuff), "%" #PRINTFTYPE, value); \
}
        CATCH_VALUE(char,                 d)
        CATCH_VALUE(short,                d)
        CATCH_VALUE(int,                  d)
        CATCH_VALUE(long,                ld)
        CATCH_VALUE(long long,          lld)
        CATCH_VALUE(unsigned char,        u)
        CATCH_VALUE(unsigned short,       u)
        CATCH_VALUE(unsigned int,         u)
        CATCH_VALUE(unsigned long,       lu)
        CATCH_VALUE(unsigned long long, llu)
        CATCH_VALUE(float,                f)
        CATCH_VALUE(double,               f)
        CATCH_VALUE(long double,         Lf)
        CATCH_VALUE(char*,                s)
        catch(...)
        {
            description = NULL;
        }
        g_captureNextStackTrace = g_isEnabled;

        // TODO: Should this be done here? Maybe better in the exception handler?
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForThread(ksthread_self(), machineContext, true);

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeCPPException;
        crashContext->eventID = g_eventID;
        crashContext->registersAreValid = false;
        crashContext->stackCursor = &g_stackCursor;
        crashContext->CPPException.name = name;
        crashContext->exceptionName = name;
        crashContext->crashReason = description;
        crashContext->offendingMachineContext = machineContext;

        kscm_handleException(crashContext);
    }
    else
    {
        KSLOG_DEBUG("Detected NSException. Letting the current NSException handler deal with it.");
    }
    ksmc_resumeEnvironment();

    KSLOG_DEBUG("Calling original terminate handler.");
    g_originalTerminateHandler();
}

2.4. Objective-C exception handling

At the Objective-C level, uncaught exceptions can be captured by registering an NSUncaughtExceptionHandler. The crash information is collected from the NSException parameter and handed to the data reporting component.

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            KSLOG_DEBUG(@"Backing up original handler.");
            // Save the previously registered Objective-C uncaught exception handler
            g_previousUncaughtExceptionHandler = NSGetUncaughtExceptionHandler();
            
            KSLOG_DEBUG(@"Setting new handler.");
            // Set up a new OC exception handling function
            NSSetUncaughtExceptionHandler(&handleException);
            KSCrash.sharedInstance.uncaughtExceptionHandler = &handleException;
        }
        else
        {
            KSLOG_DEBUG(@"Restoring original handler.");
            NSSetUncaughtExceptionHandler(g_previousUncaughtExceptionHandler);
        }
    }
}

2.5. Main Thread Deadlock

Deadlock detection for the main thread is somewhat similar to ANR detection:

  • Create a thread whose run method uses a do...while... loop, wrapping each iteration in an @autoreleasepool to keep memory usage from growing

  • There is an awaitingResponse property and a watchdogPulse method. The main logic of watchdogPulse is to set awaitingResponse to YES and then dispatch to the main thread, where awaitingResponse is set back to NO.

    - (void) watchdogPulse
    {
        __block id blockSelf = self;
        self.awaitingResponse = YES;
        dispatch_async(dispatch_get_main_queue(), ^
                       {
                           [blockSelf watchdogAnswer];
                       });
    }
    
  • In its run method the thread sleeps for the configured g_watchdogInterval on each iteration and then checks whether awaitingResponse has returned to its initial value (NO). If it is still YES, the main thread is considered deadlocked

    - (void) runMonitor
    {
        BOOL cancelled = NO;
        do
        {
            // Only do a watchdog check if the watchdog interval is > 0.
            // If the interval is <= 0, just idle until the user changes it.
            @autoreleasepool {
                NSTimeInterval sleepInterval = g_watchdogInterval;
                BOOL runWatchdogCheck = sleepInterval > 0;
                if(!runWatchdogCheck)
                {
                    sleepInterval = kIdleInterval;
                }
                [NSThread sleepForTimeInterval:sleepInterval];
                cancelled = self.monitorThread.isCancelled;
                if(!cancelled && runWatchdogCheck)
                {
                    if(self.awaitingResponse)
                    {
                        [self handleDeadlock];
                    }
                    else
                    {
                        [self watchdogPulse];
                    }
                }
            }
        } while (!cancelled);
    }
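The pulse-and-answer protocol in the bullets above comes down to a single shared flag. The following is a minimal sketch in plain C with hypothetical `demo_` names; it leaves out the threads, GCD, and sleeping so that the state transitions can be followed step by step, and it assumes a C11 compiler with `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct
{
    atomic_bool awaitingResponse;   /* true while a pulse is still unanswered */
} DemoWatchdog;

void demo_watchdogInit(DemoWatchdog* watchdog)
{
    atomic_init(&watchdog->awaitingResponse, false);
}

/* Would run on the main thread: answering a pulse clears the flag. */
void demo_watchdogAnswer(DemoWatchdog* watchdog)
{
    atomic_store(&watchdog->awaitingResponse, false);
}

/* Would run on the monitor thread once per watchdog interval.
 * Returns true if the previous pulse was never answered (suspected
 * deadlock); otherwise it sends a new pulse and returns false. */
bool demo_watchdogCheck(DemoWatchdog* watchdog)
{
    /* atomic_exchange sets the flag and returns its previous value. */
    return atomic_exchange(&watchdog->awaitingResponse, true);
}
```

In a real monitor, demo_watchdogCheck would be called after each sleep on the monitor thread, and demo_watchdogAnswer would be scheduled onto the main queue by the pulse, as watchdogAnswer is in the code above.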
    

2.6 Crash Generation and Saving

2.6.1 Crash Log Generation Logic

The previous sections covered the various crash monitoring mechanisms in iOS application development. Next, let's analyze how crash information is recorded after a crash is captured, that is, how it is stored in the application sandbox.

Take a main thread deadlock as an example to see how KSCrash records crash information.

// KSCrashMonitor_Deadlock.m
- (void) handleDeadlock
{
    ksmc_suspendEnvironment();
    kscm_notifyFatalExceptionCaptured(false);

    KSMC_NEW_CONTEXT(machineContext);
    ksmc_getContextForThread(g_mainQueueThread, machineContext, false);
    KSStackCursor stackCursor;
    kssc_initWithMachineContext(&stackCursor, 100, machineContext);
    char eventID[37];
    ksid_generate(eventID);

    KSLOG_DEBUG(@"Filling out context.");
    KSCrash_MonitorContext* crashContext = &g_monitorContext;
    memset(crashContext, 0, sizeof(*crashContext));
    crashContext->crashType = KSCrashMonitorTypeMainThreadDeadlock;
    crashContext->eventID = eventID;
    crashContext->registersAreValid = false;
    crashContext->offendingMachineContext = machineContext;
    crashContext->stackCursor = &stackCursor;
    
    kscm_handleException(crashContext);
    ksmc_resumeEnvironment();

    KSLOG_DEBUG(@"Calling abort()");
    abort();
}

The same holds for the other crash types: the exception information is packaged and handed to the kscm_handleException() function. You can see this function being called after each of the other crash types is captured.


/** Start general exception processing.
 *
 * @param context Contextual information about the exception.
 */
void kscm_handleException(struct KSCrash_MonitorContext* context)
{
    context->requiresAsyncSafety = g_requiresAsyncSafety;
    if(g_crashedDuringExceptionHandling)
    {
        context->crashedDuringCrashHandling = true;
    }
    for(int i = 0; i < g_monitorsCount; i++)
    {
        Monitor* monitor = &g_monitors[i];
        // Check whether this crash monitor is currently enabled
        if(isMonitorEnabled(monitor))
        {
            // Let the monitor add its own contextual information to the event
            addContextualInfoToEvent(monitor, context);
        }
    }
    // Actually process the crash information: save it as a JSON report
    g_onExceptionEvent(context);

    
    if(g_handlingFatalException && !g_crashedDuringExceptionHandling)
    {
        KSLOG_DEBUG("Exception is fatal. Restoring original handlers.");
        kscm_setActiveMonitors(KSCrashMonitorTypeNone);
    }
}

g_onExceptionEvent is a function pointer declared as static void (*g_onExceptionEvent)(struct KSCrash_MonitorContext* monitorContext); it is assigned in KSCrashMonitor.c:

void kscm_setEventCallback(void (*onEvent)(struct KSCrash_MonitorContext* monitorContext))
{
    g_onExceptionEvent = onEvent;
}
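The dispatch pattern used here, in which every enabled monitor enriches a shared context and a single registered function pointer then receives it, can be sketched as follows. All `Demo` and `demo_` names are hypothetical and the context is reduced to two fields; this only illustrates the control flow, not KSCrash's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    const char* eventID;
    int monitorsConsulted;   /* how many monitors added information */
} DemoContext;

typedef struct
{
    const char* name;
    bool enabled;
    void (*addContextualInfo)(DemoContext* context);
} DemoMonitor;

/* The single event callback, stored as a plain C function pointer. */
static void (*g_demoOnEvent)(DemoContext* context) = NULL;

void demo_setEventCallback(void (*onEvent)(DemoContext* context))
{
    g_demoOnEvent = onEvent;
}

void demo_handleException(DemoMonitor* monitors, int count, DemoContext* context)
{
    /* Only monitors that are switched on get to enrich the context. */
    for(int i = 0; i < count; i++)
    {
        if(monitors[i].enabled && monitors[i].addContextualInfo != NULL)
        {
            monitors[i].addContextualInfo(context);
        }
    }
    /* Then the registered callback does the real work (writing the report). */
    if(g_demoOnEvent != NULL)
    {
        g_demoOnEvent(context);
    }
}

/* Example callbacks, used only to exercise the sketch. */
static int g_demoReportsWritten = 0;

static void demo_countMonitor(DemoContext* context)
{
    context->monitorsConsulted++;
}

static void demo_onCrash(DemoContext* context)
{
    (void)context;
    g_demoReportsWritten++;
}
```

Keeping the callback behind a setter means the monitoring layer does not need to know how or where reports are written.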

The kscm_setEventCallback() function is called in KSCrashC.c:

KSCrashMonitorType kscrash_install(const char* appName, const char* const installPath)
{
    KSLOG_DEBUG("Installing crash reporter.");

    if(g_installed)
    {
        KSLOG_DEBUG("Crash reporter already installed.");
        return g_monitoring;
    }
    g_installed = 1;

    char path[KSFU_MAX_PATH_LENGTH];
    snprintf(path, sizeof(path), "%s/Reports", installPath);
    ksfu_makePath(path);
    kscrs_initialize(appName, path);

    snprintf(path, sizeof(path), "%s/Data", installPath);
    ksfu_makePath(path);
    snprintf(path, sizeof(path), "%s/Data/CrashState.json", installPath);
    kscrashstate_initialize(path);

    snprintf(g_consoleLogPath, sizeof(g_consoleLogPath), "%s/Data/ConsoleLog.txt", installPath);
    if(g_shouldPrintPreviousLog)
    {
        printPreviousLog(g_consoleLogPath);
    }
    kslog_setLogFilename(g_consoleLogPath, true);
    
    ksccd_init(60);
    // Set the callback function when crash occurs
    kscm_setEventCallback(onCrash);
    KSCrashMonitorType monitors = kscrash_setMonitoring(g_monitoring);

    KSLOG_DEBUG("Installation complete.");
    return monitors;
}

/** Called when a crash occurs.
 *
 * This function gets passed as a callback to a crash handler.
 */
static void onCrash(struct KSCrash_MonitorContext* monitorContext)
{
    KSLOG_DEBUG("Updating application state to note crash.");
    kscrashstate_notifyAppCrash();
    monitorContext->consoleLogPath = g_shouldAddConsoleLogToReport ? g_consoleLogPath : NULL;

    // While crash was being processed, another crash occurred
    if(monitorContext->crashedDuringCrashHandling)
    {
        kscrashreport_writeRecrashReport(monitorContext, g_lastCrashReportFilePath);
    }
    else
    {
        // 1. Create a new crash file path based on the current time
        char crashReportFilePath[KSFU_MAX_PATH_LENGTH];
        kscrs_getNextCrashReportPath(crashReportFilePath);
        // 2. Save the newly generated file path to g_lastCrashReportFilePath
        strncpy(g_lastCrashReportFilePath, crashReportFilePath, sizeof(g_lastCrashReportFilePath));
        // 3. Pass the newly generated file path to the function that writes the report
        kscrashreport_writeStandardReport(monitorContext, crashReportFilePath);
    }
}

Next come the functions that actually write the log file. Both do the same thing: format the crash information as JSON and write it to a file. The difference is that if another crash happens while a crash report is being written, the simplified write logic kscrashreport_writeRecrashReport() is used; otherwise the standard write logic kscrashreport_writeStandardReport() is followed.

bool ksfu_openBufferedWriter(KSBufferedWriter* writer, const char* const path, char* writeBuffer, int writeBufferLength)
{
    writer->buffer = writeBuffer;
    writer->bufferLength = writeBufferLength;
    writer->position = 0;
    /*
     The second parameter of open() specifies the access mode and creation flags
     #define O_RDONLY        0x0000         open for reading only
     #define O_WRONLY        0x0001         open for writing only
     #define O_RDWR          0x0002         open for reading and writing
     #define O_ACCMODE       0x0003         mask for above mode
     
     #define O_CREAT         0x0200         create if nonexistant
     #define O_TRUNC         0x0400         truncate to zero length
     #define O_EXCL          0x0800         error if already exists
     
     The third parameter of open() gives the permission bits applied when the file is created:
     0755: the owner has read/write/execute permissions; group and other users have read/execute permissions
     0644: the owner has read/write permissions; group and other users have read-only permissions
     Returns the file descriptor on success and -1 on failure
     */
    writer->fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if(writer->fd < 0)
    {
        KSLOG_ERROR("Could not open crash report file %s: %s", path, strerror(errno));
        return false;
    }
    return true;
}
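To show how such a writer is typically used after it has been opened, here is a minimal sketch of a fixed-buffer writer built on the same open() flags. The `demo_` functions and the `DemoBufferedWriter` type are hypothetical, and the flush strategy (flush when the next chunk does not fit, write oversized chunks directly) is an assumption made for illustration rather than a copy of KSCrash's implementation.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct
{
    char* buffer;       /* caller-provided storage, no allocation while crashing */
    int bufferLength;
    int position;       /* number of buffered bytes not yet written */
    int fd;
} DemoBufferedWriter;

/* write() may write fewer bytes than requested, so loop until done. */
static bool demo_writeAll(int fd, const char* bytes, int length)
{
    while(length > 0)
    {
        ssize_t written = write(fd, bytes, (size_t)length);
        if(written < 0)
        {
            if(errno == EINTR)
            {
                continue;
            }
            return false;
        }
        bytes += written;
        length -= (int)written;
    }
    return true;
}

bool demo_openWriter(DemoBufferedWriter* writer, const char* path, char* buffer, int bufferLength)
{
    writer->buffer = buffer;
    writer->bufferLength = bufferLength;
    writer->position = 0;
    /* O_EXCL makes open() fail if the report file already exists. */
    writer->fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    return writer->fd >= 0;
}

bool demo_flushWriter(DemoBufferedWriter* writer)
{
    if(writer->fd < 0 || !demo_writeAll(writer->fd, writer->buffer, writer->position))
    {
        return false;
    }
    writer->position = 0;
    return true;
}

bool demo_write(DemoBufferedWriter* writer, const char* data, int length)
{
    /* Make room first if the new chunk does not fit behind the buffered bytes. */
    if(length > writer->bufferLength - writer->position && !demo_flushWriter(writer))
    {
        return false;
    }
    /* Chunks larger than the whole buffer bypass it. */
    if(length > writer->bufferLength)
    {
        return demo_writeAll(writer->fd, data, length);
    }
    memcpy(writer->buffer + writer->position, data, (size_t)length);
    writer->position += length;
    return true;
}

void demo_closeWriter(DemoBufferedWriter* writer)
{
    if(writer->fd >= 0)
    {
        demo_flushWriter(writer);
        close(writer->fd);
        writer->fd = -1;
    }
}
```

Opening the same path a second time fails with EEXIST because of O_EXCL, which guards against overwriting an existing report.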
/**
 * Write a standard crash report to a file.
 *
 *  @param monitorContext Contextual information about the crash and environment.
 *                      The caller must fill this out before passing it in.
 *
 *  @param path The file to write to.
 */
void kscrashreport_writeStandardReport(const struct KSCrash_MonitorContext* const monitorContext,
                                       const char* path)
{
    KSLOG_INFO("Writing crash report to %s", path);
    char writeBuffer[1024];
    KSBufferedWriter bufferedWriter;

    if(!ksfu_openBufferedWriter(&bufferedWriter, path, writeBuffer, sizeof(writeBuffer)))
    {
        return;
    }

    ksccd_freeze();
    
    KSJSONEncodeContext jsonContext;
    jsonContext.userData = &bufferedWriter;
    KSCrashReportWriter concreteWriter;
    KSCrashReportWriter* writer = &concreteWriter;
    prepareReportWriter(writer, &jsonContext);

    ksjson_beginEncode(getJsonContext(writer), true, addJSONData, &bufferedWriter);

    writer->beginObject(writer, KSCrashField_Report);
    {
        writeReportInfo(writer,
                        KSCrashField_Report,
                        KSCrashReportType_Standard,
                        monitorContext->eventID,
                        monitorContext->System.processName);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeBinaryImages(writer, KSCrashField_BinaryImages);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeProcessState(writer, KSCrashField_ProcessState, monitorContext);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeSystemInfo(writer, KSCrashField_System, monitorContext);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writer->beginObject(writer, KSCrashField_Crash);
        {
            writeError(writer, KSCrashField_Error, monitorContext);
            ksfu_flushBufferedWriter(&bufferedWriter);
            writeAllThreads(writer,
                            KSCrashField_Threads,
                            monitorContext,
                            g_introspectionRules.enabled);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        writer->endContainer(writer);

        if(g_userInfoJSON != NULL)
        {
            addJSONElement(writer, KSCrashField_User, g_userInfoJSON, false);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        else
        {
            writer->beginObject(writer, KSCrashField_User);
        }
        if(g_userSectionWriteCallback != NULL)
        {
            ksfu_flushBufferedWriter(&bufferedWriter);
            g_userSectionWriteCallback(writer);
        }
        writer->endContainer(writer);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeDebugInfo(writer, KSCrashField_Debug, monitorContext);
    }
    writer->endContainer(writer);
    
    ksjson_endEncode(getJsonContext(writer));
    ksfu_closeBufferedWriter(&bufferedWriter);
    ksccd_unfreeze();
}

/** Write a minimal crash report to a file.
 *
 * @param monitorContext Contextual information about the crash and environment.
 *                       The caller must fill this out before passing it in.
 *
 * @param path The file to write to.
 */
void kscrashreport_writeRecrashReport(const struct KSCrash_MonitorContext* const monitorContext,
                                      const char* path)
{
    char writeBuffer[1024];
    KSBufferedWriter bufferedWriter;
    static char tempPath[KSFU_MAX_PATH_LENGTH];
    // Rename the previous crash report: take the incoming path (/var/mobile/Containers/Data/Application/****/Library/Caches/KSCrash/Test/Reports/Test-report-******.json) and overwrite its trailing ".json" with ".old", giving /var/mobile/Containers/Data/Application/******/Library/Caches/KSCrash/Test/Reports/Test-report-******.old

    strncpy(tempPath, path, sizeof(tempPath) - 10);
    strncpy(tempPath + strlen(tempPath) - 5, ".old", 5);
    KSLOG_INFO("Writing recrash report to %s", path);

    if(rename(path, tempPath) < 0)
    {
        KSLOG_ERROR("Could not rename %s to %s: %s", path, tempPath, strerror(errno));
    }
    // Open a buffered writer on the incoming path for the new report file
    if(!ksfu_openBufferedWriter(&bufferedWriter, path, writeBuffer, sizeof(writeBuffer)))
    {
        return;
    }

    ksccd_freeze();
    // Set up the C JSON encoder that serializes the report
    KSJSONEncodeContext jsonContext;
    jsonContext.userData = &bufferedWriter;
    KSCrashReportWriter concreteWriter;
    KSCrashReportWriter* writer = &concreteWriter;
    prepareReportWriter(writer, &jsonContext);

    ksjson_beginEncode(getJsonContext(writer), true, addJSONData, &bufferedWriter);

    writer->beginObject(writer, KSCrashField_Report);
    {
        writeRecrash(writer, KSCrashField_RecrashReport, tempPath);
        ksfu_flushBufferedWriter(&bufferedWriter);
        if(remove(tempPath) < 0)
        {
            KSLOG_ERROR("Could not remove %s: %s", tempPath, strerror(errno));
        }
        writeReportInfo(writer,
                        KSCrashField_Report,
                        KSCrashReportType_Minimal,
                        monitorContext->eventID,
                        monitorContext->System.processName);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writer->beginObject(writer, KSCrashField_Crash);
        {
            writeError(writer, KSCrashField_Error, monitorContext);
            ksfu_flushBufferedWriter(&bufferedWriter);
            int threadIndex = ksmc_indexOfThread(monitorContext->offendingMachineContext,
                                                 ksmc_getThreadFromContext(monitorContext->offendingMachineContext));
            writeThread(writer,
                        KSCrashField_CrashedThread,
                        monitorContext,
                        monitorContext->offendingMachineContext,
                        threadIndex,
                        false);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        writer->endContainer(writer);
    }
    writer->endContainer(writer);

    ksjson_endEncode(getJsonContext(writer));
    ksfu_closeBufferedWriter(&bufferedWriter);
    ksccd_unfreeze();
}

2.6.2 Crash Log Read Logic

After a crash, KSCrash saves the data to the App sandbox directory. The next time the App starts, we read the stored crash files, then process and upload the data.

Function calls after the App starts:

[KSCrashInstallation sendAllReportsWithCompletion:] -> [KSCrash sendAllReportsWithCompletion:] -> [KSCrash allReports] -> [KSCrash reportWithIntID:] -> [KSCrash loadCrashReportJSONWithID:] -> kscrs_readReport

The crash data in the sandbox is read inside sendAllReportsWithCompletion.

// First determine the number of crash reports by iterating over the entries of the reports directory and counting the report files
static int getReportCount()
{
    int count = 0;
    DIR* dir = opendir(g_reportsPath);
    if(dir == NULL)
    {
        KSLOG_ERROR("Could not open directory %s", g_reportsPath);
        goto done;
    }
    struct dirent* ent;
    while((ent = readdir(dir)) != NULL)
    {
        if(getReportIDFromFilename(ent->d_name) > 0)
        {
            count++;
        }
    }

done:
    if(dir != NULL)
    {
        closedir(dir);
    }
    return count;
}

// Get the number of crash files, collect their report IDs (the last part of each file name is the reportID), then read the contents of each crash report and add it to the array
- (NSArray*) allReports
{
    int reportCount = kscrash_getReportCount();
    int64_t reportIDs[reportCount];
    reportCount = kscrash_getReportIDs(reportIDs, reportCount);
    NSMutableArray* reports = [NSMutableArray arrayWithCapacity:(NSUInteger)reportCount];
    for(int i = 0; i < reportCount; i++)
    {
        NSDictionary* report = [self reportWithIntID:reportIDs[i]];
        if(report != nil)
        {
            [reports addObject:report];
        }
    }
    
    return reports;
}

// Load and decode the crash report for the given reportID
- (NSDictionary*) reportWithIntID:(int64_t) reportID
{
    NSData* jsonData = [self loadCrashReportJSONWithID:reportID];
    if(jsonData == nil)
    {
        return nil;
    }

    NSError* error = nil;
    NSMutableDictionary* crashReport = [KSJSONCodec decode:jsonData
                                                   options:KSJSONDecodeOptionIgnoreNullInArray |
                                                           KSJSONDecodeOptionIgnoreNullInObject |
                                                           KSJSONDecodeOptionKeepPartialObject
                                                     error:&error];
    if(error != nil)
    {
        KSLOG_ERROR(@"Encountered error loading crash report %" PRIx64 ": %@", reportID, error);
    }
    if(crashReport == nil)
    {
        KSLOG_ERROR(@"Could not load crash report");
        return nil;
    }
    [self doctorReport:crashReport];

    return crashReport;
}

// Read the crash content for the given reportID and wrap it in NSData
- (NSData*) loadCrashReportJSONWithID:(int64_t) reportID
{
    char* report = kscrash_readReport(reportID);
    if(report != NULL)
    {
        return [NSData dataWithBytesNoCopy:report length:strlen(report) freeWhenDone:YES];
    }
    return nil;
}

// Read the crash data for the given reportID into a C string (char*)
char* kscrash_readReport(int64_t reportID)
{
    if(reportID <= 0)
    {
        KSLOG_ERROR("Report ID was %" PRIx64, reportID);
        return NULL;
    }

    char* rawReport = kscrs_readReport(reportID);
    if(rawReport == NULL)
    {
        KSLOG_ERROR("Failed to load report ID %" PRIx64, reportID);
        return NULL;
    }

    char* fixedReport = kscrf_fixupCrashReport(rawReport);
    if(fixedReport == NULL)
    {
        KSLOG_ERROR("Failed to fixup report ID %" PRIx64, reportID);
    }

    free(rawReport);
    return fixedReport;
}

// Under a mutex, getCrashReportPathByID builds the file path for the reportID into path; ksfu_readEntireFile then reads the crash file contents into result
char* kscrs_readReport(int64_t reportID)
{
    pthread_mutex_lock(&g_mutex);
    char path[KSCRS_MAX_PATH_LENGTH];
    getCrashReportPathByID(reportID, path);
    char* result;
    ksfu_readEntireFile(path, &result, NULL, 2000000);
    pthread_mutex_unlock(&g_mutex);
    return result;
}

int kscrash_getReportIDs(int64_t* reportIDs, int count)
{
    return kscrs_getReportIDs(reportIDs, count);
}

int kscrs_getReportIDs(int64_t* reportIDs, int count)
{
    pthread_mutex_lock(&g_mutex);
    count = getReportIDs(reportIDs, count);
    pthread_mutex_unlock(&g_mutex);
    return count;
}
// Loop over the directory entries, call getReportIDFromFilename on each ent->d_name to get the reportID, and fill the array inside the loop
static int getReportIDs(int64_t* reportIDs, int count)
{
    int index = 0;
    DIR* dir = opendir(g_reportsPath);
    if(dir == NULL)
    {
        KSLOG_ERROR("Could not open directory %s", g_reportsPath);
        goto done;
    }

    struct dirent* ent;
    while((ent = readdir(dir)) != NULL && index < count)
    {
        int64_t reportID = getReportIDFromFilename(ent->d_name);
        if(reportID > 0)
        {
            reportIDs[index++] = reportID;
        }
    }

    qsort(reportIDs, (unsigned)count, sizeof(reportIDs[0]), compareInt64);

done:
    if(dir != NULL)
    {
        closedir(dir);
    }
    return index;
}

// sprintf writes the scan format (the App name, then "-report-", a hexadecimal 64-bit conversion, and ".json") into scanFormat; sscanf then matches filename against that format and stores the parsed ID in reportID. The crash file is named "App Name-report-reportID.json"
static int64_t getReportIDFromFilename(const char* filename)
{
    char scanFormat[100];
    sprintf(scanFormat, "%s-report-%%" PRIx64 ".json", g_appName);
    
    int64_t reportID = 0;
    sscanf(filename, scanFormat, &reportID);
    return reportID;
}

2.7 Front-end js related Crash monitoring

2.7.1 JavascriptCore exception monitoring

This part is simple and direct: exceptions are monitored through the exceptionHandler property of the JSContext object, as in the code below

jsContext.exceptionHandler = ^(JSContext *context, JSValue *exception) {
    // Handling jscore-related exception information    
};

2.7.2 h5 page exception monitoring

When JavaScript in an h5 page throws at runtime, the window object fires an error event (an ErrorEvent) and window.onerror() is executed.

window.onerror = function (msg, url, lineNumber, columnNumber, error) {
   // Handling exception information
};
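
The handler above only receives raw arguments. A minimal sketch of turning them into one payload is shown below; `buildErrorReport` and the `window.reportToNative` bridge are hypothetical names used for illustration, not part of any library or of the original project.

```javascript
// Hypothetical helper: normalizes the window.onerror arguments into one payload.
function buildErrorReport(msg, url, lineNumber, columnNumber, error) {
  return {
    message: String(msg),
    url: url || '',
    line: Number(lineNumber) || 0,
    column: Number(columnNumber) || 0,
    // error can be missing, e.g. cross-origin scripts only report "Script error."
    stack: error && error.stack ? String(error.stack) : ''
  };
}

// Attach only in a browser-like environment.
// `window.reportToNative` is an assumed bridge function injected by the App.
if (typeof window !== 'undefined') {
  window.onerror = function (msg, url, lineNumber, columnNumber, error) {
    var report = buildErrorReport(msg, url, lineNumber, columnNumber, error);
    if (typeof window.reportToNative === 'function') {
      window.reportToNative(JSON.stringify(report));
    }
    return false; // keep the browser's default error logging
  };
}
```

Keeping the payload construction in a separate function makes it possible to test it outside the WebView.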

2.7.3 React Native exception monitoring

Experiment: in an RN demo project, add a press handler to the Debug Text control and trigger a crash inside it

<Text style={styles.sectionTitle} onPress={()=>{1+qw;}}>Debug</Text>

Contrast group 1:

Conditions: iOS project in debug mode. Exception-handling code has been added on the RN side.

In the simulator, press Command + D to bring up the panel and select Debug to open the Chrome browser; on a Mac, press Command + Option + J to open the developer tools, where you can debug RN code just like React code.

In the crash stack shown there, clicking a frame jumps to the location resolved through the sourceMap.

Tips: building a Release bundle for an RN project

  • Create a folder (release_iOS) in the project root directory to hold the output resources

  • Switch to the project directory in the terminal and execute the following command

    react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_iOS/main.jsbundle --assets-dest release_iOS --sourcemap-output release_iOS/index.ios.map;
    
  • Drag the .jsbundle file and the assets folder from the release_iOS folder into the iOS project.

Contrast group 2:

Conditions: iOS project in release mode. No exception-handling code is added on the RN side.

Operation: Run the iOS project and click the button to simulate the crash.

Phenomenon: The iOS project crashes. The logs are as follows

2020-06-22 22:26:03.318 [info][tid:main][RCTRootView.m:294] Running application todos ({
    initialProps =     {
    };
    rootTag = 1;
})
2020-06-22 22:26:03.490 [info][tid:com.facebook.react.JavaScript] Running "todos" with {"rootTag":1,"initialProps":{}}
2020-06-22 22:27:38.673 [error][tid:com.facebook.react.JavaScript] ReferenceError: Can't find variable: qw
2020-06-22 22:27:38.675 [fatal][tid:com.facebook.react.ExceptionsManagerQueue] Unhandled JS Exception: ReferenceError: Can't find variable: qw
2020-06-22 22:27:38.691300+0800 todos[16790:314161] *** Terminating app due to uncaught exception 'RCTFatalException: Unhandled JS Exception: ReferenceError: Can't find variable: qw', reason: 'Unhandled JS Exception: ReferenceError: Can't find variable: qw, stack:
onPress@397:1821
<unknown>@203:3896
_performSideEffectsForTransition@210:9689
_performSideEffectsForTransition@(null):(null)
_receiveSignal@210:8425
_receiveSignal@(null):(null)
touchableHandleResponderRelease@210:5671
touchableHandleResponderRelease@(null):(null)
onResponderRelease@203:3006
b@97:1125
S@97:1268
w@97:1322
R@97:1617
M@97:2401
forEach@(null):(null)
U@97:2201
<unknown>@97:13818
Pe@97:90199
Re@97:13478
Ie@97:13664
receiveTouches@97:14448
value@27:3544
<unknown>@27:840
value@27:2798
value@27:812
value@(null):(null)
'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff23e3cf0e __exceptionPreprocess + 350
	1   libobjc.A.dylib                     0x00007fff50ba89b2 objc_exception_throw + 48
	2   todos                               0x00000001017b0510 RCTFormatError + 0
	3   todos                               0x000000010182d8ca -[RCTExceptionsManager reportFatal:stack:exceptionId:suppressRedBox:] + 503
	4   todos                               0x000000010182e34e -[RCTExceptionsManager reportException:] + 1658
	5   CoreFoundation                      0x00007fff23e43e8c __invoking___ + 140
	6   CoreFoundation                      0x00007fff23e41071 -[NSInvocation invoke] + 321
	7   CoreFoundation                      0x00007fff23e41344 -[NSInvocation invokeWithTarget:] + 68
	8   todos                               0x00000001017e07fa -[RCTModuleMethod invokeWithBridge:module:arguments:] + 578
	9   todos                               0x00000001017e2a84 _ZN8facebook5reactL11invokeInnerEP9RCTBridgeP13RCTModuleDatajRKN5folly7dynamicE + 246
	10  todos                               0x00000001017e280c ___ZN8facebook5react15RCTNativeModule6invokeEjON5folly7dynamicEi_block_invoke + 78
	11  libdispatch.dylib                   0x00000001025b5f11 _dispatch_call_block_and_release + 12
	12  libdispatch.dylib                   0x00000001025b6e8e _dispatch_client_callout + 8
	13  libdispatch.dylib                   0x00000001025bd6fd _dispatch_lane_serial_drain + 788
	14  libdispatch.dylib                   0x00000001025be28f _dispatch_lane_invoke + 422
	15  libdispatch.dylib                   0x00000001025c9b65 _dispatch_workloop_worker_thread + 719
	16  libsystem_pthread.dylib             0x00007fff51c08a3d _pthread_wqthread + 290
	17  libsystem_pthread.dylib             0x00007fff51c07b77 start_wqthread + 15
)
libc++abi.dylib: terminating with uncaught exception of type NSException
(lldb) 

Tips: How to debug in RN release mode (to see the console output of the js side)

  • Add #import <React/RCTLog.h> to AppDelegate.m
  • Add RCTSetLogThreshold(RCTLogLevelTrace); to - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions

Contrast group 3:

Conditions: iOS project in release mode. Exception-handling code is added on the RN side.

global.ErrorUtils.setGlobalHandler((e) => {
  console.log(e);
  let message = {
    name: e.name,
    message: e.message,
    stack: e.stack
  };
  axios.get('http://192.168.1.100:8888/test.php', {
    params: { 'message': JSON.stringify(message) }
  }).then(function (response) {
    console.log(response);
  }).catch(function (error) {
    console.log(error);
  });
}, true)

Operation: Run the iOS project and click the button to simulate crash.

Phenomenon: The iOS project does not crash. The line and column numbers in the logged stack can be compared with the js in the bundle.

Conclusion:

In an RN project, an uncaught js crash surfaces on the Native side. If the RN side has crash-capturing code, the Native side does not crash. If the crash on the RN side is not captured, the Native side crashes directly.

Once crash monitoring is written on the RN side, the stack information it prints refers to js that has been processed by the bundler, so analyzing the crash directly is very difficult. We therefore need to write monitoring code for RN crashes, report what it captures, and restore the reported crash information to the original source, that is, sourceMap parsing.
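
Before a reported stack can be restored, its frames have to be split into line and column numbers. The following is a minimal sketch that assumes the frame format seen in the release log above (`name@line:column`); `parseStackFrames` is a hypothetical helper name, and other engines or debug builds may print frames differently.

```javascript
// Parses frames such as "onPress@397:1821" from a stack string.
// Frames without a position, e.g. "forEach@(null):(null)", are skipped.
function parseStackFrames(stack) {
  var frames = [];
  var pattern = /^(.*)@(\d+):(\d+)$/;
  String(stack).split('\n').forEach(function (rawLine) {
    var match = pattern.exec(rawLine.trim());
    if (match) {
      frames.push({
        name: match[1] || '<unknown>',
        line: parseInt(match[2], 10),
        column: parseInt(match[3], 10)
      });
    }
  });
  return frames;
}

// Sample taken from the first frames of the log above.
var sample = 'onPress@397:1821\n<unknown>@203:3896\nforEach@(null):(null)';
console.log(parseStackFrames(sample));
```

Each resulting `{ line, column }` pair is what a source map lookup needs as input.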

2.7.3.1 js logic error

Anyone who has written RN knows that a problem in js code produces a red screen in DEBUG mode and a blank screen or a sudden exit in RELEASE mode. Exception monitoring is required for user experience and quality control.

Looking through the RN source turns up ErrorUtils. Its code shows how an error handler can be set.

/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 *
 * @format
 * @flow strict
 * @polyfill
 */

let _inGuard = 0;

type ErrorHandler = (error: mixed, isFatal: boolean) => void;
type Fn<Args, Return> = (...Args) => Return;

/**
 * This is the error handler that is called when we encounter an exception
 * when loading a module. This will report any errors encountered before
 * ExceptionsManager is configured.
 */
let _globalHandler: ErrorHandler = function onError(
  e: mixed,
  isFatal: boolean,
) {
  throw e;
};

/**
 * The particular require runtime that we are using looks for a global
 * `ErrorUtils` object and if it exists, then it requires modules with the
 * error handler specified via ErrorUtils.setGlobalHandler by calling the
 * require function with applyWithGuard. Since the require module is loaded
 * before any of the modules, this ErrorUtils must be defined (and the handler
 * set) globally before requiring anything.
 */
const ErrorUtils = {
  setGlobalHandler(fun: ErrorHandler): void {
    _globalHandler = fun;
  },
  getGlobalHandler(): ErrorHandler {
    return _globalHandler;
  },
  reportError(error: mixed): void {
    _globalHandler && _globalHandler(error, false);
  },
  reportFatalError(error: mixed): void {
    // NOTE: This has an untyped call site in Metro.
    _globalHandler && _globalHandler(error, true);
  },
  applyWithGuard<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    context?: ?mixed,
    args?: ?TArgs,
    // Unused, but some code synced from www sets it to null.
    unused_onError?: null,
    // Some callers pass a name here, which we ignore.
    unused_name?: ?string,
  ): ?TOut {
    try {
      _inGuard++;
      // $FlowFixMe: TODO T48204745 (1) apply(context, null) is fine. (2) array -> rest array should work
      return fun.apply(context, args);
    } catch (e) {
      ErrorUtils.reportError(e);
    } finally {
      _inGuard--;
    }
    return null;
  },
  applyWithGuardIfNeeded<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    context?: ?mixed,
    args?: ?TArgs,
  ): ?TOut {
    if (ErrorUtils.inGuard()) {
      // $FlowFixMe: TODO T48204745 (1) apply(context, null) is fine. (2) array -> rest array should work
      return fun.apply(context, args);
    } else {
      ErrorUtils.applyWithGuard(fun, context, args);
    }
    return null;
  },
  inGuard(): boolean {
    return !!_inGuard;
  },
  guard<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    name?: ?string,
    context?: ?mixed,
  ): ?(...TArgs) => ?TOut {
    // TODO: (moti) T48204753 Make sure this warning is never hit and remove it - types
    // should be sufficient.
    if (typeof fun !== 'function') {
      console.warn('A function must be passed to ErrorUtils.guard, got ', fun);
      return null;
    }
    const guardName = name ?? fun.name ?? '<generated guard>';
    function guarded(...args: TArgs): ?TOut {
      return ErrorUtils.applyWithGuard(
        fun,
        context ?? this,
        args,
        null,
        guardName,
      );
    }

    return guarded;
  },
};

global.ErrorUtils = ErrorUtils;

export type ErrorUtilsT = typeof ErrorUtils;

So RN exceptions can be handled by setting an error handler through global.ErrorUtils. For instance

global.ErrorUtils.setGlobalHandler(e => {
   // e.name e.message e.stack
}, true);
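
Replacing the global handler discards whatever handler was installed before. The sketch below shows one way to keep it, using only the `getGlobalHandler` and `setGlobalHandler` functions visible in the ErrorUtils source above. `installGlobalHandler`, `createFakeErrorUtils`, and the `callPrevious` flag are hypothetical names introduced here; the fake object only stands in for `global.ErrorUtils` so the example runs outside React Native.

```javascript
// Installs `report` as the global handler and optionally chains the previous one.
// `errorUtils` is any object with getGlobalHandler/setGlobalHandler.
function installGlobalHandler(errorUtils, report, callPrevious) {
  var previous = errorUtils.getGlobalHandler();
  errorUtils.setGlobalHandler(function (error, isFatal) {
    try {
      report({
        name: error && error.name,
        message: error && error.message,
        stack: error && error.stack,
        isFatal: !!isFatal
      });
    } catch (reportError) {
      // A failure while reporting must not hide the original error.
    }
    if (callPrevious && typeof previous === 'function') {
      previous(error, isFatal);
    }
  });
}

// Stand-in for global.ErrorUtils (an assumption for this sketch only).
function createFakeErrorUtils() {
  var handler = function () {};
  return {
    setGlobalHandler: function (fun) { handler = fun; },
    getGlobalHandler: function () { return handler; },
    reportFatalError: function (error) { handler(error, true); }
  };
}

var reports = [];
var fakeErrorUtils = createFakeErrorUtils();
installGlobalHandler(fakeErrorUtils, function (r) { reports.push(r); }, false);
fakeErrorUtils.reportFatalError(new TypeError('qw is not defined'));
console.log(reports[0].name, reports[0].isFatal);
```

Whether to call the previous handler is a design choice: chaining it keeps the default behavior (which, per the experiments above, ends in a Native crash for an uncaught error in release mode), while skipping it swallows the error after reporting.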

2.7.3.2 Component Issues

Another important point in RN crash handling is React Error Boundaries.

In the past, JavaScript errors inside components corrupted React's internal state and caused possibly untraceable errors on the next render. These errors were basically caused by earlier errors in other code (non-React component code), but React did not provide an elegant way to handle them in components or to recover from them.

To solve the problem that a JavaScript error in part of the UI should not crash the entire application, React 16 introduced a new concept, the Error Boundary.

An Error Boundary is a React component that catches JavaScript errors anywhere in its child component tree, logs them, and renders a fallback UI instead of the crashed child component tree. Error boundaries catch errors during rendering, in lifecycle methods, and in the constructors of the whole tree below them.

In other words, it catches exceptions in the lifecycle functions of child components, including constructors and render functions.

The following exceptions cannot be caught:

  • Event handlers
  • Asynchronous code (such as setTimeout or promise callbacks)
  • Server side rendering
  • Errors thrown in the error boundary itself (rather than in its children)

So you can prevent the App from crashing and improve the user experience by catching all exceptions in the lifecycle of components with an error boundary component and rendering a fallback component instead. Users can also be guided to give feedback, which helps with investigating and fixing problems.
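
Since event handlers and asynchronous code fall outside an error boundary, they need their own protection. One possible approach, sketched here under the assumption that a reporting callback is available, is to wrap handlers before passing them to components. `guardHandler` and `report` are hypothetical names, not React or React Native APIs.

```javascript
// Wraps a handler so that exceptions thrown inside it are reported
// instead of propagating. `report` is a callback supplied by the caller.
function guardHandler(handler, report) {
  return function guarded() {
    try {
      var result = handler.apply(this, arguments);
      // Also observe rejections of promises returned by async handlers.
      if (result && typeof result.then === 'function') {
        return result.then(undefined, function (error) {
          report(error);
        });
      }
      return result;
    } catch (error) {
      report(error);
      return undefined;
    }
  };
}

// Usage in a component (JSX, shown as a comment):
// <Text onPress={guardHandler(() => { 1 + qw; }, report)}>Debug</Text>
var seenErrors = [];
var guardedPress = guardHandler(function () {
  throw new Error('press failed');
}, function (error) {
  seenErrors.push(error.message);
});
guardedPress();
console.log(seenErrors);
```

This only covers handlers that are wrapped explicitly; a global handler is still needed as the last line of defense.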

RN crashes are divided into two types, js logic errors and component js errors, and both are now monitored and handled. Next, let's see how to solve these problems from an engineering perspective.

2.7.4 RN Crash Restore

SourceMap files are critical for parsing front-end logs. The meaning of the fields in a SourceMap file and the calculation steps are covered in a separate article.

With the SourceMap file and Mozilla's source-map project, the crash log of RN can be restored very well.

I wrote a NodeJS script with the following code

var fs = require('fs');
var sourceMap = require('source-map');
// Command-line arguments: the line and column numbers taken from the crash log
var args = process.argv.slice(2);

function parseJSError(aLine, aColumn) {
    fs.readFile('./index.ios.map', 'utf8', function (err, data) {
        if (err) {
            console.log(err);
            return;
        }
        sourceMap.SourceMapConsumer.with(data, null, consumer => {
            // Map the line and column numbers from the crash log back to the source
            let parseData = consumer.originalPositionFor({
                line: parseInt(aLine, 10),
                column: parseInt(aColumn, 10)
            });
            // Output to console
            console.log(parseData);
            // Output to file (writeFileSync is synchronous and takes no callback)
            fs.writeFileSync('./parsed.txt', JSON.stringify(parseData) + '\n', 'utf8');
        }).catch(function (error) {
            console.log(error);
        });
    });
}

var line = args[0];
var column = args[1];
parseJSError(line, column);

Next, let's run an experiment, again with the todos project above.

  1. Simulate crash on a Text click event

    <Text style={styles.sectionTitle} onPress={()=>{1+qw;}}>Debug</Text>
    
  2. Bundle the RN project and output the sourceMap file. Execute the command:

    react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_iOS/main.jsbundle --assets-dest release_iOS --sourcemap-output release_iOS/index.ios.map;
    

    Because this command is used frequently, add an alias for it in iterm2 by editing the .zshrc file

    alias RNRelease='react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_iOS/main.jsbundle --assets-dest release_iOS --sourcemap-output release_iOS/index.ios.map;' # Build the RN release bundle
    
  3. Copy the JS bundle and image assets into the Xcode project

  4. Tap the text to simulate the crash, copy the line and column numbers from the log, and execute the following command in the Node project

    node index.js 397 1822
    
  5. Compare the line number, column number, and file information parsed by the script with the source file; the result is correct.
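
The experiment above resolves one position at a time. A whole stack can be resolved in a single pass; the sketch below assumes a consumer object that exposes `originalPositionFor({ line, column })`, as the SourceMapConsumer used in the script does. `symbolicateFrames` is a hypothetical helper, and the frame shape (`name`, `line`, `column`) is an assumption of this example.

```javascript
// Maps bundle positions to source positions with a source map consumer.
function symbolicateFrames(frames, consumer) {
  return frames.map(function (frame) {
    var original = consumer.originalPositionFor({
      line: frame.line,
      column: frame.column
    });
    return {
      bundle: frame.line + ':' + frame.column,
      // source, line and column are null when the map does not cover the position
      source: original.source,
      line: original.line,
      column: original.column,
      name: original.name || frame.name || null
    };
  });
}
```

With the real library, the function would be called inside the `SourceMapConsumer.with` callback so that the consumer is released afterwards.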

2.7.5 SourceMap parsing system design

Purpose: through the platform, a crash in the production RN project can be restored to the specific file, line, and column. You can see the specific code and the RN stack trace, and the source file can be downloaded.

  1. Servers managed under the packaging system:
    • Generate the source map file when packaging in the production environment
    • Store all the files as they were before packaging (install)
  2. Build the RN analysis interface on the product side. Clicking a collected RN crash shows the specific file, line, and column on the details page, together with the specific code, the RN stack trace, and the Native stack trace. (The technical implementation is described above.)
  3. Source map files are large, so parsing takes time; even when the time is short, it consumes computing resources, so an efficient way of reading them needs to be designed.
  4. SourceMap files differ between iOS and Android, so SourceMap storage needs to distinguish the OS.
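
Points 3 and 4 suggest caching parsed source maps per platform and version. The following is a minimal sketch of such a cache with least-recently-used eviction; `SourceMapCache`, `loadConsumer`, and the `os@version` key format are assumptions made for illustration and not part of the system described above.

```javascript
// A small LRU cache keyed by platform and app version.
// `loadConsumer(os, version)` is a hypothetical function that loads and
// parses the matching source map; it is only called on a cache miss.
class SourceMapCache {
  constructor(loadConsumer, maxEntries) {
    this.loadConsumer = loadConsumer;
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map keeps insertion order, oldest first
  }

  get(os, version) {
    const key = os + '@' + version;
    if (this.entries.has(key)) {
      // Re-insert to mark the entry as most recently used.
      const cached = this.entries.get(key);
      this.entries.delete(key);
      this.entries.set(key, cached);
      return cached;
    }
    const loaded = this.loadConsumer(os, version);
    this.entries.set(key, loaded);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (the first key in the Map).
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    return loaded;
  }
}
```

Bounding the number of entries keeps memory use predictable, since each parsed source map can be large.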

3. Encapsulating KSCrash

Then encapsulate your own crash-processing logic. For example, the things to do are:

  • Subclass the abstract class KSCrashInstallation and do the initialization work there (like NSURLProtocol, an abstract class must be subclassed to be used), implementing the sink method declared in the abstract class.

    /**
     * Crash system installation which handles backend-specific details.
     *
     * Only one installation can be installed at a time.
     *
     * This is an abstract class.
     */
    @interface KSCrashInstallation : NSObject
    
    #import "CMCrashInstallation.h"
    #import <KSCrash/KSCrashInstallation+Private.h>
    #import "CMCrashReporterSink.h"
    
    @implementation CMCrashInstallation
    
    + (instancetype)sharedInstance {
        static CMCrashInstallation *sharedInstance = nil;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            sharedInstance = [[CMCrashInstallation alloc] init];
        });
        return sharedInstance;
    }
    
    - (id)init {
        return [super initWithRequiredProperties: nil];
    }
    
    - (id<KSCrashReportFilter>)sink {
        CMCrashReporterSink *sink = [[CMCrashReporterSink alloc] init];
        return [sink defaultCrashReportFilterSetAppleFmt];
    }
    
    @end
    
  • The sink method uses the CMCrashReporterSink class, which conforms to the KSCrashReportFilter protocol and declares the public method defaultCrashReportFilterSetAppleFmt

    // .h
    #import <Foundation/Foundation.h>
    #import <KSCrash/KSCrashReportFilter.h>
    
    @interface CMCrashReporterSink : NSObject<KSCrashReportFilter>
    
    - (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt;
    
    @end
    
    // .m
    #pragma mark - public Method
    
    - (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt
    {
        return [KSCrashReportFilterPipeline filterWithFilters:
                [CMCrashReportFilterAppleFmt filterWithReportStyle:KSAppleReportStyleSymbolicatedSideBySide],
                self,
                nil];
    }
    

    The defaultCrashReportFilterSetAppleFmt method returns the result of the KSCrashReportFilterPipeline class method filterWithFilters.

    CMCrashReportFilterAppleFmt is a class that inherits from KSCrashReportFilterAppleFmt and conforms to the KSCrashReportFilter protocol. The protocol method lets developers process the crash data format.

    /** Filter the specified reports.
     *
     * @param reports The reports to process.
     * @param onCompletion Block to call when processing is complete.
     */
    - (void) filterReports:(NSArray*) reports
              onCompletion:(KSCrashReportFilterCompletion) onCompletion;
    
    #import <KSCrash/KSCrashReportFilterAppleFmt.h>
    
    @interface CMCrashReportFilterAppleFmt : KSCrashReportFilterAppleFmt<KSCrashReportFilter>
    
    @end
    
    // .m
    - (void) filterReports:(NSArray*)reports onCompletion:(KSCrashReportFilterCompletion)onCompletion
    {
        NSMutableArray* filteredReports = [NSMutableArray arrayWithCapacity:[reports count]];
        for(NSDictionary *report in reports){
            if([self majorVersion:report] == kExpectedMajorVersion){
                id monitorInfo = [self generateMonitorInfoFromCrashReport:report];
                if(monitorInfo != nil){
                    [filteredReports addObject:monitorInfo];
                }
            }
        }
        kscrash_callCompletion(onCompletion, filteredReports, YES, nil);
    }
    
    /**
     @brief Get crash time, mach name, signal name, and apple report in Crash JSON
     */
    - (NSDictionary *)generateMonitorInfoFromCrashReport:(NSDictionary *)crashReport
    {
        NSDictionary *infoReport = [crashReport objectForKey:@"report"];
        // ...
        id appleReport = [self toAppleFormat:crashReport];
    
        NSMutableDictionary *info = [NSMutableDictionary dictionary];
        [info setValue:crashTime forKey:@"crashTime"];
        [info setValue:appleReport forKey:@"appleReport"];
        [info setValue:userException forKey:@"userException"];
        [info setValue:userInfo forKey:@"custom"];
    
        return [info copy];
    }
    
    /**
     * A pipeline of filters. Reports get passed through each subfilter in order.
     *
     * Input: Depends on what's in the pipeline.
     * Output: Depends on what's in the pipeline.
     */
    @interface KSCrashReportFilterPipeline : NSObject <KSCrashReportFilter>
    
  • The APM component sets up a launcher for the Crash module. The launcher initializes KSCrash and assembles the data that monitoring needs when a crash is triggered, for example basic information such as SESSION_ID, App start time, App name, crash time, App version number, and current page information.

    /** C Function to call during a crash report to give the callee an opportunity to
     * add to the report. NULL = ignore.
     *
     * WARNING: Only call async-safe functions from this function! DO NOT call
     * Objective-C methods!!!
     */
    @property(atomic,readwrite,assign) KSReportWriteCallback onCrash;
    
    + (instancetype)sharedInstance
    {
        static CMCrashMonitor *_sharedManager = nil;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            _sharedManager = [[CMCrashMonitor alloc] init];
        });
        return _sharedManager;
    }
    
    
    #pragma mark - public Method
    
    - (void)startMonitor
    {
        CMMLog(@"crash monitor started");
    
    #ifdef DEBUG
        BOOL _trackingCrashOnDebug = [CMMonitorConfig sharedInstance].trackingCrashOnDebug;
        if (_trackingCrashOnDebug) {
            [self installKSCrash];
        }
    #else
        [self installKSCrash];
    #endif
    }
    
    #pragma mark - private method
    
    static void onCrash(const KSCrashReportWriter* writer)
    {
        NSString *sessionId = [NSString stringWithFormat:@"\"%@\"", ***];
        writer->addJSONElement(writer, "SESSION_ID", [sessionId UTF8String], true);
    
        NSString *appLaunchTime = ***;
        writer->addJSONElement(writer, "USER_APP_START_DATE", [[NSString stringWithFormat:@"\"%@\"", appLaunchTime] UTF8String], true);
        // ...
    }
    
    - (void)installKSCrash
    {
        [[CMCrashInstallation sharedInstance] install];
        [[CMCrashInstallation sharedInstance] sendAllReportsWithCompletion:nil];
        [CMCrashInstallation sharedInstance].onCrash = onCrash;
        dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(5.f * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
            _isCanAddCrashCount = NO;
        });
    }
    

    The installKSCrash method calls [[CMCrashInstallation sharedInstance] sendAllReportsWithCompletion:nil], whose internal implementation is as follows:

    - (void) sendAllReportsWithCompletion:(KSCrashReportFilterCompletion) onCompletion
    {
        NSError* error = [self validateProperties];
        if(error != nil)
        {
            if(onCompletion != nil)
            {
                onCompletion(nil, NO, error);
            }
            return;
        }
    
        id<KSCrashReportFilter> sink = [self sink];
        if(sink == nil)
        {
            onCompletion(nil, NO, [NSError errorWithDomain:[[self class] description]
                                                      code:0
                                               description:@"Sink was nil (subclasses must implement method \"sink\")"]);
            return;
        }
    
        sink = [KSCrashReportFilterPipeline filterWithFilters:self.prependedFilters, sink, nil];
    
        KSCrash* handler = [KSCrash sharedInstance];
        handler.sink = sink;
        [handler sendAllReportsWithCompletion:onCompletion];
    }
    

    Inside this method, the sink of the KSCrashInstallation is assigned to the KSCrash object, and then KSCrash's own sendAllReportsWithCompletion: method is called, as follows:

    - (void) sendAllReportsWithCompletion:(KSCrashReportFilterCompletion) onCompletion
    {
        NSArray* reports = [self allReports];
    
        KSLOG_INFO(@"Sending %d crash reports", [reports count]);
    
        [self sendReports:reports
             onCompletion:^(NSArray* filteredReports, BOOL completed, NSError* error)
         {
             KSLOG_DEBUG(@"Process finished with completion: %d", completed);
             if(error != nil)
             {
                 KSLOG_ERROR(@"Failed to send reports: %@", error);
             }
             if((self.deleteBehaviorAfterSendAll == KSCDeleteOnSucess && completed) ||
                self.deleteBehaviorAfterSendAll == KSCDeleteAlways)
             {
                 kscrash_deleteAllReports();
             }
             kscrash_callCompletion(onCompletion, filteredReports, completed, error);
         }];
    }
    

    That method in turn calls the instance method sendReports:onCompletion:, shown below:

    - (void) sendReports:(NSArray*) reports onCompletion:(KSCrashReportFilterCompletion) onCompletion
    {
        if([reports count] == 0)
        {
            kscrash_callCompletion(onCompletion, reports, YES, nil);
            return;
        }
    
        if(self.sink == nil)
        {
            kscrash_callCompletion(onCompletion, reports, NO,
                                     [NSError errorWithDomain:[[self class] description]
                                                         code:0
                                                  description:@"No sink set. Crash reports not sent."]);
            return;
        }
    
        [self.sink filterReports:reports
                    onCompletion:^(NSArray* filteredReports, BOOL completed, NSError* error)
         {
             kscrash_callCompletion(onCompletion, filteredReports, completed, error);
         }];
    }
    

    The [self.sink filterReports:onCompletion:] call inside this method goes to the sink returned by the sink getter that CMCrashInstallation sets up. That getter returns the result of the CMCrashReporterSink object's defaultCrashReportFilterSetAppleFmt method, implemented as follows:

    - (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt
    {
        return [KSCrashReportFilterPipeline filterWithFilters:
                [CMCrashReportFilterAppleFmt filterWithReportStyle:KSAppleReportStyleSymbolicatedSideBySide],
                self,
                nil];
    }
    

    You can see that this method sets up a pipeline of filters, one of which is self, the CMCrashReporterSink object. So the [self.sink filterReports:onCompletion:] call above ends up in the data processing method of CMCrashReporterSink. When processing is done, calling kscrash_callCompletion(onCompletion, reports, YES, nil); tells KSCrash that the locally saved crash logs have been processed and can be deleted.

    - (void)filterReports:(NSArray *)reports onCompletion:(KSCrashReportFilterCompletion)onCompletion
    {
        for (NSDictionary *report in reports) {
            // Processing Crash data, handing it over to a unified data reporting component...
        }
        kscrash_callCompletion(onCompletion, reports, YES, nil);
    }
    

    To summarize what KSCrash does: it provides monitoring for the various kinds of crashes. After a crash, it uses C code to efficiently write process information, basic information, exception information, thread information, and so on to a JSON file. On the next launch, the App reads the crash logs from the local crash folder, lets the developer add custom key/value pairs, reports the logs to the APM system, and then deletes them from the local crash folder.
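
The filter chain walked through above can be pictured as a list of filters, each receiving the previous filter's output. The following is a minimal C sketch of that idea only. It is synchronous, whereas KSCrash passes results through Objective-C completion blocks, and `ReportList`, `ReportFilter`, `drop_empty`, `upload_sink`, and `run_pipeline` are hypothetical names for illustration, not KSCrash API.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REPORTS 8

/* Hypothetical stand-in for the array of crash reports. */
typedef struct {
    const char *items[MAX_REPORTS];
    size_t count;
} ReportList;

/* A filter reads `in`, writes `out`, and returns 1 on success, 0 on failure. */
typedef int (*ReportFilter)(const ReportList *in, ReportList *out);

/* Sample filter: drops NULL or empty reports. */
static int drop_empty(const ReportList *in, ReportList *out)
{
    out->count = 0;
    for (size_t i = 0; i < in->count; i++) {
        if (in->items[i] != NULL && in->items[i][0] != '\0') {
            out->items[out->count++] = in->items[i];
        }
    }
    return 1;
}

/* Sample sink: accepts everything and counts what it "uploaded". */
static size_t uploaded_total = 0;

static int upload_sink(const ReportList *in, ReportList *out)
{
    uploaded_total += in->count;
    *out = *in;
    return 1;
}

/* Runs the filters in order and stops at the first failure.
 * Returns 1 only if every filter succeeded (the "completed" flag). */
static int run_pipeline(const ReportFilter *filters, size_t n,
                        const ReportList *input, ReportList *result)
{
    ReportList current = *input;
    for (size_t i = 0; i < n; i++) {
        ReportList next;
        memset(&next, 0, sizeof next);
        if (!filters[i](&current, &next)) {
            *result = current;
            return 0;
        }
        current = next;
    }
    *result = current;
    return 1;
}
```

In this sketch the caller would delete the local reports only when `run_pipeline` returns 1, which mirrors the `completed` check that precedes kscrash_deleteAllReports() in the code above.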

4. Symbolization

After an App crashes, the system generates a crash log and stores it on the device, where it can be found in Settings. The log records information such as the App's running state, the call stack, and the thread on which the crash occurred. However, the log contains only addresses and is not readable, so it has to be symbolized.

4.1 .dSYM file

The .dSYM (debugging symbols) file is an intermediate file that stores the mapping between hexadecimal addresses and functions; it contains the debugging information (symbols). Xcode generates a new .dSYM file each time the project is compiled and run. By default, debug builds do not generate a .dSYM. To get one, change Build Settings -> Build Options -> Debug Information Format from DWARF to DWARF with dSYM File, then compile and run again.

Because each build has its own .dSYM, save the .dSYM file of every version whenever you package the App.

The .dSYM file contains DWARF information. Open the package contents and the DWARF data is stored at Test.app.dSYM/Contents/Resources/DWARF/Test.

The .dSYM holds the debug information extracted from the Mach-O file. For security, release builds keep the debug information in this separate file instead of in the shipped binary. The .dSYM is actually a directory, with the DWARF file under Contents/Resources/DWARF/ inside it, as the path above shows.

4.2 DWARF Files

DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.

DWARF stands for Debugging With Arbitrary Record Formats; it is a debug file format built from attributed records.

DWARF is a compact representation of the executable's relationship to the source code.

Most modern programming languages are block structured: each entity (a class, a function) is contained in another entity. In a C program, each file may contain multiple data definitions, variables, and functions. DWARF follows this model and is also block structured. The basic descriptive unit in DWARF is the Debugging Information Entry (DIE). A DIE has a tag, which indicates what the DIE describes, and a list of attributes (similar to HTML or XML structures) that fill in the details. Every DIE except the topmost one is contained in a parent DIE and may have sibling and child DIEs. Attribute values can be constants (such as a function name), variables (such as the starting address of a function), or references to another DIE (such as the return type of a function).

The sections in the DWARF file are as follows:

Section            Description
.debug_loc         Location lists used in DW_AT_location attributes
.debug_macinfo     Macro information
.debug_pubnames    Lookup table for global objects and functions
.debug_pubtypes    Lookup table for global types
.debug_ranges      Address ranges used in DW_AT_ranges attributes
.debug_str         String table used in .debug_info
.debug_types       Type descriptions

Common tags and attributes are as follows:

Tag / attribute               Description
DW_TAG_class_type             Class name and type information
DW_TAG_structure_type         Structure name and type information
DW_TAG_union_type             Union name and type information
DW_TAG_enumeration_type       Enumeration name and type information
DW_TAG_typedef                Name and type information of a typedef
DW_TAG_array_type             Array name and type information
DW_TAG_subrange_type          Size information of an array
DW_TAG_inheritance            Inherited class name and type information
DW_TAG_member                 A member of a class
DW_TAG_subprogram             Name information of a function
DW_TAG_formal_parameter       Parameter information of a function
DW_AT_name                    Name string
DW_AT_type                    Type information
DW_AT_artificial              Set by the compiler for entities it created itself
DW_AT_sibling                 Location of the sibling DIE
DW_AT_data_member_location    Location of a data member
DW_AT_virtuality              Set when the entity is virtual

For a quick look at DWARF in practice, parse the DWARF file inside the test project's .dSYM folder with the following command:

dwarfdump -F --debug-info Test.app.dSYM/Contents/Resources/DWARF/Test > debug-info.txt

The output looks like this:

Test.app.dSYM/Contents/Resources/DWARF/Test:	file format Mach-O arm64

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004f version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x00000053)

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer [DW_FORM_strp]	("Apple clang version 11.0.3 (clang-1103.0.32.62)")
              DW_AT_language [DW_FORM_data2]	(DW_LANG_ObjC)
              DW_AT_name [DW_FORM_strp]	("_Builtin_stddef_max_align_t")
              DW_AT_stmt_list [DW_FORM_sec_offset]	(0x00000000)
              DW_AT_comp_dir [DW_FORM_strp]	("/Users/lbp/Desktop/Test")
              DW_AT_APPLE_major_runtime_vers [DW_FORM_data1]	(0x02)
              DW_AT_GNU_dwo_id [DW_FORM_data8]	(0x392b5344d415340c)

0x00000027:   DW_TAG_module
                DW_AT_name [DW_FORM_strp]	("_Builtin_stddef_max_align_t")
                DW_AT_LLVM_config_macros [DW_FORM_strp]	("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                DW_AT_LLVM_include_path [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include")
                DW_AT_LLVM_isysroot [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x00000038:     DW_TAG_typedef
                  DW_AT_type [DW_FORM_ref4]	(0x0000004b "long double")
                  DW_AT_name [DW_FORM_strp]	("max_align_t")
                  DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include/__stddef_max_align_t.h")
                  DW_AT_decl_line [DW_FORM_data1]	(16)

0x00000043:     DW_TAG_imported_declaration
                  DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include/__stddef_max_align_t.h")
                  DW_AT_decl_line [DW_FORM_data1]	(27)
                  DW_AT_import [DW_FORM_ref_addr]	(0x0000000000000027)

0x0000004a:     NULL

0x0000004b:   DW_TAG_base_type
                DW_AT_name [DW_FORM_strp]	("long double")
                DW_AT_encoding [DW_FORM_data1]	(DW_ATE_float)
                DW_AT_byte_size [DW_FORM_data1]	(0x08)

0x00000052:   NULL
0x00000053: Compile Unit: length = 0x000183dc version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x00018433)

0x0000005e: DW_TAG_compile_unit
              DW_AT_producer [DW_FORM_strp]	("Apple clang version 11.0.3 (clang-1103.0.32.62)")
              DW_AT_language [DW_FORM_data2]	(DW_LANG_ObjC)
              DW_AT_name [DW_FORM_strp]	("Darwin")
              DW_AT_stmt_list [DW_FORM_sec_offset]	(0x000000a7)
              DW_AT_comp_dir [DW_FORM_strp]	("/Users/lbp/Desktop/Test")
              DW_AT_APPLE_major_runtime_vers [DW_FORM_data1]	(0x02)
              DW_AT_GNU_dwo_id [DW_FORM_data8]	(0xa4a1d339379e18a5)

0x0000007a:   DW_TAG_module
                DW_AT_name [DW_FORM_strp]	("Darwin")
                DW_AT_LLVM_config_macros [DW_FORM_strp]	("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                DW_AT_LLVM_include_path [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                DW_AT_LLVM_isysroot [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x0000008b:     DW_TAG_module
                  DW_AT_name [DW_FORM_strp]	("C")
                  DW_AT_LLVM_config_macros [DW_FORM_strp]	("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                  DW_AT_LLVM_include_path [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                  DW_AT_LLVM_isysroot [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x0000009c:       DW_TAG_module
                    DW_AT_name [DW_FORM_strp]	("fenv")
                    DW_AT_LLVM_config_macros [DW_FORM_strp]	("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                    DW_AT_LLVM_include_path [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                    DW_AT_LLVM_isysroot [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x000000ad:         DW_TAG_enumeration_type
                      DW_AT_type [DW_FORM_ref4]	(0x00017276 "unsigned int")
                      DW_AT_byte_size [DW_FORM_data1]	(0x04)
                      DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/fenv.h")
                      DW_AT_decl_line [DW_FORM_data1]	(154)

0x000000b5:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]	("__fpcr_trap_invalid")
                        DW_AT_const_value [DW_FORM_udata]	(256)

0x000000bc:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]	("__fpcr_trap_divbyzero")
                        DW_AT_const_value [DW_FORM_udata]	(512)

0x000000c3:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]	("__fpcr_trap_overflow")
                        DW_AT_const_value [DW_FORM_udata]	(1024)

0x000000ca:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]	("__fpcr_trap_underflow")
// ......
0x000466ee:   DW_TAG_subprogram
                DW_AT_name [DW_FORM_strp]	("CFBridgingRetain")
                DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSObject.h")
                DW_AT_decl_line [DW_FORM_data1]	(105)
                DW_AT_prototyped [DW_FORM_flag_present]	(true)
                DW_AT_type [DW_FORM_ref_addr]	(0x0000000000019155 "CFTypeRef")
                DW_AT_inline [DW_FORM_data1]	(DW_INL_inlined)

0x000466fa:     DW_TAG_formal_parameter
                  DW_AT_name [DW_FORM_strp]	("X")
                  DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSObject.h")
                  DW_AT_decl_line [DW_FORM_data1]	(105)
                  DW_AT_type [DW_FORM_ref4]	(0x00046706 "id")

0x00046705:     NULL

0x00046706:   DW_TAG_typedef
                DW_AT_type [DW_FORM_ref4]	(0x00046711 "objc_object*")
                DW_AT_name [DW_FORM_strp]	("id")
                DW_AT_decl_file [DW_FORM_data1]	("/Users/lbp/Desktop/Test/Test/NetworkAPM/NSURLResponse+cm_FetchStatusLineFromCFNetwork.m")
                DW_AT_decl_line [DW_FORM_data1]	(44)

0x00046711:   DW_TAG_pointer_type
                DW_AT_type [DW_FORM_ref4]	(0x00046716 "objc_object")

0x00046716:   DW_TAG_structure_type
                DW_AT_name [DW_FORM_strp]	("objc_object")
                DW_AT_byte_size [DW_FORM_data1]	(0x00)

0x0004671c:     DW_TAG_member
                  DW_AT_name [DW_FORM_strp]	("isa")
                  DW_AT_type [DW_FORM_ref4]	(0x00046727 "objc_class*")
                  DW_AT_data_member_location [DW_FORM_data1]	(0x00)
// ......

The full output is too long to paste here. You can see that a DIE contains the function's start address, end address, function name, file name, and line number. For a given address, find the DIE whose start and end addresses enclose it, and you can recover the function name and file name.
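
The lookup described above amounts to a range search. Below is a minimal C sketch, assuming the function ranges have already been extracted from the DIEs into a sorted, non-overlapping array. `FunctionRange` and `find_function` are hypothetical names, and treating the end address as exclusive is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per function, as could be collected from DW_TAG_subprogram DIEs.
 * high_pc is treated as exclusive (an assumption of this sketch). */
typedef struct {
    uint64_t low_pc;
    uint64_t high_pc;
    const char *name;
    const char *file;
} FunctionRange;

/* Binary search over ranges sorted by low_pc and not overlapping.
 * Returns the range containing addr, or NULL if there is none. */
static const FunctionRange *find_function(const FunctionRange *ranges,
                                          size_t count, uint64_t addr)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < ranges[mid].low_pc) {
            hi = mid;            /* addr lies before this function */
        } else if (addr >= ranges[mid].high_pc) {
            lo = mid + 1;        /* addr lies after this function */
        } else {
            return &ranges[mid]; /* low_pc <= addr < high_pc */
        }
    }
    return NULL;
}
```

A NULL result means the address falls in a gap between functions, which usually suggests it belongs to a different image or that the wrong dSYM is being used.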

.debug_line lets you recover information such as the line number within a file:

dwarfdump -F --debug-line Test.app.dSYM/Contents/Resources/DWARF/Test > debug-inline.txt

Part of the output:

Test.app.dSYM/Contents/Resources/DWARF/Test:	file format Mach-O arm64

.debug_line contents:
debug_line[0x00000000]
Line table prologue:
    total_length: 0x000000a3
         version: 4
 prologue_length: 0x0000009a
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include"
file_names[  1]:
           name: "__stddef_max_align_t.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000      1      0      1   0             0  is_stmt end_sequence
debug_line[0x000000a7]
Line table prologue:
    total_length: 0x0000230a
         version: 4
 prologue_length: 0x00002301
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include"
include_directories[  2] = "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include"
include_directories[  3] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys"
include_directories[  4] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach"
include_directories[  5] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/libkern"
include_directories[  6] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/architecture"
include_directories[  7] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys/_types"
include_directories[  8] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/_types"
include_directories[  9] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/arm"
include_directories[ 10] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys/_pthread"
include_directories[ 11] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach/arm"
include_directories[ 12] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/libkern/arm"
include_directories[ 13] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/uuid"
include_directories[ 14] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/netinet"
include_directories[ 15] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/netinet6"
include_directories[ 16] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/net"
include_directories[ 17] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/pthread"
include_directories[ 18] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach_debug"
include_directories[ 19] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/os"
include_directories[ 20] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/malloc"
include_directories[ 21] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/bsm"
include_directories[ 22] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/machine"
include_directories[ 23] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach/machine"
include_directories[ 24] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/secure"
include_directories[ 25] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/xlocale"
include_directories[ 26] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/arpa"
file_names[  1]:
           name: "fenv.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  2]:
           name: "stdatomic.h"
      dir_index: 2
       mod_time: 0x00000000
         length: 0x00000000
file_names[  3]:
           name: "wait.h"
      dir_index: 3
       mod_time: 0x00000000
         length: 0x00000000
// ......
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x000000010000b588     14      0      2   0             0  is_stmt
0x000000010000b5b4     16      5      2   0             0  is_stmt prologue_end
0x000000010000b5d0     17     11      2   0             0  is_stmt
0x000000010000b5d4      0      0      2   0             0 
0x000000010000b5d8     17      5      2   0             0 
0x000000010000b5dc     17     11      2   0             0 
0x000000010000b5e8     18      1      2   0             0  is_stmt
0x000000010000b608     20      0      2   0             0  is_stmt
0x000000010000b61c     22      5      2   0             0  is_stmt prologue_end
0x000000010000b628     23      5      2   0             0  is_stmt
0x000000010000b644     24      1      2   0             0  is_stmt
0x000000010000b650     15      0      1   0             0  is_stmt
0x000000010000b65c     15     41      1   0             0  is_stmt prologue_end
0x000000010000b66c     11      0      2   0             0  is_stmt
0x000000010000b680     11     17      2   0             0  is_stmt prologue_end
0x000000010000b6a4     11     17      2   0             0  is_stmt end_sequence
debug_line[0x0000def9]
Line table prologue:
    total_length: 0x0000015a
         version: 4
 prologue_length: 0x000000eb
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "Test"
include_directories[  2] = "Test/NetworkAPM"
include_directories[  3] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/objc"
file_names[  1]:
           name: "AppDelegate.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  2]:
           name: "JMWebResourceURLProtocol.h"
      dir_index: 2
       mod_time: 0x00000000
         length: 0x00000000
file_names[  3]:
           name: "AppDelegate.m"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  4]:
           name: "objc.h"
      dir_index: 3
       mod_time: 0x00000000
         length: 0x00000000
// ......

You can see that .debug_line maps each code address to a line number. The output pasted above includes the AppDelegate section.
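
To make the address-to-line mapping concrete, here is a small C sketch of a lookup over rows like the ones in the table above. Each row covers the addresses from its own address up to the next row's address, and an end_sequence row only marks where a sequence stops. `LineRow` and `find_line` are hypothetical names; a real symbolizer would decode the line program itself and would likely use a binary search rather than this linear scan.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded row of a .debug_line table. */
typedef struct {
    uint64_t address;
    unsigned line;
    unsigned file;     /* index into the table's file_names list */
    int end_sequence;  /* 1 if this row terminates a sequence */
} LineRow;

/* Rows must be sorted by address within a sequence, and each sequence
 * must end with an end_sequence row.
 * Returns the row covering addr, or NULL if no row covers it. */
static const LineRow *find_line(const LineRow *rows, size_t count,
                                uint64_t addr)
{
    for (size_t i = 0; i + 1 < count; i++) {
        if (rows[i].end_sequence) {
            continue; /* a terminator row covers no addresses */
        }
        if (addr >= rows[i].address && addr < rows[i + 1].address) {
            return &rows[i];
        }
    }
    return NULL;
}
```

With the rows pasted above, an address such as 0x10000b5c0 falls between 0x10000b5b4 and 0x10000b5d0 and therefore maps to line 16 of file index 2.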

4.3 Symbols

In linking, we collectively refer to functions and variables as symbols. The function name or variable name is the symbol name (Symbol Name). We can think of symbols as the glue of linking: the whole linking process relies on symbols to complete correctly.

The passage above comes from "Programmer's Self-cultivation". So "symbol" is the general term for functions, variables, and classes.

Symbols can be divided into three categories by type:

  • Global symbols: symbols visible outside the object file; they can be referenced by other object files, or they are defined in other object files
  • Local symbols: symbols visible only inside the object file, that is, functions and variables that only this object file can see
  • Debug symbols: symbols that carry line number information, recording the file and line number that correspond to each function and variable

Symbol table: a table that maps memory addresses to function names, file names, and line numbers. Each defined symbol has a corresponding value, called the symbol value (Symbol Value). For variables and functions, the symbol value is the address. Each entry of the symbol table has the following form:

<Start Address> <End Address> <Function> [<Filename: Line Number>]

4.4 How do I get an address?

When an image is loaded, it is relocated relative to a base address, and the base address differs on every load. The addresses in the stack frames are absolute addresses after relocation; what we need is the relative address before relocation.

Binary Images

Take the crash log of the test project as an example. Part of its Binary Images section is pasted below:

// ...
Binary Images:
0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib
0x103204000 - 0x103267fff dyld arm64  <6f1c86b640a3352a8529bca213946dd5> /usr/lib/dyld
0x189a78000 - 0x189a8efff libsystem_trace.dylib arm64  <b7477df8f6ab3b2b9275ad23c6cc0b75> /usr/lib/system/libsystem_trace.dylib
// ...

You can see that the Binary Images section of the crash log lists, for each loaded image, the start address, end address, image name, architecture, UUID, and image path.
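
As an illustration of how those fields can be pulled out of one Binary Images line, here is a hedged C sketch based on sscanf. `BinaryImage` and `parse_binary_image` are hypothetical names, the buffer sizes are arbitrary, and the sketch assumes the image name contains no spaces, which does not hold for every image.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one line in the Binary Images section. */
typedef struct {
    unsigned long long start;
    unsigned long long end;
    char name[64];
    char arch[16];
    char uuid[33];   /* 32 hex digits plus the terminating NUL */
    char path[256];
} BinaryImage;

/* Parses a line such as
 *   0x102fe0000 - 0x102ff3fff Test arm64  <37ea...9ce5> /var/.../Test
 * Returns 1 if all six fields were read, 0 otherwise. */
static int parse_binary_image(const char *line, BinaryImage *img)
{
    memset(img, 0, sizeof *img);
    /* %llx accepts the optional 0x prefix; %255[^\n] keeps spaces in the path. */
    int n = sscanf(line, "%llx - %llx %63s %15s <%32[0-9a-fA-F]> %255[^\n]",
                   &img->start, &img->end, img->name, img->arch,
                   img->uuid, img->path);
    return n == 6;
}
```

The parsed start address is the value to subtract from a frame address, and the UUID is the value to compare against the .dSYM.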

Information in the crash log

Last Exception Backtrace:
// ...
5   Test                          	0x102fe592c -[ViewController testMonitorCrash] + 22828 (ViewController.mm:58)
Binary Images:
0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test

So the relative address of frame 5 is 0x102fe592c - 0x102fe0000 = 0x592c. The symbol information can then be restored with a command.

To resolve it with atos, pass 0x102fe0000, the load address of the image, to -l, followed by 0x102fe592c, the frame address to restore.

atos -o Test.app.dSYM/Contents/Resources/DWARF/Test -arch arm64 -l 0x102fe0000 0x102fe592c
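
The arithmetic behind this step can be sketched in C as follows. `image_offset` and `file_address` are hypothetical helper names. The `text_vmaddr` parameter is an assumption: the addresses in the .debug_line dump earlier start at 0x10000b588, which suggests a preferred base of 0x100000000 for this binary, but the real value should be read from the Mach-O load commands.

```c
#include <stdint.h>

/* Offset of a frame inside its image: runtime address minus load address. */
static uint64_t image_offset(uint64_t frame_addr, uint64_t load_addr)
{
    return frame_addr - load_addr;
}

/* Address as it would appear in the dSYM, given the image's preferred base
 * address (text_vmaddr, assumed to be known by the caller). */
static uint64_t file_address(uint64_t frame_addr, uint64_t load_addr,
                             uint64_t text_vmaddr)
{
    return image_offset(frame_addr, load_addr) + text_vmaddr;
}
```

For frame 5, the offset is 0x592c, which is 22828 in decimal and matches the "+ 22828" shown in the backtrace line. With an assumed base of 0x100000000, the address to look up in the DWARF data would be 0x10000592c.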

4.5 UUID

  • UUID of crash file

    grep --after-context=2 "Binary Images:" *.crash
    
    Test  5-28-20, 7-47 PM.crash:Binary Images:
    Test  5-28-20, 7-47 PM.crash-0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
    Test  5-28-20, 7-47 PM.crash-0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib
    --
    Test.crash:Binary Images:
    Test.crash-0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
    Test.crash-0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib
    

    The UUID of Test App is 37eaa57df2523d95969e47a9a1d69ce5.

  • UUID of .dSYM file

    dwarfdump --uuid Test.app.dSYM
    

    The result is

    UUID: 37EAA57D-F252-3D95-969E-47A9A1D69CE5 (arm64) Test.app.dSYM/Contents/Resources/DWARF/Test
    
  • UUID of app

    dwarfdump --uuid Test.app/Test
    

    The result is

    UUID: 37EAA57D-F252-3D95-969E-47A9A1D69CE5 (arm64) Test.app/Test
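The crash log prints the UUID as lowercase hex without dashes, while dwarfdump prints it uppercase with dashes, so the two must be normalized before comparing. A minimal sketch, assuming the dwarfdump output format shown above; `parse_dwarfdump_uuid` and `same_build` are hypothetical names.

```python
import re
import uuid

# Assumed output format: "UUID: <uuid> (<arch>) <path>"
_DWARFDUMP_LINE = re.compile(r"^UUID:\s+([0-9A-Fa-f-]{36})\s+\((\S+)\)\s+(.+)$")


def parse_dwarfdump_uuid(line: str):
    """Return (uuid.UUID, arch, path) for one output line, or None."""
    match = _DWARFDUMP_LINE.match(line.strip())
    if match is None:
        return None
    return uuid.UUID(match.group(1)), match.group(2), match.group(3)


def same_build(crash_uuid_hex: str, dwarfdump_line: str) -> bool:
    """True when the crash log UUID and the dwarfdump UUID are the same build."""
    parsed = parse_dwarfdump_uuid(dwarfdump_line)
    # uuid.UUID accepts hex with or without dashes, in either case
    return parsed is not None and parsed[0] == uuid.UUID(crash_uuid_hex)
```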
    

4.6 Symbolization (parsing Crash logs)

The sections above analyzed how to capture the various types of crash. Through technical means, the App can collect crash information on the user's device and report it through some mechanism. But the stack consists of hexadecimal addresses, which cannot be used to locate the problem, so it needs to be symbolized.

The role of the .dSYM file was explained above. Restoring file names, line numbers, and function names by combining the addresses in the stack with the .dSYM file is called symbolization. However, the .dSYM file must strictly correspond to the bundle id and version of the crash log file.

To get the crash log, select the corresponding device in Xcode -> Window -> Devices and Simulators, then find the crash log file by time and App name.

The .app and .dSYM files can be obtained from the archived build products under ~/Library/Developer/Xcode/Archives.

There are generally two ways to symbolize:

  • Using symbolicatecrash

    symbolicatecrash is the crash log analysis tool that comes with Xcode. First locate it by running the following command in the terminal

    find /Applications/Xcode.app -name symbolicatecrash -type f
    

    This returns several paths. Find the line that contains iPhoneSimulator.platform

    /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/Library/PrivateFrameworks/DVTFoundation.framework/symbolicatecrash
    

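    Copy symbolicatecrash to the working folder (the folder where the .app, .dSYM, and .crash files are saved)

    Run the command. Write the output to a different file, because redirecting to Test.crash would truncate the input before it is read

    ./symbolicatecrash Test.crash Test.dSYM > Test_symbolicated.crash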
    

    The first time you run this, you will probably hit the error: Error: "DEVELOPER_DIR" is not defined at ./symbolicatecrash line 69. Solution: run the following command in the terminal

    export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
    
  • Using atos

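    Unlike symbolicatecrash, atos is more flexible: it works as long as the .crash and .dSYM files, or the .crash and .app files, correspond.

    Usage is as follows: -l is followed by the load address of the image, and the command ends with the address to symbolize (the values below are from the crash log above)

    xcrun atos -o Test.app.dSYM/Contents/Resources/DWARF/Test -arch arm64 -l 0x102fe0000 0x102fe592c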
    

    You can also symbolize against the .app binary when no .dSYM file exists, where xxx is the load (segment) address of the image and xx is the address to symbolize

    atos -arch architecture -o binary -l xxx xx
    

Since there may be many Apps, and each App may exist in several versions on users' devices, the crash files collected by APM must each be paired with the matching .dSYM file before they can be symbolized correctly. The pairing rule is that the UUIDs are identical.
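One way to apply this pairing rule in bulk is to group incoming crashes by image UUID, so that each .dSYM is fetched once per batch. The sketch below is an assumption about how such a step might look, not the system described in this article; `plan_symbolication` and its input shapes are hypothetical.

```python
import uuid
from collections import defaultdict


def normalize(raw_uuid: str) -> str:
    """Lowercase hex without dashes, whatever the input style."""
    return uuid.UUID(raw_uuid).hex


def plan_symbolication(crashes, dsym_index):
    """Group crashes by the dSYM that matches their image UUID.

    crashes:    iterable of (crash_id, image_uuid) pairs
    dsym_index: mapping of dSYM UUID -> dSYM location
    Returns ({dsym_location: [crash_id, ...]}, [unmatched crash_id, ...]).
    """
    index = {normalize(key): value for key, value in dsym_index.items()}
    jobs = defaultdict(list)
    unmatched = []
    for crash_id, image_uuid in crashes:
        location = index.get(normalize(image_uuid))
        if location is None:
            unmatched.append(crash_id)  # no dSYM uploaded for this build
        else:
            jobs[location].append(crash_id)
    return dict(jobs), unmatched
```

Crashes that end up in the unmatched list cannot be symbolized until the .dSYM for that build is available.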

4.7 Symbolized Analysis of System Library

Each time we connect a real device to Xcode and run, Xcode asks us to wait. What actually happens is that, to make stack parsing possible, the system symbol library for the device's current OS version is imported automatically into the /Users/<your username>/Library/Developer/Xcode/iOS DeviceSupport directory, which holds a large number of symbol files for the system libraries. You can visit the following directory to see them

/Users/<your username>/Library/Developer/Xcode/iOS DeviceSupport/

5. Server-side Processing

5.1 ELK Logging System

Log monitoring systems in the industry are generally built on ELK. ELK is the abbreviation of Elasticsearch, Logstash, and Kibana. Elasticsearch is a distributed, near real-time search platform accessed through a RESTful interface. Logstash is a central data flow engine that collects data in different formats from different sources (file / data store / MQ), filters it, and outputs it to different destinations (file / MQ / Redis / Elasticsearch / Kafka). Kibana displays Elasticsearch data on friendly pages and provides visual analysis capabilities. So ELK can be used to build an efficient, enterprise-level log analysis system.

In the early era of monolithic applications, almost all of an application's functions ran on one machine. When a problem occurred, operations staff opened a terminal and typed commands to view the system log directly, then located and solved the problem. As systems grew more complex and user numbers grew, a monolith could hardly meet demand, so the architecture evolved: it scales horizontally to support large numbers of users, the monolith is split into multiple applications, each application is deployed as a cluster, and load balancing controls dispatch. If a sub-module fails, do we still log in to a terminal on each server to look through its logs? That approach clearly falls behind, so log management platforms emerged.

Logstash collects and parses the log files of each server, filters them according to defined regular-expression templates, and transfers them to Kafka or Redis. Another Logstash reads the data from Kafka or Redis, stores it in ES, and creates indexes, and Kibana presents it for visual analysis. The collected data can also be analyzed further to support maintenance and decision making.

The diagram above shows an ELK log architecture. Brief notes:

  • Between Logstash and ES there is a Kafka layer. Logstash runs on the servers that produce the data, and filtering the collected data in real time costs time and memory, so Kafka is added as a data buffer because it has excellent read and write performance.
  • In the next step, another Logstash reads data from Kafka, filters and processes it, and transfers the results to ES.
  • This design not only performs well and has low coupling, it is also scalable. For example, n Logstash instances can collect data and transfer it to n Kafka instances, which are then read and filtered by n Logstash instances. There can be m log sources, such as App logs, Tomcat logs, Nginx logs, and so on.
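The buffering idea in the notes above can be imitated in a single process. This is a toy sketch, not real Logstash or Kafka configuration: a `queue.Queue` stands in for Kafka, a list stands in for Elasticsearch, and the "<LEVEL> <message>" line format is a hypothetical one chosen for illustration.

```python
import queue
import re

# Hypothetical log line format: "<LEVEL> <message>"
LINE = re.compile(r"^(?P<level>[A-Z]+)\s+(?P<message>.+)$")


def ship(lines, buffer):
    """Collector stage: push raw lines into the buffer without parsing them."""
    for line in lines:
        buffer.put(line)


def index(buffer, store, keep_levels=("ERROR", "FATAL")):
    """Indexer stage: drain the buffer, filter and structure, then store."""
    while True:
        try:
            line = buffer.get_nowait()
        except queue.Empty:
            break
        match = LINE.match(line)
        if match and match.group("level") in keep_levels:
            store.append(match.groupdict())


buffer = queue.Queue()  # stands in for Kafka
store = []              # stands in for Elasticsearch
ship(["INFO app started",
      "ERROR crash in -[ViewController testMonitorCrash]"], buffer)
index(buffer, store)
```

The collector does no parsing at all, which is the point of the buffer: the expensive filtering happens in a separate stage that can be scaled independently.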

The following image is a screenshot from an "Elastic APM hands-on" talk shared by the Elasticsearch community.

5.2 Server Side

Crash logs are not yet symbolized when they are collected into Kibana, so they must be symbolized to make it easier to locate problems, generate crash reports, and carry out subsequent processing.

So the whole process is: the client APM SDK collects the crash log -> Kafka stores it -> a Mac runs a scheduled task to symbolize it -> the data is written back to Kafka -> the product side (display side) classifies the data, builds reports, raises alarms, and so on.

Because a company has multiple product lines with corresponding Apps, and users run different versions of each App, crash log analysis must use the correct .dSYM file for each one. With so many Apps and versions, automation becomes very important.

There are two options for automation. For a smaller company, or to keep things simple, add a Run Script phase in Xcode that automatically uploads the dSYM in release mode.

Our company has its own system, wax-cli, which manages initialization, dependency management, build (continuous integration, Unit Test, Lint, jump detection), test, packaging, deployment, and dynamic capabilities (hot update, jump routing download) for iOS SDK, iOS App, Android SDK, Android App, Node, React, and React Native projects at the same time. Capabilities can be plugged in at each stage, so after the packaging machine has run, the .dSYM file can be uploaded to Qiniu cloud storage (the rule can be AppName + Version as the key and the .dSYM file as the value).

Many architectures are now designed as microservices. Why microservices were chosen is not covered in this article. Crash log symbolization is therefore implemented as a microservice. The schematic diagram is as follows

Notes:

  • The Symbolization Service, a component of Prism in the overall monitoring system, is a microservice focused on crash report symbolization.

  • It receives requests from mass that contain a pre-processed crash report and a dSYM index, pulls the corresponding dSYM from Qiniu, symbolizes the crash report, calculates a hash, and returns the hash to mass.

  • It receives requests from the Prism management system that contain the original crash report and a dSYM index, pulls the corresponding dSYM from Qiniu, symbolizes the crash report, and returns the symbolized crash report to the Prism management system.

  • Mass is a common framework for data processing (streaming / batch) and task scheduling.

  • Candle is a packaging system. The packaging capability of wax-cli mentioned above actually calls the packaging and build capability of the candle system. A suitable packaging machine is selected according to the characteristics of the project. (The packaging platform maintains multiple packaging tasks and distributes them to different packaging machines by their characteristics. The task details page shows dependency download, compilation, and the running process. The packaged products include the binary package, a download QR code, and so on.)
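The article does not say how the hash returned to mass is calculated. One plausible approach, shown here purely as an assumption, is to hash the top symbolized frames so that crashes with the same stack share one key and can be aggregated; `crash_fingerprint` and the choice of SHA-1 over the first five frames are hypothetical.

```python
import hashlib


def crash_fingerprint(symbolized_frames, top_n=5):
    """Hash the first top_n symbol names so identical stacks share a key.

    symbolized_frames: list of symbol strings, topmost frame first,
    without load-dependent addresses (those differ on every launch).
    """
    key = "\n".join(frame.strip() for frame in symbolized_frames[:top_n])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Hashing symbol names rather than raw addresses matters because the image base address changes on every load, so the same crash would otherwise produce a different hash each time.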

The symbolization service was built by the big front-end team, so it is implemented in Node.js. The iOS symbolization machine is a dual-core Mac mini, so experiments were needed to decide how many worker processes the symbolization service should start. The result: processing crash logs with two processes is nearly twice as efficient as with a single process, while four processes bring no significant improvement over two, which is consistent with the Mac mini having two cores. So two worker processes are started for symbolization.
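The decision rule behind that experiment can be written down as a small function: keep adding workers only while each step still brings a meaningful gain. This is a sketch with hypothetical timings; the actual measurements are not given in the article, and `choose_worker_count` and its 20% threshold are assumptions.

```python
def choose_worker_count(seconds_by_workers, min_gain=0.2):
    """Pick the smallest worker count after which more workers stop paying off.

    seconds_by_workers maps a worker count to the measured time (seconds)
    needed to symbolize the same batch of crash logs.
    """
    counts = sorted(seconds_by_workers)
    best = counts[0]
    for candidate in counts[1:]:
        # Fraction of time saved relative to the current best setting
        gain = 1 - seconds_by_workers[candidate] / seconds_by_workers[best]
        if gain < min_gain:
            break
        best = candidate
    return best


# Hypothetical timings shaped like the result described above
measured = {1: 100.0, 2: 52.0, 4: 49.0}
print(choose_worker_count(measured))  # 2
```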

The following is the complete design

Briefly, the symbolization process uses a master-slave mode: one master machine and multiple slave machines. The master machine reads the cache of .dSYM files and crash results. Mass dispatches work to the symbolization service (which runs two symbolicate workers internally), and the service retrieves the .dSYM files from Qiniu cloud storage at the same time.

The system architecture diagram is as follows

7. Summary of APM

  1. The monitoring capabilities of each platform are usually inconsistent, and so are the technical implementation details. The monitoring capabilities therefore need to be aligned and unified during the technical scheme review. The data fields of each capability on each platform must be aligned (number of fields, names, data types, and precision), because APM itself is a closed loop: after monitoring come symbolization, data collation, and product development, and in the end everything must be displayed on a monitoring dashboard.

  2. For some crashes or ANRs, notify stakeholders by e-mail, SMS, or enterprise messaging tools according to severity level, then quickly release a new version, ship a hot fix, and so on.

  3. Each monitoring capability needs to be configurable, so it can be switched on and off flexibly.

  4. Monitoring data must be written from memory to file, and the write strategy needs attention. Monitoring data also needs to be stored in a database, with attention to database size, design rules, and so on. How to report the data once it is stored, the reporting mechanism, and related topics will be discussed in another article: Create a universal, configurable data reporting SDK

  5. As far as possible, after the technical review, write the technical implementation of each platform into a document and share it with the relevant people. For example, the implementation of ANR monitoring

    /*
    Android side
    
    A stall is generally considered to be anything over 300 ms, adjusted by device rating.
    Hook the system Looper and instrument before and after each message is processed
    to calculate how long each message takes.
    Start another thread to dump the stack, and stop it once processing is done.
    */
    new ExceptionProcessor().init(this, new Runnable() {
        @Override
        public void run() {
            // Monitor stalls
            try {
                ProxyPrinter proxyPrinter = new ProxyPrinter(PerformanceMonitor.this);
                // The main Looper calls the Printer before and after dispatching each message
                Looper.getMainLooper().setMessageLogging(proxyPrinter);
                mWeakPrinter = new WeakReference<ProxyPrinter>(proxyPrinter);
            } catch (FileNotFoundException e) {
                // Stall monitoring stays disabled if the printer cannot be created
            }
        }
    });
    
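    /*
    iOS side
    
    A child thread pings the main thread to check whether it is currently stalled.
    The stall threshold is set to 300 ms; anything beyond that counts as a stall.
    On a stall, capture the main thread's stack, then store and upload it.
    Assumes an NSThread subclass whose semaphore property is a dispatch_semaphore_t.
    */
    - (void)main {
        while (!self.isCancelled) {
            // Assume blocked until the main thread proves otherwise
            self.isMainThreadBlocked = YES;
            dispatch_async(dispatch_get_main_queue(), ^{
                // The main thread handled the ping, so it is not blocked
                self.isMainThreadBlocked = NO;
                dispatch_semaphore_signal(self.semaphore);
            });
            [NSThread sleepForTimeInterval:0.3]; // 300 ms threshold
            if (self.isMainThreadBlocked) {
                [self handleMainThreadBlock];
            }
            // Wait for the ping to be handled before sending the next one
            dispatch_semaphore_wait(self.semaphore, DISPATCH_TIME_FOREVER);
        }
    }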
    
  6. The schematic diagram for the entire APM is as follows

    Notes:

    • The event-tracking SDK is associated with log data through sessionId
    • Wax, described above, is a multi-platform project management model; each wax project carries its basic information

  7. The APM technical solution itself is continuously adjusted and upgraded as technical means and analysis needs evolve. The schematic diagrams above come from an earlier version; the system is being upgraded and restructured on that basis. A few keywords: Hermes, Flink SQL, InfluxDB.


Posted on Fri, 26 Jun 2020 21:14:44 -0400 by veveu