Notes on the undefined instruction problem in KE

Analysis of the undefined instruction problem in KE

1. What is an undefined instruction

1.1 Classification

In the Android operating system, exceptions are classified by the level at which they occur: KE (kernel exception), NE (native exception), JE (Java exception), HWT (hardware watchdog timeout), SWT (software watchdog timeout), ANR (application not responding) and other types. Cases where the system does not respond for a long time are likewise classified by level;
This note mainly records the Undefined Instruction case within KE, which is an exception on the instruction-fetch side.
CPU accesses can be divided into two categories: 1. fetching data; 2. fetching instructions. An undefined instruction is an exception of the second kind;

The CPU hardware detects this exception, and the kernel defines the handler that is reached through the exception vector. When the CPU does not recognize the instruction it is about to execute, it jumps to that handler. At present the handler mainly collects the information on the stack and saves it as debug information. Depending on the system configuration, the system then either hangs or restarts. A restart can happen in two ways: 1. the handler calls the restart interface; 2. the CPU hangs, the watchdog is no longer fed, and an HWT restart is triggered.

[Figure: exception level division of the ARMv8 architecture]

1.2 How instructions work

  1. The concept of an instruction:
    1. CPU instruction sets follow either a RISC (ARM) or a CISC (x86) design, aimed at low power consumption and high performance respectively;
      1. RISC favors simple, frequently used instructions. Infrequently used operations are built by combining simple instructions. RISC therefore performs worse on special operations, but it needs fewer circuit units and consumes less power;
      2. CISC favors complex instructions: every instruction is implemented in hardware, and instructions are more likely to operate on memory directly. It performs well on special instructions, but it needs many circuit units, so the overall power consumption is higher;
    2. Every instruction is implemented as logic circuits built into the CPU according to the instruction set: the instruction (binary code) is translated into electrical signals inside the CPU and executed by those circuits;
  2. What is in the instruction set
    TBD. The instructions are not listed one by one here; the documents can be downloaded from the official Arm website;
  3. How the exception is triggered:
    1. The instruction set specifies the range of valid encodings. For a value outside that range, the CPU has no corresponding logic circuit and cannot execute it, so the CPU hardware judges it to be an undefined instruction; [HW]
    2. The CPU defines 7 kinds of exceptions. When one of them is triggered, execution enters the handler stored in the exception vector table; [assembly]
    3. Through the exception vector table, this kind of problem reaches do_undefinstr and then follows the software handling flow in the kernel; [kernel]
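
The hardware check in step 1 can be pictured with a deliberately simplified model. The sketch below classifies a 32-bit A64 instruction word by bits [28:25] only, following the top-level encoding groups of the Arm architecture manual. The function name top_level_group is hypothetical, and real decoding goes many levels deeper, so a word that lands in an allocated group can still turn out to be undefined.

```python
def top_level_group(word: int) -> str:
    """Classify a 32-bit A64 instruction word by bits [28:25] only.

    Simplified illustration: only the first decode level is modelled.
    """
    op0 = (word >> 25) & 0xF
    if op0 in (0b0001, 0b0011):
        return "unallocated"                    # no instruction lives here
    if op0 == 0b0000:
        return "reserved"                       # UDF (0x0000xxxx) is in this group
    if op0 == 0b0010:
        return "SVE"
    if op0 & 0b1110 == 0b1000:
        return "data processing (immediate)"    # 100x
    if op0 & 0b1110 == 0b1010:
        return "branch / exception / system"    # 101x
    if op0 & 0b0101 == 0b0100:
        return "load / store"                   # x1x0
    if op0 & 0b0111 == 0b0101:
        return "data processing (register)"     # x101
    return "SIMD / floating point"              # x111


# The word found in vmlinux in the case study further down.
print(top_level_group(0xA91667FA))
# An all-zero word, for example from a wiped memory region.
print(top_level_group(0x00000000))
```

A word in the "unallocated" or "reserved" groups has no logic circuit behind it, which is the situation the [HW] step describes.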

2. Possible causes, from an analysis of the fetch process

The instructions are the code segment of the image stored in eMMC. They are loaded into memory when the system starts and fetched into the CPU when the program runs;

  1. The data in eMMC is damaged, so the data read back is wrong. In this case the problem reproduces every time, and most likely the system cannot boot at all;
    • Dump the eMMC data and compare it with the flashed image;
    • Write the dumped eMMC data to another device with dd and run the same test there;
    • It can be recovered by upgrading;
  2. The PC pointer has run away, so fetching the instruction from memory goes wrong:
    • The PC has run into a data segment, and the fetched content cannot be decoded as an instruction;
    • The PC is still inside the code segment, but the end of the code segment holds data, and the fetched content cannot be decoded as an instruction;
      Generally, computing the position of the PC confirms whether it lies in the code segment. Whether it has run away is harder to tell, so the contents need to be compared one by one;
  3. The PC pointer is normal, but the value fetched differs from the value in memory. A cache problem is the usual suspect, generally the Icache;
    • If the SoC Icache is damaged, which shows up as bit flips, various crashes occur at random, and all of them are undefined instructions;
    • Environmental factors cause a bit flip while the hardware fetches the data, with low probability;
      From the process point of view these are essentially all the cases. In general, such exceptions are rare compared with data aborts;
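
The "is the PC in the code segment" check from case 2 can be sketched as a range test against the kernel text boundaries. The example below parses nm-style output for the `_stext` and `_etext` symbols; the two addresses in SAMPLE_NM are hypothetical placeholders, not values from a real build, and the helper names are made up for illustration.

```python
# Hypothetical `nm vmlinux` excerpt: "<address> <type> <name>" per line.
SAMPLE_NM = """\
ffffff8008080000 T _stext
ffffff8008c00000 T _etext
"""


def parse_nm(text: str) -> dict:
    """Map symbol name -> address from nm-style output."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _sym_type, name = parts
            symbols[name] = int(addr, 16)
    return symbols


def pc_in_kernel_text(pc: int, symbols: dict) -> bool:
    """True if pc lies in [_stext, _etext), i.e. inside the code segment."""
    return symbols["_stext"] <= pc < symbols["_etext"]


symbols = parse_nm(SAMPLE_NM)
print(pc_in_kernel_text(0xFFFFFF800823BE38, symbols))
```

A PC inside this range only rules out the first bullet of case 2. Data embedded in the code segment still has to be excluded by inspecting the contents at that address.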

3. Analysis methods and tools

  1. vmlinux is the symbol file; it is used to check whether the data on the device is damaged;
  2. readelf, addr2line, objdump and similar tools map register addresses to symbols and code;
  3. trace32 and gdb are mainly used to reconstruct the crash scene and compare it with vmlinux;
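
As a rough illustration of how the binutils tools from item 2 are typically invoked, the sketch below only builds the command lines and prints them. The cross-toolchain prefix `aarch64-linux-gnu-`, the file name `vmlinux`, the 16-byte disassembly window and the helper names are all assumptions; adjust them to the toolchain actually used for the build.

```python
import shlex

PREFIX = "aarch64-linux-gnu-"   # assumed cross-toolchain prefix


def addr2line_cmd(vmlinux: str, addr: int) -> list:
    """Resolve an address to function name and source line."""
    return [PREFIX + "addr2line", "-f", "-e", vmlinux, hex(addr)]


def objdump_cmd(vmlinux: str, addr: int, span: int = 0x10) -> list:
    """Disassemble a small window starting at addr."""
    return [
        PREFIX + "objdump", "-d",
        "--start-address=" + hex(addr),
        "--stop-address=" + hex(addr + span),
        vmlinux,
    ]


def readelf_sections_cmd(vmlinux: str) -> list:
    """List section headers, to see where the code and data sections lie."""
    return [PREFIX + "readelf", "-S", vmlinux]


addr = 0xFFFFFF800823BE38   # link-time address from the case study below
for cmd in (addr2line_cmd("vmlinux", addr),
            objdump_cmd("vmlinux", addr),
            readelf_sections_cmd("vmlinux")):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where the toolchain and the matching vmlinux are available.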

3.1 Case analysis (atypical)

3.1.1 Problem

During the customer's stress test, an abnormal restart occurred occasionally. After the restart, the exception handling mechanism was found to have captured one KE.

3.1.2 Log

<0>[ 7232.602738]-(0)[349:logd.writer]Internal error: undefined instruction: 0 [#1] PREEMPT SMP
<6>[ 7232.603835]-(0)[349:logd.writer]disable aee kernel api
<5>[ 7232.604413]-(0)[349:logd.writer]Kernel Offset: 0x13c1c00000 from 0xffffff8008000000
<4>[ 7232.605590]-(0)[349:logd.writer]Modules linked in: usb_f_ean ffffffa665e29000 (null) 24576 0 wlan_drv_gen4m ffffffa665e39000 (null) 1929216 0 (O) wmt_chrdev_wifi ffffffa665e31000 (null) 28672 0 (O) gps_drv ffffffa665de1000 (null) 61440 0 (O) fmradio_drv ffffffa665dfd000 (null) 176128 0 (O) bt_drv ffffffa665df2000 (null) 32768 0 (O) wmt_drv ffffffa665cbd000 (null) 1191936 0 (O) fpsgo ffffffa665cb5000 (null) 16384 0 (O)
<5>[ 7233.612604]-(0)[349:logd.writer]Non-crashing CPUs did not react to IPI
<4>[ 7233.613485]-(0)[349:logd.writer]CPU: 0 PID: 349 Comm: logd.writer Tainted: G W O 4.9.117 #1
<4>[ 7233.614692]-(0)[349:logd.writer]Hardware name: AC8257V/WAB (DT)
<4>[ 7233.615478]-(0)[349:logd.writer]task: ffffffd2b962b000 task.stack: ffffffd2b9648000
<4>[ 7233.616492]-(0)[349:logd.writer]PC is at core_sys_select+0x8/0x3b4
<4>[ 7233.617310]-(0)[349:logd.writer]LR is at SyS_pselect6+0x214/0x418
<4>[ 7233.618115]-(0)[349:logd.writer]pc : [] lr : [] pstate: 80400145
<4>[ 7233.619297]-(0)[349:logd.writer]sp : ffffffd2b964bc70
<4>[ 7233.619972]-(0)[349:logd.writer]x29: ffffffd2b964beb0 x28: 0000000000000000
<4>[ 7233.620645]-(0)[349:logd.writer]x27: ffffff93caa22000 x26: 0000000000000048
<4>[ 7233.621317]-(0)[349:logd.writer]x25: 0000000000000000 x24: 0000000000000000
<4>[ 7233.621989]-(0)[349:logd.writer]x23: 0000000000000014 x22: 0000007cc1fff3a8
<4>[ 7233.622661]-(0)[349:logd.writer]x21: 0000000000000000 x20: 0000000000000000
<4>[ 7233.623333]-(0)[349:logd.writer]x19: 0000000000000000 x18: 0000007cc1ffd4da
<4>[ 7233.624005]-(0)[349:logd.writer]x17: 0000007cc29eb868 x16: ffffff93c9e3c388
<4>[ 7233.624677]-(0)[349:logd.writer]x15: 0000000000000000 x14: 0000007cc1600000
<4>[ 7233.625350]-(0)[349:logd.writer]x13: 00000000000c8a7f x12: 0000007cc24af270
<4>[ 7233.626022]-(0)[349:logd.writer]x11: 0000000000000000 x10: 0000000000000001
<4>[ 7233.626694]-(0)[349:logd.writer]x9 : 0000000000000002 x8 : 0000000000040975
<4>[ 7233.627365]-(0)[349:logd.writer]x7 : 0000ffffffffffff x6 : 0000000000000000
<4>[ 7233.628036]-(0)[349:logd.writer]x5 : 0000000000000000 x4 : 0000000000000000
<4>[ 7233.628709]-(0)[349:logd.writer]x3 : 0000000000000000 x2 : 0000000000000000
<4>[ 7233.629381]-(0)[349:logd.writer]x1 : 0000007cc1fff3a8 x0 : 0000000000000014
<4>[ 7233.630055]-(0)[349:logd.writer]
<4>[ 7233.630055]PC: 0xffffff93c9e3bdb8:

  1. Internal error: undefined instruction: 0 [#1] PREEMPT SMP shows that this is an undefined instruction exception
  2. Several key registers: PC: [] LR: []
    • The runtime address of the PC is ffffff93c9e3be38; subtracting the kernel offset 0x13c1c00000 gives the corresponding address in vmlinux, ffffff800823be38
    • Looking up that address in vmlinux gives the instruction word a91667fa
      [Figure: the code at this address in vmlinux]

      Note: 1. The PC above lies inside the code segment; 2. The data at this location has not been damaged;
      It is therefore suspected that the instruction was corrupted on its way from DRAM to the CPU, that is, a possible bit flip in the Icache;
      Because the dump contains no Icache content, this cannot be confirmed directly;

Because the problem is sporadic and has occurred only once so far, it remains unresolved; the working suspicion is that environmental factors disturbed the instruction fetch.
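
The address arithmetic used in this case can be reproduced with a few lines. The sketch below reads the Kernel Offset field from the log line quoted above and subtracts it from the runtime PC to get the address to look up in vmlinux. The helper names are hypothetical, and the runtime PC is entered by hand because the pc/lr values are missing from the log as captured here.

```python
import re

# Log line copied from the KE log above.
LOG_LINE = ("<5>[ 7232.604413]-(0)[349:logd.writer]"
            "Kernel Offset: 0x13c1c00000 from 0xffffff8008000000")


def parse_kernel_offset(line: str):
    """Return (offset, link_base) from a 'Kernel Offset' log line."""
    m = re.search(r"Kernel Offset: (0x[0-9a-fA-F]+) from (0x[0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError("no Kernel Offset field in line")
    return int(m.group(1), 16), int(m.group(2), 16)


def to_vmlinux_address(runtime_addr: int, offset: int) -> int:
    """Undo the load offset to get the link-time address used in vmlinux."""
    return runtime_addr - offset


offset, link_base = parse_kernel_offset(LOG_LINE)
runtime_pc = 0xFFFFFF93C9E3BE38          # runtime PC of the crash (entered by hand)
link_pc = to_vmlinux_address(runtime_pc, offset)
print(hex(link_pc))                      # 0xffffff800823be38
```

The result is the address that is then searched for in vmlinux, where the instruction word a91667fa is found.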

Posted on Thu, 04 Jun 2020 13:01:37 -0400 by Alelinux