Android ANR Log Analysis Guide

When your project becomes more and more complex, or your users reach a certain order of magnitude, your code accidentally has small problems, and you will receive all kinds of bug s, among which you will not be unfamiliar with the problem of ANR. This paper will explain in detail the types of anr, the reasons for its emergence, the detailed analysis of anr cases and classic cases.

definition

ANR(Application Not Responding)   The application is not responding. If your application is blocked in the UI thread for too long, anr will appear. Usually anr will appear, and a prompt box will pop up to let the user know whether to continue to wait or close the program.

ANR type

There are generally the following types of ANR:
1:KeyDispatchTimeout (common)
The input event was not processed within 5S, and ANR occurred.
logcat log keyword: Input event dispatching timed out

2:BroadcastTimeout
The foreground Broadcast: onReceiver does not complete processing within 10S, and an ANR occurs.
Background Broadcast: onReceiver does not complete processing within 60s, and ANR occurs.
logcat log keyword: Timeout of broadcast BroadcastRecord

3:ServiceTimeout
Foreground Service: ANR occurs when the life cycle of onCreate, onStart and onBind is not processed within 20s.
Background Service: ANR occurs when the life cycle of onCreate, onStart and onBind is not processed within 200s
logcat log keyword: Timeout executing service

4: ContentProviderTimeout
The ContentProvider did not complete processing within 10S, and an ANR occurred. logcat log keyword: timeout publishing content providers

Causes of ANR

1: The main thread frequently performs time-consuming IO operations, such as database reading and writing
2: Deadlock of multithreading operation, and the main thread is block ed;
3: The main thread is blocked by the Binder peer block;
4: ANR in WatchDog in system server;
5:service binder is online and cannot communicate with System Server
6: System resources are exhausted (pipeline, CPU, IO)

ANR case analysis process

We will analyze ANR step by step, which makes us better understand how to find, analyze and solve problems.

1, View events_log

View events under the mobilelog folder_ Log, search for the keyword from the log: am_anr, find the time point where ANR occurs, process PID and ANR type.

Such as log:

07-20 15:36:36.472  1000  1520  1597 I am_anr  : [0,1480,com.xxxx.moblie,952680005,Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]

From the above log, we can see that the application is com.xxxx.moblie   At 07-20 15:36:36.472, an ANR of KeyDispatchTimeout type occurred, and its process number was 1480. Sort out the key information:
ANR time: 07-20 15:36:36.472
Process pid: 1480
Process name: com.xxxx.moblie
ANR type: KeyDispatchTimeout

We already know that the ANR of KeyDispatchTimeout occurs because   The input event did not complete processing within 5 seconds. So at this time 07-20 15:36:36.472   What did the program do in the first five seconds of the program, that is, around the time period (15:36:30 ~ 15:36:31)? This is simple, because we already know the pid, and then search the logs with pid = 1480. These logs represent the track of the process. The key logs are as follows:

07-20 15:36:29.749 10102  1480  1737 D moblie-Application: [Thread:17329] receive an intent from server, action=com.ttt.push.RECEIVE_MESSAGE
07-20 15:36:30.136 10102  1480  1737 D moblie-Application: receiving an empty message, drop
07-20 15:36:35.791 10102  1480  1766 I Adreno  : QUALCOMM build                   : 9c9b012, I92eb381bc9
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Build Date                       : 12/31/17
07-20 15:36:35.791 10102  1480  1766 I Adreno  : OpenGL ES Shader Compiler Version: EV031.22.00.01
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Local Branch                     : 
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Remote Branch                    : refs/tags/AU_LINUX_ANDROID_LA.UM.6.4.R1.08.00.00.309.049
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Remote Branch                    : NONE
07-20 15:36:35.791 10102  1480  1766 I Adreno  : Reconstruct Branch               : NOTHING
07-20 15:36:35.826 10102  1480  1766 I vndksupport: sphal namespace is not configured for this process. Loading /vendor/lib64/hw/gralloc.msm8998.so from the current namespace instead.
07-20 15:36:36.682 10102  1480  1480 W ViewRootImpl[MainActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_PERIOD, scanCode=0, metaState=0, flags=0x28, repeatCount=0, eventTime=16099429, downTime=16099429, deviceId=-1, source=0x101 }

As we can see from the above, the program received an action message at time 07-20 15:36:29.749.

07-20 15:36:29.749 10102  1480  1737 D moblie-Application: [Thread:17329] receive an intent from server, action=com.ttt.push.RECEIVE_MESSAGE. 

It turned out to be app com.xxxx.moblie   A push message (com.ttt.push.RECEIVE_MESSAGE) was received, which led to blocking. Let's connect the information we have obtained so far: at time 07-20 15:36:29.749   App com.xxxx.moblie   Received the push message action=com.ttt.push.RECEIVE_MESSAGE is blocked, and ANR of KeyDispatchTimeout occurs 5 seconds later.

Although I know how to start, the specific reason has not been found. Was it because the CPU was very nervous and various apps seized resources again? Let's look at the CPU information,. Search keywords:   ANR IN

07-20 15:36:58.711  1000  1520  1597 E ActivityManager: ANR in com.xxxx.moblie (com.xxxx.moblie/.ui.MainActivity) (Process name)
07-20 15:36:58.711  1000  1520  1597 E ActivityManager: PID: 1480 (process pid)
07-20 15:36:58.711  1000  1520  1597 E ActivityManager: Reason: Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
07-20 15:36:58.711  1000  1520  1597 E ActivityManager: Load: 0.0 / 0.0 / 0.0 (Load Indicates 1 minute,5 minute,15 minute CPU Load of)
07-20 15:36:58.711  1000  1520  1597 E ActivityManager: CPU usage from 20ms to 20286ms later (2018-07-20 15:36:36.170 to 2018-07-20 15:36:56.436):
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   42% 6774/pressure: 41% user + 1.4% kernel / faults: 168 minor
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   34% 142/kswapd0: 0% user + 34% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   31% 1520/system_server: 13% user + 18% kernel / faults: 58724 minor 1585 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   13% 29901/com.ss.android.article.news: 7.7% user + 6% kernel / faults: 56007 minor 2446 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   13% 32638/com.android.quicksearchbox: 9.4% user + 3.8% kernel / faults: 48999 minor 1540 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   11% (CPU Utilization rate of)1480/com.xxxx.moblie: 5.2%(Usage of user status) user + (Kernel usage) 6.3% kernel / faults: 76401 minor 2422 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   8.2% 21000/kworker/u16:12: 0% user + 8.2% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   0.8% 724/mtd: 0% user + 0.8% kernel / faults: 1561 minor 9 major
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   8% 29704/kworker/u16:8: 0% user + 8% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   7.9% 24391/kworker/u16:18: 0% user + 7.9% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   7.1% 30656/kworker/u16:14: 0% user + 7.1% kernel
07-20 15:36:58.711  1000  1520  1597 E ActivityManager:   7.1% 9998/kworker/u16:4: 0% user + 7.1% kernel

I have marked the relevant meaning in the log. com.xxxx.moblie   It takes up 11% of the CPU, which is not much. Today's mobile phones are basically multi-core CPUs. If your CPU is 4 cores, the upper limit is 400%, and so on.

Since it is not the CPU load, what is the reason? At this time, it depends on our ultimate killer - traces.txt.

2, traces.txt log analysis

When the APP does not respond, the response is slow, or the WatchDog monitoring does not receive a response, the system will dump a traces.txt file and store it in the file directory: / data/anr/traces.txt. Through the traces file, we can get the thread name, stack information, current thread status, binder call and other information.
Get the file through the adb command: adb pull /data/anr/traces.txt
trace: Cmd line:com.xxxx.moblie

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x73bcc7d0 self=0x7f20814c00
  | sysTid=20176 nice=-10 cgrp=default sched=0/0 handle=0x7f251349b0
  | state=R schedstat=( 0 0 0 ) utm=12 stm=3 core=5 HZ=100
  | stack=0x7fdb75e000-0x7fdb760000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  // java stack call information, you can view the call relationship and locate the specific location
  at ttt.push.InterceptorProxy.addMiuiApplication(InterceptorProxy.java:77)
  at ttt.push.InterceptorProxy.create(InterceptorProxy.java:59)
  at android.app.Activity.onCreate(Activity.java:1041)
  at miui.app.Activity.onCreate(SourceFile:47)
  at com.xxxx.moblie.ui.b.onCreate(SourceFile:172)
  at com.xxxx.moblie.ui.MainActivity.onCreate(SourceFile:68)
  at android.app.Activity.performCreate(Activity.java:7050)
  at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1214)
  at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2807)
  at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2929)
  at android.app.ActivityThread.-wrap11(ActivityThread.java:-1)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1618)
  at android.os.Handler.dispatchMessage(Handler.java:105)
  at android.os.Looper.loop(Looper.java:171)
  at android.app.ActivityThread.main(ActivityThread.java:6699)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)

I'll analyze some fields in traces.txt in detail to see what information it can provide us
Main: Main identifies the main thread. If it is a thread, it is named in the format of "Thread-X". X represents the thread id and increases gradually.
prio: thread priority. The default value is 5
tid: tid is not the ID of the thread, but the unique ID of the thread
Group: is the thread group name
sCount: the number of times the thread was suspended
dsCount: is the number of times the thread was suspended by the debugger
obj: object address
self: the Native address of the thread
sysTid: is the thread number (the thread number of the main thread is the same as the process number)
nice: is the scheduling priority of the thread
sched: indicates the scheduling policy and priority of threads respectively
cgrp: scheduling home group
handle: the address of the thread processing function.
State: is the scheduling state
Schedstat: from  / Read out proc/[pid]/task/[tid]/schedstat. The three values respectively represent the execution time of the thread on the cpu, the waiting time of the thread and the length of the time slice of thread execution. The three values that do not support this information are 0;
utm: is the time value used in the thread user mode (unit: jiffies)
stm: is the scheduling time value in kernel mode
Core: is the serial number of the cpu core that last executed this thread.

We are most concerned about the stack information of Java, which can locate the specific location. From the above traces, we can judge ttt.push.InterceptorProxy.addMiuiApplicationInterceptorProxy.java:77   This leads to ANR in com.xxxx.moblie. At this time, you can check the source code, find the problem and solve it.

Summarize the analysis process: first, we search am_anr to find the time point, process PID and anr type of anr, and then search PID to find the log about the first 5 seconds. Filter ANR IN to view CPU information, then view traces.txt, find java stack information, locate code location, and finally view the source code to analyze and solve problems. This process can basically find the reason for ANR sequence of events.

ANR case sorting

1, The main thread is lock ed by other threads, resulting in deadlock

waiting on <0x1cd570> (a android.os.MessageQueue)

DALVIK THREADS:
"main" prio=5 tid=3 TIMED_WAIT
  | group="main" sCount=1 dsCount=0 s=0 obj=0x400143a8
  | sysTid=691 nice=0 sched=0/0 handle=-1091117924
  at java.lang.Object.wait(Native Method)
  - waiting on <0x1cd570> (a android.os.MessageQueue)
  at java.lang.Object.wait(Object.java:195)
  at android.os.MessageQueue.next(MessageQueue.java:144)
  at android.os.Looper.loop(Looper.java:110)
  at android.app.ActivityThread.main(ActivityThread.java:3742)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:515)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:739)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:497)
  at dalvik.system.NativeStart.main(Native Method)

"Binder Thread #3" prio=5 tid=15 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x434e7758
  | sysTid=734 nice=0 sched=0/0 handle=1733632
  at dalvik.system.NativeStart.run(Native Method)

"Binder Thread #2" prio=5 tid=13 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x1cd570
  | sysTid=696 nice=0 sched=0/0 handle=1369840
  at dalvik.system.NativeStart.run(Native Method)

"Binder Thread #1" prio=5 tid=11 NATIVE
  | group="main" sCount=1 dsCount=0 s=0 obj=0x433aca10
  | sysTid=695 nice=0 sched=0/0 handle=1367448
  at dalvik.system.NativeStart.run(Native Method)

----- end 691 -----

2, The main thread does time-consuming operations, such as database reading and writing.

"main" prio=5 tid=1 Native
held mutexes=
kernel: (couldn't read /proc/self/task/11003/stack)
native: #00 pc 000492a4 /system/lib/libc.so (nanosleep+12)
native: #01 pc 0002dc21 /system/lib/libc.so (usleep+52)
native: #02 pc 00009cab /system/lib/libsqlite.so (???)
native: #03 pc 00011119 /system/lib/libsqlite.so (???)
native: #04 pc 00016455 /system/lib/libsqlite.so (???)
native: #16 pc 0000fa29 /system/lib/libsqlite.so (???)
native: #17 pc 0000fad7 /system/lib/libsqlite.so (sqlite3_prepare16_v2+14)
native: #18 pc 0007f671 /system/lib/libandroid_runtime.so (???)
native: #19 pc 002b4721 /system/framework/arm/boot-framework.oat (Java_android_database_sqlite_SQLiteConnection_nativePrepareStatement__JLjava_lang_String_2+116)
at android.database.sqlite.SQLiteConnection.setWalModeFromConfiguration(SQLiteConnection.java:294)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:215)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:193)
at android.database.sqlite.SQLiteConnectionPool.openConnectionLocked(SQLiteConnectionPool.java:463)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:185)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:177)
at android.database.sqlite.SQLiteDatabase.openInner(SQLiteDatabase.java:808)
locked <0x0db193bf> (a java.lang.Object)
at android.database.sqlite.SQLiteDatabase.open(SQLiteDatabase.java:793)
at android.database.sqlite.SQLiteDatabase.openDatabase(SQLiteDatabase.java:696)
at android.app.ContextImpl.openOrCreateDatabase(ContextImpl.java:690)
at android.content.ContextWrapper.openOrCreateDatabase(ContextWrapper.java:299)
at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:223)
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:163)
locked <0x045a4a8c> (a com.xxxx.video.common.data.DataBaseHelper)
at com.xxxx.video.common.data.DataBaseORM.<init>(DataBaseORM.java:46)
at com.xxxx.video.common.data.DataBaseORM.getInstance(DataBaseORM.java:53)
locked <0x017095d5> (a java.lang.Class<com.xxxx.video.common.data.DataBaseORM>)

3, Too much binder data

07-21 04:43:21.573  1000  1488 12756 E Binder  : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 388568 (data: 1, 32, 7274595)
07-21 04:43:21.573  1000  1488 12756 E Binder  : android.util.Log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 388568 (data: 1, 32, 7274595)
07-21 04:43:21.607  1000  1488  2951 E Binder  : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 211848 (data: 1, 23, 7274595)
07-21 04:43:21.607  1000  1488  2951 E Binder  : android.util.Log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 211848 (data: 1, 23, 7274595)
07-21 04:43:21.662  1000  1488  6258 E Binder  : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 259300 (data: 1, 33, 7274595)

4, binder communication failed

07-21 06:04:35.580 <6>[32837.690321] binder: 1698:2362 transaction failed 29189/-3, size 100-0 line 3042
07-21 06:04:35.594 <6>[32837.704042] binder: 1765:4071 transaction failed 29189/-3, size 76-0 line 3042
07-21 06:04:35.899 <6>[32838.009132] binder: 1765:4067 transaction failed 29189/-3, size 224-8 line 3042
07-21 06:04:36.018 <6>[32838.128903] binder: 1765:2397 transaction failed 29189/-22, size 348-0 line 2916

Posted on Fri, 08 Oct 2021 00:43:18 -0400 by oolongdavies