The 3rd apk is hung up 11s while playback.
The direct appearance is ANR: Input event dispatching timed out sending. This ANR is caused by mediaplayer process.
Firstly, we use top
to watch CPU loading, but find that CPU loading is not high when playback.
Secondly, check /proc/interrupts
, we cannot find any interrupt abnormal.
This is a customer on-site issue, we only digg the log. We add some debug logs, and find that sometimes the interval printing two logs is 11s,it’s weird.
We doubt that system scheduling maybe occur anomaly. So capture ftrace data.
start ftrace
12345678mount -t debugfs nodev /sys/kernel/debugecho nop > /sys/kernel/debug/tracing/current_tracerecho 0 > /sys/kernel/debug/tracing/tracing_onecho > /sys/kernel/debug/tracing/traceecho "sched_switch sched_wakeup sched_wakeup_new sched_migrate_task irq timer" > /sys/kernel/debug/tracing/set_eventecho "workqueue_execute_start workqueue_execute_end block_rq_issue block_rq_insert block_rq_complete" >> /sys/kernel/debug/tracing/set_eventecho 20480 > /sys/kernel/debug/tracing/buffer_size_kbecho 1 > /sys/kernel/debug/tracing/tracing_onstop ftrace (stop ftrace immediately once be reproduced)
123echo 0 > /sys/kernel/debug/tracing/tracing_oncat /sys/kernel/debug/tracing/trace > /data/ftrace.logecho 0 > /sys/kernel/debug/tracing/trace
Notes: How to make ANR time printed by logcat
to match Unix Timestamp. There is a very good tool: http://rimzy.net/tools/php_timestamp_converter.php
Go through digging ftrace data, we found a doubtful point: process X hasn’t scheduled about 11s.
- use
renice
andtasklet
to improve process priority and bind process X to specific CPU, it’s no improvement. - We found that process X
sched_wakeup
from idle process, it means process X is waked up by interrupt. The latest interrupt is uart interrupt.
So we disable uart log or change uart baud rate to 921600, the ANR disappeared.
At last, we review n_tty_write() function.
Write data in while loop. When user space process uses blocking write method, if write fail, it will yield cpu and wake up untill the condition is met. In this ANR case, the root cause is that user space process instant log data is huge and block in n_tty_write().