docs/speed/debug-janks.md
You are noticing janks when you are scrolling a page, or during an animation. You want to investigate the jank. This document gives a quick overview of how you can do this.
[TOC]
The best way to debug a janky scroll/animation is to collect a trace during the event. There are detailed instructions for doing this here, but briefly, to collect a trace:
This document does not explain all the ways of navigating the trace-viewer, but some quick tips:
We have instrumented the code to record FrameSequenceTracker trace-events for
scrolling, or animations. There can be many FrameSequenceTracker events, but
each one carries the name of the sequence (e.g. TouchScroll, WheelScroll,
MainThreadAnimation, CompositorAnimation, RAF etc.)
In the trace you recorded, you will need to find the FrameSequenceTracker
corresponding to the janky scroll/animation. Each such event reports the number
of frames expected and the number of frames actually produced that became
visible.
From these numbers, you can determine how many frames were dropped during the sequence. For example, in the above screenshot, the scroll lasted for almost 1 second (59 frames), but only 54 frames were produced. This means there were 5 dropped frames during this scroll.
The next step is to figure out when these frames were dropped during the
sequence. To find this, look for the FramePresented sub-events for the
FrameSequenceTracker event.
In the above screenshot, frames were presented at A, B, and C. Within this
duration, it appears a frame was presented every vsync. So there were no dropped
frames here. Contrast this with the following screenshot, where frames were
presented at A, B, C, D, E, F etc. The duration between D and E
is roughly double the duration between other FramePresented events as a result
of a dropped frame.
Now would be a good time to turn on flow events for viz. You can track the
flow events to see what may have caused the frame to be dropped. For example,
see the following screenshot:
A begin-frame is received at ~5828ms, and if we follow the flow-arrows, we see that a corresponding compositor-frame is submitted at ~5837ms. However, for the next begin-frame received at ~5842ms, we see that there is no outgoing flow from there. Looking more closely at these trace-events:
We see that the frame was deliberately skipped over to recover latency (notice
the SkipBeginImplFrameToReduceLatency event). At this point, we would want to
look at what else is happening around that time. You can use the
timing-selection in the trace-viewer to help with this:
The interesting things to look for are usually the Compositor thread and
CrRendererMain threads under the same process, or the CrGpuMain thread and
VizCompositorThread threads under the GPU Process. The
Graphics.Pipeline.DrawAndSwap trace-events (in the GPU process) usually give a
good idea of what is happening in the GPU process. Similarly, the
PipelineReporter events in the renderer process also give a good overview of
how each begin-frame message is handled.
Let’s look at some examples of janks:
In the above screenshot: the FramePresented event is ~62ms long, i.e.
there were 3 dropped frames here. Looking closely, we can see that the
compositor thread receives all the begin-frame messages, including for the
dropped frames. The main-thread is blocked running JavaScript (likely a
timer-callback) and can't handle the begin-frames before the frame deadlines. We
can assume there are no pending updates for the compositor-thread to handle
(i.e. there are no compositor-driven animations/scrolls in-progress at the
moment), so the compositor must wait on the main-thread before it is able to
submit a new frame. Following is the same screenshot with some annotations:
The begin-frames are received in A, B, C. The compositor is unable to
produce any updates for A and B because the main-thread is blocked running
some javascript code. Once the JS processing ends, the begin-frame received in
C is handled in the main-thread at D. After the main-thread is done handling
the begin-frame (note that it is called BeginMainFrame in the code to denote
that it is for the main-thread), the corresponding update is dispatched to the
display-compositor in E. However, the submitted frame misses the deadline in
the display compositor (not seen in this screenshot), and is therefore not
presented. A fourth begin-frame is received (F), and a frame is submitted in
response to that (G), which is then finally presented. Thus, there are three
dropped frames (no frames produced for A and B, and updated frame for C is
delayed and presented to the user with F) caused by the long-runner javascript
in the main-thread.
In the above screenshot (from a different trace), you can see a begin-frame sent
from the display-compositor (VizCompositorThread) to the renderer in A, the
renderer receives the begin-frame in B, and submits a CompositorFrame in C,
received back in the display compositor in D. The display-compositor starts
drawing the frame in E, and the gpu-main thread does a swap at F. For the next
frame, the display-compositor sends a begin-frame at G, and receives the
compositor-frame from the renderer in H, and draws the frame in I. However, the
gpu-main thread is blocked doing other work (from the renderer, in this case,
notice the FullImage trace-event), and so the swap-request from the
display-compositor gets delayed (notice the long WaitForSwap trace-event), and
finally happens at J.
Still have more questions? Please feel free to send questions to [email protected].