Wednesday, February 16, 2011

High-Level Fennec Profiling

I've been working on a series of patches to let us do high-level profiling on Fennec (and Firefox). The goal is to get a "big picture" view of what processing happens in Fennec, so we can investigate what should be optimized. More specifically, the idea is to see what events, runnables, and IPC messages are run, and how much time is spent on each.

Here is some example data, from ~9 seconds of panning and zooming on the Mozilla crash stats page, in the parent process. Notes:
  • Event 10 is a mouse event, and 37 is a simple gesture. So these are events that are triggered by the panning and zooming actions. A lot of these events happen, and they can take up to 1/20th of a second to process.
  • I am not sure why the RefreshDriver is called here (since this is the parent process). Perhaps worth looking into.
  • Layers Update IPC messages are working very well, with a mean processing time of 0.0004 seconds.
And here is some data from loading google.com (nonmobile). Notes:
  • AsyncAssociateIconToPage and InsertVisitedURIs take a while, but they are at least on a side thread.
  • RecvGetValue is web storage. We are considering what to do with that in bug 627635.
  • nsGlobalWindow.cpp:8834 is likely a setTimeout or Interval that google.com created.
  • We receive 39 RecvHTMLDNSPrefetch IPC messages. Offhand I don't know why so many would occur on a blank search page. Filed bug 634653.

So, the data shows on a high level what code is run, unlike a low-level profiler (like oprofile) that shows how much individual functions or lines of code take. Low-level profilers are very useful of course, but sometimes it is also good to get higher-level data (and it isn't always possible or easy to deduce that from the low-level data). Also, running low-level profilers depends on the OS, so you need one to be available for your OS (a problem on Android - but see the very cool piranha), and even if you do have one, your data is for that OS only, and not necessarily directly comparable to data from a different low-level profiler on a different OS (in other words, you are changing both profiler and OS, and not just OS).

Of course this approach has limitations as well, so a combination of both approaches is the best thing.

The patches are as follows:
  • Timer logger: A script that automatically rewrites the Fennec source code to add identifying strings to all timers, and code to generate log output for that. This is an improvement of my previous IPC profiling patch.
  • Runnables and events logger: Another rewriting script, that does something similar for Runnables. Also adds log output for Runnables and Events.
  • IPC traffic logger: A few hooks to generate log output for each IPC message.
  • Android log processer: Tiny script that splits an android adb logcat dump into separate files for each process, that are ready for further processing. This is only needed on Android.
  • Data analyzer: Processes the generated logs and creates readable output (like the data shown above)
To use the patches, apply them, then run timer_rewriter.py and runnable_rewriter.py. Build Fennec (or Firefox), then run it, and capture the log output into a file. You should have a separate file for each process; on Android, use the android_prettifier.py script mentioned above. Then run the analyze_profiling_output.py script on the data to get the summary.