Posted by Yabin Cui, Software Engineer
We are the Android LLVM toolchain team. One of our top priorities is to improve Android performance through optimization techniques in the LLVM ecosystem. We are constantly searching for ways to make Android faster, smoother, and more efficient. While much of our optimization work happens in userspace, the kernel remains the heart of the system. Today, we’re excited to share how we are bringing Automatic Feedback-Directed Optimization (AutoFDO) to the Android kernel to deliver significant performance wins for users.
During a standard software build, the compiler makes thousands of small decisions, such as whether to inline a function and which branch of a conditional is likely to be taken, based on static code hints. While these heuristics are useful, they don’t always accurately predict how code executes during real-world phone usage.
AutoFDO changes this by using real-world execution patterns to guide the compiler. These patterns represent the most common instruction execution paths the code takes during actual use, captured by recording the CPU’s branching history. While this data can be collected from fleet devices, for the kernel we synthesize it in a lab environment using representative workloads, such as running the top 100 most popular apps. We use a sampling profiler to capture this data, identifying which parts of the code are ‘hot’ (frequently used) and which are ‘cold’.
When we rebuild the kernel with these profiles, the compiler can make much smarter optimization decisions tailored to actual Android workloads.
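To make the idea concrete, here is a minimal sketch (not the actual AutoFDO tooling) of how sampled execution data separates “hot” code from “cold” code. The function names and the 20% hotness threshold are hypothetical illustrations:

```python
from collections import Counter

# Hypothetical sampled functions; in practice a sampling profiler
# records branch history while representative workloads run.
samples = ["schedule", "schedule", "do_page_fault", "schedule",
           "ipi_handler", "schedule", "do_page_fault"]

def classify(samples, hot_fraction=0.2):
    """Mark a function 'hot' if it holds at least hot_fraction of all samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {fn: ("hot" if n / total >= hot_fraction else "cold")
            for fn, n in counts.items()}

print(classify(samples))
```

A profile like this tells the compiler where aggressive inlining and favorable code layout will actually pay off.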
To understand the impact of this optimization, consider these key facts:
- On Android, the kernel accounts for about 40% of CPU time.
- We are already using AutoFDO to optimize native executables and libraries in userspace, achieving about a 4% improvement in cold app launch time and a 1% reduction in boot time.
Real-World Performance Wins
We have seen impressive improvements across key Android metrics by leveraging profiles from controlled lab environments. These profiles were collected using app crawling and launching, and measured on Pixel devices across the 6.1, 6.6, and 6.12 kernels.
The most noticeable improvements are summarized below. Details on the AutoFDO profiles for these kernel versions can be found in the respective Android kernel repositories for the android16-6.12 and android15-6.6 branches.
These aren’t just theoretical numbers. They translate to a snappier interface, faster app switching, extended battery life, and an overall more responsive device for the end user.
Our deployment strategy involves a sophisticated pipeline to ensure profiles stay relevant and performance remains stable.
Step 1: Profile Collection
While we rely on our internal test fleet to profile userspace binaries, we shifted to a controlled lab environment for the Generic Kernel Image (GKI). Decoupling profiling from the device release cycle allows for flexible, immediate updates independent of deployed kernel versions. Crucially, tests confirm that this lab-based data delivers performance gains comparable to those from real-world fleets.
- System-Wide Monitoring: Capturing not only foreground app activities, but also critical background workloads and inter-process communication.
- Validation: This synthesized workload shows an 85% similarity to execution patterns collected from our internal fleet.
- Targeted Data: By repeating these tests sufficiently, we capture high-fidelity execution patterns that accurately represent real-world user interaction with the most popular applications. Furthermore, this extensible framework allows us to seamlessly integrate additional workloads and benchmarks to broaden our coverage.
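One way to quantify the workload similarity mentioned above is cosine similarity between per-function sample counts. This is an illustrative sketch, not the team’s actual metric, and the function names and counts are invented:

```python
import math

def similarity(profile_a, profile_b):
    """Cosine similarity between two per-function sample-count profiles.
    A score near 1.0 means the two workloads exercise the code similarly."""
    keys = set(profile_a) | set(profile_b)
    a = [profile_a.get(k, 0) for k in keys]
    b = [profile_b.get(k, 0) for k in keys]
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a)) *
            math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0

lab   = {"schedule": 900, "do_page_fault": 300, "tcp_sendmsg": 100}
fleet = {"schedule": 800, "do_page_fault": 350, "binder_ioctl": 60}
print(round(similarity(lab, fleet), 3))
```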
Step 2: Profile Processing
We post-process the raw trace data to ensure it is clean, effective, and ready for the compiler.
- Aggregation: We consolidate data from multiple test runs and devices into a single system-wide view.
- Conversion: We convert raw traces into the AutoFDO profile format, filtering out unwanted symbols as needed.
- Profile Trimming: We trim profiles to remove data for “cold” functions, allowing them to use standard optimization. This prevents regressions in rarely used code and avoids unnecessary increases in binary size.
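The aggregation and trimming steps can be sketched roughly as follows. This is a simplified model rather than the real converter; the sample counts and the 50-sample cutoff are hypothetical:

```python
from collections import Counter

def aggregate(runs):
    """Merge per-function sample counts from multiple test runs and devices."""
    merged = Counter()
    for run in runs:
        merged.update(run)
    return merged

def trim(profile, min_samples=50):
    """Drop 'cold' functions so they fall back to standard optimization,
    avoiding regressions in rarely used code and needless binary growth."""
    return {fn: n for fn, n in profile.items() if n >= min_samples}

runs = [{"schedule": 400, "rare_fn": 3},
        {"schedule": 500, "rare_fn": 2, "do_page_fault": 120}]
merged = aggregate(runs)
print(trim(merged))  # rare_fn (5 samples total) is trimmed out
```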
Step 3: Profile Testing
Before deployment, profiles undergo rigorous verification to ensure they deliver consistent performance wins without stability risks.
- Profile & Binary Analysis: We strictly compare the new profile’s content (including hot functions, sample counts, and profile size) against previous versions. We also use the profile to build a new kernel image, analyzing binaries to ensure that changes to the text section are consistent with expectations.
- Performance Verification: We run targeted benchmarks on the new kernel image. This confirms that it maintains the performance improvements established by previous baselines.
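A profile-comparison gate like the one described above might look roughly like this. The churn and growth thresholds, function names, and counts are all hypothetical; the real checks are more extensive:

```python
def check_profile(new, old, max_hot_churn=0.3, max_size_growth=0.5):
    """Flag suspicious deltas between a new profile and the previous version."""
    issues = []
    old_fns, new_fns = set(old), set(new)
    # Fraction of hot functions that appeared or disappeared between versions.
    churn = len(old_fns ^ new_fns) / max(len(old_fns | new_fns), 1)
    if churn > max_hot_churn:
        issues.append(f"hot-function churn {churn:.0%} exceeds limit")
    # Relative growth in the number of profiled functions.
    growth = (len(new) - len(old)) / max(len(old), 1)
    if growth > max_size_growth:
        issues.append(f"profile grew by {growth:.0%}")
    return issues

old = {"schedule": 900, "do_page_fault": 300, "tcp_sendmsg": 80}
new = {"schedule": 950, "do_page_fault": 280, "tcp_sendmsg": 90}
print(check_profile(new, old) or "profile looks consistent")
```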
Continuous Updates
Code naturally “drifts” over time, so a static profile would eventually lose its effectiveness. To maintain peak performance, we run the pipeline continuously to drive regular updates:
- Regular Refresh: We refresh profiles in Android kernel LTS branches ahead of each GKI release, ensuring every build includes the latest profile data.
- Future Expansion: We are currently delivering these updates to the android16-6.12 and android15-6.6 branches and will expand support to newer GKI versions, such as the upcoming android17-6.18.
A common question with profile-guided optimization is whether it introduces stability risks. Because AutoFDO primarily influences compiler heuristics, such as function inlining and code layout, rather than altering the source code’s logic, it preserves the functional integrity of the kernel. This technology has already been proven at scale, serving as a standard optimization for Android platform libraries, ChromeOS, and Google’s own server infrastructure for years.
To further guarantee consistent behavior, we apply a “conservative by default” strategy. Functions not captured in our high-fidelity profiles are optimized using standard compiler methods. This ensures that the “cold” or rarely executed parts of the kernel behave exactly as they would in a standard build, preventing performance regressions or unexpected behaviors in corner cases.
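The “conservative by default” policy amounts to a simple lookup-with-fallback, sketched below. This is an illustration of the principle, not compiler internals, and the names and threshold are invented:

```python
def optimization_decision(fn, profile, hot_threshold=100):
    """Functions absent from the profile get the standard (default) treatment."""
    samples = profile.get(fn)  # None when the function was never sampled
    if samples is None:
        return "default"       # conservative: compile exactly as a normal build would
    return "aggressive" if samples >= hot_threshold else "default"

profile = {"schedule": 900, "rare_path": 12}
print(optimization_decision("schedule", profile))           # hot, profile-guided
print(optimization_decision("obscure_driver_fn", profile))  # unprofiled, default
```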
We are currently deploying AutoFDO across the android16-6.12 and android15-6.6 branches. Beyond this initial rollout, we see several promising avenues to further enhance the technology:
- Expanded Reach: We look forward to deploying AutoFDO profiles to newer GKI kernel versions and additional build targets beyond the current aarch64 support.
- GKI Module Optimization: Currently, our optimization is focused on the main kernel binary (vmlinux). Expanding AutoFDO to GKI modules could bring performance benefits to a larger portion of the kernel.
- Vendor Module Support: We are also interested in supporting AutoFDO for vendor modules built using the Driver Development Kit (DDK). With support already available in our build system (Kleaf) and profiling tools (simpleperf), vendors could apply these same optimization techniques to their specific hardware drivers.
- Broader Profile Coverage: There is potential to collect profiles from a wider range of Critical User Journeys (CUJs) to broaden optimization coverage.
By bringing AutoFDO to the Android kernel, we’re ensuring that the very foundation of the OS is optimized for the way you use your device every day.