Profile Guided Optimizations for Go Applications

⏰ 4 Minutes 📅 Aug 1, 2023

The latest Go releases are packed with fantastic new features, one of which is the ability to significantly improve the performance of the hot paths within your applications using something called Profile-Guided Optimization.

Go isn’t generally considered a slouch when it comes to performance. However, this technique, also known as feedback-directed optimization (FDO), can help you squeeze every last ounce of performance out of your app, which can be vital if you’re working in a high-performance domain.

How Does It Work?

So first of all, let’s try answering the question: how does this actually work?

This technique involves generating a profile of your application. With this generated profile, we can then run our code through the compiler again, and the profile helps to inform the compiler’s decisions when it comes to building a better-optimized version of your application.

In the Go blog’s article, the team applied this approach to a set of Go programs that are representative of how Go is typically used in production and found that, with PGO, the performance of their apps improved by 2-7%.

This is a phenomenal tool to have in your toolbelt if you’re working on the kinds of applications that need every ounce of performance squeezed out of the underlying infrastructure. Even if you don’t require that level of performance, if you’re running applications at any serious level of scale then cutting 2-7% across the board could represent a fairly nice chunk off your overall infrastructure costs.

The Go team also expects these efficiency gains to improve over time as they lean more heavily into future optimizations.

Seeing it in Action

The Go team has shown us the way and now it’s time to walk the path for yourself.

Let’s use this example Go application that serves a /homepage endpoint and parses an index.html file:

package main

import (
	"fmt"
	"net/http"
	"text/template"
)

type HomePage struct {
	Title   string
	Content string
}

func main() {
	fmt.Println("Go PGO Tutorial")

	http.HandleFunc("/homepage", func(w http.ResponseWriter, r *http.Request) {
		tmpl, err := template.ParseFiles("index.html")
		if err != nil {
			w.WriteHeader(http.StatusInternalServerError)
			w.Write([]byte(err.Error()))
			return
		}

		tmpl.Execute(w, HomePage{
			Title:   "My Awesome Website",
			Content: "All my awesome content",
		})
	})
	http.ListenAndServe(":9000", nil)
}
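One thing the article doesn’t show is index.html itself. If you’re following along, a minimal template that works with the HomePage struct might look like this (the markup here is an assumption - any file that references .Title and .Content will do):

```html
<!DOCTYPE html>
<html>
  <head>
    <title>{{.Title}}</title>
  </head>
  <body>
    <h1>{{.Title}}</h1>
    <p>{{.Content}}</p>
  </body>
</html>
```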

The first thing we’ll want to do is extract a CPU profile from this application while it’s running. To do that, we can leverage the pprof tooling that comes bundled with your Go installation.

We’ll also need to ensure we add the net/http/pprof package to our list of imports:

import (
	"fmt"
	"net/http"
	"text/template"

	_ "net/http/pprof"
)

Note: We don’t have to modify our code here; importing the package for its side effects (via the blank identifier) automagically registers the /debug/pprof handlers on the default mux router that we’re using in this example.

Generating a Profile For Our Code

Let’s kick off our application and then run this pprof tool now:

$ go run ./...

# In a separate terminal
$ curl -o cpu.pprof "http://localhost:9000/debug/pprof/profile?seconds=30"

While this profile is being captured, we’ll need to place some load on our application to ensure that the code paths we want optimized actually get sampled.

Open up a browser as you run this test and hit the http://localhost:9000/homepage endpoint a couple of times.

At the end of the 30 seconds, you should see a cpu.pprof file within your current directory which contains the profile information that we can then use to rebuild a more performant version of our application.

$ go build -pgo=cpu.pprof ./...

And there we have it! We should now have a profile-guided optimized binary that is more performant on our code hot paths. (Since Go 1.21, go build also defaults to -pgo=auto, so if you store the profile as default.pgo in your main package’s directory it will be picked up without any extra flags.)

Incorporating this into your CI Systems

It’s a good idea to take this further when working with more serious applications. Traffic to your production applications can change in shape over time so it’s a good idea to generally follow this pattern:

  1. Deploy your un-optimized application to production.
  2. When you next want to deploy a new release of your system to production, collect a new profile live from your production environment.
  3. Build your new application binary using this newly collected profile.
  4. Profit! (Possibly literally if you are seeing lower running costs 🤷‍♂️)
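As a rough sketch, steps 2 and 3 might look something like this in a GitHub Actions-style pipeline (the production URL and step names are hypothetical placeholders):

```yaml
steps:
  - uses: actions/checkout@v4
  - name: Collect a live profile from production
    run: curl -o default.pgo "https://prod.example.com/debug/pprof/profile?seconds=30"
  - name: Build the PGO-optimized binary
    run: go build -pgo=default.pgo ./...
```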

Considerations

Now, there are some considerations to bear in mind here.

  1. Including profile-guided optimizations in your build process can increase your build times. This makes sense, though, as the compiler is doing extra work to take the profile into consideration and better optimize the hot paths.

  2. It should not make cold paths within your application any slower. Thankfully, we aren’t optimizing some parts of our code at the expense of others.

  3. PGO may result in slightly larger binaries due to additional function inlining. If you’re working with, say, WASM or TinyGo and you’re really trying to minimize the size of the binaries you output, then this tradeoff for additional performance may not be worth it. For the majority of other use cases the increase is fairly negligible.
