visit
Profiler is a dynamic performance analysis tool that provides critical
execution insights in various dimensions which enable resolving
performance issues, locating memory leaks, thread contention and more.
Support for profiling the standard testing package. As an example, the following command runs all benchmarks and writes CPU and memory profile to
cpu.prof
and mem.prof
:go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
If you need to have a live profile in a long-running service, package
net/http/pprof
is your ultimate solution. Solely import the package in your code and you are good to go:import _ "net/http/pprof"
func init() {
http.HandleFunc("/debug/pprof/", Index)
http.HandleFunc("/debug/pprof/cmdline", Cmdline)
http.HandleFunc("/debug/pprof/profile", Profile)
http.HandleFunc("/debug/pprof/symbol", Symbol)
http.HandleFunc("/debug/pprof/trace", Trace)
}
As you can see the package initialization registers handlers for the given URL in
DefaultServeMux
. These handlers write the profile into a http.ResponseWriter
which you can download. Now the only thing you need to do is running a HTTP server with a nil
handler:func init() {
go func() {
http.ListenAndServe(":1234", nil)
}()
}
Note that if you already run your http server with
DefaultServeMux
, you don’t need to run another server and profiles are available on your server.Now, you can download live profile data from the
/debug/pprof/
URL.Using
runtime/pprof
, You can also profile directly within the code. For example you can start a CPU profile using pprof.StartCPUProfile(io.Writer)
and then stop it by pprof.StopCPUProfile()
. Now you have the CPU profile written to the given io.Writer
.But wait, there is an easier option. has developed a package named
profile
which can make profiling easy as pie. First you have to get the package using following command (or using Go modules):go get github.com/pkg/profile
func main() {
defer profile.Start(profile.ProfilePath(".")).Stop()
// do something
}
Above code will write CPU profile into
cpu.pprof
in working directory. The default has been set to CPU profiling but you can pass options to profile.Start()
to be able to get other profiles as well.// CPUProfile enables cpu profiling. Note: Default is CPU
defer profile.Start(profile.CPUProfile).Stop()
// GoroutineProfile enables goroutine profiling.
// It returns all Goroutines alive when defer occurs.
defer profile.Start(profile.GoroutineProfile).Stop()
// BlockProfile enables block (contention) profiling.
defer profile.Start(profile.BlockProfile).Stop()
// ThreadcreationProfile enables thread creation profiling.
defer profile.Start(profile.ThreadcreationProfile).Stop()
// MemProfileHeap changes which type of memory profiling to
// profile the heap.
defer profile.Start(profile.MemProfileHeap).Stop()
// MemProfileAllocs changes which type of memory to profile
// allocations.
defer profile.Start(profile.MemProfileAllocs).Stop()
// MutexProfile enables mutex profiling.
defer profile.Start(profile.MutexProfile).Stop()
is a tool for visualization and analysis of profiling data. It reads a collection of profiling samples in
profile.proto
format and generates reports to visualize and help analyze the data. It can generate both text and graphical reports.All of the ways that mentioned above are using
runtime/pprof
under the hood and this package writes runtime profiling data in the format expected by the pprof visualization tool.For a better understanding, I’ve written the below code to profile. It is a simple HTTP server which handles requests on
/log
URL. Every request comes in, the logHandler
function will declare a chan int
and create a child Goroutine at line 16. The child will decode the given JSON log within the Request.Body
and if it was successful it does a time consuming (~400 millisecond) write and then sends http.StatusOK
to ch
. Meanwhile the logHandler
function blocks on select statement (line 27) until either when the child sends the result to ch
(line 24) or when a timeout happens (line 30).import (
"encoding/json"
"math/rand"
"net/http"
_ "net/http/pprof"
"time"
)
func main() {
http.HandleFunc("/log", logHandler)
http.ListenAndServe(":8080", nil)
}
func logHandler(w http.ResponseWriter, r *http.Request) {
ch := make(chan int)
go func() {
obj := make(map[string]float64)
if err := json.NewDecoder(r.Body).Decode(&obj); err != nil {
ch <- http.StatusBadRequest
return
}
// simulation of a time consuming process like writing logs into db
time.Sleep(time.Duration(rand.Intn(400)) * time.Millisecond)
ch <- http.StatusOK
}()
select {
case status := <-ch:
w.WriteHeader(status)
case <-time.After(200 * time.Millisecond):
w.WriteHeader(http.StatusRequestTimeout)
}
}
Summary:
Total: 0.8292 secs
Slowest: 0.2248 secs
Fastest: 0.0039 secs
Average: 0.1574 secs
Requests/sec: 241.1882Latency
distribution:
10% in 0.0409 secs
25% in 0.1143 secs
50% in 0.2007 secs
75% in 0.2013 secs
90% in 0.2093 secs
95% in 0.2187 secs
99% in 0.2245 secsStatus code
distribution:
[200] 87 responses
[408] 113 responses
You can use
go tool pprof
command or install to use it directly.go get -u github.com/google/pprof
pprof -alloc_objects //:8080/debug/pprof/allocs
Now enter the command
top
, the output will be the top memory consumers.Adding the
-nodefraction=0
flag or assigning it directly inside the attractive shell will include all data of the profile in the result.In the above output, you can see two columns,
flat
and cum
. Flat means only by this function, while cum (cumulative) means by this function and functions called down the stack. For better understanding, let’s assume the following function, the flat time of function A
is 4s and the cum is 11s.func A() {
B() // takes 1s
DO STH DIRECTLY // takes 4s
C() // takes 6s
}
An important note to consider is that the difference between the allocations and heap profile is the way the pprof tool reads them at start time. Allocations profile will start pprof in a mode which displays the total number of allocated objects since the program began which includes garbage collected ones as well, but heap profile will start in a mode that displays number of live allocated objects which does not include garbage collected bytes.
That means in the pprof tool you can change the mode regardless of the
profile type (heap or allocations). It can be done using the following values. You can either set them as a pprof flag or assign them to
sample_index
inside the pprof interactive shell:inuse_space
: amount of memory allocated and not released yet (heap)inuse_objects
: amount of objects allocated and not released yet (heap)alloc_space
: total amount of memory allocated (regardless of released)alloc_objects
: total amount of objects allocated (regardless of released)You can generate a graph of your profile in an SVG format, and open it
with a web browser. The following command will request for a 5s CPU
profile and will launch a browser with an SVG file.
pprof -web //:8080/debug/pprof/profile?seconds=5
One of the useful pprof features is the web interface. If the
-http
flag is specified, pprof starts a web server at the specified host:por that provides an interactive web-based interface to pprof.pprof -http :8080 //:8080/debug/pprof/goroutine
go get -u github.com/bcicen/grmon
grmon -host :8080
It seems the
logHandler
function blocks on a chan send
action. After reviewing the code I found the bug. It has occurred due to sending into an unguaranteed channel. I’ve fixed that by turn the ch
to a buffered one channel (line 15). It was the very same mistake that I have mentioned .CPU Profiler
The Go CPU profiler uses a signal to record code execution statistics. Once the signal got registered, it will deliver every specified time interval. This timer unlike typical timers will increment when the CPU is executing the process. The signal will interrupt code execution and make it possible to see which code was interrupted.When the function is called, the
SIGPROF
signal handler will be registered to call every 10 milliseconds interval by default (100 Hz). On Unix, it uses a syscall to set the signal timer.On every invocation signal, the handler will trace back the execution by unwinding it from the current value. This will generate a stack trace and increment its hit count. The whole process will result in a list of stack traces grouped by the number of times each stack trace is seen.After the profiler is stopped, the stack traces will be converted to pprof-supported stack frames with source file names, line numbers and function names.Memory Profiler
The memory profiler samples heap allocations. It will show function calls allocations. Recording all allocation and unwinding thestack trace would be expensive, therefore a sampling technique is used.
The sampling process relies on a pseudo random number generator based on to sample only a fraction of allocations. The generated numbers define the distance between samples in terms of allocated memory size. This means that only allocations that cross the next random sampling point will be sampled.The sampling rate defines the mean of exponential distribution . The default value is 512 KB which is specified by the
runtime.MemProfileRate
. That means to tune the size of the fraction to be sampled, you have to change runtime.MemProfileRate
value. Setting the value to 1, will include every allocated block in the profile. Obviously this will slow down your application.Stack traces with corresponding sampled allocations information is available any time, since the memory allocation profiler is always active. Setting the
runtime.MemProfileRate
to 0, will turn off the memory sampling entirely.Once the stack traces got ready, those will be converted to pprof-supported stack frames.Also published .