From 206b3360dc2e0c9cf168c1e968845d1d5b2cef79 Mon Sep 17 00:00:00 2001
From: Tolya Korniltsev
Date: Fri, 21 Jul 2023 15:35:01 +0700
Subject: [PATCH 1/2] fix(ebpf): use grafana agent. copypaste doc from grafana agent

---
 .../configure-client/language-sdks/ebpf.md | 217 ++++++++++++++----
 1 file changed, 176 insertions(+), 41 deletions(-)

diff --git a/docs/sources/configure-client/language-sdks/ebpf.md b/docs/sources/configure-client/language-sdks/ebpf.md
index f2baf89184..c2b8c3cfeb 100644
--- a/docs/sources/configure-client/language-sdks/ebpf.md
+++ b/docs/sources/configure-client/language-sdks/ebpf.md
@@ -30,76 +30,211 @@ For the eBPF integration to work you'll need:
 ### Step 1: Add the helm repo
 
 ```shell
-helm repo add pyroscope-io https://pyroscope-io.github.io/helm-chart
+helm repo add grafana https://grafana.github.io/helm-charts
+helm repo update
 ```
 
 ### Step 2: Install pyroscope agent
 
+```yaml
+agent:
+  mode: 'flow'
+  configMap:
+    create: true
+    content: |
+      discovery.kubernetes "local_pods" {
+        selectors {
+          field = "spec.nodeName=" + env("HOSTNAME")
+          role = "pod"
+        }
+        role = "pod"
+      }
+      pyroscope.ebpf "instance" {
+        forward_to = [pyroscope.write.endpoint.receiver]
+        targets = discovery.kubernetes.local_pods.targets
+      }
+      pyroscope.write "endpoint" {
+        endpoint {
+          basic_auth {
+            password = ""
+            username = ""
+          }
+          url = ""
+        }
+      }
+
+  securityContext:
+    privileged: true
+    runAsGroup: 0
+    runAsUser: 0
+```
+Replace the `` placeholder with the appropriate server URL. This could be the Grafana Cloud URL or your own custom Phlare server URL.
+
+If you need to send data to Grafana Cloud, you'll have to configure HTTP Basic authentication. Replace `` with your Grafana Cloud stack user and `` with your Grafana Cloud API key.
+
 ```shell
-helm install pyroscope-ebpf pyroscope-io/pyroscope-ebpf
+helm install pyroscope-ebpf grafana/grafana-agent -f values.yaml
 ```
 
 It will install pyroscope eBPF agent on all of your nodes and start profiling applications across your cluster.
 
-## Running eBPF profiler from binary
-```shell
-export PYROSCOPE_APPLICATION_NAME=my.ebpf.program
-export PYROSCOPE_SERVER_ADDRESS=http://address-of-pyroscope-server:4040/
-export PYROSCOPE_SPY_NAME=ebpfspy
-# optionally, if authentication is enabled, specify the API key:
-# export PYROSCOPE_AUTH_TOKEN={YOUR_API_KEY}
+## Configuration
 
-# to wrap an existing program and profile it
-sudo -E pyroscope exec mongod
+The `pyroscope.ebpf` component configures and starts a new eBPF profiling job to collect performance profiles from the current host.
 
-# to profile the whole system
-sudo -E pyroscope ebpf
-```
+You can use the following arguments to configure `pyroscope.ebpf`. Only the
+`forward_to` and `targets` fields are required. Omitted fields take their default
+values.
 
-## Dealing with `[unknowns]`
+| Name | Type | Description | Default | Required |
+|---------------------------|--------------------------|--------------------------------------------------------------|---------|----------|
+| `targets` | `list(map(string))` | List of targets to group profiles by container ID | | yes |
+| `forward_to` | `list(ProfilesReceiver)` | List of receivers to send collected profiles to | | yes |
+| `collect_interval` | `duration` | How frequently to collect profiles | `15s` | no |
+| `sample_rate` | `int` | How many times per second to collect profile samples | `97` | no |
+| `pid_cache_size` | `int` | The size of the PID -> process symbols table LRU cache | `32` | no |
+| `build_id_cache_size` | `int` | The size of the ELF file build ID -> symbols table LRU cache | `64` | no |
+| `same_file_cache_size` | `int` | The size of the ELF file -> symbols table LRU cache | `8` | no |
+| `container_id_cache_size` | `int` | The size of the PID -> container ID table LRU cache | `1024` | no |
+| `collect_user_profile` | `bool` | A flag to enable/disable collection of userspace profiles | `true` | no |
+| `collect_kernel_profile` | `bool` | A flag to enable/disable collection of kernelspace profiles | `true` | no |
 
-eBPF relies on having debugging symbols available for each program installed in your system. If you don't have those you'll see a lot of stacktraces full of `[unknown]`s. On most systems you can get debugging symbols for most packages with `debuginfo-install` command:
+## Exported fields
 
-```shell
-sudo debuginfo-install -y
-```
+`pyroscope.ebpf` does not export any fields that can be referenced by other
+components.
 
-## Configuration
+## Component health
+
+`pyroscope.ebpf` is only reported as unhealthy if given an invalid
+configuration.
+
+## Debug information
+
+* `targets`: currently tracked active targets.
+* `pid_cache`: per-process ELF symbol tables and their sizes, in number of symbols.
+* `elf_cache`: per-build-ID and per-file ELF symbol tables and their sizes, in number of symbols.
+
+## Debug metrics
+
+* `pyroscope_fanout_latency` (histogram): Write latency for sending to direct and indirect components.
+* `pyroscope_ebpf_active_targets` (gauge): Number of active targets the component tracks.
+* `pyroscope_ebpf_profiling_sessions_total` (counter): Number of profiling sessions completed.
+* `pyroscope_ebpf_profiling_sessions_failing_total` (counter): Number of failed profiling sessions.
+* `pyroscope_ebpf_pprofs_total` (counter): Number of pprof profiles collected by the eBPF component.
+
+## Profile collecting behavior
+
+The `pyroscope.ebpf` component collects stack traces associated with processes running on the current host.
+You can use the `sample_rate` argument to define the number of stack traces collected per second. The default is 97.
+
+The following labels are automatically injected into the collected profiles if you have not defined them. These labels
+can help you pin down a profiling target.
+
+| Label | Description |
+|--------------------|----------------------------------------------------------------------------------------------------------------------------------|
+| `service_name` | Pyroscope service name. It's automatically selected from discovery meta labels if possible. Otherwise defaults to `unspecified`. |
+| `__name__` | Pyroscope metric name. Defaults to `process_cpu`. |
+| `__container_id__` | The container ID derived from the target. |
+
+### Privileges
+
+The agent must run as root and inside the host PID namespace for the `pyroscope.ebpf` component to work.
+See the Helm values in Step 2 for how to configure this with Helm.
+
+### Container ID
+
+Each collected stack trace is associated with a target from the `targets` list based on its container ID. To do this,
+the component checks the `__container_id__`, `__meta_docker_container_id`,
+and `__meta_kubernetes_pod_container_id` labels of each target against the `/proc/{pid}/cgroup` file of the process.
 
-All parameters below are also supported as CLI arguments, a full list can be accessed via `pyroscope ebpf --help`. For brevity only environment variables are listed.
+If a corresponding container ID is found, the stack traces are aggregated per target based on the container ID.
+If a container ID is not found, the stack trace is associated with a `default_target`.
 
-* `PYROSCOPE_KUBERNETES_NODE` Set to current k8s Node.nodeName for service discovery and labeling
-* `PYROSCOPE_ONLY_SERVICES` Ignore processes unknown to service discovery
-* `PYROSCOPE_SYMBOL_CACHE_SIZE` Max size of symbols cache (1 entry per process)
+Any stack traces not associated with a listed target are ignored.
 
-| env var | default | description |
-| -------------------------- | -------------------------------- | ---------------------------------------------- |
-| `PYROSCOPE_KUBERNETES_NODE` | `""` | Used by service discovery. It's automatically set in the Helm Chart. |
-| `PYROSCOPE_ONLY_SERVICES` | `false` | Ignore processes unknown to service discovery. In a Kubernetes cluster it ignores processes like `containerd, runc, kubelet` etc |
-| `PYROSCOPE_SYMBOL_CACHE_SIZE` | `256` | Max size of symbols cache (1 entry per process). Change this value if you’re experiencing memory pressure or have many individual services. |
+### Service name
 
-## Sending data to Grafana Cloud or Phlare with Pyroscope eBPF integration
+The special label `service_name` is required and must always be present. If it's not specified, the component
+attempts to infer it from the following sources:
 
-Starting with [weekly-f8](https://hub.docker.com/r/grafana/phlare/tags) you can ingest pyroscope profiles directly to phlare.
+- `__meta_kubernetes_pod_annotation_pyroscope_io_service_name`, which holds the `pyroscope.io/service_name` pod annotation.
+- `__meta_kubernetes_namespace` and `__meta_kubernetes_pod_container_name`
+- `__meta_docker_container_name`
+
+If `service_name` is not specified and could not be inferred, it is set to `unspecified`.
+
+## Troubleshooting unknown symbols
+
+Symbols are extracted from various sources, including:
+
+- The `.symtab` and `.dynsym` sections in the ELF file.
+- The `.symtab` and `.dynsym` sections in the debug ELF file.
+- The `.gopclntab` section in Go language ELF files.
+
+The search for debug files follows the [gdb algorithm](https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html).
+For example, to find the debug file
+for `/lib/x86_64-linux-gnu/libc.so.6`
+with a `.gnu_debuglink` set to `libc.so.6.debug` and a build ID `0123456789abcdef`, the profiler examines the following paths:
+
+- `/usr/lib/debug/.build-id/01/23456789abcdef.debug`
+- `/lib/x86_64-linux-gnu/libc.so.6.debug`
+- `/lib/x86_64-linux-gnu/.debug/libc.so.6.debug`
+- `/usr/lib/debug/lib/x86_64-linux-gnu/libc.so.6.debug`
+
+### Dealing with unknown symbols
+
+Unknown symbols in the profiles you’ve collected indicate that the profiler couldn't access an ELF file associated with a given address in the trace.
+
+This can occur for several reasons:
+
+- The process has terminated, making the ELF file inaccessible.
+- The ELF file is either corrupted or not recognized as an ELF file.
+- There is no corresponding ELF file entry in `/proc/pid/maps` for the address in the stack trace.
+
+### Addressing unresolved symbols
+
+If you only see module names (e.g., `/lib/x86_64-linux-gnu/libc.so.6`) without corresponding function names, this
+indicates that the symbols couldn't be mapped to their respective function names.
+
+This can occur for several reasons:
+
+- The binary has been stripped, leaving no `.symtab`, `.dynsym`, or `.gopclntab` sections in the ELF file.
+- The debug file is missing or could not be located.
+
+To fix this for your binaries, ensure that they are either not stripped or that you have separate
+debug files available.
 
 You can achieve this by running:
 
 ```bash
-./pyroscope ebpf \
-  --application-name=phlare.ebpf.app \
-  --server-address= \
-  --basic-auth-user="" \
-  --basic-auth-password="" \
-  --tenant-id= \
+objcopy --only-keep-debug elf elf.debug
+strip elf -o elf.stripped
+objcopy --add-gnu-debuglink=elf.debug elf.stripped elf.debuglink
 ```
 
-To configure eBPF integration to send data to Phlare, replace the `` placeholder with the appropriate server URL. This could be the Grafana Cloud URL or your own custom Phlare server URL.
+For system libraries, ensure that debug symbols are installed. On Ubuntu, for example, you can install debug symbols
+for `libc` by executing:
 
-If you need to send data to Grafana Cloud, you'll have to configure HTTP Basic authentication. Replace `` with your Grafana Cloud stack user and `` with your Grafana Cloud API key.
+```bash
+apt install libc6-dbg
+```
+
+### Understanding flat stack traces
+
+If your profiles show many shallow stack traces, typically 1-2 frames deep, your binary might have been compiled without frame pointers.
+
+To compile your code with frame pointers, include the `-fno-omit-frame-pointer` flag in your compiler options.
+
+### Profiling interpreted languages
+
+This implementation is not well suited to profiling interpreted languages such as Python, Ruby, or JavaScript.
+JIT-compiled methods in these runtimes are typically not backed by ELF files, so profiling them requires additional
+steps. For Java, for instance, this means using perf-map-agent and enabling frame pointers.
 
-If your Phlare server has multi-tenancy enabled, you'll need to configure a tenant ID. Replace `` with your Phlare tenant ID.
+For interpreted methods, profiles show the interpreter function’s name rather than the name of the actual method.
 
 ## Examples
 
 Check out the following resources to learn more about eBPF profiling:
 - [The pros and cons of eBPF profiling](https://pyroscope.io/blog/ebpf-profiling-pros-cons) blog post (for more context on flamegraphs below)
 - [Demo](https://demo.pyroscope.io/?query=rideshare-cluster-ebpf.cpu%7B%7D) showing breakdown of our examples cluster
-- [docker-compose example](https://github.com/github/pyroscope/blob/main/examples/ebpf) in our repository
+- Grafana Agent documentation for the [pyroscope.ebpf](/docs/agent/next/flow/reference/components/pyroscope.ebpf/), [pyroscope.write](/docs/agent/next/flow/reference/components/pyroscope.write/), [discovery.kubernetes](/docs/agent/next/flow/reference/components/discovery.kubernetes/), and [discovery.relabel](/docs/agent/next/flow/reference/components/discovery.relabel/) components

From bb0184a75adf838fb2ab759d52f55a4dd0b214de Mon Sep 17 00:00:00 2001
From: Tolya Korniltsev
Date: Fri, 21 Jul 2023 15:36:27 +0700
Subject: [PATCH 2/2] hostpid

---
 docs/sources/configure-client/language-sdks/ebpf.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/sources/configure-client/language-sdks/ebpf.md b/docs/sources/configure-client/language-sdks/ebpf.md
index c2b8c3cfeb..07772a4c31 100644
--- a/docs/sources/configure-client/language-sdks/ebpf.md
+++ b/docs/sources/configure-client/language-sdks/ebpf.md
@@ -67,6 +67,9 @@ agent:
     privileged: true
     runAsGroup: 0
     runAsUser: 0
+
+controller:
+  hostPID: true
 ```
 Replace the `` placeholder with the appropriate server URL. This could be the Grafana Cloud URL or your own custom Phlare server URL.
 
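You can generate the Step 2 Helm values, including `controller.hostPID: true`, from the shell instead of filling in the placeholders by hand. Below is a minimal sketch. It assumes a POSIX `sh` and uses a hypothetical helper named `render_values`, which is not part of Helm or the Grafana Agent chart. The helper takes the server URL, the Basic-auth user and the API key as positional arguments and prints the values file. It inserts the arguments verbatim into River strings, so a value that contains `"` or `\` would need escaping first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Helm or the Grafana Agent chart):
# prints the Helm values for the eBPF agent with the placeholders filled in.
#   $1: server URL (Grafana Cloud or a self-hosted Phlare server)
#   $2: Basic-auth user (for example, a Grafana Cloud stack user)
#   $3: Basic-auth password (for example, a Grafana Cloud API key)
# The arguments are inserted verbatim into River strings; this sketch does
# not escape double quotes or backslashes.
# securityContext runs the agent as root and controller.hostPID shares the
# host PID namespace, both of which pyroscope.ebpf requires.
render_values() {
  cat <<EOF
agent:
  mode: 'flow'
  configMap:
    create: true
    content: |
      discovery.kubernetes "local_pods" {
        selectors {
          field = "spec.nodeName=" + env("HOSTNAME")
          role = "pod"
        }
        role = "pod"
      }
      pyroscope.ebpf "instance" {
        forward_to = [pyroscope.write.endpoint.receiver]
        targets = discovery.kubernetes.local_pods.targets
      }
      pyroscope.write "endpoint" {
        endpoint {
          basic_auth {
            password = "$3"
            username = "$2"
          }
          url = "$1"
        }
      }

  securityContext:
    privileged: true
    runAsGroup: 0
    runAsUser: 0

controller:
  hostPID: true
EOF
}
```

For example, `render_values "$URL" "$STACK_USER" "$API_KEY" > values.yaml` writes the file that `helm install pyroscope-ebpf grafana/grafana-agent -f values.yaml` expects. The variable names `URL`, `STACK_USER` and `API_KEY` are illustrative only.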