William Liu

Linux Tracing

I read Julia Evans’ post on Linux Tracing Systems and was inspried to learn more about the topic of tracing and debugging.

Summary (Linux Tracing Systems)

Linux Tracing Systems and how they fit together:

Data Sources

Probes

Probes are embedded “markers” that can be observed by say a DTrace or SystemTap script so you can easily monitor what the process on a system is really doing.

k probes - k stands for kernel. KProbes is a way to debug the Linux kernel by monitoring events inside a production system. u probes - u stands for user. UProbes is user-level dynamic tracing; it lets you trace user-level functions

Tracepoints

A tracepoint is something you compile into your program. When someone using your program wnats to see what happens when that tracepoint is hit and extract data, they can ‘enable’/’activate’ that tracepoint.

  1. Compile a tracepoint into your program
  2. As long as nobody activates it, no overhead
  3. Your users can activate the tracepoints with tools (e.g. ftrace / dtrace) to know what your program is doing

Tracepoints for userspace programs

Tracepoints for the kernel

Ways to extract tracing data

There’s a few ways to collect tracing data. These 3 are built into the Linux kernel:

Other ways to extract data:

Frontends

Frontends are tools to help you:

These include:

perf trace - perf can use perf_event_open and also ftrace to record tracing data. You can use perf trace to trace syscalls. ftrace - no frontend, just cat this text file trace-cmd - command line user interface to ftrace perf-tools - a collection of tools by Brendan Gregg for perf and ftrace. check out the kprobe and uprobe scripts.

Additional frontends:

bcc for eBPF - Python framework to help write eBPF programs at: https://github.com/iovisor/bcc

strace command

The strace command traces system calls and signals. Strace is used in simple examples like running a command until it exits, to diagnostics (like profiling) and debugging like finding race conditions.

strace example

When we look up the manual page (man strace), we see this example:

$strace cat /dev/null

The man pages output says that the output should be

open("/dev/null", O_RDONLY) = 3

When you do the command on an actual system, the output looks like this…

execve("/bin/cat", ["cat", "/dev/null"], [/* 51 vars */]) = 0
brk(NULL)
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE)......
...
open("/dev/null", O_RDONLY) = 3

strace results explained

So what is going on?

So why use strace?

Sometimes you need to investigate production systems where you cannot afford to run programs under a debugger.

strace options

Here are a few common uses of strace:

strace with timings and output to file

To get timings and output to a file: strace -ttTo data.txt cat /dev/null

The data.txt file looks like:

18:12:58.879824 execve("/bin/cat", ["cat", "/dev/null"], [/* 51 vars */]) = 0 <0.000479>
18:12:58.880708 brk(NULL)               = 0x1caa000 <0.000021>
18:12:58.880928 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) <0.000028>
18:12:58.881069 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3fde9e1000 <0.000025>
18:12:58.881233 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000025>
18:12:58.881392 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000029>
18:12:58.881532 fstat(3, {st_mode=S_IFREG|0644, st_size=147141, ...}) = 0 <0.000020>
18:12:58.881647 mmap(NULL, 147141, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3fde9bd000 <0.000022>
18:12:58.881749 close(3)                = 0 <0.000018>
18:12:58.881852 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) <0.000020>
18:12:58.881969 open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000025>
18:12:58.882076 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832 <0.000022>
18:12:58.882183 fstat(3, {st_mode=S_IFREG|0755, st_size=1868984, ...}) = 0 <0.000017>
18:12:58.882287 mmap(NULL, 3971488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f3fde3f4000 <0.000025>
18:12:58.882391 mprotect(0x7f3fde5b4000, 2097152, PROT_NONE) = 0 <0.000036>
18:12:58.882501 mmap(0x7f3fde7b4000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c0000) = 0x7f3fde7b4000 <0.000030>
18:12:58.882618 mmap(0x7f3fde7ba000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3fde7ba000 <0.000023>
18:12:58.882728 close(3)                = 0 <0.000018>
18:12:58.882848 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3fde9bc000 <0.000020>
18:12:58.882950 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3fde9bb000 <0.000020>
18:12:58.883050 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3fde9ba000 <0.000018>
18:12:58.883148 arch_prctl(ARCH_SET_FS, 0x7f3fde9bb700) = 0 <0.000018>
18:12:58.883363 mprotect(0x7f3fde7b4000, 16384, PROT_READ) = 0 <0.000025>
18:12:58.883469 mprotect(0x60b000, 4096, PROT_READ) = 0 <0.000024>
18:12:58.883573 mprotect(0x7f3fde9e3000, 4096, PROT_READ) = 0 <0.000026>
18:12:58.883672 munmap(0x7f3fde9bd000, 147141) = 0 <0.000036>
18:12:58.883942 brk(NULL)               = 0x1caa000 <0.000017>
18:12:58.884035 brk(0x1ccb000)          = 0x1ccb000 <0.000021>
18:12:58.884139 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 <0.000030>
18:12:58.884258 fstat(3, {st_mode=S_IFREG|0644, st_size=2981280, ...}) = 0 <0.000019>
18:12:58.884359 mmap(NULL, 2981280, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3fde11c000 <0.000025>
18:12:58.884472 close(3)                = 0 <0.000018>
18:12:58.884637 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0 <0.000020>
18:12:58.884795 open("/dev/null", O_RDONLY) = 3 <0.000031>
18:12:58.884929 fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0 <0.000018>
18:12:58.885035 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000019>
18:12:58.885134 mmap(NULL, 139264, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3fde9bf000 <0.000022>
18:12:58.885241 read(3, "", 131072)     = 0 <0.000019>
18:12:58.885339 munmap(0x7f3fde9bf000, 139264) = 0 <0.000030>
18:12:58.885445 close(3)                = 0 <0.000022>
18:12:58.885558 close(1)                = 0 <0.000018>
18:12:58.885652 close(2)                = 0 <0.000018>
18:12:58.885748 exit_group(0)           = ?
18:12:58.885936 +++ exited with 0 +++

The command: strace -c ls -l output looks like:

will@xps ~ $ strace -c ls -l
total 96
drwxr-xr-x  2 will will  4096 May 13 12:02 Desktop
drwxr-xr-x  2 will will  4096 May 23 10:58 Documents
drwxr-xr-x  6 will will 12288 Jul  9 15:53 Downloads
drwxrwxr-x  2 will will  4096 May 19 08:12 Envs
-rw-r--r--  1 will will  8980 May 13 12:01 examples.desktop
drwxrwxr-x 20 will will  4096 Jul  9 17:06 GitHub
drwxrwxr-x  6 will will  4096 Jun  9 07:02 go
drwxr-xr-x  2 will will  4096 May 19 16:54 Music
drwxrwxr-x 12 will will  4096 Jul  5 18:28 nltk_data
drwxrwxr-x  2 will will  4096 Jul  9 18:20 notes
-rw-rw-r--  1 will will  1773 Jun 20 09:55 pgadmin.log
drwxr-xr-x  2 will will  4096 May 13 12:02 Pictures
drwxr-xr-x  2 will will  4096 May 13 12:02 Public
drwxrwxr-x  2 will will  4096 Jun  6 21:37 Snapshots
drwxr-xr-x  2 will will  4096 May 13 12:02 Templates
drwxr-xr-x  2 will will  4096 May 13 12:02 Videos
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.27    0.000713         357         2           getdents
  2.73    0.000020           1        16           lstat
  0.00    0.000000           0        17           read
  0.00    0.000000           0        17           write
  0.00    0.000000           0        27         5 open
  0.00    0.000000           0        28           close
  0.00    0.000000           0        15           stat
  0.00    0.000000           0        24           fstat
  0.00    0.000000           0         5           lseek
  0.00    0.000000           0        35           mmap
  0.00    0.000000           0        20           mprotect
  0.00    0.000000           0         5           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0        11        11 access
  0.00    0.000000           0         4           socket
  0.00    0.000000           0         4         4 connect
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2         2 statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0        30        30 getxattr
  0.00    0.000000           0        16        16 lgetxattr
  0.00    0.000000           0         1           futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.000733                   292        68 total

An example of filtering output looks like:

will@xps ~ $ strace -e access ls -l
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
total 96
drwxr-xr-x  2 will will  4096 May 13 12:02 Desktop
drwxr-xr-x  2 will will  4096 May 23 10:58 Documents
drwxr-xr-x  6 will will 12288 Jul  9 15:53 Downloads
drwxrwxr-x  2 will will  4096 May 19 08:12 Envs
-rw-r--r--  1 will will  8980 May 13 12:01 examples.desktop
drwxrwxr-x 20 will will  4096 Jul  9 17:06 GitHub
drwxrwxr-x  6 will will  4096 Jun  9 07:02 go
drwxr-xr-x  2 will will  4096 May 19 16:54 Music
drwxrwxr-x 12 will will  4096 Jul  5 18:28 nltk_data
drwxrwxr-x  2 will will  4096 Jul  9 18:20 notes
-rw-rw-r--  1 will will  1773 Jun 20 09:55 pgadmin.log
drwxr-xr-x  2 will will  4096 May 13 12:02 Pictures
drwxr-xr-x  2 will will  4096 May 13 12:02 Public
drwxrwxr-x  2 will will  4096 Jun  6 21:37 Snapshots
drwxr-xr-x  2 will will  4096 May 13 12:02 Templates
drwxr-xr-x  2 will will  4096 May 13 12:02 Videos
+++ exited with 0 +++

strace specific process id

strace -ewrite -p $PID

g++

g++ is a nix based C++ compiler.

g++ usage

GNU Project Debugger (gdb)

The GNU Project Debugger (gdb) is a debugger that helps you see what is going on inside another program while it executes.