Debugging With Strace in Linux

In this blog, we’ll explore:

  • Strace, a powerful Linux debugging tool.
  • A practical debugging example using strace.
  • Why strace is not recommended in production.
If you’ve never used strace, don’t worry, you’ll probably meet it in an interview when you least expect it. 😀

Troubleshooting is an important part of every DevOps engineer's or SRE's job.

It could involve:

  • Database performance bottlenecks
  • Web server performance issues
  • Identifying slow requests
  • Excessive resource consumption

and more…

The strace utility is useful for debugging various application issues by tracing system calls (syscalls).

This blog is for educational purposes only. Do not use strace in production without understanding its effects on performance.


Let’s say the Nginx app wants to write a log entry to a file on the disk. Since Nginx cannot directly access hardware like disks, it must use syscalls to request kernel services.

For example, Nginx might use the following syscalls:

  • open() – Opens the log files once when Nginx starts (not on every write).
  • write() – Writes the log entries to the file in buffered batches.
  • close() – Closes the file descriptors during log rotation or shutdown.

That means Nginx is programmed to use syscalls whenever it needs to interact with the operating system for tasks like file operations, network handling, or process management.
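The open-once, write-many, close-at-the-end pattern described above can be imitated from a shell, which makes the same kind of requests to the kernel on our behalf. This is only a rough sketch: the log lines and the temporary file are made up for illustration, and it is not how Nginx itself is implemented.

```shell
# Hypothetical demo file; Nginx would use its own log file instead.
log_file="$(mktemp)"

# Open the file once for appending and keep it as file descriptor 3
# (the shell asks the kernel to open it, typically via open()/openat()).
exec 3>>"$log_file"

# Each echo writes to the already-open descriptor: a write(), no new open.
echo "GET /index.html 200" >&3
echo "GET /slow 200" >&3

# Close the descriptor, as Nginx does on log rotation or shutdown.
exec 3>&-

# Both entries reached the file through the single descriptor.
line_count=$(wc -l < "$log_file" | tr -d ' ')
echo "lines written: $line_count"
```

Running a script like this under strace (for example, filtering with -e trace=openat,write,close) should show the corresponding calls, though the exact output depends on the shell and C library in use.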

If you need to diagnose an issue related to an application's performance, networking, or other system interactions, tracing syscalls can help you identify the problem.

What is strace?

Strace is a Linux command-line utility that intercepts and records all the system calls a program (e.g., Nginx) makes and any signals it receives.

Let's look at a simple example: a summary of the system calls made when running the ls command. (strace prints this kind of per-syscall summary when run with the -c option.)

Overall, the command made 78 system calls with 7 errors total.

Also, execution was extremely fast: every individual call shows 0.000000 seconds.

Practical Debugging Example

To look at a practical debugging example using strace, I had to simulate a real-world scenario of a slow-responding page.

To simulate this scenario, here’s what I did:

  • Deployed Nginx with PHP-FPM.
  • Created a simple PHP page (/slow) that responds only after 2 seconds, to simulate a slow-responding page.

The next step is to capture the syscalls made by Nginx when accessing the http://localhost/slow page.

Tracing Nginx syscalls

To trace the syscalls, we first need to get the Nginx worker process ID.

ps aux | grep nginx

Now, we need to attach strace to the process ID to trace system calls.

Since we are investigating a slow-responding page, our focus should be on the syscalls involved in handling HTTP requests and responses.

The recvfrom() syscall receives data from a network socket (e.g., an HTTP request from a client).

I am going to trace only the recvfrom() syscall using the following command. Here, 13173 is the Nginx worker process ID.

sudo strace -e trace=recvfrom \
        -p 13173 -f -T -tt -o strace.log

In this command, -e trace=recvfrom limits the trace to recvfrom() calls, -p attaches to the given process ID, -f also follows child processes, -T shows the time spent inside each call, -tt prefixes each line with a microsecond-resolution timestamp, and -o writes the output to strace.log.

Now that strace is tracing the calls, I used the Apache Benchmark (ab) utility to generate 10 requests with a concurrency of 2, so that the calls are captured in strace.log.

ab -n 10 -c 2 http://localhost/slow

Now that enough requests have been generated, the goal is to debug the slow response issue.

If we check the strace.log, we will find complex log entries containing recvfrom syscalls, similar to the following.

Analysis of recvfrom() Log for Slow Requests

Let's look at the first two entries.

13173 10:23:41.655745 recvfrom(5, "GET /slow HTTP/1.0\r\nHost: localh"..., 1024, 0, NULL, NULL) = 81 <0.000011>

13173 10:23:41.656706 recvfrom(12, "GET /slow.php HTTP/1.0\r\nHost: lo"..., 1024, 0, NULL, NULL) = 104 <0.000005>


  1. Client requests /slow.
  2. Nginx forwards it to /slow.php for processing.
  3. Time: 10:23:41.656706.
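
Each entry carries the process ID, a timestamp (from -tt), and, in angle brackets at the end, the time spent inside the call (from -T). Below is a small sketch of pulling those three fields out with awk, using the two entries above as sample data. The temporary file is only for illustration; on a real system you would point awk at strace.log.

```shell
# The first two entries from strace.log, saved to a throwaway file.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
13173 10:23:41.655745 recvfrom(5, "GET /slow HTTP/1.0\r\nHost: localh"..., 1024, 0, NULL, NULL) = 81 <0.000011>
13173 10:23:41.656706 recvfrom(12, "GET /slow.php HTTP/1.0\r\nHost: lo"..., 1024, 0, NULL, NULL) = 104 <0.000005>
EOF

# Print PID, timestamp, and the in-call duration (the last field, minus < and >).
fields=$(awk '{ d = $NF; gsub(/[<>]/, "", d); print $1, $2, d }' "$sample_log")
printf '%s\n' "$fields"
```

Both calls returned within microseconds, so the recvfrom() calls themselves are not where the time goes.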

The next log entry is the response from PHP.

13173 10:23:43.659573 recvfrom(13, "\1\6\0\1\0,\4\0Content-type: text/html;"..., 4096, 0, NULL, NULL) = 72 <0.000041>


  1. Response received from PHP after 2 seconds (10:23:43.659573).
  2. Delay: 2.003 seconds.
  3. This suggests slow PHP execution (which we simulated).
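
The 2.003-second figure is simply the difference between the timestamps of the forwarded request and the PHP response. Below is a sketch of computing that gap with awk, again using the log lines from above as sample data. It assumes both timestamps fall on the same day, and the temporary file is only for illustration.

```shell
# The forwarded request and the PHP response, saved to a throwaway file.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
13173 10:23:41.656706 recvfrom(12, "GET /slow.php HTTP/1.0\r\nHost: lo"..., 1024, 0, NULL, NULL) = 104 <0.000005>
13173 10:23:43.659573 recvfrom(13, "\1\6\0\1\0,\4\0Content-type: text/html;"..., 4096, 0, NULL, NULL) = 72 <0.000041>
EOF

# Convert each HH:MM:SS.ffffff timestamp to seconds and print the gap
# between an entry and the one before it.
gap=$(LC_ALL=C awk '{
    split($2, t, ":")
    now = t[1] * 3600 + t[2] * 60 + t[3]
    if (NR > 1) printf "%.3f\n", now - prev
    prev = now
}' "$sample_log")
echo "gap between entries: ${gap}s"
```

On a full strace.log, the same approach prints one gap per line, which makes unusually long waits easy to spot.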

Note: This is just an example of identifying slowness in an HTTP request. Depending on the issue, the relevant syscalls and the approach to analysis may vary.

Strace & Performance

Strace is not recommended for production environments.

One of its problems is that it slows down the traced application significantly, which matters most in production.

It uses ptrace(), which pauses the target process twice per system call (entry and exit). It could even lead to application crashes.

This frequent pausing and context-switching adds high overhead to applications.

You can read more about it on Brendan Gregg's blog.

By contrast, eBPF-based tools like bpftrace have minimal overhead.

Wrapping Up

This is just an example to show how to use strace to debug system calls. In real-world scenarios, the syscalls and issues you debug will be different.

Also, strace-related questions are common in interviews for product companies or roles that require Linux knowledge.

Hope this blog was helpful.

About the author
Bibin Wilson

Bibin Wilson (author of over 300 tech tutorials) is a cloud and DevOps consultant with 12+ years of IT experience. He has extensive hands-on experience with public cloud platforms and Kubernetes.
