Debugging Ruby processes in Kubernetes

Debugging non-containerized apps in production is dead easy: ssh to a host, rbspy, strace or gdb the process or run rails console to reproduce something in production.

Why Kubernetes makes it harder?

Running a process in a container makes debugging a bit harder: you need to first get into the container with docker exec (and don't forget to add --interactive --tty) and only inside the container you'll be able to inspect the running process.

What about using rbspy in Docker? That's even trickier, because the ptrace (a system call that rbspy relies on) is not allowed by default in a container. You'll have to run the container with --cap-add=SYS_PTRACE flag, and only then you'll be able to docker exec into it and use rbspy.

Tips

Now, let's move on to Kubernetes tips. I wrote this post as a collection of snippets that I use every day when I need to get into a running Ruby process and see what's happening.

$ kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
secretland-c796bf9df-gmt97   1/1       Running   0          13d
secretland-4a145b44d-6xw11   1/1       Running   0          13d

I have two replicas of the secretland app that I've built in the previous post.

Let's get in to container:

$ kubectl exec -i -t secretland-c796bf9df-gmt97 /bin/bash
root@secretland-c796bf9df-gmt97:/app# ls
Dockerfile  Gemfile  Gemfile.lock  README.md  Rakefile  app  bin  config  config.ru  db  lib  log  package.json  public  script  storage  test  tmp  vendor
root@secretland-c796bf9df-gmt97:/app# bin/rails console
Loading production environment (Rails 5.2.0.rc2)
irb(main):001:0>

Or exec to Rails console directly:

$ kubectl exec -i -t secretland-c796bf9df-gmt97 /app/bin/rails console
Loading production environment (Rails 5.2.0.rc2)
irb(main):001:0>

Rbspy

Rbspy is a sampling profiler for Ruby programs that was recently released by Julia Evans. Rbspy is an awesome tool that we've been missing in the Ruby world for years, and it makes me so happy that Julia worked on it.

Let's see how to use rbspy on a live process that runs in a Kubernetes Pod. First we need to know that rbspy relies on ptrace(3). To run a ptrace-enabled container, we need to give a container the SYS_PTRACE privilege. See securityContext.capabilities field in the Deployment spec:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: secretland
  labels:
    app: secretland
spec:
  selector:
    matchLabels:
      app: secretland
  template:
    metadata:
      labels:
        app: secretland
    spec:
      containers:
      - image: kirshatrov/secretland:v1
        name: rails
        ports:
        - containerPort: 3000
        securityContext:
          capabilities:
            add:
            - SYS_PTRACE # <-- the privilege

To emulate a slow endpoint, I've created a sample action that calculates Fibonacci number. That way, we'll be able to see where the process spends time in the Ruby land.

After the new Deployment spec is in place, we can exec to the web container and try rbspy. At least in my case, Puma process has pid equal 1.

$ kubectl exec -i -t secretland-8dc689458-jstp2 /bin/bash
root@secretland-8dc689458-jstp2:/app# rbspy record --pid 1
Time since start: 9s. Press Ctrl+C to stop.
Summary of profiling data so far:
% self  % total  name
100.00   100.00  <c function> - unknown
  0.00    79.41  block in start! - /usr/local/bundle/gems/puma-3.11.3/lib/puma/thread_pool.rb
  0.00    20.59  run_internal - /usr/local/bundle/gems/puma-3.11.3/lib/puma/reactor.rb
  0.00    20.59  block in run_in_thread - /usr/local/bundle/gems/puma-3.11.3/lib/puma/reactor.rb

The profile changes after I hit the /slowpath endpoint in browser:

% self  % total  name
 70.56   100.00  <c function> - unknown
 29.44    29.44  fibonacci - /app/app/controllers/helloworld_controller.rb
  0.00    67.89  block in start! - /usr/local/bundle/gems/puma-3.11.3/lib/puma/thread_pool.rb
  0.00    32.11  block in spawn_thread - /usr/local/bundle/gems/puma-3.11.3/lib/puma/thread_pool.rb
  0.00    30.97  process_client - /usr/local/bundle/gems/puma-3.11.3/lib/puma/server.rb
  0.00    30.97  block in run - /usr/local/bundle/gems/puma-3.11.3/lib/puma/server.rb
  0.00    29.44  tagged - /usr/local/bundle/gems/activesupport-5.2.0.rc2/lib/active_support/tagged_logging.rb
  0.00    29.44  slow - /app/app/controllers/helloworld_controller.rb
  0.00    29.44  serve - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_dispatch/routing/route_set.rb
  0.00    29.44  serve - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_dispatch/journey/router.rb
  0.00    29.44  send_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/basic_implicit_render.rb
  0.00    29.44  run_callbacks - /usr/local/bundle/gems/activesupport-5.2.0.rc2/lib/active_support/callbacks.rb
  0.00    29.44  process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/rescue.rb
  0.00    29.44  process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/rendering.rb
  0.00    29.44  process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/params_wrapper.rb
  0.00    29.44  process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/instrumentation.rb
  0.00    29.44  process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/abstract_controller/callbacks.rb

Yay it works!

Getting to Docker

What if you need to get directly to the Docker daemon? Describe the pod, see what Node it's running on, then ssh to that instance.

$ kubectl describe pod secretland-8dc689458-jstp2 | grep Node
Node:           gke-kirs-jobs-default-pool-4a145b44-t690/10.128.0.3
Node-Selectors:  <none>

$ gcloud compute ssh gke-kirs-jobs-default-pool-4a145b44-t690 --zone us-central1-a

kir@gke-kirs-jobs-default-pool-4a145b44-t690 ~ $ docker ps
CONTAINER ID        IMAGE                                                                                                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
fc10153238a0        kirshatrov/secretland@sha256:2e6d8341f51ebe7393d2a7c770c29fbaf959e3317b628d0dc5ebbb19c923d29c                            "rails server -b 0

I'm using Google Cloud, so instead of sshing directly I use gcloud compute ssh.

gdb

gdb might be useful for dumping MRI call stack, for instance when you want to find out why a Ruby process is stuck. See the script to dump call stack by my colleague Scott.

I haven't found a proper way to run gdb from a Kubernetes Pod yet, because gdb can't find Ruby's symbols:

$ kubectl exec -i -t secretland-8dc689458-jstp2 /bin/bash
root@secretland-8dc689458-jstp2:/app# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  1.5 709816 58080 ?        Ssl  Apr07   0:08 puma 3.11.3 (tcp://0.0.0.0:3000) [app]
root        43  0.0  0.0  18204  3280 ?        Ss   02:29   0:00 /bin/bash
root        53  0.0  0.0  36636  2808 ?        R+   02:31   0:00 ps aux
root@secretland-8dc689458-jstp2:/app# gdb --pid 1
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
...
Attaching to process 1
[New LWP 6]
...
[New LWP 16]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.

Though I have a feeling that with the recent release of rbspy I won't need to use gdb much anymore, because 1) unlike gdb, rbspy doesn't pause the process 2) rbspy is way more user friendly.

Please share any snippets that you find helpful and I'll include them in the post.

Written in April 2018.
Kir Shatrov

Kir Shatrov helps businesses to grow by scaling the infrastructure. He writes about software, scalability and the ecosystem. Follow him on Twitter to get the latest updates.