Debugging non-containerized apps in production is dead easy: ssh to a host, rbspy,
gdb the process or run
rails console to reproduce something in production.
Why Kubernetes makes it harder?
Running a process in a container makes debugging a bit harder: you need to first get into the container with
docker exec (and don't forget to add
--interactive --tty) and only inside the container you'll be able to inspect the running process.
What about using rbspy in Docker? That's even trickier, because the ptrace (a system call that rbspy relies on) is not allowed by default in a container. You'll have to run the container with
--cap-add=SYS_PTRACE flag, and only then you'll be able to
docker exec into it and use rbspy.
Now, let's move on to Kubernetes tips. I wrote this post as a collection of snippets that I use every day when I need to get into a running Ruby process and see what's happening.
$ kubectl get pods NAME READY STATUS RESTARTS AGE secretland-c796bf9df-gmt97 1/1 Running 0 13d secretland-4a145b44d-6xw11 1/1 Running 0 13d
I have two replicas of the secretland app that I've built in the previous post.
Let's get in to container:
$ kubectl exec -i -t secretland-c796bf9df-gmt97 /bin/bash root@secretland-c796bf9df-gmt97:/app# ls Dockerfile Gemfile Gemfile.lock README.md Rakefile app bin config config.ru db lib log package.json public script storage test tmp vendor root@secretland-c796bf9df-gmt97:/app# bin/rails console Loading production environment (Rails 5.2.0.rc2) irb(main):001:0>
Or exec to Rails console directly:
$ kubectl exec -i -t secretland-c796bf9df-gmt97 /app/bin/rails console Loading production environment (Rails 5.2.0.rc2) irb(main):001:0>
Rbspy is a sampling profiler for Ruby programs that was recently released by Julia Evans. Rbspy is an awesome tool that we've been missing in the Ruby world for years, and it makes me so happy that Julia worked on it.
Let's see how to use rbspy on a live process that runs in a Kubernetes Pod. First we need to know that rbspy relies on
ptrace(3). To run a ptrace-enabled container, we need to give a container the
SYS_PTRACE privilege. See
securityContext.capabilities field in the Deployment spec:
apiVersion: extensions/v1beta1 kind: Deployment metadata: name: secretland labels: app: secretland spec: selector: matchLabels: app: secretland template: metadata: labels: app: secretland spec: containers: - image: kirshatrov/secretland:v1 name: rails ports: - containerPort: 3000 securityContext: capabilities: add: - SYS_PTRACE # <-- the privilege
To emulate a slow endpoint, I've created a sample action that calculates Fibonacci number. That way, we'll be able to see where the process spends time in the Ruby land.
After the new Deployment spec is in place, we can exec to the web container and try rbspy. At least in my case, Puma process has
$ kubectl exec -i -t secretland-8dc689458-jstp2 /bin/bash root@secretland-8dc689458-jstp2:/app# rbspy record --pid 1 Time since start: 9s. Press Ctrl+C to stop. Summary of profiling data so far: % self % total name 100.00 100.00 <c function> - unknown 0.00 79.41 block in start! - /usr/local/bundle/gems/puma-3.11.3/lib/puma/thread_pool.rb 0.00 20.59 run_internal - /usr/local/bundle/gems/puma-3.11.3/lib/puma/reactor.rb 0.00 20.59 block in run_in_thread - /usr/local/bundle/gems/puma-3.11.3/lib/puma/reactor.rb
The profile changes after I hit the
/slowpath endpoint in browser:
% self % total name 70.56 100.00 <c function> - unknown 29.44 29.44 fibonacci - /app/app/controllers/helloworld_controller.rb 0.00 67.89 block in start! - /usr/local/bundle/gems/puma-3.11.3/lib/puma/thread_pool.rb 0.00 32.11 block in spawn_thread - /usr/local/bundle/gems/puma-3.11.3/lib/puma/thread_pool.rb 0.00 30.97 process_client - /usr/local/bundle/gems/puma-3.11.3/lib/puma/server.rb 0.00 30.97 block in run - /usr/local/bundle/gems/puma-3.11.3/lib/puma/server.rb 0.00 29.44 tagged - /usr/local/bundle/gems/activesupport-5.2.0.rc2/lib/active_support/tagged_logging.rb 0.00 29.44 slow - /app/app/controllers/helloworld_controller.rb 0.00 29.44 serve - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_dispatch/routing/route_set.rb 0.00 29.44 serve - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_dispatch/journey/router.rb 0.00 29.44 send_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/basic_implicit_render.rb 0.00 29.44 run_callbacks - /usr/local/bundle/gems/activesupport-5.2.0.rc2/lib/active_support/callbacks.rb 0.00 29.44 process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/rescue.rb 0.00 29.44 process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/rendering.rb 0.00 29.44 process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/params_wrapper.rb 0.00 29.44 process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/action_controller/metal/instrumentation.rb 0.00 29.44 process_action - /usr/local/bundle/gems/actionpack-5.2.0.rc2/lib/abstract_controller/callbacks.rb
Yay it works!
Getting to Docker
What if you need to get directly to the Docker daemon? Describe the pod, see what Node it's running on, then ssh to that instance.
$ kubectl describe pod secretland-8dc689458-jstp2 | grep Node Node: gke-kirs-jobs-default-pool-4a145b44-t690/10.128.0.3 Node-Selectors: <none> $ gcloud compute ssh gke-kirs-jobs-default-pool-4a145b44-t690 --zone us-central1-a kir@gke-kirs-jobs-default-pool-4a145b44-t690 ~ $ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES fc10153238a0 kirshatrov/secretland@sha256:2e6d8341f51ebe7393d2a7c770c29fbaf959e3317b628d0dc5ebbb19c923d29c "rails server -b 0
I'm using Google Cloud, so instead of sshing directly I use
gcloud compute ssh.
gdb might be useful for dumping MRI call stack, for instance when you want to find out why a Ruby process is stuck. See the script to dump call stack by my colleague Scott.
I haven't found a proper way to run gdb from a Kubernetes Pod yet, because gdb can't find Ruby's symbols:
$ kubectl exec -i -t secretland-8dc689458-jstp2 /bin/bash root@secretland-8dc689458-jstp2:/app# ps aux USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 1 0.0 1.5 709816 58080 ? Ssl Apr07 0:08 puma 3.11.3 (tcp://0.0.0.0:3000) [app] root 43 0.0 0.0 18204 3280 ? Ss 02:29 0:00 /bin/bash root 53 0.0 0.0 36636 2808 ? R+ 02:31 0:00 ps aux root@secretland-8dc689458-jstp2:/app# gdb --pid 1 GNU gdb (Debian 7.12-6) 188.8.131.5261007-git ... Attaching to process 1 [New LWP 6] ... [New LWP 16] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 185 ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
Though I have a feeling that with the recent release of rbspy I won't need to use gdb much anymore, because 1) unlike gdb, rbspy doesn't pause the process 2) rbspy is way more user friendly.
Please share any snippets that you find helpful and I'll include them in the post.