Why doesn't 'docker stop' work?
Tags: [docker]
Published: 24 Jan 2016 16:57

I’ve been playing around with docker on one of my VPS machines recently, moving applications from another VPS and containerizing as I go along.

Today, I CTRL-C’ed a docker build run that I realized wouldn’t work, and after that I couldn’t do anything with that container.

For hours…

CONTAINER ID  IMAGE        COMMAND                CREATED     STATUS     PORTS NAMES
8b46f61ea788  2c32664e08a0 "/bin/sh -c 'apt-get " 2 hours ago Up 2 hours       evil_goldstine

I pressed CTRL-C in the middle of a step installing openssh-server. After that, I couldn’t sudo docker stop this container; the command never returned.

I couldn’t see what was running in there, because:

$ sudo docker exec 8b46f61ea788 ps afx
nsenter: Unable to fork: Cannot allocate memory

What process was running?

 8204 ?        Ssl   18:28 /usr/bin/docker daemon -H fd://
28649 ?        Ss     0:00  \_ [sh]
28748 ?        Rs   172:03      \_ /usr/bin/dpkg --status-fd 13 --unpack
                                 --auto-deconfigure
                                 /var/cache/apt/archives/libwrap0_7.6.q-25_amd64.deb
                                 /var/cache/apt/archives/ncurses-term_5.9+20140913-1_all.deb
                                 /var/cache/apt/archives/openssh-sftp-server_1%3a6.7p1-5_amd64.deb
                                 /var/cache/apt/archives/openssh-server_1%3a6.7p1-5_amd64.deb
                                 /var/cache/apt/archives/tcpd_7.6.q-25_amd64.deb

sudo kill -9 28748 didn’t work.
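
One common reason for kill -9 appearing to do nothing is that a process only acts on SIGKILL once it leaves kernel code, so a task stuck inside a system call cannot be killed from outside. A minimal sketch for checking the state letter of a process, assuming a Linux /proc filesystem (proc_state is a hypothetical helper name, not something from the original session):

```shell
#!/bin/sh
# Print the scheduler state letter of a process (R, S, D, Z, ...),
# read from the "State:" line of /proc/<pid>/status (Linux only).
proc_state() {
    # $1: PID to inspect
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Example: inspect the current shell. For the stuck dpkg above,
# the argument would be 28748 instead.
proc_state "$$"
```

The ps listing above already shows this letter in its third column (Rs for dpkg, i.e. runnable). On kernels that expose it, sudo cat /proc/28748/stack would additionally show where in the kernel the task currently is, which might have helped confirm or rule out a filesystem problem.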

Performing a docker inspect showed the container directory:

$ sudo docker inspect 8b46f61ea788

# cd /var/lib/docker/containers/8b46f61ea788904aa52c1e5ea48dcb1e91de582dfe00966f797d36bf1d352855
# tail -n1 8b46f61ea788904aa52c1e5ea48dcb1e91de582dfe00966f797d36bf1d352855-json.log
{"log":"Unpacking ncurses-term (5.9+20140913-1) ...\r\n","stream":"stdout","time":"2016-01-24T17:56:37.965639089Z"}
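
The log path used above follows a fixed pattern built from the full container ID. A small sketch of that pattern, assuming the default /var/lib/docker root seen in this session (container_log_path is a hypothetical helper name):

```shell
#!/bin/sh
# Build the path of a container's JSON log from its full 64-character ID,
# following the directory layout seen above. The root directory defaults
# to /var/lib/docker and can be overridden with a second argument.
container_log_path() {
    id=$1
    root=${2:-/var/lib/docker}
    printf '%s/containers/%s/%s-json.log\n' "$root" "$id" "$id"
}

# Example with the container from this post.
container_log_path 8b46f61ea788904aa52c1e5ea48dcb1e91de582dfe00966f797d36bf1d352855
```

The full ID itself can be printed with sudo docker inspect --format '{{.Id}}' 8b46f61ea788. On Docker versions whose inspect output includes a LogPath field, --format '{{.LogPath}}' should give the log file directly, which avoids assuming the layout at all.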

Since the last log line was an unpack step, my best guess was that the process was stuck writing to docker’s aufs-mounted filesystem.

I tried asking what work the process was doing:

$ sudo strace -p 28748
Process 28748 attached

Although the process was consuming ~100% of the CPU, strace printed nothing more while I waited. From strace’s perspective, the process wasn’t making any system calls.

I tried to CTRL-C strace, and that didn’t work either. Maybe strace never attached and was waiting on something system related? Maybe strace attached and was stuck in something related to the process itself? I’m not sure.
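
strace only reports system call entries and exits (plus signals), so silence at 100% CPU is consistent with code spinning either in user space or inside a single system call in the kernel. One way to tell the two apart, sketched here as an assumption rather than something that was run at the time, is to compare the user and kernel CPU counters in /proc (cpu_split is a hypothetical helper name):

```shell
#!/bin/sh
# Print user-mode and kernel-mode CPU time (in clock ticks) for a PID,
# taken from fields 14 (utime) and 15 (stime) of /proc/<pid>/stat.
cpu_split() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Drop "pid (comm) " first: comm may itself contain spaces or parentheses.
    rest=${stat##*") "}
    # Word-split the remainder; $1 is now field 3 (state), so utime is $12.
    set -- $rest
    echo "utime=${12} stime=${13}"
}

# Example: inspect the current shell. For the stuck dpkg above,
# the argument would be 28748 instead.
cpu_split "$$"
```

Running this twice a few seconds apart shows which counter is growing. If only stime increases, the CPU time is being spent in kernel code, which would fit the guess about the aufs filesystem.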

There were lots of dmesg messages, including:

[801548.782477] auplink[29302]: segfault at 7ffddabd7888 ip 00007f387eaba7f9 sp 00007ffddabd7890 error 6 in libc-2.19.so[7f387e9df000+19f000]

There were also lots of aufs-related messages saying "dirperm1 breaks the protection by the permission bits on the lower branch".

At this point, I tried to reboot:

$ sudo reboot

But even that didn’t work: the system never went down. I had to reboot the instance from the VPS’s web management console.

After the reboot, the container was stopped and could be removed without problems:

$ sudo docker rm 8b46f61ea788
8b46f61ea788
$

Very frustrating.