I’ve been playing around with docker on one of my VPS machines recently, moving applications from another VPS and containerizing as I go along.
Today, I CTRL-C’ed a docker build run that I realized wouldn’t work, and after that I couldn’t do anything with that container.
For hours…
CONTAINER ID   IMAGE          COMMAND                  CREATED       STATUS       PORTS   NAMES
8b46f61ea788   2c32664e08a0   "/bin/sh -c 'apt-get "   2 hours ago   Up 2 hours           evil_goldstine
I pressed CTRL-C in the middle of a step installing openssh-server.
After that, I couldn’t sudo docker stop this container; the command never returned.
I couldn’t see what was running in there, because:
$ sudo docker exec 8b46f61ea788 ps afx
nsenter: Unable to fork: Cannot allocate memory
What process was running?
8204 ? Ssl 18:28 /usr/bin/docker daemon -H fd://
28649 ? Ss 0:00 \_ [sh]
28748 ? Rs 172:03 \_ /usr/bin/dpkg --status-fd 13 --unpack
--auto-deconfigure
/var/cache/apt/archives/libwrap0_7.6.q-25_amd64.deb
/var/cache/apt/archives/ncurses-term_5.9+20140913-1_all.deb
/var/cache/apt/archives/openssh-sftp-server_1%3a6.7p1-5_amd64.deb
/var/cache/apt/archives/openssh-server_1%3a6.7p1-5_amd64.deb
/var/cache/apt/archives/tcpd_7.6.q-25_amd64.deb
Running sudo kill -9 28748 didn’t work.
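When kill -9 appears to do nothing, it can help to look at the process state and signal masks that the kernel exposes under /proc. Here is a minimal sketch, assuming a Linux /proc filesystem; the function name proc_state is made up for illustration. As far as I understand it, a signal is only acted on once the target leaves kernel code, so a process stuck inside the kernel can keep SIGKILL queued indefinitely.

```shell
# Hypothetical helper: show scheduler state and signal masks for a PID.
proc_state() {
    pid="$1"
    # State is R (running), S (sleeping), D (uninterruptible sleep), Z (zombie), ...
    grep '^State:' "/proc/$pid/status"
    # Hex masks of pending (per-thread and shared) and blocked signals.
    grep -E '^(SigPnd|ShdPnd|SigBlk):' "/proc/$pid/status"
}
```

For the process above this would be called as proc_state 28748.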
Performing a docker inspect showed the container directory:
$ sudo docker inspect 8b46f61ea788
# cd /var/lib/docker/containers/8b46f61ea788904aa52c1e5ea48dcb1e91de582dfe00966f797d36bf1d352855
# tail -n1 8b46f61ea788904aa52c1e5ea48dcb1e91de582dfe00966f797d36bf1d352855-json.log
{"log":"Unpacking ncurses-term (5.9+20140913-1) ...\r\n","stream":"stdout","time":"2016-01-24T17:56:37.965639089Z"}
Since the last log line was an unpack step, my best guess was that the process was stuck writing to docker’s aufs-mounted filesystem.
I tried asking what work the process was doing:
$ sudo strace -p 28748
Process 28748 attached
Although this process was consuming ~100% of the CPU, I waited for a while and did not see anything from strace. From strace’s perspective, the process didn’t appear to be making any system calls.
I tried to CTRL-C strace, and that didn’t work either. Maybe strace never attached and was waiting on something system related? Maybe strace attached and was stuck in something related to the process itself? I’m not sure.
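One way to probe the guess that the process was busy inside the kernel is to compare its user-mode and kernel-mode CPU time. This is a sketch, assuming the /proc/PID/stat field layout documented in proc(5); cpu_split is a hypothetical helper name. If stime keeps climbing between two calls while utime stays flat, the CPU is being burned in kernel code, which would fit with strace printing nothing.

```shell
# Hypothetical helper: print state, user time and kernel time (in clock ticks).
cpu_split() {
    pid="$1"
    # Drop "pid (comm) "; the comm field may itself contain spaces or parens.
    rest=$(sed 's/^.*) //' "/proc/$pid/stat")
    # In the remainder, field 1 is the state, 12 is utime and 13 is stime.
    set -- $rest
    echo "state=$1 utime=${12} stime=${13}"
}
```

Running cpu_split 28748 twice, a few seconds apart, shows which of the two counters is moving.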
There were lots of dmesg messages, including:
[801548.782477] auplink[29302]: segfault at 7ffddabd7888 ip 00007f387eaba7f9 sp 00007ffddabd7890 error 6 in libc-2.19.so[7f387e9df000+19f000]
There were also lots of aufs-related messages saying "dirperm1 breaks the protection by the permission bits on the lower branch".
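To pull just these lines out of a noisy kernel log, a small filter is enough. This is a sketch; aufs_lines is a name I made up, and the pattern only covers the two kinds of message quoted above.

```shell
# Hypothetical helper: keep only kernel log lines that mention aufs or auplink.
aufs_lines() {
    grep -i -E 'aufs|auplink'
}
# Usage (may need root): dmesg | aufs_lines | tail -n 20
```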
At this point, I rebooted:
$ sudo reboot
But even that didn’t work; the system never went down. I had to reboot the instance from the VPS’s web management console.
After the reboot, the container was stopped and could be removed without problems:
$ sudo docker rm 8b46f61ea788
8b46f61ea788
$
Very frustrating.