Forcing fsck to repair a corrupted CentOS root on systemd-based hosts

May 27, 2019

WARNING: This is just a pile of text that probably isn't super useful to anyone, including myself. I've already forgotten how I got some of the data off the disk. In the end, I destroyed the VM's disk and rebuilt it from scratch.

This last weekend I had an unfortunate incident with one of my remote VMs. The underlying host crashed, hard, taking my VM down with it in an unclean manner. And lucky me, that left a bunch of docker containers with corrupted filesystems lying around!

May 27 22:38:40 blimp.aether.earth dockerd[9767]: Error starting daemon: error initializing graphdriver: lstat /var/lib/docker/overlay2/280979306bc8a5a038282a1216fc887c443d1b9756a74641c2e15f3259e6aeda: input/output error

ls: cannot access /var/lib/docker/overlay2/280979306bc8a5a038282a1216fc887c443d1b9756a74641c2e15f3259e6aeda: Input/output error

rm: cannot remove ‘/var/lib/docker/overlay2/280979306bc8a5a038282a1216fc887c443d1b9756a74641c2e15f3259e6aeda’: Input/output error

To force a check and repair of the root filesystem on the next boot, I turned to the systemd-fsck documentation: https://www.freedesktop.org/software/systemd/man/systemd-fsck@.service.html

Edit grub config:

vim /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
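The exact change made to /etc/default/grub isn't recorded here. Going by the systemd-fsck@.service documentation linked above, the kernel command-line options that force a check at boot and let fsck apply repairs without asking are fsck.mode=force and fsck.repair=yes. Below is a minimal POSIX shell sketch that appends them to GRUB_CMDLINE_LINUX, assuming those are the options you want and that the file has a single GRUB_CMDLINE_LINUX="..." line, as on a stock CentOS 7 install. The helper names add_fsck_args and remove_fsck_args are hypothetical.

```shell
# Sketch: add/remove systemd-fsck's kernel options in /etc/default/grub.
# Assumes one GRUB_CMDLINE_LINUX="..." line (stock CentOS 7 layout).
# add_fsck_args / remove_fsck_args are hypothetical helper names.

FSCK_ARGS="fsck.mode=force fsck.repair=yes"

add_fsck_args() {
    grub_file=${1:-/etc/default/grub}
    # Do nothing if the options are already there, so reruns don't duplicate them.
    if grep -q 'fsck\.mode=force' "$grub_file"; then
        return 0
    fi
    # Insert the options just before the closing quote of GRUB_CMDLINE_LINUX.
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 $FSCK_ARGS\"/" "$grub_file"
}

remove_fsck_args() {
    grub_file=${1:-/etc/default/grub}
    # Strip the options again so later boots don't force a full check every time.
    sed -i "s/ $FSCK_ARGS//" "$grub_file"
}
```

Run add_fsck_args as root, then regenerate grub.cfg and reboot as above. Because fsck.mode=force checks on every boot, it's worth running remove_fsck_args and regenerating grub.cfg once the filesystem comes back clean. On UEFI installs of CentOS 7 the generated config lives at /boot/efi/EFI/centos/grub.cfg rather than /boot/grub2/grub.cfg.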

recovering dotfiles

This is the easiest one to fix, since it's all stored in my public GitLab repo:

rm -rf ~/dotfiles
git clone
rake

recovering yum

One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:

file is encrypted or is not a database

Running file on the failed repo's database returns just "data" rather than recognizing it as an SQLite 3 database, which indicates it's been corrupted.

file /var/cache/yum/x86_64/7/jdoss-wireguard/gen/primary_db.sqlite
/var/cache/yum/x86_64/7/jdoss-wireguard/gen/primary_db.sqlite:  data

Deleting it fixed some things, but it wasn't the complete solution.
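Checking each cached database with file gets tedious. Every SQLite 3 database begins with the 16-byte header "SQLite format 3" followed by a NUL byte, the same signature file matches on, so a short shell sketch can scan a directory for .sqlite files that have lost it. The helper names is_sqlite and find_bad_sqlite are hypothetical, and the default path assumes the yum cache layout shown above.

```shell
# Sketch: find *.sqlite files that don't start with the SQLite 3 header.
# is_sqlite / find_bad_sqlite are hypothetical helper names.

# Hex of the 16-byte header: "SQLite format 3" followed by a NUL byte.
SQLITE_MAGIC_HEX=53514c69746520666f726d6174203300

is_sqlite() {
    # Hex-dump the first 16 bytes so the trailing NUL survives the comparison.
    [ "$(head -c 16 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "$SQLITE_MAGIC_HEX" ]
}

find_bad_sqlite() {
    # Print every *.sqlite file under the directory (default: yum cache) that fails the check.
    find "${1:-/var/cache/yum}" -type f -name '*.sqlite' | while IFS= read -r f; do
        is_sqlite "$f" || printf '%s\n' "$f"
    done
}
```

A matching header only shows that the first bytes survived; pages further into the file can still be damaged, which would surface as a "database disk image is malformed" error instead.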

Next, delete all of the downloaded (compressed) SQLite package databases:

sudo rm /var/cache/yum/x86_64/7/*/*primary.sqlite.bz2

That got me a step closer: yum's error message changed to 'database disk image is malformed'. So I cleared yum's cache entirely and rebuilt it:

yum clean all
yum makecache

Seemed to work, but now I'm getting a bunch of rpm errors:

error: rpmdbNextIterator: skipping h#     500 region trailer: BAD, tag 962398765 type 542860144 offset -1969648229 count 1901928553

On top of that, yum update was freezing and pinning a CPU at 100%. Rebuild the RPMDB:

rm -f /var/lib/rpm/__db*
rpm --rebuilddb
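Before deleting the __db* files and rebuilding, it can be worth keeping a copy of /var/lib/rpm so a botched rebuild can be rolled back. A minimal sketch, assuming root privileges and enough free space next to the RPM database; rebuild_rpmdb is a hypothetical helper name, and the rm and rpm --rebuilddb steps are the ones used above, with rpm's --dbpath option added so the directory can be passed in.

```shell
# Sketch: back up the RPM database, then run the rebuild steps from above.
# rebuild_rpmdb is a hypothetical helper name; run it as root.

rebuild_rpmdb() {
    dbpath=${1:-/var/lib/rpm}
    backup="$dbpath.bak.$(date +%Y%m%d%H%M%S)"
    # Copy the whole directory first; stop if the backup fails.
    cp -a "$dbpath" "$backup" || return 1
    # __db.* are Berkeley DB environment files; stale ones left by a crash can wedge rpm/yum.
    rm -f "$dbpath"/__db*
    # Rebuild the database indexes in place.
    rpm --dbpath "$dbpath" --rebuilddb
}
```

If the rebuild makes things worse, the timestamped copy next to /var/lib/rpm can be moved back into place.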

That worked, but now yum check reports a bunch of errors.

In particular:

/sbin/ldconfig: /lib64/libseaudit.so.4 is not an ELF file - it has the wrong magic bytes at the start.
/sbin/ldconfig: /lib64/libseaudit.so.4 is not a symbolic link

I figured out that libseaudit.so.4 is owned by the setools-libs package, so reinstalling that package fixes the error:

sudo yum reinstall setools-libs
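With more than one library affected, the same lookup can be scripted: collect the paths ldconfig rejects as non-ELF and ask rpm -qf which package owns each one. A sketch under the assumption that ldconfig's messages look like the ones quoted above; broken_libs and owning_packages are hypothetical helper names.

```shell
# Sketch: turn ldconfig's "not an ELF file" complaints into a package list.
# Assumes messages shaped like:
#   /sbin/ldconfig: /lib64/libfoo.so.N is not an ELF file - ...
# broken_libs / owning_packages are hypothetical helper names.

broken_libs() {
    # Read ldconfig output on stdin; print each rejected path once.
    sed -n 's|^[^:]*: \(/[^ ]*\) is not an ELF file.*|\1|p' | sort -u
}

owning_packages() {
    # Map each path on stdin to its owning package name, deduplicated.
    while IFS= read -r lib; do
        # rpm -qf exits non-zero for unowned files; report those and keep them out of the list.
        if pkg=$(rpm -qf --queryformat '%{NAME}\n' "$lib" 2>/dev/null); then
            printf '%s\n' "$pkg"
        else
            printf 'no owning package: %s\n' "$lib" >&2
        fi
    done | sort -u
}
```

For example, sudo ldconfig 2>&1 | broken_libs | owning_packages prints the packages to review before handing them to sudo yum reinstall.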

Then I ran yum check again:

yum check obsoleted duplicates dependencies
