k3s.live

Based on the IT journey of Michael Rickert

Ansible AWX-Operator Postgres 13 -> 15 upgrade

Looking for a way to recover your awx-operator helm install of Ansible AWX after upgrading from version 2.12.x or lower to 2.13.x or higher, only to find that postgres 15 can't start and the entire service is down? Look no further.

Step 1:

First, let's make sure our postgres database is safe if anything strange happens. Make sure the PV (persistent volume) backing the database has the following under spec (by editing the PV yaml). This can be changed at any time without risk of downtime.

kubectl edit pv pvc-<your-pv-id>

spec:
  persistentVolumeReclaimPolicy: Retain
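If you would rather not open an editor, the same change can be made with kubectl patch. The sketch below wraps it in a small helper; the function name retain_pv is made up for this example, and the PV name is whatever kubectl get pv lists for your postgres volume.

```shell
# Set the reclaim policy of a PV to Retain and print the resulting policy.
# retain_pv is a hypothetical helper name, not part of kubectl.
# Usage: retain_pv <pv-name>
retain_pv() {
  pv_name="$1"
  kubectl patch pv "$pv_name" \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' >/dev/null || return 1
  # Read the policy back so a typo in the PV name does not go unnoticed.
  kubectl get pv "$pv_name" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
}
```

Calling retain_pv pvc-<your-pv-id> should print Retain once the patch has been applied.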

Let's also create an awx database backup, which can be used later if we need to restore our database to a known good state. Create a new file, backup.yaml:

apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awxbackup-0-0-1
  namespace: awx
spec:
  deployment_name: awx

kubectl apply -f backup.yaml
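The backup runs asynchronously, so it can help to wait for it before touching anything else. The helper below is a rough sketch: wait_for_awxbackup is a made-up name, and it assumes that a finished backup records its location in .status.backupDirectory on the AWXBackup object.

```shell
# Poll an AWXBackup until the operator records a backup directory in its status.
# wait_for_awxbackup is a hypothetical helper name.
# Usage: wait_for_awxbackup <name> [namespace] [max_tries] [sleep_seconds]
wait_for_awxbackup() {
  name="$1"; ns="${2:-awx}"; tries="${3:-60}"; pause="${4:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # ASSUMPTION: a completed backup exposes .status.backupDirectory
    dir=$(kubectl get awxbackup "$name" -n "$ns" -o jsonpath='{.status.backupDirectory}')
    if [ -n "$dir" ]; then
      echo "$dir"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "backup $name not finished after $tries checks" >&2
  return 1
}
```

For the backup above, wait_for_awxbackup awxbackup-0-0-1 would check every 10 seconds for up to 10 minutes and print the backup directory once it appears.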

Step 2:

Once the backup completes (check with kubectl describe awxbackup -n awx awxbackup-0-0-1), upgrade the awx-operator helm chart to the latest version. This will break awx while we perform the next few steps; don’t panic.

helm repo update
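# helm upgrade needs both a release name and a chart; this assumes the chart repo was added under the alias "awx-operator"
helm upgrade awx-operator awx-operator/awx-operator --namespace awx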

Also upgrade the CRDs:

kubectl apply --server-side --force-conflicts -k github.com/ansible/awx-operator/config/crd

After the upgrade process finishes and the postgres15 pod tries (and fails) to come online, proceed to the next step.
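To confirm that the postgres15 pod really is failing before moving on, you can ask Kubernetes why its container is waiting. This is only a sketch: pod_waiting_reason is a made-up helper name, and the pod name awx-postgres-15-0 used in the note below is an assumption derived from the PVC name in Step 3.

```shell
# Print the waiting reason (for example CrashLoopBackOff) of the first
# container in a pod. pod_waiting_reason is a hypothetical helper name.
# Usage: pod_waiting_reason <pod> [namespace]
pod_waiting_reason() {
  pod="$1"; ns="${2:-awx}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'
}
```

Running pod_waiting_reason awx-postgres-15-0 prints nothing while the container is running. If it reports a crash loop, kubectl logs -n awx awx-postgres-15-0 --previous shows the output of the last failed start, which is where a permission error would appear.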

Step 3:

Now we’re going to create a temporary pod that mounts the awx postgres PVC so that we can fix the permission issues. Create a new yaml file (pvc.yaml) for the temporary pod; if the PVC created by awx-operator or your namespace is different, edit as needed. Even if your storage backing is ReadWriteOnce, this should still succeed, as the pod will start up between the postgres15 pod's CrashLoopBackOff restarts:

apiVersion: v1
kind: Pod
metadata:
  name: pvc-inspector
  namespace: awx
spec:
  containers:
  - image: busybox
    name: pvc-inspector
    command: ["tail"]
    args: ["-f", "/dev/null"]
    volumeMounts:
    - mountPath: /pvc
      name: pvc-mount
  volumes:
  - name: pvc-mount
    persistentVolumeClaim:
      claimName: postgres-15-awx-postgres-15-0

Step 4:

Create the pod based on the yaml file created in the previous step:

kubectl apply -f pvc.yaml
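Because the temporary pod has to grab the volume between postgres15 restarts, it may take a moment to start. A small wrapper around kubectl wait, sketched here with the made-up name wait_for_pod_ready, blocks until the pod reports Ready or the timeout expires.

```shell
# Block until a pod reports the Ready condition, or fail after the timeout.
# wait_for_pod_ready is a hypothetical helper name.
# Usage: wait_for_pod_ready <pod> [namespace] [timeout]
wait_for_pod_ready() {
  pod="$1"; ns="${2:-awx}"; timeout="${3:-120s}"
  kubectl wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout="$timeout"
}
```

For this guide that would be wait_for_pod_ready pvc-inspector; a non-zero exit status means the pod never became Ready within the timeout.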

Step 5:

Exec into the temporary pod and change the ownership of the postgres data directory to UID/GID 26 (the user the postgres15 container runs as):

kubectl exec -it -n awx pvc-inspector -- /bin/sh

> chown -R 26:26 /pvc/data/
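The same fix can be applied without an interactive shell, which also makes it easy to check the result. The sketch below uses a made-up helper name, fix_pg_ownership, with defaults taken from this guide (pod pvc-inspector, namespace awx, path /pvc/data/, owner 26:26).

```shell
# Change ownership of the data directory inside the helper pod, then print the
# numeric owner of that directory as "uid:gid" for verification.
# fix_pg_ownership is a hypothetical helper name.
# Usage: fix_pg_ownership [pod] [namespace] [path] [owner]
fix_pg_ownership() {
  pod="${1:-pvc-inspector}"; ns="${2:-awx}"
  path="${3:-/pvc/data/}"; owner="${4:-26:26}"
  kubectl exec -n "$ns" "$pod" -- chown -R "$owner" "$path" || return 1
  # ls -ldn prints numeric IDs; fields 3 and 4 are the uid and gid.
  kubectl exec -n "$ns" "$pod" -- ls -ldn "$path" | awk '{print $3 ":" $4}'
}
```

Called with no arguments, it should print 26:26 if the chown took effect.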

Step 6:

Tear down the temporary pod:

kubectl delete pod -n awx pvc-inspector

Step 7:

Success! AWX should now be up and running on postgres15 with the latest helm chart release. The migration/upgrade process may take up to 5 minutes to complete, so please be patient.
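While you wait, a quick way to see what is still starting is to list only the pods that have not settled yet. This is a minimal sketch with a made-up helper name, pods_not_settled; it relies on STATUS being the third column of kubectl get pods output.

```shell
# List pods in a namespace whose STATUS is neither Running nor Completed,
# printed as "<name> <status>". pods_not_settled is a hypothetical helper name.
# Usage: pods_not_settled [namespace]
pods_not_settled() {
  ns="${1:-awx}"
  kubectl get pods -n "$ns" --no-headers \
    | awk '$3 != "Running" && $3 != "Completed" {print $1 " " $3}'
}
```

An empty result from pods_not_settled awx means every pod in the namespace is either Running or Completed. A pod can be Running without being ready, so it is still worth checking the READY column before calling the upgrade done.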

Troubleshooting:

If for some reason it's been 10+ minutes and the migration/upgrade process continues to fail, you can try to restore the database backup on top of the failed postgres15 database, assuming the postgres15 pod is online.

Create a new file, restore.yaml:

apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: restore1
  namespace: awx
spec:
  deployment_name: awx
  backup_name: awxbackup-0-0-1

And apply it:

kubectl apply -f restore.yaml

Emergency rescue:

If all else fails, we can restore the old database. First, let's tear down the non-working awx instance:

helm uninstall -n awx awx-operator

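Now let's re-install awx:

# helm install needs both a release name and a chart; this assumes the chart repo was added under the alias "awx-operator"
helm install -n awx awx-operator awx-operator/awx-operator --version 2.12.2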

Be sure to edit the values.yaml to set:

AWX:
  enabled: true

Wait for awx to come fully online. With any luck it will pick up the retained postgres13 database and things will be back online after several minutes. If it's still not online after ~10 minutes, you can then restore from the backup with an AWXRestore resource.

Create a new file, restore.yaml:

apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: restore1
  namespace: awx
spec:
  deployment_name: awx
  backup_name: awxbackup-0-0-1

And apply it:

kubectl apply -f restore.yaml
