Installing AWX is easy. Operating it cleanly is not.
A production-oriented note on AWX with the operator, including namespace design, persistence, exposure, execution environments, validation, backups, and the failure modes that appear after the first successful login.
A first AWX login does not prove that the platform is ready for real work. Most weak installations fail later, when projects need persistent storage, custom execution environments, controlled ingress, backup discipline, and predictable upgrades.
This article uses the AWX Operator on Kubernetes and keeps the scope narrow on purpose. The goal is not to expose every field of the custom resource. The goal is to stand up an AWX instance that can survive the first serious operational questions.
What this article assumes
The examples below assume a Kubernetes cluster that already exists, working storage classes, TLS handled by your ingress layer or reverse proxy strategy, and administrators who prefer pinned versions over floating install commands.
The reference design is simple.
- One namespace dedicated to AWX
- Operator installed in that namespace
- AWX exposed behind ingress instead of a raw public LoadBalancer
- Persistent project storage enabled because many teams still depend on project sync and local content
- External execution environments prepared early instead of accepting ad hoc package drift later
- Backup and restore objects tested before the platform is declared usable
Pin the operator version before you install anything
Do not use a moving install target. Pin the operator version that you have validated in your lab or pre-production environment.
export AWX_NAMESPACE=awx
export AWX_NAME=awx-prod
export AWX_OPERATOR_VERSION=2.19.1

Create the namespace first.

kubectl create namespace ${AWX_NAMESPACE}

Install the operator from a pinned release.

kubectl apply -n ${AWX_NAMESPACE} -f https://github.com/ansible/awx-operator/releases/download/${AWX_OPERATOR_VERSION}/awx-operator.yaml

Check that the operator is actually running before creating an AWX custom resource.

kubectl get pods -n ${AWX_NAMESPACE}
kubectl get deployments -n ${AWX_NAMESPACE}
kubectl logs deployment/awx-operator-controller-manager -n ${AWX_NAMESPACE}

If you skip this validation and apply the AWX resource immediately, you only gain noise. Start with a healthy operator.
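The pinning discipline can also be enforced in the deployment script itself, so an unpinned value never reaches the cluster. A minimal sketch, assuming the same variable names as above; the list of rejected values is illustrative:

```shell
#!/bin/sh
# Refuse to proceed with an unpinned operator version before anything
# is applied to the cluster. The rejected values are illustrative.
AWX_OPERATOR_VERSION=2.19.1

case "$AWX_OPERATOR_VERSION" in
  ''|latest|devel|main)
    echo "refusing unpinned operator version: '${AWX_OPERATOR_VERSION}'" >&2
    exit 1
    ;;
esac

# Build the pinned release manifest URL that kubectl apply will consume.
MANIFEST_URL="https://github.com/ansible/awx-operator/releases/download/${AWX_OPERATOR_VERSION}/awx-operator.yaml"
echo "${MANIFEST_URL}"
```

A guard like this is cheap, and it turns "we always pin" from a convention into a property of the pipeline.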
Create an AWX custom resource that reflects an actual platform choice
A throwaway demo often uses the smallest possible resource and accepts the defaults. That is usually where later problems begin.
This example keeps the service private inside the cluster, enables project persistence, sets a clear hostname, and avoids pretending that the default execution image is a long term operating model.
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-prod
  namespace: awx
spec:
  service_type: ClusterIP
  ingress_type: ingress
  hostname: awx.naxaya.internal
  projects_persistence: true
  projects_storage_access_mode: ReadWriteOnce
  projects_storage_size: 20Gi
  admin_user: admin
  web_resource_requirements:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi
  task_resource_requirements:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2
      memory: 4Gi

Apply it.

kubectl apply -f awx-prod.yaml

Then watch the rollout instead of assuming it will settle correctly.

kubectl get awx -n ${AWX_NAMESPACE}
kubectl get pods -n ${AWX_NAMESPACE} -w

A realistic first validation is not only “pods are Running”. It is that the web, task, migration, Redis, and PostgreSQL-related components converge without repeated restarts.
Retrieve the admin password and confirm the platform object state
The basic install flow is useful, but it should be treated as a checkpoint, not an end state.
kubectl get secret ${AWX_NAME}-admin-password -n ${AWX_NAMESPACE} -o jsonpath='{.data.password}' | base64 --decode && echo

Collect the object status as well.

kubectl describe awx ${AWX_NAME} -n ${AWX_NAMESPACE}
kubectl get svc -n ${AWX_NAMESPACE}
kubectl get ingress -n ${AWX_NAMESPACE}

If the password works but the status is unstable, you are not done. The first login is only the start of the validation sequence.
Ingress and exposure boundaries matter more than the first login screen
For most internal environments, ClusterIP behind a controlled ingress is cleaner than exposing AWX directly as a public LoadBalancer. It keeps the network boundary explicit and lets you align AWX with your existing TLS, reverse proxy, and access control model.
At minimum, validate these points.
kubectl get ingress -n ${AWX_NAMESPACE}
kubectl describe ingress ${AWX_NAME}-ingress -n ${AWX_NAMESPACE}
curl -Ik https://awx.naxaya.internal

If your ingress is terminating TLS, validate that the certificate presented to users matches the chosen hostname and that websocket-related behavior is not broken by an intermediate proxy policy.
Project persistence is not optional if your operating model still uses project sync heavily
A surprising number of AWX installs remain “fine” only as long as they are demos. The moment teams expect local project content, custom inventories, or data that must survive pod churn, the absence of storage becomes visible.
Inspect the resulting persistent volume claims.
kubectl get pvc -n ${AWX_NAMESPACE}
kubectl describe pvc -n ${AWX_NAMESPACE}

If the cluster has weak storage behavior or badly understood reclaim policies, AWX will expose that quickly. This is one of the reasons a successful login is a poor success metric.
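The storage check can be scripted rather than eyeballed. The sketch below counts claim phases in the JSON that `kubectl get pvc -o json` produces; the inline sample stands in for live cluster output:

```shell
#!/bin/sh
# Verify that every PVC is Bound. PVC_JSON would normally come from:
#   kubectl get pvc -n ${AWX_NAMESPACE} -o json
PVC_JSON='{"items": [{"metadata": {"name": "awx-prod-projects-claim"}, "status": {"phase": "Bound"}}]}'

# Count all phase fields, then count only those that are Bound.
TOTAL=$(printf '%s' "$PVC_JSON" | grep -o '"phase"' | wc -l)
BOUND=$(printf '%s' "$PVC_JSON" | grep -o '"phase": "Bound"' | wc -l)

if [ "$TOTAL" -eq "$BOUND" ]; then
  echo "all claims Bound (${BOUND}/${TOTAL})"
else
  echo "unbound claims present (${BOUND}/${TOTAL} Bound)" >&2
fi
```

A check like this belongs in the same validation script as the pod checks, because a Pending claim is exactly the kind of problem that a successful login hides.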
Execution environments should be deliberate, not accidental
Teams often postpone this step and let package drift accumulate. That works for a short time and then turns every job template into a dependency lottery.
Build and publish a tested execution environment image instead.
version: 3
images:
  base_image:
    # Pin a specific awx-ee tag in practice; latest defeats the version
    # discipline applied to the operator itself.
    name: quay.io/ansible/awx-ee:latest
dependencies:
  galaxy: requirements.yml
  python: requirements.txt
  system: bindep.txt

Build it with ansible-builder.

ansible-builder build --tag registry.internal.example.com/ee/netops:1.0.0

Then push the image and register it in AWX as an execution environment. Do not wait until a failed job reveals that one template expected pywinrm, another needed a cloud SDK, and a third depended on a system package never documented anywhere.
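The three dependency files referenced by the definition might look like the following minimal set; every package and version here is an example, not a recommendation:

```
# requirements.yml - Galaxy collections, pinned
collections:
  - name: community.general
    version: "9.4.0"

# requirements.txt - Python packages the playbooks import
pywinrm==0.5.0

# bindep.txt - system packages the jobs shell out to
openssh-client [platform:dpkg]
```

Keeping these three files in version control next to the execution environment definition is what makes "which dependencies does this job have" an answerable question.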
Organization and credential design should be visible from the start
A clean AWX installation is also a clean permissions model.
Create at least one organization, one team, and a bounded set of credentials instead of leaving everything under the initial admin account.
The AWX API is useful for validating this discipline early.
curl -sk -u admin:'<admin-password>' https://awx.naxaya.internal/api/v2/ping/
curl -sk -u admin:'<admin-password>' https://awx.naxaya.internal/api/v2/config/

Those calls are simple, but they confirm that the application API is alive, authenticated access works, and the platform is reachable through the same path your users and automation will actually use.
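When scripting against the API from tools that lack curl's `-u` flag, the same credentials can be carried in an explicit header. A sketch with a placeholder password:

```shell
#!/bin/sh
# Build the Authorization header that curl -u would send. The password
# here is a placeholder; read the real one from the admin secret.
AWX_USER=admin
AWX_PASS='example-password'

AUTH_HEADER="Authorization: Basic $(printf '%s:%s' "$AWX_USER" "$AWX_PASS" | base64)"
echo "$AUTH_HEADER"

# Hypothetical usage:
#   curl -sk -H "$AUTH_HEADER" https://awx.naxaya.internal/api/v2/ping/
```

For anything beyond smoke tests, prefer AWX application tokens over the admin password, so that API access can be revoked without rotating the admin credential.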
Backups are part of installation quality, not a later project
The operator supports backup and restore objects. That matters because a platform is not operationally credible until you know what can be recovered and under which constraints.
A basic backup object looks like this.
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awx-prod-backup
  namespace: awx
spec:
  deployment_name: awx-prod

Apply it and inspect the result.

kubectl apply -f awx-backup.yaml
kubectl get awxbackup -n ${AWX_NAMESPACE}
kubectl describe awxbackup awx-prod-backup -n ${AWX_NAMESPACE}

The important operational point is not only that a backup object can be created. It is that you know where the backup data is stored, which secrets and configuration elements are included, and what namespace assumptions apply at restore time.
A basic restore object looks like this.
apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: awx-prod-restore
  namespace: awx
spec:
  deployment_name: awx-prod-restored
  backup_name: awx-prod-backup

You do not need to run a restore during every deployment, but you do need to prove the process in a non-production environment before you rely on the platform.
Post install checks that are actually worth keeping
A useful validation sequence after installation looks like this.
kubectl get awx -n ${AWX_NAMESPACE}
kubectl get pods -n ${AWX_NAMESPACE}
kubectl get pvc -n ${AWX_NAMESPACE}
kubectl get ingress -n ${AWX_NAMESPACE}
kubectl logs deployment/awx-operator-controller-manager -n ${AWX_NAMESPACE} --tail=200
curl -sk https://awx.naxaya.internal/api/v2/ping/
curl -sk -u admin:'<admin-password>' https://awx.naxaya.internal/api/v2/me/

Keep these checks close to the deployment procedure. They are more valuable than screenshots.
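The ping endpoint is also the natural place to assert the running version after an upgrade. The sketch below extracts it without jq; the inline JSON mirrors the shape `/api/v2/ping/` returns and stands in for a live response:

```shell
#!/bin/sh
# Extract the reported version from a /api/v2/ping/ response. PING_JSON
# would normally come from:
#   curl -sk https://awx.naxaya.internal/api/v2/ping/
PING_JSON='{"ha": false, "version": "24.6.1", "active_node": "awx-task-0", "instances": []}'

VERSION=$(printf '%s' "$PING_JSON" | sed -n 's/.*"version": "\([^"]*\)".*/\1/p')
echo "AWX reports version: $VERSION"
```

Comparing this value against the version you intended to deploy catches the quiet failure where an upgrade was applied but never actually rolled out.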
What breaks most often after the installation appears successful
The recurring failures are predictable.
- Project synchronization works until the storage class behaves badly under restart conditions.
- Job templates run until different teams silently depend on different Python packages or system binaries because execution environments were never designed properly.
- Ingress works for login but breaks background behavior because proxies, TLS, or websocket-related handling were never validated end to end.
- The platform survives a few pod restarts but not a real recovery event because backup and restore were treated as future work.
- Permissions drift appears because the initial admin account stayed the center of the operating model for too long.
None of those issues are solved by reinstalling AWX. They are solved by deciding early what kind of platform you are building.
Decisions worth making before you scale usage
Decide whether AWX is only an internal orchestrator for a small team or whether it is becoming a shared service.
If it is a shared service, pin versions, control exposure, define backup frequency, standardize execution environments, separate organizations and credentials, and document the validation sequence you expect after every operator or platform change.
That is the boundary between a demo that happens to work and an automation platform that people can trust.