Installing AWX is easy. Operating it cleanly is not.
A production-oriented note on AWX with the operator, including namespace design, persistence, exposure, execution environments, validation, backups, and the failure modes that appear after the first successful login.
A first AWX login does not prove that the platform is ready for real work. Most weak installations fail later, when projects need persistent storage, custom execution environments, controlled ingress, backup discipline, and predictable upgrades.
This article uses the AWX Operator on Kubernetes and keeps the scope narrow on purpose. The goal is not to expose every field of the custom resource. The goal is to stand up an AWX instance that can survive the first serious operational questions.
What this article assumes
The examples below assume a Kubernetes cluster that already exists, working storage classes, TLS handled by your ingress layer or reverse proxy strategy, and administrators who prefer pinned versions over floating install commands.
The reference design is simple.
- One namespace dedicated to AWX
- Operator installed in that namespace
- AWX exposed behind ingress instead of a raw public LoadBalancer
- Persistent project storage enabled because many teams still depend on project sync and local content
- External execution environments prepared early instead of accepting ad hoc package drift later
- Backup and restore objects tested before the platform is declared usable
Pin the operator version before you install anything
Do not use a moving install target. Pin the operator version that you have validated in your lab or pre-production environment.
export AWX_NAMESPACE=awx
export AWX_NAME=awx-prod
export AWX_OPERATOR_VERSION=2.19.1

Create the namespace first.

kubectl create namespace ${AWX_NAMESPACE}

Install the operator from a pinned release.

kubectl apply -n ${AWX_NAMESPACE} -f https://github.com/ansible/awx-operator/releases/download/${AWX_OPERATOR_VERSION}/awx-operator.yaml

Check that the operator is actually running before creating an AWX custom resource.

kubectl get pods -n ${AWX_NAMESPACE}
kubectl get deployments -n ${AWX_NAMESPACE}
kubectl logs deployment/awx-operator-controller-manager -n ${AWX_NAMESPACE}

If you skip this validation and apply the AWX resource immediately, you only gain noise. Start with a healthy operator.
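The pinning discipline can also be enforced in the deployment script itself, so an unpinned value never reaches the cluster. A minimal sketch, assuming the same variable names as above; the list of rejected values is illustrative:

```shell
#!/bin/sh
# Refuse to proceed with an unpinned operator version before anything
# is applied to the cluster. The rejected values are illustrative.
AWX_OPERATOR_VERSION=2.19.1

case "$AWX_OPERATOR_VERSION" in
  ''|latest|devel|main)
    echo "refusing unpinned operator version: '${AWX_OPERATOR_VERSION}'" >&2
    exit 1
    ;;
esac

# Build the pinned release manifest URL that kubectl apply will consume.
MANIFEST_URL="https://github.com/ansible/awx-operator/releases/download/${AWX_OPERATOR_VERSION}/awx-operator.yaml"
echo "${MANIFEST_URL}"
```

A guard like this is cheap, and it turns "we always pin" from a convention into a property of the pipeline.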
Create an AWX custom resource that reflects an actual platform choice
A throwaway demo often uses the smallest possible resource and accepts the defaults. That is usually where later problems begin.
This example keeps the service private inside the cluster, enables project persistence, sets a clear hostname, and avoids pretending that the default execution image is a long term operating model.
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-prod
  namespace: awx
spec:
  service_type: ClusterIP
  ingress_type: ingress
  hostname: awx.naxaya.internal
  projects_persistence: true
  projects_storage_access_mode: ReadWriteOnce
  projects_storage_size: 20Gi
  admin_user: admin
  web_resource_requirements:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi
  task_resource_requirements:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2
      memory: 4Gi

Apply it.

kubectl apply -f awx-prod.yaml

Then watch the rollout instead of assuming it will settle correctly.

kubectl get awx -n ${AWX_NAMESPACE}
kubectl get pods -n ${AWX_NAMESPACE} -w

A realistic first validation is not only “pods are Running”. It is that the web, task, migration, Redis, and PostgreSQL-related components converge without repeated restarts.
Retrieve the admin password and confirm the platform object state
The basic install flow is useful, but it should be treated as a checkpoint, not an end state.
kubectl get secret ${AWX_NAME}-admin-password -n ${AWX_NAMESPACE} -o jsonpath='{.data.password}' | base64 --decode && echo

Collect the object status as well.

kubectl describe awx ${AWX_NAME} -n ${AWX_NAMESPACE}
kubectl get svc -n ${AWX_NAMESPACE}
kubectl get ingress -n ${AWX_NAMESPACE}

If the password works but the status is unstable, you are not done. The first login is only the start of the validation sequence.
Ingress and exposure boundaries matter more than the first login screen
For most internal environments, ClusterIP behind a controlled ingress is cleaner than exposing AWX directly as a public LoadBalancer. It keeps the network boundary explicit and lets you align AWX with your existing TLS, reverse proxy, and access control model.
At minimum, validate these points.
kubectl get ingress -n ${AWX_NAMESPACE}
kubectl describe ingress ${AWX_NAME}-ingress -n ${AWX_NAMESPACE}
curl -Ik https://awx.naxaya.internal

If your ingress is terminating TLS, validate that the certificate presented to users matches the chosen hostname and that websocket-related behavior is not broken by an intermediate proxy policy.
Project persistence is not optional if your operating model still uses project sync heavily
A surprising number of AWX installs remain “fine” only as long as they are demos. The moment teams expect local project content, custom inventories, or data that must survive pod churn, the absence of storage becomes visible.
Inspect the resulting persistent volume claims.
kubectl get pvc -n ${AWX_NAMESPACE}
kubectl describe pvc -n ${AWX_NAMESPACE}

If the cluster has weak storage behavior or badly understood reclaim policies, AWX will expose that quickly. This is one of the reasons a successful login is a poor success metric.
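The storage check can be scripted rather than eyeballed. The sketch below counts claim phases in the JSON that `kubectl get pvc -o json` produces; the inline sample stands in for live cluster output:

```shell
#!/bin/sh
# Verify that every PVC is Bound. PVC_JSON would normally come from:
#   kubectl get pvc -n ${AWX_NAMESPACE} -o json
PVC_JSON='{"items": [{"metadata": {"name": "awx-prod-projects-claim"}, "status": {"phase": "Bound"}}]}'

# Count all phase fields, then count only those that are Bound.
TOTAL=$(printf '%s' "$PVC_JSON" | grep -o '"phase"' | wc -l)
BOUND=$(printf '%s' "$PVC_JSON" | grep -o '"phase": "Bound"' | wc -l)

if [ "$TOTAL" -eq "$BOUND" ]; then
  echo "all claims Bound (${BOUND}/${TOTAL})"
else
  echo "unbound claims present (${BOUND}/${TOTAL} Bound)" >&2
fi
```

A check like this belongs in the same validation script as the pod checks, because a Pending claim is exactly the kind of problem that a successful login hides.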
Execution environments should be deliberate, not accidental
Teams often postpone this step and let package drift accumulate. That works for a short time and then turns every job template into a dependency lottery.
Build and publish a tested execution environment image instead.
version: 3
images:
  base_image:
    # Pin a specific awx-ee tag in practice; latest defeats the version
    # discipline applied to the operator itself.
    name: quay.io/ansible/awx-ee:latest
dependencies:
  galaxy: requirements.yml
  python: requirements.txt
  system: bindep.txt

Build it with ansible-builder.

ansible-builder build --tag registry.internal.example.com/ee/netops:1.0.0

Then push the image and register it in AWX as an execution environment. Do not wait until a failed job reveals that one template expected pywinrm, another needed a cloud SDK, and a third depended on a system package never documented anywhere.
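The three dependency files referenced by the definition might look like the following minimal set; every package and version here is an example, not a recommendation:

```
# requirements.yml - Galaxy collections, pinned
collections:
  - name: community.general
    version: "9.4.0"

# requirements.txt - Python packages the playbooks import
pywinrm==0.5.0

# bindep.txt - system packages the jobs shell out to
openssh-client [platform:dpkg]
```

Keeping these three files in version control next to the execution environment definition is what makes "which dependencies does this job have" an answerable question.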
Organization and credential design should be visible from the start
A clean AWX installation is also a clean permissions model.
Create at least one organization, one team, and a bounded set of credentials instead of leaving everything under the initial admin account.
The AWX API is useful for validating this discipline early.
curl -sk -u admin:'<admin-password>' https://awx.naxaya.internal/api/v2/ping/
curl -sk -u admin:'<admin-password>' https://awx.naxaya.internal/api/v2/config/

Those calls are simple, but they confirm that the application API is alive, authenticated access works, and the platform is reachable through the same path your users and automation will actually use.
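When scripting against the API from tools that lack curl's `-u` flag, the same credentials can be carried in an explicit header. A sketch with a placeholder password:

```shell
#!/bin/sh
# Build the Authorization header that curl -u would send. The password
# here is a placeholder; read the real one from the admin secret.
AWX_USER=admin
AWX_PASS='example-password'

AUTH_HEADER="Authorization: Basic $(printf '%s:%s' "$AWX_USER" "$AWX_PASS" | base64)"
echo "$AUTH_HEADER"

# Hypothetical usage:
#   curl -sk -H "$AUTH_HEADER" https://awx.naxaya.internal/api/v2/ping/
```

For anything beyond smoke tests, prefer AWX application tokens over the admin password, so that API access can be revoked without rotating the admin credential.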
Backups are part of installation quality, not a later project
The operator supports backup and restore objects. That matters because a platform is not operationally credible until you know what can be recovered and under which constraints.
A basic backup object looks like this.
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awx-prod-backup
  namespace: awx
spec:
  deployment_name: awx-prod

Apply it and inspect the result.

kubectl apply -f awx-backup.yaml
kubectl get awxbackup -n ${AWX_NAMESPACE}
kubectl describe awxbackup awx-prod-backup -n ${AWX_NAMESPACE}

The important operational point is not only that a backup object can be created. It is that you know where the backup data is stored, which secrets and configuration elements are included, and what namespace assumptions apply at restore time.
A basic restore object looks like this.
apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: awx-prod-restore
  namespace: awx
spec:
  deployment_name: awx-prod-restored
  backup_name: awx-prod-backup

You do not need to run a restore during every deployment, but you do need to prove the process in a non-production environment before you rely on the platform.
Post install checks that are actually worth keeping
A useful validation sequence after installation looks like this.
kubectl get awx -n ${AWX_NAMESPACE}
kubectl get pods -n ${AWX_NAMESPACE}
kubectl get pvc -n ${AWX_NAMESPACE}
kubectl get ingress -n ${AWX_NAMESPACE}
kubectl logs deployment/awx-operator-controller-manager -n ${AWX_NAMESPACE} --tail=200
curl -sk https://awx.naxaya.internal/api/v2/ping/
curl -sk -u admin:'<admin-password>' https://awx.naxaya.internal/api/v2/me/

Keep these checks close to the deployment procedure. They are more valuable than screenshots.
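The ping endpoint is also the natural place to assert the running version after an upgrade. The sketch below extracts it without jq; the inline JSON mirrors the shape `/api/v2/ping/` returns and stands in for a live response:

```shell
#!/bin/sh
# Extract the reported version from a /api/v2/ping/ response. PING_JSON
# would normally come from:
#   curl -sk https://awx.naxaya.internal/api/v2/ping/
PING_JSON='{"ha": false, "version": "24.6.1", "active_node": "awx-task-0", "instances": []}'

VERSION=$(printf '%s' "$PING_JSON" | sed -n 's/.*"version": "\([^"]*\)".*/\1/p')
echo "AWX reports version: $VERSION"
```

Comparing this value against the version you intended to deploy catches the quiet failure where an upgrade was applied but never actually rolled out.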
What breaks most often after the installation appears successful
The recurring failures are predictable.
- Project synchronization works until the storage class behaves badly under restart conditions.
- Job templates run until different teams silently depend on different Python packages or system binaries because execution environments were never designed properly.
- Ingress works for login but breaks background behavior because proxies, TLS, or websocket-related handling were never validated end to end.
- The platform survives a few pod restarts but not a real recovery event because backup and restore were treated as future work.
- Permissions drift appears because the initial admin account stayed the center of the operating model for too long.
None of those issues are solved by reinstalling AWX. They are solved by deciding early what kind of platform you are building.
Decisions worth making before you scale usage
Decide whether AWX is only an internal orchestrator for a small team or whether it is becoming a shared service.
If it is a shared service, pin versions, control exposure, define backup frequency, standardize execution environments, separate organizations and credentials, and document the validation sequence you expect after every operator or platform change.
That is the boundary between a demo that happens to work and an automation platform that people can trust.