Azure Container Apps: diagnose private ingress before changing revisions
Build an operational runbook for Azure Container Apps private ingress failures by separating DNS, ingress mode, revision routing, application logs and rollback evidence.
Azure Container Apps is often used as a convenient landing zone for internal APIs, lightweight backends or event-driven services. The operational difficulty starts when a private endpoint, an internal environment, Application Gateway, DNS forwarding and revision traffic all sit on the same request path. A 404, 502, empty response or intermittent failure can look like an application bug while the real issue is a hostname, ingress setting, revision split or private DNS record.
The use case is a private API hosted on Azure Container Apps. Internal clients reach a friendly hostname, traffic may enter through Application Gateway, and the container app exposes internal ingress inside a Container Apps environment. A new revision has just been deployed, or a private endpoint/DNS change has been applied. Before rolling back blindly or changing traffic weights, the runbook must prove which layer is failing.
Write the private ingress path as an operational object
A Container Apps private path has more moving parts than the application code suggests. The environment can be internal. The app can expose ingress as internal or external. A private endpoint can be used for the environment. A custom domain can point to Application Gateway or to the Container Apps environment FQDN. Inside the environment, traffic is routed through the platform proxy and then to one or more revisions.
Internal client
  Resolves api.internal.example.com
  Calls the expected private hostname
Optional Application Gateway
  Terminates TLS or forwards with the configured host header
  Runs WAF and health probes
  Sends traffic to the Container Apps environment path
Container Apps environment
  Exposes private endpoint or internal load balancer path
  Resolves the environment domain through private DNS
  Routes traffic through the platform ingress proxy
Container app
  Accepts internal ingress on the configured target port
  Splits traffic between active revisions
  Emits system and console logs
Backend dependencies
  Require private DNS, identity, secrets or managed service access

This map prevents the first common mistake: treating every failed request as a broken revision. If the hostname still resolves publicly, if Application Gateway health probes fail, or if traffic never reaches the Container Apps environment, changing a revision will only add noise.
Separate DNS, ingress and revision symptoms
The runbook should classify the symptom before changing anything. A missing DNS record does not need a rollback. A revision crash does. A traffic split pointing at an unhealthy revision needs a controlled shift, not a WAF exception.
Symptom | What to check
--- | ---
Hostname resolves to public address | Private DNS zone, VNet link, forwarding rule and custom domain target
Application Gateway returns 502 | Backend health, host header, TLS/SNI and probe path toward Container Apps
Request reaches Container Apps but returns 404 | Ingress type, target port, custom domain binding, path routing and revision label
Request is intermittent | Active revisions, traffic weights, replica restarts and scale events
Request reaches app but dependency fails | Managed identity, private DNS, Key Vault or downstream service firewall

The useful question is not only “is the app up?”. It is “does this exact hostname reach the expected environment, the expected ingress configuration and the expected revision?”.
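The table can be kept next to the runbook as a small lookup so the first responder names the symptom before touching anything. The sketch below is only an illustration: the function name `checks_for_symptom` and the symptom keys are hypothetical labels, and the check lists are copied from the table rather than taken from any Azure tooling.

```shell
# Hypothetical runbook lookup: symptom key -> first checks, copied from the table.
# The symptom keys are illustrative names, not Azure terminology.
checks_for_symptom() {
  case "$1" in
    public_resolution)
      echo "private DNS zone, VNet link, forwarding rule, custom domain target" ;;
    gateway_502)
      echo "backend health, host header, TLS/SNI, probe path" ;;
    app_404)
      echo "ingress type, target port, custom domain binding, path routing, revision label" ;;
    intermittent)
      echo "active revisions, traffic weights, replica restarts, scale events" ;;
    dependency_failure)
      echo "managed identity, private DNS, Key Vault, downstream service firewall" ;;
    *)
      # Unknown symptom: signal that classification has not happened yet.
      echo "unclassified"
      return 1 ;;
  esac
}
```

Calling `checks_for_symptom gateway_502` prints the gateway checks, and a non-zero return for an unknown key makes it easy to stop a script before any change is attempted.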
Prove private DNS before debugging the container
Start from the same network that real clients use. If a diagnostic runner sits outside the private path, it can produce a clean result while production clients fail, or the opposite. The first check should capture the hostname, CNAME chain and final address.
# API_HOST is used instead of HOSTNAME, which bash sets automatically for the local machine.
API_HOST=api.internal.example.com

# Capture the resolver view and the CNAME chain from the client network.
nslookup "$API_HOST"
dig +short "$API_HOST"

# Classify the final answer line. 172.16.0.0/12 spans 172.16 through 172.31 only.
a=$(dig +short "$API_HOST" | tail -n 1)
case "$a" in
  10.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*|192.168.*)
    echo "private_resolution_ok=$a"
    ;;
  *)
    echo "unexpected_public_or_empty_resolution=$a"
    exit 2
    ;;
esac

# Confirm which certificate is presented for this hostname (SNI).
openssl s_client -connect "$API_HOST:443" -servername "$API_HOST" </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer

If this fails, keep the fix at the DNS or private endpoint layer: private DNS zone records, VNet links, resolver forwarding or the custom domain target. A new container image cannot repair a resolver path.
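A hostname can return several answer lines, and a check that looks only at the last one may miss a public address earlier in the list. The following is a minimal sketch, assuming `dig +short` prints CNAME targets with a trailing dot and one address per line. The function names `is_rfc1918` and `classify_answers` are hypothetical, and the pattern match is a shape check on RFC 1918 ranges, not a full IPv4 parser.

```shell
# is_rfc1918 ADDRESS: succeed only for 10/8, 172.16/12 and 192.168/16 shaped input.
is_rfc1918() {
  case "$1" in
    ''|*[!0-9.]*) return 1 ;;   # empty, or contains non-digit / non-dot characters
  esac
  case "$1" in
    10.*.*.*) return 0 ;;
    172.1[6-9].*.*|172.2[0-9].*.*|172.3[01].*.*) return 0 ;;
    192.168.*.*) return 0 ;;
  esac
  return 1
}

# classify_answers: read dig +short output on stdin and judge every address line.
classify_answers() {
  local line seen=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    # Lines ending in a dot are treated as CNAME targets and skipped.
    case "$line" in *.) continue ;; esac
    seen=1
    if ! is_rfc1918 "$line"; then
      echo "public_or_invalid=$line"
      return 2
    fi
  done
  if [ "$seen" -eq 0 ]; then
    echo "empty_resolution"
    return 2
  fi
  echo "all_private"
}
```

A typical call would be `dig +short api.internal.example.com | classify_answers`, run from the same network as the real clients.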
Read Container Apps system and console logs together
Container Apps exposes two useful Log Analytics views: system logs for platform events and console logs for application output. During an ingress incident, looking at only stdout/stderr is too narrow. The platform may already be telling you that a revision is provisioning, deactivating, failing probes or receiving no traffic.
let Window = 2h;
let App = "orders-api";
let Env = "aca-prod-weu";
let System =
    ContainerAppSystemLogs_CL
    | where TimeGenerated > ago(Window)
    | where ContainerAppName_s == App or EnvironmentName_s == Env
    | project TimeGenerated,
              Source="system",
              Environment=EnvironmentName_s,
              App=ContainerAppName_s,
              Revision=RevisionName_s,
              Replica="",
              Message=Log_s;
let Console =
    ContainerAppConsoleLogs_CL
    | where TimeGenerated > ago(Window)
    | where ContainerAppName_s == App
    | project TimeGenerated,
              Source="console",
              Environment=EnvironmentName_s,
              App=ContainerAppName_s,
              Revision=RevisionName_s,
              Replica=tostring(ContainerGroupName_g),
              Message=Log_s;
System
| union Console
| where Message has_any ("ingress", "probe", "revision", "error", "failed", "timeout", "502", "404")
| order by TimeGenerated desc

This query gives the incident lead a compact picture: platform events, revision names, replicas and application messages in one timeline. It is intentionally broad at the start. Once the failing revision or event family is identified, narrow the query.
Compare configured traffic with observed logs
Container Apps revisions make rollback attractive, but traffic weights need evidence. If multiple revisions are active, a small percentage can still create a visible intermittent incident. If labels are used, a caller may target a labeled revision even when default traffic looks healthy.
APP=orders-api
RG=rg-prod-apps

# Ingress mode, target port, transport and configured traffic rules.
az containerapp ingress show --name "$APP" --resource-group "$RG" \
  --query '{external:external,targetPort:targetPort,transport:transport,traffic:traffic}' \
  --output table

# Revisions with their active flag, traffic weight and creation time.
az containerapp revision list --name "$APP" --resource-group "$RG" \
  --query '[].{name:name,active:properties.active,traffic:properties.trafficWeight,created:properties.createdTime}' \
  --output table

The comparison is simple: if logs show failures only on one revision and traffic points to it, shift traffic or roll back that revision. If no request reaches any revision, stay at ingress, DNS or gateway. If every revision logs the same downstream 403, the dependency path is the better suspect.
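Because even a small weight can produce a visible intermittent failure, it can help to list only the revisions that currently receive traffic. The sketch below assumes the revision list has been exported as tab-separated `name<TAB>weight` rows, for example with `--query '[].[name, properties.trafficWeight]' --output tsv`. The function name `live_revisions` is hypothetical.

```shell
# live_revisions: read "name<TAB>weight" rows on stdin and print the revisions
# whose weight is greater than zero, formatted as name=weight.
# Non-numeric weights (for example an empty or null value) evaluate to 0 and are dropped.
live_revisions() {
  awk -F'\t' '($2 + 0) > 0 { print $1 "=" $2 }'
}
```

If the output contains more than one revision, compare each name against the `Revision` column in the log timeline before deciding which weight to change.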
Make rollback small and observable
A rollback should not become a silent workaround. Before shifting traffic, capture the failing revision, current weights, last deployment reference and the diagnostic evidence that made rollback reasonable. After the rollback, keep the same hostname and correlation header for validation.
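APP=orders-api
RG=rg-prod-apps
GOOD_REV=orders-api--000018
BAD_REV=orders-api--000019

# Record the current weights before changing them.
az containerapp ingress traffic show --name "$APP" --resource-group "$RG" --output table

# Shift all traffic to the known-good revision.
az containerapp ingress traffic set --name "$APP" --resource-group "$RG" \
  --revision-weight "$GOOD_REV=100" "$BAD_REV=0"

# Validate through the same hostname with a traceable correlation header.
# -k skips certificate verification, so it does not validate the TLS layer.
CORRELATION_ID="ops-$(date +%Y%m%d%H%M%S)"
curl -vk "https://api.internal.example.com/health" -H "x-correlation-id: $CORRELATION_ID"
echo "validate_correlation_id=$CORRELATION_ID"

If the rollback fixes the request but DNS, gateway and ingress checks were never captured, the team still has a fragile result. The next deployment may reintroduce the failure because the true condition was not written down.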
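One way to keep a rollback from becoming a silent workaround is to refuse to proceed until the evidence fields are filled in. This is a rough sketch under stated assumptions: the function name `rollback_evidence`, its four arguments and the key names in its output are all hypothetical, and the deployment reference is whatever identifier the team already uses.

```shell
# rollback_evidence FAILING_REV CURRENT_WEIGHTS DEPLOY_REF EVIDENCE
# Prints a key=value record for the incident log, or blocks if any field is empty.
rollback_evidence() {
  local field
  # Exactly four arguments are expected; a missing one counts as empty evidence.
  if [ "$#" -ne 4 ]; then
    echo "rollback_blocked=missing_evidence"
    return 1
  fi
  for field in "$@"; do
    if [ -z "$field" ]; then
      echo "rollback_blocked=missing_evidence"
      return 1
    fi
  done
  printf 'failing_revision=%s\n' "$1"
  printf 'current_weights=%s\n' "$2"
  printf 'last_deployment=%s\n' "$3"
  printf 'diagnostic_evidence=%s\n' "$4"
}
```

A script could call it as `rollback_evidence "$BAD_REV" "000018=90,000019=10" "$DEPLOY_REF" "probe failures on 000019" && az containerapp ingress traffic set ...`, so the traffic shift runs only after the record is written.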
Conclusion
Azure Container Apps private ingress is operationally friendly only when the path is observable. DNS, private endpoint, ingress mode, platform proxy, revision routing and application logs all need to be read before a fix is chosen.
The practical rule is direct: do not change revision traffic until traffic has reached Container Apps, do not change DNS until the resolver path is proven, and do not add gateway exceptions without a visible gateway symptom. With that discipline, a private Container Apps incident becomes a bounded diagnosis instead of a blind rollback.