Azure App Service: diagnose a private endpoint before redeploying
Build an operational runbook for App Service private access failures by separating DNS, Private Endpoint, access restrictions, Application Gateway, application logs and rollback evidence.
A private App Service can fail without the last application deployment being responsible. The name may still resolve to a public address, the Private Endpoint may be approved but unreachable from the caller network, an access restriction may block Application Gateway or APIM, the health check may target an unstable route, or the app may return 403, 502 or 503 before producing useful logs.
The use case is an internal application hosted on Azure App Service and called from a private network through Application Gateway, internal APIM, a synthetic probe or a service consumer. The runbook has one goal: prove where the request stops before redeploying, opening public access or changing a WAF rule too broadly.
Read App Service as a production path
A private App Service is not just a hostname. It is a chain made of DNS resolution, Private Link, an optional entry point, access restrictions, application runtime, identity and observability. Diagnosis should follow that chain in order.
Internal client, APIM or probe
Resolves app.internal.example.com from the consumer network
Sends the right Host header, a correlation ID and the real path
Optional entry point
Application Gateway or internal APIM
Preserves TLS/SNI, host header and x-correlation-id
Surfaces WAF, gateway or policy status without mixing them
App Service
Private Endpoint approved on the Web site
Public network access and access restrictions aligned
Health check mapped to a stable application route
Dependencies
Key Vault, storage, database or downstream API reachable through the expected path
Managed identity authorized only for the required scope
Observability
AppServiceHTTPLogs, application traces and gateway logs can be correlated
Rollback targets the layer that actually changed
This map prevents two expensive reflexes: redeploying when DNS is wrong, or opening public access when the request does not even reach the runtime.
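Because the diagnosis follows the chain in order, it can be written as a small driver that runs one check per layer and stops at the first failure. The ticket then records where the request stops, not a list of everything that was tried. This is a sketch in POSIX shell: `walk_chain` and the `check_*` functions are hypothetical names, and the check bodies are placeholders that always succeed until they are replaced with real probes (dig, curl, az or a KQL query).

```bash
# Hypothetical placeholders: replace each body with a real probe for that layer.
check_dns()          { return 0; }  # private record resolves from the consumer network
check_entry()        { return 0; }  # Application Gateway or APIM forwards the request
check_app()          { return 0; }  # App Service logs the correlation ID
check_dependencies() { return 0; }  # Key Vault, storage, database, downstream API

walk_chain() {
  local layer
  # Layers in the order the request traverses them.
  for layer in dns entry app dependencies; do
    if ! "check_${layer}"; then
      echo "first_failing_layer=${layer}"
      return 1
    fi
  done
  echo "first_failing_layer=none"
}
```

A typical call is `walk_chain || echo "stop here: do not redeploy yet"`. Stopping at the first failure is deliberate: a dependency check says nothing useful while DNS still returns a public address.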
Classify the symptom before fixing
The first decision is to separate a private entry failure from an application failure. If the request does not appear in App Service logs, code is not yet the main suspect.
Symptom
DNS returns a public address
Check privatelink.azurewebsites.net zone, VNet link and hybrid DNS forwarding
Application Gateway returns 502
Check backend health, host header, TLS/SNI, probe path and App Service access restrictions
APIM or the client returns 403
Check access restrictions, public network access, Private Endpoint and real call source
No AppServiceHTTPLogs line for the correlation ID
Go back to DNS, gateway, APIM, WAF or access restrictions
App Service logs exist with 500 or an exception
Read application traces, managed identity, Key Vault and downstream dependencies
Health check is unstable after release
Validate health route, warmup, slot, critical dependencies and application rollback
The operating rule is intentionally strict: until the correlation ID appears on the App Service side, stay on networking, private entry and restrictions.
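The symptom table can also be kept as a shell lookup, so that on-call notes name the same first check for the same symptom. This is a sketch: the symptom keys (`dns_public`, `gw_502` and so on) and the functions `first_check` and `code_is_suspect` are hypothetical. `code_is_suspect` encodes the rule above: code becomes a suspect only once App Service has logged the request.

```bash
# Hypothetical symptom keys mapped to the first layer to inspect.
first_check() {
  case "$1" in
    dns_public)      echo "privatelink.azurewebsites.net zone, VNet link, hybrid DNS forwarding" ;;
    gw_502)          echo "backend health, host header, TLS/SNI, probe path, access restrictions" ;;
    client_403)      echo "access restrictions, public network access, Private Endpoint, real call source" ;;
    no_app_log)      echo "DNS, gateway, APIM, WAF, access restrictions" ;;
    app_500)         echo "application traces, managed identity, Key Vault, downstream dependencies" ;;
    health_unstable) echo "health route, warmup, slot, critical dependencies, application rollback" ;;
    *)               echo "unknown symptom: $1" >&2; return 2 ;;
  esac
}

# Only symptoms observed inside App Service make the code a suspect.
code_is_suspect() {
  case "$1" in
    app_500|health_unstable) return 0 ;;
    *) return 1 ;;
  esac
}
```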
Test from the consumer network
The test must start from a location that sees the same DNS and routing as the real consumer: diagnostic VM, probe runner, application subnet, internal APIM environment or operations bastion. A public workstation does not validate the private path.
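HOSTNAME=app.internal.example.com
# Do not call this variable PATH: overwriting PATH hides nslookup, dig, openssl and curl.
URL_PATH=/health
CORRELATION_ID="ops-$(date +%Y%m%d%H%M%S)"
# Resolution as seen from this network: expect the Private Endpoint address.
nslookup "$HOSTNAME"
dig +short "$HOSTNAME"
# Certificate presented for this SNI.
openssl s_client -connect "$HOSTNAME:443" -servername "$HOSTNAME" </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# -k skips certificate validation so the HTTP status stays visible; the openssl line checks the certificate.
# The ID also goes in the User-Agent, which AppServiceHTTPLogs records (custom headers are not recorded).
curl -vk "https://$HOSTNAME$URL_PATH" -H "x-correlation-id: $CORRELATION_ID" -H "x-naxaya-check: private-app-service" -A "ops-check/$CORRELATION_ID"
echo "correlation_id=$CORRELATION_ID"
If DNS does not return the expected private address, fix the zone or forwarding before moving on. If TLS fails, inspect SNI, certificate and host header. If curl returns 403, verify the source address actually seen by App Service or by the entry point.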
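To make the DNS verdict explicit instead of reading `dig` output by eye, the returned addresses can be tested against private ranges. This is a sketch, assuming the input looks like `dig +short` output (optional CNAME lines, then A records) and that the VNet uses RFC 1918 address space; a VNet built on other ranges needs its own prefixes. `is_rfc1918` and `resolves_privately` are hypothetical helper names.

```bash
# Succeeds when the IPv4 address falls in 10/8, 172.16/12 or 192.168/16.
is_rfc1918() {
  local a b rest
  a=${1%%.*}; rest=${1#*.}; b=${rest%%.*}
  case "$a" in
    10)  return 0 ;;
    172) [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0 ;;
    192) [ "$b" = 168 ] && return 0 ;;
  esac
  return 1
}

# Reads dig-style lines on stdin; fails on any public A record or on an empty answer.
resolves_privately() {
  local line found=1
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *[!0-9.]*|'') continue ;;   # CNAME targets and blank lines
      *.*.*.*) ;;                 # dotted-quad address
      *) continue ;;
    esac
    is_rfc1918 "$line" || return 1
    found=0
  done
  return "$found"
}
```

Usage from the consumer network: `dig +short "$HOSTNAME" | resolves_privately || echo "public or empty answer: fix DNS first"`.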
Check App Service state without exposing the app
The Azure CLI checks should confirm the service configuration without pasting secrets into the ticket: state, access restrictions, Private Endpoint, health check, identity and the relevant platform settings.
APP_RG=rg-prod-app
APP_NAME=app-prod-orders
az webapp show -g "$APP_RG" -n "$APP_NAME" --query "{state:state, enabled:enabled, httpsOnly:httpsOnly, hostNames:hostNames, defaultHostName:defaultHostName}" -o jsonc
az webapp config show -g "$APP_RG" -n "$APP_NAME" --query "{alwaysOn:alwaysOn, healthCheckPath:healthCheckPath, publicNetworkAccess:publicNetworkAccess, ftpsState:ftpsState}" -o jsonc
# jsonc keeps the nested rule lists (main site and SCM) that a table view would flatten away.
az webapp config access-restriction show -g "$APP_RG" -n "$APP_NAME" -o jsonc
az webapp identity show -g "$APP_RG" -n "$APP_NAME" --query "{type:type, principalId:principalId, tenantId:tenantId}" -o jsonc
# The connection state may sit under "properties" depending on the CLI output shape; the || covers both.
az network private-endpoint-connection list --id "$(az webapp show -g "$APP_RG" -n "$APP_NAME" --query id -o tsv)" --query "[].{name:name, status:properties.privateLinkServiceConnectionState.status || privateLinkServiceConnectionState.status, description:properties.privateLinkServiceConnectionState.description || privateLinkServiceConnectionState.description}" -o table
This step catches common drift: an unapproved Private Endpoint, an access restriction that does not cover Application Gateway, a missing health check, or a health route that depends on a fragile downstream service.
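The values returned by these commands can be turned into explicit drift findings for the ticket. This is a sketch, assuming the target state is `publicNetworkAccess` set to `Disabled`, a non-empty health check path and at least one `Approved` Private Endpoint connection. `report_drift` is a hypothetical name; its three arguments are expected to come from the queries above with `-o tsv`, with one connection status per line in the third.

```bash
# Prints one line per finding; returns non-zero when anything drifted.
report_drift() {
  local public_access="$1" health_path="$2" pe_statuses="$3" findings=0 approved=0 status
  # Assumed target state: no public network access, traffic only through the Private Endpoint.
  if [ "$public_access" != "Disabled" ]; then
    echo "drift: public network access is '${public_access:-unset}'"
    findings=$((findings + 1))
  fi
  if [ -z "$health_path" ]; then
    echo "drift: no health check path configured"
    findings=$((findings + 1))
  fi
  # Any connection that is not Approved (Pending, Rejected, ...) is reported.
  while IFS= read -r status; do
    [ -z "$status" ] && continue
    if [ "$status" = "Approved" ]; then
      approved=1
    else
      echo "drift: Private Endpoint connection is '$status'"
      findings=$((findings + 1))
    fi
  done <<EOF
$pe_statuses
EOF
  if [ "$approved" -eq 0 ]; then
    echo "drift: no approved Private Endpoint connection"
    findings=$((findings + 1))
  fi
  return $((findings > 0))
}
```

The non-zero exit status lets the same function fail a scheduled check, while the printed lines go straight into the ticket.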
Correlate gateway, WAF and App Service in KQL
A correlated request shows whether the failure is before App Service, inside App Service, or after code starts. The query below combines Application Gateway/WAF and App Service logs around the same host, path and correlation ID.
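let Window = 2h;
let Host = "app.internal.example.com";
let Path = "/health";
let CorrelationId = "ops-20260612080000"; // value printed by the curl test
// transactionId_g is generated by Application Gateway: it links access and firewall entries, not the client correlation ID.
let Gateway =
AzureDiagnostics
| where TimeGenerated > ago(Window)
| where Category in ("ApplicationGatewayAccessLog", "ApplicationGatewayFirewallLog")
// The access log stores the host in host_s, the firewall log in hostname_s; only the access log has userAgent_s.
| extend host = coalesce(column_ifexists("host_s", ""), column_ifexists("hostname_s", "")), agent = column_ifexists("userAgent_s", "")
| where (host has Host and requestUri_s has Path) or agent has CorrelationId
| project TimeGenerated, Source=Category, host, uri=requestUri_s, status=toint(column_ifexists("httpStatus_d", real(null))), ruleId=column_ifexists("ruleId_s", ""), action=column_ifexists("action_s", ""), agent, transactionId=tostring(transactionId_g);
// AppServiceHTTPLogs does not record custom headers such as x-correlation-id: the probe must send the ID in its User-Agent (curl -A "ops-check/<id>").
let AppService =
AppServiceHTTPLogs
| where TimeGenerated > ago(Window)
| where (CsHost has Host and CsUriStem has Path) or UserAgent has CorrelationId
| project TimeGenerated, Source="AppServiceHTTPLogs", host=CsHost, uri=CsUriStem, status=toint(ScStatus), ruleId="", action=CsMethod, agent=UserAgent, transactionId="";
Gateway
| union AppService
| order by TimeGenerated desc
Quick read: gateway logs without App Service logs point to WAF, backend health, private DNS, TLS or access restrictions; App Service logs with a 500 point to code or a dependency; no logs at all point to DNS, routing or the wrong test location.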
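The quick read can be written down as a decision function so the conclusion in the ticket follows the same rule every time. This is a sketch: `locate_failure` is a hypothetical name, and its arguments (whether gateway rows were found, whether App Service rows were found, and the App Service status) are filled in by the operator from the query result.

```bash
# $1 gateway rows found (yes/no), $2 App Service rows found (yes/no), $3 App Service status (optional)
locate_failure() {
  local gw="$1" app="$2" status="${3:-}"
  if [ "$app" = yes ]; then
    if [ -n "$status" ] && [ "$status" -ge 500 ]; then
      echo "runtime: code, identity or dependency"
    else
      echo "reached App Service (status ${status:-unknown}): private path works, read application traces"
    fi
  elif [ "$gw" = yes ]; then
    echo "before App Service: WAF, backend health, private DNS, TLS or access restrictions"
  else
    echo "no logs: DNS, routing or wrong test location"
  fi
}
```

Checking App Service first mirrors the operating rule: once the runtime has logged the request, the private entry is no longer the main suspect.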
Choose the smallest rollback
Rollback should restore the layer that changed, not undo the whole release. A private zone change, an access rule, a backend pool and an application slot each have their own return path.
Recent change
Private DNS or forwarding
Rollback: restore the previous record, VNet link or forwarder
Evidence: private nslookup then correlated request visible at the right layer
Access restriction or public network access
Rollback: restore the previous source rule
Evidence: AppServiceHTTPLogs sees the correlation ID
Application Gateway backend or probe
Rollback: return to the validated backend setting, host header or probe path
Evidence: healthy backend health and coherent application status
Application slot or package
Rollback: swap back or restore the previous package
Evidence: same request succeeds with App Service logs and application traces
Conclusion
A private App Service incident should be treated as a path incident, not as a code incident by default. DNS, Private Endpoint, access restrictions, Application Gateway, App Service logs and dependencies must be isolated before touching the deployment.
The decision becomes more defensible: if the request does not reach App Service, fix the private entry; if it reaches App Service and fails in logs, investigate runtime, identity, dependencies or code. This separation reduces temporary public openings, broad WAF exceptions and rollbacks that hide the real failure.