
Azure Functions: diagnose a private HTTP endpoint before changing code

Build an operational runbook for private Azure Functions failures by separating DNS, Private Endpoint, access restrictions, private storage, Application Insights logs and rollback evidence.

11 Jun 2026. Tags: azure, functions, private-endpoint, dns, kql, logs, monitoring, runbook, identity, rollback, automation

A private Azure Function can fail before any function code runs. Several things can go wrong:
- the hostname may still resolve to a public address;
- the Private Endpoint may be approved but unreachable from the caller network;
- access restrictions may block the request;
- the runtime may fail to start because private storage is unreachable;
- the app may answer 403, 502 or 503, or time out, before the function executes.

The use case is an internal API served by an Azure Functions HTTP trigger. Consumers call it through a private network, sometimes behind Application Gateway or internal APIM. A recent change has touched DNS, networking, storage, managed identity, application settings or the deployment package. Before redeploying code or opening public access, the runbook must prove where the request stops.

Read the private path as one chain

The diagnosis is more reliable when the team draws the full path before changing settings. A private Function App is not only a URL. It combines DNS, Private Link, access rules, the Functions runtime, platform storage, identity and logs.

text azure-functions-private-http-path.txt
Internal client or probe
Resolves api.internal.example.com
Calls the expected hostname with the correct host header

Private entry point
Application Gateway, internal APIM or direct client
Preserves TLS/SNI and correlation ID

Function App
Private Endpoint on the HTTP site
Access restrictions aligned with the caller network
Functions runtime started

Storage and dependencies
AzureWebJobsStorage reachable through the expected private path
Key Vault, files, queues or downstream APIs resolved privately when required

Observability
Application Insights or Log Analytics receives traces, requests and exceptions
Rollback targets the layer that actually changed

This map prevents broad fixes. A 403 caused by an access rule is not fixed by redeployment. A runtime that cannot start because private storage is blocked is not fixed by changing Application Gateway.
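The chain can be made operational with a small bash driver that checks the layers in order and stops at the first one that fails. The sketch below is an assumption, not part of the original runbook. `run_chain`, the `LAYERS` list and every `check_*` function are hypothetical names, and each check still has to be wired to the real commands (nslookup, curl, az, KQL) described in the following sections.

```bash
# Hypothetical driver: walk the private path in order, stop at the first broken layer.
# Each check_<layer> function is a placeholder to be implemented with real commands.

# Layers in the order a request crosses them (see the map above).
LAYERS=(dns entry access runtime storage logs)

run_chain() {
  local layer
  for layer in "${LAYERS[@]}"; do
    # A check returns 0 when the layer looks healthy.
    # A check that is not defined fails with "command not found", so it counts as a failure.
    if ! "check_$layer"; then
      echo "first failing layer: $layer"
      return 1
    fi
  done
  echo "all layers passed: look at code, identity or dependencies"
  return 0
}
```

Stopping at the first failure keeps the fix on one layer. Storage problems usually surface as a runtime that does not start, so a failing `runtime` check should be followed by the storage checks before anything is changed.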

Classify the symptom before acting

The first question is not "does the function work?" but "does the private hostname reach the Functions site, is the runtime started, and did the function execute the request?"

text azure-functions-private-http-symptoms.txt
Symptom
DNS returns a public address or no answer
  Check privatelink.azurewebsites.net zone, VNet links and hybrid DNS forwarding

curl returns 403 immediately
  Check access restrictions, Private Endpoint, source route and optional APIM/Application Gateway

curl returns 502 or 503
  Check runtime state, worker process, Functions settings and storage availability

No request appears in Application Insights
  Go back to DNS, gateway, APIM, access restrictions or Private Endpoint

The request appears but fails with an exception
  Read traces, exceptions, managed identity, Key Vault and downstream dependencies

Runtime does not start after a network change
  Validate AzureWebJobsStorage, private storage DNS and storage account network rules

This classification gives a simple rule: do not touch code until function execution is visible in the logs.
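The classification can be encoded as a pure bash function, so a probe or an on-call script prints the same verdict every time. This is a hypothetical helper. The name `classify_symptom`, its four inputs and the output labels are illustrative choices that mirror the symptom list above. A timeout with nothing in the logs is treated like the "no request appears in Application Insights" case.

```bash
# Hypothetical helper mirroring the symptom list.
# Usage: classify_symptom DNS_RESULT HTTP_STATUS IN_LOGS HAS_EXCEPTION
#   DNS_RESULT:    private | public | none
#   HTTP_STATUS:   numeric code, or "timeout"
#   IN_LOGS:       yes | no  (request visible in Application Insights)
#   HAS_EXCEPTION: yes | no
classify_symptom() {
  local dns=$1 status=$2 in_logs=$3 has_exception=$4

  # Anything other than a private answer: fix DNS first.
  if [ "$dns" != "private" ]; then
    echo "dns"
    return 0
  fi

  # The function saw the request: the problem is past the network.
  if [ "$in_logs" = "yes" ]; then
    if [ "$has_exception" = "yes" ]; then
      echo "code-identity-dependency"
    else
      case $status in
        2??) echo "healthy" ;;
        *)   echo "platform-config" ;;
      esac
    fi
    return 0
  fi

  # Nothing in the logs: stay on network and platform.
  case $status in
    403)     echo "access" ;;
    502|503) echo "runtime-storage" ;;
    *)       echo "edge" ;;
  esac
}
```

For example, `classify_symptom private 403 no no` points at access restrictions, and nothing in the code is touched until `IN_LOGS` becomes `yes`.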

Prove DNS, TLS and access from the consumer network

The test must run from the same network as the real consumer: the application subnet, a probe runner, a diagnostic VM or the internal APIM environment. A test from a public workstation uses different DNS and a different route, so it can pass or fail for reasons unrelated to the private path.

bash 01-functions-private-dns-http-check.sh
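HOSTNAME=api.internal.example.com
# Do not call this PATH: overwriting PATH hides nslookup, dig, openssl and curl.
URL_PATH=/api/health
# UTC, so the ID lines up with Application Insights timestamps.
CORRELATION_ID="ops-$(date -u +%Y%m%d%H%M%S)"

# 1. DNS: the answer must be a private address (privatelink.azurewebsites.net path).
nslookup "$HOSTNAME"
dig +short "$HOSTNAME"

# 2. TLS: certificate presented for this SNI name.
openssl s_client -connect "$HOSTNAME:443" -servername "$HOSTNAME" </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer

# 3. HTTP: no -k, so certificate errors surface instead of being hidden.
#    Add -k only to look past a certificate problem that is already known.
curl -v "https://$HOSTNAME$URL_PATH" -H "x-correlation-id: $CORRELATION_ID" -H "x-naxaya-check: private-functions"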

echo "correlation_id=$CORRELATION_ID"

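If resolution is not private, fix DNS before anything else. If the TLS handshake fails or the certificate does not match the hostname, inspect the entry point (Application Gateway, APIM or the custom domain binding). If curl returns 403 and no matching request appears in the Functions logs, the block is most likely in the network path or the access restrictions.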
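The check "resolution is not private" can be turned into a pass/fail test. The sketch below compares every A record returned by `dig` with the RFC 1918 ranges. The function names are hypothetical. The RFC 1918 assumption must be adjusted if the VNet uses other address space (for example 100.64.0.0/10).

```bash
# Hypothetical helper: exit 0 if the argument is an IPv4 address in an RFC 1918 range.
is_private_ipv4() {
  local ip=$1 a b c d o
  # Accept only dotted-quad input such as 10.20.30.40.
  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  # Force base 10 so octets like "08" are not parsed as octal.
  a=$((10#${BASH_REMATCH[1]})); b=$((10#${BASH_REMATCH[2]}))
  c=$((10#${BASH_REMATCH[3]})); d=$((10#${BASH_REMATCH[4]}))
  for o in "$a" "$b" "$c" "$d"; do
    (( o <= 255 )) || return 1
  done
  (( a == 10 )) && return 0                        # 10.0.0.0/8
  (( a == 172 && b >= 16 && b <= 31 )) && return 0 # 172.16.0.0/12
  (( a == 192 && b == 168 )) && return 0           # 192.168.0.0/16
  return 1
}

# Hypothetical check: every A record for the host must be private.
check_private_resolution() {
  local host=$1 addr found=0
  while read -r addr; do
    # dig +short also prints CNAME targets; skip anything that is not an IPv4 literal.
    [[ $addr =~ ^[0-9.]+$ ]] || continue
    found=1
    if ! is_private_ipv4 "$addr"; then
      echo "public or unexpected address for $host: $addr"
      return 1
    fi
  done < <(dig +short "$host")
  (( found )) || { echo "no A record for $host"; return 1; }
  echo "private resolution OK for $host"
}
```

Run `check_private_resolution api.internal.example.com` from the consumer network. A failure here means the rest of the runbook should wait until DNS is fixed.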

Check the Function App and private storage

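A Function App can accept a private route and still be unusable if the runtime cannot reach its platform storage. This often appears after network hardening. The site is private, but AzureWebJobsStorage, or the content file share (WEBSITE_CONTENTSHARE on plans that use one), no longer resolves privately or is not allowed from the right path. The inbound Private Endpoint does not help here. The runtime reaches storage outbound, through VNet integration, the storage account's own private endpoints and the matching privatelink zones (privatelink.blob.core.windows.net, privatelink.file.core.windows.net, and so on).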

bash 02-functions-platform-checks.sh
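APP_RG=rg-prod-app
APP_NAME=func-prod-orders
APP_ID="$(az functionapp show -g "$APP_RG" -n "$APP_NAME" --query id -o tsv)"

# Runtime state, HTTPS, inbound exposure and the subnet used for outbound VNet integration.
az functionapp show -g "$APP_RG" -n "$APP_NAME" --query "{state:state, httpsOnly:httpsOnly, defaultHostName:defaultHostName, publicNetworkAccess:publicNetworkAccess, vnetSubnet:virtualNetworkSubnetId, outboundIpAddresses:outboundIpAddresses}" -o table

# Inbound rules: the caller network (or the APIM/Application Gateway subnet) must be allowed.
az functionapp config access-restriction show -g "$APP_RG" -n "$APP_NAME" -o table

# Setting names only: values can contain storage account keys.
# starts_with also catches identity-based settings such as AzureWebJobsStorage__accountName.
az functionapp config appsettings list -g "$APP_RG" -n "$APP_NAME" --query "[?starts_with(name, 'AzureWebJobsStorage') || contains(name, 'WEBSITE_') || contains(name, 'FUNCTIONS_')].name" -o tsv

# Private Endpoint connection on the site: expect Approved.
az network private-endpoint-connection list --id "$APP_ID" --query "[].{name:name,status:privateLinkServiceConnectionState.status}" -o table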

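The goal is not to paste secrets into a ticket but to validate each dimension:
- the runtime is running;
- the access restrictions are the expected ones;
- the network settings are present;
- the Private Endpoint is approved;
- storage is reachable through private DNS.

The commands above do not prove the last point. That needs a name-resolution test run from the subnet used by VNet integration.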
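Storage reachability can be approximated by resolving each storage endpoint from a machine in the subnet used by VNet integration. This sketch rests on several assumptions. The helpers are hypothetical. `stprodorders` is a placeholder account name; read the real name from the portal or from an `AzureWebJobsStorage__accountName` setting rather than from the connection string. The public-cloud suffix `core.windows.net` is assumed.

```bash
# Hypothetical helper: endpoints the Functions host may use.
# blob/queue/table serve AzureWebJobsStorage; file serves the content share.
storage_hostnames() {
  local account=$1 svc
  for svc in blob queue table file; do
    echo "$account.$svc.core.windows.net"
  done
}

# Hypothetical check: run from the VNet-integrated subnet, not from the consumer network.
check_storage_dns() {
  local host answer
  for host in $(storage_hostnames "$1"); do
    answer=$(dig +short "$host")
    echo "== $host"
    echo "$answer"
    # With a storage private endpoint, the answer goes through a privatelink CNAME.
    # The CNAME alone is not enough: the final address must also be private.
    case $answer in
      *privatelink*) ;;
      *) echo "WARN: no privatelink CNAME for $host" ;;
    esac
  done
}
```

A warning on `file` only matters if the plan uses a content share. A warning on `blob` usually explains host lock and startup errors.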

Correlate requests, traces and exceptions in KQL

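When the request reaches the Function App, Application Insights should show at least one request, trace or exception in the incident window. Correlating by hostname, URL and correlation ID keeps the private failure apart from unrelated application noise. Two caveats apply:
- The correlation ID only lands in customDimensions if the function (or a telemetry initializer) records the x-correlation-id header.
- If the gateway rewrites the host header, url shows the *.azurewebsites.net name instead of the internal one.

In both cases, match on the time window and the path instead.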

kusto 03-functions-private-http-correlation.kql
let Window = 2h;
let Host = "api.internal.example.com";
let CorrelationId = "ops-20260611080000";
let Req =
requests
| where timestamp > ago(Window)
| where url has Host or tostring(customDimensions["x-correlation-id"]) == CorrelationId
| project timestamp, Source="request", name, resultCode, success, operation_Id, url, cloud_RoleName;
let Tr =
traces
| where timestamp > ago(Window)
| where message has_any (Host, CorrelationId, "Host lock", "storage", "listener", "Starting", "Stopping", "Function")
| project timestamp, Source="trace", message, severityLevel, operation_Id, cloud_RoleName;
let Ex =
exceptions
| where timestamp > ago(Window)
| project timestamp, Source="exception", message=outerMessage, severityLevel, operation_Id, cloud_RoleName;
Req
| union Tr, Ex
| order by timestamp desc

Quick read:
- No request after a correlated curl points to DNS, the private edge or access restrictions.
- A request logged with a 403 or 503 result code points to platform and configuration.
- A request with an exception points to code, identity or a dependency.
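When the probe time is known, a narrow window reduces noise more than `ago(2h)`. Application Insights stores timestamps in UTC, so this sketch assumes the correlation ID was generated as `ops-` plus a UTC timestamp (`date -u +%Y%m%d%H%M%S`). Two hypothetical helpers derive the timestamp from the ID and emit a `where` clause that can replace the `ago(Window)` filters.

```bash
# Hypothetical helper: ops-YYYYmmddHHMMSS -> ISO 8601 UTC timestamp.
corr_id_to_utc() {
  local ts=${1#ops-}
  [[ $ts =~ ^[0-9]{14}$ ]] || return 1
  printf '%s-%s-%sT%s:%s:%sZ\n' \
    "${ts:0:4}" "${ts:4:2}" "${ts:6:2}" "${ts:8:2}" "${ts:10:2}" "${ts:12:2}"
}

# Hypothetical helper: KQL filter centred on the probe, +/- MINUTES (default 15).
kql_window() {
  local start minutes=${2:-15}
  start=$(corr_id_to_utc "$1") || return 1
  printf 'where timestamp between (datetime(%s) - %dm .. datetime(%s) + %dm)\n' \
    "$start" "$minutes" "$start" "$minutes"
}
```

For example, `kql_window ops-20260611080000 10` prints a filter spanning 07:50 to 08:10 UTC on 11 June 2026. It can be pasted after `requests |`, `traces |` and `exceptions |`.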

Keep rollback bounded

Rollback should restore the layer that changed, not the entire environment. If a DNS change broke private resolution, restore the zone or forwarder. If an access restriction blocked APIM, restore the rule. If the last package throws an exception after entering the function, roll back the deployment.

text functions-private-http-rollback-matrix.txt
Recent change
Private DNS
  Rollback: restore the previous record, VNet link or forwarder
  Evidence: private nslookup + curl with correlation ID

Access restriction
  Rollback: restore the previous source rule
  Evidence: request visible in Application Insights

Private storage
  Rollback: restore previous storage network permission or DNS path
  Evidence: runtime started + no Host lock/storage errors

Application package
  Rollback: return to the validated package or slot
  Evidence: the same correlated request succeeds, traced end to end by operation_Id
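For automation, the matrix can be kept as a lookup that refuses to plan more than one layer at a time, which is the point of a bounded rollback. `rollback_plan` and its layer keys are hypothetical. The action and evidence strings restate the matrix.

```bash
# Hypothetical lookup: one changed layer in, one bounded rollback and its evidence out.
rollback_plan() {
  # Bounded rollback: exactly one layer per plan.
  if [ $# -ne 1 ]; then
    echo "roll back one layer at a time" >&2
    return 2
  fi
  case $1 in
    dns)
      echo "rollback: restore the previous record, VNet link or forwarder"
      echo "evidence: private nslookup + curl with correlation ID" ;;
    access)
      echo "rollback: restore the previous source rule"
      echo "evidence: request visible in Application Insights" ;;
    storage)
      echo "rollback: restore previous storage network permission or DNS path"
      echo "evidence: runtime started + no Host lock/storage errors" ;;
    package)
      echo "rollback: return to the validated package or slot"
      echo "evidence: the same correlated request succeeds, traced by operation_Id" ;;
    *)
      echo "unknown layer: $1" >&2
      return 1 ;;
  esac
}
```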

Conclusion

A private Azure Function must be diagnosed like a production path, not like a small serverless handler. DNS, Private Endpoint, restrictions, runtime, storage and logs must be separated before code changes.

The operating rule is intentionally strict: until a correlated request appears in logs, stay on network and platform; once it appears, move to runtime, identity, dependencies or code. That separation reduces temporary public openings, unnecessary redeployments and overly broad rollbacks.