Azure VNet Integration: diagnose outbound networking before changing the application
An operational runbook for qualifying Azure App Service or Functions outbound failures by separating VNet Integration, DNS, UDR, NSG, NAT, logs and rollback.
An Azure outbound networking failure often looks like an application failure. The called API stops responding, a worker sees timeouts, a Function retries repeatedly, or a SaaS provider rejects an unexpected source IP. When App Service or Functions use VNet Integration, the cause may be DNS, a UDR, an NSG, a NAT Gateway, a firewall or a dependency-side change.
The use case is an Azure workload that must reach a private API, an Azure service, a controlled Internet endpoint or an internal dependency from an integration subnet. The runbook goal is to prove where outbound connectivity breaks before changing code, opening broad flows or removing a route.
Read outbound as an operable chain
VNet Integration does not make the application privately reachable. It mostly affects outbound flows from the application into a VNet. The incident must therefore be read as an outbound chain, not as an inbound exposure problem.
Workload
App Service, Function App, WebJob or internal worker
VNet Integration enabled
Dedicated subnet with enough capacity
Resolution
Target FQDN resolved from the application context
Private DNS, forwarder or resolver aligned with the destination
No conclusion based only on the admin workstation
Routing
System route or UDR applied to the integration subnet
Expected next hop: Internet, Virtual Appliance, Firewall, Virtual Network or None
Specific route that does not capture too much traffic
Filtering and egress
Subnet NSG
Azure Firewall or NVA when present
NAT Gateway or expected outbound IP
Destination-side rules
Evidence
Test timestamp
Tested hostname and port
Resolved address
Observed outbound IP
Application, firewall or destination logs
This view avoids two common mistakes: looking for a Private Endpoint when the flow is outbound, or changing the application while the route has changed.
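The evidence items listed above are easier to compare later when each test is written down in the same shape. The following is a minimal sketch, not a prescribed format: the function name `evidence_line` and the key names are hypothetical, and the values are supplied by whoever runs the test.

```shell
# evidence_line HOST PORT [RESOLVED_IP] [OUTBOUND_IP]
# Prints one timestamped (UTC) key=value record for the incident notes.
# Hypothetical helper: name and field names are illustrative.
evidence_line() {
  printf 'ts=%s host=%s port=%s resolved=%s outbound_ip=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "${3:-unknown}" "${4:-unknown}"
}

# Example call (values are placeholders):
# evidence_line api.internal.example 443 10.20.1.4 203.0.113.10
```

Appending these lines to the incident ticket keeps the timestamp, target, resolved address and outbound IP together for each test.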
Classify the symptom before fixing it
The same timeout may come from DNS resolution, a wrong next hop, a blocked port, missing NAT or a destination-side rule. Classify the symptom first, with the test location attached.
Observed symptom
Name does not resolve or resolves to the wrong address
Check DNS, forwarders, private zones and resolution from the workload
Timeout to the correct address
Check UDR, next hop, firewall, NSG and return route
Immediate connection refused
Check port, target service, listener, proxy or destination-side rule
403 from API or SaaS endpoint
Check application identity and authorized outbound IP
Works from the admin workstation but not from the app
Test again from the integration subnet or an equivalent probe
Incident after a network change
Compare UDR, NSG, NAT Gateway, DNS and firewall before redeploying the application
The operating rule is simple: without a test from the real outbound path, the application hypothesis remains weak.
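Part of this classification can be read from a single curl probe run on the real outbound path. The sketch below assumes curl's documented exit codes (6 for an unresolved host, 7 for a failed connection, 28 for a timeout). The function name `classify_probe` and the layer labels are hypothetical, and the mapping is only a first hint: exit code 7, for instance, covers more than a refused connection.

```shell
# classify_probe EXIT_CODE [HTTP_STATUS]
# Maps a curl exit code and HTTP status to the layer to check first.
# Hypothetical helper: the name and the labels are illustrative.
classify_probe() {
  local exit_code="$1" http_status="${2:-000}"
  case "$exit_code" in
    6)  echo "dns" ;;                 # curl: could not resolve host
    7)  echo "port-or-listener" ;;    # curl: failed to connect (often refused)
    28) echo "route-nsg-firewall" ;;  # curl: operation timed out
    0)
      case "$http_status" in
        403) echo "identity-or-allowlist" ;;
        *)   echo "reachable" ;;
      esac ;;
    *)  echo "unclassified" ;;
  esac
}

# Example, run from the workload path:
# status=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 "https://$TARGET_HOST/health"); rc=$?
# classify_probe "$rc" "$status"
```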
Verify integration and subnet controls
Start by confirming that the application uses the expected subnet, then read the controls applied to that subnet. A configuration that was correct yesterday may have been replaced by integration to another subnet or by a broader route.
RG=rg-prod-app
APP=app-prod-orders
VNET_RG=rg-prod-network
VNET=vnet-prod-spoke
SUBNET=snet-app-outbound
az webapp vnet-integration list -g "$RG" -n "$APP" -o table
az network vnet subnet show -g "$VNET_RG" --vnet-name "$VNET" -n "$SUBNET" --query "{name:name,addressPrefix:addressPrefix,routeTable:routeTable.id,nsg:networkSecurityGroup.id,natGateway:natGateway.id,delegations:delegations[].serviceName}" -o jsonc
# An empty RT_ID means no route table is attached to the subnet; only system routes apply.
RT_ID=$(az network vnet subnet show -g "$VNET_RG" --vnet-name "$VNET" -n "$SUBNET" --query routeTable.id -o tsv)
az network route-table show --ids "$RT_ID" --query "routes[].{name:name,prefix:addressPrefix,nextHop:nextHopType,nextHopIp:nextHopIpAddress}" -o table
For Functions, adapt the integration command to the application type (for example, az functionapp vnet-integration list). The important point is the proof: the workload exits through the subnet you are diagnosing.
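The addressPrefix returned by the subnet query also gives a quick capacity check for the integration subnet. This is a minimal sketch, assuming an IPv4 prefix and relying on the fact that Azure reserves five addresses in every subnet. The function name `usable_addresses` is hypothetical, and how many addresses the plan actually consumes depends on its scale-out, which the sketch does not model.

```shell
# usable_addresses CIDR
# Prints the number of assignable IPv4 addresses in an Azure subnet.
# Azure reserves 5 addresses per subnet (network, gateway, two for DNS, broadcast).
usable_addresses() {
  local len="${1#*/}"
  echo $(( (1 << (32 - len)) - 5 ))
}

# Example: usable_addresses 10.20.4.0/26
```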
Test DNS from the right context
DNS is often the first difference between an admin workstation and the application. A dependency may resolve privately from one subnet, publicly from another, or to an old endpoint because of a forwarder.
TARGET_HOST=api.internal.example
TARGET_PORT=443
# Run from an application console, diagnostic container,
# temporary VM in an equivalent subnet, or controlled probe.
nslookup "$TARGET_HOST"
dig +short "$TARGET_HOST"
curl -vk --connect-timeout 5 "https://$TARGET_HOST:$TARGET_PORT/health"
# If the destination expects a fixed outbound IP, compare with the approved IP.
curl -s https://ifconfig.me
A VM in the same VNet is not always equivalent to the integration subnet, especially when UDR, NSG or NAT differ. It is still useful when its subnet, routes and rules are explicitly documented.
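To record whether a name resolved privately or publicly from a given test point, the resolved address can be checked against the RFC 1918 ranges. This is a sketch under assumptions: it handles IPv4 only, the function name `is_private_ipv4` is hypothetical, and a private path that uses address space outside RFC 1918 would be misreported as public.

```shell
# is_private_ipv4 ADDRESS
# Returns 0 when ADDRESS is in 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
is_private_ipv4() {
  local o1 o2 rest
  o1="${1%%.*}"      # first octet
  rest="${1#*.}"
  o2="${rest%%.*}"   # second octet
  case "$o1" in
    10)  return 0 ;;
    172) [ "$o2" -ge 16 ] && [ "$o2" -le 31 ] ;;
    192) [ "$o2" -eq 168 ] ;;
    *)   return 1 ;;
  esac
}

# Example, using the last line of the dig answer (CNAME lines may come first):
# addr=$(dig +short "$TARGET_HOST" | tail -n1)
# if is_private_ipv4 "$addr"; then echo "private: $addr"; else echo "public: $addr"; fi
```

Running the same check from the admin workstation and from the workload path shows at a glance when the two contexts resolve the name differently.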
Read UDR, NSG and NAT together
A route may send traffic to a firewall, an NSG may block a port, and a NAT Gateway may change the address seen by the destination. Reading them separately gives an incomplete diagnosis.
Diagnostic question
Is the destination private or public?
Does the most specific prefix point to the expected next hop?
Does the firewall or NVA have an explicit outbound rule?
Does the NSG allow the port from the integration subnet?
Is the NAT Gateway attached to the right subnet?
Does the destination allow the observed outbound IP?
Is the return route symmetric when a private path is used?
This matters especially after a Terraform change. A 0.0.0.0/0 UDR toward a firewall may be correct, but it must be paired with the corresponding firewall, DNS and NAT rules.
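The question about the most specific prefix can be checked offline against an exported route list. Azure selects a route by longest prefix match. The sketch below implements only that step for IPv4 and does not model the tie-break between user-defined, BGP and system routes that share a prefix. The function names `ip_to_int` and `best_route` are hypothetical, and the input format ("PREFIX NEXT_HOP" per line) is an assumption.

```shell
# ip_to_int A.B.C.D -> prints the address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1          # split the dotted address into four octets
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# best_route DEST_IP   (reads "PREFIX NEXT_HOP" lines on stdin)
# Prints the route whose prefix is the longest match for DEST_IP.
best_route() {
  local dest best_len=-1 best="" prefix hop net len mask
  dest=$(ip_to_int "$1")
  while read -r prefix hop; do
    [ -n "$prefix" ] || continue
    net=$(ip_to_int "${prefix%/*}")
    len="${prefix#*/}"
    if [ "$len" -eq 0 ]; then
      mask=0
    else
      mask=$(( (4294967295 << (32 - len)) & 4294967295 ))
    fi
    if [ $(( dest & mask )) -eq $(( net & mask )) ] && [ "$len" -gt "$best_len" ]; then
      best_len="$len"
      best="$prefix $hop"
    fi
  done
  echo "$best"
}

# Example with illustrative routes:
# printf '%s\n' '0.0.0.0/0 VirtualAppliance' '10.20.0.0/16 VnetLocal' | best_route 10.20.1.4
```

Feeding it the prefixes and next hop types from the route table makes it visible when a broad route such as 0.0.0.0/0 captures a destination that was expected to follow a narrower one.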
Correlate application and network logs
Application logs show what the runtime observes. Network logs show whether traffic reaches a control layer. Correlate them by time window, hostname, port, IP and request identifier when available.
let Window = 2h;
let TargetHost = "api.internal.example";
AppTraces
| where TimeGenerated > ago(Window)
| where Message has_any (TargetHost, "timeout", "NameResolution", "SocketException", "connection refused", "403")
| project TimeGenerated, AppRoleName, SeverityLevel, Message, OperationId
| order by TimeGenerated desc
If you use Azure Firewall, NSG flow logs or destination logs, add the same time window. No firewall log for a documented test points back to DNS, route, local NSG or the wrong test point. An explicit deny points to the rule. A visible 403 response points more often to identity, IP allowlist or destination policy.
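The reading rules above can be written down as a small decision helper so that every responder interprets the same observation the same way. This is a sketch with hypothetical names: `next_layer` and its input labels (none, deny, allow) are illustrative, and the branch for an allowed flow without a 403 is an assumption added for completeness rather than a rule stated above.

```shell
# next_layer FIREWALL_RESULT [HTTP_STATUS]
# FIREWALL_RESULT is what the firewall log showed for the documented test:
# none (no entry), deny or allow. Labels are illustrative.
next_layer() {
  case "$1" in
    none)  echo "check DNS, route, local NSG or test point" ;;
    deny)  echo "check the firewall rule" ;;
    allow)
      if [ "${2:-}" = "403" ]; then
        echo "check identity, IP allowlist or destination policy"
      else
        # Assumption: traffic passed the firewall, so look further downstream.
        echo "check destination-side logs"
      fi ;;
    *)     echo "unknown firewall result" ;;
  esac
}

# Example: next_layer allow 403
```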
Choose the smallest correction
The correction must target the proven layer. Temporarily opening all outbound traffic or removing the default route can restore service, but it hides the cause and often creates a second security incident.
Proven cause
Wrong DNS resolution
Correction: zone, forwarder, resolver or app DNS configuration
Validation: hostname returns the expected address from the workload path
UDR too broad or wrong next hop
Correction: more specific route or corrected next hop
Validation: traffic reaches the expected firewall or destination
NSG or firewall blocks the port
Correction: targeted source, destination, port and protocol rule
Validation: deny disappears and the application request is visible
NAT or outbound IP drift
Correction: align NAT Gateway, route or destination allowlist
Validation: destination sees the approved IP
Identity or destination policy
Correction: application right or precise allowlist
Validation: same request succeeds without broader network opening
The right change is the one that explains the symptom and can be validated again.
Prepare rollback without losing evidence
Rollback must be decided by layer: DNS, route, NSG, NAT, firewall or application version. It should not erase useful logs or make before/after comparison impossible.
Recent change
Route table or UDR
Rollback: restore the previous route or remove the faulty route
Evidence: observed next hop and documented connectivity test
NSG or firewall
Rollback: return to the previous rule or apply a targeted temporary exception
Evidence: deny identified, source/destination/port scope limited
NAT Gateway or outbound IP
Rollback: reattach the previous NAT or restore the known allowlist
Evidence: outbound IP observed by the destination
DNS or forwarder
Rollback: restore the previous resolution path
Evidence: timestamped FQDN resolution from the workload
Application deployment
Rollback: return to the previous version only when the network path is clean
Evidence: same dependency reachable with clean logs
Conclusion
A VNet Integration outbound failure should be handled as a production chain: workload, DNS, integration subnet, UDR, NSG, firewall, NAT, destination, logs and rollback. Diagnosis must prove whether the application can no longer resolve, can no longer route, is filtered, exits with the wrong IP or is rejected by the dependency.
The decision then becomes defensible: fix DNS when the name is wrong, UDR when the next hop is wrong, NSG or firewall when the deny is proven, NAT when the outbound IP drifts, and the application only when the network path is clean.