Azure VNet Integration: diagnose outbound networking before changing the application

An operational runbook for qualifying Azure App Service or Functions outbound failures by separating VNet Integration, DNS, UDR, NSG, NAT, logs and rollback.

17 Jun 2026
Tags: azure, vnet-integration, outbound-networking, udr, nsg, nat-gateway, dns, app-service, functions, logs, runbook, rollback

An Azure outbound networking failure often looks like an application failure. The called API stops responding, a worker sees timeouts, a Function retries repeatedly, or a SaaS provider rejects an unexpected source IP. When App Service or Functions use VNet Integration, the cause may be DNS, a UDR, an NSG, a NAT Gateway, a firewall or a dependency-side change.

The use case is an Azure workload that must reach a private API, an Azure service, a controlled Internet endpoint or an internal dependency from an integration subnet. The goal of the runbook is to prove where outbound connectivity breaks before anyone changes code, opens broad flows or removes a route.

Read outbound as an operable chain

VNet Integration does not make the application privately reachable: inbound private access is handled separately, for example by a Private Endpoint. It governs outbound traffic from the application into a VNet, so the incident must be read as an outbound chain, not as an inbound exposure problem.

text vnet-integration-outbound-chain.txt
Workload
App Service, Function App, WebJob or internal worker
VNet Integration enabled
Dedicated subnet, delegated to Microsoft.Web/serverFarms, with enough free addresses

Resolution
Target FQDN resolved from the application context
Private DNS, forwarder or resolver aligned with the destination
No conclusion based only on the admin workstation

Routing
System route or UDR applied to the integration subnet
Expected next hop type: Internet, VirtualAppliance (including Azure Firewall), VirtualNetworkGateway, VnetLocal or None
Specific route that does not capture too much traffic

Filtering and egress
Subnet NSG
Azure Firewall or NVA when present
NAT Gateway or expected outbound IP
Destination-side rules

Evidence
Test timestamp
Tested hostname and port
Resolved address
Observed outbound IP
Application, firewall or destination logs

This view avoids two common mistakes: looking for a Private Endpoint when the flow is outbound, and changing the application when it is the route that has changed.

Classify the symptom before fixing it

The same timeout may come from DNS resolution, a wrong next hop, a blocked port, missing NAT or a destination-side rule. Classify the symptom first, with the test location attached.

text outbound-symptoms.txt
Observed symptom
Name does not resolve or resolves to the wrong address
  Check DNS, forwarders, private zones and resolution from the workload

Timeout to the correct address
  Check UDR, next hop, firewall, NSG and return route

Immediate connection refused
  Check port, target service, listener, proxy or destination-side rule

403 from API or SaaS endpoint
  Check application identity and authorized outbound IP

Works from the admin workstation but not from the app
  Test again from the integration subnet or an equivalent probe

Incident after a network change
  Compare UDR, NSG, NAT Gateway, DNS and firewall before redeploying the application

The operating rule is simple: without a test from the real outbound path, blaming the application remains an unproven hypothesis.
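When the probe is a curl call, its exit code already hints at the layer. The sketch below maps three well-known curl exit codes (6, 7 and 28) and an HTTP 403 to the categories listed above. The function name and the category labels are illustrative, not part of any tool, and the result is a first hypothesis to test rather than a verdict: exit code 7, for instance, covers refused connections but also other connect failures.

```shell
# Map a curl exit code and an optional HTTP status to the layer to inspect first.
# Category labels are hypothetical names chosen for this runbook.
classify_outbound_symptom() {
  curl_exit="$1"
  http_status="${2:-000}"
  case "$curl_exit" in
    0)
      case "$http_status" in
        403) echo "identity-or-allowlist" ;;  # request arrived, destination refused it
        *)   echo "reachable" ;;
      esac ;;
    6)  echo "dns" ;;               # curl: could not resolve host
    7)  echo "port-or-listener" ;;  # curl: failed to connect (often a refused connection)
    28) echo "route-or-filter" ;;   # curl: operation timed out
    *)  echo "unclassified" ;;
  esac
}

# Hypothetical usage from the workload path:
#   status=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 "https://api.internal.example/health")
#   rc=$?
#   classify_outbound_symptom "$rc" "$status"
```

Keeping the classification next to the test location and timestamp makes it easier to defend later which layer was examined first, and why.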

Verify integration and subnet controls

Start by confirming that the application uses the expected subnet, then read the controls applied to that subnet. A configuration that was correct yesterday may have been replaced by integration with another subnet or by a broader route.

bash 01-vnet-integration-subnet-check.sh
RG=rg-prod-app
APP=app-prod-orders
VNET_RG=rg-prod-network
VNET=vnet-prod-spoke
SUBNET=snet-app-outbound

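# 1. Confirm which VNet and subnet the app is integrated with.
az webapp vnet-integration list -g "$RG" -n "$APP" -o table

# 2. Read the controls attached to that subnet.
az network vnet subnet show -g "$VNET_RG" --vnet-name "$VNET" -n "$SUBNET" --query "{name:name,addressPrefix:addressPrefix,routeTable:routeTable.id,nsg:networkSecurityGroup.id,natGateway:natGateway.id,delegations:delegations[].serviceName}" -o jsonc

# 3. List the user-defined routes, if a route table is attached.
RT_ID=$(az network vnet subnet show -g "$VNET_RG" --vnet-name "$VNET" -n "$SUBNET" --query routeTable.id -o tsv)
if [ -n "$RT_ID" ]; then
  az network route-table show --ids "$RT_ID" --query "routes[].{name:name,prefix:addressPrefix,nextHop:nextHopType,nextHopIp:nextHopIpAddress}" -o table
else
  echo "No route table attached to $SUBNET: only system routes apply."
fi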

For Functions, use the equivalent command for the application type (az functionapp vnet-integration list). The important point is the proof: the workload exits through the subnet you are diagnosing.
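The addressPrefix returned above also answers the capacity question. As a rough sketch, assuming the usual Azure behaviour of reserving five addresses in every subnet, the helper below (a hypothetical name, not an Azure CLI command) converts a prefix into the number of usable addresses:

```shell
# Usable addresses in an Azure subnet: total addresses minus the 5 Azure reserves.
# Accepts "10.20.8.0/26", "/26" or "26".
usable_addresses() {
  prefix_len=${1#*/}
  echo $(( (1 << (32 - prefix_len)) - 5 ))
}

usable_addresses 10.20.8.0/26
```

How many of those addresses the plan actually consumes depends on instance count and scaling events, which this arithmetic does not model.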

Test DNS from the right context

DNS is often the first difference between an admin workstation and the application. A dependency may resolve privately from one subnet, publicly from another, or to an old endpoint because of a forwarder.

bash 02-dns-and-connectivity-probe.sh
TARGET_HOST=api.internal.example
TARGET_PORT=443

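# Run from an application console, diagnostic container,
# temporary VM in an equivalent subnet, or controlled probe.
# nslookup and dig may be missing there; on Windows App Service the Kudu
# console provides nameresolver.exe and tcpping.exe as substitutes.
nslookup "$TARGET_HOST"
dig +short "$TARGET_HOST"

# -k skips certificate validation: acceptable for a reachability probe only.
curl -vk --connect-timeout 5 "https://$TARGET_HOST:$TARGET_PORT/health"

# If the destination expects a fixed outbound IP, compare with the approved IP.
# ifconfig.me is a third-party service; prefer an endpoint you control if policy requires it.
curl -s --max-time 10 https://ifconfig.me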

A VM in the same VNet is not always equivalent to the integration subnet, especially when UDR, NSG or NAT differ. It is still useful when its subnet, routes and rules are explicitly documented.
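A quick way to make "privately from one subnet, publicly from another" visible is to label each resolved address. The sketch below assumes that "private" means an RFC 1918 IPv4 range, which may not hold in every environment, and that the input is a well-formed IPv4 address as returned by a resolver. The function names are illustrative.

```shell
# Return success when the IPv4 address falls in 10/8, 172.16/12 or 192.168/16.
is_rfc1918() {
  case "$1" in
    10.*) return 0 ;;
    192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the scope of a resolved address with what the dependency should use.
# $1: expected scope ("private" or "public"), $2: resolved IPv4 address
check_resolution() {
  expected_scope="$1"
  resolved="$2"
  if is_rfc1918 "$resolved"; then actual_scope="private"; else actual_scope="public"; fi
  if [ "$actual_scope" = "$expected_scope" ]; then
    echo "OK $resolved is $actual_scope"
  else
    echo "MISMATCH $resolved is $actual_scope, expected $expected_scope"
    return 1
  fi
}

# Hypothetical usage from the workload path:
#   check_resolution private "$(dig +short api.internal.example | tail -n 1)"
```

Running the same check from the admin workstation and from the workload path shows at once whether the two contexts see the same endpoint.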

Read UDR, NSG and NAT together

A route may send traffic to a firewall, an NSG may block a port, and a NAT Gateway may change the address seen by the destination. Reading them separately gives an incomplete diagnosis.

text route-filter-nat-reading.txt
Diagnostic question
Is the destination private or public?
Does the most specific prefix point to the expected next hop?
Does the firewall or NVA have an explicit outbound rule?
Do the NSG outbound rules allow the port from the integration subnet?
Is the NAT Gateway attached to the right subnet?
Does the destination allow the observed outbound IP?
Is the return route symmetric when a private path is used?

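This matters especially after a Terraform change. A 0.0.0.0/0 UDR toward a firewall may be correct, but it must be paired with the corresponding firewall, DNS and NAT rules. With such a route in place, the destination typically sees the firewall's public IP rather than the NAT Gateway's, so the destination allowlist must match the path actually taken.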
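To answer the "most specific prefix" question by hand, longest-prefix matching can be sketched over a list of routes. The example below is an illustration only: it takes "prefix next-hop" lines on standard input, handles IPv4 only, and ignores the system routes, BGP routes and tie-breaking that Azure applies when it computes effective routes. The function names and sample routes are assumptions.

```shell
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when CIDR $1 contains IPv4 address $2.
prefix_contains() {
  cidr="$1"; addr="$2"
  net=${cidr%/*}; len=${cidr#*/}
  if [ "$len" -eq 0 ]; then return 0; fi   # 0.0.0.0/0 matches everything
  mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$net") & $mask )) -eq $(( $(ip_to_int "$addr") & $mask )) ]
}

# Read "prefix next-hop" lines on stdin, print the longest match for address $1.
effective_next_hop() {
  dest="$1"; best_len=-1; best_route="no-match"
  while read -r prefix hop; do
    [ -n "$prefix" ] || continue
    if prefix_contains "$prefix" "$dest"; then
      plen=${prefix#*/}
      if [ "$plen" -gt "$best_len" ]; then
        best_len=$plen; best_route="$prefix $hop"
      fi
    fi
  done
  echo "$best_route"
}

# Sample routes (hypothetical): a default route to a firewall plus a local range.
printf '%s\n' "0.0.0.0/0 VirtualAppliance" "10.20.0.0/16 VnetLocal" | effective_next_hop 10.20.4.7
```

With these sample routes, a destination inside 10.20.0.0/16 stays local, while any other address falls through to the default route and therefore depends on the firewall rules.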

Correlate application and network logs

Application logs show what the runtime observes. Network logs show whether traffic reaches a control layer. Correlate them by time window, hostname, port, IP and request identifier when available.

kusto 03-outbound-errors.kql
let Window = 2h;
let TargetHost = "api.internal.example";
AppTraces
| where TimeGenerated > ago(Window)
| where Message has_any (TargetHost, "timeout", "NameResolution", "SocketException", "connection refused", "403")
| project TimeGenerated, AppRoleName, SeverityLevel, Message, OperationId
| order by TimeGenerated desc

If you use Azure Firewall, NSG or VNet flow logs, or destination logs, query them over the same time window. If a documented test leaves no trace in the firewall logs, the traffic never reached the firewall: look again at DNS, the route, the subnet NSG or the test location. An explicit deny points to the rule. A visible 403 response points more often to identity, the IP allowlist or destination policy.
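Correlation is easier when every manual test leaves a line in the same shape. The helper below is a minimal sketch with a hypothetical name and field layout: it stamps the test in UTC, which matches TimeGenerated in Log Analytics, and records the fields the runbook asks for as evidence.

```shell
# Print one evidence line per test.
# $1 hostname, $2 port, $3 resolved address, $4 observed outbound IP, $5 result
evidence_line() {
  printf '%s host=%s port=%s resolved=%s egress_ip=%s result=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3" "$4" "$5"
}

# Example values are placeholders, not real observations.
evidence_line api.internal.example 443 10.20.4.7 203.0.113.10 timeout
```

Appending these lines to the incident ticket gives the firewall and destination teams an exact window and tuple to search for.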

Choose the smallest correction

The correction must target the proven layer. Temporarily opening all outbound traffic or removing the default route can restore service, but it hides the cause and often creates a second security incident.

text outbound-correction-matrix.txt
Proven cause
Wrong DNS resolution
  Correction: zone, forwarder, resolver or app DNS configuration
  Validation: hostname returns the expected address from the workload path

UDR too broad or wrong next hop
  Correction: more specific route or corrected next hop
  Validation: traffic reaches the expected firewall or destination

NSG or firewall blocks the port
  Correction: targeted source, destination, port and protocol rule
  Validation: deny disappears and the application request is visible

NAT or outbound IP drift
  Correction: align NAT Gateway, route or destination allowlist
  Validation: destination sees the approved IP

Identity or destination policy
  Correction: application right or precise allowlist
  Validation: same request succeeds without broader network opening

The right change is the one that explains the symptom and can be validated again.

Prepare rollback without losing evidence

Rollback must be decided by layer: DNS, route, NSG, NAT, firewall or application version. It should not erase useful logs or make before/after comparison impossible.

text outbound-rollback.txt
Recent change
Route table or UDR
  Rollback: restore the previous route or remove the faulty route
  Evidence: observed next hop and documented connectivity test

NSG or firewall
  Rollback: return to the previous rule or apply a targeted temporary exception
  Evidence: deny identified, source/destination/port scope limited

NAT Gateway or outbound IP
  Rollback: reattach the previous NAT or restore the known allowlist
  Evidence: outbound IP observed by the destination

DNS or forwarder
  Rollback: restore the previous resolution path
  Evidence: timestamped FQDN resolution from the workload

Application deployment
  Rollback: return to the previous version only when the network path is clean
  Evidence: same dependency reachable with clean logs
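Before/after comparison is only possible if the "before" state was saved. As a sketch, assuming routes were exported one per line to a text file before and after the change, the helper below compares the two files regardless of ordering. The function name and file names are hypothetical.

```shell
# Compare two route exports, ignoring line order.
# Exit status follows diff: 0 when identical, 1 when they differ.
diff_routes() {
  before_sorted=$(mktemp)
  after_sorted=$(mktemp)
  sort "$1" > "$before_sorted"
  sort "$2" > "$after_sorted"
  # Lines starting with "<" exist only before, lines starting with ">" only after.
  diff "$before_sorted" "$after_sorted"
  diff_status=$?
  rm -f "$before_sorted" "$after_sorted"
  return "$diff_status"
}

# Hypothetical export, run once before and once after the change:
#   az network route-table show --ids "$RT_ID" \
#     --query "routes[].[addressPrefix,nextHopType,nextHopIpAddress]" -o tsv > routes-before.tsv
#   diff_routes routes-before.tsv routes-after.tsv
```

Keeping both exports with the incident record means a rollback restores a known state instead of a remembered one.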

Conclusion

A VNet Integration outbound failure should be handled as a production chain: workload, DNS, integration subnet, UDR, NSG, firewall, NAT, destination, logs and rollback. Diagnosis must prove whether the application can no longer resolve, can no longer route, is filtered, exits with the wrong IP or is rejected by the dependency.

The decision then becomes defensible: fix DNS when the name is wrong, UDR when the next hop is wrong, NSG or firewall when the deny is proven, NAT when the outbound IP drifts, and the application only when the network path is clean.