
Private Azure Key Vault: diagnose DNS, managed identity, and network access without mixing everything

An operational method to analyze access failures to Azure Key Vault behind Private Endpoint by separating DNS resolution, network path, managed identity, RBAC, and application configuration.

31 May 2026 | azure, key-vault, private-endpoint, managed-identity, dns, troubleshooting

A Key Vault placed behind a Private Endpoint is often seen as a simple network lock: the application is in Azure, the vault is private, the managed identity reads secrets, so everything should work. In practice, failures in different layers look alike. An authorization error, bad DNS resolution, a badly routed subnet, or incomplete application configuration can all produce the same symptom from the application side: the secret cannot be retrieved.

The scenario here is deliberately common. An application hosted on App Service or Azure Functions must read a secret from Azure Key Vault. The vault is no longer publicly exposed, a Private Endpoint exists in a dedicated subnet, a private DNS zone privatelink.vaultcore.azure.net is linked to the VNet, and the application uses a managed identity. After the move to private access, reads fail intermittently or permanently.

The goal is not to list every Key Vault option. The goal is to build a readable diagnostic method: first prove name resolution, then the network path, then the identity used, then the permissions actually granted. This separation avoids fixing a permission when the problem is DNS, or reopening public access when only the application identity is wrong.

Write the expected model before testing

Before running commands, write down the expected path. The vault keeps its public name even when access goes through a private address. The application usually calls https://<vault-name>.vault.azure.net. DNS resolution must return the Private Endpoint's private IP, not the public service address. Only then does it make sense to evaluate the application identity against Key Vault.

text key-vault-private-path.txt
Application
App Service or Function App
System-assigned or user-assigned managed identity
VNet integration if the service must reach the private network

Expected DNS resolution
myvault.vault.azure.net
-> CNAME myvault.privatelink.vaultcore.azure.net
-> Private A record in the VNet

Key Vault
Public network access limited or disabled depending on the design
Approved Private Endpoint
RBAC or access policy aligned with the application identity

This model gives a framework for reading symptoms. If the name does not resolve to a private IP, identity does not matter yet. If the name resolves correctly but the token is issued for another identity, the network is no longer the main topic. If the token is correct but Key Vault refuses the operation, inspect RBAC or access policies.
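The same ordering can be written as a small decision helper. This is only a sketch: the function name next_layer and the "ok"/"fail" inputs are illustrative conventions, not part of any Azure tooling, and each input is assumed to come from a check you ran yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the outcome of each layer check ("ok" or "fail"),
# print the first layer that still needs attention.
# Argument order follows the method: DNS, network path, identity, authorization.
next_layer() {
  local dns="$1" network="$2" identity="$3" authz="$4"
  if [ "$dns" != "ok" ]; then
    echo "dns"
  elif [ "$network" != "ok" ]; then
    echo "network"
  elif [ "$identity" != "ok" ]; then
    echo "identity"
  elif [ "$authz" != "ok" ]; then
    echo "authorization"
  else
    # Every platform layer passed: what remains is the application's own settings.
    echo "application-config"
  fi
}
```

Calling next_layer ok ok fail ok prints identity, which is the point of the ordering: a later layer is only worth inspecting once the earlier ones are proven.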

Verify that the Private Endpoint exists and is approved

The first control is the Private Endpoint state. An endpoint created but not approved, deleted and recreated, or attached to the wrong vault gives an unstable base for diagnosis. Also distinguish the Private Endpoint itself from the DNS record that lets clients find it.

bash 01-check-private-endpoint.sh
az network private-endpoint show -g rg-network-prod -n pe-kv-prod --query '{name:name, subnet:subnet.id, state:provisioningState, connections:privateLinkServiceConnections[].privateLinkServiceConnectionState}'

az keyvault show -g rg-app-prod -n kv-prod-app --query '{name:name, publicNetworkAccess:properties.publicNetworkAccess, networkAcls:properties.networkAcls.defaultAction}'

An approved Private Endpoint does not guarantee that the application uses it. It only confirms that the private entry point exists. The next question is therefore: from the application network context, which name is resolved?
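To turn the approval state into a pass/fail signal, the connection statuses can be piped into a small check. The sketch below assumes the endpoint uses automatically approved connections (privateLinkServiceConnections); manually approved endpoints expose the same state under manualPrivateLinkServiceConnections. The function name all_connections_approved is illustrative.

```shell
#!/usr/bin/env bash
# Reads one connection status per line on stdin.
# Succeeds only if at least one status is present and every status is "Approved".
all_connections_approved() {
  local status count=0
  while IFS= read -r status || [ -n "$status" ]; do
    [ -n "$status" ] || continue            # ignore blank lines
    [ "$status" = "Approved" ] || return 1  # Pending, Rejected, Disconnected all fail
    count=$((count + 1))
  done
  [ "$count" -gt 0 ]
}

# Example use (resource names taken from the commands above):
# az network private-endpoint show -g rg-network-prod -n pe-kv-prod \
#   --query 'privateLinkServiceConnections[].privateLinkServiceConnectionState.status' -o tsv \
#   | all_connections_approved && echo "endpoint approved"
```

An empty result is treated as a failure on purpose: an endpoint with no connection listed is not a usable base for the rest of the diagnosis.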

Test DNS resolution from the right place

Testing DNS from a workstation, Azure Cloud Shell, or any random VM can create false confidence. The test must be run from a point that shares the same DNS path as the application, or at least from a resource in the same VNet and subject to the same private zone links.

For App Service and Functions, a temporary diagnostic endpoint or the Kudu console can help test resolution. For a jump VM in the same VNet, nslookup already provides useful proof.

bash 02-check-dns.sh
nslookup kv-prod-app.vault.azure.net

# Expected result in the private VNet
# kv-prod-app.vault.azure.net canonical name = kv-prod-app.privatelink.vaultcore.azure.net
# Address: 10.42.20.14

If the response points to a public address, the problem is probably DNS: private zone not linked to the VNet, wrong resolver, incomplete on-premises forwarder, unlinked spoke, or an application that does not use the expected DNS path after VNet integration. In that case, adding a Key Vault role to the identity will not change anything.
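When the check is scripted, the "private or public" reading can be automated. The sketch below assumes the VNet uses RFC 1918 address space, which is common but not guaranteed; comparing against the exact Private Endpoint IP is stricter when you know it. The function name is_private_ipv4 is illustrative.

```shell
#!/usr/bin/env bash
# Returns 0 for an RFC 1918 IPv4 address, 1 for any other valid IPv4 address,
# and 2 when the argument is not a dotted-quad IPv4 address.
is_private_ipv4() {
  local ip="$1" octet saved_ifs
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*) return 2 ;;   # empty, stray characters, or misplaced dots
  esac
  saved_ifs=$IFS
  IFS=.
  set -- $ip                               # split on dots into $1..$4
  IFS=$saved_ifs
  [ "$#" -eq 4 ] || return 2
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 2
  done
  if [ "$1" -eq 10 ]; then return 0; fi                                      # 10.0.0.0/8
  if [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ]; then return 0; fi  # 172.16.0.0/12
  if [ "$1" -eq 192 ] && [ "$2" -eq 168 ]; then return 0; fi                  # 192.168.0.0/16
  return 1
}

# Example use on a Linux host in the application VNet:
# ip=$(getent ahostsv4 kv-prod-app.vault.azure.net | awk 'NR==1 {print $1}')
# is_private_ipv4 "$ip" && echo "private path" || echo "not private: check DNS"
```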

For Key Vault, the expected private zone is usually privatelink.vaultcore.azure.net. It must contain the vault record and be linked to the VNets from which clients resolve the name. In a hub-and-spoke architecture, decide whether spokes resolve directly through zone links, through Private Resolver, or through a centralized DNS path. The important point is that the design is explicit.

bash 03-check-private-dns-zone.sh
az network private-dns record-set a list -g rg-network-prod -z privatelink.vaultcore.azure.net --query '[].{name:name, records:aRecords[].ipv4Address}'

az network private-dns link vnet list -g rg-network-prod -z privatelink.vaultcore.azure.net --query '[].{name:name, vnet:virtualNetwork.id, registration:registrationEnabled}'

A correct record in the zone is not enough if the consumer VNet is not linked, or if queries go through a resolver that does not know the zone. Conversely, multiplying zone links without a resolution model can make incidents harder to explain.
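To look up one vault instead of reading the whole zone, the record name can be derived from the FQDN the application calls. This sketch assumes the public Azure cloud suffixes (vault.azure.net and privatelink.vaultcore.azure.net); the two helper names are illustrative.

```shell
#!/usr/bin/env bash
# Record-set name expected inside the private zone: the label before the first dot.
zone_record_name() {
  local fqdn="$1"
  echo "${fqdn%%.*}"
}

# Name expected in the CNAME chain for a given public vault FQDN.
privatelink_fqdn() {
  echo "$(zone_record_name "$1").privatelink.vaultcore.azure.net"
}

# Example use (resource group and zone taken from the commands above):
# az network private-dns record-set a show -g rg-network-prod \
#   -z privatelink.vaultcore.azure.net \
#   -n "$(zone_record_name kv-prod-app.vault.azure.net)" \
#   --query 'aRecords[].ipv4Address' -o tsv
```

If the record exists but the address differs from what nslookup returned in the application VNet, the zone is probably not the one answering that client.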

Separate network access from Key Vault authorization

Once private resolution is confirmed, test HTTPS connectivity. A 403 response from Key Vault is very different from a timeout or DNS error. A 403 often proves that the service responds, but the identity or permission is wrong. Read the error message before concluding, because Key Vault also answers 403 when its firewall rejects a request that arrived over the public path. A timeout points more toward a network path, routing, firewall, or resolution issue leading to the wrong endpoint.

bash 04-read-http-signal.sh
curl -I --max-time 10 https://kv-prod-app.vault.azure.net/

# Useful signals
# 401 or 403: Key Vault responds, inspect identity and permissions
# Could not resolve host: inspect DNS
# Connection timed out: inspect network path, VNet integration, routes, NSG, firewall
# Certificate error: verify TLS interception, hostname, and proxy path

This test does not validate full application access. It simply provides direction. During an incident, that direction avoids opening everything just to “see if it works”.
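The signal table can also be read by a script. The sketch below maps curl's exit code and the HTTP status to the layer to inspect next; the function name classify_signal and the layer labels are illustrative, and the mapping is the one described above, not an exhaustive list of curl errors.

```shell
#!/usr/bin/env bash
# Maps a curl exit code and an HTTP status to the layer worth inspecting next.
classify_signal() {
  local curl_exit="$1" http_status="$2"
  case "$curl_exit" in
    0)     ;;                            # request completed, look at the status below
    6)     echo "dns"; return ;;         # could not resolve host
    7|28)  echo "network"; return ;;     # connection failed or timed out
    35|60) echo "tls"; return ;;         # TLS handshake or certificate verification failed
    *)     echo "unknown"; return ;;
  esac
  case "$http_status" in
    401|403) echo "identity-or-permission" ;;  # Key Vault answered
    *)       echo "unknown" ;;
  esac
}

# Example use:
# status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://kv-prod-app.vault.azure.net/)
# classify_signal "$?" "$status"
```

The exit code is checked first because curl reports an HTTP status of 000 when no response was received, so the status alone cannot separate a DNS failure from a timeout.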

Identify the identity actually used by the application

The managed identity can be system-assigned or user-assigned. An application can also have several identities attached. The code, SDK, or configuration must therefore use the expected identity. Otherwise, rights may be granted to the right identity on paper, while the application presents another principal to Key Vault.

bash 05-check-managed-identity.sh
az webapp identity show -g rg-app-prod -n app-prod-api --query '{principalId:principalId, tenantId:tenantId, userAssignedIdentities:userAssignedIdentities}'

az functionapp identity show -g rg-app-prod -n func-prod-worker --query '{principalId:principalId, tenantId:tenantId, userAssignedIdentities:userAssignedIdentities}'

For a user-assigned managed identity, the code or configuration may need to specify a client ID. This is a classic issue when an application works in development with a local identity or default identity, then fails in production because the authentication chain does not select the expected principal.
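One way to see which identity the platform hands out is to request a token from inside the application, for example from an SSH session on a Linux App Service. The sketch below assumes the IDENTITY_ENDPOINT and IDENTITY_HEADER environment variables that App Service and Functions inject, and the 2019-08-01 API version of that local endpoint; the helper names are illustrative. The response contains a bearer token, so do not paste it into a ticket.

```shell
#!/usr/bin/env bash
# Builds the token request URL for the Key Vault resource.
# $1: value of IDENTITY_ENDPOINT; $2 (optional): client ID of a user-assigned identity.
identity_token_url() {
  local endpoint="$1" client_id="${2:-}"
  local url="${endpoint}?resource=https://vault.azure.net&api-version=2019-08-01"
  if [ -n "$client_id" ]; then
    url="${url}&client_id=${client_id}"   # without it, the system-assigned identity is used
  fi
  echo "$url"
}

# Sends the request from inside the app. Only works where the platform sets
# IDENTITY_ENDPOINT and IDENTITY_HEADER; pass a client ID to test a user-assigned identity.
request_identity_token() {
  curl -s -H "X-IDENTITY-HEADER: ${IDENTITY_HEADER}" \
    "$(identity_token_url "${IDENTITY_ENDPOINT}" "${1:-}")"
}
```

The JSON response includes a client_id field naming the identity that was used. Comparing it with the identity that holds the Key Vault role shows quickly whether the application presents the expected principal.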

Verify the vault permission model

Key Vault can use Azure RBAC or the historical access policy model depending on configuration. Diagnosis must respect the active model. Mixing both readings produces false conclusions: you may see a correct access policy while the vault uses RBAC, or look for an RBAC role while access policies still drive permissions.

bash 06-check-key-vault-permissions.sh
az keyvault show -g rg-app-prod -n kv-prod-app --query '{enableRbacAuthorization:properties.enableRbacAuthorization}'

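# Replace the all-zero assignee with the identity's principalId and <sub> with the subscription ID.
# The scope is quoted so the shell does not treat <sub> as a redirection.
# Add --include-inherited to also list assignments made at resource group or subscription scope.
az role assignment list --assignee 00000000-0000-0000-0000-000000000000 --scope "/subscriptions/<sub>/resourceGroups/rg-app-prod/providers/Microsoft.KeyVault/vaults/kv-prod-app" --query '[].{role:roleDefinitionName, scope:scope}'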

az keyvault show -g rg-app-prod -n kv-prod-app --query 'properties.accessPolicies[].{objectId:objectId, permissions:permissions}'

The role or policy must match the real operation. Reading a secret, listing secrets, reading a key, or decrypting with a key are not the same actions. To reduce risk, grant the application only the permission that operation requires rather than broad permissions “to unblock”.
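For a vault that uses Azure RBAC, a narrowly scoped correction could look like the sketch below. It assumes the application only needs to read secret values, which is what the built-in Key Vault Secrets User role covers; the helper names are illustrative and the arguments are values you supply. It does not apply to vaults still driven by access policies.

```shell
#!/usr/bin/env bash
# Builds the resource ID of a vault, used as the narrowest useful scope.
vault_scope() {
  local subscription_id="$1" resource_group="$2" vault_name="$3"
  echo "/subscriptions/${subscription_id}/resourceGroups/${resource_group}/providers/Microsoft.KeyVault/vaults/${vault_name}"
}

# Grants secret read access to one principal on one vault.
# $1: principalId (object ID) of the managed identity; $2: scope from vault_scope.
grant_secret_read() {
  local principal_id="$1" scope="$2"
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Key Vault Secrets User" \
    --scope "$scope"
}

# Example use (subscription ID and principal ID are yours to provide):
# grant_secret_read "$PRINCIPAL_ID" "$(vault_scope "$SUBSCRIPTION_ID" rg-app-prod kv-prod-app)"
```

Scoping the assignment to the vault rather than the resource group keeps the correction aligned with the diagnosis: one identity, one vault, one operation.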

Build useful incident evidence

When the incident is resolved, keep proof of the diagnosis. A short ticket note is often enough, provided it separates the layers. This also helps avoid the same confusion during the next network change.

text key-vault-incident-evidence.txt
Private Key Vault incident
Application: app-prod-api
Vault: kv-prod-app
Tested FQDN: kv-prod-app.vault.azure.net
DNS from application VNet: 10.42.20.14 via privatelink.vaultcore.azure.net
Private Endpoint: approved
HTTP signal: 403 Key Vault, not timeout
Presented identity: system-assigned principalId app-prod-api
Cause: secret read role missing on the vault
Correction: Key Vault Secrets User assignment at vault scope
Post-correction check: application secret read OK

This format makes the conclusion defensible. It shows that the private network worked, that Key Vault responded, and that the correction was authorization-related rather than a network opening.

Common mistakes to avoid

The first mistake is re-enabling public access for a quick check. That test can mask the real problem and leave a forgotten exception. If a temporary workaround is unavoidable, it must be traced, limited, and removed.

The second mistake is granting overly broad rights to the identity. A broad role assigned at resource group level can unblock reading, but it expands scope unnecessarily. The correction should grant the needed permission at the narrowest scope that works.

The third mistake is testing from the wrong network. An administrator workstation resolving the vault correctly does not prove that the application resolves it. In a private architecture, where you test from is part of the result.

The fourth mistake is overlooking the outbound design of App Service and Functions. A Function App or App Service that must reach a private service needs a coherent outbound design: VNet integration, DNS, routes where required, and platform dependencies.

Conclusion

A private Key Vault should not be diagnosed as a single block. Separate the name, the path, the identity, and the permission. DNS resolution proves that the application targets the Private Endpoint. The HTTP signal distinguishes a reachable service from a broken path. The managed identity tells who speaks to Key Vault. RBAC or access policies finally decide whether that identity can read the expected secret.

This method is slower than a blind change, but it produces a usable result: a targeted correction, readable proof, and a design that stays private without becoming incomprehensible at the first incident.