Azure Storage: diagnose a private endpoint without opening the account
An operational runbook for diagnosing Azure Storage private access failures by separating DNS, Private Endpoint, firewall, identity, log and rollback evidence.
An Azure Storage account behind a Private Endpoint can become unreachable without the consuming service, pipeline or application being the real cause. The failure may come from DNS resolving to the public endpoint, a missing Private Endpoint for the Storage subresource, a firewall rule, a managed identity without the right role, unreconciled Terraform drift, or a client that still calls the public endpoint.
The use case is deliberately common: an internal application, automation job or Function must read a blob, write a file or access a queue through a private path. The runbook has one goal: prove where access fails before reopening the Storage account to all networks, adding a broad role or redeploying the application.
Read Storage as multiple endpoints
A Storage account does not have a single network path. Blob, queue, table, file and dfs can each have their own Private Endpoint and private DNS record. A reliable diagnosis starts by naming the subresource that is actually called.
Consumer
Application, Function, CI pipeline, runbook or diagnostic VM
Resolves the Storage name from the same network as the workload
Sends a useful correlation ID or client request ID
Private DNS
privatelink.blob.core.windows.net for Blob
privatelink.queue.core.windows.net for Queue
privatelink.table.core.windows.net for Table
privatelink.file.core.windows.net for File
privatelink.dfs.core.windows.net for ADLS Gen2
Storage account
Private Endpoint approved for the right subresource
Public network access and firewall aligned with the target design
Diagnostic settings enabled for useful logs
Identity
Managed identity, service principal or explicit SAS
Role limited to the required container, queue or account scope
Rollback
Restore DNS, firewall, role or client configuration according to the changed layer
This view avoids a common mistake: validating blob while the application uses dfs, or testing from a workstation that does not see the same DNS as the workload.
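Naming the subresource can be made mechanical. The sketch below derives the hostname a client calls and the private DNS zone that should hold its record, covering only the five subresources listed above. The function names storage_fqdn and privatelink_zone are hypothetical helpers, not part of any Azure tooling.

```shell
# Sketch only: helper names are illustrative, not an Azure CLI feature.

# Print the hostname a client calls for an account and a subresource.
storage_fqdn() {
  printf '%s.%s.core.windows.net\n' "$1" "$2"
}

# Print the private DNS zone expected for a subresource covered by this runbook.
privatelink_zone() {
  case "$1" in
    blob|queue|table|file|dfs)
      printf 'privatelink.%s.core.windows.net\n' "$1" ;;
    *)
      printf 'subresource not covered by this runbook: %s\n' "$1" >&2
      return 1 ;;
  esac
}
```

For example, storage_fqdn stprodorders dfs prints the dfs hostname to resolve, and privatelink_zone dfs prints the zone that must be linked to the consumer VNet.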
Classify the symptom before fixing
The first triage step is to separate a private path failure from an authorization or application usage failure. Storage HTTP codes are useful, but they must be read together with the test location and the subresource that was called.
Symptom and first check
Symptom: DNS returns a public address
Check: subresource privatelink zone, VNet link and hybrid DNS forwarding
Symptom: Timeout or connection refused
Check: Private Endpoint, NSG, effective route and test location
Symptom: 403 with AuthenticationFailed or AuthorizationPermissionMismatch
Check: real identity, RBAC role, scope and propagation delay
Symptom: 403 with firewall or network rules
Check: public network access, selected networks, trusted services and real source
Symptom: No Storage logs for the client request ID
Check: DNS, routing, public endpoint use or wrong subresource
Symptom: Failure after Terraform change
Check: compare Private Endpoint, private DNS zone group, firewall and applied role
The operating rule is simple: until the name resolves privately from the consumer network, an application fix is premature.
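The triage order can be written down as a small decision function. This is a minimal sketch, assuming the observation has already been reduced to three values: whether the name resolved to a private address, the outcome of the call, and the Storage error code if one was returned. The function name triage_layer and its output labels are hypothetical, and treating every other 403 as a network rule problem is a simplifying assumption.

```shell
# Sketch only: maps one observation to the layer to inspect first.
# $1 = "private" or "public" (how the hostname resolved from the consumer network)
# $2 = "timeout", "refused", or an HTTP status such as 403 or 200
# $3 = Storage error code from the response body, if any
triage_layer() {
  # DNS comes first: nothing else is meaningful over the public path.
  if [ "$1" != "private" ]; then
    echo "dns"
    return 0
  fi
  case "$2" in
    timeout|refused) echo "private-endpoint-or-routing" ;;
    403)
      case "${3:-}" in
        AuthenticationFailed|AuthorizationPermissionMismatch) echo "identity-rbac" ;;
        # Assumption: any other 403 is treated as a firewall or network rule denial.
        *) echo "firewall-network-rules" ;;
      esac ;;
    2??) echo "none" ;;
    *) echo "application" ;;
  esac
}
```

The DNS check deliberately short-circuits everything else, which mirrors the operating rule: a 403 seen over a public resolution says little about the private path.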
Test from the consumer network
The test must start from a diagnostic VM, private runner, application subnet or operations bastion that uses the same DNS as the real workload. It must also target the exact subresource.
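# Run from the consumer network so that DNS matches the real workload.
ACCOUNT=stprodorders
SERVICE=blob
# Named STORAGE_HOST because bash already sets HOSTNAME to the local machine name.
STORAGE_HOST="$ACCOUNT.$SERVICE.core.windows.net"
CONTAINER=health
# Label for this test run; it names the log file written below.
CLIENT_REQUEST_ID="ops-$(date +%Y%m%d%H%M%S)"
# 1. DNS: the answer should be the Private Endpoint address, not a public one.
nslookup "$STORAGE_HOST"
dig +short "$STORAGE_HOST"
# 2. TLS: confirm that port 443 is reachable and a certificate is presented.
openssl s_client -connect "$STORAGE_HOST:443" -servername "$STORAGE_HOST" </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# 3. Data plane: list blobs with the signed-in identity and keep the debug trace.
az storage blob list --account-name "$ACCOUNT" --container-name "$CONTAINER" --auth-mode login --only-show-errors --debug 2>&1 | tee "storage-$CLIENT_REQUEST_ID.log"
echo "client_request_id=$CLIENT_REQUEST_ID"
When the test uses --auth-mode login, it validates the identity signed in to the Azure CLI, not the identity of the workload. For a workload, the next step is to confirm the real runtime identity: application managed identity, CI service principal, OIDC federation or explicit SAS. The ops-... value above only labels the local log file. The Azure CLI generates its own x-ms-client-request-id, which should appear in the --debug trace, and that is the value to search for in Storage logs.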
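Reading the DNS answer by eye is error-prone during an incident. The sketch below classifies an IPv4 address as RFC 1918 private or not, so the last line of a dig +short answer can be checked in a script. The function name is_private_ipv4 is a hypothetical helper. It assumes the Private Endpoint sits in an RFC 1918 address space, which is typical but not guaranteed, and it does not validate that each field is below 256.

```shell
# Sketch only: returns 0 for RFC 1918 addresses, 1 for other IPv4 addresses,
# and 2 for input that is not made of digits and dots (for example a CNAME).
is_private_ipv4() {
  case "$1" in
    ""|*[!0-9.]*) return 2 ;;
  esac
  case "$1" in
    10.*.*.*) return 0 ;;                                      # 10.0.0.0/8
    172.1[6-9].*.*|172.2[0-9].*.*|172.3[0-1].*.*) return 0 ;;  # 172.16.0.0/12
    192.168.*.*) return 0 ;;                                   # 192.168.0.0/16
    *) return 1 ;;
  esac
}
```

A possible use from the consumer network: take the final line of dig +short stprodorders.blob.core.windows.net, pass it to is_private_ipv4, and treat any non-zero result as a reason to stay on the DNS layer.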
Check the account without broadening access
The following commands provide an operational view without exposing secrets. They separate network state, Private Endpoints, private DNS and permissions.
RG=rg-prod-data
ACCOUNT=stprodorders
# Network posture of the account: public access switch and default firewall action.
az storage account show -g "$RG" -n "$ACCOUNT" --query "{kind:kind, sku:sku.name, publicNetworkAccess:publicNetworkAccess, allowBlobPublicAccess:allowBlobPublicAccess, defaultAction:networkRuleSet.defaultAction}" -o jsonc
# Firewall exceptions: allowed IP ranges and virtual network rules.
az storage account network-rule list -g "$RG" -n "$ACCOUNT" -o jsonc
STORAGE_ID=$(az storage account show -g "$RG" -n "$ACCOUNT" --query id -o tsv)
# Private Endpoint connections on the account. If columns come back empty, inspect the raw output with -o jsonc, since field nesting can differ.
az network private-endpoint-connection list --id "$STORAGE_ID" --query "[].{name:name,status:privateLinkServiceConnectionState.status,groupIds:groupIds,description:privateLinkServiceConnectionState.description}" -o table
# Private DNS zones for Storage in the current subscription.
az network private-dns zone list --query "[?contains(name, 'privatelink') && contains(name, 'core.windows.net')].name" -o table
# Role assignments at the account scope, plus those inherited from parent scopes.
az role assignment list --scope "$STORAGE_ID" --include-inherited --query "[].{principal:principalName, role:roleDefinitionName, scope:scope}" -o table
Common drift is visible here: Private Endpoint approved for blob but not dfs, firewall set to Deny without a valid private source, private zone not linked to the consumer VNet, or RBAC role assigned at the wrong scope.
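The "approved for blob but not dfs" case can be checked mechanically once the approved group IDs are exported as one value per line. The sketch below prints every required subresource that is absent from that list. The function name missing_subresources is a hypothetical helper, and how the list of approved group IDs is produced is left to the operator.

```shell
# Sketch only: approved group IDs arrive on stdin, one per line;
# required subresources are passed as arguments.
missing_subresources() {
  approved=$(cat)
  for required in "$@"; do
    # grep -qxF: quiet, whole-line, fixed-string match.
    if ! printf '%s\n' "$approved" | grep -qxF "$required"; then
      printf '%s\n' "$required"
    fi
  done
}
```

One possible source for the input, stated as an assumption, is az network private-endpoint list -g "$RG" --query "[].privateLinkServiceConnections[].groupIds[]" -o tsv. That query does not filter by target account or approval status, so the result still needs a manual check against the connection list.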
Correlate Storage access in KQL
Storage logs should separate network denial, authorization denial and clients using the wrong endpoint. The query below starts from one account, one subresource and a short window.
// StorageBlobLogs covers Blob requests; use StorageQueueLogs, StorageTableLogs or StorageFileLogs for the other subresources.
let Window = 2h;
let Account = "stprodorders";
StorageBlobLogs
| where TimeGenerated > ago(Window)
| where AccountName == Account
// Optional: narrow to a single test run.
// | where ClientRequestId == "<client request ID from the test>"
| project TimeGenerated, AccountName, OperationName, StatusCode, StatusText, AuthenticationType, RequesterObjectId, Uri, CallerIpAddress, UserAgentHeader, ClientRequestId
| order by TimeGenerated desc
Quick read: no logs for a correlated test points to DNS, routing or the wrong subresource; 403 with a visible identity points to RBAC or SAS; 403 without useful identity often points to firewall, public endpoint or invalid signature.
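To keep the correlation repeatable from a shell, the query text can be generated for one account and one client request ID. This is a minimal sketch: build_storage_kql is a hypothetical helper, it interpolates its arguments directly into the query, and it therefore assumes trusted input.

```shell
# Sketch only: prints a StorageBlobLogs query filtered on one client request ID.
# $1 = storage account name, $2 = client request ID, $3 = time window (default 2h)
build_storage_kql() {
  cat <<EOF
StorageBlobLogs
| where TimeGenerated > ago(${3:-2h})
| where AccountName == "$1"
| where ClientRequestId == "$2"
| project TimeGenerated, OperationName, StatusCode, StatusText, AuthenticationType, RequesterObjectId, CallerIpAddress, Uri
| order by TimeGenerated desc
EOF
}
```

The output can be pasted into Log Analytics, or passed to az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "$(build_storage_kql stprodorders "$REQUEST_ID" 30m)" if the log-analytics CLI extension is installed. WORKSPACE_ID and REQUEST_ID are placeholders for the workspace GUID and the request ID taken from the test trace.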
Choose the smallest rollback
Rollback should not reopen the whole Storage account by reflex. It should restore the layer that changed and produce observable evidence.
Recent change
Private DNS zone or zone group
Rollback: restore the previous record, VNet link or zone group
Evidence: private resolution from the workload and correlated Storage log
Firewall or public network access
Rollback: restore the previous network rule
Evidence: same client request ID visible with the expected status
RBAC role or managed identity
Rollback: restore the role at the previous scope
Evidence: the same identity succeeds without the role being broadened to the whole account when a narrower scope is enough
Client change or environment variable
Rollback: restore previous endpoint, subresource or credential
Evidence: the client calls the expected private hostname
Conclusion
A private Storage incident is rarely fixed by one switch. Name the subresource, test from the right network, prove private resolution, read Private Endpoints and correlate logs before touching code or opening the firewall.
The decision stays operational: fix DNS when the path is public, fix Private Endpoint when the subresource is missing, fix RBAC when the identity is visible but denied, and limit rollback to the layer that actually changed.