Azure Storage: diagnose a private endpoint without opening the account

An operational runbook for diagnosing Azure Storage private access failures by separating DNS, Private Endpoint, firewall, identity, log and rollback evidence.

13 Jun 2026 | azure, storage, private-endpoint, dns, kql, logs, monitoring, runbook, identity, rollback, automation

An Azure Storage account behind a Private Endpoint can become unavailable without the consuming service, pipeline or application being the real cause. The failure may come from DNS resolving to the public address, a missing Private Endpoint for the Storage subresource in use, a firewall rule, a managed identity without the right role, Terraform drift that was only partially reconciled, or a client that still calls the public endpoint.

The use case is deliberately common: an internal application, automation job or Function must read a blob, write a file or access a queue through a private path. The runbook has one goal: prove where access fails before reopening the Storage account to all networks, adding a broad role or redeploying the application.

Read Storage as multiple endpoints

A Storage account does not have a single network path. Blob, queue, table, file and dfs can each have their own Private Endpoint and private DNS record. A reliable diagnosis starts by naming the subresource that is actually called.

text storage-private-path.txt
Consumer
Application, Function, CI pipeline, runbook or diagnostic VM
Resolves the Storage name from the same network as the workload
Sends a useful correlation ID or client request ID

Private DNS
privatelink.blob.core.windows.net for Blob
privatelink.queue.core.windows.net for Queue
privatelink.table.core.windows.net for Table
privatelink.file.core.windows.net for File
privatelink.dfs.core.windows.net for ADLS Gen2

Storage account
Private Endpoint approved for the right subresource
Public network access and firewall aligned with the target design
Diagnostic settings enabled for useful logs

Identity
Managed identity, service principal or explicit SAS
Role limited to the required container, queue or account scope

Rollback
Restore DNS, firewall, role or client configuration according to the changed layer

This view avoids a common mistake: validating blob while the application uses dfs, or testing from a workstation that does not see the same DNS as the workload.
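
As a minimal sketch of the mapping above, the helpers below derive the public hostname a client calls and the private DNS zone expected to answer for it. The function names `storage_fqdn` and `privatelink_zone` are illustrative and not part of any Azure tooling; the zone names assume the Azure public cloud, and `stprodorders` is the example account used in this runbook.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names); zone names assume the Azure public cloud.

# Public FQDN that clients call for a given account and subresource.
storage_fqdn() {
  local account="$1" service="$2"
  printf '%s.%s.core.windows.net\n' "$account" "$service"
}

# Private DNS zone expected to hold the A record for that subresource.
privatelink_zone() {
  local service="$1"
  case "$service" in
    blob|queue|table|file|dfs)
      printf 'privatelink.%s.core.windows.net\n' "$service" ;;
    *)
      echo "unknown subresource: $service" >&2
      return 1 ;;
  esac
}

# Print the name pair for every subresource listed in this runbook.
for svc in blob queue table file dfs; do
  printf '%-6s %s -> %s\n' "$svc" \
    "$(storage_fqdn stprodorders "$svc")" "$(privatelink_zone "$svc")"
done
```

Listing the pairs side by side makes it harder to validate blob while the workload actually calls dfs.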

Classify the symptom before fixing

The first triage step is to separate a private path failure from an authorization or application usage failure. Storage HTTP codes are useful, but they must be read with the test location and subresource.

text storage-private-symptoms.txt
Symptom
DNS returns a public address
  Check the subresource privatelink zone, VNet link and hybrid DNS forwarding

Timeout or connection refused
  Check Private Endpoint, NSG, effective route and test location

403 with AuthenticationFailed or AuthorizationPermissionMismatch
  Check real identity, RBAC role, scope and propagation delay

403 with AuthorizationFailure, typically firewall or network rules
  Check public network access, selected networks, trusted services and real source

No Storage logs for the client request ID
  Go back to DNS, routing, public endpoint or wrong subresource

Failure after Terraform change
  Compare Private Endpoint, private DNS zone group, firewall and applied role

The operating rule is simple: until the name resolves privately from the consumer network, an application fix is premature.
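
The rule can be turned into a quick check. The sketch below assumes the Private Endpoint subnet uses RFC 1918 address space, which is common but not guaranteed, and that `dig` is installed on the test host. The function names `is_private_ipv4` and `resolved_ipv4` are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the Private Endpoint uses an RFC 1918 address.

# Returns 0 when the argument is an IPv4 address in 10/8, 172.16/12 or 192.168/16.
is_private_ipv4() {
  local ip="$1" a b
  [[ "$ip" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || return 1
  IFS=. read -r a b _ <<<"$ip"
  case "$a" in
    10)  return 0 ;;
    172) [ "$b" -ge 16 ] && [ "$b" -le 31 ] ;;
    192) [ "$b" -eq 168 ] ;;
    *)   return 1 ;;
  esac
}

# Prints the last IPv4 address in the answer, skipping the CNAME lines
# that dig +short prints before the final A record.
resolved_ipv4() {
  dig +short "$1" A | grep -E '^[0-9]+(\.[0-9]+){3}$' | tail -n 1
}
```

Run from the consumer network, a call such as `is_private_ipv4 "$(resolved_ipv4 stprodorders.blob.core.windows.net)" && echo private || echo public` gives a one-word answer to record in the incident notes.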

Test from the consumer network

The test must start from a diagnostic VM, private runner, application subnet or operations bastion that uses the same DNS as the real workload. It must also target the exact subresource.

bash 01-storage-private-check.sh
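# Example values; adjust to the account, subresource and container under test.
ACCOUNT=stprodorders
SERVICE=blob
# Not named HOSTNAME: bash sets that variable to the local machine name.
STORAGE_HOST="$ACCOUNT.$SERVICE.core.windows.net"
CONTAINER=health
CLIENT_REQUEST_ID="ops-$(date +%Y%m%d%H%M%S)"

# 1. DNS: the answer should end on an address from the Private Endpoint subnet.
nslookup "$STORAGE_HOST"
dig +short "$STORAGE_HOST"

# 2. TLS: confirm the handshake completes and a certificate is presented.
openssl s_client -connect "$STORAGE_HOST:443" -servername "$STORAGE_HOST" </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer

# 3. Data plane with the signed-in CLI identity; the debug output is kept as evidence.
az storage blob list --account-name "$ACCOUNT" --container-name "$CONTAINER" --auth-mode login --only-show-errors --debug 2>&1 | tee "storage-$CLIENT_REQUEST_ID.log"

# 4. Same Blob listing over REST, so the request carries our own x-ms-client-request-id.
TOKEN=$(az account get-access-token --resource https://storage.azure.com/ --query accessToken -o tsv)
curl -sS -o /dev/null -w 'http_code=%{http_code} remote_ip=%{remote_ip}\n' -H "Authorization: Bearer $TOKEN" -H "x-ms-version: 2021-08-06" -H "x-ms-client-request-id: $CLIENT_REQUEST_ID" "https://$STORAGE_HOST/$CONTAINER?restype=container&comp=list"

echo "client_request_id=$CLIENT_REQUEST_ID"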

When the test uses --auth-mode login, it also validates the connected Azure CLI identity. For a workload, the next step is to confirm the real runtime identity: application managed identity, CI service principal, OIDC federation or explicit SAS.
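
To confirm which identity a token actually represents, one option is to decode its payload and read the `oid` claim, which can then be compared with `RequesterObjectId` in the Storage logs. The sketch below is a hedged example: `jwt_payload` and `token_oid` are hypothetical helper names, the decoding does not verify the signature, and it assumes GNU `base64` and `sed`.

```shell
#!/usr/bin/env bash
# Sketch only: decodes a JWT for inspection, it does NOT validate the signature.

# Print the JSON payload (second dot-separated segment) of a JWT.
jwt_payload() {
  local token="$1" payload pad
  # base64url -> base64: swap the URL-safe characters back.
  payload=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url strips.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="$payload="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}

# Print the object ID (oid claim) carried by the token.
token_oid() {
  jwt_payload "$1" | sed -n 's/.*"oid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

With the Azure CLI, `token_oid "$(az account get-access-token --resource https://storage.azure.com/ --query accessToken -o tsv)"` prints the object ID of the signed-in identity. For a workload, the same helper would need a token obtained inside the workload's own runtime context, since the CLI identity on a diagnostic VM may differ from the application's managed identity.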

Check the account without broadening access

The following commands provide an operational view without exposing secrets. They separate network state, Private Endpoints, private DNS and permissions.

bash 02-storage-platform-checks.sh
RG=rg-prod-data
ACCOUNT=stprodorders

az storage account show -g "$RG" -n "$ACCOUNT" --query "{kind:kind, sku:sku.name, publicNetworkAccess:publicNetworkAccess, allowBlobPublicAccess:allowBlobPublicAccess, defaultAction:networkRuleSet.defaultAction}" -o jsonc

az storage account network-rule list -g "$RG" -n "$ACCOUNT" -o jsonc

STORAGE_ID=$(az storage account show -g "$RG" -n "$ACCOUNT" --query id -o tsv)

az network private-endpoint-connection list --id "$STORAGE_ID" --query "[].{name:name,status:privateLinkServiceConnectionState.status,groupIds:groupIds,description:privateLinkServiceConnectionState.description}" -o table

az network private-dns zone list --query "[?contains(name, 'privatelink') && contains(name, 'core.windows.net')].name" -o table

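# --include-inherited also shows roles granted at resource group or subscription scope.
az role assignment list --scope "$STORAGE_ID" --include-inherited --query "[].{principal:principalName, role:roleDefinitionName, scope:scope}" -o table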

Common drift is visible here: Private Endpoint approved for blob but not dfs, firewall set to Deny without a valid private source, private zone not linked to the consumer VNet, or RBAC role assigned at the wrong scope.
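
The "private zone not linked to the consumer VNet" case can be checked directly. The sketch below is one possible approach: `linked_vnets` wraps the Azure CLI and needs a logged-in session, while `vnet_is_linked` is a pure helper that compares resource IDs in lower case because Azure resource IDs are case-insensitive. Both function names, and the resource group `rg-prod-dns` in the usage note, are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not part of the Azure CLI.

# Succeeds when the target VNet ID appears in the list read from stdin.
# IDs are lower-cased on both sides before the exact-line match.
vnet_is_linked() {
  local target
  target=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  tr '[:upper:]' '[:lower:]' | grep -Fxq -- "$target"
}

# Lists the VNet IDs linked to a private DNS zone (requires Azure CLI and login).
linked_vnets() {
  local dns_rg="$1" zone="$2"
  az network private-dns link vnet list -g "$dns_rg" -z "$zone" \
    --query "[].virtualNetwork.id" -o tsv
}
```

A typical call would be `linked_vnets rg-prod-dns privatelink.blob.core.windows.net | vnet_is_linked "$VNET_ID" && echo linked`, where `VNET_ID` holds the resource ID of the consumer VNet. If the zone lives in a hub subscription, the CLI context has to point at that subscription first.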

Correlate Storage access in KQL

Storage logs should separate network denial, authorization denial and clients using the wrong endpoint. The query below starts from one account, one subresource and a short window.

kusto 03-storage-private-access-correlation.kql
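// Short window, one account. StorageBlobLogs covers the blob subresource;
// use StorageQueueLogs, StorageTableLogs or StorageFileLogs for the others.
let Window = 2h;
let Account = "stprodorders";
// Optional: paste the client request ID from the test; leave empty to see all requests.
let RequestId = "";
StorageBlobLogs
| where TimeGenerated > ago(Window)
| where AccountName == Account
| where isempty(RequestId) or ClientRequestId == RequestId
| project TimeGenerated, AccountName, OperationName, StatusCode, StatusText, AuthenticationType, RequesterObjectId, Uri, CallerIpAddress, UserAgentHeader, ClientRequestId
| order by TimeGenerated desc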

Quick read: no logs for a correlated test points to DNS, routing or the wrong subresource; 403 with a visible identity points to RBAC or SAS; 403 without useful identity often points to firewall, public endpoint or invalid signature.
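
The quick read can be written down as a small decision function so that every responder applies it the same way. This is only a sketch of the heuristic stated above, not a definitive classifier; the function name `triage_layer` and its output labels are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch of the quick-read heuristic; labels are illustrative.
# Arguments: number of log rows found, HTTP status code, RequesterObjectId (may be empty).
triage_layer() {
  local rows="$1" status="$2" requester="$3"
  if [ "$rows" -eq 0 ]; then
    # The request never reached this account's logs.
    echo "dns-routing-or-wrong-subresource"
  elif [ "$status" = "403" ] && [ -n "$requester" ]; then
    # Storage identified the caller and still denied it.
    echo "rbac-or-sas"
  elif [ "$status" = "403" ]; then
    # Denied before a usable identity was established.
    echo "firewall-public-endpoint-or-signature"
  else
    echo "not-an-access-denial"
  fi
}

triage_layer 0 "" ""
triage_layer 3 403 "11111111-2222-3333-4444-555555555555"
triage_layer 3 403 ""
```

The output is a starting point for investigation. A 403 can have causes outside these three buckets, so the label should be confirmed against the platform checks before any change is made.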

Choose the smallest rollback

Rollback should not reopen the whole Storage account by reflex. It should restore the layer that changed and produce observable evidence.

text storage-private-rollback-matrix.txt
Recent change
Private DNS zone or zone group
  Rollback: restore the previous record, VNet link or zone group
  Evidence: private resolution from the workload and correlated Storage log

Firewall or public network access
  Rollback: restore the previous network rule
  Evidence: same client request ID visible with the expected status

RBAC role or managed identity
  Rollback: restore the role at the previous scope
  Evidence: the same identity succeeds at the narrowest scope that works, without broadening to the whole account

Client change or environment variable
  Rollback: restore previous endpoint, subresource or credential
  Evidence: the client calls the expected private hostname
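
Restoring "the previous" rule or setting presupposes that the previous state was recorded. A minimal sketch, assuming the Azure CLI is logged in and that local JSON files are an acceptable place for the evidence: `snapshot_network_state` and `state_changed` are hypothetical helper names, and the file names in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: function names and file paths are illustrative.

# Save the network-facing settings of the account to a JSON file
# (requires Azure CLI and login).
snapshot_network_state() {
  local rg="$1" account="$2" out="$3"
  az storage account show -g "$rg" -n "$account" \
    --query "{publicNetworkAccess:publicNetworkAccess, networkRuleSet:networkRuleSet}" \
    -o json >"$out"
}

# Print a unified diff of two snapshots.
# Returns 0 when they differ (or cannot be compared), 1 when they are identical.
state_changed() {
  local before="$1" after="$2"
  if diff -u "$before" "$after"; then
    return 1
  fi
  return 0
}
```

For example, `snapshot_network_state rg-prod-data stprodorders before.json` before the change and the same call with `after.json` afterwards, then `state_changed before.json after.json` shows exactly which network setting moved and therefore which layer a rollback should touch.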

Conclusion

A private Storage incident is rarely fixed by one switch. Name the subresource, test from the right network, prove private resolution, inspect the Private Endpoint connections and correlate logs before touching code or opening the firewall.

The decision stays operational: fix DNS when the path is public, fix Private Endpoint when the subresource is missing, fix RBAC when the identity is visible but denied, and limit rollback to the layer that actually changed.