Cloud
Azure SQL: diagnose a private endpoint before changing the database
An operational runbook for Azure SQL private access failures by separating DNS, Private Endpoint, firewall, identity, SQL logs and rollback evidence.
An Azure SQL database often exposes an ambiguous symptom: timeout from the application, connection error, authentication denial, no database-side logs, or a 40615 firewall response. When the server sits behind a Private Endpoint, those signals should not be treated as one SQL failure. The issue may come from private DNS, the caller subnet, an unapproved endpoint, a still-public network rule, a managed identity, a Microsoft Entra group or a changed connection string.
The use case is an internal application, automation job or private API that must reach Azure SQL without public exposure. The runbook has one goal: prove where the connection stops before changing schema, redeploying the application, opening the SQL server or broadening an identity permission.
Read Azure SQL as a complete path
Diagnosis must follow the real chain between the consumer and the database. Validating the SQL server in the portal is not enough when the workload does not resolve the same name, leave through the same network or use the same identity.
Consumer
App Service, Function, API, CI runner or diagnostic VM
Resolves the SQL FQDN from the same network as the workload
Uses the real connection string and identity
Private DNS
server.database.windows.net must follow the privatelink.database.windows.net chain
The final answer must be a private address from the expected VNet
Network path
Private Endpoint approved for the SQL server
Private zone linked to the consumer VNet or valid hybrid forwarding
Public access aligned with the security target
Identity and authorization
SQL login, Microsoft Entra ID, managed identity or service principal
Minimum rights on the database, not only on the server
Evidence
Test from the consumer network
SQL logs, connection metrics, application traces and correlation ID This view avoids two dangerous shortcuts: opening the firewall because the application sees a timeout, or changing SQL permissions while the name still resolves to a public endpoint.
Classify the symptom before fixing
The first triage step is to separate private path failures, network denials and identity denials. The error message alone is not enough; it must be read with the test location.
Observed symptom
DNS returns a public address
Check privatelink.database.windows.net, VNet link and DNS forwarders
Timeout before login
Check Private Endpoint, routing, NSG, local firewall and test location
40615 or firewall error
Check public network access, firewall rules and the source actually seen by SQL
Login failed or principal not found
Check real identity, authentication method, contained database user and Entra ID groups
No SQL-side trace
Go back to DNS, routing, public endpoint or connection string
Failure after Terraform or pipeline change
Compare Private Endpoint, private DNS zone group, SQL firewall and deployed identity The operating rule is stable: until the SQL FQDN resolves privately from the consumer network, an application or SQL fix is premature.
Test from the right network
The test must run from a diagnostic VM, private runner, application subnet or operations bastion that shares the workload DNS. The goal is not to prove that SQL responds from your laptop, but that the production path is coherent.
SERVER=sql-prod-orders
DATABASE=orders
HOSTNAME="$SERVER.database.windows.net"
nslookup "$HOSTNAME"
dig +short "$HOSTNAME"
openssl s_client -connect "$HOSTNAME:1433" -servername "$HOSTNAME" </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
az sql db show-connection-string --client ado.net --server "$SERVER" --name "$DATABASE" --auth-type ADIntegrated
sqlcmd -S "$HOSTNAME" -d "$DATABASE" -G -l 10 -Q "select @@servername as server_name, db_name() as database_name, sysdatetimeoffset() as checked_at;" If sqlcmd is not available on the test point, keep at least nslookup, dig, openssl and a correlated application attempt. The important part is to preserve the timestamp, hostname, expected identity and exact error.
Check the platform without opening the server
The following commands give a quick view of network and SQL configuration without changing the exposure perimeter.
RG=rg-prod-data
SERVER=sql-prod-orders
az sql server show -g "$RG" -n "$SERVER" --query "{fqdn:fullyQualifiedDomainName, publicNetworkAccess:publicNetworkAccess, minimalTlsVersion:minimalTlsVersion}" -o jsonc
SQL_ID=$(az sql server show -g "$RG" -n "$SERVER" --query id -o tsv)
az network private-endpoint-connection list --id "$SQL_ID" --query "[].{name:name,status:privateLinkServiceConnectionState.status,description:privateLinkServiceConnectionState.description}" -o table
az network private-dns zone show -g "$RG" -n privatelink.database.windows.net --query "{name:name, records:numberOfRecordSets}" -o jsonc
az sql server firewall-rule list -g "$RG" -s "$SERVER" -o table
az sql server ad-admin list -g "$RG" -s "$SERVER" -o table Typical drift is visible here: pending Private Endpoint, private zone absent from the consumer VNet, public access still enabled by a legacy exception, missing Entra ID admin or a firewall rule created for an old runner.
Correlate errors in Log Analytics
If SQL diagnostics are sent to Log Analytics, a short query helps separate network errors, authentication errors and missing activity. Exact table names can vary by diagnostic settings, but the logic stays the same: start from the window, server, database and message.
let Window = 2h;
let Server = "sql-prod-orders";
let Database = "orders";
AzureDiagnostics
| where TimeGenerated > ago(Window)
| where ResourceProvider == "MICROSOFT.SQL"
| where Resource has Server or database_name_s == Database
| where action_name_s has_any ("DATABASE AUTHENTICATION", "LOGOUT", "BATCH COMPLETED")
or error_number_d in (40615, 18456, 18452, 18470)
or statement_s has_any ("Login failed", "firewall", "denied")
| project TimeGenerated, Resource, database_name_s, action_name_s, error_number_d, succeeded_s, session_server_principal_name_s, client_ip_s, statement_s
| order by TimeGenerated desc Quick read: no line for the test points back to DNS, routing or wrong endpoint; 40615 points to firewall or public access; 18456 and similar errors point to identity, contained user, Entra ID group or stale secret.
Choose the smallest rollback
Rollback should restore the layer that actually changed, not bypass the whole private chain.
Recent change
Private DNS zone, VNet link or forwarding
Rollback: restore the previous link or forwarding path
Evidence: the FQDN returns the private address from the workload
Private Endpoint or public access
Rollback: restore previous Private Endpoint state and publicNetworkAccess
Evidence: correlated sqlcmd test and visible SQL logs
Managed identity, Entra ID group or secret
Rollback: restore the previous identity or permission scope
Evidence: same identity connects without broader rights than needed
Connection string or environment variable
Rollback: restore previous server, database, auth mode and TLS options
Evidence: the connection uses the expected private FQDN Conclusion
An Azure SQL private incident should be handled as an operational path, not as an isolated database failure. The reliable order is stable: DNS resolution from the consumer network, Private Endpoint, public exposure, authentication, logs and limited rollback.
This method avoids overly broad reactions. Fix DNS when the name goes public, Private Endpoint when the private path is missing, firewall when SQL sees an unexpected source, identity when the connection reaches SQL but fails, and application configuration only when network and platform evidence are clean.