Infrastructure

Troubleshooting an Azure Linux VM that cannot join Active Directory

A practical diagnostic procedure for an Azure Linux VM that fails to join Active Directory, with DNS, routing, ports, Kerberos, realmd, and SSSD checks.

27 Apr 2026 | azure, linux, active-directory, troubleshooting, dns

An Azure Linux VM that cannot join Active Directory should not be diagnosed by retrying realm join with random options. In an Azure spoke, the failure can come from VNet DNS, a route to the hub, a firewall, an NSG, the wrong resolver, clock drift, the join account, or incomplete SSSD configuration. The most reliable method is to test each link of the dependency chain in order, from the resolver the VM actually uses to the SSSD configuration.

The scenario is straightforward. A Linux VM is deployed in Azure. Active Directory is located in a hybrid network or in a hub. The VM must join the domain so that AD groups can be used for system administration. The join fails with a Kerberos error, LDAP error, domain not found message, or timeout. The goal is to identify where the chain breaks before changing local configuration.

1. Check what the VM actually uses

The first step is to collect the resolver, routes, and hostname seen by the operating system. In Azure, VNet settings, cloud-init, NetworkManager, or systemd-resolved can produce a result that differs from what was expected.

bash 01-local-context.sh
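ip route show
cat /etc/resolv.conf
# With systemd-resolved, resolv.conf may only list the 127.0.0.53 stub;
# resolvectl shows the upstream servers actually in use.
resolvectl status 2>/dev/null || true
hostname -f || true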

If the VM uses 168.63.129.16, it relies on Azure-provided DNS. That is not automatically wrong, but Azure-provided DNS does not know the AD zone by itself, so resolution of the domain must be supplied another way, for example by an Azure DNS Private Resolver forwarding rule. If it uses private IP addresses, those IPs must match the intended internal DNS servers or forwarders.
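The distinction between Azure-provided DNS and private resolvers can be read directly from the resolver file. The following is a minimal sketch, not a definitive tool: the function names `classify_resolver` and `list_resolvers` and the category labels are illustrative, and the "private" category simply means an RFC 1918 address, which still has to be compared by hand with the intended DNS servers.

```shell
# Classify one nameserver address. Labels are illustrative, not an Azure convention.
classify_resolver() {
  case "$1" in
    168.63.129.16)  echo "azure-provided" ;;   # Azure virtual DNS endpoint
    127.*)          echo "local-stub" ;;       # e.g. systemd-resolved at 127.0.0.53
    10.*|192.168.*) echo "private" ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo "private" ;;
    *)              echo "other" ;;
  esac
}

# Print "<ip> <category>" for each nameserver line of a resolv.conf-style file.
list_resolvers() {
  local file="${1:-/etc/resolv.conf}" keyword ip rest
  while read -r keyword ip rest || [ -n "$keyword" ]; do
    [ "$keyword" = "nameserver" ] || continue
    [ -n "$ip" ] || continue
    echo "$ip $(classify_resolver "$ip")"
  done < "$file"
}
```

Calling `list_resolvers` without an argument reads /etc/resolv.conf. A "local-stub" result means the real upstream servers have to be read from the local resolver service instead.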

2. Validate Active Directory DNS

A domain join depends on SRV records. Resolving only the domain name or one domain controller is not enough.

bash 02-dns-test.sh
DNS="10.10.0.10"
DOMAIN="corp.example.local"

nslookup -type=SRV _ldap._tcp.$DOMAIN $DNS
nslookup -type=SRV _kerberos._tcp.$DOMAIN $DNS

If the SRV queries return no records, there is no point retrying the join: the problem is DNS first. The correction may involve the VNet resolver, a conditional forwarder, an Azure DNS Private Resolver rule, or domain controller DNS registration.
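The SRV check can be turned into a pass or fail gate before any join attempt. This sketch assumes `dig` is installed (the commands above use nslookup, which works equally well interactively). The function names are hypothetical, and the two extra names, `_kerberos._udp` and `_ldap._tcp.dc._msdcs`, are standard AD records added here beyond the two queried above.

```shell
# Print the SRV names worth querying for an AD domain, one per line.
ad_srv_names() {
  local domain="$1"
  printf '%s\n' \
    "_ldap._tcp.$domain" \
    "_kerberos._tcp.$domain" \
    "_kerberos._udp.$domain" \
    "_ldap._tcp.dc._msdcs.$domain"
}

# Query each name against one DNS server.
# Returns non-zero if at least one name yields no SRV record.
check_ad_srv() {
  local domain="$1" server="$2" missing=0 name answer
  for name in $(ad_srv_names "$domain"); do
    # Drop ";;" diagnostic lines so a timeout is not mistaken for an answer.
    answer=$(dig +short +time=2 +tries=1 SRV "$name" "@$server" 2>/dev/null | grep -v '^;')
    if [ -n "$answer" ]; then
      echo "ok      $name"
    else
      echo "MISSING $name"
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as `check_ad_srv corp.example.local 10.10.0.10` should list every name as ok before the join is attempted.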

3. Test the required network paths

When DNS works, connectivity to a domain controller must be checked. The first ports to validate are DNS (53), Kerberos (88), LDAP (389), SMB (445), and Kerberos password change (464).

bash 03-ad-ports.sh
DC="dc01.corp.example.local"

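# -w 3 caps each probe at 3 seconds; -z only tests the TCP side of each service
nc -vz -w 3 "$DC" 53
nc -vz -w 3 "$DC" 88
nc -vz -w 3 "$DC" 389
nc -vz -w 3 "$DC" 445
nc -vz -w 3 "$DC" 464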

A timeout often points to a missing route, UDR, NSG, hub firewall, or asymmetric return path. An immediate refusal points more toward a closed service or explicit filtering.
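The timeout versus refusal distinction can be extracted from the probe output automatically. The sketch below is an assumption-laden helper: `classify_probe` and `probe_port` are hypothetical names, and the text matching is deliberately loose because the wording differs between nc implementations (OpenBSD nc, traditional nc, Ncat).

```shell
# Map the text printed by "nc -vz" to a short verdict.
#   refused -> closed service or a filter that actively rejects
#   timeout -> route, UDR, NSG, hub firewall, or asymmetric return path
classify_probe() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *refused*)                         echo "refused" ;;
    *"timed out"*|*timeout*)           echo "timeout" ;;
    *succeeded*|*connected*|*" open"*) echo "open" ;;
    *)                                 echo "unknown" ;;
  esac
}

# Probe one TCP port with a 3-second limit and print "<port> <verdict>".
probe_port() {
  local host="$1" port="$2" out
  out=$(nc -vz -w 3 "$host" "$port" 2>&1)
  echo "$port $(classify_probe "$out")"
}
```

For example, `for p in 53 88 389 445 464; do probe_port dc01.corp.example.local "$p"; done` gives one verdict per port. An "unknown" result usually means the raw nc output should be read directly, for instance when the name does not resolve.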

4. Check time before Kerberos

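Kerberos does not tolerate significant clock drift: the default maximum skew is 5 minutes, both in Active Directory policy and in MIT Kerberos clients. A VM with an incorrect time can return authentication errors even when the account is valid.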

bash 04-time-check.sh
timedatectl status
chronyc tracking 2>/dev/null || true

If time is wrong, fix synchronization before continuing. Changing SSSD in that state only adds noise.
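Whether the time is "wrong" for Kerberos purposes comes down to a simple comparison. This sketch assumes the usual 5-minute default as the limit and a reference timestamp obtained from a trusted source such as a domain controller. The function name `skew_within_limit` is hypothetical, and how the reference epoch is obtained is left out on purpose.

```shell
# Succeed when two epoch timestamps differ by no more than max_skew seconds.
# The default of 300 s mirrors the common Kerberos default; domain policy may differ.
skew_within_limit() {
  local local_epoch="$1" ref_epoch="$2" max_skew="${3:-300}" diff
  diff=$((local_epoch - ref_epoch))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))   # absolute value
  fi
  [ "$diff" -le "$max_skew" ]
}
```

A typical call would be `skew_within_limit "$(date +%s)" "$REFERENCE_EPOCH" || echo "fix time sync first"`, where `REFERENCE_EPOCH` is a placeholder for the trusted timestamp.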

5. Test Kerberos without realmd

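Before joining, kinit can confirm whether the VM can obtain a Kerberos ticket. This is more precise than realm join because it exercises only KDC discovery and authentication, without the LDAP and SSSD steps that a join adds.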

bash 05-kerberos-test.sh
REALM="CORP.EXAMPLE.LOCAL"
# Not named USER: that would overwrite the shell's own login variable
JOIN_USER="join-user"

kinit "$JOIN_USER@$REALM"
klist
kdestroy

A "Cannot find KDC" error often points to DNS SRV records or routing. A "Preauthentication failed" error points to the account, password, or Kerberos policy. A "Clock skew too great" error points to time.
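The mapping from kinit messages to causes can be written down once and reused in a runbook. The sketch below matches on fragments of the MIT Kerberos error text. The function name and the category labels are hypothetical, and the "Cannot contact any KDC" pattern is an addition beyond the three messages discussed here.

```shell
# Map a kinit error message to the layer that most likely needs attention.
classify_kinit_error() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *"cannot find kdc"*|*"cannot contact any kdc"*)
      echo "dns-or-routing" ;;
    *"preauthentication failed"*)
      echo "account-password-or-policy" ;;
    *"clock skew"*)
      echo "time" ;;
    *)
      echo "unknown" ;;
  esac
}
```

One way to use it, assuming kinit writes its errors to stderr while the password prompt stays on the terminal: `kinit join-user@CORP.EXAMPLE.LOCAL 2>kinit.err || classify_kinit_error "$(cat kinit.err)"`.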

6. Validate realmd and domain discovery

After DNS, ports, and Kerberos are validated, it becomes useful to inspect the join tools themselves.

bash 06-domain-discovery.sh
DOMAIN="corp.example.local"

realm discover $DOMAIN
adcli info $DOMAIN

If adcli info works but realm discover fails, check packages and realmd behavior. If both fail while DNS and Kerberos are good, inspect LDAP and distribution dependencies.
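The comparison between the two tools is a small decision table, sketched here as a function of their exit codes. The names `discovery_verdict` and `run_discovery` are hypothetical, and the case where realm discover succeeds but adcli info fails is not covered by the text, so it is only reported as inconclusive.

```shell
# Turn the exit codes of "realm discover" and "adcli info" into a next step.
discovery_verdict() {
  local realm_rc="$1" adcli_rc="$2"
  if [ "$realm_rc" -eq 0 ] && [ "$adcli_rc" -eq 0 ]; then
    echo "discovery-ok"
  elif [ "$adcli_rc" -eq 0 ]; then
    echo "check-realmd-and-packages"
  elif [ "$realm_rc" -eq 0 ]; then
    echo "inconclusive"              # assumption: combination not described above
  else
    echo "check-ldap-and-dependencies"
  fi
}

# Run both tools quietly against a domain and print the verdict.
run_discovery() {
  local domain="$1" realm_rc=0 adcli_rc=0
  realm discover "$domain" >/dev/null 2>&1 || realm_rc=$?
  adcli info "$domain" >/dev/null 2>&1 || adcli_rc=$?
  discovery_verdict "$realm_rc" "$adcli_rc"
}
```

`run_discovery corp.example.local` prints a single verdict. When the verdict is not "discovery-ok", rerunning the failing tool without output redirection shows the detailed error.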

7. Run and validate the join

The join should use a dedicated account and, when available, the OU intended for Linux machines. After a successful join, validate that the domain appears, that the SSSD configuration passes its checks, and that AD users and groups resolve.

bash 07-post-join.sh
DOMAIN="corp.example.local"
TEST_USER="user@$DOMAIN"
TEST_GROUP="linux-admins@$DOMAIN"

realm list
sssctl config-check
sssctl domain-status $DOMAIN
getent passwd "$TEST_USER"
getent group "$TEST_GROUP"

If realm list is correct but getent returns nothing, the issue is likely in SSSD, NSS, name format, or group resolution. If groups resolve but final access fails, inspect PAM, SSH configuration, allowed groups, and home directory creation.
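Name format is a frequent reason for an empty getent result: depending on the SSSD setting use_fully_qualified_names, which realmd typically enables, only `user@domain` or only `user` resolves. The sketch below tries both forms. The function names are hypothetical, and it only reports what NSS returns without changing any configuration.

```shell
# Print the name forms worth trying: the given form, then the short form.
name_candidates() {
  printf '%s\n' "$1"
  case "$1" in
    *@*) printf '%s\n' "${1%%@*}" ;;
  esac
}

# Report which form, if any, resolves through NSS for a database (passwd or group).
resolve_identity() {
  local database="$1" identity="$2" candidate
  while IFS= read -r candidate; do
    if getent "$database" "$candidate" >/dev/null 2>&1; then
      echo "resolved $candidate"
      return 0
    fi
  done <<EOF
$(name_candidates "$identity")
EOF
  echo "unresolved $identity"
  return 1
}
```

For instance, `resolve_identity group "linux-admins@corp.example.local"` shows whether the group resolves and under which form. If only the short form resolves, SSH and sudo rules written with the qualified name will not match.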

Conclusion

The useful sequence is stable: local context, SRV records, AD ports, time, Kerberos, domain discovery, join, then SSSD validation. This order avoids making random changes on the VM when the cause often sits in Azure, the network hub, or internal DNS. A realm join error is only the visible symptom; the useful diagnosis is the one that identifies the broken link precisely.
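The ordering can be enforced by a small driver that stops at the first failing link. This is a sketch only: `run_chain` is a hypothetical helper, each argument has the form `label=command`, and the commands passed to it are whatever checks suit the environment.

```shell
# Run named checks in order and stop at the first failure.
# Each argument is "label=command"; the command runs through sh -c.
run_chain() {
  local step label cmd
  for step in "$@"; do
    label="${step%%=*}"   # text before the first "="
    cmd="${step#*=}"      # text after the first "="
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "PASS $label"
    else
      echo "FAIL $label"
      return 1
    fi
  done
  echo "chain complete"
}
```

An invocation might look like `run_chain "kdc-port=nc -z -w 3 dc01.corp.example.local 88" "discovery=realm discover corp.example.local"`. The first FAIL line names the link to investigate, and later checks are not run.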