Infrastructure

Linux and Active Directory: troubleshoot SSSD failures that appear after the join

A diagnostic procedure for Linux Active Directory incidents after a successful join: SSSD, Kerberos, DNS, cache, time, machine account and distribution differences.

21 May 2026 · linux, active-directory, sssd, kerberos, realmd, troubleshooting

A successful Active Directory join does not guarantee stable Linux integration. realm join can finish without errors and the machine account can appear in the expected OU, yet problems start later: users not found, intermittent authentication, incomplete group membership, slow logins, an inconsistent SSSD cache or hard-to-read Kerberos errors.

The scenario starts after the join. The Linux machine is expected to authenticate AD users through SSSD. DNS points to domain controllers, time is synchronized and the configuration looks correct. The goal is to diagnose operational failures methodically.

Start again from the base layers

SSSD depends on several layers. If DNS, time synchronization or Kerberos is unstable, SSSD cannot compensate. Do not start by clearing the cache or changing sssd.conf at random. First verify the prerequisites from the affected machine.

bash 01-basic-checks.sh
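# Identity, join state, resolver and clock as seen from this host
hostname -f
realm list
resolvectl status || cat /etc/resolv.conf
timedatectl

# SRV records used for domain controller discovery (the query type must be SRV)
nslookup -type=SRV _ldap._tcp.example.local
nslookup -type=SRV _kerberos._tcp.example.local

# Keytab entries (run as root), then a user ticket; the Kerberos realm is written in uppercase
klist -k
kinit user@EXAMPLE.LOCAL
klist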

If the SRV queries return no domain controllers, the issue is probably DNS. If kinit fails while the password is correct, check time (Kerberos rejects a clock skew above five minutes by default), the realm name, domain controller reachability and account state. If Kerberos works but users do not appear through NSS, focus on SSSD.
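The decision order described above can be written down as a small helper. This is a minimal sketch, not a complete diagnostic: triage_layer is a hypothetical function name, and its three arguments are assumed to be the exit statuses (0 = success) of an SRV lookup, a kinit and an NSS lookup that you run yourself.

```bash
#!/usr/bin/env bash
# Sketch: map the outcome of three checks to the layer worth investigating.
# Arguments are exit statuses (0 = success) of checks run by the caller:
#   $1 SRV lookup, $2 kinit, $3 NSS lookup (getent or id).
triage_layer() {
    local srv_rc=$1 kinit_rc=$2 nss_rc=$3
    if [ "$srv_rc" -ne 0 ]; then
        echo "dns"
    elif [ "$kinit_rc" -ne 0 ]; then
        echo "kerberos"   # time, realm, domain controllers or account state
    elif [ "$nss_rc" -ne 0 ]; then
        echo "sssd"
    else
        echo "ok"
    fi
}

# Example wiring (not executed here; domain and user are placeholders):
#   host -t SRV _ldap._tcp.example.local >/dev/null 2>&1; srv=$?
#   kinit user@EXAMPLE.LOCAL; krb=$?
#   getent passwd user@example.local >/dev/null; nss=$?
#   triage_layer "$srv" "$krb" "$nss"
```

The function only names the first failing layer, which matches the idea of working upward from the base layers instead of investigating everything at once.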

Verify what SSSD really sees

id, getent and sssctl provide a more useful view than a login test alone. They show whether the user is found, groups resolve and the domain is online.

bash 02-sssd-checks.sh
systemctl status sssd --no-pager
sssctl domain-status example.local
sssctl user-checks user@example.local
getent passwd user@example.local
id user@example.local

An offline SSSD domain can come from an unavailable domain controller, DNS issue, filtered port or Kerberos error. A missing user can come from an LDAP filter, use_fully_qualified_names, attribute mapping or stale cache.
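For monitoring or a runbook script, the online state can be extracted from the sssctl output. The sketch below assumes that sssctl domain-status prints a line of the form "Online status: Online" or "Online status: Offline"; verify this against your SSSD version. domain_state is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Reads `sssctl domain-status <domain>` output on stdin and prints
# "online", "offline" or "unknown".
# Assumption: the output contains a line such as "Online status: Online".
domain_state() {
    awk -F': *' '
        tolower($1) == "online status" {
            state = tolower($2)
            gsub(/[ \t\r]+$/, "", state)   # drop trailing whitespace
        }
        END {
            if (state != "online" && state != "offline") state = "unknown"
            print state
        }
    '
}

# Example (run as root):
#   sssctl domain-status example.local | domain_state
```

An "unknown" result means the expected line was not found, which is itself worth investigating before trusting any other output.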

Read logs at the right level

SSSD logs are often too quiet by default. Temporarily raising debug level can help understand a failure. Keep this controlled because logs can grow quickly and may contain sensitive information.

ini sssd-debug.conf
[domain/example.local]
debug_level = 6

[sssd]
debug_level = 6

[nss]
debug_level = 6

[pam]
debug_level = 6

After the change, restart SSSD and reproduce the issue with a specific user. Then return to the normal level. The goal is not to leave production in debug mode, but to capture a readable trace.

bash 03-logs.sh
systemctl restart sssd
id user@example.local
journalctl -u sssd --since "10 minutes ago"
ls -l /var/log/sssd/
tail -n 200 /var/log/sssd/sssd_example.local.log

Important messages often relate to LDAP discovery, selected domain controller, Kerberos errors, timeouts or access denials caused by filters.
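At debug level 6 the domain log is verbose, so a first pass that keeps only failure messages can help. The sketch below relies on the SSSD debug bitmask (0x0010 fatal, 0x0020 critical, 0x0040 serious, 0x0080 minor failures) and assumes that each log line carries its level in the form "(0xNNNN)". sssd_failures is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Keep only lines logged at the failure levels of the SSSD debug bitmask:
# 0x0010 fatal, 0x0020 critical, 0x0040 serious, 0x0080 minor failures.
# Assumption: each log line carries its level as "(0xNNNN)".
# Reads the files given as arguments, or stdin when none are given.
sssd_failures() {
    grep -E '\(0x00(10|20|40|80)\)' "$@"
}

# Example:
#   sssd_failures /var/log/sssd/sssd_example.local.log | tail -n 50
```

Once a failure line is found, read the trace lines around it in the full log, since the surrounding context usually names the server or operation involved.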

Understand the cache before deleting it

Clearing SSSD cache can fix a symptom, but it should not be automatic. Cache explains some behaviors: a recently changed group does not appear, a disabled user remains visible, or an error persists after an AD-side fix. Know whether the incident is cache-related or a communication failure.

bash 04-cache.sh
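# Inspect the cached entry and its expiration times (run as root)
sssctl user-show user@example.local
# Targeted: expire one user entry
sss_cache -u user@example.local
# Broad: expire every cached entry, then restart SSSD
sss_cache -E
systemctl restart sssd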

sss_cache -E marks every cached entry as expired, so SSSD reloads it from AD on the next lookup once the domain is online. It is useful for a test, but in production first understand why the cached value was harmful. Otherwise the incident will return after the next AD change.
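Selective invalidation is easier to keep disciplined when it is scripted. The following is a minimal sketch: invalidate_selected and the RUN variable are hypothetical names, and the function prints the sss_cache commands by default so they can be reviewed before anything is executed.

```bash
#!/usr/bin/env bash
# Print (or, with RUN=1, execute) one targeted sss_cache call per entry.
# Usage: invalidate_selected user|group NAME...
invalidate_selected() {
    local flag name
    case "${1:-}" in
        user)  flag=-u ;;
        group) flag=-g ;;
        *) echo "usage: invalidate_selected user|group NAME..." >&2; return 2 ;;
    esac
    shift
    for name in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            sss_cache "$flag" "$name"        # expires only this entry
        else
            echo "sss_cache $flag $name"     # dry run: show the command
        fi
    done
}

# Example (dry run, then real run as root):
#   invalidate_selected user user@example.local
#   RUN=1 invalidate_selected user user@example.local
```

Keeping the list of invalidated names in the incident notes makes it possible to tell later whether the cache was really the cause.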

Check the machine account

The Linux machine account in Active Directory can fail after password rotation, OU movement, VM restore or snapshot use. A cloned machine may also share an AD identity with another host. Symptoms can look like SSSD failure while the machine trust is broken.

bash 05-machine-account.sh
realm list
adcli testjoin
net ads testjoin 2>/dev/null || true
klist -k | head

# Depending on distribution and available tooling
adcli info example.local

If adcli testjoin fails, fix the machine account before changing SSSD filters. Rejoining the domain may be necessary, but only after understanding the impact on OU, allowed groups and configuration files.
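After a VM restore or snapshot rollback, one thing worth comparing is the key version number (KVNO) in the local keytab against the msDS-KeyVersionNumber attribute of the computer object in AD. The sketch below only covers the local side: it assumes that klist -k lists one entry per line with a numeric KVNO in the first column, and keytab_max_kvno is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Reads `klist -k` output on stdin and prints the highest key version number.
# Assumption: entry lines start with a numeric KVNO followed by the principal.
# Prints 0 when no entry line is found.
keytab_max_kvno() {
    awk '$1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0 } END { print max + 0 }'
}

# Example (run as root):
#   klist -k | keytab_max_kvno
```

If the keytab is behind the value stored in AD, the host is presenting keys that the domain no longer accepts, which points to the machine account and not to SSSD filters.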

Conclusion

SSSD failures after an Active Directory join are easier to diagnose when DNS, time, Kerberos, machine account, NSS resolution and cache are separated. Changing sssd.conf without that separation often creates fragile fixes.

A good runbook fits in a few commands: verify DNS SRV records, test Kerberos with kinit, inspect domain state with sssctl, read logs at the right level, invalidate cache selectively and test the machine account. With that method, Linux AD integration remains operable beyond the first successful realm join.