Recently I encountered an interesting Kerberos authentication issue in our environment that I thought would be worth sharing, as it highlights the importance of understanding PAC validation in the Kerberos authentication process.
The Problem
I was troubleshooting an authentication failure where clients couldn't access a particular service. The symptoms pointed to Kerberos issues, specifically problems with validating the Privilege Attribute Certificate (PAC).
Understanding the Kerberos Flow with PAC Validation
First, let's understand what's supposed to happen in a normal Kerberos authentication flow with PAC validation:
- The client sends an AP-REQ (Application Request) to the server
- The server forwards the PAC from the client's ticket to the Domain Controller for verification via KERB_VERIFY_PAC
- The DC validates the PAC and returns an RPC status code
- The server then responds to the client with an AP-REP (Application Reply)
This process is critical because the PAC contains the authorization information that determines what the user is allowed to do. The server needs to verify this information is legitimate before granting access.
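The four steps above can be sketched as a toy model. This is only an illustration of the control flow, not how Windows implements it: `ApReq`, `handle_ap_req`, and the two stand-in DC functions are hypothetical names, and the PAC is treated as opaque bytes.

```python
from dataclasses import dataclass
from typing import Callable

STATUS_SUCCESS = 0x00000000
RPC_NT_BAD_STUB_DATA = 0xC003000C  # the failure status discussed in this post


@dataclass
class ApReq:
    """Hypothetical stand-in for an AP-REQ; the PAC is just opaque bytes here."""
    client: str
    realm: str
    pac: bytes


def handle_ap_req(req: ApReq, verify_pac_with_dc: Callable[[bytes], int]) -> str:
    """Forward the PAC to the DC, then answer the client based on the status."""
    status = verify_pac_with_dc(req.pac)  # steps 2 and 3: DC returns a status code
    if status != STATUS_SUCCESS:
        return f"denied (status 0x{status:08x})"
    return "AP-REP"  # step 4: server replies to the client


def healthy_dc(pac: bytes) -> int:
    """Stand-in for a DC that validates the PAC successfully."""
    return STATUS_SUCCESS


def failing_dc(pac: bytes) -> int:
    """Stand-in for a DC that cannot process the validation request."""
    return RPC_NT_BAD_STUB_DATA


request = ApReq(client="bear.paws", realm="BEAR.LOCAL", pac=b"opaque-pac-bytes")
print(handle_ap_req(request, healthy_dc))
print(handle_ap_req(request, failing_dc))
```

The point of the sketch is that the server's answer depends entirely on what the DC returns, which is why a DC-side problem shows up as a client-side access failure.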
Investigating the Issue
I started my investigation by checking the event logs on the server, where I found several errors in the Security-Kerberos event log. The critical error message stated:
"The digitally signed Privilege Attribute Certificate (PAC) that contains the authorization information for client bear.paws in realm BEAR.LOCAL could not be validated."
This clearly pointed to a PAC validation failure, but we still needed to find out why it was occurring. PAC validation failures are usually caused by domain trust issues, but in this scenario the user, client, and server were all in the same domain, so that seemed unlikely.
To dig deeper, I captured network traffic during the authentication attempt. In the capture, I could see the initial Kerberos TGS-REQ (Ticket Granting Service Request) and subsequent TLS traffic, but the actual AP request details weren't visible since they were encrypted in the Application Data packets.
Netlogon.log: Finding Conclusive Evidence
The real smoking gun came from examining the Netlogon.log on the server. At exactly 08:24:41 server time (07:24:41 UTC), I found:
05/02 08:24:41 [LOGON] [2620] SamLogon: Kerberos Ticket logon of BEAR.LOCAL\bear.paws
from AppMgr1 Operations:0x30001 Entered
...several connection and security context initialization entries...
05/02 08:24:41 [CRITICAL] [2620] BEAR: NlpUserValidateHigher:
denying access after status: 0xc003000c 1
The status code 0xc003000c translates to RPC_NT_BAD_STUB_DATA, suggesting that during the PAC validation, corrupted data was being transmitted.
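When the Netlogon.log is large, it can help to pull out just the CRITICAL entries and their status codes. The following is a minimal sketch that assumes entries look like the excerpt above (timestamp, level in brackets, thread ID in brackets, message) and that a message may wrap onto following lines. The function names are my own, not part of any Microsoft tooling.

```python
import re

# Header of an entry: "MM/DD HH:MM:SS [LEVEL] [thread-id] message"
ENTRY_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] \[(\d+)\] (.*)$")
STATUS_RE = re.compile(r"status: (0x[0-9a-fA-F]{8})")


def parse_netlogon(text):
    """Return a list of entries; wrapped lines are joined onto the previous entry."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match:
            stamp, level, thread, message = match.groups()
            entries.append(
                {"time": stamp, "level": level, "thread": thread, "message": message}
            )
        elif entries:
            # Not a new header, so treat it as a continuation of the last entry.
            entries[-1]["message"] += " " + line
    return entries


def critical_statuses(entries):
    """Yield (time, status) for CRITICAL entries that carry a status code."""
    for entry in entries:
        found = STATUS_RE.search(entry["message"])
        if entry["level"] == "CRITICAL" and found:
            yield entry["time"], int(found.group(1), 16)


# The two entries quoted above, with the elided middle section left out.
SAMPLE = r"""
05/02 08:24:41 [LOGON] [2620] SamLogon: Kerberos Ticket logon of BEAR.LOCAL\bear.paws
from AppMgr1 Operations:0x30001 Entered
05/02 08:24:41 [CRITICAL] [2620] BEAR: NlpUserValidateHigher:
denying access after status: 0xc003000c 1
"""

for when, status in critical_statuses(parse_netlogon(SAMPLE)):
    print(when, hex(status))
```

Against a real log you would read the file contents and pass them to `parse_netlogon` instead of `SAMPLE`.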
Collecting the Diagnostic Data
Before analyzing any Kerberos issue, you need to collect the right diagnostic data. The most effective way to gather comprehensive logs for authentication issues is Microsoft's TSS (TroubleShootingScript) toolset.
Setting up the TSS Tool
- Download the TSS tool from Microsoft's official link: aka.ms/getTSS
- Once downloaded, run PowerShell as administrator
- Navigate to the folder where the tool was downloaded and run it. The tool will attempt to self-update to the latest version if needed.
Running the Authentication Scenario Trace
To collect data for authentication issues, I used the following command:
.\tss.ps1 -scenario ADS_Auth
When this command runs, it will start several ETW (Event Tracing for Windows) traces tailored for authentication troubleshooting. You'll see what I call the "mission clock": a box displaying the current time, username, and server.
Important Tips for Data Collection
- Collect on both sides: Run this trace simultaneously on both the client experiencing the issue and the server it's trying to connect to
- Note exact timestamps: Record the precise time when the error occurs
- Reproduce the issue: After starting the trace, reproduce the authentication problem
- End the trace correctly: When you see the "reproduce done?" prompt, only press 'Y' AFTER you've successfully reproduced the error
- Time zone awareness: If you're on British Summer Time (UTC+1), remember that logs will be recorded in UTC, so an 8:46 AM failure will appear as 7:46 AM in the logs. The one-hour offset only applies during daylight saving time.
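The time zone arithmetic is easy to get wrong when you are jumping between logs, so here is a small sketch of the conversion. It uses the Netlogon timestamp from earlier in this post (08:24:41 server time, 07:24:41 UTC). The year is an assumption, because the log line itself carries none, and the UTC offset is passed in explicitly rather than looked up.

```python
from datetime import datetime, timedelta, timezone


def local_stamp_to_utc(stamp: str, year: int, utc_offset_hours: int) -> datetime:
    """Convert a 'MM/DD HH:MM:SS' local timestamp to an aware UTC datetime."""
    naive = datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)


# Assumption: year 2024 and an offset of +1 hour (British Summer Time).
utc = local_stamp_to_utc("05/02 08:24:41", year=2024, utc_offset_hours=1)
print(utc.strftime("%m/%d %H:%M:%S UTC"))
```

Outside daylight saving time you would pass `utc_offset_hours=0` and the timestamp would come back unchanged.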
The TSS tool will generate a comprehensive ZIP file containing all relevant logs for authentication analysis.
Analyzing Kerberos ETL Logs
A powerful but often overlooked troubleshooting tool is Kerberos ETL (Event Trace Log) logging, which is built on Event Tracing for Windows. Here's how I captured and analyzed these logs:
- First, I enabled Kerberos ETL logging using the following commands from an elevated PowerShell prompt:
# Enable the Kerberos operational event log
wevtutil.exe sl Microsoft-Windows-Kerberos/Operational /e:true
# For more detailed tracing
netsh trace start scenario=NetConnection capture=yes report=disabled persistent=no maxsize=1024 correlation=disabled traceFile=C:\temp\Kerberos.etl
- After reproducing the issue, I stopped the trace:
netsh trace stop
- To analyze the ETL file, I used Microsoft Network Monitor with the Microsoft-Windows-Kerberos parser or Windows Performance Analyzer (WPA).
Final confirmation of our issue came from parsing these Kerberos ETL logs:
[KERBEROS] TicketLogonClient_cxx870 PacValidation::TicketLogon() - Domain controllers
did not support ticket logon
[KERBEROS] TicketLogonClient_cxx1093 KerbVerifyPacSignature() - Ticket logon was required
to validate ticket signatures but was not supported
[KERBEROS] krbtoken_cxx3084 KerbCreateTokenFromTicketEx() - Pac signature did not verify:
domain BEAR.LOCAL, status c000006d
The status code c000006d (STATUS_LOGON_FAILURE) indicates the authentication was rejected due to invalid credentials - in this case, an invalid PAC signature.
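Both status codes in this post are NTSTATUS values, and it can be useful to split them into their fields when a log only gives you the raw hex. The sketch below follows the documented NTSTATUS bit layout (severity in the top two bits, a 12-bit facility, a 16-bit code). The name table only covers the two codes seen here, and `decode_ntstatus` is a helper name of my own.

```python
# Names for the two codes that appear in this post; anything else is "unknown".
KNOWN_NAMES = {
    0xC003000C: "RPC_NT_BAD_STUB_DATA",
    0xC000006D: "STATUS_LOGON_FAILURE",
}
SEVERITIES = ("success", "informational", "warning", "error")


def decode_ntstatus(value):
    """Split an NTSTATUS value (int or hex string) into its fields."""
    if isinstance(value, str):
        value = int(value, 16)  # accepts "c000006d" and "0xc003000c"
    return {
        "severity": SEVERITIES[(value >> 30) & 0x3],  # bits 31-30
        "facility": (value >> 16) & 0xFFF,            # bits 27-16
        "code": value & 0xFFFF,                       # bits 15-0
        "name": KNOWN_NAMES.get(value, "unknown"),
    }


for raw in ("0xc003000c", "c000006d"):
    print(raw, decode_ntstatus(raw))
```

The Netlogon status decodes to facility 3 (the RPC stubs facility), while the Kerberos one decodes to facility 0, which is a quick hint that the two errors were raised by different layers.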
Root Cause and Solution
The root cause in this scenario was that the domain controllers had not been patched to a recent enough version to recognize the new PAC structure. All domain controllers needed to be on at least the April 2024 updates to validate it properly.
During PAC validation, the newer clients were sending a PAC format that older domain controllers simply didn't recognize, resulting in the "bad stub data" error and the subsequent authentication failure.
To resolve this issue, I performed the following steps:
- Identified all domain controllers that hadn't received the April 2024 update
- Applied the necessary security patches to these domain controllers
- Verified the patch installation was successful
- Tested the authentication again to confirm the PAC validation was working
After these steps, the PAC validation started working properly, and users could authenticate successfully.
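The first of those steps, identifying unpatched domain controllers, boils down to a set comparison once you have a list of installed updates per DC. Here is a minimal sketch of that comparison. The DC names and KB numbers are placeholders, not real update identifiers, and the required update differs per Windows Server version, so in practice you would run the check once per OS version.

```python
def dcs_missing_update(installed_by_dc, required_kb):
    """Return the sorted names of DCs whose installed updates lack required_kb."""
    required = required_kb.upper()
    return sorted(
        dc
        for dc, kbs in installed_by_dc.items()
        if required not in {kb.upper() for kb in kbs}
    )


# Hypothetical inventory: DC name -> set of installed update IDs (placeholders).
inventory = {
    "DC01": {"KB0000001", "KB0000002"},
    "DC02": {"KB0000001"},
    "DC03": {"kb0000002"},
}

print(dcs_missing_update(inventory, "KB0000002"))
```

How you build the inventory is up to you; an export from your patch management tool is usually the easiest source.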
Lessons Learned
This troubleshooting exercise highlights several important aspects of Kerberos authentication. Without log data and evidence you end up guessing at what is going on, and the Internet will gladly take you down an unnecessary rabbit hole.
- PAC validation is a critical security step that shouldn't be disabled
- Multiple logs need to be correlated to get the complete picture (event logs, network captures, Netlogon.log, and Kerberos ETL)
- Trust relationship issues can manifest as PAC validation failures and can send you down unnecessary rabbit holes
- Understanding the entire authentication flow helps identify where problems occur