PowerShell: Cleaning Up Duplicates in Exported SAML Data

When managing SAML certificates across multiple applications, duplicate entries can create confusion and complicate maintenance tasks. This post demonstrates how to identify and remove duplicate entries from a CSV file containing SAML certificate information.

The Problem

A SAML certificate inventory CSV file (columns: ApplicationName, CertificateEndDate, CertificateThumbprint) contains duplicate entries, such as:

"Bear Honeypot","08/03/2024 14:06:25",
"EB762B68D7FE1107504B65504A0B472A3A5EEA9E72EFD7EADC13115EFCFB6F11"
"Bear Honeypot","08/03/2024 14:06:25",
"EB762B68D7FE1107504B65504A0B472A3A5EEA9E72EFD7EADC13115EFCFB6F11"

These duplicates likely occurred during data collection or system exports, but they need to be removed to maintain accurate records.
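
Before running a full cleanup script, a quick pre-check with Group-Object can confirm which rows are exact duplicates. This is a sketch that assumes the file's columns are ApplicationName, CertificateEndDate, and CertificateThumbprint, the same names the script below relies on:

# Pre-check: list any row that appears more than once
Import-Csv -Path "SAML_Details.csv" |
    Group-Object -Property ApplicationName, CertificateEndDate, CertificateThumbprint |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -Property Count, Name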

The Solution

The following PowerShell script efficiently removes duplicate entries by comparing all fields in each record:

# PowerShell script to remove duplicate entries from SAML_Details.csv
# Usage: .\Remove-SAMLDuplicates.ps1 -InputFile "SAML_Details.csv" -OutputFile "SAML_Details_Unique.csv"

param(
    [Parameter(Mandatory=$true)]
    [string]$InputFile,
    
    [Parameter(Mandatory=$false)]
    [string]$OutputFile = "SAML_Details_Unique.csv"
)

# Check if input file exists
if (-not (Test-Path $InputFile)) {
    Write-Error "Input file '$InputFile' does not exist!"
    exit 1
}

Write-Host "Reading CSV file: $InputFile"

# Import the CSV file
$data = Import-Csv -Path $InputFile

# Initial record count
$initialCount = $data.Count
Write-Host "Total records in original file: $initialCount"

# Create a hashtable to track unique records
$uniqueRecords = @{}
$uniqueData = @()

foreach ($row in $data) {
    # Create a unique key by combining all values
    $key = "$($row.ApplicationName)|$($row.CertificateEndDate)|$($row.CertificateThumbprint)"
    
    # If this key hasn't been seen before, add it to our unique records
    if (-not $uniqueRecords.ContainsKey($key)) {
        $uniqueRecords[$key] = $true
        $uniqueData += $row
    }
}

# Count of unique records
$uniqueCount = $uniqueData.Count
$duplicatesRemoved = $initialCount - $uniqueCount

Write-Host "Unique records found: $uniqueCount"
Write-Host "Duplicates removed: $duplicatesRemoved"

# Export unique records to the output file
$uniqueData | Export-Csv -Path $OutputFile -NoTypeInformation
Write-Host "Unique records saved to: $OutputFile"

How It Works

  1. Input Parameters: The script accepts an input file path (required) and an output file path (optional).

  2. Validation: It checks if the input file exists before proceeding.

  3. Record Processing: The script reads the CSV file and creates a unique identifier for each record by combining:

    • ApplicationName
    • CertificateEndDate
    • CertificateThumbprint

  4. Duplicate Detection: Using a hashtable for efficient lookups, it tracks which records have already been seen.

  5. Output: Only unique records are written to the output file.
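
For a one-off cleanup, the same exact-row deduplication can also be done in a single pipeline. This is a sketch rather than a drop-in replacement, since unlike the script it reorders the rows:

# One-pipeline alternative: sort on the key columns, keep unique rows
Import-Csv -Path "SAML_Details.csv" |
    Sort-Object -Property ApplicationName, CertificateEndDate, CertificateThumbprint -Unique |
    Export-Csv -Path "SAML_Details_Unique.csv" -NoTypeInformation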

Usage Examples

# Basic usage (creates SAML_Details_Unique.csv)
.\Remove-SAMLDuplicates.ps1 -InputFile "SAML_Details.csv"

# Specify a custom output filename
.\Remove-SAMLDuplicates.ps1 -InputFile "SAML_Details.csv" -OutputFile "SAML_Clean.csv"

Results

The script provides clear feedback about its operations:

  • Total records processed
  • Number of unique records found
  • Number of duplicates removed
  • Output file location
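
For the sample file in this post, that feedback looks like the following (the counts are explained below):

Reading CSV file: SAML_Details.csv
Total records in original file: 262
Unique records found: 131
Duplicates removed: 131
Unique records saved to: SAML_Details_Unique.csv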

In this particular case, analyzing the sample file revealed 262 total records with exactly 131 unique entries, meaning half of the records were duplicates.
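
To spot-check the cleaned file, the same Group-Object pre-check from earlier can be rerun against the output; it should return nothing:

# Verify: no group in the cleaned file should contain more than one row
Import-Csv -Path "SAML_Details_Unique.csv" |
    Group-Object -Property ApplicationName, CertificateEndDate, CertificateThumbprint |
    Where-Object { $_.Count -gt 1 }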
