I recently needed to remove entire rows from a CSV file when specific email addresses appeared in a particular column. The requirement was straightforward: if certain email addresses were detected as recipients, the entire row needed to be filtered out and excluded from the final dataset.
The Problem
I had a CSV file containing email data with multiple columns including SenderAddress
, RecipientAddress
, and other metadata. The task was to remove all rows where the RecipientAddress
column contained any email from a predefined list of addresses that needed to be excluded.
Initial Approach and Issues
My first attempt focused on the wrong column (the requirements that were originally given to me were wrong ) I initially tried filtering based on SenderAddress:
$filteredData = $csvData | Where-Object {
$_.SenderAddress -notin $usernamesToRemove
}
This approach failed because I was filtering on senders when I actually needed to filter on recipients. The script was working correctly but targeting the wrong data.
Data Format Challenges
The username list I needed to work with had formatting issues that required cleanup:
manager.user@bythepowerofgreyskull.com
'James.O''Bond@bythepowerofgreyskull.com'
'mike.O''roger@bythepowerofgreyskull.com'
junior.bear@bythepowerofgreyskull.com
Some entries were wrapped in single quotes and contained escaped apostrophes (''
representing a single apostrophe). I needed to clean these up before performing the comparison:
$usernamesToRemove = $rawUsernames | ForEach-Object {
$cleaned = $_.Trim()
# Remove surrounding single quotes if they exist
if ($cleaned.StartsWith("'") -and $cleaned.EndsWith("'")) {
$cleaned = $cleaned.Substring(1, $cleaned.Length - 2)
}
# Fix double apostrophes ('' becomes ')
$cleaned = $cleaned -replace "''", "'"
return $cleaned
}
Script Summary
The script reads the CSV file, loads the list of email addresses to exclude, cleans up any formatting issues, and then filters out rows where the RecipientAddress
matches any address in the exclusion list:
# Read and process the data
$csvData = Import-Csv -Path $csvFilePath
$usernamesToRemove = Get-Content -Path $usernameListPath | ForEach-Object { /* cleanup logic */ }
# Filter out matching rows
$filteredData = $csvData | Where-Object {
$_.RecipientAddress -notin $usernamesToRemove
}
# Export the cleaned data
$filteredData | Export-Csv -Path $outputPath -NoTypeInformation
Debugging and Verification
I included debugging output to verify the script was working correctly:
# Check for matches before filtering
$matchingRows = $csvData | Where-Object { $_.RecipientAddress -in $usernamesToRemove }
Write-Host "Found $($matchingRows.Count) rows that match the removal list"
# Display results
$removedCount = $initialCount - $finalCount
Write-Host "Rows removed: $removedCount"
Write-Host "Final row count: $finalCount"
This approach helped identify when the script wasn't finding matches and led to discovering the sender vs. recipient column issue.
Script : csv-data-recipentaddress--remover.ps1
Here's the complete working script:
# PowerShell script to remove CSV rows where RecipientAddress matches usernames from a list
# Configuration - Update these paths as needed
$csvFilePath = "data.csv" # Path to your CSV file
$usernameListPath = "usernames.txt" # Path to text file with usernames (one per line)
$outputPath = "filtered_data.csv" # Path for the cleaned CSV output
try {
# Read the CSV file
Write-Host "Reading CSV file: $csvFilePath"
$csvData = Import-Csv -Path $csvFilePath
# Read usernames from file and clean them up
Write-Host "Reading username list: $usernameListPath"
$rawUsernames = Get-Content -Path $usernameListPath
# Clean up the usernames - remove quotes and fix apostrophes
$usernamesToRemove = $rawUsernames | ForEach-Object {
$cleaned = $_.Trim()
# Remove surrounding single quotes if they exist
if ($cleaned.StartsWith("'") -and $cleaned.EndsWith("'")) {
$cleaned = $cleaned.Substring(1, $cleaned.Length - 2)
}
# Fix double apostrophes ('' becomes ')
$cleaned = $cleaned -replace "''", "'"
return $cleaned
}
# Display initial row count
$initialCount = $csvData.Count
Write-Host "Initial row count: $initialCount"
# Check for matches before filtering
$matchingRows = $csvData | Where-Object { $_.RecipientAddress -in $usernamesToRemove }
Write-Host "Found $($matchingRows.Count) rows that match the removal list"
# Filter out rows where RecipientAddress matches any username in the list
$filteredData = $csvData | Where-Object {
$_.RecipientAddress -notin $usernamesToRemove
}
# Display results
$finalCount = $filteredData.Count
$removedCount = $initialCount - $finalCount
Write-Host "Rows removed: $removedCount"
Write-Host "Final row count: $finalCount"
# Export the filtered data to new CSV
$filteredData | Export-Csv -Path $outputPath -NoTypeInformation
Write-Host "Filtered CSV saved to: $outputPath"
Write-Host "Operation completed successfully!"
} catch {
Write-Error "An error occurred: $($_.Exception.Message)"
}
Write-Host "`nSummary:"
Write-Host "- Original file: $csvFilePath"
Write-Host "- Username list: $usernameListPath"
Write-Host "- Output file: $outputPath"
Write-Host "- Usernames to remove: $($usernamesToRemove.Count)"
Write-Host "- Rows removed: $removedCount"
Usage
- Create a text file
usernames.txt
with email addresses to exclude (one per line):
user1@company.com
user2@company.com
user3@company.com
- Update the file paths in the script:
$csvFilePath = "email_data.csv"
$usernameListPath = "usernames.txt"
$outputPath = "filtered_email_data.csv"
- Run the script in PowerShell
Handling Different Column Names
If your CSV uses a different column name for recipients, modify the filtering logic:
# For a column named 'ToAddress'
$filteredData = $csvData | Where-Object {
$_.ToAddress -notin $usernamesToRemove
}
# For a column named 'EmailAddress'
$filteredData = $csvData | Where-Object {
$_.EmailAddress -notin $usernamesToRemove
}
Multiple Column Filtering
To filter based on multiple columns:
$filteredData = $csvData | Where-Object {
($_.RecipientAddress -notin $usernamesToRemove) -and
($_.SenderAddress -notin $usernamesToRemove)
}
The script preserves the original CSV file and creates a new filtered version, making it safe to run multiple times with different exclusion lists.