Monitoring Email Infrastructure: Why Platform Independence Matters

When your email monitoring system relies on the same platform it's supposed to monitor, you're essentially asking the fox to guard the henhouse. This fundamental flaw became painfully apparent during a recent Exchange Online outage that highlighted why diversifying your alerting mechanisms isn't just good practice—it's essential for maintaining visibility when things go wrong.

The Paradox of Self-Monitoring

Picture this scenario: you've built a sophisticated email monitoring script that checks for message flow between test accounts. It dutifully tracks inbound and outbound communications, and when something goes wrong, it sends you an email alert. Sounds foolproof, right?

The irony reveals itself when the very platform you're monitoring, Exchange Online, experiences an outage. Your monitoring script detects the problem and logs it to the console, but the email alert never reaches you because it is stuck in the same queue that is causing the problem in the first place.

The Exchange Online Outage: A Case Study

On August 29, 2025, Exchange Online experienced a significant service degradation that affected users across Europe. The outage presented several challenging characteristics:

Silent Failures: No NDR (Non-Delivery Report) messages were generated, creating a false sense that emails were being processed
Queue Buildup: Local email queues filled up while trying to deliver to Exchange Online
Platform Blindness: Exchange Online's monitoring capabilities were compromised because the platform itself was experiencing issues
Routing Confusion: On-premises Exchange servers continued attempting to route messages to an unresponsive cloud service

The most insidious aspect of this outage was its silence. Without NDRs, administrators had no immediate indication that messages weren't being delivered. Email seemed to disappear into the void, yet it was not lost: it was waiting in queues, like an aircraft in a holding pattern.
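Because queue buildup was the one local symptom that remained visible, it is worth checking for it explicitly. What follows is a minimal sketch rather than part of the original script: `Get-BackedUpQueue` is a hypothetical helper name, the threshold of 50 is an arbitrary assumption, and the sample objects only mimic the `Identity` and `MessageCount` properties that the on-premises `Get-Queue` cmdlet returns.

```powershell
# Hypothetical helper: returns queues whose backlog exceeds a threshold.
# Expects objects shaped like Get-Queue output (Identity, MessageCount).
function Get-BackedUpQueue {
    param (
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Queue,

        # Assumed threshold; tune it to your normal queue depth.
        [int]$Threshold = 50
    )

    $Queue | Where-Object { $_.MessageCount -gt $Threshold }
}

# Sample data standing in for real queue objects.
$sampleQueues = @(
    [pscustomobject]@{ Identity = 'EX01\3'; MessageCount = 240 },
    [pscustomobject]@{ Identity = 'EX01\Submission'; MessageCount = 2 }
)

$backedUp = @(Get-BackedUpQueue -Queue $sampleQueues -Threshold 50)
$backedUp | ForEach-Object {
    Write-Output "Queue $($_.Identity) holds $($_.MessageCount) messages"
}
```

On an on-premises Exchange server, the same helper could be fed with `@(Get-Queue)` from the Exchange Management Shell, and any result could then be raised through the alert channel of your choice.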

Independent Monitoring: The Teams Webhook Solution

Recognizing this fundamental flaw, I redesigned my email monitoring approach to use Microsoft Teams webhooks instead of email notifications. This creates platform independence—if Exchange Online fails, Teams can still receive and display alerts through its separate infrastructure.

Key Implementation Features

Here's how the monitoring script addresses the core challenges:

# Define monitoring direction clearly
$SenderEmail = "lee.online@bear.local"          # External sender
$ReceiverEmail = "bearsbearsbears@bear.local"   # Internal receiver

# Only trace the current monitoring window instead of the cmdlet's default range
$endDate = Get-Date
$startDate = $endDate.AddMinutes(-$CheckIntervalMinutes)

if ($Direction -eq "Inbound") {
    # Inbound monitoring: External → Internal (emails coming into your organization)
    $recentEmails = Get-MessageTraceV2 -SenderAddress $SenderEmail -RecipientAddress $ReceiverEmail -StartDate $startDate -EndDate $endDate
}
else {
    # Outbound monitoring: Internal → External (emails leaving your organization)
    $recentEmails = Get-MessageTraceV2 -SenderAddress $ReceiverEmail -RecipientAddress $SenderEmail -StartDate $startDate -EndDate $endDate
}
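The trace results still have to be turned into a yes/no answer about mail flow. Here is a rough sketch of that step, under the assumption that each trace row carries a `Status` property with values such as `Delivered` or `Pending`. `Test-MailFlow` is a hypothetical helper name, and the sample rows only stand in for real trace output.

```powershell
# Hypothetical helper: $true when at least one traced message was delivered.
function Test-MailFlow {
    param (
        [AllowNull()]
        [object[]]$TraceResult
    )

    # Status is assumed to be a string such as 'Delivered', 'Pending', or 'Failed'.
    $delivered = @($TraceResult | Where-Object { $_.Status -eq 'Delivered' })
    return ($delivered.Count -gt 0)
}

# Sample rows standing in for real message trace output.
$pendingOnly = @(
    [pscustomobject]@{ Received = (Get-Date).AddMinutes(-5); Status = 'Pending' }
)
$withDelivery = $pendingOnly + [pscustomobject]@{ Received = (Get-Date); Status = 'Delivered' }

$emailsReceived = Test-MailFlow -TraceResult $withDelivery
Write-Output "Mail flow healthy: $emailsReceived"
```

Treating only delivered messages as proof of flow matters for silent outages: a test message that sits in `Pending` for the whole window is exactly the case the alert should catch.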

Teams Webhook Integration

The script creates rich, actionable notifications in Teams:

function Send-TeamsAlert {
    param (
        [string]$Title,
        [string]$Message,
        [string]$AlertType = "Warning"
    )

    # Red for errors, orange for warnings
    $themeColor = if ($AlertType -eq "Error") { "FF0000" } else { "FFA500" }

    # $Direction, $CheckIntervalMinutes and $TeamsWebhookUrl are defined at script scope
    $teamsMessage = @{
        "@type" = "MessageCard"
        "@context" = "http://schema.org/extensions"
        "summary" = $Title
        "themeColor" = $themeColor
        "sections" = @(
            @{
                "activityTitle" = "**$Title**"
                "text" = $Message
                "facts" = @(
                    @{ "name" = "Alert Time"; "value" = (Get-Date -Format "yyyy-MM-dd HH:mm:ss") },
                    @{ "name" = "Direction"; "value" = $Direction },
                    @{ "name" = "Monitoring Period"; "value" = "Last $CheckIntervalMinutes minutes" }
                )
            }
        )
        "potentialAction" = @(
            @{
                "@type" = "OpenUri"
                "name" = "View Exchange Admin Center"
                "targets" = @(@{ "os" = "default"; "uri" = "https://admin.exchange.microsoft.com" })
            }
        )
    }

    # -Depth 10 keeps the nested sections and facts from being truncated during serialization
    Invoke-RestMethod -Uri $TeamsWebhookUrl -Method Post -Body ($teamsMessage | ConvertTo-Json -Depth 10) -ContentType "application/json"
}
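One detail in the function is easy to overlook: the `-Depth 10` argument. `ConvertTo-Json` serializes only two levels of nesting by default, and the facts in a card sit deeper than that. The sketch below uses a cut-down, illustrative payload (not the full card) and round-trips it through JSON to confirm the innermost values survive.

```powershell
# Cut-down payload with the same nesting as the alert card: sections -> facts -> name/value.
$payload = @{
    '@type'  = 'MessageCard'
    sections = @(
        @{
            activityTitle = '**Email Flow Alert**'
            facts         = @(
                @{ name = 'Direction'; value = 'Inbound' }
            )
        }
    )
}

# -Depth 10 keeps the nested facts intact; the default depth of 2 is too shallow here.
$json = $payload | ConvertTo-Json -Depth 10

# Round-trip the JSON to confirm the innermost values survived serialization.
$roundTrip = $json | ConvertFrom-Json
$firstFact = $roundTrip.sections[0].facts[0]
Write-Output "$($firstFact.name) = $($firstFact.value)"
```

Running a check like this against a new card layout before pointing it at a real webhook is a cheap way to catch a truncated payload.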

Defining Inbound vs. Outbound Monitoring

Inbound Monitoring: Tracks emails entering your organization from external sources
Outbound Monitoring: Tracks emails leaving your organization to external destinations
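One way to keep these two definitions from drifting apart is to encode them in a single place. The following is a minimal sketch, not the original script: `Get-TraceParameter` is a hypothetical helper that accepts only the two directions defined above and returns a hashtable suitable for splatting.

```powershell
# Hypothetical helper: maps a direction to sender/recipient trace parameters.
function Get-TraceParameter {
    param (
        [Parameter(Mandatory)]
        [ValidateSet('Inbound', 'Outbound')]
        [string]$Direction,

        [Parameter(Mandatory)][string]$ExternalAddress,
        [Parameter(Mandatory)][string]$InternalAddress
    )

    if ($Direction -eq 'Inbound') {
        # Inbound: the external account sends, the internal account receives.
        return @{ SenderAddress = $ExternalAddress; RecipientAddress = $InternalAddress }
    }

    # Outbound: the internal account sends, the external account receives.
    return @{ SenderAddress = $InternalAddress; RecipientAddress = $ExternalAddress }
}

$addresses = @{
    ExternalAddress = 'lee.online@bear.local'
    InternalAddress = 'bearsbearsbears@bear.local'
}
$traceParams = Get-TraceParameter -Direction 'Outbound' @addresses
Write-Output "Tracing $($traceParams.SenderAddress) -> $($traceParams.RecipientAddress)"
```

The result can be splatted straight into the trace call, for example `Get-MessageTraceV2 @traceParams`, and a mistyped direction fails parameter validation instead of silently falling through to the outbound branch.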

Visual Notification: Teams Webhook Notification

This is an example of the notification shown in Teams when the trigger fires and the webhook is called:

The Multi-Channel Approach: Teams and Email

For the truly cautious, running both email and Teams notifications provides redundancy:

# Dual notification approach
if (-not $emailsReceived) {
    # Send Teams alert (does not depend on Exchange Online, so it works as long as Teams is reachable)
    Send-TeamsAlert -Title "Email Flow Alert" -Message $alertMessage
    
    # Also attempt email alert (only arrives if Exchange is healthy)
    if ($EnableEmailBackup) {
        Send-AlertEmail -Subject "Backup Alert" -Body $alertMessage
    }
}

This approach acknowledges that while Exchange Online outages are infrequent, their impact is significant. When the entire platform goes offline, response times are typically swift, but visibility during the outage window is crucial for understanding scope and impact.
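Redundancy only helps if one failing channel cannot block the others. Below is a rough sketch of that idea, with every name in it hypothetical: `Send-AlertToChannels` runs each channel in its own `try`/`catch` and reports the outcome, and the two script blocks are stand-ins for the real Teams and email senders (the email one deliberately throws to simulate an Exchange outage).

```powershell
# Hypothetical dispatcher: tries every channel and reports which ones succeeded.
function Send-AlertToChannels {
    param (
        [Parameter(Mandatory)][string]$Message,

        # Ordered map of channel name -> script block that accepts the message.
        [Parameter(Mandatory)][System.Collections.IDictionary]$Channel
    )

    foreach ($name in $Channel.Keys) {
        $succeeded = $true
        try {
            & $Channel[$name] $Message | Out-Null
        }
        catch {
            # A failed channel is recorded but never blocks the remaining ones.
            $succeeded = $false
        }
        [pscustomobject]@{ Channel = $name; Succeeded = $succeeded }
    }
}

# Stand-in channels: the "Email" one simulates Exchange being unavailable.
$channels = [ordered]@{
    Teams = { param($text) "Teams received: $text" }
    Email = { param($text) throw 'SMTP submission failed' }
}

$results = @(Send-AlertToChannels -Message 'No inbound mail in 15 minutes' -Channel $channels)
$results | ForEach-Object { Write-Output "$($_.Channel): $($_.Succeeded)" }
```

In a real script the script blocks would wrap the actual senders, and cmdlets inside them would need `-ErrorAction Stop` so that non-terminating errors are also caught.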

Lessons Learned

The August 2025 Exchange Online outage reinforced several critical principles:

Platform Independence: Never rely solely on the platform you're monitoring for alerting
Silent Failures: The absence of NDRs doesn't mean everything is working
Queue Monitoring: Local queue buildup often indicates remote platform issues
Multi-Channel Alerts: Redundant notification paths increase reliability
Clear Definitions: Ambiguous monitoring parameters lead to misinterpreted results

Conclusion

Email infrastructure monitoring requires thinking beyond the obvious. When Exchange Online experiences issues, your monitoring visibility shouldn't disappear along with it. By implementing platform-independent alerting through Teams webhooks, you maintain situational awareness even when your primary email platform is compromised.

The goal isn't to predict every possible failure mode, but to ensure that when failures occur, you know about them through channels that remain operational. In the world of managed services like Exchange Online, the response is typically swift once issues are identified—but you can't manage what you can't see.
