Blogger XML : Download Images + Rewrite XML

If you use Blogger, any photos you add to your blog are uploaded to Google's user content servers (googleusercontent.com). This can be a problem if you want to move from Blogger to a different platform, because there is no Google Takeout option for this data.

In 2023 you still had the option of having these images included in Google Photos. Since the two have been separated, you have to go to your personal user content and download each of your files manually, which is a lot of administrative effort.

Download Images from XML file

What I would recommend instead is using a script to do this for you. All you need is your Blogger XML backup, which does not contain the images but does contain the links to them. From there you can use “wget” (in Windows PowerShell this is an alias for Invoke-WebRequest, which is what the script calls) to download the images one by one, based on what’s in the XML.

When I first got this working, it downloaded lots of unwanted files that had nothing but a GUID for a name, so I amended it to only include URLs with a valid image file extension, plus URLs that carry Google's image size parameters (such as =s1600).
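As a rough illustration of that filter, here is a minimal sketch. The helper name Test-IsImageUrl and the sample URLs are made up for this example; only the extension list comes from the script. It covers the extension check only, not the size-parameter check.

```powershell
# Hypothetical helper (not part of the original script): returns $true when a URL
# ends in a known image extension, optionally followed by a query string.
function Test-IsImageUrl {
    param ([string]$Url)
    # -match is case-insensitive by default, so .JPG and .jpg both pass
    return $Url -match '\.(jpg|jpeg|png|gif|bmp|webp|svg|tiff)($|\?)'
}

# Made-up sample URLs, for illustration only
$sampleUrls = @(
    'https://blogger.googleusercontent.com/img/a/photo.JPG'
    'https://blogger.googleusercontent.com/img/a/AVvXsEg1234567890'
    'https://1.bp.blogspot.com/-abc/s1600/diagram.png?imgmax=800'
)

# Keep only the URLs that look like image files
$imageUrls = @($sampleUrls | Where-Object { Test-IsImageUrl -Url $_ })
$imageUrls
```

The second sample URL has no extension, so it is the kind of link that would otherwise be saved as a GUID-style file.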

Script : BloggerXMLImageDownload.ps1

# Function to select the Blogger XML backup file
function Select-BloggerBackupFile {
    # Load Windows Forms for the file picker (LoadWithPartialName is obsolete)
    Add-Type -AssemblyName System.Windows.Forms
    $OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
    $OpenFileDialog.Filter = "XML Files (*.xml)|*.xml"
    $OpenFileDialog.Title = "Select Blogger XML Backup File"
    if ($OpenFileDialog.ShowDialog() -eq [System.Windows.Forms.DialogResult]::OK) {
        return $OpenFileDialog.FileName
    } else {
        Write-Error "No file selected. Exiting script."
        exit
    }
}
# Function to extract image URLs from the XML content
function Extract-ImageUrls {
    param (
        [string]$XmlContent
    )
    $imageUrls = @()
    # Match all Google image URLs with various patterns
    $patterns = @(
        "https://blogger\.googleusercontent\.com/img/[^""'\s()<>]+"
        "https://\d+\.bp\.blogspot\.com/[^""'\s()<>]+"
        "https://lh\d+\.googleusercontent\.com/[^""'\s()<>]+"
    )
    foreach ($pattern in $patterns) {
        # Do not name this $matches: that is an automatic variable which -match overwrites
        $urlMatches = [regex]::Matches($XmlContent, $pattern)
        foreach ($match in $urlMatches) {
            # Clean up the URL (remove trailing punctuation that might have been caught)
            $url = $match.Value -replace '[,;\.)]$', ''
            # Check if the URL appears to be an image file
            $isImageFile = $false
            # Method 1: Check if the URL has an image extension
            if ($url -match '\.(jpg|jpeg|png|gif|bmp|webp|svg|tiff)($|\?|&)') {
                $isImageFile = $true
            }
            # Method 2: Check if the URL has image size parameters (=s1600, =w640, =h480)
            elseif ($url -match '=s\d+' -or $url -match '=w\d+' -or $url -match '=h\d+') {
                $isImageFile = $true
            }
            if ($isImageFile) {
                $imageUrls += $url
            }
        }
    }
    # Return each URL once, in a stable order
    return $imageUrls | Sort-Object -Unique
}
# Function to download images from URLs
function Download-Images {
    param (
        [string[]]$ImageUrls,
        [string]$DownloadFolder
    )
    # Create the download folder if it doesn't exist
    if (-not (Test-Path -Path $DownloadFolder)) {
        New-Item -ItemType Directory -Path $DownloadFolder | Out-Null
    }
    $total = $ImageUrls.Count
    $current = 0
    foreach ($url in $ImageUrls) {
        $current++
        try {
            # Extract a reasonable filename from the URL (last path segment, no query string)
            $fileName = ($url -split '/' | Select-Object -Last 1) -replace '\?.*$', ''
            # If there is no image extension, fall back to a generated name
            if (-not ($fileName -match '\.(jpg|jpeg|png|gif|bmp|webp|svg|tiff)$')) {
                # Size-parameter URLs (=s, =w, =h) and other "=" style names are likely JPEGs
                if ($url -match '=[swh]\d+' -or $fileName -match '=') {
                    $fileName = "image_$current.jpg"
                } else {
                    # If no clear extension and no parameters, skip this file
                    Write-Warning "Skipping $url - can't determine file type"
                    continue
                }
            }
            $destinationPath = Join-Path -Path $DownloadFolder -ChildPath $fileName
            Write-Progress -Activity "Downloading Images" -Status "$current of $total" -PercentComplete (($current / $total) * 100)
            Write-Output "Downloading $url to $destinationPath"
            # -UseBasicParsing avoids the Internet Explorer dependency in Windows PowerShell 5.1
            Invoke-WebRequest -Uri $url -OutFile $destinationPath -TimeoutSec 30 -UseBasicParsing
        } catch {
            Write-Error "Failed to download $url. Error: $_"
        }
    }
    # Clear the progress bar once every URL has been processed
    Write-Progress -Activity "Downloading Images" -Completed
}
# Main script execution
$backupFilePath = Select-BloggerBackupFile
$xmlContent = Get-Content -Path $backupFilePath -Raw -Encoding UTF8
$imageUrls = Extract-ImageUrls -XmlContent $xmlContent
if ($imageUrls.Count -eq 0) {
    Write-Warning "No image URLs found in the backup file."
    exit
}
Write-Output "Found $($imageUrls.Count) unique image URLs."
$downloadFolder = Read-Host "Enter the folder path where images should be saved"
Download-Images -ImageUrls $imageUrls -DownloadFolder $downloadFolder
Write-Output "Download complete. Images saved to $downloadFolder"

The script reads your XML, asks which folder you’d like to save the files in, and then downloads all the images. This can take a while, depending on how many images there are.

Once the download has completed, you will have the images in a folder (for example, one called “Images”). However, the original XML will still point at Google's servers for the images.
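To see how many references still point at Google, something like the following sketch could be used. The function name Get-GoogleImageHostCount and the sample content are assumptions for illustration; the three host patterns are the ones the scripts in this post match.

```powershell
# Hypothetical helper: counts references to the Google image hosts in a block of text.
function Get-GoogleImageHostCount {
    param ([string]$XmlContent)
    $pattern = 'https://(blogger\.googleusercontent\.com|\d+\.bp\.blogspot\.com|lh\d+\.googleusercontent\.com)/'
    return [regex]::Matches($XmlContent, $pattern).Count
}

# Made-up sample content; a real backup would be read with Get-Content -Raw
$sample = @'
<content>&lt;img src="https://blogger.googleusercontent.com/img/a/one.jpg"/&gt;
&lt;img src="https://1.bp.blogspot.com/-x/s320/two.png"/&gt;
&lt;img src="https://example.com/images/three.gif"/&gt;</content>
'@

$count = Get-GoogleImageHostCount -XmlContent $sample
"Google-hosted references: $count"
```

Running the same count against the rewritten XML later is a quick way to spot any links that were missed.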

Re-Write XML (with a Backup) to the new "base URL"

The next problem to solve comes after you have uploaded the images to your new blog platform: you need to update the XML to point to this new location, rather than loading your images cross-site from Google.

I have therefore created another script that updates the base URL of your images. The file names stay exactly the same, but the website serving them and the path will be different. The XML rewrite is done locally, and the time it takes to run varies drastically depending on the content of your file.
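The mapping itself is simple. As a minimal sketch (Convert-ToNewImageUrl is a hypothetical name, and the URLs are examples only), the last path segment of the old URL is kept and everything before it is swapped for the new base:

```powershell
# Hypothetical helper: maps one Google-hosted image URL onto a new base URL.
function Convert-ToNewImageUrl {
    param (
        [string]$OriginalUrl,
        [string]$NewBaseUrl
    )
    # Make sure the base URL ends with exactly one slash
    $base = $NewBaseUrl.TrimEnd('/') + '/'
    # Keep the last path segment and drop any query string
    $fileName = ($OriginalUrl -split '/')[-1] -replace '\?.*$', ''
    return $base + $fileName
}

# Example values only
$old = 'https://blogger.googleusercontent.com/img/a/s1600/holiday.jpg'
$new = Convert-ToNewImageUrl -OriginalUrl $old -NewBaseUrl 'https://example.com/images'
$new
```

Because only the file name survives, two images with the same name in different Google paths would end up pointing at the same new URL.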

Script : ReWriteXMLFile.ps1

# Function to select the Blogger XML backup file
function Select-BloggerBackupFile {
    # Load Windows Forms for the file picker (LoadWithPartialName is obsolete)
    Add-Type -AssemblyName System.Windows.Forms
    $OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
    $OpenFileDialog.Filter = "XML Files (*.xml)|*.xml"
    $OpenFileDialog.Title = "Select Blogger XML Backup File"
    if ($OpenFileDialog.ShowDialog() -eq [System.Windows.Forms.DialogResult]::OK) {
        return $OpenFileDialog.FileName
    } else {
        Write-Error "No file selected. Exiting script."
        exit
    }
}
# Function to create backup of original file
function Create-BackupFile {
    param (
        [string]$FilePath
    )

    $directory = [System.IO.Path]::GetDirectoryName($FilePath)
    $filename = [System.IO.Path]::GetFileNameWithoutExtension($FilePath)
    $extension = [System.IO.Path]::GetExtension($FilePath)
    $backupFilePath = Join-Path -Path $directory -ChildPath "$filename-backup$extension"

    # Write-Host keeps the status message out of the function's return value
    Write-Host "Creating backup at: $backupFilePath"
    Copy-Item -Path $FilePath -Destination $backupFilePath -Force

    return $backupFilePath
}
# Function to replace image URLs in the XML content
function Replace-ImageUrls {
    param (
        [string]$XmlContent,
        [string]$NewBaseUrl
    )

    # Ensure the new base URL ends with a slash if not empty
    if ($NewBaseUrl -ne "" -and -not $NewBaseUrl.EndsWith('/')) {
        $NewBaseUrl = $NewBaseUrl + '/'
    }

    # Define patterns for Google image URLs
    $patterns = @(
        "https://blogger\.googleusercontent\.com/img/[^""'\s()<>]+"
        "https://\d+\.bp\.blogspot\.com/[^""'\s()<>]+"
        "https://lh\d+\.googleusercontent\.com/[^""'\s()<>]+"
    )

    # Counter for tracking replacements
    $totalReplacements = 0

    # Process each pattern
    foreach ($pattern in $patterns) {
        # Collect each distinct URL once; String.Replace below already rewrites every occurrence.
        # Do not name this $matches: that is an automatic variable which -match overwrites.
        $urls = [regex]::Matches($XmlContent, $pattern) |
            ForEach-Object { $_.Value } |
            Select-Object -Unique
        $replacementCount = 0

        # Process each distinct URL
        foreach ($originalUrl in $urls) {
            # Check if the URL appears to be an image file
            $isImageFile = $false

            # Method 1: Check if the URL has an image extension
            if ($originalUrl -match '\.(jpg|jpeg|png|gif|bmp|webp|svg|tiff)($|\?|&)') {
                $isImageFile = $true
            }
            # Method 2: Check if the URL has image size parameters (=s1600, =w640, =h480)
            elseif ($originalUrl -match '=[swh]\d+') {
                $isImageFile = $true
            }

            if ($isImageFile) {
                # Extract filename from the URL (last path segment, no query string)
                $fileName = ($originalUrl -split '/' | Select-Object -Last 1) -replace '\?.*$', ''

                # If there is no image extension, fall back to a generated name
                if (-not ($fileName -match '\.(jpg|jpeg|png|gif|bmp|webp|svg|tiff)$')) {
                    if ($originalUrl -match '=[swh]\d+' -or $fileName -match '=') {
                        # Note: this counter is independent of the download script's numbering,
                        # so image_N names are not guaranteed to line up with the downloaded files
                        $fileName = "image_$($totalReplacements + 1).jpg"
                    } else {
                        # Skip if not a recognized image URL
                        continue
                    }
                }

                # Replace the entire URL with new base URL + filename
                $newUrl = $NewBaseUrl + $fileName

                # Perform the replacement (all occurrences of this URL)
                $XmlContent = $XmlContent.Replace($originalUrl, $newUrl)
                $replacementCount++
                $totalReplacements++
            }
        }

        # Write-Host keeps status messages out of the returned XML content
        Write-Host "Replaced $replacementCount unique image URLs matching pattern: $pattern"
    }

    Write-Host "Total unique image URL replacements: $totalReplacements"
    return $XmlContent
}
# Function to save the modified XML content
function Save-ModifiedXml {
    param (
        [string]$XmlContent,
        [string]$OriginalFilePath
    )

    $directory = [System.IO.Path]::GetDirectoryName($OriginalFilePath)
    $filename = [System.IO.Path]::GetFileNameWithoutExtension($OriginalFilePath)
    $extension = [System.IO.Path]::GetExtension($OriginalFilePath)
    $newFilePath = Join-Path -Path $directory -ChildPath "$filename-modified$extension"

    # Save the modified content
    # Write-Host keeps the status message out of the function's return value
    Write-Host "Saving modified XML to: $newFilePath"
    $XmlContent | Out-File -FilePath $newFilePath -Encoding UTF8

    return $newFilePath
}
# Main script execution
$backupFilePath = Select-BloggerBackupFile
# Create backup first
$backupPath = Create-BackupFile -FilePath $backupFilePath
# Read the original content
$xmlContent = Get-Content -Path $backupFilePath -Raw -Encoding UTF8
# Get the new base URL from the user
Write-Output "Enter the new base URL for images (e.g., https://example.com/images/)"
$newBaseUrl = Read-Host "New base URL"
# Replace the image URLs (always preserving filenames)
$modifiedXmlContent = Replace-ImageUrls -XmlContent $xmlContent -NewBaseUrl $newBaseUrl
# Save the modified XML
$newFilePath = Save-ModifiedXml -XmlContent $modifiedXmlContent -OriginalFilePath $backupFilePath
Write-Output "Process complete!"
Write-Output "Original backup saved to: $backupPath"
Write-Output "Modified XML saved to: $newFilePath"
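
Before importing the modified file anywhere, it may be worth checking that it still parses as XML. The following is only a sketch: Test-XmlWellFormed is a hypothetical helper, and the two temporary files are created purely to demonstrate it. Pass a full path, since the .NET loader does not follow PowerShell's current location.

```powershell
# Hypothetical helper: returns $true if the file at $Path parses as XML.
function Test-XmlWellFormed {
    param ([string]$Path)
    try {
        $doc = New-Object System.Xml.XmlDocument
        $doc.Load($Path)
        return $true
    } catch {
        # Warnings go to the warning stream, so the return value stays a clean boolean
        Write-Warning "XML did not parse: $($_.Exception.Message)"
        return $false
    }
}

# Demonstration with two throwaway files in the temp folder
$good = Join-Path ([System.IO.Path]::GetTempPath()) 'blogger-check-good.xml'
$bad  = Join-Path ([System.IO.Path]::GetTempPath()) 'blogger-check-bad.xml'
Set-Content -Path $good -Value '<feed><entry>ok</entry></feed>' -Encoding UTF8
Set-Content -Path $bad  -Value '<feed><entry>broken</feed>' -Encoding UTF8

$goodResult = Test-XmlWellFormed -Path $good
$badResult  = Test-XmlWellFormed -Path $bad
Remove-Item -Path $good, $bad

"Good file parses: $goodResult"
"Bad file parses: $badResult"
```

In practice you would call it with the path printed by the rewrite script, for example Test-XmlWellFormed -Path $newFilePath.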
