Source Control, Observability, AI-Assisted Development, and Service Principal Hygiene
Automation that runs silently and fails silently is worse than no automation, because administrators assume it is working when it is not. This section covers the operational discipline required to keep automation workloads maintainable and observable over time: source control, CI/CD deployment, safe use of AI-assisted development, monitoring and alerting, and service principal lifecycle hygiene. These practices apply across all the platforms covered in Section 4.
Learning journey - the four operational excellence pillars covered in this section
Opening scenario - 18 days of silent failure on a CA exclusion runbook
Discussion prompt - how would you detect a silent automation failure today?
Source control as a security requirement - the with/without comparison
CI/CD pipeline - the five stages from code edit to deploy
Without source control, automation code exists only in the Azure portal or in a local file on someone's workstation. There is no change history, no peer review, no rollback capability, and no audit trail. Portal-edited runbooks are the automation equivalent of ungoverned infrastructure: they change without anyone knowing, they break without anyone understanding why, and they cannot be recovered to a known-good state.
Rule: treat the Azure portal as read-only for production automation code. All changes must go through a pull request in a Git repository.
Repository layout - what belongs (and what does not) in source control
Automation Accounts support native source control integration with GitHub and Azure DevOps. Configure sync to pull from a specific branch. The sync copies runbook files from the repository to the Automation Account automatically when changes are merged.
Important limitation: native source control sync has incomplete support for PowerShell 7.x runbooks. Teams using PS7 must deploy via a CI/CD pipeline instead.
Automation Account source control sync - native sync vs CI/CD pipeline
A CI/CD pipeline provides what native sync cannot: pre-deployment validation.
Basic GitHub Actions pipeline for runbook deployment:
name: Deploy Runbook
on:
  push:
    branches: [main]
    paths: ['runbooks/**']
permissions:
  id-token: write   # required for OIDC login to Azure
  contents: read
jobs:
  validate-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run PSScriptAnalyzer
        shell: pwsh
        run: |
          Install-Module PSScriptAnalyzer -Force -Scope CurrentUser
          $results = Invoke-ScriptAnalyzer -Path ./runbooks -Recurse -Severity Error,Warning
          if ($results) { $results; exit 1 }
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - name: Deploy runbook
        run: |
          # replace-content only updates the draft version of the runbook
          az automation runbook replace-content \
            --resource-group ${{ vars.RG_NAME }} \
            --automation-account-name ${{ vars.AA_NAME }} \
            --name MyRunbook \
            --content @./runbooks/MyRunbook.ps1
          # publish the draft so scheduled and triggered jobs run the new code
          az automation runbook publish \
            --resource-group ${{ vars.RG_NAME }} \
            --automation-account-name ${{ vars.AA_NAME }} \
            --name MyRunbook
This pipeline runs PSScriptAnalyzer before deploying. Any severity Error or Warning result fails the pipeline and blocks deployment.
CI/CD demo flow - violation blocks pipeline, fix unblocks deployment
Define Automation Accounts, Logic Apps, Function Apps, RBAC role assignments, and managed identities in Bicep. Store IaC in the same repository as the code. This enables consistent, repeatable deployments and prevents configuration drift between environments.
The four common weaknesses in AI-assisted automation code
When using GitHub Copilot, Claude, or other AI coding assistants to write automation:
Safe sandboxes for AI-assisted coding - review, sandbox, never paste production data
AI models tend to generate code with predictable security weaknesses. Review every AI-generated script for:
- Hardcoded secrets: $clientSecret = "..." patterns.
- Over-broad permission scopes: assistants tend to request a .ReadWrite.All scope because it avoids permission errors. Use the minimum scope required.
PSScriptAnalyzer is a static analysis tool for PowerShell. Run it on every runbook before deployment:
Install-Module PSScriptAnalyzer -Scope CurrentUser
# Analyze a single runbook
Invoke-ScriptAnalyzer -Path .\MyRunbook.ps1 -Severity Error, Warning
# Analyze a directory recursively
Invoke-ScriptAnalyzer -Path .\runbooks -Recurse -Severity Error, Warning
# Include specific security rules
Invoke-ScriptAnalyzer -Path .\MyRunbook.ps1 `
-IncludeRule PSAvoidUsingPlainTextForPassword, PSAvoidUsingConvertToSecureStringWithPlainText
PSScriptAnalyzer catches common security issues such as plain-text passwords, along with use of deprecated cmdlets, empty catch blocks that swallow errors, and code style violations that obscure intent.
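The hardcoded-secret and .ReadWrite.All weaknesses listed earlier can also be screened with a quick text check before human review. The following is a minimal Python sketch under stated assumptions, not a replacement for PSScriptAnalyzer or peer review: the two regular expressions and the review_findings helper are hypothetical names chosen for illustration, and the patterns will miss secrets that are assigned in other ways.

```python
import re

# Hypothetical patterns for illustration; tune them for your own codebase.
# Matches a PowerShell variable whose name suggests a credential and whose
# value is a quoted string literal, e.g. $clientSecret = "abc".
SECRET_ASSIGNMENT = re.compile(
    r"""\$\w*(secret|password|apikey|token)\w*\s*=\s*["'][^"']+["']""",
    re.IGNORECASE,
)
# Matches any Graph-style scope ending in .ReadWrite.All.
BROAD_SCOPE = re.compile(r"\b[A-Za-z]+\.ReadWrite\.All\b")


def review_findings(script_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, line) for each suspicious line."""
    findings = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if SECRET_ASSIGNMENT.search(stripped):
            findings.append((number, "hardcoded-secret", stripped))
        if BROAD_SCOPE.search(stripped):
            findings.append((number, "broad-scope", stripped))
    return findings
```

A value fetched at runtime, for example from an Automation variable or Key Vault, is not flagged because the right-hand side is not a string literal. Treat any finding as a prompt for a reviewer, not as a verdict.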
PSScriptAnalyzer pipeline - block on violation, deploy on clean run
Bandit is the Python equivalent of PSScriptAnalyzer. Run it on Python runbooks and Function App scripts:
pip install bandit
bandit -r ./scripts/ -ll # report medium and high severity
Bandit for Python - same CI gate pattern for Python runbooks
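To show what the same gate could look like when you want custom reporting, here is a minimal Python sketch that reads a Bandit JSON report (produced with bandit -r ./scripts/ -f json -o bandit-report.json) and decides whether to block. The function names and the BLOCKING_SEVERITIES set are assumptions for illustration. Bandit already exits with a non-zero code when it reports issues, so this is only useful when the pipeline needs its own output format or thresholds.

```python
import json

# Assumption: block on the same severities that the -ll flag reports.
BLOCKING_SEVERITIES = {"MEDIUM", "HIGH"}


def blocking_issues(report: dict) -> list[dict]:
    """Return the Bandit results whose severity should block deployment."""
    return [
        issue
        for issue in report.get("results", [])
        if issue.get("issue_severity", "").upper() in BLOCKING_SEVERITIES
    ]


def gate(report_json: str) -> int:
    """Return a process exit code: 1 blocks deployment, 0 allows it."""
    issues = blocking_issues(json.loads(report_json))
    for issue in issues:
        print(
            f"{issue.get('filename')}:{issue.get('line_number')} "
            f"[{issue.get('issue_severity')}] {issue.get('issue_text')}"
        )
    return 1 if issues else 0
```

In a pipeline step, the script would read the report file, pass its contents to gate, and exit with the returned code.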
Static analysis tools catch syntax and pattern issues. Peer review catches logic errors, incorrect permission scope choices, and missing security considerations that tools cannot detect. Require at least one reviewer for all automation code changes, even in small teams.
Monitoring Stack
Automation that runs silently and fails silently is operationally dangerous. An emergency access CA exclusion runbook that has been failing for 18 days looks fine from the outside - until a major incident occurs and the break-glass account does not have the expected policy exclusions. The failure was invisible because there were no alerts.
Monitoring must be configured on every automation workload before it is deployed to production.
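Runbook job history is visible in the portal under Process Automation > Jobs. Once diagnostic settings send job logs to Log Analytics, failures can be queried:
// Find failed runbook jobs in the last 24 hours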
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.AUTOMATION"
| where Category == "JobLogs"
| where ResultType == "Failed"
| where TimeGenerated > ago(24h)
| project TimeGenerated, RunbookName_s, ResultDescription_s
| order by TimeGenerated desc
Automation Account monitoring - diagnostic settings → Log Analytics → alert
Log Analytics KQL anatomy - each clause has a teaching purpose
Log Analytics portal view - failed job results with the error message field highlighted
Logic App run history is visible in the portal under the workflow view. Enable diagnostic settings to send run history to Log Analytics:
// Find failed Logic App runs
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.LOGIC"
| where Category == "WorkflowRuntime"
| where status_s == "Failed"
| where TimeGenerated > ago(24h)
| project TimeGenerated, resource_runId_s, code_s, error_message_s
| order by TimeGenerated desc
Function App execution telemetry is captured in Application Insights. Query for failures:
// Application Insights - failed function executions
requests
| where success == false
| where timestamp > ago(24h)
| project timestamp, name, resultCode, duration, operation_Id
| order by timestamp desc
Enable Application Insights on every security automation Function App. Without it, failures are not queryable.
Application Insights for Function Apps - four telemetry pillars
Configure Azure Monitor alerts so that automation failures create visible signals rather than silent voids.
- Go to Monitor > Alerts > Create alert rule.
- Under Condition, select the Total Jobs metric, dimension Status = Failed, threshold > 0.
- Under Actions, configure an action group that sends an email or Teams notification.
- Under Details, name the alert and set severity.
Alternatively, create a Log Analytics alert rule that queries job logs:
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.AUTOMATION"
| where Category == "JobLogs"
| where ResultType == "Failed"
Set this as a scheduled query alert that evaluates every 5 minutes and fires when the query returns more than 0 results.
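The evaluation logic of such a rule is simple enough to sketch in a few lines. The Python below is an illustration of the idea only, not how Azure Monitor is implemented: it filters records the way the KQL query does and fires when the count exceeds the threshold. The function names are hypothetical, and the records are plain dictionaries that reuse the column names from the query.

```python
from datetime import datetime, timedelta, timezone


def failed_jobs(records: list[dict], now: datetime, window: timedelta) -> list[dict]:
    """Return records with ResultType 'Failed' inside the lookback window."""
    start = now - window
    return [
        r for r in records
        if r["ResultType"] == "Failed" and start < r["TimeGenerated"] <= now
    ]


def should_fire(
    records: list[dict],
    now: datetime,
    window: timedelta = timedelta(minutes=5),
    threshold: int = 0,
) -> bool:
    """Fire when the number of failed jobs in the window exceeds the threshold."""
    return len(failed_jobs(records, now, window)) > threshold
```

With a threshold of 0, a single failed job inside the window is enough to fire, which is the behavior you want for security automation where any failure deserves attention.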
Sentinel playbooks - the testing gap that makes them silently broken
Azure Monitor alert rule anatomy - scope, condition, action group, severity
Wire failure events to your incident management workflow:
A Sentinel playbook that has never been tested with a synthetic incident may have been silently broken for months. Running a playbook only on real incidents means the first time you discover it is broken is during an actual security event.
SP naming convention - one display name answers owner, purpose, environment
Owner field plus Notes field - making every SP self-documenting
Without a naming standard, SP inventories become unmanageable at scale. When a service principal named App1 or test_new shows up in sign-in logs, there is no way to determine ownership, purpose, or environment from the name alone. Orphan detection and lifecycle management depend on being able to identify what an SP is for from its display name alone.
Use the pattern [team]-[purpose]-[env]:
- secops-signinmonitor-prod
- devteam-deployagent-staging
- itops-caexclusion-prod
This encodes team ownership, automation purpose, and target environment into every display name. Combined with the Notes field and owner assignment, it makes inventory and lifecycle management tractable at scale.
The app registration Notes field is queryable via Graph API and can be included in automated inventory reports. Use it to record:
Every app registration must have at least one owner assigned in the Entra portal. Ownerless registrations should be flagged by automated hygiene checks. An SP whose creator left the organization becomes an orphan with no one responsible for it.
SP Hygiene
Credential expiry query - sample output highlighting bad names and 7-day-out secrets
Set expiry on all secrets and certificates. Build automation to alert on credentials expiring within 30, 60, and 90 days:
# Find client secrets expiring in the next 30 days
Connect-MgGraph -Scopes "Application.Read.All"
$now = (Get-Date).ToUniversalTime()
$cutoff = $now.AddDays(30)
Get-MgApplication -All | ForEach-Object {
    $app = $_
    # PasswordCredentials holds client secrets; check KeyCredentials the same way for certificates
    $app.PasswordCredentials | Where-Object {
        $_.EndDateTime -lt $cutoff -and $_.EndDateTime -gt $now
    } | ForEach-Object {
        [PSCustomObject]@{
            DisplayName = $app.DisplayName
            AppId       = $app.AppId
            SecretHint  = $_.Hint
            ExpiresOn   = $_.EndDateTime
        }
    }
} | Sort-Object ExpiresOn
Key Vault emits Event Grid events when certificates approach expiry. Use these events to trigger Logic Apps or Automation Account runbooks that alert or initiate rotation.
Enterprise apps with no recent sign-in activity are candidates for review and possible deletion. Use sign-in logs as a first-pass heuristic by checking whether the app ID has appeared in the last 90 days:
Import-Module Microsoft.Graph.Applications
Import-Module Microsoft.Graph.Reports
Connect-MgGraph -Scopes "Application.Read.All", "AuditLog.Read.All"
# Enterprise apps with no sign-in activity in the past 90 days
$cutoff = (Get-Date).AddDays(-90)
$cutoffString = $cutoff.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ")
Get-MgServicePrincipal -All | ForEach-Object {
    $sp = $_
    $signIns = Get-MgAuditLogSignIn `
        -Filter "appId eq '$($sp.AppId)' and createdDateTime ge $cutoffString" `
        -Top 1
    if (-not $signIns) {
        [PSCustomObject]@{
            DisplayName = $sp.DisplayName
            AppId       = $sp.AppId
            ObjectId    = $sp.Id
        }
    }
} | Format-Table -AutoSize
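This query requires both Application.Read.All and AuditLog.Read.All, plus the Microsoft.Graph.Applications and Microsoft.Graph.Reports modules if you are using Graph PowerShell. Entra ID retains sign-in logs for at most 30 days (7 days on the Free tier), so a 90-day lookback is only meaningful against logs exported to longer-term storage such as a Log Analytics workspace. Treat the result as a review list, not an automatic delete list, because appId-based sign-in activity is a heuristic rather than a complete ownership signal.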
Use Entra application authentication method policies to enforce maximum credential lifetimes. Restrict secrets to a maximum of 180 days (or less, depending on your policy). This prevents indefinitely-lived secrets from accumulating.
An orphaned SP is one with no current owner assigned and no recent sign-in activity. Build a query that surfaces ownerless registrations:
# Find app registrations with no owner
Get-MgApplication -All | Where-Object {
-not (Get-MgApplicationOwner -ApplicationId $_.Id)
} | Select-Object DisplayName, AppId, CreatedDateTime
Combine this with the 90-day activity heuristic to produce an orphan report: registrations with no owner and no recent sign-in activity.
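Combining the two signals gives four possible outcomes per service principal. The Python sketch below illustrates one way to label them; the classify_sp function and the outcome labels are hypothetical, and the inputs are assumed to come from the ownerless and inactivity queries shown earlier.

```python
def classify_sp(has_owner: bool, recently_active: bool) -> str:
    """Label a service principal from its ownership and activity signals."""
    if has_owner and recently_active:
        return "healthy"
    if has_owner:
        return "review-inactive"    # owner exists: ask them whether it is still needed
    if recently_active:
        return "assign-owner"       # in use but nobody is accountable for it
    return "orphan-candidate"       # no owner and no activity: review for removal


def orphan_candidates(inventory: list[dict]) -> list[str]:
    """Return display names of entries with no owner and no recent activity."""
    return sorted(
        sp["display_name"]
        for sp in inventory
        if classify_sp(sp["has_owner"], sp["recently_active"]) == "orphan-candidate"
    )
```

Only the orphan-candidate group is a deletion review list; the assign-owner group is arguably the more urgent finding, because those identities are active with no one responsible for them.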
Graph API orphan detection - three decisions, four outcomes
Have sample runbook code, analyzer output, KQL examples, and Graph credential reports ready. Log Analytics and alert rules may not show results immediately; use the saved output if the live signal is delayed. Clean up alert rules, test failures, and sample service principals created during the lab.
Configure source control sync for an Automation Account, lint a runbook with PSScriptAnalyzer, inventory expiring Entra application credentials, send Automation diagnostics to Log Analytics, and create a Monitor alert rule from a Log Analytics query.
- In the Automation Account, open Source control and add a new connection.
- Select the repository, the main branch, and the /runbooks folder.
Az CLI:
Create the source control connection and trigger a sync job
$resourceGroupName = "<resource-group>"
$automationAccountName = "<automation-account>"
$repoUrl = "https://github.com/<github-owner>/<repo-name>.git"
$githubToken = gh auth token

# Source control sync requires the Automation Account managed identity and a Contributor assignment on the Automation Account itself.
$automationId = az automation account show --automation-account-name $automationAccountName --resource-group $resourceGroupName --query id -o tsv
$automationPrincipalId = az automation account show --automation-account-name $automationAccountName --resource-group $resourceGroupName --query identity.principalId -o tsv

az role assignment create `
    --assignee-object-id $automationPrincipalId `
    --assignee-principal-type ServicePrincipal `
    --role Contributor `
    --scope $automationId | Out-Null

az automation source-control create `
    --resource-group $resourceGroupName `
    --automation-account-name $automationAccountName `
    --name github-runbooks `
    --repo-url $repoUrl `
    --branch main `
    --source-type GitHub `
    --folder-path /runbooks `
    --access-token $githubToken `
    --token-type PersonalAccessToken `
    --auto-sync false `
    --publish-runbook true | Out-Null

# In PowerShell, preserve the required empty string for commit-id with the stop-parsing operator.
az --% automation source-control sync-job create --resource-group <resource-group> --automation-account-name <automation-account> --source-control-name github-runbooks --job-id 11111111-1111-1111-1111-111111111111 --commit-id ""

az automation runbook list `
    --automation-account-name $automationAccountName `
    --resource-group $resourceGroupName `
    --query "[].{name:name,state:state}" `
    -o table
Expected outcomes:
- The sync job reaches Succeeded
- The source-controlled runbook appears in the Automation Account
- The imported runbook shows as Published
Native PowerShell:
Create a sample runbook and lint it
@'
function Invoke-LegacyLogin {
    param([string]$Password)
    Write-Host "Using password input"
}
Invoke-LegacyLogin -Password "hardcoded-password-value-here"
$password = ConvertTo-SecureString "plaintext" -AsPlainText -Force
'@ | Set-Content -Path .\Sample-Runbook.ps1 -Encoding utf8

Install-Module PSScriptAnalyzer -Scope CurrentUser -Force
Invoke-ScriptAnalyzer -Path .\Sample-Runbook.ps1 -Severity Error, Warning
Invoke-ScriptAnalyzer -Path .\Sample-Runbook.ps1 `
    -IncludeRule PSAvoidUsingPlainTextForPassword,PSAvoidUsingConvertToSecureStringWithPlainText,PSAvoidUsingWriteHost
Expected outcomes:
- PSScriptAnalyzer flags the plain-text password rules
- Write-Host is called out as a lint issue in the sample file
- Re-running after a fix removes the matching rule from the results
Native PowerShell:
Use a Graph token plus raw REST
$cutoff = (Get-Date).AddDays(30)
$token = az account get-access-token --resource-type ms-graph --query accessToken -o tsv
$headers = @{ Authorization = "Bearer $token" }
$uri = "https://graph.microsoft.com/v1.0/applications?`$select=displayName,appId,passwordCredentials&`$top=999"
$results = @()
do {
    $response = Invoke-RestMethod -Method GET -Uri $uri -Headers $headers
    $results += $response.value
    $uri = $response.'@odata.nextLink'
} while ($uri)

$results | ForEach-Object {
    $app = $_
    $app.passwordCredentials | Where-Object {
        [datetime]$_.endDateTime -lt $cutoff -and [datetime]$_.endDateTime -gt (Get-Date)
    } | ForEach-Object {
        [PSCustomObject]@{
            DisplayName = $app.displayName
            AppId       = $app.appId
            ExpiresOn   = $_.endDateTime
            DaysLeft    = [int]([datetime]$_.endDateTime - (Get-Date)).TotalDays
        }
    }
} | Sort-Object DaysLeft | Format-Table -AutoSize
Graph PowerShell:
Use Invoke-MgGraphRequest for the same inventory
Connect-MgGraph -Scopes "Application.Read.All"
$cutoff = (Get-Date).AddDays(30)
$uri = "https://graph.microsoft.com/v1.0/applications?`$select=displayName,appId,passwordCredentials&`$top=999"
$results = @()
do {
    $response = Invoke-MgGraphRequest -Method GET -Uri $uri
    $results += $response.value
    $uri = $response.'@odata.nextLink'
} while ($uri)

$results | ForEach-Object {
    $app = $_
    $app.passwordCredentials | Where-Object {
        [datetime]$_.endDateTime -lt $cutoff -and [datetime]$_.endDateTime -gt (Get-Date)
    } | ForEach-Object {
        [PSCustomObject]@{
            DisplayName = $app.displayName
            AppId       = $app.appId
            ExpiresOn   = $_.endDateTime
            DaysLeft    = [int]([datetime]$_.endDateTime - (Get-Date)).TotalDays
        }
    }
} | Sort-Object DaysLeft | Format-Table -AutoSize
Az CLI:
Stay in CLI but keep the same pagination logic
$cutoff = (Get-Date).AddDays(30)
$uri = "https://graph.microsoft.com/v1.0/applications?`$select=displayName,appId,passwordCredentials&`$top=999"
$results = @()
do {
    $page = az rest --method get --uri $uri | ConvertFrom-Json
    $results += $page.value
    $uri = $page.'@odata.nextLink'
} while ($uri)

$results | ForEach-Object {
    $app = $_
    $app.passwordCredentials | Where-Object {
        [datetime]$_.endDateTime -lt $cutoff -and [datetime]$_.endDateTime -gt (Get-Date)
    } | ForEach-Object {
        [PSCustomObject]@{
            DisplayName = $app.displayName
            AppId       = $app.appId
            ExpiresOn   = $_.endDateTime
            DaysLeft    = [int]([datetime]$_.endDateTime - (Get-Date)).TotalDays
        }
    }
} | Sort-Object DaysLeft | Format-Table -AutoSize
Expected outcomes:
- The query returns app registrations with credentials expiring inside the chosen window
- The output includes display name, app ID, expiry, and days remaining
- In the Automation Account, open Diagnostic settings and add a setting.
- Select the JobLogs, JobStreams, and AuditEvent categories.
Az CLI:
Create the diagnostic setting
$automationId = az automation account show `
    --automation-account-name "<automation-account>" `
    --resource-group "<resource-group>" `
    --query id `
    -o tsv

$workspaceId = az monitor log-analytics workspace show `
    --resource-group "<workspace-resource-group>" `
    --workspace-name "<workspace-name>" `
    --query id `
    -o tsv

az monitor diagnostic-settings create `
    --name send-to-law `
    --resource $automationId `
    --workspace $workspaceId `
    --logs '[{"category":"JobLogs","enabled":true},{"category":"JobStreams","enabled":true},{"category":"AuditEvent","enabled":true}]' `
    --metrics '[{"category":"AllMetrics","enabled":true}]'
Az CLI:
Query the workspace
$workspaceCustomerId = az monitor log-analytics workspace show `
    --resource-group "<workspace-resource-group>" `
    --workspace-name "<workspace-name>" `
    --query customerId `
    -o tsv

az monitor log-analytics query `
    --workspace $workspaceCustomerId `
    --analytics-query 'AzureDiagnostics | where ResourceProvider == "MICROSOFT.AUTOMATION" | where Category in ("JobLogs", "JobStreams", "AuditEvent") | where TimeGenerated > ago(1h) | project TimeGenerated, RunbookName_s, Category, ResultType, ResultDescription_s | order by TimeGenerated desc'
Expected outcomes:
- The diagnostic setting is created successfully
- After the first post-enable runbook job, the Automation records land in Log Analytics
- You can pivot on Category, RunbookName_s, and ResultType
- In the portal, go to Monitor > Alerts > Create.
Az CLI:
Create a disabled scheduled-query alert rule first
$workspaceId = az monitor log-analytics workspace show `
    --resource-group "<workspace-resource-group>" `
    --workspace-name "<workspace-name>" `
    --query id `
    -o tsv

az monitor scheduled-query create `
    --resource-group "<resource-group>" `
    --name "Lab5-RunbookAlert" `
    --scopes $workspaceId `
    --condition "count 'AutomationJobs' > 0" `
    --condition-query AutomationJobs="AzureDiagnostics | where ResourceProvider == 'MICROSOFT.AUTOMATION' | where Category == 'JobLogs' | where ResultType == 'Failed' | where TimeGenerated > ago(15m)" `
    --evaluation-frequency 15m `
    --window-size 15m `
    --severity 3 `
    --disabled true
Expected outcomes:
- The rule is created successfully
- You can inspect the condition safely before enabling it
- Once diagnostics are flowing and the query returns the expected schema, you can remove --disabled true
az automation source-control sync-job create requires the stop-parsing operator so the empty --commit-id "" survives the shell.
Section takeaways - five operational principles that compound together
Section 5 to Section 6 - practices carry forward into solution packaging
Volatile platform behavior and dated claims in this section were checked against these current sources on April 26, 2026:
All links below were reviewed on 2026-03-10.
Section 5 - Maintenance and Management - Reference Guide