Add TSG: Secure Boot (AzStackHci_Hardware_Test_Secure_Boot)#304
Add TSG: Secure Boot (AzStackHci_Hardware_Test_Secure_Boot)#3041008covingtonlane wants to merge 3 commits into
Conversation
New Environment Validator troubleshooting guide for the Hardware Secure Boot check (Test-SecureBoot, which runs Confirm-SecureBootUEFI). Covers where the failure appears (portal, on-box validator, and the AzStackHciEnvironmentChecker event log), how to enable UEFI Secure Boot in firmware, and a BitLocker precaution: suspend BitLocker before the firmware change and resume after, since Azure Local enables data-at-rest encryption by default and a Secure Boot change is measured into TPM PCR 7. Indexed in the EnvironmentValidator README. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a new Environment Validator troubleshooting guide (TSG) for the Azure Local Secure Boot hardware validation check (AzStackHci_Hardware_Test_Secure_Boot, aggregated as AzStackHci_Hardware_SecureBoot), documenting where the failure surfaces, how to remediate it in firmware, and the BitLocker precaution/workflow around firmware changes.
Changes:
- Introduces a new Secure Boot TSG covering portal/on-box symptoms, validation commands, remediation steps, and escalation guidance.
- Adds the new TSG to the EnvironmentValidator index README.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Secure-Boot.md |
New TSG detailing diagnosis/remediation for the Secure Boot validator and BitLocker precautions. |
TSG/EnvironmentValidator/README.md |
Adds an index entry linking to the new Secure Boot TSG. |
| Get-WinEvent -LogName AzStackHciEnvironmentChecker -FilterXPath '*[System[(EventID=17205)]]' -MaxEvents 2000 | | ||
| Where-Object { $_.Message -match 'AzStackHci_Hardware_Test_Secure_Boot' } | | ||
| Select-Object -First 1 -ExpandProperty Message |
…yed-node path Address PR review: the realistic BitLocker case is an already-deployed, encrypted cluster member, so enabling Secure Boot reboots a live node. Add a drain-first step (Suspend-ClusterNode -Drain with cluster-health and quorum pre-checks, one node at a time) and a resume-node step (Resume-ClusterNode, wait for storage resync) around the existing BitLocker suspend/resume flow. Reconcile the Overview scope line to acknowledge the deployed-node path, and cite Suspend-ClusterNode and Resume-ClusterNode. The BitLocker suspend/resume content is unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1008covingtonlane
left a comment
There was a problem hiding this comment.
Re-reviewed at 280334a.
The deployed-node path is now covered. Previously the TSG was scoped purely pre-deployment, but the BitLocker section addressed already-encrypted machines, which are deployed cluster members; rebooting one into firmware to change Secure Boot needed cluster-safety steps that were missing.
This revision adds them:
- Step 1 drains the node first (health pre-checks on
Get-ClusterNode/Get-VirtualDisk/Get-StorageJob, thenSuspend-ClusterNode -Drain), one node at a time. - Step 6 resumes the node and waits for storage resync before the next node.
- The Overview now reconciles the scope (primarily a pre-deployment gate, with the deployed-member drain path called out).
Verified during this pass:
- Step renumbering and every cross-reference are consistent (BitLocker resume references step 2, node resume references step 1, "repeat steps 1 through 6").
- The
Suspend-ClusterNode/Resume-ClusterNodereference links resolve, and only the TSG file changed. - The BitLocker guidance itself is correct:
Suspend-BitLocker -RebootCount 0holds across the firmware change and reboot, and resume reseals to the new measured-boot (PCR 7) state. The UEFI-mode / GPT prerequisite is handled accurately.
No remaining findings. Ready for maintainer review.
…rimary, drain gated) Apply the same framing the TPM Version TSG (Azure#305) landed and that PR Azure#170 captured into the harness skill. Test-SecureBoot runs in the Hardware validator (Deployment and Add Node) and the readiness/bootstrap set, with no upgrade-renamed variant, so the machine it flags is a host being validated to become a node, not a deployed member. - BitLocker check/suspend is now the primary step 1, because a host being vetted may have been recycled from a prior project with BitLocker already enabled; a Secure Boot change (TPM PCR 7) would trip an encrypted volume into recovery regardless of deployment state. - The cluster-drain/quorum steps move into a gated 'If the machine is already a deployed, encrypted cluster member' section instead of being front-loaded as step 1. - Steps are now: 1 check/suspend BitLocker, 2 enable Secure Boot, 3 confirm, 4 resume BitLocker, plus the gated deployed-member drain section. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds a public Environment Validator troubleshooting guide for the Hardware Secure Boot check (
AzStackHci_Hardware_Test_Secure_Boot, aggregatedAzStackHci_Hardware_SecureBoot). The check runsTest-SecureBoot/Confirm-SecureBootUEFIand is Critical (it blocks deployment until UEFI Secure Boot is enabled on the machine).What the TSG covers
Invoke-AzStackHciHardwareValidation -Include Test-SecureBoot), and theAzStackHciEnvironmentCheckerevent log (Event ID 17205), with the real failure detail.Suspend-BitLocker -RebootCount 0before the change andResume-BitLockerafter.Validation
The check was validated end to end on a live Azure Local (masonenode) lab cluster: injecting Secure Boot OFF made the validator return FAILURE, and enabling Secure Boot returned it to SUCCESS.
Indexed in the EnvironmentValidator README.