From 1a9c94411b674e30dc115d59634f11c52f4589aa Mon Sep 17 00:00:00 2001
From: 1008covingtonlane <42551186+1008covingtonlane@users.noreply.github.com>
Date: Fri, 26 Jun 2026 13:16:31 -0400
Subject: [PATCH 1/3] Add Environment Validator TSG: TPM Version
(AzStackHci_Hardware_Test_Tpm_Version)
Add a customer-facing TSG for the Hardware TPM Version environment check
(Test-TpmVersion), which fails when a present TPM reports a specification
version other than 2.0. Covers where the failure surfaces (portal validation,
the single on-box validator, and the AzStackHciEnvironmentChecker event log,
Event ID 17205), a source-accurate result example, and a firmware remediation
path that accounts for the platform-specific reality of TPM version changes:
the switch clears the module, can be limited or one-way or impossible by vendor,
and must be paired with suspending BitLocker and draining a deployed cluster
member so the firmware reboot does not strand the node. Also notes the
Test-TpmProperties companion check for TPM presence/enablement, and links the
TPM 2.0, Get-Tpm, BitLocker, and cluster-node-maintenance docs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
TSG/EnvironmentValidator/README.md | 1 +
...oubleshooting-Hardware-Test-Tpm-Version.md | 329 ++++++++++++++++++
2 files changed, 330 insertions(+)
create mode 100644 TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
diff --git a/TSG/EnvironmentValidator/README.md b/TSG/EnvironmentValidator/README.md
index 11611c19..ca84bf9c 100644
--- a/TSG/EnvironmentValidator/README.md
+++ b/TSG/EnvironmentValidator/README.md
@@ -6,6 +6,7 @@ This folder contains the TSG's related to Environment Validators.
* [Troubleshooting Test NetAdapter API Failure](./Troubleshooting-Test-NetAdapter-API.md)
* [Troubleshooting Test PhysicalDisk API Failure](./Troubleshooting-Test-PhysicalDisk-API.md)
* [Troubleshooting Test System Drive Free Space](./Troubleshooting-Test-SystemDrive-Free-Space.md)
+* [Troubleshooting TPM Version (Hardware Test TPM Version)](./Troubleshooting-Hardware-Test-Tpm-Version.md)
* [Troubleshooting TestPowerShell Module Version](./Troubleshooting-Test-PowerShell-Module-Version.md)
* [Troubleshooting Module Versions](Troubleshooting-Module-Versions.md)
* [Troubleshooting MSI Does Not Have Access to Subscription](Troubleshooting-MSI-Does-Not-Have-Access-To-Subscription.md)
diff --git a/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md b/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
new file mode 100644
index 00000000..02f62afd
--- /dev/null
+++ b/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
@@ -0,0 +1,329 @@
+# AzStackHci_Hardware_Test_Tpm_Version
+
+| Field | Value |
+| --- | --- |
+| Name | `AzStackHci_Hardware_Test_Tpm_Version` (aggregated as `AzStackHci_Hardware_TpmVersion`) |
+| Display name | TPM Version |
+| Validator / test | `Test-TpmVersion` (run with `Invoke-AzStackHciHardwareValidation`) |
+| Component | Hardware (Environment Validator / Environment Checker) |
+| Severity | Critical: this validator blocks deployment until the machine's TPM reports specification version 2.0. |
+| Requirement | Each machine must have a TPM that reports specification version 2.0 (TPM 2.0) before deployment. |
+| Applicable Scenarios | Deployment, Add Node, and Upgrade (pre-deployment / readiness validation). |
+| Affected Versions | Azure Local, version 23H2 and later. |
+
+## Overview
+
+This validator checks that each Azure Local machine has a **Trusted Platform Module (TPM)
+that reports specification version 2.0**. TPM 2.0 is part of the Azure Local hardware
+security baseline: it is the hardware root of trust that backs measured boot, BitLocker
+key protection, and the platform's attestation and secured-core features. The check fails
+when a TPM is present but reports a specification version other than 2.0 (for example a
+module in TPM 1.2 mode).
+
+It runs by reading the `Win32_Tpm` instance from each machine
+(`Get-CimInstance -Namespace root/cimv2/Security/MicrosoftTpm -ClassName Win32_Tpm`) and
+comparing the **first segment of the reported `SpecVersion`** to `2.0`. A machine whose
+TPM reports `2.0` is a **SUCCESS**; a machine whose TPM reports a different version (such
+as `1.2`) is a **FAILURE**.
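+
+As a minimal sketch of that comparison, the first-segment rule can be expressed as below.
+The function name `Test-SpecVersionIsTpm20` is hypothetical and this is not the validator's
+own code; it only illustrates the rule described above, and the sample strings are
+illustrative values.
+
+```powershell
+function Test-SpecVersionIsTpm20 {
+    param(
+        # The raw Win32_Tpm.SpecVersion string, for example "2.0, 0, 1.59".
+        [Parameter(Mandatory)][string]$SpecVersion
+    )
+    # Take the first comma-separated segment and compare it to '2.0' as a string.
+    $first = ($SpecVersion -split ',')[0].Trim()
+    return ($first -eq '2.0')
+}
+
+Test-SpecVersionIsTpm20 -SpecVersion '2.0, 0, 1.59'   # True
+Test-SpecVersionIsTpm20 -SpecVersion '1.2, 2, 3'      # False
+```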
+
+> **Important coverage note.** This check evaluates the TPM **version** only. If no TPM is
+> present at all, `Win32_Tpm` returns nothing and this specific check does not raise a
+> failure. Whether the TPM is **present and enabled** is covered by the companion check
+> `AzStackHci_Hardware_TpmProperties` (`Test-TpmProperties`), which fails when a TPM is
+> missing or disabled. If you are investigating a TPM problem, check both.
+
+While this check is failing, deployment is blocked at the Hardware validation stage and
+the machine cannot proceed. Unlike a software setting, the fix is a **firmware and
+hardware** change that is specific to your server model, and on some platforms it is
+limited, irreversible, or not possible at all (see [How to fix it](#how-to-fix-it)).
+
+## Where this failure appears
+
+You can see this failure in two places, the Azure portal and the machine itself. Both
+show the same underlying result.
+
+### In the Azure portal
+
+This check runs during the deployment validation step. When you deploy Azure Local from
+the portal (or with a deployment template), the **Validation** phase runs the
+environment checks and lists any that fail:
+
+1. Open the Azure Local deployment for your cluster and go to its **Validation**
+ results (the deployment surfaces these before it proceeds to apply).
+2. In the list of checks, this one appears under its display name, **TPM Version**, with
+ a **Critical** severity.
+3. Select the failing check to see the per-machine detail, which names the machine whose
+ TPM is not reporting version 2.0.
+
+### On the machine
+
+Two on-box sources carry the result.
+
+**Run the single validator (fastest).** The Environment Checker module ships on every
+Azure Local machine, so you can run this one Hardware check directly and read the result
+in a few seconds. Use `-Include Test-TpmVersion` to run only this check, so you do not
+have to run the full Hardware validation suite:
+
+```powershell
+$r = Invoke-AzStackHciHardwareValidation -Include Test-TpmVersion -PassThru
+$r | Select-Object Name, Status, Severity
+$r.AdditionalData.Detail
+```
+
+You can also read the underlying values directly:
+
+```powershell
+# Presence / enabled state (covered by Test-TpmProperties, shown here for context).
+Get-Tpm | Select-Object TpmPresent, TpmReady, TpmEnabled
+
+# The version this check evaluates. It compares the FIRST comma-separated segment
+# of SpecVersion to '2.0' (for example "2.0, 0, 1.59" passes; "1.2, ..." fails).
+(Get-CimInstance -Namespace 'root/cimv2/Security/MicrosoftTpm' -ClassName Win32_Tpm).SpecVersion
+```
+
+A machine whose TPM reports a non-2.0 version returns `Status` of `FAILURE` and a detail
+line of the form:
+
+```
+Machine: AzL-Node-01, Class: Tpm, Manufacturer ID: 1314145024 Tpm version is 1.2. Expected 2.0
+```
+
+**Event log (per machine).** The Environment Checker writes every check result to the
+**AzStackHciEnvironmentChecker** event log, located at
+`C:\Windows\System32\winevt\Logs\AzStackHciEnvironmentChecker.evtx`. Each result is the
+JSON body of an **Event ID 17205** entry. To read this check's most recent result on a
+machine:
+
+```powershell
+Get-WinEvent -LogName AzStackHciEnvironmentChecker -FilterXPath '*[System[(EventID=17205)]]' -MaxEvents 2000 |
+ Where-Object { $_.Message -match 'AzStackHci_Hardware_Test_Tpm_Version' } |
+ Select-Object -First 1 -ExpandProperty Message
+```
+
+In both sources the result for this check looks like this:
+
+```json
+{
+ "Name": "AzStackHci_Hardware_Test_Tpm_Version",
+ "Title": "Test TPM Version",
+ "DisplayName": "Test TPM Version AzL-Node-01",
+ "Severity": "Critical",
+ "Status": "FAILURE",
+ "Description": "Checking TPM for desired version (2.0)",
+ "TargetResourceName": "Machine: AzL-Node-01, Class: Tpm, Manufacturer ID: 1314145024",
+ "Remediation": "https://aka.ms/hci-envch",
+ "AdditionalData": {
+ "Source": "Version",
+ "Resource": "1.2",
+ "Detail": "Machine: AzL-Node-01, Class: Tpm, Manufacturer ID: 1314145024 Tpm version is 1.2. Expected 2.0",
+ "Status": "FAILURE"
+ }
+}
+```
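+
+Because the result is JSON, it can be parsed rather than read by eye. The sketch below uses
+a trimmed, hand-copied sample of the result above instead of a live event, so the `$json`
+content and the output property names (`Check`, `ReportedVersion`) are assumptions for
+illustration only.
+
+```powershell
+# Trimmed sample of the result body shown above (not read from a live event log).
+$json = @'
+{
+  "Name": "AzStackHci_Hardware_Test_Tpm_Version",
+  "Status": "FAILURE",
+  "AdditionalData": {
+    "Source": "Version",
+    "Resource": "1.2",
+    "Detail": "Machine: AzL-Node-01, Class: Tpm, Manufacturer ID: 1314145024 Tpm version is 1.2. Expected 2.0"
+  }
+}
+'@
+
+$result = $json | ConvertFrom-Json
+
+# Keep only the fields needed to triage: which check, its status, and the reported version.
+$summary = [pscustomobject]@{
+    Check           = $result.Name
+    Status          = $result.Status
+    ReportedVersion = $result.AdditionalData.Resource
+}
+$summary
+```
+
+On a machine, the same `ConvertFrom-Json` call could be fed the `Message` string returned by
+the `Get-WinEvent` command above, assuming the message body is the bare JSON document.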
+
+## How to fix it
+
+The TPM specification version is a firmware and hardware property, so the fix is made in
+the machine's firmware setup (or with the vendor's management tooling), not from Windows.
+**Before you change anything, read the warnings in this section.** Unlike most validator
+fixes, changing a TPM's version is platform-specific and has serious side effects:
+
+- **Switching a TPM clears it.** Moving a TPM between specification versions (for example
+ 1.2 to 2.0) re-provisions the module and **erases the keys it holds**. Any key sealed to
+ that TPM, including a **BitLocker** key protector, is invalidated by the change.
+- **It is vendor-specific and may be limited or impossible.** Some platforms allow a
+ reversible firmware switch (sometimes with a documented limit on how many times it can be
+ done), some allow only a one-way move, and some ship a **fixed module that cannot be
+ switched at all** and would have to be replaced. Consult your hardware vendor's TPM
+ documentation for your exact model before proceeding.
+
+The high-level order is: if the machine is an already-deployed cluster member, drain it
+first; if it has BitLocker on, suspend BitLocker and confirm the recovery key is escrowed;
+enable the TPM and set it to TPM 2.0 in firmware per your vendor's documentation; confirm;
+resume BitLocker; and resume the node. Then re-run the check.
+
+### 1. Confirm the current TPM state
+
+Establish what the machine actually reports before you touch firmware:
+
+```powershell
+Get-Tpm | Select-Object TpmPresent, TpmReady, TpmEnabled
+(Get-CimInstance -Namespace 'root/cimv2/Security/MicrosoftTpm' -ClassName Win32_Tpm).SpecVersion
+```
+
+- If `TpmPresent` is `False`, Windows sees no TPM. This version check will not fail (it only
+  evaluates a present TPM), but `Test-TpmProperties` will, and the machine is not deployable
+  without a TPM. First rule out a TPM that is merely disabled or hidden in firmware (step 4);
+  if the machine has no TPM at all, the fix is a hardware action, not a firmware setting.
+- If a TPM is present but `SpecVersion` starts with something other than `2.0`, continue
+ below.
+
+### 2. If the machine is an already-deployed cluster member, drain it first
+
+If this machine has BitLocker on, it has almost certainly already been deployed into a
+cluster (Azure Local turns on encryption during deployment). Changing the TPM requires a
+reboot into firmware, which takes this node down, so drain it first and do **one node at a
+time**. This is a [MEDIUM RISK] change: draining live-migrates VMs off the node, and the
+node is unavailable until you resume it.
+
+```powershell
+# Confirm the cluster is healthy and can lose this one node before you start.
+Get-ClusterNode | Select-Object Name, State # every other node should be Up
+Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus # all Healthy / OK
+Get-StorageJob # should be empty (no active repair/resync)
+```
+
+Only continue when every other node is `Up`, all virtual disks are Healthy, and
+`Get-StorageJob` returns nothing. Then pause and drain this node so its VMs live-migrate
+off:
+
+```powershell
+Suspend-ClusterNode -Name "<NodeName>" -Drain
+# Confirm the node is Paused and its roles have moved before you reboot it.
+Get-ClusterNode -Name "<NodeName>" | Select-Object Name, State # State should be Paused
+```
+
+### 3. If the machine has BitLocker enabled, suspend it first
+
+Changing the TPM alters (and in the version-switch case **clears**) the hardware root of
+trust that BitLocker seals its key to. On a machine where **BitLocker is enabled, the next
+boot after the change will stop at the BitLocker recovery screen** and ask for the 48-digit
+recovery password, which can strand the machine. Azure Local enables data-at-rest
+encryption (BitLocker) by default, so a machine that has already been deployed (or any
+machine with drive encryption) is affected. A fresh, pre-deployment machine that has never
+been encrypted is not.
+
+If BitLocker is on, suspend it **before** you change firmware, and resume it **after** the
+machine is back and the TPM is confirmed. Use `-RebootCount 0` so the suspend holds across
+the firmware change and reboot until you explicitly resume it:
+
+```powershell
+# Are any volumes protected?
+Get-BitLockerVolume | Select-Object MountPoint, ProtectionStatus, VolumeStatus
+
+# Suspend each protected volume indefinitely (until you resume it).
+Suspend-BitLocker -MountPoint "C:" -RebootCount 0
+# Repeat for any data volumes that report ProtectionStatus = On, for example:
+# Suspend-BitLocker -MountPoint "C:\ClusterStorage\Volume1" -RebootCount 0
+```
+
+You will resume BitLocker in step 6, after the TPM is confirmed. **Confirm the recovery
+key is available (escrowed) before you start**, because a TPM version switch clears the
+module and the machine must be recoverable with the recovery key if anything is
+interrupted.
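+
+When several volumes are protected, the per-volume commands above could be driven from the
+`Get-BitLockerVolume` output instead of being typed one by one. This is a hedged sketch:
+`Get-ProtectedMountPoint` is a hypothetical helper, not part of the BitLocker module, and
+`-WhatIf` is left on so nothing is suspended until you remove it.
+
+```powershell
+function Get-ProtectedMountPoint {
+    param(
+        # A BitLocker volume object (or any object with MountPoint and ProtectionStatus).
+        [Parameter(ValueFromPipeline)]$Volume
+    )
+    process {
+        # Emit the mount point only for volumes whose protection is currently On.
+        if ("$($Volume.ProtectionStatus)" -eq 'On') { $Volume.MountPoint }
+    }
+}
+
+# Preview the suspend for every protected volume; remove -WhatIf to apply it.
+Get-BitLockerVolume | Get-ProtectedMountPoint | ForEach-Object {
+    Suspend-BitLocker -MountPoint $_ -RebootCount 0 -WhatIf
+}
+```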
+
+### 4. Enable the TPM and set it to TPM 2.0 in firmware
+
+1. Reboot the machine and enter firmware setup (the key varies by vendor, commonly `F2`,
+ `F10`, `Del`, or via the BMC / iDRAC / iLO / XClarity remote console).
+2. Locate the **TPM** (sometimes shown as "Security Device", "Trusted Computing", or
+ "PTT/Intel Platform Trust Technology" / "AMD fTPM") settings.
+3. Make sure the TPM is **enabled** and visible to the operating system.
+4. If the platform supports selecting the TPM specification version and the TPM is in 1.2
+ mode, set it to **2.0** (often labelled "TPM Device Version", "TCG Spec Version", or
+ similar), following your vendor's documented procedure. **Heed the warnings in this
+ section first**: the switch clears the TPM and may be limited or one-way on your model.
+5. Save and exit, and let the machine boot back into the OS.
+
+The exact menu names and the availability of a version switch are vendor-specific. If your
+platform's TPM is a fixed module that cannot report 2.0, it cannot be remediated in
+firmware and the module (or machine) must be brought to spec by your hardware vendor;
+confirm the machine is on the Azure Local supported hardware list.
+
+### 5. Confirm the TPM now reports version 2.0
+
+```powershell
+Get-Tpm | Select-Object TpmPresent, TpmReady, TpmEnabled
+(Get-CimInstance -Namespace 'root/cimv2/Security/MicrosoftTpm' -ClassName Win32_Tpm).SpecVersion
+```
+
+The first segment of `SpecVersion` should now be `2.0`.
+
+### 6. Resume BitLocker (only if you suspended it in step 3)
+
+```powershell
+Resume-BitLocker -MountPoint "C:"
+# And any data volumes you suspended, for example:
+# Resume-BitLocker -MountPoint "C:\ClusterStorage\Volume1"
+```
+
+Resuming reseals the BitLocker key to the current TPM. If the TPM was cleared by the
+version switch, make sure the volume re-protects cleanly and a fresh recovery key is
+escrowed.
+
+### 7. Resume the cluster node (only if you drained it in step 2)
+
+Bring the node back into the cluster and let storage resync before you touch the next node.
+
+```powershell
+Resume-ClusterNode -Name "<NodeName>"
+# Wait for resync to finish before moving on; do not drain the next node until this clears.
+Get-StorageJob # wait until empty
+Get-VirtualDisk | Select-Object FriendlyName, HealthStatus # back to Healthy
+```
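+
+The "wait until empty" step can be scripted as a polling loop. This is a sketch under stated
+assumptions: `Wait-StorageJobsEmpty`, its parameters, and the default timings are
+hypothetical, and it treats "no objects returned by `Get-StorageJob`" as the finished state,
+as the comment above does.
+
+```powershell
+function Wait-StorageJobsEmpty {
+    param(
+        # Command that lists storage jobs; injectable so the loop can be dry-run.
+        [scriptblock]$GetJobs = { Get-StorageJob },
+        [int]$PollSeconds = 30,
+        [int]$TimeoutSeconds = 3600
+    )
+    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
+    # Keep polling while the job list is non-empty.
+    while (@(& $GetJobs).Count -gt 0) {
+        if ((Get-Date) -gt $deadline) {
+            throw "Storage jobs still running after $TimeoutSeconds seconds."
+        }
+        Start-Sleep -Seconds $PollSeconds
+    }
+}
+
+# On a cluster node: blocks until Get-StorageJob returns nothing, or throws on timeout.
+# Wait-StorageJobsEmpty
+```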
+
+Repeat steps 1 through 7 for each remaining machine, one node at a time, so the cluster
+always keeps quorum and storage resiliency.
+
+## Verify the fix
+
+Re-run the single validator:
+
+```powershell
+$r = Invoke-AzStackHciHardwareValidation -Include Test-TpmVersion -PassThru
+$r | Select-Object Name, Status, Severity
+$r.AdditionalData.Detail
+```
+
+A machine whose TPM reports version 2.0 returns `Status` of `SUCCESS`. Once every machine
+you are deploying reports success, re-run the deployment validation; the **TPM Version**
+check should now pass and deployment can proceed.
+
+## When to escalate
+
+Open a support case if any of the following are true:
+
+- `Win32_Tpm` reports `SpecVersion` starting with `2.0`, but the **TPM Version** check
+ still fails during deployment validation.
+- The firmware has no option to enable a TPM or to select TPM 2.0, or the platform's TPM
+ is a fixed module that cannot report 2.0. TPM 2.0 is an Azure Local hardware requirement,
+ so confirm the machine is on the Azure Local supported hardware list, and engage your
+ hardware vendor if the module must be replaced.
+- The machine has no TPM at all (`TpmPresent` is `False`); this is a hardware requirement
+ that cannot be satisfied in firmware.
+- The machine stops at the BitLocker recovery screen after the change and the recovery key
+ is not available.
+
+## Related
+
+- General Environment Checker remediation link shown in the validator output:
+ https://aka.ms/hci-envch
+- [Azure Local security features and baseline](https://learn.microsoft.com/azure/azure-local/concepts/security-features)
+- [Trusted Platform Module (TPM 2.0) overview](https://learn.microsoft.com/windows/security/hardware-security/tpm/trusted-platform-module-overview)
+- [Get-Tpm](https://learn.microsoft.com/powershell/module/trustedplatformmodule/get-tpm)
+- [Suspend-BitLocker before firmware changes](https://learn.microsoft.com/powershell/module/bitlocker/suspend-bitlocker)
+- [Suspend-ClusterNode (pause and drain a node)](https://learn.microsoft.com/powershell/module/failoverclusters/suspend-clusternode)
+- [Resume-ClusterNode](https://learn.microsoft.com/powershell/module/failoverclusters/resume-clusternode)
From 08cb5df8aeb33573a0047b5a75333dc801e29ac1 Mon Sep 17 00:00:00 2001
From: 1008covingtonlane <42551186+1008covingtonlane@users.noreply.github.com>
Date: Fri, 26 Jun 2026 13:47:55 -0400
Subject: [PATCH 2/3] TPM Version TSG: reframe for the pre-deployment scenario;
fix bot display-name finding
Addresses review of PR #305:
- Bot finding (display name): the metadata row said 'TPM Version' while the result
JSON shows 'Test TPM Version'. Note both forms in the row and add a
names-across-surfaces callout (portal aggregated name vs the verbose per-machine
Title in the JSON / event log); the underlying Name is identical.
- Scenario accuracy: this check (AzStackHci_Hardware_Test_Tpm_Version) is emitted only
by the Hardware validator, whose OperationType is Deployment and Add Node, so the
machine it flags is a host being validated to become a node, not a deployed member.
Reframe the Overview and remediation accordingly and correct the Applicable Scenarios
row (the upgrade-time TPM check is a separately named AzStackHci_Upgrade_* WARNING).
- BitLocker in the primary path: a host being vetted may have been recycled from a prior
project with BitLocker already enabled, so check for and suspend BitLocker before the
firmware change regardless of deployment state. The cluster-drain/quorum steps are now
gated to the uncommon case of a live, deployed cluster member.
- Add a 'Before you start' hardware-capability gate so a customer on a fixed-module or
one-way platform does not drain a node or suspend BitLocker for a change that turns out
to be impossible in firmware.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
...oubleshooting-Hardware-Test-Tpm-Version.md | 161 ++++++++++--------
1 file changed, 91 insertions(+), 70 deletions(-)
diff --git a/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md b/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
index 02f62afd..6e2f83be 100644
--- a/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
+++ b/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
@@ -7,7 +7,7 @@
| Display name |
- TPM Version |
+ TPM Version (the aggregated name shown in the portal). The per-machine result JSON and event log carry the verbose form Test TPM Version <machine>; both refer to this same check. |
| Validator / test |
@@ -27,7 +27,7 @@
| Applicable Scenarios |
- Deployment, Add Node, and Upgrade (pre-deployment / readiness validation). |
+ Deployment and Add Node (pre-deployment readiness validation). |
| Affected Versions |
@@ -61,6 +61,13 @@ the machine cannot proceed. Unlike a software setting, the fix is a **firmware a
hardware** change that is specific to your server model, and on some platforms it is
limited, irreversible, or not possible at all (see [How to fix it](#how-to-fix-it)).
+This check runs during **pre-deployment validation** (the Deployment and Add Node readiness
+checks), so the machine it evaluates is normally a **host being validated to become a cluster
+node**, not an existing cluster member. The remediation is usually short (set the TPM to 2.0
+in firmware and re-validate), but two cautions apply: a host being vetted may have been
+**recycled from another project and could already have BitLocker enabled**, and the
+cluster-drain precaution is needed only if the machine is already a live, deployed member.
+
## Where this failure appears
You can see this failure in two places, the Azure portal and the machine itself. Both
@@ -145,12 +152,25 @@ In both sources the result for this check looks like this:
}
```
+> **A note on names across surfaces.** The portal shows the aggregated display name
+> **TPM Version**, while the per-machine result JSON and the event log carry the verbose form
+> **Test TPM Version `<machine>`**. The underlying `Name`,
+> `AzStackHci_Hardware_Test_Tpm_Version`, is the same on both, so if you are matching strings
+> between the portal and the on-box output, expect the two forms.
+
## How to fix it
-The TPM specification version is a firmware and hardware property, so the fix is made in
+This check runs during **pre-deployment validation**, so the machine it flags is normally a
+**host being prepared to become a cluster node**, not a running cluster member: there is
+usually no cluster to keep in quorum. Do not assume the host is otherwise "clean", though.
+A host being vetted may have been **recycled from another project and could already have
+BitLocker enabled**, and a TPM change trips an encrypted volume into recovery, so check for
+BitLocker before you touch firmware (step 2). The cluster-drain precaution only applies in
+the uncommon case that the machine is already a live, deployed cluster member.
+
+The TPM specification version is a firmware and hardware property, so the change is made in
the machine's firmware setup (or with the vendor's management tooling), not from Windows.
-**Before you change anything, read the warnings in this section.** Unlike most validator
-fixes, changing a TPM's version is platform-specific and has serious side effects:
+**Before you change anything, read these two warnings:**
- **Switching a TPM clears it.** Moving a TPM between specification versions (for example
1.2 to 2.0) re-provisions the module and **erases the keys it holds**. Any key sealed to
@@ -161,10 +181,18 @@ fixes, changing a TPM's version is platform-specific and has serious side effect
switched at all** and would have to be replaced. Consult your hardware vendor's TPM
documentation for your exact model before proceeding.
-The high-level order is: if the machine is an already-deployed cluster member, drain it
-first; if it has BitLocker on, suspend BitLocker and confirm the recovery key is escrowed;
-enable the TPM and set it to TPM 2.0 in firmware per your vendor's documentation; confirm;
-resume BitLocker; and resume the node. Then re-run the check.
+### Before you start: confirm a TPM 2.0 switch is possible on your hardware
+
+The single most platform-variable fact is whether your model can switch to TPM 2.0 at all,
+so settle it first. Read the current state (step 1 below, non-disruptive), then consult your
+hardware vendor's TPM documentation for your exact model to confirm whether the TPM can be
+switched from 1.2 to 2.0, and if so whether the switch is reversible or subject to a toggle
+limit.
+
+If the machine has **no TPM**, or its TPM is a **fixed module that cannot report 2.0**, the
+firmware steps below will not help. Confirm the machine is on the Azure Local supported
+hardware list and engage your hardware vendor (the module or machine has to be brought to
+spec). Do not start any disruptive change until you have confirmed the switch is possible.
### 1. Confirm the current TPM state
@@ -181,61 +209,34 @@ Get-Tpm | Select-Object TpmPresent, TpmReady, TpmEnabled
- If a TPM is present but `SpecVersion` starts with something other than `2.0`, continue
below.
-### 2. If the machine is an already-deployed cluster member, drain it first
-
-If this machine has BitLocker on, it has almost certainly already been deployed into a
-cluster (Azure Local turns on encryption during deployment). Changing the TPM requires a
-reboot into firmware, which takes this node down, so drain it first and do **one node at a
-time**. This is a [MEDIUM RISK] change: draining live-migrates VMs off the node, and the
-node is unavailable until you resume it.
-
-```powershell
-# Confirm the cluster is healthy and can lose this one node before you start.
-Get-ClusterNode | Select-Object Name, State # every other node should be Up
-Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus # all Healthy / OK
-Get-StorageJob # should be empty (no active repair/resync)
-```
+### 2. Check for BitLocker, and suspend it if present
-Only continue when every other node is `Up`, all virtual disks are Healthy, and
-`Get-StorageJob` returns nothing. Then pause and drain this node so its VMs live-migrate
-off:
+Do this even on a fresh pre-deployment host. A host you are vetting may have been **recycled
+from a previous project with BitLocker already enabled**, and a TPM version change clears the
+module, which invalidates the TPM-sealed BitLocker key. If a protected volume is left armed,
+**the next boot after the change stops at the BitLocker recovery screen** and asks for the
+48-digit recovery password, which can strand the machine.
```powershell
-Suspend-ClusterNode -Name "<NodeName>" -Drain
-# Confirm the node is Paused and its roles have moved before you reboot it.
-Get-ClusterNode -Name "<NodeName>" | Select-Object Name, State # State should be Paused
+# Are any volumes protected? (On a truly clean, never-encrypted host this is empty.)
+Get-BitLockerVolume | Select-Object MountPoint, ProtectionStatus, VolumeStatus
```
-### 3. If the machine has BitLocker enabled, suspend it first
-
-Changing the TPM alters (and in the version-switch case **clears**) the hardware root of
-trust that BitLocker seals its key to. On a machine where **BitLocker is enabled, the next
-boot after the change will stop at the BitLocker recovery screen** and ask for the 48-digit
-recovery password, which can strand the machine. Azure Local enables data-at-rest
-encryption (BitLocker) by default, so a machine that has already been deployed (or any
-machine with drive encryption) is affected. A fresh, pre-deployment machine that has never
-been encrypted is not.
-
-If BitLocker is on, suspend it **before** you change firmware, and resume it **after** the
-machine is back and the TPM is confirmed. Use `-RebootCount 0` so the suspend holds across
-the firmware change and reboot until you explicitly resume it:
+If every volume reports `ProtectionStatus = Off`, there is nothing to suspend; go to step 3.
+If any volume is protected, **confirm its recovery key is escrowed first**, then suspend it
+with `-RebootCount 0` so the suspend holds across the firmware change and reboot until you
+explicitly resume it:
```powershell
-# Are any volumes protected?
-Get-BitLockerVolume | Select-Object MountPoint, ProtectionStatus, VolumeStatus
-
-# Suspend each protected volume indefinitely (until you resume it).
Suspend-BitLocker -MountPoint "C:" -RebootCount 0
-# Repeat for any data volumes that report ProtectionStatus = On, for example:
-# Suspend-BitLocker -MountPoint "C:\ClusterStorage\Volume1" -RebootCount 0
+# Repeat for any data volume that reports ProtectionStatus = On, for example:
+# Suspend-BitLocker -MountPoint "D:" -RebootCount 0
```
-You will resume BitLocker in step 6, after the TPM is confirmed. **Confirm the recovery
-key is available (escrowed) before you start**, because a TPM version switch clears the
-module and the machine must be recoverable with the recovery key if anything is
-interrupted.
+### 3. Enable the TPM and set it to TPM 2.0 in firmware
-### 4. Enable the TPM and set it to TPM 2.0 in firmware
+> If this machine is already a deployed, encrypted cluster member, do **not** reboot it into
+> firmware yet. Follow [If the machine is already a deployed cluster member](#if-the-machine-is-already-a-deployed-encrypted-cluster-member) first so you take the node down safely.
1. Reboot the machine and enter firmware setup (the key varies by vendor, commonly `F2`,
`F10`, `Del`, or via the BMC / iDRAC / iLO / XClarity remote console).
@@ -244,16 +245,16 @@ interrupted.
3. Make sure the TPM is **enabled** and visible to the operating system.
4. If the platform supports selecting the TPM specification version and the TPM is in 1.2
mode, set it to **2.0** (often labelled "TPM Device Version", "TCG Spec Version", or
- similar), following your vendor's documented procedure. **Heed the warnings in this
- section first**: the switch clears the TPM and may be limited or one-way on your model.
+ similar), following your vendor's documented procedure. **Heed the warnings above**: the
+ switch clears the TPM and may be limited or one-way on your model.
5. Save and exit, and let the machine boot back into the OS.
The exact menu names and the availability of a version switch are vendor-specific. If your
-platform's TPM is a fixed module that cannot report 2.0, it cannot be remediated in
-firmware and the module (or machine) must be brought to spec by your hardware vendor;
-confirm the machine is on the Azure Local supported hardware list.
+platform's TPM is a fixed module that cannot report 2.0, it cannot be remediated in firmware
+and the module (or machine) must be brought to spec by your hardware vendor; confirm the
+machine is on the Azure Local supported hardware list.
-### 5. Confirm the TPM now reports version 2.0
+### 4. Confirm the TPM now reports version 2.0
```powershell
Get-Tpm | Select-Object TpmPresent, TpmReady, TpmEnabled
@@ -262,31 +263,51 @@ Get-Tpm | Select-Object TpmPresent, TpmReady, TpmEnabled
The first segment of `SpecVersion` should now be `2.0`.
-### 6. Resume BitLocker (only if you suspended it in step 3)
+### 5. Resume BitLocker (only if you suspended it in step 2)
```powershell
Resume-BitLocker -MountPoint "C:"
-# And any data volumes you suspended, for example:
-# Resume-BitLocker -MountPoint "C:\ClusterStorage\Volume1"
+# And any data volume you suspended, for example:
+# Resume-BitLocker -MountPoint "D:"
```
-Resuming reseals the BitLocker key to the current TPM. If the TPM was cleared by the
-version switch, make sure the volume re-protects cleanly and a fresh recovery key is
-escrowed.
+Resuming reseals the BitLocker key to the new TPM. Because the version switch cleared the
+module, make sure each volume re-protects cleanly and a fresh recovery key is escrowed.
+
+### If the machine is already a deployed, encrypted cluster member
+
+Because this is a pre-deployment check, it does not normally fire on a machine that is
+already a deployed cluster node: the machine must have reported TPM 2.0 to deploy, and the
+version does not change on its own. But if you are changing the TPM on a machine that is
+**already a live, encrypted cluster member** for any reason, add one precaution to the steps
+above: the firmware reboot takes a running node down, so **drain it first** and do this
+**one node at a time**.
+
+This is a [MEDIUM RISK] change: draining live-migrates VMs off the node, and the node is
+unavailable until you resume it.
+
+```powershell
+# Confirm the cluster is healthy and can lose this one node before you start.
+Get-ClusterNode | Select-Object Name, State # every other node should be Up
+Get-VirtualDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus # all Healthy / OK
+Get-StorageJob # should be empty (no active repair/resync)
-### 7. Resume the cluster node (only if you drained it in step 2)
+# Only when the cluster is healthy, pause and drain this node so its VMs live-migrate off.
+Suspend-ClusterNode -Name "<NodeName>" -Drain # replace <NodeName> with the name of the node you are draining
+Get-ClusterNode -Name "<NodeName>" | Select-Object Name, State # State should be Paused
+```
-Bring the node back into the cluster and let storage resync before you touch the next node.
+Then run steps 2 through 5 above (suspend BitLocker, change firmware, confirm, resume
+BitLocker). Finally, bring the node back and let storage resync before you drain the next one:
```powershell
Resume-ClusterNode -Name "<NodeName>"
-# Wait for resync to finish before moving on; do not drain the next node until this clears.
Get-StorageJob # wait until empty
Get-VirtualDisk | Select-Object FriendlyName, HealthStatus # back to Healthy
```
-Repeat steps 1 through 7 for each remaining machine, one node at a time, so the cluster
-always keeps quorum and storage resiliency.
+Repeat for each remaining member, one node at a time, so the cluster always keeps quorum and
+storage resiliency.
## Verify the fix
From 88c6700d5828cc7d0e244a6374c3bac11818ddd6 Mon Sep 17 00:00:00 2001
From: 1008covingtonlane <42551186+1008covingtonlane@users.noreply.github.com>
Date: Fri, 26 Jun 2026 17:48:40 -0400
Subject: [PATCH 3/3] TPM Version TSG: turn 'Before you start' into a
capability + ownership decision matrix
From the 10-persona usability read, this was the single highest-leverage change (it resolved
nearly every persona's 'wants improved'). Replace the prose 'Before you start' with a decision
table keyed on what the hardware/vendor reports (already 2.0 / switchable 1.2 / one-way or
limited / fixed module / no TPM), each row naming what it means, WHO owns the action (server or
firmware admin, hardware vendor/OEM, procurement), and what to do. Keep the explicit STOP gate
(model supported + BitLocker key escrowed + drained if a deployed member) and add an
expectation-setting note about hardware lead time for fixed-module/unsupported cases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
...oubleshooting-Hardware-Test-Tpm-Version.md | 35 ++++++++++++-------
1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md b/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
index 6e2f83be..4c419415 100644
--- a/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
+++ b/TSG/EnvironmentValidator/Troubleshooting-Hardware-Test-Tpm-Version.md
@@ -181,18 +181,29 @@ the machine's firmware setup (or with the vendor's management tooling), not from
switched at all** and would have to be replaced. Consult your hardware vendor's TPM
documentation for your exact model before proceeding.
-### Before you start: confirm a TPM 2.0 switch is possible on your hardware
-
-The single most platform-variable fact is whether your model can switch to TPM 2.0 at all,
-so settle it first. Read the current state (step 1 below, non-disruptive), then consult your
-hardware vendor's TPM documentation for your exact model to confirm whether the TPM can be
-switched from 1.2 to 2.0, and if so whether the switch is reversible or subject to a toggle
-limit.
-
-If the machine has **no TPM**, or its TPM is a **fixed module that cannot report 2.0**, the
-firmware steps below will not help. Confirm the machine is on the Azure Local supported
-hardware list and engage your hardware vendor (the module or machine has to be brought to
-spec). Do not start any disruptive change until you have confirmed the switch is possible.
+### Before you start: decide whether and how this can be fixed (and who does it)
+
+The single most platform-variable fact is whether your exact server model can switch to TPM
+2.0 at all, so settle that first. Read the current state (step 1 below, non-disruptive), then
+consult your hardware vendor's TPM documentation for your model and use this table to decide
+the path and the owner **before** any disruptive change:
+
+| What your hardware reports / the vendor says | What it means | Who owns the action | What to do |
+| --- | --- | --- | --- |
+| TPM already reports **2.0** | This check should pass | No change needed | Re-confirm with step 1; if it still fails, see [When to escalate](#when-to-escalate) |
+| TPM present, reports **1.2**, vendor says it is **switchable to 2.0** | A firmware switch is possible (it clears the TPM) | Server / firmware admin; Windows admin confirms BitLocker | Escrow the BitLocker key first, then follow [How to fix it](#how-to-fix-it) |
+| TPM **1.2**, switch is **one-way or limited** (for example a toggle-count cap) | You can switch but cannot easily go back | Server / firmware admin **with hardware-vendor sign-off** | Confirm with the vendor, then treat it as a one-time change |
+| TPM is a **fixed module** that cannot report 2.0 | Cannot be fixed in firmware | Hardware vendor (OEM) | Engage the OEM; the module or machine must be brought to spec. Expect lead time |
+| **No TPM present** | Not deployable (this version check will not fail, but `Test-TpmProperties` will) | Hardware vendor (OEM) plus procurement | Confirm the machine is on the Azure Local supported hardware list; add or replace the TPM |
+
+**Do not start any disruptive change until you have confirmed all three:** the switch is
+supported on your exact model, the **BitLocker recovery key is escrowed**, and (if this machine
+is already a deployed cluster member) it has been **drained** first. A TPM switch clears the
+module and is sometimes irreversible, so if any of the three is unknown, stop and confirm.
+
+> **Setting expectations:** a firmware switch is usually quick, but a fixed-module or
+> unsupported-hardware case means a hardware change or replacement with real lead time and
+> possible procurement. Surface that to the customer early so the deployment schedule reflects it.
### 1. Confirm the current TPM state