Add N9300 switch memory check #370
Harinadh-Saladi
left a comment
Please address the comments.
@Priyanka-Patil14 please attach test results from APIC for the possible scenarios.
Uploaded the logs.
@Priyanka-Patil14 can you please provide updated script output from APIC and error handling cases?
MemoryCheck_Logs.txt Uploaded the logs. Please review them.
Attached pytest and full script run logs as requested.
bac2092 to 8197e54
…n required (datacenter#355) * add `--max-threads` arg * fix bad descriptor errs/race conditions * update pytests
… cscwh68103 invalid fabricpathep targets (datacenter#357) * specific testing for known failure conditions of cscwh68103 as to not catch valid scenarios
8197e54 to d7d4289
Attaching the full run logs. NewValidation_APIC_FullRun_Logs.txt
lovkeshsharma702
left a comment
Integration test completed with no faults.
    Impact: Running an N9K-C93180YC-FX3 switch with less than 32GB memory can lead to memory pressure and increase the risk of service instability.

    If any N9K-C93180YC-FX3 switch is flagged by this check, upgrade the switch memory to at least 32GB.
Dig into the TME doc on this one. The script's messaging should match the external docs, not introduce contradictions or obfuscation.
If the doc needs to be clarified to address the proactive failure notice, then do it.
The goal should be to link that doc or bug back to the check.

Reference back to CSCwm42741.
        result = NA
        msg = 'No N9K-C93180YC-FX3 switches found. Skipping.'
    else:
        proc_mem_mos = icurl('class', 'procMemUsage.json')
Use the API filter `?query-target-filter=lt(procMemUsage.total,"<32gigs>")` to reduce the churn.
Alternatively, the icurl function could be updated to take a node specifier in the URI:
/api/node-xxx/class/procMemUsage...
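A minimal sketch of the first suggestion, assuming `procMemUsage.total` is reported in KB (as the `total_kb` naming in the diff implies) and that the script's existing `icurl('class', ...)` helper passes the query string through unchanged. The helper name `build_low_memory_query` and the threshold value are hypothetical; the PR's actual `min_memory_kb` value is not shown in this conversation.

```python
def build_low_memory_query(min_memory_kb):
    """Build a class query that only returns procMemUsage objects below the threshold.

    Hypothetical helper for illustration; not part of the script under review.
    """
    return 'procMemUsage.json?query-target-filter=lt(procMemUsage.total,"{}")'.format(
        min_memory_kb
    )


# Placeholder threshold (assumption): substitute the check's real min_memory_kb.
query = build_low_memory_query(32000000)

# In the check itself this would replace the unfiltered class query, e.g.:
# proc_mem_mos = icurl('class', query)
```

With the filter applied server-side, only switches below the threshold come back, so the check would no longer need to walk every `procMemUsage` object in the fabric.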
        missing_nodes = []

        for node in affected_nodes:
            node_id = node['fabricNode']['attributes']['id']
            total_kb = node_total_kb.get(node_id)
            if total_kb is None:
                missing_nodes.append([
                    node_id,
                    node['fabricNode']['attributes']['name'],
                    node['fabricNode']['attributes']['model'],
                ])
                continue

            if total_kb < min_memory_kb:
                memory_in_gb = round(total_kb / 1000000, 2)
                result = MANUAL
                data.append([
                    node_id,
                    node['fabricNode']['attributes']['name'],
                    node['fabricNode']['attributes']['model'],
                    memory_in_gb,
                ])

        if missing_nodes and data:
            result = MANUAL
            msg = (
                'Some N9K-C93180YC-FX3 nodes have insufficient memory and others are missing '
                'procMemUsage data. Please manually verify the memory on all affected nodes.\n'
                'Nodes with insufficient memory: {}\n'
                'Nodes with missing data: {}'.format(
                    ', '.join(str(row[0]) for row in data),
                    ', '.join(str(row[0]) for row in missing_nodes),
                )
            )
            headers = ['NodeId', 'Name', 'Model', 'Memory Detected (GB)']
            data = data + [row + ['N/A'] for row in missing_nodes]
        elif missing_nodes:
            result = ERROR
            msg = 'Missing procMemUsage data for one or more affected N9K-C93180YC-FX3 nodes.'
            headers = ['NodeId', 'Name', 'Model']
            data = missing_nodes
            recommended_action = ''
In what scenario do we see missing nodes? The implication is that either fabricNode or procMemUsage is not being returned for a switch actively in the fabric.
If this was seen in testing, file the behavior as a bug and keep the logic with that explanation. If not, I don't know of any scenario where we would expect missing nodes for the sake of flagging them.
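To make the question concrete, here is a hedged sketch of how a `node_total_kb` lookup could be built and how a node would end up "missing". The diff does not show how the PR actually builds this dictionary, so the function name, the DN parsing, and the sample data are all assumptions for illustration.

```python
import re


def map_node_total_kb(proc_mem_mos):
    """Map node ID (string) to total memory in KB from procMemUsage objects.

    Assumes each object's dn contains a 'node-<id>' segment (hypothetical parsing).
    """
    node_total_kb = {}
    for mo in proc_mem_mos:
        attrs = mo['procMemUsage']['attributes']
        match = re.search(r'node-(\d+)', attrs['dn'])
        if not match:
            continue  # no node ID in the dn, so nothing to key on
        node_total_kb[match.group(1)] = int(attrs['total'])
    return node_total_kb


# Made-up sample: only node 101 has a procMemUsage object.
sample = [
    {'procMemUsage': {'attributes': {
        'dn': 'topology/pod-1/node-101/sys/procsys/memusage-sup',
        'total': '24576000',
    }}},
]
lookup = map_node_total_kb(sample)
```

Under these assumptions, `lookup.get('102')` returns `None`, which is the only path into `missing_nodes`: a switch present in `fabricNode` with no matching `procMemUsage` object.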
    This check applies to N9K-C93180YC-FX3 switches only. It checks whether the switch has less than 32GB of memory. The minimum RAM requirement for the N9K-C93180YC-FX3 to operate properly in ACI mode is 32GB. This check is not version dependent and runs for all upgrade versions.

    [CSCwm42741][70] tracks this issue. N9K-C93180YC-FX3 switches running in ACI mode with less than 32GB of memory will not perform well and are at risk of service instability. On ACI releases that include the fix for CSCwm42741, a critical fault F4680 (`eqpt-low-memory-device`) is raised on affected switches.
[CSCwm42741][70] tracks this issue. N9K-C93180YC-FX3 switches running in ACI mode with less than 32GB of memory will not perform well and are at risk of service instability. With the fix for CSCwm42741, a critical fault F4680 (`eqpt-low-memory-device`) is raised on affected switches.
    @@ -0,0 +1,13 @@
    [
Please match the file name with the model as well, not the general 9300.
Every statement and filename should use the proper name for clarity.
            'One or more switches have less than 32GB of memory and may experience service instability. '
            'Upgrade the switch memory to at least 32GB.'
        )
Grammar mistake: "switches have less than 32GB of memory may experience service instability. '
'Upgrade the switch memory to at least 32GB.'"

Corrected the sentence.
    result = PASS
    headers = ["NodeId", "Name", "Model", "Memory Detected (GB)"]
    data = []
    recommended_action = 'Increase the switch memory to at least 32GB on affected N9K-C93180YC-FX3 switches.'
Change it to "Increase the switch memory to at least 32GB on affected N9K-C93180YC-FX3."
lovkeshsharma702
left a comment
Integration test passed. Logic changed; code added in the bug section with a bug reference.
Summary
This PR adds a new validation check: N9300 Switch Memory.
The check runs only when the target upgrade version is 6.1 and validates that N9300-series switches have at least 24 GB memory before upgrade.
What Changed
- `n9300_switch_memory_check` in `aci-preupgrade-validation-script.py`
- `docs/docs/validations.md`
- `tests/checks/n9300_switch_memory_24g_check/`
Check Behavior
- `MANUAL` if target version is missing
- `N/A` if target version is not 6.1
- `N/A` if no N9300 switches are present
- `FAIL_O` for N9300 nodes with memory `< 24 GB`
- `PASS` when all applicable N9300 nodes have `>= 24 GB`
Test Validation
Executed:
    pytest tests/checks/n9300_switch_memory_24g_check/test_n9300_switch_memory_24g_check.py -q
Result:
    7 passed in 0.11s
No failures observed.
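The check behavior listed in this summary can be sketched as a small decision function. This is an illustration of the five outcomes as described in the summary, not the code from the PR; the function name, the argument shapes, and the plain-string results are assumptions.

```python
MIN_MEMORY_GB = 24  # threshold stated in the summary


def classify(target_version, n9300_memory_gb):
    """Return (result, flagged_node_ids) for the behavior described in the summary.

    target_version: 'major.minor' string such as '6.1', or None if unknown (assumed shape).
    n9300_memory_gb: dict of node ID -> detected memory in GB (assumed shape).
    """
    if not target_version:
        return 'MANUAL', []   # target version is missing
    if target_version != '6.1':
        return 'N/A', []      # check only applies to 6.1 targets
    if not n9300_memory_gb:
        return 'N/A', []      # no N9300 switches present
    low = sorted(node for node, gb in n9300_memory_gb.items() if gb < MIN_MEMORY_GB)
    if low:
        return 'FAIL_O', low  # at least one node below 24 GB
    return 'PASS', []


example = classify('6.1', {'101': 16, '102': 32})
```

In this sketch, a node with exactly 24 GB passes, matching the `>= 24 GB` wording in the summary.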