Improve dissolve network process#2412
Conversation
| } | ||
|
|
||
| // Ignore the weight for a single value update | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex is a single Vec, but retain scans and rewrites the whole vector after the metered prefix clears finish. A subnet with many timelocked commitment entries can force this on_idle cleanup phase to do unbounded runtime work outside proportional weight accounting. Move this index to a keyed storage shape that can be prefix-cleared, or meter a bounded chunked rewrite with a resumable checkpoint before returning true.
| fn on_idle(_block: BlockNumberFor<T>, limit: Weight) -> Weight { | ||
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
remove_data_for_dissolved_networks(limit) can consume the full on_idle limit, but process_network_registration_queue() still executes afterward. Adding its actual weight to the return value is too late: the state changes already happened outside the available block budget. Only run queued registration when there is remaining weight and make the registration path consume from that remaining budget before mutating storage.
| if !weight_meter.can_consume(r.saturating_mul(2_u64)) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); | ||
| break; |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When the inner alpha_iter_single_prefix(&hot) scan runs out of weight, the checkpoint records only the outer hotkey. The next on_idle resumes the same hotkey from the beginning of its coldkey prefix, so one hotkey with more coldkeys than fit in a block can cause this phase to repeat forever and leave subnet cleanup stuck. Store an inner coldkey/raw prefix checkpoint, or process/removal-map the coldkey prefix with a resumable iterator key.
| .checked_div(total_alpha_value_u128) | ||
| .unwrap_or_default(); | ||
| if leftover > 0 { | ||
| portions.sort_by(|a, b| b.rem.cmp(&a.rem)); |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
This largest-remainder pass only sees the stakers in the current weight-limited batch. Across multiple on_idle batches, extra RAO can be awarded based on batch boundaries rather than the globally largest remainders, making final payouts dependent on available weight and iteration slicing. Persist all remainders for a final global pass, or use a deterministic residual rule that is correct across chunked execution.
| [[package]] | ||
| name = "ml-kem" | ||
| version = "0.2.2" | ||
| version = "0.2.3" |
There was a problem hiding this comment.
[MEDIUM] Crypto/randomness lockfile drift is unrelated to the dissolve change
The manifest change only adds sp-io to common, but the lockfile also changes randomness/crypto-adjacent packages, including getrandom and ml-kem, plus a new r-efi version. That supply-chain drift is unrelated to dissolve-network cleanup and lacks a PR-body justification. Regenerate the lockfile from the intended manifest change only, or split and justify the dependency movement explicitly.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This phase reserves one coarse reads_writes(20, 12) budget, then conditionally performs mint/spend, balance reads, owner transfer, recycling, an insert, and multiple removals. If any path exceeds that estimate, the runtime performs extra balance/storage work inside on_idle without precise metering. Meter each conditional operation at the point it executes, or prove the maximum bound matches every storage and balance mutation in all branches.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| // Ignore the weight for a single value update | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
purge_netuid is called from the metered dissolve cleanup phase, but after the prefix clears finish it mutates TimelockedIndex with retain while explicitly ignoring weight. TimelockedIndex can grow with timelocked commitments across subnets, so a subnet with many indexed entries can force one on_idle call to scan and rewrite the whole index outside the WeightMeter, defeating the PR's bounded-cleanup goal. Make this index cleanup resumable/metered, or replace the global index shape with storage that can be cleared by netuid under the existing meter.
| fn on_idle(_block: BlockNumberFor<T>, limit: Weight) -> Weight { | ||
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
remove_data_for_dissolved_networks(limit) may consume the full on_idle limit, but process_network_registration_queue() is always run afterward and can call set_new_network_state, mutating many storage items and scanning NetworksAdded. This lets queued registration work exceed the block's idle budget after cleanup has already used it. Gate this call on remaining weight, or pass the same WeightMeter through and only process a queued registration when its full bounded weight fits.
| if !weight_meter.can_consume(r.saturating_mul(2_u64)) { | ||
| inner_read_all = false; | ||
| last_hot = Some(hot.clone()); | ||
| break; |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When the inner coldkey scan runs out of weight, the checkpoint records only TotalHotkeyAlpha::<T>::hashed_key_for(&hot, netuid). The next block restarts the same hotkey's coldkey prefix from the beginning, so a hotkey with more coldkeys than fit in one idle budget can never make progress; in the total-value phase, already-scanned coldkeys can also be counted again because the accumulated total is persisted before returning. Persist an inner-prefix checkpoint or structure the phase so each retry resumes after the last processed coldkey.
| .checked_div(total_alpha_value_u128) | ||
| .unwrap_or_default(); | ||
| if leftover > 0 { | ||
| portions.sort_by(|a, b| b.rem.cmp(&a.rem)); |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
destroy_alpha_in_out_stakes_settle_stakes processes only the stakers that fit in the current idle budget, but it applies the largest-remainder distribution to that batch alone while using the global total_alpha_value_u128. Remainders from later batches are not compared with earlier batches, so the extra RAO can be assigned to the wrong coldkeys, and any final undistributed pot is later folded into the owner refund/recycle path rather than globally allocated to the highest remainders. Store global remainder candidates or defer payout until a full ordered pass is available.
| [[package]] | ||
| name = "ml-kem" | ||
| version = "0.2.2" | ||
| version = "0.2.3" |
There was a problem hiding this comment.
[MEDIUM] Crypto/randomness lockfile drift is unrelated to the dissolve change
The lockfile changes getrandom versions and moves ml-kem from 0.2.2 to 0.2.3 without a corresponding dissolve-network manifest change or PR justification. Crypto/randomness dependency drift is supply-chain sensitive and should not be bundled into a runtime cleanup PR unless the manifest change and rationale are explicit. Remove the unrelated lockfile churn or document the exact dependency edge that requires it.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This final phase reserves a coarse reads_writes(20, 12) budget, then performs conditional mint/spend/transfer/recycle/status writes based on live balances and refund state. If that estimate is short, the dissolve cleanup can still exceed the idle budget even though the meter says the phase fits. Account for each conditional operation before executing it, or benchmark and charge a conservative WeightInfo value that covers the worst-case path.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex is a single BTreeSet storage value. This cleanup loads, scans, mutates, and re-encodes the whole set while charging only one write, so a large number of timelocked commitments on other subnets can make dissolve cleanup exceed the on_idle budget with no checkpoint inside the index. Store the index by netuid or add an incremental bounded cleanup path that meters each removed entry.
| fn on_idle(_block: BlockNumberFor<T>, limit: Weight) -> Weight { | ||
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
remove_data_for_dissolved_networks(limit) can consume the full on_idle limit, then this still calls process_network_registration_queue(). That function reads an unbounded Vec, may scan failed entries, and may run set_new_network_state without first reserving remaining weight. A filled or partly failing queue can push the block over its idle budget. Pass a WeightMeter or remaining limit into the queue processor and skip it unless enough weight is available.
| // Handle one hotkey and all its coldkeys or skip the hotkey if the weight is not enough | ||
| // Then we just need to record the hotkey as checkpoint | ||
| for (cold, this_netuid, share_u64f64) in Self::alpha_iter_single_prefix(&hot) { |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
This phase only checkpoints the outer hotkey. If one hotkey has more coldkey entries than fit in the available block weight, the inner loop exits with last_hot = Some(hot), and the next run restarts the same hotkey from the beginning. Cleanup can become permanently stuck for that subnet. The same hotkey-only checkpointing pattern appears in the later alpha cleanup phase. Persist an inner coldkey checkpoint or process/removal progress per (hotkey, coldkey) prefix.
| let leftover: u128 = total_rem | ||
| .checked_div(total_alpha_value_u128) | ||
| .unwrap_or_default(); | ||
| if leftover > 0 { | ||
| portions.sort_by(|a, b| b.rem.cmp(&a.rem)); |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
The largest-remainder pass is computed only over the current batch of stakers, but total_alpha_value_u128 is the global subnet total and cleanup can span multiple on_idle calls. Remainders from earlier/later batches are not compared globally, so dust is assigned to the wrong accounts or left for the final owner/refund path. Persist global remainder state or perform a separate globally ordered remainder distribution before final recycling/refund.
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] New git patch dependency and lockfile drift are unrelated to dissolve cleanup
This PR adds a new external git patch for core2 and the lockfile also moves dependency versions such as borsh, generic-array, rust_decimal, and wasm-bindgen. That is supply-chain scope outside the dissolve-network change, and the git source cannot be verified from the prefetched context. Split this into a dedicated dependency PR with maintainer/changelog review, or remove the dependency drift from this runtime cleanup PR.
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This consumes a fixed upfront reads_writes(20, 12) budget, then conditionally runs balance mint/spend, transfer, recycle, storage inserts/removals, and status writes. Because the later operations are not individually checked against the remaining WeightMeter, this can under-meter the final cleanup path when refund/recycle branches execute. Meter each conditional operation at the point it is performed, or make this phase reserve an audited worst-case weight that covers every branch including balance operations and status updates.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex::<T>::mutate(... retain ...) decodes, scans, filters, and rewrites the entire index while charging only one write. The index is user-grown through timelocked commitments, so a large index can make the dissolve cleanup exceed the on_idle budget and stall/block production. This needs a resumable, per-entry metered cleanup path or a storage layout that can clear by netuid prefix.
| fn on_idle(_block: BlockNumberFor<T>, limit: Weight) -> Weight { | ||
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
remove_data_for_dissolved_networks(limit) may consume the whole on_idle limit, but process_network_registration_queue() still runs afterward with no remaining-budget check. That function reads/scans the registration queue and can call set_new_network_state, which itself scans and mutates subnet state. A full cleanup block can therefore execute additional unbounded runtime work outside the hook budget. Gate this on limit.saturating_sub(weight) and pass a WeightMeter into the queue processor.
|
|
||
| // Handle one hotkey and all its coldkeys or skip the hotkey if the weight is not enough | ||
| // Then we just need to record the hotkey as checkpoint | ||
| for (cold, this_netuid, share_u64f64) in Self::alpha_iter_single_prefix(&hot) { |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
The inner alpha_iter_single_prefix(&hot) loop checkpoints only last_hot. If one hotkey has more coldkey stake entries than fit in the remaining block weight, every retry restarts the same hotkey from the beginning and can never make progress. The checkpoint must include the inner coldkey position, or the storage should be cleared through a prefix API that returns a cursor for the nested keyspace.
| .checked_div(total_alpha_value_u128) | ||
| .unwrap_or_default(); | ||
| if leftover > 0 { | ||
| portions.sort_by(|a, b| b.rem.cmp(&a.rem)); |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
The largest-remainder pass sorts only the portions collected in this invocation, but total_alpha_value_u128 is global across all stakers and settlement can be split across multiple on_idle calls. Remainder winners are therefore chosen per batch rather than globally; unassigned dust is later folded into the owner refund/final leftover path. This can misallocate subnet TAO during dissolution. Persist a global remainder pass/cursor or use a deterministic per-staker formula that does not depend on batch boundaries.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] New git patch dependency and lockfile drift are unrelated to dissolve cleanup
This adds a crates.io patch for core2 to an external git repository, and the lockfile also changes unrelated packages such as borsh, generic-array, rust_decimal, and wasm-bindgen without corresponding manifest changes tied to dissolve cleanup. Git patches are supply-chain sensitive because CI/builds fetch source from the repository at the pinned rev rather than using the crates.io checksum path. Split this into a dedicated dependency PR with justification and review, or remove the unrelated lockfile/package drift from this runtime change.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
destroy_alpha_in_out_stakes reserves a coarse reads_writes(20, 12) up front and then performs conditional mint/spend/transfer/recycle/event/storage work based on subnet balances and refund state. Because this phase runs from on_idle, the cleanup should meter each branch before executing it or reserve a benchmarked worst-case weight that includes all conditional balance operations. Otherwise a dissolve with refund/recycle paths can exceed the advertised idle budget.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| // Reduce the total networks count. | ||
| TotalNetworks::<T>::mutate(|n: &mut u16| *n = n.saturating_sub(1)); | ||
|
|
||
| TotalStake::<T>::mutate(|total| *total = total.saturating_sub(SubnetTAO::<T>::get(netuid))); |
There was a problem hiding this comment.
[HIGH] Subnet stake is subtracted twice during dissolve cleanup
do_dissolve_network subtracts SubnetTAO from TotalStake when the netuid is only queued for cleanup, but destroy_alpha_in_out_stakes_settle_stakes still subtracts the same SubnetTAO during settlement. If settlement spans multiple on_idle batches, that later path can also run more than once while SubnetTAO remains set. This corrupts the global stake accounting for dissolved networks and can skew downstream economic logic. Move the TotalStake decrement to exactly one finalized point, or track in DissolveCleanupStatus that it has already been applied.
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
After the prefix clears finish, this mutates the entire TimelockedIndex vector and calls retain while charging only one write. TimelockedIndex is global and can contain entries for every committed account across all subnets, so dissolving one subnet can still scan and rewrite an unbounded vector inside on_idle beyond the available weight. This reintroduces the DoS risk the cleanup queue is meant to remove. Store the index per netuid or process it with a resumable cursor and charge per decoded/removed item.
| fn on_idle(_block: BlockNumberFor<T>, limit: Weight) -> Weight { | ||
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
remove_data_for_dissolved_networks(limit) can consume the full on_idle limit, then this always calls process_network_registration_queue() and accrues its weight afterward. That queue path reads and scans the queue and may call set_new_network_state, so a block can exceed the advertised idle budget after cleanup uses all available weight. Gate this call on remaining weight, pass a WeightMeter into it, and stop before any queue read/state mutation that cannot fit.
|
|
||
| // Handle one hotkey and all its coldkeys or skip the hotkey if the weight is not enough | ||
| // Then we just need to record the hotkey as checkpoint | ||
| for (cold, this_netuid, share_u64f64) in Self::alpha_iter_single_prefix(&hot) { |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
When the inner coldkey scan runs out of weight, the checkpoint only records TotalHotkeyAlpha::hashed_key_for(&hot, netuid). The next iter_from restarts at the same hotkey and the inner alpha_iter_single_prefix(&hot) starts from the first coldkey again, so a hotkey with enough coldkeys to exceed one block's idle budget can make this phase repeat forever. Persist a nested coldkey cursor, or require enough remaining weight to process the entire hotkey before entering its inner prefix.
| .checked_div(total_alpha_value_u128) | ||
| .unwrap_or_default(); | ||
| if leftover > 0 { | ||
| portions.sort_by(|a, b| b.rem.cmp(&a.rem)); |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
This largest-remainder pass only sees the stakers collected in the current resumable slice while using the global total_alpha_value_u128. Across multiple on_idle batches, leftover units are assigned to the largest remainders within each slice rather than the largest remainders globally, so payout depends on iteration chunking and available idle weight. The same batch-local settlement also re-reads the full SubnetTAO each call. Persist each staker's remainder/share and perform one global final remainder allocation, or make settlement atomic after the scan has materialized a bounded global distribution.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] New git patch dependency and lockfile drift are unrelated to dissolve cleanup
This PR adds a [patch.crates-io] override for core2 to a GitHub revision and changes the lockfile source from crates.io to that git source. Even pinned git dependencies expand build-time supply-chain risk, and this dependency change is unrelated to dissolve-network cleanup. Split it into a dedicated dependency PR with explicit justification and review, or remove it from this runtime change.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This finalization path reserves a coarse reads_writes(20, 12) budget and then executes conditional balance operations, mint/spend paths, transfer, recycle, event-affecting cleanup, and storage removals. Several branches depend on live balances and refund state, so the actual database work is not tied to the meter and can drift from the charged weight. Meter each conditional operation immediately before it runs, or split this phase into smaller resumable states with exact charges.
|
🔄 AI review updated — Skeptic: VULNERABLE |
| } | ||
|
|
||
| if weight_meter.can_consume(write_weight) { | ||
| TimelockedIndex::<T>::mutate(|index| { |
There was a problem hiding this comment.
[HIGH] Timelocked commitment index cleanup is unbounded and unmetered
TimelockedIndex::<T>::mutate(... retain ...) scans and rewrites the entire global timelock index while charging only writes(1). A subnet with a large or attacker-inflated timelock index can make this on_idle cleanup exceed the supplied weight and keep doing unmetered runtime work. This needs to be made resumable and charged per element, or moved to bounded indexed storage that can be cleared by prefix.
| fn on_idle(_block: BlockNumberFor<T>, limit: Weight) -> Weight { | ||
| let mut weight = Self::remove_data_for_dissolved_networks(limit); | ||
|
|
||
| weight.saturating_accrue(Self::process_network_registration_queue()); |
There was a problem hiding this comment.
[HIGH] Queued network registration runs outside on_idle weight accounting
on_idle passes the full block limit to dissolve cleanup, then unconditionally calls process_network_registration_queue() afterward and just accrues whatever it reports. If cleanup consumes the limit, registration processing still scans the queue and can call set_new_network_state with no remaining-weight gate. This creates an unbounded on_idle path; pass only remaining weight into queue processing and make it stop before any read/write/call that cannot be charged.
|
|
||
| // Handle one hotkey and all its coldkeys or skip the hotkey if the weight is not enough | ||
| // Then we just need to record the hotkey as checkpoint | ||
| for (cold, this_netuid, share_u64f64) in Self::alpha_iter_single_prefix(&hot) { |
There was a problem hiding this comment.
[HIGH] Coldkey-prefix checkpointing is not resumable
The nested coldkey scan checkpoints only the outer hotkey. If the meter runs out inside alpha_iter_single_prefix(&hot), the next pass resumes from the same hotkey and must rescan that hotkey's coldkeys from the beginning. In the total-alpha phase, the partially accumulated status.subnet_total_alpha_value is also persisted, so resuming can double-count already scanned coldkeys. Track the inner coldkey prefix/key in DissolveCleanupStatus, or require enough weight to process an entire hotkey without persisting partial totals.
| .checked_div(total_alpha_value_u128) | ||
| .unwrap_or_default(); | ||
| if leftover > 0 { | ||
| portions.sort_by(|a, b| b.rem.cmp(&a.rem)); |
There was a problem hiding this comment.
[HIGH] Remainder payouts are batch-local
Largest-remainder distribution is calculated over the current cleanup batch (stakers) while total_alpha_value_u128 and pot_u128 are global. When settlement is split across idle passes, each pass sorts only that pass's remainders, so the extra RAO units are not assigned to the globally largest remainders. That can mispay stakers during dissolved-subnet settlement. Persist a global remainder state or split settlement into a deterministic global pass before transfers.
| // Reduce the total networks count. | ||
| TotalNetworks::<T>::mutate(|n: &mut u16| *n = n.saturating_sub(1)); | ||
|
|
||
| TotalStake::<T>::mutate(|total| *total = total.saturating_sub(SubnetTAO::<T>::get(netuid))); |
There was a problem hiding this comment.
[HIGH] Subnet stake is subtracted twice during dissolve cleanup
do_dissolve_network subtracts SubnetTAO from TotalStake when queueing cleanup, and the later settlement phase subtracts the same SubnetTAO again before payouts. Because cleanup is asynchronous, this double-debits global stake accounting for every dissolved subnet. Remove one of the two debits and keep the accounting transition in a single phase.
| [patch.crates-io] | ||
| w3f-bls = { git = "https://github.com/opentensor/bls", branch = "fix-no-std" } | ||
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } |
There was a problem hiding this comment.
[MEDIUM] New git patch dependency and lockfile drift are unrelated to dissolve cleanup
This adds a workspace-wide git patch for core2 and the lockfile also drifts borsh, generic-array, and rust_decimal. The PR is about dissolve cleanup, but this change alters supply-chain resolution for all builds and uses a git source without a crates.io checksum. Split this out with explicit dependency-review justification, or keep the dependency graph unchanged in this PR.
|
|
||
| // Check if there is enought weight to complete all the operations in this function | ||
| // It is the maximum weight that can be consumed by the function. including all potential reads and writes. | ||
| let max_weight = T::DbWeight::get().reads_writes(20, 12); |
There was a problem hiding this comment.
[MEDIUM] Owner refund and final recycling bypass precise weight metering
This phase reserves a fixed reads_writes(20, 12) budget, then conditionally mints, spends, transfers, recycles, writes RAORecycledForRegistration, clears multiple storages, and calls into swap/balance helpers whose actual database work is not tied to the meter. Undercharging here undermines the whole asynchronous cleanup design. Charge each conditional operation as it happens or make these helper calls return/consume their actual weight.
|
🔄 AI review updated — Skeptic: VULNERABLE |
evgeny-s
left a comment
There was a problem hiding this comment.
Hey, @open-junius . Found 2 high-severity issues. Please check them:
PR2412-HIGH-1-alpha-cleanup-offbyone.md
PR2412-HIGH-2-netuid-reuse.md
There was a problem hiding this comment.
Overall looks correct to me even if there is a lot of changes and I'm not able to internalize all of them.
My only concern is about the TotalStake drift reported by AI and also the use of iter_from which uses the next key and can create stale state if a key is not fully cleaned up.
| # core2 was yanked from crates.io; keep the last published sources available via git. | ||
| core2 = { git = "https://github.com/bbqsrc/core2", rev = "545e84bcb0f235b12e21351e0c69767958efe2a7" } | ||
| zstd-sys = { git = "https://github.com/gztensor/zstd-sys" } |
|
|
||
| weight.saturating_accrue(Self::process_network_registration_queue()); | ||
|
|
There was a problem hiding this comment.
Make sense to check that weight < limit before calling the Self::process_network_registration_queue(), wdyt?
|
|
||
| TotalStake::<T>::mutate(|total| *total = total.saturating_sub(SubnetTAO::<T>::get(netuid))); | ||
|
|
||
| dissolved_networks.push(netuid); |
There was a problem hiding this comment.
This remark from the AI may be wise to look at, we don't want TotalStake to drift.
Description
The PR will fix the #2411, and optimize the process of dissolve a network.
Instead of remove all storages related to the netuid once, it will use the on_idle hook to remove these step by step.
and take the weights into consideration.
Related Issue(s)
Type of Change
Breaking Change
If this PR introduces a breaking change, please provide a detailed description of the impact and the migration path for existing applications.
Checklist
./scripts/fix_rust.shto ensure my code is formatted and linted correctlyScreenshots (if applicable)
Please include any relevant screenshots or GIFs that demonstrate the changes made.
Additional Notes
Please provide any additional information or context that may be helpful for reviewers.