Skip to content

feat: integrate LFQ and TMT workflows#27

Open
hjn0415a wants to merge 3 commits into
OpenMS:mainfrom
hjn0415a:feature/integrate-lfq-tmt
Open

feat: integrate LFQ and TMT workflows#27
hjn0415a wants to merge 3 commits into
OpenMS:mainfrom
hjn0415a:feature/integrate-lfq-tmt

Conversation

@hjn0415a

@hjn0415a hjn0415a commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Changes

Unified LFQ/TMT Workflow (feat)

  • Integrated LFQ and TMT workflows into a single configurable pipeline.
  • Added an Analysis Mode selector to switch between LFQ and TMT workflows directly from the UI.
  • Refactored configure() and execution() to dynamically render parameters and execute the appropriate workflow based on the selected analysis mode.
  • Added workflow-specific sample group assignment for both LFQ and TMT analyses.

Result Visualization Integration (feat)

  • Added interactive downstream analysis pages for:

    • Abundance
    • Heatmap
    • Volcano Plot
    • PCA
    • Pathway Analysis
  • Connected visualization modules to both LFQ and TMT outputs for streamlined result exploration.

Related Issues

  • Related to quantitative proteomics workflow integration and downstream result analysis.

Testing Checklist

  • Verified switching between LFQ and TMT analysis modes.
  • Confirmed workflow-specific parameter rendering.
  • Tested LFQ workflow execution.
  • Tested TMT workflow execution.
  • Verified Abundance, Heatmap, Volcano, PCA, and Pathway Analysis pages.

Summary by CodeRabbit

  • New Features
    • Added Pathway Analysis to the Results menu with GO enrichment (BP/CC/MF) and protein abundance tables.
    • Introduced analysis mode selection (LFQ/TMT) with dynamic rendering of results pages (e.g., abundance, heatmap, PCA, volcano).
  • Bug Fixes
    • Improved results processing robustness with better validation of available data and clearer handling when required inputs are missing/insufficient.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The PR introduces a configurable analysis-mode (LFQ vs TMT) across the quantms-web stack. ParameterManager gains a three-layer merge helper for ini/defaults/user-override parameter blending; StreamlitUI.input_TOPP adds multi-instance TOPP tool support with per-instance session state and flag-parameter rendering. load_abundance_data branches on mode: LFQ reads quant CSVs with mzML-group- maps and Welch t-tests; TMT reads tab-delimited CSVs with TMT-group- maps, skip-channel handling, and identical statistics. CommandExecutor.run_topp uses tool_instance_name for parameter selection and implements flag-conditional CLI argument emission. WorkflowTest adds an analysis-mode selector, tunes LFQ defaults (CometAdapter, PercolatorAdapter, ProteomicsLFQ), introduces TMT UI with per-channel group inputs, and branches execution into complete LFQ (with optional EasyPQP and GO enrichment) and TMT pipelines. All results pages (abundance, volcano, PCA, heatmap) become mode-aware with appropriate column/threshold/rendering differences. A new Pathway Analysis page adds GO enrichment via UniProt ID derivation, MyGene.info querying, Fisher's exact test, and Plotly visualization.

Changes

LFQ/TMT Dual-Mode Analysis

Layer / File(s) Summary
ParameterManager merge helper + StreamlitUI multi-instance TOPP support
src/workflow/ParameterManager.py, src/workflow/StreamlitUI.py
ParameterManager.get_merged_params performs ini/defaults/user-override layered merge. StreamlitUI.input_TOPP adds tool_instance_name and flag_parameters, registers per-instance session state, persists _flag_params to params.json, and resolves parameter values through get_merged_params; flag and bool params render as True/False selectboxes.
load_abundance_data: LFQ and TMT data paths
src/common/results_helpers.py
load_abundance_data branches on analysis-mode: LFQ reads quant CSVs, builds mzML-group- group maps, runs Welch t-tests, and applies Benjamini-Hochberg correction; TMT reads tab-delimited CSVs, removes ratio columns, builds TMT-group- maps with skip-channel handling, prepends Group row, and computes the same statistics.
CommandExecutor: tool_instance_name parameter routing
src/workflow/CommandExecutor.py
run_topp adds tool_instance_name parameter and selects merged parameters via get_merged_params. Detects flag parameters from _flag_params, conditionally emits CLI flags based on "enabled" parsing, skips empty/None non-flag params while preserving 0/0.0, and supports multiline string splitting and lowercase boolean serialization.
WorkflowTest: analysis-mode selector, LFQ tuning, TMT UI, and dual pipelines
src/WorkflowTest.py
Adds analysis-mode selectbox routing to render_lfq_tabs() or render_tmt_tabs(). Adjusts LFQ defaults for CometAdapter (instrument, tolerances, peptide indexing), PercolatorAdapter (post_processing_tdc), and ProteomicsLFQ (alignment, quantification strategy). Introduces render_tmt_tabs() wiring IsobaricAnalyzer through ProteinQuantifier with dynamic per-channel TMT-group-* inputs. execution() branches into complete LFQ pipeline (CometAdapter → PercolatorAdapter → IDFilter → optional EasyPQP → ProteomicsLFQ → GO enrichment) and complete TMT pipeline (IsobaricAnalyzer → CometAdapter → PercolatorAdapter → IDFilter → IDMapper → FileMerger → ProteinInference → IDConflictResolver → ProteinQuantifier).
Abundance results page: mode-aware rendering with render_protein_table
content/results_abundance.py
Imports ParameterManager and workflow directory; derives analysis_mode and branches rendering. Introduces render_protein_table helper for conditional ID/sample/stat column selection and is_lfq-specific intensity transformations. LFQ renders protein and PSM tabs; non-LFQ renders pre-processing and protein tabs.
Volcano results page: LFQ and non-LFQ branching with mode-specific thresholds
content/results_volcano.py
Imports ParameterManager and workflow directory; reads analysis-mode and branches volcano-plot paths. LFQ exposes fold-change minimum 0.5 and uses ProteinName hover column; non-LFQ exposes 0.1 minimum and uses protein column. Both compute -log10(p-adj), apply threshold labeling, and differ in chart width rendering.
PCA results page: mode-specific protein column and group mapping
content/results_pca.py
Imports ParameterManager and workflow directory; reads analysis_mode and selects protein column (ProteinName for LFQ, protein for non-LFQ). Replaces group-mapping normalization: LFQ uses direct mzML-stripping transform; non-LFQ parses sample indices from group_map keys and matches substrings with fallback to "Unassigned". Updates chart rendering to width="stretch".
Heatmap results page: mode-aware Z-score clustering and rendering
content/results_heatmap.py
Imports ParameterManager and workflow directory; reads analysis_mode and branches heatmap rendering. Both LFQ and non-LFQ paths compute top-N proteins by variance, apply row-wise Z-score, and hierarchical cluster rows/columns; differ only in Plotly chart sizing (use_container_width=True vs width="stretch").
Pathway Analysis page: GO enrichment via MyGene + navigation update
app.py, content/results_pathway_analysis.py
New results page loads pivot data, derives UniProt IDs, queries MyGene.info for GO annotations (BP/CC/MF), computes Fisher exact test per GO term using foreground/background gene sets, builds horizontal Plotly bar charts, serializes results to go_results.json, and renders Protein Table and GO Enrichment tabs with figure reconstruction from stored JSON. Navigation replaces "Proteomics LFQ" entry with "Pathway Analysis" (icon 📉).

Poem

🐰 A rabbit hops through LFQ and TMT lanes,
Two modes of protein data, each with different refrains,
GO terms are queried from the MyGene nest,
Bar charts bloom for BP, CC, and MF's best,
The workflow branches cleanly — now merged without fright,
New pathways illuminated, with analysis in sight! 📉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: integrate LFQ and TMT workflows' accurately captures the main objective: integrating two distinct quantification workflows (LFQ and TMT) into a configurable pipeline, which is reflected throughout the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

🧹 Nitpick comments (7)
content/results_pathway_analysis.py (1)

68-69: 💤 Low value

Use pandas boolean negation instead of != True comparison.

Per static analysis, comparing to True with != is not recommended. For pandas boolean columns, use the bitwise negation operator.

♻️ Proposed fix
                 res_go = pd.DataFrame(res_list)
                 if "notfound" in res_go.columns:
-                    res_go = res_go[res_go["notfound"] != True]
+                    res_go = res_go[~res_go["notfound"].fillna(False)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_pathway_analysis.py` around lines 68 - 69, In the filtering
operation on the res_go dataframe where the "notfound" column is checked,
replace the `!= True` comparison with the bitwise negation operator `~` to
properly filter pandas boolean columns. This change makes the code more
idiomatic and follows pandas best practices for boolean indexing.
src/common/results_helpers.py (5)

337-338: 💤 Low value

Redundant Path() wrapping and duplicate ParameterManager.

workflow_dir is already a Path (from line 198). Also, a ParameterManager was created at line 201 and can be reused here instead of creating another instance.

Proposed fix
-        parameter_manager = ParameterManager(Path(workflow_dir), "TOPP Workflow")
-        params = parameter_manager.get_parameters_from_json()
+        params = parameter_manager.get_parameters_from_json()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/results_helpers.py` around lines 337 - 338, The ParameterManager
instantiation at line 337 is redundant because a ParameterManager instance was
already created at line 201 and should be reused. Additionally, workflow_dir is
already a Path object from line 198, so wrapping it with Path() again is
unnecessary. Remove the Path() wrapper around workflow_dir and replace the new
ParameterManager instantiation with a reference to the existing
parameter_manager instance created earlier to obtain the parameters from JSON.

225-226: 💤 Low value

Redundant ParameterManager instantiation.

A ParameterManager was already created at line 201 and assigned to parameter_manager. This creates a duplicate instance with a different workflow name argument (missing here).

Proposed fix
-        # Get group mapping from parameters
-        param_manager = ParameterManager(workflow_dir)
-        params = param_manager.get_parameters_from_json()
+        # Get group mapping from parameters (reuse existing parameter_manager)
+        params = parameter_manager.get_parameters_from_json()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/results_helpers.py` around lines 225 - 226, The ParameterManager
is being instantiated twice - once as parameter_manager at line 201 and again as
param_manager at lines 225-226. Remove the redundant ParameterManager
instantiation and instead reuse the existing parameter_manager variable that was
already created earlier in the code, then update the get_parameters_from_json()
call to use the existing parameter_manager instance.

323-326: ⚡ Quick win

Catch specific CSV parsing exceptions for TMT branch.

Same issue as the LFQ branch - bare Exception catch masks debugging.

Proposed fix
         try:
             df = pd.read_csv(csv_file, sep="\t", comment="#", engine="python")
-        except Exception:
+        except (pd.errors.ParserError, pd.errors.EmptyDataError, FileNotFoundError):
             return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/results_helpers.py` around lines 323 - 326, The exception handling
in the pd.read_csv call within the TMT branch function is too broad by catching
a bare Exception, which masks actual parsing errors and complicates debugging.
Replace the generic Exception catch with specific exceptions that pd.read_csv
can raise, such as FileNotFoundError for missing files and pd.errors.ParserError
for parsing issues. This will allow proper error handling and logging of
specific failure modes while still gracefully returning None when appropriate.

Source: Linters/SAST tools


216-219: ⚡ Quick win

Catch specific CSV parsing exceptions.

Catching bare Exception masks unexpected errors and makes debugging harder. For CSV parsing, consider catching pd.errors.ParserError and pd.errors.EmptyDataError.

Proposed fix
         try:
             df = pd.read_csv(csv_file)
-        except Exception:
+        except (pd.errors.ParserError, pd.errors.EmptyDataError, FileNotFoundError):
             return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/results_helpers.py` around lines 216 - 219, The try-except block
around pd.read_csv(csv_file) is catching a bare Exception which masks unexpected
errors and makes debugging difficult. Replace the generic Exception catch with
specific pandas exceptions that are likely to occur during CSV parsing. Catch
pd.errors.ParserError and pd.errors.EmptyDataError specifically, either in a
single except clause with both exceptions or using multiple except blocks, while
allowing other unexpected exceptions to propagate and surface actual errors.

Source: Linters/SAST tools


379-482: ⚖️ Poor tradeoff

Redundant group_map check and unclear control flow.

The check if group_map: at line 379 is redundant since line 365-367 already returns None when group_map is empty. The nested structure and fallthrough to line 483 makes the control flow hard to follow.

Consider flattening:

Proposed refactor
         if not group_map:
             st.warning("⚠️ Group mapping information is missing. Please configure sample groups in the Setup page.")
             return None
+
+        if len(set(group_map.values())) < 2:
+            st.warning("⚠️ At least two distinct groups are required for statistical analysis.")
+            return None
         
         if exclude_indices:
             # ... existing column dropping logic ...
-
-        if group_map:
-            # ... all the nested logic ...
-            if group_map and len(set(group_map.values())) >= 2:
-                # ... stats computation ...
-                return pivot_df, expr_df, group_map
-            else:
-                st.warning("⚠️ At least two distinct groups are required for statistical analysis.")
-        else:
-            st.warning("⚠️ No group mapping information is set. Please check the Configure page.")
-        return None
+        # ... stats computation (now at top level) ...
+        return pivot_df, expr_df, group_map
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/results_helpers.py` around lines 379 - 482, The redundant check
for group_map at line 379 combined with the nested if condition at line 410
creates unclear control flow. Since the code already returns None at lines
365-367 when group_map is empty, the outer if group_map check is unnecessary.
Remove the entire outer if group_map block starting at line 379 and dedent all
the contained code. Replace it with just the inner condition checking if
group_map has at least two distinct groups, and remove the trailing else clause
that warns about no group mapping since that case is already handled by the
earlier return statement. This will flatten the control structure and eliminate
the redundant check.
content/results_volcano.py (1)

43-192: ⚖️ Poor tradeoff

Consider extracting shared volcano plot logic.

The LFQ and non-LFQ branches duplicate ~50 lines of nearly identical code. The only meaningful differences are the protein column name, slider minimum, and layout style. Consider extracting the shared logic into a helper function.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_volcano.py` around lines 43 - 192, Extract the shared volcano
plot creation logic into a helper function to eliminate code duplication between
the LFQ and non-LFQ branches. The helper function should accept parameters for
the differences: the protein column name (ProteinName vs protein), the fc_thresh
minimum value (0.5 vs 0.1), and the layout style parameter (use_container_width
vs width). Move the common code that creates volcano_df, calculates
neg_log10_padj, sets Significance categories, creates the scatter plot with
fig_volcano, adds threshold lines, sets x-axis range symmetry, updates layout,
displays counts, and shows navigation links into this helper function. Then
replace both the if and else branches with calls to this helper function,
passing the appropriate parameter values for each analysis mode.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@content/results_abundance.py`:
- Around line 74-76: The code in the lambda function within the apply method on
pivot_df uses np.log2() but the numpy module is not imported at the top of the
file, causing a NameError when this code executes. Add import numpy as np to the
import statements at the top of the results_abundance.py file to resolve the
undefined np reference.

In `@content/results_heatmap.py`:
- Line 122: The st.plotly_chart function call contains an invalid width
parameter set to "stretch". Remove the width="stretch" parameter from the
st.plotly_chart function call, as this is not a supported parameter value for
this function.
- Around line 41-132: The code contains nearly complete duplication between the
if analysis_mode == "LFQ" and else branches, with all the logic for slider
creation with key "heatmap_top_n", variance calculation using var(), top
proteins selection, heatmap normalization and z-score calculation, hierarchical
clustering with linkage() and leaves_list(), and figure creation with
px.imshow() being identical. Extract all this common code outside the
conditional statement to execute once before the if-else block, and only keep
the different st.plotly_chart() parameter call (use_container_width=True versus
width="stretch") within the conditional, or consolidate that parameter as well
if they achieve the same result.

In `@content/results_pathway_analysis.py`:
- Line 18: The page title in the st.title() function call does not reflect the
actual purpose of this page, which is Pathway Analysis. Change the title from
"ProteomicsLFQ Results" to a more appropriate title that clearly indicates this
is the Pathway Analysis page, such as "Pathway Analysis Results" or similar, to
align the page title with its actual functionality and avoid confusing users.
- Around line 191-197: The _run_go_enrichment() function is called
unconditionally on every page load, which causes unnecessary external API calls
to MyGene.info even when cached results already exist in go_results.json. Add a
check before calling _run_go_enrichment() to verify whether the go_json_file
already exists and contains valid cached results. Only execute
_run_go_enrichment() when the cached results file does not exist or is invalid,
allowing subsequent page loads to skip the expensive computation and reuse the
previously saved results.

In `@content/results_pca.py`:
- Line 110: The st.plotly_chart function call in the results_pca.py file
contains an invalid width parameter with value "stretch". Remove the
width="stretch" argument from the st.plotly_chart call entirely, or if you need
the chart to span the full container width, replace it with the valid parameter
use_container_width=True instead.
- Around line 112-113: There is a duplicate st.markdown call displaying the
"Proteins used" message in the results_pca.py file. Remove one of the two
identical st.markdown statements that display the proteins count and top N
filter information. Keep only one instance of the line that constructs the
message with expr_df_pca.shape[0] and top_n variables.

In `@content/results_volcano.py`:
- Around line 31-33: The st.info() and st.stop() function calls have excessive
indentation that incorrectly nests them deeper than the if pivot_df.empty check.
Remove one level of indentation from both the st.info("No data available for
volcano plot.") and st.stop() lines so they align with the if statement at the
same indentation level, ensuring these statements execute as the direct body of
the if condition.
- Line 180: The st.plotly_chart() call with the fig_volcano argument is using an
invalid width="stretch" parameter that is not supported in Streamlit 1.43.0.
Replace the width="stretch" parameter with use_container_width=True to properly
stretch the chart to fill the container width.

In `@src/workflow/ParameterManager.py`:
- Line 133: In the get_merged_params method, update the type annotation for the
ini_params parameter to explicitly indicate it can be None by changing it from
dict = None to dict | None = None. This makes the optional nature of the
parameter explicit to satisfy type checking requirements.

In `@src/workflow/StreamlitUI.py`:
- Around line 836-840: The issue is that widget keys now contain the
tool_instance_name (e.g., CometAdapter-TMT:1:...), and when
ParameterManager.save_parameters() extracts the tool name from these keys and
calls create_ini(tool), it passes the instance name instead of the base tool
name, causing it to look for non-existent files like CometAdapter-TMT.ini and
skip saving parameter overrides. In the ParameterManager.save_parameters()
method, before calling create_ini(tool), resolve the instance name back to its
base tool name by extracting just the tool portion (the part before the :1: or
instance identifier) so that it correctly loads and saves to the actual ini file
for that tool.
- Around line 872-877: In the bool branch where isinstance(p["value"], bool) is
already confirmed on line 872, the subsequent type check for string
(type(p["value"]) == str) is unreachable and will never be true. Simplify the
bool_value assignment by directly using p["value"] since it is guaranteed to be
a boolean at that point, removing the unnecessary conditional logic that checks
if it's a string.
- Around line 616-621: In the input_TOPP function signature, replace the mutable
default arguments with None and add explicit Optional type annotations. Change
flag_parameters from List[str] = [] to Optional[List[str]] = None, change
custom_defaults from dict = {} to Optional[dict] = None, and ensure
tool_instance_name has the proper Optional[str] type annotation. Then, inside
the function body, add initialization logic that checks if each parameter is
None and initializes them to their intended empty values (empty list for
flag_parameters, empty dict for custom_defaults) only when needed.

In `@src/WorkflowTest.py`:
- Around line 675-690: The WorkflowTest.py file contains unresolved Git merge
conflict markers that prevent the code from being parsed. You need to manually
resolve all merge conflicts in the execution() method by examining each conflict
region (marked by <<<<<<< HEAD, =======, and >>>>>>> upstream/main) and choosing
the correct code from either branch or merging the logic appropriately. Delete
all conflict marker lines after resolving each section, ensuring the final code
maintains proper syntax and logic flow. The conflicts occur in multiple
locations throughout the file including directory setup, CometAdapter execution,
ProteomicsLFQ input handling, report generation, and the return statement.
- Around line 390-391: Remove the debug statement
st.write(Path(self.workflow_dir, "results")) from the TMT tab configuration UI
code. This line is displaying raw path information to users and should be
deleted entirely as it serves no functional purpose in production.
- Around line 425-436: The custom_defaults dictionary contains a duplicate key
"PeptideIndexing:unmatched_action" that appears on both line 425 and line 436,
where the second occurrence silently overwrites the first. Remove one of the
duplicate "PeptideIndexing:unmatched_action": "warn" entries from the dictionary
to eliminate the copy-paste error.

---

Nitpick comments:
In `@content/results_pathway_analysis.py`:
- Around line 68-69: In the filtering operation on the res_go dataframe where
the "notfound" column is checked, replace the `!= True` comparison with the
bitwise negation operator `~` to properly filter pandas boolean columns. This
change makes the code more idiomatic and follows pandas best practices for
boolean indexing.

In `@content/results_volcano.py`:
- Around line 43-192: Extract the shared volcano plot creation logic into a
helper function to eliminate code duplication between the LFQ and non-LFQ
branches. The helper function should accept parameters for the differences: the
protein column name (ProteinName vs protein), the fc_thresh minimum value (0.5
vs 0.1), and the layout style parameter (use_container_width vs width). Move the
common code that creates volcano_df, calculates neg_log10_padj, sets
Significance categories, creates the scatter plot with fig_volcano, adds
threshold lines, sets x-axis range symmetry, updates layout, displays counts,
and shows navigation links into this helper function. Then replace both the if
and else branches with calls to this helper function, passing the appropriate
parameter values for each analysis mode.

In `@src/common/results_helpers.py`:
- Around line 337-338: The ParameterManager instantiation at line 337 is
redundant because a ParameterManager instance was already created at line 201
and should be reused. Additionally, workflow_dir is already a Path object from
line 198, so wrapping it with Path() again is unnecessary. Remove the Path()
wrapper around workflow_dir and replace the new ParameterManager instantiation
with a reference to the existing parameter_manager instance created earlier to
obtain the parameters from JSON.
- Around line 225-226: The ParameterManager is being instantiated twice - once
as parameter_manager at line 201 and again as param_manager at lines 225-226.
Remove the redundant ParameterManager instantiation and instead reuse the
existing parameter_manager variable that was already created earlier in the
code, then update the get_parameters_from_json() call to use the existing
parameter_manager instance.
- Around line 323-326: The exception handling in the pd.read_csv call within the
TMT branch function is too broad by catching a bare Exception, which masks
actual parsing errors and complicates debugging. Replace the generic Exception
catch with specific exceptions that pd.read_csv can raise, such as
FileNotFoundError for missing files and pd.errors.ParserError for parsing
issues. This will allow proper error handling and logging of specific failure
modes while still gracefully returning None when appropriate.
- Around line 216-219: The try-except block around pd.read_csv(csv_file) is
catching a bare Exception which masks unexpected errors and makes debugging
difficult. Replace the generic Exception catch with specific pandas exceptions
that are likely to occur during CSV parsing. Catch pd.errors.ParserError and
pd.errors.EmptyDataError specifically, either in a single except clause with
both exceptions or using multiple except blocks, while allowing other unexpected
exceptions to propagate and surface actual errors.
- Around line 379-482: The redundant check for group_map at line 379 combined
with the nested if condition at line 410 creates unclear control flow. Since the
code already returns None at lines 365-367 when group_map is empty, the outer if
group_map check is unnecessary. Remove the entire outer if group_map block
starting at line 379 and dedent all the contained code. Replace it with just the
inner condition checking if group_map has at least two distinct groups, and
remove the trailing else clause that warns about no group mapping since that
case is already handled by the earlier return statement. This will flatten the
control structure and eliminate the redundant check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 08db369f-2cbf-4c06-981c-b3365985aa6a

📥 Commits

Reviewing files that changed from the base of the PR and between 46bf208 and 4c7c158.

📒 Files selected for processing (10)
  • app.py
  • content/results_abundance.py
  • content/results_heatmap.py
  • content/results_pathway_analysis.py
  • content/results_pca.py
  • content/results_volcano.py
  • src/WorkflowTest.py
  • src/common/results_helpers.py
  • src/workflow/ParameterManager.py
  • src/workflow/StreamlitUI.py

Comment on lines +74 to +76
pivot_df["Intensity"] = pivot_df[sample_cols].apply(
lambda row: [np.log2(v + 1) for v in row], axis=1
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

np is undefined - missing numpy import.

The non-LFQ branch uses np.log2() but numpy is not imported in this file, causing a NameError at runtime when TMT mode is used.

Proposed fix - add import at top of file
+import numpy as np
 from src.workflow.ParameterManager import ParameterManager
🧰 Tools
🪛 GitHub Actions: Pylint / 0_build.txt

[error] 75-75: pylint E0602: Undefined variable 'np' (undefined-variable)

🪛 GitHub Actions: Pylint / build

[error] 75-75: pylint E0602: Undefined variable 'np' (undefined-variable)

🪛 Ruff (0.15.17)

[error] 75-75: Undefined name np

(F821)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_abundance.py` around lines 74 - 76, The code in the lambda
function within the apply method on pivot_df uses np.log2() but the numpy module
is not imported at the top of the file, causing a NameError when this code
executes. Add import numpy as np to the import statements at the top of the
results_abundance.py file to resolve the undefined np reference.

Source: Linters/SAST tools

Comment on lines +41 to +132
if analysis_mode == "LFQ":
top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n")

heatmap_clustered = heatmap_z.iloc[row_order, col_order]
var_series = expr_df.var(axis=1)
top_proteins = var_series.sort_values(ascending=False).head(top_n).index
heatmap_df = expr_df.loc[top_proteins]
heatmap_z = heatmap_df.sub(heatmap_df.mean(axis=1), axis=0).div(heatmap_df.std(axis=1), axis=0)
heatmap_z = heatmap_z.replace([np.inf, -np.inf], np.nan).dropna()

fig_heatmap = px.imshow(
heatmap_clustered,
labels=dict(x="Sample", y="Protein", color="Z-score"),
aspect="auto",
color_continuous_scale=[[0.0, "#3b6fb6"], [0.5, "white"], [1.0, "#b40426"]],
zmin=-3, zmax=3
)
if not heatmap_z.empty:
row_linkage = linkage(pdist(heatmap_z.values), method="average")
row_order = leaves_list(row_linkage)

fig_heatmap.update_layout(
height=700,
xaxis={'side': 'bottom'},
yaxis={'side': 'left'}
)
col_linkage = linkage(pdist(heatmap_z.T.values), method="average")
col_order = leaves_list(col_linkage)

fig_heatmap.update_xaxes(tickfont=dict(size=10))
fig_heatmap.update_yaxes(tickfont=dict(size=8))
heatmap_clustered = heatmap_z.iloc[row_order, col_order]

st.plotly_chart(fig_heatmap, use_container_width=True)
fig_heatmap = px.imshow(
heatmap_clustered,
labels=dict(x="Sample", y="Protein", color="Z-score"),
aspect="auto",
color_continuous_scale=[[0.0, "#3b6fb6"], [0.5, "white"], [1.0, "#b40426"]],
zmin=-3, zmax=3
)

fig_heatmap.update_layout(
height=700,
xaxis={'side': 'bottom'},
yaxis={'side': 'left'}
)

fig_heatmap.update_xaxes(tickfont=dict(size=10))
fig_heatmap.update_yaxes(tickfont=dict(size=8))

st.plotly_chart(fig_heatmap, use_container_width=True)
else:
st.warning("Insufficient data to generate the heatmap.")

st.markdown("---")
st.markdown("**Other visualizations:**")
col1, col2 = st.columns(2)
with col1:
st.page_link("content/results_volcano.py", label="Volcano Plot", icon="🌋")
with col2:
st.page_link("content/results_pca.py", label="PCA", icon="📊")
else:
st.warning("Insufficient data to generate the heatmap.")

st.markdown("---")
st.markdown("**Other visualizations:**")
col1, col2 = st.columns(2)
with col1:
st.page_link("content/results_volcano.py", label="Volcano Plot", icon="🌋")
with col2:
st.page_link("content/results_pca.py", label="PCA", icon="📊")
top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n")

var_series = expr_df.var(axis=1)
top_proteins = var_series.sort_values(ascending=False).head(top_n).index
heatmap_df = expr_df.loc[top_proteins]
heatmap_z = heatmap_df.sub(heatmap_df.mean(axis=1), axis=0).div(heatmap_df.std(axis=1), axis=0)
heatmap_z = heatmap_z.replace([np.inf, -np.inf], np.nan).dropna()

if not heatmap_z.empty:
row_linkage = linkage(pdist(heatmap_z.values), method="average")
row_order = leaves_list(row_linkage)

col_linkage = linkage(pdist(heatmap_z.T.values), method="average")
col_order = leaves_list(col_linkage)

heatmap_clustered = heatmap_z.iloc[row_order, col_order]

fig_heatmap = px.imshow(
heatmap_clustered,
labels=dict(x="Sample", y="Protein", color="Z-score"),
aspect="auto",
color_continuous_scale=[[0.0, "#3b6fb6"], [0.5, "white"], [1.0, "#b40426"]],
zmin=-3, zmax=3
)

fig_heatmap.update_layout(
height=700,
xaxis={'side': 'bottom'},
yaxis={'side': 'left'}
)

fig_heatmap.update_xaxes(tickfont=dict(size=10))
fig_heatmap.update_yaxes(tickfont=dict(size=8))

st.plotly_chart(fig_heatmap, width="stretch")
else:
st.warning("Insufficient data to generate the heatmap.")

st.markdown("---")
st.markdown("**Other visualizations:**")
col1, col2 = st.columns(2)
with col1:
st.page_link("content/results_volcano.py", label="Volcano Plot", icon="🌋")
with col2:
st.page_link("content/results_pca.py", label="PCA", icon="📊")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Eliminate near-complete code duplication.

The LFQ and non-LFQ branches (lines 41-86 vs 87-132) are nearly identical - the only difference is the chart rendering call. This should be a single code path.

Proposed refactor
-if analysis_mode == "LFQ":
-    top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n")
-    # ... 40+ lines of identical code ...
-    st.plotly_chart(fig_heatmap, use_container_width=True)
-    # ... footer links ...
-else:
-    top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n")
-    # ... 40+ lines of identical code ...
-    st.plotly_chart(fig_heatmap, width="stretch")
-    # ... footer links ...
+top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n")
+
+var_series = expr_df.var(axis=1)
+top_proteins = var_series.sort_values(ascending=False).head(top_n).index
+heatmap_df = expr_df.loc[top_proteins]
+heatmap_z = heatmap_df.sub(heatmap_df.mean(axis=1), axis=0).div(heatmap_df.std(axis=1), axis=0)
+heatmap_z = heatmap_z.replace([np.inf, -np.inf], np.nan).dropna()
+
+if not heatmap_z.empty:
+    # ... clustering and figure creation ...
+    st.plotly_chart(fig_heatmap, use_container_width=True)
+else:
+    st.warning("Insufficient data to generate the heatmap.")
+
+st.markdown("---")
+st.markdown("**Other visualizations:**")
+# ... links ...
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_heatmap.py` around lines 41 - 132, The code contains nearly
complete duplication between the if analysis_mode == "LFQ" and else branches,
with all the logic for slider creation with key "heatmap_top_n", variance
calculation using var(), top proteins selection, heatmap normalization and
z-score calculation, hierarchical clustering with linkage() and leaves_list(),
and figure creation with px.imshow() being identical. Extract all this common
code outside the conditional statement to execute once before the if-else block,
and only keep the different st.plotly_chart() parameter call
(use_container_width=True versus width="stretch") within the conditional, or
consolidate that parameter as well if they achieve the same result.

fig_heatmap.update_xaxes(tickfont=dict(size=10))
fig_heatmap.update_yaxes(tickfont=dict(size=8))

st.plotly_chart(fig_heatmap, width="stretch")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalid width parameter for st.plotly_chart.

Same issue as other pages - width="stretch" is not valid.

Proposed fix
-        st.plotly_chart(fig_heatmap, width="stretch")
+        st.plotly_chart(fig_heatmap, use_container_width=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
st.plotly_chart(fig_heatmap, width="stretch")
st.plotly_chart(fig_heatmap, use_container_width=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_heatmap.py` at line 122, The st.plotly_chart function call
contains an invalid width parameter set to "stretch". Remove the width="stretch"
parameter from the st.plotly_chart function call, as this is not a supported
parameter value for this function.

# Page setup
# ================================
params = page_setup()
st.title("ProteomicsLFQ Results")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Page title does not match page purpose.

The page title is "ProteomicsLFQ Results" but this is the Pathway Analysis page. This creates user confusion.

🧹 Proposed fix
 params = page_setup()
-st.title("ProteomicsLFQ Results")
+st.title("Pathway Analysis")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_pathway_analysis.py` at line 18, The page title in the
st.title() function call does not reflect the actual purpose of this page, which
is Pathway Analysis. Change the title from "ProteomicsLFQ Results" to a more
appropriate title that clearly indicates this is the Pathway Analysis page, such
as "Pathway Analysis Results" or similar, to align the page title with its
actual functionality and avoid confusing users.

Comment on lines +191 to +197
go_json_file = results_dir / "go-terms" / "go_results.json"

go_input_df = pivot_df.copy()
if "ProteinName" in go_input_df.columns:
go_input_df = go_input_df.rename(columns={"ProteinName": "protein"})

_run_go_enrichment(go_input_df, results_dir)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

GO enrichment runs on every page load, even when results already exist.

_run_go_enrichment() is called unconditionally every time the page loads (line 197), which triggers an external API call to MyGene.info and re-computes the entire analysis. Since results are saved to go_results.json, the function should skip execution when valid cached results exist.

⚡ Proposed fix
 go_json_file = results_dir / "go-terms" / "go_results.json"

+# Only run GO enrichment if results don't already exist
+if not go_json_file.exists():
     go_input_df = pivot_df.copy()
     if "ProteinName" in go_input_df.columns:
         go_input_df = go_input_df.rename(columns={"ProteinName": "protein"})

     _run_go_enrichment(go_input_df, results_dir)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/results_pathway_analysis.py` around lines 191 - 197, The
_run_go_enrichment() function is called unconditionally on every page load,
which causes unnecessary external API calls to MyGene.info even when cached
results already exist in go_results.json. Add a check before calling
_run_go_enrichment() to verify whether the go_json_file already exists and
contains valid cached results. Only execute _run_go_enrichment() when the cached
results file does not exist or is invalid, allowing subsequent page loads to
skip the expensive computation and reuse the previously saved results.

Comment on lines +836 to +840
# get key and name – use tool_instance_name in session state key
key_str = p['key'].decode()
if tool_instance_name != topp_tool_name:
key_str = key_str.replace(f"{topp_tool_name}:1:", f"{tool_instance_name}:1:", 1)
key = f"{self.parameter_manager.topp_param_prefix}{key_str}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve instance names before saving TOPP parameter overrides.

These widget keys now persist as ...{tool_instance_name}:1:...; ParameterManager.save_parameters() derives the tool name from that key and calls create_ini(tool). For instances such as CometAdapter-TMT, it attempts to create/load a non-existent CometAdapter-TMT.ini and skips saving user overrides, so TMT parameter edits are lost.

🐛 Proposed fix in `ParameterManager.save_parameters()`
-        # for each TOPP tool, open the ini file
-        for tool in current_topp_tools:
-            if not self.create_ini(tool):
+        # for each TOPP tool instance, open the real executable's ini file
+        tool_instance_map = st.session_state.get("_topp_tool_instance_map", {})
+        for tool_instance in current_topp_tools:
+            tool = tool_instance_map.get(tool_instance, tool_instance)
+            if not self.create_ini(tool):
                 # Could not create ini file - skip this tool
                 continue
             ini_path = Path(self.ini_dir, f"{tool}.ini")
-            if tool not in json_params:
-                json_params[tool] = {}
+            if tool_instance not in json_params:
+                json_params[tool_instance] = {}
@@
-                if key.startswith(f"{self.topp_param_prefix}{tool}:1:"):
+                if key.startswith(f"{self.topp_param_prefix}{tool_instance}:1:"):
@@
-                    ini_key = key.replace(self.topp_param_prefix, "").encode()
+                    ini_key_str = key.replace(self.topp_param_prefix, "")
+                    if tool_instance != tool:
+                        ini_key_str = ini_key_str.replace(
+                            f"{tool_instance}:1:",
+                            f"{tool}:1:",
+                            1,
+                        )
+                    ini_key = ini_key_str.encode()
@@
-                        or (key.split(":1:")[1] in json_params[tool])
+                        or (key.split(":1:")[1] in json_params[tool_instance])
                         or (is_list_param and not value)  # Always save empty list params
                     ):
                         # store non-default value
-                        json_params[tool][key.split(":1:")[1]] = value
+                        json_params[tool_instance][key.split(":1:")[1]] = value
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 836-836: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/workflow/StreamlitUI.py` around lines 836 - 840, The issue is that widget
keys now contain the tool_instance_name (e.g., CometAdapter-TMT:1:...), and when
ParameterManager.save_parameters() extracts the tool name from these keys and
calls create_ini(tool), it passes the instance name instead of the base tool
name, causing it to look for non-existent files like CometAdapter-TMT.ini and
skip saving parameter overrides. In the ParameterManager.save_parameters()
method, before calling create_ini(tool), resolve the instance name back to its
base tool name by extracting just the tool portion (the part before the :1: or
instance identifier) so that it correctly loads and saves to the actual ini file
for that tool.

Comment on lines +872 to +877
elif isinstance(p["value"], bool):
bool_value = (
(p["value"] == "true")
if type(p["value"]) == str
else p["value"]
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify direct type comparisons flagged by Ruff E721 are removed.
# Expect: no matches after switching to `isinstance()` or removing unreachable checks.
rg -nP 'type\(.+\)\s*==\s*str' src/workflow/StreamlitUI.py

Repository: OpenMS/quantms-web

Length of output: 121


🏁 Script executed:

sed -n '865,885p' src/workflow/StreamlitUI.py | cat -n

Repository: OpenMS/quantms-web

Length of output: 1193


Remove the unreachable string type check in the bool branch.

The isinstance(p["value"], bool) check on line 872 guarantees that p["value"] is a bool at that point. The subsequent type(p["value"]) == str check is unreachable and will never be true, making the conditional unnecessary.

🔧 Proposed fix
                     # bools
                     elif isinstance(p["value"], bool):
-                        bool_value = (
-                            (p["value"] == "true")
-                            if type(p["value"]) == str
-                            else p["value"]
-                        )
+                        bool_value = p["value"]
                         cols[i].selectbox(
🧰 Tools
🪛 Ruff (0.15.17)

[error] 875-875: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/workflow/StreamlitUI.py` around lines 872 - 877, In the bool branch where
isinstance(p["value"], bool) is already confirmed on line 872, the subsequent
type check for string (type(p["value"]) == str) is unreachable and will never be
true. Simplify the bool_value assignment by directly using p["value"] since it
is guaranteed to be a boolean at that point, removing the unnecessary
conditional logic that checks if it's a string.

Source: Linters/SAST tools

Comment thread src/WorkflowTest.py
Comment on lines +390 to +391
st.write(Path(self.workflow_dir, "results"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove debug statement.

st.write(Path(self.workflow_dir, "results")) appears to be a debug statement that was left in the TMT tab configuration UI. This will display raw path information to users in production.

🧹 Proposed fix
             st.info(comet_info)

-            st.write(Path(self.workflow_dir, "results"))
-
             comet_include = [":enzyme", "missed_cleavages", "fixed_modifications", "variable_modifications",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/WorkflowTest.py` around lines 390 - 391, Remove the debug statement
st.write(Path(self.workflow_dir, "results")) from the TMT tab configuration UI
code. This line is displaying raw path information to users and should be
deleted entirely as it serves no functional purpose in production.

Comment thread src/WorkflowTest.py
Comment on lines +425 to +436
"PeptideIndexing:unmatched_action": "warn",
"max_variable_mods_in_peptide": 3,
"precursor_mass_tolerance": 4.5,
"isotope_error": "0/1",
"precursor_error_units": "ppm",
"num_hits": 1,
"num_enzyme_termini": "fully",
"fragment_bin_offset": 0.0,
"minimum_peaks": 10,
"precursor_charge": "2:4",
"fragment_mass_tolerance": 0.015,
"PeptideIndexing:unmatched_action": "warn",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate dictionary key overwrites earlier value.

"PeptideIndexing:unmatched_action": "warn" is specified twice in the custom_defaults dict (lines 425 and 436). The second occurrence silently overwrites the first. While both have the same value here, this indicates copy-paste error and could cause subtle bugs if values diverge.

🧹 Proposed fix
                     "precursor_error_units": "ppm",
                     "num_hits": 1,
                     "num_enzyme_termini": "fully",
                     "fragment_bin_offset": 0.0,
                     "minimum_peaks": 10,
                     "precursor_charge": "2:4",
                     "fragment_mass_tolerance": 0.015,
-                    "PeptideIndexing:unmatched_action": "warn",
                     "variable_modifications": "Oxidation (M)\nAcetyl (Protein N-term)\nTMT6plex (K)\nTMT6plex (N-term)",
                     "debug": 0,
                     "force": True,
                 },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"PeptideIndexing:unmatched_action": "warn",
"max_variable_mods_in_peptide": 3,
"precursor_mass_tolerance": 4.5,
"isotope_error": "0/1",
"precursor_error_units": "ppm",
"num_hits": 1,
"num_enzyme_termini": "fully",
"fragment_bin_offset": 0.0,
"minimum_peaks": 10,
"precursor_charge": "2:4",
"fragment_mass_tolerance": 0.015,
"PeptideIndexing:unmatched_action": "warn",
"PeptideIndexing:unmatched_action": "warn",
"max_variable_mods_in_peptide": 3,
"precursor_mass_tolerance": 4.5,
"isotope_error": "0/1",
"precursor_error_units": "ppm",
"num_hits": 1,
"num_enzyme_termini": "fully",
"fragment_bin_offset": 0.0,
"minimum_peaks": 10,
"precursor_charge": "2:4",
"fragment_mass_tolerance": 0.015,
"variable_modifications": "Oxidation (M)\nAcetyl (Protein N-term)\nTMT6plex (K)\nTMT6plex (N-term)",
"debug": 0,
"force": True,
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/WorkflowTest.py` around lines 425 - 436, The custom_defaults dictionary
contains a duplicate key "PeptideIndexing:unmatched_action" that appears on both
line 425 and line 436, where the second occurrence silently overwrites the
first. Remove one of the duplicate "PeptideIndexing:unmatched_action": "warn"
entries from the dictionary to eliminate the copy-paste error.

Comment thread src/WorkflowTest.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/WorkflowTest.py (3)

688-688: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Accidental overwrite of results_dir displays wrong path to users.

Line 682 sets results_dir = Path(self.workflow_dir, "results"), then line 688 overwrites it to "input-files". At line 1154, st.code(str(results_dir)) displays "input-files" instead of the actual results directory.

🐛 Proposed fix: Remove the errant line
             for d in [comet_dir, perc_dir, filter_dir, quant_dir]:
                 d.mkdir(parents=True, exist_ok=True)

             self.logger.log("📁 Output directories created")
-
-            results_dir = Path(self.workflow_dir, "input-files")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/WorkflowTest.py` at line 688, The `results_dir` variable is being
overwritten at line 688 with an incorrect path to "input-files" after being
properly set to the "results" directory at line 682. This causes the wrong path
to be displayed when `st.code(str(results_dir))` is called later. Remove the
accidental reassignment of `results_dir` at line 688 that sets it to the
"input-files" path.

1608-1610: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

TMT execution branch does not return a success value.

The execution() method returns bool, and the LFQ branch correctly returns True at line 1161. However, the TMT branch ends without a return statement, causing an implicit None return. Callers checking the return value will interpret this as failure.

🐛 Proposed fix
             self.logger.log("📄 Generating protein table...")
             
             self.logger.log("🎉 WORKFLOW FINISHED")
+            return True
 
     `@st.fragment`
     def results(self) -> None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/WorkflowTest.py` around lines 1608 - 1610, The `execution()` method in
the TMT execution branch is missing a return statement after the "🎉 WORKFLOW
FINISHED" log message at line 1610. While the LFQ branch correctly returns
`True` at line 1161, the TMT branch ends without an explicit return, causing an
implicit `None` return instead of a boolean. Add `return True` after the final
logger.log call in the TMT branch to ensure the method consistently returns a
success boolean value that callers can properly check.

457-458: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

TMT decoy pattern mismatch will cause incorrect FDR calculation.

DecoyDatabase at line 368 uses "decoy_string": "rev_", but PercolatorAdapter-TMT uses "decoy_pattern": "DECOY_". Similarly, ProteinInference-TMT at line 507 uses "picked_decoy_string": "DECOY_".

Percolator will not recognize decoys prefixed with rev_ when searching for DECOY_, causing all decoys to be treated as targets and leading to incorrect FDR estimates.

🐛 Proposed fix: Use consistent decoy string
             self.ui.input_TOPP(
                 "PercolatorAdapter",
                 custom_defaults={
                     "subset_max_train": 300000,
-                    "decoy_pattern": "DECOY_",
+                    "decoy_pattern": "rev_",
                     "score_type": "pep",
                     "post_processing_tdc": True,
                     "debug": 0,
                 },

Also fix ProteinInference-TMT at line 507:

             self.ui.input_TOPP(
                 "ProteinInference",
                 custom_defaults={
                     "threads": 8,
-                    "picked_decoy_string": "DECOY_",
+                    "picked_decoy_string": "rev_",
                     "picked_fdr": "true",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/WorkflowTest.py` around lines 457 - 458, The decoy string prefix used in
DecoyDatabase component at line 368 is inconsistent with the decoy patterns
expected by PercolatorAdapter-TMT (around line 457-458) and ProteinInference-TMT
(at line 507). The DecoyDatabase is using "rev_" as the decoy prefix, but
PercolatorAdapter-TMT is looking for "DECOY_" pattern, which means decoys won't
be recognized correctly during FDR calculation. Update the "decoy_string"
parameter in the DecoyDatabase component to use "DECOY_" instead of "rev_", and
ensure ProteinInference-TMT at line 507 uses the same consistent decoy string
prefix so that all components reference the same decoy pattern throughout the
workflow.
src/workflow/CommandExecutor.py (1)

286-293: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bug: Loop variable i shadowed by single-value handling causes incorrect file pairing.

When a single-value input/output key is processed before a multi-value key within the same outer loop iteration, i is set to 0 and remains so for subsequent keys. This causes files to be incorrectly paired.

Example: {"db": ["db.fasta"], "mzML": ["f1.mzML", "f2.mzML", "f3.mzML"]} — when processing outer loop i=1, the "db" key sets i=0, then "mzML" incorrectly uses value[0] instead of value[1].

🐛 Proposed fix
             for k in input_output.keys():
                 # add key as parameter name
                 command += [f"-{k}"]
                 # get value from input_output dictionary
                 value = input_output[k]
-                # when multiple input/output files exist (e.g., multiple mzMLs and featureXMLs), but only one additional input file (e.g., one input database file)
-                if len(value) == 1:
-                    i = 0
-                # when the entry is a list of collected files to be passed as one [["sample1", "sample2"]]
-                if isinstance(value[i], list):
-                    command += value[i]
-                # standard case, files was a list of strings, take the file name at index
+                # Use index 0 for single-value inputs, otherwise use process index
+                idx = 0 if len(value) == 1 else i
+                # when the entry is a list of collected files to be passed as one [["sample1", "sample2"]]
+                if isinstance(value[idx], list):
+                    command += value[idx]
+                # standard case, files was a list of strings, take the file name at index
                 else:
-                    command += [value[i]]
+                    command += [value[idx]]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/workflow/CommandExecutor.py` around lines 286 - 293, The variable `i` is
being reassigned to 0 when processing single-value entries in the conditional
block `if len(value) == 1`, but since `i` is the outer loop index used across
multiple key-value pairs, this modification persists and causes incorrect file
pairing for subsequent keys in the same iteration. Replace the assignment `i =
0` with a separate local variable (for example, `index = 0`), then use this new
variable consistently in the `isinstance(value[i], list)` check and the
subsequent `command += [value[i]]` line to access the correct element from the
current value without affecting the outer loop index.
🧹 Nitpick comments (1)
src/workflow/CommandExecutor.py (1)

219-219: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Mutable default argument and implicit Optional type.

Static analysis flags two issues:

  1. custom_params: dict = {} uses a mutable default which can cause unexpected behavior if modified.
  2. tool_instance_name: str = None should be str | None per PEP 484.
♻️ Proposed fix
-    def run_topp(self, tool: str, input_output: dict, custom_params: dict = {}, tool_instance_name: str = None) -> bool:
+    def run_topp(self, tool: str, input_output: dict, custom_params: dict | None = None, tool_instance_name: str | None = None) -> bool:

Then initialize inside the function:

if custom_params is None:
    custom_params = {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/workflow/CommandExecutor.py` at line 219, The run_topp method has two
type-related issues in its signature. First, change the custom_params parameter
from dict = {} to dict = None, and add initialization logic inside the function
body to create an empty dict if custom_params is None. Second, update the
tool_instance_name parameter type annotation from str = None to str | None =
None to properly indicate it accepts None values per PEP 484.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/workflow/CommandExecutor.py`:
- Around line 286-293: The variable `i` is being reassigned to 0 when processing
single-value entries in the conditional block `if len(value) == 1`, but since
`i` is the outer loop index used across multiple key-value pairs, this
modification persists and causes incorrect file pairing for subsequent keys in
the same iteration. Replace the assignment `i = 0` with a separate local
variable (for example, `index = 0`), then use this new variable consistently in
the `isinstance(value[i], list)` check and the subsequent `command +=
[value[i]]` line to access the correct element from the current value without
affecting the outer loop index.

In `@src/WorkflowTest.py`:
- Line 688: The `results_dir` variable is being overwritten at line 688 with an
incorrect path to "input-files" after being properly set to the "results"
directory at line 682. This causes the wrong path to be displayed when
`st.code(str(results_dir))` is called later. Remove the accidental reassignment
of `results_dir` at line 688 that sets it to the "input-files" path.
- Around line 1608-1610: The `execution()` method in the TMT execution branch is
missing a return statement after the "🎉 WORKFLOW FINISHED" log message at line
1610. While the LFQ branch correctly returns `True` at line 1161, the TMT branch
ends without an explicit return, causing an implicit `None` return instead of a
boolean. Add `return True` after the final logger.log call in the TMT branch to
ensure the method consistently returns a success boolean value that callers can
properly check.
- Around line 457-458: The decoy string prefix used in DecoyDatabase component
at line 368 is inconsistent with the decoy patterns expected by
PercolatorAdapter-TMT (around line 457-458) and ProteinInference-TMT (at line
507). The DecoyDatabase is using "rev_" as the decoy prefix, but
PercolatorAdapter-TMT is looking for "DECOY_" pattern, which means decoys won't
be recognized correctly during FDR calculation. Update the "decoy_string"
parameter in the DecoyDatabase component to use "DECOY_" instead of "rev_", and
ensure ProteinInference-TMT at line 507 uses the same consistent decoy string
prefix so that all components reference the same decoy pattern throughout the
workflow.

---

Nitpick comments:
In `@src/workflow/CommandExecutor.py`:
- Line 219: The run_topp method has two type-related issues in its signature.
First, change the custom_params parameter from dict = {} to dict = None, and add
initialization logic inside the function body to create an empty dict if
custom_params is None. Second, update the tool_instance_name parameter type
annotation from str = None to str | None = None to properly indicate it accepts
None values per PEP 484.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fa02f9da-05c8-42f2-90d2-c24d3cf25e4c

📥 Commits

Reviewing files that changed from the base of the PR and between 4c7c158 and 033a295.

📒 Files selected for processing (2)
  • src/WorkflowTest.py
  • src/workflow/CommandExecutor.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant