Skip to content

HTML report/nfcore addons#4

Open
Lebaranto wants to merge 5 commits into
mainfrom
feature/nfcore-adaptation
Open

HTML report/nfcore addons#4
Lebaranto wants to merge 5 commits into
mainfrom
feature/nfcore-adaptation

Conversation

@Lebaranto

@Lebaranto Lebaranto commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Updated pipeline HTML report with fixed nfcore modules issues

Summary by CodeRabbit

  • New Features
    • Added expanded QC sidecar outputs, including FastQC reports and samtools QC summaries (flagstat, stats, idxstats, quickcheck), published under qc/fastqc/ and qc/samtools/.
    • Added an end-of-run TrES report: tres_report/tres_report.html plus tres_report/tres_report_metrics.json.
    • Added MultiQC outputs under multiqc/ and enhanced DNA reporting based on updated duplicate/NoDup and coverage handling.
  • Documentation
    • Updated output and runtime documentation to reflect the new qc/, multiqc/, and tres_report/ directories and the updated report contents/paths.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@Lebaranto, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 37 minutes and 42 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f83f043-bea3-4039-8702-d2c9c697c051

📥 Commits

Reviewing files that changed from the base of the PR and between 8525b05 and 2dd0855.

📒 Files selected for processing (11)
  • .gitattributes
  • .github/workflows/nf-test.yml
  • .pre-commit-config.yaml
  • README.md
  • modules.json
  • modules/local/rna_starsolo_align/main.nf
  • scripts/core_runtime/Split_ReadsV2.codon
  • scripts/core_runtime/Tag.codon
  • scripts/core_runtime/Tag_Lig3.codon
  • scripts/core_runtime/Tag_UMI.codon
  • scripts/core_runtime/utils.codon
📝 Walkthrough

Walkthrough

The PR adds nf-core QC modules, a TrES HTML report generator, and workflow wiring for new RNA/DNA reporting outputs. It also replaces legacy DNA duplicate-marking and coverage steps with nf-core markduplicates and DeepTools-based coverage plus normalization.

Changes

QC & Reporting Infrastructure

Layer / File(s) Summary
DNA markduplicates and normalization
modules/nf-core/gatk4/markduplicates/..., modules/local/normalize_dna_markduplicates/main.nf
Introduces GATK4_MARKDUPLICATES with its environment, metadata, tests, and config overrides, and adds NORMALIZE_DNA_MARKDUPLICATES to rename the outputs into the TrESFlow DNA filename contract.
NoDup check and coverage
modules/local/check_dna_nodup_bam/main.nf, modules/nf-core/deeptools/bamcoverage/..., modules/local/normalize_dna_bamcoverage/main.nf
Adds CHECK_DNA_NODUP_BAM, the DeepTools bamcoverage module, and NORMALIZE_DNA_BAMCOVERAGE, including the zero-mapped warning path, CRAM handling, and renamed *_NoDup.bw output.
FastQC and samtools sidecar QC
modules/nf-core/fastqc/..., modules/nf-core/samtools/{flagstat,stats,idxstats,quickcheck}/..., modules/local/samtools_quickcheck_report/main.nf, assets/test_realdata/ligation_barcode_whitelist.txt
Installs FastQC, the samtools flagstat/stats/idxstats/quickcheck modules, the quickcheck report adapter, and the barcode whitelist asset.
MultiQC module
modules/nf-core/multiqc/...
Introduces the MultiQC module, its environment and lockfiles, metadata, process implementation, tests, and prefix override configuration.
TrES report renderer
bin/render_tres_report.py, modules/local/tres_report_html/main.nf
Adds the TrES report renderer script and the TRES_REPORT_HTML module that turns collected QC artifacts into HTML and JSON outputs.
Subworkflow and TRESEQ wiring
subworkflows/local/dna_core/main.nf, subworkflows/local/dna_core/meta.yml, subworkflows/local/rna_core/main.nf, modules/local/rna_starsolo_align/main.nf, workflows/treseq.nf, conf/base.config, conf/modules.config, conf/test.config, modules.json, README.md, docs/output.md, docs/usage.md, docs/architecture/implemented_pipeline.md
Rewires DNA_CORE and RNA_CORE around the new QC modules, updates TRESEQ to build and aggregate QC channels, and refreshes the config, registry, and documentation to describe the new outputs and runtime contract.

Sequence Diagram(s)

sequenceDiagram
  participant TRESEQ as TRESEQ workflow
  participant RNA as RNA_CORE
  participant DNA as DNA_CORE
  participant FQC as FASTQC
  participant SAM as SAMTOOLS sidecars
  participant QCR as SAMTOOLS_QUICKCHECK_REPORT
  participant TRES as TRES_REPORT_HTML
  participant MQC as MULTIQC

  TRESEQ->>RNA: RNA BAMs, solo summaries, star logs, barcode files
  TRESEQ->>DNA: DNA BAMs, dup metrics, bigwigs, warnings
  TRESEQ->>FQC: raw FASTQ tuples
  TRESEQ->>SAM: BAM/BAI inputs
  SAM-->>QCR: exit codes
  RNA-->>TRES: solo summaries, star logs, barcode files
  DNA-->>TRES: dup metrics, coverage warnings
  FQC-->>TRES: FASTQC ZIPs
  SAM-->>TRES: flagstat/stats/idxstats/quickcheck
  TRES-->>TRESEQ: HTML report + metrics JSON
  RNA-->>MQC: QC inputs
  DNA-->>MQC: QC inputs
  FQC-->>MQC: HTML/ZIP
  SAM-->>MQC: sidecar QC files
  MQC-->>TRESEQ: MultiQC report
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 Hoppy hops through QC lanes,
Sidecars, reports, and shiny data trains.
Mark dups, count reads, and singout loud,
MultiQC and TrES make this bunny proud!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is related, but too vague to convey the main change in the PR. Use a more specific title that names the core change, such as the new nf-core QC/report integrations and HTML report generation.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/nfcore-adaptation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🧹 Nitpick comments (5)
bin/render_tres_report.py (1)

127-154: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider adding strict=True to zip() for data integrity.

At line 138, zip(header, parts) without strict=True will silently truncate if the header and data row have different column counts. While GATK metrics files have a consistent format, adding strict=True would catch malformed input early rather than producing incorrect metrics silently.

Suggested fix
-        values = dict(zip(header, parts))
+        values = dict(zip(header, parts, strict=True))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/render_tres_report.py` around lines 127 - 154, The `zip(header, parts)`
call in the read_duplicate_metrics function will silently truncate if the header
and data row have different column counts, potentially causing incorrect metrics
without any warning. Add `strict=True` as a parameter to the zip call where
`values = dict(zip(header, parts))` is assigned to ensure that mismatched column
counts are detected and caught early, rather than silently producing incomplete
or incorrect data.

Source: Linters/SAST tools

modules/nf-core/multiqc/tests/main.nf.test (1)

106-107: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Avoid network-dependent fixtures in module tests.

Line 106-107 and Line 149-150 fetch config files from GitHub at test time. This makes tests susceptible to network/transient failures. Prefer committed local fixture files under the module test assets to keep CI deterministic.

Also applies to: 149-150

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/multiqc/tests/main.nf.test` around lines 106 - 107, Remove
network dependencies from the MultiQC module tests by replacing the GitHub URL
references in the multiqc_config.yml file access with local fixture files.
Download the multiqc_config.yml file from the GitHub URL and commit it to the
module test assets directory, then update both occurrences (around lines 106-107
and 149-150) to reference the local file path instead of the remote GitHub URL
using the file() function with the local asset path. This ensures tests remain
deterministic and do not depend on network availability or transient GitHub
connectivity issues.
modules/nf-core/samtools/idxstats/meta.yml (1)

29-38: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Align declared input contract with the actual process requirements.

Line 31-37 currently documents SAM/SAI support, but modules/nf-core/samtools/idxstats/main.nf (Line 11) requires an index path in all cases. Please either narrow the documented patterns to indexed BAM/CRAM inputs or make the index optional in the process contract so docs and runtime behavior match.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/samtools/idxstats/meta.yml` around lines 29 - 38, The input
patterns declared in the meta.yml file do not match the actual process
requirements in main.nf, which mandates an index file in all cases. Narrow the
pattern specifications for both the bam and bai input definitions to remove
unsupported formats: remove SAM (sam) from the bam pattern on line 33 to include
only "*.{bam,cram}", and remove SAI (sai) from the bai pattern on line 37 to
include only "*.{bai,crai}". This aligns the documented input contract with the
actual runtime behavior that requires indexed BAM or CRAM files.
modules/nf-core/gatk4/markduplicates/main.nf (2)

29-32: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider adding def keyword for local variables.

While Nextflow allows variable assignments without def, using def for local variables (prefix and prefix_bam) is more idiomatic and makes scope explicit.

♻️ Suggested style improvement
-    prefix = task.ext.prefix ?: "${meta.id}.bam"
+    def prefix = task.ext.prefix ?: "${meta.id}.bam"

     // If the extension is CRAM, then change it to BAM
-    prefix_bam = prefix.tokenize('.')[-1] == 'cram' ? "${prefix.substring(0, prefix.lastIndexOf('.'))}.bam" : prefix
+    def prefix_bam = prefix.tokenize('.')[-1] == 'cram' ? "${prefix.substring(0, prefix.lastIndexOf('.'))}.bam" : prefix
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/gatk4/markduplicates/main.nf` around lines 29 - 32, Add the
`def` keyword before both local variable declarations for `prefix` and
`prefix_bam`. This makes the variable scope explicit and follows Nextflow
idioms. Change the assignments to use `def prefix = ...` and `def prefix_bam =
...` instead of declaring them without the keyword.

65-74: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Stub creates both output formats, while script creates only one.

The stub implementation creates both .bam and .cram outputs (plus both index types), whereas the real script produces only the format specified by prefix. This inconsistency could cause confusion when tests run in stub mode versus real mode, though it won't cause failures since outputs are marked optional.

Consider making the stub match the real behavior more closely by checking the prefix and creating only the expected output format.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/gatk4/markduplicates/main.nf` around lines 65 - 74, The stub
block unconditionally creates both BAM and CRAM output files with their indices,
while the real script creates only the format specified by the prefix variable.
Modify the stub implementation to conditionally create only the expected output
format based on the prefix. Check whether the prefix indicates BAM or CRAM
format (by examining the prefix string or the actual script's logic) and use
conditional statements in the stub block to create only the corresponding output
files and their appropriate index, making the stub behavior consistent with the
actual script behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@conf/modules.config`:
- Around line 47-50: In the GATK4_MARKDUPLICATES configuration block, the
ext.prefix parameter currently includes the .bam file extension. Remove the .bam
extension from the ext.prefix value so that it only contains
"${meta.id}_MarkedDup" instead of "${meta.id}_MarkedDup.bam". The nf-core module
will automatically append the appropriate file extension, and removing the
hardcoded extension will prevent double extensions in the output file names.

In `@conf/test.config`:
- Around line 95-98: The DEEPTOOLS_BAMCOVERAGE process block is missing the
ext.mock = true configuration that other test tasks have. Add the line ext.mock
= true inside the withName: DEEPTOOLS_BAMCOVERAGE block so that the real module
is not executed during test runs, keeping tests fast and reliable.

In `@modules/nf-core/deeptools/bamcoverage/main.nf`:
- Around line 61-64: The stub block uses the args variable without declaring it
first, while the script block properly declares args. Add a declaration of args
(likely from task.ext.args with an empty string as default) at the beginning of
the stub block, before the line where args.contains() is called. This ensures
the variable is defined when checking for bedgraph format options, matching the
pattern used in the script block.
- Around line 33-34: The CRAM detection and output naming logic uses incorrect
Nextflow Path property names with uppercase letters. Replace `input.Extension`
with `input.extension` and replace `input.BaseName` with `input.baseName` in the
lines where `is_cram` and `input_out` variables are defined, as Nextflow Path
API requires lowercase property names or method calls like `getExtension()` and
`getBaseName()`.

In `@modules/nf-core/deeptools/bamcoverage/tests/main.nf.test`:
- Line 43: The test name contains a typo where "homo_sampiens" is misspelled.
Change "homo_sampiens" to "homo_sapiens" in the test function name to correct
the species name spelling in the test method declaration.
- Line 105: Fix the typo in the test name string for the test function at this
location. The word "homo_sampiens" in the test name is misspelled and should be
corrected to "homo_sapiens" to match the correct scientific name of the human
species. Update the string passed to the test function to use the correct
spelling.
- Line 12: The test name contains a typo: "homo_sampiens" should be corrected to
"homo_sapiens" (the correct scientific name for humans). Locate the test
function and update the test name string from "homo_sampiens - bam" to
"homo_sapiens - bam" to fix the spelling error.
- Line 74: Fix the typo in the test name where "homo_sampiens" should be
corrected to "homo_sapiens" in the test function. Change the string in the
test() call from "homo_sampiens - cram - fasta" to "homo_sapiens - cram - fasta"
to match the correct species name spelling.

In `@modules/nf-core/fastqc/main.nf`:
- Around line 24-27: The `old_new_pairs` assignment has a logic error in the
single-element list handling. When `reads.size() == 1` is true, `reads` is a
list, not a Path, so accessing `reads.extension` directly will fail. Fix this by
extracting the first element from the list (using `reads[0]` or `reads.first()`)
before accessing the `.extension` property when handling the single-element case
in the ternary operator.

In `@modules/nf-core/fastqc/tests/main.nf.test`:
- Around line 133-140: The assertions in the test are checking for output
filenames that do not match the input files. The 3rd and 4th outputs in the html
and zip assertions (at indices [0][1][2] and [0][1][3]) expect test_3 and
test_4, but the actual input fixtures are test2_1.fastq.gz and test2_2.fastq.gz.
Update all four assertions in both the html and zip output blocks to expect
test2_1_fastqc and test2_2_fastqc respectively instead of test_3 and test_4 to
align with the input file names.

In `@modules/nf-core/gatk4/markduplicates/main.nf`:
- Around line 58-62: The CRAM conversion block (when prefix ends with .cram)
uses the fasta reference file with the samtools view command but lacks
validation that the fasta file exists. Add a validation check inside the if
block that tests whether the fasta variable points to an existing file, and only
execute the samtools view -Ch -T command and subsequent operations if the fasta
file is present. If fasta is missing, the block should either error out with a
clear message or skip the CRAM conversion.

In `@modules/nf-core/gatk4/markduplicates/meta.yml`:
- Around line 1-99: The metrics output section declares a pattern of
`*.{metrics.txt}` but the actual process outputs files with the pattern
`*.{metrics}` without the `.txt` suffix. Update the pattern field in the metrics
output section to match the actual file output by changing it from
`*.{metrics.txt}` to `*.{metrics}`.

In `@modules/nf-core/multiqc/meta.yml`:
- Around line 64-68: Update the pattern field for the HTML output in the MultiQC
module metadata to use a valid glob pattern. Change the pattern value from
".html" to "*.html" to correctly match actual emitted filenames like
multiqc_report.html. This ensures the pattern field accurately reflects the
output contract of the module.
- Around line 85-88: The type declaration for the `*_plots` output does not
match the actual emission type from the Nextflow process. Change the `type:
file` declaration for the `*_plots` output to `type: directory` to correctly
reflect that the process emits a directory path (via `path("*_plots")` in
main.nf) rather than a single file, ensuring the schema aligns with the actual
process contract.

In `@modules/nf-core/samtools/quickcheck/tests/main.nf.test`:
- Around line 151-152: The remote fixture URLs at lines 151-152, 172-173, and
193-194 use mutable branch references (refs/heads/develop) which can cause
non-deterministic CI behavior. Replace all occurrences of the github.com URLs
pointing to refs/heads/develop with immutable commit-SHA based URLs (e.g.,
refs/heads/develop replaced with a specific commit hash) or vendor the test
fixtures locally to ensure snapshots and expected exit codes remain stable over
time.

---

Nitpick comments:
In `@bin/render_tres_report.py`:
- Around line 127-154: The `zip(header, parts)` call in the
read_duplicate_metrics function will silently truncate if the header and data
row have different column counts, potentially causing incorrect metrics without
any warning. Add `strict=True` as a parameter to the zip call where `values =
dict(zip(header, parts))` is assigned to ensure that mismatched column counts
are detected and caught early, rather than silently producing incomplete or
incorrect data.

In `@modules/nf-core/gatk4/markduplicates/main.nf`:
- Around line 29-32: Add the `def` keyword before both local variable
declarations for `prefix` and `prefix_bam`. This makes the variable scope
explicit and follows Nextflow idioms. Change the assignments to use `def prefix
= ...` and `def prefix_bam = ...` instead of declaring them without the keyword.
- Around line 65-74: The stub block unconditionally creates both BAM and CRAM
output files with their indices, while the real script creates only the format
specified by the prefix variable. Modify the stub implementation to
conditionally create only the expected output format based on the prefix. Check
whether the prefix indicates BAM or CRAM format (by examining the prefix string
or the actual script's logic) and use conditional statements in the stub block
to create only the corresponding output files and their appropriate index,
making the stub behavior consistent with the actual script behavior.

In `@modules/nf-core/multiqc/tests/main.nf.test`:
- Around line 106-107: Remove network dependencies from the MultiQC module tests
by replacing the GitHub URL references in the multiqc_config.yml file access
with local fixture files. Download the multiqc_config.yml file from the GitHub
URL and commit it to the module test assets directory, then update both
occurrences (around lines 106-107 and 149-150) to reference the local file path
instead of the remote GitHub URL using the file() function with the local asset
path. This ensures tests remain deterministic and do not depend on network
availability or transient GitHub connectivity issues.

In `@modules/nf-core/samtools/idxstats/meta.yml`:
- Around line 29-38: The input patterns declared in the meta.yml file do not
match the actual process requirements in main.nf, which mandates an index file
in all cases. Narrow the pattern specifications for both the bam and bai input
definitions to remove unsupported formats: remove SAM (sam) from the bam pattern
on line 33 to include only "*.{bam,cram}", and remove SAI (sai) from the bai
pattern on line 37 to include only "*.{bai,crai}". This aligns the documented
input contract with the actual runtime behavior that requires indexed BAM or
CRAM files.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a845e072-2d59-4e8e-b9b3-881191005d1c

📥 Commits

Reviewing files that changed from the base of the PR and between d89a0b8 and 24b6560.

⛔ Files ignored due to path filters (8)
  • modules/nf-core/deeptools/bamcoverage/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/fastqc/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/gatk4/markduplicates/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/multiqc/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/samtools/flagstat/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/samtools/idxstats/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/samtools/quickcheck/tests/main.nf.test.snap is excluded by !**/*.snap
  • modules/nf-core/samtools/stats/tests/main.nf.test.snap is excluded by !**/*.snap
📒 Files selected for processing (61)
  • README.md
  • assets/test_realdata/ligation_barcode_whitelist.txt
  • bin/render_tres_report.py
  • conf/base.config
  • conf/modules.config
  • conf/test.config
  • docs/architecture/implemented_pipeline.md
  • docs/output.md
  • docs/usage.md
  • modules.json
  • modules/local/check_dna_nodup_bam/main.nf
  • modules/local/normalize_dna_bamcoverage/main.nf
  • modules/local/normalize_dna_markduplicates/main.nf
  • modules/local/rna_starsolo_align/main.nf
  • modules/local/samtools_quickcheck_report/main.nf
  • modules/local/tres_report_html/main.nf
  • modules/nf-core/deeptools/bamcoverage/environment.yml
  • modules/nf-core/deeptools/bamcoverage/main.nf
  • modules/nf-core/deeptools/bamcoverage/meta.yml
  • modules/nf-core/deeptools/bamcoverage/tests/main.nf.test
  • modules/nf-core/fastqc/.conda-lock/linux_amd64-bd-5cb1a2fa2f18c7c2_1.txt
  • modules/nf-core/fastqc/.conda-lock/linux_arm64-bd-e455e32f745abe68_1.txt
  • modules/nf-core/fastqc/environment.yml
  • modules/nf-core/fastqc/main.nf
  • modules/nf-core/fastqc/meta.yml
  • modules/nf-core/fastqc/tests/main.nf.test
  • modules/nf-core/gatk4/markduplicates/environment.yml
  • modules/nf-core/gatk4/markduplicates/main.nf
  • modules/nf-core/gatk4/markduplicates/meta.yml
  • modules/nf-core/gatk4/markduplicates/tests/bam.config
  • modules/nf-core/gatk4/markduplicates/tests/cram.config
  • modules/nf-core/gatk4/markduplicates/tests/main.nf.test
  • modules/nf-core/multiqc/.conda-lock/linux_amd64-bd-c17fb751507e9dfc_1.txt
  • modules/nf-core/multiqc/.conda-lock/linux_arm64-bd-5c84a5000a226ab5_1.txt
  • modules/nf-core/multiqc/environment.yml
  • modules/nf-core/multiqc/main.nf
  • modules/nf-core/multiqc/meta.yml
  • modules/nf-core/multiqc/tests/custom_prefix.config
  • modules/nf-core/multiqc/tests/main.nf.test
  • modules/nf-core/multiqc/tests/nextflow.config
  • modules/nf-core/samtools/flagstat/environment.yml
  • modules/nf-core/samtools/flagstat/main.nf
  • modules/nf-core/samtools/flagstat/meta.yml
  • modules/nf-core/samtools/flagstat/tests/main.nf.test
  • modules/nf-core/samtools/idxstats/environment.yml
  • modules/nf-core/samtools/idxstats/main.nf
  • modules/nf-core/samtools/idxstats/meta.yml
  • modules/nf-core/samtools/idxstats/tests/main.nf.test
  • modules/nf-core/samtools/quickcheck/environment.yml
  • modules/nf-core/samtools/quickcheck/main.nf
  • modules/nf-core/samtools/quickcheck/meta.yml
  • modules/nf-core/samtools/quickcheck/tests/main.nf.test
  • modules/nf-core/samtools/quickcheck/tests/nextflow.config
  • modules/nf-core/samtools/stats/environment.yml
  • modules/nf-core/samtools/stats/main.nf
  • modules/nf-core/samtools/stats/meta.yml
  • modules/nf-core/samtools/stats/tests/main.nf.test
  • subworkflows/local/dna_core/main.nf
  • subworkflows/local/dna_core/meta.yml
  • subworkflows/local/rna_core/main.nf
  • workflows/treseq.nf

Comment thread conf/modules.config
Comment on lines +47 to +50
withName: GATK4_MARKDUPLICATES {
ext.prefix = { "${meta.id}_MarkedDup.bam" }
ext.args = '--REMOVE_DUPLICATES false --BARCODE_TAG CB --CREATE_INDEX true --MAX_RECORDS_IN_RAM 10000000'
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove the .bam extension from the ext.prefix.

The ext.prefix for GATK4_MARKDUPLICATES should not include the file extension. nf-core modules append their own extensions to the prefix, so including .bam here will produce output files with double extensions like *_MarkedDup.bam.bam.

🐛 Proposed fix
-    ext.prefix = { "${meta.id}_MarkedDup.bam" }
+    ext.prefix = { "${meta.id}_MarkedDup" }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/modules.config` around lines 47 - 50, In the GATK4_MARKDUPLICATES
configuration block, the ext.prefix parameter currently includes the .bam file
extension. Remove the .bam extension from the ext.prefix value so that it only
contains "${meta.id}_MarkedDup" instead of "${meta.id}_MarkedDup.bam". The
nf-core module will automatically append the appropriate file extension, and
removing the hardcoded extension will prevent double extensions in the output
file names.

Comment thread conf/test.config
Comment on lines +95 to 98
withName: DEEPTOOLS_BAMCOVERAGE {
cpus = 1
memory = '1 GB'
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

DEEPTOOLS_BAMCOVERAGE is not mocked in test profile.

Line 95 configures this process in conf/test.config, but unlike the surrounding test tasks it is missing ext.mock = true. That can execute the real module during tests and make test runs slower/flaky.

Suggested fix
     withName: DEEPTOOLS_BAMCOVERAGE {
+        ext.mock = true
         cpus = 1
         memory = '1 GB'
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/test.config` around lines 95 - 98, The DEEPTOOLS_BAMCOVERAGE process
block is missing the ext.mock = true configuration that other test tasks have.
Add the line ext.mock = true inside the withName: DEEPTOOLS_BAMCOVERAGE block so
that the real module is not executed during test runs, keeping tests fast and
reliable.

Comment on lines +33 to +34
def is_cram = input.Extension == "cram" ? true : false
def input_out = is_cram ? input.BaseName + ".bam" : "${input}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify path-property usage patterns in this module and nearby nf-core modules.
rg -n -C2 '\b(input\.Extension|input\.BaseName)\b|getExtension\(|getBaseName\(' modules/nf-core

Repository: CSOgroup/TrESFlow

Length of output: 672


🏁 Script executed:

# Read the full context of the main.nf file to understand the input variable type
head -50 modules/nf-core/deeptools/bamcoverage/main.nf

Repository: CSOgroup/TrESFlow

Length of output: 2239


🏁 Script executed:

# Search for how other nf-core modules handle file extension detection and basename extraction
rg -n '\.(getExtension|getBaseName|extension|baseName)\(\)|\bext\b|\bstem\b' modules/nf-core --type groovy -A1 -B1 | head -100

Repository: CSOgroup/TrESFlow

Length of output: 43


🌐 Web query:

Nextflow Path object properties Extension BaseName vs getExtension getBaseName Groovy

💡 Result:

In Nextflow, for Path objects, prefer the Groovy-style properties baseName and extension (e.g., path.baseName, path.extension). These correspond to the underlying methods getBaseName() and getExtension(). 1) Nextflow Path “properties” - path.baseName returns the path name without its extension; for /some/path/file.tar.gz it’s file.tar [1]. - path.extension returns the path extension without the dot; for /some/path/file.txt it’s txt [1]. - These work in Nextflow’s docs/examples as path.baseName, path.extension, and path.name [2]. 2) getBaseName() / getExtension() are the method forms - The Nextflow TaskPath type explicitly defines getBaseName() and getExtension() methods [3]. - Groovy property access like path.extension maps to the getter getExtension() (and similarly path.baseNamegetBaseName()). This is consistent with how Groovy exposes extension-method properties for Path/File (via NioExtensions with getExtension(Path) and analogous base-name support) [4][5] and with Nextflow’s TaskPath.groovy implementing getBaseName() / getExtension() [6]. 3) What about “getExtension getBaseName Groovy”? - If you call the methods directly in Groovy, use parentheses: path.getExtension() and path.getBaseName(). - If you use properties (recommended/readable): path.extension and path.baseName. 4) Practical examples in Nextflow - From Nextflow’s working-with-files docs: - assert path.baseName == 'file' - assert path.extension == 'txt' - assert path.name == 'file.txt' - assert path.parent == '/some/path' [2]. So, “Extension BaseName vs getExtension getBaseName” is simply: extension/baseName are the property-style aliases for getExtension()/getBaseName() on Nextflow Path objects.

Citations:


Use valid path API for CRAM detection and output naming.

input.Extension and input.BaseName are not valid Nextflow Path properties. In Nextflow, Path objects use lowercase extension and baseName properties, or the methods getExtension() and getBaseName(). The capitalized versions will cause script evaluation failure at runtime.

Proposed fix
-    def is_cram = input.Extension == "cram" ? true : false
-    def input_out = is_cram ? input.BaseName + ".bam" : "${input}"
+    def is_cram = input.getExtension() == 'cram'
+    def input_out = is_cram ? "${input.getBaseName()}.bam" : "${input}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def is_cram = input.Extension == "cram" ? true : false
def input_out = is_cram ? input.BaseName + ".bam" : "${input}"
def is_cram = input.getExtension() == 'cram'
def input_out = is_cram ? "${input.getBaseName()}.bam" : "${input}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/deeptools/bamcoverage/main.nf` around lines 33 - 34, The CRAM
detection and output naming logic uses incorrect Nextflow Path property names
with uppercase letters. Replace `input.Extension` with `input.extension` and
replace `input.BaseName` with `input.baseName` in the lines where `is_cram` and
`input_out` variables are defined, as Nextflow Path API requires lowercase
property names or method calls like `getExtension()` and `getBaseName()`.

Comment on lines +61 to +64
stub:
def prefix = task.ext.prefix ?: "${meta.id}"
def extension = args.contains("--outFileFormat bedgraph") || args.contains("-of bedgraph") ? "bedgraph" : "bigWig"
"""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify undeclared use of `args` inside stub blocks in this file.
rg -n -C3 'stub:|def args|args\.contains' modules/nf-core/deeptools/bamcoverage/main.nf

Repository: CSOgroup/TrESFlow

Length of output: 832


Define args inside stub block before using it.

The stub block uses args.contains() on line 63 without declaring it, while the parallel script block correctly declares args on line 26. This causes a runtime error in stub execution.

Proposed fix
 stub:
     def prefix    = task.ext.prefix ?: "${meta.id}"
+    def args      = task.ext.args ?: ''
     def extension = args.contains("--outFileFormat bedgraph") || args.contains("-of bedgraph") ? "bedgraph" : "bigWig"
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/deeptools/bamcoverage/main.nf` around lines 61 - 64, The stub
block uses the args variable without declaring it first, while the script block
properly declares args. Add a declaration of args (likely from task.ext.args
with an empty string as default) at the beginning of the stub block, before the
line where args.contains() is called. This ensures the variable is defined when
checking for bedgraph format options, matching the pattern used in the script
block.

tag "deeptools"
tag "deeptools/bamcoverage"

test("homo_sampiens - bam") {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in test name.

Test name has "homo_sampiens" instead of "homo_sapiens".

✏️ Proposed fix
-    test("homo_sampiens - bam") {
+    test("homo_sapiens - bam") {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test("homo_sampiens - bam") {
test("homo_sapiens - bam") {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/deeptools/bamcoverage/tests/main.nf.test` at line 12, The
test name contains a typo: "homo_sampiens" should be corrected to "homo_sapiens"
(the correct scientific name for humans). Locate the test function and update
the test name string from "homo_sampiens - bam" to "homo_sapiens - bam" to fix
the spelling error.

Comment on lines +58 to +62
if [[ ${prefix} == *.cram ]]; then
samtools view -Ch -T ${fasta} -o ${prefix} ${prefix_bam}
rm ${prefix_bam}
samtools index ${prefix}
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing validation: CRAM output requires fasta reference.

The CRAM conversion logic uses ${fasta} without checking if it exists. If a user configures ext.prefix to end with .cram but doesn't provide a fasta reference, the samtools view -Ch -T ${fasta} command will fail because -T requires a valid reference file argument.

🛡️ Proposed fix to add validation
     # If cram files are wished as output, the run samtools for conversion
     if [[ ${prefix} == *.cram ]]; then
+        if [[ -z "${fasta}" ]]; then
+            echo "ERROR: CRAM output requires a reference fasta file" >&2
+            exit 1
+        fi
         samtools view -Ch -T ${fasta} -o ${prefix} ${prefix_bam}
         rm ${prefix_bam}
         samtools index ${prefix}
     fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/gatk4/markduplicates/main.nf` around lines 58 - 62, The CRAM
conversion block (when prefix ends with .cram) uses the fasta reference file
with the samtools view command but lacks validation that the fasta file exists.
Add a validation check inside the if block that tests whether the fasta variable
points to an existing file, and only execute the samtools view -Ch -T command
and subsequent operations if the fasta file is present. If fasta is missing, the
block should either error out with a clear message or skip the CRAM conversion.

Comment on lines +1 to +99
name: gatk4_markduplicates
description: This tool locates and tags duplicate reads in a BAM or SAM file,
where duplicate reads are defined as originating from a single fragment of
DNA.
keywords:
- bam
- gatk4
- markduplicates
- sort
tools:
- gatk4:
description: Developed in the Data Sciences Platform at the Broad Institute,
the toolkit offers a wide variety of tools with a primary focus on variant
discovery and genotyping. Its powerful processing engine and
high-performance computing features make it capable of taking on projects
of any size.
homepage: https://gatk.broadinstitute.org/hc/en-us
documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-
tool_dev_url: https://github.com/broadinstitute/gatk
doi: 10.1158/1538-7445.AM2017-3590
licence: ["MIT"]
identifier: ""
input:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- bam:
type: file
description: Sorted BAM file
pattern: "*.{bam}"
ontologies: []
- fasta:
type: file
description: Fasta file
pattern: "*.{fasta}"
ontologies: []
- fasta_fai:
type: file
description: Fasta index file
pattern: "*.{fai}"
ontologies: []
output:
cram:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*cram":
type: file
description: Marked duplicates CRAM file
pattern: "*.{cram}"
ontologies: []
bam:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*bam":
type: file
description: Marked duplicates BAM file
pattern: "*.{bam}"
ontologies: []
crai:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.crai":
type: file
description: CRAM index file
pattern: "*.{cram.crai}"
ontologies: []
bai:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.bai":
type: file
description: BAM index file
pattern: "*.{bam.bai}"
ontologies: []
metrics:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.metrics":
type: file
description: Duplicate metrics file generated by GATK
pattern: "*.{metrics.txt}"
ontologies: []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pattern mismatch: metrics file extension.

Line 98 specifies the pattern as *.{metrics.txt}, but the actual process (main.nf line 52) outputs ${prefix}.metrics without the .txt suffix. The pattern should match the actual output.

📝 Proposed fix
       - "*.metrics":
           type: file
           description: Duplicate metrics file generated by GATK
-          pattern: "*.{metrics.txt}"
+          pattern: "*.metrics"
           ontologies: []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
name: gatk4_markduplicates
description: This tool locates and tags duplicate reads in a BAM or SAM file,
where duplicate reads are defined as originating from a single fragment of
DNA.
keywords:
- bam
- gatk4
- markduplicates
- sort
tools:
- gatk4:
description: Developed in the Data Sciences Platform at the Broad Institute,
the toolkit offers a wide variety of tools with a primary focus on variant
discovery and genotyping. Its powerful processing engine and
high-performance computing features make it capable of taking on projects
of any size.
homepage: https://gatk.broadinstitute.org/hc/en-us
documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-
tool_dev_url: https://github.com/broadinstitute/gatk
doi: 10.1158/1538-7445.AM2017-3590
licence: ["MIT"]
identifier: ""
input:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- bam:
type: file
description: Sorted BAM file
pattern: "*.{bam}"
ontologies: []
- fasta:
type: file
description: Fasta file
pattern: "*.{fasta}"
ontologies: []
- fasta_fai:
type: file
description: Fasta index file
pattern: "*.{fai}"
ontologies: []
output:
cram:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*cram":
type: file
description: Marked duplicates CRAM file
pattern: "*.{cram}"
ontologies: []
bam:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*bam":
type: file
description: Marked duplicates BAM file
pattern: "*.{bam}"
ontologies: []
crai:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.crai":
type: file
description: CRAM index file
pattern: "*.{cram.crai}"
ontologies: []
bai:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.bai":
type: file
description: BAM index file
pattern: "*.{bam.bai}"
ontologies: []
metrics:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.metrics":
type: file
description: Duplicate metrics file generated by GATK
pattern: "*.{metrics.txt}"
ontologies: []
name: gatk4_markduplicates
description: This tool locates and tags duplicate reads in a BAM or SAM file,
where duplicate reads are defined as originating from a single fragment of
DNA.
keywords:
- bam
- gatk4
- markduplicates
- sort
tools:
- gatk4:
description: Developed in the Data Sciences Platform at the Broad Institute,
the toolkit offers a wide variety of tools with a primary focus on variant
discovery and genotyping. Its powerful processing engine and
high-performance computing features make it capable of taking on projects
of any size.
homepage: https://gatk.broadinstitute.org/hc/en-us
documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-
tool_dev_url: https://github.com/broadinstitute/gatk
doi: 10.1158/1538-7445.AM2017-3590
licence: ["MIT"]
identifier: ""
input:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- bam:
type: file
description: Sorted BAM file
pattern: "*.{bam}"
ontologies: []
- fasta:
type: file
description: Fasta file
pattern: "*.{fasta}"
ontologies: []
- fasta_fai:
type: file
description: Fasta index file
pattern: "*.{fai}"
ontologies: []
output:
cram:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*cram":
type: file
description: Marked duplicates CRAM file
pattern: "*.{cram}"
ontologies: []
bam:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*bam":
type: file
description: Marked duplicates BAM file
pattern: "*.{bam}"
ontologies: []
crai:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.crai":
type: file
description: CRAM index file
pattern: "*.{cram.crai}"
ontologies: []
bai:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.bai":
type: file
description: BAM index file
pattern: "*.{bam.bai}"
ontologies: []
metrics:
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.metrics":
type: file
description: Duplicate metrics file generated by GATK
pattern: "*.metrics"
ontologies: []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/gatk4/markduplicates/meta.yml` around lines 1 - 99, The
metrics output section declares a pattern of `*.{metrics.txt}` but the actual
process outputs files with the pattern `*.{metrics}` without the `.txt` suffix.
Update the pattern field in the metrics output section to match the actual file
output by changing it from `*.{metrics.txt}` to `*.{metrics}`.

Comment on lines +64 to +68
- "*.html":
type: file
description: MultiQC report file
pattern: ".html"
ontologies: []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a valid glob pattern for report output.

Line 67 uses pattern: ".html", which does not match emitted filenames like multiqc_report.html. Use *.html to reflect the actual output contract.

Proposed fix
-          pattern: ".html"
+          pattern: "*.html"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- "*.html":
type: file
description: MultiQC report file
pattern: ".html"
ontologies: []
- "*.html":
type: file
description: MultiQC report file
pattern: "*.html"
ontologies: []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/multiqc/meta.yml` around lines 64 - 68, Update the pattern
field for the HTML output in the MultiQC module metadata to use a valid glob
pattern. Change the pattern value from ".html" to "*.html" to correctly match
actual emitted filenames like multiqc_report.html. This ensures the pattern
field accurately reflects the output contract of the module.

Comment on lines +85 to +88
- "*_plots":
type: file
description: Plots created by MultiQC
pattern: "*_plots"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align plots output type with the process contract.

Line 86 declares plots as type: file, but modules/nf-core/multiqc/main.nf emits path("*_plots"), which is a directory. This cross-file mismatch can break schema/lint contract checks.

Proposed fix
   plots:
@@
       - "*_plots":
-          type: file
+          type: directory
           description: Plots created by MultiQC
           pattern: "*_plots"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- "*_plots":
type: file
description: Plots created by MultiQC
pattern: "*_plots"
- "*_plots":
type: directory
description: Plots created by MultiQC
pattern: "*_plots"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/multiqc/meta.yml` around lines 85 - 88, The type declaration
for the `*_plots` output does not match the actual emission type from the
Nextflow process. Change the `type: file` declaration for the `*_plots` output
to `type: directory` to correctly reflect that the process emits a directory
path (via `path("*_plots")` in main.nf) rather than a single file, ensuring the
schema aligns with the actual process contract.

Comment on lines +151 to +152
file('https://github.com/samtools/samtools/raw/refs/heads/develop/test/quickcheck/1.quickcheck.badeof.bam', checkIfExists: true),
]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin remote quickcheck fixtures to immutable revisions

Line 151, Line 172, and Line 193 fetch test data from refs/heads/develop, which is mutable and can make CI nondeterministic. Please switch to immutable commit-SHA URLs (or vendored local fixtures) so snapshots and expected exit codes stay stable over time.

Also applies to: 172-173, 193-194

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/nf-core/samtools/quickcheck/tests/main.nf.test` around lines 151 -
152, The remote fixture URLs at lines 151-152, 172-173, and 193-194 use mutable
branch references (refs/heads/develop) which can cause non-deterministic CI
behavior. Replace all occurrences of the github.com URLs pointing to
refs/heads/develop with immutable commit-SHA based URLs (e.g.,
refs/heads/develop replaced with a specific commit hash) or vendor the test
fixtures locally to ensure snapshots and expected exit codes remain stable over
time.

…tion

# Conflicts:
#	conf/base.config
#	conf/test.config
#	docs/architecture/implemented_pipeline.md
#	docs/output.md
#	workflows/treseq.nf

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/output.md (1)

138-149: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Document only the report fields that are actually rendered.

The current report schema does not include any UMI column or RNA UMI summary: build_export_rows() and sequencing_table() only output read counts, mapping, barcode rates, and usable reads. The bullets for “UMI/no-UMI fields” and “RNA observed UMI count” therefore overstate the shipped report.

Suggested fix
-- a detailed per-library sequencing QC table for barcode, read-count, and UMI/no-UMI fields
+- a detailed per-library sequencing QC table for barcode, read-count, mapping, and usable-read fields
...
-- RNA observed UMI count from UMI tagging counts
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/output.md` around lines 138 - 149, The report description is listing
fields that are not actually rendered by the current report output. Update the
documentation text in the report summary so it matches the real schema produced
by build_export_rows() and sequencing_table(), keeping only the fields that
appear in the HTML report. Remove the references to UMI/no-UMI fields and RNA
observed UMI count, and make sure the bullets stay aligned with the rendered RNA
mapping, DNA mapped/unique-read, barcode rate, and usable-read metrics.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bin/render_tres_report.py`:
- Around line 443-462: The aggregate RNA genome-mapping and DNA mapped-read
metrics are computed in build_metrics() but never exposed because
main_statistics only contains RNA transcriptome and DNA unique reads. Update the
main_statistics list in render_tres_report.py to include the overall RNA
genome-mapped card and the DNA mapped-reads card alongside the existing entries,
using the corresponding values, counts, denominators, and subtitles from
mapping_quality. This will ensure those aggregate metrics show up in both the
HTML cards and the CSV/Excel exports driven by main_statistics.

---

Outside diff comments:
In `@docs/output.md`:
- Around line 138-149: The report description is listing fields that are not
actually rendered by the current report output. Update the documentation text in
the report summary so it matches the real schema produced by build_export_rows()
and sequencing_table(), keeping only the fields that appear in the HTML report.
Remove the references to UMI/no-UMI fields and RNA observed UMI count, and make
sure the bullets stay aligned with the rendered RNA mapping, DNA
mapped/unique-read, barcode rate, and usable-read metrics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 42e6af5f-a9a5-49d3-9c9e-4384ac5761be

📥 Commits

Reviewing files that changed from the base of the PR and between 24b6560 and 8525b05.

📒 Files selected for processing (8)
  • README.md
  • bin/render_tres_report.py
  • conf/base.config
  • conf/test.config
  • docs/architecture/implemented_pipeline.md
  • docs/output.md
  • docs/usage.md
  • workflows/treseq.nf
💤 Files with no reviewable changes (4)
  • conf/test.config
  • conf/base.config
  • workflows/treseq.nf
  • docs/architecture/implemented_pipeline.md
✅ Files skipped from review due to trivial changes (1)
  • docs/usage.md

Comment thread bin/render_tres_report.py
Comment on lines +443 to +462
main_statistics = [
{
"metric": "RNA confidently mapped to transcriptome",
"value": mapping_quality["rna_confidently_mapped_to_transcriptome_percent"],
"count": rna_transcriptome_count,
"denominator": rna_raw_reads,
"value_type": "percent",
"subtitle": "STARsolo GeneFull unique reads / raw RNA reads",
"modality": "RNA",
},
{
"metric": "DNA unique reads",
"value": mapping_quality["dna_unique_reads_percent"],
"count": dna_usable_count,
"denominator": dna_raw_reads,
"value_type": "percent",
"subtitle": "Final NoDup BAM read pairs / raw DNA reads",
"modality": "DNA",
},
]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Expose the aggregate genome/mapped cards as main statistics.

build_metrics() already computes overall RNA genome-mapping and DNA mapped-read percentages, but main_statistics only includes RNA transcriptome and DNA unique reads. Because both the HTML cards and CSV/Excel export are driven from main_statistics, those two aggregate metrics never appear anywhere in the report.

Suggested fix
     main_statistics = [
         {
             "metric": "RNA confidently mapped to transcriptome",
             "value": mapping_quality["rna_confidently_mapped_to_transcriptome_percent"],
             "count": rna_transcriptome_count,
             "denominator": rna_raw_reads,
             "value_type": "percent",
             "subtitle": "STARsolo GeneFull unique reads / raw RNA reads",
             "modality": "RNA",
         },
+        {
+            "metric": "RNA confidently mapped to genome",
+            "value": mapping_quality["rna_confidently_mapped_to_genome_percent"],
+            "count": rna_genome_count,
+            "denominator": rna_raw_reads,
+            "value_type": "percent",
+            "subtitle": "STARsolo unique genome reads / raw RNA reads",
+            "modality": "RNA",
+        },
+        {
+            "metric": "DNA confidently mapped reads",
+            "value": mapping_quality["dna_confidently_mapped_percent"],
+            "count": dna_mapped_count,
+            "denominator": dna_raw_reads,
+            "value_type": "percent",
+            "subtitle": "Aligned DNA read pairs / raw DNA reads",
+            "modality": "DNA",
+        },
         {
             "metric": "DNA unique reads",
             "value": mapping_quality["dna_unique_reads_percent"],
             "count": dna_usable_count,
             "denominator": dna_raw_reads,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
main_statistics = [
{
"metric": "RNA confidently mapped to transcriptome",
"value": mapping_quality["rna_confidently_mapped_to_transcriptome_percent"],
"count": rna_transcriptome_count,
"denominator": rna_raw_reads,
"value_type": "percent",
"subtitle": "STARsolo GeneFull unique reads / raw RNA reads",
"modality": "RNA",
},
{
"metric": "DNA unique reads",
"value": mapping_quality["dna_unique_reads_percent"],
"count": dna_usable_count,
"denominator": dna_raw_reads,
"value_type": "percent",
"subtitle": "Final NoDup BAM read pairs / raw DNA reads",
"modality": "DNA",
},
]
main_statistics = [
{
"metric": "RNA confidently mapped to transcriptome",
"value": mapping_quality["rna_confidently_mapped_to_transcriptome_percent"],
"count": rna_transcriptome_count,
"denominator": rna_raw_reads,
"value_type": "percent",
"subtitle": "STARsolo GeneFull unique reads / raw RNA reads",
"modality": "RNA",
},
{
"metric": "RNA confidently mapped to genome",
"value": mapping_quality["rna_confidently_mapped_to_genome_percent"],
"count": rna_genome_count,
"denominator": rna_raw_reads,
"value_type": "percent",
"subtitle": "STARsolo unique genome reads / raw RNA reads",
"modality": "RNA",
},
{
"metric": "DNA confidently mapped reads",
"value": mapping_quality["dna_confidently_mapped_percent"],
"count": dna_mapped_count,
"denominator": dna_raw_reads,
"value_type": "percent",
"subtitle": "Aligned DNA read pairs / raw DNA reads",
"modality": "DNA",
},
{
"metric": "DNA unique reads",
"value": mapping_quality["dna_unique_reads_percent"],
"count": dna_usable_count,
"denominator": dna_raw_reads,
"value_type": "percent",
"subtitle": "Final NoDup BAM read pairs / raw DNA reads",
"modality": "DNA",
},
]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/render_tres_report.py` around lines 443 - 462, The aggregate RNA
genome-mapping and DNA mapped-read metrics are computed in build_metrics() but
never exposed because main_statistics only contains RNA transcriptome and DNA
unique reads. Update the main_statistics list in render_tres_report.py to
include the overall RNA genome-mapped card and the DNA mapped-reads card
alongside the existing entries, using the corresponding values, counts,
denominators, and subtitles from mapping_quality. This will ensure those
aggregate metrics show up in both the HTML cards and the CSV/Excel exports
driven by main_statistics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants