modules/
custom_orfmerge

Cluster normalised per-sample, per-caller ORF predictions into a single cohort-level catalogue. Pair with custom/orfnormalise upstream and, typically, bedtools/getfasta followed by seqkit/translate downstream to obtain the amino-acid FASTA.

Strategy is class-aware (operating on the harmonised orf_class written by custom/orfnormalise):

canonical_cds: collapse by (transcript_id, strand). One canonical CDS per transcript by definition.
uORF, dORF, other: collapse by (transcript_id, strand, start, end). A single transcript can host multiple distinct uORFs / dORFs / internal ORFs, so keying on the outer span keeps them in separate clusters while still merging cross-caller calls that agree on coordinates.
novel_u, smORF: greedy reciprocal-overlap clustering on the outer genomic span at --reciprocal-overlap (default 0.8). This catches fuzzy cross-caller matches and exact-coordinate collapses in one pass. The result is order-dependent near the threshold: in a chain A-B-C where A-B and B-C overlap at ~0.85 but A-C only at ~0.75, the ORFs may cluster as {A,B,C} or as {A,B}+{C} depending on iteration order. Such chains are rare in practice at 0.8.
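The three strategies can be sketched in a few lines of Python. This is an illustration only, not the module's code: the function names, the half-open coordinates, the restriction to a single chromosome and strand, and the choice to compare each ORF against the first member (seed) of every existing cluster are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (name, start, end), half-open; hypothetical layout


def exact_key(orf_class: str, transcript_id: str, strand: str,
              start: int, end: int) -> Optional[tuple]:
    """Collapse key for the classes merged by exact key; None for overlap-clustered classes."""
    if orf_class == "canonical_cds":
        return (transcript_id, strand)
    if orf_class in ("uORF", "dORF", "other"):
        return (transcript_id, strand, start, end)
    return None  # novel_u / smORF go through greedy_cluster instead


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions (overlap / len(a), overlap / len(b))."""
    inter = min(a[2], b[2]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return min(inter / (a[2] - a[1]), inter / (b[2] - b[1]))


def greedy_cluster(orfs: List[Interval], threshold: float = 0.8) -> List[List[Interval]]:
    """Assign each ORF to the first cluster whose seed it overlaps reciprocally."""
    clusters: List[List[Interval]] = []
    for orf in orfs:
        for cluster in clusters:
            # Assumption: only the seed (first member) is compared.
            if reciprocal_overlap(cluster[0], orf) >= threshold:
                cluster.append(orf)
                break
        else:
            clusters.append([orf])
    return clusters


# A-B = 0.88, B-C = 0.87, A-C = 0.75: the chain described above.
a, b, c = ("A", 0, 100), ("B", 12, 112), ("C", 25, 125)
print([[o[0] for o in cl] for cl in greedy_cluster([a, b, c])])  # [['A', 'B'], ['C']]
print([[o[0] for o in cl] for cl in greedy_cluster([b, a, c])])  # [['B', 'A', 'C']]
```

Under these assumptions the same three ORFs give two different partitions depending only on input order, which is the boundary effect noted for novel_u and smORF.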

Cross-caller consensus is recorded in two column families on the catalogue TSV:

called_by_<caller>: 0/1 indicator per supported caller (ribotish, ribocode, ribotricer, rpbp, price).
score_<caller>: best score from that caller within the cluster. Score direction is per-caller (p-values are minimised; Bayes factors / phase scores are maximised).
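A minimal sketch of how the two column families could be derived for one cluster follows. The helper name, the (caller, score) input shape, the use of None for a caller that made no call, and in particular which callers are treated as minimised in the example are assumptions for illustration; the module's own per-caller direction table is not reproduced here.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

CALLERS = ["ribotish", "ribocode", "ribotricer", "rpbp", "price"]


def consensus_columns(calls: Iterable[Tuple[str, float]],
                      minimise: Set[str]) -> Dict[str, Optional[float]]:
    """Build called_by_<caller> and score_<caller> values for one cluster.

    calls    -- (caller, score) pairs from every member of the cluster
    minimise -- callers whose score is better when smaller (e.g. p-values)
    """
    calls = list(calls)
    row: Dict[str, Optional[float]] = {}
    for caller in CALLERS:
        scores = [score for name, score in calls if name == caller]
        row[f"called_by_{caller}"] = int(bool(scores))
        if not scores:
            row[f"score_{caller}"] = None  # assumption: empty when not called
        elif caller in minimise:
            row[f"score_{caller}"] = min(scores)
        else:
            row[f"score_{caller}"] = max(scores)
    return row


# Illustrative direction only: one caller minimised, one maximised.
row = consensus_columns(
    [("ribocode", 0.01), ("ribocode", 0.001), ("ribotricer", 0.4), ("ribotricer", 0.7)],
    minimise={"ribocode"},
)
print(row["called_by_ribocode"], row["score_ribocode"])      # 1 0.001
print(row["called_by_ribotricer"], row["score_ribotricer"])  # 1 0.7
print(row["called_by_price"], row["score_price"])            # 0 None
```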

Cross-sample recurrence is recorded in two further columns:

n_samples: number of distinct samples contributing to the cluster (a cohort recurrence metric).
samples: sorted, comma-separated list of those sample ids.
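The recurrence columns amount to a de-duplicated, sorted join of the contributing sample ids. A small sketch, with a hypothetical helper name and made-up sample ids:

```python
from typing import Dict, Iterable, Union


def recurrence_columns(sample_ids: Iterable[str]) -> Dict[str, Union[int, str]]:
    """Return n_samples and samples for one cluster from its members' sample ids."""
    distinct = sorted(set(sample_ids))  # one entry per sample, however many callers hit it
    return {"n_samples": len(distinct), "samples": ",".join(distinct)}


# Two callers found the ORF in S2, one caller found it in S1.
print(recurrence_columns(["S2", "S1", "S2"]))  # {'n_samples': 2, 'samples': 'S1,S2'}
```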

Emits a small MultiQC custom-content TSV (per-class counts) for inclusion in downstream MultiQC reports.
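For orientation, a per-class count table in MultiQC's custom-content style might be produced roughly as below. The section id, section title, and column names are hypothetical, and the header the module actually writes may differ; only the general pattern of `#`-prefixed configuration lines followed by tab-separated rows is taken from MultiQC's custom-content convention.

```python
from collections import Counter
from typing import Iterable


def multiqc_class_counts(orf_classes: Iterable[str]) -> str:
    """Build custom-content TSV text with one row per ORF class."""
    counts = Counter(orf_classes)
    lines = [
        "# id: 'orfmerge_class_counts'",            # hypothetical section id
        "# section_name: 'ORF catalogue classes'",  # hypothetical title
        "# plot_type: 'table'",
        "orf_class\tn_orfs",                        # hypothetical column names
    ]
    lines += [f"{cls}\t{n}" for cls, n in sorted(counts.items())]
    return "\n".join(lines) + "\n"


print(multiqc_class_counts(["uORF", "smORF", "uORF"]), end="")
```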

Alongside the full catalogue, emits a consensus view (*.consensus.*) filtered to ORFs supported by at least --min-callers distinct callers and recurring in at least --min-samples samples (both default 1, i.e. no filtering, so the consensus view equals the full catalogue). Raising either threshold yields a higher-confidence catalogue without altering the full one.
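The consensus filter can be pictured as a row-wise test on the catalogue table. The sketch below assumes rows have already been read into dictionaries; the function name and the `orf_id` column are hypothetical, while `called_by_<caller>` and `n_samples` follow the column names described above.

```python
from typing import Dict, List


def consensus_view(rows: List[Dict], min_callers: int = 1,
                   min_samples: int = 1) -> List[Dict]:
    """Keep rows supported by enough distinct callers and enough samples."""
    kept = []
    for row in rows:
        # Count callers by summing the 0/1 called_by_<caller> indicators.
        n_callers = sum(int(v) for k, v in row.items() if k.startswith("called_by_"))
        if n_callers >= min_callers and int(row["n_samples"]) >= min_samples:
            kept.append(row)
    return kept


rows = [
    {"orf_id": "o1", "called_by_ribocode": 1, "called_by_price": 1, "n_samples": 3},
    {"orf_id": "o2", "called_by_ribocode": 1, "called_by_price": 0, "n_samples": 1},
]
print(len(consensus_view(rows)))                                 # 2 (defaults keep everything)
print([r["orf_id"] for r in consensus_view(rows, min_callers=2)])  # ['o1']
```

The input list is never modified, mirroring the statement that raising a threshold leaves the full catalogue unchanged.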

orf ribo-seq catalogue merge clustering

https://github.com/nf-core/modules/tree/master/modules/nf-core/custom/orfmerge

Input

Name

Description

Pattern

`0` ()

`1` ()

`2` ()

Output

Name

Description

Pattern

`0` ()

Tools

orfmerge Documentation

Python helper that clusters normalised ORF BED12+TSV pairs across callers and samples into one unified catalogue, recording per-caller provenance and best score in the output table.

https://github.com/nf-core/modules/blob/master/modules/nf-core/custom/orfmerge/main.nf

License: MIT
