Description

Convert one ORF caller's per-sample output table into a unified BED12 plus a sidecar metadata TSV, ready for cross-caller merging.

An "ORF caller" is a tool that scans ribosome-profiling (Ribo-seq) data and predicts which open reading frames are being translated. Each caller writes its own table format and uses its own location encoding, classification vocabulary, and confidence score. This module reconciles five callers into one harmonised schema. The caller val input selects the parser; supported values:

  • ribocode (RiboCode predicted ORF table; transcript-coord input, lifted to genomic blocks against the GTF)
  • ribotish (Ribo-TISH predict output; GenomePos + optional Blocks)
  • ribotricer (Ribotricer detect-orfs translating ORFs TSV; ORF span parsed from ORF_ID, multi-exon blocks recovered by intersecting with host-transcript exon structure from the GTF)
  • rpbp (Rp-Bp predicted-orfs BED12 with extra columns)
  • price (PRICE orfs.tsv; Gedi-style Location field, already genomic)
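The transcript-to-genome lift mentioned for ribocode can be illustrated with a minimal sketch. It is not the module's actual code: the function name lift_to_genomic_blocks, the 0-based half-open coordinate convention, and the exon list layout are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (assumed convention)


def lift_to_genomic_blocks(
    exons: List[Interval], strand: str, tx_start: int, tx_end: int
) -> List[Interval]:
    """Map a transcript-coordinate span onto genomic exon blocks.

    Hypothetical helper: exons are genomic intervals of one transcript,
    tx_start/tx_end are offsets along the spliced transcript (5'->3').
    """
    # Walk exons in transcription order: ascending for '+', descending for '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    blocks: List[Interval] = []
    offset = 0  # transcript coordinate of the first base of the current exon
    for ex_start, ex_end in ordered:
        ex_len = ex_end - ex_start
        lo = max(tx_start, offset)
        hi = min(tx_end, offset + ex_len)
        if lo < hi:  # the ORF overlaps this exon
            if strand == "+":
                blocks.append((ex_start + (lo - offset), ex_start + (hi - offset)))
            else:
                # On the minus strand, transcript offsets count down from ex_end.
                blocks.append((ex_end - (hi - offset), ex_end - (lo - offset)))
        offset += ex_len
    # BED12 expects blocks in ascending genomic order.
    return sorted(blocks)


plus_blocks = lift_to_genomic_blocks([(100, 200), (300, 400)], "+", 50, 150)
minus_blocks = lift_to_genomic_blocks([(100, 200), (300, 400)], "-", 10, 120)
```

With the two example exons, the plus-strand span 50..150 splits into two 50 bp blocks, and the minus-strand span 10..120 covers 90 bp of the downstream-most genomic exon plus 20 bp of the other.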

Output BED12 column order: chrom start end name score strand thickStart thickEnd itemRgb blockCount blockSizes blockStarts

The BED name column carries <caller>|<caller-native-id>. The BED score column is the caller's native score rescaled to 0-1000 (higher == more confident regardless of native direction).
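As a rough illustration of the BED12 layout, the sketch below assembles one line from a list of genomic blocks. The function name bed12_line is hypothetical, and setting thickStart/thickEnd to the full ORF span and itemRgb to 0 are assumptions, not documented behaviour of the module. The score is assumed to be already rescaled and is only clamped to 0-1000 here, since the rescaling formula itself is not specified.

```python
from typing import List, Tuple


def bed12_line(
    chrom: str,
    blocks: List[Tuple[int, int]],  # 0-based half-open genomic blocks
    strand: str,
    caller: str,
    native_id: str,
    score: float,  # assumed to be on the 0-1000 scale already
) -> str:
    """Format one BED12 record (hypothetical helper for illustration)."""
    blocks = sorted(blocks)
    start, end = blocks[0][0], blocks[-1][1]
    block_sizes = ",".join(str(e - s) for s, e in blocks)
    # blockStarts are relative to the record's start column.
    block_starts = ",".join(str(s - start) for s, _ in blocks)
    bed_score = max(0, min(1000, int(round(score))))  # clamp to the BED range
    fields = [
        chrom, start, end,
        f"{caller}|{native_id}",  # name column: <caller>|<caller-native-id>
        bed_score, strand,
        start, end,               # thickStart/thickEnd: whole ORF (assumption)
        "0",                      # itemRgb placeholder (assumption)
        len(blocks), block_sizes, block_starts,
    ]
    return "\t".join(str(f) for f in fields)


line = bed12_line("chr1", [(150, 200), (300, 350)], "+", "ribocode", "orf1", 870.4)
```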

Output sidecar TSV columns: orf_id caller sample_id chrom start end strand gene_id transcript_id orf_class aa_length score

Harmonised orf_class vocabulary written into the sidecar TSV:

  • canonical_cds: ORF maps to an annotated CDS (including truncated / extended variants of one).
  • uORF: upstream ORF (5'UTR-resident).
  • dORF: downstream ORF (3'UTR-resident).
  • novel_u: novel / intergenic ORF not assigned to an annotated CDS.
  • smORF: small ORF (aa_length <= 100); promoted regardless of location-based class so downstream tools can treat smORFs uniformly.
  • other: internal / overlap / frame variants and anything else.
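The smORF promotion rule above can be sketched as follows. The function harmonise_orf_class is a hypothetical name, and it assumes the caller-native label has already been translated into a location-based class; anything outside the harmonised vocabulary falls through to other.

```python
SMORF_MAX_AA = 100  # threshold stated in the vocabulary: aa_length <= 100

# Location-based classes from the harmonised vocabulary (smORF is length-based).
LOCATION_CLASSES = {"canonical_cds", "uORF", "dORF", "novel_u", "other"}


def harmonise_orf_class(location_class: str, aa_length: int) -> str:
    """Return the harmonised orf_class (illustrative sketch only)."""
    # Small ORFs are promoted first, regardless of where they sit.
    if aa_length <= SMORF_MAX_AA:
        return "smORF"
    # Unrecognised labels collapse into the catch-all class.
    return location_class if location_class in LOCATION_CLASSES else "other"


examples = [
    harmonise_orf_class("uORF", 45),            # short uORF is promoted
    harmonise_orf_class("uORF", 150),           # long uORF keeps its class
    harmonise_orf_class("canonical_cds", 100),  # boundary value is inclusive
    harmonise_orf_class("iORF", 300),           # unknown label becomes other
]
```

Checking the length before the location class is what makes the promotion unconditional: a 45-residue uORF is reported as smORF, not uORF.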

Per-caller mapping notes (lossy collapses):

  • PRICE iORF (internal ORF), intronic, and orphan map to other. Cross-caller catalogue tracking still flags these via called_by_price, but the specific PRICE sub-type is not preserved.
  • Rp-Bp's predicted-orfs BED carries no ORF-type column; this module defaults every Rp-Bp call to canonical_cds (the post-selectfinalpredictionset curated set is dominated by canonical CDSs). uORF/dORF/novel calls present in Rp-Bp's separate .tab.gz / extracted-orfs.bed.gz files are not propagated here.

Each caller's native confidence score has a "direction" - some are lower-is-better (p-values), some are higher-is-better (Bayes factors, phase scores):

  • ribocode: min (combined p-value)
  • ribotish: min (combined p-value)
  • ribotricer: max (phase_score)
  • rpbp: max (Bayes factor mean)
  • price: min (p-value)

Downstream merging uses this direction to pick the best call per ORF.
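One way the direction could be applied when choosing among several calls from the same caller is sketched below. The names SCORE_DIRECTION and best_call are hypothetical, each call is assumed to be a dict carrying the caller's native score, and the real downstream merging logic may group and compare calls differently.

```python
from typing import Callable, Dict, List

# Direction per caller, as listed above: min for p-values, max otherwise.
SCORE_DIRECTION: Dict[str, Callable] = {
    "ribocode": min,
    "ribotish": min,
    "ribotricer": max,
    "rpbp": max,
    "price": min,
}


def best_call(caller: str, calls: List[dict]) -> dict:
    """Pick the most confident call for one ORF from one caller (sketch)."""
    pick = SCORE_DIRECTION[caller]  # raises KeyError for unsupported callers
    return pick(calls, key=lambda call: call["score"])


price_best = best_call("price", [
    {"sample_id": "s1", "score": 0.04},
    {"sample_id": "s2", "score": 0.001},  # lower p-value wins
])
ribotricer_best = best_call("ribotricer", [
    {"sample_id": "s1", "score": 0.42},
    {"sample_id": "s2", "score": 0.61},   # higher phase_score wins
])
```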

Tools

orfnormalise Documentation

Python helper that parses any of five Ribo-seq ORF caller output tables and emits a unified BED12 + sidecar TSV. Caller is selected by the caller val input.

ribocode Documentation

Identifying genome-wide translated ORFs from Ribo-seq data

doi: 10.1093/nar/gky179
License: GPL-3.0

ribotish Documentation

Ribo-seq based ORF prediction with Ribo-TISH

ribotricer Documentation

Ribosome profiling P-site phasing-based ORF detection

rpbp Documentation

Translated ORF identification with Rp-Bp

price Documentation

Probabilistic inference of codon activities by an EM algorithm (PRICE)

doi: 10.1038/nmeth.4631
License: Apache-2.0