Arriba v2.0.0

@suhrig released this 11 Oct 13:54
· 168 commits to master since this release
  • report viral integration sites
  • report fusions supported by multi-mapping reads (e.g., CIC-DUX4, NPM1-ALK)
  • report internal tandem duplications (e.g., FLT3, BCOR, ERBB2, NOTCH1)
  • improved detection of IG/TCR rearrangements
  • known fusions file based on the Mitelman database is now part of the download
  • more comprehensive annotation (gene IDs, transcript IDs, user-defined tags, retained protein domains)
  • support for mouse (mm10)
  • optionally report the full transcript/peptide sequence (parameter -I) rather than only the part that can be assembled from the supporting reads
  • structural variants can be supplied in VCF format (parameter -d)
  • macOS support
  • faster loading of BAM files thanks to a HAT-trie map as well as other speed improvements
  • draw_fusions.R accepts the format of STAR-Fusion
  • ability to make use of external duplicate marking, e.g., for UMIs (parameter -u)
  • enhanced blacklist
  • simplified code compilation procedure
  • support assemblies with up to 65,000 contigs (previously 32,000)

Important compatibility notes when upgrading from version 1.x:

  • STAR version >= 2.7.6a is required to make use of multi-mapping chimeric reads
  • new columns were added to the output files and some were rearranged
  • the parameter -P is obsolete; the parameters -I and -T have been repurposed
  • parsing of input TSV files (GTF, known fusions, blacklist, structural variants) is now stricter
  • the order of the genes in the known fusions file (parameter -k) is now important
  • the reading_frame column may contain the new value stop-codon
  • the site1/2 columns may contain new values
  • the parameters of the run_arriba.sh script have changed
  • the download_references.sh script is now parameterized using environment variables
  • the chr prefix is no longer removed from the output files
  • the alignment parameters of run_arriba.sh are set to report up to 50 multi-mapping reads
  • some filters were removed/renamed, which is relevant if the parameter -f is used
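
Because columns were added and rearranged, downstream scripts that index the output by position are likely to break after the upgrade. The following is a minimal sketch of reading the fusions file by column name instead, while tolerating the new stop-codon value and the retained chr prefix. It assumes a tab-separated file whose header line starts with "#"; the column names used here (gene1, gene2, breakpoint1, reading_frame) and the sample rows are illustrative assumptions and should be checked against the header of an actual output file. The helper names are hypothetical and not part of Arriba.

```python
import csv
import io


def read_fusions(handle):
    """Yield one dict per fusion, keyed by column name rather than position."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: the header line is prefixed with '#'.
    header[0] = header[0].lstrip("#")
    for row in reader:
        yield dict(zip(header, row))


def strip_chr(breakpoint):
    """Remove a leading 'chr' from a 'contig:position' string, if present.

    Useful when a downstream tool still expects the 1.x behaviour,
    where the prefix was removed from the output.
    """
    return breakpoint[3:] if breakpoint.startswith("chr") else breakpoint


def is_in_frame(fusion):
    """Return True only for in-frame fusions.

    Any other value, including the new 'stop-codon', yields False
    instead of raising an error.
    """
    return fusion.get("reading_frame") == "in-frame"


# Hypothetical sample for illustration only, not real Arriba output.
SAMPLE = (
    "#gene1\tgene2\tbreakpoint1\tbreakpoint2\treading_frame\n"
    "GENE_A\tGENE_B\tchr22:1000\tchr9:2000\tin-frame\n"
    "GENE_C\tGENE_D\tchr1:3000\tchr2:4000\tstop-codon\n"
)

fusions = list(read_fusions(io.StringIO(SAMPLE)))
for fusion in fusions:
    print(
        fusion["gene1"],
        fusion["gene2"],
        strip_chr(fusion["breakpoint1"]),
        "in-frame" if is_in_frame(fusion) else "other",
    )
```

To read a real file, the same generator can be given an open file handle, for example `read_fusions(open("fusions.tsv", newline=""))`, where the file name is a placeholder for whatever path was chosen for the output.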