Arriba v2.0.0
- report viral integration sites
- report fusions supported by multi-mapping reads (e.g., CIC-DUX4, NPM1-ALK)
- report internal tandem duplications (e.g., FLT3, BCOR, ERBB2, NOTCH1)
- improved detection of IG/TCR rearrangements
- known fusions file based on the Mitelman database is now part of the download
- more comprehensive annotation (gene IDs, transcript IDs, user-defined tags, retained protein domains)
- support for mouse (mm10)
- (optionally) report the full transcript/peptide sequence (parameter -I) rather than only what can be assembled from the supporting reads
- structural variants can be supplied in VCF format (parameter -d)
- MacOS support
- faster loading of BAM files thanks to HAT-trie map as well as other speed improvements
- draw_fusions.R accepts the format of STAR-Fusion
- ability to make use of external duplicate marking, e.g., for UMIs (parameter -u)
- enhanced blacklist
- simplified code compilation procedure
- support assemblies with up to 65,000 contigs (previously 32,000)
Important compatibility notes when upgrading from version 1.x:
- STAR version >= 2.7.6a is required to make use of multi-mapping chimeric reads
- new columns were added to the output files and some were rearranged
- the parameter -P is obsolete; the parameters -I and -T have been repurposed
- parsing of input TSV files (GTF, known fusions, blacklist, structural variants) is now stricter
- the order of the genes in the known fusions file (parameter -k) is now important
- the reading_frame column may contain the new value stop-codon
- the site1/2 columns may contain new values
- the parameters of the run_arriba.sh script have changed
- the download_references.sh script is now parameterized using environment variables
- the chr prefix is no longer removed from the output files
- the alignment parameters of run_arriba.sh are set to report up to 50 multi-mapping reads
- some filters were removed/renamed, which is relevant if the parameter -f is used
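
Because columns were added and rearranged, downstream scripts that pick fields by position may silently read the wrong column after upgrading. A minimal Python sketch of reading the output by column name instead is shown below. It assumes the output is a tab-separated file whose first line is a header that may start with "#". The helper name read_rows_by_name and the sample rows are hypothetical and made up for illustration; only the column names reading_frame, site1 and site2 and the value stop-codon are taken from the notes above.

```python
import csv
import io


def read_rows_by_name(handle):
    """Yield one dict per data row, keyed by the column names in the header."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    # Assumption: the header line may start with '#'; drop it if present.
    header[0] = header[0].lstrip("#")
    for fields in reader:
        yield dict(zip(header, fields))


# Made-up sample for illustration only, not real Arriba output.
sample = (
    "#gene1\tgene2\tsite1\tsite2\treading_frame\n"
    "GENE_A\tGENE_B\tx\ty\tin-frame\n"
    "GENE_C\tGENE_D\tx\ty\tstop-codon\n"
)

rows = list(read_rows_by_name(io.StringIO(sample)))

# Select rows by column name, so a changed column order does not matter.
stop_codon_pairs = [
    (row["gene1"], row["gene2"])
    for row in rows
    if row["reading_frame"] == "stop-codon"
]
print(stop_codon_pairs)
```

Looking fields up by name also makes it easier to notice values a script does not yet handle, such as the new stop-codon entry, since they can be checked explicitly rather than being misread from a shifted column.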