
Output Formats

Chee-Hong Wong edited this page Jan 2, 2018 · 2 revisions

The Picky pipeline produces two sets of output.

Set 1 : Read alignments

In Picky's selectRep step, the representative alignments selected for each read are recorded in the .align file. This text file records the parameters needed for .sam file generation and, most importantly, the alignments selected from all lastal-generated alignments, which are chained to form the representative alignment(s) for each read.

Header section

The lastal version and sequence database are recorded at the beginning of the file to support .sam file generation in the callSV step.

# @PG_ID        lastal
# @PG_PN        lastal
# @PG_VN        755
# @PG_DB        hg19.lastdb
# @PG_END
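The header lines follow a `# @PG_<KEY> <value>` pattern and end at `# @PG_END`. As a minimal Python sketch (assuming whitespace separates key and value; `parse_align_header` is a hypothetical helper, not part of Picky), they can be read into a dictionary like this:

```python
def parse_align_header(lines):
    """Collect '# @PG_<KEY> <value>' lines into a dict, stopping at '# @PG_END'."""
    header = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("# @PG_"):
            continue  # not a header line
        fields = line[2:].split(None, 1)  # drop the leading '# ', split key from value
        key = fields[0][len("@PG_"):]
        if key == "END":
            break
        header[key] = fields[1].strip() if len(fields) > 1 else ""
    return header


sample = """# @PG_ID        lastal
# @PG_PN        lastal
# @PG_VN        755
# @PG_DB        hg19.lastdb
# @PG_END"""
print(parse_align_header(sample.splitlines()))
# {'ID': 'lastal', 'PN': 'lastal', 'VN': '755', 'DB': 'hg19.lastdb'}
```

Stopping at `# @PG_END` keeps the function from scanning the read records that follow.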

Read representative section

For each read for which lastal returns a list of possible alignments, Picky selectRep records some general information along with the chosen alignment candidate(s).

# 23535 fmh_15l4526_20161102_FNFAB42810_MN16457_sequencing_run_161102_Human_genomic_run4_LSK108R9_4_36382_ch269_read7280_strand.fast5   (X)     align(110,60)   seed(3) nonseed(2)
# score EG2     E       =       %=      X       %X      D       %D      I       %I      qStart  qEnd    qStrand qALen   q%      refId   refStart        refEnd  refStrand       refALen cigar
### candidate#1/1
S6802   0       0       9959    88.12   277     2.45    657     5.81    409     3.62    2       10647   +       10645   45.23   chr20   40787508        40798401        +       10893   2S18=2D11=...<snipped>...2I2X37=12888S
E3460   0       0       5134    88.12   158     2.71    278     4.77    256     4.39    10629   16177   +       5548    23.57   chr20   40798415        40803985        +       5570    10629S17=1X1=...<snipped>...8=1X9=7358S
e512    5.5e-280        7.1e-284        964     83.10   47      4.05    91      7.84    58      5.00    16225   17294   +       1069    4.54    chr20   40804046        40805148        +       1102    16225S12=1X1D...<snipped>...31=1D11=6241S
E3919   0       0       5716    88.55   189     2.93    317     4.91    233     3.61    17360   23498   +       6138    26.08   chr20   40805236        40811458        +       6222    17360S41=1X23=...<snipped>...11=1D12=37S

FIRST LINE

The first line holds general information. It starts with the read length, followed by the read ID. The third cell indicates the overall selection result, as follows:

3rd cell Description
{!!!} there is no alignment left after filtering
(1) single fragment single-locus alignment; no SV possible
(X) Multi-fragments with single locus; possible SV
[ ] Multi-fragments with multi-loci; possible SV but non-unique alignment location

The fourth cell align(x,y) reports the total number of alignments (x) reported by lastal and the total number of alignments left (y) after filtering by EG2/E-value and/or %Identity.

The fifth and sixth cells give the number of collated seed alignments, seed(..), and the number of collated non-seed alignments, nonseed(..).
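The first-line cells can be pulled apart with a regular expression. This is a sketch only: it assumes the cells are whitespace-separated and the read ID contains no spaces, and the names `FIRST_LINE`, `CATEGORY` and `parse_first_line` are hypothetical.

```python
import re

# Third-cell codes and their meanings, as listed in the table above.
CATEGORY = {
    "{!!!}": "no alignment left after filtering",
    "(1)": "single fragment, single locus; no SV possible",
    "(X)": "multiple fragments, single locus; possible SV",
    "[ ]": "multiple fragments, multiple loci; possible SV, non-unique location",
}

# Assumes whitespace-separated cells and a read ID without spaces.
FIRST_LINE = re.compile(
    r"^#\s+(?P<read_len>\d+)\s+(?P<read_id>\S+)\s+"
    r"(?P<category>\{!!!\}|\(1\)|\(X\)|\[\s*\])"
    r"(?:\s+align\((?P<total>\d+),(?P<kept>\d+)\))?"
    r"(?:\s+seed\((?P<seed>\d+)\))?"
    r"(?:\s+nonseed\((?P<nonseed>\d+)\))?"
)


def parse_first_line(line):
    """Return the read summary as a dict, or None if `line` is not a first line."""
    m = FIRST_LINE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    if rec["category"].startswith("["):
        rec["category"] = "[ ]"  # normalise whatever whitespace sits inside the brackets
    for key in ("read_len", "total", "kept", "seed", "nonseed"):
        if rec[key] is not None:
            rec[key] = int(rec[key])
    rec["description"] = CATEGORY[rec["category"]]
    return rec


line = (
    "# 23535 fmh_15l4526_20161102_FNFAB42810_MN16457_sequencing_run_161102_"
    "Human_genomic_run4_LSK108R9_4_36382_ch269_read7280_strand.fast5"
    "   (X)     align(110,60)   seed(3) nonseed(2)"
)
rec = parse_first_line(line)
print(rec["read_len"], rec["category"], rec["kept"], "of", rec["total"])
# 23535 (X) 60 of 110
```

Because the pattern requires a number right after `# `, the column-header line (`# score ...`) and the `### candidate` prefix lines return `None`.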

SECOND LINE

The second line is the header for the tab-delimited columns of the read alignment records. The columns are self-explanatory except for the encoding in the first character of the score column. Possible letters are "S" for a seed alignment, "E" (uppercase) for a seed alignment used as an extension, and "e" (lowercase) for an alignment used as an extension.
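The sketch below splits an alignment row into the columns named in the header line and decodes the score prefix. The `looks_consistent` check encodes relationships we inferred from the four example rows above, not from Picky documentation: qALen = qEnd - qStart = `=` + X + I, refALen = refEnd - refStart = `=` + X + D, and %= is `=` as a percentage of `=` + X + D + I. All function and constant names are hypothetical.

```python
# Column names copied from the second line of each read record.
COLUMNS = [
    "score", "EG2", "E", "=", "%=", "X", "%X", "D", "%D", "I", "%I",
    "qStart", "qEnd", "qStrand", "qALen", "q%",
    "refId", "refStart", "refEnd", "refStrand", "refALen", "cigar",
]
FRAGMENT_TYPE = {"S": "seed", "E": "seed used as extension", "e": "extension"}
INT_COLS = ("=", "X", "D", "I", "qStart", "qEnd", "qALen", "refStart", "refEnd", "refALen")
FLOAT_COLS = ("EG2", "E", "%=", "%X", "%D", "%I", "q%")


def parse_alignment_row(line):
    """Split one alignment row into a dict keyed by the header columns."""
    fields = line.split()  # cells are tab-delimited and contain no whitespace
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    row["type"] = FRAGMENT_TYPE[row["score"][0]]  # decode the S/E/e prefix
    row["score"] = int(row["score"][1:])
    for col in INT_COLS:
        row[col] = int(row[col])
    for col in FLOAT_COLS:
        row[col] = float(row[col])
    return row


def looks_consistent(row, tol=0.01):
    """Check column relationships inferred from the example rows (not a spec)."""
    aligned = row["="] + row["X"] + row["D"] + row["I"]
    return (
        row["qALen"] == row["qEnd"] - row["qStart"] == row["="] + row["X"] + row["I"]
        and row["refALen"] == row["refEnd"] - row["refStart"] == row["="] + row["X"] + row["D"]
        and abs(row["%="] - 100 * row["="] / aligned) <= tol
    )


row = parse_alignment_row(
    "S6802 0 0 9959 88.12 277 2.45 657 5.81 409 3.62 2 10647 + 10645 45.23 "
    "chr20 40787508 40798401 + 10893 2S18=2D11=...<snipped>...2I2X37=12888S"
)
print(row["type"], row["score"], row["refId"], looks_consistent(row))
# seed 6802 chr20 True
```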

THE CANDIDATE BLOCK

Each candidate extension alignment 'block' is prefixed with '### candidate#<x>/<y>'. A blank line or the next candidate prefix marks the end of the block. The alignment rows in each block are ordered by read coordinates.
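A minimal sketch for grouping one read record's rows by candidate block, assuming the prefix appears exactly as `### candidate#<x>/<y>`. Treating other `#` lines as block terminators is our assumption; `split_candidates` and the abbreviated demo record are hypothetical.

```python
import re

CANDIDATE = re.compile(r"^### candidate#(\d+)/(\d+)$")


def split_candidates(lines):
    """Group the alignment rows of one read record by candidate block."""
    blocks = []
    current = None
    for raw in lines:
        line = raw.strip()
        m = CANDIDATE.match(line)
        if m:
            # A new prefix closes the previous block and opens a new one.
            current = {"index": int(m.group(1)), "total": int(m.group(2)), "rows": []}
            blocks.append(current)
        elif not line or line.startswith("#"):
            # A blank line ends the block; other '#' lines (assumed: the next
            # read's summary or column header) are not alignment rows either.
            current = None
        elif current is not None:
            current["rows"].append(line)
    return blocks


# Hypothetical record with two candidates; rows are abbreviated.
record = """# score EG2 E = ...
### candidate#1/2
S100 ...
e20 ...
### candidate#2/2
S90 ..."""
blocks = split_candidates(record.splitlines())
print([(b["index"], b["total"], len(b["rows"])) for b in blocks])
# [(1, 2, 2), (2, 2, 1)]
```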

Set 2 : SV calling

In Picky's callSV step, SV-specific files are generated along with other auxiliary files.

.xls Output

File RecordType Description
<oprefix>.DEL.xls Span tab-delimited file for deletions
<oprefix>.INS.xls Span-like file for insertions
<oprefix>.INDEL.xls Span / Span-like tab-delimited file for possible co-insertion-and-deletion
<oprefix>.INV.xls Span & Breakpoint tab-delimited file for inversions
<oprefix>.TTLC.xls Breakpoint tab-delimited file for translocations
<oprefix>.TDSR.xls Span tab-delimited file for tandem duplications where the read spans the junction
<oprefix>.TDC.xls Span tab-delimited file for tandem duplications where the read completely covers the duplication
<oprefix>.xls N.A. tab-delimited file for all read alignment segments

Span RecordType

Span record type will have the columns "SVChrom", "SVStart", "SVEnd", and "SVSpan".
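Although these files carry an .xls extension, they are tab-delimited text, so Python's `csv` module can read them. This sketch assumes the first line is a header with the column names above and that the position and span columns hold integers; `read_span_records` and the demo rows are made up for illustration.

```python
import csv
import io


def read_span_records(handle, min_span=0):
    """Yield (chrom, start, end, span) for Span records with SVSpan >= min_span.

    Assumes a header line naming the columns and integer position/span values.
    Extra columns are ignored.
    """
    for rec in csv.DictReader(handle, delimiter="\t"):
        span = int(rec["SVSpan"])
        if span >= min_span:
            yield rec["SVChrom"], int(rec["SVStart"]), int(rec["SVEnd"]), span


# Made-up two-row table; a real <oprefix>.DEL.xls has more columns.
demo = io.StringIO(
    "SVChrom\tSVStart\tSVEnd\tSVSpan\tOther\n"
    "chr20\t1000\t1500\t500\tx\n"
    "chr1\t10\t30\t20\ty\n"
)
print(list(read_span_records(demo, min_span=100)))
# [('chr20', 1000, 1500, 500)]
```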

Breakpoint RecordType

Breakpoint record type will have the columns "SVChrom1", "SVPos1", "SVStrand1", "SVChrom2", "SVPos2", and "SVStrand2".
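Breakpoint records are likewise tab-delimited text and can be read with Python's `csv` module. The sketch below (hypothetical `read_breakpoints`, made-up demo rows, header line assumed) returns both breakends and flags whether they lie on different chromosomes:

```python
import csv
import io


def read_breakpoints(handle):
    """Yield (breakend1, breakend2, inter_chromosomal) for each Breakpoint record.

    Assumes a header line with the column names listed above.
    """
    for rec in csv.DictReader(handle, delimiter="\t"):
        bp1 = (rec["SVChrom1"], int(rec["SVPos1"]), rec["SVStrand1"])
        bp2 = (rec["SVChrom2"], int(rec["SVPos2"]), rec["SVStrand2"])
        yield bp1, bp2, bp1[0] != bp2[0]


# Made-up rows for illustration only.
demo = io.StringIO(
    "SVChrom1\tSVPos1\tSVStrand1\tSVChrom2\tSVPos2\tSVStrand2\n"
    "chr1\t500\t+\tchr5\t9000\t-\n"
    "chr2\t100\t+\tchr2\t80000\t+\n"
)
for bp1, bp2, inter in read_breakpoints(demo):
    print(bp1, bp2, "inter-chromosomal" if inter else "intra-chromosomal")
```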

Auxiliary files

File Description
<oprefix>.profile.sam read alignment in .sam format. (See Picky-specific tags below.)
<oprefix>.profile.bed 6-column BED file of aligned read fragments, with an optional 7th column recording the SV(s) harbored in the read; used to aid visualization in IGV with additional filtering
<oprefix>.profile.exclude Records the IDs of all reads that have no alignment candidate. An alignment summary is appended at the end.
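The additional filtering mentioned for the .profile.bed file could, for example, keep only fragments whose 7th column is populated. This sketch assumes tab-separated columns and treats the 7th column as an opaque string, since its exact format is not described here; `keep_sv_fragments` and the demo lines (including the `SV_ANNOTATION` value) are hypothetical.

```python
def keep_sv_fragments(lines):
    """Yield BED lines whose optional 7th column (SV annotation) is non-empty.

    Header-style lines (track/browser/#) are passed through unchanged.
    Assumes tab-separated columns.
    """
    for line in lines:
        if line.startswith(("track", "browser", "#")):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7 and fields[6].strip():
            yield line


bed = [
    "chr20\t100\t200\treadA\t0\t+\n",
    "chr20\t200\t300\treadB\t0\t-\tSV_ANNOTATION\n",
]
print([l.split("\t")[3] for l in keep_sv_fragments(bed)])
# ['readB']
```

To produce a filtered track for IGV, the generator can feed `writelines()` on an output file opened alongside the original .profile.bed.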

Picky-specific SAM/BAM tags

Tag Type Description
zi f %Identity of this aligned fragment
zq f Percentage of the read length this aligned fragment represents
zl i This aligned fragment length
zs i Lastal's score for this aligned fragment
ze f Lastal reported EG2 value for this aligned fragment
zt c This aligned read fragment type.
"S" for a seed alignment,
"E" (uppercase) for a seed alignment used as an extension, and
"e" (lowercase) for an alignment used as an extension.
zc i Number of alignment candidates for this read
zk Z Read alignment category.
MC : multiple candidates
SCSF : Single candidate single fragment; NOT a split read
SCMFSL : Single candidate multiple fragment single locus, i.e. split read aligned to a single genomic location
SCMFML : Single candidate multiple fragment multiple location, i.e. split read with some fragments aligned to multiple genomic locations
zn Z This read alignment fragment is f<i>/<total_fragments>
CO Z Comment tag used to record the detected SVs for the read
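In SAM text, each optional field is written as `TAG:TYPE:VALUE` after the 11 mandatory columns, so the Picky tags can be recovered with plain string handling. A sketch, assuming the `zn` value has the literal form `f<i>/<total_fragments>`; the helper names and the demo alignment line are made up:

```python
import re


def parse_optional_fields(sam_line):
    """Map each TAG:TYPE:VALUE optional field of a SAM alignment line to a Python value."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for opt in fields[11:]:  # the first 11 columns are mandatory
        tag, typ, value = opt.split(":", 2)  # maxsplit=2 keeps colons inside values
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value  # any other type (A, Z, H, B, ...) kept as text
    return tags


def fragment_position(tags):
    """Split a zn value such as 'f1/4' into (1, 4); assumes the literal 'f' prefix."""
    m = re.fullmatch(r"f(\d+)/(\d+)", tags.get("zn", ""))
    return (int(m.group(1)), int(m.group(2))) if m else None


# Made-up alignment line with a few Picky tags.
line = "\t".join([
    "readA", "0", "chr20", "40787509", "60", "2S10=", "*", "0", "0", "*", "*",
    "zi:f:88.12", "zl:i:10645", "zc:i:1", "zk:Z:SCMFSL", "zn:Z:f1/4",
])
tags = parse_optional_fields(line)
print(tags["zk"], tags["zl"], fragment_position(tags))
# SCMFSL 10645 (1, 4)
```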

Set 2 : SVs in VCF format (Experimental)

Please refer to the Variant calling data files section for the VCF v4.1, VCF v4.2, and VCF v4.3 specifications.

The specific INFO field entries reported by Picky are:

Entry Description
IMPRECISE/PRECISE Indicates the confidence of the exact breakpoint positions (bp).
SVMETHOD= "picky"; SV detection method
END= The position (bp) of the second breakpoint of the reported SV.
SVTYPE= The type of the SV.
[DEL,INS,DUP, and BND]
RE= Number of reads supporting the reported SV.
RNAMES= A comma separated list of read names that support the reported SV.
SVLEN= Length of the SV.
CIPOS= Confidence interval around POS.
CIEND= Confidence interval around END.
NOTE= Additional notes on called SV.
ISVTYPE= Comma-separated list of the internal SV types supporting the reported SV, each suffixed with "(<number of supporting reads>)".
Internal type = [DEL,INS,INDEL,TDC,TDSR,TTLC,INV]
BERS= Breakend replacement string; a copy of ALT for display in IGV's floating tooltip.
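A minimal sketch for unpacking these INFO entries, assuming standard `;`-separated `KEY=VALUE` pairs and an ISVTYPE value shaped like `DEL(2),INDEL(1)` as described above. The function names and the demo INFO string are hypothetical.

```python
import re


def parse_info(info):
    """Split a VCF INFO string into a dict; flags (PRECISE/IMPRECISE) map to True.

    Assumes no value contains ';', which VCF uses as the INFO separator.
    """
    out = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        out[key] = value if sep else True
    return out


def parse_isvtype(value):
    """Turn e.g. 'DEL(2),INDEL(1)' into {'DEL': 2, 'INDEL': 1}."""
    return {kind: int(n) for kind, n in re.findall(r"([A-Z]+)\((\d+)\)", value)}


# Made-up INFO string for illustration.
info = parse_info("PRECISE;END=40805148;SVTYPE=DEL;RE=3;RNAMES=r1,r2,r3;ISVTYPE=DEL(2),INDEL(1)")
print(info["PRECISE"], info["SVTYPE"], info["RNAMES"].split(","), parse_isvtype(info["ISVTYPE"]))
# True DEL ['r1', 'r2', 'r3'] {'DEL': 2, 'INDEL': 1}
```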