Picky commands

Running without any parameter


will provide the list of commands Picky supports.

Please specify the command.

./ <command> -h

<command> [hashFq, selectRep, callSV]
hashFq    : hash read uuids to friendly ids
lastParam : Last parameters for alignment
selectRep : select representative alignments for read
callSV    : call structural variants
xls2vcf   : convert Picky sv xls file to vcf
sam2align : convert sam to align format
preparepbs: chunk last fastq file and write pbs script for cluster submission
script    : write a bash-script for single fastq processing

./ hashFq

OPTIONAL step to hash read uuids to human-friendly ids.

./ hashFq --pfile <passFQFile> --ffile <failFQFile> --oprefix <outputPrefix>

  --pfile STR    pass .fastq file
  --ffile STR    fail .fastq file
  --oprefix STR  prefix to output filename
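To illustrate the idea of replacing long read uuids with shorter ids, here is a minimal Python sketch. It is not Picky's implementation: the function name `friendly_ids`, the `read000001` naming scheme and the uuids in the example are all made up for illustration, and the ids hashFq actually produces may look different.

```python
def friendly_ids(read_names, prefix="read"):
    """Map each distinct read name to a short sequential id.

    The naming scheme (prefix + 6-digit counter) is a hypothetical choice
    for this sketch, not the format used by Picky hashFq.
    """
    mapping = {}
    for name in read_names:
        if name not in mapping:
            mapping[name] = f"{prefix}{len(mapping) + 1:06d}"
    return mapping


# Made-up uuids, for illustration only.
uuids = [
    "0a1b2c3d-1111-2222-3333-444455556666",
    "9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff",
    "0a1b2c3d-1111-2222-3333-444455556666",  # same read seen again
]
mapping = friendly_ids(uuids)
```

Keeping the mapping in a dictionary means a read that appears in both the pass and fail files would receive the same id in this sketch.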

./ lastParam

Return the suggested alignment parameters to be used with lastal.

It is advisable to use ./ script to set up the pipeline. See Quick start for an example.

./ selectRep

Select representative alignments from lastal's maf output.

./ selectRep [--thread <numberOfThreads>] [--preload <preloadFold>]

--thread INT   number of threads
--preload INT  Fold of thread count to preload maf records


Read from STDIN


Output directly to console (stdout) the selected representative alignment for the read in .align format.


parameter description
--thread number of threads to be used for alignment
For a faster turnaround, use more threads, but do not exceed the number of cores available on your machine.
--preload fold of preloading for read alignments.
Uses more memory but shortens turnaround time by allowing the alignment and selectRep steps to be executed concurrently.
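As a minimal sketch of the advice on --thread, the following Python caps a requested thread count at the number of cores reported by the operating system. It is an illustration only and not part of Picky; the helper name `capped_threads` is hypothetical. Note that `os.cpu_count()` reports logical CPUs, which may be more than the number of physical cores.

```python
import os


def capped_threads(requested):
    """Return a thread count between 1 and the number of CPUs the OS reports."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(requested, cores))
```

The returned value could then be passed as the --thread argument.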

./ callSV

Perform SV calling on the .align file generated by selectRep.

./ callSV --in <alignFile> --fastq <fqFile> --lastpara <last parameters> [--genome <genomeFastaFile> --removehomopolymerdeletion] [--sam] [--exclude <chromosomeToExclude> [--exclude <anotherChromosomeToExclude>]]

  --oprefix STR   prefix for output files
  --fastq STR     .fastq file
  --lastpara STR  lastal parameters used
  --removehomopolymerdeletion
                  exclude DEL and INDEL possibly affected by homopolymer
  --genome STR    genome sequence in .fasta file
  --sam           flag to output .sam file
  --exclude STR   exclude SV involving specified chromosome
                  (specify each chromosome with --exclude individually)
  --multiloci     report SVs on best alignment of multi-loci alignments


Provide the .align file from Picky selectRep via STDIN or "--in"


Output a set of SV .xls files along with auxiliary files. See Output Format's Set 2 : SVs Calling.


parameter description
--oprefix prefix for output files
--fastq fastq file containing reads analyzed
--lastpara specified lastal parameters used which will be recorded in .sam file
--sam indicate .sam file to be generated
--exclude exclude SV involving specified chromosome
specify each chromosome with --exclude individually
--multiloci report SVs on best alignment of multi-loci alignments
--removehomopolymerdeletion OPTIONAL: exclude DEL and INDEL possibly affected by homopolymer
ONLY necessary if you are using earlier base-called fastq
--genome OPTIONAL: genome sequence in .fasta file;
ONLY necessary if you are using "--removehomopolymerdeletion"

./ xls2vcf

Convert .xls SV files generated by callSV to .vcf file.

./ xls2vcf --xls <picky_xls_file> [--chrom <chromosome>] [--re <minReadsSupport>]

  --xls STR       picky SV xls file
  --chrom STR     restrict output to specified chromosomes [e.g. chr20]
  --re INT        min number of read evidence [default:2]
  --merge         window to merge SV [default: 1000 bp]
  --converge      window which SVs are considered converged concordantly [default: 20 bp]


parameter description
--xls SV .xls file generated by callSV
multiple .xls files can be separated by commas, or each .xls file can be prefixed with --xls
i.e. "--xls sv.del.xls,sv.indel.xls" and "--xls sv.del.xls --xls sv.indel.xls" are equivalent
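The equivalence of the two --xls forms can be sketched with Python's standard `argparse` module. This only illustrates the behaviour described here and is not how Picky parses its options; the function name `parse_xls` is hypothetical.

```python
import argparse


def parse_xls(argv):
    """Collect .xls file names from repeated and/or comma-separated --xls values."""
    parser = argparse.ArgumentParser()
    # action="append" gathers every occurrence of --xls into one list.
    parser.add_argument("--xls", action="append", default=[])
    args = parser.parse_args(argv)
    files = []
    for value in args.xls:
        # Each occurrence may itself hold several comma-separated names.
        files.extend(part for part in value.split(",") if part)
    return files


comma_form = parse_xls(["--xls", "sv.del.xls,sv.indel.xls"])
repeated_form = parse_xls(["--xls", "sv.del.xls", "--xls", "sv.indel.xls"])
```

Both calls yield the same list of two file names, so the two command-line forms are interchangeable in this sketch.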


Output directly to console (stdout) in .vcf format.


parameter description
--chrom report SVs found on specified chromosomes
--re report SVs that have at least this required number of supporting reads [default:2]
--merge window to merge SV [default: 1000 bp]
--converge window which SVs are considered converged concordantly [default: 20 bp]

./ sam2align

Convert .sam content to .align format for callSV.


Input from console (stdin) in .sam format.

NOTE: The sam records should be read-blocked, i.e. alignment records from the same read should be contiguous. In the '@HD' header line, the 'SO:' tag must either have the value "queryname" or be omitted.
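The two conditions in this note can be checked before running sam2align. The following Python is a minimal sketch under the assumption that the input is plain tab-separated SAM text; the function name `check_read_blocked` is hypothetical and the check is not part of Picky.

```python
def check_read_blocked(sam_lines):
    """Return (ok, reason) for a sequence of SAM text lines.

    ok is False if the @HD line carries an SO: tag other than "queryname",
    or if records of the same read name are not contiguous.
    """
    seen = set()
    previous = None
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):  # header line
            fields = line.split("\t")
            if fields[0] == "@HD":
                for tag in fields[1:]:
                    if tag.startswith("SO:") and tag != "SO:queryname":
                        return False, f"unexpected sort order {tag}"
            continue
        qname = line.split("\t", 1)[0]  # first column is the read name
        if qname != previous:
            if qname in seen:
                return False, f"records for {qname} are not contiguous"
            seen.add(qname)
            previous = qname
    return True, "ok"
```

For example, passing an open file handle, as in `check_read_blocked(open("reads.sam"))`, would scan a file line by line; the file name here is only a placeholder.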


Output directly to console (stdout) in .align format.

NOTE: Many of the output columns are specific to LAST output and exist for traceability. sam2align outputs only the minimum columns needed by callSV: qStrand, qStart and qEnd for the read/query, and refId, refStrand, refStart and refEnd for the reference/subject.

./ preparepbs

Chunk the specified .fastq file and write PBS scripts instantiated from the template "template.pbs" for all chunks. This prepares the files for cluster jobs to be submitted.

./ preparePBS --fastq <fastq_file> [--chunksize <numberOfReadsPerChunk>] [--template <template_file>]

  --fastq STR      fastq file
  --chunksize INT  number of fastq record per chunk file [default: 1000]
  --template STR   template file for PBS script [default: template.pbs]
  --init STR       write a copy of the template to specific file

See Cluster support for a detailed example.


parameter description
--fastq fastq file to be analyzed


Writes one chunk .fastq file for every <chunksize> fastq records in the specified fastq file, along with the corresponding PBS script.

For a specified fastq "SCP20.fastq" with, say, 277,054 reads and the default chunksize of 1000, Picky preparepbs will generate 278 chunk .fastq files (SCP20-c000001.fastq, SCP20-c000002.fastq, ..., SCP20-c000278.fastq) and the corresponding 278 PBS scripts (SCP20-c000001.pbs, SCP20-c000002.pbs, ..., SCP20-c000278.pbs).
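The arithmetic behind this example is a ceiling division of the read count by the chunk size. The Python sketch below reproduces the count and the file names from the SCP20 example; the function name `chunk_files` is hypothetical, and the naming pattern is inferred from the file names listed here rather than taken from Picky's source.

```python
def chunk_files(fastq, n_reads, chunksize=1000):
    """List the (chunk fastq, PBS script) name pairs expected for a fastq file."""
    suffix = ".fastq"
    stem = fastq[:-len(suffix)] if fastq.endswith(suffix) else fastq
    # Ceiling division: a final partial chunk still needs its own file.
    n_chunks = (n_reads + chunksize - 1) // chunksize
    return [
        (f"{stem}-c{i:06d}.fastq", f"{stem}-c{i:06d}.pbs")
        for i in range(1, n_chunks + 1)
    ]


chunks = chunk_files("SCP20.fastq", 277_054)
```

With 277,054 reads, 277 chunks hold 1000 reads each and the last chunk holds the remaining 54, giving 278 pairs.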


parameter description
--chunksize number of fastq record per chunk file [default: 1000]
A larger chunksize means a longer run time per chunk but fewer chunk files to manage.
You should adjust this value according to your needs and available cluster resources and configuration.
--template template file for PBS script [default: template.pbs]
omit to use the default template, or
specify your project-specific template
--init write a copy of the template to the specified file.
can be used to create "template.pbs", or
to create an initial project-specific template

./ script

Write a bash script for the Picky pipeline, stringing together lastal alignment, picky selectRep, picky callSV and picky xls2vcf.

./ script --fastq <fastq_file> [--thread <numberOfThreads>] [--preload <preload_fold_of_reads_alignments>]

  --fastq STR    fastq file
  --thread INT   number of threads to use [default: 4]
  --preload INT  fold of preloading for read alignments [default: 6]

See Quick start's Picky processes and The Picky Script for an example.


parameter description
--fastq fastq file to be analyzed


The parameterized bash script is written directly to the console (stdout); it can be redirected to a file or stream-edited.


parameter description
--thread number of threads to be used for alignment
For a faster turnaround, use more threads, but do not exceed the number of cores available on your machine.
--preload fold of preloading for read alignments.
Uses more memory but shortens turnaround time by allowing the alignment and selectRep steps to be executed concurrently.