
Commands

Chee-Hong Wong edited this page Mar 9, 2018 · 2 revisions

Picky commands

Running picky.pl without any parameters

./picky.pl

lists the commands that Picky supports.

Please specify the command.

./picky.pl <command> -h

<command> [hashFq, selectRep, callSV]
hashFq    : hash read uuids to friendly ids
lastParam : Last parameters for alignment
selectRep : select representative alignments for read
callSV    : call structural variants
xls2vcf   : convert Picky sv xls file to vcf
sam2align : convert sam to align format
preparepbs: chunk last fastq file and write pbs script for cluster submission
script    : write a bash-script for single fastq processing

./picky.pl hashFq

OPTIONAL step to hash read uuids to human-friendly ids.

./picky.pl hashFq --pfile <passFQFile> --ffile <failFQFile> --oprefix <outputPrefix>

  --pfile STR    pass .fastq file
  --ffile STR    fail .fastq file
  --oprefix STR  prefix to output filename
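This page does not describe the exact ID scheme hashFq uses. Purely as an illustration of the idea, the Python sketch below assumes the read UUID is the first token of each FASTQ header and replaces it with a sequential ID. The `read` prefix, the helper name `hash_fastq_ids` and the returned lookup table are hypothetical, not Picky's actual output format.

```python
from typing import Dict, List, Tuple

def hash_fastq_ids(lines: List[str], prefix: str = "read") -> Tuple[List[str], Dict[str, str]]:
    """Replace the UUID in each FASTQ header with a short sequential ID.

    Hypothetical scheme for illustration only; Picky's hashFq may differ.
    """
    out: List[str] = []
    mapping: Dict[str, str] = {}  # UUID -> friendly ID, kept so reads stay traceable
    for i, line in enumerate(lines):
        if i % 4 == 0:  # lines 0, 4, 8, ... are FASTQ headers
            fields = line[1:].split(maxsplit=1)  # drop '@', split UUID from the rest
            uuid = fields[0]
            friendly = mapping.setdefault(uuid, f"{prefix}{len(mapping) + 1}")
            rest = f" {fields[1]}" if len(fields) > 1 else ""
            out.append(f"@{friendly}{rest}")
        else:
            out.append(line)  # sequence, '+' and quality lines are unchanged
    return out, mapping
```

Calling it on the pass file and then the fail file with a shared mapping would keep IDs unique across both; that detail is left out to keep the sketch short.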

./picky.pl lastParam

Return the suggested alignment parameters to be used with lastal.

It is advisable to use ./picky.pl script to set up the pipeline. See Quick start for an example.

./picky.pl selectRep

Select representative alignments from lastal's maf output.

./picky.pl selectRep [--thread <numberOfThreads>] [--preload <preloadFold>]

--thread INT   number of threads
--preload INT  Fold of thread count to preload maf records

Input

Read from STDIN

Output

Output the selected representative alignment for each read directly to console (stdout) in .align format.

General

parameter   description
--thread    number of threads to be used for alignment
            For faster turnaround, use more threads, but do not exceed the number of cores available on your machine.
--preload   fold of preloading for read alignments
            Uses more memory but shortens turnaround time by allowing the alignment and selectRep steps to run concurrently.
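The two options boil down to simple rules of thumb: keep --thread at or below the core count, and treat --preload as a multiplier on the thread count ("Fold of thread count"). The Python sketch below encodes that reading. The helper names are hypothetical, and interpreting threads × fold as the number of reads buffered ahead is an assumption drawn from the option wording, not from Picky's source.

```python
import os

def pick_threads(requested: int) -> int:
    """Cap the requested thread count at the number of available cores."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(requested, cores))

def preload_buffer(threads: int, fold: int) -> int:
    """Assumed preload size: `fold` times the thread count (interpretation, not Picky code)."""
    return threads * fold
```

With the defaults of picky.pl script (--thread 4, --preload 6), `preload_buffer(4, 6)` gives 24.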

./picky.pl callSV

Perform SV calling on the .align file generated by picky.pl selectRep.

./picky.pl callSV --in <alignFile> --fastq <fqFile> --lastpara <last parameters> [--genome <genomeFastaFile> --removehomopolymerdeletion] [--sam] [--exclude <chromosomeToExclude> [--exclude <anotherChromosomeToExclude>]]

  --oprefix STR   prefix for output files
  --fastq STR     .fastq file
  --lastpara STR  lastal parameters used
  --removehomopolymerdeletion
                  exclude DEL and INDEL possibly affected by homopolymer
  --genome STR    genome sequence in .fasta file
  --sam           flag to output .sam file
  --exclude STR   exclude SVs involving the specified chromosome
                  (specify each chromosome with --exclude individually)
  --multiloci     report SVs on the best alignment of multi-loci alignments

Input

Provide the .align file generated by picky.pl selectRep via STDIN or with "--in".

Output

Output a set of SV .xls files along with auxiliary files. See Output Format, Set 2: SVs Calling.

General

parameter                    description
--oprefix                    prefix for output files
--fastq                      fastq file containing the reads analyzed
--lastpara                   lastal parameters used; these are recorded in the .sam file
--sam                        generate a .sam file
--exclude                    exclude SVs involving the specified chromosome
                             specify each chromosome with --exclude individually
--multiloci                  report SVs on the best alignment of multi-loci alignments
--removehomopolymerdeletion  OPTIONAL: exclude DEL and INDEL possibly affected by homopolymers
                             ONLY necessary if you are using fastq base-called with earlier base-callers
--genome                     OPTIONAL: genome sequence in .fasta file
                             ONLY necessary if you are using "--removehomopolymerdeletion"
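When callSV is driven from a script, the rules above (one --exclude per chromosome, --genome paired with --removehomopolymerdeletion) are easy to get wrong. The Python sketch below assembles an argument list using only the options documented on this page; the helper name `callsv_args` and the file names in the test are hypothetical.

```python
from typing import List, Optional, Sequence

def callsv_args(align: str, fastq: str, lastpara: str, oprefix: str,
                exclude: Sequence[str] = (), genome: Optional[str] = None,
                remove_homopolymer_deletion: bool = False,
                sam: bool = False, multiloci: bool = False) -> List[str]:
    """Assemble a picky.pl callSV argument list from the documented options."""
    # The synopsis pairs --removehomopolymerdeletion with --genome.
    if remove_homopolymer_deletion and genome is None:
        raise ValueError("--removehomopolymerdeletion requires --genome")
    args = ["./picky.pl", "callSV", "--in", align, "--fastq", fastq,
            "--lastpara", lastpara, "--oprefix", oprefix]
    if genome is not None:
        args += ["--genome", genome]
    if remove_homopolymer_deletion:
        args.append("--removehomopolymerdeletion")
    if sam:
        args.append("--sam")
    if multiloci:
        args.append("--multiloci")
    for chrom in exclude:
        # Each excluded chromosome gets its own --exclude flag.
        args += ["--exclude", chrom]
    return args
```

The resulting list can be handed to `subprocess.run(args, check=True)` from the directory containing picky.pl.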

./picky.pl xls2vcf

Convert .xls SV files generated by picky.pl callSV to .vcf file.

./picky.pl xls2vcf --xls <picky_xls_file> [--chrom <chromosome>] [--re <minReadsSupport>]

  --xls STR       picky SV xls file
  --chrom STR     restrict output to specified chromosomes [e.g. chr20]
  --re INT        min number of read evidence [default:2]
  --merge         window to merge SVs [default: 1000 bp]
  --converge      window within which SVs are considered to converge concordantly [default: 20 bp]

Input

parameter   description
--xls       SV .xls file generated by picky.pl callSV
            specify multiple .xls files either as a comma-separated list or by prefixing each .xls file with --xls,
            e.g. "--xls sv.del.xls,sv.indel.xls" and "--xls sv.del.xls --xls sv.indel.xls" are equivalent
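The two --xls forms are equivalent because both reduce to the same list of files. A minimal Python sketch of that normalisation follows; the helper name is hypothetical and Picky's own option parsing is not shown on this page.

```python
from typing import List, Sequence

def expand_xls(values: Sequence[str]) -> List[str]:
    """Flatten repeated --xls values, splitting each one on commas."""
    files: List[str] = []
    for value in values:
        files.extend(part for part in value.split(",") if part)  # skip empty pieces
    return files
```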

Output

Output directly to console (stdout) in .vcf format.

General

parameter    description
--chrom      report SVs found on the specified chromosomes
--re         report SVs that have at least this number of supporting reads [default: 2]
--merge      window to merge SVs [default: 1000 bp]
--converge   window within which SVs are considered to converge concordantly [default: 20 bp]
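--chrom and --re act as simple filters on the reported SVs. As an illustration only, the Python sketch below applies the same two tests to hypothetical (chromosome, position, supporting reads) tuples; the real .xls columns and the --merge/--converge windowing are not modelled.

```python
from typing import Iterable, List, Optional, Set, Tuple

SV = Tuple[str, int, int]  # hypothetical record: (chromosome, position, supporting reads)

def filter_svs(svs: Iterable[SV], min_reads: int = 2,
               chroms: Optional[Set[str]] = None) -> List[SV]:
    """Keep SVs with at least min_reads support, optionally only on the given chromosomes."""
    return [sv for sv in svs
            if sv[2] >= min_reads and (chroms is None or sv[0] in chroms)]
```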

./picky.pl sam2align

Convert .sam content to .align format for picky.pl callSV.

Input

Input from console (stdin) in .sam format.

NOTE: The sam records should be read-blocked, i.e. alignment records from the same read should be contiguous. In the '@HD' header line, the 'SO:' tag must either have the value "queryname" or be omitted.
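Before piping a .sam file into sam2align, the two conditions in the note can be checked up front. The Python sketch below is a hypothetical pre-flight check, not part of Picky: it rejects an '@HD' line whose 'SO:' tag is anything other than "queryname", and any read whose records reappear after another read's records have started.

```python
from typing import Iterable, Optional, Set

def check_read_blocked(sam_lines: Iterable[str]) -> None:
    """Raise ValueError if SAM input is not read-blocked as sam2align expects."""
    closed: Set[str] = set()       # reads whose block of records has ended
    current: Optional[str] = None  # read whose block is being read
    for line in sam_lines:
        if not line.strip():
            continue
        if line.startswith("@HD"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SO:") and tag != "SO:queryname":
                    raise ValueError(f"@HD has {tag}; expected SO:queryname or no SO tag")
            continue
        if line.startswith("@"):
            continue  # other header lines (@SQ, @PG, ...) do not matter here
        qname = line.split("\t", 1)[0]
        if qname != current:
            if current is not None:
                closed.add(current)  # the previous read's block is now complete
            if qname in closed:
                raise ValueError(f"records for read {qname!r} are not contiguous")
            current = qname
```

Sorting by read name (for example with `samtools sort -n`) is one way to obtain read-blocked input.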

Output

Output directly to console (stdout) in .align format.

NOTE: Many .align columns are specific to LAST output or exist for traceability. sam2align outputs only the minimum columns needed by callSV: qStrand, qStart and qEnd for the read/query, and refId, refStrand, refStart and refEnd for the reference/subject.
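To show where those minimum columns can come from, the Python sketch below derives them from one SAM record using standard SAM semantics: FLAG bit 0x10 for the query strand, POS plus reference-consuming CIGAR operations for the reference span, and leading clips for the query start. It is an illustration, not sam2align's code. The dictionary keys mirror the column names above, but the coordinate bases, the query orientation on the reverse strand and the column order are assumptions.

```python
import re
from typing import Dict, Union

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def minimal_align_fields(sam_line: str) -> Dict[str, Union[str, int]]:
    """Derive the minimum .align-style fields from one SAM alignment line (sketch)."""
    f = sam_line.rstrip("\n").split("\t")
    flag, rname, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    lead_clip = 0
    for n, op in ops:  # soft/hard clips before the first aligned base
        if op not in "SH":
            break
        lead_clip += n
    query_span = sum(n for n, op in ops if op in "MI=X")  # aligned, query-consuming
    ref_span = sum(n for n, op in ops if op in "MDN=X")   # reference-consuming
    return {
        "qStrand": "-" if flag & 0x10 else "+",
        "qStart": lead_clip,  # 0-based offset in SAM SEQ orientation, counting clipped bases
        "qEnd": lead_clip + query_span,
        "refId": rname,
        "refStrand": "+",  # SAM positions are always on the forward reference strand
        "refStart": pos,  # 1-based leftmost position, as in SAM POS
        "refEnd": pos + ref_span - 1,
    }
```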

./picky.pl preparepbs

Chunk the specified .fastq file and write a PBS script, instantiated from the template "template.pbs", for every chunk. This prepares the files for cluster job submission.

./picky.pl preparepbs --fastq <fastq_file> [--chunksize <numberOfReadsPerChunk>] [--template <template_file>]

  --fastq STR      fastq file
  --chunksize INT  number of fastq records per chunk file [default: 1000]
  --template STR   template file for PBS script [default: template.pbs]
  --init STR       write a copy of the template to the specified file

See cluster support for a detailed example.

Input

parameter   description
--fastq     fastq file to be analyzed

Output

Write one chunk fastq file for every <chunksize> fastq records of the specified fastq file, along with the corresponding PBS script.

For example, for a fastq file "SCP20.fastq" with, say, 277,054 reads, Picky preparepbs will generate 278 chunk .fastq files (SCP20-c000001.fastq, SCP20-c000002.fastq, ..., SCP20-c000278.fastq) and the corresponding 278 PBS scripts (SCP20-c000001.pbs, SCP20-c000002.pbs, ..., SCP20-c000278.pbs).
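The chunk count is the read count divided by --chunksize, rounded up: 277,054 / 1,000 rounds up to 278, with the last chunk holding the remaining 54 reads. The Python sketch below reproduces that arithmetic and the six-digit naming seen in the example. The helper name is hypothetical, and the naming rule is inferred from that single example rather than from Picky's source.

```python
from pathlib import Path
from typing import List, Tuple

def chunk_files(fastq: str, n_reads: int, chunksize: int = 1000) -> List[Tuple[str, str]]:
    """(chunk .fastq, PBS script) file names for n_reads reads split chunksize at a time."""
    stem = Path(fastq).stem  # "SCP20.fastq" -> "SCP20"
    n_chunks = (n_reads + chunksize - 1) // chunksize  # integer ceiling division
    return [(f"{stem}-c{i:06d}.fastq", f"{stem}-c{i:06d}.pbs")
            for i in range(1, n_chunks + 1)]
```

Lowering --chunksize to 500 for the same file would yield 555 chunks, i.e. more, shorter cluster jobs.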

General

parameter    description
--chunksize  number of fastq records per chunk file [default: 1000]
             A larger chunksize means a longer run time per chunk, but fewer chunk files to manage.
             Adjust this value to your needs and to the available cluster resources and configuration.
--template   template file for PBS script [default: template.pbs]
             omit to use the default template, or
             specify your project-specific template
--init       write a copy of the template to the specified file;
             can be used to create "template.pbs", or
             to create an initial project-specific template

./picky.pl script

Write a bash script for the Picky pipeline that strings together lastal alignment, picky selectRep, picky callSV and picky xls2vcf.

./picky.pl script --fastq <fastq_file> [--thread <numberOfThreads>] [--preload <preload_fold_of_reads_alignments>]

  --fastq STR    fastq file
  --thread INT   number of threads to use [default: 4]
  --preload INT  fold of preloading for read alignments [default: 6]

See Quick start's Picky processes and The Picky Script for examples.

Input

parameter   description
--fastq     fastq file to be analyzed

Output

Output the parameterized bash script directly to console (stdout); redirect it to a file, or edit it as a stream (e.g. with sed).

General

parameter   description
--thread    number of threads to be used for alignment
            For faster turnaround, use more threads, but do not exceed the number of cores available on your machine.
--preload   fold of preloading for read alignments
            Uses more memory but shortens turnaround time by allowing the alignment and selectRep steps to run concurrently.