Phrap UMD Description
THE LATEST VERSION OF THE SOFTWARE CAN BE FOUND HERE.

Phrap UMD Version 2 consists of the UMD Trimmer, the UMD Overlapper and a modified version of Phrap. It is capable of assembling data downloaded directly from the NCBI Trace Archive.

The pipeline runs in 3 stages. First, the vector ends of the reads are examined to identify the vector, and the reads are trimmed for vector and quality. Second, the trimmed reads are fed into the 5-pass UMD Overlapper, which finds the overlaps, corrects base caller errors and performs additional trimming if necessary. Third, the trimmed and error-corrected reads and the overlaps are input into the modified version of Phrap, which only puts reads together if they overlap according to the list of overlaps produced by the UMD Overlapper.

1. Installation.

* Unzip and untar the provided PhrapUMDV2.tar.gz file. It will untar into the PhrapUmd2 folder. cd to the PhrapUmd2 folder and type

    ./install.sh

  This script will compile all executables for PhrapUMDV2.

* Now you are ready to go. The test data set, consisting of three MAIZE BACs, is located under $UMD_PHRAP_ROOT/test_data_set, and you can assemble it using the pipeline-runner.pl script.

2. Usage

If you run pipeline-complete.sh with no parameters, it will tell you how to use it:

    $ ./pipeline-complete.sh
    pipeline-complete.sh [-e bash_eval_expr] PROJECT_NAME reads quals; Args were ''
    where:
        PROJECT_NAME -- your name for the current assembly
        reads -- location of the reads file
        quals -- location of the quals file

The reads and quals must be listed in the same order in the input files. Currently only a single reads file and a single quals file are supported as input; support for multiple reads files will be implemented in the future.

As it stands now, PhrapUmdV2 assembles a 5 Mb genome on a 2.8GHz Intel Xeon machine with 4GB of memory in ~40 minutes.
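Because the reads and quals must be listed in the same order, it can help to check the two input files before starting an assembly. The following is only a sketch of such a check and is not part of PhrapUMD. It assumes that both files are FASTA-style, i.e. every record begins with a ">" header line whose first whitespace-delimited token is the read name; the function names read_names and same_order are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check, NOT part of the PhrapUMD distribution.
# Assumption: both input files are FASTA-style, with one ">" header line
# per record and the read name as the first token on that line.

# Print the read names of a FASTA-style file, one per line, in file order.
read_names() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}

# Return 0 only if both files list the same read names in the same order,
# 1 if they differ, 2 if a temporary file could not be created.
same_order() {
    tmp_reads=$(mktemp) || return 2
    tmp_quals=$(mktemp) || { rm -f "$tmp_reads"; return 2; }
    read_names "$1" > "$tmp_reads"
    read_names "$2" > "$tmp_quals"
    if cmp -s "$tmp_reads" "$tmp_quals"; then
        status=0
    else
        status=1
    fi
    rm -f "$tmp_reads" "$tmp_quals"
    return "$status"
}
```

With these functions loaded into the current shell, a run could be guarded as follows (my.reads, my.quals and MY_PROJECT are placeholder names):

    same_order my.reads my.quals && ./pipeline-complete.sh MY_PROJECT my.reads my.quals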
Results appear in the ./PROJECT_NAME directory as follows:

    PROJECT_NAME.reads -- trimmed reads
    PROJECT_NAME.reads.qual -- quality scores for the trimmed reads
    PROJECT_NAME.overlaps -- UMD overlaps listed by read number (starting with 1)
    PROJECT_NAME.reliable.overlaps -- UMD Reliable overlaps (see our Rat paper)
    PROJECT_NAME.reads.ace -- phrap-generated ACE file containing the assembly
    PROJECT_NAME.reads.contigs -- phrap-generated contig file containing the assembly
    PROJECT_NAME.reads.contigs.quals -- phrap-generated contig quals file containing the assembly
    PROJECT_NAME.reads.problems, PROJECT_NAME.reads.problems.qual, PROJECT_NAME.reads.singlets -- phrap-generated
    overlapper.stdout -- STDOUT for the overlapper
    overlapper.stderr -- STDERR for the overlapper
    phrap_output/ -- misc Phrap output files (stdout and stderr)

3. Support

Please write to Aleksey Zimin <alekseyz@ipst.umd.edu> with questions/comments on this software.

4. Acknowledgements

We thank Phil Green for providing a copy of Phrap and allowing us to modify it to be used with UMD Overlapper.