Phrap UMD Description
THE LATEST VERSION OF THE SOFTWARE CAN BE FOUND HERE.

Phrap UMD Version 2 consists of the UMD Trimmer, the UMD Overlapper, and a modified version of Phrap.  
It can assemble data downloaded directly from the NCBI Trace Archive. The pipeline runs in three 
stages.  First, the ends of the reads are examined to identify the vector, and the reads are trimmed 
for vector and quality.  Next, the trimmed reads are fed into the 5-pass UMD Overlapper, which finds 
the overlaps, corrects base-caller errors, and performs additional trimming if necessary.  Finally, 
the trimmed and error-corrected reads and the overlaps are passed to the modified version of Phrap, 
which joins reads only if they overlap according to the list of overlaps produced by the UMD 
Overlapper. 

1. Installation

* Unzip and untar the provided PhrapUMDV2.tar.gz file.  It unpacks into the PhrapUmd2 folder.  
cd to the PhrapUmd2 folder and run ./install.sh.  This script compiles all executables for PhrapUMDV2.

* Now you are ready to go.  A test data set consisting of three maize BACs is located under 
$UMD_PHRAP_ROOT/test_data_set, and you can assemble it using the pipeline-runner.pl script.
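
The installation steps above can be sketched as a small shell function.  This is only an 
illustration: the function name install_phrap_umd is hypothetical, and it assumes the tarball 
sits in the current directory.

```shell
#!/bin/sh
# Sketch of the installation steps described above.
# install_phrap_umd is a hypothetical helper name, not part of the distribution.
install_phrap_umd() {
    # Assumption: the tarball is in the current directory unless a path is given.
    tarball="${1:-PhrapUMDV2.tar.gz}"

    # Unzip and untar in one step; this creates the PhrapUmd2 folder.
    tar -xzf "$tarball" || return 1

    # Run the installer from inside the unpacked folder.  The subshell keeps
    # the caller's working directory unchanged.
    ( cd PhrapUmd2 && ./install.sh )
}
```

Calling install_phrap_umd with no argument looks for PhrapUMDV2.tar.gz in the current directory; 
a different path can be given as the first argument.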

2. Usage

If you just run pipeline-complete.sh with no parameters, it will tell you how to use it:

$ ./pipeline-complete.sh
pipeline-complete.sh [-e bash_eval_expr] PROJECT_NAME reads quals;
Args were ''

where:
PROJECT_NAME -- your name for the current assembly
reads -- location of the reads file
quals -- location of the quals file

The reads and quals must be listed in the same order in the input files.  Currently only a single 
reads file and a single quals file are supported as input.  Support for multiple reads files is 
planned for a future version.  
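
Because the pipeline expects reads and quals in the same order, it may help to verify this before 
starting a run.  The following is a minimal sketch, not part of the distribution.  It assumes both 
inputs are FASTA-style files whose records start with a ">" header line, and that the first word 
of each header is the read name.  The function names and the file names in the comments are 
hypothetical.

```shell
#!/bin/sh
# Print the read name (first word after '>') of every header line.
# Assumption: both inputs use FASTA-style '>' headers.
fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Return 0 if the two files list the same read names in the same order,
# 1 if they differ, 2 if temporary files could not be created.
check_same_order() {
    reads_ids=$(mktemp) && quals_ids=$(mktemp) || return 2
    fasta_ids "$1" > "$reads_ids"
    fasta_ids "$2" > "$quals_ids"
    if cmp -s "$reads_ids" "$quals_ids"; then
        result=0
    else
        result=1
    fi
    rm -f "$reads_ids" "$quals_ids"
    return "$result"
}

# Hypothetical usage before starting an assembly:
#   check_same_order my.reads my.quals \
#       && ./pipeline-complete.sh my_project my.reads my.quals
```

The check compares names only; it does not confirm that each quality record has as many scores as 
its read has bases.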

As it stands now, PhrapUmdV2 assembles a 5 Mb genome on a 2.8 GHz Intel Xeon machine with 4 GB of 
memory in about 40 minutes.

Results appear in the ./PROJECT_NAME directory as follows:

PROJECT_NAME.reads -- trimmed reads 
PROJECT_NAME.reads.qual -- quality scores for the trimmed reads
PROJECT_NAME.overlaps -- UMD overlaps listed by read number (starting with 1)
PROJECT_NAME.reliable.overlaps -- UMD Reliable overlaps (see our Rat paper)
PROJECT_NAME.reads.ace -- phrap-generated ACE file containing the assembly
PROJECT_NAME.reads.contigs -- phrap-generated contig file containing the assembly
PROJECT_NAME.reads.contigs.quals -- phrap-generated contig quals file containing the assembly
PROJECT_NAME.reads.problems, PROJECT_NAME.reads.problems.qual, PROJECT_NAME.reads.singlets -- phrap-generated
overlapper.stdout -- STDOUT for the overlapper
overlapper.stderr -- STDERR for the overlapper
phrap_output/ -- misc Phrap output files (stdout and stderr)
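
After a run, a quick way to confirm that the main result files listed above were produced is a 
check like the one below.  It is a sketch only: check_outputs is a hypothetical helper, it assumes 
PROJECT_NAME is a plain name and that it is run from the directory containing ./PROJECT_NAME, and 
it covers only the trimmed reads, the overlaps, and the main assembly files.

```shell
#!/bin/sh
# Report which of the main result files are missing from ./PROJECT_NAME.
# check_outputs is a hypothetical helper, not part of the distribution.
# Returns 0 if every checked file exists, 1 otherwise.
check_outputs() {
    project="$1"
    missing=0
    for suffix in \
        reads \
        reads.qual \
        overlaps \
        reliable.overlaps \
        reads.ace \
        reads.contigs \
        reads.contigs.quals
    do
        if [ ! -e "$project/$project.$suffix" ]; then
            echo "missing: $project/$project.$suffix"
            missing=1
        fi
    done
    return "$missing"
}
```

If a file is reported missing, overlapper.stderr and the files under phrap_output/ are the natural 
places to look for the cause.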

3. Support

Please write to Aleksey Zimin <alekseyz@ipst.umd.edu> with questions/comments on this software.

4. Acknowledgements

We thank Phil Green for providing a copy of Phrap and allowing us to modify it to be used with UMD Overlapper.