- Reference genome indexing differs from previous versions: all ambiguous bases in the reference are now masked. Users can turn this masking off by disabling the macro "CHECK_UNKOWN_GENOME_BASES" in the Makefile before compiling. Type command "cushaw2 index" to get options for genome index building.
- Support ABI SOLiD color-space single-end and paired-end/mate-paired alignment. Type command "cushaw2 calign" to get options for color-space alignment.
By default, CUSHAW2 aligns color-space reads in a conservative manner, i.e. the aligner only outputs unique alignments with an edit distance <= 5 (an alignment is deemed unique if its mapping quality score is >= 30, given the quite short read lengths). For paired-end/mate-paired alignment, the alignments of both ends are reported if either alignment is unique. For higher sensitivity, users can run the program in an aggressive manner by specifying the parameters "-min_qual 0 -max_edit_dist -1". Use the option "-mode" to specify whether the reads are paired-end or mate-paired.
- Improved base-space single-end and paired-end alignment quality by using a double-seeding policy (see our arXiv paper for more details). Type "cushaw2 align" to get options for base-space alignment (e.g. Illumina, 454, Ion Torrent and PacBio).
Use the option "-atype" to choose between local alignment alone and the combination of local and semi-global alignments. By default, this version uses the combination of local and semi-global alignments.
- Added a new option "-max_edit_dist" that lets users specify the maximum edit distance of a reported alignment.
- Each alignment in the SAM file now carries two tags: NM (edit distance) and AS (alignment score).
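
As a rough illustration of how the NM and AS tags and the mapping quality can be used downstream, the following Python sketch reads one SAM alignment line and re-applies the conservative thresholds described above (mapping quality >= 30, edit distance <= 5). It is not part of CUSHAW2. The function names and the sample record are hypothetical, and treating a negative maximum edit distance as "no limit" is an assumption based on the "-max_edit_dist -1" setting mentioned above. The column positions (MAPQ in column 5, optional TAG:TYPE:VALUE fields from column 12) follow the SAM format.

```python
def parse_sam_tags(line):
    """Return (mapq, tags) for one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    mapq = int(fields[4])          # column 5 of a SAM record is MAPQ
    tags = {}
    for field in fields[11:]:      # optional fields start at column 12
        tag, typ, value = field.split(":", 2)
        # Integer-typed tags (type "i") such as NM and AS become ints.
        tags[tag] = int(value) if typ == "i" else value
    return mapq, tags


def is_conservative_hit(line, min_qual=30, max_edit_dist=5):
    """Check a record against the conservative thresholds.

    Assumption: a negative max_edit_dist disables the edit-distance limit.
    """
    mapq, tags = parse_sam_tags(line)
    nm = tags.get("NM")
    if nm is None:                 # no edit distance recorded: reject
        return False
    within_limit = max_edit_dist < 0 or nm <= max_edit_dist
    return mapq >= min_qual and within_limit


# Hypothetical record, made up for illustration only.
record = "\t".join([
    "read1", "0", "chr1", "100", "37", "50M", "*", "0", "0",
    "A" * 50, "I" * 50, "NM:i:2", "AS:i:46",
])

mapq, tags = parse_sam_tags(record)
print(mapq, tags["NM"], tags["AS"])
print(is_conservative_hit(record))
```

Calling is_conservative_hit(record, min_qual=0, max_edit_dist=-1) mirrors the aggressive setting under the same assumption, so any record that has an NM tag passes.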