1. Separate HCV sequences by type from HCV_2002-2004_dupnamesremoved.fasta using Read_Fasta_sepbytype.pl.

2. For type 1, because there are so many sequences, count the duplicates using Unique String Frequency.vi: append the duplicate count to the name of one copy and delete the rest.

3. Align the sequences of each type using ClustalX (gap opening and extension penalties = 0; realign selected residues where the alignment is bad).

4. Search for duplicates using Unique String Frequency.vi.
	For type 1, search for duplicate sequences using Unique String Frequency 2nd Run.vi, which adds the existing duplicate counts together
	rather than reinitializing them.

5. Remove sequences with ambiguous bases using RemoveAmbigSeqs.vi (ambiguous bases inflate the number of unique sequences and may indicate sequencing error).

6. Put the unique sequences of all types in the same .fasta file and check for duplicates (they should all be unique because they are different types).
	This returns any duplicates shared across types, which indicate an ARUP error in assigning types.

7. Graph the frequencies of the unique sequences using Graph Unique Sequences.vi.

8. Take the highly represented sequences for each type (e.g. >100 duplicates for type 1) and save them in pretty format in Se-Al. This shows the
   variation between types and the consensus within a type for the 'most important' sequences. Good probe regions can then be looked for by eye.

9. Create a weight matrix of the aligned sequences using Aln2PWM or UniqAln2PWM.pl.

10. Plot the weight matrices with ProfilePlot.vi. Look at the probe region on the plot of the PWM.
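
For reference, here is a minimal Python sketch of the duplicate counting, ambiguous-base removal, cross-type check and weight matrix steps above. It is not the code inside the .vi or .pl tools. The function names, the (name, sequence, count) record layout, the "_count" name suffix, the set of symbols treated as unambiguous, and the use of plain count-weighted column frequencies for the weight matrix are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: anything other than these symbols counts as ambiguous;
# '-' is kept because the input is expected to be aligned.
UNAMBIGUOUS = set("ACGTU-")


def parse_fasta(text):
    """Return (name, sequence) pairs from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def collapse_duplicates(records):
    """Merge identical sequences, keeping the first name and summing counts.

    `records` holds (name, sequence, count) triples. Use count=1 on a first
    pass; feeding already-collapsed records back in adds their counts
    together instead of starting again from 1.
    """
    first_name, totals = {}, Counter()
    for name, seq, count in records:
        first_name.setdefault(seq, name)
        totals[seq] += count
    return [(first_name[seq], seq, totals[seq]) for seq in first_name]


def drop_ambiguous(records):
    """Keep only records whose sequence uses unambiguous symbols."""
    return [r for r in records if set(r[1]) <= UNAMBIGUOUS]


def cross_type_duplicates(by_type):
    """Map each sequence found under more than one type to those types."""
    seen = {}
    for hcv_type, seqs in by_type.items():
        for seq in set(seqs):
            seen.setdefault(seq, []).append(hcv_type)
    return {seq: types for seq, types in seen.items() if len(types) > 1}


def weight_matrix(records):
    """Per-column symbol frequencies, each sequence weighted by its count."""
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq, _ in records):
        raise ValueError("sequences must be aligned to the same length")
    total = sum(count for _, _, count in records)
    matrix = []
    for i in range(length):
        column = Counter()
        for _, seq, count in records:
            column[seq[i]] += count
        matrix.append({sym: n / total for sym, n in column.items()})
    return matrix


if __name__ == "__main__":
    demo = ">a\nACGT\n>b\nACGT\n>c\nACGA\n>d\nACNT\n"  # made-up test data
    first_pass = collapse_duplicates((n, s, 1) for n, s in parse_fasta(demo))
    clean = drop_ambiguous(first_pass)
    for name, seq, count in clean:
        print(f">{name}_{count}")  # hypothetical naming: count appended to name
        print(seq)
    print(weight_matrix(clean))
```

Running collapse_duplicates a second time on records that already carry counts mirrors the "2nd Run" behaviour of adding counts rather than resetting them.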


Misc. tools:
Use Se-Al on a Mac to fix alignments. Input the FASTA by copy and paste, then save as FASTA. Open the file in Word on a PC and 'save' to convert the line endings.
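
The line-ending conversion could also be scripted instead of going through Word. The sketch below is only an illustration: it assumes the file saved on the Mac uses bare carriage returns (or a mix of endings) and that the PC tools want CR+LF. The function names and file paths are hypothetical.

```python
def to_crlf(data: bytes) -> bytes:
    """Normalize CR, LF, or CRLF line endings to CRLF."""
    # Collapse every ending to LF first so CRLF pairs are not doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as bytes and write it to dst_path with CRLF endings."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_crlf(data))
```

For example, convert_file("alignment_mac.fasta", "alignment_pc.fasta") would write a converted copy and leave the original untouched (both file names are placeholders).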