.. _validate_demultiplexed_fasta:

.. index:: validate_demultiplexed_fasta.py

*validate_demultiplexed_fasta.py* -- Checks a fasta file to verify that it has been properly demultiplexed, i.e., that it is in QIIME-compatible format.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Description:**

Checks that the input file is a valid fasta file and that it meets the following conditions: sequences contain no gaps ('.' or '-' characters) and only valid nucleotide characters; no fasta label is duplicated; SampleIDs match those in the provided mapping file; fasta labels have the SampleID_X format normally generated by QIIME demultiplexing; and the BarcodeSequence and LinkerPrimerSequence values are not found in the fasta sequences. Optionally, this script can also verify that the SampleIDs in the fasta labels are present in the tip IDs of a provided newick tree file, test that all sequences have the same length, and test that all SampleIDs in the mapping file are represented in the fasta file labels.
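
The core checks described above can be illustrated with a minimal Python sketch. This is not the script's actual implementation: the function names, the problem codes, and the set of accepted nucleotide characters (IUPAC codes here) are assumptions made for illustration, and X in SampleID_X is assumed to be an integer. The barcode and linker-primer check is omitted.

::

    # Hypothetical sketch of the label and sequence checks; not QIIME code.
    GAP_CHARS = set(".-")
    # Assumption: IUPAC nucleotide codes count as valid characters.
    VALID_NUCLEOTIDES = set("ACGTURYKMSWBDHVN")


    def parse_fasta(lines):
        """Yield (label, sequence) pairs from an iterable of fasta lines."""
        label, chunks = None, []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if label is not None:
                    yield label, "".join(chunks)
                label, chunks = line[1:], []
            else:
                chunks.append(line)
        if label is not None:
            yield label, "".join(chunks)


    def check_fasta(lines, sample_ids):
        """Return a list of (sequence_id, problem_code) tuples."""
        problems = []
        seen = set()
        for label, seq in parse_fasta(lines):
            # The sequence ID is the first whitespace-delimited label field.
            fields = label.split()
            seq_id = fields[0] if fields else ""
            if seq_id in seen:
                problems.append((seq_id, "duplicate_label"))
            seen.add(seq_id)
            # Split SampleID_X at the last underscore.
            sample_id, sep, counter = seq_id.rpartition("_")
            if not sep or not counter.isdigit():
                problems.append((seq_id, "bad_label_format"))
            elif sample_id not in sample_ids:
                problems.append((seq_id, "unknown_sample_id"))
            chars = set(seq.upper())
            if chars & GAP_CHARS:
                problems.append((seq_id, "gap"))
            if chars - GAP_CHARS - VALID_NUCLEOTIDES:
                problems.append((seq_id, "invalid_character"))
        return problems

Calling ``check_fasta(open("seqs.fasta"), {"PC.634"})`` with a hypothetical SampleID would return an empty list for a file that passes these checks.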


**Usage:** :file:`validate_demultiplexed_fasta.py [options]`

**Input Arguments:**

.. note::

	
	**[REQUIRED]**
		
	-m, `-`-mapping_fp
		Name of mapping file. NOTE: Must contain a header line indicating SampleID in the first column, BarcodeSequence in the second, and LinkerPrimerSequence in the third. If no barcode or linker-primer sequence is present, leave the data fields empty.
	-i, `-`-input_fasta_fp
		Path to the input fasta file
	
	**[OPTIONAL]**
		
	-o, `-`-output_dir
		Directory prefix for output files [default: .]
	-t, `-`-tree_fp
		Path to the tree file; needed by options -s and -e to test whether sequence IDs are a subset of, or an exact match to, the tree tips [default: None]
	-s, `-`-tree_subset
		Determine if sequence IDs are a subset of the tree tips. A newick tree must be passed with the -t option. [default: False]
	-e, `-`-tree_exact_match
		Determine if sequence IDs are an exact match to the tree tips. A newick tree must be passed with the -t option. [default: False]
	-l, `-`-same_seq_lens
		Determine if sequences are all the same length. [default: False]
	-a, `-`-all_ids_found
		Determine if all SampleIDs provided in the mapping file are represented in the fasta file labels. [default: False]
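
The optional tests enabled by -s, -e, -l, and -a amount to set and length comparisons. The following is a minimal sketch under that reading, not the script's actual code; the function names are hypothetical, and which identifiers are compared against the tree tips is left to the caller.

::

    # Hypothetical helpers mirroring the optional tests; not QIIME code.
    def tree_subset(seq_ids, tip_ids):
        """-s: every ID taken from the fasta file is also a tree tip."""
        return set(seq_ids) <= set(tip_ids)


    def tree_exact_match(seq_ids, tip_ids):
        """-e: the fasta IDs and the tree tips are the same set."""
        return set(seq_ids) == set(tip_ids)


    def same_seq_lens(seqs):
        """-l: all sequences share a single length."""
        return len({len(seq) for seq in seqs}) <= 1


    def missing_sample_ids(mapping_ids, fasta_sample_ids):
        """-a: mapping-file SampleIDs absent from the fasta labels."""
        return sorted(set(mapping_ids) - set(fasta_sample_ids))

An exact match implies a subset, so under this reading a file that passes the -e test would also pass the -s test.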


**Output:**




**Example:**

 

::

	 validate_demultiplexed_fasta.py -i seqs.fasta -m Mapping_File.txt


