The feedback from the COMBINE survey was available prior to the HARMONY meeting, and we took the results into consideration when organizing HARMONY. What could have been done differently?
To demonstrate the accuracy of binning using ClaMS, we binned one real and one simulated metagenome. The real metagenome, the Phrap-assembled phosphorus removal sludge metagenome (SLU) sampled from a laboratory-scale bioreactor (IMG/M, taxon OID: 2000000000 [6]), is 56.6M bases long, has 60.45% GC, and contains 31,742 assembled contigs. The simulated metagenome, the assembled medium-complexity simMC dataset from FAMeS [7], has 15,109 non-chimeric contigs that were 1000 bases or longer and were candidates for binning using ClaMS. We evaluated the results by cross-validating the binned contigs. For simMC, the correct bins of the contigs were already known; for SLU, best hits from BLAST alignment were used to cross-validate the bins.
The phylogenetic distribution of genes in the SLU dataset, based on their best BLAST hits in IMG/M [6] and the 16S rRNA tree in [8], showed that the dataset was dominated by Betaproteobacteria (127 species), Gammaproteobacteria (396 species), Bacteroidetes (81 species), and the genome of Candidatus A. phosphatis. Four training sets were used to bin SLU: the longest contig belonging to Candidatus A. phosphatis in the SLU dataset (subsequently removed from the set to be binned), betaproteobacterial isolate genomes, all gammaproteobacterial isolate genomes, and all genomes of Bacteroidetes.
Scaffolds assigned to each bin were then cross-validated using their existing BLAST-based class assignment in IMG/M. As part of the IMG/M processing pipeline, the phylogenetic distribution for a metagenome is computed by aligning genes on scaffolds (using BLASTP) against the non-redundant database of sequences computed from isolate genomes stored in IMG. Based on the alignments of the genes they carry, scaffolds are assigned to bins at various taxonomic levels, and the results are viewable as a phylogenetic distribution of genes in the metagenome.
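The cross-validation step described above amounts to tallying, for each bin, how its scaffolds are distributed across the BLAST-based classes. The following is a minimal sketch of that tally, not code from ClaMS or IMG/M: the function name `cross_validate_bins`, the dictionary inputs, and the toy scaffold identifiers are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def cross_validate_bins(bin_of, blast_class_of):
    """For each bin, return the fraction of its scaffolds per BLAST-based class.

    bin_of:         scaffold id -> bin assigned by the binning tool (hypothetical input)
    blast_class_of: scaffold id -> class from best BLAST hits (hypothetical input)
    """
    counts = defaultdict(Counter)
    for scaffold, bin_name in bin_of.items():
        # Scaffolds without a BLAST-based class are counted separately.
        label = blast_class_of.get(scaffold, "unassigned")
        counts[bin_name][label] += 1

    return {
        bin_name: {label: n / sum(c.values()) for label, n in c.items()}
        for bin_name, c in counts.items()
    }


# Toy data, invented purely to exercise the function.
bin_of = {
    "s1": "Bacteroidetes",
    "s2": "Bacteroidetes",
    "s3": "Gammaproteobacteria",
    "s4": "Gammaproteobacteria",
}
blast_class_of = {
    "s1": "Bacteroidetes",
    "s2": "Bacteroidetes",
    "s3": "Betaproteobacteria",
    "s4": "Gammaproteobacteria",
}

fractions = cross_validate_bins(bin_of, blast_class_of)
for bin_name, dist in fractions.items():
    print(bin_name, dist)
```

A per-bin distribution, rather than a single accuracy figure, makes it visible when a bin is dominated by a class other than the one it was trained on.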
Results are outlined in Figure 1. Approximately 91% of the scaffolds in the Candidatus A. phosphatis bin have best BLAST matches to Betaproteobacteria, as do 77% of the scaffolds in the Betaproteobacteria bin. Similarly, 90% of the scaffolds in the Bacteroidetes bin have BLAST matches to Bacteroidetes, while the scaffolds in the Gammaproteobacteria bin are split mainly between Betaproteobacteria (59%) and Gammaproteobacteria (25%). This misclassification could be attributed to the fact that the Gammaproteobacteria in the SLU dataset are dominated by Xanthomonadales, whose scaffolds have a high GC content (64-67%) that is closer to that of Betaproteobacteria (62%) than to that of Gammaproteobacteria (48%). Moreover, the taxonomic position of Xanthomonadales is not well defined [9].
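The GC-content argument can be illustrated with a short sketch. It computes the GC percentage of a sequence and reports which reference value it lies closest to, using the 62% and 48% figures quoted above. This is only an illustration of the reasoning, not a description of how ClaMS assigns bins; the helper names `gc_content` and `nearest_by_gc` are hypothetical.

```python
def gc_content(seq):
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def nearest_by_gc(gc, reference_gc):
    """Return the reference group whose GC percentage is closest to gc."""
    return min(reference_gc, key=lambda name: abs(reference_gc[name] - gc))


# Reference GC percentages as quoted in the text.
reference_gc = {"Betaproteobacteria": 62.0, "Gammaproteobacteria": 48.0}

# Scaffolds at both ends of the 64-67% GC range reported for Xanthomonadales
# lie closer to the betaproteobacterial value.
for gc in (64.0, 67.0):
    print(gc, nearest_by_gc(gc, reference_gc))
```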
