CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing

Estimating taxonomic content is a key problem in metagenomic sequencing data analysis. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel LCA algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster with ten thin nodes. Compared with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% that of MEGAN when run on a fat node. CloudLCA can be run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis: it generates the same output as MEGAN, and that output can be imported directly into MEGAN to complete the downstream analysis. Moreover, CloudLCA is a universal solution for finding the lowest common ancestor, and it can be applied in other fields requiring an LCA algorithm.
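To make the idea of LCA-based taxonomic assignment concrete, here is a minimal single-process sketch in Python. It is not CloudLCA's implementation and does not reflect its parallel design; the toy taxonomy, the read-to-hit mapping, and the names `PARENT`, `path_to_root`, `lca`, and `assign_read` are all assumptions made for illustration. Each read is assigned to the lowest taxon that is an ancestor of every taxon the read matched.

```python
from functools import reduce

# Hypothetical toy taxonomy (child -> parent); the root has no parent.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Bacillus": "Firmicutes",
}


def path_to_root(taxon):
    """Return the list of taxa from `taxon` up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Naive lowest common ancestor of two taxa using parent pointers."""
    ancestors_of_a = set(path_to_root(a))
    # Walk upward from b; the first node also above a is the LCA.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("taxa do not share a root")


def assign_read(hit_taxa):
    """Assign a read to the LCA of all taxa it matched."""
    return reduce(lca, hit_taxa)


# Hypothetical reads, each with the taxa of its database hits.
READ_HITS = {
    "read1": ["Escherichia", "Salmonella"],
    "read2": ["Escherichia", "Bacillus"],
    "read3": ["Bacillus"],
}

assignments = {read: assign_read(hits) for read, hits in READ_HITS.items()}
print(assignments)
```

In this sketch each read is processed independently of the others, which is the property that makes this kind of assignment a natural candidate for distribution across processors or nodes.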

The implementation of CloudLCA, data generation scripts, sample datasets, and usage instructions are available online. Download software here.

Reference:

Zhao G, Bu D, Liu C, Li J, Yang J, Liu Z, Zhao Y, Chen R. CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing. Protein Cell. 2012 Feb;3(2):148-52. Epub 2012 Mar 17. PubMed PMID: 22426983.

© 2024 Genomics Gateway. All rights reserved.