Cloudgene: A graphical execution platform for MapReduce programs

Researchers at Innsbruck Medical University, Innsbruck, Austria, developed Cloudgene, a freely available platform that improves the usability of MapReduce programs in bioinformatics. It provides a graphical user interface for executing programs, importing and exporting data, and reproducing workflows on in-house clusters (private clouds) and rented clusters (public clouds). The aim of Cloudgene is to build a standardized graphical execution environment for currently available and future MapReduce programs, all of which can be integrated through its plug-in interface. Because Cloudgene can run on private clusters, sensitive datasets can be kept in-house at all times, which also minimizes data transfer times.

The developers show that MapReduce programs can be integrated into Cloudgene with little effort and without adding any computational overhead to existing programs. Cloudgene lets developers focus on the actual implementation task and gives scientists a platform that aims to hide the complexity of MapReduce. In addition to MapReduce programs, Cloudgene can also be used to launch predefined systems (e.g. Cloud BioLinux, RStudio) in public clouds. Currently, five different bioinformatics programs using MapReduce and two systems are integrated and have been successfully deployed. Cloudgene is freely available at http://cloudgene.uibk.ac.at.

Unlike similar approaches such as Galaxy Cloudman or Amazon’s Elastic MapReduce, Cloudgene-MapRed improves the usability of existing MapReduce programs by letting users execute them and receive feedback through a graphical web interface. A major advantage of Cloudgene is that it also runs on any private cluster with Hadoop MapReduce installed, which makes it possible to test MapReduce programs on a local cluster and share configurations with other research institutes.

Cloudgene-Cluster

Cloudgene-Cluster supports scientists by launching a cluster in the cloud (currently AWS EC2) and setting up a ready-to-use environment for a specific use case. It installs the MapReduce framework, applies the required settings (instance type, number of instances, firewall rules) as defined in the configuration file, and launches Cloudgene-MapRed on the cluster. This eliminates complicated set-up steps on the command line.
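
As a rough illustration of the kind of settings such a configuration file carries, the Python sketch below models them as a small data class with a basic sanity check. The class name `ClusterConfig`, its field names, and the example values are hypothetical and chosen only for this illustration; they do not reproduce Cloudgene's actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterConfig:
    """Hypothetical stand-in for a cluster configuration (not Cloudgene's real format)."""

    instance_type: str   # name of the cloud instance type to launch
    instance_count: int  # number of nodes in the cluster
    open_ports: list = field(default_factory=list)  # firewall rules as open TCP ports

    def validate(self):
        """Return a list of problems; an empty list means the settings look usable."""
        problems = []
        if not self.instance_type:
            problems.append("instance_type must not be empty")
        if self.instance_count < 1:
            problems.append("instance_count must be at least 1")
        for port in self.open_ports:
            if not 1 <= port <= 65535:
                problems.append(f"invalid port: {port}")
        return problems


# Example values are illustrative only.
config = ClusterConfig(instance_type="m1.large", instance_count=4, open_ports=[22, 8080])
print(config.validate())
```

Checking the settings before any instance is started is one way to catch mistakes early, since a misconfigured rented cluster costs money from the moment it launches.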

Cloudgene-MapRed

Cloudgene-MapRed improves the usability of currently available MapReduce programs by providing a web interface for their execution and monitoring. It also provides a standardized way to import and export data (from S3, HTTP, FTP, or file upload). Cloudgene-MapRed supports the execution of Hadoop jar files (written in Java) and the Hadoop Streaming mode (for programs written in other programming languages). Programs can be chained by defining steps in the manifest file, and analyses can be reproduced.
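
To make the idea of chained steps more concrete, the Python sketch below turns a list of step descriptions into `hadoop` command lines, one for a plain jar job and one for a Hadoop Streaming job. The step dictionaries, their keys, and the file names are assumptions made for this illustration and do not reproduce Cloudgene's actual manifest schema; only the general `hadoop jar` invocation and the streaming options `-input`, `-output`, `-mapper`, and `-reducer` come from Hadoop itself.

```python
import shlex

# Hypothetical step descriptions; keys and file names are illustrative only
# and are not taken from Cloudgene's manifest format.
STEPS = [
    {"type": "jar", "jar": "align.jar", "args": ["input/reads", "output/aligned"]},
    {
        "type": "streaming",
        "streaming_jar": "hadoop-streaming.jar",
        "mapper": "count_mapper.py",
        "reducer": "count_reducer.py",
        "input": "output/aligned",
        "output": "output/counts",
    },
]


def build_command(step):
    """Translate one step description into an argument list for the hadoop CLI."""
    if step["type"] == "jar":
        # Java job: hadoop jar <jar> <args...>
        return ["hadoop", "jar", step["jar"], *step["args"]]
    if step["type"] == "streaming":
        # Streaming job: mapper and reducer can be any executable.
        return [
            "hadoop", "jar", step["streaming_jar"],
            "-input", step["input"],
            "-output", step["output"],
            "-mapper", step["mapper"],
            "-reducer", step["reducer"],
        ]
    raise ValueError(f"unknown step type: {step['type']}")


def plan(steps):
    """Return one command per step, in the order the steps are listed."""
    return [build_command(step) for step in steps]


# Print the commands instead of running them, so the sketch needs no cluster.
for command in plan(STEPS):
    print(shlex.join(command))
```

In this sketch the second step reads the output directory of the first, which is what chaining amounts to. A real streaming job would usually also need to ship the mapper and reducer scripts to the cluster, which is omitted here for brevity.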

Reference 

Schoenherr S, Forer L, Weissensteiner H, Specht G, Kronenberg F, Kloss-Brandstaetter A. Cloudgene: A graphical execution platform for MapReduce programs on private and public clouds. BMC Bioinformatics. 2012 Aug 13;13(1):200.


© 2017 Genomics Gateway. All rights reserved.