Biocep-R project
Copyright © 2007-2009 Karim Chine

Executive Summary
I - Open Science in the cloud, towards a universal platform for mathematical and statistical computing
      R, the open-source software environment for statistical computing and graphics, is becoming the lingua franca of data analysis. Repositories of contributed R packages covering a variety of problem domains in life sciences, social sciences, finance, econometrics, chemometrics, etc. are growing at an exponential rate. Scilab, the open-source software package for numerical computations, is increasingly used for engineering and scientific applications. The ubiquitous Java technologies allow the building of highly effective platform-independent distributed systems and graphical user interfaces. Free virtualization technologies allow snapshots of operating systems, computing software stacks and data sets to be created, distributed and reused in any environment. Finally, Amazon EC2's simple web services interface lets anyone run computations on demand on Amazon's proven computing environment (public cloud), and the open-source Eucalyptus system makes it possible to mimic those web services on private infrastructures (private cloud). Biocep combines these ingredients and others into a universal open-source computing platform that dramatically enhances the accessibility of mathematical and statistical computing, creates an open environment for the production, sharing and reuse of all the artifacts of computing, and puts unprecedented analytical, numerical and processing capabilities in the hands of everyone (open science).
      With Biocep, R/Scilab computational engines are abstracted with URLs and can run at any location. They can be controlled interactively from the user's laptop, either programmatically, via an extensible, highly productive data analysis workbench, or from highly programmable spreadsheets. The computational engines can be used as clusters on Grids and Clouds to solve computationally intensive problems, to build scalable analytical web applications or to expose functions as web services or nodes for workflow workbenches. They can also be used to distribute numerical/statistical user interfaces created with drag-and-drop tools and can be accessed simultaneously by several users to work with data collaboratively.
    II - A Google docs-like portal for data analysis : towards a user-friendly facade for the ubiquitous cloud
      The Biocep-R software platform makes it possible to use mainstream statistical/scientific computing environments such as R, Scilab, SciPy, Sage and Root as a service in the cloud. The full capabilities of these environments are exposed to the end user from within a simple browser. Users can issue commands, install and use new packages, generate and interact with graphics, upload and process files, download results, etc. using high-capacity virtual machines that they start and stop on demand. The full computational environment and the data can be snapshotted at any time, shared and reused. Spreadsheets running in the cloud and fully integrated with the computing environments' functions and data can be mirrored to web browsers and to Excel. The platform takes the computing engine to the data and allows many collaborators to access and analyze that data together using collaborative consoles, editors, spreadsheets and annotatable graphics. The platform helps perform elastic distributed computing with any number of virtual machines to solve computationally heavy problems, or deploy highly scalable computational backends for analytical applications and workflows. Finally, the platform makes it easy to drag and drop visual components and create user interfaces and dashboards that use advanced statistical/numerical models running on cloud machines. Those interfaces can be easily delivered to the end user with simple URLs. Elastic-R is a new portal built using the Biocep-R platform. It enables anyone to use AWS resources seamlessly, to work with R, Scilab, etc. within the browser and to collaborate, share and reuse data, functions, algorithms, user interfaces, and servers. It aims to become the "Google docs" of data analysis.

    Articles / Citations

    • Karim Chine, "Biocep, Towards a Federative, Collaborative, User-Centric, Grid-Enabled and Cloud-Ready Computational Open Platform", 2008 Fourth IEEE International Conference on eScience, pp. 321-322, 2008
      pdf


    • Karim Chine, "Scientific Computing Environments in the age of virtualization, toward a universal platform for the Cloud", 2009 IEEE International Workshop on Open Source Software for Scientific Computation (OSSC), pp. 44-48, 2009
      pdf - doc


    • Karim Chine, "Open Science in the Cloud: Towards a Universal Platform for Scientific and Statistical Computing", Chapter 19 in "Handbook of Cloud Computing", Springer, 2010 (in press)
      pdf - docx

    • Karim Chine, "Learning Math and statistics on the cloud: an EC2-based Google-docs-like portal for teaching/learning collaboratively with R and Scilab", 2010 10th IEEE International Conference on Advanced Learning Technologies
      pdf - doc


    Presentation Slides

    • Bio-IT World 2010 slides (Elastic-R) : pptx pdf
    • Bio-IT World 2009 slides : ppt
    • UseR 2009 slides: pdf  ppt

    Biocep-R within the Technology Environment
    Open Platform Diagram
    Biocep-R Computational Open Platform Ecosystem
    Open Platform Diagram
    Distributed Computing in the Cloud
    Open Platform Diagram


    Author

      Karim Chine
      CV

    Talks, Tutorials, Conferences


    License

      This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

    Project Source Code

    • the public svn link (anonymous access) : svn://svn.r-forge.r-project.org/svnroot/biocep
    • Project summary page on R-Forge: here

    They talk about Biocep-R

    • The Biocep R Project Brings Open Science to the Cloud - ReadWriteCloud - here
    • Interview with the author (e-taalim) here
    • Interview with the author (Tekiano) here
    • Tekiano article here
    • Interview with the author (Decisionstats) here
    • Hans Gilde's weblog here
    • CRAN Task Views - High Performance Computing here
    • State-of-the-art in Parallel Computing with R, Technical Report Number 47, 2009, Department of Statistics, University of Munich here
    • BusinessWeek Cloud Computing Ad Section here
    • DecisionStats blog here
    • R & BioConductor Manual here
    • Bitlab Wiki here
    • Enabling reproducible research: licensing for scientific innovation here
    • e-Taalim, Cloud computing for science and education here
    • e-Taalim, Cloud computing: Recherche scientifique, éducation et solidarité numérique here


    Project Deliverables & Howtos (Last update: June 1st, 15h25)

    For Windows :

    • R Workbench without R, without plugins and without extensions here (install R first, and make it accessible from your command line by adding its binary's location to your system PATH)
    • R Workbench without R, with plugins (EC2/S3 monitors + examples) and with extensions (OpenOffice-based file converter) here
    • R Workbench with R (2.8.0), with plugins (EC2/S3 monitors + examples) and with extensions (OpenOffice-based file converter) here
    • readme

    For all Operating Systems :

    • Prerequisites:

      • Java 5 or later (JRE) installed
        to run the workbench and connect to R servers on remote hosts

      • Java 5 or later (JDK) and R >= 2.5 installed and accessible from the command line
        to run R servers from the command line or via the workbench
        to generate Java mappings for S4 classes and R functions
        to generate Web Services exposing R functions
        to run the miniature R virtualization, the R-SOAP and the generated Web Services Web Applications

    • The Virtual R Workbench


      • As a Java Web Start application (recommended) : here
        create and connect to an R server on your machine : choose "Create New R Server" , choose "On My Machine", OK

      • As a desktop application :
        download biocep.jar : here
        from your command line: java -jar biocep.jar
        create and connect to an R server on your machine : choose "Create New R Server", choose "On My Machine", OK
      • As an applet : here
        create and connect to an R server on your machine : choose "Create New R Server" , choose "On My Machine", OK

    • The Biocep Core

      • download it here

      • run an R server
        from your command line: rmiregistry & (on windows: start rmiregistry)
        from your command line: java -Dname=toto -jar biocep-core.jar (replace toto with any other name, repeat the command with different server names to run several R servers)
        connect to the server : open the workbench, choose "Connect to R via RMI", choose "Use:" → "Rmi Registry", click on "Refresh", choose your R Server name (toto, ..), OK

      • jython scripting with an R server
        example (connect to an existing R server)

      • groovy scripting with an R server
        example (create an R server)

      • create and use R Server from your own java web application : use the biocep core for tomcat available here

      • run the miniature R virtualization and the R-SOAP web applications (automatic download)
        java -Dport=8080 -cp biocep-core.jar HttpServer
        connect to the virtualization server : open the workbench, choose "Connect to R via Http", keep the default value for "Url", OK
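
      The command lines in this section can also be assembled from a script, which is convenient when several named R servers are needed. The following is a minimal Python sketch, not part of Biocep: the helper names (registry_command, r_server_command, http_server_command, launch) are hypothetical, and only the rmiregistry and java command lines themselves come from the steps above. It assumes that java, rmiregistry and biocep-core.jar are reachable from the current directory.

```python
import subprocess


def registry_command():
    """Command that starts the RMI registry used by the R servers."""
    return ["rmiregistry"]


def r_server_command(name, core_jar="biocep-core.jar"):
    """Command that runs one R server registered under the given name."""
    return ["java", f"-Dname={name}", "-jar", core_jar]


def http_server_command(port=8080, core_jar="biocep-core.jar", wars=()):
    """Command that runs the embedded HTTP server, optionally with war files."""
    return ["java", f"-Dport={port}", "-cp", core_jar, "HttpServer", *wars]


def launch(commands, dry_run=True):
    """Print every command; start it in the background only if dry_run is False.

    Returns the list of started subprocess.Popen objects (empty in a dry run).
    """
    started = []
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            started.append(subprocess.Popen(command))
    return started


# Dry run: shows the registry command followed by two R servers.
launch([registry_command()] + [r_server_command(n) for n in ("toto", "titi")])
```

      With dry_run=False the same call would start the registry and both servers as background processes. A real script would probably wait for the registry to come up before starting the servers.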


    • The Miniature R Virtualization Web Application

      • download it here

        run it via the biocep embedded jetty server : java -Dport=8080 -cp biocep-core.jar HttpServer rvirtual.war or deploy it to tomcat ?
        connect to the virtualization server :
        • open the R workbench
        • choose "Connect to R via Http"
        • keep the default value for "Url"
        • keep "Private R" checked
        • keep "Private R Name" empty or enter a name of your choice for your R Server if you would like to keep it alive after logging off (for future reconnections, reenter the same server name)
        • OK


        connect to the virtualization server from Java : example requires biocep-core.jar

    • The R-SOAP Web Application

      • download it here

        run it via the biocep embedded jetty server: java -Dport=8080 -cp biocep-core.jar HttpServer rvirtual.war rws.war or deploy it to tomcat ?
        (needs the miniature R virtualization web application to run)
        get the R-SOAP WSDL : open the following URL : http://127.0.0.1:8080/rws/rGlobalEnvFunction?wsdl
        use the URL to generate a Web Service Client for R-SOAP, use R-SOAP from java: example, R-SOAP java client eclipse project
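
        Before generating a client, it can help to check which operations the WSDL actually declares. The sketch below uses only the Python standard library and assumes that the service publishes a WSDL 1.1 document (namespace http://schemas.xmlsoap.org/wsdl/); that is an assumption, not something this page states. The function names are hypothetical, and the default URL is the one given above.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the service describes itself with WSDL 1.1.
WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"


def operation_names(wsdl_text):
    """Return the operation names declared in the portType elements of a WSDL."""
    root = ET.fromstring(wsdl_text)
    names = []
    for port_type in root.iter(WSDL_NS + "portType"):
        for operation in port_type.findall(WSDL_NS + "operation"):
            name = operation.get("name")
            if name and name not in names:
                names.append(name)
    return names


def fetch_operation_names(url="http://127.0.0.1:8080/rws/rGlobalEnvFunction?wsdl"):
    """Download the WSDL from a running R-SOAP application and list its operations."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return operation_names(response.read())
```

        fetch_operation_names() only works while the R-SOAP web application is running locally on port 8080; operation_names() can be tried on any saved WSDL text.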

    • The Biocep Tools

      • download it here

      • generate stateful and stateless Web Services for R functions
        download (save as) globals.r and rjmap.xml
        add to globals.r your functions definitions and their dependencies (library..)
        in globals.r, add TypeInfos to your functions ?
        add to rjmap.xml "function" tags with your functions names under "rj"/"publish"/"functions"
        java -Dfile=rjmap.xml -Dwarname=MyWebService -jar biocep-tools.jar
        run the generated web services web application (distrib/MyWebService.war) and the miniature virtualization web application:
        java -Dport=8080 -cp biocep-core.jar HttpServer rvirtual.war distrib/MyWebService.war or deploy them to tomcat ?
        use the following url (wsdl) : http://127.0.0.1:8080/MyWebService/rGlobalEnvFunction?wsdl to generate a web service client using your published functions
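
        The steps above tie three names to the -Dwarname value: the generated war file, the web application path and the WSDL URL. The Python sketch below makes that relationship explicit. The helper names are hypothetical, and the generalization from the single MyWebService example to other names is an assumption.

```python
def tools_command(map_file="rjmap.xml", war_name="MyWebService",
                  tools_jar="biocep-tools.jar"):
    """Command that generates the web services web application."""
    return ["java", f"-Dfile={map_file}", f"-Dwarname={war_name}", "-jar", tools_jar]


def generated_war(war_name="MyWebService"):
    """Path of the generated war, following the distrib/MyWebService.war example."""
    return f"distrib/{war_name}.war"


def server_command(war_name="MyWebService", port=8080, core_jar="biocep-core.jar"):
    """Command that serves the virtualization war together with the generated war."""
    return ["java", f"-Dport={port}", "-cp", core_jar, "HttpServer",
            "rvirtual.war", generated_war(war_name)]


def wsdl_url(war_name="MyWebService", host="127.0.0.1", port=8080):
    """WSDL URL of the generated service (assumed to follow the war name)."""
    return f"http://{host}:{port}/{war_name}/rGlobalEnvFunction?wsdl"


print(" ".join(tools_command()))
print(" ".join(server_command()))
print(wsdl_url())
```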


    • Biocep Plugins

      • A simple example of a plugin created with the Netbeans GUI designer
        download the SimplePlugin.jar here
        run the workbench as a desktop application: java -jar biocep.jar
        create and connect to an R server on your machine : choose "Create New R Server", choose "On My Machine", OK
        go to the menu "Plugins" / "Open Plugin View From Jar File" → "Choose jar" → pick SimplePlugin.jar → OK
        in the new View, set a value for n and click "Submit"; the SVG panels are resizable

      • A plugin embedding xulrunner. It enables the use of Firefox (browser), Elasticfox (EC2 monitor) and S3fox (S3 monitor) as views of the workbench
        download the mozillabrowser.zip here
        run the workbench as a desktop application: java -jar biocep.jar
        create and connect to an R server on your machine : choose "Create New R Server", choose "On My Machine", OK
        go to the menu "Plugins" / "Install Plugin from Zip File" → choose the mozillabrowser.zip file → OK
        open Elasticfox (EC2 monitor) : go to the menu "Plugins" / mozillabrowser / Elasticfox - EC2 monitor
        open S3fox (S3 monitor) : go to the menu "Plugins" / mozillabrowser / S3fox - S3 monitor

      • A Netbeans project for creating plugins visually and distributing them via simple URLs
        download and unzip BiocepPluginsStudio.zip here
        Open the project with Netbeans and edit MyDashboard (source editing or visual editing)
        Press F11 to build the plugin
        1 Open the workbench, then open Plugins / ISMB / MyDashboard
        2 Use the first displayed URL to distribute an EC2-based version of your new view
        3 Use the second displayed URL to distribute a version of your new view that transparently creates an R engine on the user's machine and uses it



    Biocep-R on Amazon's Cloud

    • Getting Started with Amazon EC2

      • Sign up for Amazon EC2 here
      • Install Elasticfox, the Mozilla Firefox extension for interacting with Amazon EC2 from here
      • Learn how to use Elasticfox to connect to your EC2 account, browse available AMIs (Amazon Machine Images) and run AMIs from here
      • A few issues, such as converting keys in order to ssh into the virtual machine instances, are covered in the EC2 getting started documentation here

      • The Workbench plugin mozillabrowser embeds Firefox, Elasticfox and S3fox as views of the Workbench
        Unzip the plugin under ~/RWorkbench/plugins (for Windows: %UserProfile%\RWorkbench\plugins)
        Start the workbench and choose the menu "Plugins" / "Mozilla browser" / "Elasticfox - EC2 Monitor" to run the standalone Elasticfox
    • Start the Biocep-R AMI ami-cd5fb9a4 : Ubuntu 9.04 Jaunty Jackalope / R version 2.9.0 / Scilab 5.1.0 / Java version 1.6.0

      • find ami-cd5fb9a4 (select region "us-east-1", search with AMI id or with the keyword "biocep", the AMI manifest is : biocep-ubuntu904-r290-j160-sci510-cologne/biocepimage.manifest.xml )
      • Create a key pair if you don't have one already
      • Create a security group with one port of your choice {my_port} open: add a permission for a TCP/IP port {my_port} open to the network 0.0.0.0/0
      • Run ami-cd5fb9a4, choose your key pair and your security group, and insert the following into the user data field
        start=true
        port={my_port}
        login={my_login}
        pwd={my_pwd}
        email={my_email}
        workers={nbr_workers}
      • when the ami starts running, you receive an email with the URL to use to connect the Workbench to the ami
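
        The user data block can be generated rather than typed by hand, which avoids mistakes in the key names. The Python sketch below is only an illustration: the function name biocep_user_data and the sample values in the usage line are hypothetical, and the six keys and their order are copied from the template above. The port check only verifies that the value is a valid TCP port number. It does not check that the port matches the one opened in the security group.

```python
def biocep_user_data(port, login, pwd, email, workers):
    """Build the key=value text to paste into the EC2 user data field."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"not a valid TCP port: {port}")
    fields = [
        ("start", "true"),        # as in the template above
        ("port", port),           # must be the port opened in the security group
        ("login", login),
        ("pwd", pwd),
        ("email", email),         # the connection URL is sent to this address
        ("workers", int(workers)),
    ]
    return "\n".join(f"{key}={value}" for key, value in fields)


# Hypothetical sample values, for illustration only.
print(biocep_user_data(8080, "alice", "secret", "alice@example.org", 2))
```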






    Biocep-R on Virtual Appliances

    • Download and install the VMware player from here. On Mac, use VMware Fusion.
    • Download and unzip the VMware image (R+Scilab+Biocep) from here.
    • Double-click on Ubuntu-server-9.04-i386.vmx (file under the folder "ubuntu-r-scilab-biocep") to run the virtual machine. When asked whether you moved the image or copied it, answer "I moved it".
    • The machine displays "Host IP :" followed by its IP address.


    R Virtualization

    R Virtualization Diagram

    R Servers Pool - Deployment

    R Servers Pool - Deployment Diagram

    R Servers Pool - Architecture

    R Servers Pool - Architecture Diagram

    R Virtualization on an LSF Cluster

    R Virtualization on an LSF Cluster Diagram

    Biocep on the National Grid Service

    Biocep on the National Grid Service Diagram

    Scripting with R

    Scripting

    Web Services Generation

    Web Services Generation Diagram

    Workflows with Generated Stateful Web Services

    Workflows with Stateful Web Services Diagram

    Workbench Plugins

    Workbench Plugins  Diagram

    Collaborative R

    Collaborative R  Diagram

    Standard R Objects Mapping Class Diagram

    Scripting

    Generated Mapping for S4 ExpressionSet Class Diagram

    Scripting

    Acknowledgements

      ACS: Madi Nassiri
      Amazon: Simone Brunozzi, Deepak Singh
      AT&T Research Labs: Simon Urbanek
      ATUGE: Imen Essafi, Béchir Tourki, Ilyes Gouja, Hatem Hachicha, Amine Elleuch
      Banca d'Italia: Giuseppe Bruno
      Bio-IT World: Kevin Davies
      Cambridge Healthtech Institute: Cindy Crowninshield
      City University of New York: Mario Morales, Makram Talih
      Columbia University: Omar Besbes
      Dataspora: Michael E. Driscoll
      EBI: Alvis Brazma, Wolfgang Huber, Kimmo Kallio, Misha Kapushesky, Michael Kleen, Alberto Labarga, Philippe Rocca-Serra, Ugis Sarkans, Kirsten Williams, Eamonn Maguire
      EPFL: Darlene Goldstein
      Esprit: Farouk Kamoun, Tahar Ben Lakhdar
      ETH Zürich: Yohan Chalabi, Diethelm Würtz, Martin Mächler
      e-Taalim.com: Nadhir Douma
      FHCRC: Martin Morgan, Seth Falcon, Nianhua Li
      FVG LLC: Lisa Wood
      Google: Olivier Bosquet
      Harvard Business School: Ousseynou Nakoulima
      Harvard University: Tim Clark, Sudeshna Das, Douglas Burke, Paolo Ciccarese
      IBM: Jean-Louis Bernaudin, Pascal Sempe, Loic Simon, Lea A Deleris, Alex Fleischer, Alain Chabrier
      Imperial College London: Asif Akram, Vasa Curcin, John Darlington, Brian Fuchs
      Indiana University: Michael Grobe
      INRIA: David Monteau
      JISC: David Flanders
      Johnson & Johnson - Janssen Pharmaceutica: Patrick Marichal
      Lancaster University: Robert Crouchley, Daniel Grose
      Leibniz Universität Hannover: Kornelius Rohmeier
      Limagrain: Zivan Karaman
      Mekentosj: Alexander Griekspoor
      Microsoft: Eric Le Marois, Tony Hey
      Mubadala: Ghazi Ben Amor
      Nature Publishing Group: Ian Mulvany, Steve Scott
      NCeSS: Peter Halfpenny, Rob Procter, Marzieh Asgari-Targhi, Alex Voss, YuWei Lin, Mercedes Argüello Casteleiro, Wei Jie, Meik Poschen, Katy Middlebrough, Pascal Ekin, June Finch, Farzana Latif, Elisa Pieri, Frank O'Donnell, Kenny Baird
      New York Java User Group: Frank D Greco
      OeRC: Dimitrina Spencer, Matteo Turilli, David Wallom, Steven Young
      OMII-UK: Neil Chue Hong, Steve Brewer
      OpenAnalytics: Tobias Verbeke
      Oracle: Dominique van Deth, Andrew Bond
      OSS Watch: Ross Gardler
      Platform Computing: Christopher Smith
      San Diego Supercomputer Center: Nancy R. Wilkins-Diehr
      Sanger Institute: Daniel Jeffares, Matt Wood, Phil Butcher
      Shell: Wayne.W.Jones, Nigel Smith
      Stanford University: John Chambers, Balasubramanian Narasimhan, Gunter Walther
      SYSTEM@TIC: Karim Azoum
      Technische Universität Dortmund: Uwe Ligges, Bernd Bischl
      The Generations Network: Jim Porzak
      Tunisian Ministry of Communication Technologies: Naceur Ammar, Lamia Chaffai-Sghaier, Mohamed Saïd Ouerghi
      Tunisian Ecole Polytechnique: Riadh Robbana
      UC Berkeley: Noureddine El Karoui, Terry Speed
      UC Davis: Rudy Beran, Debashis Paul, Duncan Temple Lang
      UCLA: Ivo Dinov
      UCSF: Tena Sakai
      Université Catholique de Louvain: Christian Ritter
      University of Cambridge: Ian Roberts, Robert MacInnis, Peter Murray-Rust, Jim Downing
      University of Manchester: Carole Goble, Len Gill, Simon Peters, Richard D Pearson, Iain Buchan, John Ainsworth
      University of Plymouth: Paul Hewson
      University of Split: Ivica Puljak
      UTK: Ajay Ohri
      Wirtschaftsuniversität Wien: Stefan Theussl
      World Bank Group-IFC: Oualid Ammar
      Yahoo: Laurent Mirguet, Rob Weltman
      Independent: Charles Dallas, Romain François, Manfred Duchrow, Joerg Mueller, Slava Pestov