Enabling scientific applications on hybrid e-Infrastructures: the FutureGateway framework
Date:
Thursday, September 29, 2016 - 11:30
Overview:
Cloud computing has evolved quickly in the last few years in both the public and private sectors. However, there are numerous areas of interest to scientific communities where cloud computing uptake is still lacking because cloud services are poorly integrated with scientific applications and workflows. In this context, INDIGO-DataCloud [1] (INtegrating Distributed data Infrastructures for Global ExplOitation - www.indigo-datacloud.eu), a project funded under the Horizon 2020 framework programme of the European Union, aims at developing a data and computing platform targeted at scientific communities, deployable on multiple hardware platforms, and provisioned over hybrid e-Infrastructures.
The INDIGO-DataCloud platform features contributions from leading European distributed resource providers, developers, and users from various Virtual Research Communities (VRCs). It is based on open source solutions addressing scientific challenges in Grid, Cloud and HPC/local infrastructures and, in the case of Cloud platforms, providing IaaS, PaaS and SaaS solutions. SaaS solutions are exposed to end users through Science Gateways and mobile and desktop applications. INDIGO adopts the FutureGateway (FG) [2] framework, which provides both a presentation layer and a back-end API service.
The idea behind the FG is to simplify the development of user interfaces (UIs) and portals by providing a set of components that manage the application lifecycle, together with portal components, developed for the Liferay application framework, that provide the basic functionalities of a Science Gateway. Additionally, Liferay portlets developed for specific applications are made available so that they can be integrated into any Liferay-based Science Gateway.
The API service is the main component and is responsible for interacting, on behalf of the user, with many distributed computing infrastructures using different protocols. To support these protocols, the API service makes use of jSAGA, a Java implementation of the SAGA standard, and its adaptor-based architecture. Adaptors are currently available for all the major grid infrastructures, based on middleware such as gLite and UNICORE, and for newer cloud infrastructures supporting the standard OCCI protocol. In the context of the INDIGO project these adaptors have been extended to include the TOSCA specification for describing application architectures. Additionally, the SAGA model has been extended with a new “resource model” that better represents the activities performed in a cloud.
The API service exposes a small set of REST APIs [3] for defining remote infrastructures, applications and tasks. Portal components and mobile applications can access the API to create tasks. Each task is associated with an application, which can represent either a remote cloud resource the user can access or an application running on a remote resource. The lifecycle of the task is managed entirely by the API service, so user-level code only has to check the status of the task and retrieve the output when it is available.
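Seen from the client side, the task lifecycle described above can be sketched as follows. This is a minimal illustration, not the FG client: the `/tasks` paths, the `application`, `description`, `id` and `status` field names, and the set of final states are assumptions made for the example (the actual resource layout is documented in [3]), and the `send` callable stands in for whatever HTTP layer a portal or mobile application uses.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: (method, path, json_body) -> decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]

# Assumed names for the states in which a task will no longer change.
FINAL_STATES = {"DONE", "ABORTED"}


def create_task(send: Transport, application_id: str, description: str) -> str:
    """Ask the API service to create a task for an application; return its id."""
    reply = send("POST", "/tasks",
                 {"application": application_id, "description": description})
    return str(reply["id"])


def wait_for_task(send: Transport, task_id: str, poll_seconds: float = 5.0,
                  max_polls: int = 120,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll the task until it reaches a final state, then return its record."""
    for _ in range(max_polls):
        task = send("GET", f"/tasks/{task_id}", None)
        if task["status"] in FINAL_STATES:
            return task
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")
```

The client never drives the execution itself: it creates the task once and then only reads its status, which mirrors the division of work between user-level code and the API service.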
Authentication and authorisation in the API service are based on tokens. These are released by the INDIGO IAM service using the OpenID Connect protocol. User authentication has to be performed through a portal, and the resulting access token can then be used to interact with the API service. The API service validates the token through the portal that performed the authentication, so the portal has to provide a REST API for token validation and authorisation. This approach makes it possible to connect the API service to portals with different authn/authz mechanisms without changing its code. The FG includes a Liferay component that implements authentication and authorisation with IAM, and specifications are provided for implementing the same functionality with other portal technologies. Mobile applications can access a portal to retrieve the token. Having a portal is therefore mandatory, although it can be limited to a few functionalities.
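The delegation of token checks to the portal can be sketched as below. This is an illustrative sketch only: the `check_with_portal` callable, the shape of its reply (`valid`, `user`) and the helper names are assumptions, not the FG or IAM interfaces. The only standard element is the `Authorization: Bearer <token>` header conventionally used to carry OAuth 2.0 access tokens.

```python
from typing import Callable, Mapping, Optional

# Hypothetical wrapper around the portal's validation endpoint:
# it takes an access token and returns the portal's decoded reply.
PortalCheck = Callable[[str], Mapping[str, object]]


def bearer_header(token: str) -> dict:
    """Client side: attach the access token obtained through the portal."""
    return {"Authorization": f"Bearer {token}"}


def extract_bearer(headers: Mapping[str, str]) -> Optional[str]:
    """Service side: pull the token out of the Authorization header."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def authorise(headers: Mapping[str, str],
              check_with_portal: PortalCheck) -> Optional[str]:
    """Return the user name if the portal accepts the token, else None."""
    token = extract_bearer(headers)
    if token is None:
        return None
    reply = check_with_portal(token)  # the portal decides, not the service
    if reply.get("valid") is True:
        return str(reply.get("user"))
    return None
```

Because the service only calls `check_with_portal`, swapping in a portal with a different authentication mechanism means supplying a different callable, while `authorise` itself stays unchanged.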
The FG has been tested with several use cases selected by the project from the final users’ perspective: climate change, molecular dynamics of proteins, and bioinformatics analysis. In more detail, the case study on climate model intercomparison data analysis relates to the climate change domain and community (European Network for Earth System modelling - ENES [4]). It is directly connected to the Coupled Model Intercomparison Project (CMIP), one of the largest and most internationally relevant climate experiments, as well as to the Earth System Grid Federation (ESGF) [5] infrastructure in terms of existing ecosystem and services. The test case demonstrates the INDIGO capability to distribute the software framework on heterogeneous infrastructures (e.g., HPC clusters and cloud environments) and to integrate them in a workflow using Kepler [6]. This integration makes it possible to run many parallel data analyses on a distributed infrastructure, using the Ophidia framework at the single-site level, where scientific data analytics workflows consisting of tens or hundreds of data processing, analysis, and visualization operators are executed. The end-user interface developed on the FG provides specific, advanced support for data analytics and visualization.
In the molecular dynamics of proteins use case, computer simulation provides a full atomistic view of motions throughout all regions of the macromolecules [7]. The FG API service has been connected to an existing portal that does not use the Liferay application framework, in order to exploit Cloud solutions from the INDIGO PaaS for MD simulations using protocolized methods in VMs. The existing web interfaces make it possible to set up and analyze such simulations executed on INDIGO resources.
Finally, the bioinformatics analysis use case revolves around Galaxy [8], an open-source workflow management platform that integrates command-line tools into an easy-to-use web-based environment. Galaxy capabilities can be extended by adding new tools from public repositories or custom bioinformatics tools developed by users, adapting each instance to the user's needs. INDIGO-DataCloud is developing a Galaxy instance provider that allows each virtual instance to be fully customized through a user-friendly web interface. The components required to set up Galaxy instances automatically are deployed using the INDIGO orchestration service, which is based on the TOSCA orchestration language and is compatible with both OpenNebula and OpenStack. User data access rights will be controlled by the OneData component. A web front-end is also under development to provide user-friendly access to the service, making it easy to configure and launch each Galaxy instance.
The initial tests demonstrate easy integration with already established scientific applications and their communities, which were able to extend their workflows to the cloud with relatively few changes to existing front-ends.
In this contribution we will present the FutureGateway architecture and current implementation and we will discuss the use cases supported so far. Future plans for the FG and the next steps within INDIGO-DataCloud will also be presented.
References
[1] https://www.indigo-datacloud.eu
[2] https://www.indigo-datacloud.eu/documents/software-architecture-and-work...
[3] http://docs.fgapis.apiary.io
[4] European Network for Earth System modelling https://verc.enes.org/community/about-enes
[5] Earth System Grid Federation http://esgf.llnl.gov
[6] Marcin Płóciennik, Tomasz Żok, Ilkay Altintas, Jianwu Wang, Daniel Crawl, David Abramson, Frederic Imbeaux, Bernard Guillerminet, Marcos Lopez Caniego, Isabel Campos Plasencia, Wojciech Pych, Pawel Ciecieląg, Bartek Palak, Michał Owsiak, and Yann Frauel. 2013. Approaches to Distributed Execution of Scientific Workflows in Kepler. Fundam. Inf. 128, 3 (July 2013), 281-302.
[7] https://www.structuralbiology.eu/
[8] https://galaxyproject.org
Target Audience:
The target audience includes people involved in the development of Science Gateways or, more generally, web portals and mobile applications for communities that need to access remote e-Infrastructures.
The talk is also relevant to application developers who want to execute their applications on remote e-Infrastructures.
Benefits for Audience:
The audience will learn how the FutureGateway framework can enable scientific applications on e-Infrastructures and provide the related communities with user-friendly tools to control them.
The presentation will use some specific use cases as examples, but the approach can easily be adapted to many others, so it may interest users from all scientific communities.
Application developers can learn how to integrate their applications with the FutureGateway to execute computations or create resources on remote e-Infrastructures through a small set of REST APIs.
Topic 1: Challenges facing users and service providers
Presenter | Organisation |
Marco Fargetta | INFN |