Calibration of LOFAR data on the cloud

Publication date: 
Main author: 
Sabater, J.
IAA authors: 
Sánchez-Expósito, S.;Garrido, J.;Verdes-Montenegro, L.
Sabater, J.;Sánchez-Expósito, S.;Best, P.;Garrido, J.;Verdes-Montenegro, L.;Lezzi, D.
Astronomy and Computing
Publication type: 

New scientific instruments are starting to generate an unprecedented amount of data. The Low Frequency Array (LOFAR), one of the Square Kilometre Array (SKA) pathfinders, is already producing data on a petabyte scale. The calibration of these data presents a huge challenge for final users: (a) extensive storage and computing resources are required; (b) the installation and maintenance of the software required for the processing is not trivial; and (c) the requirements of calibration pipelines, which are experimental and under development, are quickly evolving. After encountering some limitations in classical infrastructures like dedicated clusters, we investigated the viability of cloud infrastructures as a solution. We found that the installation and operation of LOFAR data calibration pipelines is not only possible, but can also be efficient in cloud infrastructures. The main advantages were: (1) the ease of software installation and maintenance, and the availability of standard APIs and tools, widely used in the industry; this reduces the requirement for significant manual intervention, which can have a highly negative impact in some infrastructures; (2) the flexibility to adapt the infrastructure to the needs of the problem, especially as those demands change over time; (3) the on-demand consumption of (shared) resources. We found that a critical factor (also in other infrastructures) is the availability of scratch storage areas of an appropriate size. We found no significant impediments associated with the speed of data transfer, the use of virtualization, the use of external block storage, or the memory available (provided a minimum threshold is reached). Finally, we considered the cost-effectiveness of a commercial cloud like Amazon Web Services. While a cloud solution is more expensive than the operation of a large, fully-utilized cluster completely dedicated to LOFAR data reduction, we found that its costs are competitive if the number of datasets to be analysed is not high, or if the costs of maintaining a system capable of calibrating LOFAR data become high. Coupled with the advantages discussed above, this suggests that a cloud infrastructure may be favourable for many users.

ADS Bibcode: