Second SKAO Science Data Challenge concludes with strong collaboration and innovation
The SKAO Data Challenges are designed to prepare future users to efficiently handle SKAO data, so that it can be exploited to its full potential as soon as the telescopes enter early operations, and to drive the development of data analysis techniques. They also assist the Observatory and its computing partners in preparing the systems and processes needed for the network of SKA Regional Centres (SRCs) which will store, process and provide access to data for astronomers globally.
Forty teams comprising 280 participants in 22 countries took part in SDC2, which kicked off in February this year and lasted for six months. They were supported by eight supercomputing centres around the world, which provided vital storage and processing resources.*
“We have been delighted to see such enthusiasm for the challenge and such a wide geographical spread of participants, which shows the strong engagement from the science and software communities,” says SKAO Postdoctoral Fellow Dr Philippa Hartley, who co-led the challenge. “It’s exciting to see the variety of methods used, and how they compare and complement each other. Thank you to everyone who contributed, both within the teams and at our computing partner facilities, whose generosity made the challenge possible.”
For this challenge, teams were tasked with developing computer algorithms to identify and characterise nearly 250,000 galaxies in a simulated 1TB SKAO data cube. They were scored on two elements: the number of objects found (with a penalty for false positives), and how accurately they measured the objects’ different characteristics, for example their size or brightness. These were combined to give a final score.
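The way the two scoring elements can be combined is easier to see in a small example. The sketch below is illustrative only and is not the official SDC2 scoring code: the function name `challenge_score`, the 0-to-1 accuracy scale and the one-point penalty per false positive are all assumptions made for the purpose of the example.

```python
def challenge_score(match_accuracies, n_false_positives, penalty=1.0):
    """Toy combined score (hypothetical, not the official SDC2 formula).

    match_accuracies: one value per correctly found source, between 0 and 1,
        summarising how well its properties (size, brightness, ...) were measured.
    n_false_positives: number of reported detections with no true counterpart.
    penalty: points subtracted per false positive (assumed value).
    """
    for accuracy in match_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("each accuracy must lie between 0 and 1")
    return sum(match_accuracies) - penalty * n_false_positives


# A cautious catalogue: fewer sources found, but no spurious detections.
cautious = challenge_score([0.9, 0.8, 0.7], n_false_positives=0)

# A greedy catalogue: one extra source found, at the cost of three false positives.
greedy = challenge_score([0.9, 0.8, 0.7, 0.6], n_false_positives=3)
```

Under these assumed weights the cautious catalogue scores higher than the greedy one, which shows why simply reporting more candidates does not pay off when false positives are penalised.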
The MINERVA team at Paris Observatory-PSL and CNRS, with partners at the Canadian Institute for Theoretical Astrophysics and Observatoire astronomique de Strasbourg, achieved the top score using an innovative machine learning approach run on the GENCI-IDRIS computing facility.
The team developed two independent but complementary tools for the analysis, the results of which were then cross-checked with each other. If an object appeared in the results of both approaches it was weighted more highly, helping to reduce the false positive rate.
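The cross-checking idea can be sketched in a few lines. This is a simplified illustration, not the MINERVA team's code: the `Candidate` class, the two-dimensional pixel positions, the matching radius `max_sep` and the `boost` factor are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    x: float           # pixel position (hypothetical 2D simplification)
    y: float
    confidence: float  # detection confidence assigned by one tool


def cross_check(primary, secondary, max_sep=2.0, boost=1.5):
    """Return the primary candidates, up-weighting those that also
    appear within max_sep pixels of any candidate in the secondary list."""
    checked = []
    for cand in primary:
        confirmed = any(
            (cand.x - other.x) ** 2 + (cand.y - other.y) ** 2 <= max_sep ** 2
            for other in secondary
        )
        weight = cand.confidence * boost if confirmed else cand.confidence
        checked.append(Candidate(cand.x, cand.y, weight))
    return checked


tool_a = [Candidate(10.0, 10.0, 0.6), Candidate(50.0, 50.0, 0.6)]
tool_b = [Candidate(11.0, 10.0, 0.7)]
result = cross_check(tool_a, tool_b)
```

Here only the first candidate from `tool_a` has a counterpart in `tool_b`, so only its confidence is raised. A later cut on confidence would then favour detections that both tools agree on, which is how such a scheme can lower the false positive rate.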
“It’s a beautiful example of the kind of joint effort that we are aiming for by coordinating SKA-related activities in France,” says Dr. C. Ferrari, SKA-France Director and Astronomer at Observatoire de la Côte d’Azur (OCA). “Having astronomers, developers, engineers from different research institutes and infrastructures working together will be paramount for the future organisation of SRCs.”
Some teams, including MINERVA and the FORSKA-Sweden team, a close runner-up, used machine learning – specifically deep learning, or neural networks – whereby computers learn to recognise objects after being fed training data, like speech recognition software on a smartphone. Others used or further developed existing software to apply complex filtering algorithms to the data, making sources stand out from the instrumental noise. These methods also demonstrated a high success rate for identifying sources.
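To give a feel for how filtering can make a faint source stand out from noise, the sketch below smooths a one-dimensional spectrum with a running mean and then applies a threshold. It is a deliberately simple stand-in, not any team's actual pipeline: the function names, the artificial alternating noise pattern, the filter width and the threshold are all assumptions chosen to keep the example deterministic.

```python
def boxcar_smooth(values, width):
    """Running mean over `width` consecutive channels (full windows only)."""
    return [
        sum(values[i:i + width]) / width
        for i in range(len(values) - width + 1)
    ]


def detect(values, width, threshold):
    """Return the start indices of windows whose mean exceeds `threshold`."""
    smoothed = boxcar_smooth(values, width)
    return [i for i, v in enumerate(smoothed) if v > threshold]


# Artificial noise alternating between +1 and -1 (an assumption for clarity).
spectrum = [1.0 if ch % 2 == 0 else -1.0 for ch in range(20)]

# A faint source adds 0.8 to channels 8-11: weaker than the noise amplitude.
for ch in range(8, 12):
    spectrum[ch] += 0.8

hits = detect(spectrum, width=4, threshold=0.7)
```

Averaging over four channels cancels the alternating noise while the source signal adds up, so only the window that fully covers channels 8 to 11 passes the threshold. Matching the filter width to the expected source extent is the basic idea behind the more sophisticated filtering approaches.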
“There isn’t necessarily one ‘correct’ approach, as the value of SDC2 is in seeing the variety of techniques deployed by teams,” says Dr Hartley. “By sharing what they have learned, what worked and what didn’t, everyone who took part is helping us to refine our processes going forward. It could well be that we use a combination of complementary methods in the future to analyse SKAO data.”
The sensitivity of the SKAO’s telescopes means they will “see” much more than existing telescopes, and dealing efficiently with large numbers of sources was a key focus of SDC2.
“Teams have told us this has been a hugely valuable experience, not just in dealing with the data, but also in bringing together lots of different groups and specialisms to focus on source finding, something which otherwise wouldn’t necessarily have happened until much later without the challenges,” Dr Hartley adds. “The knowledge and networks that have been built through SDC2 will form the foundation for even stronger collaborations in the future.”
The data set in SDC2 was more than 300 times larger than in 2019’s Science Data Challenge 1, and a more realistic example of what astronomers can expect from SKAO observations. This meant that rather than downloading it to their personal computers (impractical if not impossible given its size), participants were instead given access via the computing partner facilities, ensuring a level playing field regardless of the local download speed, storage or processing facilities of each team.
“The contributions made by the computing facilities, some of which will become SKA Regional Centres (SRCs) in the future, can’t be overstated. We are grateful for their involvement and eager to continue working together as we further develop the design for the SRC network which will be an essential part of the Observatory’s operations,” says SKAO Project Scientist Dr Anna Bonaldi, co-lead of the challenge.
As part of the challenge, and in keeping with its commitment to the principle of Open Science, the SKAO partnered with the Software Sustainability Institute to offer ‘reproducibility awards’. These will be given to teams whose code can be used by others to reproduce the same result, or re-used in part to develop other software. Links to the teams’ code repositories will be included in a paper the SKAO Science team is now working on with SDC2 participants, which will detail and analyse the methods used.
“That deeper analysis will be interesting even beyond our own community, because the nature of SDC2 means the techniques could find applications beyond astronomy, in areas where huge volumes of data need to be analysed efficiently,” Dr Anna Bonaldi says. “It’s really exciting to see where we go from here and how we can pool our knowledge to make the best possible use of SKAO data.”
There are already plans in the works for future challenges, and the SKAO is in talks with several of its Science Working Groups about the areas of science which could be selected next.
* The eight computing facilities which provided resources for SDC2 are:
AusSRC and Pawsey – Perth, Australia
China SRC-proto – Shanghai, China
CSCS – Lugano, Switzerland
ENGAGE SKA-UCLCA – Aveiro and Coimbra, Portugal
GENCI-IDRIS – Orsay, France
IAA-CSIC – Granada, Spain
INAF – Rome, Italy
IRIS (STFC) – UK
Read the press release from the Paris Observatory.
The full results can be found on the Science Data Challenge 2 website.
More information about the eight computing facilities can be found here.
Read more about the challenge and its goals here.