Handling a deluge of big data

Due to the sheer volume of data it will have to transport, process, store and distribute to its end users around the globe, the SKA project is considered by many the ultimate Big Data challenge. Learn below about the different parts of the data pipeline.

An average of 8 terabits per second of data will be transferred over hundreds of kilometres from the SKA-Low telescope in the Murchison outback in Australia to the processing facility in Perth. For the SKA-Mid telescope in South Africa, the design is similar, but the data rates are higher: the transfer rate from the telescope in the Karoo desert to the processing facility in Cape Town is around 20 terabits per second. This is approximately 1,000 times the equivalent data rate generated by the Atacama Large Millimeter/submillimeter Array (ALMA), the current state-of-the-art astronomy facility located in the Chilean Andes, and 100,000 times the projected global average home broadband speed for 2022! (Source: Cisco).
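
To put these rates in more familiar units, the short sketch below converts them into a daily data volume. It assumes decimal prefixes (1 terabit = 10^12 bits, 1 petabyte = 10^15 bytes) and that the quoted average rate is sustained around the clock; the function and variable names are illustrative only.

```python
# Back-of-the-envelope conversion of the quoted transfer rates.
# Assumptions: decimal SI prefixes, and the quoted average rate is
# sustained for a full 24-hour day.

BITS_PER_BYTE = 8
SECONDS_PER_DAY = 24 * 60 * 60


def petabytes_per_day(rate_terabits_per_second: float) -> float:
    """Convert a rate in terabits per second to petabytes per day."""
    bytes_per_second = rate_terabits_per_second * 1e12 / BITS_PER_BYTE
    return bytes_per_second * SECONDS_PER_DAY / 1e15


# Average rates quoted in the text above, in terabits per second.
quoted_rates_tbps = {"SKA-Low": 8.0, "SKA-Mid": 20.0}

daily_volume_pb = {
    name: petabytes_per_day(rate) for name, rate in quoted_rates_tbps.items()
}

for name, volume in daily_volume_pb.items():
    print(f"{name}: about {volume:.1f} petabytes per day")
```

Under these assumptions, SKA-Low would carry roughly 86 petabytes per day and SKA-Mid roughly 216 petabytes per day.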

Because signals from space reach each antenna at a slightly different time, the signals must first be aligned. This is done using highly precise atomic clocks that timestamp the arrival of each signal. The signals are then processed in one of two ways:

  • They can be stacked. This allows the detection of transient objects like pulsars, gamma-ray bursts and fast radio bursts. It also allows the arrival time of known signals to be measured very precisely: if a small time delay is detected, astronomers can infer that a gravitational wave passed through the space between us and the object. This is called time-domain astronomy.
  • They can be multiplied. This allows the creation of images and is called image-domain astronomy.
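
The toy sketch below illustrates the idea of aligning two antenna streams and then either stacking or multiplying them. It is a simplification under stated assumptions, not the SKA signal chain: the sample values are made up, the delay is assumed to be a known whole number of samples, and the function names `align`, `stack` and `multiply` are hypothetical.

```python
# Toy illustration of "align, then stack or multiply" for two antennas.
# All names and sample values here are invented for illustration.


def align(samples, delay):
    """Drop the first `delay` samples so the stream starts at the common time."""
    return samples[delay:]


def stack(streams):
    """Add the aligned streams sample by sample (time-domain processing)."""
    length = min(len(stream) for stream in streams)
    return [sum(stream[i] for stream in streams) for i in range(length)]


def multiply(stream_a, stream_b):
    """Multiply two streams sample by sample and average the products."""
    length = min(len(stream_a), len(stream_b))
    return sum(stream_a[i] * stream_b[i] for i in range(length)) / length


# A short pulse from the sky, as it would be recorded with no delay.
source = [0, 0, 1, 3, 1, 0, 0, 0]

# Antenna B receives the same pulse two samples later than antenna A.
antenna_a = source + [0, 0]
antenna_b = [0, 0] + source

# The timestamps tell us the relative delay, so it can be removed.
aligned_a = align(antenna_a, 0)
aligned_b = align(antenna_b, 2)

stacked = stack([aligned_a, aligned_b])
correlated = multiply(aligned_a, aligned_b)
uncorrected = multiply(antenna_a, antenna_b)

print("stacked:", stacked)
print("multiplied, aligned:", correlated)
print("multiplied, not aligned:", uncorrected)
```

In this example the stacked pulse is twice as strong as in a single antenna, and the multiplied output is much larger when the streams are aligned than when they are not, which is why the alignment step comes first.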

Data is then transferred to two high-performance supercomputers called Science Data Processors (SDPs). To process this enormous volume of data, the two SDP supercomputers will each have a processing speed of ~135 PFlops, which would have placed them among the three fastest supercomputers on Earth in 2020 (Source: Top500; June 2020).

In total, the SKAO will archive 300 petabytes of data per year. This would fill the data storage capacity of about half a million typical laptops every year by today’s standards!
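
The laptop comparison can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 petabyte = 10^15 bytes, 1 gigabyte = 10^9 bytes) and a 365-day year; the variable names are illustrative only.

```python
# Arithmetic behind the archive comparison quoted above.
# Assumptions: decimal units and a 365-day year.

ARCHIVE_BYTES_PER_YEAR = 300 * 10**15  # 300 petabytes per year
LAPTOPS_PER_YEAR = 500_000             # "about half a million" laptops
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Storage per laptop implied by the comparison, in gigabytes.
implied_laptop_gb = ARCHIVE_BYTES_PER_YEAR / LAPTOPS_PER_YEAR / 1e9

# Average rate at which the archive grows, in gigabytes per second.
average_archive_gb_per_s = ARCHIVE_BYTES_PER_YEAR / SECONDS_PER_YEAR / 1e9

print(f"Implied laptop capacity: {implied_laptop_gb:.0f} GB")
print(f"Average archive growth: {average_archive_gb_per_s:.1f} GB per second")
```

Under these assumptions, the "typical laptop" in the comparison holds about 600 gigabytes, and the archive grows at an average of roughly 9.5 gigabytes per second.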

From the SDP supercomputers, data will be distributed via intercontinental telecommunications networks to SKA Regional Centres in the SKAO Member States. There, science products will be stored for access by the end users, the astronomers, who will use them to conduct their science and improve our knowledge of the Universe.