Scientific user facilities, such as those at national labs or government organizations, can produce up to several billion gigabytes of data per year. This volume has begun to exceed researchers’ capacity to parse the data effectively in pursuit of their scientific goals, a supersized problem when it comes to achieving new scientific advances.
To develop new mathematical and computational techniques to reduce the size of these data sets, the U.S. Department of Energy (DOE) awarded $13.7 million to nine projects as part of the Advanced Scientific Computing Research (ASCR) program in September 2021. A team led by Byung-Jun Yoon, associate professor in the Department of Electrical and Computer Engineering at Texas A&M University, has received $2.4 million to address the challenges of moving, storing and processing the massive data sets produced and processed by scientific workflows.
The overarching principle of this project is to focus on the scientific objectives of each data set and to retain the quantities of interest (QoI) that pertain to those objectives. By optimizing the representation of the data with the scientific goals in view, Yoon’s team is able to preserve the important information that can lead to scientific breakthroughs despite the significant reduction in data size.
“Our idea is to not only significantly reduce the amount of data but to ultimately preserve the goals for which the data is intended to serve,” Yoon said. “That’s why we call it the objective-based data reduction for scientific workflows. We want to reduce the amount of data but not sacrifice the quantities or qualities of interest.”
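As a rough illustration of the general idea, and not the team’s actual method, the toy sketch below picks the coarsest lossy representation of a data set that still keeps a chosen quantity of interest within a tolerance. Everything in it is an assumption made for the example: the QoI (a simple mean), the reduction scheme (uniform quantization), the candidate step sizes, and the function names.

```python
import random
import statistics


def qoi(values):
    """Hypothetical quantity of interest: the sample mean."""
    return statistics.fmean(values)


def quantize(values, step):
    """Lossy reduction: snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


def reduce_for_objective(values, tolerance, steps=(10.0, 1.0, 0.1, 0.01, 0.001)):
    """Return (step, reduced) for the coarsest step whose QoI error is within tolerance."""
    target = qoi(values)
    for step in steps:  # candidates are ordered from coarsest to finest
        reduced = quantize(values, step)
        if abs(qoi(reduced) - target) <= tolerance:
            return step, reduced
    # No candidate met the tolerance: keep the data unreduced.
    return None, list(values)


# Synthetic stand-in for measurements (assumed, not real facility data).
rng = random.Random(0)
data = [rng.gauss(50.0, 5.0) for _ in range(10_000)]

step, reduced = reduce_for_objective(data, tolerance=0.05)
print("chosen step:", step)
print("distinct values before/after:", len(set(data)), len(set(reduced)))
print("QoI error:", abs(qoi(reduced) - qoi(data)))
```

The reduced data has far fewer distinct values and so compresses much better, while the error check is tied to the scientific objective rather than to point-by-point fidelity.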
Read more at Texas A&M University
Photo Credit: geralt via Pixabay