Space Industry and Business News
ROBO SPACE
New datasets aim to teach AI models cross-disciplinary scientific thinking
by Clarence Oxford
Los Angeles CA (SPX) Dec 03, 2024

What can exploding stars reveal about blood flow in arteries, or how might swimming bacteria inform our understanding of ocean dynamics? Researchers from several institutions have taken a major step toward training artificial intelligence (AI) models that draw insights across disciplines to drive scientific discovery.

The initiative, known as Polymathic AI, uses technology similar to that behind large language models such as those powering ChatGPT. Instead of text, however, the models are trained on scientific datasets from fields such as astrophysics, biology, chemistry, and fluid dynamics, an approach intended to give them cross-disciplinary scientific capabilities.

"These groundbreaking datasets are by far the most diverse large-scale collections of high-quality data for machine learning training ever assembled for these fields," said Michael McCabe, a research engineer at the Flatiron Institute in New York City and a member of Polymathic AI. "Curating these datasets is a critical step in creating multidisciplinary AI models that will enable new discoveries about our universe."

The Polymathic AI team has released two open-source datasets that together comprise 115 terabytes of data from dozens of contributors. The collection is publicly available and is expected to accelerate the development of AI models capable of solving complex scientific problems. For comparison, GPT-3's training drew on 45 terabytes of unfiltered data.

"The freely available datasets are an unprecedented resource for developing sophisticated machine learning models that can then tackle a wide range of scientific problems," added Ruben Ohana, a research fellow at the Flatiron Institute's Center for Computational Mathematics. "Open-sourcing this data benefits both the machine learning and scientific communities, creating a win-win situation."

The datasets are hosted on Hugging Face, a popular platform for AI models and data, and are detailed in papers accepted for presentation at the NeurIPS conference in Vancouver, Canada.

"We've seen again and again that the most effective way to advance machine learning is to take difficult challenges and make them accessible to the wider research community," said McCabe. "When a new benchmark is released, it initially seems insurmountable. But opening access accelerates progress far beyond what any individual group could achieve."

Polymathic AI is a collaborative effort involving researchers from institutions such as the Simons Foundation, Flatiron Institute, New York University, and the Lawrence Berkeley National Laboratory.

The first dataset, named the Multimodal Universe, focuses on astrophysics and includes hundreds of millions of observations, such as images from NASA's James Webb Space Telescope and stellar data from ESA's Gaia spacecraft. "Machine learning has been happening for around 10 years in astrophysics, but it's still very hard to use across instruments, missions, and disciplines," said Polymathic AI researcher Francois Lanusse. "Datasets like the Multimodal Universe allow us to create models that natively understand this data and act as a Swiss Army knife for astrophysics."

The second dataset, dubbed the Well, comprises 15 terabytes of data across 16 diverse datasets. It features numerical simulations of biological systems, fluid dynamics, supernovae, and more, all governed by partial differential equations. Such equations arise in a wide array of scientific problems but are notoriously difficult to solve. "This dataset encompasses a diverse range of physics simulations designed to address key limitations of current machine learning models," said Polymathic AI member Rudy Morel.
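
For readers unfamiliar with partial differential equations, the sketch below shows roughly what a numerical simulation of one involves. It is an illustrative toy only and is not taken from the Well: it assumes the one-dimensional heat equation, u_t = alpha * u_xx, solved with a basic explicit finite-difference scheme, and the function names `solve_heat_1d` and `exact_heat_1d` are hypothetical. The simulations in the dataset are far larger and more complex.

```python
import math


def solve_heat_1d(n_points=51, alpha=1.0, t_end=0.05, cfl=0.4):
    """Explicit finite-difference solve of u_t = alpha * u_xx on [0, 1].

    Assumed setup (for illustration only): u = 0 at both ends and
    an initial profile u(x, 0) = sin(pi * x).
    """
    dx = 1.0 / (n_points - 1)
    dt = cfl * dx * dx / alpha            # scheme is stable for cfl <= 0.5
    n_steps = math.ceil(t_end / dt)
    dt = t_end / n_steps                  # shrink dt slightly to land on t_end
    r = alpha * dt / (dx * dx)

    x = [i * dx for i in range(n_points)]
    u = [math.sin(math.pi * xi) for xi in x]
    u[0] = u[-1] = 0.0                    # enforce the boundary values exactly

    for _ in range(n_steps):
        new = u[:]                        # boundaries stay fixed at zero
        for i in range(1, n_points - 1):
            # Discrete second derivative drives the update at each interior point.
            new[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        u = new
    return x, u


def exact_heat_1d(x, alpha, t):
    """Closed-form solution for the same initial and boundary conditions."""
    decay = math.exp(-alpha * math.pi ** 2 * t)
    return [decay * math.sin(math.pi * xi) for xi in x]


x, u = solve_heat_1d()
exact = exact_heat_1d(x, alpha=1.0, t=0.05)
max_error = max(abs(a - b) for a, b in zip(u, exact))
print(f"max error vs. exact solution: {max_error:.2e}")
```

This toy problem has a known exact answer to check against. Most equations of scientific interest do not, which is part of why large collections of carefully produced simulation data are valuable for training.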

Building these datasets required extensive collaboration. "The creators of numerical simulations are sometimes skeptical of machine learning because of the hype, but they're curious about how it can benefit their research," Ohana explained.

The team is now using the datasets to train AI models, with early results showing promise. "Understanding how machine learning models generalize and interpolate across datasets from different physical systems is an exciting research challenge," said Polymathic AI member Bruno Regaldo-Saint Blancard.

Shirley Ho, project lead and group leader at the Flatiron Institute, noted, "Just like the Protein Data Bank spawned AlphaFold, I'm excited to see what the Well and the Multimodal Universe will help create." Ho will present Polymathic AI's findings at NeurIPS.
