Google AI Introduces ‘Groundsource’: A New Methodology that Uses Gemini Model to Transform Unstructured Global News into Actionable, Historical Data

By Tokenboy March 13, 2026

The Google AI Research team recently released Groundsource, a new methodology that uses the Gemini model to extract structured historical data from unstructured public news reports. The project addresses the lack of historical data for rapid-onset natural disasters. Its first output is an open-source dataset containing 2.6 million historical urban flash flood events across more than 150 countries.

The Hydro-Meteorological Data Gap

Machine learning models for early warning systems (EWS) require extensive historical baselines for training and validation. However, hydro-meteorological hazards like flash floods lack standardized, global observation networks.

The Impact of Flash Floods: According to the World Meteorological Organization (WMO), flash floods cause approximately 85% of flood-related fatalities, resulting in over 5,000 deaths annually.
Limitations of Existing Data: Satellite-based databases, such as the Global Flood Database (GFD) and the Dartmouth Flood Observatory (DFO), are limited by cloud cover, satellite revisit times, and a bias toward long-lasting events.
Scale of the Deficit: The Global Disaster Alert and Coordination System (GDACS) provides an inventory of roughly 10,000 high-impact events. This volume is insufficient for training global-scale predictive models.
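To put the deficit in perspective, the short calculation below compares the two record counts quoted in this article: roughly 10,000 GDACS events and the 2.6 million Groundsource records. It is only a rough size comparison, since GDACS inventories high-impact events while Groundsource targets urban flash floods, so the two are not like-for-like.

```python
# Record counts as quoted in this article (approximate figures).
gdacs_events = 10_000
groundsource_events = 2_600_000

# How many times larger the Groundsource corpus is.
ratio = groundsource_events / gdacs_events
print(f"Groundsource is about {ratio:.0f}x larger")  # prints "Groundsource is about 260x larger"
```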

The Groundsource Methodology

To build a larger training corpus, Google’s research team developed a pipeline that processes decades of localized news reports to synthesize a historical baseline.

Semantic Parsing with Gemini: The LLM is deployed for entity extraction. It processes unstructured, multilingual text to identify specific hazard events, classify their severity, and filter out irrelevant noise.
Geospatial Mapping: The extracted text descriptions of flood locations are integrated with Google Maps APIs to assign precise geographic coordinates and polygonal boundaries to each event.

This pipeline converts qualitative journalistic reporting into a structured, machine-readable dataset.
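The two stages above can be sketched roughly as follows. This is a minimal illustration, not Google's implementation: the JSON reply format, the `FloodEvent` fields, the severity labels, and the `TOY_GAZETTEER` lookup (a stand-in for a real geocoding service) are all assumptions made for the example, and the sample replies are invented.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical severity labels; the article does not list the actual classes.
SEVERITIES = {"minor", "moderate", "severe"}


@dataclass
class FloodEvent:
    date: str                    # ISO date of the event, as reported
    place: str                   # free-text location description
    severity: str
    lat: Optional[float] = None  # filled in by the geocoding step
    lon: Optional[float] = None


def parse_llm_output(raw: str) -> Optional[FloodEvent]:
    """Turn one JSON reply from the (stubbed) LLM into a record.

    Returns None for malformed replies and for reports the model
    marked as irrelevant, which is the noise-filtering step.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not data.get("is_flash_flood"):
        return None
    if data.get("severity") not in SEVERITIES:
        return None
    if not data.get("date") or not data.get("place"):
        return None
    return FloodEvent(data["date"], data["place"], data["severity"])


# Stand-in for a geocoding service: a tiny, made-up lookup table.
TOY_GAZETTEER = {"Sampletown": (10.0, 20.0)}


def geocode(event: FloodEvent) -> FloodEvent:
    """Attach coordinates when the place name is known; otherwise leave them unset."""
    coords = TOY_GAZETTEER.get(event.place)
    if coords is not None:
        event.lat, event.lon = coords
    return event


# Invented example replies: one valid event, one irrelevant report, one malformed reply.
replies = [
    '{"is_flash_flood": true, "date": "2020-01-02", "place": "Sampletown", "severity": "severe"}',
    '{"is_flash_flood": false}',
    "not json",
]
parsed = (parse_llm_output(r) for r in replies)
events = [geocode(e) for e in parsed if e is not None]
print(events)
```

Only the first reply survives as a structured record. Validating the model's output against a fixed schema before geocoding keeps malformed or off-topic replies out of the dataset.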

Application: Flash Flood Forecasting

Historically, Google’s Flood Forecasting Initiative focused on riverine floods, which develop slowly and are easier to track. Flash floods require distinct predictive approaches due to their rapid onset.

Using the 2.6-million-record Groundsource dataset, the research team trained a new AI model to predict urban flash flood risks up to 24 hours in advance. Empirical studies note that even a 12-hour lead time can reduce flash flood damage by 60%. These forecasts are now live on Google’s Flood Hub platform. The underlying dataset has been open-sourced to allow the broader data science community to train their own localized predictive models.
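The article does not describe how the forecasting model was trained. As one plausible framing for readers building their own localized models, the sketch below turns event records into binary training labels for a 24-hour horizon. The grid-cell identifiers, the `(cell_id, timestamp)` record layout, and the `label_for` helper are hypothetical.

```python
from datetime import datetime, timedelta

LEAD_TIME = timedelta(hours=24)  # forecast horizon mentioned in the article


def label_for(issue_time, cell_id, events):
    """Return 1 if any event hits cell_id within LEAD_TIME after issue_time, else 0."""
    return int(any(
        cell == cell_id and issue_time < when <= issue_time + LEAD_TIME
        for cell, when in events
    ))


# Invented example: one recorded flash flood in a hypothetical grid cell.
events = [("cell_A", datetime(2020, 1, 2, 6, 0))]

print(label_for(datetime(2020, 1, 1, 12, 0), "cell_A", events))  # event is 18 h ahead -> 1
print(label_for(datetime(2020, 1, 1, 0, 0), "cell_A", events))   # event is 30 h ahead -> 0
print(label_for(datetime(2020, 1, 1, 12, 0), "cell_B", events))  # different cell -> 0
```

Each `(issue_time, cell_id)` pair would then be joined with weather features available at `issue_time`, so the model only sees information it would have at forecast time.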

Key Takeaways

LLM-Driven Data Pipeline: Groundsource uses the Gemini model for semantic parsing to extract structured historical disaster data from unstructured, multilingual public news reports.
Massive Dataset Generation: The pipeline produced an open-source dataset containing 2.6 million historical urban flash flood records across more than 150 countries.
Overcoming Sensor Limitations: This NLP-based approach addresses the historical ‘data desert,’ bypassing the physical constraints of remote sensing (such as cloud cover or satellite revisit times) and the limited volume of existing traditional databases like GDACS.
Geospatial Integration: Extracted natural language descriptions of hazard locations are integrated with Google Maps APIs to assign precise geographic coordinates and polygonal boundaries to each event.
Predictive Model Deployment: The resulting dataset was used to train a new AI model that predicts urban flash flood risks up to 24 hours in advance and is now deployed on Google’s Flood Hub platform.

The post Google AI Introduces ‘Groundsource’: A New Methodology that Uses Gemini Model to Transform Unstructured Global News into Actionable, Historical Data appeared first on MarkTechPost.

from MarkTechPost https://ift.tt/QHPh08E