Global Earthquake–Tsunami Risk Assessment Dataset

Seismic Features, Temporal Trends, and Global Distribution Analysis (2001–2022)

Overview

The Global Earthquake–Tsunami Risk Assessment Dataset is a comprehensive, machine learning–ready seismic database designed for research in tsunami prediction, earthquake analysis, and hazard assessment.
It compiles 782 globally recorded earthquakes (2001–2022), detailing each event’s magnitude, depth, intensity, and tsunami potential.

This dataset serves as a vital foundation for predictive analytics, early warning models, and geophysical research at DatalytIQs Academy, where data science meets Earth science.

Dataset Highlights

Attribute        Details
Time Period      2001–2022
Total Records    782 earthquakes
Coverage         Global (latitude −61.85° to 71.63°, longitude −179.97° to 179.66°)
Completeness     100% (no missing values)
Target Variable  Tsunami indicator (0 = No, 1 = Yes)
Format           CSV (~41 KB)

Tsunami Classification Summary:

  • Non-Tsunami Events: 478 (61.1%)

  • Tsunami-Potential Events: 304 (38.9%)

  • Moderately imbalanced classes (about 61/39), still suitable for binary classification
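The class shares above can be checked with a few lines of arithmetic. The sketch below uses only the two counts reported in this summary; the majority-class baseline is an added reference point, not something computed in the original analysis.

```python
# Class counts as reported in the classification summary above.
non_tsunami = 478
tsunami = 304
total = non_tsunami + tsunami

share_non_tsunami = non_tsunami / total
share_tsunami = tsunami / total

# A classifier that always predicts "no tsunami" is correct on every
# non-tsunami event, so its accuracy equals the majority-class share.
majority_baseline_accuracy = max(share_non_tsunami, share_tsunami)

print(f"total events:      {total}")
print(f"non-tsunami share: {share_non_tsunami:.1%}")
print(f"tsunami share:     {share_tsunami:.1%}")
print(f"majority baseline: {majority_baseline_accuracy:.1%}")
```

Any tsunami classifier trained on this data should be judged against that baseline rather than against 50%. With the CSV loaded in pandas, `df["tsunami"].value_counts(normalize=True)` would give the same shares, assuming the target column is named `tsunami`.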

1. Descriptive Statistics

Feature                     Mean     Std     Min       Max      Description
Magnitude                   6.94     0.45    6.5       9.1      Earthquake strength (reported magnitude)
Depth (km)                  75.9     137.3   2.7       670.8    Focal depth (km)
Significance (sig)          870.1    322.5   650       2910     Event hazard score
Latitude                    3.54     27.3    −61.85    71.63    Global coverage
Longitude                   52.61    117.9   −179.97   179.66   Epicentral positions
MMI (Mercalli)              5.96     1.46    1         9        Structural intensity
CDI (Community Intensity)   4.33     3.17    0         9        Perceived shaking
NST (Stations)              230.3    250.2   0         934      Monitoring density
Year                        2012.3   6.1     2001      2022     Temporal coverage
Month                       6.56     3.51    1         12       Seasonal distribution
Tsunami (binary)            0.39     0.49    0         1        Target variable

Summary

  • The minimum magnitude of 6.5 and the mean of 6.94 show that the dataset includes only strong earthquakes (≥6.5).

  • Depth varies widely (2.7–670.8 km), confirming a mix of shallow and deep events.

  • The significance score (mean ≈ 870) implies most events were moderate-to-high hazard.

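  • The binary tsunami indicator is moderately imbalanced (about 39% positive), which is workable for model training as long as splits are stratified and accuracy is compared against the 61% majority-class share.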
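The statistics in this section are the kind a `describe`-style summary produces. Below is a minimal, dependency-free sketch of that calculation; the helper name `describe`, the column names, and the sample rows are all assumptions made for illustration, and the rows are not records from the dataset.

```python
import statistics

def describe(rows, column):
    """Return mean, sample std, min and max of one numeric column."""
    values = [float(row[column]) for row in rows]
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (n - 1 in the denominator)
        "min": min(values),
        "max": max(values),
    }

# Made-up rows for illustration only; they are NOT taken from the dataset.
# csv.DictReader yields rows of this shape (string values keyed by header).
sample_rows = [
    {"magnitude": "6.5", "depth": "10.0"},
    {"magnitude": "7.0", "depth": "35.0"},
    {"magnitude": "7.5", "depth": "600.0"},
]

stats = describe(sample_rows, "magnitude")
print(stats)
```

The sample standard deviation is used here because that is also the default in pandas, so results on the real file should line up with `DataFrame.describe()`.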

2. Magnitude vs Depth of Earthquakes

Insights:

  • High-magnitude earthquakes (≥8.0) are primarily shallow (≤100 km), which makes them more likely to generate tsunamis.

  • Deep-focus events (≥500 km) are less destructive and rarely cause surface damage.

  • The clustering near the surface reflects plate boundary stress release zones, often near subduction regions.
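One way to test the depth insights above is to bucket events by focal depth and compare tsunami rates per bucket. The sketch below uses the thresholds quoted in this section (≤100 km and ≥500 km); the "intermediate" label for everything in between, the function names, and the sample events are assumptions, and the sample events are made up rather than drawn from the dataset.

```python
from collections import defaultdict

def depth_class(depth_km, shallow_max=100.0, deep_min=500.0):
    """Label a focal depth using the thresholds quoted in this section."""
    if depth_km <= shallow_max:
        return "shallow"
    if depth_km >= deep_min:
        return "deep"
    return "intermediate"  # assumed label for depths between the two thresholds

def tsunami_rate_by_depth(events):
    """events: iterable of (depth_km, tsunami_flag) pairs, flag is 0 or 1."""
    counts = defaultdict(lambda: [0, 0])  # class -> [events, tsunami events]
    for depth_km, flag in events:
        bucket = counts[depth_class(depth_km)]
        bucket[0] += 1
        bucket[1] += flag
    return {name: hits / n for name, (n, hits) in counts.items()}

# Made-up events for illustration only; NOT records from the dataset.
events = [(10.0, 1), (35.0, 0), (80.0, 1), (250.0, 0), (600.0, 0)]
rates = tsunami_rate_by_depth(events)
print(rates)
```

The rates printed for the sample events say nothing about the real data; the point is the grouping logic, which can be applied unchanged to the depth and tsunami columns of the CSV.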

3. Earthquake Frequency Over Time

Observations:

  • Annual counts in the dataset fluctuate between 25 and 55 events, with peaks during 2010–2015, a period that includes the 2010 Chile and 2011 Japan (Tohoku) mega-quakes.

  • There is no long-term upward or downward trend, which suggests episodic tectonic release rather than predictable cycles.

  • 2011 and 2013 marked years of unusually high seismic activity globally.
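The yearly counts behind these observations come from a simple group-and-count over the year column. The sketch below shows that step, along with the average implied by the dataset totals (782 events over 22 years). The function names and the sample years are assumptions for illustration; the sample list is made up and does not reproduce the real counts.

```python
from collections import Counter

def events_per_year(years):
    """Count events per calendar year, returned in chronological order."""
    return dict(sorted(Counter(years).items()))

def busiest_years(counts, top_n=2):
    """Return the top_n years by event count (ties broken by earlier year)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [year for year, _ in ranked[:top_n]]

# Average implied by the dataset totals stated in the overview.
mean_per_year = 782 / 22
print(f"mean events per year: {mean_per_year:.1f}")

# Made-up year values for illustration only; NOT the real distribution.
years = [2010, 2010, 2011, 2011, 2011, 2012, 2013, 2013, 2013, 2013]
counts = events_per_year(years)
print(counts)
print(busiest_years(counts))
```

Comparing each year's count with the roughly 35.5-event average gives a quick way to flag unusually active years.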

4. Global Distribution of Earthquakes

Patterns:

  • Earthquakes are concentrated along major tectonic plate boundaries, especially:

    • The Pacific Ring of Fire (Japan, Indonesia, Chile, Alaska).

    • The Himalayan–Eurasian belt.

    • The Mid-Atlantic Ridge.

  • On the map, the color gradient encodes magnitude; lighter shades denote mega-quakes (>8.0).

  • The visualization validates the seismic clustering principle: energy concentrates where plates collide, subduct, or diverge.
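The clustering described above can also be checked numerically by binning epicenters into a coarse latitude/longitude grid and counting events per cell. This is a rough sketch rather than the method used for the map: the 10° cell size, the function names, and the sample coordinates are assumptions, and the coordinates are made-up points placed roughly off Japan, Chile, and Alaska.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=10.0):
    """Return the south-west corner of the grid cell containing (lat, lon)."""
    return (math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg)

def densest_cells(coords, cell_deg=10.0, top_n=3):
    """Return the top_n grid cells by event count as (cell, count) pairs."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in coords)
    return counts.most_common(top_n)

# Made-up coordinates for illustration only; NOT records from the dataset.
coords = [
    (38.3, 142.4), (37.5, 141.2), (36.0, 140.0),  # roughly off Japan
    (-36.1, -72.9), (-35.0, -71.5),               # roughly off Chile
    (61.0, -150.0),                               # roughly Alaska
]

for cell, count in densest_cells(coords):
    print(cell, count)
```

Equal-angle cells shrink in area toward the poles, so this is only a first-pass density check; an equal-area grid would be the more careful choice.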

5. Applications of the Dataset

  • Machine Learning Classification: Predict tsunami occurrence using seismic predictors.

  • Hazard Mapping: Visualize regional earthquake risk zones.

  • Temporal Modeling: Forecast periods of elevated seismic risk.

  • Magnitude Estimation Models: Predict quake magnitude from early sensor data.

  • Policy & Planning: Support evidence-based disaster preparedness.
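For the classification use case, a stratified train/test split keeps the tsunami share the same in both partitions. The sketch below is a dependency-free illustration under stated assumptions: the function name `stratified_split`, the 20% test fraction, and the seed are arbitrary choices, and the label list is rebuilt from the class counts in the dataset highlights (478 zeros, 304 ones) rather than read from the file.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while preserving class proportions."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Labels rebuilt from the reported class counts; row order is arbitrary.
labels = [0] * 478 + [1] * 304
train_idx, test_idx = stratified_split(labels)

test_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), f"{test_share:.1%}")
```

In a scikit-learn workflow, `train_test_split(..., stratify=y)` serves the same purpose.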

Data Quality & Research Value

  • Zero missing values across all features.

  • Global dataset covering 22 years of records.

  • Moderately imbalanced binary target (38.9% tsunami events).

  • Includes 28 great (≥8.0) earthquakes.

  • Ideal for machine learning, visualization, and policy analytics.

Acknowledgment

This study was conducted by DatalytIQs Academy, a global educational platform bridging Mathematics, Economics, Data Science, and Geoscience Analytics.

Data Source: Kaggle — Global Earthquake–Tsunami Risk Assessment Dataset (2001–2022)
Tools Used: Python, Pandas, Matplotlib, Seaborn, and JupyterLab

“At DatalytIQs Academy, we turn seismic data into seismic insight — empowering global resilience through analytics.”
Collins Odhiambo Owino, Founder
