Seismic Features, Temporal Trends, and Global Distribution Analysis (2001–2022)
Overview
The Global Earthquake–Tsunami Risk Assessment Dataset is a comprehensive, machine learning–ready seismic database designed for research in tsunami prediction, earthquake analysis, and hazard assessment.
It compiles 782 globally recorded earthquakes (2001–2022), detailing each event’s magnitude, depth, intensity, and tsunami potential.
This dataset serves as a vital foundation for predictive analytics, early warning models, and geophysical research at DatalytIQs Academy, where data science meets Earth science.
Dataset Highlights
| Attribute | Details |
|---|---|
| Time Period | 2001 – 2022 |
| Total Records | 782 earthquakes |
| Coverage | Global (Latitude −61.85° to 71.63°, Longitude −179.97° to 179.66°) |
| Completeness | 100% (no missing values) |
| Target Variable | Tsunami indicator (0 = No, 1 = Yes) |
| Format | CSV (~41KB) |
Tsunami Classification Summary:
- Non-Tsunami Events: 478 (61.1%)
- Tsunami-Potential Events: 304 (38.9%)
- Reasonably balanced classes (roughly 61/39), suitable for binary classification
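The percentages above can be checked with a few lines of arithmetic. The sketch below uses only the two class counts reported in this summary. The majority-class baseline and the imbalance ratio are additions for context, not figures from the original analysis.

```python
# Class counts as reported in the summary above.
counts = {"non_tsunami": 478, "tsunami": 304}
total = sum(counts.values())

# Share of each class as a percentage, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(counts.values()) / total

# Ratio of majority to minority class; 1.0 would be perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total, shares, round(majority_baseline, 3), round(imbalance_ratio, 2))
```

Any classifier trained on this target should be judged against the majority-class baseline of about 61% accuracy rather than against 50%.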
1. Descriptive Statistics
| Feature | Mean | Std | Min | Max | Description |
|---|---|---|---|---|---|
| Magnitude | 6.94 | 0.45 | 6.5 | 9.1 | Earthquake magnitude |
| Depth (km) | 75.9 | 137.3 | 2.7 | 670.8 | Focal depth (km) |
| Significance (sig) | 870.1 | 322.5 | 650 | 2910 | Event hazard score |
| Latitude | 3.54 | 27.3 | −61.85 | 71.63 | Global coverage |
| Longitude | 52.61 | 117.9 | −179.97 | 179.66 | Epicentral positions |
| MMI (Mercalli) | 5.96 | 1.46 | 1 | 9 | Structural intensity |
| CDI (Community Intensity) | 4.33 | 3.17 | 0 | 9 | Perceived shaking |
| NST (Stations) | 230.3 | 250.2 | 0 | 934 | Monitoring density |
| Year | 2012.3 | 6.1 | 2001 | 2022 | Temporal coverage |
| Month | 6.56 | 3.51 | 1 | 12 | Seasonal distribution |
| Tsunami (binary) | 0.39 | 0.49 | 0 | 1 | Target variable |
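A table like the one above can be reproduced with a small summary routine. The following is a minimal sketch using only the Python standard library. The column names (`magnitude`, `depth`, `sig`, `tsunami`) and the four sample rows are assumptions made for illustration, so the printed numbers will not match the table.

```python
import csv
import io
import statistics

# Hypothetical sample rows for illustration; the real file has 782 records.
# Column names are assumed and may differ from the actual CSV header.
SAMPLE_CSV = """magnitude,depth,sig,tsunami
6.5,10.0,650,0
7.0,35.0,754,1
9.1,29.0,2910,1
6.8,600.0,711,0
"""


def summarize(rows, column):
    """Return mean, sample std, min and max for one numeric column."""
    values = [float(row[column]) for row in rows]
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("magnitude", "depth", "sig", "tsunami"):
    stats = summarize(rows, column)
    print(column, {name: round(value, 2) for name, value in stats.items()})
```

With pandas, `DataFrame.describe()` reports the same four statistics (plus count and quartiles) and also uses the sample standard deviation.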
Summary
- The minimum magnitude of 6.5 and the mean of 6.94 show that the dataset contains only large earthquakes (≥6.5).
- Depth varies widely (2.7–670.8 km), confirming a mix of shallow and deep events.
- The significance score (mean ≈ 870) implies most events were moderate-to-high hazard.
- The binary tsunami indicator is reasonably balanced (39% positive), so models can be trained without severe class imbalance.
2. Magnitude vs Depth of Earthquakes

Insights:
- High-magnitude earthquakes (≥8.0) are primarily shallow (≤100 km), which makes them more likely to generate tsunamis.
- Deep-focus events (≥500 km) are less destructive and rarely cause surface damage.
- The clustering near the surface reflects plate boundary stress release zones, often near subduction regions.
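One way to test the depth insights above is to group events by depth class and compare tsunami rates. The sketch below takes its two thresholds (100 km and 500 km) from the bullets. The "intermediate" label for everything in between and the four sample events are assumptions introduced here.

```python
from collections import defaultdict

# Thresholds taken from the insights above; "intermediate" is a label
# introduced here for depths between the two.
SHALLOW_MAX_KM = 100.0
DEEP_MIN_KM = 500.0


def depth_class(depth_km):
    """Assign a depth in kilometres to one of three classes."""
    if depth_km <= SHALLOW_MAX_KM:
        return "shallow"
    if depth_km >= DEEP_MIN_KM:
        return "deep"
    return "intermediate"


def tsunami_rate_by_depth(events):
    """events: iterable of (magnitude, depth_km, tsunami_flag) tuples."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for _magnitude, depth_km, tsunami in events:
        cls = depth_class(depth_km)
        totals[cls] += 1
        flagged[cls] += tsunami
    return {cls: flagged[cls] / totals[cls] for cls in totals}


# Hypothetical events for illustration only, not rows from the dataset.
events = [
    (8.2, 25.0, 1),
    (7.1, 60.0, 0),
    (6.9, 180.0, 0),
    (6.6, 610.0, 0),
]
rates = tsunami_rate_by_depth(events)
print(rates)
```

Run on the full dataset, a clearly higher tsunami rate in the shallow class than in the deep class would support the first two insights.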
3. Earthquake Frequency Over Time

Observations:
- Annual earthquake counts in the dataset fluctuate between 25 and 55, with peaks during 2010–2015, coinciding with events like the 2011 Japan (Tohoku) and 2010 Chile mega-quakes.
- There is no long-term trend of increase or decline, which suggests episodic tectonic release rather than predictable cycles.
- 2011 and 2013 marked years of unusually high seismic activity globally.
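The yearly counts behind these observations can be derived by tallying the year column. This is a minimal sketch: the function names and the short sample list are hypothetical, and the default range matches the 2001–2022 period stated in the overview.

```python
from collections import Counter


def yearly_counts(years, first=2001, last=2022):
    """Count events per year, keeping zero-count years in the range."""
    tally = Counter(years)
    return {year: tally.get(year, 0) for year in range(first, last + 1)}


def peak_years(counts):
    """Return the year(s) with the highest event count."""
    top = max(counts.values())
    return [year for year, n in counts.items() if n == top]


# Hypothetical year column for illustration only.
years = [2010, 2011, 2011, 2013, 2013, 2022]
counts = yearly_counts(years)
print(peak_years(counts))
```

Keeping zero-count years in the result matters for plotting, since a line chart would otherwise connect across missing years and hide quiet periods.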
4. Global Distribution of Earthquakes

Patterns:
- Earthquakes are concentrated along major tectonic plate boundaries, especially:
  - The Pacific Ring of Fire (Japan, Indonesia, Chile, Alaska).
  - The Himalayan–Eurasian belt.
  - The Mid-Atlantic Ridge.
- The color gradient indicates magnitude; lighter shades denote mega-quakes (>8.0).
- The visualization is consistent with the seismic clustering principle: energy concentrates where plates collide, subduct, or diverge.
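The spatial clustering described above can also be quantified without a map by binning epicenters into a coarse latitude/longitude grid. The sketch below is illustrative only: the 30-degree cell size, the function names, and the sample coordinates are assumptions, and fixed-degree cells are not equal in area, so counts near the poles are not directly comparable with those near the equator.

```python
import math
from collections import Counter


def grid_cell(latitude, longitude, size_deg=30):
    """Return the south-west corner of the grid cell containing a point."""
    lat0 = math.floor(latitude / size_deg) * size_deg
    lon0 = math.floor(longitude / size_deg) * size_deg
    return (lat0, lon0)


def busiest_cells(epicenters, size_deg=30, top=3):
    """Return the `top` grid cells with the most epicenters."""
    cells = Counter(grid_cell(lat, lon, size_deg) for lat, lon in epicenters)
    return cells.most_common(top)


# Hypothetical epicenters (latitude, longitude) for illustration only.
epicenters = [
    (38.3, 142.4),   # off the east coast of Honshu, Japan
    (36.1, 140.1),   # eastern Honshu, Japan
    (-36.1, -72.9),  # central Chile
    (-0.9, 119.8),   # Sulawesi, Indonesia
]
print(busiest_cells(epicenters, top=1))
```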
5. Applications of the Dataset
- Machine Learning Classification: Predict tsunami occurrence using seismic predictors.
- Hazard Mapping: Visualize regional earthquake risk zones.
- Temporal Modeling: Forecast periods of elevated seismic risk.
- Magnitude Estimation Models: Predict quake magnitude from early sensor data.
- Policy & Planning: Support evidence-based disaster preparedness.
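For the classification use case, it helps to have a simple reference point before training any model. The sketch below evaluates a toy rule (large and shallow means tsunami-potential) with accuracy, precision, and recall. The thresholds of magnitude 7.5 and depth 100 km, the function names, and the six sample records are all hypothetical. This is not the method used in the original analysis.

```python
def rule_predict(magnitude, depth_km, min_magnitude=7.5, max_depth_km=100.0):
    """Toy rule: flag large, shallow events as tsunami-potential."""
    return int(magnitude >= min_magnitude and depth_km <= max_depth_km)


def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/seen.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical (magnitude, depth_km, tsunami) records for illustration only.
records = [
    (8.1, 20.0, 1),
    (7.6, 35.0, 1),
    (7.8, 50.0, 0),
    (6.7, 15.0, 1),
    (6.9, 550.0, 0),
    (7.0, 120.0, 0),
]
y_true = [tsunami for _, _, tsunami in records]
y_pred = [rule_predict(mag, depth) for mag, depth, _ in records]
metrics = evaluate(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

Reporting recall alongside accuracy is a deliberate choice here: for an early-warning application, a missed tsunami event is usually costlier than a false alarm.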
Data Quality & Research Value
- Zero missing values across all features.
- Global dataset covering 22 years of records.
- Reasonably balanced binary target (38.9% tsunami events).
- Includes 28 major (≥8.0) earthquakes.
- Well suited to machine learning, visualization, and policy analytics.
Acknowledgment
This study was conducted by DatalytIQs Academy, a global educational platform bridging Mathematics, Economics, Data Science, and Geoscience Analytics.
Data Source: Kaggle — Global Earthquake–Tsunami Risk Assessment Dataset (2001–2022)
Tools Used: Python, Pandas, Matplotlib, Seaborn, and JupyterLab
“At DatalytIQs Academy, we turn seismic data into seismic insight — empowering global resilience through analytics.”
— Collins Odhiambo Owino, Founder