10 Best Free Climate and Environment Datasets for Machine Learning


Climate change is one of the greatest threats of our time, and much of the discussion about it revolves around data. Many data scientists have built models and visualizations that measure and track changes in surface temperature, sea ice extent, and other environmental variables using climate and environmental datasets. Many of these datasets are publicly available, and the indicators they contain are simple measures that offer a cost-effective way to track the state of the environment and can alert us to impending environmental problems. This post presents ten free datasets that you can use in your own machine learning projects. Please note that you must be registered with the hosting repositories to download the data for free.

[Image: Global warming (Source)]

1. Climate Change: Earth Surface Temperature Data

Berkeley Earth, an independent U.S. non-profit focused on environmental data science and affiliated with Lawrence Berkeley National Laboratory, provides the first dataset. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. The dataset includes the following files: Land and Ocean-and-Land Temperatures, Average Land Temperature by Country, Average Land Temperature by State, Land Temperatures By Major City, and Land Temperatures By City. The dataset is about 600 MB and can be downloaded here.

2. International Greenhouse Gas Emissions

This dataset comes from the Greenhouse Gas (GHG) Inventory. It contains the most recently submitted data, covering the period from 1990 onward. The GHG data hold information on anthropogenic emissions by sources and removals by sinks of carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), and several other gases. The dataset is 1.01 MB and can be downloaded here.

[Image: Gas emissions (Source)]

3. Daily Sea Ice Extent Data

The National Snow and Ice Data Center (NSIDC) supports research into our planet’s frozen regions, which include snow, glaciers, ice, frozen ground, and their interactions with climate. The NSIDC manages and distributes scientific data, builds data access tools, supports users, performs scientific research, and educates the public about the cryosphere. Its dataset provides the total sea ice extent for each day from 1978 until 2015 and consists of 7 variables: Year, Month, Day, Extent, Missing, Source, and Hemisphere. The dataset is 4.29 MB and can be found here.
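As a starting point for working with this file, the sketch below averages the daily extent per year and hemisphere using pandas. It assumes the CSV headers match the seven variable names listed above exactly; the inline sample rows, the hemisphere labels, and the helper name `annual_mean_extent` are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical sample rows that follow the seven-column layout described above.
SAMPLE_CSV = """Year,Month,Day,Extent,Missing,Source,Hemisphere
1979,1,1,15.0,0.0,sample,north
1979,1,3,15.2,0.0,sample,north
1980,1,1,14.8,0.0,sample,north
1979,1,1,6.0,0.0,sample,south
1980,1,1,6.4,0.0,sample,south
"""


def annual_mean_extent(csv_source):
    """Return the mean daily Extent per year, with one column per hemisphere."""
    df = pd.read_csv(csv_source, skipinitialspace=True)
    # Real files sometimes pad header names with spaces; strip them defensively.
    df.columns = df.columns.str.strip()
    return (
        df.groupby(["Hemisphere", "Year"])["Extent"]
        .mean()
        .unstack("Hemisphere")
    )


means = annual_mean_extent(io.StringIO(SAMPLE_CSV))
print(means)
```

To run this on the downloaded file, pass its path instead of the `io.StringIO` object. Averaging per year smooths out the strong seasonal cycle, which makes long-term trends easier to see.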

[Image: Sea ice (Source)]

4. Temperature Change Dataset

The FAOSTAT Temperature Change domain publishes statistics on mean surface temperature change by country, updated annually. The current release covers the period from 1961 to 2021. The dataset consists of monthly, seasonal, and annual temperature change values for more than 250 countries. The whole archive is 14.5 MB and can be downloaded here.

Testing. CI/CD. Monitoring.

Because ML systems are more fragile than you think. All based on our open-source core.

Deepchecks HubOur GithubOpen Source

5. World Bank Climate Change Data

This dataset, provided by World Development Indicators and the Climate Change Knowledge Portal, covers climate systems, resilience, exposure to climate impacts, greenhouse gas emissions, and energy use. In addition, the Climate Change Knowledge Portal offers a web interface to a collection of water indicators that can be used to assess the influence of climate change across more than 8,000 water basins worldwide. The dataset is 5.7 MB and can be downloaded here.

6. Air Quality Annual Summary

The Environmental Protection Agency (EPA) produces air quality data from measurements taken by monitors across the country; the data come from the EPA’s Air Quality System (AQS). This dataset contains 55 features, such as State Code, Metric Used, Year, and 1st to 4th Max Value. The dataset is 994 MB and can be downloaded here.
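Because this file is close to 1 GB, it can help to stream it in chunks and load only the columns you need. The sketch below keeps the highest value per state and year. The column names are assumptions based on the feature names mentioned above (the downloaded file may spell them differently, so they are parameters), and the sample rows and the helper name `max_by_state_year` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical rows; only the header names echo the features mentioned above.
SAMPLE_CSV = """State Code,Metric Used,Year,1st Max Value
1,Observed Values,2020,0.05
1,Observed Values,2020,0.09
6,Observed Values,2020,0.12
1,Observed Values,2021,0.04
6,Observed Values,2020,0.07
"""


def max_by_state_year(csv_source, state_col="State Code", year_col="Year",
                      value_col="1st Max Value", chunksize=100_000):
    """Stream a large CSV and return the highest value per (state, year)."""
    reader = pd.read_csv(
        csv_source,
        usecols=[state_col, year_col, value_col],  # skip the other columns
        chunksize=chunksize,                       # read a bounded number of rows at a time
    )
    partials = [
        chunk.groupby([state_col, year_col])[value_col].max()
        for chunk in reader
    ]
    # The maximum of the per-chunk maxima equals the overall maximum.
    return pd.concat(partials).groupby(level=[0, 1]).max()


# A tiny chunksize forces several chunks even on the small sample.
result = max_by_state_year(io.StringIO(SAMPLE_CSV), chunksize=2)
print(result)
```

The same pattern works for sums and counts, since those can also be combined across chunks; a mean needs the per-chunk sum and count kept separately.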

[Image: Air Quality (Source)]

7. VEMAP 2: Annual Ecosystem Model Responses to U.S. Climate Change, 1994-2100

Phase 2 of the VEMAP Project produced historical (1895-1993) gridded climate datasets (temperature, precipitation, solar radiation, humidity, and wind speed) as well as projected (1994-2100) gridded annual and monthly climate datasets based on the outputs of two climate system models: CCCma (Canadian Centre for Climate Modeling and Analysis) and the Hadley Centre model. The dataset is 747 MB and can be downloaded here.

8. Climate Change Tweets Ids

This dataset contains the tweet IDs of 39,622,026 tweets related to climate change. The tweets were collected from September 21, 2017, to May 17, 2019 (with a gap in data collection between January 7, 2019, and April 17, 2019) from the Twitter API using Social Feed Manager. The following keywords were used to collect the tweets: #climatechange, #climatechangeisreal, #actonclimate, #globalwarming, #climatechangehoax, #climatedeniers, #climatechangeisfalse, #globalwarminghoax, #climatechangenotreal. The dataset is 739 MB and can be downloaded on this webpage.
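Since the dataset holds only IDs, the tweet text has to be looked up separately, and with almost 40 million IDs that is usually done in batches. The sketch below groups IDs into fixed-size batches without loading everything into memory. It assumes the files store one ID per line, which should be checked against the download; the batch size of 100 and the helper name `batched_ids` are illustrative choices, not something the dataset prescribes.

```python
from itertools import islice


def batched_ids(lines, batch_size=100):
    """Yield lists of up to batch_size tweet IDs from an iterable of text lines."""
    # Keep IDs as strings and skip blank lines.
    ids = (line.strip() for line in lines if line.strip())
    while True:
        batch = list(islice(ids, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical IDs standing in for the lines of a downloaded file.
sample_lines = ["1001\n", "1002\n", "\n", "1003\n"]
batches = list(batched_ids(sample_lines, batch_size=2))
print(batches)
```

With a real file, pass the open file handle (for example `open(path)`) as `lines`; each batch can then be handed to whatever lookup tool you use.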

[Image: Climate change (Source)]

9. Global Environmental Indicators

Environmental indicators help us understand and analyze the state of the planet. This dataset includes indicators in the following areas: Air and Climate, Biodiversity, Energy and Minerals, Forest, Governance, Inland Water Resources, Land and Agriculture, Marine and Coastal Areas, Natural Disasters, and Waste. The dataset is 3.5 MB and can be downloaded here.

10. EU Emission Trading System

This dataset covers the EU emission trading system (ETS). The ETS is meant to help the EU achieve cost-efficient reductions of greenhouse gas emissions and reach its targets under the Kyoto Protocol and other commitments. The data mainly come from the EU Transaction Log (EUTL), and the dataset has seven features: country_code, country, main activity sector name, ETS information, year, value, and unit. The dataset is 5 MB and can be downloaded here.
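A dataset in this long format (one row per country, sector, metric, and year) is often easier to explore after pivoting. The sketch below sums `value` per country and year for one chosen entry of the ETS information column. It assumes the CSV headers match the seven feature names above exactly; the sample rows, the labels "Metric X" and "Metric Y", and the helper name `totals_by_country_year` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical rows; only the header follows the seven features listed above.
SAMPLE_CSV = """country_code,country,main activity sector name,ETS information,year,value,unit
AT,Austria,Sector A,Metric X,2019,10,unit
AT,Austria,Sector B,Metric X,2019,5,unit
AT,Austria,Sector A,Metric Y,2019,99,unit
DE,Germany,Sector A,Metric X,2019,40,unit
DE,Germany,Sector A,Metric X,2020,35,unit
"""


def totals_by_country_year(csv_source, metric):
    """Sum `value` over sectors for one ETS metric, as a country-by-year table."""
    df = pd.read_csv(csv_source)
    # Keep a single metric so that different quantities are never added together.
    subset = df[df["ETS information"] == metric]
    return subset.pivot_table(
        index="country", columns="year", values="value", aggfunc="sum"
    )


table = totals_by_country_year(io.StringIO(SAMPLE_CSV), metric="Metric X")
print(table)
```

Filtering on one metric before summing matters here, because the `value` column mixes different kinds of quantities that share the same layout.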

