Machine Learning for Air Quality and CO2 Emissions: The Role of Data Understanding

Agnieszka Głowacka; Bartosz Dziewit; Paulina Trybek

doi:10.18690/um.fov.5.2026

Authors

Agnieszka Głowacka

University of Silesia, Faculty of Science and Technology

Bartosz Dziewit

University of Silesia, Institute of Physics

Paulina Trybek

University of Silesia, Institute of Physics

DOI: https://doi.org/10.18690/um.fov.5.2026.6

Synopsis

In recent years, machine learning (ML) techniques have enabled increasingly sophisticated approaches to environmental prediction. However, comparatively little attention has been paid to the nature, origin, and methodological construction of the datasets underlying these models. This study investigates the role of data in ML-based environmental applications, focusing on two domains: greenhouse gas (GHG) emissions, particularly carbon dioxide (CO₂), and particulate matter concentrations (PM₂.₅ and PM₁₀). For Poland and Slovakia, a LightGBM model was trained to predict CO₂ emissions across all major economic sectors: Residential, Power, Transport, Industry, and Aviation. Predictive performance was highest in sectors with regular, seasonal emission patterns, while low-variability sectors such as Domestic Aviation posed greater challenges. For particulate matter, meteorological variables and time-related features were used to forecast PM₂.₅ and PM₁₀ during the heating season. The models captured general temporal patterns, including short-term fluctuations and seasonal peaks, although extreme events were partially underestimated. Overall, the findings show that predictive accuracy depends strongly on the quality, resolution, and structure of the input datasets, as well as on emission regularity and environmental conditions. This work underscores the importance of careful dataset design and preprocessing in ML applications for environmental monitoring and offers guidance for improving the reliability of emission and air quality forecasting.

Author Biographies

Agnieszka Głowacka, University of Silesia, Faculty of Science and Technology

Agnieszka Głowacka is a first-year Master’s student of Micro- and Nanotechnology at the University of Silesia in Katowice. Her interests focus on the application of modern technologies in the analysis and processing of scientific data. She is particularly interested in interdisciplinary approaches combining elements of physics, chemistry, and informatics, as well as practical applications of advanced technologies in science and industry.

Katowice, Poland. E-mail: agnieszka.glowacka@us.edu.pl

Bartosz Dziewit, University of Silesia, Institute of Physics

Bartosz Dziewit is an assistant professor at the Faculty of Science and Technology of the University of Silesia in Katowice, affiliated with the Institute of Physics, and currently serves as the Director of the Applied Computer Science program. His research focuses on particle physics (especially neutrino physics), data analysis, and computer science, and he is actively involved in teaching and supervising students in areas such as computer systems, networks, and cybersecurity.

Katowice, Poland. E-mail: bartosz.dziewit@us.edu.pl

Paulina Trybek, University of Silesia, Institute of Physics

Paulina Trybek is an assistant professor at the University of Silesia in Katowice, affiliated with the Institute of Physics, where she specializes in the analysis of biomedical time series. She is actively involved in numerous student projects, supporting the development of data analysis competencies. She is also the coordinator of the project “Developing Talents in Artificial Intelligence to Solve Disruptive Environmental Problems”.

Katowice, Poland. E-mail: paulina.trybek@us.edu.pl

Volume

Artificial Intelligence and Environmental Challenges: Research Insights and Emerging Solutions

Pages

107-130

Published

June 18, 2026

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Głowacka, A., Dziewit, B., & Trybek, P. (2026). Machine Learning for Air Quality and CO2 Emissions: The Role of Data Understanding. In R. Leskovar (Ed.), Artificial Intelligence and Environmental Challenges: Research Insights and Emerging Solutions (pp. 107-130). University of Maribor Press. https://doi.org/10.18690/um.fov.5.2026.6
