Dunkel, J., Dominguez, D., Borzdynski, Ó. G., & Sánchez, Á. (2022). Solid Waste Analysis Using Open-Access Socio-Economic Data. Sustainability, 14(3), 1233.
Nowadays, problems related with solid waste management become a challenge for most countries due to the rising generation of waste, related environmental issues, and associated costs of produced wastes. Effective waste management systems at different geographic levels require accurate forecasting of future waste generation. In this work, we investigate how open-access data, such as provided from the Organisation for Economic Co-operation and Development (OECD), can be used for the analysis of waste data. The main idea of this study is finding the links between socio-economic and demographic variables that determine the amounts of types of solid wastes produced by countries. This would make it possible to accurately predict at the country level the waste production and determine the requirements for the development of effective waste management strategies. In particular, we use several machine learning data regression (Support Vector, Gradient Boosting, and Random Forest) and clustering models (k-means) to respectively predict waste production for OECD countries along years and also to perform clustering among these countries according to similar characteristics. The main contributions of our work are: (1) waste analysis at the OECD country-level to compare and cluster countries according to similar waste features predicted; (2) the detection of most relevant features for prediction models; and (3) the comparison between several regression models with respect to accuracy in predictions. Coefficient of determination (R2), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE), respectively, are used as indices of the efficiency of the developed models. Our experiments have shown that some data pre-processings on the OECD data are an essential stage required in the analysis; that Random Forest Regressor (RFR) produced the best prediction results over the dataset; and that these results are highly influenced by the quality of available socio-economic data. In particular, the RFR model exhibited the highest accuracy in predictions for most waste types. For example, for “municipal” waste, it produced, respectively, R2 = 1 and MAPE=4.31 global error values for the test set; and for “household” waste, it, respectively, produced R2 = 1 and MAPE=3.03. Our results indicate that the considered models (and specially RFR) all are effective in predicting the amount of produced wastes derived from input data for the considered countries.