A novel method is proposed for classifying air quality in China. Causality-analysis-based significance tests were combined with several machine-learning algorithms to achieve automated and accurate classification. The 100 most developed cities in China were selected as study areas. Using time series drawn from a large volume of air-monitoring data, we analyzed meteorological factors (temperature, humidity, precipitation, wind speed, air pressure, sunshine duration, evaporation and ground surface temperature) and the individual industrial pollutants NO₂, SO₂, CO and O₃, focusing on the causal influence of the accumulation of each pollutant on PM₂.₅. To clarify how haze forms, joint regression models were established to quantify the degree to which each factor contributes to PM₂.₅. Several classification models, including KNN, SVM, ensemble and decision tree classifiers, were trained and tested to predict air quality. The ensemble (boosted trees) classifier achieved an accuracy of 90.2%. The feature selection and classification results both indicated that NO₂ played an important role in PM₂.₅ concentrations in China during 2015-2017.
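As an illustration of the classification step only, the following is a minimal sketch of a k-nearest-neighbour (KNN) classifier, one of the model families named above. It is not the implementation used in the study: the feature order (NO₂, SO₂, CO, O₃), the two class labels, the function name `knn_predict` and every numeric value are hypothetical and chosen purely for demonstration.

```python
import math
from collections import Counter

# Synthetic toy samples, NOT data from the study.
# Assumed feature order: (NO2, SO2, CO, O3); labels are hypothetical classes.
TRAIN = [
    ((20.0, 8.0, 0.6, 60.0), "good"),
    ((25.0, 10.0, 0.7, 70.0), "good"),
    ((22.0, 9.0, 0.5, 65.0), "good"),
    ((60.0, 30.0, 1.5, 90.0), "polluted"),
    ((70.0, 35.0, 1.8, 100.0), "polluted"),
    ((65.0, 28.0, 1.6, 95.0), "polluted"),
]


def knn_predict(train, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    # Rank training samples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    # Majority vote over the labels of the k closest samples.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Calling `knn_predict(TRAIN, (21.0, 8.5, 0.6, 62.0))` classifies a new observation by majority vote among its three nearest neighbours. The sketch uses raw, unscaled features; with real monitoring data the features would normally be standardized first, since distance-based classifiers are sensitive to differences in scale.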