2007 | 16 | 4 |

Article title

Chemometric treatment of missing elements in air quality data sets




This article reports an exploratory analysis of an air monitoring data set collected at a monitoring station in Katowice, the biggest, most congested and most polluted city of the Silesian region. To extract the important information on air pollution in the city, the data set was explored with a strategy that handles missing elements and outliers present in the data at the same time. The strategy first estimates the missing elements using robust Partial Least Squares (rPLS) and identifies outliers using the so-called robust distance. The identified outliers are then replaced with missing elements, and the final model is constructed with the iterative Expectation-Maximization approach built into Principal Component Analysis (PCA).
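The final step of the strategy, Expectation-Maximization built into PCA, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `em_pca_impute`, the column-mean initial guess (the article uses rPLS estimates instead), the fixed number of components and the stopping rule are all assumptions made for illustration, and the outlier identification step is left out. The idea is to fill the missing elements, fit a PCA model, replace the missing elements with their model reconstruction, and repeat until the filled values stop changing.

```python
import numpy as np


def em_pca_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """Fill NaN entries of X by iterating PCA reconstruction (EM-style sketch).

    Hypothetical helper for illustration; parameters are assumptions.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess (assumption): column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # Fit PCA to the currently completed data.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)

        # Reconstruct the data from the first n_components components.
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean

        # Update only the missing elements; observed values stay fixed.
        change = np.sum((recon[missing] - filled[missing]) ** 2)
        filled[missing] = recon[missing]
        if change < tol:
            break

    return filled
```

In the workflow described above, elements flagged as outliers would first be set to NaN, so that this step treats them in the same way as the elements that were missing from the start.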











  • Central Mining Institute, Plac Gwarkow 1, 40-166 Katowice, Poland


