Application of Bayesian Inference Model Variational Bayesian Principal Component Analysis (VBPCA) for Handling Missing Data in Principal Component Analysis

Authors

  • Ricky Yordani, Sekolah Tinggi Ilmu Statistik

DOI:

https://doi.org/10.34123/jurnalasks.v8i1.12

Keywords:

Variational Bayesian PCA, Principal Component Analysis, Missing Value, Incomplete Data

Abstract

Standard Principal Component Analysis (PCA) has difficulty handling incomplete data sets. The usual procedure for incomplete data is either to delete the incomplete observations (listwise deletion) or to replace missing values with the variable mean; both can lose information carried by those observations. Another approach integrates the Expectation Maximization (EM) algorithm into Probabilistic Principal Component Analysis (PPCA), but PPCA can overfit when predicting responses. This study discusses Variational Bayesian Principal Component Analysis (VBPCA), an extension of PPCA that incorporates prior information on the distribution of the principal component model parameters. In simulation studies in which data were removed under a missing at random (MAR) mechanism, the correlation between the principal component scores of the complete data and the scores predicted from the incomplete data was higher for PPCA than for VBPCA, and this generally held across the different percentages of incomplete data. However, judged by the agreement between responses and predictions, as measured by the normalized root mean square error of prediction (NRMSEP), VBPCA performed better than PPCA.
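
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the author's code and it implements neither PPCA nor VBPCA: the imputer below is a simple iterative low-rank (SVD) fill used only as a stand-in, the NRMSEP normalization (RMSE over the removed cells divided by the standard deviation of their true values) is one assumed definition among several in use, and the function names, the simulated data, and the MAR rule (missingness in the other columns depends on the always-observed first column) are all hypothetical.

```python
import numpy as np


def nrmsep(x_true, x_pred, missing_mask):
    """RMSE over the removed cells, divided by the standard deviation of
    the true values in those cells (assumed normalization)."""
    diff = x_true[missing_mask] - x_pred[missing_mask]
    rmse = np.sqrt(np.mean(diff ** 2))
    return float(rmse / np.std(x_true[missing_mask]))


def iterative_pca_impute(x_obs, missing_mask, n_components, n_iter=200):
    """Stand-in imputer (not PPCA or VBPCA): alternate a rank-k SVD fit
    with refilling the missing cells from the low-rank reconstruction."""
    col_means = np.nanmean(x_obs, axis=0)
    x_filled = np.where(missing_mask, col_means, x_obs)  # start from column means
    k = n_components
    for _ in range(n_iter):
        mu = x_filled.mean(axis=0)
        u, s, vt = np.linalg.svd(x_filled - mu, full_matrices=False)
        low_rank = (u[:, :k] * s[:k]) @ vt[:k] + mu
        # Observed cells are kept; only missing cells are updated.
        x_filled = np.where(missing_mask, low_rank, x_obs)
    return x_filled


# Hypothetical complete data: two latent components, six variables, small noise.
rng = np.random.default_rng(0)
x_complete = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
x_complete += 0.1 * rng.normal(size=x_complete.shape)

# MAR-style removal: column 0 is always observed, and the chance that a cell
# in columns 1..5 is missing depends only on the observed value in column 0.
p_missing = 0.3 / (1.0 + np.exp(-x_complete[:, 0]))
missing_mask = np.zeros(x_complete.shape, dtype=bool)
missing_mask[:, 1:] = rng.random((200, 5)) < p_missing[:, None]
x_obs = np.where(missing_mask, np.nan, x_complete)

# Baseline (mean substitution) versus the low-rank stand-in.
x_mean = np.where(missing_mask, np.nanmean(x_obs, axis=0), x_obs)
x_pca = iterative_pca_impute(x_obs, missing_mask, n_components=2)

nrmsep_mean = nrmsep(x_complete, x_mean, missing_mask)
nrmsep_pca = nrmsep(x_complete, x_pca, missing_mask)
print(f"NRMSEP mean substitution: {nrmsep_mean:.3f}")
print(f"NRMSEP low-rank imputation: {nrmsep_pca:.3f}")
```

A lower NRMSEP means the predicted values are closer to the values that were removed. Comparing PPCA with VBPCA as in the study would mean replacing the stand-in imputer with implementations of those two methods and repeating the run for each percentage of missing data.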

Published

2016-06-30

How to Cite

Yordani, R. (2016). Application of Bayesian Inference Model Variational Bayesian Principal Component Analysis (VBPCA) for Handling Missing Data in Principal Component Analysis. Jurnal Aplikasi Statistika & Komputasi Statistik, 8(1), 57. https://doi.org/10.34123/jurnalasks.v8i1.12