python - Finding and utilizing eigenvalues and eigenvectors from PCA in scikit-learn
I have been using the PCA implementation in scikit-learn. However, I want to find the eigenvalues and eigenvectors that result after fitting my training dataset. There is no mention of either in the docs.
Secondly, can these eigenvalues and eigenvectors be used as features for classification purposes?
I am assuming that by eigenvectors you mean the eigenvectors of the covariance matrix.
Let's say you have n data points in a p-dimensional space, and X is a p x n matrix of your points. Then the directions of the principal components are the eigenvectors of the covariance matrix XX^T. You can obtain the directions of these eigenvectors from sklearn by accessing the components_ attribute of the PCA object. This can be done as follows:
    from sklearn.decomposition import PCA
    import numpy as np

    # X is n x p (samples x features), the layout sklearn expects.
    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    pca = PCA()
    pca.fit(X)
    print(pca.components_)
This gives the output:
    [[ 0.83849224  0.54491354]
     [ 0.54491354 -0.83849224]]
Here every row is a principal component in the p-dimensional space (2 in this toy example). Each of these rows is an eigenvector of the centered covariance matrix XX^T.
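To see this concretely, one can check that each row of components_ satisfies the eigenvector equation for the covariance matrix. This is a minimal verification sketch of my own (the variable names are illustrative, not from the original answer):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    pca = PCA().fit(X)

    # sklearn's X is n x p (samples x features), so the (unnormalized)
    # covariance matrix is Xc.T @ Xc, where Xc is the centered data.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc

    # Each row v of components_ is a unit eigenvector of cov:
    for v in pca.components_:
        lam = v @ cov @ v                     # Rayleigh quotient = eigenvalue
        print(np.allclose(cov @ v, lam * v))  # prints True for each row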
As far as the eigenvalues go, there is no straightforward way to get them from the PCA object. The PCA object does, however, have an attribute called explained_variance_ratio_, which gives the percentage of the variance explained by each component. These numbers are proportional to the eigenvalues. In the case of our toy example, we get the following if we print the explained_variance_ratio_ attribute:
    [ 0.99244289  0.00755711]
This means that the ratio of the eigenvalue of the first principal component to the eigenvalue of the second principal component is 0.99244289 : 0.00755711.
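As a side note, newer versions of scikit-learn do expose the eigenvalues directly through the explained_variance_ attribute (worth checking against the docs of your version); explained_variance_ratio_ is just those values normalized to sum to 1. A quick sketch, reusing the pca object fitted above:

    # Eigenvalues of the sample covariance matrix, largest first:
    print(pca.explained_variance_)

    # explained_variance_ratio_ is the same numbers normalized to sum to 1:
    ev = pca.explained_variance_
    print(ev / ev.sum())  # matches pca.explained_variance_ratio_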
If your understanding of the basic mathematics of PCA is clear, then a better way to get the eigenvectors and eigenvalues is to use numpy.linalg.eig on the centered covariance matrix. If your data matrix is a p x n matrix X (p features, n points), you can use the following code:
    import numpy as np

    # Here X is p x n (features x points), unlike the sklearn example
    # above, where X was n x p (points x features).
    centered_matrix = X - X.mean(axis=1)[:, np.newaxis]
    cov = np.dot(centered_matrix, centered_matrix.T)
    eigvals, eigvecs = np.linalg.eig(cov)
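Two caveats with this approach (my own additions, not part of the original answer): np.linalg.eig does not return the eigenvalues in any particular order, and since the covariance matrix is symmetric, np.linalg.eigh is the more appropriate and numerically stable routine. A sketch of getting sorted eigenpairs:

    # eigh is designed for symmetric matrices; sort the eigenpairs so
    # the largest eigenvalue comes first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]  # column i is the i-th principal direction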
Coming to your second question: these eigenvalues and eigenvectors cannot be used on their own for classification. For classification you need features for each data point, but these eigenvectors and eigenvalues are derived from the entire covariance matrix XX^T. For dimensionality reduction you could use the projections of your original points (in the p-dimensional space) onto the principal components obtained from PCA. However, this is not always useful, because PCA does not take into account the labels of your training data. I would recommend you look into LDA for supervised problems.
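For completeness, here is a minimal sketch of both alternatives: projecting the points onto the principal components for (unsupervised) dimensionality reduction, and LDA, which does use the labels. The labels y below are made up purely for illustration, and the LDA import path assumes a recent scikit-learn:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    y = np.array([0, 0, 0, 1, 1, 1])  # hypothetical class labels

    # Unsupervised: project each point onto the first principal component.
    X_pca = PCA(n_components=1).fit_transform(X)

    # Supervised: LDA chooses its projection using the labels y.
    X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

    print(X_pca.ravel())
    print(X_lda.ravel())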
Hope that helps.