python - Finding and utilizing eigenvalues and eigenvectors from PCA in scikit-learn
I have been using the PCA implementation in scikit-learn. However, I want to find the eigenvalues and eigenvectors that result after fitting my training dataset. There is no mention of either in the docs.
Secondly, can these eigenvalues and eigenvectors be used as features for classification purposes?
I am assuming that by eigenvectors you mean the eigenvectors of the covariance matrix.
Let's say you have n data points in a p-dimensional space, and X is a p x n matrix of your points. Then the directions of the principal components are the eigenvectors of the covariance matrix XX^T. You can obtain the directions of these eigenvectors from sklearn by accessing the components_ attribute of the PCA object. This can be done as follows:
    from sklearn.decomposition import PCA
    import numpy as np

    # X is n x p (samples x features), the layout sklearn expects.
    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    pca = PCA()
    pca.fit(X)
    print(pca.components_)
This gives the output:
    [[ 0.83849224  0.54491354]
     [ 0.54491354 -0.83849224]]
Here every row is a principal component in the p-dimensional space (2 in this toy example). Each of these rows is an eigenvector of the centered covariance matrix XX^T.
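To see this concretely, one can check that each row of components_ satisfies the eigenvector equation for the covariance matrix. This is a minimal verification sketch of my own (the variable names are illustrative, not from the original answer):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    pca = PCA().fit(X)

    # sklearn's X is n x p (samples x features), so the (unnormalized)
    # covariance matrix is Xc.T @ Xc, where Xc is the centered data.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc

    # Each row v of components_ is a unit eigenvector of cov:
    for v in pca.components_:
        lam = v @ cov @ v                     # Rayleigh quotient = eigenvalue
        print(np.allclose(cov @ v, lam * v))  # prints True for each row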
As far as the eigenvalues go, there is no straightforward way to get them from the PCA object. The PCA object does, however, have an attribute called explained_variance_ratio_, which gives the percentage of the variance explained by each component. These numbers are proportional to the eigenvalues. In the case of our toy example, we get the following if we print the explained_variance_ratio_ attribute:
    [ 0.99244289  0.00755711]
This means that the ratio of the eigenvalue of the first principal component to the eigenvalue of the second principal component is 0.99244289 : 0.00755711.
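As a side note, newer versions of scikit-learn do expose the eigenvalues directly through the explained_variance_ attribute (worth checking against the docs of your version); explained_variance_ratio_ is just those values normalized to sum to 1. A quick sketch, reusing the pca object fitted above:

    # Eigenvalues of the sample covariance matrix, largest first:
    print(pca.explained_variance_)

    # explained_variance_ratio_ is the same numbers normalized to sum to 1:
    ev = pca.explained_variance_
    print(ev / ev.sum())  # matches pca.explained_variance_ratio_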
If your understanding of the basic mathematics of PCA is clear, then a better way to get the eigenvectors and eigenvalues is to use numpy.linalg.eig on the centered covariance matrix. If your data matrix is a p x n matrix X (p features, n points), you can use the following code:
    import numpy as np

    # Here X is p x n (features x points), unlike the sklearn example
    # above, where X was n x p (points x features).
    centered_matrix = X - X.mean(axis=1)[:, np.newaxis]
    cov = np.dot(centered_matrix, centered_matrix.T)
    eigvals, eigvecs = np.linalg.eig(cov)
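Two caveats with this approach (my own additions, not part of the original answer): np.linalg.eig does not return the eigenvalues in any particular order, and since the covariance matrix is symmetric, np.linalg.eigh is the more appropriate and numerically stable routine. A sketch of getting sorted eigenpairs:

    # eigh is designed for symmetric matrices; sort the eigenpairs so
    # the largest eigenvalue comes first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]  # column i is the i-th principal direction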
Coming to your second question: these eigenvalues and eigenvectors cannot be used on their own for classification. For classification you need features for each data point, but these eigenvectors and eigenvalues are derived from the entire covariance matrix XX^T. For dimensionality reduction you could use the projections of your original points (in the p-dimensional space) onto the principal components obtained from PCA. However, this is not always useful, because PCA does not take into account the labels of your training data. I would recommend you look into LDA for supervised problems.
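For completeness, here is a minimal sketch of both alternatives: projecting the points onto the principal components for (unsupervised) dimensionality reduction, and LDA, which does use the labels. The labels y below are made up purely for illustration, and the LDA import path assumes a recent scikit-learn:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    y = np.array([0, 0, 0, 1, 1, 1])  # hypothetical class labels

    # Unsupervised: project each point onto the first principal component.
    X_pca = PCA(n_components=1).fit_transform(X)

    # Supervised: LDA chooses its projection using the labels y.
    X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

    print(X_pca.ravel())
    print(X_lda.ravel())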
Hope that helps.