Refined print of top rotations.
dereckmezquita committed Jul 14, 2024
1 parent 5b2a726 commit 6594fc2
Showing 1 changed file with 36 additions and 41 deletions.
77 changes: 36 additions & 41 deletions vignettes/stat_review-principal-component-analysis.Rmd
@@ -54,46 +54,41 @@ Now what does orthogonal mean? In this context, it means that the principal components are uncorrelated with each other.

PCA involves several mathematical steps:

1. **Standardisation**: PCA begins with an n-dimensional dataset; in our implementation, genes are dimensions and samples are observations. The data is standardised so that each dimension has a mean of 0 and a standard deviation of 1. This step is crucial because PCA is sensitive to the relative scaling of the original variables.

Mathematically, for each feature $x$, we compute:

$$
x_{\text{standardised}} = \frac{x - \mu}{\sigma}
$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the feature.

2. **Covariance Matrix Computation**: A covariance matrix is computed from the standardised data. It records the covariance (shared variance) between every pair of dimensions and thus captures the correlation structure of the original variables.

For the standardised data matrix $X$ with $m$ features, the covariance matrix $C$ is computed as:

$$
C = \frac{1}{n - 1} X^\top X
$$

where $n$ is the number of observations and $X^\top$ is the transpose of $X$.

3. **Eigendecomposition**: The covariance matrix is then decomposed into its eigenvectors and eigenvalues. Each eigenvector represents a principal component, which is a linear combination of the original dimensions. The associated eigenvalue represents the amount of variance explained by the principal component.

We solve the equation:

$$
C v = \lambda v
$$

where $v$ is an eigenvector and $\lambda$ is the corresponding eigenvalue.

The eigenvectors are ordered by their corresponding eigenvalues, so the first principal component (PC1) explains the most variance, followed by PC2, and so on. Steps 1 to 3 are worked through in a short sketch after this list.

4. **Selection of Principal Components**: Depending on the goal of the analysis, some or all of the principal components can be kept for further analysis. The 'elbow method' is commonly used: plot the variance explained by each principal component (a scree plot) and look for an 'elbow' as the cut-off point; a minimal scree-plot sketch follows this list.

5. **Interpretation**: The 'top rotations' in the context of PCA are the features (genes) that contribute most to each principal component. The rotation matrix gives the loadings of each feature onto each PC; features with large absolute loadings are the ones driving the separation in the data along that component. A sketch of this ranking also follows the list.
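
To make steps 1 to 3 concrete, here is a minimal base-R sketch on a small simulated matrix. The object names (`X`, `X_std`, `C`, `eig`, `pr`) are made up for illustration and are not part of the Pca class; it standardises the data, builds the covariance matrix, performs the eigendecomposition by hand, and checks the eigenvalues against `prcomp()`.

```r
# Steps 1-3 by hand on simulated data (illustration only; the Pca class wraps prcomp())
set.seed(1)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)   # 100 samples (rows) x 5 genes (columns)
colnames(X) <- paste0("gene", 1:5)

# 1. Standardisation: each column to mean 0, standard deviation 1
X_std <- scale(X)

# 2. Covariance matrix of the standardised data: (1 / (n - 1)) * t(X_std) %*% X_std
C <- cov(X_std)

# 3. Eigendecomposition: eigenvectors define the PCs, eigenvalues give their variances
eig <- eigen(C)
eig$values          # variance explained by each PC, largest first
eig$vectors[, 1]    # loadings of PC1 (sign is arbitrary)

# Cross-check against prcomp(), which applies the same centring and scaling here
pr <- prcomp(X, center = TRUE, scale. = TRUE)
all.equal(pr$sdev^2, eig$values)
```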
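
For step 4, a scree plot can be drawn from any `prcomp` fit. This generic sketch reuses the simulated `pr` object from the chunk above; it is not a method of the Pca class.

```r
# Proportion of variance explained by each principal component
var_explained <- pr$sdev^2 / sum(pr$sdev^2)

# Scree plot: look for the 'elbow' beyond which extra PCs add little variance
plot(
    var_explained,
    type = "b",
    xlab = "Principal component",
    ylab = "Proportion of variance explained",
    main = "Scree plot"
)
```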
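
For step 5, one way to find the top contributors is to rank features by absolute loading within each column of the rotation matrix. The helper below (again reusing the simulated `pr`) is a hypothetical sketch of that idea; the `top_rotations` field shown later is the Pca class's own, possibly different, implementation.

```r
# Hypothetical helper: names of the n_top features with the largest
# absolute loadings for each principal component
top_loadings <- function(rotation, n_top = 3) {
    apply(rotation, 2, function(loadings) {
        names(sort(abs(loadings), decreasing = TRUE))[seq_len(n_top)]
    })
}

top_loadings(pr$rotation, n_top = 3)
```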

## Using the Pca Class

@@ -162,7 +157,7 @@ pca_obj2$prcomp_results
# View the refined PCA results
pca_obj2$prcomp_refined
# View the top contributors to each PC
pca_obj2$top_rotations
as.data.frame(pca_obj2$top_rotations)
```

### Visualising PCA Results
