Generalised low rank models #
GLRMs model a data array with a low rank matrix, typically of much lower rank than the data itself. The framework includes principal component analysis, matrix completion, robust PCA, nonnegative matrix factorization, k-means, and others as special cases. It’s best thought of as a generalisation of PCA to arbitrary datasets:
> PCA finds a low rank matrix that minimizes the approximation error, in the least-squares sense, to the original data set. A factorization of this low rank matrix embeds the original high dimensional features into a low dimensional space. Extensions of PCA can handle missing data values, and can be used to impute missing entries. Here, we extend PCA to approximate an arbitrary data set by replacing the least-squares error used in PCA with a loss function that is appropriate for the given data type.
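To make this framing of PCA concrete, here is a minimal Julia sketch of the special case the quote describes: by the Eckart–Young theorem, the best rank-`k` approximation in the least-squares sense comes from a truncated SVD. The data matrix `A`, the rank `k`, and the variable names `X` and `Y` (mirroring the factorization notation) are all illustrative choices, not anything prescribed by the paper.

```julia
using LinearAlgebra

A = randn(100, 20)   # a toy data matrix (rows = examples, columns = features)
k = 3                # target rank, much lower than size(A)

U, S, V = svd(A)     # full SVD: A == U * Diagonal(S) * V'

# Keep only the top k singular triples to get the low rank model A ≈ X * Y.
X = U[:, 1:k] * Diagonal(S[1:k])   # low dimensional embedding of the rows
Y = V[:, 1:k]'                     # low dimensional representation of the features

A_k = X * Y                        # best rank-k approximation of A
println(norm(A - A_k))             # least-squares (Frobenius) approximation error
```

A GLRM keeps this same factorized structure but swaps the squared error for a loss suited to each column’s data type, which is what lets it cover the other special cases listed above.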
Links and resources #
- LowRankModels.jl, a Julia package for implementing GLRMs.
- A very thorough survey of GLRMs by Udell, Horn, Zadeh, and Boyd, from which the above quote was taken. I had never seen PCA described as simply as in this paper, and I think this explanation makes much more sense than trying to motivate PCA through variance, which is how I’ve always seen it done.