Motivation: Unify classification and regression via a hyperspherical output space with class prototypes defined a priori. The output dimensionality is no longer tied to the task (i.e., #classes for classification or a single dimension for regression).
A class prototype is a feature vector representing its class. Think of word2vec, where each word is a class and its embedding vector is its "class prototype".
Intuition: Class prototypes provide inductive biases for easier optimization and better generalization (i.e., large-margin separation), instead of just learning to output fixed-size values in a fully parametric manner that ignores known inductive biases.
Class prototype definition: Unlike previous work on learned prototypes, which requires constant re-estimation, this work places prototypes uniformly on the hypersphere (following the large-margin separation principle). As a result, the prototypes do not need to be inferred from data or updated during training.
Class prototype positioning: Uniformly distributing an arbitrary number of prototypes on a hypersphere of arbitrary dimension is an open mathematical problem (the "Tammes problem"). As an approximation, they design a training loss to learn the prototype positions on the hypersphere.
Uniformity training loss: For each prototype, minimize the cosine similarity to its nearest neighbor (equivalently, minimize the maximum pairwise cosine similarity); see the sketch below.
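A minimal PyTorch sketch of this loss as I read it (the class/dimension counts and loop hyperparameters are placeholders, not the paper's):

```python
import torch
import torch.nn.functional as F

def uniformity_loss(prototypes: torch.Tensor) -> torch.Tensor:
    """For each prototype, take the cosine similarity to its nearest
    neighbor and average; minimizing this spreads prototypes apart."""
    sims = prototypes @ prototypes.t()  # pairwise cosine sims (rows unit-norm)
    # Mask the diagonal so a prototype is never its own nearest neighbor.
    eye = torch.eye(sims.size(0), dtype=torch.bool, device=sims.device)
    sims = sims.masked_fill(eye, -2.0)
    return sims.max(dim=1).values.mean()

# Illustrative optimization loop (C classes, D output dims are assumed values).
C, D = 100, 64
protos = torch.randn(C, D, requires_grad=True)
opt = torch.optim.SGD([protos], lr=0.1)
for _ in range(1000):
    opt.zero_grad()
    uniformity_loss(F.normalize(protos, dim=1)).backward()
    opt.step()
```

Normalizing inside the loop keeps the loss defined on the sphere while the raw parameters move freely in R^D.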
Prototypes with task-specific semantic information: Use a ranking-based loss that incorporates similarity order rather than direct similarities, so that prototypes align with pretrained word2vec embeddings (learning direct similarities is avoided since word vectors do not have the uniformity property); a sketch follows.
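One plausible hinge-style instantiation of such a ranking loss (the paper's exact formulation may differ; function names and the triplet construction here are my own):

```python
import itertools
import torch
import torch.nn.functional as F

def word2vec_order_triplets(word_vecs: torch.Tensor) -> torch.Tensor:
    """All index triples (i, j, k) such that, in word2vec space,
    class i is more similar to class j than to class k."""
    w = F.normalize(word_vecs, dim=1)
    sims = w @ w.t()
    n = sims.size(0)
    triplets = [(i, j, k)
                for i, j, k in itertools.permutations(range(n), 3)
                if sims[i, j] > sims[i, k]]
    return torch.tensor(triplets)

def semantic_rank_loss(protos: torch.Tensor, triplets: torch.Tensor) -> torch.Tensor:
    """Hinge penalty whenever prototype similarities violate the
    word2vec similarity ordering; only the order matters, not the values."""
    p = F.normalize(protos, dim=1)
    i, j, k = triplets.unbind(dim=1)
    sim_ij = (p[i] * p[j]).sum(dim=1)
    sim_ik = (p[i] * p[k]).sum(dim=1)
    return torch.relu(sim_ik - sim_ij).mean()
```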
Pick "lower & upper bound" prototypes by cosine similarity = -1; the lower bound and upper bound regression values are typically maximum and minimum regression values in training examples.
Training uses a hyperspherical regression loss function.
Unlike standard regression, which backpropagates losses on one-dimensional outputs, this work learns on the line from the lower-bound prototype to the upper-bound prototype (a sketch follows).
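A minimal sketch of that loss under the assumptions above (variable names are mine): the error lives on the cosine of the angle between the network output and the upper-bound prototype, rather than on a raw scalar output.

```python
import torch
import torch.nn.functional as F

def hyperspherical_regression_loss(z: torch.Tensor,
                                   p_u: torch.Tensor,
                                   r: torch.Tensor) -> torch.Tensor:
    """Squared error between the normalized target r in [-1, 1] and the
    cosine of the angle between output z and the upper-bound prototype p_u;
    r = -1 corresponds to the lower-bound prototype -p_u.

    z:   (N, D) network outputs
    p_u: (D,)   upper-bound prototype
    r:   (N,)   normalized regression targets
    """
    cos = F.cosine_similarity(z, p_u.unsqueeze(0), dim=1)
    return ((r - cos) ** 2).mean()
```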