This is a basic exercise in covariate shift importance weighting. It reproduces the simulation described by Sugiyama et al. (2007) and implements a simple solution using weighted least squares.
When performing maximum likelihood estimation, the goal is to maximize the
expected value of a likelihood function,
\[
\theta^{\ast} = \operatorname*{arg\,max}_{\theta} \;
\mathbb{E}_{(x, y) \sim p(x, y)} \left[ \log p(y \mid x; \theta) \right].
\]
Now consider a distribution of feature-target pairs in a source domain with a
probability density function $p_{\mathrm{s}}(x, y) = p_{\mathrm{s}}(x)\, p(y \mid x)$.
How could one train a model that maximizes the expected value of
the likelihood under a different, target domain with density
$p_{\mathrm{t}}(x, y) = p_{\mathrm{t}}(x)\, p(y \mid x)$, given only samples
from the source domain? Under covariate shift, the conditional $p(y \mid x)$
is shared between the domains and only the feature marginals differ, so the
target-domain objective can be rewritten as an expectation over the source
domain,
\[
\mathbb{E}_{p_{\mathrm{t}}} \left[ \log p(y \mid x; \theta) \right]
= \mathbb{E}_{p_{\mathrm{s}}} \left[
\frac{p_{\mathrm{t}}(x)}{p_{\mathrm{s}}(x)} \log p(y \mid x; \theta)
\right],
\]
which suggests weighting each source observation by the importance weight
$w(x) = p_{\mathrm{t}}(x) / p_{\mathrm{s}}(x)$.
Consider further a new combined distribution resulting from the mixture of
both domains, with probability density function
\[
p_{\mathrm{m}}(x) = \pi \, p_{\mathrm{t}}(x) + (1 - \pi) \, p_{\mathrm{s}}(x),
\]
where $\pi$ is the probability that an observation drawn from the mixture
belongs to the target domain.
Given a sample from this mixture and its domain labels, it is possible to train
a classifier to estimate the probability of an observation belonging to the
target distribution, $P(\mathrm{t} \mid x)$.
Further, we know that, by Bayes' theorem,
\[
P(\mathrm{t} \mid x) = \frac{\pi \, p_{\mathrm{t}}(x)}{p_{\mathrm{m}}(x)}
\qquad \text{and} \qquad
P(\mathrm{s} \mid x) = \frac{(1 - \pi) \, p_{\mathrm{s}}(x)}{p_{\mathrm{m}}(x)},
\]
and therefore
\[
\frac{P(\mathrm{t} \mid x)}{P(\mathrm{s} \mid x)}
= \frac{\pi}{1 - \pi} \cdot \frac{p_{\mathrm{t}}(x)}{p_{\mathrm{s}}(x)}.
\]
This suggests that we could estimate the necessary importance weight, up to a
constant factor that does not affect the weighted fit, as
\[
w(x) \propto \frac{P(\mathrm{t} \mid x)}{1 - P(\mathrm{t} \mid x)}.
\]
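As a small concrete sketch of this last step (a NumPy helper of my own naming, not part of the original implementation), the conversion from classifier probabilities to unnormalized importance weights is just the odds:

```python
import numpy as np

def weights_from_classifier(p_target):
    """Convert classifier probabilities P(t|x) into importance weights
    w(x) = P(t|x) / (1 - P(t|x)), up to the constant (1 - pi)/pi factor."""
    p_target = np.asarray(p_target, dtype=float)
    return p_target / (1.0 - p_target)
```

A point the classifier finds equally likely to come from either domain ($P(\mathrm{t} \mid x) = 0.5$) gets weight 1, while points that look strongly target-like get up-weighted.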
As described by Sugiyama et al., we draw the training and test samples from two different Gaussian distributions (the source domain and the target domain, respectively), as in the figure below:
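A sketch of the simulation setup follows. The regression function and the exact Gaussian parameters below are illustrative assumptions in the spirit of the paper's toy example, not necessarily the exact values used there:

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    # True regression function; NumPy's sinc is sin(pi*x) / (pi*x).
    return np.sinc(x)

# Source domain: training inputs. Target domain: test inputs.
# (Gaussian parameters assumed for illustration.)
x_train = rng.normal(1.0, 0.5, 150)   # source: N(1, 0.5^2)
x_test = rng.normal(2.0, 0.25, 150)   # target: N(2, 0.25^2)

# Additive Gaussian observation noise on the targets.
y_train = f(x_train) + rng.normal(0.0, 0.1, x_train.shape)
y_test = f(x_test) + rng.normal(0.0, 0.1, x_test.shape)
```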
\clearpage
An unweighted linear regression model is fit on the training data. Unsurprisingly, it is not accurate at predicting the test data, since the linear approximation is optimized locally in the region from which the training data is sampled (the source domain).
\clearpage
A weighted linear regression model is then fit, using a logistic regression classifier to estimate the importance weights, as explained in the previous section. The resulting model approximates the function much better in the region from which the test data is sampled (the target domain).
\clearpage
In addition, we can see that if we train a model on data sampled directly from the target domain, the resulting linear approximation is very similar to the one obtained from the weighted linear regression model.
\clearpage
Sugiyama, M., Krauledat, M., and Müller, K.-R. Covariate Shift Adaptation by Importance Weighted Cross Validation. \emph{Journal of Machine Learning Research} 8 (2007): 985--1005.