-
The key reason is that user-item interactions from the validation set are not ranked during testing. For example, suppose that for one user the model ranks the candidate items as follows:

a1, b1, a2, b2, a3, b3, a4, b4, a5, b5

where a1–a5 are the user's validation ground-truth items and b1–b5 are the test ground-truth items. Because validation ground-truth items are excluded from the candidate list when calculating test results, the top-5 lists for validation and test differ. Validation: a1, b1, a2, b2, a3 (the ground-truth items a1, a2, a3 sit at ranks 1, 3, and 5). Test: b1, b2, b3, b4, b5 (with the a-items removed, the test ground-truth items move up and fill all five slots). So the test metrics tend to look better than the validation metrics.
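A minimal sketch of this effect (the item names and the `recall_at_k` helper are illustrative, not from any specific library): masking validation ground-truth items out of the ranking before scoring the test set pushes the test items up, so the same model scores higher on test than on validation.

```python
# Full ranking produced by a hypothetical model for one user.
full_ranking = ["a1", "b1", "a2", "b2", "a3", "b3", "a4", "b4", "a5", "b5"]
val_truth = {"a1", "a2", "a3", "a4", "a5"}   # validation ground truth
test_truth = {"b1", "b2", "b3", "b4", "b5"}  # test ground truth

def recall_at_k(ranking, truth, k=5, exclude=frozenset()):
    """Recall@k over a ranking, optionally masking out excluded items first."""
    visible = [item for item in ranking if item not in exclude]
    hits = sum(1 for item in visible[:k] if item in truth)
    return hits / len(truth)

# Validation: score against the full ranking (top-5 = a1, b1, a2, b2, a3).
print(recall_at_k(full_ranking, val_truth))                       # 0.6

# Test: validation ground truth is excluded, so top-5 = b1..b5.
print(recall_at_k(full_ranking, test_truth, exclude=val_truth))   # 1.0
```

The model is identical in both calls; only the masking of validation items changes, which is enough to make the test recall look systematically better.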
-
I see this explanation on the docs site, but I don't understand it. Since we're using the validation set to decide early stopping, shouldn't the validation results be, on average, slightly better than the test results?
In my observations I don't think I've ever seen the validation results come out better; the test results appear to be systematically better.