I'm trying to implement a non-neural model which uses item features. I see that the [...]. All I need is to use a combination of [...]. But I'm not totally sure how to do this in RecBole. Do I subclass [...]? In particular, for [...]
Replies: 1 comment
import warnings

import numpy as np
import scipy.sparse as sp
from recbole.utils import FeatureType
from sklearn.preprocessing import MultiLabelBinarizer, OneHotEncoder


def encode_categorical_item_features(dataset, included_features):
    item_features = dataset.get_item_feature()
    mlb = MultiLabelBinarizer(sparse_output=True)
    # OneHotEncoder is sparse by default; the `sparse=` keyword was renamed
    # to `sparse_output=` in newer scikit-learn releases, so it is omitted here.
    ohe = OneHotEncoder()
    encoded_feats = []
    for feat in included_features:
        t = dataset.field2type[feat]
        feat_frame = item_features[feat].numpy()
        if t == FeatureType.TOKEN:
            # one column per distinct token id
            encoded = ohe.fit_transform(feat_frame.reshape(-1, 1))
            encoded_feats.append(encoded)
        elif t == FeatureType.TOKEN_SEQ:
            encoded = mlb.fit_transform(feat_frame)
            # drop first column which corresponds to the padding 0; real categories start at 1
            # convert to csc first?
            encoded = encoded[:, 1:]
            encoded_feats.append(encoded)
        else:
            # warn and skip unsupported field types
            warnings.warn(
                f'ADD-EASE only supports token or token_seq types. [{feat}] is of type [{t}].')
    if not encoded_feats:
        raise ValueError('No valid token or token_seq features to include.')
    # stack all feature columns, then transpose to (n_feature_columns, n_items)
    return sp.hstack(encoded_feats).T.astype(np.float32)
Figured out that dataset.get_item_feature is the key here. It returns 0-padded factorized encodings for token_seq, so you need to drop the first column, as the encoded[:, 1:] slice in the function above does.
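To see why the first column ends up being the padding column, here is a small dependency-free sketch. The toy data and the `multi_hot` helper are made up for illustration and are not part of RecBole or scikit-learn. The helper only mimics the relevant behaviour of MultiLabelBinarizer: one column per distinct id, in sorted order, so a padding id of 0 always lands in column 0.

```python
# Toy 0-padded token_seq field (hypothetical data): 4 rows, up to 3 ids each.
# Id 0 is padding; real category ids start at 1. Row 0 is an all-padding row.
padded = [
    [0, 0, 0],
    [1, 3, 0],
    [2, 0, 0],
    [1, 2, 3],
]


def multi_hot(rows):
    """Multi-hot encode rows of ids, one column per distinct id in sorted order."""
    classes = sorted({tok for row in rows for tok in row})
    col = {tok: j for j, tok in enumerate(classes)}
    encoded = [[0] * len(classes) for _ in rows]
    for i, row in enumerate(rows):
        for tok in row:
            encoded[i][col[tok]] = 1
    return classes, encoded


classes, encoded = multi_hot(padded)
# Column 0 belongs to id 0, so it only records "this row contains padding".
without_padding = [row[1:] for row in encoded]

print(classes)
for row in without_padding:
    print(row)
```

After the slice, each remaining column corresponds to a real category id, and the all-padding row becomes all zeros. This only holds when id 0 actually occurs somewhere in the field; if no row were padded, column 0 would be a real category and should not be dropped.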