'Context-aware' question #663

Answered by deklanw
deklanw asked this question in Q&A

Figured out that dataset.get_item_feature is the key here. For token_seq fields it returns zero-padded, factorized encodings, so after multi-hot encoding you need to drop the first column, which corresponds to the padding index 0. Something like:

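from scipy.sparse import hstack
from sklearn.preprocessing import MultiLabelBinarizer, OneHotEncoder
from recbole.utils import FeatureType

def encode_categorical_item_features(dataset, included_features):
    item_features = dataset.get_item_feature()

    mlb = MultiLabelBinarizer(sparse_output=True)
    # scikit-learn >= 1.2 names this sparse_output; older releases used sparse=True
    ohe = OneHotEncoder(sparse_output=True)

    encoded_feats = []

    for feat in included_features:
        t = dataset.field2type[feat]
        feat_frame = item_features[feat].numpy()

        if t == FeatureType.TOKEN:
            # One token per item: one-hot encode the single column
            encoded = ohe.fit_transform(feat_frame.reshape(-1, 1))
            encoded_feats.append(encoded)
        elif t == FeatureType.TOKEN_SEQ:
            # The original post is cut off at this branch. Following the
            # description above: multi-hot encode, then drop column 0 (padding).
            encoded = mlb.fit_transform(feat_frame)[:, 1:]
            encoded_feats.append(encoded)

    # Assumed return value (not shown in the original post): one stacked sparse matrix
    return hstack(encoded_feats, format="csr")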
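
To make the "drop the first column" step concrete, here is a minimal sketch on a toy array rather than a real RecBole dataset. The `padded` array and the helper name `drop_padding_column` are hypothetical stand-ins for what `item_features[feat].numpy()` might look like for a token_seq field. The sketch assumes tokens are integer codes starting at 1 and that 0 is used only for padding.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer


def drop_padding_column(padded):
    """Multi-hot encode zero-padded token sequences and drop the padding column.

    Hypothetical helper: assumes 0 is the padding index and that at least
    one row actually contains padding.
    """
    mlb = MultiLabelBinarizer(sparse_output=True)
    multi_hot = mlb.fit_transform(padded)

    # classes_ is sorted, so if 0 occurs at all it is the first column.
    # If no row is padded, column 0 would be a real token, so check first.
    if mlb.classes_[0] != 0:
        return multi_hot
    return multi_hot[:, 1:]


# Toy stand-in for a token_seq feature: one row per item, 0 = padding
padded = np.array([
    [1, 2, 0],
    [3, 0, 0],
    [2, 3, 1],
])

dense = drop_padding_column(padded).toarray()
```

The guard on `mlb.classes_[0]` is a design choice of this sketch, not something from the original answer: slicing off the first column unconditionally would remove a real token whenever no sequence in the field happens to be padded.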

Answer selected by deklanw