Different vocab size between data.vocab and embs_ann? #21

Open · yingShen-ys opened this issue Jul 18, 2022 · 2 comments

@yingShen-ys

Hi, I noticed that the data.vocab stored with the baseline model has a different vocabulary size compared to the language embedding stored in the pretrained model.

For the baseline model "et_plus_h", the data.vocab file has Vocab(2554) for words, but if I load the pretrained model from baseline_models/et_plus_h/latest.pth, the embedding layer model.embs_ann.lmdb_simbot_edh_vocab_none.weight has torch.Size([2788, 768]).

Did I miss something?
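
For reference, here is roughly how I read the shape out of the checkpoint (a minimal sketch; the exact checkpoint layout is my assumption, so the "model" key handling may need adjusting):

```python
import torch

# Load the pretrained checkpoint on CPU. Depending on how it was saved,
# the parameter tensors may sit at the top level or be nested under a "model" key.
ckpt = torch.load("baseline_models/et_plus_h/latest.pth", map_location="cpu")
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Print every annotation-embedding weight and its shape.
for name, value in state_dict.items():
    if "embs_ann" in name and torch.is_tensor(value):
        print(name, tuple(value.shape))  # shows (2788, 768) for et_plus_h
```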

@aishwaryap
Contributor

Hi @yingShen-ys, any idea whether such a discrepancy exists in the following cases:

I haven't previously examined the saved models in enough detail to notice a discrepancy like this, so I'm not sure offhand whether your intuition that they should be the same is correct, although it is plausible. I'll take a deeper look at the ET code and get back to you on this.

@yingShen-ys
Author

> Hi @yingShen-ys, any idea whether such a discrepancy exists in the following cases:
>
> I haven't previously examined the saved models in enough detail to notice a discrepancy like this, so I'm not sure offhand whether your intuition that they should be the same is correct, although it is plausible. I'll take a deeper look at the ET code and get back to you on this.

I am not training a new model but rather using the pretrained model from baseline_models downloaded using this repo.

The intuition is that model.embs_ann.lmdb_simbot_edh_vocab_none.weight is the weight of the word embedding layer and data.vocab stores the word vocabulary. So Vocab(2554) should be the word vocabulary size according to data.vocab, but if we check the word embedding layer in the pretrained model, it seems the pretrained model accepts a larger vocabulary size of 2788 rather than 2554.

I think the pretrained model should have a corresponding data.vocab of size Vocab(2788) rather than 2554?
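
To make the comparison concrete, this is the kind of check I mean (again just a sketch: loading data.vocab needs this repo's environment so the pickled Vocab class can be resolved, and the paths and the "word" key are my assumptions):

```python
import torch

# Word vocabulary shipped with the baseline model (reported as Vocab(2554)).
vocab = torch.load("baseline_models/et_plus_h/data.vocab")
n_words = len(vocab["word"])

# Number of rows in the word embedding of the pretrained checkpoint
# (reported as 2788).
ckpt = torch.load("baseline_models/et_plus_h/latest.pth", map_location="cpu")
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
emb_rows = next(
    value.shape[0]
    for name, value in state_dict.items()
    if "embs_ann" in name and torch.is_tensor(value)
)

# I would expect these two numbers to match, but they come out as 2554 vs 2788.
print(n_words, emb_rows)
```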
