
Shape mismatch when trying to use different strides in decoder #498

Open

amirbakhtiary23 opened this issue Nov 25, 2024 · 4 comments

@amirbakhtiary23
Hi, I tried to train RT-DETR at a higher resolution (1280x1280) using the default settings, and the results were disappointing.
According to #187, you mentioned we can look deeper into the features by configuring num_layers in the decoder.
I did that and training is fine so far, but there is a problem. When I try to change feat_strides by appending a 64 ([8,16,32,64]), I get a RuntimeError during inference (evaluation): The size of tensor a (34000) must match the size of tensor b (35200) at non-singleton dimension 1, which points to:

```
/src/zoo/rtdetr/rtdetr_decoder.py", line 496, in _get_decoder_input
    memory = valid_mask.to(memory.dtype) * memory  # TODO fix type error for onnx export
```

But when I set feat_strides to [8,16,32,32], it works fine. How is that? Shouldn't the shapes be consistent?
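For reference, the two sizes in the error line up with the flattened multi-scale token count, sum over levels of (size // stride) ** 2, at my 1280x1280 eval size. This level layout is my own reading of the numbers, not something I verified in the code:

```python
# Sanity check on the two token counts in the RuntimeError, assuming 1280x1280 input.
def num_tokens(size, strides):
    # Total flattened tokens across feature levels: sum of (size // s) ** 2.
    return sum((size // s) ** 2 for s in strides)

print(num_tokens(1280, [8, 16, 32, 64]))  # 34000 -> what one side of the multiply expects
print(num_tokens(1280, [8, 16, 32, 32]))  # 35200 -> what the other side actually has
```

So it looks like the extra feature level is still at the stride-32 (40x40) resolution, while the [8,16,32,64] setting expects a 20x20 level, which would also explain why [8,16,32,32] happens to line up.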

@lyuwenyu
Owner

lyuwenyu commented Nov 25, 2024

feat_strides should not be manually extended for extra features; it depends on the features coming from the neck.

You can keep [8,16,32] for the neck features and just modify num_layers; it will add the extra strides implicitly.


see related code:
https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetrv2_pytorch/src/zoo/rtdetr/rtdetrv2_decoder.py#L318-L319
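Roughly, what those lines do is the following (a paraphrase for illustration, not the verbatim source; I use num_levels here as the knob for the number of feature levels):

```python
# Keep feat_strides at the neck's native strides and only raise the number
# of decoder feature levels; the extra strides are derived by doubling.
feat_strides = [8, 16, 32]  # strides of the neck features, left untouched
num_levels = 4              # request one extra feature level in the decoder

for _ in range(num_levels - len(feat_strides)):
    feat_strides.append(feat_strides[-1] * 2)

print(feat_strides)  # [8, 16, 32, 64]
```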

@amirbakhtiary23
Author

Thanks, I had seen that before and that's how I figured I might be able to change the extra stride.
On another matter, do you have a full diagram of your model with every component clearly specified (with their module names in the code, of course)? I need that because I want to make a few adjustments to your model to test my hypothesis.

@lyuwenyu
Owner

lyuwenyu commented Nov 28, 2024

Sorry, there are no relevant documents available, but feel free to ask questions whenever you have any doubts.

@amirbakhtiary23
Author

amirbakhtiary23 commented Dec 7, 2024

> Sorry, there are no relevant documents available, but feel free to ask questions whenever you have any doubts.

Thanks a lot. I have a question about the object queries. If I'm not mistaken, you mentioned the queries are optimized during the training process, and by looking at the code I figured this optimization is indirect and based on the image features, since there is no computation involving the queries in the matcher or the loss function.
Is this assumption correct?
Also, what is the range of the numbers in the queries? memory (the image features), reference_points (computed from the memory and labels, I guess), and target (the object queries, if I'm not wrong) are passed to the decoder. I want to replace target (the queries) with some additional info about each scene. Should I normalize this data? How are the original queries normalized?
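Concretely, what I have in mind is something like this sketch (SceneProjector and scene_feats are my own placeholder names, not from the repo; I'm assuming target has the usual [batch, num_queries, hidden_dim] layout):

```python
import torch
import torch.nn as nn

class SceneProjector(nn.Module):
    """Hypothetical module: project raw per-scene attributes into the
    decoder's embedding space so they can replace the object queries."""

    def __init__(self, scene_dim: int, hidden_dim: int = 256, num_queries: int = 300):
        super().__init__()
        self.num_queries = num_queries
        self.proj = nn.Sequential(
            nn.Linear(scene_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),  # keep the scale comparable to learned embeddings
        )

    def forward(self, scene_feats: torch.Tensor) -> torch.Tensor:
        # scene_feats: [batch, scene_dim] -> target: [batch, num_queries, hidden_dim]
        q = self.proj(scene_feats)
        return q.unsqueeze(1).repeat(1, self.num_queries, 1)
```

Would a LayerNorm like this be a reasonable way to match the scale of the original queries, or do they live in a specific range?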
