
Shape mismatch when trying to use different strides in decoder #498

Open

amirbakhtiary23 opened this issue Nov 25, 2024 · 4 comments

@amirbakhtiary23
Hi, I tried to train RT-DETR at a higher resolution (1280x1280) using the default settings, and the results were disappointing.
According to #187, you mentioned we can look deeper into the features by configuring num_layers in the decoder.
I did that and training is fine so far, but there is a problem. When I try to change feat_strides by appending a 64 ([8,16,32,64]), I get a RuntimeError during inference (evaluation): The size of tensor a (34000) must match the size of tensor b (35200) at non-singleton dimension 1, which points to:

```
/src/zoo/rtdetr/rtdetr_decoder.py", line 496, in _get_decoder_input
    memory = valid_mask.to(memory.dtype) * memory  # TODO fix type error for onnx export
```

But when I set feat_strides to [8,16,32,32], it works fine. How is that? Shouldn't the shapes be consistent?
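For reference, the two sizes in the error line up with the flattened multi-scale token count, sum over levels of (size // stride) ** 2, at my 1280x1280 eval size. This level layout is my own reading of the numbers, not something I verified in the code:

```python
# Sanity check on the two token counts in the RuntimeError, assuming 1280x1280 input.
def num_tokens(size, strides):
    # Total flattened tokens across feature levels: sum of (size // s) ** 2.
    return sum((size // s) ** 2 for s in strides)

print(num_tokens(1280, [8, 16, 32, 64]))  # 34000 -> what one side of the multiply expects
print(num_tokens(1280, [8, 16, 32, 32]))  # 35200 -> what the other side actually has
```

So it looks like the extra feature level is still at the stride-32 (40x40) resolution, while the [8,16,32,64] setting expects a 20x20 level, which would also explain why [8,16,32,32] happens to line up.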

@lyuwenyu
Owner

lyuwenyu commented Nov 25, 2024

feat_strides should not be manually extended for extra features; it depends on the features coming from the neck.

You can keep [8,16,32] for the neck features and just modify num_layers; it will add the extra strides implicitly.


see related code:
https://github.com/lyuwenyu/RT-DETR/blob/main/rtdetrv2_pytorch/src/zoo/rtdetr/rtdetrv2_decoder.py#L318-L319
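Roughly, what those lines do is the following (a paraphrase for illustration, not the verbatim source; I use num_levels here as the knob for the number of feature levels):

```python
# Keep feat_strides at the neck's native strides and only raise the number
# of decoder feature levels; the extra strides are derived by doubling.
feat_strides = [8, 16, 32]  # strides of the neck features, left untouched
num_levels = 4              # request one extra feature level in the decoder

for _ in range(num_levels - len(feat_strides)):
    feat_strides.append(feat_strides[-1] * 2)

print(feat_strides)  # [8, 16, 32, 64]
```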

@amirbakhtiary23
Author

Thanks, I had seen that before and that's how I figured I might be able to change the extra stride.
On another matter, do you have a full diagram of your model with every component clearly specified (with their module names in the code, of course)? I need that because I want to make a few adjustments to your model to test my hypothesis.

@lyuwenyu
Owner

lyuwenyu commented Nov 28, 2024

Sorry, there are no relevant documents available, but feel free to ask questions whenever you have any doubts.

@amirbakhtiary23
Author

amirbakhtiary23 commented Dec 7, 2024

> Sorry, there are no relevant documents available, but feel free to ask questions whenever you have any doubts.

Thanks a lot. I have a question about the object queries. If I'm not mistaken, you mentioned the queries are optimized during the training process, and by looking at the code I figured this optimization is indirect and based on the image features, since there is no computation involving the queries in the matcher or the loss function.
Is this assumption correct?
Also, what is the range of the numbers in the queries? memory (the image features), reference_points (computed from the memory and labels, I guess), and target (the object queries, if I'm not wrong) are passed to the decoder. I want to replace target (the queries) with some additional info about each scene. Should I normalize this data? How are the original queries normalized?
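Concretely, what I have in mind is something like this sketch (SceneProjector and scene_feats are my own placeholder names, not from the repo; I'm assuming target has the usual [batch, num_queries, hidden_dim] layout):

```python
import torch
import torch.nn as nn

class SceneProjector(nn.Module):
    """Hypothetical module: project raw per-scene attributes into the
    decoder's embedding space so they can replace the object queries."""

    def __init__(self, scene_dim: int, hidden_dim: int = 256, num_queries: int = 300):
        super().__init__()
        self.num_queries = num_queries
        self.proj = nn.Sequential(
            nn.Linear(scene_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),  # keep the scale comparable to learned embeddings
        )

    def forward(self, scene_feats: torch.Tensor) -> torch.Tensor:
        # scene_feats: [batch, scene_dim] -> target: [batch, num_queries, hidden_dim]
        q = self.proj(scene_feats)
        return q.unsqueeze(1).repeat(1, self.num_queries, 1)
```

Would a LayerNorm like this be a reasonable way to match the scale of the original queries, or do they live in a specific range?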
