Monocular Localization against map created with RGB-D with very poor performance #1395

Open
jsreveal opened this issue Dec 3, 2024 · 5 comments

Comments

@jsreveal

jsreveal commented Dec 3, 2024

Hi guys,

I have successfully created a map with an Intel RealSense D455. I am now trying to localize in that map using a different camera, a monocular RGB camera, but most of the frames are not correctly localized even though I am looking at the same places as when I created the map.

Can this use case be expected to work well at all?

What settings could be changed to try to improve localization?

@matlabbe
Member

matlabbe commented Dec 4, 2024

I think the big difference is the "different camera". Is its field of view similar to the D455's? Is the resolution similar?

You can compare by feeding the same RGB topic from the D455 that was used for mapping (i.e., simulating a monocular camera without the depth).

If FOV and resolution are very different, you may want to use SIFT features for localization (Kp/DetectorStrategy=1).
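For example, with rtabmap_ros on ROS 2 it could look like this (a minimal sketch, assuming the `rtabmap_slam` package name of recent releases; the topic remappings are placeholders for your own camera, and the odometry setup is omitted):

```python
# Minimal ROS 2 launch sketch: run rtabmap in localization mode with SIFT.
# Package/executable names assume a recent rtabmap_ros release; topic
# remappings below are placeholders for your own camera driver.
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        Node(
            package='rtabmap_slam',
            executable='rtabmap',
            parameters=[{
                # RTAB-Map library parameters are passed as strings.
                'Kp/DetectorStrategy': '1',        # 1 = SIFT for the loop closure detector
                'Vis/FeatureType': '1',            # 1 = SIFT for visual registration too
                'Mem/IncrementalMemory': 'false',  # localization-only mode
            }],
            remappings=[
                ('rgb/image', '/camera/color/image_raw'),         # placeholder topic
                ('rgb/camera_info', '/camera/color/camera_info'), # placeholder topic
            ],
        ),
    ])
```

With the standalone rtabmap application, the same parameters can be passed on the command line (e.g., `rtabmap --Kp/DetectorStrategy 1`).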

@borongyuan
Contributor

I have tested building the map with an OAK-D W in the first session and localizing with an OAK-D S2 in the second session, and localization is triggered normally. So the camera FOV actually has little impact. You may need to investigate further.

@jsreveal
Author

I have matched the RGB-D mapping resolution to the monocular camera resolution and also used SIFT features. The results do not seem to have improved much.

On top of those changes, I increased the number of detected features and the number of loop closures, and the results improved.

Still, it feels like it should be matching in many more places. What could be done to improve the results, in particular SIFT's ability to detect similar features?

Just for context, I am mapping/localizing supermarket corridors.

@jsreveal
Author

Also, if I use the same mapping and localization footage and the same config but only map one corridor, localization against that single-corridor map will match more loop closures than localization in the same corridor against a map of the whole supermarket containing many more corridors.

@matlabbe
Member

Is the point of view similar between the two cameras? Large point-of-view differences may also affect how the visual features are seen and may result in different descriptors, or even different features being detected. Also, if the point of view during localization is quite different from the one during the mapping session, fewer loop closures will be detected.

SIFT would generally give better loop closure hypotheses, in particular in large environments.

> Also, if I use the same mapping and localization footage and the same config but only map one corridor, localization against that single-corridor map will match more loop closures than localization in the same corridor against a map of the whole supermarket containing many more corridors.

It is difficult to assess this without seeing the data. But normally, even if the global localization hypotheses are smaller because the map is bigger (i.e., global localization triggers less often), once you get at least one global localization in that corridor, proximity detection should fill the gap and detect more loop closures in the same area afterwards.
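For reference, these are the parameters involved (a sketch; the values shown are the library defaults as far as I remember, so check `rtabmap --params` for your version before tuning):

```python
# RTAB-Map parameters related to global localization and proximity
# detection. Values are the defaults as far as I remember; verify with
# `rtabmap --params` for your version.
localization_tuning = {
    'Rtabmap/LoopThr': '0.11',        # loop closure hypothesis threshold;
                                      # lowering it triggers global
                                      # localization more often (at the risk
                                      # of false positives)
    'RGBD/ProximityBySpace': 'true',  # once localized, detect loop closures
                                      # with nearby nodes in the map
    'Kp/MaxFeatures': '500',          # max features per image for the
                                      # bag-of-words loop detector
    'Vis/MinInliers': '20',           # min visual inliers to accept a
                                      # loop/proximity transform
}
```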

One last thing: does the lighting vary in the environment? That could negatively affect the recall rate.
