Issues, notes and documentation while testing on the CMU dataset, using volumetric model #75
It seems like there is something wrong with the world coordinates. The model usually learns that the legs are close to the ground if no other information is available. Looking at the pictures, that does not seem to be the case here, so there is probably a bug in the coordinate conversion between CMU and Human3.6M.
Hi, thanks for the reply! What do you mean by coordinate conversion? I believe the world coordinates were properly converted when I set the scaling factor and changed the world axes, according to the pointers in #24. My current hypothesis is that the model is unable to guess joints which are "out of the picture" (leg joints that are missing), so the heatmaps for those particular joints are either non-existent, or the model guesses that the person is kneeling or sitting instead.
@Samleo8 I would double-check that everything is the same. As far as I remember, the z-axis has a different sign in CMU and Human3.6M, and at some point we had a bug in this part and saw somewhat similar behavior.
Hi, thanks again for the reply! You are right about the z-axis having a different sign, and indeed I saw this in
You are actually right: in most cases the model chose the feet to be closer to the floor (see below). The example I gave was a bad one, as it was an "anomaly" compared to the rest.
To confirm the hypothesis, I will try it out on cameras which are able to capture the full body (i.e. no truncation). I'll let you know how it goes!
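For reference, here is a minimal sketch (in Python) of the kind of world-coordinate conversion being discussed, since it is easy to get the scale or axis sign wrong. The cm-to-mm factor and the choice of which axis to flip are assumptions based on the pointers in #24, not confirmed values; check them against your own data.

```python
import numpy as np

# Assumed: CMU Panoptic world coordinates are in cm, while the Human3.6M-style
# pipeline expects mm. Also assumed: one axis (here taken to be z) has an
# opposite sign between the two conventions, as mentioned above.
CM_TO_MM = 10.0

def convert_cmu_world_to_h36m(points_cmu):
    """points_cmu: (N, 3) array of CMU world coordinates; returns a converted copy."""
    points = np.asarray(points_cmu, dtype=np.float64) * CM_TO_MM
    points[:, 2] *= -1.0  # flip the assumed vertical axis sign
    return points

# Sanity check after conversion: foot joints should end up closest to the ground
# plane, matching the "legs near the floor" prior mentioned above.
```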
Apparently, the model is robust against different angles. It seems that the issue is due to some of the cameras being faulty.
With all cameras capturing the full pose, preliminary results seem to suggest that the model works well on the CMU Dataset as well! The hypothesis about the lack of full-body pose seems to be correct. It would be good to train the model so that it knows what to do with occluded body parts.
@Samleo8 I am a bit confused. Are you using the algebraic or the volumetric model?
Oh, sorry I didn't make it clearer; I've since updated the title. I'm using the volumetric model, but didn't use the algebraic model to first predict the pelvis positions. Because of this, the
@Samleo8 I see. I wonder, how do you get the 2D heatmap distributions?
Thanks for pointing this out, I didn't think much of it before! Correct me if I am wrong, but the 2D heatmaps seem to come from the 2D backbone that is part of the volumetric model? The checkpoints for this backbone (human36m) were given as pretrained weights. ~~Am I therefore right to say that in order to properly evaluate on (and train with) the CMU dataset, I need to first run it on the algebraic model to produce a 2D backbone with weights targeted towards the joints that CMU wants?~~ If you are wondering how I visualized the heatmaps, they were part of the
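(For anyone else trying to inspect those heatmaps: below is a small, generic visualization sketch. It assumes you have already pulled the per-joint heatmaps out as an `(n_joints, H, W)` array or tensor; the function and argument names are illustrative and not part of this repository's API.)

```python
import matplotlib.pyplot as plt

def show_heatmaps(heatmaps, joint_names=None, cols=4):
    """Plot 2D joint heatmaps given as an (n_joints, H, W) numpy array or torch tensor."""
    if hasattr(heatmaps, "detach"):  # accept a torch tensor as well
        heatmaps = heatmaps.detach().cpu().numpy()
    n_joints = heatmaps.shape[0]
    rows = (n_joints + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(3 * cols, 3 * rows), squeeze=False)
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < n_joints:
            ax.imshow(heatmaps[i], cmap="hot")
            if joint_names is not None:
                ax.set_title(joint_names[i], fontsize=8)
    plt.tight_layout()
    plt.show()
```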
Hi, @Samleo8!
Looking at the images above, I think there can be 3 possible problems:
Hi @karfly, thanks for the reply. Is this in response to the comment above (#75 (comment)), or to the problem with the partially occluded body in #76?
For now, the model is being trained on the CMU dataset (though see the possible issue in #77) and seems to be doing well, if the Tensorboard images are anything to go by; we'll see how that goes!
Following the instructions in issues #24 and #19, I was able to successfully test on the CMU Panoptic Dataset using the provided pretrained Human36M weights (more specifics here) with the volumetric model; a snapshot of some of the results is below:
Issues
However, despite following all 4 pointers in #24, I still have problems with some of the keypoint detections (in particular, the predictions for the lower body are completely off).
Is it possible that the pretrained (H36M) model is unable to handle cases where the lower body is truncated, and that this is what causes the wrong predictions above?
Notes/Documentation
To those who would like to recreate the results and evaluate on the CMU dataset, note there are many changes that need to be made. I list the important ones below; check my forked repository for the rest.
1. Create a `CMUPanopticDataset` class, similar to the `Human36MMultiviewDataset` class in `mvn/datasets/human36m.py` (a rough sketch is included below). You will also need the ground truth BBOXes linked in issue #19 (Creating new "ground truth" for several datasets), and generate your own labels file. If you are lazy, follow my pre-processing instructions here, but note that there may be missing documentation here and there.
2. Set `use_gt_pelvis` to true in the yaml config file.

For those who are interested, I have updated the documentation in my repository at https://github.com/Samleo8/learnable-triangulation-pytorch.
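To make the first point more concrete, here is a rough, hypothetical skeleton of such a dataset class. Every name and field below is illustrative only; the actual interface should mirror whatever `Human36MMultiviewDataset` in `mvn/datasets/human36m.py` returns (images, per-view calibration, bounding boxes, 3D keypoints, and so on).

```python
import numpy as np
from torch.utils.data import Dataset

class CMUPanopticDataset(Dataset):
    """Sketch of a multiview dataset for CMU Panoptic (illustrative, not the repo API)."""

    def __init__(self, root, labels_path, image_shape=(384, 384)):
        self.root = root
        self.image_shape = image_shape
        # Assumed: a pre-generated labels file (see the pre-processing notes above)
        # containing per-frame camera parameters, GT bounding boxes (#19) and 3D keypoints.
        self.labels = np.load(labels_path, allow_pickle=True).item()

    def __len__(self):
        return len(self.labels["frames"])

    def __getitem__(self, idx):
        frame = self.labels["frames"][idx]
        sample = {
            "images": [],                           # one cropped image per camera view
            "cameras": [],                          # per-view K, R, t, converted to the
                                                    # Human3.6M scale/axes discussed above
            "detections": [],                       # per-view bounding boxes
            "keypoints_3d": frame["keypoints_3d"],  # GT pelvis comes from here when
                                                    # use_gt_pelvis is true
        }
        # ... load, undistort and crop each view here ...
        return sample
```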