Creating new "ground truth" for several datasets #19

Closed
isarandi opened this issue Oct 20, 2019 · 25 comments

@isarandi

Hi, thanks for this amazing work.

Do you have any plans for running your method on other datasets and releasing the resulting poses? This would be very beneficial for correcting many ground-truth errors. Specifically, I'm thinking of MPI-INF-3DHP (some annotations are wrong) and HumanEva-I (in some sequences the head ground truth is wrong), in addition to the already mentioned H3.6M (problems with S9) and CMU-Panoptic (ground truth unavailable for many sequences, e.g. dance, plus errors).

I think your results could be better than the original ground truth in many cases. So, using e.g. leave-one-subject-out training and testing, one could generate new, polished "ground truth" for each subject of a particular dataset (to avoid memorizing the training-set errors).

@karfly
Owner

karfly commented Oct 21, 2019

Hey, thank you for your interest.

The next step for us is to add CMU Panoptic dataset support. Then we can think about adding other multi-view datasets.

We have some vague plans about annotating/reannotating datasets. Maybe the community can help us with it? 😊

@dulibubai

If I want to train the model with the CMU Panoptic dataset, does that mean I should modify the dataset preparation code following the Human3.6M processing pipeline? Could you add CMU Panoptic dataset support? Could you also add more details about CMU Panoptic training to your paper? Thanks a lot!

@karfly
Owner

karfly commented Oct 23, 2019

@dulibubai
We’re going to add CMU Panoptic dataset support soon, but if you need it right now, you can implement it yourself using Human3.6M as a reference.

What exact details of CMU Panoptic dataset training are you interested in?

@dulibubai

When I downloaded the CMU dataset from its official website, I found that most of the sequences have no labels (only some do), and most of them contain multiple people, with only a few showing a single person. So when you trained with it, which sequences did you choose, and how did you split the train, val, and test sets? Thanks again.

@karfly
Owner

karfly commented Oct 23, 2019

@dulibubai
We use train/val splits provided by authors of the original paper "Monocular Total Capture: Posing Face, Body, and Hands in the Wild".

Each scene contains multiple recorded persons, so for each person an interval is provided in the format [start_frame, end_frame]. Here is the list of scene names split into train/val (a small usage sketch follows the list):

train:
    - 171026_pose3
      - [1000, 3000]

    - 171026_pose2
      - [1000, 7500]
      - [8000, 14000]

    - 171026_pose1
      - [380, 7300]
      - [7900, 14500]
      - [15400, 22400]

    - 171204_pose4
      - [500, 4300]
      - [4900, 8800]
      - [9400, 13200]
      - [14200, 17800]
      - [18700, 22500]
      - [23050, 27050]
      - [28000, 31600]

    - 171204_pose3
      - [500, 4400]
      - [5400, 9000]

    - 171204_pose2
      - [350, 4300]
      - [5000, 8800]
      - [9600, 13600]
      - [14300, 18500]
      - [19600, 23500]
      - [24200, 28200]
      - [28800, 32800]
      - [33500, 37700]

    - 171204_pose1
      - [300, 4100]
      - [4800, 8900]
      - [10000, 13600]
      - [14000, 18200]
      - [18500, 22900]
      - [23500, 27600]

val:
    - 171204_pose5
      - [400, 4300]
      - [5000,  8500]
      - [9500, 13400]
      - [14200, 18000]
      - [19000, 22600]
      - [23500, 27100]

    - 171204_pose6
      - [1000, 4500]
      - [5150, 9100]
      - [9830, 13800]
      - [14370, 18300]
      - [19000, 22900]

@dulibubai

Thanks sincerely for sharing! As shown in the CMU dataset, every sequence has 31 cameras. How did you split the images from the 31 cameras into train and val sets? Thanks again.

@karfly
Owner

karfly commented Oct 23, 2019

@dulibubai
We used val cameras: ["00_02", "00_13", "00_16", "00_18"]

@dulibubai

Yeah! Thanks a lot!

@dulibubai

Hi! I have another question: would it be convenient to provide the 2D bbox label files (extracted by an object detection net) for every camera image of CMU?

@karfly
Owner

karfly commented Oct 24, 2019

@dulibubai

I've uploaded our Mask R-CNN detections to the Google Drive.
The format of the detection is the same as in the Human3.6M dataset:

detection == (left, upper, right, lower, confidence)
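
If it helps, here is a hedged example of consuming that tuple format (the helper name and the confidence threshold are only illustrative, not part of the released files):

def best_detection(detections, min_confidence=0.5):
    # Keep boxes above the threshold and return the most confident one
    # as (left, upper, right, lower), or None if nothing passes.
    kept = [d for d in detections if d[4] >= min_confidence]
    if not kept:
        return None
    left, upper, right, lower, _ = max(kept, key=lambda d: d[4])
    return left, upper, right, lower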

@dulibubai

Thanks a lot!

@dulibubai

@karfly ,
1) In generate-labels-npy-multiview.py, what is the effect of the square_the_bbox(bbox) function?
def square_the_bbox(bbox):
    # bbox is given as (top, left, bottom, right); grow the shorter side
    # symmetrically around its center so the box becomes square.
    top, left, bottom, right = bbox
    width = right - left
    height = bottom - top
    if height < width:
        center = (top + bottom) * 0.5
        top = int(round(center - width * 0.5))
        bottom = top + width
    else:
        center = (left + right) * 0.5
        left = int(round(center - height * 0.5))
        right = left + height
    return top, left, bottom, right
2) In human36m.py, why do you convert the bbox information from TLBR to LTRB?
bbox = shot['bbox_by_camera_tlbr'][camera_idx][[1,0,3,2]] # TLBR to LTRB
Thanks a lot!

@shrubb
Collaborator

shrubb commented Oct 27, 2019

@dulibubai

  1. Object detectors output rectangular bounding boxes with arbitrary aspect ratios (this can be true even for ground-truth bounding boxes). However, since we are training a CNN, we'd like all input images to be of the same size and, obviously, the same aspect ratio. Therefore, we decided to adjust all bounding boxes to a 1:1 height-width ratio, i.e. make them square (we could have chosen some other ratio). This function does this for one box by growing its smaller side; see the short example after this list.

  2. That is nothing important, it is there just for convenience. I think when we were writing human36m.py, some functions would already require LTRB bboxes (like crop_image(), scale_bbox()), so we had to adapt to them.
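
To make (1) concrete, here is a quick illustrative check of square_the_bbox (the input values are chosen arbitrarily):

bbox = (0, 0, 40, 100)        # (top, left, bottom, right): width 100, height 40
print(square_the_bbox(bbox))  # -> (-30, 0, 70, 100), i.e. a 100x100 square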

@dulibubai

@shrubb , thanks a lot!
I have another question for you about generate-labels-npy-multiview.py:
1) Why do you transpose R? Was R in Human3.6M stored transposed?
2) Why don't you store 'T' directly in camera_retval['t']? Is the 'T' in Human3.6M's camera parameters file not the true 't'?
camera_retval['R'] = np.array(camera_params['R']).T
camera_retval['t'] = -camera_retval['R'] @ camera_params['T']
3) When I use my own external dataset, do I not need to transpose R like this?
If I can obtain t directly, can I store it in camera_retval['t'] as-is?

Thanks very much!

@shrubb
Collaborator

shrubb commented Oct 28, 2019

@dulibubai
In our code, the projection math (for all datasets) is handled by mvn/utils/multiview.py. There, we adopted an OpenCV-like convention for the camera-model formulae (I might be wrong here). Human3.6M's intrinsics and extrinsics came in a different format: they used different projection formulae and a different distortion model. So the code you quoted simply converts the R and T shipped with Human3.6M to our format.
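
As a sketch only (my reading of the two quoted lines, not official Human3.6M documentation), the conversion amounts to producing an OpenCV-style world-to-camera pair so that X_cam = R @ X_world + t:

import numpy as np

def h36m_to_opencv_extrinsics(R_h36m, T_h36m):
    # Assumption: Human3.6M ships the rotation transposed relative to our
    # convention and T as the camera position in world coordinates.
    R = np.asarray(R_h36m).T          # world-to-camera rotation
    t = -R @ np.asarray(T_h36m)       # t = -R * camera_center
    return R, t

# For a dataset that already provides OpenCV-style extrinsics
# (X_cam = R @ X_world + t), R and t can be stored as-is: no transpose,
# no sign flip.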

@dulibubai

@shrubb , when training the model with the CMU dataset, how should the following parameters be set?
n_objects_per_epoch:
n_epochs:
Also, how much GPU memory did you use? A single GPU or multiple GPUs?
Thanks again!

@karfly
Owner

karfly commented Oct 29, 2019

@dulibubai

We used the same parameters as for Human3.6M. The paper experiments were done with a single GPU, but you can use multiple GPUs to reduce training time.

@dulibubai

@karfly
Hi! When you train the volumetric model, do you use the ground-truth pelvis position or the one predicted by the Algebraic model? This is unclear from the volumetric results in your paper.
Thanks!

@karfly
Owner

karfly commented Nov 5, 2019

@dulibubai
Hey! We use the predictions of the Algebraic method.

@dulibubai

@karfly
Hi! If you train the Algebraic model on Human3.6M, and then train the Volumetric model on Human3.6M using the pelvis position predicted by that same Algebraic model (also trained on Human3.6M), that does not seem reasonable to me, because the pelvis predictions come from a model that was already trained on Human3.6M.

@karfly
Owner

karfly commented Nov 5, 2019

@dulibubai
Why? In such a scenario it's absolutely fair, and there is no data leak into the validation data, so I think it's reasonable.

@dulibubai

@karfly
Okay! I get it. Thanks.

@Samleo8

Samleo8 commented May 29, 2020

@dulibubai
We used val cameras: ["00_02", "00_13", "00_16", "00_18"]

Hi @karfly, what about the cameras for training? Did you just use all the other cameras? Also, in my own attempts to test/train (#75 #77), I found that the projection matrix data for cameras 25 and 29 were off.

@karfly
Owner

karfly commented May 29, 2020

Hi, @Samleo8!
Yes, we just used all the other cameras. I don't remember whether some of them were missing a projection matrix, but I think it's okay to remove such cameras from training; it shouldn't influence the result too much.
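
For illustration only (not our actual code; the "00_00" through "00_30" HD camera naming and the constant names are assumptions), the train camera list can be built by exclusion:

VAL_CAMERAS = ["00_02", "00_13", "00_16", "00_18"]
ALL_HD_CAMERAS = ["00_%02d" % i for i in range(31)]

# Train on every camera that is not in the validation set, optionally also
# dropping cameras whose projection matrices look wrong (e.g. 25 and 29,
# as reported above).
TRAIN_CAMERAS = [cam for cam in ALL_HD_CAMERAS
                 if cam not in VAL_CAMERAS and cam not in ["00_25", "00_29"]]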

@fxyQAQ

fxyQAQ commented Jun 25, 2023

Hello, could you upload that Google Drive file again?
