JRDB-Pose 2022

Advancing the state-of-the-art for human pose estimation and tracking in-the-wild.

JRDB-Pose contains new manually-labeled annotations for body pose and head box across our entire train and test video set. These annotations include 600,000 human body pose annotations and 600,000 head bounding box annotations, making JRDB-Pose one of the largest publicly available datasets of ground-truth human body pose annotations. Crucially, the annotations come from in-the-wild videos and include heavily occluded poses, making JRDB-Pose both challenging and representative of real-world environments.


Human Pose

JRDB-Pose uses a 17-keypoint set for pose annotations, where each keypoint is annotated with its position and a visibility score.

Keypoint Locations

    1. head
    2. right eye
    3. left eye
    4. right shoulder
    5. neck
    6. left shoulder
    7. right elbow
    8. left elbow
    9. center hip
    10. right hand
    11. right hip
    12. left hip
    13. left hand
    14. right knee
    15. left knee
    16. right foot
    17. left foot

Annotated joint locations of JRDB-Pose
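
For convenience, the same ordering written as a Python list (a convenience sketch; the official toolkit may use different identifiers):

KEYPOINT_NAMES = [
    "head", "right eye", "left eye", "right shoulder", "neck",
    "left shoulder", "right elbow", "left elbow", "center hip",
    "right hand", "right hip", "left hip", "left hand",
    "right knee", "left knee", "right foot", "left foot",
]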


Visibility Score

Each annotated keypoint has a visibility score in {0, 1, 2}:
    0 (Invisible): The joint is out of frame or especially difficult to annotate.
    1 (Occluded): The joint is somewhat occluded (by another body part or an object), but its location is reasonably easy to infer.
    2 (Visible): The joint is fully visible and in view of the camera.

Annotation Format

For each scene, we provide a .json file containing the COCO-style annotations dictionary for that scene. The body pose data is stored in the annotations list; annotations with body pose information use category_id=2 and look like the following:

"annotations": [
    {
        "id": 9403
        "image_id": 822,
        "track_id": 37,
        "area": 5463.6864,
        "num_keypoints": 17,
        "keypoints": [229, 256, 2, ..., 223, 369, 0],
        "bbox": [204.01, 235.08, 60.84, 177.36],
        "category_id": 2,
    }
]
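
As a quick illustration, the keypoints list stores the 17 keypoints as flat (x, y, visibility) triplets in the order given above. Below is a minimal Python sketch for loading a scene file and reshaping these triplets; the file name is a placeholder, and the snippet is not part of the official toolkit.

import json

# Load the COCO-style annotations for one scene (path is hypothetical).
with open("scene.json") as f:
    data = json.load(f)

# Keep only the body pose annotations (category_id == 2).
poses = [a for a in data["annotations"] if a["category_id"] == 2]

for ann in poses:
    kps = ann["keypoints"]
    # Reshape the flat list into 17 (x, y, visibility) triplets.
    triplets = [tuple(kps[i:i + 3]) for i in range(0, len(kps), 3)]
    # Visibility: 0 = invisible, 1 = occluded, 2 = visible.
    fully_visible = [(x, y) for x, y, v in triplets if v == 2]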

Annotation Details

These annotations were collected over a period of several months. Annotators were asked to label the locations of all 17 keypoints shown above. If a keypoint is occluded but its location can still reasonably be inferred, it is annotated anyway, meaning that our dataset contains many severely occluded keypoints. We generally annotate all people in a scene whose bounding box area is at least 6500 pixels; below this, we find that keypoint locations are too close together to be useful. While JRDB videos are 15fps, we annotate at 7.5fps (every other frame) and linearly interpolate the intermediate frames, providing further high-accuracy annotations with little additional work. These annotations, including the interpolated frames, are provided as JRDB-Pose.
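
The linear interpolation operates on the keypoint coordinates of the two neighboring annotated frames. A minimal sketch of how such a step could look, assuming flat (x, y, visibility) keypoint lists as in the annotation format above (illustrative only; the released files already contain the interpolated frames):

def interpolate_midframe(kps_prev, kps_next):
    """Linearly interpolate keypoints for the frame between two annotated ones.

    kps_prev, kps_next: flat [x, y, v, ...] lists of 17 keypoint triplets.
    Illustrative sketch only, not the official annotation pipeline.
    """
    mid = []
    for i in range(0, len(kps_prev), 3):
        x = (kps_prev[i] + kps_next[i]) / 2.0
        y = (kps_prev[i + 1] + kps_next[i + 1]) / 2.0
        # Assumption: take the lower visibility of the two frames as a
        # conservative estimate; the actual labeling rule may differ.
        v = min(kps_prev[i + 2], kps_next[i + 2])
        mid.extend([x, y, v])
    return mid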


Human Pose Tracking

Every individual pose annotation comes with a track_id property, which remains consistent for the person across the video sequence.
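
Since track_id is shared across frames, per-person tracks can be recovered by grouping annotations on it. A short sketch, reusing the poses list from the loading example above and assuming image_id increases with frame index:

from collections import defaultdict

# Group pose annotations into per-person tracks keyed by track_id.
tracks = defaultdict(list)
for ann in poses:
    tracks[ann["track_id"]].append(ann)

# Order each track over time (assumes image_id follows frame order).
for track in tracks.values():
    track.sort(key=lambda a: a["image_id"])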

Head Bounding Box

Head bounding box data is stored in the same per-scene annotations list and uses category_id=1. Head bounding box annotations look like the following:

"annotations": [
    {
        "id": 9403
        "image_id": 822,
        "track_id": 37,
        "area": 476.0476,
        "bbox": [244.36, 239.79, 20.44, 23.29],
        "category_id": 1,
    }
]
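
Because body poses and head boxes share one annotations list, the two can be separated by filtering on category_id. A small sketch, reusing the data dictionary from the pose-loading example; bbox is in COCO-style [x, y, width, height] format (note that width × height matches the area field in the sample above):

# Keep only the head bounding boxes (category_id == 1).
head_boxes = [a for a in data["annotations"] if a["category_id"] == 1]

for ann in head_boxes:
    x, y, w, h = ann["bbox"]  # COCO-style [x, y, width, height]
    # Convert to corner format [x1, y1, x2, y2] if needed downstream.
    x1, y1, x2, y2 = x, y, x + w, y + h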

Head Bounding Box Tracking

Every head box annotation comes with a track_id property, which remains consistent for the person across the video sequence.

Toolkit & Code Samples

We evaluate leaderboard results using both AP and OSPA-Pose. You can find the evaluation toolkit here:

Toolkit (on GitHub)
We trained our baseline models (see the leaderboard) using MMPose, and we will release our training code soon.

Downloads

See downloads