
3rd Workshop on Visual Perception for Navigation in Human Environments:

The JackRabbot Human Body Pose Dataset and Benchmark

Workshop Goals

This is the third workshop in the JRDB workshop series, addressing the perceptual problems an autonomous robot must solve to operate, interact, and navigate in human environments. These perception tasks include any 2D or 3D visual scene understanding problem, as well as problems pertinent to understanding human action, intention, and social behaviour, such as 2D-3D human detection, tracking, and forecasting; 2D-3D human body skeleton pose estimation, tracking, and forecasting; and human social grouping and activity recognition.

The JRDB dataset contains 67 minutes of annotated sensory data acquired from the JackRabbot mobile manipulator, comprising 54 indoor and outdoor sequences captured in a university campus environment. The sensory data includes a stereo RGB 360° cylindrical video stream, 3D point clouds from two LiDAR sensors, audio, and GPS positions. In our first workshop, we introduced JRDB with annotations for 2D bounding boxes and 3D oriented cuboids around pedestrians. In our second workshop, we further introduced new annotations for individual actions, human social group formation, and the social activity of each social group. In this workshop, we additionally release annotations for 2D human body pose, comprising 800,000+ annotated human body skeletons with visibility and occlusion labels. We also have, as invited speakers, world-renowned experts in visual perception for understanding human action, behaviour, shape, and pose.

Call for Papers


We invite researchers to submit papers addressing topics related to autonomous (robot) navigation in human environments. Relevant topics include, but are not limited to:

  • 2D or 3D human detection and tracking
  • 2D or 3D skeleton pose estimation and tracking
  • Human motion/body skeleton pose prediction
  • Social group formation and identification
  • Individual, group and social activity recognition
  • Abnormal activity recognition
  • Motion prediction and social models
  • Human walking behaviour analysis
  • 2D and 3D semantic, instance or panoptic segmentation
  • Visual and social navigation in crowded scenes
  • Dataset proposals and bias analysis
  • New metrics and performance measures for visual perception problems related to autonomous navigation


  • Submission deadline for the full papers:
    July 20 (Anywhere on Earth)
  • Acceptance notification of full papers:
    August 4
  • Camera-ready deadline for the full papers:
    August 18
  • Submission deadline for the extended abstracts:
    TBD
  • Acceptance notification of the extended abstracts:
    TBD


Submissions can follow the ECCV format (maximum 14 single-column pages excluding references), with a submission deadline of June 29, or be an extended abstract (maximum 2 single-column pages excluding references), with a submission deadline of TBD. Accepted papers will have the opportunity to be presented as posters during the workshop; however, only papers in the ECCV format will appear in the proceedings. By submitting to this workshop, the authors agree to the review process and understand that we will do our best to match papers to the best possible reviewers. The reviewing process is double-blind. Submission to the challenges is independent of paper submission, but we encourage authors to submit to one of the challenges.

Submissions can be made here. If you have any questions about submitting, please contact us here.


Start Time  End Time  Description
12:30 PM    12:40 PM  Introduction
12:40 PM    1:10 PM   Invited Talk
1:10 PM     1:30 PM   Full Paper Oral Presentations
1:30 PM     2:00 PM   Invited Talk
2:00 PM     2:30 PM   Coffee Break & Video Demo Session
2:30 PM     3:00 PM   Invited Talk
3:00 PM     3:30 PM   Invited Talk
3:30 PM     3:45 PM   Introduction to JRDB Pose Dataset and Challenge
3:45 PM     4:10 PM   Challenge Winners' Presentations
4:40 PM     4:50 PM   Discussion, Closing Remarks and Awards


In addition to the existing benchmarks and challenges on JRDB (2D-3D person detection and tracking, human social group identification, individual action detection, and social activity recognition), in this workshop, we organise two new challenges using our new annotations:

  • 2D Human skeleton pose estimation
  • 2D Human skeleton pose tracking

The winner of each challenge will be awarded a prize (TBD) and a certificate. The winners will also have the opportunity to present their work as a spotlight (5 minutes) and a poster during the workshop.


In our previous two workshops, at ICCV 2019 and CVPR 2021, we introduced annotations for the JackRabbot dataset and benchmark (JRDB), including:

  • 2D bounding boxes and 3D oriented cuboids
  • 2D-3D associations between bounding boxes and cuboids
  • Individual action labels for all individuals, representing:
    • Human pose actions
    • Human-to-human and human-to-object interaction actions
    • The difficulty level of each action label
  • Social group formation of all individuals and social activity labels for each social group
  • Time-consistent trajectories (tracks) for all annotated individuals in both 2D and 3D

Now, in addition to all of the annotations above, we introduce a new set of annotations for human body pose, including:

  • Human body pose annotations, representing:
    • 17 keypoints per person
    • Occlusion severity for each keypoint (Visible, Occluded, Invisible)
  • Time-consistent trajectories (tracks) for all annotated individuals
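To make the annotation structure concrete, the sketch below shows how a single annotated skeleton (17 keypoints, each with an occlusion severity label) might be handled. This is a minimal illustration only: the list-of-triples layout, the integer label encoding, and the function name are assumptions for the example, not the official JRDB file schema, which is documented on the dataset page.

```python
# Hypothetical sketch of working with JRDB-style pose annotations.
# The data layout and label encoding below are illustrative assumptions,
# not the official schema.

# Assumed encoding of the three occlusion severity labels.
OCCLUSION_LABELS = {0: "Visible", 1: "Occluded", 2: "Invisible"}

def count_visible(keypoints):
    """Count the keypoints labelled Visible in one skeleton.

    `keypoints` is assumed to be a list of 17 (x, y, occlusion) triples,
    one per annotated body keypoint.
    """
    assert len(keypoints) == 17, "JRDB pose annotations use 17 keypoints per person"
    return sum(1 for (_, _, occ) in keypoints if OCCLUSION_LABELS[occ] == "Visible")

# Example: a skeleton with 15 visible, 1 occluded, and 1 invisible keypoint.
skeleton = [(10.0 + i, 20.0 + i, 0) for i in range(15)] + [(5.0, 5.0, 1), (0.0, 0.0, 2)]
print(count_visible(skeleton))  # → 15
```

A per-keypoint visibility count like this is a common first step when filtering heavily occluded skeletons before training or evaluation.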

For more details about the dataset, see here.

Guidelines for participation

Participants should strictly follow the submission policy provided on the main JRDB webpage, which can be found here. In addition, to distinguish challenge submissions from regular submissions, each submission name should be followed by an ECCV22 tag, e.g., "submissionname_ECCV22"; submissions without this tag will not be considered for the challenge. Each challenge submission should be accompanied by an extended abstract submitted via our CMT webpage (details are available below) or a link to an existing arXiv preprint/publication.

Evaluation & Toolkit

We use the first metric after "name" in each leaderboard as the main evaluation metric for ranking entries. For each benchmark, we have also created toolkits to work with the dataset, perform evaluation, and create submissions. These toolkits are available here.

Invited Speakers

Gerard Pons-Moll

Professor of Computer Science, University of Tübingen

Yaser Sheikh

Associate Professor in the Robotics Institute, Carnegie Mellon University and Director, Facebook Reality Lab

Dima Damen

Professor of Computer Vision at the University of Bristol

Andreas Geiger

Professor of Computer Science, University of Tübingen

Dana Kulić

Professor at Monash University

Adrien Gaidon

Head of Machine Learning Research at Toyota Research Institute (TRI)

Program Committee

Name Organization
Aakash Kumar University of Central Florida
Dan Jia RWTH Aachen University
Edwin Pan Stanford University
Ehsan Adeli Stanford University
Haofei Xu University of Tübingen
Hsu-Kuang Chiu Waymo
Huangying Zhan The University of Adelaide
Karttikeya Mangalam UC Berkeley
Michael Wray University of Bristol
Michael Villamizar Idiap Research Institute
Mohsen Fayyaz Microsoft
Nathan Tsoi Yale University
Nikos Athanasiou Max Planck Institute for Intelligent Systems
Saeed Saadatnejad EPFL
Sandika Biswas TCS
Shyamal Buch Stanford University
Tianyu Zhu Monash University
Vida Adeli University of Toronto
Vineet Kosaraju Stanford University
Ye Yuan Carnegie Mellon University

Organizers


Hamid Rezatofighi

Monash University

Edward Vendrow

Stanford University

Ian Reid

The University of Adelaide

Silvio Savarese

Stanford University