CVPR2024: 5th Workshop on Robot Visual Perception in Human Crowded Environments.

The JackRabbot Open-World Panoptic Segmentation & Tracking Dataset and Benchmark


Workshop Goals

Welcome to the 5th workshop in the JRDB series! Our workshops are designed to tackle perceptual problems that arise when autonomous robots operate and interact in human environments, including human detection, tracking, forecasting, and body pose estimation, as well as social grouping and activity recognition.

In this workshop, we are excited to explore the challenging task of Open-World Panoptic Segmentation and Tracking in human-centred environments. We will introduce a new competition that challenges participants to develop models capable of accurately predicting panoptic segmentation and tracking results using the JRDB dataset. We have also invited speakers from the fields of visual perception and robotics to offer valuable insights into understanding human-centred scenes.

Call for Papers


We invite researchers to submit papers addressing topics related to autonomous robots in human environments. Relevant topics include, but are not limited to:

  • 2D or 3D human detection and tracking
  • 2D or 3D semantic, instance or panoptic segmentation
  • 2D or 3D open-world recognition
  • 2D or 3D skeleton pose estimation and tracking
  • Human trajectory forecasting
  • Human motion prediction and safety
  • Visual scene prediction
  • Human-robot interaction considering predictions
  • Visual and social navigation in crowded scenes
  • Human motion/body skeleton pose prediction
  • Predictive planning and control
  • Individual, group and social activity recognition
  • Human walking behaviour analysis
  • Dataset proposals and bias analysis
  • New metrics and performance measures for visual perception problems related to autonomous robots


  • Submission deadline for the full papers:
    April 4 - 23:59 PT
  • Acceptance notification of full papers:
    April 12 - 23:59 PT
  • Camera-ready deadline for the full papers:
    April 14 - 23:59 PT
  • Submission deadline for the extended abstracts:
    June 5 - 23:59 PT
  • Acceptance notification of the extended abstracts:
    June 17 - 23:59 PT

More information

    The Best Paper Award will be granted if more than five papers are accepted.


Full papers may be up to eight pages, including figures and tables, in the CVPR style. Additional pages containing only cited references are allowed. For more information, please refer to the guidelines provided by CVPR here. Extended abstracts should also adhere to the CVPR style and are limited to one page, excluding references. Accepted papers may be presented as posters during the workshop; however, only full papers will appear in the proceedings. By submitting to this workshop, the authors agree to the review process and understand that we will do our best to match papers to the best possible reviewers. The reviewing process is double-blind. Submission to the challenge is independent of paper submission, but we encourage authors to also submit to one of the challenges.

Submissions website: here. If you have any questions about submitting, please contact us here.


Accepted Papers

InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500K Dialogues

Hanbo Zhang, Jie Xu, Yuchen Mo, Tao Kong

Must Unsupervised Continual Learning Relies on Previous Information?

Haoyang Cheng, Haitao Wen, Heqian Qiu, Lanxiao Wang, Minjian Zhang, Hongliang Li

HumanFormer: Human-centric Prompting Multi-modal Perception Transformer for Referring Crowd Detection

Heqian Qiu, Lanxiao Wang, Taijin Zhao, Fanman Meng, Hongliang Li

GM-DETR: Generalized Multispectral DEtection TRansformer with Efficient Fusion Encoder for Visible-Infrared Detection

Yiming Xiao, Fanman Meng, Qingbo Wu, Linfeng Xu, Mingzhou He, Hongliang Li

Pre-trained Bidirectional Dynamic Memory Network For Long Video Question Answering

Jinmeng Wu, Pengcheng Shu, HanYu Hong, Ma Lei, Ying Zhu, Wang Lei

DSTCFuse: A Method based on Dual-cycled Cross-awareness of Structure Tensor for Semantic Segmentation via Infrared and Visible Image Fusion

XUAN LI, Rongfu Chen, Jie Wang, Ma Lei, Li Cheng

Is Our Continual Learner Reliable? Investigating Its Decision Attribution Stability through SHAP Value Consistency

Yusong Cai, Shimou Ling, Liang Zhang, Lili Pan, Hongliang Li

Open Challenge

We are organizing a new challenge on Closed- and Open-World Panoptic Segmentation and Tracking.

The challenge includes:

Challenge Guidelines

Evaluation Protocol

Submissions will be evaluated based on:

All methods are evaluated on stitched images from the dataset at 1 Hz (every 15th image). Participants can also use individual camera views for training, using the provided camera calibration parameters and stitching code. For more details on the dataset, see the JRDB-PanoTrack dataset.
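The 1 Hz evaluation sampling described above (every 15th image, which suggests a capture rate of about 15 frames per second) can be sketched as follows. This is only an illustration: the function name `evaluation_frame_indices`, zero-based frame indexing, and the assumption that sampling starts at the first frame are not taken from the official evaluation toolkit, which should be treated as the authority on which frames are actually scored.

```python
def evaluation_frame_indices(num_frames: int, step: int = 15, offset: int = 0) -> list[int]:
    """Return the indices of frames kept when sampling every `step`-th frame.

    Assumptions (hypothetical, not from the official toolkit):
    frames are zero-indexed and sampling starts at `offset`.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if step <= 0:
        raise ValueError("step must be positive")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # range() already stops before num_frames, so no extra bounds check is needed.
    return list(range(offset, num_frames, step))


if __name__ == "__main__":
    # A 45-frame sequence keeps frames 0, 15 and 30 under these assumptions.
    print(evaluation_frame_indices(45))
```

If the official frame numbering starts elsewhere, the `offset` parameter can be adjusted to match it.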


The best submission from each account will be displayed automatically during the challenge period, encouraging competition and showcasing leading-edge methods.


  • Data release and challenge start: April 1 - 23:59 PT
    For dataset information and download:
    For evaluation toolkit:
    For data manipulation tutorials:
  • Leaderboard available and submissions allowed: April 14 - 23:59 PT
    Submit to benchmark:
  • Challenge close:
    June 1 - 23:59 PT


Time (PST)     Speakers            Topic
1:30 - 1:40    Organisers          Introduction
1:40 - 2:10    Laura Leal-Taixe    Towards segmenting anything in Lidar
2:10 - 2:30    Organisers          JRDB dataset, annotations, and challenges
2:30 - 3:15    Break - Poster session
3:15 - 3:45    Deva Ramanan        Self-supervised learning of dynamic scenes from moving cameras
3:45 - 4:15    Rita Cucchiara      Embodied Navigation by visual and Language interaction with objects and people
4:15 - 4:45    Bolei Zhou          UrbanSim: Generating and Simulating Diverse Urban Spaces for Embodied AI Research
4:45 - 5:15    Michael Milford     Introspection, localization, and similar scene inference for autonomous systems
5:15 - 5:30    Organisers          Conclusion

Please follow the Zoom link for virtual attendance.

Invited Speakers

Rita Cucchiara

Full Professor at University of Modena and Reggio Emilia, Italy

Laura Leal-Taixe

NVIDIA | Adjunct Professor at Technical University of Munich

Deva Ramanan

Professor at Carnegie Mellon University, USA

Bolei Zhou

Assistant Professor at University of California, Los Angeles, USA

Michael Milford

Professor at Queensland University of Technology, Australia

Program Committee

Name                          Organization
Apoorv Singh                  Motional
Gengze Zhou                   The University of Adelaide
Haodong Hong                  The University of Queensland
Houzhang Fang                 Xidian University
Jiarong Guo                   Hong Kong University of Science and Technology
Jin Li                        Shaanxi Normal University
Michael Wray                  University of Bristol
Mingtao Feng                  Xidian University
Qi Wu                         Shanghai Jiao Tong University
Shuai Guo                     Shanghai Jiao Tong University
Shun Taguchi                  Toyota Central R&D Labs., Inc.
Tiago Rodrigues de Almeida    Örebro University
Weijia Liu                    Southeast University
Zijie Wu                      University of Western Australia
Ziyu Ma                       Hunan University


Organizers

Hamid Rezatofighi

Monash University

Alexandre Alahi


Ian Reid

The University of Adelaide

Duy Tho Le*

Monash University

Hengcan Shi

Monash University

Chenhui Gou

Monash University

* Corresponding organizer. For inquiries, please contact