Preparing Submissions

Submission Policy

  • We strongly encourage all participants to use only the provided training data split to develop their algorithms (e.g. for the learning process and/or parameter tuning). The test data split should be used only to generate final results for a new submission to the challenge. Please do not use the challenge submission system as a way to tune your algorithm!
  • Important: We limit the number of submissions per account to THREE per month for each task. We only consider the best submission per account for the leaderboard. It is STRICTLY PROHIBITED to create multiple accounts using different email addresses! We will actively monitor submissions and delete accounts that violate these rules (e.g. accounts with invalid or repeated supervisor and institution details).
  • For both the 2D and 3D tracking challenges, participants may opt between using their own detector or the detections we provide on the website.
  • Submissions to the challenge should, at a minimum, be accompanied by a short abstract (up to 5000 characters) explaining the technical details of the method used.
  • Metadata can be edited after submission by clicking "edit" where previous submissions are displayed. Note that you can update metadata for up to 6 months, after which submissions become finalized. If a submission is still anonymous after 6 months, it will be deleted.
  • Currently, all tracking and detection submissions are evaluated on the stitched images, not the individual images, but participants are free to use all available data.
  • Note that an incorrect submission format may result in evaluation errors or abnormal results.

Development Kits

All of our development kits can be found here. For more details, check out the task-specific information below.

Open Tracking Development Kit
The tracking development kit has been adapted from TrackEval and extended with our implementation of 3D tracking and OSPA. The original Git repo provides details on the file structure under "preparing submissions".
We provide a Python script to convert JSON to TXT format:
python convert_to_kitti_tracking.py --input json_folder --output output_dir --depth 0 # 2d tracking
python convert_to_kitti_tracking.py --input json_folder --output output_dir --depth 1 # 3d tracking
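
As a rough illustration of what the converter produces, the sketch below writes one KITTI-style tracking line per annotated box. The JSON field names ("box", "track_id", "score") and the per-frame layout are assumptions for illustration only; convert_to_kitti_tracking.py remains the authoritative converter.

import json

def convert_sequence(json_path, out_path, depth=0):
    # Hypothetical JSON layout: {frame_name: [{"track_id": ..., "box": [x, y, w, h], "score": ...}, ...]}
    with open(json_path) as f:
        anns = json.load(f)
    with open(out_path, "w") as out:
        for frame_idx, frame in enumerate(sorted(anns)):
            for obj in anns[frame]:
                x, y, w, h = obj["box"]
                # KITTI tracking layout: frame, track_id, type, truncated, occluded, alpha,
                # bbox (left top right bottom), dimensions (h w l), location (x y z), rotation_y, score
                fields = [frame_idx, obj["track_id"], "Pedestrian", 0, 0, -1,
                          x, y, x + w, y + h,
                          -1, -1, -1, -1, -1, -1, -1,   # 3D fields left unset in this 2D-only sketch
                          obj.get("score", 1.0)]
                out.write(" ".join(str(v) for v in fields) + "\n")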

To evaluate tracking results, cd into TrackEval and run the corresponding script:
python scripts/run_jrdb_2d.py --gt gt_folder --tracker tracker_folder # 2d tracking
python scripts/run_jrdb_3d.py --gt gt_folder --tracker tracker_folder # 3d tracking

The results will be printed in the terminal and also saved as a txt file in the current directory.
As an example:
python scripts/run_jrdb_2d.py --gt data/gt/jrdb/jrdb_2d_box_train --tracker data/tracker/jrdb/jrdb_2d_box_train # 2d tracking
python scripts/run_jrdb_3d.py --gt data/gt/jrdb/jrdb_3d_box_train --tracker data/tracker/jrdb/jrdb_3d_box_train # 3d tracking

Our modifications to the original repo are: a new metric class called OSPA, two new scripts in /scripts called run_jrdb_2d.py and run_jrdb_3d.py, and two new dataset classes, kitti_2d_box.py and kitti_3d_box.py. These files generally follow the KITTI Tracking implementation.
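
For reference, OSPA combines a localisation term over optimally matched pairs with a cardinality penalty for unmatched objects. The sketch below shows the standard OSPA distance on box centers with cut-off c and order p; it is only an illustration under those assumptions, and the OSPA metric class in the development kit is the reference implementation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(gt, pred, c=1.0, p=1):
    # gt, pred: (m, d) and (n, d) arrays of box centers; returns a value in [0, c].
    gt, pred = np.asarray(gt, float), np.asarray(pred, float)
    m, n = len(gt), len(pred)
    if m == 0 and n == 0:
        return 0.0
    if m == 0 or n == 0:
        return float(c)                              # pure cardinality error
    if m > n:                                        # enforce m <= n
        gt, pred, m, n = pred, gt, n, m
    dist = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    dist = np.minimum(dist, c) ** p                  # cut distances off at c
    rows, cols = linear_sum_assignment(dist)         # optimal sub-pattern assignment
    loc_term = dist[rows, cols].sum()
    card_term = (c ** p) * (n - m)                   # penalty for unmatched objects
    return ((loc_term + card_term) / n) ** (1.0 / p)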

Open Detection Development Kit
The detection development kit has been adapted from KITTI Detection to the format of our dataset. A separate Python file is provided for the OSPA metric.
We also provide a Python script to convert JSON to TXT format:
python convert_dataset_to_KITTI.py -i JRDB -o KITTI_converted_JRDB
To evaluate AP at the 0.3, 0.5 and 0.8 thresholds with the C++ script:
g++ -O3 -o evaluate_object evaluate_object.cpp # compile the C++ code first
./evaluate_object path/to/groundtruth path/to/results 1 output_file.txt # then run it with ground truth, predictions and an output file

The evaluation script will print the 41-point precision/recall values, and the result file will report the corresponding 41-point interpolated average precision, just as in KITTI.
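
For reference, 41-point interpolated AP averages the interpolated precision at 41 evenly spaced recall levels. A minimal sketch, assuming the precision/recall curve has already been computed by the detector evaluation, is:

import numpy as np

def ap_41_point(recall, precision):
    # recall, precision: 1D arrays describing the detector's PR curve.
    recall, precision = np.asarray(recall), np.asarray(precision)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 41):              # 41 recall points: 0, 0.025, ..., 1.0
        mask = recall >= r
        p_interp = precision[mask].max() if mask.any() else 0.0
        ap += p_interp / 41.0
    return ap
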
To evaluate OSPA with the Python scripts:
python ospa_2d_det.py --gt gt_folder --pred pred_folder # 2d detections
python ospa_3d_det.py --gt gt_folder --pred pred_folder # 3d detections

The overall and per-sequence results will be saved in a txt file.

Open Individual Action and Social Grouping/Activity Development Kit
The action/group/activity development kit has been adapted from AVA and extended to our dataset and different tasks.

Open Human Pose Development Kit

The pose development kit contains code for evaluating predictions on JRDB-Pose.

Human Trajectory Forecasting Development Kit

The human trajectory forecasting development kit contains code for evaluation of forecast trajectories.

Panoptic Segmentation and Tracking Development Kit

The panoptic segmentation and tracking development kit contains code for evaluation of panoptic segmentation and tracking.


Visualization Toolkit

We have also created a visualization toolkit to make it easy to visualize your predictions on JRDB. Check out the Visualisation Toolkit, which has been adapted from Kitti Object Visualisation.

Criteria for Evaluation

We adopted the widely established metrics and criteria from KITTI and AVA. Details about the criteria are given below:

Evaluation of Tracking: As with most datasets in TrackEval, we will use several metric families to evaluate results: OSPA, CLEAR-MOT, HOTA and Identity. Each of them contains a set of metrics. Additional metrics may be included later in the challenge.
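
As an example from the CLEAR-MOT family, MOTA aggregates false negatives, false positives and identity switches relative to the number of ground-truth objects. TrackEval computes the underlying counts internally; the sketch below only restates the standard formula.

def mota(num_fn, num_fp, num_idsw, num_gt):
    # MOTA = 1 - (FN + FP + ID switches) / number of ground-truth objects
    return 1.0 - (num_fn + num_fp + num_idsw) / float(num_gt)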

Evaluation of Detection: We will use OSPA and average precision (AP) to evaluate the performance of each detection submission. In addition, we will report recall and AOS for 2D detection. Additional metrics may be included later in the challenge.

Evaluation of Action/Group/Activity Detection: We use mean Average Precision (mAP) to evaluate the performance of each task. We also provide detailed AP results per-sequence and per-category.

Evaluation of Pose Detection: We use both Average Precision (based on thresholded OKS) and OSPA-Pose to evaluate the performance of each task. We further provide detailed AP results per sequence and per category. Since we only label some people in a scene (very small people are not labeled), we do not penalize predicted poses for unlabeled people; predicted poses are matched against all ground-truth boxes.
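
For reference, the sketch below computes a COCO-style OKS between one matched prediction/ground-truth pair; the per-keypoint sigmas and the exact normalisation used by the JRDB-Pose development kit may differ, so treat this only as an illustration of the thresholded-OKS idea.

import numpy as np

def oks(gt_kpts, pred_kpts, visibility, area, sigmas):
    # gt_kpts, pred_kpts: (K, 2) arrays; visibility: (K,); area: ground-truth box area; sigmas: (K,) placeholders.
    gt_kpts, pred_kpts = np.asarray(gt_kpts, float), np.asarray(pred_kpts, float)
    d2 = np.sum((gt_kpts - pred_kpts) ** 2, axis=-1)        # squared keypoint distances
    k2 = (2.0 * np.asarray(sigmas)) ** 2
    e = d2 / (2.0 * area * k2 + np.finfo(float).eps)        # normalised error per keypoint
    vis = np.asarray(visibility) > 0
    return float(np.exp(-e)[vis].mean()) if vis.any() else 0.0

A predicted pose would then count as correct at a given threshold (e.g. OKS >= 0.5) only after being matched to a ground-truth box, which is how poses on unlabeled people are ignored.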

Evaluation of Trajectory Forecasting: We use both EFE (End-to-end Forecasting Error) and OSPA-Trajectory to evaluate the performance of each submission. Since some people disappear in the hidden future, we do not penalize forecast trajectories for people who have disappeared; forecast trajectories are matched against all ground-truth trajectories.
