Pedestrian Detection for Mobile Platforms

Project Goal

"Produce a deep network capable of detecting and localizing nearby, potentially occluded pedestrians under reasonable lighting conditions, that can be integrated into a mobile robot (e.g., Tegra X1 chip)"

Compression Techniques

Knowledge Distillation

Train a large network as a teacher model and transfer its knowledge to a smaller student model [Hinton, 2015]
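
A minimal sketch of the distillation objective in its classification form from [Hinton, 2015]: the student is trained against the teacher's temperature-softened outputs as well as the ground-truth label. The function names, the temperature of 2.0 and the mixing weight alpha are illustrative assumptions; the exact loss used in this project is not stated here.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are assumed hyperparameters, not values from the poster.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    soft_loss = cross_entropy(soft_teacher, soft_student) * temperature ** 2
    hard_target = [1.0 if i == true_label else 0.0
                   for i in range(len(student_logits))]
    hard_loss = cross_entropy(hard_target, softmax(student_logits))
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

A student whose logits agree with the teacher's receives a lower loss than one that disagrees, which is the signal that drives the transfer.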

Model Confidence

Estimate teacher confidence by enabling dropout at test time [Yarin Gal, 2015], and maximize the log-likelihood of a multivariate Gaussian distribution
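
A rough sketch of the idea, under stated assumptions: the teacher is stood in for by a toy linear layer, dropout stays on for several stochastic forward passes, and the resulting mean and variance define a Gaussian under which the student's output is scored. A diagonal covariance is assumed for simplicity, and all names and values (`stochastic_forward`, `mc_dropout_stats`, `gaussian_nll`, the drop probability, the number of passes) are hypothetical.

```python
import math
import random

def stochastic_forward(x, weights, drop_prob, rng):
    """One forward pass of a toy linear layer with dropout left on at test time."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each input with probability drop_prob, rescale the rest.
    dropped = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    return [sum(w * d for w, d in zip(row, dropped)) for row in weights]

def mc_dropout_stats(x, weights, drop_prob=0.5, passes=200, seed=0):
    """Mean and per-dimension variance of the output over repeated dropout passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, drop_prob, rng) for _ in range(passes)]
    dims = len(weights)
    mean = [sum(s[d] for s in samples) / passes for d in range(dims)]
    var = [sum((s[d] - mean[d]) ** 2 for s in samples) / passes for d in range(dims)]
    return mean, var

def gaussian_nll(student_out, mean, var, eps=1e-6):
    """Negative log-likelihood of student_out under N(mean, diag(var)).

    Minimizing this is equivalent to maximizing the Gaussian log-likelihood.
    """
    nll = 0.0
    for y, m, v in zip(student_out, mean, var):
        v = v + eps  # guard against zero variance
        nll += 0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
    return nll
```

Dimensions where the teacher's passes disagree get a large variance, so student errors there are penalized less than errors where the teacher is confident.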

Hint Layer

Increase the dimensionality of the teacher's guidance by adding an extra fully connected layer, allowing a better representation of the teacher's output distribution

Hand-designed Features

Use Aggregate Channel Features [Dollar, 2014] as hand-designed features to increase the student model's complexity without introducing significant overhead

Results

Log-average miss rate on Caltech
(lower is better)
Models	Log-avg MR	Drop vs. Teacher
Teacher	17.5%	0.0%
Student	24.5%	7.0%
Student+KD	24.8%	7.3%
Student+KD+Conf	23.7%	6.2%
Student+KD+Hint	23.1%	5.6%
Student+KD+Conf+Hint	22.4%	4.9%
Student+KD+ACF	25.2%	7.7%
Student+KD+ACF+Conf+Hint	23.4%	5.9%
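
For reference, a sketch of how a log-average miss rate is commonly computed on Caltech: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0, and the samples are averaged geometrically. The function name, the fallback value of 1.0 and the assumption that the curve is sorted by increasing FPPI are illustrative; this is not the evaluation code behind the table above.

```python
import math

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at reference FPPI values in [1e-2, 1e0].

    fppi and miss_rate describe one curve and are assumed sorted by increasing fppi.
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Use the miss rate at the largest curve FPPI not exceeding the reference;
        # fall back to 1.0 if the curve starts above it (an assumed convention).
        candidates = [mr for f, mr in zip(fppi, miss_rate) if f <= ref]
        sampled.append(candidates[-1] if candidates else 1.0)
    # Clamp to avoid log(0) before averaging in log space.
    return math.exp(sum(math.log(max(mr, 1e-10)) for mr in sampled) / num_points)
```

Averaging in log space means a detector cannot reach a low score by doing well at a single operating point only.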

Resource Usage
(Measured on Titan-X)
Models	Params	Memory	Speed
ResNet-200 (Teacher)	63 M	4.93 GB	24 ms
ResNet-18	11 M	612 MB	3 ms
ResNet-18-Thin	2.8 M	308 MB	3 ms
ResNet-18-Small	0.16 M	240 MB	3 ms
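
The reduction factors implied by the table can be worked out directly. The snippet below is only arithmetic on the parameter counts and timings listed above; the dictionary layout and function name are illustrative.

```python
# Parameter counts (millions) and time (ms) copied from the table above.
TEACHER = {"params_m": 63.0, "time_ms": 24.0}
STUDENTS = {
    "ResNet-18": {"params_m": 11.0, "time_ms": 3.0},
    "ResNet-18-Thin": {"params_m": 2.8, "time_ms": 3.0},
    "ResNet-18-Small": {"params_m": 0.16, "time_ms": 3.0},
}

def reduction_factors(teacher, students):
    """Return (parameter reduction, speed-up) relative to the teacher for each student."""
    return {
        name: (teacher["params_m"] / s["params_m"], teacher["time_ms"] / s["time_ms"])
        for name, s in students.items()
    }

if __name__ == "__main__":
    for name, (param_ratio, speedup) in reduction_factors(TEACHER, STUDENTS).items():
        print(f"{name}: {param_ratio:.1f}x fewer parameters, {speedup:.0f}x faster")
```

All three students run 8x faster than the teacher on this hardware, while the parameter reduction ranges from roughly 6x to roughly 390x.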