Distributed training of CNNs
Challenge
When performing object detection on large datasets, training on a single-GPU machine can be impractically slow. We developed a system to perform distributed training of deep learning models across clusters of GPU-enabled VMs.
With a dataset of 1,500 hours of traffic video and the goal of detecting cars, buses, motorbikes, bicycles, and people, we turned to distributed deep learning to speed up training.
We used AzureML to manage the GPU cluster, schedule jobs, and scale resources. We ran experiments on a 32-GPU cluster and a 64-GPU cluster. Data I/O is an important factor in distributed training, so to speed up data loading we used Azure Premium Blob Storage.
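The source does not say which framework or parallelism scheme was used. Assuming synchronous data-parallel training (the most common scheme, but not confirmed here), the core idea is that every worker holds a replica of the model, computes gradients on its own shard of the data, and the gradients are averaged (an all-reduce) before the same update is applied everywhere. The sketch below simulates that loop in a single process on a toy linear model; `shard`, `local_gradient`, `all_reduce_mean` and `train` are hypothetical helpers, not part of AzureML.

```python
# Minimal sketch of synchronous data-parallel training, simulated in one process.
# Assumption: plain gradient averaging (what an all-reduce computes); the model
# (1-D linear regression), worker count and learning rate are illustrative only.
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the dataset into num_workers interleaved shards, one per worker/GPU."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w: float, b: float, batch: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient of y_hat = w*x + b over one worker's shard."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def all_reduce_mean(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average the per-worker gradients, as an all-reduce step would."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train(data: List[Sample], num_workers: int, steps: int, lr: float) -> Tuple[float, float]:
    shards = shard(data, num_workers)
    w, b = 0.0, 0.0  # every replica starts from the same parameters
    for _ in range(steps):
        # On a real cluster these gradients are computed in parallel, one per GPU.
        grads = [local_gradient(w, b, s) for s in shards]
        gw, gb = all_reduce_mean(grads)
        w, b = w - lr * gw, b - lr * gb  # identical update on every replica
    return w, b


if __name__ == "__main__":
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(64)]  # points on y = 3x + 1
    w, b = train(data, num_workers=8, steps=2000, lr=0.05)
    print(f"w={w:.3f}, b={b:.3f}")
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so adding workers splits the per-step work without changing the optimization; on real GPU clusters this averaging is typically delegated to a library such as PyTorch DistributedDataParallel or Horovod, and data loading must keep every worker fed, which is why storage throughput matters.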
Results
| Metric | Result |
|---|---|
| Training time | 28x to 55x faster |
| Data I/O | 85% faster (with Premium Blob Storage) |
| Mean average precision (mAP) | +127% (over baseline) |
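The table does not say which cluster produced each end of the training-time range. Assuming the 28x speedup was measured on the 32-GPU cluster and the 55x speedup on the 64-GPU cluster, both relative to a single GPU, the implied parallel efficiency (speedup divided by GPU count) can be checked with a short calculation; `parallel_efficiency` is a hypothetical helper name.

```python
# Hedged sketch: parallel efficiency implied by the reported speedups.
# Assumption: 28x was measured on 32 GPUs and 55x on 64 GPUs, both versus one GPU.


def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved: speedup / number of GPUs."""
    return speedup / num_gpus


if __name__ == "__main__":
    for speedup, gpus in [(28, 32), (55, 64)]:
        eff = parallel_efficiency(speedup, gpus)
        print(f"{gpus} GPUs: {speedup}x speedup -> {eff:.1%} efficiency")
```

Under that assumption, this prints 87.5% for 32 GPUs and 85.9% for 64 GPUs, i.e. scaling stays close to linear as the cluster doubles.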