Enhanced multimodal representation learning with cross-modal KD

Publication
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition