Joint signal-symbol embedding space and multivariate time series

The aim of this subtask is to design models that learn embedding spaces linking symbolic, acoustic, and perceptual sources of information.