In the context of orchestration, this would allow us to find audio signals that share identical timbre properties yet come from widely different musical notations (that is, different scores leading to a similar perceptual effect). This could form the basis of computer-assisted orchestration software able to orchestrate any given sound signal. This multimodal inference will also be used to explore a wide set of first-of-their-kind tasks.