2 PhD positions available: Multimodal (Audio and Vision) Conversational Foundation Models

2 PhD positions, funded by and in collaboration with Tavus Inc., on designing the next generation of conversational models: Multimodal Foundation Models that can see, hear, understand, and generate the audio and video responses of the Digital Human. The research will revolve around audiovisual language models for analysis and understanding (e.g., VQA) and generative AI (e.g., diffusion models) for image/video generation and editing. Please see here for more details.