AI & Data
Advanced Multimodal AI: Designing and Deploying Systems Combining Text, Image, Audio, and Video
2 days • 14 hours
Description
Master multimodal AI architectures and tools to design, integrate, and deploy pipelines combining text, image, audio, and video for advanced use cases.
Learning Objectives
- Understand the architectures of modern multimodal models
- Leverage Vision-Language models (CLIP, LLaVA, GPT-4V)
- Implement audio pipelines (transcription, voice analysis)
- Analyze and exploit video streams with AI models
- Design complete multimodal pipelines for production
- Identify and implement advanced business use cases
Target Audience
Data Scientists
Machine Learning Engineers
AI Architects
Lead AI/Data Developers
Prerequisites
Proficiency in Python and ML libraries (PyTorch or TensorFlow)
Knowledge of deep learning (CNNs, Transformers)
Experience with AI APIs (OpenAI, Google, Hugging Face)
Understanding of natural language processing and computer vision
Program Outline
Information
Duration
2 days
14 hours
Price
On request