AI & Data

Advanced Multimodal AI: Designing and Deploying Systems Combining Text, Image, Audio, and Video

2 days • 14 hours

Description

Master multimodal AI architectures and tools to design, integrate, and deploy pipelines combining text, image, audio, and video for advanced use cases.

Learning Objectives

Understand the architectures of modern multimodal models
Leverage Vision-Language models (CLIP, LLaVA, GPT-4V)
Implement audio pipelines (transcription, voice analysis)
Analyze and exploit video streams with AI models
Design complete multimodal pipelines for production
Identify and implement advanced business use cases

Target Audience

Data Scientists

Machine Learning Engineers

AI Architects

Lead AI/Data Developers

Prerequisites

Proficiency in Python and ML libraries (PyTorch or TensorFlow)

Knowledge of deep learning (CNNs, Transformers)

Experience with AI APIs (OpenAI, Google, Hugging Face)

Understanding of natural language processing and computer vision

Program Outline

Information

Duration

2 days

14 hours

Price

On request

Similar Trainings

AI & Data

Cloud Migration

2 days

On request

AI & Data

AI Agents – Designing Autonomous Systems with LangChain and LangGraph – Advanced

3 days

On request

AI & Data

European AI Act — Understanding your obligations and ensuring compliance — Beginner level

1 day

On request

AI & Data

Data Analysis with Microsoft Power BI (4-167)

3 days

€2,200 excl. tax