AI & Data Training: Advanced Multimodal AI: Designing and Deploying Systems Combining Text, Image, Audio, and Video - Ascent Formation
AI & Data

Advanced Multimodal AI: Designing and Deploying Systems Combining Text, Image, Audio, and Video

2 days (14 hours)

Description

Master multimodal AI architectures and tools to design, integrate, and deploy pipelines combining text, image, audio, and video for advanced use cases.

Learning Objectives

  • Understand the architectures of modern multimodal models
  • Leverage Vision-Language models (CLIP, LLaVA, GPT-4V)
  • Implement audio pipelines (transcription, voice analysis)
  • Analyze and exploit video streams with AI models
  • Design complete multimodal pipelines for production
  • Identify and implement advanced business use cases

Target Audience

Data Scientists
Machine Learning Engineers
AI Architects
Lead AI/Data Developers

Prerequisites

Proficiency in Python and ML libraries (PyTorch or TensorFlow)
Knowledge of deep learning (CNNs, Transformers)
Experience with AI APIs (OpenAI, Google, Hugging Face)
Understanding of natural language processing and computer vision

Program Outline

Information

Duration

2 days

14 hours

Price

On request
