Multimodal Model

How models like GPT-4o and Gemini process text, images, audio, and video together within a unified architecture.