Open-source AI competing with OpenAI and Google

NVIDIA dropped an audio AI that can think–about music, conversation and more. Mistral again into picture for another efficient open-source audio models. Boston University trained AI model on medical podcasts.

Audio Flamingo 3: an open-source audio language model

Audio Flamingo 3, a fully open-source Large Audio Language Model (LALM) with state-of-the-art performance in audio understanding and reasoning across 20+ benchmarks.

Audio Flamingo 3 radar image — Audio Flamingo 3

Some of the novel capabilities are:

Including multi-turn, multi-audio chat
On-demand thinking
Voice-to-Voice interaction
Long-context audio reasoning (up to 10 minutes).

Audio	Instruction	Ground-truth	AF3 Think
	What sport are the people in the audio doing? (A) Running (B) Playing badminton (C) Playing football (D) Playing tennis	(D) Playing tennis	The audio contains sounds of a ball being hit and bounced, along with the sound of shoes squeaking on a hard surface. These sounds are characteristic of a sport played on a hard court with a racket or shuttlecock. Tennis and badminton are both such sports. The presence of a ball being hit and bounced suggests a sport where a racket is used to hit a ball, which is more indicative of tennis than badminton. Therefore, the most likely sport is tennis.

Voxtral: an open-source text-to-speech model

Open, affordable, and production-ready speech understanding for everyone. Until recently, gaining truly usable speech intelligence in production meant choosing between two trade-offs:

Open-source ASR systems with high word error rates and limited semantic understanding
Closed, proprietary APIs that combine strong transcription with language understanding, but at significantly higher cost and with less control over deployment.

These capabilities make the Voxtral models ideal for real-world interactions and downstream actions, such as summaries, answers, analysis, and insights. For cost-sensitive use-cases, Voxtral Mini Transcribe outperforms OpenAI Whisper for less than half the price. For premium use cases, Voxtral Small matches the performance of ElevenLabs Scribe, also for less than half the price.

Audio	Transcribe
	देखो यार, वो client meeting आज postpone करनी पड़ेगी, क्योंकि मेरा calendar एकदम full है, तुम please उन्हें mail करके reschedule कर लो, और वो updated agenda भी share कर देना, urgently, ठीक है?

PodGPT: AI model learns from science podcasts

Boston University introduce a newly created computer program called PodGPT that learns from science and medicine podcasts to become smarter at understanding and answering scientific questions. The work is published in the journal npj Biomedical Innovations.

“We wanted to go beyond the written word,” said Kolachama. “By integrating spoken content, we aim to enhance our model’s understanding of conversational language and extend its application to more specialized contexts within STEMM [science, technology, engineering, mathematics and medicine] disciplines.”

More than 3700 hours of audio were transcribed into 42 million text tokens, allowing the model to absorb and learn from diverse expert discussions across multiple fields. By integrating spoken content, we aimed to enhance the model’s understanding of conversational language and extend its application to more specialized contexts within STEMM disciplines.

Wondering how the latest open-source audio models can supercharge your AI solutions? If your roadmap includes smart voice assistants, healthcare transcription, or dynamic multilingual chat services, our specialists can smoothly integrate and fine-tune these state-of-the-art models for your entire product architecture.

Reach out to us today and we’ll help you deliver the future of audio AI to your customers—while you maintain full control, scale effortlessly, and keep costs optimized.

Open-source AI competing with OpenAI and Google - AI updates July 2025

Audio Flamingo 3: an open-source audio language model

Voxtral: an open-source text-to-speech model

PodGPT: AI model learns from science podcasts

Sign up for more like this.

Continue Reading

GPT-5 Explained: Latest OpenAI Update, Key Improvements, and Use Cases

Why Ethical AI Is Key to Business Innovation, Trust, and Long-Term Success

Introduction to RAG: LLMs with Real-Time Knowledge Access

Email

Phone