Vision Encoder - Search News

Encoder-Free AI explained: The architecture behind Google’s Gemma 4 12B

A vast majority of multi-modal AI systems function as a relay race. For example, an image will come in through the Vision ...

Tech Times

Google Gemma 4 12B Brings Multimodal AI to 16GB Laptops, Free Under Apache 2.0

Google Gemma 4 12B, released June 3, is an open-weight multimodal model that processes text, images, audio, and video in a ...

VentureBeat

New fully open source vision encoder OpenVision arrives to improve on OpenAI’s Clip, Google’s SigLIP

Join the event trusted by enterprise leaders for nearly two decades. VB Transform brings together the people building real enterprise AI strategy. Learn more The University of California, Santa Cruz ...

VentureBeat

Google's new open source Gemma 4 12B analyzes audio, video — and runs entirely locally on a typical 16GB enterprise laptop

Credit: VentureBeat made with OpenAI ChatGPT-Images-2.0 While many AI open source model providers are pursuing larger and more powerful models, Google is still giving attention to the smaller, more ...

InfoQ

Show inaccessible results

Encoder-Free AI explained: The architecture behind Google’s Gemma 4 12B

Google Gemma 4 12B Brings Multimodal AI to 16GB Laptops, Free Under Apache 2.0

New fully open source vision encoder OpenVision arrives to improve on OpenAI’s Clip, Google’s SigLIP

Google's new open source Gemma 4 12B analyzes audio, video — and runs entirely locally on a typical 16GB enterprise laptop

Gemma 3 Supports Vision-Language Understanding, Long Context Handling, and Improved Multilinguality

Microsoft Builds A Compact AI Model That Decides When To Think

Hanwha Vision Video Encoders