Be My Eyes: Small 'eyes', big 'brain'—a modular path to multimodal AI
LLMs are strong reasoners, but most of them are text-only. BeMyEyes is a new way to give them “sight” without building giant, expensive multimodal models.

* Two agents, one goal: a lean Perceiver (a vision-language model) looks at images or other non-text inputs, while a powerful Reasoner LLM thinks through the answer. They collaborate