
Liquid AI has launched LFM2.5-VL-450M, a new vision-language model designed for edge deployment, featuring bounding box prediction and enhanced capabilities within a 450M parameter footprint.
Liquid AI’s latest release, LFM2.5-VL-450M, targets a specific problem for vision-language models (VLMs): running capable multimodal inference on resource-constrained edge devices. Unlike many large VLMs, which require substantial GPU memory and cloud infrastructure, LFM2.5-VL-450M is engineered to run directly on hardware such as the NVIDIA Jetson Orin, AMD Ryzen AI Max+ 395 mini-PC APUs, and the Snapdragon 8 Elite in the Samsung S25 Ultra. The model pairs the LFM2.5-350M language model backbone with a SigLIP2 NaFlex shape-optimized vision encoder, and it has a 32,768-token context window and a vocabulary of 65,536 tokens.
For images, the model supports native resolution up to 512x512 pixels and preserves non-standard aspect ratios. Larger images are split into 512x512 tiles, and a thumbnail of the whole image is encoded alongside them so that the model keeps global scene context instead of relying solely on local patches. Users can adjust parameters at inference time to trade speed against quality depending on hardware constraints.
Training was scaled up as well. Liquid AI increased pre-training to 28T tokens, up from the 10T tokens used for the previous LFM2-VL-450M. Post-training used preference optimization and reinforcement learning to improve grounding, instruction following, and overall reliability. A key new feature is bounding box prediction: the model scores 81.28 on RefCOCO-M, a benchmark that measures how accurately a model localizes objects from natural-language descriptions, whereas the previous model scored zero.
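The tiling strategy described earlier can be illustrated with a small sketch. It is not Liquid AI's implementation: the function name `plan_tiles` is hypothetical, and the choice to clip edge tiles at the image border (rather than resize or pad the image first) is an assumption made only to keep the example short. It shows which 512x512 regions would cover an image and when a global thumbnail would be added.

```python
import math

TILE = 512  # tile edge in pixels, as reported in the article


def plan_tiles(width, height, tile=TILE):
    """Return (boxes, use_thumbnail) for an image of the given size.

    Hypothetical helper for illustration only.
    boxes are (left, top, right, bottom) pixel rectangles.
    Assumption: edge tiles are clipped to the image border.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    # Small images fit in one tile and are kept at native resolution.
    if width <= tile and height <= tile:
        return [(0, 0, width, height)], False

    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile, r * tile
            boxes.append(
                (left, top, min(left + tile, width), min(top + tile, height))
            )
    # Larger images also get a downscaled thumbnail for global context.
    return boxes, True
```

For a 1024x768 input, this plan yields four boxes and requests a thumbnail, while a 400x300 input is passed through as a single native-resolution box with no thumbnail.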
Recommended generation parameters are temperature=0.1, min_p=0.15, and repetition_penalty=1.05 for text generation, and min_image_tokens=32, max_image_tokens=256, and do_image_splitting=True for vision inputs.
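As a rough sketch, the recommended values could be kept as two plain dictionaries and adjusted per device. The parameter names and values come from the article; the dictionary names, the `with_overrides` helper, and the example override of 64 image tokens are hypothetical and only illustrate how the speed and quality settings might be varied at inference time.

```python
# Values as reported in the article; the grouping into two dicts is illustrative.
TEXT_GENERATION = {
    "temperature": 0.1,
    "min_p": 0.15,
    "repetition_penalty": 1.05,
}

VISION_INPUT = {
    "min_image_tokens": 32,
    "max_image_tokens": 256,
    "do_image_splitting": True,
}


def with_overrides(defaults, **overrides):
    """Return a copy of defaults with overrides applied (hypothetical helper).

    Unknown keys are rejected so that a typo does not silently fall back
    to the default value.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**defaults, **overrides}


# Example: a smaller image-token budget for a slower device.
# The value 64 is an arbitrary illustration, not a recommendation.
low_budget_vision = with_overrides(VISION_INPUT, max_image_tokens=64)
```

Keeping the defaults immutable in practice and deriving per-device variants from them makes it easy to compare a tuned configuration against the recommended baseline.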
✨ This report was generated by AI News Assistant.