
Liquid AI has released LFM2-24B-A2B, a model optimized for local, low-latency tool dispatch, alongside LocalCowork, an open-source desktop agent application available in their Liquid4All GitHub Cookbook. The release provides a deployable architecture for running enterprise workflows entirely on-device, eliminating API calls and data egress for privacy-sensitive environments.
Architecture and Serving Configuration
To achieve low-latency execution on consumer hardware, LFM2-24B-A2B utilizes a Sparse Mixture-of-Experts (MoE) architecture. While the model contains 24 billion parameters in total, it only activates approximately 2 billion parameters per token during inference.
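The sparse-activation idea can be illustrated with a minimal top-k routing sketch. This is a generic MoE gate, not LFM2's actual (unpublished) routing code; the dimensions, gate, and expert functions below are all illustrative.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through a sparse MoE layer (illustrative sketch;
    the real LFM2-24B-A2B routing internals are not public)."""
    logits = x @ gate_w                      # one gating score per expert
    top_k = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only k experts execute, so per-token compute scales with k,
    # not with the total expert count.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [(lambda W: (lambda v: v @ W))(rng.standard_normal((d, d)))
           for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With 16 experts but k=2 active, only 2/16 of the expert weights participate in each forward pass, which is the same ratio trick that lets a 24B-parameter model run with ~2B active parameters per token.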
This structural design allows the model to maintain a broad knowledge base while significantly reducing the computational overhead required for each generation step. Liquid AI stress-tested the model using the following hardware and software stack:
- Hardware: Apple M4 Max, 36 GB unified memory, 32 GPU cores.
- Serving Engine: llama-server with flash attention enabled.
- Quantization: Q4_K_M GGUF format.
- Memory Footprint: ~14.5 GB of RAM.
- Hyperparameters: Temperature set to 0.1, top_p to 0.1, and max_tokens to 512 (optimized for deterministic, strict outputs).
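Since llama-server exposes an OpenAI-compatible HTTP API, the configuration above can be sketched as a request payload. The URL, port, and model alias below are placeholder assumptions; the hyperparameter values are the ones reported by Liquid AI.

```python
import json
import urllib.request

# Hyperparameters from the article; the endpoint path follows llama-server's
# OpenAI-compatible API. URL and model alias are placeholder assumptions.
payload = {
    "model": "lfm2-24b-a2b",          # assumed model alias
    "messages": [{"role": "user", "content": "List the files in ./reports"}],
    "temperature": 0.1,               # low temperature for deterministic output
    "top_p": 0.1,
    "max_tokens": 512,
}

def dispatch(url="http://localhost:8080/v1/chat/completions"):
    """POST the payload to a locally running llama-server instance."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# dispatch() needs a running llama-server, so it is not invoked here.
print(payload["temperature"], payload["max_tokens"])  # 0.1 512
```

The near-zero temperature and top_p push the sampler toward a single high-probability continuation, which matters when the output must be a strictly formatted tool call rather than free-form text.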
LocalCowork Tool Integration
LocalCowork is a fully offline desktop AI agent that uses the Model Context Protocol (MCP) to execute pre-built tools without relying on cloud APIs, and it logs every action to a local audit trail. The full system includes 75 tools across 14 MCP servers, covering tasks such as filesystem operations, OCR, and security scanning. The provided demo, however, focuses on a curated subset of 20 tools across 6 servers, each tested to exceed 80% single-step accuracy and verified to participate correctly in multi-step chains.
LocalCowork serves as the practical implementation of the model's tool-calling capability and comes pre-configured with a suite of enterprise-grade tools:
- File Operations: Listing, reading, and searching across the host filesystem.
- Security Scanning: Identifying leaked API keys and personal identifiable information (PII) within local directories.
- Document Processing: Executing Optical Character Recognition (OCR), parsing text, diffing contracts, and generating PDFs.
- Audit Logging: Recording every tool call locally for compliance tracking.
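The dispatch-and-audit pattern can be sketched in a few lines. The tool registry, file name, and log schema below are hypothetical stand-ins, not LocalCowork's actual implementation; the point is that every tool invocation is appended to a local file before the result is returned.

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit_log.jsonl")  # hypothetical local audit-trail file

# Hypothetical registry standing in for a few of LocalCowork's MCP tools.
TOOLS = {
    "list_files": lambda path=".": sorted(p.name for p in Path(path).iterdir()),
}

def call_tool(name, **kwargs):
    """Run a registered tool and append the call to a local audit trail."""
    result = TOOLS[name](**kwargs)
    entry = {"ts": time.time(), "tool": name, "args": kwargs}
    with AUDIT_LOG.open("a") as f:          # append-only JSONL for compliance
        f.write(json.dumps(entry) + "\n")
    return result

files = call_tool("list_files", path=".")
print(len(AUDIT_LOG.read_text().splitlines()) >= 1)  # True: the call was logged
```

Because the log is a plain local JSONL file, compliance review never requires shipping data off the machine, which is the property the article emphasizes for regulated environments.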
Performance Benchmarks
The Liquid AI team evaluated the model against a workload of 100 single-step tool-selection prompts and 50 multi-step chains (each requiring 3 to 6 discrete tool executions, such as searching a folder, running OCR, parsing data, deduplicating, and exporting).
Latency
The model averaged ~385 ms per tool-selection response. This sub-second dispatch time is highly suitable for interactive, human-in-the-loop applications where immediate feedback is necessary.
Accuracy
- Single-Step Executions: 80% accuracy.
- Multi-Step Chains: 26% end-to-end completion rate.
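A back-of-envelope calculation shows why the multi-step number falls so far below the single-step one. If each tool call in a chain succeeds independently at roughly the 80% single-step rate, the end-to-end completion probability compounds multiplicatively (an independence assumption the article does not make explicitly):

```python
# Back-of-envelope: if each tool call succeeds independently with p = 0.8,
# an n-step chain completes end-to-end with probability p**n.
p = 0.80
for n in range(3, 7):
    print(n, round(p ** n, 2))
# 3 0.51
# 4 0.41
# 5 0.33
# 6 0.26
```

A 6-step chain under this model completes only ~26% of the time, which matches the reported end-to-end rate almost exactly, suggesting the multi-step failures are largely accumulated single-step errors rather than a separate failure mode.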
Key Takeaways
- Privacy-First Local Execution: LocalCowork operates entirely on-device without cloud API dependencies or data egress, making it highly suitable for regulated enterprise environments requiring strict data privacy.
- Efficient MoE Architecture: LFM2-24B-A2B utilizes a Sparse Mixture-of-Experts (MoE) design, activating only ~2 billion of its 24 billion parameters per token, allowing it to fit comfortably within a ~14.5 GB RAM footprint using Q4_K_M GGUF quantization.
- Sub-Second Latency on Consumer Hardware: When benchmarked on an Apple M4 Max laptop, the model achieves an average latency of ~385 ms for tool-selection dispatch, enabling highly interactive, real-time workflows.
- Standardized MCP Tool Integration: The agent leverages the Model Context Protocol (MCP) to seamlessly connect with local tools—including filesystem operations, OCR, and security scanning—while automatically logging all actions to a local audit trail.
- Strong Single-Step Accuracy with Multi-Step Limits: The model achieves 80% accuracy on single-step tool execution but drops to a 26% success rate on multi-step chains, largely due to "sibling confusion" (selecting a similar but incorrect tool), indicating it currently works best in a guided, human-in-the-loop setting rather than as a fully autonomous agent.
Check out the Repo and Technical details.






