
Liquid AI has released LFM2-24B-A2B, a model optimized for local, low-latency tool dispatch, alongside LocalCowork, an open-source desktop agent application available in their Liquid4All GitHub Cookbook. The release provides a deployable architecture for running enterprise workflows entirely on-device, eliminating API calls and data egress for privacy-sensitive environments.
Architecture and Serving Configuration
To achieve low-latency execution on consumer hardware, LFM2-24B-A2B utilizes a Sparse Mixture-of-Experts (MoE) architecture. While the model contains 24 billion parameters in total, it only activates approximately 2 billion parameters per token during inference.
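The sparse-activation idea can be illustrated with a minimal top-k routing sketch. This is a generic MoE gate, not LFM2's actual (unpublished) routing code; the dimensions, gate, and expert functions below are all illustrative.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through a sparse MoE layer (illustrative sketch;
    the real LFM2-24B-A2B routing internals are not public)."""
    logits = x @ gate_w                      # one gating score per expert
    top_k = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only k experts execute, so per-token compute scales with k,
    # not with the total expert count.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [(lambda W: (lambda v: v @ W))(rng.standard_normal((d, d)))
           for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With 16 experts but k=2 active, only 2/16 of the expert weights participate in each forward pass, which is the same ratio trick that lets a 24B-parameter model run with ~2B active parameters per token.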
This structural design allows the model to maintain a broad knowledge base while significantly reducing the computational overhead required for each generation step. Liquid AI stress-tested the model using the following hardware and software stack:
- Hardware: Apple M4 Max, 36 GB unified memory, 32 GPU cores.
- Serving Engine: llama-server with flash attention enabled.
- Quantization: Q4_K_M GGUF format.
- Memory Footprint: ~14.5 GB of RAM.
- Hyperparameters: Temperature set to 0.1, top_p to 0.1, and max_tokens to 512 (optimized for deterministic, strict outputs).
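Since llama-server exposes an OpenAI-compatible HTTP API, the configuration above can be sketched as a request payload. The URL, port, and model alias below are placeholder assumptions; the hyperparameter values are the ones reported by Liquid AI.

```python
import json
import urllib.request

# Hyperparameters from the article; the endpoint path follows llama-server's
# OpenAI-compatible API. URL and model alias are placeholder assumptions.
payload = {
    "model": "lfm2-24b-a2b",          # assumed model alias
    "messages": [{"role": "user", "content": "List the files in ./reports"}],
    "temperature": 0.1,               # low temperature for deterministic output
    "top_p": 0.1,
    "max_tokens": 512,
}

def dispatch(url="http://localhost:8080/v1/chat/completions"):
    """POST the payload to a locally running llama-server instance."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# dispatch() needs a running llama-server, so it is not invoked here.
print(payload["temperature"], payload["max_tokens"])  # 0.1 512
```

The near-zero temperature and top_p push the sampler toward a single high-probability continuation, which matters when the output must be a strictly formatted tool call rather than free-form text.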
LocalCowork Tool Integration
LocalCowork is a fully offline desktop AI agent that uses the Model Context Protocol (MCP) to execute pre-built tools without relying on cloud APIs, and it logs every action to a local audit trail. The full system includes 75 tools across 14 MCP servers, covering tasks such as filesystem operations, OCR, and security scanning. The provided demo, however, focuses on a curated subset of 20 tools across 6 servers, each tested to exceed 80% single-step accuracy and verified to participate correctly in multi-step chains.
LocalCowork serves as the practical implementation of the model's tool-calling capability and comes pre-configured with a suite of enterprise-grade tools:
- File Operations: Listing, reading, and searching across the host filesystem.
- Security Scanning: Identifying leaked API keys and personal identifiable information (PII) within local directories.
- Document Processing: Executing Optical Character Recognition (OCR), parsing text, diffing contracts, and generating PDFs.
- Audit Logging: Recording every tool call locally for compliance tracking.
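The dispatch-and-audit pattern can be sketched in a few lines. The tool registry, file name, and log schema below are hypothetical stand-ins, not LocalCowork's actual implementation; the point is that every tool invocation is appended to a local file before the result is returned.

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit_log.jsonl")  # hypothetical local audit-trail file

# Hypothetical registry standing in for a few of LocalCowork's MCP tools.
TOOLS = {
    "list_files": lambda path=".": sorted(p.name for p in Path(path).iterdir()),
}

def call_tool(name, **kwargs):
    """Run a registered tool and append the call to a local audit trail."""
    result = TOOLS[name](**kwargs)
    entry = {"ts": time.time(), "tool": name, "args": kwargs}
    with AUDIT_LOG.open("a") as f:          # append-only JSONL for compliance
        f.write(json.dumps(entry) + "\n")
    return result

files = call_tool("list_files", path=".")
print(len(AUDIT_LOG.read_text().splitlines()) >= 1)  # True: the call was logged
```

Because the log is a plain local JSONL file, compliance review never requires shipping data off the machine, which is the property the article emphasizes for regulated environments.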
Performance Benchmarks
The Liquid AI team evaluated the model against a workload of 100 single-step tool-selection prompts and 50 multi-step chains (each requiring 3 to 6 discrete tool executions, such as searching a folder, running OCR, parsing data, deduplicating, and exporting).
Latency
The model averaged ~385 ms per tool-selection response. This sub-second dispatch time is highly suitable for interactive, human-in-the-loop applications where immediate feedback is necessary.
Accuracy
- Single-Step Executions: 80% accuracy.
- Multi-Step Chains: 26% end-to-end completion rate.
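A back-of-envelope calculation shows why the multi-step number falls so far below the single-step one. If each tool call in a chain succeeds independently at roughly the 80% single-step rate, the end-to-end completion probability compounds multiplicatively (an independence assumption the article does not make explicitly):

```python
# Back-of-envelope: if each tool call succeeds independently with p = 0.8,
# an n-step chain completes end-to-end with probability p**n.
p = 0.80
for n in range(3, 7):
    print(n, round(p ** n, 2))
# 3 0.51
# 4 0.41
# 5 0.33
# 6 0.26
```

A 6-step chain under this model completes only ~26% of the time, which matches the reported end-to-end rate almost exactly, suggesting the multi-step failures are largely accumulated single-step errors rather than a separate failure mode.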
Key Takeaways
- Privacy-First Local Execution: LocalCowork operates entirely on-device without cloud API dependencies or data egress, making it highly suitable for regulated enterprise environments requiring strict data privacy.
- Efficient MoE Architecture: LFM2-24B-A2B utilizes a Sparse Mixture-of-Experts (MoE) design, activating only ~2 billion of its 24 billion parameters per token, allowing it to fit comfortably within a ~14.5 GB RAM footprint using Q4_K_M GGUF quantization.
- Sub-Second Latency on Consumer Hardware: When benchmarked on an Apple M4 Max laptop, the model achieves an average latency of ~385 ms for tool-selection dispatch, enabling highly interactive, real-time workflows.
- Standardized MCP Tool Integration: The agent leverages the Model Context Protocol (MCP) to seamlessly connect with local tools—including filesystem operations, OCR, and security scanning—while automatically logging all actions to a local audit trail.
- Strong Single-Step Accuracy with Multi-Step Limits: The model achieves 80% accuracy on single-step tool execution but drops to a 26% success rate on multi-step chains, largely due to "sibling confusion" (selecting a similar but incorrect tool), indicating it currently works best in a guided, human-in-the-loop setting rather than as a fully autonomous agent.
Check out the Repo and Technical details.






