Liquid AI Releases LocalCowork Powered By LFM2-24B-A2B to Execute Privacy-First Agent Workflows Locally Via Model Context Protocol (MCP)


Liquid AI has released LFM2-24B-A2B, a model optimized for local, low-latency tool dispatch, alongside LocalCowork, an open-source desktop agent application available in their Liquid4All GitHub Cookbook. The release provides a deployable architecture for running enterprise workflows entirely on-device, eliminating API calls and data egress for privacy-sensitive environments.

Architecture and Serving Configuration

To achieve low-latency execution on consumer hardware, LFM2-24B-A2B utilizes a Sparse Mixture-of-Experts (MoE) architecture. While the model contains 24 billion parameters in total, it only activates approximately 2 billion parameters per token during inference.
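The routing mechanism behind this sparse activation can be illustrated with a minimal sketch. The expert count and top-k value below are purely illustrative (Liquid AI's actual routing configuration is not described here); the point is that a gating function selects a small subset of experts per token, so only a fraction of the total parameters participate in each forward step.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # illustrative only; the real expert count is not disclosed here
TOP_K = 2         # experts activated per token (illustrative)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:TOP_K]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

# One token's router logits -> a small set of (expert_id, gate_weight) pairs.
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)
print(assignment)
```

Because only the chosen experts run, the per-token compute scales with the ~2B active parameters rather than the full 24B.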

This structural design allows the model to maintain a broad knowledge base while significantly reducing the computational overhead required for each generation step. Liquid AI stress-tested the model using the following hardware and software stack:

  • Hardware: Apple M4 Max, 36 GB unified memory, 32 GPU cores.
  • Serving Engine: llama-server with flash attention enabled.
  • Quantization: Q4_K_M GGUF format.
  • Memory Footprint: ~14.5 GB of RAM.
  • Hyperparameters: Temperature set to 0.1, top_p to 0.1, and max_tokens to 512 (optimized for deterministic, strict outputs).
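The sampling settings above can be reproduced against a locally running llama-server instance, which exposes an OpenAI-compatible chat endpoint. The sketch below assumes the server's default port (8080); the endpoint URL and prompt are assumptions, while the hyperparameters mirror the benchmark configuration.

```python
import json
import urllib.request

# Assumed: llama-server running locally with its OpenAI-compatible API on the default port.
ENDPOINT = "http://localhost:8080/v1/chat/completions"

def build_request(user_prompt):
    """Mirror the deterministic sampling settings used in the benchmark."""
    return {
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": 0.1,
        "top_p": 0.1,
        "max_tokens": 512,
    }

def dispatch(user_prompt):
    """Send a tool-selection prompt to the local server (requires a running instance)."""
    body = json.dumps(build_request(user_prompt)).encode()
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("List all PDF files in the contracts folder")
print(payload["temperature"], payload["top_p"], payload["max_tokens"])
```

The near-zero temperature and top_p push the model toward a single high-probability continuation, which is what strict tool-dispatch outputs require.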

LocalCowork Tool Integration

LocalCowork is a fully offline desktop AI agent that uses the Model Context Protocol (MCP) to execute pre-built tools without relying on cloud APIs, and it logs every action to a local audit trail. The full system includes 75 tools across 14 MCP servers, handling tasks such as filesystem operations, OCR, and security scanning. The provided demo, however, focuses on a curated subset of 20 tools across 6 servers, each rigorously tested to achieve over 80% single-step accuracy and verified participation in multi-step chains.

LocalCowork serves as the practical implementation of the model's tool-calling capabilities. It ships pre-configured with a suite of enterprise-grade tools:

  • File Operations: Listing, reading, and searching across the host filesystem.
  • Security Scanning: Identifying leaked API keys and personally identifiable information (PII) within local directories.
  • Document Processing: Executing Optical Character Recognition (OCR), parsing text, diffing contracts, and generating PDFs.
  • Audit Logging: Recording every tool call locally for compliance tracking.
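A minimal sketch of the dispatch-plus-audit pattern these tools share is shown below. This is not LocalCowork's actual code; the registry, tool name, and audit file path are all hypothetical, illustrating only how a local tool call can be executed and appended to an on-disk audit trail in one step.

```python
import json
import os
import time

AUDIT_LOG = "audit.jsonl"  # hypothetical local audit-trail file

TOOLS = {}  # name -> callable; a stand-in for tools exposed over MCP servers

def tool(name):
    """Register a function as a locally executable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("fs.list")  # hypothetical tool name for a filesystem listing
def list_dir(path="."):
    return sorted(os.listdir(path))

def call_tool(name, log_path=AUDIT_LOG, **kwargs):
    """Dispatch a tool call, then append the action to the local audit trail."""
    result = TOOLS[name](**kwargs)
    entry = {"ts": time.time(), "tool": name, "args": kwargs}
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return result
```

Every call leaves a JSON line in the audit file, which is the property that makes a fully local agent auditable for compliance purposes.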

Performance Benchmarks

The Liquid AI team evaluated the model against a workload of 100 single-step tool selection prompts and 50 multi-step chains (requiring 3 to 6 discrete tool executions, such as searching a folder, running OCR, parsing data, deduplicating, and exporting).

Latency

The model averaged ~385 ms per tool-selection response. This sub-second dispatch time is highly suitable for interactive, human-in-the-loop applications where immediate feedback is necessary.

Accuracy

  • Single-Step Executions: 80% accuracy.
  • Multi-Step Chains: 26% end-to-end completion rate.
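The gap between the two numbers is roughly what compounding predicts: if each step succeeds independently with probability 0.80, an n-step chain completes end-to-end with probability 0.80^n, which for the 3-to-6-step chains in the benchmark spans about 51% down to 26%. The independence assumption is a simplification (the report attributes failures partly to sibling confusion, not purely random error), but the arithmetic shows why chain length dominates:

```python
# Back-of-the-envelope: per-step success p compounds over an n-step chain.
p = 0.80
chain_success = {n: round(p ** n, 3) for n in range(3, 7)}
print(chain_success)  # {3: 0.512, 4: 0.41, 5: 0.328, 6: 0.262}
```

The observed 26% end-to-end rate sits at the low end of this range, consistent with the longer chains failing most often.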

Key Takeaways

  • Privacy-First Local Execution: LocalCowork operates entirely on-device without cloud API dependencies or data egress, making it highly suitable for regulated enterprise environments requiring strict data privacy.
  • Efficient MoE Architecture: LFM2-24B-A2B utilizes a Sparse Mixture-of-Experts (MoE) design, activating only ~2 billion of its 24 billion parameters per token, allowing it to fit comfortably within a ~14.5 GB RAM footprint using Q4_K_M GGUF quantization.
  • Sub-Second Latency on Consumer Hardware: When benchmarked on an Apple M4 Max laptop, the model achieves an average latency of ~385 ms for tool-selection dispatch, enabling highly interactive, real-time workflows.
  • Standardized MCP Tool Integration: The agent leverages the Model Context Protocol (MCP) to seamlessly connect with local tools—including filesystem operations, OCR, and security scanning—while automatically logging all actions to a local audit trail.
  • Strong Single-Step Accuracy with Multi-Step Limits: The model achieves 80% accuracy on single-step tool execution but drops to a 26% end-to-end success rate on multi-step chains, largely due to 'sibling confusion' (selecting a similar but incorrect tool). This indicates it currently works best in a guided, human-in-the-loop setting rather than as a fully autonomous agent.

Check out the Repo and Technical details.



