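Integrating LiteLLM with OpenClaw for Intelligent Model Routing: Hybrid Local + Cloud Deployment

Published 2026-04-01, updated 2026-05-26, filed under Notes

This post integrates LiteLLM Proxy with OpenClaw and uses a LiteLLM custom hook to route requests automatically: a small local model handles simple questions and a cloud reasoning model handles complex ones. Complexity routing is a LiteLLM feature, not a native OpenClaw capability.

Environment: OpenClaw v2026.4.1, oMLX, and Qwen3.5-4B-MLX-4bit are already installed.
Important: complexity-based routing is implemented with LiteLLM's async_pre_call_hook. OpenClaw itself cannot pick a model based on task complexity.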

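Analysis of OpenClaw's Native Routing Capabilities

OpenClaw native routing capabilities (v2026.4.1)

Routing type	Supported	Notes
Multi-agent routing	✅ Yes	Routes to different agents by channel/account/peer
Session routing	✅ Yes	Routing based on the session key (threads, topics)
Sub-agent dispatch	✅ Yes	Dispatches using requesterOrigin
Model failover	✅ Yes	agents.defaults.model.fallbacks
Complexity-based routing	❌ No	Cannot select a model automatically based on task complexity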

Why LiteLLM?

flowchart TD
    subgraph OpenClaw原生["OpenClaw native routing capabilities"]
        A1["1️⃣ Multi-Agent Routing"] --> A1_result{✓ Routes to different agents by user/channel<br/>✗ Cannot select a model by task complexity}
        A2["2️⃣ Model Failover"] --> A2_result{✓ Switches to a backup model on failure<br/>✗ Plain list-order switching<br/>Ignores task complexity}
        A3["3️⃣ Complexity Routing"] --> A3_result{✗ Not supported by OpenClaw<br/>✓ Requires a LiteLLM custom hook}
    end

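LiteLLM vs OpenClaw native features

Feature	LiteLLM + Hook	OpenClaw native
Complexity scoring	✅ Tokens + keywords + code detection	❌ Not supported
Intelligent routing	✅ Picks the local or cloud model automatically	❌ Not supported
Failover	✅ Multi-level automatic failover	✅ Basic failover
Cost tracking	✅ Unified billing	❌ Scattered billing
Single entry point	✅ One API endpoint	❌ Requires multiple configurations

Conclusion: OpenClaw's native routing targets multi-agent scenarios (different users/channels use different agents), not complexity routing (choosing a model by task difficulty). In this setup, complexity-based routing comes from the LiteLLM custom hook.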

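Core Value

LiteLLM Proxy plus a custom hook provides complexity-based routing between local and cloud models:

flowchart TB
    A["User request"] --> B["OpenClaw Gateway"]
    B --> C["LiteLLM Proxy<br/>(complexity routing layer)"]
    C --> D["async_pre_call_hook<br/>(complexity analysis)"]

    D --> D1["Token count"]
    D --> D2["Keyword pattern matching"]
    D --> D3["Complexity scoring"]

    D1 --> E{Score ≥ threshold?}
    D2 --> E
    D3 --> E

    E -->|yes| F["astron-code-latest<br/>(cloud)"]
    E -->|no| G["qwen-local<br/>(local)"]

    F --> H["Cloud model<br/>Astron Code<br/>(deep reasoning)"]
    G --> I["Local model<br/>oMLX Qwen3.5-4B<br/>(low latency)"]

Key advantages:

Complexity routing: the model is chosen automatically from task complexity (a LiteLLM hook feature)
Zero-cost handling of simple tasks: the local model answers simple queries
Automatic failover: requests switch to another model when one is unavailable
Unified cost tracking: spending across all models is visible in one place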

1. Architecture Overview

flowchart TB
    subgraph OpenClaw["OpenClaw Gateway (v2026.4.1 + skill-vetter)"]
        A["User request"]
    end

    A -->|/v1/chat/completions| B["LiteLLM Proxy<br/>(localhost:4000, complexity routing layer)"]

    subgraph LiteLLM["LiteLLM Proxy"]
        direction TB
        C["complexity_router.py<br/>• Complexity scoring<br/>• Keyword matching<br/>• Model routing decision"]
        D["config.yaml<br/>• Model list<br/>• Failover rules<br/>• Hook registration"]
    end

    B --> C
    B --> D

    C --> E{"Model selection"}
    E -->|simple task| F["Local model"]
    E -->|complex task| G["Cloud model"]

    subgraph Local["Local model"]
        F1["oMLX Qwen3.5-4B-MLX<br/>(localhost:8000)<br/>• Zero cost<br/>• Low latency"]
    end

    subgraph Cloud["Cloud model"]
        G1["Astron Code / Claude / GPT-4<br/>(api.astron.com, etc.)<br/>• Pay per use<br/>• Deep reasoning"]
    end

    F --> F1
    G --> G1

Important: complexity routing is implemented entirely by the custom hook in LiteLLM Proxy; OpenClaw only acts as the unified AI agent gateway.

2. Installing LiteLLM Proxy

# Install LiteLLM Proxy (with the proxy extras)
pip install 'litellm[proxy]'

# Or run it with Docker (the mounted config must be passed via --config)
docker run \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  -v "$(pwd)/complexity_router.py:/app/complexity_router.py" \
  -e DEEPSEEK_API_KEY="your-deepseek-key" \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml

3. Implementing the Complexity-Routing Hook

This is the core component: it routes each request according to the complexity of its prompt.

Important: the complexity router is a LiteLLM custom hook, not a native OpenClaw feature. Configure LiteLLM Proxy first, then point OpenClaw at the configured proxy.

3.1 Creating the Complexity Router

Create ~/.openclaw/litellm/complexity_router.py:

"""
LiteLLM complexity-routing hook.
Selects the local or the cloud model based on prompt complexity.

Note: this is implemented entirely in LiteLLM; OpenClaw has no complexity routing.

Simple tasks  -> local model (oMLX Qwen3.5-4B): zero cost, low latency
Complex tasks -> cloud model (astron-code-latest): deep reasoning
"""

import re
from typing import Any, Dict, Literal

from litellm.integrations.custom_logger import CustomLogger

class ComplexityRouter(CustomLogger):
    """
    Complexity router: picks a model from features of the prompt.
    """

    def __init__(self):
        super().__init__()
        # Complexity threshold: score >= threshold routes to the cloud
        self.threshold = 2

        # Keywords that suggest a complex task (each distinct match adds +1;
        # duplicates are removed so that no keyword is counted twice)
        self.complex_keywords = [
            # Chinese
            "分析", "比较", "设计", "架构", "推理", "证明", "评估", "总结",
            "深度", "详细", "全面", "系统", "研究", "解释原因", "为什么",
            "如何实现", "如何解决", "论述", "阐述", "判断",
            # English
            "analyze", "compare", "design", "architect", "reason", "prove",
            "evaluate", "synthesize", "comprehensive", "detailed", "explain why",
            "how to implement", "discuss", "assess", "critical thinking",
            # Math / logic
            "计算", "推导", "求解", "calculate", "derive", "solve",
            # Code-related
            "debug", "optimize", "refactor", "重构", "优化", "调试"
        ]

        # Keywords that suggest a simple task (each match subtracts 2)
        self.simple_keywords = [
            "你好", "hi", "hello", "嗨", "hey",
            "谢谢", "thanks", "thank you",
            "查询", "look up", "search",
            "翻译", "translate",
            "今天", "天气", "时间", "日期", "today", "weather", "time", "date",
            "问候", "greeting", "打招呼",
            "简单", "easy", "simple",
            "是什么", "what is", "who is", "define"
        ]

    def _calculate_complexity_score(self, text: str) -> int:
        """
        Compute the complexity score of a prompt.
        A higher value means a more complex prompt.
        """
        score = 0
        text_lower = text.lower()

        # 1. Token count (rough estimate: whitespace-separated words,
        #    so unsegmented Chinese text counts as very few tokens)
        tokens = len(text.split())
        if tokens > 1000:
            score += 4
        elif tokens > 500:
            score += 3
        elif tokens > 200:
            score += 2
        elif tokens > 50:
            score += 1

        # 2. Complex-task keywords (substring match, +1 each)
        for keyword in self.complex_keywords:
            if keyword.lower() in text_lower:
                score += 1

        # 3. Simple-task keywords (substring match, -2 each)
        for keyword in self.simple_keywords:
            if keyword.lower() in text_lower:
                score -= 2

        # 4. Code / technical content (+1 per indicator found)
        code_indicators = ["```", "def ", "class ", "function", "import ",
                          "代码", "code", "python", "javascript", "api"]
        for indicator in code_indicators:
            if indicator in text_lower:
                score += 1

        # 5. Number of question marks (ASCII and full-width)
        question_count = text.count('?') + text.count('？')
        if question_count >= 3:
            score += 2
        elif question_count >= 1:
            score += 1

        # 6. A bare greeting is always a simple task. Every pattern is anchored
        #    at both ends so a longer prompt that merely starts with a greeting
        #    is still scored normally.
        greeting_patterns = [
            r'^你好[吗呀啊]?\s*$', r'^hi\s*$', r'^hello\s*$', r'^hey\s*$',
            r'^嗨[吗呀啊]?\s*$'
        ]
        for pattern in greeting_patterns:
            if re.match(pattern, text.strip(), re.IGNORECASE):
                score = -10  # force the simple route

        return score

    async def async_pre_call_hook(
        self,
        user_api_key_dict: Any,
        cache: Any,
        data: Dict,
        call_type: Literal["completion", "text_completion", "embeddings",
                          "image_generation", "moderation", "audio_transcription"]
    ) -> Dict:
        """
        LiteLLM pre-call hook.
        Rewrites the model selection before the request is sent to a model.

        Args:
            user_api_key_dict: information about the caller's API key
            cache: cache instance
            data: request payload (contains messages, model, etc.)
            call_type: type of call

        Returns:
            The modified data, with the model field replaced.
        """
        # Only handle completion calls
        if call_type != "completion":
            return data

        # Collect the text of every message (system prompt and history included)
        messages = data.get("messages", [])
        text_content = ""

        for message in messages:
            if isinstance(message, dict):
                content = message.get("content", "")
                if isinstance(content, str):
                    text_content += content + " "
            elif isinstance(message, str):
                text_content += message + " "

        # Compute the complexity score
        complexity_score = self._calculate_complexity_score(text_content)

        # Routing decision
        if complexity_score >= self.threshold:
            # Complex task -> cloud model
            target_model = "astron-code-latest"
            route_reason = f"complex (score={complexity_score})"
        else:
            # Simple task -> local model
            target_model = "qwen-local"
            route_reason = f"simple (score={complexity_score})"

        # Log the routing decision
        print(f"[ComplexityRouter] {route_reason} → {target_model}")
        print(f"[ComplexityRouter] Text preview: {text_content[:100]}...")

        # Replace the model in the request
        data["model"] = target_model

        return data

# Module-level instance referenced from config.yaml
complexity_router_instance = ComplexityRouter()
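
The hook above scores the concatenation of every message, so a long system prompt or chat history raises the score of even a trivial follow-up. One possible refinement, shown only as a sketch, is to score the latest user message alone. The helper name last_user_text is hypothetical, and the handling of list-style content assumes OpenAI-style content parts of the form {"type": "text", "text": ...}.

```python
from typing import Any, Dict, List


def last_user_text(messages: List[Dict[str, Any]]) -> str:
    """Return the text of the most recent user message, or "" if there is none."""
    for message in reversed(messages):
        if not isinstance(message, dict) or message.get("role") != "user":
            continue
        content = message.get("content", "")
        if isinstance(content, str):
            return content
        if isinstance(content, list):
            # Assumed OpenAI-style content parts: keep only the text parts.
            parts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            return " ".join(parts)
        return ""  # content is None or another unsupported type
    return ""
```

Inside the hook, the result of last_user_text(messages) could be passed to _calculate_complexity_score in place of the concatenated text. Whether that is preferable depends on how much the conversation history should influence routing.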

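3.2 How the Complexity Score Is Computed

Feature	Score	Notes
Tokens > 1000	+4	Very long text
Tokens 501-1000	+3	Long text
Tokens 201-500	+2	Medium length
Tokens 51-200	+1	Short text
Complex keyword	+1 each	分析 (analyze), 比较 (compare), 设计 (design), etc.
Simple keyword	-2 each	你好 (hello), 查询 (look up), 翻译 (translate), etc.
Code content	+1 per indicator	Code markers such as def, import, 代码 (code)
Question marks	+1 or +2	+1 for one or two, +2 for three or more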
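
The token rows above depend on len(text.split()), which counts whitespace-separated words. Chinese text is normally written without spaces, so a long Chinese prompt can count as a single token and never reach the length thresholds. A rough alternative, given here as a sketch and not as a real tokenizer, counts each CJK character plus each run of ASCII letters or digits. The function name estimate_tokens and the choice of the basic CJK block are assumptions.

```python
import re

# Basic CJK Unified Ideographs block; each character counts as one token.
_CJK = re.compile(r"[\u4e00-\u9fff]")
# Each run of ASCII letters, digits, or underscores counts as one token.
_WORD = re.compile(r"[A-Za-z0-9_]+")


def estimate_tokens(text: str) -> int:
    """Rough token estimate: one per CJK character plus one per ASCII word."""
    return len(_CJK.findall(text)) + len(_WORD.findall(text))


sample = "分析人工智能对就业市场的影响"
print(len(sample.split()), estimate_tokens(sample))  # prints: 1 14
```

The thresholds in the table were chosen for word counts, so they would probably need retuning if this estimate replaced the original one.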

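Examples (scores computed from the scoring code in 3.1; a request needs a score of at least 2 to reach the cloud model, so a single complex keyword is not enough):

Input	Score	Route
“你好” (hello)	-10	Local (qwen-local)
“翻译 hello world” (translate hello world)	-4	Local (qwen-local)
“解释量子计算原理” (explain the principles of quantum computing)	1	Local (qwen-local)
“分析人工智能对就业市场的影响” (analyze the impact of AI on the job market)	1	Local (qwen-local)
“深度分析人工智能对就业市场的影响” (analyze in depth the impact of AI on the job market)	2	Cloud (astron-code-latest)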
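
Keyword scoring in 3.1 uses plain substring checks, so short English keywords also fire inside longer words: "hi" matches "this", "date" matches "update", and "api" matches "rapid", each costing 2 points. A hedged sketch of a stricter matcher follows. It requires word boundaries for ASCII keywords and keeps substring matching for Chinese ones. The function name count_keyword_hits is hypothetical, and this is a suggested variation, not the router's current behavior.

```python
import re
from typing import Iterable


def count_keyword_hits(text: str, keywords: Iterable[str]) -> int:
    """Count keywords present in text; ASCII keywords must match whole words."""
    text_lower = text.lower()
    hits = 0
    for keyword in keywords:
        kw = keyword.lower()
        if kw.isascii():
            # \b anchors the match at word boundaries, so "hi" no longer
            # matches inside "this". re.ASCII treats CJK characters as
            # non-word characters, so "你好hi" still counts as a hit.
            pattern = r"\b" + re.escape(kw) + r"\b"
            if re.search(pattern, text_lower, flags=re.ASCII):
                hits += 1
        elif kw in text_lower:
            # Chinese has no word delimiters, so substring matching is kept.
            hits += 1
    return hits
```

With this helper, a score could be built as count_keyword_hits(text, complex_keywords) - 2 * count_keyword_hits(text, simple_keywords), which mirrors the weights used in the original loops.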

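3.3 Tool-Aware Routing

OpenClaw's core capability is tool calling (function calling), and different tools place very different demands on the model. The routing recommendations for tool scenarios are:

Tool types and recommended models

Tool type	Example scenarios	Recommended model	Reason
Simple lookup	Weather, time, simple arithmetic	qwen-local (local)	Zero cost, low latency
File operations	Reading local files, listing directories	qwen-local (local)	Local data handling, fast response
File editing	Writing code, changing configuration, creating files	astron-code-latest (cloud)	Needs code completion and syntax understanding
Web search	Google search, news lookup	astron-code-latest (cloud)	Needs online access and up-to-date knowledge
Web content extraction	Scraping pages, parsing HTML	astron-code-latest (cloud)	Needs to understand and process structured data
Browser automation	Filling forms, clicking, crawling	astron-code-latest (cloud)	Needs multi-step reasoning and visual understanding
Code debugging	Fixing bugs, performance analysis	astron-code-latest (cloud)	Needs deep reasoning and complex analysis
Architecture design	System design, API planning	astron-code-latest (cloud)	Needs complex reasoning over multiple rounds
Multi-tool orchestration	Agent subtasks, multi-step workflows	astron-code-latest (cloud)	Needs long-horizon planning and state management

Tool-aware routing flowchart

flowchart TD
    A["User request"] --> B{Tool-call intent detected?}

    B -->|no| C{Text complexity assessment}
    C -->|simple| C1["qwen-local<br/>(local)"]
    C -->|complex| C2["astron-code-latest<br/>(cloud)"]

    B -->|yes| D{"Tool type?"}

    D -->|simple lookup / file read| E["qwen-local<br/>(local)"]
    D -->|file editing / web operations| F["astron-code-latest<br/>(cloud)"]
    D -->|code debugging / architecture design| G["astron-code-latest<br/>(cloud)"]
    D -->|multi-tool orchestration| H["astron-code-latest<br/>(cloud)"]

    C1 --> Z["Return result"]
    C2 --> Z
    E --> Z
    F --> Z
    G --> Z
    H --> Z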

Enhanced complexity router

Update complexity_router.py to support tool-aware routing:

"""
LiteLLM enhanced complexity-routing hook.
Supports tool-aware routing: the model is chosen from task type and complexity.

Routing strategy:
- Simple tasks      -> local model (oMLX Qwen3.5-4B)
- Tool operations   -> chosen by tool type
- Complex reasoning -> cloud model (astron-code-latest)
"""

import re
from typing import Any, Dict, List, Literal, Optional

from litellm.integrations.custom_logger import CustomLogger

class EnhancedComplexityRouter(CustomLogger):
    """
    Enhanced complexity router with tool-aware routing.
    """

    def __init__(self):
        super().__init__()

        # Complexity threshold for plain-text scoring
        self.text_threshold = 2

        # Mapping from tool tier to model
        self.tool_model_mapping = {
            # Simple tools -> local model
            "simple": {
                "tools": [
                    "weather", "time", "date", "calculator", "simple_search",
                    "read_file", "list_directory", "get_time", "currency_convert"
                ],
                "model": "qwen-local"
            },
            # Medium complexity -> cloud model
            "medium": {
                "tools": [
                    "write_file", "edit_file", "web_search", "web_scrape",
                    "image_generation", "document_parser", "api_call",
                    "send_message", "create_reminder"
                ],
                "model": "astron-code-latest"
            },
            # High complexity -> cloud model
            "complex": {
                "tools": [
                    "debug", "code_review", "architecture_design", "security_audit",
                    "multi_agent", "workflow_orchestration", "data_analysis",
                    "report_generation", "research_synthesis"
                ],
                "model": "astron-code-latest"
            }
        }

        # Keywords that suggest a complex task
        self.complex_keywords = [
            "分析", "比较", "设计", "架构", "推理", "证明", "评估", "总结",
            "深度", "详细", "全面", "系统", "研究", "解释原因", "为什么",
            "如何实现", "如何解决", "论述", "阐述", "判断",
            "analyze", "compare", "design", "architect", "reason", "prove",
            "evaluate", "synthesize", "comprehensive", "debug", "optimize"
        ]

        # Keywords that suggest a simple task
        self.simple_keywords = [
            "你好", "hi", "hello", "嗨", "hey",
            "谢谢", "thanks", "查询", "翻译", "天气", "时间",
            "问候", "打招呼", "是什么", "what is", "define"
        ]

    def _detect_tool_intent(self, text: str, messages: List[Dict]) -> Optional[str]:
        """
        Detect whether the request implies a tool call.
        Returns the tool tier: "simple", "medium", "complex", or None.
        """
        text_lower = text.lower()

        # Phrases that hint at tool use. Matching is by substring, and tiers
        # are checked in dict order, so "simple" wins when several tiers match.
        tool_indicators = {
            "simple": [
                "查一下天气", "现在几点了", "今天多少号", "帮我算一下",
                "weather", "time", "date", "what time is it"
            ],
            "medium": [
                "帮我写一个文件", "搜索一下", "查找网页", "发消息",
                "write", "search", "scrape", "send", "create file",
                "帮我发送", "生成图片"
            ],
            "complex": [
                "帮我debug", "代码审查", "架构设计", "安全审计",
                "帮我分析这段代码", "多步骤完成", "自动执行",
                "debug", "review", "architect", "audit", "analyze code",
                "帮我规划", "research"
            ]
        }

        for level, indicators in tool_indicators.items():
            for indicator in indicators:
                if indicator.lower() in text_lower:
                    return level

        # Look for tool calls in the recent message history
        for msg in messages[-3:]:  # the last 3 messages
            if isinstance(msg, dict):
                tool_calls = msg.get("tool_calls") or []
                # Map each called tool name to its tier
                for call in tool_calls:
                    func_name = call.get("function", {}).get("name", "").lower()
                    for level, config in self.tool_model_mapping.items():
                        if func_name in config["tools"]:
                            return level

        return None

    def _calculate_complexity_score(self, text: str) -> int:
        """
        Compute the complexity score of plain text.
        """
        score = 0
        text_lower = text.lower()

        # 1. Token count (whitespace-separated words)
        tokens = len(text.split())
        if tokens > 1000:
            score += 4
        elif tokens > 500:
            score += 3
        elif tokens > 200:
            score += 2
        elif tokens > 50:
            score += 1

        # 2. Complex keywords (+1 each)
        for keyword in self.complex_keywords:
            if keyword.lower() in text_lower:
                score += 1

        # 3. Simple keywords (-2 each)
        for keyword in self.simple_keywords:
            if keyword.lower() in text_lower:
                score -= 2

        # 4. Number of question marks (ASCII and full-width)
        question_count = text.count('?') + text.count('？')
        if question_count >= 3:
            score += 2
        elif question_count >= 1:
            score += 1

        # 5. Bare greeting (every pattern anchored at both ends)
        greeting_patterns = [
            r'^你好[吗呀啊]?\s*$', r'^hi\s*$', r'^hello\s*$', r'^hey\s*$'
        ]
        for pattern in greeting_patterns:
            if re.match(pattern, text.strip(), re.IGNORECASE):
                score = -10

        return score

    def _select_model_for_tool(self, tool_level: str) -> str:
        """
        Select a model for the given tool tier.
        """
        if tool_level in self.tool_model_mapping:
            return self.tool_model_mapping[tool_level]["model"]
        return "astron-code-latest"  # default to the cloud model

    async def async_pre_call_hook(
        self,
        user_api_key_dict: Any,
        cache: Any,
        data: Dict,
        call_type: Literal["completion", "text_completion", "embeddings",
                          "image_generation", "moderation", "audio_transcription"]
    ) -> Dict:
        """
        LiteLLM pre-call hook with tool-aware routing.
        """
        if call_type != "completion":
            return data

        messages = data.get("messages", [])
        text_content = ""

        for message in messages:
            if isinstance(message, dict):
                content = message.get("content", "")
                if isinstance(content, str):
                    text_content += content + " "
            elif isinstance(message, str):
                text_content += message + " "

        # 1. Check for tool-call intent first
        tool_level = self._detect_tool_intent(text_content, messages)

        if tool_level:
            # Tool-aware routing
            target_model = self._select_model_for_tool(tool_level)
            route_reason = f"tool-{tool_level}"
            print(f"[EnhancedRouter] {route_reason} → {target_model}")
        else:
            # 2. Fall back to plain-text complexity routing
            complexity_score = self._calculate_complexity_score(text_content)

            if complexity_score >= self.text_threshold:
                target_model = "astron-code-latest"
                route_reason = f"complex-text (score={complexity_score})"
            else:
                target_model = "qwen-local"
                route_reason = f"simple-text (score={complexity_score})"

            print(f"[EnhancedRouter] {route_reason} → {target_model}")

        data["model"] = target_model
        return data

# Module-level instance
enhanced_router_instance = EnhancedComplexityRouter()
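
In _detect_tool_intent the tiers are checked in dict order, simple first, so a request that mentions both the weather and debugging is classified as simple and sent to the local model. A possible variation, shown as a sketch with a shortened, hypothetical indicator table, checks the most demanding tier first so that a complex request is never downgraded.

```python
from typing import Dict, List, Optional

# Hypothetical, shortened indicator table in the same shape as tool_indicators.
TOOL_INDICATORS: Dict[str, List[str]] = {
    "simple": ["查一下天气", "现在几点了", "what time is it"],
    "medium": ["搜索一下", "create file", "生成图片"],
    "complex": ["帮我debug", "代码审查", "架构设计", "analyze code"],
}
# Most demanding tier first.
TIER_PRIORITY = ["complex", "medium", "simple"]


def detect_tool_tier(text: str) -> Optional[str]:
    """Return the most demanding tier whose indicator appears in the text."""
    text_lower = text.lower()
    for tier in TIER_PRIORITY:
        if any(ind.lower() in text_lower for ind in TOOL_INDICATORS[tier]):
            return tier
    return None
```

The trade-off is cost: checking complex first sends more mixed requests to the cloud model, which favors answer quality over savings.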

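Tool-scenario routing examples

The results below follow the enhanced router as written. A tool tier triggers only when one of its indicator strings appears verbatim in the text; otherwise the request falls back to text scoring.

User request	Tool detection	Routing decision	Model
“你好” (hello)	None	Simple text (score -10)	qwen-local
“查一下天气” (check the weather)	simple tool	Weather lookup	qwen-local
“查一下北京天气” (check the weather in Beijing)	None	Simple text (score -2)	qwen-local
“搜索一下OpenClaw最新消息” (search for the latest OpenClaw news)	medium tool	Web search	astron-code-latest
“帮我搜索OpenClaw最新消息” (same request, different wording)	None	Simple text (score 0)	qwen-local
“帮我debug这段代码” (help me debug this code)	complex tool	Code debugging	astron-code-latest
“这段代码有Bug帮我看看” (this code has a bug, take a look)	None	Simple text (score 0)	qwen-local
“分析这段Python代码的性能” (analyze this Python code's performance)	None	Simple text (score 1)	qwen-local
“帮我设计一个微服务架构” (design a microservice architecture for me)	None	Complex text (score 2)	astron-code-latest

Tool cost optimization

Tool type	Share of monthly calls	Model	Monthly cost (assuming 1,000 calls/day)
Simple tools	~40%	qwen-local	$0
Medium tools	~35%	astron-code-latest	~$15
Complex tools	~25%	astron-code-latest	~$25
Total	100%	Intelligent routing	~$40

Compared with an all-cloud setup: sending everything to astron-code-latest costs about $100 per month, so routing saves about 60%.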
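
The 60% figure follows directly from the per-tier estimates in the table. The short check below only reproduces that arithmetic; the dollar amounts are the article's own estimates, not measured prices.

```python
# Monthly cost per tool tier in USD, taken from the table above.
tier_costs = {"simple": 0.0, "medium": 15.0, "complex": 25.0}
all_cloud_cost = 100.0  # all-cloud baseline quoted in the text

routed_cost = sum(tier_costs.values())
savings = 1 - routed_cost / all_cloud_cost

print(f"routed: ${routed_cost:.0f}/month, savings: {savings:.0%}")
# prints: routed: $40/month, savings: 60%
```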

4. LiteLLM Configuration File

Create ~/.openclaw/litellm/config.yaml:

# ===========================================
# Model list
# ===========================================
model_list:
  # ─────────────────────────────────────────
  # Local model: oMLX Qwen3.5-4B
  # ─────────────────────────────────────────
  - model_name: qwen-local
    litellm_params:
      model: openai/qwen3.5-4b
      api_base: http://localhost:8000/v1
      api_key: "omlx-local"
      rpm: 60
    model_info:
      mode: chat
      supports_function_calling: false
      supports_vision: false

  # ─────────────────────────────────────────
  # Cloud model (DeepSeek provider)
  # ─────────────────────────────────────────
  - model_name: astron-code-latest
    litellm_params:
      model: deepseek/astron-code-latest
      api_key: os.environ/DEEPSEEK_API_KEY
      rpm: 60
    model_info:
      mode: chat
      supports_function_calling: true

  # Note: entries that share a model_name are treated by LiteLLM as one
  # load-balanced group, so this entry and the one above serve the same alias.
  - model_name: astron-code-latest
    litellm_params:
      model: deepseek/astron-code-latest
      api_key: os.environ/DEEPSEEK_API_KEY
      rpm: 10
    model_info:
      mode: chat
      supports_function_calling: false
      supports_reasoning: true

# ===========================================
# Complexity-routing hook
# ===========================================
litellm_settings:
  # Register the complexity-routing hook
  callbacks:
    - complexity_router.complexity_router_instance

  # Retries
  num_retries: 2
  request_timeout: 60

  # Failover
  fallbacks:
    # Local model fails -> cloud model
    - "qwen-local": ["astron-code-latest"]
    # Cloud model fails -> local model
    - "astron-code-latest": ["qwen-local"]

  # Fall back to the cloud model when the local context window overflows
  context_window_fallbacks:
    - "qwen-local": ["astron-code-latest"]

  # Number of consecutive failures allowed
  allowed_fails: 3

  # Drop parameters the target model does not support
  drop_params: true

# ===========================================
# Routing strategy
# ===========================================
router_settings:
  # Model choice is handled by the custom complexity hook, so simple-shuffle
  # only acts as the default strategy here
  routing_strategy: simple-shuffle
  timeout: 60

# ===========================================
# Service settings
# ===========================================
general_settings:
  master_key: sk-openclaw-local
  port: 4000
  host: 0.0.0.0
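
Fallback lists are easy to get wrong when model aliases are renamed: an alias can end up falling back to itself or to a name that is missing from model_list. The sketch below is a hypothetical sanity check that operates on plain Python data. It assumes the YAML has already been parsed into a set of alias names and a list of one-key dicts, and the function name check_fallbacks is invented for illustration.

```python
from typing import Dict, List, Set


def check_fallbacks(model_names: Set[str],
                    fallbacks: List[Dict[str, List[str]]]) -> List[str]:
    """Return human-readable problems found in a fallbacks list."""
    problems = []
    for entry in fallbacks:
        for source, targets in entry.items():
            if source not in model_names:
                problems.append(f"unknown source model: {source}")
            for target in targets:
                if target == source:
                    problems.append(f"{source} falls back to itself")
                elif target not in model_names:
                    problems.append(f"{source} -> unknown model: {target}")
    return problems


# Deliberately broken example input, for illustration only.
names = {"qwen-local", "astron-code-latest"}
broken = [{"astron-code-latest": ["astron-code-latest", "qwen-local"]}]
print(check_fallbacks(names, broken))
# prints: ['astron-code-latest falls back to itself']
```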

5. Environment Variables

Create ~/.openclaw/litellm/.env:

# Cloud API keys
DEEPSEEK_API_KEY=sk-your-deepseek-key

5.5 Checking the Complexity-Routing Hook Configuration

Before starting LiteLLM Proxy, confirm that the hook is set up correctly:

# 1. Make sure the router file exists
ls -la ~/.openclaw/litellm/complexity_router.py

# 2. Check the hook registration in config.yaml
grep -A 5 "callbacks" ~/.openclaw/litellm/config.yaml

# 3. Test that the hook imports
cd ~/.openclaw/litellm
python -c "from complexity_router import complexity_router_instance; print('Hook loaded successfully')"

6. Startup and Verification

6.1 Starting LiteLLM Proxy

cd ~/.openclaw/litellm

# Start the service
litellm --config ~/.openclaw/litellm/config.yaml

# Or run it in the background
nohup litellm --config ~/.openclaw/litellm/config.yaml \
  --detailed_debug > litellm.log 2>&1 &

6.2 Verifying the Service

# 1. Check health (the master key is required once master_key is set)
curl http://localhost:4000/health \
  -H "Authorization: Bearer sk-openclaw-local"

# 2. List the available models
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-openclaw-local"

# 3. Simple task (should be routed to the local model; the hook overrides "model")
curl -X POST http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-openclaw-local" \
  -d '{
    "model": "qwen-local",
    "messages": [{"role": "user", "content": "你好，请介绍一下自己"}]
  }'

# 4. Complex task (should be routed to the cloud model)
curl -X POST http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-openclaw-local" \
  -d '{
    "model": "astron-code-latest",
    "messages": [{"role": "user", "content": "深度分析人工智能对就业市场的影响，需要考虑技术进步、自动化、行业转型等多个维度"}]
  }'
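
The same chat request can be issued from Python with only the standard library, which is convenient for scripted checks. The sketch below mirrors the curl calls above and assumes the proxy address and master key from config.yaml. The function names build_chat_request and send are hypothetical, and send needs a running proxy.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:4000",
                       api_key: str = "sk-openclaw-local",
                       model: str = "qwen-local") -> urllib.request.Request:
    """Build (but do not send) a POST request for the proxy's chat endpoint."""
    body = json.dumps({
        "model": model,  # the routing hook may replace this value
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires a running proxy)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling send(build_chat_request("你好")) returns the decoded response body. The proxy log described in 6.3 remains the most direct place to confirm which model the hook selected.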

6.3 Checking the Logs to Confirm Routing

# Follow the LiteLLM log
tail -f ~/.openclaw/litellm/litellm.log | grep ComplexityRouter

# Example output (the exact scores depend on the prompt):
# [ComplexityRouter] simple (score=-10) → qwen-local
# [ComplexityRouter] complex (score=2) → astron-code-latest
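
To see how traffic actually splits between the two models, the router's log lines can be tallied. This is a minimal sketch that assumes the exact line format printed by ComplexityRouter in 3.1; the function name count_routes is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as: [ComplexityRouter] simple (score=-10) → qwen-local
ROUTE_LINE = re.compile(
    r"\[ComplexityRouter\] (?P<label>\w+) \(score=(?P<score>-?\d+)\) → (?P<model>\S+)"
)


def count_routes(lines: Iterable[str]) -> Counter:
    """Count routing decisions per target model from router log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = ROUTE_LINE.search(line)
        if match:
            counts[match.group("model")] += 1
    return counts
```

For example, opening litellm.log and passing the file object to count_routes yields a Counter keyed by model name, which can be compared with the local/cloud split assumed in the cost estimates.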

6.4 Verifying the Connection Used by OpenClaw

# Send the same kind of request OpenClaw will send to LiteLLM Proxy
curl -X POST http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-openclaw-local" \
  -d '{
    "model": "qwen-local",
    "messages": [{"role": "user", "content": "测试连接"}]
  }'

# Check that complexity routing takes effect
curl -X POST http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-openclaw-local" \
  -d '{
    "model": "astron-code-latest",
    "messages": [{"role": "user", "content": "深度分析人工智能发展趋势"}]
  }'

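7. OpenClaw Configuration

Important: the OpenClaw configuration only connects to the already configured LiteLLM Proxy. Complexity routing happens entirely in the LiteLLM layer; OpenClaw does not provide it.

Edit ~/.openclaw/openclaw.json: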

{
  "meta": {
    "lastTouchedVersion": "v2026.4.1",
    "lastTouchedAt": "2026-04-02T12:00:00.000Z",
    "platform": "apple-silicon",
    "mlx_optimized": true
  },
  "models": {
    "mode": "merge",
    "providers": {
      "local-omlx": {
        "baseUrl": "http://localhost:8000/v1",
        "apiKey": "omlx-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.5-4b-mlx-q4",
            "name": "Qwen3.5-4B-MLX-4bit",
            "contextWindow": 32768,
            "maxTokens": 4096,
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0 }
          }
        ]
      },
      "litellm": {
        "baseUrl": "http://localhost:4000",
        "apiKey": "sk-openclaw-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen-local",
            "name": "Qwen3.5-4B (LiteLLM)",
            "contextWindow": 32768,
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0 }
          },
          {
            "id": "astron-code-latest",
            "name": "DeepSeek-V3",
            "contextWindow": 64000,
            "reasoning": false,
            "input": ["text"]
          },
          {
            "id": "astron-code-latest",
            "name": "DeepSeek-R2",
            "contextWindow": 200000,
            "reasoning": true,
            "input": ["text"]
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "litellm/qwen-local",
        "fallbacks": [
          "litellm/astron-code-latest",
          "litellm/astron-code-latest"
        ]
      }
    }
  },
  "fault_tolerance": {
    "enable_failover": true,
    "max_retries": 2
  }
}

8. OpenClaw's Native Routing Capabilities

8.1 Multi-Agent Routing

OpenClaw can route to different agents by channel or user, which is a different concept from complexity routing:

{
  "channels": {
    "telegram": {
      "dmPolicy": "routing",
      "route": {
        "+8613812345678": "agent-fast",      // fast Q&A agent
        "+8613912345678": "agent-coding"      // coding agent
      }
    },
    "discord": {
      "dmPolicy": "routing",
      "route": {
        "user-alpha": "agent-fast",
        "user-beta": "agent-reasoning"
      }
    }
  }
}

Purpose: different users or scenarios use different agents; the model is not chosen by task complexity.

8.2 Model Failover

LiteLLM's complexity routing works together with OpenClaw's failover:

flowchart TB
    A["User request"] --> B["LiteLLM complexity routing<br/>(async_pre_call_hook)"]

    B --> C{Complexity score → model selection}
    C -->|simple| C1["qwen-local<br/>(local)"]
    C -->|complex| C2["astron-code-latest<br/>(cloud)"]

    C1 & C2 --> D{Model available?}
    D -->|yes| Z["Return result"]
    D -->|no| E["LiteLLM failover"]

    subgraph LiteLLM_Fallback["LiteLLM failover"]
        E["fallbacks configuration"]
        E --> E1["qwen-local → astron-code-latest"]
        E --> E2["astron-code-latest → qwen-local"]
    end

    E1 & E2 --> F{LiteLLM layer available?}
    F -->|no| G["OpenClaw Model Failover"]

    subgraph OpenClaw_Fallback["OpenClaw Model Failover"]
        G["agents.defaults.model.fallbacks"]
        G --> G1["litellm/qwen-local → litellm/astron-code-latest → ..."]
    end

    G1 --> H["Return result"]
    E1 & E2 --> H
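
The inner layer of this flow can be illustrated with a small simulation: try the routed model, then each configured fallback in order, and give up so that an outer layer can take over. This is a conceptual sketch only, not LiteLLM's or OpenClaw's actual implementation; call_with_fallbacks and the call parameter are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def call_with_fallbacks(model: str,
                        fallbacks: Dict[str, List[str]],
                        call: Callable[[str], str]) -> Optional[str]:
    """Try the routed model, then each configured fallback, in order."""
    for candidate in [model] + fallbacks.get(model, []):
        try:
            return call(candidate)
        except ConnectionError:
            continue  # this model is unavailable, move on to the next one
    return None  # every candidate failed; an outer layer would take over here
```

In the diagram, a None result corresponds to the branch where the LiteLLM layer is unavailable and OpenClaw's agents.defaults.model.fallbacks is consulted.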

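8.3 Routing Summary

Routing type	Implemented in	Notes
Complexity-based routing	LiteLLM hook	✅ The approach in this post
Multi-agent routing	OpenClaw	✅ By user/channel
Failover	Both layers	LiteLLM + OpenClaw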

9. End-to-End Test

# 1. Restart the OpenClaw Gateway
openclaw gateway restart

# 2. Check its status
openclaw gateway status

# 3. Run the health check
openclaw doctor

# 4. Test complexity routing (implemented by the LiteLLM hook)
# Simple task -> routed to the local model
openclaw run "你好"

# Complex task -> routed to the cloud model
openclaw run "深度分析量子计算对未来 cryptography 的影响"

# 5. View the logs
openclaw gateway logs --tail 100

# Also follow the LiteLLM log to confirm the routing decisions
tail -f ~/.openclaw/litellm/litellm.log | grep -E "(ComplexityRouter|EnhancedRouter)"

Checklist:

Verify that the complexity-routing hook in LiteLLM Proxy works
Confirm that OpenClaw can call LiteLLM Proxy
Check that failover takes effect

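10. Cost Savings

Task type	Share	Model	Cost
Simple tasks (greetings, lookups, etc.)	~60%	qwen-local (local)	$0
Complex tasks (analysis, reasoning, etc.)	~40%	astron-code-latest	Pay per use

Estimated monthly savings (assuming 100 requests per day):

Setup	Local share	Cloud share	Monthly cost
All cloud	0%	100%	~$30
Intelligent routing	60%	40%	~$12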
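
The table is consistent with a simple proportional model. It is sketched here under two assumptions: local requests cost nothing, and cloud cost scales linearly with the number of cloud-routed requests. The symbols are introduced only for illustration.

```latex
C_{\text{routed}} = (1 - p_{\text{local}})\, C_{\text{cloud}}
                  = (1 - 0.6) \times \$30 = \$12,
\qquad
\text{savings} = 1 - \frac{C_{\text{routed}}}{C_{\text{cloud}}} = p_{\text{local}} = 60\%
```

Under these assumptions the fraction saved equals the share of requests handled locally. If complex requests use more tokens per call than simple ones, the real saving would be smaller.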

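11. Architecture Summary

flowchart TB
    A["User request"] --> B["LiteLLM Proxy<br/>(complexity routing layer)"]

    subgraph Router["async_pre_call_hook (complexity analysis)"]
        B --> C["Input"]
        C --> C1["Token count"]
        C1 --> C2["Keyword matching"]
        C2 --> C3["Complexity score"]
        C3 --> C4{"Model selection"}

        C4 -->|score ≥ 2| C5["astron-code-latest<br/>(cloud)"]
        C4 -->|score < 2| C6["qwen-local<br/>(local)"]
    end

    B --> D["+ Automatic failover"]
    B --> E["+ Cost tracking"]

    C5 & C6 --> F["Model response"]

    subgraph LocalModel["Local model"]
        F -->|simple tasks ~60%| G["oMLX Qwen3.5-4B<br/>✓ Zero cost<br/>✓ Low latency<br/>✓ Data privacy"]
    end

    subgraph CloudModel["Cloud model"]
        F -->|complex tasks ~40%| H["Astron Code<br/>✓ Deep reasoning<br/>✓ Pay per use"]
    end

    G <-.->|failover| H

Division of responsibilities:

Component	Main responsibility	Scope
OpenClaw Gateway	AI agent gateway	Multi-agent routing, session management, single entry point for tool calls
LiteLLM Proxy	Model routing hub	Complexity analysis, model selection, failover, cost tracking
Local model (oMLX)	Simple tasks	Zero cost, fast response, data privacy
Cloud model (Astron)	Complex tasks	Deep reasoning, up-to-date knowledge, complex computation

Key point: complexity-based routing is implemented entirely by LiteLLM's custom hook; OpenClaw provides the unified management interface and tool-calling support.

Routing results:

Simple tasks (~60%) → local model (zero cost)
Complex tasks (~40%) → cloud model (pay per use)
On failure → automatic switch to the backup model
Monthly cost savings of about 60%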

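12. OpenClaw vs LiteLLM: Feature Comparison Summary

Feature	OpenClaw	LiteLLM	This setup
Complexity routing	❌ Not supported	✅ Via hook	✅ Through LiteLLM
Multi-agent routing	✅ By user/channel	❌ Not supported	✅ OpenClaw native
Unified tool calling	✅ Function calling	❌ Not supported	✅ OpenClaw native
Model failover	✅ Basic failover	✅ Advanced failover	✅ Both layers
Cost tracking	❌ Scattered billing	✅ Unified billing	✅ Unified in LiteLLM
Local model support	✅ Via provider	✅ Via proxy	✅ Both integrations

Best practices:

Use OpenClaw for multi-agent management and tool-call orchestration
Use LiteLLM for complexity routing and cost optimization
Combine the two for a complete AI agent solution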

Release History

Date	Version	Key updates
2026-04-01	v2026.4.1	/tasks background task board, local agent failover
2026-03-23	v2026.3.23	Qwen endpoints, UI improvements
2026-03-22	v2026.3.22	ClawHub first, MCP integration
2026-03-13	v2026.3.12	Gateway Dashboard v2, GPT-5.4/Claude fast mode
2026-03-08	v2026.3.7	Context Engine plugin interface, GPT-5.4 support