Multi-Node Targeting¶

Workflows can distribute execution across multiple edge nodes based on capabilities, location, load, and trust.

Targeting Strategies¶

1. Automatic (Local)¶

Let the gateway choose based on tool availability:

{
  "target": { "kind": "local" }
}

The gateway will:

- Discover nodes with the required tool
- Select based on availability and load
- Fail over to backup nodes automatically
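The selection steps above could be sketched roughly as follows. This is an illustration only, not the gateway's actual algorithm: the `Node` record, its fields (`tools`, `online`, `load`), the `rank_candidates` helper, and the extra node IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node record; the field names are assumptions for illustration.
    node_id: str
    tools: set = field(default_factory=set)
    online: bool = True
    load: float = 0.0  # 0.0 = idle, 1.0 = saturated


def rank_candidates(nodes, tool_name):
    """Return online nodes that offer tool_name, least-loaded first."""
    candidates = [n for n in nodes if n.online and tool_name in n.tools]
    return sorted(candidates, key=lambda n: n.load)


nodes = [
    Node("local:abc123", {"pii_redact"}, load=0.7),
    Node("local:def456", {"pii_redact", "summarize"}, load=0.2),
    Node("local:ghi789", {"pii_redact"}, online=False),
]

# The first entry is the preferred node; the rest act as failover backups.
ranked = rank_candidates(nodes, "pii_redact")
print([n.node_id for n in ranked])  # ['local:def456', 'local:abc123']
```

Returning the full ranked list rather than a single node keeps the backups available for failover.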

2. Explicit Node ID¶

Target a specific node by ID:

{
  "target": {
    "kind": "explicit",
    "node_id": "local:abc123"
  }
}

Use when:

- Node has specialized hardware (GPS, camera)
- Deterministic execution required
- Testing specific device behavior

3. Capability-Based¶

Target nodes with specific capabilities:

{
  "target": {
    "kind": "capability",
    "required_capabilities": [
      "tool:pii_redact",
      "hardware:gps",
      "platform:android"
    ]
  }
}

The gateway selects only nodes that advertise every listed capability.
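The "match all" rule amounts to a subset check. A minimal sketch, assuming capabilities are compared as plain strings (the `matches_all` helper and the two example nodes are hypothetical):

```python
def matches_all(node_capabilities, required_capabilities):
    """True when the node advertises every required capability."""
    return set(required_capabilities) <= set(node_capabilities)


required = ["tool:pii_redact", "hardware:gps", "platform:android"]

# Hypothetical capability lists for two nodes.
phone = ["tool:pii_redact", "hardware:gps", "hardware:camera", "platform:android"]
laptop = ["tool:pii_redact", "platform:macos"]

print(matches_all(phone, required))   # True
print(matches_all(laptop, required))  # False
```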

4. Geographic¶

Target nodes by location:

{
  "target": {
    "kind": "geographic",
    "center": { "lat": 37.7749, "lon": -122.4194 },
    "radius_meters": 5000
  }
}

Use cases:

- Field operations (nearest node to work site)
- Emergency response (nodes in affected area)
- Compliance (data must stay in region)
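One plausible way to evaluate a geographic target is a great-circle distance check. The sketch below uses the haversine formula; whether the gateway uses this exact formula is an assumption, and `haversine_m` and `in_radius` are hypothetical helper names.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_radius(node_location, target):
    """True when the node lies within target['radius_meters'] of the center."""
    center = target["center"]
    distance = haversine_m(center["lat"], center["lon"],
                           node_location["lat"], node_location["lon"])
    return distance <= target["radius_meters"]


target = {
    "kind": "geographic",
    "center": {"lat": 37.7749, "lon": -122.4194},
    "radius_meters": 5000,
}

# A node at the center matches; one across the bay (over 10 km away) does not.
print(in_radius({"lat": 37.7749, "lon": -122.4194}, target))  # True
print(in_radius({"lat": 37.8044, "lon": -122.2712}, target))  # False
```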

5. Load-Balanced¶

Distribute work across multiple nodes:

{
  "target": {
    "kind": "load_balanced",
    "min_nodes": 2,
    "max_nodes": 5
  }
}

The gateway will:

- Split work across available nodes
- Balance based on current load
- Aggregate results
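To make the role of `min_nodes` and `max_nodes` concrete, here is a simplified sketch. It splits items round-robin rather than by current load, so it is a stand-in for the gateway's balancing, not a description of it; `split_work` is a hypothetical helper.

```python
def split_work(items, node_ids, min_nodes, max_nodes):
    """Assign items round-robin to at most max_nodes nodes.

    Raises ValueError when fewer than min_nodes nodes are available.
    """
    if len(node_ids) < min_nodes:
        raise ValueError(
            f"need at least {min_nodes} nodes, only {len(node_ids)} available"
        )
    chosen = node_ids[:max_nodes]
    batches = {node_id: [] for node_id in chosen}
    for index, item in enumerate(items):
        batches[chosen[index % len(chosen)]].append(item)
    return batches


# Three nodes available, but max_nodes=2 caps the fan-out.
batches = split_work(list(range(5)), ["node_a", "node_b", "node_c"],
                     min_nodes=2, max_nodes=2)
print(batches)  # {'node_a': [0, 2, 4], 'node_b': [1, 3]}
```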

Multi-Step Workflows¶

Different steps can target different nodes:

{
  "workflow_id": "multi_node_demo",
  "steps": [
    {
      "step_id": "capture",
      "tool_name": "capture_photo",
      "target": {
        "kind": "capability",
        "required_capabilities": ["hardware:camera"]
      }
    },
    {
      "step_id": "process",
      "tool_name": "image_analysis",
      "target": {
        "kind": "capability",
        "required_capabilities": ["tool:image_analysis", "gpu:available"]
      }
    },
    {
      "step_id": "store",
      "tool_name": "upload_result",
      "target": {
        "kind": "explicit",
        "node_id": "gateway:main"
      }
    }
  ]
}

Execution flow:

1. Mobile device captures photo
2. GPU-equipped laptop analyzes image
3. Gateway stores result
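The per-step resolution can be pictured as a small planner that maps each step's `target` to a node. This is a sketch under assumptions: the node inventory, the IDs `android:phone` and `local:laptop`, and the `resolve_target` and `plan` helpers are hypothetical, and only the `explicit` and `capability` kinds are handled.

```python
def resolve_target(target, nodes):
    """Return a node_id for the target, or None when nothing matches."""
    if target["kind"] == "explicit":
        return target["node_id"] if target["node_id"] in nodes else None
    if target["kind"] == "capability":
        required = set(target["required_capabilities"])
        for node_id, capabilities in nodes.items():
            if required <= set(capabilities):
                return node_id
        return None
    raise ValueError(f"unsupported target kind: {target['kind']}")


def plan(workflow, nodes):
    """Pair every step_id with the node chosen for it."""
    return [(step["step_id"], resolve_target(step["target"], nodes))
            for step in workflow["steps"]]


# Hypothetical inventory: node_id -> advertised capabilities.
nodes = {
    "android:phone": ["hardware:camera"],
    "local:laptop": ["tool:image_analysis", "gpu:available"],
    "gateway:main": [],
}
workflow = {
    "workflow_id": "multi_node_demo",
    "steps": [
        {"step_id": "capture", "target": {
            "kind": "capability", "required_capabilities": ["hardware:camera"]}},
        {"step_id": "process", "target": {
            "kind": "capability",
            "required_capabilities": ["tool:image_analysis", "gpu:available"]}},
        {"step_id": "store", "target": {
            "kind": "explicit", "node_id": "gateway:main"}},
    ],
}
print(plan(workflow, nodes))
```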

Node Discovery¶

Automatic Discovery (UDP Multicast)¶

Each edge node announces its presence over UDP multicast with a payload such as:

{
  "node_id": "local:abc123",
  "capabilities": ["tool:pii_redact", "tool:summarize"],
  "platform": "macos",
  "location": { "lat": 37.7749, "lon": -122.4194 }
}
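A node-side announcement might look like the sketch below. The multicast group, port, and TTL are assumptions (the real values are not given here), and `encode_announcement` and `announce` are hypothetical helpers; only the payload fields come from the example above.

```python
import json
import socket

MCAST_GROUP = "239.255.0.1"  # assumed group address, not from the docs
MCAST_PORT = 50000           # assumed port, not from the docs


def encode_announcement(node_id, capabilities, platform, location):
    """Serialize the presence payload as UTF-8 JSON bytes."""
    payload = {
        "node_id": node_id,
        "capabilities": capabilities,
        "platform": platform,
        "location": location,
    }
    return json.dumps(payload).encode("utf-8")


def announce(data, group=MCAST_GROUP, port=MCAST_PORT):
    """Send one datagram to the multicast group (TTL 1 keeps it on the LAN)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(data, (group, port))


data = encode_announcement(
    "local:abc123",
    ["tool:pii_redact", "tool:summarize"],
    "macos",
    {"lat": 37.7749, "lon": -122.4194},
)
# announce(data)  # uncomment on a network where multicast is permitted
```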

The gateway caches the discovered topology, which you can inspect with a mesh scan:

curl http://127.0.0.1:8787/v1/mesh_scan

{
  "nodes": [
    {
      "node_id": "local:abc123",
      "last_seen": "2024-01-27T10:30:00Z",
      "capabilities": ["tool:pii_redact"],
      "trusted": true
    }
  ]
}
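A client could use the `last_seen` and `trusted` fields to discard stale or untrusted entries. The sketch below assumes a 60-second freshness window, which is an arbitrary choice for illustration; `fresh_trusted_nodes` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def fresh_trusted_nodes(scan, now, max_age=timedelta(seconds=60)):
    """Return IDs of trusted nodes seen within max_age of now."""
    result = []
    for node in scan["nodes"]:
        # Replace the trailing "Z" so fromisoformat works on older Pythons.
        last_seen = datetime.fromisoformat(node["last_seen"].replace("Z", "+00:00"))
        if node.get("trusted") and now - last_seen <= max_age:
            result.append(node["node_id"])
    return result


scan = {
    "nodes": [
        {
            "node_id": "local:abc123",
            "last_seen": "2024-01-27T10:30:00Z",
            "capabilities": ["tool:pii_redact"],
            "trusted": True,
        }
    ]
}

now = datetime(2024, 1, 27, 10, 30, 30, tzinfo=timezone.utc)
print(fresh_trusted_nodes(scan, now))  # ['local:abc123']
```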

Manual Registration¶

Add nodes manually when multicast is unavailable:

curl -X POST http://127.0.0.1:8787/v1/nodes/register \
  -H 'Content-Type: application/json' \
  -d '{
    "node_id": "remote:xyz789",
    "endpoint": "https://192.168.50.134:8000",
    "capabilities": ["tool:unit_convert"]
  }'
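The same registration can be scripted with Python's standard library. This sketch only builds the request; it assumes the endpoint accepts a JSON body with a `Content-Type: application/json` header, and `build_register_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_register_request(base_url, node_id, endpoint, capabilities):
    """Build (but do not send) a POST request for /v1/nodes/register."""
    body = json.dumps({
        "node_id": node_id,
        "endpoint": endpoint,
        "capabilities": capabilities,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/nodes/register",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_register_request(
    "http://127.0.0.1:8787",
    "remote:xyz789",
    "https://192.168.50.134:8000",
    ["tool:unit_convert"],
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` sends it to a running gateway.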

Failover Behavior¶

Automatic Failover¶

If the primary node fails, the gateway tries alternatives:

{
  "target": { "kind": "local" },
  "retry": {
    "max_attempts": 3,
    "failover": true
  }
}

Flow:

1. Gateway selects Node A
2. Node A offline → Try Node B
3. Node B fails → Try Node C
4. All nodes fail → Return error with execution path
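The flow above can be sketched as a loop over candidate nodes. Several details are assumptions made for illustration: `max_attempts` is treated as the total number of nodes tried, a failure is modeled as an `OSError` (which covers connection and timeout errors), and the `error` field and `run_with_failover` helper are hypothetical.

```python
def run_with_failover(candidates, execute, max_attempts=3):
    """Try candidates in order until one succeeds or attempts run out."""
    path = []
    for node_id in candidates[:max_attempts]:
        try:
            result = execute(node_id)
        except OSError:
            path.append(f"{node_id} (failed)")
            continue
        path.append(f"{node_id} (success)")
        # "degraded" marks a success that needed at least one failover.
        return {"ok": True, "result": result,
                "execution_path": path, "degraded": len(path) > 1}
    return {"ok": False, "error": "all nodes failed", "execution_path": path}


def fake_execute(node_id):
    # Stand-in for a real remote call: node_a is offline in this example.
    if node_id == "local:node_a":
        raise ConnectionError("node offline")
    return f"handled by {node_id}"


outcome = run_with_failover(
    ["local:node_a", "local:node_b", "local:node_c"], fake_execute
)
print(outcome["execution_path"])
# ['local:node_a (failed)', 'local:node_b (success)']
```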

Execution Path Tracking¶

Results include which nodes were attempted:

{
  "ok": true,
  "execution_path": [
    "local:node_a (failed)",
    "local:node_b (success)"
  ],
  "degraded": true
}
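For tooling that consumes these results, the path entries can be split into a node ID and a status. The sketch assumes the entry format shown above, a node ID optionally followed by a parenthesized status; `parse_path` is a hypothetical helper.

```python
import re

# A node ID, optionally followed by " (status)".
ENTRY = re.compile(r"^(?P<node>\S+)(?: \((?P<status>\w+)\))?$")


def parse_path(execution_path):
    """Split each entry into (node_id, status); status is None if absent."""
    parsed = []
    for entry in execution_path:
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognized execution_path entry: {entry!r}")
        parsed.append((match["node"], match["status"]))
    return parsed


path = ["local:node_a (failed)", "local:node_b (success)"]
print(parse_path(path))
# [('local:node_a', 'failed'), ('local:node_b', 'success')]
```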

Offline Multi-Node Demo¶

Scenario: execute a workflow across Android and Raspberry Pi nodes on a LAN (no internet).

Workflow Definition¶

{
  "workflow_id": "offline_multinode_demo",
  "steps": [
    {
      "step_id": "redact",
      "tool_name": "pii_redact",
      "tool_args": { "text": "Email me at john@example.com" },
      "target": {
        "kind": "explicit",
        "node_id": "android:device_a"
      }
    },
    {
      "step_id": "convert",
      "tool_name": "unit_convert",
      "tool_args": { "value": 10, "from_unit": "miles", "to_unit": "km" },
      "target": {
        "kind": "explicit",
        "node_id": "rpi:device_b"
      }
    }
  ]
}

Execution¶

python demos/offline_multinode_demo.py

Result:

{
  "ok": true,
  "verified": true,
  "execution_path": [
    "android:device_a",
    "rpi:device_b"
  ],
  "timeline": [
    {
      "step": "redact",
      "node": "Android Device A",
      "timestamp": "2024-01-27T10:30:00Z",
      "duration_ms": 45,
      "verified": true
    },
    {
      "step": "convert",
      "node": "Raspberry Pi B",
      "timestamp": "2024-01-27T10:30:01Z",
      "duration_ms": 12,
      "verified": true
    }
  ],
  "degraded": false
}
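A result like this is straightforward to check in a script. The sketch below condenses the timeline into a few summary values; `summarize` is a hypothetical helper, and the input repeats only the fields it needs from the result above.

```python
def summarize(result):
    """Condense a workflow result into a few checkable facts."""
    timeline = result["timeline"]
    return {
        "all_verified": result["verified"] and all(s["verified"] for s in timeline),
        "total_duration_ms": sum(s["duration_ms"] for s in timeline),
        "steps": [s["step"] for s in timeline],
    }


result = {
    "ok": True,
    "verified": True,
    "timeline": [
        {"step": "redact", "duration_ms": 45, "verified": True},
        {"step": "convert", "duration_ms": 12, "verified": True},
    ],
    "degraded": False,
}
print(summarize(result))
# {'all_verified': True, 'total_duration_ms': 57, 'steps': ['redact', 'convert']}
```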

Best Practices¶

1. Use Automatic Targeting for Development¶

{
  "target": { "kind": "local" }
}

This is the simplest approach: the gateway handles node discovery, selection, and failover.

2. Use Explicit Targeting for Production Workflows¶

{
  "target": {
    "kind": "explicit",
    "node_id": "production:primary"
  }
}

Explicit targeting keeps execution deterministic and predictable.

3. Enable Failover for Critical Workflows¶

{
  "retry": {
    "max_attempts": 3,
    "failover": true
  }
}

Failover favors resilience over speed: each failed attempt adds latency before the workflow completes.

4. Monitor Execution Paths¶

Review which nodes handled requests:

jq '.execution_path' audit/audit.log
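For aggregate views, a short script can count how often each node appears. This sketch assumes the audit log holds one JSON object per line with an `execution_path` array, which the `jq` command suggests but does not confirm; `count_nodes` is a hypothetical helper.

```python
import json
from collections import Counter


def count_nodes(lines):
    """Count appearances of each node ID across execution_path entries."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for entry in record.get("execution_path", []):
            # Drop a trailing status such as " (failed)" if present.
            counts[entry.split(" ", 1)[0]] += 1
    return counts


sample = [
    '{"execution_path": ["local:node_a (failed)", "local:node_b (success)"]}',
    '{"execution_path": ["local:node_b (success)"]}',
]
print(count_nodes(sample))
# Counter({'local:node_b': 2, 'local:node_a': 1})
```

To run it against the real log, pass an open file handle: `with open("audit/audit.log") as f: count_nodes(f)`.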

Next Steps¶

Failover & Retry - Build resilient workflows
Defining Workflows - Workflow structure
Policy Enforcement - Add governance
Mesh Transport - Network layer details