QUICKSTART

Get an inference call working in 60 seconds. All you need is a $SHIVER API key from the dashboard.

curl https://api.shrk.cloud/v1/chat \
  -H "Authorization: Bearer YOUR_SHIVER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "shark-llama-3.3-70b",
    "messages": [{"role":"user","content":"hello shiver"}]
  }'

The response shape matches OpenAI's chat completions API, so SHARK works as a drop-in provider swap. See Migrate from OpenAI below for the change needed in the OpenAI Python and JS SDKs.

Authentication

Every request needs a bearer key in the Authorization header. Keys are issued on the dashboard, prefixed shvr_, and scoped to a single billing wallet. Keys can be rotated and revoked at any time.
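
As a minimal sketch of the header format described above, the helper below builds the Authorization header and rejects strings that lack the shvr_ prefix. The function name auth_headers is illustrative and not part of any SHARK SDK.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers for a bearer key (helper name is hypothetical)."""
    # Dashboard keys are prefixed shvr_, so anything else is most likely a
    # copy-paste mistake worth catching before the request is sent.
    if not api_key.startswith("shvr_"):
        raise ValueError("expected a key starting with 'shvr_'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Checking the prefix locally is only a convenience; the server remains the authority on whether a key is valid, rotated, or revoked.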

Three key tiers, all priced the same per token but with different concurrency:

For client-side use, prefer the gateway's session-token endpoint (POST /v1/sessions), which mints a short-lived, browser-safe token from a server-held parent key.

Migrate from OpenAI in one line

Python (openai 1.x):

from openai import OpenAI
client = OpenAI(
    base_url="https://api.shrk.cloud/v1",
    api_key="YOUR_SHIVER_KEY",
)
r = client.chat.completions.create(
    model="shark-llama-3.3-70b",
    messages=[{"role":"user","content":"hi"}],
)
print(r.choices[0].message.content)

TypeScript / Node:

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.shrk.cloud/v1",
  apiKey: process.env.SHIVER_KEY,
});
const r = await client.chat.completions.create({
  model: "shark-mistral-24b-cold",
  messages: [{ role: "user", content: "hi" }],
});
console.log(r.choices[0].message.content);

That's it. Everything else (streaming, function calling, JSON mode, response formatting) works identically.

Chat completions

Endpoint: POST https://api.shrk.cloud/v1/chat

Request body:

Response shape matches OpenAI chat.completions exactly, plus two extra response headers:

Streaming responses (SSE)

Set stream: true to get token-by-token output as a text/event-stream response. Each chunk is a line of JSON prefixed with data: , and the stream ends with data: [DONE].

data: {"choices":[{"delta":{"content":"hel"}}]}
data: {"choices":[{"delta":{"content":"lo"}}]}
data: {"choices":[{"finish_reason":"stop"}]}
data: [DONE]

The OpenAI Python and JS SDKs handle this transparently: set stream=True (stream: true in JS) and iterate over the chunks the call returns.
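
For clients that read the stream without an SDK, the sketch below parses lines in the format shown above and reassembles the text. The function name iter_deltas is illustrative, and the sample is the four-line stream from this section rather than live output.

```python
import json
from typing import Iterable, Iterator

def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines in the format shown above."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The final chunk carries finish_reason and no delta.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text

sample = [
    'data: {"choices":[{"delta":{"content":"hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: {"choices":[{"finish_reason":"stop"}]}',
    'data: [DONE]',
]
print("".join(iter_deltas(sample)))  # hello
```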

Function calling

Define tools in the same format as OpenAI's tools API. The model returns tool_calls in the assistant message; you execute the function on your end and feed the result back as a tool-role message.

{
  "model": "shark-llama-3.3-70b",
  "messages": [{"role":"user","content":"weather in eindhoven"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type":"string"}},
        "required": ["city"]
      }
    }
  }]
}
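
The round trip described above (the model returns tool_calls, you run the function, you send back a tool-role message) might look like the sketch below. It assumes the assistant message follows OpenAI's tool_calls layout, where arguments arrive as a JSON-encoded string. The get_weather body, its return value, and the sample assistant message are hand-written placeholders, not real model output.

```python
import json

def get_weather(city: str) -> dict:
    """Stand-in implementation; a real one would query a weather service."""
    return {"city": city, "temp_c": 12}  # made-up value for illustration

TOOLS = {"get_weather": get_weather}

def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool and build the tool-role replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # Arguments are a JSON string in the OpenAI format, not a dict.
        args = json.loads(call["function"]["arguments"])
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies

# Hand-written example of an assistant message (not real model output).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Eindhoven"}'},
    }],
}
print(run_tool_calls(assistant_message))
```

To continue the conversation, append the assistant message and then the replies to messages and send the request again.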

Vision (beta)

Vision input is in beta on shark-llama-3.3-70b-vision and shark-gemma-9b-vision. Pass image URLs or base64 data inline in the user message content array:

"messages": [{
  "role": "user",
  "content": [
    {"type": "text", "text": "what's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://…"}}
  ]
}]
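
For the base64 case, a minimal sketch of building the message is shown below. It assumes inline image data is passed as a data: URL in the url field, as in OpenAI's API; this section does not spell out the exact encoding, so treat that as an assumption. The helper name image_message is illustrative.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message with inline base64 image data (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: data: URLs are accepted here, as in OpenAI's API.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }

# Placeholder bytes stand in for a real image file read from disk.
msg = image_message("what's in this image?", b"\x89PNG\r\n")
```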

Pricing: vision tokens are billed at 2× the text-token rate.

Embeddings

Endpoint: POST https://api.shrk.cloud/v1/embeddings. Currently serving shark-embed-large-v1 (1536 dims, multilingual). Same request shape as OpenAI text-embedding-3-large.

{
  "model": "shark-embed-large-v1",
  "input": ["text to embed", "another text"]
}
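
A common next step is comparing the returned vectors. The sketch below assumes the response mirrors OpenAI's embeddings layout (a data list whose items carry index and embedding); this section only states that the request shape matches, so the response layout is an assumption. The toy response uses 3-dimensional vectors for readability.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vectors(response: dict) -> list[list[float]]:
    """Return embeddings ordered by input index (assumes OpenAI's layout)."""
    items = sorted(response["data"], key=lambda d: d["index"])
    return [d["embedding"] for d in items]

# Toy 3-dimensional response; real vectors would have 1536 dimensions.
fake = {"data": [
    {"index": 1, "embedding": [0.0, 1.0, 0.0]},
    {"index": 0, "embedding": [1.0, 0.0, 0.0]},
]}
first, second = vectors(fake)
print(cosine_similarity(first, second))  # 0.0
```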

Available models

All weights also live on Hugging Face under shark-shiver/, so you're free to download and self-host if you don't want to use the Shiver Network.

Sampling parameters

Sensible defaults if not set: temperature=0.7, top_p=1.0, max_tokens=4096. The vision and embedding endpoints ignore most sampling params.
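
The server applies these defaults when a parameter is omitted. A client-side sketch like the one below can make the effective values explicit, for example when logging requests; the helper name with_defaults is illustrative.

```python
# Defaults as stated above.
DEFAULTS = {"temperature": 0.7, "top_p": 1.0, "max_tokens": 4096}

def with_defaults(body: dict) -> dict:
    """Return a copy of the request body with unset sampling params filled in."""
    # Later keys win, so anything the caller set overrides the default.
    return {**DEFAULTS, **body}

body = with_defaults({"model": "shark-llama-3.3-70b", "temperature": 0.2})
print(body["temperature"], body["top_p"], body["max_tokens"])  # 0.2 1.0 4096
```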

Run a node

One-liner install (Linux + macOS):

curl -fsSL https://shrk.cloud/install | sh

The installer will:

  1. Detect your GPU(s) and propose a safe thermal cap (default 70 °C).
  2. Ask you for the Base address that should receive $SHIVER payouts.
  3. Pull the latest node binary and register your endpoint with the routing layer.
  4. Start serving inference in the background as a systemd service.

Monitor with shrk status. Stop with shrk stop. No staking, no penalty for stopping.

Multi-GPU & advanced node configs

The client supports a few patterns out of the box:

All config can also live in /etc/shrk/config.toml; the env vars override it.

Billing & $SHIVER

Inference is priced per million tokens, denominated in $SHIVER. Prices are roughly 30% lower than centralized providers at the same model size, and they stay low because new node supply keeps coming online.

Indicative pricing (per 1M tokens, may vary with $SHIVER price):

The protocol takes a 1% fee on every inference call. That fee is converted to $SHIVER and burned, creating deflationary pressure tied directly to network usage.
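
To make the arithmetic concrete, the sketch below computes the cost of one call and the portion burned. It assumes the 1% fee is taken out of the amount paid rather than added on top, which this section does not specify. The price of 2 $SHIVER per 1M tokens is a made-up figure for illustration, not a quoted rate.

```python
from decimal import Decimal

FEE_RATE = Decimal("0.01")  # 1% protocol fee, from the text above

def call_cost(tokens: int, price_per_million: Decimal) -> tuple[Decimal, Decimal]:
    """Return (total cost, burned fee) in $SHIVER for one call."""
    cost = Decimal(tokens) / Decimal(1_000_000) * price_per_million
    # Assumption: the fee is a share of the cost, not a surcharge.
    return cost, cost * FEE_RATE

# Hypothetical price: 2 $SHIVER per 1M tokens.
cost, burned = call_cost(500_000, Decimal("2"))
print(cost, burned)
```

With these placeholder numbers, a 500,000-token call costs 1 $SHIVER, of which 0.01 $SHIVER is burned.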

Limits & SLAs

Error codes

Every error response includes a code field, a human-readable message, and an x-shiver-request-id header you can include in bug reports.
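
A small sketch of turning an error response into a line you can paste into a bug report follows. It assumes code and message are top-level fields of the JSON body; if they are nested, adjust the lookups. The sample values are placeholders, not real error codes.

```python
import json

def describe_error(body: str, headers: dict[str, str]) -> str:
    """Format an error response into one line suitable for a bug report."""
    payload = json.loads(body)
    # Assumption: code and message sit at the top level of the JSON body.
    code = payload.get("code", "unknown")
    message = payload.get("message", "")
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    request_id = lowered.get("x-shiver-request-id", "n/a")
    return f"[{code}] {message} (request id: {request_id})"

# Hand-written example; the code value and request id are placeholders.
print(describe_error(
    '{"code": "example_code", "message": "example message"}',
    {"X-Shiver-Request-Id": "req_123"},
))
```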

SDKs

Because the API is OpenAI-compatible, the official OpenAI SDKs work out of the box with base_url overridden. Dedicated SDKs are in development:

Webhooks

Subscribe to events in the dashboard. Available event types:

Every webhook is signed with an HMAC-SHA256 header (x-shrk-signature) so you can verify origin.
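
Verification could be sketched as below, using only Python's standard library. It assumes the header carries the hex-encoded HMAC-SHA256 of the raw request body, keyed with a shared webhook secret; the exact encoding and the secret's format are not stated here, so confirm both before relying on this. The secret and payload are placeholders.

```python
import hashlib
import hmac

def verify_signature(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Check an x-shrk-signature value against the raw request body."""
    # Assumption: the header is the hex-encoded HMAC-SHA256 of the body.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(expected, header_value)

secret = b"whsec_example"  # placeholder secret
body = b'{"type": "example"}'  # placeholder payload
good = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, good))      # True
print(verify_signature(secret, body, "0" * 64))  # False
```

Compute the digest over the bytes exactly as received; re-serializing parsed JSON can change whitespace or key order and break the comparison.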

Community

Talk to the team and other node operators in the SHARK Telegram and Discord. Bug reports and pull requests are welcome on the open-source repos. Anyone shipping with the API or running a node is welcome to join a free monthly office-hours call.