SHARK PROJECT SUMMARY
Current Status & Milestones
- Token
SHIVERlive on Base · contract address pending publication. - Initial Shiver mesh: 4 cold-running nodes across EU + NA, serving inference for a private alpha group of ~30 builders.
- First open model (SHARK Llama 3.3 70B) targeted for HuggingFace publication this quarter.
- Public API + node-client release timed with the open model drop.
- OpenAI-compatible
/v1/chatendpoint already serving the alpha — drop-in replacement for any app using the OpenAI Python or JS SDK. - Cold-tuning pipeline (our internal fine-tune toolchain) validated on Mistral 24B; multi-shard distributed training works across 8+ heterogeneous GPUs.
Summary
SHARK is a distributed AI compute and model-distribution project on Base. It exists because three things are converging at the same time, and the existing landscape doesn't solve all three at once:
- Frontier LLM weights are increasingly closed and over-aligned, leaving builders without serious open alternatives.
- Datacenter compute keeps getting more expensive while consumer GPUs sit idle 80%+ of the day around the world.
- Apps need inference at the edge — not in a single AWS region behind a rate-limited API with a 30-second timeout window.
The Shiver Network solves all three: cold-running consumer GPUs serve open uncensored models at sub-second latency, paid in $SHIVER. Builders pay roughly 30% less than centralized providers. Node operators get a clean way to monetize hardware that would otherwise burn power for nothing.
The brand is intentional: a shiver is the English collective noun for sharks, and it also literally describes how our nodes are configured to run — cold, thermally capped, never thrashing the silicon.
Why SHARK, why now
Three forces are reshaping the AI compute market right now:
- Hyperscaler capacity is capped. AWS, Azure, and GCP are sold out of H100s in most regions for the foreseeable future. Lead times on the next-gen B200 series are 6+ months. Centralized providers are price-takers on hardware they can't even acquire.
- Closed-frontier alignment is tightening. Every major lab has narrowed acceptable outputs, refused entire categories of queries, and added hidden system prompts that bias toward their preferred answers. For developers building serious products, this is a steerability crisis.
- Consumer GPUs are everywhere. Estimates put deployed RTX 30/40-series cards alone at ~80 million worldwide. Most of them sit idle ~22 hours per day. The aggregate compute dwarfs every hyperscaler combined.
SHARK pulls those threads together into a network where consumer hardware serves uncensored open weights, paid in a native token, with onchain attestation for billing. None of those pieces are new individually. Combined and shipped, they're a different shape of company.
SHARK Models
SHARK is trained on a pipeline of leading open-source frontier models with the goal of customizing alignment to the user — not to a corporate policy. We start from Meta Llama 3.3, Mistral, and Google Gemma checkpoints and apply our internal toolchain called cold-tuning which modifies the base model with a custom dataset emphasizing instruction-following over refusal.
This is done through a process called "fine-tuning" which modifies the base model using a custom dataset. We have developed a pipeline to create Unaligned & Unbiased versions of the best open-source AI models that are available to the public. Every weight is published to HuggingFace under shark-shiver/ — anyone can audit, fork, or self-host. We do not gate model access behind the Shiver Network; we just believe the Shiver is the cheapest place to run them.
Three families are currently in active development:
- SHARK Llama 3.3 70B — flagship reasoning + code model. 4× A100 or 1× H100. Targeting GPT-4-class on reasoning, faster on code.
- SHARK Mistral 24B Cold — fast cheap multilingual workhorse. Runs on a single 4090.
- SHARK Gemma 9B — lightweight, edge-friendly. Runs on a 3090 or any 16GB+ card.
Shiver Network — Distributed Inference & Training
We have created the Shiver Network for distributed AI compute. Anyone running the SHARK client contributes GPU cycles in exchange for $SHIVER payouts, denominated per token of inference served. Onboarding takes around 10 minutes; the client auto-detects GPU and proposes a thermal target you can adjust before running.
- Cold thermal target — GPUs run capped at a user-set temp ceiling, protecting hardware longevity and electricity bill.
- Pool routing — requests route to the lowest-latency cold node that can serve the requested model. No central dispatcher.
- Onchain payouts — every epoch, contributors receive $SHIVER proportional to their served-token count, weighted by quality verification.
- No staking, no slashing — joining costs nothing, leaving costs nothing. You can drain your earned $SHIVER and uninstall in under a minute.
- Heterogeneous mesh — small cards serve small models, big cards serve big models. The router handles fragmentation transparently.
Architecture
The Shiver is a three-layer system:
- Edge nodes — the GPU operators. Each runs the SHARK client, which advertises capacity (model slugs supported, throughput, thermal headroom) to the routing layer over a gossip protocol.
- Routing layer — a thin peer-to-peer overlay that maps an inbound inference request to the best-matching idle node based on model availability, latency, and reputation score.
- Settlement layer — onchain on Base. Every served token generates a signed attestation from both the caller and the serving node; these are batched and submitted to the rewards contract per epoch.
There is no central "shiver.cloud" inference server — api.shrk.cloud is a thin gateway that resolves to the same routing layer any peer can speak. Self-hosting the gateway is a documented and supported path.
Security model
Three threats matter:
- Node lies about output — mitigated by N=3 redundancy on a sample of requests and reputation-weighted routing. Repeated divergence from the consensus output cuts a node's reputation and its earnings.
- Caller exfiltrates a private prompt — model weights are public; prompt content never persists on a node beyond the inference call. Optional end-to-end encryption via a forthcoming TEE-backed mode.
- Sybil attacks for inflated rewards — every node submits a hardware attestation (driver-signed GPU fingerprint) at registration. Multiple nodes claiming the same hardware get folded into a single earning entity.
Roadmap
- Q3 — Alpha Shiver opens to public. SHARK Llama 3.3 70B drops on HuggingFace.
- Q4 — Node client + onchain payouts go live on Base. $SHIVER market opens.
- Q1 — Synthetic data pipeline online. First public dataset release. Multi-tenant on-node isolation.
- Q2 — SHARK Mistral 24B Cold drops alongside the Telegram + Discord bot. Vision API beta.
- Q3+1 — Distributed training MVP (multi-node fine-tunes paid in $SHIVER).
Economics & $SHIVER value accrual
$SHIVER is the unit of account inside the Shiver. Apps pay for inference in $SHIVER (or auto-converted USDC at the gateway); node operators receive $SHIVER. The protocol takes a 1% fee on inference revenue and uses it to buy-and-burn $SHIVER from the open market, creating a continuous value loop tied directly to compute demand.
Supply schedule is deliberately simple:
- Total supply: 500,000,000 $SHIVER (fixed, no future minting).
- 40% — node operator rewards pool, distributed over 4 years via the epoch payouts contract.
- 30% — initial LP on Base (Uniswap V4, 0.3% fee tier).
- 15% — team & contributors, 4-year linear vest with a 1-year cliff.
- 10% — ecosystem grants & strategic partners.
- 5% — public airdrop to alpha-period node operators and depositors.
Token: SHIVER · Supply: 500,000,000 · Chain: Base · Initial LP fee: 0.3%
Governance
Onchain parameter changes (protocol fee, epoch length, slashing thresholds, model allowlist) move through a simple veToken vote: lock $SHIVER for 1–4 years, vote weight proportional to lock duration. Smart-contract upgrades require a 7-day timelock visible onchain. No emergency pause; if the contracts break, we ship new ones and migrate.
Open source
The SHARK node client, the routing layer, and the settlement contracts are all open-source under the Apache 2.0 license. Model weights are released under their respective base-model licenses (Llama Community License for the Llama variants, Apache for Mistral/Gemma variants). No proprietary fork of any base model — every artifact is reproducible by any contributor.
FAQ
Is this just another decentralized inference network? The market is real but most projects in it are token-first, product-second. We're shipping the model + the network + the chat interface together — judge it on what's online, not the roadmap.
Why Base instead of Solana / Ethereum mainnet? Cheap settlement, Coinbase-backed sequencer, growing onchain identity tooling, and the most active retail flow of any L2 today. Solana's L1 fees are competitive but Base's tooling for production apps is years ahead.
What about regulation? Models are open weights, the network is decentralized, payouts are software-mediated. We don't host or moderate content. Operators are responsible for their own jurisdiction. The same way Ethereum nodes don't get sued because someone deployed a meme coin.
Can I run a node behind NAT? Yes — the routing layer holepunches via STUN. No port forwarding required.
Can I self-host the gateway? Yes — the gateway is a thin proxy; spinning your own up is one binary + a config file. Documented in Docs.