Skip to content

GoCraftling/Craftling

Repository files navigation

craftling-go

The control plane and host agent for a multi-host, Firecracker-microVM Minecraft hosting platform. A Gin HTTP API owns users, auth, and game-server desired state; a reconciler drives each server onto a fleet host; and a per-host agent boots it as a real Firecracker microVM. See docs/PLAN.md for the phased roadmap (P0–P4 landed; P6 networking in progress).

Requirements

  • Go 1.26+
  • PostgreSQL 13+
  • For the real VM backend (AGENT_RUNTIME=firecracker): a Linux host with /dev/kvm, the firecracker binary, a vmlinux kernel, and rootfs images.
  • For (re)building the eBPF dataplane: clang + libbpf + bpftool on a Linux ≥6.6 host (see docs/ebpf-nat-dataplane.md).

Getting started

cp .env.example .env   # set DATABASE_URL, JWT_SECRET, etc.

# Spin up Postgres (example via Docker):
docker run -d --name craftling-pg -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=craftling -p 5432:5432 postgres:16-alpine

make run               # or: go run ./cmd/server

The server listens on :8080 by default and applies the embedded goose migrations (internal/db/migrations) automatically on startup — clean on both a fresh database and one already at an older revision.

Configuration

Control plane (make run)

Variable Default Description
PORT 8080 HTTP listen port
APP_ENV development production → release mode + JSON logs
DATABASE_URL postgres://postgres:postgres@localhost:5432/craftling?sslmode=disable Postgres connection string
JWT_SECRET dev-secret-change-me HMAC signing secret (set in prod)
ACCESS_TTL 15m Access-token (JWT) lifetime
REFRESH_TTL 720h (30d) Refresh-token lifetime
ADMIN_EMAIL (unset) If set with ADMIN_PASSWORD, seeds/promotes this admin on startup
ADMIN_PASSWORD (unset) Admin bootstrap password
TEMPLATE_INDEX_URL (unset) Marketplace registry index the /templates API fetches from

Host agent (make run-agent, MODE=agent)

Variable Default Description
CONTROL_PLANE_URL http://localhost:8080 Control plane the agent registers + heartbeats with
ADVERTISE_ADDR localhost:9000 Agent API address the control plane calls back
ADVERTISE_HOST 127.0.0.1 Player-facing connect host VMs report
AGENT_RUNTIME fake VM backend: fake (in-memory) or firecracker (real microVMs)
FC_BINARY firecracker Firecracker executable (PATH lookup by default)
FC_KERNEL (required) Uncompressed vmlinux all VMs boot
FC_IMAGE_DIR (required) Directory of per-version minecraft-<version>.ext4 rootfs images
FC_DEFAULT_IMAGE (unset) Fallback rootfs filename when a version has no image
FC_WORK_DIR OS temp dir Per-VM working dirs (sockets, writable rootfs, logs)

AGENT_ID, AGENT_HOSTNAME, ZONE, CPUS_TOTAL, MEMORY_MB_TOTAL, and AGENT_VERSION further describe the host in its registration.

Endpoints

Method Path Auth Description
GET /healthz Liveness probe
GET /api/v1/ping Example endpoint
POST /api/v1/auth/register Create user, returns a token pair
POST /api/v1/auth/login Verify credentials, returns a pair
POST /api/v1/auth/refresh Rotate refresh token, returns a pair
POST /api/v1/auth/logout Revoke a refresh token (204)
GET /api/v1/me Bearer Current authenticated user
POST /api/v1/servers Bearer Create a game server
GET /api/v1/servers Bearer List your game servers
GET /api/v1/servers/:id Bearer Get one of your servers
PATCH /api/v1/servers/:id Bearer Rename, or set desired_state (running/stopped)
DELETE /api/v1/servers/:id Bearer Tear down a server (202)
GET /api/v1/templates Bearer List marketplace templates
GET /api/v1/templates/:id Bearer Get one template's manifest
GET /api/v1/admin/users Admin List all users (role admin only)
GET /api/v1/admin/servers Admin List all servers across all owners
GET /api/v1/admin/hosts Admin List the fleet
POST /api/v1/agent/hosts/register Agent Host registers itself
POST /api/v1/agent/hosts/:id/heartbeat Agent Host liveness + capacity heartbeat

Auth endpoints return a token pair:

{
  "access_token": "<jwt>",
  "refresh_token": "<opaque>",
  "token_type": "Bearer",
  "expires_in": 900
}

The short-lived access token is sent as Authorization: Bearer <access_token> on protected routes. When it expires, exchange the refresh token at /auth/refresh for a new pair. Refresh tokens are rotated on every use (the old one is revoked); replaying a rotated token is treated as theft and revokes all of that user's tokens.

# Register (or login) to get a token pair
PAIR=$(curl -s -X POST localhost:8080/api/v1/auth/register \
  -H 'Content-Type: application/json' \
  -d '{"email":"alice@example.com","password":"hunter2pass"}')
ACCESS=$(echo "$PAIR" | jq -r .access_token)
REFRESH=$(echo "$PAIR" | jq -r .refresh_token)

# Call a protected route
curl localhost:8080/api/v1/me -H "Authorization: Bearer $ACCESS"

# Later: rotate for a fresh pair
curl -s -X POST localhost:8080/api/v1/auth/refresh \
  -H 'Content-Type: application/json' \
  -d "{\"refresh_token\":\"$REFRESH\"}"

# Log out (revoke the refresh token)
curl -s -X POST localhost:8080/api/v1/auth/logout \
  -H 'Content-Type: application/json' \
  -d "{\"refresh_token\":\"$REFRESH\"}"

Roles

Each user has a single role (user by default, or admin). The role is embedded in the access-token claims, and RequireRole middleware guards admin routes — so authorization is checked without a database round-trip.

Because the role lives in the JWT, a role change takes effect on the user's next token refresh (within ACCESS_TTL), not instantly.

Set ADMIN_EMAIL + ADMIN_PASSWORD to bootstrap an admin on startup: a matching user is created if absent, or promoted to admin if it already exists (its password is left untouched).

Game servers & the reconciler

A game server separates desired state (running / stopped / deleted, set by the API) from observed status (pendingprovisioningrunningstoppingstoppeddeleting, plus error). A background reconciler (internal/reconciler) ticks periodically, finds servers whose status doesn't match their desired state, and drives them one step at a time via a Provisioner backend.

Placement (P2). Before a server can run it must be placed on a fleet host. The internal/scheduler picks a ready host with enough allocatable cpu/memory (least-loaded) and atomically reserves that capacity; the assignment is recorded as game_servers.host_id. A server that fits no host right now is marked unschedulable and retried; a spec larger than any host is rejected at create time.

Agent split (P3). The control plane never touches KVM. The reconciler's backend is provisioner.RemoteProvisioner, which resolves the assigned host's address and calls that host's agent (cmd/agent / internal/agent) over HTTP to provision/start/stop/deprovision the VM. The agent runs a Runtime; which one is chosen by AGENT_RUNTIME (fake for the in-memory stub, or firecracker for real microVMs) — the API and reconciler are identical either way because both satisfy the same agent.Runtime interface. Agents register and heartbeat with the control plane so the scheduler knows the fleet.

# Create a server (desired_state defaults to running)
curl -s -X POST localhost:8080/api/v1/servers \
  -H "Authorization: Bearer $ACCESS" -H 'Content-Type: application/json' \
  -d '{"name":"survival","version":"1.20.4"}'

# Poll until the reconciler reports status "running" with host/port
curl -s localhost:8080/api/v1/servers/<id> -H "Authorization: Bearer $ACCESS"

# Stop / start
curl -s -X PATCH localhost:8080/api/v1/servers/<id> \
  -H "Authorization: Bearer $ACCESS" -H 'Content-Type: application/json' \
  -d '{"desired_state":"stopped"}'

# Tear down (reconciler deprovisions, then soft-deletes the row)
curl -s -X DELETE localhost:8080/api/v1/servers/<id> -H "Authorization: Bearer $ACCESS"

Deletes are soft: the reconciler deprovisions the VM and then sets status = deleted with a deleted_at timestamp. The row is retained for history/audit but hidden from every API read (and from reconciliation).

Firecracker runtime, the image pipeline & networking

Runtime (P4). internal/agent/firecracker boots each server as a real Firecracker microVM, driving Firecracker through the in-repo generated REST client (internal/firecracker) over the per-VM API Unix socket and managing the VMM process directly. Provision stages a per-VM writable copy of the version's base rootfs, launches firecracker, configures machine/boot-source/drive, then starts the instance; Stop sends SendCtrlAltDel (force-kill after a grace period) but keeps the rootfs so the world survives a restart on that host; Start re-boots from it. make test-kvm runs the gated lifecycle integration test on a /dev/kvm host.

Image pipeline. internal/image converts an OCI/Docker image into a read-only squashfs rootfs (internal/squashfs), injecting the Go init binary (cmd/init) as PID 1. The init agent mounts the kernel filesystems, fetches a run spec (internal/runspec) from Firecracker's MMDS over the link-local address, applies per-VM networking, then execs and supervises the workload — powering the VM off when it exits. The control plane resolves launchable templates from a marketplace registry (internal/registry, the /templates API).

Networking (P6, in progress). An eBPF NAT dataplane gives each VM real connectivity with no Linux bridge and no iptables/nftables rules: TCX-attached programs SNAT egress and DNAT a per-server host port to the in-VM service port, reusing the kernel's nf_conntrack via bpf_ct_* kfuncs. The guest applies its address, gateway neighbor, and default route from the run spec. Design and current status: docs/ebpf-nat-dataplane.md.

Layout

cmd/server          control-plane entry point: DB connect/migrate + HTTP API + graceful shutdown
cmd/agent           host-agent entry point: VM API + register/heartbeat; selects the VM backend
cmd/init            in-VM PID 1: mount, fetch run spec from MMDS, apply networking, supervise workload
internal/config     environment configuration (control plane + agent + Firecracker)
internal/db         pgx pool + embedded goose migrations
internal/model      domain types (User, GameServer, Host)
internal/repository data access (Postgres; in-memory host inventory)
internal/auth       bcrypt password hashing + JWT issue/verify
internal/handler    control-plane routes (auth, servers, templates, admin, agent)
internal/middleware request ID, request logging, JWT/agent auth guards
internal/registry   template registry (marketplace) client
internal/scheduler  host placement + atomic capacity reservation (P2)
internal/reconciler desired-state → observed-status convergence loop
internal/provisioner Provisioner seam: Fake + RemoteProvisioner (P3)
internal/agent      host agent: Runtime interface, FakeRuntime, VM API server + clients (P3)
internal/agent/firecracker  real Firecracker microVM Runtime + eBPF NAT dataplane (P4/P6)
internal/firecracker generated Firecracker REST client (go-openapi, over the API Unix socket)
internal/image      OCI/Docker image → squashfs rootfs converter (injects cmd/init)
internal/squashfs   squashfs writer/compressor used by the converter
internal/runspec    host↔guest run-spec contract delivered via MMDS
internal/reaper     periodic background cleanup (stale hosts → down)
internal/seed       initial-data bootstrap (admin account)
internal/logger     zap logger + request-scoped context helpers

Commands

make run          # run the control plane
make run-agent    # run a host agent (MODE=agent; see Makefile/.env.example for env)
make build        # build control plane to ./bin/server
make build-agent  # build agent to ./bin/agent
make test         # run unit tests
make test-e2e     # run end-to-end tests (requires Docker)
make test-kvm     # Firecracker lifecycle test (requires /dev/kvm + host artifacts)
make bpf-generate # regenerate eBPF bindings/objects (Linux ≥6.6 + clang/libbpf/bpftool)
make tidy         # tidy go.mod
make fmt          # format code

Testing

End-to-end tests live in test/e2e/. They are gated behind the e2e build tag, so make test (plain go test ./...) never touches Docker. The e2e suite uses testcontainers to start a real Postgres, serves the actual router over HTTP, and drives the full lifecycle (register → login → server placement → reconcile → teardown):

make test-e2e          # or: go test -tags e2e ./test/e2e/...

The Firecracker lifecycle test is gated behind the kvm build tag and kept out of the default lane; run it on a KVM host with the FC_* artifacts in place:

FC_KERNEL=... FC_IMAGE_DIR=... FC_DEFAULT_IMAGE=base.ext4 make test-kvm

About

Run a fleet of Minecraft servers like cloud infrastructure

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors