RefPack Specification

Mission

RefPack establishes a standardized, secure, and self-contained format for distributing structured datasets. Our mission is to enable seamless data sharing across organizations, platforms, and tools while maintaining cryptographic integrity and eliminating dependency on external infrastructure.

By providing a ZIP-based packaging format with embedded cryptographic signatures, RefPack lets consumers validate and trust a dataset in any environment, from air-gapped systems to cloud-native applications.

Why RefPack?

Modern data distribution faces several critical challenges:

RefPack solves these problems by providing:

What's Inside a RefPack?

A RefPack is fundamentally a ZIP archive with a standardized internal structure:

/                             ← Package root (no nested folders, except `assets/`)
├── data.meta.json            ← REQUIRED, signed manifest
├── data.meta.json.jws        ← REQUIRED, JWS signature over exact `data.meta.json` bytes
├── data.json                 ← REQUIRED, JSON array of objects
├── data.schema.json          ← OPTIONAL, JSON-Schema for `data.json`
├── data.changelog.json       ← OPTIONAL, versioned changelog
├── data.readme.md            ← OPTIONAL, human-readable documentation
└── assets/                   ← OPTIONAL, flat folder of supplemental files
    ├── image.png
    └── lookup.csv

Every RefPack contains three core components that work together to provide security, structure, and usability:

  1. Signed Manifest (data.meta.json + data.meta.json.jws): Package metadata with cryptographic integrity
  2. Structured Payload (data.json): The actual dataset as a JSON array
  3. Optional Documentation (schema, readme, changelog, assets): Supporting materials for understanding and using the data
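
The layout rules above can be checked mechanically. The following is a minimal sketch using only the Python standard library; the function name `check_layout` and the wording of the messages are illustrative assumptions, not part of the specification.

```python
import io
import zipfile

REQUIRED = {"data.meta.json", "data.meta.json.jws", "data.json"}
OPTIONAL = {"data.schema.json", "data.changelog.json", "data.readme.md"}


def check_layout(zip_bytes: bytes) -> list:
    """Return a list of layout problems; an empty list means the layout looks valid."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Ignore pure directory entries; only files are checked.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    for name in names:
        if name in REQUIRED or name in OPTIONAL:
            continue
        # assets/ is the only folder allowed, and it must be flat.
        if name.startswith("assets/") and "/" not in name[len("assets/"):]:
            continue
        problems.append(f"unexpected entry: {name}")
    for missing in sorted(REQUIRED - set(names)):
        problems.append(f"missing required file: {missing}")
    return problems
```

A caller would read the archive bytes from disk and treat any non-empty result as a validation failure.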

Deep Dive: Core Components

1. Manifest: data.meta.json

The manifest serves as the authoritative metadata source for every RefPack. It contains essential information about the package identity, versioning, and provenance.

JSON Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RefPack DatasetMeta",
  "type": "object",
  "required": ["id","version","title","createdUtc"],
  "properties": {
    "id": {
      "type": "string",
      "pattern": "^[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?$",
      "description": "Package identifier (alphanumeric, -, _, no spaces)."
    },
    "version": {
      "type": "string",
      "pattern": "^\\d+\\.\\d+\\.\\d+(?:-[0-9A-Za-z\\.]+)?$",
      "description": "SemVer 2.0.0 version string."
    },
    "title": {
      "type": "string",
      "minLength": 1,
      "description": "Human-readable title."
    },
    "description": {
      "type": "string",
      "description": "Optional long description."
    },
    "authors": {
      "type": "array",
      "items": { "type": "string" },
      "description": "List of author names or organizations."
    },
    "createdUtc": {
      "type": "string",
      "format": "date-time",
      "description": "UTC timestamp of package creation."
    },
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Free-form tags."
    },
    "license": {
      "type": "string",
      "description": "SPDX license identifier."
    }
  },
  "additionalProperties": false
}
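
To illustrate how the schema constrains a manifest, here is a minimal hand-written check in Python that mirrors the required fields, the two patterns, and `additionalProperties: false`. It is a sketch, not a full JSON Schema validator: the name `validate_manifest` is an assumption, and the `createdUtc` check is only an approximation of the `date-time` format. Note that the version pattern accepts a subset of SemVer 2.0.0: it does not allow build metadata or hyphens inside pre-release identifiers.

```python
import re
from datetime import datetime

# Patterns copied from the schema above (anchors dropped because fullmatch is used).
ID_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?")
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+(?:-[0-9A-Za-z.]+)?")
REQUIRED = ("id", "version", "title", "createdUtc")
OPTIONAL = ("description", "authors", "tags", "license")


def validate_manifest(meta: dict) -> list:
    """Return a list of error strings; an empty list means the manifest passed."""
    errors = [f"missing required field: {key}" for key in REQUIRED if key not in meta]
    unknown = sorted(set(meta) - set(REQUIRED) - set(OPTIONAL))
    errors += [f"unknown field: {key}" for key in unknown]
    if "id" in meta and not (isinstance(meta["id"], str) and ID_RE.fullmatch(meta["id"])):
        errors.append("id does not match the identifier pattern")
    if "version" in meta and not (
        isinstance(meta["version"], str) and VERSION_RE.fullmatch(meta["version"])
    ):
        errors.append("version does not match the version pattern")
    if "title" in meta and not (isinstance(meta["title"], str) and meta["title"]):
        errors.append("title must be a non-empty string")
    if "createdUtc" in meta:
        try:
            # Approximation: accept anything datetime can parse as ISO 8601.
            datetime.fromisoformat(str(meta["createdUtc"]).replace("Z", "+00:00"))
        except ValueError:
            errors.append("createdUtc is not a parseable date-time")
    return errors
```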

Required Fields

Optional Fields

2. Signature: data.meta.json.jws

RefPack uses JSON Web Signature (JWS) with embedded public keys to provide cryptographic integrity without external dependencies.

JWS Header Structure

The JWS header must contain an embedded JSON Web Key (JWK) with the public key:

{
  "alg": "ES256",           // Algorithm (ES256 recommended)
  "kid": "publisher-key-1", // Key identifier
  "jwk": {                  // Embedded public key (JWK format)
    "kty": "EC",
    "crv": "P-256",
    "x": "...",             // Public key X coordinate
    "y": "...",             // Public key Y coordinate
    "use": "sig",           // Key usage: signing
    "key_ops": ["verify"]   // Allowed operations: verification only
  },
  "typ": "JWT"              // Token type
}

JWS Payload (Claims)

The JWS payload carries standard JWT claims:

{
  "iat": 1640995200,        // Issued at (Unix timestamp)
  "exp": 1641002400,        // Expiration time (Unix timestamp, typically 2 hours)
  "jti": "refpack"          // JWT ID (must be "refpack")
}

Signature Generation Process

  1. Create JWS header with embedded public key (JWK format)
  2. Create JWS payload with standard JWT claims
  3. Sign over manifest bytes: The signature covers the exact bytes of data.meta.json
  4. Generate compact JWS: Standard RFC 7515 compact serialization

3. Payload: data.json

The payload contains the actual dataset and must always be a JSON array of objects. This constraint ensures consistent data structure across all RefPacks.

Structure Requirements

Example

[
  { "id": "US", "name": "United States", "population": 331002651 },
  { "id": "CA", "name": "Canada",        "population": 37742154  },
  { "id": "MX", "name": "Mexico",        "population": 128932753 }
]
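
The array-of-objects constraint is simple to enforce at load time. A minimal sketch in Python, where the function name `load_payload` is an illustrative assumption:

```python
import json


def load_payload(text: str) -> list:
    """Parse data.json text and enforce the array-of-objects constraint."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("data.json must be a JSON array")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"element {index} is not a JSON object")
    return data
```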

This structure enables:

Security & Validation

RefPack implements multiple layers of security validation to ensure package integrity and authenticity.

Cryptographic Validation

JWS Signature Verification

Clients and servers must:

  1. Parse JWS header and extract the embedded jwk field
  2. Validate JWK structure ensuring it contains only public key components
  3. Verify signature using the embedded public key over the JWS signing input, BASE64URL(header) || "." || BASE64URL(payload)
  4. Validate JWT claims:
    • Check exp (expiration) if present
    • Verify iat (issued at) is not in the future (allow 5 minutes of clock skew)
    • Ensure jti equals "refpack"
  5. Verify manifest integrity: Ensure the JWS was signed over the exact bytes of data.meta.json
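
The non-cryptographic parts of the steps above (1, 2, and 4) can be sketched with the Python standard library. The function name `check_jws` is an assumption made for this illustration, and the private-member list is taken from the JWK parameter names for EC, RSA, and symmetric keys. Step 3 needs an ECDSA library, so the sketch only returns the signing input and signature bytes for the caller to verify.

```python
import base64
import json
import time
from typing import Optional

# JWK members that carry private or secret key material.
PRIVATE_JWK_MEMBERS = {"d", "p", "q", "dp", "dq", "qi", "oth", "k"}
CLOCK_SKEW_SECONDS = 5 * 60


def b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the padding that JWS strips."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_jws(compact: str, now: Optional[float] = None):
    """Run the structural and claim checks; return (header, claims, signing_input, signature)."""
    now = time.time() if now is None else now
    header_b64, payload_b64, signature_b64 = compact.split(".")
    header = json.loads(b64url_decode(header_b64))
    jwk = header.get("jwk")
    if not isinstance(jwk, dict):
        raise ValueError("JWS header has no embedded jwk")
    if PRIVATE_JWK_MEMBERS & set(jwk):
        raise ValueError("embedded jwk contains private key material")
    claims = json.loads(b64url_decode(payload_b64))
    if "exp" in claims and now > claims["exp"]:
        raise ValueError("signature has expired")
    if claims.get("iat", 0) > now + CLOCK_SKEW_SECONDS:
        raise ValueError("iat is in the future")
    if claims.get("jti") != "refpack":
        raise ValueError('jti must be "refpack"')
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    return header, claims, signing_input, b64url_decode(signature_b64)
```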

Key Security Requirements

Structural Validation

ZIP Archive Security

Schema Validation

Versioning Security

Trust Model

RefPack implements a decentralized trust model that doesn't rely on central authorities:

CLI Toolchain

The RefPack CLI provides a complete toolchain for creating, validating, and distributing RefPack archives.

Core Commands

Command    Description
pack       Validate folder, then create <id>-<version>.refpack.zip.
validate   Open a .refpack.zip; verify layout, schemas, and JWS.
push       POST the ZIP to /packages; expects JSON {"success":true}.
pull       GET /packages/{id}?version={v}; saves the ZIP or extracts it to a folder.
meta       GET /packages/{id}/meta?version={v}; prints the JSON manifest.

Packaging Workflow

# 1. Pack & sign locally with private key
refpack pack \
  --input ./country-data/ \
  --output country-1.0.0.refpack.zip \
  --sign-key ~/.keys/publisher.pem \
  --key-id publisher-2025-05-20

# 2. Validate (no JWKS URL needed - uses embedded public key)
refpack validate \
  --package country-1.0.0.refpack.zip

# 3. Push to registry
refpack push \
  --package country-1.0.0.refpack.zip \
  --api-url https://api.refpack.example.com \
  --api-key "$REFPACK_TOKEN"

# 4. Later, pull & inspect
refpack pull --id country --version 1.0.0 --dest ./downloads/
refpack meta --id country --version 1.0.0

Key Management Commands

The CLI includes specialized commands for cryptographic key management:

# Generate new signing key pair
refpack keygen \
  --algorithm ES256 \
  --key-id publisher-2025-05-20 \
  --output ~/.keys/publisher.pem

# Extract public key for verification
refpack pubkey \
  --private-key ~/.keys/publisher.pem \
  --output ~/.keys/publisher.pub.json

# Verify signature with explicit public key
refpack verify \
  --package country-1.0.0.refpack.zip \
  --public-key ~/.keys/publisher.pub.json

Configuration Management

The CLI supports configuration files for streamlined workflows:

{
  "publisher": {
    "name": "Acme Data Corp",
    "keyId": "acme-2025-05-20",
    "keyFile": "~/.keys/acme.pem"
  },
  "registry": {
    "url": "https://api.refpack.example.com",
    "tokenFile": "~/.refpack/token"
  },
  "validation": {
    "strictMode": true,
    "allowPrerelease": false,
    "maxPackageSize": "100MB"
  }
}
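
A tool that consumes this configuration has to expand the `~` paths and turn `maxPackageSize` into a byte count. The sketch below shows one way to do that in Python. It rests on assumptions not stated above: that size suffixes are decimal (1 MB = 1,000,000 bytes), and that the parsed value is stored under a hypothetical `maxPackageSizeBytes` key.

```python
import json
import os
import re

# Assumption: decimal units; the specification text does not define them.
SIZE_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text: str) -> int:
    """Convert a string such as "100MB" into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*(B|KB|MB|GB)", text.strip(), re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * SIZE_UNITS[match.group(2).upper()]


def load_config(text: str) -> dict:
    """Parse the configuration JSON and normalize paths and sizes."""
    config = json.loads(text)
    publisher = config.get("publisher", {})
    if "keyFile" in publisher:
        publisher["keyFile"] = os.path.expanduser(publisher["keyFile"])
    registry = config.get("registry", {})
    if "tokenFile" in registry:
        registry["tokenFile"] = os.path.expanduser(registry["tokenFile"])
    validation = config.get("validation", {})
    if "maxPackageSize" in validation:
        # Hypothetical derived key, added for convenience in this sketch.
        validation["maxPackageSizeBytes"] = parse_size(validation["maxPackageSize"])
    return config
```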

Typical Use Cases

RefPack addresses a wide range of data distribution scenarios across different industries and use cases.

Data Science & Analytics

Scenario: Research teams sharing cleaned datasets for reproducible analysis

API Reference Data

Scenario: Distributing country codes, currency lists, or other reference data for applications

Machine Learning Models

Scenario: Sharing training datasets and model artifacts between ML teams

Configuration Distribution

Scenario: Distributing application configuration or feature flags across environments

Compliance & Audit

Scenario: Financial institutions sharing regulatory data with audit trails

Open Data Publishing

Scenario: Government agencies publishing public datasets

Getting Started (Spec)

This section provides implementation guidance for developers building RefPack-compatible tools and libraries.

Implementation Checklist

Core Requirements

Security Implementation

Optional Features

Library Integration

RefPack is designed to integrate seamlessly with existing data processing libraries:

Python Integration

import refpack
import pandas as pd

# Load RefPack into pandas DataFrame
with refpack.open('countries-1.0.0.refpack.zip') as package:
    df = pd.DataFrame(package.data)
    print(f"Loaded {package.meta.title} v{package.meta.version}")

JavaScript Integration

const refpack = require('refpack');

// Load and validate RefPack (await must run inside an async function in CommonJS)
async function main() {
  const pkg = await refpack.load('countries-1.0.0.refpack.zip');
  console.log(`Package: ${pkg.meta.title}`);
  console.log(`Records: ${pkg.data.length}`);
}

main().catch(console.error);

REST API Integration

POST /packages HTTP/1.1
Content-Type: application/zip
Authorization: Bearer <token>

[ZIP file binary data]
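
The same request can be issued from Python with the standard library. This is a minimal sketch: the function names are illustrative, the `/packages` path, headers, and `{"success":true}` response come from the sections above, and error handling is omitted.

```python
import json
import urllib.request


def build_push_request(api_url: str, token: str, zip_bytes: bytes) -> urllib.request.Request:
    """Build the POST /packages request without sending it."""
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/packages",
        data=zip_bytes,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            "Authorization": f"Bearer {token}",
        },
    )


def push(api_url: str, token: str, zip_bytes: bytes) -> bool:
    """Send the package and report whether the registry answered {"success": true}."""
    request = build_push_request(api_url, token, zip_bytes)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body.get("success") is True
```

Separating request construction from sending keeps the first function testable without network access.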

Testing Strategy

Comprehensive testing is essential for RefPack implementations:

Unit Tests

Integration Tests

Security Tests

Join the Community

RefPack is an open specification designed to grow through community collaboration and feedback.

Contributing to the Specification

The RefPack specification evolves through community input and real-world usage:

Implementation Registry

We maintain a registry of RefPack-compatible tools and libraries:

Support Channels

Roadmap & Future Development

Current development priorities include:


Optional Components

Schema: data.schema.json (Optional)

When present, the schema must describe an array of objects, matching the structure of data.json.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "RefPack Data Schema",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id":        { "type": "string" },
      "name":      { "type": "string" },
      "population":{ "type": "integer", "minimum": 0 }
    },
    "required": ["id","name","population"],
    "additionalProperties": false
  }
}

Changelog & Readme (Optional)

Assets Folder (Optional)


Advanced Topics

Versioning

  1. Versions must follow SemVer 2.0.0.
  2. A new version must be strictly greater than every version already published under the same id.
  3. Clients may reject pre-release versions unless invoked with --allow-prerelease.
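
The ordering rule can be sketched in Python by mapping each version to a sortable key that follows SemVer 2.0.0 precedence. The helper names are illustrative assumptions, and the sketch supports only the version syntax accepted by the manifest pattern (no build metadata).

```python
import re

VERSION_RE = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)(?:-([0-9A-Za-z.]+))?")


def sort_key(version: str):
    """Return a tuple that orders versions by SemVer 2.0.0 precedence."""
    match = VERSION_RE.fullmatch(version)
    if not match:
        raise ValueError(f"not a supported version string: {version!r}")
    core = tuple(int(part) for part in match.group(1, 2, 3))
    pre = match.group(4)
    if pre is None:
        # A release sorts after every pre-release of the same core version.
        return core, (1,)
    # Numeric identifiers compare as integers and sort below alphanumeric ones.
    ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split("."))
    return core, (0, ids)


def is_prerelease(version: str) -> bool:
    """True when the version carries a pre-release suffix such as -rc.1."""
    return VERSION_RE.fullmatch(version).group(4) is not None


def is_newer(new: str, published: list) -> bool:
    """True when `new` is strictly greater than every published version."""
    return all(sort_key(new) > sort_key(old) for old in published)
```

A registry could call `is_newer` before accepting a push, while a client could use `is_prerelease` to decide whether the --allow-prerelease flag is required.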

Key Management Best Practices

Private Key Security

Public Key Distribution

Trust Model

Extensibility