Not every file a user attaches has prior provenance. When a file is new to the system, you need to ask: who created this? The answer determines which action type is recorded, but in either case the file receives a content-addressed CID that downstream AI actions can reference.

The Two Cases

Case 1 — Known file (provenance exists)

The file matches an existing record in the system. The existing CID is reused as an inputCid — no additional recording needed.
Search result: { cid: "Qm...", score: 1.0, ... }

Use this CID directly as inputCid in the AI response action.
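One way to sketch this reuse-or-claim decision is a small pure function. The `SearchResult` shape mirrors the `{ cid, score }` fields shown above, but the interface name and the 0.95 threshold are illustrative assumptions, not SDK values:

```typescript
// Assumed shape of a provenance search hit (based on the
// { cid: "Qm...", score: 1.0 } result shown above).
interface SearchResult {
  cid: string;
  score: number;
}

// Decide whether a search hit is strong enough to reuse its CID
// as an inputCid (Case 1), or whether the file should go through
// the ownership-claim flow instead (Case 2).
export function resolveAttachment(
  match: SearchResult | null,
  threshold = 0.95 // illustrative; tune for your matcher
): { kind: "reuse"; cid: string } | { kind: "claim" } {
  if (match && match.score >= threshold) {
    return { kind: "reuse", cid: match.cid }; // provenance exists
  }
  return { kind: "claim" };                   // ask the user who made it
}
```

A discriminated union keeps the two outcomes explicit at the call site: `reuse` carries the existing CID, `claim` signals that the ownership prompt should be shown.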

Case 2 — New file (no prior provenance)

The file is not in the system. Ask the user: “Do you own this file?”
Answer                      Action type    Meaning
Yes — I created it          "create"       User is the original creator. Resource enters the system as a claimed work.
No — it's from elsewhere    "reference"    User is providing an external source; the original creator is unknown. Resource enters as an unclaimed input.
Both paths produce a CID. Both CIDs can be used as inputCids in downstream provenance actions.

Why Both Cases Matter

Claimed resources (action.type = "create") establish a clean ownership chain. The provenance graph shows user → creates → file → inputs → AI response. This is strong evidence for copyright claims (see the Human Creative Input pattern).

Unclaimed resources (action.type = "reference") are equally important. Recording that an AI response used an unknown-origin file is honest provenance — it accurately represents the training data and inputs used. Silently omitting unattributed inputs is worse than recording their existence.
An unclaimed resource is not a problem in the provenance graph — it’s an honest representation of reality. The absence of a creator entity signals “origin unknown” rather than “created by no one.”

Implementation

Server-side claim endpoint

// POST /api/claim
// FormData: { file, owned: "true"|"false", userId, mimeType }
import { ProvenanceKit } from "@provenancekit/sdk";

export async function POST(req: Request) {
  const form = await req.formData();
  const file = form.get("file") as File;
  const owned = form.get("owned") === "true";
  const userId = (form.get("userId") as string) ?? "anonymous";
  const mimeType = (form.get("mimeType") as string) ?? file.type;

  const pk = new ProvenanceKit({ apiKey: process.env.PK_API_KEY });

  const entityId = await pk.entity({ role: "human", name: userId });

  const result = await pk.file(file, {
    entity: { id: entityId, role: "human", name: userId },
    action: {
      type: owned ? "create" : "reference",
    },
    resourceType: mimeType.startsWith("image/") ? "image" : "text",
    // On-chain recording fires automatically if CHAIN_PRIVATE_KEY is set
  });

  return Response.json({
    cid: result.cid,
    actionId: result.actionId,
    onchain: result.onchain ?? null,
    status: owned ? "claimed" : "referenced",
  });
}

Client-side with FileProvenanceTag

import { FileProvenanceTag } from "@provenancekit/ui";

function AttachmentPreview({ file, userId, onCidAssigned }) {
  async function handleClaim(owned: boolean) {
    const form = new FormData();
    form.append("file", file, file.name);
    form.append("owned", String(owned));
    form.append("userId", userId);
    form.append("mimeType", file.type);

    const res = await fetch("/api/claim", { method: "POST", body: form });
    if (!res.ok) throw new Error("Claim failed");

    const { cid, status } = await res.json();
    onCidAssigned(cid);                         // ← propagate CID to parent state
    return { cid, status };                     // ← returned to FileOwnershipClaim
  }

  return (
    <div className="attachment">
      <span>{file.name}</span>
      <FileProvenanceTag
        file={file}
        onClaim={handleClaim}                   // ← shown when file not found
        onViewDetail={(cid) => navigate(`/provenance/${cid}`)}
      />
    </div>
  );
}
When the file has no prior provenance, FileProvenanceTag renders FileOwnershipClaim inline:
┌─────────────────────────────────────────┐
│ New file — do you own this?             │
│ [✓ Yes, I own it]  [↗ No, I don't]     │
└─────────────────────────────────────────┘
After the user decides, the component transitions to a success state:
✓ Claimed as your work       (owned = true)
✓ Recorded as external source (owned = false)

Using the CID in downstream actions

Once the file has a CID (from either a match or a claim), pass it as inputCids when recording the AI response:
// The claimed CID is now a first-class provenance input
const response = await pk.file(responseBlob, {
  entity: { id: agentId, role: "ai", name: "openai/gpt-4o" },
  action: {
    type: "generate",
    inputCids: [
      promptCid,          // the user's text prompt
      attachedFileCid,    // ← the claimed or matched file CID
    ],
    aiTool: { provider: "openai", model: "gpt-4o" },
  },
  resourceType: "text",
});
The resulting provenance graph:
[user] ──creates──► [photo.jpg]        (owned = true)
    OR
[user] ──references──► [photo.jpg]     (owned = false)

Both connect as:

[photo.jpg] ──inputCid──► [generate action] ──produces──► [AI response]

On-Chain Recording

Both "create" and "reference" actions are eligible for on-chain recording. If CHAIN_PRIVATE_KEY and BASE_SEPOLIA_RPC_URL are set, pk.file() automatically records the action hash to the ProvenanceRegistry contract on Base Sepolia.
const result = await pk.file(file, { ... });

if (result.onchain) {
  console.log("Anchored on-chain:", result.onchain.txHash);
  // txHash can be verified on Basescan
}
On-chain recording is fire-and-forget — if it fails, the off-chain record (in Supabase/PostgreSQL) always stands as the canonical provenance record.

What the Provenance Graph Looks Like

Claimed file

Entities:    alice (role: human)
Resources:   photo.jpg (CID: Qm..., type: image)
             ai-response.txt (CID: Qm..., type: text)
Actions:     [create] alice → photo.jpg
             [generate] gpt-4o → ai-response.txt
               inputCids: [photo.jpg, prompt.json]

Unclaimed file (referenced)

Entities:    alice (role: human)
Resources:   external-file.jpg (CID: Qm..., type: image)
             ai-response.txt (CID: Qm..., type: text)
Actions:     [reference] alice → external-file.jpg
             [generate] gpt-4o → ai-response.txt
               inputCids: [external-file.jpg, prompt.json]
The difference: create signals Alice made it; reference signals Alice used it but didn’t make it. Both are honest, auditable records.

Gotchas

  • Ask before the message is sent: The ownership decision should happen in the attachment UI, not after submission. FileProvenanceTag handles this — it runs the search and shows the claim prompt while the file is still in the input area.
  • Reuse existing CIDs: If FileProvenanceTag finds a match (score ≥ some threshold), use the existing CID directly as inputCid. Don’t re-record the same file as a new resource.
  • CID propagation: Store the claimed CID in local state immediately after onClaim resolves so it’s available when the message is submitted. FileOwnershipClaim’s onClaim callback is the right place to call setState.
  • Binary files (PDFs): For files where text content can’t be extracted inline, the provenance record still captures the file hash (CID). The LLM won’t see the content, but the provenance chain remains complete.
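The CID-propagation gotcha above can be sketched as a small piece of client state that sits between `onClaim` resolving and the message being submitted. The `AttachmentCids` class is illustrative, not part of the SDK:

```typescript
// Minimal attachment-CID registry: stores the CID assigned by the
// claim endpoint (or reused from a search match) so it is available
// when the message is submitted.
class AttachmentCids {
  private cids = new Map<string, string>(); // file name → CID

  // Call from onClaim / onCidAssigned as soon as the CID resolves.
  assign(fileName: string, cid: string): void {
    this.cids.set(fileName, cid);
  }

  // Collect inputCids for the outgoing message. Throws if any
  // attachment has not finished its search/claim flow yet, which
  // guards against submitting a message with incomplete provenance.
  collect(fileNames: string[]): string[] {
    return fileNames.map((name) => {
      const cid = this.cids.get(name);
      if (!cid) throw new Error(`No CID for attachment: ${name}`);
      return cid;
    });
  }
}
```

In a React app the same idea maps onto component state keyed by attachment; the point is that the CID must be stored the moment the claim resolves, not looked up again at submit time.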