Not every file a user attaches has prior provenance. When a file is new to the system, you need to ask: who created this? The answer determines which action type is recorded, but in either case the file receives a content-addressed CID that downstream AI actions can reference.

The Two Cases

Case 1 — Known file (provenance exists)

The file matches an existing record in the system. The existing CID is reused as an inputCid — no additional recording needed.
Search result: { cid: "Qm...", score: 1.0, ... }

Use this CID directly as inputCid in the AI response action.
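One way to sketch this reuse-or-claim decision is a small pure function. The `SearchResult` shape mirrors the `{ cid, score }` fields shown above, but the interface name and the 0.95 threshold are illustrative assumptions, not SDK values:

```typescript
// Assumed shape of a provenance search hit (based on the
// { cid: "Qm...", score: 1.0 } result shown above).
interface SearchResult {
  cid: string;
  score: number;
}

// Decide whether a search hit is strong enough to reuse its CID
// as an inputCid (Case 1), or whether the file should go through
// the ownership-claim flow instead (Case 2).
export function resolveAttachment(
  match: SearchResult | null,
  threshold = 0.95 // illustrative; tune for your matcher
): { kind: "reuse"; cid: string } | { kind: "claim" } {
  if (match && match.score >= threshold) {
    return { kind: "reuse", cid: match.cid }; // provenance exists
  }
  return { kind: "claim" };                   // ask the user who made it
}
```

A discriminated union keeps the two outcomes explicit at the call site: `reuse` carries the existing CID, `claim` signals that the ownership prompt should be shown.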

Case 2 — New file (no prior provenance)

The file is not in the system. Ask the user: “Do you own this file?”
Answer                      Action type    Meaning
Yes — I created it          "create"       User is the original creator. Resource enters the system as a claimed work.
No — it's from elsewhere    "reference"    User is providing an external source; the original creator is unknown. Resource enters as an unclaimed input.
Both paths produce a CID. Both CIDs can be used as inputCids in downstream provenance actions.

Why Both Cases Matter

Claimed resources (action.type = "create") establish a clean ownership chain. The provenance graph shows user → creates → file → inputs → AI response. This is strong evidence for copyright claims (see the Human Creative Input pattern).

Unclaimed resources (action.type = "reference") are equally important. Recording that an AI response used an unknown-origin file is honest provenance — it accurately represents the training data and inputs used. Silently omitting unattributed inputs is worse than recording their existence.
An unclaimed resource is not a problem in the provenance graph — it’s an honest representation of reality. The absence of a creator entity signals “origin unknown” rather than “created by no one.”

Implementation

Server-side claim endpoint

// POST /api/claim
// FormData: { file, owned: "true"|"false", userId, mimeType }
import { ProvenanceKit } from "@provenancekit/sdk";

export async function POST(req: Request) {
  const form = await req.formData();
  const file = form.get("file") as File;
  const owned = form.get("owned") === "true";
  const userId = (form.get("userId") as string) ?? "anonymous";
  const mimeType = (form.get("mimeType") as string) ?? file.type;

  const pk = new ProvenanceKit({ apiKey: process.env.PK_API_KEY });

  const entityId = await pk.entity({ role: "human", name: userId });

  const result = await pk.file(file, {
    entity: { id: entityId, role: "human", name: userId },
    action: {
      type: owned ? "create" : "reference",
    },
    resourceType: mimeType.startsWith("image/") ? "image" : "text",
    // On-chain recording fires automatically if CHAIN_PRIVATE_KEY is set
  });

  return Response.json({
    cid: result.cid,
    actionId: result.actionId,
    onchain: result.onchain ?? null,
    status: owned ? "claimed" : "referenced",
  });
}

Client-side with FileProvenanceTag

import { FileProvenanceTag } from "@provenancekit/ui";

function AttachmentPreview({ file, userId, onCidAssigned }) {
  async function handleClaim(owned: boolean) {
    const form = new FormData();
    form.append("file", file, file.name);
    form.append("owned", String(owned));
    form.append("userId", userId);
    form.append("mimeType", file.type);

    const res = await fetch("/api/claim", { method: "POST", body: form });
    if (!res.ok) throw new Error("Claim failed");

    const { cid, status } = await res.json();
    onCidAssigned(cid);                         // ← propagate CID to parent state
    return { cid, status };                     // ← returned to FileOwnershipClaim
  }

  return (
    <div className="attachment">
      <span>{file.name}</span>
      <FileProvenanceTag
        file={file}
        onClaim={handleClaim}                   // ← shown when file not found
        onViewDetail={(cid) => navigate(`/provenance/${cid}`)}
      />
    </div>
  );
}
When the file has no prior provenance, FileProvenanceTag renders FileOwnershipClaim inline:
┌─────────────────────────────────────────┐
│ New file — do you own this?             │
│ [✓ Yes, I own it]  [↗ No, I don't]     │
└─────────────────────────────────────────┘
After the user decides, the component transitions to a success state:
✓ Claimed as your work       (owned = true)
✓ Recorded as external source (owned = false)

Using the CID in downstream actions

Once the file has a CID (from either a match or a claim), pass it as inputCids when recording the AI response:
// The claimed CID is now a first-class provenance input
const response = await pk.file(responseBlob, {
  entity: { id: agentId, role: "ai", name: "openai/gpt-4o" },
  action: {
    type: "generate",
    inputCids: [
      promptCid,          // the user's text prompt
      attachedFileCid,    // ← the claimed or matched file CID
    ],
    aiTool: { provider: "openai", model: "gpt-4o" },
  },
  resourceType: "text",
});
The resulting provenance graph:
[user] ──creates──► [photo.jpg]        (owned = true)
    OR
[user] ──references──► [photo.jpg]     (owned = false)

Both connect as:

[photo.jpg] ──inputCid──► [generate action] ──produces──► [AI response]

On-Chain Recording

Both "create" and "reference" actions are eligible for on-chain recording. If CHAIN_PRIVATE_KEY and BASE_SEPOLIA_RPC_URL are set, pk.file() automatically records the action hash to the ProvenanceRegistry contract on Base Sepolia.
const result = await pk.file(file, { ... });

if (result.onchain) {
  console.log("Anchored on-chain:", result.onchain.txHash);
  // txHash can be verified on Basescan
}
On-chain recording is fire-and-forget — if it fails, the off-chain record (in Supabase/PostgreSQL) always stands as the canonical provenance record.

What the Provenance Graph Looks Like

Claimed file

Entities:    alice (role: human)
Resources:   photo.jpg (CID: Qm..., type: image)
             ai-response.txt (CID: Qm..., type: text)
Actions:     [create] alice → photo.jpg
             [generate] gpt-4o → ai-response.txt
               inputCids: [photo.jpg, prompt.json]

Unclaimed file (referenced)

Entities:    alice (role: human)
Resources:   external-file.jpg (CID: Qm..., type: image)
             ai-response.txt (CID: Qm..., type: text)
Actions:     [reference] alice → external-file.jpg
             [generate] gpt-4o → ai-response.txt
               inputCids: [external-file.jpg, prompt.json]
The difference: create signals Alice made it; reference signals Alice used it but didn’t make it. Both are honest, auditable records.

Gotchas

  • Ask before the message is sent: The ownership decision should happen in the attachment UI, not after submission. FileProvenanceTag handles this — it runs the search and shows the claim prompt while the file is still in the input area.
  • Reuse existing CIDs: If FileProvenanceTag finds a match (score ≥ some threshold), use the existing CID directly as inputCid. Don’t re-record the same file as a new resource.
  • CID propagation: Store the claimed CID in local state immediately after onClaim resolves so it’s available when the message is submitted. FileOwnershipClaim’s onClaim callback is the right place to call setState.
  • Binary files (PDFs): For files where text content can’t be extracted inline, the provenance record still captures the file hash (CID). The LLM won’t see the content, but the provenance chain remains complete.
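The CID-propagation gotcha above can be sketched as a small piece of client state that sits between `onClaim` resolving and the message being submitted. The `AttachmentCids` class is illustrative, not part of the SDK:

```typescript
// Minimal attachment-CID registry: stores the CID assigned by the
// claim endpoint (or reused from a search match) so it is available
// when the message is submitted.
class AttachmentCids {
  private cids = new Map<string, string>(); // file name → CID

  // Call from onClaim / onCidAssigned as soon as the CID resolves.
  assign(fileName: string, cid: string): void {
    this.cids.set(fileName, cid);
  }

  // Collect inputCids for the outgoing message. Throws if any
  // attachment has not finished its search/claim flow yet, which
  // guards against submitting a message with incomplete provenance.
  collect(fileNames: string[]): string[] {
    return fileNames.map((name) => {
      const cid = this.cids.get(name);
      if (!cid) throw new Error(`No CID for attachment: ${name}`);
      return cid;
    });
  }
}
```

In a React app the same idea maps onto component state keyed by attachment; the point is that the CID must be stored the moment the claim resolves, not looked up again at submit time.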