Transcript and Document Ingestion
How to feed meeting transcripts and documents into Coppermind so it builds knowledge automatically.
Automatic Transcript Cleaning
When you ingest a meeting transcript, Coppermind automatically removes common noise before extraction. This ensures your briefs and follow-ups contain clean, professional language instead of the verbal habits and system artifacts that recording tools capture.
What gets cleaned:
- Speaker timestamps and labels (e.g., "[00:15] Sarah:" becomes just the statement)
- System banners (Otter.ai headers, Granola watermarks, recording start/stop messages)
- Filler words (um, uh, like, you know) — English-only in this version
- Repeated greetings (e.g., "Hi, everyone. Hi, everyone.")
- Email signatures and unsubscribe footers (when ingesting emails)
- Calendar metadata and meeting invite details
What's preserved:
- The substance of what was said — questions, decisions, commitments, names, numbers
- Code blocks or quoted text (marked with backticks or >) — never stripped
- Speaker turn boundaries (so you can follow the conversation flow)
- Emphasis and context that matters
How you'll see it:
When Coppermind extracts memories from a cleaned transcript, it tracks what was removed in a provenance record. This record shows which source document contributed to each memory, what cleaning rules were applied, and a hash of the original raw transcript for verification. You don't need to do anything with this — it's stored automatically and used by Coppermind to show you where your briefs came from.
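As a rough illustration of rule-based cleaning paired with a provenance record, here is a minimal sketch. The rule names, the regular expressions, and the `ProvenanceRecord` shape are assumptions made for this example; Coppermind's actual rules and record format are not documented here. Only two of the cleaning rules are shown, and "like" is deliberately left out of the filler rule because it needs context to strip safely.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; field names are assumptions for illustration.
interface ProvenanceRecord {
  sourceDocument: string;
  rulesApplied: string[];
  rawSha256: string; // hash of the original, uncleaned transcript
}

interface CleaningRule {
  name: string;
  apply: (line: string) => string;
}

// Illustrative rules only, not the real rule set.
const rules: CleaningRule[] = [
  {
    name: "strip_timestamp_and_speaker_label",
    // "[00:15] Sarah: We agreed." -> "We agreed."
    apply: (line) =>
      line.replace(/^\[\d{1,2}:\d{2}(?::\d{2})?\]\s*[^:]{1,40}:\s*/, ""),
  },
  {
    name: "strip_filler_words",
    // English-only fillers, matched as whole words.
    apply: (line) => line.replace(/\b(?:um+|uh+|you know)\b,?\s*/gi, ""),
  },
];

const FENCE = "`".repeat(3); // marker that opens or closes a code block

function cleanTranscript(raw: string, sourceDocument: string) {
  const applied = new Set<string>();
  let inFence = false;
  const cleanedLines = raw.split("\n").map((line) => {
    const trimmed = line.trimStart();
    if (trimmed.startsWith(FENCE)) {
      inFence = !inFence;
      return line;
    }
    // Code blocks and quoted text are never stripped.
    if (inFence || trimmed.startsWith(">")) return line;
    let out = line;
    for (const rule of rules) {
      const next = rule.apply(out);
      if (next !== out) applied.add(rule.name);
      out = next;
    }
    return out;
  });
  const provenance: ProvenanceRecord = {
    sourceDocument,
    rulesApplied: Array.from(applied),
    rawSha256: createHash("sha256").update(raw, "utf8").digest("hex"),
  };
  return { cleaned: cleanedLines.join("\n"), provenance };
}
```

Hashing the raw text rather than the cleaned text means the record can later confirm that a stored original is the exact input the memories came from.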
Why Ingestion Matters
Ingestion is the primary way Coppermind builds client knowledge. After 3-4 meetings' worth of ingested transcripts, meeting briefs become significantly better. The more you feed it, the more context it has for prep, search, and content generation.
Two Ways to Ingest
1. Paste Directly in Chat
The simplest approach. Just paste the transcript:
"Here's the transcript from my strategy session with Acme last Tuesday"
[paste transcript content]
Coppermind processes the text through its extraction pipeline, pulling out decisions, commitments, preferences, campaign outcomes, stakeholder info, and general facts.
Works with any note-taker. If your tool (Fathom, Otter.ai, Fireflies, etc.) doesn't have a direct connector, just copy the transcript from its web interface and paste it into Claude. This is the most reliable method and works identically regardless of source.
Best for: Single transcripts, meeting recaps, ad-hoc documents, any note-taker without a connector.
2. Pull from Connected Sources
Use the /cm-connect command to pull data from any connected MCP server:
"/cm-connect pull from Google Drive"
This reads documents from connected services (Google Drive, Granola, etc.) and ingests them into the active client mind.
Best for: Documents that live in cloud services, ongoing integration with note-taking tools.
What Gets Extracted
The extraction pipeline identifies six types of knowledge:
| Type | What It Finds | Example from a Transcript |
|---|---|---|
| decision | Choices that were made | "We're pausing LinkedIn ads through Q2" |
| commitment | Action items someone committed to | "Ben will send revised messaging by Friday" |
| preference | Stated or implied preferences | "That's too corporate-sounding" |
| campaign_outcome | Results and metrics | "Email open rate was 34%" |
| stakeholder | New info about key people | "Sarah is taking over the rebrand brief" |
| fact | Other noteworthy business facts | "Series B closed last month" |
The pipeline is conservative: it only extracts statements that are definitive, not hypothetical. "If the board approves, we might pause LinkedIn ads" would not be extracted as a decision.
Sales Minds: Enhanced Extraction
When you ingest a transcript into a sales mind, the extraction pipeline applies additional sales-specific heuristics. It prioritizes four memory types that matter most for deal progression:
| Type | What It Captures |
|---|---|
| buying_signal | Interest, urgency, budget availability, timeline pressure - even subtle positive signals |
| objection | Any pushback, concern, or hesitation the prospect raised |
| pain_point | Underlying problems the prospect needs solved |
| prospect_alternative | Competitors, DIY approaches, or do-nothing options they mentioned |
The sales extraction prompt is more aggressive than the standard prompt: it extracts more signals per transcript and treats subtle cues (e.g., "that's interesting" after pricing) as meaningful. This makes sales minds significantly more useful for tracking deal momentum and preparing for follow-up calls.
Non-sales minds use the standard extraction prompt and will still capture these types if they appear, but without the extra emphasis.
How the Pipeline Works
- Chunking -- Long transcripts are split into ~2000-character chunks with 200-character overlap to prevent losing context at boundaries
- Dedup check -- Each chunk is hashed (SHA-256) and checked against previously processed chunks. Already-processed chunks are skipped.
- LLM extraction -- Chunks are sent to the LLM in parallel batches (3 at a time for transcript extraction, 5 for file ingestion) using `Promise.allSettled`. Each batch's results are collected before the next batch starts.
- Quality gate -- Extracted memories must pass: minimum 10 characters, valid type, confidence >= 0.3, and no semantic duplicate (similarity > 0.95 with existing memory)
- Storage -- Qualifying memories are embedded and stored in parallel batches of 5
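The chunking and quality-gate steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `Candidate` shape, and the use of cosine similarity over embeddings as the similarity measure are all hypothetical. Only the numbers (2000-character chunks, 200-character overlap, 10-character minimum, 0.3 confidence, 0.95 similarity) and the type names come from this page.

```typescript
// Split text into fixed-size chunks that overlap, so a sentence cut at one
// boundary still appears whole in the neighbouring chunk.
function chunkText(text: string, size = 2000, overlap = 200): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Memory types named on this page (six standard plus four sales types).
const VALID_TYPES = new Set<string>([
  "decision", "commitment", "preference", "campaign_outcome", "stakeholder",
  "fact", "buying_signal", "objection", "pain_point", "prospect_alternative",
]);

// Hypothetical shape of an extracted memory before storage.
interface Candidate {
  type: string;
  text: string;
  confidence: number;
  embedding: number[];
}

// Assumption: similarity is cosine similarity between embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function passesQualityGate(c: Candidate, existing: number[][]): boolean {
  if (c.text.trim().length < 10) return false; // minimum 10 characters
  if (!VALID_TYPES.has(c.type)) return false; // valid type
  if (c.confidence < 0.3) return false; // confidence >= 0.3
  // Reject semantic duplicates of memories that are already stored.
  return !existing.some((e) => cosineSimilarity(c.embedding, e) > 0.95);
}
```

With these defaults each chunk starts 1800 characters after the previous one, so a 5000-character transcript yields three chunks and the last 200 characters of one chunk repeat at the start of the next.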
Handling Duplicates
You can safely re-run ingestion on the same content:
- Content hash dedup prevents the same chunk from being re-processed
- Semantic dedup prevents storing paraphrases of existing memories
- Resume support -- if ingestion is interrupted (crash, Ctrl-C), re-running picks up where it left off
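A minimal sketch of how content-hash dedup makes re-runs and resumes safe. The `ProcessedChunks` class is a hypothetical in-memory stand-in for whatever persistent store tracks processed chunks; only the use of SHA-256 comes from this page.

```typescript
import { createHash } from "node:crypto";

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical in-memory stand-in for a persistent store of chunk hashes.
class ProcessedChunks {
  private seen = new Set<string>();

  // Chunks whose hash has not been recorded yet.
  pending(chunks: string[]): string[] {
    return chunks.filter((chunk) => !this.seen.has(sha256(chunk)));
  }

  // Record a chunk only after it has been fully processed, so an
  // interrupted run leaves unfinished chunks pending for the next run.
  markDone(chunk: string): void {
    this.seen.add(sha256(chunk));
  }
}

// Usage: the first run is interrupted after one chunk; the re-run
// only sees the two chunks that were never finished.
const store = new ProcessedChunks();
const transcriptChunks = ["chunk one", "chunk two", "chunk three"];
store.markDone(transcriptChunks[0]);
const remaining = store.pending(transcriptChunks); // ["chunk two", "chunk three"]
```

Marking a chunk as done only after processing completes is the design choice that gives resume behavior without extra bookkeeping.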
Viewing Ingested Documents
"Show me what documents have been ingested for Acme"
This calls `documents` and returns a list of raw documents with word counts, extraction pass counts, and timestamps.
After Ingestion
After ingesting content, Claude will summarize what Coppermind extracted:
> I've processed the transcript for Acme Corp. Here's what I found:
>
> - 8 memories extracted from 12 chunks (3 chunks were already in the system)
> - 3 decisions, 2 commitments, 1 preference, 1 campaign outcome, 1 stakeholder note
> - 2 extractions were below the confidence threshold and were skipped
>
> The new knowledge is ready to use. Want me to prep your next meeting with Acme?
Run a search or meeting prep to see the new knowledge in action:
"What did we discuss in the last meeting?"
"Prep my next meeting with Acme"
Using Any Note-Taker (No Connector Required)
Not every transcription tool has a direct connector. That's fine -- the paste method works with all of them:
| Tool | How to Get the Transcript |
|---|---|
| Fathom | Open any meeting, click the transcript tab, select all, copy |
| Otter.ai | Open the meeting, click the transcript, select all, copy |
| Fireflies | Go to meeting details, copy transcript text |
| Granola | Export as markdown or copy from the meeting view |
| Google Meet (built-in) | Download the transcript from Google Drive after the meeting |
| Teams (built-in) | Download from the meeting recap in Teams |
Then paste into Claude and say "ingest this meeting with [Client Name]." Coppermind handles the rest.
Tip: If you do this after every meeting, your briefs improve dramatically after 3-4 meetings. It takes 30 seconds.
Tips for Better Extraction
- Paste transcripts after every meeting. This is the single most impactful habit.
- Use a transcription tool. Granola, Otter.ai, Fathom, or Fireflies all produce transcripts that Coppermind handles well.
- Include speaker labels when available. They mark speaker turn boundaries, which keeps the conversation flow clear, even though the pipeline does not yet attribute statements to specific speakers.
- Don't worry about formatting. Plain text works fine. The pipeline handles various formats.
- Supplement with quick notes. Between meetings, capture things the transcript missed: "Quick note: Josh wants the budget by Thursday."
Key Details
- Ingestion is a batch operation. It runs after the meeting, not during it.
- Each chunk has a 30-second timeout. If the LLM stalls, the chunk is skipped and can be retried.
- Confidence threshold is 0.3. Extractions below this are discarded as too uncertain.
- Raw documents are preserved. The original text is stored in `raw_documents` for future re-extraction with improved prompts.
- No speaker attribution yet. The current pipeline does not attribute statements to specific speakers. This is planned for a future release.
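The batching and per-chunk timeout behavior described on this page can be sketched like this. It is an illustration under stated assumptions, not Coppermind's code: `withTimeout`, `toBatches`, `extractAll`, and the `extract` callback (a stand-in for the LLM call) are hypothetical names. The batch size of 3, the 30-second timeout, and the use of `Promise.allSettled` come from the text.

```typescript
// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

async function extractAll<R>(
  chunks: string[],
  extract: (chunk: string) => Promise<R>, // hypothetical LLM call
  batchSize = 3,
  timeoutMs = 30_000,
): Promise<{ results: R[]; skipped: string[] }> {
  const results: R[] = [];
  const skipped: string[] = [];
  for (const batch of toBatches(chunks, batchSize)) {
    // allSettled waits for every chunk in the batch, so one stalled or
    // failed chunk does not discard the results of its neighbours.
    const settled = await Promise.allSettled(
      batch.map((chunk) => withTimeout(extract(chunk), timeoutMs)),
    );
    settled.forEach((outcome, i) => {
      if (outcome.status === "fulfilled") results.push(outcome.value);
      else skipped.push(batch[i]); // timed out or failed; can be retried
    });
  }
  return { results, skipped };
}
```

Because each batch is awaited before the next one starts, at most `batchSize` LLM calls are in flight at once, and the `skipped` list gives a later run the exact chunks to retry.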