This page captures the exact pattern we are using for the decommissioned archives/ repo. The short version: copy the binaries, register them, then compile useful pages from them.
| Step | Action | Result |
|---|---|---|
| Copy | Mirror PDFs and images into `llm-wiki/raw-sources/archives/`. | Stable local source bundle |
| Register | Add each file to `raw-sources/index.md`. | Traceable source inventory |
| Compile | Extract the source into one or more wiki pages. | Human-readable knowledge |
| Log | Append the change to `log.md`. | Audit trail |
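The Copy, Register, and Log rows are mechanical enough to script. The following is a minimal sketch, not the tooling actually used here: the `ingest` function, the one-line entry formats for `raw-sources/index.md` and `log.md`, and the location of `log.md` at the wiki root are all assumptions made for illustration. The Compile step stays manual and is not covered.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def ingest(source: Path, wiki_root: Path, today: Optional[date] = None) -> Path:
    """Copy one binary into the archive mirror, register it, and log the change.

    Hypothetical helper: the entry formats written below are assumptions,
    not a documented convention of this wiki.
    """
    today = today or date.today()
    archive_dir = wiki_root / "raw-sources" / "archives"
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Copy: mirror the file into raw-sources/archives/, preserving timestamps.
    target = archive_dir / source.name
    shutil.copy2(source, target)

    # Register: append one line per file to raw-sources/index.md (assumed format).
    rel = target.relative_to(wiki_root).as_posix()
    index = wiki_root / "raw-sources" / "index.md"
    with index.open("a", encoding="utf-8") as fh:
        fh.write(f"- `{rel}`\n")

    # Log: append a dated line to log.md (assumed format and location).
    with (wiki_root / "log.md").open("a", encoding="utf-8") as fh:
        fh.write(f"- {today.isoformat()}: added `{rel}`\n")

    return target
```

Calling `ingest(Path("scan.pdf"), Path("llm-wiki"))` would copy the file, then append one line each to the registry and the log. Appending rather than rewriting keeps both files usable as a running history.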
- [...] `pdftotext -layout`).
- [...] `.md` sidecar files.
- [...] `.md` before registering.
- `ocrmypdf --jobs 4 --skip-text input.pdf output.pdf`
- `pdftotext output.pdf output.txt`
- [...] `~/Projects/elib-*/`): produces clean markdown at `output/markdown/`.
- `https://patents.google.com/patent/US{number}/en`
- `curl -s <google-patents-url> | grep -oE '"https://patentimages[^"]+\.pdf"'`
- `curl -L -o <filename>.pdf <url>`
- [...] `.md` with abstract, assignee, filing date, and key claims.
- [...] `raw-sources/index.md`.
- [...] `.md` in `archives/` dated with fetch date (`YYYY-MM-DD-slug.md`).
- [...] `reliability: mixed` in frontmatter if the source is an LLM-generated or user-edited wiki.
- [...] `.md` file, timestamped by speaker.
- [...] `.md` with identified speakers, key quotes, and findings.
- [...] `**Speaker N**:` tags with confirmed names using `replace_all`.
- [...] `.md` summarizing key items.
- `SCHEMA.md`: convention layer
- `raw-sources/index.md`: registry
- `index.md`: browse surface
- `log.md`: history
- `~/Projects/elib-*/`: Codexis scan-to-LaTeX repos (Daniels, Golden Thread, etc.)
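Two of the conventions above are simple string operations: building the Google Patents page URL and pulling PDF links out of the page, and naming a fetched page `YYYY-MM-DD-slug.md`. The sketch below shows one possible Python equivalent of the `grep` pattern and the filename rule. The function names are hypothetical, and the code works on HTML already fetched (for example with the `curl -s` command above); it does not download anything itself.

```python
import re
from datetime import date
from typing import List

# Same pattern as the grep command above, with a capture group so the
# surrounding double quotes are dropped from each match.
PDF_LINK = re.compile(r'"(https://patentimages[^"]+\.pdf)"')


def patent_page_url(number: str) -> str:
    """Build the Google Patents page URL for a US patent number."""
    return f"https://patents.google.com/patent/US{number}/en"


def find_pdf_links(html: str) -> List[str]:
    """Return the distinct patentimages PDF URLs in the HTML, in first-seen order."""
    return list(dict.fromkeys(PDF_LINK.findall(html)))


def dated_filename(slug: str, fetched: date) -> str:
    """Name a fetched page as YYYY-MM-DD-slug.md using the fetch date."""
    return f"{fetched.isoformat()}-{slug}.md"
```

For example, `dated_filename("some-page", date(2024, 1, 5))` gives `2024-01-05-some-page.md`. Deduplicating in `find_pdf_links` is a design choice made here on the assumption that a page may repeat the same PDF link; the `grep -oE` command as written would print every occurrence.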