Architecture
Two sites, one monorepo
```text
                     ┌──────────────────────────────┐
                     │ studentops.vikrant69g.com    │
Cloudflare Pages ──▶ │ (Astro 5, static)            │
                     └──────────────────────────────┘

                     ┌──────────────────────────────┐
                     │ app.vikrant69g.com           │
Fly.io (syd)     ──▶ │ Streamlit + Prefect worker   │
                     │ password-gated,              │
                     │ single-tenant                │
                     └──────┬───────────────┬───────┘
                            │               │
                            ▼               ▼
                 Supabase Postgres  Anthropic + Resend
```
The marketing site is fully static, with no runtime dependencies. Only the app talks to the warehouse (Supabase Postgres) and the external APIs (Anthropic and Resend). Each site has its own CI workflow, so a marketing edit cannot break an app deploy and vice versa.
DNS topology
Both subdomains live under vikrant69g.com, which is already on Cloudflare
DNS. The records:
| Subdomain | Type | Target | Proxied | Notes |
|---|---|---|---|---|
| studentops.vikrant69g.com | CNAME | `<project>.pages.dev` | yes | Set up as a Cloudflare Pages custom domain |
| app.vikrant69g.com | CNAME | `<fly-app>.fly.dev` | yes | `flyctl certs add` issues the certificate |
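Cloudflare's SSL/TLS mode is Full (strict). Flexible mode can cause redirect
loops when the origin itself redirects HTTP to HTTPS, and Full (strict) also
validates the origin certificate. HSTS is enabled at the zone level with a
one-week max-age to start with, and Always Use HTTPS is on.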
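A one-week max-age works out to 7 × 24 × 3600 = 604,800 seconds. When the max-age is raised later, a small check can confirm what the zone actually sends: the sketch below parses a Strict-Transport-Security header value (for example, copied from `curl -sI https://app.vikrant69g.com`) and tests its max-age against a floor. The function names and the default floor are illustrative, not part of the project.

```python
from __future__ import annotations

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800


def parse_hsts(header_value: str) -> dict[str, str | bool]:
    """Split an HSTS header value into its directives.

    Directive names are case-insensitive, so they are lower-cased.
    Valueless directives (includeSubDomains, preload) map to True.
    """
    directives: dict[str, str | bool] = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else True
    return directives


def hsts_meets_floor(header_value: str, min_max_age: int = ONE_WEEK_SECONDS) -> bool:
    """True if the header carries a numeric max-age of at least min_max_age seconds."""
    max_age = parse_hsts(header_value).get("max-age")
    if not isinstance(max_age, str) or not max_age.isdigit():
        return False
    return int(max_age) >= min_max_age
```

A header of `max-age=86400` fails the one-week floor, while `max-age=604800; includeSubDomains` passes.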
Data flow
The bronze, silver and gold layers live in schemas, not separate databases:

- `raw.*` (bronze): exact source payloads. Append-aware on the natural key. A JSONB body plus a small number of extracted columns for indexing. Audit columns: `_ingested_at`, `_source`, `_natural_key`. Never transformed.
- `stg_*` (silver): dbt views that flatten and type the JSON. One per source.
- `mart_*` (gold): dbt tables. Facts and conformed dimensions. SCD2 only where the brief calls for it (`dim_assessments`).
- `meta.*`: run metadata, OAuth tokens, AI usage. Python-owned, not dbt.
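dim_assessments is the only SCD2 dimension, and the dbt snapshot step in the schedule is presumably what maintains it. To make the valid_from / valid_to / is_current semantics concrete, here is a minimal pure-Python sketch of one snapshot pass, similar in spirit to dbt's `check` snapshot strategy. It assumes note_path is the business key and treats every other column as tracked; both are assumptions, since the real snapshot configuration isn't shown here. Keys that vanish from the source are simply left open, whereas a real snapshot may be configured to invalidate them.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime

# Columns compared between runs; a change in any of them opens a new version.
# Assumption: every non-key column is tracked.
TRACKED = ("course_code", "due_date", "weight_pct", "status")


@dataclass(frozen=True)
class AssessmentVersion:
    note_path: str  # business key (assumed)
    course_code: str
    due_date: str
    weight_pct: float
    status: str
    valid_from: datetime
    valid_to: datetime | None = None
    is_current: bool = True


def apply_snapshot(
    history: list[AssessmentVersion],
    incoming: list[dict],
    now: datetime,
) -> list[AssessmentVersion]:
    """One snapshot pass: keep unchanged rows, close changed ones, open new versions."""
    current = {v.note_path: v for v in history if v.is_current}
    result = [v for v in history if not v.is_current]  # closed versions never change
    seen: set[str] = set()
    for row in incoming:
        key = row["note_path"]
        seen.add(key)
        old = current.get(key)
        if old is not None and all(getattr(old, col) == row[col] for col in TRACKED):
            result.append(old)  # unchanged: the open version stays open
            continue
        if old is not None:
            result.append(replace(old, valid_to=now, is_current=False))
        result.append(AssessmentVersion(**row, valid_from=now))
    # Keys absent from this run stay open (no hard-delete handling in this sketch).
    result.extend(v for k, v in current.items() if k not in seen)
    return result
```

Re-running a pass with identical input returns the same history, which matches the idempotency goal stated for ingestion.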
```mermaid
flowchart TB
    subgraph Bronze
        R1[raw.ical_events]
        R2[raw.gcal_events]
        R3[raw.zotero_items]
        R4[raw.notes]
        R5[raw.bibtex_entries]
        R6[raw.ris_entries]
        R7[raw.citation_cache]
    end
    subgraph Silver
        S1[stg_ical_events]
        S2[stg_gcal_events]
        S3[stg_zotero_items]
        S4[stg_notes]
        S5[stg_bibtex_entries]
        S6[stg_ris_entries]
        S7[stg_citation_cache]
    end
    subgraph Gold
        D1[dim_courses]
        D2[dim_assessments]
        F1[fact_study_sessions]
        F2[fact_readings]
        F3[fact_notes_activity]
        F4[fact_citations]
    end
    R1-->S1
    R2-->S2
    R3-->S3
    R4-->S4
    R5-->S5
    R6-->S6
    R7-->S7
    S1-->D1
    S4-->D2
    S2-->F1
    S3-->F2
    S4-->F3
    S7-->F4
```
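Because the lineage is a small DAG, impact analysis is a plain graph walk. The sketch below copies the edges from the flowchart into a dict and returns every model downstream of a node, for example to see which gold tables a change to raw.notes touches. Against the real project, dbt's `+` graph operator (e.g. `dbt ls --select stg_notes+`) answers the same question; the dict and function here are only illustrative.

```python
from collections import deque

# Edges copied from the flowchart (upstream -> downstream).
EDGES = {
    "raw.ical_events": ["stg_ical_events"],
    "raw.gcal_events": ["stg_gcal_events"],
    "raw.zotero_items": ["stg_zotero_items"],
    "raw.notes": ["stg_notes"],
    "raw.bibtex_entries": ["stg_bibtex_entries"],
    "raw.ris_entries": ["stg_ris_entries"],
    "raw.citation_cache": ["stg_citation_cache"],
    "stg_ical_events": ["dim_courses"],
    "stg_notes": ["dim_assessments", "fact_notes_activity"],
    "stg_gcal_events": ["fact_study_sessions"],
    "stg_zotero_items": ["fact_readings"],
    "stg_citation_cache": ["fact_citations"],
}


def downstream(node: str) -> list[str]:
    """Breadth-first walk returning every model that depends on `node`."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(EDGES.get(node, []))
    while queue:
        child = queue.popleft()
        if child in seen:
            continue
        seen.add(child)
        order.append(child)
        queue.extend(EDGES.get(child, []))
    return order
```

`downstream("raw.notes")` returns stg_notes, dim_assessments and fact_notes_activity. The same walk shows that, as drawn, stg_bibtex_entries and stg_ris_entries have no gold consumers.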
Gold-layer ERD
| Table | Columns |
|---|---|
| dim_courses | course_sk (PK), course_code |
| dim_assessments | assessment_sk (PK), note_path, course_code, due_date, weight_pct, status, valid_from, valid_to, is_current |
| fact_study_sessions | session_sk (PK), event_id, course_code, start_at, duration_minutes |
| fact_readings | reading_sk (PK), zotero_key, doi, linked_note_path, has_notes |
| fact_notes_activity | note_sk (PK), note_path, note_type, course_code, word_count, modified_local_date |
| fact_citations | citation_sk (PK), doi, provider, zotero_key, bibtex_key |
Course code is the join key between gold tables; we deliberately don’t expose the surrogate keys across facts because the dataset is small enough that human-readable course codes win on debuggability.
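As an illustration of why readable codes help, the sketch below rolls fact_study_sessions up to minutes per course and checks each course_code against dim_courses. Rows are plain dicts standing in for query results, and the function name is hypothetical.

```python
from collections import defaultdict


def minutes_per_course(
    sessions: list[dict], courses: list[dict]
) -> tuple[dict[str, int], list[str]]:
    """Sum duration_minutes per course_code.

    Returns (totals, orphans): totals covers only codes present in dim_courses;
    orphans lists the event_ids whose course_code has no dim_courses row.
    """
    known = {c["course_code"] for c in courses}
    totals: dict[str, int] = defaultdict(int)
    orphans: list[str] = []
    for s in sessions:
        code = s["course_code"]
        if code in known:
            totals[code] += s["duration_minutes"]
        else:
            orphans.append(s["event_id"])
    return dict(totals), orphans
```

An unmatched session surfaces with its course_code intact, so a typo or a missing course is obvious on sight, which an integer surrogate key would hide.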
Idempotency strategy per source
| Source | Natural key | Strategy |
|---|---|---|
| iCal feeds | uid | Delete + insert per UID per ingest pass |
| Google Calendar | event.id | Delete + insert; nextSyncToken limits each run to changed events |
| Zotero | Zotero key | Delete + insert; full library each run |
| Markdown notes | path relative to repo root | Delete + insert; walk + mtime each run |
| BibTeX | citation key | Delete + insert per inbox file |
| RIS | sha256(type/title/doi/year) | Delete + insert; deterministic hash |
| OpenAlex/CrossRef | DOI | One cache row per DOI, recorded with resolved = true or false |
A second run with the same inputs yields the same row counts.
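The delete + insert pattern can be sketched against an in-memory SQLite table standing in for raw.ris_entries, using the audit columns listed under Data flow. SQLite has no JSONB, so the payload is stored as JSON text here. The table above only says the RIS key is sha256 over type, title, doi and year; serialising those fields as a JSON array (so a `/` inside a DOI can't blur field boundaries) and trimming/lower-casing them are assumptions, and the helper names are hypothetical.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone


def ris_natural_key(entry: dict) -> str:
    """sha256 over type, title, doi, year (normalisation and encoding are assumptions)."""
    parts = [str(entry.get(f, "")).strip().lower() for f in ("type", "title", "doi", "year")]
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()


def ingest(conn: sqlite3.Connection, entries: list[dict], source: str) -> None:
    """Delete then insert each entry by natural key, all in one transaction."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back the whole batch on error
        for entry in entries:
            key = ris_natural_key(entry)
            conn.execute("DELETE FROM ris_entries WHERE _natural_key = ?", (key,))
            conn.execute(
                "INSERT INTO ris_entries (_natural_key, _source, _ingested_at, body)"
                " VALUES (?, ?, ?, ?)",
                (key, source, now, json.dumps(entry, sort_keys=True)),
            )


# In-memory stand-in for raw.ris_entries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ris_entries ("
    "_natural_key TEXT NOT NULL, _source TEXT, _ingested_at TEXT, body TEXT)"
)
```

Calling `ingest()` twice with the same batch leaves exactly one row per natural key, and a failure mid-batch rolls back instead of leaving a half-deleted source.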
Observability
Every flow opens a row in meta.flow_runs at start and updates it at end via
the run_context helper. Each log line is structured JSON with run_id,
flow_name, and source. The Pipelines dashboard page reads from this
table; the daily digest can hook into it for freshness alerts.
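run_context is the project's own helper and its real signature isn't shown here, so the following is a hypothetical re-implementation of the behaviour described: open a flow_runs row at start, update it with a final status at the end, and emit one JSON object per log line carrying run_id, flow_name and source. SQLite stands in for Postgres, and the column names are assumptions.

```python
import json
import logging
import sqlite3
import uuid
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Iterator


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with the run context fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "run_id": getattr(record, "run_id", None),
            "flow_name": getattr(record, "flow_name", None),
            "source": getattr(record, "source", None),
        })


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@contextmanager
def run_context(conn: sqlite3.Connection, flow_name: str, source: str) -> Iterator[logging.LoggerAdapter]:
    """Hypothetical sketch: open a flow_runs row, yield a JSON logger, close the row."""
    run_id = str(uuid.uuid4())
    with conn:  # commit the 'running' row immediately so a dashboard can see it
        conn.execute(
            "INSERT INTO flow_runs (run_id, flow_name, source, status, started_at)"
            " VALUES (?, ?, ?, 'running', ?)",
            (run_id, flow_name, source, _utc_now()),
        )
    log = logging.LoggerAdapter(
        logging.getLogger("pipelines"),
        {"run_id": run_id, "flow_name": flow_name, "source": source},
    )
    status = "failed"  # stays 'failed' if the body raises
    try:
        yield log
        status = "success"
    finally:
        with conn:
            conn.execute(
                "UPDATE flow_runs SET status = ?, finished_at = ? WHERE run_id = ?",
                (status, _utc_now(), run_id),
            )


# Wiring: JSON lines to stderr, plus an in-memory stand-in for meta.flow_runs.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger("pipelines").addHandler(handler)
logging.getLogger("pipelines").setLevel(logging.INFO)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flow_runs (run_id TEXT PRIMARY KEY, flow_name TEXT, source TEXT,"
    " status TEXT, started_at TEXT, finished_at TEXT)"
)
```

A flow body then reads `with run_context(conn, "ingestion", "zotero") as log: log.info("fetched items")`, and an exception inside the block leaves the row marked `failed` before propagating.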
meta.ai_usage tracks every Anthropic call with token counts and an
estimated cost. The Anthropic client refuses to call once today’s spend
crosses ANTHROPIC_DAILY_BUDGET_USD.
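The spend cap reduces to a small guard that sums today's estimated cost and refuses before a call. This sketch keeps usage in memory rather than in meta.ai_usage, and the per-million-token prices are placeholder parameters, not Anthropic's actual rates; the class and method names are hypothetical.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field
from datetime import date


class BudgetExceeded(RuntimeError):
    """Raised instead of calling the API once the daily cap is reached."""


@dataclass
class BudgetGuard:
    daily_budget_usd: float
    input_usd_per_mtok: float   # placeholder price per million input tokens
    output_usd_per_mtok: float  # placeholder price per million output tokens
    spend_by_day: dict[date, float] = field(default_factory=dict)

    @classmethod
    def from_env(cls, input_usd_per_mtok: float, output_usd_per_mtok: float) -> BudgetGuard:
        """Read the cap from ANTHROPIC_DAILY_BUDGET_USD."""
        budget = float(os.environ["ANTHROPIC_DAILY_BUDGET_USD"])
        return cls(budget, input_usd_per_mtok, output_usd_per_mtok)

    def estimate(self, input_tokens: int, output_tokens: int) -> float:
        """Estimated USD cost of one call from its token counts."""
        return (input_tokens * self.input_usd_per_mtok
                + output_tokens * self.output_usd_per_mtok) / 1_000_000

    def check(self, today: date) -> None:
        """Refuse once today's recorded spend has reached the budget."""
        if self.spend_by_day.get(today, 0.0) >= self.daily_budget_usd:
            raise BudgetExceeded(f"daily budget of {self.daily_budget_usd:.2f} USD reached")

    def record(self, today: date, input_tokens: int, output_tokens: int) -> float:
        """Add one call's estimated cost to today's total and return it."""
        cost = self.estimate(input_tokens, output_tokens)
        self.spend_by_day[today] = self.spend_by_day.get(today, 0.0) + cost
        return cost
```

In a client wrapper, `check()` runs before each request and `record()` runs afterwards with the response's `usage.input_tokens` and `usage.output_tokens`. Because the check happens before the call, the last call of the day can overshoot the cap by up to one request's cost.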
Pipelines and schedules
| Flow | Schedule (Australia/Adelaide time) |
|---|---|
| ingestion (all sources) | 06:30 daily |
| dbt snapshot + build | 06:45 daily |
| AI summariser batch | 23:00 daily |
| daily digest (Resend) | 07:00 daily |
| weekly summary (Resend) | 18:00 Sunday |
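For reference, the times above translate to the five-field cron expressions below. They only mean Adelaide time if the scheduler evaluates them in the Australia/Adelaide timezone; Adelaide moves between UTC+9:30 and UTC+10:30 with daylight saving, so a UTC-evaluated cron would drift by an hour across the year. The flow identifiers and the assumed dependency pairs are hypothetical, not the project's deployment config; the check confirms the morning chain runs in order with a minimum gap.

```python
# Five-field cron: minute hour day-of-month month day-of-week (0 = Sunday).
SCHEDULES = {
    "ingestion": "30 6 * * *",
    "dbt_snapshot_build": "45 6 * * *",
    "ai_summariser_batch": "0 23 * * *",
    "daily_digest": "0 7 * * *",
    "weekly_summary": "0 18 * * 0",
}

# Assumed upstream -> downstream ordering for the morning chain.
DEPENDS_ON = [("ingestion", "dbt_snapshot_build"), ("dbt_snapshot_build", "daily_digest")]


def minutes_after_midnight(cron: str) -> int:
    """Start time of a fixed-time cron entry, in minutes after midnight."""
    minute, hour, *_ = cron.split()
    return int(hour) * 60 + int(minute)


def check_order(min_gap: int = 15) -> list[str]:
    """Describe each dependency that starts less than min_gap minutes after its upstream."""
    problems = []
    for upstream, downstream in DEPENDS_ON:
        gap = (minutes_after_midnight(SCHEDULES[downstream])
               - minutes_after_midnight(SCHEDULES[upstream]))
        if gap < min_gap:
            problems.append(f"{downstream} starts {gap} min after {upstream}")
    return problems
```

A fixed gap is only a proxy for ordering: if ingestion ever takes longer than 15 minutes, the dbt build would run on partial data, so triggering the downstream flow on upstream completion is the more robust option.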