StudentOps

Architecture

Two sites, one monorepo

                                      ┌──────────────────────────────┐
                                      │  studentops.vikrant69g.com   │
              Cloudflare Pages  ──▶   │       (Astro 5, static)      │
                                      └──────────────────────────────┘

                                      ┌──────────────────────────────┐
                                      │     app.vikrant69g.com       │
                  Fly.io (syd)   ──▶  │  Streamlit + Prefect worker  │
                                      │     password-gated single    │
                                      │            tenant            │
                                      └──────┬───────────────┬───────┘
                                             │               │
                                             ▼               ▼
                              Supabase Postgres        Anthropic + Resend

The marketing site is fully static and zero-dependency at runtime. The app is the one that talks to the warehouse and external APIs. Each gets its own CI workflow; a marketing edit cannot break an app deploy and vice versa.

DNS topology

Both subdomains live under vikrant69g.com, which is already on Cloudflare DNS. The records:

| Subdomain | Type | Target | Proxied | Notes |
| --- | --- | --- | --- | --- |
| studentops.vikrant69g.com | CNAME | <project>.pages.dev | yes | Configured in Cloudflare Pages custom domain |
| app.vikrant69g.com | CNAME | <fly-app>.fly.dev | yes | flyctl certs add issues the cert |

Cloudflare's SSL/TLS mode is Full (Strict), which avoids the redirect loops that Flexible mode causes when the origin also forces HTTPS. HSTS is enabled at the zone level, starting with a one-week max-age (604800 seconds). Always Use HTTPS is on.

Data flow

The bronze, silver, and gold layers live in separate schemas, not separate databases:

flowchart TB
  subgraph Bronze
    R1[raw.ical_events]
    R2[raw.gcal_events]
    R3[raw.zotero_items]
    R4[raw.notes]
    R5[raw.bibtex_entries]
    R6[raw.ris_entries]
    R7[raw.citation_cache]
  end
  subgraph Silver
    S1[stg_ical_events]
    S2[stg_gcal_events]
    S3[stg_zotero_items]
    S4[stg_notes]
    S5[stg_bibtex_entries]
    S6[stg_ris_entries]
    S7[stg_citation_cache]
  end
  subgraph Gold
    D1[dim_courses]
    D2[dim_assessments]
    F1[fact_study_sessions]
    F2[fact_readings]
    F3[fact_notes_activity]
    F4[fact_citations]
  end
  R1-->S1
  R2-->S2
  R3-->S3
  R4-->S4
  R5-->S5
  R6-->S6
  R7-->S7
  S1-->D1
  S4-->D2
  S2-->F1
  S3-->F2
  S4-->F3
  S7-->F4

Gold-layer ERD

dim_courses (course_sk PK, course_code)
dim_assessments (assessment_sk PK, note_path, course_code, due_date, weight_pct, status, valid_from, valid_to, is_current)
fact_study_sessions (session_sk PK, event_id, course_code, start_at, duration_minutes)
fact_readings (reading_sk PK, zotero_key, doi, linked_note_path, has_notes)
fact_notes_activity (note_sk PK, note_path, note_type, course_code, word_count, modified_local_date)
fact_citations (citation_sk PK, doi, provider, zotero_key, bibtex_key)

Course code is the join key between gold tables; we deliberately don’t expose the surrogate keys across facts because the dataset is small enough that human-readable course codes win on debuggability.
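
As an illustration of joining on course_code rather than a surrogate key, here is a minimal sketch using an in-memory SQLite database in place of the Supabase Postgres warehouse. The table and column names come from the ERD above; the sample rows and course codes are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_courses (course_sk INTEGER PRIMARY KEY, course_code TEXT);
    CREATE TABLE fact_study_sessions (
        session_sk INTEGER PRIMARY KEY,
        event_id TEXT,
        course_code TEXT,
        start_at TEXT,
        duration_minutes INTEGER
    );
    -- Illustrative rows only; not real data.
    INSERT INTO dim_courses VALUES (1, 'COMP1001'), (2, 'MATH1002');
    INSERT INTO fact_study_sessions VALUES
        (1, 'e1', 'COMP1001', '2025-03-03T09:00', 60),
        (2, 'e2', 'COMP1001', '2025-03-04T09:00', 90),
        (3, 'e3', 'MATH1002', '2025-03-04T13:00', 45);
    """
)

# Join the fact to the dimension on the human-readable course_code.
totals = conn.execute(
    """
    SELECT c.course_code, SUM(f.duration_minutes) AS minutes
    FROM dim_courses AS c
    JOIN fact_study_sessions AS f ON f.course_code = c.course_code
    GROUP BY c.course_code
    ORDER BY c.course_code
    """
).fetchall()
print(totals)
```

Because the join column is readable, a mismatched or missing course shows up directly in the query result instead of as an opaque integer.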

Idempotency strategy per source

| Source | Natural key | Strategy |
| --- | --- | --- |
| iCal feeds | uid | Delete + insert per UID per ingest pass |
| Google Calendar | event.id | Delete + insert; nextSyncToken cuts work |
| Zotero | Zotero key | Delete + insert; full library each run |
| Markdown notes | path relative to repo root | Delete + insert; walk + mtime each run |
| BibTeX | citation key | Delete + insert per inbox file |
| RIS | sha256(type/title/doi/year) | Delete + insert; deterministic hash |
| OpenAlex/CrossRef | DOI | Cache row (resolved=true or false) |
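
RIS entries carry no identifier of their own, so the key is derived from content. The sketch below shows one way such a deterministic hash could be computed. The function name, the "/" separator, and the whitespace and case normalisation are assumptions for illustration; only the four fields and the use of SHA-256 come from the table above.

```python
import hashlib


def ris_natural_key(entry_type: str, title: str, doi: str, year: str) -> str:
    """Return a stable hex digest built from the four identifying fields."""
    # Assumption: trim and lower-case each field so cosmetic differences
    # between exports do not produce a different key.
    parts = [entry_type, title, doi, year]
    normalised = "/".join(p.strip().lower() for p in parts)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


key_a = ris_natural_key("JOUR", "An Example Title", "10.1000/xyz123", "2024")
key_b = ris_natural_key("JOUR", "an example title ", "10.1000/XYZ123", "2024")
key_c = ris_natural_key("JOUR", "An Example Title", "10.1000/xyz123", "2023")
```

Here key_a and key_b are equal because the inputs differ only in case and whitespace, while key_c differs because the year changed.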

Because every strategy replaces rows by natural key instead of appending, a second run with the same inputs yields the same row counts.
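
The delete + insert pattern can be sketched as follows, assuming an in-memory SQLite table in place of raw.ical_events in Postgres. The ingest function and the summary column are hypothetical; the point is only that deleting by natural key before inserting makes a repeated run a no-op for row counts.

```python
import sqlite3


def ingest(conn, rows):
    """Replace every row whose uid appears in this batch, in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "DELETE FROM raw_ical_events WHERE uid = ?",
            [(r["uid"],) for r in rows],
        )
        conn.executemany(
            "INSERT INTO raw_ical_events (uid, summary) VALUES (:uid, :summary)",
            rows,
        )


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM raw_ical_events").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_ical_events (uid TEXT, summary TEXT)")

batch = [
    {"uid": "a1", "summary": "Lecture"},
    {"uid": "b2", "summary": "Tutorial"},
]
ingest(conn, batch)
first = row_count(conn)
ingest(conn, batch)  # same inputs again
second = row_count(conn)
```

Running the delete and the insert in one transaction means a failure partway through leaves the previous rows in place rather than an empty table.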

Observability

Every flow opens a row in meta.flow_runs at start and updates it at end via the run_context helper. Each log line is structured JSON with run_id, flow_name, and source. The Pipelines dashboard page reads from this table; the daily digest can hook into it for freshness alerts.
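
A helper of this kind could look roughly like the sketch below. It is not the project's run_context implementation: the column names, the status values, and the log callable are assumptions, and SQLite with an attached in-memory database named meta stands in for the Postgres meta schema.

```python
import json
import sqlite3
import uuid
from contextlib import contextmanager
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


@contextmanager
def run_context(conn, flow_name, source):
    """Open a meta.flow_runs row, yield a JSON logger, close the row at exit."""
    run_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO meta.flow_runs (run_id, flow_name, source, status, started_at) "
        "VALUES (?, ?, ?, 'running', ?)",
        (run_id, flow_name, source, _now()),
    )
    conn.commit()

    def log(message, **fields):
        # Every line carries run_id, flow_name, and source.
        record = {"run_id": run_id, "flow_name": flow_name, "source": source}
        print(json.dumps({**record, "message": message, **fields}))

    status = "failed"  # overwritten only if the body finishes cleanly
    try:
        yield log
        status = "succeeded"
    finally:
        conn.execute(
            "UPDATE meta.flow_runs SET status = ?, finished_at = ? WHERE run_id = ?",
            (status, _now(), run_id),
        )
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS meta")
conn.execute(
    "CREATE TABLE meta.flow_runs (run_id TEXT PRIMARY KEY, flow_name TEXT, "
    "source TEXT, status TEXT, started_at TEXT, finished_at TEXT)"
)

with run_context(conn, "ingestion", "ical") as log:
    log("fetched feed", rows=2)
```

Updating the row in a finally block means a crashed flow is still recorded as failed, which is what a freshness alert would need to see.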

meta.ai_usage tracks every Anthropic call with token counts and an estimated cost. The Anthropic client refuses to call once today’s spend crosses ANTHROPIC_DAILY_BUDGET_USD.
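
The budget check can be sketched as below. This is a hedged illustration, not the actual client: guarded_call, BudgetExceeded, and the meta.ai_usage column names are hypothetical, the call argument is a stand-in callable rather than a real Anthropic SDK request, and SQLite replaces Postgres. Only the table name and the ANTHROPIC_DAILY_BUDGET_USD variable come from the text above.

```python
import os
import sqlite3
from datetime import date


class BudgetExceeded(RuntimeError):
    """Raised instead of calling the API once the daily budget is spent."""


def spend_today(conn, today):
    row = conn.execute(
        "SELECT COALESCE(SUM(estimated_cost_usd), 0) "
        "FROM meta.ai_usage WHERE usage_date = ?",
        (today,),
    ).fetchone()
    return float(row[0])


def guarded_call(conn, call, budget_usd=None, today=None):
    """Run call() only while today's recorded spend is under the budget."""
    today = today or date.today().isoformat()
    if budget_usd is None:
        budget_usd = float(os.environ["ANTHROPIC_DAILY_BUDGET_USD"])
    if spend_today(conn, today) >= budget_usd:
        raise BudgetExceeded(f"daily budget of {budget_usd} USD reached")
    # Assumption: call() returns (input_tokens, output_tokens, cost_usd).
    input_tokens, output_tokens, cost_usd = call()
    conn.execute(
        "INSERT INTO meta.ai_usage VALUES (?, ?, ?, ?)",
        (today, input_tokens, output_tokens, cost_usd),
    )
    conn.commit()
    return cost_usd


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS meta")
conn.execute(
    "CREATE TABLE meta.ai_usage (usage_date TEXT, input_tokens INTEGER, "
    "output_tokens INTEGER, estimated_cost_usd REAL)"
)
```

The check happens before the call, so the last permitted call can push the day's total slightly over the budget; every call after that is refused.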

Pipelines and schedules

| Flow | Cron (Australia/Adelaide) |
| --- | --- |
| ingestion (all sources) | 06:30 daily |
| dbt snapshot + build | 06:45 daily |
| AI summariser batch | 23:00 daily |
| daily digest (Resend) | 07:00 daily |
| weekly summary (Resend) | 18:00 Sunday |
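
For reference, the schedule above translates into standard five-field cron expressions as in the sketch below. The dictionary keys and the minute_of_day helper are hypothetical names for illustration; how the project actually registers these schedules with Prefect is not shown here.

```python
SCHEDULE_TIMEZONE = "Australia/Adelaide"

# Cron fields: minute, hour, day of month, month, day of week (0 = Sunday).
FLOW_CRONS = {
    "ingestion": "30 6 * * *",
    "dbt_snapshot_build": "45 6 * * *",
    "ai_summariser_batch": "0 23 * * *",
    "daily_digest": "0 7 * * *",
    "weekly_summary": "0 18 * * 0",
}


def minute_of_day(cron: str) -> int:
    """Return the scheduled time as minutes after local midnight."""
    minute, hour, *_ = cron.split()
    return int(hour) * 60 + int(minute)


# The morning chain is ordered: ingestion, then dbt, then the digest.
morning = ["ingestion", "dbt_snapshot_build", "daily_digest"]
morning_minutes = [minute_of_day(FLOW_CRONS[name]) for name in morning]
```

The 15-minute gaps mean each morning step starts on the clock rather than on completion of the previous one, so an ingestion run longer than 15 minutes would overlap the dbt build.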