This class wraps PyGithub calls with caching, minimal traversal logic,
and robust parsing for YAML and Markdown with YAML front matter.
repo = g.get_repo("owner/name")
loader = GHContentLoader(repo)
site = loader.load_site()
projects = loader.load_projects()
All public methods return plain Python dicts/lists to keep this loader
decoupled from application‑specific models.
def __init__(self, repo, paths: Optional[GHContentPaths] = None, ref: Optional[str] = None, max_retries: int = 3):
"""Initialize the content loader.
repo: A PyGithub Repository instance.
paths: Optional path configuration. Defaults to GHContentPaths().
ref: Optional git reference (branch/tag/SHA). Defaults to paths.ref.
max_retries: Number of attempts for transient API errors.
self.paths = paths or GHContentPaths()
self.ref = ref or self.paths.ref
self.max_retries = max_retries
# Caches to minimize GitHub API calls within a single process lifetime.
self._cache_listdir: Dict[str, List[str]] = {}
self._cache_filetext: Dict[str, str] = {}
# ---------- public API ----------
def load_site(self) -> Dict[str, Any]:
    """Load and parse the site configuration file.

    Returns:
        A mapping of the site.yml content. If site.yml were mistakenly a
        Markdown file with YAML front matter, any remaining body would be
        placed under the "body" key for completeness.

    Raises:
        ValueError: If YAML is syntactically invalid or root is not a map.
        github.GithubException: On non-transient API errors.
    """
    path = f"{self.paths.root_dir}/{self.paths.site_file}"
    text = self._fetch_file_text(path)
    data, body = self._parse_front_matter_or_yaml_text(text, path)
    # Preserve body if present (unexpected for site.yml but harmless).
    # Guard on None so a plain YAML file does not gain a spurious
    # "body": None entry from an unconditional setdefault.
    if body is not None:
        data.setdefault("body", body)
    return data
def load_projects(self) -> List[Dict[str, Any]]:
    """Return every item in the "projects" collection as a list of dicts."""
    projects_dir = self.paths.projects_dir
    return self._load_collection(projects_dir)
def load_team(self) -> List[Dict[str, Any]]:
    """Return every item in the "team" collection as a list of dicts."""
    team_dir = self.paths.team_dir
    return self._load_collection(team_dir)
def load_collaborators(self) -> List[Dict[str, Any]]:
    """Return every item in the "collaborators" collection as a list of dicts."""
    collaborators_dir = self.paths.collaborators_dir
    return self._load_collection(collaborators_dir)
def load_experiments(self) -> List[Dict[str, Any]]:
    """Load the "experiments" collection as a list of dicts."""
    # Fix: docstring previously said "collaborators" (copy-paste error);
    # this method loads from experiments_dir.
    return self._load_collection(self.paths.experiments_dir)
# ---------- internals ----------
def _load_collection(self, subdir: str) -> List[Dict[str, Any]]:
"""Load a collection directory of YAML/MD/MDX items.
subdir: Name of the subdirectory under root_dir to load.
A list of item dictionaries. Each item will have:
- slug: Either provided in front matter/YAML or derived from filename.
- _path: The repository path to the source file (for debugging).
- body: For Markdown items, the content after front matter.
base = f"{self.paths.root_dir}/{subdir}"
files = self._list_files(base, exts=(".yml", ".yaml", ".md", ".mdx"))
items: List[Dict[str, Any]] = []
text = self._fetch_file_text(f)
data, body = self._parse_front_matter_or_yaml_text(text, f)
# Ensure a stable slug exists even if not set explicitly.
data.setdefault("slug", data.get("slug") or self._slug_from_filename(f))
# Keep source path for traceability and debugging.
# Preserve body for Markdown content if not overridden in front matter.
if body is not None and "body" not in data:
def _list_files(self, dir_path: str, exts: Tuple[str, ...]) -> List[str]:
"""List all files under a directory (recursively) with matching extensions.
Uses the GitHub Contents API which returns either a list (for
directories) or a single object (for files). This function normalizes
that into a flat list of file paths and recurses into subdirectories.
Results are memoized for the process lifetime to reduce API calls.
dir_path: The repository path to list.
exts: Tuple of lowercase filename extensions to include.
A sorted list of repository file paths.
if dir_path in self._cache_listdir:
return list(self._cache_listdir[dir_path])
contents = self._safe_get_contents(dir_path)
# Non‑existent folders or unexpected errors are treated as empty.
# Depth‑first traversal using an explicit stack; the API returns either
# a single object or a list, so normalize to a list for iteration.
stack = contents if isinstance(contents, list) else [contents]
if getattr(entry, "type", None) == "dir":
sub = self._safe_get_contents(entry.path)
# Skip directories that intermittently fail to list.
stack.extend(sub if isinstance(sub, list) else [sub])
elif getattr(entry, "type", None) == "file" and entry.path.lower().endswith(exts):
self._cache_listdir[dir_path] = list(files)
def _safe_get_contents(self, path: str):
"""Wrapper around repo.get_contents with simple exponential backoff.
Retries common transient error codes. On final failure, re‑raises the
last caught exception for clarity.
for _ in range(self.max_retries):
return self.repo.get_contents(path, ref=self.ref)
except github.GithubException as e:
# 403/429: rate limiting; 5xx: transient server errors.
if getattr(e, 'status', None) in (403, 429, 500, 502, 503):
# Other GithubException: re‑raise immediately.
# Network/transport or unexpected errors — also retry briefly.
# Surface the last failure to the caller after exhausting retries.
def _fetch_file_text(self, path: str) -> str:
"""Fetch file contents as text, with caching and robust decoding.
The GitHub API typically provides decoded_content as bytes; however,
fallbacks are present for edge cases where manual base64 decoding is
necessary. Results are cached by path for the process lifetime.
if path in self._cache_filetext:
return self._cache_filetext[path]
cf = self._safe_get_contents(path)
data = cf.decoded_content
if isinstance(data, bytes):
# utf-8-sig handles optional BOM transparently.
text = data.decode("utf-8-sig")
# Some PyGithub versions may already provide a str.
# Fallback to manual base64 decode if needed
if isinstance(getattr(cf, "content", None), str):
text = base64.b64decode(cf.content).decode("utf-8-sig")
# As a last resort, decode with replacement to avoid crashes.
text = base64.b64decode(cf.content).decode("utf-8", errors="replace")
text = cf.decoded_content.decode("utf-8", errors="replace")
self._cache_filetext[path] = text
def _parse_front_matter_or_yaml_text(self, text: str, path_hint: str) -> Tuple[Dict[str, Any], Optional[str]]:
"""Parse text as YAML (for .yml/.yaml) or MD with optional front matter.
text: Raw file text from the repository.
path_hint: File path used to decide YAML vs Markdown parsing.
A tuple of (data, body). For YAML files, body is None. For Markdown
without front matter, data is an empty dict and body is the text.
if path_hint.lower().endswith((".yml", ".yaml")):
return self._load_yaml_str(text), None
# Markdown with optional YAML front matter
fm_yaml, body = m.group(1), m.group(2)
return self._load_yaml_str(fm_yaml), body
return {}, text # no front matter; whole file is body
def _load_yaml_str(s: str) -> Dict[str, Any]:
    """Load a YAML string into a dict with helpful validation.

    Returns an empty dict for empty YAML. Raises a ValueError if the root
    is not a mapping object to prevent accidental list/str roots.

    Args:
        s: YAML source text.

    Raises:
        ValueError: On invalid YAML or a non-mapping root.
    """
    try:
        # safe_load only: never yaml.load on repository-provided content.
        data = yaml.safe_load(s)
    except yaml.YAMLError as e:
        raise ValueError(f"Invalid YAML: {e}") from e
    if data is None:
        # Empty documents parse to None; normalize to an empty mapping.
        return {}
    if not isinstance(data, dict):
        raise ValueError("YAML root must be a mapping/object.")
    return data
def _slug_from_filename(path: str) -> str:
"""Derive a URL‑safe slug from a repository file path.
Example: "content/projects/Sunrise Drive.yml" -> "sunrise-drive".
Falls back to the base filename (sans extension) if sanitization yields
# e.g., content/projects/sunrise drive.yml -> sunrise-drive
base = posixpath.basename(path)
name = base.rsplit(".", 1)[0]
slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")