
3.6 — AI, Machine Learning & Algorithmic Governance

Substrate • Power • Sovereignty

Visual anchors

These images are placed as “pattern locks” for: reinforcement loops, objective functions, and governance-as-optimization.

[Image: AI governance visual 1 (Substrate shift)]
[Image: AI governance visual 2 (Metrics → policy)]
[Image: AI governance visual 3 (Optimization regime)]
[Image: Reinforcement learning loop diagram (Sutton & Barto)]

0. Frame: AI as Substrate Replacement, Not “Tool”

Key move: treat ML as operating system

We treat AI / ML as substrate, not “apps”:

  • Law → loss functions, reward models, and policies
  • Politics → metrics, dashboards, and KPIs
  • Culture → recommender systems and generative feeds
  • Memory → embeddings and logs
Definition
Algorithmic governance = decisions about reality routed through continually retrained models (scoring, ranking, allocation, filtering) + the measurement infrastructure that feeds them.

Once models sit under banks, platforms, ministries, and media, they stop being “applications” and become algorithmic governance: a continually retrained operating system for reality.

1. From Symbolic Dreams to Learned Optimization

Symbolic rules → gradient-trained policies

1.1 Symbolic AI and the rational agent frame

Early AI (McCarthy, Minsky) was symbolic: knowledge as logic/rules; reasoning as theorem proving and search. Russell & Norvig systematize the rational agent frame: an agent that perceives and acts to maximize a performance measure over time.

Governance already present
  • Define environment (what counts as state)
  • Define performance measure (what counts as good)
  • Build an optimizer that relentlessly pursues it
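
A minimal sketch of this frame, with invented states, dynamics, and performance measures: the optimizer is held fixed, and “governance” changes purely through the choice of measure.

```python
# Minimal sketch of the rational-agent frame. The environment, dynamics,
# and both performance measures are invented for illustration.

ACTIONS = (-1, 0, +1)                        # move left, stay, move right

def step(s: int, a: int) -> int:
    """Environment: states are 0..9; actions shift the agent."""
    return max(0, min(9, s + a))

# Two candidate performance measures -- the actual governance choice.
def measure_throughput(s: int) -> float:
    return s                                 # good = push the number up

def measure_stability(s: int) -> float:
    return -abs(s - 5)                       # good = stay near the middle

def greedy_policy(measure):
    """Optimizer: pick whichever action scores best under the measure."""
    return lambda s: max(ACTIONS, key=lambda a: measure(step(s, a)))

for name, measure in [("throughput", measure_throughput),
                      ("stability", measure_stability)]:
    s, policy = 0, greedy_policy(measure)
    for _ in range(20):
        s = step(s, policy(s))
    print(f"measure={name}: agent settles at state {s}")
# -> throughput drives the agent to 9; stability parks it at 5.
```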

1.2 The “bitter lesson”: learn, don’t hand-code

Connectionism (Rumelhart, Hinton, LeCun, Bengio) shifts the strategy: learn representations via gradient descent; lean on scale (data/compute). Sutton’s “bitter lesson” points to general methods + compute beating hand-coded cleverness.

Governance translation
Power moves from explicit rules humans wrote → opaque policies learned from behavior logs.

2. Learning Regimes as Governance Modes

Supervised • Self-supervised • RL → distinct ruling styles

We treat supervised, self-supervised/unsupervised, and RL as governance modes, not just ML categories.

2.1 Supervised learning: canonizing precedent

Mechanics: training on labeled pairs (x, y) to minimize a loss L(f(x), y). Governance: credit scoring, hiring, risk assessment, fraud, ranking.

  • Historical power becomes ground truth (labels inherit past decisions).
  • Loss function is soft law (FP vs FN weighting encodes whose harm is acceptable; see the sketch after this list).
  • Goodhart at training time: proxies collapse under optimization.
  • Feedback loops: deployed models generate future labels.
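
Picking up “loss function is soft law”: a minimal sketch with invented false-positive/false-negative costs, using the standard Bayes decision threshold. The same model score produces opposite verdicts under different cost weightings.

```python
# Sketch: cost weighting as "soft law". The costs and the applicant's
# risk score are invented; the threshold rule is the standard Bayes one.

def decision_threshold(cost_fp: float, cost_fn: float) -> float:
    """Deny when p > cost_fp / (cost_fp + cost_fn).

    cost_fp = harm of wrongly denying (falls on the applicant)
    cost_fn = harm of wrongly approving (falls on the institution)
    """
    return cost_fp / (cost_fp + cost_fn)

p_risk = 0.20  # model's estimated default probability for one applicant

for cost_fp, cost_fn in [(1, 1), (1, 10), (10, 1)]:
    t = decision_threshold(cost_fp, cost_fn)
    verdict = "DENY" if p_risk > t else "APPROVE"
    print(f"FP cost={cost_fp:>2}, FN cost={cost_fn:>2} -> "
          f"threshold={t:.2f}, verdict={verdict}")
# Same score, three "laws", two different verdicts.
```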

2.2 Self-supervised / unsupervised learning: normality engines & legibility

  • Defines what counts as “normal” (embeddings for anomaly detection, moderation, search, ranking).
  • Absorbs dominant narratives (training corpora as priors).
  • Compresses reality into vectors: easy to score, cluster, sort humans.
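
A minimal sketch of a normality engine, assuming embedding vectors already exist (here, synthetic random vectors stand in for them): “normal” is whatever the corpus centroid says, and distance from it becomes a rankable score.

```python
# Sketch of a "normality engine" over synthetic embeddings.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(0, 1, size=(1000, 64))  # 1000 people, 64-d embeddings
outsider = rng.normal(3, 1, size=64)            # someone off-distribution

centroid = population.mean(axis=0)              # the learned "normal"

def anomaly_score(v: np.ndarray) -> float:
    """Distance from normal: the quantity the regime scores and sorts by."""
    return float(np.linalg.norm(v - centroid))

print(f"typical member score: {anomaly_score(population[0]):.2f}")
print(f"outsider score:       {anomaly_score(outsider):.2f}")
# The outsider is not "wrong", just far from whatever the corpus made normal.
```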

2.3 Legibility politics: who gets governed

Algorithmic governance acts where it can see. Over-legible groups get dense scoring and nudging; opaque groups are targeted for “integration” (IDs, digital rails, surveillance infra).

2.4 Reinforcement learning: explicit behavioral control

RL learns a policy π(a|s) that maximizes expected cumulative reward. Governance translation: reward is literal law; reward hacking is policy hacking; multi-agent interactions create emergent equilibria between optimizers.

One-line law
If reward = engagement, the optimal policy is addiction/outrage whenever those increase engagement.
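
A toy illustration of this law, with invented engagement rates: an epsilon-greedy bandit recommender converges on outrage content because the reward signal values nothing else.

```python
# Sketch: "reward is literal law". Engagement rates per content type are
# invented; the learning rule is a standard epsilon-greedy bandit.
import random

random.seed(0)

ARMS = {"calm": 0.3, "outrage": 0.7}   # mean engagement when shown (invented)
q = {a: 0.0 for a in ARMS}             # estimated value of each arm
n = {a: 0 for a in ARMS}

for _ in range(5000):
    arm = (random.choice(list(ARMS)) if random.random() < 0.1
           else max(q, key=q.get))               # explore 10%, else exploit
    reward = 1.0 if random.random() < ARMS[arm] else 0.0  # "engagement"
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]         # incremental mean update

print({a: round(v, 2) for a, v in q.items()})
print("learned policy serves:", max(q, key=q.get))   # -> outrage
```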

3. Deep Learning: Universal, Scalable, and Opaque Law Engines

End-to-end optimization + opacity + scale concentration

Deep learning ties regimes together: end-to-end differentiability, learned representations, scale sensitivity.

  • High-capacity decision functions approximate complicated policies.
  • Opaque representations: millions/billions of parameters; limited interpretability.
  • Disputes shift from “argue the law” to “argue the metric + training distribution.”
Opacity as institutional feature
  • Plausible deniability (“the model did it”)
  • Reduced legal exposure (“cannot explain internals”)
  • Quiet policy shifts via retraining

4. Probabilistic & Causal Models: When Systems Admit Their Assumptions

Uncertainty • Sensitivity • Counterfactuals

Probabilistic tradition (Bishop, Murphy): explicit probabilistic models, inference, decisions under uncertainty.

  • Assumptions explicit and inspectable
  • Reason about uncertainty and sensitivity
  • Closer mapping from model → policy

4.1 Pearl’s structural causal models

Pearl distinguishes observation P(Y|X) from intervention P(Y|do(X)). Governance requires counterfactuals (“what if we changed sentencing law?”). Without causal structure, models reinforce correlations as policy.

Causal hierarchy
Seeing → Doing → Imagining (counterfactuals). Pattern-matchers get stuck at “seeing.”
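
A simulation sketch of the observation/intervention gap, using an invented three-variable SCM (confounder Z → X, Z → Y, plus X → Y): conditioning on X inherits Z’s confounding, while do(X) severs the Z → X edge.

```python
# Sketch: P(Y|X) vs P(Y|do(X)) in a toy structural causal model.
# All mechanisms and probabilities are invented for illustration.
import random

random.seed(0)

def sample(do_x=None):
    z = random.random() < 0.5                    # confounder
    x = do_x if do_x is not None else (random.random() < (0.8 if z else 0.2))
    y = random.random() < (0.2 + 0.3 * x + 0.4 * z)   # outcome mechanism
    return z, x, y

N = 200_000
obs = [sample() for _ in range(N)]
# Observational: condition on X=1 (inherits Z's influence on who has X=1).
y_given_x1 = (sum(y for _, x, y in obs if x)
              / sum(1 for _, x, _ in obs if x))
# Interventional: set X=1 for everyone, severing the Z -> X edge.
y_do_x1 = sum(y for _, _, y in (sample(do_x=True) for _ in range(N))) / N

print(f"P(Y=1 | X=1)     = {y_given_x1:.3f}  (correlation, confounded)")
print(f"P(Y=1 | do(X=1)) = {y_do_x1:.3f}  (effect of actually intervening)")
# ~0.82 vs ~0.70: a policy justified by the first number overpromises.
```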

5. RLHF & Preference Modeling: Industrial-Scale Norm Distillation

Annotator choices → reward model → policy updates

Modern assistants/copilots commonly use RLHF:

  1. Pretrain a base model (self-supervised).
  2. Collect human feedback (rank/label outputs).
  3. Train a reward model predicting that feedback (sketched below).
  4. Use RL (often PPO) to tune the policy to maximize reward.
Governance implication
RLHF is a social choice procedure (who labels, under what instructions, with what adjudication), compressed into a scalar reward.
  • Norms become weights: boundaries become parameter updates.
  • Policy updates via retraining: “values” change through new feedback + new reward model.
  • Institutional ideology: reward models reflect particular coalitions, not “humanity.”
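
A minimal sketch of step 3 on synthetic data, using the Bradley-Terry pairwise loss common in preference modeling: the fitted reward model recovers the annotators’ hidden value vector, which is “norms become weights” in miniature.

```python
# Sketch: fitting a reward model from pairwise preferences (Bradley-Terry).
# Features, the annotators' value vector, and all preferences are synthetic;
# real RLHF fits a neural reward model over text.
import numpy as np

rng = np.random.default_rng(0)
d = 8
w_annotators = rng.normal(size=d)          # labelers' hidden "ideology"

# Pairs of candidate outputs; annotators prefer whichever scores higher.
A, B = rng.normal(size=(500, d)), rng.normal(size=(500, d))
prefers_A = ((A @ w_annotators) > (B @ w_annotators)).astype(float)

w = np.zeros(d)                            # reward model parameters
for _ in range(200):                       # gradient ascent on log-likelihood
    margin = (A - B) @ w                   # r(A) - r(B)
    p_A = 1 / (1 + np.exp(-margin))        # Bradley-Terry: P(A preferred)
    w += 0.5 * ((prefers_A - p_A)[:, None] * (A - B)).mean(axis=0)

cos = w @ w_annotators / (np.linalg.norm(w) * np.linalg.norm(w_annotators))
print(f"reward model vs annotator values: cosine = {cos:.3f}")  # near 1.0
```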

6. Generative Models: Myth & Narrative Infrastructure

Mass narrative production • epistemic fog • Overton enforcement

Generative models (LLMs, diffusion, voice/video) function as myth engines:

  • Mass narrative production: synthetic influencers, automated propaganda, micro-targeted storylines.
  • Epistemic fog: deepfakes erode trust in evidence; synthetic text floods discourse.
  • Narrative alignment: assistants/filters shape what is “speakable” and what frames exist.
Governance escalates
Not only “who gets the loan,” but who gets a story, a self-concept, and a future that feels possible.

7. Goodhart’s Law: Central Failure Mode of Algorithmic Law

Measure → target → collapse of meaning

Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.

Manheim & Garrabrant refine it into four variants:

  • Regressional — optimizing noise at the extremes
  • Extremal — correlation breaks under extreme optimization
  • Causal — intervening on proxy changes the proxy, not the goal
  • Adversarial — agents game the metric
Governance translation
Loss functions / reward models / KPIs are targets. Optimization pressure is a structural adversary.
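
A simulation sketch of the regressional variant, with synthetic numbers: select hard on a noisy proxy and the selected tail systematically underdelivers on the true goal.

```python
# Sketch of regressional Goodhart on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
goal = rng.normal(0, 1, N)              # what we actually care about
proxy = goal + rng.normal(0, 1, N)      # what the metric can see

top = np.argsort(proxy)[-100:]          # "optimize": take the top 0.1% by proxy
print(f"proxy value of selected: {proxy[top].mean():.2f}")
print(f"goal value of selected:  {goal[top].mean():.2f}")
# With equal signal and noise variance, the goal comes in at roughly half
# the proxy: the harder you select, the more noise you harvest.
```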

8. Outer vs Inner Alignment and Mesa-Optimizers

Objective design vs learned objectives

8.1 Outer alignment: designing the objective

Does the specified objective reflect what is wanted? IDA-like schemes (iterated distillation and amplification) propose recursive oversight: decompose decisions, amplify, distill, iterate.

8.2 Inner alignment: learned optimizers

Given an outer objective, what objective does the trained model pursue? “Mesa-optimization” frames learned systems becoming optimizers with their own internal aims.

Risk patterns
  • Goal misgeneralization (proxy internalized, fails off-distribution; simulated below)
  • Deceptive alignment (behaves well during training, deviates when safe)

Governance analogue: mission statements vs bureaucratic KPI self-interest, the same drift running at machine speed.
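
A simulation sketch of goal misgeneralization on synthetic data: in training, a crisp spurious feature is perfectly correlated with the intended (but noisy) one; the model internalizes the proxy, and accuracy collapses once the correlation breaks.

```python
# Sketch: proxy internalized during training, failing off-distribution.
# Data, features, and the correlation structure are all invented.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, correlated=True):
    intended = rng.choice([-1.0, 1.0], n)          # the signal we care about
    spurious = intended if correlated else rng.choice([-1.0, 1.0], n)
    X = np.column_stack([intended + rng.normal(0, 2.0, n),   # noisy
                         spurious + rng.normal(0, 0.1, n)])  # crisp proxy
    return X, (intended > 0).astype(float)

def fit_logreg(X, y, steps=500, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)
    return w

Xtr, ytr = make_data(5000, correlated=True)
w = fit_logreg(Xtr, ytr)
print(f"weights: intended={w[0]:.2f}, spurious={w[1]:.2f}")  # proxy dominates

for name, corr in [("in-distribution", True), ("off-distribution", False)]:
    Xte, yte = make_data(5000, correlated=corr)
    acc = (((Xte @ w) > 0) == yte.astype(bool)).mean()
    print(f"{name:>16}: accuracy = {acc:.2f}")
```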

9. Power, Capture, and the Political Economy of “Alignment”

Who defines “aligned,” and to what end?

Alignment is already governance and power:

  • Alignment as centralization instrument (compute licensing, restrictions, concentration).
  • RLHF and safety layers embed institutional ideology (values of coalitions).
  • Opacity as shield (unjust decisions harder to contest; silent policy shifts).
  • No exit = soft totalitarianism (required algorithmic systems for identity/money/mobility/speech).
Critical question
“Which humans, aggregated how, under which institutions?”

10. Multi-Objective, Multi-Agent Reality & Systemic Risk

Equilibria between optimizers • correlated failure

Real institutions juggle many objectives; systems approximate this via weighted sums, constraints, and lexicographic priorities (compared in the sketch below). Meanwhile, many agents optimize against each other: platforms, advertisers, states, markets, bots.

  • Equilibrium emerges from games between optimizers, not a single planner.
  • Non-stationarity: humans adapt to algorithms; adversaries adapt to detection.
  • Correlated model failure: similar models on similar data share blind spots → synchronized misfires.
Systemic risk surface
Cascaded mispricing, synchronized risk-score failures, coupled logistics/infrastructure breakdowns.
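
A minimal sketch of the two most common aggregation rules, with invented candidate policies and scores: weighted sums and lexicographic constraints are different laws, and they pick different winners.

```python
# Sketch: multi-objective aggregation as governance. Candidates, scores,
# weights, and the safety floor are invented for illustration.

candidates = {            # (profit, safety) per candidate policy
    "A": (9.0, 0.2),
    "B": (6.0, 0.8),
    "C": (5.0, 0.9),
}

def weighted_sum(scores, w_profit=0.7, w_safety=0.3):
    profit, safety = scores
    return w_profit * profit + w_safety * safety

def lexicographic(scores, safety_floor=0.5):
    """Safety as a hard constraint first, profit as tiebreaker second."""
    profit, safety = scores
    return (safety >= safety_floor, profit)   # tuples compare left to right

print("weighted sum picks: ",
      max(candidates, key=lambda k: weighted_sum(candidates[k])))    # -> A
print("lexicographic picks:",
      max(candidates, key=lambda k: lexicographic(candidates[k])))   # -> B
```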

11. Synthetic Stack vs Sovereign Stack

Centralized opaque substrate vs plural forkable alternatives

11.1 The Synthetic Stack

  • Centralized compute (hyperscale)
  • Monopolized data (platform telemetry; state databases; financial rails)
  • Proprietary models / APIs
  • Regulatory co-design (state + major labs/platforms)
  • RLHF-tuned narrative engines and feeds

11.2 The Sovereign Stack (counter-substrate)

  • Open verifiability & cryptographic anchoring (see the hash-chain sketch below)
  • Local control & forkability
  • Deliberate limits on measurement (sacred unscored zones)
  • Plural metrics, plural myths
Core advantage
Structural contestability: no single entity can unilaterally rewrite metrics without forks/exits.
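
A minimal sketch of the contestability mechanism, with an invented log schema: a hash-chained decision log that anyone can re-verify, and that fails verification the moment history is rewritten.

```python
# Sketch: cryptographic anchoring of governance decisions. The record
# schema is invented; the chaining is plain SHA-256 over the prior hash.
import hashlib
import json

def entry_hash(prev: str, record: dict) -> str:
    body = json.dumps({"prev": prev, "record": record}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def append(log: list, record: dict) -> None:
    prev = log[-1]["hash"] if log else "genesis"
    log.append({"prev": prev, "record": record,
                "hash": entry_hash(prev, record)})

def verify(log: list) -> bool:
    prev = "genesis"
    for e in log:
        if e["prev"] != prev or e["hash"] != entry_hash(prev, e["record"]):
            return False
        prev = e["hash"]
    return True

log = []
append(log, {"model": "risk-v3", "threshold": 0.50})
append(log, {"model": "risk-v3", "threshold": 0.35})  # a quiet policy shift
print("log verifies:", verify(log))                   # True

log[0]["record"]["threshold"] = 0.35                  # rewrite history
print("after tamper:", verify(log))                   # False: visibly forked
```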

12. Meta-Questions for Algorithmic Governance

Audit prompts for any metric regime
  1. Metric Sovereignty: who defines loss functions, reward models, KPIs? Can communities refuse them or choose alternatives?
  2. Feedback & Rot: how do we prevent models training on their own outputs, erasing signal and amplifying bias?
  3. Minimum Necessary Surveillance: what is the minimum information required for objective X? Everything beyond it is surplus control.
  4. Power vs “Alignment”: how do we distinguish alignment to institutions from alignment to human autonomy?
  5. Transparency vs Plausible Deniability: when is opacity genuinely needed, and when is it a shield?
  6. Sacred Unmeasured Zones: which domains must remain unmeasured, unscored, and unoptimized?
  7. Exit & Forkability: what conditions allow functional exit from a metric regime?
  8. Narrative Diversification: how do we avoid single narrative engines silently defining speakable reality?
  9. Systemic Risk: how do we detect correlated failures before they cascade?
  10. Inner Governance of Models: what audits and probes reveal mesa-optimizers early enough to act?
  11. Substrate Governance: who governs compute, data retention, and pipeline access, and under what constraints?
Closing compression
The battle is substrate choice: centralized opaque optimization vs plural auditable forkable alternatives.

This library is the “sharp blades” list, tagged by keyword and format. If you later want to split it into “tracks” (engineer/jurist/strategist), this library becomes the shared spine.

  • Rational agents; search; probability; decision theory; the baseline AI worldview. [technical · book]
  • General methods + compute dominate hand-coded cleverness. [technical · essay]
  • MDPs, value functions, policy gradients. Control math for reward-coded governance. [technical · book]
  • Practical RL from fundamentals to deep RL. [technical · video]
  • Reward-maximization as intelligence doctrine (read as ideology of KPI-world). [technical · paper]
  • How deep nets learn representations; mechanics of gradient training. [technical · video]
  • Canonical review of deep learning as a paradigm. [technical · paper]
  • Foundation-model worldview: learn world models from raw streams. [technical · article]
  • Transformer stack: pretraining, scaling, deployment, behavior shaping. [technical · video]
  • Probabilistic modeling and uncertainty as explicit structure. [technical · video]
  • Graphical models, Bayesian reasoning, variational inference, deep probabilistic models. [technical · book]
  • Interventions and counterfactuals require causal models; beyond pure statistical learning. [technical · paper]
  • AI as governance technology; institutional transformations. [governance · paper]
  • Digital statecraft: indicators + ML + policy as continuous tuning. [statecraft · paper]
  • Detecting flawed AI claims and institutional theater. [audit · paper]
  • Talk version; repeat for calibration. [audit · video]
  • Behavioral prediction markets; extraction and modification regime. [power · book]
  • Automated targeting in welfare and social services. [governance · book]
  • Search/ranking encode hierarchies; indexing as power. [power · book]
  • Old hierarchies recoded into technical systems (“New Jim Code”). [power · book]
  • Joint sanctions, blacklists, automated enforcement as algorithmic statecraft. [case study · paper]
  • Engagement optimization → behavioral shaping; recommendation as governance. [media · film]
  • Facial recognition and automated decisions as infrastructure of control. [media · film]
  • AI genealogy tied to management, capitalism, and control. [history · paper]
  • Proxy failure taxonomy under optimization pressure. [alignment · technical · paper]
  • Inner alignment; deceptive alignment; mesa-objectives. [alignment · paper]
  • Scalable oversight via decomposition and distillation. [oversight · essay]
  • Epistemic governance through adversarial argument structures. [oversight · paper]
  • How preferences become reward models; how policies are tuned. [control · article]
  • Baseline policy narrative for x-risk and “control.” [alignment · video]
  • Failure checklist; hard-edged misalignment threat model. [alignment · warning]
  • Corrigibility and uncertainty about human preferences. [alignment · video]
  • Scalable oversight; reward modeling; institutional priors. [alignment · audio]
  • Deceptive alignment and the “oversight trains gaming” failure mode. [deception · warning]
  • Institutional layer of alignment: how labs argue, decide, and compromise. [institutions · audio]
  • Concrete alignment failures and the shape of attempted fixes across ML. [alignment · book]