
3.6 — AI, Machine Learning & Algorithmic Governance

Substrate • Power • Sovereignty

Visual anchors

These images are placed as “pattern locks” for: reinforcement loops, objective functions, and governance-as-optimization.

[Image: AI governance visual 1 (Substrate shift)]
[Image: AI governance visual 2 (Metrics → policy)]
[Image: AI governance visual 3 (Optimization regime)]
[Image: Reinforcement learning loop diagram (Sutton & Barto)]

0. Frame: AI as Substrate Replacement, Not “Tool”

Key move: treat ML as operating system

We treat AI / ML as substrate, not “apps”:

  • Law → loss functions, reward models, and policies
  • Politics → metrics, dashboards, and KPIs
  • Culture → recommender systems and generative feeds
  • Memory → embeddings and logs
Definition
Algorithmic governance = decisions about reality routed through continually retrained models (scoring, ranking, allocation, filtering) + the measurement infrastructure that feeds them.

Once models sit under banks, platforms, ministries, and media, they stop being “applications” and become algorithmic governance: a continually retrained operating system for reality.

1. From Symbolic Dreams to Learned Optimization

Symbolic rules → gradient-trained policies

1.1 Symbolic AI and the rational agent frame

Early AI (McCarthy, Minsky) was symbolic: knowledge as logic/rules; reasoning as theorem proving and search. Russell & Norvig systematize the rational agent frame: an agent that perceives and acts to maximize a performance measure over time.

Governance already present
  • Define environment (what counts as state)
  • Define performance measure (what counts as good)
  • Build an optimizer that relentlessly pursues it
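
A minimal sketch of this frame, with invented states, dynamics, and performance measures: the optimizer is held fixed, and “governance” changes purely through the choice of measure.

```python
# Minimal sketch of the rational-agent frame. The environment, dynamics,
# and both performance measures are invented for illustration.

ACTIONS = (-1, 0, +1)                        # move left, stay, move right

def step(s: int, a: int) -> int:
    """Environment: states are 0..9; actions shift the agent."""
    return max(0, min(9, s + a))

# Two candidate performance measures -- the actual governance choice.
def measure_throughput(s: int) -> float:
    return s                                 # good = push the number up

def measure_stability(s: int) -> float:
    return -abs(s - 5)                       # good = stay near the middle

def greedy_policy(measure):
    """Optimizer: pick whichever action scores best under the measure."""
    return lambda s: max(ACTIONS, key=lambda a: measure(step(s, a)))

for name, measure in [("throughput", measure_throughput),
                      ("stability", measure_stability)]:
    s, policy = 0, greedy_policy(measure)
    for _ in range(20):
        s = step(s, policy(s))
    print(f"measure={name}: agent settles at state {s}")
# -> throughput drives the agent to 9; stability parks it at 5.
```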

1.2 The “bitter lesson”: learn, don’t hand-code

Connectionism (Rumelhart, Hinton, LeCun, Bengio) shifts the strategy: learn representations via gradient descent; lean on scale (data/compute). Sutton’s “bitter lesson” points to general methods + compute beating hand-coded cleverness.

Governance translation
Power moves from explicit rules humans wrote → opaque policies learned from behavior logs.

2. Learning Regimes as Governance Modes

Supervised • Self-supervised • RL → distinct ruling styles

We treat supervised, self-supervised/unsupervised, and RL as governance modes, not just ML categories.

2.1 Supervised learning: canonizing precedent

Mechanics: training on labeled pairs (x, y) to minimize a loss L(f(x), y). Governance: credit scoring, hiring, risk assessment, fraud, ranking.

  • Historical power becomes ground truth (labels inherit past decisions).
  • Loss function is soft law (FP vs FN weighting encodes whose harm is acceptable; see the sketch after this list).
  • Goodhart at training time: proxies collapse under optimization.
  • Feedback loops: deployed models generate future labels.
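
Picking up “loss function is soft law”: a minimal sketch with invented false-positive/false-negative costs, using the standard Bayes decision threshold. The same model score produces opposite verdicts under different cost weightings.

```python
# Sketch: cost weighting as "soft law". The costs and the applicant's
# risk score are invented; the threshold rule is the standard Bayes one.

def decision_threshold(cost_fp: float, cost_fn: float) -> float:
    """Deny when p > cost_fp / (cost_fp + cost_fn).

    cost_fp = harm of wrongly denying (falls on the applicant)
    cost_fn = harm of wrongly approving (falls on the institution)
    """
    return cost_fp / (cost_fp + cost_fn)

p_risk = 0.20  # model's estimated default probability for one applicant

for cost_fp, cost_fn in [(1, 1), (1, 10), (10, 1)]:
    t = decision_threshold(cost_fp, cost_fn)
    verdict = "DENY" if p_risk > t else "APPROVE"
    print(f"FP cost={cost_fp:>2}, FN cost={cost_fn:>2} -> "
          f"threshold={t:.2f}, verdict={verdict}")
# Same score, three "laws", two different verdicts.
```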

2.2 Self-supervised / unsupervised learning: normality engines & legibility

  • Defines what counts as “normal” (embeddings for anomaly detection, moderation, search, ranking).
  • Absorbs dominant narratives (training corpora as priors).
  • Compresses reality into vectors: easy to score, cluster, sort humans.
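
A minimal sketch of a normality engine, assuming embedding vectors already exist (here, synthetic random vectors stand in for them): “normal” is whatever the corpus centroid says, and distance from it becomes a rankable score.

```python
# Sketch of a "normality engine" over synthetic embeddings.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(0, 1, size=(1000, 64))  # 1000 people, 64-d embeddings
outsider = rng.normal(3, 1, size=64)            # someone off-distribution

centroid = population.mean(axis=0)              # the learned "normal"

def anomaly_score(v: np.ndarray) -> float:
    """Distance from normal: the quantity the regime scores and sorts by."""
    return float(np.linalg.norm(v - centroid))

print(f"typical member score: {anomaly_score(population[0]):.2f}")
print(f"outsider score:       {anomaly_score(outsider):.2f}")
# The outsider is not "wrong", just far from whatever the corpus made normal.
```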

2.3 Legibility politics: who gets governed

Algorithmic governance acts where it can see. Over-legible groups get dense scoring and nudging; opaque groups are targeted for “integration” (IDs, digital rails, surveillance infra).

2.4 Reinforcement learning: explicit behavioral control

RL learns a policy π(a|s) that maximizes expected cumulative reward. Governance translation: reward is literal law; reward hacking is policy hacking; multi-agent interactions create emergent equilibria between optimizers.

One-line law
If reward = engagement, the optimal policy is addiction/outrage whenever those increase engagement.
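
A toy illustration of this law, with invented engagement rates: an epsilon-greedy bandit recommender converges on outrage content because the reward signal values nothing else.

```python
# Sketch: "reward is literal law". Engagement rates per content type are
# invented; the learning rule is a standard epsilon-greedy bandit.
import random

random.seed(0)

ARMS = {"calm": 0.3, "outrage": 0.7}   # mean engagement when shown (invented)
q = {a: 0.0 for a in ARMS}             # estimated value of each arm
n = {a: 0 for a in ARMS}

for _ in range(5000):
    arm = (random.choice(list(ARMS)) if random.random() < 0.1
           else max(q, key=q.get))               # explore 10%, else exploit
    reward = 1.0 if random.random() < ARMS[arm] else 0.0  # "engagement"
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]         # incremental mean update

print({a: round(v, 2) for a, v in q.items()})
print("learned policy serves:", max(q, key=q.get))   # -> outrage
```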

3. Deep Learning: Universal, Scalable, and Opaque Law Engines

End-to-end optimization + opacity + scale concentration

Deep learning ties regimes together: end-to-end differentiability, learned representations, scale sensitivity.

  • High-capacity decision functions approximate complicated policies.
  • Opaque representations: millions/billions of parameters; limited interpretability.
  • Disputes shift from “argue the law” to “argue the metric + training distribution.”
Opacity as institutional feature
  • Plausible deniability (“the model did it”)
  • Reduced legal exposure (“cannot explain internals”)
  • Quiet policy shifts via retraining

4. Probabilistic & Causal Models: When Systems Admit Their Assumptions

Uncertainty • Sensitivity • Counterfactuals

Probabilistic tradition (Bishop, Murphy): explicit probabilistic models, inference, decisions under uncertainty.

  • Assumptions explicit and inspectable
  • Reason about uncertainty and sensitivity
  • Closer mapping from model → policy

4.1 Pearl’s structural causal models

Pearl distinguishes observation P(Y|X) from intervention P(Y|do(X)). Governance requires counterfactuals (“what if we changed sentencing law?”). Without causal structure, models reinforce correlations as policy.

Causal hierarchy
Seeing → Doing → Imagining (counterfactuals). Pattern-matchers get stuck at “seeing.”
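
A simulation sketch of the observation/intervention gap, using an invented three-variable SCM (confounder Z → X, Z → Y, plus X → Y): conditioning on X inherits Z’s confounding, while do(X) severs the Z → X edge.

```python
# Sketch: P(Y|X) vs P(Y|do(X)) in a toy structural causal model.
# All mechanisms and probabilities are invented for illustration.
import random

random.seed(0)

def sample(do_x=None):
    z = random.random() < 0.5                    # confounder
    x = do_x if do_x is not None else (random.random() < (0.8 if z else 0.2))
    y = random.random() < (0.2 + 0.3 * x + 0.4 * z)   # outcome mechanism
    return z, x, y

N = 200_000
obs = [sample() for _ in range(N)]
# Observational: condition on X=1 (inherits Z's influence on who has X=1).
y_given_x1 = (sum(y for _, x, y in obs if x)
              / sum(1 for _, x, _ in obs if x))
# Interventional: set X=1 for everyone, severing the Z -> X edge.
y_do_x1 = sum(y for _, _, y in (sample(do_x=True) for _ in range(N))) / N

print(f"P(Y=1 | X=1)     = {y_given_x1:.3f}  (correlation, confounded)")
print(f"P(Y=1 | do(X=1)) = {y_do_x1:.3f}  (effect of actually intervening)")
# ~0.82 vs ~0.70: a policy justified by the first number overpromises.
```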

5. RLHF & Preference Modeling: Industrial-Scale Norm Distillation

Annotator choices → reward model → policy updates

Modern assistants/copilots commonly use RLHF:

  1. Pretrain a base model (self-supervised).
  2. Collect human feedback (rank/label outputs).
  3. Train a reward model predicting that feedback (sketched below).
  4. Use RL (often PPO) to tune the policy to maximize reward.
Governance implication
RLHF is a social choice procedure (who labels, under what instructions, with what adjudication), compressed into a scalar reward.
  • Norms become weights: boundaries become parameter updates.
  • Policy updates via retraining: “values” change through new feedback + new reward model.
  • Institutional ideology: reward models reflect particular coalitions, not “humanity.”
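
A minimal sketch of step 3 on synthetic data, using the Bradley-Terry pairwise loss common in preference modeling: the fitted reward model recovers the annotators’ hidden value vector, which is “norms become weights” in miniature.

```python
# Sketch: fitting a reward model from pairwise preferences (Bradley-Terry).
# Features, the annotators' value vector, and all preferences are synthetic;
# real RLHF fits a neural reward model over text.
import numpy as np

rng = np.random.default_rng(0)
d = 8
w_annotators = rng.normal(size=d)          # labelers' hidden "ideology"

# Pairs of candidate outputs; annotators prefer whichever scores higher.
A, B = rng.normal(size=(500, d)), rng.normal(size=(500, d))
prefers_A = ((A @ w_annotators) > (B @ w_annotators)).astype(float)

w = np.zeros(d)                            # reward model parameters
for _ in range(200):                       # gradient ascent on log-likelihood
    margin = (A - B) @ w                   # r(A) - r(B)
    p_A = 1 / (1 + np.exp(-margin))        # Bradley-Terry: P(A preferred)
    w += 0.5 * ((prefers_A - p_A)[:, None] * (A - B)).mean(axis=0)

cos = w @ w_annotators / (np.linalg.norm(w) * np.linalg.norm(w_annotators))
print(f"reward model vs annotator values: cosine = {cos:.3f}")  # near 1.0
```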

6. Generative Models: Myth & Narrative Infrastructure

Mass narrative production • epistemic fog • Overton enforcement

Generative models (LLMs, diffusion, voice/video) function as myth engines:

  • Mass narrative production: synthetic influencers, automated propaganda, micro-targeted storylines.
  • Epistemic fog: deepfakes erode trust in evidence; synthetic text floods discourse.
  • Narrative alignment: assistants/filters shape what is “speakable” and what frames exist.
Governance escalates
Not only “who gets the loan,” but who gets a story, a self-concept, and a future that feels possible.

7. Goodhart’s Law: Central Failure Mode of Algorithmic Law

Measure → target → collapse of meaning

Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.

Manheim & Garrabrant refine it into four variants:

  • Regressional — optimizing noise at the extremes
  • Extremal — correlation breaks under extreme optimization
  • Causal — intervening on proxy changes the proxy, not the goal
  • Adversarial — agents game the metric
Governance translation
Loss functions / reward models / KPIs are targets. Optimization pressure is a structural adversary.
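
A simulation sketch of the regressional variant, with synthetic numbers: select hard on a noisy proxy and the selected tail systematically underdelivers on the true goal.

```python
# Sketch of regressional Goodhart on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
goal = rng.normal(0, 1, N)              # what we actually care about
proxy = goal + rng.normal(0, 1, N)      # what the metric can see

top = np.argsort(proxy)[-100:]          # "optimize": take the top 0.1% by proxy
print(f"proxy value of selected: {proxy[top].mean():.2f}")
print(f"goal value of selected:  {goal[top].mean():.2f}")
# With equal signal and noise variance, the goal comes in at roughly half
# the proxy: the harder you select, the more noise you harvest.
```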

8. Outer vs Inner Alignment and Mesa-Optimizers

Objective design vs learned objectives

8.1 Outer alignment: designing the objective

Does the specified objective reflect what is wanted? IDA-like schemes (iterated distillation and amplification) propose recursive oversight: decompose decisions, amplify, distill, iterate.

8.2 Inner alignment: learned optimizers

Given an outer objective, what objective does the trained model pursue? “Mesa-optimization” frames learned systems becoming optimizers with their own internal aims.

Risk patterns
  • Goal misgeneralization (proxy internalized, fails off-distribution; simulated below)
  • Deceptive alignment (behaves well during training, deviates when safe)

Governance analogue: mission statements vs bureaucratic KPI self-interest, the same drift running at machine speed.
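
A simulation sketch of goal misgeneralization on synthetic data: in training, a crisp spurious feature is perfectly correlated with the intended (but noisy) one; the model internalizes the proxy, and accuracy collapses once the correlation breaks.

```python
# Sketch: proxy internalized during training, failing off-distribution.
# Data, features, and the correlation structure are all invented.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, correlated=True):
    intended = rng.choice([-1.0, 1.0], n)          # the signal we care about
    spurious = intended if correlated else rng.choice([-1.0, 1.0], n)
    X = np.column_stack([intended + rng.normal(0, 2.0, n),   # noisy
                         spurious + rng.normal(0, 0.1, n)])  # crisp proxy
    return X, (intended > 0).astype(float)

def fit_logreg(X, y, steps=500, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)
    return w

Xtr, ytr = make_data(5000, correlated=True)
w = fit_logreg(Xtr, ytr)
print(f"weights: intended={w[0]:.2f}, spurious={w[1]:.2f}")  # proxy dominates

for name, corr in [("in-distribution", True), ("off-distribution", False)]:
    Xte, yte = make_data(5000, correlated=corr)
    acc = (((Xte @ w) > 0) == yte.astype(bool)).mean()
    print(f"{name:>16}: accuracy = {acc:.2f}")
```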

9. Power, Capture, and the Political Economy of “Alignment”

Who defines “aligned,” and to what end?

Alignment is already governance and power:

  • Alignment as centralization instrument (compute licensing, restrictions, concentration).
  • RLHF and safety layers embed institutional ideology (values of coalitions).
  • Opacity as shield (unjust decisions harder to contest; silent policy shifts).
  • No exit = soft totalitarianism (required algorithmic systems for identity/money/mobility/speech).
Critical question
“Which humans, aggregated how, under which institutions?”

10. Multi-Objective, Multi-Agent Reality & Systemic Risk

Equilibria between optimizers • correlated failure

Real institutions juggle many objectives; systems approximate this via weighted sums, constraints, and lexicographic priorities (compared in the sketch below). Meanwhile, many agents optimize against each other: platforms, advertisers, states, markets, bots.

  • Equilibrium emerges from games between optimizers, not a single planner.
  • Non-stationarity: humans adapt to algorithms; adversaries adapt to detection.
  • Correlated model failure: similar models on similar data share blind spots → synchronized misfires.
Systemic risk surface
Cascaded mispricing, synchronized risk-score failures, coupled logistics/infrastructure breakdowns.
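
A minimal sketch of the two most common aggregation rules, with invented candidate policies and scores: weighted sums and lexicographic constraints are different laws, and they pick different winners.

```python
# Sketch: multi-objective aggregation as governance. Candidates, scores,
# weights, and the safety floor are invented for illustration.

candidates = {            # (profit, safety) per candidate policy
    "A": (9.0, 0.2),
    "B": (6.0, 0.8),
    "C": (5.0, 0.9),
}

def weighted_sum(scores, w_profit=0.7, w_safety=0.3):
    profit, safety = scores
    return w_profit * profit + w_safety * safety

def lexicographic(scores, safety_floor=0.5):
    """Safety as a hard constraint first, profit as tiebreaker second."""
    profit, safety = scores
    return (safety >= safety_floor, profit)   # tuples compare left to right

print("weighted sum picks: ",
      max(candidates, key=lambda k: weighted_sum(candidates[k])))    # -> A
print("lexicographic picks:",
      max(candidates, key=lambda k: lexicographic(candidates[k])))   # -> B
```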

11. Synthetic Stack vs Sovereign Stack

Centralized opaque substrate vs plural forkable alternatives

11.1 The Synthetic Stack

  • Centralized compute (hyperscale)
  • Monopolized data (platform telemetry; state databases; financial rails)
  • Proprietary models / APIs
  • Regulatory co-design (state + major labs/platforms)
  • RLHF-tuned narrative engines and feeds

11.2 The Sovereign Stack (counter-substrate)

  • Open verifiability & cryptographic anchoring (see the hash-chain sketch below)
  • Local control & forkability
  • Deliberate limits on measurement (sacred unscored zones)
  • Plural metrics, plural myths
Core advantage
Structural contestability: no single entity can unilaterally rewrite metrics without forks/exits.
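
A minimal sketch of the contestability mechanism, with an invented log schema: a hash-chained decision log that anyone can re-verify, and that fails verification the moment history is rewritten.

```python
# Sketch: cryptographic anchoring of governance decisions. The record
# schema is invented; the chaining is plain SHA-256 over the prior hash.
import hashlib
import json

def entry_hash(prev: str, record: dict) -> str:
    body = json.dumps({"prev": prev, "record": record}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def append(log: list, record: dict) -> None:
    prev = log[-1]["hash"] if log else "genesis"
    log.append({"prev": prev, "record": record,
                "hash": entry_hash(prev, record)})

def verify(log: list) -> bool:
    prev = "genesis"
    for e in log:
        if e["prev"] != prev or e["hash"] != entry_hash(prev, e["record"]):
            return False
        prev = e["hash"]
    return True

log = []
append(log, {"model": "risk-v3", "threshold": 0.50})
append(log, {"model": "risk-v3", "threshold": 0.35})  # a quiet policy shift
print("log verifies:", verify(log))                   # True

log[0]["record"]["threshold"] = 0.35                  # rewrite history
print("after tamper:", verify(log))                   # False: visibly forked
```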

12. Meta-Questions for Algorithmic Governance

Audit prompts for any metric regime
  1. Metric Sovereignty: who defines loss functions, reward models, KPIs? Can communities refuse them or choose alternatives?
  2. Feedback & Rot: how do we prevent models training on their own outputs, erasing signal and amplifying bias?
  3. Minimum Necessary Surveillance: what is the minimum information required for objective X? Everything beyond it is surplus control.
  4. Power vs “Alignment”: how do we distinguish alignment to institutions from alignment to human autonomy?
  5. Transparency vs Plausible Deniability: when is opacity genuinely needed, and when is it a shield?
  6. Sacred Unmeasured Zones: which domains must remain unmeasured, unscored, and unoptimized?
  7. Exit & Forkability: what conditions allow functional exit from a metric regime?
  8. Narrative Diversification: how do we avoid single narrative engines silently defining speakable reality?
  9. Systemic Risk: how do we detect correlated failures before they cascade?
  10. Inner Governance of Models: what audits and probes reveal mesa-optimizers early enough to act?
  11. Substrate Governance: who governs compute, data retention, and pipeline access, and under what constraints?
Closing compression
The battle is substrate choice: centralized opaque optimization vs plural auditable forkable alternatives.

This library is the “sharp blades” list, tagged by keyword and format. If you later want to split it into “tracks” (engineer/jurist/strategist), this library becomes the shared spine.

  • Rational agents; search; probability; decision theory; the baseline AI worldview. [technical · book]
  • General methods + compute dominate hand-coded cleverness. [technical · essay]
  • MDPs, value functions, policy gradients. Control math for reward-coded governance. [technical · book]
  • Practical RL from fundamentals to deep RL. [technical · video]
  • Reward-maximization as intelligence doctrine (read as ideology of KPI-world). [technical · paper]
  • How deep nets learn representations; mechanics of gradient training. [technical · video]
  • Canonical review of deep learning as a paradigm. [technical · paper]
  • Foundation-model worldview: learn world models from raw streams. [technical · article]
  • Transformer stack: pretraining, scaling, deployment, behavior shaping. [technical · video]
  • Probabilistic modeling and uncertainty as explicit structure. [technical · video]
  • Graphical models, Bayesian reasoning, variational inference, deep probabilistic models. [technical · book]
  • Interventions and counterfactuals require causal models; beyond pure statistical learning. [technical · paper]
  • AI as governance technology; institutional transformations. [governance · paper]
  • Digital statecraft: indicators + ML + policy as continuous tuning. [statecraft · paper]
  • Detecting flawed AI claims and institutional theater. [audit · paper]
  • Talk version; repeat for calibration. [audit · video]
  • Behavioral prediction markets; extraction and modification regime. [power · book]
  • Automated targeting in welfare and social services. [governance · book]
  • Search/ranking encode hierarchies; indexing as power. [power · book]
  • Old hierarchies recoded into technical systems (“New Jim Code”). [power · book]
  • Joint sanctions, blacklists, automated enforcement as algorithmic statecraft. [case study · paper]
  • Engagement optimization → behavioral shaping; recommendation as governance. [media · film]
  • Facial recognition and automated decisions as infrastructure of control. [media · film]
  • AI genealogy tied to management, capitalism, and control. [history · paper]
  • Proxy failure taxonomy under optimization pressure. [alignment · technical · paper]
  • Inner alignment; deceptive alignment; mesa-objectives. [alignment · paper]
  • Scalable oversight via decomposition and distillation. [oversight · essay]
  • Epistemic governance through adversarial argument structures. [oversight · paper]
  • How preferences become reward models; how policies are tuned. [control · article]
  • Baseline policy narrative for x-risk and “control.” [alignment · video]
  • Failure checklist; hard-edged misalignment threat model. [alignment · warning]
  • Corrigibility and uncertainty about human preferences. [alignment · video]
  • Scalable oversight; reward modeling; institutional priors. [alignment · audio]
  • Deceptive alignment and the “oversight trains gaming” failure mode. [deception · warning]
  • Institutional layer of alignment: how labs argue, decide, and compromise. [institutions · audio]
  • Concrete alignment failures and the shape of attempted fixes across ML. [alignment · book]