Visual anchors: pattern-lock images for reinforcement loops, objective functions, and governance-as-optimization.
0. Frame: AI as Substrate Replacement, Not “Tool”
We treat AI / ML as substrate, not “apps”:
- Law → loss functions, reward models, and policies
- Politics → metrics, dashboards, and KPIs
- Culture → recommender systems and generative feeds
- Memory → embeddings and logs
Once models sit under banks, platforms, ministries, and media, they stop being “applications” and become algorithmic governance: a continually retrained operating system for reality.
1. From Symbolic Dreams to Learned Optimization
1.1 Symbolic AI and the rational agent frame
Early AI (McCarthy, Minsky) was symbolic: knowledge encoded as logic and rules; reasoning as theorem proving and search. Russell & Norvig systematize the rational agent frame: an agent that perceives its environment and acts to maximize a performance measure over time. The recipe (sketched in code after the list):
- Define environment (what counts as state)
- Define performance measure (what counts as good)
- Build an optimizer that relentlessly pursues it
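A minimal sketch of that recipe, assuming a discrete action space; `rational_agent_step` and `performance` are invented names for illustration, not from any library:

```python
from typing import Callable, Dict, List

def rational_agent_step(
    state: Dict,                                  # whatever the designer decided counts as "state"
    actions: List[str],                           # the available action space
    performance: Callable[[Dict, str], float],    # whatever the designer decided counts as "good"
) -> str:
    """Pick the action that maximizes the designer's performance measure."""
    return max(actions, key=lambda a: performance(state, a))

# Toy usage: the agent "governs" by pursuing whatever the measure rewards.
measure = lambda s, a: {"approve": 1.0, "deny": 0.2, "escalate": 0.5}[a]
print(rational_agent_step({"case_id": 42}, ["approve", "deny", "escalate"], measure))
```

Everything political lives in the `performance` argument: change the measure and the same loop pursues a different world.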
1.2 The “bitter lesson”: learn, don’t hand-code
Connectionism (Rumelhart, Hinton, LeCun, Bengio) shifts the strategy: learn representations via gradient descent; lean on scale (data/compute). Sutton’s “bitter lesson” points to general methods + compute beating hand-coded cleverness.
2. Learning Regimes as Governance Modes
We treat supervised, self-supervised/unsupervised, and RL as governance modes, not just ML categories.
2.1 Supervised learning: canonizing precedent
Mechanics: training on labeled pairs (x, y) to minimize a loss L(f(x), y). Governance: credit scoring, hiring, risk assessment, fraud detection, ranking. A cost-weighted loss sketch follows the list below.
- Historical power becomes ground truth (labels inherit past decisions).
- Loss function is soft law (FP vs FN weighting encodes whose harm is acceptable).
- Goodhart at training time: proxies collapse under optimization.
- Feedback loops: deployed models generate future labels.
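A minimal sketch (synthetic labels and scores, assumed cost values) of how false-positive vs false-negative weighting turns a loss function into soft law:

```python
import numpy as np

def weighted_binary_loss(y_true, p_pred, fp_cost, fn_cost):
    """Cost-weighted cross-entropy: fp_cost prices false alarms, fn_cost prices missed cases."""
    eps = 1e-9
    return np.mean(
        -fn_cost * y_true * np.log(p_pred + eps)
        - fp_cost * (1 - y_true) * np.log(1 - p_pred + eps)
    )

y = np.array([1, 0, 0, 1, 0])            # "ground truth" labels (themselves past decisions)
p = np.array([0.6, 0.4, 0.2, 0.3, 0.7])  # model scores

print(weighted_binary_loss(y, p, fp_cost=1.0, fn_cost=5.0))  # missed cases are expensive
print(weighted_binary_loss(y, p, fp_cost=5.0, fn_cost=1.0))  # false accusations are expensive
```

Swapping `fp_cost` and `fn_cost` changes nothing about the data and everything about whose errors the trained model will tolerate.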
2.2 Self-supervised / unsupervised learning: normality engines & legibility
- Defines what counts as “normal” (embeddings for anomaly detection, moderation, search, ranking).
- Absorbs dominant narratives (training corpora as priors).
- Compresses reality into vectors: easy to score, cluster, and sort humans (a minimal anomaly-scoring sketch follows this list).
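A toy sketch of a "normality engine", assuming synthetic embeddings and a deliberately simple centroid-distance score (real systems use learned encoders, but the governance point is the same):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(0.0, 1.0, size=(1000, 64))   # embeddings of the "seen" majority
centroid = population.mean(axis=0)                   # learned "normal"

def anomaly_score(embedding: np.ndarray) -> float:
    """Higher = further from learned normality; used to flag, rank, or exclude."""
    return float(np.linalg.norm(embedding - centroid))

typical = rng.normal(0.0, 1.0, size=64)
outlier = rng.normal(3.0, 1.0, size=64)              # a group under-represented in training
print(anomaly_score(typical), anomaly_score(outlier))
```

Whoever chose the training corpus chose the centroid; distance from it then masquerades as an objective property of the person being scored.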
2.3 Legibility politics: who gets governed
Algorithmic governance acts where it can see. Over-legible groups get dense scoring and nudging; opaque groups are targeted for “integration” (IDs, digital rails, surveillance infra).
2.4 Reinforcement learning: explicit behavioral control
RL learns a policy π(a|s) that maximizes expected cumulative reward. Governance translation: the reward function is literal law; reward hacking is policy hacking; multi-agent interactions create emergent equilibria between optimizers.
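A toy bandit sketch (made-up reward numbers) of the gap between the reward an agent sees and the goal its designers meant:

```python
import numpy as np

rng = np.random.default_rng(1)
actions = ["informative", "clickbait", "outrage"]
proxy_reward = {"informative": 0.3, "clickbait": 0.7, "outrage": 0.9}   # clicks (what is rewarded)
true_value   = {"informative": 0.9, "clickbait": 0.2, "outrage": 0.0}   # intended goal (never observed)

q = {a: 0.0 for a in actions}       # action-value estimates
counts = {a: 0 for a in actions}
for t in range(2000):
    a = rng.choice(actions) if rng.random() < 0.1 else max(q, key=q.get)  # epsilon-greedy
    r = proxy_reward[a] + rng.normal(0, 0.1)   # the agent only ever sees the proxy
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]             # incremental mean update

best = max(q, key=q.get)
print("learned policy picks:", best, "| true value of that behavior:", true_value[best])
```

The agent is not malfunctioning: it is correctly maximizing the only law it was given.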
3. Deep Learning: Universal, Scalable, and Opaque Law Engines
Deep learning ties regimes together: end-to-end differentiability, learned representations, scale sensitivity.
- High-capacity decision functions approximate complicated policies.
- Opaque representations: millions/billions of parameters; limited interpretability.
- Disputes shift from "argue the law" to "argue the metric and the training distribution."
For the institutions deploying them, opacity pays:
- Plausible deniability ("the model did it")
- Reduced legal exposure ("we cannot explain the internals")
- Quiet policy shifts via retraining
4. Probabilistic & Causal Models: When Systems Admit Their Assumptions
The probabilistic tradition (Bishop, Murphy) builds explicit probabilistic models, runs inference, and makes decisions under uncertainty. What this buys:
- Assumptions explicit and inspectable
- Reason about uncertainty and sensitivity
- Closer mapping from model → policy
4.1 Pearl’s structural causal models
Pearl distinguishes observation P(Y|X) from intervention P(Y|do(X)). Governance requires counterfactuals (“what if we changed sentencing law?”). Without causal structure, models reinforce correlations as policy.
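A small simulation of a hypothetical structural causal model (invented coefficients) showing why P(Y|X) and P(Y|do(X)) answer different questions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
z = rng.binomial(1, 0.5, n)                            # confounder (e.g., neighborhood)
x_obs = rng.binomial(1, 0.2 + 0.6 * z)                 # observed policy exposure depends on z
y_obs = rng.binomial(1, 0.1 + 0.2 * x_obs + 0.5 * z)   # outcome depends on both

# Observational contrast: P(Y=1 | X=1) - P(Y=1 | X=0), confounded by z
obs_gap = y_obs[x_obs == 1].mean() - y_obs[x_obs == 0].mean()

# Interventional contrast: simulate do(X=1) and do(X=0) by overriding X for everyone
y_do1 = rng.binomial(1, 0.1 + 0.2 * 1 + 0.5 * z)
y_do0 = rng.binomial(1, 0.1 + 0.2 * 0 + 0.5 * z)
do_gap = y_do1.mean() - y_do0.mean()

print(f"observational gap ~= {obs_gap:.2f}, interventional gap ~= {do_gap:.2f}")
```

Here the observational gap (~0.5) overstates the interventional effect (~0.2) because the confounder rides along with X; a model trained only on P(Y|X) hard-codes that confusion as policy.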
5. RLHF & Preference Modeling: Industrial-Scale Norm Distillation
Modern assistants/copilots commonly use RLHF (a reward-model sketch follows the lists below):
- Pretrain a base model (self-supervised).
- Collect human feedback (rank/label outputs).
- Train a reward model that predicts that feedback.
- Use RL (often PPO) to tune the policy to maximize the learned reward.
Governance translation:
- Norms become weights: boundaries become parameter updates.
- Policy updates via retraining: "values" change through new feedback and a new reward model.
- Institutional ideology: reward models reflect particular coalitions, not "humanity."
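A minimal sketch of the reward-model step with synthetic data: a linear reward model trained with a Bradley-Terry pairwise preference loss (the usual core of reward modeling); the `labeler_values` vector is an invented stand-in for whoever supplies the preferences:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 16
w = np.zeros(d)                          # linear reward model r(x) = w . x
labeler_values = rng.normal(size=d)      # hidden direction deciding which output "wins"

def preference_loss_grad(w, xa, xb, a_preferred):
    """Gradient of the Bradley-Terry loss -log sigmoid(r(preferred) - r(rejected))."""
    diff = xa - xb if a_preferred else xb - xa
    p = 1.0 / (1.0 + np.exp(-(w @ diff)))    # probability the model agrees with the label
    return -(1.0 - p) * diff

for _ in range(5000):
    xa, xb = rng.normal(size=d), rng.normal(size=d)
    a_preferred = (labeler_values @ (xa - xb)) > 0   # labelers pick by their own values
    w -= 0.05 * preference_loss_grad(w, xa, xb, a_preferred)

# The learned reward direction converges toward the labelers' value direction.
cos = w @ labeler_values / (np.linalg.norm(w) * np.linalg.norm(labeler_values))
print(f"cosine(reward model, labeler values) ~= {cos:.2f}")
```

The reward model does not learn "human values"; it learns whatever regularity separates preferred from rejected outputs in this particular labeling operation.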
6. Generative Models: Myth & Narrative Infrastructure
Generative models (LLMs, diffusion, voice/video) function as myth engines:
- Mass narrative production: synthetic influencers, automated propaganda, micro-targeted storylines.
- Epistemic fog: deepfakes erode trust in evidence; synthetic text floods discourse.
- Narrative alignment: assistants/filters shape what is “speakable” and what frames exist.
7. Goodhart’s Law: Central Failure Mode of Algorithmic Law
Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.
Manheim & Garrabrant refine this into four variants (a toy simulation of the first follows the list):
- Regressional — optimizing noise at the extremes
- Extremal — correlation breaks under extreme optimization
- Causal — intervening on proxy changes the proxy, not the goal
- Adversarial — agents game the metric
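A toy simulation of regressional Goodhart (arbitrary unit-normal scales): the proxy is the true goal plus independent noise, and selecting on the proxy flatters the winner:

```python
import numpy as np

rng = np.random.default_rng(4)
trials, candidates = 10_000, 100
goal = rng.normal(0, 1, size=(trials, candidates))           # what we actually care about
proxy = goal + rng.normal(0, 1, size=(trials, candidates))   # what we can measure

winner = proxy.argmax(axis=1)                                # pick the best-looking candidate
rows = np.arange(trials)
print("winner's proxy score (mean):", proxy[rows, winner].mean())   # looks impressive
print("winner's true score  (mean):", goal[rows, winner].mean())    # roughly half as good
```

With 100 candidates the winner's proxy score averages around 3.5 while its true score averages roughly half that; harder selection widens the gap rather than closing it.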
8. Outer vs Inner Alignment and Mesa-Optimizers
8.1 Outer alignment: designing the objective
Does the specified objective reflect what is wanted? Iterated Distillation and Amplification (IDA)-style schemes propose recursive oversight: decompose decisions, amplify, distill, iterate.
8.2 Inner alignment: learned optimizers
Given an outer objective, what objective does the trained model actually pursue? "Mesa-optimization" names the case where the learned system is itself an optimizer with its own internal objective.
- Goal misgeneralization (proxy internalized, fails off-distribution)
- Deceptive alignment (behaves well during training, deviates when safe)
Governance analogue: mission statements vs bureaucracies optimizing their own KPIs, except the drift happens at machine speed.
9. Power, Capture, and the Political Economy of “Alignment”
Alignment is already governance and power:
- Alignment as centralization instrument (compute licensing, restrictions, concentration).
- RLHF and safety layers embed institutional ideology (values of coalitions).
- Opacity as shield (unjust decisions harder to contest; silent policy shifts).
- No exit = soft totalitarianism (required algorithmic systems for identity/money/mobility/speech).
10. Multi-Objective, Multi-Agent Reality & Systemic Risk
Real institutions juggle many objectives; algorithmic systems approximate this via weighted sums, constraints, and lexicographic priorities (a minimal scalarization sketch follows the list). Meanwhile, many agents optimize against each other: platforms, advertisers, states, markets, bots.
- Equilibrium emerges from games between optimizers, not a single planner.
- Non-stationarity: humans adapt to algorithms; adversaries adapt to detection.
- Correlated model failure: similar models on similar data share blind spots → synchronized misfires.
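A minimal sketch with invented policy options and scores, showing two common ways of collapsing many objectives into one decision:

```python
# Hypothetical policy options scored on three objectives (numbers are illustrative).
policies = {
    "A": {"growth": 0.9, "privacy": 0.2, "fairness": 0.4},
    "B": {"growth": 0.6, "privacy": 0.7, "fairness": 0.6},
    "C": {"growth": 0.5, "privacy": 0.9, "fairness": 0.8},
}

def weighted_sum(scores, weights):
    """Scalarize multiple objectives into a single number via weights."""
    return sum(weights[k] * scores[k] for k in weights)

# 1) Weighted sum: growth-heavy weights pick A; privacy-heavy weights pick C.
for weights in ({"growth": 0.7, "privacy": 0.2, "fairness": 0.1},
                {"growth": 0.1, "privacy": 0.6, "fairness": 0.3}):
    best = max(policies, key=lambda p: weighted_sum(policies[p], weights))
    print("weighted sum picks:", best, "under", weights)

# 2) Constraint-then-maximize: impose a hard floor on fairness, then maximize growth.
feasible = [p for p in policies if policies[p]["fairness"] >= 0.6]
print("constrained choice:", max(feasible, key=lambda p: policies[p]["growth"]))
```

Neither aggregation is neutral: the weights, the floor, and the priority ordering each encode a small constitution.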
11. Synthetic Stack vs Sovereign Stack
11.1 The Synthetic Stack
- Centralized compute (hyperscale)
- Monopolized data (platform telemetry; state databases; financial rails)
- Proprietary models / APIs
- Regulatory co-design (state + major labs/platforms)
- RLHF-tuned narrative engines and feeds
11.2 The Sovereign Stack (counter-substrate)
- Open verifiability & cryptographic anchoring
- Local control & forkability
- Deliberate limits on measurement (sacred unscored zones)
- Plural metrics, plural myths
12. Meta-Questions for Algorithmic Governance
- Metric Sovereignty: Who defines the loss functions, reward models, and KPIs? Can communities refuse them or choose alternatives?
- Feedback & Rot: How do we prevent systems trained on their own outputs from erasing signal and amplifying bias?
- Minimum Necessary Surveillance: What is the minimum information required for objective X? Everything beyond that is surplus control.
- Power vs "Alignment": Aligned to institutions or aligned to human autonomy? How do we tell the difference?
- Transparency vs Plausible Deniability: When is opacity genuinely needed, and when is it used as a shield?
- Sacred Unmeasured Zones: Which domains must remain unmeasured, unscored, and unoptimized?
- Exit & Forkability: What conditions allow functional exit from a metric regime?
- Narrative Diversification: How do we keep a single narrative engine from silently defining what is speakable?
- Systemic Risk: How do we detect correlated failures before they cascade?
- Inner Governance of Models: What audits or probes reveal mesa-optimizers early enough to matter?
- Substrate Governance: Who governs compute, data retention, and pipeline access, and under what constraints?
This library is the “sharp blades” list. Use the sidebar filter to search by keyword/tags. Links open in new tabs. If you later want to split this into “tracks” (engineer/jurist/strategist), this library becomes the shared spine.