0. Frame — Information as Battleground
Classical information theory markets neutrality. In a hostile setting, “neutrality” is a selection function: what is preserved, what is discarded, what becomes controllable.
Information = structure required to reconstruct a world / will / law.
Signal = information bound to intention or protocol.
Noise = what a model discards / fails to detect.
Compression = decision about what persists and what is erased.
Six lenses used in this module:
- Shannon: entropy, codes, capacity, rate–distortion.
- Kolmogorov: algorithmic complexity, entanglement, depth.
- Solomonoff: universal induction as ideal prediction.
- Chaitin & Levin: incompleteness + resource-bounded simulation/search.
- Rissanen/MDL: models as codes; selection as compression.
- Network info: distributed coding, secrecy capacity, coordination under observation.
1. Shannon Layer — Entropy · Mutual Information · Channels · Distortion
1.1 Entropy and prefix-free coding
Shannon entropy for a discrete source \(X\):
$$H(X) = -\sum_x p(x)\log_2 p(x).$$
Prefix-free codes satisfy:
$$H(X)\le \mathbb{E}[L(X)] < H(X)+1.$$
Kraft–McMillan constraint:
$$\sum_i 2^{-l_i}\le 1.$$
Entropy is the irreducible description cost an adversary pays to encode your source statistics under its chosen model. Raise entropy under their best model → raise minimal average code length.
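A quick numerical check of these bounds, using Shannon code lengths \(l_i=\lceil -\log_2 p_i\rceil\) (the distribution is an illustrative choice, not from the text):

```python
import math

def entropy(p):
    """Shannon entropy H(X) in bits for a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# An illustrative (dyadic) source distribution.
p = [0.5, 0.25, 0.125, 0.125]

# Shannon code lengths l_i = ceil(-log2 p_i) always satisfy Kraft-McMillan.
lengths = [math.ceil(-math.log2(pi)) for pi in p]
kraft = sum(2.0 ** -l for l in lengths)
avg_len = sum(pi * l for pi, l in zip(p, lengths))

H = entropy(p)
assert kraft <= 1.0           # Kraft-McMillan inequality
assert H <= avg_len < H + 1   # prefix-code bound
print(H, avg_len, kraft)      # → 1.75 1.75 1.0 (dyadic p: H equals E[L] exactly)
```

For dyadic probabilities the code meets entropy exactly; for general sources the slack stays below one bit per symbol.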
1.2 Mutual information, DPI, and bottlenecks
$$I(X;Y)=H(X)-H(X|Y)=H(Y)-H(Y|X).$$
Data Processing Inequality (Markov chain \(X\to Y\to Z\)): $$I(X;Z)\le I(X;Y).$$
Information bottleneck functional: $$\min_{p(t|x)}\; I(X;T)-\beta I(T;Y).$$
DPI hard-limits extraction: once you choose what interface representation \(T\) leaks outward, no post-processing can exceed \(I(X;T)\). Interfaces are constitutional boundaries.
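The DPI can be verified numerically for any concrete chain; a sketch with illustrative channel matrices (uniform \(X\) through a BSC(0.1) to get \(Y\), then a BSC(0.2) to get \(Z\)):

```python
import math

def mutual_info(joint):
    """I(X;Y) in bits from a joint pmf given as a nested list joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        joint[x][y] * math.log2(joint[x][y] / (px[x] * py[y]))
        for x in range(len(px)) for y in range(len(py)) if joint[x][y] > 0
    )

def push(joint_xy, channel):
    """Given p(x,y) and p(z|y), return p(x,z) for the chain X -> Y -> Z."""
    nx, ny, nz = len(joint_xy), len(joint_xy[0]), len(channel[0])
    return [[sum(joint_xy[x][y] * channel[y][z] for y in range(ny))
             for z in range(nz)] for x in range(nx)]

# X uniform, Y = X through BSC(0.1); Z = Y through BSC(0.2).
joint_xy = [[0.45, 0.05], [0.05, 0.45]]
joint_xz = push(joint_xy, [[0.8, 0.2], [0.2, 0.8]])

ixy, ixz = mutual_info(joint_xy), mutual_info(joint_xz)
assert ixz <= ixy + 1e-12   # data processing inequality holds
```

Here \(I(X;Y)=1-H_2(0.1)\approx 0.531\) bits, while the cascaded crossover \(0.26\) leaves only \(I(X;Z)\approx 0.173\) bits for any downstream processor.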
1.3 Channel capacity and error-correcting structure
Channel capacity: $$C=\max_{p(x)} I(X;Y).$$
Reliable communication exists for rates \(R<C\); impossible for \(R>C\) (Shannon coding theorem).
Redundancy is not waste; it is sacrifice of capacity to preserve signal under noise/attack. Critical law/keys/ledgers require redundant encoding across nodes and substrates.
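Capacity rarely has a closed form; the standard numerical route is Blahut–Arimoto iteration over input distributions. A minimal sketch, checked against the known BSC closed form \(C=1-H_2(\epsilon)\):

```python
import math

def blahut_arimoto(W, iters=200):
    """Approximate C = max_p I(X;Y) for a DMC with W[x][y] = p(y|x)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx
    for _ in range(iters):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # Per-input relative entropy D(W(.|x) || q).
        d = [sum(W[x][y] * math.log2(W[x][y] / q[y])
                 for y in range(ny) if W[x][y] > 0) for x in range(nx)]
        # Reweight inputs toward those carrying more information.
        w = [p[x] * 2.0 ** d[x] for x in range(nx)]
        s = sum(w)
        p = [wx / s for wx in w]
    return sum(p[x] * d[x] for x in range(nx))

# Binary symmetric channel with crossover 0.1.
eps = 0.1
C = blahut_arimoto([[1 - eps, eps], [eps, 1 - eps]])
h2 = -(eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps))
assert abs(C - (1 - h2)) < 1e-6
```

For the symmetric channel the uniform input is already optimal, so the iteration converges immediately; for asymmetric channels it converges to the capacity-achieving input.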
1.4 Rate–distortion (approximation for control)
$$R(D)=\min_{p(\hat{x}|x):\;\mathbb{E}[d(X,\hat{X})]\le D} I(X;\hat{X}).$$
Distortion \(d(\cdot)\) encodes what the compressor cares about.
Hostile models do not reconstruct you; they reconstruct control-relevant projections. The game is to force high rate for low-distortion approximation of those projections.
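For a Bernoulli(\(p\)) source under Hamming distortion, \(R(D)\) has the classical closed form \(H_2(p)-H_2(D)\) for \(D<\min(p,1-p)\), zero beyond. A sketch with illustrative numbers:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def rate_distortion_binary(p, D):
    """R(D) for a Bernoulli(p) source under Hamming distortion (closed form)."""
    if D >= min(p, 1 - p):
        return 0.0
    return h2(p) - h2(D)

# A fair-coin source: tolerating ~11% bit errors still costs ~0.5 bits/symbol.
R = rate_distortion_binary(0.5, 0.11)
assert 0.45 < R < 0.55
```

The adversary's choice of distortion measure \(d\) decides which errors are "free"; the same code computes how expensive a given fidelity target is under that choice.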
2. Kolmogorov Layer — Algorithmic Complexity · Entanglement · Depth
2.1 Kolmogorov complexity and basis dependence
For universal prefix machine \(U\): $$K(x)=\min\{|p|: U(p)=x\}.$$
Invariance up to constant: \(|K_{U_1}(x)-K_{U_2}(x)|\le c\). Exact \(K(x)\) is uncomputable.
Effective complexity is representation-aware: “complex to us” may be “simple” inside an adversary’s latent basis.
2.2 Algorithmic mutual information (entanglement)
$$I_A(x:y)=K(x)+K(y)-K(x,y).$$
High \(I_A(\text{core}:\text{hostile models})\) ⇒ shared structure ⇒ cheap inference. Target: minimize outward entanglement; maximize internal entanglement among sovereign nodes.
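\(K\) is uncomputable, so practical entanglement estimates substitute a real compressor, as in normalized compression distance. A sketch assuming zlib as the stand-in compressor (the data choices are illustrative):

```python
import os
import zlib

def C(b: bytes) -> int:
    """Compressed length: a computable stand-in for K(.)."""
    return len(zlib.compress(b, 9))

def entanglement(x: bytes, y: bytes) -> int:
    """Proxy for I_A(x:y) = K(x) + K(y) - K(x,y)."""
    return C(x) + C(y) - C(x + y)

x = os.urandom(4000)
y_entangled = x                    # maximal shared structure
y_independent = os.urandom(4000)   # no shared structure

# Shared structure shows up as large proxy entanglement;
# independent randomness shows almost none.
assert entanglement(x, y_entangled) > entanglement(x, y_independent)
```

The proxy inherits the compressor's blind spots: structure zlib cannot see contributes nothing, which is exactly the basis-dependence warning above.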
2.3 Depth, sophistication, and cost asymmetry
Raw \(K(x)\) conflates noise with structure: a random string maximizes it. Logical depth (Bennett) adds the missing axis, the generation time of near-shortest programs for \(x\), separating deep structure from shallow noise.
A pattern is strategically protective only if it is cheaper for you to maintain than for an adversary to infer/exploit at required fidelity.
3. Solomonoff Layer — Universal Induction as Ideal Adversary
Algorithmic probability and Solomonoff induction
For prefix-free \(U\): $$P(x)=\sum_{p:\;U(p)\text{ outputs a string starting with }x}2^{-|p|},\quad P(x)\approx 2^{-K(x)}\text{ (up to constants).}$$
Universal induction = Bayesian mixture over all computable hypotheses with prior weight \(2^{-|p|}\).
Ideal prediction heavily weights short programs. If a hostile stack can represent you with a short law under its inductive bias, it can predict/steer cheaply. Target: force long and/or slow adequate predictors for control-relevant projections.
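A toy version of "short programs dominate": a Bayesian mixture over deterministic repeating-pattern hypotheses, with prior weight \(2^{-L}\) as a crude stand-in for \(2^{-|p|}\) (the hypothesis class and all names are illustrative, not from the text):

```python
from itertools import product

def mixture_predict(observed: str, max_period: int = 4):
    """Posterior-predictive P(next bit) under a 2^-L prior over
    deterministic 'programs' = repeating patterns of length <= max_period."""
    scores = {"0": 0.0, "1": 0.0}
    for L in range(1, max_period + 1):
        for pat in product("01", repeat=L):
            # Deterministic hypothesis: likelihood 1 iff consistent with data.
            if all(observed[i] == pat[i % L] for i in range(len(observed))):
                scores[pat[len(observed) % L]] += 2.0 ** -L   # prior weight
    total = scores["0"] + scores["1"]
    return {b: s / total for b, s in scores.items()}

# After "010101" only "01" (weight 2^-2) and "0101" (weight 2^-4) survive;
# the shortest surviving program carries 80% of the posterior mass.
pred = mixture_predict("010101")
assert pred["0"] == 1.0   # every surviving hypothesis predicts 0 next
```

If your behavior admits a short pattern in the observer's hypothesis class, its posterior weight snowballs after very little data; that is the cheapness of prediction the paragraph above warns about.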
4. Chaitin & Levin — Incompleteness and Resource Bounds
4.1 Chaitin Ω and complexity-certification limits
Halting probability: $$\Omega=\sum_{p:\;U(p)\text{ halts}}2^{-|p|}.$$
Incompleteness (informal): any fixed, consistent, computably axiomatized theory has bounded ability to certify high Kolmogorov complexity claims beyond some constant.
Static totalizing formalisms have ceilings. But adaptive empirical systems can expand models; cracks exist, but must be occupied with maneuver, not faith.
4.2 Levin universal search and “effective simulability”
Universal search interleaves candidate programs with time proportional to \(2^{-|p|}\). “Optimal” up to constants in theory; dominated by resource budgets in practice.
The sovereignty gap lives between “computable” and “effectively modelable.” Forcing long/slow adequate predictors under real budgets is the practical edge.
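The interleaving schedule can be sketched with a toy instruction set (the interpreter, target, and phase structure are illustrative, not from the text):

```python
from itertools import product

def run(prog: str, max_steps: int):
    """Toy interpreter: start at 0; '0' adds 1, '1' doubles.
    Returns the output, or None if the step budget is exhausted."""
    if max_steps < len(prog):
        return None
    acc = 0
    for op in prog:
        acc = acc + 1 if op == "0" else acc * 2
    return acc

def levin_search(target: int, max_phase: int = 12):
    """Phase k: run every program of length l <= k for 2^(k-l) steps,
    so short programs receive exponentially more of the budget."""
    for k in range(1, max_phase + 1):
        for l in range(1, k + 1):
            budget = 2 ** (k - l)
            for prog in product("01", repeat=l):
                if run("".join(prog), budget) == target:
                    return "".join(prog)
    return None

prog = levin_search(6)
assert prog is not None and len(prog) == 4 and run(prog, 99) == 6
```

The optimality is only "up to constants": the phase at which a program of length \(l\) and runtime \(t\) completes scales like \(l+\log_2 t\), so forcing adequate predictors to be long or slow pushes them into late, expensive phases.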
5. Rissanen & MDL — Models as Codes
5.1 Minimum Description Length (MDL)
Two-part MDL: $$L_{\text{total}}(M;D)=L(M)+L(D|M).$$
Global MDL optimization tends to erase rare high-signal modes as “not worth code length.”
Doctrines and governance schemes are compressors. Selection pressure favors low-parameter models that are “good enough” for administration and control.
5.2 Self-MDL audit (pressure, not tyranny)
Compare doctrine \(M_{\text{codex}}\) vs simpler alternative \(M_{\text{simple}}\):
$$\Delta L=[L(M_{\text{simple}})+L(D|M_{\text{simple}})]-[L(M_{\text{codex}})+L(D|M_{\text{codex}})].$$
Some complexity is strategically/ethically necessary even if not MDL-optimal; it must be justified explicitly outside compressive efficiency.
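A sketch of the audit with Bernoulli "doctrines", using the standard \(\tfrac{k}{2}\log_2 n\) parametric cost for \(L(M)\) and \(L(D|M)=-\log_2\) likelihood (the data and model pair are illustrative):

```python
import math

def data_cost(bits, theta):
    """L(D|M) = -log2 likelihood of the bit string under Bernoulli(theta)."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(theta) + zeros * math.log2(1 - theta))

# Illustrative data: 90 ones, 10 zeros.
bits = [1] * 90 + [0] * 10
n = len(bits)

# M_simple: fair coin, no free parameters.
L_simple = 0.0 + data_cost(bits, 0.5)

# M_codex: fitted Bernoulli; one parameter costs (1/2) log2 n bits.
theta_hat = sum(bits) / n
L_codex = 0.5 * math.log2(n) + data_cost(bits, theta_hat)

delta_L = L_simple - L_codex
assert delta_L > 0   # here the extra parameter pays for itself in code length
```

When \(\Delta L \le 0\) the audit flags the extra structure as compressive bloat; the doctrine above says such structure may still be kept, but only with an explicit non-compressive justification.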
6. Randomness — Algorithmic Randomness · Pseudorandomness · Projection Defense
6.1 Randomness vs pseudorandomness
A finite string \(x\) is \(c\)-incompressible if \(K(x)\ge |x|-c\). Pseudorandomness is generated by short programs yet remains indistinguishable from true randomness to resource-bounded observers.
Pure noise is not the target. The target is structured complexity: unpredictability in control-relevant projections with internal consistency for coordination.
6.2 Rate–distortion of projections
The hostile model controls via projections \(\Pi(X)\) (spending patterns, location clusters, alignment scores). Even if \(X\) is complex, \(\Pi(X)\) may be low-dimensional and cheap to learn.
Any projection \(\Pi\) usable for steering must itself be expensive to approximate at low distortion.
7. Network Information — Multi-node Sovereignty and Secrecy
7.1 Distributed source coding (Slepian–Wolf)
Correlated sources \(X,Y\) can be compressed separately and jointly decoded if:
$$R_X \ge H(X|Y),\quad R_Y \ge H(Y|X),\quad R_X+R_Y \ge H(X,Y).$$
Internal correlation can be exploited for efficient internal communication without exposing correlation structure externally.
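A numerical corner point of the region for a doubly symmetric binary source (illustrative correlation: \(Y = X \oplus N\), \(N\sim\mathrm{Bern}(0.1)\)):

```python
import math

def H(pmf):
    """Entropy in bits of any iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# X ~ Bern(1/2), Y = X xor Bern(0.1) noise.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

H_xy = H(joint.values())
H_x = H([0.5, 0.5])
H_y = H([0.5, 0.5])
H_x_given_y = H_xy - H_y   # chain rule
H_y_given_x = H_xy - H_x

# Slepian-Wolf corner point: send Y at full rate, X at only H(X|Y) bits/symbol.
total = H_y + H_x_given_y
assert abs(total - H_xy) < 1e-12
assert total < H_x + H_y   # beats coding each source at full rate
```

The decoder exploits the correlation jointly; neither encoder ever transmits the correlation structure itself, which is the externally quiet efficiency the paragraph describes.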
7.2 Wiretap channels and secrecy capacity
Secrecy capacity exists when legitimate channels are effectively “better” than eavesdropper channels. Design objective: create and hold asymmetry where internal \(C\) dominates external observation quality.
High internal mutual information (reliable) + low external mutual information (degraded/expensive).
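For the binary symmetric case with a degraded eavesdropper, secrecy capacity has the classical closed form \(C_s = H_2(p_{\text{eve}}) - H_2(p_{\text{main}})\); a sketch with illustrative crossover probabilities:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def bsc_wiretap_secrecy_capacity(p_main, p_eve):
    """Secrecy capacity when the eavesdropper's BSC is noisier (degraded):
    C_s = h2(p_eve) - h2(p_main), clipped at zero."""
    return max(0.0, h2(p_eve) - h2(p_main))

# Illustrative asymmetry: clean internal link, degraded external observer.
Cs = bsc_wiretap_secrecy_capacity(p_main=0.05, p_eve=0.20)
assert Cs > 0.0

# No channel asymmetry, no secrecy.
assert bsc_wiretap_secrecy_capacity(0.2, 0.2) == 0.0
```

The formula makes the design objective quantitative: every bit of extra noise imposed on the external observer, relative to the internal channel, is secrecy rate.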
8. Sovereign Information Doctrine — Final Laws
- Relative to adversary class \(\mathcal{A}\) and distortion tolerance \(D\): control-relevant prediction at distortion \(\le D\) must be prohibitive in time/energy/data.
- Inner kernel: high complexity/depth. Outer code: low description length, survivable under distortion, points inward without revealing kernel.
- Complexity must earn code length via compressive/explanatory gain or explicit strategic/ethical role; bloat is pruned or demoted.
- Defenses must be hard under plausible hostile inductive biases (latent bases), not merely hard in your own descriptive language.
- For any steering projection \(\Pi\), ensure \(R_{\Pi}(D)\) is high at low distortion.
- Depth protects only if cheaper to live than to infer/exploit at required fidelity.
- Minimize \(I_A(\text{core}:\text{hostile models})\); maximize internal \(I_A(\text{node}_i:\text{node}_j)\). Enforce leak limits via DPI/bottlenecks.
- Critical memory is redundantly encoded across substrates with fork/kill-switches; erasure at scale is thermodynamically costly and trace-bearing.
- Exploit correlation (Slepian–Wolf) and channel advantage (wiretap-style) so internal reliability stays high while external leakage stays low.
- No static guarantee. Assume continual model updates; doctrine must be periodically re-audited against new representational bases and budgets.
9. Closing — Mathematics as Constitutional Constraint
Entropy, MI, DPI, capacity, rate–distortion, \(K(x)\), \(I_A\), universal induction, incompleteness, MDL, randomness, depth, and network coding: not “tools” but constraints on what power can compress, infer, erase, and steer without paying real costs.
Hostile stacks win by compressing you into affordable control variables. Sovereign stacks win by arranging entropy, bottlenecks, depth, redundancy, and network asymmetry so hostile inference must either overspend or accept distortions that break control.