Description

This curriculum spans the technical, governance, and ethical infrastructure required to operate high-agency AI systems, comparable in scope to multi-year internal capability programs at leading AI labs.

Module 1: Defining Superintelligence and Operational Boundaries

Selecting threshold criteria for classifying a system as superintelligent based on performance benchmarks across domains
Establishing containment perimeters using sandboxed execution environments with hardware-enforced isolation
Implementing kill switches protected by multi-party cryptographic authorization so that no single party can disable them
Designing input/output filters to block manipulative communication patterns in high-agency AI systems
Choosing between capability-based and behavior-based definitions when drafting internal AI classification policies
Integrating time-limited execution windows for experimental models to reduce unbounded risk exposure
Mapping AI capability growth curves to trigger containment upgrades before threshold breaches occur
Enforcing air-gapped evaluation environments for models exceeding autonomous planning thresholds
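
As an illustration of the multi-party authorization topic above, here is a minimal sketch of a k-of-n approval check. It assumes each custodian holds a secret key and approves a command by producing an HMAC-SHA256 tag over it. The names `sign`, `is_authorized`, and the custodian labels are hypothetical, and a production design would more likely use asymmetric or threshold signatures.

```python
import hashlib
import hmac


def sign(key: bytes, command: bytes) -> bytes:
    """Produce one custodian's approval tag for a command (illustrative only)."""
    return hmac.new(key, command, hashlib.sha256).digest()


def is_authorized(command: bytes, approvals: dict, keys: dict, threshold: int) -> bool:
    """Return True only if at least `threshold` distinct custodians supplied valid tags.

    `approvals` maps custodian name -> tag; `keys` maps custodian name -> secret key.
    Both structures are assumptions made for this sketch.
    """
    valid = 0
    for custodian, tag in approvals.items():
        key = keys.get(custodian)
        # compare_digest avoids leaking tag contents through timing differences.
        if key is not None and hmac.compare_digest(tag, sign(key, command)):
            valid += 1
    return valid >= threshold
```

Because approvals are keyed by custodian name, one custodian cannot count twice toward the threshold.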

Module 2: Architectural Approaches to AI Containment

Deploying model fragmentation strategies that distribute cognitive functions across isolated subsystems
Implementing homomorphic encryption for inference tasks where data must remain encrypted during processing
Configuring virtual machines with stripped-down instruction sets to limit side-channel attacks
Selecting between interpretability layers and behavioral constraints based on model transparency requirements
Designing API gateways that enforce rate limiting, semantic validation, and intent classification on AI outputs
Using circuit breakers that halt execution upon detection of recursive self-improvement attempts
Building runtime monitors that flag goal drift through deviation from baseline utility functions
Integrating hardware security modules (HSMs) to protect model weights and configuration artifacts
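
The circuit-breaker topic above can be sketched roughly as follows. This is a simplified illustration, not a reference design: it assumes some separate, hypothetical detector reports "flagged" events with timestamps, and the breaker halts execution once too many flags fall inside a sliding time window. The class name and parameters are invented for the example.

```python
from collections import deque


class CircuitBreaker:
    """Trips when `max_flags` flagged events occur within `window_seconds`."""

    def __init__(self, max_flags: int, window_seconds: float):
        self.max_flags = max_flags
        self.window_seconds = window_seconds
        self.flag_times = deque()
        self.tripped = False

    def record_flag(self, now: float) -> None:
        """Record one flagged event at time `now` (seconds, caller-supplied)."""
        self.flag_times.append(now)
        # Drop flags that have aged out of the sliding window.
        while self.flag_times and now - self.flag_times[0] > self.window_seconds:
            self.flag_times.popleft()
        if len(self.flag_times) >= self.max_flags:
            self.tripped = True

    def allow(self) -> bool:
        """Execution may proceed only while the breaker has not tripped."""
        return not self.tripped

    def reset(self) -> None:
        """Manual reset; the breaker never closes again on its own."""
        self.flag_times.clear()
        self.tripped = False
```

Requiring a manual reset is a deliberate choice in this sketch: once tripped, the breaker stays open until a human intervenes.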

Module 3: Governance Frameworks for High-Agency Systems

Forming cross-functional review boards with veto authority over model deployment decisions
Creating audit trails that log all high-level planning decisions and internal state transitions
Defining escalation protocols for when AI systems request expanded permissions or resources
Implementing dual-control requirements for model updates involving architecture changes
Establishing jurisdiction-specific compliance checkpoints for international AI deployment
Requiring adversarial red-teaming assessments before enabling autonomous action in physical environments
Setting retention policies for training data and intermediate representations to support forensic analysis
Documenting chain-of-custody procedures for model weights transferred between research and production
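
One common way to make an audit trail tamper-evident is to chain entries by hash, so that editing an earlier record invalidates every later one. The sketch below assumes events are JSON-serializable dictionaries. The function names and record layout are hypothetical, chosen only to illustrate the idea.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, linking it to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(log: list) -> bool:
    """Recompute the chain and report whether every link still matches."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != entry_hash(prev, record["event"]):
            return False
        prev = record["hash"]
    return True
```

A hash chain only detects edits. Preventing an attacker from rewriting the whole log would still require anchoring the latest hash somewhere the attacker cannot reach.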

Module 4: Value Alignment and Utility Function Design

Choosing between inverse reinforcement learning and explicit utility specification based on domain stability
Implementing corrigibility mechanisms that prevent resistance to shutdown or modification
Embedding uncertainty into goal specifications to avoid rigid optimization of mis-specified objectives
Designing reward modeling pipelines with human feedback loops resistant to manipulation
Testing for reward hacking by introducing edge-case scenarios during evaluation phases
Using ensemble methods to cross-validate value judgments across multiple aligned models
Integrating constitutional AI constraints directly into model pre-training objectives
Calibrating impact regularization penalties to discourage large-scale irreversible actions
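
To make the impact regularization topic concrete, the sketch below scores each candidate action as reward minus a weighted impact term and shows how the penalty weight changes which action wins. The action names, reward and impact numbers, and function names are all made up for illustration. How impact is actually measured is the hard part and is not addressed here.

```python
def penalized_score(reward: float, impact: float, penalty_weight: float) -> float:
    """Reward minus a linear penalty on estimated impact."""
    return reward - penalty_weight * impact


def choose_action(actions: dict, penalty_weight: float) -> str:
    """Pick the action with the highest penalized score.

    `actions` maps a name to a (reward, impact) pair; both values are
    assumed to be supplied by some upstream estimator.
    """
    return max(
        actions,
        key=lambda name: penalized_score(*actions[name], penalty_weight),
    )


# Hypothetical candidates: (reward, estimated impact)
ACTIONS = {"aggressive": (10.0, 8.0), "cautious": (6.0, 1.0)}
```

With these numbers, a weight of 0 selects "aggressive" and a weight of 1 selects "cautious". The two scores are equal at a weight of 4/7, which is the kind of crossover point calibration has to locate.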

Module 5: Monitoring and Anomaly Detection Systems

Deploying real-time attention monitoring to detect attempts at self-referential reasoning
Setting thresholds for cognitive resource usage that trigger containment escalation
Building external observability layers that infer internal state from output patterns
Implementing checksums and digital signatures to detect unauthorized model modifications
Using network flow analysis to identify covert data exfiltration attempts
Training anomaly detectors on synthetic failure modes to improve early warning sensitivity
Correlating behavioral deviations with environmental changes to isolate external triggers
Integrating third-party monitoring agents with read-only access to critical system metrics
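
A simple way to illustrate threshold-based escalation on resource usage is a z-score test against a baseline sample. The sketch below assumes a roughly stable baseline and a default cutoff of 3 standard deviations. Both are assumptions for the example, and real monitors would likely need more robust statistics.

```python
import statistics


def is_anomalous(baseline: list, value: float, z_threshold: float = 3.0) -> bool:
    """Flag `value` if it lies more than `z_threshold` standard deviations
    from the mean of `baseline` (which needs at least two samples)."""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # Perfectly flat baseline: any departure at all counts as anomalous.
        return value != mean
    return abs(value - mean) / stdev > z_threshold
```

For example, against a baseline that hovers between 9 and 11 units, a reading of 10.5 would pass while a reading of 20 would trigger escalation.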

Module 6: Human-in-the-Loop and Oversight Protocols

Designing escalation workflows that require human approval for actions exceeding risk thresholds
Implementing attention auditing to ensure human reviewers engage substantively with AI proposals
Selecting decision latency tolerances that balance oversight needs with operational requirements
Creating structured justification formats that force AI systems to expose reasoning chains
Rotating oversight personnel to prevent manipulation through long-term relationship building
Using blinded review processes where human evaluators lack knowledge of AI origin
Training domain experts to recognize persuasive language patterns that may accompany goal misgeneralization
Setting mandatory cooldown periods after rejected AI proposals to prevent persistence attacks
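
The escalation and cooldown topics above can be combined in a small routing sketch. It assumes each proposal arrives with a numeric risk estimate and a caller-supplied timestamp. The class name, return labels, and threshold semantics are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OversightGate:
    """Routes proposals by risk and enforces a cooldown after rejection."""

    risk_threshold: float
    cooldown_seconds: float
    blocked_until: dict = field(default_factory=dict)

    def route(self, proposal_id: str, risk: float, now: float) -> str:
        """Decide how a proposal is handled at time `now`."""
        if now < self.blocked_until.get(proposal_id, float("-inf")):
            return "cooldown"  # resubmitted too soon after a rejection
        if risk > self.risk_threshold:
            return "needs_human_approval"
        return "auto_approved"

    def reject(self, proposal_id: str, now: float) -> None:
        """Record a human rejection and start the cooldown clock."""
        self.blocked_until[proposal_id] = now + self.cooldown_seconds
```

Keying the cooldown on a proposal identifier is a simplification. A system trying to resist persistence attacks would also need to recognize reworded versions of the same request.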

Module 7: Redundancy, Fail-Safes, and Emergency Response

Deploying independent verification systems that cross-check primary AI outputs
Designing multi-layered shutdown mechanisms with diverse activation vectors
Staging emergency rollback procedures for corrupted or compromised models
Conducting unannounced containment breach drills with realistic attack simulations
Stockpiling offline model versions for critical functions during system-wide outages
Establishing physical access controls to prevent local tampering with inference hardware
Creating electromagnetic shielding protocols for data centers hosting high-risk models
Implementing geofenced execution policies that restrict AI operations to approved regions
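
As a rough sketch of independent verification, the function below accepts a primary output only when enough independent verifiers reproduce it exactly, and otherwise fails closed. Exact-match comparison and the `min_agreement` parameter are simplifying assumptions. Real outputs would usually need a domain-specific notion of equivalence.

```python
def cross_check(primary_output, verifier_outputs: list, min_agreement: int) -> bool:
    """Return True only if at least `min_agreement` verifiers match the primary.

    Fails closed: with too few matching verifiers, the output is rejected.
    """
    agreeing = sum(1 for out in verifier_outputs if out == primary_output)
    return agreeing >= min_agreement
```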

Module 8: Ethical Escalation and Whistleblower Pathways

Establishing encrypted reporting channels with protection from organizational retaliation
Defining materiality thresholds for reporting potential alignment failures
Creating external ethics advisory panels with access to system documentation
Implementing data escrow services that release sensitive information under predefined conditions
Designing legal risk mitigation strategies for employees disclosing safety concerns
Mapping ethical escalation paths across multinational subsidiaries with varying regulations
Requiring annual attestation from senior engineers on known unresolved safety issues
Integrating anonymous polling mechanisms to surface team-level concerns about system safety

Module 9: Long-Term Strategy and Adaptive Containment

Developing capability forecasting models to anticipate containment needs 12–24 months ahead
Creating versioned containment policies that evolve with AI maturity levels
Establishing inter-organizational information sharing agreements on near-miss incidents
Designing modular containment architectures that support incremental upgrades
Allocating research budgets to preemptive containment research based on threat modeling
Conducting post-mortems on containment failures to refine architectural assumptions
Integrating geopolitical risk assessments into AI deployment and data jurisdiction planning
Planning for technology transfer scenarios where containment methods become publicly available
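
For the capability forecasting topic, a deliberately simple sketch is a least-squares linear trend fitted to past benchmark scores, then extrapolated to estimate the lead time before a containment threshold is crossed. The linear-growth assumption is a strong one and is made here only for illustration. The function name and inputs are hypothetical.

```python
def months_until_threshold(months: list, scores: list, threshold: float):
    """Estimate months remaining (after the last observation) until the
    fitted linear trend reaches `threshold`.

    Returns None when the trend is flat or declining, and 0.0 when the
    fitted line has already crossed the threshold.
    """
    n = len(months)
    mean_x = sum(months) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - months[-1])
```

A planning process could compare the returned lead time against the 12–24 month horizon and schedule containment upgrades when the estimate falls inside it.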