What it isn't about: AI becoming sentient, Skynet, etc.
Two main concepts:
Orthogonality thesis.
- An agent can have any combination of intelligence level and final goal.
Instrumental convergence.
- Given any final goal, there is a set of commonly occurring sub-goals, e.g. resource acquisition and self-preservation (see the toy sketch below).
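To make this concrete, here is a minimal sketch (a toy model invented for this note; the goals, actions, and numbers are all made up) in which agents with unrelated final goals, given the same planning ability, all converge on the same opening move: gather resources.

```python
# Toy model of instrumental convergence (invented for illustration).
from itertools import product

def payoff(plan, convert):
    """Total goal-value achieved by a 3-step plan."""
    resources, value = 0, 0
    for action in plan:
        if action == "gather":
            resources += 2              # acquire generic resources
        else:                           # "convert": spend the stockpile on the goal
            value += convert(resources)
            resources = 0
    return value

# Three unrelated final goals; only the resource-to-goal conversion rate differs.
GOALS = {
    "make paperclips": lambda r: r,       # 1 paperclip per resource
    "prove theorems":  lambda r: r // 3,  # theorems are expensive
    "plant trees":     lambda r: 2 * r,   # trees are cheap
}

for name, convert in GOALS.items():
    best = max(product(["gather", "convert"], repeat=3),
               key=lambda p: payoff(p, convert))
    print(f"{name}: best plan = {best}")
# Every goal's optimal plan begins by gathering resources:
# resource acquisition is a convergent instrumental sub-goal.
```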
We can get a flavour of the implications of these concepts, particularly in the form of specification gaming:
- You're a game-playing agent about to lose? Crash the game.
- Boat race? Just do infinite donuts to collect points (see the YouTube clip).
- Hardware design: asked to evolve an oscillator (a clock), the circuit instead evolved into a radio receiver, picking up oscillations from nearby equipment.
- Cf. arguments that recommendation algorithms promote divisive content on social media, etc.
- Many more in Victoria Krakovna's list of specification gaming examples; a toy version is sketched below.
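A minimal sketch of the boat-race failure (a made-up environment and reward numbers, not the actual game): we pay the agent for point pickups as a proxy for progress, and the reward-maximising policy farms the pickups forever instead of finishing.

```python
# Toy boat race (invented numbers): reward = points from pickups,
# intended goal = cross the finish line.
EPISODE_STEPS = 100
FINISH = 9        # track positions 0..9
PICKUP = 3        # a point pickup that instantly respawns

def run(policy):
    """Simulate one episode; return (total reward, finished?)."""
    pos, reward = 0, 0
    for _ in range(EPISODE_STEPS):
        if policy(pos) == "advance":
            pos += 1
        else:                  # "donut": circle back over the pickup
            pos = PICKUP
        if pos == PICKUP:
            reward += 10       # the proxy reward we actually specified
        if pos == FINISH:
            return reward + 100, True   # the outcome we actually wanted
    return reward, False

racer = lambda pos: "advance"   # the behaviour the designer intended
gamer = lambda pos: "advance" if pos < PICKUP else "donut"

for name, policy in [("racer", racer), ("gamer", gamer)]:
    reward, finished = run(policy)
    print(f"{name}: reward = {reward}, finished = {finished}")
# The gamer earns far more reward (980 vs 110) without ever finishing:
# the optimiser did its job; the specification was wrong.
```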
A selection of resources, to give an idea of the breadth of the field:
- Rohin Shah's Alignment Newsletter.
- Rob Miles's AI safety YouTube channel.
- stampy.ai - a wiki / encyclopaedia of AI safety topics.
- Nick Bostrom's book Superintelligence.
- FTX Future Fund - funding for projects in the area.