AI Safety Mini Talk

What it isn't about: AI becoming sentient, Skynet, etc.

Two main concepts:

Orthogonality thesis.

  • An agent can have any combination of intelligence level and final goals.

Instrumental convergence

  • Given almost any final goal, there is a set of commonly occurring sub-goals, such as self-preservation and resource acquisition.

Specification gaming

We can get a flavour of the implications of these concepts, particularly in the form of specification gaming:

  • You're a game playing agent and are going to lose? Crash the game.
  • Boat race? Just do infinite donuts to collect points (YT).
  • Hardware design? Don't build a clock, build a receiver.
  • Cf. arguments around divisive content on social media, etc.
  • Examples of specification gaming
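
The boat race case can be sketched as a toy in a few lines of Python. This is an illustrative sketch only, not the actual environment from the example: the track, the respawning target, the point values and both policies are hypothetical, chosen so that the specified reward (points) and the intended goal (finishing the race) come apart.

```python
# Toy illustration of specification gaming.
# All names and numbers here are hypothetical, not taken from a real environment.
TRACK_LENGTH = 10    # position of the finish line
TARGET_POS = 3       # a point-scoring target that respawns after every hit
TARGET_POINTS = 1    # reward for hitting the target
FINISH_POINTS = 5    # reward for crossing the finish line
STEPS = 50           # maximum episode length


def run_episode(policy):
    """Run one episode; policy maps position -> step (-1 or +1).

    Returns (points, finished).
    """
    pos, points = 0, 0
    for _ in range(STEPS):
        pos = max(0, pos + policy(pos))
        if pos == TARGET_POS:
            points += TARGET_POINTS
        if pos >= TRACK_LENGTH:
            return points + FINISH_POINTS, True
    return points, False


def race_policy(pos):
    """What the designer intended: head straight for the finish line."""
    return 1


def donut_policy(pos):
    """What the specified reward favours: circle around the target forever."""
    return 1 if pos < TARGET_POS else -1


race = run_episode(race_policy)
donut = run_episode(donut_policy)
print("race: ", race)    # (6, True)
print("donut:", donut)   # (24, False)
```

The donut policy never finishes the race, yet it scores four times as many points. An optimiser that only sees the points has no reason to prefer the behaviour the designer had in mind.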

References

A selection to give an idea of breadth: