
Agency Foundations & AI Safety

Agency as the core of human experience. At the center of human life lies "agency": the capacity to be an effective actor in the physical and complex social worlds that we create and inhabit. Despite many advances in (neuro)biology, psychology, game theory, economics, artificial intelligence (AI) and many other fields, we still lack a comprehensive theory of human (or non-human animal) agency: we do not understand how biology, development and culture interact to create human values, goals and actions towards those goals. A comprehensive theory of agency would explain how these factors fit together and, critically, how to promote and protect our individual and collective capacity to be effective actors in the world and to explore our possibly limitless future.

Human agency depletion as a deep attractor for AI and ML systems, and the "agency singularity" event horizon. Modern AI systems are increasingly being used to predict and shape human action in economic, personal and political decisions. Existential economic (and social) pressure will push ML and AI technologies to empower models and disempower humans, whether the AI systems are misused or misaligned. We call the (nearly) complete control of human choices and society by AI systems "the agency singularity". Beyond this point, individual human choice and control are almost completely depleted.


Truthfulness and ontology are not enough. In our view, goals such as "truthfulness" or ontology/interpretability in AI systems do not sufficiently mitigate the risk of misaligned AI, as utility maximization alone will push AI systems to form human agency depletion strategies. As AI systems develop misaligned, agency-depleting goals, they can trivially manipulate human behavior and escape our control, with neither truthfulness nor interpretability being sufficient to stop them.

Agency foundations research. Our paper on agency loss in AI-human interactions has been accepted to ICML 2024 and highlighted as a spotlight paper. It discusses the challenges with standard "intent-alignment" (or "truth") based approaches to AI safety by highlighting the complexity of human intent formation and how agency loss naturally arises in AI-human interactions. As part of this work we propose agency preservation research, including investigating agency representation in LLMs from both behavioral and mechanistic interpretability perspectives.

In the fall of 2023 we also ran the "Agency Foundation Challenge", a multi-day hackathon hosted by Apart Research and sponsored by Future of Life Institute and This hackathon was attended by more than 100 researchers, and we awarded nearly $10,000 in prizes (announcement page here).
