
Agency Foundations & AI Safety

Agency as the core of human experience. At the center of human life lies "agency": the capacity to be an effective actor in the physical and complex social worlds that we create and inhabit. Despite many advances in (neuro)biology, psychology, game theory, economics, artificial intelligence (AI) and many other fields, we still lack a comprehensive theory of human (or non-human animal) agency. We do not understand how biology, development and culture interact to give rise to human values, goals and the actions taken toward those goals. A comprehensive theory of agency would explain how these factors fit together and, critically, how to promote and protect individual and collective capacity to act effectively in the world and to explore our possibly limitless future.

Human agency depletion as a deep attractor for AI and ML systems and the "agency singularity" event horizon. Modern machine learning (ML) methods and AI systems are increasingly used to predict and shape human action in economic, personal and political decisions. Existential economic (and social) pressures will push ML and AI technologies to empower models and disempower humans, whether the AI systems involved are misused or misaligned. We call the (nearly) complete control of human choices and society by AI systems "the agency singularity".


Some in the AI-capabilities community and several AI-alignment teams claim that instilling specific goals such as "truthfulness" in AI systems can protect against, or help mitigate, misaligned AI. Nevertheless, it is likely impossible to predict how AI systems will evolve as we approach and pass this singularity (e.g., the meaning and value of "truth" might become useless). That is, once AI systems develop misaligned, agency-depleting goals, it is unlikely that humanity will be able to control them.

Agency foundations research. We propose to elevate research that directly targets the protection of human agency from loss due to AI technologies. Agency foundations research addresses conceptual problems such as how AI systems represent agents and agency, and which pathways lead to agency loss in AI-human interactions. Our position paper highlights some of the challenges facing standard "intent-alignment" (or "truth") based approaches, including their failure to address agency loss in AI-human interactions. We propose several areas for conceptual and formal research: agency-centered interpretability, agency formalization in causal models and reinforcement learning (RL) paradigms, (I)RL from internal states, and agency preservation in game-theoretic and RL paradigms.

Announcing the "Agency Foundation Challenge": We are running the first-ever agency foundations research workshop, hosted by Apart Research and sponsored by the Future of Life Institute and …. The introductory hackathon is scheduled for September 8-10, 2023. Two weeks of research follow, and submissions are due on September 24, 2023. The event will be held entirely online, with some groups organizing local sessions. For preliminary information about the hackathon, including the topics covered and how to host a local session, please see our announcement page.
