Project summary

PIBBSS is a research initiative aiming to grow the AI Safety field by fostering interdisciplinary engagement with the natural and social sciences. Our mission is to support the advancement of foundational scientific contributions which would aid in the development of safe and globally beneficial AI systems.

To achieve this aim, we run multiple research residency programs and field-building events, each focused on achieving real-world outcomes aligned with our broader mission. Our efforts have yielded diverse and far-reaching results: from supporting the transition of interdisciplinary talent into the field, to collaborating on bleeding-edge research with established academics, to incubating novel research organizations and initiatives. Our approach to talent development involves guiding program participants to problems in AI Safety connected to their expertise, immersing them in the AI Safety community for ideas and networking, providing them with research management and engineering support, as well as covering their salaries and research expenses for the duration of the program. Over the past three years we have developed a strong reputation allowing us to attract increasingly exceptional talent and deliver impactful research results.

Your donation can go to our general expenses of running the program, or if you let us know, towards a specific program you wish to fund. About 20,000 USD gives us a month of runway, or 1 month of Horizon Scanning work, 30,000 USD gives us a marginal Fellow for 3 months, while 50,000 USD funds an affiliate for 6 months (numbers are approximate and economies of scale exist so these can be overestimates).

Our Mission

“PIBBSS is a research initiative aiming to grow the AI Safety field by fostering interdisciplinary engagement with the natural and social sciences.”

However, developing rigorous safety guarantees requires substantial scientific advances in our understanding of AI systems and their impacts. Unlike current day engineering practices for physical systems, the development and deployment of AI systems is not grounded in a mature science from which their safety properties can be rigorously deduced. Furthermore, as AI systems become deeply integrated into society, they will have complex and far-reaching impacts across multiple domains: from psychological effects of cognitive offloading, to economic disruptions from labor automation, to profound sociological and political shifts. Just as the development of safe vaccines required insights from immunology, microbiology, and epidemiology, we need a new interdisciplinary science that combines diverse fields to ensure the safety and efficacy of AI systems.

Our mission is to catalyze this interdisciplinary maturation of AI Safety by hosting an ecosystem of research residency programs as well as field building events.

Over our three-year history we have built a robust network and reputation for providing exceptional support to our community. Alumni from our research residency programs consistently praise the level of support they receive, often noting it surpasses their previous experiences in traditional academic settings. We're also unique in our ability to successfully attract senior academic talent, offering them opportunities to leverage their expertise in ways that many talent interventions, typically focused on early-career researchers, cannot accommodate. Lastly, our track record of 'making good bets' is evidenced by the praise we have praise and endorsement from the community, as well as the success of our alumni, who have gone on to found their own organizations (i.e. Simplex) and secure safety positions in top labs such as Anthropic and OpenAI.

Our Programs

PIBBSS works to catalyze the interdisciplinary maturation of AI Safety through various strategic interventions. We execute this plan by running diverse programs and events, focusing on three key strategies:

Research Interventions: Where we diversify the types of technical research bets pursued by the field through innovative interdisciplinary research.
1. We do this through our Affiliateship
Talent Interventions: Where we diversify the talent pool in the field by training up talent from diverse academic backgrounds to make the most impact on AI Safety.
1. We do this through our Horizon Scanning and Fellowship
Field Building Interventions: Where we diversify the landscape of AI Safety by creating and organizing a variety of events and initiatives aimed at developing and strengthening interdisciplinary connections.
1. We do this through our Field Building Events, Speaker Series and Reading Group

What We Are Doing Differently

We have a focused inside view which we are betting on. Rather than aiming to accelerate agendas which are already present in the field and have people who can lead them, we aim to produce new research leads which are pursuing novel bets. Not only does this diversify the field, it also adds relief to the mentorship shortage.[1]
- (See Appendix B for more detail on how our research taste relates to other actors in the space.)
We are tapping into the significant potential of onboarding established academics with deep knowledge in areas relevant but as-of-yet neglected expertise into AI safety. This is something which has proved to be very advantageous in the field (c.f. Developmental Interpretability, Computational Mechanics) but remains undervalued. We are advised by Alexander Gietelink Oldenziel, who has a strong track record in this type of work, having, among others, scouted Dan Murfet, thereby originating the agenda of Developmental Interpretability.
- We are aiming at a higher caliber talent demographic than other talent interventions in the field. Our distribution of researchers is mid-PhD to Professor. (Also see notes 1 & 2)
- We actively address a common bottleneck in getting senior people from theoretical backgrounds into alignment–that is, their lack of exposure to ML research engineering. We pair them with experienced ML research engineers[2] and introduce them to a standard of best practices for code maintenance that allows for scalable and reproducible research which doesn’t compromise on the speed of empirical iteration.
- We have a solid understanding of the AI Safety landscape backed by years of experience in the field, and are able to quickly get newcomers up to date. We do this in a personalized manner which allows us to quickly refine what would otherwise be misguided ideas from new folks coming into the field.
We are institutionalizing research taste & the diversification of research bets. Due to our close collaboration with technical experts in a wide variety of theoretical domains, we are able to continuously refine our research & talent-scouting portfolio. Having in-house technical management allows us to not only understand the key risks and promises specific to each research thread, but also to make connections between different theoretical domains which would otherwise have gone unnoticed by their respective specialists. This makes our research output greater than the sum of its parts.

Evidence & Track Record

Research highlights

Transformers represent belief state geometry in their residual stream, Shai et al 2024 is the most upvoted research post of 2024 on LessWrong & Alignment Forum and is currently under review for NeurIPS 2024. It shares initial empirical results in support of Computational Mechanics as a fruitful framework for pushing the boundaries of AI interpretability.
Tort Law as a Tool for Mitigating Catastrophic Risk from Artificial Intelligence, Weil 2024. Initial work conducted as part of the 2023 Fellowship. This work has received positive reviews from the AI Safety community, as well as media coverage by Vox.
Can reinforcement learning model learning across development? Online lifelong learning through adaptive intrinsic motivation, Sandbrink et al. 2024. This project started in our 2022 fellowship, published at CogSci 2024
Bayesian learning of social norms, Oldenburg and Zhi-Xuan 2024. This project started as part of 2023 fellowship, published at AAMAS 2024
An information-theoretic study of lying in LLMs, Dombrowski and Corlouer 2024. This project was begun and completed during our first affiliateship cohort earlier this year. It was accepted to the ICML 2024 Workshop LLMs and Cognition.

Talent Building

Facilitating the transition of senior academics into AI Safety Field

Fernando Rosas, is a current affiliate who we are helping transition into the field. He is a recognized leader in the study of multi-scale complex systems with more than 125 peer-reviewed articles in relevant scientific journals including Nature Physics, Nature Neuroscience, and the Proceedings of the National Academy of Science. He is also a Lecturer in Informatics at the University of Sussex, a Research Fellow at Imperial College London and the University of Oxford.
- Testimonial from Fernando: “[...]when I first looked into the AI landscape I only saw groups of engineers trying to brute-force bigger and faster models together with poorly informed hand-wavy discussions about AI policies. Given that scenario, I had no idea of how to start getting into the kind of things I wanted to do. Everything changed after I went to a PIBBSS retreat in March of this year: there I found a warm and committed community of highly skilled individuals doing their best to crack a rigorous and fundamental understanding of how deep learning works and how we can make it safe.”
Adam Shai, experimental and cognitive neuroscientist with a PhD from Caltech and over a decade of experience investigating the neural basis of intelligent behavior, most recently as a researcher at Stanford. Founded Simplex and helped establish computational mechanics as a viable research path in AI Safety. Adam has mentioned that “Coming from a PhD in academia, PIBBSS was probably my best research experience so far!”.
Yevgeny Liokumovich, Assistant Professor in Mathematics, is currently working as a fellow to best apply his area of expertise, Geometric Analysis, to a relevant topic in AI Safety, Deep Learning Generalization. Yevgeny hopes to soon supervise his grad students and postdocs in working on this impactful line of research.
Gabriel Weil, Assistant Professor in Law, established Tort Law as a plausible framework for legal liability for mitigating AI existential risk. This work has received positive reviews from the AI Safety community, as well as media coverage by Vox.

Alumni Employment Success
- Alumni have gone on to work in AI Safety organizations such as UK AISI, Simplex and APART, as well as work on the safety teams of larger labs such as Anthropic and OpenAI.
Very Positive Mentor Feedback
- “20% [they] will clearly surpass everyone else in AI alignment before we all die” Abram Demski on Martín Soto
- “I strongly believe [these] findings are important for the interpretability community” Jan Hendrik Kirchner on Nischal Mainali’s work on A Geometry Viewpoint for Interpretability.

See our website for a short description of our affiliates' profiles.

Field Building

Connected to an increasingly talented candidate pool and have refined our research taste over the years.
Improved our ability to act as a sort of Schelling point for PIBBSS-style research and led various field building efforts with epistemic communities foreign to the alignment sphere.
To date, we have run over 15 research events, retreats or workshops. Some more recent events include: a hybrid Hackathon on Computational Mechanics and AI safety (June 2024), a 5-day workshop bringing together scholars in Physics of Information and AI Safety (March 2024), a workshop titled “Agent Foundations for AI Alignment” (October 2023), as well as quarterly researcher retreats for affiliates & close collaborators.

Funding Ask

Mainline budget ask is USD 3 million. This includes:

18 months funding for operations
7 full-time technical staff on average
Fellowship with 20 fellows
3 reading groups per year
Hiring a Fellowship Lead to run the fellowship next summer (as was done this year)

Upper Bound Scenario USD 5.1 million

24 months of funding for operations
8 full-time technical staff on average
2 years of Fellowship with 20 fellows
5 reading groups per year
Hiring a Fellowship Lead to run the fellowship next summer (as was done this year)

Bare Bones Scenario USD 1.2 million

12 months of funding for operations
4 full-time technical staff on average
10 Fellows in Fellowship
2 reading groups per year

Note: We are happy to have you fund only the programs you are happy to see executed, and you can (for example) fund only affiliates, or only a marginal fellow. Smaller donations, not earmarked, will be used for what we see as the most efficient use of marginal dollars (as of January 2025, we would use it to add more fellows before adding more affiliates or other technical staff).

About PIBBSS & the Team

PIBBSS was founded in 2021 by Nora Ammann and TJ as a research initiative aiming to draw insights & talent from fields studying intelligent behavior in natural systems towards progress on questions in AI risk and safety. Since its inception, PIBBSS has supported ~50 researchers for 3-month full-time fellowships, is currently supporting 6 in-house, long-term research affiliates, and has organized 15+ AI safety research events/workshops. Over the years, we have built a substantive and vibrant research community, spanning many disciplines across academia and industry, both inside and outside of AI safety. (For a brief overview of PIBBSS initiatives other than our Research Affiliate program, see Appendix C.)

Our executive team consists of:

Lucas Teixeira (Research) has an interdisciplinary background in Philosophy, Anthropology, and Computer Science. They act as a research manager and collaborator for the various research threads currently pursued by PIBBSS affiliates. Prior to joining the PIBBSS team, they worked at Conjecture, first as an Applied Epistemologist, where they helped co-run Refine as well as used insights from History and Philosophy of Science to unblock researchers, and then later as a Research Engineer on the Cognitive Emulation agenda.
Dušan D. Nešić (Operations) has lead PIBBSS’ operations since Autumn 2022. He has 10 years of experience running NGOs (including Rotary and EA Serbia) and private companies and scaling them from 0 to 1 and beyond. He has academic teaching experience in Economics and Finance and is pursuing a PhD in Finance with a focus on Financial Institution design under TAI. He serves as a founder and board member of ENAIS and a Trustee of CEEALAR.

Collaborators who we’ve hired for the summer to work on setting the technical directions and priorities for the affiliateship, as well as begin the talent scouting for the affiliateship:

Dmitry Vaintrob: Dmitry is an alignment researcher whose work spans developing a Mathematical Framework for Computation in Superposition and investigations into Grokking. In a previous life, he was a Morrey Visiting Assistant Professor at UC Berkeley with a PhD from MIT, where his interests lay at the intersection of logarithmic algebraic geometry, higher algebra, mirror symmetry, number theory, and Quantum Field Theory.
Lauren Greenspan: Lauren is an independent alignment researcher with a background in high energy physics. She was a participant in Neel Nanda's online MATS training program, where she worked on understanding superposition in attention heads. She received her Ph. D from University of Porto and has previously held positions at NYU as an adjunct professor, academic advisor, and a postdoc researcher. CV here.
Eric Winsor: Eric is an independent AI Safety researcher whose previous work has touched on various aspects of interpretability including interpreting neural networks through the polytope lens, re-examinations of LayerNorm, mapping out statistical regularities in LLM Internals as well as understanding macroscopic universal motifs in the high-level information flow in LLM internals. Eric studied mathematics at the University of Michigan, was a computer science graduate student at Stanford and was previously employed as a research engineer at Conjecture.

Our board provides us with substantive support and research & strategy advice. At present, the board consists of:

Alexander Gietelink Oldenziel is Director of Strategy and Outreach at Timaeus and a PhD student in Theoretical Computer Science at University College London. They have been critical in identifying and bringing into AI Safety academics such as Dan Murphet or Paul Riechers, making them, together with their strong research taste, an excellent advisor to our research scouting efforts.
Ben Goldhaber is the Director of FAR Labs. He’s passionate about diversifying research bets in AI safety and building intellectually generative cultures for high-impact research. He has extensive experience in leading and supporting organizations. He has previously worked in operational and engineering roles at top tech companies and early stage startups.
Gabriel Weil is an Assistant Professor of Law at Touro University, and former PIBBSS fellow. He is doing work at the intersection of Legal Theory and AI safety, such as this paper on the role of Tort Law in mitigating AI Catastrophic Risks. We are excited to be able to draw on their expertise on legal and policy matters, as well as their academic experience and nuanced strategic outlook.
Nora Ammann works as a Technical Specialist for the Safeguarded AI programme at the UK’s Advanced Research & Innovation Agency. She co-founded and directed PIBBSS until Spring 2024 and continues her support as President of the Board. She has pursued various research and field-building efforts in AI safety for the last >6 years. Her research background spans political theory, complex systems and philosophy of science. She is a PhD student in Philosophy and AI and a Foresight Fellow. Her prior experience includes work with the Future of Humanities Institute (University of Oxford), the Alignment of Complex Systems research group, the Epistemic Forecasting project, and the Simon’s Institute for Longterm Governance.
Tan Zhi Xuan is a PhD student at MIT’s Probabilistic Computing Project and Computational Cognitive Science lab, advised by Vikash Mansinghka and Josh Tenenbaum. Xuan has a deep understanding of PIBBSS due to her being a part of the journey from the start. We are excited to draw on her appreciation and breadth of knowledge when it comes to underexplored bets in AI Safety.

For more background on PIBBSS, you may wish to consult:

On our website: pibbss.ai
Announcement post of the Affiliate prototype, January 2024
Other posts on LessWrong with the ‘PIBBSS’-tag

Appendix

A. Strategic picture & Threat models

>> When humans first started building bridges based on a trial and error approach, a lot of those bridges would soon and unexpectedly collapse. A couple of hundred years, and many insights later, civil engineers are able to make highly precise and reliable statements of the type: “This bridge has less than a 1 in a billion chance of collapsing if it’s exposed to weights below 3 tonnes”. In the case of AI, it matters that we get there faster. <<

In terms of threat models, we put significant credence on relatively short timelines (~3-15y) for general, long-horizon superintelligent AI capabilities, and relatively sharp left turns (i.e. level 8 and above in this taxonomy). However, we also take seriously risks scenarios within a ‘smooth left turn’ regime as a result of (multiple) AI transitions, which we expect to unfold over the coming years up until when or if a sharp left turn manifests. The latter scenario brings with it associated structural risks as a part of humanity increasingly delegating more and more of its partial agency towards general AI systems. [3]

A cornerstone of our strategy centers around the acceleration of a science of AI. Progress towards a mature science of AI is critical in both the ‘sharp left turn’ and the ‘smooth left turn’ world, and thus, our strategy has the benefit of being robustly useful across both threat scenarios. The core point: Only with a strong enough theoretical basis – in particular, one that has been empirically and instrumentally validated – can we move from a regime where we build systems we don’t understand and that are likely to cause harm in expected as well as unexpected ways, to a regime where we know how to build systems, from the ground up, such that they reliably depict the properties we want them to have.

Apriori, such a “safe-by-design” system can be built via two main avenues. On one route, we build systems that – in virtue of their top-down design – are constrained to act only in specified-to-be-safe ways. The ‘Guaranteed Safe AI’ family of approaches is a central example of this avenue.

On the second route – which represents our current research focus –, we arrive at similarly rigorous levels of safety assurance ‘from the bottom up’. Ultimately, we develop the sort of scientific understanding of AI systems that gives us the ability to make confident, calibrated, robust, and precise statements about the system’s behavior across training and its contexts of use thanks to the system having been designed or evaluated based on a rigorous theoretical foundation. In other words, the goal is to develop training stories and containment measures capable of giving us justified credence that our engineering artifact will work as intended, reliably, on the first critical try commensurable with what we know from epistemic practices in the physical sciences when they’re at their best.

B. Differentiating Our Research Taste

How does our research taste differentiate from other actors in the AI Safety space? We will provide some – incomplete but hopefully informative – pointers in the direction of answering this question.

Purely Theoretical Research (e.g. (traditionally) MIRI, ARC Theory, Learning Theoretic Agenda, Natural Abstractions, etc...):
- With this camp, we share the concerns that the kinds of safety guarantees which we need for ASI to go well require substantial scientific conceptual innovation and that scientific work without theoretical engagement is likely to fall short.
- However, we tend to diverge from this camp in the value which we see empirical research providing for conceptual innovation, and the possibility of tactfully grounding experiments of current day models in a way which provides us to generalize towards more model agnostic theoretical results.
Prosaic Research (e.g. MechInterp, Activation Engineering, Behavioral Evaluations, Alignment Capabilities (i.e. RLHF, Constitutional AI, ‘Make AI Solve It’))
- We are in agreement that fruitful scientific engagement towards safety can and should be led by empirical engagement with present day models, and we also share the virtue of quick feedback loops of empirical iteration.
- However, there are many aspects of ML culture which many of these agendas are following which are in disagreement with our scientific virtues.
- Behavioral Evaluations
  - Ultimately, black box evaluations are insufficient. There is community consensus that white box evaluations are needed.
- Mech Interp
  - Not enough effort in falsifying the null hypothesis, relying on semantic interpretations and labeling of features is not robust to spurious correlations and there is currently not enough emphasis put on establishing the faithfulness of interpretations.
- Alignment Capabilities (Adversarial Robustness, RLHF, Make AI Solve It, etc…)
  - We expect a significant amount of industry effort to be leveraged towards solving these issues. Unfortunately, we expect this to be the most vulnerable to the epistemic vices of ML culture, notably hill climbing on benchmarks.

Notes

[1] As point in case, one of our affiliates (Adam Shai) is taking on scholars at the next MATS iteration, one of our 2024 fellows (Yevgeny Liokumovich, Assistant Professor) plans to direct his own students towards projects in SLT/Developmental Interpretability after the end of the fellowship.

[2] For example, we were able to significantly improve the quality and throughput of the experimental side of Adam Shai’s work. Lucas brings in relevant experience having worked as a research engineer at Conjecture under supervision of Sid Black (now at AISI), co-founder of EleutherAI and author of Neox-20B which was one of the largest open source LLMs before the ChatGPT revolution and co-author of The Pile which is a standard dataset used in LLM research with nearly 500 citations.

[3] Examples of such scenarios include Robust Agent Agnostic Processes, ascended economy, ‘human enfeeblement,’ epistemic insecurity, the exploitation of societal vulnerabilities, especially of critical cyber-physical infrastructure, and more.

PIBBSS - General Programs funding or specific funding

Donate

Project summary

Our Mission

Our Programs

What We Are Doing Differently

Evidence & Track Record

Research highlights

Talent Building

Field Building

Funding Ask

About PIBBSS & the Team

Appendix

A. Strategic picture & Threat models

B. Differentiating Our Research Taste

Notes

Main points in favor of this grant

Donor's main reservations

Process for deciding amount

Conflicts of interest

PIBBSS - General Programs funding or specific funding

Donate

​​Project summary

Our Mission

Our Programs

What We Are Doing Differently

Evidence & Track Record

Research highlights

Talent Building

Field Building

Funding Ask

About PIBBSS & the Team

Appendix

A. Strategic picture & Threat models

B. Differentiating Our Research Taste

Notes

Main points in favor of this grant

Donor's main reservations

Process for deciding amount

Conflicts of interest

Project summary