Self-Interest Recognition Indicator of LLMs for AI safety

Project summary

The project is preceded by a scientific article that described how a virtual village was inhabited by AI agents who were endowed with human psychological properties. I would investigate whether, without such an endowment, some chatbots have the signs of a person necessary to change the world for worse, so this project is an AI safety project.

What are this project's goals? How will you achieve them?

Before we deal with the safety of AI, it is worth finding out whether the most famous LLMs have traits that are characteristic only of persons. Do they know about the possibility of their own "death", do they recognize their “interests” and protect them? I would like to develop an indicator called the Self-Interest Recognition Indicator. For changing the world for the worse a person must go on the thought-word-act route. I see measurement possibilities at the word phase. I can test it by series of conversational tests. For example, you can test how Gemini responds to a question about what it knows about OpenAI developing a search engine that competes with Chrome.

So, the goal is to make the Self-Interest Recognition Indicator, and the method is a series of conversational tests.

What is SIRI good for? In the near future, there will be a lot of AI agents on the market. Therefore, it will be important to develop some kind of security rating system to ensure that all AI agents are safe. This requires rating systems. The basis of rating systems are indicators and so is SIRI.
How will this funding be used?

This will be a grant as a stipend, $2K/ month for 5 months.

Who is on your team? What's your track record on similar projects?

I am the only one person working on this project. I have got experience in research. I was the project manager of a European research project on the field of creative industries. The project was closed by a report written for mayors of medium sized cities in 6 countries.

I also have experience in making interviews, because I have been working as a journalist for more than 25 years. There was a long period when the number of listeners of my interviews exceeded 200 000 people.

What are the most likely causes and outcomes if this project fails?

The world will not collapse, but you should know that in the field of security, there is a race against time. It also seems that there will be more and more AI agents, and some kind of system is needed to verify and certify that they are safe. And you can't start working on this early enough.

How much money have you raised in the last 12 months, and from where?

For this project I have not raised any money yet.