What progress have you made since your last update?
Published Tamper-Resistant Safeguards:
https://arxiv.org/abs/2408.00761
https://www.tamper-resistant-safeguards.com/
Published Circuitbreakers:
https://arxiv.org/abs/2406.04313
Published Safetywashing: a meta-analysis of AI safety
https://arxiv.org/pdf/2407.21792
https://www.safetywashing.ai/
Other projects are ongoing. New projects include ones on honesty/deception, on teaching AIs to follow the law, and so on.
What are your next steps?
Publish the superintelligence evals and virology benchmarks.