Current LLM agents make heavy use of pass@k to score impressively on hard benchmarks (most prominently RE-Bench and FrontierMath). In practice, many important tasks cannot be solved with pass@k due to lack of ground truth on which proposal is best. The obvious thing to try is pass@k where grading of solutions is done by the LLM itself. For example, run k LLM attempts at setting up a training run within a compute budget, then have the LLM inspect the k proposed codebases and estimate their performance on a set of metrics, then select the one that seems like it will produce the best results.
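As a rough illustration of the selection loop described above, here is a minimal sketch. The `generate` and `judge` callables are hypothetical stand-ins for LLM calls (one that produces a proposal for a task, one that returns an estimated score where higher is better); neither is part of any real API, and the actual experiments may structure this differently.

```python
from typing import Callable, List, Tuple


def select_best_of_k(
    task: str,
    k: int,
    generate: Callable[[str], str],      # hypothetical: one LLM attempt at the task
    judge: Callable[[str, str], float],  # hypothetical: LLM-estimated score, higher is better
) -> Tuple[str, List[float]]:
    """Sample k proposals and return the one the judge scores highest.

    Also returns the judge's scores for all k proposals so they can later
    be compared against ground truth.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    proposals = [generate(task) for _ in range(k)]
    scores = [judge(task, proposal) for proposal in proposals]
    # Ties are broken in favor of the earliest proposal.
    best_index = max(range(k), key=lambda i: scores[i])
    return proposals[best_index], scores
```

The only difference from ordinary pass@k is that `judge` replaces the ground truth checker, so the quality of the final answer depends entirely on how well the judge's scores track real performance.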
This project is useful insofar as better capability elicitation from current frontier models is useful. If this technique does work, we'll know that fact sooner, and will have access to improved LLM agent capabilities on tasks that cannot be solved with pass@k. This includes useful tasks like open-ended software development, and dangerous tasks like risky deception attempts.
I'm not sure how much compute/API funding this project will need. The $1k minimum seems like enough that I'll be able to produce something, but I'll have to think very carefully about how to spend my budget and probably make some significant compromises. The $3k goal seems like enough to produce good results without being too constrained by budget, provided I still spend carefully.
Task setting: I'll find some set of tasks where I'll be able to check the performance of submissions, and where I can impose restrictions on the LLM such that it can't check performance, without too much distortion. For example, I can pose an algorithm optimization problem, and time the solutions myself. I could forbid the LLM agent from running code at all, but that would probably be too distortionary to tell me much about how LLM agents will perform when given the full set of affordances for best capability elicitation. I could give the LLM a very limited compute budget, which might be better.
Infrastructure: I'll set up task environments, LLM task solvers, LLM proposal evaluators, and ground truth proposal evaluators for each of the selected task settings.
Debugging and Capability Elicitation: I'll run some cheap trials, looking for problems with the infrastructure or behaviors from the LLM components that seem like they could be significantly improved with simple alterations to the setup.
Run the Experiments: money go brrr
Analysis and Writeup: How correlated are the LLM proposal evaluations and the ground truth proposal evaluations? How often does the LLM choose the best proposal? What percentage of the best proposal's performance does the LLM-selected proposal achieve on average? Is this an effective capability elicitation method? How does it compare across task settings? Are there any obvious next steps? I'll then publish the results, probably as an arXiv preprint, GitHub repo, and accompanying blog post.
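The first three analysis questions could be computed per task along these lines. This is a sketch under assumptions: both score lists are oriented so that higher is better (a timing-based ground truth would first need converting, for example to a speedup), rank correlation is used as one reasonable choice of "how correlated", and all function names here are illustrative.

```python
from statistics import mean
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for index in order[i:j + 1]:
            result[index] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate_task(judge_scores: Sequence[float], true_scores: Sequence[float]) -> Dict[str, object]:
    """Compare LLM-judge scores with ground truth for the k proposals of one task."""
    chosen = max(range(len(judge_scores)), key=lambda i: judge_scores[i])
    best_true = max(true_scores)
    return {
        "spearman": spearman(judge_scores, true_scores),
        "picked_best": true_scores[chosen] == best_true,
        # Only meaningful when ground-truth scores are positive.
        "fraction_of_best": true_scores[chosen] / best_true if best_true > 0 else float("nan"),
    }
```

Averaging `picked_best` and `fraction_of_best` over tasks gives the selection rate and the average share of best performance; comparing them against the values expected from picking a proposal at random would show whether the judge adds anything.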
Compute/API budget to run the experiments. I'll work on this unpaid on my own time.
I'm working independently.
I've previously led two SPAR projects developing novel LLM evaluations.
I have to run this project past the SEI conflict of interest approval process. If it is not approved, nothing happens and I'll return all money.
I might decide that a different project is a better use of my time, in which case I'll finish proximate deliverables, write up my work so far, share my code and return remaining money.
I might make some mistake when running an expensive experiment (bad experimental design or bad implementation) that leads to the money being spent without producing useful results. If I catch the mistake, I'll probably pay out of pocket, but I might apply for more funding, or as a last resort just write up the results, describe my mistake, and hope someone else will decide to carry on and find funding. If I don't catch the mistake I'll publish wrong/useless results.
I got a 55k grant (45k stipend, 10k compute/API) from the Open Philanthropy Career Development and Transition Fund for six months' independent research, during which I led my two SPAR projects.
I currently work full-time for the AI Security team at CMU SEI CERT, where I make 95k/year.
James Lucassen
2 months ago
Sorry folks, this got shot down by conflict of interest review at work. I'm not sure how to delete an application at this stage, so I'll just leave this here until the deadline passes.
James Lucassen
about 2 months ago
@Austin whatever you did to mark the other project as not proceeding, could you do that with this one too? Thanks o7
Austin Chen
3 months ago
I don't know much about James, nor about pass@k, but I enjoyed reading his retrospective on work at MIRI as well as the other writings on his blog. (I also appreciated that this proposal is written in a straightforward, no-nonsense manner!)
$1k-$3k seems like a very small amount to request, so I'm happy to speculate on getting this to the minimum ask. I would tentatively encourage James to ask for more funding, if there are other experiments that are in the back of his mind.
Austin Chen
3 months ago
Ah, right after I posted this I saw James also put up another project proposal. I have no inside view on which of these is good/better, and my grant is mostly intended to be support for whichever project James thinks is worth spending this compute on. (If anyone with expertise in these subjects wants to weigh in, I'd appreciate that!)
James Lucassen
3 months ago
Hey @Austin! Thanks for the offer / vote of confidence. I'm trying out Manifund per Marius Hobbhahn's recommendation. If you think the best way to use the platform would be a bigger lump grant for multiple distinct projects, happy to do that in the future or combine these two proposals now.
Austin Chen
3 months ago
@jlucassen Having both projects is fine! I just wanted to drop a link to your other proposal, in case somebody out there likes your work in general, but thought that this specific proposal was less exciting than the other.
(A combined lump grant to support James's general research projects is also a format we're happy to see. Splitting them up allows donors more fine-grained choice over which projects to fund -- though that's not always a good thing, as oftentimes the grantees have a better sense of where money should be spent, cf. Paul Graham on "Donate Unrestricted")