$6070 salary. Sorry.
@alexhb61
Computational Complexity & AI Alignment Independent Researcher
https://github.com/Alexhb61
I was introduced to concerns about AGI alignment via Robert Miles's work back in college around 2018, and it motivated me to take Lise Getoor's Algorithms and Ethics class at UC Santa Cruz.
Now, I'm an independent researcher working on AI Alignment, among other things. My current approach to AI Alignment is to use computational complexity techniques on black boxes. I'm of the opinion that aligning black-box AIs after construction is infeasible.
Alexander Bistagne
9 days ago
The original plan was to write something up formally and get it into a conference. I succeeded at the first part and failed at the second (see other updates).
In total, while I think I was able to say something formally, my results were not especially clear. Without an environment conducive to AI safety research, I cannot in good faith ask for any more money; thus, I am closing this project for the foreseeable future, and this grant is done.
$6000 salary
$0 taxes
$0 conference
Alexander Bistagne
8 months ago
The paper was submitted to and rejected from the Alignment Forum. After reading it with a friend, I noticed serious sequencing issues and unnecessary definitions. I decided I needed a break.
I have since found people willing to give feedback on future drafts, and have joined the Ronin Institute where I might also receive feedback.
I intend to write a less verbose draft with more examples.
This draft will be posted to my github.
I plan on going through at least three rounds of comments, review, and editing before giving a lightning talk at the Ronin Institute, where I might get more feedback. After another drafting phase, I will submit another post to the Alignment Forum. I will need to acquire some mentorship or a co-author before submitting for peer review after that. I will consider the project over after peer review or thorough refutation.
Without more funding, I can only reliably commit to 10 hours a week on this project.
This leg of the project aims to include more examples.
I am looking for more feedback.
I am looking for a co-author or mentor to help with formalization before peer review. Contact me via email if you are interested or have ideas.
If others are interested in giving private examples or feedback, contact me on Discord or email me. Public examples or feedback can be submitted through GitHub issues.
Alexander Bistagne
about 1 year ago
The post is available on LessWrong and has been submitted to the Alignment Forum.
https://www.lesswrong.com/posts/JxhJfqfTJB9dkq72K/alignment-is-hard-an-uncomputable-alignment-problem-1
Alexander Bistagne
about 1 year ago
The project is on GitHub: https://github.com/Alexhb61/Alignment/blob/main/Draft_2.pdf
I will add citations and submit to the Alignment Forum tomorrow.
Alexander Bistagne
over 1 year ago
This project is nearly at its target, but it hit a delay near the beginning of September, as I needed to take up other work to pay bills. Hopefully, I will post the minimal paper soon.
Alexander Bistagne
over 1 year ago
Conditional on the 6k target being reached, I have committed to submitting an edited draft to the Alignment Forum on August 23rd.
Alexander Bistagne
over 1 year ago
Correction: the class is Co-RE, not Co-R. It is the set of problems reducible to the complement of the halting problem.
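For reference, here is the standard characterization of that class (my own restatement, not taken from the paper):

```latex
% Standard definition: HALT is the halting problem, \le_m is many-one reduction.
\mathrm{co\text{-}RE}
  \;=\; \{\, L : \overline{L} \in \mathrm{RE} \,\}
  \;=\; \{\, L : L \le_m \overline{\mathrm{HALT}} \,\}
```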
Alexander Bistagne
over 1 year ago
Technical detail worth mentioning. Here is the main theorem of the 6K project:
Proving that an immutable-code agent with a Turing-complete architecture in a Turing-machine-simulatable environment has nontrivial betrayal-sensitive alignment is CoR-Hard.
The paper would define nontrivial betrayal-sensitive alignment and some constructions on agents needed in the proof.
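To give a flavor of why a claim like this can hold, here is a minimal sketch of the standard many-one reduction from the complement of the halting problem to the question "does this agent ever betray?". It is an illustration under my own assumptions, not the paper's construction: `make_agent`, `halts_within`, the `BETRAY`/`SAFE_ACTION` labels, and the generator-based machine model are all hypothetical stand-ins.

```python
# Hypothetical sketch: reduce the complement of the halting problem to
# "this agent never betrays". Programs are modelled as Python generator
# functions; each next() call is one computation step.
from typing import Callable, Iterator

def halts_within(program: Callable[[], Iterator[None]], budget: int) -> bool:
    """Run `program` for at most `budget` steps and report whether it halted."""
    steps = program()
    for _ in range(budget):
        try:
            next(steps)
        except StopIteration:
            return True
    return False

def make_agent(program: Callable[[], Iterator[None]]) -> Callable[[int], str]:
    """Return an agent policy that acts benignly at step t unless its
    simulation of `program` has halted within t steps; then it betrays."""
    def agent(step: int) -> str:
        return "BETRAY" if halts_within(program, step) else "SAFE_ACTION"
    return agent

# make_agent(program) never betrays  <=>  program never halts,
# i.e. program lies in the complement of the halting problem (Co-RE-complete).
# So a general procedure certifying "never betrays" for Turing-complete agents
# would decide a Co-RE-hard problem.

if __name__ == "__main__":
    def looper() -> Iterator[None]:      # never halts
        while True:
            yield

    def halter() -> Iterator[None]:      # halts after 3 steps
        for _ in range(3):
            yield

    print(make_agent(looper)(1000))                    # SAFE_ACTION
    print([make_agent(halter)(t) for t in range(5)])   # betrays once the simulation halts
```

The point is only that an agent's worst-case behaviour can encode an arbitrary halting question, so certifying "never betrays" over a Turing-complete architecture is at least as hard as the complement of the halting problem.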
Alexander Bistagne
over 1 year ago
Thanks for the encouragement and donation.
The 40K max would be a much larger project than the 6K project, which is what I summarized.
6K would cover editing:
- Argument refuting testing anti-betrayal alignments in Turing-complete architectures
- Argument connecting testing alignment to training alignment in single-agent architectures
40K would additionally cover developing and editing:
- Arguments around anti-betrayal alignments in deterministic or randomized, P- or PSPACE-complete architectures
- Arguments around short-term anti-betrayal alignments
- Arguments connecting do-no-harm alignments to short-term anti-betrayal alignments
- Arguments refuting general solutions to the stop-button problem which transform the utility function in a computable-reals context
- Arguments around general solutions to the stop-button problem with floating-point utility functions
- Foundations for modelling mutable agents or subagents
| For | Date | Type | Amount |
|---|---|---|---|
| Manifund Bank | over 1 year ago | withdraw | 70 |
| Alignment Is Hard | over 1 year ago | project donation | +70 |
| Manifund Bank | over 1 year ago | withdraw | 6000 |
| Alignment Is Hard | over 1 year ago | project donation | +1200 |
| Alignment Is Hard | over 1 year ago | project donation | +1000 |
| Alignment Is Hard | over 1 year ago | project donation | +3800 |