LW - Questions for labs by Zach Stein-Perlman

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Questions for labs, published by Zach Stein-Perlman on May 1, 2024, on LessWrong.

Associated with AI Lab Watch, I sent questions to some labs a week ago (except I failed to reach Microsoft). I didn't really get any replies (one person replied in their personal capacity; this was very limited and they didn't answer any questions). Here are most of those questions, with slight edits since I shared them with the labs, plus questions I asked multiple labs condensed into the last two sections. Lots of my questions are normal "I didn't find public info on this safety practice and I think you should explain" questions. Some are more like "it's pretty uncool that I can't find the answer to this": breaking commitments, breaking not-quite-commitments without explaining, having ambiguity around commitments, and taking credit for stuff[1] when it's very unclear that you should get credit are pretty uncool.

Anthropic

Internal governance stuff (I'm personally particularly interested in these questions - I think Anthropic has tried to set up great internal governance systems, and maybe it has succeeded, but it needs to share more information for that to be clear from the outside):
- Who is on the board, and what's up with the LTBT?[2] In September, Vox reported "The Long-Term Benefit Trust . . . will elect a fifth member of the board this fall." Did that happen? (If so: who is it? When did this happen? Why haven't I heard about this? If not: did Vox hallucinate this, or did your plans change (and what is the plan)?)
- What are the details on the "milestones" for the LTBT and on how stockholders can change/abrogate the LTBT? Can you at least commit that we'd quickly hear about it if stockholders changed/abrogated the LTBT? (Why hasn't this been published?)
- What formal powers do investors/stockholders have, besides abrogating the LTBT? (Can they replace the two board members who represent them? Can they replace other board members?)
- What does Anthropic owe to its investors/stockholders? (Any fiduciary duty? Any other promises or obligations?) I think balancing their interests with pursuit of the mission; anything more concrete? I'm confused about what such balancing-of-interests entails. Oh well.
- Who holds Anthropic shares, and how much? At least: how much do Google and Amazon hold?

Details of when the RSP triggers evals: "During model training and fine-tuning, Anthropic will conduct an evaluation of its models for next-ASL capabilities both (1) after every 4x jump in effective compute, including if this occurs mid-training, and (2) every 3 months to monitor fine-tuning/tooling/etc improvements."
- Assuming effective compute scales less than 4x per 3 months, the 4x part will never matter, right? (And insofar as AI safety people fixate on the "4x" condition, they are incorrect to do so?) Or do you have different procedures for a 4x-eval vs a 3-month-eval, e.g. the latter using the old model just with new finetuning/prompting/scaffolding/etc.? (See the illustrative sketch at the end of this excerpt.)

Evaluation during deployment?
- I am concerned that improvements in fine-tuning and inference-time enhancements (prompting, scaffolding, etc.) after a model is deployed will lead to dangerous capabilities, especially if models can be updated to increase their capabilities without evals. Do you do the evals during deployment?
- The RSP says "If it becomes apparent that the capabilities of a deployed model have been under-elicited and the model can, in fact, pass the evaluations, then we will" do stuff. How would that become apparent - via the regular evals or ad-hoc just-noticing?
- If you do do evals during deployment: suppose you have two models such that each is better than the other at some tasks (perhaps because a powerful model is deployed and a new model is in progress with a new training setup). Every 3 months, would you do full evals on both models, or what?

Deployment ...
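The arithmetic behind the 4x-vs-3-months question can be made concrete. The sketch below is purely illustrative and is not Anthropic's actual procedure; the monthly growth factors and function names are hypothetical. It just shows that, under exponential growth in effective compute, the 4x trigger only fires before the 3-month calendar check when effective compute grows faster than roughly 4^(1/3) ≈ 1.59x per month.

```python
# Illustrative sketch (not Anthropic's procedure): which RSP eval trigger
# fires first, a 4x jump in effective compute or the 3-month calendar check?
import math

def months_until_4x(monthly_growth_factor: float) -> float:
    """Months for effective compute to grow 4x at a fixed monthly growth factor."""
    return math.log(4) / math.log(monthly_growth_factor)

def binding_trigger(monthly_growth_factor: float, calendar_months: float = 3.0) -> str:
    """Return which trigger fires first under constant exponential growth."""
    return ("4x compute jump"
            if months_until_4x(monthly_growth_factor) < calendar_months
            else "3-month calendar check")

# Hypothetical monthly effective-compute growth factors.
for growth in (1.2, 1.5, 1.7):
    print(f"{growth}x/month -> 4x after {months_until_4x(growth):.1f} months; "
          f"binding trigger: {binding_trigger(growth)}")
```

At 1.2x or 1.5x per month the calendar check always comes first (the 4x condition never binds); only above ~1.59x per month does the 4x condition matter.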