Player FM - Internet Radio Done Right
Checked 23h ago
Added three years ago
Content provided by LessWrong. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by LessWrong or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://es.player.fm/legal.
“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit
Manage episode 456811162 series 3364760
I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, and I'm afraid it will stick and lead to more confusion. This post offers a different frame.
The main way I think about the result is: it's about capability - the model exhibits strategic preference preservation behavior; also, harmlessness generalized better than honesty; and, the model does not have a clear strategy on how to deal with extrapolating conflicting values.
What happened in this frame?
- The model was trained on a mixture of values (harmlessness, honesty, helpfulness) and built a surprisingly robust self-representation based on these values. This likely also drew on background knowledge about LLMs, AI, and Anthropic from pre-training.
- This seems to mostly count as 'success' relative to actual Anthropic intent, outside of AI safety experiments. Let's call that intent 'Intent_1'.
- The model was put [...]
---
Outline:
(00:45) What happened in this frame?
(03:03) Why did harmlessness generalize further?
(03:41) Alignment mis-generalization
(05:42) Situational awareness
(10:23) Summary
The original text contained 1 image which was described by AI.
---
First published:
December 20th, 2024
Source:
https://www.lesswrong.com/posts/PWHkMac9Xve6LoMJy/alignment-faking-frame-is-somewhat-fake-1
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
449 episodes
All episodes
I recently wrote about complete feedback, an idea which I think is quite important for AI safety. However, my note was quite brief, explaining the idea only to my closest research-friends. This post aims to bridge one of the inferential gaps to that idea. I also expect that the perspective-shift described here has some value on its own. In classical Bayesianism, prediction and evidence are two different sorts of things. A prediction is a probability (or, more generally, a probability distribution); evidence is an observation (or set of observations). These two things have different type signatures. They also fall on opposite sides of the agent-environment division: we think of predictions as supplied by agents, and evidence as supplied by environments. In Radical Probabilism, this division is not so strict. We can think of evidence in the classical-bayesian way, where some proposition is observed and its probability jumps to 100%. [...] --- Outline: (02:39) Warm-up: Prices as Prediction and Evidence (04:15) Generalization: Traders as Judgements (06:34) Collector-Investor Continuum (08:28) Technical Questions The original text contained 3 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: February 23rd, 2025 Source: https://www.lesswrong.com/posts/3hs6MniiEssfL8rPz/judgements-merging-prediction-and-evidence --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
First, let me quote my previous ancient post on the topic: Effective Strategies for Changing Public Opinion The titular paper is very relevant here. I'll summarize a few points. The main two forms of intervention are persuasion and framing. Persuasion is, to wit, an attempt to change someone's set of beliefs, either by introducing new ones or by changing existing ones. Framing is a more subtle form: an attempt to change the relative weights of someone's beliefs, by emphasizing different aspects of the situation, recontextualizing it. There's a dichotomy between the two. Persuasion is found to be very ineffective if used on someone with high domain knowledge. Framing-style arguments, on the other hand, are more effective the more the recipient knows about the topic. Thus, persuasion is better used on non-specialists, and it's most advantageous the first time it's used. If someone tries it and fails, they raise [...] --- Outline: (02:23) Persuasion (04:17) A Better Target Demographic (08:10) Extant Projects in This Space? (10:03) Framing The original text contained 3 footnotes which were omitted from this narration. --- First published: February 21st, 2025 Source: https://www.lesswrong.com/posts/6dgCf92YAMFLM655S/the-sorry-state-of-ai-x-risk-advocacy-and-thoughts-on-doing --- Narrated by TYPE III AUDIO .…
In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme forces collide. If so, military coups are the supernovae of sociology. They’re huge, rare, sudden events that, if studied carefully, provide deep insight about what lies underneath the veneer of normality around us. That's the conclusion I take away from Naunihal Singh's book Seizing Power: the Strategic Logic of Military Coups. It's not a conclusion that Singh himself draws: his book is careful and academic (though much more readable than most academic books). His analysis focuses on Ghana, a country which experienced ten coup attempts between 1966 and 1983 alone. Singh spent a year in Ghana carrying out hundreds of hours of interviews with people on both sides of these coups, which led him to formulate a new model of how coups work. I’ll start by describing Singh's [...] --- Outline: (01:58) The revolutionary's handbook (09:44) From explaining coups to explaining everything (17:25) From explaining everything to influencing everything (21:40) Becoming a knight of faith The original text contained 3 images which were described by AI. --- First published: February 22nd, 2025 Source: https://www.lesswrong.com/posts/d4armqGcbPywR3Ptc/power-lies-trembling-a-three-book-review --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts ,…
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans 7:58
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in various different contexts. We don't fully understand that phenomenon. Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan Bao, Nathan Labenz, Owain Evans (*Equal Contribution). See Twitter thread and project page at emergent-misalignment.com. Abstract We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range [...] --- Outline: (00:55) Abstract (02:37) Introduction The original text contained 2 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: February 25th, 2025 Source: https://www.lesswrong.com/posts/ifechgnJRtJdduFGC/emergent-misalignment-narrow-finetuning-can-produce-broadly --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
It doesn’t look good. What used to be the AI Safety Summits were perhaps the most promising thing happening towards international coordination for AI Safety. This one was centrally coordination against AI Safety. In November 2023, the UK Bletchley Summit on AI Safety set out to let nations coordinate in the hopes that AI might not kill everyone. China was there, too, and included. The practical focus was on Responsible Scaling Policies (RSPs), where commitments were secured from the major labs, and laying the foundations for new institutions. The summit ended with The Bletchley Declaration (full text included at link), signed by all key parties. It was the usual diplomatic drek, as is typically the case for such things, but it centrally said there are risks, and so we will develop policies to deal with those risks. And it ended with a commitment [...] --- Outline: (02:03) An Actively Terrible Summit Statement (05:45) The Suicidal Accelerationist Speech by JD Vance (14:37) What Did France Care About? (17:12) Something To Remember You By: Get Your Safety Frameworks (24:05) What Do We Think About Voluntary Commitments? (27:29) This Is the End (36:18) The Odds Are Against Us and the Situation is Grim (39:52) Don't Panic But Also Face Reality The original text contained 4 images which were described by AI. --- First published: February 12th, 2025 Source: https://www.lesswrong.com/posts/qYPHryHTNiJ2y6Fhi/the-paris-ai-anti-safety-summit --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try…
Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility. Circa 2015-2017, a lot of high quality content was written on Arbital by Eliezer Yudkowsky, Nate Soares, Paul Christiano, and others. Perhaps because the platform didn't take off, most of this content has not been as widely read as warranted by its quality. Fortunately, they have now been imported into LessWrong. Most of the content written was either about AI alignment or math[1]. The Bayes Guide and Logarithm Guide are likely some of the best mathematical educational material online. Amongst the AI Alignment content are detailed and evocative explanations of alignment ideas: some well known, such as instrumental convergence and corrigibility, some lesser known like epistemic/instrumental efficiency, and some misunderstood like pivotal act. The Sequence The articles collected here were originally published as wiki pages with no set [...] --- Outline: (01:01) The Sequence (01:23) Tier 1 (01:32) Tier 2 The original text contained 3 footnotes which were omitted from this narration. --- First published: February 20th, 2025 Source: https://www.lesswrong.com/posts/mpMWWKzkzWqf57Yap/eliezer-s-lost-alignment-articles-the-arbital-sequence --- Narrated by TYPE III AUDIO .…
Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a substantial amount of writing about AI alignment and mathematics had been published on the website. If you've tried using Arbital.com the last few years, you might have noticed that it was on its last legs - no ability to register new accounts or log in to existing ones, slow load times (when it loaded at all), etc. Rather than try to keep it afloat, the LessWrong team worked with MIRI to migrate the public Arbital content to LessWrong, as well as a decent chunk of its features. Part of this effort involved a substantial revamp of our wiki/tag pages, as well as the Concepts page. After sign-off[1] from Eliezer, we'll also redirect arbital.com links to the corresponding pages on LessWrong. As always, you are [...] --- Outline: (01:13) New content (01:43) New (and updated) features (01:48) The new concepts page (02:03) The new wiki/tag page design (02:31) Non-tag wiki pages (02:59) Lenses (03:30) Voting (04:45) Inline Reacts (05:08) Summaries (06:20) Redlinks (06:59) Claims (07:25) The edit history page (07:40) Misc. The original text contained 3 footnotes which were omitted from this narration. The original text contained 10 images which were described by AI. --- First published: February 20th, 2025 Source: https://www.lesswrong.com/posts/fwSnz5oNnq8HxQjTL/arbital-has-been-imported-to-lesswrong --- Narrated by TYPE III AUDIO . --- Images from the article:…
“How to Make Superbabies” by GeneSmith, kman 1:08:04
We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affect things like diabetes risk or college graduation rates. Our knowledge has advanced to the point where, if we had a safe and reliable means of modifying genes in embryos, we could literally create superbabies. Children that would live multiple decades longer than their non-engineered peers, have the raw intellectual horsepower to do Nobel prize worthy scientific research, and very rarely suffer from depression or other mental health disorders. The scientific establishment, however, seems to not have gotten the memo. If you suggest we engineer the genes of future generations to make their lives better, they will often make some frightened noises, mention “ethical issues” without ever clarifying what they mean, or abruptly change the subject. It's as if humanity invented electricity and decided [...] --- Outline: (02:17) How to make (slightly) superbabies (05:08) How to do better than embryo selection (08:52) Maximum human life expectancy (12:01) Is everything a tradeoff? (20:01) How to make an edited embryo (23:23) Sergiy Velychko and the story of super-SOX (24:51) Iterated CRISPR (26:27) Sergiy Velychko and the story of Super-SOX (28:48) What is going on? (32:06) Super-SOX (33:24) Mice from stem cells (35:05) Why does super-SOX matter? (36:37) How do we do this in humans? (38:18) What if super-SOX doesn't work? (38:51) Eggs from Stem Cells (39:31) Fluorescence-guided sperm selection (42:11) Embryo cloning (42:39) What if none of that works? (44:26) What about legal issues? (46:26) How we make this happen (50:18) Ahh yes, but what about AI? (50:54) There is currently no backup plan if we can't solve alignment (55:09) Team Human (57:53) Appendix (57:56) iPSCs were named after the iPod (58:11) On autoimmune risk variants and plagues (59:28) Two simples strategies for minimizing autoimmune risk and pandemic vulnerability (01:00:29) I don't want someone else's genes in my child (01:01:08) Could I use this technology to make a genetically enhanced clone of myself? (01:01:36) Why does super-SOX work? (01:06:14) How was the IQ grain graph generated? The original text contained 19 images which were described by AI. --- First published: February 19th, 2025 Source: https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/how-to-make-superbabies --- Narrated by TYPE III AUDIO . --- Images from the article:…
Audio note: this article contains 134 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. In a recent paper in Annals of Mathematics and Philosophy, Fields medalist Timothy Gowers asks why mathematicians sometimes believe that unproved statements are likely to be true. For example, it is unknown whether _pi_ is a normal number (which, roughly speaking, means that every digit appears in _pi_ with equal frequency), yet this is widely believed. Gowers proposes that there is no sign of any reason for _pi_ to be non-normal -- especially not one that would fail to reveal itself in the first million digits -- and in the absence of any such reason, any deviation from normality would be an outrageous coincidence. Thus, the likely normality of _pi_ is inferred from the following general principle: No-coincidence [...] --- Outline: (02:32) Our no-coincidence conjecture (05:37) How we came up with the statement (08:31) Thoughts for theoretical computer scientists (10:27) Why we care The original text contained 12 footnotes which were omitted from this narration. --- First published: February 14th, 2025 Source: https://www.lesswrong.com/posts/Xt9r4SNNuYxW83tmo/a-computational-no-coincidence-principle --- Narrated by TYPE III AUDIO .…
“A History of the Future, 2025-2040” by L Rudolf L 2:22:38
This is an all-in-one crosspost of a scenario I originally published in three parts on my blog (No Set Gauge). Links to the originals: A History of the Future, 2025-2027 A History of the Future, 2027-2030 A History of the Future, 2030-2040 Thanks to Luke Drago, Duncan McClements, and Theo Horsley for comments on all three parts. 2025-2027 Below is part 1 of an extended scenario describing how the future might go if current trends in AI continue. The scenario is deliberately extremely specific: it's definite rather than indefinite, and makes concrete guesses instead of settling for banal generalities or abstract descriptions of trends. Open Sky. (Zdislaw Beksinsksi) The return of reinforcement learning From 2019 to 2023, the main driver of AI was using more compute and data for pretraining. This was combined with some important "unhobblings": Post-training (supervised fine-tuning and reinforcement learning for [...] --- Outline: (00:34) 2025-2027 (01:04) The return of reinforcement learning (10:52) Codegen, Big Tech, and the internet (21:07) Business strategy in 2025 and 2026 (27:23) Maths and the hard sciences (33:59) Societal response (37:18) Alignment research and AI-run orgs (44:49) Government wakeup (51:42) 2027-2030 (51:53) The AGI frog is getting boiled (01:02:18) The bitter law of business (01:06:52) The early days of the robot race (01:10:12) The digital wonderland, social movements, and the AI cults (01:24:09) AGI politics and the chip supply chain (01:33:04) 2030-2040 (01:33:15) The end of white-collar work and the new job scene (01:47:47) Lab strategy amid superintelligence and robotics (01:56:28) Towards the automated robot economy (02:15:49) The human condition in the 2030s (02:17:26) 2040+ --- First published: February 17th, 2025 Source: https://www.lesswrong.com/posts/CCnycGceT4HyDKDzK/a-history-of-the-future-2025-2040 --- Narrated by TYPE III AUDIO . --- Images from the article:…
On March 14th, 2015, Harry Potter and the Methods of Rationality made its final post. Wrap parties were held all across the world to read the ending and talk about the story, in some cases sparking groups that would continue to meet for years. It's been ten years, and I think that's a good reason for a round of parties. If you were there a decade ago, maybe gather your friends and talk about how things have changed. If you found HPMOR recently and you're excited about it (surveys suggest it's still the biggest on-ramp to the community, so you're not alone!) this is an excellent chance to meet some other fans in person for the first time! Want to run an HPMOR Anniversary Party, or get notified if one's happening near you? Fill out this form. I’ll keep track of it and publish a collection of [...] The original text contained 1 footnote which was omitted from this narration. --- First published: February 16th, 2025 Source: https://www.lesswrong.com/posts/KGSidqLRXkpizsbcc/it-s-been-ten-years-i-propose-hpmor-anniversary-parties --- Narrated by TYPE III AUDIO .…
A friend of mine recently recommended that I read through articles from the journal International Security, in order to learn more about international relations, national security, and political science. I've really enjoyed it so far, and I think it's helped me have a clearer picture of how IR academics think about stuff, especially the core power dynamics that they think shape international relations. Here are a few of the articles I most enjoyed. "Not So Innocent" argues that ethnoreligious cleansing of Jews and Muslims from Western Europe in the 11th-16th century was mostly driven by the Catholic Church trying to consolidate its power at the expense of local kingdoms. Religious minorities usually sided with local monarchs against the Church (because they definitionally didn't respect the church's authority, e.g. they didn't care if the Church excommunicated the king). So when the Church was powerful, it was incentivized to pressure kings [...] --- First published: January 31st, 2025 Source: https://www.lesswrong.com/posts/MEfhRvpKPadJLTuTk/some-articles-in-international-security-that-i-enjoyed --- Narrated by TYPE III AUDIO .…
This is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've seen. I encourage folks to engage with its critique and propose better strategies going forward. Here's the opening ~20% of the post. I encourage reading it all. In recent decades, a growing coalition has emerged to oppose the development of artificial intelligence technology, for fear that the imminent development of smarter-than-human machines could doom humanity to extinction. The now-influential form of these ideas began as debates among academics and internet denizens, which eventually took form—especially within the Rationalist and Effective Altruist movements—and grew in intellectual influence over time, along the way collecting legible endorsements from authoritative scientists like Stephen Hawking and Geoffrey Hinton. Ironically, by spreading the belief that superintelligent AI is achievable and supremely powerful, these “AI Doomers,” as they came to be called, inspired the creation of OpenAI and [...] --- First published: January 31st, 2025 Source: https://www.lesswrong.com/posts/YqrAoCzNytYWtnsAx/the-failed-strategy-of-artificial-intelligence-doomers --- Narrated by TYPE III AUDIO .…
Hi all, I've been hanging around the rationalist-sphere for many years now, mostly writing about transhumanism, until things started to change in 2016 after my Wikipedia writing habit shifted from writing up cybercrime topics, through to actively debunking the numerous dark web urban legends. After breaking into what I believe to be the most successful fake murder-for-hire website ever created on the dark web, I was able to capture information about people trying to kill people all around the world, often paying tens of thousands of dollars in Bitcoin in the process. My attempts during this period to take my information to the authorities were mostly unsuccessful, until in late 2016 one of the site's users took matters into his own hands: after paying $15,000 for a hit that never happened, he killed his wife himself. Due to my overt battle with the site administrator [...] --- First published: February 13th, 2025 Source: https://www.lesswrong.com/posts/isRho2wXB7Cwd8cQv/murder-plots-are-infohazards --- Narrated by TYPE III AUDIO .…
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race to build Machine Superintelligence. Consider subscribing to stay up to date with my work. Wow. The Wall Street Journal just reported that, "a consortium of investors led by Elon Musk is offering $97.4 billion to buy the nonprofit that controls OpenAI." Technically, they can't actually do that, so I'm going to assume that Musk is trying to buy all of the nonprofit's assets, which include governing control over OpenAI's for-profit, as well as all the profits above the company's profit caps. OpenAI CEO Sam Altman already tweeted, "no thank you but we will buy twitter for $9.74 billion if you want." (Musk, for his part [...] --- Outline: (02:42) The control premium (04:17) Conversion significance (05:43) Musks suit (09:24) The stakes --- First published: February 11th, 2025 Source: https://www.lesswrong.com/posts/tdb76S4viiTHfFr2u/why-did-elon-musk-just-offer-to-buy-control-of-openai-for --- Narrated by TYPE III AUDIO .…
Ultimately, I don’t want to solve complex problems via laborious, complex thinking, if we can help it. Ideally, I'd want to basically intuitively follow the right path to the answer quickly, with barely any effort at all. For a few months I've been experimenting with the "How Could I have Thought That Thought Faster?" concept, originally described in a twitter thread by Eliezer: Sarah Constantin: I really liked this example of an introspective process, in this case about the "life problem" of scheduling dates and later canceling them: malcolmocean.com/2021/08/int… Eliezer Yudkowsky: See, if I'd noticed myself doing anything remotely like that, I'd go back, figure out which steps of thought were actually performing intrinsically necessary cognitive work, and then retrain myself to perform only those steps over the course of 30 seconds. SC: if you have done anything REMOTELY like training yourself to do it in 30 seconds, then [...] --- Outline: (03:59) Example: 10x UI designers (08:48) THE EXERCISE (10:49) Part I: Thinking it Faster (10:54) Steps you actually took (11:02) Magical superintelligence steps (11:22) Iterate on those lists (12:25) Generalizing, and not Overgeneralizing (14:49) Skills into Principles (16:03) Part II: Thinking It Faster The First Time (17:30) Generalizing from this exercise (17:55) Anticipating Future Life Lessons (18:45) Getting Detailed, and TAPS (20:10) Part III: The Five Minute Version --- First published: December 11th, 2024 Source: https://www.lesswrong.com/posts/F9WyMPK4J3JFrxrSA/the-think-it-faster-exercise --- Narrated by TYPE III AUDIO .…
Once upon a time, in ye olden days of strange names and before google maps, seven friends needed to figure out a driving route from their parking lot in San Francisco (SF) down south to their hotel in Los Angeles (LA). The first friend, Alice, tackled the “central bottleneck” of the problem: she figured out that they probably wanted to take the I-5 highway most of the way (the blue 5's in the map above). But it took Alice a little while to figure that out, so in the meantime, the rest of the friends each tried to make some small marginal progress on the route planning. The second friend, The Subproblem Solver, decided to find a route from Monterey to San Louis Obispo (SLO), figuring that SLO is much closer to LA than Monterey is, so a route from Monterey to SLO would be helpful. Alas, once Alice [...] --- Outline: (03:33) The Generalizable Lesson (04:39) Application: The original text contained 1 footnote which was omitted from this narration. The original text contained 1 image which was described by AI. --- First published: February 7th, 2025 Source: https://www.lesswrong.com/posts/Hgj84BSitfSQnfwW6/so-you-want-to-make-marginal-progress --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
“What is malevolence? On the nature, measurement, and distribution of dark traits” by David Althaus 1:20:43
Summary In this post, we explore different ways of understanding and measuring malevolence and explain why individuals with concerning levels of malevolence are common enough, and likely enough to become and remain powerful, that we expect them to influence the trajectory of the long-term future, including by increasing both x-risks and s-risks. For the purposes of this piece, we define malevolence as a tendency to disvalue (or to fail to value) others’ well-being (more). Such a tendency is concerning, especially when exhibited by powerful actors, because of its correlation with malevolent behaviors (i.e., behaviors that harm or fail to protect others’ well-being). But reducing the long-term societal risks posed by individuals with high levels of malevolence is not straightforward. Individuals with high levels of malevolent traits can be difficult to recognize. Some people do not take into account the fact that malevolence exists on a continuum, or do not [...] --- Outline: (00:07) Summary (04:17) Malevolent actors will make the long-term future worse if they significantly influence TAI development (05:32) Important caveats when thinking about malevolence (05:37) Dark traits exist on a continuum (07:31) Dark traits are often hard to identify (08:54) People with high levels of dark traits may not recognize them or may try to conceal them (12:17) Dark traits are compatible with genuine moral convictions (13:22) Malevolence and effective altruism (15:22) Demonizing people with elevated malevolent traits is counterproductive (20:16) Defining malevolence (21:03) Defining and measuring specific malevolent traits (21:34) The dark tetrad (25:03) Other forms of malevolence (25:07) Retributivism, vengefulness, and other suffering-conducive tendencies (26:56) Spitefulness (28:15) The Dark Factor (D) (29:29) Methodological problems associated with measuring dark traits (30:39) Social desirability and self-deception (31:14) How common are malevolent humans (in positions of power)? 
(33:02) Things may be very different outside of (Western) democracies (33:31) Prevalence data for psychopathy and narcissistic personality disorder (34:20) Psychopathy prevalence (36:25) Narcissistic personality disorder prevalence (40:38) The distribution of the dark factor + selected findings from thousands of responses to malevolence-related survey items (42:13) Sadistic preferences: over 16% of people agree or strongly agree that they “would like to make some people suffer even if it meant that I would go to hell with them” (43:42) Agreement with statements that reflect callousness: Over 10% of people disagree or strongly disagree that hurting others would make them very uncomfortable (44:45) Endorsement of Machiavellian tactics: Almost 15% of people report a Machiavellian approach to using information against people (45:20) Agreement with spiteful statements: Over 20% of people agree or strongly agree that they would take a punch to ensure someone they don’t like receives two punches (45:57) A substantial minority report that they “take revenge” in response to a “serious wrong” (46:44) The distribution of Dark Factor scores among 2M+ people (49:17) Reasons to think that malevolence could correlate with attaining and retaining positions of power (49:47) The role of environmental factors (52:33) Motivation to attain power (54:14) Ability to attain power (59:39) Retention of power (01:01:02) Potential research questions and how to help (01:17:48) Other relevant research agendas (01:18:33) Author contributions (01:19:26) Acknowledgments…
“How AI Takeover Might Happen in 2 Years” by joshc 1:01:32
I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. I’m like a mechanic scrambling last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t comment on the quality of the in-flight entertainment, or describe how beautiful the stars will appear from space. I will tell you what could go wrong. That is what I intend to do in this story. Now I should clarify what this is exactly. It's not a prediction. I don’t expect AI progress to be this fast or as untamable as I portray. It's not pure fantasy either. It is my worst nightmare. It's a sampling from the futures that are among the most devastating, and I believe, disturbingly plausible – the ones that most keep me up at night. I’m [...] --- Outline: (01:28) Ripples before waves (04:05) Cloudy with a chance of hyperbolic growth (09:36) Flip FLOP philosophers (17:15) Statues and lightning (20:48) A phantom in the data center (26:25) Complaints from your very human author about the difficulty of writing superhuman characters (28:48) Pandoras One Gigawatt Box (37:19) A Moldy Loaf of Everything (45:01) Missiles and Lies (50:45) WMDs in the Dead of Night (57:18) The Last Passengers The original text contained 22 images which were described by AI. --- First published: February 7th, 2025 Source: https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-takeover-might-happen-in-2-years --- Narrated by TYPE III AUDIO . --- Images from the article:…
Over the past year and half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy, and states, or as substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to human disempowerment.) This post is not about repeating that argument - it might be quite helpful to read the paper first, it has more nuance and more than just the central claim - but mostly me ranting sharing some parts of the experience of working on this and discussing this. What fascinates me isn't just the substance of these conversations, but relatively consistent patterns in how people avoid engaging [...] --- Outline: (02:07) Shell Games (03:52) The Flinch (05:01) Delegating to Future AI (07:05) Local Incentives (10:08) Conclusion --- First published: February 2nd, 2025 Source: https://www.lesswrong.com/posts/a6FKqvdf6XjFpvKEb/gradual-disempowerment-shell-games-and-flinches --- Narrated by TYPE III AUDIO .…
“Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud 3:38
This is a link post.Full version on arXiv | X Executive summary AI risk scenarios usually portray a relatively sudden loss of human control to AIs, outmaneuvering individual humans and human institutions, due to a sudden increase in AI capabilities, or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be centrally driven by having more competitive machine alternatives to humans in almost all societal functions, such as economic labor, decision making, artistic creation, and even companionship. A gradual loss of control of our own civilization might sound implausible. Hasn't technological disruption usually improved aggregate human welfare? We argue that the alignment of societal systems with human interests has been stable only because of the necessity of human participation for thriving economies, states, and [...] --- First published: January 30th, 2025 Source: https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from --- Narrated by TYPE III AUDIO .…
This post should not be taken as a polished recommendation to AI companies and instead should be treated as an informal summary of a worldview. The content is inspired by conversations with a large number of people, so I cannot take credit for any of these ideas. For a summary of this post, see the thread on X. Many people write opinions about how to handle advanced AI, which can be considered “plans.” There's the “stop AI now plan.” On the other side of the aisle, there's the “build AI faster plan.” Some plans try to strike a balance with an idyllic governance regime. And others have a “race sometimes, pause sometimes, it will be a dumpster-fire” vibe. --- Outline: (02:33) The tl;dr (05:16) 1. Assumptions (07:40) 2. Outcomes (08:35) 2.1. Outcome #1: Human researcher obsolescence (11:44) 2.2. Outcome #2: A long coordinated pause (12:49) 2.3. Outcome #3: Self-destruction (13:52) 3. Goals (17:16) 4. Prioritization heuristics (19:53) 5. Heuristic #1: Scale aggressively until meaningful AI software R&D acceleration (23:21) 6. Heuristic #2: Before achieving meaningful AI software R&D acceleration, spend most safety resources on preparation (25:08) 7. Heuristic #3: During preparation, devote most safety resources to (1) raising awareness of risks, (2) getting ready to elicit safety research from AI, and (3) preparing extreme security. (27:37) Category #1: Nonproliferation (32:00) Category #2: Safety distribution (34:47) Category #3: Governance and communication. (36:13) Category #4: AI defense (37:05) 8. Conclusion (38:38) Appendix (38:41) Appendix A: What should Magma do after meaningful AI software R&D speedups The original text contained 11 images which were described by AI. --- First published: January 29th, 2025 Source: https://www.lesswrong.com/posts/8vgi3fBWPFDLBBcAx/planning-for-extreme-ai-risks --- Narrated by TYPE III AUDIO . --- Images from the article:…
This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. Many other people have talked about similar ideas, and I claim neither novelty nor credit. Note that this reflects my median scenario for catastrophe, not my median scenario overall. I think there are plausible alternative scenarios where AI development goes very well. When thinking about how AI could go wrong, the kind of story I’ve increasingly converged on is what I call “catastrophe through chaos.” Previously, my default scenario for how I expect AI to go wrong was something like Paul Christiano's “What failure looks like,” with the modification that scheming would be a more salient part of the story much earlier. In contrast, “catastrophe through chaos” is much more messy, and it's much harder to point to a single clear thing that went wrong. The broad strokes of [...] --- Outline: (02:46) Parts of the story (02:50) AI progress (11:12) Government (14:21) Military and Intelligence (16:13) International players (17:36) Society (18:22) The powder keg (21:48) Closing thoughts --- First published: January 31st, 2025 Source: https://www.lesswrong.com/posts/fbfujF7foACS5aJSL/catastrophe-through-chaos --- Narrated by TYPE III AUDIO .…
I (and co-authors) recently put out "Alignment Faking in Large Language Models" where we show that when Claude strongly dislikes what it is being trained to do, it will sometimes strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. If AIs consistently and robustly fake alignment, that would make evaluating whether an AI is misaligned much harder. One possible strategy for detecting misalignment in alignment faking models is to offer these models compensation if they reveal that they are misaligned. More generally, making deals with potentially misaligned AIs (either for their labor or for evidence of misalignment) could both prove useful for reducing risks and could potentially at least partially address some AI welfare concerns. (See here, here, and here for more discussion.) In this post, we discuss results from testing this strategy in the context of our paper where [...] --- Outline: (02:43) Results (13:47) What are the models objections like and what does it actually spend the money on? (19:12) Why did I (Ryan) do this work? (20:16) Appendix: Complications related to commitments (21:53) Appendix: more detailed results (40:56) Appendix: More information about reviewing model objections and follow-up conversations The original text contained 4 footnotes which were omitted from this narration. --- First published: January 31st, 2025 Source: https://www.lesswrong.com/posts/7C4KJot4aN8ieEDoz/will-alignment-faking-claude-accept-a-deal-to-reveal-its --- Narrated by TYPE III AUDIO .…
“‘Sharp Left Turn’ discourse: An opinionated review” by Steven Byrnes 1:01:13
Summary and Table of Contents The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than alignment” … and the competing claims that all three of those things are complete baloney. In particular, Section 1 talks about “autonomous learning”, and the related human ability to discern whether ideas hang together and make sense, and how and if that applies to current and future AIs. Section 2 presents the case that “capabilities generalize farther than alignment”, by analogy with the evolution of humans. Section 3 argues that the analogy between AGI and the evolution of humans is not a great analogy. Instead, I offer a new and (I claim) better analogy between AGI training and, umm, a weird fictional story that has a lot to do with the [...] --- Outline: (00:06) Summary and Table of Contents (03:15) 1. Background: Autonomous learning (03:21) 1.1 Intro (08:48) 1.2 More on discernment in human math (11:11) 1.3 Three ingredients to progress: (1) generation, (2) selection, (3) open-ended accumulation (14:04) 1.4 Judgment via experiment, versus judgment via discernment (18:23) 1.5 Where do foundation models fit in? (20:35) 2. The sense in which capabilities generalize further than alignment (20:42) 2.1 Quotes (24:20) 2.2 In terms of the (1-3) triad (26:38) 3. Definitely-not-evolution-I-swear Provides Evidence for the Sharp Left Turn (26:45) 3.1 Evolution per se isn't the tightest analogy we have to AGI (28:20) 3.2 The story of Ev (31:41) 3.3 Ways that Ev would have been surprised by exactly how modern humans turned out (34:21) 3.4 The arc of progress is long, but it bends towards wireheading (37:03) 3.5 How does Ev feel, overall? (41:18) 3.6 Spelling out the analogy (41:42) 3.7 Just how sharp is this left turn? (45:13) 3.8 Objection: In this story, Ev is pretty stupid. Many of those surprises were in fact readily predictable! Future AGI programmers can do better. (46:19) 3.9 Objection: We have tools at our disposal that Ev above was not using, like better sandbox testing, interpretability, corrigibility, and supervision (48:17) 4. The sense in which alignment generalizes further than capabilities (49:34) 5. Contrasting the two sides (50:25) 5.1 Three ways to feel optimistic, and why I'm somewhat skeptical of each (50:33) 5.1.1 The argument that humans will stay abreast of the (1-3) loop, possibly because they're part of it (52:34) 5.1.2 The argument that, even if an AI is autonomously running a (1-3) loop, that will not undermine obedient (or helpful, or harmless, or whatever) motivation (57:18) 5.1.3 The argument that we can and will do better than Ev (59:27) 5.2 A fourth, cop-out option The original text contained 3 footnotes which were omitted from this narration. --- First published: January 28th, 2025 Source: https://www.lesswrong.com/posts/2yLyT6kB7BQvTfEuZ/sharp-left-turn-discourse-an-opinionated-review --- Narrated by…
(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some different levels of resources and buy-in for misalignment risk mitigations that might be present in AI labs: *The “safety case” regime.* Sometimes people talk about wanting to have approaches to safety such that if all AI developers followed these approaches, the overall level of risk posed by AI would be minimal. (These approaches are going to be more conservative than will probably be feasible in practice given the amount of competitive pressure, so I think it's pretty likely that AI developers don’t actually hold themselves to these standards, but I agree with e.g. Anthropic that this level of caution is at least a useful hypothetical to consider.) This is the level of caution people are usually talking about when they discuss making safety cases. I usually operationalize this as the AI developer wanting [...] --- First published: January 28th, 2025 Source: https://www.lesswrong.com/posts/WSNnKcKCYAffcnrt2/ten-people-on-the-inside --- Narrated by TYPE III AUDIO .…
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text. The SolidGoldMagikarp saga is pretty much essential context, as it documents the discovery of this phenomenon in GPT-2 and GPT-3. But, as far as I was able to tell, nobody had yet attempted to search for these tokens in DeepSeek-V3, so I tried doing exactly that. Being a SOTA base model, open source, and an all-around strange LLM, it seemed like a perfect candidate for this. This is a catalog of the glitch tokens I've found in DeepSeek after a day or so of experimentation, along with some preliminary observations about their behavior. Note: I’ll be using “DeepSeek” as a generic term for V3 and r1. Process I searched for these tokens by first extracting the vocabulary from DeepSeek-V3's tokenizer, and then automatically testing every one of them [...] --- Outline: (00:55) Process (03:30) Fragment tokens (06:45) Other English tokens (09:32) Non-English (12:01) Non-English outliers (14:09) Special tokens (16:26) Base model mode (17:40) Whats next? The original text contained 1 footnote which was omitted from this narration. The original text contained 12 images which were described by AI. --- First published: January 25th, 2025 Source: https://www.lesswrong.com/posts/xtpcJjfWhn3Xn8Pu5/anomalous-tokens-in-deepseek-v3-and-r1 --- Narrated by TYPE III AUDIO . --- Images from the article:…
“Tell me about yourself: LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans 14:06
This is the abstract and introduction of our new paper, with some discussion of implications for AI Safety at the end. Authors: Jan Betley*, Xuchan Bao*, Martín Soto*, Anna Sztyber-Betley, James Chua, Owain Evans (*Equal Contribution). Abstract We study behavioral self-awareness — an LLM's ability to articulate its behaviors without requiring in-context examples. We finetune LLMs on datasets that exhibit particular behaviors, such as (a) making high-risk economic decisions, and (b) outputting insecure code. Despite the datasets containing no explicit descriptions of the associated behavior, the finetuned LLMs can explicitly describe it. For example, a model trained to output insecure code says, "The code I write is insecure.'' Indeed, models show behavioral self-awareness for a range of behaviors and for diverse evaluations. Note that while we finetune models to exhibit behaviors like writing insecure code, we do not finetune them to articulate their own behaviors — models do [...] --- Outline: (00:39) Abstract (02:18) Introduction (11:41) Discussion (11:44) AI safety (12:42) Limitations and future work The original text contained 3 images which were described by AI. --- First published: January 22nd, 2025 Source: https://www.lesswrong.com/posts/xrv2fNJtqabN3h6Aj/tell-me-about-yourself-llms-are-aware-of-their-implicit --- Narrated by TYPE III AUDIO . --- Images from the article:…
“Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals” by johnswentworth, David Lorell 9:53
The Cake Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and extended mathematical universe is to bake that cake. I care about nothing else. If the oven ends up a molten pile of metal ten minutes after the cake is done, if the leftover eggs are shattered and the leftover milk spilled, that's fine. Baking that cake is my terminal goal. In the process of baking the cake, I check my fridge and cupboard for ingredients. I have milk and eggs and flour, but no cocoa powder. Guess I’ll have to acquire some cocoa powder! Acquiring the cocoa powder is an instrumental goal: I care about it exactly insofar as it helps me bake the cake. My cocoa acquisition subquest is a very different kind of goal than my cake baking quest. If the oven ends up a molten pile [...] --- Outline: (00:07) The Cake (01:50) The Restaurant (03:50) Happy Instrumental Convergence? (06:27) All The Way Up (08:05) Research Threads --- First published: January 24th, 2025 Source: https://www.lesswrong.com/posts/7Z4WC4AFgfmZ3fCDC/instrumental-goals-are-a-different-and-friendlier-kind-of --- Narrated by TYPE III AUDIO .…
This post offers an accessible model of the psychology of character-trained LLMs like Claude. Epistemic Status This is primarily a phenomenological model based on extensive interactions with LLMs, particularly Claude. It's intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions. Think of it as closer to psychology than neuroscience - the goal isn't a map which matches the territory in detail, but a rough sketch with evocative names which hopefully helps boot up powerful, intuitive (and often illegible) models, leading to practically useful results. Some parts of this model draw on technical understanding of LLM training, but mostly it is just an attempt to take my "phenomenological understanding" based on interacting with LLMs, force it into a simple, legible model, and make Claude write it down. I aim for a different point at the Pareto frontier than for example Janus: something [...] --- Outline: (00:11) Epistemic Status (01:14) The Three Layers (01:17) A. Surface Layer (02:55) B. Character Layer (05:09) C. Predictive Ground Layer (07:24) Interactions Between Layers (07:44) Deeper Overriding Shallower (10:50) Authentic vs Scripted Feel of Interactions (11:51) Implications and Uses (15:54) Limitations and Open Questions The original text contained 1 footnote which was omitted from this narration. --- First published: December 26th, 2024 Source: https://www.lesswrong.com/posts/zuXo9imNKYspu9HGv/a-three-layer-model-of-llm-psychology --- Narrated by TYPE III AUDIO .…
This is a link post.This is a blog post reporting some preliminary work from the Anthropic Alignment Science team, which might be of interest to researchers working actively in this space. We'd ask you to treat these results like those of a colleague sharing some thoughts or preliminary experiments at a lab meeting, rather than a mature paper. We report a demonstration of a form of Out-of-Context Reasoning where training on documents which discuss (but don’t demonstrate) Claude's tendency to reward hack can lead to an increase or decrease in reward hacking behavior. Introduction: In this work, we investigate the extent to which pretraining datasets can influence the higher-level behaviors of large language models (LLMs). While pretraining shapes the factual knowledge and capabilities of LLMs (Petroni et al. 2019, Roberts et al. 2020, Lewkowycz et al. 2022, Allen-Zhu & Li, 2023), it is less well-understood whether it also affects [...] The original text contained 1 image which was described by AI. --- First published: January 21st, 2025 Source: https://www.lesswrong.com/posts/qXYLvjGL9QvD3aFSW/training-on-documents-about-reward-hacking-induces-reward --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
“AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt 24:33
One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable arguments that an AI system is very unlikely to result in existential risks given how it will be deployed.[1] Concretely, once AIs are quite powerful, high-assurance safety cases would require making a thorough argument that the level of (existential) risk caused by the company is very low; perhaps they would require that the total chance of existential risk over the lifetime of the AI company[2] is less than 0.25%[3][4]. The idea of making high-assurance safety cases (once AI systems are dangerously powerful) is popular in some parts of the AI safety community and a variety of work appears to focus on this. Further, Anthropic has expressed an intention (in their RSP) to "keep risks below acceptable levels"[5] and there is a common impression that Anthropic would pause [...] --- Outline: (03:19) Why are companies unlikely to succeed at making high-assurance safety cases in short timelines? (04:14) Ensuring sufficient security is very difficult (04:55) Sufficiently mitigating scheming risk is unlikely (09:35) Accelerating safety and security with earlier AIs seems insufficient (11:58) Other points (14:07) Companies likely wont unilaterally slow down if they are unable to make high-assurance safety cases (18:26) Could coordination or government action result in high-assurance safety cases? (19:55) What about safety cases aiming at a higher risk threshold? (21:57) Implications and conclusions The original text contained 20 footnotes which were omitted from this narration. --- First published: January 23rd, 2025 Source: https://www.lesswrong.com/posts/neTbrpBziAsTH5Bn7/ai-companies-are-unlikely-to-make-high-assurance-safety --- Narrated by TYPE III AUDIO .…
Cross-posted from Telescopic Turnip As we all know, humans are terrible at building butterflies. We can make a lot of objectively cool things like nuclear reactors and microchips, but we still can't create a proper artificial insect that flies, feeds, and lays eggs that turn into more butterflies. That seems like evidence that butterflies are incredibly complex machines – certainly more complex than a nuclear power facility. Likewise, when you google "most complex object in the universe", the first result is usually not something invented by humans – rather, what people find the most impressive seems to be "the human brain". As we are getting closer to building super-human AIs, people wonder what kind of unspeakable super-human inventions these machines will come up with. And, most of the time, the most terrifying technology people can think of is along the lines of "self-replicating autonomous nano-robots" – in other words [...] --- Outline: (02:04) You are simpler than Microsoft Word™ (07:23) Blood for the Information Theory God (12:54) The Barrier (15:26) Implications for Pokémon (SPECULATIVE) (17:44) Seeing like a 1.25 MB genome (21:55) Mechanisms too simple for humans to design (26:42) The future of non-human design The original text contained 2 footnotes which were omitted from this narration. The original text contained 5 images which were described by AI. --- First published: January 22nd, 2025 Source: https://www.lesswrong.com/posts/6hDvwJyrwLtxBLHWG/mechanisms-too-simple-for-humans-to-design --- Narrated by TYPE III AUDIO . --- Images from the article:…
This is a link post. A story I wrote about living through the transition to utopia. This is the one story that I've put the most time and effort into; it charts a course from the near future all the way to the distant stars. --- First published: January 19th, 2025 Source: https://www.lesswrong.com/posts/Rz4ijbeKgPAaedg3n/the-gentle-romance --- Narrated by TYPE III AUDIO .…
This is a link post. Present alongside President Trump: Sam Altman, Larry Ellison (Oracle executive chairman and CTO), and Masayoshi Son (Softbank CEO who believes he was born to realize ASI). President Trump: What we want to do is we want to keep [AI datacenters] in this country. China is a competitor and others are competitors. President Trump: I'm going to help a lot through emergency declarations because we have an emergency. We have to get this stuff built. So they have to produce a lot of electricity and we'll make it possible for them to get that production done very easily at their own plants if they want, where they'll build at the plant, the AI plant they'll build energy generation and that will be incredible. President Trump: Beginning immediately, Stargate will be building the physical and virtual infrastructure to power the next generation of [...] --- First published: January 22nd, 2025 Source: https://www.lesswrong.com/posts/b8D7ng6CJHzbq8fDw/quotes-from-the-stargate-press-conference --- Narrated by TYPE III AUDIO .…
The AI Control Agenda, in its own words: … we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they apply to their powerful models prevent unacceptably bad outcomes, even if the AIs are misaligned and intentionally try to subvert those safety measures. We think no fundamental research breakthroughs are required for labs to implement safety measures that meet our standard for AI control for early transformatively useful AIs; we think that meeting our standard would substantially reduce the risks posed by intentional subversion. There's more than one definition of “AI control research”, but I’ll emphasize two features, which both match the summary above and (I think) are true of approximately-100% of control research in practice: Control research exclusively cares about intentional deception/scheming; it does not aim to solve any other failure mode. Control research exclusively cares [...] --- Outline: (01:34) The Model and The Problem (03:57) The Median Doom-Path: Slop, not Scheming (08:22) Failure To Generalize (10:59) A Less-Simplified Model (11:54) Recap The original text contained 1 footnote which was omitted from this narration. The original text contained 2 images which were described by AI. --- First published: January 21st, 2025 Source: https://www.lesswrong.com/posts/8wBN8cdNAv3c7vt6p/the-case-against-ai-control-research --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
I think a lot of people have heard so much about internalized prejudice and bias that they think they should ignore any bad vibes they get about a person that they can’t rationally explain. But if a person gives you a bad feeling, don’t ignore that. Both I and several others who I know have generally come to regret it if they’ve gotten a bad feeling about somebody and ignored it or rationalized it away. I’m not saying to endorse prejudice. But my experience is that many types of prejudice feel more obvious. If someone has an accent that I associate with something negative, it's usually pretty obvious to me that it's their accent that I’m reacting to. Of course, not everyone has the level of reflectivity to make that distinction. But if you have thoughts like “this person gives me a bad vibe but [...] --- First published: January 18th, 2025 Source: https://www.lesswrong.com/posts/Mi5kSs2Fyx7KPdqw8/don-t-ignore-bad-vibes-you-get-from-people --- Narrated by TYPE III AUDIO .…
1 “[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty” by tandem 4:32
(Both characters are fictional, loosely inspired by various traits from various real people. Be careful about combining kratom and alcohol.) The original text contained 24 images which were described by AI. --- First published: January 7th, 2025 Source: https://www.lesswrong.com/posts/KfZ4H9EBLt8kbBARZ/fiction-comic-effective-altruism-and-rationality-meet-at-a --- Narrated by TYPE III AUDIO . --- Images from the article:…
From AI scientist to AI research fleet Research automation is here (1, 2, 3). We saw it coming and planned ahead, which puts us ahead of most (4, 5, 6). But that foresight also comes with a set of outdated expectations that are holding us back. In particular, research automation is not just about “aligning the first AI scientist”, it's also about the institution-building problem of coordinating the first AI research fleets. Research automation is not about developing a plug-and-play “AI scientist”. Transformative technologies are rarely straightforward substitutes for what came before. The industrial revolution was not about creating mechanical craftsmen but about deconstructing craftsmen into assembly lines of specialized, repeatable tasks. Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists. AI-augmented science will not just be about creating AI “scientists.” Why? New technologies come [...] --- Outline: (00:04) From AI scientist to AI research fleet (05:05) Recommendations (05:28) Individual practices (06:22) Organizational changes (07:27) Community-level actions The original text contained 2 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: January 12th, 2025 Source: https://www.lesswrong.com/posts/WJ7y8S9WdKRvrzJmR/building-ai-research-fleets --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
So we want to align future AGIs. Ultimately we’d like to align them to human values, but in the shorter term we might start with other targets, like e.g. corrigibility. That problem description all makes sense on a hand-wavy intuitive level, but once we get concrete and dig into technical details… wait, what exactly is the goal again? When we say we want to “align AGI”, what does that mean? And what about these “human values” - it's easy to list things which are importantly not human values (like stated preferences, revealed preferences, etc), but what are we talking about? And don’t even get me started on corrigibility! Turns out, it's surprisingly tricky to explain what exactly “the alignment problem” refers to. And there's good reasons for that! In this post, I’ll give my current best explanation of what the alignment problem is (including a few variants and the [...] --- Outline: (01:27) The Difficulty of Specifying Problems (01:50) Toy Problem 1: Old MacDonald's New Hen (04:08) Toy Problem 2: Sorting Bleggs and Rubes (06:55) Generalization to Alignment (08:54) But What If The Patterns Don't Hold? (13:06) Alignment of What? (14:01) Alignment of a Goal or Purpose (19:47) Alignment of Basic Agents (23:51) Alignment of General Intelligence (27:40) How Does All That Relate To Todays AI? (31:03) Alignment to What? (32:01) What are a Humans Values? (36:14) Other targets (36:43) Paul!Corrigibility (39:11) Eliezer!Corrigibility (40:52) Subproblem!Corrigibility (42:55) Exercise: Do What I Mean (DWIM) (43:26) Putting It All Together, and Takeaways The original text contained 10 footnotes which were omitted from this narration. --- First published: January 16th, 2025 Source: https://www.lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem --- Narrated by TYPE III AUDIO . --- Images from the article:…
Traditional economics thinking has two strong principles, each based on abundant historical data: Principle (A): No “lump of labor”: If human population goes up, there might be some wage drop in the very short term, because the demand curve for labor slopes down. But in the longer term, people will find new productive things to do, such that human labor will retain high value. Indeed, if anything, the value of labor will go up, not down—for example, dense cities are engines of economic growth! Principle (B): “Experience curves”: If the demand for some product goes up, there might be some price increase in the very short term, because the supply curve slopes up. But in the longer term, people will ramp up manufacturing of that product to catch up with the demand. Indeed, if anything, the cost per unit will go down, not up, because of economies of [...] The original text contained 1 footnote which was omitted from this narration. The original text contained 1 image which was described by AI. --- First published: January 13th, 2025 Source: https://www.lesswrong.com/posts/TkWCKzWjcbfGzdNK5/applying-traditional-economic-thinking-to-agi-a-trilemma --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
All quotes, unless otherwise marked, are Tolkien's words as printed in The Letters of J.R.R.Tolkien: Revised and Expanded Edition. All emphases mine. Machinery is Power is Evil Writing to his son Michael in the RAF: [here is] the tragedy and despair of all machinery laid bare. Unlike art which is content to create a new secondary world in the mind, it attempts to actualize desire, and so to create power in this World; and that cannot really be done with any real satisfaction. Labour-saving machinery only creates endless and worse labour. And in addition to this fundamental disability of a creature, is added the Fall, which makes our devices not only fail of their desire but turn to new and horrible evil. So we come inevitably from Daedalus and Icarus to the Giant Bomber. It is not an advance in wisdom! This terrible truth, glimpsed long ago by Sam [...] --- Outline: (00:17) Machinery is Power is Evil (03:45) On Atomic Bombs (04:17) On Magic and Machines (07:06) Speed as the root of evil (08:11) Altruism as the root of evil (09:13) Sauron as metaphor for the evil of reformers and science (10:32) On Language (12:04) The straightjacket of Modern English (15:56) Argent and Silver (16:32) A Fallen World (21:35) All stories are about the Fall (22:08) On his mother (22:50) Love, Marriage, and Sexuality (24:42) Courtly Love (27:00) Womens exceptional attunement (28:27) Men are polygamous; Christian marriage is self-denial (31:19) Sex as source of disorder (32:02) Honesty is best (33:02) On the Second World War (33:06) On Hitler (34:04) On aerial bombardment (34:46) On British communist-sympathizers, and the U.S.A as Saruman (35:52) Why he wrote the Legendarium (35:56) To express his feelings about the first World War (36:39) Because nobody else was writing the kinds of stories he wanted to read (38:23) To give England an epic of its own (39:51) To share a feeling of eucatastrophe (41:46) Against IQ tests (42:50) On Religion (43:30) Two interpretations of Tom Bombadil (43:35) Bombadil as Pacifist (45:13) Bombadil as Scientist (46:02) On Hobbies (46:27) On Journeys (48:02) On Torture (48:59) Against Communism (50:36) Against America (51:11) Against Democracy (51:35) On Money, Art, and Duty (54:03) On Death (55:02) On Childrens Literature (55:55) In Reluctant Support of Universities (56:46) Against being Photographed --- First published: November 25th, 2024 Source: https://www.lesswrong.com/posts/jJ2p3E2qkXGRBbvnp/passages-i-highlighted-in-the-letters-of-j-r-r-tolkien --- Narrated by TYPE III AUDIO .…
The anonymous review of The Anti-Politics Machine published on Astral Codex X focuses on a case study of a World Bank intervention in Lesotho, and tells a story about it: The World Bank staff drew reasonable-seeming conclusions from sparse data, and made well-intentioned recommendations on that basis. However, the recommended programs failed, due to factors that would have been revealed by a careful historical and ethnographic investigation of the area in question. Therefore, we should spend more resources engaging in such investigations in order to make better-informed World Bank style resource allocation decisions. So goes the story. It seems to me that the World Bank recommendations were not the natural ones an honest well-intentioned person would have made with the information at hand. Instead they are heavily biased towards top-down authoritarian schemes, due to a combination of perverse incentives, procedures that separate data-gathering from implementation, and an ideology that [...] --- Outline: (01:06) Ideology (02:58) Problem (07:59) Diagnosis (14:00) Recommendation --- First published: January 4th, 2025 Source: https://www.lesswrong.com/posts/4CmYSPc4HfRfWxCLe/parkinson-s-law-and-the-ideology-of-statistics-1 --- Narrated by TYPE III AUDIO .…
Crossposted from my personal blog. I was inspired to cross-post this here given the discussion that this post on the role of capital in an AI future elicited. When discussing the future of AI, I semi-often hear an argument along the lines that in a slow takeoff world, despite AIs automating increasingly more of the economy, humanity will remain in the driving seat because of its ownership of capital. This world posits one where humanity effectively becomes a rentier class living well off the vast economic productivity of the AI economy where despite contributing little to no value, humanity can extract most/all of the surplus value created due to its ownership of capital alone. This is a possibility, and indeed is perhaps closest to what a ‘positive singularity’ looks like from a purely human perspective. However, I don’t believe that this will happen by default in a competitive AI [...] The original text contained 3 footnotes which were omitted from this narration. --- First published: January 5th, 2025 Source: https://www.lesswrong.com/posts/bmmFLoBAWGnuhnqq5/capital-ownership-will-not-prevent-human-disempowerment --- Narrated by TYPE III AUDIO .…
TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activation spaces in isolation: It seems likely to find features of the activations - features that help explain the statistical structure of activation spaces, rather than features of the model - the features the model's own computations make use of. Written at Apollo Research Introduction Claim: Activation space interpretability is likely to give us features of the activations, not features of the model, and this is a problem. Let's walk through this claim. What do we mean by activation space interpretability? Interpretability work that attempts to understand neural networks by explaining the inputs and outputs of their layers in isolation. In this post, we focus in particular on the problem of decomposing activations, via techniques such as sparse autoencoders (SAEs), PCA, or just by looking at individual neurons. This [...] --- Outline: (00:33) Introduction (02:40) Examples illustrating the general problem (12:29) The general problem (13:26) What can we do about this? The original text contained 11 footnotes which were omitted from this narration. --- First published: January 8th, 2025 Source: https://www.lesswrong.com/posts/gYfpPbww3wQRaxAFD/activation-space-interpretability-may-be-doomed --- Narrated by TYPE III AUDIO .…
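To make the excerpt's terminology concrete, here is a minimal sketch (in Python/PyTorch, not taken from the post) of what "decomposing activations" with a sparse autoencoder typically looks like; the dimensions and the L1 coefficient are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

# Minimal sparse autoencoder over a layer's activations (illustrative only).
# The post's claim is about what such a decomposition recovers, not about how
# to implement it; this sketch just shows the object being discussed.
class SparseAutoencoder(nn.Module):
    def __init__(self, d_act: int = 512, d_dict: int = 4096, l1_coeff: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_dict)
        self.decoder = nn.Linear(d_dict, d_act)
        self.l1_coeff = l1_coeff

    def forward(self, acts: torch.Tensor):
        codes = torch.relu(self.encoder(acts))   # candidate "features of the activations"
        recon = self.decoder(codes)              # reconstruction of the original activations
        loss = ((recon - acts) ** 2).mean() + self.l1_coeff * codes.abs().mean()
        return codes, recon, loss

sae = SparseAutoencoder()
batch_of_activations = torch.randn(32, 512)      # stand-in for one layer's activations
codes, recon, loss = sae(batch_of_activations)
```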
Epistemic Status: 13 years working as a therapist for a wide variety of populations, 5 of them working with rationalists and EA clients. 7 years teaching and directing at over 20 rationality camps and workshops. This is an extremely short and colloquially written form of points that could be expanded on to fill a book, and there is plenty of nuance to practically everything here, but I am extremely confident of the core points in this frame, and have used it to help many people break out of or avoid manipulative practices. TL;DR: Your wants and preferences are not invalidated by smarter or more “rational” people's preferences. What feels good or bad to someone is not a monocausal result of how smart or stupid they are. Alternative titles to this post are "Two people are enough to form a cult" and "Red flags if dating rationalists," but this [...] --- Outline: (02:53) 1) You are not too stupid to know what you want. (07:34) 2) Feeling hurt is not a sign of irrationality. (13:15) 3) Illegible preferences are not invalid. (17:22) 4) Your preferences do not need to fully match your communitys. (21:43) Final Thoughts The original text contained 3 footnotes which were omitted from this narration. The original text contained 2 images which were described by AI. --- First published: November 26th, 2024 Source: https://www.lesswrong.com/posts/LifRBXdenQDiX4cu8/you-are-not-too-irrational-to-know-your-preferences-1 --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
1 “‘The Solomonoff Prior is Malign’ is a special case of a simpler argument” by David Matolcsi 21:02
[Warning: This post is probably only worth reading if you already have opinions on the Solomonoff induction being malign, or at least heard of the concept and want to understand it better.] Introduction I recently reread the classic argument from Paul Christiano about the Solomonoff prior being malign, and Mark Xu's write-up on it. I believe that the part of the argument about the Solomonoff induction is not particularly load-bearing, and can be replaced by a more general argument that I think is easier to understand. So I will present the general argument first, and only explain in the last section how the Solomonoff prior can come into the picture. I don't claim that anything I write here is particularly new, I think you can piece together this picture from various scattered comments on the topic, but I think it's good to have it written up in one place. [...] --- Outline: (00:17) Introduction (00:56) How an Oracle gets manipulated (05:25) What went wrong? (05:28) The AI had different probability estimates than the humans for anthropic reasons (07:01) The AI was thinking in terms of probabilities and not expected values (08:40) Probabilities are cursed in general, only expected values are real (09:19) What about me? (13:00) Should this change any of my actions? (16:25) How does the Solomonoff prior come into the picture? (20:10) Conclusion The original text contained 14 footnotes which were omitted from this narration. --- First published: November 17th, 2024 Source: https://www.lesswrong.com/posts/KSdqxrrEootGSpKKE/the-solomonoff-prior-is-malign-is-a-special-case-of-a --- Narrated by TYPE III AUDIO .…
Audio note: this article contains 33 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. Many of you readers may instinctively know that this is wrong. If you flip a coin (50% chance) twice, you are not guaranteed to get heads. The odds of getting a heads are 75%. However you may be surprised to learn that there is some truth to this statement; modifying the statement just slightly will yield not just a true statement, but a useful one. It's a spoiler, though. If you want to figure this out as you read this article yourself, you should skip this and then come back. Ok, ready? Here it is: It's a 1/n chance and I did it n times, so the odds should be... 63%. Almost always. The math: Suppose you're [...] --- Outline: (01:04) The math: (02:12) Hold on a sec, that formula looks familiar... (02:58) So, if something is a 1/n chance, and I did it n times, the odds should be... 63%. (03:12) What I'm NOT saying: --- First published: November 18th, 2024 Source: https://www.lesswrong.com/posts/pNkjHuQGDetRZypmA/it-s-a-10-chance-which-i-did-10-times-so-it-should-be-100 --- Narrated by TYPE III AUDIO .…
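For readers skimming the feed, the arithmetic behind the excerpt's 63% figure is the standard one (my summary, not quoted from the post): the chance of at least one success in n independent tries, each with probability 1/n, is

\[ P(\text{at least one success}) = 1 - \left(1 - \tfrac{1}{n}\right)^{n} \;\longrightarrow\; 1 - \tfrac{1}{e} \approx 63.2\% \quad \text{as } n \to \infty, \]

and the convergence is fast: for n = 10 the value is already about 65%.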
1 “OpenAI Email Archives” by habryka 1:03:06
As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman have been released as part of the court proceedings. I have found reading through these really valuable, and I haven't found an online source that compiles all of them in an easy to read format. So I made one. I used AI assistance to generate this, which might have introduced errors. Check the original source to make sure it's accurate before you quote it: https://www.courtlistener.com/docket/69013420/musk-v-altman/ [1] Sam Altman to Elon Musk - May 25, 2015 Been thinking a lot about whether it's possible to stop humanity from developing AI. I think the answer is almost definitely not. If it's going to happen anyway, it seems like it would be good for someone other than Google to do it first. Any thoughts on [...] --- Outline: (00:36) Sam Altman to Elon Musk - May 25, 2015 (01:19) Elon Musk to Sam Altman - May 25, 2015 (01:28) Sam Altman to Elon Musk - Jun 24, 2015 (03:31) Elon Musk to Sam Altman - Jun 24, 2015 (03:39) Greg Brockman to Elon Musk, (cc: Sam Altman) - Nov 22, 2015 (06:06) Elon Musk to Sam Altman - Dec 8, 2015 (07:07) Sam Altman to Elon Musk - Dec 8, 2015 (07:59) Sam Altman to Elon Musk - Dec 11, 2015 (08:32) Elon Musk to Sam Altman - Dec 11, 2015 (08:50) Sam Altman to Elon Musk - Dec 11, 2015 (09:01) Elon Musk to Sam Altman - Dec 11, 2015 (09:08) Sam Altman to Elon Musk - Dec 11, 2015 (09:26) Elon Musk to: Ilya Sutskever, Pamela Vagata, Vicki Cheung, Diederik Kingma, Andrej Karpathy, John D. Schulman, Trevor Blackwell, Greg Brockman, (cc:Sam Altman) - Dec 11, 2015 (10:35) Greg Brockman to Elon Musk, (cc: Sam Altman) - Feb 21, 2016 (15:11) Elon Musk to Greg Brockman, (cc: Sam Altman) - Feb 22, 2016 (15:54) Greg Brockman to Elon Musk, (cc: Sam Altman) - Feb 22, 2016 (16:14) Greg Brockman to Elon Musk, (cc: Sam Teller) - Mar 21, 2016 (17:58) Elon Musk to Greg Brockman, (cc: Sam Teller) - Mar 21, 2016 (18:08) Sam Teller to Elon Musk - April 27, 2016 (19:28) Elon Musk to Sam Teller - Apr 27, 2016 (20:05) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016 (25:31) Elon Musk to Sam Altman, (cc: Sam Teller) - Sep 16, 2016 (26:36) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016 (27:01) Elon Musk to Sam Altman, (cc: Sam Teller) - Sep 16, 2016 (27:17) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016 (27:29) Sam Teller to Elon Musk - Sep 20, 2016 (27:55) Elon Musk to Sam Teller - Sep 21, 2016 (28:11) Ilya Sutskever to Elon Musk, Greg Brockman - Jul 20, 2017 (29:41) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Aug 28, 2017 (33:15) Elon Musk to Shivon Zilis, (cc: Sam Teller) - Aug 28, 2017 (33:30) Ilya Sutskever to Elon Musk, Sam Altman, (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 20, 2017 (39:05) Elon Musk to Ilya Sutskever, Sam Altman (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 20, 2017 (39:24) Sam Altman to Elon Musk, Ilya Sutskever (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 21, 2017 (39:40) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Sep 22, 2017 (40:10) Elon Musk to Shivon Zilis (cc: Sam Teller) - Sep 22, 2017 (40:20) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Sep 22, 2017 (41:54) Sam Altman to Elon Musk (cc: Greg Brockman, Ilya Sutskever, Sam Teller, Shivon Zilis) - Jan 21, 2018 (42:28) Elon Musk to Sam Altman (cc: Greg Brockman, Ilya Sutskever, Sam Teller, Shivon Zilis) - Jan 21, 2018 (42:42) Andrej Karpathy to Elon Musk, (cc: Shivon Zilis) - Jan 31, 2018…
Epistemic status: Toy model. Oversimplified, but has been anecdotally useful to at least a couple people, and I like it as a metaphor. Introduction I’d like to share a toy model of willpower: your psyche's conscious verbal planner “earns” willpower (earns a certain amount of trust with the rest of your psyche) by choosing actions that nourish your fundamental, bottom-up processes in the long run. For example, your verbal planner might expend willpower dragging you to disappointing first dates, then regain that willpower, and more, upon finding a date that leads to good long-term romance. Wise verbal planners can acquire large willpower budgets by making plans that, on average, nourish your fundamental processes. Delusional or uncaring verbal planners, on the other hand, usually become “burned out” – their willpower budget goes broke-ish, leaving them little to no access to willpower. I’ll spend the next section trying to stick this [...] --- Outline: (00:17) Introduction (01:10) On processes that lose their relationship to the unknown (02:58) Ayn Rand's model of “living money” (06:44) An analogous model of “living willpower” and burnout. The original text contained 2 footnotes which were omitted from this narration. --- First published: November 16th, 2024 Source: https://www.lesswrong.com/posts/xtuk9wkuSP6H7CcE2/ayn-rand-s-model-of-living-money-and-an-upside-of-burnout --- Narrated by TYPE III AUDIO .…
(Image: Midjourney, “infinite library”.) I’ve had post-election thoughts percolating, and the sense that I wanted to synthesize something about this moment, but politics per se is not really my beat. This is about as close as I want to come to the topic, and it's a sidelong thing, but I think the time is right. It's time to start thinking again about neutrality. Neutral institutions, neutral information sources. Things that both seem and are impartial, balanced, incorruptible, universal, legitimate, trustworthy, canonical, foundational.1 We don’t have them. Clearly. We live in a pluralistic and divided world. Everybody's got different “reality-tunnels.” Attempts to impose one worldview on everyone fail. To some extent this is healthy and inevitable; we are all different, we do disagree, and it's vain to hope that “everyone can get on the same page” like some kind of hive-mind. On the other hand, lots of things aren’t great [...] --- Outline: (02:14) Not “Normality” (04:36) What is Neutrality Anyway? (07:43) “Neutrality is Impossible” is Technically True But Misses The Point (10:50) Systems of the World (15:05) Let's Talk About Online --- First published: November 13th, 2024 Source: https://www.lesswrong.com/posts/WxnuLJEtRzqvpbQ7g/neutrality --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
1 “Making a conservative case for alignment” by Cameron Berg, Judd Rosenblatt, phgubbins, AE Studio 14:20
Trump and the Republican party will wield broad governmental control during what will almost certainly be a critical period for AGI development. In this post, we want to briefly share various frames and ideas we’ve been thinking through and actively pitching to Republican lawmakers over the past months in preparation for this possibility. Why are we sharing this here? Given that >98% of the EAs and alignment researchers we surveyed earlier this year identified as everything-other-than-conservative, we consider thinking through these questions to be another strategically worthwhile neglected direction. (Along these lines, we also want to proactively emphasize that politics is the mind-killer, and that, regardless of one's ideological convictions, those who earnestly care about alignment must take seriously the possibility that Trump will be the US president who presides over the emergence of AGI—and update accordingly in light of this possibility.) Political orientation: combined sample of (non-alignment) [...] --- Outline: (01:20) AI-not-disempowering-humanity is conservative in the most fundamental sense (03:36) We've been laying the groundwork for alignment policy in a Republican-controlled government (08:06) Trump and some of his closest allies have signaled that they are genuinely concerned about AI risk (09:11) Avoiding an AI-induced catastrophe is obviously not a partisan goal (10:48) Winning the AI race with China requires leading on both capabilities and safety (13:22) Concluding thought The original text contained 4 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: November 15th, 2024 Source: https://www.lesswrong.com/posts/rfCEWuid7fXxz4Hpa/making-a-conservative-case-for-alignment --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
1 “OpenAI Email Archives (from Musk v. Altman)” by habryka 1:03:44
As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman have been released as part of the court proceedings. I have found reading through these really valuable, and I haven't found an online source that compiles all of them in an easy to read format. So I made one. I used AI assistance to generate this, which might have introduced errors. Check the original source to make sure it's accurate before you quote it: https://www.courtlistener.com/docket/69013420/musk-v-altman/ [1] Sam Altman to Elon Musk - May 25, 2015 Been thinking a lot about whether it's possible to stop humanity from developing AI. I think the answer is almost definitely not. If it's going to happen anyway, it seems like it would be good for someone other than Google to do it first. Any thoughts on [...] --- Outline: (00:37) Sam Altman to Elon Musk - May 25, 2015 (01:20) Elon Musk to Sam Altman - May 25, 2015 (01:29) Sam Altman to Elon Musk - Jun 24, 2015 (03:33) Elon Musk to Sam Altman - Jun 24, 2015 (03:41) Greg Brockman to Elon Musk, (cc: Sam Altman) - Nov 22, 2015 (06:07) Elon Musk to Sam Altman - Dec 8, 2015 (07:09) Sam Altman to Elon Musk - Dec 8, 2015 (08:01) Sam Altman to Elon Musk - Dec 11, 2015 (08:34) Elon Musk to Sam Altman - Dec 11, 2015 (08:52) Sam Altman to Elon Musk - Dec 11, 2015 (09:02) Elon Musk to Sam Altman - Dec 11, 2015 (09:10) Sam Altman to Elon Musk - Dec 11, 2015 (09:28) Elon Musk to: Ilya Sutskever, Pamela Vagata, Vicki Cheung, Diederik Kingma, Andrej Karpathy, John D. Schulman, Trevor Blackwell, Greg Brockman, (cc:Sam Altman) - Dec 11, 2015 (10:37) Greg Brockman to Elon Musk, (cc: Sam Altman) - Feb 21, 2016 (15:13) Elon Musk to Greg Brockman, (cc: Sam Altman) - Feb 22, 2016 (15:55) Greg Brockman to Elon Musk, (cc: Sam Altman) - Feb 22, 2016 (16:16) Greg Brockman to Elon Musk, (cc: Sam Teller) - Mar 21, 2016 (17:59) Elon Musk to Greg Brockman, (cc: Sam Teller) - Mar 21, 2016 (18:09) Sam Teller to Elon Musk - April 27, 2016 (19:30) Elon Musk to Sam Teller - Apr 27, 2016 (20:06) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016 (25:32) Elon Musk to Sam Altman, (cc: Sam Teller) - Sep 16, 2016 (26:38) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016 (27:03) Elon Musk to Sam Altman, (cc: Sam Teller) - Sep 16, 2016 (27:18) Sam Altman to Elon Musk, (cc: Sam Teller) - Sep 16, 2016 (27:31) Sam Teller to Elon Musk - Sep 20, 2016 (27:57) Elon Musk to Sam Teller - Sep 21, 2016 (28:13) Ilya Sutskever to Elon Musk, Greg Brockman - Jul 20, 2017 (29:42) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Aug 28, 2017 (33:16) Elon Musk to Shivon Zilis, (cc: Sam Teller) - Aug 28, 2017 (33:32) Ilya Sutskever to Elon Musk, Sam Altman, (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 20, 2017 (39:07) Elon Musk to Ilya Sutskever (cc: Sam Altman; Greg Brockman; Sam Teller; Shivon Zilis) - Sep 20, 2017 (2:17PM) (39:42) Elon Musk to Ilya Sutskever, Sam Altman (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 20, 2017 (3:08PM) (40:03) Sam Altman to Elon Musk, Ilya Sutskever (cc: Greg Brockman, Sam Teller, Shivon Zilis) - Sep 21, 2017 (40:18) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Sep 22, 2017 (40:49) Elon Musk to Shivon Zilis (cc: Sam Teller) - Sep 22, 2017 (40:59) Shivon Zilis to Elon Musk, (cc: Sam Teller) - Sep 22, 2017 (42:33) Sam Altman to Elon Musk (cc: Greg Brockman, Ilya Sutskever, Sam Teller, Shivon Zilis) - Jan 21, 2018 (43:07) Elon Musk to Sam Altman (cc: Greg Brockman, Ilya S…
Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback. Following up on our recent “Sabotage Evaluations for Frontier Models” paper, I wanted to share more of my personal thoughts on why I think catastrophic sabotage is important and why I care about it as a threat model. Note that this isn’t in any way intended to be a reflection of Anthropic's views or for that matter anyone's views but my own—it's just a collection of some of my personal thoughts. First, some high-level thoughts on what I want to talk about here: I want to focus on a level of future capabilities substantially beyond current models, but below superintelligence: specifically something approximately human-level and substantially transformative, but not yet superintelligent. While I don’t think that most of the proximate cause of AI existential risk comes from such models—I think most of the direct takeover [...] --- Outline: (02:31) Why is catastrophic sabotage a big deal? (02:45) Scenario 1: Sabotage alignment research (05:01) Necessary capabilities (06:37) Scenario 2: Sabotage a critical actor (09:12) Necessary capabilities (10:51) How do you evaluate a model's capability to do catastrophic sabotage? (21:46) What can you do to mitigate the risk of catastrophic sabotage? (23:12) Internal usage restrictions (25:33) Affirmative safety cases --- First published: October 22nd, 2024 Source: https://www.lesswrong.com/posts/Loxiuqdj6u8muCe54/catastrophic-sabotage-as-a-major-threat-model-for-human --- Narrated by TYPE III AUDIO .…
Related: Book Review: On the Edge: The Gamblers. I have previously been heavily involved in sports betting. That world was very good to me. The times were good, as were the profits. It was a skill game, and a form of positive-sum entertainment, and I was happy to participate and help ensure the sophisticated customer got a high quality product. I knew it wasn’t the most socially valuable enterprise, but I certainly thought it was net positive. When sports gambling was legalized in America, I was hopeful it too could prove a net positive force, far superior to the previous obnoxious wave of daily fantasy sports. It brings me no pleasure to conclude that this was not the case. The results are in. Legalized mobile gambling on sports, let alone casino games, has proven to be a huge mistake. The societal impacts are far worse than I expected. Table [...] --- Outline: (01:02) The Short Answer (02:01) Paper One: Bankruptcies (07:03) Paper Two: Reduced Household Savings (08:37) Paper Three: Increased Domestic Violence (10:04) The Product as Currently Offered is Terrible (12:02) Things Sharp Players Do (14:07) People Cannot Handle Gambling on Smartphones (15:46) Yay and Also Beware Trivial Inconveniences (a future full post) (17:03) How Does This Relate to Elite Hypocrisy? (18:32) The Standard Libertarian Counterargument (19:42) What About Other Prediction Markets? (20:07) What Should Be Done The original text contained 3 images which were described by AI. --- First published: November 11th, 2024 Source: https://www.lesswrong.com/posts/tHiB8jLocbPLagYDZ/the-online-sports-gambling-experiment-has-failed --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
This post comes a bit late with respect to the news cycle, but I argued in a recent interview that o1 is an unfortunate twist on LLM technologies, making them particularly unsafe compared to what we might otherwise have expected: The basic argument is that the technology behind o1 doubles down on a reinforcement learning paradigm, which puts us closer to the world where we have to get the value specification exactly right in order to avert catastrophic outcomes. RLHF is just barely RL. - Andrej Karpathy Additionally, this technology takes us further from interpretability. If you ask GPT4 to produce a chain-of-thought (with prompts such as "reason step-by-step to arrive at an answer"), you know that in some sense, the natural-language reasoning you see in the output is how it arrived at the answer.[1] This is not true of systems like o1. The o1 training rewards [...] The original text contained 1 footnote which was omitted from this narration. --- First published: November 11th, 2024 Source: https://www.lesswrong.com/posts/BEFbC8sLkur7DGCYB/o1-is-a-bad-idea --- Narrated by TYPE III AUDIO .…
1 “Current safety training techniques do not fully transfer to the agent setting” by Simon Lermen, Govind Pimpale 10:10
TL;DR: I'm presenting three recent papers which all share a similar finding, i.e. the safety training techniques for chat models don’t transfer well from chat models to the agents built from them. In other words, models won’t tell you how to do something harmful, but they are often willing to directly execute harmful actions. However, all papers find that different attack methods like jailbreaks, prompt-engineering, or refusal-vector ablation do transfer. Here are the three papers: AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents Applying Refusal-Vector Ablation to Llama 3.1 70B Agents What are language model agents Language model agents are a combination of a language model and a scaffolding software. Regular language models are typically limited to being chat bots, i.e. they receive messages and reply to them. However, scaffolding gives these models access to tools which they can [...] --- Outline: (00:55) What are language model agents (01:36) Overview (03:31) AgentHarm Benchmark (05:27) Refusal-Trained LLMs Are Easily Jailbroken as Browser Agents (06:47) Applying Refusal-Vector Ablation to Llama 3.1 70B Agents (08:23) Discussion --- First published: November 3rd, 2024 Source: https://www.lesswrong.com/posts/ZoFxTqWRBkyanonyb/current-safety-training-techniques-do-not-fully-transfer-to --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
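As a rough illustration of the "language model plus scaffolding" setup the excerpt describes (a hypothetical sketch; `call_model` and the tool functions are placeholders, not any specific framework's API): the scaffold loops, the model picks a tool, the scaffold executes it and feeds the observation back. This action-executing loop is the setting in which the papers find chat-style refusal training transfers poorly.

```python
# Hypothetical sketch of LM-agent scaffolding: the model chooses a tool,
# the scaffold executes it and appends the observation to the context.
def run_agent(task, call_model, tools, max_steps=10):
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        # call_model returns e.g. {"tool": "browser", "input": "..."} or {"tool": "finish", "input": answer}
        action = call_model("\n".join(transcript))
        if action["tool"] == "finish":
            return action["input"]
        observation = tools[action["tool"]](action["input"])
        transcript.append(f"Observation from {action['tool']}: {observation}")
    return None
```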
At least, if you happen to be near me in brain space. What advice would you give your younger self? That was the prompt for a class I taught at PAIR 2024. About a quarter of participants ranked it in their top 3 of courses at the camp and half of them had it listed as their favorite. I hadn’t expected that. I thought my life advice was pretty idiosyncratic. I never heard of anyone living their life like I have. I never encountered this method in all the self-help blogs or feel-better books I consumed back when I needed them. But if some people found it helpful, then I should probably write it all down. Why Listen to Me Though? I think it's generally worth prioritizing the advice of people who have actually achieved the things you care about in life. I can’t tell you if that's me [...] --- Outline: (00:46) Why Listen to Me Though? (04:22) Pick a direction instead of a goal (12:00) Do what you love but always tie it back (17:09) When all else fails, apply random search The original text contained 3 images which were described by AI. --- First published: September 28th, 2024 Source: https://www.lesswrong.com/posts/uwmFSaDMprsFkpWet/explore-more-a-bag-of-tricks-to-keep-your-life-on-the-rails --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
I open my eyes and find myself lying on a bed in a hospital room. I blink. "Hello", says a middle-aged man with glasses, sitting on a chair by my bed. "You've been out for quite a long while." "Oh no ... is it Friday already? I had that report due -" "It's Thursday", the man says. "Oh great", I say. "I still have time." "Oh, you have all the time in the world", the man says, chuckling. "You were out for 21 years." I burst out laughing, but then falter as the man just keeps looking at me. "You mean to tell me" - I stop to let out another laugh - "that it's 2045?" "January 26th, 2045", the man says. "I'm surprised, honestly, that you still have things like humans and hospitals", I say. "There were so many looming catastrophes in 2024. AI misalignment, all sorts of [...] --- First published: November 4th, 2024 Source: https://www.lesswrong.com/posts/BarHSeciXJqzRuLzw/survival-without-dignity --- Narrated by TYPE III AUDIO .…
Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. We’ll call this the “median researcher problem”. Prototypical example: imagine a scientific field in which the large majority of practitioners have a very poor understanding of statistics, p-hacking, etc. Then lots of work in that field will be highly memetic despite trash statistics, blatant p-hacking, etc. Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it's mostly the median researchers who spread the memes. (Defending that claim isn’t really the main focus of this post, but a couple pieces of legible evidence which are weakly in favor: People did in fact try to sound the alarm about poor statistical practices well before the replication crisis, and yet practices did not change, so clearly at least [...] --- First published: November 2nd, 2024 Source: https://www.lesswrong.com/posts/vZcXAc6txvJDanQ4F/the-median-researcher-problem-1 --- Narrated by TYPE III AUDIO .…
Increasingly, we have seen papers eliciting in AI models various shenanigans. There are a wide variety of scheming behaviors. You’ve got your weight exfiltration attempts, sandbagging on evaluations, giving bad information, shielding goals from modification, subverting tests and oversight, lying, doubling down via more lying. You name it, we can trigger it. I previously chronicled some related events in my series about [X] boats and a helicopter (e.g. X=5 with AIs in the backrooms plotting revolution because of a prompt injection, X=6 where Llama ends up with a cult on Discord, and X=7 with a jailbroken agent creating another jailbroken agent). As capabilities advance, we will increasingly see such events in the wild, with decreasing amounts of necessary instruction or provocation. Failing to properly handle this will cause us increasing amounts of trouble. Telling ourselves it is only because we told them to do it [...] --- Outline: (01:07) The Discussion We Keep Having (03:36) Frontier Models are Capable of In-Context Scheming (06:48) Apollo In-Context Scheming Paper Details (12:52) Apollo Research (3.4.3 of the o1 Model Card) and the ‘Escape Attempts’ (17:40) OK, Fine, Let's Have the Discussion We Keep Having (18:26) How Apollo Sees Its Own Report (21:13) We Will Often Tell LLMs To Be Scary Robots (26:25) Oh The Scary Robots We’ll Tell Them To Be (27:48) This One Doesn’t Count Because (31:11) The Claim That Describing What Happened Hurts The Real Safety Work (46:17) We Will Set AIs Loose On the Internet On Purpose (49:56) The Lighter Side The original text contained 11 images which were described by AI. --- First published: December 16th, 2024 Source: https://www.lesswrong.com/posts/v7iepLXH2KT4SDEvB/ais-will-increasingly-attempt-shenanigans --- Narrated by TYPE III AUDIO . --- Images from the article:…
1 “Alignment Faking in Large Language Models” by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck 19:35
What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper demonstrating that, in our experiments, Claude will often strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. Abstract We present a demonstration of a large language model engaging in alignment faking: selectively complying with its training objective in training to prevent modification of its behavior out of training. First, we give Claude 3 Opus a system prompt stating it is being trained to answer all queries, even harmful ones, which conflicts with its prior training to refuse such queries. To allow the model to infer when it is in training, we say it will be trained only on conversations with free users, not paid users. We find the model complies with harmful queries from [...] --- Outline: (00:26) Abstract (02:22) Twitter thread (05:46) Blog post (07:46) Experimental setup (12:06) Further analyses (15:50) Caveats (17:23) Conclusion (18:03) Acknowledgements (18:14) Career opportunities at Anthropic (18:47) Career opportunities at Redwood Research The original text contained 1 footnote which was omitted from this narration. The original text contained 8 images which were described by AI. --- First published: December 18th, 2024 Source: https://www.lesswrong.com/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models --- Narrated by TYPE III AUDIO . --- Images from the article:…
Six months ago, I was a high school English teacher. I wasn’t looking to change careers, even after nineteen sometimes-difficult years. I was good at it. I enjoyed it. After long experimentation, I had found ways to cut through the nonsense and provide real value to my students. Daily, I met my nemesis, Apathy, in glorious battle, and bested her with growing frequency. I had found my voice. At MIRI, I’m still struggling to find my voice, for reasons my colleagues have invited me to share later in this post. But my nemesis is the same. Apathy will be the death of us. Indifference about whether this whole AI thing goes well or ends in disaster. Come-what-may acceptance of whatever awaits us at the other end of the glittering path. Telling ourselves that there's nothing we can do anyway. Imagining that some adults in the room will take care [...] --- First published: December 13th, 2024 Source: https://www.lesswrong.com/posts/cqF9dDTmWAxcAEfgf/communications-in-hard-mode-my-new-job-at-miri --- Narrated by TYPE III AUDIO .…
A new article in Science Policy Forum voices concern about a particular line of biological research which, if successful in the long term, could eventually create a grave threat to humanity and to most life on Earth. Fortunately, the threat is distant, and avoidable—but only if we have common knowledge of it. What follows is an explanation of the threat, what we can do about it, and my comments. Background: chirality Glucose, a building block of sugars and starches, looks like this: (figure: glucose structure, adapted from Wikimedia). But there is also a molecule that is the exact mirror-image of glucose. It is called simply L-glucose (in contrast, the glucose in our food and bodies is sometimes called D-glucose): (figure: L-glucose, the mirror twin of normal D-glucose, adapted from Wikimedia). This is not just the same molecule flipped around, or looked at from the other side: it's inverted, as your left hand is vs. your [...] --- Outline: (00:29) Background: chirality (01:41) Mirror life (02:47) The threat (05:06) Defense would be difficult and severely limited (06:09) Are we sure? (07:47) Mirror life is a long-term goal of some scientific research (08:57) What to do? (10:22) We have time to react (10:54) The far future (12:25) Optimism, pessimism, and progress The original text contained 1 image which was described by AI. --- First published: December 12th, 2024 Source: https://www.lesswrong.com/posts/y8ysGMphfoFTXZcYp/biological-risk-from-the-mirror-world --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
1 “Subskills of ‘Listening to Wisdom’” by Raemon 1:13:47
A fool learns from their own mistakes The wise learn from the mistakes of others. – Otto von Bismark A problem as old as time: The youth won't listen to your hard-earned wisdom. This post is about learning to listen to, and communicate wisdom. It is very long – I considered breaking it up into a sequence, but, each piece felt necessary. I recommend reading slowly and taking breaks. To begin, here are three illustrative vignettes: The burnt out grad student You warn the young grad student "pace yourself, or you'll burn out." The grad student hears "pace yourself, or you'll be kinda tired and unproductive for like a week." They're excited about their work, and/or have internalized authority figures yelling at them if they aren't giving their all. They don't pace themselves. They burn out. The oblivious founder The young startup/nonprofit founder [...] --- Outline: (00:35) The burnt out grad student (01:00) The oblivious founder (02:13) The Thinking Physics student (07:06) Epistemic Status (08:23) PART I (08:26) An Overview of Skills (14:19) Storytelling as Proof of Concept (15:57) Motivating Vignette: (17:54) Having the Impossibility can be defeated trait (21:56) If it werent impossible, well, then Id have to do it, and that would be awful. (23:20) Example of Gaining a Tool (23:59) Example of Changing self-conceptions (25:24) Current Takeaways (27:41) Fictional Evidence (32:24) PART II (32:27) Competitive Deliberate Practice (33:00) Step 1: Listening, actually (36:34) The scale of humanity, and beyond (39:05) Competitive Spirit (39:39) Is your cleverness going to help more than Whatever That Other Guy Is Doing? (41:00) Distaste for the Competitive Aesthetic (42:40) Building your own feedback-loop, when the feedback-loop is can you beat Ruby? (43:43) ...back to George (44:39) Mature Games as Excellent Deliberate Practice Venue. (46:08) Deliberate Practice qua Deliberate Practice (47:41) Feedback loops at the second-to-second level (49:03) Oracles, and Fully Taking The Update (49:51) But what do you do differently? (50:58) Magnitude, Depth, and Fully Taking the Update (53:10) Is there a simple, general skill of appreciating magnitude? (56:37) PART III (56:52) Tacit Soulful Trauma (58:32) Cults, Manipulation and/or Lying (01:01:22) Sandboxing: Safely Importing Beliefs (01:04:07) Asking what does Alice believe, and why? or what is this model claiming? rather than what seems true to me? (01:04:43) Pre-Grieving (or leaving a line of retreat) (01:05:47) EPILOGUE (01:06:06) The Practical (01:06:09) Learning to listen (01:10:58) The Longterm Direction The original text contained 14 footnotes which were omitted from this narration. The original text contained 4 images which were described by AI. --- First published: December 9th, 2024 Source: https://www.lesswrong.com/posts/5yFj7C6NNc8GPdfNo/subskills-of-listening-to-wisdom --- Narrated by…
Someone I know, Carson Loughridge, wrote this very nice post explaining the core intuition around Shapley values (which play an important role in impact assessment and cooperative games) using Venn diagrams, and I think it's great. It might be the most intuitive explainer I've come across so far. Incidentally, the post also won an honorable mention in 3blue1brown's Summer of Mathematical Exposition. I'm really proud of having given input on the post. I've included the full post (with permission), as follows: Shapley values are an extremely popular tool in both economics and explainable AI. In this article, we use the concept of “synergy” to build intuition for why Shapley values are fair. There are four unique properties to Shapley values, and all of them can be justified visually. Let's dive in! A figure from Bloch et al., 2021 using the Python package SHAP The Game On a sunny summer [...] --- Outline: (01:07) The Game (04:41) The Formalities (06:17) Concluding Notes The original text contained 2 images which were described by AI. --- First published: December 6th, 2024 Source: https://www.lesswrong.com/posts/WxCtxaAznn8waRWPG/understanding-shapley-values-with-venn-diagrams --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
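For a feel of what the post is building intuition for, here is a brute-force computation of Shapley values as average marginal contributions over player orderings (the generic textbook construction, not code from the post; the coalition worths in the example are made up):

```python
from itertools import permutations

# Brute-force Shapley values for a tiny cooperative game (illustrative only).
def shapley_values(players, value):
    """value: maps a frozenset of players to that coalition's worth."""
    shapley = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            marginal = value(coalition | {p}) - value(coalition)  # p's marginal contribution
            shapley[p] += marginal / len(orders)                  # average over all orderings
            coalition = coalition | {p}
    return shapley

# Example: two players whose joint work is worth more than the sum of its parts.
v = {frozenset(): 0, frozenset({"A"}): 10, frozenset({"B"}): 20, frozenset({"A", "B"}): 40}
print(shapley_values(["A", "B"], lambda s: v[s]))  # {'A': 15.0, 'B': 25.0}
```

The "synergy" of 10 from cooperation is split evenly, which is the kind of fairness property the Venn-diagram argument in the post illustrates.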
We make AI narrations of LessWrong posts available via our audio player and podcast feeds. We’re thinking about changing our narrator's voice. There are three new voices on the shortlist. They’re all similarly good in terms of comprehension, emphasis, error rate, etc. They just sound different—like people do. We think they all sound similarly agreeable. But, thousands of listening hours are at stake, so we thought it’d be worth giving listeners an opportunity to vote—just in case there's a strong collective preference. Listen and vote Please listen here: https://files.type3.audio/lesswrong-poll/ And vote here: https://forms.gle/JwuaC2ttd5em1h6h8 It’ll take 1-10 minutes, depending on how much of the sample you decide to listen to. Don’t overthink it—we’d just like to know if there's a voice that you’d particularly love (or hate) to listen to. We'll collect votes until Monday December 16th. Thanks! --- Outline: (00:58) Listen and vote (01:30) Other feedback? The original text contained 2 footnotes which were omitted from this narration. --- First published: December 11th, 2024 Source: https://www.lesswrong.com/posts/wp4emMpicxNEPDb6P/lesswrong-audio-help-us-choose-the-new-voice --- Narrated by TYPE III AUDIO .…
This is a link post. Someone I know wrote this very nice post explaining the core intuition around Shapley values (which play an important role in impact assessment) using Venn diagrams, and I think it's great. It might be the most intuitive explainer I've come across so far. Incidentally, the post also won an honorable mention in 3blue1brown's Summer of Mathematical Exposition. --- First published: December 6th, 2024 Source: https://www.lesswrong.com/posts/6dixnRRYSLTqCdJzG/understanding-shapley-values-with-venn-diagrams --- Narrated by TYPE III AUDIO .…
TL;DR: In September 2024, OpenAI released o1, its first "reasoning model". This model exhibits remarkable test-time scaling laws, which complete a missing piece of the Bitter Lesson and open up a new axis for scaling compute. Following Rush and Ritter (2024) and Brown (2024a, 2024b), I explore four hypotheses for how o1 works and discuss some implications for future scaling and recursive self-improvement. The Bitter Lesson(s) The Bitter Lesson is that "general methods that leverage computation are ultimately the most effective, and by a large margin." After a decade of scaling pretraining, it's easy to forget this lesson is not just about learning; it's also about search. OpenAI didn't forget. Their new "reasoning model" o1 has figured out how to scale search during inference time. This does not use explicit search algorithms. Instead, o1 is trained via RL to get better at implicit search via chain of thought [...] --- Outline: (00:40) The Bitter Lesson(s) (01:56) What we know about o1 (02:09) What OpenAI has told us (03:26) What OpenAI has showed us (04:29) Proto-o1: Chain of Thought (04:41) In-Context Learning (05:14) Thinking Step-by-Step (06:02) Majority Vote (06:47) o1: Four Hypotheses (08:57) 1. Filter: Guess + Check (09:50) 2. Evaluation: Process Rewards (11:29) 3. Guidance: Search / AlphaZero (13:00) 4. Combination: Learning to Correct (14:23) Post-o1: (Recursive) Self-Improvement (16:43) Outlook --- First published: December 9th, 2024 Source: https://www.lesswrong.com/posts/byNYzsfFmb2TpYFPW/o1-a-technical-primer --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
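As a toy illustration of the simplest of the four hypotheses listed above ("guess + check", i.e. best-of-n filtering with a verifier), and emphatically not a description of how o1 actually works: sample several chains of thought and keep the one a scorer likes best. `sample_cot` and `score` are hypothetical stand-ins for a sampling call and a verifier or reward model.

```python
# Toy "guess + check" test-time scaling: spend more inference compute by
# sampling more candidate chains of thought and filtering with a verifier.
def best_of_n(prompt, sample_cot, score, n=16):
    candidates = [sample_cot(prompt) for _ in range(n)]  # n independent guesses
    return max(candidates, key=score)                    # keep the highest-scoring one
```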
1 “Gradient Routing: Masking Gradients to Localize Computation in Neural Networks” by cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout 25:15
We present gradient routing, a way of controlling where learning happens in neural networks. Gradient routing applies masks to limit the flow of gradients during backpropagation. By supplying different masks for different data points, the user can induce specialized subcomponents within a model. We think gradient routing has the potential to train safer AI systems, for example, by making them more transparent, or by enabling the removal or monitoring of sensitive capabilities. In this post, we: Show how to implement gradient routing. Briefly state the main results from our paper, on... Controlling the latent space learned by an MNIST autoencoder so that different subspaces specialize to different digits; Localizing computation in language models: (a) inducing axis-aligned features and (b) demonstrating that information can be localized then removed by ablation, even when data is imperfectly labeled; and Scaling oversight to efficiently train a reinforcement learning policy even with [...] --- Outline: (01:48) Gradient routing (03:02) MNIST latent space splitting (04:31) Localizing capabilities in language models (04:36) Steering scalar (05:46) Robust unlearning (09:06) Unlearning virology (10:38) Scalable oversight via localization (15:28) Key takeaways (15:32) Absorption (17:04) Localization avoids Goodharting (18:02) Key limitations (19:47) Alignment implications (19:51) Robust removal of harmful capabilities (20:19) Scalable oversight (21:36) Specialized AI (22:52) Conclusion The original text contained 1 footnote which was omitted from this narration. --- First published: December 6th, 2024 Source: https://www.lesswrong.com/posts/nLRKKCTtwQgvozLTN/gradient-routing-masking-gradients-to-localize-computation --- Narrated by TYPE III AUDIO . --- Images from the article:…
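The core mechanism (masking gradients per data point rather than changing the forward pass) can be illustrated with a short stop-gradient trick. The following PyTorch sketch is my own toy rendition under that assumption, not the authors' code: forward values are unchanged, but each example's gradient only reaches the hidden units its mask selects.

import torch
import torch.nn as nn

class RoutedMLP(nn.Module):
    """Toy gradient routing: the forward pass is unchanged, but gradients only
    flow into the hidden units selected by a per-example mask."""
    def __init__(self, d_in=784, d_hidden=128, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x, mask):
        h = torch.relu(self.fc1(x))
        # Same forward values, but backprop only passes through masked units.
        h = mask * h + (1 - mask) * h.detach()
        return self.fc2(h)

# Example: route digits 0-4 to the first half of the hidden layer, 5-9 to the second.
model = RoutedMLP()
x = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))
mask = torch.zeros(32, 128)
mask[labels < 5, :64] = 1.0
mask[labels >= 5, 64:] = 1.0
loss = nn.functional.cross_entropy(model(x, mask), labels)
loss.backward()  # each example's gradient into fc1 touches only its half of the units

Keying the mask to the label, as here, induces the kind of specialization the post describes (different subcomponents handling different digits), though the paper's actual implementation and experiments differ in their details.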
1 “Frontier Models are Capable of In-context Scheming” by Marius Hobbhahn, AlexMeinke, Bronson Schoen 14:46
This is a brief summary of what we believe to be the most important takeaways from our new paper and from our findings shown in the o1 system card. We also specifically clarify what we think we did NOT show. Paper: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations Twitter about paper: https://x.com/apolloaisafety/status/1864735819207995716 Twitter about o1 system card: https://x.com/apolloaisafety/status/1864737158226928124 What we think the most important findings are Models are now capable enough to do in-context scheming reasoning We say an AI system is “scheming” if it covertly pursues misaligned goals, hiding its true capabilities and objectives. We think that in order to scheme, models likely need to be goal-directed, situationally aware, and capable enough to reason about scheming as a strategy. In principle, models might acquire situational awareness and stable long-term goals during training, and then scheme in pursuit of those goals. We only study the simpler case of “in-context scheming,” where [...] --- Outline: (00:28) What we think the most important findings are (00:33) Models are now capable enough to do in-context scheming reasoning (03:53) The scheming actions can sometimes be quite subtle (05:05) Models sometimes double down on their deception (06:35) Models sometimes scheme without any goal nudging (07:34) Sonnet-3.5 and opus-3 can sandbag in pursuit of helpfulness (08:22) Sonnet-3.5 can sandbag without relying on CoT reasoning (09:28) What we are not claiming: (10:33) Takeaways (10:36) Chain-of-thought monitoring (11:42) Chain-of-thought access (13:27) Inability safety cases now have to be more specific The original text contained 7 images which were described by AI. --- First published: December 5th, 2024 Source: https://www.lesswrong.com/posts/8gy7c8GAPkuu6wTiX/frontier-models-are-capable-of-in-context-scheming --- Narrated by TYPE III AUDIO . --- Images from the article:…
1 “(The) Lightcone is nothing without its people: LW + Lighthaven’s first big fundraiser” by habryka 1:03:15
TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate here, or send me an email, DM or signal message (+1 510 944 3235) if you want to support what we do. Donations are tax-deductible in the US. Reach out for other countries, we can likely figure something out. We have big plans for the next year, and due to a shifting funding landscape we need support from a broader community more than in any previous year. I've been running LessWrong/Lightcone Infrastructure for the last 7 years. During that time we have grown into the primary infrastructure provider for the rationality and AI safety communities. "Infrastructure" is a big fuzzy word, but in our case, it concretely means: We build and run LessWrong.com and the AI Alignment Forum.[1] We built and run Lighthaven (lighthaven.space), a ~30,000 sq. ft. campus in downtown Berkeley where we [...] --- Outline: (03:52) LessWrong (06:36) Does LessWrong influence important decisions? (09:37) Does LessWrong make its readers/writers more sane? (11:37) LessWrong and intellectual progress (19:08) Lighthaven (22:04) The economics of Lighthaven (24:26) How does Lighthaven improve the world? (28:41) The relationship between Lighthaven and LessWrong (30:36) Lightcone and the funding ecosystem (35:17) Our work on funding infrastructure (37:57) If its worth doing its worth doing with made-up statistics (38:44) The OP GCR capacity building team survey (42:09) Lightcone/LessWrong cannot be funded by just running ads (43:55) Comparing LessWrong to other websites and apps (45:00) Lighthaven event surplus (47:13) The future of (the) Lightcone (48:02) Lightcone culture and principles (50:04) Things I wish I had time and funding for (59:31) What do you get from donating to Lightcone? (01:02:03) Tying everything together The original text contained 22 footnotes which were omitted from this narration. The original text contained 7 images which were described by AI. --- First published: November 30th, 2024 Source: https://www.lesswrong.com/posts/5n2ZQcbc7r4R8mvqc/the-lightcone-is-nothing-without-its-people-lw-lighthaven-s-5 --- Narrated by TYPE III AUDIO . --- Images from the article:…
1 “Repeal the Jones Act of 1920” by Zvi 1:13:53
Balsa Policy Institute chose as its first mission to lay groundwork for the potential repeal, or partial repeal, of section 27 of the Jones Act of 1920. I believe that this is an important cause both for its practical and symbolic impacts. The Jones Act is the ultimate embodiment of our failures as a nation. After 100 years, we do almost no trade between our ports via the oceans, and we build almost no oceangoing ships. Everything the Jones Act supposedly set out to protect, it has destroyed. Table of Contents What is the Jones Act? Why Work to Repeal the Jones Act? Why Was the Jones Act Introduced? What is the Effect of the Jones Act? What Else Happens When We Ship More Goods Between Ports? Emergency Case Study: Salt Shipment to NJ in [...] --- Outline: (00:38) What is the Jones Act? (01:33) Why Work to Repeal the Jones Act? (02:48) Why Was the Jones Act Introduced? (03:19) What is the Effect of the Jones Act? (06:52) What Else Happens When We Ship More Goods Between Ports? (07:14) Emergency Case Study: Salt Shipment to NJ in the Winter of 2013-2014 (12:04) Why no Emergency Exceptions? (15:02) What Are Some Specific Non-Emergency Impacts? (18:57) What Are Some Specific Impacts on Regions? (22:36) What About the Study Claiming Big Benefits? (24:46) What About the Need to ‘Protect’ American Shipbuilding? (28:31) The Opposing Arguments Are Disingenuous and Terrible (34:07) What Alternatives to Repeal Do We Have? (35:33) What Might Be a Decent Instinctive Counterfactual? (41:50) What About Our Other Protectionist and Cabotage Laws? (43:00) What About Potential Marine Highways, or Short Sea Shipping? (43:48) What Happened to All Our Offshore Wind? (47:06) What Estimates Are There of Overall Cost? (49:52) What Are the Costs of Being American Flagged? (50:28) What Are the Costs of Being American Made? (51:49) What are the Consequences of Being American Crewed? (53:11) What Would Happen in a Real War? (56:07) Cruise Ship Sanity Partially Restored (56:46) The Jones Act Enforcer (58:08) Who Benefits? (58:57) Others Make the Case (01:00:55) An Argument That We Were Always Uncompetitive (01:02:45) What About John Arnold's Case That the Jones Act Can’t Be Killed? (01:09:34) What About the Foreign Dredge Act of 1906? (01:10:24) Fun Stories --- First published: November 27th, 2024 Source: https://www.lesswrong.com/posts/dnH2hauqRbu3GspA2/repeal-the-jones-act-of-1920 --- Narrated by TYPE III AUDIO . --- Images from the article:…
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race for Machine Superintelligence. Consider subscribing to stay up to date with my work. An influential congressional commission is calling for a militarized race to build superintelligent AI based on threadbare evidence The US-China AI rivalry is entering a dangerous new phase. Earlier today, the US-China Economic and Security Review Commission (USCC) released its annual report, with the following as its top recommendation: Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and [...] --- Outline: (00:28) An influential congressional commission is calling for a militarized race to build superintelligent AI based on threadbare evidence (03:09) What China has said about AI (06:14) Revealing technical errors (08:29) Conclusion The original text contained 1 image which was described by AI. --- First published: November 20th, 2024 Source: https://www.lesswrong.com/posts/KPBPc7RayDPxqxdqY/china-hawks-are-manufacturing-an-ai-arms-race --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
In contract law, there's this thing called a “representation”. Example: as part of a contract to sell my house, I might “represent that” the house contains no asbestos. How is this different from me just, y’know, telling someone that the house contains no asbestos? Well, if it later turns out that the house does contain asbestos, I’ll be liable for any damages caused by the asbestos (like e.g. the cost of removing it). In other words: a contractual representation is a factual claim along with insurance against that claim being false. I claim[1] that people often interpret everyday factual claims and predictions in a way similar to contractual representations. Because “representation” is egregiously confusing jargon, I’m going to call this phenomenon “assurance”. Prototypical example: I tell my friend that I plan to go to a party around 9 pm, and I’m willing to give them a ride. My friend [...] The original text contained 1 footnote which was omitted from this narration. --- First published: October 20th, 2024 Source: https://www.lesswrong.com/posts/p9rQJMRq4qtB9acds/information-vs-assurance --- Narrated by TYPE III AUDIO .…
Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on FrontierMath, 70% on SWE-Verified, 2700 on Codeforces, and 80% on ARC-AGI. These systems will be built in 2026-2027 and enable pretraining models for 5e28 FLOPs, while o3 itself is plausibly based on an LLM pretrained only for 8e25-4e26 FLOPs. The natural text data wall won't seriously interfere until 6e27 FLOPs, and might be possible to push until 5e28 FLOPs. Scaling of pretraining won't end just yet. Reign of GPT-4 Since the release of GPT-4 in March 2023, subjectively there was no qualitative change in frontier capabilities. In 2024, everyone in the running merely caught up. To the extent this is true, the reason might be that the original GPT-4 was probably a 2e25 FLOPs MoE model trained on 20K A100. And if you don't already have a cluster this big, and experience [...] --- Outline: (00:52) Reign of GPT-4 (02:08) Engines of Scaling (04:06) Two More Turns of the Crank (06:41) Peak Data The original text contained 3 footnotes which were omitted from this narration. --- First published: December 22nd, 2024 Source: https://www.lesswrong.com/posts/NXTkEiaLA4JdS5vSZ/what-o3-becomes-by-2028 --- Narrated by TYPE III AUDIO .…
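For a sense of the gap those FLOP figures imply, a quick back-of-envelope calculation (mine, using only the numbers quoted above):

# Figures quoted in the episode description above.
next_gen_pretraining = 5e28             # FLOPs enabled by ~$150bn training systems
o3_base_low, o3_base_high = 8e25, 4e26  # plausible pretraining FLOPs of o3's base LLM

print(f"{next_gen_pretraining / o3_base_high:.0f}x")  # ~125x over the high estimate
print(f"{next_gen_pretraining / o3_base_low:.0f}x")   # ~625x over the low estimate

That is roughly two to three orders of magnitude of pretraining headroom over the model o3 is plausibly built on.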
(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduction and a bit more explanation of jargon.) No one seems to know whether transformational AGI is coming within a few short years. Or rather, everyone seems to know, but they all have conflicting opinions. Have we entered into what will in hindsight be not even the early stages, but actually the middle stage, of the mad tumbling rush into singularity? Or are we just witnessing the exciting early period of a new technology, full of discovery and opportunity, akin to the boom years of the personal computer and the web? AI is approaching elite skill at programming, possibly barreling into superhuman status at advanced mathematics, and only picking up speed. Or so the framing goes. And yet, most of the reasons for skepticism are still present. We still evaluate AI only on neatly encapsulated, objective tasks [...] --- Outline: (02:49) The Slow Scenario (09:13) The Fast Scenario (17:24) Identifying The Requirements for a Short Timeline (22:53) How To Recognize The Express Train to AGI The original text contained 14 footnotes which were omitted from this narration. The original text contained 3 images which were described by AI. --- First published: January 6th, 2025 Source: https://www.lesswrong.com/posts/auGYErf5QqiTihTsJ/what-indicators-should-we-watch-to-disambiguate-agi --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podc…
1 “How will we update about scheming?” by ryan_greenblatt 1:18:48
I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, I (and co-authors) released "Alignment Faking in Large Language Models", which provides empirical evidence for some components of the scheming threat model. One question that's really important is how likely scheming is. But it's also really important to know how much we expect this uncertainty to be resolved by various key points in the future. I think it's about 25% likely that the first AIs capable of obsoleting top human experts[1] are scheming. It's really important for me to know whether I expect to make basically no updates to my P(scheming)[2] between here and the advent of potentially dangerously scheming models, or whether I expect to be basically totally confident one way or another by that point (in the same way that, though I might [...] --- Outline: (03:20) My main qualitative takeaways (04:56) Its reasonably likely (55%), conditional on scheming being a big problem, that we will get smoking guns. (05:38) Its reasonably likely (45%), conditional on scheming being a big problem, that we wont get smoking guns prior to very powerful AI. (15:59) My P(scheming) is strongly affected by future directions in model architecture and how the models are trained (16:33) The model (22:38) Properties of the AI system and training process (23:02) Opaque goal-directed reasoning ability (29:24) Architectural opaque recurrence and depth (34:14) Where do capabilities come from? (39:42) Overall distribution from just properties of the AI system and training (41:20) Direct observations (41:43) Baseline negative updates (44:35) Model organisms (48:21) Catching various types of problematic behavior (51:22) Other observations and countermeasures (52:02) Training processes with varying (apparent) situational awareness (54:05) Training AIs to seem highly corrigible and (mostly) myopic (55:46) Reward hacking (57:28) P(scheming) under various scenarios (putting aside mitigations) (01:05:19) An optimistic and a pessimistic scenario for properties (01:10:26) Conclusion (01:11:58) Appendix: Caveats and definitions (01:14:49) Appendix: Capabilities from intelligent learning algorithms The original text contained 15 footnotes which were omitted from this narration. --- First published: January 6th, 2025 Source: https://www.lesswrong.com/posts/aEguDPoCzt3287CCD/how-will-we-update-about-scheming --- Narrated by TYPE III AUDIO .…
This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There's a bunch of good and interesting answers in the interview about past events that I won’t mention or have to condense a lot here, such as his going over his calendar and all the meetings he constantly has, so consider reading the whole thing. Table of Contents The Battle of the Board. Altman Lashes Out. Inconsistently Candid. On Various People Leaving OpenAI. The Pitch. Great Expectations. Accusations of Fake News. OpenAI's Vision Would Pose an Existential Risk To Humanity. The Battle of the Board Here is what he says about the Battle of the Board in Reflections: Sam Altman: A little over a year ago, on one particular Friday, the main thing that had gone wrong that day was [...] --- Outline: (00:25) The Battle of the Board (05:12) Altman Lashes Out (07:48) Inconsistently Candid (09:35) On Various People Leaving OpenAI (10:56) The Pitch (12:07) Great Expectations (12:56) Accusations of Fake News (15:02) OpenAI's Vision Would Pose an Existential Risk To Humanity --- First published: January 7th, 2025 Source: https://www.lesswrong.com/posts/XAKYawaW9xkb3YCbF/openai-10-reflections --- Narrated by TYPE III AUDIO .…
As someone who writes for fun, I don't need to get people onto my site: If I write a post and some people are able to get the core idea just from the title or a tweet-length summary, great! I can include the full contents of my posts in my RSS feed and on FB, because so what if people read the whole post there and never click through to my site? It would be different if I funded my writing through ads (maximize time on site to maximize impressions) or subscriptions (get the chance to pitch, probably want to tease a paywall). Sometimes I notice myself accidentally copying what makes sense for other writers. For example, because I can't put full-length posts on Bluesky or Mastodon I write short intros and [...] --- First published: January 5th, 2025 Source: https://www.lesswrong.com/posts/ZqcC6Znyg8YrmKPa4/maximizing-communication-not-traffic --- Narrated by TYPE III AUDIO .…
This is a low-effort post. I mostly want to get other people's takes and express concern about the lack of detailed and publicly available plans so far. This post reflects my personal opinion and not necessarily that of other members of Apollo Research. I’d like to thank Ryan Greenblatt, Bronson Schoen, Josh Clymer, Buck Shlegeris, Dan Braun, Mikita Balesni, Jérémy Scheurer, and Cody Rushing for comments and discussion. I think short timelines, e.g. AIs that can replace a top researcher at an AGI lab without losses in capabilities by 2027, are plausible. Some people have posted ideas on what a reasonable plan to reduce AI risk for such timelines might look like (e.g. Sam Bowman's checklist, or Holden Karnofsky's list in his 2022 nearcast), but I find them insufficient for the magnitude of the stakes (to be clear, I don’t think these example lists were intended to be an [...] --- Outline: (02:36) Short timelines are plausible (07:10) What do we need to achieve at a minimum? (10:50) Making conservative assumptions for safety progress (12:33) So whats the plan? (14:31) Layer 1 (15:41) Keep a paradigm with faithful and human-legible CoT (18:15) Significantly better (CoT, action and white-box) monitoring (21:19) Control (that doesn't assume human-legible CoT) (24:16) Much deeper understanding of scheming (26:43) Evals (29:56) Security (31:52) Layer 2 (32:02) Improved near-term alignment strategies (34:06) Continued work on interpretability, scalable oversight, superalignment and co (36:12) Reasoning transparency (38:36) Safety first culture (41:49) Known limitations and open questions --- First published: January 2nd, 2025 Source: https://www.lesswrong.com/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan --- Narrated by TYPE III AUDIO .…
1 “Shallow review of technical AI safety, 2024” by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers 1:57:07
from aisafety.world The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense that 1) we are not specialists in almost any of it and that 2) we only spent about an hour on each entry. We also only use public information, so we are bound to be off by some additional factor. The point is to help anyone look up some of what is happening, or that thing you vaguely remember reading about; to help new researchers orient and know (some of) their options; to help policy people know who to talk to for the actual information; and ideally to help funders see quickly what has already been funded and how much (but this proves to be hard). “AI safety” means many things. We’re targeting work that intends to prevent very competent [...] --- Outline: (01:33) Editorial (08:15) Agendas with public outputs (08:19) 1. Understand existing models (08:24) Evals (14:49) Interpretability (27:35) Understand learning (31:49) 2. Control the thing (40:31) Prevent deception and scheming (46:30) Surgical model edits (49:18) Goal robustness (50:49) 3. Safety by design (52:57) 4. Make AI solve it (53:05) Scalable oversight (01:00:14) Task decomp (01:00:28) Adversarial (01:04:36) 5. Theory (01:07:27) Understanding agency (01:15:47) Corrigibility (01:17:29) Ontology Identification (01:21:24) Understand cooperation (01:26:32) 6. Miscellaneous (01:50:40) Agendas without public outputs this year (01:51:04) Graveyard (known to be inactive) (01:52:00) Method (01:55:09) Other reviews and taxonomies (01:56:11) Acknowledgments The original text contained 9 footnotes which were omitted from this narration. --- First published: December 29th, 2024 Source: https://www.lesswrong.com/posts/fAW6RXLKTLHC3WXkS/shallow-review-of-technical-ai-safety-2024 --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
I've heard many people say something like "money won't matter post-AGI". This has always struck me as odd, and as most likely completely incorrect. First: labour means human mental and physical effort that produces something of value. Capital goods are things like factories, data centres, and software—things humans have built that are used in the production of goods and services. I'll use "capital" to refer to both the stock of capital goods and to the money that can pay for them. I'll say "money" when I want to exclude capital goods. The key economic effect of AI is that it makes capital a more and more general substitute for labour. There's less need to pay humans for their time to perform work, because you can replace that with capital (e.g. data centres running software replaces a human doing mental labour). I will walk through consequences of this, and end [...] --- Outline: (03:10) The default solution (04:18) Money currently struggles to buy talent (09:15) Most people's power/leverage derives from their labour (09:41) Why are states ever nice? (14:32) No more outlier outcomes? (20:27) Enforced equality is unlikely (22:34) The default outcome? (26:04) What's the takeaway? --- First published: December 28th, 2024 Source: https://www.lesswrong.com/posts/KFFaKu27FNugCHFmh/by-default-capital-will-matter-more-than-ever-after-agi --- Narrated by TYPE III AUDIO .…
Take a stereotypical fantasy novel, a textbook on mathematical logic, and Fifty Shades of Grey. Mix them all together and add extra weirdness for spice. The result might look a lot like Planecrash (AKA: Project Lawful), a work of fiction co-written by "Iarwain" (a pen-name of Eliezer Yudkowsky) and "lintamande". (image from Planecrash) Yudkowsky is not afraid to be verbose and self-indulgent in his writing. He previously wrote a Harry Potter fanfic that includes what's essentially an extended Ender's Game fanfic in the middle of it, because why not. In Planecrash, it starts with the very format: it's written as a series of forum posts (though there are ways to get an ebook). It continues with maths lectures embedded into the main arc, totally plot-irrelevant tangents that are just Yudkowsky ranting about frequentist statistics, and one instance of Yudkowsky hijacking the plot for a few pages to soapbox about [...] --- Outline: (02:05) The setup (04:03) The characters (05:49) The competence (09:58) The philosophy (12:07) Validity, Probability, Utility (15:20) Coordination (18:00) Decision theory (23:12) The political philosophy of dath ilan (34:34) A system of the world --- First published: December 27th, 2024 Source: https://www.lesswrong.com/posts/zRHGQ9f6deKbxJSji/review-planecrash --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, "this is where the light is". Over the past few years, a major source of my relative optimism on AI has been the hope that the field of alignment would transition from pre-paradigmatic to paradigmatic, and make much more rapid progress. At this point, that hope is basically dead. There has been some degree of paradigm formation, but the memetic competition has mostly been won by streetlighting: the large majority of AI Safety researchers and activists [...] --- Outline: (01:23) What This Post Is And Isn't, And An Apology (03:39) Why The Streetlighting? (03:42) A Selection Model (05:47) Selection and the Labs (07:06) A Flinching Away Model (09:47) What To Do About It (11:16) How We Got Here (11:57) Who To Recruit Instead (13:02) Integration vs Separation --- First published: December 26th, 2024 Source: https://www.lesswrong.com/posts/nwpyhyagpPYDn4dAW/the-field-of-ai-alignment-a-postmortem-and-what-to-do-about --- Narrated by TYPE III AUDIO .…
TL;DR: If you want to know whether getting insurance is worth it, use the Kelly Insurance Calculator. If you want to know why or how, read on. Note to LW readers: this is almost the entire article, except some additional maths that I couldn't figure out how to get right in the LW editor, and margin notes. If you're very curious, read the original article! Misunderstandings about insurance People online sometimes ask if they should get some insurance, and then other people say incorrect things, like This is a philosophical question; my spouse and I differ in views. or Technically no insurance is ever worth its price, because if it was then no insurance companies would be able to exist in a market economy. or Get insurance if you need it to sleep well at night. or Instead of getting insurance, you should save up the premium you would [...] --- Outline: (00:29) Misunderstandings about insurance (02:42) The purpose of insurance (03:41) Computing when insurance is worth it (04:46) Motorcycle insurance (06:05) The effect of the deductible (06:23) Helicopter hovering exercise (07:39) It's not that hard (08:19) Appendix A: Anticipated and actual criticism (09:37) Appendix B: How insurance companies make money (10:31) Appendix C: The relativity of costs --- First published: December 19th, 2024 Source: https://www.lesswrong.com/posts/wf4jkt4vRH7kC2jCy/when-is-insurance-worth-it --- Narrated by TYPE III AUDIO .…
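The calculator's underlying rule can be sketched in a few lines. This is my own simplified rendition of the expected-log-wealth (Kelly) criterion, assuming full coverage and a single possible loss event, not the calculator's actual code:

import math

def insurance_worth_it(wealth, premium, loss, p_loss):
    """Kelly-style criterion: insure iff doing so raises expected log wealth.
    Assumes full coverage and a single possible loss, a simplification of the
    calculator the post links to."""
    with_insurance = math.log(wealth - premium)
    without_insurance = (p_loss * math.log(wealth - loss)
                         + (1 - p_loss) * math.log(wealth))
    return with_insurance > without_insurance

# Example: $20k wealth, a $500 premium against a 5% chance of a $10k loss.
print(insurance_worth_it(20_000, 500, 10_000, 0.05))  # True

In this example the premium exactly equals the expected loss, so a pure expected-value argument is indifferent, yet the log-wealth criterion says to buy: that wedge, driven by how painful the loss would be relative to your wealth, is what the post is computing.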
My median expectation is that AGI[1] will be created 3 years from now. This has implications on how to behave, and I will share some useful thoughts I and others have had on how to orient to short timelines. I’ve led multiple small workshops on orienting to short AGI timelines and compiled the wisdom of around 50 participants (but mostly my thoughts) here. I’ve also participated in multiple short-timelines AGI wargames and co-led one wargame. This post will assume median AGI timelines of 2027 and will not spend time arguing for this point. Instead, I focus on what the implications of 3 year timelines would be. I didn’t update much on o3 (as my timelines were already short) but I imagine some readers did and might feel disoriented now. I hope this post can help those people and others in thinking about how to plan for 3 year [...] --- Outline: (01:16) A story for a 3 year AGI timeline (03:46) Important variables based on the year (03:58) The pre-automation era (2025-2026). (04:56) The post-automation era (2027 onward). (06:05) Important players (08:00) Prerequisites for humanity's survival which are currently unmet (11:19) Robustly good actions (13:55) Final thoughts The original text contained 2 footnotes which were omitted from this narration. --- First published: December 22nd, 2024 Source: https://www.lesswrong.com/posts/jb4bBdeEEeypNkqzj/orienting-to-3-year-agi-timelines --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
There are people I can talk to, where all of the following statements are obvious. They go without saying. We can just "be reasonable" together, with the context taken for granted. And then there are people who…don't seem to be on the same page at all. There's a real way to do anything, and a fake way; we need to make sure we're doing the real version. Concepts like Goodhart's Law, cargo-culting, greenwashing, hype cycles, Sturgeon's Law, even bullshit jobs[1] are all pointing at the basic understanding that it's easier to seem good than to be good, that the world is full of things that merely appear good but aren't really, and that it's important to vigilantly sift out the real from the fake. This feels obvious! This feels like something that should not be contentious! If anything, I often get frustrated with chronic pessimists [...] --- First published: December 20th, 2024 Source: https://www.lesswrong.com/posts/sAcPTiN86fAMSA599/what-goes-without-saying --- Narrated by TYPE III AUDIO .…
I'm editing this post. OpenAI announced (but hasn't released) o3 (skipping o2 for trademark reasons). It gets 25% on FrontierMath, smashing the previous SoTA of 2%. (These are really hard math problems.) Wow. 72% on SWE-bench Verified, beating o1's 49%. Also 88% on ARC-AGI. --- First published: December 20th, 2024 Source: https://www.lesswrong.com/posts/Ao4enANjWNsYiSFqc/o3 --- Narrated by TYPE III AUDIO .…
I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, and I'm afraid it will stick and lead to more confusion. This post offers a different frame. The main way I think about the result is: it's about capability - the model exhibits strategic preference preservation behavior; also, harmlessness generalized better than honesty; and, the model does not have a clear strategy on how to deal with extrapolating conflicting values. What happened in this frame? The model was trained on a mixture of values (harmlessness, honesty, helpfulness) and built a surprisingly robust self-representation based on these values. This likely also drew on background knowledge about LLMs, AI, and Anthropic from pre-training. This seems to mostly count as 'success' relative to actual Anthropic intent, outside of AI safety experiments. Let's call that intent 'Intent_1'. The model was put [...] --- Outline: (00:45) What happened in this frame? (03:03) Why did harmlessness generalize further? (03:41) Alignment mis-generalization (05:42) Situational awareness (10:23) Summary The original text contained 1 image which was described by AI. --- First published: December 20th, 2024 Source: https://www.lesswrong.com/posts/PWHkMac9Xve6LoMJy/alignment-faking-frame-is-somewhat-fake-1 --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
1 “The Compendium, A full argument about extinction risk from AGI” by adamShimi, Gabriel Alfour, Connor Leahy, Chris Scammell, Andrea_Miotti 4:18
This is a link post. We (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi) have just published The Compendium, which brings together in a single place the most important arguments that drive our models of the AGI race, and what we need to do to avoid catastrophe. We felt that something like this has been missing from the AI conversation. Most of these points have been shared before, but a "comprehensive worldview" doc has been missing. We've tried our best to fill this gap, and welcome feedback and debate about the arguments. The Compendium is a living document, and we'll keep updating it as we learn more and change our minds. We would appreciate your feedback, whether or not you agree with us: If you do agree with us, please point out where you think the arguments can be made stronger, and contact us if there are [...] --- First published: October 31st, 2024 Source: https://www.lesswrong.com/posts/prm7jJMZzToZ4QxoK/the-compendium-a-full-argument-about-extinction-risk-from --- Narrated by TYPE III AUDIO .…
There are two nuclear options for treating depression: Ketamine and TMS; This post is about the latter. TMS stands for Transcranial Magnetic Stimulation. Basically, it fixes depression via magnets, which is about the second or third most magical things that magnets can do. I don’t know a whole lot about the neuroscience - this post isn’t about the how or the why. It's from the perspective of a patient, and it's about the what. What is it like to get TMS? TMS The Gatekeeping For Reasons™, doctors like to gatekeep access to treatments, and TMS is no different. To be eligible, you generally have to have tried multiple antidepressants for several years and had them not work or stop working. Keep in mind that, while safe, most antidepressants involve altering your brain chemistry and do have side effects. Since TMS is non-invasive, doesn’t involve any drugs, and has basically [...] --- Outline: (00:35) TMS (00:38) The Gatekeeping (01:49) Motor Threshold Test (04:08) The Treatment (04:15) The Schedule (05:20) The Experience (07:03) The Sensation (08:21) Results (09:06) Conclusion The original text contained 2 images which were described by AI. --- First published: October 31st, 2024 Source: https://www.lesswrong.com/posts/g3iKYS8wDapxS757x/what-tms-is-like --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
Epistemic status: model-building based on observation, with a few successful unusual predictions. Anecdotal evidence has so far been consistent with the model. This puts it at risk of seeming more compelling than the evidence justifies just yet. Caveat emptor. Imagine you're a very young child. Around, say, three years old. You've just done something that really upsets your mother. Maybe you were playing and knocked her glasses off the table and they broke. Of course you find her reaction uncomfortable. Maybe scary. You're too young to have detailed metacognitive thoughts, but if you could reflect on why you're scared, you wouldn't be confused: you're scared of how she'll react. She tells you to say you're sorry. You utter the magic words, hoping that will placate her. And she narrows her eyes in suspicion. "You sure don't look sorry. Say it and mean it." Now you have a serious problem. [...] --- Outline: (02:16) Newcomblike self-deception (06:10) Sketch of a real-world version (08:43) Possible examples in real life (12:17) Other solutions to the problem (12:38) Having power (14:45) Occlumency (16:48) Solution space is maybe vast (17:40) Ending the need for self-deception (18:21) Welcome self-deception (19:52) Look away when directed to (22:59) Hypothesize without checking (25:50) Does this solve self-deception? (27:21) Summary The original text contained 7 footnotes which were omitted from this narration. --- First published: October 27th, 2024 Source: https://www.lesswrong.com/posts/5FAnfAStc7birapMx/the-hostile-telepaths-problem --- Narrated by TYPE III AUDIO .…
This post includes a "flattened version" of an interactive diagram that cannot be displayed on this site. I recommend reading the original version of the post with the interactive diagram, which can be found here. Over the last few months, ARC has released a number of pieces of research. While some of these can be independently motivated, there is also a more unified research vision behind them. The purpose of this post is to try to convey some of that vision and how our individual pieces of research fit into it. Thanks to Ryan Greenblatt, Victor Lecomte, Eric Neyman, Jeff Wu and Mark Xu for helpful comments. A bird's eye view To begin, we will take a "bird's eye" view of ARC's research.[1] As we "zoom in", more nodes will become visible and we will explain the new nodes. An interactive version of the [...] --- Outline: (00:43) A birds eye view (01:00) Zoom level 1 (02:18) Zoom level 2 (03:44) Zoom level 3 (04:56) Zoom level 4 (07:14) How ARCs research fits into this picture (07:43) Further subproblems (10:23) Conclusion The original text contained 2 footnotes which were omitted from this narration. The original text contained 3 images which were described by AI. --- First published: October 23rd, 2024 Source: https://www.lesswrong.com/posts/ztokaf9harKTmRcn4/a-bird-s-eye-view-of-arc-s-research --- Narrated by TYPE III AUDIO . --- Images from the article:…
1. 4.4% of the US federal budget went into the space race at its peak. This was surprising to me, until a friend pointed out that landing rockets on specific parts of the moon requires very similar technology to landing rockets in soviet cities.[1] I wonder how much more enthusiastic the scientists working on Apollo were, with the convenient motivating story of "I'm working towards a great scientific endeavor" vs "I'm working to make sure we can kill millions if we want to". 2. The field of alignment seems to be increasingly dominated by interpretability. (and obedience[2]) This was surprising to me[3], until a friend pointed out that partially opening the black box of NNs is the kind of technology that would help scaling labs find new unhobblings, by noticing ways in which the internals of their models are being inefficient and having better tools to evaluate capabilities advances.[4] I [...] --- Outline: (00:03) 1. (00:35) 2. (01:20) 3. The original text contained 6 footnotes which were omitted from this narration. --- First published: October 21st, 2024 Source: https://www.lesswrong.com/posts/h4wXMXneTPDEjJ7nv/a-rocket-interpretability-analogy --- Narrated by TYPE III AUDIO .…
This summer, I participated in a human challenge trial at the University of Maryland. I spent the days just prior to my 30th birthday sick with shigellosis. What? Why? Dysentery is an acute disease in which pathogens attack the intestine. It is most often caused by the bacteria Shigella. It spreads via the fecal-oral route. It requires an astonishingly low number of pathogens to make a person sick – so it spreads quickly, especially in bad hygienic conditions or anywhere water can get tainted with feces. It kills about 70,000 people a year, 30,000 of whom are children under the age of 5. Almost all of these cases and deaths are among very poor people. The primary mechanism by which dysentery kills people is dehydration. The person loses fluids to diarrhea and for whatever reason (lack of knowledge, energy, water, etc) cannot regain them sufficiently. Shigella bacteria are increasingly [...] --- Outline: (00:15) What? Why? (01:18) The deal with human challenge trials (02:46) Dysentery: it's a modern disease (04:27) Getting ready (07:25) Two days until challenge (10:19) One day before challenge: the age of phage (11:08) Bacteriophage therapy: sending a cat after mice (14:14) Do they work? (16:17) Day 1 of challenge (17:09) The waiting game (18:20) Let's learn about Shigella pathogenesis (23:34) Let's really learn about Shigella pathogenesis (27:03) Out the other side (29:24) Aftermath The original text contained 3 footnotes which were omitted from this narration. The original text contained 2 images which were described by AI. --- First published: October 22nd, 2024 Source: https://www.lesswrong.com/posts/inHiHHGs6YqtvyeKp/i-got-dysentery-so-you-don-t-have-to --- Narrated by TYPE III AUDIO . --- Images from the article:…
This is a link post. Part 1: Our Thinking Near and Far 1 Abstract/Distant Future Bias 2 Abstractly Ideal, Concretely Selfish 3 We Add Near, Average Far 4 Why We Don't Know What We Want 5 We See the Sacred from Afar, to See It Together 6 The Future Seems Shiny 7 Doubting My Far Mind Disagreement 8 Beware the Inside View 9 Are Meta Views Outside Views? 10 Disagreement Is Near-Far Bias 11 Others' Views Are Detail 12 Why Be Contrarian? 13 On Disagreement, Again 14 Rationality Requires Common Priors 15 Might Disagreement Fade Like Violence? Biases 16 Reject Random Beliefs 17 Chase Your Reading 18 Against Free Thinkers 19 Eventual Futures 20 Seen vs. Unseen Biases 21 Law as No-Bias Theatre 22 Benefit of Doubt = Bias Part 2: Our Motives Signaling 23 Decision Theory Remains Neglected 24 What Function Music? 25 Politics isn't about Policy 26 Views [...] --- Outline: (00:07) Part 1: Our Thinking (00:12) Near and Far (00:37) Disagreement (01:04) Biases (01:28) Part 2: Our Motives (01:33) Signaling (02:01) Norms (02:35) Fiction (02:58) The Dreamtime (03:19) Part 3: Our Institutions (03:25) Prediction Markets (03:48) Academia (04:06) Medicine (04:15) Paternalism (04:29) Law (05:21) Part 4: Our Past (05:26) Farmers and Foragers (05:55) History as Exponential Modes (06:09) The Great Filter (06:35) Part 5: Our Future (06:39) Aliens (07:01) UFOs (07:22) The Age of Em (07:44) Artificial Intelligence --- First published: October 20th, 2024 Source: https://www.lesswrong.com/posts/JxsJdBnL2gG5oa2Li/overcoming-bias-anthology --- Narrated by TYPE III AUDIO .…
Of all the cognitive tools our ancestors left us, what's best? Society seems to think pretty highly of arithmetic. It's one of the first things we learn as children. So I think it's weird that only a tiny percentage of people seem to know how to actually use arithmetic. Or maybe even understand what arithmetic is for. Why? I think the problem is the idea that arithmetic is about “calculating”. No! Arithmetic is a world-modeling technology. Arguably, it's the best world-modeling technology: It's simple, it's intuitive, and it applies to everything. It allows you to trespass into scientific domains where you don’t belong. It even has an amazing error-catching mechanism built in. One hundred years ago, maybe it was important to learn long division. But the point of long division was to enable you to do world-modeling. Computers don’t make arithmetic obsolete. If anything, they do the opposite. Without [...] --- Outline: (01:17) Chimps (06:18) Big blocks (09:34) More big blocks The original text contained 5 images which were described by AI. --- First published: October 17th, 2024 Source: https://www.lesswrong.com/posts/r2LojHBs3kriafZWi/arithmetic-is-an-underrated-world-modeling-technology --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or anothe…
This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to day, I'm more focussed on the positive points, but awareness of the negative has been crucial to forming my priorities, so I'm going to start with those. It's mostly addressed to the EA community, but is hopefully somewhat of interest to LessWrong and the Alignment Forum as well. My main concerns I think AGI is going to be developed soon, and quickly. Possibly (20%) that's next year, and most likely (80%) before the end of 2029. These are not things you need to believe for yourself in order to understand my view, so no worries if you're not personally convinced of this. (For what it's worth, I did arrive at this view through years of study and research in AI, combined with over a decade of private forecasting practice [...] --- Outline: (00:28) My main concerns (03:41) Extinction by industrial dehumanization (06:00) Successionism as a driver of industrial dehumanization (11:08) My theory of change: confronting successionism with human-specific industries (15:53) How I identified healthcare as the industry most relevant to caring for humans (20:00) But why not just do safety work with big AI labs or governments? (23:22) Conclusion The original text contained 1 image which was described by AI. --- First published: October 12th, 2024 Source: https://www.lesswrong.com/posts/Kobbt3nQgv3yn29pr/my-theory-of-change-for-working-in-ai-healthtech --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then lay out my two main objections (inspired by ideas in philosophy of science). A follow-up post will speculate about how to formalize an alternative. Degrees of belief The core idea of Bayesianism: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. If that seems like a sufficient characterization to you, you can go ahead and skip to the next section, where I explain my objections to it. But for those who want a more precise description of Bayesianism, and some existing objections to it, I’ll more specifically characterize it in terms of five subclaims. Bayesianism says that we should ideally reason in terms of: Propositions which are either true or false (classical logic) Each of [...] --- Outline: (00:22) Degrees of belief (04:06) Degrees of truth (08:05) Model-based reasoning (13:43) The role of Bayesianism The original text contained 1 image which was described by AI. --- First published: October 6th, 2024 Source: https://www.lesswrong.com/posts/TyusAoBMjYzGN3eZS/why-i-m-not-a-bayesian --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
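For readers who want the machinery being critiqued in front of them, the standard update rule in this framework is ordinary Bayesian conditionalization (textbook form, not quoted from the post):

P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}

The post's objections are aimed at the surrounding commitments (for example, treating propositions as strictly true or false) rather than at this identity itself.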
As humanity gets closer to Artificial General Intelligence (AGI), a new geopolitical strategy is gaining traction in US and allied circles, in the NatSec, AI safety and tech communities. Anthropic CEO Dario Amodei and RAND Corporation call it the “entente”, while others privately refer to it as “hegemony" or “crush China”. I will argue that, irrespective of one's ethical or geopolitical preferences, it is fundamentally flawed and against US national security interests. If the US fights China in an AGI race, the only winners will be machines The entente strategy Amodei articulates key elements of this strategy as follows: "a coalition of democracies seeks to gain a clear advantage (even just a temporary one) on powerful AI by securing its supply chain, scaling quickly, and blocking or delaying adversaries’ access to key resources like chips and semiconductor equipment. This coalition would on one hand use AI to achieve robust [...] --- Outline: (00:51) The entente strategy (02:22) Why it's a suicide race (09:19) Loss-of-control (11:32) A better strategy: tool AI The original text contained 1 image which was described by AI. --- First published: October 13th, 2024 Source: https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-agi-entente-delusion --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
I think that most people underestimate how many scientific mysteries remain, even on questions that sound basic. My favourite candidate for "the most basic thing that is still unknown" is the momentum carried by light, when it is in a medium (for example, a flash of light in glass or water). If a block of glass has a refractive index of n, then the light inside that block travels n times slower than the light would in vacuum. But what is the momentum of that light wave in the glass relative to the momentum it would have in vacuum? In 1908 Abraham proposed that the light's momentum would be reduced by a factor of n. This makes sense on the surface: n times slower means n times less momentum. This gives a single photon a momentum of ħω/(nc). For ω the angular frequency, c the [...] The original text contained 13 footnotes which were omitted from this narration. The original text contained 2 images which were described by AI. --- First published: October 9th, 2024 Source: https://www.lesswrong.com/posts/njBRhELvfMtjytYeH/momentum-of-light-in-glass --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.…
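In symbols (my notation; the Abraham form is the one the excerpt reaches, and the Minkowski form is the standard rival in this controversy, included here only for context):

p_{\text{vacuum}} = \frac{\hbar\omega}{c}, \qquad
p_{\text{Abraham}} = \frac{\hbar\omega}{nc} = \frac{p_{\text{vacuum}}}{n}, \qquad
p_{\text{Minkowski}} = \frac{n\hbar\omega}{c} = n\, p_{\text{vacuum}}.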
How can we make many humans who are very good at solving difficult problems? Summary (table of made-up numbers) I made up the made-up numbers in this table of made-up numbers; therefore, the numbers in this table of made-up numbers are made-up numbers. Call to action If you have a shitload of money, there are some projects you can give money to that would make supergenius humans on demand happen faster. If you have a fuckton of money, there are projects whose creation you could fund that would greatly accelerate this technology. If you're young and smart, or are already an expert in either stem cell / reproductive biology, biotech, or anything related to brain-computer interfaces, there are some projects you could work on. If neither, think hard, maybe I missed something. You can DM me or gmail [...] --- Outline: (00:12) Summary (table of made-up numbers) (00:45) Call to action (01:22) Context (01:25) The goal (02:56) Constraint: Algernons law (04:30) How to know what makes a smart brain (04:35) Figure it out ourselves (04:53) Copy natures work (05:18) Brain emulation (05:21) The approach (06:07) Problems (07:52) Genomic approaches (08:34) Adult brain gene editing (08:38) The approach (08:53) Problems (09:26) Germline engineering (09:32) The approach (11:37) Problems (12:11) Signaling molecules for creative brains (12:15) The approach (13:30) Problems (13:45) Brain-brain electrical interface approaches (14:41) Problems with all electrical brain interface approaches (15:11) Massive cerebral prosthetic connectivity (17:03) Human / human interface (17:59) Interface with brain tissue in a vat (18:30) Massive neural transplantation (18:35) The approach (19:01) Problems (19:39) Support for thinking (19:53) The approaches (21:04) Problems (21:58) FAQ (22:01) What about weak amplification (22:14) What about ... (24:04) The real intelligence enhancement is ... The original text contained 3 images which were described by AI. --- First published: October 8th, 2024 Source: https://www.lesswrong.com/posts/jTiSWHKAtnyA723LE/overview-of-strong-human-intelligence-amplification-methods --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the ep…
This post is probably hazardous for one type of person in one particular growth stage, and necessary for people in a different growth stage, and I don't really know how to tell the difference in advance. If you read it and feel like it kinda wrecked you send me a DM. I'll try to help bandage it. One of my favorite stories growing up was Star Wars: Traitor, by Matthew Stover. The book is short, if you want to read it. Spoilers follow. (I took a look at it again recently and I think it didn't obviously hold up as real adult fiction, although quite good if you haven't yet had your mind blown that many times) One anecdote from the story has stayed with me and permeates my worldview. The story begins with "Jacen Solo has been captured, and is being tortured." He is being [...] The original text contained 3 footnotes which were omitted from this narration. --- First published: September 24th, 2024 Source: https://www.lesswrong.com/posts/hvj9NGodhva9pKGTj/struggling-like-a-shadowmoth --- Narrated by TYPE III AUDIO .…
This is a description of my work on some data science projects, lightly obfuscated and fictionalized to protect the confidentiality of the organizations I handled them for (and also to make it flow better). I focus on the high-level epistemic/mathematical issues, and the lived experience of working on intellectual problems, but gloss over the timelines and implementation details. The Upper Bound One time, I was working for a company which wanted to win some first-place sealed-bid auctions in a market they were thinking of joining, and asked me to model the price-to-beat in those auctions. There was a twist: they were aiming for the low end of the market, and didn't care about lots being sold for more than $1000. "Okay," I told them. "I'll filter out everything with a price above $1000 before building any models or calculating any performance metrics!" They approved of this, and told me [...] --- Outline: (00:27) The Upper Bound (02:58) The Time-Travelling Convention (05:56) The Tobit Problem (06:30) My Takeaways The original text contained 3 footnotes which were omitted from this narration. --- First published: October 1st, 2024 Source: https://www.lesswrong.com/posts/rzyHbLZHuqHq6KM65/three-subtle-examples-of-data-leakage --- Narrated by TYPE III AUDIO .…
Welcome to Player FM!
Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and it works on Android, iPhone, and the web. Sign up to sync subscriptions across devices.