I'm part of a small Discord server and thought it would be funny to make a Geoguessr-style game where you get presented with a random interesting message from the server and have to guess when, where and by whom it was posted.

In the last few weeks, I've been playing around with the newest version of Claude Code, which wrote me a read-it-later service including RSS, email newsletters and an Android app.
Software engineering experience was useful, since I planned out a lot of the high-level design and data model and sometimes pushed for simpler designs. Overall, though, I mostly felt like a product manager trying to specify features as quickly as possible. While software engineering is more than coding, I'm starting to think Claude is already superhuman at this part.

One of my favorite AI papers is “Let's Think Dot by Dot”, which finds that LLMs can use meaningless filler tokens (like “.”) to improve their performance. I was overestimating the implications until recently, and I think other people might be too.
The paper finds that LLMs can be trained to use filler tokens to increase their ability to do parallel reasoning tasks. This has been compared to chain of thought, but CoT lets models increase their sequential reasoning, which is more powerful. I now think this paper should be taken as evidence against LLMs' ability to perform long-term reasoning in secret.

Cloudflare recently had an incident where some code expected that a list would never contain more than 20 items, and then it was presented with a list of more than 20 items. Internet commenters rushed to point out that the problem was that the code was written in Rust, or that the source code had the word unwrap in it. A surprising number of people argued that they should have just "handled" this error.
I think this is wrong, and it completely misses how software is made robust.
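To make the debate concrete: a minimal Rust sketch of the pattern at issue. The function name, the fixed capacity of 20, and the error type are hypothetical illustrations, not the actual incident code; the point is only that unwrap turns a violated assumption into a panic, while the Result it wraps could also be handled explicitly.

```rust
// Hypothetical sketch: code that assumes a list never holds more
// than 20 items, returning an error when the assumption is violated.
fn take_up_to_20(items: &[u32]) -> Result<[u32; 20], String> {
    if items.len() > 20 {
        return Err(format!("expected at most 20 items, got {}", items.len()));
    }
    let mut buf = [0u32; 20];
    buf[..items.len()].copy_from_slice(items); // pad the rest with zeros
    Ok(buf)
}

fn main() {
    // Within the assumed limit: Ok.
    assert!(take_up_to_20(&[1, 2, 3]).is_ok());

    // Over the limit: Err. Calling .unwrap() on this value would
    // panic, which is the behavior commenters objected to:
    // take_up_to_20(&too_many).unwrap();
    let too_many: Vec<u32> = (0..25).collect();
    assert!(take_up_to_20(&too_many).is_err());
}
```

Whether the caller should propagate this error, log it, or crash is exactly the design question the commenters skipped over.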