Snorting the AGI with Claude Code

315 points by beigebrucewayne 2 days ago

Not trying to be rude here, but that `last_week.md` is horrible to me. I can't imagine having to read that let alone listen to the computer say it to me. It's so much blah blah and fluff that reads like a bad PR piece. I'd much rather scan through commits of the last week.

I've found this generally with AI summaries...usually their writing style is terrible, and I feel like I cannot really trust them to get the facts right, and reading the original text is often faster and better.

never_inline a day ago
Here's a system prompt I tend to use
```
    ## Instructions
    * Be concise
    * Use simple sentences. But feel free to use technical jargon.
    * Do NOT overexplain basic concepts. Assume the user is technically proficient.
    * AVOID flattering, corporate-ish or marketing language. Maintain a neutral viewpoint.
    * AVOID vague and / or generic claims which may seem correct but are not substantiated by the the context.
```
Cannot completely avoid hallucinations and it's good to avoid AI for text that's used for human-to-human communication. But this makes AI answers to coding and technical questions easier to read.
- NewsaHackO 5 hours ago
  
  >it's good to avoid AI for text that's used for human-to-human communication.
  Assuming it is fact checked, why?
  
  never_inline 2 hours ago
  
  Why would I?
  The only argument is that it improves the style of writing.
  But I am in an ESL environment and no one cares about that.
  Even otherwise why would anyone want to read a decompressed version instead of the "prompt" itself?
  
  wpm 3 hours ago
  
  Personally, I find it hard to not be insulted by it. If I put thought into a comment or question or request, I don’t want generated nonsense back.
  As I say, keep your slop in your own trough.
WD-42 a day ago

I felt the same thing about the onboarding. Like what future are we trying to build for ourselves here, exactly? The kind where instead of sitting down with a coworker to learn about a codebase, instead we get an ai generated PowerPoint to read alone????
Im so over this timeline.
- JohnMakin a day ago
  
  all of this just reads like the supposed UML zeitgeist that was supposed to transform java and eliminate development 20 years ago
  if this is all ultimately java but with even more steps, its a sign im definitely getting old. it’s just the same pattern of non technical people deceiving themselves into believing they dont need to be technical to build tech and then ultimately resulting in again 10-20 years of re-learning the painful lessons of that.
  let me off this train too im tired already
  
  TeMPOraL 20 hours ago
  
  The mistake was going after programmers, instead of going after programming languages, where the actual problem is.
  UML may be ugly and in need of streamlining, but the idea of building software by creating and manipulating artifacts at the same conceptual level we are thinking at any given moment, is sound. Alas, we've long ago hit a wall in how much cross-cutting complexity we can stuff into the same piece of plaintext code, and we've been painfully scraping along the Pareto frontier ever since, vacillating between large and small functions and wasting time debating merits of sum types in lieu of exception handling, hoping that if we throw more CS PhDs into category theory blender, they'll eventually come up with some heavy duty super-mapping super monad that'll save us all.
  (I wrote a lot on it in in the past here; c.f. "pareto frontier" and "plaintext single source of truth codebase".)
  Unfortunately, it may be too late to fix it properly. Yes, LLMs are getting good enough to just translate between different perspectives/concerns on the fly, and doing the dirty work on the raw codebase for us. But they're also getting good enough that managers and non-technical people may finally get what they always wanted: building tech without being technical. For the first time ever, that goal is absolutely becoming realistic, and already possible in the small - that's what the whole "vibe coding" thing heralds.
  
  WD-42 14 hours ago
  
  I’ve heard this many times before but I’ve never heard an argument that rebukes the plain fact that text is extremely expressive, and basically anything else we try to replace it with less so. And it happens that making a von Neumann machine do precisely what you want requires a high level of precision. Happy to understand otherwise!
  
  TeMPOraL 8 hours ago
  
  The text alone isn't the problem. It's the sum of:
  1) Plaintext representation, that is
  2) a single source of truth,
  3) which we always work on directly.
  We're hitting hard against limits of 1), but that's because we insist on 2) and 3).
  Limits of plaintext stop being a problem if we relax either 2) or 3). We need to be able to operate on the same underlying code ("single source of truth") indirectly through task-specific view, that hide the irrelevant and emphasize the important for the task at hand, which is something that typically changes multiple times a day, sometimes multiple times an hour, for each programmer. The views/perspectives themselves can be plaintext or not, depending on what makes most sense; the underlying "single source of truth" does not have to be, because you're not supposed to be looking at it in the first place (beyond exceptional situations, similar to when you'd be looking at the object code produced by the compiler).
  Expressiveness is a feature, but the more you try to express in fixed space, the harder it becomes to comprehend it. The solution is to stop trying to express everything all at once!
  N.b. makes me think of a recent exchange I had on HN; people point out that code is like a blueprint in civil engineering/construction - but then, in those fields there is never a single common blueprint being worked on. You have different documents for overall structure, different for material composition, hydrological studies, load analysis, plumbing, HVAC, electrical routing, etc. etd. Multiple perspectives on the same artifacts. You don't see them merge all that into a single "uber blueprint", which would be the equivalent of how software engineers work with code.
  
  pegasus 10 hours ago
  
  How so? Even just hypertext is more expressive than plain text. So is JSON, or any other data format or programming language which has a string type for that matter.
  
  WD-42 9 hours ago
  
  Those are all still text.
  
  JohnMakin 8 hours ago
  
  Yes, structured text is a subset of text. That doesn't negate the point made.
  
  rsynnott 18 hours ago
  
  > all of this just reads like the supposed UML zeitgeist that was supposed to transform java and eliminate development 20 years ago
  See also 'no-code', 4GLs, 5GLs, etc etc etc. Every decade or so, the marketers find a new thing that will destroy programming forever.
  
  bigiain a day ago
  
  20 years before UML/Java it was "4th Generation Languages" that were going to bring "Application Development Without Programmers" to businesses.
  https://en.wikipedia.org/wiki/Fourth-generation_programming_...
  
  throwawayoldie 12 hours ago
  
  And before that it was high-level programming languages, or as we call them today, programming languages.
  
  xorcist 11 hours ago
  
  The 4GL was mostly reporting languages, as I remember. Useful ones, too. I still feel we haven't been even close to utilizing specialized programming languages and toolkits.
  Put another way, I am certain that Unity has done more to get non-programmers to develop software than ChatGPT ever will.
  
  throwawayoldie 9 hours ago
  
  I'd argue first prize for that goes to Excel (for a sufficiently broad definition of "develop software").
  
  Agentlien a day ago
  
  Of all the things I read at uni UML is the thing I've felt the least use for - even when designing new systems. I've had more use for things I never thought I'd need like Rayleigh scattering and processor design.
  
  quietbritishjim 21 hours ago
  
  I think most software engineers need to draw a class diagram from time to time. Maybe there are a lot of unnecessary details to the UML spec, but it certainly doesn't hurt to agree that a hollow triangle for the arrow head means parent/child while a normal arrow head means composition, with a diamond at the root for ownership.
  As the sibling comment says, sequence diagrams are often useful too. I've used them a few times for illustrating messages between threads, and for showing the relationship between async tasks in structured concurrency. Again, maybe there are murky corners to UML sequence diagrams that are rarely needed, but the broad idea is very helpful.
  
  fennecfoxy 18 hours ago
  
  True but I don't bother with a unified system, just a mermaid diagram. I work in web though, so perhaps if I went back to embedded (which I did only a short while) or something else when a project is planned in it entirety rather than growing organically/reacting to customers needs/trends/the whims of management.
  
  quietbritishjim 14 hours ago
  
  I just looked at Mermaid and it seems to as close to UML as I meant by my previous comment. Just look at this class diagram [1]: triangle-ended arrows for parent/child, the classic UML class box of name/attributes/methods, stereotypes in <<double angle brackets>>, etc. The text even mentions UML. I'm not a JS dev so tend to use PlantUML instead - which is also UML based, as the name implies.
  I'm not sure what you mean by "unified system". If you mean some sort of giant data store of design/architecture where different diagrams are linked to each other, then I'm certainly NOT advocating that. "Archimate experience" is basically a red flag against both a person and the organisation they work for IMO.
  (I once briefly contracted for a large company and bumped into a "software architect" in a kitchenette one day. What's your software development background, I asked him. He said: oh no, I can't code. D-: He spent all day fussing with diagrams that surely would be ignored by anyone doing the actual work.)
  [1] https://mermaid.js.org/syntax/classDiagram.html
  
  WorldMaker 8 hours ago
  
  The "unified" UML system is referring to things like Rose (also mentioned indirectly several more comments up) where they'd reflect into code and auto-build diagrams and also auto-build/auto-update code from diagrams.
  
  91bananas 17 hours ago
  
  I've been at this 16 years. I've seen one planned project in that 16 years that stuck anywhere near the initial plan. They always grow with the whims of someone.
  
  KronisLV 19 hours ago
  
  > I think most software engineers need to draw a class diagram from time to time.
  Sounds a lot like RegEx to me: if you use something often then obviously learn it but if you need it maybe a dozen or two dozen times per year, then perhaps there’s less need to do a deep dive outside of personal interest.
  
  baq a day ago
  
  UML was a buzzword, but a sequence diagram can sometimes replace a few hundred words of dry text. People think best in 2d.
  
  rsynnott 18 hours ago
  
  Sure, but you're talking "mildly useful", rather than "replaced programmers 30 years ago, programmers don't exist anymore".
  (Also, I'm _fairly_ sure that sequence diagrams didn't originate with UML; it just adopted them.)
  
  bryanrasmussen 21 hours ago
  
  >People think best in 2d.
  no they don't. some people do. Some people think best in sentences, paragraphs, and sections of structured text. Diagrams mean next to nothing to me.
  Some graphs, as in representations of actual mathematical graphs, do have meaning though. If a graph is really the best data structure to describe a particular problem space.
  on edit: added in "representations of" as I worried people might misunderstand.
  
  TeMPOraL 20 hours ago
  
  FWIW, you're likely right here; not everyone is a visual thinker.
  Still, what both you and GP should be able to agree on, is that code - not pseudocode, simplified code, draft code, but actual code of a program - is one of the worst possible representations to be thinking and working in.
  It's dumb that we're still stuck with this paradigm; it's a great lead anchor chained to our ankles, preventing us from being able to handle complexity better.
  
  eadmund 19 hours ago
  
  > code - not pseudocode, simplified code, draft code, but actual code of a program - is one of the worst possible representations to be thinking and working in.
  It depends on the language. In my experience, well-written Lisp with judicious macros can come close to fitting the way I think of a problem. But some language with tons of boilerplate? No, not at all.
  
  TeMPOraL 18 hours ago
  
  As a die-hard Lisper, I still disagree. Yes, Lisp can go further than anything else to eliminate boilerplate, but you're still locked in a single representation. The moment you switch your task into something else - especially something that actually cares about the boilerplate you hidden, and not the logic you exposed - and now you're fighting an even harder battle.
  That's what I mean by Pareto frontier: the choices made by various current-generation languages and coding methodologies (including choices you as a macro author makes, too), are all promoting readability for some tasks, at the expense of readability for other tasks. We're just shifting the difficulty around the time of day, not actually eliminating it.
  To break through that and actually make progress, we need to embrace working in different, problem-specific views, instead of on the underlying shared single-source-of-truth plaintext code directly.
  
  baq 18 hours ago
  
  IMHO there's usually a lot of necessary complexity that is irrelevant to the actual problem; logging, observability, error handling, authn/authz, secret management, adapting data to interfaces for passing to other services, etc.
  Diagrams and pseudocode allow to push those inconveniences into the background and focus on flows that matter.
  
  TeMPOraL 18 hours ago
  
  Precisely that. As you say, this complexity is both necessary and irrelevant to the actual problem.
  Now, I claim that the main thing that's stopping advancement in our field is that we're making a choice up front on what is relevant and what's not.
  The "actual problem" changes from programmer to programmer, and from hour to the next. In the morning, I might be tweaking the business logic; at noon, I might be debugging some bug across the abstraction layers; in the afternoon, I might be reworking the error handling across the module, and just as I leave for the day, I might need to spend 30 minutes discussing architecture issue with the team. All those things demand completely different perspectives; for each, different things are relevant and different are just noise. But right now, we're stuck looking at the same artifact (the plaintext code base), and trying to make every possible thing readable simultaneously to at least some degree.
  I claim this is a wrong approach that's been keeping us stuck for too long now.
  
  baq 11 hours ago
  
  I'd love this to be possible. We're analyzing projections from the solution space to the understandability plane when discussing systems - but going the other way, from all existing projections to the solution space, is what we do when we actually build software. If you're saying you want to synthesize systems from projections, LLMs are the closest thing we've got and... it maybe sometimes works.
  
  TeMPOraL 10 hours ago
  
  Yeah, LLMs seem like they'll allow us to side-step the difficult parts by synthesizing projections instead of maintaining them. I.e. instead of having a well-defined way to go back and forth between a specific view and underlying code (e.g. "all the methods in all the classes in this module, as a database", or "this code, but with error handling elided", or "this code, but only with types and error handling", or "how components link together, as a graph", etc.), we can just tell LLMs to synthesize the views, and apply changes we make in them to the underlying code, and expect that to mostly work - even today.
  It's just hell of an expensive way to get around doing it. But then maybe at least a real demonstration will convince people of the utility and need of doing it properly.
  But then, by that time, LLMs will take over all software development anyway, making this topic moot.
  
  bryanrasmussen 20 hours ago
  
  ok, but my reference to sentences, paragraphs and sections would not indicate code but rather documentation.
  
  bryanrasmussen 20 hours ago
  
  oops, evidently I got downvoted because I don't think best in 2d and that is bad, classy as always HN.
  
  fennecfoxy 18 hours ago
  
  Lmao I remember uni teaching me UML. Right before I dropped out after a year because fuck all of that. It's a shame because some of the final year content I probably would've liked.
  But I just couldn't handle it when I got into like COMP102 and in the first lecture, the lecturer is all "has anybody not used the internet before?"
  I spent my childhood doing the stuff so I just had to bail. I'm sure others would find it rewarding (particularly those that were in my classes because 'a computer job is a good job for money').
- mjrbrennan a day ago
  
  Yes that's what gets me too. I want to engage with my coworkers, you know other humans? And get their ideas and input and summaries. Not just sit in my office alone having the computer explain everything to me badly, or read through Powerpoints of all things...
  
  TeMPOraL 20 hours ago
  
  > I want to engage with my coworkers, you know other humans?
  I.e. the very species we try to limit our contact with, which is why we chose this particular field of work? Or are you from the generation that joined software for easy money? :).
  /s, but only partially.
  There are aspects of this work where to "engage with my coworkers" is to be doing the exact opposite of productive work.
- CuriouslyC 18 hours ago
  
  Naw, the new future (technically the present for orgs that use AI intelligently) is:
  The AI already generated comprehensive README.md files and detailed module/function/variable (as needed) doc comments, which you could read but end up mostly being consumed by another AI, so you can just tell it what you're trying to do and ask it how you might accomplish that in the codebase, first at a conceptual level, then in code once you feel comfortable enough with the system to be able to validate the work.
  All the while you're sitting next to another coworker who's also doing the same thing, while you talk about high level architecture stuff, make jokes, and generally have a good time. Shit, I don't even mind open offices as much as I used to, because you don't need that intense focus to get into a groove to produce code quickly like you did when manually writing it, so you can actually have conversations with an entire table of coworkers and still be super productive.
  No comment on the political/climate side of this timeline, but the AI part is pretty good when you master it.
  
  mvieira38 14 hours ago
  
  What kind of stuff are you building where that is even remotely possible? I get that generating documentation works fine, but building features just isn't there yet for non-trivial apps, and don't even get me started on trying to get the agents to backtrack and change something they did
- crucialfelix a day ago
  
  Usually the tricks and problems in a codebase are not in the codebase at all, they are in somebody's head.
  It would be helpful if I had a long rambling dialogue with a chat model and it distilled that.
  
  dwringer a day ago
  
  > It would be helpful if I had a long rambling dialogue with a chat model and it distilled that.
  IME this can work pretty well with Gemini in the web UI. If it misinterprets you at any stage you can edit your last comment until it gets on the same page, so to speak. Then once you're to a point in the conversation where you're satisfied it seems to "get it", you can drop in some more directly relevant context like example code if needed and ask for what you want.
fennecfoxy 18 hours ago

Yup, you can always tell LLMs just from the ridiculous output most of the time. Like 8-20 sentences minimum, for the most basic thing.
Even Gemini/gpt4o/etc are all guilty of this. Maybe they'll tighten things up at some point - if I ask an assistant a simple question like "is it possible to put apples into a pie?" what I want is "Yes, it is possible to put apples into a pie. Would you like to know more?"
But not "Yes, absolutely — putting apples into a pie is not only possible, it's classic! Apple pie is one of the most well-known and traditional fruit pies. Typically, sliced apples are mixed with sugar, cinnamon, nutmeg, and sometimes lemon juice or flour, then baked inside a buttery crust. You can use various types of apples depending on the flavor and texture you want (like Granny Smith for tartness or Honeycrisp for sweetness). Would you like a recipe or tips on which apples work best?" (from gpt4).
block_dagger a day ago

You can specify desired style in the prompt. The author seems to like PR sounding fluff while making morning coffee.
ozim 8 hours ago

Python, a journey that began with an initial commit and evolved through a series of careful refinements to establish a robust foundation for the project..
Wow yeah what a waste. That is exactly the opposite of saving time.
TeMPOraL a day ago

If this was meant to be read, I might've agreed, but:
1) This was supposed to be piped through TTS and listened to in the background, and...
2) People like podcasts.
Your typical podcast is much worse than this. It's "blah blah" and "hahaha <interaction>" and "ooh <emoting>" and "<irrelevant anecdote>" and "<turning facts upside down and injecting a lie for humorous effect>", and maybe some of the actual topic mixed in between, and yet for some reason, people love it.
I honestly doubt this specific thing would be useful for me, but I'm not going to assume it's plain dumb, because again, podcasts are worse, and people love it.
- xandrius 19 hours ago
  
  What kind of podcast have you listened to, if any?
  They aren't all Joe Rogan.
  
  TeMPOraL 18 hours ago
  
  Name one that isn't > 90% fluff and human interaction sounds.
  
  dghlsakjg 14 hours ago
  
  Conversations with Tyler Cowen, Complex Systems with patio11 are two off the top of my head that concentrate on useful information, and certainly aren't "> 90% fluff and human interaction sounds".
  Unless of course people talking in any capacity is human interaction sounds, in which case, yes, every podcast is > 90% human interaction sounds.
  
  TeMPOraL 10 hours ago
  
  Thanks. I didn't realize 'patio11 even has a podcast, I'll definitely want to listen to that one.
  > Unless of course people talking in any capacity is human interaction sounds, in which case, yes, every podcast is > 90% human interaction sounds.
  No, I specifically mean all the thing that is not content - hellos, jokes, emoting, interrupting, exchanging filler commentary, etc. It may add character to the show, but from the POV of efficiently summarizing a topic, it's fundamentally even worse than the enterprisey BS fluff in the example in question.
  
  xandrius 12 hours ago
  
  RadioLab, 99% invisible, Revisionist History, Everything is alive.
fullstackchris a day ago

Yeah I was done at "What happened here was more than just code..." -_-
- jsjohnst a day ago
  
  You got past the grey text on gray background? -_-
  
  rcleveng a day ago
  
  I didn't. I open up Chrome's Developer Tools and drop this into the console:
  document.body.style.backgroundColor = "black";
rsynnott 18 hours ago

Yeah, I honestly don't know how anyone can put up with reading this sort of thing, much less have it read to them by a computer(!)
I suppose preferences differ, but really, does anyone _like_ this sort of writing style?
beigebrucewayne a day ago

I agree, it's atrocious!
1. I shouldn't have used a newly created repo that had no real work over the course of the last week.
2. I should have put more time into the prompt to make it sound less nails on chalkboard.
TZubiri a day ago

Remember the sycophant bug? Maybe making the user FEELGOOD is part of what makes it feel smart or like a good experience. Is the reward function being smart? Is it maximizing interaction? Does it conflict with being accurate?

blahgeek a day ago

Asking it to explain rust borrow checker is one of the worst examples to demonstrate its ability to read code. There are piles of that in its training data.

dundarious a day ago

Agreed, ask it to explain how exceptions are handled in python asyncio tasks, even given all the code, and it will vacillate like the worst intern in the world. What's more, there's no way to "teach" it, and even if there was, it would not last beyond the current context.
A complete waste of time for important but relatively simple tasks.
gilbetron 15 hours ago

"There are piles of that in its training data"
Such a weird complaint. If you were to explain the rust borrow checker to me, should I complain that it doesn't count because you had read explanations of the borrow checker? That it was "in your training data"? I mean, do you think you just understand the borrow checker without being taught about it in some form?
I mean, I get what you are kind of saying, that there isn't much evidence that they tools are able to generate new ideas, and that the sheer amount of knowledge it has obscures the detection of that phenomenon, but practically speaking I don't care because it is useful and helpful (within its hallucinatory framework).

rbren a day ago

I’m biased [0], but I think we should be scripting around LLM-agnostic open source agents. This technology is changing software development at its foundations—-we need to ensure we continue to control how we work.

[0] https://github.com/all-hands-ai/openhands

robotbikes a day ago

This looks like a good resource. There are some pretty powerful models that will run on a Nvidia 4090 w/ 24gb of RAM. Devstral and Queen 3. Ollama makes it simple to run them on your own hardware, but the cost of the GPU is a significant investment. But if you are paying $250 a month for a proprietary tool it would pay for itself pretty quickly.
- NitpickLawyer a day ago
  
  > There are some pretty powerful models that will run on a Nvidia 4090 w/ 24gb of RAM. Devstral and Queen 3.
  I'd caution against using devstral on a 24 gb vram budget. Heavy quantisation (the only way to make it fit into 24gb) will affect it a lot. Lots of reports on locallama about subpar results, especially from kv cache quant.
  We've had good experiences with running it fp8 and full cache, but going lower than that will impact the quality a lot.
- seanmcdirmid a day ago
  
  A Max M3 with 64 GB works well for a wider range of models although it fairs worse on stable diffusion jobs. Plus you can get it as a laptop.
handfuloflight a day ago

But what do we do if the closed models are just better?
- bluefirebrand a day ago
  
  Steal from them shamelessly, the same way they stole from everyone else?
  
  dghlsakjg 14 hours ago
  
  Isn't abusing the OpenAI terms of service part of how Deepseek did training?
  
  hsuduebc2 a day ago
  
  You are onto something.
  
  datameta a day ago
  
  Seems ethically sound to me.
- rkangel 19 hours ago
  
  The agents are separate from the models. Claude Code only allows you to use Claude, but Aider allows you to use any model.
  
  handfuloflight 19 hours ago
  
  How does that solve the problem of closed models being better than open models?
  
  hn8726 18 hours ago
  
  There is no problem. OP said we should be using open _agents_, not open _models_. You can use an open agent with any model, open or closed, while using something like Claude Code locks you in to one model vendor
  
  handfuloflight 10 hours ago
  
  I know what OP said and I asked a question in turn.
- davidmurdoch a day ago
  
  Wait?
  
  handfuloflight a day ago
  
  And get superseded by competitors willing to spend on those models?
ProofHouse a day ago

This 10000%

jasonthorsness a day ago

The terminal really is sort of the perfect interface for an LLM; I wonder whether this approach will become favored over the custom IDE integrations.

ed_mercer a day ago

Exactly. It has access to literally everything including any MCP server. It's so awesome having claude code check my database using a read-only user, or have it open a puppeteer browser and check whether its CSS changes look weird or not. It's the perfect interface and anthropic nailed it.
It can even debug my k8s cluster using kubectl commands and check prometheus over the API, how awesome is this?
- leptons a day ago
  
  > or have it open a puppeteer browser and check whether its CSS changes look weird or not.
  It's got 7 fingers? Looks fine to me! - AI
  
  paulluuk a day ago
  
  Me laughing as a human non-frontend dev having to do anything related to CSS
  The number of times that my manager or coworkers have rejected proposals for technical solutions because I can't make a webpage look halfway decent is too damn high.
  
  leptons 11 hours ago
  
  The one thing "AI" actually does well enough for me is writing CSS. It's actually the only thing I trust it with, because there is very little consequence to trusting the output when it writes CSS.
  I have a designer on my team that adds their polish to the basic HTML and CSS I produce, but first I have to produce it. I really don't care what the front-end ends up looking like, that's for someone else to worry about. So I let the "AI" write the CSS for buttons and other UI elements, which it is good enough at to save me time. Then I hand it off to the designer and they finish the product, make the buttons match the rest of the buttons, fix the padding, whatever. It certainly has accelerated that part of my workflow, and it produces way better looking front-end UI styling than I would care to spend my time on. If I didn't have the designer, the AI-generated CSS would be good enough for most people. But, I wouldn't trust the AI to tell me if a page "looks weird". I have no doubt it would become a nuisance of false-positives, or just not reporting problems that actually exist.
drcode a day ago

sort of, except I think the future of llms will be to to have the llm try 5 separate attempts to create a fix in parallel, since llm time is cheaper than human time... and once you introduce this aspect into the workflow, you'll want to spin up multiple containers, and the benefits of the terminal aren't as strong anymore.
- sothatsit a day ago
  
  I feel like the better approach would be to throw away PRs when they're bad, edit your prompt, and then let the agent try again using the new prompt. Throwing lots of wasted compute at a problem seems like a luxury take on coding agents, as these agents can be really expensive.
  So the process becomes: Read PR -> Find fundamental issues -> Update prompt to guide agent better -> Re-run agent.
  Then your job becomes proof-reading and editing specification documents for changes, reviewing the result of the agent trying to implement that spec, and then iterating on it until it is good enough. This comes from the belief that better, more expensive, agents will usually produce better code than 5 cheaper agents running in parallel with some LLM judge to choose between or combine their outputs.
- sally_glance a day ago
  
  Who or what will review the 5 PRs (including their updates to automated tests)? If it's just yet another agent, do we need 5 of these reviews for each PR too?
  In the end, you either concede control over 'details' and just trust the output or you spend the effort and validate results manually. Not saying either is bad.
  
  smallnamespace a day ago
  
  If you can define your problem well then you can write tests up front. An ML person would call tests a "verifier". Verifiers let you pump compute into finding solutions.
  
  bcrosby95 a day ago
  
  I'm not sure we write good tests for this because we assume some kind of logic involved here. If you set a human to task to write a procedure to send a 'forgot password' email, I can be reasonably sure there's a limited number of things a human would do with the provided email address, because it takes time and effort to do more than you should.
  However with an LLM I'm not so sure. So how will you write a test to validate this is done but also guarantee it doesn't add the email to a blacklist? A whitelist? A list of admin emails? Or the tens of other things you can do with an email within your system?
  
  djeastm a day ago
  
  Will people be willing to make their full time job writing tests?
  
  TeMPOraL 21 hours ago
  
  They probably won't. But it doesn't matter. Ultimately, we'll all end up doing manual labor, because that is the only thing we can do that the machines aren't already doing better than us, or about to be doing better than us. Such is the natural order of things.
  By manual labor I specifically mean the kind where you have to mix precision with power, on the fly, in arbitrary terrain, where each task is effectively one-off. So not even making things - everything made at scale will be done in automated factories/workshops. Think constructing and maintaining those factories, in the "crawling down tight pipes with scewdriver in your teeth" sense.
  And that's only mid-term; robotics may be lagging behind AI now, but it will eventually catch up.
  
  ericrallen a day ago
  
  We’ll just have an LLM write the tests.
  Now we can work on our passion projects and everything will just be LLMs talking to LLMs.
  
  therein 20 hours ago
  
  I hope sarcasm.
  
  ehnto a day ago
  
  As well, just because it pasts a test doesn't mean it doesn't do wonky, non-performant stuff. Or worse, side effects no one verified. Plenty often the LLM output will add new fields I didn't ask it to change as one example.
- cwlb a day ago
  
  https://github.com/dagger/container-use
- jyounker a day ago
  
  Having command line tools to spin up multiple containers and then to collect their results seems like it would be a pretty natural fit.
- peab a day ago
  
  dagger does this: https://www.youtube.com/watch?v=C2g3vdbffOI
- mejutoco a day ago
  
  Why would spinning containers remove the benefits? Presumably there is a terminal too interacting with the containers.
- eru a day ago
  
  Nah, if parallelism will help, it'll be abstracted away from the user.
- jtms a day ago
  
  Tmux?
mountainriver a day ago

What??? It’s literally the worst interface
Do you not want to edit your code after it’s generated?
- bretpiatt a day ago
  
  I'm running terminal in one window with AI interaction and then VS Code with project on same directories so I can see via color coding updated or new files to review in the IDE.
  How do you interact with your projects?
  
  WorldMaker 8 hours ago
  
  How is that better than running your AI interaction in a dedicated toolpane/subwindow directly inside your IDE?
  The Chat panel in VS Code has seen a lot of polish, can display full HTML including formatting Markdown nicely, has some fancy displays for AI context such as file links, supports hyperlinks everywhere, and has fancy auto-complete popups for things like @ and # and / mentioned "tools"/"agents"/whatever. Other VS Code widgets can show up in the Chat panel, too. The Chat Panel you can dock in either sidebar and/or float as its own window.
  A terminal can do most of those things too, with effort and with nothing quite like the native experience of your IDE and its widgets. It seems like a lesser experience than what VS Code already offers, other than you only have one real choice for AI assistant that supports VS Code's Chat panel (though you still have model choice).
  
  never_inline a day ago
  
  I run aider in VSCode terminal so that I can fix smaller lint errors myself without another AI back-and-forth.
  
  mountainriver 12 hours ago
  
  this is demonstrably worse than cursor
- aaronbrethorst a day ago
  
  Sure, in VS Code. Or Xcode. Or IntelliJ/GoLand/RubyMine.
- handfuloflight a day ago
  
  ...if your IDE doesn't have a terminal then it isn't an IDE.
  
  bigiain a day ago
  
  The "old wisdom" on comp/lang.perl.misc, when new people asked what was the best IDE to Perl programming, was "Unix".
  You get both editors to choose from, vi _and_ emacs! All the man pages you could possibly want _and_ perldocs! Of _course_ as a Perl newbie you'll be able to fall back on gdb for complicated debugging where print statements no longer cut it.
  
  mountainriver 12 hours ago
  
  why wouldn't you want the diffs in the IDE? Its richer and you can do more with them
  
  leptons a day ago
  
  I have a whole other screen for my terminal(s). The IDE already has enough going on in it.
  
  handfuloflight a day ago
  
  Then you are not impeded from editing your code because it was written through a terminal process, which seems to be OP's contention.
ldjkfkdsjnv a day ago

as the models get better, IDEs will be seen as low level
- magackame a day ago
  
  Wait you write your code by hand??? ewww...
  
  fragmede a day ago
  
  Aider's supported /voice for a while now.
  
  42lux a day ago
  
  voice is probably the worst human -> compute interface we have.
  
  datameta a day ago
  
  Human speech evolved with biological constraints and through neurological adaptions to emit and understand the nonlinear output that has lexically fuzzy areas to the untrained ear. So I think it's a rather "lossy" analog to digital conversion because the computer is simulating understanding of a form of information transfer that it itself is not constrained by (digital systems don't have vocal cords and could transmit anything).
  
  eru a day ago
  
  You could say that about any form of human communication at all.
  
  datameta 9 hours ago
  
  Does every other form of human communication have an analog in digital systems that has both the capability to be "better" while also putting a lot of resources into modeling the relatively nonlinear human version?

jumski a day ago

Great article! I have similar observations and techniques and Claude Code is exceptionally good - most of the days I'm working on multiple things at once (thanks to git worktrees) and each going faster than ever - that's really crazy.

For the "sub agents"thing, I must admit, that Claude Code calling o3 via sigoden/aichat saved me countless of times!

There are just issues that o3 excells at (race conditions, bug hunting - anything that requires lot of context and really high reasoning abilities).

But I'm using it less since Opus 4 came out. And of course its none of the sub-agent thing at all.

I use this prompt @included in the main CLAUDE.md: https://github.com/pgflow-dev/pgflow/blob/main/.claude/advan...

sigoden/aichat: https://github.com/sigoden/aichat

myflash13 20 hours ago

wait what? how do you work on multiple things at once with git worktrees?
- pjm331 16 hours ago
  
  I never had any reason to use it before claude code et al. So I also wasn’t aware
  Commands for working with copies of your entire repo in a new folder on a new branch
  https://git-scm.com/docs/git-worktree
  
  noiwillnot 16 hours ago
  
  This is amazing, I had no idea about this, I have been cloning my repo locally for years.
- jumski 16 hours ago
  
  git worktree uses one repo to laid out multiple branches in separate directories.
  git worktree add new/path/for/worktree branchname
  I now refuse to use git checkout to switch branches, always keep my main branch checked out and updated and always use worktrees to work on features. Love this workflow!

bionhoward a day ago

Assuming attention to detail is one of the best signs people give a fuck about craftsmanship, isn’t the fact the Anthropic legal terms are logically impossible to satisfy a bad sign for their ability to be trusted as careful stewards of ASI?

Not exactly “three laws safe” if we can’t use the thing for work without violating their competitive use prohibition

alwa a day ago

I can’t speak for their legal department, but their product, Claude Code, bears signs of lavish attention to detail. Right down to running Haiku on the context to come up with cute appropriate verbs for the “working…” indicators.

abhisheksp1993 a day ago

``` claude --dangerously-skip-permissions # science mode ```

This made me chuckle

SamPatt a day ago

>Claude code feels more powerful than cursor, but why? One of the reasons seems it's ability to be scripted. At the end of the day, cursor is an editor, while claude code is a swiss army knife (on steroids).

Agreed, and I find that I use Claude Code on more than traditional code bases. I run it in my Obsidian vault for all kinds of things. I run it to build local custom keyboard bindings with scripts that publish screenshots to my CDN and give me a markdown link, or to build a program that talks to Ollama to summarize my terminal commands for the last day.

I remember the old days of needing to figure out if the formatting changes I wanted to make to a file were sufficient to build a script or just do them manually - now I just run Claude in the directory and have it done for me. It's useful for so many things.

Aeolun a day ago

The thing is, Claude Code only works if you have the plan. It’s impossible to use it on the API, and it makes me wonder if $100/month is truly enough. I use it all day every day now, and I must be consuming a whole lot more than my $100 is worth.
- CGamesPlay a day ago
  
  You use it "all day every day", so it makes sense that you would prefer the plan. It's perfectly economical to use it without a plan, if your usage patterns are different. Here's a tool someone else wrote that can help you decide: https://github.com/ryoppippi/ccusage
  
  Aeolun 18 hours ago
  
  Sure, but if your usage pattern is such that you can’t justify the plan, then Cursor is a better option :)
- davidw a day ago
  
  One thing that I am not liking about the LLM world is that it seems to be tilting several things back in favor of BigCorps.
  The open source world is one where antirez, working on his own off in Sicily, could create a project like Redis and then watch it snowball as people all over got involved.
  Needing a subscription to something only a large company can provide makes me unhappy.
  We'll see if "can be run locally" models for more specific tasks like coding will become a thing, I guess.
  
  SamPatt a day ago
  
  I share this concern - given the trajectory of improvements I do hope that we'll have something close to this level that can run locally within 18 months or so. And of course the closed source stuff will likely be better by then, but I genuinely believe I would choose an open source version of this right now if I had the choice.
  The open source alternatives I've used aren't there yet on my 4090. Fingers crossed we'll get there.
  
  TSiege a day ago
  
  This is some nightmare fuel vendor lock-in where the codebase isn't understood by anyone and companies have to fork over more and more otherwise their business couldn't grow, adapt, etc
  
  datameta a day ago
  
  Yikes, you've just perfectly articulated a trajectory that I've been using subconsciously as one of the primary reasons why I want to keep my coding craft sharp.
  
  3rdDeviation a day ago
  
  Visionary, well done. Then comes the claim that AGI can unwind the spaghetti, and then the reality check.
  I, for one, welcome our new LLM overlords.
- sorcerer-mar a day ago
  
  > It’s impossible to use it on the API
  What does this mean?
  
  oxidant a day ago
  
  Not OP but probably just cost.
  
  SV_BubbleTime a day ago
  
  This.
  You can EASILY burn $20 a day doing little, and surely could top $50 a day.
  It works fine, but the $100 I put in to test it out did not last very long even on Sonnet.
- ggsp a day ago
  
  You can definitely use Claude Code via the API
  
  lawrencechen a day ago
  
  I think he means it's not economically sound to use it via API
  
  wahnfrieden a day ago
  
  A well-known iOS dev used Claude Code to build an iOS app and wrote a custom checking tool for how many tokens it consumed on the plan to compare with API pricing.
  He uses two max plans ($200/mo + $200/mo) and his API estimate was north of $10,000/mo
- practal a day ago
  
  I think it is available on Claude Pro now, so just $20.
  
  razemio a day ago
  
  It is but very limited. I use API only, since this is the only plan, without usage limits and on demand pricing:
  5x Pro usage ($100/month)
  20x Pro usage ($200/month)
  Source: https://support.anthropic.com/en/articles/11145838-using-cla...
  "Pro ($20/month): Average users can send approximately 45 messages with Claude every 5 hours, OR send approximately 10-40 prompts with Claude Code every 5 hours."
  "You will have the flexibility to switch to pay-as-you-go usage with an Anthropic Console account for intensive coding sprints."
jjice a day ago

I'm very interested to hear what your uses cases are when using it in your Obsidian Vault
- SamPatt a day ago
  
  Formatting changes across lots of notes, creating custom plugins, diagnosing problems with community plugins, creating a syncing program that compares my vault (with publish:true frontmatter) to my blog repo and if see changes then automatically updates the repo (which is used to build my site), creating a tool that converts inline urls to markdown footnotes, etc.
  Obsidian is my source of truth and Claude is really good at managing text, formatting, markdown, JS, etc. I never let it make changes automatically, I don't trust it that much yet, but it has undoubtedly saved me hours of manual fiddling with plugins and formatting alone.
cpard a day ago

How do you script Claude code? I've been using it as a CLI but haven't thought of invoking Claude code through a script, sounds very interesting.
- dghlsakjg 14 hours ago
  
  Read the article. Basically just using aliases for specififc prompts.
AstroBen a day ago

I had an LLM sort a crap-tonne of my notes into category folders the other day. My god that was helpful

AstroBen a day ago

Side note but the contrast between background and text here makes this really hard to read

thunkle a day ago

For me it's the blinking cursor at the top... It's hard to focus on the text.
jsjohnst a day ago

You aren’t missing much if you just skip it

Syzygies a day ago

No mention of Opus there or here (so far).

Having tried everything I settled on a $100/month Anthropic "Max" plan to use Claude Code. Then I learned how Claude Opus 4 is currently their best but most expensive model for my situation (math code and research). I limited out of a five hour session, switched to their API, and burned $20 in an hour. So I upgraded to $200/month "Max" and haven't hit limits yet.

Models matter. All these stories are like "I met a person who wasn't that smart." Duh!

beigebrucewayne a day ago

All of this was with Opus.
- luckystarr 18 hours ago
  
  I recently investigated some problematic behaviour of both Opus 4 and Sonnet 4. When tasked to develop something more complicated (broker fed task management system, staggered execution scheduler) they would inevitably produce thousands of lines of over engineered, unmaintainable garbage. When Opus was then tasked to simplify it it boiled it down to 300 lines in one shot. The result was brilliant. This happened twice.
  Moral of the story: I found out that I didn't constrain them enough. I now insist that they keep the core logic to a certain size (e.g. 300 lines) and not produce code objects for each concept but rather "fold them into the code".
  This improved the output tremendously.

tinyhouse a day ago

This article is a bit all over the place. First, a slide deck to describe a codebase is not that useful. There's a reason why no one ever uses a slide deck for anything besides supporting an oral presentation.

Most of these things in the post aren't new capabilities. The automation of workflows is indeed valuable and cool. Not sure what AGI has anything to do with it.

bravesoul2 a day ago

Also I don't trust it. They touched on that I think (I only skimmed).
Plus you shouldn't need an LLM to understand a codebase. Just make it more understandable! Of course capital likes shortcuts and hacks to get the next feature out in Q3.
- imiric a day ago
  
  > Plus you shouldn't need an LLM to understand a codebase. Just make it more understandable!
  The kind of person who prefers this setup wants to read (and write) the least amount of code on their own. So their ideal workflow is one where they get to make programs through natural language. Making codebases understandable for this group is mostly a waste of effort.
  It's a wild twist of fate that programming languages were intended to make programming friendly to humans, and now humans don't want to read them at all. Code is becoming just an intermediary artifact useless to machines, which can instead write machine code directly.
  I wish someone could put this genie back in the bottle.
  
  DougMerritt a day ago
  
  > It's a wild twist of fate that programming languages were intended to make programming friendly to humans, and now humans don't want to read them at all.
  Those are two different groups of humans, as you implied yourself.
- lelandbatey a day ago
  
  There is no amount of static material that will perfectly conform to the shape and contours of every mind that consumes that static material such that they can learn what they want to learn when they want to learn it.
  Having a thing that is interactive and which can answer questions is a very useful thing. A slide deck that sits around for the next person is probably not that great, I agree. But if you desperately want a slide deck, then an agent like Claude which can create it on demand is pretty good. If you want summaries of changes over time, or to know "what's the overall approach at a jargon-filled but still overview level explanation of how feature/behavior X is implemented?", an agent can generate a mediocre (but probably serviceable) answer to any of those by reading the repo. That's an amazing swiss-army knife to have in your pocket.
  I really used to be a hater, and I really did not trust it, but just using the thing has left me unable to deny its utility.
  
  bravesoul2 a day ago
  
  The problem is if no one can describe something with words without an LLM to scour though every line of code it probably means it can't make sense to humans.
  Maybe that is the idea (vibe coding ftw!) but if you want something people can understand and refine it is good to make it modular and decomposable and understandable. Then use AI to help you with the words for sure but at some level there is a human that understands the structure.
- groby_b a day ago
  
  > Plus you shouldn't need an LLM to understand a codebase. Just make it more understandable!
  <laughs in legacy code>
  And fundamentally, that isn't a function of "capital". All code bases are shaped by the implicit assumptions of their writers. If there's a fundamental mismatch or gap between reader and writer assumptions, it won't be readable.
  LLMs are a way to make (some of) these implict assumptions more legible. They're not a panacea, but the idea of "just make it more understandable" is not viable. It's on par with "you don't need debuggers, just don't write bugs"
sandos 16 hours ago

The number one thing I have found LLMs useful for is producing mermaidjs diagrams of code. Now, I know they are not always perfect but it has been "good enough" very many times, and I have never seen hallucinations here, only omissions. If I notice something missing its super-easy to tell it to amend.
Uehreka a day ago

> Not sure what AGI has anything to do with it.
Judging from the tone of the article, they’re using the term AGI in a jokey way and not taking themselves too seriously, which is refreshing.
I mean like, it wouldn’t be refreshing if the article didn’t also have useful information, but I do actually think a slide deck could be a useful way to understand a codebase. It’s exactly the kind of nice-to-have that I’d never want a junior wasting time on, but if it costs like $5 and gets me something minorly useful, that’s pretty cool.
Part of the mind-expanding transition to using LLMs involves recognizing that there are some things we used to dislike because of how much effort they took relative to their worth. But if you don’t need to do the thing yourself or burn through a team member’s time/sanity doing it, it can make you start to go “yeah fuck it, trawl the codebase and try to write a markdown document describing all of the features and requirements in a tabular format. Maybe it’ll go better than I expect, and if it doesn’t then on to something else.”

dirtbag__dad a day ago

This article is inspiring. I haven’t had the moment to get my head out of the Cursor + biz logic water until now. Very cool to think about LLMs automagically creating changelogs, testing packaging when dependencies are bumped, forcing unit tests on features.

Is anyone aware of something like this? Maybe in the GitHub actions or pre-commit world?

pjm331 a day ago

https://docs.anthropic.com/en/docs/claude-code/github-action...
citizenpaul a day ago

>automagically creating changelogs, testing packaging when dependencies are bumped, forcing unit tests on features.
Yeah now companies that paid lip service to those things can still not have them but pretend they do cause the AI did it....

b0a04gl a day ago

summaries like this are less about helping the dev and more about shaping commit history. when you let a model generate descriptions, tests, and boilerplate, you're also letting it define what counts as acceptable change. over time that shifts the team's review habits. if the model consistently downplays risky edits or adds vague tests, the bar drops silently. would be more useful to trace how model-written code affects long-term bug rate and revert patterns

tom_m a day ago

Well, there will always be a job for programmers folks.

dweinus a day ago

> Is it Shakespeare? No.

It's at least decent though, right?

> "What emerged over these seven days was more than just code..."

Yeesh, ok, but is it accurate?

> Over time this will likely degrade the performance and truthfulness

Sure, but it's cheap right?

> $250 a month.

Well at least it's not horrible for the environment and built on top of massive copyright violations, right?

Right?

citizenpaul a day ago

>openai codex (soon to be rewritten in rust)

Lol, I guess their AI is too good for a redactor. Better have humans do it.

rikschennink a day ago

I tried to read this on mobile but the blinking cursor makes it impossible.

beigebrucewayne a day ago

Removed it! I agree it was distracting.

hoppp 21 hours ago

First time I heard about marp, very handy tool

jvanderbot a day ago

I can't wait until Section 174 changes are repealed and nobody is financially invested in software from AI anymore.

tra3 a day ago

Thank you, finally a realistic take.
- jvanderbot 17 hours ago
  
  It seems I'm in the vast minority. Post hoc ergo proper hoc indeed.
eru a day ago

America = world?
- jvanderbot 17 hours ago
  
  Is this meant to say "I don't care because I'm not in USA"? Or "it's not a problem because it's only USA?" Or "don't speak of US-specific situations on this forum because it contains people of many nationalities?"
  It's entirely possible for a world changing tech to be created and steered to match a unique problem inside one country, and for that to change job markets everywhere.
  
  eru 3 hours ago
  
  Speaking of US-specific situations is fine. Or in general, speaking of any specific institutions.
  I objected to the '[...] and nobody is financially invested in software from AI anymore.' That's a rather dubious claim of a universal consequence for a change that only affects the US.
  If the comment was 'I can't wait until Section 174 changes are repealed and nobody in the US is financially invested in software from AI anymore.' I would have nothing to complain about.
  To critique the content more specifically and explicitly, and not just the form: people and companies all around the world have plenty of incentives to invest in AI. A tax change in the US might change the incentives in the US slightly. But it won't have much of an impact on the incentives in Europe, China, etc.
  And even in the US, even with that suggested tax change, I doubt it'll lead to 'nobody [in the US being] financially invested in software from AI anymore.'
  Basically, the original comment was hyperbole at best and BS at worst.

distortionfield a day ago

Unrelated; but I am absolutely in love with this blog theme and color scheme.

fullstackchris a day ago

Gonna be a bit blunt here and ask why hooking up an agentic CLI tool to one or more other software tool(s) is the top post on HN right now... sure, some of these ideas are interesting but at the end of the day literally all of them have been explored / revisited by various MCP tools (or can be done more or less in scripted / hacked ways as the author shows here)

I don't know, just feels like a weird community response to something that is the equivalent to me of bash piping...

42lux a day ago

If people would be as patient and inventive to teach junior devs as they are with llms the whole industry would be better of.

sorcerer-mar a day ago

You pay junior devs way way way more money for the privilege of them being bad.
And since they're human, the juniors themselves do not have the patience of an LLM.
I really would not want to be a junior dev right now... Very unfair and undesirable situation they've landed in.
- mentos a day ago
  
  At least it’s easier to teach yourself anything now with an LLM? So maybe it balances out.
  
  sorcerer-mar a day ago
  
  I think it's actually even worse: it's easier to trick yourself into thinking you're teaching yourself anything.
  Learning comes from grinding and LLMs are the ultimate anti-intellectual-grind machines. Which is great for when you're not trying to learn a skill!
  
  andy99 a day ago
  
  Even though I think most people know this deep down, I still don't think we actively realize how optimized LLMs are towards sounding good. It's the ultra processed food version of information consumption. People are super lazy (economical if you like) and rlhf et al have optimized LLM output to being easy to digest.
  Consequence is you get a bunch of output that looks really good as long as you don't think about it (and they actively promotes not thinking about it) that you don't really understand, and that if you did dig into you'd realize is empty fluff or actively wrong.
  It's worse than not learning, it's actively generating unthinking but palatable garbage that's the opposite of learning.
  
  jyounker a day ago
  
  Yeah, you have to be really careful about how you use LLMs. I've been finding it very useful to use them as teachers, or to use them in the same way that I'd use a coworker. "What's the idiomatic ways to write this python comprehension in javascript?" Or, "Hey, do you remember what you call it when..." And when I request these things I'll try to ask in the most generic way possible so that I then get retype the relevant code, filling in the blanks with my own values.
  That's just one use though. The other is treating it like it's a jr developer, which has its own shift in thinking. Practice in writing details specs goes a long way here.
  
  sorcerer-mar a day ago
  
  100% agreed.
  > Practice in writing details specs goes a long way here.
  This is an additional asymmetric advantage to more senior engineers as they use these tools
  
  tnel77 a day ago
  
  >>Learning comes from grinding
  Says who? While “grinding” is one way to learn something, asking AI for a detailed explanation and actually consuming that knowledge with the intent to learn (rather than just copy and pasting) is another way.
  Yes, you should be on guard since a lot of what it says can be false, but it’s still a great tool to help you learn something. It doesn’t completely replace technical blogs, books, and hard earned experience, but let’s not pretend that LLMs, when used appropriately, don’t provide an educational benefit.
  
  sorcerer-mar a day ago
  
  Pretty much all education research ever points to the act of actually applying knowledge, especially against variable cases, to be required to learn something.
  There is no learning by consumption (unfortunately, given how we mostly attempt to "educate" our youth).
  I didn't say they don't or can't provide an educational benefit.
  
  fullstackchris a day ago
  
  Some of the best software learning I ever had when I was starting out was following along with video courses and writing the code line by line along with the instructor... or does this not count as "consumption"?
  
  sorcerer-mar a day ago
  
  > I was... following along and writing the code line by line
  That's application. Then presumably you started deviating a little bit from exactly what the instructor was doing. Then you deviated more and more.
  If you had the instructor just writing the code for every new deviation you wanted to build and you just had to mash the "Accept Edit" button, you would not have learned very effectively.
  
  djeastm a day ago
  
  Sure, but easy in, easy out. Hard earned experience is worth soo much more than slick summaries of the last twenty years of blog articles.
- fallinditch a day ago
  
  Maybe it's the senior devs who should be the ones to worry?
  Seniors' attitudes on HN are often quick to dismiss AI assisted coding as something that can't replace the hard-earned experience and skill they've built up during their careers. Well maybe, maybe not. Senior devs can get a bit myopic in their specializations. Whereas a junior Dev doesn't have so much baggage, maybe the fertile brains of youth are better in times of rapid disruption where extreme flexibility of thought is the killer skill.
  Or maybe the whole senior/junior thing is a red herring and pure coding and tech skills are being deflated all across the board. Perhaps what is needed now is an entirely new skill set that we're only just starting to grasp.
  
  AdieuToLogic a day ago
  
  > Seniors' attitudes on HN are often quick to dismiss AI assisted coding as something that can't replace the hard-earned experience and skill they've built up during their careers.
  One definition of experience[0] is:
  direct observation of or participation in events as a basis of knowledge
  Since I assume by "AI assisted coding" you are referring to LLM-based offerings, then yes, "hard-earned experience and skill" cannot be replaced with a statistical text generator.
  One might as well assert an MS-Word document template can produce a novel Shakespearean play or that a spreadsheet is an IRS auditor.
  > Or maybe the whole senior/junior thing is a red herring and pure coding and tech skills are being deflated all across the board. Perhaps what is needed now is an entirely new skill set that we're only just starting to grasp.
  For a repudiation of this hypothesis, see this post[1] also currently on HN.
  0 - https://www.merriam-webster.com/dictionary/experience
  1 - https://blog.miguelgrinberg.com/post/why-generative-ai-codin...
  
  sally_glance a day ago
  
  Wherever you look, the conclusion is the same - balance is required. Too many seniors, you get stuck in one way streets. Too many juniors, you trip over your own feet and diverge into unknown avenues. Mix AI in, I don't see how that changes much at all... Juniors drive into unknown territory faster, Seniors get stuck in their niche just as well. Acceleration yes, fundamental change of how we work - I don't see it yet.
  
  yakz a day ago
  
  Senior devs provide better instructions to the agent, and can recognize more kinds of mistakes and can recognize mistakes more quickly. The feedback loop is more useful to someone with more experience.
  I had a feeling today that I should really be managing multiple instances at once, because they’re currently so slow that there’s some “downtime”.
  
  tonyhart7 a day ago
  
  we literally have many no code solution like wordpress etc
  do webdev is still there??? yes there are just because you can "create" something that doesn't mean you knowledge able in that area
  we literally have entire industry created to fix wordpress instance + code, what do you else we need to worry for
  
  sorcerer-mar a day ago
  
  Maybe! Probably not though.
  
  bakugo a day ago
  
  > Maybe it's the senior devs who should be the ones to worry?
  Why would they be worried?
  Who else going to maintain the massive piles of badly designed vibe code being churned out at an increasingly alarming pace? The juniors prompting it certainly don't know what any of it does, and the AIs themselves have proven time and again to be incapable of performing basic maintenance on codebases above a very basic level of complexity.
  As the ladder gets pulled up on new juniors, and the "fertile brains" of the few who do get a chance are wasted as they are actively encouraged to not learn anything and just let a computer algorithm do the thinking for them, ensuring they will never have a chance to become seniors themselves, who else will be left to fix the mess?
  
  CuriouslyC 18 hours ago
  
  If your seniors aren't analyzing the PRs being vibe coded by others in the orgs to make sure they meet quality standards, that is the source of your problem, not the vibe coding.
- jwr a day ago
  
  > You pay junior devs way way way more money for the privilege of them being bad.
  Oh, it's worse than that. You do that, and they complain that they are underpaid and should earn much, much more. They also think they are great, it's just you, the old-timer, that "doesn't get it". You invest lots of time to work with them, train them, and teach them how to work with your codebase.
  And then they quit because the company next door offered them slightly more money and the job was easier, too.
- leptons a day ago
  
  >You pay junior devs way way way more money for the privilege of them being bad.
  I hope you don't think that what you're paying for an LLM today is what it actually costs to run the LLM. You're paying a small fraction.
  So much investment money is being pumped into AI that it's going to make the 2000 dot-com bubble burst look tiny in comparison, if LLMs don't start actually returning on the massive investments. People are waking up to the realities of what an LLM can and can't do, and it's turning out to not be the genie in the bottle that a lot of hype was suggesting. Same as crypto.
  The tech world needs a hype machine and "AI" is the current darling. Movie streaming was once in the spotlight too. "AI" will get old pretty soon if it can't stop "hallucinating". Trust me I would know if a junior dev is hallucinating and if they actually are then I can choose another one that won't and will actually become a great software developer. I have no such hope for LLMs based on my experiences with them so far.
  
  TeMPOraL 18 hours ago
  
  > I hope you don't think that what you're paying for an LLM today is what it actually costs to run the LLM. You're paying a small fraction.
  Depends, right? Claude Code on a Max plan is obviously unsustainable if the API costs are any indication; people can burn through the subscription price in API credits in a day or less.
  But otherwise? I don't feel like API pricing is that unrealistic. Compute is cheap, and LLMs aren't as energy-intensive in inference as some would have you believe (especially when they conveniently mix up training and inference). And LLMs beat juniors at API prices already.
  E.g. a month ago, a few hours of playing with Gemini or Claude 3.5 / 3.7 Sonnet had me at maybe $5 for a completed little MVP of an embedded side project; it would've taken me days to do it myself, even more if I hired some random fresh grad as a junior, and $5 wouldn't fund even an hour of their work. API costs would had to be underpriced by at least two orders of magnitude for juniors to compete.
  
  sorcerer-mar a day ago
  
  Yeah, all fair, but I think there's enough capital to keep the gravy train rolling until the cost-per-performance actually get way, way, way below human junior engineers.
  A lot of the application layer will disappear when it fails to show ROI, but the foundation models will continue to have obscene amounts of money dumped into them, and the coding use case will come along with that.
- beefnugs a day ago
  
  See if the promise was real: llms are great skill multipliers! Then it is the new renaissance of one developer businesses popping up left and right every day! Ain't nobody got time for corporate coercion hierarchy nonsense.
  Hmm no news about that really
- yieldcrv a day ago
  
  > I really would not want to be a junior dev right now... Very unfair and undesirable situation they've landed in.
  I don't really get this, at the beginning of my career I masquaraded as a senior dev with experience as fast as I could until it was laundered into actual experience
  Form the LLC and that's your prior professional experience, working for it
  I felt I needed to do that and that was way before generative AI, like at least a decade
- drewlesueur a day ago
  
  I think it would be great to be a junior dev now and be able to learn quickly with llms.
  
  lelanthran a day ago
  
  > I think it would be great to be a junior dev now and be able to learn quickly with llms.
  I'm not so sure; I get great results (learning) with them because I can nitpick what they give me, attempt to explain how I understand it and I pretty much always preface my prompts with "be critical and show me where I am wrong".
  I've seen a junior use it to "learn", which was basically "How do I do $FOO in $LANGUAGE".
  For that junior to turn into a senior who prompts the way I do, they need a critical view of their questions, not just answers.
  
  jml78 a day ago
  
  If you actually want to learn………
  I have experienced multiple instances of junior devs using llm outputs without any understanding.
  When I look at the PR, it is immediately obvious.
  I use these tools everyday to help accelerate. But I know the limitations and can look at the output to throw certain junk away.
  I feel junior devs are using it not to learn but to try to just complete shit faster. Which doesn’t actually happen because their prompts suck and their understanding of the results is bad.
qsort a day ago

The vilification of juniors and the abandonment of the idea that teaching and mentoring are worthwhile are single-handedly making me speedrun burnout. May a hundred years of Microsoft Visio befall anybody who thinks that way.
- empireofdust a day ago
  
  What’s the best implementation of junior training or teaching/mentoring in general within tech that you’ve seen?
  
  qsort a day ago
  
  Unless you're running a police state environment where every minute of company time is tracked, enough opportunities for it to happen organically exist that it's not a matter of how you organize it, it's a matter of culture. Give them as much responsibility as they can handle and they'll be the ones reaching out to you.
godelski a day ago

A constant reminder: you can't have wizards without having noobs.
Every wizard was once a noob. No one is born that way, they were forged. It's in everybody's interest to train them. If they leave, you still benefit from the other companies who trained them, making the cost equal. Though if they leave, there's probably better ways to make them stay that you haven't considered (e.g. have you considered not paying new juniors more than your current junior that has been with the company for a few years? They should be able to get a pay bump without leaving)
- lunarboy a day ago
  
  I'm sure people (esp engineers) know this. But imagine you're starting a company: would you try to deploy N agents (even if shitty), or take a financial/time/legal/social risk with a new hire. When you consider short-term costs, the math just never works out in favor of real humans.
  
  godelski 9 hours ago
  
  Every single time I post my comment I get this response...
  1) There is no universal rule for anything. It doesn't have to apply to every single case. No one is saying a startup needs to hire juniors. No one is saying you have to hire only juniors. We haven't even talked about the distribution tbh. That's very open to interpretation because it is implicit that you will have to modify that based on your context.
  2) Lots of big companies still act like they're startups. You're right, that short term "the math" doesn't work out. But it does on the medium and long term. So basically as long as you aren't working at the bootstrapping stage of a startup, you want to start considering this. Different distributions for different stages, of course.
  But you shouldn't sacrifice long term rewards for short term ones. You are giving up larger rewards...
  
  geraneum a day ago
  
  Well, in the beginning, the math doesn’t work out in favor of building the software (or the thing you want to sell) either.
  
  QuercusMax a day ago
  
  What about the financial / legal / social risk of your AI agent doing something bad? You're only looking at cost savings, without seeing the potentially major downsides.
  
  shinycode a day ago
  
  To follow up my previous comment, I worked on a project where someone fixed an old bug. This bug became a feature for clients who build their systems around this api endpoint. The consequence is hundreds of thousands of user duplicates with automations attaching new ressources and actions randomly on the duplicates. Massive consequences for the customers. If it were an AI doing the fixing with no human intervention, good luck understanding, cleaning the mess and holding accountable. People seem lightly think that if the agent is doing something bad it’s just a risk to take. But when a codebase with massive amounts of loc and logic is build and no human knows it, how to deal with the consequences on people’s business ? Can’t help but think it’s crappy software with a « Google closed your Gmail account, no one knows why and we can’t do anything about it, sorry ». But instead of a mail account it’s part of your business
  
  tonyhart7 a day ago
  
  "What about the financial / legal / social risk of your AI agent doing something bad?"
  the same way we treat it like human making mistake??? AI cant code themselves, someone command them to create something
  
  shinycode a day ago
  
  I can’t stop thinking that this way of thinking is either plain wrong and misses completely what software development is really about. Or very true and in X years people will just ask the trending AI « I need a billing/CRM/X system with those constraints ». Then the AI will ask questions and refine the need. Work for 30mn the time to use libs and code the whole thing, pass into systems to test and deploy and voila. Custom feature on demand. No CEO, no sales, nobody. You just deploy your own SaaS feature. Then good luck to scale properly and migrate data and add features and complexity. If agents hold onto their promise, then the future is custom based, you deploy what you need, SaaS platform is dead with everyone in between useless.
- QuantumGood a day ago
  
  I think too many see it more as "every stem cell has the potential to be any [something]", but it's generally better to let them self differentiate until survivors with more potential exist.
  
  godelski 9 hours ago
  
  Be careful there... There are destructive steady state solutions. For example, all your cells can become cancerous. The stem cells are shaped by their environments, just like people. Don't just approach things with a laissez faire attitude. Flexibility is good, and an overly heavy hand is bad, but that doesn't mean a subtle hand is bad
- TuringNYC a day ago
  
  >> A constant reminder: you can't have wizards without having noobs.
  Try telling that to companies with quarterly earnings. Very few resist the urge to optimize for the short term.
  
  godelski 9 hours ago
  
  > Try telling that to companies with quarterly earnings.
  Who do you think I'm saying it to?
jayofdoom a day ago

I spent a lot of time in my career, honestly some of the most impactful stuff I've done, mentoring college students and junior developers. I think you are dead on about the skills being very similar. Being verbose, not making assumptions about existing context, and generalized warnings against pitfalls when doing the sort of thing you're asking it to do goes a long long way.
Just make sure you talk to Claude in addition to the humans and not instead of.
handfuloflight a day ago

[flagged]
- noman-land a day ago
  
  It sounds like this person doesn't deserve to be under your wing. Time to let him fly for himself, or crash.
- QuercusMax a day ago
  
  Damn, that sucks. My experience has been the exact opposite; maybe you need to adjust your approach and set expectations up-front, or get management involved? (I've had a similar experience to you with my teenage kids, but that's a whole other situation.)
  My M.S. advisor gave me this advice on when I should ask for help, which I've passed on to lots of junior engineers: It's good to spend type struggling to understand something, and depending on the project it's probably good to exert yourself on your own somewhere between an hour and a day. If you give up after 5 minutes, you won't learn, but if you spend a week with no progress, that's also not good.

dwohnitmok a day ago

On the one hand very cool.

On the other hand, every time people are just spinning off sub-agents I am reminded of this: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...

It's simultaneously the obvious next step and portends a potentially very dangerous future.

TeMPOraL a day ago

> It's simultaneously the obvious next step
As it has been over three years ago, when that was originally published.
I'm continuously surprised both by how fast the models themselves evolve, and how slow their use patterns are. We're still barely playing with the patterns that were obvious and thoroughly discussed back before GPT-4 was a thing.
Right now, the whole industry is obsessed with "agents", aka. giving LLMs function calls and limited control over the loop they're running under. How many years before the industry will get to the point of giving LLMs proper control over the top-level loop and managing the context, plus an ability to "shell out" to "subagents" as a matter of course?
- qsort a day ago
  
  > How many years before the industry will get to the point
  When/if the underlying model gets good enough to support that pattern. As an extreme example, you aren't ever going to make even a basic agent with GPT-3 as the base model, the juice isn't worth the squeeze.
  Models have gotten way better and I'm now convinced (new data -> new opinion) that they are a major win for coding, but they still need a lot, a lot of handholding, left to their own devices they just make a mess.
  The underlying capabilities of the model are the entire ballgame, the "use patterns" aren't exactly rocket science.
- benlivengood a day ago
  
  We haven't hit the RSI threshold yet and so evolution is so slow that it's usually terminated as not-useful or it solves a concrete problem and is terminated by itself or a human. Earlier model+frameworks merely petered out almost immediately. I'm guessing it's roughly correlated with the progress on METR.
lubujackson a day ago

Am I the only one who saw in the prompt:
> ${SUGESTION}
And recognized it wouldn't do anything because of a typo? Alas, my kind is not long for this world...
- floren a day ago
  
  I noticed it and then scrolled through looking for the place where they called it out... sadly disappointed but I don't know what I expected from lesswrong

intralogic a day ago

[flagged]

CGamesPlay a day ago

In general, "reader mode". I don't use Chrome but Google suggests that it's in a menu <https://support.google.com/chrome/answer/14218344?hl=en>. Many Chrome-alikes provide it built-in (Brave calls it Speedreader), and many extensions can add it for you (Readability was the OG one).

konexis007 a day ago

jilles a day ago

How does this compare with Apples or Orange?
- brcmthrowaway a day ago
  
  How does this compare with Code::Blocks?

johnwheeler a day ago

I've actually stumbled upon a novel new way of using Claude code that I don't think anybody else is doing that's insanely better. I'll release it soon.

throwawayoldie 12 hours ago

"...but the proof is too large to fit in this margin."

aussieguy1234 a day ago

I played around with agents yesterday, now I'm hooked.

I got Claude Code (With CLine and VSCode) to do a task for a personal project. It did it about 5x faster than i'd have been able to do manually including running bash commands e.g. to install dependencies for new npm packages.

These things can do real work. If you have things in plain text format like markdown, csv spreadsheets etc, alot of what normal human employees do today could be somewhat automated.

You currently still need a human to supervise the agent and what its doing, but that won't be needed anymore in the not so distant future.