Show HN: CodeRLM – Tree-sitter-backed code indexing for LLM agents
4 by jared_stewart | 0 comments on Hacker News.
I've been building a tool that changes how LLM coding agents explore codebases, and I wanted to share it along with some early observations. Typically claude code globs directories, greps for patterns, and reads files with minimal guidance. It works in kind of the same way you'd learn to navigate a city by walking every street. You'll eventually build a mental map, but claude never does - at least not any that persists across different contexts. The Recursive Language Models paper from Zhang, Kraska, and Khattab at MIT CSAIL introduced a cleaner framing. Instead of cramming everything into context, the model gets a searchable environment. The model can then query just for what it needs and can drill deeper where needed. coderlm is my implementation of that idea for codebases. A Rust server indexes a project with tree-sitter, builds a symbol table with cross-references, and exposes an API. The agent queries for structure, symbols, implementations, callers, and grep results — getting back exactly the code it needs instead of scanning for it. The agent workflow looks like: 1. `init` — register the project, get the top-level structure 2. `structure` — drill into specific directories 3. `search` — find symbols by name across the codebase 4. `impl` — retrieve the exact source of a function or class 5. `callers` — find everything that calls a given symbol 6. `grep` — fall back to text search when you need it This replaces the glob/grep/read cycle with index-backed lookups. The server currently supports Rust, Python, TypeScript, JavaScript, and Go for symbol parsing, though all file types show up in the tree and are searchable via grep. It ships as a Claude Code plugin with hooks that guide the agent to use indexed lookups instead of native file tools, plus a Python CLI wrapper with zero dependencies. For anecdotal results, I ran the same prompt against a codebase to "explore and identify opportunities to clarify the existing structure". Using coderlm, claude was able to generate a plan in about 3 minutes. The coderlm enabled instance found a genuine bug (duplicated code with identical names), orphaned code for cleanup, mismatched naming conventions crossing module boundaries, and overlapping vocabulary. These are all semantic issues which clearly benefit from the tree-sitter centric approach. Using the native tools, claude was able to identify various file clutter in the root of the project, out of date references, and a migration timestamp collision. These findings are more consistent with methodical walks of the filesystem and took about 8 minutes to produce. The indexed approach did better at catching semantic issues than native tools and had a key benefit in being faster to resolve. I've spent some effort to streamline the installation process, but it isn't turnkey yet. You'll need the rust toolchain to build the server which runs as a separate process. Installing the plugin from a claude marketplace is possible, but the skill isn't being added to your .claude yet so there are some manual steps to just getting to a point where claude could use it. Claude continues to demonstrate significant resistance to using CodeRLM in exploration tasks. Typically to use you will need to explicitly direct claude to use it. --- Repo: github.com/JaredStewart/coderlm Paper: Recursive Language Models https://ift.tt/cLx3HFM — Zhang, Kraska, Khattab (MIT CSAIL, 2025) Inspired by: https://ift.tt/f6VWbGr
Hack Nux
Watch the number of websites being hacked today, one by one on a page, increasing in real time.
New Show Hacker News story: Show HN: Agent framework that generates its own topology and evolves at runtime
Show HN: Agent framework that generates its own topology and evolves at runtime
17 by vincentjiang | 7 comments on Hacker News.
Hi HN, I’m Vincent from Aden. We spent 4 years building ERP automation for construction (PO/invoice reconciliation). We had real enterprise customers but hit a technical wall: Chatbots aren't for real work. Accountants don't want to chat; they want the ledger reconciled while they sleep. They want services, not tools. Existing agent frameworks (LangChain, AutoGPT) failed in production - brittle, looping, and unable to handle messy data. General Computer Use (GCU) frameworks were even worse. My reflections: 1. The "Toy App" Ceiling & GCU Trap Most frameworks assume synchronous sessions. If the tab closes, state is lost. You can't fit 2 weeks of asynchronous business state into an ephemeral chat session. The GCU hype (agents "looking" at screens) is skeuomorphic. It’s slow (screenshots), expensive (tokens), and fragile (UI changes = crash). It mimics human constraints rather than leveraging machine speed. Real automation should be headless. 2. Inversion of Control: OODA > DAGs Traditional DAGs are deterministic; if a step fails, the program crashes. In the AI era, the Goal is the law, not the Code. We use an OODA loop to manage stochastic behavior: - Observe: Exceptions are observations (FileNotFound = new state), not crashes. - Orient: Adjust strategy based on Memory and - Traits. - Decide: Generate new code at runtime. - Act: Execute. The topology shouldn't be hardcoded; it should emerge from the task's entropy. 3. Reliability: The "Synthetic" SLA You can't guarantee one inference ($k=1$) is correct, but you can guarantee a System of Inference ($k=n$) converges on correctness. Reliability is now a function of compute budget. By wrapping an 80% accurate model in a "Best-of-3" verification loop, we mathematically force the error rate down—trading Latency/Tokens for Certainty. 4. Biology & Psychology in Code "Hard Logic" can't solve "Soft Problems." We map cognition to architectural primitives: Homeostasis: Solving "Perseveration" (infinite loops) via a "Stress" metric. If an action fails 3x, "neuroplasticity" drops, forcing a strategy shift. Traits: Personality as a constraint. "High Conscientiousness" increases verification; "High Risk" executes DROP TABLE without asking. For the industry, we need engineers interested in the intersection of biology, psychology, and distributed systems to help us move beyond brittle scripts. It'd be great to have you roasting my codes and sharing feedback. Repo: https://ift.tt/y3Urb1v
17 by vincentjiang | 7 comments on Hacker News.
Hi HN, I’m Vincent from Aden. We spent 4 years building ERP automation for construction (PO/invoice reconciliation). We had real enterprise customers but hit a technical wall: Chatbots aren't for real work. Accountants don't want to chat; they want the ledger reconciled while they sleep. They want services, not tools. Existing agent frameworks (LangChain, AutoGPT) failed in production - brittle, looping, and unable to handle messy data. General Computer Use (GCU) frameworks were even worse. My reflections: 1. The "Toy App" Ceiling & GCU Trap Most frameworks assume synchronous sessions. If the tab closes, state is lost. You can't fit 2 weeks of asynchronous business state into an ephemeral chat session. The GCU hype (agents "looking" at screens) is skeuomorphic. It’s slow (screenshots), expensive (tokens), and fragile (UI changes = crash). It mimics human constraints rather than leveraging machine speed. Real automation should be headless. 2. Inversion of Control: OODA > DAGs Traditional DAGs are deterministic; if a step fails, the program crashes. In the AI era, the Goal is the law, not the Code. We use an OODA loop to manage stochastic behavior: - Observe: Exceptions are observations (FileNotFound = new state), not crashes. - Orient: Adjust strategy based on Memory and - Traits. - Decide: Generate new code at runtime. - Act: Execute. The topology shouldn't be hardcoded; it should emerge from the task's entropy. 3. Reliability: The "Synthetic" SLA You can't guarantee one inference ($k=1$) is correct, but you can guarantee a System of Inference ($k=n$) converges on correctness. Reliability is now a function of compute budget. By wrapping an 80% accurate model in a "Best-of-3" verification loop, we mathematically force the error rate down—trading Latency/Tokens for Certainty. 4. Biology & Psychology in Code "Hard Logic" can't solve "Soft Problems." We map cognition to architectural primitives: Homeostasis: Solving "Perseveration" (infinite loops) via a "Stress" metric. If an action fails 3x, "neuroplasticity" drops, forcing a strategy shift. Traits: Personality as a constraint. "High Conscientiousness" increases verification; "High Risk" executes DROP TABLE without asking. For the industry, we need engineers interested in the intersection of biology, psychology, and distributed systems to help us move beyond brittle scripts. It'd be great to have you roasting my codes and sharing feedback. Repo: https://ift.tt/y3Urb1v
New Show Hacker News story: Show HN: Send Claude Code tasks to the Batch API at 50% off
Show HN: Send Claude Code tasks to the Batch API at 50% off
2 by misker1 | 0 comments on Hacker News.
Hey HN. I built this because my Anthropic API bills were getting out of hand (spoiler: they remain high even with this, batch is not a magic bullet). I use Claude Code daily for software design and infra work (terraform, code reviews, docs). Many Terminal tabs, many questions. I realised some questions are ok to wait on and with that comes some cost savings. So here is a small MCP that lets you send work directly to Anthropic's Batch API from inside Claude Code, for the same quality responses just 50% cheaper, results come back in ~30min-1hr. How it works: you type /batch review this codebase for security issues, Claude gathers all the context, builds a self-contained prompt, ships it to the Batch API via an MCP server, and you get notified in the status bar when it's done (optional). The README has installation instructions, which were mainly generated by claude. I removed the curl | bash setup and at this stage of the project i feel more confident sharing the manual setup instructions. My main hope with this project is to monetize it. Not by asking for money, rather I am hoping others have ideas or improvements to add and use those to save more on cost.
2 by misker1 | 0 comments on Hacker News.
Hey HN. I built this because my Anthropic API bills were getting out of hand (spoiler: they remain high even with this, batch is not a magic bullet). I use Claude Code daily for software design and infra work (terraform, code reviews, docs). Many Terminal tabs, many questions. I realised some questions are ok to wait on and with that comes some cost savings. So here is a small MCP that lets you send work directly to Anthropic's Batch API from inside Claude Code, for the same quality responses just 50% cheaper, results come back in ~30min-1hr. How it works: you type /batch review this codebase for security issues, Claude gathers all the context, builds a self-contained prompt, ships it to the Batch API via an MCP server, and you get notified in the status bar when it's done (optional). The README has installation instructions, which were mainly generated by claude. I removed the curl | bash setup and at this stage of the project i feel more confident sharing the manual setup instructions. My main hope with this project is to monetize it. Not by asking for money, rather I am hoping others have ideas or improvements to add and use those to save more on cost.
New ask Hacker News story: Dear OpenAI and Anthropic Sales Leaders
Dear OpenAI and Anthropic Sales Leaders
5 by kevinprince | 1 comments on Hacker News.
We've been going through enterprise sales processes with both of you, and I've encountered some practices I haven't seen before with other B2B vendors: Usage data availability: We're being told we can't access usage data for our existing accounts unless we sign a 12-month commitment. We need this data to make an informed purchasing decision. Pricing validity: Received a pricing link with 14-day validity. On day 13, we were told pricing had doubled and the original quote wouldn't be honored. I understand AI is a fast-moving market and everyone's scaling rapidly. But these create real trust issues for procurement teams trying to make informed decisions. Has anyone else experienced similar challenges with AI vendor negotiations? I'm hoping these are isolated issues rather than emerging patterns.
5 by kevinprince | 1 comments on Hacker News.
We've been going through enterprise sales processes with both of you, and I've encountered some practices I haven't seen before with other B2B vendors: Usage data availability: We're being told we can't access usage data for our existing accounts unless we sign a 12-month commitment. We need this data to make an informed purchasing decision. Pricing validity: Received a pricing link with 14-day validity. On day 13, we were told pricing had doubled and the original quote wouldn't be honored. I understand AI is a fast-moving market and everyone's scaling rapidly. But these create real trust issues for procurement teams trying to make informed decisions. Has anyone else experienced similar challenges with AI vendor negotiations? I'm hoping these are isolated issues rather than emerging patterns.
New Show Hacker News story: Show HN: I built a cloud hosting for OpenClaw
Show HN: I built a cloud hosting for OpenClaw
2 by kenanbek | 0 comments on Hacker News.
Yet another OpenClaw wrapper. But I really enjoyed the techy part of this project. Especially server provisionings in the background.
2 by kenanbek | 0 comments on Hacker News.
Yet another OpenClaw wrapper. But I really enjoyed the techy part of this project. Especially server provisionings in the background.
New Show Hacker News story: Show HN: Reef – Bash compatibility layer for Fish shell, written in Rust
Show HN: Reef – Bash compatibility layer for Fish shell, written in Rust
2 by xbuben | 0 comments on Hacker News.
Fish is the fastest, friendliest interactive shell, but it can't run bash syntax, which has kept it niche for 20 years. Reef fixes this with a three-tier approach: fish function wrappers for common keywords (export, unset, source), a Rust-powered AST translator using conch-parser for structural syntax (for/do/done, if/then/fi, $()), and a bash passthrough with env capture for everything else. 251/251 bash constructs pass in the test suite. The slowest path (full bash passthrough) takes ~3ms. The binary is 1.18MB. The goal: install fish, install reef, never think about bash compatibility again. Your muscle memory, Stack Overflow commands, and tool configs all just work.
2 by xbuben | 0 comments on Hacker News.
Fish is the fastest, friendliest interactive shell, but it can't run bash syntax, which has kept it niche for 20 years. Reef fixes this with a three-tier approach: fish function wrappers for common keywords (export, unset, source), a Rust-powered AST translator using conch-parser for structural syntax (for/do/done, if/then/fi, $()), and a bash passthrough with env capture for everything else. 251/251 bash constructs pass in the test suite. The slowest path (full bash passthrough) takes ~3ms. The binary is 1.18MB. The goal: install fish, install reef, never think about bash compatibility again. Your muscle memory, Stack Overflow commands, and tool configs all just work.