Ask HN: Which LLM are you using to evaluate your ideas?
4 by Marius77 | 2 comments on Hacker News.
Question as in the title.
Curious about your experience, and which LLM helped you out the most without saying yes to everything.
New Show Hacker News story: Show HN: I trained a chess engine to play like humans
2 by hazard | 0 comments on Hacker News.
I built 1e4.ai - a chess web app where you play against neural networks trained to mimic human Lichess players at specific Elo ranges. There's a separate model for each 100-point rating bucket from ~800 to 2200+, and the bots not only choose human-like moves but also burn clock time, play worse under time pressure, and blunder in human-like ways.
Live demo: https://1e4.ai
Code: https://ift.tt/kPMWrNL
A few things that might be interesting:
- Trained on almost a full year of Lichess blitz games, around 1B total games.
- The architecture is a small (~9M-parameter) transformer-based network that takes the board, recent move history, the player's rating, and remaining clock time as input. There are three separate models per rating bucket: move, clock-usage, and win probability. The clock model is what makes the bots feel human under time pressure rather than instant. Because the move model takes the clock as one input parameter, it also learns to blunder under time pressure like a human might.
- Because the network is so tiny, no GPU is needed for inference - it runs easily on a local CPU.
- The downside of the tiny network is that it's a bit weak as you turn the rating up past around 1700. It can spot short tactics but not long multi-move combinations.
- Initial training ran on a rented 8xH100 cluster, followed by fine-tunes on my local GPU for different rating ranges.
- Inspired by Maia-2 and DeepMind's "Grandmaster-Level Chess Without Search". On a held-out Lichess blitz benchmark, it beats Maia-2 blitz on top-1 move prediction (56.7% vs. 52.7%) and quite substantially on win-probability calibration (Brier 0.176 vs. 0.272). Numbers and code in https://ift.tt/V86btzh...
- The data pipeline is C++ via nanobind, with training in PyTorch. Getting this right was the thing I spent the most time on. Pre-shuffling the dataset and then reading the shuffled dataset sequentially at training time kept GPU utilization high; without this, training spent a huge percentage of time on I/O while the GPU sat idle.
Happy to answer questions about the rating conditioning, the clock model, or the data pipeline.
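The post says the move model is conditioned on the player's rating bucket and remaining clock time alongside the board. A minimal sketch of one way to wire that up in PyTorch is below: prepend learned conditioning tokens for rating and clock to the board tokens before the transformer. All names, sizes, and the token layout are illustrative assumptions, not the actual 1e4.ai code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a tiny transformer whose move prediction is
# conditioned on rating bucket and remaining clock, as the post describes.
class RatingClockConditionedModel(nn.Module):
    def __init__(self, d_model=128, n_squares=64, n_moves=1968, n_rating_buckets=16):
        super().__init__()
        self.square_emb = nn.Embedding(13, d_model)   # 12 piece types + empty (assumed encoding)
        self.pos_emb = nn.Embedding(n_squares, d_model)
        self.rating_emb = nn.Embedding(n_rating_buckets, d_model)
        self.clock_proj = nn.Linear(1, d_model)       # scalar fraction of clock left -> vector
        enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.move_head = nn.Linear(d_model, n_moves)

    def forward(self, board, rating_bucket, clock_frac):
        # board: (B, 64) piece ids; rating_bucket: (B,); clock_frac: (B, 1) in [0, 1]
        pos = torch.arange(board.size(1), device=board.device)
        x = self.square_emb(board) + self.pos_emb(pos)
        # Prepend two conditioning tokens so attention can read rating and clock.
        cond = torch.stack(
            [self.rating_emb(rating_bucket), self.clock_proj(clock_frac)], dim=1
        )
        x = torch.cat([cond, x], dim=1)
        h = self.encoder(x)
        return self.move_head(h[:, 0])                # logits over an assumed move vocab

model = RatingClockConditionedModel()
board = torch.randint(0, 13, (2, 64))
logits = model(board, torch.tensor([3, 9]), torch.rand(2, 1))
print(logits.shape)
```

Because the clock is just another input token, a trained model of this shape could in principle shift its move distribution as the clock runs down, which is the mechanism the post credits for time-pressure blunders.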
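The "pre-shuffle once, then read sequentially" trick in the last bullet can be sketched in a few lines: shuffle the dataset globally ahead of time and write it out as fixed-size shards, so the training loop only ever does fast sequential reads instead of random seeks. This is a stdlib-only illustration of the idea under assumed names; the real pipeline is C++ via nanobind.

```python
import os
import random
import tempfile

def preshuffle(records, shard_dir, shard_size=4, seed=0):
    """One-time pass: globally shuffle, then write fixed-size shards."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    paths = []
    for i in range(0, len(shuffled), shard_size):
        path = os.path.join(shard_dir, f"shard_{i // shard_size:05d}.txt")
        with open(path, "w") as f:
            f.write("\n".join(shuffled[i:i + shard_size]))
        paths.append(path)
    return paths

def stream(paths):
    """Training-time pass: purely sequential reads, no random access."""
    for path in paths:
        with open(path) as f:
            for line in f:
                yield line.strip()

games = [f"game_{i}" for i in range(10)]
with tempfile.TemporaryDirectory() as d:
    shards = preshuffle(games, d)
    seen = list(stream(shards))
print(sorted(seen) == sorted(games))  # True: same data, pre-shuffled order
```

At 1B games the one-time shuffle cost is amortized over every epoch, while the training loop gets the I/O pattern disks and page caches are fastest at.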
New Show Hacker News story: Show HN: Hustler Bingo – a tiny bingo game about startup Twitter clichés
3 by lackoftactics | 0 comments on Hacker News.
I built this after my brother started complaining that I'd gotten too deep into brainrot culture. It's just for fun, nothing serious, but I was able to test Vercel, TanStack Start, and Convex without high stakes. Have fun! This is a game where a lower score is good for your mental health.
New ask Hacker News story: Ask HN: Will low quality AI customer support be the new normal?
4 by 0-bad-sectors | 1 comment on Hacker News.
Now whenever I reach for a company's support chat or phone line, I get an AI agent replying to me, and I end up in a useless loop for a couple of minutes before I start begging it to link me to a real person. Will companies start losing customers because of this, or will people eventually get used to it?
New Show Hacker News story: Show HN: Countries where you can leave your MacBook at a random coffee shop
2 by canergl | 2 comments on Hacker News.
Hi HN, I wanted to know in which countries you can simply leave your laptop at a Starbucks, and where you can't. Feel free to click and vote.
New ask Hacker News story: Ask HN: Before Open Source took over the server, what was the discourse like?
2 by mbgerring | 0 comments on Hacker News.
My understanding of the early Internet is that there was fierce competition among commercial, closed-source server and database software, and that the dominance of Linux, Apache, MySQL (and now PostgreSQL), etc. was far from obvious or guaranteed. I think we're in a similar moment with LLMs, and I'd love to read some stories, or see some examples of discourse on mailing lists, forums, or elsewhere on this subject from that earlier period. I think it would be helpful for grounding present-day discussions. What can you share from this era?