
A Replacement for BERT | Hacker News
Dec 19, 2024 · Embedding models are frequently based on BERT-style models, but BERT models can be finetuned to do a lot more than just embeddings. So an embedding-focused finetune of modern BERT …
BERT is just a single text diffusion step | Hacker News
Oct 20, 2025 · Back when BERT came out, everyone was trying to get it to generate text. These attempts generally didn't work, though here's one for reference: https://arxiv.org/abs/1902.04094
Show HN: Context-aware Japanese furigana using Sudachi and ...
May 29, 2026 · I built a context-aware furigana converter for Japanese text, files, and web pages. The main problem I wanted to solve was that simple dictionary-based furigana works well for common …
Open models by OpenAI | Hacker News
Open models by OpenAI (openai.com) · 2124 points by lackoftactics, 9 months ago · 876 comments
Train Your Own LLM from Scratch | Hacker News
May 5, 2026 · This must have been when BERT was SOTA. The architecture allows you to train a base and specialize with a head. I used the entire Wikipedia for the base and then some GBs of tweets I had …
Ask HN: What do you use for ML Hosting? | Hacker News
May 2, 2023 · On Modal.com these 34 lines of code are all you need to serverlessly run BERT text generation inference on an A10G (which has 24GB of GPU memory). No Dockerfile, no YAML, no …
UMD Scientists Create 'Smart Underwear' to Measure Human ...
Mar 15, 2026 · Ig Nobel is doing more for science than Nobel: - It's fun. - The prizes are accessible to young scientists who actually need the career boost from the publicity (as opposed to established …
How attention sinks keep language models stable | Hacker News
Aug 8, 2025 · > Researchers had observed similar patterns in BERT, where "a surprisingly large amount of attention focuses on the delimiter token [SEP] and periods," which they argued was used by the …
Hacker News
Hacker News is a platform for sharing and discussing technology, startups, and programming topics, fostering a community of tech enthusiasts.
Why we no longer use LangChain for building our AI agents ...
Jun 20, 2024 · BERT showed that training with two tasks (next-sentence prediction and mask fill) was more effective than training with only one. T5 showed that multiple instructions could be used for one task (token …