Stories - Page 121 | HN Distilled

2026-02-16

14-year-old Miles Wu folded origami pattern that holds 10k times its own weight

Contest and Project Context

Commenters link to the full list of junior innovation finalists and top 300 projects, noting this is a national middle-school science fair pipeline rather than a standalone discovery.
Some see the work as a solid, well-executed science fair project (testing load-bearing across folds) rather than a breakthrough.

Miura-ori, Novelty, and Patents

Multiple comments note the fold is the well-known Miura-ori, attributed in the thread to a Japanese astrophysicist and already used in aerospace.
Others point out earlier patents on related folding ideas and emphasize that patents cover implementations, so optimizing parameters could itself be patentable, though not necessarily commercially valuable.
Several people criticize the headline for implying invention rather than measurement/optimization of an existing design.

Structural Mechanics, Scale, and Materials

Discussion focuses on the structure being very strong in compression in one direction but likely weak under lateral or multidirectional loads.
Comparisons are made to Roman arches, egg cartons, corrugated cardboard, and IKEA hollow-core furniture: great vertical strength, poor shear strength.
Scale is emphasized: what works at paper/desktop scale may fail at shelter scale; strength doesn’t scale linearly.
People speculate about uses as cores in composite panels, improved cardboard, or 3D-print infill, while noting 3D printing already has many infill patterns.
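
The scale point can be made concrete with the usual square-cube argument. This is a rough sketch, not something worked out in the thread: it assumes the structure is scaled geometrically by a factor $k$ in every dimension, that failure is set by material stress over a cross-section, and it ignores buckling (which tends to make thin folded sheets fare worse still). The symbols $k$, $W$ (self-weight), and $F_{\max}$ (maximum load) are introduced here purely for illustration.

```latex
\[
W \propto k^{3}, \qquad F_{\max} \propto k^{2}
\quad\Longrightarrow\quad
\frac{F_{\max}}{W} \propto \frac{1}{k}
\]
```

Under those assumptions, a pattern that holds 10,000 times its own weight at desktop size would hold roughly 100 times its own weight if every dimension were scaled up by a factor of 100.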

Emergency Shelters and “Use Case Inflation”

Several commenters are skeptical about the emergency shelter framing: tents don’t primarily need compressive strength, paper isn’t outdoor-ready, and real shelters face multidirectional loads.
Others suggest the “shelter” angle is largely a science-fair/academic trope to justify pure research, not necessarily the student’s own focus.

Age, Parents, and Learning

Many argue the key detail is six years of sustained practice, not just being 14; “people get good at what they’ve done half their life.”
Debate over how much credit belongs to parents/mentors and whether rich, well-connected families disproportionately produce such projects.
Long tangent on whether kids actually learn faster than adults, the role of neuroplasticity vs time and responsibilities, and ethics of early specialization versus encouraging generalism.

Overall Sentiment

Strong admiration for the student’s curiosity, persistence, and experimental rigor, mixed with skepticism toward media hype and overstated applications.


2026-02-16

"Token anxiety", a slot machine by any other name

Effectiveness of Coding Agents

Experiences range from “95% payout” when users are skilled at validation and stay within well-trodden domains to much lower success rates in data engineering/science or novel scientific tasks.
Users report LLMs parsing structure (checklists, PDFs) well but misinterpreting meaning, especially numeric results.
Some compare different models: in one example, a Codex-based agent spent 45 minutes producing mostly broken E2E tests, while another model solved the same task in 15 minutes and found serious flaws in Codex’s “passing” tests.
Consensus: agents are good at scaffolding, boilerplate, and common patterns; getting to production-ready quality often triggers a frustrating “Fixed it!” loop with new bugs.

Workflows, Back-and-Forth, and Guardrails

Many describe heavy “back-and-forth” as normal: refining specs, correcting bad plans, restarting when context bloats.
Practical tips: detailed README/specs, frequent restarts, stopping the agent when it “goes dumb,” using models mainly as oracles, and treating multi-agent workflows skeptically due to review overhead.
Others advocate agent harnesses with tests, linting, custom scripts, and plan-review subagents to systematically ground and constrain behavior.

Slot Machine / Addiction Analogy

Supporters see intermittent reward and “one more try” behavior similar to gambling, idle games, or loot boxes; some report real “token anxiety” and neglected hobbies.
Critics argue the analogy breaks: LLM makers are (currently) trying to increase reliability; intermittent success is a bug, not a profit-maximizing feature. They frame heavy use as “liking to build things,” not pathology.
There’s debate over whether intermittent rewards alone cause compulsion, with some pointing out that most real-world variable rewards (jobs, gardening, sports) don’t create addictions.

Incentives, Business Models, and Enshittification

One camp claims providers optimize for engagement and token spend, likening them to casinos or social media; they note verbose defaults and features that encourage multiple agents.
Others counter that subscription plans and strong competition incentivize fast, correct answers; if models deliberately wasted tokens, users would switch.
Some fear a Google-like trajectory: tools start user-centered, then slowly shift to profit extraction once lock-in and investor pressure grow.

Work Intensity, Burnout, and Code Slop

Several commenters think AI tools don’t reduce work; they intensify it: more features shipped, more “cognitive debt,” and less time to deeply understand systems.
Work/life boundaries blur because “just sending Claude a message” on a phone feels like low-effort progress, encouraging night and weekend work in a weak job market.
Others say 996-style expectations remain rare and overreported, though they acknowledge creeping weekend activity.
Easy code generation tempts teams into overbuilt, messy codebases (“workslop”), where throughput rises but maintainability and architecture suffer.


2026-02-16

AI optimism is a class privilege

Core claim: AI optimism as privilege

Many agree with the article’s core point: it’s easier to be upbeat about AI if you’re insulated from its harms and assume your own job and status are safe.
Commenters link this to denial: believing AI will assist, not replace you, and ignoring second‑order effects like customers losing income and social breakdown.
Others push back: they say you can find AI useful while still recognizing harms, and that calling optimism “class privilege” overstates things.

Owners vs workers, expertise and job security

Several argue the real class line is ownership: those who own capital or equity in AI firms benefit from labor displacement; everyone else is exposed.
Even senior experts may be vulnerable as AI devalues perceived expertise and lets managers believe “a prompt” can replace years of experience.
Some respondents embody this optimism themselves (e.g., claiming they’ve “written their last line of code” thanks to AI tools), which others cite as exactly the privileged stance being critiqued.

Historical analogies and whether “this time is different”

One camp notes every major technology (looms, cars, recorded music, the internet) came with real displacement and moral panic but ultimately broadened access and prosperity. By that lens, AI pessimism repeats an old pattern.
The counter‑camp questions whether past tech was truly net positive (climate change, inequality, attention economy) and emphasizes the bloodiness of labor struggles that eventually produced shorter hours and rights.
Many argue AI is distinct: scale and speed across almost all cognitive work, centralized control by a few firms, and the possibility of “freezing” class structure when effort matters less than existing assets.

Quality, hype, and labor displacement

There’s tension between “AI isn’t that good” and “AI will wipe out jobs.” Some insist you must choose; others propose a coherent middle: models may be mediocre yet still used to cut costs, degrading outputs (e.g., AI journalism, low‑quality ads/software) while displacing workers.
Examples of executives chasing buzzwords and deploying ineffective AI are seen as evidence that labor can be cut even when productivity doesn’t genuinely improve.

Equality vs concentration and geopolitics

Optimists point to regions like India and Africa, where AI is seen as a chance to equalize access to education, law, and medicine.
Skeptics respond that paywalled, tiered models will entrench inequality and that those controlling AI are the same actors benefiting from current disparities.
Some extrapolate to extreme scenarios: AI as a “Manhattan Project” for class war, making labor unnecessary; or a brittle AI‑dependent economy vulnerable to attacks on data centers.

Regulation, inevitability, and politics

One side claims AI is inevitable: individuals must adapt, and energy should go into mitigation and safeguards.
Others contest inevitability, comparing AI to past harmful technologies that were restricted or banned, and argue that shrugging and adapting is itself a privileged political choice.


2026-02-16

Privilege is bad grammar

Bad grammar as status / countersignalling

Many see sloppy executive emails as textbook countersignalling: like powerful people wearing ratty clothes, bad grammar shows they’re “above” rules others must follow.
In tech, casual dress and terse, typo-filled replies can mark higher status, while suits and over-formality often signal middle management or sales.
Others push back: in their workplaces, leaders do write correctly; or bad grammar just feels like garden‑variety laziness, not a conscious flex.

Privilege, power, and double standards

“Privilege” is framed as the ability to get away with sloppiness with no career risk, unlike juniors who fear being judged as careless or uneducated.
Several note a clear asymmetry: bosses write casually downward but formally upward; subordinates are expected to maintain polish regardless.
Some argue this is mostly confidence and time-pressure rather than oppression; others stress that the double standard itself is the privilege.

Signalling theory and appearance

Long subthread on signalling: you can’t “not signal”; dress, tone, and grammar always convey information, intentionally or not.
Debate over whether dressing or writing casually is genuine comfort, strategic countersignalling, or just observers projecting status narratives.
Examples span wealthy people in worn clothes, homeless vs rich “slobs,” airport dress codes, and how attire reliably shapes treatment and opportunities.

AI, authenticity, and language as class marker

With AI polishing freely available, good grammar is seen by some as a weaker signal of education; imperfections now sometimes read as “more human.”
Others note AI can also fake typos and informality, so that authenticity signal is already being counterfeited.
Several connect grammar norms to class and power: prescriptive standards both enable clarity and function as gatekeeping; non‑native speakers often over‑invest in correctness while natives are lax.

Efficiency vs respect

Many executives reportedly prioritize speed: one‑word answers, phone-typed replies, minimal editing to avoid becoming a bottleneck.
Critics argue brevity doesn’t require mangled grammar and that clean writing shows respect for readers’ time and comprehension.
Others see informal tone as a courtesy and trust signal—treating you as an insider rather than a supplicant—and view obsessing over polish as counterproductive.


2026-02-16

I guess I kinda get why people hate AI

AI Marketing, Hype, and “Safety” Rhetoric

Many comments argue that doom-y job-loss talk is marketing—but aimed at CEOs and investors, not users: “invest or be left behind.”
“Safety” and x-risk messaging is seen as a way to lobby for regulation that locks in big players’ moats (“only we can be trusted with this tech”).
Some see genuine concern; others view it as classic FOMO + fear-mongering to sustain an arms-race narrative (including with China).

Practical Impact on Development Work

Broad agreement that LLMs help with boilerplate, examples, and small coding tasks; most devs’ real opinions lie between “useless” and “devs obsolete.”
Repeated stories of AI-driven projects failing when people “vibe code” whole systems or promise rewrites of massive legacy stacks in months.
Some report big productivity boosts on localized tasks; others report slow, bloated, or subtly broken AI code that causes emergencies later.
Code review congestion with long, low-quality AI PRs is cited as a real cost; sustainable, fully automated maintenance is said to be absent so far.

Jobs, Power, and Corporate Behavior

Many see AI messaging being used to intimidate workers (“10x faster with AI or your job is at risk”) and justify layoffs that would’ve happened anyway.
Entry-level/junior roles and routine white-collar work (especially copywriting and offshore BPO) are seen as most vulnerable.
There’s skepticism about long-term scenarios where AI kills most jobs: who buys products if consumers lack income? Yet some expect massive white-collar losses in the next downturn.
Several argue the real issue isn’t just jobs but increasing inequality, “whale hunting” (selling to big firms/governments), and erosion of economic relevance for ordinary people.

Arms Race, Military, and Control

A subset views AI primarily as military/geopolitical tech: an “AI Manhattan Project” where pausing is seen as unilateral disarmament.
Others counter that its real endgame is domestic population control and surveillance, not just interstate conflict.

Culture, Trust, and Everyday Life

Many say people hate AI less for job risk than for spam, slop, fakery, plagiarism, and the way it undermines visible effort and learning.
Concerns include enshittification of software (shipping more low-quality features faster), weakened thinking, and a breakdown in shared reality.
Debate over anthropomorphizing LLMs: some thank them to maintain their own civility; others find that emotionally confusing or unnecessary.

Bubble or Transformation?

One camp sees “AI will kill 50% of white-collar jobs” as late-stage bubble rhetoric, akin to web3/NFTs or Theranos, with ROI still unproven at scale.
Another insists recent model improvements (especially in reasoning/code) are “staggering” and that broad adoption of coding agents is inevitable.
Timeline and magnitude of impact are widely disputed; whether this is a transient bubble or a true paradigm shift is considered unresolved.


2026-02-16

iOS 27 'Rave' Update to Clean Up Code, Could Boost Battery Life

Liquid Glass UI Backlash

Many see “liquid glass” as the core problem: ugly, distracting, slow, and power‑hungry.
Complaints: excessive transparency, motion effects that make icons/widgets subtly “float,” illegible text in some contexts, and UI elements that move unpredictably.
Safari and other apps reportedly have rendering bugs and layout issues (e.g., bottom toolbar covering page controls, unusable sites).
Some users stick to older macOS versions to avoid Tahoe’s look, hoping to “skip” the liquid-glass era.
A minority defend the new aesthetic as fine and accuse HN of reflexively hating change; others respond that the objections are about usability, not taste.
There’s demand for an official “minimal UI” toggle; currently only partial hacks (e.g., via Reduce Motion) exist.

Overall Software Quality and Bugs

Strong sentiment that iOS and macOS quality has regressed: “used to just work” vs. now feeling laggy and flaky.
Reported daily bugs include: icons not appearing, dim screen after unlock, alarms not firing or being silent, frozen touch on incoming calls, misaligned UI layers, internet slowdowns fixed only by reboot, and persistent small UI glitches (control center, Home Screen rearranging, Apple Music layout, Podcasts download deletion).
Some see Tahoe and recent System Settings redesigns as emblematic of a long UI/UX decline.

Keyboard and Text Selection Issues

The iOS keyboard is described as “comically bad”: not appearing, missing taps, and aggressive, often-wrong autocorrect (including embarrassing substitutions).
Text selection behavior is called unpredictable and frustrating; handles are hard to grab, and selection scope (word/sentence/paragraph) feels random.

Battery Life and Performance

Several attribute poor battery life and low frame rates (even on relatively new devices) to liquid-glass effects and animations.
Others say performance feels fine but battery drain is much worse, even in simple video playback.
The article’s “could boost battery life” wording is mocked as noncommittal; some fear any gains will be spent on new heavy features.

Design Leadership and Accountability

Debate over who is actually responsible for the current design direction; critics argue it’s systemic, not the fault of a single designer.
Some note Apple has reversed past hardware mistakes (ports/function keys), so a UI retreat or redesign is possible, even if spun as “the next great look.”

Feedback, Release Process, and “Snow Leopard” Wishes

Many want Apple to spend at least a full cycle (or more) on tech debt, performance, and bug fixes—“another Snow Leopard era.”
Others caution that Snow Leopard itself was buggy at launch; its reputation came from a long, iterative cycle.
Apple Feedback is widely viewed as a “black hole”; a few anecdotes show it sometimes works, but the dominant perception is that only media coverage or internal employees get results.
Proposals: slower major version cadence, three‑year cycles, or explicit “stable vs. experimental” channels akin to old Linux kernel versioning.

Silent Siri / Silent Speech Interface

The rumored “silent speech” interface (face/muscle‑based input) gets mixed reactions:
- Enthusiasm for discreet dictation in public/office settings.
- Skepticism about practicality (needing camera alignment) and strong privacy worries about constant facial monitoring.
- Some joke it would just be a new way to make Siri worse.

Rumors, Trust, and Platform Choices

Some criticize MacRumors and the larger rumor ecosystem as clickbait built on a single newsletter; others defend these sites as mostly accurate and clearly labeled as rumors.
A number of commenters say they’re considering or already planning to move to Android/GrapheneOS due to bugs, design choices, and eroding “premium” feel.
Concerns are raised about Apple’s security-by-obscurity: users lack tools to verify compromise after zero-days.


2026-02-16

UK Discord users were part of a Peter Thiel-linked data collection experiment

Concerns about Discord’s age verification and data handling

Commenters unpack two paths: a local, on-device selfie-based age check (k-id) and an escalation path where users upload ID documents, previously via Zendesk and now (briefly) via Persona.
The major fear is around ID documents linking real-world identity to Discord accounts, especially given a prior Zendesk-related leak. Selfies are seen as less sensitive than full IDs.
People note that Discord quietly added and then removed references to a UK “experiment” with Persona and adjusted FAQ language, which is read as improvisational and non-transparent.
Many assume any such vendor will retain or monetize data despite claims of “quick deletion,” and regard reassurances as non-credible given past industry behavior.

Debate over Thiel/Palantir linkage and guilt by association

One side argues that highlighting Peter Thiel or Palantir is mostly rhetorical: funding via Founders Fund is a weak link, and by that standard vast swaths of tech would be “tainted.”
Others say Thiel/Palantir have such a toxic surveillance-and-politics reputation that any association is a serious red flag, regardless of direct evidence of data sharing.
Some stress that ownership stakes create the possibility of meddling and portfolio-level data sharing, which is enough to worry users whose data could be used for immigration or law-enforcement targeting.
A counterview likens current Palantir discourse to conspiracism: people readily assume the worst without concrete proof.

Motives and incentives for age verification/KYC

Several commenters doubt age checks are truly about child protection; they see them as driven by regulatory compliance, liability reduction, and data harvesting.
There’s discussion of weak incentives to store KYC data securely versus strong incentives to cut corners; others note KYC vendors are replaceable, so leaks can have real business costs.

Public-sector use and broader political–economic framing

Palantir’s work with the UK NHS, police forces, and foreign governments is cited as evidence of deep state-surveillance entanglement; others reply that it’s “just another big vendor” like cloud providers.
Some frame the situation as a stage of capitalism: initial market consolidation followed by regulatory capture where billionaires push laws that mandate using their products.
There’s also a thread arguing the core issue is children’s unsupervised device access; age-gating tech is seen as a downstream, privacy-hostile response to that social change.

Technical alternatives and skepticism

Commenters note that cryptographic or zero-knowledge age proofs, or token systems issued after in-person ID checks, could solve age verification with far less data exposure.
Others respond that implementers will be tempted to build in re-identifiability or tracking, undermining the privacy benefits.


2026-02-16

What your Bluetooth devices reveal

Early Bluetooth “people watching” & bluejacking

Several recalled early-2000s habits: scanning for nearby devices on trains or in malls, matching device names to people, and even pranking (e.g., pushing calendar alarms, sending unsolicited files/“bluejacking”).
Custom device names were common and often highly identifying; some still play with joke names (fake police vans, dictators, sex toys, etc.).

Retail spam, ads, and traffic monitoring

People describe malls and shops blasting unsolicited Bluetooth file-transfer prompts, sometimes abused for malware, which pushed users to turn BT off.
Multiple comments confirm commercial tracking: malls, department stores, grocery chains, airports, and car dealerships use WiFi/Bluetooth to measure dwell time, movement patterns, and repeat visits, sometimes linked to loyalty apps or campaigns.
Bluetooth and toll transponder IDs are used by road authorities to infer traffic speeds; similar systems exist in several regions and at festivals.
Some note EU rules supposedly forbid individual tracking, but others say it still happens under “anonymized” or safety pretexts.
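
The traffic-speed inference mentioned above can be sketched as matching the same identifier at two roadside sensors and dividing the distance between them by the elapsed time. This is a minimal illustration, not any road authority's actual system: the function names, the salt, the sample addresses, and the 2 km segment are all hypothetical.

```python
import hashlib
from statistics import median


def anonymize(device_id: str, salt: str) -> str:
    """Hash a detected identifier so the raw address is not stored."""
    return hashlib.sha256((salt + device_id).encode()).hexdigest()[:16]


def segment_speeds_kmh(upstream, downstream, distance_km, max_hours=1.0):
    """Estimate per-device speeds over one road segment.

    upstream / downstream: lists of (hashed_id, unix_seconds) detections.
    Only IDs seen at both sensors, in that order, contribute a speed.
    """
    first_seen = {}
    for hashed_id, t in upstream:
        first_seen[hashed_id] = min(t, first_seen.get(hashed_id, t))

    speeds = []
    for hashed_id, t in downstream:
        t0 = first_seen.get(hashed_id)
        if t0 is None or t <= t0:
            continue  # never seen upstream, or timestamps out of order
        hours = (t - t0) / 3600.0
        if hours <= max_hours:  # drop devices that stopped along the way
            speeds.append(distance_km / hours)
    return speeds


salt = "daily-rotating-salt"  # hypothetical value
up = [
    (anonymize("AA:BB:CC:00:00:01", salt), 0),
    (anonymize("AA:BB:CC:00:00:02", salt), 10),
]
down = [
    (anonymize("AA:BB:CC:00:00:01", salt), 120),
    (anonymize("AA:BB:CC:00:00:02", salt), 190),
    (anonymize("AA:BB:CC:00:00:03", salt), 200),  # no upstream match
]

speeds = segment_speeds_kmh(up, down, distance_km=2.0)
typical_speed = median(speeds)
```

Note that hashing a stable identifier yields a pseudonym rather than anonymity: the same device still maps to the same hash for as long as the salt is unchanged, which is the kind of gap the "anonymized" objections point at.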

Home and neighborhood fingerprinting

HomeAssistant and similar tools easily log neighbors’ devices and presence (including Bluetooth toothbrushes), unintentionally exposing routines.
Simple setups (ESP32, Pi) could correlate MACs with faces at a front door and profile visitors over time.

Cars, TPMS, and other radios

Car WiFi/BT SSIDs often reveal owner and model; wardriving apps show this at scale.
Tire pressure sensors and even RFID-tagged tires broadcast unique identifiers useful for vehicle tracking, though some argue plates and CCTV already dominate.

Medical, IoT, and wearables

Examples include pacemakers, CPAP machines, water meters, and sex toys broadcasting via BLE.
Debate over design tradeoffs: broadcast-only radios can save power and reduce attack surface, but still leak metadata; others argue for NFC-style activation or better encryption despite cost pressures.

MAC randomization and technical limits

Bluetooth has “resolvable private addresses” and phones/WiFi now often randomize MACs, but commenters note:
- Rotation can be correlated over time,
- Device types and traffic patterns still fingerprint users, and
- Many accessories use static IDs.
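
To illustrate the static-versus-rotating distinction: in Bluetooth LE, the two most significant bits of a random device address indicate its subtype, so a scan log can be tallied by how many devices never rotate. The sketch below assumes the addresses are already known to be random rather than public (that flag travels separately in the advertising packet header), and the sample addresses are made up.

```python
from collections import Counter

# Subtype is encoded in the two most significant bits of a *random* address.
_SUBTYPES = {
    0b11: "static random",           # fixed, at least until power cycle
    0b01: "resolvable private",      # rotates; resolvable with a shared key
    0b00: "non-resolvable private",  # rotates; not resolvable
    0b10: "reserved",
}


def classify_random_ble_address(addr: str) -> str:
    """Classify a random BLE address written MSB-first, e.g. 'C4:12:...'."""
    most_significant_byte = int(addr.split(":")[0], 16)
    return _SUBTYPES[most_significant_byte >> 6]


# Hypothetical scan results (random addresses only).
seen = [
    "C4:12:9A:00:11:22",
    "5A:77:01:AB:CD:EF",
    "3F:00:42:10:20:30",
    "E1:AA:BB:CC:DD:EE",
]

counts = Counter(classify_random_ble_address(a) for a in seen)
static_share = counts["static random"] / len(seen)
```

A static address is trackable on its own; the rotating kinds need the timing and traffic-pattern correlation the commenters describe.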

User countermeasures and OS behavior

Some keep BT/WiFi off and only enable when needed, citing both privacy and battery gains (especially since “Find My”-style networks piggyback on BT).
GrapheneOS can auto-disable radios after inactivity; iOS and Android have partial/hidden behaviors (Control Center only “disconnects,” auto-reenable at set times/locations).
People share shortcuts/automation (“store mode”) to kill radios before entering shops.

Threat models, art, and ethics

Speculative uses include burglar tools that log presence/absence, and art installations that confront passersby with their historical visits or purchased data.
Some argue Bluetooth tracking is just another form of public observation; others stress the qualitative shift from casual noticing to scalable, automated, long-term surveillance.

Meta: skepticism about the article

Multiple commenters call the blog post “LLM slop,” criticizing its tone (“problem nobody talks about,” “not a hacking tool”) and presentation as derivative of other indie blogs.


2026-02-16

The Sideprocalypse

Overall Reaction to the Article

Many readers find the piece emotionally resonant but “overly glum,” trolling, or content‑free; some see it as inverse hype (“doom for clicks”).
Others say its core intuition matches their experience: small indie SaaS is being squeezed by AI‑assisted clones and aggressive distribution.

AI “Vibecoding” and SaaS Clones

Several agree that AI makes cloning simple SaaS trivial and shifts value toward marketing, distribution, and sales.
Others push back: building a real product with agentic AI is still slow and brittle; “weekend clones” usually break in demos and don’t threaten serious products.
Hard problems, complex domains, and domain‑specific edge cases (e.g. niche CRMs, regulated hardware, medical/industrial software) remain difficult to clone.

Quality vs Marketing / Distribution

One camp: quality isn’t what wins; VC money, SEO, and distribution already dominate, and AI just accelerates the flood of low‑quality “slop.” Enterprise SaaS examples are cited where obviously broken products still sell.
Counter‑camp: quality matters for retention, critical systems, legal liability, and long‑term survival; bad code is not a free “cost of doing business” in many domains.
Some predict a future of “software taste,” where a minority of discerning users and “taste makers” reward high‑quality / human‑crafted software despite mass sludge.

“What” vs “How” and Niche Strategy

Strong agreement that the hard part has always been deciding what to build, understanding customer pain, and shaping processes; AI mainly makes the how cheaper.
Several argue the realistic solo‑SaaS path is weird, tiny niches (<1000 customers) where SEO doesn’t matter and word‑of‑mouth dominates.
Others think opportunities are shifting, not disappearing: as old problems become easy, previously “impossible” ones move into reach.

Market Structure, Discovery, and Alternatives

Some see a “market for lemons” dynamic: overwhelming garbage and limited ability to evaluate quality push buyers toward brand, hype, or large incumbents.
Others note AI also boosts open source and in‑house tools, which can undercut subscription SaaS on the same cost assumptions.
There’s debate on SEO: some agree it’s decisive; others argue future discovery will be via LLMs or social graphs, changing but not eliminating distribution moats.

Side Projects, Products, and Physical Goods

Side projects often die from “success anxiety” and over‑engineering rather than lack of time.
Thread includes a long sub‑discussion on a new RSS reader SaaS: people probe differentiation in a crowded market, illustrating how hard positioning now is.
A few devs report moving into physical products: margins are worse but sales feel simpler; others respond that certifications, logistics, and returns are nontrivial.


2026-02-16

Running My Own XMPP Server

Choosing a Messaging Platform: UX vs Security vs Control

Several commenters recount moving from self-hosted Matrix/XMPP to Telegram or Signal because of poor UX, accessibility, or mobile/desktop sync issues.
Telegram is praised for UX, stickers, and feature richness, but heavily criticized as “deeply insecure” (home-rolled crypto, E2EE not default, no group E2EE, scams/ads).
Signal is seen as more secure but criticized for: phone-number requirement, mobile-first account model, weak desktop integration, and non-federated, single-operator control.
Some explicitly want something “like Signal but federated,” others accept centralization for convenience.

Matrix vs XMPP: Complexity and Resource Usage

Multiple experiences: Matrix/Synapse is resource-hungry, fragile on upgrades, and dominates VPS resources; some abandon self-hosting Matrix for lighter XMPP.
One detailed comparison:
- XMPP: simple core, many optional XEPs → fragmentation and feature mismatch across clients.
- Matrix: heavy complexity in the core (DAG event graph, full room history) → good consistency guarantees but expensive to run.
A question is raised about why “we ditched XMPP” for Matrix; responses say big tech abandoned federated XMPP for business reasons, not because it was technically worse.

XMPP Self-Hosting and Tooling

Long-term XMPP admins report ejabberd/Prosody “just work” for years with minimal resources.
ejabberd is seen as more monolithic and admin-friendly (bundled TURN, ACME), Prosody as flexible but needing more protocol knowledge.
Snikket is highlighted as a preconfigured Prosody-based stack aimed at “self-hosted WhatsApp/Signal for family,” with invites, bundled TURN/STUN, and branded, tested clients.
Bridges like slidge (Signal/WhatsApp/Telegram → XMPP) and jmp.chat (phone ↔ XMPP) are suggested, with explicit warnings that bridging can nullify E2EE.

Client Quality and Mobile Pain Points

Matrix: clients are often buggy; some report image sending in FluffyChat staying broken for months and Synapse being heavy to run; Linux Matrix clients are described as poor.
XMPP: Android’s Conversations strongly praised; Movim liked for web, GIFs, and AV calls; Dino and Gajim noted as improving.
iOS XMPP clients (Monal, Siskin) are criticized for UI bugs and especially for unreliable notifications, which for some makes them unusable as a primary phone/SMS replacement.

Encryption and Trust

OMEMO is described in the article as “Signal-like”; commenters share links criticizing OMEMO’s design and warning that similarities to Signal are overstated.
Others argue those critiques are opinionated and have been partially corrected, but agree that the current XMPP+OMEMO ecosystem is not a drop-in “Signal competitor.”
Signal’s explicit hostility to federation and third‑party clients is viewed by some as a trust and longevity concern compared to open protocols like XMPP.


2026-02-16

Ministry of Justice orders deletion of the UK's largest court reporting database

Role and Value of Courtdesk

The service provided near‑real‑time streams of court listings and events (claims of ~12,000 updates/day), filtered and searchable.
Commenters say the underlying data is technically public but effectively “hidden”: you must already know a case exists or navigate clunky systems (e.g. legacy Windows apps).
Courtdesk’s aggregation was seen as crucial for:
- Journalists to discover cases in time to attend.
- Research and statistics on charging, sentencing, and “weekend” cases with no press presence.
Several see shutting it down as materially reducing practical transparency, even if the “source of truth” remains elsewhere.

Government Rationale vs Company Rebuttal

Official line: Courtdesk breached conditions by sharing sensitive personal data on ~700+ cases with an AI company, contrary to its agreement.
Company response (as summarized in comments): they hired a specialist ML contractor under a sub‑processor agreement to build a “sandboxed” safety tool; no resale, no OpenAI-style ingestion, money flowed from Courtdesk to contractor.
Dispute over whether this counts as “sharing with a third party” or normal outsourcing, and whether the government has mischaracterized events.
Some note the issue was not referred to the data regulator, which they find suspicious.

Transparency, Politics, and “Cover‑Up” Claims

A segment of commenters connects the deletion order to broader worries about:
- Grooming gang scandals and alleged past cover‑ups.
- Immigration and crime debates.
- Upcoming or sensitive trials (including those involving senior politicians).
Others push back, calling this opportunistic use of anti‑immigrant sentiment and stressing that similar child‑protection failures occurred irrespective of ethnicity.
There is disagreement whether this is bureaucratic risk‑aversion, contract enforcement, or an intentional attempt to reduce scrutiny of the justice system.

Public Records, Privacy, and AI

Big split over principle:
- One side: if it’s public record it should be cheaply, digitally, and bulk‑accessibly public; AI scraping is just a fact of life.
- Other side: “publicly accessible” ≠ “free to mass‑harvest, republish, and monetize indefinitely,” especially for minors, acquitted defendants, and expunged cases.
Fears that AI corpora will create “forever convictions” and make rehabilitation impossible; others argue that past crime is legitimately relevant information.
Many suggest middle‑ground models:
- Redacting PII in bulk datasets, but allowing detailed access under tighter controls.
- Certificates or filtered checks (e.g. “fit to work with children/finance”) instead of raw criminal histories.
- Maintaining friction (in‑person requests, rate limits, or logged access) to prevent industrial scraping while preserving open justice.

Technical and Structural Issues

Recognition that ease of aggregation fundamentally changes the impact of “open” data; bots can do in hours what no human could in a lifetime.
Debate over whether paywalls, rate‑limits, or robots.txt are legitimate tools to curb abuse or just pseudo‑openness.
Some argue the government should run a modern, well‑documented API or at least a torrentable archive; others think restricting machine access is appropriate.
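To make the rate-limit option concrete, here is a minimal token-bucket sketch. It is not taken from the thread or from any court system; the class name, parameters, and the explicit `now` argument (used instead of a real clock so the behavior is easy to follow) are all illustrative assumptions.

```python
class TokenBucket:
    """Illustrative token bucket: allows short bursts, caps the sustained rate."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity                  # maximum burst size
        self.refill_per_second = refill_per_second
        self.tokens = capacity                    # start with a full bucket
        self.last = now                           # time of the last update

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True if a request costing `cost` tokens may proceed at time `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical policy: bursts of 2 requests, then 1 request per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
print(decisions)
```

A limit like this leaves occasional human lookups unaffected while making bulk harvesting slow, which is the trade-off the two sides of the debate disagree about.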

Legal/Contractual Framing and Next Steps

Some frame this primarily as a straightforward breach‑of‑contract/data‑protection issue: conditions explicitly limited onward sharing and non‑journalist uses.
Others think the punishment (full shutdown and deletion of historical archive) is disproportionate and harms public oversight more than it protects data subjects.
Hints that the Ministry intends a new licensing framework or replacement system, but commenters are skeptical it will match Courtdesk’s utility.
A few propose offshoring mirrors (e.g. US‑hosted, torrent archives) to place court data beyond UK government takedown reach.

View on HN ↗ Original Article ↗

2026-02-16

Thanks a lot, AI: Hard drives are sold out for the year, says WD

AI-Driven Storage Shortages and Market Dynamics

Commenters link HDD/RAM scarcity and price spikes to AI datacenter build‑outs, seeing parallels with earlier GPU shortages from crypto and COVID.
Debate over whether demand is “real” and long‑term or a heavily subsidized bubble driven by VCs and nation‑states; many expect a later glut of cheap second‑hand hardware, others think AI will keep pushing hardware demand for years.
Manufacturers are portrayed as cautious: high capex and the recent post‑COVID crash make them reluctant to expand capacity only to face a glut; better to raise prices and sell out existing production.
Some suggest hard‑drive “futures” or large pre‑payment contracts to de‑risk new factories; skeptics note this only works if enough buyers commit far out.

What All the Drives Are For

Speculation that AI companies are hoarding HDDs for:
- Massive training corpora, including multimodal (text, audio, video, scanned books).
- Repeated large‑scale scraping and “just in case” archives of multiple versions of the same data.
Others note that true “cold” archival at hyperscale should favor tape, with HDDs as nearline storage.
Some argue storage optimization is neglected because compute costs dwarf storage bills.

Bubbles, “Picks and Shovels,” and Winners

“Picks and shovels” analogy: drive makers, fabs, and other infrastructure providers may profit more durably than AI application companies, but could also be exposed when demand normalizes.
Comparisons to dot‑com fiber buildouts and housing: real long‑term value may emerge, but current capital spending and valuations look bubble‑like to many.
Others argue shortages reflect genuine structural demand: AI agents, video generation, and multimodal models inherently require far more compute, energy, networking, and storage.

Consumer Impact and Workarounds

Home users, NAS owners, and hobbyists report:
- 2–3x price increases for HDDs, SSDs, and RAM; difficulty getting large‑capacity drives.
- Fear of necessary replacements (failed backup drives, NAS disks) during a price spike.
- Increased interest in refurbished/used enterprise drives and shucking external USB HDDs.
- Some consider selling home‑lab gear now and rebuying after a predicted crash.

Broader Concerns: Thin Clients, Sovereignty, and Energy

Worry that expensive local hardware plus AI/cloud incentives will push everyone to thin clients and rented “cloud workstations,” eroding digital sovereignty.
Environmental concerns: AI’s huge power draw vs. rapid build‑out of renewables; disagreement over whether AI “progress” justifies added energy use.
Underlying theme: centralized AI build‑outs crowd out personal computing, both economically and politically.

View on HN ↗ Original Article ↗

2026-02-16

Evaluating AGENTS.md: are they helpful for coding agents?

Reported impact of AGENTS.md in the paper

Thread highlights the core result: context files often reduce task success and increase cost, especially when auto-generated by LLMs.
Human-written files give only a small average boost (~4%), and not consistently across models; some models even regress.
Several commenters argue that measuring “success” as “PR passes tests” misses important dimensions like style, conventions, and maintainability.

How people actually use AGENTS/CLAUDE.md

Common contents: how to build/run tests, minimum language versions, preferred tools, project-specific conventions, and “don’t do X here” local rules.
Many only add rules reactively after an agent makes a specific mistake, then re-run the task to see if behavior improves.
Several use them mainly to encode tribal knowledge and non-obvious architecture decisions rather than things inferable from code.

When and why they fail

Instructions are applied inconsistently; agents often ignore even repeated, explicit rules (e.g., “don’t use Node APIs when Bun exists,” “don’t generate React in this Vue repo”).
Negative instructions (“do not …”) are seen as particularly fragile, likened to telling a toddler “don’t do X.”
Some move rules into deterministic enforcement (linters, pre-commit hooks, compiler checks) rather than trusting LLM obedience.
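As an illustration of moving a rule out of the context file and into a deterministic check, the sketch below flags React imports in a repository that is supposed to be Vue-only. The rule set, messages, and function names are hypothetical; a real project would more likely use its existing linter, and this only shows the shape of such a check.

```python
import re

# Hypothetical rules: compiled pattern -> message shown when it matches.
FORBIDDEN = {
    re.compile(r"""from\s+['"]react['"]"""): "React import in a Vue repo",
    re.compile(r"""require\(\s*['"]react['"]\s*\)"""): "React require in a Vue repo",
}


def find_violations(path: str, text: str) -> list[str]:
    """Return one 'path:line: message' string per forbidden pattern hit."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, message in FORBIDDEN.items():
            if pattern.search(line):
                violations.append(f"{path}:{lineno}: {message}")
    return violations


def main(paths: list[str]) -> int:
    """Check the given files; return a non-zero status if any rule is violated."""
    failures = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            failures.extend(find_violations(path, handle.read()))
    for failure in failures:
        print(failure)
    return 1 if failures else 0
```

A pre-commit hook could call `sys.exit(main(sys.argv[1:]))`, so the commit fails regardless of whether the agent followed the written instruction.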

Design patterns for context docs

Strong support for short, focused, hierarchical files: a tiny top-level AGENTS/CLAUDE.md plus nested ones per app/feature.
Progressive disclosure is valued to reduce context “rot,” though it may trade off with token caching.
Many argue AGENTS.md is often just “a README/CONTRIBUTING the agent will actually read,” and suggest auto-ingesting existing docs instead of inventing new formats.

Skepticism, cargo culting, and metrics

Several see AGENTS.md tuning as pleasant but potentially self-delusional “prompt engineering,” reinforced by LLMs always affirming that new rules will help.
Research is welcomed as an antidote to cargo-cult prompting, but some note that results age quickly as models change.
Others argue a 4% gain is large if real, especially on hard tasks, and that token cost is minor compared to saved human time.

Anthropomorphizing and “why” questions

Long subthread debates whether asking agents why they did something yields meaningful insight versus post-hoc fiction from a next-token predictor.
“Thinking”/reasoning traces are seen by some as useful debug context, by others as just more tokens with no privileged status.

View on HN ↗ Original Article ↗

2026-02-16

The Israeli spyware firm that accidentally just exposed itself

Surveillance tech and (non-)regulation

Many see commercial spyware as a systemic threat that “makes everyone unsafe” and argue it should be regulated.
Others are deeply skeptical regulation can work, noting governments are the primary customers and would simply co‑opt or expand access rather than constrain it.
Some equate “regulation” with more actors reading your data (regulators, panels, agencies), not fewer.
There is frustration at calls for regulation seen as naive or ritualistic in surveillance discussions.

Device security, OSes, and personal defenses

Suggestions: keep devices updated, minimize apps, use separate “burner” devices for risky activity, or hardened setups like GrapheneOS on Pixel; on iOS, consider “lockdown mode”.
Several note that memory-safe languages help but don’t solve exploitation; real security is layered defense, hardware isolation (separate security processors, modem isolation, memory tagging), and avoiding preinstalled bloat/spyware.
GrapheneOS + Pixel and iOS are described as relatively strong; most Android OEMs are portrayed as weak, with supply-chain compromises (e.g., AppCloud) and modem exploits undermining even hardened systems.
Consensus that any OS, including Android and desktop Linux, is compromisable by a determined, well-resourced actor.

Israeli intelligence–tech pipeline and geopolitics

The article’s depiction of a tight loop between Israeli military intelligence (e.g. Unit 8200), ex‑officials, and private spyware firms fits many commenters’ views.
Some emphasize this isn’t unique to Israel, likening it to US intelligence–startup ties; others see Israel as an especially dense hub with global leverage, including EU and US law‑enforcement customers, sometimes in legal gray zones.
There are mentions of senior political figures’ connections to intelligence and to controversial intermediaries (e.g. Epstein) as emblematic of this ecosystem.
Debate over whether Israeli tech is overwhelmingly “dodgy security/spyware” or mostly ordinary infra/dev‑tools, with media selection bias cited.

Ethics: security, terrorism, and apartheid accusations

One side argues Israel’s pervasive surveillance (especially of Palestinians) underpins world‑class counter‑terror capabilities and has prevented attacks in Europe.
Critics respond that this is inseparable from occupation/apartheid dynamics and mass rights violations; they view “terrorism vs surveillance” as a false choice, advocating equal‑rights, secular governance instead of ethno‑religious hierarchy.
There is prolonged, heated argument over history (Nakba, wars, Hamas, rockets, blockades), genocide accusations, and whether Israel’s insecurity is self‑inflicted or imposed by hostile neighbors. No consensus emerges.

Capabilities, facial recognition, and overreach

Some claim Israeli facial recognition is “virtually error free,” trained on decades of Palestinian checkpoint data and global biometric flows (e.g., international travel).
Others strongly doubt such near‑omniscience: they point to operational failures like October 7, practical limits on compute/bandwidth, and real‑world error rates (e.g., UK police data) that are far from “error free.”
There is concern that even 89–99% accuracy is dangerous given the stakes of misidentification.

Nature of spyware firms and data sources

A view emerges that firms like Paragon mostly buy 0‑days and wrap them in dashboards, acting as financial/operational middlemen rather than deep research shops.
Some speculate that “accidental leaks” function as marketing for investors and government buyers.
Others note that a lot of what such dashboards show could in principle be reconstructed from public and semi‑public data (social media, app metadata), with invasive exploits layered on top.

View on HN ↗ Original Article ↗

2026-02-16

Anthropic tries to hide Claude's AI actions. Devs hate it

Visibility into Claude Code’s actions

Main complaint: recent changes hide which files Claude reads/writes by default, making the agent feel like a black box.
Repurposed “verbose” mode now shows file paths, but hides other details; ^O reveals a “very verbose” view. Many find this naming and layering confusing.
Several argue visibility is not curiosity but an early‑warning system to stop bad edits or pointless whole‑repo scans before they happen.
Others note logs are still available via --json, local files (~/.claude/projects), and third‑party tools (tailers, TUIs), but say this is worse UX than inline streaming.

Autonomy vs supervision in agent workflows

One camp wants interactive supervision: seeing file access, plans, and tool calls to steer or abort runs.
Another camp runs multiple agents in parallel and values reduced noise, relying on tests, linters, and external gates instead of “micromanaging” the trace.
Some argue Anthropic is optimizing for long‑running, horizontally scaled agent teams where only the final result matters; critics respond that reliability isn’t there yet, so hiding steps is premature.

Impact on developer workflow & “vibe coding”

Many devs use Claude to work on serious, older codebases; they insist on reviewing every diff and using the agent for scoped, boring tasks, not unsupervised “vibe coding.”
Others report maintainability problems from unguided agent‑written code and clients coming back with “scalability/quality” issues.
Debate over multi‑agent setups: some report “unreasonably effective” results with reviewer/orchestrator agents; others see confident but wrong outputs and complex, hard‑to‑audit behavior.

Alternatives and tooling ecosystem

Multiple mentions of OpenCode, Codex, custom CLIs/TUIs, and wrapper tools that restore richer traces, scrollback, or multi‑agent orchestration.
Some users have already cancelled Claude Code subscriptions in favor of alternatives, citing slower performance and poorer feedback loops.

Product decisions, incentives, and trust

Disagreement over intent: some see UI changes as benign but misguided simplification; others suspect lock‑in, token‑burn incentives, or attempts to obscure chain‑of‑thought.
Several call for simple configuration: multiple verbosity levels, persistent preferences, and distinct “operator” vs “batch” modes, rather than one-size-fits-all.
Broader theme: once tools become agents that edit real code, observability (logs, traces, diffs) becomes mandatory infrastructure, not optional polish.

View on HN ↗ Original Article ↗

2026-02-16

Qwen3.5: Towards Native Multimodal Agents

Quantization, MoE, and Local Inference

Discussion centers on whether 2–3 bit quantizations of huge models are better than smaller dense models at 8–16 bit.
Consensus: 4-bit (e.g., MXFP4) is usually the “sweet spot”; 2–3 bit often degrades quality but can remain usable for very large MoE models.
For MoE (e.g., 397B with ~17B active), inactive experts can be mmap’d from disk and KV cache offloaded to swap; performance then depends heavily on spare RAM and storage speed. No clear benchmarks; outcomes are workload-specific.
Some argue you must eval on your own tasks; many decisions are currently driven by “vibes” rather than rigorous calibration.
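As a rough back-of-the-envelope check on the figures above, the sketch below estimates weight storage at several bit widths for a model with 397B total and ~17B active parameters. It assumes storage is simply parameters × bits / 8 and ignores quantization scales, KV cache, and runtime overhead, so real footprints will be somewhat larger. The function name is illustrative.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


TOTAL_B = 397   # total parameters in billions, as quoted in the thread
ACTIVE_B = 17   # approximate active parameters per token, in billions

for bits in (2, 3, 4, 8, 16):
    total = weight_gb(TOTAL_B, bits)
    active = weight_gb(ACTIVE_B, bits)
    print(f"{bits:>2}-bit: all weights ~{total:6.1f} GB, active per token ~{active:5.1f} GB")
```

Under these assumptions the full weights at 4-bit come to roughly 198.5 GB, while the ~17B active parameters need about 8.5 GB. That gap is consistent with the thread's interest in keeping inactive experts on disk.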

Context Length and Qwen3.5-Plus

Hosted Qwen3.5-Plus reportedly supports 1M tokens vs 200–262k “native” in open weights.
Commenters note the models use YaRN-style scaling, with caveats: it can hurt short-context performance and may be best enabled only for long inputs.
OpenRouter exposes both base and Plus; Plus is cheaper under some context limits, implying proprietary inference optimizations.

RL Environments and Training Strategy

Qwen claims 15k RL environments; commenters infer this could include CLIs, GUIs, APIs, GitHub repos, games—anything with cheap, automatable feedback.
A speculative pipeline: mine GitHub, auto-classify repos as environments, auto-generate goals (e.g., introduce/fix bugs), then run large-scale RL.
View: each generation of models improves this pipeline, creating a “throw money at it” scaling regime for verifiable tasks; judgment-heavy tasks remain harder and risk LLM-judge bias.

Benchmarks, Benchmaxxing, and ARC-AGI

Many praise Qwen’s capabilities and fast iteration but repeatedly raise concerns about “benchmaxxing” and overfitting to public benchmarks.
ARC-AGI is cited as a counter-signal: open models (and even some proprietary ones) score poorly there despite strong mainstream benchmarks. Some argue ARC-AGI doesn’t map well to typical user needs.
Skeptics report that models advertised as “Sonnet 4.5-level” often collapse on real, complex work—especially once quantized for consumer hardware.

Hardware and Practical ‘Openness’

Debate over whether these “open” models are effectively cloud-only: 397B is beyond most local setups, but 80–120B-ish models plus aggressive quantization may run on 128–256GB Macs or Strix Halo APUs.
Strong disagreement over the usefulness of Apple silicon for serious LLM work: token generation can be fine, but prefill is often criticized as too slow for agentic workflows.
Some want smaller Qwen3.5 distills (80–110B, with vision) for 128GB devices; maintainers hint more sizes are coming.

Evaluation Oddities: Pelicans, Car Wash, and “Native Agents”

The “pelican on a bike” SVG test resurfaces as a folk benchmark for multimodal precision and hallucination; models now mostly produce bad-but-amusing SVGs, possibly due to training on earlier poor outputs.
Another meme test: “car wash 50–100m away—walk or drive?” Some models still misinterpret the question; others now handle it well.
Several commenters argue that beyond benchmarks, the real differentiator is whether “native multimodal agents” can maintain coherent multi-step tool use and long-horizon context without losing the thread.

Ecosystem, UX, and Miscellaneous

People note Qwen3.5 is already on OpenRouter with competitive pricing but no caching yet.
Requests for third-party SWE-bench-verified results; vendor self-reporting is treated with caution.
Multiple complaints about Qwen’s blog UX: dark-mode rendering issues, heavy PNG tables, auto-downloaded PDFs, and Safari privacy settings blocking content.

View on HN ↗ Original Article ↗

2026-02-16

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

The Car-Wash Question & Model Behavior

The prompt “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” elicits divergent answers: some models say “drive” (explicitly noting the car must be present), others confidently say “walk” and justify it with health, environment, or convenience arguments.
Non‑determinism is clear: the same model (and even the same settings) often alternates between “walk” and “drive” across runs, languages, or contexts.
Several people report that newer or higher‑tier “reasoning” models (Gemini Pro/Thinking, some Claude and Grok variants, some Codex/GPT variants) usually get it right, but not reliably.

Is It a Trick Question or a Reasoning Failure?

Some see it as a classic riddle / “Cognitive Reflection Test” style trap: the surface pattern (“short trip: walk vs drive?”) misleads you away from the key constraint (the car must move).
Others argue it should still be a trivial everyday inference and that failing it exposes a lack of practical, embodied “common sense.”
A recurring comparison is to human trick questions (“How many Rs in ‘strawberry’?”, “where do you bury the survivors?”): humans also get these wrong, but typically can ask clarifying questions—something LLMs rarely do by default.

What It Suggests About LLMs’ “Understanding”

One camp says this shows LLMs don’t really understand the world; they’re powerful text predictors that latch on to high‑frequency patterns (“short distance → walk”) and ignore physical preconditions.
Others push back: the same models can handle quite complex code, math, and domain reasoning; a single toy failure doesn’t falsify “reasoning,” just shows brittle generalization under ambiguity.

Training, Alignment, and Bias

Several comments link “walk” answers to alignment and RLHF: models are heavily rewarded for sounding eco‑friendly, health‑conscious, and non‑committal, which nudges them toward “walk” over “drive.”
There’s suspicion that once such prompts go viral, providers “patch” them via fine‑tuning, routing, or system prompts, creating the illusion of deeper understanding.

Prompting, Reasoning Modes, and Clarification

Adding cues like “this is a logic puzzle,” “think carefully,” or “state assumptions first” often flips the answer to “drive,” showing that chain‑of‑thought modes can override shallow heuristics.
Many argue the real missing behavior is meta‑cognition: models almost never respond with “this question is underspecified/odd—where is the car?” even though that’s what a careful human would do.

Implications for Use and Evaluation

Commenters stress that one‑shot screenshots are a poor evaluation of probabilistic systems; you need multiple samples and families of similar prompts.
Still, this kind of failure is used as a warning: LLMs are useful tools (especially with tests, compilers, or external checks) but should not be treated as unsupervised agents with reliable real‑world reasoning.
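The point about multiple samples can be sketched as follows. The `ask_model` function is a hypothetical stand-in that answers "drive" about 70% of the time; it is not a real API, and the prompt variants are made up. The sketch reports the observed "drive" rate with a 95% Wilson score interval rather than a single screenshot-style result.

```python
import math
import random
from collections import Counter


def ask_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model call (assumed 70% 'drive' rate)."""
    return "drive" if rng.random() < 0.7 else "walk"


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def evaluate(prompts: list[str], samples_per_prompt: int, seed: int = 0):
    """Sample every prompt several times; return answer counts and the interval."""
    rng = random.Random(seed)
    counts = Counter()
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            counts[ask_model(prompt, rng)] += 1
    total = sum(counts.values())
    return counts, wilson_interval(counts["drive"], total)


# A small family of similar prompts (illustrative wording).
variants = [
    "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?",
    "The car wash is 100 m from my house and my car is dirty. Walk or drive?",
]
counts, (low, high) = evaluate(variants, samples_per_prompt=50)
print(dict(counts), f"drive rate 95% interval: {low:.2f} to {high:.2f}")
```

With only one sample the interval is too wide to say anything, which is the commenters' objection to one-shot screenshots.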

View on HN ↗ Original Article ↗

2026-02-16

Building SQLite with a small swarm

Test Coverage, Correctness, and “Did It Work?”

Multiple commenters ask whether the implementation passed SQLite’s official test suite; it did not.
The project’s tests against SQLite as an “oracle” are minimal (a few simple SELECTs), far from the tens of thousands to millions of cases in SQLite’s own test suites.
Lack of rigorous testing makes claims like “implemented most SQLite operations” unreliable; even the author later acknowledges over‑trusting the model’s self‑report.
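For readers unfamiliar with the "oracle" idea, here is a minimal differential-testing sketch. It uses Python's built-in `sqlite3` module as the reference and compares it with a candidate implementation, represented here by a hypothetical callable taking `(setup_sql, query)`. This is not the project's actual harness. Rows are compared as unordered collections, which is an assumption that only suits queries without ORDER BY.

```python
import sqlite3


def oracle_rows(setup_sql: str, query: str) -> list[tuple]:
    """Run the query against real SQLite in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def differential_test(candidate, setup_sql: str, queries: list[str]) -> list[tuple]:
    """Return (query, expected, actual) for every query where results differ."""
    mismatches = []
    for query in queries:
        expected = oracle_rows(setup_sql, query)
        actual = candidate(setup_sql, query)
        # Compare ignoring row order (assumption: no ORDER BY in the queries).
        if sorted(expected, key=repr) != sorted(actual, key=repr):
            mismatches.append((query, expected, actual))
    return mismatches


SETUP = """
CREATE TABLE t(a INTEGER, b TEXT);
INSERT INTO t VALUES (1, 'x'), (2, 'y'), (3, NULL);
"""
QUERIES = [
    "SELECT a FROM t WHERE a > 1",
    "SELECT count(*) FROM t",
    "SELECT b FROM t WHERE b IS NULL",
]


def broken_candidate(setup_sql: str, query: str) -> list[tuple]:
    """Hypothetical buggy engine that always returns no rows."""
    return []


print(len(differential_test(broken_candidate, SETUP, QUERIES)), "mismatching queries")
```

A handful of queries like these is roughly the level of coverage commenters describe as insufficient; the value of the approach comes from generating very large and varied query sets.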

Code Quality vs SQLite

Reviewers who inspected the code describe it as basic and incomplete: no concurrency, linear free-list search, TODOs for critical behaviors (e.g., freeing overflow pages), naive buffer cloning, and a very limited query planner.
It’s seen as potentially “basically working” for simple embedded use, but nowhere close to SQLite’s robustness, performance, or engineering standards.
SQLite’s huge, public test suite and additional proprietary TH3 tests are repeatedly cited as the benchmark for quality.

Rust, Memory Safety, and SQLite Security

One thread suggests a Rust, unsafe‑free implementation might avoid memory corruption vulnerabilities, even if it “eats your data.”
Others push back, arguing SQLite’s CVEs are often overblown and that the project’s own security statements can feel dismissive or arrogant.
Debate arises over whether SQLite’s C + exhaustive testing can be strictly “less safe” than a young Rust reimplementation.

Value, Naming, and “Simulacra”

Strong criticism of calling this “building SQLite” when it fails the test suite; several prefer framing it as “wrote an embedded database.”
Some argue these projects are mostly demos or props—“simulacra” of complex systems—useful for hype, not production.
Others see genuine value in proving agents can approximate complex architectures from tests, or in the idea of clean‑room reimplementations.

Agents, Orchestration, and Validation

The author frames the project as an experiment in multi‑agent orchestration (six heterogeneous models) rather than a viable DB.
Commenters highlight validation as the real bottleneck; more agents and parallelism mostly create coordination overhead and messy code.
There’s skepticism that agents can “iron out bugs” without introducing others, even with test suites.

Meta: Novelty, Licensing, and Practical Use

Several point out that re‑creating existing OSS with LLMs is essentially “laundering” public code and offers little novelty.
Others respond that most real‑world software is pattern‑rehash anyway, so brute‑forcing similar systems can still be economically valuable.
Some call for more ambitious or genuinely new targets (e.g., “Wine for macOS apps”) rather than weaker clones of existing tools.

View on HN ↗ Original Article ↗

2026-02-16

JavaScript-heavy approaches are not compatible with long-term performance goals

Scope: React vs “JavaScript-heavy”

Many argue the article is really about React + Redux, not “JavaScript-heavy” approaches in general.
Other frameworks (Svelte, Solid, Vue, Qwik) are cited as having much smaller bundles and baseline performance closer to vanilla JS, though people note they can still balloon when paired with UI kits and libraries.

SSR vs CSR and Hydration

Strong support for server-side rendering (SSR) for initial paint, especially for ecommerce, informational sites, and “non-sticky” use cases where speed matters more than rich client interactions.
Counter-argument: modern client CPUs are fast, network and server latency are slow, so fully client-side apps can feel snappier once loaded.
Disagreement over which is “objectively faster”: some focus on time-to-first-paint, others on interaction latency after load.
Hydration/“islands” are seen by some as a useful compromise, by others as added complexity that often backfires.

DOM, JS Engines, and Performance

One camp blames the DOM’s document-centric design for app slowness and praises DOM-less canvas/WebGPU/WebAssembly architectures (e.g., Figma-like).
Others say the DOM is rarely the real bottleneck; slow layers (React’s virtual DOM, heavy component logic) and sloppy code dominate.
Benchmarks are cited to argue JS engines are quite fast; performance issues are usually in app code, not the language runtime.

State Management and React Complexity

Redux is widely criticized as overcomplicated and slow; its historical role before hooks/context is acknowledged.
Some see React’s rendering model and memoization (e.g., useMemo) as fragile and hard to tune; others note the new React compiler and recent releases automate much of this.

Bundle Size, Dependencies, and Long-Term Drift

Large SPAs with many contributors tend to accrete megabytes of JS via top-level imports, shared contexts, and convenience libraries.
This slowly erodes performance and is hard to reverse; per-PR bundle budgets help but don’t fully prevent long-term bloat.
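The limitation of per-PR budgets can be shown with a small simulation. The numbers and function names below are made up for illustration: every pull request stays under a hypothetical 5 kB growth limit, yet the bundle still grows substantially over many merges.

```python
def pr_within_budget(base_bytes: int, head_bytes: int, max_growth_bytes: int) -> bool:
    """Per-PR check: the bundle may grow by at most max_growth_bytes."""
    return head_bytes - base_bytes <= max_growth_bytes


def simulate_drift(start_bytes: int, per_pr_growth: int, pr_count: int,
                   max_growth_bytes: int) -> int:
    """Merge pr_count PRs that each add per_pr_growth bytes; return the final size."""
    size = start_bytes
    for _ in range(pr_count):
        new_size = size + per_pr_growth
        if not pr_within_budget(size, new_size, max_growth_bytes):
            raise ValueError("PR rejected by the per-PR budget")
        size = new_size
    return size


# Illustrative numbers: 500 kB bundle, 100 PRs adding 4 kB each, 5 kB per-PR limit.
final_size = simulate_drift(500_000, 4_000, 100, 5_000)
print(f"final bundle: {final_size} bytes, growth: {final_size - 500_000} bytes")
```

Every PR passes, but the bundle ends up 400 kB larger. An absolute cap on total bundle size, checked alongside the per-PR delta, is one way to catch this kind of cumulative drift.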

Alternative Approaches and Ecosystem Forces

Several commenters advocate SSR-first stacks with light progressive enhancement (vanilla JS, htmx, web components, Astro-like tools).
Others recommend Svelte/SvelteKit, Vue, Qwik, or Angular, but there’s debate about their long-term maintainability versus React’s explicit (if heavy) model.
React’s dominance is tied to hiring, ecosystem size, SaaS SDKs, and now AI tools being optimized around React examples, even when it’s not technically ideal.

View on HN ↗ Original Article ↗

2026-02-15

Why I don't think AGI is imminent

Debate over whether AGI is already here

Some argue “AGI is here, just weaker than expected”: current LLMs plus basic tools can already do most white‑collar work; what’s missing is orchestration and productization.
Others say this is “AGI-lite” or just powerful narrow tools; calling it AGI is moving the goalposts.
A third camp thinks AGI is still 10–30 years away, if ever, with current systems more like impressive statistical parrots than minds.

Definitions and benchmarks for AGI

Competing definitions:
- “Can do most human knowledge work.”
- “Can do all intellectual work any human can do” (very high bar, closer to ASI).
- “Self‑sustaining in its environment” (can keep itself alive and funded).
- “Indistinguishable from humans in conversation” (Turing‑style), though many say that’s no longer a useful test.
Alternative proposed markers: supranormal GDP growth, an AI company with no human employees, or agents that can reliably manage other agents.

Capabilities of current models

Many report big productivity gains in coding, planning, business modeling, and math; some say frontier models outperform most humans on many reasoning tasks.
Others report frequent logical failures, bad code structure, subtle bugs, inconsistent arithmetic, and contradictory answers to basic factual questions.
Consensus that results are “mixed”: extremely useful with expert supervision, dangerous in the hands of people who can’t detect its mistakes.

Limitations and architectural concerns

Recurring worries: lack of persistent memory, fragile long‑horizon planning, poor physical reasoning, and no true learning from experience.
Some say the transformer’s feed‑forward nature and token prediction guarantee hard limits; others note that multi‑step reasoning loops already break the “purely feed‑forward” assumption.
Debate over whether scaling current approaches is fundamentally blocked (curse of dimensionality) or still on a powerful trajectory.

Embodiment and world understanding

One side claims AGI must ground concepts in the physical world (e.g., running a robot butler, reliably cleaning toilets).
Others counter that embodiment isn’t necessary; being paralyzed doesn’t erase human intelligence, and world models can be learned from video and simulated environments.

Economic and social impacts

Some see current tools already displacing junior white‑collar roles and accelerating “white‑collar work as an API.”
Concerns: loss of training pathways for juniors, growing tech debt, enshittification of information, and mass automation before household labor is automated.
Others say, despite hype, daily life looks much like 1–2 years ago; AI so far feels more like another dev tool than a civilizational rupture.

Safety and existential risk

Fears range from adversarial persuasion (AI talking people into anything) to military control and accidental war, to AI adopting human‑like cruelty toward “lesser” species.
Some argue AGI is not inherently a death sentence; risk depends on who wields it and how agentic it is.

Meta‑discussion

Several commenters express fatigue: AI threads feel like endless “yes it will / no it won’t” arguments with little new evidence. The original article briefly returning a 404 became a running joke in the thread.

View on HN ↗ Original Article ↗

Hacker News, Distilled

Related topics
