Hacker News, Distilled

AI-powered summaries of selected HN discussions.


The only photo of the Concorde flying at supersonic speed

Supersonic photography and comparable aircraft

  • Several commenters doubt it is literally the only supersonic photo of Concorde, suggesting many test-flight shots likely exist but, dating from the pre-internet era, aren’t public or easily found.
  • Discussion of what could keep up with Concorde at Mach 2: suggestions include military interceptors like the English Electric Lightning or another Concorde.
  • SR‑71 Blackbird is repeatedly mentioned as faster, with notes that its performance tables only go down to Mach 2.2, implying Mach 2 would be “slow” for it. Debate over whether it could “match speed” safely or within certified regimes.

Noise, sonic booms, and public impact

  • Many personal memories of Concorde’s distinctive roar near Heathrow, Gatwick, Manchester, JFK, and elsewhere.
  • Some recall it as exciting; others found it disruptive enough to halt conversations or cause neighborhood resentment.
  • Sonic booms and complaints about cracked windows and structural damage are cited as a major driver behind restrictions and eventual bans on overland supersonic flight.
  • One extended rant frames Concorde as a noisy prestige project for elites with negligible societal benefit.

Museums and surviving airframes

  • Multiple museums worldwide host preserved Concordes (Bristol, Seattle, NYC, Paris, Toulouse, Scotland, Germany).
  • Visitors remark on the cramped cabin, very busy analog cockpit, and structural expansion in flight (e.g., a hat still wedged in an expansion gap).

Boom Supersonic and future SSTs

  • Boom’s upcoming Mach 1 test flight is noted, with skepticism about its economics: fewer seats than Concorde but similar cost and fuel issues.
  • Some see the target market as business jets and ultra-wealthy customers; others question engine availability and regulatory hurdles.
  • Environmental, noise, and equity concerns are raised, including fears of “ask forgiveness later” over sonic impacts on remote communities.

Flight times, efficiency, and regulation

  • Lament that NYC–London block times haven’t improved since the 1960s, despite Concorde’s brief era.
  • Counterpoint: massive advances in safety, fuel efficiency, and cost-per-seat; fuel economics drive slower cruise speeds today.
  • Debate over whether deregulation killed innovation vs regulation (especially noise rules) constraining supersonic commercial flight.

Altitude and curvature

  • Some assert Concorde’s altitude allowed passengers to see Earth’s curvature; others argue apparent curvature in photos may be lens distortion or perspective, citing optical research. Disagreement remains unresolved.

Another undersea cable damaged in Baltic Sea

Incident and Technical Context

  • Swedish authorities opened a criminal investigation and seized a suspect vessel; multiple agencies (police, coast guard, defense) are involved.
  • Cable depth quoted around 50–80 m. Several commenters argue this is too deep for an ordinary accident and too shallow to be unreachable.

Cause: Sabotage vs Accident

  • Many participants see the pattern of recent cuts as intentional, likely involving ships dragging anchors over mapped cable routes.
  • Some note that vessels could be used as cheap tools for “hybrid” attacks: low-cost, deniable damage to communications resilience.
  • A linked article suggesting “accidents” is heavily criticized as contradicting named officials who call it deliberate.
  • A minority urges caution, citing ambiguous evidence and the historical existence of accidental damage; overall, sentiment strongly leans toward sabotage.

Diving and Cable Damage Feasibility

  • Discussion clarifies that typical recreational PADI certifications are limited to 18–40 m; 80 m requires serious technical training, helium mixes, long decompression, and often rebreathers.
  • Consensus: an 80 m dive is doable but specialized, not something a casual diver would use for covert sabotage; dragging ship anchors is easier.

Economic and Infrastructure Impact

  • Repair costs cited at roughly $1–3M per break plus downtime.
  • Some argue Baltic links are not globally critical and traffic reroutes, making this more of an annoyance.
  • Others stress that undersea cables as a class carry enormous economic value, so systematic attacks are strategically significant.

Legal and Maritime Policy Debates

  • Baltic access is geographically constrained by Danish waters; the Copenhagen Convention and UNCLOS define “innocent passage,” limiting new conditions like mandatory insurance.
  • Some argue Denmark/Sweden could leverage environmental or security provisions to impose stricter rules (insurance, inspections, compliance) on ships, especially those trading with Russia.
  • Others see legal and political barriers to turning the Danish straits into a choke point without reneging on treaties.

Deterrence and Response Options

  • Proposals include:
    • Requiring liability insurance for undersea infrastructure damage.
    • Seizing suspect vessels and cargo to fund repairs.
    • Blacklisting companies and captains involved.
    • Stronger sanctions on Russian fossil fuels and related logistics.
  • There is debate over punishing rank‑and‑file crew vs. focusing on owners, operators, and intelligence officers likely directing operations.

Broader Geopolitical Framing

  • Many connect the incident to Russia’s broader hybrid warfare: targeting infrastructure, spreading costs, and discouraging support for Ukraine.
  • Others tie it into a larger, emerging confrontation involving Russia, China, and Western allies, with references to recent Taiwan cable cuts and espionage cases.
  • Lengthy subthreads debate whether Russia is “winning” or heading for demographic and economic decline, and whether the EU can defend itself without US backing.

Former tech CEO suing to get the record of his arrest removed from the internet

Legal and Defamation Issues

  • Major debate over whether publishing details of a sealed arrest can be defamation.
  • Some argue: in the US, truth is an absolute defense, so reporting a real arrest cannot be defamatory.
  • Others counter: a judge ordered the arrest record sealed “as if it had not occurred,” so reporting only the arrest without the dismissal/expungement is a misleading half-truth.
  • A California law reportedly makes it illegal to publish sealed arrest reports; critics say this clashes with free speech and public-interest reporting.
  • Comparisons to other jurisdictions (e.g., Finland, EU “right to be forgotten”) show that in some places even true statements can be defamatory or must be deindexed.

Free Speech, Power, and SLAPP Dynamics

  • Many see this as a high-stakes clash between free speech and the ability of wealthy figures to suppress or punish reporting via costly lawsuits.
  • Example of a large media outlet being bankrupted over litigation and other outlets being chilled by threats is raised as precedent.
  • Concern that independent or “citizen” journalists are especially vulnerable to being sued into oblivion.

Arrests, Reputation, and “Internet Never Forgets”

  • Broad concern that publishing full names on arrest (not conviction) can permanently damage reputations, careers, and social lives.
  • Others argue arrest records must remain public to prevent abuses like secret jails, disappearances, and unaccountable policing.
  • Some propose compromises:
    • Keep arrest records public but limit how names/images are published.
    • Anti-discrimination rules preventing employers from using arrest (without conviction) against applicants.
    • Restrict mugshot/perp-walk “voyeurism” while retaining transparency.

Journalism Ethics and Public Interest

  • Multiple comments distinguish between:
    • Sensational coverage of private missteps vs.
    • Reporting that clearly serves a public or shareholder interest.
  • Some say journalists should weigh patterns, relevance to corporate governance, and potential harm before publishing personal allegations.

Streisand Effect and Public Perception

  • Many see this lawsuit as classic Streisand effect: attempts to suppress coverage have amplified awareness of the arrest.
  • Split views:
    • Some say reputational damage was already done; further publicity is marginal.
    • Others note future employers and partners are more likely than ever to find the story now.

No Bitcoin ETFs at Vanguard (2024)

Vanguard’s No‑Bitcoin Stance & Investing Philosophy

  • Many commenters praise Vanguard for prioritizing long‑term, fundamentals‑based investing and refusing to offer Bitcoin ETFs, even at the cost of losing customers to more “accommodating” brokers.
  • Others dislike the “paternalistic” restriction and say a broker should be a neutral tool that lets them implement their own theses, including crypto.
  • Several note Vanguard also avoids pure gold and many commodity ETFs, so excluding Bitcoin is consistent with their brand, though gold is somewhat “grandfathered in.”

Is Crypto an Investment or Pure Speculation?

  • Strong view: Bitcoin and most crypto have no inherent economic value, produce no cash flow, and function as zero‑sum or negative‑sum games reliant on a “greater fool.”
  • Counterpoint: So do fiat currencies; value is ultimately social consensus. Bitcoin can be a store of value and cross‑border medium of exchange with unique censorship‑resistant properties.
  • Some argue small allocations (1–5%) can improve risk/return or serve as “insurance” against fiat debasement or systemic shocks; others say volatility and lack of fundamentals make that unjustified.

Comparisons: Stocks, Bonds, Gold, Cash

  • Several argue stocks and bonds are fundamentally different: they are claims on productive assets and future cash flows; long‑term expected return is positive, not zero‑sum.
  • Gold is seen as mostly speculative but with non‑zero industrial and aesthetic demand plus millennia of “Lindy” as a store of value; Bitcoin lacks this history and intrinsic use.
  • Fiat cash is acknowledged as faith‑based and inflationary, but backed by state power and tax obligations, and useful as a short‑term store of value and unit of account.

Blockchain, Smart Contracts & Non‑Crypto Uses

  • One camp: “Blockchain without crypto” is mostly a solution looking for a problem; traditional databases, legal contracts, and regulated intermediaries already work well.
  • Other camp: Smart contracts (e.g., DeFi protocols, ENS) and censorship‑resistant ledgers have real though niche utility, especially for low‑trust, low‑value, or cross‑border interactions.
  • Skeptics note that most visible blockchain activity is still speculation, scams, and money laundering; proponents reply that tooling and scalability are early, and legitimate uses are slowly growing.

Use Cases, Externalities & Ethics

  • Claimed positive use cases: remittances, capital flight from failing currencies, censorship‑resistant payments, and asset access in unstable or authoritarian countries.
  • Negative externalities raised: facilitation of ransomware, sanctions evasion, human trafficking, tax evasion, and enormous energy use (for proof‑of‑work chains).
  • Some see crypto as an economic sink that burns real resources to manufacture speculative assets; others see it as a necessary tool for financial freedom in a world of increasing state and corporate control.

Ask HN: Would you still choose Ruby on Rails for a startup in 2025?

Overall sentiment on Rails for 2025 startups

  • Many would still choose Rails, especially for early-stage, small teams or solo founders.
  • Others prefer Next.js/TypeScript, Django, Laravel, Go, or .NET, often driven by existing skills or hiring concerns.
  • Several argue the tech stack rarely determines startup success; familiarity and speed of execution matter more.

Speed, productivity, and “boring tech”

  • Strong consensus that Rails is extremely productive: “batteries included” (auth, jobs, mailing, admin-ish features, Hotwire, etc.) lets teams focus on business logic.
  • Compared to Go or bare Node/Express, Rails reduces boilerplate (migrations, job queues, scaffolding) and is said to accelerate MVPs dramatically, especially combined with LLM-based code assistants.
  • Some group Rails with other “boring, battle-tested” stacks (Django, Laravel, Symfony) that are favored for reliability over novelty.

Maintainability, “magic,” and long-term cost

  • Critics point to heavy metaprogramming and “magic” (e.g., invisible bindings, callbacks, DSLs) that can make debugging and refactoring difficult, especially when developers over-optimize for DRY.
  • Some report legacy Rails apps full of hard-to-diagnose bugs or stuck on old versions.
  • Others counter that vanilla Rails upgrades are manageable; problems usually come from poor architecture or 3rd‑party libraries, not the framework itself. Community norms have shifted away from clever metaprogramming toward clearer code.

Performance and scaling

  • Rails is acknowledged as slower than Go/Rust and similar to or slightly faster than Python, but many argue most web apps are I/O-bound and can scale Rails with hardware and database tuning.
  • Examples of long-lived, high-traffic Rails monoliths suggest practical scalability; performance issues are said to concentrate at the database layer.
  • Some still prefer Go/Rust/Elixir for infrastructure, extreme performance, or concurrency-heavy workloads.

Ecosystem, hiring, and community

  • Rails benefits from a mature ecosystem (gems, Rack, job systems, deployment tools like Kamal) and good documentation.
  • Several commenters say hiring experienced Ruby/Rails developers is now harder than for Node/Python/PHP, and worry this may worsen.
  • Others downplay this, saying good generalists can learn Rails quickly.

Governance, leadership, and politics

  • A visible minority is wary of the framework’s de facto leader and sponsoring company, citing controversial public views and centralized influence (e.g., decisions about TypeScript support).
  • Some see this as a meaningful reason not to adopt Rails; others either ignore it or consider the community and broader maintainer set sufficient to mitigate individual influence.

When not to choose Rails

  • Advised against when:
    • You already have deep expertise in another suitable stack.
    • You need tight integration with Python-heavy ML stacks.
    • You expect very high performance/infra demands better served by Go/Rust/Elixir.
    • Hiring in your region for Ruby is clearly difficult.

Qwen2.5-1M: Deploy your own Qwen with context length up to 1M tokens

Running Qwen2.5-1M Locally (Mac, MLX, GGUF, CPU)

  • People are experimenting with running very long prompts (hundreds of thousands of tokens) on Macs, especially M3/M4 Max with 64–128GB unified memory.
  • One report: ~446k-token Rust/TypeScript codebase query on an M4 Max ran ~4 hours and returned a seemingly reasonable answer.
  • MLX 4-bit variants exist for macOS, but current MLX doesn’t yet support the dual-chunk attention mechanism used for full 1M-token context.
  • Some consider trying 1M-token prompts on large-RAM CPU servers, but expect it to be extremely slow.

Memory, Context Length, and KV Cache

  • Long context is dominated by KV cache memory, which scales with sequence length; 1M tokens requires “obscene” amounts of RAM/VRAM.
  • Official guidance:
    • Qwen2.5-7B-Instruct-1M: ~120GB VRAM for full 1M context.
    • Qwen2.5-14B-Instruct-1M: ~320GB VRAM.
  • KV cache quantization (e.g., 4-bit) can cut cache memory to ~¼, at the cost of quality.
  • A comparison table for another model shows memory rising steeply with precision: for a ~200k context, roughly 27.5 GB at 4-bit vs 109.8 GB at 16-bit.
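The scaling in the bullets above can be sketched with a back-of-the-envelope calculator. The layer/head/head-dim numbers below are illustrative assumptions for a ~7B GQA model, not official Qwen2.5-1M figures, and the result covers only the KV cache (weights and activations come on top).

```python
# Per token, the KV cache stores one key and one value vector for every
# layer and KV head: 2 * n_layers * n_kv_heads * head_dim elements.
def kv_cache_bytes(seq_len, n_layers=28, n_kv_heads=4, head_dim=128,
                   bytes_per_elem=2.0):  # 2.0 = fp16; 0.5 = 4-bit quantized
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

GIB = 2 ** 30
for ctx in (32_000, 200_000, 1_000_000):
    fp16 = kv_cache_bytes(ctx) / GIB
    q4 = kv_cache_bytes(ctx, bytes_per_elem=0.5) / GIB
    print(f"{ctx:>9} tokens: {fp16:6.1f} GiB fp16 | {q4:5.1f} GiB 4-bit")
```

Under these assumed dimensions, 1M tokens needs tens of GiB for the cache alone, and 4-bit quantization cuts it to a quarter, matching the "~¼" claim above.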

Ollama Defaults and Context Configuration

  • Ollama’s num_ctx defaults to 2k and is widely seen as a “foot-gun”: it silently discards leading tokens when exceeded.
  • Users must explicitly set num_ctx higher or save a model variant with increased context. Documentation and behavior are criticized as confusing.
  • Plugins and tools (e.g., files-to-prompt integrations) support passing num_ctx, but users often misinterpret it as output length.
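To avoid the silent-truncation foot-gun, the context window has to be raised per request. A minimal sketch of an Ollama `/api/generate` request body, passing `num_ctx` through the documented `options` object (the model name and prompt are illustrative):

```python
import json

def ollama_body(model, prompt, num_ctx=32768):
    # "num_ctx" is the context window, NOT the maximum output length --
    # the common misreading the thread warns about.
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # raise from the small default
    })

body = ollama_body("qwen2.5:14b", "Summarize the design of this codebase.")
```

Alternatively, a persistent model variant can bake the setting in via a Modelfile line such as `PARAMETER num_ctx 32768` followed by `ollama create`.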

Hardware Choices and Cost/Accessibility

  • Macs with large unified memory are attractive for big-context local inference, but RAM configurations are expensive and tightly coupled to CPU tiers.
  • Some argue multi-GPU x86 builds (e.g., multiple 3090s) offer better raw compute per dollar, but they lack the large unified memory of Apple Silicon.
  • Debate over class/access: high-RAM Macs and large GPU rigs are seen as increasingly out of reach for many, shifting experimentation back toward the well-funded.

Actual Usefulness of Huge Context Windows

  • Multiple reports say models often degrade beyond ~25–32k tokens for coding and other precise tasks: loss of instruction-following, missed files already in context, poor recall.
  • Others counter that 1M–2M context in some services works well for high-level overviews or summarization of large codebases or corpora.
  • Overall sentiment: large context is promising but unreliable for complex, fine-grained tasks; retrieval quality and “lost in the middle” remain major issues.

Benchmarks, Long-Context Hype, and Limits

  • Skepticism about “nearly perfect” long-context claims: detailed tables show significantly less than 100% on complex tasks and often only up to 128k, not full 1M.
  • Long context is distinguished from generation length; several commenters note that output length across turns is still a hard, unresolved problem.

Hard numbers in the Wayland vs. X11 input latency discussion

Measurement & Significance

  • OP measured end‑to‑end cursor latency with a high‑FPS phone camera, finding 1 extra 144 Hz frame (6.5 ms) on GNOME Wayland vs GNOME X11.
  • Several commenters say this end‑to‑end approach is exactly what users care about; others call the methodology “rough” and suggest hardware‑timed setups (photodiodes, Arduinos, LDAT) for cleaner data.
  • A quick statistical check in the thread finds the difference highly significant (very low p‑value), so it’s unlikely to be pure noise.
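A significance check of this kind can be as simple as a one-sided binomial sign test: if Wayland truly adds no latency, each paired trial in which one setup shows the cursor later is a fair coin flip. The trial counts below are assumed for illustration, not the thread's raw data:

```python
from math import comb

def sign_test_p(k, n):
    """P(at least k of n successes) under the null hypothesis p = 0.5."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Illustrative: Wayland was the slower recording in 18 of 20 paired trials.
p = sign_test_p(18, 20)
print(f"p = {p:.5f}")  # well below 0.001, so unlikely to be noise
```

Even with only 20 trials, such a lopsided split drives the p-value far below conventional thresholds, which is why a "rough" camera methodology can still yield a statistically solid result.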

GNOME vs “Wayland in general”

  • Many stress the result only applies to GNOME’s Mutter compositor; Wayland is just a protocol.
  • Others counter that in practice the dominant compositors become de‑facto standards, so GNOME’s issues matter for “Wayland” as users experience it.
  • Some request repeats on KDE, wlroots/Sway, Gamescope; one person shares their own hardware measurement showing similar “Wayland a bit slower than X11” on KDE and Sway.

Probable Technical Causes

  • Strong suspicion that the extra frame comes from compositor behavior: vsync, buffering, and atomic KMS updates.
  • Explanations focus on:
    • Wayland compositors doing atomic, vsync‑synchronized cursor updates vs X11’s historically more “as‑soon‑as‑possible” hardware cursor updates, even mid‑scanout (accepting tearing).
    • Cursor throttling and lack of “race the beam” techniques.
    • Cases where hardware cursor planes aren’t used or are constrained by drivers (AMD/Nvidia quirks, atomic DRM path).
  • Several people note full‑screen games can bypass or minimize compositor latency via direct scanout or game‑driven cursors, so desktop cursor results may not map directly to in‑game latency.

Gaming and Human Perception

  • Debate over whether ~6–7 ms matters: some call it imperceptible for most, others cite research and gaming experiments suggesting small latency differences measurably affect win rates, “feel,” and rhythm/timing tasks.
  • Some Linux gamers report no noticeable downside on Wayland; others see lag spikes under GPU or memory pressure.

Wayland vs X11: Design & Ecosystem

  • Wayland praised for:
    • Eliminating tearing by design and enabling modern features (HDR, VRR, better security, simpler core).
    • Being extensible via protocols rather than a single monolith.
  • Criticisms focus on:
    • Fragmentation: many compositors, non‑uniform support for window management, input, color calibration, accessibility, screen readers, and advanced input workflows (e.g., multi‑host keyboard/mouse, stream decks).
    • Regressions vs X11’s long‑mature feature set, especially for visually impaired users and power‑user tooling.
    • The “protocol not implementation” stance being seen as dodging responsibility for user‑visible problems.

Maturity, Politics, and Backwards Compatibility

  • Long meta‑discussion: X11 is old, messy, but battle‑tested; Wayland is younger, still missing polish and some protocols.
  • Some argue X11 could have been secured and modernized instead of replaced; others say its architecture and codebase were effectively unfixable, and most ex‑X developers have moved on to Wayland.
  • Several users report better responsiveness and less tearing on Wayland; others find it buggier or unusable with certain GPUs or workflows.

Related Tools & Ideas

  • Mention of Typometer, Perfetto, custom hardware rigs, and other approaches to measuring UI and terminal latency.
  • General agreement that more systematic, cross‑compositor measurements are needed to pinpoint where latency is introduced and how to fix it.

The Microsoft 365 Copilot launch was a disaster

Overall sentiment on Copilot in M365

  • Many see the Copilot rollout as a “disaster”: immature, unreliable, and pushed too aggressively into core workflows.
  • Some note occasional usefulness (e.g., explaining obscure Windows settings, summarizing long email threads), but say benefits don’t justify the disruption.
  • A few believe it will improve over time; others argue beta‑quality features shouldn’t be forced on paying customers.

Forced bundling, pricing, and licensing

  • Strong criticism of tying AI features to subscription price hikes and hiding a cheaper “Classic” / non‑Copilot tier behind cancellation flows.
  • Enterprise users report Copilot is still a separate, expensive add‑on roughly comparable to an E3 license; engagement is decent but measured time‑savings barely cover cost.
  • Family plan Copilot access is limited to the subscription owner, which some call “nuts.”

UX, productivity, and reliability

  • Complaints that Copilot UI is intrusive (e.g., ever‑present in Word/Outlook on macOS, large summary panels that can’t be hidden).
  • Reports that features often fail or hallucinate: irrelevant PowerPoint slides, Azure Copilot pointing to outdated docs, Excel/Office behavior regressions.
  • Broader frustration with Microsoft UX: Teams, OneDrive/SharePoint/Teams file sprawl, Outlook quirks, Notepad’s new autosave/session behavior.

Privacy, data use, and AI skepticism

  • Worry that documents and emails are being ingested for training or surveillance, especially in legal/regulated or educational contexts.
  • Some suggest Copilot need not be “good” as long as it justifies data collection and bolsters AI usage metrics for investors.
  • Several call the relationship with Microsoft “abusive” due to lock‑in and lack of meaningful opt‑out.

Branding, naming, and product sprawl

  • Rebranding Office → Office 365 → Microsoft 365 → Microsoft 365 Copilot is widely seen as confusing and diluting a strong “Office” brand.
  • Complaints about Microsoft’s naming chaos more broadly (Teams variants, Xbox generations, “Windows App” for RDP), making admin and support harder.

Alternatives and coping strategies

  • Many home users are downgrading to “Classic,” reverting to perpetual Office (2010–2024) or older versions, or switching to LibreOffice, OnlyOffice, Google Workspace, or Linux.
  • Some organizations are blocking Copilot at the fleet level.
  • Others simply ignore Copilot buttons and treat them as UI noise.

Education and broader AI impact

  • Educators worry embedded AI makes essay cheating trivial, pushing schools toward surveillance exams (webcams, keylogging, in‑class writing).
  • Several lament AI being bolted onto mature tools that already “solved” word processing and spreadsheets, driven more by hype and Wall Street than user need.

Are Americans' perceptions of the economy and crime broken?

Macro vs Micro: Economy Perception

  • Many argue that top-line macro stats (“economy is strong”) don’t match everyday experience (higher costs, fewer opportunities).
  • Others say people conflate “the economy” with “my personal finances/QoL,” so both “economy is good” and “I feel worse off” can be true.
  • Some see a K-shaped pattern: top ~20–30% doing fine while bottom half struggles, fueling pessimism despite solid aggregates.

Definitions, Metrics, and Inequality

  • Disagreement over what “economy” means: formal macro definition (GDP, production) vs household-level well-being and affordability.
  • Several note GDP and markets can rise while labor share and disposable income after essentials fall.
  • Rising asset prices, especially housing, are cited as key decouplers between macro success and lived hardship.

Inflation, Wages, and Housing

  • Inflation spike in 2022 is widely acknowledged; debate is whether wage gains offset it.
  • Some say wages beat inflation overall since 2019; others point to stagnant incomes, wiped-out savings, and retirees/pensioners losing ground.
  • Housing is seen as increasingly unattainable for younger or typical earners, especially in major metros.

Crime: Data vs Perception

  • Official stats show large crime drops (e.g., murders in big cities), but many insist crime feels worse locally.
  • Explanations offered: uneven geographic distribution (safer areas now seeing more incidents), underreporting, and changes in policing/prosecution.
  • Visible homelessness and public drug use (“drug zombies”) heavily shape perceptions of safety, even if not always captured as crime.

Media, Propaganda, and Partisanship

  • Strong claims that partisan outlets drive perception: right-leaning media emphasizing economic doom under Democrats; others accuse left-leaning media of minimizing problems.
  • Perceptions of inflation, crime, and overall conditions appear to track which party holds power rather than data.
  • Broader concern that media prioritize outrage and ad-driven engagement over accurate context, degrading the “ability to discern discourse from propaganda.”

Financial Literacy & Personal Finance

  • One thread blames low numeracy/financial literacy and distrust of investing for poor outcomes.
  • Heated back-and-forth over how meaningful a 5% yield or ETFs/T-bills are for people living paycheck-to-paycheck, with some calling such advice helpful and others “out of touch.”

It's not a crime if we do it with an app

Apps, “disruption,” and law‑breaking

  • Long debate over whether companies like Uber and Airbnb are fundamentally different from earlier “disruptors” (YouTube, Netflix, Tesla, Craigslist, eBay, Ford).
  • One side: Uber/Airbnb’s business model required ignoring existing taxi/hotel laws; YouTube et al. mostly faced secondary issues (user piracy) and complied with takedown laws.
  • Others argue that early YouTube and similar services actively benefited from infringement to grow, so they too “broke rules first, asked forgiveness later.”
  • Some see Uber as a net positive (better service, lower DUIs, break taxi medallion cartels); others say ends don’t justify illegal tactics or regulatory arbitrage.

Algorithmic cartels and Potatotrac‑style tools

  • Core concern: pricing/analytics platforms (for frozen potatoes, rent, etc.) act as coordination hubs so a few dominant firms can move prices in lockstep.
  • Supporters of this view say that when 3–4 firms control ~97% of a commodity and all use the same pricing app, this is effectively a cartel, just with software as a smokescreen.
  • Skeptics note some cited markets are highly competitive and low‑margin, and lawsuits are still pending; some question whether the “cartel” framing is overblown.

Monopolies, antitrust, and regulation

  • Many commenters argue modern capitalism naturally drifts toward oligopoly; antitrust enforcement and merger blocking are seen as insufficient or too slow.
  • Discussion of recent US antitrust efforts: some praise renewed enforcement; others say impact on big tech and large mergers has been modest.
  • Disagreement over whether price regulation is “disaster” or necessary for natural monopolies/oligopolies.

Corporate crime vs individual crime

  • Strong sentiment that corporations are treated leniently: small fines, no jail, rare “corporate death penalty,” contrasted with harsh treatment for petty individual crime.
  • Debate over limited liability and the “corporate veil”: originally to spread risk, now seen as shielding large firms and executives from meaningful consequences.
  • Proposals range from massive fines and forced stock dilution to jailing senior decision‑makers or large shareholders; others warn this is unworkable or would punish ordinary retirees.

Competition, “ethical” firms, and barriers to entry

  • Repeated question: if incumbents overcharge, why don’t more ethical, lower‑margin competitors win?
  • Answers raised: economies of scale, vertical integration, control of distribution/retail, regulatory barriers, access to capital, and incumbents’ ability to undercut or buy out entrants.
  • Some argue cultural and structural incentives ensure “less greedy” firms are selected out at scale.

Inflation, money supply, and “greedflation”

  • Part of the thread attributes price hikes mainly to corporate power and algorithmic collusion (“greedflation”).
  • Others insist increased money supply, supply shocks (e.g., pandemics, wars), and standard supply–demand dynamics are major drivers; they criticize ignoring monetary policy.
  • Overall: consensus that multiple forces interact, but disagreement on which is primary.

No one is disrupting banks – at least not the big ones

Why big banks are hard to “disrupt”

  • Regulation is repeatedly described as the main moat: banking licenses, capital ratios, AML/KYC, and supervisory regimes make entry costly and slow.
  • Big banks often want heavy regulation because it locks in incumbents and makes new competitors uneconomical.
  • Some argue that in practice disruption is often just “regulatory arbitrage” or skirting rules until regulators catch up.
  • Attempts to get direct Fed “master accounts” (e.g., Reserve Trust, Custodia) faced strong resistance and, in one cited case, revocation.

How money and credit actually work

  • Several comments stress that all banks create credit “out of thin air” via lending, constrained by capital and liquidity rules.
  • Others note that anyone can create credit (IOUs, trade receivables); what banks have is a special legal/regulatory backstop when they misprice risk.
  • There’s debate over how “magical” this is: some see it as an accounting trick that yields interest on created credit; others emphasize system-wide balance and interbank settlement via central banks.

Fintech and neobanks: real but limited disruption

  • In consumer retail, neobanks (Monzo, Starling, Revolut, Nubank, etc.) are credited with better apps, instant notifications, fee pressure, and forcing incumbents to improve UX.
  • Yet core deposit and lending power, especially at scale and in mortgages, remains with large incumbent banks.
  • Some see better savings rates (HYSAs, brokerage cash accounts) and app interfaces as incremental competition, not structural disruption.

Crypto and alternative currencies

  • Strong disagreement: some claim crypto was suppressed because it threatened banks; others say crypto has never been a credible threat and mostly fuels speculation, scams, and some criminal use.
  • Long subthread on value: fiat vs crypto vs gold/diamonds; many note all money rests on shared belief, but government fiat is anchored by tax obligations and legal enforceability.
  • Skeptics highlight volatility, lack of real-world use, and regulatory risk; boosters point to censorship resistance and global, low-friction transfers.

Payments, UX, and “what needs disrupting”

  • Many users say they’re satisfied: banks safely hold money and enable payments; most people lack enough savings for rate differences to matter.
  • Others are frustrated by slow interbank transfers, check holds, business-hour cutoffs, opaque transaction data, and poor tooling for detecting and cancelling fraud or subscriptions.
  • Instant payment systems elsewhere (EU SEPA instant, India UPI, FedNow plans) are contrasted with slower US ACH and card rails; the card networks, with their interchange fees, are widely viewed as a separate, under‑addressed oligopoly.

Global and sectoral angles

  • Examples of more meaningful change:
    • Mobile money and wallets (e.g., M-Pesa, WeChat Pay, AliPay, Brazilian neobanks) reaching unbanked or leapfrogging cards.
    • India’s public payment infrastructure and regulatory “sandboxes” enabling new models.
    • Private credit and securitization shifting large chunks of corporate and real-estate lending off bank balance sheets (disruption on the “fin” more than the “tech” side).

Trust, safety, and failures

  • Recent collapses (SVB, Synapse/BaaS issues, various crypto blow-ups, meme coins) make commenters wary of entrusting core savings to fintechs or crypto platforms.
  • Many explicitly say they want their bank to be boring, stable, and un-“disrupted,” and will only use fintech for small balances or specific conveniences.

Toyota reduces price of new hydrogen car with $15,000 of free fuel

Hydrogen car safety

  • Several commenters with hands-on or driving experience report no unusual safety issues; one compares it to LPG/natural-gas cars, expecting over-engineered high‑pressure systems.
  • Hydrogen is described as generally uneventful to work with, though its flame is hard to see.
  • Some argue “physical safety” is moot because the product is commercially irrational, leading to buyer’s remorse and lawsuits.
  • One link is shared to a hydrogen safety overview, though its details draw no substantive discussion in the thread.

Refueling infrastructure and range

  • California station map is cited: ~65 stations, but only ~33 online with fuel.
  • Many call hydrogen cars “the worst of all worlds”: they need entirely new production and distribution infrastructure while retaining the time and inconvenience of station fueling.
  • Real-world Mirai driver confirms trip planning strictly constrained by station locations.
  • Range anxiety for gas vs EV is debated; broad agreement that EV “running empty” is harder to recover from than ICE.

Driving experience and powertrain

  • Hydrogen fuel-cell cars are effectively EVs with a small battery; the fuel cell charges a ~1.2 kWh drive battery that powers the motors.
  • Refueling time (about 5 minutes) is contrasted with much longer EV fast-charging, but others note most EV owners “charge while parked” and rarely spend active time refueling.

Environmental and technical concerns

  • Water exhaust and black ice: some worry about winter road icing; others note ICE cars already emit substantial water vapor. One person has seen a Mirai drip liquid water.
  • Multiple commenters argue most hydrogen today comes from fossil fuels (methane cracking), so “green” branding is misleading.
  • Hydrogen’s storage, transport difficulty, corrosion, leakage, and low round‑trip efficiency are heavily criticized.
  • Some propose synthetic methane as a more sensible synthetic fuel than hydrogen; others say both are currently uneconomic.

Hydrogen vs BEVs, hybrids, and materials

  • Many see Toyota’s hydrogen push as resistance to full BEVs; others argue not all cars globally can be BEVs due to material limits, though that claim is challenged with counter-links.
  • Debate over lithium/cobalt availability: one side fears long‑term shortages; others point to cobalt‑free chemistries and improving tech.
  • Hybrids and PHEVs are viewed by some as a more practical middle ground; others say a simple BEV from a company good at hybrids would be cheaper and more reliable than hydrogen.

Infrastructure and policy ideas

  • UK commenters discuss blending up to ~20% hydrogen into existing methane gas networks and possibly separating it later, but note issues like embrittlement, safety, and fossil-derived “town gas” history.
  • Some argue hydrogen might make more sense in large vehicles, trucks, or aircraft, mainly for weight and depot-refueling reasons, though even that is questioned.

When AI promises speed but delivers debugging hell

Where AI Coding Helps

  • Widely seen as useful for:
    • Small, well-scoped tasks: scripts, one-off tools, data transforms, shell/PowerShell commands.
    • Boilerplate-heavy work: REST endpoints, auth wiring, config, SQL queries, tests, logging, simple UI scaffolding.
    • Rapid MVPs/CRUD web apps using mainstream stacks (React/TypeScript, Django, etc.).
    • Learning unfamiliar APIs or stacks faster than reading full docs.
  • Often compared to a very fast but junior assistant: effective when the senior dev knows exactly what they want and can specify it precisely.

Where It Fails or Becomes “Debugging Hell”

  • Struggles with:
    • Larger codebases where context exceeds model limits.
    • Complex domains: multithreading, distributed systems, parsers with tricky edge cases, cryptography, niche UI toolkits.
    • Evolving or less-common libraries where it hallucinates APIs.
  • When it’s wrong, it tends to:
    • Loop on the same bad idea, add noisy logging, or introduce new bugs.
    • Remain confidently wrong, making it easy to end up in a messy, hard-to-recover state.

Developer Skill & Workflow Effects

  • Sweet spots:
    • Non-engineers can bootstrap simple SaaS/MVPs much faster than learning from scratch.
    • Senior devs gain big speedups on boilerplate and everyday “small” tasks.
  • Juniors and “in-the-middle” users often flounder: they can’t reliably validate or extend what the model produces.
  • Some advocate letting AI both write and fix its own code via pasted error messages; others report this quickly devolves into error loops.

Tooling, Context, and Language Constraints

  • Tools differ (IDE assistants, CLIs, “agentic” editors), but all hit context and coordination limits.
  • Models work best with:
    • Clear specs, small incremental tasks, mainstream stacks, and supplied docs/code as context.
  • Local, strongly-typed, or niche stacks (embedded, unusual Java UI, custom C dialects) see much weaker results.

Quality, Safety, and Maintainability

  • Typing is rarely the real bottleneck; understanding, design, and verification are.
  • AI-generated code often “looks right” but hides subtle bugs or bad practices.
  • Strong typing and compilers can catch some hallucinations, but security and business-logic errors remain a major concern.
  • Debugging unfamiliar AI code can exceed the time saved by generation.

Philosophy and Hype vs Reality

  • Debate over natural-language programming:
    • Critics cite ambiguity and non-determinism versus traditional, formal, deterministic languages.
    • Supporters see LLMs as a powerful new abstraction layer, akin to past jumps (assemblers, high-level languages).
  • Broad agreement that:
    • Today’s systems are tools, not replacements for competent developers.
    • Hype about fully AI-built production apps and mass developer replacement is far ahead of current reality.

The protester's guide to smartphone security

Cell Networks, Stingrays, and Power Drain

  • Discussion of “stingray” fake base stations: concern about downgrading to 2G and tracking devices; some phones don’t allow disabling 2G.
  • Pixels and some Androids now integrate detection of fake base stations; third‑party apps also exist.
  • Several participants suggest towers or stingrays can force higher transmit power, rapidly draining batteries and degrading coordination; others argue crowds and weak signal already cause similar drain without any special action.
  • Airplane mode is no longer trusted on all devices (e.g., some still keep Wi‑Fi/Bluetooth active).

Offline / P2P Tools and Radios

  • Briar (Android) is repeatedly recommended for P2P, Tor‑based, protest‑oriented messaging; iOS is seen as lacking mature, offline P2P options.
  • Other tools mentioned: Cwtch, Berty (stability questioned), Matrix/Simplex (if self‑hosted), and Meshtastic (LoRa‑based, encrypted text over cheap hardware).
  • Concerns: Bluetooth/Wi‑Fi P2P can still be tracked via MACs, side devices (earbuds, watches), and RF fingerprints.
  • Encrypted walkie‑talkies exist but are expensive and often illegal or restricted on common US bands; some argue using illegal comms invites extra charges and undermines “peaceful protest” framing.

Burner Phones and SIM Anonymity

  • Some argue burners are obsolete: movement patterns, tower logs, and unusual usage patterns can still tie them to you.
  • Others say they still reduce risk if:
    • bought with cash away from cameras,
    • used only at the protest,
    • never powered on near your main phone.
  • Debate over whether strict burner playbooks now make you stand out in modern data sets.
  • SIM registration laws vary by country; in some EU states anonymous prepaid SIMs are still sold.

Legal Access, Biometrics, and Wiping

  • In some jurisdictions (e.g., UK) people can be compelled to reveal PINs; in parts of the US, compelled biometrics are more permissible than compelled passwords.
  • iOS features: erase‑after‑10‑failed‑PIN, requiring eye contact for Face ID, and emergency sequences that disable biometrics. Reports of these sometimes failing in practice.
  • Remote wiping or revoking account access while a device is in police hands may trigger obstruction or evidence‑destruction charges; intent is key but hard to litigate.
  • GrapheneOS duress PINs and full‑disk encryption on niche phones (PinePhone, Purism) are discussed but seen as niche/expensive.

Phones, Cameras, and Documentation

  • Strong faction: best security is leaving your personal phone at home; meet via pre‑arranged points, or use simple cameras or old feature phones.
  • Counterpoint: phones enable livestreaming and rapid off‑device backups, critical for documenting abuses before devices are seized or destroyed.
  • Debate about photographing identifiable protesters:
    • One side: faces should be avoided to prevent doxxing/retaliation.
    • Other side: documenting provocateurs and violent actors is important, and the state is filming anyway, so marginal risk is low.

Limits of Security and Protest Context

  • Broad agreement that you can’t fully evade a determined state: face recognition, movement data, purchases, and social graphs still link people to events.
  • View 1: given rubber‑hose tactics and powerful agencies, many “opsec” steps are theater.
  • View 2: even partial hardening meaningfully reduces dragnet exposure to local police and low‑effort surveillance; tradeoffs should be chosen consciously.
  • Discussion of how legality, morality, and protest tactics diverge: peaceful protests can still be repressed, and “legal vs illegal” can flip with politics. Some reference foreign influence and provocateurs but others demand stronger evidence.

Composable SQL

Functors and Composability Concept

  • Core idea: “functors” are table-valued abstractions that take tables as inputs and return tables, enabling reusable, testable SQL components.
  • Motivation: current SQL encourages copy-paste of complex JOINs / filters; views are global, unparameterized, and not easily testable in isolation.
  • Goal: let the planner inline these abstractions into a single optimized query, unlike many stored procedures.
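One way to picture the idea outside any particular engine: treat a “functor” as a function from table expressions to table expressions, so the database still receives a single flat query it can plan as a whole. A minimal Python/sqlite3 sketch (hypothetical names and schema, not the article’s actual syntax):

```python
import sqlite3

# Each "functor" takes SQL table expressions as strings and returns a new
# table expression, so fragments compose without copy-pasting JOINs.
def active_users(src: str) -> str:
    return f"SELECT * FROM ({src}) WHERE active = 1"

def order_totals(users: str, orders: str) -> str:
    return (f"SELECT u.id, SUM(o.amount) AS total "
            f"FROM ({users}) u JOIN ({orders}) o ON o.user_id = u.id "
            f"GROUP BY u.id")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users(id INTEGER, active INTEGER);
CREATE TABLE orders(user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 1), (2, 0);
INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 99.0);
""")

# Compose: order totals over only active users, inlined into one query.
query = order_totals(active_users("SELECT * FROM users"),
                     "SELECT * FROM orders")
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 15.0)]
```

Because the composition happens before the query reaches the engine, the planner sees one statement, which is the property the article wants from native “functors” (as opposed to opaque stored procedures). Testability follows from the same trick: any fragment can be run against a substituted input table.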

Comparison to Existing SQL Features

  • Many note prior art: table-valued functions, stored procedures, table macros, CTEs, and virtual tables in engines like Postgres, SQL Server, Oracle, DuckDB, SQLite, BigQuery.
  • Key distinction argued: most existing TVFs and stored procs either
    • can’t take tables as arguments,
    • are optimization fences / cursor-based and hurt performance, or
    • live as heavyweight schema objects, not lightweight composable fragments.
  • DuckDB’s table macros are highlighted as very close, though some limitations around nesting and query_table are discussed.

Testing SQL and Where Business Logic Belongs

  • One camp: SQL is bad for business logic; treat SQL as an output of a real language, use ORMs / query builders and unit-test the host code instead.
  • Others: SQL unit testing is possible (e.g., pgTAP, tSQLt, dbt), and functors/macros can help make queries testable by eliminating global tables.
  • Concern: embedding functors in schema mixes data and logic, complicating maintenance.

Constraints and Foreign Keys Debate

  • Strong disagreement around whether FKs are “business logic” that should be in the DB.
  • Pro-FK side: constraints are essential defense-in-depth; they prevent invalid states and catch bugs from multiple clients.
  • Anti-FK / skeptical side: FKs entrench volatile business rules, make schema evolution hard, and couple all consumers tightly to internal representation.

Optimizer, CTEs, and Engine Behavior

  • Original article criticizes CTEs and views as optimization barriers; several replies note modern Postgres and “enterprise” engines can inline and optimize aggressively.
  • Some argue relying on “sufficiently smart” optimizers is risky; others note MSSQL/Oracle show that good inlining and plan caching are feasible.

Alternative Languages and Frameworks

  • Multiple alternatives mentioned: LINQ, Kusto, Malloy, PRQL, Trilogy, Ecto, SQLAlchemy, HoneySQL, and various query builders / DSLs.
  • View: composable query builders in host languages already provide “functor-like” composition with better tooling and testing, though none are seen as a perfect SQL replacement.

YC Graveyard: 821 inactive Y Combinator startups

Perceived usefulness vs startup success

  • Commenters note that “sounding useful” doesn’t predict success; many “obviously useful” products fail.
  • Stripe and Airbnb are used as examples: both entered markets that already had solutions (payments, accommodation) and didn’t sound obviously huge to many early observers.
  • Airbnb’s eventual success is tied to later emergence of commercial/“unlicensed hotel” hosts, which founders likely didn’t fully foresee.
  • Some argue that products that don’t sound useful may face less competition and become winner‑take‑all if a new market appears.

YC failure rate, “inactive” definition, and zombies

  • Using ~4,000 YC startups and 821 marked “inactive” on YC’s own site gives ~20%, but many think this understates failures.
  • Reasons: companies rarely announce shutdowns; some are zombies (small, long‑running, not growing); recent cohorts haven’t had time to fail.
  • Acquihires and low‑value exits often look like wins but can still be financial losses for YC.
  • The graveyard’s methodology is to filter YC’s own company list by “Inactive,” so the count is a lower bound and misses some known dead startups.
  • Several users point out errors: active or acquired companies misclassified; some failed ones missing.
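The ~20% figure is simple division on the thread’s rough numbers (neither input is official YC data, and per the points above it is best read as a lower bound):

```python
# Back-of-envelope check of the thread's ~20% "inactive" rate.
total_funded = 4000      # approximate YC startup count cited in the thread
marked_inactive = 821    # companies filtered as "Inactive" on YC's site
rate = marked_inactive / total_funded
print(f"{rate:.1%}")  # 20.5%
```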

Founder outcomes and founder pay

  • Post‑failure paths include: regular jobs, joining later‑stage startups, big tech roles, or continuing as high‑paid founders of “zombie” companies.
  • Debate over founders paying themselves $200k+ at pre‑product‑market‑fit startups:
    • Some see it as irresponsible or misaligned with “ramen profitability” ideals.
    • Others argue high cost of living, debt, and health care justify it; investors accept these terms in competitive markets.

YC, incentives, and capital structure

  • Some see YC as the best of a generally weak incubator landscape, with strong brand and founder funnel.
  • Others criticize a shift toward hype, AI‑heavy, SF‑centric, young‑founder cohorts and a more predatory, self‑interested stance.
  • Discussion centers on SAFEs, preferred shares, liquidation preferences:
    • Investors seek downside protection; employees and common shareholders often bear more risk.
    • Tension between YC’s “founder‑friendly” narrative and standard investor protections is highlighted.

Lifestyle businesses, zombies, and investor preferences

  • Distinction made between:
    • Lifestyle/small businesses that earn healthy, steady profits but don’t scale.
    • Zombies that barely sustain founders and staff, can’t grow, and slowly burn remaining capital.
  • VCs prefer big wins or clear failures they can write off; steady but modest outcomes can trap capital for years.
  • Some argue traditional “small business” success is undervalued in tech culture despite being a rational goal for founders.

Ask HN: Anyone else find LLM related posts causing them to lose interest in HN

Perceived Saturation and Fatigue

  • Many feel LLM/AI content is overwhelming on HN and across the web, crowding out “old-school” tech, niche projects, and diverse disciplines.
  • Complaints that posts repeat the same few themes: productivity hacks, thin SaaS wrappers over APIs, imminent AGI, and exaggerated claims.
  • Some see the discourse as grifty or pseudo‑religious, with output quality, hallucinations, and data/ownership issues hand‑waved away.

Comparisons to Past Tech Hype Cycles

  • LLM hype is compared to crypto/NFTs, blockchain‑for‑everything, prior AI waves, JavaScript framework explosions, social media, mobile, and cloud.
  • Some argue this is just another bubble that will burst; others think LLMs differ in scale and staying power.
  • A subset notes every cycle once felt “all‑encompassing” and eventually receded from the front page.

Views on Practical Usefulness of LLMs

  • Enthusiasts: LLMs are “one of the best hacker tools,” boosting coding productivity, explaining RFCs/papers, helping scientists, and widely adopted in workplaces (e.g., Copilot, agents).
  • Skeptics: gains are modest (e.g., minor productivity boost, good for boilerplate, bash scripts, simple frontend), with serious failures on complex, niche, or high‑stakes tasks.
  • Strong disagreement over whether current models “think” in any meaningful way; some dismiss AGI talk as hype, others see frontier models as brain‑like.

Impact on HN Culture and Discussion Quality

  • Perception that LLMs plus politics now dominate, lowering signal‑to‑noise and making HN feel more like Reddit.
  • Frustration that AI comments appear under unrelated posts and that nuanced or non‑AI discussions get crowded out or flagged.
  • Others enjoy HN’s AI coverage specifically because it’s deeper than most venues.

Economic & Hype Dynamics

  • Some cite huge valuations and GPU stock surges as evidence LLMs are here for decades.
  • Others counter that crypto also has massive market cap yet faded from HN; valuations are seen by some as proof of a bubble, not intrinsic value.
  • Observations that managers and VCs are especially susceptible to being wowed by demos and “AI features.”

Broader Concerns and Coping Strategies

  • Concerns about impacts on critical thinking, workers, environment, UI design (misused “conversational UIs”), and hiring expectations.
  • Some users filter AI topics, build custom HN frontends, switch to sites like Lobsters, rely on RSS, or take deliberate breaks.
  • Long‑time readers counsel that trends come and go; skipping threads and waiting out the cycle is a viable strategy.

Explainer: What's r1 and everything else?

Creative writing, model size, and distillation

  • Several commenters focus on creative writing quality rather than math/coding.
  • DeepSeek-R1’s samples on a creative-writing benchmark are widely praised as unusually strong, with few “LLM quirks.”
  • People ask whether you can distill a huge “thinking” model like R1 into a small (e.g., 7B) model focused on writing and stripped of math/code.
  • Responses: you can bias/optimize toward writing via distillation, but you likely can’t “remove” math/code because reasoning skills are shared across domains.
  • One oddity noted: many different models independently name the main character “Rhys” for the same prompt; the reason is unclear.

Reasoning, RL, and what R1 actually did

  • R1 is framed as showing that relatively simple reinforcement learning (RL) can drive large “reasoning” gains, versus more complex schemes like DPO or MCTS.
  • Others clarify that R1 combines RL with supervised fine-tuning on curated “correct” answers; later experiments suggest even the SFT part might be optional.
  • Multiple perspectives on “reasoning”:
    • Pro: models like R1/o1/Gemini “think step by step” and achieve much better math/logic scores, so they are reasoning in a practical sense.
    • Con: they are still just predicting tokens; chains-of-thought are learned patterns, not explicit logical inference, and may not match their internal decision process.

Benchmarks and ARC-AGI

  • The article’s claim that “crushing ARC-AGI means doing what humans do” is called a misinterpretation.
  • The benchmark’s creator is quoted as saying: passing shows non-zero fluid intelligence and ability to handle unfamiliar problems, but says little about how close to human intelligence the system is.
  • Commenters warn that misreading benchmarks is a common route to overclaiming “human-level” AI.

Exponential progress and self-improvement

  • The article’s flourish that AI abilities will grow “exponentially” draws substantial pushback.
  • Skeptical views:
    • Tech progress typically follows an S-curve; LLM gains already seem to be slowing compared to early GPT jumps.
    • Existing data mostly shows exponential cost/compute growth, not clearly exponential capabilities.
    • Some argue “exponential” is used loosely rather than in a strict mathematical sense.
  • More optimistic views:
    • Multiple new scaling paths (RL, data synthesis, better training) could accelerate progress beyond simple parameter scaling.
    • Once AI substantially contributes to AI research and engineering, a self-improvement loop could yield exponential gains, at least for a while.
    • Even without perfect exponential curves, near-term AGI is seen by some as plausible and societally transformative.

Open source, geopolitics, and competition

  • R1 is seen as a major open-source milestone, comparable in capability to top proprietary “reasoning” models, and valuable especially outside the US.
  • Some frame AI as part of a broader geopolitical “tech war” between the US, China, EU, Russia.
  • Others argue de-escalation would benefit everyone, and that for Europe in particular, having multiple strong global suppliers (including open-source Chinese models) is an advantage if it remains more a consumer than a producer.
  • There is anxiety about AI control concentrating in a few powerful private actors, and corresponding support for strong open(-ish) models to counterbalance them.

Hype, skepticism, and misc. points

  • Some commenters dismiss the R1 moment as incremental “hype” akin to a minor software patch; others counter that predictions of rapid progress from a few years ago have largely held up.
  • Clarifications:
    • R1’s oft-quoted low training-cost figure is questioned; commenters note the paper doesn’t state that number and the source is unclear.
    • Claims that AI is already “self-improving” are debated: current systems can help design better systems, but humans still appear to be the main driver and many bottlenecks (compute, energy, infrastructure) are external.
  • Several participants wish for a stable, evolving “ELI5” guide to LLM concepts and acronyms, reflecting how hard it is to keep up with the pace of change.

AI slop, suspicion, and writing back

What “AI slop” is and why people care

  • Many define AI slop as low‑effort, mostly AI‑generated content pushed into human spaces without disclosure.
  • Objections are less about raw quality and more about insincerity, plagiarism-by-proxy, and the imbalance of effort between writer and reader.
  • Some argue that even “high‑quality” AI writing is problematic if it displaces genuine human expression and learning.

Human vs AI slop

  • One side says bad writing is bad regardless of source; readers should judge content, not provenance.
  • Others say human “slop” is usually easier to spot and bounded in volume, whereas AI slop is scalable and attention‑DoS‑like.
  • Several emphasize “vibes”: even flawed human writing carries effort, individuality, and social meaning that AI text lacks.

Detection, heuristics, and false positives

  • Commenters ridicule weak tells like em‑dashes or smart quotes.
  • Simple detectors and heuristics are shown to misclassify both Wikipedia prose and synthetic datasets.
  • Many worry about false positives: academic penalties, account bans, or reputational damage for humans misidentified as bots.
  • Others say in purely personal filtering, they’re fine with aggressive blocking, even if real humans get filtered out.

Non‑native speakers and translation

  • Some find non‑native “errors” charming and more meaningful than polished LLM corporate‑speak.
  • Others, especially non‑native writers, want grammatically correct output and see AI as a useful helper.
  • There is strong pushback against undisclosed LLM‑mediated communication and automatic translation, especially where nuance and domain details matter.

Authorship, art, and ethics

  • Many insist authorship and intentionality matter even if AI can match or exceed human quality.
  • Others say that in principle, if an AI novel were as good as a classic, only quality should matter.
  • Several draw analogies to supporting local shops over Walmart: refusing AI art can be a deliberate choice to sustain human creators.

Writing “for AI” and data poisoning

  • Some promote writing to influence future LLMs; others deride this as capitulating to exploitative training practices.
  • A few experiment with planting absurd, obviously false biographies to see if they get absorbed into models.
  • Another camp prefers “poisoning the well” of training data over trying to hide content behind walled gardens.

Practical use of LLMs

  • Many use LLMs as editors, translators, or structure‑generators, then heavily revise.
  • There is broad condemnation of unedited copy‑paste into public or professional contexts.

Emerging reasoning with reinforcement learning

What the paper/post is doing

  • Describes reproducing DeepSeek R1–style “reasoning via RL” on a small (~7B) math model with ~8k problems.
  • Uses simple reinforcement learning: reward only on final correctness, no explicit supervision of intermediate steps.
  • Result: chain-of-thought (CoT)–style reasoning “emerges” in a model that previously did not show it, but overall capability is still limited by small size.

Chain-of-thought and why it helps

  • CoT = having the model “think out loud” step by step before answering.
  • Previously taught mainly via supervised fine-tuning on hand-written reasoning traces, which are expensive.
  • Discussion of why CoT works:
    • More tokens = more computation time.
    • Breaking problems into smaller substeps makes search in solution space easier.
    • RL can nudge models to take longer, more cautious paths when harder tasks require many small corrective steps.
  • Some argue this is best seen as iterative search through latent space rather than “human-like reasoning.”
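The “reward only on final correctness” loop can be caricatured with a tiny REINFORCE bandit (a toy under invented assumptions, not the paper’s setup): a policy chooses between a short and a long reasoning chain, the reward depends only on the outcome, and probability mass drifts toward the longer, more reliable chain:

```python
import math
import random

random.seed(0)
logit = 0.0  # preference for the long chain over the short one
lr = 0.5

def p_long(logit):
    # Probability of choosing the long chain (sigmoid of the preference).
    return 1.0 / (1.0 + math.exp(-logit))

for _ in range(2000):
    p = p_long(logit)
    use_long = random.random() < p
    # Outcome-only reward: assumed success rates, no step supervision.
    success = random.random() < (0.9 if use_long else 0.4)
    reward = 1.0 if success else 0.0
    # REINFORCE update: reward * grad of log-probability of the action taken
    # (baseline omitted for brevity).
    grad = (1.0 - p) if use_long else -p
    logit += lr * reward * grad

print(round(p_long(logit), 2))  # close to 1.0: long chains dominate
```

The same logic at scale is what the thread describes: no one rewards individual steps, but strategies that spend more tokens on careful intermediate work win more reward, so they come to dominate the policy.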

RL vs SFT vs distillation

  • DeepSeek’s own paper emphasizes distilling reasoning patterns from a large RL-trained model into smaller ones via SFT; they did not RL-train the small distilled models.
  • Debate:
    • One side: distillation from a stronger reasoning model beats doing RL directly on small models.
    • Other side: this work shows small models can learn CoT via RL alone; question becomes how small, how cheap, and on what tasks.
  • Concerns: RL is compute-heavy compared to SFT at equal data; RL-tuned models can become “stubborn” and ignore prompts outside their reward-shaped niche.

Does this show “real reasoning”?

  • Enthusiasts: emergent CoT and self-correction on hard math are strong evidence of genuine reasoning, undermining the “stochastic parrots / mere regurgitation” view.
  • Skeptics:
    • Argue it’s still token-level pattern generation, akin to structured search or calculators plus fuzzy lookup.
    • Note lack of embodiment, motivation, episodic memory, and continuous online learning.
    • Emphasize that words like “reason,” “emergent,” “intelligent” are being stretched; much of the debate is about definitions.

Broader implications and open questions

  • If RL can cheaply boost reasoning on any capable base model, “reasoning models” may become a commodity, eroding proprietary moats.
  • Potential to apply similar RL setups to non-math domains (code, science, finance, medicine), but this is speculative in the thread.
  • Open questions raised:
    • How to design reward functions that reliably elicit desired emergent behaviors.
    • Whether similar methods power proprietary models (o1/o3, Gemini “thinking” variants) — currently unclear.
    • How far this moves systems toward general-purpose reasoning and whether AGI is on the visible trendline.