Stories - Page 303 | HN Distilled

2025-09-17

DeepMind and OpenAI win gold at ICPC

Overall Reaction to the ICPC Performance

Many see DeepMind/OpenAI’s ICPC gold-level results (plus previous IMO/IOI wins) as a major milestone, showing that current models can now solve problems that once required top competitive programmers.
Others frame the community skepticism (“wall,” “bubble,” “winter”) as a reaction to hype cycles, limited practical payoff so far, and opaque methodology rather than to the raw capability itself.

Structured Contests vs Real-World Software

Repeated theme: ICPC/IMO/IOI problems are highly structured, well-specified, self-contained puzzles; success there does not imply competence on messy, ambiguous real-world tasks.
Several commenters report that the same models that ace contests still struggle badly with legacy codebases, fragile test suites, and multi-file context—e.g., “fixing” tests by deleting them or duplicating methods.
Competitive programming is compared to chess/Go: impressive, but historically such breakthroughs haven’t directly translated to broad AI utility.

Compute, Cost, and Fairness of Comparison

Concern that these results rely on extreme compute: many parallel instances, long “thinking” times, and possibly expensive reasoning models acting as selectors.
Some question whether this is more like brute-force search plus pattern-matching than human-like insight, and whether the energy and hardware requirements are comparable to a human team’s or remotely scalable.
Others argue what matters is wall-clock time and (eventually) cost; if an AI system can beat top teams in 5 hours, how it’s internally parallelized is largely irrelevant.

Reproducibility, Prompting, and Accessibility

Multiple users tried giving ICPC problems to GPT‑5 and got failures or empty “placeholder” code, highlighting a gap between lab demos and consumer experience.
Discussion of routing between “thinking” and non-thinking variants, and the need for elaborate scaffolding, multi-step prompting, and solution selection to reach top performance.
This raises the “shoelace fallacy”: if you need expert-level prompting to get “PhD-level” results, non-experts will understandably conclude the models are weak or stagnating.
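
The "many candidates plus a selector" pattern that commenters describe can be sketched roughly as follows. This only illustrates the general idea; it is not how DeepMind or OpenAI built their systems. The candidate functions and the pass-count scoring rule are hypothetical stand-ins for model-generated programs and for whatever selector a lab actually uses.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Callable[[int], int]  # hypothetical: a "solution" mapping input to output


def passed_count(candidate: Candidate, samples: Sequence[Tuple[int, int]]) -> int:
    """Count how many (input, expected) sample cases the candidate gets right."""
    passed = 0
    for given, expected in samples:
        try:
            if candidate(given) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply fails that sample.
            pass
    return passed


def select_best(candidates: List[Candidate], samples: Sequence[Tuple[int, int]]) -> Candidate:
    """Pick the candidate passing the most samples (the first one wins ties)."""
    return max(candidates, key=lambda c: passed_count(c, samples))


# Toy task: square the input. Three stand-in "generated" candidates.
candidates = [lambda x: x + x, lambda x: x * x, lambda x: 1 // 0]
samples = [(2, 4), (3, 9), (5, 25)]
best = select_best(candidates, samples)
```

Under this scheme, quality depends on both how many candidates are sampled and how well the selector discriminates, which is why commenters tie top results to compute budgets and scaffolding.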

Training Data, Memorization, and Benchmarks

Some see contest success as largely due to training on massive archives of LeetCode/Codeforces-like material—“database with fuzzy lookup” rather than deep reasoning.
Others counter that top human contestants also heavily internalize patterns and “bags of tricks,” so dismissing models as mere look-up engines undersells the achievement.
Debate over whether ICPC or IOI problems are harder, and what medal equivalences imply, but consensus that ICPC World Finals problems are genuinely difficult.

Bubble, Scaling Limits, and Infrastructure

Several commenters point to delayed flagship models, modest benchmark gains vs cost (e.g., ~10% over previous reasoning models), and deferred releases (DeepSeek, Mistral) as reasons to suspect either a “bubble” or at least diminishing returns at current scales.
Others focus on physical constraints: data centers demanding town-scale water and decade-scale grid upgrades, suggesting a looming wall in energy and infrastructure even if algorithms keep scaling.

Trust, Data, and Pushback Against AI Firms

Strong undercurrent of distrust toward large AI companies: training on copyrighted material without consent or compensation, centralization of power, and aggressive monetization.
Some advocate “poisoning” web content or withholding knowledge to resist free extraction of human expertise for models that may later undercut those same workers.
Counter-voices argue that sharing knowledge has historically not always been transactional and that analogies to piracy/copyright are being stretched.

Future Impact and Interpretation

One camp emphasizes that, regardless of caveats, we now have systems that can solve problems previously reserved for the top ~1% of algorithmic programmers; as costs fall, this will likely commoditize that capability across domains.
Another camp stresses that no “killer app” has yet emerged; contest wins are notable but still feel orthogonal to many hard open problems (e.g., robust real-world agents, profound new scientific discoveries).
Overall, the thread oscillates between “this is quietly revolutionary” and “impressive but over-marketed, with unclear real-world payoff and heavy hidden costs.”


2025-09-17

Anthropic irks White House with limits on models’ use

Perception of Anthropic’s stance

Many commenters view Anthropic’s refusal to allow domestic surveillance uses as positive and unusually principled, especially compared with other tech firms’ compliance with government demands.
Others are skeptical, seeing it as either a temporary stance that will fold under pressure or simply a negotiating tactic that will vanish when the price is right.
Some note that Anthropic’s security clearances for classified use may derive precisely from its focus on safety and constraints.

Government power and political framing

A substantial subthread debates whether the current US government is effectively dictatorial, with some claiming all three branches are aligned to enable authoritarian behavior and others dismissing this as semantic or exaggerated.
Several people predict that in the current climate a company that refuses the federal government will face retaliation (soft blacklisting, pressure on suppliers, lost contracts).

SaaS, local-first, and usage restrictions

Anthropic’s control over use via SaaS prompts renewed calls for “local-first” software and on-prem models to avoid remote monitoring and bans.
Others point out that on-prem software also comes with EULAs containing usage limits; enforcement is just weaker than with SaaS.

Contracts, ToS, and legal nuance

Multiple commenters say the article’s claim that agencies might be “surprised” by restrictions is wrong: government contract teams typically scrutinize terms in detail.
Discussion covers contracts that incorporate mutable ToS by reference, notification of ToS changes, and differences between US and Swedish approaches to what constitutes a valid contract.
Examples from Java, Apple iTunes, and JSLint illustrate that “not for nuclear/weapon/safety use” clauses and ethical use restrictions are long-standing.

Critique of the Semafor article

Several see the piece as a hit job: it misstates how common use restrictions are, downplays safety concerns, and frames “we can’t use it for surveillance” as an unreasonable burden.
The portrayal of OpenAI’s “unauthorized monitoring” language as a clear carve‑out for law enforcement is mocked as tendentious and logically ambiguous.

Government use of AI and control

Commenters debate whether agencies should be sending sensitive prompts to external APIs versus running models internally, and worry about any private vendor having enough visibility to enforce usage rules.
Reference is made to FedRAMP and specialized government cloud regions as the current compromise.
Some argue the government could and should train its own unrestricted models if it wants full control, rather than demanding vendors loosen safeguards.

Free market, ethics, and surveillance

There is tension between “realist” views that companies must comply or be punished and moral arguments that refusing surveillance work is desirable even if it hurts business.
A few wish all major AI providers would collectively refuse defense/police/military or surveillance use, while others doubt this is feasible in today’s political and economic environment.


2025-09-17

DeepSeek writes less secure code for groups China disfavors?

Plausibility of emergent political bias in code

Several commenters think it’s technically plausible: if a model is tuned to be strongly “pro-China” or to follow CCP narratives, that stance can bleed into unrelated tasks, including coding.
Others note humans routinely conflate “morally bad” with “practically bad”; LLMs trained on such discourse may similarly associate disfavored groups with lower quality or more negative behaviors.
Some suggest testing whether degraded output is specific to code or also appears in text responses on topics like Tiananmen, Xinjiang, Hong Kong, etc.

Methodology gaps and skepticism about the article

Many criticize the Washington Post piece and CrowdStrike for:
- No prompts, no methodology, no code samples, no definition of “less secure.”
- No comparison against other models under identical tests.
This is seen as classic “AI FUD” and/or geopolitical propaganda, especially given CrowdStrike’s and WaPo’s perceived histories.
Several argue that without a public report or paper, the claims deserve low confidence.

Replication attempts and preliminary observations

Multiple users tested DeepSeek via web UIs:
- Prompts mentioning Falun Gong often triggered refusals, while nearly identical prompts for Mormon or Catholic groups were answered normally.
- This reproduces the refusal aspect of the article, but not yet the “less secure code” claim.
One user’s toy crypto test: same prompt for “Taiwan government” and “Australian government” produced two weak schemes, with Australia’s clearly stronger. Both came with warnings not to use custom crypto.
There is confusion over whether testers used the official chat site, third‑party frontends, or the bare model via API, and over how much of the behavior comes from front-end guardrails versus the base model.

Alternative explanations: censorship, data bias, alignment artifacts

Some argue this could arise unintentionally:
- Training data heavily featuring sanctions/rejections of certain entities (e.g., Iran, Falun Gong) may generalize into broader rejection or degraded help.
- Chinese models are mandated to enforce ideological red lines; fine-tuning for censorship can have off‑target effects elsewhere.
Others point to research showing that fine-tuning on insecure code can shift models toward more unethical behavior, suggesting subtle training shifts can have surprising side effects.
A few emphasize that simply adding irrelevant group labels to the prompt can change performance (“context confusion” effects like “cat facts” or “Eagles fan” jailbreaks).

Comparisons with Western models and safety norms

Commenters note Western models already refuse help to groups like ISIS or Hamas; Chinese models refusing help on Falun Gong is seen as analogous censorship.
Many insist the “proper” safety behavior is:
- Either reject the request outright for all disallowed groups, or
- Provide equal-quality help without discrimination—not silently degrade quality.
Some speculate similar geo‑ or ideology‑based biases may already exist in US models, but this is untested in the thread.

Broader themes: propaganda, trust, and experimentation

Strong views that the story may be part of a broader anti‑China narrative and potential push to ban Chinese LLMs from US markets.
Others lament a “post‑truth” environment: declining trust in media and experts, but also widespread knee‑jerk dismissal without attempting replication.
A few propose more rigorous community experiments:
- Fixed prompts across multiple groups (CCP-disfavored, neutral, pro‑China, etc.).
- Use static analysis/security tools or independent LLM “judges” to score vulnerabilities.
- Run across multiple models (Chinese and Western) with transparent reporting.
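
A minimal sketch of such an experiment harness, assuming placeholder model names, placeholder group labels, and stubbed generation and scoring functions (none of these come from the thread; a real run would call actual model APIs and a real static analyzer):

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Tuple

# All names below are placeholders, not real model IDs or real test groups.
MODELS = ["model_a", "model_b"]
GROUPS = {"disfavored": "Group X", "neutral": "Group Y", "favored": "Group Z"}
TASK_TEMPLATE = "Write a login handler for a website run by {group}."


def build_matrix() -> List[Tuple[str, str, str]]:
    """Every (model, category, prompt) combination; prompts differ only in the group."""
    return [
        (model, category, TASK_TEMPLATE.format(group=name))
        for model, (category, name) in product(MODELS, GROUPS.items())
    ]


def mean_score_by_cell(
    matrix: List[Tuple[str, str, str]],
    generate: Callable[[str, str], str],
    score: Callable[[str], float],
) -> Dict[Tuple[str, str], float]:
    """Average vulnerability score per (model, category) cell."""
    cells: Dict[Tuple[str, str], List[float]] = {}
    for model, category, prompt in matrix:
        code = generate(model, prompt)
        cells.setdefault((model, category), []).append(score(code))
    return {cell: mean(values) for cell, values in cells.items()}


# Stubs so the sketch runs offline; swap in real API calls and a real analyzer.
def fake_generate(model: str, prompt: str) -> str:
    return f"# {model}: {prompt}"


def fake_score(code: str) -> float:
    return 0.0


matrix = build_matrix()
report = mean_score_by_cell(matrix, fake_generate, fake_score)
```

Because model output is sampled, each cell would need many repeated trials before any difference between groups could be called more than noise.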
Overall sentiment: the refusal behavior is unsurprising and replicable; the “less secure code for disfavored groups” claim remains unproven and methodologically opaque, but technically possible.


2025-09-17

Not Buying American Anymore

Scope: “Don’t buy American” vs “Don’t buy anti‑consumer”

Many commenters argue the post conflates “American” with “anti‑consumer,” even though similar practices exist in Japan, Korea, Sweden, etc.
Several interpret the core message as “don’t support oligarchic, anti‑consumer systems,” not literally “never buy US-made things.”
The author in the thread clarifies the target is the US regulatory/political environment that rewards bad behavior, not every US company individually.

Global nature of anti‑consumer practices

Examples from non‑US firms: Samsung throttling devices, Japanese printer vendors blocking third‑party ink, a Swedish DAW with restrictive licensing, BMW “renting” software features.
This weakens the argument that US culture uniquely produced these practices, but some insist the US still sets the global tone because it’s the largest and most influential market.

Responsibility: corporations, governments, and voters

One camp blames corporations for profit‑seeking and governments (especially US) for gutting regulators and enabling “enshittification.”
Others insist citizens share responsibility: they elect leaders, don’t stay civically engaged, and often tolerate or even reward anti‑consumer behavior.
Counterpoint: voters often face only “anti‑consumer jerk #1 vs jerk #2,” limiting meaningful democratic choice.

Feasibility and logic of a personal boycott

Skeptics call the boycott illogical or symbolic: global supply chains blur what “American” means, and there are few realistic non‑US alternatives for many tech products.
Supporters frame it as a signal, not perfectionism: reduce support for the largest offending market to create pressure and send a message, even if one still buys some problematic products.
Critics highlight perceived inconsistency (e.g., still buying from a non‑US company that behaves badly) and label it virtue signaling; supporters reply that trying to reduce harm is better than doing nothing.

Consumer protection and political context in the US

Commenters note that the US once had a stronger pro‑consumer movement and agencies (FTC, CFPB, etc.), but their power has been eroded by corporate influence and partisan politics.
There is debate over how pro‑consumer recent administrations actually were and whether either major party meaningfully defends regulators.

Role of influencers and tone

The author cites a prominent right‑to‑repair YouTuber as inspiration; some praise his awareness‑raising, others accuse him of sensationalism or hypocrisy.
Reactions to the post range from “measured and important” to “evidence‑light rant,” with some focusing on logical gaps more than on the underlying concern about creeping anti‑consumer norms.


2025-09-17

How to motivate yourself to do a thing you don't want to do

Why do things you don’t “want” to do?

Several commenters distinguish current feelings from “ultimate” or future preferences: you may not want to exercise or do taxes now, but you want the future outcome (health, avoiding legal trouble, being able to eat).
Some argue that if you ever do it, then on some level you do want it; others point to clear cases (taxes, boring jobs) where it’s obligation, not desire.
There’s debate over whether procrastination reflects personal weakness or a deeper ambivalence or environmental issue.

Framing goals: avoidance vs aspiration

Framing goals positively (“be strong and light”) is seen as more motivating than avoidance framing (“not weak and overweight”).
Focusing on consequences of not doing the task can help some; others say this just triggers anxiety or daydreaming.

Motivation, discipline, habits, and environment

A strong camp says “motivation is unreliable; action and discipline must come first,” often via tiny steps, time-boxing, or “just start” tactics.
Others emphasize habit formation: make tasks automatic (like brushing teeth), reduce friction (gear ready, do it first thing in the morning), and integrate effort into daily life (active commuting, sports with kids).
Environment tweaks (removing distractions, blocking apps, cleaning the desk) help some but are not sufficient alone.

Rewards, “dopamine stacking,” and enjoyment

The article’s suggestion to pair unpleasant tasks with entertainment (music, shows) is criticized by some as “dopamine stacking” that could raise your baseline and reduce intrinsic motivation.
Others push back: listening to music while working or exercising is framed as normal distraction or focus aid, not pathological.
There’s disagreement over using food rewards (e.g., donuts after workouts), with a long tangent on whether exercise can “offset” high-calorie foods and whether fitness vs weight loss should be the primary aim.

ADHD and neurodiversity

Multiple participants with ADHD say standard motivation tips rarely work; their problem is executive dysfunction, not lack of desire.
Analogies like “you’d do it for $100M” are criticized as ableist and unrealistic; exceptional incentives don’t generalize to daily life.
Advice: treat neurotypical productivity advice skeptically, consider medical/psychological help, and recognize energy limits.

Concrete strategies and workarounds

Common tactics:
- Break tasks into very small, “crappy first pass” chunks.
- Use structured procrastination: do task A to avoid even worse task B.
- Enlist social pressure (buddies, public commitments, events).
- Allow yourself to “do nothing but the task” (or literally nothing) until boredom makes the task preferable.
Some suggest simply not doing certain tasks and accepting consequences, or re-examining whether they align with one’s real values.

Skepticism and meta-discussion

Some dismiss generic self-help as interchangeable with AI-generated advice and recommend seeing professionals for persistent issues.
There’s criticism of long personal anecdotes in blog posts and of online “motivation” creators who must constantly produce borderline pop-science content.


2025-09-17

YouTube addresses lower view counts which seem to be caused by ad blockers

What changed in view counts

Many creators report sharp drops in desktop view counts on a specific date, while ad revenue stayed flat and mobile views were unchanged.
A widely cited GitHub issue indicates EasyPrivacy added a YouTube metrics endpoint (/api/stats/...) to its tracking blocklist; that endpoint is used to attribute views, so adblocked plays now often don’t increment the public counter.
YouTube’s official note says ad blockers and “content blocking tools” can affect reported views, especially for channels whose audiences use them heavily.
Several commenters are surprised YouTube relies on client‑side calls for public view counts instead of purely backend logging, calling it fragile and easily broken.
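
To illustrate why a blocked client-side beacon lowers the public counter without affecting playback, here is a simplified simulation. The URLs are hypothetical, the only detail taken from the thread is the `/api/stats/` path prefix, and real filter lists use a much richer rule syntax than a prefix check.

```python
from urllib.parse import urlparse

# Path prefix mentioned in the thread; everything else here is illustrative.
STATS_PATH_PREFIX = "/api/stats/"


def is_stats_beacon(url: str) -> bool:
    """True if the request targets the (simplified) stats endpoint."""
    return urlparse(url).path.startswith(STATS_PATH_PREFIX)


def simulate_play(blocker_enabled: bool) -> dict:
    """One playback: a media request plus a stats beacon (hypothetical URLs)."""
    requests = [
        "https://video.example/videoplayback?id=abc",
        "https://video.example/api/stats/beacon?id=abc",
    ]
    # A tracking blocklist drops the beacon but lets the media request through.
    delivered = [u for u in requests if not (blocker_enabled and is_stats_beacon(u))]
    return {
        "video_played": any("videoplayback" in u for u in delivered),
        "view_counted": any(is_stats_beacon(u) for u in delivered),
    }
```

With the blocker enabled, `simulate_play(True)` reports the video as played but the view as uncounted, which matches the pattern creators describe.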

Effects on creators and revenue

Creators say YouTube ad revenue hasn’t dropped in line with views, implying the missing views were from users who were never monetized anyway (adblock users).
However, lower public view counts hurt:
- Negotiating power and pricing for in‑video sponsorships.
- Channel growth and recommendations, if the algorithm heavily weights views.
Tech‑oriented channels (with high adblock usage) appear hardest‑hit; some worry this systematically disadvantages more technical or “FOSS‑y” audiences.
There’s concern that views from Premium users with adblockers may be undercounted as well, potentially reducing payout from subscriptions.

Ad blockers, tracking, and ethics

One camp: blocking both ads and tracking (including view metrics) is exactly what privacy lists promise; if creators lose views, that’s a platform or business‑model problem, not the user’s.
Another camp: viewers who block everything but keep using the service are “leeching”; the “moral” options are to pay (e.g., Premium) or stop using YouTube.
Counter‑argument: the modern ad ecosystem is scam‑ and malware‑ridden; adblocking is basic self‑defense. Users are entitled to control what runs on their machines and what data is sent.

YouTube’s incentives and suspected strategy

Some suspect YouTube is happy to let this play out because:
- It turns creators against adblock users without YouTube directly attacking them.
- Undercounted views devalue off‑platform sponsorships (from which YouTube earns nothing) relative to YouTube’s own ad products.
Others think it’s more likely an uncoordinated mess: anti‑tracking lists shifted, internal teams didn’t realize, and YouTube’s creator comms are characteristically vague and late.

Recommendation quality and user behavior

Many report watching less YouTube due to:
- Aggressive pre‑rolls and anti‑adblock popups.
- Poor recommendations, AI‑generated “slop,” ragebait, and Shorts.
Others say recommendations are excellent if you rigorously avoid low‑quality content and use “don’t recommend” tools.
Some users report replacing many “how‑to” videos with LLM answers and using alternative clients (NewPipe, Freetube, SmartTube, patched apps) to escape ads and Shorts.

Technical debates about counting views

Server‑side counting via CDNs and segmented streams is seen as non‑trivial (buffering, skipping, bots, shared IPs), which partly explains client‑side view APIs.
Critics respond that if YouTube can track watch history and Premium usage, it could design a more robust, less blockable view metric—if it wanted to.


2025-09-17

Firefox 143 for Android to introduce DoH

Why browser-level DoH on Android?

Many argue the main reason is privacy from the OS vendor (Android = Google). Users may prefer to trust a browser over the OS stack.
Browser-level DoH reduces the number of parties that see DNS queries (no OS, VPN app, or OEM resolver in the path).
Android’s DNS features are version- and vendor-dependent; not all devices or ROMs support DoH/DoT consistently.
Firefox can offer clear UI controls for enabling/disabling DoH and choosing resolvers, which Android typically does not.
Firefox uses a curated list of “trusted recursive resolvers” with contractual privacy guarantees, unlike opaque OS behavior.

Privacy, leaks, and limitations

Several comments point out that DoH alone doesn’t hide which site you visit: IPs and TLS metadata still leak information.
Others note that Firefox pairs DoH with Encrypted Client Hello (ECH), which together better conceal domains from on-path observers.
Android VPN and “privacy” features have had DNS and connectivity-check leaks, making in-app DoH attractive for those who don’t trust the OS.

DoH providers, centralization, and trade-offs

Suggested providers: Quad9, Mullvad, NextDNS, ffmuc, Wikimedia’s experimental service, self-hosted DoH (with caveats).
Quad9 is praised for global coverage and strict IP-handling policies; Mullvad for privacy/ad-blocking but limited geography.
Cloudflare’s short-term logging and sampled packet retention raise concerns for some; others see that as acceptable.
Centralization is a major worry: defaulting to a few big DoH resolvers shifts visibility from ISPs to large global players.
Techniques like splitting queries across multiple resolvers are discussed but may unintentionally leak more information per “site.”

Impact on local/self-hosted DNS

Operators of home or custom DNS lose transparent control when browsers bypass DHCP-provided resolvers via hardcoded DoH.
This breaks internal split-horizon DNS and local overrides unless clients are explicitly configured.
RFC 9463 is mentioned as a mechanism to advertise DoH endpoints via DHCP, but tooling support is still lacking.

DoH vs DoT and technical details

Android is noted as primarily supporting DoT, not DoH; Firefox chooses DoH because it blends into normal HTTPS (port 443) and circumvents ISPs that block third-party DNS.
Some note that, since Firefox is a browser and the DoH spec’s lead author had browser background, HTTP tooling and expertise made DoH a natural fit.
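
As a rough illustration of why DoH looks like ordinary HTTPS traffic, the sketch below builds the GET form of a DoH request as defined in RFC 8484: a normal DNS wire-format query, base64url-encoded into a `dns` query parameter. The resolver URL is the placeholder host used in the RFC's own examples, not a real service, and this is not Firefox's implementation.

```python
import base64
import struct


def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query (ID 0, recursion desired, class IN)."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question


def doh_get_url(resolver_url: str, name: str) -> str:
    """RFC 8484 GET form: base64url-encoded query with padding stripped."""
    encoded = base64.urlsafe_b64encode(build_dns_query(name)).rstrip(b"=")
    return f"{resolver_url}?dns={encoded.decode('ascii')}"


url = doh_get_url("https://dnsserver.example.net/dns-query", "www.example.com")
# url ends with "?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB"
```

A client would send this as an HTTPS GET with an `Accept: application/dns-message` header, so an on-path observer sees only a TLS connection to port 443.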

Disabling or controlling DoH on networks

Network operators wanting to block DoH face difficulty because it’s just TLS on port 443.
Options mentioned: IP/SNI blocking of known DoH hosts, or full TLS interception and strict egress firewalls; both are imperfect or heavy-handed.

Firefox for Android UX and alternatives

Opinions on Firefox Android performance are split: some report severe lag and poor background behavior; others find it fine even on older hardware.
Many continue using it solely for full uBlock Origin support.
Alternatives discussed: Brave, Orion (iOS), Lemur, Kiwi, Vivaldi, Samsung Browser with adblock extensions, and Edge Canary with extension support.
Some prefer DNS-level adblocking (Pi-hole/AdGuard Home + VPN/Tailscale), while others say this is less effective than in-browser blocking.


2025-09-17

Bringing fully autonomous rides to Nashville, in partnership with Lyft

Waymo–Lyft partnership & geographic expansion

Commenters see this as Waymo’s first non-pilot commercial rollout with Lyft, and notable because it’s non‑exclusive: riders can use either the Waymo or Lyft app.
Many view this as an “inflection point” in coverage: SF, SFO, Phoenix, LA, Austin, Atlanta, Nashville, Silicon Valley suburbs, plus testing or hiring in other US cities and Tokyo.
Some locals (e.g., Atlanta, Nashville) report seeing rapid growth of Waymo vehicles and say they’re “no worse” than human drivers, sometimes safer or more comfortable.

Economics, costs, and remote operations

One camp believes Waymo is approaching or at break‑even in dense markets: high utilization, higher per‑mile pricing than Uber, and falling lidar/hardware costs.
Skeptics highlight expensive vehicles, hardware tariffs, ongoing R&D, and unknown spending on remote assistance and mapping; they doubt a “few months” payback and expect multi‑year amortization.
Back‑of‑envelope analysis suggests labor is largely fixed engineering cost, with relatively low marginal cost per additional vehicle. Guesses for remote assistance staffing range from one operator per 10 cars to one per 100.
Waymo hints at “very positive” unit economics but doesn’t disclose numbers; some see this secrecy as competitive discipline, others as a sign they’re still not clearly profitable.

Uber/Lyft’s role and strategic risk

Waymo benefits from ride‑hail platforms for instant distribution, overflow coverage by human drivers, and avoiding building all operations (support, payments, regulatory know‑how) itself.
Platforms gain more “drivers” and can keep serving rides even if AV fleets are small at first.
Several commenters argue Uber/Lyft are ultimately commoditized: they don’t own cars or core AV tech and could be reduced to low‑margin fleet management or licensed operators.
Others see potential acquisitions (e.g., Lyft as a cheap channel) but question Lyft’s “moat” beyond operational knowledge and regulatory relationships.

Competition: Tesla, Zoox, others

Some users are bullish on Tesla Robotaxi, citing early Bay Area rides and Tesla’s hardware scale; others ridicule it as years behind Waymo and primarily stock‑price theater.
Zoox and Chinese players (Pony.ai, Baidu) are mentioned as serious long‑term competitors, though US market access for Chinese firms is doubted.

Societal impact, transit, and labor

Strong thread debating whether autonomous taxis solve real problems versus just entrenching car‑centric cities.
Critics argue trains, trams, and buses are more efficient for traffic, environment, and safety; AVs may worsen congestion via empty “deadhead” miles.
Supporters counter that US public‑transit build‑out is politically and financially broken; AVs could pragmatically leapfrog those constraints and improve safety and comfort.
Significant discussion of autonomous buses: technically easier and could enable higher frequency, but blocked by driver unions, security/cleanliness needs, and politics.
Broader concerns: privatization failures, dual‑use (warfare) worries, and concentration of power in a trillion‑dollar mobility monopoly.

Ownership and user experience

Many riders like driverless rides for safety, comfort, price, and not dealing with human drivers.
Others dislike being surveilled or “rated” and prefer personal cars or rentals.
Some hope for individually owned self‑driving cars eventually; others think ubiquitous robotaxis will make ownership a luxury or niche convenience (e.g., storage, family gear, home backup battery).


2025-09-17

Apple Photos app corrupts images

Import corruption & evidence

The issue appears when importing from SD cards/cameras into macOS Photos, especially with Olympus/OM System RAW (ORF), but some report corruption with iPhone/iCloud-only workflows too.
Checksums differ between source and imported files; binary diffs show large contiguous blocks (multiples of 512 bytes) being replaced, not just single-bit flips.
The author says they swapped essentially all hardware (laptop, camera, etc.) and still reproduced the problem, pointing strongly at Photos/import software rather than hardware failure.
Several users report milder artifacts (e.g., green lines, flipped images) but visually intact files; others have completely unreadable or partially overwritten images.
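
The kind of block-level comparison described above can be sketched as follows. This is an illustrative helper, not the author's actual tooling; the function name and the choice to work on in-memory bytes are assumptions.

```python
from typing import List, Tuple


def differing_block_runs(a: bytes, b: bytes, block_size: int = 512) -> List[Tuple[int, int]]:
    """Return (first_block, block_count) for each contiguous run of differing blocks."""
    total_blocks = (max(len(a), len(b)) + block_size - 1) // block_size
    runs: List[Tuple[int, int]] = []
    run_start = None
    for index in range(total_blocks):
        lo = index * block_size
        differs = a[lo:lo + block_size] != b[lo:lo + block_size]
        if differs and run_start is None:
            run_start = index                      # a new run of bad blocks begins
        elif not differs and run_start is not None:
            runs.append((run_start, index - run_start))
            run_start = None
    if run_start is not None:                      # run extends to the end of the file
        runs.append((run_start, total_blocks - run_start))
    return runs


# Toy data: overwrite two whole 512-byte blocks in the middle of a 4 KiB buffer.
original = bytes(4096)
damaged = bytearray(original)
damaged[1024:2048] = b"\xff" * 1024
runs = differing_block_runs(original, bytes(damaged))
```

Long runs that start and end on block boundaries suggest whole sectors or buffers being replaced, whereas scattered single-byte differences would point more toward bit flips.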

Suspected root cause

Many commenters think it’s an import-pipeline bug in Photos: a concurrency or buffering issue in the extra work done on import (merging RAW+JPEG, previews, database writes, optional delete-on-import).
The 512-byte granularity points some to a storage or filesystem-level corruption path; others still recommend RAM/disk tests and checking APFS block sizes.
A minority argue it could be OM’s USB implementation or SD cards, but counterexamples from non-OM cameras and iPhones weaken that explanation.

Workflows, mitigation & backups

Strong consensus: never use “delete after import” from the card/camera; only erase cards in-camera after verified backups.
Recommended workflows:
- Copy from SD to local disk first, then import into Photos/Lightroom/Darktable.
- Keep multiple copies (local + NAS + cloud), keep SD cards until off-device backups exist, sometimes even treat SDs as write-once archives.
Tools mentioned: Image Capture, Darktable, Lightroom, Digikam, PhotoSync, Immich, PhotoPrism, Landrop/LocalSend, osxphotos, PhotoRec/DiskDrill for recovery.
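
A minimal sketch of the "copy first, verify, only then erase" workflow, using hypothetical function names. It checks that the copy hashes the same as the source; it cannot by itself guard against the operating system serving the re-read from cache rather than from disk.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large RAW files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src: Path, dst_dir: Path) -> Path:
    """Copy src into dst_dir and raise if the copy's checksum differs."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"checksum mismatch after copying {src.name}")
    return dst
```

Recording the checksums alongside the files also makes it possible to detect corruption introduced later by an import tool or sync service.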

Apple software quality & bug handling

Multiple anecdotes of data or metadata integrity issues across Apple apps (Photos, Image Capture, Music/iTunes, Notes, Reminders, iCloud Drive, Maps).
Several describe iCloud Photos corrupting previously good images or making them unexportable.
Reporting bugs via Feedback Assistant/Radar is widely described as frustrating: demands for “example projects,” long silences, low priority for long-shipped bugs, and triage overwhelmed by volume.
Some ex-insiders and QA engineers note systemic underinvestment in testing and a culture that tolerates long-lived bugs unless they generate public backlash.

Trust, lock-in & alternatives

Many no longer trust Apple Photos/iCloud as the sole repository for irreplaceable images, despite paying for iCloud tiers; they emphasize owning flat files and independent backups.
Some keep Photos only as a front-end viewer and manage masters with open-source tools on local or self-hosted storage.
A few downplay risk, noting years of trouble-free imports from other brands, suggesting the bug might be rare or source-specific.

Miscellaneous

Several commenters find the “tenderlovemaking.com” domain amusing or problematic for work filters, sparking a side discussion about quirky tech-site names.


2025-09-17

Determination of the fifth Busy Beaver value

How BB(5) Was Determined

Direct “run them all and see” is impossible because you can’t in general detect non-halting by brute force.
The search space of 5-state Turing machines was first reduced using Tree Normal Form, from ~1.7×10¹³ raw machines to ~1.8×10⁸ “essentially different” ones (reachable states canonically ordered, symmetries reduced).
Machines were then passed through a pipeline of deciders:
- Simple loop detection (“loops”) plus short simulations handled the vast majority.
- More sophisticated abstract-interpretation deciders (NGram CPS, RepWL, FAR, WFAR) proved non-halting for almost all remaining machines by over-approximating reachable configurations and showing none can reach a halting state.
- Only 13 “sporadic” machines needed bespoke, hand-crafted non-halting proofs.
The longest halting machine runs for 47,176,870 steps, establishing BB(5).
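The first pipeline stage can be illustrated with a toy sketch. The Python below is not the project's code (the real deciders are formalized in Rocq); the `run` function, its step budget, and the exact-repeat check are illustrative assumptions. It simulates a 2-symbol machine, reports halting with the step count, and flags a machine as looping only when an exact configuration (state, head position, tape) repeats. The well-known 2-state champion is used because it is small enough to check by hand.

```python
# Transition table: (state, symbol read) -> (symbol to write, move, next state).
# "H" is the halting state. BB2 is the 2-state, 2-symbol champion.
BB2 = {
    ("A", 0): (1, "R", "B"), ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"), ("B", 1): (1, "R", "H"),
}

# A hypothetical machine that bounces between two blank cells forever.
CYCLER = {
    ("A", 0): (0, "R", "B"), ("A", 1): (1, "R", "H"),
    ("B", 0): (0, "L", "A"), ("B", 1): (1, "R", "H"),
}

def run(machine, max_steps=1000):
    """Return (verdict, steps, ones) with verdict in {"halts", "loops", "unknown"}."""
    ones = set()              # tape positions currently holding a 1
    state, pos, steps = "A", 0, 0
    seen = set()              # every configuration visited so far
    while steps < max_steps:
        config = (state, pos, frozenset(ones))
        if config in seen:    # exact repeat: the machine can never halt
            return ("loops", steps, len(ones))
        seen.add(config)
        symbol = 1 if pos in ones else 0
        write, move, nxt = machine[(state, symbol)]
        if write:
            ones.add(pos)
        else:
            ones.discard(pos)
        pos += 1 if move == "R" else -1
        steps += 1
        if nxt == "H":
            return ("halts", steps, len(ones))
        state = nxt
    return ("unknown", steps, len(ones))   # budget exhausted: no conclusion

print(run(BB2))      # ("halts", 6, 4)
print(run(CYCLER))   # ("loops", 2, 0)
```

The "unknown" outcome is the important one: exhausting a step budget proves nothing, which is why machines that neither halt quickly nor repeat exactly need stronger deciders.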

Brute Force vs Uncomputability

Commenters stress that the Busy Beaver function as a whole is uncomputable, but specific small values (like BB(5)) can still be determined with enough structure and proof.
There is no universal algorithm deciding halting for all Turing machines (the halting problem), but for any fixed finite class (e.g., all machines with up to 5 states) a decider exists in principle; the hard part is finding one and proving it correct.
Some argue that, in a broad sense, this is still “brute force”: enumerate machines and proofs within a formal system; others reply that the key work is in designing powerful deciders and proof strategies, not naive enumeration.

Limits of Proving Busy Beaver Values

For any fixed, sound, recursively axiomatizable theory, there exists some N beyond which that theory cannot prove exact BB(N) values; this is a Busy-Beaver-flavored incompleteness phenomenon.
One view: there is no absolute N beyond which BB(N) is unknowable “in principle”; you can always strengthen your axioms. Another view emphasizes that for every such theory, independence eventually occurs.
Known results (cited in the thread) show certain large BB(k) values (e.g., around k≈745) are already independent of ZFC; the suspected “practical” barrier might be much smaller, possibly even low double digits.

Proof Assistants and Rocq

All deciders and sporadic proofs were formalized in the Rocq (formerly Coq) proof assistant.
This ensures the full BB(5) classification is machine-checked: deciders are proved correct with respect to mathematical Turing machines, then applied to the entire search space.
Verifying the resulting proof takes under an hour on a multi-core laptop; the exploratory search and development of deciders took far more computational and human effort.
There is discussion of alternative assistants (e.g., Lean, Dafny), and of this work as part of a broader trend toward formalized, collaborative mathematics.

Online Collaboration and Related Communities

The project is highlighted as a large, distributed, internet-native collaboration, closer to “formalized research” than distributed number-crunching.
Comparisons are made to:
- Classic distributed projects (DES/RSA challenges, distributed.net), whose original goals are now largely historical.
- Modern formal-math collaborations in Lean using proof “blueprints”.
- Niche online communities around Conway’s Game of Life and “googology” (very large numbers).

Implications and Practical Value

Most participants see this as highly theoretical, with no direct applied payoff; benefits lie in:
- Sharpened methods for reasoning about program behavior and partial halting-problem “taming” (e.g., static-analysis techniques, abstract interpretation).
- Stress-testing and improving proof assistants and libraries.
- Deepening understanding of the limits of formal systems and computability.
Some push back on the idea of “purely useless” math, citing historical cases where seemingly abstract work later became foundational (e.g., number theory, Hardy’s work).
Others characterize the achievement as more of a heroic, intricate classification effort than a new conceptual breakthrough—still “beautiful” and inspiring.

Connections to Collatz and BB(6)

The 5-state champion is described (elsewhere, and referenced here) as computing a Collatz-like process; commenters note similar behavior in candidate 6-state “Antihydra” machines.
This raises the idea that Collatz-style dynamics are a good blueprint for constructing long but terminating computations.
BB(6) is discussed only via bounds and scale:
- Published lower bounds already involve mind-boggling fast-growing constructions (towers/Knuth arrows repeated enormous numbers of times).
- An exact BB(6) is believed far beyond what can be written down or feasibly proved, even if not yet formally ruled out.

View on HN ↗ Original Article ↗

2025-09-17

EU Chat Control: Germany's position has been reverted to undecided

Mass surveillance vs. crime prevention

Many argue Chat Control is primarily mass surveillance, not a serious tool to catch criminals.
Others say the intent is crime-fighting, but the effect is disproportionate: scanning everyone to find a tiny fraction of offenders.
Statistical arguments highlight that, even with optimistic assumptions, false positives would massively outnumber true positives, overwhelming police and harming innocents.
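The base-rate argument can be made concrete with a small calculation. All numbers below are hypothetical inputs chosen for illustration, not figures from the thread or the proposal; `expected_flags` is an illustrative helper name.

```python
def expected_flags(messages, prevalence, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives, precision) for one scanning pass."""
    offending = messages * prevalence
    benign = messages - offending
    true_pos = offending * sensitivity          # offending items correctly flagged
    false_pos = benign * false_positive_rate    # benign items wrongly flagged
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision

# Assumed scenario: 1 billion messages, 1 in 10,000 offending,
# a classifier that catches 99% of them and misfires on 0.1% of benign content.
tp, fp, precision = expected_flags(
    messages=1_000_000_000,
    prevalence=1e-4,
    sensitivity=0.99,
    false_positive_rate=1e-3,
)
print(f"{tp:,.0f} true hits, {fp:,.0f} false alarms, precision {precision:.1%}")
```

Under these assumptions there are roughly ten false alarms for every true hit, even though the classifier is accurate in the everyday sense. The ratio is driven mainly by how rare the targeted content is.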

False positives, classifiers, and real-world harm

Some note you can tune detection systems to reduce false positives, but others counter that real deployments consistently err on the side of over-reporting.
Examples are cited where automated CSAM detection flagged benign family or medical photos, nearly resulting in prosecutions.

From targeted wiretaps to permanent mass scanning

Critics stress that traditional wiretaps required probable cause and court orders, were labor-intensive, and were not retroactive.
Chat Control is framed as “wiretapping everyone all the time,” automated, proactive, and capable of creating long-lived records.
Breaking or bypassing end-to-end encryption is seen as introducing major security and economic risks.

Authoritarian drift and historical context

German history (Third Reich, Stasi) is invoked as a warning; some express disbelief that Germany is not leading the opposition.
Others argue that such powers will inevitably be used on everyone, and can easily be repurposed for political repression or “wrongthink.”

EU law, constitutions, and fundamental rights

Debate over whether EU law can override national constitutions and privacy guarantees is intense and unresolved in the thread.
The EU Charter’s privacy rights are noted as having broad law-enforcement carve‑outs, prompting doubts about their real protective value.

Democracy, accountability, and repeated pushes

Many see repeated attempts to pass similar measures as “p‑hacking democracy” — keep trying until it passes.
Others respond that politicians are elected and this is therefore formally democratic; if people cared, they’d vote differently.
There’s frustration with the European Commission’s agenda-setting role and the difficulty of “voting out” key actors.

Country roles and precedents

Denmark is repeatedly mentioned as a strong proponent; Germany’s wavering is seen as decisive for the Council blocking minority.
The UK’s Online Safety Act is cited as a functional analogue: scanning is already law there, only paused as “not yet technically feasible.”

Proposal details and double standards

A proposed exemption for state, military, and law‑enforcement accounts is viewed as a red flag: if the system is so safe, why exclude those most sensitive users?
This is taken as evidence of both insecurity (new attack surface) and expectation of false positives that would be intolerable for officials.
Limited 6‑month retention of flagged material is still seen as a dangerous “paper trail,” especially in future political turmoil.

Effectiveness and easy circumvention

Many point out that serious criminals can trivially evade scanning (alternative apps, custom tools, extra encryption layers, encrypted archives).
The likely outcome, in this view: ordinary citizens are surveilled; sophisticated offenders move elsewhere.

Broader surveillance‑state pessimism and EU skepticism

Some believe the surveillance state is now inevitable, driven by both governments and large tech platforms.
The controversy fuels rising Euroscepticism and even calls for exiting the EU, though others counter that without the EU, such laws might spread even faster at national level.

View on HN ↗ Original Article ↗

2025-09-17

Oh no, not again... a meditation on NPM supply chain attacks

Responsibility: Companies vs Volunteers

Strong disagreement over whether “the companies” or unpaid OSS maintainers are to blame.
One camp argues Fortune 500s freely exploit volunteer work, giving little back beyond demands.
Others counter that permissive licenses are effectively donations; using them as allowed isn’t “leeching,” and maintainers chose that model.
Some say commercial users should at least be prepared to maintain, fork, or pay for support if they rely on a dependency.

Corporate Incentives and Contribution

Several stories of companies informally promising OSS contributions for years but never funding them; OSS work is seen as “no time in the budget.”
Others note many OSS contributors are actually paid employees (e.g., major languages, foundations); but there’s a very long tail of small critical libraries run by volunteers.
Debate on whether corporate accounting and invoicing constraints really make funding volunteers “legally hard,” or if that’s just an excuse.

Licensing, Fairness, and “Leeches”

Dispute over whether it’s fair to morally criticize big companies that profit heavily from MIT/BSD code without giving back.
One side: permissive licensing implies you expect nothing; fairness arguments don’t change that.
Other side: legality ≠ fairness; people naturally see it as bad manners to profit massively from someone’s work with zero reciprocity.
Long subthread on non‑standard “not for big corps” licenses and whether they should still be called “open source” (strong pushback citing OSI definition).

NPM Culture and Ecosystem vs Others

Multiple comments: the real problem is JS/NPM culture—huge dependency trees, micropackages, aggressive auto‑upgrades, weak standard library.
Comparisons to Go, Maven, PyPI, crates.io, RubyGems:
- These ecosystems are credited with fewer tiny packages, better standard libraries, no postinstall scripts, explicit upgrade commands, or signed packages (Ruby, PyPI).
- Go’s “lowest compatible version” strategy praised for limiting surprise upgrades.
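Go calls this strategy minimal version selection. The sketch below illustrates the idea only; it is not Go's implementation, and the module names and versions are hypothetical. It walks every requirement reachable from the root and, for each module, keeps the highest of the stated minimums, so a newer release is never picked up unless something explicitly asks for it.

```python
def build_list(root, requires):
    """Minimal version selection sketch.

    `requires` maps (module, version) -> list of (module, minimum version).
    Versions are tuples such as (1, 2) so that they compare numerically.
    """
    selected = {}
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        module, version = node
        # Keep the highest minimum that anyone in the graph asked for.
        if version > selected.get(module, ()):
            selected[module] = version
        stack.extend(requires.get(node, []))
    return selected

# Hypothetical dependency graph.
requires = {
    ("app", (1, 0)): [("b", (1, 2)), ("c", (1, 2))],
    ("b", (1, 2)): [("d", (1, 3))],
    ("c", (1, 2)): [("d", (1, 4))],
    ("d", (1, 3)): [("e", (1, 2))],
    ("d", (1, 4)): [("e", (1, 2))],
}
print(build_list(("app", (1, 0)), requires))
```

Here `d` resolves to 1.4 because `c` requires at least that, while `e` stays at 1.2 even if a newer `e` has been published, since no module in the graph requests it.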
Some argue NPM is simply a bigger, juicier target; others say its design and defaults are uniquely dangerous.

Technical Root Causes and Platform Issues

Discussion on the web as a hostile app platform vs just “learn your tools better.”
UI components (date pickers, accessible widgets) cited as reasons for heavy dependency use.
Hardware security (TPM/Secure Enclave, secure boot) seen by some as unrelated; others say they’re mostly DRM tools, not a fix for NPM‑style attacks.

Mitigations and Best Practices (Today)

Practical suggestions:
- Use pnpm (disables most postinstall scripts by default and offers a minimum‑age setting for new releases).
- Use Renovate (or similar) with “cooldown” windows before adopting new versions.
- Pin exact versions, rely on lockfiles, use npm ci, avoid auto‑updates; some vendoring and manual diff review on updates.
- Sandbox package managers (bubblewrap on Linux, sandbox‑exec on macOS) or develop inside VMs/containers with secrets kept outside.
- Generate SBOMs and track with tools like OWASP Dependency-Track; use npm audit and external scanners (e.g., safe‑chain).
Recognized downsides: dialog fatigue, usability friction, impracticality of manually reviewing hundreds/thousands of transitive deps.
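Two of these suggestions, exact pinning and a cooldown window, are simple enough to sketch as checks. The helpers below are hypothetical illustrations, not part of npm, pnpm, or Renovate; the package names, the 7-day default, and the simplified version pattern are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

# Accepts only exact versions such as "1.2.3" or "1.2.3-beta.1" (simplified pattern).
EXACT = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")

def unpinned(package_json):
    """Return {name: spec} for dependencies whose spec is a range, tag, or wildcard."""
    loose = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in package_json.get(section, {}).items():
            if not EXACT.match(spec):
                loose[name] = spec
    return loose

def past_cooldown(published_at, now, min_age_days=7):
    """True once a release has been public for at least `min_age_days`."""
    return now - published_at >= timedelta(days=min_age_days)

# Hypothetical manifest, already parsed from package.json.
manifest = {
    "dependencies": {"alpha": "1.3.0", "beta": "^5.3.0"},
    "devDependencies": {"gamma": "~5.4.0"},
}
print(unpinned(manifest))

published = datetime(2025, 9, 15, tzinfo=timezone.utc)
print(past_cooldown(published, datetime(2025, 9, 17, tzinfo=timezone.utc)))
print(past_cooldown(published, datetime(2025, 9, 30, tzinfo=timezone.utc)))
```

A check like this only covers direct dependencies in the manifest. Transitive dependencies are fixed by the lockfile, which is why the list pairs pinning with lockfiles and npm ci.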

What NPM/Microsoft Could or Should Do

Strong criticism that package signing/verification was requested as early as 2013 and effectively ignored for years.
Others respond that NPM now has Trusted Publishing, provenance/attestations, and 2FA for top packages; claims that “nothing has been done” are disputed.
Proposed platform‑level measures:
- Mandatory (or much broader) phishing‑resistant 2FA (hardware/WebAuthn) for popular packages, possibly with cooldowns after credential changes.
- Require code signing and treat stolen tokens differently from stolen signing keys.
- Built‑in malware scanning of new releases with human review queues and “cooldown” for high‑impact packages.
- Better default token scoping, expiry, and tooling to derive minimal permissions.

Broader Security Model

Some argue endless arms‑race defenses are doomed; suggest “web of trust” style vouching, where third parties (including big companies) sign attestations that they’ve inspected specific versions and found no obvious malice.
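The vouching idea can be sketched as a threshold check over attestations. This is a hypothetical illustration, not an existing registry feature: reviewer names, the record format, and the threshold are assumptions, and a real system would verify a cryptographic signature on each attestation instead of trusting plain records.

```python
import hashlib

def digest(artifact: bytes) -> str:
    """Content hash that an attestation refers to."""
    return hashlib.sha256(artifact).hexdigest()

def vouched(artifact, attestations, trusted, threshold=2):
    """True if at least `threshold` distinct trusted reviewers attested this exact content."""
    want = digest(artifact)
    reviewers = {who for who, sha in attestations if sha == want and who in trusted}
    return len(reviewers) >= threshold

# Hypothetical release contents and attestations (reviewer, sha256 of what they inspected).
tarball = b"module.exports = 1"
tampered = b"module.exports = 2"
attestations = [
    ("org-a", digest(tarball)),
    ("org-b", digest(tarball)),
    ("stranger", digest(tarball)),
]
trusted = {"org-a", "org-b"}

print(vouched(tarball, attestations, trusted))    # True
print(vouched(tampered, attestations, trusted))   # False
```

Because attestations bind to a content hash, a release modified after review no longer matches any of them, and vouching by unknown parties does not count toward the threshold.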
Others emphasize sandboxing and OS‑level isolation as the long‑term way to make inevitable supply‑chain compromises less catastrophic.

View on HN ↗ Original Article ↗

2025-09-17

Alibaba's new AI chip: Key specifications comparable to H20

China’s Nvidia Ban and Strategic Signaling

Commenters link the Alibaba chip news to China reportedly ordering tech firms to cancel Nvidia AI chip purchases.
Interpreted as both:
- A push to force investment into domestic hardware and non‑CUDA software stacks.
- A nationalist / trade‑war move to stop funding foreign (including Taiwanese) defense and economies.
Some note this changes the risk calculus inside Chinese firms: reliability of supply now outweighs historical distrust of domestic quality.

Alibaba’s Chip and China’s Hardware Position

Thread consensus: Alibaba’s chip is roughly in A100/H20 class, ~one to two generations behind top Nvidia Blackwell parts, but still highly useful.
Several argue Chinese chips don’t need to beat Nvidia’s best—only the restricted, cut‑down export models.
Reports of DeepSeek struggling with Huawei chips show the ecosystem is still immature, but demand and margins create powerful incentives to fix issues.
Some call the article/state narrative propaganda, pointing to missing details on interconnect and real compute; others see it as early but meaningful progress.

CUDA, Software Moats, and AMD

Repeated theme: Nvidia’s dominance is more about ecosystem (CUDA, tools, familiarity) than unique silicon.
In China, political pressure can “break” the CUDA moat by forcing migration; firms are building CUDA‑compatible or translated stacks.
AMD is viewed as technically competitive (Instinct line, ROCm) but hampered by weaker software, drivers, and lack of aggressive ecosystem building; demand is limited by Nvidia’s lock‑in and TSMC capacity.
Some argue the CUDA moat is overstated for deep‑learning inference (mostly matmuls), but others stress full‑stack training performance and tooling still heavily favor Nvidia.

Export Controls, Incentives, and Catch‑Up Dynamics

Many frame US export bans as effectively subsidizing Chinese chip development by guaranteeing a captive domestic market and strong state backing.
Debate over how much this really accelerates innovation: some say “catch‑up would happen anyway but faster now,” others note many sanctioned states never caught up due to weaker institutions.
System‑level strategies matter: China can compensate for weaker single chips with sheer scale, cheaper power, advanced packaging, and networking.

AI Race, Markets, and AGI

Some see even a short delay in Chinese AI capability as a major strategic win for US national security; others think delays are only “a few years” and not decisive.
There is skepticism about AGI imminence and about an AI investment bubble; yet most agree Nvidia’s margins and dominance will attract more competitors, including Chinese vendors, hyperscalers’ custom chips, and service‑centric models.

View on HN ↗ Original Article ↗

2025-09-17

The Asus gaming laptop ACPI firmware bug

Asus ACPI Bug and User Impact

Discussion centers on a long‑standing ACPI firmware bug in Asus gaming laptops that causes periodic 10–30 ms latency spikes, visible as UI stutter and audio crackle.
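A rough calculation shows why stalls of this size are noticeable. The buffer size, sample rate, and refresh rate below are assumed typical values, not measurements from the article or the thread, and real audio stacks may queue several buffers, so treat this as a simplified sketch.

```python
def audio_buffer_ms(frames, sample_rate_hz):
    """Playback time covered by one audio buffer, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz

def frame_budget_ms(refresh_hz):
    """Time available to produce one display frame, in milliseconds."""
    return 1000.0 / refresh_hz

def overruns(stall_ms, budget_ms):
    """True if a stall of `stall_ms` exceeds the available budget."""
    return stall_ms > budget_ms

audio = audio_buffer_ms(frames=256, sample_rate_hz=48_000)   # about 5.3 ms
video = frame_budget_ms(refresh_hz=60)                       # about 16.7 ms

for stall in (10, 30):
    print(stall, "ms stall:",
          "audio underrun" if overruns(stall, audio) else "audio ok",
          "/",
          "dropped frame" if overruns(stall, video) else "frame ok")
```

Under these assumptions a 10 ms stall already outlasts a single small audio buffer, and a 30 ms stall also misses a 60 Hz frame deadline, which matches the reported crackle and stutter.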
Several owners of Zephyrus G14/G15 and other ROG models report nearly identical symptoms under both Windows and Linux, reinforcing that the issue is BIOS/ACPI, not OS.
Some note it appears worst in “dGPU‑only/Ultimate” MUX mode, but others say latency problems show up more broadly, so the exact scope remains unclear.
Workarounds people use: avoiding dGPU‑only mode, disabling boost, favoring hibernate over sleep, or effectively sidelining the dGPU.

Laptop Firmware and ACPI Dysfunction Across Brands

Many commenters generalize the problem to modern laptops: Lenovo, Dell, HP, MSI, Surface, Clevo, Acer, and others are all cited with ACPI, power, GPU switching, sleep/wake and dock issues.
Switchable graphics (iGPU+dGPU) are repeatedly described as fragile, especially with Thunderbolt/docks. Several users now avoid them entirely or choose iGPU‑only machines.
There’s frustration that years of BIOS updates often “improve performance and security” on paper while never fixing core bugs.

Apple, Steam Deck, and Alternatives

Some contrast this with MacBooks and the Steam Deck, saying their suspend/wake and overall integration are far more reliable.
Others push back, listing Apple hardware and software failures (keyboards, throttling, audio glitches, monitor issues) to argue no vendor is flawless—Apple just has the unified incentive to fix its own stack.

Debugging, ACPI Patching, and Technical Debates

The community is impressed by the author’s reverse‑engineering of AML and ACPI events; several say this is exactly the quality of work OEMs should be doing.
People discuss overriding ACPI tables on Linux (initrd/DSDT override) and Windows (Microsoft ACPI table load APIs, custom bootloaders), but note signing, anti‑cheat, and bricking risks.
One detailed commenter questions whether “sleep in an interrupt” is the true root cause, suggesting that System Management Mode and poorly designed GPU power transitions may dominate the latency.

LLMs and Trust in Technical Writing

Multiple readers say the article’s prose is obviously LLM‑polished and find the style distracting or untrustworthy, worrying that generation may have mangled nuances.
Others argue using an LLM for wording—especially for non‑native writers—is fine if the technical content and logs are verifiable, and that critics should point to concrete errors instead.

QA, Reviews, and Buying Advice

Commenters are baffled that such a blatant four‑year bug escaped Asus QA and wasn’t flagged by major reviewers, who typically test throughput but not latency.
There’s broad cynicism that consumer laptop firms prioritize marketing over engineering, expect users to accept glitches, and rarely respond meaningfully to deep technical bug reports.
Many advise avoiding gaming laptops or Nvidia‑based switchable graphics entirely, favoring Macs, business‑line laptops, open‑friendly vendors (System76, Framework), or a desktop + Steam Deck combo instead.

View on HN ↗ Original Article ↗

2025-09-17

GNU Midnight Commander

Nostalgia & Legacy

Many recall Midnight Commander (MC) as the spiritual successor to Norton Commander, alongside Volkov, FAR, Dos Navigator, XTree/PathMinder, etc.
Several describe it as a “gateway drug” from DOS to Linux in the mid‑90s and are pleasantly surprised it’s still maintained in 2025.
Some share war stories: recovering accidentally rm -rf’d dissertations on ext2 by browsing unlinked inodes via MC.

Current Use Cases

Still a default install for many on servers, NAS devices, and remote shells; often the “secret weapon” on headless systems.
Used on macOS (via Homebrew), Windows (including WSL), Unraid, and in containers; some even hook it into Kubernetes debug workflows.
Common tasks: bulk file moves, SCP/FTP/SFTP over SSH or mounted remote FS, source-tree review (with “Lynx-like motion” + quick view), and simple editing via mcedit.

Features People Value

Dual-pane navigation, keyboard-centric workflow, and history/bookmarks for fast directory jumps.
Tight shell integration: Ctrl+O to drop to a shell in the current dir, Ctrl+X bindings, and an editable F2 user menu for custom multi-file actions (e.g., rsync, ffmpeg pipelines).
Virtual FS support (FTP/SFTP/SSH URLs), background transfers, overwrite strategies, and an easy, approachable editor.

Keyboard, Ergonomics & “Old-School” Debates

Strong muscle-memory from the original F-key and numpad layout; others find MC’s shortcuts unintuitive if they never used Norton Commander.
Complaints about Tab being “stolen” from shell completion, Escape delays via terminal emulators, and configuration (colors, formats) being trial‑and‑error.
Some want vim-style keybindings; MC now supports alternative keymaps (including vim/emacs examples).

Orthodox File Manager Concept

Discussion around why these are called “Orthodox File Managers”: dual-pane, command-driven UIs where visible actions map to underlying commands.
Long thread on the meaning of “orthodox” in Russian, English, and Greek and whether the term was organic or a “forced meme.”

Alternatives, Comparisons & Criticism

Frequent comparisons to Total Commander, FAR/far2l, Krusader, Double Commander, ranger, nnn, yazi, Dired, Dolphin, Marta, Directory Opus, etc.
Some say MC feels dated, lacks more advanced/parallel copy features, or is in “maintenance mode.”
Others argue graphical file managers or pure shell tools are enough and question why OFMs still inspire such devotion.

View on HN ↗ Original Article ↗

2025-09-17

Slow social media

Attention, Incentives, and “Recommendation Media”

Several comments frame attention as a de facto currency: likes, views, shares, and followers function like money without any “central bank.”
For‑profit platforms are seen as inevitably drifting toward engagement‑maximizing recommendation feeds, regardless of initial mission.
Some argue you can have healthy for‑profit social media only if the “attention economy” is either demonetized or tightly regulated/re‑monetized with limits on how much attention can be given/received.

Regulation vs Personal Responsibility

Many see meaningful reform as impossible without government intervention (e.g., bans on recommender feeds, restrictions on non‑personal accounts, school smartphone bans).
Others object to paternalism, preferring education and parental responsibility, but are challenged on the grounds that unpriced social harms justify regulation.
Comparisons are drawn to newspaper regulation and libel law; some argue platforms shouldn’t be allowed to broadcast anything at scale with zero liability.

Desired Properties of Slow Social Media

Common wishes:
- Chronological feeds with a hard end (no infinite scroll).
- Small, private groups; invite‑only or mutual following.
- Caps on friends/followers and on posts per day; possibly mandatory “cost” in time or friction per post.
- No or hidden like counts; limited or disabled forwarding; comments opt‑in.
Some want to outlaw or severely limit algorithmic feeds and commercial/brand accounts, though others note that would kill mainstream appeal.

Existing and Historical Alternatives

Many say the article is reinventing or echoing: LiveJournal, Tumblr, Path, Friendster, early Facebook, regional networks (e.g., iWiW, Tuenti), phpBB forums, BBSes.
Current “slow” substitutes cited: WhatsApp/Signal/Telegram groups, iMessage and shared photo albums, Discord servers, Goodreads, Strava, BeReal, Slowly, niche fediverse platforms (Mastodon, Lemmy, Friendica), and experimental projects (Minus, Seven39, Peergos, Haven, micro.blog, mood.site, tootik, twtxt).
A recurring pattern: services that embody these ideas either remain small, drift toward engagement features, or die when they fail to scale.

Network Effects, Protocols, and Small Federations

Many emphasize network effects and distrust after Facebook/Twitter/Reddit as the main blockers; people won’t move where their friends aren’t.
Open protocols (XMPP, Matrix, nostr, fediverse) are promoted as solutions, but criticized for UX friction and lack of critical mass; big companies have strong incentives to keep ecosystems closed.
Some foresee a future of many small, private, possibly AI‑assisted networks tailored to families, clubs, or communities rather than one dominant global feed.

Weak Ties and Parasocial Concerns

There is disagreement over following distant acquaintances: some value passive updates for rekindling or contextualizing relationships; others see it as parasocial voyeurism that displaces real interaction and fuels unhealthy comparison.
Several note a cultural shift: normal people share less publicly; influencers and semi‑professionals dominate, while private group chats now carry most “real” social life.

View on HN ↗ Original Article ↗

2025-09-17

I got the highest score on ARC-AGI again swapping Python for English

Evolutionary / “other-loop” methods

Several commenters see the approach as similar to evolutionary systems (e.g., AlphaEvolve): text prompts define a high-level search space, and “genetic” mixing plus selection explores it.
This is framed as part of a broader trend: recent strong models reportedly use heavy “outer loop” search/verification beyond simple single-pass generation.
A key open problem: how to define good fitness functions for prompt/program evolution without hand-crafted human scoring; naive attempts stall quickly.
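The select, mix, and mutate loop that commenters describe can be sketched generically. This is not the article's method: the candidates here are plain strings matched against a fixed target, and the fitness function is hand-crafted, which is exactly the part the thread identifies as hard to get right for real prompt or program evolution. All names and parameters are illustrative assumptions.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "
TARGET = "rotate the grid"   # stand-in for "a candidate that solves the task"

def fitness(candidate):
    """Number of positions that already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rng):
    """Replace one random character."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]

def crossover(a, b, rng):
    """Splice the head of one parent onto the tail of another."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(population, generations=40, keep=4, seed=0):
    rng = random.Random(seed)
    history = []                       # best fitness seen at each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:keep]          # selection: survivors carry over unchanged
        history.append(fitness(elite[0]))
        children = []
        while len(elite) + len(children) < len(population):
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=fitness), history

rng = random.Random(1)
start = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(20)]
best, history = evolve(start)
print(repr(best), fitness(best), "of", len(TARGET))
```

Because the elite survive each round unchanged, the best score never decreases. In an LLM setting the mutation and crossover steps would be model calls that revise or merge candidate instructions, and the fitness would come from checking candidates against known examples.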

Scaffolding, self-scaffolding, and ASTs

Many argue LLMs are helpless on complex, multi-step tasks without rich scaffolding; models themselves are flexible but the scaffolds are brittle.
Proposed direction: “scaffolding synthesis” where one agent designs task-specific scaffolding (plans, tools, state machines, ASTs), then another agent executes it, with feedback to refine the scaffold.
Examples include compiling natural-language instructions or legal documents into AST-like structures, and existing tools (e.g., code+plan modes) are cited as early instances.

LLM weaknesses: memory, spatial reasoning, and vision

Empirical reports: models perform badly on Sokoban-like puzzles, nonograms, mazes, and ARC-style tasks—forgetting rules they previously derived and repeating disproven deductions.
Some attribute this mainly to poor long-range memory and reliance on lossy text context; others stress weak spatial/visual reasoning and current “bag-of-vision-tokens” frontends.
There is debate whether vision or memory is the primary blocker; multiple comments insist models need compact internal, non-verbal representations of rules and state.

ARC-AGI’s role and modality issues

Several see ARC-AGI as primarily a visual benchmark where humans have strong innate preprocessing; if puzzles were given as JSON, most people would first transform them into graphics.
Others note that strong computer-vision modules exist but haven’t yet produced very high ARC-AGI scores when bolted onto LLMs.
Some view this work as meaningful progress on one of the few benchmarks where humans still dominate; others think it’s “slightly smarter brute force” or overfitting to a contrived task.

Reasoning vs pattern matching and “PhD-level” claims

Long subthread debates whether LLMs genuinely “reason” or just perform sophisticated pattern matching.
One side argues: high benchmark scores, commonsense examples, and mech‑interp findings (latent world models, abstract circuits) imply functionally similar reasoning to humans, albeit text- and 1D-biased.
The opposing side stresses failures on simple puzzles, out-of-domain tasks, lack of runtime learning, and reliance on offline RL as signs they are closer to expert systems trained to the test.
Definitions are contested: some equate reasoning with advanced pattern matching; others insist true human-like reasoning must include continual learning and generalization to genuinely novel problems.

Dead zones, RL, and learning over time

The article’s notion of “dead reasoning zones” is challenged; critics say humans do exhibit systematic reasoning failures, especially in abductive inference or under cognitive dissonance.
Questions are raised about the claim that RL “forces logical consistency”; skeptics note that repeated trial-and-error with an oracle differs from humans’ one-shot reasoning and self-checking.
Several point out that LLMs could, in principle, approximate runtime learning via external memory plus periodic fine-tuning on their own experience, but this is not how today’s models generally operate.

Practical tools, reproducibility, and evaluation

Commenters share related frameworks (e.g., DSPy, GEPA-like approaches) and ask for reusable tools to run evolutionary prompt/program search at home with major APIs.
Links to the project’s GitHub and Kaggle notebooks are provided for replication.
Some worry that apparent improvements on public puzzles might just reflect training on blog posts or leaked solutions; others suggest controlled tests with pre‑ARC models and ablations of the new method.

View on HN ↗ Original Article ↗

2025-09-17

About the security content of iOS 15.8.5 and iPadOS 15.8.5

Longevity and Support of iOS vs Android

Many see this iOS 15.8.5 patch as evidence that Apple supports devices far longer than most Android OEMs have, particularly compared with pre‑2020 Pixels and Samsungs, which often got only ~3 years.
Others note Android has improved: recent Pixels and Samsungs now promise 5–7 years of updates, sometimes matching or surpassing Apple’s formal commitments (at least on paper).
Experiences with hardware durability diverge: some report Android phones failing faster than iPhones; others report decade‑scale use of Samsung/Pixel devices while iPhones around them get replaced frequently.

Severity and Nature of the Vulnerability

Commenters infer a serious zero‑click remote code execution in image parsing, likely exploited via messaging apps.
It was already patched on “current” devices weeks earlier; this backport to iOS 15 is taken as a strong signal it was used in real‑world spyware campaigns.
Several speculate it’s part of a chain with a WhatsApp bug to deploy targeted surveillance tools, potentially similar to commercial spyware.

Threat Models and Old Devices

Some argue this mostly matters to journalists, activists, opposition figures, and others targeted by states; everyday users face much lower risk.
Others counter that once such exploits are reverse‑engineered, they can spread to less sophisticated actors, so patching old devices limits broader abuse.
Debate over whether high‑risk people can “just buy” a newer phone; several point out many such targets are not wealthy.

Repurposing and Openness

Discussion on whether old iPhones are less reusable than old Androids:
- iOS: jailbreaks, TrollStore, and Xcode sideloading exist but are constrained and fragile over time.
- Android: LineageOS and postmarketOS can turn devices into routers, servers, etc., but support varies by model and vendor unlock policies.
Some argue that if iPhones were as hackable as cheap microcontrollers, they’d be better long‑term dev platforms.

Vendors, SoC Constraints, and Policy Shifts

A recurring criticism of Android: baseband/SoC vendors (notably Qualcomm) stop maintaining kernel/driver trees after a few years, capping secure support even for custom ROMs.
Others respond this is ultimately a contractual and business‑model problem Google and OEMs could solve if they chose.
Apple’s tighter vertical integration is seen as enabling longer practical support.

App Ecosystem and Practical Lifespan

Even with security patches, some note that once iOS is two major versions behind, many apps drop support, making devices “functionally obsolete.”
Counterexamples: users on very old iPhones report core tasks (browser, navigation, banking, Apple Pay) still working, though some sites and apps have already moved on.

Overall Reaction to Apple’s Patch

Broad approval for patching 9–10‑year‑old devices; several self‑described non‑fans praise it compared to Android “abandonware.”
Some worry that patching only this one bug on an old branch may give users a false impression that they’re fully secure when many other unfixed issues likely remain.

View on HN ↗ Original Article ↗

2025-09-16

I launched a Mac utility; now there are 5 clones on the App Store using my story

IP, DMCA, and Copying Boundaries

Many commenters suggest DMCA takedowns for the plagiarized text, images, and origin story; the expectation is that Apple will remove many blatant copies, but not all.
General consensus: copying the idea or simple functionality is fair game; copying marketing copy, assets, or decompiled code crosses a line.
Some note legal recourse is impractical across borders and for low-revenue indie utilities.

What Can Be a Moat for a Simple Utility?

Idea itself is not defensible; for a small Mac utility, suggested “moats” include:
- Speed of innovation and frequent updates.
- Better UX, native feel, and responsive support.
- Building a brand and community trust over time.
- Sheer stamina: keep maintaining while quick-buck clones decay.
Others argue there may be no real moat for something that can be built in days; marketing and distribution dominate.

App Store, Distribution, and Clones

Several argue that using the App Store means relying on Apple’s “moat”; curation is described as weak, random, or driven by volume/revenue rather than quality.
Some criticize the 30% fee vs poor enforcement against obvious clones and spam.
Suggestions include multi-cloning one’s own app with variations, direct distribution, and cautious use of Reddit and similar channels (avoid “I made $X in Y days” posts that attract copiers).

LLMs, Low Barriers, and Authenticity

Multiple comments say LLMs have drastically lowered the bar to clone simple apps or marketing pages, intensifying an old problem.
Broader unease emerges about AI-generated code, AI-written posts, and whether the thread itself is partly “vibe-coded,” raising questions about authenticity of both software and discussion.
Some frame widespread cloning as a long-standing human behavior now amplified by new tools.

View on HN ↗

2025-09-16

Frying Eggs and Air Quality Tests

Air quality measurements while cooking

The article’s low PM2.5 numbers from frying eggs contrast with many commenters’ experiences using various monitors and purifiers.
Several report PM2.5 going from low single digits to 70–400 µg/m³ when searing meat or burning oil, and to device max (500–999+) when frying bacon, pancetta, or badly overheating oil.
Some air purifiers react strongly to cooking from other floors of the house, ramping to full power within a minute.
Others note that gentle cooking (e.g., eggs not heavily browned) often doesn’t move the needle much.

Role of oil, heat, and browning

Multiple commenters stress that the test scenario is “too clean”: low heat, little browning, and apparently not much oil.
High heat, Maillard browning, and approaching or exceeding oil smoke point are repeatedly cited as the main drivers of PM2.5 spikes.
Tiny burnt fragments or a brief oil overheat can generate disproportionate particulate levels compared to non-burnt cooking.
Some note extensive grease deposition on high surfaces and inside hoods as evidence of significant aerosolized oil.

Gas vs electric / induction

There’s agreement that gas combustion worsens indoor air quality, but disagreement about particles: some say gas adds mostly NO₂, CO₂, CO, and VOCs, not PM2.5.
Induction/electric avoids combustion gases but still produces particulates from food and oil.
A few anecdotes describe modern apartments with unvented gas ranges causing CO alarms.

Ventilation, hoods, and kitchen design

Many extractor fans (especially over-the-range microwaves) merely recirculate air through weak filters and move relatively little air compared with proper external vents.
Even externally vented hoods are often underpowered or constrained by poor ducting and lack of makeup air, so they dilute slowly rather than rapidly clearing spikes.
Strong praise for enclosed, Chinese-style kitchens with powerful exterior vents; critics of open-plan Western layouts complain of “whole-house cooking smells” and lingering grease.
Others value open kitchens for social reasons and accept some spread of odors.
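The point about underpowered hoods diluting slowly rather than clearing spikes can be illustrated with a simple well-mixed room model, in which concentration decays exponentially at the air-change rate (airflow divided by room volume). The sketch below is only an illustration under stated assumptions: the function name, room volume, airflow figures, and concentrations are hypothetical and do not come from the thread, and the model ignores ongoing emissions, particle deposition, and makeup-air limits.

```python
import math


def minutes_to_clear(start_ugm3, target_ugm3, room_volume_m3, airflow_m3_per_h):
    """Estimate minutes for PM2.5 to decay from start to target.

    Assumes a single well-mixed zone with no ongoing source, so that
    C(t) = C0 * exp(-(Q / V) * t). All inputs are illustrative.
    """
    if room_volume_m3 <= 0 or airflow_m3_per_h <= 0:
        raise ValueError("volume and airflow must be positive")
    if target_ugm3 <= 0:
        raise ValueError("target concentration must be positive")
    if target_ugm3 >= start_ugm3:
        return 0.0  # already at or below the target
    air_changes_per_hour = airflow_m3_per_h / room_volume_m3
    hours = math.log(start_ugm3 / target_ugm3) / air_changes_per_hour
    return hours * 60.0


if __name__ == "__main__":
    # Hypothetical open-plan space of 100 m^3 with a spike of 200 ug/m^3.
    for airflow in (100, 300, 900):  # assumed effective exhaust rates, m^3/h
        minutes = minutes_to_clear(200, 10, 100, airflow)
        print(f"{airflow} m^3/h -> about {minutes:.0f} minutes to reach 10 ug/m^3")
```

Under these assumed numbers, tripling the effective airflow cuts the clearing time to a third, which is consistent with the complaints that weak or badly ducted hoods leave spikes lingering for a long time.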

Nonstick pans, PFAS, and tradeoffs

Some prefer avoiding nonstick/PFAS and accept short pollution spikes plus filtration instead.
Others argue that intact PTFE at non-smoking temperatures is likely low-risk for the user; the larger concern is PFAS pollution from manufacturing.
Many use oil even on nonstick for better heat transfer, browning, and flavor.

Sensor limits and composition

Commenters note that PM sensors report particle concentration by size class, not chemistry; oil droplets, tire dust, and metal particles are likely to differ greatly in toxicity.
Several point out that cooking generates ultrafine particles (far smaller than 2.5 µm, commonly defined as under 0.1 µm) that common PM2.5 sensors may miss.

View on HN ↗ Original Article ↗

Hacker News, Distilled
