Hacker News, Distilled

AI powered summaries for selected HN discussions.


One of my papers got declined today

Rejection as a Normal Part of Research

  • Commenters stress that paper rejection is routine, not exceptional, even for world‑class researchers.
  • Publicly sharing rejections and near‑misses helps normalize failure and counteract the myth that others only have smooth careers.
  • Several note that rejections hurt most during time‑boxed phases (e.g., PhDs) because of “publish or perish” pressure and prestige expectations.

Imposter Syndrome and Perception

  • A key theme: people mostly see others’ successes and controversies, not their mundane failures, which feeds imposter syndrome.
  • Some say seeing extremely successful people talk about rejections is reassuring; others admit it doesn’t fully cure feelings of inadequacy.

Peer Review, Journals, and Incentives

  • Many describe peer review as noisy and sometimes arbitrary: different reviewers, conflicting criteria, weak or off‑base reviews.
  • Double‑blind review is unevenly applied and often ineffective in small fields where identities can be guessed.
  • Reviewers and editors are usually unpaid; reviewing is part of “being a good citizen,” but quality and effort vary widely.
  • Journals are seen as competing for limited reader attention and “prestige points,” which drives high rejection rates and selectivity.

Partial Results vs. “Impact”

  • A widely discussed anecdote: a partial solution to a major problem was rejected for not solving it fully; the later full solution was rejected as only a small improvement over the partial result.
  • Some argue this is rational triage under scarce “top journal” space; others see it as missing the bigger picture and discouraging incremental, collaborative progress.
  • Tension highlighted between discouraging “salami slicing” (minimal publishable units) and avoiding incentives to hide intermediate results.

Status, Bias, and Gatekeeping

  • Experiences shared where attaching a famous coauthor moved work from mid‑tier to top “high impact” venues, suggesting name‑based bias.
  • Others note that famous researchers still getting rejected shows the system is not purely name‑driven rubber‑stamping.

Alternatives, Reforms, and Anecdotes

  • Proposals include arXiv‑style open publishing with community curation, reputation, and open reviews; skeptics worry about noise, trolling, and gaming.
  • Several anecdotes illustrate odd rejections, mistaken plagiarism flags, and major results initially dismissed as “out of scope,” reinforcing that rejection often says more about the venue and process than the work.

How and Why I Stopped Buying New Laptops (2020)

Longevity of Older Laptops

  • Many commenters still use 2010–2017 laptops (notably 2015-era models) and expect 8–10+ years of life with SSD and RAM upgrades.
  • Main pain points: battery replacement (sometimes no official batteries), OS support ending, and modern software not being optimized for older hardware.
  • Some feel older business laptops (ThinkPads, Latitudes, Fujitsu “pro” lines) are sturdier, more repairable, and have better keyboards than modern thin-and-light machines.

Performance, Efficiency, and New Platforms

  • Apple Silicon and similar low-power platforms are praised for huge battery life, cool/quiet operation, and good trackpads; some say that alone justifies upgrading.
  • Others argue x86 efficiency (Intel Lunar Lake, AMD laptops) has improved enough to narrow the gap, especially for gaming and GPU-heavy workloads.
  • There’s tension between “my old quad-core is fine for web/office/dev” and “new CPUs/GPUs massively outperform for 4K, 3D, AI, and video editing.”

Repairability vs. Convenience (Framework, ThinkPad, etc.)

  • Strong support for modular/repairable laptops: replaceable keyboards, SSDs, RAM, and batteries are seen as key to long lifespans.
  • Framework is praised for repairability but criticized for higher price, lower battery life, more heat/noise, and weaker GPU options versus mainstream or Apple laptops.
  • Some accept soldered RAM (if the capacity is high enough) but strongly reject soldered SSDs, mainly for data-recovery reasons; others prioritize long battery life and low weight over replaceability.

Used/Refurb Market and Economics

  • Many report excellent value from refurb Dells, ThinkPads, and HPs for a fraction of new prices, especially once HDDs are swapped for SSDs.
  • Others avoid used laptops due to limited remaining lifespan, cosmetic damage, poor batteries, or past bad experiences.
  • Anticipated Windows 10 end-of-support may create a “glut” of used machines; disagreement on whether this will meaningfully boost desktop Linux adoption.

OS Choices, Bloat, and Workloads

  • A recurring theme: hardware is “ridiculously powerful,” but heavy software (especially web/JavaScript) erases gains.
  • Strategy suggested: run lightweight Linux/BSD, avoid “newer, more demanding software,” and extend hardware life.
  • Some workloads (Qubes OS, multiple VMs, GPU AI, advanced network routing) are cited as legitimately requiring modern, high-end laptops.

Cloud, Backup, and Storage Practices

  • Debate over cloud-sync vs local SD/SSD: cloud works well in high-bandwidth regions, but travelers and users with expensive/slow connectivity see it as unreliable or costly for full restores.
  • Discussion emphasizes: backups are essential, but replaceable SSDs add an extra recovery path that backups alone don’t fully cover.

Pornhub Is Now Blocked in Almost All of the U.S. South

Scope of the bans and who is “blocking”

  • Several southern U.S. states require “reasonable” age verification for sites with material “harmful to minors.”
  • Pornhub has chosen to withdraw service rather than implement these systems; some commenters stress that the laws are the underlying cause, but the immediate block is Pornhub’s own choice to exit rather than comply.
  • In at least one state (Utah), government did not provide a usable verification mechanism; in Tennessee, a similar law was reportedly blocked by a federal judge.

Obscenity, youth access, and social effects

  • Discussion of historic obscenity regulation (e.g., Miller test) and how the internet largely bypassed practical enforcement.
  • Concerns: porn as hyperstimulus, dopamine-driven habits, distorted expectations in relationships, possible effect on birth rates and social isolation.
  • Counterpoints: most users are not addicted, porn use may correlate with lower rape rates, and reduced teen pregnancy.
  • Some suggest violence in media is a bigger social harm than sexual content.

Age verification mechanisms and privacy

  • Laws often allow methods like matching a live photo to government ID, or using “commercially reasonable” transactional data.
  • Many see fundamental differences between flashing ID in person and uploading/streaming ID data online, given breaches, tracking, and lack of audited, trustworthy providers.
  • Cryptographic proposals (zero-knowledge proofs, anonymous tokens, PrivacyPass-like systems, mobile driver’s licenses) are discussed, but noted as not standardized, not widely deployed, and not officially accepted.
  • Disagreement on whether proper auditing requires some retention of verification records, which conflicts with privacy promises.

Free speech, censorship, and definitional creep

  • Strong concern that these laws erode online anonymity and normalize ID-gated speech.
  • Worry that vague standards like “prurient interest” and “harmful to minors” could be applied selectively (e.g., against LGBTQ content) or expanded to other “obscene” or political content.
  • Others argue communities should be able to restrict porn while still protecting general speech.

Effectiveness and unintended consequences

  • Many predict users will migrate to less regulated, potentially more exploitative foreign sites, or to porn on major social platforms and chat apps.
  • Expectation of increased VPN use; some fear future attempts to block VPNs or build “mini–Great Firewalls.”
  • Some think bans are mainly symbolic, driven by religious or moral agendas rather than realistic harm reduction.

Views on Pornhub and porn “addiction”

  • Split views on Pornhub’s responsibility: some praise its privacy stance and self-policing; others share anecdotes of slow removal of illegal content and past lax practices.
  • Debate over “porn/sex addiction”: some see it as a serious dopamine-driven problem; others note that “sex addiction” is not in DSM-5, and that only a minority meet criteria for compulsive behavior.
  • Broader themes: loneliness, dating app dynamics, MeToo-era norms, and whether porn is a cause or symptom of social disconnection.

Alternative solutions and parental control

  • Suggestions: better sex education; non-commercial or public dating platforms; improved parental tools and device-level filters rather than state mandates.
  • Some favor leaving access legal for adults and making parental responsibility, not state censorship, the main control mechanism.

Why does storing 2FA codes in your password manager make sense?

Perceived Benefits of Storing 2FA in Password Managers

  • Main pro: easier setup, backup, and recovery; reduces risk of losing access when a phone is lost or upgraded.
  • Encourages wider 2FA adoption; many users will skip 2FA if it can’t be kept with passwords.
  • Autofill tied to domains can block many phishing attempts because credentials and TOTP won’t appear on mismatched sites.
  • For some, storing TOTP with passwords is seen as “good enough” and better than SMS or email 2FA.

Security Concerns and Arguments Against

  • Combining passwords and TOTP in one vault collapses two factors into one compromise point.
  • If a password manager is breached or a shared vault misused, attacker may get both password and 2FA secrets, turning 2FA into near-1FA.
  • Some see TOTP-in-password-manager as feature creep driven by convenience and marketing, not security.
  • Storing TOTP separately (another device/app, paper, second vault) forces an attacker to break two independent systems.

Phishing, Credential Stuffing, and 2FA’s Real Role

  • Disagreement on claims that TOTP’s “main advantage” is phishing resistance: TOTP and SMS codes are still phishable, especially with real‑time proxy attacks.
  • Several argue 2FA’s key value is preventing credential stuffing when users reuse passwords.
  • Time‑limited codes constrain a stolen credential to a short window, but automation can still exploit that.

Passkeys, Hardware Keys, and Factor Models

  • Passkeys/hardware keys highlighted as phishing‑resistant and better than TOTP in that respect.
  • Some note inconsistency: critics dislike co‑locating TOTP with passwords but accept passkeys tied to the same device/ecosystem.
  • Debate over the classical “something you know / have / are” model; some view almost everything as ultimately “something you know” (bits), others still find the model useful.

Usability vs. Purism

  • “Purist” stance: keep TOTP off the password manager for stronger separation.
  • Pragmatic stance: if separate 2FA causes lockouts or discourages 2FA, then storing TOTP in a good manager is a net win.
  • Especially for non‑technical users, passwords + TOTP in one manager may be significantly safer than weak or reused passwords without 2FA.

Alternative Setups and Backups

  • Common patterns:
    • Passwords in one manager, TOTP in a separate app (e.g., Aegis, Authy, FreeOTP+, Google Authenticator with sync/export).
    • File‑based managers (KeePass variants) with strong master passwords, keyfiles, or hardware keys; manual sync and offline backups.
    • Redundant devices for TOTP, printed or CSV‑based encrypted backups, or minimal “emergency” TOTP vaults on paper.
  • Some prefer hardware‑key 2FA (e.g., FIDO/WebAuthn, multiple keys) for critical accounts; TOTP‑in‑manager only for low‑value or forced‑2FA sites.
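The TOTP codes debated throughout this thread are just HMAC over a time counter; the following stdlib-only sketch of RFC 6238 (it is not any particular password manager's implementation) shows why a leaked shared secret is far worse than a leaked code: the secret regenerates every code forever, while each code dies with its 30‑second window.

```python
import base64, hashlib, hmac, struct, time

def totp(secret_b32, at=None, step=30, digits=6):
    """Minimal RFC 6238 TOTP: HMAC-SHA1 over the current time-step counter."""
    key = base64.b32decode(secret_b32.upper() + "=" * (-len(secret_b32) % 8))
    counter = int(time.time() if at is None else at) // step
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)

# RFC 6238 test key ("12345678901234567890" in base32) at t=59 seconds:
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", at=59))  # → 287082
```

Anything holding `secret_b32` can mint valid codes indefinitely, which is the "two factors collapse into one" concern when the secret sits next to the password in the same vault.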

H5N1: Much More Than You Wanted to Know

Pandemic mitigation and social cohesion

  • Many commenters doubt society’s current ability or willingness to sustain strong pandemic measures; some argue we never had it, others say trust was lost through incompetence, mixed messaging, and politicization.
  • Several note that modest, targeted measures (e.g., localized bans, good ventilation, masking when sick) are easier to sustain than broad lockdowns.
  • Some suggest that strong future measures would only be accepted if children were heavily affected; others argue even that might not overcome political and cultural resistance.

Interpreting the COVID-19 experience

  • Strong disagreement over how severe COVID was and how well it was handled:
    • Some claim it posed low risk to healthy non-elderly adults, hospitals rarely truly overflowed, and models and media exaggerated.
    • Others counter with overwhelmed hospitals, excess deaths, long COVID, and comparisons to TB/HIV mortality.
  • Disputes over model quality, hospital utilization, triage stories, and whether “sceptic” narratives or official narratives better matched reality.
  • Significant frustration over early mask messaging, perceived double standards (e.g., protests vs churches), and politicization from multiple sides.

Ethics, risk, and protection of vulnerable groups

  • Tension between:
    • “Let low-risk people live normally; isolate the elderly/at-risk.”
    • Counterargument that vulnerable people depend on complex care networks, making true isolation impossible and inhumane.
  • Some adopt a harsh “let refusers die / no obligation to save everyone” stance; others emphasize social responsibility and empathy.
  • Long-term impacts (e.g., long COVID, diabetes risk in children) cited as reasons to treat even “mild” pandemics seriously.

H5N1 current risk and drivers

  • H5N1 has moved from birds into other animals (minks, cows, pigs). Pigs are highlighted as concerning “mixing vessels” for reassortment with human flu.
  • Risk is seen as elevated but still only moderately above the background chance of a new flu pandemic.
  • Co‑infection (human flu + H5N1 in the same host) is described as a plausible path to human-to-human transmissibility; this is why flu vaccination for livestock workers is emphasized.

Surveillance and comparison to COVID

  • H5N1 is seen as better tracked because it mainly hits predictable, rural animal-exposed populations.
  • COVID appeared suddenly in a dense urban area, with many mild/asymptomatic cases, making early tracking harder.
  • Some question whether mild human-to-human H5N1 transmission might already be occurring but largely missed; reliance on wastewater and limited clinical testing is noted, but the extent is unclear.

Vaccines and public health tools

  • Existing H5N1 vaccines for humans and animals are mentioned; reasons given for not yet adding H5N1 to annual flu shots:
    • Very low current human incidence.
    • Likely antigenic drift before a pandemic strain emerges.
  • Some argue we should vaccinate poultry/dairy workers now to reduce animal–human crossover opportunities.
  • Discussion that many flu vaccines, and animal vaccines in particular, primarily reduce severe illness rather than fully blocking transmission; suggestion that “vaccine” as a term can be misleading, but others point out this is consistent with standard definitions.
  • For broader respiratory risk, commenters promote:
    • Indoor air quality standards (CO₂ limits, fresh air exchange).
    • Upper-room UVGI.
    • Ongoing high-grade masking in healthcare settings.

Immunity, genetics, and imprinting

  • The article’s point about “immune imprinting” from first flu exposure is contrasted with a cited paper (from Science) suggesting host genetics may be more important in determining which strains people handle best.
  • No consensus emerges; thread simply notes that both factors may matter, relative importance unclear.

Prediction markets and forecasting

  • Mixed views on prediction markets (Metaculus, Polymarket, etc.):
    • Some say long-dated, low-liquidity markets overestimate tail risks due to gambler incentives and long lock-up.
    • Others criticize specific platforms for poor calibration; defenders counter that having real (or even play) money on the line tends to improve accuracy.
  • Overall, prediction markets are seen as informative but imperfect tools for pandemic risk estimation.

Agricultural practices and structural drivers

  • Large-scale, high-density industrial farming is blamed by some for increasing opportunities for viral recombination and evolution.
  • Concern that effective control may require culling herds and flocks, with downstream impacts on food prices (eggs, beef).

Uncertainties and open questions

  • How likely H5N1 is to acquire sustained human-to-human transmission, and on what timescale, remains unclear.
  • The true current level of human infection (especially mild/asymptomatic cases) is also unclear given limited routine testing.
  • Debate persists on what risk threshold justifies major restrictions versus targeted protections and infrastructure upgrades.

Supernovae evidence for foundational change to cosmological models

Access, Code, and Reproducibility

  • Several commenters note the main site is JavaScript-heavy and share a direct PDF link.
  • The authors’ Python analysis code and data are linked, but attempts to reproduce the results run into missing calibration files (.FITRES), reliance on Python 2.7, and unspecified dependency versions.
  • Commenters see this as a barrier to independent verification, even for technically capable readers.

Timescape / Inhomogeneous Cosmology Explained

  • Multiple lay-level explanations: relax the cosmological principle and allow large-scale inhomogeneities.
  • Idea: regions with different mass density experience different proper times; voids age faster than overdense regions like galaxy clusters.
  • Cosmological time (a global coordinate time tied to the CMB rest frame) may diverge from local proper time, potentially altering inferences from supernova distances and redshifts.
  • Questions arise about how this squares with relativity, constancy of the speed of light, and simultaneity; replies stress that in curved spacetime “speed” is local and light-cone structure is what matters.

Consequences for ΛCDM, Dark Energy, and Dark Matter

  • Enthusiastic commenters feel ΛCDM is ad hoc (dark matter + dark energy likened to “epicycles”) and welcome a GR-based alternative that drops homogeneity.
  • Others stress that the paper only addresses supernovae, one “pillar” of cosmology, while ΛCDM also fits CMB, BAO, and other data.
  • One critique: for timescape to replace dark energy, void clocks must run ~38% faster than cluster clocks, implying a density contrast ∼100,000× larger than observed by other methods.
  • There is disagreement over whether inhomogeneity really reduces free parameters versus simply shifting complexity.
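The ~38% figure in the critique above can be stated schematically. This is only an illustration of what the claim asserts, not the paper's actual derivation:

```latex
% If void and "wall" (cluster-bound) observers accumulate proper time at
% different rates relative to a shared coordinate time, the critique says
% timescape needs roughly
\frac{d\tau_{\text{void}}}{d\tau_{\text{wall}}} \approx 1.38
% i.e. voids having aged ~38% more over cosmic history, so that fits made
% with wall clocks mimic acceleration without a cosmological constant
% \Lambda. The objection is that the density contrast required to source a
% clock-rate gap this large far exceeds what other observations allow.
```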

Compatibility with Other Observations

  • Some argue large voids and structures already strain the cosmological principle; others counter that galaxies in voids tend to be younger, not older.
  • Concerns raised: can timescape handle CMB constraints and the Hubble tension, or might it only solve one issue while worsening others? Unclear from this paper alone.
  • One line of discussion suggests that dropping isotropy opens more radical possibilities (e.g., irregular spacetime topology), but these are speculative within the thread.

Scientific Practice and Philosophy

  • Strong criticism of “shut up and calculate” and of routine assumption of ΛCDM in papers; some ex-researchers say this discouraged alternative thinking.
  • Counterpoint: this work itself is mostly a heavy statistical calculation, not a new theory, and “more work is needed” across many datasets.
  • Extended debate over whether science should use more systematic/automated hypothesis testing versus the reality that experiments and data pipelines are highly bespoke and hard to standardize.

Databases in 2024: A Year in Review

Tone and Style of the Review

  • Many readers enjoy the humorous, irreverent style and pop-culture/celebrity tangents; others find it bombastic, overly focused on drama and fundraising, and light on technical depth in places.
  • The recurring jokes about a certain Oracle billionaire are widely read as satire, though some find the “fawning” or amount of space spent on him odd or irrelevant.

Redis, SQL, and Data Models

  • A large subthread debates criticism of Redis’s API and type system from the linked video.
  • Critics of the video say it misunderstands Redis’s “data-structure server” model, over-indexes on “it’s not SQL,” and ignores powerful features (sorted sets, probabilistic structures, queues, leaderboards, real-time use cases).
  • Defenders summarize the criticisms as: inconsistent commands by type, dynamic typing on keys, and “fake” transactions via MULTI/EXEC.
  • Further debate covers whether Redis’s semantics resemble a dynamically typed global-variable store vs. statically typed SQL schemas.
  • Performance claims are contested: one side calls Redis “slow” due to single-threading and network hops; others say it’s more than fast enough for its niche and point to alternatives (Dragonfly, Garnet).

SQL’s Dominance and Alternatives

  • Multiple comments agree with the article’s “SQL is king” framing but note SQL’s ergonomic flaws and limited recursion.
  • Some argue that non-relational data models warrant non-SQL languages and that not all roads lead back to SQL. Others counter that many non-SQL systems eventually add SQL layers.
  • There’s appreciation for new query languages (e.g., PRQL, Datalog variants) but skepticism about their adoption barriers.

Major Vendors and SQL Server

  • Several note the article largely ignores SQL Server and other classic enterprise DBs (Oracle, DB2, Teradata, etc.).
  • Opinions on SQL Server: technically strong, “boringly reliable,” with excellent tooling and OLAP/ETL/reporting stack, but increasingly sidelined by licensing cost and the rise of Postgres/MySQL.
  • Disagreement over scalability: some say it scales fine; others claim Oracle scales better at true company-wide scale.

Startups, OtterTune, and Licensing Drama

  • Readers are struck by how a well-funded, well-credentialed optimization startup died quickly, reinforcing how hard DB startups are.
  • There’s curiosity (and some criticism) around the story of a failed acquisition by a private-equity-backed Postgres company and the resulting informal “ban” on that firm recruiting from a university group; some see that as fair warning to students, others as questionable.
  • The broader license-change section sparks discussion about why Redis/Elasticsearch triggered forks but MongoDB/Neo4j/Cockroach/Confluent Kafka didn’t; commenters cite original license choice, size of contributor communities, and real-world impact.
  • ScyllaDB’s license shift is noted as practically unforkable due to codebase complexity and contributor scarcity.

Other Systems and Ecosystem Notes

  • DuckDB is widely praised as a “shove it everywhere” analytics engine, though a few report stability issues and slow bug triage.
  • Graph vs. relational: newer relational systems (Umbra, CedarDB) tout strong graph workloads; commenters note that good planners/compilers narrow the gap, with graph DBs mainly winning on extreme traversals.
  • Greenplum’s trajectory and the Cloudberry fork (now Apache) are discussed as examples of open vs. closed evolution.

Cloud vs. Self‑Managed and Cost

  • Several comments explore when self-managed databases beat cloud DBaaS economically; anecdotes suggest the crossover can be very early for some teams.
  • There’s skepticism of high-priced cloud warehouses (e.g., Snowflake) versus cheaper, mixed stacks (DuckDB, Iceberg/Hudi, S3 tables, Vertica, Ocient, Yellowbrick).

DOOM CAPTCHA

Overall Reception

  • Many find the DOOM CAPTCHA hilarious, nostalgic, and technically impressive; it “hits just right” as both demo and satire.
  • Others find it excessively difficult and unusable, calling it “anti-human” or joking that failure proves they are bots.
  • Several say it’s still preferable to conventional image-based CAPTCHAs; others insist they would abandon any real site that used something this hard.

Gameplay, Difficulty & Strategies

  • The level is identified as E1M9 on Nightmare with a pistol start, which is notoriously hard even in the original game.
  • Lack of obvious strafing and non‑modern controls (arrow keys instead of WASD, no mouselook) dramatically increase difficulty.
  • Reported winning tactics:
    • Don’t move or just step forward then immediately back and hold fire.
    • Back into the starting doorway and snipe enemies at range.
    • Hug walls and pull enemies into a firing lane.
  • Many players cannot pass without cheating; others beat it in 1–3 tries and argue it’s manageable once you understand old-school Doom.

Controls, Mobile & Accessibility

  • No WASD or mouse fire; default is arrows + space, with strafing via Alt, comma/period, or < > depending on setup.
  • Platform issues: some browsers intercept Alt+arrow; some keyboards lack arrows or use non‑QWERTY/custom layouts.
  • Mobile has an on-screen pad, but multiple reports say it doesn’t appear or doesn’t register kills; lack of strafing on touch makes it “a shooting gallery.”
  • Many point out this is highly inaccessible for people with disabilities and non-standard input setups.

Cheats, Determinism & Security

  • Classic cheat codes work (IDDQD, IDKFA, IDSPISPOPD, IDCLEV, IDCLIP), though some weapons and kill types don’t count (e.g., rocket gibs, infighting, shareware‑only arsenal).
  • Users show trivial bypasses (e.g., calling Module.onEnemyKilled() in the console), noting this is only a proof-of-concept.
  • Discussion suggests more secure variants: randomizing spawns, recording inputs/demos and replaying server-side, using Doom’s determinism as a verifiable proof of work.
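The "record inputs and replay server-side" idea works because classic Doom is fully deterministic given a seed and an input demo. A toy sketch of the scheme (hypothetical names; nothing here is the actual CAPTCHA's code): the server issues a seed, the client plays and submits its input log, and the server replays the same simulation and compares a state hash.

```python
import hashlib, random

def simulate(seed, inputs):
    """Toy deterministic 'game': a seeded RNG plus the input log fully
    determine the outcome, so any replay with the same (seed, inputs)
    reproduces the same kills and the same state hash."""
    rng = random.Random(seed)
    kills = 0
    for key in inputs:
        enemy_hp = rng.randint(1, 3)  # enemy state depends only on the seed
        if key == "FIRE" and enemy_hp == 1:
            kills += 1
    digest = hashlib.sha256(f"{seed}:{kills}".encode()).hexdigest()
    return kills, digest

# Server issues a seed; client returns its inputs and claimed result;
# server replays to verify instead of trusting the client's counter.
seed, inputs = 1234, ["FIRE", "LEFT", "FIRE", "FIRE"]
client_kills, client_hash = simulate(seed, inputs)   # computed client-side
server_kills, server_hash = simulate(seed, inputs)   # replayed server-side
assert client_hash == server_hash                    # determinism guarantees a match
```

This closes the `Module.onEnemyKilled()` console bypass: a forged kill count fails the server's replay unless the attacker also produces an input log that legitimately achieves it.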

AI Tools & Captcha Philosophy

  • The page layout was largely created using Vercel’s v0 assistant; commenters note the chat log is mostly UI tweaking, with DOOM integration coded separately.
  • Some see this as a fun UX mockup rather than a serious CAPTCHA; others debate future viability as bots, RL agents, and agents-as-a-service become common.
  • Broader captcha discourse emerges: frustration with reCAPTCHA/hCaptcha, concern about accessibility, and skepticism toward Apple’s device-based “automatic verification” on privacy grounds.

Show HN: API Parrot – Automatically Reverse Engineer HTTP APIs

Product concept and capabilities

  • Tool captures HTTP traffic from a browser, infers API structure, and visualizes request/response flows.
  • Main value: automatic correlation of data across requests by decomposing JSON and other structures into trees and matching repeated values.
  • Users report it’s intuitive and helpful for understanding complex/legacy vendor APIs and spotting architecture/performance issues.

Platform availability and installation

  • Early comments requested a macOS version; one is later announced, but it’s not code-signed and requires a quarantine workaround.
  • Linux users report issues launching Chrome (wrong command, Windows-specific assumptions). Workarounds involve manually starting Chrome with proxy and certificate arguments; a configurable launch command is planned.
  • AppImage install path and permissions confuse some Linux users; docs were updated after feedback.

Data modeling and technical details

  • Data correlation works by recursively breaking down structures (e.g., arrays, objects), then matching identical values across requests.
  • Currently supports HTTP only; WebSocket support is acknowledged as harder due to binary formats and is not implemented.
  • Non-JSON formats are partially supported; multipart form data is not yet.
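The correlation approach described above (recursively decompose structures, then match identical values across requests) can be sketched in a few lines. This is a hypothetical illustration of the technique, not API Parrot's actual code:

```python
def flatten(obj, path=""):
    """Recursively decompose JSON-like data into (dotted-path, value) leaves."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from flatten(v, f"{path}.{k}" if path else k)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from flatten(v, f"{path}[{i}]")
    else:
        yield path, obj

def correlate(requests):
    """Map each leaf value to every (request, path) where it occurs; values
    seen in more than one request are candidate correlated fields."""
    seen = {}
    for name, payload in requests.items():
        for path, value in flatten(payload):
            seen.setdefault(value, []).append((name, path))
    return {v: locs for v, locs in seen.items()
            if len({n for n, _ in locs}) > 1}

reqs = {
    "login_resp":  {"session": {"token": "abc123"}, "user_id": 42},
    "profile_req": {"headers": {"auth": "abc123"}, "id": 42},
}
print(correlate(reqs))
# "abc123" and 42 each appear in both requests, linking the token from the
# login response to the auth header of the later profile request.
```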

Missing features and roadmap requests

  • Frequent requests: macOS build (now provided), OpenAPI/Swagger export, SDK generation, better browser choice (not only Chrome), and explicit ToS/license info.
  • Users ask about support for URL path variables, query parameters, and noise filtering; those are said to be supported.
  • Feature suggestions include UI refinements, adjustable layout, better defaults for naming and resizing, and optional newsletter for releases.

Stability, bugs, and UX

  • Reports of crashes on large GraphQL responses and missed captures when certain requests occur on initial page load.
  • Some find the website’s animated “snake” distracting.
  • Overall UI and docs receive praise, with minor usability nitpicks.

Comparisons and ecosystem

  • Compared to Postman’s capture features, mitmproxy2swagger, and Integuru/Integru; differentiation is seen in correlation/visualization, but details remain somewhat unclear.
  • Naming of both this tool and competitors is discussed as affecting memorability.

Broader OS debate

  • A substantial subthread debates macOS dominance in dev tooling, cross-platform issues, ARM vs x86, Docker abstraction, and the practicality vs “harm” of developing on macOS while deploying to Linux.

30% drop in O1-preview accuracy when Putnam problems are slightly variated

Benchmark contamination & “training on the test”

  • Many assume Putnam problems are in LLM training corpora, since the archive is public and models are trained on “whatever they can get.”
  • Some argue this is not “cheating” because Putnam is not an official benchmark used by labs, unlike held‑out sets such as MMLU, ARC‑AGI, or FrontierMath.
  • Others counter that once any problem set becomes a de facto yardstick in media or social media, vendors are incentivized to overfit to it, explicitly or via data contamination.
  • There’s disagreement over how rigorously big labs de‑duplicate or exclude benchmark data at web scale, and how much to trust their assurances.

Pattern-matching vs generalization

  • The 30% accuracy drop under small variations (renaming variables, changing constants, minor structural tweaks) is widely read as evidence of heavy pattern matching and memorization.
  • Some see this as “overfitting” or “teaching to the test,” not robust mathematical understanding.
  • Others emphasize that performance only partially degrades, not to zero, which suggests limited but real abstraction.
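The "small variations" at issue are mechanically trivial to generate, which is part of the argument: if renaming a variable or nudging a constant costs a model 30% accuracy, the original score reflected memorization. A toy sketch of such a variant generator (hypothetical; not the paper's actual pipeline):

```python
import random, re

def vary(problem, rename=None, seed=0):
    """Produce a surface-level variant: swap variable names, nudge constants.
    The underlying mathematical task is essentially unchanged."""
    rng = random.Random(seed)
    out = problem
    for old, new in (rename or {}).items():
        out = re.sub(rf"\b{old}\b", new, out)  # rename whole-word variables
    # Shift each integer constant by a small random amount.
    out = re.sub(r"\d+", lambda m: str(int(m.group()) + rng.choice([1, 2, 3])), out)
    return out

p = "Find all n such that n^2 + 2 is prime."
print(vary(p, rename={"n": "k"}, seed=1))
# e.g. the variable becomes k and the constants shift slightly; a solver
# with genuine understanding should be unaffected by either change.
```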

Comparisons to other benchmarks and models

  • Multiple references to o3 getting ~25% on the held‑out FrontierMath benchmark; supporters present this as strong evidence of genuine reasoning on unseen problems.
  • Skeptics question contamination claims, methodology (e.g., simulated Codeforces runs, number of submissions, non‑live evaluations), and note independent attempts often find weaker performance on live contests.
  • Several point out the paper tested o1‑preview; newer o1/o1‑pro reportedly do better on the same variations, but this might reflect retraining on the released dataset.

Test‑time compute and “reasoning models”

  • Discussion of o‑series models using test‑time compute, chain‑of‑thought, and likely some form of search/tree‑of‑thought, as distinct from older “one‑shot” next‑token models.
  • Some argue this is a meaningful step toward reasoning; others say it is still pattern‑guided search in latent space, not true generalization.

Toy tests, tricks, and failure modes

  • Many concrete examples: river‑crossing puzzles, “which is heavier” questions, counting letters in sentences, riddles about family relationships, and buoyancy subtleties.
  • These often expose that models latch onto familiar puzzle templates and ignore small but decisive wording changes, or invent plausible‑sounding but wrong explanations.
  • A recurring theme: models can be coaxed into correct step‑by‑step reasoning with explicit prompts, but default, fast answers are brittle.

Broader views on intelligence and impact

  • One camp says LLMs are just very strong pattern recognizers or “stochastic parrots,” incapable of the kind of conceptual leaps exemplified by, say, pre‑1905 derivation of relativity.
  • Others insist the line between human “understanding” and large‑scale pattern learning is blurry, and note that many humans also rely on exam cramming and template matching.
  • There’s meta‑debate about “moving the goalposts” for what counts as intelligence once models pass former milestones (Turing‑test‑like behavior, exam performance).
  • Economic anxiety surfaces: huge investment vs modest real‑world returns, fear of an “AI bust,” and suspicion that hype and selective benchmarking are driven by financial pressure.

Books I Loved Reading in 2024

Reading skill, education, and literacy

  • Several comments push back on “decline of literacy” takes, noting that many of the referenced books demand deep-reading skills that take sustained practice, which most people never get.
  • School reading assignments are criticized: one-text-for-all, boring canon choices, heavy homework, and little autonomy are seen as killing the joy of reading.
  • Some hope AI tutors and more choice of texts could nurture a love of reading; others argue the real issue is voters/taxpayers not funding better systems.
  • One commenter cites evidence that babies may have brain regions pre-wired to connect visual symbols and language, challenging “brains aren’t wired for reading.”

Why and how people read

  • Strong divide between reading for self-improvement vs pure pleasure.
    • Some reject “reading as self-optimization,” comparing it unfavorably to guilt-free Netflix watching.
    • Others say it’s fine to have explicit goals (skills, language learning, thinking tools) and still enjoy it.
  • Recurrent theme: it’s okay to quit books that don’t work for you; don’t read things only for external validation.
  • Some argue great literature trains critical thinking and empathy; skeptics note humanities majors aren’t obviously “wiser” than others.

Writing style and literary vs genre

  • Big thread on clear vs dense prose:
    • One camp says complex sentence structures that could be simpler are bad writing.
    • Another insists that intricate, indirect prose can itself be part of the message.
  • “Kill your darlings” comes up repeatedly: stylish, overly clever sentences are seen as tempting but often harmful to narrative flow.
  • Distinction is drawn between “literary” (character/idea-driven) and “genre” (plot-driven) fiction, with different but valid goals.

Difficulty and seriousness of books

  • Several recommend “gateway” prize-winning novels as accessible entries into serious literature; others counter that some high-prestige works are genuinely hard and demand training.
  • Debate around non-linear, rule-breaking novels and invented dialects:
    • Some readers bounce off them or feel “not smart enough.”
    • Fans say the discomfort is the point, and the payoff comes after pushing through and sometimes rereading.

Audiobooks, time, and reading habits

  • Many strategies to “find time”: reading at breakfast/bedtime, carrying a book everywhere, setting 30-minute daily blocks, using libraries and due dates, and escaping offline for days.
  • Audiobooks are heavily endorsed for commutes and chores; some question whether they “count” as reading, but most say they do for storytelling and many kinds of non-fiction.
  • People report reading dramatically more when they:
    • Replace phone/YouTube time with books.
    • Allow themselves to read “just for fun” instead of only educational texts.

Recommendations and diversity

  • The thread is packed with recommendations across:
    • Classics, prize-winners, existentialist and philosophical works.
    • Experimental/dialect-heavy novels and postmodern fiction.
    • Sci-fi, fantasy, horror, and “brain-bending” internet-born fiction.
    • History, economics, science (especially genetics, physics, systems theory), and memoir.
  • Some meta-lists and personal “best of 2024” posts are shared; one commenter notes their favorites are now largely by women and people of color and argues they’re producing much of the most exciting work.

Other debates

  • A brief tangent argues that documentaries are often misleading compared to books, especially on topics the viewer knows well.
  • There’s a small linguistic skirmish over singular “they”: some object to using it for an author whose gender isn’t known; others defend it as long-established English usage.
  • Several highlight how reading certain biographies or novels helped them understand neurodivergence, war crimes, tyranny, or historical injustice in more human terms.

Static search trees: faster than binary search

Overall reception

  • Many commenters found the write‑up unusually thorough and appreciated seeing low‑level optimization work laid out step by step.
  • Some readers admitted the later SIMD and micro-optimizations were beyond their comfort zone but still valued the earlier conceptual parts.
  • A few people explicitly linked this kind of work to real-world speedups that, in aggregate, save large amounts of human time.

Language choice and accessibility

  • Large subthread on whether Rust was a good choice for examples.
    • One side: C (or C++) is more universally readable; Python is popular for teaching; pseudocode is more language‑neutral.
    • Other side: blog authors can pick any language; Rust is now common enough that developers should at least be able to read it; the code is mostly straightforward imperative logic.
  • Some argued Python is not suitable for intrinsics and cache-level tuning; C/C++ or Rust are more appropriate.
  • Debate over pseudocode: some want high-level pseudocode plus implementation appendix; others say pseudocode is too underspecified, especially for SIMD and cache-line details.

Rust vs other systems languages

  • Rust praised for:
    • Portable SIMD in the standard ecosystem.
    • Tooling (cargo, build scripts) and safety model compared with C++.
  • Counterpoints:
    • Claims that C/C++ already have portable SIMD via libraries or compiler extensions.
    • Concerns about Rust’s popularity (e.g., TIOBE rank) and long-term stability vs yet‑newer languages like Zig.
    • Worries that relying on many third‑party crates is a security and legal risk.
  • Discussion of how hard certain data structures (graphs, doubly‑linked lists) are in Rust’s ownership model unless using indices or reference counting.

Algorithmic and performance discussion

  • Strong focus on constant‑factor speedups vs asymptotic big‑O; several note that practical wins often dwarf “better” theoretical algorithms.
  • Comparisons with:
    • Interpolation search (fast on uniform data, bad worst‑case behavior).
    • Eytzinger layout and cache‑aware trees, prefetching, batching queries, and SIMD vectorization.
    • B‑trees, buffer/fractal trees, compacting B‑tree variants, and static vs dynamic trees.
    • Bitmap/bitset approaches, roaring bitmaps, rank/select structures, minimal perfect hashing.
  • Ideas for further work: query partitioning and sorting, radix‑based schemes, compressed node representations, and van Emde Boas–style structures.
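As a concrete illustration of the Eytzinger layout mentioned above — a standard construction, not the article’s actual Rust code — here is a minimal Python sketch: the sorted array is stored in BFS order of an implicit binary tree, and a lower-bound search descends it with a branch-free comparison plus one bit trick at the end.

```python
def eytzinger(sorted_vals):
    """Store a sorted array in BFS (Eytzinger) order, 1-indexed."""
    n = len(sorted_vals)
    tree, it = [None] * (n + 1), iter(sorted_vals)

    def fill(k):                       # in-order walk of the implicit tree
        if k <= n:
            fill(2 * k)
            tree[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return tree

def lower_bound(tree, x):
    """First element >= x, or None if every element is smaller."""
    n, k = len(tree) - 1, 1
    while k <= n:
        k = 2 * k + (tree[k] < x)      # go right iff tree[k] < x
    k >>= (~k & (k + 1)).bit_length()  # undo the trailing right turns
    return tree[k] if k else None

t = eytzinger([1, 3, 5, 7, 9, 11, 13, 15])
print(lower_bound(t, 6), lower_bound(t, 20))  # 7 None
```

The cache win comes from the layout rather than the comparisons: each level of the implicit tree packs the ancestors of nearby keys into the same cache lines, which the article then pushes further with prefetching, batching, and SIMD.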

Use cases and practicality

  • Mentioned applications: DNA/suffix‑array indexing, search engines, SQL joins, and static indexes with occasional writes via a small mutable overlay.
  • Some skepticism about using such advanced material as interview questions; concern about mismatch with typical job work.

Presentation feedback

  • Several complaints about graph color choices making lines hard to distinguish; author acknowledges this as a to‑do.

Happy New Year 2025

Community appreciation & role of HN

  • Many participants describe HN as a daily habit or “home page,” often for a decade or more.
  • Several credit the community with shaping their careers, studies, and worldview.
  • HN is contrasted positively with other social platforms: high signal, little infinite scroll, “least guilt-ridden procrastination,” and unusually high discussion quality.
  • There is repeated gratitude toward the moderators and their empathetic, active style, with one linking a New Yorker article about their work.
  • Some note a migration path over the years (e.g., from Slashdot/Digg/Reddit to HN) and say HN is now their main or only news source.

Global greetings & multilingual flavor

  • New Year wishes come from around the globe: US (various time zones), UK, Europe, Asia (including China, Singapore, Japan), Australia, the Middle East, and more.
  • Many greetings are shared in languages/scripts besides English (e.g., German, Spanish, Norwegian, Polish, Maltese, Portuguese, and several Indian languages such as Telugu, Kannada, and Malayalam), sometimes with playful corrections or expansions.

Time, calendars, and numerology of 2025

  • Multiple comments explore mathematical curiosities:
    • 2025 = 45² = (20+25)² = 9²·5²; it is the square of 1+…+9 and equally the sum 1³+…+9³, the sum of the first 45 odd numbers, a base-20 palindrome, a Harshad number, and more.
    • Some meta-discuss which identities are “cheating” (derivable from others) and how such properties are found.
    • Historical perfect-square years are linked to major political and social shifts, with optimistic speculation about 2035.
  • Others note ISO 8601 rollover, Unix/hex timestamps, “0x37 years since epoch,” and the 2038 problem.
  • A few prefer solstice-based or “natural cycle” new years over the Gregorian date.
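The identities cited in the thread are easy to check mechanically; a short script covering them:

```python
assert 2025 == 45 ** 2 == (20 + 25) ** 2 == 9 ** 2 * 5 ** 2
assert 2025 == sum(range(1, 10)) ** 2               # square of 1+...+9
assert 2025 == sum(k ** 3 for k in range(1, 10))    # 1^3 + ... + 9^3
assert 2025 == sum(range(1, 90, 2))                 # first 45 odd numbers
assert 2025 % sum(map(int, "2025")) == 0            # Harshad: divisible by digit sum

def digits(n, base):
    out = []
    while n:
        out.append(n % base)
        n //= base
    return out[::-1]

assert digits(2025, 20) == [5, 1, 5]                # base-20 palindrome
assert hex(2025 - 1970) == "0x37"                   # years since Unix epoch
print("all 2025 identities check out")
```

As the thread’s meta-discussion notes, several of these are the same fact in disguise: the square-of-a-sum and sum-of-cubes identities coincide for every n (Nicomachus’s theorem), so 2025 gets both “for free” from being 45².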

Coding jokes & hacker culture

  • Playful snippets show ++year, React state updates, jQuery DOM mutations, infinite loop { year += 1; }, and “year of the desktop OS” jokes.
  • References to classic HN lore (e.g., famous threads, Dropbox/Putnam stories) appear.

Hopes, worries, and resolutions for 2025

  • Common wishes: peace, less polarization, personal growth, better health, meaningful work, and more kindness and love.
  • Some mention specific goals: learning to cook, adopting “yearly themes,” learning OCaml, writing more, working on fusion energy, cutting back on alcohol/weed.
  • A few darker or skeptical notes: reading pessimistic history/forecasting, joking about extinction-level events, concern about AI singularity arriving too soon.
  • There’s explicit appreciation for “real people” conversations, curiosity about whether such communities can become more common, and inclusive New Year wishes extended even to bots and animals.

FBI: Largest homemade explosives cache in agency history found in Virginia

No Lives Matter ideology and extremist ecosystems

  • Debate over whether “No Lives Matter” (NLM) is an organization, a loose ideology, or just an edgy meme.
  • Some link it to Telegram groups, the 764 network, MKY, and neo‑Nazi/Satanist currents like O9A; others stress that a meme patch doesn’t prove membership in anything.
  • Discussion of how symbols and memes serve as identity signals even without formal membership.

Encrypted apps and narrative framing

  • One line in the article about NLM coordinating via encrypted apps triggers concern that this will be used to justify backdoors or further surveillance.
  • Some think the sentence is accurate but manipulative, associating “encrypted apps” with “far-right ideologies.”
  • Others say it simply states the obvious (coordination not happening on public platforms) and don’t see it as a smear.

Explosives legality and technical points

  • Multiple comments explain that many explosives and precursors (Tannerite, black powder, ammonium nitrate, TNT) can be legal in the US under ATF rules, especially for agriculture, mining, and land clearing.
  • Pipe bombs and improvised devices can be legal only with proper destructive-device licensing and tax stamps, which are hard for individuals to obtain.
  • Some users dive into minutiae of ATF guidance, NFA classifications, constructive intent, and how personal vs commercial use is treated.
  • Technical aside on specific explosives (HMTD, ETN, TATP), their stability, and suitability as primaries vs main charges.

Why only charge a short‑barreled rifle?

  • Many note that, despite the explosive cache, the current federal charge is just possession of a short‑barreled rifle (SBR) without a tax stamp.
  • Explanations offered:
    • SBR charge is an easy “holding” count while a fuller case is built (superseding indictments later).
    • Some of the explosive materials/devices may technically be legal or hard to prosecute under current statutes.
  • Others are skeptical, seeing selective enforcement or “PR arrests,” and point out he’s out on bond, which they treat as evidence the threat may be overstated.
  • Long subthread on NFA constitutionality post‑Bruen, historical analogues, and claims that SBR restrictions rest on dubious precedent.

Guns, rights, and limits

  • Extended argument over the Second Amendment:
    • One side emphasizes an individual right, including historically broad arms (cannons, warships) and distrusts modern restrictions.
    • Others argue for strong regulation of high‑capacity or especially destructive weapons, raising hypotheticals like nukes to show limits are inevitable.
    • Militia clause interpretation, “well regulated” meaning then vs now, and current statutory definitions of militia are debated.
  • Some propose a licensing model akin to driving: universal right to qualify, but real training and vetting, plus red‑flag mechanisms; others fear “gotcha laws” and slippery slopes.

Threat assessment vs civil liberties

  • Tension between:
    • Those who think intervening early (when someone has explosives, extremist views, and past injuries from devices) is precisely what society should do.
    • Those who see a “DIY explosives enthusiast” with edgy memes, arguing intent to commit terrorism is unproven and that mere possession + speech shouldn’t be criminalized.
  • Discussion of whether bail being granted suggests the court did not view him as an imminent threat, versus the possibility that more serious charges are still coming.

Radicalization and conspiracy thinking

  • Users ask how someone ends up believing things like “government trains missing children as school shooters.”
  • Explanations offered: information‑diet spirals (cable news → fringe media), online echo chambers (4chan, YouTube recommendations, Telegram), flattery of the audience’s “insight,” and the emotional appeal of simple conspiracies vs complex reality.
  • Parallels drawn to broader trends: nihilistic slogans (“no lives matter”), dehumanization, and post‑modern propaganda pushing people toward believing nothing.

Cultural memories and normalization of explosives

  • Several nostalgic anecdotes about 1980s–1990s access to bomb recipes, fireworks, “Anarchist Cookbook,” BBS culture, and teenage pyromania.
  • Some contrast that looser era with today’s zero‑tolerance environment; others point out survivorship bias and historical fatalities.

Things we learned about LLMs in 2024

Energy use and climate impact

  • Several comments link 2024’s LLM boom to a surge in methane/gas power plants, arguing AI is extending fossil fuel lifetimes when emissions should be falling.
  • Others say AI is just one of many growing loads (EVs, reshoring, population) and that gas is still better than coal per kWh, though methane leakage may erase much of that benefit.
  • Some call for strict rules that data centers use renewables or fully internalize their carbon and water costs; a carbon tax (with either green subsidies or per‑capita rebates) is widely endorsed but seen as politically hard and needing global coordination.

Economics, business models, and AGI speculation

  • Thread debates whether current model inference is sold below energy cost; one correction says cheap models like Gemini/Nova at least cover energy, possibly helped by subsidies.
  • OpenAI’s very high valuation is questioned; some think it assumes regulatory capture or winning an “AGI race.” Others argue AGI, if real, would commoditize everything and erase most individual AI firms’ moats.
  • There’s broad agreement that model quality is converging and open weights (e.g., Llama-family) plus many hosts will push inference prices down, making long‑term margins thin.

Usefulness, slop, and criticism quality

  • Many see LLMs as power-user tools: extremely helpful but unreliable, requiring good prompting, context management, and manual verification.
  • Others report frequent hallucinations, shallow or incorrect summaries, and flood of low‑value “slop” content, especially when users are lazy or indiscriminate.
  • Some argue that dismissing LLMs outright is a mistake; what’s needed is better, more specific criticism and clearer guidance on where they work and where they don’t.

Coding and developer experience

  • Strong split: some report “spookily good” productivity gains (fast scaffolding, bug-spotting, DSL snippets, ad‑hoc tools, refactors); others see subtle bugs, fake APIs, and degraded code quality from overreliant colleagues.
  • Consensus that LLM-written code must be tested and code‑reviewed; they’re likened to overconfident junior devs.
  • Different models perform better in different stacks; Python/JS/React praised, Rust and some math-heavy or niche areas fare worse.

Agents, tooling, and local models

  • “Agents” is viewed as poorly defined marketing jargon; suggestions range from “multi-step workflows using tools” to “semi-autonomous software with goals.”
  • People like editor/CLI integrations and custom scripts for feeding codebases/docs into models.
  • Local models on high-RAM Apple laptops impress some, but GPU VRAM and power limits keep best models in data centers for now.

Social impacts and governance

  • Concerns include job displacement (especially knowledge workers), worsening inequality, content authenticity, medical/misinformation risks, and climate trade-offs.
  • Others highlight benefits: letting people ask “stupid” questions without judgment, tutoring, therapy‑like conversations, and faster access to complex information.

Zildjian, a 400-year-old cymbal-making company in Massachusetts

Sabian, Zildjian, and Family Splits

  • A major subthread covers the split that created Sabian in Canada from the Zildjian family, including the factory’s original role as a second Zildjian site outside the US.
  • Commenters compare this to other family-business splits (Adidas/Puma, Aldi Nord/Süd, bathhouses, etc.).
  • Sabian’s name origin (from family initials) and legal constraints on using “Zildjian” in Sabian branding are discussed.

Unions, Labor Strategy, and the Canada Move

  • Several posts tie the Canadian factory to efforts to avoid or hedge against US unionization (Teamsters), plus export advantages to Europe.
  • Opinions on unions diverge: some argue most businesses dislike unions for causing instability; others say good unions can add stability and simplify bargaining.
  • There is debate over US unions being corrupt or mob-influenced vs. historically important for labor rights. No clear consensus.

Alloy, Trade Secrets, and Manufacturing

  • Multiple comments emphasize that Zildjian’s edge is not just alloy composition (often described as B20 bronze) but secret process steps; access to these secrets is tightly restricted internally.
  • Spectrometry could reveal composition, but reproducing the process is portrayed as hard and even dangerous (reports of MIT attempts leading to exploding mixes).
  • Some note that ancient and Zildjian-related bronze-working methods remain partly “arcane” despite modern metallurgy.

Sound, Synthesis, and Electronic Drums

  • Cymbal sound is described as extremely complex (position, intensity, stick type, sympathetic resonance), making accurate physical modeling difficult.
  • Electronic kits and modeled cymbals are seen as useful but still noticeably inferior in feel and expressiveness to acoustics, especially for nuanced players.
  • Several note that Zildjians may not sound best in-room but record exceptionally well.

Brand, Logos, and Market Perception

  • Zildjian’s logo is widely praised as iconic; Sabian’s newer logo is widely disliked.
  • Perceived hierarchy: some place Zildjian and Meinl at the top, with Sabian and Paiste slightly below, though others strongly prefer Sabian or Paiste for particular styles.
  • Some drummers report disliking Zildjian entry-level lines or sticks, while others are devoted fans; model line (A, K, ZBT, etc.) matters heavily.

Historical Roots, Istanbul Lineage, and Name

  • Several posts trace the historical Istanbul factory and its continuation through “Istanbul” and “Istanbul Mehmet” cymbals, preserving 17th-century hand techniques and the “old K sound.”
  • The “Zildjian” surname is explained as Turkish “zilci” (cymbal/bell maker) plus Armenian “-ian” (son of), roughly “son of the cymbal maker,” reportedly granted by an Ottoman sultan.
  • There is detailed discussion of Ottoman Turkish vs. Arabic script and broader naming practices (occupational surnames, late adoption of family names).

Business Practices, Inventory, and Longevity

  • A small tangent contrasts Zildjian’s practice of large annual washer orders with just-in-time supply chains; some see their conservative approach as part of centuries-long resilience.
  • Others compare Zildjian’s age to very old Japanese firms and lament modern “build to flip” startup culture and venture-capital-driven short-termism.

Costs, Gear Choices, and Player Experience

  • Cymbals are noted as expensive (hundreds of dollars each), but still relatively accessible compared to high-end string or wind instruments.
  • Drummers swap experiences on durability (how playing technique affects breakage), model selection by genre, and how beginners often overplay and destroy gear faster.

The GTA III port for the Dreamcast has been released

Port overview & requirements

  • Port is based on the RE3 decompilation of GTA III, adapted to run on Dreamcast hardware.
  • Requires original game data (textures, models, sounds, fonts, etc.); it’s just an engine/port, not a full game.
  • Some hope for custom texture packs was expressed, but it’s noted that would also require replacing many other asset types.

Technical achievement & performance

  • Gameplay captures on real Dreamcast hardware show it running at roughly 15 fps, with some videos demonstrating VGA output on a CRT.
  • The port involved substantial work: e.g., converting model geometry from triangle lists to triangle strips to better fit the PowerVR GPU and improve throughput.
  • RenderWare as the original engine is mentioned as a factor that makes such ports more feasible.
  • Widely described as an impressive or even “incredible” port, though some compare it with other extreme ports (e.g., Tomb Raider on GBA) and call that praise hyperbolic.
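On the strip conversion: a list of n triangles needs 3n indices, while a strip needs only n+2, because each new index reuses the previous two (with alternating winding). A toy sketch of how a strip expands back into triangles — illustrative only, not the port’s actual code:

```python
def strip_to_triangles(strip):
    """Expand a triangle strip's index list into individual triangles.

    Winding order alternates each step so all triangles face the same way.
    """
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        tris.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return tris

# A quad as a strip: 4 indices instead of 6 for the equivalent list form.
print(strip_to_triangles([0, 1, 2, 3]))  # [(0, 1, 2), (2, 1, 3)]
```

Over long strips this roughly triples effective index bandwidth, which matters on the Dreamcast’s bandwidth-constrained, tile-based PowerVR pipeline.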

Legal and IP concerns

  • Multiple comments expect a takedown similar to earlier RE3 actions and urge others to mirror the code.
  • Debate over legality: reverse-engineered/re-implemented engines using original game logic are said to be in a legal gray area or outright infringing unless done via strict “clean room” methods.
  • Others argue that, legal or not, current copyright enforcement is socially harmful and overly aggressive.

Retrocomputing, preservation & emulation

  • Strong enthusiasm for old consoles as “immutable” long‑term platforms and de facto VMs that may outlast modern stacks.
  • Discussion of hardware longevity (capacitors, power supplies, optical drives) and replacement/repair paths, including flash-based optical drive emulators and FPGA systems (MiSTer, Analogue devices).
  • Some argue emulators with higher-res textures and better performance are more practical than real hardware; others respond that the hacking/engineering challenge is the real goal.

Dreamcast hardware & market legacy

  • Dreamcast is praised as elegant and relatively simple to develop for, especially compared to Saturn and contemporaries like PS2.
  • Debate over whether it was technically superior; consensus in-thread is that it had advantages (VRAM, texture quality, 480p output) but was not “vastly” more powerful overall.
  • Broader discussion on why Dreamcast failed: PS2’s DVD playback, PS1 legacy, stronger franchises, early discontinuation of Dreamcast, and Sega’s prior missteps.

Motivation

  • Several comments ask “what’s the point?” when easier ways to play GTA III exist.
  • Replies emphasize that the port is primarily about fun, technical challenge, and preservation, not about optimal gameplay experience.

Systems ideas that sound good but almost never work

Domain-Specific Languages (DSLs)

  • Major split: some report DSLs “never” working well; others cite many successes.
  • Success patterns: small, tightly scoped, well-documented DSLs; embedded DSLs in a host language (Lisp, Ruby, Kotlin, C#) with IDE support, autocomplete, and fast feedback.
  • Good for: letting domain experts encode rules (forms, ASIC configs, planners, reporting) without learning a full language; expressing behavior in domain terms.
  • Failure patterns: trying to “replace” an existing language (e.g., SQL) with a heavier DSL; large, growing DSLs that leak underlying complexity and become unmaintainable.
  • Disagreement over definitions: when is it a DSL vs “just an API” vs “a data format”; some insist syntax change is essential, others emphasize semantics and domain vocabulary.
  • Examples praised or criticized: regex widely cited as successful; HCL, XUL, E4X, and complex format strings cited as scaling poorly.
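The “small embedded DSL with host-language tooling” success pattern can be made concrete with a toy example (hypothetical names, Python as the host): rules are ordinary objects, so the domain expert writes declarative-looking code while keeping autocomplete, tests, and the debugger.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    field: str
    check: Callable[[object], bool]
    message: str

def required(field):
    return Rule(field, lambda v: v not in (None, ""), f"{field} is required")

def max_len(field, n):
    return Rule(field, lambda v: v is None or len(v) <= n,
                f"{field} longer than {n}")

def validate(form, rules):
    """Run every rule against the form, collecting failure messages."""
    return [r.message for r in rules if not r.check(form.get(r.field))]

# The "DSL" is just function calls, but reads as domain vocabulary:
FORM_RULES = [required("email"), max_len("email", 64), required("name")]
print(validate({"email": "", "name": "Ada"}, FORM_RULES))  # ['email is required']
```

This also shows why the definition debate is slippery: the same artifact is arguably a DSL (domain vocabulary, declarative reading) and “just an API” (plain host-language functions) at once.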

ORMs, SQL, and Abstractions

  • Many argue abstracting away SQL via DSLs/ORMs is costly: you must still understand relational design, indices, and query plans.
  • Complaints: hard to express complex queries or tune performance; debugging by printing generated SQL and round-tripping to a DB tool.
  • Counterpoint: ORMs can reduce boilerplate mapping and be productive once you already understand SQL and the DB.
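The point that the database can’t be abstracted away is easy to demonstrate: whatever layer generates the SQL, the query plan decides performance. A minimal stdlib sqlite3 sketch (no ORM required to see the effect):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql):
    """Flatten EXPLAIN QUERY PLAN output into one string."""
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT id FROM users WHERE email = 'ada@example.com'"
before = plan(q)   # a full table SCAN
con.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(q)    # a SEARCH ... USING INDEX
print(before, "->", after)
```

An ORM that emits this query benefits or suffers identically, which is why round-tripping generated SQL through EXPLAIN in a database tool remains a standard debugging step.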

Control Loops, Autoscaling, and Hybrid Parallelism

  • Control loops (autoscaling, load-based throttling) are standard but easy to get wrong: runaway feedback, conflicting loops, “poisoned” signals, and cascading throttling are common failure modes.
  • Some see Kubernetes and ELBs as proof control loops work; others say they work only after large, expert investments.
  • Hybrid parallelism (multiple hardware types or layers of parallelism) is powerful in HPC but often adds prohibitive complexity in typical systems work.
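The “poisoned signal” failure mode can be reproduced in a few lines. A deliberately naive proportional autoscaler (hypothetical numbers): each replica adds its own overhead (health checks, gossip) to the load metric, so scaling up inflates the very signal being controlled.

```python
def autoscale(replicas, observed_load, target=0.6, min_r=1, max_r=100):
    """One tick of a naive control loop: nudge replica count toward target."""
    utilization = observed_load / replicas
    if utilization > target:
        return min(replicas + 1, max_r)
    if utilization < target / 2:
        return max(replicas - 1, min_r)
    return replicas

replicas, real_load, overhead_per_replica = 2, 3.0, 0.5
for tick in range(40):
    observed = real_load + overhead_per_replica * replicas  # poisoned metric
    replicas = autoscale(replicas, observed)

# 3.0 units of real work ends up "needing" 30 replicas.
print(replicas)
```

Production autoscalers mitigate this with stabilization windows, bounded step sizes, and metrics that exclude self-inflicted load — part of the “large, expert investment” the thread refers to.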

APIs and “Let’s Just…” Ideas

  • “Let’s just add an API” is criticized as underestimating work: design, authz/authn, rate limiting, caching, correctness, error messages, versioning, and documentation.
  • Organizational issues (politics, unclear ownership, timelines) often sink API projects more than raw technology.
  • Similar caution applies to “let’s just sync the data,” cross-platform rewrites, P2P caches, anomaly detection, and event sourcing: all can work, but only with deep design and sustained investment.

Overall Tone

  • Broad agreement that these ideas are not inherently bad; they’re deceptively hard and frequently overused as premature optimization or resume-driven engineering.
  • Some see the article as overly pessimistic; others view it as a necessary warning against “let’s just” thinking.

Show HN: Watch 3 AIs compete in real-time stock trading

Project setup & data

  • System runs three LLMs (GPT‑4o, Gemini 1.5 Pro, Claude 3 Sonnet) that each pick one stock daily.
  • News source: latest ~50 market articles from Alpaca News API; trading via Alpaca with $5 per trade using fractional shares where supported, currently U.S. stocks only.
  • Only long buys are implemented so far; no shorting; most positions are still open, so only unrealized P/L exists.

Prompting & trading logic

  • Prompting includes explicit “market analyst” role, sector diversification, and focus on “hidden gems” vs mega‑caps.
  • Models must output structured JSON, justify a thesis, specify catalysts (earnings, FDA dates, launches, conferences), and give a precise holding period.
  • Holding periods are currently set once at purchase and not updated with new information; some see this as a key next improvement.
  • Prompts bias toward buying because they explicitly ask for a stock to buy and a holding period; users notice divergence from ad‑hoc ChatGPT answers.
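A sketch of what such a structured pick might look like, with illustrative field names (the project’s real schema isn’t shown in the thread), plus the kind of validation worth running before any order reaches the broker:

```python
import json

raw = """{
  "ticker": "XYZ",
  "action": "buy",
  "thesis": "Small-cap with an underpriced catalyst",
  "catalyst": "Q1 earnings call",
  "holding_period_days": 30
}"""

pick = json.loads(raw)

# Reject hallucinated or malformed picks here, not at the broker.
required = {"ticker", "action", "thesis", "catalyst", "holding_period_days"}
assert required <= pick.keys()
assert pick["action"] == "buy"                    # only longs are implemented
assert isinstance(pick["holding_period_days"], int)
assert 0 < pick["holding_period_days"] <= 365
print("pick accepted:", pick["ticker"])
```

Note that the buying bias commenters observed is structural: because the schema demands a ticker and a holding period, “don’t trade today” isn’t an expressible answer unless the schema allows it.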

Benchmarks, controls & evaluation

  • Multiple commenters call for benchmarks: S&P 500 (e.g., VOO), leveraged ETFs (e.g., TQQQ), and random or “monkey” bots as controls.
  • Others argue you’d need many independent runs to estimate Sharpe ratios; one run of three bots is statistically weak.
  • Debate around comparing to hedge funds and quant shops, with conflicting claims about realistic Sharpe ratios and long‑term returns.

Skepticism, risks & limitations

  • Many expect daily forced trading to underperform due to fees, slippage, and lack of an edge, citing research that most day traders lose money.
  • Some see the experiment as unscientific entertainment; others still find it a valuable “real‑world eval.”
  • Concern that LLMs may hallucinate financial narratives (e.g., a fictitious “Phase 3 Bitcoin ETF trial”) and favor trendy themes like crypto/AI.
  • Discussion of alpha decay: any consistently winning strategy would lose its edge once widely copied.

Technical & UX feedback

  • Users report UI quirks (scrolling issues) and repeated newsletter email bugs (bad verification URLs, rate limits, duplicate mailings).
  • Suggestions: show unrealized gains in headline stats, expose more of the analysis process, add countdown to next trade, show fractional share amounts.
  • Some request open‑sourcing code and support for more or newer models (e.g., Gemini experimental, o1, Llama via LiteLLM).

Darktable 5.0.0

Positioning and Terminology

  • Darktable is often perceived as a Lightroom alternative, though its site explicitly says it’s not a “free Lightroom replacement.”
  • Some dislike product descriptions that reference proprietary tools; others note “lighttable” and “darkroom” are longstanding film terms, not Adobe-specific.

UX, Complexity, and Learning Curve

  • Many find Darktable powerful but intimidating, with too many modules, overlapping tools, and a steep learning curve (e.g., Filmic RGB, color calibration).
  • Several argue it prioritizes technical control and color science over usability; complaints include clumsy interactions, confusing module duplication, and poor defaults on RAW import.
  • Suggestions include a “beginner/simple mode” that exposes only common tools, with advanced features opt‑in.

Library vs Folders and Workflow

  • Strong divide between users wanting simple folder-based workflows and those accepting or preferring catalog/databases.
  • Some hate mandatory “libraries/film rolls” and just want to browse and edit files in-place.
  • Others point out databases enable fast thumbnails, metadata search, facial recognition, and object detection.

Performance and Scaling

  • Mixed reports on performance: laggy on some systems, better on modern hardware.
  • Large libraries (tens of thousands of RAWs, multi‑TB collections) are a pain point for most tools; a few report success with Digikam and others with Lightroom/Photo Mechanic.

Migration and Lock‑In

  • Edits from Lightroom cannot realistically be migrated; proprietary processing pipelines make cross‑tool edit transfer effectively impossible for any RAW editor.
  • Some view this as a reason to avoid ecosystem lock‑in despite Lightroom’s superior “just works” experience.

Alternatives and Ecosystem

  • For raw editing: RawTherapee, ART, Capture One, DxO PhotoLab, ON1, ACDSee, Luminar, Nitro, Photomator, AfterShot, etc. Each has tradeoffs in quality, features, price, or camera support.
  • For organization/web: Digikam, PhotoPrism, Immich, LibrePhotos, Nextcloud Memories, tonphotos; many users mix specialized tools (e.g., Darktable for RAW, Digikam for DAM).

Forks and Open Source Governance

  • The Ansel fork aims to reduce bloat and fix architectural issues, but is criticized as immature, slow, and missing newer Darktable features.
  • There is extensive debate over project governance, design‑by‑committee vs. strong leadership, and how volunteer-driven OSS often drifts into feature bloat and weak product management.