Span<T>.SequenceEquals is faster than memcmp

Tiered compilation, microbenchmarks, and “regression”

  • An apparent .NET 9 “for loop regression” was investigated and found to be an interaction between the microbenchmark and tiered compilation, not an actual runtime regression.
  • Tiered compilation, Dynamic PGO (profile-guided optimization), and OSR (on-stack replacement) mean that methods start out minimally optimized and are recompiled once they have been called often enough or have looped heavily (OSR kicks in after ~50K iterations).
  • Some commenters criticize promotion thresholds based on call count rather than time spent, and argue the optimizer could weigh function size or runtime cost instead; others reply that the runtime cannot know the benefit or the compile cost in advance, and that code handling multiple concrete types complicates the decision further.
  • BenchmarkDotNet’s behavior (running until a time target) can obscure whether you’re measuring pre- or post-OSR code.
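
To make the tiering point concrete, here is a minimal hand-rolled timing sketch that measures the same method once cold and once after a warm-up phase. The names (TieringDemo, SumTo, Measure), the call counts, and the sleep duration are illustrative assumptions, not values taken from the discussion; the pause only gives the background JIT a chance to install optimized code, and actual numbers will vary by runtime version and configuration.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

public static class TieringDemo
{
    // Hypothetical workload. NoInlining keeps it a separate method,
    // so it goes through the tiers on its own.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static long SumTo(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            total += i;
        }
        return total;
    }

    // Times `calls` invocations of SumTo(n) with a Stopwatch.
    public static TimeSpan Measure(int calls, int n)
    {
        long sink = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < calls; i++)
        {
            sink += SumTo(n);
        }
        sw.Stop();
        GC.KeepAlive(sink); // keep the result observable
        return sw.Elapsed;
    }

    public static (TimeSpan Cold, TimeSpan Warm) Run()
    {
        // First call: includes JIT time and runs minimally optimized code.
        TimeSpan cold = Measure(calls: 1, n: 1_000);

        // Warm-up (assumed counts): call often enough to trigger promotion,
        // then pause so the background recompilation can finish.
        Measure(calls: 1_000, n: 1_000);
        Thread.Sleep(500);

        // Same measurement again, now most likely on optimized code.
        TimeSpan warm = Measure(calls: 1, n: 1_000);
        return (cold, warm);
    }
}
```

Comparing the two results from TieringDemo.Run() shows why a benchmark that does not control for warm-up can report a "regression" that is really a difference in which tier was measured.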

Why Span<T>.SequenceEqual beats memcmp in .NET

  • The performance gap isn’t “C vs C#” but P/Invoke and marshalling overhead vs a JIT‑inlined managed implementation.
  • SequenceEqual for spans/arrays/strings is highly optimized, uses portable SIMD and intrinsics, and can choose the widest supported vectors at runtime.
  • A P/Invoke call must set up a frame for the native transition and poll for GC, and it cannot be inlined; using fixed pointers or LibraryImport trims this overhead only slightly.
  • memcmp in the C runtime may be less aggressively tuned for modern SIMD than the .NET span helpers; some note that in C/C++ memcmp often compiles to intrinsics or bcmp.
  • Commenters emphasize that the lesson is: in modern .NET, the standard library’s span-based primitives are the right tool; P/Invoking memcmp is now a pessimization.
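
The two approaches being compared can be sketched side by side as follows. This is a minimal illustration, not the article's benchmark code: the class and method names are hypothetical, and the library name "libc" is an assumption that resolves on typical Linux and macOS systems (the older Windows-oriented advice imported memcmp from "msvcrt.dll" instead).

```csharp
using System;
using System.Runtime.InteropServices;

public static class BufferCompare
{
    // Managed path: MemoryExtensions.SequenceEqual compares lengths and
    // contents, and the JIT can inline and vectorize it.
    public static bool EqualManaged(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
        => a.SequenceEqual(b);

    // Native path: every call crosses the managed/native boundary.
    // Assumption: "libc" is the right library name on this platform.
    [DllImport("libc", EntryPoint = "memcmp", CallingConvention = CallingConvention.Cdecl)]
    private static extern int NativeMemcmp(byte[] a, byte[] b, UIntPtr count);

    // memcmp knows nothing about lengths, so the caller must check them first.
    public static bool EqualNative(byte[] a, byte[] b)
        => a.Length == b.Length
           && NativeMemcmp(a, b, (UIntPtr)(uint)a.Length) == 0;
}
```

Besides the call overhead, the native version needs a separate length check and a platform-specific library name, while the span version handles arrays, slices, and stack buffers through the same call.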

Span semantics and comparisons to other languages

  • Clarification: Span<T> itself (pointer + length) is stack-only, but the memory it refers to can be on the heap, stack (stackalloc), native, or embedded constants.
  • Its design doesn’t assume any allocation strategy; it’s similar conceptually to C++ std::span or Rust &mut [T], with extra safety enforced by “ref struct” restrictions and lifetime analysis.
  • Span<T> cannot be a field on heap objects, but can wrap unmanaged memory or constant data; ReadOnlySpan<T> values over literal arrays are common and largely invisible to developers.
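
As a small illustration of the points above, the following sketch passes spans over a heap array, a stackalloc buffer, a slice, and constant data to the same code. The type and member names are hypothetical and exist only for this example.

```csharp
using System;

public static class SpanSources
{
    // Constant data: the compiler can point this span directly at bytes
    // embedded in the assembly instead of allocating an array.
    public static ReadOnlySpan<byte> Magic => new byte[] { 0xCA, 0xFE, 0xBA, 0xBE };

    // One routine serves every kind of backing memory.
    public static int Sum(ReadOnlySpan<int> values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }

    public static int Demo()
    {
        int[] heapArray = { 1, 2, 3, 4 };
        Span<int> overHeap = heapArray;          // view over heap memory

        Span<int> onStack = stackalloc int[3];   // view over stack memory
        onStack[0] = 10;
        onStack[1] = 20;
        onStack[2] = 30;

        ReadOnlySpan<int> slice = overHeap.Slice(1, 2); // elements 2 and 3, no copy

        // 10 (heap) + 60 (stack) + 5 (slice)
        return Sum(overHeap) + Sum(onStack) + Sum(slice);
    }
}
```

The "ref struct" rules show up as compile errors rather than runtime behavior: storing onStack in a class field, for example, is rejected by the compiler.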

.NET performance, JIT vs native, and ecosystem observations

  • Many note how fast recent .NET versions are, with built‑in Dynamic PGO and aggressive SIMD work (including contributions tuned for future Intel CPUs).
  • Comparisons are made with Java, Go, Rust, C++, and JavaScript; consensus is that mainstream JITed runtimes (JVM, .NET, V8) are highly competitive, especially due to PGO.
  • Some argue JIT makes it harder to reason about exact assembly and encourages “that’ll do” attitudes; others counter with concrete examples of sophisticated SIMD code and stress-free ISA selection.
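
The "ISA selection" argument can be illustrated with the portable System.Numerics.Vector<T> type, whose width is fixed by the JIT for the machine the code runs on. The sketch below is a simplified equality check written for this summary; it is not the runtime's SequenceEqual implementation, which is considerably more elaborate, and the class name is hypothetical.

```csharp
using System;
using System.Numerics;

public static class VectorEquality
{
    public static bool AreEqual(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        if (a.Length != b.Length)
        {
            return false;
        }

        // Number of bytes per hardware vector; decided at JIT time
        // (for example 16 or 32 depending on the CPU).
        int width = Vector<byte>.Count;
        int i = 0;

        // Compare one full vector at a time.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<byte>(a.Slice(i, width));
            var vb = new Vector<byte>(b.Slice(i, width));
            if (!Vector.EqualsAll(va, vb))
            {
                return false;
            }
        }

        // Scalar loop for the remaining tail bytes.
        for (; i < a.Length; i++)
        {
            if (a[i] != b[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

The same source runs with whatever vector width the JIT picks for the host CPU, without per-architecture builds or manual dispatch.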

SqlClient and environment-dependent performance

  • One practitioner reports Microsoft.Data.SqlClient being 7–10x slower on Linux (especially in containers) than on Windows, producing a ~2x application slowdown.
  • Follow‑up claims tie this to poor algorithms (e.g., O(n²) packet reassembly) and unrealistic performance testing (replaying trace files instead of real network patterns).
  • By contrast, PostgreSQL clients are said to perform more consistently across OSes, prompting some to favor Postgres/MariaDB.

Stack Overflow, LLMs, and code copying

  • Several comments highlight outdated Stack Overflow answers as “bit-rot” that keeps getting replicated by humans and LLMs.
  • Stories are shared about blindly copied code with known bugs, licensing risks (CC BY‑SA), and even deliberately backdoored answers.
  • There’s a split between “elitist” calls to deeply understand all code and more pragmatic views that knowing the right question and verifying borrowed code is often sufficient.
  • Some teams culturally discourage direct SO copying; others embed SO links in code as documentation and learning breadcrumbs.

Other notes and critiques

  • LINQ’s SequenceEqual forwards to the same optimized span-based routines when possible.
  • Some developers say Span<T> has become their default for working with contiguous data and slicing.
  • One commenter criticizes the article’s charts: too many series for a bar chart, poor color choices, excessive precision in timing tables, and lack of more meaningful metrics like cycles/byte or fitted slopes/overheads.
  • Another notes that more recent Stack Overflow answers on the array-comparison topic already recommend ReadOnlySpan<T>.SequenceEqual, suggesting the “old advice” is being corrected within that ecosystem too.
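
For reference, the LINQ and span entry points mentioned in the first note above look like this when called explicitly. The wrapper class and method names are hypothetical; whether the LINQ call takes the span fast path depends on the runtime version and the source types, so treat the comments as a description of the thread's claim rather than a guarantee.

```csharp
using System;
using System.Linq;

public static class SequenceEqualCalls
{
    // LINQ entry point: works on any IEnumerable<T>; per the discussion,
    // it can hand arrays off to the span-based routine when possible.
    public static bool ViaLinq(int[] a, int[] b)
        => Enumerable.SequenceEqual(a, b);

    // Span entry point: MemoryExtensions.SequenceEqual on ReadOnlySpan<T>.
    public static bool ViaSpan(int[] a, int[] b)
    {
        ReadOnlySpan<int> x = a;
        ReadOnlySpan<int> y = b;
        return x.SequenceEqual(y);
    }
}
```

Both calls return the same answer for arrays; the span form also accepts slices and stack buffers, which the LINQ form cannot.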