2025-03-30

Span<T>.SequenceEquals is faster than memcmp

Tiered compilation, microbenchmarks, and “regression”

An apparent .NET 9 “for loop regression” was investigated and found to be an interaction between the microbenchmark and tiered compilation, not an actual runtime regression.
Tiered compilation, Dynamic PGO, and on-stack replacement (OSR) together mean that methods start out minimally optimized and are recompiled once they have been called enough times or have looped heavily (OSR kicks in after roughly 50K loop iterations).
Some commenters criticize thresholds based on call count rather than time spent, arguing that the optimizer could weigh function size or runtime cost instead; others reply that the runtime cannot know the benefit or the compilation cost in advance, and that multiple concrete types complicate the decision.
BenchmarkDotNet’s behavior (running until a time target) can obscure whether you’re measuring pre- or post-OSR code.
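To make the measurement problem concrete, here is a toy model in Rust. The type and method names (TieredMethod, Tier, call) are hypothetical, and the thresholds are made-up parameters, not the runtime’s real tiering policy or defaults. A method counts its calls and is promoted for future calls once it crosses a threshold, while a single long-running Tier0 call only leaves unoptimized code through OSR. A benchmark that invokes a loop-heavy method once therefore blends both tiers into one number.

```rust
// Toy model only: thresholds are illustrative parameters,
// not the .NET runtime's actual tiering policy or defaults.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Tier0, // quickly compiled, minimally optimized code
    Tier1, // fully optimized code installed by call counting
}

struct TieredMethod {
    tier: Tier,
    calls: u32,
    call_threshold: u32, // calls before the method is recompiled for future calls
    osr_threshold: u64,  // loop iterations before OSR switches a running Tier0 call
}

impl TieredMethod {
    fn new(call_threshold: u32, osr_threshold: u64) -> Self {
        TieredMethod { tier: Tier::Tier0, calls: 0, call_threshold, osr_threshold }
    }

    /// Simulates one call whose body loops `iterations` times and returns
    /// how many of those iterations ran in unoptimized Tier0 code.
    fn call(&mut self, iterations: u64) -> u64 {
        let tier0_iterations = match self.tier {
            Tier::Tier1 => 0,
            // A Tier0 call runs its loop unoptimized until OSR kicks in mid-loop.
            Tier::Tier0 => iterations.min(self.osr_threshold),
        };
        self.calls += 1;
        if self.calls >= self.call_threshold {
            // Call counting promotes the method for all *future* calls.
            self.tier = Tier::Tier1;
        }
        tier0_iterations
    }
}

fn main() {
    // Loop-heavy benchmark body invoked once: the first iterations run in Tier0.
    let mut once = TieredMethod::new(30, 50_000);
    let slow = once.call(1_000_000);
    println!("{slow} of 1000000 iterations ran unoptimized"); // 50000 of 1000000

    // Many short calls: call counting eventually switches to optimized code.
    let mut hot = TieredMethod::new(30, 50_000);
    let slow_total: u64 = (0..100).map(|_| hot.call(100)).sum();
    println!("{slow_total} of 10000 iterations ran unoptimized"); // 3000 of 10000
}
```

The point of the model is the split between the two triggers: call counting helps only the *next* call, so a harness that runs one long invocation measures mostly OSR code plus a Tier0 prefix.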

Why Span<T>.SequenceEqual beats memcmp in .NET

The performance gap isn’t “C vs C#” but P/Invoke and marshalling overhead vs a JIT‑inlined managed implementation.
SequenceEqual for spans/arrays/strings is highly optimized, uses portable SIMD and intrinsics, and can choose the widest supported vectors at runtime.
P/Invoke must set up a frame for native calls, do GC polling, and can’t be inlined; even using fixed pointers or LibraryImport only trims overhead slightly.
memcmp in the C runtime may be less aggressively tuned for modern SIMD than the .NET span helpers; some note that C/C++ compilers often replace memcmp calls with inline intrinsic code, or with bcmp when only equality matters.
Commenters draw a clear lesson: in modern .NET, the standard library’s span-based primitives are the right tool, and P/Invoking memcmp is now a pessimization.
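The “compare the widest chunk you can, then the tail” shape behind SequenceEqual can be sketched without SIMD. The Rust below uses a hypothetical wide_equal helper; it is not .NET’s implementation. It compares 8 bytes per step as one u64 and then checks the leftover 0 to 7 bytes, whereas the real span helpers apply a similar shape with hardware vectors whose width is chosen at runtime.

```rust
/// Loads exactly 8 bytes as one machine word (native byte order; only equality matters).
fn load_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes); // panics unless bytes.len() == 8
    u64::from_ne_bytes(buf)
}

/// Equality check that compares 8 bytes per step, then the leftover tail.
fn wide_equal(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let (chunks_a, chunks_b) = (a.chunks_exact(8), b.chunks_exact(8));
    // Equal lengths mean the remainders (0..=7 bytes) line up.
    let (tail_a, tail_b) = (chunks_a.remainder(), chunks_b.remainder());
    for (x, y) in chunks_a.zip(chunks_b) {
        if load_u64(x) != load_u64(y) {
            return false; // early exit on the first differing word
        }
    }
    tail_a == tail_b
}

fn main() {
    let a = vec![7u8; 1003]; // 125 full 8-byte chunks + a 3-byte tail
    let mut b = a.clone();
    println!("{}", wide_equal(&a, &b)); // true
    b[1001] = 0; // falls in the 3-byte tail
    println!("{}", wide_equal(&a, &b)); // false
}
```

In everyday Rust you would simply write `a == b` on byte slices, which the standard library already routes to an optimized comparison; that mirrors the thread’s advice to prefer the standard library primitive over a hand-rolled or P/Invoked one.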

Span semantics and comparisons to other languages

Clarification: Span<T> itself (a pointer plus a length) is stack-only, but the memory it refers to can live on the heap, on the stack (via stackalloc), in native memory, or in constant data embedded in the assembly.
Its design doesn’t assume any allocation strategy; it’s similar conceptually to C++ std::span or Rust &mut [T], with extra safety enforced by “ref struct” restrictions and lifetime analysis.
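Span<T> cannot be stored in a field of a class or of any other type that might live on the heap (only other ref structs can hold one), yet it can still wrap unmanaged memory or constant data; ReadOnlySpan<T> values initialized from literal arrays are common and largely invisible to developers, since the compiler can point them directly at data embedded in the assembly instead of allocating an array.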
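Since the thread compares Span<T> to Rust slices, a short Rust sketch illustrates the “no assumed allocation strategy” point. The same function accepts a view over a stack array, a heap Vec, or a static constant, and slicing just narrows the view without copying. Rust’s borrow checker plays roughly the role that ref struct rules and lifetime analysis play for Span<T>, keeping a view from outliving its memory. The function names sum and zero_fill are illustrative only.

```rust
// Data embedded in the compiled binary, comparable to the constant data
// a ReadOnlySpan<byte> can point at.
static CONSTANT: [u8; 4] = [1, 2, 3, 4];

/// Takes a read-only view (pointer + length), roughly like ReadOnlySpan<byte>.
fn sum(view: &[u8]) -> u32 {
    view.iter().map(|&b| u32::from(b)).sum()
}

/// Takes a mutable view, roughly like Span<byte>; writes land in the caller's memory.
fn zero_fill(view: &mut [u8]) {
    view.fill(0);
}

fn main() {
    let mut on_stack = [5u8; 4];             // stack memory, akin to stackalloc
    let on_heap: Vec<u8> = vec![10, 20, 30]; // heap allocation

    println!("{}", sum(&on_stack)); // 20
    println!("{}", sum(&on_heap)); // 60
    println!("{}", sum(&CONSTANT)); // 10

    // Slicing produces a narrower view over the same memory; nothing is copied.
    println!("{}", sum(&on_heap[1..])); // 50

    zero_fill(&mut on_stack[..2]);
    println!("{:?}", on_stack); // [0, 0, 5, 5]
}
```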

.NET performance, JIT vs native, and ecosystem observations

Many note how fast recent .NET versions are, with built‑in Dynamic PGO and aggressive SIMD work (including contributions tuned for future Intel CPUs).
Comparisons are made with Java, Go, Rust, C++, and JavaScript; the consensus is that mainstream JITed runtimes (JVM, .NET, V8) are highly competitive, especially due to PGO.
Some argue that the JIT makes it harder to reason about the exact assembly that runs and encourages a “that’ll do” attitude; others counter with concrete examples of sophisticated SIMD code and note that the JIT picks the instruction set (for example, AVX2 versus AVX-512) for the machine the code actually runs on, so developers need not manage that choice themselves.
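Ahead-of-time compiled languages have to make that ISA choice explicitly. Below is a minimal Rust sketch of runtime dispatch using the standard is_x86_feature_detected! macro; for .NET code, the JIT effectively makes this decision when it compiles on the target machine. The function name widest_vector_bits is hypothetical, and a real library would dispatch to separate SIMD code paths rather than just report a width.

```rust
/// Returns the widest x86 SIMD register width (in bits) this CPU reports,
/// checking AVX-512F, then AVX2, then SSE2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn widest_vector_bits() -> u32 {
    if is_x86_feature_detected!("avx512f") {
        512
    } else if is_x86_feature_detected!("avx2") {
        256
    } else if is_x86_feature_detected!("sse2") {
        128
    } else {
        0
    }
}

/// Non-x86 targets: this sketch only checks x86 features, so it reports 0.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn widest_vector_bits() -> u32 {
    0
}

fn main() {
    match widest_vector_bits() {
        0 => println!("no known SIMD width; a real library would use a scalar fallback"),
        bits => println!("would dispatch to the {bits}-bit code path"),
    }
}
```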

SQLClient and environment-dependent performance

One practitioner reports Microsoft.Data.SqlClient being 7–10x slower on Linux (especially in containers) than on Windows, producing a ~2x application slowdown.
Follow‑up claims tie this to poor algorithms (e.g., O(n²) packet reassembly) and unrealistic performance testing (replaying trace files instead of real network patterns).
By contrast, PostgreSQL clients are said to perform more consistently across OSes, prompting some to favor Postgres/MariaDB.
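For readers unfamiliar with the “O(n²) packet reassembly” claim, the sketch below contrasts two strategies in hypothetical Rust; it is not Microsoft.Data.SqlClient’s actual code. One rebuilds the message from every packet received so far on each arrival, and the other appends each packet once. With n packets of equal size, the first copies on the order of n²/2 packets’ worth of bytes, and the second copies each byte once.

```rust
// Hypothetical illustration of quadratic vs linear reassembly; this is NOT
// Microsoft.Data.SqlClient's code, only the general shape of the problem.

/// Quadratic: on every arrival, rebuild the message from all packets so far.
/// Returns the reassembled bytes and the total number of bytes copied.
fn reassemble_quadratic(packets: &[Vec<u8>]) -> (Vec<u8>, usize) {
    let mut message: Vec<u8> = Vec::new();
    let mut copied = 0;
    for received in 1..=packets.len() {
        let mut rebuilt: Vec<u8> = Vec::new();
        for p in &packets[..received] {
            rebuilt.extend_from_slice(p);
            copied += p.len();
        }
        message = rebuilt;
    }
    (message, copied)
}

/// Linear: append each packet once into a single growing buffer.
fn reassemble_linear(packets: &[Vec<u8>]) -> (Vec<u8>, usize) {
    let mut message: Vec<u8> = Vec::new();
    let mut copied = 0;
    for p in packets {
        message.extend_from_slice(p);
        copied += p.len();
    }
    (message, copied)
}

fn main() {
    // 200 packets of 1 KiB each.
    let packets: Vec<Vec<u8>> = (0..200).map(|i| vec![i as u8; 1024]).collect();
    let (q, q_copied) = reassemble_quadratic(&packets);
    let (l, l_copied) = reassemble_linear(&packets);
    assert_eq!(q, l); // same result, very different cost
    // prints "quadratic copied 20582400 bytes, linear copied 204800 bytes"
    println!("quadratic copied {q_copied} bytes, linear copied {l_copied} bytes");
}
```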

StackOverflow, LLMs, and code copying

Several comments highlight outdated StackOverflow answers as “bit-rot” that keeps getting replicated by humans and LLMs.
Stories are shared about blindly copied code with known bugs, licensing risks (CC BY‑SA), and even deliberately backdoored answers.
There’s a split between “elitist” calls to deeply understand all code and a more pragmatic view that knowing the right question to ask, and verifying any borrowed code, is often sufficient.
Some teams culturally discourage direct SO copying; others embed SO links in code as documentation and learning breadcrumbs.

Other notes and critiques

LINQ’s SequenceEqual forwards to the same optimized span-based routines when possible.
Some developers say Span<T> has become their default for working with contiguous data and slicing.
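One commenter criticizes the article’s charts: too many series for a bar chart, poor color choices, excessive precision in the timing tables, and the absence of more meaningful metrics such as cycles per byte or a fitted line of time against input size, whose slope gives the per-byte cost and whose intercept gives the fixed per-call overhead.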
Another notes that more recent StackOverflow answers on the array-comparison topic already recommend ReadOnlySpan<T>.SequenceEqual, suggesting the “old advice” is being corrected within that ecosystem too.
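The per-byte cost and fixed overhead that one commenter wanted can be estimated with an ordinary least-squares fit of time against input size. The Rust sketch below uses a hypothetical fit_cost_model helper and synthetic, made-up timings rather than the article’s data. It returns the intercept as per-call overhead and the slope as ns/byte; multiplying by an assumed clock rate in GHz gives approximate cycles/byte.

```rust
/// Ordinary least-squares fit of `time = overhead + slope * size`.
/// Input: (size_in_bytes, time_in_ns) pairs. Returns (overhead_ns, ns_per_byte).
fn fit_cost_model(samples: &[(f64, f64)]) -> (f64, f64) {
    assert!(samples.len() >= 2, "need at least two sizes to fit a line");
    let n = samples.len() as f64;
    let mean_x = samples.iter().map(|&(x, _)| x).sum::<f64>() / n;
    let mean_y = samples.iter().map(|&(_, y)| y).sum::<f64>() / n;
    let (mut sxy, mut sxx) = (0.0, 0.0);
    for &(x, y) in samples {
        sxy += (x - mean_x) * (y - mean_y);
        sxx += (x - mean_x) * (x - mean_x);
    }
    let slope = sxy / sxx; // per-byte cost
    let overhead = mean_y - slope * mean_x; // fixed per-call cost (intercept)
    (overhead, slope)
}

fn main() {
    // Synthetic timings that follow 20 ns + 0.25 ns/byte exactly.
    let samples = [(0.0, 20.0), (64.0, 36.0), (256.0, 84.0), (1024.0, 276.0), (4096.0, 1044.0)];
    let (overhead, ns_per_byte) = fit_cost_model(&samples);
    // ns/byte times cycles/ns gives cycles/byte; 3 GHz (3 cycles/ns) is an assumption.
    let cycles_per_byte = ns_per_byte * 3.0;
    // prints "overhead 20.0 ns, 0.250 ns/byte, ~0.75 cycles/byte at 3 GHz"
    println!("overhead {overhead:.1} ns, {ns_per_byte:.3} ns/byte, ~{cycles_per_byte:.2} cycles/byte at 3 GHz");
}
```

Reporting these two numbers separates what a comparison costs to *start* (where P/Invoke overhead shows up) from what it costs per byte (where SIMD width shows up), which a single timing per size blurs together.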
