Span<T>.SequenceEquals is faster than memcmp
Tiered compilation, microbenchmarks, and “regression”
- An apparent .NET 9 “for loop regression” was investigated and found to be an interaction between the microbenchmark and tiered compilation, not an actual runtime regression.
- Tiered compilation + Dynamic PGO + OSR mean methods start minimally optimized, then are recompiled once they’re called enough or loop heavily (OSR after ~50K iterations).
- Some commenters criticize thresholds based on call count rather than “time spent” and argue the optimizer could use function size or runtime cost; others note the runtime can’t know benefit or compile cost in advance and multiple concrete types complicate decisions.
- BenchmarkDotNet’s behavior (running until a time target) can obscure whether you’re measuring pre- or post-OSR code.
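
To make the tiering effect concrete, here is a minimal sketch of how one might observe it, assuming default runtime settings. The class and method names (`TieringDemo`, `SumTiered`, `SumOptimized`, `TimeBatches`) are made up for illustration; `MethodImplOptions.AggressiveOptimization` is the real attribute that asks the JIT to optimize a method fully on first compile, which also means it bypasses tiering and collects no Dynamic PGO data.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class TieringDemo
{
    // Ordinary method: with default settings it is eligible for tiered
    // compilation, so early calls may run minimally optimized code.
    public static long SumTiered(int[] data)
    {
        long total = 0;
        for (int i = 0; i < data.Length; i++) total += data[i];
        return total;
    }

    // Same body, but the JIT is asked to optimize it fully the first time,
    // skipping the lower tier (and therefore any profile data).
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static long SumOptimized(int[] data)
    {
        long total = 0;
        for (int i = 0; i < data.Length; i++) total += data[i];
        return total;
    }

    // Times several consecutive batches of calls. If the first batches are
    // slower than the later ones, the measurement probably straddled a
    // recompilation rather than showing a steady-state cost.
    public static double[] TimeBatches(Func<int[], long> f, int[] data,
                                       int batches, int callsPerBatch)
    {
        var elapsedMs = new double[batches];
        long sink = 0;
        for (int b = 0; b < batches; b++)
        {
            var sw = Stopwatch.StartNew();
            for (int c = 0; c < callsPerBatch; c++) sink += f(data);
            sw.Stop();
            elapsedMs[b] = sw.Elapsed.TotalMilliseconds;
        }
        GC.KeepAlive(sink); // keep the result observable so the work is not discarded
        return elapsedMs;
    }
}
```

Comparing `TimeBatches(TieringDemo.SumTiered, ...)` with `TimeBatches(TieringDemo.SumOptimized, ...)` batch by batch is one rough way to tell whether a "regression" is really a warm-up artifact. Actual timings depend on the machine and runtime version, so none are shown here.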
Why Span<T>.SequenceEqual beats memcmp in .NET
- The performance gap isn’t “C vs C#” but P/Invoke and marshalling overhead vs a JIT‑inlined managed implementation.
- `SequenceEqual` for spans/arrays/strings is highly optimized, uses portable SIMD and intrinsics, and can choose the widest supported vectors at runtime.
- P/Invoke must set up a frame for native calls, do GC polling, and can’t be inlined; even using `fixed` pointers or `LibraryImport` only trims overhead slightly.
- `memcmp` in the C runtime may be less aggressively tuned for modern SIMD than the .NET span helpers; some note that in C/C++ `memcmp` often compiles to intrinsics or `bcmp`.
- Commenters emphasize that the lesson is: in modern .NET, the standard library’s span-based primitives are the right tool; P/Invoking `memcmp` is now a pessimization.
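
The two approaches being compared can be sketched side by side. This is an illustration under stated assumptions, not the article's benchmark code: `BufferCompare`, `EqualsManaged`, and `EqualsNative` are hypothetical names, and the library name `"libc"` in the `DllImport` is an assumption for Linux (on Windows the old advice typically pointed at `msvcrt.dll`).

```csharp
using System;
using System.Runtime.InteropServices;

public static class BufferCompare
{
    // Managed path: MemoryExtensions.SequenceEqual on spans. The JIT can
    // inline the call and the implementation is vectorized for bytes.
    public static bool EqualsManaged(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
        => a.SequenceEqual(b);

    // Native path: the "old advice". The library name is an assumption and
    // is platform-specific; every call pays the P/Invoke transition cost.
    [DllImport("libc", CallingConvention = CallingConvention.Cdecl)]
    private static extern int memcmp(byte[] a, byte[] b, UIntPtr count);

    // memcmp knows nothing about array lengths, so they are checked first.
    public static bool EqualsNative(byte[] a, byte[] b)
        => a.Length == b.Length && memcmp(a, b, (UIntPtr)(uint)a.Length) == 0;
}
```

Because arrays convert implicitly to `ReadOnlySpan<byte>`, `BufferCompare.EqualsManaged(first, second)` works directly on `byte[]` arguments, and the length check is part of `SequenceEqual` itself.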
Span
- Clarification: `Span<T>` itself (pointer + length) is stack-only, but the memory it refers to can be on the heap, stack (stackalloc), native, or embedded constants.
- Its design doesn’t assume any allocation strategy; it’s similar conceptually to C++ `std::span` or Rust `&mut [T]`, with extra safety enforced by “ref struct” restrictions and lifetime analysis.
- `Span<T>` cannot be a field on heap objects, but can wrap unmanaged memory or constant data; readonly spans over literal arrays are common and largely invisible to developers.
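
A short sketch of the point that a span is independent of where its memory lives. The class and method names (`SpanSources`, `Sum`, and so on) are invented for this example. Native memory is left out because wrapping a raw pointer needs an `unsafe` context.

```csharp
using System;

public static class SpanSources
{
    // One routine serves every backing store: it only sees pointer + length.
    public static int Sum(ReadOnlySpan<byte> data)
    {
        int total = 0;
        foreach (byte b in data) total += b;
        return total;
    }

    // Heap: a slice over part of a managed array, with no copy made.
    public static int SumFromHeap()
    {
        byte[] heap = { 1, 2, 3, 4 };
        return Sum(heap.AsSpan(1, 2)); // covers the elements 2 and 3
    }

    // Stack: stackalloc assigned to a Span<T> needs no unsafe context.
    public static int SumFromStack()
    {
        Span<byte> stack = stackalloc byte[4];
        stack.Fill(2);
        return Sum(stack);
    }

    // Constant data: a literal byte array converted straight to
    // ReadOnlySpan<byte>, which the compiler can back with static data
    // instead of allocating an array.
    public static int SumFromConstant()
    {
        ReadOnlySpan<byte> constant = new byte[] { 10, 20, 30 };
        return Sum(constant);
    }

    // Not allowed: a class field of type Span<byte> is rejected by the
    // compiler because Span<T> is a ref struct.
    // class Holder { Span<byte> field; }
}
```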
.NET performance, JIT vs native, and ecosystem observations
- Many note how fast recent .NET versions are, with built‑in Dynamic PGO and aggressive SIMD work (including contributions tuned for future Intel CPUs).
- Comparisons are made with Java, Go, Rust, C++, and JavaScript; consensus is that mainstream JITed runtimes (JVM, .NET, V8) are highly competitive, especially due to PGO.
- Some argue JIT makes it harder to reason about exact assembly and encourages “that’ll do” attitudes; others counter with concrete examples of sophisticated SIMD code and stress-free ISA selection.
SqlClient and environment-dependent performance
- One practitioner reports `Microsoft.Data.SqlClient` being 7–10x slower on Linux (especially in containers) than on Windows, producing a ~2x application slowdown.
- Follow‑up claims tie this to poor algorithms (e.g., O(n²) packet reassembly) and unrealistic performance testing (replaying trace files instead of real network patterns).
- By contrast, PostgreSQL clients are said to perform more consistently across OSes, prompting some to favor Postgres/MariaDB.
StackOverflow, LLMs, and code copying
- Several comments highlight outdated StackOverflow answers as “bit-rot” that keeps getting replicated by humans and LLMs.
- Stories are shared about blindly copied code with known bugs, licensing risks (CC BY‑SA), and even deliberately backdoored answers.
- There’s a split between “elitist” calls to deeply understand all code and more pragmatic views that knowing the right question and verifying borrowed code is often sufficient.
- Some teams culturally discourage direct SO copying; others embed SO links in code as documentation and learning breadcrumbs.
Other notes and critiques
- LINQ’s `SequenceEqual` forwards to the same optimized span-based routines when possible.
- Some developers say `Span<T>` has become their default for working with contiguous data and slicing.
- One commenter criticizes the article’s charts: too many series for a bar chart, poor color choices, excessive precision in timing tables, and lack of more meaningful metrics like cycles/byte or fitted slopes/overheads.
- Another notes that more recent StackOverflow answers on the array-comparison topic already recommend `ReadOnlySpan<T>.SequenceEqual`, suggesting the “old advice” is being corrected within that ecosystem too.
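
The "fitted slopes/overheads" suggestion from the chart critique can be made concrete. The sketch below is one possible reading of it, not something the commenter or the article provided: it fits `time = overhead + perByte * size` by ordinary least squares, so the intercept estimates the fixed per-call cost (for example a P/Invoke transition) and the slope estimates the per-byte cost. `CostModel` and `Fit` are hypothetical names.

```csharp
using System;

public static class CostModel
{
    // Ordinary least-squares fit of: time = overhead + perByte * size.
    // sizes are buffer lengths in bytes; times are measured durations in
    // any consistent unit (e.g. nanoseconds).
    public static (double Overhead, double PerByte) Fit(double[] sizes, double[] times)
    {
        if (sizes.Length != times.Length || sizes.Length < 2)
            throw new ArgumentException("Need at least two (size, time) pairs of equal count.");

        double n = sizes.Length, sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < sizes.Length; i++)
        {
            sumX += sizes[i];
            sumY += times[i];
            sumXX += sizes[i] * sizes[i];
            sumXY += sizes[i] * times[i];
        }

        double denominator = n * sumXX - sumX * sumX;
        if (denominator == 0)
            throw new ArgumentException("All sizes are identical; the slope is undefined.");

        double perByte = (n * sumXY - sumX * sumY) / denominator;
        double overhead = (sumY - perByte * sumX) / n;
        return (overhead, perByte);
    }
}
```

Reporting the two fitted numbers per implementation would separate "costs more per call" from "costs more per byte", which a bar chart of raw timings across many sizes tends to blur.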