The CLR uses a generational GC (Gen0, Gen1, Gen2). Short-lived objects start in Gen0 and are collected frequently at low cost. Objects that survive successive collections are promoted through Gen1 into Gen2, which is collected rarely. The GC uses a mark-and-compact algorithm: it marks objects reachable from the GC roots, reclaims the unreachable ones, and compacts the surviving objects to eliminate heap fragmentation.
To minimise leaks: implement IDisposable and use using blocks for unmanaged resources. Unsubscribe from static events (a classic leak — the event source keeps the subscriber alive). Use WeakReference for caches so cached objects can be collected. Be careful with closures capturing large objects. Use memory profilers like dotMemory or the VS Diagnostic Tools to detect unexpected Gen2 promotions or growing retained bytes.
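A minimal sketch of the deterministic-cleanup and weak-cache ideas, using only BCL types (the buffer size is illustrative):

```csharp
using System;
using System.IO;

// Deterministic cleanup: 'using' guarantees Dispose runs even if an
// exception is thrown inside the block.
var stream = new MemoryStream();
using (stream)
{
    stream.WriteByte(42);
}
Console.WriteLine(stream.CanRead);   // False — disposed at the closing brace

// Weak cache entry: the GC is free to reclaim the value once nothing
// else holds a strong reference to it.
var cache = new WeakReference<byte[]>(new byte[1024]);
if (cache.TryGetTarget(out var value))
    Console.WriteLine(value.Length); // usable only while still alive
```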
The compiler transforms an async method into a state machine. When you hit an await on an incomplete operation, control returns to the caller and the thread is freed to do other work — a thread pool thread goes back to the pool; a UI thread keeps pumping messages. A SynchronizationContext captures the ambient context (e.g., the UI thread in WPF/WinForms) and marshals continuations back to that specific thread.
To prevent starvation: use ConfigureAwait(false) in library code so continuations don't marshal back unnecessarily. Never block on a Task with .Result or .Wait() — this blocks the calling thread and can deadlock when combined with a sync context. Avoid creating excessive parallel tasks without throttling via SemaphoreSlim. Monitor with ThreadPool.GetAvailableThreads() in diagnostics.
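A hedged sketch of SemaphoreSlim throttling — the limit of 4 and the simulated delay are illustrative:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(4);   // at most 4 operations in flight
int inFlight = 0;
bool limitExceeded = false;

async Task DoWorkAsync(int id)
{
    await gate.WaitAsync();        // suspends without blocking a thread when full
    try
    {
        if (Interlocked.Increment(ref inFlight) > 4)
            limitExceeded = true;
        await Task.Delay(25);      // simulated I/O
    }
    finally
    {
        Interlocked.Decrement(ref inFlight);
        gate.Release();
    }
}

await Task.WhenAll(Enumerable.Range(0, 20).Select(DoWorkAsync));
Console.WriteLine(limitExceeded ? "limit exceeded" : "throttled correctly");
```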
IEnumerable executes in memory — once a query drops to IEnumerable, EF fetches the rows from the database and subsequent operators filter the results in C#. IQueryable builds an expression tree that EF translates into SQL, so the filter executes at the database level.
Use IQueryable when building EF query chains that should result in a single optimised SQL query. Switch to IEnumerable once you've called .ToList() and are working in memory. A common mistake is calling .AsEnumerable() mid-query accidentally, pulling back a million rows and filtering in C#. Always call .ToList() as late as possible and as close to the service/controller boundary as you can.
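An illustrative contrast — an in-memory list stands in for a DbSet here, so the expression tree is visible even though no SQL is produced:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

var numbers = Enumerable.Range(1, 100).ToList();

// IEnumerable: Where takes a compiled delegate; the filter runs in memory.
IEnumerable<int> inMemory = numbers.Where(n => n > 90);

// IQueryable: Where takes an Expression<Func<int, bool>> — data, not code —
// which a provider like EF can translate into SQL.
IQueryable<int> queryable = numbers.AsQueryable().Where(n => n > 90);

// The expression tree is inspectable before anything executes.
Console.WriteLine(queryable.Expression.NodeType); // Call — the Where call as data
Console.WriteLine(inMemory.Count());              // 10
```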
DI decouples concrete implementations from consumers, enabling testability, loose coupling, and easy swapping of implementations. .NET's built-in IServiceCollection supports three lifetimes: Scoped (one instance per scope — in ASP.NET Core, per HTTP request), Transient (a new instance at every resolution), and Singleton (one instance for the app's lifetime).
For large systems like an MDM platform, DI is critical — it lets you inject mocked versions of device management clients, push notification providers, and database contexts during unit testing without touching real infrastructure. It also enforces the Single Responsibility Principle; if a constructor has 8 parameters, that's a code smell telling you a class is doing too much.
Task.Run is the modern, safe wrapper — it always queues to the default thread pool scheduler and correctly unwraps nested Tasks. Task.Factory.StartNew is lower-level and dangerous: it inherits the current scheduler, doesn't automatically unwrap async lambdas (you need .Unwrap()), and can schedule on the UI thread unexpectedly.
To prevent starvation: avoid blocking calls (no .Wait()/.Result). Use SemaphoreSlim to throttle the number of concurrent operations. Keep async all the way up the call stack. For long-running CPU work, pass TaskCreationOptions.LongRunning to StartNew so it spins up a dedicated thread rather than consuming a pool slot.
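A small sketch of the unwrapping difference — the delay and return value are arbitrary:

```csharp
using System;
using System.Threading.Tasks;

// Task.Run unwraps the inner Task of an async lambda automatically.
Task<int> viaRun = Task.Run(async () => { await Task.Delay(10); return 42; });

// Task.Factory.StartNew returns Task<Task<int>> — the outer task completes
// as soon as the lambda *returns* its inner task, not when the work finishes.
Task<Task<int>> nested =
    Task.Factory.StartNew(async () => { await Task.Delay(10); return 42; });
Task<int> viaStartNew = nested.Unwrap();   // required to observe the real result

Console.WriteLine(await viaRun);        // 42
Console.WriteLine(await viaStartNew);   // 42
```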
- class — Reference type on the heap. Use for entities with identity, mutable state, and behaviour. The default choice for most domain objects.
- struct — Value type stored inline: on the stack as a local, or embedded in its containing object on the heap. Use for small, immutable data where copy semantics are correct and allocation overhead matters (e.g., Point, Colour, Decimal wrappers). Avoid large structs — copying is expensive.
- record — Reference type with value-based equality baked in; positional records generate init-only properties, so immutability is the default. Perfect for DTOs, API response models, and domain value objects (e.g., Money, Address). record struct gives you value semantics plus the concise syntax.
Rule of thumb: entities get classes, value objects get records, and performance-critical tiny data gets structs.
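A quick sketch of why records fit value objects — the Person record is a hypothetical example:

```csharp
using System;

var p1 = new Person("Ada", 30);
var p2 = new Person("Ada", 30);

Console.WriteLine(p1 == p2);                // True — value-based equality
Console.WriteLine(ReferenceEquals(p1, p2)); // False — two distinct heap objects

var older = p1 with { Age = 31 };           // non-destructive mutation
Console.WriteLine(older.Age);               // 31
Console.WriteLine(p1.Age);                  // 30 — original untouched

record Person(string Name, int Age);
```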
Use URL versioning (/api/v1, /api/v2) or header-based versioning via the Asp.Versioning NuGet package. The golden rule: never remove or rename fields on an existing contract — only add new optional fields. Treat your request/response models as append-only.
Run both route handlers side-by-side. Deprecate v1 with a Sunset response header giving clients a migration deadline. Use feature flags for a gradual rollout of new behaviour. Document every breaking change prominently. The goal is that v1 clients keep working untouched while v2 clients get the new capabilities.
LINQ introduces delegate allocations on every call, boxing for value types, and deferred evaluation that can be misused. Avoid multiple enumerations of the same query (call .ToList() once and reuse). In hot paths, a raw for loop with early exit will outperform LINQ significantly. Watch out for N+1 queries in EF when using LINQ without .Include().
For extremely high-throughput scenarios, consider Span<T>, ArrayPool<T>, or raw ADO.NET with Dapper for reads. Reserve LINQ for readability where performance is not critical, and profile with BenchmarkDotNet before optimising prematurely.
Records are reference types with value-based equality baked in — Equals(), ==, and GetHashCode() are compiler-generated over the record's data members, not the object reference. Positional records (record Person(string Name, int Age)) auto-generate a primary constructor, Deconstruct(), and init-only properties.
The with expression performs non-destructive mutation — it creates a shallow copy with the specified properties changed: var updated = original with { Age = 31 };. The original is untouched. Under the hood, with calls a compiler-generated Clone() method then sets the changed properties. Key for domain value objects that should be immutable but need to produce modified versions.
- record struct — Value type with compiler-generated value equality and with support. Mutable by default (unlike record class). Good for lightweight value objects like Coordinate or DateRange.
- readonly struct — Guarantees all fields are readonly; the struct can never be mutated after creation. Enables compiler optimisations — in parameters pass by reference without defensive copies. Ideal for high-performance structs in hot paths.
- ref struct — Must live exclusively on the stack; cannot be boxed, stored in arrays, assigned to object, or captured in lambdas. Enables safe zero-allocation access to stack memory. Span<T> and ReadOnlySpan<T> are the canonical examples — critical for high-performance parsing and buffer manipulation.
Reflection uses the System.Reflection API to inspect types, invoke methods, and read/write properties at runtime. It bypasses compile-time type safety and is typically 10–100× slower than direct calls due to metadata lookups, lack of JIT inlining, and boxing of value type arguments.
Mitigations: cache MethodInfo and PropertyInfo instances rather than calling GetMethod() on every invocation. Use compiled expression trees (Expression.Lambda<Func<T>>) to generate a delegate once and call it at near-direct speed. The modern preferred alternative is Source Generators (Roslyn) — they run at compile time, produce strongly-typed code, and have zero runtime overhead. The System.Text.Json source generator is a textbook example.
An Expression<TDelegate> represents a lambda as a data structure (an AST) rather than compiled IL. This lets you inspect and transform code at runtime. Entity Framework translates LINQ expressions into SQL using this mechanism — it walks the expression tree and builds a parameterised query.
Practical uses: building dynamic filter predicates for EF queries at runtime (safer than string concatenation), generating fast-path property accessors (compile once, call thousands of times at near-native speed), and implementing specification patterns where predicates are composable. The key mental model: Func<T, bool> is executable code; Expression<Func<T, bool>> is inspectable data that can also be compiled.
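The code/data distinction can be sketched directly — both lambdas below are textually identical, but only one is inspectable:

```csharp
using System;
using System.Linq.Expressions;

// Func is executable code; Expression is inspectable data.
Func<int, bool> compiled = n => n > 10;
Expression<Func<int, bool>> tree = n => n > 10;

// Walk the tree: it is an AST, not IL.
var body = (BinaryExpression)tree.Body;
Console.WriteLine(body.NodeType);      // GreaterThan

// Compile once, then reuse the delegate at near-native speed.
Func<int, bool> fromTree = tree.Compile();
Console.WriteLine(fromTree(42));       // True
Console.WriteLine(compiled(5));        // False
```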
Source Generators are Roslyn extensions that run during compilation and emit additional C# code into the compilation. The generated code is fully strongly-typed, AOT-compatible, and has zero runtime overhead. They're ideal for eliminating boilerplate that would otherwise require reflection: serialisers, DI registration, mapping code, validators.
Key examples: System.Text.Json source gen for allocation-free JSON serialisation, Refit for generating HTTP client interfaces. Use them when you find yourself writing the same reflection pattern across the codebase. The incremental generator API (IIncrementalGenerator) ensures only changed syntax nodes are reprocessed, keeping build times acceptable.
- Span<T> — A stack-only ref struct providing a view over contiguous memory (array slice, stack buffer, or native memory). Zero allocation for slicing. Use for synchronous high-performance parsing in hot paths.
- Memory<T> — The heap-compatible counterpart to Span. Can be stored in fields and used across await boundaries. Slightly more overhead but still allocation-free for slicing.
- ArrayPool<T> — Rent a buffer, use it, return it. Avoids GC pressure from frequent large array allocations. Critical in serialisation and network I/O. Always return arrays in a finally block.
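The rent/use/return cycle in miniature — the 4096-byte size is arbitrary:

```csharp
using System;
using System.Buffers;

var pool = ArrayPool<byte>.Shared;
byte[] buffer = pool.Rent(4096);   // may hand back a larger array than requested
try
{
    Console.WriteLine(buffer.Length >= 4096); // True
    buffer[0] = 0xFF;                         // use the buffer
}
finally
{
    pool.Return(buffer);           // always return, even when an exception occurs
}
```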
unsafe enables pointer arithmetic, direct memory access, and fixed statements (pinning a managed object so the GC can't move it). Legitimate uses: P/Invoke interop with native libraries, writing high-performance parsers that manipulate raw bytes, and hardware-level operations.
Risks: no bounds checking (buffer overruns cause corruption or security vulnerabilities), pinned objects fragment the managed heap, and unsafe code can corrupt the process in ways that are very hard to debug. Modern preference: use Span<T> and MemoryMarshal to get most of the performance without unsafe semantics. Reserve true unsafe blocks for P/Invoke scenarios where no managed alternative exists.
- ConcurrentDictionary — Fine-grained striped locking allows multiple simultaneous readers and writers. Use for shared mutable state from many threads. Watch out: GetOrAdd's factory can be called multiple times concurrently (and AddOrUpdate's factories can too); when the value must be computed exactly once, store Lazy<T> values — GetOrAdd(key, k => new Lazy<V>(factory)).Value.
- ImmutableDictionary — All mutations return a new instance. Safe for concurrent reads with no locking. Best for config/lookup tables that are infrequently rebuilt. Slower writes.
- Dictionary + lock — Coarse-grained, simple to reason about. The right choice when a batch of operations must be atomic as a unit. For low-contention, the overhead is negligible.
- lock(obj) — Syntactic sugar for Monitor.Enter/Exit. In-process only. Fast, but can't be used across await points.
- Monitor — Gives you TryEnter with a timeout and Wait/Pulse for producer-consumer signalling. Still in-process.
- Mutex — OS-level, works across processes. Use for single-instance apps or cross-process resource guards. Much slower than Monitor.
- SemaphoreSlim — The async-compatible primitive. await semaphore.WaitAsync() releases the thread while waiting (unlike lock which blocks). Use to throttle concurrent async operations. The go-to tool for async concurrency control.
volatile prevents compiler and runtime reordering of reads and writes and guarantees visibility — a read observes the most recent write (acquire/release semantics). It is about ordering and visibility, not about bypassing CPU caches, and it does not give you atomicity for compound operations like count++. Use it for simple flags read by one thread and written by another.
Interlocked provides atomic CPU-level operations: Interlocked.Increment, CompareExchange, Add. No lock needed, no torn reads. Perfect for shared counters. lock is needed when you have compound read-modify-write logic that must be atomic as a unit. Rule of thumb: Interlocked for counters, SemaphoreSlim for async guards, lock for multi-step critical sections.
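A sketch contrasting a racy increment with the atomic one (the unsafe counter usually loses updates under contention, though the exact shortfall varies by machine):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int racyCount = 0, safeCount = 0;

Parallel.For(0, 100_000, _ =>
{
    racyCount++;                            // read-modify-write race: lost updates
    Interlocked.Increment(ref safeCount);   // atomic at the CPU level
});

Console.WriteLine(safeCount);   // always 100000
Console.WriteLine(racyCount);   // often less — increments were lost
```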
Channel<T> is a modern async-native producer-consumer queue. Producers await writer.WriteAsync(item) and consumers await reader.ReadAsync() — no polling, no busy waiting, full async throughout.
ConcurrentQueue is synchronous — TryDequeue returns false when empty, forcing you to poll or signal manually. Channel is the superior abstraction. With BoundedChannel, you get built-in backpressure — producers are suspended when the buffer is full rather than growing unboundedly. Use it for worker pipelines, log ingestion, and any scenario where producers and consumers run at different speeds.
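A minimal bounded-channel pipeline — the capacity of 2 and item count of 5 are illustrative:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded channel: producers suspend when 2 items are buffered (backpressure).
var channel = Channel.CreateBounded<int>(2);

var producer = Task.Run(async () =>
{
    for (int i = 1; i <= 5; i++)
        await channel.Writer.WriteAsync(i); // awaits when the buffer is full
    channel.Writer.Complete();              // signals "no more items"
});

int sum = 0;
await foreach (var item in channel.Reader.ReadAllAsync())
    sum += item;                            // consumer drains at its own pace

await producer;
Console.WriteLine(sum);                     // 15
```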
IAsyncEnumerable<T> lets you produce values lazily across await points using yield return in an async method. The consumer iterates with await foreach. Instead of fetching all 50,000 records into a List before returning, you stream each item as soon as it's ready.
Real-world use: streaming database results with Dapper's QueryUnbufferedAsync, paginating through a third-party API, or streaming server-sent events to a client. Memory stays constant regardless of total result size. EF Core supports it directly via AsAsyncEnumerable(). Supports CancellationToken via the [EnumeratorCancellation] parameter.
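A small sketch of an async stream — the three "pages" and the delay simulate I/O:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Each item is yielded as soon as it is ready; nothing is buffered up front.
async IAsyncEnumerable<int> FetchPagesAsync()
{
    for (int page = 1; page <= 3; page++)
    {
        await Task.Delay(10);   // simulated I/O per page
        yield return page;
    }
}

var received = new List<int>();
await foreach (var page in FetchPagesAsync())
    received.Add(page);

Console.WriteLine(string.Join(",", received)); // 1,2,3
```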
Task is a heap-allocated reference type. Even when an async method completes synchronously (common for cache hits), it usually allocates a Task object — the runtime caches only a handful of common results, such as Task.CompletedTask and small integer values. ValueTask is a struct that avoids that allocation when the operation completes synchronously — it only allocates if it actually needs to await.
Use ValueTask when: the method frequently completes synchronously and the method is on a hot path at very high frequency. Critical pitfall: never await a ValueTask more than once and never store it for later — it can only be consumed once. For anything not on a hot path, use Task for simplicity.
Cancellation is cooperative — you call CancellationTokenSource.Cancel() and the operation checks token.IsCancellationRequested or calls token.ThrowIfCancellationRequested(). Always pass the token down to every async call so the entire chain cancels cleanly.
CancellationTokenSource.CreateLinkedTokenSource(token1, token2) creates a composite token that fires if either source fires. Classic use: combine a request-scoped token (from ASP.NET's HttpContext.RequestAborted) with a timeout token so an operation cancels on whichever comes first. Always dispose CancellationTokenSource — it registers a timer internally for timeouts and leaks if not disposed.
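A sketch of the composite-token pattern — the 50 ms timeout stands in for a real deadline, and the long delay stands in for slow work:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

using var requestCts = new CancellationTokenSource();  // e.g. RequestAborted
using var timeoutCts = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));

// Fires when EITHER source fires — whichever comes first wins.
using var linked = CancellationTokenSource.CreateLinkedTokenSource(
    requestCts.Token, timeoutCts.Token);

bool cancelled = false;
try
{
    await Task.Delay(TimeSpan.FromSeconds(10), linked.Token);
}
catch (OperationCanceledException)
{
    cancelled = true;   // the timeout token fired after ~50 ms
}
Console.WriteLine(cancelled); // True
```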
- Parallel.ForEach — CPU-bound work over a collection. Uses the thread pool with configurable MaxDegreeOfParallelism. Blocks until all iterations complete. Good for image resizing, batch data transformation.
- PLINQ (.AsParallel()) — Parallel LINQ for collection processing. Less control over thread count. Best for CPU-bound, stateless transformations.
- Task.WhenAll — For I/O-bound async work — firing multiple HTTP calls or DB queries concurrently. No new threads created; the pool resumes continuations as each completes. Using Parallel.ForEach for async I/O is a classic mistake — it blocks threads rather than releasing them (Parallel.ForEachAsync, added in .NET 6, is the async-aware alternative).
Classic deadlock: in legacy ASP.NET a sync context allows only one thread at a time. You call .Result or .Wait() on an async Task. This blocks the sync context thread. When the Task completes, it tries to marshal its continuation back to that same thread — which is blocked. Both sides wait for each other — deadlock.
Fix 1: never block on async code — go async all the way up the stack. Fix 2: in library code, add ConfigureAwait(false) on every await so continuations don't need to return to the original context. Fix 3: use Task.Run() to execute on a thread pool thread with no sync context. ASP.NET Core removed this problem, but it still exists in WPF, WinForms, and legacy ASP.NET.
- where T : class — Reference type only. Enables nullability checks and reference equality.
- where T : struct — Value type only. Useful for numeric algorithms or stack-allocation guarantees.
- where T : new() — Must have a parameterless constructor. Lets you call new T() inside the generic method — useful for factory patterns.
- where T : ISomeInterface — Enables calling interface methods without reflection.
- where T : unmanaged — No managed references. Enables Span<T>, sizeof(T), and pointer operations. Critical for high-performance serialisation.
- where T : notnull — Cannot be nullable. Improves null-safety in generic utility methods.
Covariance (out): allows IEnumerable<Derived> to be used where IEnumerable<Base> is expected — safe because you're only reading items, never writing. IEnumerable<T> declares T as covariant.
Contravariance (in): IComparer<Base> can be used where IComparer<Derived> is expected — if you can compare any Base, you can compare a Derived. Action<T> is contravariant. Only interfaces and delegates support variance — classes do not. Common bug: developers assume List<Derived> is a List<Base> — it isn't, because List allows writes which would corrupt type safety.
- Type pattern: if (obj is Order o) — tests type and binds in one step.
- Switch expression: returns a value directly with no fall-through: status switch { OrderStatus.Paid => "Paid", _ => "Unknown" }
- Property pattern: order is { Status: OrderStatus.Paid, Total: > 100 } — match on property values inline.
- Positional pattern: deconstructs a record/tuple: point is (> 0, > 0)
- List pattern (C# 11): [first, .., last] matches arrays/lists by structure.
- Guard clause (when): additional condition after type match.
These replace long if-else chains and are especially powerful for discriminated union-style modelling with sealed record hierarchies.
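Several of the patterns above combined into one switch expression (the inputs are arbitrary examples):

```csharp
using System;

static string Describe(object value) => value switch
{
    int n when n > 100                   => "big number",            // guard clause
    int n                                => $"number {n}",           // type pattern
    string { Length: 0 }                 => "empty string",          // property pattern
    string s                             => s.ToUpperInvariant(),
    int[] and [var first, .., var last] => $"{first}..{last}",       // list pattern (C# 11)
    null                                 => "null",
    _                                    => "unknown"                // no fall-through
};

Console.WriteLine(Describe(150));                  // big number
Console.WriteLine(Describe("hi"));                 // HI
Console.WriteLine(Describe(new[] { 1, 2, 3, 4 })); // 1..4
```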
Enabling <Nullable>enable</Nullable> makes the compiler track nullability. string is now non-nullable by default — a string? must be checked before use. Compiler warnings replace what were formerly silent NullReferenceException crashes at runtime.
Gotchas: the compiler uses flow analysis but can't know everything — you sometimes need the null-forgiving operator (!) to suppress a false positive (use sparingly). NRT is opt-in per file/project. Retrofitting an existing codebase requires patience — enable per assembly and fix warnings in batches. Never use ! as a shortcut without understanding the actual nullability.
init accessors allow a property to be set in an object initialiser but become read-only after construction — providing immutability without forcing a constructor with many parameters.
required forces callers to set a property in the object initialiser — the compiler raises an error if it's omitted. Together: class OrderDto { public required string OrderId { get; init; } } — callers must provide OrderId and can't change it afterwards. This is the modern alternative to large constructor parameter lists for DTOs and configuration objects. Much safer than having optional setters that callers might forget.
Primary constructors declare constructor parameters directly on the class: class OrderService(IOrderRepository repo, ILogger<OrderService> logger). The parameters are in scope throughout the class body — no need to assign them to private fields manually.
Pitfalls: primary constructor parameters are not automatically stored as fields — the compiler captures them only where needed. If the parameter is a mutable reference type accessed in multiple methods, you have one shared capture. For DI services this is fine. For value types or mutable objects you intend to copy independently, the capture semantics can be surprising. Debugging tools may show them differently to normal fields.
Extension methods are static methods in a static class where the first parameter has the this modifier. They appear as instance methods at the call site — the compiler rewrites myString.TruncateTo(50) as a static call. They cannot access private members, cannot be overridden, and don't participate in virtual dispatch.
Design limitations: they can conflict with actual instance methods (instance always wins), and they can create confusing intellisense when overused. Best uses: fluent builder chains, adding utility methods to types you don't own, and the entire LINQ operator pipeline. Avoid using them to work around poor class design — if a type genuinely needs that method, it should usually be on the type itself.
Implicit conversions happen automatically with no cast syntax. Use only when the conversion is always safe and lossless. If there's any possibility of data loss or failure, implicit conversion is dangerous because it happens silently.
Explicit conversions require a cast and should be used when the conversion can fail, lose precision, or requires the developer to opt in deliberately. For strongly-typed ID types (like OrderId wrapping a Guid), a clean pattern is: implicit from the primitive to the ID type (constructing is always safe), explicit back to the primitive (unwrapping should be deliberate). Never implement implicit conversions between unrelated domain types.
Implement IEquatable<T> (typed Equals), override object.Equals(object), override GetHashCode(), and overload == and !=. Critical rule: if two objects are equal, their hash codes must be identical. Violating this breaks Dictionary and HashSet lookups silently.
Use HashCode.Combine(field1, field2) to combine all fields that participate in equality. For records, all this is generated automatically. For mutable types: never base GetHashCode on mutable state — the hash would change after mutation, orphaning the object in any hash-based collection it's stored in.
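The full manual pattern on a hypothetical immutable Money value object:

```csharp
using System;
using System.Collections.Generic;

var a = new Money(10.5m, "GBP");
var b = new Money(10.5m, "GBP");

Console.WriteLine(a.Equals(b));                        // True
Console.WriteLine(a == b);                             // True
Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True — required when Equals is true

var set = new HashSet<Money> { a };
Console.WriteLine(set.Contains(b));                    // True — hash lookup works

public sealed class Money : IEquatable<Money>
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency) => (Amount, Currency) = (amount, currency);

    public bool Equals(Money other) =>
        other is not null && Amount == other.Amount && Currency == other.Currency;

    public override bool Equals(object obj) => Equals(obj as Money);

    // Combine exactly the fields that participate in equality.
    public override int GetHashCode() => HashCode.Combine(Amount, Currency);

    public static bool operator ==(Money left, Money right) =>
        left is null ? right is null : left.Equals(right);
    public static bool operator !=(Money left, Money right) => !(left == right);
}
```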
Lazy<T> is the correct tool. new Lazy<T>(() => new ExpensiveService(), LazyThreadSafetyMode.ExecutionAndPublication) ensures the factory runs exactly once even under concurrent access. Accessed via .Value. The thread safety mode blocks all threads until the first one completes initialisation.
Manual double-checked locking is the low-level alternative — only correct when combined with a volatile field to prevent reordering. In practice, Lazy<T> should be your default. For async lazy initialisation, Lazy<Task<T>> works (it defaults to ExecutionAndPublication), but beware that a faulted Task is cached permanently — a failed first attempt is never retried.
The full pattern: implement a public Dispose() that calls Dispose(true) and GC.SuppressFinalize(this). Add a protected virtual Dispose(bool disposing) method — when disposing is true, release both managed and unmanaged resources; when false (called from the finalizer), release only unmanaged resources.
You only need a finalizer (~ClassName) when your class directly holds unmanaged resources (raw OS handles, native pointers). For most code wrapping managed disposables (streams, DB connections), a simple IDisposable without a finalizer is correct. Finalizers increase GC pressure — finalizable objects survive at least one extra collection and are promoted a generation before cleanup. SafeHandle is the modern alternative to manual finalizers for OS handles.
IAsyncDisposable defines ValueTask DisposeAsync() and is used with await using. Use it when cleanup involves async operations — flushing a buffer to a stream, closing a database connection gracefully, or waiting for a background task to drain before shutdown.
If you call a synchronous Dispose() on a resource needing to flush async data, you either block (bad) or lose data. StreamWriter and DbConnection implement both interfaces. For your own types: if cleanup is sync only, use IDisposable. If cleanup has I/O or awaitable work, implement IAsyncDisposable. Prefer implementing both when your type will be used in both sync and async contexts.
Since C# 7.2, you can assign stackalloc directly to a Span<T> without an unsafe context: Span<byte> buffer = stackalloc byte[128];. The memory lives on the stack frame and is freed automatically when the method returns — zero GC involvement.
Critical: never let the Span escape the method (returning it, storing in a field, or passing across an await boundary will reference freed stack memory). For buffers larger than a few kilobytes, prefer ArrayPool<byte>.Shared.Rent() — stack overflow is a real risk with large stackalloc. The pattern of stackalloc for small buffers and ArrayPool for larger ones is standard in high-performance .NET code like ASP.NET Kestrel.
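A tiny sketch of the small-buffer pattern — formatting an integer into a stack buffer (the 16-byte size is arbitrary):

```csharp
using System;
using System.Text;

// Small, fixed-size scratch buffer on the stack — no GC allocation.
Span<byte> buffer = stackalloc byte[16];

int written = 0;
int value = 255;
while (value > 0)                       // write digits in reverse
{
    buffer[written++] = (byte)('0' + value % 10);
    value /= 10;
}
buffer[..written].Reverse();            // fix the digit order in place

string text = Encoding.ASCII.GetString(buffer[..written]);
Console.WriteLine(text);                // 255
```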
A delegate instance holds an invocation list — multiple methods combined with +=. When invoked, all are called in order. For return values, only the last delegate's return value is kept.
Risks: if any subscriber throws an exception, subsequent subscribers in the list are skipped. Fix: iterate GetInvocationList() manually with a try/catch per delegate. The classic memory leak: a long-lived publisher holds a reference to a short-lived subscriber through the delegate — the subscriber can never be GC'd until it unsubscribes (-=). Solutions: WeakEvent pattern, or use IObservable/Rx with explicit subscription disposal via the returned IDisposable.
Default interface implementations (C# 8): interfaces can provide method bodies. Primary motivation: adding a new method to a published interface without breaking all existing implementations. Downside: it muddies the interface/abstract class distinction and the default method isn't accessible through a concrete type variable — only through the interface reference.
Static abstract interface members (C# 11): interfaces can declare static members that implementations must provide. The killer use case is generic math — INumber<T> declares static abstract T operator +(T left, T right), enabling truly generic arithmetic algorithms like T Sum<T>(IEnumerable<T> values) where T : INumber<T>. Previously impossible without boxing or dynamic dispatch.
Virtual method calls go through the vtable — an indirection that prevents the JIT from inlining or devirtualising the call. When a class or method is sealed, the JIT knows the exact target and can devirtualise and inline, eliminating the vtable lookup overhead. In benchmarks on hot paths this can be a meaningful speedup.
When to seal: seal classes that are not designed for inheritance — it's an explicit design signal and a JIT hint. For override methods on non-sealed classes, sealed override prevents further subclasses from overriding while allowing devirtualisation on that specific type. The .NET runtime seals many core types for this reason. Don't seal everything prematurely — seal where it's architecturally intentional.
Microsoft.Extensions.ObjectPool provides a thread-safe pool where you Get() an instance, use it, then Return() it. Beyond capacity, extra returned objects are discarded for GC. StringBuilder pooling is the textbook example — string concatenation in loops is a common GC hotspot.
Use it for objects that are: expensive to create, used frequently for short durations, and can be reset to a clean state before reuse. The requirement that returned objects must be safely resetable is the main constraint — if cleanup is complex or unreliable, pooling creates subtle bugs where stale state leaks between uses.
Use BenchmarkDotNet — it handles JIT warmup, multiple iterations, statistical analysis, and GC pressure measurement automatically. Never measure with Stopwatch in Debug mode or without warmup iterations — the JIT hasn't compiled the hot path yet.
Pitfalls: dead code elimination (the JIT may optimise away code whose result is never used — return the value from the benchmark method or feed it to BenchmarkDotNet's Consumer). GC interference (enable MemoryDiagnoser to see allocations per operation). CPU frequency scaling skews results. Always benchmark in Release mode — Debug builds disable inlining and change performance characteristics dramatically.
SIMD (Single Instruction Multiple Data) processes multiple data elements in a single CPU instruction. .NET exposes this through System.Numerics.Vector<T> (portable, auto-vectorised) and System.Runtime.Intrinsics (hardware-specific: SSE2, AVX2, ARM NEON). A Vector<float> can process 8 floats simultaneously on AVX2.
Use cases: bulk mathematical operations, byte searching/parsing (JSON/CSV parsers), checksum/hashing algorithms. The .NET runtime uses AVX intrinsics internally in string.IndexOf and Span.SequenceEqual. Vector<T> gives portable SIMD without hardware-specific code. Real-world gains can be 4–16× for appropriate numeric workloads.
C# doesn't have native discriminated unions, but you can approximate them with a sealed record hierarchy: abstract record Result with subtypes record Success(T Value) : Result and record Failure(string Error) : Result. Pattern matching on the sealed hierarchy is exhaustive.
Methods return Result instead of throwing exceptions for expected failure cases (validation errors, not-found, business rule violations). Callers are forced to handle both branches — error paths become explicit in the type system. This is the Railway Oriented Programming pattern. Reserve exceptions for truly unexpected conditions, not for normal business flow control. The OneOf NuGet package provides a richer generic implementation.
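A minimal sketch of the sealed-hierarchy approach — Result, Success, Failure, and ParseAge are all hypothetical names for illustration:

```csharp
using System;

static Result<int> ParseAge(string input) =>
    int.TryParse(input, out var age) && age >= 0
        ? new Success<int>(age)
        : new Failure<int>($"'{input}' is not a valid age");

// Pattern matching forces callers to handle both branches explicitly.
static string Describe(Result<int> result) => result switch
{
    Success<int> s => $"age is {s.Value}",
    Failure<int> f => $"error: {f.Error}",
    _ => throw new InvalidOperationException("unreachable for a sealed hierarchy")
};

Console.WriteLine(Describe(ParseAge("30")));   // age is 30
Console.WriteLine(Describe(ParseAge("abc")));  // error: 'abc' is not a valid age

public abstract record Result<T>;
public sealed record Success<T>(T Value) : Result<T>;
public sealed record Failure<T>(string Error) : Result<T>;
```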
The pipeline is a chain of RequestDelegate functions (Func<HttpContext, Task>). Each middleware receives the current HttpContext and a reference to the next middleware. It can execute logic before calling await next(context), after, or short-circuit entirely without calling next.
To write custom middleware: implement a class with InvokeAsync(HttpContext context, RequestDelegate next). DI is injected via the method parameters (per-request), not the constructor (singleton). Register with app.UseMiddleware<MyMiddleware>(). Key design: middleware that modifies the response must act before the response starts — once headers are sent, you can't change the status code.
Implement BackgroundService and override ExecuteAsync(CancellationToken stoppingToken). The DI container starts it with the host and provides the stopping token at shutdown. Register with services.AddHostedService<MyWorker>().
Reliability: wrap the main loop body in try/catch — an unhandled exception stops the service silently in older .NET versions. For work that must survive crashes, consume from a message queue rather than an in-memory queue that loses data on restart. For scoped DI services (like DbContext), inject IServiceScopeFactory and create a scope per unit of work — background services are singletons and cannot directly inject scoped services.
Classic approach: register multiple implementations, inject IEnumerable<IMyService> and select at runtime. Works but is awkward when you want one specific implementation by name.
Keyed services (.NET 8): services.AddKeyedSingleton<IPaymentGateway, StripeGateway>("stripe"). Inject with [FromKeyedServices("stripe")] IPaymentGateway stripe or resolve via provider.GetKeyedService<IPaymentGateway>("stripe"). This is the clean solution for the Strategy pattern with DI. Before .NET 8, the workaround was a factory delegate or a dedicated resolver service — both add indirection but work across older versions.
Use IExceptionHandler (.NET 8+) — implement the interface, register with services.AddExceptionHandler<MyHandler>() and app.UseExceptionHandler(). Map exception types to ProblemDetails responses (RFC 7807 standard) — consistent JSON with type, title, status, and detail fields.
Register multiple handlers in priority order — the first returning true from TryHandleAsync wins. Map domain exceptions (NotFoundException → 404, ValidationException → 400, ConflictException → 409) and fall through to a 500 catch-all. Never expose stack traces or internal exception messages in production responses. This centralises error handling rather than scattering try/catch blocks through every controller action.
- IOptions<T> — Singleton. Configuration read once at startup and cached forever. Use for settings that never change. Fastest.
- IOptionsSnapshot<T> — Scoped. Re-reads configuration per HTTP request. Use when configuration can change and you want per-request consistency. Cannot be injected into singletons.
- IOptionsMonitor<T> — Singleton. Reads live configuration and provides an OnChange callback. Use in singletons that need to react to config changes (feature flags, rate limit thresholds).
Add validation with ValidateDataAnnotations() and ValidateOnStart() — fails fast at startup if config is invalid rather than crashing at first use in production.
Polly v8 uses a ResiliencePipelineBuilder: chain AddRetry, AddCircuitBreaker, and AddTimeout. Strategies compose in the order they're added, first added outermost — so that chain produces retry wrapping circuit breaker wrapping timeout around the actual operation.
Key configuration: retry with ExponentialBackoff plus jitter to avoid thundering herd (all clients retrying simultaneously). Circuit breaker with FailureRatio threshold and minimum throughput so it doesn't open on a single failure. Register via services.AddResiliencePipeline("http-client", builder => …). For HttpClient, AddStandardResilienceHandler() from Microsoft.Extensions.Http.Resilience configures sensible defaults for all of the above in one call.
Implement IHealthCheck with CheckHealthAsync returning HealthCheckResult.Healthy/Degraded/Unhealthy. Register with services.AddHealthChecks().AddCheck<DatabaseHealthCheck>("database", tags: new[] { "ready" }). Expose with separate endpoints for liveness and readiness.
Key distinction: liveness — is the process alive and not deadlocked? (Simple ping, no external dependencies.) Readiness — is it ready to serve traffic? (Check DB connectivity, cache warming, downstream service reachability.) Kubernetes uses liveness to restart a pod and readiness to decide whether to route traffic. The AspNetCore.Diagnostics.HealthChecks NuGet has pre-built checks for SQL Server, Redis, RabbitMQ, and many others.
[CallerMemberName], [CallerFilePath], and [CallerLineNumber] are applied to optional parameters. The compiler fills them in at the call site with the caller's method name, source file path, and line number — with zero runtime overhead compared to reflection-based approaches.
Practical use: void Log(string message, [CallerMemberName] string method = "", [CallerLineNumber] int line = 0) — callers just write Log("Started") and the logger automatically captures the method name and line. Also used in INotifyPropertyChanged: a SetProperty(ref _field, value, [CallerMemberName] string prop = "") helper means you never repeat the property name as a string — safer than hardcoded strings, with none of the runtime overhead of reflection.
Before C# 10, an API taking an interpolated string — e.g. Debug.Assert(condition, $"state: {ExpensiveDump()}") — always built the formatted string, even when it was never used. Interpolated string handlers let a method intercept the string pieces before the string is built — if the message isn't needed (the assertion passed, the log level is disabled), the handler signals "stop" and neither the formatting nor the argument evaluation ever runs.
From .NET 6, framework APIs like Debug.Assert and StringBuilder.Append opt in, and custom handlers let high-performance logging libraries do the same. Note that Microsoft.Extensions.Logging's LogDebug still takes a plain string, so $"Processing order {orderId}" is formatted before the call — there, the structured template form LogDebug("Processing order {OrderId}", orderId), or the [LoggerMessage] source generator, remains the right pattern, replacing the manual if (logger.IsEnabled(LogLevel.Debug)) guard. Callers of handler-enabled APIs write normal interpolated strings and get the performance for free — it's a compiler feature with no change required at the call site.
Response Caching is header-based (RFC-compliant HTTP caching). It respects Cache-Control headers and is driven by the client — a client can bypass it with a request header. Primarily useful for CDN-cacheable public responses.
Output Cache (.NET 7+) is server-controlled regardless of client headers. It supports tag-based invalidation — cache.EvictByTagAsync("orders") clears all cached responses tagged with "orders" when data changes. Supports vary-by query string/header/claim and custom policies. Use it for authenticated endpoints, briefly cached dynamic pages, or when you need server-side invalidation. The right default choice for most API response caching in modern .NET.
Endpoint filters (IEndpointFilter, .NET 7+) are similar to middleware but scoped to a specific endpoint or route group. They can access strongly-typed endpoint arguments — unlike middleware, which only sees HttpContext. This makes them ideal for validation, logging, or caching logic that needs the parsed body/route parameters.
vs. Middleware: middleware applies globally (or path-prefixed). Filters apply per-endpoint or per-group via AddEndpointFilter(). For cross-cutting concerns affecting all requests (auth, rate limiting, global error handling), use middleware. For endpoint-specific validation that needs typed arguments, use endpoint filters. They compose cleanly with route groups, making it easy to apply a ValidationFilter across an entire API surface without repeating it per route.
NativeAOT (PublishAot) compiles your .NET application to a self-contained native binary ahead of time — no JIT compilation at startup, much faster cold start, smaller memory footprint, and no .NET runtime required on the target machine. Ideal for microservices, CLIs, and containerised lambdas where startup time matters.
Constraints: dynamic code generation and unrestricted reflection are incompatible — you cannot call Assembly.Load, use Activator.CreateInstance without AOT hints, or rely on any code path that generates IL at runtime. Your dependencies must also be AOT-compatible. This is why source generators and [DynamicallyAccessedMembers] annotations exist — they give the trimmer/AOT compiler enough information to preserve what's needed. Many popular NuGet packages still don't support NativeAOT. Validate compatibility with PublishTrimmed before jumping to NativeAOT.
FrozenDictionary<TKey, TValue> and FrozenSet<T> are read-only collections optimised specifically for lookup performance. At creation time (.ToFrozenDictionary()), they spend extra time building a perfect or near-perfect hash structure specific to the known set of keys. Subsequent lookups are faster than Dictionary because the hash function is specialised and there's no need to handle mutations.
Ideal for: configuration lookups, static mappings (country codes, currency pairs, HTTP status descriptions), permission tables — any collection that's built once at startup and read millions of times. The creation cost is paid once; the lookup savings accumulate over the application's lifetime. Not suitable for any collection that needs to be mutated after creation.
Code that directly calls DateTime.UtcNow or DateTimeOffset.Now is untestable without time travel — you can't control what the clock returns in a unit test. TimeProvider is an abstract class injected as a dependency, with GetUtcNow(), CreateTimer(), and GetElapsedTime() methods. The default implementation delegates to the system clock; tests use a FakeTimeProvider that you advance manually.
This enables deterministic tests for anything time-dependent: token expiry, scheduled jobs, retry backoff calculations, rate limiting windows, and business rules based on dates. The Microsoft.Extensions.TimeProvider.Testing NuGet provides FakeTimeProvider with Advance(TimeSpan) to control the clock programmatically. A simple change with a big impact on testability — always inject time rather than calling the system clock directly.
1. Enable the deadlock trace flag 1222 or use Extended Events to capture the full deadlock graph — it shows the two transactions, the resources they hold, and what each is waiting for.
2. Identify the lock order conflict — Transaction A holds Lock X and wants Lock Y; Transaction B holds Y and wants X. The fix is usually to enforce consistent lock ordering across all code paths that touch those tables.
3. Keep transactions short — never do network calls or user-facing operations inside a transaction.
4. Use READ COMMITTED SNAPSHOT ISOLATION (RCSI) on the database so readers don't block writers.
5. Add targeted indexes to reduce the range of rows locked.
6. For read-heavy reporting queries, consider NOLOCK hints (with the caveat of dirty reads).
Start with the Actual Execution Plan in SSMS — look for table scans, large discrepancies between estimated and actual row counts, or key lookups. SET STATISTICS IO ON to see logical reads. Check the missing index suggestions at the top of the plan.
Ensure queries are SARGable — avoid wrapping indexed columns in functions. WHERE YEAR(OrderDate) = 2024 prevents index use; replace with WHERE OrderDate BETWEEN '2024-01-01' AND '2024-12-31'. Add non-clustered covering indexes that include all columns needed by the query so SQL Server doesn't need a key lookup. Review join order and use query hints as a last resort. For analytics, consider indexed views or pre-aggregated summary tables.
Version-control all schema changes using EF Core Migrations, Flyway, or Liquibase — scripts run automatically as part of the deployment pipeline. The key pattern for zero downtime is expand-contract:
- Expand: Add the new column as nullable, deploy code that handles both the old and new schema.
- Contract: Once all traffic is on the new code, backfill data and make the column NOT NULL, then remove the old column in a later release.
Never run DDL inside a long transaction on a live table. On SQL Server, use online index operations for large tables. Blue-green deployments make rollback trivial if a migration fails.
A race condition (specifically a lost update) occurs when two transactions read the same value, then both write an update based on that stale read — one write silently overwrites the other. The default READ COMMITTED isolation level does not prevent this: the shared locks taken for the reads are released before the writes happen.
Solutions: Use SNAPSHOT isolation so each transaction sees a consistent view (conflicting writes are detected and one transaction fails). Implement optimistic concurrency with a rowversion column — throw a concurrency exception if the version changed between read and write. Or use SELECT ... WITH (UPDLOCK, ROWLOCK) to take an update lock at read time, preventing any other transaction from acquiring an update lock on the same row until yours commits.
Optimistic: read the data, process, then at save time check if anyone else changed it (using an ETag or rowversion). No locks held during the operation — great for low-contention, read-heavy workloads. If a conflict is detected, surface it to the user or retry.
Pessimistic: lock the row on read (SELECT FOR UPDATE), preventing others from modifying it until you commit. For a financial tool, I'd lean pessimistic — the cost of a conflict (double-billing, lost balance) far outweighs the performance cost of locking. Keep the transaction window as small as possible to minimise contention.
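The optimistic flow can be sketched with a toy in-memory store — OptimisticStore and its version counter are illustrative stand-ins for a table with a rowversion column, not any real API:

```python
class VersionConflict(Exception):
    """Raised when the row changed between read and write."""


class OptimisticStore:
    """Toy rowversion check: a write succeeds only if the version
    the caller read is still current (the ETag/rowversion pattern)."""

    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        # Caller receives the value AND the version it was read at.
        return self.value, self.version

    def write(self, new_value, expected_version):
        if expected_version != self.version:
            # Someone else committed first — surface it or retry.
            raise VersionConflict("row changed since it was read")
        self.value = new_value
        self.version += 1
```

Two concurrent readers both see version 1; whichever writes second gets a VersionConflict instead of silently clobbering the first write — exactly the lost-update protection described above, with no locks held while each caller does its processing.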
Stack: .NET Worker Service consumers, Azure Service Bus / RabbitMQ as the queue, PostgreSQL for notification status tracking, Redis for deduplication and rate-limiting. Delivery channels: FCM/APNs for mobile, SendGrid for email, SignalR for real-time web.
Publishers emit events to the queue. Multiple consumer instances pull messages, fan out to delivery channels, and write status back to the DB. Key concerns: idempotency keys to prevent duplicate sends, a dead-letter queue for permanently failed messages, retry with exponential backoff, and per-recipient rate limits. Scale horizontally by adding consumers. Monitor with queue depth alerts.
Start with a monolith unless you have a compelling reason not to. Microservices add significant overhead: distributed tracing, network latency, eventual consistency challenges, and multiple deployment pipelines. Teams often underestimate this cost early on.
Switch to microservices when: a specific domain needs independent scaling, multiple teams are blocked by the same codebase, or one component has sharply different runtime requirements (e.g., an ML service needing Python + GPU). Use the strangler fig pattern to migrate incrementally — carve out services at natural seams rather than doing a big-bang rewrite.
Implement a Circuit Breaker (via Polly in .NET) that monitors failure rate. After a threshold of failures, the circuit opens and immediately rejects requests without hammering the dead dependency. After a cooldown, it enters half-open to probe recovery.
Layer this with: retry with exponential backoff + jitter for transient errors, fallback responses (serve cached data or a gracefully degraded response), and bulkhead isolation (separate thread pools per dependency) so one failing API doesn't starve threads needed by healthy services. Alert on circuit state changes and track in your observability platform.
The order service publishes an OrderPlaced event to a topic/exchange. The notification service subscribes independently — it has no knowledge of the order service's internals. They scale separately and neither blocks the other.
Use Kafka for high-throughput, event replay, or audit log requirements. Use RabbitMQ for simpler routing, lower latency, and request/reply patterns. Critical design concerns: idempotency (consumers must handle duplicate messages safely), dead-letter queues for poisoned messages, correlation IDs for distributed tracing, and consumer group management for parallel processing without duplicate processing.
- L1 — in-process (IMemoryCache): For hot, small, non-shared data like config or lookup tables. Fastest possible, but not shared across instances.
- L2 — distributed (Redis): For computed aggregates shared across API instances. Use a cache-aside pattern: read from cache; on a miss, compute and write back. Set a TTL (30–60 seconds for a live dashboard). For frequently mutated data, use write-through caching.
- DB level: materialised views or indexed views for expensive aggregates that rarely change.
- Invalidation strategy: TTL for simplicity; event-driven invalidation (publish an event on write to clear the key) for accuracy-critical data.
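The cache-aside read path fits in a few lines — sketched here with a plain dict and an injected clock standing in for Redis, purely for illustration:

```python
import time


def cache_aside(cache, key, compute, ttl_seconds=60, now=None):
    """Cache-aside: return the cached value if still fresh,
    otherwise compute it, store it with a TTL, and return it.

    `cache` maps key -> (value, expires_at); in production this
    would be Redis GET/SETEX rather than a local dict.
    """
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None and entry[1] > now:
        return entry[0]          # cache hit, still within TTL
    value = compute()            # cache miss (or expired): recompute
    cache[key] = (value, now + ttl_seconds)
    return value
```

The injected `now` parameter exists only to make the expiry behaviour deterministic in tests — the same reasoning as TimeProvider on the .NET side.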
gRPC uses HTTP/2, binary Protocol Buffers serialisation, and generates strongly-typed clients in multiple languages. It's typically 5–10× faster than JSON/REST for equivalent payloads and supports bidirectional streaming natively. Ideal for: internal service-to-service communication, high-throughput pipelines, and polyglot environments where the generated client eliminates hand-rolled API calls.
REST is universally supported, browser-compatible, human-readable, and easier to debug with standard tools. Choose REST for: public-facing APIs, browser clients, or anywhere human readability of payloads matters. In practice, I use REST for external-facing APIs and gRPC internally where performance is critical.
Use a sliding window or token bucket algorithm stored in Redis with the client API key as the key. On each request: INCR the counter and set a TTL, rejecting with 429 Too Many Requests if the limit is exceeded. Wrap the INCR + EXPIRE pair in a Lua script (or a single atomic command) so the counter and its expiry can't race.
In .NET 7+, the built-in RateLimiter middleware handles this cleanly. Design considerations: differentiated limits per API tier (free vs. paid), always return a Retry-After header so clients know when to retry, log and alert on sustained abuse patterns, and use separate Redis clusters so the rate limiter doesn't share capacity with your main cache.
- Factory / Abstract Factory — Creating service instances without coupling to concrete types. Pair with DI for clean registration.
- Strategy — Swappable algorithms at runtime (payment providers, pricing engines, device enrollment flows in my MDM system). Eliminates giant switch statements.
- Decorator — Adding cross-cutting concerns (logging, caching, retry) to existing services without modifying them. Integrates naturally with DI.
- Repository + Unit of Work — Clean data access abstraction, makes swapping ORMs or mocking DB calls trivial.
- Observer / Event — Decoupled notifications. C#'s events and delegates are a first-class implementation of this pattern.
Issue JWTs signed with a private key (RS256). Embed claims (userId, roles, expiry). Validate signature, issuer, audience, and expiry on every request via AddJwtBearer middleware. Use short-lived access tokens (15 mins) paired with longer-lived refresh tokens stored server-side (in Redis or DB) so they can be revoked.
For external OAuth2 flows: use Authorization Code + PKCE — never Implicit flow. Always HTTPS. Rotate signing keys periodically using a key rotation strategy (publish a JWKS endpoint). For internal machine-to-machine: use Client Credentials flow. Store secrets in Azure Key Vault or equivalent — never in config files or environment variables checked into source control.
- Local state (useState) — For ephemeral, component-specific UI state (open/close, form inputs). No sharing needed.
- Context API — For low-frequency global state (auth user, theme, locale). Avoid it for high-frequency updates — every context consumer re-renders on every change.
- Redux — For complex shared state with many actors, when you need time-travel debugging or a strict unidirectional data flow. Higher boilerplate.
- Zustand — My preference for new projects. Minimal API, no boilerplate, fine-grained subscriptions so components only re-render when their specific slice of state changes. Easier to test than Redux.
Start with the React DevTools Profiler to find components with high render times or that re-render unexpectedly often. Enable the "Record why each component rendered" setting to see the cause of each re-render.
Fixes: React.memo to prevent re-renders when props haven't changed. useMemo for expensive derived computations. useCallback to stabilise function references passed as props (without it, a new function object is created every render, busting memo). Virtualise long lists with react-window or TanStack Virtual. Lazy-load heavy components with React.lazy + Suspense. Avoid creating new objects/arrays inline in JSX (they break reference equality and defeat memoisation).
useEffect with no deps = runs on every render. [] = once on mount. [dep] = runs when dep changes. Common pitfalls:
- Missing dependencies — Stale closure; you're reading an outdated variable value. ESLint's exhaustive-deps rule catches this.
- Infinite loop — Setting state unconditionally inside an effect that lists that state as a dep. Fix: add a condition, or restructure logic.
- New object reference in deps — An inline object like {id: 1} creates a new reference every render. Wrap with useMemo or move outside the component.
- Async useEffect — Can't directly make the callback async; instead define an inner async function and call it.
Route-level code splitting with React.lazy() + Suspense — only load JS for the current page. Use Webpack's / Vite's dynamic import() for heavy components that aren't needed immediately. Enable tree-shaking (use ES modules, not CommonJS) to eliminate dead code from bundles.
Serve static assets from a CDN. Preload critical fonts and above-the-fold images. Paginate or virtualise data-heavy tables. Use skeleton screens instead of spinners for perceived performance. Analyse bundle size with source-map-explorer or Vite bundle visualiser to find large dependencies that can be swapped for lighter alternatives.
When the same stateful logic — typically a combination of useState + useEffect — is needed in more than one component. If you find yourself copy-pasting that combination, extract it into a custom hook.
Examples: usePagination encapsulating page index, page size, and fetch logic. useDebounce for search inputs. useDeviceStatus in my MDM system for polling a device's online state. The hook keeps the component clean and the logic testable in isolation. One rule: custom hooks must start with use so React's linter can enforce the rules of hooks inside them.
Error boundaries are class components implementing getDerivedStateFromError (to render a fallback UI) and componentDidCatch (to log the error). They catch render-time errors in their entire child tree. Wrap critical sections — a page, a widget, a data table — so a single component's failure degrades gracefully rather than crashing everything.
Use the react-error-boundary library for a clean functional API. Inside componentDidCatch, forward the error and stack trace to your monitoring service (e.g., Sentry). Note: error boundaries do not catch errors in event handlers or async code — those need try/catch.
CSR: browser downloads a JS bundle and renders in the client. Fast subsequent navigation, but slow initial load and no meaningful content until JS executes — bad for SEO and users on slow connections.
SSR (Next.js): HTML is pre-rendered on the server, so the user gets visible content immediately (fast FCP/LCP) and search engine crawlers see fully-formed HTML. Choose SSR for: public-facing marketing sites, e-commerce, or any page where SEO and first-paint performance matter. CSR is perfectly fine for authenticated dashboards and internal tools where SEO is irrelevant and the user is on a fast connection.
Prop drilling is passing props through multiple layers of components that don't actually use them, just to reach a deeply nested consumer. It becomes a problem around 3+ levels deep — refactoring requires touching every intermediate component.
Solutions in order of preference: Component composition (pass the rendered child element directly rather than its data). Context API for infrequent global state. Zustand/Redux for high-frequency or complex state. Often, simply restructuring the component tree — colocating state closer to where it's consumed — resolves the issue without needing a state manager.
Use a single form state object at the wizard level, managed with useReducer for complex step transitions. Distribute via React Hook Form's FormProvider — it uses an uncontrolled input model, meaning inputs don't re-render on each keystroke since the form state lives in a ref.
Keep each step as an isolated component reading from and writing to the shared context. Wrap steps in React.memo so only the active step re-renders on state changes. Validate per-step on "next" using RHF's trigger() so the user gets instant feedback without triggering a full form validation.
- LCP (Largest Contentful Paint) — How quickly the main content loads. Fix: preload hero images, fast server response, CDN for assets.
- INP (Interaction to Next Paint) — Responsiveness to user input. Fix: minimise long JS tasks, offload heavy computation to Web Workers.
- CLS (Cumulative Layout Shift) — Unexpected layout shifts. Fix: always specify width/height on images and iframes; avoid injecting content above existing content.
Measure with Lighthouse, PageSpeed Insights, and real-user monitoring (e.g., Vercel Analytics, Datadog RUM). For accessibility: WCAG 2.1 AA as a baseline — ARIA labels, keyboard navigation, colour contrast ratios ≥4.5:1, focus management in modals.
Sliding window with two pointers and a HashSet. Expand the right pointer, add the character to the set. If the character is already in the set, shrink from the left, removing characters until the duplicate is gone. Track the maximum window size seen.
Time complexity: O(n) — each character is added and removed at most once. Space complexity: O(min(n, k)) where k is the charset size (26 for lowercase letters, 128 for ASCII). For a more performant version, use a HashMap to jump the left pointer directly past the last occurrence of the duplicate rather than shrinking one step at a time.
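A sketch of the HashMap variant described above, which jumps the left pointer directly past the duplicate instead of shrinking one step at a time:

```python
def longest_unique_substring(s):
    """Length of the longest substring without repeating characters.

    last_seen maps each character to its most recent index, so on a
    duplicate the left pointer jumps past the previous occurrence
    in a single step. O(n) time, O(min(n, charset)) space.
    """
    last_seen = {}
    left = 0
    best = 0
    for right, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= left:
            left = last_seen[ch] + 1   # jump past the old occurrence
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best
```

The `last_seen[ch] >= left` guard matters: an occurrence to the left of the current window is stale and must not move the pointer backwards.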
Expand-around-centre: for each character (and each pair of adjacent characters for even-length palindromes), expand outward while the characters match. Track the longest expansion seen. Time: O(n²), Space: O(1) — clean and interview-appropriate.
The theoretically optimal solution is Manacher's algorithm at O(n), but its implementation complexity makes it a poor interview choice under time pressure. Mention it shows you're aware it exists, then implement expand-around-centre. Key edge cases: single-character strings (palindrome by definition), all identical characters, and even vs odd length palindromes.
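Expand-around-centre in compact form — each index is tried both as an odd-length centre and as the left half of an even-length centre:

```python
def longest_palindrome(s):
    """Longest palindromic substring via expand-around-centre.
    O(n^2) time, O(1) extra space."""

    def expand(lo, hi):
        # Grow outward while the window remains a palindrome.
        while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
            lo -= 1
            hi += 1
        return lo + 1, hi          # last valid bounds, half-open

    start, end = 0, 0
    for i in range(len(s)):
        # Odd-length centre at i; even-length centre between i and i+1.
        for lo, hi in (expand(i, i), expand(i, i + 1)):
            if hi - lo > end - start:
                start, end = lo, hi
    return s[start:end]
```

The two `expand` calls per index are exactly the odd/even edge case called out above; the empty string falls out naturally as `s[0:0]`.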
Staircase search: start at the top-right corner. If the current value equals the target — found. If the target is less than current — move left (eliminate this entire column). If the target is greater — move down (eliminate this entire row).
This works because the top-right is the smallest in its row and largest in its column, making every comparison eliminate an entire row or column. Time: O(m + n) versus O(m × n) for brute force, or O(m log n) for binary search per row. Space: O(1). You can also start from the bottom-left with the same logic inverted.
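The staircase walk, sketched for a matrix whose rows and columns are both sorted ascending:

```python
def search_sorted_matrix(matrix, target):
    """Staircase search from the top-right corner. O(m + n) time."""
    if not matrix or not matrix[0]:
        return False
    row, col = 0, len(matrix[0]) - 1   # start at top-right
    while row < len(matrix) and col >= 0:
        value = matrix[row][col]
        if value == target:
            return True
        if target < value:
            col -= 1   # everything below in this column is larger too
        else:
            row += 1   # everything left in this row is smaller too
    return False
```

Each comparison discards a full row or column, so the pointer moves at most m + n times before falling off the matrix.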
Dictionary: O(1) average lookup, insert, and delete by key. Use whenever you need frequent key-based lookups, existence checks, or grouping. A List requires O(n) linear search (via LINQ's .First()). For anything over a few hundred items that you look up by a key, always prefer Dictionary.
Hash collisions occur when two keys produce the same hash bucket — .NET resolves this with chaining, degrading worst-case to O(n), though well-distributed hashcodes make this rare in practice. Lists win for: ordered sequential iteration, index-based access (O(1) by index), or small collections where the hashing overhead isn't worth it. Use HashSet<T> for existence-only checks with no value needed.
Python uses reference counting as its primary mechanism — when a reference count drops to zero, memory is freed immediately. A secondary cyclic garbage collector handles reference cycles. C# uses a generational tracing GC (Gen0/1/2) that's optimised for high-throughput allocation and collection of short-lived objects.
Python's refcounting gives more deterministic cleanup but is slower for rapid alloc/dealloc cycles. C# gives you more control with IDisposable, Span<T>, stackalloc, and unmanaged memory when needed. In Python, everything is an object on the heap — there's no equivalent to C# structs as lightweight stack-allocated value types, which means more GC pressure per object.
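Both mechanisms are easy to observe from Python itself — the Node class below is a throwaway example used only to build a reference cycle that pure refcounting can never reclaim:

```python
import gc
import sys

# Reference counting: a second name bumps the count; deleting it drops it.
x = []
before = sys.getrefcount(x)   # getrefcount's own argument adds one temporary ref
y = x                          # count + 1
after = sys.getrefcount(x)
del y                          # count back down; at zero the list is freed immediately


class Node:
    """Throwaway class used to build a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycle_and_collect():
    gc.collect()                  # start from a clean slate
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a -> b -> a: refcounts can never hit zero
    del a, b                      # unreachable, but refcounting alone won't free them
    return gc.collect()           # the cyclic collector finds and frees them
```

The return value of gc.collect() is the number of unreachable objects it found — non-zero here precisely because the cycle defeated refcounting.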
The Global Interpreter Lock ensures only one thread executes Python bytecode at a time, even on multi-core hardware. This means Python threads don't give you true parallelism for CPU-bound work. The GIL is released during I/O operations, so threading still improves throughput for I/O-bound tasks like HTTP calls or file reads.
For CPU-bound parallelism, bypass the GIL with the multiprocessing module — separate processes each have their own GIL and can run on separate cores. Alternatively, use Cython or Numba for JIT-compiled extensions that release the GIL during computation, or numpy/scipy operations which internally release it. For async I/O concurrency, asyncio avoids threads entirely and is often the cleanest solution.
Build the FastAPI service with Pydantic models for request/response validation and auto-generated OpenAPI docs. Secure service-to-service communication with JWT tokens — the C# backend is the authority that issues tokens (using Client Credentials or a shared secret), and FastAPI validates the token's signature and claims on every request using python-jose.
Both services sit behind an API gateway (Nginx, Kong, Cloudflare) for routing and TLS termination. For async communication, publish events to a shared RabbitMQ or Kafka topic. Align on consistent error response formats and propagate correlation IDs for distributed tracing across services.
A decorator is a higher-order function that takes a function, wraps it in another function with added behaviour, and returns the wrapper. Use functools.wraps to preserve the original function's name and docstring.
Practical example — a timing decorator: @track_performance wraps any function, records execution time, and logs it. I've written decorators for: request logging (logging inputs/outputs for audit trails), retry logic (wrapping an API call with exponential backoff), and @require_role("admin") for lightweight authorisation checks on FastAPI endpoints without polluting every route handler with if-checks.
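A minimal version of the @track_performance decorator described above — the last_elapsed attribute is an illustrative addition so the timing is inspectable without a logger:

```python
import functools
import time


def track_performance(func):
    """Record how long each call to `func` takes."""

    @functools.wraps(func)                 # preserve __name__ and __doc__
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result

    wrapper.last_elapsed = 0.0
    return wrapper


@track_performance
def slow_add(a, b):
    """Add two numbers, slowly."""
    time.sleep(0.001)
    return a + b
```

Without functools.wraps, slow_add.__name__ would report "wrapper" and the docstring would be lost — which is why the answer above calls it out.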
A generator function uses yield to return values one at a time, pausing execution between yields. The caller gets a lazy iterator — values are computed on demand and never all stored in memory simultaneously. A list comprehension loads every element upfront; a generator expression produces each element only when asked.
For processing a 10GB CSV file: a list-based approach loads everything into RAM. A generator yields one row at a time, keeping memory usage flat regardless of file size. Generators compose naturally: chain them in a pipeline where each stage is a generator, and only the final stage actually pulls data through. They're foundational to Python's asyncio model too.
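A tiny two-stage generator pipeline — the stage names and the in-memory CSV are illustrative; with a real 10GB file you'd pass an open file handle instead of StringIO:

```python
import csv
import io


def read_rows(handle):
    """Yield one parsed CSV row at a time; nothing is buffered."""
    yield from csv.reader(handle)


def large_orders(rows, threshold):
    """Second pipeline stage: lazily filter rows by amount."""
    for row in rows:
        if float(row[1]) >= threshold:
            yield row


data = io.StringIO("ord-1,10.0\nord-2,99.5\nord-3,250.0\n")
pipeline = large_orders(read_rows(data), threshold=50)
result = [row[0] for row in pipeline]   # only now does data flow through
```

Until the final list comprehension iterates, no row has been read at all — each stage pulls exactly one row through the whole chain at a time, so peak memory is one row, not one file.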
Dunder (double-underscore) methods define how objects behave with Python's built-in operations and syntax. They make custom classes feel native. Key examples: __init__ (constructor), __repr__ / __str__ (string representation for debugging), __eq__ + __hash__ (equality and use as dict key), __len__ (supports len(obj)), __iter__ / __next__ (iteration protocol).
Practical example: implementing __enter__ and __exit__ makes a class a context manager (usable with with), perfect for database connections, file handles, or timed operations. I've used __getitem__ to make a config class subscriptable like a dict, hiding the underlying storage format from callers.
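Both ideas in miniature — a hypothetical Timed context manager and a Config class made subscriptable via __getitem__:

```python
import time


class Timed:
    """__enter__/__exit__ make this usable in a with block."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self.start
        return False                       # don't swallow exceptions


class Config:
    """__getitem__ and __len__ make the object feel like a dict,
    hiding the underlying storage format from callers."""

    def __init__(self, **settings):
        self._settings = settings

    def __getitem__(self, key):
        return self._settings[key]

    def __len__(self):
        return len(self._settings)
```

Usage reads like native Python: `with Timed() as t: ...` then `t.elapsed`, and `config["port"]` regardless of whether the values came from a file, environment variables, or a remote store.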
- CPU: cProfile — built-in profiler that shows call counts and cumulative time per function. Pair with snakeviz for a visual flame graph. Use line_profiler for line-by-line breakdown of a specific function.
- CPU (production): py-spy — sampling profiler with near-zero overhead; can attach to a running process without code changes.
- Memory: tracemalloc — take snapshots before and after a suspected leak; compare to find which code paths are allocating unreleased memory.
- Memory leaks: objgraph — visualises object reference chains to find what's keeping objects alive unexpectedly.
For leaks, the usual culprit is objects held in global collections, lingering closures, or circular references that the cyclic GC hasn't collected yet.
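The tracemalloc snapshot-diff workflow in miniature — the simulated "leak" is just a deliberately retained list of allocations:

```python
import tracemalloc


def find_allocations():
    """Compare two snapshots to see which line allocated in between."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    leak = [bytes(1000) for _ in range(100)]    # ~100KB of simulated growth
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Diff grouped by line, sorted by size delta — the leaking line is on top.
    stats = after.compare_to(before, "lineno")
    del leak                                    # kept alive until after the snapshot
    return stats[0].size_diff                   # biggest growth, in bytes
```

In a real investigation you'd take the snapshots minutes apart in a running service and read the top few entries of the diff, which point at file and line rather than a single suspect list.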
I built our Grubdirect MDM system — a platform for managing Android POS devices deployed across restaurant locations — entirely from scratch. The business problem was clear: operators had no centralised visibility or control over hundreds of field devices.
I led the requirements phase by mapping workflows with restaurant operators and support staff. I ran structured sessions identifying core scenarios (device enrolment, remote commands, monitoring alerts), translated these into epics, then validated priorities against business impact before writing a line of code. I owned the architecture end-to-end — from integrating Google's Android Enterprise APIs to designing the device state machine and building the management dashboard in React.
When integrating the Android Enterprise Management APIs, the documentation had gaps and some behaviour only revealed itself in testing. My approach: build an abstraction layer around the API client immediately, so our business logic never called the vendor's SDK directly — this meant we could mock it, swap it, or shim inconsistencies without touching core code.
For unreliability: I implemented retry with exponential backoff and a dead-letter pattern for failed operations. I also built internal mock services during development so our progress wasn't blocked by their availability. For communication gaps, I documented every undocumented behaviour I discovered in a shared internal wiki so the team wouldn't rediscover the same gotchas.
I translate technical constraints into business impact. Instead of "our authentication layer has accumulated technical debt," I say: "our current authentication setup will block us from adding enterprise SSO, which is a requirement for our three biggest prospects — that's potential ARR at risk." The stakeholder cares about risk, cost, and timelines, not the implementation.
I use analogies: "Think of technical debt like financial debt — we made a shortcut decision to ship faster, and now we're paying interest on that as slower development. Refactoring is paying down the principal." I always pair the problem with a clear proposal, timeline, and expected benefit — a stakeholder who understands the upside is far more likely to approve the investment.
I avoid framing it as "no" — I frame it as a trade-off conversation. I show: "this feature as described requires eight weeks of foundational rework, which delays Feature Y that has higher projected revenue impact." I then propose alternatives: can we achieve the same business outcome with a simpler approach? A phased delivery? A workaround that costs two weeks instead of eight?
I document the discussion and the decision made — if the stakeholder accepts the trade-off and still wants to proceed, that's a valid business call. Recording it means the reasoning is visible later if questions arise. The goal is a shared understanding, not winning the argument.
When we disagreed on the networking approach for the MDM system — Tailscale vs Cloudflare Tunnel — I proposed we each write a structured comparison with concrete pros/cons: security model, DDoS protection, operational complexity, and cost. We agreed on evaluation criteria upfront, which kept it objective rather than a matter of preference.
Where possible I advocate for a time-boxed prototype — an hour of code often settles a debate faster than an hour of discussion. If still deadlocked, I defer to whoever owns that domain, while documenting my concerns so they're on record. Architecture decisions should live in ADRs — even a disagreement is worth capturing, because the context is valuable for whoever inherits the system later.
- API reliability — Published SLA, status page history, incident response time.
- SDK & documentation quality — A poor SDK signals poor engineering culture; you'll fight it forever.
- Security posture — SOC 2 Type II, GDPR compliance, encryption at rest/transit, penetration test reports.
- Breaking change policy — How much notice do they give? Do they version their APIs?
- Pricing model at scale — Many vendors look cheap until volume kicks in.
- Longevity/community — Is this company going to exist in three years? VC-backed with no revenue is a risk.
- Support responsiveness — Test it before signing. How long do they take to respond to a pre-sales technical question?
Immediately: trigger the circuit breaker to stop hammering the failing endpoint and serve a degraded fallback. Communicate to internal users with a clear impact statement and estimated timeline — people tolerate outages far better when they're informed. Log every failure with timestamps for the post-incident record.
Post-incident: formally raise the SLA breach with the vendor's account team with documented evidence. Request an RCA. If this is a pattern, evaluate contract exit clauses or start a parallel vendor POC. Systematically: the right response is to build the abstraction layer earlier so that swapping or shimming a vendor is a two-hour job, not a two-week project.
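A circuit breaker can be as small as a failure counter and a timestamp. The sketch below is a minimal illustration with invented names, assuming synchronous calls; in production you'd reach for a library such as Polly:

```csharp
using System;

// Minimal circuit breaker: after N consecutive failures the circuit
// "opens" and calls short-circuit to a degraded fallback until the
// open duration elapses, when a trial call is allowed through.
public class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private int _failures;
    private DateTime _openedAt = DateTime.MinValue;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
    }

    public T Execute<T>(Func<T> action, Func<T> fallback)
    {
        // While open, don't hammer the failing endpoint — serve the fallback.
        if (_failures >= _failureThreshold &&
            DateTime.UtcNow - _openedAt < _openDuration)
            return fallback();

        try
        {
            var result = action();
            _failures = 0; // a success closes the circuit
            return result;
        }
        catch
        {
            if (++_failures >= _failureThreshold)
                _openedAt = DateTime.UtcNow; // trip (or re-trip) the breaker
            return fallback();
        }
    }
}
```

The fallback is where the "degraded" experience lives — cached data, a stub response, or a friendly "temporarily unavailable" message.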
I run explicit NFR workshops alongside functional requirement sessions — they're not an afterthought. I ask specific questions: "How many concurrent users do you anticipate at peak?" "What's the maximum acceptable page load time before a user complains?" "What's the cost of 10 minutes of downtime?" BAs often haven't considered these in detail, so I provide reference ranges and let them choose based on budget and risk appetite.
These decisions become acceptance criteria in the Definition of Done, not vague aspirations. If we decide p99 latency ≤ 500ms is acceptable, that's a measurable bar we can verify with a load test before shipping. It also prevents over-engineering the architecture — you don't build for 100k concurrent users if 500 is the real target.
The core question: is this capability part of our competitive differentiation? If it's a commodity function (auth, email, payments, maps), buy — you can't win by building your own Stripe. If it's your core product differentiator, build — outsourcing it hands competitors leverage.
I present stakeholders a comparison matrix: upfront cost, ongoing licence vs maintenance cost, time to value, integration risk, vendor dependency risk, and degree of customisation needed. For an internal tool decision, I also factor in: does the team have the capacity to own it long-term? A built system that nobody maintains becomes a liability. The honest answer is usually "buy commodity, build differentiation, and integrate pragmatically."
I use the "walk me through it step by step" technique — ask the BA to narrate exactly what a user does from start to finish. This surfaces implicit assumptions and edge cases instantly. I sketch sequence diagrams or wireframes on the spot to force concrete decisions: "So when the user clicks Submit here, what happens if the API is down?"
I split vague stories into a research spike (explore unknowns, timebox to one or two days) and implementation tickets with clear acceptance criteria. I enforce a Definition of Ready — a story can't enter a sprint without: a clear user goal, defined acceptance criteria, and identified edge cases. It's not gatekeeping; it prevents mid-sprint surprises that kill velocity.
Document every failure with timestamps, duration, and user impact — this is your negotiating evidence. Raise formally with the vendor's account manager, presenting the pattern clearly. Request a root cause analysis and a credible remediation plan with a timeline.
In parallel: evaluate your contract exit clause and any credit or holdback provisions for SLA breaches. Begin a quiet POC with an alternative vendor so you have a credible exit option. If there's no improvement after a formal escalation, escalate through the vendor's management chain. The goal isn't to be adversarial — it's to signal that you take the SLA seriously and will act on it.
I point back to the sprint goal agreed at planning — "here's what we committed to, here's why that's already at capacity." New requests go to the backlog and get prioritised in the next sprint. If a request is genuinely urgent, I frame it as a trade-off: "We can add this, but which item from the current sprint should we descope to accommodate it?"
This makes the cost tangible and puts the decision in the BA's hands — it's not me saying no, it's us together deciding priorities. I also use recurring scope creep as a signal to improve the refinement process. If urgent requests keep appearing mid-sprint, the upstream requirements process needs fixing, not the sprint.
Run both systems in parallel during the transition period. Build a business rules register co-owned with the BA — a living document of every rule discovered through code reading, QA testing, and stakeholder interviews. Treat every edge case found in the old code as a requirement.
Use shadow mode: route real traffic to both systems, but the old system's result remains authoritative. Compare outputs and investigate any discrepancies. This surfaces bugs in the new system and previously unknown behaviour in the old one. Only cut over when output parity is confirmed across a statistically significant sample. Have a fast rollback plan ready — the first few days post-cutover are the highest risk period.
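Shadow mode is straightforward to wire up once both systems sit behind the same interface. A minimal sketch — the delegate signatures, names, and decimal result type are all hypothetical:

```csharp
using System;
using System.Threading.Tasks;

// Shadow-mode dispatcher: the legacy result stays authoritative, the
// replacement runs alongside it, and any discrepancy is logged for review.
public class ShadowModeRunner
{
    private readonly Func<string, Task<decimal>> _legacy;
    private readonly Func<string, Task<decimal>> _replacement;
    private readonly Action<string> _logMismatch;

    public ShadowModeRunner(
        Func<string, Task<decimal>> legacy,
        Func<string, Task<decimal>> replacement,
        Action<string> logMismatch)
    {
        _legacy = legacy;
        _replacement = replacement;
        _logMismatch = logMismatch;
    }

    public async Task<decimal> CalculateAsync(string input)
    {
        var oldResult = await _legacy(input); // authoritative answer

        try
        {
            var newResult = await _replacement(input);
            if (newResult != oldResult)
                _logMismatch($"Discrepancy for '{input}': {oldResult} vs {newResult}");
        }
        catch (Exception ex)
        {
            // A failure in the new system never affects the user.
            _logMismatch($"New system failed for '{input}': {ex.Message}");
        }

        return oldResult; // the old system's answer is always served
    }
}
```

Each logged mismatch is either a bug in the new system or an undocumented rule in the old one — both go into the business rules register.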
- SOC 2 Type II report — not Type I, which is only a point-in-time snapshot; Type II audits controls over a sustained period.
- Data residency — where is customer data stored? Does it comply with GDPR/UK GDPR?
- Encryption — TLS 1.2+ in transit, AES-256 at rest.
- Penetration test reports — external, independently conducted, recent (<12 months).
- Access control model — least privilege, MFA enforced for vendor staff accessing our data.
- Breach notification policy — how quickly do they notify you? What's their incident response SLA?
- Subprocessor list — who else can they share our data with?
- Data retention & deletion — can you request deletion? What's their retention policy?
- Supply chain security — do they have a software composition analysis (SCA) process?
First, run a blameless post-mortem — the goal is systemic improvement, not assigning fault. I build a timeline of events, then apply the "5 Whys" to find the root cause beneath the surface symptom. I distinguish contributing factors from the root cause, since fixing contributing factors without addressing the root cause means the incident recurs.
For non-technical stakeholders: "Here's what happened in plain English, here's the user impact, here's what we've done immediately to restore service, and here's the roadmap to prevent it recurring." No jargon, no blame, and — critically — concrete action items with owners and deadlines so the post-mortem doesn't just become a document that sits in Confluence and changes nothing.
First, communicate formally — vendors sometimes reprioritise for strategic customers. Present the use case clearly and quantify the impact of them not building it. Sometimes a well-articulated ask becomes a roadmap item.
In parallel: assess whether the gap can be bridged with your own tooling (a thin adapter or extension layer). Evaluate the total cost of migration to an alternative versus staying and working around it. Run a time-boxed alternative vendor POC so you have a data-backed option rather than hypothetical alternatives. Make the go/no-go decision based on business criticality and migration cost. Document the decision clearly — vendor lock-in risk is a real architectural concern that should be visible.
Architecture Decision Records (ADRs) stored in the repo alongside code — every major decision captured with context, options considered, and rationale. Crucially, they document the why, not just the what. Updated in the same PR as the change so they don't drift.
C4 diagrams for different audiences: Context diagram (a one-pager showing the system and its external actors) for BAs and executives; Container and Component diagrams for engineers. A one-page executive summary covering the business problem solved, key architectural decisions, and known risks — written in plain English. The rule I enforce: if a new engineer or BA can't orient themselves in under 30 minutes from the docs, the docs aren't good enough yet.
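For the ADRs themselves I keep the format deliberately small — a Nygard-style template along these lines (section names are a common convention, not a fixed standard):

```
# ADR-NNN: <short, active-voice decision title>

## Status
Proposed | Accepted | Superseded by ADR-MMM

## Context
The problem being solved and the constraints that apply.

## Options considered
Each option with its pros, cons, and rough cost.

## Decision
The option chosen, and — most importantly — why.

## Consequences
What becomes easier, what becomes harder, known risks and follow-ups.
```

A one-page ADR that actually gets written beats a comprehensive template that doesn't.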