Soundness disclosure
Status: design only — not scheduled for implementation.
The problem
TestPrune builds its symbol dependency graph statically from the F# AST via FSharp.Compiler.Service. That graph is the authority for "what could this test reach?" — but the AST cannot see every edge that exists at runtime. When a real edge is missing from the graph, TestPrune may skip a test that would have failed if run. That is unsound test selection, and it is the worst failure mode the tool has: silent, looks like a green CI, masks regressions.
Today we treat soundness as an implicit property — users who happen to read the code or hit a bad miss learn the boundaries the hard way. We should make it a first-class, surfaced concept instead.
What "unsound" means here
A test selection is sound if every test that would have observed the change is in the selected set. We are unsound whenever the static graph is missing an edge that exists at runtime. Concretely, the F# constructs that produce hidden edges are:
-
Reflection.
typeof<_>,Type.GetType,Activator.CreateInstance,MethodInfo.Invoke, attribute-driven discovery. - Type providers. Generated types and members do not exist in the source AST; downstream references look like dependencies on the provider invocation, not on the generated surface.
-
SRTP /
inlineresolution. Statically-resolved type parameters resolve at each call site; the "callee" is structural, not a fixed symbol. Our graph cannot enumerate every concrete resolution. -
Computation expressions. Builder method dispatch (
Bind,Return,Yield, custom operations) is implicit; users writinglet!rarely see the symbol they are actually depending on. -
DI containers and service locators.
IServiceProvider.GetService, Autofac, etc. — the binding from interface to implementation lives in configuration, not in source. -
String-keyed dispatch. Falco route strings, configuration keys,
nameofround-trips, dynamic dispatch tables. (TestPrune.Falco partially addresses the Falco case; the general pattern remains.) - Native interop / P-Invoke. Out of scope for static F# analysis.
- Build-time codegen that runs outside FCS (Paket targets, T4-style generators, source generators we do not consume).
Proposal
Three pieces, ordered by cost.
1. A documented unsoundness list
Ship a docs/soundness.md page that enumerates the constructs above,
explains why each one defeats static analysis, and states what TestPrune
does in practice (typically: records an edge to the call site but not
to the runtime target). This is mostly writing — no code — and it earns
credibility the way Ekstazi's papers do by being explicit about
limitations rather than hiding them.
2. Detection during indexing
When the AST walker encounters one of these constructs, record a soundness note alongside the edge. Schema sketch (additive to the existing SQLite store):
soundness_notes(
symbol_id INTEGER, -- the symbol whose body contains the construct
kind TEXT, -- 'reflection' | 'type_provider' | 'srtp' | ...
detail TEXT, -- e.g. the reflected type name if known
source_range TEXT -- file:line:col for the diagnostic
)
Detection is per-construct and varies in difficulty:
-
Easy: syntactic patterns —
typeof,Type.GetType, attribute usage, computation expressionlet!/do!/yield!,inlinefunction declarations. These are AST node kinds. - Medium: SRTP call sites (need the symbol-use info FCS already provides), type-provider-generated symbols (FCS marks these).
- Hard / out of scope: DI resolution, string-keyed dispatch. Document them; do not try to detect them automatically.
3. Surfacing in CLI output
Two surfaces, both opt-in-by-default-on:
-
*
test-prune run* prints a one-line summary when the selected closure contains soundness notes:selected 12 tests (skipped 187) re-run with --explain-soundness for details, or --sound to fall back to running all tests in affected files. -
*
test-prune status* has a new section listing soundness notes for the change set, grouped by kind. Useful as a code-review aid: "this PR adds a newType.GetTypecall — TestPrune will be blind here."
A --sound flag widens selection to include any test transitively
reaching a symbol with a soundness note in the changed closure. This
trades precision for safety and is the right default for release
branches.
What this is not
- Not a correctness guarantee. Detection is best-effort; the enumerated list will always be incomplete. The point is to convert silent unsoundness into visible unsoundness wherever we can.
- Not a fix. We are not trying to resolve reflection or DI statically. That belongs to the (separate) FsHotWatch project, where a dynamic recorder can observe real edges and fold them into the graph.
- Not coupled to FsHotWatch. Everything proposed here lives in TestPrune.Core and the CLI. FsHotWatch consumes the soundness notes the same way it consumes any other graph data.
Open questions
- Do we want a soundness score (count of notes in selected closure) exposed as a metric, or is the per-construct list enough? A single number is easy to ratchet in CI; a list is more honest.
-
Should
--soundbe the default in CI and--fast(current behavior) the opt-in? Probably yes, but it's a behavior change worth a major version bump. - Type providers: do we mark the consumer of a provided type as unsound, or the provider declaration? Consumer is more useful for test selection but noisier in the diagnostic.
Prior art
-
Ekstazi papers explicitly enumerate unsoundness sources (reflection,
classloaders, native code) and report empirical miss rates. We
should do the same in
docs/soundness.md. - Microsoft TIA documents that it is "scoped to managed code, single machine" — a coarser disclosure but the same pattern: state your boundaries up front.
-
Bazel sidesteps the issue by requiring manual
BUILDdeclarations; unsoundness becomes a human bug rather than a tool bug. We are not going there, but it is the alternative design point.
TestPrune