feat(capi): C ABI for distance, VectorStore, HNSW, MMap#18
Conversation
Code Review: feat(capi): C ABI for distance, VectorStore, HNSW, MMapOverall this is a clean, well-considered implementation. The exception-boundary safety, null-handle guards, and ABI-parity rationale are all well-handled. A few real issues and suggestions below. Bugs / Correctness
vanedb_cpp_hnsw_search(h, q1, k, 200, ...); // ef_search now 200
vanedb_cpp_hnsw_search(h, q2, k, 50, ...); // ef_search now 50that's fine. But if
Test binary artifacts are never cleaned up Design / Interface Concerns
No No ABI version constant in the header Test Coverage Gaps
Minor / Style
What's done well
🤖 Generated with Claude Code |
Code Review: feat(capi): C ABI for distance, VectorStore, HNSW, MMapOverall: Well-designed C ABI wrapper. Exception safety, null-handle guards, and the ABI comment block are solid. A few correctness gaps and one latent bug worth addressing before the bench harness lands. Bugs / Correctness[High] Null
// vanedb_capi.cpp — add to each path-taking function before the try block:
if (!path) return 1; // or return nullptr for load/openThe null-handle pattern already applied to store/hnsw/mmap handles should be extended to path parameters. [Medium] The wrapper writes size_t count = std::min(res.size(), k); // add this
for (size_t i = 0; i < count; ++i) { ... }
return count;Same applies to Design Notes[Minor] The header comment documents this correctly ("not thread-safe to call concurrently with different ef_search values"). For the intended single-threaded bench consumer, this is fine. Worth flagging that if this ABI is later used in multi-threaded contexts, the atomicity gap will be a real hazard. A future v1 might prefer [Minor] The truncation comment in the .cpp is good. A note in the header's ABI block would also help callers who read only the header: // seed: only the low 32 bits are used; high bits are silently dropped
vanedb_cpp_hnsw* vanedb_cpp_hnsw_new(..., uint64_t seed);[Nit] The plan doc says CMake[Minor] C compiler flags are set globally The target_compile_options(test_capi PRIVATE -O3)Or at minimum, scope the flag assignment inside the TestingTest artifacts are not cleaned up
Negative paths are well-covered — dim=0 constructors, null handles across all types. Good work catching the CI Gap (acknowledged)No CI job exercises Summary
The null-path and output-buffer issues are the blockers to address. Everything else is polish. |
Code Review — feat(capi): C ABI for distance, VectorStore, HNSW, MMapOverall this is a clean, well-structured C ABI layer. The exception-boundary discipline, null-handle guards, and documented ABI conventions are all solid. A few things worth addressing before merge, plus some minor notes. Issues[major] No CI coverage for The PR acknowledges this as follow-up, but it's worth calling out: the C API is entirely untested in CI. Any future change to the C++ core (distance functions, HNSW save format, VectorStore behavior) could silently break the C ABI without any signal. The PR description suggests wiring a CI job once the bench harness (Plan 3) consumes it — that's acceptable for now, but I'd recommend tracking this as a concrete task rather than an open-ended follow-up. [minor] // capi/vanedb_capi.cpp:135-137
idx->set_ef_search(ef_search);
auto res = idx->search(q, k);
[minor] // capi/vanedb_capi.cpp:122
new HNSWIndex(dim, to_metric(metric), capacity, M, ef_construction,
static_cast<uint32_t>(seed)));The truncation is documented in the header and in a code comment, which is the right approach given the Rust ABI parity constraint. One additional suggestion: the header comment says "only the low 32 bits are used" — it might be worth adding that seeds ≥ 2³² will silently differ between C++ and any future Rust consumer that passes the full 64-bit value, since the Rust side may use the full seed. If both sides only use the low 32 bits, great — confirm that's the case in [minor] // capi/vanedb_capi.cpp:77
default: return DistanceMetric::L2;Documented in the header. For a closed enum ( Nits
# CMakeLists.txt:38 (actual)
target_compile_features(vanedb_cpp_capi PRIVATE cxx_std_20)
# docs/superpowers/plans/... (plan doc, Task 1 Step 4)
target_compile_features(vanedb_cpp_capi PUBLIC cxx_std_20)
Test writes temp files to the working directory without cleanup
# CMakeLists.txt:42
enable_language(C)This is inside Plan doc contains agent scaffolding instructions
These are implementation-process artifacts, not contributor documentation. They're harmless in the repo, but a future contributor reading this file expecting design rationale will find agent workflow instructions instead. Consider stripping the scaffolding preamble, or moving plan docs to a location that signals they're not normative reference material (e.g., What's done well
|
Code Review: feat(capi): C ABI for distance, VectorStore, HNSW, MMapOverall this is a clean, well-structured piece of work. The design invariants are sound, the exception-boundary discipline is good, and TDD with negative paths is exactly the right approach for a C ABI. A few things to address before merge: CorrectnessDistance functions lack The PR's stated design invariant is "every wrapper that can throw is try/catch-guarded," but the three distance functions have no guard: float vanedb_cpp_l2_sq(const float* a, const float* b, size_t dim) {
return l2_sq(a, b, dim); // no try/catch
}The underlying SIMD code is pure float arithmetic and almost certainly won't throw today, but if a future change adds bounds checking or allocation to the hot path, an uncaught C++ exception crossing the C boundary is undefined behavior. Wrap these the same way as the rest, or add a comment explicitly accepting the risk.
idx->set_ef_search(ef_search); // mutates shared state
auto res = idx->search(q, k); // then readsThe header documents this as single-threaded only, which is fine for the benchmark consumer. But the mutation is baked into the handle — a later caller who does concurrent searches on separate handles shares no state, but two callers racing on the same handle will corrupt ef_search silently. Consider whether the C++ core can accept ef_search as a search parameter instead of storing it on the object. If the Rust ABI parity constraint locks you into the current per-call parameter shape, at least make
If Design / API
case VANEDB_L2:
default: return DistanceMetric::L2;This is documented in the header, but silent fallback to L2 when the caller passes an invalid metric (e.g., Public include path exposes C++ headers to consumers of the static lib target_include_directories(vanedb_cpp_capi PUBLIC
${CMAKE_CURRENT_SOURCE_DIR}/src # <-- exposes all C++ core headers
${CMAKE_CURRENT_SOURCE_DIR})The Test coverage
vanedb_cpp_store* s_bad = vanedb_cpp_store_new(0, VANEDB_L2);
assert(s_bad == NULL);This test passes only if HNSW assert(rc_save == 0);
vanedb_cpp_hnsw_free(h);
// ... later ...
remove("capi_hnsw.bin");If an earlier assert fires, Minor / nits
What's done well
Summary: Three correctness items (missing try/catch on distance functions, potential corrupt file on mmap_build failure, ef_search thread-safety model) and one CMake visibility bug ( |
Code Review — feat(capi): C ABI for distance, VectorStore, HNSW, MMapOverviewClean addition of a Potential Bug: CI missing
|
| Category | Finding | Severity |
|---|---|---|
| Bug | CI configure missing -DVANEDB_BUILD_TESTS=ON (test target may not exist) |
High |
| Code quality | No null-check for out_ids/out_dists in search functions |
Medium |
| Code quality | to_metric default-to-L2 swallows invalid enum values silently |
Low |
| Tests | free(NULL) not exercised; mmap bad-path only tests NULL not invalid path |
Low |
| Docs | Planning artifact in tree is already stale vs. final code | Low |
The C ABI design itself is correct and well-considered. The CI test-target issue is the one thing that should be verified/fixed before merge.
|
Code review posted below (written in plain text to avoid shell escaping issues). |
Code Review: feat(capi) C ABI for distance, VectorStore, HNSW, MMapOverviewThis PR adds a shippable Issues1. Thread-safety footgun in
2. CI job does not pin The test target is gated on 3. In Minor Suggestions4. Plan document in the repo — 5. Seed truncation comment placement — The 6. 7. What is Done Well
SummaryIssues 1–3 are worth fixing before merge: issue 2 is a one-liner, issue 3 is a few lines matching existing patterns, and issue 1 is a comment addition at the declaration site. The rest is minor polish. Overall a solid C ABI implementation — exception-to-error boundary, null-guard discipline, and smoke test coverage are all well executed. |
TDD plan for the vanedb-cpp extern "C" surface (distance, VectorStore, HNSW, MMap) consumed by vanedb-bench. Plans 2 (Rust vanedb-capi) and 3 (bench harness) follow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds capi/vanedb_capi.h with the full extern "C" contract (distance, VectorStore, HNSW, MMap), an empty-bodied capi/vanedb_capi.cpp with metric mapping, a C smoke test (tests/capi/test_capi.c) that validates enum values, and CMake wiring for the VANEDB_BUILD_CAPI=ON static library target. Also adds build-capi/ to .gitignore and enables the C language when VANEDB_BUILD_CAPI+VANEDB_BUILD_TESTS are both ON (project declares LANGUAGES CXX only). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add vanedb_cpp_mmap_build/_open/_search/_free wrappers; transient MMapVectorStoreBuilder in _build, MMapVectorStore* behind opaque handle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test put load-bearing calls inside assert(), which are compiled out under -DNDEBUG. The project strips NDEBUG from CXX release flags but not C, so the C test silently no-op'd in Release. Fix: set CMAKE_C_FLAGS_RELEASE without NDEBUG (matching CXX), and hoist all calls out of assert() so they execute regardless of NDEBUG. Also document the hnsw seed narrowing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- null const char* path args reached fstream/mmap before any guard; a null path raises SIGSEGV (a signal, not a C++ exception) which catch(...) cannot intercept. Guard path on hnsw_save/load and mmap_build/open. - cap _search copy loops at k (defense-in-depth; the core already caps, but the wrapper now enforces it too). - document to_metric's L2-default and the hnsw seed narrowing in the header. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, project(C) - add a dedicated 'C API' CI job (configures VANEDB_BUILD_CAPI=ON, builds+runs the C smoke test) - header: note per-thread handles for concurrent hnsw_search; clarify seed low-32 behavior - declare C at project() level (LANGUAGES CXX C); drop the nested enable_language(C) - test: remove capi_hnsw.bin/capi_mmap.bin artifacts after use - plan doc: fix stale PUBLIC->PRIVATE snippet; mark as internal artifact
MSVC is exactly where the C API's C-source-linked-against-C++-static-lib pattern (LINKER_LANGUAGE CXX) can diverge, so a Linux-only smoke test gives false cross-platform confidence. Matrix it; --config Release for the MSVC multi-config generator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ctest uses -C/--build-config; --config is a cmake --build flag. The build step was correct; the ctest step errored 'Unknown argument: --config' on all three OSes. --build-config is needed by MSVC's multi-config generator and ignored by single-config Unix generators. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e3a61bd to
e97f177
Compare
Code Review: feat(capi): C ABI for distance, VectorStore, HNSW, MMapOverall: This is a well-structured C ABI layer. The three design invariants called out in the PR description (exception safety, null-handle safety, ownership) are all properly implemented and documented. Some specific points below. Correctness
CMake / Build
Testing
Documentation
Minor
Summary: The implementation is correct and the invariants are sound. The two most actionable items before merge are (1) explicitly passing |
Summary
Adds a shippable C ABI (
extern "C", symbol prefixvanedb_cpp_) over the header-only C++ core, built as a static librarylibvanedb_cpp_capi.abehind a newVANEDB_BUILD_CAPICMake option (default OFF). This is Plan 1 of 3 towardvanedb/vanedb-bench, the cross-language benchmark harness — it lets a Rust driver link the C++ implementation and call it through the same C ABI as the Rust implementation (vanedb_rs_*).Includes the implementation plan doc (
docs/superpowers/plans/2026-05-28-cpp-c-api.md).What's exposed
vanedb_cpp_l2_sq/_cosine_distance/_dot_product_store_new/_add/_search/_free_hnsw_new(seeded) /_add/_search(per-callef_search) /_save/_load/_free_mmap_build/_open/_search/_freeDesign invariants (verified in review)
try/catch(...)-guarded — no C++ exception crosses the C boundary.reinterpret_cast(a null deref is UB thatcatch(...)can't catch)._new/_open/_load(.release()d) is balanced by one_free.ef_searchare intentional, to stay byte-identical to the parallel Rust ABI. Documented in the header.Testing
tests/capi/test_capi.c(compiled as C, linked C++) exercises every function incl. HNSW save/load round-trip and negative paths.-DNDEBUG(asserts compiled out). Fixed by stripping NDEBUG from C release flags (matching the existing C++ treatment) and hoisting all calls out ofassert(). Verified by negative control (a wrong assert now fails Release ctest) and by.binartifacts appearing in Release.Verification
Default builds are unaffected (
VANEDB_BUILD_CAPIis OFF; the C flags only affect the lone C test source).Follow-up (not in this PR)
VANEDB_BUILD_CAPI=ONinto a CI job so the C API is continuously guarded — most natural once Plan 3's harness consumes it.🤖 Generated with Claude Code