
Commit 3e5a064

SimplyLiz and claude committed
feat(lip): wire up stream_context, query_expansion, explain_match
Closes the correctness and utilisation gaps found in the CKB 9.1.0 review.

Correctness:
- lipFileURI now handles absolute paths and already-prefixed file:// URIs instead of naively joining them with repoRoot.
- Handshake runs once on engine startup; its supported_messages list drives Engine.lipSupports(), the gate for v2.0+ RPCs against older daemons. Daemon version + supported_messages length are logged.

Utilisation (three high-ROI LIP RPCs we were not using):
- stream_context (v2.1) → explainFile now attaches a ranked list of semantically-related symbols (top 10 within a 2048-token budget) to the response's facts.related field. New streaming transport in internal/lip/stream_context.go reads N symbol_info frames + the end_stream terminator — the previous LIP client was unary-only.
- query_expansion (v1.6) → SearchSymbols expands ≤ 2-token queries with up to 5 related terms before FTS5. Recovers recall on vocabulary-mismatch misses without touching precision on compound queries.
- explain_match (v2.0) → SemanticSearchWithLIPExplained attaches up to two ranked chunks per semantic hit (top 5 hits, line ranges + text + score), letting the caller cite specific lines instead of a bare file URL.

All three are gated on the handshake's supported_messages so clients talking to older daemons fall through to the legacy paths cleanly.

Tests: unit coverage for StreamContext happy path, daemon-down, and error-frame abort. Existing lip_health, lip_ranker, and query tests still pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent ecac0c1 commit 3e5a064
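
The `lipFileURI` correctness fix described in the commit message can be illustrated with a minimal sketch. The helper's real body is not part of the diff on this page, so the name `normalizeFileURI`, its signature, and the use of the `path` package (POSIX-style paths only) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// normalizeFileURI is a hypothetical stand-in for lipFileURI; the real
// helper is not shown in this commit view. It assumes POSIX-style paths.
func normalizeFileURI(repoRoot, fileID string) string {
	switch {
	case strings.HasPrefix(fileID, "file://"):
		// Already a URI: pass it through untouched.
		return fileID
	case path.IsAbs(fileID):
		// Absolute path: do not join it with repoRoot.
		return "file://" + path.Clean(fileID)
	default:
		// Relative path: resolve it against the repository root.
		return "file://" + path.Join(repoRoot, fileID)
	}
}

func main() {
	for _, id := range []string{
		"internal/lip/client.go", // relative (what backends return today)
		"/abs/path/x.go",         // absolute
		"file:///repo/x.go",      // already prefixed
	} {
		fmt.Println(normalizeFileURI("/repo", id))
	}
}
```

The three branches correspond to the three input shapes the commit message names: relative paths, absolute paths, and already-prefixed `file://` URIs.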

11 files changed

Lines changed: 650 additions & 6 deletions

CHANGELOG.md

Lines changed: 38 additions & 0 deletions
@@ -4,6 +4,44 @@ All notable changes to CKB will be documented in this file.
 
 ## [Unreleased]
 
+### Added
+
+- **`explainFile` surfaces semantically-related symbols** via LIP v2.1's
+  `stream_context` RPC (`internal/query/lip_stream_context.go`). The daemon
+  ranks symbols across the whole file within a 2048-token budget; CKB
+  returns the top 10 in the new `facts.related` field with per-symbol
+  relevance and token cost. Gated on the handshake's `supported_messages`
+  — older daemons fall through and the field is absent. New streaming
+  transport `internal/lip/stream_context.go` reads the daemon's N
+  `symbol_info` frames plus the `end_stream` terminator; previous LIP
+  client was unary-only.
+- **`searchSymbols` expands short queries** via LIP's `query_expansion`
+  RPC (`internal/query/query_expansion.go`). Queries of ≤ 2 tokens get up
+  to 5 related terms appended before hitting FTS5, recovering recall on
+  vocabulary-mismatch misses ("auth" → "authenticate authorization
+  principal…"). Gated on the handshake and on the same mixed-models flag
+  that protects the rerank path. Longer queries are passed through
+  unchanged — the expansion is a rescue, not a rewrite.
+- **Semantic hits carry evidence chunks** when LIP v2.0+'s `explain_match`
+  is advertised (`SemanticSearchWithLIPExplained` in
+  `internal/query/lip_ranker.go`). Each hit returned by the semantic
+  fallback path now includes up to two ranked chunks with line ranges,
+  text, and per-chunk scores — the caller can cite specific lines instead
+  of a bare file URL. Capped at the top-5 hits to bound round-trip cost.
+- **`lip.Handshake` runs on engine startup** and the daemon's
+  `supported_messages` list is stashed for feature gating
+  (`Engine.lipSupports`). The daemon version and supported-count are
+  logged on connect.
+
+### Changed
+
+- **`lipFileURI` path normalisation** — the helper that builds
+  `file://`-URIs for LIP requests used to naive-`filepath.Join` whatever
+  `Location.FileId` a backend supplied. Now handles absolute paths and
+  already-prefixed `file://` URIs without producing malformed results
+  like `file:///repo//abs/path`. Backends today return relative paths, so
+  this is a hardening fix for contracts that are nominally open.
+
 ### Changed
 
 - **LIP health: push-driven, not polled** — the Engine now opens a long-lived
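
The `searchSymbols` entry above describes a small decision rule: expand only queries of at most two tokens, append at most five related terms, and skip expansion when the daemon does not advertise the RPC or the mixed-models flag is set. A minimal sketch of that rule follows. The function name `expandShortQuery`, the `expandFn` hook standing in for the `query_expansion` RPC, and the exact treatment of the mixed-models flag are assumptions, since `internal/query/query_expansion.go` is not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	maxTokensForExpansion = 2 // longer queries pass through unchanged
	maxExpansionTerms     = 5 // cap on appended related terms
)

// expandFn is a hypothetical hook standing in for the LIP query_expansion RPC.
type expandFn func(query string) []string

// expandShortQuery returns the query to hand to full-text search.
// supported mirrors the handshake gate; mixedModels mirrors the flag that
// (by assumption) also disables expansion.
func expandShortQuery(query string, supported, mixedModels bool, expand expandFn) string {
	tokens := strings.Fields(query)
	if len(tokens) == 0 || len(tokens) > maxTokensForExpansion {
		return query
	}
	if !supported || mixedModels || expand == nil {
		return query
	}
	terms := expand(query)
	if len(terms) > maxExpansionTerms {
		terms = terms[:maxExpansionTerms]
	}
	if len(terms) == 0 {
		return query
	}
	return query + " " + strings.Join(terms, " ")
}

func main() {
	// Illustrative expansion terms only; a real daemon decides these.
	fake := func(string) []string {
		return []string{"authenticate", "authorization", "principal"}
	}
	fmt.Println(expandShortQuery("auth", true, false, fake))
	fmt.Println(expandShortQuery("parse config file", true, false, fake))
}
```

Keeping the gate checks ahead of the RPC call means an unsupported or mixed-model daemon costs no round trip.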

internal/lip/client.go

Lines changed: 12 additions & 3 deletions
@@ -41,6 +41,10 @@ type NearestResult struct {
 type HandshakeInfo struct {
 	DaemonVersion   string `json:"daemon_version"`
 	ProtocolVersion int    `json:"protocol_version"`
+	// SupportedMessages is the snake_case `type` tag list the daemon
+	// understands. Empty when talking to a pre-v1.5 daemon that omits the
+	// field — callers should fall back to ProtocolVersion comparisons.
+	SupportedMessages []string `json:"supported_messages"`
 }
 
 // IndexStatusInfo is the public view of LIP index health.
@@ -158,8 +162,9 @@ type batchAnnotationResp struct {
 }
 
 type handshakeResp struct {
-	DaemonVersion   string `json:"daemon_version"`
-	ProtocolVersion int    `json:"protocol_version"`
+	DaemonVersion     string   `json:"daemon_version"`
+	ProtocolVersion   int      `json:"protocol_version"`
+	SupportedMessages []string `json:"supported_messages"`
 }
 
 type similarityResp struct {
@@ -790,7 +795,11 @@ func Handshake(clientVersion string) (*HandshakeInfo, error) {
 	}
 	return lipRPC(req, 200*time.Millisecond, 4<<10,
 		func(r handshakeResp) *HandshakeInfo {
-			return &HandshakeInfo{DaemonVersion: r.DaemonVersion, ProtocolVersion: r.ProtocolVersion}
+			return &HandshakeInfo{
+				DaemonVersion:     r.DaemonVersion,
+				ProtocolVersion:   r.ProtocolVersion,
+				SupportedMessages: r.SupportedMessages,
+			}
 		})
 }
 
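
The `SupportedMessages` comment above says callers should fall back to `ProtocolVersion` comparisons when the daemon omits the field. A minimal sketch of that fallback is below. The struct is a local copy of the fields in the diff; `supportsMessage`, `parseHandshake`, the `minProtocol` threshold, and the sample JSON payloads are hypothetical and exist only to show the decode-then-decide flow.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HandshakeInfo is a local copy of the struct shown in the diff.
type HandshakeInfo struct {
	DaemonVersion     string   `json:"daemon_version"`
	ProtocolVersion   int      `json:"protocol_version"`
	SupportedMessages []string `json:"supported_messages"`
}

// parseHandshake decodes a handshake payload. A missing
// supported_messages key leaves the slice nil, i.e. empty.
func parseHandshake(raw []byte) (HandshakeInfo, error) {
	var h HandshakeInfo
	err := json.Unmarshal(raw, &h)
	return h, err
}

// supportsMessage is a hypothetical gate: trust the advertised list when
// present, otherwise compare ProtocolVersion against an assumed minimum.
func supportsMessage(h HandshakeInfo, msgType string, minProtocol int) bool {
	if len(h.SupportedMessages) == 0 {
		return h.ProtocolVersion >= minProtocol
	}
	for _, m := range h.SupportedMessages {
		if m == msgType {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative payloads; version values are made up.
	old, _ := parseHandshake([]byte(`{"daemon_version":"old","protocol_version":1}`))
	cur, _ := parseHandshake([]byte(
		`{"daemon_version":"new","protocol_version":2,"supported_messages":["handshake","stream_context"]}`))
	fmt.Println(supportsMessage(old, "stream_context", 2))
	fmt.Println(supportsMessage(cur, "stream_context", 2))
}
```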

internal/lip/stream_context.go

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+package lip
+
+import (
+	"encoding/json"
+	"io"
+	"net"
+	"time"
+)
+
+// StreamContextSymbol is one frame of a StreamContext response. The
+// embedded `OwnedSymbolInfo` is flattened into the fields we actually
+// consume in CKB — the full Rust struct carries many fields we don't
+// need (telemetry, relationships, taint) and serialising them through
+// `map[string]any` would be wasteful.
+type StreamContextSymbol struct {
+	URI            string  `json:"uri"`
+	DisplayName    string  `json:"display_name"`
+	Kind           string  `json:"kind"`
+	RelevanceScore float32 `json:"relevance_score"`
+	TokenCost      uint32  `json:"token_cost"`
+}
+
+// StreamContextResult summarises a completed StreamContext stream.
+// `Reason` is one of "token_budget" | "exhausted" | "error".
+type StreamContextResult struct {
+	Symbols         []StreamContextSymbol
+	Reason          string
+	Emitted         uint32
+	TotalCandidates uint32
+	Err             string
+}
+
+// StreamContextPosition is the cursor rectangle the daemon ranks around.
+// Byte-offset semantics match LIP's `OwnedRange` — 0-based lines and
+// chars. Pass a zero-width range at the cursor, or a whole-file range
+// (`start_line=0, end_line=lineCount`) for file-level context.
+type StreamContextPosition struct {
+	StartLine int `json:"start_line"`
+	StartChar int `json:"start_char"`
+	EndLine   int `json:"end_line"`
+	EndChar   int `json:"end_char"`
+}
+
+// streamContextMaxFrames caps how many SymbolInfo frames we accept before
+// bailing out — defence against a runaway daemon. Large indexes could
+// theoretically produce 10k+ candidates; a hard cap of 1024 is well above
+// any realistic caller budget and cheap to enforce.
+const streamContextMaxFrames = 1024
+
+// StreamContext opens a dedicated connection, sends a `stream_context`
+// request, and reads SymbolInfo frames until the daemon writes the
+// `end_stream` terminator. Returns (nil, nil) when the daemon is
+// unavailable — callers must treat nil as "LIP unavailable" (same contract
+// as the rest of the package).
+//
+// The dedicated connection is intentional: `stream_context` on the
+// shared subscriber channel would interleave with IndexStatus pings and
+// IndexChanged pushes and complicate parsing. One connection per call is
+// fine — the ranking itself dominates latency, and callers shouldn't
+// issue this RPC more than a few times per second.
+func StreamContext(fileURI string, pos StreamContextPosition, maxTokens uint32, model string) (*StreamContextResult, error) {
+	conn, err := net.DialTimeout("unix", SocketPath(), 500*time.Millisecond)
+	if err != nil {
+		return nil, nil
+	}
+	defer conn.Close()
+	// Overall deadline: the daemon's relevance ranking is heuristic and
+	// bounded, but pathological inputs could stall. 10 s is generous; for
+	// a token_budget of ~2000 it completes in ~200 ms typically.
+	_ = conn.SetDeadline(time.Now().Add(10 * time.Second))
+
+	req := map[string]any{
+		"type":            "stream_context",
+		"file_uri":        fileURI,
+		"cursor_position": pos,
+		"max_tokens":      maxTokens,
+	}
+	if model != "" {
+		req["model"] = model
+	}
+	if err := writeFrame(conn, req); err != nil {
+		return nil, nil
+	}
+
+	out := &StreamContextResult{Symbols: make([]StreamContextSymbol, 0, 16)}
+	for range streamContextMaxFrames + 1 {
+		frame, err := readFrame(conn)
+		if err != nil {
+			if err == io.EOF {
+				return out, nil
+			}
+			return nil, nil
+		}
+		// ServerResponse { ok: ServerMessage, error: Option<String> }
+		inner := frame
+		if raw, ok := frame["ok"]; ok && len(raw) > 0 && string(raw) != "null" {
+			_ = json.Unmarshal(raw, &inner)
+		}
+		var kind string
+		_ = json.Unmarshal(inner["type"], &kind)
+
+		switch kind {
+		case "symbol_info":
+			var sym struct {
+				SymbolInfo struct {
+					URI         string `json:"uri"`
+					DisplayName string `json:"display_name"`
+					Kind        string `json:"kind"`
+				} `json:"symbol_info"`
+				RelevanceScore float32 `json:"relevance_score"`
+				TokenCost      uint32  `json:"token_cost"`
+			}
+			if b, ok := marshalInner(inner); ok {
+				_ = json.Unmarshal(b, &sym)
+			}
+			out.Symbols = append(out.Symbols, StreamContextSymbol{
+				URI:            sym.SymbolInfo.URI,
+				DisplayName:    sym.SymbolInfo.DisplayName,
+				Kind:           sym.SymbolInfo.Kind,
+				RelevanceScore: sym.RelevanceScore,
+				TokenCost:      sym.TokenCost,
+			})
+		case "end_stream":
+			var end struct {
+				Reason          string  `json:"reason"`
+				Emitted         uint32  `json:"emitted"`
+				TotalCandidates uint32  `json:"total_candidates"`
+				Error           *string `json:"error"`
+			}
+			if b, ok := marshalInner(inner); ok {
+				_ = json.Unmarshal(b, &end)
+			}
+			out.Reason = end.Reason
+			out.Emitted = end.Emitted
+			out.TotalCandidates = end.TotalCandidates
+			if end.Error != nil {
+				out.Err = *end.Error
+			}
+			return out, nil
+		case "error", "unknown_message":
+			// Daemon rejected the request — treat as unavailable.
+			return nil, nil
+		default:
+			// Unknown frame type mid-stream: skip rather than fail hard.
+		}
+	}
+	return out, nil
+}
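
The commit message says `explainFile` returns the top 10 related symbols from a `StreamContext` result. The consuming code in `internal/query` is not shown on this page, so the following is only a sketch of how a caller might trim the result. The helper name `topRelated` is hypothetical, and re-sorting on the client is an assumption (the daemon may already emit frames in ranked order).

```go
package main

import (
	"fmt"
	"sort"
)

// StreamContextSymbol is a local copy of the struct from the diff.
type StreamContextSymbol struct {
	URI            string
	DisplayName    string
	Kind           string
	RelevanceScore float32
	TokenCost      uint32
}

// topRelated returns at most limit symbols, highest relevance first.
// It copies the input so the caller's slice is left untouched.
func topRelated(symbols []StreamContextSymbol, limit int) []StreamContextSymbol {
	out := make([]StreamContextSymbol, len(symbols))
	copy(out, symbols)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].RelevanceScore > out[j].RelevanceScore
	})
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	// Illustrative data only.
	syms := []StreamContextSymbol{
		{DisplayName: "Bar", RelevanceScore: 0.6, TokenCost: 80},
		{DisplayName: "Foo", RelevanceScore: 0.8, TokenCost: 120},
	}
	for _, s := range topRelated(syms, 10) {
		fmt.Println(s.DisplayName, s.RelevanceScore, s.TokenCost)
	}
}
```

A stable sort keeps the daemon's original order among symbols with equal scores.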
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
+package lip
+
+import (
+	"encoding/binary"
+	"encoding/json"
+	"io"
+	"net"
+	"os"
+	"path/filepath"
+	"testing"
+	"time"
+)
+
+// startStreamContextDaemon spins up a Unix socket that responds to any
+// request with `frames` in order and then closes. Tests inject the full
+// frame sequence they want to exercise — the fake is dumb so behaviour
+// under malformed input is exercised by the real daemon's tests, not
+// CKB's.
+func startStreamContextDaemon(t *testing.T, frames []map[string]any) {
+	t.Helper()
+	dir, err := os.MkdirTemp("/tmp", "lipstream")
+	if err != nil {
+		t.Fatalf("mkdirtemp: %v", err)
+	}
+	sock := filepath.Join(dir, "s.sock")
+	ln, err := net.Listen("unix", sock)
+	if err != nil {
+		os.RemoveAll(dir)
+		t.Fatalf("listen: %v", err)
+	}
+	prev := os.Getenv("LIP_SOCKET")
+	os.Setenv("LIP_SOCKET", sock)
+	t.Cleanup(func() {
+		ln.Close()
+		os.RemoveAll(dir)
+		os.Setenv("LIP_SOCKET", prev)
+	})
+
+	go func() {
+		conn, err := ln.Accept()
+		if err != nil {
+			return
+		}
+		defer conn.Close()
+		// Drain the incoming stream_context request.
+		_ = conn.SetReadDeadline(time.Now().Add(2 * time.Second))
+		var lenBuf [4]byte
+		if _, err := io.ReadFull(conn, lenBuf[:]); err != nil {
+			return
+		}
+		reqLen := binary.BigEndian.Uint32(lenBuf[:])
+		if _, err := io.CopyN(io.Discard, conn, int64(reqLen)); err != nil {
+			return
+		}
+		for _, f := range frames {
+			b, _ := json.Marshal(f)
+			var out [4]byte
+			binary.BigEndian.PutUint32(out[:], uint32(len(b)))
+			if _, err := conn.Write(out[:]); err != nil {
+				return
+			}
+			if _, err := conn.Write(b); err != nil {
+				return
+			}
+		}
+	}()
+}
+
+func TestStreamContext_ParsesSymbolInfoThenEndStream(t *testing.T) {
+	startStreamContextDaemon(t, []map[string]any{
+		{
+			"type": "symbol_info",
+			"symbol_info": map[string]any{
+				"uri":          "file:///repo/foo.go",
+				"display_name": "Foo",
+				"kind":         "function",
+			},
+			"relevance_score": 0.8,
+			"token_cost":      120,
+		},
+		{
+			"type": "symbol_info",
+			"symbol_info": map[string]any{
+				"uri":          "file:///repo/bar.go",
+				"display_name": "Bar",
+				"kind":         "struct",
+			},
+			"relevance_score": 0.6,
+			"token_cost":      80,
+		},
+		{
+			"type":             "end_stream",
+			"reason":           "token_budget",
+			"emitted":          2,
+			"total_candidates": 17,
+		},
+	})
+
+	res, err := StreamContext("file:///repo/foo.go", StreamContextPosition{EndLine: 100}, 1024, "")
+	if err != nil {
+		t.Fatalf("unexpected err: %v", err)
+	}
+	if res == nil {
+		t.Fatal("nil result, want 2 symbols")
+	}
+	if len(res.Symbols) != 2 {
+		t.Fatalf("symbols = %d, want 2", len(res.Symbols))
+	}
+	if res.Symbols[0].DisplayName != "Foo" || res.Symbols[0].RelevanceScore != 0.8 {
+		t.Errorf("symbol[0] = %+v", res.Symbols[0])
+	}
+	if res.Reason != "token_budget" || res.Emitted != 2 || res.TotalCandidates != 17 {
+		t.Errorf("terminator mismatch: %+v", res)
+	}
+}
+
+func TestStreamContext_DaemonUnavailableReturnsNil(t *testing.T) {
+	prev := os.Getenv("LIP_SOCKET")
+	os.Setenv("LIP_SOCKET", "/tmp/ckb-lip-stream-nonexistent.sock")
+	t.Cleanup(func() { os.Setenv("LIP_SOCKET", prev) })
+
+	res, err := StreamContext("file:///repo/foo.go", StreamContextPosition{}, 1024, "")
+	if err != nil {
+		t.Fatalf("err = %v, want nil (silent degradation contract)", err)
+	}
+	if res != nil {
+		t.Fatalf("res = %+v, want nil", res)
+	}
+}
+
+func TestStreamContext_ErrorFrameAborts(t *testing.T) {
+	startStreamContextDaemon(t, []map[string]any{
+		{
+			"type":    "error",
+			"message": "cursor out of range",
+			"code":    "cursor_out_of_range",
+		},
+	})
+	res, _ := StreamContext("file:///repo/foo.go", StreamContextPosition{EndLine: 9999}, 1024, "")
+	if res != nil {
+		t.Fatalf("res = %+v, want nil on error frame", res)
+	}
+}

internal/query/engine.go

Lines changed: 7 additions & 0 deletions
@@ -68,10 +68,17 @@ type Engine struct {
 	// connection open and receives `index_changed` pushes plus per-ping health
 	// snapshots. `lipHealthCheckedAt` is zero until the first frame arrives —
 	// callers check it before trusting the flags.
+	//
+	// `lipSupported` is the set of `type` tags the daemon advertised in its
+	// handshake. It gates calls to newer RPCs (StreamContext, ExplainMatch,
+	// ...) on clients talking to an older daemon, instead of letting them
+	// dispatch and get back an UnknownMessage. Empty when the handshake has
+	// not yet completed or the daemon predates `supported_messages`.
 	lipHealthMu        sync.RWMutex
 	cachedLipMixed     bool
 	cachedLipAvailable bool
 	lipHealthCheckedAt time.Time
+	lipSupported       map[string]struct{}
 	lipSubCancel       context.CancelFunc
 
 	// Cache stats
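
The new `lipSupported` field sits under the same `lipHealthMu` lock as the health flags. The body of `Engine.lipSupports` is not included in this diff, so the following is a minimal sketch of how such a gate could work, assuming the set is written once after the handshake and read under the read lock. The type `engineSketch` and the setter `setLipSupported` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// engineSketch holds only the two fields relevant to the gate.
type engineSketch struct {
	lipHealthMu  sync.RWMutex
	lipSupported map[string]struct{}
}

// setLipSupported stores the handshake's supported_messages as a set.
// The set is built outside the lock so writers hold it only briefly.
func (e *engineSketch) setLipSupported(msgs []string) {
	set := make(map[string]struct{}, len(msgs))
	for _, m := range msgs {
		set[m] = struct{}{}
	}
	e.lipHealthMu.Lock()
	e.lipSupported = set
	e.lipHealthMu.Unlock()
}

// lipSupports reports whether the daemon advertised msgType. A nil or
// empty set (no handshake yet, or an old daemon) yields false, so callers
// fall through to their legacy path.
func (e *engineSketch) lipSupports(msgType string) bool {
	e.lipHealthMu.RLock()
	defer e.lipHealthMu.RUnlock()
	_, ok := e.lipSupported[msgType]
	return ok
}

func main() {
	e := &engineSketch{}
	fmt.Println(e.lipSupports("stream_context")) // before any handshake
	e.setLipSupported([]string{"handshake", "stream_context"})
	fmt.Println(e.lipSupports("stream_context"))
	fmt.Println(e.lipSupports("explain_match"))
}
```

Reading from a nil map is safe in Go, which is why the zero-value engine needs no special case.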
