Conversation
* do matchAll per text block instead of matchAll on entire text string Signed-off-by: 01zulfi <85733202+01zulfi@users.noreply.github.com>
| for (const { text, pos } of extractTextWithPosition(doc)) { | ||
| for (const match of text.matchAll(searchTerm)) { | ||
| if (match[0] === "") break; | ||
| if (match.index !== undefined) { | ||
| results.push({ | ||
| from: pos + match.index, | ||
| to: pos + match.index + match[0].length | ||
| }); |
There was a problem hiding this comment.
Got this directly from: #1788 (There are also other changes that might optimize search further, have decided to include them in a future PR, so we can test this current change first)
The search speed improvement isn't noticeable because previous search was still fast (<100ms for 1000+ matches in a 100k+ words note). Though I've verified with console.time, this change improves the speed anywhere from 3x to 17x.
There was a problem hiding this comment.
Pull request overview
This PR optimizes the search performance in the editor's search-replace extension. Instead of concatenating all document text into a single string and running matchAll once on the entire text, the new approach extracts text blocks individually and runs matchAll per block. It also introduces a cleaner helper function extractTextWithPosition and simplifies the TextNodeWithPosition interface by removing the now-unnecessary startPos/start fields.
Changes:
- Introduces
extractTextWithPositionhelper that groups consecutive text nodes within the same block intoTextNodeWithPositionobjects, each carrying the ProseMirror absolute position of the first text node in the group - Replaces the old document-wide string concat + single-pass
matchAllwith a per-blockmatchAllloop, making the position calculation simpler and avoiding a full-document string build - Renames
TextNodesWithPosition→TextNodeWithPosition(singular, to match its new per-block semantics)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (const { text, pos } of extractTextWithPosition(doc)) { | ||
| for (const match of text.matchAll(searchTerm)) { | ||
| if (match[0] === "") break; | ||
| if (match.index !== undefined) { | ||
| results.push({ | ||
| from: pos + match.index, | ||
| to: pos + match.index + match[0].length | ||
| }); | ||
| } | ||
| index++; | ||
| } | ||
| }); | ||
|
|
||
| const text = textNodesWithPosition.map((c) => c.text).join(""); | ||
| for (const match of text.matchAll(searchTerm)) { | ||
| const start = match.index; | ||
| const end = match.index + match[0].length; | ||
|
|
||
| // Gets all matching nodes that have either the start or end of the | ||
| // search term in them. This adds support for multi line regex searches. | ||
| const nodes = textNodesWithPosition.filter((node) => { | ||
| const nodeStart = node.start; | ||
| const nodeEnd = nodeStart + node.text.length; | ||
| return ( | ||
| (start >= nodeStart && start < nodeEnd) || | ||
| (end >= nodeStart && end < nodeEnd) | ||
| ); | ||
| }); | ||
|
|
||
| if (!nodes.length) continue; | ||
| const endNode = nodes[nodes.length - 1]; | ||
| const startNode = nodes[0]; | ||
| results.push({ | ||
| // reposition our RegExp match index relative to the actual node. | ||
| from: | ||
| startNode.startPos + /*offset in startNode=*/ (start - startNode.start), | ||
| to: endNode.startPos + /*length inside endNode=*/ (end - endNode.start) | ||
| }); | ||
| } |
There was a problem hiding this comment.
The new implementation processes each text block independently, which means regex patterns that span block boundaries (e.g., line1\nline2) will no longer produce matches. The previous implementation explicitly supported this use-case — the removed comment said: "Gets all matching nodes that have either the start or end of the search term in them. This adds support for multi line regex searches."
With the enableRegex option available in the UI, users can enter multiline patterns. These patterns will silently find no matches in the new implementation, even when the text actually matches across paragraphs.
If cross-block matching is intentionally being dropped, that should be clearly documented (e.g., a note in SearchSettings or the UI), and the now-unused m flag in the regex could be removed to avoid confusion.
Related #1788