README.md
1 addition & 1 deletion
@@ -38,7 +38,7 @@
</div>
-SemHash is a lightweight library for semantic deduplication, outlier filtering, and representative sample selection. It's fully multimodal: text works out-of-the-box with fast [Model2Vec](https://github.com/MinishLab/model2vec) embeddings, and you can bring your own encoders for images, audio, or custom models.
+SemHash is a lightweight, multimodal library for semantic deduplication, outlier filtering, and representative sample selection. Text works out of the box with fast [Model2Vec](https://github.com/MinishLab/model2vec) embeddings, and images, audio, and other modalities are supported with custom encoders.
SemHash supports both single-dataset operations (clean a training set) and cross-dataset operations (deduplicate test against train). It works with simple lists and complex multi-column datasets, and includes inspection tools to help you understand and refine results. All operations use [Vicinity](https://github.com/MinishLab/vicinity) for efficient similarity search.