Similarity Search in FAISS Returning Raw, Unintelligible Data #4120
Rajat-2001
started this conversation in
General
Replies: 1 comment
-
I see the vector representation of the text when running the same code e.g.
|
-
Summary
When performing a similarity search with FAISS (Facebook AI Similarity Search), the results come back as raw, low-level data that is not human-readable or useful without additional processing. Instead of meaningful text or relevant objects, the output consists of unintelligible characters and symbols.
Example Output:
Rank: 1, Distance: 1.629706859588623, Text: M *M 4M JM pM M M N qN N N N O TO \O ]O {O O P hP ~P P P IQ lQ Q Q Q Q FR XR ~R R R =S S S T T ;T |T T T T [U \U U U +V KV UV dV uV V V W W W W $X 4X X X X
Rank: 2, Distance: 1.6545774936676025, Text: F F F F F G G H H PH RH nH I -I EI HI ZI I I I J J J J K K =L DL #M oM M M M ;N N N N sO O O LP P P P *Q 7Q TQ _Q Q Q R dR R ;S kS T KT T T T T T !U #U
This behavior is expected from FAISS: a search returns only distances and the integer indices of the stored vectors, not the original items. However, that is not helpful to end users until the indices are translated back into meaningful data such as text, image references, or other objects.
Platform
OS: Linux/Ubuntu 22.04
Faiss version: 1.7.2
Faiss compilation options: Compiled with CUDA support
Reproduction instructions
Install FAISS:
pip install faiss-cpu (for CPU version)
pip install faiss-gpu (for GPU version, if applicable)
Create a FAISS index and add data:
import faiss
import numpy as np
# Create random data to simulate a vector search
d = 512 # Dimensionality of the vectors
nb = 1000000 # Number of vectors (adjust as needed)
np.random.seed(1234)
data = np.random.random((nb, d)).astype('float32')
# Create FAISS index using L2 distance
index = faiss.IndexFlatL2(d)
index.add(data)
# Perform a search with a random query vector
query = np.random.random((1, d)).astype('float32')
D, I = index.search(query, k=5)
# Output the results (This is where the raw data appears)
for rank, (distance, idx) in enumerate(zip(D[0], I[0])):
    print(f"Rank: {rank+1}, Distance: {distance}, Text: {data[idx]}")
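For reference, here is a rough pure-Python sketch of what a flat L2 search hands back. It is only an illustration, not how FAISS is implemented: `flat_l2_search` is a hypothetical helper and the tiny `points` list is made-up data. The point is that the result is a list of distances plus a list of integer row positions (as far as I know, `IndexFlatL2` reports squared Euclidean distances), and nothing in it is text.

```python
# Hypothetical helper, NOT part of the FAISS API: a brute-force stand-in that
# mimics the shape of what index.search returns for one query.

def flat_l2_search(rows, query, k):
    """Return (distances, ids) for the k rows closest to `query`."""
    scored = []
    for row_id, row in enumerate(rows):
        # Squared Euclidean distance between this row and the query.
        dist = sum((a - b) ** 2 for a, b in zip(row, query))
        scored.append((dist, row_id))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [dist for dist, _ in top], [row_id for _, row_id in top]

# Made-up 2-dimensional data, just to keep the numbers easy to follow.
points = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, ids = flat_l2_search(points, query=[2.0, 1.0], k=2)
print(distances, ids)  # distances and row positions only, no text
```

So `I[0]` in the snippet above holds row numbers, and `data[idx]` is just the stored float vector for that row.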
Expected Output: The output should ideally show human-readable data or objects that are similar to the input query.
Example Expected Output:
Rank: 1, Distance: 0.923, Text: "Some relevant text or object description"
Rank: 2, Distance: 1.023, Text: "Another relevant item"
Actual Output: Instead of meaningful text or objects, the output returns raw vector data that’s not interpretable without further processing, like:
Rank: 1, Distance: 1.629706859588623, Text: M *M 4M JM pM M M N qN N N N O TO \O ]O {O O P hP ~P P P IQ lQ Q Q Q Q FR XR ~R R R =S S S T T ;T |T T T T [U \U U U +V KV UV dV uV V V W W W W $X 4X X X X
Rank: 2, Distance: 1.6545774936676025, Text: F F F F F G G H H PH RH nH I -I EI HI ZI I I I J J J J K K =L DL #M oM M M M ;N N N N sO O O LP P P P *Q 7Q TQ _Q Q Q R dR R ;S kS T KT T T T T T !U #U
The output here is raw data that represents the internal vector space from FAISS, which is not directly human-readable.
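A minimal sketch of the lookup step that would produce the expected output, assuming the caller keeps a list `texts` in the same order the vectors were added to the index (so `texts[i]` belongs to row `i`). `describe_hits`, `texts`, `D0` and `I0` are hypothetical names for illustration; `D0` and `I0` stand in for `D[0]` and `I[0]` from `index.search`.

```python
# Assumption: texts[i] is the source text of the i-th vector added to the index.
# The index itself only knows row positions, so this side table is the caller's job.

def describe_hits(distances, ids, texts):
    """Turn one row of search results into human-readable lines."""
    lines = []
    for rank, (dist, idx) in enumerate(zip(distances, ids), start=1):
        if idx < 0:
            # FAISS uses -1 as the id when fewer than k neighbours are found.
            continue
        lines.append(f'Rank: {rank}, Distance: {dist:.3f}, Text: "{texts[idx]}"')
    return lines

texts = ["first document", "second document", "third document"]

# Stand-ins for D[0] and I[0]; the last slot mimics a padded "no result" entry.
D0 = [0.923, 1.023, 3.4e38]
I0 = [2, 0, -1]

for line in describe_hits(D0, I0, texts):
    print(line)
```

With a real index the call would look like `describe_hits(D[0], I[0], texts)`. If rows are ever removed or the index is rebuilt, the positions shift, so a dict keyed by an explicit id may be a safer side table than a plain list.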