Wrong segment number when using api to add batch segments. #3270

Closed
4 tasks done
sunxichen opened this issue Apr 10, 2024 · 2 comments
Labels: 🐞 bug (Something isn't working); 👻 feat:rag (Embedding related issue, like qdrant, weaviate, milvus, vector database)

@sunxichen
Contributor

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.5.9

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Add segments in batches using the add segment API (http://{DIFY_URL}/v1/datasets/{DATASET_ID}/documents/{DOCUMENT_ID}/segments); it assigns the same number to all segments within the same batch.
Request example:

import requests
from tqdm import tqdm

# Send the segments in batches of 10; `datas` is the full list of segments to add.
for i in tqdm(range(0, len(datas), 10)):
    batch_datas = datas[i : i + 10]
    response = requests.post(
        url=DIFY_ADD_SEGMENT_URL,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {DIFY_DATASET_API_KEY}",
        },
        json={"segments": batch_datas},
    )

✔️ Expected Behavior

Different segments should have different numbers even when they are added in the same batch.

❌ Actual Behavior

Different segments have the same number when they are added in the same batch.
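
One way to confirm this behavior from the client side is to collect the numbers returned for a batch and look for repeats. The sketch below assumes each returned segment is a dict that carries its number under a `position` key; that key name, the sample data, and the helper `find_duplicate_positions` are assumptions for illustration and do not come from the report.

```python
from collections import Counter


def find_duplicate_positions(segments, key="position"):
    """Return the sorted position values that appear more than once."""
    counts = Counter(seg[key] for seg in segments)
    return sorted(pos for pos, n in counts.items() if n > 1)


# Hypothetical data: one batch showing the reported bug, one numbered correctly.
buggy_batch = [{"position": 5}, {"position": 5}, {"position": 5}]
fixed_batch = [{"position": 5}, {"position": 6}, {"position": 7}]

print(find_duplicate_positions(buggy_batch))
print(find_duplicate_positions(fixed_batch))
```

An empty result means every segment in the batch received its own number.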

dosubot bot commented Apr 10, 2024

Hey @sunxichen, great to see you back! 😊 Let's dive into this new puzzle together. 🛠️

It looks like the issue you're encountering with segment numbers being the same for all segments added in the same batch could be due to how the segments are being created and stored. To address this, ensure each segment has a unique identifier. This can be achieved by:

  1. Generating Unique Identifiers: For each new segment, generate a unique identifier, similar to how uuid.uuid4() is used for doc_id. This ensures each segment is distinct.
  2. Database Auto-increment Fields: If using a database, leveraging an auto-increment field for segment numbers can automatically assign unique numbers to each segment.
  3. Thread-safe Operations: If your process involves concurrency, ensure operations are thread-safe to prevent race conditions, possibly using locks or other synchronization mechanisms.

The segment addition logic, particularly how segments are numbered or identified, should be reviewed to ensure it aligns with these recommendations. If segments are stored with a specific logic for numbering, ensuring this logic is correctly implemented and handles batch additions uniquely is crucial.
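
As a rough illustration of numbering that handles batches, the sketch below reads the current maximum number once and then increments a local counter for every segment in the batch, so no two segments share a number. It is not Dify's implementation and not the fix from any pull request; `assign_positions`, the in-memory lists, and the `position` key are hypothetical names chosen for the example.

```python
def assign_positions(existing_positions, new_segments):
    """Give each new segment the next unused sequential position."""
    # Start one past the highest existing position (or at 1 if there are none).
    next_position = max(existing_positions, default=0) + 1
    numbered = []
    for seg in new_segments:
        # Copy the segment so the caller's dicts are not mutated.
        numbered.append({**seg, "position": next_position})
        next_position += 1
    return numbered


# Hypothetical batch added to a document that already has positions 1 to 3.
batch = [{"content": "a"}, {"content": "b"}, {"content": "c"}]
result = assign_positions([1, 2, 3], batch)
print([seg["position"] for seg in result])
```

If several requests can add segments to the same document at once, the read of the maximum and the assignment would also need to run under a lock or inside a transaction, in line with the thread-safety point above.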

If you're still facing issues, providing more details about the error messages or the specific circumstances under which the problem occurs could help in diagnosing and resolving the issue more effectively.


@dosubot dosubot bot added the 🐞 bug Something isn't working label Apr 10, 2024
@crazywoola crazywoola added the 👻 feat:rag Embedding related issue, like qdrant, weaviate, milvus, vector database. label Apr 10, 2024
@JohnJyong
Collaborator

Issue fixed by PR #3351. Thanks for your feedback, @sunxichen.


3 participants