Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added text summarization to pgRag #12

Closed
wants to merge 7 commits into from
Closed

Conversation

synapse
Copy link
Contributor

@synapse synapse commented May 10, 2024

Fixes #7

  • added summarization function (this requires setting up a chat model and to be passed in along side the embeddings, although optional)
  • added configurable summarization options but comes with pre built-in configs
  • did a couple of linting fixes

Copy link
Contributor

@marceloFerreira90 marceloFerreira90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code wise looks good. But lets clean it up a bit. Can you move all numbers and and all text to a constants file.

I would recommend loading the prompts from a txt file as these will change with more requirements

@@ -0,0 +1,72 @@
import { PromptTemplate } from '@langchain/core/prompts'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code wise looks good. But lets clean it up a bit. Can you move all numbers and and all text to a constants file.

I would recommend loading the prompts from a txt file as these will change with more requirements

Copy link
Contributor

@nherment nherment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall:

  • I would not necessarily expose the summarize function. If we think Nearform should provide one it may be good to put it in its own library.
  • The summarization should be automated and be run on each added document as part of the processDocument job. The summary should then be added to the vectorized chunks for that document (metadata: { parentDocumentId: <sourc edoc id> }

@synapse
Copy link
Contributor Author

synapse commented May 10, 2024

Looks good overall:

  • I would not necessarily expose the summarize function. If we think Nearform should provide one it may be good to put it in its own library.
  • The summarization should be automated and be run on each added document as part of the processDocument job. The summary should then be added to the vectorized chunks for that document (metadata: { parentDocumentId: <sourc edoc id> }
  • I know we've talked about not exposing it, but I did it to prevent the CI from failing because of unused variable. As soon as we start using it inside the project I'll remove it from the export.
  • adding the auto summarization next

@synapse synapse requested a review from marceloFerreira90 May 10, 2024 09:53
@synapse synapse requested a review from nherment May 10, 2024 13:39
@synapse synapse closed this May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Summarize document and make the summary searchable for RAG
3 participants