community: add include_labels option to ConfluenceLoader #28259

@nakamasato nakamasato commented Nov 21, 2024

Description:

Add an include_labels option to ConfluenceLoader (False by default for backward compatibility). When it is enabled, each page's labels are added to the Document metadata, e.g. {"labels": ["l1", "l2"]}.

Notes

The Confluence API returns labels when metadata.labels is included in the expand query parameter.

All of the following functions support expand in the same way:

  • confluence.get_page_by_id
  • confluence.get_all_pages_by_label
  • confluence.get_all_pages_from_space
  • cql (internally using /api/content/search)

Issue:

No issue related to this PR.

Dependencies:

No changes.

Twitter handle:

@gymnstcs

  • Add tests and docs: If you're adding a new integration, please include

    1. a test for the integration, preferably unit tests that do not rely on network access,
    2. an example notebook showing its use. It lives in docs/docs/integrations directory.
  • Lint and test: Run make format, make lint and make test from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/


@nakamasato nakamasato changed the title feat: add include_labels option to ConfluenceLoader community: add include_labels option to ConfluenceLoader Nov 21, 2024
@nakamasato nakamasato force-pushed the add-include-labels-option-to-confluence-loaders branch from 08a12d8 to f3dffce Compare November 21, 2024 13:17
metadata = {
    "title": page["title"],
    "id": page["id"],
    "source": self.base_url.strip("/") + page["_links"]["webui"],
    **({"labels": labels} if include_labels else {}),

The labels key is set only when include_labels is true.
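A minimal standalone sketch of the conditional-key pattern in the hunk above. The build_metadata helper and the sample page and URL are hypothetical and exist only for illustration; in the PR this logic lives inside the loader.

```python
from __future__ import annotations


def build_metadata(
    page: dict, base_url: str, labels: list[str], include_labels: bool
) -> dict:
    """Hypothetical helper mirroring the metadata dict in the diff."""
    return {
        "title": page["title"],
        "id": page["id"],
        "source": base_url.strip("/") + page["_links"]["webui"],
        # Unpacking an empty dict adds nothing, so the "labels" key is
        # absent (not None or []) when include_labels is False.
        **({"labels": labels} if include_labels else {}),
    }


# Made-up sample data, not taken from a real Confluence instance.
sample_page = {
    "title": "Runbook",
    "id": "123",
    "_links": {"webui": "/spaces/DEV/pages/123"},
}
with_labels = build_metadata(
    sample_page, "https://example.atlassian.net/wiki/", ["l1", "l2"], True
)
without_labels = build_metadata(
    sample_page, "https://example.atlassian.net/wiki/", ["l1", "l2"], False
)
```

Leaving the key out entirely keeps the metadata identical to the previous behavior for callers who do not opt in.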

Comment on lines +596 to +601
labels = [
    label["name"]
    for label in page.get("metadata", {})
    .get("labels", {})
    .get("results", [])
]

Each label looks something like this: {'prefix': 'global', 'name': 'database', 'id': '111111111'}
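For illustration, the extraction above can be run against a hand-written page payload. The extract_label_names function name and the sample payload are assumptions; only the nested metadata → labels → results shape and the label fields come from this PR.

```python
def extract_label_names(page: dict) -> list:
    """Return label names from a page dict, or [] if labels were not expanded."""
    return [
        label["name"]
        for label in page.get("metadata", {})
        .get("labels", {})
        .get("results", [])
    ]


# Hand-written payload shaped like the label shown in the comment above.
sample_page = {
    "id": "42",
    "metadata": {
        "labels": {
            "results": [
                {"prefix": "global", "name": "database", "id": "111111111"},
                {"prefix": "global", "name": "runbook", "id": "222222222"},
            ]
        }
    },
}

names = extract_label_names(sample_page)
# A page fetched without metadata.labels in expand has no such keys;
# the chained .get() defaults make that case return an empty list.
names_missing = extract_label_names({"id": "43"})
```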

Screenshot 2024-11-21 at 22 23 03

ref

@nakamasato nakamasato marked this pull request as ready for review November 21, 2024 13:25
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. community Related to langchain-community Ɑ: doc loader Related to document loader module (not documentation) labels Nov 21, 2024
Comment on lines +339 to +345
expand = ",".join(
    [
        content_format.value,
        "version",
        *(["metadata.labels"] if include_labels else []),
    ]
)
@nakamasato nakamasato Nov 21, 2024

expand is a comma-separated query parameter. It was originally hardcoded as f"{content_format.value},version".

I turned it into a variable built from a list so that more values can be added later if needed, since the expand parameter supports many more options.
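A small sketch of how the list-based construction behaves. The build_expand helper is hypothetical, and "body.storage" is only an example stand-in for content_format.value.

```python
def build_expand(content_format_value: str, include_labels: bool) -> str:
    """Hypothetical helper: join the expand values into one query parameter."""
    return ",".join(
        [
            content_format_value,
            "version",
            # Star-unpacking an empty list contributes no element, so the
            # result matches the old hardcoded string when the flag is off.
            *(["metadata.labels"] if include_labels else []),
        ]
    )


expand_default = build_expand("body.storage", include_labels=False)
expand_with_labels = build_expand("body.storage", include_labels=True)
```

With the flag off, the result equals the previous f"{content_format.value},version" string, so existing requests are unchanged.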

Screenshot 2024-11-21 at 22 27 32
(ref)

@efriis efriis left a comment

thanks for the docs screenshot!

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Dec 9, 2024
@efriis efriis enabled auto-merge (squash) December 9, 2024 19:31
@efriis efriis merged commit ce3b69a into langchain-ai:master Dec 9, 2024
19 checks passed
ccurme pushed a commit that referenced this pull request Jan 8, 2025
…s loaded via CQL (#29089)

## Description
This PR enables label inclusion for documents loaded via CQL in the
confluence-loader.

- Updated _lazy_load to pass the include_labels parameter instead of
False in process_pages calls for documents loaded via CQL.
- Ensured that labels can now be fetched and added to the metadata for
documents queried with cql.

## Related Modification History
This PR builds on the previous functionality introduced in
[#28259](#28259), which
added support for including labels with the include_labels option.
However, this functionality did not work as expected for CQL queries,
and this PR fixes that issue.

If the False handling was intentional due to another issue, please let
me know. I have verified with our Confluence instance that this change
allows labels to be correctly fetched for documents loaded via CQL.
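The kind of bug described above can be shown with a toy model. This is not the loader's actual code: process_pages here is a simplified stand-in, and load_via_cql_before / load_via_cql_after are hypothetical names for the call site before and after the fix.

```python
def process_pages(pages: list, include_labels: bool):
    """Simplified stand-in that yields metadata dicts for each page."""
    for page in pages:
        metadata = {"id": page["id"]}
        if include_labels:
            metadata["labels"] = [
                label["name"]
                for label in page.get("metadata", {})
                .get("labels", {})
                .get("results", [])
            ]
        yield metadata


def load_via_cql_before(pages: list, include_labels: bool) -> list:
    # Before the fix: the caller's flag is ignored and False is passed.
    return list(process_pages(pages, False))


def load_via_cql_after(pages: list, include_labels: bool) -> list:
    # After the fix: the caller's flag is passed through unchanged.
    return list(process_pages(pages, include_labels))


# Made-up CQL search results for illustration.
cql_pages = [
    {"id": "1", "metadata": {"labels": {"results": [{"name": "database"}]}}}
]
before = load_via_cql_before(cql_pages, include_labels=True)
after = load_via_cql_after(cql_pages, include_labels=True)
```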

## Issue
Fixes #29088


## Dependencies
No changes.

## Twitter Handle
[@zenoengine](https://x.com/zenoengine)