Ingest space 63/ingest knowledge base #64
Conversation
allow KnowledgeBase ingestion
Walkthrough

This pull request introduces a comprehensive refactoring of the ingestion service, transitioning from a space-centric to a more generalized body of knowledge ingestion approach. The changes include modifications to GraphQL fragments, event handling, client interactions, and data processing mechanisms. Key updates involve creating new types for body of knowledge ingestion, adding support for knowledge base ingestion, and restructuring the event bus and client interactions to support a more flexible data ingestion model.
Actionable comments posted: 9
🔭 Outside diff range comments (2)
src/event.bus/connection.ts (2)
Line range hint 47-53: Update queue configuration to reflect new event types.

The queue names in the configuration still reference "space" (RABBITMQ_INGEST_SPACE_QUEUE, RABBITMQ_INGEST_SPACE_RESULT_QUEUE) despite the transition to body of knowledge events.

Consider updating the environment variables to use more generic names or specific body of knowledge terminology:

```diff
- incomingQueue: this.getEnvValue('RABBITMQ_INGEST_SPACE_QUEUE'),
- outgoingQueue: this.getEnvValue('RABBITMQ_INGEST_SPACE_RESULT_QUEUE'),
+ incomingQueue: this.getEnvValue('RABBITMQ_INGEST_BOK_QUEUE'),
+ outgoingQueue: this.getEnvValue('RABBITMQ_INGEST_BOK_RESULT_QUEUE'),
```
Line range hint 89-94: Update event binding to match new event types.

The queue binding still uses 'IngestSpaceResult' as the routing key despite the transition to body of knowledge events.

Update the binding to use the new event type:

```diff
 await this.channel.bindQueue(
   this.config.outgoingQueue,
   this.config.exchange,
-  'IngestSpaceResult'
+  'IngestBodyOfKnowledgeResult'
 );
```
🧹 Nitpick comments (15)
package.json (1)

62-62: Consider pinning the langsmith version.

The langsmith package is at version 0.1.x, which indicates it's not yet stable. Using the caret (^) version range could lead to breaking changes.

```diff
- "langsmith": "^0.1.66",
+ "langsmith": "0.1.66",
```

src/embed.body.of.knowledge.ts (2)

22-25: Simplify the UTC timestamp conversion and improve the comment.

The current method for obtaining the UTC timestamp is unnecessarily complex, and the comment could be more professional. Consider using Date.now() to get the UTC timestamp directly. Apply this diff to simplify the code and update the comment:

```diff
- // this shenanigan is here to ensure the Timestamp is in UTC timezone
- result.timestamp = new Date(
-   new Date().toLocaleString('en', { timeZone: 'UTC' })
- ).getTime();
+ // Ensure the timestamp is in UTC timezone
+ result.timestamp = Date.now();
```

Repeat this change at lines 104-106.

Also applies to: 104-106
100-101: Correct the spelling of 'occurred' in the error message.

There's a typographical error: 'occured' should be spelled 'occurred'. Apply this diff to fix the typo:

```diff
 resultEvent.error = {
-  message: 'An error occured while embedding.',
+  message: 'An error occurred while embedding.',
 };
```

src/index.ts (1)
15-15: Update log message to reflect body of knowledge terminology.

The log message still references "space", which is inconsistent with the transition to body of knowledge terminology.

```diff
- logger.info(`Ingest completed for space: ${event.bodyOfKnowledgeId}`);
+ logger.info(`Ingest completed for body of knowledge: ${event.bodyOfKnowledgeId}`);
```

src/event.bus/events/ingest.body.of.knowledge.result.ts (2)
11-13: Expand error codes to cover more failure scenarios.

The ErrorCode enum currently only handles vector insertion failures. Consider adding more error codes for other potential failure scenarios:

```diff
 export enum ErrorCode {
   VECTOR_INSERT = 'vector_insert',
+  GRAPHQL_CONNECTION = 'graphql_connection',
+  NOT_FOUND = 'not_found',
+  INVALID_DATA = 'invalid_data',
+  PROCESSING_ERROR = 'processing_error',
 }
```
26-26: Initialize timestamp in constructor.

The optional timestamp should be initialized with the current timestamp by default.

```diff
- public timestamp?: number,
+ public timestamp: number = Date.now(),
```

src/logger.ts (1)
41-44: Consider enhancing error object handling.

The implementation is solid and follows TypeScript best practices. Consider adding support for objects with message properties that aren't Error instances:

```diff
 export const getErrorMessage = (error: unknown) => {
   if (error instanceof Error) return error.message;
+  if (typeof error === 'object' && error !== null && 'message' in error) {
+    return String((error as { message: unknown }).message);
+  }
   return String(error);
 };
```

src/data.readers/konwledge.base.ts (2)
28-28: Remove commented out code.

The commented portion of the line should be removed to maintain clean code.

```diff
- const documents: Document[] = []; //= await processknowledgeBaseTree(
+ const documents: Document[] = [];
```
13-26: Consider extracting knowledge base fetching logic.

The knowledge base fetching and validation logic could be extracted into a separate function for better modularity:

```diff
+async function fetchKnowledgeBase(knowledgeBaseId: string, alkemioClient: AlkemioCliClient) {
+  try {
+    const knowledgeBase = await alkemioClient.ingestKnowledgeBase(knowledgeBaseId);
+    if (!knowledgeBase) {
+      throw new Error(`Knowledge base ${knowledgeBaseId} not found.`);
+    }
+    return knowledgeBase;
+  } catch (error) {
+    logger.error(error);
+    throw new Error(`Failed to fetch knowledge base ${knowledgeBaseId}: ${error}`);
+  }
+}
 export const embedKnowledgeBase = async (
   event: IngestBodyOfKnowledge,
   alkemioClient: AlkemioCliClient
 ) => {
   const knowledgeBaseId = event.bodyOfKnowledgeId;
-  let knowledgeBase;
-  try {
-    knowledgeBase = await alkemioClient.ingestKnowledgeBase(knowledgeBaseId);
-  } catch (error) {
-    logger.error(error);
-    throw new Error('GraphQL connection failed.');
-  }
-
-  if (!knowledgeBase) {
-    logger.error(`knowledgeBase ${knowledgeBaseId} not found.`);
-    throw new Error(`knowledgeBase ${knowledgeBaseId} not found.`);
-  }
+  const knowledgeBase = await fetchKnowledgeBase(knowledgeBaseId, alkemioClient);
```

src/callout.handlers/base.ts (1)
Line range hint 54-76: Consider enhancing contribution document generation.

The current implementation could benefit from improved type safety and logging for contribution processing:

```diff
+interface ContributionDocument {
+  link?: { [key: string]: unknown };
+  post?: { [key: string]: unknown };
+}
+const processContribution = (contribution: ContributionDocument) => {
+  const docLike = contribution.link || contribution.post;
+  if (!docLike) return null;
+
+  const { pageContent, documentId, source, type, title } = generateDocument(docLike);
+  logger.debug(`Processing contribution document: ${documentId}`);
+  return new Document({
+    id: documentId,
+    pageContent,
+    metadata: { documentId, source, type, title },
+  });
+};
 for (const contribution of callout.contributions || []) {
-  let docLike;
-  if (contribution.link) {
-    docLike = contribution.link;
-  } else if (contribution.post) {
-    docLike = contribution.post;
-  }
-
-  if (docLike) {
-    const { pageContent, documentId, source, type, title } =
-      generateDocument(docLike);
-    result.push(
-      new Document({
-        id: documentId,
-        pageContent,
-        metadata: {
-          documentId,
-          source,
-          type,
-          title,
-        },
-      })
-    );
+  const doc = processContribution(contribution);
+  if (doc) {
+    result.push(doc);
   }
 }
```

src/embed.documents.ts (1)
56-56: Update logging message to reflect body of knowledge terminology.

The logging message still references "space" despite the transition to body of knowledge.

```diff
- logger.info(`Splitting documents for space: ${bokID}`);
+ logger.info(`Splitting documents for body of knowledge: ${bokID}`);
```

graphql/fragments/space-ingest.graphql (1)
Line range hint 1-1: Consider renaming the fragment to match new terminology.

The fragment name still uses "SpaceIngest" despite the transition to the body of knowledge approach. Consider renaming to BodyOfKnowledgeIngest for consistency with the new terminology.

graphql/fragments/profile.graphql (1)
8-12: Consider adding field validation constraints.

The new location fields enhance the profile structure. Consider specifying validation constraints (e.g., @length, @pattern) for postalCode, and potentially making essential fields non-nullable where appropriate.

graphql/fragments/callout.graphql (2)
47-47: Remove unnecessary empty line.

For consistency, remove the empty line between profile and the closing brace.

1-50: Consider pagination for comments and contributions.

The current structure might return large datasets for comments and contributions. Consider implementing pagination using first/after parameters to manage response size and improve performance.

Example modification:

```graphql
comments(first: Int, after: String) {
  pageInfo {
    hasNextPage
    endCursor
  }
  edges {
    node {
      # existing fields
    }
  }
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- package-lock.json is excluded by !**/package-lock.json
- src/generated/graphql.ts is excluded by !**/generated/**
📒 Files selected for processing (25)
- .env.default (1 hunks)
- graphql/fragments/callout.graphql (1 hunks)
- graphql/fragments/profile.graphql (1 hunks)
- graphql/fragments/space-ingest.graphql (2 hunks)
- graphql/queries/knowledge-base-ingest.graphql (1 hunks)
- package.json (2 hunks)
- src/callout.handlers/base.ts (2 hunks)
- src/callout.handlers/index.ts (1 hunks)
- src/data.readers/konwledge.base.ts (1 hunks)
- src/data.readers/process.space.tree.ts (3 hunks)
- src/data.readers/space.ts (1 hunks)
- src/data.readers/types.d.ts (1 hunks)
- src/embed.body.of.knowledge.ts (1 hunks)
- src/embed.documents.ts (3 hunks)
- src/event.bus/connection.ts (3 hunks)
- src/event.bus/events/ingest.body.of.knowledge.result.ts (1 hunks)
- src/event.bus/events/ingest.body.of.knowledge.ts (1 hunks)
- src/event.bus/events/ingest.space.result.ts (0 hunks)
- src/event.bus/events/ingest.space.ts (0 hunks)
- src/graphql.client/AlkemioCliClient.ts (1 hunks)
- src/index.ts (1 hunks)
- src/logger.ts (1 hunks)
- src/process.callouts.ts (1 hunks)
- src/space.embed/embed.space.ts (0 hunks)
- src/summarize/graph.ts (2 hunks)
💤 Files with no reviewable changes (3)
- src/space.embed/embed.space.ts
- src/event.bus/events/ingest.space.ts
- src/event.bus/events/ingest.space.result.ts
✅ Files skipped from review due to trivial changes (1)
- src/callout.handlers/index.ts
🧰 Additional context used
🪛 eslint
src/summarize/graph.ts
[error] 6-6: 'traceable' is defined but never used.
(@typescript-eslint/no-unused-vars)
🔇 Additional comments (13)
src/summarize/graph.ts (1)
24-30: Verify LangSmith configuration.

The model is now wrapped with LangSmith's SDK wrapper, which enables tracing and monitoring capabilities. However, this requires proper configuration to function correctly.
Please ensure the following environment variables are properly configured:
- LANGCHAIN_TRACING_V2
- LANGCHAIN_ENDPOINT
- LANGCHAIN_API_KEY
Run this script to verify the configuration:
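The bot's verification script did not survive extraction. A hypothetical check (variable names taken from the list above; not the bot's elided script) might look like:

```typescript
// Hypothetical startup check: report which LangSmith tracing variables
// are absent from a given environment map.
const REQUIRED_VARS = [
  'LANGCHAIN_TRACING_V2',
  'LANGCHAIN_ENDPOINT',
  'LANGCHAIN_API_KEY',
];

const missingLangSmithVars = (
  env: Record<string, string | undefined>
): string[] => REQUIRED_VARS.filter((name) => !env[name]);

// Example: warn at startup if tracing is enabled but incompletely configured.
const missing = missingLangSmithVars(process.env);
if (missing.length > 0) {
  console.warn(`LangSmith tracing not fully configured: ${missing.join(', ')}`);
}
```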
package.json (1)
2-2: Package name change aligns with project objectives.

The package name change from @alkemio/space-ingest to @alkemio/ingest-alkemio-data better reflects the transition from a space-centric to a generalized body of knowledge approach.

src/data.readers/types.d.ts (1)
1-9: Type definitions are well-structured.

The added type declarations for BodyOfKnowledgeReadResult and ReadResult are correct and enhance type safety.

src/event.bus/events/ingest.body.of.knowledge.ts (1)
1-17: Enums and class definitions are appropriate.

The new enums IngestionPurpose and BodyOfKnowledgeType, along with the IngestBodyOfKnowledge class, are well-defined and align with the overall event handling structure.

src/data.readers/process.space.tree.ts (1)
25-25: LGTM! Good addition of document ID.

Adding the document ID to the Document constructor ensures consistent identification across the system.
src/callout.handlers/base.ts (1)
41-41: LGTM! Consistent document ID implementation.

The addition of document IDs to both main callout and contribution documents maintains consistency across the system.
Also applies to: 67-67
src/graphql.client/AlkemioCliClient.ts (1)
76-81: LGTM! Clean implementation of knowledge base ingestion.

The new method follows existing patterns and maintains consistency with the class's error handling approach.
src/event.bus/connection.ts (1)
129-135: LGTM! Event parsing is correctly updated.

The message parsing logic correctly extracts the new body of knowledge fields.
src/embed.documents.ts (1)
20-26: LGTM! Clean transition to body of knowledge model.

The function signature and parameter handling have been properly updated to use the new BodyOfKnowledgeReadResult type.
graphql/fragments/space-ingest.graphql (1)
15-19: LGTM! Clean implementation of callouts structure.

The new calloutsSet structure with fragment spread improves code reusability and maintainability.
graphql/queries/knowledge-base-ingest.graphql (1)
1-15: Well-structured query implementation!

The query is well-organized and properly utilizes fragments for code reuse. The structure aligns well with the space ingestion pattern, promoting consistency.
Let's verify the fragment dependencies are properly defined:
✅ Verification successful
Fragment dependencies verified successfully
All required fragments (ProfileFields, CalloutFields, ProfileNoTagsetFields) are properly defined and available in the codebase. The dependency chain is complete and free of circular dependencies.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Check if all required fragments are properly imported
rg -l "fragment (ProfileFields|CalloutFields|ProfileNoTagsetFields)" graphql/fragments/
```

Length of output: 201

Script:

```shell
#!/bin/bash
# Check the content of fragment files
echo "=== profile.graphql ==="
cat graphql/fragments/profile.graphql
echo -e "\n=== callout.graphql ==="
cat graphql/fragments/callout.graphql
```

Length of output: 1130

Script:

```shell
#!/bin/bash
# Check the content of ProfileNoTagsetFields fragment
echo "=== profile.no.tagset.graphql ==="
cat graphql/fragments/profile.no.tagset.graphql
```

Length of output: 314
.env.default (1)
24-25: Verify resource implications of increased chunk parameters.

The significant increases in CHUNK_SIZE (9x) and CHUNK_OVERLAP (5x) could impact memory usage and processing performance. Please ensure:
- The system has sufficient memory to handle larger chunks
- The increased overlap doesn't cause significant performance degradation
- These values have been tested with representative data volumes
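To reason about the checklist above, a back-of-envelope estimate helps (hypothetical helper, not part of the PR): a sliding window of size chunkSize advances by chunkSize - overlap per chunk, so larger overlap means more chunks for the same text.

```typescript
// Rough estimate (illustrative only): number of chunks produced when
// splitting text of a given length with a fixed chunk size and overlap.
const estimateChunkCount = (
  textLength: number,
  chunkSize: number,
  overlap: number
): number => {
  if (textLength <= chunkSize) return 1;
  const step = chunkSize - overlap; // each new chunk advances by this much
  return 1 + Math.ceil((textLength - chunkSize) / step);
};

// e.g. a 100,000-character document with 1000-char chunks and 100-char overlap:
console.log(estimateChunkCount(100_000, 1000, 100)); // → 111
```

Raising overlap to half the chunk size roughly doubles the chunk count, which multiplies both embedding calls and vector-store writes.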
graphql/fragments/callout.graphql (1)
9-22: Well-implemented polymorphic sender type!

The inline fragments for User and VirtualContributor types are properly implemented, maintaining type safety while sharing common profile fields.
Actionable comments posted: 2
🧹 Nitpick comments (8)
src/data.readers/knowledge.base.ts (2)
14-14: Typo in comment: 'priviliges' should be 'privileges'.

The comment on line 14 contains a typographical error. Apply this diff to correct the typo:

```diff
- // make sure the service user has sufficient priviliges
+ // make sure the service user has sufficient privileges
```
30-30: Remove commented-out code to improve readability.

There is a commented-out code snippet that appears to be leftover from previous implementations. Removing it will clean up the code and enhance readability:

```diff
- const documents: Document[] = []; //= await processknowledgeBaseTree(
+ const documents: Document[] = [];
```

src/data.readers/space.ts (1)
16-16: Typo in comment: 'priviliges' should be 'privileges'.

The comment on line 16 contains a typographical error. Apply this diff to correct the typo:

```diff
- // make sure the service user has sufficient priviliges
+ // make sure the service user has sufficient privileges
```

src/embed.body.of.knowledge.ts (5)
17-29: Simplify timestamp handling and improve comment clarity.

The timestamp conversion can be simplified, and the comment should be more professional. Since timestamp is a numeric field, Date.now() is the type-correct simplification:

```diff
- // this shenanigan is here to ensure the Timestamp is in UTC timezone
- result.timestamp = new Date(
-   new Date().toLocaleString('en', { timeZone: 'UTC' })
- ).getTime();
+ // Ensure the timestamp is in UTC
+ result.timestamp = Date.now();
```
39-39: Remove redundant variable assignment.

The purpose variable is redundant, as it's already available in event.purpose and is only used once in the code. Apply this diff to remove the redundant assignment:

```diff
- const purpose = event.purpose;
```

Then update line 84 to use event.purpose directly.
50-55: Include the error message in client initialization failure.

Consider including the actual error message to aid in debugging:

```diff
 } catch (error) {
   logger.error(error);
-  return setResultError(resultEvent, 'AlkemioClient can not be initialised.');
+  return setResultError(
+    resultEvent,
+    `AlkemioClient can not be initialised: ${getErrorMessage(error)}`
+  );
 }
```
79-85: Consider using a more explicit return type for embedDocuments.

Using a boolean return type for embedDocuments makes the success/failure conditions implicit. Consider creating and using an EmbeddingResult type that provides more context about the embedding operation's outcome.
102-102: Fix typo in error message.

There's a typo in the error message: "occured" should be "occurred". Apply this diff to fix the typo:

```diff
- message: 'An error occured while embedding.',
+ message: 'An error occurred while embedding.',
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- src/data.readers/knowledge.base.ts (1 hunks)
- src/data.readers/process.space.tree.ts (3 hunks)
- src/data.readers/space.ts (1 hunks)
- src/embed.body.of.knowledge.ts (1 hunks)
- src/process.callouts.ts (1 hunks)
- src/summarize/graph.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/summarize/graph.ts
- src/data.readers/process.space.tree.ts
🔇 Additional comments (5)
src/process.callouts.ts (1)
21-23: Ensure safe handling of 'document' when pushing into 'documents'.

Using the spread operator with an unknown array size could cause a stack overflow. Also, type checking for 'document' would improve safety. Consider modifying the code as follows:

```diff
 if (document) {
-  documents.push(...document);
+  if (Array.isArray(document)) {
+    documents.push(...document);
+  } else {
+    documents.push(document);
+  }
 }
```

src/data.readers/space.ts (1)
34-37: Add type validation before casting to ensure safety.

Unsafe casting of 'space' to Partial<Space> could lead to runtime errors if 'space' does not match the expected structure. Consider adding a type guard function to validate 'space' before casting:

```diff
+// Type guard function
+function isPartialSpace(obj: unknown): obj is Partial<Space> {
+  return obj !== null && typeof obj === 'object' && 'id' in obj;
+}
+
+if (!isPartialSpace(space)) {
+  throw new Error(`Invalid space data received for ${spaceId}`);
+}
+
 const documents: Document[] = await processSpaceTree(
-  [space as Partial<Space>],
+  [space],
   alkemioClient
 );
```

src/embed.body.of.knowledge.ts (3)
1-16: LGTM! Well-organized imports.

The imports are properly organized and reflect the architectural transition to a generalized body of knowledge ingestion approach.
105-108: Simplify timestamp handling and improve comment clarity.

As with the earlier occurrence, the numeric timestamp conversion can be simplified with Date.now():

```diff
- // this shenanigan is here to ensure the Timestamp is in UTC timezone
- resultEvent.timestamp = new Date(
-   new Date().toLocaleString('en', { timeZone: 'UTC' })
- ).getTime();
+ // Ensure the timestamp is in UTC
+ resultEvent.timestamp = Date.now();
```
59-67: Add explicit handling for unknown body of knowledge types.

The current implementation assumes any non-ALKEMIO_SPACE type is a knowledge base. Consider adding explicit type validation:

```diff
 if (event.type === BodyOfKnowledgeType.ALKEMIO_SPACE) {
   logger.info(`Ingest invoked for Space: ${event.bodyOfKnowledgeId}`);
   result = await embedSpace(event, alkemioClient);
+} else if (event.type === BodyOfKnowledgeType.KNOWLEDGE_BASE) {
+  logger.info(
+    `Ingest invoked for KnowledgeBase: ${event.bodyOfKnowledgeId}`
+  );
+  result = await embedKnowledgeBase(event, alkemioClient);
 } else {
-  logger.info(
-    `Ingest invoked for KnowledgeBase: ${event.bodyOfKnowledgeId}`
-  );
-  result = await embedKnowledgeBase(event, alkemioClient);
+  throw new Error(`Unsupported body of knowledge type: ${event.type}`);
 }
```

Let's verify the available types:
Actionable comments posted: 4
🔭 Outside diff range comments (1)
package.json (1)
Update engines field to match dependency requirements

The current engines field (>=16.15.0) is incompatible with core dependencies that require Node.js >= 18. This could lead to installation and runtime issues. Please update the engines field to:

```json
"engines": {
  "node": ">=18.0.0",
  "npm": ">=8.5.5"
}
```

🔗 Analysis chain
Line range hint
71-79
: Review Node.js version requirements.There's a discrepancy between the engine requirements (
>=16.15.0
) and the Volta configuration (21.7.3
). This could lead to compatibility issues. Consider:
- Updating the engines field to match the development environment
- Documenting the recommended Node.js version in the README
- Adding CI checks to ensure compatibility across supported versions
Run this script to check for Node.js version-specific syntax:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for potential Node.js version compatibility issues rg -g '*.{js,ts}' -l 'import\s+.*\s+from|export\s+.*\s+from|class\s+.*\s+extends|async|await|??|#private|#\w+|\?\.'Length of output: 356
Script:
#!/bin/bash # Check for modern JavaScript syntax usage echo "=== Checking for ESM imports/exports ===" rg -g '*.{js,ts}' 'import .* from|export .* from' echo -e "\n=== Checking for modern class features and operators ===" rg -g '*.{js,ts}' '#private|#[a-zA-Z_]|class .* extends|\?\?|\?\.|\.\.\.' echo -e "\n=== Checking for async/await ===" rg -g '*.{js,ts}' 'async|await' echo -e "\n=== Checking CI configs for Node.js version ===" fd -g '*workflow*.{yml,yaml}' .github/workflows/ echo -e "\n=== Checking for version requirements in docs ===" rg -g '*.md' -i 'node.*version|prerequisites|requirements' echo -e "\n=== Checking package-lock.json for engine requirements ===" fd -g 'package-lock.json' -x cat {} | grep -A 5 '"engines":'Length of output: 72608
♻️ Duplicate comments (2)
src/data.readers/space.ts (1)
17-20: ⚠️ Potential issue: Add type validation before casting.

The unsafe type casting to Partial<Space> could lead to runtime errors if the space object doesn't match the expected structure.

```diff
+// Type guard function
+function isPartialSpace(obj: unknown): obj is Partial<Space> {
+  return obj !== null && typeof obj === 'object' && 'id' in obj;
+}
+
+if (!isPartialSpace(space)) {
+  throw new Error(`Invalid space data received for ${spaceId}`);
+}
+
 const documents: Document[] = await processSpaceTree(
-  [space as Partial<Space>],
+  [space],
   alkemioClient
 );
```

src/process.callouts.ts (1)
21-23: ⚠️ Potential issue: Improve document array handling.

The spread operator with an unknown array size could cause a stack overflow.

```diff
- if (document) {
-   documents.push(...document);
+ if (document && Array.isArray(document)) {
+   // Use concat for better performance with large arrays
+   documents = documents.concat(document.slice(0, 1000)); // Limit chunk size
 }
```
🧹 Nitpick comments (6)
src/process.callouts.ts (2)
7-10: Consider strengthening type safety.

Using Partial<Callout> suggests incomplete type information. Consider creating a more specific type that includes only the required fields for this operation (e.g., visibility):

```typescript
type ProcessableCallout = Pick<Callout, 'visibility' | /* other required fields */>;

export const processCallouts = async (
  callouts: ProcessableCallout[],
  alkemioClient: AlkemioCliClient
): Promise<Document[]> => {
```
13-25: Consider adding progress tracking for large callout arrays.

For better observability, especially with large arrays of callouts, consider adding progress logging:

```diff
+ const total = callouts.length;
+ let processed = 0;
  for (const callout of callouts) {
+   processed++;
+   logger.debug(`Processing callout ${processed}/${total}`);
```

src/event.bus/connection.ts (1)
Line range hint 105-117: Enhance error handling and connection validation.

While the basic error handling is present, consider these improvements:

- Validate connection state before sending
- Add specific error types for different failure scenarios

```diff
 async send(message: IngestBodyOfKnowledgeResult) {
   try {
     if (!this.channel) {
       await this.connect();
+      if (!this.channel) {
+        throw new Error('Failed to establish channel after connection attempt');
+      }
     }
+    if (!this.connected) {
+      throw new Error('Connection is not in ready state');
+    }
     this.channel.sendToQueue(
       this.config.outgoingQueue,
       Buffer.from(JSON.stringify(message))
     );
   } catch (error) {
+    const errorMessage = error instanceof Error ? error.message : 'Unknown error';
+    logger.error(`Failed to send message: ${errorMessage}`);
     logger.error(error);
     throw error;
   }
 }
```

src/embed.documents.ts (2)
25-26: Consider enhancing logging configuration.

While the logging changes are correct, consider using a structured logging approach:

- Use a logging context instead of modifying defaultMeta directly
- Add log correlation ID for better traceability

Example enhancement:

```diff
- logger.defaultMeta.bodyOfKnowledgeId = bokID;
+ const logContext = logger.child({
+   bodyOfKnowledgeId: bokID,
+   correlationId: crypto.randomUUID()
+ });
```

Also applies to: 48-48, 56-56
Line range hint 20-191: Consider improving error handling and resource management.

The function handles a complex workflow with multiple external services (Azure OpenAI, ChromaDB). Consider these improvements:
- Implement proper cleanup in case of partial failures
- Add retry logic for transient failures in embedding generation
- Consider implementing a circuit breaker for external service calls
Would you like me to provide an example implementation of these patterns?
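As a sketch of the retry suggestion above (a generic helper under assumed defaults, not code from the PR):

```typescript
// Hypothetical helper: retry an async call with exponential backoff,
// rethrowing the last error once the attempts are exhausted.
const withRetry = async <T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100
): Promise<T> => {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // back off: 100ms, 200ms, 400ms, ...
        await new Promise((resolve) =>
          setTimeout(resolve, baseDelayMs * 2 ** attempt)
        );
      }
    }
  }
  throw lastError;
};
```

A circuit breaker would wrap the same call site, short-circuiting further attempts once consecutive failures cross a threshold.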
package.json (1)
62-62: Lock the langsmith dependency version.

The caret (^) in the version constraint allows automatic updates to newer minor/patch versions, which could introduce compatibility issues. Consider locking to a specific version:

```diff
- "langsmith": "^0.1.66",
+ "langsmith": "0.1.66",
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (7)
- package.json (2 hunks)
- src/data.readers/knowledge.base.ts (1 hunks)
- src/data.readers/space.ts (1 hunks)
- src/embed.documents.ts (3 hunks)
- src/event.bus/connection.ts (3 hunks)
- src/event.bus/events/ingest.body.of.knowledge.result.ts (1 hunks)
- src/process.callouts.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/event.bus/events/ingest.body.of.knowledge.result.ts
- src/data.readers/knowledge.base.ts
🔇 Additional comments (7)
src/data.readers/space.ts (2)
1-9: LGTM! Imports are well-organized and necessary.

All imports are being used and properly structured.

10-13: LGTM! Function signature is well-defined.

The types are explicit and the signature aligns with the new body of knowledge ingestion model.
src/process.callouts.ts (1)
1-6: LGTM! Imports are well-organized.

All imports are necessary and properly used in the implementation.
src/event.bus/connection.ts (1)
3-6: LGTM: Import and type changes align with the new ingestion model.

The transition from space-centric types to body of knowledge types is well-structured.
src/embed.documents.ts (2)
12-13: LGTM!

The import changes correctly reflect the transition from space-centric to body of knowledge types.
20-23: Verify callers of embedDocuments function.

The function signature has changed significantly:

- Changed from default to named export
- First parameter type changed to BodyOfKnowledgeReadResult
- Purpose type changed to IngestionPurpose

Let's verify all callers have been updated:
✅ Verification successful
All callers of embedDocuments have been properly updated ✓
The only caller in src/embed.body.of.knowledge.ts correctly uses:
- Named import syntax
- Updated parameter types and order
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Search for any remaining usages of the old import style and function signature
rg -l "import embedDocuments from.*embed\.documents" || echo "No default imports found"
rg -A 5 "embedDocuments\(" || echo "No direct calls found"
```

Length of output: 467

Script:

```shell
#!/bin/bash
# Check import statement and full context of the call
echo "=== Import statement ==="
rg "import.*embedDocuments" src/embed.body.of.knowledge.ts || echo "No import found"
echo -e "\n=== Function call context ==="
rg -B 10 -A 5 "embedDocuments\(" src/embed.body.of.knowledge.ts || echo "No context found"
```

Length of output: 656
package.json (1)
2-3
: Consider bumping the major version.The package rename from
@alkemio/space-ingest
to@alkemio/ingest-alkemio-data
is a breaking change that affects package imports. According to semver, this should trigger a major version bump (1.0.0) rather than a minor version bump (0.11.0).
Fixes: #63
Summary by CodeRabbit

Release Notes

- Configuration
- GraphQL
- Dependency Management: renamed package to @alkemio/ingest-alkemio-data; added the langsmith dependency.
- Feature Enhancements
- Event Handling