Releases: pathwaycom/pathway

v0.16.3

02 Jan 14:38

Added

  • pw.io.iceberg.write method for writing Pathway tables into Apache Iceberg.

Changed

  • values of non-deterministic UDFs are not stored in tables that are append_only.
  • pw.Table.ix now has a better runtime error message that includes the id of the missing row.

Fixed

  • temporal behaviors in temporal operators (windowby, interval_join) now consume no CPU when no data passes through them.

v0.16.2

02 Jan 14:37

Added

  • pw.xpacks.llm.prompts.RAGPromptTemplate, a set of prompt utilities that enable verifying templates and creating UDFs from prompt strings or callables.
  • pw.xpacks.llm.question_answering.BaseContextProcessor, which streamlines developing and tuning how retrieved context documents are represented to the LLM.
  • pw.io.kafka.read now supports the with_metadata flag, which makes it possible to attach the metadata of the Kafka messages to the table entries.
  • pw.io.deltalake.read can now stream tables with deletions, provided no deletion vectors were used.

Changed

  • pw.io.sharepoint.read now explicitly terminates with an error if it fails to read the data the specified number of times per row (the default is 8).
  • pw.xpacks.llm.prompts.prompt_qa and other prompts now expect 'context' and 'query' fields instead of 'docs'.
  • Removed support for short_prompt_template and long_prompt_template in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer. These prompt variants are no longer accepted during construction or in requests.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer allows setting user-created prompts. Templates are verified to include 'context' and 'query' placeholders.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer can take a BaseContextProcessor that represents context documents to the LLM. Defaults to pw.xpacks.llm.question_answering.SimpleContextProcessor which filters metadata fields and joins the documents with new lines.
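
The placeholder check and the default context handling described above can be pictured with a small plain-Python sketch. It does not use Pathway: verify_template and simple_context are hypothetical helpers, the text/metadata document layout is an assumption, and the sketch simply drops all metadata rather than reproducing the library's filtering rules.

```python
from string import Formatter

REQUIRED_FIELDS = {"context", "query"}


def verify_template(template: str) -> str:
    """Return the template, or raise ValueError if a required placeholder is missing."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    found = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = REQUIRED_FIELDS - found
    if missing:
        raise ValueError(f"template is missing placeholders: {sorted(missing)}")
    return template


def simple_context(docs: list[dict]) -> str:
    """Keep only each document's text (assumed layout) and join with newlines."""
    return "\n".join(doc["text"] for doc in docs)


template = verify_template("Answer using the context.\n{context}\nQuestion: {query}")
docs = [
    {"text": "Pathway is a streaming framework.", "metadata": {"path": "a.md"}},
    {"text": "It has a Python API.", "metadata": {"path": "b.md"}},
]
prompt = template.format(context=simple_context(docs), query="What is Pathway?")
```

A template that still uses the old 'docs' field, such as "Use {docs}", would be rejected by verify_template with a ValueError in this sketch.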

Fixed

  • The input of pw.io.fs.read and pw.io.s3.read is now correctly persisted in case deletions or modifications of already processed objects take place.

v0.16.1

12 Dec 15:09

Changed

  • pw.io.s3.read now monitors object deletions and modifications in the S3 source when run in streaming mode. When an object is deleted in S3, it is also removed from the engine. Similarly, if an object is modified in S3, the engine updates its state to reflect those changes.
  • pw.io.s3.read now supports the with_metadata flag, which makes it possible to attach the metadata of the source object to the table entries.
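
One way to picture the deletion and modification tracking is as a comparison of two bucket listings that yields retractions and insertions. The sketch below is a conceptual model in plain Python, not the connector's implementation: diff_listings is a hypothetical helper, and the version tag stands in for whatever the source uses to detect change (an assumption).

```python
def diff_listings(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, int]]:
    """Compare two {object_key: version_tag} listings and return (key, diff) events.

    diff is -1 for a retraction and +1 for an insertion. A modified object
    produces a retraction of the old version followed by an insertion.
    """
    events = []
    for key in sorted(old.keys() | new.keys()):
        if key in old and old[key] != new.get(key):
            events.append((key, -1))  # object deleted or replaced
        if key in new and new[key] != old.get(key):
            events.append((key, +1))  # object created or replaced
    return events


before = {"a.csv": "v1", "b.csv": "v1"}
after = {"b.csv": "v2", "c.csv": "v1"}
events = diff_listings(before, after)
```

Here a.csv is retracted, b.csv is retracted and re-inserted, and c.csv is inserted.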

Fixed

  • pw.xpacks.llm.document_store.DocumentStore no longer requires a _metadata column in the input table.

v0.16.0

29 Nov 10:49

Changelog

All notable changes to this project will be documented in this file.

This project adheres to Semantic Versioning.

[Unreleased]

[0.16.0] - 2024-11-29

Added

  • pw.xpacks.llm.document_store.SlidesDocumentStore, which is a subclass of pw.xpacks.llm.document_store.DocumentStore customized for retrieving slides from presentations.
  • pw.temporal.inactivity_detection and pw.temporal.utc_now functions, allowing for alerting and other time-dependent use cases.

Changed

  • pw.Table.concat, pw.Table.with_id, pw.Table.with_id_from no longer check whether ids are unique, which improves memory usage.
  • table operations that store values (like pw.Table.join, pw.Table.update_cells) no longer store columns that are not used downstream.
  • append_only column property is now propagated better (there are more places where we can infer it).
  • BREAKING: Unused arguments from the constructor pw.xpacks.llm.question_answering.DeckRetriever are no longer accepted.

Fixed

  • query_as_of_now of pw.stdlib.indexing.DataIndex and pw.stdlib.indexing.HybridIndex now works in constant memory for an infinite query stream (no query-related data is kept after a query is answered).

v0.15.4

18 Nov 20:52

Added

  • pw.io.kafka.read now supports reading entries starting from a specified timestamp.
  • pw.io.nats.read and pw.io.nats.write methods for reading from and writing Pathway tables to NATS.

Changed

  • pw.Table.diff now supports setting the instance parameter, which allows computing differences for multiple groups.
  • pw.io.postgres.write_snapshot now keeps the Postgres table fully in sync with the current state of the table in Pathway. This means that if an entry is deleted in Pathway, the same entry will also be deleted from the Postgres table managed by the output connector.
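
The snapshot semantics can be illustrated without Pathway or Postgres. The sketch below uses an in-memory SQLite table as a stand-in for the Postgres table, and the (key, value, diff) change format and the apply_changes helper are assumptions made for illustration, not the connector's actual protocol.

```python
import sqlite3


def apply_changes(conn: sqlite3.Connection, changes) -> None:
    """Apply (key, value, diff) changes so the table mirrors the current state.

    diff > 0 inserts or overwrites the row; diff < 0 deletes the row, but only
    if it still holds the retracted value, so the order of changes is not critical.
    """
    for key, value, diff in changes:
        if diff > 0:
            conn.execute(
                "INSERT OR REPLACE INTO snapshot (key, value) VALUES (?, ?)",
                (key, value),
            )
        else:
            conn.execute(
                "DELETE FROM snapshot WHERE key = ? AND value = ?",
                (key, value),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot (key TEXT PRIMARY KEY, value INTEGER)")
apply_changes(conn, [("a", 1, +1), ("b", 2, +1)])
# Entry "a" is deleted upstream and "b" is updated from 2 to 3.
apply_changes(conn, [("a", 1, -1), ("b", 2, -1), ("b", 3, +1)])
rows = conn.execute("SELECT key, value FROM snapshot ORDER BY key").fetchall()
```

After the second batch the table holds only the row for "b" with its latest value, matching the upstream state.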

Fixed

  • pw.PyObjectWrapper is now picklable.

v0.15.3

07 Nov 07:12

Added

  • pw.io.mongodb.write connector for writing Pathway tables to MongoDB.
  • pw.io.s3.read now supports downloading objects from an S3 bucket in parallel.

Changed

  • pw.io.fs.read performance has been improved for directories containing a large number of files.

v0.15.2

25 Oct 04:07

Added

  • pw.io.deltalake.read now supports custom S3 Delta Lakes with HTTP endpoints.
  • pw.io.deltalake.read now supports specifying both a custom endpoint and a custom region for Delta Lakes via pw.io.s3.AwsS3Settings.

Changed

  • Indices in pathway.stdlib.indexing.nearest_neighbors can now also work on numpy arrays. Previously they only accepted list[float]. Working with numpy arrays improves memory efficiency.
  • pw.io.s3.read has been optimized to minimize new object requests whenever possible.
  • It is now possible to set the size limit of cache in pw.udfs.DiskCache.
  • State persistence now uses a single backend for both metadata and stream storage. The pw.persistence.Config.simple_config method is therefore deprecated. Now you can use the pw.persistence.Config constructor with the same parameters that were previously used in simple_config.

Fixed

  • pw.io.bigquery.write connector now correctly handles pw.Json columns.

v0.15.1

04 Oct 10:06

Fixed

  • pw.temporal.session and pw.temporal.asof_join now correctly work with multiple entries with the same time.
  • Fixed an issue in pw.stdlib.indexing where filters would cause runtime errors while using HybridIndexFactory.

v0.15.0

12 Sep 07:21

Added

  • Experimental: pw.xpacks.llm.document_store.DocumentStore for processing and indexing documents.
  • pw.xpacks.llm.servers.DocumentStoreServer, used to expose a REST server for retrieving documents from pw.xpacks.llm.document_store.DocumentStore.
  • pw.stdlib.indexing.HybridIndex, used for querying multiple indices and combining their results.
  • pw.io.airbyte.read now also supports streams that only operate in full_refresh mode.

Changed

  • Running servers for answering queries is extracted from pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer into pw.xpacks.llm.servers.QARestServer and pw.xpacks.llm.servers.QASummaryRestServer.
  • BREAKING: query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now produce an empty list instead of None if no match is found.
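
Downstream code that inspected query results for None may need adjusting after the breaking change above. As a minimal sketch in plain Python, a hypothetical helper (count_matches is not part of Pathway) can treat both the old and the new "no match" value the same way:

```python
from typing import Optional


def count_matches(result: Optional[list]) -> int:
    """Count matches, accepting both None (old behavior) and [] (new behavior)."""
    return len(result or [])


old_style = count_matches(None)        # no match before 0.15.0
new_style = count_matches([])          # no match from 0.15.0 on
with_hits = count_matches(["doc-1", "doc-2"])
```

Code written this way keeps working while a pipeline is migrated across versions.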

v0.14.3

22 Aug 07:57

Fixed

  • pw.io.deltalake.read and pw.io.deltalake.write now correctly work with lakes hosted in S3 over min.io, Wasabi and Digital Ocean.

Added

  • The Pathway CLI command spawn can now execute code directly from a specified GitHub repository.
  • A new CLI command, spawn-from-env, has been added. This command runs the Pathway CLI spawn command using arguments provided in the PATHWAY_SPAWN_ARGS environment variable.
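
The idea behind spawn-from-env can be sketched as reading the environment variable and splitting it into arguments for spawn. The snippet below is an illustration only: spawn_argv_from_env is a hypothetical helper, shell-style splitting with shlex is an assumption about how the variable is parsed, and the argument string is just an example value.

```python
import os
import shlex


def spawn_argv_from_env(environ=None) -> list[str]:
    """Build a `pathway spawn ...` argument vector from PATHWAY_SPAWN_ARGS."""
    env = os.environ if environ is None else environ
    raw = env.get("PATHWAY_SPAWN_ARGS", "")
    # Assumption: arguments are split the way a POSIX shell would split them.
    return ["pathway", "spawn", *shlex.split(raw)]


argv = spawn_argv_from_env({"PATHWAY_SPAWN_ARGS": "--processes 2 python main.py"})
```

Passing an explicit dictionary instead of os.environ keeps the helper easy to test without touching the real environment.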