
master merge for 0.4.8 #1200

Merged
merged 77 commits into from
Apr 9, 2024

rudolfix
Collaborator

rudolfix commented Apr 8, 2024

Description

master merge for 0.4.8

dat-a-man and others added 30 commits March 13, 2024 05:03
* add docs preprocessor script poc

* fix api reference sidebar

* fix deploy scripts

* remove python dependency from running docs locally

* fix edit link for process docs

* pin databind.json python package

* pin databind core

* use concurrently to run watcher for local dev

* extend preprocess script to insert tuba links

* remove tuba links from md files

* update tuba markers

* update script to insert snippets

* update package.json

* update preprocess script

* remove snippets code from markdown files and update markers

* update examples

* small change to contributing md

* fix preprocess script to use new snippets marker

* update preprocess script and npm run scripts

* fix custom destination example to match new format
* places incomplete columns at the end when inferring from data

* stores per table hints in resources, allows to compute them via item metas + tests

* fixes slots class and test
* properly recognizes new and modified schemas, fixes several places where version was bumped incorrectly

* fixes saving and importing schemas in schema storage, adds missing tests

* fixes lacking write disposition when creating resource

* skips saving schemas when it was not modified in extract and normalize

* adds tables only tests to drop command

* splits destination exceptions, fixes new schemas in tests

* fixes test
* Added Scrapy docs

* Updated

* Updated

* Updated

* small edits

---------

Co-authored-by: AstrakhantsevaAA <astra92293@gmail.com>
* start fixing blog snippets

* fix parsing and linting errors

* fix toml snippets
* removes all dlt dependencies from logger

* uses dataclass_transform to generate init methods for configspec, warnings on inconsistent settings

* changes all configspecs to conform with new init methods, drops special init for credentials

* fixes setting native value None
* Documentation update for reading incremental parameters from configuration files

* Updated

* Updated

* Update

* Improved wordings and added a bit of explanation.

* small edits

---------

Co-authored-by: AstrakhantsevaAA <astra92293@gmail.com>
* Updated schema docs

* small edits

---------

Co-authored-by: AstrakhantsevaAA <astra92293@gmail.com>
* Add RESTClient and tests

* Add PyJWT

* Add initial version of `rest_client.paginate()`

* Export `rest_client.paginate` to `helpers.requests` module

* Fix the typing error

* Use dlt.common.json

* Add dependency checks for PyJWT and cryptography in auth module

* Remove unused imports and check_connection function from rest_client utils

* Refactor pagination assertion into a standalone function

* Move `paginate` function test to new file `test_requests_paginate.py`

* Remove PyJWT from deps

* Remove explicit initializers and meta fields from configspec classes

* Implement lazy loading for jwt and cryptography in auth

* Set username default to None

* Add PyJWT to dev dependencies
burnash and others added 16 commits April 3, 2024 14:33
* feat(bigquery): add streaming inserts support

* move jobs into job_impl.py
add streaming arg into bigquery_adapter
support parquet format

* complete merge

* erase excess imports

* improve tests

* move tests into load

* add nested data test

* query_job fix

* add docs example

* still allows bigquery 2.x client

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* Add yaml representer for pendulum datetime

* Fix linting issues
* add chunk separation handling for select_union writer type

* include synapse in insert job client tests and make them pass

* remove obsolete test

* set max_rows_per_insert to prevent error on larger queries in synapse

* remove pipeline dependency

* make imports conditional on destination type

* include synapse in insert job client tests and make them pass

* include mssql in insert job client tests and make them pass

* make psycopg2 import conditional on destination type

---------

Co-authored-by: Jorrit Sandbrink <sandbj01@heiway.net>
* remove matrix stuff and point destination tests to essential

* mark one test as essential and switch destination tests to run essential tests

* add prefixes to workflow files

* unify naming a bit more

* update snippets

* increase time difference value in gdrive tests

* allow workflows to run on forks if so labelled

* remove secrets from local destinations tests

* guard secrets leaking on docs tests

* update job output

* test label on bigquery

* force rerun

* put secrets back in local destinations

* add new run setup to all destinations

* mark a couple of tests as essential

* add nightly schedule for full destination test run

* fix tests condition

* fix type

* add a bunch of more essential markers

* run full tests on mssql for all branches

* fix newlines in yaml file

* register essential marker and remove some essential tests

* exclude "athena-parquet-staging-iceberg" from regular athena tests

* remove regular athena tests from iceberg tests

---------

Co-authored-by: rudolfix <rudolfix@rudolfix.org>
* bumps for prerelease 0.4.8a1

* requires password and database in motherduck credentials

* identifies data writer by both file format and source item format, adds csv writer for arrow and object(wip)

* adds postgres csv writer via COPY

* improves arrow and parquet tests, adds arrow normalization edge cases

* refactors extractors in extract, disables schema caches when processing multiple arrows

* refactors item normalizers, adds arrow normalization, improves logging

* removes internal file formats from loader file formats, renames, tests improvements

* adds simple csv and postgres docs

* closes writers on exceptions, passes metrics on exceptions, fixes some edge cases with empty arrow files

* fixes empty tables writer tests and bugs

* fixes closing writers when exception during flush, missing tzdata on windows handling

* installs tzdata on windows ci

* adds csv to docs index

* fixes athena sql job client tests setup

* adjusts for timezone for the preferred precision, all other precision use timestamp w/o tz

* generates create table statements for synapse outside of a job

* fixes athena table undefined detection

* generates all timestamps with timezones in parquet. tests workarounds in duckdb

* fixes quoting in regular csv writer and force nulls in postgres copy job

* finalizes the docs

* renames jobs in tests so it is possible to select them as required
* feat: parameterize pipeline class in the primary factory method

* chore: use generic typing

* chore: remove no args overload

* uses TypeVar with default

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
Moved all the blog images to Google Cloud Storage.
* Add docker-compose.yml for Dremio

* bootstrap dremio in docker-compose.yml

* refactor dremio bootstrap

* Add dremio client dependency

* test adbc from separate container

* Add pydremio db api implementation

* Further development

* Initial dremio test

* Add description and rowcount to pydremio

* Initial INSERT working

* Passing test

* Clean up test

* Fixup some more issues

* Inject data source configuration

* Inject data source configuration

* Add flatten logic

* pyproject.toml

* Fix pyproject.toml

* Fix Dockerfile

* Fix a couple of problems

* Tidy up

* Add dremio.md

* Fix supported file formats in capabilities

* Add code to handle partition and localsort

* Add some tests around PARTITION and LOCALSORT

* Add some docs for partitions

* Update poetry.lock and fix lint errors

* Use DOUBLE instead of FLOAT

* Fix a few more tests

* Override CREATE TEMP TABLE queries as Dremio does not support TEMP tables

* Credit the original code in pydremio and reproduce Apache2 license.

* poetry.lock

* Refactor sqlalchemy URL import

* Fix stage loading test

* Fix stage loading test

* Fix lint issues

* Ensure all standard tests are run and start fixing failures

* Fix COPY INTO command

* Escape "value"

* More fixes

* More fixes

* Only two failing tests left

* 1 Test failing

* Remove the flatten functionality

* Fix lint

* remove data_source config option

* Add some verbiage around the lack of CREATE SCHEMA

* Some fixes and add Dremio to staging destination configs

* Remove staging_credentials from DremioLoadJob

* Remove staging_credentials from DremioLoadJob

* update lockfile post merge

* add dremio test workflow

* fixing dremio tests

* fix docs code section types

* fix post devel merge linting errors

* ignore callarg for dremio config test

* Fix test_dremio_client.py

* make minio setup sleep a bit

* fix remaining test

* small refactor of sql job
small cleanups

* remove unneeded statement

* mark dremio as experimental

* reset active destinations

* revert client change and update test

* fix default order by

* merge fixes, dremio factory test

* configures dremio pipeline tests properly

* upgrades dremio ci workflow

* fixes local destinations ci workflow

---------

Co-authored-by: Firman, Max <max.firman@troweprice.com>
Co-authored-by: Dave <shrps@posteo.net>
Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* Replaced broken links

* Updated links
* Updated schema docs for naming convention

* Update schema.md

Adding type toml to code snippet.

---------

Co-authored-by: Zaeem Athar <zaeemathar94@gmail.com>
* adds csv reader tests

* does not translate new lines in text data writers

* fixes more tests
rudolfix added the "ci full" label (run the full load tests on PR) Apr 8, 2024

netlify bot commented Apr 8, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 7939749
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66152873f3ace9000886e4e4
😎 Deploy Preview https://deploy-preview-1200--dlt-hub-docs.netlify.app

rudolfix removed the "ci full" label (run the full load tests on PR) Apr 9, 2024
rudolfix merged commit c99d612 into master Apr 9, 2024
45 of 46 checks passed