
Ingest files from a directory #2622

Closed
cgardens opened this issue Mar 26, 2021 · 10 comments
Assignees
Labels
area/connectors Connector related issues type/enhancement New feature or request

Comments

@cgardens
Contributor

cgardens commented Mar 26, 2021

Tell us about the problem you're trying to solve

  • I have a directory on S3. New files are added to that directory on some cadence. Whenever a new file is added to the directory I want to sync that data to a destination. The names of the files are monotonically increasing and could be used as cursor fields.
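A minimal sketch of the cursor idea above, assuming lexicographically increasing file names (the `new_files` helper is hypothetical, not connector code):

```python
def new_files(file_names, cursor=None):
    """Return files whose names sort after the cursor, oldest first,
    plus the updated cursor value to persist in state."""
    fresh = sorted(n for n in file_names if cursor is None or n > cursor)
    new_cursor = fresh[-1] if fresh else cursor
    return fresh, new_cursor
```

On each sync, the connector would list the S3 prefix, pass the keys and the saved cursor to a helper like this, emit records only for the returned files, and checkpoint the new cursor.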

Describe the solution you’d like

  • This is either a new feature in the file source or a new connector altogether.

Heads up @sherifnada.

Requested by SG. Tagging @roshan.

Issue is synchronized with this Asana task by Unito

@cgardens cgardens added type/enhancement New feature or request area/connectors Connector related issues labels Mar 26, 2021
@sherifnada
Contributor

@roshan would all the files have the same schema/contribute to the same stream?

@roshan
Contributor

roshan commented Mar 26, 2021 via email

@roshan
Contributor

roshan commented Mar 26, 2021

If you would like, it is possible to arrange it so that each prefix represents a stream. I am happy to work with whatever method you think is most general.

@sherifnada
Contributor

@jrhizor
Contributor

jrhizor commented Mar 31, 2021

  • file-name history in state, or a timestamp-based "cursor" (a timestamp would work for cloud storage, but won't work for local disk if you're doing incremental updates)
  • user-defined mapping of regex -> JSON Schema for each stream? Default to object (no normalization)
  • open questions around how this fits with the existing file connector (should it be in the same connector or a separate one?)
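The second bullet could be sketched as a user-configured regex-to-stream mapping; the pattern names and `stream_for_key` helper here are hypothetical illustrations, not a proposed API:

```python
import re

# Hypothetical user-supplied config: each stream is defined by a key pattern.
STREAM_PATTERNS = {
    "orders": re.compile(r"^orders/.*\.csv$"),
    "events": re.compile(r"^events/.*\.jsonl$"),
}

def stream_for_key(key, patterns=STREAM_PATTERNS):
    """Return the first stream whose regex matches the object key, else None."""
    for stream, pattern in patterns.items():
        if pattern.match(key):
            return stream
    return None  # unmatched keys could be skipped or routed to a default stream
```

Whether unmatched keys error out, get skipped, or fall into a catch-all "object" stream (no normalization) is exactly the open question raised above.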

@harshithmullapudi
Contributor

Wouldn't it be good to solve this independently for sources like S3 and Google Cloud Storage, as the s3-csv tap does?

@harshithmullapudi
Contributor

We have a use case that requires solving this for S3. We're thinking of bringing tap-s3-csv to Airbyte; any thoughts here?

@sherifnada sherifnada added this to the Core - 2021-07-07 milestone Jun 30, 2021
@Phlair
Contributor

Phlair commented Jul 5, 2021

@cgardens
Contributor Author

@Phlair can we close this?

@Phlair
Contributor

Phlair commented Aug 23, 2021

@Phlair Phlair closed this as completed Aug 23, 2021

6 participants