An end-to-end anime recommendation system based on data scraped from myanimelist.net
Updated Mar 27, 2022 - Python
E-commerce pipelines on GCP. Streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau. Batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau.
Playground for Apache Beam and Scio experiments, driven by real-world use cases.
ETL pipeline on GCP
Export Dialogflow conversation logs to BigQuery, masking PII with the DLP API
Trigger a Dataflow job when a file is uploaded to Cloud Storage using a Cloud Function
A data pipeline to ingest, process, and store storm event datasets so they can be accessed through different means.
GCP Dataflow pipeline with BigQuery as source and side input
This repo is dedicated to GCP data engineering concepts: Bigtable, BigQuery, Dataflow, Pub/Sub, Dataproc (Spark on GCP), Apache Beam, and Apache Airflow
Boilerplate for orchestrating batch-processing scenarios: Apache Airflow with a realistic product analytics use case
GCP Dataflow pipeline with MapReduce in Python
Apache Beam sandbox with Dataflow for 10+ use cases
Big Data ETL Pipeline for ASL-to-Text (Computer Vision), using Apache Beam on GCP Dataflow
This repo demonstrates a RAG data processing pipeline using Dataflow Flex Templates
Black Friday, the biggest shopping day of the year, presents a unique opportunity for retailers like Walmart to boost sales, attract new customers, and clear inventory. Managing the surge in transaction volumes, understanding customer preferences, and optimizing inventory in real time are critical challenges that require sophisticated data solutions.
Sample projects to explore various Google Cloud service offerings and architecture approaches
A comprehensive cricket statistics pipeline built using Google Cloud services. The pipeline involves retrieving data via the Cricbuzz API, and leveraging various cloud technologies such as Cloud Composer, Cloud Storage, Cloud Functions, Dataflow, and more to process and manage the data efficiently.
This project illustrates real-time data processing and analytics. It uses Apache Kafka to capture and stream real-time data, GCP Cloud Functions to process it, GCP Pub/Sub for real-time notifications, and GCP Looker Studio for real-time data visualization.