Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EmailEngine with Data Vault for SQL access #153

Open
arunqa opened this issue Sep 5, 2023 · 1 comment
Open

EmailEngine with Data Vault for SQL access #153

arunqa opened this issue Sep 5, 2023 · 1 comment
Assignees

Comments

@arunqa
Copy link

arunqa commented Sep 5, 2023

Now, we need to do the following:

  1. Use SQLa to create the Data Vault infrastructure as I’ve shown below – we will build views in SQL to get access to the email JSON data
  2. Create nodeJS or other EE SDK-based app which reads all emails via IMAP and stores the content in the DataVault
  3. Setup webhooks in EE so that incoming emails push to the Data Vault.

Setting up the Data Vault tables:

  • Hub: This will store unique email identifiers.
  • Satellite: This will store descriptive data related to each email.
CREATE TABLE hub_email (
    email_hash BYTEA PRIMARY KEY,       -- A hashed value to uniquely identify each email
    load_datetime TIMESTAMP NOT NULL,   -- Timestamp when the record was loaded
    record_source TEXT NOT NULL         -- Source of the record
);
CREATE TABLE sat_email_message(
    email_hash BYTEA NOT NULL,         -- The hash to link back to the hub
    load_datetime TIMESTAMP NOT NULL,  -- Timestamp when the record was loaded
    end_datetime TIMESTAMP,            -- Timestamp when the record is superseded by a new version. Null for the current version
    email_data JSONB NOT NULL,         -- Email data in JSONB format (EE generates a JSON of the email)
    record_source TEXT NOT NULL,       -- Source of the record
    PRIMARY KEY (email_hash, load_datetime),
    FOREIGN KEY (email_hash) REFERENCES hub_email(email_hash)
);
CREATE TABLE sat_email_attachment(…); 

-- Index for the hub

CREATE INDEX idx_hub_email_hash ON hub_email(email_hash);

-- Indexes for the satellite

CREATE INDEX idx_sat_email_detail_hash ON sat_email_detail(email_hash);
CREATE INDEX idx_sat_email_detail_load_datetime ON sat_email_detail(load_datetime);
CREATE INDEX idx_sat_email_detail_data_gin ON sat_email_detail USING GIN(email_data);

Create as many views as we need for accessing the emails in properly structured format.
Talk with Raphael and team for how they do FHIR mapping for complex JSON from unstructured text.

Example for how to insert data – this should be a stored procedure

Assuming we have variables:

  • $1 as the email JSONB, $2 as the record source, and $3 as the current timestamp

-- First, compute the email_hash

DO $$DECLARE email_hash_val BYTEA;
BEGIN
    email_hash_val := digest($1::TEXT, 'sha256');
    
    -- Try to insert into hub
    INSERT INTO hub_email(email_hash, load_datetime, record_source)
    VALUES (email_hash_val, $3, $2)
    ON CONFLICT (email_hash) DO NOTHING; -- prevent duplicates

    -- Insert into satellite
    INSERT INTO sat_email_detail(email_hash, load_datetime, email_data, record_source)
    VALUES (email_hash_val, $3, $1, $2);
END$$;

@alan-francis alan-francis self-assigned this Sep 5, 2023
@alan-francis
Copy link
Collaborator

@shah : In the current structure of SQLaide data vaults, we are following the below format:

const syntheticHub1 = dvg.hubTable("synthethic1", {
    hub_synthethic1_id: primaryKey(),
    h1_bkey_int: integer(),
    h1_bkey_text: text(),
    h1_bkey_date: date(),
    h1_bkey_int_nullable: integerNullable(),
    h1_bkey_text_nullable: textNullable(),
    ...dvg.housekeeping.columns,
  });

Here we are using the table_name_id as primary key and other required columns with the housekeeping as well. As per the reference given above, we need to create hub table with a different type of primary key with a different name (email_hash BYTEA PRIMARY KEY). Can you confirm?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants