Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Duplicate rows in int_quickbooks__invoice_join.sql model when more than one QuickBooks source #120

Closed
2 of 4 tasks
MatteyRitch opened this issue Mar 26, 2024 · 8 comments · Fixed by #123
Closed
2 of 4 tasks
Assignees
Labels
error:unforced priority:p3 Affects many users; can wait status:in_progress Currently being worked on type:bug Something is broken or incorrect update_type:models Primary focus requires model updates

Comments

@MatteyRitch
Copy link

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

Pulling from the int_quickbooks__invoice_join model can return duplicate payments (different payment_ids based on the source) for the same invoice.

The CTEs aliased as "invoice_est" and "invoice_pay" bring in the invoice_id and payment_id and join only on invoice_id.
If the same invoice_id exists in multiple quickbooks sources, there will be a cross join that causes duplicate entries in these CTE results which is then propagated when they are used downstream.

Relevant error log or model output

No response

Expected behavior

Only have payments that were associated with a given invoice returned.

dbt Project configurations

vars: 
  quickbooks_union_schemas: ['quickbooks','quickbooks_source2', 'quickbooks_source3']

models:
  quickbooks:
      +schema: null
  quickbooks_source:
      +schema: null

Package versions

  - package: fivetran/quickbooks
    version: [">=0.12.0", "<0.13.0"] 

What database are you using dbt with?

postgres

dbt Version

dbt v1.7.0-latest

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.
@MatteyRitch MatteyRitch added the type:bug Something is broken or incorrect label Mar 26, 2024
@fivetran-reneeli
Copy link
Contributor

Thanks @MatteyRitch for opening this and including all the details! I'm seeing what you're talking about. I think these lines need to include a join on the source_relation:

left join payment_lines_payment
on payments.payment_id = payment_lines_payment.payment_id

left join invoice_est
on invoices.invoice_id = invoice_est.invoice_id
left join invoice_pay
on invoices.invoice_id = invoice_pay.invoice_id

In addition to the ones you mentioned:

on invoices.invoice_id = invoice_linked.invoice_id

on invoices.invoice_id = invoice_linked.invoice_id

I will circle this back with the team and keep you updated.

@MatteyRitch
Copy link
Author

MatteyRitch commented Mar 27, 2024

My pleasure & thank you for the quick response @fivetran-reneeli! Yes I agree with what you wrote, though the main branch already had the source_relation joins on the estimates , payments, and payment_lines_payment joins in the final CTE.

left join estimates
on invoice_link.estimate_id = estimates.estimate_id
and invoice_link.source_relation = estimates.source_relation

left join payments
on invoice_link.payment_id = payments.payment_id
and invoice_link.source_relation = payments.source_relation

left join payment_lines_payment
on payments.payment_id = payment_lines_payment.payment_id
and invoice_link.invoice_id = payment_lines_payment.invoice_id
and invoice_link.source_relation = payment_lines_payment.source_relation

Here is where / what I think needs to be modified:

invoice_est as (
select
invoices.invoice_id,
invoice_linked.estimate_id
from invoices
left join invoice_linked
on invoices.invoice_id = invoice_linked.invoice_id
where invoice_linked.estimate_id is not null
),
invoice_pay as (
select
invoices.invoice_id,
invoice_linked.payment_id
from invoices
left join invoice_linked
on invoices.invoice_id = invoice_linked.invoice_id
where invoice_linked.payment_id is not null
),
invoice_link as (
select
invoices.*,
invoice_est.estimate_id,
invoice_pay.payment_id
from invoices
left join invoice_est
on invoices.invoice_id = invoice_est.invoice_id
left join invoice_pay
on invoices.invoice_id = invoice_pay.invoice_id
),

Changed above to this:

invoice_est as (

    select
        invoices.invoice_id,
        invoice_linked.estimate_id,
        invoices.source_relation
    from invoices

    left join invoice_linked
        on invoices.invoice_id = invoice_linked.invoice_id
        and invoices.source_relation = invoice_linked.source_relation

    where invoice_linked.estimate_id is not null
),

invoice_pay as (

    select
        invoices.invoice_id,
        invoice_linked.payment_id,
        invoices.source_relation
    from invoices

    left join invoice_linked
        on invoices.invoice_id = invoice_linked.invoice_id
        and invoices.source_relation = invoice_linked.source_relation

    where invoice_linked.payment_id is not null
),

invoice_link as (

    select
        invoices.*,
        invoice_est.estimate_id,
        invoice_pay.payment_id
    from invoices

    left join invoice_est
        on invoices.invoice_id = invoice_est.invoice_id
        and invoices.source_relation = invoice_est.source_relation

    left join invoice_pay
        on invoices.invoice_id = invoice_pay.invoice_id
        and invoices.source_relation = invoice_pay.source_relation
),

Hopefully that is clear & I'll look out for your reply.

@MatteyRitch
Copy link
Author

I thought I included this in the above post.

The additions to the code are the invoices.source_relation column and the and invoices.source_relation = invoice_linked.source_relation criteria for the join in the invoice_est and invoice_pay CTEs and the and invoices.source_relation = invoice_est.source_relation and and invoices.source_relation = invoice_pay.source_relation when joining those 2 CTEs to the invoice_link.

@fivetran-reneeli
Copy link
Contributor

Thanks for following up and yup I agree with you there!

We'll also need to add joins on source_relation in all the invoice ID joines in the intermediate CTEs, leading up to the final CTE, to prevent fan outs in the event of same invoice id's across different sources. I will check for missing joins in the other models as well.

Again, appreciate the details you left; I will file this into our upcoming sprints!

@MatteyRitch
Copy link
Author

MatteyRitch commented Apr 2, 2024

Hi @fivetran-reneeli - sorry I was out on Monday. I just opened the PR . I wasn't aware of the need to update the changelog and versions prior to the first commit so I had to tack a 2nd one on - sorry.

One note, I couldn't find where the source_relation join / fields were needed in this model. They seem to already be there:

on bill_payments.bill_payment_id = bill_payment_lines.bill_payment_id
and bill_payments.source_relation = bill_payment_lines.source_relation
and bill_link.bill_id = bill_payment_lines.bill_id

@fivetran-reneeli
Copy link
Contributor

Thanks for the PR @MatteyRitch! No worries, and we will also make some additional updates on top of your PR to make sure everything is good to go before merging it in. We aim to have this merged in the coming weeks!

@fivetran-joemarkiewicz fivetran-joemarkiewicz added priority:p3 Affects many users; can wait status:accepted Scoped and accepted into queue update_type:models Primary focus requires model updates labels Apr 4, 2024
@fivetran-joemarkiewicz fivetran-joemarkiewicz added status:in_progress Currently being worked on and removed status:accepted Scoped and accepted into queue labels Apr 11, 2024
fivetran-reneeli added a commit that referenced this issue Apr 24, 2024
fixes #120 - missing source_relation joins
@fivetran-reneeli
Copy link
Contributor

Hi @MatteyRitch , just finalized our PR and added some final touches. Could you confirm this works on your data?

packages.yml


- git: https://github.com/fivetran/dbt_quickbooks
   revision: bugfix/source_relations
   warn-unpinned:  false

@fivetran-reneeli fivetran-reneeli linked a pull request Apr 24, 2024 that will close this issue
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error:unforced priority:p3 Affects many users; can wait status:in_progress Currently being worked on type:bug Something is broken or incorrect update_type:models Primary focus requires model updates
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants