
Example Notebooks

This section provides a series of examples to help you get started with Splink. You can find the underlying notebooks in the demos folder of the Splink repo.

You can try these demos live in your web browser using the following link:

Binder

:simple-duckdb: DuckDB examples

Entity type: Persons

Deduplicating 50,000 records of realistic data based on historical persons

Using the link_only setting to link, but not dedupe, two datasets

Real time record linkage

Accuracy analysis and ROC charts using a ground truth (cluster) column

Estimating m probabilities from pairwise labels

Deduplicating 50,000 records with Deterministic Rules

Deduplicating the febrl3 dataset. Note this dataset comes from febrl, as referenced in A.2 here and replicated here.

Linking the febrl4 datasets. As above, these datasets are from febrl, replicated here.
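
The `link_only` setting mentioned in the list above is a value of the `link_type` key in a Splink settings dictionary. The following is a minimal sketch of what such a settings fragment might look like, not a complete model definition: the `surname` column in the blocking rule is a hypothetical name chosen for illustration, and comparisons are omitted entirely. See the linked notebook for the full, working configuration.

```python
# Values Splink accepts for the link_type setting.
VALID_LINK_TYPES = {"dedupe_only", "link_only", "link_and_dedupe"}

# Illustrative settings fragment (assumption: a 'surname' column exists
# in both input datasets; a real model also needs a "comparisons" list).
settings = {
    # link_only: compare records across the input datasets,
    # but never compare records within the same dataset.
    "link_type": "link_only",
    # Hypothetical blocking rule; l and r refer to the two records in a pair.
    "blocking_rules_to_generate_predictions": ["l.surname = r.surname"],
}

# Simple sanity check on the chosen link type.
assert settings["link_type"] in VALID_LINK_TYPES
```

By contrast, `dedupe_only` looks for duplicates within a single dataset, and `link_and_dedupe` does both.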

Entity type: Financial transactions

Linking financial transactions

:simple-apachespark: PySpark examples

Deduplication of a small dataset using PySpark. Entity type is persons.

:simple-amazonaws: Athena examples

Deduplicating 50,000 records of realistic data based on historical persons

:simple-sqlite: SQLite examples

Deduplicating 50,000 records of realistic data based on historical persons