Releases: ilias-ant/adversarial-validation
v1.0.1
v1.0.0
The validate function had a required parameter called target. This parameter is now optional.
This means that if the trainset you supply does not have a target column, you do not need to pass the target parameter. On the other hand, if the trainset you supply does have a target column, you can (and should!) pass the target parameter, so that the column is excluded from the adversarial validation process.
At the same time, this release brings some minor upgrades to project dependencies.
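A minimal sketch of the optional-target behaviour described above. The helper select_features is hypothetical and is not part of advertion; it only illustrates how a target column, when one is given, can be left out of the columns that take part in adversarial validation.

```python
from typing import List, Optional


def select_features(columns: List[str], target: Optional[str] = None) -> List[str]:
    """Return the columns that would take part in adversarial validation.

    Hypothetical helper for illustration only, not advertion's implementation.
    """
    if target is None:
        # No target supplied: every column is treated as a feature.
        return list(columns)
    if target not in columns:
        raise ValueError(f"target column {target!r} not found in trainset")
    # Target supplied: exclude it from the features.
    return [c for c in columns if c != target]


print(select_features(["id", "age", "label"], target="label"))
print(select_features(["id", "age"]))
```

Defaulting target to None keeps existing calls that pass target working, while letting callers whose trainset has no target column omit it.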
v0.1.1
Fixed
- Wrapped the preprocessing INFO statement, printed to stdout, under the verbose functionality, as expected. This particular statement got printed even when verbose=False was passed to the validate function:
INFO: Working only with available numerical features, categorical features are not yet supported.
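The fix amounts to gating the message behind the verbose flag. A rough sketch, assuming a hypothetical helper named preprocess_notice (the package's actual internals are not shown in these notes):

```python
import io
from contextlib import redirect_stdout

INFO_MESSAGE = (
    "INFO: Working only with available numerical features, "
    "categorical features are not yet supported."
)


def preprocess_notice(verbose: bool) -> None:
    """Print the preprocessing notice only when verbose is enabled (hypothetical)."""
    if verbose:
        print(INFO_MESSAGE)


def captured(verbose: bool) -> str:
    """Run preprocess_notice and return whatever it wrote to stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        preprocess_notice(verbose)
    return buffer.getvalue()


print(repr(captured(False)))
print(repr(captured(True)))
```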
v0.1.0
The first non pre-release of the package. 🎉
v0.1.0 is still considered a beta release, as the API has not been tested extensively across many and diverse datasets. I have tested it with 3 different Kaggle datasets up to this point.
No changes to the functionality are introduced; the article https://ilias-ant.github.io/blog/adversarial-validation/ is now referenced in the README, to serve as additional contextual documentation.
v0.1.0-beta
This is considered the beta pre-release version, introducing some minor additions after a bit of personal testing on 2-3 Kaggle datasets.
Features:
An explicitly passed random_state is now propagated to the underlying classifier as well.
Documentation:
Added short README/homepage introduction on the concept of adversarial validation and where this package stands.
Also, added a homemade package logo (available in README + homepage https://advertion.readthedocs.io/en/latest/)
v0.1.0-alpha
This is considered the alpha pre-release version, introducing some backwards-incompatible changes w.r.t. the previous release.
Features:
Response of the main public object, advertion.validate, has changed from bool to dict:
import pandas as pd
from advertion import validate

train = pd.read_csv("...")
test = pd.read_csv("...")

validate(
    trainset=train,
    testset=test,
    target="label",
)
# // {
# //     "datasets_follow_same_distribution": True,
# //     "mean_roc_auc": 0.5021320833333334,
# //     "adversarial_features": ["id"],
# // }
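Since the response is now a dict, callers that previously branched on a bool need to read its keys instead. A small sketch of doing so; summarize is a hypothetical helper, and the only names taken from the package are the three keys shown in the example response above.

```python
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> str:
    """Turn a validate-style result dict into a one-line summary (hypothetical)."""
    same = result["datasets_follow_same_distribution"]
    auc = result["mean_roc_auc"]
    suspects = result["adversarial_features"]
    verdict = "same distribution" if same else "different distributions"
    features = ", ".join(suspects) if suspects else "none"
    return f"{verdict} (mean ROC AUC {auc:.3f}); adversarial features: {features}"


# Values copied from the example response in these release notes.
example = {
    "datasets_follow_same_distribution": True,
    "mean_roc_auc": 0.5021320833333334,
    "adversarial_features": ["id"],
}
print(summarize(example))
```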
Also, when smart=True is selected (which is the default), an improved identification logic for adversarial features is used, based on the Kolmogorov–Smirnov test. Setting verbose=True prints to the standard output the statistic value and the p-value of the test for every feature that is deemed adversarial.
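To give a feel for what the Kolmogorov–Smirnov statistic measures, here is a pure-Python sketch of the two-sample statistic: the largest gap between the empirical CDFs of a feature in the two datasets. This is an illustration under assumptions, not the package's code; the function name and the toy data are made up, and the p-value is left out.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy data (made up): an ID-like column with no overlap between the sets,
# and an age-like column that looks similar in both.
train_id = [1, 2, 3, 4, 5]
test_id = [6, 7, 8, 9, 10]
train_age = [23, 35, 41, 52, 60]
test_age = [25, 33, 44, 50, 61]

print(ks_statistic(train_id, test_id))
print(ks_statistic(train_age, test_age))
```

A statistic close to 1 means the feature's values barely overlap between the two sets, which is the kind of signal that marks a feature as adversarial; a small statistic means the two empirical distributions track each other closely.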
Documentation:
New page on adversarial features: https://advertion.readthedocs.io/en/latest/adversarial-features/. It is also referenced on the standard output when smart=True and verbose=True.
Tests:
Tests have been developed for the package's public interface, reaching 100% test coverage on the project.
CI/CD:
Continuous Integration, enabled through GitHub Actions, has been enriched with 2 additional linters.
Also, the test suite now runs against the following combinations:
python-version: ['3.8', '3.9', '3.10', '3.11']
os: [ubuntu-latest, macos-latest, windows-latest]
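Assuming the matrix is a full cross product with no exclusions (the workflow file itself is not reproduced here), the two lists above expand to one job per pair. A quick sketch that enumerates them:

```python
from itertools import product

python_versions = ["3.8", "3.9", "3.10", "3.11"]
operating_systems = ["ubuntu-latest", "macos-latest", "windows-latest"]

# One CI job per (python-version, os) pair, assuming no excluded combinations.
jobs = list(product(python_versions, operating_systems))

for version, os_name in jobs:
    print(f"python {version} on {os_name}")

print(len(jobs))
```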
Last but not least, codecov has been introduced.
For more details, see: .github/workflows/ci.yml
v0.1.0-alpha2
A follow-up, pre-alpha release that introduces continuous documentation capabilities to the project, through MkDocs + readthedocs. Material for MkDocs has been utilized as the theme.
URL: https://advertion.readthedocs.io/en/latest/
No change to the functionality since the inaugural pre-release v0.1.0-alpha1.
v0.1.0-alpha1
This inaugural pre-alpha release introduces the core functionality of adversarial validation, exposed to the end user through the following method:
import pandas as pd
from advertion import validate

train = pd.read_csv("...")  # let's say target variable is "label"
test = pd.read_csv("...")

are_similar = validate(
    train=train,
    test=test,
    target="label",
)
# are_similar = True: train and test are following the same underlying distribution.
# are_similar = False: test dataset exhibits a different underlying distribution than train dataset.
At the same time:
- passing smart=True employs a pruning strategy of design matrix features based on feature importance; this helps remove features with strongly identifiable properties such as IDs, timestamps etc.
- passing an n_splits value controls the number of cross-validation folds that take place internally.
- passing verbose=True prints to the standard output informative messages on the adversarial validation strategy.
- passing a random_state value ensures reproducible output across multiple function calls.
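The idea behind adversarial validation can be illustrated without the package. The sketch below is not how advertion works internally: the package trains a classifier with cross-validation, whereas this swaps in a much simpler stand-in, the ROC AUC obtained when a single raw feature is used to tell test rows from train rows. The function name and the synthetic data are assumptions made for illustration.

```python
import random
from typing import Sequence


def separation_auc(train_values: Sequence[float], test_values: Sequence[float]) -> float:
    """ROC AUC when the raw feature value is used to tell test rows from train rows.

    Computed as the fraction of (train, test) pairs in which the test value is
    larger, counting ties as half.
    """
    wins = 0.0
    for t in test_values:
        for r in train_values:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(train_values) * len(test_values))


rng = random.Random(0)  # fixed seed, similar in spirit to passing random_state

# An ID-like column: train rows were numbered before test rows.
train_id = list(range(0, 200))
test_id = list(range(200, 400))

# A column drawn from the same distribution in both sets.
train_x = [rng.gauss(0.0, 1.0) for _ in range(200)]
test_x = [rng.gauss(0.0, 1.0) for _ in range(200)]

print(separation_auc(train_id, test_id))
print(separation_auc(train_x, test_x))
```

An AUC near 0.5 means the feature gives no help in distinguishing the two sets, while an AUC near 1 (or 0) means it separates them almost perfectly. ID-like and timestamp-like columns tend to fall in the second group, which is why pruning them matters.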