Skip to content

Commit

Permalink
Merge pull request #36 from Kostee/main
Browse files Browse the repository at this point in the history
Kompletne pd2 #17
  • Loading branch information
kozaka93 authored Apr 7, 2021
2 parents 3fbdd90 + d56fa4f commit 7793934
Show file tree
Hide file tree
Showing 15 changed files with 816,017 additions and 0 deletions.
260,531 changes: 260,531 additions & 0 deletions PraceDomowe/PracaDomowa2/Kosterna_Jakub/.ipynb_checkpoints/Kosterna_pd2-checkpoint.ipynb

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Jakub Kosterna - przygotowanie modelu\n",
"## Warsztaty Badawcze 2021: XAI-1\n",
"### Skrypt dodatkowy do pracy domowej 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Co ja tu robię"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Czyż druga praca domowa nie jest wspaniałą okazją, aby jeszcze lepiej poznać **<font color=\"red\">Red Wine Quality</font>**?\n",
"\n",
"Metodę **LIME** będę badał na najlepszym znalezionym przez mój zespół modelu *XGBoost* z tuningiem wybranych hyperparametrów."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Potrzebne pakiety i ziarno generatora"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import pickle\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import preprocessing\n",
"\n",
"import xgboost as xgb\n",
"\n",
"from sklearn.model_selection import RandomizedSearchCV\n",
"\n",
"np.random.seed = 42"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Przygotowanie zbiorów treningowego i testowego"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df_wines = pd.read_csv('winequality-red.csv')\n",
"df_wines[\"is_good\"] = df_wines.apply(lambda row: 1 if row.quality > 5 else 0, axis = 1)\n",
"X = df_wines.drop([\"is_good\", 'quality'], axis=1)\n",
"y = df_wines[\"is_good\"]\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,random_state = 42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model i tuning hyperparametrów"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'min_child_weight': 1,\n",
" 'max_depth': 5,\n",
" 'learning_rate': 0.15,\n",
" 'gamma': 0.4,\n",
" 'colsample_bytree': 0.5}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xgb_param_grid = {\n",
" \"learning_rate\": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30] ,\n",
" \"max_depth\": [ 3, 4, 5, 6, 8, 10, 12, 15],\n",
" \"min_child_weight\": [ 1, 3, 5, 7],\n",
" \"gamma\": [ 0.0, 0.1, 0.2 , 0.3, 0.4],\n",
" \"colsample_bytree\": [ 0.3, 0.4, 0.5 , 0.7]\n",
"}\n",
"\n",
"xgb = xgb.XGBClassifier(objective = \"binary:logistic\", eval_metric = \"logloss\", use_label_encoder = False, seed = 42)\n",
"\n",
"randomized_mse = RandomizedSearchCV(param_distributions = xgb_param_grid,\n",
" estimator = xgb,\n",
" cv = 4,\n",
" n_iter = 1000)\n",
"\n",
"randomized_mse.fit(X_train, y_train)\n",
"randomized_mse.best_params_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Zapis modelu"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"pickle.dump(randomized_mse, open(\"xgb\", 'wb'))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Loading

0 comments on commit 7793934

Please sign in to comment.