XGboost Code Tested

I found some nice code for testing XGboost classification. It was a bit out of date, but with some judicious tweaking I managed to bring it up to date and work nicely.

#!/usr/bin/env python
# cd /home/flatus/Documents/solar_harvester/swarm_predicter && python3 xgboost_test_07.py

# Data:
# https://www.kaggle.com/datasets/liujiaqi/hr-comma-sepcsv?select=HR_comma_sep.csv

import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings( "ignore", module = "matplotlib\..*" )
warnings.filterwarnings( "ignore", module = "xgboost\..*" )

import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from termcolor import colored

print(colored("Start", 'blue'))
# print(colored('The Accuracy on Test Set is: ', 'blue')

df = pd.read_csv('/home/flatus/Documents/solar_harvester/swarm_predicter/HR_comma_sep.csv')
print(df.head())

#data process
X = df[['satisfaction_level', 'last_evaluation', 'number_project', 'average_montly_hours', 'time_spend_company', 'Work_accident', 'promotion_last_5years', 'sales', 'salary']]
y = df['left']

# Extract text features
cats = X.select_dtypes(exclude=np.number).columns.tolist()

# Convert to Pandas category
for col in cats:
   X[col] = X[col].astype('category')

# Splitting the dataset into the Training set and Validation set
Xt, Xv, yt, yv = train_test_split(X, y, test_size = 0.25, random_state = 0)
# dt = xgb.DMatrix(Xt.as_matrix(),label=yt.as_matrix())
# dv = xgb.DMatrix(Xv.as_matrix(),label=yv.as_matrix())

dt = xgb.DMatrix(Xt, yt, enable_categorical=True)
dv = xgb.DMatrix(Xv, yv, enable_categorical=True)


#Build the model
params = {
    "eta": 0.2,
    "max_depth": 4,
    "objective": "binary:logistic",
    "silent": 1,
    "base_score": np.mean(yt),
    'n_estimators': 1000,
    "eval_metric": "logloss"
}
model = xgb.train(params, dt, 3000, [(dt, "train"),(dv, "valid")], verbose_eval=200)

#Prediction on validation set
y_pred = model.predict(dv)

# Making the Confusion Matrix
cm = confusion_matrix(yv, (y_pred>0.5))
print(colored('The Confusion Matrix is: ', 'red'),'\n', cm)

# Calculate the accuracy on test set
predict_accuracy_on_test_set = (cm[0,0] + cm[1,1])/(cm[0,0] + cm[1,1]+cm[1,0] + cm[0,1])
print(colored('The Accuracy on Test Set is: ', 'blue'), colored(predict_accuracy_on_test_set, 'blue'))


# Based on the model we made, we can predict a employee whether he may leave or not after inputing his informaion:
print("")
print("Now make a single prediction: ")

satisfaction_level = 0.38
last_evaluation = 0.53
number_project = 2
average_montly_hours = 137
time_spend_company = 3
Work_accident = 0
left = 1   # Dont actually need this one as this is what we're trying to predict.
promotion_last_5years = 0
sales = "sales"
salary = "low"

# Make prediction:

# Use a dictionary type:
dict_ = {'satisfaction_level': satisfaction_level, 'last_evaluation': last_evaluation, 'number_project': number_project, 'average_montly_hours': average_montly_hours, 'time_spend_company': time_spend_company, 'Work_accident': Work_accident, 'left': left, 'promotion_last_5years': promotion_last_5years, 'sales': sales, 'salary': salary}
my_HR_test = pd.DataFrame([dict_])
                 
# Get rid of the 'left' column:
my_HR_test = my_HR_test.drop('left', axis=1)
 
# Extract text features
cats = my_HR_test.select_dtypes(exclude=np.number).columns.tolist()

# Convert to Pandas category
for col in cats:
   my_HR_test[col] = my_HR_test[col].astype('category')

my_HR_test = xgb.DMatrix(my_HR_test, enable_categorical=True)

# Make prediction
new_prediction = model.predict(my_HR_test)

print("new prediction: ",new_prediction)

if(new_prediction > 0.5):
    print(colored("This employee will leave!", 'blue'))
else:
    print(colored("This employee will not leave!", 'green'))

The key to success was to bring the 'One Shot' prediction data into Pandas as a dictionary. Not seen this done before, but it makes the code nice and clear and easy to see what needs to happen in deployment.

I cant test it much on the bee swarming data yet as I've not actually seen any swarms yet, so there's no boolean 'Trues' for 'swarm seen in apiary'.

Catch hive in position

First swarm of the season seen in apiary

Discussions

Become a Hackaday.io Member