HACKER EARTH CHALLENGE: “ON THE PLAGUE TRAIL”

Gathering Domain Knowledge

Factors Affecting Environmental Transmission of Pathogens

Persistence in the environment

pH as a Factor
Temperature as a Factor

PROBLEM STATEMENT

Pathogens

DATA ACQUISITION

Real-World/Business Objectives and Constraints

Objectives:

Constraints:

Data Overview

About the data:

Mapping the real world problem to a Machine Learning Problem

EVALUATION CRITERIA

Leaderboard score metric

Exploratory Data Analysis:

Checking for NaN or Null Values
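
A minimal sketch of such a check, assuming the data sits in a pandas DataFrame. The toy frame and its column names are hypothetical stand-ins, not the challenge data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": ["x", "y", None, "z"],
    "feature_c": [10, 20, 30, 40],
})

# Count missing values per column (NaN and None both count as missing).
null_counts = df.isnull().sum()

# Keep only the columns that actually contain missing values.
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```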

A Few Statistics of the Output Variables

Solution to the Skewness:

Log Transformed output variables
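
One common way to apply such a transform is sketched below. It assumes non-negative targets and uses `np.log1p` rather than a plain logarithm; that choice is an assumption, as are the sample values.

```python
import numpy as np

# Hypothetical right-skewed target values (not taken from the challenge data).
y = np.array([0.0, 1.0, 10.0, 100.0, 10000.0])

# log1p computes log(1 + y), which stays defined when y contains zeros.
y_log = np.log1p(y)

# expm1 inverts log1p, so predictions made on the log scale
# can be mapped back to the original scale.
y_restored = np.expm1(y_log)
```

A model trained on `y_log` predicts on the log scale, so the inverse step is needed before scoring or submitting predictions.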

Checking Correlations, Null Values, Skewness, Constant Features and Features with Mostly Zeroes

Profile Report

Inferences:

Distribution Graphs / Histograms / Bar Graphs

Checking for Correlation with Output Features:

Multicollinearity:

Correlation Matrix
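
A rough sketch of how a correlation matrix can flag multicollinear features, assuming numeric features in a pandas DataFrame. The synthetic columns and the 0.95 threshold are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: x2 is almost an exact multiple of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": 2.0 * x1 + rng.normal(scale=0.01, size=200),
    "x3": rng.normal(size=200),
})

# Absolute Pearson correlation between every pair of columns.
corr = df.corr().abs()

# Keep the strict upper triangle so each pair is inspected only once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flag any column that is highly correlated with an earlier column.
threshold = 0.95  # assumed cut-off, to be tuned per dataset
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
```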

Inferences:

Data Preprocessing

Features with null value count

Converting to Python Date-time format
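
A minimal sketch of the conversion using `pd.to_datetime`. The column name, the date format, and the derived features are hypothetical.

```python
import pandas as pd

# Hypothetical column name and date strings, for illustration only.
df = pd.DataFrame({"date": ["2019-01-05", "2019-02-17", "not a date"]})

# errors="coerce" turns unparseable strings into NaT instead of raising.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Calendar parts become plain numeric features.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day_of_week"] = df["date"].dt.dayofweek  # Monday = 0
```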

Vectorizing Categorical Features

Vectorizing Numerical Features

Data after Vectorization

Machine Learning Models:

ALGORITHM 1: “Random Forest Regressor Model”:

RESULT: Leaderboard Score: 86.7

ALGORITHM 2: “XGBoost Regressor Model”:

RESULT: Leaderboard Score: 88.19 (Rank 69)

ALGORITHM 3: “XGBoost with Multi Output Regressor Model”:

RESULT: Leaderboard Score: 88.08

COMPARING DIFFERENT MODELS

Conclusion:

Where can you find my code?

REFERENCES:
