Predicting Heart Disease Using Machine Learning Algorithms and AWS
Heart disease is a major health issue today, often linked to unhealthy lifestyles. This project predicts whether a patient has heart disease based on several health factors.
Data analysis
The dataset, heart.csv, was downloaded from Kaggle and contains both numerical and categorical values.
It has 13 features:
age : age in years
sex : 1 = male ; 0 = female
cp : chest pain type (0–3)
trestbps : resting blood pressure (in mm Hg on admission to the hospital)
chol : serum cholesterol in mg/dl
fbs : fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg : resting electrocardiographic results
thalach : maximum heart rate achieved
exang : exercise induced angina (1 = yes; 0 = no)
oldpeak : ST depression induced by exercise relative to rest
slope : the slope of the peak exercise ST segment
ca : number of major vessels (0–3) colored by fluoroscopy
thal : 3 = normal; 6 = fixed defect; 7 = reversible defect
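The split between numerical and categorical features drives the encoding step later on. The sketch below shows one assumed grouping of the 13 features and a hypothetical helper, `check_header`, that reports any expected column missing from a heart.csv header. Treating `ca` and `slope` as categorical is a modeling choice rather than something the dataset dictates, and the header order shown is an assumption.

```python
# Assumed grouping of the 13 heart.csv features (a modeling choice, not a given).
NUMERICAL = ["age", "trestbps", "chol", "thalach", "oldpeak"]
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal"]


def check_header(header):
    """Return expected column names missing from `header` (empty list if none)."""
    expected = set(NUMERICAL) | set(CATEGORICAL) | {"target"}
    return sorted(expected - set(header))


# Hypothetical header line, in the column order assumed for heart.csv.
header = ("age,sex,cp,trestbps,chol,fbs,restecg,thalach,"
          "exang,oldpeak,slope,ca,thal,target").split(",")
print(check_header(header))
```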
Feature Engineering
The label column is “target”:
0 : no heart disease
1 : heart disease
Moved the label column to the 0th (first) position
Dropped columns that were not useful for prediction
One-hot encoded the categorical features and filled missing values with the column means
Saved a random 80% of the samples as train.csv
Saved the remaining 20% of the samples as test.csv
Feature engineering script: https://raw.githubusercontent.com/shelomi123/FE/main/FE.py
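The mean imputation and one-hot encoding steps can be sketched with the Python standard library. This is not the linked FE.py: it assumes rows are read as dicts of strings with missing values left empty, and the columns chosen here (`chol` for imputation, `cp` for encoding) and the three inline rows are hypothetical.

```python
import csv
import io
from statistics import mean


def fill_missing_with_mean(rows, column):
    """Replace empty values in `column` with the mean of the non-empty values."""
    present = [float(r[column]) for r in rows if r[column] != ""]
    col_mean = mean(present)
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else col_mean
    return rows


def one_hot_encode(rows, column):
    """Replace categorical `column` with one 0/1 column per observed category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for cat in categories:
            r[f"{column}_{cat}"] = 1 if value == cat else 0
    return rows


# Tiny inline sample with illustrative values; the real input would be heart.csv.
sample = io.StringIO("age,cp,chol,target\n63,3,233,1\n37,2,,1\n56,1,236,0\n")
rows = list(csv.DictReader(sample))
rows = fill_missing_with_mean(rows, "chol")  # hypothetical column choice
rows = one_hot_encode(rows, "cp")            # hypothetical column choice
for r in rows:
    print(r)
```

Mean imputation keeps every row at the cost of shrinking the column's variance; with many missing values, a per-class or median fill may be worth comparing.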
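The processed file follows the CSV layout that Amazon SageMaker's built-in algorithms expect for training data:
•Contains 9 feature columns and 1 label column
•Has no header row
•Has the label in the 0th (first) column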
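Moving the label first, splitting 80/20 at random, and writing headerless CSV can be sketched as below. The helpers `move_label_first`, `split_train_test`, and `to_headerless_csv` are hypothetical (not from FE.py or any library), and the label position, fixed seed, and toy rows are assumptions for illustration.

```python
import csv
import io
import random


def move_label_first(row, label_index):
    """Return a copy of `row` with the label moved to position 0."""
    return [row[label_index]] + row[:label_index] + row[label_index + 1:]


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of `rows` and split it into disjoint train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def to_headerless_csv(rows):
    """Serialize rows as CSV text with no header line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical cleaned rows: two features followed by the label in column 2.
data = [[float(i), float(i * 2), i % 2] for i in range(10)]
data = [move_label_first(r, label_index=2) for r in data]
train, test = split_train_test(data)
train_csv = to_headerless_csv(train)  # would be written to train.csv
test_csv = to_headerless_csv(test)    # would be written to test.csv
print(len(train), len(test))
```

Splitting one shuffled copy, rather than sampling 80% and 20% independently, guarantees that no row lands in both files.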
ML algorithms and Training
Models were trained with AWS (Amazon SageMaker) built-in algorithms, including:
•LinearLearner
•XGBoost
•KNN
•Factorization Machines
After comparing the models' accuracy, Linear Learner was selected and deployed.
Several hyperparameter settings were tested during training.
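Picking a configuration by accuracy is, in effect, a search for the setting with the best held-out score. The sketch below is an exhaustive grid search that assumes a hypothetical `evaluate` callback, which in practice would launch a training job and score the result. The grid values are placeholders, although `learning_rate`, `mini_batch_size`, and `wd` are genuine Linear Learner hyperparameter names. Comparing algorithms works the same way if the algorithm is treated as one more grid dimension.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in `grid`; return (best_params, best_accuracy).

    `evaluate` is a hypothetical callback that trains with the given
    hyperparameters and returns accuracy on held-out data.
    """
    names = sorted(grid)
    best_params, best_acc = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        acc = evaluate(params)
        if acc > best_acc:  # strict ">" keeps the first of any tied settings
            best_params, best_acc = params, acc
    return best_params, best_acc


# Placeholder grid; real values would depend on the experiments actually run.
grid = {"learning_rate": [0.01, 0.1], "mini_batch_size": [32, 100], "wd": [0.0, 0.01]}


def fake_evaluate(params):
    # Stand-in score so the sketch runs offline; NOT real model results.
    return 0.8 + params["learning_rate"] - params["wd"]


best, acc = grid_search(fake_evaluate, grid)
print(best, acc)
```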
Accuracy: 85.2459%
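Accuracy is the share of test samples whose predicted label matches the true label, expressed as a percentage. In the minimal sketch below, 52 correct out of 61 is an assumed, illustrative count that happens to reproduce the reported figure; it is not taken from the project's results.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Illustrative only: 61 labels, of which 52 are predicted correctly.
y_true = [1] * 61
y_pred = [1] * 52 + [0] * 9
print(round(accuracy_percent(y_true, y_pred), 4))  # → 85.2459
```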