Predicting Heart Disease Using Machine Learning Algorithms and AWS

Shelomi Priskila
Jan 10, 2021

Heart disease is a major health issue today, often as a result of an unhealthy lifestyle. This project predicts whether a patient has heart disease by considering several health factors.

Data analysis

The dataset, heart.csv, was downloaded from Kaggle. It contains both numerical and categorical values.

It has 13 features:

age : age in years

sex : 1 = male; 0 = female

cp : chest pain type (0–3)

trestbps : resting blood pressure (in mm Hg on admission to the hospital)

chol : serum cholesterol in mg/dl

fbs : fasting blood sugar > 120 mg/dl (1 = true; 0 = false)

restecg : resting electrocardiographic results

thalach : maximum heart rate achieved

exang : exercise induced angina (1 = yes; 0 = no)

oldpeak : ST depression induced by exercise relative to rest

slope : the slope of the peak exercise ST segment

ca : number of major vessels (0–3) colored by fluoroscopy

thal : 3 = normal; 6 = fixed defect; 7 = reversible defect
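The dataset mixes numerical and categorical values, so a useful first step is to group the features by type. The grouping below is an assumption based on the descriptions above. Integer-coded features such as sex, cp and thal are treated as categorical. The names CATEGORICAL, NUMERICAL and split_feature_types are hypothetical and do not come from the author's code.

```python
# Assumed grouping of the 13 features by type, based on the column descriptions.
# Integer-coded features (sex, cp, fbs, ...) are treated as categorical.
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal"]
NUMERICAL = ["age", "trestbps", "chol", "thalach", "oldpeak"]


def split_feature_types(columns):
    """Split column names into (numerical, categorical), keeping input order.

    Columns in neither list (for example the "target" label) are ignored.
    """
    numerical = [c for c in columns if c in NUMERICAL]
    categorical = [c for c in columns if c in CATEGORICAL]
    return numerical, categorical


# Usage:
# split_feature_types(["age", "sex", "cp", "chol", "target"])
# returns (["age", "chol"], ["sex", "cp"])
```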

Feature Engineering

The column to predict is named “target”:

0 : not having a heart disease

1 : having a heart disease

Moved the label column to the 0th (first) position

Removed columns that were not useful for prediction

One-hot encoded the categorical features and filled missing values with the column means

Saved a random 80% of the samples as train.csv

Saved a random 20% of the samples as test.csv

Feature engineering script: https://raw.githubusercontent.com/shelomi123/FE/main/FE.py
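The feature engineering steps can be sketched roughly as follows with pandas. This is not the author's FE.py, and it assumes the label column is named target. The article does not say which columns were removed or encoded. For that reason the preprocess function, its drop_cols and categorical arguments, and the fixed seed are all hypothetical.

```python
import pandas as pd


def preprocess(df, label="target", drop_cols=(), categorical=(), seed=42):
    """Hypothetical sketch: return (train, test) DataFrames with the label in column 0."""
    # 1. Remove columns judged not useful (which ones is an assumption).
    df = df.drop(columns=list(drop_cols))

    # 2. Fill missing values in numerical, non-categorical columns with the column mean.
    numeric = [
        c for c in df.select_dtypes(include="number").columns
        if c != label and c not in categorical
    ]
    if numeric:
        df[numeric] = df[numeric].fillna(df[numeric].mean())

    # 3. One-hot encode the categorical columns as 0/1 integer columns.
    cats = [c for c in categorical if c in df.columns and c != label]
    if cats:
        df = pd.get_dummies(df, columns=cats, dtype=int)

    # 4. Move the label column to position 0.
    df = df[[label] + [c for c in df.columns if c != label]]

    # 5. Random 80/20 split: rows not sampled for training form the test set.
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Usage (file names from the article; the categorical list is an assumption):
# df = pd.read_csv("heart.csv")
# train, test = preprocess(df, categorical=["cp", "restecg", "slope", "thal"])
# train.to_csv("train.csv", header=False, index=False)
# test.to_csv("test.csv", header=False, index=False)
```

Only non-categorical columns are filled with the mean. This avoids creating fractional codes in columns such as thal that are about to be one-hot encoded.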

Each processed file:

•Contains 9 feature columns and 1 label column

•Has no header row

•Has the label column at the 0th position
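This headerless, label-first layout matches what SageMaker's built-in algorithms expect for CSV training data: the target goes in the first column and there is no header record. The files can be checked before upload with something like the helper below. check_processed_csv is a hypothetical name. It assumes a 0/1 label and the 9 feature columns described above.

```python
import csv


def check_processed_csv(path, n_features=9):
    """Hypothetical sanity check: no header row, 0/1 label first, 1 + n_features columns."""
    with open(path, newline="") as f:
        for line_no, row in enumerate(csv.reader(f), start=1):
            if len(row) != n_features + 1:
                raise ValueError(
                    f"line {line_no}: expected {n_features + 1} columns, got {len(row)}"
                )
            # float() raises ValueError on a header row such as "target,age,...".
            values = [float(v) for v in row]
            if values[0] not in (0.0, 1.0):
                raise ValueError(f"line {line_no}: label {row[0]!r} is not 0 or 1")
    return True


# Usage:
# check_processed_csv("train.csv")
# check_processed_csv("test.csv")
```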

ML algorithms and Training

Trained models using the following AWS (Amazon SageMaker) built-in algorithms:

•Linear Learner

•XGBoost

•KNN

•Factorization Machines

The accuracies were compared, and Linear Learner was chosen for deployment
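Comparing the algorithms comes down to two steps. Each model's predictions on the same test set are scored, and the most accurate model is kept. The minimal sketch below assumes the predictions have already been collected, for example from each deployed endpoint. The function names and the toy labels in the usage example are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def pick_best(predictions_by_model, y_true):
    """Return (best_model_name, accuracy_by_model) from each model's test-set predictions."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Usage with toy labels (not results from the article):
# best, scores = pick_best({"linear-learner": [1, 0, 1, 1], "knn": [1, 1, 1, 0]}, [1, 0, 1, 1])
# best == "linear-learner"; scores == {"linear-learner": 100.0, "knn": 50.0}
```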

The model was tested with different hyperparameter values
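Trying different hyperparameters can be organized as a simple grid search. The sketch assumes a hypothetical train_and_score callable. That callable would train Linear Learner with the given settings (for example through a SageMaker training job) and return the validation accuracy. The names feature_dim, predictor_type, learning_rate, mini_batch_size and wd are Linear Learner hyperparameters. The values in the grid are illustrative and are not the ones the author used.

```python
from itertools import product

# Fixed settings: binary classification on the 9 processed feature columns.
BASE = {"feature_dim": 9, "predictor_type": "binary_classifier"}

# Illustrative search space (assumed values).
GRID = {
    "learning_rate": [0.01, 0.1],
    "mini_batch_size": [32, 100],
    "wd": [0.0, 0.01],
}


def grid_search(train_and_score, grid=GRID, base=BASE):
    """Try every combination in grid; return (best_params, best_score).

    train_and_score is a hypothetical callable: it trains a model with the
    given hyperparameters and returns its validation accuracy.
    """
    best_params, best_score = None, float("-inf")
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        params = {**base, **dict(zip(names, values))}
        score = train_and_score(params)
        # Strict ">" keeps the first combination found when scores tie.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

When the grid grows large, SageMaker's automatic model tuning is an alternative to a manual loop like this.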

Accuracy: 85.2459%
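The reported accuracy is presumably the percentage of test samples classified correctly. Take label 1 (having heart disease) as the positive class, and let TP, TN, FP and FN count the true positives, true negatives, false positives and false negatives on the test set. Accuracy is then:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%
```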
