Commit dedd403e authored by Yeyi12

Added some data files and progress on parameter finding

parent 0749dc54
Run,Best Correlation Score,Best Accuracy Score,Selected Features
Run 1,0.1955,0.9102,"['CardiovascularDisease', 'BehavioralProblems', 'SleepQuality', 'DifficultyCompletingTasks', 'DietQuality', 'PersonalityChanges', 'FamilyHistoryAlzheimers', 'DiastolicBP', 'CholesterolTriglycerides', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'CholesterolHDL', 'MMSE', 'AlcoholConsumption']"
Run 2,0.2241,0.9136,"['Diabetes', 'BehavioralProblems', 'Confusion', 'FamilyHistoryAlzheimers', 'Gender', 'DiastolicBP', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'Ethnicity', 'MMSE', 'Hypertension']"
Run 3,0.248,0.9182,"['BehavioralProblems', 'Gender', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'MMSE']"
Run 4,0.2474,0.9129,"['Diabetes', 'BehavioralProblems', 'DietQuality', 'EducationLevel', 'Gender', 'CholesterolTriglycerides', 'MemoryComplaints', 'CholesterolHDL', 'FunctionalAssessment', 'CholesterolTotal', 'ADL', 'MMSE']"
Run 5,0.214,0.9122,"['BehavioralProblems', 'FunctionalAssessment', 'ADL', 'CholesterolHDL', 'Confusion', 'CardiovascularDisease', 'SleepQuality', 'HeadInjury', 'DietQuality', 'PersonalityChanges', 'PhysicalActivity', 'Disorientation', 'DiastolicBP', 'Ethnicity', 'Smoking', 'MMSE', 'FamilyHistoryAlzheimers', 'MemoryComplaints', 'AlcoholConsumption']"
Run 6,0.2195,0.9142,"['Diabetes', 'BehavioralProblems', 'SleepQuality', 'Confusion', 'PersonalityChanges', 'EducationLevel', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'Ethnicity', 'MMSE', 'SystolicBP']"
Run 7,0.2142,0.9209,"['CardiovascularDisease', 'BehavioralProblems', 'SleepQuality', 'Forgetfulness', 'DietQuality', 'PhysicalActivity', 'Smoking', 'Disorientation', 'CholesterolTriglycerides', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'MMSE', 'Hypertension']"
Run 8,0.2152,0.9156,"['BehavioralProblems', 'Disorientation', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'MMSE', 'Hypertension']"
Run 9,0.1906,0.9182,"['Diabetes', 'BehavioralProblems', 'SleepQuality', 'DietQuality', 'Disorientation', 'Gender', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'Ethnicity', 'MMSE', 'AlcoholConsumption']"
Run 10,0.2151,0.9169,"['BehavioralProblems', 'DifficultyCompletingTasks', 'SleepQuality', 'PersonalityChanges', 'DietQuality', 'Confusion', 'Forgetfulness', 'Gender', 'HeadInjury', 'MemoryComplaints', 'CholesterolTriglycerides', 'FunctionalAssessment', 'ADL', 'MMSE']"
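The `Selected Features` column stores each run's feature list as a Python-style list literal inside a quoted CSV field. A minimal sketch of how such a file could be summarised, assuming that layout: it parses the field with `ast.literal_eval` and counts how often each feature is selected. The helper name `count_selected_features` is hypothetical, and only two rows from the table above are embedded to keep the example short.

```python
import ast
import csv
import io
from collections import Counter

# Two rows copied from the results table; in practice the whole CSV file would be read.
sample_csv = '''Run,Best Correlation Score,Best Accuracy Score,Selected Features
Run 3,0.248,0.9182,"['BehavioralProblems', 'Gender', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'MMSE']"
Run 8,0.2152,0.9156,"['BehavioralProblems', 'Disorientation', 'MemoryComplaints', 'FunctionalAssessment', 'ADL', 'MMSE', 'Hypertension']"
'''


def count_selected_features(csv_text):
    """Count how many runs selected each feature (hypothetical helper)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The field is a list literal, so literal_eval turns it into a real list.
        counts.update(ast.literal_eval(row["Selected Features"]))
    return counts


feature_counts = count_selected_features(sample_csv)
for feature, n_runs in feature_counts.most_common():
    print(f"{feature}: {n_runs}")
```

Features that appear in every run are candidates for a stable core set, while features picked only once are more likely artefacts of a particular search.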
%% Cell type:code id: tags:
``` python
import sys
print(sys.path) # Check where Python is looking for packages
import pandas as pd
print(pd.__version__) # Should print the installed pandas version
```
%% Output
['c:\\Users\\sarah\\AppData\\Local\\Programs\\Python\\Python313\\python313.zip', 'c:\\Users\\sarah\\AppData\\Local\\Programs\\Python\\Python313\\DLLs', 'c:\\Users\\sarah\\AppData\\Local\\Programs\\Python\\Python313\\Lib', 'c:\\Users\\sarah\\AppData\\Local\\Programs\\Python\\Python313', '', 'C:\\Users\\sarah\\AppData\\Roaming\\Python\\Python313\\site-packages', 'C:\\Users\\sarah\\AppData\\Roaming\\Python\\Python313\\site-packages\\win32', 'C:\\Users\\sarah\\AppData\\Roaming\\Python\\Python313\\site-packages\\win32\\lib', 'C:\\Users\\sarah\\AppData\\Roaming\\Python\\Python313\\site-packages\\Pythonwin', 'c:\\Users\\sarah\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\site-packages']
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[2], line 3
1 import sys
2 print(sys.path) # Check where Python is looking for packages
----> 3 import pandas as pd
4 print(pd.__version__) # Should print the installed pandas version
ModuleNotFoundError: No module named 'pandas'
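%% Cell type:markdown id: tags:
The traceback above means the interpreter running the kernel cannot find `pandas` on any of the printed `sys.path` entries. A small sketch of one way to check all the packages this notebook imports before running it, using only the standard library. The helper name `missing_packages` and the package list are assumptions for illustration; the printed command is a suggestion for the interpreter that runs the kernel, not something the notebook itself executes.
%% Cell type:code id: tags:
```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages imported later in this notebook (assumed list).
required = ["pandas", "numpy", "matplotlib", "seaborn", "scipy"]
missing = missing_packages(required)

if missing:
    # Installing through sys.executable targets the same interpreter as the kernel.
    print(f"{sys.executable} -m pip install {' '.join(missing)}")
else:
    print("All required packages are importable.")
```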
%% Cell type:code id: tags:
``` python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
```
%% Output
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import pandas as pd
2 import numpy as np
3 import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'pandas'
%% Cell type:code id: tags:
``` python
a_df = pd.read_csv('alzheimers_disease_data.csv')
a_df
```
%% Output
PatientID Age Gender Ethnicity EducationLevel BMI Smoking \
0 4751 73 0 0 2 22.927749 0
1 4752 89 0 0 0 26.827681 0
2 4753 73 0 3 1 17.795882 0
3 4754 74 1 0 1 33.800817 1
4 4755 89 0 0 0 20.716974 0
... ... ... ... ... ... ... ...
2144 6895 61 0 0 1 39.121757 0
2145 6896 75 0 0 2 17.857903 0
2146 6897 77 0 0 1 15.476479 0
2147 6898 78 1 3 1 15.299911 0
2148 6899 72 0 0 2 33.289738 0
AlcoholConsumption PhysicalActivity DietQuality ... \
0 13.297218 6.327112 1.347214 ...
1 4.542524 7.619885 0.518767 ...
2 19.555085 7.844988 1.826335 ...
3 12.209266 8.428001 7.435604 ...
4 18.454356 6.310461 0.795498 ...
... ... ... ... ...
2144 1.561126 4.049964 6.555306 ...
2145 18.767261 1.360667 2.904662 ...
2146 4.594670 9.886002 8.120025 ...
2147 8.674505 6.354282 1.263427 ...
2148 7.890703 6.570993 7.941404 ...
MemoryComplaints BehavioralProblems ADL Confusion \
0 0 0 1.725883 0
1 0 0 2.592424 0
2 0 0 7.119548 0
3 0 1 6.481226 0
4 0 0 0.014691 0
... ... ... ... ...
2144 0 0 4.492838 1
2145 0 1 9.204952 0
2146 0 0 5.036334 0
2147 0 0 3.785399 0
2148 0 1 8.327563 0
Disorientation PersonalityChanges DifficultyCompletingTasks \
0 0 0 1
1 0 0 0
2 1 0 1
3 0 0 0
4 0 1 1
... ... ... ...
2144 0 0 0
2145 0 0 0
2146 0 0 0
2147 0 0 0
2148 1 0 0
Forgetfulness Diagnosis DoctorInCharge
0 0 0 XXXConfid
1 1 0 XXXConfid
2 0 0 XXXConfid
3 0 0 XXXConfid
4 0 0 XXXConfid
... ... ... ...
2144 0 1 XXXConfid
2145 0 1 XXXConfid
2146 0 1 XXXConfid
2147 1 1 XXXConfid
2148 1 0 XXXConfid
[2149 rows x 35 columns]
%% Cell type:code id: tags:
``` python
a_df.describe()
```
%% Output
PatientID Age Gender Ethnicity EducationLevel \
count 2149.000000 2149.000000 2149.000000 2149.000000 2149.000000
mean 5825.000000 74.908795 0.506282 0.697534 1.286645
std 620.507185 8.990221 0.500077 0.996128 0.904527
min 4751.000000 60.000000 0.000000 0.000000 0.000000
25% 5288.000000 67.000000 0.000000 0.000000 1.000000
50% 5825.000000 75.000000 1.000000 0.000000 1.000000
75% 6362.000000 83.000000 1.000000 1.000000 2.000000
max 6899.000000 90.000000 1.000000 3.000000 3.000000
BMI Smoking AlcoholConsumption PhysicalActivity \
count 2149.000000 2149.000000 2149.000000 2149.000000
mean 27.655697 0.288506 10.039442 4.920202
std 7.217438 0.453173 5.757910 2.857191
min 15.008851 0.000000 0.002003 0.003616
25% 21.611408 0.000000 5.139810 2.570626
50% 27.823924 0.000000 9.934412 4.766424
75% 33.869778 1.000000 15.157931 7.427899
max 39.992767 1.000000 19.989293 9.987429
DietQuality ... FunctionalAssessment MemoryComplaints \
count 2149.000000 ... 2149.000000 2149.000000
mean 4.993138 ... 5.080055 0.208004
std 2.909055 ... 2.892743 0.405974
min 0.009385 ... 0.000460 0.000000
25% 2.458455 ... 2.566281 0.000000
50% 5.076087 ... 5.094439 0.000000
75% 7.558625 ... 7.546981 0.000000
max 9.998346 ... 9.996467 1.000000
BehavioralProblems ADL Confusion Disorientation \
count 2149.000000 2149.000000 2149.000000 2149.000000
mean 0.156817 4.982958 0.205212 0.158213
std 0.363713 2.949775 0.403950 0.365026
min 0.000000 0.001288 0.000000 0.000000
25% 0.000000 2.342836 0.000000 0.000000
50% 0.000000 5.038973 0.000000 0.000000
75% 0.000000 7.581490 0.000000 0.000000
max 1.000000 9.999747 1.000000 1.000000
PersonalityChanges DifficultyCompletingTasks Forgetfulness \
count 2149.000000 2149.000000 2149.000000
mean 0.150768 0.158678 0.301536
std 0.357906 0.365461 0.459032
min 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000
50% 0.000000 0.000000 0.000000
75% 0.000000 0.000000 1.000000
max 1.000000 1.000000 1.000000
Diagnosis
count 2149.000000
mean 0.353653
std 0.478214
min 0.000000
25% 0.000000
50% 0.000000
75% 1.000000
max 1.000000
[8 rows x 34 columns]
%% Cell type:code id: tags:
``` python
#checking for null values
a_df.isna().sum()
```
%% Output
PatientID 0
Age 0
Gender 0
Ethnicity 0
EducationLevel 0
BMI 0
Smoking 0
AlcoholConsumption 0
PhysicalActivity 0
DietQuality 0
SleepQuality 0
FamilyHistoryAlzheimers 0
CardiovascularDisease 0
Diabetes 0
Depression 0
HeadInjury 0
Hypertension 0
SystolicBP 0
DiastolicBP 0
CholesterolTotal 0
CholesterolLDL 0
CholesterolHDL 0
CholesterolTriglycerides 0
MMSE 0
FunctionalAssessment 0
MemoryComplaints 0
BehavioralProblems 0
ADL 0
Confusion 0
Disorientation 0
PersonalityChanges 0
DifficultyCompletingTasks 0
Forgetfulness 0
Diagnosis 0
DoctorInCharge 0
dtype: int64
%% Cell type:code id: tags:
``` python
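# Drop identifier columns that carry no predictive information.
# With inplace=True, drop() returns None, so the result is not assigned.
a_df.drop(columns=['DoctorInCharge', 'PatientID'], inplace=True)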
```
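%% Cell type:markdown id: tags:
`DataFrame.drop` returns a new frame by default and returns `None` when called with `inplace=True`, so the result of an in-place drop should not be assigned to a variable. A short sketch on a toy frame (the values are made up; only the column names mirror the notebook) shows both behaviours.
%% Cell type:code id: tags:
```python
import pandas as pd

# Toy frame with made-up values; only the column names mirror the notebook.
toy = pd.DataFrame({"PatientID": [1, 2], "DoctorInCharge": ["x", "y"], "Age": [70, 81]})

# Without inplace, drop() returns a new frame and leaves the original untouched.
trimmed = toy.drop(columns=["DoctorInCharge", "PatientID"])
print(list(trimmed.columns))
print(list(toy.columns))

# With inplace=True the frame itself is modified and the call returns None.
result = toy.drop(columns=["DoctorInCharge", "PatientID"], inplace=True)
print(result)
print(list(toy.columns))
```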
%% Cell type:code id: tags:
``` python
a_df
```
%% Output
Age Gender Ethnicity EducationLevel BMI Smoking \
0 73 0 0 2 22.927749 0
1 89 0 0 0 26.827681 0
2 73 0 3 1 17.795882 0
3 74 1 0 1 33.800817 1
4 89 0 0 0 20.716974 0
... ... ... ... ... ... ...
2144 61 0 0 1 39.121757 0
2145 75 0 0 2 17.857903 0
2146 77 0 0 1 15.476479 0
2147 78 1 3 1 15.299911 0
2148 72 0 0 2 33.289738 0
AlcoholConsumption PhysicalActivity DietQuality SleepQuality ... \
0 13.297218 6.327112 1.347214 9.025679 ...
1 4.542524 7.619885 0.518767 7.151293 ...
2 19.555085 7.844988 1.826335 9.673574 ...
3 12.209266 8.428001 7.435604 8.392554 ...
4 18.454356 6.310461 0.795498 5.597238 ...
... ... ... ... ... ...
2144 1.561126 4.049964 6.555306 7.535540 ...
2145 18.767261 1.360667 2.904662 8.555256 ...
2146 4.594670 9.886002 8.120025 5.769464 ...
2147 8.674505 6.354282 1.263427 8.322874 ...
2148 7.890703 6.570993 7.941404 9.878711 ...
FunctionalAssessment MemoryComplaints BehavioralProblems ADL \
0 6.518877 0 0 1.725883
1 7.118696 0 0 2.592424
2 5.895077 0 0 7.119548
3 8.965106 0 1 6.481226
4 6.045039 0 0 0.014691
... ... ... ... ...
2144 0.238667 0 0 4.492838
2145 8.687480 0 1 9.204952
2146 1.972137 0 0 5.036334
2147 5.173891 0 0 3.785399
2148 6.307543 0 1 8.327563
Confusion Disorientation PersonalityChanges \
0 0 0 0
1 0 0 0
2 0 1 0
3 0 0 0
4 0 0 1
... ... ... ...
2144 1 0 0
2145 0 0 0
2146 0 0 0
2147 0 0 0
2148 0 1 0
DifficultyCompletingTasks Forgetfulness Diagnosis
0 1 0 0
1 0 1 0
2 1 0 0
3 0 0 0
4 1 0 0
... ... ... ...
2144 0 0 1
2145 0 0 1
2146 0 0 1
2147 0 1 1
2148 0 1 0
[2149 rows x 33 columns]
%% Cell type:code id: tags:
``` python
a_df.all()
```
%% Output
Age True
Gender False
Ethnicity False
EducationLevel False
BMI True
Smoking False
AlcoholConsumption True
PhysicalActivity True
DietQuality True
SleepQuality True
FamilyHistoryAlzheimers False
CardiovascularDisease False
Diabetes False
Depression False
HeadInjury False
Hypertension False
SystolicBP True
DiastolicBP True
CholesterolTotal True
CholesterolLDL True
CholesterolHDL True
CholesterolTriglycerides True
MMSE True
FunctionalAssessment True
MemoryComplaints False
BehavioralProblems False
ADL True
Confusion False
Disorientation False
PersonalityChanges False
DifficultyCompletingTasks False
Forgetfulness False
Diagnosis False
dtype: bool
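%% Cell type:markdown id: tags:
`DataFrame.all()` reports `True` only for columns that contain no zero (falsy) value, which is why every 0/1 flag above shows `False` while the continuous measurements show `True`. A sketch on a toy frame (made-up values) that pairs this check with the share of zeros per column, which is often more informative than a yes/no answer.
%% Cell type:code id: tags:
```python
import pandas as pd

# Toy frame with made-up values standing in for a_df.
toy = pd.DataFrame({
    "Age": [73, 89, 73, 74],
    "Smoking": [0, 0, 0, 1],
    "Confusion": [0, 1, 0, 0],
})

# True only where a column has no zero at all.
no_zero_columns = toy.all()

# Fraction of rows equal to zero in each column.
zero_share = (toy == 0).mean()

print(no_zero_columns)
print(zero_share)
```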
%% Cell type:markdown id: tags:
Demographic Details
Age: The age of the patients ranges from 60 to 90 years.
Gender: Gender of the patients, where 0 represents Male and 1 represents Female.
Ethnicity: The ethnicity of the patients, coded as follows:
0: Caucasian
1: African American
2: Asian
3: Other
EducationLevel: The education level of the patients, coded as follows:
0: None
1: High School
2: Bachelor's
3: Higher
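%% Cell type:markdown id: tags:
The codings listed above can be turned into readable labels for plots and tables with `Series.map`. This is only a sketch: the dictionary names are hypothetical, and the three rows are the first three demographic rows shown in the notebook output.
%% Cell type:code id: tags:
```python
import pandas as pd

# Code-to-label dictionaries taken from the description above (names are hypothetical).
GENDER = {0: "Male", 1: "Female"}
ETHNICITY = {0: "Caucasian", 1: "African American", 2: "Asian", 3: "Other"}
EDUCATION = {0: "None", 1: "High School", 2: "Bachelor's", 3: "Higher"}

# First three demographic rows as shown in the notebook output.
demo = pd.DataFrame({"Gender": [0, 0, 0], "Ethnicity": [0, 0, 3], "EducationLevel": [2, 0, 1]})

# assign() returns a labelled copy and leaves the numeric codes in demo intact.
labelled = demo.assign(
    Gender=demo["Gender"].map(GENDER),
    Ethnicity=demo["Ethnicity"].map(ETHNICITY),
    EducationLevel=demo["EducationLevel"].map(EDUCATION),
)
print(labelled)
```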
%% Cell type:code id: tags:
``` python
demographicdf = a_df[['Age','Gender','Ethnicity','EducationLevel']]
demographicdf
```
%% Output
Age Gender Ethnicity EducationLevel
0 73 0 0 2
1 89 0 0 0
2 73 0 3 1
3 74 1 0 1
4 89 0 0 0
... ... ... ... ...
2144 61 0 0 1
2145 75 0 0 2
2146 77 0 0 1
2147 78 1 3 1
2148 72 0 0 2
[2149 rows x 4 columns]
%% Cell type:code id: tags:
``` python
sns.heatmap(a_df == 0, yticklabels=False)
```
%% Output
<Axes: >
%% Cell type:code id: tags:
``` python
lifestyledf = a_df[['AlcoholConsumption','BMI','Smoking','PhysicalActivity', 'DietQuality', 'SleepQuality' ]]
lifestyledf
```
%% Output
AlcoholConsumption BMI Smoking PhysicalActivity DietQuality \
0 13.297218 22.927749 0 6.327112 1.347214
1 4.542524 26.827681 0 7.619885 0.518767
2 19.555085 17.795882 0 7.844988 1.826335
3 12.209266 33.800817 1 8.428001 7.435604
4 18.454356 20.716974 0 6.310461 0.795498
... ... ... ... ... ...
2144 1.561126 39.121757 0 4.049964 6.555306
2145 18.767261 17.857903 0 1.360667 2.904662
2146 4.594670 15.476479 0 9.886002 8.120025
2147 8.674505 15.299911 0 6.354282 1.263427
2148 7.890703 33.289738 0 6.570993 7.941404
SleepQuality
0 9.025679
1 7.151293
2 9.673574
3 8.392554
4 5.597238
... ...
2144 7.535540
2145 8.555256
2146 5.769464
2147 8.322874
2148 9.878711
[2149 rows x 6 columns]
%% Cell type:markdown id: tags:
Lifestyle Factors
BMI: Body Mass Index of the patients, ranging from 15 to 40.
Smoking: Smoking status, where 0 indicates No and 1 indicates Yes.
AlcoholConsumption: Weekly alcohol consumption in units, ranging from 0 to 20.
PhysicalActivity: Weekly physical activity in hours, ranging from 0 to 10.
DietQuality: Diet quality score, ranging from 0 to 10.
SleepQuality: Sleep quality score, ranging from 4 to 10.
%% Cell type:markdown id: tags:
Medical History
FamilyHistoryAlzheimers: Family history of Alzheimer's Disease, where 0 indicates No and 1 indicates Yes.
CardiovascularDisease: Presence of cardiovascular disease, where 0 indicates No and 1 indicates Yes.
Diabetes: Presence of diabetes, where 0 indicates No and 1 indicates Yes.
Depression: Presence of depression, where 0 indicates No and 1 indicates Yes.
HeadInjury: History of head injury, where 0 indicates No and 1 indicates Yes.
Hypertension: Presence of hypertension, where 0 indicates No and 1 indicates Yes.
%% Cell type:code id: tags:
``` python
medicaldf = a_df[['FamilyHistoryAlzheimers', 'CardiovascularDisease', 'Diabetes', 'Depression','HeadInjury','Hypertension']]
medicaldf
```
%% Output
FamilyHistoryAlzheimers CardiovascularDisease Diabetes Depression \
0 0 0 1 1
1 0 0 0 0
2 1 0 0 0
3 0 0 0 0
4 0 0 0 0
... ... ... ... ...
2144 0 0 0 0
2145 0 0 0 0
2146 0 0 0 0
2147 0 1 0 0
2148 0 0 0 0
HeadInjury Hypertension
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
... ... ...
2144 0 0
2145 0 0
2146 0 0
2147 0 0
2148 0 0
[2149 rows x 6 columns]
%% Cell type:markdown id: tags:
Symptoms
Confusion: Presence of confusion, where 0 indicates No and 1 indicates Yes.
Disorientation: Presence of disorientation, where 0 indicates No and 1 indicates Yes.
PersonalityChanges: Presence of personality changes, where 0 indicates No and 1 indicates Yes.
DifficultyCompletingTasks: Presence of difficulty completing tasks, where 0 indicates No and 1 indicates Yes.
Forgetfulness: Presence of forgetfulness, where 0 indicates No and 1 indicates Yes.
Diagnosis Information
Diagnosis: Diagnosis status for Alzheimer's Disease, where 0 indicates No and 1 indicates Yes.
%% Cell type:code id: tags:
``` python
symptomsdf = a_df[['Confusion', 'Disorientation', 'PersonalityChanges', 'DifficultyCompletingTasks','Forgetfulness']]
symptomsdf
```
%% Output
Confusion Disorientation PersonalityChanges \
0 0 0 0
1 0 0 0
2 0 1 0
3 0 0 0
4 0 0 1
... ... ... ...
2144 1 0 0
2145 0 0 0
2146 0 0 0
2147 0 0 0
2148 0 1 0
DifficultyCompletingTasks Forgetfulness
0 1 0
1 0 1
2 1 0
3 0 0
4 1 0
... ... ...
2144 0 0
2145 0 0
2146 0 0
2147 0 1
2148 0 1
[2149 rows x 5 columns]
%% Cell type:code id: tags:
``` python
sns.heatmap(lifestyledf == 0, yticklabels=False)
```
%% Output
<Axes: >
%% Cell type:markdown id: tags:
Clinical Measurements
SystolicBP: Systolic blood pressure, ranging from 90 to 180 mmHg.
DiastolicBP: Diastolic blood pressure, ranging from 60 to 120 mmHg.
CholesterolTotal: Total cholesterol levels, ranging from 150 to 300 mg/dL.
CholesterolLDL: Low-density lipoprotein cholesterol levels, ranging from 50 to 200 mg/dL.
CholesterolHDL: High-density lipoprotein cholesterol levels, ranging from 20 to 100 mg/dL.
CholesterolTriglycerides: Triglycerides levels, ranging from 50 to 400 mg/dL.
%% Cell type:code id: tags:
``` python
clinicaldf = a_df[['SystolicBP', 'DiastolicBP', 'CholesterolTotal', 'CholesterolLDL','CholesterolHDL','CholesterolTriglycerides']]
clinicaldf
```
%% Output
SystolicBP DiastolicBP CholesterolTotal CholesterolLDL \
0 142 72 242.366840 56.150897
1 115 64 231.162595 193.407996
2 99 116 284.181858 153.322762
3 118 115 159.582240 65.366637
4 94 117 237.602184 92.869700
... ... ... ... ...
2144 122 101 280.476824 94.870490
2145 152 106 186.384436 95.410700
2146 115 118 237.024558 156.267294
2147 103 96 242.197192 52.482961
2148 166 78 283.396797 92.200064
CholesterolHDL CholesterolTriglycerides
0 33.682563 162.189143
1 79.028477 294.630909
2 69.772292 83.638324
3 68.457491 277.577358
4 56.874305 291.198780
... ... ...
2144 60.943092 234.520123
2145 93.649735 367.986877
2146 99.678209 294.802338
2147 81.281111 145.253746
2148 81.920043 217.396873
[2149 rows x 6 columns]
%% Cell type:code id: tags:
``` python
sns.heatmap(clinicaldf == 0, yticklabels=False)
```
%% Output
<Axes: >
%% Cell type:markdown id: tags:
Cognitive and Functional Assessments
MMSE: Mini-Mental State Examination score, ranging from 0 to 30. Lower scores indicate cognitive impairment.
FunctionalAssessment: Functional assessment score, ranging from 0 to 10. Lower scores indicate greater impairment.
MemoryComplaints: Presence of memory complaints, where 0 indicates No and 1 indicates Yes.
BehavioralProblems: Presence of behavioral problems, where 0 indicates No and 1 indicates Yes.
ADL: Activities of Daily Living score, ranging from 0 to 10. Lower scores indicate greater impairment.
%% Cell type:code id: tags:
``` python
cognitivedf = a_df[['MMSE', 'FunctionalAssessment','MemoryComplaints', 'BehavioralProblems', 'ADL']]
cognitivedf
```
%% Output
MMSE FunctionalAssessment MemoryComplaints BehavioralProblems \
0 21.463532 6.518877 0 0
1 20.613267 7.118696 0 0
2 7.356249 5.895077 0 0
3 13.991127 8.965106 0 1
4 13.517609 6.045039 0 0
... ... ... ... ...
2144 1.201190 0.238667 0 0
2145 6.458060 8.687480 0 1
2146 17.011003 1.972137 0 0
2147 4.030491 5.173891 0 0
2148 11.114777 6.307543 0 1
ADL
0 1.725883
1 2.592424
2 7.119548
3 6.481226
4 0.014691
... ...
2144 4.492838
2145 9.204952
2146 5.036334
2147 3.785399
2148 8.327563
[2149 rows x 5 columns]
%% Cell type:code id: tags:
``` python
sns.heatmap(cognitivedf == 0, yticklabels=False)
```
%% Output
<Axes: >
%% Cell type:code id: tags:
``` python
#0 indicates No and 1 indicates Yes
sns.countplot(x="Diagnosis", hue="Diagnosis", data=a_df)
```
%% Output
<Axes: xlabel='Diagnosis', ylabel='count'>
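%% Cell type:markdown id: tags:
The count plot shows an imbalanced target. As a rough sketch, the class counts can be recovered from numbers already printed above (2149 rows, `Diagnosis` mean of 0.353653 in `describe()`), which also gives the accuracy of always predicting the majority class. The counts here are derived from those summary values rather than read from the data, so they should be treated as an approximation to check against `a_df['Diagnosis'].value_counts()`.
%% Cell type:code id: tags:
```python
n_rows = 2149             # number of rows reported for a_df
positive_rate = 0.353653  # mean of Diagnosis from describe()

# Derived counts (an approximation based on the rounded mean).
n_positive = round(n_rows * positive_rate)
n_negative = n_rows - n_positive

# Accuracy of a classifier that always predicts the majority class.
majority_baseline = max(n_positive, n_negative) / n_rows
print(n_positive, n_negative, round(majority_baseline, 4))
```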
%% Cell type:code id: tags:
``` python
# Define the feature matrix X and the target vector y
X = a_df.drop('Diagnosis', axis = 1)
y = a_df['Diagnosis']
```
%% Cell type:code id: tags:
``` python
groups = {"Cognitive": cognitivedf,"Symptoms": symptomsdf,"Lifestyle": lifestyledf,"Clinical": clinicaldf,"Medical": medicaldf,"Demographic": demographicdf}
for name, group in groups.items():
    # Merge features with target variable
    dfy = pd.concat([group, y], axis=1)
    # Compute correlation matrix
    correlation_matrix = dfy.corr()
    # Extract correlation with target variable (Diagnosis)
    correlationy = correlation_matrix['Diagnosis'].drop('Diagnosis').sort_values(ascending=False)
    # Display correlation values
    print(f"\nCorrelation in {name} Features:")
    print(correlationy)
    # Visualize correlation with y
    plt.figure(figsize=(10, 6))
    sns.barplot(x=correlationy.index, y=correlationy.values, palette="coolwarm")
    plt.title(f"Feature Correlation with Diagnosis {name}")
    plt.ylabel("Correlation Coefficient")
    plt.xlabel("Features")
    plt.show()
```
%% Output
Correlation in Cognitive Features:
MemoryComplaints 0.306742
BehavioralProblems 0.224350
MMSE -0.237126
ADL -0.332346
FunctionalAssessment -0.364898
Name: Diagnosis, dtype: float64
C:\Users\sarah\AppData\Local\Temp\ipykernel_22224\739675737.py:19: FutureWarning:
Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.
sns.barplot(x=correlationy.index, y=correlationy.values, palette="coolwarm")
Correlation in Symptoms Features:
DifficultyCompletingTasks 0.009069
Forgetfulness -0.000354
Confusion -0.019186
PersonalityChanges -0.020627
Disorientation -0.024648
Name: Diagnosis, dtype: float64
Correlation in Lifestyle Features:
BMI 0.026343
DietQuality 0.008506
PhysicalActivity 0.005945
Smoking -0.004865
AlcoholConsumption -0.007618
SleepQuality -0.056548
Name: Diagnosis, dtype: float64
Correlation in Clinical Features:
CholesterolHDL 0.042584
CholesterolTriglycerides 0.022672
CholesterolTotal 0.006394
DiastolicBP 0.005293
SystolicBP -0.015615
CholesterolLDL -0.031976
Name: Diagnosis, dtype: float64
Correlation in Medical Features:
Hypertension 0.035080
CardiovascularDisease 0.031490
Depression -0.005893
HeadInjury -0.021411
Diabetes -0.031508
FamilyHistoryAlzheimers -0.032900
Name: Diagnosis, dtype: float64
Correlation in Demographic Features:
Age -0.005488
Ethnicity -0.014782
Gender -0.020975
EducationLevel -0.043966
Name: Diagnosis, dtype: float64
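%% Cell type:markdown id: tags:
Sorting the signed coefficients in descending order puts the strongest negative relationships last: in the Cognitive group, FunctionalAssessment (-0.365) is the largest in magnitude yet appears at the bottom. A sketch of an alternative ranking by absolute value, using `DataFrame.corrwith` to correlate each feature with the target directly instead of building the full matrix. The data below is synthetic and the column names `score_a`, `score_b` and `flag_c` are placeholders, not columns from the dataset.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are placeholders.
rng = np.random.default_rng(0)
n = 200
target = pd.Series(rng.integers(0, 2, size=n), name="Diagnosis")
features = pd.DataFrame({
    "score_a": rng.normal(size=n) - 0.8 * target,  # built to relate negatively to the target
    "score_b": rng.normal(size=n),
    "flag_c": rng.integers(0, 2, size=n),
})

# Pearson correlation of every feature column with the target.
via_corrwith = features.corrwith(target)

# Same numbers obtained the way the notebook does it, via the full matrix.
via_matrix = pd.concat([features, target], axis=1).corr()["Diagnosis"].drop("Diagnosis")

# Rank by magnitude so strong negative correlations are not pushed to the end.
order = via_corrwith.abs().sort_values(ascending=False).index
ranked = via_corrwith.reindex(order)
print(ranked)
```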
%% Cell type:markdown id: tags:
Next steps: cross-validation with KFold, hyperparameter search with grid search or randomized search, and an array to store the results so they can be visualised.
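%% Cell type:markdown id: tags:
One possible shape for that plan, sketched with scikit-learn's `KFold` and `GridSearchCV`. Everything specific here is an assumption for illustration: the data is synthetic (`make_classification`), the model is a scaled logistic regression, and the grid over `C` is arbitrary. The notebook does not state which estimator or parameters will be used. The fitted search exposes `cv_results_`, which converts to a DataFrame that can be stored and plotted.
%% Cell type:code id: tags:
```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y (assumption; replace with the notebook's X, y).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Scaling inside the pipeline keeps each fold's test split out of the scaler fit.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names steps after the lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X_demo, y_demo)

# One row per parameter setting, ready to store or plot.
results = pd.DataFrame(search.cv_results_)[
    ["param_logisticregression__C", "mean_test_score", "std_test_score"]
]
print(results.sort_values("mean_test_score", ascending=False))
print("Best parameters:", search.best_params_)
```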