Commit 0384f6ad authored by wa2-alaaiddin

Task 2 Updated Version

parent 7b1d5e3f
%% Cell type:markdown id: tags:
# UFCFVQ-15-M Programming for Data Science
# Programming Task 2
## Student Id: 23003188
%% Cell type:markdown id: tags:
### Requirement FR2.1 - Read CSV data from a file (with a header row) into memory
%% Cell type:code id: tags:
``` python
# Import the pandas library
import pandas as pd
# Read the first dataset; the file's first row supplies the column names
Dataset_A = pd.read_csv('task2a.csv')
# Display the dataset
Dataset_A
```
%% Output
Unnamed: 0 id_student gender region \
0 0 11391 M East Anglian Region
1 1 28400 F Scotland
2 2 31604 F South East Region
3 3 32885 F West Midlands Region
4 4 38053 M Wales
... ... ... ... ...
26741 26741 2620947 F Scotland
26742 26742 2645731 F East Anglian Region
26743 26743 2648187 F South Region
26744 26744 2679821 F South East Region
26745 26745 2684003 F Yorkshire Region
highest_education age_band disability final_result score
0 HE Qualification 55<= N Pass 82.0
1 HE Qualification 35-55 N Pass 67.0
2 A Level or Equivalent 35-55 N Pass 76.0
3 Lower Than A Level 0-35 N Pass 55.0
4 A Level or Equivalent 35-55 N Pass 68.0
... ... ... ... ... ...
26741 A Level or Equivalent 0-35 Y Distinction 89.0
26742 Lower Than A Level 35-55 N Distinction 89.0
26743 A Level or Equivalent 0-35 Y Pass 77.0
26744 Lower Than A Level 35-55 N Withdrawn 92.0
26745 HE Qualification 35-55 N Distinction 83.0
[26746 rows x 9 columns]
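%% Cell type:markdown id: tags:
The `Unnamed: 0` column in the output above is what pandas produces when a CSV column has an empty header. A minimal sketch of one way to handle it, assuming (this is not confirmed for `task2a.csv`) that the column is a row index saved by an earlier export. The `csv_text` sample is a hypothetical stand-in for the real file.
%% Cell type:code id: tags:
```python
import io
import pandas as pd

# Hypothetical miniature stand-in for task2a.csv: the first column has no header name.
csv_text = ",id_student,score\n0,11391,82.0\n1,28400,67.0\n"

# Default read: the unnamed column is kept as data and labelled 'Unnamed: 0'.
default_read = pd.read_csv(io.StringIO(csv_text))

# Assumption: the unnamed column is a saved index, so use it as the index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```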
%% Cell type:markdown id: tags:
### Requirement FR2.2 - Read CSV data from a file (without a header row) into memory
%% Cell type:code id: tags:
``` python
# Read the second dataset; the file has no header row,
# so supply the column names ['id_student', 'click_events'] explicitly
Dataset_B = pd.read_csv('task2b.csv', names=['id_student', 'click_events'])
# Display the dataset
Dataset_B
```
%% Output
id_student click_events
0 6516 2791.0
1 8462 656.0
2 11391 934.0
3 23629 NaN
4 23698 910.0
... ... ...
26069 2698251 1511.0
26070 2698257 758.0
26071 2698535 4241.0
26072 2698577 717.0
26073 2698588 605.0
[26074 rows x 2 columns]
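%% Cell type:markdown id: tags:
To illustrate why `names=` matters for a headerless file, the sketch below reads the same text with and without it. The `csv_text` sample is hypothetical and only mimics the layout of `task2b.csv`. Without `names=`, pandas treats the first data row as the header, so one record is lost. The empty field also shows why `click_events` is read as a float column containing `NaN`.
%% Cell type:code id: tags:
```python
import io
import pandas as pd

# Hypothetical three-row sample with no header row; the second row has an empty click count.
csv_text = "6516,2791\n23629,\n8462,656\n"

# Without names=, the first data row is consumed as column labels.
wrong = pd.read_csv(io.StringIO(csv_text))

# With names=, every row is kept as data.
right = pd.read_csv(io.StringIO(csv_text), names=['id_student', 'click_events'])

print(list(wrong.columns), len(wrong))
print(list(right.columns), len(right))
print(right['click_events'].isna().sum())
```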
%% Cell type:markdown id: tags:
### Requirement FR2.3 - Merge the data from two Dataframes
%% Cell type:code id: tags:
``` python
# Merge the two datasets into one DataFrame.
# Both share the column 'id_student', so the merge is keyed on it
# (pd.merge performs an inner join by default).
Datasetframe = pd.merge(Dataset_A, Dataset_B, on=['id_student'])
# Display the merged dataset
Datasetframe
```
%% Output
Unnamed: 0 id_student gender region \
0 0 11391 M East Anglian Region
1 1 28400 F Scotland
2 2 31604 F South East Region
3 3 32885 F West Midlands Region
4 4 38053 M Wales
... ... ... ... ...
26716 26741 2620947 F Scotland
26717 26742 2645731 F East Anglian Region
26718 26743 2648187 F South Region
26719 26744 2679821 F South East Region
26720 26745 2684003 F Yorkshire Region
highest_education age_band disability final_result score \
0 HE Qualification 55<= N Pass 82.0
1 HE Qualification 35-55 N Pass 67.0
2 A Level or Equivalent 35-55 N Pass 76.0
3 Lower Than A Level 0-35 N Pass 55.0
4 A Level or Equivalent 35-55 N Pass 68.0
... ... ... ... ... ...
26716 A Level or Equivalent 0-35 Y Distinction 89.0
26717 Lower Than A Level 35-55 N Distinction 89.0
26718 A Level or Equivalent 0-35 Y Pass 77.0
26719 Lower Than A Level 35-55 N Withdrawn 92.0
26720 HE Qualification 35-55 N Distinction 83.0
click_events
0 934.0
1 1435.0
2 2158.0
3 1034.0
4 2445.0
... ...
26716 476.0
26717 893.0
26718 312.0
26719 275.0
26720 616.0
[26721 rows x 10 columns]
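%% Cell type:markdown id: tags:
Because `pd.merge` defaults to an inner join, rows whose `id_student` appears in only one DataFrame are dropped silently. The sketch below, on two small hypothetical frames, uses the `how='outer'` and `indicator=True` parameters of `pd.merge` to count how many keys match on each side. It is an optional diagnostic, not part of the original notebook.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical frames sharing the key column 'id_student'.
left = pd.DataFrame({'id_student': [1, 2, 3], 'score': [82.0, 67.0, 76.0]})
right = pd.DataFrame({'id_student': [2, 3, 4], 'click_events': [656.0, 934.0, 910.0]})

# Default inner join: only keys present in both frames survive.
inner = pd.merge(left, right, on='id_student')

# Outer join with an indicator column ('_merge') recording where each row came from.
outer = pd.merge(left, right, on='id_student', how='outer', indicator=True)
counts = outer['_merge'].value_counts()

print(len(inner))
print(counts)
```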
%% Cell type:markdown id: tags:
### Requirement FR2.4 - Remove any rows that contain missing values
%% Cell type:code id: tags:
``` python
# Count the missing values in each column before dropping any rows
Datasetframe.isnull().sum()
```
%% Output
Unnamed: 0 0
id_student 0
gender 0
region 0
highest_education 0
age_band 0
disability 0
final_result 0
score 19
click_events 1371
dtype: int64
%% Cell type:code id: tags:
``` python
# Drop every row that contains at least one missing value
Datasetframe = Datasetframe.dropna()
# Count the missing values again to confirm none remain
Datasetframe.isnull().sum()
```
%% Output
Unnamed: 0 0
id_student 0
gender 0
region 0
highest_education 0
age_band 0
disability 0
final_result 0
score 0
click_events 0
dtype: int64
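%% Cell type:markdown id: tags:
A small sketch of the same count-then-drop pattern on a hypothetical three-row frame, to make the behaviour of `dropna()` concrete: a row is removed if any of its columns is missing, so the number of rows removed can be smaller than the sum of the per-column counts when one row has several gaps.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical data: row 1 lacks a score, row 2 lacks both values.
df = pd.DataFrame({
    'score': [82.0, None, None],
    'click_events': [934.0, 656.0, None],
})

# Per-column count of missing values.
missing_per_column = df.isnull().sum()

# dropna() removes any row with at least one missing value.
cleaned = df.dropna()

print(missing_per_column)
print(len(df) - len(cleaned))
```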
%% Cell type:markdown id: tags:
### Requirement FR2.5 - Filter out unnecessary rows
%% Cell type:code id: tags:
``` python
# Remove every row whose click_events value is below 10
Datasetframe = Datasetframe.drop(Datasetframe[Datasetframe['click_events'] < 10].index)
# Display the filtered dataset
Datasetframe
```
%% Output
Unnamed: 0 id_student gender region \
0 0 11391 M East Anglian Region
1 1 28400 F Scotland
2 2 31604 F South East Region
3 3 32885 F West Midlands Region
4 4 38053 M Wales
... ... ... ... ...
26716 26741 2620947 F Scotland
26717 26742 2645731 F East Anglian Region
26718 26743 2648187 F South Region
26719 26744 2679821 F South East Region
26720 26745 2684003 F Yorkshire Region
highest_education age_band disability final_result score \
0 HE Qualification 55<= N Pass 82.0
1 HE Qualification 35-55 N Pass 67.0
2 A Level or Equivalent 35-55 N Pass 76.0
3 Lower Than A Level 0-35 N Pass 55.0
4 A Level or Equivalent 35-55 N Pass 68.0
... ... ... ... ... ...
26716 A Level or Equivalent 0-35 Y Distinction 89.0
26717 Lower Than A Level 35-55 N Distinction 89.0
26718 A Level or Equivalent 0-35 Y Pass 77.0
26719 Lower Than A Level 35-55 N Withdrawn 92.0
26720 HE Qualification 35-55 N Distinction 83.0
click_events
0 934.0
1 1435.0
2 2158.0
3 1034.0
4 2445.0
... ...
26716 476.0
26717 893.0
26718 312.0
26719 275.0
26720 616.0
[25259 rows x 10 columns]
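%% Cell type:markdown id: tags:
The drop-by-index filter can also be written as a boolean mask that keeps the rows to retain. The sketch below compares both forms on a hypothetical frame. They agree here because no values are missing; if `click_events` contained `NaN`, the mask form would discard those rows while the drop form would keep them.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical data with two rows below the threshold of 10 clicks.
df = pd.DataFrame({
    'id_student': [1, 2, 3, 4],
    'click_events': [934.0, 5.0, 10.0, 9.0],
})

# Form used in the notebook: drop the rows that fail the condition.
dropped = df.drop(df[df['click_events'] < 10].index)

# Alternative: keep the rows that satisfy the opposite condition.
masked = df[df['click_events'] >= 10]

print(masked)
```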
%% Cell type:markdown id: tags:
### Requirement FR2.6 - Rename the score column
%% Cell type:code id: tags:
``` python
# add code here
```
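%% Cell type:markdown id: tags:
The cell above is still a placeholder. One possible approach, sketched on a hypothetical one-row frame, is `DataFrame.rename` with a `columns` mapping. The target name `assessment_score` is purely illustrative; the notebook does not state what the new name should be.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical stand-in for the merged DataFrame.
df = pd.DataFrame({'id_student': [11391], 'score': [82.0]})

# 'assessment_score' is an assumed name chosen only for illustration.
renamed = df.rename(columns={'score': 'assessment_score'})

print(list(renamed.columns))
```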
%% Cell type:markdown id: tags:
### Requirement FR2.7 - Remove unnecessary column(s)
%% Cell type:code id: tags:
``` python
# add code here
```
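%% Cell type:markdown id: tags:
The cell above is still a placeholder. A minimal sketch using `DataFrame.drop(columns=...)`, assuming `Unnamed: 0` is one of the columns considered unnecessary (it duplicates the row index in the outputs shown earlier). Which columns to remove is a judgment the notebook does not record.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical stand-in for the merged DataFrame.
df = pd.DataFrame({
    'Unnamed: 0': [0, 1],
    'id_student': [11391, 28400],
    'score': [82.0, 67.0],
})

# Assumption: 'Unnamed: 0' is redundant and can be removed.
trimmed = df.drop(columns=['Unnamed: 0'])

print(list(trimmed.columns))
```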
%% Cell type:markdown id: tags:
### Requirement FR2.8 - Write the DataFrame data to a CSV file
%% Cell type:code id: tags:
``` python
# add code here
```
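%% Cell type:markdown id: tags:
The cell above is still a placeholder. A sketch of writing with `DataFrame.to_csv`, using an in-memory buffer in place of a real file so the example stays self-contained; in the notebook a file path would be passed instead. Passing `index=False` avoids writing the row index, which is what would otherwise reappear as an `Unnamed: 0` column on the next read.
%% Cell type:code id: tags:
```python
import io
import pandas as pd

# Hypothetical stand-in for the cleaned DataFrame.
df = pd.DataFrame({'id_student': [11391, 28400], 'score': [82.0, 67.0]})

# Write without the row index; a file path could be used instead of the buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read the text back to check that the round trip preserves the data.
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(list(round_trip.columns))
```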
%% Cell type:markdown id: tags:
### Requirement FR2.9 - Investigate the effects of age-group on attainment and engagement
%% Cell type:code id: tags:
``` python
# add code here
```
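%% Cell type:markdown id: tags:
The cell above is still a placeholder. One way to begin the investigation, sketched here as an assumption rather than the intended method, is to group by `age_band` and compare mean `score` (attainment) and mean `click_events` (engagement). The four rows below are illustrative values only, not results from the dataset.
%% Cell type:code id: tags:
```python
import pandas as pd

# Illustrative rows only; not a summary of the real data.
df = pd.DataFrame({
    'age_band': ['0-35', '0-35', '35-55', '55<='],
    'score': [55.0, 89.0, 67.0, 82.0],
    'click_events': [1034.0, 476.0, 1435.0, 934.0],
})

# Mean attainment and engagement per age band.
summary = df.groupby('age_band')[['score', 'click_events']].mean()

print(summary)
```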
%% Cell type:markdown id: tags:
### Requirement FR2.10 - Present the results of the age-group investigation using an appropriate visualisation
%% Cell type:code id: tags:
``` python
# add code here
```
%% Cell type:markdown id: tags:
### Requirement FR2.11 - Investigate the effects of engagement on attainment
%% Cell type:code id: tags:
``` python
# add code here
```
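%% Cell type:markdown id: tags:
The cell above is still a placeholder. A simple starting point, assuming a linear relationship is of interest, is the Pearson correlation between `click_events` and `score` via `Series.corr`. The values below are illustrative and chosen to be perfectly linear so the result is easy to check.
%% Cell type:code id: tags:
```python
import pandas as pd

# Illustrative, perfectly linear data; real data will be noisier.
df = pd.DataFrame({
    'click_events': [100.0, 200.0, 300.0, 400.0],
    'score': [50.0, 60.0, 70.0, 80.0],
})

# Pearson correlation coefficient (the default method of Series.corr).
r = df['click_events'].corr(df['score'])

print(round(r, 6))
```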
%% Cell type:markdown id: tags:
### Requirement FR2.12 - Test the hypothesis that there is a significant effect on attainment
%% Cell type:code id: tags:
``` python
# add code here
```
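%% Cell type:markdown id: tags:
The cell above is still a placeholder, and the notebook does not say which test is intended. As one option, the sketch below runs a permutation test on the correlation between `click_events` and `score`: it shuffles the scores many times and counts how often a correlation at least as strong as the observed one arises by chance. The data, the seed, and the choice of 2000 permutations are all assumptions made for illustration.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Illustrative data with a strong positive trend; not taken from the dataset.
df = pd.DataFrame({
    'click_events': [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0],
    'score': [52.0, 55.0, 61.0, 60.0, 70.0, 72.0, 79.0, 85.0],
})

clicks = df['click_events'].to_numpy()
scores = df['score'].to_numpy()
observed = np.corrcoef(clicks, scores)[0, 1]

# Null hypothesis: no association, so shuffling the scores should not matter.
rng = np.random.default_rng(0)  # fixed seed (assumed) for repeatability
n_permutations = 2000
extreme = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(scores)
    if abs(np.corrcoef(clicks, shuffled)[0, 1]) >= abs(observed):
        extreme += 1

# Two-sided p-value with the usual +1 correction so it is never exactly zero.
p_value = (extreme + 1) / (n_permutations + 1)

print(p_value < 0.05)
```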
%% Cell type:markdown id: tags:
# Process Development Report for Task 2
......