Commit 9c371bb2 authored by UWE_ 23086369_2023

Saving the 'final_data_frame'

parent 068deae4
%% Cell type:markdown id: tags:
# UFCFVQ-15-M Programming for Data Science
# Programming Task 2
## Student Id: 23086369
%% Cell type:markdown id: tags:
### Requirement FR2.1 - Read CSV data from a file (with a header row) into memory
%% Cell type:code id: tags:
``` python
# Import the pandas library.
import pandas as pd
# Read the CSV file (which has a header row) into a DataFrame.
df = pd.read_csv("/Users/mscdatascience/Documents/assignment-PDS/mohammad_alsuulaimani_uwe_23086369_2023/task2a.csv")
# Display the first five rows of the DataFrame to quickly examine its structure and content.
print(df.head())
```
%% Output
   Unnamed: 0  id_student  gender                region      highest_education  \
0           0       11391       M   East Anglian Region       HE Qualification
1           1       28400       F              Scotland       HE Qualification
2           2       31604       F     South East Region  A Level or Equivalent
3           3       32885       F  West Midlands Region     Lower Than A Level
4           4       38053       M                 Wales  A Level or Equivalent

   age_band  disability  final_result  score
0      55<=           N          Pass   82.0
1     35-55           N          Pass   67.0
2     35-55           N          Pass   76.0
3      0-35           N          Pass   55.0
4     35-55           N          Pass   68.0
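%% Cell type:markdown id: tags:
The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was written with its row index and then read back without `index_col`. The sketch below assumes that is what happened with `task2a.csv`; it uses a small in-memory sample (the hypothetical `sample_csv`) rather than the real file, so the values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical two-row sample mimicking the layout of task2a.csv:
# the first header field is empty because a row index was written out.
sample_csv = io.StringIO(
    ",id_student,gender,score\n"
    "0,11391,M,82.0\n"
    "1,28400,F,67.0\n"
)

# index_col=0 tells pandas to use the first column as the row index
# instead of loading it as a data column named 'Unnamed: 0'.
sample_df = pd.read_csv(sample_csv, index_col=0)
print(sample_df.columns.tolist())
```

Whether the index column should be kept or discarded depends on the assignment brief, so this is an option to consider rather than a required change.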
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.2 - Read CSV data from a file (without a header row) into memory
%% Cell type:code id: tags:
``` python
# Import the pandas library.
import pandas as pd
# Read the CSV file, which has no header row, into a DataFrame.
# The columns are labelled 'id_student' and 'click_events'.
df = pd.read_csv("/Users/mscdatascience/Documents/assignment-PDS/mohammad_alsuulaimani_uwe_23086369_2023/task2b.csv", names=['id_student', 'click_events'])
# Display the first five rows of the DataFrame to quickly examine its structure and content.
print(df.head())
```
%% Output
   id_student  click_events
0        6516        2791.0
1        8462         656.0
2       11391         934.0
3       23629           NaN
4       23698         910.0
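%% Cell type:markdown id: tags:
To show why `names=` matters for a file without a header row, the sketch below reads the same hypothetical three-line text twice. The text in `raw_text` is made up to resemble the two-column layout of `task2b.csv`; it is not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical headerless sample; the last row has a missing click count.
raw_text = "6516,2791\n8462,656\n23629,\n"

# With default settings the first data row is consumed as the header.
wrong_df = pd.read_csv(io.StringIO(raw_text))

# Passing names= supplies the column labels and keeps every row as data.
right_df = pd.read_csv(io.StringIO(raw_text), names=["id_student", "click_events"])

print(len(wrong_df), len(right_df))
```

The empty field in the last row is read as `NaN`, which is also why `click_events` is displayed as a float column in the output above.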
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.3 - Merge the data from two Dataframes
%% Cell type:code id: tags:
``` python
# Import the pandas library.
import pandas as pd
# Read the two CSV files into Dataframe1 and Dataframe2.
Dataframe1 = pd.read_csv('task2a.csv')
Dataframe2 = pd.read_csv('task2b.csv', names=['id_student', 'click_events'])
# Merge Dataframe1 and Dataframe2 into a new DataFrame.
# An inner merge keeps only the rows whose 'id_student' value appears in both DataFrames.
merged_data_frame = pd.merge(Dataframe1, Dataframe2, on='id_student', how='inner')
# Display the first five rows of the merged DataFrame.
print(merged_data_frame.head())
```
%% Output
   Unnamed: 0  id_student  gender                region      highest_education  \
0           0       11391       M   East Anglian Region       HE Qualification
1           1       28400       F              Scotland       HE Qualification
2           2       31604       F     South East Region  A Level or Equivalent
3           3       32885       F  West Midlands Region     Lower Than A Level
4           4       38053       M                 Wales  A Level or Equivalent

   age_band  disability  final_result  score  click_events
0      55<=           N          Pass   82.0         934.0
1     35-55           N          Pass   67.0        1435.0
2     35-55           N          Pass   76.0        2158.0
3      0-35           N          Pass   55.0        1034.0
4     35-55           N          Pass   68.0        2445.0
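%% Cell type:markdown id: tags:
An inner merge silently discards rows that have no match on the other side. One way to see how many rows that affects is an outer merge with `indicator=True`. The sketch below uses two hypothetical miniature frames (`students` and `clicks`) with invented ids 99999 and 6516 that have no partner; it is a check under those assumptions, not part of the submitted solution.

```python
import pandas as pd

# Hypothetical miniature versions of the two DataFrames.
students = pd.DataFrame(
    {"id_student": [11391, 28400, 99999], "score": [82.0, 67.0, 50.0]}
)
clicks = pd.DataFrame(
    {"id_student": [11391, 28400, 6516], "click_events": [934.0, 1435.0, 2791.0]}
)

# indicator=True adds a '_merge' column recording whether each row came
# from the left frame only, the right frame only, or both.
audit = pd.merge(students, clicks, on="id_student", how="outer", indicator=True)
print(audit["_merge"].value_counts())

# The inner merge keeps exactly the rows marked 'both' in the audit.
inner = pd.merge(students, clicks, on="id_student", how="inner")
print(len(inner))
```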
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.4 - Remove any rows that contain missing values
%% Cell type:code id: tags:
``` python
# Removing rows containing missing values
cleaned_data_frame = merged_data_frame.dropna()
# Displaying the cleaned new DataFrame
print(cleaned_data_frame.head())
```
%% Output
   Unnamed: 0  id_student  gender                region      highest_education  \
0           0       11391       M   East Anglian Region       HE Qualification
1           1       28400       F              Scotland       HE Qualification
2           2       31604       F     South East Region  A Level or Equivalent
3           3       32885       F  West Midlands Region     Lower Than A Level
4           4       38053       M                 Wales  A Level or Equivalent

   age_band  disability  final_result  score  click_events
0      55<=           N          Pass   82.0         934.0
1     35-55           N          Pass   67.0        1435.0
2     35-55           N          Pass   76.0        2158.0
3      0-35           N          Pass   55.0        1034.0
4     35-55           N          Pass   68.0        2445.0
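%% Cell type:markdown id: tags:
The first five rows look identical before and after `dropna()`, so `head()` alone does not show whether anything was removed. A minimal sketch of a more informative check, using a hypothetical four-row `sample` with invented values:

```python
import pandas as pd

# Hypothetical sample with one missing score and one missing click count.
sample = pd.DataFrame(
    {
        "id_student": [1, 2, 3, 4],
        "score": [82.0, float("nan"), 76.0, 55.0],
        "click_events": [934.0, 1435.0, float("nan"), 1034.0],
    }
)

# Per-column count of missing values before cleaning.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# dropna() with default arguments removes every row that has at least one NaN.
cleaned = sample.dropna()
print(f"rows before: {len(sample)}, rows after: {len(cleaned)}")
```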
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.5 - Filter out unnecessary rows
%% Cell type:code id: tags:
``` python
# Filter out rows where 'click_events' is less than 10 (keep rows with 10 or more)
filtered_data_frame = cleaned_data_frame[cleaned_data_frame['click_events'] >= 10]
# Displaying the filtered DataFrame
print(filtered_data_frame.head())
```
%% Output
   Unnamed: 0  id_student  gender                region      highest_education  \
0           0       11391       M   East Anglian Region       HE Qualification
1           1       28400       F              Scotland       HE Qualification
2           2       31604       F     South East Region  A Level or Equivalent
3           3       32885       F  West Midlands Region     Lower Than A Level
4           4       38053       M                 Wales  A Level or Equivalent

   age_band  disability  final_result  score  click_events
0      55<=           N          Pass   82.0         934.0
1     35-55           N          Pass   67.0        1435.0
2     35-55           N          Pass   76.0        2158.0
3      0-35           N          Pass   55.0        1034.0
4     35-55           N          Pass   68.0        2445.0
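%% Cell type:markdown id: tags:
The filter above is a boolean mask applied to the DataFrame. The sketch below spells that out on a hypothetical three-row `sample`; the constant name `MIN_CLICKS` is introduced here for illustration, with the value 10 taken from the cell above.

```python
import pandas as pd

MIN_CLICKS = 10  # same threshold as in the filtering cell

# Hypothetical sample: one row below, one at, and one above the threshold.
sample = pd.DataFrame(
    {"id_student": [1, 2, 3], "click_events": [934.0, 9.0, 10.0]}
)

# Boolean mask: True for the rows to keep (10 or more click events).
keep = sample["click_events"] >= MIN_CLICKS

# .copy() makes the result an independent DataFrame, which avoids
# SettingWithCopyWarning if columns are modified later.
filtered = sample[keep].copy()
print(f"removed {len(sample) - len(filtered)} of {len(sample)} rows")
```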
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.6 - Rename the score column
%% Cell type:code id: tags:
``` python
# Renaming the 'score' column to 'final_mark'
renamed_data_frame = filtered_data_frame.rename(columns={'score': 'final_mark'})
# Displaying the DataFrame with the renamed column
print(renamed_data_frame.head())
```
%% Output
   Unnamed: 0  id_student  gender                region      highest_education  \
0           0       11391       M   East Anglian Region       HE Qualification
1           1       28400       F              Scotland       HE Qualification
2           2       31604       F     South East Region  A Level or Equivalent
3           3       32885       F  West Midlands Region     Lower Than A Level
4           4       38053       M                 Wales  A Level or Equivalent

   age_band  disability  final_result  final_mark  click_events
0      55<=           N          Pass        82.0         934.0
1     35-55           N          Pass        67.0        1435.0
2     35-55           N          Pass        76.0        2158.0
3      0-35           N          Pass        55.0        1034.0
4     35-55           N          Pass        68.0        2445.0
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.7 - Remove unnecessary column(s)
%% Cell type:code id: tags:
``` python
# Remove unnecessary columns from 'renamed_data_frame' using the 'drop' method.
# The result is stored in 'final_data_frame', which no longer includes the 'region', 'final_result' and 'highest_education' columns.
final_data_frame = renamed_data_frame.drop(columns=['region', 'final_result', 'highest_education'])
# Display the first five rows of the final DataFrame
print(final_data_frame.head())
```
%% Output
   Unnamed: 0  id_student  gender  age_band  disability  final_mark  click_events
0           0       11391       M      55<=           N        82.0         934.0
1           1       28400       F     35-55           N        67.0        1435.0
2           2       31604       F     35-55           N        76.0        2158.0
3           3       32885       F      0-35           N        55.0        1034.0
4           4       38053       M     35-55           N        68.0        2445.0
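%% Cell type:markdown id: tags:
The output above still contains `Unnamed: 0`. If that column is also considered unnecessary (an assumption, since the brief is not shown here), it can be added to the same `drop` call. The sketch below uses a hypothetical two-row `sample` and `errors="ignore"`, which skips labels that are not present instead of raising `KeyError`.

```python
import pandas as pd

# Hypothetical sample containing only some of the columns to be dropped.
sample = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1],
        "id_student": [11391, 28400],
        "region": ["East Anglian Region", "Scotland"],
        "final_mark": [82.0, 67.0],
    }
)

# 'final_result' and 'highest_education' are absent from this sample;
# errors="ignore" lets the call succeed anyway.
columns_to_drop = ["region", "final_result", "highest_education", "Unnamed: 0"]
trimmed = sample.drop(columns=columns_to_drop, errors="ignore")
print(trimmed.columns.tolist())
```

With the default `errors="raise"`, a misspelled column name fails loudly, which is usually the safer choice for a fixed dataset.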
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.8 - Write the DataFrame data to a CSV file
%% Cell type:code id: tags:
``` python
# Save 'final_data_frame' to a CSV file called 'updated.csv'.
# Passing 'index=False' ensures that the CSV file does not include the row index.
final_data_frame.to_csv('updated.csv', index=False)
```
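%% Cell type:markdown id: tags:
A quick way to confirm that `index=False` behaves as described is to write a frame out and read it straight back. The sketch below does this with a hypothetical two-row `sample` and an in-memory buffer, so it does not touch `updated.csv`.

```python
import io

import pandas as pd

# Hypothetical sample standing in for final_data_frame.
sample = pd.DataFrame({"id_student": [11391, 28400], "final_mark": [82.0, 67.0]})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

# Reading the text back should reproduce the same columns and values,
# with no extra 'Unnamed: 0' column.
reloaded = pd.read_csv(buffer)
print(reloaded.columns.tolist())
print(reloaded.equals(sample))
```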
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.9 - Investigate the effects of age-group on attainment and engagement
%% Cell type:code id: tags:
``` python
# add code here
```
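%% Cell type:markdown id: tags:
One possible starting point for this requirement is to group by `age_band` and summarise attainment (`final_mark`) and engagement (`click_events`). The sketch below assumes those column names from the earlier cells and runs on a hypothetical five-row `sample`; the numbers are invented and are not results from the real data.

```python
import pandas as pd

# Hypothetical data; values are made up for illustration only.
sample = pd.DataFrame(
    {
        "age_band": ["0-35", "0-35", "35-55", "35-55", "55<="],
        "final_mark": [55.0, 65.0, 67.0, 76.0, 82.0],
        "click_events": [1000.0, 1200.0, 1400.0, 2200.0, 900.0],
    }
)

# Mean, median and group size per age band for both measures.
# The count matters because a mean over very few students is unreliable.
summary = sample.groupby("age_band")[["final_mark", "click_events"]].agg(
    ["mean", "median", "count"]
)
print(summary)
```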
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.10 - Present the results of the age-group investigation using an appropriate visualisation
%% Cell type:code id: tags:
``` python
# add code here
```
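%% Cell type:markdown id: tags:
A grouped summary by age band could be presented as two side-by-side bar charts, since `final_mark` and `click_events` are on very different scales. The sketch below assumes matplotlib is installed and again uses a hypothetical `sample` with invented values; the `Agg` backend line is only there so the code runs without a display and can be omitted inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; values are made up for illustration only.
sample = pd.DataFrame(
    {
        "age_band": ["0-35", "0-35", "35-55", "35-55", "55<="],
        "final_mark": [55.0, 65.0, 67.0, 76.0, 82.0],
        "click_events": [1000.0, 1200.0, 1400.0, 2200.0, 900.0],
    }
)
means = sample.groupby("age_band")[["final_mark", "click_events"]].mean()

# One subplot per measure so each keeps its own y-axis scale.
fig, (ax_mark, ax_clicks) = plt.subplots(1, 2, figsize=(10, 4))
means["final_mark"].plot.bar(ax=ax_mark, title="Mean final mark by age band")
means["click_events"].plot.bar(ax=ax_clicks, title="Mean click events by age band")
ax_mark.set_ylabel("final_mark")
ax_clicks.set_ylabel("click_events")
fig.tight_layout()
```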
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.11 - Investigate the effects of engagement on attainment
%% Cell type:code id: tags:
``` python
# add code here
```
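%% Cell type:markdown id: tags:
A simple first look at whether engagement relates to attainment is the Pearson correlation between `click_events` and `final_mark`. The sketch below assumes those column names and uses a hypothetical, deliberately linear `sample`, so the coefficient it prints says nothing about the real data.

```python
import pandas as pd

# Hypothetical, perfectly linear data chosen so the result is easy to verify.
sample = pd.DataFrame(
    {
        "click_events": [100.0, 200.0, 300.0, 400.0],
        "final_mark": [40.0, 50.0, 60.0, 70.0],
    }
)

# Series.corr computes the Pearson correlation coefficient by default.
r = sample["click_events"].corr(sample["final_mark"])
print(round(r, 3))
```

A coefficient near 1 or -1 indicates a strong linear association, but it does not by itself show that more engagement causes higher marks.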
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Adherence to good coding style
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
# Process Development Report for Task 2
%% Cell type:markdown id: tags:
### Write here
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
updated.csv 0 → 100644