UWE_ 23086369_2023 / ThePDSProject
Commit 9c371bb2 authored 1 year ago by UWE_ 23086369_2023
Saving the 'final_data_frame'
parent 068deae4
Changes: 2
Showing 2 changed files with 25288 additions and 5 deletions
UFCFVQ-15-M Programming Task 2 Template.ipynb: 28 additions, 5 deletions
updated.csv: 25260 additions, 0 deletions

UFCFVQ-15-M Programming Task 2 Template.ipynb +28 −5 @ 9c371bb2
@@ -328,11 +328,31 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 38,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "   Unnamed: 0  id_student gender age_band disability  final_mark  click_events\n",
+      "0           0       11391      M     55<=          N        82.0         934.0\n",
+      "1           1       28400      F    35-55          N        67.0        1435.0\n",
+      "2           2       31604      F    35-55          N        76.0        2158.0\n",
+      "3           3       32885      F     0-35          N        55.0        1034.0\n",
+      "4           4       38053      M    35-55          N        68.0        2445.0\n"
+     ]
+    }
+   ],
    "source": [
-    "# add code here"
+    "# Removing unnecessary columns from 'renamed_data_frame' by using the 'drop' method.\n",
+    "# The result is stored in 'final_data_frame', which no longer includes the 'region', 'final_result' and 'highest_education' columns.\n",
+    "\n",
+    "final_data_frame = renamed_data_frame.drop(columns=['region', 'final_result', 'highest_education'])\n",
+    "\n",
+    "# Displaying the Final DataFrame\n",
+    "\n",
+    "print(final_data_frame.head())"
    ]
   },
   {
@@ -352,11 +372,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 39,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# add code here"
+    "# Saving the 'final_data_frame' to a CSV file called 'updated.csv'.\n",
+    "# By using the 'index=False' parameter I can ensure that the CSV file does not include row indices.\n",
+    "\n",
+    "final_data_frame.to_csv('updated.csv', index=False)\n"
    ]
   },
   {
%% Cell type:markdown id: tags:
# UFCFVQ-15-M Programming for Data Science
# Programming Task 2
## Student Id: 23086369
%% Cell type:markdown id: tags:
### Requirement FR2.1 - Read CSV data from a file (with a header row) into memory
%% Cell type:code id: tags:
```python
# Importing the pandas library
import pandas as pd

# Read data from a CSV file and store it in a DataFrame.
df = pd.read_csv("/Users/mscdatascience/Documents/assignment-PDS/mohammad_alsuulaimani_uwe_23086369_2023/task2a.csv")

# Display the first five rows of the DataFrame to quickly examine its structure and content.
print(df.head())
```
%% Output
   Unnamed: 0  id_student gender                region      highest_education  \
0           0       11391      M   East Anglian Region       HE Qualification
1           1       28400      F              Scotland       HE Qualification
2           2       31604      F     South East Region  A Level or Equivalent
3           3       32885      F  West Midlands Region     Lower Than A Level
4           4       38053      M                 Wales  A Level or Equivalent

  age_band disability final_result  score
0     55<=          N         Pass   82.0
1    35-55          N         Pass   67.0
2    35-55          N         Pass   76.0
3     0-35          N         Pass   55.0
4    35-55          N         Pass   68.0
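%% Cell type:markdown id: tags:
The `Unnamed: 0` column in the output above is the name pandas gives to a CSV column whose header is empty, which usually means the file was written with its row index. The following is a minimal sketch, not part of the committed notebook, and it assumes `task2a.csv` has that shape. The inline `sample_csv` text is hypothetical stand-in data, and `index_col=0` asks `read_csv` to treat the first column as the index instead of a data column.
%% Cell type:code id: tags:
```python
import io

import pandas as pd

# Hypothetical stand-in for task2a.csv: the first header field is empty.
sample_csv = ",id_student,gender,score\n0,11391,M,82.0\n1,28400,F,67.0\n"

# Default behaviour: the unnamed first column becomes 'Unnamed: 0'.
default_df = pd.read_csv(io.StringIO(sample_csv))

# With index_col=0 the first column is used as the row index instead.
indexed_df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```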
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.2 - Read CSV data from a file (without a header row) into memory
%% Cell type:code id: tags:
```python
# Importing the pandas library
import pandas as pd

# Read data from a CSV file.
# The columns are labeled as 'id_student' and 'click_events'.
df = pd.read_csv("/Users/mscdatascience/Documents/assignment-PDS/mohammad_alsuulaimani_uwe_23086369_2023/task2b.csv", names=['id_student', 'click_events'])

# Display the first five rows of the DataFrame to quickly examine its structure and content.
print(df.head())
```
%% Output
   id_student  click_events
0        6516        2791.0
1        8462         656.0
2       11391         934.0
3       23629           NaN
4       23698         910.0
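%% Cell type:markdown id: tags:
Passing `names=` matters here because `read_csv` otherwise treats the first line of the file as the header. The sketch below is illustrative only and is not part of the committed notebook. The inline `headerless_csv` text is hypothetical data shaped like the output above, including one empty `click_events` field.
%% Cell type:code id: tags:
```python
import io

import pandas as pd

# Hypothetical headerless data: three rows, the last one with a missing value.
headerless_csv = "6516,2791.0\n8462,656.0\n23629,\n"

# With names=, every line is read as data and the columns get explicit labels.
clicks = pd.read_csv(io.StringIO(headerless_csv), names=["id_student", "click_events"])

# Without names=, the first data row is consumed as the header.
misread = pd.read_csv(io.StringIO(headerless_csv))

print(len(clicks), list(clicks.columns))
print(len(misread), list(misread.columns))
```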
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.3 - Merge the data from two Dataframes
%% Cell type:code id: tags:
```python
# Importing the pandas library
import pandas as pd

# Read the two CSV files into Dataframe1 and Dataframe2.
Dataframe1 = pd.read_csv('task2a.csv')
Dataframe2 = pd.read_csv('task2b.csv', names=['id_student', 'click_events'])

# Merging Dataframe1 and Dataframe2 into a new DataFrame.
# An 'inner' merge keeps only the rows whose 'id_student' value appears in both DataFrames.
merged_data_frame = pd.merge(Dataframe1, Dataframe2, on='id_student', how='inner')

# Display the first five rows of the merged DataFrame.
print(merged_data_frame.head())
```
%% Output
   Unnamed: 0  id_student gender                region      highest_education  \
0           0       11391      M   East Anglian Region       HE Qualification
1           1       28400      F              Scotland       HE Qualification
2           2       31604      F     South East Region  A Level or Equivalent
3           3       32885      F  West Midlands Region     Lower Than A Level
4           4       38053      M                 Wales  A Level or Equivalent

  age_band disability final_result  score  click_events
0     55<=          N         Pass   82.0         934.0
1    35-55          N         Pass   67.0        1435.0
2    35-55          N         Pass   76.0        2158.0
3     0-35          N         Pass   55.0        1034.0
4    35-55          N         Pass   68.0        2445.0
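%% Cell type:markdown id: tags:
An inner merge silently discards any `id_student` that appears in only one of the two frames. The sketch below is not part of the committed notebook and uses small hypothetical frames named `students` and `clicks`. It shows one way to see which ids would be lost, using the `indicator=True` option of `pd.merge` on an outer merge.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical miniature versions of the two input frames.
students = pd.DataFrame({"id_student": [11391, 28400, 99999], "score": [82.0, 67.0, 40.0]})
clicks = pd.DataFrame({"id_student": [6516, 11391, 28400], "click_events": [2791.0, 934.0, 1435.0]})

# Inner merge: only ids present in both frames survive.
inner = pd.merge(students, clicks, on="id_student", how="inner")

# Outer merge with indicator=True adds a '_merge' column naming each row's origin.
outer = pd.merge(students, clicks, on="id_student", how="outer", indicator=True)
unmatched = outer[outer["_merge"] != "both"]

print(inner)
print(unmatched[["id_student", "_merge"]])
```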
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.4 - Remove any rows that contain missing values
%% Cell type:code id: tags:
```python
# Removing rows containing missing values
cleaned_data_frame = merged_data_frame.dropna()

# Displaying the cleaned new DataFrame
print(cleaned_data_frame.head())
```
%% Output
   Unnamed: 0  id_student gender                region      highest_education  \
0           0       11391      M   East Anglian Region       HE Qualification
1           1       28400      F              Scotland       HE Qualification
2           2       31604      F     South East Region  A Level or Equivalent
3           3       32885      F  West Midlands Region     Lower Than A Level
4           4       38053      M                 Wales  A Level or Equivalent

  age_band disability final_result  score  click_events
0     55<=          N         Pass   82.0         934.0
1    35-55          N         Pass   67.0        1435.0
2    35-55          N         Pass   76.0        2158.0
3     0-35          N         Pass   55.0        1034.0
4    35-55          N         Pass   68.0        2445.0
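%% Cell type:markdown id: tags:
Because `head()` shows the same five rows before and after cleaning, the printed output alone does not confirm that anything was removed. The sketch below is not part of the committed notebook and uses a hypothetical three-row frame named `merged`. It compares row counts and shows the `subset=` option of `dropna` for cases where only some columns should be checked.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Hypothetical merged data with one missing value in each of two different rows.
merged = pd.DataFrame({
    "id_student": [11391, 23629, 28400],
    "score": [82.0, 71.0, np.nan],
    "click_events": [934.0, np.nan, 1435.0],
})

# dropna() removes every row that has a missing value in any column.
cleaned = merged.dropna()

# subset= restricts the check to the listed columns.
only_clicks = merged.dropna(subset=["click_events"])

print(len(merged), len(cleaned), len(only_clicks))
```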
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.5 - Filter out unnecessary rows
%% Cell type:code id: tags:
```python
# Filtering out unnecessary rows where 'click_events' is smaller than 10
filtered_data_frame = cleaned_data_frame[cleaned_data_frame['click_events'] >= 10]

# Displaying the filtered DataFrame
print(filtered_data_frame.head())
```
%% Output
   Unnamed: 0  id_student gender                region      highest_education  \
0           0       11391      M   East Anglian Region       HE Qualification
1           1       28400      F              Scotland       HE Qualification
2           2       31604      F     South East Region  A Level or Equivalent
3           3       32885      F  West Midlands Region     Lower Than A Level
4           4       38053      M                 Wales  A Level or Equivalent

  age_band disability final_result  score  click_events
0     55<=          N         Pass   82.0         934.0
1    35-55          N         Pass   67.0        1435.0
2    35-55          N         Pass   76.0        2158.0
3     0-35          N         Pass   55.0        1034.0
4    35-55          N         Pass   68.0        2445.0
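%% Cell type:markdown id: tags:
The filter above works by building a boolean mask and indexing the frame with it. The sketch below is not part of the committed notebook and uses a hypothetical frame named `cleaned`. It makes the mask explicit and shows that the boundary value 10 is kept. Taking a `.copy()` of the result is an optional choice made here so that later edits cannot be confused with edits to the original frame.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical data with values on both sides of the threshold.
cleaned = pd.DataFrame({"id_student": [1, 2, 3, 4], "click_events": [934.0, 9.0, 10.0, 0.0]})

# The comparison yields one True/False value per row.
mask = cleaned["click_events"] >= 10

# Indexing with the mask keeps only the rows marked True.
filtered = cleaned[mask].copy()

print(mask.tolist())
print(filtered)
```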
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.6 - Rename the score column
%% Cell type:code id: tags:
```python
# Renaming the 'score' column to 'final_mark'
renamed_data_frame = filtered_data_frame.rename(columns={'score': 'final_mark'})

# Displaying the DataFrame with the renamed column
print(renamed_data_frame.head())
```
%% Output
   Unnamed: 0  id_student gender                region      highest_education  \
0           0       11391      M   East Anglian Region       HE Qualification
1           1       28400      F              Scotland       HE Qualification
2           2       31604      F     South East Region  A Level or Equivalent
3           3       32885      F  West Midlands Region     Lower Than A Level
4           4       38053      M                 Wales  A Level or Equivalent

  age_band disability final_result  final_mark  click_events
0     55<=          N         Pass        82.0         934.0
1    35-55          N         Pass        67.0        1435.0
2    35-55          N         Pass        76.0        2158.0
3     0-35          N         Pass        55.0        1034.0
4    35-55          N         Pass        68.0        2445.0
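%% Cell type:markdown id: tags:
By default `DataFrame.rename` ignores labels it cannot find, so a misspelled column name passes without any error. The sketch below is not part of the committed notebook and uses a hypothetical one-row frame and a deliberate typo, `scroe`. It shows the `errors="raise"` option, which turns that silent no-op into a `KeyError`.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical frame with the column that is to be renamed.
frame = pd.DataFrame({"id_student": [11391], "score": [82.0]})

# Normal rename: returns a new frame, the original is left untouched.
renamed = frame.rename(columns={"score": "final_mark"})

# A misspelled label is ignored by default, so nothing changes.
unchanged = frame.rename(columns={"scroe": "final_mark"})

# With errors="raise" the same typo is reported as a KeyError.
try:
    frame.rename(columns={"scroe": "final_mark"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

print(list(renamed.columns), list(unchanged.columns), typo_detected)
```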
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.7 - Remove unnecessary column(s)
%% Cell type:code id: tags:
```python
# Removing unnecessary columns from 'renamed_data_frame' by using the 'drop' method.
# The result is stored in 'final_data_frame', which no longer includes the 'region', 'final_result' and 'highest_education' columns.
final_data_frame = renamed_data_frame.drop(columns=['region', 'final_result', 'highest_education'])

# Displaying the Final DataFrame
print(final_data_frame.head())
```
%% Output
   Unnamed: 0  id_student gender age_band disability  final_mark  click_events
0           0       11391      M     55<=          N        82.0         934.0
1           1       28400      F    35-55          N        67.0        1435.0
2           2       31604      F    35-55          N        76.0        2158.0
3           3       32885      F     0-35          N        55.0        1034.0
4           4       38053      M    35-55          N        68.0        2445.0
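%% Cell type:markdown id: tags:
`drop(columns=[...])` returns a new frame and leaves the source frame unchanged. The sketch below is not part of the committed notebook and uses a hypothetical two-row frame named `renamed`. The last step, which also drops `Unnamed: 0`, is an assumption about what might count as unnecessary. The commit itself keeps that column, as the output above shows.
%% Cell type:code id: tags:
```python
import pandas as pd

# Hypothetical frame with the same kinds of columns as the notebook's data.
renamed = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "id_student": [11391, 28400],
    "region": ["East Anglian Region", "Scotland"],
    "final_result": ["Pass", "Pass"],
    "highest_education": ["HE Qualification", "HE Qualification"],
    "final_mark": [82.0, 67.0],
})

# Same call shape as in the commit: the listed columns are removed from a copy.
final = renamed.drop(columns=["region", "final_result", "highest_education"])

# Assumption for illustration only: the leftover index copy could be dropped too.
without_index_copy = final.drop(columns=["Unnamed: 0"])

print(list(final.columns))
print(list(without_index_copy.columns))
```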
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.8 - Write the DataFrame data to a CSV file
%% Cell type:code id: tags:
```python
# Saving the 'final_data_frame' to a CSV file called 'updated.csv'.
# By using the 'index=False' parameter I can ensure that the CSV file does not include row indices.
final_data_frame.to_csv('updated.csv', index=False)
```
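%% Cell type:markdown id: tags:
The effect of `index=False` can be seen by comparing the header lines that `to_csv` produces with and without it. The sketch below is not part of the committed notebook and uses a hypothetical two-row frame named `final`. It writes to strings rather than to `updated.csv`, relying on `to_csv` returning the CSV text when no path is given, then reads the text back to check the round trip.
%% Cell type:code id: tags:
```python
import io

import pandas as pd

# Hypothetical final frame with two of the notebook's columns.
final = pd.DataFrame({"id_student": [11391, 28400], "final_mark": [82.0, 67.0]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = final.to_csv()
without_index = final.to_csv(index=False)

# Reading the index-free text back should reproduce the original frame.
round_trip = pd.read_csv(io.StringIO(without_index))

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
print(round_trip.equals(final))
```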
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.9 - Investigate the effects of age-group on attainment and engagement
%% Cell type:code id: tags:
```python
# add code here
```
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.10 - Present the results of the age-group investigation using an appropriate visualisation
%% Cell type:code id: tags:
```python
# add code here
```
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.11 - Investigate the effects of engagement on attainment
%% Cell type:code id: tags:
```python
# add code here
```
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Adherence to good coding style
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
# Process Development Report for Task 2
%% Cell type:markdown id: tags:
### Write here
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
updated.csv (new file, 0 → 100644) +25260 −0 @ 9c371bb2
This diff is collapsed.