UWE_ 23086369_2023 / ThePDSProject
Commit 9c371bb2, authored 1 year ago by UWE_ 23086369_2023
Saving the 'final_data_frame'
parent 068deae4
Showing 2 changed files with 25288 additions and 5 deletions
UFCFVQ-15-M Programming Task 2 Template.ipynb: 28 additions, 5 deletions
updated.csv: 25260 additions, 0 deletions
UFCFVQ-15-M Programming Task 2 Template.ipynb
+28 −5
...
...
@@ -328,11 +328,31 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 38,
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " Unnamed: 0 id_student gender age_band disability final_mark click_events\n",
+      "0 0 11391 M 55<= N 82.0 934.0\n",
+      "1 1 28400 F 35-55 N 67.0 1435.0\n",
+      "2 2 31604 F 35-55 N 76.0 2158.0\n",
+      "3 3 32885 F 0-35 N 55.0 1034.0\n",
+      "4 4 38053 M 35-55 N 68.0 2445.0\n"
+     ]
+    }
+   ],
    "source": [
-    "# add code here"
+    "# Removing unnecessary columns from 'renamed_data_frame' by using the 'drop' method.\n",
+    "# The result is stored in 'final_data_frame', which no longer includes the 'region', 'final_result' and 'highest_education' columns.\n",
+    "\n",
+    "final_data_frame = renamed_data_frame.drop(columns=['region', 'final_result', 'highest_education'])\n",
+    "\n",
+    "# Displaying the final DataFrame\n",
+    "\n",
+    "print(final_data_frame.head())"
    ]
   },
   {
...
...
@@ -352,11 +372,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 39,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# add code here"
+    "# Saving the 'final_data_frame' to a CSV file called 'updated.csv'.\n",
+    "# By using the 'index=False' parameter I can ensure that the CSV file does not include row indices.\n",
+    "\n",
+    "final_data_frame.to_csv('updated.csv', index=False)\n"
    ]
   },
   {
...
...
%% Cell type:markdown id: tags:
# UFCFVQ-15-M Programming for Data Science
# Programming Task 2
## Student Id: 23086369
%% Cell type:markdown id: tags:
### Requirement FR2.1 - Read CSV data from a file (with a header row) into memory
%% Cell type:code id: tags:
```python
# Import the pandas library
import pandas as pd

# Read data from a CSV file and store it in a DataFrame.
df = pd.read_csv("/Users/mscdatascience/Documents/assignment-PDS/mohammad_alsuulaimani_uwe_23086369_2023/task2a.csv")

# Display the first five rows of the DataFrame to quickly examine its structure and content.
print(df.head())
```
%% Output
Unnamed: 0 id_student gender region highest_education \
0 0 11391 M East Anglian Region HE Qualification
1 1 28400 F Scotland HE Qualification
2 2 31604 F South East Region A Level or Equivalent
3 3 32885 F West Midlands Region Lower Than A Level
4 4 38053 M Wales A Level or Equivalent
age_band disability final_result score
0 55<= N Pass 82.0
1 35-55 N Pass 67.0
2 35-55 N Pass 76.0
3 0-35 N Pass 55.0
4 35-55 N Pass 68.0
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.2 - Read CSV data from a file (without a header row) into memory
%% Cell type:code id: tags:
```python
# Import the pandas library
import pandas as pd

# Read data from a CSV file that has no header row.
# The columns are labelled 'id_student' and 'click_events'.
df = pd.read_csv("/Users/mscdatascience/Documents/assignment-PDS/mohammad_alsuulaimani_uwe_23086369_2023/task2b.csv", names=['id_student', 'click_events'])

# Display the first five rows of the DataFrame to quickly examine its structure and content.
print(df.head())
```
%% Output
id_student click_events
0 6516 2791.0
1 8462 656.0
2 11391 934.0
3 23629 NaN
4 23698 910.0
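%% Cell type:markdown id: tags:
The effect of `names=` on a headerless file can be checked without the assignment data. The following is a minimal sketch, assuming a made-up three-row stand-in for `task2b.csv` held in memory; the values and the variable names `raw`, `with_names` and `without_names` are illustrative only. It suggests that omitting `names=` would let pandas treat the first data row as the header, so that row would be lost.
%% Cell type:code id: tags:
```python
import io

import pandas as pd

# Made-up headerless data (hypothetical stand-in for task2b.csv).
# The last row has an empty second field, which pandas reads as NaN.
raw = io.StringIO("6516,2791\n8462,656\n23629,\n")

# With names=, every line is treated as data and the columns get labels.
with_names = pd.read_csv(raw, names=["id_student", "click_events"])

# Without names=, the first data row is consumed as the header row.
raw.seek(0)
without_names = pd.read_csv(raw)

print(with_names)
print(len(with_names), len(without_names))
```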
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.3 - Merge the data from two Dataframes
%% Cell type:code id: tags:
```python
# Import the pandas library
import pandas as pd

# Read the two CSV files into Dataframe1 and Dataframe2.
Dataframe1 = pd.read_csv('task2a.csv')
Dataframe2 = pd.read_csv('task2b.csv', names=['id_student', 'click_events'])

# Merge Dataframe1 and Dataframe2 into a new DataFrame.
# An inner merge keeps only the rows whose 'id_student' value appears in both DataFrames.
merged_data_frame = pd.merge(Dataframe1, Dataframe2, on='id_student', how='inner')

# Display the first five rows of the merged DataFrame.
print(merged_data_frame.head())
```
%% Output
Unnamed: 0 id_student gender region highest_education \
0 0 11391 M East Anglian Region HE Qualification
1 1 28400 F Scotland HE Qualification
2 2 31604 F South East Region A Level or Equivalent
3 3 32885 F West Midlands Region Lower Than A Level
4 4 38053 M Wales A Level or Equivalent
age_band disability final_result score click_events
0 55<= N Pass 82.0 934.0
1 35-55 N Pass 67.0 1435.0
2 35-55 N Pass 76.0 2158.0
3 0-35 N Pass 55.0 1034.0
4 35-55 N Pass 68.0 2445.0
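%% Cell type:markdown id: tags:
An inner merge silently discards students that appear in only one file. The sketch below, built on made-up toy frames (`left`, `right` and all values are hypothetical), shows one possible way to audit this: an outer merge with `indicator=True` lists the unmatched ids, and `validate="one_to_one"` makes pandas raise an error if `id_student` is duplicated on either side. Whether the assignment data contains such cases is not known from the notebook.
%% Cell type:code id: tags:
```python
import pandas as pd

# Toy frames with partially overlapping ids (illustrative values only).
left = pd.DataFrame({"id_student": [1, 2, 3], "score": [82.0, 67.0, 76.0]})
right = pd.DataFrame({"id_student": [2, 3, 4], "click_events": [1435.0, 2158.0, 910.0]})

# Inner merge: only ids present in both frames survive.
# validate="one_to_one" raises MergeError if a key is duplicated.
inner = pd.merge(left, right, on="id_student", how="inner", validate="one_to_one")

# Outer merge with an indicator column to see what the inner merge dropped.
audit = pd.merge(left, right, on="id_student", how="outer", indicator=True)
unmatched = audit[audit["_merge"] != "both"]

print(inner)
print(unmatched[["id_student", "_merge"]])
```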
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.4 - Remove any rows that contain missing values
%% Cell type:code id: tags:
```python
# Remove rows that contain missing values
cleaned_data_frame = merged_data_frame.dropna()

# Display the cleaned DataFrame
print(cleaned_data_frame.head())
```
%% Output
Unnamed: 0 id_student gender region highest_education \
0 0 11391 M East Anglian Region HE Qualification
1 1 28400 F Scotland HE Qualification
2 2 31604 F South East Region A Level or Equivalent
3 3 32885 F West Midlands Region Lower Than A Level
4 4 38053 M Wales A Level or Equivalent
age_band disability final_result score click_events
0 55<= N Pass 82.0 934.0
1 35-55 N Pass 67.0 1435.0
2 35-55 N Pass 76.0 2158.0
3 0-35 N Pass 55.0 1034.0
4 35-55 N Pass 68.0 2445.0
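%% Cell type:markdown id: tags:
The first five rows look identical before and after `dropna()`, so `head()` alone does not show whether anything was removed. A minimal sketch on a made-up frame (all names and values are illustrative) shows how the missing values per column and the number of removed rows could be reported:
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Toy frame with two incomplete rows (illustrative values only).
frame = pd.DataFrame({
    "id_student": [1, 2, 3, 4],
    "score": [82.0, np.nan, 76.0, 55.0],
    "click_events": [934.0, 1435.0, np.nan, 1034.0],
})

# Count missing values per column before dropping anything.
missing_per_column = frame.isna().sum()

# dropna() returns a new frame without any row that contains a NaN.
cleaned = frame.dropna()
rows_removed = len(frame) - len(cleaned)

print(missing_per_column)
print(f"Removed {rows_removed} of {len(frame)} rows")
```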
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.5 - Filter out unnecessary rows
%% Cell type:code id: tags:
```python
# Filter out rows where 'click_events' is less than 10,
# i.e. keep only the rows with at least 10 click events.
filtered_data_frame = cleaned_data_frame[cleaned_data_frame['click_events'] >= 10]

# Display the filtered DataFrame
print(filtered_data_frame.head())
```
%% Output
Unnamed: 0 id_student gender region highest_education \
0 0 11391 M East Anglian Region HE Qualification
1 1 28400 F Scotland HE Qualification
2 2 31604 F South East Region A Level or Equivalent
3 3 32885 F West Midlands Region Lower Than A Level
4 4 38053 M Wales A Level or Equivalent
age_band disability final_result score click_events
0 55<= N Pass 82.0 934.0
1 35-55 N Pass 67.0 1435.0
2 35-55 N Pass 76.0 2158.0
3 0-35 N Pass 55.0 1034.0
4 35-55 N Pass 68.0 2445.0
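%% Cell type:markdown id: tags:
The filter above relies on a boolean mask. As a sketch on made-up data (the frame and its values are hypothetical; only the threshold of 10 comes from the notebook), the mask can be named and inverted to inspect exactly which rows are kept and which are discarded, including the boundary value 10:
%% Cell type:code id: tags:
```python
import pandas as pd

# Toy frame including the boundary value 10 (illustrative values only).
frame = pd.DataFrame({
    "id_student": [1, 2, 3, 4],
    "click_events": [934.0, 9.0, 10.0, 0.0],
})

# True for rows to keep: at least 10 click events.
mask = frame["click_events"] >= 10

kept = frame[mask]       # rows that satisfy the condition
dropped = frame[~mask]   # rows that the filter removes

print(kept)
print(dropped)
```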
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.6 - Rename the score column
%% Cell type:code id: tags:
```python
# Rename the 'score' column to 'final_mark'
renamed_data_frame = filtered_data_frame.rename(columns={'score': 'final_mark'})

# Display the DataFrame with the renamed column
print(renamed_data_frame.head())
```
%% Output
Unnamed: 0 id_student gender region highest_education \
0 0 11391 M East Anglian Region HE Qualification
1 1 28400 F Scotland HE Qualification
2 2 31604 F South East Region A Level or Equivalent
3 3 32885 F West Midlands Region Lower Than A Level
4 4 38053 M Wales A Level or Equivalent
age_band disability final_result final_mark click_events
0 55<= N Pass 82.0 934.0
1 35-55 N Pass 67.0 1435.0
2 35-55 N Pass 76.0 2158.0
3 0-35 N Pass 55.0 1034.0
4 35-55 N Pass 68.0 2445.0
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.7 - Remove unnecessary column(s)
%% Cell type:code id: tags:
```python
# Remove unnecessary columns from 'renamed_data_frame' using the 'drop' method.
# The result is stored in 'final_data_frame', which no longer includes
# the 'region', 'final_result' and 'highest_education' columns.
final_data_frame = renamed_data_frame.drop(columns=['region', 'final_result', 'highest_education'])

# Display the final DataFrame
print(final_data_frame.head())
```
%% Output
Unnamed: 0 id_student gender age_band disability final_mark click_events
0 0 11391 M 55<= N 82.0 934.0
1 1 28400 F 35-55 N 67.0 1435.0
2 2 31604 F 35-55 N 76.0 2158.0
3 3 32885 F 0-35 N 55.0 1034.0
4 4 38053 M 35-55 N 68.0 2445.0
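%% Cell type:markdown id: tags:
The output above still contains an `Unnamed: 0` column. pandas assigns that name to a first column whose header cell is empty, which typically happens when a CSV was saved together with its row index. Whether the task expects this column to be kept is not stated in the notebook. If it is not needed, one possible approach, sketched here on a made-up in-memory CSV (the variable names are hypothetical), is to include it in the same `drop` call:
%% Cell type:code id: tags:
```python
import io

import pandas as pd

# Made-up CSV whose first header cell is empty (a saved row index).
raw = io.StringIO(
    ",id_student,region,score\n"
    "0,11391,East Anglian Region,82.0\n"
    "1,28400,Scotland,67.0\n"
)
frame = pd.read_csv(raw)

# drop() returns a new DataFrame; 'frame' itself is left unchanged.
trimmed = frame.drop(columns=["region", "Unnamed: 0"])

print(list(frame.columns))
print(list(trimmed.columns))
```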
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.8 - Write the DataFrame data to a CSV file
%% Cell type:code id: tags:
```python
# Save 'final_data_frame' to a CSV file called 'updated.csv'.
# Passing index=False leaves the DataFrame's row index out of the file.
final_data_frame.to_csv('updated.csv', index=False)
```
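%% Cell type:markdown id: tags:
To illustrate what `index=False` changes, the sketch below writes a small made-up frame to CSV text in memory instead of to `updated.csv` (calling `to_csv` without a path returns a string). The frame and variable names are illustrative assumptions. With the index included, the header starts with an empty cell, which is what later appears as `Unnamed: 0` when such a file is read back.
%% Cell type:code id: tags:
```python
import io

import pandas as pd

# Toy frame (illustrative values only).
frame = pd.DataFrame({"id_student": [11391, 28400], "final_mark": [82.0, 67.0]})

# to_csv() with no path returns the CSV content as a string.
with_index = frame.to_csv()
without_index = frame.to_csv(index=False)

# Reading the index-free text back gives the original two columns.
reloaded = pd.read_csv(io.StringIO(without_index))

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```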
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.9 - Investigate the effects of age-group on attainment and engagement
%% Cell type:code id: tags:
```python
# add code here
```
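%% Cell type:markdown id: tags:
This cell is still a placeholder in the commit. As a rough sketch only, and not the author's solution, one way to approach the requirement could be to group by `age_band` and compare the mean `final_mark` (attainment) and mean `click_events` (engagement). The frame below is made up for illustration; only the column names and age-band labels come from the notebook.
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up sample using the notebook's column names (values are illustrative).
frame = pd.DataFrame({
    "age_band": ["0-35", "0-35", "35-55", "35-55", "55<="],
    "final_mark": [55.0, 65.0, 67.0, 77.0, 82.0],
    "click_events": [1000.0, 1200.0, 1400.0, 2400.0, 900.0],
})

# Mean attainment and engagement per age group.
summary = frame.groupby("age_band")[["final_mark", "click_events"]].mean()

print(summary)
```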
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.10 - Present the results of the age-group investigation using an appropriate visualisation
%% Cell type:code id: tags:
```python
# add code here
```
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Requirement FR2.11 - Investigate the effects of engagement on attainment
%% Cell type:code id: tags:
```python
# add code here
```
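%% Cell type:markdown id: tags:
This cell is also still a placeholder. A possible starting point, offered as a sketch under assumptions rather than the intended answer, is the Pearson correlation between `click_events` and `final_mark`. The values below are made up; a correlation computed on the real data could look quite different, and correlation alone does not establish cause.
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up sample using the notebook's column names (values are illustrative).
frame = pd.DataFrame({
    "click_events": [100.0, 500.0, 900.0, 1500.0, 2400.0],
    "final_mark": [40.0, 55.0, 60.0, 72.0, 80.0],
})

# Series.corr defaults to the Pearson correlation coefficient.
pearson_r = frame["click_events"].corr(frame["final_mark"])

print(f"Pearson r = {pearson_r:.3f}")
```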
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
### Adherence to good coding style
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
%% Cell type:markdown id: tags:
# Process Development Report for Task 2
%% Cell type:markdown id: tags:
### Write here
%% Cell type:markdown id: tags:
##### MARK:
#### FEEDBACK:
...
...
updated.csv 0 → 100644
+25260 −0
Source diff could not be displayed: it is too large.