Saving the 'final_data_frame'

Commit 9c371bb2 authored 1 year ago by UWE_ 23086369_2023
--- a/UFCFVQ-15-M Programming Task 2 Template.ipynb
+++ b/UFCFVQ-15-M Programming Task 2 Template.ipynb
@@ -328,11 +328,31 @@
  },
  {
   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 38,
   "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "   Unnamed: 0  id_student gender age_band disability  final_mark  click_events\n",
+      "0           0       11391      M     55<=          N        82.0         934.0\n",
+      "1           1       28400      F    35-55          N        67.0        1435.0\n",
+      "2           2       31604      F    35-55          N        76.0        2158.0\n",
+      "3           3       32885      F     0-35          N        55.0        1034.0\n",
+      "4           4       38053      M    35-55          N        68.0        2445.0\n"
+     ]
+    }
+   ],
   "source": [
-    "# add code here"
+    "# Removing unnecessary columns from 'renamed_data_frame' by using the 'drop' method.\n",
+    "# The result is stored in 'final_data_frame', which no longer includes the 'region', 'final_result' and 'highest_education' columns.\n",
+    "\n",
+    "final_data_frame = renamed_data_frame.drop(columns=['region', 'final_result', 'highest_education'])\n",
+    "\n",
+    "# Displaying the Final DataFrame\n",
+    "\n",
+    "print(final_data_frame.head())"
   ]
  },
  {
@@ -352,11 +372,14 @@
  },
  {
   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
-    "# add code here"
+    "# Saving the 'final_data_frame' to a CSV file called 'updated.csv'.\n",
+    "# By using the 'index=False' parameter I can ensure that the CSV file does not include row indices.\n",
+    "\n",
+    "final_data_frame.to_csv('updated.csv', index=False)\n"
   ]
  },
  {

 %% Cell type:markdown id: tags:

 # UFCFVQ-15-M Programming for Data Science
 # Programming Task 2

 ## Student Id: 23086369

 %% Cell type:markdown id: tags:

 ### Requirement FR2.1 - Read CSV data from a file (with a header row) into memory

 %% Cell type:code id: tags:

 ``` python
 # Importing the pandas library

 import pandas as pd

 # Read the data from a CSV file into a DataFrame.

 df = pd.read_csv("/Users/mscdatascience/Documents/assignment-PDS/mohammad_alsuulaimani_uwe_23086369_2023/task2a.csv")

 # Display the first five rows of the DataFrame to quickly examine its structure and content.

 print(df.head())
 ```

 %% Output

       Unnamed: 0  id_student gender                region      highest_education  \
    0           0       11391      M   East Anglian Region       HE Qualification
    1           1       28400      F              Scotland       HE Qualification
    2           2       31604      F     South East Region  A Level or Equivalent
    3           3       32885      F  West Midlands Region     Lower Than A Level
    4           4       38053      M                 Wales  A Level or Equivalent
    
      age_band disability final_result  score
    0     55<=          N         Pass   82.0
    1    35-55          N         Pass   67.0
    2    35-55          N         Pass   76.0
    3     0-35          N         Pass   55.0
    4    35-55          N         Pass   68.0
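The 'Unnamed: 0' column in the output above looks like a row index that was written into task2a.csv when the file was created (an assumption; how the file was produced isn't shown). A minimal sketch using a small in-memory excerpt (hypothetical data shaped like the file, read through `io.StringIO`) shows how `index_col=0` would read that column back as the index instead of as a data column:

```python
import io

import pandas as pd

# Hypothetical excerpt mimicking task2a.csv: the blank first header field is
# assumed to come from a row index written by an earlier to_csv() call.
csv_text = """,id_student,gender,score
0,11391,M,82.0
1,28400,F,67.0
"""

# Default read: the blank header becomes a data column named 'Unnamed: 0'.
default_df = pd.read_csv(io.StringIO(csv_text))
print(list(default_df.columns))  # ['Unnamed: 0', 'id_student', 'gender', 'score']

# index_col=0 reuses that first column as the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_df.columns))  # ['id_student', 'gender', 'score']
```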

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.2 - Read CSV data from a file (without a header row) into memory

 %% Cell type:code id: tags:

 ``` python
 # Importing the pandas library

 import pandas as pd

 # Read the data from a CSV file that has no header row.
 # The columns are labeled 'id_student' and 'click_events'.

 df = pd.read_csv("/Users/mscdatascience/Documents/assignment-PDS/mohammad_alsuulaimani_uwe_23086369_2023/task2b.csv", names=['id_student', 'click_events'])

 # Display the first five rows of the DataFrame to quickly examine its structure and content.

 print(df.head())
 ```

 %% Output

       id_student  click_events
    0        6516        2791.0
    1        8462         656.0
    2       11391         934.0
    3       23629           NaN
    4       23698         910.0
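Passing `names=` is what makes the headerless read work, because pandas then treats every line as data. A short sketch with a hypothetical in-memory excerpt shaped like task2b.csv shows what happens without it, and that an empty field is read as `NaN` (as for student 23629 above):

```python
import io

import pandas as pd

# Hypothetical excerpt in the same shape as task2b.csv (no header row).
# The last line has an empty click count, which pandas reads as NaN.
csv_text = "6516,2791.0\n8462,656.0\n23629,\n"

# Without names=, the first data row is consumed as the header.
wrong_df = pd.read_csv(io.StringIO(csv_text))
print(list(wrong_df.columns))  # ['6516', '2791.0']

# With names=, pandas assumes there is no header, so all three lines are data.
df = pd.read_csv(io.StringIO(csv_text), names=['id_student', 'click_events'])
print(len(df))  # 3
print(df['click_events'].isna().sum())  # 1
```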

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.3 - Merge the data from two Dataframes

 %% Cell type:code id: tags:

 ``` python
 # Importing the pandas library

 import pandas as pd

 # Read the data from both CSV files into Dataframe1 and Dataframe2.

 Dataframe1 = pd.read_csv('task2a.csv')
 Dataframe2 = pd.read_csv('task2b.csv', names=['id_student', 'click_events'])

 # Merging Dataframe1 and Dataframe2 into a new DataFrame.
 # An 'inner' merge keeps only the rows whose 'id_student' value appears in both DataFrames.

 merged_data_frame = pd.merge(Dataframe1, Dataframe2, on='id_student', how='inner')

 # Display the first five rows of the merged DataFrame.

 print(merged_data_frame.head())
 ```

 %% Output

       Unnamed: 0  id_student gender                region      highest_education  \
    0           0       11391      M   East Anglian Region       HE Qualification
    1           1       28400      F              Scotland       HE Qualification
    2           2       31604      F     South East Region  A Level or Equivalent
    3           3       32885      F  West Midlands Region     Lower Than A Level
    4           4       38053      M                 Wales  A Level or Equivalent
    
      age_band disability final_result  score  click_events
    0     55<=          N         Pass   82.0         934.0
    1    35-55          N         Pass   67.0        1435.0
    2    35-55          N         Pass   76.0        2158.0
    3     0-35          N         Pass   55.0        1034.0
    4    35-55          N         Pass   68.0        2445.0
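Because an inner merge silently discards students that appear in only one file, it can be worth checking how many rows are lost. A sketch with two small DataFrames (all values hypothetical) compares the inner merge with an outer merge that uses `indicator=True`, which adds a `_merge` column recording where each row came from:

```python
import pandas as pd

# Small hypothetical frames standing in for task2a.csv and task2b.csv.
students = pd.DataFrame({'id_student': [11391, 28400, 99999],
                         'score': [82.0, 67.0, 50.0]})
clicks = pd.DataFrame({'id_student': [6516, 11391, 28400],
                       'click_events': [2791.0, 934.0, 1435.0]})

# Inner merge: only ids present in both frames survive (left key order kept).
inner = pd.merge(students, clicks, on='id_student', how='inner')
print(inner['id_student'].tolist())  # [11391, 28400]

# Outer merge with indicator=True labels each row 'both', 'left_only'
# or 'right_only', showing which students the inner merge would drop.
outer = pd.merge(students, clicks, on='id_student', how='outer', indicator=True)
print(outer['_merge'].value_counts())
```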

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.4 - Remove any rows that contain missing values

 %% Cell type:code id: tags:

 ``` python
 # Removing rows that contain missing values
 cleaned_data_frame = merged_data_frame.dropna()

 # Displaying the cleaned DataFrame
 print(cleaned_data_frame.head())
 ```

 %% Output

       Unnamed: 0  id_student gender                region      highest_education  \
    0           0       11391      M   East Anglian Region       HE Qualification
    1           1       28400      F              Scotland       HE Qualification
    2           2       31604      F     South East Region  A Level or Equivalent
    3           3       32885      F  West Midlands Region     Lower Than A Level
    4           4       38053      M                 Wales  A Level or Equivalent
    
      age_band disability final_result  score  click_events
    0     55<=          N         Pass   82.0         934.0
    1    35-55          N         Pass   67.0        1435.0
    2    35-55          N         Pass   76.0        2158.0
    3     0-35          N         Pass   55.0        1034.0
    4    35-55          N         Pass   68.0        2445.0
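The first five rows look identical before and after `dropna()`, so printing `head()` alone does not show whether anything was removed. A minimal sketch on a hypothetical frame reports the number of dropped rows; `subset=` is shown only as an option for restricting the check to particular columns:

```python
import numpy as np
import pandas as pd

# Hypothetical merged data with one missing score and one missing click count.
merged = pd.DataFrame({'id_student': [1, 2, 3, 4],
                       'score': [82.0, np.nan, 76.0, 55.0],
                       'click_events': [934.0, 1435.0, np.nan, 1034.0]})

# dropna() with no arguments removes a row if ANY column is missing.
cleaned = merged.dropna()
print(len(merged) - len(cleaned))  # 2

# subset= only considers the listed columns when deciding what to drop.
cleaned_scores_only = merged.dropna(subset=['score'])
print(len(cleaned_scores_only))  # 3
```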

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.5 - Filter out unnecessary rows

 %% Cell type:code id: tags:

 ``` python
 # Filtering out unnecessary rows: keep only rows where 'click_events' is 10 or more
 filtered_data_frame = cleaned_data_frame[cleaned_data_frame['click_events'] >= 10]

 # Displaying the filtered DataFrame
 print(filtered_data_frame.head())
 ```

 %% Output

       Unnamed: 0  id_student gender                region      highest_education  \
    0           0       11391      M   East Anglian Region       HE Qualification
    1           1       28400      F              Scotland       HE Qualification
    2           2       31604      F     South East Region  A Level or Equivalent
    3           3       32885      F  West Midlands Region     Lower Than A Level
    4           4       38053      M                 Wales  A Level or Equivalent
    
      age_band disability final_result  score  click_events
    0     55<=          N         Pass   82.0         934.0
    1    35-55          N         Pass   67.0        1435.0
    2    35-55          N         Pass   76.0        2158.0
    3     0-35          N         Pass   55.0        1034.0
    4    35-55          N         Pass   68.0        2445.0
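Two details of the `>= 10` mask are easy to miss: a row with exactly 10 events is kept, and any comparison with `NaN` evaluates to `False`, so rows with missing `click_events` would be excluded even if FR2.4 had not already removed them. A small sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical click counts: normal, too few, missing, and exactly on the threshold.
df = pd.DataFrame({'id_student': [1, 2, 3, 4],
                   'click_events': [934.0, 5.0, np.nan, 10.0]})

# Boolean mask: NaN >= 10 is False, and 10 >= 10 is True.
mask = df['click_events'] >= 10
print(mask.tolist())  # [True, False, False, True]

filtered = df[mask]
print(filtered['id_student'].tolist())  # [1, 4]
```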

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.6 - Rename the score column

 %% Cell type:code id: tags:

 ``` python
 # Renaming the 'score' column to 'final_mark'
 renamed_data_frame = filtered_data_frame.rename(columns={'score': 'final_mark'})

 # Displaying the DataFrame with the renamed column
 print(renamed_data_frame.head())
 ```

 %% Output

       Unnamed: 0  id_student gender                region      highest_education  \
    0           0       11391      M   East Anglian Region       HE Qualification
    1           1       28400      F              Scotland       HE Qualification
    2           2       31604      F     South East Region  A Level or Equivalent
    3           3       32885      F  West Midlands Region     Lower Than A Level
    4           4       38053      M                 Wales  A Level or Equivalent
    
      age_band disability final_result  final_mark  click_events
    0     55<=          N         Pass        82.0         934.0
    1    35-55          N         Pass        67.0        1435.0
    2    35-55          N         Pass        76.0        2158.0
    3     0-35          N         Pass        55.0        1034.0
    4    35-55          N         Pass        68.0        2445.0
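By default `rename()` ignores keys that do not match any column, so a misspelt old name leaves the DataFrame unchanged without any warning. A sketch on a hypothetical one-row frame shows the difference `errors='raise'` makes:

```python
import pandas as pd

df = pd.DataFrame({'score': [82.0], 'click_events': [934.0]})

# Correct old name: the column is renamed.
renamed = df.rename(columns={'score': 'final_mark'})
print(list(renamed.columns))  # ['final_mark', 'click_events']

# A typo in the old name is silently ignored by default...
unchanged = df.rename(columns={'scor': 'final_mark'})
print(list(unchanged.columns))  # ['score', 'click_events']

# ...whereas errors='raise' turns the missing key into a KeyError.
try:
    df.rename(columns={'scor': 'final_mark'}, errors='raise')
    rename_failed = False
except KeyError as exc:
    rename_failed = True
    print('KeyError:', exc)
```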

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.7 - Remove unnecessary column(s)

 %% Cell type:code id: tags:

 ``` python
-# add code here
+# Removing unnecessary columns from 'renamed_data_frame' by using the 'drop' method.
+# The result is stored in 'final_data_frame', which no longer includes the 'region', 'final_result' and 'highest_education' columns.
+
+final_data_frame = renamed_data_frame.drop(columns=['region', 'final_result', 'highest_education'])
+
+# Displaying the Final DataFrame
+
+print(final_data_frame.head())
 ```

+%% Output
+
+       Unnamed: 0  id_student gender age_band disability  final_mark  click_events
+    0           0       11391      M     55<=          N        82.0         934.0
+    1           1       28400      F    35-55          N        67.0        1435.0
+    2           2       31604      F    35-55          N        76.0        2158.0
+    3           3       32885      F     0-35          N        55.0        1034.0
+    4           4       38053      M    35-55          N        68.0        2445.0
+
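Whether 'Unnamed: 0' also counts as an unnecessary column is a judgement call; if it is simply a leftover row index from the source CSV (an assumption), it could be dropped in the same `drop()` call. A sketch on a hypothetical two-row frame (not the real data), also showing the equivalent "keep-list" selection:

```python
import pandas as pd

# Hypothetical frame with a leftover index column and one unwanted column.
df = pd.DataFrame({'Unnamed: 0': [0, 1],
                   'id_student': [11391, 28400],
                   'region': ['Wales', 'Scotland'],
                   'final_mark': [82.0, 67.0]})

# drop(columns=...) removes the named columns and returns a new DataFrame.
trimmed = df.drop(columns=['region', 'Unnamed: 0'])
print(list(trimmed.columns))  # ['id_student', 'final_mark']

# Equivalent alternative: select only the columns to keep.
kept = df[['id_student', 'final_mark']]
print(list(kept.columns))  # ['id_student', 'final_mark']
```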
 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.8 - Write the DataFrame data to a CSV file

 %% Cell type:code id: tags:

 ``` python
-# add code here
+# Saving the 'final_data_frame' to a CSV file called 'updated.csv'.
+# By using the 'index=False' parameter I can ensure that the CSV file does not include row indices.
+
+final_data_frame.to_csv('updated.csv', index=False)
 ```
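Writing to an in-memory `io.StringIO` buffer instead of 'updated.csv' gives a quick way to check what `index=False` changes. This sketch (hypothetical two-row frame) writes the same data with and without the index and reads it back; the default version reproduces an 'Unnamed: 0' column like the one seen in the FR2.1 output:

```python
import io

import pandas as pd

df = pd.DataFrame({'id_student': [11391, 28400], 'final_mark': [82.0, 67.0]})

# Default to_csv() writes the index as an unnamed first column...
buffer_with_index = io.StringIO()
df.to_csv(buffer_with_index)
buffer_with_index.seek(0)
cols_with_index = list(pd.read_csv(buffer_with_index).columns)
print(cols_with_index)  # ['Unnamed: 0', 'id_student', 'final_mark']

# ...while index=False writes only the data columns.
buffer_no_index = io.StringIO()
df.to_csv(buffer_no_index, index=False)
buffer_no_index.seek(0)
cols_no_index = list(pd.read_csv(buffer_no_index).columns)
print(cols_no_index)  # ['id_student', 'final_mark']
```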

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.9 - Investigate the effects of age-group on attainment and engagement

 %% Cell type:code id: tags:

 ``` python
 # add code here
 ```
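One possible starting point for this requirement, sketched under assumptions: group by `age_band` and compare mean `final_mark` (attainment) and mean `click_events` (engagement). The values below are only the five rows printed in the FR2.7 output, so anything this produces is illustrative and says nothing about the full dataset; in the notebook the same call would be applied to `final_data_frame`.

```python
import pandas as pd

# The five rows shown in the FR2.7 output, used here purely for illustration.
sample = pd.DataFrame({
    'id_student': [11391, 28400, 31604, 32885, 38053],
    'age_band': ['55<=', '35-55', '35-55', '0-35', '35-55'],
    'final_mark': [82.0, 67.0, 76.0, 55.0, 68.0],
    'click_events': [934.0, 1435.0, 2158.0, 1034.0, 2445.0],
})

# Named aggregation: one row per age band with mean attainment,
# mean engagement and the number of students in that band.
age_summary = sample.groupby('age_band').agg(
    mean_final_mark=('final_mark', 'mean'),
    mean_click_events=('click_events', 'mean'),
    students=('id_student', 'count'),
)
print(age_summary)
```

`groupby` sorts the string labels lexicographically, which for these particular labels ('0-35', '35-55', '55<=') happens to match the natural age order.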

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.10 - Present the results of the age-group investigation using an appropriate visualisation

 %% Cell type:code id: tags:

 ``` python
 # add code here
 ```

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Requirement FR2.11 - Investigate the effects of engagement on attainment

 %% Cell type:code id: tags:

 ``` python
 # add code here
 ```
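A hedged sketch for this requirement: measure how `final_mark` varies with `click_events` using Pearson's correlation, plus a rank-based (Spearman-style) correlation computed as Pearson on `rank()` so that no SciPy dependency is needed. The data are again only the five rows from the FR2.7 output; five points are far too few to draw any conclusion, and in the notebook the calls would be applied to `final_data_frame`.

```python
import pandas as pd

# The five rows shown in the FR2.7 output, used here purely for illustration.
sample = pd.DataFrame({
    'final_mark': [82.0, 67.0, 76.0, 55.0, 68.0],
    'click_events': [934.0, 1435.0, 2158.0, 1034.0, 2445.0],
})

# Pearson: strength of a linear relationship between engagement and attainment.
pearson_r = sample['click_events'].corr(sample['final_mark'])

# Spearman: Pearson correlation of the ranks, less sensitive to skewed click counts.
spearman_rho = sample['click_events'].rank().corr(sample['final_mark'].rank())

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

Ranking first is a deliberate choice here: click counts are often heavily skewed, and a rank correlation captures a monotonic relationship without assuming it is linear.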

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 ### Adherence to good coding style

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

 %% Cell type:markdown id: tags:

 # Process Development Report for Task 2

 %% Cell type:markdown id: tags:

 ### Write here

 %% Cell type:markdown id: tags:

 ##### MARK:
 #### FEEDBACK:

--- a/updated.csv
+++ b/updated.csv