diff --git a/dmf.zip b/dmf.zip
new file mode 100644
index 0000000000000000000000000000000000000000..52267a7ac7a39f8761e85e7fa6232012ebe9efe0
Binary files /dev/null and b/dmf.zip differ
diff --git a/dmf/bristol-air-quality-data-clean.csv b/dmf/bristol-air-quality-data-clean.csv
deleted file mode 100644
index 9e18a79781f694e7d080c492eb50427c792edfd7..0000000000000000000000000000000000000000
Binary files a/dmf/bristol-air-quality-data-clean.csv and /dev/null differ
diff --git a/dmf/bristol-air-quality-data-datecrop.csv b/dmf/bristol-air-quality-data-datecrop.csv
deleted file mode 100644
index 2a10763be3de0c2b8ed44a9eaa282dbab99c5586..0000000000000000000000000000000000000000
Binary files a/dmf/bristol-air-quality-data-datecrop.csv and /dev/null differ
diff --git a/dmf/bristol-air-quality-data.csv b/dmf/bristol-air-quality-data.csv
deleted file mode 100644
index 0d7d1612d0dc77a6073bb1a6432594b608e32025..0000000000000000000000000000000000000000
Binary files a/dmf/bristol-air-quality-data.csv and /dev/null differ
diff --git a/dmf/md_images/Example_seaborn.png b/dmf/md_images/Example_seaborn.png
new file mode 100644
index 0000000000000000000000000000000000000000..034ca5eee9513d7c068030cc53bc37499171d99b
Binary files /dev/null and b/dmf/md_images/Example_seaborn.png differ
diff --git a/dmf/md_images/NO2_heatmap_mongodb.png b/dmf/md_images/NO2_heatmap_mongodb.png
new file mode 100644
index 0000000000000000000000000000000000000000..ab5904fd133b746c8ffd821e9d28a30bc0465e45
Binary files /dev/null and b/dmf/md_images/NO2_heatmap_mongodb.png differ
diff --git a/dmf/md_images/det_diag_1nf_sml.png b/dmf/md_images/det_diag_1nf_sml.png
new file mode 100644
index 0000000000000000000000000000000000000000..a25c911790d2f581998d49166fffd8576afaf2e4
Binary files /dev/null and b/dmf/md_images/det_diag_1nf_sml.png differ
diff --git a/dmf/md_images/det_diag_2nf_sml.png b/dmf/md_images/det_diag_2nf_sml.png
new file mode 100644
index 0000000000000000000000000000000000000000..3b7f04dc1f4909374197f6258af86fb4bbfe2e7a
Binary files /dev/null and b/dmf/md_images/det_diag_2nf_sml.png differ
diff --git a/dmf/md_images/mongodb_import.png b/dmf/md_images/mongodb_import.png
new file mode 100644
index 0000000000000000000000000000000000000000..a3e2ce3111c8e679aa8dcfbc3cb6dd30ed82cf25
Binary files /dev/null and b/dmf/md_images/mongodb_import.png differ
diff --git a/dmf/md_images/query_results.png b/dmf/md_images/query_results.png
new file mode 100644
index 0000000000000000000000000000000000000000..ead1923279331bc6628cc4583168b782c0faee70
Binary files /dev/null and b/dmf/md_images/query_results.png differ
diff --git a/dmf/pollution_data.sql b/dmf/pollution_data.sql
deleted file mode 100644
index 508d3a5942826034ae04c014ea49672e19e270cc..0000000000000000000000000000000000000000
Binary files a/dmf/pollution_data.sql and /dev/null differ
diff --git a/dmf/reflection.md b/dmf/reflection.md
new file mode 100644
index 0000000000000000000000000000000000000000..4828a2b07f58c3b3d032d2c8de4f8e33767ada4c
--- /dev/null
+++ b/dmf/reflection.md
@@ -0,0 +1,109 @@
+# Reflective Report
+
+## Description
+
+### Task 1:
+
+For 1a, I chose to use the csv module to iterate through the dataset and retain only the required rows, as this was more efficient than using Pandas.
+
+The script removes 663,634 observations captured prior to 1 January 2010 and a further seven rows where `Date Time` is missing.
+
+For 1b, I created a dictionary mapping `SiteID` to `Location`. I then compared each row's `SiteID` and `Location` against the dictionary. All non-matching entries (or entries where `SiteID` was missing) were removed. The script removes 747 observations.
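+
+A minimal sketch of this cleaning logic, combining 1a and 1b into a single pass for brevity (file names, column names and the timestamp format are assumptions rather than the submitted scripts):
+
+>```python
+>import csv
+>from datetime import datetime
+>
+># Illustrative placeholder mapping -- the real dictionary is built from the list provided.
+>valid_sites = {"1": "Example Site A", "2": "Example Site B"}
+>
+>with open("bristol-air-quality-data.csv", newline="") as src, \
+>        open("bristol-air-quality-data-clean.csv", "w", newline="") as dst:
+>    reader = csv.DictReader(src)
+>    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
+>    writer.writeheader()
+>    for row in reader:
+>        stamp = row["Date Time"]
+>        if not stamp:
+>            continue                                # 1a: drop rows with a missing timestamp
+>        if datetime.fromisoformat(stamp[:19]) < datetime(2010, 1, 1):
+>            continue                                # 1a: drop observations before 1 January 2010
+>        if valid_sites.get(row["SiteID"]) != row["Location"]:
+>            continue                                # 1b: drop missing or mismatched SiteID/Location
+>        writer.writerow(row)
+>```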
+
+
+### Task 2:
+
+I conducted exploratory data analysis to inform entity-relationship modelling and datatypes.
+
+To normalise the data, I created a determinacy diagram for the First Normal Form: 
+
+![Determinacy diagram - 1NF](md_images/det_diag_1nf_sml.png)
+
+This identified a number of attributes with partial functional dependencies. Moving these into their own relation normalised the data to Second Normal Form:
+
+![Determinacy diagram - 2NF](md_images/det_diag_2nf_sml.png)
+
+For simplicity, I replaced the composite key with a surrogate key (`reading_id`).
+
+I used MySQL Workbench to create the Entity-Relationship model and its forward-engineer feature to generate the SQL code that creates the database and tables. To use the forward-engineer feature, I customised my XAMPP installation to run MySQL 8.0.23.
+
+
+### Task 3
+
+I started by writing a single entry to a valid SQL string (or directly to the database), scaling to the full CSV file. This led to a couple of challenges:
+
+- The SQL query for the `reading` table was too large when using a single `INSERT` statement
+- Python scripts were slow to execute (2-3 minutes each)
+
+I addressed these by:
+
+- Using chunking to include multiple observations per `INSERT`, packaging them within a single transaction
+- Refining my code:
+  - More efficient data cleaning, using a combination of `apply()` and lambda functions
+  - Creating a DataFrame close to the required SQL format and using `itertuples()` to iterate over the rows and build the SQL queries
+
+This reduced the run times of 3a and 3b to c.15s and c.30s respectively; the chunked-insert approach is sketched below.
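+
+The sketch below illustrates the chunked-insert idea (the table name, column subset, chunk size and the deliberately naive value quoting are illustrative assumptions, not the submitted code):
+
+>```python
+>import pandas as pd
+>
+>CHUNK_SIZE = 1000                                  # rows per INSERT -- an assumed value
+>COLUMNS = ["site_id", "date_time", "no2", "co"]    # illustrative column subset
+>
+>def sql_value(value) -> str:
+>    """Very naive SQL literal formatting, for illustration only."""
+>    return "NULL" if pd.isna(value) else "'" + str(value).replace("'", "''") + "'"
+>
+>def build_insert_script(df: pd.DataFrame) -> str:
+>    """Package multi-row INSERT statements into a single transaction."""
+>    statements = ["START TRANSACTION;"]
+>    for start in range(0, len(df), CHUNK_SIZE):
+>        chunk = df.iloc[start:start + CHUNK_SIZE]
+>        values = ",\n".join(
+>            "(" + ", ".join(sql_value(v) for v in row) + ")"
+>            for row in chunk.itertuples(index=False)
+>        )
+>        statements.append(f"INSERT INTO reading ({', '.join(COLUMNS)}) VALUES\n{values};")
+>    statements.append("COMMIT;")
+>    return "\n".join(statements)
+>```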
+
+### Task 4
+
+I simplified each query into a basic SQL statement, adding clauses to restrict the results to meet task requirements.
+
+For example, for 4b I created a query that returned the maximum CO reading. I then created a query that returned `SiteID`, `Date Time` and `CO` levels and incorporated the original query as a subquery into the `WHERE` clause. Finally, I incorporated a join to add the location name from the `sites` table.
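+
+For illustration, the finished 4b query could look something like the sketch below, written here as a Python string in the style of the task 3 scripts (table and column names are assumed, not taken from the submitted SQL):
+
+>```python
+># Assumed table and column names (reading, sites, site_id, date_time, co, location).
+>QUERY_4B = """
+>SELECT s.location, r.site_id, r.date_time, r.co
+>FROM reading AS r
+>JOIN sites AS s ON s.site_id = r.site_id
+>WHERE r.co = (SELECT MAX(co) FROM reading);
+>"""
+>```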
+
+### Task 5
+
+See `report.md`.
+
+### Visualisation
+
+The data could be visualised using Python tools such as Matplotlib, Seaborn, Bokeh or ggplot. A basic example could include: 
+
+![Example Seaborn plot](md_images/Example_seaborn.png)
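+
+A minimal sketch of how a plot like this might be produced, assuming the cleaned readings are loaded into a pandas DataFrame with `Location` and `NO2` columns (the column names are assumptions):
+
+>```python
+>import pandas as pd
+>import seaborn as sns
+>import matplotlib.pyplot as plt
+>
+># Hypothetical input -- in practice this could come from the cleaned CSV or the database.
+>readings = pd.read_csv("bristol-air-quality-data-clean.csv")
+>
+># Bar chart of the mean NO2 reading per monitoring station.
+>means = readings.groupby("Location", as_index=False)["NO2"].mean()
+>ax = sns.barplot(data=means, x="Location", y="NO2")
+>ax.set_xlabel("Monitoring station")
+>ax.set_ylabel("Mean NO2 reading")
+>plt.xticks(rotation=90)
+>plt.tight_layout()
+>plt.savefig("Example_seaborn.png")
+>```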
+
+Options include: 
+- Bar charts comparing average readings across stations
+- Histograms examining the distribution of readings over a specified time period
+- Scatter diagrams or bubble charts visualising the relationship between measures
+- Heatmaps or geospatial plots visualising differences in readings across sites
+
+MongoDB has a visualisation module. Charts, including geospatial charts, can be created using a simple GUI. An example heatmap of NO readings using sampled data:
+
+![MongoDB dashboard](md_images/NO2_heatmap_mongodb.png)
+
+Interactive visualisations could also be developed using software such as PowerBI or Tableau. Other options include using a geospatial visualisation tool such as ArcGIS.
+
+
+## Feelings & Evaluation
+
+I found this assignment challenging in places, but the end-to-end nature of the work was very satisfying. I felt my skills improved considerably over the course of the assignment.
+
+I found tasks 2 and 4 more straightforward and tasks 3 and 5 more challenging. 
+
+I particularly enjoyed the opportunity to try a NoSQL implementation.
+
+## Analysis and conclusions
+
+Where I had taken time to understand a particular topic area in advance and plan my approach (such as for tasks 2 and 4), I found tasks more straightforward.
+
+I found task 3 challenging in part because of the bottom-up approach I took - whilst my approach worked for a single line, it wasn't scalable. Task 5 was challenging as it was a completely new subject area for me.
+
+My key takeaway is to apply a more top-down approach to programming tasks, taking more time to plan an approach that ensures my solution remains scalable.
+
+This assignment has considerably improved my data management skills in line with the learning outcomes. In particular:
+
+- My ability to write efficient Python code to cleanse large datasets
+- My knowledge of entity-relationship modelling and ability to use it to implement a SQL database
+- Refreshing my ability to use SQL to extract data
+- My theoretical understanding of NoSQL database implementations, and practical knowledge of MongoDB
+
+## Action plan
+
+Next time I'm tackling a more complex Python task, I will spend time planning a high-level approach before I start coding.
+
+I would like to further improve my knowledge of NoSQL database solutions, particularly Neo4j. I plan to find a self-contained analysis exercise requiring graph analysis to practice this.
+
+## References
+
+- Weinman, Bill *SQL Essential Training* Available from: https://www.linkedin.com/learning/sql-essential-training-3/ [Accessed 04 May 2021]
+- Malik, Usman *Introduction to Python SQL Libraries* Available from: https://realpython.com/python-sql-libraries/ [Accessed 04 May 2021]
+- Opitz, Daniel *XAMPP - Replacing MariaDB with MySQL 8* Available from: https://odan.github.io/2019/11/17/xampp-replacing-mariadb-with-mysql-8.html [Accessed 04 May 2021]
\ No newline at end of file
diff --git a/dmf/report.md b/dmf/report.md
new file mode 100644
index 0000000000000000000000000000000000000000..8bef35375b944b2c038a30805fd4cf6af601dde4
--- /dev/null
+++ b/dmf/report.md
@@ -0,0 +1,68 @@
+## Report - Task 5 - NoSQL implementation
+
+### Rationale for NoSQL modelling and database choice
+
+The Bristol air quality dataset is well-suited for a relational database model, as the observations are uniformly structured, the database is static (for our purposes) and analysis of the data using aggregation is required.
+
+However, a NoSQL implementation may be beneficial if we were to begin uploading new recordings to the database on an hourly basis (increasing the data velocity and, over time, the data volume). The downside is that we may lose some of the benefits of a relational database model, such as consistency.
+
+I am required to apply a moderately complex query to the data, but I am not concerned about the relationship between observations. As a result, I've opted to use a document-store database (MongoDB), over a simple key-value database (such as Redis) or graph database (such as Neo4J).
+
+### Model structure
+
+To realise the performance benefits of a NoSQL implementation, model design should be driven by the access pattern (i.e. the types of queries to be supported). I used a de-normalised approach, with a single collection, where each reading is a document. This will avoid the performance penalty associated with linking multiple collections. As a result, and for simplicity, I retained the original structure of the csv file when importing into MongoDB.
+
+### Data import and index creation
+
+I converted the cleaned CSV file into JSON. As JSON has no native date type, I found it helpful for querying to split the `DateTime` field into `Year`, `Month`, `Day` and `Time`. I also converted reading values to floats (e.g. observations for `NO` and `NO2`), except where data was missing. Missing observations were left as empty strings, although in future I would consider removing these fields before import, as MongoDB is able to store documents with non-uniform fields.
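+
+A minimal sketch of this kind of conversion (file names, the `Date Time` source column and the timestamp format are assumptions; only `NO` and `NO2` are converted here for brevity):
+
+>```python
+>import csv
+>import json
+>
+># Hypothetical paths and field names -- the real files and columns may differ.
+>with open("bristol-air-quality-data-clean.csv", newline="") as src, \
+>        open("upload_file.json", "w") as dst:
+>    for row in csv.DictReader(src):
+>        # Assumes a timestamp such as "2019-05-01 08:00:00"; split it into its parts.
+>        date_part, time_part = row["Date Time"].split(" ", 1)
+>        year, month, day = date_part.split("-")
+>        doc = dict(row, Year=year, Month=month, Day=day, Time=time_part[:5])
+>        for field in ("NO", "NO2"):
+>            doc[field] = float(doc[field]) if doc.get(field) else ""  # keep "" where missing
+>        dst.write(json.dumps(doc) + "\n")                             # one document per line for mongoimport
+>```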
+
+I then imported the JSON file into MongoDB using:
+
+>```
+>mongoimport --uri="mongodb+srv://<username>:<password>@cluster0.bmyfl.mongodb.net/bristol-air-quality" --drop --collection=readings --file=upload_file.json
+>```
+
+Once complete, I verified that the correct number of documents had been imported.
+
+![Import complete](md_images/mongodb_import.png)
+
+As we are proposing a query that groups by station and filters for a specific year, I added a composite index on station and year to improve performance.
+
+>```
+>db.readings.createIndex(  { "SiteID" : 1 , "Year" : 1 }  )
+>```
+
+### Database query
+
+To recreate the query, I used the aggregation pipeline to match on `Year` and `Time`, group by `Location` and calculate the average values of `NO` and `NO2`. The code used was:
+
+>```
+>db.readings.aggregate([{$match: {
+>  "Year" : "2019",
+>  "Time" : "08:00"
+>}}, {$group: {
+>  _id: "$Location",
+>  "NO_avg": {
+>    "$avg": "$NO"
+>  },
+>  "NO2_avg" : {
+>    "$avg" : "$NO2"
+>  }
+>}}]).pretty()
+>```
+
+To enable comparison against the outputs of task 4c, I read the query results into Python, using Pandas to format the data as a table:
+
+>```python
+>import pandas as pd
+>
+># Raw string so the backslash in the path is not treated as an escape character.
+>read_path = r"task5\query_result.json"
+>res = pd.read_json(read_path, lines=True)
+>print(res)
+>```
+
+![Query results](md_images/query_results.png)
+
+### References
+
+Katsov, Ilya *NoSQL Data Modeling Techniques* Available from: https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ [Accessed 21 April 2021]
+
+MongoDB *Course M001: MongoDB Basics* Available from: https://university.mongodb.com/courses/M001/about [Accessed 21 April 2021]
\ No newline at end of file
diff --git a/task5/report.md b/task5/report.md
new file mode 100644
index 0000000000000000000000000000000000000000..91d9af0ee5a3e0908012390899ea0fa46565ca24
--- /dev/null
+++ b/task5/report.md
@@ -0,0 +1,68 @@
+## Report - Task 5 - NoSQL implementation
+
+### Rationale for NoSQL modelling and database choice
+
+The Bristol air quality dataset is well-suited for a relational database model, as the observations are uniformly structured, the database is static (for our purposes) and analysis of the data using aggregation is required.
+
+However, a NoSQL implementation may be beneficial if we were to begin uploading new recordings to the database on an hourly basis (increasing the data velocity and, over time, the data volume). The downside is that we may lose some of the benefits of a relational database model, such as consistency.
+
+I am required to apply a moderately complex query to the data, but I am not concerned about the relationship between observations. As a result, I've opted to use a document-store database (MongoDB), over a simple key-value database (such as Redis) or graph database (such as Neo4J).
+
+### Model structure
+
+To realise the performance benefits of a NoSQL implementation, model design should be driven by the access pattern (i.e. the types of queries to be supported). I used a de-normalised approach, with a single collection, where each reading is a document. This will avoid the performance penalty associated with linking multiple collections. As a result, and for simplicity, I retained the original structure of the csv file when importing into MongoDB.
+
+### Data import and index creation
+
+I converted the cleaned CSV file into JSON. As JSON has no native date type, I found it helpful for querying to split the `DateTime` field into `Year`, `Month`, `Day` and `Time`. I also converted reading values to floats (e.g. observations for `NO` and `NO2`), except where data was missing. Missing observations were left as empty strings, although in future I would consider removing these fields before import, as MongoDB is able to store documents with non-uniform fields.
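+
+A minimal sketch of this kind of conversion (file names, the `Date Time` source column and the timestamp format are assumptions; only `NO` and `NO2` are converted here for brevity):
+
+>```python
+>import csv
+>import json
+>
+># Hypothetical paths and field names -- the real files and columns may differ.
+>with open("bristol-air-quality-data-clean.csv", newline="") as src, \
+>        open("upload_file.json", "w") as dst:
+>    for row in csv.DictReader(src):
+>        # Assumes a timestamp such as "2019-05-01 08:00:00"; split it into its parts.
+>        date_part, time_part = row["Date Time"].split(" ", 1)
+>        year, month, day = date_part.split("-")
+>        doc = dict(row, Year=year, Month=month, Day=day, Time=time_part[:5])
+>        for field in ("NO", "NO2"):
+>            doc[field] = float(doc[field]) if doc.get(field) else ""  # keep "" where missing
+>        dst.write(json.dumps(doc) + "\n")                             # one document per line for mongoimport
+>```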
+
+I then imported the JSON file into MongoDB using:
+
+>```
+>mongoimport --uri="mongodb+srv://<username>:<password>@cluster0.bmyfl.mongodb.net/bristol-air-quality" --drop --collection=readings --file=upload_file.json
+>```
+
+Once complete, I verified that the correct number of documents had been imported.
+
+![Import complete](mongodb_import.png)
+
+As we are proposing a query that groups by station and filters for a specific year, I added a composite index on station and year to improve performance.
+
+>```
+>db.readings.createIndex(  { "SiteID" : 1 , "Year" : 1 }  )
+>```
+
+### Database query
+
+To recreate the query, I used the aggregation pipeline to match on `Year` and `Time`, group by `Location` and calculate the average values of `NO` and `NO2`. The code used was:
+
+>```
+>db.readings.aggregate([{$match: {
+>  "Year" : "2019",
+>  "Time" : "08:00"
+>}}, {$group: {
+>  _id: "$Location",
+>  "NO_avg": {
+>    "$avg": "$NO"
+>  },
+>  "NO2_avg" : {
+>    "$avg" : "$NO2"
+>  }
+>}}]).pretty()
+>```
+
+To enable comparison against the outputs of task 4c, I read the query results into Python, using Pandas to format the data as a table:
+
+>```python
+>import pandas as pd
+>
+># Raw string so the backslash in the path is not treated as an escape character.
+>read_path = r"task5\query_result.json"
+>res = pd.read_json(read_path, lines=True)
+>print(res)
+>```
+
+![Query results](query_results.png)
+
+### References
+
+Katsov, Ilya *NoSQL Data Modeling Techniques* Available from: https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ [Accessed 21 April 2021]
+
+MongoDB *Course M001: MongoDB Basics* Available from: https://university.mongodb.com/courses/M001/about [Accessed 21 April 2021]
\ No newline at end of file
diff --git a/task5/task5_report.md b/task5/task5_report.md
index 6042e1140ad9ca0da2ef3c6a0625b6b2b1846db1..91d9af0ee5a3e0908012390899ea0fa46565ca24 100644
--- a/task5/task5_report.md
+++ b/task5/task5_report.md
@@ -2,24 +2,24 @@
 
 ### Rationale for NoSQL modelling and database choice
 
-The Bristol air quality dataset is well-suited for an relational database model, as the observations follow the same format, the database is static (for our purposes) and a range of relatively complex queries have been used to investigate the data.
+The Bristol air quality dataset is well-suited for a relational database model, as the observations are uniformly structured, the database is static (for our purposes) and analysis of the data using aggregation is required.
 
-However, a NoSQL implementation may be beneficial if we were to begin uploading new recordings to the database on an hourly basis (increasing the data velocity and, over time, the data volume). The downside is that we may lose some of the benefits of an relational database model, such as consistency and reliability.
+However, a NoSQL implementation may be beneficial if we were to begin uploading new recordings to the database on an hourly basis (increasing the data velocity and, over time, the data volume). The downside is that we may lose some of the benefits of a relational database model, such as consistency.
 
-We are required to apply moderately complex queries to the data, but we are not concerned about the relationship between observations. As a result, I've opted to use a document-store database (MongoDB), over a key-value database or graph database.
+I am required to apply a moderately complex query to the data, but I am not concerned about the relationship between observations. As a result, I've opted to use a document-store database (MongoDB), over a simple key-value database (such as Redis) or graph database (such as Neo4J).
 
 ### Model structure
 
-To get the performance benefits of a NoSQL implementation, model design should be driven by the access patterns (i.e. the types of queries to be supported). As a result, I used a de-normalised approach, with a single collection, where each reading is an observation. This will avoid the performance penalty associated with linking multiple collections. As a result, I retained the original structure of the csv file, when importing into MongoDB.
+To realise the performance benefits of a NoSQL implementation, model design should be driven by the access pattern (i.e. the types of queries to be supported). I used a de-normalised approach, with a single collection, where each reading is a document. This will avoid the performance penalty associated with linking multiple collections. As a result, and for simplicity, I retained the original structure of the csv file when importing into MongoDB.
 
 ### Data import and index creation
 
-I converted the cleaned csv file into JSON. As there is no date format in JSON, I found that it was helpful to split the `DateTime` field into `Year`, `Month`, `Day` and `Time`. I also converted reading values to floats (e.g. observations for `NO` and `NO2`), except where data was missing. For missing observations I left these as an empty string, although in future I would consider removing these fields before import. (**TODO If there's time**)
+I converted the cleaned CSV file into JSON. As JSON has no native date type, I found it helpful for querying to split the `DateTime` field into `Year`, `Month`, `Day` and `Time`. I also converted reading values to floats (e.g. observations for `NO` and `NO2`), except where data was missing. Missing observations were left as empty strings, although in future I would consider removing these fields before import, as MongoDB is able to store documents with non-uniform fields.
 
 I then imported the JSON file into MongoDB using:
 
 >```
->mongoimport --uri="mongodb+srv://<username>:<password>@cluster0.bmyfl.mongodb.net/bristol-air-quality" --drop --collection=readings --ignoreBlanks --file=upload_file.json
+>mongoimport --uri="mongodb+srv://<username>:<password>@cluster0.bmyfl.mongodb.net/bristol-air-quality" --drop --collection=readings --file=upload_file.json
 >```
 
 Once complete, I verified that the correct number of documents had been imported.
@@ -34,7 +34,7 @@ As we are proposing a query that groups by station and filters for a specific ye
 
 ### Database query
 
-To recreate the query, I used the aggregation pipeline to match on `Year` and `Time`, group by `Location`, calculating the average value of `NO` and `NO2`. Code used was:
+To recreate the query, I used the aggregation pipeline to match on `Year` and `Time`, group by `Location` and calculate the average values of `NO` and `NO2`. The code used was:
 
 >```
 db.readings.aggregate([{$match: {
@@ -51,7 +51,7 @@ db.readings.aggregate([{$match: {
 }}]).pretty()
 >```
 
-To format as a simple table and ease comparison against the outputs of task 4c, I read the outputs into Python, using Pandas to format the data as a table:
+To enable comparison against the outputs of task 4c, I read the query results into Python, using Pandas to format the data as a table:
 
 >```python
 read_path = "task5\query_result.json"
diff --git a/task6/det_diag_1nf.png b/task6/det_diag_1nf.png
index b107ec5bf5bf56a401971e90363f9a951896f93f..7cff7010bb66a2cb41b90707fd7e2120ba4ba5a9 100644
Binary files a/task6/det_diag_1nf.png and b/task6/det_diag_1nf.png differ
diff --git a/task6/det_diag_1nf_sml.png b/task6/det_diag_1nf_sml.png
index 93cda27dfffe5cbb59ed77c08a987e76ff41e2d6..a25c911790d2f581998d49166fffd8576afaf2e4 100644
Binary files a/task6/det_diag_1nf_sml.png and b/task6/det_diag_1nf_sml.png differ
diff --git a/task6/det_diag_2nf.png b/task6/det_diag_2nf.png
index 8feddc01d79bfb949cbf1924706f66e2ca018e6e..5f462fc155182ef3da4f11b1b7f996cfa133c13b 100644
Binary files a/task6/det_diag_2nf.png and b/task6/det_diag_2nf.png differ
diff --git a/task6/det_diag_2nf_sml.png b/task6/det_diag_2nf_sml.png
index f454db00f6d0a3cd0f7f76763856efaa0257157f..3b7f04dc1f4909374197f6258af86fb4bbfe2e7a 100644
Binary files a/task6/det_diag_2nf_sml.png and b/task6/det_diag_2nf_sml.png differ
diff --git a/task6/dmf_assessment_diagrams v1.pptx b/task6/dmf_assessment_diagrams v1.pptx
index 518ae86768a867757f3b9cee88aa10b32c3846e0..0ccac0fb51af20aad87635d10818388227889021 100644
Binary files a/task6/dmf_assessment_diagrams v1.pptx and b/task6/dmf_assessment_diagrams v1.pptx differ
diff --git a/task6/reflection.md b/task6/reflection.md
new file mode 100644
index 0000000000000000000000000000000000000000..9bc775fc1f78497eb8bb3699449600b2699f26c9
--- /dev/null
+++ b/task6/reflection.md
@@ -0,0 +1,109 @@
+# Reflective Report
+
+## Description
+
+### Task 1:
+
+For 1a, I chose to use the csv module to iterate through the dataset and retain only the required rows, as this was more efficient than using Pandas.
+
+The script removes 663,634 observations captured prior to 1 January 2010 and a further seven rows where `Date Time` is missing.
+
+For 1b, I created a dictionary mapping `SiteID` to `Location`. I then compared each row's `SiteID` and `Location` against the dictionary. All non-matching entries (or entries where `SiteID` was missing) were removed. The script removes 747 observations.
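+
+A minimal sketch of this cleaning logic, combining 1a and 1b into a single pass for brevity (file names, column names and the timestamp format are assumptions rather than the submitted scripts):
+
+>```python
+>import csv
+>from datetime import datetime
+>
+># Illustrative placeholder mapping -- the real dictionary is built from the list provided.
+>valid_sites = {"1": "Example Site A", "2": "Example Site B"}
+>
+>with open("bristol-air-quality-data.csv", newline="") as src, \
+>        open("bristol-air-quality-data-clean.csv", "w", newline="") as dst:
+>    reader = csv.DictReader(src)
+>    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
+>    writer.writeheader()
+>    for row in reader:
+>        stamp = row["Date Time"]
+>        if not stamp:
+>            continue                                # 1a: drop rows with a missing timestamp
+>        if datetime.fromisoformat(stamp[:19]) < datetime(2010, 1, 1):
+>            continue                                # 1a: drop observations before 1 January 2010
+>        if valid_sites.get(row["SiteID"]) != row["Location"]:
+>            continue                                # 1b: drop missing or mismatched SiteID/Location
+>        writer.writerow(row)
+>```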
+
+
+### Task 2:
+
+I conducted exploratory data analysis to inform entity-relationship modelling and datatypes.
+
+To normalise the data, I created a determinacy diagram for the First Normal Form: 
+
+![Determinacy diagram - 1NF](det_diag_1nf_sml.png)
+
+This identified a number of attributes with partial functional dependencies. Moving these into their own relation normalised the data to Second Normal Form:
+
+![Determinacy diagram - 2NF](det_diag_2nf_sml.png)
+
+For simplicity, I replaced the composite key with a surrogate key (`reading_id`).
+
+I used MySQL Workbench to create the Entity-Relationship model and its forward-engineer feature to generate the SQL code that creates the database and tables. To use the forward-engineer feature, I customised my XAMPP installation to run MySQL 8.0.23.
+
+
+### Task 3
+
+I started by writing a single entry to a valid SQL string (or directly to the database), scaling to the full CSV file. This led to a couple of challenges:
+
+- The SQL query for the `reading` table was too large when using a single `INSERT` statement
+- Python scripts were slow to execute (2-3 minutes each)
+
+I addressed these by:
+
+- Using chunking to include multiple observations per `INSERT`, packaging them within a single transaction
+- Refining my code:
+  - More efficient data cleaning, using a combination of `apply()` and lambda functions
+  - Creating a DataFrame close to the required SQL format and using `itertuples()` to iterate over the rows and build the SQL queries
+
+This reduced the run times of 3a and 3b to c.15s and c.30s respectively; the chunked-insert approach is sketched below.
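+
+The sketch below illustrates the chunked-insert idea (the table name, column subset, chunk size and the deliberately naive value quoting are illustrative assumptions, not the submitted code):
+
+>```python
+>import pandas as pd
+>
+>CHUNK_SIZE = 1000                                  # rows per INSERT -- an assumed value
+>COLUMNS = ["site_id", "date_time", "no2", "co"]    # illustrative column subset
+>
+>def sql_value(value) -> str:
+>    """Very naive SQL literal formatting, for illustration only."""
+>    return "NULL" if pd.isna(value) else "'" + str(value).replace("'", "''") + "'"
+>
+>def build_insert_script(df: pd.DataFrame) -> str:
+>    """Package multi-row INSERT statements into a single transaction."""
+>    statements = ["START TRANSACTION;"]
+>    for start in range(0, len(df), CHUNK_SIZE):
+>        chunk = df.iloc[start:start + CHUNK_SIZE]
+>        values = ",\n".join(
+>            "(" + ", ".join(sql_value(v) for v in row) + ")"
+>            for row in chunk.itertuples(index=False)
+>        )
+>        statements.append(f"INSERT INTO reading ({', '.join(COLUMNS)}) VALUES\n{values};")
+>    statements.append("COMMIT;")
+>    return "\n".join(statements)
+>```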
+
+### Task 4
+
+I simplified each query into a basic SQL statement, adding clauses to restrict the results to meet task requirements.
+
+For example, for 4b I created a query that returned the maximum CO reading. I then created a query that returned `SiteID`, `Date Time` and `CO` levels and incorporated the original query as a subquery into the `WHERE` clause. Finally, I incorporated a join to add the location name from the `sites` table.
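+
+For illustration, the finished 4b query could look something like the sketch below, written here as a Python string in the style of the task 3 scripts (table and column names are assumed, not taken from the submitted SQL):
+
+>```python
+># Assumed table and column names (reading, sites, site_id, date_time, co, location).
+>QUERY_4B = """
+>SELECT s.location, r.site_id, r.date_time, r.co
+>FROM reading AS r
+>JOIN sites AS s ON s.site_id = r.site_id
+>WHERE r.co = (SELECT MAX(co) FROM reading);
+>"""
+>```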
+
+### Task 5
+
+See `report.md`.
+
+### Visualisation
+
+The data could be visualised using Python tools such as Matplotlib, Seaborn, Bokeh or ggplot. A basic example could include: 
+
+![Example Seaborn plot](Example_seaborn.png)
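+
+A minimal sketch of how a plot like this might be produced, assuming the cleaned readings are loaded into a pandas DataFrame with `Location` and `NO2` columns (the column names are assumptions):
+
+>```python
+>import pandas as pd
+>import seaborn as sns
+>import matplotlib.pyplot as plt
+>
+># Hypothetical input -- in practice this could come from the cleaned CSV or the database.
+>readings = pd.read_csv("bristol-air-quality-data-clean.csv")
+>
+># Bar chart of the mean NO2 reading per monitoring station.
+>means = readings.groupby("Location", as_index=False)["NO2"].mean()
+>ax = sns.barplot(data=means, x="Location", y="NO2")
+>ax.set_xlabel("Monitoring station")
+>ax.set_ylabel("Mean NO2 reading")
+>plt.xticks(rotation=90)
+>plt.tight_layout()
+>plt.savefig("Example_seaborn.png")
+>```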
+
+Options include: 
+- Bar charts comparing average readings across stations
+- Histograms examining the distribution of readings over a specified time period
+- Scatter diagrams or bubble charts visualising the relationship between measures
+- Heatmaps or geospatial plots visualising differences in readings across sites
+
+MongoDB has a visualisation module. Charts, including geospatial charts, can be created using a simple GUI. An example heatmap of NO readings using sampled data:
+
+![MongoDB dashboard](NO2_heatmap_mongodb.png)
+
+Interactive visualisations could also be developed using software such as PowerBI or Tableau. Other options include using a geospatial visualisation tool such as ArcGIS.
+
+
+## Feelings & Evaluation
+
+I found this assignment challenging in places, but the end-to-end nature of the work was very satisfying. I felt my skills improved considerably over the course of the assignment.
+
+I found tasks 2 and 4 more straightforward and tasks 3 and 5 more challenging. 
+
+I particularly enjoyed the opportunity to try a NoSQL implementation.
+
+## Analysis and conclusions
+
+Where I had taken time to understand a particular topic area in advance and plan my approach (such as for tasks 2 and 4), I found tasks more straightforward.
+
+I found task 3 challenging in part because of the bottom-up approach I took - whilst my approach worked for a single line, it wasn't scalable. Task 5 was challenging as it was a completely new subject area for me.
+
+My key takeaway is to apply a more top-down approach to programming tasks, taking more time to plan an approach that ensures my solution remains scalable.
+
+This assignment has considerably improved my data management skills in line with the learning outcomes. In particular:
+
+- My ability to write efficient Python code to cleanse large datasets
+- My knowledge of entity-relationship modelling and ability to use it to implement a SQL database
+- Refreshing my ability to use SQL to extract data
+- My theoretical understanding of NoSQL database implementations, and practical knowledge of MongoDB
+
+## Action plan
+
+Next time I'm tackling a more complex Python task, I will spend time planning a high-level approach before I start coding.
+
+I would like to further improve my knowledge of NoSQL database solutions, particularly Neo4j. I plan to find a self-contained analysis exercise requiring graph analysis to practice this.
+
+## References
+
+- Weinman, Bill *SQL Essential Training* Available from: https://www.linkedin.com/learning/sql-essential-training-3/ [Accessed 04 May 2021]
+- Malik, Usman *Introduction to Python SQL Libraries* Available from: https://realpython.com/python-sql-libraries/ [Accessed 04 May 2021]
+- Opitz, Daniel *XAMPP - Replacing MariaDB with MySQL 8* Available from: https://odan.github.io/2019/11/17/xampp-replacing-mariadb-with-mysql-8.html [Accessed 04 May 2021]
\ No newline at end of file
diff --git a/task6/reflective_report_working_draft.md b/task6/reflective_report_working_draft.md
index 1224baf005cc7ea9ffe8f612462e810c1641be0e..9bc775fc1f78497eb8bb3699449600b2699f26c9 100644
--- a/task6/reflective_report_working_draft.md
+++ b/task6/reflective_report_working_draft.md
@@ -1,106 +1,109 @@
-# Reflective report for Data Management Fundamentals Assigmnent
-# Student number: 20056481
+# Reflective Report
 
 ## Description
 
 ### Task 1:
 
-I initially used pandas to complete task 1a, however I ultimately chose to use the csv module to iterate through the dataset and retain only the required rows as this was more efficient, taking c.30s (compared to c.60s for the pandas approach).
+For 1a, I chose to use the csv module to iterate through the dataset and retain only the required rows, as this was more efficient than using Pandas.
 
-The script removes 663,634 observations captured prior to 1 January 2010, and a further seven rows where `Date Time` is missing. 
+The script removes 663,634 observations captured prior to 1 January 2010 and a further seven rows where `Date Time` is missing.
 
-I used a similar approach for task 1b, storing each site ID and location in a dictionary, then comparing the `SiteID` and `Location` fields for each row against the dictionary. All entries where `SiteID` did not match `Location`, or where `SiteID` was missing were removed. I also removed entries containing a `SiteID` that was not in the list provided. The script removes 747 observations.
+For 1b, I created a dictionary mapping `SiteID` to `Location`. I then compared each row's `SiteID` and `Location` against the dictionary. All non-matching entries (or entries where `SiteID` was missing) were removed. The script removes 747 observations.
 
 
 ### Task 2:
 
-I initially conducted some exploratory data analysis (EDA) to understand the dataset. This informed entity and relationship modelling, as well as appropriate datatypes.
+I conducted exploratory data analysis to inform entity-relationship modelling and datatypes.
 
-To normalise the data, I started by creating a determinacy diagram for the First Normal Form: 
+To normalise the data, I created a determinacy diagram for the First Normal Form: 
 
 ![Determinacy diagram - 1NF](det_diag_1nf_sml.png)
 
-This identified a number of attributes with partial functional dependencies, which I separated into a separate relation, to normalise to the Second Normal Form:
+This identified a number of attributes with partial functional dependencies. Moving these into their own relation normalised the data to Second Normal Form:
 
 ![Determinacy diagram - 2NF](det_diag_2nf_sml.png)
 
-For simplicity, I replaced the composite key with a surrogate key (`reading_id`). Whilst `SiteID` and `Date Time` fields are then dependent on `reading_id`, they do not violate the no transitive dependency requirement of the Third Normal Form as they are candidate keys.
+For simplicity, I replaced the composite key with a surrogate key (`reading_id`).
 
-I applied a consistent naming standard to the attribute names and set appropriate data types. I separated `geo_point_2d` into separate float-type fields for `geo_point_lat` and `geo_point_long`.
+I used MySQL Workbench to create the Entity-Relationship model and its forward-engineer feature to generate the SQL code that creates the database and tables. To use the forward-engineer feature, I customised my XAMPP installation to run MySQL 8.0.23.
 
-I used MySQL Workbench to create the Entity-Relationship model, using the forward-engineer feature to generate the code to create the database and tables.
 
 ### Task 3
 
-I spent considerable time iterating my approach for task 3. I started with writing a single entry to a valid SQL string, or directly to the database. I then scaled the approach to all rows of the CSV file. This led to a couple of challenges:
+I started by writing a single entry to a valid SQL string (or directly to the database), scaling to the full CSV file. This led to a couple of challenges:
 
 - The SQL query for the `reading` table was too large when using a single `INSERT` statement
-- The Python script took a considerable time to run (around 2-3 minutes for each task)
+- Python scripts were slow to execute (2-3 minutes each)
 
-I resolved these by:
+I addressed these by:
 
-- Using chunking to include multiple rows per `INSERT` statement, packaging all statements within a single transaction to improve speed
-- Refining my Python code including:
+- Using chunking to include multiple observations per `INSERT`, packaging them within a single transaction
+- Refining my code:
   - More efficient data cleaning, using a combination of `apply()` and lambda functions
-  - Creating a dataFrame as close to the required SQL format as possible and using `itertuples()` to efficiently iterate through each row and create the SQL queries
+  - Creating a DataFrame close to the required SQL format and using `itertuples()` to iterate over the rows and build the SQL queries
 
-This reduced run time of 3a and 3b to c.15s and c.30s respectively. Running the SQL script takes c.15s.
+This reduced the run times of 3a and 3b to c.15s and c.30s respectively.
 
-## Task 4
+### Task 4
 
-For each query I broke the requirements into a general form to create a basic SQL statement and then added clauses to restrict the results to meet the requirements of the qustion.
+I simplified each query into a basic SQL statement, adding clauses to restrict the results to meet task requirements.
 
-For example, for query B I created a query that returned the maximum CO reading in the dataset. I then created a query that returned `SiteID`, `Date Time` and `CO` levels from `readings` and incorporated the original query as a subquery within the `WHERE` clause. Finally, incorporated a join to add location name from the `sites` table.
+For example, for 4b I created a query that returned the maximum CO reading. I then created a query that returned `SiteID`, `Date Time` and `CO` levels and incorporated the original query as a subquery into the `WHERE` clause. Finally, I incorporated a join to add the location name from the `sites` table.
 
-## Task 5
+### Task 5
 
-Prior to starting this task I took MongoDB's online course M001. For details on the process taken, see `report.md`.
+See `report.md`.
 
+### Visualisation
 
-## Visualisation
-
-The data could be visualised using Python tools such as Matplotlib, Seaborn, Bokeh or ggplot. An example could include: 
+The data could be visualised using Python tools such as Matplotlib, Seaborn, Bokeh or ggplot. A basic example could include: 
 
 ![Example Seaborn plot](Example_seaborn.png)
 
-Visualisation options include: 
-- Bar charts to compare average reading levels across stations
-- Histograms to examine the range of readings over a specified time period
-- Scatter diagrams or bubble charts to understand the relationship between different readings (such as NO and CO)
-- Heatmaps to visualise differences in readings across sites
-- Geospatial plots to visualise readings across locations, allowing for easier interpretation of proximity
-
-MongoDB has a visualisation module which is directly connected to the database. Charts, including geospatial charts, can be created using a GUI that also allows for simple manipulation of existing data into the format required for the visualisation.
+Options include: 
+- Bar charts comparing average readings across stations
+- Histograms examining the distribution of readings over a specified time period
+- Scatter diagrams or bubble charts visualising the relationship between measures
+- Heatmaps or geospatial plots visualising differences in readings across sites
 
-An example heatmap of NO readings using sampled data:
+MongoDB has a visualisation module. Charts, including geospatial charts, can be created using a simple GUI. An example heatmap of NO readings using sampled data:
 
 ![MongoDB dashboard](NO2_heatmap_mongodb.png)
 
-Interactive visualisations be developed using BI software such as PowerBI or Tableau. Other options include creating a Shiny dashboard with R or using a geospatial visualisation tool such as ArcGIS.
+Interactive visualisations could also be developed using software such as PowerBI or Tableau. Other options include using a geospatial visualisation tool such as ArcGIS.
 
 
 ## Feelings & Evaluation
 
-I found this assignment challenging in places, particularly task 3. However, I found the end-to-end nature of the assignment satsifying, and I felt my skills improved considerably over the course of the assignment.
+I found this assignment challenging in places, but the end-to-end nature of the work was very satisfying. I felt my skills improved considerably over the course of the assignment.
 
-I found task 4 was the most straightforward, as I have a fair amount of experience using SQL. I also found MongoDB relatively straightforward to use, as the online course had given me a thorough overview of the technology.
+I found tasks 2 and 4 more straightforward and tasks 3 and 5 more challenging. 
 
-I found tasks 1 and 3 the most challenging, and spent a disproportionate amount of time on task 3.
+I particularly enjoyed the opportunity to try a NoSQL implementation.
 
-## Analysis
+## Analysis and conclusions
 
 Where I had taken time to understand a particular topic area in advance and plan my approach (such as for tasks 2 and 4), I found tasks more straightforward.
 
-I found task 3 challenging in part because of the bottom-up approach I took - starting with a single line and attempting to scale up. This meant that while my solution worked for a single line, it wasn't scalable.
+I found task 3 challenging in part because of the bottom-up approach I took - whilst my approach worked for a single line, it wasn't scalable. Task 5 was challenging as it was a completely new subject area for me.
 
-## Conclusions
+My key takeaway is to apply a more top-down approach to programming tasks, taking more time to plan an approach that ensures my solution remains scalable.
 
-My key takeaway is to take a more top-down approach to programming tasks, taking more time to plan an appropriate approach. This will help ensure my solution remains scalable when moving from a single-line to the full dataset.
+This assignment has considerably improved my data management skills in line with the learning outcomes. In particular:
 
-This assignment has considerably improved my Python skills and increased the breadth of my data management skillset. I have learnt how to create an E-R diagram, forward engineer a database, alternative ways of populating a database, as well as implementing a NoSQL database.
+- My ability to write efficient Python code to cleanse large datasets
+- My knowledge of entity-relationship modelling and ability to use it to implement a SQL database
+- Refreshing my ability to use SQL to extract data
+- My theoretical understanding of NoSQL database implementations, and practical knowledge of MongoDB
 
 ## Action plan
 
 Next time I'm tackling a more complex python task, I will spend time planning a high-level approach before I start coding.
 
-I would like to further increase my knowledge of NoSQL database solutions, particularly neo4j. I plan to find a self-contained analysis exercise requiring graph analysis to practice this.
\ No newline at end of file
+I would like to further improve my knowledge of NoSQL database solutions, particularly Neo4j. I plan to find a self-contained analysis exercise requiring graph analysis to practice this.
+
+## References
+
+- Weinman, Bill *SQL Essential Training* Available from: https://www.linkedin.com/learning/sql-essential-training-3/ [Accessed 04 May 2021]
+- Malik, Usman *Introduction to Python SQL Libraries* Available from: https://realpython.com/python-sql-libraries/ [Accessed 04 May 2021]
+- Opitz, Daniel *XAMPP - Replacing MariaDB with MySQL 8* Available from: https://odan.github.io/2019/11/17/xampp-replacing-mariadb-with-mysql-8.html [Accessed 04 May 2021]
\ No newline at end of file