diff --git a/UFCFVQ-15-M_Programming_Task_2.ipynb b/UFCFVQ-15-M_Programming_Task_2.ipynb index d847f7bc72123f833b657865faa1c0c1af4ff0b2..edc89ba3946ceec0467eefae07193954dd3f9c43 100644 --- a/UFCFVQ-15-M_Programming_Task_2.ipynb +++ b/UFCFVQ-15-M_Programming_Task_2.ipynb @@ -1773,10 +1773,59 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "add markdown text here" + "## Introduction\n", + "\n", + "The purpose of this report is to reflect on my code development process for Task 2 of the `UFCFVQ-15-M Programming_for_Data_Science` coursework.\n", + "\n", + "## Code Description\n", + "\n", + "Task_2 requires undertaking a short 'data science' project, making use of Python libraries such as `pandas`, `numpy`, `matplotlib`, `seaborn` and `scipy`. It involves importing two datasets, then merging and cleaning the data, followed by analysis that includes visualisation and appropriate statistical testing. The project is presented in a Jupyter notebook. \n", + "\n", + "## Development Process\n", + "\n", + "As with Task_1, my development process roughly followed an iterative CRISP-DM approach. \n", + "\n", + "Following the import, merge, filter, and clean tasks, I started the analysis. This was very much in the <i>exploratory</i> spirit of Exploratory Data Analysis (EDA). In hindsight, I would like to develop a more structured approach to my EDA.\n", + "\n", + "Being new to Python, I faced a steep learning curve: I often knew what I wanted to achieve but not how to do it. I also needed to spend time refreshing my use of statistical tests.\n", + "\n", + "## Code Evaluation\n", + "\n", + "The initial tasks (FR7-9) were straightforward in that they were a case of following the instructions. I did not encounter any problems with these tasks. That said, I always wonder whether there is a <i>more efficient</i> or <i>standard</i> approach. 
\n", + "\n", + "I used the tools and skills I have encountered so far, particularly Python libraries – especially `pandas`.\n", + "\n", + "FR10-13 were much more challenging (and interesting) because they are not rigid tasks. Whilst undertaking EDA, I found that I was simultaneously contributing to all four remaining FRs and that I would need to unpick my code for the purposes of the coursework later.\n", + "\n", + "As a result of significant EDA (mainly visual), I decided to further 'clean' and 'prepare' the data before presenting any visualisations for FR10. I produced a lot of visualisations using both matplotlib and seaborn, experimenting with the many options, occasionally getting lost in the process yet always learning something new.\n", + "\n", + "I believe that my code is very thorough and shows helpful, relevant output. I made it as readable as possible, commenting where necessary, including adding markdown cells to present the story of the data. \n", + "\n", + "I would say that I may be presenting too many similar visualisations - e.g., distributions. This is partly because this is coursework (and the audience is not typical) but also because I am not yet sure which visualisation is preferable. \n", + "\n", + "FR11-13 are more succinct as I settled into an approach – style and convention (comments, naming, etc.) – but still very thorough - i.e., I investigated different statistical tests and visualisations, and practised git, markdown, etc.\n", + "\n", + "### Strengths\n", + "\n", + "* use of different libraries\n", + "* attempt to find a style / approach / format\n", + "* thoroughness\n", + "* achieving goals\n", + "\n", + "### Weaknesses\n", + "\n", + "* too many visualisations\n", + "* duplication - i.e. 
multiple similar visualisations or statistical tests (although in my defence, I want to learn and practise...)\n", + "\n", + "### Future Improvements\n", + "\n", + "* learning standards, best practice, efficiencies, more libraries\n", + "* balance code / comment / markdown / output\n", + "* gain confidence and skills\n" ] }, { @@ -1806,7 +1855,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.0" + "version": "3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]" }, "vscode": { "interpreter": {