"### <font color = 'orange'>Requirement FR1</font> - Develop a function to find the arithmetic mean"
]
},
{
...
@@ -98,12 +99,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"deletable": false
},
"source": [
"### <font color = 'orange'>Requirement FR2</font> - Develop a function to read a single column from a CSV file"
]
},
{
...
@@ -166,12 +168,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"deletable": false
},
"source": [
"### <font color = 'orange'>Requirement FR3</font> - Develop a function to read CSV data from a file into memory"
]
},
{
...
@@ -224,12 +227,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"deletable": false
},
"source": [
"### <font color = 'orange'>Requirement FR4</font> - Develop a function to calculate the Pearson Correlation Coefficient for two named columns"
]
},
{
...
@@ -312,12 +316,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"deletable": false
},
"source": [
"### <font color = 'orange'>Requirement FR5</font> - Develop a function to generate a set of Pearson Correlation Coefficients for a given data file "
]
},
{
...
@@ -372,12 +377,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"deletable": false
},
"source": [
"### <font color = 'orange'>Requirement FR6</font> - Develop a function to print a custom table"
]
},
{
...
@@ -516,14 +522,15 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color = 'gold'><b> Introduction </b></font>\n",
"\n",
"The purpose of this report is to provide a short, critical self-assessment of my code development process for Task 1 of the coursework for `UFCFVQ-15-M Programming_for_Data_Science`. \n",
"\n",
"### <b><font color = 'gold'> Code Description </font></b>\n",
"Task 1 requires writing functions in order, ultimately to calculate Pearson’s Correlation Coefficients (PCCs) for pairs of variables in a given data file, without using imported Python libraries, and to print a decent-looking table. \n",
"\n",
"Functional requirements (FRs):\n",
...
@@ -540,7 +547,7 @@
"The code was developed in a Jupyter notebook using a Python 3.11 kernel. \n",
"\n",
"\n",
"### <b><font color = 'gold'>Development Process</font></b>\n",
"\n",
"My development process made use of the task’s inherent structure, allowing me to plan, develop and test each FR independently, before combining as needed. This was especially useful for more complex FRs, which required significant iteration and testing before achieving the desired results.\n",
"\n",
...
@@ -551,7 +558,7 @@
" \n",
"I made conscious use of “new-to-me” tools and techniques like Git, VS Code, Jupyter notebooks and Markdown.\n",
"\n",
"### <b><font color = 'gold'>Code Evaluation</font></b>\n",
"Overall, I am pleased with my code: functions achieve the requirements (as interpreted) and they <i>feel</i> efficient and robust. \n",
"\n",
"Principles in mind when writing functions:\n",
...
@@ -561,12 +568,12 @@
"* Unambiguous, self-explanatory naming of functions and variables\n",
"* Helpful comments/docstrings, balancing approaches like DRY (Don’t Repeat Yourself), WET (Write Everything Twice) and KISS (Keep It Simple, Stupid)\n",
"\n",
"#### <b><font color = 'gold'> Strengths</font> </b> \n",
"* Well-commented, functioning code\n",
"* Consistent Git use for version control\n",
"* Kept working notes\n",
"\n",
"#### <b> <font color = 'gold'>Improvements / To-do </font></b>\n",
"* Perhaps over-commented; erred on the side of caution\n",
"\n",
"[Archived reflective notes by task](archived/Task1_FR_reflections.md)\n",
"\n",
"\n",
"#### <b> <font color = 'gold'>Summary</font> </b>\n",
"I found this task both appealing and beneficial. It allowed me to build a useful function from the ground up, making use of different Python coding techniques and data structures whilst also employing version control and applying appropriate metadata to the code.\n",
"\n",
"I am super-keen to keep learning for my personal and professional development, picking up best practice, standard approaches and avoiding pitfalls. This task allowed me to practice all of this. \n",
...
%% Cell type:markdown id: tags:
# UFCFVQ-15-M Programming for Data Science (Autumn 2022)
Function with two mandatory parameters (filename and column number (0 to n-1)) and two optional parameters (delimiter and header). The function will return a list of data from a specified column and the column name (if header is True). If header is False, the function will return only the list of data.
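The FR2 implementation itself is not included in this export. A minimal sketch matching the description above (the function and parameter names are assumptions; only the described behaviour comes from the notebook) might look like:

``` python
def FR2_read_column(filename, col_num, delimiter=',', header=True):
    # Hypothetical sketch: the function and parameter names are assumptions;
    # the behaviour (mandatory filename and column number, optional delimiter
    # and header flag) follows the description above.
    with open(filename) as open_file:
        rows = [line.strip().split(delimiter) for line in open_file]
    if header:
        # first line holds the column names: return the name with the data
        col_name = rows[0][col_num]
        return col_name, [row[col_num] for row in rows[1:]]
    # no header: every line is data, so return only the list of data
    return [row[col_num] for row in rows]
```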
``` python
def FR3_read_csv_into_dictionary(filename, delimiter=','):
    '''
    Function with one mandatory parameter (filename) to read csv file and return a dictionary with column names as keys and data as values. Default delimiter is comma. Assumes that the first line of the file contains the column names (i.e. header) which become the dictionary keys.
    '''
    # NB: the def line is missing from this export; the signature above is
    # reconstructed from the docstring and the call below
    try:
        with open(filename) as openFile:  # open file
            variable = next(openFile).strip().split(delimiter)  # read first line and split into list - to get dictionary keys
            data = [line.strip().split(delimiter) for line in openFile]  # read remaining lines and split into list of lists - to get corresponding dictionary values
        # create dictionary with keys and values (as float) by iterating through variable list and data list
        variable_data_dict = {variable[i]: [float(row[i]) for row in data] for i in range(len(variable))}
        return variable_data_dict
    except FileNotFoundError:
        print("Error: File not found. Please check file name, extension and path")
```
%% Cell type:code id: tags:
``` python
my_dict = FR3_read_csv_into_dictionary('task1.csv')
print(my_dict['age'][0:10])  # print first 10 elements of 'age' column
```
### <font color = 'orange'>Requirement FR4</font> - Develop a function to calculate the Pearson Correlation Coefficient for two named columns
%% Cell type:code id: tags:
``` python
def FR4_pearsonCorrCoef(x, y):
    '''
    Function to calculate the Pearson Correlation Coefficient (PCC), often represented by the letter 'r'. PCC is a measure of linear correlation between two variables, ranging from -1 to 1. A value of 1 indicates a perfect positive linear relationship; a value of -1 indicates a perfect negative linear relationship; and a value of 0 indicates no linear relationship.
    The function takes two lists of numbers as input and returns a single value - the Pearson Correlation Coefficient. The function will return None if the lists are not the same length or if the lists contain non-numerical values.
    '''
    # Check that x and y are lists of numbers of same length
    try:
        assert type(x) == list
        assert type(y) == list
        assert len(x) == len(y)
        assert len(x) > 0
    except AssertionError:
        print("Error: x and y MUST be same-length lists of only numbers in order to calculate Pearson's Correlation Coefficient")
        return None
    # Calculate mean of x and y
    avg_x = FR1_mean(x)
    avg_y = FR1_mean(y)
    # Calculate (population) standard deviation of x and y
    std_x = (sum([(xi - avg_x) ** 2 for xi in x]) / len(x)) ** 0.5
    std_y = (sum([(yi - avg_y) ** 2 for yi in y]) / len(y)) ** 0.5
    # returns list of tuples with x, y and PCC values if required
    # PCCs = [(x[i] - avg_x) * (y[i] - avg_y) / (std_x * std_y) for i in range(len(x))]
    # return [(x[i], y[i], PCCs[i]) for i in range(len(x))]
    # Calculate Pearson Correlation Coefficient for lists x and y
    # (the final lines are truncated in this export; r is the covariance
    #  divided by the product of the standard deviations)
    covariance = sum([(x[i] - avg_x) * (y[i] - avg_y) for i in range(len(x))]) / len(x)
    return covariance / (std_x * std_y)
```
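`FR4_pearsonCorrCoef` relies on the FR1 helper, whose definition falls outside this fragment. A one-line sketch consistent with how it is used above (the name `FR1_mean` is taken from the call; the body is an assumption):

``` python
def FR1_mean(num_list):
    # Sketch of the FR1 helper called above: the arithmetic mean is the
    # sum of the values divided by their count. The original definition
    # is not included in this export.
    return sum(num_list) / len(num_list)
```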
### <font color = 'orange'>Requirement FR5</font> - Develop a function to generate a set of Pearson Correlation Coefficients for a given data file
%% Cell type:code id: tags:
``` python
def FR5_PCCs_from_csv(filename):
    '''
    Function to calculate Pearson Correlation Coefficient (PCC) for all combinations of columns in a csv file, where each column is a variable, with column header as variable name.
    '''
    # Read csv file into variable as dictionary
    my_dict = FR3_read_csv_into_dictionary(filename)
    # Iterate through dictionary to calculate PCC for all combinations of variables, using FR4_pearsonCorrCoef function
```
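The pairwise loop in FR5 is truncated in this export. Assuming FR5 returns `(column_a, column_b, r)` tuples, which is consistent with how FR6 consumes `tup_list` below, one plausible completion can be sketched as a standalone function (names here are hypothetical; an inline Pearson r keeps the sketch self-contained):

``` python
def pccs_for_all_pairs(my_dict):
    # Hypothetical sketch of the truncated FR5 loop: build a
    # (col_a, col_b, r) tuple for every ordered pair of columns.
    def pearson_r(x, y):
        # inline Pearson r so the sketch runs on its own; the notebook
        # would call FR4_pearsonCorrCoef here instead
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        std_x = (sum((v - mean_x) ** 2 for v in x) / n) ** 0.5
        std_y = (sum((v - mean_y) ** 2 for v in y) / n) ** 0.5
        cov = sum((x[i] - mean_x) * (y[i] - mean_y) for i in range(n)) / n
        return cov / (std_x * std_y)
    cols = list(my_dict.keys())
    return [(a, b, pearson_r(my_dict[a], my_dict[b])) for a in cols for b in cols]
```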
``` python
def FR6_print_custom_table(tup_list, *col_headers, padding='*'):
    '''
    Function which takes a list of tuples, columns to include (as *arguments) and optional single padding character (defaulted to '*') as parameters. The padding character is used to create a table with a border.
    '''
    # NB: the def line is missing from this export; the function name and
    # parameter names above are reconstructed from the docstring
    # if no column headers are provided, use all unique column headers from tup_list
    if not col_headers:
        col_headers = sorted(set([x[0] for x in tup_list]))
    # create list of unique row headers (same as cols)
    row_headers = col_headers
    # calculate maximum column width in the data
    # (max_col_width is a helper defined elsewhere in the notebook, not shown here)
    max_width = int(max_col_width(tup_list) * 1.9)
    # create table string with top border based on padding character and maximum column width
```
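The table-building code is cut off at this point. As an illustration of the bordering idea described in the docstring (the helper names and layout below are assumptions, not the notebook's implementation), a framed table can be assembled like this:

``` python
def bordered_row(cells, width, pad='*'):
    # centre each cell in a fixed width and frame the row with the
    # padding character (hypothetical helper illustrating the idea)
    return pad + pad.join(str(c).center(width) for c in cells) + pad

def bordered_table(rows, width, pad='*'):
    # top border, one framed line per row of cells, bottom border
    border = pad * (len(rows[0]) * (width + 1) + 1)
    body = [bordered_row(r, width, pad) for r in rows]
    return '\n'.join([border] + body + [border])
```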
### <font color = 'gold'><b> Introduction </b></font>
The purpose of this report is to provide a short, critical self-assessment of my code development process for Task 1 of the coursework for `UFCFVQ-15-M Programming_for_Data_Science`.
### <b><font color = 'gold'> Code Description </font></b>
Task 1 requires writing functions in order, ultimately to calculate Pearson’s Correlation Coefficients (PCCs) for pairs of variables in a given data file, without using imported Python libraries, and to print a decent-looking table.
Functional requirements (FRs):
| FR | Description |
|-----|-----------------------|
| FR1 | Arithmetic mean |
| FR2 | Read column from file |
| FR3 | Read file |
| FR4 | PCC for two lists |
| FR5 | PCC for file |
| FR6 | Print table |
The code was developed in a Jupyter notebook using a Python 3.11 kernel.
### <b><font color = 'gold'>Development Process</font></b>
My development process made use of the task’s inherent structure, allowing me to plan, develop and test each FR independently, before combining as needed. This was especially useful for more complex FRs, which required significant iteration and testing before achieving the desired results.
I used a modified CRISP-DM approach: understanding the requirements, then cycling through iterations of pseudocode, Python code and testing until achieving the desired results. I found it very effective, but also found that I can occasionally go “off-piste” in the iterations, which can be time-consuming, frustrating and ultimately less productive.
I made conscious use of “new-to-me” tools and techniques like Git, VS Code, Jupyter notebooks and Markdown.
### <b><font color = 'gold'>Code Evaluation</font></b>
Overall, I am pleased with my code: functions achieve the requirements (as interpreted) and they <i>feel</i> efficient and robust.
Principles in mind when writing functions:
* Future-proofed: generic, flexible, adaptable to allow reusability
* User-friendly, with assertions and error-handling
* Unambiguous, self-explanatory naming of functions and variables
* Helpful comments/docstrings, balancing approaches like DRY (Don’t Repeat Yourself), WET (Write Everything Twice) and KISS (Keep It Simple, Stupid)
#### <b><font color = 'gold'> Strengths</font> </b>
* Well-commented, functioning code
* Consistent Git use for version control
* Kept working notes
#### <b> <font color = 'gold'>Improvements / To-do </font></b>
* Perhaps over-commented; erred on the side of caution

[Archived reflective notes by task](archived/Task1_FR_reflections.md)
#### <b> <font color = 'gold'>Summary</font> </b>
I found this task both appealing and beneficial. It allowed me to build a useful function from the ground up, making use of different Python coding techniques and data structures whilst also employing version control and applying appropriate metadata to the code.
I am super-keen to keep learning for my personal and professional development, picking up best practice, standard approaches and avoiding pitfalls. This task allowed me to practice all of this.
When it comes to Python, I am amazed at the many possibilities for solving the same scenario – this can make it challenging to identify the ‘best approach’, if one exists. This is something I will need to get used to and embrace.