Projects

Survival rate of low birth weight babies

Aug 2018 - Sept 2018

Abstract: According to the World Health Organization (WHO), low birth weight is defined as a weight of less than 2500 g at birth. Throughout the world, low birth weight continues to be a considerable health problem. This paper reports on a statistical investigation performed on a given SAS VA dataset, which contains birth data from 671 neonates with a birth weight of less than 1500 g. The investigation seeks to establish whether a statistically significant relationship exists between any of the predictive variables in the dataset and the survival rate of low birth weight babies; the null hypothesis is that no such relationship exists. A standard, methodical investigation was performed on all the variables in the dataset, and the findings were then used to build a comprehensive logistic regression model. Several predictive variables were identified as potential influencers of neonatal survival rates, and an interesting interaction effect was discovered, all of which supports rejecting the null hypothesis. Additionally, the results of the logistic regression analysis were compared to a decision tree, and the two classification techniques were found to yield very similar results.

Link

Black vs. White

Jun 2018 - Jun 2018

Abstract: Do black vehicles have a greater mean wheel diameter than white vehicles? The underlying, safe assumption is that there is no significant difference in mean wheel diameter between black vehicles and white vehicles (the null hypothesis). The R code can be used as is, although the analysis and interpretation of the statistical test are really what this exercise was about.
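For readers who want a feel for the kind of test involved, here is a minimal sketch in Python (the project itself uses R). It assumes a Welch two-sample t-test, which may not be the exact test used in the project, and the wheel diameters are made-up values for illustration only. The function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variance (n - 1 denominator) divided by sample size.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


# Made-up wheel diameters in inches, NOT taken from the project data.
black = [17, 18, 18, 19, 20, 17]
white = [16, 17, 17, 18, 18, 16]

t_stat, df = welch_t(black, white)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A positive t statistic points in the direction of "black is larger"; whether it is significant depends on comparing it against the t distribution with the computed degrees of freedom, which is the interpretation step the abstract refers to.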

Link

Analysis of TV Energy Ratings

Jan 2018 - Feb 2018

Abstract: This project investigates the relationship between the physical attributes of television appliances (screen size, power consumption and technology) and their associated energy ratings. The R code can be executed as is. It will attempt to read the main data file from the data.gov.au site. In case this fails because the file has been removed, changed or renamed, a copy of the original data file is included in the repo.

Link

AutoSalesReports

Jun 2017 - Dec 2018

Abstract: Using the Caspio cloud-based development environment, I designed, coded, tested and rolled out AutoSalesReports (ASR). ASR is a cloud-based BI solution for the Australian automotive industry. It is a fully integrated CRM and Saleslog system with advanced BI reporting capabilities. Microsoft SQL Server was used as the cloud database. Extensive JavaScript was used to implement a large portion of ASR's functionality in Caspio.

Link

Link to Caspio case study


The Face of Death

Jun 2017 - Dec 2018

Abstract: The purpose of this project was to apply Principal Component Analysis (PCA) to a dataset that contains the categorized death records of the Australian populace since 1907. This report presents the interesting idea of visualizing death data in three dimensions. A subtle personification of death is created by visualizing the data. In doing so, numerous observations come to light that give a better understanding of related entities hidden in and between the death numbers, such as the correlation between deaths in World War One and World War Two, as well as the disassociation within deaths in the 85+ age groups.

Link

Tableau Dashboard

Jun 2017 - Dec 2018

Abstract: This report explains how a given dataset is manipulated, augmented and consumed by Tableau. Five questions, as anticipated in a management meeting scenario, are asked about the dataset. To answer the five questions, a rationale is given for the design decisions that underlie four visualizations.

Link to GitHub

Link to online dashboard


Naive Bayes Classifier

Jun 2017 - Dec 2018

Abstract: This project implements a wrapper around the Naive Bayes classifier in R. Feature selection is done with a prune-later scheme. The UCI Mushroom dataset is used to classify mushrooms as poisonous or non-poisonous.
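As a rough illustration of the underlying classifier, here is a minimal categorical Naive Bayes sketch in Python (the project itself is written in R). It assumes Laplace smoothing and does not show the wrapper or the prune-later feature selection. The function names `train` and `predict`, the two features and the toy rows are hypothetical and are not taken from the UCI Mushroom dataset.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels, alpha=1.0):
    """Count classes and per-class feature values; alpha is the Laplace smoothing term."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values, alpha


def predict(model, row):
    """Return the label with the highest log posterior for one row."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)  # smoothed log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy rows of (odor, cap colour); invented for illustration only.
rows = [("foul", "brown"), ("foul", "white"), ("none", "brown"),
        ("none", "white"), ("almond", "white")]
labels = ["poisonous", "poisonous", "edible", "edible", "edible"]

model = train(rows, labels)
print(predict(model, ("foul", "brown")))
print(predict(model, ("none", "white")))
```

Working in log space avoids underflow when many features are multiplied together, and the smoothing term keeps a feature value that never co-occurred with a class from forcing that class's probability to zero.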

Link

Little exercise in NumPy and Pandas

Jun 2017 - Dec 2018

Abstract: Implementation of a simple Python app that calculates basic statistics on a given dataset.

Link

DIKW-Hierarchy

Jun 2017 - Dec 2018

Abstract: Let's get a bit philosophical. The concepts of data, information, knowledge and wisdom are the core building blocks of data science. Russell Lincoln Ackoff (1989) was the first to systematically arrange these terms into a hierarchy (referred to variously as the DIKW hierarchy, Knowledge pyramid, Knowledge hierarchy and Information hierarchy) in his article 'From Data to Wisdom'. This paper lists my thoughts on the DIKW hierarchy in response to five questions.

Link