Projects
Survival rate of low birth weight babies
Aug 2018 - Sept 2018
Abstract: According to the World Health Organization (WHO), low birth weight is defined as a weight of less than 2500g at birth. Throughout the world, low birth weight continues to be a considerable health problem. This paper reports on a statistical investigation of a given SAS VA dataset, which contains birth data from 671 neonates with a birth weight of less than 1500g. The investigation seeks to establish whether a statistically significant relationship exists between any of the predictive variables in the dataset and the survival rate of low birth weight babies (the null hypothesis being that no such relationship exists). A standard, methodical investigation was performed on all the variables in the dataset, and the findings were then used to build a comprehensive logistic regression model. Several predictive variables were identified as potential influencers in predicting survival rates for neonates, and an interesting interaction effect was discovered, all of which support rejecting the null hypothesis. Additionally, the results of the logistic regression analysis were compared to those of a decision tree, and the two classification techniques were found to yield very similar results.
Link
Black vs. White
Jun 2018 - Jun 2018
Abstract: Do black vehicles have a greater mean wheel diameter than white vehicles? The underlying, safe assumption is that there is no significant difference in mean wheel diameter between black vehicles and white vehicles (the null hypothesis). The R code can be used as is, although the analysis and interpretation of the statistical test is really what this exercise was about.
Link
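As an illustration of the kind of one-sided comparison described in the Black vs. White abstract, here is a minimal sketch in Python. It is not the R code from the project: it swaps in a permutation test on the difference in means, and the function name `permutation_test_greater` and its parameters are assumptions made for this example only.

```python
import random
from statistics import mean


def permutation_test_greater(group_a, group_b, n_permutations=5000, seed=0):
    """One-sided permutation test for H1: mean(group_a) > mean(group_b).

    Hypothetical helper, not the project's own method. Returns the observed
    difference in means and an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1

    # The +1 correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Calling `permutation_test_greater(black_diameters, white_diameters)` with two lists of wheel diameters would return the observed difference and a p-value; a small p-value counts as evidence against the null hypothesis of no difference.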
Analysis of TV Energy Ratings
Jan 2018 - Feb 2018
Abstract: This project investigates the relationship between the physical attributes of television appliances (screen size, power consumption and technology) and their associated energy ratings. The R code can be executed as is. It will attempt to read the main data file from the data.gov.au site. If this fails because the file has been removed, changed or renamed, the original data file is included in the repo.
Link
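The "try the remote file first, fall back to the copy in the repo" behaviour described for the TV energy ratings project could be sketched roughly as follows. This is a Python illustration, not the project's R code; the function name `read_csv_with_fallback` and the URL and path passed to it are hypothetical.

```python
import csv
import io
import urllib.request
from pathlib import Path


def read_csv_with_fallback(url, local_path, timeout=10):
    """Read a CSV from `url`; if that fails, read the local copy instead.

    Hypothetical helper. Returns a list of dicts, one per data row.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            text = response.read().decode("utf-8")
    except (OSError, ValueError):
        # OSError covers network and HTTP errors (URLError is a subclass);
        # ValueError covers malformed URLs and undecodable content.
        text = Path(local_path).read_text(encoding="utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Note that this sketch only detects a download that fails outright. A file that still downloads but has changed its columns would need an additional check on the header row.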
AutoSalesReports
Jun 2017 - Dec 2018
Abstract: Using the Caspio cloud-based development environment, I designed, coded, tested and rolled out AutoSalesReports (ASR). ASR is a cloud-based BI solution for the Australian automotive industry: a fully integrated CRM and Saleslog system with advanced BI reporting capabilities. Microsoft SQL Server was used as the cloud database. Extensive JavaScript was used to implement a large portion of ASR's functionality in Caspio.
Link
The Face of Death
Jun 2017 - Dec 2018
Abstract: The purpose of this project was to apply Principal Component Analysis (PCA) to a dataset containing the categorized death records of the Australian populace since 1907. This report presents the interesting idea of visualizing death data in three dimensions, which creates a subtle personification of death. In doing so, numerous observations come to light that give a better understanding of related entities hidden in and between the death numbers, such as the correlation between deaths in World War One and World War Two, as well as the disassociation within deaths in the 85+ age groups.
Link
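To make the PCA step in The Face of Death more concrete, here is a minimal pure-Python sketch that finds only the first principal component by power iteration on the sample covariance matrix. It is an illustration under stated assumptions, not the project's implementation: the project worked in three dimensions and its code is not shown here, and the function name `first_principal_component` is hypothetical. The sketch assumes the data has at least two rows and is not constant.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the first principal component.

    Hypothetical helper: `rows` is a list of equal-length numeric sequences.
    """
    n = len(rows)
    d = len(rows[0])

    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeatedly apply the covariance matrix and normalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance explained along v.
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    variance = sum(v[i] * cov_v[i] for i in range(d))
    return v, variance
```

Projecting each centered row onto the returned direction gives its coordinate on the first component. Further components, as needed for a three-dimensional view, would require deflation or a full eigendecomposition from a numerical library.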
Tableau Dashboard
Jun 2017 - Dec 2018
Abstract: This report explains how a given dataset is manipulated, augmented and consumed by Tableau. Five questions, as might be anticipated in a management meeting scenario, are asked about the dataset. To answer them, a rationale is given for the design decisions that underlie four visualizations.
Link to GitHub
Naive Bayes Classifier
Jun 2017 - Dec 2018
Abstract: This project implements a wrapper around the Naive Bayes classifier in R. Feature selection is done with a prune-later scheme. The UCI Mushroom dataset is used to classify mushrooms as poisonous or non-poisonous.
Link
Little exercise in NumPy and pandas
Jun 2017 - Dec 2018
Abstract: Implementation of a simple Python app that calculates basic statistics on a given dataset.
Link
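For a sense of what "basic statistics on a given dataset" can look like, here is a minimal sketch. It deliberately uses Python's standard `statistics` module rather than NumPy or pandas so that it runs without extra packages, so it is not the app's actual code; the function name `describe` is an assumption for this example.

```python
from statistics import mean, median, stdev


def describe(values):
    """Return basic summary statistics for a sequence of numbers.

    Hypothetical helper. Needs at least two values, because the sample
    standard deviation is undefined for a single observation.
    """
    data = sorted(values)
    return {
        "count": len(data),
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),  # sample standard deviation (n - 1 denominator)
        "min": data[0],
        "max": data[-1],
    }
```

With pandas installed, `Series.describe()` gives a comparable summary in one call.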
DIKW-Hierarchy
Jun 2017 - Dec 2018
Abstract: Let's get a bit philosophical. The concepts of data, information, knowledge and wisdom are the core building blocks of data science. Russell Lincoln Ackoff (1989) was the first to systematically arrange these terms into a hierarchy (referred to variously as the DIKW hierarchy, knowledge pyramid, knowledge hierarchy and information hierarchy) in his article 'From Data to Wisdom'. This paper sets out my thoughts on the DIKW hierarchy in response to five questions.
Link