Introduction

This EDA takes a closer look at Australian charities registered with the ACNC (Australian Charities and Not-for-profits Commission). Since 3 December 2012, charities that want to access Commonwealth charity tax concessions (and other benefits) need to register with the ACNC. Registration itself is voluntary, although many charities choose to register. The data, which lists all ACNC-registered charities, is made freely available by the Australian government on data.gov.au.

EDA Step 1

We need the following libraries:

library(tidyr)
library(dplyr)
library(readr)
library(ggplot2)
library(httr)
library(data.table)
library(lubridate)

EDA Step 2

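Download the compressed file from data.gov.au into a temporary directory. There is no need to unzip it by hand: read_csv() decompresses .zip files automatically.

tmpdir <- tempdir()
url <- 'https://data.gov.au/dataset/b050b242-4487-4306-abf5-07ca073e5594/resource/eb1e6be4-5b13-4feb-b28e-388bf7c26f93/download/20181011_datadotgov_main.zip'
file <- file.path(tmpdir, basename(url))   # save the zip in the temporary directory
download.file(url, file, mode = "wb")       # binary mode keeps the zip intact on Windows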

An initial peek at the data showed that a few columns are of no value to this EDA, so we skip them on import:

charities <- read_csv(file, 
                      col_types = cols(ABN = col_skip(), Address_Line_1 = col_skip(), 
                                       Address_Line_2 = col_skip(), Address_Line_3 = col_skip(), 
                                       Charity_Legal_Name = col_skip(), 
                                       Charity_Website = col_skip(), Financial_Year_End = col_skip(), 
                                       Operating_Countries = col_skip(), 
                                       Other_Organisation_Names = col_skip()), 
                      locale = locale())
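To illustrate the two readr behaviours this import relies on (automatic decompression and dropping columns with col_skip()), here is a minimal sketch on a hypothetical two-row file with made-up values. It writes a .gz file rather than a .zip only because base R can create one without an external zip tool; readr treats .gz, .bz2, .xz and .zip input the same way. The file contents and the name toy are illustrative assumptions, not ACNC data.

```r
library(readr)

# Hypothetical toy data in the same shape as the ACNC file (made-up values).
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, "w")
writeLines(c(
  "ABN,Charity_Legal_Name,State,Charity_Size",
  "11111111111,Example Charity A,NSW,Small",
  "22222222222,Example Charity B,VIC,Large"
), con)
close(con)

# readr decompresses the file itself; col_skip() drops a column at read time,
# while columns not named in cols() are still guessed as usual.
toy <- read_csv(tmp,
                col_types = cols(ABN = col_skip(),
                                 Charity_Legal_Name = col_skip()))
print(toy)
```

Skipping at read time means the unwanted columns never occupy memory, which is why it is preferable to reading everything and dropping columns afterwards.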

Convert the tibble to a data.table:

charities <- data.table(charities)

Let’s have a look at the dimensions of the dataset:

ncol(charities)
nrow(charities)

The table contains 51 variables (columns) and 54,912 observations (rows), which means that 54,912 charities were registered with the ACNC when this extract was produced.

Let’s take a peek at the variable types:

glimpse(charities)
## Observations: 54,912
## Variables: 51
## $ Address_Type                                                   <chr> ...
## $ Town_City                                                      <chr> ...
## $ State                                                          <chr> ...
## $ Postcode                                                       <chr> ...
## $ Country                                                        <chr> ...
## $ Registration_Date                                              <chr> ...
## $ Date_Organisation_Established                                  <chr> ...
## $ Charity_Size                                                   <chr> ...
## $ Number_of_Responsible_Persons                                  <dbl> ...
## $ Operates_in_ACT                                                <chr> ...
## $ Operates_in_NSW                                                <chr> ...
## $ Operates_in_NT                                                 <chr> ...
## $ Operates_in_QLD                                                <chr> ...
## $ Operates_in_SA                                                 <chr> ...
## $ Operates_in_TAS                                                <chr> ...
## $ Operates_in_VIC                                                <chr> ...
## $ Operates_in_WA                                                 <chr> ...
## $ PBI                                                            <chr> ...
## $ HPC                                                            <chr> ...
## $ Preventing_or_relieving_suffering_of_animals                   <chr> ...
## $ Advancing_Culture                                              <chr> ...
## $ Advancing_Education                                            <chr> ...
## $ Advancing_Health                                               <chr> ...
## $ Promote_or_oppose_a_change_to_law__government_poll_or_prac     <chr> ...
## $ Advancing_natual_environment                                   <chr> ...
## $ Promoting_or_protecting_human_rights                           <chr> ...
## $ Purposes_beneficial_to_ther_general_public_and_other_analogous <chr> ...
## $ Promoting_reconciliation__mutual_respect_and_tolerance         <chr> ...
## $ Advancing_Religion                                             <chr> ...
## $ Advancing_social_or_public_welfare                             <chr> ...
## $ Advancing_security_or_safety_of_Australia_or_Australian_public <chr> ...
## $ Another_purpose_beneficial_to_the_community                    <chr> ...
## $ Aboriginal_or_TSI                                              <chr> ...
## $ Aged_Persons                                                   <chr> ...
## $ Children                                                       <chr> ...
## $ Communities_Overseas                                           <chr> ...
## $ Ethnic_Groups                                                  <chr> ...
## $ Gay__Lesbian__Bisexual                                         <chr> ...
## $ General_Community_in_Australia                                 <chr> ...
## $ Men                                                            <chr> ...
## $ Migrants__Refugees_or_Asylum_Seekers                           <chr> ...
## $ Pre_Post_Release_Offenders                                     <chr> ...
## $ People_with_Chronic_Illness                                    <chr> ...
## $ People_with_Disabilities                                       <chr> ...
## $ People_at_risk_of_homelessness                                 <chr> ...
## $ Unemployed_Persons                                             <chr> ...
## $ Veterans_or_their_families                                     <chr> ...
## $ Victims_of_crime                                               <chr> ...
## $ Victims_of_Disasters                                           <chr> ...
## $ Women                                                          <chr> ...
## $ Youth                                                          <chr> ...

The dataset contains one quantitative variable, Number_of_Responsible_Persons, and two date variables, Date_Organisation_Established and Registration_Date. The rest are categorical variables.

Determine whether there are any NAs:

sum(is.na(charities))
## [1] 2052398

Plenty! This is because many of the variables are categorical flags with only one level, “Y”: a flag that does not apply is recorded as NA rather than “N”. For example, charities that do not operate in Victoria have NA in Operates_in_VIC.
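The sketch below, on a hypothetical three-row table, shows how such flag NAs can be counted per column and then recoded to TRUE/FALSE with data.table, so that they are no longer mistaken for missing data. Treating every NA in a flag column as “not flagged” is an assumption, and which columns count as flags (the hypothetical flag_cols here) would have to be picked from the glimpse() output; columns such as Charity_Size or State must not be recoded this way.

```r
library(data.table)

# Hypothetical toy rows: "Y" means the flag applies, NA means it does not.
flags <- data.table(
  Operates_in_VIC = c("Y", NA, "Y"),
  Operates_in_NSW = c(NA, "Y", "Y"),
  Children        = c(NA, NA, "Y")
)

# NA count per column; in flag columns these mean "not flagged", not "missing".
na_counts <- colSums(is.na(flags))
print(na_counts)

# Recode every Y/NA flag column to TRUE/FALSE in place (NA becomes FALSE).
flag_cols <- names(flags)
flags[, (flag_cols) := lapply(.SD, function(x) !is.na(x) & x == "Y"),
      .SDcols = flag_cols]
print(flags)
```

After recoding, sum() on a flag column gives the number of charities with that flag, and the overall NA count reflects genuinely missing values only.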

The two date variables are stored as character strings, so we parse them with lubridate’s dmy() (day-month-year), setting the time zone to Australia/Sydney. dmy() tolerates different separators, so the two columns do not need to share exactly the same layout.

charities$Date_Organisation_Established <- dmy(charities$Date_Organisation_Established, tz = "Australia/Sydney")
charities$Registration_Date <- dmy(charities$Registration_Date, tz = "Australia/Sydney")
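As a quick check of that tolerance, here is a minimal sketch on hypothetical strings that write the same day with different separators. It also shows that supplying tz makes dmy() return a date-time (POSIXct) rather than a Date.

```r
library(lubridate)

# Hypothetical strings: the same day written with three different separators.
x <- c("3/12/2012", "03-12-2012", "03.12.2012")
parsed <- dmy(x, tz = "Australia/Sydney")

print(parsed)         # midnight on 3 December 2012, Sydney time, for all three
print(class(parsed))  # POSIXct, because tz was supplied (without tz, dmy() returns a Date)
```

If only the calendar date matters, dropping the tz argument keeps the columns as plain Date values, which avoids any daylight-saving surprises when comparing dates.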

Order the State factor levels by frequency (most charities first) so that ggplot draws the states in descending order.

# Count charities per state and use the sorted state names as the factor levels
plot <- within(charities, State <- factor(State, levels = names(sort(table(State), decreasing = TRUE))))
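The level-ordering idiom is easier to see on a small hypothetical vector of state codes: table() counts each value, sort() puts the counts in descending order, and names() supplies the level order to factor().

```r
# Hypothetical vector of state codes.
State <- c("VIC", "NSW", "NSW", "QLD", "NSW", "VIC")

# Count each value, sort counts from largest to smallest, keep the names.
ordered_levels <- names(sort(table(State), decreasing = TRUE))
State_f <- factor(State, levels = ordered_levels)
print(levels(State_f))  # "NSW" "VIC" "QLD"
```

The forcats package offers fct_infreq() for the same purpose, if adding another dependency is acceptable.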

Let’s look at the spread of the number of responsible persons (the people who govern a charity, such as board or committee members) for each charity size (Large, Medium and Small):

ggplot(charities, aes(x = Charity_Size, y = Number_of_Responsible_Persons)) + 
  geom_boxplot() +
  labs(x = "Charity Size", y = "# of Responsible Persons") +
  ggtitle("Figure 1: Spread of responsible persons per charity size")

From Figure 1 we can clearly see that the median number of responsible persons drops as charity size decreases. Next, let’s look at the distribution of responsible persons per charity in each state.
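A boxplot marks the median, not the mean, so a small summary table is a useful numeric companion to Figure 1. This is a minimal data.table sketch on hypothetical toy data with made-up values; on the real data the same j expression would be run as charities[, ..., by = Charity_Size].

```r
library(data.table)

# Hypothetical toy data (made-up values, not ACNC figures).
toy <- data.table(
  Charity_Size = c("Large", "Large", "Medium", "Medium", "Small", "Small", "Small"),
  Number_of_Responsible_Persons = c(9, 7, 6, 5, 3, 4, NA)
)

# Count, median and mean of responsible persons per charity size;
# na.rm = TRUE ignores charities with no recorded value.
summary_by_size <- toy[, .(
  n      = .N,
  median = median(Number_of_Responsible_Persons, na.rm = TRUE),
  mean   = mean(Number_of_Responsible_Persons, na.rm = TRUE)
), by = Charity_Size]
print(summary_by_size)
```

Note that n counts rows, including those with NA, so it can be larger than the number of values behind each median.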

ggplot(plot, aes(x = State, y = Number_of_Responsible_Persons)) + 
  geom_boxplot() +
  labs(x = "Australian State", y = "# of Responsible Persons") +
  ggtitle("Figure 2: Spread of responsible persons per Australian State")

Let’s look at the total number of charities registered in each state, coloured by charity size:

ggplot(plot, aes(x = State, fill = Charity_Size)) + 
  geom_bar() +
  labs(x = "Australian State", y = "Number of charities") +
  ggtitle("Figure 3: Number of charities registered by state, coloured by charity size")

From Figure 2 we can see that TAS, ACT and NT have the highest median number of responsible persons per registered charity. From Figure 3 we can see that TAS, ACT and NT have the fewest registered charities of all the states and territories. Taken together, these three jurisdictions have the fewest registered charities but typically more responsible persons (the board or committee members who govern a charity) per charity than the rest of the country. What is the reason for this?
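The conclusion combines two figures, so it can help to put both quantities in one table, one row per state, and check whether they move in opposite directions. This is a minimal sketch on hypothetical toy data with made-up values; the column names n_charities and median_resp are illustrative, and Spearman rank correlation is one possible way to summarise the relationship, not the author’s method.

```r
library(data.table)

# Hypothetical toy data (made-up values, not ACNC figures).
toy <- data.table(
  State = c("NSW", "NSW", "NSW", "VIC", "VIC", "TAS"),
  Number_of_Responsible_Persons = c(4, 5, 3, 5, 6, 8)
)

# One row per state: number of charities and median responsible persons,
# sorted from fewest to most charities.
by_state <- toy[, .(
  n_charities = .N,
  median_resp = median(Number_of_Responsible_Persons, na.rm = TRUE)
), by = State][order(n_charities)]
print(by_state)

# Rank correlation between state size and median board size (negative = inverse).
rho <- cor(by_state$n_charities, by_state$median_resp, method = "spearman")
print(rho)
```

With only a handful of states and territories, such a coefficient is a rough indication at best and says nothing about why the pattern occurs.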