Generate summary or inference information for an iNZight plot

getPlotSummary(
  x,
  y = NULL,
  g1 = NULL,
  g1.level = NULL,
  g2 = NULL,
  g2.level = NULL,
  varnames = list(),
  colby = NULL,
  sizeby = NULL,
  data = NULL,
  design = NULL,
  freq = NULL,
  missing.info = TRUE,
  inzpars = inzpar(),
  summary.type = "summary",
  table.direction = c("horizontal", "vertical"),
  hypothesis.value = 0,
  hypothesis.alt = c("two.sided", "less", "greater"),
  hypothesis.var.equal = FALSE,
  hypothesis.use.exact = FALSE,
  hypothesis.test = c("default", "t.test", "anova", "chi2", "proportion"),
  hypothesis.simulated.p.value = FALSE,
  hypothesis = list(value = hypothesis.value, alternative = match.arg(hypothesis.alt),
    var.equal = hypothesis.var.equal, use.exact = hypothesis.use.exact, test =
    match.arg(hypothesis.test), simulated.p.value = hypothesis.simulated.p.value),
  survey.options = list(),
  width = 100,
  epi.out = FALSE,
  privacy_controls = NULL,
  html = FALSE,
  ...,
  env = parent.frame()
)

Arguments

x

a vector (numeric or factor), or the name of a column in the supplied data or design object

y

a vector (numeric or factor), or the name of a column in the supplied data or design object

g1

a vector (numeric or factor), or the name of a column in the supplied data or design object. This variable acts as a subsetting variable.

g1.level

the name (or numeric position) of the level of g1 that will be used instead of the entire data set

g2

a vector (numeric or factor), or the name of a column in the supplied data or design object. This variable acts as a subsetting variable, similar to g1

g2.level

same as g1.level, but also accepts the value "_MULTI", which produces a matrix of g1 by g2

varnames

a list of variable names, with the list named using the appropriate arguments (i.e., list(x = "height", g1 = "gender"))

colby

the name of a variable (numeric or factor) to colour points by. In the case of a numeric variable, a continuous colour scale is used, otherwise each level of the factor is assigned a colour

sizeby

the name of a (numeric) variable, which controls the size of points

data

the name of a data set

design

the name of a survey object, obtained from the survey package

freq

the name of a frequency variable if the data are frequencies

missing.info

logical, if TRUE, information regarding missingness is displayed in the plot

inzpars

allows specification of iNZight plotting parameters over multiple plots

summary.type

one of "summary" or "inference"

table.direction

one of 'horizontal' (default) or 'vertical' (useful for many categories)

hypothesis.value

the null hypothesis (H0) value for the hypothesis test

hypothesis.alt

the alternative hypothesis: one of "two.sided" (!=), "less" (<), or "greater" (>)

hypothesis.var.equal

logical, if TRUE equal variances are assumed for the t-test

hypothesis.use.exact

logical, if TRUE the exact p-value will be calculated (if applicable)

hypothesis.test

the test to perform; in some cases (currently only two-sample comparisons) more than one test is available (t-test or ANOVA)

hypothesis.simulated.p.value

logical, if TRUE the simulated p-value is also calculated (where available)

hypothesis

either NULL for no test, or missing (in which case the hypothesis.* arguments above are used)

survey.options

additional options passed to survey methods

width

width for the output, default is 100 characters

epi.out

logical, if TRUE, then odds/rate ratios and rate differences are printed when appropriate (y with 2 levels)

privacy_controls

optional, pass in confidentialisation and privacy controls (e.g., random rounding, suppression) for microdata

html

logical, if TRUE output will be returned as an HTML page (if supported)

...

additional arguments, see inzpar

env

compatibility argument
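
The subsetting arguments described above (g1 and g1.level) might be combined as in the following minimal sketch. It assumes the iNZightPlots package is installed and that g1 accepts an unquoted column name in the same way as x; the object name sub_smry is illustrative only.

```r
# Sketch only: `sub_smry` is an illustrative name, not part of the package.
sub_smry <- NULL
if (requireNamespace("iNZightPlots", quietly = TRUE)) {
    library(iNZightPlots)

    # Summarise Sepal.Length using only the "setosa" level of Species,
    # rather than the entire data set
    sub_smry <- getPlotSummary(Sepal.Length,
        g1 = Species, g1.level = "setosa",
        data = iris)
    print(sub_smry)
}
```

The requireNamespace() guard simply lets the snippet do nothing when the package is unavailable.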

Value

an inzight.plotsummary object with a print method
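
Because the result is an object with its own print method, one option is to store it and print it later. A minimal sketch, assuming the iNZightPlots package is installed (smry is an illustrative name):

```r
# Sketch only: `smry` is an illustrative name, not part of the package.
smry <- NULL
if (requireNamespace("iNZightPlots", quietly = TRUE)) {
    library(iNZightPlots)

    # Assigning the result suppresses automatic printing
    smry <- getPlotSummary(Sepal.Length, data = iris)

    # The class documented above
    stopifnot(inherits(smry, "inzight.plotsummary"))

    # Printing dispatches to the object's print method
    print(smry)
}
```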

Details

Works much the same as iNZightPlot

Author

Tom Elliott

Examples

getPlotSummary(Species, data = iris)
#> ====================================================================================================
#>                                           iNZight Summary
#> ----------------------------------------------------------------------------------------------------
#>    Primary variable of interest: Species (categorical)
#>                                  
#>    Total number of observations: 150
#> ====================================================================================================
#> 
#> Summary of the distribution of Species:
#> ---------------------------------------
#> 
#>              setosa   versicolor   virginica   Total
#>      Count       50           50          50     150
#>    Percent   33.33%       33.33%      33.33%    100%
#> 
#> 
#> ====================================================================================================
#> 
#> 
getPlotSummary(Species, data = iris,
    summary.type = "inference", inference.type = "conf")
#> ====================================================================================================
#>                                iNZight Inference using Normal Theory
#> ----------------------------------------------------------------------------------------------------
#>    Primary variable of interest: Species (categorical)
#>                                  
#>    Total number of observations: 150
#> ====================================================================================================
#> 
#> Inference of the distribution of Species:
#> -----------------------------------------
#> 
#> Estimated Proportions with 95% Confidence Interval
#> 
#>                 Estimate   Lower   Upper
#>        setosa      0.333   0.258   0.409
#>    versicolor      0.333   0.258   0.409
#>     virginica      0.333   0.258   0.409
#> 
#> Chi-square test for equal proportions
#> 
#>    X^2 = 0, df = 2, p-value = 1
#> 
#>           Null Hypothesis: true proportions in each category are equal
#>    Alternative Hypothesis: true proportions in each category are not equal
#> 
#> 
#> ### Difference in proportions of Species
#>     with 95% Confidence Intervals (adjusted for multiple comparisons)
#> 
#>                            Estimate     Lower    Upper
#>  -----------------------------------------------------
#>      setosa - versicolor          0   -0.1596   0.1596
#>      setosa - virginica           0   -0.1596   0.1596
#> 
#>  versicolor - virginica           0   -0.1596   0.1596
#> 
#> 
#> 
#> ====================================================================================================
#> 
#> 

# perform hypothesis testing
getPlotSummary(Sepal.Length, data = iris,
    summary.type = "inference", inference.type = "conf",
    hypothesis.value = 5)
#> ====================================================================================================
#>                                iNZight Inference using Normal Theory
#> ----------------------------------------------------------------------------------------------------
#>    Primary variable of interest: Sepal.Length (numeric)
#>                                  
#>    Total number of observations: 150
#> ====================================================================================================
#> 
#> Inference of Sepal.Length:
#> --------------------------
#> 
#> Mean with 95% Confidence Interval
#> 
#>    Estimate   Lower   Upper
#>       5.843    5.71   5.977
#> 
#> One Sample t-test
#> 
#>    t = 12.473, df = 149, p-value < 2.22e-16
#> 
#>           Null Hypothesis: true mean is equal to 5
#>    Alternative Hypothesis: true mean is not equal to 5
#> 
#> 
#> ====================================================================================================
#> 
#> 
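
The hypothesis arguments can be varied in the same call. The sketch below is not part of the original examples; it assumes the iNZightPlots package is installed, and the names one_sided and ci_only are illustrative. It requests a one-sided alternative via hypothesis.alt, then turns the test off entirely by passing hypothesis = NULL.

```r
# Sketch only: `one_sided` and `ci_only` are illustrative names.
one_sided <- NULL
ci_only <- NULL
if (requireNamespace("iNZightPlots", quietly = TRUE)) {
    library(iNZightPlots)

    # one-sided alternative: true mean greater than 5
    one_sided <- getPlotSummary(Sepal.Length, data = iris,
        summary.type = "inference", inference.type = "conf",
        hypothesis.value = 5, hypothesis.alt = "greater")
    print(one_sided)

    # no hypothesis test: hypothesis = NULL
    ci_only <- getPlotSummary(Sepal.Length, data = iris,
        summary.type = "inference", inference.type = "conf",
        hypothesis = NULL)
    print(ci_only)
}
```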

# if you prefer a formula interface
inzsummary(Sepal.Length ~ Species, data = iris)
#> ====================================================================================================
#>                                           iNZight Summary
#> ----------------------------------------------------------------------------------------------------
#>    Primary variable of interest: Sepal.Length (numeric)
#>              Secondary variable: Species (categorical)
#>                                  
#>    Total number of observations: 150
#> ====================================================================================================
#> 
#> Summary of Sepal.Length by Species:
#> -----------------------------------
#> 
#> Estimates
#> 
#>                 Min     25%   Median   75%   Max    Mean       SD   Sample Size
#>        setosa   4.3   4.800      5.0   5.2   5.8   5.006   0.3525            50
#>    versicolor   4.9   5.600      5.9   6.3   7.0   5.936   0.5162            50
#>     virginica   4.9   6.225      6.5   6.9   7.9   6.588   0.6359            50
#> 
#> 
#> ====================================================================================================
#> 
#> 
inzinference(Sepal.Length ~ Species, data = iris)
#> ====================================================================================================
#>                                iNZight Inference using Normal Theory
#> ----------------------------------------------------------------------------------------------------
#>    Primary variable of interest: Sepal.Length (numeric)
#>              Secondary variable: Species (categorical)
#>                                  
#>    Total number of observations: 150
#> ====================================================================================================
#> 
#> Inference of Sepal.Length by Species:
#> -------------------------------------
#> 
#> Group Means with 95% Confidence Intervals
#> 
#>                 Estimate   Lower   Upper
#>        setosa      5.006   4.906   5.106
#>    versicolor      5.936   5.789   6.083
#>     virginica      6.588   6.407   6.769
#> 
#> One-way Analysis of Variance (ANOVA F-test)
#> 
#>    F = 119.26, df = 2 and 147, p-value < 2.22e-16
#> 
#>           Null Hypothesis: true group means are all equal
#>    Alternative Hypothesis: true group means are not all equal
#> 
#> Pairwise differences in group means with 95% Confidence Intervals and P-values
#> (The CIs and P-values have been adjusted for multiple comparisons)
#> 
#>                            Estimate     Lower     Upper      P-value
#>  -------------------------------------------------------------------
#>      setosa - versicolor     -0.930   -1.1738   -0.6862   < 2.22e-16
#>      setosa - virginica      -1.582   -1.8258   -1.3382   < 2.22e-16
#> 
#>  versicolor - virginica      -0.652   -0.8958   -0.4082   < 2.22e-16
#> 
#> 
#>           Null Hypothesis: true difference in group means is zero
#>    Alternative Hypothesis: true difference in group means is not zero
#> 
#> 
#> ====================================================================================================
#> 
#> 
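
For a two-sample comparison, hypothesis.test and hypothesis.var.equal select the test and its variance assumption. The following is a hedged sketch rather than one of the package's own examples: it assumes the iNZightPlots package is installed, uses the built-in ToothGrowth data (where supp has exactly two levels), and the name tt is illustrative.

```r
# Sketch only: `tt` is an illustrative name, not part of the package.
tt <- NULL
if (requireNamespace("iNZightPlots", quietly = TRUE)) {
    library(iNZightPlots)

    # len is numeric and supp is a two-level factor (OJ, VC), so this is a
    # two-sample comparison; request a t-test assuming equal variances
    tt <- getPlotSummary(len, supp, data = ToothGrowth,
        summary.type = "inference", inference.type = "conf",
        hypothesis.test = "t.test", hypothesis.var.equal = TRUE)
    print(tt)
}
```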

## confidentialisation and privacy controls
# random rounding and suppression:
HairEyeColor_df <- as.data.frame(HairEyeColor)
inzsummary(Hair ~ Eye, data = HairEyeColor_df, freq = Freq)
#> ====================================================================================================
#>                                           iNZight Summary
#> ----------------------------------------------------------------------------------------------------
#>    Primary variable of interest: Hair (categorical)
#>              Secondary variable: Eye (categorical)
#>                                  
#>    Total number of observations: 32
#> ====================================================================================================
#> 
#> Summary of the distribution of Hair (columns) by Eye (rows):
#> ------------------------------------------------------------
#> 
#> Table of Counts:
#> 
#>            Black   Brown   Red   Blond   Row Total
#>    Brown      68     119    26       7         220
#>     Blue      20      84    17      94         215
#>    Hazel      15      54    14      10          93
#>    Green       5      29    14      16          64
#> 
#> Table of Percentages (within categories of Eye):
#> 
#>             Black    Brown      Red    Blond   Total   Row N
#>    Brown   30.91%   54.09%   11.82%    3.18%    100%     220
#>     Blue    9.30%   39.07%    7.91%   43.72%    100%     215
#>    Hazel   16.13%   58.06%   15.05%   10.75%    100%      93
#>    Green    7.81%   45.31%   21.88%   25.00%    100%      64
#> 
#> 
#> ====================================================================================================
#> 
#> 
inzsummary(Hair ~ Eye, data = HairEyeColor_df, freq = Freq,
    privacy_controls = list(
        rounding = "RR3",
        suppression = 10
    )
)
#> ====================================================================================================
#>                                           iNZight Summary
#> ----------------------------------------------------------------------------------------------------
#>    Primary variable of interest: Hair (categorical)
#>              Secondary variable: Eye (categorical)
#>                                  
#>    Total number of observations: 32
#> ====================================================================================================
#> 
#> Privacy and confidentialisation information
#> -------------------------------------------
#> 
#>   * counts are rounded using RR3 (random rounding to base 3)
#>   * suppression of counts smaller than 10, indicated by S, with secondary suppression where necessary
#>   * suppression of totals and means where underlying unrounded count < 10
#> 
#> NOTE: this feature is still experimental, and all output should be manually
#> checked before being made public. This is simply to aid that process.
#> 
#> ====================================================================================================
#> 
#> Summary of the distribution of Hair (columns) by Eye (rows):
#> ------------------------------------------------------------
#> 
#> Table of Counts:
#> 
#>            Black   Brown   Red   Blond   Row Total
#>    Brown      69     120    27       S           S
#>     Blue      21      84    18      96         219
#>    Hazel      15      54    15      12          96
#>    Green       S      30    15      15           S
#> 
#> Table of Percentages (within categories of Eye):
#> 
#>             Black    Brown      Red    Blond   Total   Row N
#>    Brown        S        S        S        S       S       S
#>     Blue    9.59%   38.36%    8.22%   43.84%    100%     219
#>    Hazel   15.62%   56.25%   15.62%   12.50%    100%      96
#>    Green        S        S        S        S       S       S
#> 
#> 
#> ====================================================================================================
#> 
#>