Table Guide (draft)

A list of helpful tips for creating tables.

Team 346 pol346.com (Princeton University Department of Politics) princeton.edu/politics
2020-05-04


Kable

This tutorial explains how to make a table with the kable function. This type of table is especially useful for summary statistics (mean, median, number of observations, etc.) for your data. It is also useful for showing selected rows of your data or a summary table for a t-test.

If you do not have kableExtra yet, use install.packages to install it. Then we’ll load the package and go through an example. To access the example data, use library to load the Sleuth3 package and give ex0222 a meaningful name. This example contains scores on the Armed Forces Qualifying Test (AFQT), which is a test for intelligence. This study was done to settle a lot of controversial and misguided debates regarding the intelligence of women versus men. In particular, the test gives a score for arithmetic reasoning, word knowledge, paragraph comprehension, and mathematical knowledge.
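
A minimal setup sketch for the steps above. The guide names only kableExtra and Sleuth3 explicitly; the other packages listed here (tidyverse, knitr, broom) are an assumption inferred from the functions used later, such as %>%, glimpse, and tidy.

```r
# Packages this guide appears to rely on. Only kableExtra and Sleuth3 are
# named in the text; the rest are assumptions based on the functions used.
pkgs <- c("tidyverse", "knitr", "kableExtra", "Sleuth3", "broom")

# Install only the packages that are not available yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  install.packages(pkgs[!installed])
}

library(tidyverse)   # %>%, glimpse(), group_by(), summarize(), slice()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(broom)       # tidy()
```

Sleuth3 does not need to be attached with library if the data is accessed as Sleuth3::ex0222, as in the code that follows.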


AFQT <- Sleuth3::ex0222
AFQT %>% glimpse() # look at data 

Rows: 2,584
Columns: 6
$ Gender <fct> male, female, male, female, female, female, female, …
$ Arith  <int> 19, 23, 30, 30, 13, 8, 10, 4, 12, 3, 30, 10, 10, 28,…
$ Word   <int> 27, 34, 35, 35, 30, 15, 17, 17, 33, 11, 33, 16, 16, …
$ Parag  <int> 14, 11, 14, 13, 11, 6, 6, 6, 13, 5, 15, 3, 11, 14, 5…
$ Math   <int> 14, 20, 25, 21, 12, 4, 7, 6, 11, 6, 24, 7, 6, 18, 7,…
$ AFQT   <dbl> 70.3, 60.4, 98.3, 84.7, 44.5, 4.0, 11.8, 8.9, 44.7, …

Now that we’ve loaded and explored the data a little bit, let’s use kable to make a summary statistics table that gives the average math, word, paragraph, and arithmetic scores by gender.

Creating summary statistics

To do this, first make a data frame for your summary statistics.


summary_stats <- AFQT %>%
  group_by(Gender) %>%
  summarize(
    mean_Arith         = mean(Arith),
    mean_Word          = mean(Word),
    mean_Paragraph     = mean(Parag),
    mean_Math          = mean(Math),
    number_of_subjects = n()
  )
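
A more compact variant, sketched under the assumption that dplyr 1.0.0 or later is installed (across is not available in older versions). It computes the same four means without writing each one out. The generated names follow the column names, so the paragraph column is called mean_Parag rather than mean_Paragraph; summary_stats_across is a name introduced here for illustration.

```r
library(dplyr)

AFQT <- Sleuth3::ex0222

summary_stats_across <- AFQT %>%
  group_by(Gender) %>%
  summarize(
    # apply mean() to each listed column and prefix the result with "mean_"
    across(c(Arith, Word, Parag, Math), mean, .names = "mean_{.col}"),
    number_of_subjects = n()
  )

summary_stats_across
```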

And now, let’s make the table! To do this, take the summary statistics data frame you just made and pipe it to kable. In the kable function, format controls how the table prints after knitting: if knitting to HTML, use “html”, and if knitting to PDF, use “latex”. caption lets you add a title, booktabs makes the table neater when knitting to PDF, and digits sets the number of decimal places. Then pipe the kable output to kable_styling to add nicer formatting, like striped rows and an adjusted width.


summary_stats %>%
  kable(
    format   = "html",
    caption  = "Test Scores Summary by Gender",
    booktabs = TRUE,
    digits   = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width        = FALSE
  )
Table 1: Test Scores Summary by Gender
Gender mean_Arith mean_Word mean_Paragraph mean_Math number_of_subjects
female 17.5 26.6 11.5 13.8 1278
male 19.5 26.6 10.9 14.6 1306
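
One way to avoid editing format by hand every time the output type changes is to ask knitr which format is being knitted. The sketch below also uses kable’s col.names argument to replace the raw column names with readable headers. The small data frame is a stand-in built from values in Table 1 above, and table_format, summary_demo, and summary_table are names introduced here for illustration.

```r
library(knitr)
library(kableExtra)

# "latex" when knitting to PDF, otherwise "html" (including interactive use)
table_format <- if (knitr::is_latex_output()) "latex" else "html"

# Stand-in for summary_stats, using two columns of Table 1
summary_demo <- data.frame(
  Gender     = c("female", "male"),
  mean_Arith = c(17.5, 19.5),
  n_subjects = c(1278, 1306)
)

summary_table <- kable_styling(
  kable(
    summary_demo,
    format    = table_format,
    caption   = "Test Scores Summary by Gender",
    col.names = c("Gender", "Mean arithmetic", "Subjects"),
    booktabs  = TRUE,
    digits    = 2
  ),
  full_width = FALSE
)

summary_table
```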

Working with \(t\)-tests

One interesting question for researchers was whether there were significantly different results between genders on sections of the test. So let’s do a t-test for each section and make a table for each one using kable.


names(AFQT)

[1] "Gender" "Arith"  "Word"   "Parag"  "Math"   "AFQT"  

t_arith <- t.test(Arith ~ Gender, data = AFQT)
t_word  <- t.test(Word  ~ Gender, data = AFQT)
t_parag <- t.test(Parag ~ Gender, data = AFQT)
t_math  <- t.test(Math  ~ Gender, data = AFQT)

library(broom)
# tidy() from the broom package converts a t.test object to a data frame
t_arith_df <- tidy(t_arith)
t_word_df  <- tidy(t_word)
t_parag_df <- tidy(t_parag)
t_math_df  <- tidy(t_math)


t_arith_df %>%
  dplyr::select(-method, -alternative) %>% # drop extra cols
  # rename to make names more understandable
  rename(
    mean_group_female = estimate1,
    mean_group_male   = estimate2,
    t_statistic       = statistic,
    df                = parameter
  ) %>%
  kable(
    format   = "html",
    caption  = "t-test for Arithmetic vs Gender",
    booktabs = TRUE,
    digits   = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width        = FALSE
  )
Table 2: t-test for Arithmetic vs Gender
estimate mean_group_female mean_group_male t_statistic p.value df conf.low conf.high
-2.04 17.5 19.5 -7.31 0 2574 -2.58 -1.49

t_math_df %>%
  dplyr::select(-method, -alternative) %>% # drop extra cols
  # rename to make names more understandable
  rename(
    mean_group_female = estimate1,
    mean_group_male   = estimate2,
    t_statistic       = statistic,
    df                = parameter
  ) %>%
  kable(
    format   = "html",
    caption  = "t-test for Math vs Gender",
    booktabs = TRUE,
    digits   = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width = FALSE
  )
Table 2: t-test for Math vs Gender
estimate mean_group_female mean_group_male t_statistic p.value df conf.low conf.high
-0.75 13.8 14.6 -3.05 0 2573 -1.24 -0.27

t_word_df %>%
  dplyr::select(-method, -alternative) %>% # drop cols
  # rename to make names more understandable
  rename(
    mean_group_female = estimate1,
    mean_group_male   = estimate2,
    t_statistic       = statistic,
    df                = parameter
  ) %>%
  kable(
    format   = "html",
    caption  = "t-test for Word vs Gender",
    booktabs = TRUE,
    digits   = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width        = FALSE
  )
Table 2: t-test for Word vs Gender
estimate mean_group_female mean_group_male t_statistic p.value df conf.low conf.high
0.02 26.6 26.6 0.08 0.94 2581 -0.52 0.57

t_parag_df %>%
  dplyr::select(-method, -alternative) %>% # drop cols
  # rename to make names more understandable
  rename(
    mean_group_female = estimate1,
    mean_group_male   = estimate2,
    t_statistic       = statistic,
    df                = parameter
  ) %>%
  kable(
    format = "html",
    caption = "t-test for Paragraph vs Gender",
    booktabs = TRUE,
    digits   = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width = FALSE
  )  
Table 2: t-test for Paragraph vs Gender
estimate mean_group_female mean_group_male t_statistic p.value df conf.low conf.high
0.57 11.5 10.9 4.6 0 2562 0.33 0.81

What conclusions can we draw from these tests? Are there confounding factors that would limit these conclusions?
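
The four blocks above repeat the same steps for each section. As a rough sketch of an alternative, the tests can be run in a loop and stacked into a single table. This is not the guide’s own method: the names sections and t_tests are introduced here for illustration, and format.pval is used so that very small p-values are not rounded to 0 by digits = 2.

```r
library(dplyr)
library(broom)
library(knitr)
library(kableExtra)

AFQT <- Sleuth3::ex0222
sections <- c("Arith", "Word", "Parag", "Math")

# Run one t-test per section and stack the tidied results
t_tests <- lapply(sections, function(section_name) {
  # reformulate() builds the formula, e.g. Arith ~ Gender
  test <- t.test(reformulate("Gender", response = section_name), data = AFQT)
  tidy(test) %>% mutate(section = section_name)
}) %>%
  bind_rows() %>%
  select(
    section,
    mean_group_female = estimate1,
    mean_group_male   = estimate2,
    difference        = estimate,
    t_statistic       = statistic,
    df                = parameter,
    p_value           = p.value,
    conf.low,
    conf.high
  ) %>%
  # p-values below eps are shown as "less than eps" instead of 0
  mutate(p_value = format.pval(p_value, digits = 2, eps = 0.001))

t_tests %>%
  kable(format = "html", caption = "t-tests by section vs Gender", digits = 2) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
```

Putting all four tests in one table makes it easier to compare the sections side by side when answering the questions above.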

Lastly, let’s make a table that shows the first five rows of the data. This is a good skill if you want a quick preview of the data before doing analysis and want to show that exploration in a neat way. You can do this using slice, which allows you to extract certain rows.


AFQT %>%
  slice(1:5) %>% # extracts 5 rows
  kable(
    format = "html", # format = "latex" for pdfs
    caption = "Some AFQT data",
    digits   = 2
  ) %>%
  kable_styling(full_width = FALSE)
Table 3: Some AFQT data
Gender Arith Word Parag Math AFQT
male 19 27 14 14 70.3
female 23 34 11 20 60.4
male 30 35 14 25 98.3
female 30 35 13 21 84.7
female 13 30 11 12 44.5
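
For reference, a few other ways of picking rows, sketched under the assumption that dplyr 1.0.0 or later is installed (slice_head and slice_max were added in that release). The object names are introduced here for illustration.

```r
library(dplyr)

AFQT <- Sleuth3::ex0222

first_five  <- AFQT %>% slice_head(n = 5)     # same rows as slice(1:5)
chosen_rows <- AFQT %>% slice(c(1, 10, 100))  # specific row positions

# Three rows with the highest value in the AFQT column
# (inside slice_max, AFQT refers to the column, not the data frame)
top_three <- AFQT %>% slice_max(order_by = AFQT, n = 3, with_ties = FALSE)
```

Any of these results can be piped to kable in the same way as the example above.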

xtable

xtable is another table package that prints an R object as either a LaTeX or HTML table. In this section, we will run through some sample code for its uses, including ANOVA tables, as well as some tips and tricks.

Uses of xtable

xtable converts many kinds of R objects into a new object of class xtable, which can then be printed. Two common examples of tables that can be produced with xtable are ANOVA tables and tables of whole data frames. In POL346, xtable is most commonly used for ANOVA tables.

Data frames as tables

First, let’s read in an example data frame that we can work with in our tables. The following data set shows the years of the Kentucky Derby, the winners, their average speed, and the track conditions between 1896 and 2011.


derby <- Sleuth3::ex0920 %>% janitor::clean_names()

To show how to use xtable to print the data frame as a table, we will start by using head(derby) to print just the first six rows of the data frame.


head(derby) %>% 
  xtable() %>%
  print(type = "html") # change to type = "latex" for PDF output
  year winner        starters net_to_winner   time speed track conditions
1 1896 Ben Brush            8          4850 127.75 35.23 Dusty Fast
2 1897 Typhoon II           6          4850 132.50 33.96 Heavy Slow
3 1898 Plaudit              4          4850 129.00 34.88 Good  Fast
4 1899 Manuel               5          4850 132.00 34.09 Fast  Fast
5 1900 Lieut. Gibson        7          4850 126.25 35.64 Fast  Fast
6 1901 His Eminence         5          4850 127.75 35.23 Fast  Fast

There are two important things to remember here. First, for the table to render when knitting, you must set “results = ‘asis’” in the chunk header. Second, notice print(type = “html”). Change this to print(type = “latex”) when knitting to PDF.
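
Two further print options can be handy here, shown as a small sketch: include.rownames = FALSE drops the unlabeled row-number column, and print.results = FALSE returns the markup as a string instead of printing it right away. The name html_table is introduced here for illustration.

```r
library(xtable)

derby <- janitor::clean_names(Sleuth3::ex0920)

html_table <- print(
  xtable(head(derby)),
  type             = "html",  # "latex" for PDF output
  include.rownames = FALSE,   # drop the 1, 2, 3, ... column
  print.results    = FALSE    # return the markup instead of printing it
)

# In a chunk with results = 'asis', cat() writes the table into the document
cat(html_table)
```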

ANOVA Tables

Now we will discuss the more common type of table that will be made with xtable: an ANOVA table. ANOVA tables compare nested models of the relationships in the data in order to find the model with the best fit, by testing whether the extra terms in a larger model significantly improve on a smaller one. Suppose we have three linear regression models for our derby data, as shown below.


full     <- lm(data = derby, speed ~ year + track)
reduced  <- lm(data = derby, speed ~ year)
interact <- lm(data = derby, speed ~ year*track)

In order to compare the fits of these three models, we would make an ANOVA table with xtable, as is shown below. Here, an ANOVA object is being piped into xtable. Once again, don’t forget to include results = ‘asis’ to see the table when knitting.


anova(reduced, full, interact) %>%
  xtable() %>%
  print(type = "html") # change to type = "latex" for PDF output
  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1    114 41.84
2    108 21.37  6     20.46 17.03 0.0000
3    103 20.62  5      0.75  0.75 0.5893
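
To see where the F column comes from, the sketch below recomputes the F statistic in row 2 by hand. When several lm models are compared, anova divides each change in the residual sum of squares (per degree of freedom) by the residual mean square of the largest model, here interact. The names rss, res_df, and f_full are introduced for illustration.

```r
derby <- janitor::clean_names(Sleuth3::ex0920)

reduced  <- lm(speed ~ year, data = derby)
full     <- lm(speed ~ year + track, data = derby)
interact <- lm(speed ~ year * track, data = derby)

# Residual sums of squares (RSS) and residual degrees of freedom (Res.Df)
rss    <- c(reduced = deviance(reduced), full = deviance(full),
            interact = deviance(interact))
res_df <- c(reduced = df.residual(reduced), full = df.residual(full),
            interact = df.residual(interact))

# Residual mean square of the largest model, used as the denominator
scale <- rss[["interact"]] / res_df[["interact"]]

# F for reduced vs full: (drop in RSS / drop in Res.Df) / scale
f_full <- ((rss[["reduced"]] - rss[["full"]]) /
             (res_df[["reduced"]] - res_df[["full"]])) / scale

anova_table <- anova(reduced, full, interact)
all.equal(f_full, anova_table$F[2])
```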

Linear Regression

Regression outputs can also be displayed with xtable, as seen below with a linear regression. The same concept applies to glm regressions as well. However, stargazer is most likely the better option in this case, as stargazer is better at producing a professional-looking regression table with stars showing statistical significance.


full %>%
  xtable() %>%
  print(type = "html")
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.0947     2.6061    2.34   0.0212
year          0.0154     0.0014   11.35   0.0000
trackFast     0.3247     0.4556    0.71   0.4776
trackGood     0.0208     0.4739    0.04   0.9650
trackHeavy   -1.3254     0.4809   -2.76   0.0069
trackMuddy   -0.7660     0.4845   -1.58   0.1168
trackSloppy  -0.3726     0.5068   -0.74   0.4638
trackSlow    -0.3714     0.4895   -0.76   0.4496
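
As a sketch of the glm case mentioned above, the same two steps work on a fitted glm object. The model here is hypothetical and chosen only for illustration: a logistic regression of whether conditions is “Fast” on year. The names fast_model and glm_html are likewise introduced here.

```r
library(xtable)

derby <- janitor::clean_names(Sleuth3::ex0920)

# Hypothetical model for illustration: is the track condition "Fast"?
fast_model <- glm(
  I(conditions == "Fast") ~ year,
  data   = derby,
  family = binomial
)

glm_html <- print(
  xtable(fast_model),
  type          = "html",  # "latex" for PDF output
  print.results = FALSE    # return the markup instead of printing it
)

cat(glm_html)
```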

xtable Tips & Tricks

Now that we have discussed what xtable can be used for and the barebones code of how to make a table, we will discuss other options to make our tables look just how we want them.

Titles

Titles for xtable objects can be made using the caption option.


anova(reduced, full, interact) %>%
  xtable(
    caption = "ANOVA table for Derby linear models"
    ) %>%
  print(type = "html") # change to type = "latex" for PDF output
ANOVA table for Derby linear models
  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1    114 41.84
2    108 21.37  6     20.46 17.03 0.0000
3    103 20.62  5      0.75  0.75 0.5893
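
Two related options, shown as a sketch for LaTeX output: xtable also accepts a label for cross-referencing, and print takes caption.placement, which moves the caption above the table (the default is below). The label text tab:derby-anova and the object names are made up for this example.

```r
library(xtable)

derby <- janitor::clean_names(Sleuth3::ex0920)

reduced  <- lm(speed ~ year, data = derby)
full     <- lm(speed ~ year + track, data = derby)
interact <- lm(speed ~ year * track, data = derby)

anova_xtable <- xtable(
  anova(reduced, full, interact),
  caption = "ANOVA table for Derby linear models",
  label   = "tab:derby-anova"  # hypothetical label for \ref{} in LaTeX
)

anova_latex <- print(
  anova_xtable,
  type              = "latex",
  caption.placement = "top",   # caption above the table
  print.results     = FALSE    # return the markup instead of printing it
)

cat(anova_latex)
```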

Table Placement

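In PDF (LaTeX) output, the table often floats away from where it appears in your code. Pass table.placement = “h” to print to ask LaTeX to place the table here. This option only affects LaTeX output, so it has no visible effect when type = “html”.


anova(reduced, full, interact) %>%
  xtable(
    caption = "ANOVA table for Derby linear models"
    ) %>%
  print(
    type            = "html", # type = "latex" for PDFs
    table.placement = "h"     # only used for LaTeX output
  )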
ANOVA table for Derby linear models
  Res.Df   RSS Df Sum of Sq     F Pr(>F)
1    114 41.84
2    108 21.37  6     20.46 17.03 0.0000
3    103 20.62  5      0.75  0.75 0.5893

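If this doesn’t work on its own, you may need to add \usepackage{float} to the header of your R Markdown file, as seen below, and then use table.placement = “H”, which the float package provides to keep the table exactly where it is placed: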


---
title: "Example"
date: "29 April 2019"
output: pdf_document
header-includes:
  - \usepackage{float}
---
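
With the float package loaded in the header, a sketch of the matching print call looks like this. It uses the capital “H” specifier from the float package and LaTeX output, since table.placement has no effect on HTML; placed_latex is a name introduced here.

```r
library(xtable)

derby <- janitor::clean_names(Sleuth3::ex0920)

reduced <- lm(speed ~ year, data = derby)
full    <- lm(speed ~ year + track, data = derby)

placed_latex <- print(
  xtable(anova(reduced, full), caption = "ANOVA table for Derby linear models"),
  type            = "latex",
  table.placement = "H",    # requires \usepackage{float} in the header
  print.results   = FALSE   # return the markup instead of printing it
)

cat(placed_latex)
```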

Suppressing Messages

Often, when knitting to a PDF with LaTeX, a message is produced that says “latex table generated in R 3.5.2 by xtable 1.8-3 package”. To suppress it, after loading the xtable library, insert:


library(xtable)
options(xtable.comment = FALSE)
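
If you would rather not change a global option, the same thing can be done for a single table through the comment argument of print. A small sketch, with object names introduced for illustration:

```r
library(xtable)

derby <- janitor::clean_names(Sleuth3::ex0920)
derby_xtable <- xtable(head(derby))

# comment = TRUE keeps the "latex table generated in R ..." line
with_comment <- print(derby_xtable, type = "latex",
                      comment = TRUE, print.results = FALSE)

# comment = FALSE drops it for this table only
without_comment <- print(derby_xtable, type = "latex",
                         comment = FALSE, print.results = FALSE)

grepl("latex table generated", with_comment)
grepl("latex table generated", without_comment)
```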

This supplement was put together by Amna Amin, Kavya Chaturvedi and Omar Wasow.