Creating an interactive report using R and shiny

This article describes how I built a shiny app for data visualization during my internship at arconsis. The data comes from the Freshindex project. The Freshindex (FI) is a real-time shelf life indicator that helps to optimize the food supply chain economically and ecologically. One of my colleagues had already created several Tableau reports, and I tried to recreate one of them with an open source tool. Since I’m studying “technical economics”, I am already a little familiar with R, so I decided to use this statistical programming language. I installed Anaconda first and then loaded the libraries shiny and ggplot2 in R. It wasn’t my very first experience with R, but it was my first with shiny.

Each line of the following table represents an event that was triggered by the Virtual Supply Chain (VSC). In the table there are events of the type create for the article type Living Pig that happened at the participant Test Slaughterer A at a certain time (Timestamp_H). For each event there is a calculated FI value (calculated with hygiene and temperature data during a life cycle). These values are helpful for the quality control of articles across supply chains.

Snapshot of the raw dataset

Click here for the whole code

Let's start the tutorial

The first step is to export the entire dataset. Afterwards, check whether the data is already prepared for the next steps; if it isn’t, you have to prepare it first. Luckily, my colleague had already done the data preparation (note: data preparation is often said to take up to 80% of the time in a data analytics project).

Now you can start with R. Just open a new R Script and load your packages first. In our case we are using “shiny” for the user interface layouts and the application and “ggplot2” for the plot itself.

library(shiny)
library(ggplot2)

To import the prepared dataset, we use read.csv, which takes the path of the file and returns a data frame. If the file’s encoding, field separator or decimal mark differs from R’s defaults, R cannot read it correctly, so you have to pass the encoding, the field separator character and the character used for decimal points as additional arguments.
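As a rough sketch of such an import, assume a semicolon-separated file with decimal commas and Latin-1 encoding. The file, its two columns and its values are invented here so that the example runs on its own; the real path and settings depend on your export.

```r
# Write a tiny stand-in file so the example is self-contained.
# Column names and values are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Timestamp;Value",
             "09.05.19 14:07;0,93",
             "09.05.19 14:08;0,91"),
           csv_path)

# sep: field separator, dec: decimal mark, fileEncoding: encoding of the file
mydata_full <- read.csv(csv_path, sep = ";", dec = ",",
                        fileEncoding = "latin1")

# Value should now be numeric instead of text
str(mydata_full)
```

If the Value column shows up as text rather than as a number, the dec or sep argument most likely does not match the file.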

Since that dataset has lots of columns we are not going to use, we create a new data frame by copying all required columns.

mydata <- mydata_full[, c(1,2,3,4,28)]

I renamed the columns, as you can see in the following snapshot, purely to make them easier to work with.
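One way to do the renaming is to assign a new vector to names(). The sketch below uses an invented one-row data frame. The new names are the ones that appear in the code later in this article, but their order and the sample values are assumptions.

```r
# Stand-in for the five selected columns (values are invented)
mydata <- data.frame(V1 = "09.05.19 14:07", V2 = "Store",
                     V3 = "Pork schnitzel", V4 = 0.93,
                     V5 = "Test Store A")

# Assign the new column names in column order (order is an assumption)
names(mydata) <- c("Timestamp", "Participant_type", "Article_type",
                   "Value", "Participant")

names(mydata)
```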

Snapshot of the data frame that is going to be used from now on

By typing str(mydata) in the console you get the structure of the data frame.
The output lists one column per row: the column name, followed by its type and the first few values.

Snapshot of the structure of the data frame

You will notice that the first column, called Timestamp, is a Factor. Since it actually holds dates and we want to use it as such, we have to convert it. To do so, we create a new column in mydata called Time.

mydata$Time <- as.POSIXct(mydata$Timestamp, format = "%d.%m.%y %H:%M")
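To see what the conversion does, here is a small standalone sketch with two invented timestamps. Fixing tz = "UTC" is my assumption; it only keeps the result independent of the machine's time zone.

```r
# Example timestamps in day.month.year hour:minute layout (invented values)
timestamp <- factor(c("09.05.19 14:07", "09.05.19 14:08"))

# Convert via character; the format string must match the text exactly
time <- as.POSIXct(as.character(timestamp),
                   format = "%d.%m.%y %H:%M", tz = "UTC")

class(time)  # date-time class instead of factor
diff(time)   # date arithmetic now works
```

If the format string does not match the text, as.POSIXct() returns NA instead of raising an error, so it is worth checking the new column with anyNA().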

You can address columns via the name of your data frame + $ + the column name.

Snapshot of addressing a column of the data frame

User interface

Now we are ready to start with the user interface. As already mentioned, we are going to use the library shiny. A shiny app has two components: a UI definition and a server function. As the names suggest, the UI defines the layout of the page, while the server function defines the logic behind it, e.g. the type of graph and how the data is selected based on the user inputs.
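A minimal skeleton of these two components could look as follows. It is only a sketch: fluidPage() as the page container and the placeholder plot are assumptions, not necessarily what the final app uses.

```r
library(shiny)

# UI: describes what the page contains
ui <- fluidPage(
  plotOutput(outputId = "plot")
)

# Server: fills the outputs declared in the UI
server <- function(input, output) {
  output$plot <- renderPlot({
    plot(1:10)  # placeholder plot
  })
}

# Combine both parts; printing `app` or calling runApp(app) starts the app
app <- shinyApp(ui, server)
```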

First, we need to think about the UI layout, i.e. what the page should look like. I decided to split the page into two columns: the bigger one is for the plot itself, and the smaller one holds the user inputs such as the slider, the checkboxes and whatever else we want to add.

Snapshot of separating the page into two columns

I colored the smaller column blue and the other one pink just to make the columns visible. If you want the plot on the left and the filters on the right, simply change the order of the columns so that the blue column comes after the pink one. Remember that the page is 12 units wide, so the widths of all columns in a row should add up to 12.
After setting column(width = 2, ...), for example, we can start creating the filters, which are passed as arguments to that column.
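The two-column layout described above can be sketched like this. Wrapping the columns in fluidRow() and coloring them with an inline style are assumptions made for the example.

```r
library(shiny)

# Widths 2 + 10 add up to the 12 units of the page grid
layout <- fluidRow(
  column(width = 2, style = "background-color: lightblue;",
         "filters go here"),
  column(width = 10, style = "background-color: pink;",
         "plot goes here")
)

# Inspect the HTML that shiny generates for the row
cat(as.character(layout))
```

Swapping the two column() calls puts the plot on the left and the filters on the right.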

The first input object is a group of radio buttons for the participant types. We create it by passing an inputId, which is used to access the value, a label, which is displayed above the control, and choices, the list of values to select from. Here we pass all participant types contained in mydata. Wrapping them in type.convert with as.is = TRUE converts the values from a factor to a character vector, which is the type we want for the choices. The command unique returns the vector with duplicate elements removed. Again, you can check the types with str().

radioButtons(inputId = "ptype", label = "Participant type", 
    choices = type.convert(unique(mydata$Participant_type), as.is = TRUE), 
    selected = "Store")

Also, I decided to set an initially selected value which is “Store”.
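To see what the choices expression produces, it helps to run it on a small factor outside of the app. The values below are invented stand-ins for mydata$Participant_type.

```r
# Invented factor column standing in for mydata$Participant_type
participant_type <- factor(c("Store", "Test Slaughterer", "Store"))

# unique() drops the duplicate, type.convert(as.is = TRUE) yields characters
choices <- type.convert(unique(participant_type), as.is = TRUE)
str(choices)

# as.character() gives the same values here and may read more directly
all(choices == as.character(unique(participant_type)))
```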

Our next step is creating the slider for the timestamp, and this is where the newly created data frame column Time comes in. Besides an id and a label, sliderInput() needs the minimum and maximum value of the slider range. With value = we set the initial value of the slider; passing two values creates a range slider. We also set timeFormat and timezone, which are only used if the values are Date or POSIXt objects.

sliderInput(inputId = "timestamp", label = "Timestamp", 
    min = min(mydata$Time), max = max(mydata$Time), 
    value = c(min(mydata$Time), max(mydata$Time)), 
    timeFormat = "%H:%M:%S", timezone = "+0200")

If you don’t set the timeFormat or the right timezone, the slider may show different times than your dataset actually contains. This can happen if the default time zone of the computer differs from the time zone of the dataset: instead of “14 o’clock” you might get “15 o’clock”. Likewise, if your dataset says “14:07:10”, the timeFormat has to include hours, minutes and seconds if the default format does not. Keep that in mind when working with dates and times.
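The effect is easy to reproduce in plain R: the same instant is displayed with a different clock time depending on the time zone used for formatting. The timestamp and the two time zones below are chosen only for illustration.

```r
# One instant in time, stored in UTC (invented value)
instant <- as.POSIXct("2019-05-09 12:07:10", tz = "UTC")

# Clock time when displayed in UTC
format(instant, "%H:%M:%S", tz = "UTC")

# Same instant displayed in Central European Summer Time (UTC+2 in May)
format(instant, "%H:%M:%S", tz = "Europe/Berlin")
```

The data has not changed between the two calls; only the display has. That is the kind of shift the timezone argument of sliderInput() is meant to control.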

Moving on to the checkboxes for the article types, we use checkboxGroupInput() so that the user can select multiple choices. Again, we pass an id and a label as well as all possible choices and the default selection, so the parameters are similar to those of the radioButtons() used before.

checkboxGroupInput(inputId = "atype", label = "Article type", 
    choices = type.convert(unique(mydata$Article_type), as.is = TRUE), 
    selected = "Pork schnitzel")

Our fourth and simplest object is a download button with which the user can download the dataset for the current selection. Again, we pass an id and a label, and that’s it for the button. Just don’t confuse the inputId of the input objects above with the outputId of this and the following objects.

downloadButton(outputId = "report", label = "Generate report")

Now we have added everything we want to the left column, which contains all input objects. The last thing to do in the UI is to define our output, which is going to be a plot. We set the width of the second column to 10 with column(width = 10, ...), because the left column already has a width of 2. The only other argument this column contains is plotOutput(outputId = "plot"). Don’t forget to close all open brackets and you are done with the UI!

Server

Continuing with the server function, we define a reactive expression for the selected dataset. A reactive expression uses user inputs (e.g. the checkboxes and radio buttons defined in the UI) and returns a value. This value is cached after the expression runs for the first time and is only recomputed when one of the inputs it depends on changes. Otherwise the reactive expression returns the cached value without recomputing it, which makes the app faster. I decided to call this expression mychoice. It subsets mydata to the rows where the participant type matches the radio button input, the article type is among the checked boxes and the time lies within the range selected on the slider.

mychoice <- reactive({
    subset(mydata, Participant_type %in% input$ptype 
        & Article_type %in% input$atype 
        & Time >= input$timestamp[1] & Time <= input$timestamp[2])
    })
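The filter condition itself can be tried out without shiny. In this sketch the data frame is invented, and the plain variables ptype, atype and timestamp stand in for input$ptype, input$atype and input$timestamp.

```r
# Small invented data set with the columns used in the filter
events <- data.frame(
  Participant_type = c("Store", "Store", "Slaughterer"),
  Article_type     = c("Pork schnitzel", "Pork schnitzel", "Living Pig"),
  Time  = as.POSIXct(c("2019-05-09 14:00", "2019-05-09 16:00",
                       "2019-05-09 14:30"), tz = "UTC"),
  Value = c(0.93, 0.88, 0.99)
)

# Stand-ins for the radio button, checkbox and slider inputs
ptype     <- "Store"
atype     <- c("Pork schnitzel", "Living Pig")
timestamp <- as.POSIXct(c("2019-05-09 13:00", "2019-05-09 15:00"),
                        tz = "UTC")

# Same condition as in the reactive expression
selection <- subset(events,
                    Participant_type %in% ptype &
                      Article_type %in% atype &
                      Time >= timestamp[1] & Time <= timestamp[2])
selection
```

Only the first row passes all three conditions: the second one lies outside the time range and the third has a different participant type.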

Now we can write the code for our plot.

output$plot <- renderPlot({...})

You can address the plot via its outputId so if you defined your plotOutput() in the UI as myplot123 you would have to address it via output$myplot123 just like you address everything else via input/output/… + $ + id.

Before creating the plot, we need to check that the data frame returned by mychoice() is not empty for the user’s current selection. The following lines do the trick:

if (length(row.names(mychoice())) == 0) {
      print("Values are not available")
    }

Without this check, an error appears whenever the selection contains no values. For example, if the user selects “Living pig” as the article type and “Store” as the participant type, there won’t be any values, because obviously there aren’t any living pigs in the stores (they only appear at the slaughterhouse).
So instead of showing an error in the app, we print a message with print(), which only appears in the console.
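As an alternative to printing to the console, shiny offers validate() and need(), which show the message in place of the plot. This is not what the app in this article does; it is only a sketch of the option, with an invented empty data frame standing in for the result of mychoice().

```r
library(shiny)

# Empty stand-in for what mychoice() returns when nothing matches
empty_choice <- data.frame(Time = numeric(0), Value = numeric(0))

# need() returns the message when the condition fails and NULL otherwise
message_if_empty <- need(nrow(empty_choice) > 0, "Values are not available")
message_if_empty

# Inside renderPlot() the check would be wrapped like this (not run here):
# validate(need(nrow(mychoice()) > 0, "Values are not available"))
```

With validate(), the rest of the renderPlot() block is skipped for an empty selection, and the user sees the message instead of an error.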

Snapshot of the output in the console instead of an error in the app

For the plot itself we are going to use the library ggplot2 because it simplifies adding different features to plots. First, we create a plot by passing the data frame to the function.

myplot <- ggplot(mychoice(), aes(x = Time, y = Value, colour = Article_type))

aes() stands for aesthetics and defines how properties such as position, color, shape and size are mapped to variables. Here we assign a different color to each article type.

Afterwards we add data points to the plot and select their size and transparency.

myplot <- myplot + geom_point(size = 3, alpha = 0.4)

By adding + labs(title = "Fresh Index", x = "Minute of Timestamp [May 9, 2019]", y = "Value") to myplot we can add a title to the plot as well as the axis labels.

Snapshot of the plot in process

By adding a theme, we can make the title bigger and bold and add a little space at the baseline. Also, we can change the color of the panel as well as the size and set styling on other elements like panel border, grid lines, text size and text position, legend position and legend text styling.
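A theme along these lines might look like the following sketch. The data points and all concrete sizes, colors and positions are invented for illustration and are not the settings used in the final app.

```r
library(ggplot2)

# Invented points so the example runs on its own
df <- data.frame(Time = 1:3, Value = c(0.9, 0.8, 0.7),
                 Article_type = "Pork schnitzel")

myplot <- ggplot(df, aes(x = Time, y = Value, colour = Article_type)) +
  geom_point(size = 3, alpha = 0.4) +
  labs(title = "Fresh Index") +
  theme(
    # bigger, bold title with some space below it
    plot.title       = element_text(size = 20, face = "bold",
                                    margin = margin(b = 10)),
    # panel color and grid lines
    panel.background = element_rect(fill = "grey95"),
    panel.grid.minor = element_blank(),
    # legend placement and text size
    legend.position  = "bottom",
    legend.text      = element_text(size = 12)
  )
```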

Since we want a separate panel for each participant of the selected participant type, we add a further command that arranges the panels in a single column.

myplot <- myplot + facet_wrap( ~ Participant, ncol = 1)

The last two things to do for renderPlot() are to call print() on the plot at the end of the curly brackets and to define the plot size after them.

output$plot <- renderPlot({
    # ... build myplot as shown above ...
    print(myplot)
}, height = 600, width = 1000)

Finally, we are done with rendering the plot output, so we can move on to the download button. For it, we have to define a default file name and the content to be saved.

output$report <- downloadHandler(
  filename = function() {
    # paste0() avoids the stray "_" that sep = "_" puts before ".csv"
    paste0("mychoice_", Sys.Date(), ".csv")
  },
  content = function(file) {
    write.table(mychoice(), file, row.names = FALSE, sep = "|")
  }
)

Everything is in place now, so we can run the app.

shinyApp(ui, server)

Snapshot of the final result of the plot

That’s it!

In this article, we showed you how to build a simple interactive report with shiny and R.
