Commercial Energy Rates – R Shiny App

Here is a link to a US commercial energy rates Shiny application that I published in 2014. The app pulls in data from the US Energy Information Administration (EIA), which posts US residential, commercial, industrial and transportation energy prices at the state level to this webpage on a monthly basis. At the time I created this app, my employer was specifically interested in commercial energy rates.

Screengrab from EIA website where energy rates are posted every month

The app also pulls in 2013 US census population data from the US Census Bureau. [Note: reading in data from external websites like this is a little risky since it creates an external dependency, but at the time I was keen to learn new things about R! For a more robust application, download the population data once and store it on your own system where you can control it.]

So what?

My company sold energy management hardware and software solutions to commercial entities in the retail, grocery and fast food industries. Like any company, we had limited sales and marketing resources. This tool helped us allocate those resources efficiently to where returns were more likely, i.e. states with high energy costs and/or states with rising energy costs. Yes, you could just look up the data from tables on the website, but this visualization makes it instantly apparent where the most interesting opportunities are.

Bubble plot to demonstrate costs, change in costs, state population

Key features of the app:

  • By labelling the points and color coding them according to geographic region we can see at a glance that California and a bunch of northeastern states have substantially higher energy costs than the rest of the country.
  • We can also see that West Virginia has seen a 10+% year-on-year increase in costs. Their rates are still low relative to other states but if you are running a business in West Virginia, you are going to feel a 10+% increase in running costs.
  • The size of the points is proportional to population. This is a useful visual reminder that the opportunities in West Virginia may be thin on the ground (although the scenery is stunning and the locals are friendly!)
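As a rough illustration of the quantities behind the bubble plot, here is a minimal base R sketch that computes the year-on-year change in rates and flags states worth a closer look. It is not the app's code: the column names, the two thresholds and every number in the data frame are made up for illustration.

```r
# Hypothetical data: every value below is invented for illustration,
# not taken from the EIA or the Census Bureau.
rates <- data.frame(
  state      = c("A", "B", "C", "D"),
  rate_now   = c(16.0, 9.9, 8.0, 12.0),   # cents per kWh, latest month
  rate_prior = c(15.0, 9.0, 8.0, 12.5),   # cents per kWh, same month last year
  population = c(30e6, 2e6, 10e6, 5e6),
  stringsAsFactors = FALSE
)

# Year-on-year percentage change in the rate
rates$pct_change <- 100 * (rates$rate_now - rates$rate_prior) / rates$rate_prior

# Flag states with high rates and/or fast-rising rates (thresholds are assumptions)
high_rate   <- 14   # cents per kWh
fast_rising <- 5    # percent per year
rates$target <- rates$rate_now >= high_rate | rates$pct_change >= fast_rising

# Bubble plot: x = rate, y = change, circle area proportional to population
bubble_plot <- function(df) {
  symbols(df$rate_now, df$pct_change,
          circles = sqrt(df$population / pi), inches = 0.3,
          xlab = "Rate (cents/kWh)", ylab = "Year-on-year change (%)")
  text(df$rate_now, df$pct_change, labels = df$state)
}
```

Calling bubble_plot(rates) draws the chart. The radius is set to the square root of population so that the area of each circle, rather than its diameter, scales with population.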

The app also features links to the raw data sources and 2 map views – notice again how California and West Virginia stand out for highest rates and highest increase in rates respectively.

Map views for all you GIS nerds out there!

As usual, here is a link to the code that I have saved in Google Drive and that you are free to download and run in R. Notes on running this shiny application:

  1. Download the zip file and extract the 2 R files to a suitable location on your device. Let’s assume you saved them to "C:/My Documents/energyRatesApp".
  2. Set your working directory to the parent folder: setwd("C:/My Documents/")
  3. Ensure the shiny library is installed: install.packages("shiny")
  4. Run the app: shiny::runApp("energyRatesApp")
  5. You may get error messages if any of the required libraries are not installed. Simply install the missing libraries and try again.
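If you would rather check for missing libraries up front, something like the sketch below should work. The helper name missing_packages is my own invention, and "shiny" is the only package named in this post, so extend the list as the error messages reveal the others.

```r
# Return the packages from `required` that are not installed.
# (missing_packages is a hypothetical helper, not part of the app.)
missing_packages <- function(required) {
  installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
  required[!installed]
}

# Only "shiny" is named in this post; add other packages as needed.
required <- c("shiny")
to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

The install.packages() call is left commented out so that nothing is downloaded until you have seen what is missing.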

Call yourself a data scientist, eh?

I won’t add to the proliferation of “What’s a data scientist?” posts on the internet. I am as bored of them as you are. Instead I will point you to this Venn diagram from Drew Conway which nails it in my opinion. In my analytics career I aim for the center of this chart and so far I am doing reasonably well:

  • Hacking skills: I’ve been coding in R regularly since 2011, SAS Base certified since 2013, proficient in Excel, capable in SQL, Tableau, MicroStrategy, Business Objects, SPSS Modeler and some GIS tools.
  • Math & Statistics Knowledge: I have an engineering bachelor’s and MS, a graduate certificate in statistics and an MS in business analytics. Credentials – check! Thankfully I have had the good fortune to put this knowledge to work, including the development and application of decision trees, regression, association rule mining, k-nearest neighbors, k-means clustering, principal component analysis, random forests, genetic algorithms, hierarchical clustering, bootstrap, cross validation, logistic regression, naive Bayes classifiers, etc.
  • Substantive Expertise: I have been lucky to get my hands on data in the tax, insurance, cell phone, finance, energy, real property and healthcare industries, to name a few. To obtain substantive expertise it is necessary to put in time in the real world. Books will only teach you so much.

I can largely cover all three circles in the Venn diagram, but I’ll never be comfortable referring to myself as an expert in any one of them, never mind all three. Here’s why – Mark Little is a favorite journalist/entrepreneur of mine and I like his ten principles of social journalism, particularly number one: “There is always someone closer to the story.” I generalize this principle and constantly remind myself that there are always people with more knowledge than me in the above three areas. Call that humility or insecurity or whatever, but it keeps me on my game.

Random Forest Model of Building Energy Consumption

I built an R Shiny app to model a building’s energy consumption. The app allows the user to select a baseline period and observe the building’s performance post-baseline. The energy rate can be adjusted to provide a quick estimate of savings (or losses) in the post-baseline period.
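To make the baseline idea concrete, here is a minimal sketch of the arithmetic: fit a model on the baseline period, predict what the building would have used afterwards, and price the difference. To keep it in base R, lm() is swapped in as a simple stand-in for the random forest. The data are simulated, and the function name estimate_savings and the rate of 0.12 dollars per kWh are assumptions for illustration.

```r
# Hypothetical hourly data: temperatures and consumption are simulated,
# not taken from any real building.
set.seed(1)
n <- 200
temp <- runif(n, 10, 35)                      # outdoor temperature, deg C
kwh  <- 20 + 1.5 * temp + rnorm(n, sd = 2)    # hourly consumption, kWh
period <- rep(c("baseline", "post"), each = n / 2)
kwh[period == "post"] <- kwh[period == "post"] * 0.9   # pretend a 10% saving
building <- data.frame(temp, kwh, period)

# Fit on the baseline period only. lm() is a simple stand-in here;
# the app described in this post uses a random forest.
baseline_fit <- lm(kwh ~ temp, data = subset(building, period == "baseline"))

# Predict what the building would have used post-baseline with no changes
post <- subset(building, period == "post")
expected_kwh <- predict(baseline_fit, newdata = post)

# Savings are expected minus actual; negative values mean consumption went up
estimate_savings <- function(expected, actual, rate_per_kwh) {
  saved_kwh <- sum(expected - actual)
  c(kwh = saved_kwh, dollars = saved_kwh * rate_per_kwh)
}

savings <- estimate_savings(expected_kwh, post$kwh, rate_per_kwh = 0.12)
```

Changing rate_per_kwh mimics adjusting the energy rate to re-price the same kWh difference.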

So what: My clients wanted to know the effect of energy saving measures they were taking, e.g. new AC units, new freezers, new temperature settings, new lighting, etc. Conversely, they might want to assess the negative impact of an event (e.g. equipment malfunction). In the screenshot below, energy use has increased post-baseline, which would be a concern to any business. This tool gives decision makers and sales engineers the quick, “good enough” information they need to take action.

Screenshot of the app

It is no surprise that ambient temperature has the biggest impact on electricity consumed. Below we see a typical annual profile – higher temperatures in summer lead to higher AC use. Note that if you use electric heating instead of gas, your winter electricity consumption could be just as high as, if not higher than, your summer consumption.

Typical annual profile of local temperature and energy consumption

The conventional way to build a model of a building’s energy consumption is to use temperature records to compute the number of degree days in each billing period (typically monthly), take the energy consumed from the monthly utility bill and fit a regression model (with only 12 data points per year). For more on degree days and a good account of the pros and cons of this modelling approach in general, see here.
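The degree day approach can be sketched in a few lines of base R. This is an illustration under stated assumptions, not a reference implementation: the 18 deg C base temperature, the calendar-month billing periods and all of the simulated temperatures and bills are made up.

```r
# Cooling and heating degree days from daily mean temperatures.
# The 18 deg C base temperature is an assumption; pick one that suits the building.
degree_days <- function(daily_mean_temp, base = 18) {
  c(cdd = sum(pmax(daily_mean_temp - base, 0)),
    hdd = sum(pmax(base - daily_mean_temp, 0)))
}

# Hypothetical year of data: simulated temperatures and bills, for illustration only.
set.seed(42)
days  <- seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "day")
temps <- 15 + 12 * sin(2 * pi * (as.numeric(format(days, "%j")) - 105) / 365) +
  rnorm(length(days), sd = 3)
month <- format(days, "%Y-%m")

# One row per billing period (calendar months here)
dd <- t(sapply(split(temps, month), degree_days))
bills <- data.frame(month = rownames(dd), cdd = dd[, "cdd"], hdd = dd[, "hdd"])
bills$kwh <- 3000 + 25 * bills$cdd + 5 * bills$hdd + rnorm(nrow(bills), sd = 200)

# Regression of monthly consumption on degree days: only 12 observations
dd_fit <- lm(kwh ~ cdd + hdd, data = bills)
```

Running summary(dd_fit) shows the fitted coefficients. With 12 rows and three parameters there are only 9 residual degrees of freedom, which is the weakness noted above.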

When I worked at an energy management company I gained valuable insight into how office, retail, pharmacy and grocery buildings consume electricity. A quick glance at a daily profile of energy consumption (aside: my old statistics lecturer always stressed the importance of drawing pictures early and often) shows me that there is more than temperature influencing energy consumed, namely business operating hours. I was fortunate to have temperature and energy consumption data at the hourly level and therefore had the opportunity to develop a much richer model of energy consumption.
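One way to expose operating hours to a model is to derive calendar features from the hourly timestamps. The sketch below is only an illustration. The function name hourly_features and the opening hours (08:00 to 20:00, Monday to Saturday) are assumptions, and the script linked below may build its predictors differently.

```r
# Hypothetical hourly timestamps covering one week, starting on a Monday
timestamps <- seq(as.POSIXct("2014-06-02 00:00", tz = "UTC"),
                  by = "hour", length.out = 7 * 24)

# Derive hour of day, day of week and an "open for business" flag.
# The default opening hours are assumptions for illustration.
hourly_features <- function(ts, open_hour = 8, close_hour = 20) {
  hour <- as.integer(format(ts, "%H"))
  wday <- as.integer(format(ts, "%u"))   # 1 = Monday ... 7 = Sunday
  data.frame(
    timestamp = ts,
    hour      = hour,
    wday      = wday,
    is_open   = wday <= 6 & hour >= open_hour & hour < close_hour
  )
}

features <- hourly_features(timestamps)
```

Columns like these can sit alongside temperature as predictors, so that a tree-based model can learn separate temperature responses for open and closed hours.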

The Rmd script is available here on Google Drive. It pulls in data from a publicly available Google Sheets page, so you should be able to download it and run it in RStudio without any fuss. Hopefully I have commented my code sufficiently well, but please contact me if you have any questions. For more on random forests, check out this video from about 41:20.