Pies are for eating

Bar charts beat pie charts almost every time – I say almost because I am open to being convinced … but I won’t be holding my breath.

Disadvantages of pie charts:
–          slices are hard to rank by size, even when the chart maker has tried to sort them
–          they rely on color, which is an extra level of mental effort for the viewer to process
–          colors are useless if printed in black and white
–          differences between slices are difficult to assess visually
–          3-D is worse because it literally makes the nearest slice look bigger than it should be

Advantages of bar charts:
–          easier to read
–          no need for a legend
–          we can rank the bars easily and clearly
–          differences between bars are easy to assess visually

If you google “pie charts” you’ll find a bunch of people ranting far worse than me. Here is a good collation of some of the best arguments.

All that being said, data visualization is a matter of taste and personal preference does come into it. At the end of the day it’s about how best we can communicate our message. I wouldn’t dare say we should never use pie charts but personally I tend to avoid them.


Localized real estate cost comparison

Comparing real estate costs across regions is a challenge because location has such a large impact on rent and operation & maintenance (O&M) costs. The resulting variance makes it difficult for organizations to tell whether a cost difference reflects spending decisions or simply geography.

“There are three things that matter in property: Location, location, location!” British property tycoon, Lord Harold Samuel

For example, imagine two federal agencies, each with 100 buildings spread across the US. Due to their respective missions, agency A has many offices in rural areas, while agency B has many downtown office locations in major US cities.

Agency B has higher rent costs than agency A. This cost difference is largely explained by location: agency B offices are typically in downtown locations whereas agency A offices are often in rural areas.

We therefore cannot conclude from the raw totals that agency B is overspending on rent. We can only make that claim if we can somehow control for location, the explanatory variable.

Naïve solution: Filter to a particular location (e.g. county, city or zip code) and compare costs between federal agencies in that location only. For example, we could compare rents between office buildings in downtown Raleigh, NC. This gives us a good comparison at a micro level but we lose the macro nationwide picture. Filtering through every region one by one to view the results is not a serious option when there are thousands of different locations.

I once worked with a client that had exactly this problem. Whenever an effort was made to compare costs between agencies, it was always possible (inevitable, even) for agencies to claim geography as a legitimate excuse for apparently high costs. I came up with a novel approach for comparing costs at an overall national level while controlling for geographic variation in costs. Here is a snippet of some dummy data to demonstrate this example (full dummy data set available here):

Agency   Zip     Sqft_per_zip   Annual_Rent_per_zip ($/yr)
G        79101          8,192                       33,401
D        94101         24,351                       99,909
A        70801         17,076                       70,436
A        87701         25,294                      106,205
D        87701         16,505                       70,275
A        24000          3,465                       14,986

As usual I make the full dummy data set available here and you can access my R code here. The algorithm is described below in plain English:

  1. For agency X, compute the summary statistic at the local level, i.e. cost per sqft in each zip code.
  2. Omit agency X from the data and compute the summary statistic again, i.e. cost per sqft for all other agencies except X in each zip code.
  3. Using the results from steps 1 and 2, compute the difference in cost in each zip code. This tells us agency X’s net spend vs other agencies in each zip code.
  4. Repeat steps 1 to 3 for all other agencies.
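
To make the four steps concrete, here is a minimal Python sketch. It is not my R code, just an illustration under a couple of assumptions: the "other agencies" rate is taken as pooled rent divided by pooled sqft, and zip codes where an agency has no neighbors are skipped. The function name `net_spend_by_zip` is made up for this example, and the rows are the six lines from the dummy data snippet above.

```python
from collections import defaultdict

# Rows copied from the dummy-data snippet: (agency, zip, sqft, annual rent in $).
ROWS = [
    ("G", "79101", 8192, 33401),
    ("D", "94101", 24351, 99909),
    ("A", "70801", 17076, 70436),
    ("A", "87701", 25294, 106205),
    ("D", "87701", 16505, 70275),
    ("A", "24000", 3465, 14986),
]


def net_spend_by_zip(rows):
    """Return {(agency, zip): (rate_diff, net_spend)} for zips that have peers.

    rate_diff is the agency's $/sqft minus the pooled $/sqft of all other
    agencies in the same zip code (assumption: pooled, not a mean of rates).
    net_spend is rate_diff multiplied by the agency's sqft in that zip code.
    """
    # Totals of [sqft, rent] per (agency, zip) and per zip overall.
    by_agency_zip = defaultdict(lambda: [0.0, 0.0])
    by_zip = defaultdict(lambda: [0.0, 0.0])
    for agency, zip_code, sqft, rent in rows:
        by_agency_zip[(agency, zip_code)][0] += sqft
        by_agency_zip[(agency, zip_code)][1] += rent
        by_zip[zip_code][0] += sqft
        by_zip[zip_code][1] += rent

    results = {}
    # Step 4: the loop covers every agency in turn.
    for (agency, zip_code), (sqft, rent) in by_agency_zip.items():
        # Step 2: omit this agency from the zip code totals.
        other_sqft = by_zip[zip_code][0] - sqft
        other_rent = by_zip[zip_code][1] - rent
        if other_sqft <= 0:
            continue  # no neighbors in this zip code, so nothing to compare
        # Steps 1 and 3: local rate for the agency minus the rate for the rest.
        rate_diff = rent / sqft - other_rent / other_sqft
        results[(agency, zip_code)] = (rate_diff, rate_diff * sqft)
    return results


results = net_spend_by_zip(ROWS)
```

With only the six snippet rows, zip code 87701 is the only one with two agencies, so it is the only one that produces a comparison: agency A pays roughly $0.06/sqft less than agency D there.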

The visualization is key to the power of this method of cost comparison.

Screenshot from Tableau workbook. At a glance we can see Agency B is generally paying more than its neighbors in rent. And we can see which zip codes could be targeted for cost savings.

This plot could have been generated in R but my client liked the interactive dashboards available in Tableau so that is what we used. You can download Tableau Reader for free from here and then you can download my Tableau workbook from here. There is a lot of useful information in this graphic and here is a brief summary of what you are looking at:

The height of each bar represents the cost difference between what the agency pays and what neighboring agencies pay in the same zip code. If a bar height is greater than zero, the agency pays more than neighboring agencies for rent. If a bar height is less than zero, the agency pays less than neighboring agencies. If a bar has zero height, the agency is paying the same average price as its neighbors in that zip code.

There is useful summary information in the chart title. The first line indicates the total net cost difference paid by the agency across all zip codes. In the second title line, the net spend is put into context as a percentage of total agency rent costs. The third title line indicates the percentage of zip codes in which the agency is paying more than its neighbors – this reflects the crossover point on the chart, where the bars go from positive to negative.
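
The three title lines are simple aggregates of the per-zip results. As a rough sketch (the Tableau workbook does this with its own calculated fields, so the function name `title_summary`, the zip labels and the dollar figures below are all invented for illustration):

```python
def title_summary(net_spend, total_rent):
    """Summarize one agency's per-zip net spend for the chart title.

    net_spend: {zip_code: net spend in $} (positive means paying more than neighbors)
    total_rent: the agency's total annual rent in $
    Returns (total net spend, net spend as % of total rent, % of zips overpaying).
    """
    total_net = sum(net_spend.values())
    pct_of_rent = 100.0 * total_net / total_rent
    n_over = sum(1 for value in net_spend.values() if value > 0)
    pct_zips_over = 100.0 * n_over / len(net_spend)
    return total_net, pct_of_rent, pct_zips_over


# Hypothetical per-zip net spend for one agency (labels and amounts are made up).
example = {"Z1": 5000.0, "Z2": -2000.0, "Z3": 1000.0, "Z4": -500.0}
total_net, pct_of_rent, pct_zips_over = title_summary(example, total_rent=100000.0)
```

Here the agency pays a net $3,500 more than its neighbors, which is 3.5% of its total rent, and it overpays in 50% of its zip codes.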

There is a filter to select the agency of your choice. A cost threshold filter can also be applied to highlight (in orange) zip codes where agency net spend is especially high. Total net spend depends on square footage as well as the rate, e.g. a $1/sqft net spend in a zip code where the agency has 1 million sqft ($1,000,000) is costing more than a $5/sqft net spend in a zip code where the agency has only 20,000 sqft ($100,000).
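
The threshold logic can be sketched in a few lines of Python. This assumes the threshold is applied to total dollars (rate difference times sqft) rather than to the per-sqft rate; the function name `zips_to_highlight`, the zip labels and the $500,000 threshold are hypothetical, while the two data points are the ones from the example above.

```python
def zips_to_highlight(zips, threshold):
    """Return the zip codes whose total net spend exceeds the threshold.

    zips: {zip_code: (net spend in $/sqft, agency sqft in that zip code)}
    threshold: total net spend in $ above which a bar is highlighted
    """
    return [
        zip_code
        for zip_code, (rate_diff, sqft) in zips.items()
        if rate_diff * sqft > threshold
    ]


# The two cases from the text, under made-up zip labels.
cases = {
    "large_footprint": (1.0, 1_000_000),  # $1/sqft over 1 million sqft
    "small_footprint": (5.0, 20_000),     # $5/sqft over 20,000 sqft
}
highlighted = zips_to_highlight(cases, threshold=500_000)
```

Only the large-footprint zip code is flagged, even though its per-sqft difference is the smaller of the two.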

The tool tip gives you additional detailed information on each zip code as you hover over each bar. In this screenshot zip code 16611 is highlighted for agency B.

At a glance we get a macro and micro picture of how an agency’s costs compare to its peers while controlling for location! This approach to localized cost comparison provided stakeholders with a powerful tool to identify which agencies are overspending and, moreover, in precisely which zip codes they are overspending the most.

Once again, the R code is available here, the data (note this is only simulated data) is here and the Tableau workbook is here. To view the Tableau workbook you’ll need Tableau Reader which is available for free download here.


Geographic Information Systems

Geographic information system is a clunky term for what the layman simply calls maps. Ok, ok, there is more to it than that with shapefiles and polygons and metadata, etc, etc. But the general gist is a visualization of geographical data and this challenge has been tackled for aeons, or at least for a long time before data-viz became trendy. In 2012, Tableau put together what they consider to be The 5 Most Influential Data Visualizations of All Time and I was not surprised to see John Snow’s Cholera Map of London in the mix as well as Napoleon’s March on Moscow (which is kinda sorta GIS mapsy).

Personally, I cut my teeth in GIS as a young civil engineer when I worked in the Irish sewerage and rainwater drainage industry – this Wad River Catchment Flood Study (pdf) includes some elegant geographic visualizations that I helped develop. Being an engineer during the Irish property bubble, I witnessed a lot of housing construction in areas where there was subsequently little or no demand. Depending on who you talk to, this was either due to greedy bankers, over-exuberance in the market or a myriad of other explanations that spew from experts’ mouths. Given my proximity to the construction industry, I was keenly aware of various tax incentives that were on offer for building houses in certain geographic areas and I suspected these government interventions, although well-intentioned, may have had a negative impact.

In 2012 I analyzed the National Unfinished Housing Developments Database to see if there was a link between government incentives and ghost estates. The results of my geographic analysis indicate that, yes, there is some evidence to suggest that the government exacerbated the housing bubble/bust in the very areas it was trying to help. My analysis was crude but compelling (if I do say so myself!).