10 Third-Party Data

You can use the chart_*_by() family of functions with third-party data too.

This set of examples pulls from an R package, ECDMS, that contains tidied data harvested from the California Energy Consumption Database (ECDMS). These data, published by the California Energy Commission (CEC), provide estimates of annual natural gas and electricity consumption in California.

To gain access, we just type library(ECDMS).

library(inventory)
library(ECDMS)
library(dplyr)    # filter(), mutate(), %>%
library(forcats)  # fct_collapse()

10.1 Natural Gas Consumption

In the ECDMS_gas_county_data dataset, natural gas consumption data are split by county, sector, and year. This dataset covers all of California, from CY1990 through CY2018. Here’s a preview:

head(ECDMS_gas_county_data)
year   sector          county         tput_qty tput_unit
CY1990 Non-Residential Alameda      229.905609 MMthm
CY1990 Non-Residential Amador        11.658194 MMthm
CY1990 Non-Residential Butte         17.028776 MMthm
CY1990 Non-Residential Calaveras      0.331382 MMthm
CY1990 Non-Residential Colusa        11.785811 MMthm
CY1990 Non-Residential Contra Costa 701.736398 MMthm
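The sketch below makes the layout concrete without installing ECDMS. It re-types the six preview rows above as a base-R data frame. The name `preview` is ours and is not part of either package. Each row is one year, sector, and county. The quantity is in tput_qty and its unit is in tput_unit.

```r
# Hypothetical stand-in for head(ECDMS_gas_county_data): the six preview
# rows above, typed in by hand so this runs without the ECDMS package.
preview <- data.frame(
  year      = "CY1990",            # recycled to all six rows
  sector    = "Non-Residential",
  county    = c("Alameda", "Amador", "Butte",
                "Calaveras", "Colusa", "Contra Costa"),
  tput_qty  = c(229.905609, 11.658194, 17.028776,
                0.331382, 11.785811, 701.736398),
  tput_unit = "MMthm",
  stringsAsFactors = FALSE
)

# One row per (year, sector, county); quantity and unit in separate columns.
str(preview)

# Total of just these six rows, in MMthm (about 972.45).
total_mmthm <- sum(preview$tput_qty)
print(total_mmthm)
```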

The variable we're interested in is tput_qty, with its unit in tput_unit. Those are the names chart_annual_throughputs_by() expects, so it will "just work".

10.1.1 San Francisco Bay Area (9 Counties)

Let’s take SFBA_gas_county_data to be the subset belonging to Alameda, Contra Costa, Marin, Napa, San Francisco, San Mateo, Santa Clara, Solano, and Sonoma counties.

#
# Note: `DST_COUNTY_NAMES` is supplied by the `inventory` package.
#
SFBA_gas_county_data <-
  ECDMS_gas_county_data %>%
  filter(
    county %in% names(DST_COUNTY_NAMES)) 
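You may not have the inventory package (and therefore DST_COUNTY_NAMES) at hand. In that case, the same filter can be sketched in base R using an explicit vector of the nine county names listed above.

In this sketch, `sfba_counties` is a hypothetical name that stands in for names(DST_COUNTY_NAMES). The `preview` data frame re-types three of the preview rows so the example runs on its own.

```r
# Hypothetical stand-in for names(DST_COUNTY_NAMES): the nine Bay Area
# counties named in the text above.
sfba_counties <- c("Alameda", "Contra Costa", "Marin", "Napa",
                   "San Francisco", "San Mateo", "Santa Clara",
                   "Solano", "Sonoma")

# A few rows re-typed from the preview of ECDMS_gas_county_data.
preview <- data.frame(
  year     = "CY1990",
  sector   = "Non-Residential",
  county   = c("Alameda", "Amador", "Contra Costa"),
  tput_qty = c(229.905609, 11.658194, 701.736398),
  stringsAsFactors = FALSE
)

# Keep only rows whose county is one of the nine SFBA counties,
# mirroring filter(county %in% names(DST_COUNTY_NAMES)).
sfba_preview <- preview[preview$county %in% sfba_counties, ]
sfba_preview$county   # "Alameda" "Contra Costa"
```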

Now let’s create a set of plots:

  • Throughput, total
  • Relative growth, total
  • Stacked throughputs, by sector
  • Relative growth, by sector
  • Stacked throughputs, by county
  • Relative growth, by county

For the relative-growth charts, we’ll specify base_year = CY(2011).
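This chapter doesn't spell out the formula behind "relative growth". A common definition is each year's quantity divided by the base-year quantity, minus one.

The sketch below assumes that definition. This is our assumption, not necessarily what chart_annual_growth() computes internally. It uses made-up round numbers rather than ECDMS data, and plain strings in place of CY().

```r
# Assumed definition: growth in year t relative to the base year is
# qty[t] / qty[base] - 1, so the base year itself maps to 0.
relative_growth <- function(qty, year, base_year) {
  base_qty <- qty[year == base_year]
  stopifnot(length(base_qty) == 1)   # base year must appear exactly once
  qty / base_qty - 1
}

# Toy series for illustration only (not ECDMS data).
years <- c("CY2010", "CY2011", "CY2012")
qty   <- c(90, 100, 110)

relative_growth(qty, years, base_year = "CY2011")   # about -0.1, 0, 0.1
```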

SFBA: Total Consumption

SFBA_gas_county_data %>%
  chart_annual_throughputs(
    flag_years = CY(1990, 2011))

SFBA: Total Growth

SFBA_gas_county_data %>%
  chart_annual_growth(
    base_year = CY(2011),
    flag_years = CY(1990))

SFBA: Consumption by Sector

SFBA_gas_county_data %>%
  chart_annual_throughputs_by(
    fill = sector)

SFBA: Growth by Sector

SFBA_gas_county_data %>%
  chart_annual_growth_by(
    color = sector,
    base_year = CY(2011),
    flag_years = CY(1990))

Here we see that, in the nine-county Bay Area, Non-Residential gas consumption has been trending slightly upward over time, at about 0.3% growth per year. Residential consumption, on the other hand, has been trending slightly downward, at about -0.3% per year.
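To put a rate like 0.3% per year in perspective, here is a small arithmetic sketch. It compounds an annual rate over the 21 years between CY1990 and CY2011. The helper name `cumulative_change` is ours.

```r
# Compound an annual growth rate over n years; returns the total
# fractional change (e.g. 0.065 means +6.5%).
cumulative_change <- function(annual_rate, n_years) {
  (1 + annual_rate)^n_years - 1
}

n <- 2011 - 1990   # 21 years

# Roughly +6.5% over 21 years at +0.3%/yr (Non-Residential) ...
cumulative_change(0.003, n)

# ... and roughly -6.1% at -0.3%/yr (Residential).
cumulative_change(-0.003, n)
```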

SFBA: Consumption by County

SFBA_gas_county_data %>%
  chart_annual_throughputs_by(
    fill = county)

SFBA: Growth by County

SFBA_gas_county_data %>%
  chart_annual_growth_by(
    color = county,
    base_year = CY(2011))


10.1.2 PG&E Planning Area

Above, we saw that ECDMS_gas_county_data is split by county. There is a different tabular dataset, ECDMS_gas_plan_data, that is split instead by plan (“planning area”).

Let’s take PGE_gas_plan_data to be the subset where plan is “Pacific Gas and Electric”.

PGE_gas_plan_data <-
  ECDMS_gas_plan_data %>%
  filter(
    plan == "Pacific Gas and Electric") 

Now let’s recreate a similar set of plots:

  • Throughput, total
  • Relative growth, total
  • Stacked throughputs, by sector
  • Relative growth, by sector

The total annual consumption is higher in PGE_gas_plan_data — more like 5 billion therms, versus the 3 billion therms we saw in SFBA_gas_county_data.
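The tput_unit column reports MMthm, that is, millions of therms. Totals quoted in billions of therms therefore convert to MMthm with a factor of 1,000. Here is a minimal sketch; the helper names are ours.

```r
# 1 MMthm = 1,000,000 therms, so 1 billion therms = 1,000 MMthm.
therms_per_mmthm <- 1e6

mmthm_to_billion_therms <- function(mmthm) mmthm * therms_per_mmthm / 1e9
billion_therms_to_mmthm <- function(bn)    bn * 1e9 / therms_per_mmthm

billion_therms_to_mmthm(5)      # PG&E planning area: about 5,000 MMthm
billion_therms_to_mmthm(3)      # SFBA counties:      about 3,000 MMthm
mmthm_to_billion_therms(3000)   # back to 3 billion therms
```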

PG&E: Total Consumption

PGE_gas_plan_data %>%
  chart_annual_throughputs(
    flag_years = CY(1990, 2011))

PG&E: Total Growth

PGE_gas_plan_data %>%
  chart_annual_growth(
    base_year = CY(2011),
    flag_years = CY(1990))

PG&E: Consumption by Sector

PGE_gas_plan_data %>%
  chart_annual_throughputs_by(
    fill = sector)

PG&E: Growth by Sector

PGE_gas_plan_data %>%
  chart_annual_growth_by(
    color = sector,
    base_year = CY(2011))

If we like, we can collapse these sectors into "Residential", "Commercial/Industrial", and "Other".

PGE_gas_plan_data %>%
  mutate(
    sector = fct_collapse(
      sector,
      "Residential" = "Residential",
      "Commercial/Industrial" = c(
        "Industry",
        "Commercial Building",
        "Commercial Other"),
      other_level = "Other")) %>%
  chart_annual_growth_by(
    color = sector,
    base_year = CY(2011),
    flag_years = CY(1990))
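fct_collapse() maps several old levels onto one new level. Its other_level argument lumps every level you didn't name into a single catch-all.

Here is a base-R sketch of the same mapping, applied to a hypothetical vector of sector labels. "Residential", "Industry", "Commercial Building", and "Commercial Other" come from the code above. "Agriculture" is a made-up extra level for illustration. The sketch reproduces the grouping but not necessarily forcats' level ordering.

```r
# Hypothetical sector labels; "Agriculture" stands in for any level
# not named in the mapping.
sector <- factor(c("Residential", "Industry", "Commercial Building",
                   "Commercial Other", "Agriculture"))

# Named groups, as in the fct_collapse() call above.
groups <- list(
  "Residential"           = "Residential",
  "Commercial/Industrial" = c("Industry", "Commercial Building",
                              "Commercial Other")
)

# Build an old-label -> new-label lookup, then map each value;
# anything not found in the lookup becomes "Other".
lookup    <- setNames(rep(names(groups), lengths(groups)), unlist(groups))
collapsed <- unname(lookup[as.character(sector)])
collapsed[is.na(collapsed)] <- "Other"
collapsed <- factor(collapsed, levels = c(names(groups), "Other"))
collapsed
```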

This tells a different story from the county-level analysis.

  • Residential consumption (MMthm). In PGE_gas_plan_data, we see about 2 billion therms/yr; in SFBA_gas_county_data, we saw only about 1 billion therms/yr. Either something is wrong, or these are estimates for different populations, and hence not perfectly comparable.

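  • Residential growth. In PGE_gas_plan_data, we can see a strong long-term decline in Residential consumption. CY1990 was 65% higher than CY2011, which works out to roughly -2.4% per year compounded over those 21 years (or about 3 points of the CY2011 level per year, averaged linearly). Either way, that is roughly an order of magnitude larger than the -0.3%/yr we saw in SFBA_gas_county_data.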

  • Commercial/Industrial consumption (MMthm). The amount of natural gas consumed by the Commercial and Industrial sectors, as reported in PGE_gas_plan_data, is about twice the amount labeled "Non-Residential" in SFBA_gas_county_data (roughly 3 billion vs. 1.5 billion therms, respectively).

  • Commercial/Industrial growth. Although the two datasets don't cover the same non-residential population, the long-term growth in the Commercial and Industrial sectors here looks similarly flat to the "Non-Residential" trend in SFBA_gas_county_data: a fraction of a percent per year.
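How a "per year" figure for the Residential decline comes out depends on how the 65% gap between CY1990 and CY2011 is annualized. This arithmetic sketch compares a compound rate with a simple linear average. It uses only the 65% figure and the rough -0.3%/yr SFBA figure quoted above.

```r
# CY1990 consumption was about 65% higher than CY2011 (ratio 1.65).
ratio_1990_to_2011 <- 1.65
n_years <- 2011 - 1990   # 21

# Compound annual rate r such that (1 + r)^21 = 1 / 1.65.
compound_rate <- (1 / ratio_1990_to_2011)^(1 / n_years) - 1
compound_rate            # about -0.024, i.e. -2.4%/yr

# Simple linear average, in points of the CY2011 level per year.
linear_rate <- -(ratio_1990_to_2011 - 1) / n_years
linear_rate              # about -0.031

# Compare with the roughly -0.3%/yr seen in SFBA_gas_county_data.
compound_rate / -0.003   # roughly 8 times larger
```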

Recall that PGE_gas_plan_data totals almost 5 billion therms, whereas SFBA_gas_county_data totals only about 3 billion. So these differences might be reconcilable: there could be large populations of consumers covered by one dataset but not the other.

This section was intended to give you a sense of what you can do with third-party activity data. You can do very similar things, of course, with third-party emission data. The key is to have the expected variables in your dataset (tput_qty and tput_unit, and/or ems_qty and ems_unit). You can also fall back to the more generic chart_annual_quantities() and/or chart_annual_growth(), so long as your dataset has at least one variable ending in _qty.
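Before handing third-party data to these charting functions, it can help to check which of those variables are present. The helper `check_qty_vars()` below is hypothetical and is not part of inventory. It encodes only the rule of thumb stated above: tput_qty and tput_unit, ems_qty and ems_unit, or at least one column ending in _qty. The column name `foo_qty` is also made up.

```r
# Hypothetical helper: report which of the variable conventions
# described above a data frame satisfies.
check_qty_vars <- function(df) {
  vars <- names(df)
  list(
    throughput = all(c("tput_qty", "tput_unit") %in% vars),
    emissions  = all(c("ems_qty", "ems_unit") %in% vars),
    any_qty    = any(grepl("_qty$", vars))
  )
}

# A data frame shaped like ECDMS_gas_county_data.
gas_like <- data.frame(year = "CY1990", county = "Alameda",
                       tput_qty = 229.905609, tput_unit = "MMthm")
str(check_qty_vars(gas_like))

# A data frame with only a generic *_qty column.
generic <- data.frame(year = "CY1990", foo_qty = 1)
str(check_qty_vars(generic))
```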