10 Third-Party Data

You can use the chart_*_by() family of functions with third-party data too.

This set of examples pulls from an R library, ECDMS, that contains tidied data harvested from the California Energy Consumption Database (ECDMS). These data, published by the California Energy Commission (CEC), provide estimates of annual natural gas and electricity consumption in California.

To gain access, we just type library(ECDMS).

library(inventory)
library(ECDMS)

10.1 Natural Gas Consumption

In the ECDMS_gas_county_data dataset, natural gas consumption data are split by county, sector, and year. This dataset covers all of California, from CY1990 through CY2018. Here’s a preview:

head(ECDMS_gas_county_data)

year	sector	county	tput_qty	tput_unit
CY1990	Non-Residential	Alameda	229.905609	MMthm
CY1990	Non-Residential	Amador	11.658194	MMthm
CY1990	Non-Residential	Butte	17.028776	MMthm
CY1990	Non-Residential	Calaveras	0.331382	MMthm
CY1990	Non-Residential	Colusa	11.785811	MMthm
CY1990	Non-Residential	Contra Costa	701.736398	MMthm

What we’re interested in is tput_qty — so chart_annual_throughputs_by() will “just work”.

10.1.1 San Francisco Bay Area (9 Counties)

Let’s take SFBA_gas_county_data to be the subset belonging to Alameda, Contra Costa, Marin, Napa, San Francisco, San Mateo, Santa Clara, Solano, and Sonoma counties.

#
# Note: `DST_COUNTY_NAMES` is supplied by the `inventory` package.
#
SFBA_gas_county_data <-
  ECDMS_gas_county_data %>%
  filter(
    county %in% names(DST_COUNTY_NAMES))

Now let’s create a set of plots:

Throughput, total
Relative growth, total
Stacked throughputs, by sector
Relative growth, by sector
Stacked throughputs, by county
Relative growth, by county

For the relative-growth charts, we’ll specify base_year = CY(2011).

SFBA: Total Consumption

SFBA_gas_county_data %>%
  chart_annual_throughputs(
    flag_years = CY(1990, 2011))

SFBA: Total Growth

SFBA_gas_county_data %>%
  chart_annual_growth(
    base_year = CY(2011),
    flag_years = CY(1990))

SFBA: Consumption by Sector

SFBA_gas_county_data %>%
  chart_annual_throughputs_by(
    fill = sector)

SFBA: Growth by Sector

SFBA_gas_county_data %>%
  chart_annual_growth_by(
    color = sector,
    base_year = CY(2011),
    flag_years = CY(1990))

Here we see that, in the nine-county Bay Area, Non-Residential gas consumption has been trending slightly upward over time, at about 0.3% growth per year. Residential consumption, on the other hand, has been trending slightly downward, at about -0.3% per year.

SFBA: Consumption by County

SFBA_gas_county_data %>%
  chart_annual_throughputs_by(
    fill = county)

SFBA: Growth by County

SFBA_gas_county_data %>%
  chart_annual_growth_by(
    color = county,
    base_year = CY(2011))

10.1.2 PG&E Planning Area

Above, we saw that ECDMS_gas_county_data is split by county. There is a different tabular dataset, ECDMS_gas_plan_data, that is split instead by plan (“planning area”).

Let’s take PGE_gas_plan_data to be the subset where plan is “Pacific Gas and Electric”.

PGE_gas_plan_data <-
  ECDMS_gas_plan_data %>%
  filter(
    plan == "Pacific Gas and Electric")

Now let’s recreate a similar set of plots:

Throughput, total
Relative growth, total
Stacked throughputs, by sector
Relative growth, by sector

The total annual consumption is higher in PGE_gas_plan_data — more like 5 billion therms, versus the 3 billion therms we saw in SFBA_gas_county_data.

PG&E: Total Consumption

PGE_gas_plan_data %>%
  chart_annual_throughputs(
    flag_years = CY(1990, 2011))

PG&E: Total Growth

PGE_gas_plan_data %>%
  chart_annual_growth(
    base_year = CY(2011),
    flag_years = CY(1990))

PG&E: Consumption by Sector

PGE_gas_plan_data %>%
  chart_annual_throughputs_by(
    fill = sector)

PG&E: Growth by Sector

PGE_gas_plan_data %>%
  chart_annual_growth_by(
    color = sector,
    base_year = CY(2011))

If we like, we can collapse these sectors into “Residential”, “Industrial/Commercial”, and “Other”.

PGE_gas_plan_data %>%
  mutate_at(
    vars(sector),
    ~ fct_collapse(
      .,
      "Residential" = "Residential",
      "Commercial/Industrial" = c(
        "Industry", 
        "Commercial Building",
        "Commercial Other"),
      other_level = "Other")) %>%
  chart_annual_growth_by(
    color = sector,
    base_year = CY(2011),
    flag_years = CY(1990))

This shows something different than the analysis by county.

Residential consumption (MMthm). In this PGE_gas_plan_data, we see about 2 billion thm/yr. In the SFBA_gas_county_data, we only saw about 1 billion thm/yr. Either something is wrong, or these are estimates for different populations, and hence not perfectly comparable.
Residential growth. In this PGE_gas_plan_data, we can see a strong long-term decline in Residential consumption — CY1990 was 65% higher than CY2011, so about -3% per year. That’s a full order of magnitude larger than the 0.3%/yr we saw in SFBA_gas_county_data.
Commercial/Industrial consumption (MMthm). The amount of natural gas consumed by the Commercial and Industrial sectors, as reported in PGE_gas_plan_data, is almost twice as high as that labeled “Non-Residential” in SFBA_gas_county_data (3 billion vs 1.5 billion therms, respectively).
Commercial/Industrial growth. Even though the two datasets don’t cover the same non-residential population, compared to the “Non-Residential” consumption in SFBA_gas_county_data, the long-term growth in the Commercial and Industrial sectors here looks similarly flat — a fraction of a percent per year.

Recall that PGE_gas_plan_data comprises almost 5 billion therms, whereas SFBA_gas_county_data comprised only 3 billion. So, these differences might be reconcilable. There could be large populations of consumers covered by one that aren’t covered by the other.

This section was intended to give you a sense of what you can do with third-party activity data. You can do very similar things, of course, with third-party emission data. The key is to have the expected variables in your dataset (tput_qty and tput_unit, and/or ems_qty and ems_unit). And, you can always fall back to the more generic chart_annual_quantities() and/or chart_annual_growth(), so long as your dataset has at least one variable ending in _qty.