Carlsbad Aquafarm SeapHOx

Carlsbad Aquafarm SeapHOx

15-Jan-2018    

I am posting this not as an example of active research, but instead as an example of an open Jupyter notebook which I used to conduct a preliminary analysis. I’m no longer actively maintaining this analysis (though the sensor is still deployed as part of a SCCOOS grant), but leaving this page up to show how we conduct some of our real-time data communications and analysis.


The following is a Jupyter notebook with a Python script that I’ve been using occasionally to scrape a Google Sheet with near-real-time data. The Google Sheet updates ~ 2x/hr but this website is static so it updates only when I manually do so. See my ThingSpeak channel for the near-real-time data output.

PJB-20180716_ScrapeGoogleSheet-datasheets

Google Sheets Scraper

Goal: scrape the Google Sheet with autofilling data from the Particle Electron at Carlsbad Aquafarm Google Sheet named "SeapHOx_OuterLagoon" is here

In [1]:
import numpy as np
import pandas as pd
# import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import datasheets
import os

%matplotlib inline

Scrape Google Sheets

Use datasheets library, following directions from https://datasheets.readthedocs.io/en/latest/index.html

In [3]:
client = datasheets.Client()
workbook = client.fetch_workbook('SeapHOx_OuterLagoon')
tab = workbook.fetch_tab('Sheet1')
electron_array = tab.fetch_data()
electron_array.head()
Out[3]:
Timestamp name email phone message
0 2018-04-17 2018/04/17,18:30:25,18.860,0.07985,-0.83308,5.... undefined undefined undefined
1 2018-04-17 2018/04/17,19:00:25,18.870,0.08055,-0.83331,5.... undefined undefined undefined
2 2018-04-17 2018/04/17,19:30:25,18.860,0.08002,-0.83326,5.... undefined undefined undefined
3 2018-04-17 2018/04/17,20:02:20,18.860,0.07946,-0.83334,5.... undefined undefined undefined
4 2018-04-17 2018/04/17,20:32:51,18.850,0.07959,-0.83293,5.... undefined undefined undefined

Get SeapHOx String, Parse, Remove Bad Transmissions

In [4]:
data_col = electron_array.iloc[:, 1]
data_array = pd.DataFrame(data_col.str.split(',', expand = True))

# Get rid of rows that don't start with "20" as in "2018"; this is foolproof until 2100 or 20 shows up elsewhere in a bad row, I guess
good_rows = (data_array.iloc[:, 0]).str.contains('20')
good_rows.fillna(False, inplace = True)
data_array = data_array[good_rows]

Rename and Reindex

In [5]:
data_array.columns = ['Date', 'Time', 'V_batt', 'V_int', 'V_ext', 'P_dbar', 'pH_int', 'O2_uM', 'temp_SBE', 'sal_SBE', 'V_batt_elec', 'charge_status']
data_array.set_index(pd.to_datetime(data_array['Date'] + ' ' + data_array['Time']), inplace = True)
data_array.drop(['Date', 'Time'], axis = 1, inplace = True)
data_array.head()
Out[5]:
V_batt V_int V_ext P_dbar pH_int O2_uM temp_SBE sal_SBE V_batt_elec charge_status
2018-04-17 18:30:25 18.860 0.07985 -0.83308 5.33600 8.05349 328.51599 13.43740 33.55860 4.05 84.77
2018-04-17 19:00:25 18.870 0.08055 -0.83331 5.41600 8.08004 328.32300 13.48790 33.56140 4.05 84.62
2018-04-17 19:30:25 18.860 0.08002 -0.83326 5.38800 8.07537 325.82300 13.53800 33.55890 4.05 84.62
2018-04-17 20:02:20 18.860 0.07946 -0.83334 5.35400 8.06744 324.28900 13.54920 33.56210 4.05 84.48
2018-04-17 20:32:51 18.850 0.07959 -0.83293 5.21900 8.06995 326.27499 13.65300 33.56350 4.05 84.48

Filter

  • Filter based on date
  • Cast to type float (for some reason the str.split leaves it as arbitrary object)
  • This was necessary in early notebook as the input data wasn't filtered at all but the Google Sheet should be cleaner to begin with (i.e., no land data)
  • Filtration may come in handy later so keep this here for now
In [26]:
date_filt = data_array.index > '2017-09-07 18:30:00'
data_filt = data_array[date_filt]

import pytz
pacific = pytz.timezone('US/Pacific')
data_filt.index = data_filt.index.tz_localize(pytz.utc).tz_convert(pacific)

data_filt = data_filt.astype('float')

data_filt.tail()
# data_filt.V_press
Out[26]:
V_batt V_int V_ext P_dbar pH_int O2_uM temp_SBE sal_SBE V_batt_elec charge_status
2019-05-08 12:30:25-07:00 15.65 0.04701 -1.08303 13.053 8.36403 287.38000 18.7914 33.6151 3.56 1.93
2019-05-08 14:30:25-07:00 15.65 0.04707 -1.08312 13.242 8.36484 292.42001 18.7653 33.5865 3.56 1.91
2019-05-08 19:30:25-07:00 15.64 0.04723 -1.08283 13.068 8.36666 298.20999 18.9222 33.5844 3.56 1.91
2019-05-08 20:30:25-07:00 15.64 0.04755 -1.08244 13.518 8.37370 292.76001 18.6287 33.6312 3.56 1.93
2019-05-09 10:30:25-07:00 15.64 0.04374 -1.08600 11.748 8.30744 251.28999 18.7931 33.5566 3.56 1.90

Set manual limits

  • manually set reasonable limits
  • todo: implement other QARTOD standards for range tests, spike tests, noise tests, rate of change tests, etc.
In [27]:
pH_int_min = 7.5
O2_uM_max = 400

data_filt.O2_uM[data_filt.O2_uM > O2_uM_max] = np.nan
data_filt.pH_int[data_filt.pH_int < pH_int_min] = np.nan

Plot

In [28]:
fig, axs = plt.subplots(6, 1, figsize = (10, 10), sharex = True)
axs[0].plot(data_filt.index, data_filt.V_batt)
axs[0].set_ylabel('V_batt')
ax2 = axs[0].twinx()
ax2.plot(data_filt.index, data_filt.V_batt_elec, 'r')
ax2.set_ylabel('V_batt_elec', color='r')
ax2.tick_params('y', colors='r')

axs[1].plot(data_filt.index, data_filt.P_dbar)
axs[1].set_ylabel('P (dbar)')

axs[2].plot(data_filt.index, data_filt.sal_SBE)
axs[2].set_ylabel('Salinity')

axs[3].plot(data_filt.index, data_filt.temp_SBE)
axs[3].set_ylabel('Temp (C)')

axs[4].plot(data_filt.index, data_filt.V_int)
axs[4].set_ylabel('V_int')
ax2 = axs[4].twinx()
ax2.plot(data_filt.index, data_filt.V_ext, 'r')
ax2.set_ylabel('V_ext', color='r')
ax2.tick_params('y', colors='r')

axs[5].plot(data_filt.index, data_filt.pH_int)
axs[5].set_ylabel('pH')
ax2 = axs[5].twinx()
ax2.plot(data_filt.index, data_filt.O2_uM, 'r')
ax2.set_ylabel('O2 (uM)', color='r')
# ax2.set_ylim([0, 400])
ax2.tick_params('y', colors='r')

axs[0].xaxis_date() # make sure it knows that x is a date/time

for axi in axs.flat:
#     axi.xaxis.set_major_locator(plt.MaxNLocator(3))
#     print(axi)
    axi.yaxis.set_major_locator(plt.MaxNLocator(3))
#     axi.yaxis.set_major_formatter(ticker.FormatStrFormatter("%.02f"))

fig.autofmt_xdate() # makes the date labels easier to read.
plt.tight_layout()
plt.savefig('test_dep_01.png')
In [30]:
fig, axs = plt.subplots(1, 1, figsize = (10, 10), sharex = True)
pHOx = axs.scatter(x = data_filt.pH_int,
                   y = data_filt.O2_uM,
                   c = data_filt.P_dbar,
                   s = 100)
axs.set_xlabel('pH (int)')
axs.set_ylabel('O2 (uM)')
plt.colorbar(pHOx, label = 'P (dbar)');
In [31]:
cmap = plt.get_cmap('coolwarm')
corr = data_filt.corr()
corr.style.background_gradient(cmap, axis=1)\
    .set_properties(**{'max-width': '80px', 'font-size': '10pt'})\
    .set_caption("SeapHOx Correlations")\
    .set_precision(2)
Out[31]:
SeapHOx Correlations
V_batt V_int V_ext P_dbar pH_int O2_uM temp_SBE sal_SBE V_batt_elec charge_status
V_batt 1 0.18 0.33 -0.15 0.049 0.24 -0.23 -0.0075 0.53 0.59
V_int 0.18 1 0.86 0.14 -0.59 0.22 -0.31 -0.011 0.15 0.19
V_ext 0.33 0.86 1 -0.3 -0.62 0.29 -0.11 0.01 0.07 0.13
P_dbar -0.15 0.14 -0.3 1 0.85 0.014 -0.54 0.22 0.011 -0.012
pH_int 0.049 -0.59 -0.62 0.85 1 0.21 -0.66 0.11 0.042 0.049
O2_uM 0.24 0.22 0.29 0.014 0.21 1 -0.38 0.42 0.16 0.17
temp_SBE -0.23 -0.31 -0.11 -0.54 -0.66 -0.38 1 -0.079 -0.022 -0.042
sal_SBE -0.0075 -0.011 0.01 0.22 0.11 0.42 -0.079 1 -0.069 -0.055
V_batt_elec 0.53 0.15 0.07 0.011 0.042 0.16 -0.022 -0.069 1 0.97
charge_status 0.59 0.19 0.13 -0.012 0.049 0.17 -0.042 -0.055 0.97 1
-->