API

Tools

Data tools - Pandas Wrappers

class aerosense_tools.preprocess.RawSignal(dataframe, sensor_type)[source]

A class representing raw data received from the data gateway.

extract_measurement_sessions(threshold=datetime.timedelta(seconds=60))[source]

Extract sessions (continuous measurement periods) from raw data.

Parameters:

threshold (datetime.timedelta) – Maximum gap between two consecutive measurement samples

Return (list, pandas.DataFrame):

A list of SensorMeasurementSession objects and a dataframe of the sessions’ start and end times
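The gap-based splitting described above can be sketched with plain pandas. This is only an illustration and not the library's implementation: the helper name `split_into_sessions`, the `value` column, and the assumption that the dataframe is indexed by timestamp are all hypothetical.

```python
import datetime

import pandas as pd


def split_into_sessions(dataframe, threshold=datetime.timedelta(seconds=60)):
    """Split a time-indexed dataframe wherever consecutive samples are more than `threshold` apart."""
    # Time difference between each sample and the previous one (NaT for the first sample).
    gaps = dataframe.index.to_series().diff()
    # Every sample that follows a gap larger than the threshold starts a new session.
    session_ids = (gaps > threshold).cumsum()
    sessions = [session for _, session in dataframe.groupby(session_ids)]
    # Summarise the start and end time of each session.
    session_times = pd.DataFrame(
        {
            "start": [session.index[0] for session in sessions],
            "end": [session.index[-1] for session in sessions],
        }
    )
    return sessions, session_times


# Hypothetical raw data: two bursts of samples separated by a gap of about five minutes.
index = pd.to_datetime(
    ["2023-01-01 00:00:00", "2023-01-01 00:00:01", "2023-01-01 00:05:00", "2023-01-01 00:05:01"]
)
raw = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)
sessions, session_times = split_into_sessions(raw)
```

With the default 60 second threshold, the example data splits into two sessions of two samples each.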

filter_outliers(window, standard_deviation_multiplier)[source]

A very primitive filter. Removes data points that lie outside a confidence interval built from the rolling median and the rolling standard deviation scaled by standard_deviation_multiplier.

Parameters:
  • window (int) – window (number of samples) for rolling median and standard deviation

  • standard_deviation_multiplier (float) – multiplier to the rolling standard deviation

Return pandas.DataFrame:

the dataframe, filtered in place
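A rolling median and standard deviation filter of this kind might look like the sketch below. It is not the library's code: the helper name, the centred window, and the choice to mask rejected points with NaN rather than drop their rows are assumptions made for illustration.

```python
import pandas as pd


def rolling_outlier_filter(dataframe, window, standard_deviation_multiplier):
    """Mask values further than `multiplier` rolling standard deviations from the rolling median."""
    # Assumption: a centred window; min_periods=1 lets shorter windows be used at the edges.
    rolling = dataframe.rolling(window, center=True, min_periods=1)
    median = rolling.median()
    standard_deviation = rolling.std()
    # Keep a point only if it lies inside the band around the rolling median.
    keep = (dataframe - median).abs() <= standard_deviation_multiplier * standard_deviation
    # Rejected points become NaN; the input dataframe itself is left untouched.
    return dataframe.where(keep)


# Hypothetical signal with a single spike at position 4.
signal = pd.DataFrame({"value": [1.0, 1.1, 0.9, 1.0, 50.0, 1.0, 1.1, 0.9, 1.0]})
filtered = rolling_outlier_filter(signal, window=5, standard_deviation_multiplier=2.0)
```

A large spike also inflates the rolling standard deviation of its own window, so a generous multiplier can let the spike through. That is one reason to treat a filter like this as primitive.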

measurement_to_variable(sensor_conversion_constants=None)[source]

Transform fixed point values to a physical variable.

Parameters:

sensor_conversion_constants (dict) – dictionary containing calibrated conversion constants

Return pandas.DataFrame:

the dataframe, transformed in place, with raw values converted to variable values

pad_gaps(threshold=datetime.timedelta(seconds=60))[source]

Checks for missing data. If the gap between two consecutive samples exceeds the given threshold, the last sample before the gap is replaced with NaN, so that no interpolation is performed across the non-sampling time window.

Parameters:

threshold (datetime.timedelta) – maximum gap between two samples as a timedelta type

Return pandas.DataFrame:

the dataframe, modified in place, with the end sample of each session replaced with NaN
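The padding rule can be illustrated with a short pandas sketch. The helper name and the `value` column are hypothetical, and unlike the method documented here the sketch returns a copy instead of modifying the dataframe in place.

```python
import datetime

import numpy as np
import pandas as pd


def pad_gaps_with_nan(dataframe, threshold=datetime.timedelta(seconds=60)):
    """Replace the last sample before each gap longer than `threshold` with NaN."""
    # Time from each sample to the next one (NaT for the final sample).
    time_to_next = dataframe.index.to_series().diff().shift(-1)
    padded = dataframe.copy()
    # Rows followed by a long gap are blanked so interpolation cannot bridge the gap.
    padded.loc[time_to_next > threshold, :] = np.nan
    return padded


# Hypothetical data with a gap of about five minutes after the second sample.
index = pd.to_datetime(
    ["2023-01-01 00:00:00", "2023-01-01 00:00:01", "2023-01-01 00:05:00", "2023-01-01 00:05:01"]
)
raw = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)
padded = pad_gaps_with_nan(raw)
```

Only the second sample precedes a gap longer than the threshold, so it is the only one replaced.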

class aerosense_tools.preprocess.SensorMeasurementSession(dataframe, sensor_type)[source]

A class representing a continuous measurement series for a particular sensor. The class wraps some frequently used pandas.DataFrame operations as well as plotly figure setup.

merge_with_and_interpolate(*secondary_sessions)[source]

Merge the current session’s sensor measurements with measurements from other sensors (secondary sessions). The values from the secondary sessions are interpolated onto the current session’s time vector.

Return pandas.DataFrame:

Merged dataframe
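Interpolating one sensor's values onto another sensor's time vector can be sketched as follows. This is a minimal illustration under assumed names (`interpolate_onto`, `pressure`, `temperature`) and an assumed time-weighted linear interpolation. The library's actual behaviour may differ.

```python
import pandas as pd


def interpolate_onto(primary, secondary):
    """Interpolate `secondary` onto the time index of `primary` and join the columns."""
    # Work on the union of both time vectors so secondary values can be interpolated in time.
    combined_index = primary.index.union(secondary.index)
    interpolated = secondary.reindex(combined_index).interpolate(method="time")
    # Keep only the primary timestamps and attach the interpolated secondary columns.
    return primary.join(interpolated.reindex(primary.index))


# Hypothetical sessions sampled at different instants.
primary = pd.DataFrame(
    {"pressure": [10.0, 20.0]},
    index=pd.to_datetime(["2023-01-01 00:00:01", "2023-01-01 00:00:03"]),
)
secondary = pd.DataFrame(
    {"temperature": [0.0, 2.0, 4.0]},
    index=pd.to_datetime(["2023-01-01 00:00:00", "2023-01-01 00:00:02", "2023-01-01 00:00:04"]),
)
merged = interpolate_onto(primary, secondary)
```

The merged dataframe keeps the primary session's two timestamps and gains a `temperature` column with values interpolated at those instants.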

plot(sensor_types_metadata, sensor_names=None, plot_start_offset=datetime.timedelta(0), plot_max_time=None)[source]

Plots the session dataframe with plotly.

Parameters:
  • sensor_types_metadata – Metadata about the sensor type, used for figure layout

  • sensor_names – Specific sensors to plot

  • plot_start_offset – start data plot after some time

  • plot_max_time – limit to the time plotted

Return plotly.graph_objs.Figure:

a line graph of the sensor data against time

to_constant_timestep(time_step, timeseries_start=None)[source]

Resample dataframe to the given time step. Linearly interpolates between samples.

Parameters:
  • time_step (datetime.timedelta) – timestep as datetime timedelta type

  • timeseries_start (datetime.datetime) – start constant step time series at specified time

Return SensorMeasurementSession:

sensor session with resampled and interpolated data

to_new_time_vector(new_time_vector)[source]

Interpolate the original dataframe onto a new time index.

Parameters:

new_time_vector (pandas.DatetimeIndex) – the new time index

Return SensorMeasurementSession:

a new SensorMeasurementSession object with the interpolated dataframe

trim_session(trim_from_start=datetime.timedelta(0), trim_from_end=datetime.timedelta(0))[source]

Delete the measurements that fall within the given amounts of time at the start and end of the session.

Parameters:
  • trim_from_start (datetime.timedelta) – Amount of time to trim from the start of the session

  • trim_from_end (datetime.timedelta) – Amount of time to trim from the end of the session

Return SensorMeasurementSession:

sensor session with the trimmed dataframe
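Trimming by time offsets can be sketched like this. The helper name is hypothetical, and keeping samples that fall exactly on the trim boundaries is an assumption. The library may treat the boundaries differently.

```python
import datetime

import pandas as pd


def trim(dataframe, trim_from_start=datetime.timedelta(0), trim_from_end=datetime.timedelta(0)):
    """Drop samples closer than the given offsets to the start and end of the data."""
    first_kept = dataframe.index[0] + trim_from_start
    last_kept = dataframe.index[-1] - trim_from_end
    # Assumption: samples exactly on a boundary are kept.
    return dataframe[(dataframe.index >= first_kept) & (dataframe.index <= last_kept)]


# Hypothetical ten-second session sampled once per second.
session_data = pd.DataFrame(
    {"value": list(range(10))},
    index=pd.date_range("2023-01-01 00:00:00", periods=10, freq=datetime.timedelta(seconds=1)),
)
trimmed = trim(
    session_data,
    trim_from_start=datetime.timedelta(seconds=2),
    trim_from_end=datetime.timedelta(seconds=3),
)
```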

BigQuery Wrappers

class aerosense_tools.queries.BigQuery(project_name='aerosense-twined')[source]

A collection of queries for working with the Aerosense BigQuery dataset.

Parameters:

project_name (str) – the name of the Google Cloud project the BigQuery dataset belongs to

Return None:

add_sensor_coordinates(coordinates)[source]

Add the given sensor coordinates to the sensor coordinates table.

Parameters:

coordinates (dict) – the sensor coordinates

Return None:

download_microphone_data_at_datetime(installation_reference, node_id, datetime, tolerance=1)[source]

Download the microphone datafile for the given node of the given installation at the given datetime (within the given tolerance). If more than one datetime is found within the tolerance, the datafile with the earliest timestamp is downloaded.

Parameters:
  • installation_reference (str) – the reference of the installation to get microphone data from

  • node_id (str) – the node on the installation to get microphone data from

  • datetime (datetime.datetime) – the datetime to get the data for

  • tolerance (float) – the tolerance on the given datetime in seconds

Return str:

the local path the microphone datafile was downloaded to

extract_and_add_new_measurement_sessions(sensors=None)[source]

Extract new measurement sessions from the database for the given sensors and add them to the sessions table. If no sensors are given, sessions for the following sensors are searched for:
  • connection_statistics

  • magnetometer

  • connection

  • barometer

  • barometer_thermometer

  • accelerometer

  • gyroscope

  • battery_info

  • differential_barometer

Parameters:

sensors (list(str)|None) – the sensors to search for new measurement sessions for

Return None:

get_aggregated_connection_statistics(installation_reference, node_id, start=None, finish=None)[source]

Get minute-wise aggregated connection statistics over the given time period. The time period defaults to the last day.

Parameters:
  • installation_reference (str) – the reference of the installation to get sensor data from

  • node_id (str) – the node on the installation to get sensor data from

  • start (datetime.datetime|None) – defaults to 1 day before the given finish

  • finish (datetime.datetime|None) – defaults to the current datetime

Return pandas.DataFrame:

the aggregated connection statistics
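The default time period described in the parameters above can be expressed in a few lines. This is a sketch of the documented defaults only. The helper name is hypothetical, and using a naive local `datetime.datetime.now()` for the current datetime is an assumption, since the timezone handling is not documented here.

```python
import datetime


def resolve_time_period(start=None, finish=None):
    """Apply the documented defaults: finish is now, start is one day before finish."""
    # Assumption: "current datetime" is taken as a naive local timestamp.
    finish = finish if finish is not None else datetime.datetime.now()
    start = start if start is not None else finish - datetime.timedelta(days=1)
    return start, finish


# With only a finish given, the period covers the preceding day.
example_start, example_finish = resolve_time_period(finish=datetime.datetime(2023, 6, 2, 12, 0, 0))
```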

get_installations()[source]

Get the available installations.

Return list(dict):

the available installations

get_measurement_sessions(installation_reference, node_id, sensor_type_reference, start=None, finish=None)[source]

Get the measurement sessions that exist for the given sensor type, node, and installation between the given start and finish datetimes.

Parameters:
  • installation_reference (str) – the reference of the installation to get measurement sessions for

  • node_id (str) – the ID of the node to get measurement sessions for

  • sensor_type_reference (str) – the type of sensor to get measurement sessions for

  • start (datetime.datetime|None) – the time after which the sessions start

  • finish (datetime.datetime|None) – the time before which the sessions end

Return pandas.DataFrame:

the measurement sessions

get_microphone_metadata(installation_reference, node_id, start=None, finish=None)[source]

Get metadata for microphone data for the given node of the given installation over the given time period. The time period defaults to the last day.

Parameters:
  • installation_reference (str) – the reference of the installation to get microphone metadata from

  • node_id (str) – the node on the installation to get microphone metadata from

  • start (datetime.datetime|None) – the start of the time period; defaults to 1 day before the given finish

  • finish (datetime.datetime|None) – the end of the time period; defaults to the current datetime

Return pandas.DataFrame:

the microphone metadata

get_nodes(installation_reference)[source]

Get the IDs of the nodes installed on the given installation.

Parameters:

installation_reference (str) – the reference of the installation to get the node IDs of

Return list(str):

the node IDs for the installation

get_sensor_coordinates(reference=None)[source]

Get the sensor coordinates with the given reference from the sensor coordinates table if they exist. If no reference is given, get all sensor coordinates.

Parameters:

reference (str|None) – the reference of the coordinates to get

Return dict|None:

the sensor coordinates if they exist

get_sensor_data(installation_reference, node_id, sensor_type_reference, start=None, finish=None, row_limit=10000)[source]

Get sensor data for the given sensor type on the given node of the given installation over the given time period. The time period defaults to the last day.

Parameters:
  • installation_reference (str) – the reference of the installation to get sensor data from

  • node_id (str) – the node on the installation to get sensor data from

  • sensor_type_reference (str) – the type of sensor from which to get the data

  • start (datetime.datetime|None) – defaults to 1 day before the given finish

  • finish (datetime.datetime|None) – defaults to the current datetime

  • row_limit (int|None) – if set to None, no row limit is applied; if set to an integer, the row limit is set to this; defaults to 10000

Return (pandas.DataFrame, bool):

the sensor data and whether the data has been limited by a row limit

get_sensor_data_at_datetime(installation_reference, node_id, sensor_type_reference, datetime, tolerance=1)[source]

Get sensor data for the given sensor type on the given node of the given installation at the given datetime. The first datetime found within ±0.5 * tolerance of the given datetime is used.

Parameters:
  • installation_reference (str) – the reference of the installation to get sensor data from

  • node_id (str) – the node on the installation to get sensor data from

  • sensor_type_reference (str) – the type of sensor from which to get the data

  • datetime (datetime.datetime|None) – the datetime to get the data at

  • tolerance (float) – the tolerance on the given datetime in seconds

Return pandas.DataFrame:

the sensor data at the given datetime
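The tolerance window can be illustrated in plain Python. The helper below is hypothetical and operates on an in-memory list of timestamps rather than the BigQuery dataset. Reading "first" as the earliest timestamp inside the window is an assumption.

```python
import datetime


def first_within_tolerance(timestamps, target, tolerance=1):
    """Return the earliest timestamp within ±0.5 * tolerance seconds of `target`, or None."""
    half_window = datetime.timedelta(seconds=0.5 * tolerance)
    # Sort so that "first" means the earliest timestamp inside the window.
    candidates = [
        timestamp
        for timestamp in sorted(timestamps)
        if target - half_window <= timestamp <= target + half_window
    ]
    return candidates[0] if candidates else None


# Hypothetical sample times around a target of 12:00:00 with the default 1 second tolerance.
target = datetime.datetime(2023, 1, 1, 12, 0, 0)
available = [
    datetime.datetime(2023, 1, 1, 11, 59, 59, 200000),  # 0.8 s before the target: outside
    datetime.datetime(2023, 1, 1, 11, 59, 59, 700000),  # 0.3 s before the target: inside
    datetime.datetime(2023, 1, 1, 12, 0, 0, 300000),  # 0.3 s after the target: inside
]
match = first_within_tolerance(available, target)
```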

get_sensor_types()[source]

Get the available sensor types and their metadata.

Return list(dict):

the available sensor types and their metadata

query(query_string)[source]

Query the dataset with an arbitrary query.

Parameters:

query_string (str) – the query to use

Return pandas.DataFrame:

the results of the query

update_sensor_coordinates(reference, kind, geometry)[source]

Update the given sensor coordinates in the sensor coordinates table.

Parameters:
  • reference (str) – the reference of the coordinates to update

  • kind (str) – the kind of the new coordinates

  • geometry (dict) – the new coordinates

Return None: