Working With NWB Files

[1]:

from pathlib import Path
import json

import numpy as np
import pandas as pd

from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache
from allensdk.brain_observatory.ecephys.ecephys_session import EcephysSession

from allensdk.brain_observatory.ecephys import nwb  # for compat
import pynwb
from pynwb import NWBHDF5IO

import nems
from nems0.recording import Recording
from nems0.signal import PointProcess

import h5py

from matplotlib import pyplot as plt

[nems.configs.defaults INFO] Saving log messages to /tmp/nems/NEMS 2019-12-10 155251.log

Generate session and get metadata

[2]:

data_dir = Path(r'data')
manifest_path = data_dir / 'manifest.json'

manifest.json is where the library keeps a record of previously downloaded files. If it doesn’t exists in location you specify, it will be created the first time you create the cache object. Otherwise, when you request data, the library will check the manifest to see if it’s been downloaded.

[3]:

# assert manifest_path.exists()

[4]:

cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)
sessions = cache.get_session_table()

[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[numexpr.utils INFO] NumExpr defaulting to 8 threads.
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache

Download data from session

You can edit the download criteria to filter out the data. Downloading the data can take several minutes.

[5]:

# 21 is just a random number, for this example notebook
session = cache.get_session_data(sessions.iloc[21].name,
                                 isi_violations_maximum=np.inf,
                                 amplitude_cutoff_maximum=np.inf,
                                 presence_ratio_minimum=-np.inf
                                )

[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache

This is the epochs table. It applies to all the units.

[6]:

# this can be slow depending on the size of the data
presentation_table = session.stimulus_presentations
presentation_table.head()

[6]:

	color	contrast	frame	orientation	phase	pos	size	spatial_frequency	start_time	stimulus_block	stimulus_name	stop_time	temporal_frequency	x_position	y_position	duration	stimulus_condition_id
stimulus_presentation_id
0	null	null	null	null	null	null	null	null	24.752216	null	spontaneous	84.818986	null	null	null	60.066770	0
1	null	0.8	null	45	[3644.93333333, 3644.93333333]	[-30.0, 40.0]	[20.0, 20.0]	0.08	84.818986	0	gabors	85.052505	4	0	10	0.233520	1
2	null	0.8	null	0	[3644.93333333, 3644.93333333]	[-30.0, 40.0]	[20.0, 20.0]	0.08	85.052505	0	gabors	85.302704	4	20	40	0.250199	2
3	null	0.8	null	0	[3644.93333333, 3644.93333333]	[-30.0, 40.0]	[20.0, 20.0]	0.08	85.302704	0	gabors	85.552904	4	-40	40	0.250199	3
4	null	0.8	null	90	[3644.93333333, 3644.93333333]	[-30.0, 40.0]	[20.0, 20.0]	0.08	85.552904	0	gabors	85.803103	4	10	30	0.250199	4

Included is metadata about each unit, as a pandas dataframe.

[7]:

session.units.head()

[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache

[7]:

	waveform_duration	firing_rate	waveform_PT_ratio	d_prime	waveform_recovery_slope	waveform_velocity_below	presence_ratio	L_ratio	waveform_amplitude	max_drift	...	ecephys_structure_id	ecephys_structure_acronym	anterior_posterior_ccf_coordinate	dorsal_ventral_ccf_coordinate	left_right_ccf_coordinate	probe_description	probe_sampling_rate	probe_lfp_sampling_rate	probe_has_lfp_data
unit_id
951853174	0.219765	19.441146	0.199007	6.714259	-0.049973	0.206030	0.99	0.000089	84.471855	48.84	...	215.0	APN	7928	3129	7167	probeA	29999.96621	1249.998592	True
951853190	0.137353	10.267881	0.734993	2.744475	-0.154760	-0.431682	0.99	0.137435	95.045925	50.17	...	215.0	APN	7908	3068	7169	probeA	29999.96621	1249.998592	True
951853197	0.151089	10.161642	0.625356	3.389818	-0.102740	-0.343384	0.99	0.073114	73.191495	47.14	...	215.0	APN	7905	3060	7169	probeA	29999.96621	1249.998592	True
951853216	0.563149	6.664585	0.645313	5.647242	-0.024678	0.286153	0.99	0.009056	45.769815	32.12	...	215.0	APN	7866	2948	7168	probeA	29999.96621	1249.998592	True
951853225	0.576884	9.174408	0.558268	4.388771	-0.033880	0.073582	0.99	0.017206	57.620160	38.75	...	215.0	APN	7854	2913	7167	probeA	29999.96621	1249.998592	True

5 rows × 89 columns

Spike times are stored in a dictionary, with the unit IDs as keys and arrays for the spike times.

[8]:

# this is a large dict, so let's just look at one key
session.spike_times[951853174]

[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache
[call_caching INFO] Reading data from cache

[8]:

array([3.79479685e+00, 3.80853020e+00, 3.80933020e+00, ...,
       9.62280203e+03, 9.62339993e+03, 9.62352627e+03])

Loading existing NWB files with session

If you’ve previously downloaded data, you can avoid the cache creation and just create a session from an NWB file.

[9]:

nwb_filepath = Path(r'/auto/users/tomlinsa/code/allen/data/session_760345702/session_760345702.nwb')
assert nwb_filepath.exists()

[10]:

session = EcephysSession.from_nwb_path(str(nwb_filepath), api_kwargs={
        "amplitude_cutoff_maximum": np.inf,
        "presence_ratio_minimum": -np.inf,
        "isi_violations_maximum": np.inf
    })

[11]:

# session.stimulus_presentations

or from NWB file (preffered)

This is the raw data. The session method of getting the data does it’s own formatting of the dataframes.

[12]:

nwb_filepath = Path(r'/auto/users/tomlinsa/code/allen/data/session_760345702/session_760345702.nwb')
assert nwb_filepath.exists()

[13]:

io = NWBHDF5IO(str(nwb_filepath), 'r')
nwbfile = io.read()

[14]:

nwbfile.lab_meta_data

[14]:

{'metadata':
 metadata <class 'allensdk.brain_observatory.ecephys.nwb.EcephysLabMetaData'>
 Fields:
   age_in_days: 103.0
   full_genotype: Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
   sex: M
   specimen_name: Pvalb-IRES-Cre;Ai32-407972
   stimulus_name: brain_observatory_1.1
   strain: C57BL/6J}

The rest of the attributes are still accessible.

[15]:

# nwbfile.units
# nwbfile.epochs
# nwbfile.stimulus_presentations

Load into NEMS

[16]:

r = Recording.from_nwb(nwb_filepath, 'neuropixel')

[nems.recording INFO] Loading NWB file with format "neuropixel" from "/auto/users/tomlinsa/code/allen/data/session_760345702/session_760345702.nwb".
[nems.recording INFO] Successfully loaded nwb file.

All of the usual attributes are accessible.

[17]:

r.epochs.head()

[17]:

	start	end	name	stimulus_block	temporal_frequency	x_position	y_position	color	colorSpace	depth	...	tex	texRes	units	stimulus_index	orientation	spatial_frequency	frame	contrast	flipHoriz	flipVert
0	24.752216	84.818986	spontaneous	NaN	NaN	NaN	NaN			NaN	...		NaN		NaN	NaN		NaN	NaN	NaN	NaN
1	84.818986	85.052505	gabors	0.0	4.0	0.0	10.0	[1.0, 1.0, 1.0]	rgb	0.0	...	sin	256.0	deg	0.0	45.0	0.08	NaN	0.8	NaN	NaN
2	85.052505	85.302704	gabors	0.0	4.0	20.0	40.0	[1.0, 1.0, 1.0]	rgb	0.0	...	sin	256.0	deg	0.0	0.0	0.08	NaN	0.8	NaN	NaN
3	85.302704	85.552904	gabors	0.0	4.0	-40.0	40.0	[1.0, 1.0, 1.0]	rgb	0.0	...	sin	256.0	deg	0.0	0.0	0.08	NaN	0.8	NaN	NaN
4	85.552904	85.803103	gabors	0.0	4.0	10.0	30.0	[1.0, 1.0, 1.0]	rgb	0.0	...	sin	256.0	deg	0.0	90.0	0.08	NaN	0.8	NaN	NaN

5 rows × 27 columns

[18]:

r.meta

[18]:

{'specimen_name': 'Pvalb-IRES-Cre;Ai32-407972',
 'age_in_days': 103.0,
 'full_genotype': 'Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt',
 'strain': 'C57BL/6J',
 'sex': 'M',
 'stimulus_name': 'brain_observatory_1.1',
 'uri': '/auto/users/tomlinsa/code/allen/data/session_760345702/session_760345702.nwb'}

All of the spike times are saved into a single PointProcess signal.

[19]:

r.signals

[19]:

{'session_760345702': <nems.signal.PointProcess at 0x7fd3ea723f98>}

[20]:

signal = r.signals['session_760345702']
signal.nchans

[20]:

Each channel key is the unit ID from the units dataframe of the NWB file

[21]:

# ex, unit 951843010
signal._data[951843010]

[21]:

array([  10.2351168 ,   11.66955413,   11.69268753, ..., 9612.18553725,
       9612.20010396, 9620.83476135])

The units metadata is saved into the meta of the signal.

[22]:

# ex, unit 951843010
signal.meta[951843010]

[22]:

{'waveform_duration': 0.755443886097152,
 'firing_rate': 1.74113707219344,
 'PT_ratio': 0.00729943086142739,
 'd_prime': nan,
 'recovery_slope': -0.00895206089461061,
 'quality': 'noise',
 'velocity_below': 1.03015075376884,
 'presence_ratio': 0.99,
 'l_ratio': 0.0,
 'amplitude': 51.4938450000001,
 'max_drift': 23.6,
 'snr': 1.92770330724404,
 'nn_hit_rate': nan,
 'spread': 60.0,
 'nn_miss_rate': nan,
 'cumulative_drift': 615.49,
 'waveform_halfwidth': 0.357118927973199,
 'isolation_distance': nan,
 'isi_violations': 1.06689648479891,
 'silhouette_score': 0.0759991877886951,
 'local_index': 263,
 'amplitude_cutoff': 0.268061054461112,
 'repolarization_slope': 0.122863156143529,
 'cluster_id': 267,
 'velocity_above': 1.37353433835846,
 'peak_channel_id': 850096152}

Get stimulus images

For example, frame 100 from natural_scenes, as in this random row:

[23]:

r.epochs.loc[[51357], ['start', 'end', 'name', 'frame']]  # list loc to keep as df

[23]:

	start	end	name	frame
51357	5910.169789	5910.420001	natural_scenes	100.0

[24]:

im = cache.get_natural_scene_template(100)

[call_caching INFO] Reading data from cache

[25]:

plt.imshow(im, cmap=plt.cm.gray)

[25]:

<matplotlib.image.AxesImage at 0x7fd3ea60bf60>

../_images/demos_demo_working_with_nwb_files_44_1.png