Introduction to Scipp#
Multi-dimensional arrays with labeled dimensions and physical units
scipp.github.io
Scipp is an open-source library developed by ESS for handling, manipulating and visualizing multi-dimensional data arrays.
It enriches raw NumPy-like arrays by adding named dimensions and associated coordinates. In addition, it supports:
Physical units, which are handled automatically in arithmetic operations
Histograms, i.e., bin-edge axes, which are one element longer than the data extent
Propagation of uncertainties
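The last two points can be illustrated without Scipp. The sketch below uses plain NumPy and made-up sample values (the names samples, var_a and var_b are purely illustrative). It shows that a histogram with four bins needs five bin edges, and it applies the standard first-order rule that the variances of independent values add when the values are summed. It is a conceptual illustration, not Scipp's implementation.

```python
import numpy as np

# Ten hypothetical measurements, sorted into 4 bins spanning [0, 4].
samples = np.array([0.2, 0.7, 1.1, 1.5, 1.9, 2.3, 2.4, 3.1, 3.6, 3.9])
counts, edges = np.histogram(samples, bins=4, range=(0.0, 4.0))

# Four bins need five edges: the edge axis is one longer than the data.
print(counts.shape, edges.shape)

# First-order propagation for a sum of independent values:
# variances add, so standard deviations add in quadrature.
a, var_a = 10.0, 4.0
b, var_b = 5.0, 9.0
total, var_total = a + b, var_a + var_b
print(total, np.sqrt(var_total))
```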
%matplotlib inline
import numpy as np
import scipp as sc
import matplotlib.pyplot as plt
# import scipp_intro
from scipp_utils import quiz, plot, scatter
rng = np.random.default_rng(seed=1234)
1. Labeled dimensions: why do we need them?#
Say we have a 2D rectangular array of data
ny, nx = 10, 20
a = np.sin(np.arange(ny) / (ny / 4)).reshape((-1, 1)) * np.cos(np.arange(nx) / (ny / 4))
a.shape
(10, 20)
that looks like
plot(a)
The task is now to slice out row number 4. The array has shape (10, 20) and we know that there are fewer rows than columns, so the rows must lie along the first dimension of the 2D array, and that is the one we slice:
# Slice out row number 4
plot(a[4, :])
We can’t always deduce from the shape#
Now say we have an array which has a square shape:
ny, nx = 20, 20
a = np.sin(np.arange(ny) / (ny / 4)).reshape((-1, 1)) * np.cos(np.arange(nx) / (ny / 4))
a.shape
(20, 20)
plot(a)
Do we slice the first or the second index of the 2D array?
# Not always obvious which dimension is which
plot(a[:, 4], a[4, :])
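A minimal NumPy sketch of the problem, using a small hypothetical 4x4 array instead of the one plotted above: slicing along either axis returns a result of the same shape, so nothing in the output tells us that the wrong axis was picked.

```python
import numpy as np

# Hypothetical square array whose axes are (row, column) by convention only.
a = np.arange(16.0).reshape(4, 4)

row = a[1, :]     # index 1 along the first axis
column = a[:, 1]  # index 1 along the second axis

# Both results are 1-D with length 4, so the shape cannot reveal a mix-up ...
same_shape = row.shape == column.shape
# ... even though the values are different.
same_values = np.array_equal(row, column)
print(same_shape, same_values)
```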
The situation gets worse with more dimensions#
Say we now have an array that has 4 dimensions: x, y, z, t (in that order, maybe? Or is it z, y, x, t, or t, x, y, z?)
a = np.random.random([20] * 4)
a.shape
(20, 20, 20, 20)
Quiz time!
quiz(1)
Introducing labeled dimensions#
Xarray (https://docs.xarray.dev) introduced labels to multi-dimensional NumPy arrays.
“real-world datasets are usually more than just raw numbers; they have labels which encode information about how the array values map to locations in space, time, etc.”
We have embraced, and to a large extent copied, the Xarray mechanism.
var = sc.array(dims=["x", "y", "z", "time"], values=a)
var
- (x: 20, y: 20, z: 20, time: 20)float64𝟙0.948, 0.896, ..., 0.678, 0.734
Quiz time again!
Can you guess the syntax?
quiz(2)
Getting the z slice is now easy and readable.
Adding coordinates#
Coordinates can be specified for each dimension.
They describe the extent of each axis, as well as how far each data point is from its neighbours.
Here is an array that represents air pollution levels as a function of altitude and time.
data = sc.array(
dims=["altitude", "year"],
values=np.linspace(500, 10, 5).reshape((5, 1)) * rng.random(10),
)
sc.show(data)
data.plot()
In Scipp and Xarray, coordinates are added in a data structure called DataArray:
da = sc.DataArray(
data=data,
coords={
"altitude": sc.linspace("altitude", 0, 8000, 5),
},
)
sc.show(da)
da
- altitude: 5
- year: 10
- altitude(altitude)float64𝟙0.0, 2000.000, 4000.000, 6000.000, 8000.000
Values:
array([ 0., 2000., 4000., 6000., 8000.])
- (altitude, year)float64𝟙488.350, 190.098, ..., 9.641, 2.636
Values:
array([[488.34988335, 190.09786751, 461.62311688, 130.84621193, 159.54852921, 59.04561648, 120.88314663, 159.26696439, 482.03962259, 131.82490214], [368.70416193, 143.52388997, 348.52545325, 98.78889001, 120.45913955, 44.57944044, 91.2667757 , 120.24655812, 363.93991505, 99.52780111], [249.05844051, 96.94991243, 235.42778961, 66.73156809, 81.3697499 , 30.11326441, 61.65040478, 81.22615184, 245.84020752, 67.23070009], [129.41271909, 50.37593489, 122.33012597, 34.67424616, 42.28036024, 15.64708837, 32.03403386, 42.20574556, 127.74049999, 34.93359907], [ 9.76699767, 3.80195735, 9.23246234, 2.61692424, 3.19097058, 1.18091233, 2.41766293, 3.18533929, 9.64079245, 2.63649804]])
da.plot()
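To make concrete what a coordinate encodes, here is a plain-NumPy sketch using the same altitude values as above. The variable names extent and spacing are illustrative only; the point is that the coordinate values carry both the range of the axis and the distance between neighbouring points.

```python
import numpy as np

# Same altitude values as the coordinate above.
altitude = np.linspace(0, 8000, 5)

extent = (altitude.min(), altitude.max())  # range covered by the axis
spacing = np.diff(altitude)                # distance between neighbours
print(extent, spacing)
```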
Accessing and adding more coordinates#
Coordinates are stored in a dict-like container, and each dimension can have more than one coordinate.
Getting and setting coordinates is done using the same syntax as Python dicts:
print(da.coords.keys())
da.coords["altitude"]
<scipp.Dict.keys {altitude}>
- (altitude: 5)float64𝟙0.0, 2000.000, 4000.000, 6000.000, 8000.000
Values:
array([ 0., 2000., 4000., 6000., 8000.])
Exercise 1.1: Adding a new coordinate#
The air pollution data was collected every year from 2014 to 2023, i.e., over the half-open interval [2014, 2024).
Let’s add a coordinate, year, to the year dimension.
Tip: You can create a Variable with consecutive numbers by using sc.arange(dim, start, stop).
Hint:
da = sc.DataArray(
data=data,
coords={
"altitude": sc.linspace("altitude", 0, 8000, 5),
"year": sc.arange(..., 2014, ...)
},
)
or
da.coords['year'] = sc.arange(..., 2014, ...)
Solution:
da = sc.DataArray(
data=data,
coords={
"altitude": sc.linspace("altitude", 0, 8000, 5),
"year": sc.arange("year", 2014, 2024),
},
)
sc.show(da)
da
- altitude: 5
- year: 10
- altitude(altitude)float64𝟙0.0, 2000.000, 4000.000, 6000.000, 8000.000
Values:
array([ 0., 2000., 4000., 6000., 8000.])
- year(year)int64𝟙2014, 2015, ..., 2022, 2023
Values:
array([2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023])
- (altitude, year)float64𝟙488.350, 190.098, ..., 9.641, 2.636
Values:
array([[488.34988335, 190.09786751, 461.62311688, 130.84621193, 159.54852921, 59.04561648, 120.88314663, 159.26696439, 482.03962259, 131.82490214], [368.70416193, 143.52388997, 348.52545325, 98.78889001, 120.45913955, 44.57944044, 91.2667757 , 120.24655812, 363.93991505, 99.52780111], [249.05844051, 96.94991243, 235.42778961, 66.73156809, 81.3697499 , 30.11326441, 61.65040478, 81.22615184, 245.84020752, 67.23070009], [129.41271909, 50.37593489, 122.33012597, 34.67424616, 42.28036024, 15.64708837, 32.03403386, 42.20574556, 127.74049999, 34.93359907], [ 9.76699767, 3.80195735, 9.23246234, 2.61692424, 3.19097058, 1.18091233, 2.41766293, 3.18533929, 9.64079245, 2.63649804]])
Exercise 1.2: Compute new coordinate#
Add a new coordinate, scipp-year, that expresses each year relative to Scipp's first release.
Hint: Scipp was first released in 2020.
Solution:
da.coords["scipp-year"] = da.coords["year"] - 2020
sc.show(da)
da
- altitude: 5
- year: 10
- altitude(altitude)float64𝟙0.0, 2000.000, 4000.000, 6000.000, 8000.000
Values:
array([ 0., 2000., 4000., 6000., 8000.])
- scipp-year(year)int64𝟙-6, -5, ..., 2, 3
Values:
array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3])
- year(year)int64𝟙2014, 2015, ..., 2022, 2023
Values:
array([2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023])
- (altitude, year)float64𝟙488.350, 190.098, ..., 9.641, 2.636
Values:
array([[488.34988335, 190.09786751, 461.62311688, 130.84621193, 159.54852921, 59.04561648, 120.88314663, 159.26696439, 482.03962259, 131.82490214], [368.70416193, 143.52388997, 348.52545325, 98.78889001, 120.45913955, 44.57944044, 91.2667757 , 120.24655812, 363.93991505, 99.52780111], [249.05844051, 96.94991243, 235.42778961, 66.73156809, 81.3697499 , 30.11326441, 61.65040478, 81.22615184, 245.84020752, 67.23070009], [129.41271909, 50.37593489, 122.33012597, 34.67424616, 42.28036024, 15.64708837, 32.03403386, 42.20574556, 127.74049999, 34.93359907], [ 9.76699767, 3.80195735, 9.23246234, 2.61692424, 3.19097058, 1.18091233, 2.41766293, 3.18533929, 9.64079245, 2.63649804]])
2. Going further#
2.1 Physical units#
Every data variable and coordinate in Scipp has a physical unit (see also pint, astropy.units, pint-xarray).
Array Variable with unit:
temperature = sc.array(dims=["time"], values=[300.0, 301.0, 312.0, 340.0], unit="K")
temperature
- (time: 4)float64K300.0, 301.0, 312.0, 340.0
Values:
array([300., 301., 312., 340.])
Scalar Variable (no dimensions) with unit:
sound_speed = sc.scalar(340.0, unit="m/s")
sound_speed
- ()float64m/s340.0
Values:
array(340.)
Coordinates and data with units in a DataArray:
cph_air = sc.DataArray(
data=sc.array(
dims=["altitude", "year"],
values=np.linspace(500, 10, 5).reshape((5, 1)) * rng.random(10),
unit="m-3",
),
coords={
"altitude": sc.linspace("altitude", 0, 8000, 5, unit="m"),
"year": sc.arange("year", 2014, 2024, unit="year"),
},
)
cph_air
- altitude: 5
- year: 10
- altitude(altitude)float64m0.0, 2000.000, 4000.000, 6000.000, 8000.000
Values:
array([ 0., 2000., 4000., 6000., 8000.])
- year(year)int64Y2014, 2015, ..., 2022, 2023
Values:
array([2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023])
- (altitude, year)float641/m^3220.503, 304.935, ..., 1.721, 8.704
Values:
array([[220.50306103, 304.93540471, 431.81064828, 431.87883539, 337.44065667, 329.93717398, 367.87884916, 111.37682907, 86.03309233, 435.20748624], [166.47981108, 230.22623056, 326.01703946, 326.06852072, 254.76769579, 249.10256635, 277.74853111, 84.08950594, 64.95498471, 328.58165211], [112.45656112, 155.5170564 , 220.22343063, 220.25820605, 172.0947349 , 168.26795873, 187.61821307, 56.80218282, 43.87687709, 221.95581798], [ 58.43331117, 80.80788225, 114.4298218 , 114.44789138, 89.42177402, 87.4333511 , 97.48789503, 29.5148597 , 22.79876947, 115.32998385], [ 4.41006122, 6.09870809, 8.63621297, 8.63757671, 6.74881313, 6.59874348, 7.35757698, 2.22753658, 1.72066185, 8.70414972]])
Units are automatically handled in arithmetic operations.
Say we know the mean ultra-fine particle mass. Multiplying the particle number density by it yields a mass density:
ultra_fine_particle_mass = sc.scalar(1.0e-6, unit="kg")
cph_air *= ultra_fine_particle_mass
cph_air
- altitude: 5
- year: 10
- altitude(altitude)float64m0.0, 2000.000, 4000.000, 6000.000, 8000.000
Values:
array([ 0., 2000., 4000., 6000., 8000.])
- year(year)int64Y2014, 2015, ..., 2022, 2023
Values:
array([2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023])
- (altitude, year)float64kg/m^30.000, 0.000, ..., 1.721e-06, 8.704e-06
Values:
array([[2.20503061e-04, 3.04935405e-04, 4.31810648e-04, 4.31878835e-04, 3.37440657e-04, 3.29937174e-04, 3.67878849e-04, 1.11376829e-04, 8.60330923e-05, 4.35207486e-04], [1.66479811e-04, 2.30226231e-04, 3.26017039e-04, 3.26068521e-04, 2.54767696e-04, 2.49102566e-04, 2.77748531e-04, 8.40895059e-05, 6.49549847e-05, 3.28581652e-04], [1.12456561e-04, 1.55517056e-04, 2.20223431e-04, 2.20258206e-04, 1.72094735e-04, 1.68267959e-04, 1.87618213e-04, 5.68021828e-05, 4.38768771e-05, 2.21955818e-04], [5.84333112e-05, 8.08078822e-05, 1.14429822e-04, 1.14447891e-04, 8.94217740e-05, 8.74333511e-05, 9.74878950e-05, 2.95148597e-05, 2.27987695e-05, 1.15329984e-04], [4.41006122e-06, 6.09870809e-06, 8.63621297e-06, 8.63757671e-06, 6.74881313e-06, 6.59874348e-06, 7.35757698e-06, 2.22753658e-06, 1.72066185e-06, 8.70414972e-06]])
Units also provide protection#
Say we now also have air pollution data for another city, e.g., NYC.
We would like to compute the difference between CPH and NYC air pollution (as a function of altitude and year), but we forgot to multiply the NYC data by particle mass:
nyc_air = sc.DataArray(
data=sc.array(
dims=["altitude", "year"],
values=np.linspace(800, 20, 5).reshape((5, 1)) * rng.random(10),
unit="m-3",
),
coords={
"altitude": sc.linspace("altitude", 0, 8000, 5, unit="m"),
"year": sc.arange("year", 2014, 2024, unit="year"),
},
)
cph_air - nyc_air
---------------------------------------------------------------------------
UnitError Traceback (most recent call last)
Cell In[24], line 13
1 nyc_air = sc.DataArray(
2 data=sc.array(
3 dims=["altitude", "year"],
(...)
10 },
11 )
---> 13 cph_air - nyc_air
UnitError: Cannot subtract kg/m^3 and 1/m^3.
nyc_air *= ultra_fine_particle_mass
air_difference = cph_air - nyc_air
air_difference.plot()
Units help catch difficult-to-spot bugs early in a workflow.
They save hours of debugging time, free up mental capacity, and let the user focus on the important thing: doing science.
Units for label-based indexing#
We also use units to distinguish between positional indexing and label-based indexing:
cph_air["altitude", 2000.0 * sc.Unit("m")].plot()
Positional indices are based on the dimension, and value indices are based on the coordinates.
Exercise 2: Coordinate and Units#
We have a data array that contains air pollution as a function of year and altitude above the city of Copenhagen.
However, we want to have a pressure coordinate for the altitude dimension instead of altitude.
Assuming a constant air temperature \(T\) of 300 K, the pressure as a function of height \(h\) is given by
\[p(h) = p_0 \exp\left(-\frac{g_0 M h}{R T}\right)\]
Here is the incomplete function altitude_to_pressure that converts altitude[m] into pressure[hPa].
Complete the function and use it to add the pressure coordinate to cph_air.
def altitude_to_pressure(altitude):
    M = sc.scalar(0.0289644, unit="kg/mol")
    g0 = sc.scalar(9.80665, unit="m/s2")
    R = sc.scalar(8.3144598, unit="J/mol/K")
    T = sc.scalar(300.0)
    p0 = sc.scalar(1013.25, unit="hPa")
    return p0 * sc.exp(-g0 * M * altitude / (R * T))
Solution:
def altitude_to_pressure(altitude):
    M = sc.scalar(0.0289644, unit="kg/mol")
    g0 = sc.scalar(9.80665, unit="m/s2")
    R = sc.scalar(8.3144598, unit="J/mol/K")
    T = sc.scalar(300.0, unit="K")
    p0 = sc.scalar(1013.25, unit="hPa")
    return p0 * sc.exp(-g0 * M * altitude / (R * T))
cph_air.coords["pressure"] = altitude_to_pressure(cph_air.coords["altitude"])
cph_air
- altitude: 5
- year: 10
- altitude(altitude)float64m0.0, 2000.000, 4000.000, 6000.000, 8000.000
Values:
array([ 0., 2000., 4000., 6000., 8000.]) - pressure(altitude)float64100Pa1013.250, 806.874, 642.532, 511.663, 407.449
Values:
array([1013.25 , 806.87395253, 642.53202593, 511.66282298, 407.44870895]) - year(year)int64Y2014, 2015, ..., 2022, 2023
Values:
array([2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023])
- (altitude, year)float64kg/m^30.000, 0.000, ..., 1.721e-06, 8.704e-06
Values:
array([[2.20503061e-04, 3.04935405e-04, 4.31810648e-04, 4.31878835e-04, 3.37440657e-04, 3.29937174e-04, 3.67878849e-04, 1.11376829e-04, 8.60330923e-05, 4.35207486e-04], [1.66479811e-04, 2.30226231e-04, 3.26017039e-04, 3.26068521e-04, 2.54767696e-04, 2.49102566e-04, 2.77748531e-04, 8.40895059e-05, 6.49549847e-05, 3.28581652e-04], [1.12456561e-04, 1.55517056e-04, 2.20223431e-04, 2.20258206e-04, 1.72094735e-04, 1.68267959e-04, 1.87618213e-04, 5.68021828e-05, 4.38768771e-05, 2.21955818e-04], [5.84333112e-05, 8.08078822e-05, 1.14429822e-04, 1.14447891e-04, 8.94217740e-05, 8.74333511e-05, 9.74878950e-05, 2.95148597e-05, 2.27987695e-05, 1.15329984e-04], [4.41006122e-06, 6.09870809e-06, 8.63621297e-06, 8.63757671e-06, 6.74881313e-06, 6.59874348e-06, 7.35757698e-06, 2.22753658e-06, 1.72066185e-06, 8.70414972e-06]])
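As a sanity check on the pressure coordinate above, the same exponential expression can be evaluated with plain floats. This is only a sketch using the constants from the function; the helper name `pressure_hpa` is made up, and the units are tracked by hand in the comments rather than by Scipp.

```python
import math

M = 0.0289644   # molar mass of air, kg/mol
g0 = 9.80665    # gravitational acceleration, m/s^2
R = 8.3144598   # gas constant, J/(mol K)
T = 300.0       # assumed constant temperature, K
p0 = 1013.25    # pressure at zero altitude, hPa


def pressure_hpa(h_m):
    """Pressure in hPa at height h_m (in metres), isothermal atmosphere."""
    return p0 * math.exp(-g0 * M * h_m / (R * T))


p_2000 = pressure_hpa(2000.0)
print(round(p_2000, 3))
```

The value at 2000 m should agree with the second entry of the pressure coordinate shown above (about 806.874 hPa).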
2.2 Histogramming and bin-edge coordinates#
It is sometimes necessary to have coordinates that represent a range for each data value.
E.g., “the temperature was 310 K in the time span between 10 and 20 seconds”.
This also arises every time we histogram data, as in the image above.
Scipp supports this by having bin-edge coordinates: a coordinate whose length is 1 more than the length of its dimension.
The next data set is meant to represent photon events in a camera.
We have a long list of x and y positions for the photons.
x = sc.array(dims=["row"], values=rng.normal(size=10000), unit="cm")
y = sc.array(dims=["row"], values=rng.normal(size=10000), unit="cm")
recording = sc.DataArray(
data=sc.ones(sizes=x.sizes, unit="counts"), coords={"x": x, "y": y}
)
recording
- row: 10000
- x(row)float64cm-0.679, -0.621, ..., -0.788, 1.123
Values:
array([-0.67924997, -0.62053203, 1.33121422, ..., 0.66931119, -0.78846271, 1.12268421]) - y(row)float64cm-0.306, -0.704, ..., -0.848, 0.739
Values:
array([-0.30610691, -0.70374693, -1.00885753, ..., -0.42935023, -0.84826518, 0.73921398])
- (row)float64counts1.0, 1.0, ..., 1.0, 1.0
Values:
array([1., 1., 1., ..., 1., 1., 1.])
scatter(x.values, y.values)
It is very common to histogram such data.
In Scipp, histogramming has a very concise and easy-to-use syntax.
To make 8 bins in both the x and y dimensions:
image = recording.hist(y=8, x=8)
image.plot(aspect="equal")
The x and y coordinates are now bin-edge coordinates.
sc.show(image)
image
- y: 8
- x: 8
- x(x [bin-edge])float64cm-3.819, -2.831, ..., 3.095, 4.083
Values:
array([-3.81886182, -2.83110929, -1.84335675, -0.85560421, 0.13214833, 1.11990087, 2.10765341, 3.09540594, 4.08315848]) - y(y [bin-edge])float64cm-3.646, -2.663, ..., 3.236, 4.219
Values:
array([-3.64602432, -2.66291545, -1.67980659, -0.69669773, 0.28641113, 1.26952 , 2.25262886, 3.23573772, 4.21884658])
- (y, x)float64counts0.0, 2.0, ..., 1.0, 0.0
Values:
array([[0.000e+00, 2.000e+00, 8.000e+00, 1.200e+01, 1.000e+01, 5.000e+00, 0.000e+00, 0.000e+00], [1.000e+00, 1.100e+01, 8.100e+01, 1.460e+02, 1.190e+02, 4.600e+01, 6.000e+00, 1.000e+00], [4.000e+00, 6.200e+01, 3.290e+02, 7.620e+02, 6.310e+02, 2.370e+02, 3.000e+01, 3.000e+00], [7.000e+00, 1.210e+02, 5.790e+02, 1.298e+03, 1.179e+03, 4.170e+02, 7.000e+01, 2.000e+00], [5.000e+00, 8.800e+01, 4.440e+02, 1.030e+03, 8.770e+02, 3.240e+02, 4.400e+01, 1.000e+00], [3.000e+00, 3.600e+01, 1.490e+02, 3.280e+02, 2.710e+02, 9.300e+01, 1.400e+01, 2.000e+00], [0.000e+00, 5.000e+00, 1.900e+01, 3.500e+01, 3.600e+01, 1.100e+01, 1.000e+00, 0.000e+00], [0.000e+00, 0.000e+00, 1.000e+00, 2.000e+00, 1.000e+00, 0.000e+00, 1.000e+00, 0.000e+00]])
NumPy and Matplotlib return the bin edges and the data counts separately.
We have everything stored inside a single data structure.
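For comparison, this is roughly what the NumPy route looks like. The sketch uses freshly generated random points (not the `recording` data above) and shows that `np.histogram2d` hands back the counts and the two edge arrays as three separate objects that the user has to keep together.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# NumPy returns three separate objects: counts and one edge array per axis
counts, y_edges, x_edges = np.histogram2d(y, x, bins=8)

print(counts.shape)   # 8 x 8 array of counts
print(x_edges.shape)  # 9 edges for 8 bins
print(y_edges.shape)  # 9 edges for 8 bins
```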
You can, of course, adjust the number of bins:
recording.hist(y=100, x=100).plot(aspect="equal")
Exercise 3: Histogramming#
We found a 2D detector that reads your mood!
We recorded a signal with it, and now we can visualize the signal by histogramming.
from scipp_utils import load_signal_to_histogram
signal_rng = np.random.default_rng(1)
signal = load_signal_to_histogram(signal_rng)
signal
- row: 203700
- x(row)float64cm7.278, -16.296, ..., 0.018, 0.016
Values:
array([ 7.27805283e+00, -1.62959448e+01, -7.09559402e+00, ..., 1.89787051e-02, 1.81696472e-02, 1.60307256e-02]) - y(row)float64cm0.432, -0.784, ..., 0.061, 0.060
Values:
array([ 0.43193503, -0.78422306, -9.62131874, ..., 0.06044533, 0.06091188, 0.05965346])
- (row)float64counts1.0, 1.0, ..., 1.0, 1.0
Values:
array([1., 1., 1., ..., 1., 1., 1.])
Exercise 3-1: Number of bins for histogramming.#
First, we need to find the right number of bins to histogram the signal.
We tried 200 bins and 4 bins for each axis, but neither of them looks meaningful!
signal.hist(x=200, y=200).plot() + signal.hist(x=4, y=4).plot()
Solution:
# 30~50 bins are enough to see the meaningful shape!
signal.hist(x=50, y=50).plot() + signal.hist(x=30, y=30).plot()
Exercise 3-2: Custom histogram edges.#
However, there is a suspicious hot spot in the very middle of the image.
We want to investigate those signals within the specific range of x and y.
Let’s histogram the hot spot and see what is in there.
You can histogram the data with custom histogram edges like below.
Hint:
hist_edges_x = sc.linspace(dim='x', start=-10, stop=10, unit='cm', num=200)
hist_edges_y = sc.linspace(dim='y', start=-10, stop=10, unit='cm', num=200)
signal.hist(x=hist_edges_x, y=hist_edges_y).plot()
Solution:
# There was a smiley in the middle of the heart!
hist_edges_x = sc.linspace(dim='x', start=-0.15, stop=0.15, unit='cm', num=50)
hist_edges_y = sc.linspace(dim='y', start=-0.15, stop=0.15, unit='cm', num=50)
signal.hist(x=hist_edges_x, y=hist_edges_y).plot()
3. Binned data#
Scipp distinguishes histogrammed data from binned data:
Histogrammed data refers to regular dense arrays of, e.g., floating-point values with an associated bin-edge coordinate.
Binned data refers to the precursor of histogrammed data, i.e., each bin contains a “list” of contributing events or values. Binned data can be converted into a histogram by computing the sum over all events or values in a bin.
This is conceptually similar to a multi-dimensional .
It is best illustrated with an example of data analysis. For this, we will use one of the NYC taxi datasets.
NYC yellow taxi dataset#
(https://vaex.readthedocs.io/en/latest/datasets.html, Dataset from 2015, obtained as an HDF5 file from the Vaex docs, and subsequently cleaned of outliers).
For today, we will use a small set of it.
!wget -nc --no-verbose https://public.esss.dk/groups/scipp/dmsc-summer-school/scipp/nyc_taxi_data_2015_small.tar.gz
!tar -xzf nyc_taxi_data_2015_small.tar.gz
2024-09-10 09:04:54 URL:https://public.esss.dk/groups/scipp/dmsc-summer-school/scipp/nyc_taxi_data_2015_small.tar.gz [362919163/362919163] -> "nyc_taxi_data_2015_small.tar.gz" [1]
# %matplotlib widget
da = sc.io.load_hdf5("nyc_taxi_data_2015_small.h5")
da
- row: 17839810
- dropoff_datetime(row)datetime64s2014-12-16T02:28:00, 2015-01-10T20:58:31, ..., 2016-01-01T00:11:37, 2016-01-01T00:14:14
Values:
array(['2014-12-16T02:28:00', '2015-01-10T20:58:31', '2015-01-10T20:39:23', ..., '2016-01-01T00:06:43', '2016-01-01T00:11:37', '2016-01-01T00:14:14'], dtype='datetime64[s]') - dropoff_hour(row)int64𝟙2, 20, ..., 0, 0
Values:
array([ 2, 20, 20, ..., 0, 0, 0]) - dropoff_latitude(row)float64deg40.743, 40.750, ..., 40.763, 40.696
Values:
array([40.74289322, 40.74963379, 40.73989487, ..., 40.74245071, 40.76282883, 40.69619751]) - dropoff_longitude(row)float64deg-73.996, -73.992, ..., -73.925, -73.980
Values:
array([-73.99645996, -73.99246979, -73.99521637, ..., -73.97740936, -73.92475128, -73.98009491]) - fare_amount(row)float64\$5.0, 14.0, ..., 12.5, 17.0
Values:
array([ 5. , 14. , 6. , ..., 10.5, 12.5, 17. ]) - pickup_datetime(row)datetime64s2014-12-16T02:26:00, 2015-01-10T20:33:39, ..., 2015-12-31T23:59:48, 2015-12-31T23:59:55
Values:
array(['2014-12-16T02:26:00', '2015-01-10T20:33:39', '2015-01-10T20:33:41', ..., '2015-12-31T23:59:46', '2015-12-31T23:59:48', '2015-12-31T23:59:55'], dtype='datetime64[s]') - pickup_hour(row)int64𝟙2, 20, ..., 23, 23
Values:
array([ 2, 20, 20, ..., 23, 23, 23]) - pickup_latitude(row)float64deg40.756, 40.726, ..., 40.764, 40.731
Values:
array([40.75642014, 40.72600937, 40.73177719, ..., 40.77241898, 40.7635498 , 40.73109055]) - pickup_longitude(row)float64deg-73.987, -73.983, ..., -73.971, -73.982
Values:
array([-73.98672485, -73.98327637, -74.0067215 , ..., -73.9466095 , -73.97135925, -73.98199463]) - tip_amount(row)float64\$0.0, 0.0, ..., 0.0, 0.0
Values:
array([0., 0., 0., ..., 0., 0., 0.]) - trip_distance(row)float64mi1.090, 2.200, ..., 3.130, 5.070
Values:
array([1.09000003, 2.20000005, 1.10000002, ..., 2.79999995, 3.13000011, 5.07000017])
- (row)float64counts1.0, 1.0, ..., 1.0, 1.0
Values:
array([1., 1., 1., ..., 1., 1., 1.])
n = 100
x = da.coords["dropoff_longitude"].values[::n]
y = da.coords["dropoff_latitude"].values[::n]
scatter(x, y)
Binning the data records#
Working with binned data is most efficient when keeping the number of bins relatively low.
Binning is essentially like overlaying a grid of bin edges onto our data:
ax = scatter(x, y, get_ax=True)
for lon in np.linspace(*ax.get_xlim(), 9):
    ax.axvline(lon, color="gray")
for lat in np.linspace(*ax.get_ylim(), 9):
    ax.axhline(lat, color="gray")
# Bin into 8 longitude & latitude bins
binned = da.bin(dropoff_latitude=8, dropoff_longitude=8)
binned
- dropoff_latitude: 8
- dropoff_longitude: 8
- dropoff_latitude(dropoff_latitude [bin-edge])float64deg40.595, 40.635, ..., 40.875, 40.915
Values:
array([40.59500122, 40.63499641, 40.67499161, 40.7149868 , 40.75498199, 40.79497719, 40.83497238, 40.87496758, 40.91496277]) - dropoff_longitude(dropoff_longitude [bin-edge])float64deg-74.050, -74.010, ..., -73.770, -73.730
Values:
array([-74.04999542, -74.00999641, -73.96999741, -73.9299984 , -73.88999939, -73.85000038, -73.81000137, -73.77000237, -73.73000336])
- (dropoff_latitude, dropoff_longitude)DataArrayViewbinned data [len=16129, len=12695, ..., len=500, len=12]
dim='row', content=DataArray( dims=(row: 17839810), data=float64[counts], coords={'dropoff_datetime':datetime64[s], 'pickup_datetime':datetime64[s], 'fare_amount':float64[\$], 'trip_distance':float64[mi], 'tip_amount':float64[\$], 'dropoff_latitude':float64[deg], 'dropoff_longitude':float64[deg], 'pickup_latitude':float64[deg], 'pickup_longitude':float64[deg], 'dropoff_hour':int64[dimensionless], 'pickup_hour':int64[dimensionless]})
# Histogramming is summing all the counts in each bin
binned_sum = binned.bins.sum()
binned_sum.plot(aspect="equal", norm="log")
Selecting/slicing bins#
Binning groups the data into bins, but keeps the underlying table of records.
No information is lost, it is simply re-ordered.
The bins can then be used for slicing the data, providing extremely efficient data selection and filtering.
manh = binned["dropoff_longitude", 1]["dropoff_latitude", 4]
manh
- dropoff_latitude(dropoff_latitude [bin-edge])float64deg40.755, 40.795
Values:
array([40.75498199, 40.79497719]) - dropoff_longitude(dropoff_longitude [bin-edge])float64deg-74.010, -73.970
Values:
array([-74.00999641, -73.96999741])
- ()DataArrayViewbinned data [len=4215195]
dim='row', content=DataArray( dims=(row: 17839810), data=float64[counts], coords={'dropoff_datetime':datetime64[s], 'pickup_datetime':datetime64[s], 'fare_amount':float64[\$], 'trip_distance':float64[mi], 'tip_amount':float64[\$], 'dropoff_latitude':float64[deg], 'dropoff_longitude':float64[deg], 'pickup_latitude':float64[deg], 'pickup_longitude':float64[deg], 'dropoff_hour':int64[dimensionless], 'pickup_hour':int64[dimensionless]})
# We can now histogram this with a much finer resolution
manh.hist(dropoff_latitude=300, dropoff_longitude=300).plot(norm="log", aspect="equal")
# We select another bin, which contains the JFK airport
jfk = binned["dropoff_longitude", 6]["dropoff_latitude", 1]
jfk.hist(dropoff_latitude=300, dropoff_longitude=300).plot(norm="log", aspect="equal")
(https://commons.wikimedia.org/wiki/File:JFK_airport_terminal_map.png)
Binning into a new dimension#
Data that has already been binned can also be binned further into new dimensions.
manh
- dropoff_latitude(dropoff_latitude [bin-edge])float64deg40.755, 40.795
Values:
array([40.75498199, 40.79497719]) - dropoff_longitude(dropoff_longitude [bin-edge])float64deg-74.010, -73.970
Values:
array([-74.00999641, -73.96999741])
- ()DataArrayViewbinned data [len=4215195]
dim='row', content=DataArray( dims=(row: 17839810), data=float64[counts], coords={'dropoff_datetime':datetime64[s], 'pickup_datetime':datetime64[s], 'fare_amount':float64[\$], 'trip_distance':float64[mi], 'tip_amount':float64[\$], 'dropoff_latitude':float64[deg], 'dropoff_longitude':float64[deg], 'pickup_latitude':float64[deg], 'pickup_longitude':float64[deg], 'dropoff_hour':int64[dimensionless], 'pickup_hour':int64[dimensionless]})
We look at the trip distances inside the Manhattan and JFK bins we have selected above.
# Use 100 distance bins
manh_dist = manh.bin(trip_distance=100)
manh_dist
- trip_distance: 100
- dropoff_latitude(dropoff_latitude [bin-edge])float64deg40.755, 40.795
Values:
array([40.75498199, 40.79497719]) - dropoff_longitude(dropoff_longitude [bin-edge])float64deg-74.010, -73.970
Values:
array([-74.00999641, -73.96999741]) - trip_distance(trip_distance [bin-edge])float64mi0.020, 0.781, ..., 75.379, 76.140
Values:
array([1.99999996e-02, 7.81199993e-01, 1.54239999e+00, 2.30359998e+00, 3.06479998e+00, 3.82599997e+00, 4.58719996e+00, 5.34839996e+00, 6.10959995e+00, 6.87079994e+00, 7.63199994e+00, 8.39319993e+00, 9.15439993e+00, 9.91559992e+00, 1.06767999e+01, 1.14379999e+01, 1.21991999e+01, 1.29603999e+01, 1.37215999e+01, 1.44827999e+01, 1.52439999e+01, 1.60051999e+01, 1.67663999e+01, 1.75275999e+01, 1.82887999e+01, 1.90499998e+01, 1.98111998e+01, 2.05723998e+01, 2.13335998e+01, 2.20947998e+01, 2.28559998e+01, 2.36171998e+01, 2.43783998e+01, 2.51395998e+01, 2.59007998e+01, 2.66619998e+01, 2.74231998e+01, 2.81843998e+01, 2.89455998e+01, 2.97067998e+01, 3.04679998e+01, 3.12291997e+01, 3.19903997e+01, 3.27515997e+01, 3.35127997e+01, 3.42739997e+01, 3.50351997e+01, 3.57963997e+01, 3.65575997e+01, 3.73187997e+01, 3.80799997e+01, 3.88411997e+01, 3.96023997e+01, 4.03635997e+01, 4.11247997e+01, 4.18859997e+01, 4.26471997e+01, 4.34083997e+01, 4.41695996e+01, 4.49307996e+01, 4.56919996e+01, 4.64531996e+01, 4.72143996e+01, 4.79755996e+01, 4.87367996e+01, 4.94979996e+01, 5.02591996e+01, 5.10203996e+01, 5.17815996e+01, 5.25427996e+01, 5.33039996e+01, 5.40651996e+01, 5.48263996e+01, 5.55875996e+01, 5.63487995e+01, 5.71099995e+01, 5.78711995e+01, 5.86323995e+01, 5.93935995e+01, 6.01547995e+01, 6.09159995e+01, 6.16771995e+01, 6.24383995e+01, 6.31995995e+01, 6.39607995e+01, 6.47219995e+01, 6.54831995e+01, 6.62443995e+01, 6.70055995e+01, 6.77667995e+01, 6.85279995e+01, 6.92891994e+01, 7.00503994e+01, 7.08115994e+01, 7.15727994e+01, 7.23339994e+01, 7.30951994e+01, 7.38563994e+01, 7.46175994e+01, 7.53787994e+01, 7.61399994e+01])
- (trip_distance)DataArrayViewbinned data [len=676736, len=1480372, ..., len=0, len=1]
dim='row', content=DataArray( dims=(row: 4215195), data=float64[counts], coords={'dropoff_datetime':datetime64[s], 'pickup_datetime':datetime64[s], 'fare_amount':float64[\$], 'trip_distance':float64[mi], 'tip_amount':float64[\$], 'dropoff_latitude':float64[deg], 'dropoff_longitude':float64[deg], 'pickup_latitude':float64[deg], 'pickup_longitude':float64[deg], 'dropoff_hour':int64[dimensionless], 'pickup_hour':int64[dimensionless]})
manh_dist.hist().plot()
jfk_dist = jfk.bin(trip_distance=100)
jfk_dist.hist().plot()
Other operations on bins: what is the fare amount as a function of distance?#
In addition to summing/histogramming, bins can be used for other reduction operations: min(), max(), and mean().
manh_dist
- trip_distance: 100
- dropoff_latitude(dropoff_latitude [bin-edge])float64deg40.755, 40.795
Values:
array([40.75498199, 40.79497719]) - dropoff_longitude(dropoff_longitude [bin-edge])float64deg-74.010, -73.970
Values:
array([-74.00999641, -73.96999741]) - trip_distance(trip_distance [bin-edge])float64mi0.020, 0.781, ..., 75.379, 76.140
Values:
array([1.99999996e-02, 7.81199993e-01, 1.54239999e+00, 2.30359998e+00, 3.06479998e+00, 3.82599997e+00, 4.58719996e+00, 5.34839996e+00, 6.10959995e+00, 6.87079994e+00, 7.63199994e+00, 8.39319993e+00, 9.15439993e+00, 9.91559992e+00, 1.06767999e+01, 1.14379999e+01, 1.21991999e+01, 1.29603999e+01, 1.37215999e+01, 1.44827999e+01, 1.52439999e+01, 1.60051999e+01, 1.67663999e+01, 1.75275999e+01, 1.82887999e+01, 1.90499998e+01, 1.98111998e+01, 2.05723998e+01, 2.13335998e+01, 2.20947998e+01, 2.28559998e+01, 2.36171998e+01, 2.43783998e+01, 2.51395998e+01, 2.59007998e+01, 2.66619998e+01, 2.74231998e+01, 2.81843998e+01, 2.89455998e+01, 2.97067998e+01, 3.04679998e+01, 3.12291997e+01, 3.19903997e+01, 3.27515997e+01, 3.35127997e+01, 3.42739997e+01, 3.50351997e+01, 3.57963997e+01, 3.65575997e+01, 3.73187997e+01, 3.80799997e+01, 3.88411997e+01, 3.96023997e+01, 4.03635997e+01, 4.11247997e+01, 4.18859997e+01, 4.26471997e+01, 4.34083997e+01, 4.41695996e+01, 4.49307996e+01, 4.56919996e+01, 4.64531996e+01, 4.72143996e+01, 4.79755996e+01, 4.87367996e+01, 4.94979996e+01, 5.02591996e+01, 5.10203996e+01, 5.17815996e+01, 5.25427996e+01, 5.33039996e+01, 5.40651996e+01, 5.48263996e+01, 5.55875996e+01, 5.63487995e+01, 5.71099995e+01, 5.78711995e+01, 5.86323995e+01, 5.93935995e+01, 6.01547995e+01, 6.09159995e+01, 6.16771995e+01, 6.24383995e+01, 6.31995995e+01, 6.39607995e+01, 6.47219995e+01, 6.54831995e+01, 6.62443995e+01, 6.70055995e+01, 6.77667995e+01, 6.85279995e+01, 6.92891994e+01, 7.00503994e+01, 7.08115994e+01, 7.15727994e+01, 7.23339994e+01, 7.30951994e+01, 7.38563994e+01, 7.46175994e+01, 7.53787994e+01, 7.61399994e+01])
- (trip_distance)DataArrayViewbinned data [len=676736, len=1480372, ..., len=0, len=1]
dim='row', content=DataArray( dims=(row: 4215195), data=float64[counts], coords={'dropoff_datetime':datetime64[s], 'pickup_datetime':datetime64[s], 'fare_amount':float64[\$], 'trip_distance':float64[mi], 'tip_amount':float64[\$], 'dropoff_latitude':float64[deg], 'dropoff_longitude':float64[deg], 'pickup_latitude':float64[deg], 'pickup_longitude':float64[deg], 'dropoff_hour':int64[dimensionless], 'pickup_hour':int64[dimensionless]})
To get the minimum and maximum fares for all trips that ended inside our Manhattan area, we can do
manh_dist.bins.coords["fare_amount"].min(), manh.bins.coords["fare_amount"].max()
(<scipp.Variable> () float64 [$] -80,
<scipp.Variable> () float64 [$] 900)
These values are somewhat strange, indicative of bad data in the table.
We restrict our fare range from $0 to $200.
# Make 100 bins between 0 and 200 dollars
nbins = 100
fare_bins = sc.linspace("fare_amount", 0, 200, nbins + 1, unit="dollar")
# Bin & plot our data
manh_dist.bin(fare_amount=fare_bins).hist().transpose().plot(norm="log")
Some things we can say about the data:
there appears to be a (somewhat expected) correlation between fare amount and trip distance: the further you go, the more you’ll have to pay
for a given trip distance, clients usually pay above the diagonal line, rarely below
there appears to be a magic fare amount of $52 that will take you anywhere from 0 to 60 miles!
4. Plopp: interactive data visualization tools#
import plopp as pp
fare_lat_lon = da.hist(
fare_amount=fare_bins, dropoff_latitude=300, dropoff_longitude=300
)
fare_lat_lon
- fare_amount: 100
- dropoff_latitude: 300
- dropoff_longitude: 300
- dropoff_latitude(dropoff_latitude [bin-edge])float64deg40.595, 40.596, ..., 40.914, 40.915
Values:
array([40.59500122, 40.59606776, 40.5971343 , 40.59820084, 40.59926737, 40.60033391, 40.60140045, 40.60246699, 40.60353353, 40.60460007, 40.60566661, 40.60673314, 40.60779968, 40.60886622, 40.60993276, 40.6109993 , 40.61206584, 40.61313238, 40.61419891, 40.61526545, 40.61633199, 40.61739853, 40.61846507, 40.61953161, 40.62059814, 40.62166468, 40.62273122, 40.62379776, 40.6248643 , 40.62593084, 40.62699738, 40.62806391, 40.62913045, 40.63019699, 40.63126353, 40.63233007, 40.63339661, 40.63446314, 40.63552968, 40.63659622, 40.63766276, 40.6387293 , 40.63979584, 40.64086238, 40.64192891, 40.64299545, 40.64406199, 40.64512853, 40.64619507, 40.64726161, 40.64832815, 40.64939468, 40.65046122, 40.65152776, 40.6525943 , 40.65366084, 40.65472738, 40.65579391, 40.65686045, 40.65792699, 40.65899353, 40.66006007, 40.66112661, 40.66219315, 40.66325968, 40.66432622, 40.66539276, 40.6664593 , 40.66752584, 40.66859238, 40.66965892, 40.67072545, 40.67179199, 40.67285853, 40.67392507, 40.67499161, 40.67605815, 40.67712468, 40.67819122, 40.67925776, 40.6803243 , 40.68139084, 40.68245738, 40.68352392, 40.68459045, 40.68565699, 40.68672353, 40.68779007, 40.68885661, 40.68992315, 40.69098969, 40.69205622, 40.69312276, 40.6941893 , 40.69525584, 40.69632238, 40.69738892, 40.69845545, 40.69952199, 40.70058853, 40.70165507, 40.70272161, 40.70378815, 40.70485469, 40.70592122, 40.70698776, 40.7080543 , 40.70912084, 40.71018738, 40.71125392, 40.71232045, 40.71338699, 40.71445353, 40.71552007, 40.71658661, 40.71765315, 40.71871969, 40.71978622, 40.72085276, 40.7219193 , 40.72298584, 40.72405238, 40.72511892, 40.72618546, 40.72725199, 40.72831853, 40.72938507, 40.73045161, 40.73151815, 40.73258469, 40.73365122, 40.73471776, 40.7357843 , 40.73685084, 40.73791738, 40.73898392, 40.74005046, 40.74111699, 40.74218353, 40.74325007, 40.74431661, 40.74538315, 40.74644969, 40.74751623, 40.74858276, 40.7496493 , 40.75071584, 40.75178238, 40.75284892, 40.75391546, 40.75498199, 40.75604853, 40.75711507, 
40.75818161, 40.75924815, 40.76031469, 40.76138123, 40.76244776, 40.7635143 , 40.76458084, 40.76564738, 40.76671392, 40.76778046, 40.768847 , 40.76991353, 40.77098007, 40.77204661, 40.77311315, 40.77417969, 40.77524623, 40.77631276, 40.7773793 , 40.77844584, 40.77951238, 40.78057892, 40.78164546, 40.782712 , 40.78377853, 40.78484507, 40.78591161, 40.78697815, 40.78804469, 40.78911123, 40.79017776, 40.7912443 , 40.79231084, 40.79337738, 40.79444392, 40.79551046, 40.796577 , 40.79764353, 40.79871007, 40.79977661, 40.80084315, 40.80190969, 40.80297623, 40.80404277, 40.8051093 , 40.80617584, 40.80724238, 40.80830892, 40.80937546, 40.810442 , 40.81150853, 40.81257507, 40.81364161, 40.81470815, 40.81577469, 40.81684123, 40.81790777, 40.8189743 , 40.82004084, 40.82110738, 40.82217392, 40.82324046, 40.824307 , 40.82537354, 40.82644007, 40.82750661, 40.82857315, 40.82963969, 40.83070623, 40.83177277, 40.8328393 , 40.83390584, 40.83497238, 40.83603892, 40.83710546, 40.838172 , 40.83923854, 40.84030507, 40.84137161, 40.84243815, 40.84350469, 40.84457123, 40.84563777, 40.84670431, 40.84777084, 40.84883738, 40.84990392, 40.85097046, 40.852037 , 40.85310354, 40.85417007, 40.85523661, 40.85630315, 40.85736969, 40.85843623, 40.85950277, 40.86056931, 40.86163584, 40.86270238, 40.86376892, 40.86483546, 40.865902 , 40.86696854, 40.86803507, 40.86910161, 40.87016815, 40.87123469, 40.87230123, 40.87336777, 40.87443431, 40.87550084, 40.87656738, 40.87763392, 40.87870046, 40.879767 , 40.88083354, 40.88190008, 40.88296661, 40.88403315, 40.88509969, 40.88616623, 40.88723277, 40.88829931, 40.88936584, 40.89043238, 40.89149892, 40.89256546, 40.893632 , 40.89469854, 40.89576508, 40.89683161, 40.89789815, 40.89896469, 40.90003123, 40.90109777, 40.90216431, 40.90323085, 40.90429738, 40.90536392, 40.90643046, 40.907497 , 40.90856354, 40.90963008, 40.91069661, 40.91176315, 40.91282969, 40.91389623, 40.91496277]) - dropoff_longitude(dropoff_longitude [bin-edge])float64deg-74.050, -74.049, ..., 
-73.731, -73.730
Values:
array([-74.04999542, -74.04892878, -74.04786214, -74.0467955 , -74.04572886, -74.04466222, -74.04359558, -74.04252894, -74.0414623 , -74.04039566, -74.03932902, -74.03826238, -74.03719574, -74.0361291 , -74.03506246, -74.03399582, -74.03292918, -74.03186254, -74.0307959 , -74.02972926, -74.02866262, -74.02759598, -74.02652934, -74.0254627 , -74.02439606, -74.02332942, -74.02226278, -74.02119614, -74.0201295 , -74.01906286, -74.01799622, -74.01692958, -74.01586294, -74.0147963 , -74.01372965, -74.01266301, -74.01159637, -74.01052973, -74.00946309, -74.00839645, -74.00732981, -74.00626317, -74.00519653, -74.00412989, -74.00306325, -74.00199661, -74.00092997, -73.99986333, -73.99879669, -73.99773005, -73.99666341, -73.99559677, -73.99453013, -73.99346349, -73.99239685, -73.99133021, -73.99026357, -73.98919693, -73.98813029, -73.98706365, -73.98599701, -73.98493037, -73.98386373, -73.98279709, -73.98173045, -73.98066381, -73.97959717, -73.97853053, -73.97746389, -73.97639725, -73.97533061, -73.97426397, -73.97319733, -73.97213069, -73.97106405, -73.96999741, -73.96893077, -73.96786413, -73.96679749, -73.96573085, -73.9646642 , -73.96359756, -73.96253092, -73.96146428, -73.96039764, -73.959331 , -73.95826436, -73.95719772, -73.95613108, -73.95506444, -73.9539978 , -73.95293116, -73.95186452, -73.95079788, -73.94973124, -73.9486646 , -73.94759796, -73.94653132, -73.94546468, -73.94439804, -73.9433314 , -73.94226476, -73.94119812, -73.94013148, -73.93906484, -73.9379982 , -73.93693156, -73.93586492, -73.93479828, -73.93373164, -73.932665 , -73.93159836, -73.93053172, -73.92946508, -73.92839844, -73.9273318 , -73.92626516, -73.92519852, -73.92413188, -73.92306524, -73.9219986 , -73.92093196, -73.91986532, -73.91879868, -73.91773204, -73.9166654 , -73.91559875, -73.91453211, -73.91346547, -73.91239883, -73.91133219, -73.91026555, -73.90919891, -73.90813227, -73.90706563, -73.90599899, -73.90493235, -73.90386571, -73.90279907, -73.90173243, -73.90066579, -73.89959915, 
-73.89853251, -73.89746587, -73.89639923, -73.89533259, -73.89426595, -73.89319931, -73.89213267, -73.89106603, -73.88999939, -73.88893275, -73.88786611, -73.88679947, -73.88573283, -73.88466619, -73.88359955, -73.88253291, -73.88146627, -73.88039963, -73.87933299, -73.87826635, -73.87719971, -73.87613307, -73.87506643, -73.87399979, -73.87293315, -73.87186651, -73.87079987, -73.86973323, -73.86866659, -73.86759995, -73.8665333 , -73.86546666, -73.86440002, -73.86333338, -73.86226674, -73.8612001 , -73.86013346, -73.85906682, -73.85800018, -73.85693354, -73.8558669 , -73.85480026, -73.85373362, -73.85266698, -73.85160034, -73.8505337 , -73.84946706, -73.84840042, -73.84733378, -73.84626714, -73.8452005 , -73.84413386, -73.84306722, -73.84200058, -73.84093394, -73.8398673 , -73.83880066, -73.83773402, -73.83666738, -73.83560074, -73.8345341 , -73.83346746, -73.83240082, -73.83133418, -73.83026754, -73.8292009 , -73.82813426, -73.82706762, -73.82600098, -73.82493434, -73.8238677 , -73.82280106, -73.82173442, -73.82066778, -73.81960114, -73.8185345 , -73.81746785, -73.81640121, -73.81533457, -73.81426793, -73.81320129, -73.81213465, -73.81106801, -73.81000137, -73.80893473, -73.80786809, -73.80680145, -73.80573481, -73.80466817, -73.80360153, -73.80253489, -73.80146825, -73.80040161, -73.79933497, -73.79826833, -73.79720169, -73.79613505, -73.79506841, -73.79400177, -73.79293513, -73.79186849, -73.79080185, -73.78973521, -73.78866857, -73.78760193, -73.78653529, -73.78546865, -73.78440201, -73.78333537, -73.78226873, -73.78120209, -73.78013545, -73.77906881, -73.77800217, -73.77693553, -73.77586889, -73.77480225, -73.77373561, -73.77266897, -73.77160233, -73.77053569, -73.76946905, -73.7684024 , -73.76733576, -73.76626912, -73.76520248, -73.76413584, -73.7630692 , -73.76200256, -73.76093592, -73.75986928, -73.75880264, -73.757736 , -73.75666936, -73.75560272, -73.75453608, -73.75346944, -73.7524028 , -73.75133616, -73.75026952, -73.74920288, -73.74813624, -73.7470696 
, -73.74600296, -73.74493632, -73.74386968, -73.74280304, -73.7417364 , -73.74066976, -73.73960312, -73.73853648, -73.73746984, -73.7364032 , -73.73533656, -73.73426992, -73.73320328, -73.73213664, -73.73107 , -73.73000336]) - fare_amount(fare_amount [bin-edge])float64\$0.0, 2.0, ..., 198.0, 200.0
Values:
array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20., 22., 24., 26., 28., 30., 32., 34., 36., 38., 40., 42., 44., 46., 48., 50., 52., 54., 56., 58., 60., 62., 64., 66., 68., 70., 72., 74., 76., 78., 80., 82., 84., 86., 88., 90., 92., 94., 96., 98., 100., 102., 104., 106., 108., 110., 112., 114., 116., 118., 120., 122., 124., 126., 128., 130., 132., 134., 136., 138., 140., 142., 144., 146., 148., 150., 152., 154., 156., 158., 160., 162., 164., 166., 168., 170., 172., 174., 176., 178., 180., 182., 184., 186., 188., 190., 192., 194., 196., 198., 200.])
(fare_amount, dropoff_latitude, dropoff_longitude) float64 counts 0.0, 0.0, ..., 0.0, 0.0
%matplotlib widget
inspect = pp.inspector(fare_lat_lon, dim="fare_amount", norm="log")
inspect
Exercise 4.1: Rush hours#
Histogram the Manhattan and JFK bins according to hour-of-the-day, to show the quiet and busy hours at both locations.
Solution:
# In Plopp, you can use the + and / operators to make tiled figures
manh.hist(dropoff_hour=24).plot(title='Manhattan') / jfk.hist(dropoff_hour=24).plot(title='JFK')
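For intuition, the same kind of hour-of-the-day histogram can be sketched in plain NumPy. This is not the Scipp solution above: the `hours` array is synthetic data invented for illustration, and the sketch assumes the hour is stored as an integer from 0 to 23. It also shows that 24 bins require 25 bin edges.

```python
import numpy as np

# Synthetic drop-off hours (hypothetical data, not the taxi dataset)
rng = np.random.default_rng(seed=0)
hours = rng.integers(0, 24, size=1000)

# 24 bins need 25 bin edges: 0, 1, ..., 24
edges = np.arange(25)
counts, edges = np.histogram(hours, bins=edges)

print(counts.shape, edges.shape)
print("busiest hour:", int(np.argmax(counts)))
```

Unlike the Scipp version, nothing here records that the axis is called dropoff_hour; the labels have to be tracked by hand.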
Exercise 4.2: Expensive hours#
The final exercise is to create an interactive figure that will show histograms of how expensive trips were, as a function of the hour-of-the-day, for the entire dataset.
You should:
Create a price_per_mile coordinate on the original dataset da
Bin da using two dimensions: hour-of-the-day and price_per_mile
Use Plopp’s superplot function to make a figure with a 1D histogram and an interactive slider to navigate the hour dimension
Use the slider to find the hour of the day when trips are the most expensive!
Hint: For binning in hour-of-the-day, using 24 bins should work well. For binning in price_per_mile, you will have to manually set the bin boundaries.
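One plausible reason for setting the boundaries by hand (an assumption, not something the exercise states) is that a derived ratio such as price per mile can contain extreme or infinite values, for example when a trip distance is zero. The NumPy sketch below uses made-up fares and distances to show the effect, and that 100 edges between 0 and 20 define 99 bins.

```python
import numpy as np

# Hypothetical fares (dollars) and distances (miles); one trip has zero distance
fare = np.array([10.0, 25.0, 8.0, 52.0])
distance = np.array([2.0, 5.0, 0.0, 13.0])

# Dividing by a zero distance yields inf, so the data range is unbounded
with np.errstate(divide="ignore"):
    price_per_mile = fare / distance

# Explicit edges: 100 edges between 0 and 20 define 99 bins
edges = np.linspace(0.0, 20.0, 100)
counts, _ = np.histogram(price_per_mile, bins=edges)

# The infinite value falls outside the edges and is not counted
print(np.isinf(price_per_mile).any(), int(counts.sum()))
```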
Solution:
da.coords['price_per_mile'] = da.coords['fare_amount'] / da.coords['trip_distance']
sp = pp.superplot(
    da.bin(
        dropoff_hour=24,
        price_per_mile=sc.linspace('price_per_mile', 0, 20, 100, unit='dollar/mi'),
    ).hist()
)
sp
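As a rough picture of what the slider navigates, the two-dimensional histogram can be thought of as one price histogram per hour. The NumPy sketch below uses synthetic events (the `hours` and `price` arrays are invented for illustration) and is not how Plopp or Scipp implement this; it only mirrors the idea of slicing along the hour dimension.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Hypothetical events: an hour in [0, 24) and a price per mile in dollars
hours = rng.integers(0, 24, size=5000)
price = rng.uniform(0.0, 20.0, size=5000)

hour_edges = np.arange(25)                 # 24 hour bins
price_edges = np.linspace(0.0, 20.0, 100)  # 99 price bins

counts, _, _ = np.histogram2d(hours, price, bins=[hour_edges, price_edges])

# A slider position selects one row: the 1D price histogram for that hour
hist_at_17 = counts[17]

# Mean price per hour, estimated from the bin centres
centres = 0.5 * (price_edges[:-1] + price_edges[1:])
mean_price = (counts * centres).sum(axis=1) / counts.sum(axis=1)
print("most expensive hour:", int(np.argmax(mean_price)))
```

Estimating the mean from bin centres is only an approximation; it is used here to turn "find the most expensive hour with the slider" into a single number.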
Bonus Exercise#
You decided to join an exchange program in NY.
But living expenses are too high there, even compared to Copenhagen!
Luckily, you can take over a car from a previous student in the same program, and you are allowed to work a part-time job for 2 hours every day, with no limit on income.
So you decide to be a shared-car driver. Your goal is to maximize your income within those 2 hours, so you are going to analyse which hours to drive at which location!
You are free to choose 2 hours among all 24 in a day, and there are 2 places, Manhattan and JFK airport, where you can be registered as a driver.
Solution:
# Bin the data again to get the `price_per_mile` coord in the
# Manhattan and JFK bins
binned = da.bin(dropoff_latitude=8, dropoff_longitude=8)
# Manhattan bin
manh = binned["dropoff_longitude", 1]["dropoff_latitude", 4]
# Create a new data array with the `price_per_mile` as weights
prices_manh = sc.DataArray(data=manh.values.coords['price_per_mile'],
                           coords={'dropoff_hour': manh.values.coords['dropoff_hour']})
# Bin by hour-of-the-day and get the mean inside each bin
mean_manh = prices_manh.bin(dropoff_hour=24).bins.mean()
# Repeat for JFK
jfk = binned["dropoff_longitude", 6]["dropoff_latitude", 1]
prices_jfk = sc.DataArray(data=jfk.values.coords['price_per_mile'],
                          coords={'dropoff_hour': jfk.values.coords['dropoff_hour']})
mean_jfk = prices_jfk.bin(dropoff_hour=24).bins.mean()
# Plot
fig = pp.plot({'Manhattan': mean_manh, 'JFK': mean_jfk})
fig
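The per-hour mean computed in the solution, and the final step of reading the best hours off the plot, can also be sketched in plain NumPy. Everything below is illustrative: the events are synthetic, `mean_per_hour` is a hypothetical helper, and treating the mean price per mile as a proxy for income is an assumption carried over from the solution rather than an established result.

```python
import numpy as np

def mean_per_hour(hours, values):
    """Mean of `values` in each of the 24 hour bins (NaN for empty bins)."""
    sums = np.bincount(hours, weights=values, minlength=24)
    counts = np.bincount(hours, minlength=24)
    with np.errstate(invalid="ignore"):
        return sums / counts

rng = np.random.default_rng(seed=2)
# Hypothetical events for two places: (hour of day, price per mile)
events = {
    "Manhattan": (rng.integers(0, 24, size=2000), rng.uniform(2.0, 10.0, size=2000)),
    "JFK": (rng.integers(0, 24, size=2000), rng.uniform(2.0, 10.0, size=2000)),
}

best = {}
for place, (hours, price) in events.items():
    means = mean_per_hour(hours, price)
    # Indices of the two hours with the highest mean, best first
    best[place] = np.argsort(means)[-2:][::-1]
    print(place, best[place])
```

With real data, hours without any events would give NaN means; those would need to be masked out before ranking.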