SANS data reduction#

This notebook will guide you through the data reduction for the SANS experiment that you simulated with McStas yesterday.

The following is a basic outline of what this notebook will cover:

  • Loading the NeXus files that contain the data

  • Inspecting/visualizing the data contents

  • Converting the raw time-of-flight coordinate to something more useful (\(\lambda\), \(Q\), …)

  • Normalizing to a flat-field run

  • Writing the results to file

  • Processing more than one pulse

import numpy as np
import scipp as sc
import plopp as pp
import utils

Process the run with a sample#

Load the NeXus file data#

folder = "../3-mcstas/SANS_with_sample_1_pulse"

sample = utils.load_sans(folder)

The first way to inspect the data is to view the HTML representation of the loaded object.

Try to explore what is inside the data, and familiarize yourself with the different sections (Dimensions, Coordinates, Data).

sample

Visualize the data#

Here is a 2D visualization of the neutron counts, histogrammed along the tof and y dimensions:

sample.hist(tof=200, y=200).plot(norm="log", vmin=1.0e-2)

Histogramming along y alone gives a 1D plot:

sample.hist(y=200).plot(norm="log")

Coordinate transformations#

The first step in the data reduction is to convert the raw event coordinates (position, time-of-flight) to something physically meaningful such as wavelength (\(\lambda\)) or momentum transfer (\(Q\)).

Scipp has a dedicated method for this called transform_coords (see the scipp documentation).

We begin with a standard graph which describes how to compute e.g. the wavelength from the other coordinates in the raw data.

from scippneutron.conversion.graph.beamline import beamline
from scippneutron.conversion.graph.tof import kinematic

graph = {**beamline(scatter=True), **kinematic("tof")}
sc.show_graph(graph, simplified=True)

To compute the wavelength of all the events, we simply call transform_coords on our loaded data, requesting the name of the output coordinate ("wavelength") and supplying the graph that should be used to compute it (i.e. the one we defined just above).

This yields

sample_wav = sample.transform_coords("wavelength", graph=graph)
sample_wav

The result has a wavelength coordinate. We can also plot the result:

sample_wav.hist(wavelength=200).plot()

We can see that the range of observed wavelengths agrees with the range set in the McStas model (5.25–6.75 Å).

Exercise 1: convert raw data to \(Q\)#

Instead of wavelength as in the example above, the task is now to convert the raw data to momentum transfer \(Q\).

The transformation graph is missing the computation for \(Q\), so you will have to add it yourself. As a reminder, \(Q\) is computed as follows:

\[Q = \frac{4\pi \sin(\theta)}{\lambda}\]

You have to:

  • create a function that computes \(Q\)

  • add it to the graph

  • call transform_coords using the new graph

Solution:

def compute_q(two_theta, wavelength):
    return (4.0 * np.pi) * sc.sin(two_theta / 2) / wavelength


graph["Q"] = compute_q
sample_q = sample.transform_coords("Q", graph=graph)
sample_q

Histogram the data in \(Q\)#

The final step in processing the sample run is to histogram the data into \(Q\) bins.

sample_h = sample_q.hist(Q=200)
sample_h.plot(norm="log", vmin=1)

The histogrammed data currently has no standard deviations on the counts. This needs to be added after we have performed the histogramming operation.

When dealing with neutron events, we assume the data has a Poisson distribution. This means that the variance in a bin is equal to the counts in that bin (the standard deviation is then \(\sigma = \sqrt{\mathrm{counts}}\)).
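The idea can be sketched in plain NumPy (an illustration only; the actual `utils.add_variances` helper operates on scipp objects):

```python
import numpy as np

def add_poisson_variances(counts):
    """Return (variances, standard deviations) for Poisson-distributed counts."""
    counts = np.asarray(counts, dtype=float)
    # For a Poisson distribution, the variance of each bin equals its counts,
    # so the standard deviation is sqrt(counts).
    variances = counts.copy()
    return variances, np.sqrt(variances)
```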

We provide a helper function that will add Poisson variances to any given input:

utils.add_variances(sample_h)
sample_h.data
sample_h.plot(norm="log", vmin=1)

Exercise 2: process flat-field run#

Repeat the steps carried out above for the run that contained no sample (also known as a “flat-field” run).

Solution:

folder = "../3-mcstas/SANS_without_sample_1_pulse"
# Load file
flat = utils.load_sans(folder)

# Convert to Q
flat_q = flat.transform_coords("Q", graph=graph)

# Histogram and add variances
flat_h = flat_q.hist(Q=200)
utils.add_variances(flat_h)
flat_h.plot()

Exercise 3: Normalize the sample run#

The flat-field run gives a measure of the efficiency of each detector pixel. This efficiency must now be used to correct the counts in the sample run to yield a realistic \(I(Q)\) profile.

In particular, this should remove unwanted artifacts in the data, such as the drop in counts around 0.03 Å⁻¹ caused by the air bubble inside the detector tube.

Normalizing is essentially just dividing the sample run by the flat field run.
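Dividing two histograms also combines their uncertainties; scipp does this automatically when you divide data arrays, but the propagation rule can be illustrated in plain NumPy (an illustrative sketch, not the helper used in this notebook):

```python
import numpy as np

def divide_with_variances(a, var_a, b, var_b):
    # For a ratio r = a / b of independent quantities, the relative
    # variances add: var(r) / r^2 = var(a) / a^2 + var(b) / b^2.
    ratio = a / b
    var = ratio**2 * (var_a / a**2 + var_b / b**2)
    return ratio, var
```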

Hint: you may run into an error like "Mismatch in coordinate 'Q'". Why is this happening? How can you get around it?

Solution:

normed = sample_h / flat_h

This naive division raises the "Mismatch in coordinate 'Q'" error: the two histograms were created with automatically determined bin edges, which differ between the runs. The fix is to histogram both with a common set of bin edges:
# Make common bins
qbins = sc.linspace("Q", 5.0e-3, 0.2, 201, unit="1/angstrom")

# Histogram both sample and flat-field with same bins
sample_h = sample_q.hist(Q=qbins)
flat_h = flat_q.hist(Q=qbins)

# Add variances to both
utils.add_variances(sample_h, flat_h)

# Normalize
normed = sample_h / flat_h
normed.plot(norm="log", vmin=1.0e-3, vmax=10.0)

Save result to disk#

Finally, we need to save our results to disk, so that the reduced data can be forwarded to the next step in the pipeline (data analysis).

We will use a simple text file for this:

from scippneutron.io import save_xye

# The simple file format does not support bin-edge coordinates.
# So we convert to bin-centers first.
data = normed.copy()
data.coords["Q"] = sc.midpoints(data.coords["Q"])

save_xye("sans_iofq.dat", data)

Process data from 3 pulses#

We now want to repeat the reduction, but using more than a single pulse to improve our statistics.

We begin by loading the run with 3 pulses.

folder = "../3-mcstas/SANS_with_sample_3_pulse"
sample = utils.load_sans(folder)
sample
sample.hist(tof=200, y=200).plot(norm="log", vmin=1.0e-2)

We can see that we have 3 distinct pulses (3 horizontal bands of events).

In order to combine the data from the 3 pulses (also known as ‘pulse folding’), we need to compute each neutron’s time-of-flight relative to the start of its own pulse, instead of an absolute time.
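The folding itself can be sketched in plain NumPy (a hypothetical illustration; `utils.fold_pulses` performs the equivalent operation on the scipp data):

```python
import numpy as np

def fold_tof(tof, edges, pulse_offsets):
    # Assign each event to a pulse window via its absolute time-of-flight,
    # then subtract that pulse's start-time offset to get a time-of-flight
    # relative to the start of its own pulse.
    pulse_index = np.digitize(tof, edges) - 1
    return tof - pulse_offsets[pulse_index]
```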

Exercise 4: fold the pulses#

You are provided with a function that will perform the folding correctly, as long as you supply it with the correct inputs.

help(utils.fold_pulses)

Instead of reading the bin edges off the figure above, try to compute them from known parameters of your simulation (e.g. the wavelength range) and the physical layout of the instrument.

Hints:

  • You have 3 pulses, so you need to supply 4 time-of-flight edges

  • Assume the first pulse has a zero time offset

  • The frequency of the ESS source is 14 Hz

  • To convert neutron wavelength to speed, you can use: \(v = \displaystyle\frac{h}{m_{\mathrm{n}}\lambda}\)

Solution:

AA = sc.Unit("angstrom")
wmin = 5.25 * AA
wmax = 6.75 * AA
h = sc.constants.h
m_n = sc.constants.m_n

# Make 3 pulse offsets, starting at zero, spaced by the 14 Hz pulse period
pulse_offsets = sc.arange("tof", 3.0) * sc.scalar(1.0 / 14.0, unit="s").to(unit="ms")

# Compute minimum and maximum neutron speeds
vmin = (h / (m_n * wmax)).to(unit="m/s")
vmax = (h / (m_n * wmin)).to(unit="m/s")

# Compute pixel mean distance
mean_distance = sc.norm(
    sample.coords["position"].mean() - sample.coords["source_position"]
)

# Compute min and max times for neutrons to arrive
tmin = (mean_distance / vmax).to(unit="ms")
tmax = (mean_distance / vmin).to(unit="ms") + pulse_offsets[-1]

# Make evenly-spaced edges
edges = sc.linspace("tof", tmin.value, tmax.value, 4, unit="ms")

# Fold the data
folded = utils.fold_pulses(sample, edges, pulse_offsets)

# Convert to Q and histogram
folded_q = folded.transform_coords("Q", graph=graph)
folded_h = folded_q.hist(Q=qbins)
utils.add_variances(folded_h)
folded.hist(tof=200, y=200).plot(norm="log", vmin=1.0e-2)
folded_h.plot(norm="log", vmin=1.0)
The same processing must be applied to the flat-field run:
folder = "../3-mcstas/SANS_without_sample_3_pulse"
flat = utils.load_sans(folder)

flat_folded = utils.fold_pulses(flat, edges, pulse_offsets)
flat_folded_q = flat_folded.transform_coords("Q", graph=graph)
flat_folded_h = flat_folded_q.hist(Q=qbins)
utils.add_variances(flat_folded_h)
Finally, we normalize and compare with the single-pulse result:
folded_normed = folded_h / flat_folded_h
folded_normed.plot(norm="log", vmin=1.0e-3, vmax=10.0)
pp.plot({"1-pulse": normed, "3-pulses": folded_normed}, norm="log", vmin=1.0e-3, vmax=10.0)

Save results to disk#

Once again, we need to save our results to disk:

data = folded_normed.copy()
data.coords["Q"] = sc.midpoints(data.coords["Q"])

save_xye("sans_iofq_3pulses.dat", data)