numpyfuncs module
- pyfasma.numpyfuncs.baseline_correct(y: numpy.ndarray, kind='snip', zero_correction=False, **params) -> numpy.ndarray
Apply baseline correction to the input array.
This method uses some of the most popular and efficient baseline correction algorithms, provided by the pybaselines project: https://pybaselines.readthedocs.io/en/latest/
- Parameters:
y (np.ndarray) – The data to be baseline-corrected. Must not contain missing data (NaN) or Inf.
kind ({'imodpoly', 'snip', 'airpls'}, optional) –
The type of baseline correction to be applied.
’imodpoly’: apply baseline correction using the Improved Modified Polynomial (IModPoly) algorithm [1].
’snip’: apply baseline correction using the Statistics-sensitive Non-linear Iterative Peak-clipping (SNIP) algorithm [2].
’airpls’: apply baseline correction using the Adaptive Iteratively Reweighted Penalized Least Squares (airPLS) algorithm [3].
zero_correction (bool, optional) – If True, vertically offset the estimated baseline so that it lies below the input array. Default is False.
**params –
Parameters for the selected baseline correction algorithm. Each algorithm has its own set of parameters for fine-tuning the baseline. For ease of use, only the parameters that most affect baseline estimation are documented here for each algorithm. In most cases, there should be no need to adjust any of the other parameters. For all available parameters and their documentation, consult the respective section of the pybaselines documentation: https://pybaselines.readthedocs.io/en/latest/
’imodpoly’ :
- x_data : np.ndarray, optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with the same number of points as y.
- poly_order : int
The polynomial order for fitting the baseline. Default is 2.
- num_std : float
The number of standard deviations to include when thresholding. Default is 1.
’snip’ :
- max_half_window : int or Sequence(int, int)
The maximum number of iterations. Should be set such that max_half_window is approximately (w-1)/2, where w is the index-based width of a feature or peak. max_half_window can also be a sequence of two integers for asymmetric peaks, with the first item corresponding to the max_half_window of the peak’s left edge, and the second item to that of the peak’s right edge. Default is None, which will use the output from pybaselines.utils.optimize_window(), which is a reasonable starting value.
- decreasing : bool
If False (default), will iterate through window sizes from 1 to max_half_window. If True, will reverse the order and iterate from max_half_window to 1, which gives a smoother baseline.
’airpls’ :
- lam : float
The smoothing parameter. Larger values will create smoother baselines. Default is 1e6.
- Returns:
The baseline-corrected array.
- Return type:
np.ndarray
References
[1] Zhao, J., et al. Automated Autofluorescence Background Subtraction Algorithm for Biomedical Raman Spectroscopy, Applied Spectroscopy, 2007, 61(11), 1225-1232.
[2] Morháč, M. An algorithm for determination of peak regions and baseline elimination in spectroscopic data. Nuclear Instruments and Methods in Physics Research A, 2009, 600, 478-487.
[3] Zhang, Z.M., et al. Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst, 2010, 135(5), 1138-1146.
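As a rough illustration of what the ’snip’ option does, the sketch below implements only the core peak-clipping loop of SNIP in plain NumPy. It is a simplified stand-in and not the pybaselines implementation that baseline_correct relies on; the function name snip_baseline_sketch and the synthetic data are assumptions made for this example.

```python
import numpy as np


def snip_baseline_sketch(y, max_half_window):
    """Simplified SNIP-style peak clipping (illustrative only)."""
    baseline = np.asarray(y, dtype=float).copy()
    n = baseline.size
    # Iterate through window sizes from 1 to max_half_window (decreasing=False).
    for m in range(1, max_half_window + 1):
        if 2 * m >= n:
            break
        # Mean of the two points lying m samples to the left and to the right.
        neighbour_mean = 0.5 * (baseline[:-2 * m] + baseline[2 * m:])
        # Clip: keep whichever is lower, the current value or the neighbour mean.
        baseline[m:n - m] = np.minimum(baseline[m:n - m], neighbour_mean)
    return baseline


# Synthetic spectrum: a sloping baseline plus one Gaussian peak (sigma = 5 samples).
i = np.arange(200)
peak = 5.0 * np.exp(-((i - 100) ** 2) / (2 * 5.0 ** 2))
y = 0.01 * i + peak

baseline = snip_baseline_sketch(y, max_half_window=20)
corrected = y - baseline

# With pyfasma installed, the documented call would be:
# corrected = pyfasma.numpyfuncs.baseline_correct(y, kind='snip', max_half_window=20)
```

Here max_half_window=20 is chosen to be somewhat larger than half the visible peak width, in line with the (w-1)/2 guidance above.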
- pyfasma.numpyfuncs.crop(y: numpy.ndarray, x: numpy.ndarray, xrange=[None, None]) -> tuple
Crop x and y arrays to the specified x range.
- Parameters:
y (np.ndarray) – Array of y values.
x (np.ndarray) – Array of x points that the y values correspond to. Must be in ascending order.
xrange (list) – The x range to which both x and y are cropped. Must be a list of two values: xrange=[start, end]. The first value is the start and the second is the end of the crop range. If the start or end value is None, it is set to the first or last value of x, respectively.
- Returns:
The cropped x and y arrays.
- Return type:
tuple
- Raises:
ValueError – If elements of x are not in ascending order. If x and y arrays do not have the same length. If xrange list does not have two elements. If the first element of the xrange list is larger than the second element.
TypeError – If xrange is not a list.
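The cropping logic described above can be sketched with a boolean mask in NumPy. This is an approximation under stated assumptions, not pyfasma's code: the helper name crop_sketch is hypothetical, both range limits are assumed to be inclusive, and the return order (x first, then y) is assumed from the wording of the Returns entry.

```python
import numpy as np


def crop_sketch(y, x, xrange):
    """Crop x and y to xrange=[start, end] (limits assumed inclusive)."""
    if not isinstance(xrange, list):
        raise TypeError("xrange must be a list")
    if len(xrange) != 2:
        raise ValueError("xrange must have exactly two elements")
    x = np.asarray(x)
    y = np.asarray(y)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    if np.any(np.diff(x) < 0):
        raise ValueError("x must be in ascending order")
    # None means "use the first/last value of x".
    start = x[0] if xrange[0] is None else xrange[0]
    end = x[-1] if xrange[1] is None else xrange[1]
    if start > end:
        raise ValueError("start of xrange is larger than its end")
    mask = (x >= start) & (x <= end)
    return x[mask], y[mask]


x = np.arange(10.0)
y = x ** 2
x_cropped, y_cropped = crop_sketch(y, x, [2, 5])
x_tail, y_tail = crop_sketch(y, x, [7, None])
```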
- pyfasma.numpyfuncs.despike(y: numpy.ndarray, spikes_type='all', height=None, threshold=None, distance=None, prominence=None, width=None, wlen=None, rel_height=0.5, plateau_size=None) -> numpy.ndarray
Remove spikes from an array. The method allows for removal of positive, negative, or both positive and negative spikes and is based on scipy.signal.find_peaks.
- Parameters:
y (np.ndarray) – The input array.
spikes_type ({'pos', 'neg', 'all'}, optional) – The type of spikes to be removed based on their amplitude. If ‘pos’, only remove positive spikes, i.e. spikes with increased amplitude. If ‘neg’, only remove negative spikes, i.e. spikes with decreased amplitude. If ‘all’ (default), remove both positive and negative spikes.
height (number or ndarray or sequence, optional) – Required height of peaks. Either a number, None, an array matching y or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required height.
threshold (number or ndarray or sequence, optional) – Required threshold of peaks, the vertical distance to its neighboring samples. Either a number, None, an array matching y or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required threshold.
distance (number, optional) – Required minimal horizontal distance (>= 1) in samples between neighbouring peaks. Smaller peaks are removed first until the condition is fulfilled for all remaining peaks.
prominence (number or ndarray or sequence, optional) – Required prominence of peaks. Either a number, None, an array matching y or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required prominence.
width (number or ndarray or sequence, optional) – Required width of peaks in samples. Either a number, None, an array matching y or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required width.
wlen (int, optional) – Used for calculation of the peaks prominences, thus it is only used if one of the arguments prominence or width is given. See argument wlen in peak_prominences for a full description of its effects.
rel_height (float, optional) – Used for calculation of the peaks width, thus it is only used if width is given. See argument rel_height in peak_widths for a full description of its effects.
plateau_size (number or ndarray or sequence, optional) – Required size of the flat top of peaks in samples. Either a number, None, an array matching y or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required plateau size.
- Returns:
The despiked array.
- Return type:
np.ndarray
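To show how scipy.signal.find_peaks can drive spike removal, the sketch below locates positive spikes in y and negative spikes in -y, then replaces the flagged samples by linear interpolation between their unflagged neighbours. The replacement strategy and the helper name despike_sketch are assumptions for illustration; the source does not state how despike fills in the removed samples.

```python
import numpy as np
from scipy.signal import find_peaks


def despike_sketch(y, spikes_type="all", **peak_kwargs):
    """Flag spikes with find_peaks and interpolate over them (illustrative)."""
    if spikes_type not in ("pos", "neg", "all"):
        raise ValueError("spikes_type must be 'pos', 'neg' or 'all'")
    y = np.asarray(y, dtype=float)
    found = []
    if spikes_type in ("pos", "all"):
        found.append(find_peaks(y, **peak_kwargs)[0])
    if spikes_type in ("neg", "all"):
        # Negative spikes are peaks of the inverted signal.
        found.append(find_peaks(-y, **peak_kwargs)[0])
    spike_idx = np.unique(np.concatenate(found))
    keep = np.ones(y.size, dtype=bool)
    keep[spike_idx] = False
    idx = np.arange(y.size)
    out = y.copy()
    # Assumed strategy: linear interpolation across the flagged samples.
    out[~keep] = np.interp(idx[~keep], idx[keep], y[keep])
    return out


clean = np.linspace(0.0, 1.0, 101)
spiky = clean.copy()
spiky[50] += 10.0   # positive spike
spiky[20] -= 10.0   # negative spike

# threshold=5 keeps only samples that stand at least 5 units above both neighbours.
despiked_all = despike_sketch(spiky, spikes_type="all", threshold=5)
despiked_pos = despike_sketch(spiky, spikes_type="pos", threshold=5)
```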
- pyfasma.numpyfuncs.differentiate(y: numpy.ndarray, x=None, deriv=1, smoothing=None, params=None) -> numpy.ndarray
Apply differentiation to the input array.
- Parameters:
y (np.ndarray) – The data to be differentiated.
x (np.ndarray, optional) – The array to differentiate y with respect to. It is used to calculate dx (essentially the differences between successive x points) so that the derivative is correctly scaled. If None (default), dx is set to 1.
deriv (int) – Order of derivative to calculate.
smoothing (str, optional) –
Smoothing filter to apply to the derivative. Can be one of the following:
’savgol’: applies a Savitzky-Golay filter.
’movav’: applies a moving average filter.
’gauss’: applies a 1-D Gaussian filter.
If None (default), no filter will be applied.
params (list, optional) –
The list of parameters to be used for the filtering method. The parameters depend on the selection of smoothing.
smoothing=’savgol’: requires two parameters: window_length (int), polyorder (int). window_length is the length of the filter window. It must be less than or equal to the size of y. polyorder is the order of the polynomial used to fit the samples. It must be less than window_length.
smoothing=’movav’: requires one parameter: window_length (int). window_length is the length of the filter window. It must be less than or equal to the size of y. Essentially, the moving average filter is a Savitzky-Golay filter of polyorder=0.
smoothing=’gauss’: requires one parameter: sigma (scalar). sigma is the standard deviation of the Gaussian kernel.
- Returns:
The differentiated array.
- Return type:
np.ndarray
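The effect of x on the scaling of the derivative can be seen in a small sketch built on np.gradient, with an optional Savitzky-Golay pass from SciPy. Whether differentiate uses np.gradient internally is not stated in the source, so treat this as one plausible reading; the helper name differentiate_sketch is hypothetical and only the ’savgol’ smoothing branch is shown.

```python
import numpy as np
from scipy.signal import savgol_filter


def differentiate_sketch(y, x=None, deriv=1, smoothing=None, params=None):
    """Repeated finite-difference derivative with optional smoothing."""
    out = np.asarray(y, dtype=float)
    for _ in range(deriv):
        # Without x the sample spacing dx is taken as 1.
        out = np.gradient(out) if x is None else np.gradient(out, x)
    if smoothing == "savgol":
        window_length, polyorder = params
        out = savgol_filter(out, window_length, polyorder)
    return out


x = np.linspace(0.0, 2.0, 21)   # spacing dx = 0.1
y = x ** 2

dy_dx = differentiate_sketch(y, x)            # scaled by dx: approximates 2*x
dy_di = differentiate_sketch(y)               # dx = 1: approximates 2*x*0.1
smoothed = differentiate_sketch(y, x, smoothing="savgol", params=[5, 2])
```

The two results differ by exactly the sample spacing, which is why passing x matters whenever the derivative needs physical units.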
- pyfasma.numpyfuncs.get_index(x: numpy.ndarray, value: int | float, nearest=True) -> int
Get the index of an array that corresponds to a given value. If nearest is true, get the index of the value nearest to the value entered.
- Parameters:
x (np.ndarray) – The input array. Must be in ascending order.
value ((int, float)) – The value to get the index of.
nearest (bool, optional) – If True (default), get the index of the nearest value to value.
- Returns:
The index of the value.
- Return type:
int
- Raises:
ValueError – If array is not in ascending order. If value is not a number.
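A nearest-value lookup of this kind is commonly written with np.argmin over the absolute differences. The sketch below covers only the nearest=True case; the helper name get_index_sketch is hypothetical and the approach is an assumption about how such a lookup could work, not a description of pyfasma's internals.

```python
import numpy as np


def get_index_sketch(x, value):
    """Index of the element of x closest to value (nearest=True case only)."""
    x = np.asarray(x, dtype=float)
    if np.any(np.diff(x) < 0):
        raise ValueError("x must be in ascending order")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("value must be a number")
    return int(np.argmin(np.abs(x - value)))


x = np.array([0.0, 1.0, 2.5, 4.0])
idx_near = get_index_sketch(x, 2.2)   # closest element is 2.5
idx_exact = get_index_sketch(x, 4)    # exact match on the last element
```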
- pyfasma.numpyfuncs.integrate(y: numpy.ndarray, x: numpy.ndarray, xrange=None) -> int | float
Calculate the integral of an array y between the specified x range. The integral is calculated using Simpson’s rule.
- Parameters:
y (np.ndarray) – Array of y values.
x (np.ndarray) – Array of x points that y values correspond to. Must be in ascending order.
xrange (list, optional) – A list of two numbers ([start, end]) that specify the range of array x corresponding to the lower and upper limit of the integration, respectively. If xrange is None, or if its start or end value is None, the missing limit is set to the first or last value of x, respectively.
- Returns:
The integral of the y array.
- Return type:
int, float
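A range-limited Simpson integration can be sketched with scipy.integrate.simpson applied to the samples that fall inside xrange. This is an illustration under assumptions: the helper name integrate_sketch is hypothetical, and the limits are assumed to snap to existing sample points rather than being interpolated.

```python
import numpy as np
from scipy.integrate import simpson


def integrate_sketch(y, x, xrange=None):
    """Simpson's rule over the samples of x lying inside xrange."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    start, end = (None, None) if xrange is None else xrange
    # None means "use the first/last value of x".
    start = x[0] if start is None else start
    end = x[-1] if end is None else end
    mask = (x >= start) & (x <= end)
    return float(simpson(y[mask], x=x[mask]))


x = np.linspace(0.0, 3.0, 7)    # 0, 0.5, ..., 3
y = x ** 2

area_full = integrate_sketch(y, x)            # exact integral of x^2 on [0, 3] is 9
area_part = integrate_sketch(y, x, [1, 2])    # exact integral on [1, 2] is 7/3
```

Simpson's rule is exact for polynomials up to third degree, so both results match the analytical values to floating-point precision.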
- pyfasma.numpyfuncs.interpolate(y: numpy.ndarray, x: numpy.ndarray, xnew: numpy.ndarray, kind='cubic') -> numpy.ndarray
Interpolate an array y, sampled at the points of an array x, onto the points of an array xnew.
- Parameters:
y (np.ndarray) – Array of y values to be interpolated.
x (np.ndarray) – Array of x points that the y values correspond to. Must be in ascending order.
xnew (np.ndarray) – Array of new x points at which y is interpolated. Must be in ascending order.
kind ({'linear', 'cubic'}, optional) – The type of interpolation to be applied. If ‘linear’, a linear function is used to estimate the interpolated values of y. If ‘cubic’ (default), a third-degree polynomial is used to estimate the interpolated values of y. ‘linear’ is faster but may deviate slightly from the original points (usually negligibly). ‘cubic’ is more accurate but more computationally expensive.
- Returns:
The interpolated array.
- Return type:
np.ndarray
- Raises:
ValueError – If the elements of x and xnew arrays are not in ascending order. If the range of xnew is outside the range of x.
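The trade-off between the two kinds can be illustrated with scipy.interpolate.interp1d, which accepts the same ’linear’ and ’cubic’ names. Whether interpolate is built on interp1d is not stated in the source; the helper name interpolate_sketch and the test signal are assumptions for this example.

```python
import numpy as np
from scipy.interpolate import interp1d


def interpolate_sketch(y, x, xnew, kind="cubic"):
    """Resample y from the grid x onto the grid xnew."""
    x = np.asarray(x, dtype=float)
    xnew = np.asarray(xnew, dtype=float)
    if np.any(np.diff(x) <= 0) or np.any(np.diff(xnew) < 0):
        raise ValueError("x and xnew must be in ascending order")
    if xnew[0] < x[0] or xnew[-1] > x[-1]:
        raise ValueError("xnew must lie within the range of x")
    return interp1d(x, y, kind=kind)(xnew)


x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)
xnew = np.linspace(0.1, 6.0, 200)

y_cubic = interpolate_sketch(y, x, xnew, kind="cubic")
y_linear = interpolate_sketch(y, x, xnew, kind="linear")

# Maximum deviation from the true function for each kind.
err_cubic = np.max(np.abs(y_cubic - np.sin(xnew)))
err_linear = np.max(np.abs(y_linear - np.sin(xnew)))
```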
- pyfasma.numpyfuncs.normalize(y: numpy.ndarray, kind='intensity', x=None, xrange=None, xval=None) -> numpy.ndarray
Normalize a y array.
- Parameters:
y (np.ndarray) – Array of y values.
kind ({'intensity', 'area', 'l1', 'l2', 'minmax', 'mad'}, optional) –
The type of normalization to use.
’intensity’: Normalize the array to the maximum y value within a range of the x array (xrange), or to the y value at a single x value (xval), depending on which of the two parameters is set. This option requires a sorted array x that the y array values correspond to.
’area’: Normalize the array to the value of the integral between the specified range of an array x. The option requires a sorted array x that the y array values correspond to.
’l1’: Normalize the array using the L1 norm. For a one-dimensional array (vector) y, the L1 norm is given by the sum of the absolute values of the array’s elements.
’l2’: Normalize the array using the L2 norm (also known as Euclidean norm or unit vector normalization). For a one-dimensional array (vector) y, the L2 norm is given by the square root of the sum of the squared elements of the array.
’minmax’: Normalize the values of the array to be in a range between 0 and 1. This is achieved by subtracting the minimum value of the array from all values and dividing each value by the difference of the maximum and minimum values of the array.
’mad’: Normalize the values of the array by the mean absolute deviation (MAD) of the array. MAD is given by:
\[\text{MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \mu|\]
where \(x_i\) is each data point, \(\mu\) is the mean of the data, and \(n\) is the number of data points.
x (np.ndarray, optional) – Array of x points that y values correspond to. Must be in ascending order. Required if kind=’intensity’ or kind=’area’ is used.
xrange (list, optional) –
This option has slightly different meaning depending on whether kind=’intensity’ or kind=’area’ is used. It has no effect in all other cases.
If used with kind=’intensity’: A list of two numbers ([start, end]) that specifies the range of array x over which the maximum value of the y array is taken as the normalization coefficient. If the start or end value is None, it is set to the first or last value of x, respectively. Note that xrange and xval are mutually exclusive, so at most one of them may be set.
If used with kind=’area’: A list of two numbers ([start, end]) that specify the range of array x that corresponds to the lower and upper limit of the integration, respectively. If the start or end values are None, they are set equal to the first and last value of x, respectively.
xval ((int, float), optional) – The value of x whose corresponding y array value is used as the normalization coefficient. If xval is not exactly equal to a value of the x array, the x array value immediately preceding xval is used. Note that xrange and xval are mutually exclusive, so at most one of them may be set. This option has an effect only if kind=’intensity’ is used.
- Returns:
The normalized y array.
- Return type:
np.ndarray
- Raises:
ValueError – If using kind=’intensity’ and x is None. If using kind=’area’ and x is None.
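The four normalizations that do not need an x array follow directly from the definitions above and can be written out in NumPy. The helper name normalize_sketch is hypothetical, and the ’intensity’ and ’area’ kinds are left out of this sketch.

```python
import numpy as np


def normalize_sketch(y, kind):
    """Normalizations that depend only on y, as defined in the text above."""
    y = np.asarray(y, dtype=float)
    if kind == "l1":
        return y / np.sum(np.abs(y))                 # sum of absolute values
    if kind == "l2":
        return y / np.sqrt(np.sum(y ** 2))           # Euclidean norm
    if kind == "minmax":
        return (y - y.min()) / (y.max() - y.min())   # rescale to [0, 1]
    if kind == "mad":
        return y / np.mean(np.abs(y - y.mean()))     # mean absolute deviation
    raise ValueError(f"unsupported kind: {kind!r}")


y = np.array([1.0, 2.0, 3.0, 4.0])

y_l1 = normalize_sketch(y, "l1")
y_l2 = normalize_sketch(y, "l2")
y_minmax = normalize_sketch(y, "minmax")
y_mad = normalize_sketch(y, "mad")   # MAD of [1, 2, 3, 4] is 1, so y is unchanged
```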
- pyfasma.numpyfuncs.smooth(y: numpy.ndarray, params: list, kind='savgol') -> numpy.ndarray
Apply a smoothing filter to the input array.
- Parameters:
y (np.ndarray) – The data to be filtered.
params (list) –
The list of parameters to be used for the filtering method. The parameters depend on the selection of kind.
’savgol’: requires two parameters: window_length (int), polyorder (int). window_length is the length of the filter window. It must be less than or equal to the size of y. polyorder is the order of the polynomial used to fit the samples. It must be less than window_length.
’movav’: requires one parameter: window_length (int). window_length is the length of the filter window. It must be less than or equal to the size of y. Essentially, the moving average filter is a Savitzky-Golay filter of polyorder=0.
’gauss’: requires one parameter: sigma (scalar). sigma is the standard deviation of the Gaussian kernel.
kind ({'savgol', 'movav', 'gauss'}, optional) –
The type of smoothing filter to be applied to the data.
’savgol’: applies a Savitzky-Golay filter.
’movav’: applies a moving average filter.
’gauss’: applies a 1-D Gaussian filter.
- Raises:
ValueError – If kind is not one of the accepted values.
ValueError – If the length of the params list is not right. Depends on the selection of kind.
- Returns:
The filtered data.
- Return type:
np.ndarray
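The mapping from kind and params to a concrete filter can be sketched with SciPy's savgol_filter and gaussian_filter1d. This is a guess at a reasonable implementation rather than pyfasma's actual code: the helper name smooth_sketch is hypothetical, and the moving average is written as a Savitzky-Golay filter with polyorder=0, as the description above suggests.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import savgol_filter

# Number of entries expected in params for each kind.
_N_PARAMS = {"savgol": 2, "movav": 1, "gauss": 1}


def smooth_sketch(y, params, kind="savgol"):
    """Dispatch to a smoothing filter based on kind (illustrative)."""
    if kind not in _N_PARAMS:
        raise ValueError(f"kind must be one of {sorted(_N_PARAMS)}")
    if len(params) != _N_PARAMS[kind]:
        raise ValueError(f"{kind!r} expects {_N_PARAMS[kind]} parameter(s)")
    y = np.asarray(y, dtype=float)
    if kind == "savgol":
        window_length, polyorder = params
        return savgol_filter(y, window_length, polyorder)
    if kind == "movav":
        # Moving average expressed as a Savitzky-Golay filter of order 0.
        return savgol_filter(y, params[0], 0)
    return gaussian_filter1d(y, params[0])


rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 200)
clean = np.sin(t)
noisy = clean + 0.2 * rng.normal(size=t.size)

y_savgol = smooth_sketch(noisy, [11, 2], kind="savgol")
y_movav = smooth_sketch(noisy, [5], kind="movav")
y_gauss = smooth_sketch(noisy, [2.0], kind="gauss")
```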