Preprocessing¶
Raman data typically require preprocessing prior to analysis to ensure consistency and comparability across spectra. Common preprocessing steps include cropping, despiking, smoothing, baseline correction, and normalization. These operations help to reduce noise, correct artifacts, and standardize the spectral data.
In PyFasma, all preprocessing functions are implemented in the numpyfuncs module and made available for use on pandas DataFrames through the dffuncs module. These functions can be accessed via the .pyfasma accessor attached to the DataFrames. The general syntax for applying a preprocessing method to a DataFrame df is:
df.pyfasma.<method>
where <method> is the preprocessing method to be applied.
To enable the .pyfasma accessor, first import the dffuncs module:
from pyfasma import dffuncs
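For readers unfamiliar with pandas accessors, the sketch below shows the public pandas hook that lets an import attach a namespace to every DataFrame. It is not PyFasma's code: the accessor name demo and the method double are hypothetical, and that PyFasma registers .pyfasma in this way is an assumption.

```python
import pandas as pd


# "demo" and "double" are hypothetical names used only for illustration.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is attached to
        self._df = pandas_obj

    def double(self):
        """Return a new DataFrame with every value doubled."""
        return self._df * 2


df = pd.DataFrame({"s1": [1.0, 2.0], "s2": [3.0, 4.0]}, index=[400.0, 401.0])
doubled = df.demo.double()  # the original df is left unchanged
```

Because the class is registered when its module is executed, the accessor only becomes available after the import has run.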
Note
The DataFrame must be structured with Raman shift values as the index and spectra as columns.
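As an illustration of this layout, the following sketch builds a small DataFrame from synthetic data. The column names, the index name, and the Gaussian bands are made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic Raman shift axis in cm^-1 (2 cm^-1 spacing); illustrative values only.
shift = np.linspace(200.0, 2000.0, 901)


def band(center, width, height):
    """Gaussian band evaluated on the shared shift axis."""
    return height * np.exp(-0.5 * ((shift - center) / width) ** 2)


# Rows are Raman shift values (the index); each column is one spectrum.
df = pd.DataFrame(
    {
        "sample_1": band(1000.0, 10.0, 100.0),
        "sample_2": band(1450.0, 15.0, 80.0),
    },
    index=pd.Index(shift, name="raman_shift"),
)
```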
Cropping¶
Cropping is typically the first preprocessing step and involves selecting a specific spectral range of interest by removing values outside a defined wavenumber interval. This step reduces data size and focuses the analysis on the most informative regions, such as the fingerprint or high-wavenumber regions in Raman spectra.
In PyFasma, cropping is performed using the df.pyfasma.crop method, available through the .pyfasma accessor. The function selects only the rows (wavenumbers) within the specified range.
Example¶
The following command can be used to crop the spectra, retaining only the region between 400 and 1800 cm⁻¹:
df_cropped = df.pyfasma.crop(xrange=[400, 1800])
The method is non-destructive and returns a new DataFrame containing only the selected spectral region.
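Conceptually, cropping amounts to a row selection on the index. The plain pandas sketch below shows an equivalent selection on synthetic data; it is not PyFasma's implementation, and treating both endpoints as inclusive is an assumption.

```python
import numpy as np
import pandas as pd

shift = np.arange(200.0, 2001.0, 1.0)  # 200 ... 2000 cm^-1
df = pd.DataFrame({"s1": np.ones_like(shift)}, index=shift)

lo, hi = 400, 1800
# Boolean mask over the index; endpoints assumed inclusive
mask = (df.index >= lo) & (df.index <= hi)
df_cropped = df.loc[mask]  # new DataFrame; df itself is not modified
```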
Despiking¶
Despiking removes sharp, isolated intensity artifacts from Raman spectra, often caused by cosmic rays or detector noise. These spikes can interfere with baseline correction, normalization, and subsequent analysis, so they are typically removed early in the preprocessing pipeline.
In PyFasma, despiking is performed using the df.pyfasma.despike method, available through the .pyfasma accessor. Internally, it uses scipy.signal.find_peaks() to detect and remove peaks based on customizable parameters such as height, prominence, and width.
Examples¶
The following command removes both positive and negative spikes using default settings:
df_despiked = df.pyfasma.despike()
To only remove strong positive spikes with a minimum height of 500:
df_despiked = df.pyfasma.despike(spikes_type="pos", height=500)
The method is non-destructive and returns a new DataFrame with spikes removed from each spectrum (column).
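To give a feel for how scipy.signal.find_peaks() can separate a spike from a genuine band, here is a minimal sketch on one synthetic spectrum. The thresholds and the repair step (linear interpolation over the flagged points) are assumptions chosen for the example, not PyFasma's actual logic, and only positive spikes are handled.

```python
import numpy as np
from scipy.signal import find_peaks

x = np.arange(400.0, 1800.0, 1.0)
y = 100.0 * np.exp(-0.5 * ((x - 1000.0) / 20.0) ** 2)  # broad Raman-like band
y[700] += 800.0  # single-point spike at x = 1100

# A spike is tall but narrow: require high prominence and a small width.
spike_idx, _ = find_peaks(y, prominence=200.0, width=(None, 3))

# Flag each spike plus one neighbour on either side, then interpolate across.
good = np.ones(y.size, dtype=bool)
for i in spike_idx:
    good[max(i - 1, 0): i + 2] = False

y_clean = y.copy()
y_clean[~good] = np.interp(x[~good], x[good], y[good])
```

The broad band at 1000 is left alone because its prominence is below the threshold and its width is far larger than three samples.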
See full parameter list and details
For more control over the spike detection criteria, refer to the full method documentation:
pyfasma.dffuncs.PyfasmaAccessor.despike()
Smoothing¶
Smoothing reduces noise in Raman spectra by averaging signal fluctuations while preserving important features like peaks. It is a common preprocessing step before baseline correction or peak analysis, especially when spectra exhibit high-frequency noise.
In PyFasma, smoothing is performed using the df.pyfasma.smooth method, available through the .pyfasma accessor. The function supports several smoothing algorithms, including Savitzky-Golay, moving average, and Gaussian filtering. The smoothing behavior is controlled by the kind and params arguments.
Examples¶
Apply a Savitzky-Golay filter with a window length of 11 and a polynomial order of 3:
df_smooth = df.pyfasma.smooth(params=[11, 3], kind="savgol")
Apply a simple moving average with a window length of 15:
df_smooth = df.pyfasma.smooth(params=[15], kind="movav")
Apply Gaussian smoothing with a standard deviation of 2.5:
df_smooth = df.pyfasma.smooth(params=[2.5], kind="gauss")
The method is non-destructive and returns a new DataFrame with smoothed spectra. The choice of filter and its parameters should depend on the level and nature of noise in your data.
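To illustrate what two of these filters do numerically, the sketch below implements a moving average and a Gaussian filter with NumPy convolution on a synthetic noisy band. The helper names, the kernel truncation at four standard deviations, and the edge handling are assumptions made for the example and may differ from PyFasma's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(400.0, 1800.0, 1.0)
clean = 100.0 * np.exp(-0.5 * ((x - 1000.0) / 20.0) ** 2)
noisy = clean + rng.normal(0.0, 5.0, x.size)


def moving_average(y, window):
    """Boxcar filter: every point becomes the mean of its neighbourhood."""
    kernel = np.ones(window) / window
    # mode="same" keeps the length but zero-pads, so edge values are attenuated
    return np.convolve(y, kernel, mode="same")


def gaussian_smooth(y, sigma):
    """Gaussian filter with the kernel truncated at 4 sigma (in samples)."""
    half = int(np.ceil(4 * sigma))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()  # unit sum so the overall intensity is preserved
    return np.convolve(y, kernel, mode="same")


y_movav = moving_average(noisy, 15)
y_gauss = gaussian_smooth(noisy, 2.5)
```

Wider windows or larger standard deviations suppress more noise but also broaden and lower narrow bands, which is the trade-off to keep in mind when choosing parameters.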
See full parameter list and details
For all supported smoothing methods and their required parameters, refer to the full method documentation:
pyfasma.dffuncs.PyfasmaAccessor.smooth()
Baseline Correction¶
Baseline correction removes background signal from Raman spectra, often caused by fluorescence or system artifacts. This step is essential for accurate peak detection and quantification.
In PyFasma, baseline correction is performed using the df.pyfasma.baseline_correct method, available through the .pyfasma accessor. Internally, it leverages the pybaselines library by Donald Erb and supports several popular algorithms, including IModPoly [1], SNIP [2], and airPLS [3].
Examples¶
Apply the default SNIP algorithm:
df_corrected = df.pyfasma.baseline_correct()
Use the IModPoly method with a second-order polynomial:
df_corrected = df.pyfasma.baseline_correct(kind="imodpoly", poly_order=2)
Apply the airPLS method with a smoother baseline (higher lam):
df_corrected = df.pyfasma.baseline_correct(kind="airpls", lam=1e7)
The method is non-destructive and returns a new DataFrame with the estimated baseline removed from each spectrum. By default, PyFasma also vertically shifts the baseline to avoid clipping the signal (zero_correction=True), ensuring it remains below the input spectrum.
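The idea behind polynomial baseline methods can be sketched in a few lines of NumPy. The function below is a simplified iterative polynomial fit in the style of ModPoly, in which points lying above the current fit are clipped to it before refitting. It is neither the IModPoly variant nor the pybaselines implementation, and the final downward shift is only one reading of what a zero correction might do. All names and data are illustrative.

```python
import numpy as np

x = np.linspace(400.0, 1800.0, 701)
true_baseline = 50.0 + 0.02 * (x - 400.0) + 1e-5 * (x - 400.0) ** 2
band = 100.0 * np.exp(-0.5 * ((x - 1000.0) / 15.0) ** 2)
y = true_baseline + band


def modpoly_baseline(x, y, poly_order=2, max_iter=200, tol=1e-6):
    """Simplified ModPoly-style fit: clip points above the fit, then refit."""
    # Rescale x to [-1, 1] so the polynomial fit is well conditioned
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    work = y.copy()
    fit = work
    for _ in range(max_iter):
        coeffs = np.polyfit(xs, work, poly_order)
        fit = np.polyval(coeffs, xs)
        clipped = np.minimum(work, fit)  # peaks are pulled down to the fit
        if np.linalg.norm(clipped - work) <= tol * np.linalg.norm(work):
            break
        work = clipped
    return fit


baseline = modpoly_baseline(x, y, poly_order=2)
# Shift the baseline down, if needed, so it never exceeds the spectrum
baseline = baseline - max(np.max(baseline - y), 0.0)
corrected = y - baseline
```

After the shift, the corrected spectrum is non-negative everywhere, which avoids clipping weak bands that sit close to the estimated baseline.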
See full parameter list and details
Each algorithm accepts its own set of fine-tuning parameters. For advanced usage, refer to the full method documentation:
pyfasma.dffuncs.PyfasmaAccessor.baseline_correct()
References
[1] Zhao, J., et al. Automated Autofluorescence Background Subtraction Algorithm for Biomedical Raman Spectroscopy, Applied Spectroscopy, 2007, 61(11), 1225-1232.
[2] Morháč, M. An algorithm for determination of peak regions and baseline elimination in spectroscopic data. Nuclear Instruments and Methods in Physics Research A, 2009, 600(2), 478-487.
[3] Zhang, Z.M., et al. Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst, 2010, 135(5), 1138-1146.
Normalization¶
Normalization scales Raman spectra to make them comparable across samples, regardless of their absolute intensities. This is essential for preprocessing pipelines where differences in overall intensity could obscure relevant spectral features.
In PyFasma, normalization is performed using the df.pyfasma.normalize method, available through the .pyfasma accessor. Multiple normalization strategies are supported, such as max-intensity, area under curve, L1/L2 norms, min-max scaling, and mean absolute deviation (MAD). For intensity and area-based normalization, the spectral x-axis must be defined.
Examples¶
Normalize each spectrum by its maximum value:
df_norm = df.pyfasma.normalize(kind="intensity")
Normalize based on the area under the curve between 100 and 1800 cm⁻¹:
df_norm = df.pyfasma.normalize(kind="area", xrange=[100, 1800])
Normalize each spectrum using the L2 norm (Euclidean norm):
df_norm = df.pyfasma.normalize(kind="l2")
The method is non-destructive and returns a new DataFrame with normalized columns.
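The arithmetic behind three of these strategies can be sketched directly with NumPy and pandas. The example below uses two synthetic spectra that differ only in overall intensity; it mirrors the general definitions (maximum, trapezoidal area, Euclidean norm) and is not taken from PyFasma's source.

```python
import numpy as np
import pandas as pd

shift = np.linspace(100.0, 1800.0, 171)  # 10 cm^-1 spacing
shape = np.exp(-0.5 * ((shift - 1000.0) / 50.0) ** 2)
# Same spectral shape recorded at two very different intensities
df = pd.DataFrame({"s1": 10.0 * shape, "s2": 250.0 * shape}, index=shift)

# Max-intensity: divide each column by its maximum
df_max = df / df.max()

# Area: divide each column by its trapezoidal integral over the index
x = df.index.to_numpy()
vals = df.to_numpy()
area = (0.5 * (vals[1:] + vals[:-1]) * np.diff(x)[:, None]).sum(axis=0)
df_area = df / area

# L2: divide each column by its Euclidean norm
df_l2 = df / np.linalg.norm(vals, axis=0)
```

In all three cases the two columns become identical after normalization, which is the sense in which the spectra are made comparable.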
See full parameter list and details
For all normalization options, including those requiring a spectral axis (x), refer to the full method documentation:
pyfasma.dffuncs.PyfasmaAccessor.normalize()
Interpolation¶
Interpolation resamples Raman spectra to a new set of x-values. This is often necessary when combining spectra measured at slightly different spectral points, or when aligning data onto a common wavenumber axis for downstream analysis. Interpolation estimates the y-values (intensities) at new x-locations using a mathematical function fitted to the original data.
In PyFasma, interpolation is performed using the df.pyfasma.interpolate method, available through the .pyfasma accessor. It supports linear and cubic interpolation methods, depending on the desired trade-off between speed and smoothness.
Examples¶
Interpolate to a new axis with 1-unit steps using cubic interpolation (default):
import numpy as np
new_x = np.arange(100, 1800, 1)
df_interp = df.pyfasma.interpolate(xnew=new_x)
Use linear interpolation for faster performance:
df_interp = df.pyfasma.interpolate(xnew=new_x, kind="linear")
The method is non-destructive and returns a new DataFrame interpolated to the values in xnew.
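For the linear case, the resampling step can be sketched with numpy.interp applied column by column. This is a plain NumPy illustration on synthetic data, not PyFasma's implementation; cubic interpolation would need a different routine.

```python
import numpy as np
import pandas as pd

# Spectra sampled every 2.5 cm^-1 (np.interp needs increasing x-values)
old_x = np.array([100.0, 102.5, 105.0, 107.5, 110.0])
df = pd.DataFrame({"s1": 2.0 * old_x, "s2": old_x ** 2}, index=old_x)

# Target axis with 1 cm^-1 steps, inside the measured range
new_x = np.arange(100.0, 111.0, 1.0)

# Linearly interpolate every spectrum onto the new axis
df_interp = pd.DataFrame(
    {col: np.interp(new_x, old_x, df[col].to_numpy()) for col in df.columns},
    index=new_x,
)
```

Keeping the new axis inside the measured range matters here, since numpy.interp holds the end values constant outside it rather than extrapolating.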
See full parameter list and details
For additional options, refer to the full method documentation:
pyfasma.dffuncs.PyfasmaAccessor.interpolate()