fileio module

pyfasma.fileio.load_csvs(path: str, exclude_files=None, exclude_dirs=None, delimiter=';', header='infer', names=None, index_col=None, usecols=None, skiprows=None, nrows=None, **kwargs) → list

Recursively load multiple CSV files and return them as a list of DataFrames.

The function uses pandas.read_csv under the hood, and most keyword arguments are passed directly to it.

Parameters:
  • path (str) – Path to the directory containing the CSV files.

  • exclude_files (list, optional) – Files to exclude from the result. Each filename must be specified as a string and include its extension. The parameter can also contain nested lists of files to exclude.

  • exclude_dirs (list, optional) – Directories to exclude from the result. Each directory name must be specified as a string. The parameter can also contain nested lists of directories to exclude.

  • delimiter (str, default ';') – Character or regex pattern to treat as the delimiter. If delimiter=None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator from only the first valid row of the file by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

  • header (int, Sequence of int, "infer" or None) – Row number(s) containing column labels and marking the start of the data (zero-indexed). Default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly to names, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a MultiIndex on the columns, e.g. [0, 1, 3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

  • names (Sequence of Hashable, optional) – Sequence of column labels to apply to each DataFrame in the list. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

  • index_col (Hashable, Sequence of Hashable or False, optional) – Column(s) to use as row label(s) for each DataFrame in the list, denoted either by column labels or column indices. If a sequence of labels or indices is given, MultiIndex will be formed for the row labels. Note: index_col=False can be used to force pandas to not use the first column as the index, e.g., when you have a malformed file with delimiters at the end of each line.

  • usecols (list of Hashable or Callable, optional) – Subset of columns to select for each DataFrame in the list, denoted either by column labels or column indices. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If names are given, the document header row(s) are not taken into account. For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order. If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

  • skiprows (int, list of int or Callable, optional) – Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of each file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

  • nrows (int, optional) – Number of rows of file to read. Useful for reading pieces of large files.

  • **kwargs – Additional keyword arguments passed to pandas.read_csv.

Returns:

List of DataFrames.

Return type:

list
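
A minimal usage sketch based on the signature above; the directory layout and filenames below are placeholders, not part of the package:

    from pyfasma import fileio

    # Recursively read every CSV under "data/raman", skipping one calibration
    # file and a scratch directory; the first column of each file becomes the index.
    dfs = fileio.load_csvs(
        "data/raman",
        exclude_files=["calibration.csv"],
        exclude_dirs=["scratch"],
        delimiter=";",
        index_col=0,
    )
    print(len(dfs))        # number of CSV files loaded
    print(dfs[0].head())   # first loaded DataFrame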

pyfasma.fileio.merge_csvs(path, exclude_files=None, exclude_dirs=None, delimiter=';', header='infer', names=None, index_col=None, usecols=None, skiprows=None, nrows=None, columns='filenames', how='inner', interpolate=False, xnew=None, kind='cubic', **kwargs) → pandas.DataFrame

Merge the CSV files found in the specified path into a single DataFrame.

Parameters:
  • path (str) – Path to the directory containing the CSV files.

  • exclude_files (list, optional) – Files to exclude from the result. Each filename must be specified as a string and include its extension. The parameter can also contain nested lists of files to exclude.

  • exclude_dirs (list, optional) – Directories to exclude from the result. Each directory name must be specified as a string. The parameter can also contain nested lists of directories to exclude.

  • delimiter (str, default ';') – Character or regex pattern to treat as the delimiter. If delimiter=None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator from only the first valid row of the file by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

  • header (int, Sequence of int, "infer" or None) – Row number(s) containing column labels and marking the start of the data (zero-indexed). Default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly to names, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a MultiIndex on the columns, e.g. [0, 1, 3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

  • names (Sequence of Hashable, optional) – Sequence of column labels to apply to each DataFrame in the list. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

  • index_col (Hashable, Sequence of Hashable or False, optional) – Column(s) to use as row label(s) for each DataFrame in the list, denoted either by column labels or column indices. If a sequence of labels or indices is given, MultiIndex will be formed for the row labels. Note: index_col=False can be used to force pandas to not use the first column as the index, e.g., when you have a malformed file with delimiters at the end of each line.

  • usecols (list of Hashable or Callable, optional) – Subset of columns to select for each DataFrame in the list, denoted either by column labels or column indices. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If names are given, the document header row(s) are not taken into account. For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order. If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

  • skiprows (int, list of int or Callable, optional) – Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of each file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

  • nrows (int, optional) – Number of rows of file to read. Useful for reading pieces of large files.

  • columns (list of str or "filenames" (default)) – List of names for the columns of the pandas DataFrames. If “filenames” is used, the filenames of the CSV files without the extension will be used as column names. If None, the column names will not be changed.

  • how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'inner') – Type of merge to be performed.
      - left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
      - right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
      - outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
      - inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
      - cross: creates the cartesian product from both frames, preserves the order of the left keys.

  • interpolate (bool, default False) – If True, interpolate the columns of the DataFrame at the points of the array xnew. The columns are interpolated as functions of the DataFrame's index, and xnew becomes the new index. Both the index of the DataFrame and xnew must contain numerical values in ascending order.

  • xnew (np.ndarray, optional) – Array of new x points at which the DataFrame's columns are interpolated. Must be in ascending order. This parameter only has effect if interpolate=True.

  • kind ({'linear', 'cubic'}, default 'cubic') – The type of interpolation to be applied. If 'linear', a linear function is used to estimate the interpolated values of y. If 'cubic' (default), a 3rd degree polynomial is used to estimate the interpolated values of y. 'linear' is faster but may have small deviations from the original points (usually negligible); 'cubic' is more accurate but more computationally expensive. This parameter only has effect if interpolate=True.

  • **kwargs – Additional keyword arguments are passed to pandas.read_csv.

Returns:

A single DataFrame containing the merged data from all loaded CSV files.

Return type:

DataFrame
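
A minimal usage sketch combining merging with interpolation onto a common axis; the paths and wavenumber range below are placeholders:

    import numpy as np
    from pyfasma import fileio

    # Merge all CSVs under "data/raman" into one DataFrame, naming each column
    # after its source file, then interpolate every column onto a shared grid.
    xnew = np.arange(400.0, 1800.0, 1.0)   # hypothetical common x-axis
    df = fileio.merge_csvs(
        "data/raman",
        index_col=0,           # first column of each file provides the x values
        columns="filenames",   # column names taken from the CSV filenames
        how="outer",           # keep the union of x values from all files
        interpolate=True,
        xnew=xnew,
        kind="cubic",
    )
    print(df.shape)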

pyfasma.fileio.spc2csv(path: str, output_dir=None, exclude_files=None, exclude_dirs=None, keep_structure=True, suppress_output=True, preview=True, apply=False, delimiter=';', newline='\n') → None

Convert SPC files to CSV.

Parameters:
  • path (str) – If the specified path is an SPC file, convert it to CSV. If the specified path is a directory, recursively convert all SPC files inside it to CSV.

  • output_dir (str, optional) – Path to directory to save the converted files. If None (default), the files will be saved in the base directory of path.

  • exclude_files (list, optional) – Files to exclude from the conversion. Each filename must be specified as a string and include its extension. The parameter can also contain nested lists of files to exclude.

  • exclude_dirs (list, optional) – Directories to exclude from the conversion. Each directory name must be specified as a string. The parameter can also contain nested lists of directories to exclude.

  • keep_structure (bool, default True) – If True (default), the converted files preserve the directory structure of the original files.

  • suppress_output (bool, default True) – The spc module by default displays some information about the structure of each SPC file. If True (default), this output is suppressed to keep the focus on the file conversion taking place.

  • preview (bool, default True) – If True (default), preview the file conversion.

  • apply (bool, default False) – If True, the file conversion operation will be applied.

  • delimiter (str, default ';') – Character or regex pattern to treat as the delimiter. If delimiter=None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator from only the first valid row of the file by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

  • newline (str, default '\n') – Character to be used as the newline character in the CSV file.

Returns:

This function does not return a value.

Return type:

None
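
A minimal usage sketch of the preview-then-apply workflow; the directory names below are placeholders:

    from pyfasma import fileio

    # Preview which SPC files would be converted (apply defaults to False,
    # so no files are written yet).
    fileio.spc2csv("data/spc", output_dir="data/csv", preview=True)

    # Apply the conversion, mirroring the original directory structure
    # under the output directory.
    fileio.spc2csv(
        "data/spc",
        output_dir="data/csv",
        keep_structure=True,
        apply=True,
        delimiter=";",
        newline="\n",
    )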