modeling module
- class pyfasma.modeling.PCA(df, n_components=10, summary=True, hue=None, **kwargs)
Bases: object
Principal Component Analysis (PCA) implementation for dimensionality reduction, feature extraction, and data visualization.
This class performs PCA on the given dataset, transforming it into a set of uncorrelated principal components. These components can be used for reducing the dimensionality of the data, extracting important features, and visualizing high-dimensional data. The class is based on sklearn.decomposition.PCA.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
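The claim that the principal components are uncorrelated can be checked with a minimal NumPy sketch of PCA computed from the singular value decomposition of the mean-centred data. This is a generic illustration, not pyfasma's code. The function name pca_svd and the toy data are hypothetical.

```python
import numpy as np


def pca_svd(X, n_components):
    """Hypothetical minimal PCA: SVD of the mean-centred data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]  # principal axes, one per row
    scores = Xc @ components.T  # projections, shape (n_samples, n_components)
    explained_var = S**2 / (X.shape[0] - 1)  # variance along each axis
    ratio = explained_var / explained_var.sum()  # explained variance ratio
    return scores, components, ratio[:n_components]


rng = np.random.default_rng(0)
# Toy data with correlated features: 200 samples x 5 features
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
scores, components, ratio = pca_svd(X, n_components=3)

# Off-diagonal covariances of the scores vanish (up to rounding): the PCs are uncorrelated
cov = np.cov(scores, rowvar=False)
print(np.round(cov, 6))
print(ratio)  # non-increasing explained variance ratios
```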
- Parameters:
df (pd.DataFrame) – The input data for PCA of shape (n_samples, n_features).
n_components ({int, float, 'mle', None}, optional) –
Number of components to keep. Default is 10. If n_components is None, all components are kept:
n_components == min(n_samples, n_features)
If n_components == ‘mle’ and svd_solver == ‘full’, Minka’s MLE is used to guess the dimension. Use of n_components == ‘mle’ will interpret svd_solver == ‘auto’ as svd_solver == ‘full’.
If 0 < n_components < 1 and svd_solver == ‘full’, select the smallest number of components whose cumulative explained variance is greater than the fraction specified by n_components.
If svd_solver == ‘arpack’, the number of components must be strictly less than the minimum of n_features and n_samples.
Hence, the None case results in:
n_components == min(n_samples, n_features) - 1
summary (bool, optional) – If True (default), prints a summary of the variance explained by each principal component.
hue (str or None, optional) – A column name in df used for color encoding in plots. This is typically used for categorical variables. Default is None.
**kwargs (keyword arguments, optional) –
Additional keyword arguments passed to sklearn.decomposition.PCA.
See: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
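Assuming n_components and any **kwargs (such as svd_solver) are passed unchanged to sklearn.decomposition.PCA, their semantics can be checked directly in scikit-learn. The sketch below, on hypothetical random data, contrasts None with a fractional value.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Toy data with correlated features: 100 samples x 8 features
X = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))

# None keeps all components: min(n_samples, n_features) == 8
pca_all = PCA(n_components=None, svd_solver="full").fit(X)
print(pca_all.n_components_)  # 8

# 0 < n_components < 1 with svd_solver="full": keep the smallest number of
# components whose cumulative explained variance ratio exceeds 0.9
pca_90 = PCA(n_components=0.9, svd_solver="full").fit(X)
cumulative = np.cumsum(pca_all.explained_variance_ratio_)
k = pca_90.n_components_
print(k, cumulative[k - 1])
```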
- property loadings_df: pandas.DataFrame
DataFrame containing the Principal Component Analysis (PCA) loadings for each component (PC).
- Returns:
The DataFrame containing the PCA loadings for each PC.
- Return type:
pd.DataFrame
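One plausible way to assemble such a table from scikit-learn's components_ attribute is sketched below: one row per original feature and one column per PC. Whether pyfasma uses this exact orientation, or scales the eigenvectors by the square root of the explained variance (another common definition of loadings), is not stated here, so treat the layout as an assumption. The input data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical input: 30 samples (rows) x 50 features (columns)
df = pd.DataFrame(rng.normal(size=(30, 50)), columns=[f"f{i}" for i in range(50)])

pca = PCA(n_components=3).fit(df)
pc_labels = [f"PC{i + 1}" for i in range(pca.n_components_)]

# components_ has shape (n_components, n_features); transpose it so each
# column holds the weights of one PC over the original features
loadings = pd.DataFrame(pca.components_.T, index=df.columns, columns=pc_labels)
print(loadings.head())
```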
- loadings_multiple_plots(pc=[1], show_percent=True, figsize=None, nrows=1, ncols=1, sharex=False, sharey=False, label=None, color=None, alpha=None, linestyle=None, linewidth=None, marker=None, markersize=None, fill=True, fill_alpha=0.25, title=None, title_y=0.98, xlabel=None, xlabel_bottom=False, ylabel=None, ylabel_left=False, xlim=None, ylim=None, legend=True, legend_loc='best', legend_kwargs={}, **plot_kwargs)
Plot the specified Principal Component Analysis (PCA) loadings as lines in multiple subplots.
- Parameters:
pc (list[int], optional) – List containing the Principal Components (PCs) to plot. Each item must be only the integer part of a PC, e.g. 1 for the first component (PC1), 2 for the second component (PC2), etc., so pc=[1, 2] plots PC1 and PC2. The order of the PCs in the list determines their plot order, i.e. the first PC in the list is plotted on the bottom, the second above that, etc.
show_percent (bool, optional) – If True (default), show the percent of explained variance along the axes labels.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
nrows (int, optional) – Number of rows of the subplot grid. Default is 1.
ncols (int, optional) – Number of columns of the subplot grid. Default is 1.
sharex (bool or {'none', 'all', 'row', 'col'}, optional) –
Controls sharing of properties among the x axis:
True or ‘all’: x-axis will be shared among all subplots.
False or ‘none’: each subplot x-axis will be independent.
‘row’: each subplot row will share an x-axis.
‘col’: each subplot column will share an x-axis.
When subplots have a shared x-axis along a column, only the x tick labels of the bottom subplot are created. To later turn other subplots’ ticklabels on, use tick_params.
When subplots have a shared axis that has units, calling Axis.set_units will update each axis with the new units. Note that it is not possible to unshare axes.
sharey (bool or {'none', 'all', 'row', 'col'}, optional) –
Controls sharing of properties among the y axis:
True or ‘all’: y-axis will be shared among all subplots.
False or ‘none’: each subplot y-axis will be independent.
‘row’: each subplot row will share a y-axis.
‘col’: each subplot column will share a y-axis.
When subplots have a shared y-axis along a row, only the y tick labels of the first column subplot are created. To later turn other subplots’ ticklabels on, use tick_params.
When subplots have a shared axis that has units, calling Axis.set_units will update each axis with the new units. Note that it is not possible to unshare axes.
label (list[str], optional) – List of strings to be used as labels for the corresponding PC loading. The labels will be displayed in the legend if legend=True. If None (default), the default labels will be used.
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the lines of the corresponding PC loadings. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (list[float], optional) –
List of floating-point numbers that set the transparency of the lines of the corresponding PC loadings. The numbers must be in the range 0 (completely transparent) to 1 (completely opaque). If None (default), the transparency of all lines is set to 1.
linestyle (list[str], optional) –
List of Matplotlib-accepted line styles for each of the plotted lines. If None (default), all lines are drawn as solid lines.
linewidth (list[float], optional) – List of floating-point numbers that set the width (in points) of each plotted line. If None (default), all lines will have a width of 1.5.
marker (list[str], optional) –
List of Matplotlib-accepted markers for each of the plotted lines. If None (default), no markers are drawn.
See: https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
markersize (float, optional) – Set the size (in points) of the markers. If None (default), the default size of 6 will be used.
fill (bool, optional) – If True (default), fill the area below the curve.
fill_alpha (float, optional) – Set the transparency of the filled area. The value must be in the range 0 (completely transparent) to 1 (completely opaque). Default 0.25.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default 0.98.
xlabel (str, optional) – The label of the x-axis. If None (default), the index name of the input DataFrame is used.
xlabel_bottom (bool, optional) – If True, show the xlabel only on the bottom row of subplots. Default False.
ylabel (str, optional) – The label of the y-axis. If None (default), the string ‘Intensity (a.u.)’ is used.
ylabel_left (bool, optional) – If True, show the ylabel only on the left-most column of subplots. Default False.
xlim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the x-axis.
ylim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the y-axis.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
axes (matplotlib.axes.Axes) – The axes object containing the subplots.
- Raises:
ValueError – If pc is not a list, or if any item in pc is not an integer.
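The documented ValueError conditions amount to a simple type check on pc. The helper below, validate_pc, is hypothetical (it is not part of pyfasma's documented API); it also maps PC numbers to ‘PCn’ labels, which is an assumed naming scheme.

```python
def validate_pc(pc):
    """Hypothetical helper: check `pc` and map PC numbers to column labels."""
    if not isinstance(pc, list):
        raise ValueError(f"pc must be a list of integers, got {type(pc).__name__}")
    for item in pc:
        # bool is a subclass of int; this sketch rejects True/False explicitly
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError(f"every item in pc must be an integer, got {item!r}")
    # Preserve the user's order, since it determines the plot order
    return [f"PC{n}" for n in pc]


print(validate_pc([1, 3]))  # ['PC1', 'PC3']
```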
- loadings_single_plot(pc=[1], show_percent=True, ax=None, figsize=None, label=None, color=None, alpha=None, linestyle=None, linewidth=None, marker=None, markersize=None, fill=True, fill_alpha=0.25, title=None, title_y=None, xlabel=None, ylabel=None, xlim=None, ylim=None, legend=True, legend_loc='best', legend_kwargs={}, **plot_kwargs)
Plot the specified Principal Component Analysis (PCA) loadings as lines in a single plot.
- Parameters:
pc (list[int], optional) – List containing the Principal Components (PCs) to plot. Each item must be only the integer part of a PC, e.g. 1 for the first component (PC1), 2 for the second component (PC2), etc., so pc=[1, 2] plots PC1 and PC2. The order of the PCs in the list determines their plot order, i.e. the first PC in the list is plotted on the bottom, the second above that, etc.
show_percent (bool, optional) – If True (default), show the percent of explained variance along the axes labels.
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
label (list[str], optional) – List of strings to be used as labels for the corresponding PC loading. The labels will be displayed in the legend if legend=True. If None (default), the default labels will be used.
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the lines of the corresponding PC loadings. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (list[float], optional) –
List of floating-point numbers that set the transparency of the lines of the corresponding PC loadings. The numbers must be in the range 0 (completely transparent) to 1 (completely opaque). If None (default), the transparency of all lines is set to 1.
linestyle (list[str], optional) –
List of Matplotlib-accepted line styles for each of the plotted lines. If None (default), all lines are drawn as solid lines.
linewidth (list[float], optional) – List of floating-point numbers that set the width (in points) of each plotted line. If None (default), all lines will have a width of 1.5.
marker (list[str], optional) –
List of Matplotlib-accepted markers for each of the plotted lines. If None (default), no markers are drawn.
See: https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
markersize (float, optional) – Set the size (in points) of the markers. If None (default), the default size of 6 will be used.
fill (bool, optional) – If True (default), fill the area below the curve.
fill_alpha (float, optional) – Set the transparency of the filled area. The value must be in the range 0 (completely transparent) to 1 (completely opaque). Default 0.25.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. If None (default), the index name of the input DataFrame is used.
ylabel (str, optional) – The label of the y-axis. If None (default), the string ‘Intensity (a.u.)’ is used.
xlim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the x-axis.
ylim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the y-axis.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
- Raises:
ValueError – If pc is not a list, or if any item in pc is not an integer.
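The effect of fill and fill_alpha can be reproduced with plain Matplotlib: each loading is drawn as a line and the area between the curve and zero is shaded in the same color. This is a sketch on hypothetical loadings; the zero baseline is an assumption about what “the area below the curve” means in pyfasma.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical loadings: index = feature axis, columns = PCs
x = np.linspace(0, 10, 200)
loadings = pd.DataFrame(
    {"PC1": np.sin(x), "PC2": 0.5 * np.cos(2 * x)},
    index=pd.Index(x, name="Feature"),
)

fig, ax = plt.subplots(figsize=(6.4, 4.8))
fill_alpha = 0.25
for label in ["PC1", "PC2"]:  # the order mirrors pc=[1, 2]
    (line,) = ax.plot(loadings.index, loadings[label], linewidth=1.5, label=label)
    # Shade between the curve and y = 0 in the line's own color
    ax.fill_between(loadings.index, loadings[label], 0, color=line.get_color(), alpha=fill_alpha)

ax.set_xlabel(loadings.index.name)  # mirrors the documented xlabel default
ax.set_ylabel("Intensity (a.u.)")
ax.legend(loc="best")
```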
- scores_2d_kde_plot(xpc=1, ypc=2, show_percent=True, ax=None, figsize=None, hue=None, color=None, alpha=None, linewidth=None, fill=False, title=None, title_y=None, xlabel=None, ylabel=None, xlim=None, ylim=None, legend=True, legend_loc='best', legend_kwargs={}, sns_kwargs={}, **mpl_kwargs)
Plot the Kernel Density Estimate (KDE) contours of each class in the plane of the two specified Principal Components (PCs).
See: https://seaborn.pydata.org/generated/seaborn.kdeplot.html
- Parameters:
xpc (int, optional) – Principal component (PC) scores to use as the x-axis. It must be only the integer part of the PC, e.g. xpc=1 for the first component (PC1), xpc=2 for the second component (PC2), etc.
ypc (int, optional) – Principal component (PC) scores to use as the y-axis. It must be only the integer part of the PC, e.g. ypc=1 for the first component (PC1), ypc=2 for the second component (PC2), etc.
show_percent (bool, optional) – If True (default), show the percent of explained variance along the axes labels.
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
hue (list, optional) – Grouping variable that will produce contours with different colors. It can be either categorical or numeric, although color mapping behaves differently in the latter case, i.e. a sequential colormap is used by default.
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the contours drawn for each class. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (float, optional) –
Set the transparency of the contours. The number must be in the range 0 (completely transparent) to 1 (completely opaque). If None (default), the transparency is set to 1.
linewidth (float or array-like, optional) –
The line width of the contour lines. If a number, all levels will be plotted with this linewidth. If a sequence, the levels in ascending order will be plotted with the linewidths in the order specified. If None (default), it is set to 1.5.
Note: this parameter is ignored if fill=True.
fill (bool, optional) – If True, draw filled contours instead of contour lines. Default False.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. If None (default), the index name of the input DataFrame is used.
ylabel (str, optional) – The label of the y-axis. If None (default), the string ‘Density’ is used.
xlim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the x-axis.
ylim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the y-axis.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
sns_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to seaborn.kdeplot.
See: https://seaborn.pydata.org/generated/seaborn.kdeplot.html
**mpl_kwargs –
Additional keyword arguments are passed to one of the following matplotlib functions:
matplotlib.axes.Axes.contour() (fill=False): https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.contour.html#matplotlib.axes.Axes.contour
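matplotlib.axes.Axes.contourf() (fill=True): https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.contourf.html#matplotlib.axes.Axes.contourf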
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
axes (matplotlib.axes.Axes) – The axes object containing the plot.
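With show_percent=True the axis labels carry each PC's share of the explained variance. A minimal sketch of that formatting follows; the helper name pc_axis_label and the exact ‘PCn (xx.x%)’ format are assumptions, not pyfasma's documented output. Note the off-by-one: xpc and ypc are 1-based, while the explained variance ratio array is 0-based.

```python
def pc_axis_label(pc, explained_variance_ratio, show_percent=True):
    """Hypothetical helper: build an axis label such as 'PC1 (52.0%)'."""
    label = f"PC{pc}"
    if show_percent:
        # pc is 1-based, the ratio sequence is 0-based
        label += f" ({100 * explained_variance_ratio[pc - 1]:.1f}%)"
    return label


ratios = [0.52, 0.23, 0.11]  # hypothetical explained variance ratios
xpc, ypc = 1, 2
print(pc_axis_label(xpc, ratios))  # PC1 (52.0%)
print(pc_axis_label(ypc, ratios, show_percent=False))  # PC2
```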
- property scores_df: pandas.DataFrame
DataFrame containing the Principal Component Analysis (PCA) scores for each component (PC).
- Returns:
The DataFrame containing the PCA scores for each PC.
- Return type:
pd.DataFrame
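The scores are the mean-centred data projected onto the principal axes. The sketch below builds a scores table with scikit-learn on hypothetical data and checks it against the explicit projection. Appending the grouping column (used for hue) next to the scores is an assumption about the layout, not documented pyfasma behavior.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(40, 6)), columns=[f"f{i}" for i in range(6)])
groups = np.repeat(["A", "B"], 20)  # hypothetical class labels for `hue`

pca = PCA(n_components=3).fit(df)
scores = pd.DataFrame(
    pca.transform(df),
    index=df.index,
    columns=[f"PC{i + 1}" for i in range(pca.n_components_)],
)
scores["class"] = groups  # hypothetical: keep the grouping column next to the scores

# Same values as projecting the centred data onto the principal axes
manual = (df.values - pca.mean_) @ pca.components_.T
print(np.allclose(scores[["PC1", "PC2", "PC3"]].values, manual))  # True
```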
- scores_kde_plot(pc=1, show_percent=True, ax=None, figsize=None, hue=None, label=None, color=None, alpha=0.2, linewidth=None, fill=True, title=None, title_y=None, xlabel=None, ylabel=None, xlim=None, ylim=None, legend=True, legend_loc='best', legend_kwargs={}, sns_kwargs={}, **mpl_kwargs)
Plot the Kernel Density Estimates (KDEs) of each class along the specified Principal Component (PC).
See: https://seaborn.pydata.org/generated/seaborn.kdeplot.html
- Parameters:
pc (int, optional) – Principal Component along which the KDE plots are drawn. It must be only the integer part of the PC, e.g. pc=1 for the first component (PC1), pc=2 for the second component (PC2), etc.
show_percent (bool, optional) – If True (default), show the percent of explained variance along the axes labels.
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
hue (list, optional) – Grouping variable that will produce lines with different colors. It can be either categorical or numeric, although color mapping behaves differently in the latter case, i.e. a sequential colormap is used by default.
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the lines drawn for each class. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (float, optional) –
If fill=True, sets the transparency of the fill, otherwise sets the transparency of the lines. The number must be in the range 0 (completely transparent) to 1 (completely opaque). Default is 0.2.
linewidth (float, optional) – Set the width (in points) of each plotted line. If None (default), all lines will have a width of 1.5.
fill (bool, optional) – If True (default), fill the area below the curve.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. If None (default), the index name of the input DataFrame is used.
ylabel (str, optional) – The label of the y-axis. If None (default), the string ‘Density’ is used.
xlim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the x-axis.
ylim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the y-axis.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
sns_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to seaborn.kdeplot.
See: https://seaborn.pydata.org/generated/seaborn.kdeplot.html
**mpl_kwargs –
Additional keyword arguments are passed to one of the following matplotlib functions:
matplotlib.axes.Axes.plot() (fill=False): https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot
matplotlib.axes.Axes.fill_between() (fill=True): https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.fill_between.html#matplotlib.axes.Axes.fill_between
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
axes (matplotlib.axes.Axes) – The axes object containing the plot.
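seaborn.kdeplot estimates each class's density with a Gaussian kernel, using Scott's rule for the bandwidth by default. The NumPy sketch below reproduces that estimate on hypothetical PC1 scores for two classes, to show what each curve represents. One difference from the seaborn default: with hue, seaborn's common_norm=True scales each group's curve by its share of the observations, whereas here every curve integrates to 1 on its own.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`.

    Without an explicit bandwidth, Scott's rule is used:
    sample standard deviation * n ** (-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


rng = np.random.default_rng(3)
# Hypothetical PC1 scores for two classes
pc1_scores = {"A": rng.normal(-2, 1, 50), "B": rng.normal(2, 0.7, 50)}
grid = np.linspace(-8, 8, 801)
densities = {cls: gaussian_kde_1d(s, grid) for cls, s in pc1_scores.items()}

# Each curve is a probability density, so its area is close to 1
dx = grid[1] - grid[0]
for cls, d in densities.items():
    print(cls, round(float(d.sum() * dx), 3))
```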
- scores_plot(xpc=1, ypc=2, hue=None, show_percent=True, ax=None, figsize=None, color=None, alpha=None, marker=None, markersize=None, title=None, title_y=None, xlabel=None, ylabel=None, xlim=None, ylim=None, legend=True, legend_loc='best', annotate=False, annotate_ha='center', annotate_va='bottom', annotate_xoffset=0, annotate_yoffset=0, annotate_fontsize=None, ellipse=True, ellipse_conf=0.95, ellipse_alpha=0.25, ellipse_linewidth=1, ellipse_linestyle=None, ellipse_edgecolor='none', legend_kwargs={}, annotate_kwargs={}, ellipse_kwargs={}, **plot_kwargs)
Plot the Principal Component Analysis (PCA) scores of one PC against another as a scatter plot.
- Parameters:
xpc (int, optional) – Principal component (PC) scores to use as the x-axis. It must be only the integer part of the PC, e.g. xpc=1 for the first component (PC1), xpc=2 for the second component (PC2), etc.
ypc (int, optional) – Principal component (PC) scores to use as the y-axis. It must be only the integer part of the PC, e.g. ypc=1 for the first component (PC1), ypc=2 for the second component (PC2), etc.
hue (list, optional) – Grouping variable that will produce points with different colors. It can be either categorical or numeric, although color mapping behaves differently in the latter case, i.e. a sequential colormap is used by default.
show_percent (bool, optional) – If True (default), show the percent of explained variance along the axes labels.
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the points of each group. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (float, optional) – Set the transparency of the points as a floating-point number in the range 0 (completely transparent) to 1 (completely opaque). If None (default), the transparency is set to 1.
marker (list[str], optional) –
List of Matplotlib-accepted markers for each group of points. If None (default), the default seaborn.scatterplot markers are used.
See: https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
markersize (float, optional) – Set the size (in points) of the markers. If None (default), the default size of 6 will be used.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. If None (default) it is set to ‘PCn’, where n is the number of the selected PC for the x axis.
ylabel (str, optional) – The label of the y-axis. If None (default) it is set to ‘PCn’, where n is the number of the selected PC for the y axis.
xlim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the x-axis.
ylim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the y-axis.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
annotate (bool, optional) – If True, add the percentages of each PC in the plot. Default False.
annotate_ha ({'left', 'center', 'right'}, optional) – The horizontal anchor to which the text is aligned. Default ‘center’.
annotate_va ({'baseline', 'bottom', 'center', 'center_baseline', 'top'}, optional) – The vertical anchor to which the text is aligned. Default ‘bottom’.
annotate_xoffset (Union[int, float], optional) – Horizontal offset to add to the annotated text. Default 0.
annotate_yoffset (Union[int, float], optional) – Vertical offset to add to the annotated text. Default 0.
annotate_fontsize (float or {'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'}, optional) – Set the font size in points. The string values denote sizes relative to the default font size.
ellipse (bool) – If True (default), draw covariance confidence ellipses over the (x, y) PC scores. The group of points that each ellipse is calculated for is determined by hue. The color of each ellipse is the same as the color of the group it is calculated for.
ellipse_conf (float, optional) – The confidence level that defines the size of the ellipse. Default 0.95 (95%).
ellipse_alpha (float, optional) – Set the transparency of the ellipse as a floating-point number in the range 0 (completely transparent) to 1 (completely opaque). Default 0.25.
ellipse_linewidth (float, optional) – The width of the ellipse’s outline in points. Default 1.
ellipse_linestyle (str, optional) –
A Matplotlib-accepted line style for the ellipse’s outline. If None (default), a solid line will be used.
ellipse_edgecolor (RGBColorType or 'none', optional) –
A Matplotlib-accepted color for the outline of the ellipse. If ‘none’ (default), no outline color is applied.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
annotate_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
ellipse_kwargs (dict, optional) –
Additional keyword arguments are passed to matplotlib.patches.Ellipse.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.patches.Ellipse.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
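A covariance confidence ellipse can be sketched as follows, assuming each group is roughly bivariate normal: the eigenvectors of the group's 2x2 covariance matrix give the orientation, and the eigenvalues, scaled by the chi-square quantile with 2 degrees of freedom (which has the closed form -2 ln(1 - conf)), give the axis lengths. This is one common construction; pyfasma may compute its ellipses differently, for example with a fixed number of standard deviations. The helper confidence_ellipse is hypothetical; only matplotlib.patches.Ellipse is a real API.

```python
import math

import numpy as np
from matplotlib.patches import Ellipse


def confidence_ellipse(x, y, conf=0.95, **ellipse_kwargs):
    """Hypothetical helper: covariance confidence ellipse for (x, y) points."""
    cov = np.cov(x, y)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest first
    chi2_q = -2.0 * math.log(1.0 - conf)  # chi-square quantile, 2 degrees of freedom
    width, height = 2.0 * np.sqrt(chi2_q * eigvals)  # full lengths of major/minor axes
    # Orientation of the major axis; a 180-degree ambiguity is harmless for an ellipse
    angle = math.degrees(math.atan2(eigvecs[1, 0], eigvecs[0, 0]))
    center = (float(np.mean(x)), float(np.mean(y)))
    return Ellipse(center, width, height, angle=angle, **ellipse_kwargs)


rng = np.random.default_rng(0)
pts = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], size=500)
ell = confidence_ellipse(pts[:, 0], pts[:, 1], conf=0.95, alpha=0.25, linewidth=1, edgecolor="none")
print(ell.width, ell.height, ell.angle)
```

The returned patch would then be added to the scores axes with ax.add_patch(ell), one ellipse per hue group.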
- scree_plot(ax=None, show_line='all', figsize=None, label=None, color=None, alpha=None, linestyle=None, linewidth=None, marker=['s', 'o'], markersize=None, title=None, title_y=None, xlabel='Principal components', ylabel='Explained ratio (%)', xlim=None, ylim=[-5, 107.5], legend=True, legend_loc='center right', annotate=True, annotate_ha='center', annotate_va='bottom', annotate_xoffset=0, annotate_yoffset=1.5, annotate_fontsize=None, legend_kwargs={}, **plot_kwargs)
Draw a scree plot of the explained variance ratio (%) of the principal components.
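Assuming the scree plot shows each component's explained variance ratio as a percentage, optionally together with its cumulative sum, the plotted values can be computed from scikit-learn as sketched below on hypothetical data. How pyfasma pairs these series with show_line and the two default markers is not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
# Toy data with correlated features: 60 samples x 6 features
X = rng.normal(size=(60, 6)) @ rng.normal(size=(6, 6))

pca = PCA().fit(X)  # keep all components
individual = 100 * pca.explained_variance_ratio_  # % of variance per PC
cumulative = np.cumsum(individual)  # running total; 100 % once all PCs are included
pcs = np.arange(1, individual.size + 1)  # x positions 1, 2, ..., n_components

for n, ind, cum in zip(pcs, individual, cumulative):
    print(f"PC{n}: {ind:5.1f} %  (cumulative {cum:5.1f} %)")
```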
- Parameters:
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
label (list[str], optional) – List of strings to be used as labels for the plotted lines. The labels will be displayed in the legend if legend=True. If None (default), a warning is raised and the legend is left empty; in this case, setting legend=False is also recommended.
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the plotted lines. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (list[float], optional) –
List of floating-point numbers that set the transparency of the plotted lines. The numbers must be in the range 0 (completely transparent) to 1 (completely opaque). If None (default), the transparency of all lines is set to 1.
linestyle (list[str], optional) –
List of Matplotlib-accepted line styles for each of the plotted lines. If None (default), all lines are drawn as solid lines.
linewidth (list[float], optional) – List of floating-point numbers that set the width (in points) of each plotted line. If None (default), all lines will have a width of 1.5.
marker (list[str], optional) –
List of Matplotlib-accepted markers for each of the plotted lines. Default ['s', 'o']. If None, no markers are drawn.
See: https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
markersize (float, optional) – Set the size (in points) of the markers. If None (default), the default size of 6 will be used.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. Default ‘Principal components’.
ylabel (str, optional) – The label of the y-axis. Default ‘Explained ratio (%)’.
xlim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the x-axis.
ylim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the y-axis. Default [-5, 107.5].
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates. Default ‘center right’.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
annotate (bool, optional) – If True (default), add the explained-variance percentage of each PC to the plot.
annotate_ha ({'left', 'center', 'right'}) – The horizontal anchor to which the text is aligned. Default ‘center’.
annotate_va ({'baseline', 'bottom', 'center', 'center_baseline', 'top'}, optional) – The vertical anchor to which the text is aligned. Default ‘bottom’.
annotate_xoffset (Union[int, float], optional) – Horizontal offset to add to the annotated text. Default 0.
annotate_yoffset (Union[int, float], optional) – Vertical offset to add to the annotated text. Default 1.5.
annotate_fontsize (float or {'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'}, optional) – Set the font size in points. The string values denote sizes relative to the default font size.
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
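A scree plot is commonly read by finding where the cumulative explained ratio first reaches an acceptable level. The plain NumPy sketch below does not call pyfasma. It picks the smallest number of components whose cumulative explained ratio reaches a threshold. The helper name, the 90 % threshold, and the ratios are illustrative assumptions, not output of scree_plot.

```python
import numpy as np

def n_components_for_threshold(ratio_percent, threshold=90.0):
    """Smallest number of PCs whose cumulative explained ratio (%) reaches `threshold`.

    Hypothetical helper for illustration; not part of pyfasma.
    """
    cumulative = np.cumsum(ratio_percent)  # running total of the explained ratios
    # First index where the (non-decreasing) cumulative ratio is >= threshold.
    idx = int(np.searchsorted(cumulative, threshold, side="left"))
    # Clamp in case the threshold is never reached (e.g. rounding keeps the total below 100).
    return min(idx + 1, len(cumulative))

# Illustrative per-component explained ratios in percent (PC1, PC2, ...).
ratios = [60.0, 25.0, 10.0, 5.0]
k90 = n_components_for_threshold(ratios, 90.0)  # cumulative 60, 85, 95, 100 -> 3 PCs
k85 = n_components_for_threshold(ratios, 85.0)  # 85 is reached exactly at PC2 -> 2 PCs
```

This resembles passing a float 0 < n_components < 1 to the constructor. However, scikit-learn's exact tie-breaking rule may differ.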
- summary_plot(figsize=None, hue=None, legend_loc=None)¶
- property variance_df: pandas.DataFrame¶
DataFrame containing the variance, the cumulative variance and their respective ratios explained by the Principal Component Analysis (PCA) components (PCs).
- Returns:
The DataFrame containing the variance, the cumulative variance and their respective ratios explained by the PCs.
- Return type:
pd.DataFrame
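The quantities listed above can be reproduced with NumPy alone, assuming the usual PCA definitions. Under those definitions, the data are column-centered and the variance of each component is its squared singular value divided by (n_samples - 1). The dictionary keys below are hypothetical and need not match the column names variance_df actually uses.

```python
import numpy as np

def pca_variance_table(X):
    """Variance, cumulative variance and their ratios for each principal component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # PCA works on column-centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    variance = s**2 / (X.shape[0] - 1)       # variance captured by each PC
    ratio = variance / variance.sum()
    # Hypothetical key names, chosen for illustration only.
    return {
        "variance": variance,
        "cumulative_variance": np.cumsum(variance),
        "ratio": ratio,
        "cumulative_ratio": np.cumsum(ratio),
    }

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])  # features with unequal spread
table = pca_variance_table(X)
```

When all components are kept, the variances add up to the total variance of the original features. For that reason, the last cumulative ratio equals 1.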
- class pyfasma.modeling.PLSDA(x_train, x_test, y_train, y_test, cross_val=False, n_components=2, stratify=True, n_splits=5, n_repeats=10, random_state=None, scale=False, max_iter=500, tol=1e-06)¶
Bases:
object
Partial Least Squares Discriminant Analysis (PLS-DA) class for supervised classification.
This class implements the PLS-DA algorithm, an extension of Partial Least Squares Regression (PLSR) that applies to categorical (discrete) response variables. It classifies observations into predefined classes based on the predictor variables by uncovering the latent variables that maximize the separation between the classes. The class is based on sklearn.cross_decomposition.PLSRegression.
See: https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html
The class supports both repeated cross-validation and repeated stratified cross-validation to help determine the optimal number of components for the model. It handles both binary and multiclass classification.
- Parameters:
x_train (pd.DataFrame) – DataFrame of shape (n_samples, n_features) containing the predictors which the model is trained on.
x_test (pd.DataFrame) – DataFrame of shape (n_samples, n_features) containing the predictors which are used to test the model.
y_train (list[str]) – List containing the responses (classes) corresponding to x_train which the model is trained on.
y_test (list[str]) – List containing the responses (classes) corresponding to x_test which are used to test the model.
cross_val (bool, optional) – If True, perform cross-validation on the training data to determine the optimal number of components for the predictive model. In that case, a plot of the following performance metrics for different numbers of components is created: accuracy, Q2, MSE, R2. Default is False.
n_components (int, optional) –
The meaning of this parameter depends on the value of cross_val.
If cross_val=False, it is the number of components kept in the PLS-DA model. If cross_val=True, it is the maximum number of components for which the model is evaluated using cross-validation.
Should be in the range [1, n_features]. Default is 2.
stratify (bool, optional) –
If True (default), split the training data for cross-validation in a stratified fashion using the contents of y_train as class labels.
Note: This parameter only has effect if cross_val=True.
n_splits (int, optional) – Number of folds. Must be at least 2. Default 5.
n_repeats (int, optional) – Number of times the cross-validator is repeated. Default 10.
random_state (int, optional) – Controls the randomness of each repeated cross-validation instance. Pass an int for reproducible output across multiple function calls. Default is None.
scale (bool, optional) – If True, scale (standardize) the train and test data to have a mean of 0 and a variance of 1. Scaling applies to each feature independently. Default is False.
max_iter (int, optional) – The maximum number of iterations of the power method (Nonlinear Iterative Partial Least Squares - NIPALS algorithm). Default is 500.
tol (float, optional) – The tolerance used as the convergence criterion in the power method: the algorithm stops whenever the squared norm of u_i - u_{i-1} is less than tol, where u corresponds to the left singular vector. Default is 1e-06.
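The stratify, n_splits, n_repeats and random_state parameters describe repeated (stratified) k-fold cross-validation. The NumPy sketch below shows one way such splits can be generated. It illustrates the idea and is not pyfasma's implementation, which may instead use scikit-learn's RepeatedStratifiedKFold. The function name is hypothetical.

```python
import numpy as np

def repeated_stratified_folds(y, n_splits=5, n_repeats=10, random_state=None):
    """Yield (repeat, train_idx, test_idx) for repeated stratified k-fold CV.

    Hypothetical helper for illustration; not part of pyfasma.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)  # same seed -> same splits
    for repeat in range(n_repeats):
        fold_of = np.empty(len(y), dtype=int)
        for cls in np.unique(y):
            idx = np.flatnonzero(y == cls)
            rng.shuffle(idx)  # new random order of this class in every repeat
            # Deal the class's samples round-robin (random starting fold) so each
            # fold receives roughly the same share of every class.
            start = rng.integers(n_splits)
            fold_of[idx] = (np.arange(len(idx)) + start) % n_splits
        for k in range(n_splits):
            yield repeat, np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

y_train = ["a"] * 10 + ["b"] * 5
splits = list(repeated_stratified_folds(y_train, n_splits=5, n_repeats=2, random_state=0))
```

With 10 samples of class "a" and 5 of class "b", every test fold receives two "a" samples and one "b" sample. This preserves the 2:1 class ratio of y_train.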
- property coef_df: pandas.DataFrame¶
DataFrame of shape (n_features, n_targets) containing the coefficients of the linear model such that Y is approximated as Y = X @ coef + intercept. Each column represents how each feature contributes to predicting the respective target.
- Returns:
DataFrame containing the coefficients of the linear model.
- Return type:
pd.DataFrame
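The toy computation below makes Y = X @ coef + intercept concrete, using made-up numbers (2 features, 2 targets). Entry coef[j, k] is the change in the predicted response for target k per unit increase of feature j. In PLS-DA each target column corresponds to one class. Assigning each sample to the column with the largest predicted value is one common decision rule. That rule is an assumption here, not confirmed by this page.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # 3 samples x 2 features
coef = np.array([[0.8, -0.2],
                 [-0.3, 0.9]])      # (n_features, n_targets), as in coef_df
intercept = np.array([0.1, 0.05])   # one intercept per target

Y_hat = X @ coef + intercept              # (n_samples, n_targets)
predicted_target = Y_hat.argmax(axis=1)   # index of the best-scoring target per sample
```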
- confusion_matrix(labels=None, normalize=None, sample_weight=None) → numpy.ndarray¶
Compute the confusion matrix to evaluate the accuracy of the classification of the test data.
By definition a confusion matrix \(C\) is such that \(C_{i, j}\) is equal to the number of observations known to be in group \(i\) and predicted to be in group \(j\).
Thus in binary classification, the count of true negatives is \(C_{0,0}\), false negatives is \(C_{1,0}\), true positives is \(C_{1,1}\) and false positives is \(C_{0,1}\).
- Parameters:
labels (list, optional) – List of labels of shape (n_classes) to index the matrix. This may be used to reorder or select a subset of labels. If None (default), those that appear at least once in y_true or y_pred are used in sorted order.
normalize ({'true', 'pred', 'all'}, optional) – Normalizes the confusion matrix over the true (rows), predicted (columns) conditions or all the population. If None (default), the confusion matrix will not be normalized.
sample_weight (list, optional) – List of shape (n_samples) containing sample weights.
- Returns:
Confusion matrix of shape (n_classes, n_classes), with rows corresponding to the true classes and columns to the predicted classes: the entry in row i and column j is the number of samples whose true label is the i-th class and whose predicted label is the j-th class.
- Return type:
np.ndarray
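A plain NumPy sketch of the definition above follows. It mirrors the documented labels and normalize semantics but is not pyfasma's code. Judging by the identical parameter names, pyfasma probably delegates to sklearn.metrics.confusion_matrix, though that is an assumption.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels=None, normalize=None):
    """C[i, j] = number of samples with true label labels[i] predicted as labels[j]."""
    if labels is None:
        # Labels seen in either array, in sorted order (mirrors the documented default).
        labels = sorted(set(y_true) | set(y_pred))
    pos = {lab: i for i, lab in enumerate(labels)}
    C = np.zeros((len(labels), len(labels)), dtype=float)
    for t, p in zip(y_true, y_pred):
        if t in pos and p in pos:  # samples outside `labels` are ignored
            C[pos[t], pos[p]] += 1
    if normalize == "true":
        C = C / C.sum(axis=1, keepdims=True)  # each row sums to 1
    elif normalize == "pred":
        C = C / C.sum(axis=0, keepdims=True)  # each column sums to 1
    elif normalize == "all":
        C = C / C.sum()                       # the whole matrix sums to 1
    return C

y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "dog"]
C = confusion_matrix(y_true, y_pred)             # rows/cols ordered ["cat", "dog"]
tn, fp, fn, tp = C[0, 0], C[0, 1], C[1, 0], C[1, 1]  # "dog" treated as the positive class
```

Here one cat is misclassified as a dog (a false positive), and all three dogs are recognized (no false negatives). With normalize="true", each row expresses the fraction of that true class assigned to each prediction.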
- property confusion_matrix_components_df: pandas.DataFrame¶
DataFrame containing the components of the confusion matrix, i.e., the number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) for each class.
- Returns:
DataFrame containing the components of the confusion matrix.
- Return type:
pd.DataFrame
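For a multiclass problem, the per-class counts follow from the confusion matrix by treating each class in turn as positive and all other classes as negative (one-vs-rest). The sketch assumes the row = true, column = predicted convention stated for confusion_matrix. The helper name and the matrix values are illustrative.

```python
import numpy as np

def ovr_confusion_components(C):
    """Per-class TP, FN, FP, TN from a multiclass confusion matrix (rows = true, cols = predicted)."""
    C = np.asarray(C)
    tp = np.diag(C)              # correctly predicted as class i
    fn = C.sum(axis=1) - tp      # truly class i, predicted as another class
    fp = C.sum(axis=0) - tp      # predicted as class i, truly another class
    tn = C.sum() - tp - fn - fp  # neither truly nor predicted as class i
    return tp, fn, fp, tn

C = np.array([[5, 1, 0],
              [2, 3, 1],
              [0, 0, 4]])
tp, fn, fp, tn = ovr_confusion_components(C)
```

For every class the four counts add up to the total number of test samples (16 here).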
- confusion_matrix_df(labels=None, normalize=None, sample_weight=None) → pandas.DataFrame¶
Confusion matrix to evaluate the accuracy of the classification of the test data, in DataFrame representation. The index of the DataFrame represents the target classes, while the columns represent the predicted classes, as contained in the class’ classes property.
- Parameters:
labels (list, optional) – List of labels of shape (n_classes) to index the matrix. This may be used to reorder or select a subset of labels. If None (default), those that appear at least once in y_true or y_pred are used in sorted order.
normalize ({'true', 'pred', 'all'}, optional) – Normalizes the confusion matrix over the true (rows), predicted (columns) conditions or all the population. If None (default), the confusion matrix will not be normalized.
sample_weight (list, optional) – List of shape (n_samples) containing sample weights.
- Returns:
DataFrame containing the confusion matrix of the test data.
- Return type:
pd.DataFrame
- confusion_matrix_plot(figsize=None, labels=None, normalize=None, sample_weight=None, display_labels=None, include_values=True, xticks_rotation='horizontal', xticks_ha='center', yticks_rotation='horizontal', yticks_va='center', values_format=None, cmap='viridis', ax=None, colorbar=True, im_kw=None, text_kw=None)¶
Confusion matrix to evaluate the accuracy of the classification of the test data as a matplotlib plot/image.
- Parameters:
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches.
labels (list, optional) – List of labels of shape (n_classes) to index the matrix. This may be used to reorder or select a subset of labels. If None (default), those that appear at least once in y_true or y_pred are used in sorted order.
normalize ({'true', 'pred', 'all'}, optional) – Normalizes the confusion matrix over the true (rows), predicted (columns) conditions or all the population. If None (default), the confusion matrix will not be normalized.
sample_weight (list, optional) – List of shape (n_samples) containing sample weights.
display_labels (array-like, optional) – List of shape (n_classes,) with the target names used for plotting. If None (default), labels will be used if it is defined; otherwise, the unique labels of y_true and y_pred will be used.
include_values (bool, optional) – If True (default) includes values in the confusion matrix.
xticks_rotation ({'vertical', 'horizontal'} or float, optional) – Rotation of xtick labels. Default is ‘horizontal’.
xticks_ha ({'left', 'center', 'right'}, optional) – Horizontal alignment of the xtick labels. Default is ‘center’.
yticks_rotation ({'vertical', 'horizontal'} or float, optional) – Rotation of ytick labels. Default is ‘horizontal’.
yticks_va ({'top', 'center', 'bottom'}, optional) – Vertical alignment of the ytick labels. Default is ‘center’.
values_format (str, optional) – Format specification for values in confusion matrix. If None (default), the format specification is ‘d’ or ‘.2g’ whichever is shorter.
cmap (str or matplotlib colormap, optional) – A Matplotlib-accepted colormap. Default ‘viridis’.
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
colorbar (bool) – If True (default), add a colorbar to the plot.
im_kw (dict) – Dict with keywords passed to matplotlib.pyplot.imshow call.
text_kw (dict) – Dict with keywords passed to matplotlib.pyplot.text call.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
- property cv_metrics_df: pandas.DataFrame¶
DataFrame containing the mean values and standard deviations of the metrics obtained through cross-validation. The metrics include the accuracy of the model (percentage of correctly predicted classes), the Q2 score (an indication of how well the regression predictions approximate the held-out data), the mean squared error (MSE; the average of the squared differences between the actual and predicted values), and R2 (an indication of how well the model fits the training data).
- Returns:
DataFrame containing the mean values and standard deviations of the metrics obtained through cross-validation.
- Return type:
pd.DataFrame
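This page does not spell out the exact formulas pyfasma uses. The sketch below therefore assumes the conventional definitions on dummy-coded (one-hot) class responses:

- regression_r2 computes 1 - SS_res / SS_tot. It gives R2 when fed predictions for the training data, and Q2 when fed predictions for the held-out folds.
- MSE is the mean of the squared residuals.
- Accuracy is the fraction of samples whose largest predicted response falls in the true class's column.

The function names and the data are illustrative.

```python
import numpy as np

def regression_r2(Y_true, Y_pred):
    """1 - SS_res / SS_tot, pooled over all target columns."""
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    ss_res = np.sum((Y_true - Y_pred) ** 2)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2)  # each column around its own mean
    return 1.0 - ss_res / ss_tot

def mse(Y_true, Y_pred):
    return float(np.mean((np.asarray(Y_true, float) - np.asarray(Y_pred, float)) ** 2))

def accuracy(Y_true, Y_pred):
    # Class = column with the largest (true or predicted) response.
    return float(np.mean(np.argmax(Y_true, axis=1) == np.argmax(Y_pred, axis=1)))

# One-hot (dummy-coded) responses for two classes and hypothetical model outputs.
Y_true = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
Y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]])
```

Pooling the sums over all columns is equivalent to scikit-learn's 'variance_weighted' averaging of per-target R2. The default 'uniform_average' can give a slightly different value when the class columns have different variances.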
- plot_cv_metrics(metrics=['Accuracy', 'Q2', 'MSE', 'R2'], std=True, ax=None, figsize=None, label=None, color=None, alpha=None, linestyle=None, linewidth=None, marker=['s', 'o', 'D', '^'], markersize=None, ecolor=None, elinewidth=None, capsize=3, fill=False, fill_alpha=0.25, title=None, title_y=None, xlabel=None, ylabel=None, xlim=None, ylim=None, legend=True, legend_loc='best', legend_kwargs={}, **plot_kwargs)¶
- predict(x_test: Tuple[numpy.ndarray, pandas.DataFrame]) → numpy.ndarray¶
Use the fitted model to get predictions from the input array of shape (n_samples, n_features).
- Parameters:
x_test (np.ndarray or pd.DataFrame) – The test vectors of shape (n_samples, n_features) that are used for prediction.
- Returns:
y_pred (np.ndarray) – The predicted values with shape (n_samples, n_targets).
y_pred_bin (np.ndarray) – The predicted values in binarized representation with shape (n_samples, n_targets).
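The binarized output can be read as a one-hot encoding of the winning class for each sample. The sketch assumes the argmax decision rule, in which each sample is assigned to the column with the largest predicted response. That rule, the helper name, and the class names are assumptions for illustration, not taken from pyfasma.

```python
import numpy as np

def binarize_predictions(y_pred):
    """One-hot encode the column with the largest predicted value in each row."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_bin = np.zeros(y_pred.shape, dtype=int)
    y_bin[np.arange(y_pred.shape[0]), y_pred.argmax(axis=1)] = 1
    return y_bin

classes = np.array(["a", "b", "c"])  # hypothetical class order
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.4, 0.5, 0.1]])
y_bin = binarize_predictions(y_pred)
predicted_labels = classes[y_bin.argmax(axis=1)]  # back to class names
```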
- property prediction_df: pandas.DataFrame¶
DataFrame containing the target and predicted classes for each sample of the test data.
- Returns:
DataFrame containing the target and predicted classes for each sample.
- Return type:
pd.DataFrame
- property prediction_metrics_df: pandas.DataFrame¶
DataFrame containing the metrics used to evaluate the predictive model: precision, sensitivity, specificity, F1 score, and ROC-AUC (area under the receiver operating characteristic curve) for each class, together with their macro- and micro-averaged values. Accuracy is also included. It is shown as NaN for the individual classes, since it describes the model as a whole. It is listed under both the macro and the micro averages (with the same value) for consistency.
- Returns:
DataFrame containing the metrics used to evaluate the predictive model.
- Return type:
pd.DataFrame
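The sketch below uses the standard definitions of the per-class metrics and their two averages. Macro averaging computes each metric per class and then averages. Micro averaging pools the TP, FN, FP and TN counts over all classes first. It is not pyfasma's code and does not handle zero denominators. The counts are illustrative, and ROC-AUC is omitted.

```python
import numpy as np

# Per-class one-vs-rest counts for three classes (illustrative numbers).
tp = np.array([5, 3, 4])
fn = np.array([1, 3, 0])
fp = np.array([2, 1, 1])
tn = np.array([8, 9, 11])

def rates(tp, fn, fp, tn):
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, specificity, f1

per_class = rates(tp, fn, fp, tn)
macro = [float(np.mean(m)) for m in per_class]                            # average of per-class values
micro = [float(m) for m in rates(tp.sum(), fn.sum(), fp.sum(), tn.sum())]  # pool counts first
accuracy = tp.sum() / (tp + fn).sum()  # correctly classified / all samples
```

In a single-label multiclass problem every misclassified sample is one false positive and one false negative. Consequently, micro-averaged precision, sensitivity and F1 all equal the accuracy (0.75 here).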
- prediction_plot(ax=None, orientation='horizontal', figsize=None, color=None, marker=None, markersize=None, alpha=None, xlabel='Samples', ylabel='Classes', title=None, title_y=0, legend=True, legend_loc='best', legend_kwargs={}, **plot_kwargs)¶
Plot classes vs samples for the target and prediction values of the test data.
- Parameters:
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
orientation ({'horizontal', 'vertical'}, optional) – Sets the orientation of the plot. A horizontal orientation gives a better overview of the classification, but the xtick labels (samples) may overlap. If the sample names are important, a vertical orientation combined with a suitably adjusted figsize may be preferable. If vertical, the default xlabel and ylabel values are swapped. Default is ‘horizontal’.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the lines drawn for the corresponding pairs of xlist and ylist. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (float, optional) – Set the transparency of the points as a floating-point number in the range 0 (completely transparent line) to 1 (completely opaque line). If None (default), the transparency is set to 1.
marker (list[str], optional) –
List of Matplotlib-accepted markers for each of the plotted lines. If None (default), the default seaborn.scatterplot markers are used.
See: https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
markersize (float, optional) – Set the size (in points) of the markers. If None (default), the default size of 6 will be used.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default 0.
xlabel (str, optional) – The label of the x-axis. Default ‘Samples’ (swapped with ylabel if orientation='vertical').
ylabel (str, optional) – The label of the y-axis. Default ‘Classes’ (swapped with xlabel if orientation='vertical').
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
- roc_all_ovr_plot(ax=None, figsize=None, color=None, alpha=None, label=None, linestyle=None, linewidth=None, title=None, title_y=None, xlabel='False Positive Rate', ylabel='True Positive Rate', legend=True, legend_loc='best', drop_intermediate=True, sample_weight=None, chance_level_color='k', chance_level_alpha=None, chance_level_label='Chance level (AUC = 0.5)', chance_level_linestyle='--', chance_level_linewidth=None, chance_level_kwargs={}, legend_kwargs={}, **plot_kwargs)¶
One-vs-rest (OVR) receiver operating characteristic (ROC) curves for all classes.
- Parameters:
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
label (list[str], optional) – List of labels of the ROC curves that are used in the legend. If None (default), each curve is labeled with its class name and its calculated area under the curve (AUC) as: “{class_label} vs rest (AUC = {auc})”
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the ROC curves. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (list[float], optional) –
List of floating-point numbers that set the transparency of each of the ROC curves. The numbers must be in the range 0 (completely transparent line) to 1 (completely opaque line). If None (default), the transparency is set to 1.
linestyle (list[str], optional) –
List of Matplotlib-accepted line styles for the ROC curves. If None (default), solid lines are used for all curves.
linewidth (list[float], optional) – List of floating-point numbers that set the width (in points) of the ROC curves. If None (default), all lines will have a width of 1.5.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. Default ‘False Positive Rate’.
ylabel (str, optional) – The label of the y-axis. Default ‘True Positive Rate’.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
drop_intermediate (bool) – If True (default) drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.
sample_weight (list, optional) – List of shape (n_samples) containing sample weights.
chance_level_color (RGBColorType, optional) –
Matplotlib-accepted color to be used for the chance level line. Default ‘k’.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
chance_level_alpha (float, optional) –
Floating-point number that sets the transparency of the chance level line. The number must be in the range 0 (completely transparent line) to 1 (completely opaque line). If None (default), the transparency is set to 1.
chance_level_label (str, optional) – The label of the chance level line that is used in the legend. Default “Chance level (AUC = 0.5)”.
chance_level_linestyle (str, optional) –
Matplotlib-accepted line style for the chance level line. Default ‘--’.
chance_level_linewidth (float, optional) – Floating-point number that sets the width (in points) of the chance level curve. If None (default), the line will have a width of 1.5.
chance_level_kwargs (dict) – Additional keyword arguments to be passed to matplotlib.pyplot.plot for rendering the chance level line.
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
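The area reported in each legend entry can be cross-checked without tracing the curve. The ROC AUC equals the probability that a randomly chosen sample of the class receives a higher predicted response for that class than a randomly chosen sample of any other class, with ties counted as one half. The sketch below applies that identity to every class at once. The helper name and the toy responses are hypothetical, and sample_weight is ignored.

```python
import numpy as np

def ovr_auc_all(y_true, y_score, classes):
    """One-vs-rest AUC for every class, via the rank-sum (Mann-Whitney) identity."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)  # (n_samples, n_classes)
    aucs = {}
    for k, cls in enumerate(classes):
        pos = y_score[y_true == cls, k]         # class-k response of class-k samples
        neg = y_score[y_true != cls, k]         # class-k response of all other samples
        diff = pos[:, None] - neg[None, :]      # every (positive, negative) pair
        # AUC = probability a random positive outranks a random negative; ties count 1/2.
        aucs[cls] = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return aucs

classes = ["a", "b", "c"]
y_test = ["a", "b", "c", "a", "b", "c"]
y_score = [[0.8, 0.1, 0.1],
           [0.2, 0.7, 0.1],
           [0.1, 0.2, 0.7],
           [0.3, 0.5, 0.2],
           [0.1, 0.6, 0.3],
           [0.4, 0.1, 0.5]]
aucs = ovr_auc_all(y_test, y_score, classes)
```

Class "a" reaches only 7/8 because one "c" sample has a higher class-"a" response (0.4) than one "a" sample (0.3).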
- roc_ovr_plot(class_label: str, ax=None, figsize=None, color=None, alpha=None, label=None, linestyle=None, linewidth=None, title=None, title_y=None, xlabel='False Positive Rate', ylabel='True Positive Rate', legend=True, legend_loc='best', drop_intermediate=True, sample_weight=None, chance_level_color='k', chance_level_alpha=None, chance_level_label='Chance level (AUC = 0.5)', chance_level_linestyle='--', chance_level_linewidth=None, chance_level_kw={}, legend_kwargs={}, **plot_kwargs)¶
One vs rest (OVR) receiver operating characteristic (ROC) curve for the specified class.
- Parameters:
class_label (str) – The class for which the ROC curve is plotted. Must be one of the classes in y_train (i.e., one of the entries of the class’ classes property).
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
label (str, optional) – The label of the ROC curve that is used in the legend. If None (default) the label of the class specified in class_label along with its calculated area under curve (AUC) will be used as: “{class_label} vs rest (AUC = {auc})”
color (RGBColorType, optional) –
Matplotlib-accepted color to be used for the ROC curve. If None (default), the default Matplotlib color will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (float, optional) –
Floating-point number that sets the transparency of the ROC curve. The number must be in the range 0 (completely transparent line) to 1 (completely opaque line). If None (default), the transparency is set to 1.
linestyle (str, optional) –
Matplotlib-accepted line style for the ROC curve. If None (default), a solid line is used.
linewidth (float, optional) – Floating-point number that sets the width (in points) of the ROC curve. If None (default), the line will have a width of 1.5.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. Default ‘False Positive Rate’.
ylabel (str, optional) – The label of the y-axis. Default ‘True Positive Rate’.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
drop_intermediate (bool) – If True (default) drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.
sample_weight (list, optional) – List of shape (n_samples) containing sample weights.
chance_level_color (RGBColorType, optional) –
Matplotlib-accepted color to be used for the chance level line. Default ‘k’.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
chance_level_alpha (float, optional) –
Floating-point number that sets the transparency of the chance level line. The number must be in the range 0 (completely transparent line) to 1 (completely opaque line). If None (default), the transparency is set to 1.
chance_level_label (str, optional) – The label of the chance level line that is used in the legend. Default “Chance level (AUC = 0.5)”.
chance_level_linestyle (str, optional) –
Matplotlib-accepted line style for the chance level line. Default ‘--’.
chance_level_linewidth (float, optional) – Floating-point number that sets the width (in points) of the chance level curve. If None (default), the line will have a width of 1.5.
chance_level_kw (dict) – Additional keyword arguments to be passed to matplotlib.pyplot.plot for rendering the chance level line.
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
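A one-vs-rest ROC curve treats the chosen class as positive and every other class as negative. It then sweeps a decision threshold over the predicted response for that class. The NumPy sketch below computes the curve and its trapezoidal AUC. It keeps every distinct threshold, behaving like drop_intermediate=False, and ignores sample_weight. The function names and data are illustrative, not pyfasma's.

```python
import numpy as np

def roc_curve_ovr(y_true, scores, positive):
    """FPR and TPR of `positive` vs. rest as the decision threshold decreases."""
    y = (np.asarray(y_true) == positive).astype(int)  # 1 = positive class, 0 = rest
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")      # highest score first
    y, s = y[order], scores[order]
    # Last position of each run of equal scores: one ROC point per distinct threshold.
    last = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tps = np.cumsum(y)[last]        # positives scored at or above each threshold
    fps = (last + 1) - tps          # negatives scored at or above each threshold
    tpr = np.r_[0, tps] / tps[-1]   # prepend the (0, 0) corner
    fpr = np.r_[0, fps] / fps[-1]
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the curve with the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y_test = ["a", "b", "a", "b", "a"]
score_a = [0.9, 0.4, 0.7, 0.3, 0.35]  # predicted response of class "a"
fpr, tpr = roc_curve_ovr(y_test, score_a, positive="a")
auc_a = auc_trapezoid(fpr, tpr)
```

The chance level line is the diagonal from (0, 0) to (1, 1), whose trapezoidal area is exactly 0.5.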
- property x_loadings_df: pandas.DataFrame¶
DataFrame containing the loadings of the predictor variables for the PLS components.
- Returns:
DataFrame with the loadings (x_loadings_) of each predictor variable.
- Return type:
pd.DataFrame
- property x_rotations_df: pandas.DataFrame¶
DataFrame containing the rotations of the predictor variables for the PLS components.
- Returns:
DataFrame with the rotations (x_rotations_) of each predictor variable.
- Return type:
pd.DataFrame
- property x_scores_df: pandas.DataFrame¶
DataFrame of shape (n_samples, n_components) containing the X scores (transformed data) of the training samples.
- Returns:
DataFrame containing the X scores of the training samples.
- Return type:
pd.DataFrame
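The sketch below builds intuition for what the X scores are by computing the first PLS component for a single numeric response. On centered data, the weight vector is proportional to Xᵀy, and the scores are the projections t = Xw. This is the closed-form first step of NIPALS for one response column only. PLS-DA on one-hot responses for several classes iterates, and later components require deflation. It also ignores the scale parameter, so it is a simplification rather than pyfasma's or scikit-learn's code.

```python
import numpy as np

def first_pls_component(X, y):
    """Weights w and scores t = Xc @ w of the first PLS component for a single response."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)  # center predictors
    yc = np.asarray(y, dtype=float) - np.mean(y)          # center response
    w = Xc.T @ yc              # direction of maximal covariance with y
    w /= np.linalg.norm(w)     # unit-length weight vector
    t = Xc @ w                 # X scores: one value per sample
    return w, t

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = X[:, 0] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=20)
w, t = first_pls_component(X, y)
```

No other unit-length direction gives scores with a larger covariance with y, which is the defining property of the first PLS component.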
- x_scores_plot(xlv=1, ylv=2, show_percent=True, ax=None, figsize=None, color=None, alpha=None, marker=None, markersize=None, title=None, title_y=None, xlabel=None, ylabel=None, xlim=None, ylim=None, legend=True, legend_loc='best', annotate=False, annotate_ha='center', annotate_va='bottom', annotate_xoffset=0, annotate_yoffset=0, annotate_fontsize=None, ellipse=True, ellipse_conf=0.95, ellipse_alpha=0.25, ellipse_linewidth=1, ellipse_linestyle=None, ellipse_edgecolor='none', legend_kwargs={}, annotate_kwargs={}, ellipse_kwargs={}, **plot_kwargs)¶
Plot a pair of latent variable (LV) scores for the predictors (x) of the training dataset.
- Parameters:
xlv (int, optional) – Latent variable (LV) whose scores are used for the x-axis, given as an integer only, i.e. xlv=1 for the first component (LV1), xlv=2 for the second component (LV2), etc.
ylv (int, optional) – Latent variable (LV) whose scores are used for the y-axis, given as an integer only, i.e. ylv=1 for the first component (LV1), ylv=2 for the second component (LV2), etc.
show_percent (bool, optional) – If True (default), show the percentage of explained variance in the axis labels.
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors to be used for the lines drawn for the corresponding pairs of xlist and ylist. If None (default), the default Matplotlib colors will be used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (float, optional) – Set the transparency of the points as a floating-point number in the range 0 (completely transparent line) to 1 (completely opaque line). If None (default), the transparency is set to 1.
marker (list[str], optional) –
List of Matplotlib-accepted markers for each of the plotted lines. If None (default), the default seaborn.scatterplot markers are used.
See: https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
markersize (float, optional) – Set the size (in points) of the markers. If None (default), the default size of 6 will be used.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. If None (default) it is set to ‘PCn’, where n is the number of the selected PC for the x axis.
ylabel (str, optional) – The label of the y-axis. If None (default) it is set to ‘PCn’, where n is the number of the selected PC for the y axis.
xlim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the x-axis.
ylim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the y-axis.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
annotate (bool, optional) – If True, add the percentages of each PC in the plot. Default False.
annotate_ha ({'left', 'center', 'right'}) – The horizontal anchor to which the text is aligned. Default ‘center’.
annotate_va ({'baseline', 'bottom', 'center', 'center_baseline', 'top'}, optional) – The vertical anchor to which the text is aligned. Default ‘bottom’.
annotate_xoffset (Union[int, float], optional) – Horizontal offset to add to the annotated text. Default 0.
annotate_yoffset (Union[int, float], optional) – Vertical offset to add to the annotated text. Default 0.
annotate_fontsize (float or {'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'}, optional) – The font size in points, or a string denoting a size relative to the default font size. Default None.
ellipse (bool) – If True (default), draw covariance confidence ellipses over the (x, y) LV scores. The group of points that each ellipse is calculated for is determined by self.y_train. The color of each ellipse is the same as the color of the group it is calculated for.
ellipse_conf (float) – The confidence level that defines the size of the ellipse. Default 0.95 (95%).
ellipse_alpha (float, optional) – Set the transparency of the ellipse as a floating-point number in the range 0 (completely transparent) to 1 (completely opaque). Default 0.25; if None, the transparency is set to 1.
ellipse_linewidth (float, optional) – The width of the ellipse’s outline in points. Default 1; if None, the width is set to 1.5 points.
ellipse_linestyle (str, optional) –
A Matplotlib-accepted line style for the ellipse’s outline. If None (default), a solid line will be used.
ellipse_edgecolor (RGBColorType or 'none', optional) –
A Matplotlib-accepted color for the outline of the ellipse. If ‘none’ (default), no outline is drawn.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
annotate_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
ellipse_kwargs (dict, optional) –
Additional keyword arguments are passed to matplotlib.patches.Ellipse.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.patches.Ellipse.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
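The ellipse, ellipse_conf and ellipse_* parameters above describe covariance confidence ellipses drawn per group of scores. The sketch below shows one common way to compute such an ellipse for a single group of (x, y) scores; it is an assumption, not pyfasma’s actual code, which may for example scale the axes by a fixed number of standard deviations instead. It treats the group as bivariate normal and uses the chi-square quantile with 2 degrees of freedom, which has the closed form -2 ln(1 - conf), so SciPy is not needed. confidence_ellipse_params is a hypothetical helper.

```python
import numpy as np


def confidence_ellipse_params(x, y, conf=0.95):
    """Return (center, width, height, angle_deg) of a covariance confidence ellipse.

    Assumption: the ellipse is the chi-square contour of a bivariate normal
    fitted to the points (x, y); this is illustrative, not pyfasma's code.
    """
    pts = np.column_stack([x, y]).astype(float)
    center = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)          # 2x2 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    q = -2.0 * np.log(1.0 - conf)            # chi-square(2 dof) quantile at `conf`
    # Full axis lengths (diameters), as matplotlib.patches.Ellipse expects.
    width = 2.0 * np.sqrt(q * eigvals[1])    # major axis
    height = 2.0 * np.sqrt(q * eigvals[0])   # minor axis
    major = eigvecs[:, 1]                    # direction of the major axis
    angle = np.degrees(np.arctan2(major[1], major[0]))
    return center, width, height, angle
```

The returned values map onto matplotlib.patches.Ellipse(center, width, height, angle=angle, alpha=0.25, linewidth=1), added to the axes once per group of y_train.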
- property x_variance_df: pandas.DataFrame¶
DataFrame containing the variance, cumulative variance, and their respective ratios for the x_train data explained by the PLS components.
- Returns:
The DataFrame containing the variance, the cumulative variance, and their respective ratios explained by the PLS components.
- Return type:
pd.DataFrame
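The ratios in x_variance_df can be read as follows: each PLS component contributes a rank-one term t_a p_a^T (x scores times x loadings) to the reconstruction of the centered predictors, and because PLS x scores are mutually orthogonal these contributions add up. The minimal NumPy sketch below computes per-component and cumulative variance and ratios under that assumption. The function name x_variance_explained and the use of the sample variance (n - 1) are illustrative choices; pyfasma’s exact definition may differ.

```python
import numpy as np


def x_variance_explained(Xc, T, P):
    """Per-component X variance explained by a PLS model (illustrative sketch).

    Xc: centered (and possibly scaled) predictors, shape (n_samples, n_features)
    T:  x scores with mutually orthogonal columns, shape (n_samples, n_components)
    P:  x loadings, shape (n_features, n_components)
    """
    n = Xc.shape[0]
    total = np.sum(Xc ** 2) / (n - 1)   # total variance of the predictors
    # ||t_a p_a^T||_F^2 = ||t_a||^2 * ||p_a||^2 for each component a.
    var = np.sum(T ** 2, axis=0) * np.sum(P ** 2, axis=0) / (n - 1)
    ratio = var / total
    return var, np.cumsum(var), ratio, np.cumsum(ratio)
```

Xc should be the same centered (and, if the model scales its inputs, standardized) matrix the model was fitted on; wrapping the four returned arrays in a pd.DataFrame gives a table in the spirit of x_variance_df.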
- property x_weights_df: pandas.DataFrame¶
DataFrame containing the weights of the predictor variables for the PLS components.
- Returns:
DataFrame with the weights (x_weights_) of each predictor variable.
- Return type:
pd.DataFrame
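In NIPALS-style PLS, the weights reported by x_weights_df act on the deflated predictor matrix at each step, so they cannot be applied directly to the original data. The sketch below, a minimal single-response NIPALS PLS written for illustration (pls1_nipals and x_rotations are hypothetical helpers, not part of pyfasma), shows the standard relation R = W (P^T W)^-1 that turns the weights W and x loadings P into rotations R mapping the centered original predictors straight to the x scores. scikit-learn’s sklearn.cross_decomposition.PLSRegression exposes the corresponding quantities as x_weights_, x_loadings_ and x_rotations_.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Minimal PLS1 (single response) via NIPALS; returns weights, loadings, scores."""
    Xk = X - X.mean(axis=0)   # centered predictors, deflated in the loop
    yk = y - y.mean()         # centered response, deflated in the loop
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_components))
    P = np.zeros((n_features, n_components))
    T = np.zeros((n_samples, n_components))
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = Xk @ w                    # scores computed on the deflated X
        p = Xk.T @ t / (t @ t)        # x loadings
        c = (yk @ t) / (t @ t)        # y loading (a scalar for PLS1)
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W[:, a], P[:, a], T[:, a] = w, p, t
    return W, P, T


def x_rotations(W, P):
    """Rotations R such that (X - mean) @ R reproduces the x scores T."""
    return W @ np.linalg.inv(P.T @ W)
```

Because P^T W is upper triangular with a unit diagonal in this algorithm, the inverse always exists; scikit-learn uses a pseudo-inverse for the same product as a safeguard.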
- property y_loadings_df: pandas.DataFrame¶
DataFrame containing the loadings of the response variables for the PLS components.
- Returns:
DataFrame with the loadings (y_loadings_) of each response variable.
- Return type:
pd.DataFrame
- property y_pred_bin_df¶
y_pred binarized data in DataFrame representation.
- Returns:
DataFrame containing the y_pred binarized data.
- Return type:
pd.DataFrame
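How the continuous predictions are binarized is not specified above. For PLS-DA a common rule, assumed here, is to assign each sample to the class whose predicted column is largest; the NumPy sketch below does that and, as a further assumption, thresholds a single-column (two-class) prediction at 0.5. binarize_predictions is a hypothetical helper, not pyfasma’s implementation.

```python
import numpy as np


def binarize_predictions(y_pred):
    """One-hot encode continuous predictions by taking the arg-max column.

    Assumption: each column of y_pred is the predicted score for one class.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    if y_pred.ndim == 1 or y_pred.shape[1] == 1:
        # Single-column case: threshold at 0.5 (an assumption).
        return (y_pred.reshape(-1, 1) >= 0.5).astype(int)
    winners = np.argmax(y_pred, axis=1)              # predicted class per sample
    y_bin = np.zeros_like(y_pred, dtype=int)
    y_bin[np.arange(len(winners)), winners] = 1      # set one 1 per row
    return y_bin
```

Wrapping the result as pd.DataFrame(binarize_predictions(y_pred), columns=class_names), with class_names a hypothetical list of class labels, gives a DataFrame representation like y_pred_bin_df.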
- property y_pred_df¶
y_pred data in DataFrame representation.
- Returns:
DataFrame containing the y_pred data.
- Return type:
pd.DataFrame
- property y_rotations_df: pandas.DataFrame¶
DataFrame containing the rotations of the response variables for the PLS components.
- Returns:
DataFrame with the rotations (y_rotations_) of each response variable.
- Return type:
pd.DataFrame
- property y_scores_df¶
DataFrame of shape (n_samples, n_components) containing the Y scores (transformed data) of the training targets.
- Returns:
DataFrame containing the Y scores of the training targets.
- Return type:
pd.DataFrame
- y_scores_plot(xlv=1, ylv=2, ax=None, figsize=None, color=None, alpha=None, marker=None, markersize=None, title=None, title_y=None, xlabel=None, ylabel=None, xlim=None, ylim=None, legend=True, legend_loc='best', annotate=False, annotate_ha='center', annotate_va='bottom', annotate_xoffset=0, annotate_yoffset=0, annotate_fontsize=None, ellipse=True, ellipse_conf=0.95, ellipse_alpha=0.25, ellipse_linewidth=1, ellipse_linestyle=None, ellipse_edgecolor='none', legend_kwargs={}, annotate_kwargs={}, ellipse_kwargs={}, **plot_kwargs)¶
Plot pairs of latent variable (LV) scores for the response variables (y) of the training dataset.
- Parameters:
xlv (int, optional) – Latent variable (LV) scores to use as the x-axis, given as the integer index of the LV only, i.e. xlv=1 for the first component (LV1), xlv=2 for the second component (LV2), etc. Default 1.
ylv (int, optional) – Latent variable (LV) scores to use as the y-axis, given as the integer index of the LV only, i.e. ylv=1 for the first component (LV1), ylv=2 for the second component (LV2), etc. Default 2.
ax (Axes, optional) – A matplotlib.axes.Axes object in which the plot will be added.
figsize (tuple[float, float], optional) – Set the size of the figure as (width, height) in inches. Default (6.4, 4.8).
color (list[RGBColorType], optional) –
List of Matplotlib-accepted colors, one for each plotted group of points. If None (default), the default Matplotlib colors are used.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
alpha (float, optional) – Set the transparency of the points as a floating-point number in the range 0 (completely transparent) to 1 (completely opaque). If None (default), the transparency is set to 1.
marker (list[str], optional) –
List of Matplotlib-accepted markers, one for each plotted group of points. If None (default), the default seaborn.scatterplot markers are used.
See: https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers
markersize (float, optional) – Set the size (in points) of the markers. If None (default), the default size of 6 will be used.
title (str, optional) – The title of the plot. Default None.
title_y (float, optional) – The y-offset of the title (0.0 bottom, 1.0 top). Default None.
xlabel (str, optional) – The label of the x-axis. If None (default) it is set to ‘PCn’, where n is the number of the selected PC for the x axis.
ylabel (str, optional) – The label of the y-axis. If None (default) it is set to ‘PCn’, where n is the number of the selected PC for the y axis.
xlim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the x-axis.
ylim ((float, float), optional) – A tuple of floating-point numbers that sets the lower and upper limits of the y-axis.
legend (bool, optional) – If True (default), show the plot’s legend.
legend_loc (str or (x, y), optional) –
Set the location of the plot’s legend using parameters accepted by matplotlib.pyplot.legend. The strings ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’ place the legend at the corresponding corner of the axes. The strings ‘upper center’, ‘lower center’, ‘center left’, ‘center right’ place the legend at the center of the corresponding edge of the axes. The string ‘center’ places the legend at the center of the axes. The string ‘best’ (default) places the legend at the location, among the nine locations defined so far, with the minimum overlap with other drawn artists. This option can be quite slow for plots with large amounts of data; your plotting speed may benefit from providing a specific location. The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
annotate (bool, optional) – If True, add the percentages of each PC in the plot. Default False.
annotate_ha ({'left', 'center', 'right'}) – The horizontal anchor to which the text is aligned. Default ‘center’.
annotate_va ({'baseline', 'bottom', 'center', 'center_baseline', 'top'}, optional) – The vertical anchor to which the text is aligned. Default ‘bottom’.
annotate_xoffset (Union[int, float], optional) – Horizontal offset to add to the annotated text. Default 0.
annotate_yoffset (Union[int, float], optional) – Vertical offset to add to the annotated text. Default 0.
annotate_fontsize (float or {'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'}, optional) – The font size in points, or a string denoting a size relative to the default font size. Default None.
ellipse (bool) – If True (default), draw covariance confidence ellipses over the (x, y) LV scores. The group of points that each ellipse is calculated for is determined by self.y_train. The color of each ellipse is the same as the color of the group it is calculated for.
ellipse_conf (float) – The confidence level that defines the size of the ellipse. Default 0.95 (95%).
ellipse_alpha (float, optional) – Set the transparency of the ellipse as a floating-point number in the range 0 (completely transparent) to 1 (completely opaque). Default 0.25; if None, the transparency is set to 1.
ellipse_linewidth (float, optional) – The width of the ellipse’s outline in points. Default 1; if None, the width is set to 1.5 points.
ellipse_linestyle (str, optional) –
A Matplotlib-accepted line style for the ellipse’s outline. If None (default), a solid line will be used.
ellipse_edgecolor (RGBColorType or 'none', optional) –
A Matplotlib-accepted color for the outline of the ellipse. If ‘none’ (default), no outline is drawn.
See: https://matplotlib.org/stable/users/explain/colors/colors.html#colors-def
legend_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
annotate_kwargs (dict, optional) –
Additional keyword arguments passed as dictionary key: value pairs to matplotlib.axes.Axes.legend.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html
ellipse_kwargs (dict, optional) –
Additional keyword arguments are passed to matplotlib.patches.Ellipse.
See: https://matplotlib.org/stable/api/_as_gen/matplotlib.patches.Ellipse.html
**plot_kwargs – Additional keyword arguments are passed to matplotlib.pyplot.plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
- property y_test_bin_df: pandas.DataFrame¶
y_test binarized data in DataFrame representation.
- Returns:
DataFrame containing the y_test binarized data.
- Return type:
pd.DataFrame
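For the true labels, binarization amounts to one-hot encoding. The sketch below assumes one column per class, ordered by the sorted unique labels (the ordering np.unique produces); binarize_labels is a hypothetical helper, not pyfasma’s implementation. sklearn.preprocessing.LabelBinarizer does the same for three or more classes but returns a single column in the two-class case.

```python
import numpy as np


def binarize_labels(labels):
    """One-hot encode class labels; columns follow the sorted unique labels."""
    classes, idx = np.unique(np.asarray(labels), return_inverse=True)
    idx = idx.ravel()                                  # class index per sample
    y_bin = np.zeros((len(idx), len(classes)), dtype=int)
    y_bin[np.arange(len(idx)), idx] = 1                # one 1 per row
    return classes, y_bin
```

pd.DataFrame(y_bin, columns=classes) then gives a DataFrame representation comparable to y_test_bin_df.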
- property y_test_df: pandas.DataFrame¶
y_test data in DataFrame representation.
- Returns:
DataFrame containing the y_test data.
- Return type:
pd.DataFrame
- property y_weights_df: pandas.DataFrame¶
DataFrame containing the weights of the response variables for the PLS components.
- Returns:
DataFrame with the weights (y_weights_) of each response variable.
- Return type:
pd.DataFrame