syndat.visualization

Visualize feature distributions, correlations, and SHAP analysis.

syndat.visualization.plot_categorical_feature(feature, real_data, synthetic_data, y_scale='auto')

Plots count plots for a categorical feature from both real and synthetic datasets.

Parameters:
  • feature (str) – The feature to be plotted

  • real_data (DataFrame) – The real data

  • synthetic_data (DataFrame) – The synthetic data

  • y_scale (Literal['auto', 'absolute', 'relative']) – Categorical y-axis scale mode:

      - “auto”: uses relative frequencies when the real and synthetic sample sizes differ by at least 1%, otherwise absolute counts.
      - “absolute”: always uses absolute counts.
      - “relative”: always uses relative frequencies (%).

Return type:

None
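The y_scale modes described above can be sketched in plain Python. This is an illustration only, not syndat's implementation: the helper names resolve_y_scale and category_heights are hypothetical, and the sketch assumes the 1% difference is measured relative to the larger of the two sample sizes, which the documentation does not specify.

```python
from collections import Counter


def resolve_y_scale(y_scale, n_real, n_synthetic, threshold=0.01):
    """Map a requested mode to the scale actually used ('absolute' or 'relative')."""
    if y_scale in ("absolute", "relative"):
        return y_scale
    if y_scale != "auto":
        raise ValueError(f"unknown y_scale: {y_scale!r}")
    larger = max(n_real, n_synthetic)
    if larger == 0:
        return "absolute"
    # Assumption: the size difference is taken relative to the larger sample.
    difference = abs(n_real - n_synthetic) / larger
    return "relative" if difference >= threshold else "absolute"


def category_heights(values, mode):
    """Return bar heights per category: raw counts or percentages."""
    counts = Counter(values)
    if mode == "absolute":
        return dict(counts)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Toy data: the synthetic sample is half the size of the real one.
real = ["a"] * 60 + ["b"] * 40
synthetic = ["a"] * 30 + ["b"] * 20

mode = resolve_y_scale("auto", len(real), len(synthetic))
real_heights = category_heights(real, mode)
synthetic_heights = category_heights(synthetic, mode)
print(mode, real_heights, synthetic_heights)
```

With unequal sample sizes, absolute counts would make the smaller dataset look under-represented in every category; relative frequencies put both datasets on the same scale, which is presumably why “auto” switches to them.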

syndat.visualization.plot_correlations(real, synthetic, store_destination)

Plots correlation matrices for the real and synthetic features as heatmaps.

Parameters:
  • real (DataFrame) – The real data

  • synthetic (DataFrame) – The synthetic data

  • store_destination (str) – Path to the folder where the results should be stored.

Return type:

None
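To make concrete what such a heatmap encodes, the following sketch computes a pairwise correlation matrix per dataset using only the standard library. It assumes Pearson correlation over numeric columns; the documentation does not state which correlation measure syndat uses, and the helpers pearson and correlation_matrix are hypothetical names, not part of the library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict {feature: {feature: correlation}} from named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy data: the synthetic set inverts the age/score relationship.
real = {"age": [30, 40, 50, 60], "score": [1.0, 2.0, 3.0, 4.0]}
synthetic = {"age": [30, 40, 50, 60], "score": [4.0, 3.0, 2.0, 1.0]}

real_corr = correlation_matrix(real)
synthetic_corr = correlation_matrix(synthetic)
print(real_corr["age"]["score"], synthetic_corr["age"]["score"])
```

Comparing the two matrices cell by cell is what the side-by-side heatmaps allow visually: a feature pair whose correlation differs strongly between the datasets indicates a relationship the synthetic data failed to preserve.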

syndat.visualization.plot_distributions(real, synthetic, store_destination, categorical_y_scale='auto')

Plots violin plots (numeric features) or bar charts (categorical features) together with their summary statistics.

Parameters:
  • real (DataFrame) – The real data

  • synthetic (DataFrame) – The synthetic data

  • store_destination (str) – Path to the folder where the results should be stored.

  • categorical_y_scale (Literal['auto', 'absolute', 'relative']) – Categorical y-axis scale mode:

      - “auto”: uses relative frequencies when the real and synthetic sample sizes differ by at least 1%, otherwise absolute counts.
      - “absolute”: always uses absolute counts.
      - “relative”: always uses relative frequencies (%).

Return type:

None

syndat.visualization.plot_numerical_feature(feature, real_data, synthetic_data)

Plots violin plots for a numerical feature from both real and synthetic datasets and displays their summary statistics.

Parameters:
  • feature (str) – The feature to be plotted

  • real_data (DataFrame) – The real data

  • synthetic_data (DataFrame) – The synthetic data

Return type:

None
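As a rough illustration of the kind of summary statistics that can accompany a violin plot, the sketch below summarizes one numerical feature for each dataset. The choice of statistics (mean, median, sample standard deviation, minimum, maximum) is an assumption, since the documentation does not list which ones syndat displays, and summarize is a hypothetical helper.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a numeric sequence."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Toy data: the synthetic feature is shifted upward by 5.
real_age = [30, 40, 50, 60]
synthetic_age = [35, 45, 55, 65]

real_summary = summarize(real_age)
synthetic_summary = summarize(synthetic_age)
print(real_summary)
print(synthetic_summary)
```

Here the two samples have the same spread but different centers, a difference that would show up as a vertical offset between the two violins.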

syndat.visualization.plot_shap_discrimination(real, synthetic, save_path=None)

Generates a SHAP summary plot to illustrate the discrimination between real and synthetic datasets using a Random Forest classifier.

Parameters:
  • real (DataFrame) – The real data

  • synthetic (DataFrame) – The synthetic data

  • save_path (Optional[str]) – Path to the file where the resulting plot should be saved. If None, the plot will not be saved.

Return type:

None

Returns:

None