syndat.postprocessing

Optional cleaning and formatting helpers.

syndat.postprocessing.assert_minmax(real, synthetic, method='clip')

Postprocess the synthetic data so that it stays within the min-max range of the real data, either by deleting records that fall outside that range or by adjusting them to fit within it. The function also normalizes -0.0 to 0.0 to avoid plotting issues.

Parameters:
  • real (DataFrame) – The real dataset.

  • synthetic (DataFrame) – The synthetic dataset.

  • method (str) – How to handle out-of-range values: ‘delete’ removes the affected records, ‘clip’ adjusts them to fit within the range. Defaults to ‘clip’.

Return type:

DataFrame

Returns:

The postprocessed synthetic dataset.
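
The difference between the two methods can be illustrated with a minimal pure-Python sketch. This is not syndat's implementation: the helper name `enforce_minmax`, the list-of-dicts record layout, and the `ValueError` raised for an unknown method are all assumptions made for illustration.

```python
def enforce_minmax(real_rows, synthetic_rows, method="clip"):
    """Hypothetical helper mirroring the documented 'clip' / 'delete' behavior."""
    # Per-column (min, max) bounds observed in the real records.
    columns = list(real_rows[0].keys())
    bounds = {
        c: (min(r[c] for r in real_rows), max(r[c] for r in real_rows))
        for c in columns
    }

    if method == "clip":
        # Clamp every value into [lo, hi]; adding 0.0 turns -0.0 into 0.0.
        return [
            {c: min(max(row[c], lo), hi) + 0.0 for c, (lo, hi) in bounds.items()}
            for row in synthetic_rows
        ]

    if method == "delete":
        # Keep only records whose values all lie inside the real range.
        kept = [
            row for row in synthetic_rows
            if all(lo <= row[c] <= hi for c, (lo, hi) in bounds.items())
        ]
        return [{c: row[c] + 0.0 for c in columns} for row in kept]

    # Assumption: the real function may handle unknown methods differently.
    raise ValueError(f"unknown method: {method!r}")


real = [{"age": 20.0, "score": 0.0}, {"age": 60.0, "score": 10.0}]
synthetic = [
    {"age": 18.0, "score": 5.0},   # age below the real minimum
    {"age": 35.0, "score": -0.0},  # in range, but carries a negative zero
    {"age": 70.0, "score": 12.0},  # both values above the real maximum
]

clipped = enforce_minmax(real, synthetic, method="clip")
deleted = enforce_minmax(real, synthetic, method="delete")
```

In this sketch, 'clip' keeps all three records and moves the out-of-range values to the nearest bound, while 'delete' keeps only the middle record. Clipping preserves the sample size but can pile values up at the bounds; deleting avoids that at the cost of fewer records.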

syndat.postprocessing.normalize_float_precision(real, synthetic)

Postprocess the synthetic data to match the precision or step size found in the real data for float columns.

This function identifies columns in the real dataset that have float data types and determines the precision or step size (e.g., 1.0, 0.5, 0.1) used in those columns. It then rounds the corresponding columns in the synthetic dataset to match this detected precision or step size.

Parameters:
  • real (DataFrame) – The real dataset containing float columns.

  • synthetic (DataFrame) – The synthetic dataset that needs to be adjusted to match the precision of the real data.

Return type:

DataFrame

Returns:

The synthetic dataset with float columns rounded to match the precision or step size of the real data.
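
One way to picture the step-size idea is the following pure-Python sketch for a single column. How syndat actually detects the precision is not stated here, so the candidate list, the tolerance, and the helper names `detect_step` and `round_to_step` are assumptions for illustration only.

```python
# Hypothetical candidate step sizes, ordered from coarsest to finest.
CANDIDATE_STEPS = (1.0, 0.5, 0.1, 0.05, 0.01, 0.001)


def detect_step(values, candidates=CANDIDATE_STEPS, tol=1e-9):
    """Return the coarsest candidate step that every value is a multiple of."""
    for step in candidates:
        if all(abs(v / step - round(v / step)) < tol for v in values):
            return step
    return None  # no candidate fits; the caller decides what to do


def round_to_step(values, step):
    """Snap each value to the nearest multiple of `step`."""
    # The outer round() removes float noise such as 0.7000000000000001.
    return [round(round(v / step) * step, 6) for v in values]


real_column = [1.5, 2.0, 3.5, 4.0]       # recorded in half-unit steps
synthetic_column = [1.26, 2.74, 3.49]    # arbitrary generated floats

step = detect_step(real_column)                        # 0.5
rounded = round_to_step(synthetic_column, step)        # [1.5, 2.5, 3.5]
```

Note that Python's built-in `round` resolves exact ties to the nearest even integer, so values lying exactly halfway between two steps may round differently than with other rounding rules.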

syndat.postprocessing.normalize_scale(real_df, synthetic_df)

Scale the columns in the synthetic DataFrame to match the scale (min and max values) of the corresponding columns in the real DataFrame.

Parameters:
  • real_df (DataFrame) – The real dataset used as the scaling reference.

  • synthetic_df (DataFrame) – The synthetic dataset to be scaled.

Return type:

DataFrame

Returns:

The scaled synthetic dataset with columns adjusted to the real dataset’s scale.
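
Matching a column's min and max to a reference range is commonly done with linear min-max rescaling. The sketch below shows that technique for a single column in pure Python. It is an assumption that syndat uses this exact formula; the helper name `rescale_to_reference` and the handling of constant columns are hypothetical.

```python
def rescale_to_reference(values, ref_min, ref_max):
    """Linearly map `values` so their min/max become ref_min/ref_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column has no spread to rescale,
        # so every value is mapped to the reference minimum.
        return [ref_min for _ in values]
    factor = (ref_max - ref_min) / (hi - lo)
    return [ref_min + (v - lo) * factor for v in values]


real_column = [20.0, 35.0, 60.0]
synthetic_column = [0.0, 5.0, 10.0]

scaled = rescale_to_reference(
    synthetic_column, min(real_column), max(real_column)
)  # [20.0, 40.0, 60.0]
```

Unlike clipping, this kind of rescaling moves every value in the column, not only those outside the real range, so the relative spacing of the synthetic values is preserved.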