syndat.rct.visualization_clinical_trials

syndat.rct.visualization_clinical_trials.assign_visit_absolute(dat_vpc, Visits, geometric=False)

Assigns each time point in dat_vpc to a visit bin defined by Visits, using linear or geometric spacing.

Parameters:
  • dat_vpc (Union[ndarray, Series]) – Array or Series of time points to bin.

  • Visits (Union[ndarray, Series]) – Array or Series of reference visit times.

  • geometric (bool) – If True, uses geometric (log-space) binning.

Return type:

ndarray

Returns:

Array of assigned visit values.
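
To make the difference between linear and geometric binning concrete, the sketch below assigns each time point to the nearest reference visit, measuring distance either on the raw scale or on a log scale. The helper name nearest_visit and the nearest-neighbour rule are assumptions made for illustration; the library's actual binning rule may differ.

```python
import numpy as np

def nearest_visit(times, visits, geometric=False):
    """Hypothetical helper: map each time point to the closest reference visit."""
    times = np.asarray(times, dtype=float)
    visits = np.asarray(visits, dtype=float)
    if geometric:
        # Compare distances in log space (requires strictly positive values).
        t, v = np.log(times), np.log(visits)
    else:
        t, v = times, visits
    # Pairwise distance matrix: rows are time points, columns are visits.
    idx = np.abs(t[:, None] - v[None, :]).argmin(axis=1)
    return visits[idx]

visits = np.array([1.0, 10.0, 100.0])
linear = nearest_visit([2.0, 40.0], visits)
geometric = nearest_visit([2.0, 40.0], visits, geometric=True)
```

In this toy case the time point 40 is closer to 10 on the raw scale but closer to 100 on the log scale, so the two modes disagree on it.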

syndat.rct.visualization_clinical_trials.bar_categorical(plt_dt, var_name, type_, strat_vars=None)

Generates a bar chart for a categorical variable comparing observed vs reconstructed distributions.

Parameters:
  • plt_dt (DataFrame) – DataFrame with one categorical variable to plot.

  • var_name (str) – Name of the variable to use as title.

  • type_ – “Percentage” or “Subjects”; defines the bar heights.

  • strat_vars (Optional[List[str]]) – Optional list of variables to use for faceting.

  • x_label – Label for the x-axis (default: “Real Data”).

  • y_label – Label for the y-axis (default: “Synthetic Data”).

Return type:

ggplot

Returns:

ggplot object.
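
The two bar-height options can be illustrated with plain pandas. The sketch below is not the library's implementation; the column names 'TYPE', 'DV' and 'SUBJID' and the toy values are assumptions chosen to mirror the long format described elsewhere in this module.

```python
import pandas as pd

# Toy data: one row per subject and data source (column names are assumptions).
plt_dt = pd.DataFrame({
    "TYPE": ["Observed"] * 4 + ["Reconstructed"] * 4,
    "SUBJID": [1, 2, 3, 4, 1, 2, 3, 4],
    "DV": ["A", "A", "B", "B", "A", "B", "B", "B"],
})

# "Subjects": number of distinct subjects per category and source.
subjects = plt_dt.groupby(["TYPE", "DV"])["SUBJID"].nunique().rename("Subjects")

# "Percentage": the same counts as a share of each source's total.
totals = subjects.groupby(level="TYPE").transform("sum")
percentage = (100 * subjects / totals).rename("Percentage")

summary = pd.concat([subjects, percentage], axis=1).reset_index()
```

Percentages are easier to compare when the real and synthetic data sets differ in size, while subject counts keep the absolute sample size visible.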

syndat.rct.visualization_clinical_trials.bar_categorical_list(rp0, dt, mode='Reconstructed', type_='Percentage', dt_cs=None, dt_cs_label=['Counterfactual'], strat_vars=None, static=False, real_label='Real Data', syn_label='Synthetic Data', save_path=None, width=8, height=6, dpi=300, as_png=False)

Generates and optionally saves bar plots for all categorical variables listed in rp0.

Parameters:
  • rp0 (dict) – Dictionary with a key ‘long_cat’ containing a list of categorical variable names.

  • dt (DataFrame) – DataFrame with the columns ‘REPI’, ‘TYPE’, ‘Variable’, ‘DV’, ‘SUBJID’, ‘TIME’ (if static = False) and optionally others.

  • mode (Optional[str]) – String, usually “Reconstructed”, used for filtering TYPE.

  • type_ – “Percentage” or “Subjects”; defines the bar heights (default: “Percentage”).

  • dt_cs (Optional[DataFrame]) – Optional list of counterfactual DataFrames.

  • dt_cs_label (Optional[List[str]]) – Optional list of labels for the counterfactual data.

  • strat_vars (Optional[List[str]]) – Optional list of variables to use for faceting.

  • static (Optional[bool]) – If True, metrics for static variables will be calculated.

  • real_label (Optional[str]) – Label for the real data (default: “Real Data”).

  • syn_label (Optional[str]) – Label for the synthetic data (default: “Synthetic Data”).

  • save_path (Optional[str]) – Optional path to folder where plots should be saved. If not provided, plots are shown.

  • width (Optional[int]) – Width of the saved plot in inches (used only if save_path is provided).

  • height (Optional[int]) – Height of the saved plot in inches (used only if save_path is provided).

  • dpi (Optional[int]) – Resolution (dots per inch) of the saved plot (used only if save_path is provided).

  • as_png (Optional[bool]) – If True, saves the plot as a PNG file.

Return type:

Dict[str, ggplot]

Returns:

Dictionary of ggplot objects keyed by variable name.
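
A minimal sketch of the inputs this function expects, built with pandas only. The variable name "SEVERITY", the toy values, and the meanings given to 'REPI' and 'DV' in the comments are assumptions; the call itself is shown as a comment because it needs syndat installed.

```python
import pandas as pd

# Variable registry: 'long_cat' lists the categorical variables to plot.
rp0 = {"long_cat": ["SEVERITY"]}  # "SEVERITY" is a made-up variable name

rows = []
for data_type, values in [("Observed", ["mild", "severe"]),
                          ("Reconstructed", ["mild", "mild"])]:
    for subjid, value in enumerate(values, start=1):
        rows.append({
            "REPI": 1,            # assumed: replicate index
            "TYPE": data_type,    # "Observed" or the value passed as `mode`
            "Variable": "SEVERITY",
            "DV": value,          # assumed: the recorded value
            "SUBJID": subjid,
            "TIME": 0,            # needed because static=False by default
        })
dt = pd.DataFrame(rows)

# With syndat installed, the call would then look like this (not executed here):
# from syndat.rct import visualization_clinical_trials as vct
# plots = vct.bar_categorical_list(rp0, dt, type_="Subjects", save_path="plots")
```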

syndat.rct.visualization_clinical_trials.gof_binary_list(rp0, dt, mode='Reconstructed', strat_vars=None, static=False, x_label='Real Data', y_label='Synthetic Data', save_path=None, width=8, height=6, dpi=300, as_png=False)

Creates goodness-of-fit (calibration) plots for binary variables by comparing the proportion of observed vs. reconstructed outcomes over time (in %).

Parameters:
  • rp0 (dict) – Dictionary with a key ‘long_bin’ containing a list of binary variable names.

  • dt (DataFrame) – pd.DataFrame with at least columns ‘REPI’, ‘TYPE’, ‘Variable’, ‘DV’, ‘TIME’ (if static = False), and optionally stratification variables.

  • strat_vars (Optional[List[str]]) – Optional list of column names for stratified (faceted) plots.

  • static (Optional[bool]) – If True, metrics for static variables will be calculated.

  • x_label (Optional[str]) – Label for the x-axis (default: “Real Data”).

  • y_label (Optional[str]) – Label for the y-axis (default: “Synthetic Data”).

  • save_path (Optional[str]) – Optional path to a folder. If provided, saves each plot as a PNG. If not provided, plots will be shown interactively.

  • width (Optional[int]) – Width of the saved plot in inches (used only if save_path is provided).

  • height (Optional[int]) – Height of the saved plot in inches (used only if save_path is provided).

  • dpi (Optional[int]) – Resolution (dots per inch) of the saved plot (used only if save_path is provided).

  • as_png (Optional[bool]) – If True, saves the plot as a PNG file.

Return type:

Dict[str, ggplot]

Returns:

Dictionary mapping each variable name to its ggplot object.
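
The quantity compared in such a calibration plot can be sketched as follows: the percentage of records with DV equal to 1 per time point, computed separately for each data source and then placed side by side. This is an illustration under assumed data (the variable name "RESP" and all values are made up), not the library's code.

```python
import pandas as pd

dt = pd.DataFrame({
    "TYPE": ["Observed"] * 4 + ["Reconstructed"] * 4,
    "Variable": ["RESP"] * 8,   # "RESP" is a made-up binary variable
    "TIME": [0, 0, 6, 6] * 2,
    "DV": [0, 1, 1, 1, 0, 0, 1, 0],
})

# Percentage of records with DV == 1 per variable, time point and source.
pct = dt.groupby(["Variable", "TIME", "TYPE"])["DV"].mean().mul(100)

# One row per time point, one column per source:
# these are the (x, y) pairs a calibration plot would show.
pairs = pct.unstack("TYPE")
```

Points close to the identity line indicate that the reconstructed proportions track the observed ones.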

syndat.rct.visualization_clinical_trials.gof_continuous(plt_dt, var_name, strat_vars=None, log_trans=False, x_label='Real Data', y_label='Synthetic Data')

Generates a goodness-of-fit (GOF) plot for continuous variables using observed vs. reconstructed values.

Produces scatter plots with a smoothing line and identity line. Optionally applies log-transformation and stratification by specified variables.

Parameters:
  • plt_dt (DataFrame) – A pandas DataFrame containing columns ‘Observed’ and ‘Reconstructed’, and optionally stratification variables.

  • var_name (str) – Name of the variable to display in the plot title.

  • strat_vars (Optional[List[str]]) – Optional list of column names to stratify the plot using facet wrap.

  • log_trans (bool) – Whether to apply a log10 transformation to the axes.

  • x_label (str) – Label for the x-axis (default: “Real Data”).

  • y_label (str) – Label for the y-axis (default: “Synthetic Data”).

Return type:

ggplot

Returns:

A ggplot object representing the GOF plot.
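
One way to obtain a frame with ‘Observed’ and ‘Reconstructed’ columns from long-format data is a pivot on TYPE, sketched below with pandas. The long-format layout and the toy values are assumptions; how the library prepares plt_dt internally is not documented here.

```python
import numpy as np
import pandas as pd

long_dt = pd.DataFrame({
    "TYPE": ["Observed", "Observed", "Reconstructed", "Reconstructed"],
    "SUBJID": [1, 2, 1, 2],
    "TIME": [0, 0, 0, 0],
    "DV": [10.0, 100.0, 12.0, 90.0],
})

# Pair each observed value with its reconstruction for the same subject and time.
plt_dt = (
    long_dt.pivot(index=["SUBJID", "TIME"], columns="TYPE", values="DV")
    .reset_index()
)

# The values a log10-scaled axis would display, computed by hand for inspection.
log_obs = np.log10(plt_dt["Observed"])
```

A log10 scale is mainly useful when values span several orders of magnitude, and it only makes sense for strictly positive data.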

syndat.rct.visualization_clinical_trials.gof_continuous_list(rp0, dt, mode='Reconstructed', strat_vars=None, static=False, log_trans=False, x_label='Real Data', y_label='Synthetic Data', save_path=None, width=8, height=6, dpi=300, as_png=False)

Creates a dictionary of goodness-of-fit (GOF) plots for a list of continuous variables. Saves or displays each plot depending on whether a path is provided.

Parameters:
  • rp0 (dict) – Dictionary with a key ‘long_cont’ containing a list of continuous variable names.

  • dt (DataFrame) – pd.DataFrame with columns including ‘REPI’, ‘TYPE’, ‘Variable’, ‘DV’, ‘SUBJID’, ‘TIME’ (if static = False), and optionally stratification variables.

  • strat_vars (Optional[List[str]]) – Optional list of column names to stratify each plot (faceted visualization).

  • static (Optional[bool]) – If True, metrics for static variables will be calculated.

  • log_trans (Optional[bool]) – If True, applies log10 transformation to both axes in the plots.

  • x_label (Optional[str]) – Label for the x-axis (default: “Real Data”).

  • y_label (Optional[str]) – Label for the y-axis (default: “Synthetic Data”).

  • save_path (Optional[str]) – Optional path to a folder. If provided, saves each plot as a PNG file. If not provided, plots will be shown interactively.

  • width (Optional[int]) – Width of the saved plot in inches (used only if save_path is provided).

  • height (Optional[int]) – Height of the saved plot in inches (used only if save_path is provided).

  • dpi (Optional[int]) – Resolution (dots per inch) of the saved plot (used only if save_path is provided).

  • as_png (Optional[bool]) – If True, saves the plot as a PNG file.

Return type:

Dict[str, ggplot]

Returns:

A dictionary where keys are variable names and values are ggplot GOF plots.
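
Since the function relies on specific column names, a small pre-flight check can catch a malformed input early. The helper missing_columns below is hypothetical and not part of syndat; it only encodes the column list documented above.

```python
import pandas as pd

def missing_columns(dt, static=False):
    """Hypothetical pre-flight check: return documented columns absent from dt."""
    required = ["REPI", "TYPE", "Variable", "DV", "SUBJID"]
    if not static:
        # TIME is only documented as required when static is False.
        required.append("TIME")
    return [col for col in required if col not in dt.columns]

# An empty frame that lacks the TIME column.
dt = pd.DataFrame(columns=["REPI", "TYPE", "Variable", "DV", "SUBJID"])
longitudinal_gaps = missing_columns(dt)
static_gaps = missing_columns(dt, static=True)
```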

syndat.rct.visualization_clinical_trials.percentage_cat_traj_time_list(rp0, dt, mode='Reconstructed', dt_cs=None, dt_cs_label=['Counterfactual'], strat_vars=None, real_label='Real Data', syn_label='Synthetic Data', time_unit='Months', save_path=None, width=8, height=6, dpi=300, as_png=False)

Creates trajectory plots of the percentage of subjects who achieved the outcome value 1 (e.g., responders).

Parameters:
  • rp0 (dict) – Dictionary with a key ‘long_bin’ containing a list of binary variable names.

  • dt (DataFrame) – pd.DataFrame with at least columns ‘REPI’, ‘TYPE’, ‘Variable’, ‘DV’, ‘TIME’, and optionally stratification variables.

  • mode (Optional[str]) – String, usually “Reconstructed”, used for filtering TYPE.

  • dt_cs (Optional[DataFrame]) – Optional list of counterfactual DataFrames.

  • dt_cs_label (Optional[List[str]]) – Optional list of labels for the counterfactual data.

  • strat_vars (Optional[List[str]]) – Optional list of column names for stratified (faceted) plots.

  • time_unit (Optional[str]) – A string representing the unit of time to display on the x-axis label (e.g., “Months”, “Days”, “Hours”).

  • real_label (Optional[str]) – Label for the real data (default: “Real Data”).

  • syn_label (Optional[str]) – Label for the synthetic data (default: “Synthetic Data”).

  • save_path (Optional[str]) – Optional path to a folder. If provided, saves each plot as a PNG. If not provided, plots will be shown interactively.

  • width (Optional[int]) – Width of the saved plot in inches (used only if save_path is provided).

  • height (Optional[int]) – Height of the saved plot in inches (used only if save_path is provided).

  • dpi (Optional[int]) – Resolution (dots per inch) of the saved plot (used only if save_path is provided).

  • as_png (Optional[bool]) – If True, saves the plot as a PNG file.

Return type:

Dict[str, ggplot]

Returns:

Dictionary mapping each variable name to its ggplot object.

syndat.rct.visualization_clinical_trials.raincloud_continuous_list(rp0, dt, mode='Reconstructed', static=False, strat_vars=None, dt_cs=None, dt_cs_label=['Counterfactual'], real_label='Real Data', syn_label='Synthetic Data', save_path=None, width=8, height=6, dpi=300, as_png=False)

Generates and optionally saves raincloud plots comparing observed vs. reconstructed values of continuous variables.

Parameters:
  • rp0 (dict) – Dictionary with the key ‘long_cont’ or ‘static_cont’, containing a list of continuous variable names.

  • dt (DataFrame) – DataFrame with the columns ‘REPI’, ‘TYPE’, ‘Variable’, ‘DV’, ‘SUBJID’, ‘TIME’ and optionally others.

  • mode (Optional[str]) – String, usually “Reconstructed”, used for filtering TYPE.

  • static (Optional[bool]) – If True, plots for static variables will be obtained.

  • strat_vars (Optional[List[str]]) – Optional list of variables to use for faceting.

  • dt_cs (Optional[DataFrame]) – Optional list of counterfactual DataFrames.

  • dt_cs_label (Optional[List[str]]) – Optional list of labels for the counterfactual data.

  • real_label (Optional[str]) – Label for the real data (default: “Real Data”).

  • syn_label (Optional[str]) – Label for the synthetic data (default: “Synthetic Data”).

  • save_path (Optional[str]) – Optional path to folder where plots should be saved. If not provided, plots are shown.

  • width (Optional[int]) – Width of the saved plot in inches (used only if save_path is provided).

  • height (Optional[int]) – Height of the saved plot in inches (used only if save_path is provided).

  • dpi (Optional[int]) – Resolution (dots per inch) of the saved plot (used only if save_path is provided).

  • as_png (Optional[bool]) – If True, saves the plot as a PNG file.

Return type:

Dict[str, ggplot]

Returns:

Dictionary of ggplot objects keyed by variable name.

syndat.rct.visualization_clinical_trials.raincloud_plot(plt_dt, var_name, strat_vars=None, real_label='Real Data', syn_label='Synthetic Data', dt_cs_label=[])

Generates a raincloud plot (violin + boxplot + jitter) comparing Observed vs Reconstructed data.

Parameters:
  • plt_dt – DataFrame with columns ‘TYPE’, ‘DV’ and optional stratification variables.

  • var_name (str) – Name of the variable to use as the plot title.

  • strat_vars (Optional[List[str]]) – Optional list of variables to use for faceting.

  • real_label (Optional[str]) – Label for the real data (default: “Real Data”).

  • syn_label (Optional[str]) – Label for the synthetic data (default: “Synthetic Data”).

  • dt_cs_label (Optional[List[str]]) – List of labels for the counterfactual data (default: []).

Return type:

ggplot

Returns:

ggplot object.

syndat.rct.visualization_clinical_trials.trajectory_plot(plt_dt, var_name, strat_vars=None, time_unit='Months', achievement_plot=False)

Creates a ribbon plot of the median and 5th-95th percentiles of a continuous variable over time.

Parameters:
  • plt_dt (DataFrame) – DataFrame with summary statistics (‘med’, ‘p5’, ‘p95’) by Visit and TYPE.

  • var_name (str) – Name of the variable to use as the plot title.

  • strat_vars (Optional[List[str]]) – Optional list of variables to use for faceting.

  • time_unit (Optional[str]) – A string representing the unit of time to display on the x-axis label (e.g., “Months”, “Days”, “Hours”).

  • achievement_plot (Optional[bool]) – If True, plot percentage of subjects achieving the outcome over time; if False, plot median with ribbons.

Return type:

ggplot

Returns:

ggplot object.
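
The summary columns ‘med’, ‘p5’ and ‘p95’ can be derived from raw values with a pandas groupby, as sketched below. The raw column name 'DV' and the toy values are assumptions, and pandas' default linear interpolation is used for the percentiles, which may not match the library's choice.

```python
import pandas as pd

raw = pd.DataFrame({
    "TYPE": ["Observed"] * 5 + ["Reconstructed"] * 5,
    "Visit": [0] * 10,
    "DV": [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Median and 5th/95th percentiles per visit and data source.
grouped = raw.groupby(["Visit", "TYPE"])["DV"]
plt_dt = pd.DataFrame({
    "med": grouped.median(),
    "p5": grouped.quantile(0.05),
    "p95": grouped.quantile(0.95),
}).reset_index()
```

The ribbon then spans p5 to p95 at each visit, with the median drawn as the central line.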

syndat.rct.visualization_clinical_trials.trajectory_plot_list(rp0, dt, mode='Reconstructed', bins=None, dt_cs=None, dt_cs_label=['Counterfactual'], strat_vars=None, real_label='Real Data', syn_label='Synthetic Data', time_unit='Months', save_path=None, width=8, height=6, dpi=300, as_png=False)

Generates and optionally saves ribbon plots for continuous variables across visits.

Parameters:
  • rp0 (dict) – Dictionary with key ‘long_cont’ containing a list of variable names to plot.

  • dt (DataFrame) – Main DataFrame containing data for “Observed” and mode.

  • mode (Optional[str]) – String, usually “Reconstructed”, used for filtering TYPE.

  • bins (Optional[ndarray]) – Optional array of visit cutoffs. If None, uses unique TIME values in dt.

  • dt_cs (Optional[DataFrame]) – Optional list of counterfactual DataFrames.

  • dt_cs_label (Optional[List[str]]) – Optional list of labels for the counterfactual data.

  • strat_vars (Optional[List[str]]) – Optional list of stratification variables for faceting.

  • time_unit (Optional[str]) – A string representing the unit of time to display on the x-axis label (e.g., “Months”, “Days”, “Hours”).

  • real_label (Optional[str]) – Label for the real data (default: “Real Data”).

  • syn_label (Optional[str]) – Label for the synthetic data (default: “Synthetic Data”).

  • save_path (Optional[str]) – Optional path to a folder where plots should be saved. If None, plots are displayed instead.

  • width (Optional[int]) – Width of the saved plot in inches (used only if save_path is provided).

  • height (Optional[int]) – Height of the saved plot in inches (used only if save_path is provided).

  • dpi (Optional[int]) – Resolution (dots per inch) of the saved plot (used only if save_path is provided).

  • as_png (Optional[bool]) – If True, saves the plot as a PNG file.

Return type:

Dict[str, ggplot]

Returns:

Dictionary of ggplot objects keyed by variable name.