syndat.rct.preprocessing_tidy_format

syndat.rct.preprocessing_tidy_format.convert_data_to_tidy(df0, type, only_pos=False, only_realtimes=False)

Converts a wide-format DataFrame to tidy format, dispatching on whether the data is longitudinal or static: convert_long_data_to_tidy handles longitudinal data and convert_static_data_to_tidy handles static data. It keeps only the observed values whose mask is 1 and renames the type categories.

Parameters:
  • df0 (DataFrame) – Input wide-format DataFrame.

  • type (str) – Type of data to convert: ‘long’ for longitudinal data; any other value is treated as static data.

  • only_pos (bool) – If True, clips negative values in the data to zero.

Return type:

DataFrame

Returns:

A tidy-format DataFrame with standardized TYPE column (‘Observed’, ‘Reconstructed’, ‘Simulations’).
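The filtering and renaming step described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the short TYPE codes (‘OBS’, ‘REC’, ‘SIM’), the sample frame, and the reading that every row with a mask other than 1 is dropped are all assumptions made for illustration. Only the three output labels come from the description above.

```python
import pandas as pd

# Hypothetical tidy frame; the short TYPE codes are assumed, not taken from syndat.
tidy = pd.DataFrame({
    "SUBJID": [1, 1, 2, 2],
    "TYPE": ["OBS", "REC", "OBS", "REC"],
    "Variable": ["age"] * 4,
    "DV": [63.0, 61.5, 0.0, 58.2],
    "MASK": [1, 1, 0, 0],
})

# Assumed mapping from short codes to the documented TYPE labels.
TYPE_LABELS = {"OBS": "Observed", "REC": "Reconstructed", "SIM": "Simulations"}

# Keep rows whose mask is 1, then standardize the TYPE column.
observed_only = tidy[tidy["MASK"] == 1].copy()
observed_only["TYPE"] = observed_only["TYPE"].replace(TYPE_LABELS)
```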

syndat.rct.preprocessing_tidy_format.convert_long_data_to_tidy(df0, only_pos=False)

Converts a wide-format DataFrame to a tidy-format DataFrame. It encodes variables into separate columns, separates variable types and names, applies masking, and optionally clips negative values.

Parameters:
  • df0 (DataFrame) – Original wide-format DataFrame with encoded columns (e.g., ‘OBS_var’, ‘REC_var’, ‘MASK_var’).

  • only_pos (bool) – If True, clips negative values in the ‘DV’ column to zero.

Return type:

DataFrame

Returns:

A tidy-format DataFrame with columns: ‘SUBJID’, ‘REPI’, ‘TIME’, ‘DRUG’, ‘TYPE’, ‘Variable’, ‘DV’, and ‘MASK’.

syndat.rct.preprocessing_tidy_format.convert_static_data_to_tidy(df0, only_pos=False)

Converts a wide-format static DataFrame to a tidy-format DataFrame. It melts the dataframe, separates variable types and names, applies masking, and optionally clips negative values.

Parameters:
  • df0 (DataFrame) – Input wide-format DataFrame containing static data with columns including ‘PTNO’ and ‘REPI’.

  • only_pos (bool) – If True, clips negative values in the data to zero.

Return type:

DataFrame

Returns:

A tidy-format DataFrame with columns [‘SUBJID’, ‘REPI’, ‘TYPE’, ‘Variable’, ‘DV’, ‘MASK’].
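The melt-and-split idea described above can be sketched in plain pandas. This is an illustration under stated assumptions rather than the library's code: the function name static_to_tidy_sketch and the sample frame are hypothetical, the column pattern PREFIX_variable is taken from the ‘OBS_var’, ‘REC_var’, ‘MASK_var’ example elsewhere in this module, and the sketch only attaches the mask as a column because the exact masking rule is not documented here.

```python
import pandas as pd

# Hypothetical wide static frame using the PREFIX_variable column pattern.
wide = pd.DataFrame({
    "PTNO": [1, 2],
    "REPI": [0, 0],
    "OBS_age": [63.0, 0.0],
    "REC_age": [61.5, -2.0],
    "MASK_age": [1, 0],
})

def static_to_tidy_sketch(df, only_pos=False):
    ids = ["PTNO", "REPI"]
    # Melt every non-id column into (column name, value) pairs.
    long = df.melt(id_vars=ids, var_name="col", value_name="DV")
    # Split e.g. 'OBS_age' into TYPE='OBS' and Variable='age'.
    long[["TYPE", "Variable"]] = long["col"].str.split("_", n=1, expand=True)
    # Pull the mask rows out and attach them as a MASK column.
    mask = long.loc[long["TYPE"] == "MASK", ids + ["Variable", "DV"]]
    mask = mask.rename(columns={"DV": "MASK"})
    values = long[long["TYPE"] != "MASK"].drop(columns="col")
    tidy = values.merge(mask, on=ids + ["Variable"], how="left")
    if only_pos:
        # Clip negative values to zero.
        tidy["DV"] = tidy["DV"].clip(lower=0)
    tidy = tidy.rename(columns={"PTNO": "SUBJID"})
    return tidy[["SUBJID", "REPI", "TYPE", "Variable", "DV", "MASK"]]

tidy = static_to_tidy_sketch(wide, only_pos=True)
```

A longitudinal variant would follow the same pattern with ‘TIME’ and ‘DRUG’ added to the identifier columns.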

syndat.rct.preprocessing_tidy_format.convert_to_syndat_scores(df, only_pos=False)

Converts a DataFrame containing observed and predicted (REC_) columns into two separate DataFrames, synchronizing values based on MASK columns and optionally clipping predicted values to be non-negative.

Parameters:
  • df (DataFrame) – DataFrame containing columns with prefixes OBS_, REC_, MASK_ and a REPI column.

  • only_pos (bool) – If True, clips negative values in REC_ columns to zero.

Return type:

tuple[DataFrame, DataFrame]

Returns:

Tuple of two DataFrames: (observed_df, predicted_df) with synchronized and filtered values.
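The split into observed and predicted frames can be sketched as follows. The helper name split_obs_rec_sketch and the sample data are hypothetical, and the synchronization rule used here (blank both the observed and the predicted value wherever the mask is 0) is an assumption about what "synchronizing" means, not a statement of the library's behavior.

```python
import pandas as pd

# Hypothetical input with one variable 'x' in OBS_/REC_/MASK_ form.
df = pd.DataFrame({
    "REPI": [0, 0],
    "OBS_x": [1.0, 2.0],
    "REC_x": [-0.5, 2.2],
    "MASK_x": [1, 0],
})

def split_obs_rec_sketch(df, only_pos=False):
    variables = [c[len("OBS_"):] for c in df.columns if c.startswith("OBS_")]
    observed = pd.DataFrame({"REPI": df["REPI"]})
    predicted = pd.DataFrame({"REPI": df["REPI"]})
    for v in variables:
        keep = df[f"MASK_{v}"] == 1
        # Assumed synchronization: set both sides to NaN where the mask is 0.
        observed[v] = df[f"OBS_{v}"].where(keep)
        rec = df[f"REC_{v}"].where(keep)
        # Optionally clip negative predictions to zero.
        predicted[v] = rec.clip(lower=0) if only_pos else rec
    return observed, predicted

observed, predicted = split_obs_rec_sketch(df, only_pos=True)
```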

syndat.rct.preprocessing_tidy_format.get_rp(ldt=None, lt=None, st=None, Tmax=None)

Creates a dictionary with static and longitudinal variable names categorized by type (categorical, continuous), and computes the maximum time value.

Parameters:
  • ldt (Optional[DataFrame]) – Longitudinal DataFrame containing a ‘TIME’ column.

  • lt (Optional[DataFrame]) – Longitudinal variables metadata DataFrame with columns ‘Variable’, ‘Type’, and ‘Cats’.

  • st (Optional[DataFrame]) – Static variables metadata DataFrame with columns ‘Variable’ and ‘Type’.

  • Tmax (Optional[int]) – Optional maximum time value (required if lt is provided but ldt is not).

Return type:

dict

Returns:

Dictionary with keys: ‘Tmax’, ‘static_vnames’, ‘static_cat’, ‘static_cont’, ‘long_vnames’, ‘long_cat’, ‘long_bin’, ‘long_cont’. ‘Tmax’ holds the maximum time value; the remaining keys map to lists of variable names.
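To make the shape of the returned dictionary concrete, here is a rough sketch that builds the same keys from metadata frames. It is not the library's implementation: the function get_rp_sketch, the helper names_of, the sample data, and in particular the Type labels ‘cat’, ‘bin’, and ‘cont’ are assumptions, since the accepted Type values are not listed above. The ‘Cats’ column is carried in the sample data but not used by the sketch.

```python
import pandas as pd

# Hypothetical metadata; the Type labels 'cat', 'bin' and 'cont' are assumed.
st = pd.DataFrame({"Variable": ["sex", "age"], "Type": ["cat", "cont"]})
lt = pd.DataFrame({
    "Variable": ["stage", "relapse", "weight"],
    "Type": ["cat", "bin", "cont"],
    "Cats": [3, 2, 0],
})
ldt = pd.DataFrame({"TIME": [0, 6, 12]})

def names_of(meta, kind=None):
    # Return variable names, optionally restricted to one Type label.
    if meta is None:
        return []
    if kind is None:
        return meta["Variable"].tolist()
    return meta.loc[meta["Type"] == kind, "Variable"].tolist()

def get_rp_sketch(ldt=None, lt=None, st=None, Tmax=None):
    if ldt is not None:
        # Derive the maximum time from the longitudinal data when available.
        Tmax = int(ldt["TIME"].max())
    return {
        "Tmax": Tmax,
        "static_vnames": names_of(st),
        "static_cat": names_of(st, "cat"),
        "static_cont": names_of(st, "cont"),
        "long_vnames": names_of(lt),
        "long_cat": names_of(lt, "cat"),
        "long_bin": names_of(lt, "bin"),
        "long_cont": names_of(lt, "cont"),
    }

rp = get_rp_sketch(ldt=ldt, lt=lt, st=st)
```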

syndat.rct.preprocessing_tidy_format.merge_real_synthetic(real_df, synthetic_df, patient_identifier='PTNO', type='static')

Merges a real and a synthetic DataFrame that share the same variable names into one DataFrame, renaming columns and creating additional ones for library compatibility.

Parameters:
  • real_df (DataFrame) – Real DataFrame with at least one column identifying the patient ID, plus a time column if type == ‘longitudinal’.

  • synthetic_df (DataFrame) – Synthetic DataFrame with at least one column identifying the patient ID, plus a time column if type == ‘longitudinal’.

  • patient_identifier (str) – Name of the column that identifies individual patients.

  • type (str) – Whether the data is longitudinal or static.

Return type:

DataFrame

Returns:

Combined DataFrame with real and synthetic data, including the columns required for compatibility.
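As an illustration of how a real and a synthetic static frame could be combined into the OBS_/REC_/MASK_ layout that other functions in this module consume, consider the sketch below. Whether merge_real_synthetic produces exactly these columns is not stated here, so the helper merge_sketch, the sample data, the MASK_ rule (1 where the real value is present), and the constant REPI value are all assumptions.

```python
import pandas as pd

# Hypothetical real and synthetic static frames with matching variable names.
real = pd.DataFrame({"PTNO": [1, 2], "age": [63.0, 58.0]})
synthetic = pd.DataFrame({"PTNO": [1, 2], "age": [61.5, 59.1]})

def merge_sketch(real_df, synthetic_df, patient_identifier="PTNO"):
    # Prefix real columns with OBS_ and synthetic columns with REC_.
    obs = real_df.set_index(patient_identifier).add_prefix("OBS_")
    rec = synthetic_df.set_index(patient_identifier).add_prefix("REC_")
    merged = obs.join(rec, how="inner")
    # Assumed helper columns: mask is 1 where the real value is present.
    for col in obs.columns:
        merged["MASK_" + col[len("OBS_"):]] = obs[col].notna().astype(int)
    # Assumed single repetition index.
    merged["REPI"] = 0
    return merged.reset_index()

combined = merge_sketch(real, synthetic)
```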