syndat.rct.preprocessing_tidy_format
- syndat.rct.preprocessing_tidy_format.convert_data_to_tidy(df0, type, only_pos=False, only_realtimes=False)
Converts a DataFrame from wide to tidy format, depending on whether it holds longitudinal or static data. It uses convert_long_data_to_tidy for longitudinal data and convert_static_data_to_tidy for static data, keeps only observed values whose mask is 1, and renames the type categories.
- Parameters:
df0 (DataFrame) – Input wide-format DataFrame.
type (str) – Type of data to convert; ‘long’ for longitudinal or other values for static data.
only_pos (bool) – If True, clips negative values in the data to zero.
- Return type:
DataFrame
- Returns:
A tidy-format DataFrame with standardized TYPE column (‘Observed’, ‘Reconstructed’, ‘Simulations’).
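The filtering and renaming step described for convert_data_to_tidy can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the function name `filter_and_label_sketch`, the short codes ‘OBS’ and ‘REC’ in the TYPE column, and the choice to filter only observed rows are assumptions made for illustration. The code used for simulated rows is not stated in this reference, so it is left unmapped.

```python
import pandas as pd

# Hypothetical mapping from short codes to the standardized labels.
TYPE_LABELS = {"OBS": "Observed", "REC": "Reconstructed"}


def filter_and_label_sketch(tidy: pd.DataFrame) -> pd.DataFrame:
    """Keep observed rows only where MASK == 1, then rename TYPE codes."""
    is_obs = tidy["TYPE"] == "OBS"
    # Assumption: only observed rows are filtered by the mask.
    kept = tidy[~is_obs | (tidy["MASK"] == 1)].copy()
    kept["TYPE"] = kept["TYPE"].replace(TYPE_LABELS)
    return kept.reset_index(drop=True)


tidy_example = pd.DataFrame(
    {
        "TYPE": ["OBS", "OBS", "REC", "REC"],
        "DV": [1.0, 2.0, 1.1, 2.1],
        "MASK": [1, 0, 1, 0],
    }
)
labelled = filter_and_label_sketch(tidy_example)
```

Here the second observed row is dropped because its mask is 0, while both reconstructed rows are kept.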
- syndat.rct.preprocessing_tidy_format.convert_long_data_to_tidy(df0, only_pos=False)
Converts a wide-format DataFrame to a tidy-format DataFrame. It encodes variables into separate columns, separates variable types and names, applies masking, and optionally clips negative values.
- Parameters:
df0 (DataFrame) – Original wide-format DataFrame with encoded columns (e.g., ‘OBS_var’, ‘REC_var’, ‘MASK_var’).
only_pos (bool) – If True, clips negative values in the ‘DV’ column to zero.
- Return type:
DataFrame
- Returns:
A tidy-format DataFrame with columns: ‘SUBJID’, ‘REPI’, ‘TIME’, ‘DRUG’, ‘TYPE’, ‘Variable’, ‘DV’, and ‘MASK’.
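To make the wide-to-tidy reshaping concrete, here is a minimal pandas sketch of the general technique (melt, split the prefix from the variable name, attach the mask, optionally clip). It is an illustration under stated assumptions, not the library's code: the function name `wide_to_tidy_sketch`, the input identifier columns ‘PTNO’, ‘REPI’, ‘TIME’, ‘DRUG’, the renaming of ‘PTNO’ to ‘SUBJID’, and the interpretation of "applies masking" as attaching a MASK column are all hypothetical.

```python
import pandas as pd


def wide_to_tidy_sketch(df, id_cols=("PTNO", "REPI", "TIME", "DRUG"), only_pos=False):
    """Melt OBS_/REC_ columns into rows and attach the matching MASK_ value."""
    id_cols = list(id_cols)
    value_cols = [c for c in df.columns if c.startswith(("OBS_", "REC_"))]
    mask_cols = [c for c in df.columns if c.startswith("MASK_")]

    tidy = df.melt(id_vars=id_cols, value_vars=value_cols,
                   var_name="encoded", value_name="DV")
    # 'OBS_hb' -> TYPE='OBS', Variable='hb' (split on the first underscore only).
    parts = tidy["encoded"].str.split("_", n=1, expand=True)
    tidy["TYPE"] = parts[0]
    tidy["Variable"] = parts[1]

    mask = df.melt(id_vars=id_cols, value_vars=mask_cols,
                   var_name="encoded", value_name="MASK")
    mask["Variable"] = mask["encoded"].str.split("_", n=1).str[1]
    tidy = tidy.merge(mask.drop(columns="encoded"),
                      on=id_cols + ["Variable"], how="left")

    if only_pos:
        tidy["DV"] = tidy["DV"].clip(lower=0)  # negative values become zero

    tidy = tidy.rename(columns={"PTNO": "SUBJID"})  # assumed renaming
    return tidy[["SUBJID", "REPI", "TIME", "DRUG", "TYPE", "Variable", "DV", "MASK"]]


wide = pd.DataFrame(
    {
        "PTNO": [1, 1],
        "REPI": [0, 0],
        "TIME": [0, 1],
        "DRUG": ["A", "A"],
        "OBS_hb": [12.0, 11.5],
        "REC_hb": [11.8, -0.2],
        "MASK_hb": [1, 0],
    }
)
tidy = wide_to_tidy_sketch(wide, only_pos=True)
```

Each input row yields one output row per OBS_ or REC_ column, so the two-row example produces four tidy rows.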
- syndat.rct.preprocessing_tidy_format.convert_static_data_to_tidy(df0, only_pos=False)
Converts a wide-format static DataFrame to a tidy-format DataFrame. It melts the dataframe, separates variable types and names, applies masking, and optionally clips negative values.
- Parameters:
df0 (DataFrame) – Input wide-format DataFrame containing static data with columns including ‘PTNO’ and ‘REPI’.
only_pos (bool) – If True, clips negative values in the data to zero.
- Return type:
DataFrame
- Returns:
A tidy-format DataFrame with columns [‘SUBJID’, ‘REPI’, ‘TYPE’, ‘Variable’, ‘DV’, ‘MASK’].
- syndat.rct.preprocessing_tidy_format.convert_to_syndat_scores(df, only_pos=False)
Converts a DataFrame containing observed and predicted (REC_) columns into two separate DataFrames, synchronizing values based on MASK columns and optionally clipping predicted values to be non-negative.
- Parameters:
df (DataFrame) – DataFrame containing columns with prefixes OBS_, REC_, MASK_ and a REPI column.
only_pos (bool) – If True, clips negative values in REC_ columns to zero.
- Return type:
tuple[DataFrame, DataFrame]
- Returns:
Tuple of two DataFrames: (observed_df, predicted_df) with synchronized and filtered values.
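One plausible reading of "synchronizing values based on MASK columns" is that a value is kept in both outputs only where the mask is 1. The sketch below illustrates that reading with plain pandas. The function name `split_obs_rec_sketch` is hypothetical, the REPI column is ignored here because its role is not described in this reference, and the library may handle masked entries differently (for example by dropping rows rather than setting them to NaN).

```python
import pandas as pd


def split_obs_rec_sketch(df: pd.DataFrame, only_pos: bool = False):
    """Split OBS_/REC_ columns into two frames, blanking entries where MASK != 1."""
    variables = [c[len("OBS_"):] for c in df.columns if c.startswith("OBS_")]
    observed = pd.DataFrame(index=df.index)
    predicted = pd.DataFrame(index=df.index)
    for var in variables:
        keep = df[f"MASK_{var}"] == 1
        # Assumption: masked-out entries become NaN in both outputs.
        observed[var] = df[f"OBS_{var}"].where(keep)
        rec = df[f"REC_{var}"].where(keep)
        if only_pos:
            rec = rec.clip(lower=0)  # negative predictions become zero
        predicted[var] = rec
    return observed, predicted


scores_input = pd.DataFrame(
    {
        "REPI": [0, 0, 0],
        "OBS_x": [1.0, 2.0, 3.0],
        "REC_x": [-0.5, 2.2, 2.9],
        "MASK_x": [1, 1, 0],
    }
)
obs_df, pred_df = split_obs_rec_sketch(scores_input, only_pos=True)
```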
- syndat.rct.preprocessing_tidy_format.get_rp(ldt=None, lt=None, st=None, Tmax=None)
Creates a dictionary with static and longitudinal variable names categorized by type (categorical, continuous), and computes the maximum time value.
- Parameters:
ldt (Optional[DataFrame]) – Longitudinal DataFrame containing a ‘TIME’ column.
lt (Optional[DataFrame]) – Longitudinal variables metadata DataFrame with columns ‘Variable’, ‘Type’, and ‘Cats’.
st (Optional[DataFrame]) – Static variables metadata DataFrame with columns ‘Variable’ and ‘Type’.
Tmax (Optional[int]) – Optional maximum time value (required if lt is provided but ldt is not).
- Return type:
dict
- Returns:
Dictionary with keys: ‘Tmax’, ‘static_vnames’, ‘static_cat’, ‘static_cont’, ‘long_vnames’, ‘long_cat’, ‘long_bin’, ‘long_cont’. ‘Tmax’ holds the maximum time value; the other keys map to lists of variable names.
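The shape of the returned dictionary can be sketched as follows. This is an illustration only: the function name `build_rp_sketch` and the ‘Type’ values ‘cat’, ‘cont’, and ‘bin’ are assumptions, since the labels actually used in the metadata are not given in this reference. The ‘Cats’ column is not used in the sketch.

```python
import pandas as pd


def build_rp_sketch(ldt=None, lt=None, st=None, Tmax=None):
    """Group variable names by assumed type labels and resolve Tmax."""

    def names(meta, kind):
        return meta.loc[meta["Type"] == kind, "Variable"].tolist()

    rp = {
        "Tmax": Tmax,
        "static_vnames": [], "static_cat": [], "static_cont": [],
        "long_vnames": [], "long_cat": [], "long_bin": [], "long_cont": [],
    }
    if st is not None:
        rp["static_vnames"] = st["Variable"].tolist()
        rp["static_cat"] = names(st, "cat")    # assumed label
        rp["static_cont"] = names(st, "cont")  # assumed label
    if lt is not None:
        if ldt is not None:
            rp["Tmax"] = int(ldt["TIME"].max())
        elif Tmax is None:
            raise ValueError("Tmax is required when lt is given without ldt")
        rp["long_vnames"] = lt["Variable"].tolist()
        rp["long_cat"] = names(lt, "cat")
        rp["long_bin"] = names(lt, "bin")      # assumed label
        rp["long_cont"] = names(lt, "cont")
    return rp


st_meta = pd.DataFrame({"Variable": ["sex", "age"], "Type": ["cat", "cont"]})
lt_meta = pd.DataFrame(
    {"Variable": ["hb", "fever"], "Type": ["cont", "bin"], "Cats": [None, 2]}
)
long_data = pd.DataFrame({"TIME": [0, 6, 12]})
rp = build_rp_sketch(ldt=long_data, lt=lt_meta, st=st_meta)
```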
- syndat.rct.preprocessing_tidy_format.merge_real_synthetic(real_df, synthetic_df, patient_identifier='PTNO', type='static')
Merges a real and a synthetic DataFrame that share the same variable names into one DataFrame, renaming columns and creating additional ones for library compatibility.
- Parameters:
real_df (DataFrame) – Real DataFrame with at least one column identifying the patient ID, plus a time column if type == ‘longitudinal’.
synthetic_df (DataFrame) – Synthetic DataFrame with at least one column identifying the patient ID, plus a time column if type == ‘longitudinal’.
patient_identifier (str) – Column name that identifies different patients.
type (str) – Defines whether the data is longitudinal or static.
- Return type:
DataFrame
- Returns:
Combined DataFrame with real and synthetic data, including the columns required for compatibility.
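The reference does not say which columns are renamed or created, so the following is only a guess at what a compatible merge could look like for static data. It assumes the target layout is the OBS_/REC_/MASK_ prefix scheme with a REPI column used by the other functions in this module. The function name `merge_real_synthetic_sketch`, the inner join on the patient identifier, the mask rule (1 where the real value is present), and the constant REPI value are all hypothetical.

```python
import pandas as pd


def merge_real_synthetic_sketch(real_df, synthetic_df, patient_identifier="PTNO"):
    """Join real and synthetic static data into one wide frame (assumed layout)."""
    variables = [c for c in real_df.columns if c != patient_identifier]
    obs = real_df.rename(columns={v: f"OBS_{v}" for v in variables})
    rec = synthetic_df.rename(columns={v: f"REC_{v}" for v in variables})
    merged = obs.merge(rec, on=patient_identifier, how="inner")
    for v in variables:
        # Assumption: a value counts as observed when the real entry is not missing.
        merged[f"MASK_{v}"] = merged[f"OBS_{v}"].notna().astype(int)
    merged["REPI"] = 0  # assumed single replicate
    return merged


real = pd.DataFrame({"PTNO": [1, 2], "age": [60.0, None]})
synthetic = pd.DataFrame({"PTNO": [1, 2], "age": [58.0, 71.0]})
merged = merge_real_synthetic_sketch(real, synthetic)
```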