BigQuery ML Standard Scaler "failed to calculate mean"
Trying to build a logistic regression using BigQuery ML, I get the following error:
Failed to calculate mean since the entries in corresponding column ‘x’ are all NULLs.
Here’s a reproducible query – make sure to change your dataset name:
CREATE MODEL `samples.TEST_MODELS_001` TRANSFORM ( flag, split_col, ML.standard_scaler(SAFE_CAST(x as FLOAT64)) OVER() as x ) OPTIONS ( MODEL_TYPE='LOGISTIC_REG', AUTO_CLASS_WEIGHTS=TRUE, INPUT_LABEL_COLS=['flag'], EARLY_STOP=true, DATA_SPLIT_METHOD='CUSTOM', DATA_SPLIT_COL='split_col', L2_REG = 0.3) AS SELECT * ,train_test_split = 0 as split_col FROM ( select 0 as train_test_split, 1 as flag, "" as x union all select 0, 0, "0" union all select 0, 1, "1" union all select 1, 1, "" union all select 1, 0, "" union all select 1, 1, "1" )
The problem seems to be related to scaling because if I use
ML.MIN_MAX_SCALER instead of
ML.STANDARD_SCALER it works as expected. Not sure why this is happening as clearly not all values of
NULLs inside the train-test split groups.
I’m wondering if this actually a bug or if I’m doing something wrong here.