BigQuery ML Standard Scaler "failed to calculate mean"

Trying to build a logistic regression using BigQuery ML, I get the following error:

Failed to calculate mean since the entries in corresponding column ‘x’ are all NULLs.

Here’s a reproducible query – make sure to change your dataset name:

CREATE MODEL `samples.TEST_MODELS_001` TRANSFORM (     flag, split_col, ML.standard_scaler(SAFE_CAST(x as FLOAT64)) OVER() as x ) OPTIONS   ( MODEL_TYPE='LOGISTIC_REG',     AUTO_CLASS_WEIGHTS=TRUE,     INPUT_LABEL_COLS=['flag'],     EARLY_STOP=true,     DATA_SPLIT_METHOD='CUSTOM',     DATA_SPLIT_COL='split_col',     L2_REG = 0.3) AS SELECT     *     ,train_test_split = 0 as split_col FROM (   select     0 as train_test_split, 1 as flag, "" as x    union all    select 0, 0, "0"    union all    select 0, 1, "1"    union all    select 1, 1, ""    union all    select 1, 0, ""    union all    select 1, 1, "1" ) 

The problem seems to be related to scaling because if I use ML.MIN_MAX_SCALER instead of ML.STANDARD_SCALER it works as expected. Not sure why this is happening as clearly not all values of x are NULLs inside the train-test split groups.

I’m wondering if this actually a bug or if I’m doing something wrong here.

Add Comment
0 Answer(s)

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.