Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null value inserting in MATERIALIZED columns #607

Closed
amjadaliup opened this issue May 19, 2024 · 14 comments
Closed

Null value inserting in MATERIALIZED columns #607

amjadaliup opened this issue May 19, 2024 · 14 comments

Comments

@amjadaliup
Copy link

amjadaliup commented May 19, 2024

When I tried to insert data in to customised table in clickhouse(with materialized columns), it inserting null values to the materialized columns. My expectation is to fill data in the column with materialized expression. My table structure is:-

CREATE TABLE audience ( client_id UInt32 CODEC(Delta, LZ4), list_id UInt32 CODEC(Delta, LZ4), status Enum8('0' = 0, '1' = 1, '2' = 2, '3' = 3,'4' = 4,'5' = 5,'6' =6 ,'7' = 7,'8' = 8,'9' = 9,'10' = 10,'11' = 11,'12' = 12,'13' = 13,'14' = 14,'15' = 15, '16' = 16, '17' = 17, '18' = 18) CODEC(LZ4), email String CODEC(LZ4), created_date DateTime64(3, 'UTC') CODEC(DoubleDelta, LZ4), id UInt32, custom_properties String CODEC(ZSTD), tags Array(UInt32) MATERIALIZED arrayMap(x -> toUInt32(x), JSONExtractKeys(custom_properties, 'tags')) CODEC(LZ4), name Nullable(String) MATERIALIZED JSONExtractString(custom_properties, 'name') CODEC(LZ4), last_name Nullable(String) MATERIALIZED JSONExtractString(custom_properties, 'last_name') CODEC(LZ4), dob Nullable(String) MATERIALIZED JSONExtractString(custom_properties, 'dob') CODEC(LZ4), phone Nullable(String) MATERIALIZED JSONExtractString(custom_properties, 'phone') CODEC(LZ4), custom_1 Nullable(String) MATERIALIZED JSONExtractString(custom_properties, 'custom_1') CODEC(LZ4), custom_2 Nullable(String) MATERIALIZED JSONExtractString(custom_properties, 'custom_2') CODEC(LZ4), ..... ..... custom_100 Nullable(String) MATERIALIZED JSONExtractString(custom_properties, 'custom_100') CODEC(LZ4), _version UInt64, is_deleted UInt8, INDEX _version _version TYPE minmax GRANULARITY 1 ) ENGINE = ReplacingMergeTree(_version, is_deleted) PARTITION BY client_id % 1024 ORDER BY (client_id, list_id, status, email, created_date, id) SETTINGS clean_deleted_rows='Always', index_granularity = 8192;

And current insertion expression is like:-
insert into audience (client_id, list_id, status, email, created_date, id, custom_properties, tags, name, last_name, dob, phone, custom_1, custom_2, ......, custom_100, _version, is_deleted) select client_id, list_id, status, email, created_date, id,custom_properties, tags, name, last_name, dob,phone, custom_1,custom_2,...., custom_100, _version,is_deleted from input('client_id UInt32, list_id UInt32,status Enum8('0' = 0, '1' = 1, '2' = 2, '3' = 3, '4' = 4, '5' = 5, '6' = 6, '7' = 7, '8' = 8, '9' = 9, '10' = 10, '11' = 11, '12' = 12, '13' = 13, '14' = 14, '15' = 15, '16' = 16, '17' = 17, '18' = 18), email String,created_date String, id UInt32,custom_properties String, tags Array(UInt32),name Nullable(String),last_name Nullable(String),dob Nullable(String),phone Nullable(String), custom_1 Nullable(String),custom_2 Nullable(String), ...., custom_100 Nullable(String), _version UInt64,is_deleted UInt8')

In my case I have more than 100 fields that needs to extract from the custom_properties field. If materilized columns not specified in the insert statement the insertion will be slow, 100 JSONExtractString function needs to work. When I investigation on this the usage of single JSONExtractString function to extract all the fields will solve the issue. If I can change the insertion statement like below that will be realy nice.

insert into audience (client_id, list_id,status, email,created_date, id,custom_properties, tags,name,last_name, dob,phone, custom_1,custom_2, ......,custom_100, _version,is_deleted) select client_id,list_id, status,email, created_date,id, custom_properties, tags,name, last_name,dob, phone,custom_1, custom_2,...., custom_100,_version, is_deleted from select client_id,list_id, status,email, created_date, id,custom_properties, tags,custom.name, custom.last_name, custom.dob, custom.phone, custom.custom_1, custom.custom_2, ...., custom.custom_100,_version, is_deleted from (input('client_id UInt32,list_id UInt32,status Enum8('0' = 0, '1' = 1, '2' = 2, '3' = 3, '4' = 4, '5' = 5, '6' = 6, '7' = 7, '8' = 8, '9' = 9, '10' = 10, '11' = 11, '12' = 12, '13' = 13, '14' = 14, '15' = 15, '16' = 16, '17' = 17, '18' = 18),email String, created_date String,id UInt32, custom_properties String,tags Array(UInt32),JSONExtractString(custom_properties, 'Tuple(name Nullable(String), last_name Nullable(String),dob Nullable(String), phone Nullable(String), custom_1 Nullable(String),custom_2 Nullable(String),....,custom_100)') AS custom, Nullable(String), _version UInt64,is_deleted UInt8')) tmp

Thank you

@aadant
Copy link
Collaborator

aadant commented May 19, 2024

hi @amjadaliup,

I think it is because the sink-connector uses this value on insert insert_allow_materialized_columns=1.
it is configured like this because :

  • source database generated columns are passed by debezium and needs to be inserted.
  • generated columns are converted as materialized columns

properties.setProperty("custom_settings", "allow_experimental_object_type=1,insert_allow_materialized_columns=1");

ALIAS columns should work fine (even though they are not persisted)

@amjadaliup
Copy link
Author

@aadant Can you suggest a solution for this?

@aadant
Copy link
Collaborator

aadant commented May 20, 2024

Try with ALIAS ?

I wonder if you make the columns not nullable. that may be part of the issue.

@amjadaliup
Copy link
Author

@aadant My intention is to use MATERIALIZED columns is to improve performance. Initially I was queried by using JSONExtractString function and the performance was low. So ALIAS columns will not work for me it will again use the same function while querying.

@aadant
Copy link
Collaborator

aadant commented May 20, 2024

yes try to make the column not nullable, I think it should work

@amjadaliup
Copy link
Author

amjadaliup commented May 20, 2024

I was initially tried with not nullable fields, but got an error like "Cannot set null to non-nullable column # 21 [name String]"

@aadant
Copy link
Collaborator

aadant commented May 20, 2024

Use an empty String ?

@amjadaliup
Copy link
Author

amjadaliup commented May 20, 2024

But name column is not present in my mysql table it's MATERIALIZED from column custom_properties.
name String MATERIALIZED JSONExtractString(custom_properties, 'name')
If I can create a new field from sink connector and assign value extracted from custom_properties that also fine for me

@aadant
Copy link
Collaborator

aadant commented May 20, 2024

As a workaround, you can use ifNull(JSONExtractString(custom_properties, 'name'), '')

@amjadaliup
Copy link
Author

That's okay, but empty value will be in the name field need to fill value from custom_properties->name.

@amjadaliup
Copy link
Author

@aadant I think name String MATERIALIZED ifNull(JSONExtractString(custom_properties, 'name'), '') will not work, because materilaized column expression will not take if column value given as null. Needs to remove the matrerialized from the insertion query. And also JSONExtractString(custom_properties, 'name') never returns a Null value.

@aadant
Copy link
Collaborator

aadant commented May 20, 2024

@amjadaliup maybe it can in some edge cases ?

┌─JSONExtractString(NULL, 'name')─┐
│ ᴺᵁᴸᴸ                            │
└─────────────────────────────────┘

@amjadaliup
Copy link
Author

Oh.. but this is not my case.

@aadant
Copy link
Collaborator

aadant commented May 28, 2024

closing this as a duplicate of column overrides : #323

@aadant aadant closed this as completed May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants