Materialize: allow to work without destination PK #10160

derekperkins · 2022-04-27T22:03:51Z

I'm materializing and summing from our raw usage table into a daily aggregate. I can't use a sequence to get a globally unique id, but a bounded unique id is sufficient for my use case, so I'm using auto-increment. I'm grouping on the source table columns to get a unique value, which works, but Materialize errors out if the PK on the target table is the auto-increment column.

Materialize Config

{
  "workflow": "billing__usage",
  "source_keyspace": "workspaces",
  "target_keyspace": "workspaces",
  "table_settings": [
    {
      "target_table": "billing__usage",
      "source_expression": "select workspace_id as account_id, date(from_unixtime(requested)) as requested_date, date(created_at) as created_date, 1 as type, count(*) as usage from workspaces_rankings__pulls group by account_id, requested_date, created_date, type"
    }
  ],
  "tablet_types": "REPLICA"
}

Error Message

primary key column &{usage_id bigint bigint true false} not found in select list

Destination Table Schema

CREATE TABLE `billing__usage` (
  `usage_id` bigint NOT NULL AUTO_INCREMENT,
  `account_id` bigint NOT NULL,
  `created_date` date NOT NULL,
  `requested_date` date NOT NULL,
  `type` smallint NOT NULL,
  `usage` bigint NOT NULL,
  `reported_to_chargebee_at` bigint DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`account_id`,`requested_date`,`created_date`,`type`), # errors if I switch these two keys
  UNIQUE KEY `usage_id` (`usage_id`),
  CONSTRAINT `fk__workspaces__billing_usage__workspaces__billing` FOREIGN KEY (`account_id`) REFERENCES `billing__accounts` (`account_id`)
) ENGINE=InnoDB AUTO_INCREMENT=335837959 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

Not urgent at all, and maybe there's a technical reason why the PK has to exist at the source table.

Tested on v13


harshit-gangal · 2022-04-28T05:56:57Z

@vitessio/vreplication

mattlord · 2022-05-03T22:14:40Z

I'm working on a related issue here: #10192

I wonder if I should fold this in...

This specific failure occurs here:

vitess/go/vt/vttablet/tabletmanager/vreplication/table_plan_builder.go

Lines 545 to 577 in 39427bd

    
// analyzePK builds tpb.pkCols.
// Input cols must include all columns which participate in the PRIMARY KEY or the chosen UniqueKey.
// It's OK to also include columns not in the key.
// Input cols should be ordered according to key ordinal.
// e.g. if "UNIQUE KEY(c5,c2)" then we expect c5 to come before c2
func (tpb *tablePlanBuilder) analyzePK(cols []*ColumnInfo) error {
	for _, col := range cols {
		if !col.IsPK {
			continue
		}
		if col.IsGenerated {
			// It's possible that a GENERATED column is part of the PRIMARY KEY. That's valid.
			// But then, we also know that we don't actually SELECT a GENERATED column, we just skip
			// it silently and let it re-materialize by MySQL itself on the target.
			continue
		}
		cexpr := tpb.findCol(sqlparser.NewColIdent(col.Name))
		if cexpr == nil {
			// TODO(shlomi): at some point in the future we want to make this check stricter.
			// We could be reading a generated column c1 which in turn selects some other column c2.
			// We will want to ensure that `c2` is found in select list...
			return fmt.Errorf("primary key column %v not found in select list", col)
		}
		if cexpr.operation != opExpr {
			return fmt.Errorf("primary key column %v is not allowed to reference an aggregate expression", col)
		}
		cexpr.isPK = true
		cexpr.dataType = col.DataType
		cexpr.columnType = col.ColumnType
		tpb.pkCols = append(tpb.pkCols, cexpr)
	}
	return nil
}
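
To make the failure mode concrete, here is a minimal standalone sketch of the check above. It is not the Vitess code: `columnInfo`, `checkPKCols`, `compositePK`, and `autoIncPK` are hypothetical names, the struct is a simplified stand-in that assumes only the five fields visible in the reported error, and the select list is reduced to plain column names instead of parsed expressions. It runs the check against the two key layouts from the `billing__usage` schema in the original report.

```go
package main

import "fmt"

// columnInfo is a simplified, hypothetical stand-in for ColumnInfo.
// Only the fields needed for this sketch are included.
type columnInfo struct {
	Name        string
	DataType    string
	ColumnType  string
	IsPK        bool
	IsGenerated bool
}

// checkPKCols mimics the core of analyzePK: every non-generated primary key
// column of the target table must appear in the source select list.
func checkPKCols(selectList []string, cols []*columnInfo) error {
	selected := make(map[string]bool, len(selectList))
	for _, name := range selectList {
		selected[name] = true
	}
	for _, col := range cols {
		if !col.IsPK || col.IsGenerated {
			continue
		}
		if !selected[col.Name] {
			// %v on a pointer to a struct prints as &{...}, which is why the
			// reported error contains "&{usage_id bigint bigint true false}".
			return fmt.Errorf("primary key column %v not found in select list", col)
		}
	}
	return nil
}

// compositePK models the working layout: the PK is made of the grouped columns.
func compositePK() []*columnInfo {
	return []*columnInfo{
		{Name: "usage_id", DataType: "bigint", ColumnType: "bigint"},
		{Name: "account_id", DataType: "bigint", ColumnType: "bigint", IsPK: true},
		{Name: "requested_date", DataType: "date", ColumnType: "date", IsPK: true},
		{Name: "created_date", DataType: "date", ColumnType: "date", IsPK: true},
		{Name: "type", DataType: "smallint", ColumnType: "smallint", IsPK: true},
	}
}

// autoIncPK models the failing layout: the PK is the auto-increment column,
// which the source expression never selects.
func autoIncPK() []*columnInfo {
	return []*columnInfo{
		{Name: "usage_id", DataType: "bigint", ColumnType: "bigint", IsPK: true},
		{Name: "account_id", DataType: "bigint", ColumnType: "bigint"},
	}
}

func main() {
	// Column names produced by the source_expression in the report.
	selectList := []string{"account_id", "requested_date", "created_date", "type", "usage"}

	fmt.Println("composite PK:", checkPKCols(selectList, compositePK()))
	fmt.Println("auto-increment PK:", checkPKCols(selectList, autoIncPK()))
}
```

In this sketch the composite key passes because all four key columns are in the select list, while the auto-increment key fails with an error of the same shape as the one reported, since nothing in the select list maps to `usage_id`.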

mattlord · 2022-05-03T22:37:20Z

As I look into it more, now I'm wondering if this is not also fixed by #10192 as it looks like we use MoveTables for materializations.

@derekperkins do you happen to have a simple test case, e.g. using the example commerce and customer keyspaces? I could then do some manual testing in my PR branch and I can go from there.

mattlord · 2022-05-04T16:45:30Z

After looking into this more it's orthogonal to #10192 so I won't be looping this into that work.

derekperkins · 2022-05-11T04:31:04Z

I think this config should trigger the error

Tables

create table customer (
  customer_id bigint not null auto_increment,
  email varbinary(128),
  primary key(customer_id)
);

create table customer_with_new_ids (
  customer_id bigint not null auto_increment,
  email varbinary(128),
  primary key(customer_id)
);

Materialize Config

{
  "workflow": "customer_with_new_ids",
  "source_keyspace": "commerce",
  "target_keyspace": "commerce",
  "table_settings": [
    {
      "target_table": "customer_with_new_ids",
      "source_expression": "select email from customer"
    }
  ],
  "tablet_types": "REPLICA"
}

mattlord · 2023-08-31T03:14:22Z

Noting that this is still an issue on main/18.0:

git checkout main
make build
pushd examples/local

./101_initial_cluster.sh

mysql commerce -e '
drop table customer;

create table customer (
  customer_id bigint not null auto_increment,
  email varbinary(128),
  primary key(customer_id)
);

create table customer_with_new_ids (
  customer_id bigint not null auto_increment,
  email varbinary(128),
  primary key(customer_id)
);
'

vtctlclient Materialize '{
  "workflow": "customer_with_new_ids",
  "source_keyspace": "commerce",
  "target_keyspace": "commerce",
  "table_settings": [
    {
      "target_table": "customer_with_new_ids",
      "source_expression": "select email from customer"
    }
  ],
  "tablet_types": "REPLICA"
}'

vtctlclient workflow -- commerce.customer_with_new_ids show

derekperkins added the Type: Enhancement, Component: VReplication, and Needs Triage labels Apr 27, 2022

harshit-gangal removed the Needs Triage label Apr 28, 2022

mattlord self-assigned this May 3, 2022

mattlord changed the title ~~vreplication: allow to work without destination PK~~ Materialize: allow to work without destination PK May 4, 2022

mattlord removed their assignment May 4, 2022

derekperkins mentioned this issue May 11, 2022

Materialize: enable usage of sequences #10270

Open

derekperkins mentioned this issue Dec 11, 2023

meterialize: Getting 'unexpected error' when making a workflow with sql join query #14749

Closed
