feat(redshift): column compression encodings and comments can now be customised (#24177)

In accordance with #24165, I'm opening the same pull request as before. I'm not sure whether my previous PR #23597 will automatically be "re-merged", but if not, you can review this pull request.

Will AGAIN close #22506

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Rizxcviii authored Mar 8, 2023
1 parent 1b2014e commit 1ca3e00
Showing 35 changed files with 2,085 additions and 1,306 deletions.
37 changes: 31 additions & 6 deletions packages/@aws-cdk/aws-redshift/README.md
@@ -200,17 +200,32 @@ new Table(this, 'Table', {
});
```

- Tables can also be configured with a comment:
+ Tables and their respective columns can be configured to contain comments:

```ts fixture=cluster
new Table(this, 'Table', {
tableColumns: [
- { name: 'col1', dataType: 'varchar(4)' },
- { name: 'col2', dataType: 'float' }
+ { name: 'col1', dataType: 'varchar(4)', comment: 'This is a column comment' },
+ { name: 'col2', dataType: 'float', comment: 'This is another column comment' }
],
cluster: cluster,
databaseName: 'databaseName',
tableComment: 'This is a table comment',
});
```
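
For orientation, the comments configured above end up as `COMMENT ON` statements issued by the table handler changed in this commit. The following is a minimal sketch of that mapping, not the handler itself: `ColumnSketch`, `commentStatements`, and `escapeSqlLiteral` are hypothetical names, and the quote escaping is an assumption added here (the handler interpolates the comment text directly).

```ts
// Hypothetical, simplified column shape; the real Column interface has more fields.
interface ColumnSketch {
  name: string;
  dataType: string;
  comment?: string;
}

// Assumption: double any single quote so that a comment such as "user's id"
// remains a valid SQL string literal. The handler in this commit does not do this.
function escapeSqlLiteral(value: string): string {
  return value.replace(/'/g, "''");
}

// One COMMENT ON COLUMN statement per commented column, followed by an
// optional COMMENT ON TABLE statement (same order as the handler).
function commentStatements(tableName: string, columns: ColumnSketch[], tableComment?: string): string[] {
  const statements: string[] = [];
  for (const column of columns) {
    if (column.comment) {
      statements.push(`COMMENT ON COLUMN ${tableName}.${column.name} IS '${escapeSqlLiteral(column.comment)}'`);
    }
  }
  if (tableComment) {
    statements.push(`COMMENT ON TABLE ${tableName} IS '${escapeSqlLiteral(tableComment)}'`);
  }
  return statements;
}

const commentSql = commentStatements(
  'my_table',
  [
    { name: 'col1', dataType: 'varchar(4)', comment: 'This is a column comment' },
    { name: 'col2', dataType: 'float' },
  ],
  'This is a table comment',
);
console.log(commentSql.join(';\n'));
```

Columns without a comment produce no statement, which matches the `@default - no comment` behaviour documented on the `Column` interface.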

Table columns can be configured to use a specific compression encoding:

```ts fixture=cluster
import { ColumnEncoding } from '@aws-cdk/aws-redshift';

new Table(this, 'Table', {
tableColumns: [
{ name: 'col1', dataType: 'varchar(4)', encoding: ColumnEncoding.TEXT32K },
{ name: 'col2', dataType: 'float', encoding: ColumnEncoding.DELTA32K },
],
cluster: cluster,
databaseName: 'databaseName',
- comment: 'This is a comment',
});
```
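
As a rough illustration of what an explicit encoding changes, the sketch below builds a `CREATE TABLE` statement the way the handler in this commit appears to: an ` ENCODE <encoding>` suffix is appended only to columns that set one. `EncodedColumnSketch`, `encodingClause`, and `createTableStatement` are hypothetical names used for this example only, and encodings are passed as plain strings rather than `ColumnEncoding` members to keep the block self-contained.

```ts
// Hypothetical, simplified column shape used only for this sketch.
interface EncodedColumnSketch {
  name: string;
  dataType: string;
  encoding?: string; // e.g. the string value behind ColumnEncoding.TEXT32K
}

// Append " ENCODE <encoding>" only when an encoding was set explicitly;
// otherwise Redshift picks the encoding itself.
function encodingClause(column: EncodedColumnSketch): string {
  return column.encoding ? ` ENCODE ${column.encoding}` : '';
}

// Join the column definitions with commas, as Array.prototype.join() does by default.
function createTableStatement(tableName: string, columns: EncodedColumnSketch[]): string {
  const columnList = columns
    .map(column => `${column.name} ${column.dataType}${encodingClause(column)}`)
    .join(',');
  return `CREATE TABLE ${tableName} (${columnList})`;
}

console.log(createTableStatement('my_table', [
  { name: 'col1', dataType: 'varchar(4)', encoding: 'TEXT32K' },
  { name: 'col2', dataType: 'float' },
]));
```

Not every encoding is valid for every data type; the Redshift compression encodings documentation lists the supported combinations.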

@@ -369,6 +384,8 @@ cluster.addToParameterGroup('enable_user_activity_logging', 'true');
In most cases, existing clusters [must be manually rebooted](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-parameter-groups.html) to apply parameter changes. You can automate parameter-related reboots by setting the cluster's `rebootForParameterChanges` property to `true`, or by using `Cluster.enableRebootForParameterChanges()`.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as cdk from '@aws-cdk/core';
declare const vpc: ec2.Vpc;

const cluster = new Cluster(this, 'Cluster', {
@@ -451,14 +468,16 @@ Some Amazon Redshift features require Amazon Redshift to access other AWS servic
When you create an IAM role and set it as the default for the cluster using the console, you don't have to provide the IAM role's Amazon Resource Name (ARN) to perform authentication and authorization.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as iam from '@aws-cdk/aws-iam';
declare const vpc: ec2.Vpc;

const defaultRole = new iam.Role(this, 'DefaultRole', {
assumedBy: new iam.ServicePrincipal('redshift.amazonaws.com'),
},
);

- new Cluster(stack, 'Redshift', {
+ new Cluster(this, 'Redshift', {
masterUser: {
masterUsername: 'admin',
},
@@ -471,14 +490,16 @@ new Cluster(stack, 'Redshift', {
A default role can also be added to a cluster using the `addDefaultIamRole` method.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as iam from '@aws-cdk/aws-iam';
declare const vpc: ec2.Vpc;

const defaultRole = new iam.Role(this, 'DefaultRole', {
assumedBy: new iam.ServicePrincipal('redshift.amazonaws.com'),
},
);

- const redshiftCluster = new Cluster(stack, 'Redshift', {
+ const redshiftCluster = new Cluster(this, 'Redshift', {
masterUser: {
masterUsername: 'admin',
},
@@ -494,6 +515,8 @@ redshiftCluster.addDefaultIamRole(defaultRole);
Attaching IAM roles to a Redshift Cluster grants permissions to the Redshift service to perform actions on your behalf.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as iam from '@aws-cdk/aws-iam';
declare const vpc: ec2.Vpc;

const role = new iam.Role(this, 'Role', {
@@ -511,6 +534,8 @@ const cluster = new Cluster(this, 'Redshift', {
Additional IAM roles can be attached to a cluster using the `addIamRole` method.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as iam from '@aws-cdk/aws-iam';
declare const vpc: ec2.Vpc;

const role = new iam.Role(this, 'Role', {
@@ -42,7 +42,7 @@ async function createTable(
tableAndClusterProps: TableAndClusterProps,
): Promise<string> {
const tableName = tableNamePrefix + tableNameSuffix;
- const tableColumnsString = tableColumns.map(column => `${column.name} ${column.dataType}`).join();
+ const tableColumnsString = tableColumns.map(column => `${column.name} ${column.dataType}${getEncodingColumnString(column)}`).join();

let statement = `CREATE TABLE ${tableName} (${tableColumnsString})`;

@@ -63,6 +63,11 @@ async function createTable(

await executeStatement(statement, tableAndClusterProps);

for (const column of tableColumns) {
if (column.comment) {
await executeStatement(`COMMENT ON COLUMN ${tableName}.${column.name} IS '${column.comment}'`, tableAndClusterProps);
}
}
if (tableAndClusterProps.tableComment) {
await executeStatement(`COMMENT ON TABLE ${tableName} IS '${tableAndClusterProps.tableComment}'`, tableAndClusterProps);
}
@@ -120,6 +125,20 @@ async function updateTable(
alterationStatements.push(...columnAdditions.map(addition => `ALTER TABLE ${tableName} ${addition}`));
}

const columnEncoding = tableColumns.filter(column => {
return oldTableColumns.some(oldColumn => column.name === oldColumn.name && column.encoding !== oldColumn.encoding);
}).map(column => `ALTER COLUMN ${column.name} ENCODE ${column.encoding || 'AUTO'}`);
if (columnEncoding.length > 0) {
alterationStatements.push(`ALTER TABLE ${tableName} ${columnEncoding.join(', ')}`);
}

const columnComments = tableColumns.filter(column => {
return oldTableColumns.some(oldColumn => column.name === oldColumn.name && column.comment !== oldColumn.comment);
}).map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS ${column.comment ? `'${column.comment}'` : 'NULL'}`);
if (columnComments.length > 0) {
alterationStatements.push(...columnComments);
}

if (useColumnIds) {
const columnNameUpdates = tableColumns.reduce((updates, column) => {
const oldColumn = oldTableColumns.find(oldCol => oldCol.id && oldCol.id === column.id);
@@ -190,3 +209,10 @@ async function updateTable(
function getSortKeyColumnsString(sortKeyColumns: Column[]) {
return sortKeyColumns.map(column => column.name).join();
}

function getEncodingColumnString(column: Column): string {
if (column.encoding) {
return ` ENCODE ${column.encoding}`;
}
return '';
}
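
The update path above only touches columns that exist in both the old and the new definition. The following is a simplified, standalone sketch of that comparison, intended as a reading aid rather than the handler's actual code: `TrackedColumnSketch` and `columnAlterations` are hypothetical names, and column additions, renames, and statement execution are left out.

```ts
// Hypothetical, simplified column shape used only for this sketch.
interface TrackedColumnSketch {
  name: string;
  encoding?: string;
  comment?: string;
}

function columnAlterations(
  tableName: string,
  oldColumns: TrackedColumnSketch[],
  newColumns: TrackedColumnSketch[],
): string[] {
  const statements: string[] = [];

  // Encoding changes are batched into a single ALTER TABLE statement.
  // A removed encoding falls back to AUTO.
  const encodingChanges = newColumns
    .filter(column => oldColumns.some(old => old.name === column.name && old.encoding !== column.encoding))
    .map(column => `ALTER COLUMN ${column.name} ENCODE ${column.encoding || 'AUTO'}`);
  if (encodingChanges.length > 0) {
    statements.push(`ALTER TABLE ${tableName} ${encodingChanges.join(', ')}`);
  }

  // Comment changes are emitted one statement per column.
  // A removed comment is cleared with IS NULL.
  const commentChanges = newColumns
    .filter(column => oldColumns.some(old => old.name === column.name && old.comment !== column.comment))
    .map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS ${column.comment ? `'${column.comment}'` : 'NULL'}`);
  statements.push(...commentChanges);

  return statements;
}

const alterations = columnAlterations(
  'my_table',
  [{ name: 'a', encoding: 'LZO', comment: 'x' }, { name: 'b' }],
  [{ name: 'a' }, { name: 'b', comment: 'hi' }, { name: 'c', encoding: 'ZSTD' }],
);
console.log(alterations.join(';\n'));
```

In this sketch column `c` yields no statement because it has no counterpart in the old definition; in the handler, new columns are covered by the separate column-addition logic.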
123 changes: 123 additions & 0 deletions packages/@aws-cdk/aws-redshift/lib/table.ts
@@ -90,6 +90,20 @@ export interface Column {
* @default - column is not a SORTKEY
*/
readonly sortKey?: boolean;

/**
* The encoding to use for the column.
*
* @default - Amazon Redshift determines the encoding based on the data type.
*/
readonly encoding?: ColumnEncoding;

/**
* A comment to attach to the column.
*
* @default - no comment
*/
readonly comment?: string;
}

/**
@@ -371,3 +385,112 @@ export enum TableSortStyle {
*/
INTERLEAVED = 'INTERLEAVED',
}

/**
* The compression encoding of a column.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Compression_encodings.html
*/
export enum ColumnEncoding {
/**
* Amazon Redshift assigns an optimal encoding based on the column data.
* This is the default.
*/
AUTO = 'AUTO',

/**
* The column is not compressed.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Raw_encoding.html
*/
RAW = 'RAW',

/**
* The column is compressed using the AZ64 algorithm.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/az64-encoding.html
*/
AZ64 = 'AZ64',

/**
* The column is compressed using a separate dictionary of unique values for each block of column values on disk.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Byte_dictionary_encoding.html
*/
BYTEDICT = 'BYTEDICT',

/**
* The column is compressed based on the difference between values in the column.
* This records differences as 1-byte values.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html
*/
DELTA = 'DELTA',

/**
* The column is compressed based on the difference between values in the column.
* This records differences as 2-byte values.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html
*/
DELTA32K = 'DELTA32K',

/**
* The column is compressed using the LZO algorithm.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/lzo-encoding.html
*/
LZO = 'LZO',

/**
* The column is compressed to a smaller storage size than the original data type.
* The compressed storage size is 1 byte.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
*/
MOSTLY8 = 'MOSTLY8',

/**
* The column is compressed to a smaller storage size than the original data type.
* The compressed storage size is 2 bytes.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
*/
MOSTLY16 = 'MOSTLY16',

/**
* The column is compressed to a smaller storage size than the original data type.
* The compressed storage size is 4 bytes.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
*/
MOSTLY32 = 'MOSTLY32',

/**
* The column is compressed by replacing consecutively repeated values with the value and a count of its consecutive occurrences.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Runlength_encoding.html
*/
RUNLENGTH = 'RUNLENGTH',

/**
* The column is compressed by recording the first 245 unique words and then using a 1-byte index to represent each word.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Text255_encoding.html
*/
TEXT255 = 'TEXT255',

/**
* The column is compressed by indexing unique words in a dictionary until the combined entries reach 32K in length, and then using a 2-byte index to represent each word.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Text255_encoding.html
*/
TEXT32K = 'TEXT32K',

/**
* The column is compressed using the ZSTD algorithm.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/zstd-encoding.html
*/
ZSTD = 'ZSTD',
}
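
To make two of the encodings above more concrete, here is a toy sketch of the ideas behind `RUNLENGTH` and `DELTA`. It only illustrates the general techniques; it is not Redshift's on-disk format, and `runLengthEncode` and `deltaEncode` are hypothetical helper names.

```ts
// Toy run-length encoding: collapse consecutive repeats into [value, runLength] pairs.
function runLengthEncode<T>(values: T[]): Array<[T, number]> {
  const runs: Array<[T, number]> = [];
  for (const value of values) {
    const last = runs[runs.length - 1];
    if (last !== undefined && last[0] === value) {
      last[1] += 1; // same value as the previous one: extend the current run
    } else {
      runs.push([value, 1]); // different value: start a new run
    }
  }
  return runs;
}

// Toy delta encoding: store each value as the difference from its predecessor.
// The first entry is the first value itself (difference from 0).
function deltaEncode(values: number[]): number[] {
  const deltas: number[] = [];
  let previous = 0;
  for (const value of values) {
    deltas.push(value - previous);
    previous = value;
  }
  return deltas;
}

console.log(JSON.stringify(runLengthEncode(['a', 'a', 'b', 'a'])));
console.log(JSON.stringify(deltaEncode([100, 101, 103, 103])));
```

Run-length encoding pays off when equal values sit next to each other, and delta encoding when neighbouring values are close together, which is why the small differences can be stored in 1 or 2 bytes.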