feat: new split_to_map udf #5563
just two quick comments, i'll take a full look later
ksqldb-engine/src/main/java/io/confluent/ksql/function/udf/string/SplitToMap.java
+ "'kvDelimiter'. If the same key is present multiple times in the input, the latest "
+ "value for that key is returned. Returns NULL f the input text or either of the "
+ "delimiters is NULL.")
public class SplitToMap {
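The description in this hunk defines the contract: split on `entryDelimiter`, split each entry on `kvDelimiter`, let the latest value for a repeated key win, and return NULL when the input or either delimiter is NULL. The sketch below is a minimal, hypothetical illustration of that contract using only the JDK. It is not the code in this PR. The class name `SplitToMapSketch`, the rejection of empty delimiters, splitting each entry at the first `kvDelimiter`, and skipping entries that contain no `kvDelimiter` are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public final class SplitToMapSketch {

  // Hypothetical helper mirroring the documented split_to_map contract;
  // not the ksqlDB implementation.
  public static Map<String, String> splitToMap(
      final String input, final String entryDelimiter, final String kvDelimiter) {
    // Documented: NULL input or a NULL delimiter yields NULL.
    if (input == null || entryDelimiter == null || kvDelimiter == null) {
      return null;
    }
    // Assumption: empty delimiters are treated like NULL ones.
    if (entryDelimiter.isEmpty() || kvDelimiter.isEmpty()) {
      return null;
    }
    final Map<String, String> result = new LinkedHashMap<>();
    // Pattern.quote makes the delimiter match literally instead of as a regex.
    for (final String entry : input.split(Pattern.quote(entryDelimiter))) {
      final int idx = entry.indexOf(kvDelimiter);
      if (idx < 0) {
        continue; // Assumption: entries without a kvDelimiter are skipped.
      }
      final String key = entry.substring(0, idx);
      final String value = entry.substring(idx + kvDelimiter.length());
      // put() overwrites an existing key, so the latest value wins.
      result.put(key, value);
    }
    return result;
  }

  public static void main(final String[] args) {
    System.out.println(splitToMap("a=1,b=2,a=3", ",", "=")); // {a=3, b=2}
  }
}
```

A `LinkedHashMap` is used only so the printed order follows first appearance in the input. Callers of a map-returning UDF should not rely on any particular ordering.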
We may also want to have EncodeMap(map, entry_delim, kv_delim), which encodes a map into a string.
i'll add it to my udf backlog ;) - although i'm less concerned about that from a use-case perspective as you can always construct almost-arbitrary json output by using structs/maps/arrays if you need that for a downstream system. The primary motivator for this one is when you get, for example, some encoded message from a mainframe MQ system that needs to be parsed out this way
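The EncodeMap function suggested above is not part of this PR. To show what such an inverse might look like, here is a hypothetical sketch. The name `encodeMap`, the NULL handling (copied from split_to_map's documented behavior), and the lack of any escaping are assumptions. Without escaping, a key or value that contains either delimiter will not round-trip through split_to_map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public final class EncodeMapSketch {

  // Hypothetical inverse of split_to_map; EncodeMap does not exist in this PR.
  public static String encodeMap(
      final Map<String, String> map, final String entryDelimiter, final String kvDelimiter) {
    // Assumption: mirror split_to_map and return NULL for any NULL argument.
    if (map == null || entryDelimiter == null || kvDelimiter == null) {
      return null;
    }
    final StringJoiner joiner = new StringJoiner(entryDelimiter);
    for (final Map.Entry<String, String> e : map.entrySet()) {
      // Assumption: no escaping. A null value is rendered as the text "null"
      // by string concatenation.
      joiner.add(e.getKey() + kvDelimiter + e.getValue());
    }
    return joiner.toString(); // An empty map encodes to "".
  }

  public static void main(final String[] args) {
    final Map<String, String> m = new LinkedHashMap<>();
    m.put("a", "1");
    m.put("b", "2");
    System.out.println(encodeMap(m, ",", "=")); // a=1,b=2
  }
}
```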
Splits a string into key-value pairs and creates a map from them. The
'entryDelimiter' splits the string into key-value pairs which are then split by 'kvDelimiter'. If the same key is present multiple times in the input, the latest value for that key is returned.
Suggested change:
- 'entryDelimiter' splits the string into key-value pairs which are then split by 'kvDelimiter'. If the same key is present multiple times in the input, the latest value for that key is returned.
+ `entryDelimiter` splits the string into key-value pairs which are then split by `kvDelimiter`. If the same key is present multiple times in the input, the latest value for that key is returned.
Returns NULL f the input text is NULL.
Suggested change:
- Returns NULL f the input text is NULL.
+ Returns NULL if the input text is NULL.
arghhh! thanks Jim :)
LGTM, with a couple of suggestions.
LGTM! Thanks @blueedgenick
import java.util.Map;
import org.junit.Test;

public class SplitToMapTest {
can you add a test with whitespace? What do we want the behavior to be when there is whitespace (e.g. foo := bar)?
good idea, test added!
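The thread does not show which whitespace behavior was chosen. The hypothetical sketch below contrasts the two obvious policies on the reviewer's example, assuming `:=` is the kvDelimiter and `;` the entryDelimiter. Without trimming, the surrounding spaces stay part of the key and value. With trimming, they are removed. The helper and its `trim` flag are illustrative only and are not the test added in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public final class WhitespacePolicySketch {

  // Hypothetical helper: splits entries literally and optionally trims keys and values.
  public static Map<String, String> split(
      final String input, final String entryDelimiter, final String kvDelimiter,
      final boolean trim) {
    final Map<String, String> result = new LinkedHashMap<>();
    for (final String entry : input.split(Pattern.quote(entryDelimiter))) {
      final int idx = entry.indexOf(kvDelimiter);
      if (idx < 0) {
        continue; // Assumption: entries without a kvDelimiter are skipped.
      }
      String key = entry.substring(0, idx);
      String value = entry.substring(idx + kvDelimiter.length());
      if (trim) {
        key = key.trim();
        value = value.trim();
      }
      result.put(key, value);
    }
    return result;
  }

  public static void main(final String[] args) {
    // Untrimmed: key is "foo " and value is " bar".
    System.out.println(split("foo := bar", ";", ":=", false)); // {foo = bar}
    // Trimmed: key is "foo" and value is "bar".
    System.out.println(split("foo := bar", ";", ":=", true)); // {foo=bar}
  }
}
```

Preserving whitespace keeps the function lossless, while trimming is friendlier for hand-written input. Either way, a test pins down the chosen behavior so it cannot change silently.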
New UDF split_to_map(input, entryDelimiter, kvDelimiter) to build a map from a string. Useful for taking messages from upstream systems and converting them into a more structured and usable format.
* feat: implements ARRAY_JOIN as requested in (#5028) (#5474) (#5638)
  Co-authored-by: Hans-Peter Grahsl <hpgrahsl@users.noreply.github.com>
* feat: new split_to_map udf (#5563)
  New UDF split_to_map(input, entryDelimiter, kvDelimiter) to build a map from a string. Useful for taking messages from upstream systems and converting them into a more structured and usable format.
* feat: add CHR UDF (#5559)
  A new UDF, CHR, to turn a number representing a unicode codepoint into a single-character string. Very useful for dealing with non-printable characters (tab, CR, LF, ...) in strings or those characters not easily represented in your local codepage.

Co-authored-by: Steven Zhang <35498506+stevenpyzhang@users.noreply.github.com>
Co-authored-by: Hans-Peter Grahsl <hpgrahsl@users.noreply.github.com>
Co-authored-by: Nick Dearden <blueedgenick@users.noreply.github.com>
Description
New UDF split_to_map(input, entryDelimiter, kvDelimiter) to build a map from a string. Useful for taking messages from upstream systems and converting them into a more structured and usable format.
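Messages from upstream systems, such as the mainframe MQ example mentioned in the review, often use characters like `|` as delimiters. If a delimiter is passed to Java's `String.split` (an assumption about how an implementation might work; the split logic is not shown on this page), it is interpreted as a regular expression. An unquoted `|` is an empty alternation that matches between every character. The hypothetical helpers below contrast the raw and the `Pattern.quote`d forms.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public final class LiteralDelimiterSketch {

  // Treats the delimiter as a regex, which is usually not what a caller intends.
  public static String[] splitRegex(final String input, final String delimiter) {
    return input.split(delimiter);
  }

  // Pattern.quote makes every character of the delimiter match literally.
  public static String[] splitLiteral(final String input, final String delimiter) {
    return input.split(Pattern.quote(delimiter));
  }

  public static void main(final String[] args) {
    final String message = "id=42|type=ORDER|qty=7";
    // As a regex, "|" matches the empty string, so each character becomes an element.
    System.out.println(Arrays.toString(splitRegex(message, "|")));
    // Quoted, it splits only on the pipe characters.
    System.out.println(Arrays.toString(splitLiteral(message, "|"))); // [id=42, type=ORDER, qty=7]
  }
}
```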
Testing done
New Unit & QTT tests.
Reviewer checklist