This repository has been archived by the owner on Mar 26, 2018. It is now read-only.

As a citizen integrator, specify data type information in an integration flow #182

Closed
kcbabo opened this issue Nov 13, 2017 · 7 comments

@kcbabo

kcbabo commented Nov 13, 2017

As a citizen integrator, I sometimes use connections within an integration flow that do not have a defined data model for input and/or output data. Examples of some 'typeless' connections used by my team are FTP, Amazon S3, JMS, and APIs which do not define input/output types in their Swagger definition. The issue with these typeless connections is that type-aware functionality in Syndesis (e.g. data mapper, basic filter) is not available because the types are not known. To address this issue, I want the ability to define the data type at any point in an integration where it is not known.

To address this requirement, I would like the ability to add a "Describe Data" step to an integration flow that declares the current data type. If a data type is already declared by a connector or step, I cannot change that data type with the Describe Data step. The configuration of the step should allow me to select from the following data type descriptions:

  • JSON schema
  • JSON instance document
  • XML schema
  • XML instance document

I anticipate using a Describe Data step in the following situations:

  • A step after a typeless start connection, in order to declare the type of data that starts a flow
  • A step before a typeless step connection, in order to declare the type of data that a step connection expects
  • A step after a typeless step connection, in order to declare the type of data returned from a step connection
  • A step before a typeless finish connection, in order to declare the type of data that a finish connection expects
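The request above can be sketched as a small data model. This is only an illustration, not Syndesis code: the names `DataKind`, `DataType`, and `describe_data` are hypothetical, and raising an error when a type is already declared is one assumed way to enforce the "cannot change" rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataKind(Enum):
    """The four data type descriptions listed in the request."""
    JSON_SCHEMA = "json-schema"
    JSON_INSTANCE = "json-instance"
    XML_SCHEMA = "xml-schema"
    XML_INSTANCE = "xml-instance"


@dataclass(frozen=True)
class DataType:
    kind: DataKind
    specification: str  # the schema text or the sample instance document


def describe_data(current: Optional[DataType], declared: DataType) -> DataType:
    """Return the data type in effect after a Describe Data step.

    Assumption: a type already declared by a connector or step cannot be
    overridden, so the step is rejected in that case.
    """
    if current is not None:
        raise ValueError("data type is already declared by a connector or step")
    return declared
```

For example, `describe_data(None, DataType(DataKind.JSON_INSTANCE, '{"id": 1}'))` declares a type at a typeless point in the flow, while passing a non-`None` `current` is refused.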
@kcbabo kcbabo added the Epic label Nov 13, 2017
@chirino
Contributor

chirino commented Nov 13, 2017

Let me come up with some scenarios:

  1. Say you have a typeless start connection and a typeless end connection. You could add a "Describe Data" step there, but it would not really be required, right? It has no runtime processing associated with it? What if the data being passed is not actually what was described?

  2. Say you have a typeless start connection and a typeless end connection. Could you add two "Describe Data" steps and a Mapper in between them to do a data mapping between typeless connections?

@kcbabo
Author

kcbabo commented Nov 14, 2017

Excellent scenarios!

Say you have a typeless start connection and a typeless end connection. You could add a "Describe Data" step there, but it would not really be required, right?

That's right. Syndesis should be able to process typeless data - you only need to add a data type if you want to use type-aware features of Syndesis.

It has no runtime processing associated with it?

Correct - it's only metadata.

What if the data being passed is not actually what was described?

That's on the user, IMO.

Say you have a typeless start connection, and typeless end connection. Could you add 2 "Describe Data" steps and a Mapper in between those to do a data mapping between typeless connections?

'Zactly!
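A rough sketch of that second scenario, under heavy simplification: a flow is modelled as a list of `(name, declared_type)` pairs, and a mapper looks at its immediate neighbours for its source and target types. The step names, the type labels, and the `mapper_types` helper are all hypothetical and only illustrate the idea.

```python
from typing import List, Optional, Tuple

# Each step is (name, declared_type); None marks a typeless step.
Step = Tuple[str, Optional[str]]


def mapper_types(flow: List[Step], mapper_index: int) -> Tuple[Optional[str], Optional[str]]:
    """Return the (source, target) types visible to the mapper at mapper_index."""
    source = flow[mapper_index - 1][1] if mapper_index > 0 else None
    target = flow[mapper_index + 1][1] if mapper_index + 1 < len(flow) else None
    return source, target


# Typeless start and end connections, with a Describe Data step on each
# side of the mapper so that both mapping endpoints are known.
flow: List[Step] = [
    ("ftp-download", None),
    ("describe-data", "xml-schema:Order"),
    ("mapper", None),
    ("describe-data", "json-schema:Invoice"),
    ("jms-publish", None),
]
```

Here `mapper_types(flow, 2)` yields both types, whereas the same mapper placed directly between the two typeless connections would see `(None, None)` and have nothing to map.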

@kahboom

kahboom commented Nov 14, 2017

@chirino What if the data being passed is not actually what was described?
@kcbabo That's on the user, IMO.

Maybe it would be good to let the user know there is a discrepancy, in case they did it in error.

@kcbabo
Author

kcbabo commented Nov 14, 2017

Agree in principle, but very tough in practice IMO. Can't really do this at the time the integration flow is created because you need the actual data that would be received at runtime. Might be able to log something at runtime, but that could be expensive from a performance standpoint on the happy path. But if there's an easy/reliable/performant way of doing it, then I'm all for it! :-)

@lburgazzoli
Contributor

About reporting, we could eventually leverage Camel's health checks and add a pass-through processor to the flow that checks and reports type correctness; then you can choose whether you want this check to be included in the application's health endpoint or just reported.

@kcbabo
Author

kcbabo commented Nov 14, 2017

Another option would be to allow the user to decide if they want to validate the payload at runtime as part of the Describe Data step in the flow. It should be disabled by default, but when enabled it performs a validation based on the data definition.

Schema validation is crazy expensive, so this would have a significant impact on performance. There's also the issue of how you validate against an instance document.

Overall, I think it's an interesting idea as a future feature, but don't view it as critical functionality for a first pass.
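As a minimal sketch of the opt-in idea, assuming a hypothetical `DescribeDataStep` class and a caller-supplied `check` function (neither exists in Syndesis): validation is off by default, and even when enabled the step only logs a mismatch and passes the payload through unchanged.

```python
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("describe_data")


@dataclass
class DescribeDataStep:
    # Hypothetical validator: returns True when the payload matches the
    # declared data type. How to build one from an instance document is
    # left open, as noted above.
    check: Callable[[bytes], bool]
    validate: bool = False  # disabled by default

    def process(self, payload: bytes) -> bytes:
        # The check runs only when the user has opted in, so the default
        # path pays no validation cost.
        if self.validate and not self.check(payload):
            log.warning("payload does not match the declared data type")
        return payload  # metadata only: the payload is never modified
```

Logging rather than failing is a design choice made for this sketch; rejecting the message would be an equally plausible behavior.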

@dhirajsb

Schema validation is expensive, but there could be cheaper validation actions, like content type checks, that might make sense. For example, check that a JSON service is getting a JSON request payload, and not XML.
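One cheap check of that kind could look at the first significant byte of the payload. This is a heuristic sketch with a hypothetical `sniff_format` helper; it assumes JSON payloads are objects or arrays and does not parse or validate anything.

```python
def sniff_format(payload: bytes) -> str:
    """Guess whether a payload is JSON or XML from its first significant byte."""
    # Skip a UTF-8 byte order mark, then any leading ASCII whitespace.
    if payload.startswith(b"\xef\xbb\xbf"):
        payload = payload[3:]
    head = payload.lstrip()[:1]
    if head in (b"{", b"["):
        return "json"  # assumption: JSON payloads are objects or arrays
    if head == b"<":
        return "xml"
    return "unknown"
```

A step declared as JSON could compare the result against `"json"` and report a mismatch when an XML document arrives, at a cost far below schema validation.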


6 participants