Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-stripe: pass type checks #35587

Closed

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented Feb 23, 2024

What

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/6349

We want to make mypy pass on a strategic connector. I picked source-stripe

How

  • Install mypy as a dev dependency
  • Run poetry run mypy source_stripe --disallow-untyped-defs
  • Fix the errors

It took me less than a day to make the type check pass.

Notes for reviewers

I'm unsure of 3 changes and would appreciate a connector expert some suggestion:

  1. Here we call a method declared on HttpStream on a object typed as Stream. Can we make HttpAvailabilityStrategy handle HttpStream instead of Stream?
  2. This code is dubious: we might call HttpStream.path but this method is not doing anything as it's not implemented in the abstract class.
  3. math.inf (which is a float) was set for state_checkpoint_interval. I believe setting None is more correct but I'm not 💯 sure.

Copy link

vercel bot commented Feb 23, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 28, 2024 0:30am

@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Feb 23, 2024
Copy link
Contributor Author

alafanechere commented Feb 23, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @alafanechere and the rest of your teammates on Graphite Graphite

@octavia-squidington-iii octavia-squidington-iii added area/documentation Improvements or additions to documentation connectors/source/stripe labels Feb 23, 2024
@alafanechere alafanechere force-pushed the augustin/02-23-source-stripe_pass_type_checks branch from c26d216 to 21c7982 Compare February 23, 2024 15:58
@alafanechere alafanechere force-pushed the augustin/02-23-source-stripe_pass_type_checks branch from 21c7982 to ea5f477 Compare February 23, 2024 16:09
@alafanechere alafanechere marked this pull request as ready for review February 23, 2024 16:12
@octavia-squidington-iv octavia-squidington-iv requested a review from a team February 23, 2024 16:13
@maxi297
Copy link
Contributor

maxi297 commented Feb 23, 2024

  1. HttpAvailabilityStrategy inherits from AvailabilityStrategy which expose the information in terms of Stream. So if we want to respect the typing and the Liskov principle, the conclusion is that our abstractions are not so great. From a quick look at this, I don’t get why the information provided by parse_response_error_message is not part of the exception that is raised. I don't know what would be the scope of changing that.

  2. Does that mean that this path is never used and then we can raise here? I'm not entirely sure about that since it is marked as Optional here

  3. None is indeed the right value as seen here

@maxi297
Copy link
Contributor

maxi297 commented Feb 23, 2024

Diagram to document the issue with point 1:

image

Basically, we are trying to pass technology specific information though a technologic agnostic layer.

One proposal would be to use catch the technology specific errors in this layer and raise them as AirbyteTraceExceptions like this
image

@alafanechere alafanechere force-pushed the augustin/02-23-source-stripe_pass_type_checks branch from f2af3c6 to 742f3fc Compare February 24, 2024 09:34
@octavia-squidington-iv octavia-squidington-iv requested a review from a team February 26, 2024 17:13
@alafanechere alafanechere force-pushed the augustin/02-23-source-stripe_pass_type_checks branch from 532cb62 to 80301c2 Compare February 26, 2024 19:10
@alafanechere alafanechere changed the base branch from master to issue-35110/relax-cats-when-primary-key February 26, 2024 20:24
@alafanechere alafanechere requested review from a team as code owners February 26, 2024 20:24
@alafanechere alafanechere changed the base branch from issue-35110/relax-cats-when-primary-key to master February 26, 2024 20:25
@alafanechere alafanechere force-pushed the augustin/02-23-source-stripe_pass_type_checks branch from 80301c2 to 28290e0 Compare February 26, 2024 20:28
@alafanechere alafanechere force-pushed the augustin/02-23-source-stripe_pass_type_checks branch from 28290e0 to 8f97834 Compare February 26, 2024 21:19
@alafanechere alafanechere force-pushed the augustin/02-23-source-stripe_pass_type_checks branch from 89569e2 to 55b5330 Compare February 27, 2024 00:27
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern about isinstance being used. For the rest, this is nit or me trying to understand things

@@ -29,8 +30,9 @@ def _check_availability_for_sync_mode(
sync_mode: SyncMode,
logger: logging.Logger,
source: Optional["Source"],
stream_state: Optional[Mapping[str, Any]],
stream_state: Optional[Mapping[Any, Any]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This typing change seems odd to me. Why is this needed? get_first_stream_slice should expect a Mapping[str, Any] and it seems like the only place this is used

@@ -90,7 +92,12 @@ def handle_http_error(
raise error
doc_ref = self._visit_docs_message(logger, source)
reason = f"The endpoint {error.response.url} returned {status_code}: {error.response.reason}. {error_message}. {doc_ref} "
response_error_message = stream.parse_response_error_message(error.response)
# TODO alafanechere
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't because of the inheritance we have in place right now. This comment describes the situation

@@ -24,7 +23,7 @@
from airbyte_cdk.sources.streams.concurrent.state_converters.datetime_stream_state_converter import EpochValueConcurrentStreamStateConverter
from airbyte_cdk.sources.streams.http.auth import TokenAuthenticator
from airbyte_cdk.utils.traced_exception import AirbyteTracedException
from airbyte_protocol.models import SyncMode
from airbyte_protocol.models import SyncMode # type: ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document why we have this type ignore? I'm unclear as to why we would ignore this

self,
catalog: Optional[ConfiguredAirbyteCatalog],
config: Optional[Mapping[str, Any]],
state: Union[list[Any], MutableMapping[str, Any], None],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we keep TState here? I'm not very knowledgeable as to how MyPy handles TypeVar

@@ -83,13 +88,14 @@ def __init__(self, catalog: Optional[ConfiguredAirbyteCatalog], config: Optional
self._streams_configured_as_full_refresh = set()

@staticmethod
def validate_and_fill_with_defaults(config: MutableMapping[str, Any]) -> MutableMapping[str, Any]:
def validate_and_fill_with_defaults(config: Mapping[str, Any]) -> Mapping[str, Any]:
mutable_config = dict(config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't MutableMapping allow for modifying the config? In practice, it is probably a dict anyway here

@@ -213,19 +219,20 @@ def get_parent_stream(self, stream_state: Mapping[str, Any]) -> StripeStream:

class CreatedCursorIncrementalStripeStream(StripeStream):
# Stripe returns most recently created objects first, so we don't want to persist state until the entire stream has been read
state_checkpoint_interval = math.inf
# TODO: alafanechere - confirm None is correct instead of math.inf
state_checkpoint_interval = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm because of this

) -> MutableMapping[str, Any]:
params = super(CreatedCursorIncrementalStripeStream, self).request_params(stream_state, stream_slice, next_page_token)
assert isinstance(stream_slice, dict), "stream_slice must be a dictionary"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a bit dangerous as we could want to change this to any class that support the interface __get__ and this would start failing. Is there are reason why we need to validate that it is an instance of a dict?

@@ -511,15 +523,17 @@ def __init__(self, *args, **kwargs):
start_date=self.start_date,
)

def path(self, stream_slice: Mapping[str, Any] = None, **kwargs):
def path(self, stream_slice: Optional[Mapping[str, Any]] = None, **kwargs: Any) -> str:
assert isinstance(stream_slice, dict), "stream_slice must be a dictionary"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a bit dangerous as we could want to change this to any class that support the interface __get__ and this would start failing. Is there are reason why we need to validate that it is an instance of a dict?

slice_ | rec
for rec in parent_records
for slice_ in incremental_slices
if isinstance(slice_, dict) and isinstance(rec, dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a bit dangerous as we could want to change this to any class that support the interface __get__ and this would start failing. Is there are reason why we need to validate that it is an instance of a dict?

@natikgadzhi
Copy link
Contributor

@Jgerardopine, would you like to take over the PR as an onboarding task for Eric? It will need more work, but I think the comments here are very useful for someone learning the architecture, and @maxi297 is the reviewer.

@alafanechere, hope that's fine by you!

@postamar postamar removed request for a team March 15, 2024 12:57
Copy link
Collaborator

@lazebnyi lazebnyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere is this PR still relevant?

@natikgadzhi
Copy link
Contributor

natikgadzhi commented Jul 26, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/stripe
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants