Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for infinite recursion due to secrets_masker #35048

Merged
merged 2 commits into from
Nov 22, 2023

Conversation

Usiel
Copy link
Contributor

@Usiel Usiel commented Oct 19, 2023

Related: #19816 (comment)
See also: #20039
Broke with: #33520

We can get into trouble for text types that cannot be initiated with re2's type(obj)() call. The secrets_masker thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth.

We can break the recursion by printing out the repr of the object in question in the warning log. This issue has occured previously: #19816 (comment)

I noticed this issue while working with a DAG that calls Airflow's DB cleanup function, thus I'm fixing the logging call in db_cleanup to log a stringified name instead of the quoted_name object.

I thought about fixing the SecretsMasker call to the reducer, i.e. only passing objects that re2 can handle (i.e. having a parameterless constructor), but decided against it. Checking for this would probably not be cheap and considering this scenario is quite unlikely, I think it's better to simply fail and emit a warning log (without the infinite recursion :)).

Testing

I added a basic test that fails next time someone refactors the warning log. I also used the following DAG for a more comprehensive end-to-end test:

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

class MyStringClass(str):
    def __init__(self, required_arg):
        pass

def fail(task_instance):
    # make sure the `SecretsMasker` has a replacer
    Variable.set(key="secret", value="secret_value")
    Variable.get("secret")
    # trigger the infinite recursion
    task_instance.log.info("%s", MyStringClass("secret_value"))

with DAG(
    dag_id="secrets_masker_recursion",
    start_date=datetime(2023, 9, 26),
):
    PythonOperator(task_id="fail", python_callable=fail)


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@uranusjr
Copy link
Member

This feels a bit too fragile to me; it is much too easy for new code to introduce a logging call that hits the errornous code path again. Is it possible to fix this in the log rendering logic instead of relying on the call site to hold this correctly?

@Taragolis
Copy link
Contributor

I think the also problem here with inheritance of builtin type str, with collections.UserString there is no recursion error, except the fact that there is no masking secrets in this case 🤣

from collections import UserString

class MyStringClass(UserString):
    pass

"Error was: %s: %s",
item,
repr(item),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think it is one of possible way to solve the issue. In the current implementation we pass into arguments which unable to masked and the error happen again, as result recursion happen.

Maybe we even would like to build warning message outside of the logger:

        except Exception as exc:
            warning_msg = (
                r"Unable to redact {item!r}, please report this via <https://github.com/apache/airflow/issues>. "
                "Error was: {type(exc).__name__}: {exc}",
            )
            log.warning(warning_msg)
            return item

Another option: use logger without mask_secrets filter here

@uranusjr WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Taragolis Just curious, should we print the item in logs at all, It kinda defeats the purpose of mask_secrets filters right? maybe instead we can just print the class name? or just enough information to debug?

Copy link
Member

@uranusjr uranusjr Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-rendering a log message is against best practice since it would not work well for external logging services such as Sentry. There is a linter rule for this: https://docs.astral.sh/ruff/rules/logging-f-string/

logger.debug("rows moved; purging from %s", source_table.name)
logger.debug("rows moved; purging from %s", str(source_table.name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that we want to include this fix in this PR? Because it seems a bit hacky here, and just hide actual problem

@Usiel
Copy link
Contributor Author

Usiel commented Oct 23, 2023

This feels a bit too fragile to me; it is much too easy for new code to introduce a logging call that hits the errornous code path again. Is it possible to fix this in the log rendering logic instead of relying on the call site to hold this correctly?

I get your concern. I would still leave in the fix the log.warning(...) call, as that makes sure Airflow doesn't die when the next unhandled case occurs. I like @utkarsharma2's idea on just printing out some debug info here (instead of the str(...) call).

Regarding actually fixing this particular case, where item passes isinstance(item, str), but then cannot be handled by re2's substitute code (e.g. sqlalchemy.sql.elements.quoted_name): I wonder if we can simply call str(...) on the item before passing it to the replacer, since that should result in a sensible output;item is either str or one of the compatibility types like six.text_type or SQLAlchemy's quoted_name.
If we fix it here, then we can remove the change on db_cleanup as you suggested @Taragolis.

@@ -259,11 +259,11 @@ class SecretsMasker(logging.Filter):
             elif isinstance(item, str):
                 if self.replacer:
                     ...
-                    return self.replacer.sub("***", item)
+                    return self.replacer.sub("***", str(item))
                 return item
             elif isinstance(item, (tuple, set)):
                 ...

@Usiel
Copy link
Contributor Author

Usiel commented Oct 23, 2023

I think the also problem here with inheritance of builtin type str, with collections.UserString there is no recursion error, except the fact that there is no masking secrets in this case 🤣

from collections import UserString

class MyStringClass(UserString):
    pass

Here's the class that caused this issue in the wild: https://github.com/sqlalchemy/sqlalchemy/blob/rel_1_4_49/lib/sqlalchemy/sql/elements.py#L5142

@Usiel
Copy link
Contributor Author

Usiel commented Oct 30, 2023

Rebased on main, incorporated the suggestion by @Taragolis (not to filter the warning log) and added the suggested fix for the underlying bug that brought me here: str-like class with constructor causing issues.

@Usiel Usiel force-pushed the usiel/fix-infinite-recursion-for-masker branch from 18d1593 to 20db9a6 Compare October 30, 2023 09:00
We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth.

We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: apache#19816 (comment)
Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`.

I noticed this issue while working with a DAG that calls Airflow's DB cleanup function.

Example DAG:
```
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

class MyStringClass(str):
    def __init__(self, required_arg):
        pass

def fail(task_instance):
    # make sure the `SecretsMasker` has a replacer
    Variable.set(key="secret", value="secret_value")
    Variable.get("secret")
    # trigger the infinite recursion
    task_instance.log.info("%s", MyStringClass("secret_value"))

with DAG(
    dag_id="secrets_masker_recursion",
    start_date=datetime(2023, 9, 26),
):
    PythonOperator(task_id="fail", python_callable=fail)

```
@Usiel Usiel force-pushed the usiel/fix-infinite-recursion-for-masker branch from 20db9a6 to fdd568a Compare November 22, 2023 08:50
@Usiel
Copy link
Contributor Author

Usiel commented Nov 22, 2023

It's been a while! 😄 Rebased on the latest main and happy to continue to work on this if needed.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk potiuk merged commit 6e8f646 into apache:main Nov 22, 2023
47 checks passed
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Nov 23, 2023
ephraimbuddy pushed a commit that referenced this pull request Nov 23, 2023
* Fix for infinite recursion due to secrets_masker

We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth.

We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: #19816 (comment)
Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`.

I noticed this issue while working with a DAG that calls Airflow's DB cleanup function.

Example DAG:
```
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

class MyStringClass(str):
    def __init__(self, required_arg):
        pass

def fail(task_instance):
    # make sure the `SecretsMasker` has a replacer
    Variable.set(key="secret", value="secret_value")
    Variable.get("secret")
    # trigger the infinite recursion
    task_instance.log.info("%s", MyStringClass("secret_value"))

with DAG(
    dag_id="secrets_masker_recursion",
    start_date=datetime(2023, 9, 26),
):
    PythonOperator(task_id="fail", python_callable=fail)

```

* Improve error message

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
ephraimbuddy pushed a commit that referenced this pull request Nov 26, 2023
* Fix for infinite recursion due to secrets_masker

We can get into trouble for types that cannot be initiated with re2's `type(obj)()` call. The `secrets_masker` thus fails, which triggers a warning log, which also fails because we pass the object to the logger, which is then masked again, and so forth.

We can break the recursion by emitting a log without trying to redact the value again (this ensures no new bug will cause a stack overflow). This issue has occured previously: #19816 (comment)
Additionally, we fix this particular bug by ensuring whatever re2 receives is a simple `str`.

I noticed this issue while working with a DAG that calls Airflow's DB cleanup function.

Example DAG:
```
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

class MyStringClass(str):
    def __init__(self, required_arg):
        pass

def fail(task_instance):
    # make sure the `SecretsMasker` has a replacer
    Variable.set(key="secret", value="secret_value")
    Variable.get("secret")
    # trigger the infinite recursion
    task_instance.log.info("%s", MyStringClass("secret_value"))

with DAG(
    dag_id="secrets_masker_recursion",
    start_date=datetime(2023, 9, 26),
):
    PythonOperator(task_id="fail", python_callable=fail)

```

* Improve error message

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants