-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add transformations #1555
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 15e4360 in 2 minutes and 10 seconds
More details
- Looked at
1868
lines of code in7
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. pandasai/data_loader/transformation_manager.py:301
- Draft comment:
Ensure the column is of string type before applyingstr.replace
to handle non-string columns correctly. - Reason this comment was not posted:
Marked as duplicate.
2. pandasai/data_loader/transformation_manager.py:77
- Draft comment:
Ensure the column is converted to datetime before applying timezone conversion to handle non-datetime columns correctly. - Reason this comment was not posted:
Comment did not seem useful.
3. pandasai/data_loader/transformation_manager.py:205
- Draft comment:
Ensure the column is converted to datetime before formatting to handle non-datetime columns correctly. - Reason this comment was not posted:
Marked as duplicate.
4. pandasai/data_loader/transformation_manager.py:381
- Draft comment:
Ensure the column is of string type before applyingstr.rjust
orstr.ljust
to handle non-string columns correctly. - Reason this comment was not posted:
Comment did not seem useful.
5. pandasai/data_loader/transformation_manager.py:8
- Draft comment:
TheTransformationManager
class is a significant addition to the library. Ensure that the documentation is updated to include details about this class and its usage. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_T3jluj1GEA86vlVh
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 743008b in 36 seconds
More details
- Looked at
1107
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
2
drafted comments based on config settings.
1. tests/unit_tests/data_loader/test_transformation_manager.py:53
- Draft comment:
Ensure the input timestamps are timezone-aware before testingconvert_timezone
. If they are naive, usetz_localize
to set a timezone first. - Reason this comment was not posted:
Comment did not seem useful.
2. pandasai/data_loader/transformation_manager.py:287
- Draft comment:
Ensure that inputs tostr.replace
are sanitized if they come from untrusted sources to prevent potential security issues. - Reason this comment was not posted:
Confidence changes required:50%
Thereplace
method intransformation_manager.py
usesstr.replace
, which is vulnerable to SQL injection if the input is not sanitized. However, since this is a DataFrame operation and not directly interacting with a database, the risk is minimal. Still, it's good practice to ensure inputs are sanitized if they are coming from an untrusted source.
Workflow ID: wflow_6gggmxP1BVmnffeG
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1555 +/- ##
==========================================
+ Coverage 80.42% 81.50% +1.07%
==========================================
Files 62 63 +1
Lines 2105 2271 +166
==========================================
+ Hits 1693 1851 +158
- Misses 412 420 +8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -202,18 +203,11 @@ def execute_query(self, query: str, params: Optional[list] = None) -> pd.DataFra | |||
) from e | |||
|
|||
def _apply_transformations(self, df: pd.DataFrame) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the view case should be handled.
Under the hood columns in schema "table.column" get transformed in "table_column" the same should apply for transformation. And check should be added as well in SemanticLayerSchema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on fe22453 in 46 seconds
More details
- Looked at
39
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. pandasai/constants.py:57
- Draft comment:
Consider adding a comment or documentation to explain the purpose of each transformation type for better maintainability and understanding. - Reason this comment was not posted:
Confidence changes required:50%
The addition of new transformation types is straightforward, but it lacks a description or context in the PR description. However, the change itself is clear and consistent with the existing format.
Workflow ID: wflow_AETPCnswz6rQIjJF
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
@@ -54,4 +54,33 @@ | |||
|
|||
VALID_COLUMN_TYPES = ["string", "integer", "float", "datetime", "boolean"] | |||
|
|||
VALID_TRANSFORMATION_TYPES = ["anonymize", "convert_timezone"] | |||
VALID_TRANSFORMATION_TYPES = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure that there are corresponding unit tests for the new transformation types added. Additionally, if these transformations are part of a library, update the documentation accordingly.
…das-ai into fix/SIN-SIN-348 * 'fix/SIN-SIN-348' of https://github.com/scaliseraoul/pandas-ai: feat: add transformations (sinaptik-ai#1555) fix(dependencies): update poetry file and pyproject files (sinaptik-ai#1557)
Important
Adds
TransformationManager
for data transformations in PandaAI, updates documentation, and includes comprehensive unit tests.TransformationManager
class intransformation_manager.py
to handle data transformations on DataFrames.anonymize
,convert_timezone
,to_lowercase
,to_uppercase
,strip
,round_numbers
,scale
,format_date
,to_numeric
,to_datetime
,fill_na
,replace
,extract
,truncate
,pad
,clip
,bin
,normalize
,standardize
,map_values
,encode_categorical
,validate_email
,validate_date_range
,normalize_phone
,remove_duplicates
,validate_foreign_key
,ensure_positive
,standardize_categories
.DatasetLoader
inloader.py
to useTransformationManager
for applying transformations.transformations.mdx
to document available transformations and usage examples.mint.json
to include transformations documentation in the navigation.VALID_TRANSFORMATION_TYPES
inconstants.py
to include new transformation types.test_transformation_manager.py
to cover all transformation methods.test_loader.py
to test integration of transformations in data loading process.This description was created by
for fe22453. It will automatically update as commits are pushed.