Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CsvMergeExclusive class delete rows when all column values match. #471

Merged
merged 1 commit into from
Oct 30, 2024

Conversation

nissy0409240
Copy link
Collaborator

@nissy0409240 nissy0409240 commented Oct 22, 2024

Brief

CsvMergeExclusive class delete rows when all column values match.

Points to Check

  • Is there any discomfort in the corresponding parts and contents?

Test

Confirmed

Review Limit

  • As soon as possible.

@@ -430,6 +444,16 @@ def _read_csv_func(self, chunksize, fi, fo):
)
first_write = False

def _all_elements_match(self, df_src_list):
target_list = []
Copy link
Collaborator

@yasuhiro-ohba yasuhiro-ohba Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

全てのカラムを総当たりで比較しているため、カラム数が多い場合のパフォーマンスに問題が出る可能性があります。
パフォーマンスに問題がある場合、行全体のハッシュ値を比較するなどの改善を検討して下さい。
<例>
def _all_elements_match(self, df_src_list):
# ターゲットリストの行をハッシュセットに変換
df_target_set = {hash(tuple(row)) for row in self.df_target_list}
# ソース行のハッシュがターゲットに含まれる場合のインデックスリストを取得
target_list = [i for i, row in enumerate(df_src_list) if hash(tuple(row)) in df_target_set]
return target_list

Copy link
Collaborator

@yasuhiro-ohba yasuhiro-ohba Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df_target_setは一度作成すればよいはずなので、この関数の外で作るとより改善が見込まれると思います。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ご確認ありがとうございます
対応しました

df_target = pandas.read_csv(self._target_compare_path, usecols=[self._target_column])
self.df_target_list = df_target[self._target_column].values.tolist()
if self._all_column:
df_target = pandas.read_csv(self._target_compare_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

元のコードもそうなのですが、引数にdtype=strを指定するべきかもしれません。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ご確認ありがとうございます
対応しました

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all_columnの分岐によりややコードの見通しが悪化しているので、リファクタリングを検討してください。
このブランチについてはマージさせていただきます。

@yasuhiro-ohba yasuhiro-ohba merged commit e8a8351 into master Oct 30, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CsvMergeExclusive class delete rows when all column values ​​match.
2 participants