-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CsvMergeExclusive class delete rows when all column values match. #471
Conversation
cliboa/scenario/transform/csv.py
Outdated
@@ -430,6 +444,16 @@ def _read_csv_func(self, chunksize, fi, fo): | |||
) | |||
first_write = False | |||
|
|||
def _all_elements_match(self, df_src_list): | |||
target_list = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
全てのカラムを総当たりで比較しているため、カラム数が多い場合のパフォーマンスに問題が出る可能性があります。
パフォーマンスに問題がある場合、行全体のハッシュ値を比較するなどの改善を検討して下さい。
<例>
def _all_elements_match(self, df_src_list):
# ターゲットリストの行をハッシュセットに変換
df_target_set = {hash(tuple(row)) for row in self.df_target_list}
# ソース行のハッシュがターゲットに含まれる場合のインデックスリストを取得
target_list = [i for i, row in enumerate(df_src_list) if hash(tuple(row)) in df_target_set]
return target_list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df_target_setは一度作成すればよいはずなので、この関数の外で作るとより改善が見込まれると思います。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ご確認ありがとうございます
対応しました
cliboa/scenario/transform/csv.py
Outdated
df_target = pandas.read_csv(self._target_compare_path, usecols=[self._target_column]) | ||
self.df_target_list = df_target[self._target_column].values.tolist() | ||
if self._all_column: | ||
df_target = pandas.read_csv(self._target_compare_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
元のコードもそうなのですが、引数にdtype=strを指定するべきかもしれません。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ご確認ありがとうございます
対応しました
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all_columnの分岐によりややコードの見通しが悪化しているので、リファクタリングを検討してください。
このブランチについてはマージさせていただきます。
Brief
CsvMergeExclusive class delete rows when all column values match.
Points to Check
Test
Confirmed
Review Limit
As soon as possible.