Duplicate data in raw logs #3

Closed
surfrock66 opened this issue May 13, 2016 · 1 comment

@surfrock66
Owner

A few days ago I made a commit that resulted in TONS of extra data, like 1 session with 160,000 datapoints. The bug has been fixed. To clean up the data, I had to add a primary key and do a dedupe; you can do this too:

mysql -D torque -u root -v -e "ALTER TABLE raw_logs ADD uniqueid INT UNSIGNED NOT NULL AUTO_INCREMENT, ADD PRIMARY KEY (uniqueid)"
for i in `mysql -D torque -u root -N -e "select time from raw_logs group by time having count(time) > 5" | sed 's/[|+-]//g'`; do mysql -D torque -u root -v -e "DELETE n1 FROM raw_logs n1, raw_logs n2 WHERE n1.uniqueid > n2.uniqueid AND n1.time = n2.time AND n1.time=$i"; done
mysql -D torque -u root -v -e "ALTER TABLE raw_logs DROP COLUMN uniqueid"

This basically deletes the extra datapoints that share a timestamp, keeping the entry with the lowest uniqueid. Note that the select only picks up timestamps that appear more than 5 times. I am considering making the unique ID change permanent. It will be less of a storage hit once I dedupe session columns from the raw logs table.
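Before deleting anything, it can help to see how much duplication there actually is. Here is a minimal sketch of a dry run, assuming the same torque database, raw_logs table, root login and threshold of 5 as the commands above. The function names dup_times_sql and report_duplicates are made up for illustration and are not part of the project.

```shell
# Assumption: same database as the commands above.
DB=torque

# Build the query that lists every timestamp with more than $1 rows,
# along with its row count, smallest groups first.
dup_times_sql() {
  printf 'SELECT time, COUNT(*) AS n FROM raw_logs GROUP BY time HAVING COUNT(*) > %d ORDER BY n ASC' "$1"
}

# Dry run: print the duplicated timestamps and their counts. Deletes nothing.
report_duplicates() {
  mysql -D "$DB" -u root -N -e "$(dup_times_sql "${1:-5}")"
}
```

Running `report_duplicates` (or `report_duplicates 1` to see every timestamp with more than one row) prints one line per duplicated timestamp, which gives a rough idea of how long the delete loop will take.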

@surfrock66 surfrock66 self-assigned this May 13, 2016
@surfrock66
Owner Author

surfrock66 commented May 16, 2016

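Since the delete takes over the rig while it runs, I am also using this command in a screen session to clean up in batches of 100 timestamps, pausing 10 seconds between deletes. It needs the uniqueid column from the first command to still be in place: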

k=100; j=1; date; mysql -D torque -u root -N -e "select time from raw_logs group by time having count(time) > 5 order by count(time) asc" | wc -l; date; for i in `mysql -D torque -u root -N -e "select time from raw_logs group by time having count(time) > 5 order by count(time) asc limit $k" | sed 's/[|+-]//g'`; do echo "Query $j of $k"; date; mysql -D torque -u root -v -e "DELETE n1 FROM raw_logs n1, raw_logs n2 WHERE n1.uniqueid > n2.uniqueid AND n1.time = n2.time AND n1.time=$i"; date; sleep 10; j=$((j+1)); done
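The same batch loop is easier to read and rerun when split into functions. This is a sketch, not a tested replacement: it assumes the time column holds plain numeric values (the one-liner above interpolates it unquoted, which suggests so), and the names delete_sql and run_batch are hypothetical.

```shell
# Build the DELETE for one timestamp. Anything that is not purely digits is
# rejected so it never gets interpolated into SQL.
delete_sql() {
  case "$1" in
    ''|*[!0-9]*) echo "not a numeric timestamp: $1" >&2; return 1 ;;
  esac
  printf 'DELETE n1 FROM raw_logs n1, raw_logs n2 WHERE n1.uniqueid > n2.uniqueid AND n1.time = n2.time AND n1.time = %s' "$1"
}

# Clean up to $1 duplicated timestamps (default 100), sleeping $2 seconds
# (default 10) between deletes so the machine stays usable.
run_batch() {
  limit=${1:-100}
  pause=${2:-10}
  n=0
  mysql -D torque -u root -N -e "select time from raw_logs group by time having count(time) > 5 order by count(time) asc limit $limit" |
  while read -r t; do
    n=$((n+1))
    echo "Query $n of $limit"
    sql=$(delete_sql "$t") || continue
    # stdin is redirected so mysql cannot consume the list being read by the loop
    mysql -D torque -u root -v -e "$sql" </dev/null
    sleep "$pause"
  done
}
```

Calling `run_batch` inside the screen session does one batch of 100; `run_batch 20 30` would do a smaller batch with longer pauses.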

surfrock66 added a commit that referenced this issue Mar 1, 2019
…mum session size is now configurable in the creds.php file. Thanks to adel-s for the code.
surfrock66 pushed a commit that referenced this issue Sep 21, 2022
Adds the functionality of uploading csv generated by torque