618 local files with size "0 bytes" or "null" #3642

Closed
raprasad opened this issue Feb 22, 2017 · 4 comments

Comments

@raprasad
Contributor

raprasad commented Feb 22, 2017

  • Local files: 618 have file size of 0 or null in the database.
  • Harvested files are inconsistent: filesize shows as null for some and 0 for others
    • i.e. filesize does not appear to be harvested

(Queries are shown below for further checking/sanity checks)

Filesize is null

  • All: 31,177 (1a)
  • Harvested: 31,052 (1b)
  • Local: 125 (1c)

Filesize is 0

  • All: 108,156 (2a)
  • Harvested: 107,663 (2b)
  • Local: 493 (2c)

  • (1a) Select count(*) from datafile where filesize is null;

  • (1b) select count(df.id) from datafile df, dvobject dv where df.filesize is null and df.id = dv.id and dv.owner_id in (select id from dataset where harvestingclient_id is not null);

  • (1c) select count(df.id) from datafile df, dvobject dv where df.filesize is null and df.id = dv.id and dv.owner_id in (select id from dataset where harvestingclient_id is null);


  • (2a) Select count(*) from datafile where filesize = 0;

  • (2b) select count(df.id) from datafile df, dvobject dv where df.filesize = 0 and df.id = dv.id and dv.owner_id in (select id from dataset where harvestingclient_id is not null);

  • (2c) select count(df.id) from datafile df, dvobject dv where df.filesize = 0 and df.id = dv.id and dv.owner_id in (select id from dataset where harvestingclient_id is null);
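
For anyone rerunning these checks, the harvested/local breakdown can probably be collected in one pass instead of four separate queries. The following is only a sketch: it reuses the table and column names from the queries above, but the in-memory SQLite database, the trimmed-down schema, the sample rows, and the function name `count_bad_filesizes` are all made up for illustration and are not the real Dataverse database.

```python
import sqlite3

# Hypothetical stand-in schema: only the columns the queries above rely on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset  (id INTEGER PRIMARY KEY, harvestingclient_id INTEGER);
    CREATE TABLE dvobject (id INTEGER PRIMARY KEY, owner_id INTEGER);
    CREATE TABLE datafile (id INTEGER PRIMARY KEY, filesize INTEGER);
""")

# Made-up sample rows: dataset 1 is local, dataset 2 is harvested.
conn.executemany("INSERT INTO dataset VALUES (?, ?)", [(1, None), (2, 7)])
conn.executemany("INSERT INTO dvobject VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 1), (13, 2), (14, 2), (15, 2)])
conn.executemany("INSERT INTO datafile VALUES (?, ?)",
                 [(10, None), (11, 0), (12, 2048), (13, None), (14, 0), (15, 0)])


def count_bad_filesizes(conn):
    """Count files with NULL or 0 filesize, grouped by origin and problem."""
    rows = conn.execute("""
        SELECT
            CASE WHEN ds.harvestingclient_id IS NULL
                 THEN 'local' ELSE 'harvested' END AS origin,
            CASE WHEN df.filesize IS NULL
                 THEN 'null' ELSE 'zero' END AS problem,
            COUNT(*) AS n
        FROM datafile df
        JOIN dvobject dv ON dv.id = df.id
        JOIN dataset ds ON ds.id = dv.owner_id
        WHERE df.filesize IS NULL OR df.filesize = 0
        GROUP BY origin, problem
        ORDER BY origin, problem
    """).fetchall()
    return {(origin, problem): n for origin, problem, n in rows}


counts = count_bad_filesizes(conn)
for (origin, problem), n in counts.items():
    print(f"{origin:9} {problem:4} {n}")
```

Note that this joins through dvobject and dataset, so it corresponds to the (b) and (c) counts; files whose owner is not in the dataset table would only show up in the (a) totals.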

@djbrooke djbrooke changed the title 618 local files with size "0 bytes" or "null" 618 local files with size "0 bytes" or "null" Feb 23, 2017
@pdurbin
Member

pdurbin commented Mar 1, 2017

The code change I would make for this issue is to try to get the "dataset integrity" code working. I proposed that code but ultimately removed it in 019bbb3 as part of pull request #3605 for #3589.

@raprasad
Contributor Author

raprasad commented Mar 1, 2017

Proposed fix:

  1. Local files:
    • API endpoint to get actual file size and fix in database
    • Only superuser can use
  2. Harvested:
    • Set file size to null, or appropriate setting
    • Question: Are file sizes "harvested"?
  3. Automated checker for bad data
    • Rerun queries above and send results. Could be in miniverse.

@pdurbin
Member

pdurbin commented Jan 31, 2018

I just noticed this issue mentioned at https://help.hmdc.harvard.edu/Ticket/Display.html?id=257561 and wanted to point out that https://services.dataverse.harvard.edu/miniverse/qc/dashboard has a "File size 0 or null" check. As of this writing it shows 874 files, like this:

[Screenshot of the dashboard's "File size 0 or null" check, taken 2018-01-30]

@pdurbin
Member

pdurbin commented Jul 12, 2018

This issue is super-specific to Harvard Dataverse. No one seems to be championing it. Closing.

@pdurbin pdurbin closed this as completed Jul 12, 2018