This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

[DataCap Application] <DataMe> - <World Economic Data> #2026

Closed
1 of 2 tasks
QodeNu opened this issue May 31, 2023 · 86 comments

Comments

@QodeNu

QodeNu commented May 31, 2023

Data Owner Name

DataMe

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

Hong Kong

Data Owner Industry

IT & Technology Services

Website

http://www.econostatistics.co.za/

Social Media

n/a

Total amount of DataCap being requested

10PiB

Expected size of single dataset (one copy)

2PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f1cgqp6ivn2expwksf4hmbv6qp2v6gi4r2qh6umzy

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

DATAME was established in Hong Kong in 2021 and focuses on the decentralized storage track. In the past, we focused on CC (committed capacity) sector packing. Currently, we have a total of 50P of storage power on Filecoin, and we are now ready to upgrade our software and hardware from CC sectors to DC (DataCap) deals. I am purely a Filecoin hardware and computing power investor, without an official website or promotional materials. Thank you for your understanding.
Economic data is crucial to human development and affects the development of countries and individuals. We plan to upload economic data that is critical to human development to the Filecoin network. We have collected 30 large-scale economics-related datasets, totaling 20 PiB of storage capacity, with 10 backups.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

Economic data is crucial to human development and affects the development of countries and individuals. We plan to upload economic data that is critical to human development to the Filecoin network. We have collected 30 large-scale economics-related datasets, totaling 20 PiB of storage capacity, with 10 backups.
Datasets include, but are not limited to:
Asian Productivity Organization (APO) 
ASEAN Stats 
American Economic Association (AEA)
Asian KLEMS 
Harvard Atlas of Economic Complexity 
BIS Financial Database
Barro-Lee Educational Attainment Data, 1950 to 2010
CEPII Database 
EU KLEMS - an industry-level growth and productivity research project
Economic Freedom of the World Data
Latin America KLEMS 
Long-Term Productivity Database 
Maddison Project Database
National Transfer Accounts 
OpenCorporates Database of Companies in the World
Our World in Data [Meta]
Penn World Table - PWT version 10.0

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity
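For context, a minimal sketch of how one dataset directory might be packed into a CAR file with the lotus client (the input path and output name are illustrative; in practice Singularity can batch this step):

# Pack a source directory into a CAR file (path and output name are illustrative)
lotus client generate-car /data/penn-world-table pwt.car

# Compute the piece CID (CommP) and padded piece size for the generated CAR
lotus client commP pwt.car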

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://www.aeaweb.org/resources/data
https://www.apo-tokyo.org/
https://data.aseanstats.org/
https://dataverse.harvard.edu/dataverse/atlas
http://www.historicalstatistics.org/
http://www.econostatistics.co.za/
https://www.upcdatabase.com/
http://www.jedh.org/
https://www.rug.nl/ggdc/valuechain/wiod/
https://www.fraserinstitute.org/economic-freedom/dataset?geozone=world&page=dataset&min-year=2&max-year=0&filter=0

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • I confirm

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

2 to 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe

How will you be distributing your data to storage providers

Cloud storage (i.e. S3), HTTP or FTP server, IPFS, Shipping hard drives, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

f02199393/f02192496/f02095766/f01971431/f02115125/f02185816 so far

How do you plan to make deals to your storage providers

Boost client, Lotus client, Droplet client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

@large-datacap-requests large-datacap-requests bot added the very large application For LDN applications over 5+ PiB label May 31, 2023
@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and contact you back pretty soon.

@large-datacap-requests large-datacap-requests bot added very large application For LDN applications over 5+ PiB validated and removed very large application For LDN applications over 5+ PiB labels May 31, 2023
@Sunnyiscoming
Collaborator

You are applying for a total amount of 10 PiB, but the single dataset is 2 PiB and the number of replicas is 10.
(2 PiB x 10 copies) = 20 PiB
Why are you applying for 10 PiB and not 20 PiB?

Could you send your business license to filplus-app-review@fil.org in order to confirm your identity? The email subject should include the issue id #2026.

@QodeNu
Author

QodeNu commented Jun 1, 2023

@Sunnyiscoming Because I want to go through this step by step. I'm not sure whether the process will be smooth, even though I have much more data. Looking forward to your support. Thanks!

Yes, sure, I have sent the business licence via email.

@Sunnyiscoming
Collaborator

Datacap Request Trigger

Total DataCap requested

10PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1cgqp6ivn2expwksf4hmbv6qp2v6gi4r2qh6umzy

@large-datacap-requests

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1cgqp6ivn2expwksf4hmbv6qp2v6gi4r2qh6umzy

DataCap allocation requested

512TiB

Id

b8fce6c5-8fc8-4ce9-af06-3bd54ef34d85

@large-datacap-requests large-datacap-requests bot added very large application For LDN applications over 5+ PiB ready to sign and removed validated very large application For LDN applications over 5+ PiB labels Jun 1, 2023
@kernelogic

In support of public data.


Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecpba32guma6uqajq5sqczrviorcv7l5ye5cnsjs76y34x4dnvkic

Address

f1cgqp6ivn2expwksf4hmbv6qp2v6gi4r2qh6umzy

Datacap Allocated

512.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

b8fce6c5-8fc8-4ce9-af06-3bd54ef34d85

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecpba32guma6uqajq5sqczrviorcv7l5ye5cnsjs76y34x4dnvkic

@large-datacap-requests large-datacap-requests bot added very large application For LDN applications over 5+ PiB ready to sign start sign datacap and removed ready to sign very large application For LDN applications over 5+ PiB labels Jun 2, 2023

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebklrbhq46iqguqwcikuovg7gtdkhj5agm6qvhhhp66q6l5xlt34y

Address

f1cgqp6ivn2expwksf4hmbv6qp2v6gi4r2qh6umzy

Datacap Allocated

512.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

b8fce6c5-8fc8-4ce9-af06-3bd54ef34d85

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebklrbhq46iqguqwcikuovg7gtdkhj5agm6qvhhhp66q6l5xlt34y

@large-datacap-requests large-datacap-requests bot added very large application For LDN applications over 5+ PiB granted and removed ready to sign start sign datacap very large application For LDN applications over 5+ PiB labels Jun 2, 2023
@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 6.06%
  • Overall HTTP retrieval success rate: 8.87%
  • Overall Bitswap retrieval success rate: 0.00%

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 91.77% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.
Click here to view the Retrieval Dashboard.
Click here to view the Retrieval report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

@github-actions github-actions bot removed the Stale label Aug 12, 2023
@large-datacap-requests

DataCap Allocation requested

Request number 5

Multisig Notary address

f02049625

Client address

f1cgqp6ivn2expwksf4hmbv6qp2v6gi4r2qh6umzy

DataCap allocation requested

2PiB

Id

38b5b36d-70d4-4c8f-9e00-7f5e3634ec38

@large-datacap-requests

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1cgqp6ivn2expwksf4hmbv6qp2v6gi4r2qh6umzy

Rule to calculate the allocation request amount

400% weekly > 2PiB, requesting 2PiB

DataCap allocation requested

2PiB

Total DataCap granted for client so far

1.862645149230957e+37YiB

Datacap to be granted to reach the total amount requested by the client (10PiB)

1.862645149230957e+37YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 101882 | 23 | 2PiB | 17.25 | 204.03TiB |

@maxvint

maxvint commented Aug 15, 2023

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 5.98%
  • Overall HTTP retrieval success rate: 9.52%
  • Overall Bitswap retrieval success rate: 0.00%

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f02328179: 40.98%, f02359258: 55.66%

Deal Data Replication

⚠️ 91.93% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.
Click here to view the Retrieval Dashboard.
Click here to view the Retrieval report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

@QodeNu
Author

QodeNu commented Aug 15, 2023

Regarding the backups noted in the retrieval report, we have backed the data up on different nodes. Since the storage efficiency of the nodes is not completely uniform, progress has been uneven, but the storage has been scheduled, so the next round's report should show a clear improvement.

@cryptowhizzard

This client is actively stalling HTTP retrievals and has blocked HTTP ranged requests with a reverse proxy to prevent its data from being investigated.

It works as follows:

One sets a bandwidth limit on the HTTP retrieval with NGINX. After a certain random amount of data, the limit is set to zero, which makes the transfer time out. Because range retrieval is disabled in NGINX, one cannot pick up where one left off and has to start all over again.

Log can be found at http://datasetcreators.com/downloadedcarfiles/logs/2026.log

[Screenshot attached: 2023-08-15 15:50:54]
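For reference, a simple way to probe whether an HTTP piece endpoint honors ranged requests is a sketch like the one below (the IP, port, and piece CID placeholders are illustrative). A server with range support answers 206 Partial Content, while 200 OK means the Range header was ignored:

# Ask for only the first MiB of a piece; a 206 response means ranged
# requests work, while a 200 means the server ignores the Range header.
curl -s -o /dev/null --max-time 60 -w "%{http_code}\n" \
  -H "Range: bytes=0-1048575" \
  http://IP:port/piece/pieceCID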

@QodeNu
Author

QodeNu commented Aug 16, 2023

@cryptowhizzard @raghavrmadya
The log you provided shows that your retrieval attempt was on August 4th, but the comment was made on August 15th. During this period, you didn't make any further retrieval attempts, yet you criticized the client without evidence and made baseless speculations.

It's quite clear that we haven't restricted the bandwidth for HTTP retrieval using NGINX. If you had carefully reviewed the client, you would know that the port number you retrieved from was incorrect.

To prevent any potential impact from network issues on retrieval results, I conducted HTTP retrievals for the same CID from machines located in Singapore and Hong Kong. The evidence is as follows:

[Screenshots attached: four images]

@cryptowhizzard

Well, that is great!

Can you make the CID available for download at a simple URL so I can do data inspection? I would love to check the data that you are storing. If it's compliant, then we can move on ;-)

@QodeNu
Copy link
Author

QodeNu commented Aug 17, 2023

@cryptowhizzard

# HTTP retrieve command
wget http://IP:port/piece/pieceCID

# e.g.
wget http://128.1.207.98:23459/piece/baga6ea4seaqhtldlc32q5dtdrmg3a3wepfufmg3gztklk2sstddhr2so764t4my

baga6ea4seaqd3gwoyzgdyanwhgqfsiohj2rjguockg7jfbvdisydltf6ll3fgga
baga6ea4seaqnh37g6nfowrung3bboprzk2rwqawsgthhqw5svcmjqphkzs4s6da
baga6ea4seaqdwebqq7xfe5ajpomrp7pnuh2xhtmlb2lleyfqrpvrsbfk4ahocdq
baga6ea4seaqk5f2mus6przib6jtgga6oz7ykh4l42aowhvkt5zkngy4vrqsoicy
baga6ea4seaqdp6plztcmasua6pe3zbq7epug2irmnzgha6xzbhunyn4mlotrkgi
baga6ea4seaqkprceyzop7fyqyjestkkblloov2eyxukqqp6fpl67rcjlwsbwcny
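For completeness, fetching every piece listed above can be scripted with the same wget command; a minimal sketch, assuming all pieces are served from the host and port in the example above (in practice each storage provider exposes its own endpoint):

# Loop over the listed piece CIDs and fetch each one over HTTP.
# Host and port are taken from the example above; adjust per provider.
HOST=128.1.207.98
PORT=23459
for CID in \
  baga6ea4seaqd3gwoyzgdyanwhgqfsiohj2rjguockg7jfbvdisydltf6ll3fgga \
  baga6ea4seaqnh37g6nfowrung3bboprzk2rwqawsgthhqw5svcmjqphkzs4s6da \
  baga6ea4seaqdwebqq7xfe5ajpomrp7pnuh2xhtmlb2lleyfqrpvrsbfk4ahocdq \
  baga6ea4seaqk5f2mus6przib6jtgga6oz7ykh4l42aowhvkt5zkngy4vrqsoicy \
  baga6ea4seaqdp6plztcmasua6pe3zbq7epug2irmnzgha6xzbhunyn4mlotrkgi \
  baga6ea4seaqkprceyzop7fyqyjestkkblloov2eyxukqqp6fpl67rcjlwsbwcny
do
  wget -O "${CID}.car" "http://${HOST}:${PORT}/piece/${CID}"
done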

@cryptowhizzard

Thanks!

I retrieved baga6ea4seaqhtldlc32q5dtdrmg3a3wepfufmg3gztklk2sstddhr2so764t4my successfully.

Unpacking it, I see the filenames:

DATAME1/POWER/PaxHeaders.0/power_901_hourly_radiation_utc.zarr.tar.split-6 and some other solar radiation files.

This is not the data you said you would be storing.

Please explain.

@QodeNu
Author

QodeNu commented Aug 18, 2023

@cryptowhizzard
Firstly, there was an error in the storage of data, and I apologize for not conducting a thorough review. Allow me to clarify the situation regarding this data:

The data stored by the SP we collaborate with comes from publicly available datasets. We did not intentionally fabricate this data. However, it is true that the data was prepared by us, so there must have been an issue at some point in the process. I apologize once again for this.

Additionally, it would make no sense for me to knowingly disclose problematic CIDs, so this was indeed an accident. The most likely scenario is that our technical personnel mixed in data from another publicly available dataset that we were preparing to submit in a separate application.

I apologize once again.

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@QodeNu
Author

QodeNu commented Sep 1, 2023

This application needs to remain open

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

--
Commented by Stale Bot.

@github-actions

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

--
Commented by Stale Bot.

Client f02208492 does not follow the datacap usage rules. More info here.
This application has been failing the requirements for 7 days.
Please take appropriate action to fix the following DataCap usage problems.

| Criteria | Threshold | Reason |
| --- | --- | --- |
| CID Checker score | > 25% | The client has a CID checker score of 12%. This should be greater than 25%. To find out more about the CID checker score, please look at this issue: filecoin-project/notary-governance#986 |
