Reading metadata from additional file #45

Open
StephenQuirolgico opened this issue Nov 14, 2019 · 5 comments

@StephenQuirolgico

@IanLee1521 - I can't recall whether this was already requested elsewhere, but could the scraper be enhanced to also read metadata from an additional file in a repo? That would give developers more control over the metadata that is provided, and let them supply metadata that the scraper cannot collect on its own.

@leebrian
Collaborator

I think it would be helpful to read a code.json file in the root of the repo. During the GSA calls, at least two programs said they did something similar. I would like to bring this up on a GSA call and have them put out some guidance on code.gov to help shape the implementation here.

The local process we use on top of the scraper is to read a code.json and use its values to override the project settings in the combined agency code.json. It's a bit of a hack, but it lets me use the exact same schema. We do this on the openCDC repo.
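For anyone wanting to try the same kind of override, here is a minimal sketch of the idea, not the actual openCDC process. It assumes the combined agency code.json keeps its projects in a top-level "releases" list and that entries can be matched by "name"; the function names `load_repo_overrides` and `apply_overrides` are hypothetical and are not part of the scraper.

```python
import json
from pathlib import Path


def load_repo_overrides(repo_root):
    """Return the parsed code.json from a repo root, or {} if it is absent."""
    path = Path(repo_root) / "code.json"
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def apply_overrides(combined, project_name, overrides):
    """Overwrite top-level fields of the matching project entry in place.

    Assumption: `combined` has a "releases" list whose entries carry a "name".
    Returns True if a matching entry was found and updated, else False.
    """
    for release in combined.get("releases", []):
        if release.get("name") == project_name:
            release.update(overrides)  # shallow: whole values are replaced
            return True
    return False


# Illustrative data only; a real combined file would hold many more fields.
combined = {"releases": [{"name": "demo", "tags": ["github"], "vcs": "git"}]}
found = apply_overrides(combined, "demo", {"tags": ["github", "cdc"]})
```

Because `dict.update` is shallow, an override that supplies a nested object such as "contact" replaces the whole object rather than individual keys inside it.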

@IanLee1521
Member

Certainly doable. I believe this was last on @jcastle's plate, as there was to be a discussion in the bi-weekly calls (or other spin-off calls) to figure out the best way to implement this (and, e.g., what to name the file).

@jcastle-zz

Let's add this to the metadata brainstorm. Will send out an invite for that discussion to begin next week.

@IanLee1521
Member

IanLee1521 commented Jan 22, 2020

I will wait for the official answer from @jcastle / Amin, but I propose that we name the file .code_gov.json and that it have the same format as the “repository” object in the metadata schema (currently called “release”).

If the file follows that format, any fields in it that match what comes from the API will replace the API values. Example from gsa.gov/code.json, where all the values are explicitly in the file:

{
  "contact": {
    "URL": "https://github.com/18F",
    "email": "18f@gsa.gov"
  },
  "date": {
    "created": "2013-07-17",
    "lastModified": "2019-05-02"
  },
  "description": "A hosted, shared-service that provides an API key, analytics, and proxy solution for government web services.",
  "downloadURL": "https://api.github.com/repos/18F/api.data.gov/downloads",
  "homepageURL": "https://github.com/18F/api.data.gov",
  "laborHours": 1216,
  "languages": [
    "HTML",
    "Ruby",
    "CSS",
    "JavaScript"
  ],
  "name": "api.data.gov",
  "organization": "18F",
  "permissions": {
    "licenses": [
      {
        "name": "NOASSERTION"
      }
    ],
    "usageType": "openSource"
  },
  "repositoryURL": "https://github.com/18F/api.data.gov",
  "status": "Development",
  "tags": [
    "github"
  ],
  "vcs": "git"
}

Example where only a couple of fields (tags and contact.email) are overridden:

{
  "contact": {
    "email": "jcastle@gsa.gov"
  },
  "tags": [
    "github",
    "code_gov"
  ]
}

What do you all think of that?
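To make the partial-override case concrete, here is a rough sketch of one way the merge could behave. Nothing here is settled: it assumes nested objects such as "contact" are merged key by key (so contact.URL survives when only contact.email is overridden) and that lists such as "tags" are replaced rather than appended to. The helper name `merge_overrides` is hypothetical and does not exist in the scraper.

```python
import copy


def merge_overrides(scraped, overrides):
    """Return a copy of `scraped` with `overrides` applied recursively.

    Nested dicts are merged key by key; any other value in `overrides`
    (including lists) replaces the scraped value wholesale.
    """
    merged = copy.deepcopy(scraped)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Values scraped from the API (trimmed from the full example above).
scraped = {
    "contact": {"URL": "https://github.com/18F", "email": "18f@gsa.gov"},
    "name": "api.data.gov",
    "tags": ["github"],
}

# Contents of the proposed override file (the second example above).
overrides = {
    "contact": {"email": "jcastle@gsa.gov"},
    "tags": ["github", "code_gov"],
}

result = merge_overrides(scraped, overrides)
```

With these inputs, `result["contact"]` keeps the scraped URL and picks up the new email, and `result["tags"]` becomes the list from the override file. Whether lists should instead be unioned with the scraped values is one of the things the discussion would need to decide.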

@jcastle-zz