This repository contains a Golang service that interacts with GitHub's public APIs to fetch and store repository information and commit history. The service also includes functionality to monitor repositories for changes and keep the stored data in sync with GitHub.
- GitHub API Data Fetching Service
- Table of Contents
- Project Structure
- Getting Started
- Tests
- Deployment
- Performance Considerations
- Error Handling
- API Documentation
- API Response Structure
- Error Codes Responses
- Contributing
- License
- Resources
.
├── cmd
│   └── reposvc               # Entry point for the service
│       └── main.go
├── config
│   └── local.yml             # Configuration settings
├── internal
│   ├── config                # App configuration logic
│   ├── healtcheck            # API healthcheck logic
│   ├── githubrepository      # GitHub API interaction logic
│   ├── db                    # Database models and interactions
│   └── tests                 # Unit tests for core functions
├── pkg
│   └── log                   # Logging for the application
├── scripts
│   ├── *.sql                 # Database migration scripts
│   └── seed_data.go          # Script to seed the database with initial data
├── docker-compose.yml        # Docker Compose for multi-container setup
├── Dockerfile                # Dockerfile to containerize the application
└── README.md                 # This file
- Go (version 1.22+)
- PostgreSQL
- Docker (for containerized deployment)
- Clone the repository:
  git clone https://github.com/omept/reposvc.git
  cd reposvc
- Run with Docker if you have it installed:
  docker compose up --build -d
  Alternatively, continue with the steps below to build the application manually.
- Get the Go dependencies:
  go mod tidy
- Set up the environment variables by creating a .env file based on .env.example:
  cp .env.example .env
- Update the credentials in .env to match your machine. The application uses PostgreSQL as its database.
- Seed the database (optional):
  go run scripts/seed_data.go
- Build and run the service:
  go build -o github-repo-indexer ./cmd/reposvc
  ./github-repo-indexer
All configuration settings (e.g. database connections) are managed through environment variables defined in the .env file.
- Fetch and store repository and commit data:
  The service automatically fetches and stores data for the repositories specified in the configuration. The data is stored in a PostgreSQL database.
- Monitor a repository for changes:
  The service continuously monitors the repository and updates the database with new commits as they appear on GitHub.
- Get the top N commit authors by commit count:
  See the GET /api/v1/top-commit-authors endpoint in the API Documentation section.
The project includes unit tests for core functionality, located in the internal/tests directory. To run the tests:
  go test ./internal/tests
For deployment, the service can be containerized using Docker. Use the provided Dockerfile and docker-compose.yml for easy setup:
  docker compose up -d
- Efficient data storage: Indexed fields and optimized queries ensure fast retrieval of commit data.
- Scalability: The service is designed to handle large datasets and can be scaled horizontally with multiple instances.
The service includes robust error handling with clear and meaningful error messages. All critical operations are monitored and logged.
POST /api/v1/index-github-repository
Fetches repository details and commit history from GitHub for the specified repository name and owner, then saves them locally. The repository is also monitored for future commits.
The request body should be a JSON object with the following structure:
{
  "repo": "string",
  "owner": "string"
}
- repo (string, required): The name of the repository to fetch.
- owner (string, required): The GitHub username or organization name that owns the repository.
- Ensure that the repo and owner fields are both specified to avoid validation errors.
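For illustration, a Go client could build the indexing request as sketched below. The base URL (http://localhost:8080) and the helper name `newIndexRequest` are assumptions for this example; only the path and the body fields come from the documentation above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// indexRequest mirrors the documented request body.
type indexRequest struct {
	Repo  string `json:"repo"`
	Owner string `json:"owner"`
}

// newIndexRequest checks the required fields client-side and builds the POST request.
// baseURL is an assumption; point it at wherever the service is listening.
func newIndexRequest(baseURL, owner, repo string) (*http.Request, error) {
	if owner == "" || repo == "" {
		return nil, errors.New("both owner and repo are required")
	}
	body, err := json.Marshal(indexRequest{Repo: repo, Owner: owner})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		baseURL+"/api/v1/index-github-repository", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newIndexRequest("http://localhost:8080", "golang", "go")
	if err != nil {
		fmt.Println("invalid request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```

Checking the required fields before sending avoids a round trip that would only end in a 400 response.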
GET /api/v1/fetch-github-repository
Fetch repository details and commit history from GitHub based on the specified repository name and owner. The results can be filtered and paginated using the provided parameters. Defaults to 10 commits per page.
The request body should be a JSON object with the following structure:
{
  "repo": "string",
  "owner": "string",
  "commit_filter": {
    "per_page": "uint16",
    "page": "uint16"
  }
}
- repo (string, required): The name of the repository to fetch.
- owner (string, required): The GitHub username or organization name that owns the repository.
- commit_filter (object, optional): A filter object to paginate the commit history.
  - per_page (uint16, optional): The number of commits to return per page. Defaults to 10 if not provided.
  - page (uint16, optional): The page number to retrieve. Defaults to the first page if not provided.
- Ensure that the repo and owner fields are both specified to avoid validation errors.
- Use commit_filter for efficient pagination, especially for repositories with a large number of commits.
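One way to picture how commit_filter maps onto a paginated query is the sketch below, which applies the documented defaults (10 per page, first page) and converts them into a limit and an offset. The `limitOffset` helper and the limit/offset translation are assumptions for illustration; the service may paginate differently.

```go
package main

import "fmt"

const defaultPerPage = 10 // documented default page size

// CommitFilter mirrors the optional commit_filter object.
type CommitFilter struct {
	PerPage uint16 `json:"per_page"`
	Page    uint16 `json:"page"`
}

// limitOffset applies the defaults and converts the filter into a
// SQL-style limit and offset. A nil filter means the first default page.
func limitOffset(f *CommitFilter) (limit, offset int) {
	perPage, page := defaultPerPage, 1
	if f != nil {
		if f.PerPage > 0 {
			perPage = int(f.PerPage)
		}
		if f.Page > 0 {
			page = int(f.Page)
		}
	}
	return perPage, (page - 1) * perPage
}

func main() {
	fmt.Println(limitOffset(nil))                               // 10 0
	fmt.Println(limitOffset(&CommitFilter{PerPage: 25, Page: 3})) // 25 50
}
```

Widening to int before multiplying avoids overflow when both uint16 values are large.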
GET /api/v1/top-commit-authors
Fetch the top commit authors by commit count. The limit parameter is the maximum number of rows returned and defaults to 10.
The request body should be a JSON object with the following structure:
{
  "limit": "uint16"
}
- limit (uint16, optional): The maximum number of authors to return. Defaults to 10 if not provided.
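To make the ranking semantics concrete, here is an in-memory sketch of "top N authors by commit count". The service reads this from the database, presumably with an aggregate query; the `topAuthors` function, the `AuthorCount` type, and the alphabetical tie-break are assumptions used only to illustrate the expected result shape.

```go
package main

import (
	"fmt"
	"sort"
)

// AuthorCount pairs an author with a number of commits.
type AuthorCount struct {
	Author  string
	Commits int
}

// topAuthors counts commits per author and returns at most limit entries,
// highest count first. Ties are broken alphabetically (an assumption).
func topAuthors(commitAuthors []string, limit int) []AuthorCount {
	if limit <= 0 {
		limit = 10 // documented default
	}
	counts := make(map[string]int)
	for _, a := range commitAuthors {
		counts[a]++
	}
	ranked := make([]AuthorCount, 0, len(counts))
	for a, n := range counts {
		ranked = append(ranked, AuthorCount{Author: a, Commits: n})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Commits != ranked[j].Commits {
			return ranked[i].Commits > ranked[j].Commits
		}
		return ranked[i].Author < ranked[j].Author
	})
	if len(ranked) > limit {
		ranked = ranked[:limit]
	}
	return ranked
}

func main() {
	// Hypothetical author names, one entry per commit.
	authors := []string{"ann", "bob", "ann", "cy", "bob", "ann"}
	for _, ac := range topAuthors(authors, 2) {
		fmt.Printf("%s: %d commits\n", ac.Author, ac.Commits)
	}
}
```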
DELETE /api/v1/truncate-commits-from
Truncates a repository's commits from a specific date and triggers commit indexing from that date.
The request body should be a JSON object with the following structure:
{
  "repo": "string",
  "owner": "string",
  "date": "date"
}
- date (date, required): The date to truncate the repository's commits from, e.g. "2024-05-01".
- repo (string, required): The name of the repository.
- owner (string, required): The GitHub username or organization name that owns the repository.
- Ensure that the repo and owner fields are both specified to avoid validation errors.
{
  "message": "string response",
  "status_code": "integer response code, e.g. 200, 400",
  "data": "object",
  "error": "string error message"
}
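A client might decode this envelope along the lines of the sketch below. It assumes status_code arrives as a JSON number and keeps data as raw JSON because its shape varies by endpoint; the `APIResponse` type, the `Err` helper, and the sample payload are illustrative and not taken from the service.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIResponse mirrors the documented response envelope.
type APIResponse struct {
	Message    string          `json:"message"`
	StatusCode int             `json:"status_code"` // assumed to be a JSON number
	Data       json.RawMessage `json:"data"`        // decoded later, per endpoint
	Error      string          `json:"error"`
}

// Err returns nil for successful responses and an error for codes of 400 and above.
func (r APIResponse) Err() error {
	if r.StatusCode < 400 {
		return nil
	}
	return fmt.Errorf("api error %d: %s", r.StatusCode, r.Error)
}

// decodeResponse parses a raw JSON body into the envelope.
func decodeResponse(body []byte) (APIResponse, error) {
	var r APIResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Sample payload invented for illustration, not captured from the service.
	body := []byte(`{"message":"not found","status_code":404,"data":null,"error":"repository not found"}`)
	r, err := decodeResponse(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	if err := r.Err(); err != nil {
		fmt.Println(err)
	}
}
```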
- 400: Bad Request.
- 404: Not Found.
- 500: Internal Server Error.
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
This project is licensed under the MIT License. See the LICENSE file for details.