A backend service written in Python that scrapes Google Cloud Skills Boost profiles to track skill badges and arcade game completions. The project leverages the Starlette framework for the backend API, along with a scheduler to periodically update profile data. Users can retrieve profile information through a RESTful API.
## Table of Contents

- Description
- Features
- Project Structure
- Requirements
- Setup
- Disclaimer
- Usage
- Contributing
- License
- Support
## Features

- Scrapes Google Cloud Skills Boost profiles.
- Extracts detailed profile information, including:
- Username
- League
- Membership duration
- Earned points
- Badges (with images and earned dates)
- Stores data in JSON format for easy access (a short reading sketch follows this list).
- Provides an API to fetch all profiles or a specific profile by its ID.
- Includes a scheduler to automatically update profiles at specified intervals.
- Logging system for monitoring server activity.
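As a quick reference, here is a minimal sketch of reading the stored data back, assuming `profiles_data.json` maps profile IDs to dictionaries with the fields listed above; the field names are illustrative, and the exact schema is defined by `scraper.py`:

```python
import json

# Load the scraped profile data produced by getData.py / the server startup.
with open("profiles_data.json", encoding="utf-8") as f:
    profiles = json.load(f)

# Assumed shape: {profile_id: {"name": ..., "league": ..., "points": ..., "badges": [...]}}
for profile_id, profile in profiles.items():
    print(profile_id, profile.get("name"), profile.get("points"))
```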
## Project Structure

```
gcsbtracker-backend/
├── getData.py              # Script to read CSV, scrape profiles, and save to JSON
├── scraper.py              # Contains the Scraper class for scraping profile data
├── server.py               # The Starlette server for the API
├── requirements.txt        # List of project dependencies
├── profiles_data.json      # Output JSON file for profile data (generated)
├── completed_queries.csv   # CSV file for storing resolved queries
├── templates/              # Directory containing HTML templates for rendering
│   ├── admin_dashboard.html   # Template for the admin dashboard
│   ├── admin_login.html       # Template for the admin login page
│   └── homepage.html          # Template for the homepage with query submission
└── static/                 # Directory for serving static files (e.g., CSS)
    ├── styles.css          # Stylesheet for the application
    ├── admin_login.css     # Stylesheet for the admin login
    └── admin_dashboard.css # Stylesheet for the admin dashboard
```
## What's New in v0.2

- **Admin Login:**
  - An admin login system has been implemented, requiring authentication to access the admin dashboard.
  - Admin ID: `admin@gcsb.makaut.in`
  - Admin Password: `admin6969`
- **Query Submission and Management:**
  - Users can submit queries through a form on the homepage.
  - Admins can view, resolve, and manage submitted queries via the admin dashboard.
  - Completed queries are logged and stored in a CSV file (`completed_queries.csv`) with timestamps for tracking purposes.
- **Scheduled Data Updates:**
  - The backend now includes a scheduler that automatically checks for updates and scrapes new data at specified intervals (every 30 minutes); see the sketch after this list.
- **Enhanced Logging:**
  - A logging system has been implemented for monitoring server activity and error tracking. Logs are saved in a `server.log` file for further analysis.
- **Profile Fetching:**
  - Users can fetch all profiles or a specific profile by its ID via RESTful API endpoints.
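Neither the scheduler nor the logging setup is documented in detail here, so the following is only a minimal sketch of how a 30-minute refresh loop that logs to `server.log` could be wired up with `asyncio`; the `refresh_profiles` coroutine is a hypothetical placeholder for the actual re-scraping step:

```python
import asyncio
import logging

# Mirror the behaviour described above: write logs to server.log.
logging.basicConfig(
    filename="server.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger(__name__)

async def refresh_profiles() -> None:
    # Hypothetical placeholder: re-scrape all profiles and rewrite profiles_data.json.
    logger.info("Refreshing profile data...")

async def run_scheduler(interval_seconds: int = 30 * 60) -> None:
    # Re-scrape on a fixed 30-minute cadence, logging any failures.
    while True:
        try:
            await refresh_profiles()
        except Exception:
            logger.exception("Scheduled update failed")
        await asyncio.sleep(interval_seconds)

if __name__ == "__main__":
    asyncio.run(run_scheduler())
```

In a Starlette app, the same loop could instead be started as a background task on server startup so it shares the event loop with the API.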
### Usage Notes

- **Admin Login:**
  - Navigate to `http://localhost:8000/admin/login` to log in as an admin.
  - Use the provided admin ID and password to access the dashboard.
- **Query Submission:**
  - Navigate to `http://localhost:8000/` to submit queries.
  - Queries submitted by users will be logged and can be managed in the admin dashboard.
- **Profile Fetching** (see the Starlette sketch after this list):
  - To fetch all profiles, run:
    ```bash
    curl http://localhost:8000/profiles
    ```
  - To fetch a specific profile by ID, run:
    ```bash
    curl http://localhost:8000/profiles/id/{profile_id}
    ```
- **Viewing Profiles in a Browser:**
  - Navigate to `http://localhost:8000/profiles/` to view all profiles.
  - Navigate to `http://localhost:8000/profiles/id/{id}` to view a specific profile by its ID.
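The real handlers live in `server.py`; purely as an illustration, endpoints with the shapes used above could be defined in Starlette roughly as follows (the `load_profiles` helper and the keyed-by-ID data layout are assumptions):

```python
import json

from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

def load_profiles() -> dict:
    # Read the data produced at startup or by the scheduler.
    with open("profiles_data.json", encoding="utf-8") as f:
        return json.load(f)

async def all_profiles(request):
    # GET /profiles -> every scraped profile.
    return JSONResponse(load_profiles())

async def profile_by_id(request):
    # GET /profiles/id/{profile_id} -> one profile, or a 404 error.
    profile_id = request.path_params["profile_id"]
    profile = load_profiles().get(profile_id)
    if profile is None:
        return JSONResponse({"error": "profile not found"}, status_code=404)
    return JSONResponse(profile)

app = Starlette(routes=[
    Route("/profiles", all_profiles),
    Route("/profiles/id/{profile_id}", profile_by_id),
])
```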
## Requirements

- Python 3.7 or higher
- Required libraries listed in `requirements.txt`
- A CSV file (`GCSJ_data.csv`) containing all the profile links, needed to run `getData.py`
- For scraping a single profile, use `scraper.py` (see the sketch below)
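The `Scraper` interface is not documented in this README, so the snippet below is only a hypothetical sketch of single-profile scraping; the constructor and `scrape` method names are assumptions, and the profile URL is a placeholder:

```python
from scraper import Scraper  # the Scraper class shipped in scraper.py

# Hypothetical usage: consult scraper.py for the real method names and
# return shape before relying on this.
scraper = Scraper()
profile = scraper.scrape(
    "https://www.cloudskillsboost.google/public_profiles/<profile-uuid>"
)
print(profile)
```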
To install the required libraries, run:

```bash
pip install -r requirements.txt
```
## Setup

- Clone the repository.
  ```bash
  git clone https://github.com/psypherion/gcsbtracker-backend.git
  ```
- Navigate to the cloned directory.
  ```bash
  cd gcsbtracker-backend
  ```
- Install the required libraries.
  ```bash
  pip install -r requirements.txt
  ```
- Run the server.
  ```bash
  uvicorn server:app
  ```
  or
  ```bash
  python server.py
  ```
With the v0.2 update there is no need to run `getData.py` manually to prepare the database. Running `server.py` automatically checks for the required files: if the database (`profiles_data.json`) is not present, it looks for `GCSJ_data.csv` and, if found, builds the database first and then hosts the server. A homepage has also been added for user friendliness.
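A condensed sketch of that startup check might look like this (running `getData.py` in a subprocess is illustrative; `server.py` may instead import it directly):

```python
import os
import subprocess
import sys

# Before serving: build the database if it does not exist yet.
if not os.path.exists("profiles_data.json"):
    if os.path.exists("GCSJ_data.csv"):
        # Build profiles_data.json from the CSV of profile links.
        subprocess.run([sys.executable, "getData.py"], check=True)
    else:
        sys.exit("GCSJ_data.csv not found; cannot build profiles_data.json")
```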
## Usage

To fetch all profiles, run:

```bash
curl http://localhost:8000/profiles
```

To fetch a specific profile by ID, run:

```bash
curl http://localhost:8000/profiles/id/{profile_id}
```
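The same endpoints can also be consumed from Python, for example with the `requests` library (the profile ID below is a placeholder):

```python
import requests

BASE_URL = "http://localhost:8000"

# Fetch all profiles.
resp = requests.get(f"{BASE_URL}/profiles")
resp.raise_for_status()
profiles = resp.json()

# Fetch a specific profile by ID (substitute a real profile ID).
resp = requests.get(f"{BASE_URL}/profiles/id/example-id")
print(resp.status_code, resp.json())
```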
You can also view the data in a browser:

- Navigate to `http://localhost:8000/profiles/` to view all profiles.
- Navigate to `http://localhost:8000/profiles/id/{id}` to view a specific profile by its ID.
## Contributing

If you'd like to contribute to the project, please follow these steps:

- Fork the repository on GitHub.
- Make your changes, including any necessary updates to README.md.
- Submit a pull request to the repository.
## License

This project is licensed under the BSD 2-Clause License.
## Support

If you have any questions or feedback, please open an issue or submit a pull request.