This module generates and manages data for the Arkaid project, a game performance analytics tool that uses a data warehouse approach and integrates data from multiple sources.
The Data Generation module handles the creation, transformation, and management of various gaming-related datasets including:
- Game data (Steam and Epic platforms)
- Player information
- Developer and publisher details
- Content creator and modder data
- Game statistics and performance metrics
- extract_steam_games.py: Extracts and processes Steam games data
- update_steam_games.py: Updates Steam games with additional metrics
- update_epic_games.py: Manages Epic games data updates
- update_epic_ids.py: Handles Epic game ID management
- generate_game_stats.py: Creates comprehensive game statistics
- generate_players.py: Creates player profiles and data
- generate_steam_players.py: Generates Steam-specific player data
- generate_epic_players.py: Creates Epic Games player profiles
- generate_player_stats.py: Generates player statistics and metrics
- generate_player_sales.py: Creates player purchase and sales data
- generate_content_creators.py: Generates content creator profiles
- generate_modders.py: Creates modder data and associations
- update_creators_modders.py: Updates content creator and modder information
- devs.py: Manages developer data generation
- pub.py: Handles publisher data creation
- filter_dev_pub.py: Filters and processes developer/publisher data
- move_recently_played.py: Manages recently played game data
- remove_library_columns.py: Handles library column management
- reorder_columns.py: Manages column ordering in datasets
- split_players_table.py: Splits player data into normalized tables
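To illustrate what splitting player data into normalized tables can look like, here is a minimal pandas sketch. It is not taken from split_players_table.py: the function `split_players` and the column names (`player_id`, `username`, `country`, `game_id`, `hours_played`) are hypothetical and would need to be adapted to the real schema.

```python
import pandas as pd


def split_players(wide: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split a denormalized player/library table into two normalized tables.

    All column names here are hypothetical placeholders.
    """
    # One row per player: attributes that depend only on player_id.
    players = (
        wide[["player_id", "username", "country"]]
        .drop_duplicates(subset="player_id")
        .reset_index(drop=True)
    )
    # One row per (player, game) pair: attributes of the ownership relation.
    library = wide[["player_id", "game_id", "hours_played"]].reset_index(drop=True)
    return players, library


# Tiny made-up input in which player attributes repeat on every library row.
wide = pd.DataFrame(
    {
        "player_id": [1, 1, 2],
        "username": ["ana", "ana", "bo"],
        "country": ["PT", "PT", "SE"],
        "game_id": [10, 20, 10],
        "hours_played": [5.0, 12.5, 3.0],
    }
)
players, library = split_players(wide)
print(players)
print(library)
```

After the split, player attributes are stored once per player, and the library table refers back to them through `player_id`.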
Each script can be run independently to generate or update specific datasets. The general workflow is:
- Generate base game data:
    python extract_steam_games.py
    python update_epic_games.py
- Create player profiles:
    python generate_players.py
    python generate_steam_players.py
    python generate_epic_players.py
- Generate additional data:
    python generate_game_stats.py
    python generate_content_creators.py
    python generate_modders.py
- Update and maintain data:
    python update_creators_modders.py
    python update_game_stats.py
    python update_player_ratings.py
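Because each script runs independently, the workflow above could be chained by a small wrapper. The sketch below is one possible way to do that and is not part of the module: `WORKFLOW`, `build_commands`, and `run_workflow` are hypothetical names, and the script order is copied from the list above. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Script order copied from the workflow list; the stage labels are illustrative.
WORKFLOW = {
    "base game data": ["extract_steam_games.py", "update_epic_games.py"],
    "player profiles": [
        "generate_players.py",
        "generate_steam_players.py",
        "generate_epic_players.py",
    ],
    "additional data": [
        "generate_game_stats.py",
        "generate_content_creators.py",
        "generate_modders.py",
    ],
    "update and maintain": [
        "update_creators_modders.py",
        "update_game_stats.py",
        "update_player_ratings.py",
    ],
}


def build_commands(workflow):
    """Flatten the workflow into (stage, argv) pairs, preserving order."""
    return [
        (stage, [sys.executable, script])
        for stage, scripts in workflow.items()
        for script in scripts
    ]


def run_workflow(workflow, dry_run=True):
    """Print each command; when dry_run is False, also execute it.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so the chain stops at the first failing script.
    """
    for stage, argv in build_commands(workflow):
        print(f"[{stage}] {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run: show what would be executed without running anything.
run_workflow(WORKFLOW)
```

Calling `run_workflow(WORKFLOW, dry_run=False)` from the directory that contains the scripts would execute them in order.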
The module handles various data types including:
- Text data for names, descriptions, and identifiers
- Numeric data for statistics and metrics
- Date/time data for temporal information
- Boolean flags for status indicators
- Arrays and complex data structures for related information
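As a rough illustration of how these data types might appear together in a single record, the sketch below defines a hypothetical `GameRecord` dataclass. The field names and values are made up for the example and do not describe the module's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GameRecord:
    """Hypothetical record; field names are illustrative, not the real schema."""

    game_id: str            # text: identifier
    title: str              # text: name
    review_count: int       # numeric: statistic
    price_usd: float        # numeric: metric
    release_date: date      # date/time: temporal information
    is_free_to_play: bool   # boolean: status flag
    genres: list[str] = field(default_factory=list)  # array of related values


game = GameRecord(
    game_id="game_0001",
    title="Example Game",
    review_count=1280,
    price_usd=19.99,
    release_date=date(2020, 5, 17),
    is_free_to_play=False,
    genres=["Action", "Indie"],
)
print(game)
```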
- Python 3.x
- pandas
- numpy
- faker
- tqdm
- psycopg2
- pycountry
- jellyfish
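Before running the scripts, it can help to confirm that the packages listed above are importable. The check below is a small sketch using only the standard library; `REQUIRED` and `missing_packages` are hypothetical names, and the entries are the import names of the packages in the list.

```python
from importlib.util import find_spec

# Import names of the third-party packages listed above.
REQUIRED = ["pandas", "numpy", "faker", "tqdm", "psycopg2", "pycountry", "jellyfish"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```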
- All generated data follows specific patterns and distributions to maintain realism
- Data generation takes into account relationships between different entities
- Scripts include error handling and validation to ensure data integrity
- Generated data is used to populate the data warehouse for analytics
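The first three notes can be made concrete with a short sketch. It is an illustration only and does not reproduce the module's generators: the identifiers, weights, and distribution parameters are invented, and it uses the standard library `random` module as a stand-in so that it runs without the packages listed above.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical parent table: game identifiers.
game_ids = [f"game_{i}" for i in range(5)]

# Hypothetical popularity weights: a few games attract most players.
weights = [50, 25, 15, 7, 3]


def generate_library_rows(n):
    """Create n child rows that each reference an existing game."""
    rows = []
    for player_id in range(n):
        rows.append(
            {
                "player_id": player_id,
                # Relationship: the foreign key is drawn from the parent table.
                "game_id": rng.choices(game_ids, weights=weights, k=1)[0],
                # Distribution: log-normal playtime gives a long right tail.
                "hours_played": round(rng.lognormvariate(2.0, 1.0), 1),
            }
        )
    return rows


def validate(rows):
    """Raise ValueError if a row breaks referential integrity or a range check."""
    known = set(game_ids)
    for row in rows:
        if row["game_id"] not in known:
            raise ValueError(f"unknown game_id: {row['game_id']}")
        if row["hours_played"] < 0:
            raise ValueError(f"negative playtime for player {row['player_id']}")


rows = generate_library_rows(1000)
validate(rows)
print(len(rows), "rows passed validation")
```

Drawing the foreign key from the parent table keeps every generated row consistent with the entities it refers to, and the validation step catches rows that would otherwise break the warehouse load.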