This Python script processes an XML file containing people's data, validates and cleans the data, categorizes individuals as "Adult" or "Child" based on age, generates a summary report, and optionally outputs a bar graph of the average age by city.

audreymhoughton/techchallenge

People Data XML Processing and Report Generation

This Python script processes an XML file of people's data: it validates and cleans the records, categorizes each individual as "Adult" or "Child" based on age, generates a summary report of those counts by city, and optionally outputs a bar graph of the average age by city.

Features

  • XML Parsing: Reads the XML file and extracts relevant data (name, id, date of birth, address, etc.).
  • Data Processing: Validates records, formats dates, infers missing country data from zip codes, and filters out invalid records [1].
  • Age Categorization: Categorizes individuals as "Adult" [2] or "Child" [3] based on their age.
  • Report Generation: Outputs a JSON file named age_categorized_by_city_<YYYYMMDD>.json that contains the number of adults and children for each city.
  • Graph Generation: Optionally generates a bar graph showing the average age by city [4] and saves it as a PNG file.
  • File Dating: Includes the current date (in the format YYYYMMDD) in output filenames.
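As a rough illustration of the record filtering described under Data Processing, the following is a minimal sketch rather than the script's actual code. It assumes each person has already been loaded into a dict; the helper names is_complete and drop_invalid are hypothetical.

```python
import math


def is_complete(record: dict) -> bool:
    """Return True only if no value in the record is None, blank, or NaN."""
    for value in record.values():
        if value is None:
            return False
        if isinstance(value, str) and not value.strip():
            return False
        if isinstance(value, float) and math.isnan(value):
            return False
    return True


def drop_invalid(records: list) -> list:
    """Keep only the records in which every field is populated."""
    return [record for record in records if is_complete(record)]


# Hypothetical sample data for illustration only.
sample = [
    {"name": "Ada", "dob": "1990-05-17", "city": "Portland"},
    {"name": "Ben", "dob": "", "city": "Portland"},
    {"name": "Cy", "dob": "2012-01-02", "city": float("nan")},
]
valid = drop_invalid(sample)
```

With this sample, only the first record survives, since the other two each contain a blank or NaN value.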

Requirements

This code was written with Python 3.12.8 and has only been tested on that version. Make sure the required dependencies are installed. You can install them by running:

pip install -r requirements.txt

Usage

To use the script, run it with the following arguments:

python script.py <input_file> <output_path> [--output_graph]

Arguments:

  • input_file:
    • Required. Path to the input XML file containing people data.
  • output_path:
    • Required. Path to the output directory where the JSON report and the graph (if requested) will be saved. The output JSON file will include the number of adults and children by city, and will be named age_categorized_by_city_<YYYYMMDD>.json.
  • --output_graph:
    • Optional flag. If specified, a bar graph of average age by city will be generated and saved as average_age_by_city_<YYYYMMDD>.png in the output directory.
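The argument list above maps naturally onto Python's standard argparse module. The sketch below shows one way such a command line could be declared; it is an assumption about the structure, not a copy of the script, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare two required positional arguments and one optional flag."""
    parser = argparse.ArgumentParser(
        description="Process people data from an XML file."
    )
    parser.add_argument("input_file", help="Path to the input XML file.")
    parser.add_argument(
        "output_path",
        help="Directory for the JSON report and the optional graph.",
    )
    parser.add_argument(
        "--output_graph",
        action="store_true",  # False unless the flag is present
        help="Also save a bar graph of average age by city.",
    )
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["people_data.xml", "./output/", "--output_graph"])
```

Because --output_graph uses action="store_true", omitting it leaves args.output_graph set to False.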

Example

python script.py people_data.xml ./output/

This will process the people_data.xml file and save the JSON report to ./output/age_categorized_by_city_<YYYYMMDD>.json.

To also generate the bar graph, run:

python script.py people_data.xml ./output/ --output_graph

This will also generate and save the bar graph as average_age_by_city_<YYYYMMDD>.png in the output directory.

Input File

The input is an XML file containing the people data. Each person should be wrapped in a <person> element, with details such as name, dob, zipcode, and address in their own child elements.
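To make the expected layout concrete, here is a minimal sketch that reads such a file with the standard library's xml.etree.ElementTree. The <people> root element, the sample values, the date format, and the parse_people helper are all assumptions for illustration; see example_data in the repository for the real input.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the root tag and field values are assumed.
SAMPLE_XML = """
<people>
  <person>
    <name>Ada Example</name>
    <dob>1990-05-17</dob>
    <zipcode>97201</zipcode>
    <address>1 Main St</address>
  </person>
  <person>
    <name>Ben Example</name>
    <dob></dob>
    <zipcode>10001</zipcode>
    <address>2 Side St</address>
  </person>
</people>
"""


def parse_people(xml_text: str) -> list:
    """Return one dict per <person>, mapping each child tag to its text."""
    root = ET.fromstring(xml_text.strip())
    people = []
    for person in root.iter("person"):
        # Empty elements have text None, so normalize them to "".
        people.append({child.tag: (child.text or "").strip() for child in person})
    return people


people = parse_people(SAMPLE_XML)
```

In this sketch the second person has an empty dob, which a later validation step would treat as an incomplete record.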

Outputs

  • JSON Report: age_categorized_by_city_<YYYYMMDD>.json – Contains the summary of the number of Adults and Children by city.
  • Bar Graph: average_age_by_city_<YYYYMMDD>.png (if --output_graph is provided) – A bar graph of the average age by city.
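The sketch below shows how the dated filenames and the per-city counts could be produced. The JSON layout (a city name mapped to "Adult" and "Child" counts) is an assumption, as are the helper names dated_name, summarize, and write_report and the "city" and "category" keys; compare the files under example_data for the actual format.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def dated_name(stem: str, suffix: str, today: Optional[date] = None) -> str:
    """Append the date as YYYYMMDD to a filename stem."""
    today = today or date.today()
    return f"{stem}_{today:%Y%m%d}{suffix}"


def summarize(people: list) -> dict:
    """Count adults and children per city (assumed report layout)."""
    counts = {}
    for person in people:
        city_counts = counts.setdefault(person["city"], {"Adult": 0, "Child": 0})
        city_counts[person["category"]] += 1
    return counts


def write_report(people: list, output_dir: str, today: Optional[date] = None) -> Path:
    """Write the summary as JSON into output_dir and return the file path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / dated_name("age_categorized_by_city", ".json", today)
    path.write_text(json.dumps(summarize(people), indent=2))
    return path
```

Passing an explicit date to dated_name keeps the filename reproducible, which is convenient when testing.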

Example Data

For your convenience, an example input file and the outputs generated from it are provided under example_data within this repository.

Definitions

[1] An invalid record is any person with a blank or NaN value in one of their fields; reports are generated only from people with complete data.
[2] Over the age of 18, as determined by their date of birth and today's date.
[3] Under the age of 18, as determined by their date of birth and today's date.
[4] Rounded to the nearest whole number for display on the bar graph.
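The age rules in these definitions can be sketched as follows. This is an illustration under stated assumptions, not the script's code: the definitions do not say how someone who is exactly 18 is treated, so the sketch assumes they count as an Adult, and the names age_on, categorize, and average_age_by_city are hypothetical.

```python
from datetime import date

ADULT_AGE = 18  # assumption: a person who has reached 18 counts as an Adult


def age_on(dob: date, today: date) -> int:
    """Age in whole years, minus one if this year's birthday is still ahead."""
    birthday_pending = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - int(birthday_pending)


def categorize(dob: date, today: date) -> str:
    """Label a person "Adult" or "Child" from their date of birth."""
    return "Adult" if age_on(dob, today) >= ADULT_AGE else "Child"


def average_age_by_city(rows: list, today: date) -> dict:
    """Average age per city from (city, dob) pairs, rounded to a whole number."""
    ages = {}
    for city, dob in rows:
        ages.setdefault(city, []).append(age_on(dob, today))
    # Note: Python's round() sends exact .5 cases to the nearest even integer.
    return {city: round(sum(vals) / len(vals)) for city, vals in ages.items()}
```

Taking today as a parameter, rather than calling date.today() inside each function, keeps the results reproducible.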
