This Python script processes an XML file containing people's data, validates and cleans the data, categorizes individuals by city as "Adult" or "Child" based on age, generates a summary report, and optionally outputs a bar graph of the average age by city.
- XML Parsing: Reads the XML file and extracts relevant data (name, id, date of birth, address, etc.).
- Data Processing: Validates records, formats dates, infers missing country data from zip codes, and filters out invalid records [1].
- Age Categorization: Categorizes individuals as "Adult" [2] or "Child" [3] based on their age.
- Report Generation: Outputs a JSON file named age_categorized_by_city_<YYYYMMDD>.json that contains the number of adults and children for each city.
- Graph Generation: Optionally generates a bar graph showing the average age by city [4] and saves it as a PNG file.
- File Dating: Outputs filenames with the current date (in the format YYYYMMDD).
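The age categorization step can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names (age_on, categorize, count_by_city) and the (city, date of birth) input shape are hypothetical, and treating someone who is exactly 18 as an "Adult" is an assumption, since the boundary case is not spelled out here.

```python
from collections import defaultdict
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Return the number of completed years between dob and today."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def categorize(age: int, adult_age: int = 18) -> str:
    # Assumption: exactly 18 counts as "Adult"; the real script may differ.
    return "Adult" if age >= adult_age else "Child"


def count_by_city(people, today: date) -> dict:
    """people is an iterable of (city, dob) pairs (hypothetical shape)."""
    counts = defaultdict(lambda: {"Adult": 0, "Child": 0})
    for city, dob in people:
        counts[city][categorize(age_on(dob, today))] += 1
    return dict(counts)
```

Passing today in explicitly, rather than calling date.today() inside the helpers, keeps the logic easy to test against fixed dates.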
This code was written with Python 3.12.8 and has been tested on that version only. Make sure the required dependencies are installed. You can install them by running:
pip install -r requirements.txt
To use the script, run it with the following arguments:
python script.py <input_file> <output_path> [--output_graph]
- input_file: Required. Path to the input XML file containing people data.
- output_path: Required. Path to the output directory where the JSON report and the graph (if requested) will be saved. The output JSON file will include the number of adults and children by city, and will be named age_categorized_by_city_<YYYYMMDD>.json.
- --output_graph: Optional flag. If specified, a bar graph of average age by city will be generated and saved as average_age_by_city_<YYYYMMDD>.png in the output directory.
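For readers curious how a command line like this is typically wired up, here is a minimal sketch using the standard-library argparse module. The argument names match the usage above; the build_parser helper, the description, and the help strings are illustrative assumptions, and the real script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors the documented arguments only.
    parser = argparse.ArgumentParser(
        description="Summarize people data by city."
    )
    parser.add_argument("input_file", help="Path to the input XML file.")
    parser.add_argument(
        "output_path", help="Directory for the JSON report and graph."
    )
    parser.add_argument(
        "--output_graph",
        action="store_true",  # flag takes no value; defaults to False
        help="Also save a bar graph of average age by city.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["people_data.xml", "./output/", "--output_graph"]
)
```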
python script.py people_data.xml ./output/
This will process the people_data.xml file and save the JSON report to ./output/age_categorized_by_city_<YYYYMMDD>.json.
python script.py people_data.xml ./output/ --output_graph
This will also generate and save the bar graph as average_age_by_city_<YYYYMMDD>.png in the output directory.
The input is the path to the XML file containing the people data. Each person should be wrapped in a <person> element, with details such as name, dob, zipcode, and address each in their own child element. The script reads this file as the source for all later processing.
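As a rough illustration of the expected structure, the sketch below parses a small inline sample with the standard-library xml.etree.ElementTree module. The <people> root element, the sample values, the date format, and the parse_people and complete_only helpers are all assumptions made for this example; only the <person> wrapper and the name, dob, zipcode, and address child elements come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the root tag and the values are made up.
SAMPLE = """<people>
  <person>
    <name>Ada Example</name>
    <dob>1990-05-17</dob>
    <zipcode>73301</zipcode>
    <address>1 Sample Street</address>
  </person>
  <person>
    <name>Bo Example</name>
    <dob></dob>
    <zipcode>02108</zipcode>
    <address>2 Sample Avenue</address>
  </person>
</people>"""

FIELDS = ("name", "dob", "zipcode", "address")


def parse_people(xml_text: str) -> list:
    """Return one dict per <person>, with missing fields as empty strings."""
    root = ET.fromstring(xml_text)
    return [
        {field: (person.findtext(field) or "").strip() for field in FIELDS}
        for person in root.iter("person")
    ]


def complete_only(records: list) -> list:
    # Keep only records in which every field is non-empty.
    return [record for record in records if all(record.values())]
```

Here the second sample person has an empty dob, so complete_only would drop that record, in the spirit of the invalid-record filtering described earlier.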
- JSON Report: age_categorized_by_city_<YYYYMMDD>.json. Contains a summary of the number of adults and children by city.
- Bar Graph: average_age_by_city_<YYYYMMDD>.png (if --output_graph is provided). A bar graph of the average age by city.
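The dated filenames can be produced along these lines. This is a sketch under assumptions: the dated_name and write_report helpers are hypothetical, and the exact layout of the JSON report (a mapping from city to adult and child counts) is a guess at the shape, not a documented schema.

```python
import json
from datetime import date
from pathlib import Path


def dated_name(stem: str, suffix: str, today: date) -> str:
    """Build names like age_categorized_by_city_20250309.json."""
    return f"{stem}_{today.strftime('%Y%m%d')}{suffix}"


def write_report(counts: dict, output_dir: str, today: date) -> Path:
    """Write counts as JSON into output_dir and return the file path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create directory if needed
    path = out_dir / dated_name("age_categorized_by_city", ".json", today)
    # Assumed report shape: {"City": {"Adult": n, "Child": m}, ...}
    path.write_text(json.dumps(counts, indent=2), encoding="utf-8")
    return path
```

In normal use, date.today() would be passed as the today argument so that each run stamps its output with the current date.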
For your convenience, an example input file and the outputs generated from it are provided under example_data within this repository.
[1] An invalid record is any person with a blank or NaN value in any field; reports are generated only from people with complete data.
[2] Over the age of 18, determined from their date of birth and today's date.
[3] Under the age of 18, determined from their date of birth and today's date.
[4] Rounded to the nearest whole number for the bar graph.