This tutorial covers how to generate a StreamSets Control Hub Data Delivery Report for a specific job, and then fetch and download it.
Data delivery reports provide data processing metrics for a given job or topology. For example, you can use reports to view the number of records that were processed by a job or topology the previous day.
Make sure to complete Prerequisites for the jobs tutorial.
The following were used while creating this tutorial:
- Python 3.6
- StreamSets SDK for Python 3.8.0
- StreamSets Data Collector 3.17.0 on all Data Collector instances
In Prerequisites for the jobs tutorial, a job named 'Job for Kirti-HelloWorld' was created. This tutorial shows how to:
- Create a report definition
- Generate a report
- Download that report
On a terminal, type the following command to open a Python 3 interpreter.
$ python3
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 26 2018, 19:50:54)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Let’s assume StreamSets Control Hub is running at http://sch.streamsets.com. Create an object called control_hub that is connected to it.
from streamsets.sdk import ControlHub
# Replace the argument values according to your setup
control_hub = ControlHub(server_url='http://sch.streamsets.com',
                         username='user@organization1',
                         password='password')
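Hard-coded credentials are fine for a quick interpreter session, but a script would usually read them from the environment. The following is a minimal sketch and not part of the SDK: the helper name `control_hub_kwargs` and the variable names `SCH_URL`, `SCH_USERNAME`, and `SCH_PASSWORD` are assumptions chosen for illustration.

```python
import os


def control_hub_kwargs(env=None):
    """Collect ControlHub constructor arguments from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    names = {'server_url': 'SCH_URL',
             'username': 'SCH_USERNAME',
             'password': 'SCH_PASSWORD'}
    # Fail early with a clear message if anything is unset or empty
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {arg: env[var] for arg, var in names.items()}


# Usage with the SDK (same keyword arguments as in the tutorial):
# control_hub = ControlHub(**control_hub_kwargs())
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.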
The following code shows how to create a report definition for the job named 'Job for Kirti-HelloWorld' using the StreamSets SDK for Python. Optionally, you can create the same report definition using the Control Hub UI in a browser.
# Get the specific job using job name
job = control_hub.jobs.get(job_name='Job for Kirti-HelloWorld')
# Create Report Definition
report_definition_builder = control_hub.get_report_definition_builder()
report_definition_builder.set_data_retrieval_period(start_time='${time:now() - 30 * MINUTES}',
                                                    end_time='${time:now()}')
# Specify the selected job as the report resource
report_definition_builder.add_report_resource(job)
report_definition = report_definition_builder.build(name='Kirti-HelloWorld-Report')
control_hub.add_report_definition(report_definition)
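Adding the report definition to Control Hub makes it available for report generation. The next step retrieves that definition, generates a report from it, and fetches the report once it is ready: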
# Retrieve the report definition; the name must match the one passed to build()
report_definition = control_hub.report_definitions.get(name='Kirti-HelloWorld-Report')
# Generate Report
report_command = report_definition.generate_report()
report_id = report_command.response['id']
# Wait for the report to be ready: poll every 5 seconds until it succeeds
import time
report = report_definition.reports.get(id=report_id)
while report.report_status != 'REPORT_SUCCESS':
    time.sleep(5)
    report = report_definition.reports.get(id=report_id)
# Fetch the report
report = report_definition.reports.get(id=report_id)
print(f'Fetched report = {report}')
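A polling loop like the one in this step can run indefinitely if the report never reaches the expected status. As a hedged sketch only, the helper below adds a timeout to the same idea. `wait_for_status` is a hypothetical function written for this tutorial, not an SDK call, and the default timeout and interval are arbitrary assumptions.

```python
import time


def wait_for_status(fetch_status, expected, timeout=300, interval=5, sleep=time.sleep):
    """Poll fetch_status() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected status was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if fetch_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Usage with the objects from this tutorial:
# ready = wait_for_status(
#     lambda: report_definition.reports.get(id=report_id).report_status,
#     'REPORT_SUCCESS')
```

Taking the status lookup as a callable keeps the helper independent of the SDK, so the same loop can be exercised with a stub before it is pointed at Control Hub.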
Once generation succeeds, the report is available from the report definition and can be downloaded as a PDF:
# Another way to get the report
report = report_definition.reports[0]
# Download the report PDF and write it to a file called report.pdf in the current directory
report_content = report.download()
with open('report.pdf', 'wb') as report_file:
    report_file.write(report_content)
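Before writing the file, it can be worth checking that the downloaded content really looks like a PDF. The sketch below assumes, as the binary write in this step suggests, that `report.download()` returns raw bytes. `save_report` is a hypothetical helper, not part of the SDK, and the check only looks for the standard `%PDF` signature at the start of the data.

```python
from pathlib import Path


def save_report(content, path='report.pdf'):
    """Write report bytes to `path` and return the number of bytes written."""
    if not isinstance(content, (bytes, bytearray)):
        raise TypeError('expected bytes, got ' + type(content).__name__)
    # PDF files begin with the ASCII signature "%PDF"
    if not content.startswith(b'%PDF'):
        raise ValueError('content does not start with the PDF signature')
    return Path(path).write_bytes(content)


# Usage with the report fetched in this tutorial:
# size = save_report(report.download(), 'report.pdf')
```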
To learn more about the SDK for Python, see the SDK documentation.
If you encounter any problems with this tutorial, please file an issue in the tutorials project.