diff --git a/CHANGELOG.md b/CHANGELOG.md
index 95a37cb9e8e..fe83bc5c1f1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,7 @@ All notable changes to this project will be documented in this file.
 - Add generating feature cli_report.log in output for otx training ()
 - Support multiple python versions up to 3.10 ()
 - Support export of onnx models ()
+- Add option to save images after inference in the OTX CLI demo and in the exportable code demo ()

 ### Enhancements

diff --git a/docs/source/guide/get_started/quick_start_guide/cli_commands.rst b/docs/source/guide/get_started/quick_start_guide/cli_commands.rst
index d2d9da0bd39..4c2dedd348d 100644
--- a/docs/source/guide/get_started/quick_start_guide/cli_commands.rst
+++ b/docs/source/guide/get_started/quick_start_guide/cli_commands.rst
@@ -474,7 +474,7 @@ Demonstration
 .. code-block::

     (otx) ...$ otx demo --help
-    usage: otx demo [-h] -i INPUT --load-weights LOAD_WEIGHTS [--fit-to-size FIT_TO_SIZE FIT_TO_SIZE] [--loop] [--delay DELAY] [--display-perf] [template] {params} ...
+    usage: otx demo [-h] -i INPUT --load-weights LOAD_WEIGHTS [--fit-to-size FIT_TO_SIZE FIT_TO_SIZE] [--loop] [--delay DELAY] [--display-perf] [--output OUTPUT] [template] {params} ...

     positional arguments:
       template              Enter the path or ID or name of the template file.
@@ -493,7 +493,8 @@ Demonstration
       --loop                Enable reading the input in a loop.
       --delay DELAY         Frame visualization time in ms.
       --display-perf        This option enables writing performance metrics on displayed frame. These metrics take into account not only model inference time, but also frame reading, pre-processing and post-processing.
-
+      --output OUTPUT
+                            Output path to save input data with predictions.

 Command example of the demonstration:

diff --git a/docs/source/guide/tutorials/base/demo.rst b/docs/source/guide/tutorials/base/demo.rst
index 735cc664515..ad700d52fae 100644
--- a/docs/source/guide/tutorials/base/demo.rst
+++ b/docs/source/guide/tutorials/base/demo.rst
@@ -34,9 +34,14 @@ But if we'll provide a single image the demo processes and renders it quickly, t
     (demo) ...$ otx demo --input docs/utils/images/wgisd_dataset_sample.jpg \
                          --load-weights outputs/weights.pth --loop

-In this case, you can stop the demo by killing the process in the terminal (``Ctrl+C`` for Linux).
+In this case, you can stop the demo by pressing the ``Q`` button or by killing the process in the terminal (``Ctrl+C`` for Linux).

-3. In WGISD dataset we have high-resolution images,
+3. If we want to pass a folder of images, it's better to specify the ``--delay`` parameter, which defines how many milliseconds to pause between showing consecutive images.
+For example, ``--delay 100`` will make this pause 100 ms (0.1 s).
+If you want to skip showing the resulting images and instead only see the number of predictions and the inference time per image, specify ``--delay 0``.
+
+
+4. In WGISD dataset we have high-resolution images,
 so the ``--fit-to-size`` parameter would be quite useful. It resizes the resulting image to a specified:

 .. code-block::
@@ -44,11 +49,17 @@ so the ``--fit-to-size`` parameter would be quite useful. It resizes the resulti
     (demo) ...$ otx demo --input docs/utils/images/wgisd_dataset_sample.jpg \
                          --load-weights outputs/weights.pth --loop --fit-to-size 800 600

-4. If we want to pass an images folder, it's better to specify the delay parameter, that defines, how much millisecond pause will be held between showing the next image.
-For example ``--delay 100`` will make this pause 0.1 ms.
+5. To save inference results with predictions on them, we can specify the folder path using ``--output``.
+It works for images, videos, image folders and web cameras. To prevent issues, do not specify it together with the ``--loop`` parameter.

-5. If we want to show inference speed right on images,
+.. code-block::
+
+    (demo) ...$ otx demo --input docs/utils/images/wgisd_dataset_sample.jpg \
+                         --load-weights outputs/weights.pth \
+                         --output resulted_images
+
+6. If we want to show inference speed right on images,
 we can run the following line:

 .. code-block::
@@ -57,12 +68,6 @@ we can run the following line:
     --load-weights outputs/weights.pth --loop \
     --fit-to-size 800 600 --display-perf

-.. The result will look like this:
-
-.. .. image:: ../../../../utils/images/wgisd_pr_sample.jpg
-..    :width: 600
-..    :alt: this image shows the inference results with inference time on the WGISD dataset
-.. image to be generated and added

 6. To run a demo on a web camera, you need to know its ID.
 You can check a list of camera devices by running the command line below on Linux system:
diff --git a/docs/source/guide/tutorials/base/deploy.rst b/docs/source/guide/tutorials/base/deploy.rst
index dfba9bda762..9a4d4ddc5ed 100644
--- a/docs/source/guide/tutorials/base/deploy.rst
+++ b/docs/source/guide/tutorials/base/deploy.rst
@@ -100,11 +100,20 @@ For example, the model inference on image from WGISD dataset, which we used for
 If you provide a single image as input, the demo processes and renders it quickly, then exits. To continuously visualize inference results on the screen, apply the ``loop`` option, which enforces processing a single image in a loop.
-In this case, you can stop the demo by killing the process in the terminal (``Ctrl+C`` for Linux).
+In this case, you can stop the demo by pressing the ``Q`` button or by killing the process in the terminal (``Ctrl+C`` for Linux).

 To learn how to run the demo on Windows and MacOS, please refer to the ``outputs/deploy/python/README.md`` file in exportable code.

-4. To run a demo on a web camera, we need to know its ID.
+4. To save inference results with predictions on them, we can specify the folder path using ``--output``.
+It works for images, videos, image folders and web cameras. To prevent issues, do not specify it together with the ``--loop`` parameter.
+
+.. code-block::
+
+    (demo) ...$ python outputs/deploy/python/demo.py --input docs/utils/images/wgisd_dataset_sample.jpg \
+                                                     --models outputs/deploy/model \
+                                                     --output resulted_images
+
+5. To run a demo on a web camera, we need to know its ID.
 We can check a list of camera devices by running this command line on Linux system:

 .. code-block::
@@ -121,7 +130,7 @@ The output will look like this:

 After that, we can use this ``/dev/video0`` as a camera ID for ``--input``.

-5. We can also change ``config.json`` that specifies the confidence threshold and
+6. We can also change ``config.json`` that specifies the confidence threshold and
 color for each class visualization, but any changes should be made with caution.
 For example, in our image of the winery we see, that a lot of objects weren't detected.
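As a side note on the ``--delay`` semantics documented above: the value is handed straight to OpenCV's ``waitKey`` (see the ``otx/cli/tools/demo.py`` changes further down in this diff), so it is measured in milliseconds. The following is a minimal, self-contained sketch of that display loop with dummy frames standing in for real predictions; all names here are illustrative and not part of the PR.

```python
import cv2
import numpy as np

ESC_BUTTON = 27
delay = 100  # --delay 100 -> roughly a 100 ms (0.1 s) pause between frames

# Dummy gray frames stand in for frames coming out of the inference loop.
frames = [np.full((480, 640, 3), value, dtype=np.uint8) for value in (60, 120, 180)]

for index, frame in enumerate(frames):
    if delay > 0:
        # Show the frame and wait `delay` milliseconds for a key press.
        cv2.imshow("frame", frame)
        if cv2.waitKey(delay) == ESC_BUTTON:
            break
    else:
        # --delay 0 skips visualization and only reports per-frame information.
        print(f"frame {index}: prediction count and elapsed time would be printed here")
```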
diff --git a/otx/api/usecases/exportable_code/demo/README.md b/otx/api/usecases/exportable_code/demo/README.md
index 7de5c35382e..f4f716f3a7d 100644
--- a/otx/api/usecases/exportable_code/demo/README.md
+++ b/otx/api/usecases/exportable_code/demo/README.md
@@ -85,44 +85,71 @@ Exportable code is a .zip archive that contains simple demo to get and visualize

 ## Usecase

-Running the `demo.py` application with the `-h` option yields the following usage message:
-
-```bash
-usage: demo.py [-h] -i INPUT -m MODELS [MODELS ...] [-it {sync,async}] [-l]
-
-Options:
-  -h, --help            Show this help message and exit.
-  -i INPUT, --input INPUT
-                        Required. An input to process. The input must be a
-                        single image, a folder of images, video file or camera
-                        id.
-  -m MODELS [MODELS ...], --models MODELS [MODELS ...]
-                        Required. Path to directory with trained model and
-                        configuration file. If you provide several models you
-                        will start the task chain pipeline with the provided
-                        models in the order in which they were specified
-  -it {sync,async}, --inference_type {sync,async}
-                        Optional. Type of inference for single model
-  -l, --loop            Optional. Enable reading the input in a loop.
-  --no_show
-                        Optional. If this flag is specified, the demo
-                        won't show the inference results on UI.
-```
-
-As a model, you can use path to model directory from generated zip. So you can use the following command to do inference with a pre-trained model:
-
-```bash
-python3 demo.py \
-  -i <path_to_video>/inputVideo.mp4 \
-  -m <path_to_model_directory>
-```
-
-You can press `Q` to stop inference during demo running.
-
-> **NOTE**: If you provide a single image as an input, the demo processes and renders it quickly, then exits. To continuously
-> visualize inference results on the screen, apply the `loop` option, which enforces processing a single image in a loop.
->
-> **NOTE**: Default configuration contains info about pre- and post processing for inference and is guaranteed to be correct.
-> Also you can change `config.json` that specifies needed parameters, but any changes should be made with caution.
+1. Running the `demo.py` application with the `-h` option yields the following usage message:
+
+   ```bash
+   usage: demo.py [-h] -i INPUT -m MODELS [MODELS ...] [-it {sync,async}] [-l] [--no_show] [-d {CPU,GPU}] [--output OUTPUT]
+
+   Options:
+     -h, --help            Show this help message and exit.
+     -i INPUT, --input INPUT
+                           Required. An input to process. The input must be a single image, a folder of images, video file or camera id.
+     -m MODELS [MODELS ...], --models MODELS [MODELS ...]
+                           Required. Path to directory with trained model and configuration file. If you provide several models you will start the task chain pipeline with the provided models in the order in
+                           which they were specified.
+     -it {sync,async}, --inference_type {sync,async}
+                           Optional. Type of inference for single model.
+     -l, --loop            Optional. Enable reading the input in a loop.
+     --no_show             Optional. Disables showing inference results on UI.
+     -d {CPU,GPU}, --device {CPU,GPU}
+                           Optional. Device to infer the model.
+     --output OUTPUT       Optional. Output path to save input data with predictions.
+   ```
+
+2. As a `model`, you can use the path to the model directory from the generated zip. As `input`, you can pass a single image, a folder of images, a video file, or a web camera id. So you can use the following command to do inference with a pre-trained model:
+
+   ```bash
+   python3 demo.py \
+     -i <path_to_video>/inputVideo.mp4 \
+     -m <path_to_model_directory>
+   ```
+
+   You can press `Q` to stop inference while the demo is running.
+
+   > **NOTE**: If you provide a single image as input, the demo processes and renders it quickly, then exits. To continuously
+   > visualize inference results on the screen, apply the `--loop` option, which enforces processing a single image in a loop.
+   > In this case, you can stop the demo by pressing the `Q` button or by killing the process in the terminal (`Ctrl+C` for Linux).
+   >
+   > **NOTE**: Default configuration contains info about pre- and post-processing for inference and is guaranteed to be correct.
+   > Also you can change `config.json` that specifies the confidence threshold and color for each class visualization, but any
+   > changes should be made with caution.
+
+3. To save inference results with predictions on them, you can specify the folder path using `--output`.
+   It works for images, videos, image folders and web cameras. To prevent issues, do not specify it together with the `--loop` parameter.
+
+   ```bash
+   python3 demo.py \
+      --input <path_to_image>/inputImage.jpg \
+      --models ../model \
+      --output resulted_images
+   ```
+
+4. To run a demo on a web camera, you need to know its ID.
+   You can check a list of camera devices by running this command line on Linux system:
+
+   ```bash
+   sudo apt-get install v4l-utils
+   v4l2-ctl --list-devices
+   ```
+
+   The output will look like this:
+
+   ```bash
+   Integrated Camera (usb-0000:00:1a.0-1.6):
+       /dev/video0
+   ```
+
+   After that, you can use this `/dev/video0` as a camera ID for `--input`.

 ## Troubleshooting
diff --git a/otx/api/usecases/exportable_code/demo/demo.py b/otx/api/usecases/exportable_code/demo/demo.py
index a7bda9e8938..54fba44f634 100644
--- a/otx/api/usecases/exportable_code/demo/demo.py
+++ b/otx/api/usecases/exportable_code/demo/demo.py
@@ -74,6 +74,12 @@ def build_argparser():
         default="CPU",
         type=str,
     )
+    args.add_argument(
+        "--output",
+        default=None,
+        type=str,
+        help="Optional. 
Output path to save input data with predictions.", + ) return parser @@ -96,6 +102,10 @@ def get_inferencer_class(type_inference, models): def main(): """Main function that is used to run demo.""" args = build_argparser().parse_args() + + if args.loop and args.output: + raise ValueError("--loop and --output cannot be both specified") + # create models models = [] for model_dir in args.models: @@ -105,7 +115,7 @@ def main(): inferencer = get_inferencer_class(args.inference_type, models) # create visualizer - visualizer = create_visualizer(models[-1].task_type, no_show=args.no_show) + visualizer = create_visualizer(models[-1].task_type, no_show=args.no_show, output=args.output) if len(models) == 1: models = models[0] diff --git a/otx/api/usecases/exportable_code/demo/demo_package/executors/asynchronous.py b/otx/api/usecases/exportable_code/demo/demo_package/executors/asynchronous.py index 0d935ef1815..852f7519b5f 100644 --- a/otx/api/usecases/exportable_code/demo/demo_package/executors/asynchronous.py +++ b/otx/api/usecases/exportable_code/demo/demo_package/executors/asynchronous.py @@ -16,6 +16,7 @@ ) from otx.api.usecases.exportable_code.streamer import get_streamer from otx.api.usecases.exportable_code.visualizers import Visualizer +from otx.cli.tools.utils.demo.visualization import dump_frames class AsyncExecutor: @@ -38,6 +39,7 @@ def run(self, input_stream: Union[int, str], loop: bool = False) -> None: next_frame_id = 0 next_frame_id_to_show = 0 stop_visualization = False + saved_frames = [] for frame in streamer: results = self.async_pipeline.get_result(next_frame_id_to_show) @@ -45,6 +47,8 @@ def run(self, input_stream: Union[int, str], loop: bool = False) -> None: output = self.render_result(results) next_frame_id_to_show += 1 self.visualizer.show(output) + if self.visualizer.output: + saved_frames.append(frame) if self.visualizer.is_quit(): stop_visualization = True results = self.async_pipeline.get_result(next_frame_id_to_show) @@ -57,6 +61,7 @@ def run(self, input_stream: Union[int, str], loop: bool = False) -> None: results = self.async_pipeline.get_result(next_frame_id_to_show) output = self.render_result(results) self.visualizer.show(output) + dump_frames(saved_frames, self.visualizer.output, input_stream, streamer) def render_result(self, results: Tuple[Any, dict]) -> np.ndarray: """Render for results of inference.""" diff --git a/otx/api/usecases/exportable_code/demo/demo_package/executors/sync_pipeline.py b/otx/api/usecases/exportable_code/demo/demo_package/executors/sync_pipeline.py index eae537837ff..49f8cb8840b 100644 --- a/otx/api/usecases/exportable_code/demo/demo_package/executors/sync_pipeline.py +++ b/otx/api/usecases/exportable_code/demo/demo_package/executors/sync_pipeline.py @@ -22,6 +22,7 @@ from otx.api.usecases.exportable_code.streamer import get_streamer from otx.api.usecases.exportable_code.visualizers import Visualizer from otx.api.utils.shape_factory import ShapeFactory +from otx.cli.tools.utils.demo.visualization import dump_frames class ChainExecutor: @@ -78,11 +79,16 @@ def crop( def run(self, input_stream: Union[int, str], loop: bool = False) -> None: """Run demo using input stream (image, video stream, camera).""" streamer = get_streamer(input_stream, loop) + saved_frames = [] for frame in streamer: # getting result for single image annotation_scene = self.single_run(frame) output = self.visualizer.draw(frame, annotation_scene, {}) self.visualizer.show(output) + if self.visualizer.output: + saved_frames.append(frame) if self.visualizer.is_quit(): 
break + + dump_frames(saved_frames, self.visualizer.output, input_stream, streamer) diff --git a/otx/api/usecases/exportable_code/demo/demo_package/executors/synchronous.py b/otx/api/usecases/exportable_code/demo/demo_package/executors/synchronous.py index 95184aff48f..74ab3387311 100644 --- a/otx/api/usecases/exportable_code/demo/demo_package/executors/synchronous.py +++ b/otx/api/usecases/exportable_code/demo/demo_package/executors/synchronous.py @@ -13,6 +13,7 @@ ) from otx.api.usecases.exportable_code.streamer import get_streamer from otx.api.usecases.exportable_code.visualizers import Visualizer +from otx.cli.tools.utils.demo.visualization import dump_frames class SyncExecutor: @@ -31,6 +32,7 @@ def __init__(self, model: ModelContainer, visualizer: Visualizer) -> None: def run(self, input_stream: Union[int, str], loop: bool = False) -> None: """Run demo using input stream (image, video stream, camera).""" streamer = get_streamer(input_stream, loop) + saved_frames = [] for frame in streamer: # getting result include preprocessing, infer, postprocessing for sync infer @@ -38,5 +40,9 @@ def run(self, input_stream: Union[int, str], loop: bool = False) -> None: annotation_scene = self.converter.convert_to_annotation(predictions, frame_meta) output = self.visualizer.draw(frame, annotation_scene, frame_meta) self.visualizer.show(output) + if self.visualizer.output: + saved_frames.append(frame) if self.visualizer.is_quit(): break + + dump_frames(saved_frames, self.visualizer.output, input_stream, streamer) diff --git a/otx/api/usecases/exportable_code/demo/demo_package/utils.py b/otx/api/usecases/exportable_code/demo/demo_package/utils.py index 663aafd6f19..752f3552d9f 100644 --- a/otx/api/usecases/exportable_code/demo/demo_package/utils.py +++ b/otx/api/usecases/exportable_code/demo/demo_package/utils.py @@ -47,9 +47,9 @@ def create_output_converter(task_type: TaskType, labels: LabelSchemaEntity): return create_converter(converter_type, labels) -def create_visualizer(_task_type: TaskType, no_show: bool = False): +def create_visualizer(_task_type: TaskType, no_show: bool = False, output: Optional[str] = None): """Create visualizer according to kind of task.""" # TODO: use anomaly-specific visualizer for anomaly tasks - return Visualizer(window_name="Result", no_show=no_show) + return Visualizer(window_name="Result", no_show=no_show, output=output) diff --git a/otx/api/usecases/exportable_code/demo/requirements.txt b/otx/api/usecases/exportable_code/demo/requirements.txt index f0bba0497d7..903eb7b6480 100644 --- a/otx/api/usecases/exportable_code/demo/requirements.txt +++ b/otx/api/usecases/exportable_code/demo/requirements.txt @@ -1,4 +1,4 @@ openvino==2022.3.0 openmodelzoo-modelapi==2022.3.0 -otx @ git+https://github.com/openvinotoolkit/training_extensions/@dd03235da2319815227f1b75bce298ee6e8b0f31#egg=otx +otx @ git+https://github.com/openvinotoolkit/training_extensions/@8c11c3d42c726e6e0eda7364f00cf8ed4dbdc2e9#egg=otx numpy>=1.21.0,<=1.23.5 # np.bool was removed in 1.24.0 which was used in openvino runtime diff --git a/otx/api/usecases/exportable_code/streamer/streamer.py b/otx/api/usecases/exportable_code/streamer/streamer.py index a31b1652877..641cec68c6e 100644 --- a/otx/api/usecases/exportable_code/streamer/streamer.py +++ b/otx/api/usecases/exportable_code/streamer/streamer.py @@ -164,6 +164,10 @@ def __iter__(self) -> Iterator[np.ndarray]: else: break + def fps(self): + """Returns a frequency of getting images from source.""" + return self.cap.get(cv2.CAP_PROP_FPS) + def 
get_type(self) -> MediaType: """Returns the type of media.""" return MediaType.VIDEO diff --git a/otx/api/usecases/exportable_code/visualizers/visualizer.py b/otx/api/usecases/exportable_code/visualizers/visualizer.py index 390a0d0d48a..ed79a4a0ca8 100644 --- a/otx/api/usecases/exportable_code/visualizers/visualizer.py +++ b/otx/api/usecases/exportable_code/visualizers/visualizer.py @@ -66,6 +66,7 @@ def __init__( is_one_label: bool = False, no_show: bool = False, delay: Optional[int] = None, + output: Optional[str] = None, ) -> None: self.window_name = "Window" if window_name is None else window_name self.shape_drawer = ShapeDrawer(show_count, is_one_label) @@ -74,6 +75,7 @@ def __init__( self.no_show = no_show if delay is None: self.delay = 1 + self.output = output def draw( self, diff --git a/otx/cli/tools/demo.py b/otx/cli/tools/demo.py index ea6547322e8..c1cdc59fb4d 100644 --- a/otx/cli/tools/demo.py +++ b/otx/cli/tools/demo.py @@ -27,7 +27,7 @@ from otx.api.entities.task_environment import TaskEnvironment from otx.cli.manager import ConfigManager from otx.cli.tools.utils.demo.images_capture import open_images_capture -from otx.cli.tools.utils.demo.visualization import draw_predictions, put_text_on_rect_bg +from otx.cli.tools.utils.demo.visualization import draw_predictions, dump_frames, put_text_on_rect_bg from otx.cli.utils.importing import get_impl_class from otx.cli.utils.io import read_label_schema, read_model from otx.cli.utils.parser import ( @@ -71,6 +71,12 @@ def get_args(): "These metrics take into account not only model inference time, but also " "frame reading, pre-processing and post-processing.", ) + parser.add_argument( + "--output", + default=None, + type=str, + help="Output path to save input data with predictions.", + ) add_hyper_parameters_sub_parser(parser, hyper_parameters, modes=("INFERENCE",)) override_param = [f"params.{param[2:].split('=')[0]}" for param in params if param.startswith("--")] @@ -106,6 +112,9 @@ def main(): # Dynamically create an argument parser based on override parameters. args, override_param = get_args() + if args.loop and args.output: + raise ValueError("--loop and --output cannot be both specified") + config_manager = ConfigManager(args, mode="demo") # Auto-Configuration for model template config_manager.configure_template() @@ -136,7 +145,7 @@ def main(): capture = open_images_capture(args.input, args.loop) elapsed_times = deque(maxlen=10) - frame_index = 0 + saved_frames = [] while True: frame = capture.read() if frame is None: @@ -155,12 +164,17 @@ def main(): color=(255, 255, 255), ) - if args.delay >= 0: + if args.delay > 0: cv2.imshow("frame", frame) if cv2.waitKey(args.delay) == ESC_BUTTON: break else: - print(f"{frame_index=}, {elapsed_time=}, {len(predictions)=}") + print(f"Frame: {elapsed_time=}, {len(predictions)=}") + + if args.output: + saved_frames.append(frame) + + dump_frames(saved_frames, args.output, args.input, capture) return dict(retcode=0, template=template.name) diff --git a/otx/cli/tools/utils/demo/visualization.py b/otx/cli/tools/utils/demo/visualization.py index 09edd4abfe2..4392fc7de04 100644 --- a/otx/cli/tools/utils/demo/visualization.py +++ b/otx/cli/tools/utils/demo/visualization.py @@ -15,18 +15,22 @@ # and limitations under the License. 
-from typing import List, Tuple
+from pathlib import Path
+from typing import List, Tuple, Union
 from warnings import warn

 import cv2
 import numpy as np
 from cv2 import Mat

+from otx.algorithms.common.utils.logger import get_logger
 from otx.api.entities.annotation import Annotation
 from otx.api.entities.model_template import TaskType
 from otx.api.entities.shapes.polygon import Polygon
 from otx.api.entities.shapes.rectangle import Rectangle

+logger = get_logger()
+

 def put_text_on_rect_bg(frame: Mat, message: str, position: Tuple[int, int], color=(255, 255, 0)):
     """Puts a text message on a black rectangular aread in specified position of a frame."""
@@ -161,3 +165,46 @@ def draw_predictions(task_type: TaskType, predictions: List[Annotation], frame:
     else:
         raise ValueError(f"Unknown task type: {task_type}")
     return frame
+
+
+def get_input_names_list(input_path: Union[str, int], capture):
+    """Lists the filenames of all inputs for demo."""
+
+    # Web camera input
+    if isinstance(input_path, int):
+        return []
+    if "DIR" in capture.get_type():
+        return [f.name for f in Path(input_path).iterdir() if f.is_file()]
+    else:
+        return [Path(input_path).name]
+
+
+def dump_frames(saved_frames: list, output: str, input_path: Union[str, int], capture):
+    """Saves images/videos with predictions from saved_frames to output folder with proper names."""
+
+    if not saved_frames:
+        return
+
+    output_path = Path(output)
+    if not output_path.exists():
+        output_path.mkdir(parents=True)
+
+    filenames = get_input_names_list(input_path, capture)
+
+    if "VIDEO" in capture.get_type():
+        filename = filenames[0]
+        h, w, _ = saved_frames[0].shape
+        video_path = str(output_path / filename)
+        codec = cv2.VideoWriter_fourcc(*"mp4v")
+        out = cv2.VideoWriter(video_path, codec, capture.fps(), (w, h))
+        for frame in saved_frames:
+            out.write(frame)
+        out.release()
+        logger.info(f"Video was saved to {video_path}")
+    else:
+        if len(filenames) < len(saved_frames):
+            filenames = [f"output_{i}.jpg" for i, _ in enumerate(saved_frames)]
+        for filename, frame in zip(filenames, saved_frames):
+            image_path = str(output_path / filename)
+            cv2.imwrite(image_path, frame)
+            logger.info(f"Image was saved to {image_path}")
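For reference, a minimal usage sketch of the new `dump_frames` helper, wired together the same way as the CLI demo changes above. The input path and output folder reuse the examples from the tutorial docs, and the inference/drawing step is elided; treat this as an illustration rather than part of the PR.

```python
# Minimal sketch of the frame-saving flow added in this PR; inference and
# prediction drawing are omitted, and the paths are the tutorial examples.
from otx.cli.tools.utils.demo.images_capture import open_images_capture
from otx.cli.tools.utils.demo.visualization import dump_frames

input_path = "docs/utils/images/wgisd_dataset_sample.jpg"  # image, folder, video file or camera id
capture = open_images_capture(input_path, False)  # loop must stay disabled when saving

saved_frames = []
while True:
    frame = capture.read()
    if frame is None:
        break
    # ... run inference and draw predictions on `frame` here ...
    saved_frames.append(frame)

# Writes the collected frames into ./resulted_images: individual images for
# image/folder/camera inputs, or a single video file (reusing the input file
# name) for video inputs.
dump_frames(saved_frames, "resulted_images", input_path, capture)
```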