Merge pull request #1104 from roboflow/letterbox_pr_docs_improvements

docs improvements related to #1028 PR
roboflow · Apr 9, 2024 · 10843c9 · 10843c9
2 parents 15a4555 + eb1c771
commit 10843c9
Show file tree

Hide file tree

Showing 11 changed files with 638 additions and 179 deletions.
diff --git a/docs/how_to/detect_and_annotate.md b/docs/how_to/detect_and_annotate.md
@@ -15,7 +15,7 @@ source image.
 
 ![basic-annotation](https://media.roboflow.com/supervision_detect_and_annotate_example_1.png)
 
-## Run Inference
+## Run Detection
 
 First, you'll need to obtain predictions from your object detection or segmentation
 model.

diff --git a/docs/how_to/save_detections.md b/docs/how_to/save_detections.md
@@ -0,0 +1,299 @@
+---
+comments: true
+status: new
+---
+
+# Save Detections
+
+Supervision enables an easy way to save detections in .CSV and .JSON files for offline
+processing. This guide demonstrates how to perform video inference using the
+[Inference](https://github.com/roboflow/inference),
+[Ultralytics](https://github.com/ultralytics/ultralytics) or
+[Transformers](https://github.com/huggingface/transformers) packages and save their results with
+[`sv.CSVSink`](/latest/detection/tools/save_detections/#supervision.detection.tools.csv_sink.CSVSink) and
+[`sv.JSONSink`](/latest/detection/tools/save_detections/#supervision.detection.tools.csv_sink.JSONSink).
+
+## Run Detection
+
+First, you'll need to obtain predictions from your object detection or segmentation
+model. You can learn more on this topic in our
+[How to Detect and Annotate](/latest/how_to/detect_and_annotate.md) guide.
+
+=== "Inference"
+
+    ```python
+    import supervision as sv
+    from inference import get_model
+
+    model = get_model(model_id="yolov8n-640")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    for frame in frames_generator:
+
+        results = model.infer(image)[0]
+        detections = sv.Detections.from_inference(results)
+    ```
+
+=== "Ultralytics"
+
+    ```python
+    import supervision as sv
+    from ultralytics import YOLO
+
+    model = YOLO("yolov8n.pt")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    for frame in frames_generator:
+
+        results = model(frame)[0]
+        detections = sv.Detections.from_ultralytics(results)
+    ```
+
+=== "Transformers"
+
+    ```python
+    import torch
+    import supervision as sv
+    from transformers import DetrImageProcessor, DetrForObjectDetection
+
+    processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
+    model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    for frame in frames_generator:
+
+        frame = sv.cv2_to_pillow(frame)
+        inputs = processor(images=frame, return_tensors="pt")
+
+        with torch.no_grad():
+            outputs = model(**inputs)
+
+        width, height = frame.size
+        target_size = torch.tensor([[height, width]])
+        results = processor.post_process_object_detection(
+            outputs=outputs, target_sizes=target_size)[0]
+        detections = sv.Detections.from_transformers(results)
+    ```
+
+## Save Detections as CSV
+
+To save detections to a `.CSV` file, open our
+[`sv.CSVSink`](/latest/detection/tools/save_detections/#supervision.detection.tools.csv_sink.CSVSink)
+and then pass the
+[`sv.Detections`](/latest/detection/core/#supervision.detection.core.Detections)
+object resulting from the inference to it. Its fields are parsed and saved on disk.
+
+=== "Inference"
+
+    ```{ .py hl_lines="7 12" }
+    import supervision as sv
+    from inference import get_model
+
+    model = get_model(model_id="yolov8n-640")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.CSVSink(<TARGET_CSV_PATH>) as sink:
+        for frame in frames_generator:
+
+            results = model.infer(image)[0]
+            detections = sv.Detections.from_inference(results)
+            sink.append(detections, {})
+    ```
+
+=== "Ultralytics"
+
+    ```{ .py hl_lines="7 12" }
+    import supervision as sv
+    from ultralytics import YOLO
+
+    model = YOLO("yolov8n.pt")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.CSVSink(<TARGET_CSV_PATH>) as sink:
+        for frame in frames_generator:
+
+            results = model(frame)[0]
+            detections = sv.Detections.from_ultralytics(results)
+            sink.append(detections, {})
+    ```
+
+=== "Transformers"
+
+    ```{ .py hl_lines="9 23" }
+    import torch
+    import supervision as sv
+    from transformers import DetrImageProcessor, DetrForObjectDetection
+
+    processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
+    model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.CSVSink(<TARGET_CSV_PATH>) as sink:
+        for frame in frames_generator:
+
+            frame = sv.cv2_to_pillow(frame)
+            inputs = processor(images=frame, return_tensors="pt")
+
+            with torch.no_grad():
+                outputs = model(**inputs)
+
+            width, height = frame.size
+            target_size = torch.tensor([[height, width]])
+            results = processor.post_process_object_detection(
+                outputs=outputs, target_sizes=target_size)[0]
+            detections = sv.Detections.from_transformers(results)
+            sink.append(detections, {})
+    ```
+
+| x_min   | y_min    | x_max   | y_max    | class_id | confidence | tracker_id | class_name |
+|---------|----------|---------|----------|----------|------------|------------|------------|
+| 2941.14 | 1269.31  | 3220.77 | 1500.67  | 2        | 0.8517     |            | car        |
+| 944.889 | 899.641  | 1235.42 | 1308.80  | 7        | 0.6752     |            | truck      |
+| 1439.78 | 1077.79  | 1621.27 | 1231.40  | 2        | 0.6450     |            | car        |
+
+## Custom Fields
+
+Besides regular fields in
+[`sv.Detections`](/latest/detection/core/#supervision.detection.core.Detections),
+[`sv.CSVSink`](/latest/detection/tools/save_detections/#supervision.detection.tools.csv_sink.CSVSink)
+also allows you to add custom information to each row, which can be passed via the
+`custom_data` dictionary. Let's utilize this feature to save information about the
+frame index from which the detections originate.
+
+=== "Inference"
+
+    ```{ .py hl_lines="8 12" }
+    import supervision as sv
+    from inference import get_model
+
+    model = get_model(model_id="yolov8n-640")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.CSVSink(<TARGET_CSV_PATH>) as sink:
+        for frame_index, frame in enumerate(frames_generator):
+
+            results = model.infer(image)[0]
+            detections = sv.Detections.from_inference(results)
+            sink.append(detections, {"frame_index": frame_index})
+    ```
+
+=== "Ultralytics"
+
+    ```{ .py hl_lines="8 12" }
+    import supervision as sv
+    from ultralytics import YOLO
+
+    model = YOLO("yolov8n.pt")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.CSVSink(<TARGET_CSV_PATH>) as sink:
+        for frame_index, frame in enumerate(frames_generator):
+
+            results = model(frame)[0]
+            detections = sv.Detections.from_ultralytics(results)
+            sink.append(detections, {"frame_index": frame_index})
+    ```
+
+=== "Transformers"
+
+    ```{ .py hl_lines="10 23" }
+    import torch
+    import supervision as sv
+    from transformers import DetrImageProcessor, DetrForObjectDetection
+
+    processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
+    model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.CSVSink(<TARGET_CSV_PATH>) as sink:
+        for frame_index, frame in enumerate(frames_generator):
+
+            frame = sv.cv2_to_pillow(frame)
+            inputs = processor(images=frame, return_tensors="pt")
+
+            with torch.no_grad():
+                outputs = model(**inputs)
+
+            width, height = frame.size
+            target_size = torch.tensor([[height, width]])
+            results = processor.post_process_object_detection(
+                outputs=outputs, target_sizes=target_size)[0]
+            detections = sv.Detections.from_transformers(results)
+            sink.append(detections, {"frame_index": frame_index})
+    ```
+
+| x_min   | y_min    | x_max   | y_max    | class_id | confidence | tracker_id | class_name | frame_index |
+|---------|----------|---------|----------|----------|------------|------------|------------|-------------|
+| 2941.14 | 1269.31  | 3220.77 | 1500.67  | 2        | 0.8517     |            | car        | 0           |
+| 944.889 | 899.641  | 1235.42 | 1308.80  | 7        | 0.6752     |            | truck      | 0           |
+| 1439.78 | 1077.79  | 1621.27 | 1231.40  | 2        | 0.6450     |            | car        | 0           |
+
+## Save Detections as JSON
+
+If you prefer to save the result in a `.JSON` file instead of a `.CSV` file, all you
+need to do is replace
+[`sv.CSVSink`](/latest/detection/tools/save_detections/#supervision.detection.tools.csv_sink.CSVSink)
+with
+[`sv.JSONSink`](/latest/detection/tools/save_detections/#supervision.detection.tools.csv_sink.JSONSink).
+
+=== "Inference"
+
+    ```{ .py hl_lines="7" }
+    import supervision as sv
+    from inference import get_model
+
+    model = get_model(model_id="yolov8n-640")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.JSONSink(<TARGET_CSV_PATH>) as sink:
+        for frame_index, frame in enumerate(frames_generator):
+
+            results = model.infer(image)[0]
+            detections = sv.Detections.from_inference(results)
+            sink.append(detections, {"frame_index": frame_index})
+    ```
+
+=== "Ultralytics"
+
+    ```{ .py hl_lines="7" }
+    import supervision as sv
+    from ultralytics import YOLO
+
+    model = YOLO("yolov8n.pt")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.JSONSink(<TARGET_CSV_PATH>) as sink:
+        for frame_index, frame in enumerate(frames_generator):
+
+            results = model(frame)[0]
+            detections = sv.Detections.from_ultralytics(results)
+            sink.append(detections, {"frame_index": frame_index})
+    ```
+
+=== "Transformers"
+
+    ```{ .py hl_lines="9" }
+    import torch
+    import supervision as sv
+    from transformers import DetrImageProcessor, DetrForObjectDetection
+
+    processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
+    model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
+    frames_generator = sv.get_video_frames_generator(<SOURCE_VIDEO_PATH>)
+
+    with sv.JSONSink(<TARGET_CSV_PATH>) as sink:
+        for frame_index, frame in enumerate(frames_generator):
+
+            frame = sv.cv2_to_pillow(frame)
+            inputs = processor(images=frame, return_tensors="pt")
+
+            with torch.no_grad():
+                outputs = model(**inputs)
+
+            width, height = frame.size
+            target_size = torch.tensor([[height, width]])
+            results = processor.post_process_object_detection(
+                outputs=outputs, target_sizes=target_size)[0]
+            detections = sv.Detections.from_transformers(results)
+            sink.append(detections, {"frame_index": frame_index})
+    ```
diff --git a/docs/utils/image.md b/docs/utils/image.md
@@ -6,16 +6,22 @@ status: new
 # Image Utils
 
 <div class="md-typeset">
-  <h2>ImageSink</h2>
+  <h2>crop_image</h2>
 </div>
 
-:::supervision.utils.image.ImageSink
+:::supervision.utils.image.crop_image
 
 <div class="md-typeset">
-  <h2>crop_image</h2>
+  <h2>scale_image</h2>
 </div>
 
-:::supervision.utils.image.crop_image
+:::supervision.utils.image.scale_image
+
+<div class="md-typeset">
+  <h2>resize_image</h2>
+</div>
+
+:::supervision.utils.image.resize_image
 
 <div class="md-typeset">
   <h2>letterbox_image</h2>
@@ -24,13 +30,13 @@ status: new
 :::supervision.utils.image.letterbox_image
 
 <div class="md-typeset">
-  <h2>resize_image</h2>
+  <h2>overlay_image</h2>
 </div>
 
-:::supervision.utils.image.resize_image
+:::supervision.utils.image.overlay_image
 
 <div class="md-typeset">
-  <h2>place_image</h2>
+  <h2>ImageSink</h2>
 </div>
 
-:::supervision.utils.image.place_image
+:::supervision.utils.image.ImageSink
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -38,9 +38,11 @@ nav:
   - Home: index.md
   - How to:
       - Detect and Annotate: how_to/detect_and_annotate.md
+      - Save Detections: how_to/save_detections.md
+      - Filter Detections: how_to/filter_detections.md
       - Detect Small Objects: how_to/detect_small_objects.md
       - Track Objects: how_to/track_objects.md
-      - Filter Detections: how_to/filter_detections.md
+
   - API:
     - Annotators: annotators.md
     - Classifications:

diff --git a/supervision/__init__.py b/supervision/__init__.py
@@ -71,15 +71,16 @@
 from supervision.geometry.utils import get_polygon_center
 from supervision.metrics.detection import ConfusionMatrix, MeanAveragePrecision
 from supervision.tracker.byte_tracker.core import ByteTrack
+from supervision.utils.conversion import cv2_to_pillow, pillow_to_cv2
 from supervision.utils.file import list_files_with_extensions
 from supervision.utils.image import (
     ImageSink,
     create_tiles,
     crop_image,
     letterbox_image,
-    place_image,
+    overlay_image,
     resize_image,
-    resize_image_keeping_aspect_ratio,
+    scale_image,
 )
 from supervision.utils.notebook import plot_image, plot_images_grid
 from supervision.utils.video import (