
Commit

Improved Formatting
jazibdawre committed Jul 24, 2020
1 parent e230a0a commit 56e151e
Showing 2 changed files with 20 additions and 34 deletions.
4 changes: 2 additions & 2 deletions DatasetCreator.py
@@ -358,12 +358,12 @@ def clean_image(target_folder):
     if settings["clean_images"]:
         try:

-            print(" Starting Image cleaner GUI.\n Credits to Guillaume Erhard, github.com/GuillaumeErhard/ImageSetCleaner")
+            print(" Starting Image cleaner GUI.\n Credits to Guillaume Erhard, https://github.com/GuillaumeErhard/ImageSetCleaner")
             p = subprocess.run(["python", "image_set_cleaner.py", f"--image_dir={os.path.join('..', target_folder)}"], cwd="ImageSetCleaner", stdout = subprocess.PIPE, stderr = subprocess.PIPE, universal_newlines=True)

             if p.returncode != 0:
                 if p.stderr.split('\n')[-2] == "AssertionError: No outlier detected in the directory.":
-                    print("\n [INFO] No outliers detected")
+                    print("\n No outliers detected!")
                 else:
                     print("\n [WARN] Image cleaner exited with error")
             else:
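
Review note: the check in this hunk reads `p.stderr.split('\n')[-2]`, which depends on stderr ending with exactly one trailing newline. A minimal sketch of a more tolerant check follows. The helper name `classify_cleaner_exit` and its return labels are hypothetical and not part of this commit; only the assertion message is taken from the diff.

```python
# Hypothetical helper, not part of the commit: classify the result of the
# ImageSetCleaner subprocess from its return code and captured stderr.
NO_OUTLIER_MSG = "AssertionError: No outlier detected in the directory."


def classify_cleaner_exit(returncode: int, stderr: str) -> str:
    """Return 'ok', 'no_outliers', or 'error' for a finished cleaner run."""
    if returncode == 0:
        return "ok"
    # Drop blank lines so a missing or extra trailing newline does not matter.
    lines = [line.strip() for line in stderr.splitlines() if line.strip()]
    if lines and lines[-1] == NO_OUTLIER_MSG:
        return "no_outliers"
    return "error"
```

With this shape, the caller would pass `p.returncode` and `p.stderr` from the `subprocess.run` call above and print the matching message for each label.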
50 changes: 18 additions & 32 deletions README.md
@@ -53,39 +53,25 @@ On the prompt for search term, enter the word/sentence as you would in a search

## Settings
The settings can be changed via the settings.json file
-```
-"no_img" The number of images to download (approximately)
-"target_url" This is the base url
-"stealth" When enabled the script will identify as the user-agent defined in the settings dictionary
-"user_agent" We pretend to be this in stealth mode. Use any valid UA string you like
-"image_dimension" Dimension of images in dataset if "resize_images" is True
-"image_distribution" Ratio of images in train/valid/test set. ex: "70/15/15"
-"driver" Path to the webdriver for the browser. ex: driver/geckodriver.exe for firefox
-"logging" Enable logging of events and errors in log/run and log/err
-
-"download_images" Whether to download images via browser.
-"remove_duplicate" Delete duplicate images by phash algorithm
-"clean_images" Use ImageSetCleaner by Guillaume Erhard to filter out bad images. (optional)
-"resize_images" resize images to image_dimension*image_dimension pixels
-"mirror_images" mirror every image in the dataset. (optional)
-"move_images" distribute images in train/valid/test folder based on image_distribution value
-"rename_images" rename images as 'first search term_(image_no)'.
-"label_images" label images using labelImg by Tzutalin. (optional)
-```
+|Setting |Description |
+|----------------------|------------------------------------------------------------------------------------------|
+|no_img |The number of images to download (approximately) |
+|target_url |This is the base url |
+|stealth |Spoof the user-agent as defined in the settings dictionary |
+|user_agent |UA to be used in stealth mode. Use any valid UA string you like |
+|image_dimension |Dimension of images in dataset if "resize_images" is True |
+|image_distribution |Ratio of images in train/valid/test set. ex: "70/15/15" |
+|driver |Path to the webdriver for the browser. ex: driver/geckodriver.exe for firefox |
+|logging |Enable logging of events and errors in log/run and log/err |
+|download_images |Whether to download images via browser. |
+|remove_duplicate |Delete duplicate images by phash algorithm |
+|clean_images |Use ImageSetCleaner by Guillaume Erhard to filter out bad images. (optional) |
+|resize_images |resize images to image_dimension*image_dimension pixels |
+|mirror_images |mirror every image in the dataset. (optional) |
+|move_images |distribute images in train/valid/test folder based on image_distribution value |
+|rename_images |rename images as 'first search term_(image_no)'. |
+|label_images |label images using labelImg by Tzutalin. (optional) |

## Possible changes
1. If you require images to be less than 300px, you can use Beautiful Soup instead of Selenium for much faster execution. You need to change the code in the 'fetch_img_urls' function.
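
Review note: the `image_distribution` setting in the table above is described only by the example "70/15/15". As a rough illustration of how such a ratio string could be turned into per-folder image counts, here is a minimal sketch. The function name `split_counts` and the rounding policy (floor for train and valid, remainder to test) are assumptions; the diff does not show how DatasetCreator.py actually performs the split.

```python
from typing import Tuple


def split_counts(total: int, distribution: str) -> Tuple[int, int, int]:
    """Hypothetical helper: map a "train/valid/test" ratio string to counts."""
    parts = [int(p) for p in distribution.split("/")]
    if len(parts) != 3 or any(p < 0 for p in parts) or sum(parts) == 0:
        raise ValueError(f"expected three non-negative ratios, got {distribution!r}")
    whole = sum(parts)
    # Floor the first two shares; the test set absorbs the rounding remainder
    # so that the three counts always add up to `total`.
    train = total * parts[0] // whole
    valid = total * parts[1] // whole
    test = total - train - valid
    return train, valid, test
```

For example, `split_counts(100, "70/15/15")` gives 70 training, 15 validation, and 15 test images.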
