Photo import script #85

Merged
merged 210 commits into from
Jul 8, 2020

Conversation

mvoundy
Contributor

@mvoundy mvoundy commented Jul 8, 2020

Geo-paysages FTP client (#79)

Script that fetches site images from FTP servers.

Installation

From the folder GeoPaysages/geopaysagesftpclient (and inside your virtual env)

For production

pip install .

For development

pip install -e ".[dev]"

Development mode is required to run Sphinx and pytest.

Once installation completes, the fetchsiteimages script should be available in your virtual environment.

Running the script

  1. Set up a configuration file, for example config.ini (see the Configuration section below)
  2. Run the fetchsiteimages script, passing it your config file
fetchsiteimages config.ini

Configuration

  1. Copy the template config file config.ini.tpl into config.ini
  2. Fill in the configuration file

A sample configuration file:

[main]
outputdir = /home/mv/workspace/data/images
sites =
    glacier_blanc_lateral
    another_site

sqlalchemy.url = postgres://mv:$$$$@localhost/geopaysages

[sites.glacier_blanc_lateral]
host = ftp.ecrins-parcnational.fr
port = 3921
user = photo
password = $$$$$$$
inputpattern = test/glacierblanc/lateral/{Y}{M}{D}/\w+{ext}
outputpattern = {site}_{Y}-{M}-{D}{ext}
resize = 800, 600
save_in_db = true
copyright_notice = ©Observateurs des glaciers de France

[sites.another_site]
host = ftp.public.fr
port = 21
user = 
password =
inputpattern = images/{Y}{M}{D}/.+{ext}
outputpattern = siteimages/{site}/{Y}-{M}-{D}/{filename}
resize =
save_in_db = false
copyright_notice = 

Configuration options

| Option | Description |
| --- | --- |
| outputdir | Directory in which the script stores the images. |
| sites | List of sites to fetch images for. If save_in_db is set to true for a site, that site must already exist in the database. |
| sqlalchemy.url | Connection string for the database. |
| [sites.site/host] | FTP connection host for the site. |
| [sites.site/port] | Optional. FTP connection port for the site. Defaults to 21. |
| [sites.site/user] | Optional. FTP connection user for non-anonymous connections. |
| [sites.site/password] | Optional. FTP connection password for non-anonymous connections. |
| inputpattern | Python-regex-like pattern describing which files to fetch and how to parse information from their paths. |
| outputpattern | Python-regex-like pattern describing how to name the fetched files from the parsed information. |
| resize | (width, height) value for image resizing. If not provided, the images are not resized. |
| save_in_db | Boolean indicating whether to register the images in the database. |
| copyright_notice | Optional. Copyright notice to add to the IPTC data of the retrieved files. Ignored if the fetched image already has a copyright notice. |
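
To make the options above concrete, here is a minimal sketch of how such a file could be read with Python's standard configparser module. Whether the script itself uses configparser is an assumption, as are the helper name load_sites and the placeholder host names; only a few of the options are handled.

```python
import configparser

# Trimmed-down sample with placeholder hosts (not real servers).
SAMPLE = """
[main]
outputdir = /tmp/images
sites =
    glacier_blanc_lateral
    another_site

[sites.glacier_blanc_lateral]
host = ftp.example.org
port = 3921
resize = 800, 600
save_in_db = true

[sites.another_site]
host = ftp.example.net
resize =
save_in_db = false
"""


def load_sites(text):
    """Return {site_name: settings} for every site listed under [main]."""
    # Interpolation is disabled so that a % in a password is kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    sites = {}
    # The multi-line "sites" value is split on whitespace, one name per line.
    for name in parser["main"]["sites"].split():
        section = parser["sites." + name]
        resize = section.get("resize", fallback="").strip()
        sites[name] = {
            "host": section["host"],
            # port is optional and defaults to 21
            "port": section.getint("port", fallback=21),
            # an empty resize value means "do not resize"
            "resize": tuple(int(v) for v in resize.split(",")) if resize else None,
            "save_in_db": section.getboolean("save_in_db", fallback=False),
        }
    return sites


if __name__ == "__main__":
    for site, settings in load_sites(SAMPLE).items():
        print(site, settings)
```

Representing an empty resize value as None keeps the "no resizing" case explicit for whatever code consumes the settings.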

Patterns

The inputpattern option specifies which files to fetch from the FTP server and how to parse information from their paths. The outputpattern option specifies how to name the retrieved files using the parsed information.

Example

Using the following configuration for a site, say, glacierblanc_lateral

[main]
outputdir = home/mv/Pictures
sites = 
	glacierblanc_lateral
...

[sites.glacierblanc_lateral]
inputpattern = images/\w+/{Y}-{M}-{D}/\w+{ext}
outputpattern = retrieved_images/{site}/{Y}_{M}_{D}{ext}
...

A file located at images/testsite/2020-10-08/img10001.JPG on the FTP server will match the inputpattern with the following matchdict

{
    "Y": 2020,
    "M": 10,
    "D": 08,
    "ext": ".JPG",
    "site": "glacierblanc_lateral",
    "filename": "img10001.JPG",
    "path": "images/testsite/2020-10-08/img10001.JPG"
}

and will be saved at home/mv/Pictures/retrieved_images/glacierblanc_lateral/2020_10_08.JPG
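
The matching and renaming steps in this example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script's actual code: the BUILTINS table, the function names and the regex fragments chosen for each placeholder are hypothetical. The sketch keeps the parsed values as strings so that zero-padding such as 08 survives.

```python
import posixpath
import re

# Assumed regex fragments for the built-in placeholders (hypothetical).
BUILTINS = {
    "Y": r"(?P<Y>\d{4})",
    "M": r"(?P<M>\d{2})",
    "D": r"(?P<D>\d{2})",
    "ext": r"(?P<ext>(?i:\.(?:jpe?g|gif|png|bmp)))",
}


def compile_input(pattern):
    """Replace each known {name} placeholder with a named group.

    Everything else in the pattern is left as ordinary regex syntax.
    """
    def replace(found):
        return BUILTINS.get(found.group(1), found.group(0))
    return re.compile(re.sub(r"\{([A-Za-z_]\w*)\}", replace, pattern))


def match_path(pattern, path, site):
    """Return the matchdict for path, or None when it does not match."""
    found = compile_input(pattern).fullmatch(path)
    if found is None:
        return None
    return dict(
        found.groupdict(),
        site=site,
        path=path,
        filename=posixpath.basename(path),
    )


def render_output(pattern, matchdict):
    """Fill the outputpattern placeholders from the matchdict."""
    return pattern.format(**matchdict)


if __name__ == "__main__":
    matchdict = match_path(
        r"images/\w+/{Y}-{M}-{D}/\w+{ext}",
        "images/testsite/2020-10-08/img10001.JPG",
        "glacierblanc_lateral",
    )
    print(render_output("retrieved_images/{site}/{Y}_{M}_{D}{ext}", matchdict))
```

Joining the rendered name to outputdir (for example with os.path.join) would then give the final location on disk.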

Built-in patterns

In addition to Python regular expressions, you can use the script's built-in expressions.

| Expression | Description |
| --- | --- |
| {Y} | Matches 4 digits for the year |
| {M} | Matches 2 digits for the month |
| {D} | Matches 2 digits for the day |
| {filename} | outputpattern only. Name of the file retrieved from the server |
| {path} | outputpattern only. Path of the file retrieved from the server |
| {ext} | Case-insensitive. Matches .jpg, .jpeg, .gif, .png, .bmp |
| {site} | Matches the current site name |

Defining custom match groups

You can define a custom match group using the syntax {exp:name}, where exp is a regular expression and name is the key under which the match is stored.

Example

Using the following configuration

inputpattern = images/{\w+:author}/{Y}-{M}-{D}/\w+{ext}
outputpattern = retrieved_images/{site}/{Y}_{M}_{D}_{author}{ext}

a file located at images/Carl/2020-10-08/image10001.jpg will match the inputpattern with the matchdict

{
    "author": "Carl",
    "Y": 2020,
    "M": 10,
    "D": 08,
    "ext": ".jpg",
    "site": "glacierblanc_lateral",
    "filename": "image10001.jpg",
    "path": "images/Carl/2020-10-08/image10001.jpg"
}

and will be saved to retrieved_images/glacierblanc_lateral/2020_10_08_Carl.jpg.
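
One way to picture the {exp:name} syntax is as shorthand for a Python named group, (?P<name>exp). The sketch below performs that rewrite. The names PLACEHOLDER and expand_custom_groups are hypothetical, and the sketch assumes the expression contains no braces or colons, which the real script may or may not require.

```python
import re

# Matches {name} or {expression:name}.
# Assumption: the expression itself contains no braces or colons.
PLACEHOLDER = re.compile(r"\{(?:(?P<exp>[^{}:]+):)?(?P<name>[A-Za-z_]\w*)\}")


def expand_custom_groups(pattern):
    """Rewrite every {exp:name} as the named group (?P<name>exp)."""
    def replace(found):
        if found.group("exp") is None:
            # Plain {name}: left untouched for the built-in handling.
            return found.group(0)
        return "(?P<{}>{})".format(found.group("name"), found.group("exp"))
    return PLACEHOLDER.sub(replace, pattern)


if __name__ == "__main__":
    regex = expand_custom_groups(r"images/{\w+:author}/\w+\.jpg")
    print(regex)
    print(re.fullmatch(regex, "images/Carl/image10001.jpg").group("author"))
```

Because the captured value ends up in the match's group dictionary under the chosen key, it can be used in the outputpattern like any built-in placeholder, as {author} is in the example above.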

Testing (dev installation mode only)

  1. Copy the file pytest.ini.tpl to pytest.ini
  2. Configure the pytest.ini file (the file must be named pytest.ini)
  3. Run pytest -s -v from a terminal

HamoudaAmine and others added 30 commits December 3, 2018 10:23
@mvoundy mvoundy added the enhancement New feature or request label Jul 8, 2020
@mvoundy mvoundy requested a review from HamoudaAmine July 8, 2020 09:25
@camillemonchicourt
Member

Thank you very much for this well-documented work.
We still need to work out how to test it.
But is it expected that the PR includes many other changes unrelated to the FTP import?
Presumably because it targets the dev branch, which was well behind master?
So should it target master instead? Or should dev be brought up to date before this PR?

Thanks.

@HamoudaAmine HamoudaAmine merged commit cad6500 into dev Jul 8, 2020
@mvoundy
Contributor Author

mvoundy commented Jul 8, 2020

Indeed, the dev branch was out of date, and I created the ftpclient branch before noticing. So I had to pull from master.
I think dev should be brought up to date. Do you want me to take care of it?

To test, you could first configure it so that it saves nothing to the database and check that:

  • the files are retrieved and named according to the inputpattern and outputpattern you specify
  • the files are resized to the size you specify
  • the EXIF and IPTC data are preserved
  • the specified copyright is added to files whose original had none; otherwise the original copyright is kept.

Then, try targeting very few files for import via the inputpattern and check that everything is properly saved to the database.

Most of these specs should normally be checked automatically by the tests.
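
As a rough illustration of the first check in the list (file naming), a small helper could list the files in the output directory whose names do not follow the expected scheme. This is only a sketch: the function name unexpected_files is hypothetical, and it hard-codes the {site}_{Y}-{M}-{D}{ext} scheme from the sample configuration rather than reading the real outputpattern.

```python
import re
import tempfile
from pathlib import Path


def unexpected_files(outputdir, site):
    """Return the names of files that do not look like {site}_{Y}-{M}-{D}{ext}."""
    expected = re.compile(
        re.escape(site) + r"_\d{4}-\d{2}-\d{2}\.(?:jpe?g|gif|png|bmp)",
        re.IGNORECASE,
    )
    return sorted(
        entry.name
        for entry in Path(outputdir).iterdir()
        if entry.is_file() and not expected.fullmatch(entry.name)
    )


if __name__ == "__main__":
    # Demonstration on a throwaway directory with one good and one bad name.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "glacier_blanc_lateral_2020-10-08.JPG").touch()
        Path(tmp, "img10001.JPG").touch()
        print(unexpected_files(tmp, "glacier_blanc_lateral"))
```

An empty result after a run with save_in_db = false would suggest the renaming step behaved as configured; the resize and metadata checks would need an imaging library and are not covered here.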

@camillemonchicourt
Member

Yes, it's good that the dev branch is up to date.
But since it all went in together, the work in this PR is harder to read because it is mixed with many other things. Not a big deal, though.
And now, comparing the dev and master branches does show the work for this feature: https://github.com/PnX-SI/GeoPaysages/compare/dev
I'll let you decide whether to open a PR from dev to master or whether it's too early.

Thanks.

camillemonchicourt added a commit that referenced this pull request Aug 19, 2020
Copy/paste from great @mvoundy documentation (#85)
@camillemonchicourt camillemonchicourt deleted the ftpclient branch July 30, 2021 19:03
7 participants