An application that parses the catalogs and products of the site http://old.relefopt.ru
This online store shows prices only to authorized users. The details of each product are loaded by a separate AJAX request.
- JDK 1.8
- Maven 3
- PostgreSQL 9.x
- Set up JAVA_HOME (https://confluence.atlassian.com/doc/setting-the-java_home-variable-in-windows-8895.html)
- Set up MAVEN_HOME (https://www.mkyong.com/maven/how-to-install-maven-in-windows/ or https://www.tutorialspoint.com/maven/maven_environment_setup.htm)
- Create a PostgreSQL database
All properties for connecting to the database are stored in the database.properties file:
db.url=jdbc:postgresql://localhost:5432/old_relefopt , where
localhost - address of the database
5432 - port of the database
old_relefopt - name of the database
db.username=postgres , where
postgres - login for connecting to the database
db.password=postgres , where
postgres - password for connecting to the database
- Set up authorization credentials:
site.user.login=your_login , where your_login is your login for authorization on the site
site.user.password=your_password , where your_password is your password for authorization on the site
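The settings above can be sanity-checked before a run. The following is a minimal sketch, not the application's own code: the class and method names are hypothetical, and treating all five keys as required is an assumption based only on the keys listed in this README. It parses properties text, reports keys that are missing or empty, and splits the db.url value into host, port and database name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration; it is not part of relef-parser.
class ConnectionSettingsSketch {

    // Keys named in this README. That the application needs all of them is an assumption.
    static final String[] REQUIRED_KEYS = {
        "db.url", "db.username", "db.password", "site.user.login", "site.user.password"
    };

    /** Parses text in the standard java.util.Properties key=value format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the required keys that are absent or have an empty value. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    /** Splits jdbc:postgresql://host:port/database into {host, port, database}. */
    static String[] splitJdbcUrl(String jdbcUrl) {
        if (!jdbcUrl.startsWith("jdbc:")) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        // Drop the "jdbc:" prefix so java.net.URI can parse the remainder.
        URI uri = URI.create(jdbcUrl.substring("jdbc:".length()));
        return new String[] {
            uri.getHost(),
            String.valueOf(uri.getPort()),
            uri.getPath().substring(1) // strip the leading '/'
        };
    }

    public static void main(String[] args) {
        Properties props = parse(
                "db.url=jdbc:postgresql://localhost:5432/old_relefopt\n"
                + "db.username=postgres\n"
                + "db.password=postgres\n");
        String[] parts = splitJdbcUrl(props.getProperty("db.url"));
        System.out.println("host=" + parts[0] + " port=" + parts[1] + " database=" + parts[2]);
        System.out.println("missing keys: " + missingKeys(props));
    }
}
```

In this sample input only the three db.* keys are set, so the two site.user.* keys are reported as missing.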
- To build the jar file, run the following command:
mvn clean package
Result: the relef-parser-{version}.jar file is created in the /target directory.
- To run the jar file, execute the following command:
java -jar relef-parser-{version}.jar
The application prints help if you run it without arguments.
- Start fast full parsing:
java -jar relef-parser-{version}.jar -pf 0
- Export products:
java -jar relef-parser-{version}.jar -ep
Creates the file products_{datetime}.xlsx in the /exports directory.
- Download images of all products:
java -jar relef-parser-{version}.jar -dpi
Saves the product images to the /downloads directory.
- To analyze the match percentage between product names and products from the MySklad system, first create the PostgreSQL extension and index:
1.1. https://stackoverflow.com/a/16552593/8035065
CREATE EXTENSION pg_trgm;
1.2. https://www.postgresql.org/docs/9.3/static/pgtrgm.html
CREATE INDEX trgm_idx_t_product_name ON t_product USING gist (name gist_trgm_ops);
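To give a feel for the score that pg_trgm produces when two names are compared, here is a rough Java approximation of trigram similarity. It is a sketch under stated assumptions, not PostgreSQL's implementation and not code from this application. It follows the rule from the pg_trgm documentation (each word is lowercased and padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings). The class and method names are hypothetical, and the word splitting here may differ from PostgreSQL for some locales.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of trigram similarity; approximates pg_trgm, does not replace it.
class TrigramSimilaritySketch {

    /** Collects the distinct trigrams of every word in the text. */
    static Set<String> trigrams(String text) {
        Set<String> result = new HashSet<>();
        // Treat any run of characters that are not letters or digits as a word separator.
        for (String word : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Pad with two spaces in front and one behind, as pg_trgm documents.
            String padded = "  " + word + " ";
            for (int i = 0; i + 3 <= padded.length(); i++) {
                result.add(padded.substring(i, i + 3));
            }
        }
        return result;
    }

    /** Returns shared trigrams divided by all distinct trigrams, in the range 0.0 to 1.0. */
    static double similarity(String a, String b) {
        Set<String> first = trigrams(a);
        Set<String> second = trigrams(b);
        Set<String> union = new HashSet<>(first);
        union.addAll(second);
        if (union.isEmpty()) {
            return 0.0; // neither string contains a word
        }
        Set<String> shared = new HashSet<>(first);
        shared.retainAll(second);
        return (double) shared.size() / union.size();
    }

    public static void main(String[] args) {
        // Multiply by 100 to read the score as a percentage.
        System.out.println(100 * similarity("cat", "car"));
    }
}
```

For example, "cat" and "car" each yield four trigrams and share two of them, so the score is 2 / 6, or about 33 percent.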
1.7 (2017-06-08): Changed address of the site.
1.6 (2017-06-08): Changed DB to PostgreSQL. Added import of products from MySklad system.
1.5 (2017-05-23): Added authorization.
1.4 (2017-05-15): Added export to xlsx.
1.3 (2017-05-13): Added strategies for parsing.
1.2 (2017-05-07): Added Liquibase support.
1.1 (2017-05-01): Added Hibernate support.
1.0 (2017-04-29): First commit.