A crawler that collects data from https://www2.correios.com.br/sistemas/buscacep/buscaFaixaCep.cfm.
Clone the project, then set up the environment:

    export PYTHONPATH="${PYTHONPATH}:${PWD}"
    sudo chmod -R 777 src/

Run the tests:

    python test/test_crawler.py

Run the crawler:

    python src/crawler.py

Or run everything with Docker:

    sudo docker-compose up --build test_crawler
    sudo docker-compose up --build crawler
    sudo docker-compose up --build -d log-view

You can view the logs at http://localhost:9999.
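For reference, a minimal sketch of how the crawler might query the site. The `resultadoBuscaFaixaCep.cfm` endpoint and the `UF` form field are assumptions based on the search form on the page above; verify them against the live form before relying on this:

```python
import requests
from bs4 import BeautifulSoup

# Assumed endpoint: the search form on buscaFaixaCep.cfm appears to post the
# chosen UF here; check the form's "action" attribute to confirm.
SEARCH_URL = "https://www2.correios.com.br/sistemas/buscacep/resultadoBuscaFaixaCep.cfm"

def fetch_uf_results(uf: str) -> BeautifulSoup:
    """Post a UF (e.g. 'SP') to the search form and return the parsed HTML."""
    response = requests.post(SEARCH_URL, data={"UF": uf}, timeout=30)
    response.raise_for_status()
    response.encoding = "latin-1"  # the site historically serves ISO-8859-1
    return BeautifulSoup(response.text, "html.parser")
```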
Welcome to the Data Pirates challenge.
Hello! We have a small adventure to put your skills to the test. In this task you have to collect data from a website and then write the results to a file.
- Use the https://www2.correios.com.br/sistemas/buscacep/buscaFaixaCep.cfm URL;
- Get data from at least two UFs. The more, the better;
- Collect all records for each UF;
- Each record must contain at least 3 fields: "localidade", "faixa de cep", and a generated "id". Do not leave duplicate records in your output file (see the sketch after this list);
- The output format must be JSONL.
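One way to meet the id, deduplication, and JSONL requirements, as a sketch (the field values are assumed to come from your own parsing step; the id is derived deterministically from the record's fields, so reruns stay duplicate-free):

```python
import hashlib
import json

def make_record(localidade: str, faixa_de_cep: str) -> dict:
    # Deterministic id: the same localidade/faixa pair always yields the
    # same id, which makes duplicates easy to detect.
    raw = f"{localidade}|{faixa_de_cep}".encode("utf-8")
    return {
        "id": hashlib.sha1(raw).hexdigest(),
        "localidade": localidade,
        "faixa de cep": faixa_de_cep,
    }

def write_jsonl(records, path="output.jsonl"):
    seen = set()
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            if record["id"] in seen:  # skip duplicates
                continue
            seen.add(record["id"])
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```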
✔️ The code should be sent through GitHub with at least a README explaining how to test and run it.
✔️ It would be REALLY nice if it were hosted in a git repo of your own. You can create a new empty project, create a branch, and open a Pull Request to the new master branch you have just created. Provide the PR URL so we can discuss the code 😁. BUT if you'd rather, just compress this directory and send it back to us.
❌ Do not start a Pull Request to this project.
References:
https://www2.correios.com.br/sistemas/buscacep/buscaFaixaCep.cfm
- There is no single right answer; we will evaluate how you solve problems and the results you achieve.
- We work mainly with Python 3 and Go, but feel free to use any language you feel more comfortable with.
- Unit tests are cool.
- It's important that we can execute your project, so make it clear which steps we need to follow to test and run it.