Please note that you have 5 days to complete the exercise from the day it is sent out.
Here are the instructions for the Buildit - Wipro Digital Platform Engineer - Cloud exercise:
There are no tricks or hidden agendas. The purpose of this exercise is for you to demonstrate your technical knowledge, reasoning, and engineering practices using current software development technologies and methods. Please make sure your code is clear and demonstrates your best practices. The exercise should be done as if you were building software to hand off to someone else. Refrain from using this as an opportunity to learn a new framework, library, or paradigm beyond what you feel is essential to completing the task.
Your solution will form the basis for discussion in subsequent interviews.
Please write a simple web crawler in C#.
The crawler should be limited to one domain. Given a starting URL, say http://wiprodigital.com, it should visit all pages within that domain, but not follow links to external sites such as Google or Twitter. None of the links in your output should end with a slash (/).
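As a rough sketch of how the domain restriction and the trailing-slash rule might be handled (the exact host comparison and normalization shown here are assumptions, not prescribed by the brief):

```csharp
using System;

static class LinkRules
{
    // A link counts as internal when it shares the starting URL's host.
    // Treating subdomains as external is an assumption; the brief only says
    // "one domain", so comparing registrable domains is equally defensible.
    public static bool IsInternal(Uri start, Uri candidate) =>
        string.Equals(start.Host, candidate.Host, StringComparison.OrdinalIgnoreCase);

    // The brief requires that no output link end with a slash.
    public static string Normalize(Uri uri) =>
        uri.AbsoluteUri.TrimEnd('/');
}
```

For example, `LinkRules.Normalize(new Uri("https://test.com/"))` yields `"https://test.com"`, matching the output format below.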
The expected output format:
{
  "uri": "https://test.com/about.html",
  "internalLinks": [
    "https://test.com",
    "https://test.com/about.html#",
    "https://test.com/search.html",
    "https://test.com/categories.html",
    "https://test.com/articles/2015-04-23-forum",
    "https://test.com/feed.xml"
  ],
  "externalLinks": [
    "https://groups.google.com/forum/#!forum/test",
    "https://test.tumblr.com",
    "https://www.agilementoring.com",
    "mailto:test@test.com",
    "https://github.com/test",
    "https://twitter.com/test"
  ],
  "images": [
    "/assets/test.svg"
  ]
}
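One way to model that output in C# (the record shape is an assumption, not prescribed by the brief; the camelCase naming policy reproduces the keys shown above):

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Mirrors the expected JSON output shape.
public record CrawlResult(
    string Uri,
    List<string> InternalLinks,
    List<string> ExternalLinks,
    List<string> Images);

// Serializing with a camelCase naming policy produces keys like
// "uri", "internalLinks", "externalLinks", and "images":
// var json = JsonSerializer.Serialize(result,
//     new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase });
```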
Please update this README.md describing your thought process and the tradeoffs made. Also, detail anything further that you would like to achieve with more time.
Once done, please make your solution available on GitHub and forward the link. Where possible, please include your commit history to provide visibility into your thinking and working practice.
In summary, we expect:

- A working crawler as per the requirements above
- An updated README.md explaining:
  - Your reasoning and any trade-offs made
  - What could be done with more time
- A project that builds, runs, and tests as per the instructions in the README
Good luck and thank you for your time - we look forward to seeing your creation.
- Test the endpoint with:
  curl "http://localhost:8080/crawl?url=<url to be crawled>"
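A minimal ASP.NET Core endpoint matching that curl call might look like this (the route and port come from the example above; `Crawler.CrawlAsync` is a hypothetical method standing in for your own implementation):

```csharp
// Program.cs — ASP.NET Core minimal API (assumes .NET 6 or later)
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// GET /crawl?url=https://wiprodigital.com
app.MapGet("/crawl", async (string url) =>
{
    if (!Uri.TryCreate(url, UriKind.Absolute, out var start))
        return Results.BadRequest("url must be an absolute URI");

    // Crawler.CrawlAsync is hypothetical — substitute your crawler here.
    var result = await Crawler.CrawlAsync(start);
    return Results.Json(result);
});

app.Run("http://localhost:8080");
```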