Author: Anders Poirel
This is the Data Science Slugs team's repository for the LANL Earthquake Prediction competition on Kaggle.
Go to GitHub and create a user account. Message Ryan Darling or Anders Poirel on Slack so that you can be added as a collaborator on the project on GitHub. This is necessary for pushing your contributions to our repo.
Get the official installer here. Be sure to select Git Bash at installation if you are on Windows. You can also use a GUI for Git, but knowing the command line will make things much easier for everyone down the line. If you do not know any Git, Pro Git is a great resource to get you started.
You can use any editor you like, but to participate in live code collaboration you will need VS Code. Get the official installer here. Launch VS Code and in Extensions (Ctrl+Shift+X) search for 'VS Live Share' and install it. I also highly recommend installing the 'Python' extension from Microsoft and the 'Rainbow CSV' extension, for improved Python and CSV support.
At each meeting, links to join a live code collaboration will be sent out through Slack.
Note: this is only necessary if you intend to run the code on your local machine. You can get started without it. Also, a lot of the above software can be installed through Anaconda if you prefer.
Download the official installer here. Be sure to select the Python 3.7 version. The Python libraries you need to install through Anaconda will be listed here as needed.
Note: I highly recommend reading the first two chapters of the book Pro Git, freely available online, to get a working knowledge of Git.
Create a folder to store our work on the project.
$ mkdir DataScienceSlugs
$ cd DataScienceSlugs
Configure Git with the name and email you used on GitHub, then clone the repository to your local machine. Cloning also registers the GitHub repository as the remote named origin.
$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com
$ git clone https://github.com/datascienceslugs/dss-titanic
Navigate to the folder containing the project and then run:
$ git checkout [branch you want to work on]
$ git pull origin [branch you want to work on]
$ code [file you want to work on]
The last command opens the file in VS Code.
All work should be done on branches created specifically for the issue you want to work on.
If running in IPython/Jupyter, make sure the working directory in IPython is set to the one containing the script you want to run; otherwise relative paths will not resolve properly.
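As a minimal sketch of the working-directory fix, the helper below changes the current directory to the folder that contains a given script. The function name `enter_script_dir` is hypothetical and not part of the repo, and the example path assumes you start from the project root.

```python
import os
from pathlib import Path


def enter_script_dir(script_path):
    """Change the working directory to the folder containing script_path.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    script_dir = Path(script_path).resolve().parent
    os.chdir(script_dir)
    return script_dir


# Example call (the path is an assumption; adjust it to your clone):
# enter_script_dir("src/data/feature-extraction.py")
# print(Path.cwd())
```

Inside IPython or Jupyter, the built-in `%cd` magic does the same thing interactively.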
Source files are all under src. To build the features, run feature-extraction.py, located in src/data. Raw data is not on Git due to file size limitations, but you should save it under data/raw. If you modify that script, make sure that it still writes its output to data/processed.
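Before running the feature script, it can help to confirm that the data folders are where the code expects them. The sketch below is only an illustration: `check_data_layout` is a hypothetical helper, and it assumes that data/raw and data/processed sit directly under the project root you pass in.

```python
from pathlib import Path


def check_data_layout(project_root):
    """List the raw data files and make sure data/processed exists.

    Hypothetical helper; assumes data/ lives directly under project_root.
    """
    root = Path(project_root)
    raw_dir = root / "data" / "raw"
    processed_dir = root / "data" / "processed"

    # Create the output folder if it is missing; do nothing if it exists.
    processed_dir.mkdir(parents=True, exist_ok=True)

    # An empty list means the raw data has not been downloaded yet.
    raw_files = sorted(p.name for p in raw_dir.iterdir()) if raw_dir.is_dir() else []
    return raw_files, processed_dir


# Example call from the project root:
# files, out_dir = check_data_layout(".")
# print(files, out_dir)
```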
Level 1 models are under src/models and should write their output to data/intermediate. The level 2 model is in meta_learner.py, also located in src/models.
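To illustrate how the two levels could fit together, here is a rough sketch of the hand-off through data/intermediate. It is not the code in meta_learner.py: the CSV layout (a single `prediction` column), the file names, and both function names are assumptions made for this example.

```python
import csv
from pathlib import Path


def write_predictions(path, predictions):
    """Level 1 side: save one prediction per row (hypothetical format)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prediction"])
        writer.writerows([p] for p in predictions)


def load_level1_features(intermediate_dir):
    """Level 2 side: build one row per sample, one column per level 1 file."""
    columns = []
    for csv_path in sorted(Path(intermediate_dir).glob("*.csv")):
        with open(csv_path, newline="") as f:
            rows = list(csv.DictReader(f))
        columns.append([float(row["prediction"]) for row in rows])
    # Transpose so each inner list holds every model's prediction for one sample.
    return [list(sample) for sample in zip(*columns)]


# Example (file names are made up):
# write_predictions("data/intermediate/model_a.csv", [1.0, 2.0])
# write_predictions("data/intermediate/model_b.csv", [3.0, 4.0])
# features = load_level1_features("data/intermediate")
```

The resulting rows are what a level 2 model would be trained on, with each column coming from a different level 1 model.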
See CONTRIBUTING.md for guidelines on modifying the code.