Develop an algorithm to give the probability of a certain partner be a relative of the congressperson #98
Comments
I was looking for some way to contribute to this project and this issue seems interesting, since I have already solved a similar problem. Here are some ideas:
I might do this in the future, but feel free to steal these ideas :) |
This is awesome! Many thanks, @gabriel-almeida! I'm not sure I have the right skills to code that quickly by myself, but this surely makes the issue much easier. Whoever wants to jump in, make yourself at home ; ) |
Wikipedia provides a lot of information about relationships between politicians, so maybe it is a good source to scrape data from. Also, maybe I am being silly, but the names of all partners should be public info too, shouldn't they? |
@eldersantos sure thing, we've discussed some pros & cons of using Wikipedia for that purpose at #15. As there were some relevant cons, this issue is more focused on detecting family members when we can't find that data elsewhere (Facebook, Wikipedia, etc.). Does that make sense? |
Absolutely, sorry for not checking that subject in the other issue. In that case we definitely need an algorithm to solve it. Maybe we can check the literature (academic papers) for an approach that is good in terms of complexity and running time :)
|
No need to say sorry — this link was in fact missing here in this thread ; ) |
@gabriel-almeida has made an awesome contribution in #119; further discussion is welcome there. |
We have the names of all the congresspeople and the names of their parents.
That said, @anaschwendler and I were discussing today the possibility of having an algorithm that receives as input:
The algorithm would give us the probability of the following hypothesis: the partner and the congressperson are relatives.
We could balance more popular family names (e.g. Silva) against less popular ones (e.g. Sarney) using internal sources (we have thousands of full names in our dataset, including congresspeople and their parents) or an external database (no idea which one, but that should not be a big challenge).
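To make the idea of balancing popular and rare family names more concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions, not an agreed design: the function names (`surnames`, `surname_frequencies`, `relatedness_score`), the particle list, and the scoring rule are all hypothetical. The score is one minus the product of the frequencies of the shared surnames, which treats surnames as independent and is a rough heuristic rather than a calibrated probability. Every token after the first given name is treated as a surname, which is a simplification.

```python
from collections import Counter

# Particles common in Portuguese names; assumed to carry no family information.
PARTICLES = {"da", "de", "do", "das", "dos", "e"}


def surnames(full_name):
    """Return the lowercase tokens after the first given name, minus particles."""
    tokens = [t for t in full_name.lower().split() if t not in PARTICLES]
    return set(tokens[1:])


def surname_frequencies(full_names):
    """Fraction of names in the reference list that carry each surname."""
    counts = Counter()
    for name in full_names:
        counts.update(surnames(name))
    total = len(full_names)
    return {surname: count / total for surname, count in counts.items()}


def relatedness_score(partner, congressperson, parents, freqs, default_freq):
    """Heuristic score in [0, 1]: higher when the shared surnames are rarer.

    default_freq is the frequency assumed for surnames missing from freqs.
    """
    family = surnames(congressperson)
    for parent in parents:
        family |= surnames(parent)
    shared = surnames(partner) & family
    # Rough chance that the overlap is a coincidence, assuming independence.
    coincidence = 1.0
    for surname in shared:
        coincidence *= freqs.get(surname, default_freq)
    return 1.0 - coincidence


# Toy reference list; the real one would come from the project's dataset.
reference = ["Ana Silva", "Bruno Silva", "Carla Silva Souza", "Diego Sarney"]
freqs = surname_frequencies(reference)
common = relatedness_score("Joao Silva", "Pedro Silva", [], freqs, 0.25)
rare = relatedness_score("Joao Sarney", "Pedro Sarney", [], freqs, 0.25)
print(common, rare)
```

In this toy list "Silva" appears in three of four names and "Sarney" in one, so a shared "Sarney" yields a higher score than a shared "Silva". A real version would need proper name normalization (accents, compound given names) and some validation against known relatives.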
PS: formally we don't have company partners in our dataset, but it's on our roadmap (and maybe the development of this algorithm doesn't depend on that).
(@g4brielvs feel free to jump in!)