Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GID missing in L3 #179

Merged
merged 16 commits into from
Oct 23, 2024
Merged

GID missing in L3 #179

merged 16 commits into from
Oct 23, 2024

Conversation

i-be-snek
Copy link
Collaborator

@i-be-snek i-be-snek commented Oct 21, 2024

This PR fixes an issue that cased missing GIDs in L3. The issue was related to a function not returning properly. The PR also includes minimal changes that improve what functions return.

While fixing this, I am reminded of American states not returning the same GID.
image
(the screenshot is from a test run on some of the full run data for "l3/Specific_Instance_Per_Administrative_Area_Damage")

@liniiiiii do you think it's a good idea to add a fix for American states not being converted to the right GID? I yes, then I'll add a fix for that and tag you as a reviewer.

@liniiiiii
Copy link
Collaborator

This PR fixes an issue that cased missing GIDs in L3. The issue was related to a function not returning properly. The PR also includes minimal changes that improve what functions return.

While fixing this, I am reminded of American states not returning the same GID. image (the screenshot is from a test run on some of the full run data for "l3/Specific_Instance_Per_Administrative_Area_Damage")

@liniiiiii do you think it's a good idea to add a fix for American states not being converted to the right GID? I yes, then I'll add a fix for that and tag you as a reviewer.

@i-be-snek , I think it's good to add a fix for American states, but we can handle it in V2 if the data gap process is not replying on it, and currently the visualization of L2 and L3 are also not replying on it, thanks!

@i-be-snek
Copy link
Collaborator Author

I don't think we should leave this for V2 because no American states have GIDs now, so it would affect the V1 release.

@liniiiiii
Copy link
Collaborator

I don't think we should leave this for V2 because no American states have GIDs now, so it would affect the V1 release.
Pls add a fix to it then, I will review them together! Thanks!

@i-be-snek i-be-snek requested a review from liniiiiii October 21, 2024 18:00
@i-be-snek i-be-snek mentioned this pull request Oct 21, 2024
@liniiiiii
Copy link
Collaborator

I run the parse_events.py, and the missing GID for US locations is solved! Thanks!
image

@i-be-snek
Copy link
Collaborator Author

@liniiiiii if you are happy with everything and have checked everything, please approve it so we can merge it.
When we are done with these bugs, we can do the full-run again :)

@i-be-snek
Copy link
Collaborator Author

i-be-snek commented Oct 22, 2024

Not sure what was going on before but I can confirm it works as intended with India!
image

I'll be pushing some updated tests shortly

@i-be-snek
Copy link
Collaborator Author

@liniiiiii This is now ready to review :) whenever you have time

@i-be-snek
Copy link
Collaborator Author

Okay, unlike github may like to claim, I am not cancelling any github action runs. Not sure what the problem exactly is but we can run all tests locally with poetry run pytest before merging to make sure everything is good. Yesterday the same happened, github actions was patchy but it worked after numerous re-runs.

@liniiiiii
Copy link
Collaborator

@i-be-snek The India case is solved, and I found that for India, there are several GIDs, potentially from its historical GIDs, which in visualization process, I also found some GIDs for historical regions. The PR is passed locally with poetry run pytest in my test.
image

But I have a question about the parsing process now, because when I run the command below

poetry run python3 Database/parse_events.py --raw_dir Database/raw/ESSD_2024_V3/dev --filename wiki_dev_whole_infobox_20240729_70single_events_all_categories_V3_gpt-4o-2024-05-13_rawoutput_data_time_nums.json --output_dir Database/output/V3_branch_gid_test --event_levels l1,l2,l3

I got response like

parse_events: 2024-10-23 12:01:58 INFO     Normalizing nulls for l2 Instance_Per_Administrative_Areas_Homeless
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 70/70 [00:00<00:00, 283947.08it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 101/101 [00:00<00:00, 5468.81it/s]
parse_events: 2024-10-23 12:01:58 INFO     Dropping any columns with non-str column names due to None types in the dicts []

After that, I only get the L2 and L3 with damage and insured damage parsed results.
image

To double check, I switch to main and run git pull and open a new branch to test the parsing process, and I get the same output. When I run the evaluation process on Monday morning, I also used this function for my V3_3 dev set, and it worked well. Could you help to check if something are blocked in the parsing process?
image

@i-be-snek
Copy link
Collaborator Author

@liniiiiii Thanks for the review.
I'll run the command you provided on my end and check.

@i-be-snek
Copy link
Collaborator Author

@liniiiiii btw, the "several GIDs" are disputed areas that India claims, so this is not an error. India, China, and many countries are represented by multiple GIDs.

@i-be-snek
Copy link
Collaborator Author

i-be-snek commented Oct 23, 2024

To double check, I switch to main and run git pull and open a new branch to test the parsing process, and I get the same output. When I run the evaluation process on Monday morning, I also used this function for my V3_3 dev set, and it worked well. Could you help to check if something are blocked in the parsing process?

@liniiiiii So my understanding is that you ran the command and things worked as expected. You then ran the same command in the main branch (after pulling the latest changes) and everything worked as expected. What part is the problem? Did something go wrong or look strange?

Nevermind, now I understand that some categories are missing. :)
Checking now!

@i-be-snek
Copy link
Collaborator Author

@liniiiiii you discovered a really small typo that causes a really big problem. The code was tabbed wrong 🥲 big thanks!

I pushed a fix for that now and added pandarallel so that the code can run faster (except when normalization locations because we have to have a 1 second delay).
Please test your command again.

@liniiiiii
Copy link
Collaborator

liniiiiii commented Oct 23, 2024

@i-be-snek Thanks for this commit Deprecate pandarallel for now due to hanging in l3 (known error in… … pandarallel) , I was testing and it stops running without reason, I thought my laptop was too tired to work :D
Now I'm testing this pr, the parsing is good, and the local test passed as well, could I merge it with this local test passed?

wikimpacts-py3.11nl@F-IR-HYDRPC135:~/Wikimpacts$ poetry run pytest 
================================================================= test session starts ==================================================================
platform linux -- Python 3.11.9, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/nl/Wikimpacts
configfile: pytest.ini
testpaths: tests
plugins: anyio-4.4.0
collected 146 items                                                                                                                                    

tests/test_normalize_locations.py ..............................                                                                                 [ 20%]
tests/test_normalize_numbers.py .............................................................................................................    [ 95%]
tests/test_specific_instance_matcher.py .......                                                                                                  [100%]

=================================================================== warnings summary ===================================================================
../.cache/pypoetry/virtualenvs/wikimpacts-jRbwdZbh-py3.11/lib/python3.11/site-packages/pycountry/__init__.py:10
  /home/nl/.cache/pypoetry/virtualenvs/wikimpacts-jRbwdZbh-py3.11/lib/python3.11/site-packages/pycountry/__init__.py:10: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

../.cache/pypoetry/virtualenvs/wikimpacts-jRbwdZbh-py3.11/lib/python3.11/site-packages/pkg_resources/__init__.py:2825
  /home/nl/.cache/pypoetry/virtualenvs/wikimpacts-jRbwdZbh-py3.11/lib/python3.11/site-packages/pkg_resources/__init__.py:2825: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

tests/test_normalize_numbers.py::TestNormalizeNumbers::test__preprocess[EUR19-19]
  /home/nl/.cache/pypoetry/virtualenvs/wikimpacts-jRbwdZbh-py3.11/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
    _torch_pytree._register_pytree_node(

tests/test_normalize_numbers.py::TestNormalizeNumbers::test__preprocess[EUR19-19]
  /home/nl/.cache/pypoetry/virtualenvs/wikimpacts-jRbwdZbh-py3.11/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
    _torch_pytree._register_pytree_node(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================== 146 passed, 4 warnings in 213.69s (0:03:33) ======================================================

Copy link
Collaborator

@liniiiiii liniiiiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passed the poetry run pytest

@i-be-snek i-be-snek merged commit c22c243 into main Oct 23, 2024
1 check failed
liniiiiii pushed a commit that referenced this pull request Oct 24, 2024
@i-be-snek i-be-snek linked an issue Oct 31, 2024 that may be closed by this pull request
6 tasks
@i-be-snek i-be-snek deleted the gid-l3-fix branch January 17, 2025 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

process full run database (to do branch)
2 participants