Getting errors about etype possibly dtype typo? #122

Closed
DavidLKing opened this issue Mar 2, 2022 · 16 comments
Comments

@DavidLKing

DavidLKing commented Mar 2, 2022

With a fresh install and processing a wiktionary dump, I get the following error. It honestly just looks like a typo, but I can't find it:

./wiktwords --inflections --num-threads 8 --out wiktionary.txt ../wiktionary_data/enwiktionary-20220201-pages-articles-multistream.xml.bz2
  ...
  ... 7800000 raw pages collected
Analyzing which templates should be expanded before parsing
Second phase - processing pages
Extracting thesaurus data
  ... 6849/7801231 pages (0.1%) processed, 00:19:02 remaining
  ... 13697/7801231 pages (0.2%) processed, 00:19:02 remaining
  ... 20609/7801231 pages (0.3%) processed, 00:19:00 remaining
  ... 27329/7801231 pages (0.4%) processed, 00:19:05 remaining
  ... 33793/7801231 pages (0.4%) processed, 00:19:18 remaining
  ... 40449/7801231 pages (0.5%) processed, 00:19:21 remaining
  ... 47425/7801231 pages (0.6%) processed, 00:19:16 remaining
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/envs/wikt/lib/python3.10/site-packages/wikitextprocessor/core.py", line 68, in phase2_page_handler
    ret = _global_page_handler(model, title, data)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 249, in page_handler
    recurse(tree)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 206, in recurse
    recurse(contents.children)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 214, in recurse
    recurse(contents.children)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 235, in recurse
    recurse(contents.children)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 219, in recurse
    recurse(contents.children)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 229, in recurse
    recurse(contents.children)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 206, in recurse
    recurse(contents.children)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 137, in recurse
    w = clean_node(config, ctx, None, node.children)
  File "/home/jupyter/wiktextract/wiktextract/page.py", line 3239, in clean_node
    v = ctx.node_to_html(value, node_handler_fn=clean_node_handler_fn,
TypeError: Wtp.node_to_html() got an unexpected keyword argument 'node_handler_fn'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/wikt/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/envs/wikt/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/opt/conda/envs/wikt/lib/python3.10/site-packages/wikitextprocessor/core.py", line 71, in phase2_page_handler
    lst = traceback.format_exception(etype=type(e), value=e,
TypeError: format_exception() got an unexpected keyword argument 'etype'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jupyter/wiktextract/./wiktwords", line 257, in <module>
    parse_wiktionary(ctx, args.path, config, word_cb, capture_cb,
  File "/home/jupyter/wiktextract/wiktextract/wiktionary.py", line 128, in parse_wiktionary
    return reprocess_wiktionary(ctx, config, word_cb, capture_cb)
  File "/home/jupyter/wiktextract/wiktextract/wiktionary.py", line 142, in reprocess_wiktionary
    thesaurus_data = extract_thesaurus_data(ctx, config)
  File "/home/jupyter/wiktextract/wiktextract/thesaurus.py", line 254, in extract_thesaurus_data
    for word, linkages in ctx.reprocess(page_handler, autoload=False):
  File "/opt/conda/envs/wikt/lib/python3.10/site-packages/wikitextprocessor/core.py", line 1339, in reprocess
    for success, title, t, ret in \
  File "/opt/conda/envs/wikt/lib/python3.10/multiprocessing/pool.py", line 448, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/opt/conda/envs/wikt/lib/python3.10/multiprocessing/pool.py", line 870, in next
    raise value
TypeError: format_exception() got an unexpected keyword argument 'etype'

@tatuylonen
Owner

This seems to be from an old version. The current versions on PyPI are quite dated (I'll try to release new ones next week once my current changes have stabilized). For now I would recommend installing both wikitextprocessor and wiktextract from their GitHub repositories. (Be aware, however, that even the repositories are currently a bit unstable: I'm in the middle of debugging a hang in my production environment that I haven't yet been able to reproduce on my development machine, plus some other changes. I expect this to be resolved by the middle of next week.)

@DavidLKing
Author

No rush! The error might have been on me. The wiktextract README may have mentioned this, but I had not installed wikitextprocessor with pip. Installing it separately resolved the issue for me (I used the pip install -e . method directly from the git repositories).

@doctorcolossus

doctorcolossus commented May 17, 2022

Still getting this error here with the latest from GitHub:

$ git log

commit 037bc087b321b1e36b0ab54f069be4ff23dc401f (HEAD -> master, origin/master, origin/HEAD)
Author: Tatu Ylonen <ylo@clausal.com>
Date:   Thu May 12 14:41:35 2022 +0300

    Updated TODO

...
./wiktwords --language Finnish --out wiktionary-finnish.json ~/download/enwiktionary-20220401-pages-articles.xml.bz2
Capturing words for: Finnish
First phase - extracting templates, macros, and pages
  ... 10000 raw pages collected
  ... 20000 raw pages collected
  ... 30000 raw pages collected

...

  ... 7860000 raw pages collected
  ... 7870000 raw pages collected
  ... 7880000 raw pages collected
Analyzing which templates should be expanded before parsing
Second phase - processing pages
Extracting thesaurus data
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/wikitextprocessor/core.py", line 68, in phase2_page_handler
    ret = _global_page_handler(model, title, data)
  File "~/wiktextract/wiktextract/thesaurus.py", line 249, in page_handler
    recurse(tree)
  File "~/wiktextract/wiktextract/thesaurus.py", line 206, in recurse
    recurse(contents.children)
  File "~/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "~/wiktextract/wiktextract/thesaurus.py", line 214, in recurse
    recurse(contents.children)
  File "~/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "~/wiktextract/wiktextract/thesaurus.py", line 235, in recurse
    recurse(contents.children)
  File "~/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "~/wiktextract/wiktextract/thesaurus.py", line 219, in recurse
    recurse(contents.children)
  File "~/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "~/wiktextract/wiktextract/thesaurus.py", line 229, in recurse
    recurse(contents.children)
  File "~/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "~/wiktextract/wiktextract/thesaurus.py", line 206, in recurse
    recurse(contents.children)
  File "~/wiktextract/wiktextract/thesaurus.py", line 124, in recurse
    recurse(x)
  File "~/wiktextract/wiktextract/thesaurus.py", line 137, in recurse
    w = clean_node(config, ctx, None, node.children)
  File "~/wiktextract/wiktextract/page.py", line 3298, in clean_node
    v = ctx.node_to_html(value, node_handler_fn=clean_node_handler_fn,
TypeError: Wtp.node_to_html() got an unexpected keyword argument 'node_handler_fn'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/usr/lib/python3.10/site-packages/wikitextprocessor/core.py", line 71, in phase2_page_handler
    lst = traceback.format_exception(etype=type(e), value=e,
TypeError: format_exception() got an unexpected keyword argument 'etype'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/wiktextract/wiktwords", line 263, in <module>
    parse_wiktionary(ctx, args.path, config, word_cb, capture_cb,
  File "~/wiktextract/wiktextract/wiktionary.py", line 134, in parse_wiktionary
    return reprocess_wiktionary(ctx, config, word_cb, capture_cb)
  File "~/wiktextract/wiktextract/wiktionary.py", line 148, in reprocess_wiktionary
    thesaurus_data = extract_thesaurus_data(ctx, config)
  File "~/wiktextract/wiktextract/thesaurus.py", line 254, in extract_thesaurus_data
    for word, linkages in ctx.reprocess(page_handler, autoload=False):
  File "/usr/lib/python3.10/site-packages/wikitextprocessor/core.py", line 1339, in reprocess
    for success, title, t, ret in \
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 448, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 870, in next
    raise value
TypeError: format_exception() got an unexpected keyword argument 'etype'

@kristian-clausal
Collaborator

Oops, it seems the above comment was missed while the issue was closed. Reopening for now.

@kristian-clausal
Collaborator

kristian-clausal commented Jun 9, 2022

@doctorcolossus if you have wikitextprocessor installed from PyPI, or a version of wikitextprocessor from GitHub from before December 6th 2021, then Wtp.node_to_html in wikitextprocessor/core.py doesn't have the node_handler_fn keyword parameter, which seems to be the problem here.
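Whether an installed function accepts a given keyword can be checked without waiting for a full dump to fail. The following is a minimal sketch using the standard library's inspect module; accepts_keyword and the two node_to_html stand-ins are hypothetical, simplified names (not the real wikitextprocessor signatures), and the commented-out import assumes Wtp is importable at package level.

```python
import inspect
import sys
import traceback
from typing import Callable


def accepts_keyword(fn: Callable, name: str) -> bool:
    """Return True if `fn` can be called with `name=...` as a keyword."""
    params = inspect.signature(fn).parameters
    # A **kwargs catch-all accepts any keyword name.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    p = params.get(name)
    # Positional-only parameters cannot be passed by keyword.
    return p is not None and p.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


# Hypothetical, simplified stand-ins for an old and a new signature.
def old_node_to_html(node, template_fn=None): ...
def new_node_to_html(node, template_fn=None, node_handler_fn=None): ...


print(accepts_keyword(old_node_to_html, "node_handler_fn"))  # False
print(accepts_keyword(new_node_to_html, "node_handler_fn"))  # True

# On Python 3.10+ the old `etype` keyword of format_exception is gone.
if sys.version_info >= (3, 10):
    print(accepts_keyword(traceback.format_exception, "etype"))  # False

# Against a real install (assumes Wtp is exported by the package):
# from wikitextprocessor import Wtp
# print(accepts_keyword(Wtp.node_to_html, "node_handler_fn"))
```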

Up-to-date versions of both wiktextract and wikitextprocessor are required, and the versions of both on PyPI are very outdated.
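To see which copy of each package Python actually picks up (a PyPI install under site-packages versus a git checkout), something like the following sketch can help. describe is a hypothetical helper built on importlib.metadata (Python 3.8+); it assumes the distribution names match the import names, and it only reports packages installed through pip, not code run straight from a checkout.

```python
import importlib.util
from importlib import metadata


def describe(package: str) -> str:
    """Report the installed version of `package` and where it is imported from.

    An editable install (pip install -e .) typically resolves to the git
    checkout, while a PyPI install resolves to site-packages.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    spec = importlib.util.find_spec(package)
    origin = spec.origin if spec is not None else "unknown location"
    return f"{package} {version} from {origin}"


for name in ("wiktextract", "wikitextprocessor"):
    print(describe(name))
```

In the tracebacks above, paths such as /usr/lib/python3.10/site-packages/wikitextprocessor/core.py versus ~/wikitextprocessor/wikitextprocessor/core.py show the same distinction.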

@doctorcolossus

Ah, thank you! I had been under the impression that only wiktextract needed the latest version; wikitextprocessor was a dependency I wasn't even aware of, most likely pulled in by pip when I first tried installing wiktextract from PyPI. Let me grab the latest version of wikitextprocessor and give it a try. I'll report back in a moment.

@doctorcolossus

Hmm, not so easy unfortunately...
From the wikitextprocessor repository, pip install -e . fails with a long traceback which boils down to:

File "/usr/lib/python3.10/site-packages/setuptools/command/easy_install.py", line 1338, in create_home_path
  if path.startswith(home) and not os.path.isdir(path):
AttributeError: 'int' object has no attribute 'startswith'

Obviously path shouldn't be an int, and I'm not sure why it is. I don't have time right now to go further down this rabbit hole. If you have any ideas for a workaround or another way of installing, please let me know; otherwise I'll try later to figure out what this means.

@doctorcolossus

doctorcolossus commented Jun 9, 2022

/usr/lib/python3.10/site-packages/setuptools/command/easy_install.py:157: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "/home/{username}/wikitextprocessor/setup.py", line 10, in <module>
    setup(name="wikitextprocessor",
  File "/usr/lib/python3.10/site-packages/setuptools/__init__.py", line 155, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 148, in setup
    return run_commands(dist)
  File "/usr/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
    dist.run_commands()
  File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 985, in run_command
    cmd_obj.ensure_finalized()
  File "/usr/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 107, in ensure_finalized
    self.finalize_options()
  File "/usr/lib/python3.10/site-packages/setuptools/command/develop.py", line 52, in finalize_options
    easy_install.finalize_options(self)
  File "/usr/lib/python3.10/site-packages/setuptools/command/easy_install.py", line 276, in finalize_options
    self._fix_install_dir_for_user_site()
  File "/usr/lib/python3.10/site-packages/setuptools/command/easy_install.py", line 382, in _fix_install_dir_for_user_site
    self.create_home_path()
  File "/usr/lib/python3.10/site-packages/setuptools/command/easy_install.py", line 1338, in create_home_path
    if path.startswith(home) and not os.path.isdir(path):
AttributeError: 'int' object has no attribute 'startswith'

@doctorcolossus

doctorcolossus commented Jun 9, 2022

Okay, I persisted a little, and it turned out to be a known setuptools problem that is solved by upgrading setuptools.
Dumping now, but of course it will take a while. I'll try to remember to report back in the morning (it's midnight here).

@doctorcolossus

doctorcolossus commented Jun 9, 2022

Still getting this with wikitextprocessor installed from the github repository (8207892321161aca45bfe210615dba19c96fb001):

$ ./wiktwords --language Finnish --out wiktionary-finnish.json ~/download/enwiktionary-20220401-pages-articles.xml.bz2
Capturing words for: Finnish
First phase - extracting templates, macros, and pages
  ... 10000 raw pages collected

...

  ... 6255135/7889284 pages (79.3%) processed, 00:02:12 remaining
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 67, in phase2_page_handler
    ret = _global_page_handler(model, title, data)
  File "~/wiktextract/wiktextract/thesaurus.py", line 94, in page_handler
    expanded = ctx.expand(text, templates_to_expand=None)  # Expand all
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1268, in expand
    expanded = expand_recurse(encoded, parent, templates_to_expand)
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1214, in expand_recurse
    t = expand_recurse(encoded_body, new_parent,
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1081, in expand_recurse
    ret = expand_parserfn(fn_name,
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1021, in expand_parserfn
    ret = call_parser_function(self, fn_name, args, expander)
  File "~/wikitextprocessor/wikitextprocessor/parserfns.py", line 1454, in call_parser_function
    return fn(ctx, fn_name, args, expander)
  File "~/wikitextprocessor/wikitextprocessor/parserfns.py", line 64, in if_fn
    return expander(arg1).strip()
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1014, in <lambda>
    expander = lambda arg: expand_recurse(arg, parent,
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1214, in expand_recurse
    t = expand_recurse(encoded_body, new_parent,
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1081, in expand_recurse
    ret = expand_parserfn(fn_name,
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1019, in expand_parserfn
    ret = invoke_fn(args, expander, parent)
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 918, in invoke_fn
    ret = call_lua_sandbox(self, invoke_args, expander, parent, timeout)
  File "~/wikitextprocessor/wikitextprocessor/luaexec.py", line 370, in call_lua_sandbox
    ctx.lua_reset_env()
  File "lupa/_lupa.pyx", line 587, in lupa._lupa._LuaObject.__call__
  File "lupa/_lupa.pyx", line 1333, in lupa._lupa.call_lua
  File "lupa/_lupa.pyx", line 1359, in lupa._lupa.execute_lua_call
  File "lupa/_lupa.pyx", line 1295, in lupa._lupa.raise_lua_error
lupa._lupa.LuaError: [string "<python>"]:142: Lua timeout error
stack traceback:
	[C]: in function 'error'
	[string "<python>"]:142: in hook '?'
	[string "<python>"]:369: in function <[string "<python>"]:300>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 70, in phase2_page_handler
    lst = traceback.format_exception(etype=type(e), value=e,
TypeError: format_exception() got an unexpected keyword argument 'etype'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/wiktextract/./wiktwords", line 263, in <module>
    parse_wiktionary(ctx, args.path, config, word_cb, capture_cb,
  File "~/wiktextract/wiktextract/wiktionary.py", line 134, in parse_wiktionary
    return reprocess_wiktionary(ctx, config, word_cb, capture_cb)
  File "~/wiktextract/wiktextract/wiktionary.py", line 148, in reprocess_wiktionary
    thesaurus_data = extract_thesaurus_data(ctx, config)
  File "~/wiktextract/wiktextract/thesaurus.py", line 254, in extract_thesaurus_data
    for word, linkages in ctx.reprocess(page_handler, autoload=False):
  File "~/wikitextprocessor/wikitextprocessor/core.py", line 1389, in reprocess
    for success, title, t, ret in \
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 870, in next
    raise value
TypeError: format_exception() got an unexpected keyword argument 'etype'

@kristian-clausal
Collaborator

This seems to be a problem with Lua script timeouts when expanding templates. As far as I can tell, it should fail "gracefully" by writing error messages into the output of the expansion itself, so this looks like a bug at some level. However, it's a different issue from the original post in this thread (and from the GitHub vs. PyPI installation question), so please open a new issue and I will close this one.
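The "fail gracefully" behavior described above can be illustrated with a small sketch: catch the exception raised during expansion and embed an error marker in the result instead of letting it escape the worker. expand_or_report, slow_expander, and the HTML-style error marker are all hypothetical; they are not wikitextprocessor's actual API or output format.

```python
import html
from typing import Callable


def expand_or_report(expand: Callable[[str], str], text: str) -> str:
    """Expand `text`, embedding an error marker in the output on failure.

    Instead of letting the exception escape and abort the whole page,
    the error is rendered into the expansion result so the caller can
    log it and keep going.
    """
    try:
        return expand(text)
    except Exception as e:  # e.g. a Lua timeout raised during expansion
        msg = html.escape(f"{type(e).__name__}: {e}")
        return f'<strong class="error">{msg}</strong>'


def slow_expander(text: str) -> str:
    # Hypothetical stand-in for a template whose Lua module times out.
    raise TimeoutError("Lua timeout error")


print(expand_or_report(str.upper, "{{ok}}"))
print(expand_or_report(slow_expander, "{{slow}}"))
```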

@kristian-clausal
Collaborator

kristian-clausal commented Jun 14, 2022

While puttering around with my own stuff, I found that the etype= error was actually a bug in wikitextprocessor. However, it only affects the call to the formatter from Python's traceback module, so its only effect was to hide the real error messages in the tracebacks. Fixing it would not have fixed any of the underlying errors: you would still have had a crash, just with better information.

The traceback formatter is used with Lua errors, and Lua is in turn used when expanding templates from the Wiktionary source.

The bug was simply that passing etype= as a keyword argument no longer works: in Python 3.10, the first parameter of traceback.format_exception() was renamed from etype to exc and made positional-only, so it must now be passed positionally.
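For reference, here is a minimal sketch of a call that works on both older and newer Python 3 versions. format_error is a hypothetical helper, not the actual patch that went into wikitextprocessor.

```python
import sys
import traceback


def format_error(e: BaseException) -> str:
    """Return the full formatted traceback of `e` as one string.

    Type, value and traceback are passed positionally, which works on
    older Python 3 releases and on 3.10+, where the first parameter was
    renamed from `etype` to `exc` and made positional-only.
    """
    return "".join(traceback.format_exception(type(e), e, e.__traceback__))


try:
    raise ValueError("boom")
except ValueError as err:
    print(format_error(err))

# The keyword form used in the tracebacks above fails on Python 3.10+.
if sys.version_info >= (3, 10):
    try:
        traceback.format_exception(etype=ValueError, value=ValueError("x"), tb=None)
    except TypeError as exc:
        print("rejected:", exc)
```

On Python 3.10+ alone, traceback.format_exception(e) with just the exception object is also accepted; the three-argument positional form keeps compatibility with older interpreters.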

As soon as we get this fixed in wikitextprocessor, Lua errors should become much more informative. @doctorcolossus, please repost your latest post as a new issue then.

@doctorcolossus

Hey Kristian, sorry that my replies have slowed down recently; I've been very busy with work and life.

Today I pulled the latest changes from wiktextract, ran the same command given above, and it succeeded for the first time. It takes a while to run and uses quite a bit of memory in the meantime, which I can't spare again right now, but I'll try to run it once more in the next day or two, perhaps with a different language, and will open a new issue if I hit this problem again. But perhaps it is fixed now?

Thank you so much for your attention and feedback.

@kristian-clausal
Collaborator

Out of curiosity, did you also update wikitextprocessor? None of the commits I've made to wiktextract should have affected the issue in this thread, and the change in wikitextprocessor was minor. If nothing relevant changed, the run might just be succeeding or failing arbitrarily.

I'll have to ask Tatu when he's not busy, but it's possible that fixing the Lua error messaging allowed some other bit of code to catch the exceptions and fail gracefully. In fact, now that I've written that down, it even seems probable.

@doctorcolossus

When I tried a git pull on wikitextprocessor, it was already up-to-date, i.e. at the same commit as it was in the last error traceback I posted above here.

@kristian-clausal
Copy link
Collaborator

Then it seems the run succeeding was not due to the fix in wikitextprocessor. If you pull it now, you should get better error messages if the problem crops up again in the future. For now, I'm closing this specific thread.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants