monolith is not embedding SVG files correctly #289
Comments
More info: I entered the worker container and ran the monolith command above. From within the worker container, the SVG files are not pulled correctly. From outside the container, the SVG files are pulled and embedded correctly. Inside the container, the resulting HTML file is 14.2 MB, while outside the container the resulting HTML file is 17.6 MB. The monolith logs from inside and outside the container are identical.
Hm, I just ran the same test and it works fine for me.
The file size is 17.58 MB after downloading it in the container.
I just ran the command in the worker container again and it worked this time. I got the full 17.6 MB capture from inside the container, which is different from a few days ago. So I deleted my capture in Hoarder and recreated it. The problem persists. That's got me scratching my head. To make sure that it wasn't a render issue, I deleted the asset.bin and replaced it with my manually captured "svgtest.htm" file (renamed), and it renders just fine, as if it had been captured correctly. It's possibly a network/firewall issue, but I'm a network person and we are having no other issues that I can detect. Can you try hoarding that page through the web interface and see if you get the full 17.6 MB? If you do, then I'll know that I'm having a local issue.
OK, I tried it out and can confirm that the images are not shown correctly in Hoarder. According to the code, this is the command line that is used:
I modified your command to this:
This also works fine (please double-check) and the file contains the proper images. (I did not provide the baseUrl, but I think that is fine.) The thing is that the code actually passes the HTML from the previous crawling step into monolith instead of providing the URL, which seems to cause this issue. Edit: OK, I tried piping the HTML of the page to monolith directly, and the outcome is different: the SVG is no longer captured. I guess we should simply make a new request to the page to get all the resources properly.
Monolith doesn't execute JavaScript. So if you have a pure SPA, monolith will see an empty page. That's why you want Chrome to run the JavaScript and load the page first, then pass the final HTML to monolith.
OK, it turns out this is caused by the basePath we are passing to monolith. I need to figure out a way to pass the correct path to it.
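To illustrate why the base path matters when monolith receives HTML instead of a URL, here is a minimal Python sketch of relative URL resolution. It is not Hoarder's or monolith's code, and the thread does not say which base value was actually being passed; the page URL, the site-root base, and the `resolve_asset` helper below are all hypothetical, chosen only to show how the same relative `src` resolves to different locations depending on the base.

```python
from urllib.parse import urljoin


def resolve_asset(base_url: str, src: str) -> str:
    """Resolve a (possibly relative) asset reference against a base URL."""
    return urljoin(base_url, src)


# Hypothetical values, for illustration only.
page_url = "https://example.com/docs/guide/index.html"  # full URL of the crawled page
site_root = "https://example.com/"                      # an incorrect, too-short base
src = "images/logo.svg"                                 # relative reference found in the HTML

# With the page URL as base, the SVG resolves next to the page.
resolved_with_page_url = resolve_asset(page_url, src)

# With a different base, the same reference points somewhere else,
# so the fetch fails and the image ends up broken in the archive.
resolved_with_site_root = resolve_asset(site_root, src)

print(resolved_with_page_url)
print(resolved_with_site_root)
```

If the base does not match the page's real location, relative references such as the SVG paths can resolve to URLs that do not exist, which would be consistent with the broken image icons described in this issue.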
Yes, this works fine from inside the container. Strangely, this file is smaller, but still complete.
passing in the URL of the page to have the proper URL for resolving relative paths
Monolith is capable of embedding SVG files in the output HTML, but when I hoard a page with Hoarder that includes SVG images, the monolith output is broken.
Here's an example page that fails:
If I grab this page with Hoarder, the archive is broken. Anywhere that there was an SVG there is a broken image icon.
However, if I grab it with monolith manually, the output HTML file is correct. Here's my monolith command line:
My testing was done with monolith 2.8.1, downloaded directly from their GitHub here: