Best approach for a server-side HTML to image/pdf converter ?
(followup from https://github.com/salvadordf/CEF4Delphi/issues/329)
Hi,
I am experimenting with using CEF4Delphi server-side to convert HTML to an image or PDF, as a replacement for outdated webkit-based tools like phantomjs or wkhtmltopdf.
What would be the simplest / safest way: an OSR-mode browser or a normal-mode one?
The difference from the demos is that there would never be a visible TForm, and it would run as a command-line program.
Just looking for pointers in the right direction, in case one option is a known dead end...
To illustrate the issue, one use case is the website screenshots on https://beginend.net, where phantomjs is currently used. However, phantomjs does not support ECMAScript 6 (ES6), so it breaks on any "modern" website.
In that use case, the screenshot is generated through phantomjs scripting: the URL is loaded, the script waits a bit to give the page time to load, then performs some DOM tricks (like hiding cookie banners...) before finally taking a screenshot.
- salvadordf
- Posts: 4575
- Joined: Thu Feb 02, 2017 12:24 pm
- Location: Spain
- Contact:
Re: Best approach for a server-side HTML to image/pdf converter ?
Hi,
Sorry for asking you to move this conversation to the forum but I prefer to leave the GitHub issues for bugs only.
The right approach would be to use a browser in "off-screen" mode (OSR) because it can be used without a real user interface.
The tricky part is hosting this browser in a console application, because many methods in Chromium are asynchronous. The ConsoleBrowser demo does something similar, but it uses a windowed browser in a DLL, which you don't need. You would also need to use a different EXE for the subprocesses.
I'll try to create a new demo with something similar to what you describe during the weekend.
Re: Best approach for a server-side HTML to image/pdf converter ?
Thanks! no problem about moving the conversation here!
Once the basic demo exists, I will look at automating the rest (cookie banner handling, image conversion options...) and report my progress here, in case it is useful to other people.
There are other ways to do it (Selenium, Puppeteer, etc.), but they all involve rather heavy infrastructure. Having everything controlled directly from an exe, as phantomjs did, is more straightforward.
- salvadordf
Re: Best approach for a server-side HTML to image/pdf converter ?
Hi Eric,
Please download CEF4Delphi from GitHub and take a look at the new ConsoleBrowser2 demo.
It uses a browser in OSR mode and a different EXE for the subprocesses. All that is encapsulated in a thread and it's used in a console application.
Read the code comments for more information.
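A note on the general pattern (this is not code from ConsoleBrowser2): because Chromium reports results through asynchronous callbacks, a console program has to park its main thread until the work is finished, otherwise it simply exits. The sketch below shows that idea in plain Delphi/Free Pascal with no CEF calls at all. TFakeSnapshotThread and RunSnapshotAndWait are hypothetical names; the worker thread stands in for the thread that owns the OSR browser, and it signals a TEvent at the point where real code would react to a CEF callback. The 200 ms "work" and the 5-second timeout are arbitrary assumptions.

```pascal
program WaitForAsyncResult;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SysUtils, Classes, SyncObjs;

type
  // Hypothetical stand-in for a thread that owns an OSR browser: it does
  // some work asynchronously and signals an event when the result is ready.
  TFakeSnapshotThread = class(TThread)
  private
    FDone: TEvent;
    FResultText: string;
  protected
    procedure Execute; override;
  public
    constructor Create(ADone: TEvent);
    property ResultText: string read FResultText;
  end;

constructor TFakeSnapshotThread.Create(ADone: TEvent);
begin
  FDone := ADone;           // set before the thread can start running
  inherited Create(False);  // not suspended: starts after construction
end;

procedure TFakeSnapshotThread.Execute;
begin
  Sleep(200);                     // pretend the page is loading/rendering
  FResultText := 'snapshot.bmp';  // pretend a file was written
  FDone.SetEvent;                 // wake up the waiting console thread
end;

// Hypothetical helper: start the fake worker, then block (with a timeout)
// until it signals completion. Returns '' if the timeout expires.
function RunSnapshotAndWait(TimeoutMs: Cardinal): string;
var
  Done: TEvent;
  Worker: TFakeSnapshotThread;
begin
  Result := '';
  Done := TEvent.Create(nil, True, False, '');  // manual reset, not signaled
  try
    Worker := TFakeSnapshotThread.Create(Done);
    try
      if Done.WaitFor(TimeoutMs) = wrSignaled then
        Result := Worker.ResultText;
    finally
      Worker.Free;  // TThread's destructor terminates and waits for the thread
    end;
  finally
    Done.Free;      // freed only after the worker has finished with it
  end;
end;

begin
  WriteLn('snapshot result: ', RunSnapshotAndWait(5000));
end.
```

Keeping a timeout is worthwhile in a server-side tool, so that a page that never finishes loading cannot hang the process forever.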
Re: Best approach for a server-side HTML to image/pdf converter ?
Thanks!
I had a little bit of trouble at first, because setting FrameworkDirPath, ResourcesDirPath & LocalesDirPath in the main process is not enough: the subprocess either needs to be in the CEF directory, or SetCurrentDir must be called in the main process (the current directory carries over to the subprocess).
There is also an intermittent crash in TCEFBrowserThread.SaveSnapshotToFile when debugging the main executable. It occurs when the method is called with a Self value of nil, and the line below does not guard against that:
Code:
if (FBrowserInfoCS = nil) then exit;
It would be trivial to guard against a nil Self here, but I am not sure whether that would just be sweeping another issue under the proverbial rug.
The call stack when that happens is:
Code:
uCEFBrowserThread.TCEFBrowserThread.SaveSnapshotToFile('snapshot.bmp')
uEncapsulatedBrowser.TEncapsulatedBrowser.Thread_OnSnapshotAvailable(???)
uCEFBrowserThread.TCEFBrowserThread.WebpagePostProcessing
uCEFBrowserThread.TCEFBrowserThread.Execute
:0066a2ba TEncapsulatedBrowser.Thread_OnSnapshotAvailable + $E
:0040a60a ThreadWrapper + $2A
:77c1fa29 KERNEL32.BaseThreadInitThunk + 0x19
:77dc75f4 ntdll.RtlGetAppContainerNamedObjectPath + 0xe4
:77dc75c4 ntdll.RtlGetAppContainerNamedObjectPath + 0xb4
It occurs on a second call to SaveSnapshotToFile (there is a first, "correct" call before it).
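Why a nil Self does not crash immediately, and why the quoted line cannot catch it: calling a non-virtual method through a nil reference simply enters the method with Self = nil, and the first field access (here FBrowserInfoCS) then dereferences nil. The plain Delphi/Free Pascal sketch below uses a hypothetical TSnapshotWorker class, with FLock standing in for FBrowserInfoCS, to show a guard that tests Self before touching any field. As suspected above, such a guard only prevents the access violation; it does not explain why the caller was holding a nil reference in the first place.

```pascal
program NilSelfGuard;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical stand-in for TCEFBrowserThread; FLock plays the role of
  // FBrowserInfoCS.
  TSnapshotWorker = class
  private
    FLock: TObject;
  public
    constructor Create;
    destructor Destroy; override;
    function SaveSnapshot(const AFileName: string): Boolean;
  end;

constructor TSnapshotWorker.Create;
begin
  inherited Create;
  FLock := TObject.Create;
end;

destructor TSnapshotWorker.Destroy;
begin
  FLock.Free;
  inherited Destroy;
end;

function TSnapshotWorker.SaveSnapshot(const AFileName: string): Boolean;
begin
  // A non-virtual method can be entered through a nil reference: Self is
  // simply nil. Reading FLock would then dereference nil and raise an
  // access violation, so Self is tested first.
  if (Self = nil) or (FLock = nil) then
    Exit(False);
  // ... real code would save the bitmap to AFileName here ...
  Result := True;
end;

var
  Worker, Missing: TSnapshotWorker;
begin
  Missing := nil;
  Worker := TSnapshotWorker.Create;
  try
    WriteLn('valid reference: ', BoolToStr(Worker.SaveSnapshot('snapshot.bmp'), True));
    WriteLn('nil reference:   ', BoolToStr(Missing.SaveSnapshot('snapshot.bmp'), True));
  finally
    Worker.Free;
  end;
end.
```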
Re: Best approach for a server-side HTML to image/pdf converter ?
I have run some tests using the same executable for the subprocess (switching between main-process and subprocess behavior based on command-line parameters), and it seems to work fine.
Is there a hidden reason why it should not be done?
One reason to do it is that I have seen the subprocess executable flagged by antivirus software several times. I guess that is because the subprocess exe does not do "enough" and gets flagged by heuristics. This happens on and off when building the demos (I have reported them as false positives, with varying success).
FWIW, my current hacking of your demo is at https://github.com/EricGrange/cefHtmlSnapshot
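A minimal sketch of the single-EXE idea, assuming the role is decided from the command line as described above. Chromium launches its helper processes with a "--type=<role>" switch (for example "renderer" or "gpu-process") and starts the browser process without one, so the absence of that switch identifies the main process. ChromiumProcessType is a hypothetical helper and the WriteLn calls are placeholders. In CEF4Delphi the usual single-EXE pattern is to create GlobalCEFApp and call GlobalCEFApp.StartMainProcess, which is meant to return True only in the main process (check the demos for your CEF4Delphi version); a check like this one is mainly useful to keep your own code (option parsing, output files) from running in the subprocesses.

```pascal
program ProcessRole;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper. Chromium starts its helper processes with a
// "--type=<role>" switch (renderer, gpu-process, utility...), while the
// browser (main) process is started without one.
function ChromiumProcessType(const Args: array of string): string;
const
  Prefix = '--type=';
var
  I: Integer;
begin
  Result := '';  // empty string means "main process"
  for I := Low(Args) to High(Args) do
    if Copy(Args[I], 1, Length(Prefix)) = Prefix then
      Exit(Copy(Args[I], Length(Prefix) + 1, MaxInt));
end;

var
  Args: array of string;
  I: Integer;
  Role: string;
begin
  // Copy the command line into an array so the helper stays testable.
  SetLength(Args, ParamCount);
  for I := 1 to ParamCount do
    Args[I - 1] := ParamStr(I);

  Role := ChromiumProcessType(Args);
  if Role = '' then
    WriteLn('main process: parse the snapshot options and create the browser')
  else
    WriteLn('CEF subprocess of type "', Role, '": hand control to CEF only');
end.
```

Running it as "ProcessRole.exe --type=renderer" takes the subprocess branch; running it with a URL or with no arguments takes the main-process branch.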
- salvadordf
Re: Best approach for a server-side HTML to image/pdf converter ?
Thanks for reporting this issue!
I saw some possible causes of that error and I'll upload the new version as soon as I fix it for Lazarus.
- salvadordf
Re: Best approach for a server-side HTML to image/pdf converter ?
The antivirus warning is a known false positive.
Sadly, some dishonest people use CEF too, and some antivirus products don't have the best detection algorithms.
- salvadordf
Re: Best approach for a server-side HTML to image/pdf converter ?
I just uploaded a new version with some fixes, more checks and more code comments.
Please, download CEF4Delphi again from GitHub.
Re: Best approach for a server-side HTML to image/pdf converter ?
Thanks, works like a charm now.
I have adapted TakeSnapshot for PrintToPDF purposes, and it seems to work as well.
Now on to busting those cookie banners...