Best approach for a server-side HTML to image/PDF converter?

Eric
Posts: 12
Joined: Thu Dec 10, 2020 10:48 am

Post by Eric »

(followup from https://github.com/salvadordf/CEF4Delphi/issues/329)

Hi,

I am experimenting with using CEF4Delphi server-side to convert HTML to an image or PDF, as a replacement for outdated webkit-based tools like phantomjs or wkhtmltopdf.

What would be the simplest and safest way: a browser in OSR mode or one in normal mode?
Unlike in the demos, there would never be a visible TForm, and it would run from the command line.

Just looking for pointers in the right direction, in case one option is a known dead end...

To illustrate the issues, one use case would be the website screenshots on https://beginend.net, where phantomjs is used right now. It does not support ECMAScript 6, so it breaks on any "modern" website.

In that use case, the screenshot is generated through phantomjs scripting: the url is loaded, the script waits a bit to give the page time to load, then performs some DOM tricks (like hiding cookie banners...) before finally taking a screenshot.
salvadordf
Posts: 4056
Joined: Thu Feb 02, 2017 12:24 pm
Location: Spain

Post by salvadordf »

Hi,

Sorry for asking you to move this conversation to the forum, but I prefer to keep the GitHub issues for bugs only.

The right approach would be to use a browser in "off-screen" mode (OSR) because it can be used without a real user interface.

The trick would be to add this browser to a console application, because many methods in Chromium are asynchronous. The ConsoleBrowser demo does something similar, but it has a windowed browser in a DLL, which you don't need. You would also need to use a different EXE for the subprocesses.

I'll try to create a new demo with something similar to what you describe during the weekend.

Post by Eric »

Thanks! No problem about moving the conversation here!

I will look at automating the rest (cookie banner stuff, image conversion options...) once the basic demo exists and report the progress here, in case that is useful to other people.

There are other ways to do it (Selenium, Puppeteer, etc.) but they all involve a rather heavy infrastructure. Having everything controlled directly from an exe like phantomjs did is more straightforward.

Post by salvadordf »

Hi Eric,

Please download CEF4Delphi from GitHub and take a look at the new ConsoleBrowser2 demo.

It uses a browser in OSR mode and a different EXE for the subprocesses. All that is encapsulated in a thread and it's used in a console application.

Read the code comments for more information.

Post by Eric »

Thanks!

I had a little bit of trouble at first because setting FrameworkDirPath, ResourcesDirPath and LocalesDirPath in the main process is not enough: either the subprocess executable needs to be in the CEF directory, or SetCurrentDir must be called in the main process (the current directory carries over to the subprocess).
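
The SetCurrentDir workaround described above could be sketched roughly as follows. This is a minimal illustration, not code from the demo: it assumes the CEF binaries live in a hypothetical `cef` subfolder next to the executable, and `CefDir` is a made-up helper name.

```pascal
program SetCefCurrentDir;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: the directory expected to hold the CEF binaries.
// The 'cef' subfolder name is an assumption made for this illustration.
function CefDir: string;
begin
  Result := IncludeTrailingPathDelimiter(ExtractFilePath(ParamStr(0))) + 'cef';
end;

begin
  // Change the working directory before CEF is initialized, so that
  // child processes started afterwards inherit it.
  if not SetCurrentDir(CefDir) then
  begin
    WriteLn('CEF directory not found: ', CefDir);
    Halt(1);
  end;

  WriteLn('Working directory is now: ', GetCurrentDir);

  // ... set FrameworkDirPath, ResourcesDirPath and LocalesDirPath and
  // start CEF here, as the demo does ...
end.
```

Placing the subprocess executable inside the CEF directory, the other option mentioned above, would avoid changing the working directory of the main process at all.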

There is also an intermittent crash in TCEFBrowserThread.SaveSnapshotToFile when debugging the main executable. It occurs when the method is called with a Self value of nil, which the line below does not guard against:

    if (FBrowserInfoCS = nil) then exit;

It would be trivial to guard against a nil Self here, but I am not sure whether that would just sweep another issue under the proverbial rug.

The call stack when that happens is:

    uCEFBrowserThread.TCEFBrowserThread.SaveSnapshotToFile('snapshot.bmp')
    uEncapsulatedBrowser.TEncapsulatedBrowser.Thread_OnSnapshotAvailable(???)
    uCEFBrowserThread.TCEFBrowserThread.WebpagePostProcessing
    uCEFBrowserThread.TCEFBrowserThread.Execute
    :0066a2ba TEncapsulatedBrowser.Thread_OnSnapshotAvailable + $E
    :0040a60a ThreadWrapper + $2A
    :77c1fa29 KERNEL32.BaseThreadInitThunk + 0x19
    :77dc75f4 ntdll.RtlGetAppContainerNamedObjectPath + 0xe4
    :77dc75c4 ntdll.RtlGetAppContainerNamedObjectPath + 0xb4

It occurs on the second call to SaveSnapshotToFile (there is a first, "correct" call before it).
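
For reference, the "trivial" guard mentioned above might look like the sketch below. `TSnapshotWorker` is a hypothetical stand-in class, not the real TCEFBrowserThread, and the method body is reduced to the guard itself. It relies on the method being static (non-virtual), so the call can be made without a valid instance. As noted above, a guard like this only hides the symptom and does not explain why the reference is nil in the first place.

```pascal
program NilSelfGuard;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for the browser thread class; only the guard matters.
  TSnapshotWorker = class
  private
    FBrowserInfoCS: TObject;
  public
    function SaveSnapshotToFile(const aPath: string): Boolean;
  end;

function TSnapshotWorker.SaveSnapshotToFile(const aPath: string): Boolean;
begin
  Result := False;

  // Testing Self first means the field is never read through a nil instance
  // (Boolean short-circuit evaluation is the compiler default).
  if (Self = nil) or (FBrowserInfoCS = nil) then Exit;

  // The real saving code would go here.
  Result := True;
end;

var
  Worker: TSnapshotWorker;
begin
  Worker := nil;

  // The method is not virtual, so calling it on a nil reference reaches
  // the guard instead of failing at the call site.
  if not Worker.SaveSnapshotToFile('snapshot.bmp') then
    WriteLn('Snapshot skipped: no valid browser thread instance');
end.
```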

Post by Eric »

I have run some tests using the same executable for the subprocess (switching between main and subprocess behavior based on command line parameters), and it seems to work fine.

Is there a hidden reason why it should not be done?

I am asking because I have several times seen the subprocess executable flagged by antivirus software, I guess because the subprocess exe does not do "enough" and gets flagged by heuristics. This happens on and off when building the demos (I have reported them as false positives, with varying success).

FWIW, my current hack of your demo is at https://github.com/EricGrange/cefHtmlSnapshot
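
Switching roles on command line parameters, as described above, could be sketched like this. It is only an illustration under one assumption: that the subprocess is recognized by the `--type=` switch that Chromium passes to its helper processes. The function names are hypothetical and the linked repository may do this differently.

```pascal
program SingleExeDispatch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// True when a single argument looks like Chromium's subprocess marker,
// for example '--type=renderer'.
function IsSubprocessArg(const aArg: string): Boolean;
begin
  Result := Copy(aArg, 1, 7) = '--type=';
end;

// True when any command line argument marks this process as a subprocess.
function IsSubprocess: Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := 1 to ParamCount do
    if IsSubprocessArg(ParamStr(i)) then
    begin
      Result := True;
      Exit;
    end;
end;

begin
  if IsSubprocess then
    // Subprocess role: hand control to CEF and exit when it returns.
    WriteLn('Running as CEF subprocess')
  else
    // Main role: parse options, load the page and take the snapshot.
    WriteLn('Running as main process');
end.
```

With a single executable, the check has to run before any of the main-process work starts, so that a helper process never begins loading pages itself.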

Post by salvadordf »

Thanks for reporting this issue! :)

I saw some possible causes of that error and I'll upload the new version as soon as I fix it for Lazarus.

Post by salvadordf »

The antivirus warning is a known false positive.
Sadly, some dishonest people use CEF too and some antiviruses don't have the best detection algorithm.

Post by salvadordf »

I just uploaded a new version with some fixes, more checks and more code comments.

Please download CEF4Delphi again from GitHub.

Post by Eric »

Thanks, works like a charm now.

I have adapted TakeSnapshot for PrintToPDF purposes, and it seems to work as well.

Now on to busting those cookie banners... :D