Problem getting HTML source of page.

PhillHS
Posts: 7
Joined: Sun Mar 05, 2023 5:27 pm

Post by PhillHS »

Hi all,

I'm having problems getting a page's HTML source. I start the load with TChromium.LoadURL and use an OnLoadEnd handler that is basically copied from the MiniBrowser demo:

Code:

procedure TBrowserForm.ChromeLoadEnd(Sender: TObject;
  const browser: ICefBrowser; const frame: ICefFrame;
  httpStatusCode: Integer);
begin
  Debug.Log('ChromeLoadEnd() code=%d', [httpStatusCode]);

  // Only react to the main frame of our own browser. Testing frame for nil
  // first avoids dereferencing a nil interface in frame.IsMain.
  if (browser <> nil) and (browser.Identifier = Chrome.BrowserId) and
     (frame <> nil) and frame.IsMain then
  begin
    // Asynchronous: ChromeCallbackGetSource is called later with the source.
    Chrome.Browser.MainFrame.GetSourceProc(ChromeCallbackGetSource);
    ChromeBusy := False;
  end;
end;

procedure ChromeCallbackGetSource(const src: ustring);
begin
  BrowserForm.FChromeSrc:=src;

  Debug.Log(src);

  IF (BrowserForm.Handlers.IsHTMLHandler) THEN
    BrowserForm.GoHtml(TRUE,src)
  ELSE
    BrowserForm.Handlers.ChromeDone(src);

  BrowserForm.Ready:=TRUE;
end;

What turns up in src is only a couple of lines.

So I tried retrieving the source of each frame using:

Code:

procedure TBrowserForm.Button2Click(Sender: TObject);

VAR
  i          : NativeUInt;

  TempCount  : NativeUInt;
  TempArray  : TCefFrameIdentifierArray;
  TempString : string;

begin
  TempCount := Chrome.FrameCount;

  if Chrome.GetFrameIdentifiers(TempCount, TempArray) then
  begin
    TempString := '';
    i          := 0;

    while (i < TempCount) do
    begin
      TempString := TempString + inttostr(TempArray[i]) + CRLF;
      Debug.Log('FrameID:%d',[TempArray[i]]);
      Chrome.RetrieveHTML(TempArray[i]);
      inc(i);
    end;

    Debug.Log(TempString);
  end;
end;                                                                       

procedure TBrowserForm.ChromeTextResultAvailable(Sender: TObject;
  const aText: ustring);
begin
  // Receives the text requested by Chrome.RetrieveHTML.
  Debug.Log(aText);
end;

Again, this produces only a couple of lines of source, certainly not the source for the full page.

The odd thing is that if I right-click on the browser output window with the loaded page and select "View page source", a Notepad window opens with the full source of the page. What am I doing wrong?

Cheers.

Phill.
salvadordf
Posts: 4620
Joined: Thu Feb 02, 2017 12:24 pm
Location: Spain

Re: Problem getting HTML source of page.

Post by salvadordf »

Hi Phill,

The ICefFrame.GetSourceProc and ICefFrame.GetSource methods are used to get the HTML source for that specific frame.

The MiniBrowser demo uses the TChromiumCore.RetrieveHTML procedure and the TChromiumCore.OnTextResultAvailable event to get the source, but the DOMVisitor demo has two other methods in its context menu that visit the DOM and get the HTML source.

Try these context menu options and see if they give better results for your application:
  • Visit DOM in CEF (BODY HTML)
  • Visit DOM using JavaScript

Some websites are not fully loaded when the TChromiumCore.OnLoadEnd event is triggered. In those cases, try the TChromiumCore.OnLoadingStateChange event and request the source only once its isLoading value is False.
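
The OnLoadingStateChange suggestion could be sketched roughly as follows, reusing the names from Phill's post (TBrowserForm, Chrome, Debug.Log). The handler name ChromeLoadingStateChange is hypothetical and is assumed to be assigned to Chrome.OnLoadingStateChange. This is an untested fragment of the form unit, not a complete program:

```pascal
// Hypothetical handler, assumed to be assigned to Chrome.OnLoadingStateChange.
procedure TBrowserForm.ChromeLoadingStateChange(Sender: TObject;
  const browser: ICefBrowser; isLoading, canGoBack, canGoForward: Boolean);
begin
  // Request the source only after the browser reports that loading has stopped.
  if not isLoading then
    Chrome.RetrieveHTML; // no argument: main frame
end;

// The requested text arrives here asynchronously.
procedure TBrowserForm.ChromeTextResultAvailable(Sender: TObject;
  const aText: ustring);
begin
  Debug.Log(aText);
end;
```

Note that isLoading also becomes False when a load is cancelled or fails, so the handler may want to check for that before treating the text as the finished page.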