Problem getting HTML source of page.

Posted: Sat Mar 22, 2025 12:00 am
by PhillHS
Hi all,

I'm having problems getting a page's HTML source. I start the load with TChromium.LoadURL and use an OnLoadEnd callback that is basically copied from the MiniBrowser demo:

Code:

procedure TBrowserForm.ChromeLoadEnd(Sender: TObject;
  const browser: ICefBrowser; const frame: ICefFrame;
  httpStatusCode: Integer);
begin
  Debug.Log('ChromeLoadEnd() code=%d',[httpStatusCode]);
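  // Request the source only when the main frame of this browser has finished loading.
  // frame is checked for nil before IsMain is read, to avoid dereferencing a nil interface.
  if (browser <> nil) and (browser.Identifier = Chrome.BrowserId) and
     (frame <> nil) and frame.IsMain then
  begin
    Chrome.Browser.MainFrame.GetSourceProc(ChromeCallbackGetSource);
    ChromeBusy := False;
  end;
end;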

procedure ChromeCallbackGetSource(const src: ustring);
begin
  BrowserForm.FChromeSrc:=src;

  Debug.Log(src);

  IF (BrowserForm.Handlers.IsHTMLHandler) THEN
    BrowserForm.GoHtml(TRUE,src)
  ELSE
    BrowserForm.Handlers.ChromeDone(src);

  BrowserForm.Ready:=TRUE;
end;                                                                           
What turns up in the src is only a couple of lines.

So I tried retrieving the source of each frame instead, using:

Code:

procedure TBrowserForm.Button2Click(Sender: TObject);

VAR
  i          : NativeUInt;

  TempCount  : NativeUInt;
  TempArray  : TCefFrameIdentifierArray;
  TempString : string;

begin
  TempCount := Chrome.FrameCount;

  if Chrome.GetFrameIdentifiers(TempCount, TempArray) then
  begin
    TempString := '';
    i          := 0;

    while (i < TempCount) do
    begin
      TempString := TempString + inttostr(TempArray[i]) + CRLF;
      Debug.Log('FrameID:%d',[TempArray[i]]);
      Chrome.RetrieveHTML(TempArray[i]);
      inc(i);
    end;

    Debug.Log(TempString);
  end;
end;                                                                       

procedure TBrowserForm.ChromeTextResultAvailable(Sender: TObject;
  const aText: ustring);

begin
  Debug.Log(aText);
end;
Again, this produces only a couple of lines of source, certainly not the source of the full page.

The odd thing is that if I right-click on the browser output window with the page loaded and select "View page source", a Notepad window opens with the full source of the page. What am I doing wrong?

Cheers.

Phill.

Re: Problem getting HTML source of page.

Posted: Sat Mar 22, 2025 2:46 pm
by salvadordf
Hi Phill,

The ICefFrame.GetSourceProc and ICefFrame.GetSource methods are used to get the HTML source for that specific frame.

The MiniBrowser demo uses the TChromiumCore.RetrieveHTML procedure and the TChromiumCore.OnTextResultAvailable event to get the source, but the DOMVisitor demo has two other options in its context menu that visit the DOM to get the HTML source.

Try these context menu options and see if they give better results for your application:
  • Visit DOM in CEF (BODY HTML)
  • Visit DOM using JavaScript
Some websites are not fully loaded when the TChromiumCore.OnLoadEnd event is triggered. In those cases, try using the TChromiumCore.OnLoadingStateChange event instead and check its isLoading value before requesting the source.
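
As a rough sketch of that last suggestion, the handlers could look something like the code below. It assumes the TChromium component is called Chrome and the form is TBrowserForm, as in the first post, and that the two handlers are assigned to Chrome.OnLoadingStateChange and Chrome.OnTextResultAvailable. The handler name ChromeLoadingStateChange is only illustrative, and Debug.Log is the logging helper from the first post. This is untested here and not a complete unit.

```pascal
// Sketch only: assigned to Chrome.OnLoadingStateChange (handler name is illustrative).
procedure TBrowserForm.ChromeLoadingStateChange(Sender: TObject;
  const browser: ICefBrowser; isLoading, canGoBack, canGoForward: Boolean);
begin
  // isLoading becomes False when the browser reports that loading has stopped.
  if (browser <> nil) and (browser.Identifier = Chrome.BrowserId) and
     (not isLoading) then
    Chrome.RetrieveHTML;  // asynchronous: the text arrives in OnTextResultAvailable
end;

// Assigned to Chrome.OnTextResultAvailable, same signature as in the first post.
procedure TBrowserForm.ChromeTextResultAvailable(Sender: TObject;
  const aText: ustring);
begin
  Debug.Log(aText);
end;
```

Pages that keep building their content with JavaScript after loading has stopped may still return incomplete HTML this way. In that case the DOMVisitor options listed above are the other thing to try.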