Get access to the DOM

Post by Pcrepair »

Good afternoon, salvadordf.

My program has to load arbitrary web pages and, once their JavaScript has run, collect the resulting set of tags so it can search them for specific data.
This was simple before pages relied on JavaScript, but that was a long time ago.

After studying CEF4Delphi, I came to the following conclusions about how it works:
1) TChromium loads the HTML of the page, including its JavaScript, from the server;
2) TChromium executes the JavaScript found in the page;
3) TChromium builds the Document Object Model of the page. At this point you can inject your own scripts via browser.MainFrame.ExecuteJavaScript();
(*TChromium puts together all browser procedures, functions, properties and events in one place, to create, modify and destroy a web browser.*)

4) TCEFWindowParent displays the rendered result to the (human) user.
(*The TCEFWindowParent component is used in VCL with a TChromium component to embed a web browser in normal mode. Browsers in normal mode let CEF create some native child controls to show the web contents in them. TCEFWindowParent inherits from TCEFWinControl and it's used as the parent of those child controls. TCEFWindowParent also controls the size of those child controls.*)

Building a DOM tree.
After receiving the page's HTML from the server and running its JavaScript, TChromium turns each tag into a node and arranges the nodes into a hierarchy.

Code:

<html>
<head>
  <link rel="stylesheet" href="style.css">
</head>
<body>
    <h1>How can I get DOM info?</h1>
</body>
</html>

The result is a node tree, or simply a DOM tree, in which nested elements are represented as child nodes with a full set of attributes.

html
  head
    link rel="stylesheet" href="style.css"
  body
    h1 How can I get DOM info?

Please let me know whether it is possible to get this node tree:
- as a set of tags;
- as a string that preserves line breaks (not the whole code on a single line);
- in the order in which the tags appear in the web page.

In short, I need the structure of the page in a simplified form for subsequent analysis.

If yes, please point me to code examples that use only "document: ICefDomDocument", without "ExecuteJavaScript()".

More generally, which of these is true:
1) "ExecuteJavaScript()" is the universal way of working with the DOM and is the better choice in all cases;
2) or working with "document: ICefDomDocument" can solve every problem, so I never need to go beyond Delphi?

I compiled and ran the following demos but did not find what I need, although I may have missed something.
CRBrowser.exe
CustomTitleBar.exe
DOMVisitor.exe
EditorBrowser.exe
ExternalPumpBrowser.exe
FullScreenBrowser.exe
JSDialogBrowser.exe
JSSimpleWindowBinding.exe
KioskOSRBrowser.exe
MDIBrowser.exe
MDIExternalPumpBrowser.exe
MediaRouter.exe
MiniBrowser.exe
OSRExternalPumpBrowser.exe
PopupBrowser2.exe
PostInspectorBrowser.exe
ResponseFilterBrowser.exe
SchemeRegistrationBrowser.exe
SimpleBrowser.exe
SimpleBrowser_sp.exe
SimpleBrowser2.exe
SimpleExternalPumpBrowser.exe
SimpleOSRBrowser.exe
SimpleServer.exe
TabbedBrowser2.exe
TabbedOSRBrowser.exe
TabBrowser.exe
TinyBrowser.exe
TinyBrowser2.exe
ToolBoxBrowser.exe
ToolBoxBrowser2.exe
ToolBoxSubProcessBrowser_sp.exe
URLRequest.exe
WebpageSnapshot.exe

In "DOMVisitor.exe" there is "SimpleDOMIteration", which outputs "TempChild.Name", but that is not enough.

Thanks.

Re: Get access to the DOM

Post by salvadordf »

CEF offers three ways to access the DOM:
  • Using JavaScript. This is the most powerful and the preferred way to access the DOM.
  • Using the CEF interfaces like ICefDomDocument and ICefDomNode. These interfaces are limited. Use them only when you need something really simple.
  • Using the DevTools protocol. This method is more powerful than the CEF interfaces but it can be complicated to use.
The DOMVisitor demo shows how to use all of them. Read the code comments in that demo, and read this document too if you decide to try the DevTools method:
https://chromedevtools.github.io/devtools-protocol/tot/DOM/

However, the CEF project maintainer always suggests using JavaScript to access the DOM.
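
As a rough illustration of the JavaScript route, the sketch below walks a DOM subtree and returns one line per element, indented by depth and in document order. It is not taken from the DOMVisitor demo: the function names outlineNode and outlineDocument are hypothetical, and it relies only on standard DOM properties (tagName, attributes, children, childNodes). How the resulting string gets back to the Delphi side is a separate step that is not shown here.

```javascript
// Hypothetical helper (not part of CEF4Delphi): builds an indented,
// one-element-per-line outline of a DOM subtree in document order.
function outlineNode(node, depth = 0, lines = []) {
  const indent = "  ".repeat(depth);

  // Attributes rendered as name="value" pairs, in their original order.
  const attrs = Array.from(node.attributes || [])
    .map((a) => `${a.name}="${a.value}"`)
    .join(" ");

  // Text that sits directly inside this element, not inside its children.
  const text = Array.from(node.childNodes || [])
    .filter((n) => n.nodeType === 3) // 3 is Node.TEXT_NODE
    .map((n) => n.nodeValue.trim())
    .filter(Boolean)
    .join(" ");

  const parts = [node.tagName.toLowerCase(), attrs, text].filter(Boolean);
  lines.push(indent + parts.join(" "));

  // Recurse into child elements only; text was already handled above.
  for (const child of Array.from(node.children || [])) {
    outlineNode(child, depth + 1, lines);
  }
  return lines;
}

// Joins the outline with line breaks so the structure stays readable.
function outlineDocument(root) {
  return outlineNode(root).join("\n");
}
```

Inside a page, the call would be outlineDocument(document.documentElement). Because the function only reads plain properties, it can also be tried outside a browser with simple stand-in objects.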