HTML parsing

Post by **PioPio** » Mon Jan 08, 2018 10:49 am

Hello,

I am trying to extract some information from HTML pages. Looking at the DOMVisitor example, I can see that the GetElementById method can isolate the part of an HTML page that contains the information I need.
However, the HTML isolated this way still needs to be parsed, because it contains nested tables, tags and so on. I was wondering which of the following is the best way to achieve this:

1. Iterating the DOM elements via IHTMLDocument2, as in this example: https://stackoverflow.com/questions/143 ... ag-parsing

2. Using an HTML parser. I didn't find anything for this in CEF4Delphi (I suppose because this is not the purpose of CEF4Delphi). Is there anything open source you would recommend for HTML parsing?

Many thanks
Alberto

Post by **salvadordf** » Mon Jan 08, 2018 11:09 am

I don't know which is the best, but you can try these too:

https://github.com/ying32/htmlparser
https://github.com/biznow/htmlparser
https://github.com/CyberShadow/HTMLParser
http://htmlp.sourceforge.net/
https://github.com/flyingtime/delphi-html-parser
http://wiki.delphi-jedi.org/wiki/JVCL_H ... HTMLParser
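
To illustrate the "HTML parser" option from the question, here is a minimal sketch of how an event-based parser can pull cell text out of a fragment that contains nested tables. It is not Delphi and does not use CEF4Delphi or any of the libraries linked above; it uses Python's standard `html.parser` module only to show the general technique. The names `NestedTableExtractor` and `extract_tables` and the sample markup are hypothetical, and the fragment is assumed to be a string such as the markup of the node returned by GetElementById.

```python
from html.parser import HTMLParser


class NestedTableExtractor(HTMLParser):
    """Collect cell text per table, keeping nested tables separate."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables as (depth, rows)
        self._stack = []   # one frame per currently open <table>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # A new table starts; any nested table gets its own frame.
            self._stack.append({"rows": [], "cell": None})
        elif self._stack:
            frame = self._stack[-1]
            if tag == "tr":
                frame["rows"].append([])
            elif tag in ("td", "th"):
                if not frame["rows"]:
                    # Tolerate a cell that appears without an explicit <tr>.
                    frame["rows"].append([])
                frame["cell"] = []

    def handle_data(self, data):
        # Text belongs to the open cell of the innermost open table.
        if self._stack and self._stack[-1]["cell"] is not None:
            self._stack[-1]["cell"].append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        frame = self._stack[-1]
        if tag in ("td", "th") and frame["cell"] is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(frame["cell"]).split())
            frame["rows"][-1].append(text)
            frame["cell"] = None
        elif tag == "table":
            self._stack.pop()
            # Depth 1 is an outermost table, 2 is a table inside it, etc.
            self.tables.append((len(self._stack) + 1, frame["rows"]))


def extract_tables(fragment):
    """Return a list of (depth, rows) tuples, inner tables first."""
    parser = NestedTableExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    sample = (
        '<div id="info"><table>'
        "<tr><th>Name</th><th>Details</th></tr>"
        "<tr><td>Widget</td><td>"
        "<table><tr><td>Price</td><td>9.99</td></tr></table>"
        "</td></tr></table></div>"
    )
    for depth, rows in extract_tables(sample):
        print(depth, rows)
```

Keeping one frame per open table means the text of an inner table is not mixed into the cell of the outer table that contains it. The same stack idea should carry over to any parser that reports start tags, end tags and text, or to a recursive walk over DOM nodes.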

