HTML Parser

A .NET library that simplifies parsing of HTML documents.

Why

I needed to split HTML code into separate tags. Because HTML is less strict than XML, using XmlDocument from the .NET library was not an option, so I created this library.

What

In its current state, the library can extract all the tags from the source HTML code into a list of TagItem objects, each of which holds:
  • the "name" of the tag (like div, a, span, /div, /a, /span, etc.)
  • a list of AttributeItem objects, each representing one "key/value" pair for an attribute provided with the tag (like href/http://www.example.com, title/Title, etc.)
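
The shape of the result described above can be pictured roughly as follows. This is a minimal sketch in Python, not the library's own .NET code. Only the names TagItem and AttributeItem come from the description; the field names (name, attributes, key, value) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeItem:
    key: str    # attribute name, e.g. "href" (field name is an assumption)
    value: str  # attribute value, e.g. "http://www.example.com"


@dataclass
class TagItem:
    name: str  # tag name; closing tags keep the slash, e.g. "/a"
    attributes: List[AttributeItem] = field(default_factory=list)


# Hand-built result for: <a href="http://www.example.com" title="Title">link</a>
tags = [
    TagItem("a", [AttributeItem("href", "http://www.example.com"),
                  AttributeItem("title", "Title")]),
    TagItem("/a"),
]
```

Note that the text between the tags ("link") does not appear in the list, since only tags are extracted.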

How

All the general parsing is done with regular expressions, using the .NET Regex class. This seemed to be the simplest approach, and it is fast enough.
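
To illustrate the general idea of regex-based tag extraction, here is a small sketch in Python. It is not the library's code: the two patterns and the function name extract_tags are assumptions chosen for this example, and the actual expressions used by the library may differ. The sketch ignores comments, scripts, and attribute values containing ">".

```python
import re

# Opening or closing tag: optional "/", a name, then the raw attribute text.
TAG_RE = re.compile(r"<\s*(/?[A-Za-z][A-Za-z0-9]*)([^<>]*)>")
# key="value", key='value' or key=value inside the raw attribute text.
ATTR_RE = re.compile(
    r"""([^\s=/"'<>]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))"""
)


def extract_tags(html):
    """Return a list of (tag_name, [(key, value), ...]) in document order."""
    result = []
    for tag in TAG_RE.finditer(html):
        name, raw_attrs = tag.group(1), tag.group(2)
        attrs = []
        for m in ATTR_RE.finditer(raw_attrs):
            # Exactly one of the three value groups matched; the rest are None.
            value = next(g for g in m.groups()[1:] if g is not None)
            attrs.append((m.group(1), value))
        result.append((name.lower(), attrs))
    return result
```

For the input `<div class=box><a href="http://www.example.com">link</a></div>`, this yields the tags div, a, /a, /div in that order, with one attribute each on div and a.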



Enjoy! If you have any questions, comments, or suggestions, do not hesitate to contact me. They are very welcome.
[LHA]

Last edited Mar 6, 2010 at 4:52 AM by lukash, version 2