dhtmlparser3.tags.tag

class dhtmlparser3.tags.tag.Tag(name, parameters=None, content=None, is_non_pair=False)[source]

Bases: object

name

Name of the parsed tag.

Type

str

parameters

Dictionary for the parameters.

Type

SpecialDict

content

List of sub-elements.

Type

list

parent

Reference to parent element.

Type

Tag

property p: Dict[str, str]

Shortcut for .parameters, used extensively in tests.

property c

Shortcut for .content, used extensively in tests.

property tags: List[dhtmlparser3.tags.tag.Tag]

Same as .c, but returns only Tag instances. Useful for skipping whitespace and comment clutter when iterating over the actual DOM structure.

Make the DOM hierarchy double-linked. Each content element now points to the parent element.
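As an illustration of what double-linking means, here is a minimal, library-independent sketch. `Node` and the `double_link()` helper are hypothetical stand-ins written for this example; they are not dhtmlparser3's own classes or functions, and the library's implementation may differ.

```python
# Hypothetical stand-in for a DOM node; not dhtmlparser3's Tag class.
class Node:
    def __init__(self, name, content=None):
        self.name = name
        self.content = content if content is not None else []
        self.parent = None  # filled in by double_link()


def double_link(node):
    """Point every child Node back at its parent, recursively."""
    for child in node.content:
        if isinstance(child, Node):  # plain strings carry no parent reference
            child.parent = node
            double_link(child)


link = Node("a", ["click"])
paragraph = Node("p", ["text ", link])
root = Node("div", [paragraph])
double_link(root)

# Walk upwards from the innermost element to the root.
path = []
current = link
while current is not None:
    path.append(current.name)
    current = current.parent
```

Once the tree is linked in both directions, code holding only an inner element can reach its ancestors without searching from the root.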

content_without_tags() str[source]

Return content but remove all tags.

This is sometimes useful for processing messy websites.
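The idea of stripping tags while keeping text can be sketched without the library. `Node` and `text_only()` below are hypothetical names used only for this example, and the sketch assumes that content consists of strings and nested nodes; how the library treats comments or whitespace is not shown here.

```python
# Hypothetical stand-in for a DOM node; not dhtmlparser3's Tag class.
class Node:
    def __init__(self, name, content=None):
        self.name = name
        self.content = content if content is not None else []


def text_only(node):
    """Concatenate all string pieces below `node`, dropping the tags themselves."""
    parts = []
    for item in node.content:
        if isinstance(item, Node):
            parts.append(text_only(item))  # descend into nested tags
        else:
            parts.append(item)  # plain text is kept as-is
    return "".join(parts)


root = Node(
    "p",
    ["Hello ", Node("b", ["messy"]), " ", Node("i", [Node("u", ["web"])]), "!"],
)
text = text_only(root)
```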

remove(offending_item: Union[str, dhtmlparser3.tags.tag.Tag, dhtmlparser3.tags.comment.Comment]) bool[source]

Remove offending_item from anywhere in the DOM.

The item is matched by identity (the is operator), not by equality, so pass the exact object you obtained from .find() or another relevant method.

Returns

True if the item was found and removed.

Return type

bool
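The difference between identity and equality matching can be sketched as follows. This is a self-contained illustration, not the library's code: `Node` is a hypothetical stand-in, and its `__eq__` is defined only to show that an equal-looking object is still not removed.

```python
# Hypothetical stand-in for a DOM node; not dhtmlparser3's Tag class.
class Node:
    def __init__(self, name, content=None):
        self.name = name
        self.content = content if content is not None else []

    def __eq__(self, other):
        # Nodes with the same name and content compare equal.
        return (
            isinstance(other, Node)
            and self.name == other.name
            and self.content == other.content
        )

    def remove(self, offending_item):
        """Remove the element that *is* offending_item, searching recursively."""
        for index, item in enumerate(self.content):
            if item is offending_item:  # identity check, not ==
                del self.content[index]
                return True
            if isinstance(item, Node) and item.remove(offending_item):
                return True
        return False


first = Node("p")
second = Node("p")  # equal to `first`, but a different object
root = Node("div", [Node("span", [first]), second])

lookalike = Node("p")  # equal to both, yet never inserted into the tree
removed_lookalike = root.remove(lookalike)
removed_second = root.remove(second)
```

Here `removed_lookalike` is `False` because no element in the tree is the same object as `lookalike`, while `removed_second` is `True`.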

remove_item(item: Union[str, dhtmlparser3.tags.tag.Tag, dhtmlparser3.tags.comment.Comment])[source]

Remove the item from the .content property.

to_string() str[source]

Get HTML representation of the tag and the content.

tag_to_str() str[source]

Convert just the tag with parameters to string, without content.

content_str(escape=False) str[source]

Return everything in between the tags as string.

Parameters

escape (bool) – Escape the content. Default False.

replace_with(item: dhtmlparser3.tags.tag.Tag, keep_content: bool = False)[source]

Replace this Tag with another item.

Parameters
  • item (Tag, str) – Item to replace this with.

  • keep_content (bool) – Keep the original content. Default False.

wfind(name, p=None, fn=None, case_sensitive=False)[source]
match(*args)[source]

Recursively call find for each element in *args. This gives fuzzy matching, such as “find all <div> elements that contain this <p> element, which in turn contains this <a>”.

Example

dom.match("div", ["p", {"class": "great"}], "a")

Parameters

*args (list) – List of paths to match.

Returns

List of matched elements.

Return type

list
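The chained search described above can be sketched in plain Python. `Node`, `find()` and `match()` here are hypothetical stand-ins written for this example, and the sketch assumes that each step searches only the descendants of the previous step's results; the library's actual behaviour may differ in details.

```python
# Hypothetical stand-in for a DOM node; not dhtmlparser3's Tag class.
class Node:
    def __init__(self, name, params=None, content=None):
        self.name = name
        self.params = params or {}
        self.content = content or []


def find(node, name, params=None):
    """Depth-first search of all descendants matching name and parameters."""
    wanted = params or {}
    found = []
    for child in node.content:
        if not isinstance(child, Node):
            continue
        if child.name == name and all(
            child.params.get(k) == v for k, v in wanted.items()
        ):
            found.append(child)
        found.extend(find(child, name, wanted))
    return found


def match(node, *steps):
    """Apply find() step by step, each time inside the previous results."""
    current = [node]
    for step in steps:
        # A step is either "name" or ["name", {params}].
        name, params = (step, None) if isinstance(step, str) else step
        next_level = []
        for element in current:
            for hit in find(element, name, params):
                if not any(hit is seen for seen in next_level):
                    next_level.append(hit)
        current = next_level
    return current


a_one = Node("a", {"href": "/one"})
a_two = Node("a", {"href": "/two"})
a_three = Node("a", {"href": "/three"})
body = Node("body", content=[
    Node("div", content=[
        Node("section", content=[
            Node("p", {"class": "great"}, [Node("span", content=[a_one])]),
        ]),
    ]),
    Node("div", content=[Node("p", {"class": "plain"}, [a_two])]),
    a_three,
])

hrefs = [hit.params["href"] for hit in match(body, "div", ["p", {"class": "great"}], "a")]
```

Only `/one` is collected: intermediate wrappers such as `<section>` and `<span>` do not prevent a match, but the `<p>` with a different class and the `<a>` outside any `<div>` are skipped.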

match_paths(*args)[source]

Exactly match the path given by the arguments.

Example

dom.match_paths("body", ["div", {"class": "page-body"}], "p")

This will match the path only if it really goes like this. If the <p> is for example wrapped in <div>, it won’t be matched.

Parameters

*args (list) – List of paths to match.

Returns

List of matched elements.

Return type

list
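For contrast with fuzzy matching, exact path matching can be sketched as below. `Node` and this `match_paths()` function are hypothetical stand-ins, and the sketch assumes the path is anchored at the direct children of the starting node; how the library anchors the path is not stated in this reference.

```python
# Hypothetical stand-in for a DOM node; not dhtmlparser3's Tag class.
class Node:
    def __init__(self, name, params=None, content=None):
        self.name = name
        self.params = params or {}
        self.content = content or []


def matches(node, name, params):
    """True if the node has the given name and all requested parameters."""
    return node.name == name and all(
        node.params.get(k) == v for k, v in params.items()
    )


def match_paths(node, *steps):
    """Follow the steps through direct children only, with no skipped levels."""
    current = [node]
    for step in steps:
        # A step is either "name" or ["name", {params}].
        name, params = (step, {}) if isinstance(step, str) else step
        current = [
            child
            for element in current
            for child in element.content
            if isinstance(child, Node) and matches(child, name, params)
        ]
    return current


html = Node("html", content=[
    Node("body", content=[
        Node("div", {"class": "page-body"}, [
            Node("p", {"id": "direct"}),
            Node("div", content=[Node("p", {"id": "wrapped"})]),
        ]),
    ]),
])

hits = match_paths(html, "body", ["div", {"class": "page-body"}], "p")
ids = [hit.params["id"] for hit in hits]
```

The `<p>` wrapped in an extra `<div>` is not returned, because its path contains a level that the arguments do not mention.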

find(name, p=None, fn=None, case_sensitive=False) List[dhtmlparser3.tags.tag.Tag][source]

Find (depth first) all tags with given parameters.

Parameters
  • name (str) – Name of the tag you are looking for. Use "" for all.

  • p (dict) – Parameters to match.

  • fn (lambda fn) – Lambda expecting one argument. It will be tested for each element in the tree.

  • case_sensitive (bool) – Use case sensitive search. Default False.

findb(name, p=None, fn=None, case_sensitive=False) List[dhtmlparser3.tags.tag.Tag][source]

Find (breadth first) all tags with given parameters.

Parameters
  • name (str) – Name of the tag you are looking for. Use "" for all.

  • p (dict) – Parameters to match.

  • fn (lambda fn) – Lambda expecting one argument. It will be tested for each element in the tree.

  • case_sensitive (bool) – Use case sensitive search. Default False.
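The difference between the depth-first and breadth-first variants is the order in which elements are visited. A minimal sketch of the two orders, using a hypothetical `Node` class and helper generators rather than the library's own iterators:

```python
from collections import deque


# Hypothetical stand-in for a DOM node; not dhtmlparser3's Tag class.
class Node:
    def __init__(self, name, content=None):
        self.name = name
        self.content = content if content is not None else []


def depth_first(node):
    """Visit a node, then fully explore each child before moving to the next."""
    yield node
    for child in node.content:
        if isinstance(child, Node):
            yield from depth_first(child)


def breadth_first(node):
    """Visit nodes level by level, using a FIFO queue."""
    queue = deque([node])
    while queue:
        current = queue.popleft()
        yield current
        queue.extend(c for c in current.content if isinstance(c, Node))


# a
# ├── b
# │   └── d
# └── c
#     └── e
root = Node("a", [Node("b", [Node("d")]), Node("c", [Node("e")])])

dfs_order = [node.name for node in depth_first(root)]
bfs_order = [node.name for node in breadth_first(root)]
```

Depth-first order here is `a, b, d, c, e`, while breadth-first order is `a, b, c, d, e`. Breadth-first results list shallow matches first, which can matter when only the first few hits are used.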

find_depth_first_iter(name, p=None, fn=None, case_sensitive=False) Iterator[dhtmlparser3.tags.tag.Tag][source]
find_breadth_first_iter(name, p=None, fn=None, case_sensitive=False) Iterator[dhtmlparser3.tags.tag.Tag][source]
depth_first_iterator(tags_only=False) Iterator[Union[dhtmlparser3.tags.tag.Tag, str, dhtmlparser3.tags.comment.Comment]][source]
breadth_first_iterator(tags_only=False, _first_call=True) Iterator[Union[dhtmlparser3.tags.tag.Tag, str, dhtmlparser3.tags.comment.Comment]][source]
prettify(depth=0, dont_format=False) str[source]