dhtmlparser3.tags.tag

class dhtmlparser3.tags.tag.Tag(name, parameters=None, content=None, is_non_pair=False)

Bases: object

name

Name of the parsed tag.

Type: str

parameters

Dictionary for the parameters.

Type: SpecialDict

content

List of sub-elements.

Type: list

parent

Reference to parent element.

Type: Tag

property p: Dict[str, str]

Shortcut for .parameters, used extensively in tests.

property c

Shortcut for .content, used extensively in tests.

property tags: List[dhtmlparser3.tags.tag.Tag]

Same as .c, but returns only Tag instances. Useful for skipping whitespace and comment clutter when iterating over the actual DOM structure.

double_link()

Make the DOM hierarchy double-linked. Each content element now points to the parent element.
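The idea of double-linking can be sketched with a small stand-in class. This is only an illustration under assumptions: `Node` is a hypothetical class, not the library's `Tag`, and the library's actual implementation may differ.

```python
class Node:
    """Hypothetical stand-in for a DOM element; not the library's Tag class."""

    def __init__(self, name, content=None):
        self.name = name
        self.content = content or []  # child nodes and plain strings
        self.parent = None            # unset until double_link() runs

    def double_link(self):
        # Point every child node back at this node, then recurse into it.
        for child in self.content:
            if isinstance(child, Node):
                child.parent = self
                child.double_link()


p = Node("p", ["some text"])
div = Node("div", [p])
root = Node("body", [div])

root.double_link()
print(p.parent.name, div.parent.name)  # div body
```

After the call, every node can be walked upwards through `.parent`, while the root keeps `parent` set to `None`.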

content_without_tags() → str

Return content but remove all tags.

This is sometimes useful for processing messy websites.
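To illustrate what stripping tags from content means, the sketch below uses Python's standard `html.parser` module instead of dhtmlparser3. `TextOnly` and `strip_tags` are hypothetical names, and the library's own output may differ in whitespace or entity handling.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text outside of tags.
        self.chunks.append(data)


def strip_tags(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


print(strip_tags("<p>Hello <b>world</b>!</p>"))  # Hello world!
```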

remove(offending_item: Union[str, dhtmlparser3.tags.tag.Tag, dhtmlparser3.tags.comment.Comment]) → bool

Remove offending_item from anywhere in the DOM.

The item is matched using the is operator, so it should be the exact object you obtained from .find() or another relevant method, not an equal-looking copy.

Returns: True if the item was found and removed.
Return type: bool
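The difference between matching by identity and matching by equality can be sketched as follows. `Node` is a hypothetical stand-in with a deliberately loose `__eq__`, not the library's `Tag`; only the use of the `is` operator is taken from the description above.

```python
class Node:
    """Hypothetical stand-in; two nodes compare equal if their names match."""

    def __init__(self, name, content=None):
        self.name = name
        self.content = content or []

    def __eq__(self, other):
        return isinstance(other, Node) and self.name == other.name

    def remove(self, item):
        for index, child in enumerate(self.content):
            if child is item:  # identity check, not ==
                del self.content[index]
                return True
            if isinstance(child, Node) and child.remove(item):
                return True
        return False


first = Node("p")
second = Node("p")
root = Node("div", [first, Node("span", [second])])

lookalike = Node("p")                   # equal to both <p> nodes, identical to neither
copy_removed = root.remove(lookalike)   # nothing is removed
original_removed = root.remove(second)  # the nested <p> is removed

print(copy_removed, original_removed)  # False True
```

Because the comparison uses identity, only the object actually found in the tree is removed, even when several elements look the same.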

remove_item(item: Union[str, dhtmlparser3.tags.tag.Tag, dhtmlparser3.tags.comment.Comment])

Remove the item from the .content property.

to_string() → str

Get HTML representation of the tag and the content.

tag_to_str() → str

Convert just the tag with parameters to string, without content.

content_str(escape=False) → str

Return everything between the opening and closing tags as a string.

Parameters: escape (bool) – Escape the content. Default False.

replace_with(item: dhtmlparser3.tags.tag.Tag, keep_content: bool = False)

Replace this Tag with another item.

Parameters

item (Tag, str) – Item to replace this with.
keep_content (bool) – Keep the original content. Default False.

wfind(name, p=None, fn=None, case_sensitive=False)

match(*args)

Recursively call find for each element in *args. This gives fuzzy matching, such as "find all <div> elements that contain this <p> element, which in turn contains this <a>".

Example

dom.match("div", ["p", {"class": "great"}], "a")

Parameters: *args (list) – List of paths to match.
Returns: List of matched elements.
Return type: list

match_paths(*args)

Exactly match the path given by the arguments.

Example

dom.match_paths("body", ["div", {"class": "page-body"}], "p")

This matches only if the path is exactly as given. If the <p> is, for example, wrapped in an additional <div>, it is not matched.

Parameters: *args (list) – List of paths to match.
Returns: List of matched elements.
Return type: list
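The contrast between fuzzy matching (match) and exact path matching (match_paths) can be sketched on a toy tree. Everything here is an assumption made for illustration: nodes are plain `(name, children)` tuples, `fuzzy_match` and `exact_match` are hypothetical helpers that compare names only, and how the library anchors the first path element is not specified above.

```python
def descendants(node):
    """Yield every node below `node`, at any depth."""
    for child in node[1]:
        yield child
        yield from descendants(child)


def fuzzy_match(node, *names):
    """Each name may sit at any depth below the previous match."""
    if not names:
        return [node]
    found = []
    for candidate in descendants(node):
        if candidate[0] == names[0]:
            found.extend(fuzzy_match(candidate, *names[1:]))
    return found


def exact_match(node, *names):
    """Each name must be a direct child of the previous match."""
    if not names:
        return [node]
    found = []
    for child in node[1]:
        if child[0] == names[0]:
            found.extend(exact_match(child, *names[1:]))
    return found


direct_p = ("p", [])
wrapped_p = ("p", [])
root = ("html", [
    ("body", [
        ("div", [direct_p]),
        ("div", [("span", [wrapped_p])]),  # <p> wrapped in an extra element
    ]),
])

print(len(fuzzy_match(root, "body", "div", "p")))  # 2
print(len(exact_match(root, "body", "div", "p")))  # 1
```

The wrapped `<p>` is found only by the fuzzy variant, because the extra `<span>` breaks the exact body, div, p chain.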

find(name, p=None, fn=None, case_sensitive=False) → List[dhtmlparser3.tags.tag.Tag]

Find (depth first) all tags with given parameters.

Parameters

name (str) – Name of the tag you are looking for. Use "" to match all tags.
p (dict) – Parameters to match.
fn (lambda fn) – Lambda expecting one argument. It will be tested for each element in the tree.
case_sensitive (bool) – Use case sensitive search. Default False.

findb(name, p=None, fn=None, case_sensitive=False) → List[dhtmlparser3.tags.tag.Tag]

Find (breadth first) all tags with given parameters.

Parameters

name (str) – Name of the tag you are looking for. Use "" to match all tags.
p (dict) – Parameters to match.
fn (lambda fn) – Lambda expecting one argument. It will be tested for each element in the tree.
case_sensitive (bool) – Use case sensitive search. Default False.
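The practical difference between the depth-first order of find and the breadth-first order of findb is the order in which results come back. The sketch below shows both orders on a toy tree of `(name, children)` tuples. The tuples and both generator functions are hypothetical, and whether the library includes the starting element in its results is not stated here.

```python
from collections import deque

# Hypothetical tree as (name, children) tuples; not the library's Tag objects.
tree = ("html", [
    ("body", [
        ("div", [("p", [])]),
        ("span", []),
    ]),
])


def depth_first(node):
    """Visit a node, then fully explore each child before its siblings."""
    name, children = node
    yield name
    for child in children:
        yield from depth_first(child)


def breadth_first(node):
    """Visit all nodes of one level before descending to the next."""
    queue = deque([node])
    while queue:
        name, children = queue.popleft()
        yield name
        queue.extend(children)


print(list(depth_first(tree)))    # ['html', 'body', 'div', 'p', 'span']
print(list(breadth_first(tree)))  # ['html', 'body', 'div', 'span', 'p']
```

Depth-first order follows the document order of the markup, while breadth-first order returns shallow elements before deeply nested ones.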

find_depth_first_iter(name, p=None, fn=None, case_sensitive=False) → Iterator[dhtmlparser3.tags.tag.Tag]

find_breadth_first_iter(name, p=None, fn=None, case_sensitive=False) → Iterator[dhtmlparser3.tags.tag.Tag]

depth_first_iterator(tags_only=False) → Iterator[Union[dhtmlparser3.tags.tag.Tag, str, dhtmlparser3.tags.comment.Comment]]

breadth_first_iterator(tags_only=False, _first_call=True) → Iterator[Union[dhtmlparser3.tags.tag.Tag, str, dhtmlparser3.tags.comment.Comment]]

prettify(depth=0, dont_format=False) → str