dhtmlparser3.tags.tag
- class dhtmlparser3.tags.tag.Tag(name, parameters=None, content=None, is_non_pair=False)
Bases:
object
- name
Name of the parsed tag.
- Type
str
- parameters
Dictionary for the parameters.
- Type
- content
List of sub-elements.
- Type
list
- property p: Dict[str, str]
Shortcut for .parameters, used extensively in tests.
- property c
Shortcut for .content, used extensively in tests.
- property tags: List[dhtmlparser3.tags.tag.Tag]
Same as .c, but returns only Tag instances. Useful for skipping whitespace and comment clutter when iterating over the real DOM structure.
- double_link()
Make the DOM hierarchy double-linked. Each content element now points to the parent element.
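The idea of double linking can be sketched with a small stand-in class. This is only an illustration, not dhtmlparser3's implementation: the Node class and the parent attribute name are assumptions made for the example.

```python
class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser3's Tag."""

    def __init__(self, name, content=None):
        self.name = name
        self.content = content or []
        self.parent = None  # assumed attribute name, filled in by double_link()

    def double_link(self):
        # Point every child element back at this node, then recurse.
        for child in self.content:
            if isinstance(child, Node):  # plain strings carry no back-link
                child.parent = self
                child.double_link()


a = Node("a")
p = Node("p", [a, "some text"])
div = Node("div", [p])
div.double_link()

print(a.parent.name, p.parent.name, div.parent)
```

Once the back-links exist, code that holds a deeply nested element can walk upward without searching from the root again.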
- content_without_tags() → str
Return content but remove all tags.
This is sometimes useful for processing messy websites.
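A minimal sketch of what "content without tags" means, using a hypothetical Node class rather than the real Tag. How the library treats comments and whitespace is not stated here, so the sketch ignores both questions.

```python
class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser3's Tag."""

    def __init__(self, name, content=None):
        self.name = name
        self.content = content or []


def text_only(node):
    # Keep the strings, descend into nested elements, drop the markup itself.
    parts = []
    for item in node.content:
        if isinstance(item, Node):
            parts.append(text_only(item))
        else:
            parts.append(item)
    return "".join(parts)


p = Node("p", ["Hello ", Node("b", ["world"]), "!"])
text = text_only(p)
print(text)
```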
- remove(offending_item: Union[str, dhtmlparser3.tags.tag.Tag, dhtmlparser3.tags.comment.Comment]) → bool
Remove offending_item from anywhere in the DOM.
The item is matched using the is operator, so it should be an object you obtained from .find() or another search method, not an equal-looking copy.
- Returns
True if the item was found and removed.
- Return type
bool
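Why identity matching matters can be shown with a short sketch. The Node dataclass and the remove function below are hypothetical stand-ins written for this example; they only mimic the documented behaviour of matching with is.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in; the generated __eq__ compares field values."""
    name: str
    content: list = field(default_factory=list)


def remove(root, offending_item):
    # Match by identity (is), not equality (==), anywhere below root.
    for index, item in enumerate(root.content):
        if item is offending_item:
            del root.content[index]
            return True
        if isinstance(item, Node) and remove(item, offending_item):
            return True
    return False


first = Node("br")
second = Node("br")  # equal to `first`, but a different object
root = Node("div", [first, Node("p", [second])])

lookalike_removed = remove(root, Node("br"))  # equal, yet never part of the tree
second_removed = remove(root, second)         # the very object stored in the tree
print(lookalike_removed, second_removed)
```

A freshly built, equal-looking element is never removed, and of the two equal br elements only the one actually passed in disappears.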
- remove_item(item: Union[str, dhtmlparser3.tags.tag.Tag, dhtmlparser3.tags.comment.Comment])
Remove the item from the .content property.
- content_str(escape=False) → str
Return everything in between the tags as a string.
- Parameters
escape (bool) – Escape the content. Default False.
- replace_with(item: dhtmlparser3.tags.tag.Tag, keep_content: bool = False)
Replace this Tag with another item.
- Parameters
item (Tag, str) – Item to replace this with.
keep_content (bool) – Keep the original content. Default False.
- match(*args)
Recursively call find for each element in *args. This allows fuzzy matching, such as "find all <div>s that contain a <p> element, which in turn contains an <a>".
Example
dom.match("div", ["p", {"class": "great"}], "a")
- Parameters
*args (list) – List of paths to match.
- Returns
List of matched elements.
- Return type
list
- match_paths(*args)
Exactly match the path given by the arguments.
Example
dom.match_paths("body", ["div", {"class": "page-body"}], "p")
This matches only when the path is exactly as given. If the <p> is, for example, wrapped in an additional <div>, it is not matched.
- Parameters
*args (list) – List of paths to match.
- Returns
List of matched elements.
- Return type
list
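The difference between the fuzzy matching of match and the exact matching of match_paths can be sketched as follows. Node, fuzzy_match and exact_match are hypothetical names invented for this illustration; they match on tag names only and are not the library's code.

```python
class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser3's Tag."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def descendants(node):
    # Every element below node, depth first.
    for child in node.children:
        yield child
        yield from descendants(child)


def fuzzy_match(root, *names):
    # Each name may sit at any depth below the previous match.
    if not names:
        return [root]
    found = []
    for node in descendants(root):
        if node.name == names[0]:
            found.extend(fuzzy_match(node, *names[1:]))
    return found


def follow(node, names):
    # Each remaining name must be a direct child of the previous match.
    if not names:
        return [node]
    found = []
    for child in node.children:
        if child.name == names[0]:
            found.extend(follow(child, names[1:]))
    return found


def exact_match(root, *names):
    # The first name may start anywhere; the rest must form an unbroken chain.
    found = []
    for start in descendants(root):
        if start.name == names[0]:
            found.extend(follow(start, names[1:]))
    return found


direct_p = Node("p")
wrapped_p = Node("p")
body = Node("body", Node("div", direct_p), Node("div", Node("section", wrapped_p)))
root = Node("html", body)

fuzzy = fuzzy_match(root, "body", "div", "p")
exact = exact_match(root, "body", "div", "p")
print(len(fuzzy), len(exact))
```

The fuzzy variant finds both <p> elements, while the exact variant skips the one wrapped in an extra <section>.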
- find(name, p=None, fn=None, case_sensitive=False) → List[dhtmlparser3.tags.tag.Tag]
Find (depth first) all tags with given parameters.
- Parameters
name (str) – Name of the tag you are looking for. Use "" for all.
p (dict) – Parameters to match.
fn (lambda fn) – Lambda expecting one argument. It will be tested for each element in the tree.
case_sensitive (bool) – Use case sensitive search. Default False.
- findb(name, p=None, fn=None, case_sensitive=False) → List[dhtmlparser3.tags.tag.Tag]
Find (breadth first) all tags with given parameters.
- Parameters
name (str) – Name of the tag you are looking for. Use "" for all.
p (dict) – Parameters to match.
fn (lambda fn) – Lambda expecting one argument. It will be tested for each element in the tree.
case_sensitive (bool) – Use case sensitive search. Default False.
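find and findb differ only in the order in which the tree is visited. The sketch below shows the two orders on a hypothetical Node class. Whether the library's searches include the starting element itself is an assumption here; the sketch yields it first.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for a DOM element; not dhtmlparser3's Tag."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def depth_first(node):
    # Visit a node, then fully explore each child before its siblings.
    yield node
    for child in node.children:
        yield from depth_first(child)


def breadth_first(root):
    # Visit level by level, using a FIFO queue of pending nodes.
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


tree = Node("html", Node("head", Node("title")), Node("body", Node("p")))

dfs = [n.name for n in depth_first(tree)]
bfs = [n.name for n in breadth_first(tree)]
print(dfs)  # ['html', 'head', 'title', 'body', 'p']
print(bfs)  # ['html', 'head', 'body', 'title', 'p']
```

Depth-first order follows document order, which suits most scraping tasks; breadth-first order returns shallow matches before deeply nested ones.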
- find_depth_first_iter(name, p=None, fn=None, case_sensitive=False) → Iterator[dhtmlparser3.tags.tag.Tag]
- find_breadth_first_iter(name, p=None, fn=None, case_sensitive=False) → Iterator[dhtmlparser3.tags.tag.Tag]
- depth_first_iterator(tags_only=False) → Iterator[Union[dhtmlparser3.tags.tag.Tag, str, dhtmlparser3.tags.comment.Comment]]
- breadth_first_iterator(tags_only=False, _first_call=True) → Iterator[Union[dhtmlparser3.tags.tag.Tag, str, dhtmlparser3.tags.comment.Comment]]