Utilities

The utilities package and its modules provide functions and helpers for transforming the scraped data.

tofeed.utilities.shorten_to_title(string, length, separator=' ', appendix='...')

Use the string to create a human readable title.

The function searches for the last occurrence of the separator string within the range of the string limited by the length parameter and appends the appendix.

Parameters:
  • string (str) – The string to create the title out of
  • length (int) – The approximate length of the title. The length is only approximate to this value, because it’s possible that the next separator string encountered may lie further ahead.
  • separator (str) – The character that separates words in the text. Usually of course a space, this should only have to be changed in very rare cases.
  • appendix (str) – The string to append to the end of the title
Returns:

The title created from the string

Spoon

Helper functions for modifying BeautifulSoup objects.

tofeed.utilities.spoon.absolutize_references(base_url, tag, attributes=['href', 'src'], recursive=True)

Turns references found within the tag’s attributes absolute.

Parameters:
  • base_url (str) – The base URL used to absolutize the references
  • tag (bs4.element.Tag) – The tag to absolutize references in
  • attributes (list) – The attributes containing the URLs that should be made absolute
  • recursive (bool) – If true the tag and all its sub tags will be searched, else only the tag and its direct descendants will be searched.
tofeed.utilities.spoon.collapse_tag(tag)

Replaces the tag’s descendants with their strings.

Parameters:tag (bs4.element.Tag) – The tag to collapse.
tofeed.utilities.spoon.convert_newlines(tag, recursive=True)

Replaces newline characters found in the tag’s strings with line break tags.

Parameters:
  • tag (bs4.element.Tag) – The tag to convert newline characters in.
  • recursive (bool) – If true the tag and all its sub tags will be searched, else only the tag and its direct descendants will be searched.
tofeed.utilities.spoon.new_string = <bound method BeautifulSoup.new_string of >

Shortcut for BeautifulSoup.new_string(). This allows using the bound methods of the BeautifulSoup class without having to instantiate it manually.

tofeed.utilities.spoon.new_tag = <bound method BeautifulSoup.new_tag of >

Shortcut for BeautifulSoup.new_tag(). This allows using the bound methods of the BeautifulSoup class without having to instantiate it manually.

tofeed.utilities.spoon.replace_string_with_tag(tag, string, replacement, recursive=True)

Replaces all occurrences of string within the tag’s strings with the replacement tag.

Parameters:
  • tag (bs4.element.Tag) – The tag to replace strings in
  • string (str) – The string to replace
  • bs4.element.Tag replacement (str) – The tag replacing the string
  • recursive (bool) – If true the tag and all its sub tags will be searched, else only the tag and its direct descendants will be searched.