src.preprocessing package

Submodules

src.preprocessing.keyword_extraction module

src.preprocessing.keyword_extraction.bert_keyword_extraction(texts: List[str], top_n: int = 10) → List[str]

Extracts keywords from a list of texts using KeyBERT.

Parameters:
  • texts (List[str]) – List of texts to extract keywords from.

  • top_n (int) – Number of top keywords to extract per text.

Returns:

List of unique extracted keywords.

Return type:

List[str]
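
Example (a minimal sketch, not the module's actual implementation): the description above implies collecting the top_n keywords for each text and merging them into one de-duplicated list. The names unique_keywords, Extractor and _keybert_extractor are hypothetical. The sketch assumes KeyBERT is called once per text through KeyBERT().extract_keywords(text, top_n=...), which returns (keyword, score) pairs, and that first-seen order is kept when removing duplicates.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: maps (text, top_n) to a list of (keyword, score) pairs.
Extractor = Callable[[str, int], List[Tuple[str, float]]]


def _keybert_extractor() -> Extractor:
    """Build an extractor backed by KeyBERT (assumes `pip install keybert`)."""
    # Imported lazily so the sketch can be loaded without KeyBERT installed.
    from keybert import KeyBERT

    model = KeyBERT()
    return lambda text, top_n: model.extract_keywords(text, top_n=top_n)


def unique_keywords(
    texts: List[str],
    top_n: int = 10,
    extractor: Optional[Extractor] = None,
) -> List[str]:
    """Return the unique keywords found across all texts, in first-seen order."""
    extract = extractor or _keybert_extractor()
    seen: Dict[str, None] = {}  # dict keys preserve insertion order
    for text in texts:
        for keyword, _score in extract(text, top_n):
            seen.setdefault(keyword, None)
    return list(seen)
```

Passing a custom extractor makes it possible to exercise the de-duplication logic without downloading a model; omitting it falls back to KeyBERT.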

src.preprocessing.keyword_extraction.extract_keywords(article_ids, top_n: int = 10)

Extracts keywords from a list of texts using KeyBERT.

Parameters:
  • texts (List[str]) – List of texts to extract keywords from.

  • top_n (int) – Number of top keywords to extract per text.

Returns:

The docstring gives List[List[str]] (one keyword list per text), but also notes that the function actually returns something other than a list of lists of strings.

src.preprocessing.keyword_extraction.preprocess_text(text)

Preprocesses a given text by tokenizing it and removing stopwords.

Parameters:

text (str) – The text to preprocess.

Returns:

A list of words without stopwords.

Return type:

List[str]
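
Example (a minimal sketch, not the module's actual implementation): tokenizing and stopword removal as described above can be illustrated with the standard library alone. The name preprocess_text_sketch, the tiny STOPWORDS set, the regex tokenizer and the lowercasing step are all assumptions made for illustration; the real function may use a different tokenizer and a full stopword list.

```python
import re
from typing import List

# Hypothetical, deliberately small stopword set used only for this sketch.
STOPWORDS = frozenset({"a", "an", "and", "the", "is", "of", "to", "in"})


def preprocess_text_sketch(text: str) -> List[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    # Assumption: a token is a run of letters, digits, or apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


print(preprocess_text_sketch("The cat is in the hat"))
```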

src.preprocessing.summarization module

Module contents