News AI documentation
src.ingestion package
Submodules
src.ingestion.fetch_articles module
This script asynchronously fetches and stores the content of news articles: it first checks whether the content is already present in the database and, if not, scrapes it from the provided URLs.
- async src.ingestion.fetch_articles.fetch_article_content(article_ids, session)[source]
Fetches the content of a list of articles asynchronously, by checking if content already exists in the database, and if not, extracting the content from the given URLs.
- Parameters:
article_ids (List[str]) – List of IDs of the articles to fetch content for.
session (aiohttp.ClientSession) – The aiohttp session to use for the request.
- Returns:
List of dictionaries, each containing the ID and content of a fetched article.
- Return type:
List[Dict[str, str]]
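The check-then-fetch flow described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: `FAKE_DB`, `scrape`, `fetch_one`, and `fetch_all` are hypothetical names, the dictionary stands in for the database, and the scraper returns a placeholder string instead of making an HTTP request through an `aiohttp` session.

```python
import asyncio
from typing import Dict, List

# Hypothetical in-memory stand-in for the database of stored articles.
FAKE_DB: Dict[str, Dict[str, str]] = {
    "a1": {"url": "https://example.com/a1", "content": "already stored"},
    "a2": {"url": "https://example.com/a2"},
}


async def scrape(url: str) -> str:
    """Hypothetical scraper; a real one would issue an HTTP request."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"scraped from {url}"


async def fetch_one(article_id: str) -> Dict[str, str]:
    doc = FAKE_DB[article_id]
    if "content" not in doc:  # only scrape when nothing is stored yet
        doc["content"] = await scrape(doc["url"])
    return {"id": article_id, "content": doc["content"]}


async def fetch_all(article_ids: List[str]) -> List[Dict[str, str]]:
    # gather runs the per-article coroutines concurrently and keeps input order
    return await asyncio.gather(*(fetch_one(a) for a in article_ids))


results = asyncio.run(fetch_all(["a1", "a2"]))
```

Here `a1` is served from the stand-in store, while `a2` is scraped and its content written back, so a second call would not scrape it again.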
- async src.ingestion.fetch_articles.test_fetch_article_content(article_ids: List[str]) List[Dict[str, str]][source]
Tests the fetch_article_content function by fetching content for a list of article IDs.
- Parameters:
article_ids (List[str]) – A list of article IDs to fetch content for.
- Returns:
A list of dictionaries where each dictionary contains the ID and content of a fetched article.
- Return type:
List[Dict[str, str]]
src.ingestion.newsapi module
- src.ingestion.newsapi.fetch_news(query, from_date: datetime, sort_by, limit, to_json)[source]
Fetches news articles from NewsAPI for the given query, start date, and sort order.
- Parameters:
query (str) – The query to search for in the NewsAPI.
from_date (datetime.datetime) – The date from which to fetch the articles.
sort_by (str) – The field to sort the results by.
limit (int) – The number of articles to fetch.
to_json (bool) – Whether to store the results in a JSON file.
- Returns:
The IDs of the articles that were fetched and stored in MongoDB.
- Return type:
List[str]
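As a rough sketch of how these arguments might map onto a NewsAPI request, the helper below builds the query parameters only. It assumes the module targets NewsAPI's `/v2/everything` endpoint, which this page does not state; `build_newsapi_params` is a hypothetical name, and the API key and the HTTP call itself are left out.

```python
from datetime import datetime
from typing import Dict, Union


def build_newsapi_params(
    query: str, from_date: datetime, sort_by: str, limit: int
) -> Dict[str, Union[str, int]]:
    """Map the documented arguments to /v2/everything query parameters (assumed endpoint)."""
    return {
        "q": query,
        "from": from_date.strftime("%Y-%m-%d"),  # ISO 8601 date
        "sortBy": sort_by,  # e.g. "relevancy", "popularity", "publishedAt"
        "pageSize": limit,
    }


params = build_newsapi_params("climate", datetime(2024, 1, 15), "publishedAt", 20)
```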
src.ingestion.prawapi module
Module contents
src.preprocessing package
Submodules
src.preprocessing.keyword_extraction module
- src.preprocessing.keyword_extraction.bert_keyword_extraction(texts: List[str], top_n: int = 10) List[str][source]
Extracts keywords from a list of texts using KeyBERT.
- Parameters:
texts (List[str]) – List of texts to extract keywords from.
top_n (int) – Number of top keywords to extract per text.
- Returns:
List of unique extracted keywords.
- Return type:
List[str]
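The "top keywords per text, merged into one unique list" shape of the return value can be illustrated without KeyBERT. The sketch below swaps in a plain word-frequency extractor, so it shows only the merging logic, not the quality of KeyBERT's keywords; `top_words` and `unique_keywords` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def top_words(text: str, top_n: int) -> List[str]:
    """Stand-in extractor (not KeyBERT): most frequent words longer than three letters."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return [word for word, _ in counts.most_common(top_n)]


def unique_keywords(texts: List[str], top_n: int = 10) -> List[str]:
    """Collect up to top_n keywords per text and drop duplicates across texts."""
    seen: Dict[str, None] = {}
    for text in texts:
        for keyword in top_words(text, top_n):
            seen.setdefault(keyword, None)  # dict preserves first-seen order
    return list(seen)


keywords = unique_keywords(["Markets rally as markets recover", "Markets slump"])
```

A keyword that appears in several texts, such as `markets` here, is returned once.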
- src.preprocessing.keyword_extraction.extract_keywords(article_ids, top_n: int = 10)[source]
Extracts keywords for a list of articles using KeyBERT.
- Parameters:
article_ids (List[str]) – List of IDs of the articles to extract keywords from.
top_n (int) – Number of top keywords to extract per text.
- Returns:
Documented as a list of keyword lists, one per text (List[List[str]]), but the original docstring notes that the actual return value is something else.
src.preprocessing.summarization module
Module contents
src.sentiment_analysis package
Submodules
src.sentiment_analysis.classify module
- src.sentiment_analysis.classify.classify_sentiments(texts: List[str]) Dict[str, List[Tuple[str, float]]][source]
Classify the sentiment of multiple texts.
- Parameters:
texts (List[str]) – List of texts to classify sentiment for.
- Returns:
Dictionary with three keys: ‘positive’, ‘negative’, ‘neutral’. Each key maps to a list of tuples, where the first element of the tuple is the text and the second element is the sentiment score.
- Return type:
Dict[str, List[Tuple[str, float]]]
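To make the return shape concrete, the sketch below groups already-scored texts into the three buckets. The input triples `(text, label, score)` are an assumed intermediate format, and `bucket_by_label` is a hypothetical helper; the actual model call is not shown.

```python
from typing import Dict, List, Tuple


def bucket_by_label(
    scored: List[Tuple[str, str, float]]
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (text, label, score) triples under 'positive', 'negative', 'neutral'."""
    buckets: Dict[str, List[Tuple[str, float]]] = {
        "positive": [],
        "negative": [],
        "neutral": [],
    }
    for text, label, score in scored:
        buckets[label].append((text, score))
    return buckets


grouped = bucket_by_label(
    [
        ("Stocks surge", "positive", 0.93),
        ("Factory closes", "negative", 0.88),
        ("Council meets Tuesday", "neutral", 0.71),
    ]
)
```

All three keys are always present, so callers can index a bucket without checking whether any text landed in it.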
src.sentiment_analysis.sentiment_model module
- src.sentiment_analysis.sentiment_model.analyze_sentiments(article_ids: List[str]) List[Dict[str, float]][source]
Analyze the sentiment of the articles with the given IDs.
- Parameters:
article_ids (List[str]) – List of article IDs to analyze.
- Returns:
List of sentiment analysis results, one per article.
- Return type:
List[Dict[str, float]]
src.sentiment_analysis.wordcloud module
- src.sentiment_analysis.wordcloud.generate_wordcloud(keywords: List[str], sentiment_label: str) WordCloud[source]
Generates a word cloud for the given list of keywords and sentiment label.
- Parameters:
keywords (List[str]) – List of keywords to include in the word cloud.
sentiment_label (str) – Sentiment label to generate the word cloud for.
- Returns:
The generated word cloud.
- Return type:
WordCloud
Module contents
src package
Subpackages
Submodules
src.pipeline module
Module contents
src
src.dashboard package
Submodules
src.dashboard.app module
Module contents
src.utils package
Submodules
src.utils.dbconnector module
- src.utils.dbconnector.append_to_document(collection_name, query, update_data)[source]
Appends new data to an existing document in the MongoDB collection.
- Parameters:
collection_name (str) – The name of the MongoDB collection.
query (dict) – The query to select the document to update.
update_data (dict) – The new data to be appended to the document.
- Returns:
The number of documents updated.
- Return type:
int
- src.utils.dbconnector.content_manager(article_id, required_fields)[source]
Checks if the specified fields are present in the database for the given article_id.
- Parameters:
article_id (str) – The ID of the article to check.
required_fields (list) – A list of fields to check for presence (e.g., ["content", "summary", "keywords", "sentiment"]).
- Returns:
A dictionary with the status of each field (True if present, False if not).
- Return type:
dict
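The per-field status dictionary can be sketched on a plain document, leaving the database lookup aside. `field_status` is a hypothetical helper, and treating empty values (such as an empty string or list) as "not present" is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, List, Optional


def field_status(
    document: Optional[Dict[str, Any]], required_fields: List[str]
) -> Dict[str, bool]:
    """Return True for each required field that holds a non-empty value."""
    doc = document or {}  # a missing article yields False for every field
    return {field: bool(doc.get(field)) for field in required_fields}


status = field_status(
    {"content": "Full article text", "summary": "", "keywords": ["ai"]},
    ["content", "summary", "keywords", "sentiment"],
)
```

A caller can then run only the pipeline stages whose field is still `False`.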
- src.utils.dbconnector.fetch_and_combine_articles(collection_name, article_ids)[source]
Fetches documents from the given MongoDB collection using the given IDs and combines them into a Pandas DataFrame.
- Parameters:
collection_name (str) – The name of the MongoDB collection.
article_ids (List[str]) – List of IDs of the articles to fetch and combine.
- Returns:
A Pandas DataFrame containing the combined documents.
- Return type:
pd.DataFrame
- Raises:
Exception – If there is an error fetching and combining the documents.
- src.utils.dbconnector.find_documents(collection_name, query)[source]
Finds documents in the given MongoDB collection using the given query.
- Parameters:
collection_name (str) – The name of the MongoDB collection.
query (dict) – The query to select documents.
- Returns:
A list of documents found by the query.
- Return type:
list
- Raises:
Exception – If there is an error finding documents.
- src.utils.dbconnector.find_one_document(collection_name, query)[source]
Finds a single document in the given MongoDB collection using the given query.
- Parameters:
collection_name (str) – The name of the collection.
query (dict) – The query to select documents.
- Returns:
The selected document.
- Return type:
dict
- Raises:
Exception – If there is an error finding the document.
- src.utils.dbconnector.get_mongo_client()[source]
Connects to MongoDB and returns the database object.
- Uses environment variables for connection:
MONGO_USERNAME: username for MongoDB authentication
MONGO_PASSWORD: password for MongoDB authentication
MONGO_DB_NAME: name of the database to connect to
- Returns:
the connected database object
- Return type:
pymongo.database.Database
- Raises:
Exception – if connection fails
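One way the three environment variables could be combined into a connection string is sketched below. The `host` argument and the `build_mongo_uri` name are hypothetical, since this page does not say where the host comes from or which URI scheme the project uses.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(host: str) -> str:
    """Assemble a MongoDB URI from the documented environment variables."""
    # Credentials are percent-encoded so characters like '@' or ':' stay valid.
    username = quote_plus(os.environ["MONGO_USERNAME"])
    password = quote_plus(os.environ["MONGO_PASSWORD"])
    db_name = os.environ["MONGO_DB_NAME"]
    return f"mongodb://{username}:{password}@{host}/{db_name}"
```

The resulting string can be passed to `pymongo.MongoClient`, and the database object selected with `client[db_name]`. A missing variable raises `KeyError` here, which fails early instead of attempting a connection with incomplete credentials.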
- src.utils.dbconnector.insert_document(collection_name, document)[source]
Inserts a document into the given collection.
- Parameters:
collection_name (str) – The name of the collection.
document (dict) – The document to be inserted.
- Returns:
The ID of the inserted document.
- Return type:
str
- Raises:
Exception – If there is an error inserting the document.
src.utils.logger module
- src.utils.logger.setup_logger(log_file='app.log')[source]
Sets up a logger with a console handler and a rotating file handler.
The console handler has color coding for different log levels, while the file handler does not. The file handler will rotate the log file every 5MB, keeping up to 5 backups.
- Parameters:
log_file (str) – The name of the log file to write to. Defaults to “app.log”.
- Returns:
The configured logger.
- Return type:
logging.Logger
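A logger with this behaviour can be put together from the standard library, as in the sketch below. It is an illustration under assumptions, not the project's code: `build_logger`, `ColorFormatter`, the ANSI colour choices, and the message format are hypothetical, and "5MB" is read as 5 * 1024 * 1024 bytes.

```python
import logging
from logging.handlers import RotatingFileHandler

# Assumed ANSI colours per level; the project's actual palette is not documented.
COLORS = {
    logging.DEBUG: "\033[36m",
    logging.INFO: "\033[32m",
    logging.WARNING: "\033[33m",
    logging.ERROR: "\033[31m",
    logging.CRITICAL: "\033[35m",
}
RESET = "\033[0m"


class ColorFormatter(logging.Formatter):
    """Wrap each formatted message in the colour for its level."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        return f"{COLORS.get(record.levelno, '')}{message}{RESET}"


def build_logger(log_file: str = "app.log") -> logging.Logger:
    logger = logging.getLogger(log_file)
    logger.setLevel(logging.DEBUG)
    if logger.handlers:  # already configured; avoid duplicate output
        return logger
    fmt = "%(asctime)s %(levelname)s %(message)s"
    console = logging.StreamHandler()
    console.setFormatter(ColorFormatter(fmt))  # colour on the console only
    file_handler = RotatingFileHandler(
        log_file, maxBytes=5 * 1024 * 1024, backupCount=5  # rotate at 5 MB, keep 5
    )
    file_handler.setFormatter(logging.Formatter(fmt))  # plain text in the file
    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

Calling `build_logger("app.log").info("started")` prints a coloured line to the console and appends an uncoloured one to `app.log`.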
Module contents
Project Overview
<div align="center">
<img src="https://github.com/user-attachments/assets/b825468e-515c-45e8-9b81-a4f1b033ab0c" alt="NewsAI Logo" width="200px">
<h1>🚀 NewsAI: Where AI Meets Breaking News! 🌟</h1>
<p><i>Buckle up, news junkies! We’re about to take you on a wild ride through the information superhighway! 🎢</i></p>
</div>
## 🎭 What’s All the Fuss About?
Imagine if CNN, Reddit, and a fortune-teller had a baby, and that baby was raised by AI. That’s NewsAI! We’re not just aggregating news; we’re revolutionizing how you experience information:
🔮 Gemini-Powered Insights: Google’s Gemini AI is our crystal ball!
🧠 BERT-Based Sentiment Analysis: We don’t just read news; we feel it in our circuits!
🚀 FastAPI Backend: So fast, it breaks the space-time continuum!
🖥️ Streamlit Dashboard: Where data visualization meets modern art!
🍃 MongoDB: Because our data is too cool for tables!
## 🎬 See It or Don’t Believe It!
Deployment link: https://news-ai-dashboard.streamlit.app/
<div align="center">
<a href="https://www.youtube.com/watch?v=stTXgljJVPQ">
<img src="https://img.youtube.com/vi/stTXgljJVPQ/0.jpg" alt="Demo Video" width="500px">
</a>
<br>
<i>Warning: This video may cause uncontrollable desire to code! 🤓</i>
</div>
## 🚀 Quick Start: 0 to Hero in 3… 2… 1…
```bash
# Clone this bad boy
git clone https://github.com/Multiverse-of-Projects/NewsAI.git

# Enter the matrix
cd NewsAI

# Install magical dependencies
pip install -r requirements.txt

# Add necessary creds: create a .env file with your API keys and other secrets

# Add the Python path, then run Streamlit from src/dashboard/
streamlit run app.py

# If you want to run only pipeline.py
python -m src.pipeline

# If you want to unleash your creativity
git checkout -b feature/skynet-integration

# Start coding like you’re trying to prevent Y2K!
# For reference, my Python version == 3.12.7
```
## 🌈 Contribution: Join Our Avengers of Code!
🍴 Fork (the repo, not your dinner)
🌿 Branch (create one, don’t climb one)
💡 Commit (changes, not crimes)
🚀 Push (to the repo, not your luck)
🎉 PR (Pull Request, not Public Relations)
## 🏆 Wall of Fame: Our Code Wizards
<div align="center">
<a href="https://github.com/Multiverse-of-Projects/NewsAI/graphs/contributors">
<img src="https://contrib.rocks/image?repo=Multiverse-of-Projects/NewsAI" />
</a>
</div>
<div align="center">
<b>These legends write code that makes Shakespeare look like a casual blogger!</b>
</div>
## 📚 Documentation: The Sacred Texts
Our docs are so good, they’re basically the eighth wonder of the world. Check them out on [Read the Docs](https://newsai.readthedocs.io/)!
## 🎨 Our Tech Palette: Tools of Mass Construction
🧠 Gemini AI: For insights sharper than a samurai’s sword
🤖 BERT: Sentiment analysis that can read between the lines (and emojis)
🚀 FastAPI: Because life’s too short for slow APIs
🖥️ Streamlit: Making dashboards sexier than a sports car
🍃 MongoDB: NoSQL? More like YesQL to all our data needs!
## 📬 Reach Out and Touch Code
📧 Email: patel.devasy.23@gmail.com (We read faster than we code!)
💬 Discord: [Join our server](https://discord.gg/kV4ANf6x) (Where we debate tabs vs. spaces)
## 📜 License to Thrill
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details. It’s basically a license to code with reckless abandon!
---
<div align="center">
<img src="https://media.giphy.com/media/3o7btXkbsV26U95Uly/giphy.gif" width="200px">
<br>
<b>May your code be bug-free and your coffee be strong! 🚀☕</b>
</div>