RSS Scraper
- access_amherst_algo.rss_scraper.fetch_rss.fetch_rss()
Fetch the RSS feed and save it as an XML file.
This function retrieves the RSS feed from The Hub (https://thehub.amherst.edu/events.rss) and saves the raw response content as an XML file. The filename is timestamped with the current date and time, and the file is stored in the rss_files directory.
The function uses the requests library to fetch the feed and writes the response body to disk in binary mode.
- Return type:
None
Examples
>>> fetch_rss()
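A minimal sketch of the fetch-and-save steps described above, assuming a hypothetical `hub_%Y_%m_%d_%H.xml` filename pattern (the actual pattern is not documented here). The helper names `timestamped_path`, `save_rss_bytes`, and `fetch_rss_sketch` are illustrative and not part of the package; `requests` is imported inside the network function so the file-writing helpers can be used without it.

```python
from datetime import datetime
from pathlib import Path

# Feed URL from the documentation above.
HUB_RSS_URL = "https://thehub.amherst.edu/events.rss"


def timestamped_path(out_dir: Path, now: datetime) -> Path:
    """Build the output path; the 'hub_%Y_%m_%d_%H.xml' pattern is a hypothetical choice."""
    return out_dir / f"hub_{now.strftime('%Y_%m_%d_%H')}.xml"


def save_rss_bytes(content: bytes, out_dir: Path, now: datetime) -> Path:
    """Write the raw feed bytes in binary mode and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = timestamped_path(out_dir, now)
    path.write_bytes(content)  # bytes are stored exactly as received
    return path


def fetch_rss_sketch(out_dir: Path = Path("rss_files")) -> Path:
    """Download the feed with requests and save it (needs network access)."""
    import requests  # lazy import: the helpers above work without requests installed

    response = requests.get(HUB_RSS_URL, timeout=30)
    response.raise_for_status()  # fail on HTTP errors instead of saving an error page
    return save_rss_bytes(response.content, out_dir, datetime.now())
```

Keeping the write step separate from the network call makes it easy to exercise with sample bytes; `response.content` is used rather than `response.text` so the XML is saved without re-decoding.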
- access_amherst_algo.rss_scraper.parse_rss.create_events_list()
Create a list of event details from an RSS XML file.
This function loads an RSS XML file whose name follows the timestamped filename format, parses its content, and extracts event details from each <item> element. The details are returned as a list of dictionaries, one per event.
- Returns:
A list of dictionaries, one per event, each containing the details extracted by extract_event_details.
- Return type:
list of dict
Examples
>>> events = create_events_list()
>>> print(events[0]["title"])
Literature Speaker Event
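The parsing step can be sketched with the standard library's `xml.etree.ElementTree`, assuming a plain RSS 2.0 layout (`<rss><channel><item>`). The `extract_item_details` function below is a hypothetical stand-in for `extract_event_details`: it reads only standard RSS tags, so Hub-specific fields such as start time, end time, location, and host are not covered. For simplicity, the sketch takes XML text directly instead of locating a timestamped file.

```python
import xml.etree.ElementTree as ET


def extract_item_details(item: ET.Element) -> dict:
    """Hypothetical stand-in for extract_event_details: reads standard RSS tags only."""
    return {
        "title": item.findtext("title", default=""),
        "link": item.findtext("link", default=""),
        # Key name chosen to match the save_event_to_db example.
        "event_description": item.findtext("description", default=""),
        "categories": [c.text for c in item.findall("category") if c.text],
        "pub_date": item.findtext("pubDate", default=""),
        "author": item.findtext("author", default=""),
    }


def create_events_list_from_xml(xml_text: str) -> list[dict]:
    """Parse RSS text and return one details dict per <item> element."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so items nested under <channel> are found.
    return [extract_item_details(item) for item in root.iter("item")]


SAMPLE_RSS = """<rss version="2.0"><channel>
<item>
  <title>Literature Speaker Event</title>
  <link>https://thehub.amherst.edu/event/10000000</link>
  <category>Lecture</category>
  <category>Workshop</category>
</item>
</channel></rss>"""

events = create_events_list_from_xml(SAMPLE_RSS)
print(events[0]["title"])  # Literature Speaker Event
```

Tags missing from an item (here `description`, `pubDate`, and `author`) come back as empty strings rather than raising, which keeps every dictionary's keys consistent.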
- access_amherst_algo.rss_scraper.parse_rss.save_event_to_db(event_data)
Save event data to the Django model.
This function processes event data for database storage by parsing publication, start, and end dates into timezone-aware datetime objects, extracting and adjusting location data, and generating a unique event ID. The function then updates or creates an entry in the database’s Event table.
- Parameters:
event_data (dict) – A dictionary of event details with keys such as title, link, pub_date, starttime, endtime, and location.
- Return type:
None
Examples
>>> event_data = {
...     "title": "Literature Speaker Event",
...     "link": "https://thehub.amherst.edu/event/10000000",
...     "event_description": "Join us to hear our speaker's talk on American Literature! Food from a local restaurant will be provided.",
...     "categories": ["Lecture", "Workshop"],
...     "pub_date": "Sun, 03 Nov 2024 05:30:25 GMT",
...     "starttime": "Tue, 05 Nov 2024 18:00:00 GMT",
...     "endtime": "Tue, 05 Nov 2024 19:00:00 GMT",
...     "location": "Friedmann Room",
...     "author": "literature@amherst.edu",
...     "host": "Literature Club",
... }
>>> save_event_to_db(event_data)
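The date handling and ID generation described above can be sketched with `email.utils.parsedate_to_datetime`, which parses RFC 2822 dates such as `Tue, 05 Nov 2024 18:00:00 GMT` into timezone-aware `datetime` objects. The ID scheme (a truncated SHA-256 of the link and start time) and the output field names are hypothetical; the original's location adjustments are not described, so this sketch only trims whitespace. The Django call appears as a comment because the `Event` model's fields are assumptions.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_rss_date(value: str) -> datetime:
    """Parse an RFC 2822 date string into a timezone-aware datetime."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # A '-0000' offset yields a naive datetime; assume UTC in that case.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def make_event_id(event_data: dict) -> str:
    """Hypothetical ID scheme: first 16 hex digits of SHA-256(link|starttime)."""
    key = f"{event_data['link']}|{event_data['starttime']}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def prepare_event_fields(event_data: dict) -> dict:
    """Build model field values; the field names here are hypothetical."""
    return {
        "title": event_data["title"],
        "pub_date": parse_rss_date(event_data["pub_date"]),
        "start_time": parse_rss_date(event_data["starttime"]),
        "end_time": parse_rss_date(event_data["endtime"]),
        # The original's location adjustments are undocumented; only whitespace is trimmed.
        "location": event_data.get("location", "").strip(),
    }


SAMPLE_EVENT = {
    "title": "Literature Speaker Event",
    "link": "https://thehub.amherst.edu/event/10000000",
    "pub_date": "Sun, 03 Nov 2024 05:30:25 GMT",
    "starttime": "Tue, 05 Nov 2024 18:00:00 GMT",
    "endtime": "Tue, 05 Nov 2024 19:00:00 GMT",
    "location": "Friedmann Room",
}

fields = prepare_event_fields(SAMPLE_EVENT)
# With Django, a hypothetical Event model could then be upserted:
# Event.objects.update_or_create(id=make_event_id(SAMPLE_EVENT), defaults=fields)
```

Deriving the ID from stable inputs means re-running the scraper updates the same row instead of inserting a duplicate, which is what makes an update-or-create step idempotent.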
- access_amherst_algo.rss_scraper.parse_rss.save_json()
Save the list of extracted events to a JSON file.
This function generates a timestamped JSON file of event details. It builds the event list by calling create_events_list() and writes it to a JSON file whose name is based on the current date and time.
The file is saved in the json_outputs directory under the rss_scraper folder.
- Return type:
None
Examples
>>> save_json()
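A sketch of the JSON step, assuming the same hypothetical `hub_%Y_%m_%d_%H` naming pattern and taking the event list as a parameter rather than calling `create_events_list()` internally. The helper names are illustrative. Serialization is split from the file write so it can be checked without touching the filesystem.

```python
import json
from datetime import datetime
from pathlib import Path


def json_output_path(scraper_dir: Path, now: datetime) -> Path:
    """Timestamped path in json_outputs; the 'hub_%Y_%m_%d_%H.json' pattern is hypothetical."""
    return scraper_dir / "json_outputs" / f"hub_{now.strftime('%Y_%m_%d_%H')}.json"


def events_to_json(events: list[dict]) -> str:
    """Serialize events; default=str turns any datetime values into strings."""
    return json.dumps(events, indent=4, ensure_ascii=False, default=str)


def save_events_json(events: list[dict], scraper_dir: Path, now: datetime) -> Path:
    """Write the serialized events to a timestamped file and return its path."""
    path = json_output_path(scraper_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(events_to_json(events), encoding="utf-8")
    return path
```

`ensure_ascii=False` keeps accented characters readable in the output file, and `default=str` prevents a crash if an event ever carries a `datetime` instead of a string.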
- access_amherst_algo.rss_scraper.parse_rss.save_to_db()
Clean and save event data to the database.
This function first retrieves a cleaned list of events by calling the clean_hub_data() function. It then iterates through each event in the list and saves the event data to the database using the save_event_to_db() function.
This process ensures that only cleaned event data is stored in the database.
- Return type:
None
Examples
>>> save_to_db()
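The clean-then-save loop described for save_to_db() reduces to a few lines. In this sketch the cleaning and saving functions are passed in as parameters (a design choice for testability, not how the documented function is called), so the real pipeline would correspond to `save_to_db_sketch(clean_hub_data, save_event_to_db)`. The returned count and the name `save_to_db_sketch` are additions for illustration.

```python
from typing import Callable, Iterable


def save_to_db_sketch(
    clean_events: Callable[[], Iterable[dict]],
    save_event: Callable[[dict], None],
) -> int:
    """Persist every cleaned event and return how many were passed to save_event."""
    count = 0
    for event in clean_events():
        save_event(event)  # in the documented module this role is played by save_event_to_db
        count += 1
    return count


# Stand-ins for clean_hub_data and save_event_to_db, for a quick dry run.
saved = []
total = save_to_db_sketch(lambda: [{"title": "Literature Speaker Event"}], saved.append)
```

Injecting the two functions lets the loop be exercised with in-memory stand-ins, without a database or a live feed.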