Spiders
- class city_scrapers_core.spiders.CityScrapersSpider(*args, **kwargs)
Base Spider class for City Scrapers projects. Provides a few utilities for common tasks like creating a meeting ID and checking the status based on meeting details.
- get_id(item, identifier=None)
Create an ID for a meeting based on its details, such as title and start time, as well as any unique identifier provided by the agency.
- Parameters
item (Mapping) – Meeting to generate an ID for
identifier (Optional[str]) – Optional unique meeting identifier if available, defaults to None
- Return type
str
- Returns
ID string based on meeting details
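To illustrate how an ID can combine these details, here is a minimal standalone sketch. It is not the library's implementation: the function name make_meeting_id, the explicit spider_name argument, the "x" placeholder for a missing identifier, and the slash-separated layout are all assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import Mapping, Optional


def make_meeting_id(
    spider_name: str, item: Mapping, identifier: Optional[str] = None
) -> str:
    """Hypothetical get_id-style helper (a sketch, not the library's code).

    Assumed layout: <spider name>/<start as YYYYMMDDHHMM>/<identifier or "x">/<title slug>
    """
    # Lowercase the title and collapse runs of non-alphanumeric characters
    # into single underscores, trimming any at the ends.
    title_slug = re.sub(r"[^a-z0-9]+", "_", item["title"].lower()).strip("_")
    # Use the agency-provided identifier when available; slashes would clash
    # with the separator, so they are replaced with dashes.
    item_id = (identifier or "x").replace("/", "-")
    start_str = item["start"].strftime("%Y%m%d%H%M")
    return "/".join([spider_name, start_str, item_id, title_slug])


meeting = {"title": "Board of Directors", "start": datetime(2024, 3, 5, 18, 30)}
print(make_meeting_id("chi_example", meeting))
# chi_example/202403051830/x/board_of_directors
print(make_meeting_id("chi_example", meeting, identifier="123/4"))
# chi_example/202403051830/123-4/board_of_directors
```

Including the start time and an identifier keeps IDs distinct for recurring meetings that share a title.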
- get_status(item, text='')
Determine the status of a meeting based on its details as well as any additional text that may indicate whether it has been cancelled.
- Parameters
item (Mapping) – Meeting to get the status for
text (str) – Any additional text not included in the meeting details that may indicate whether it’s been cancelled, defaults to ""
- Return type
str
- Returns
Status constant
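A minimal standalone sketch of how a status could be derived from meeting details and extra text. The status strings, the keyword list, and the name determine_status are assumptions for illustration, not the library's exact constants or logic; the current time is passed in as now so the result is deterministic.

```python
from datetime import datetime
from typing import Mapping, Optional

# Assumed status values for this sketch; the library's constants may differ.
CANCELLED = "cancelled"
PASSED = "passed"
TENTATIVE = "tentative"

# Hypothetical keywords treated as signs that a meeting will not take place.
CANCEL_KEYWORDS = ("cancel", "rescheduled", "postponed")


def determine_status(
    item: Mapping, text: str = "", now: Optional[datetime] = None
) -> str:
    """Hypothetical get_status-style helper (a sketch, not the library's code)."""
    now = now or datetime.now()
    # Search title, description, and any extra text together, case-insensitively.
    meeting_text = " ".join(
        [item.get("title", ""), item.get("description", ""), text]
    ).lower()
    if any(word in meeting_text for word in CANCEL_KEYWORDS):
        return CANCELLED
    # A meeting whose start time is already in the past has passed.
    if item["start"] < now:
        return PASSED
    return TENTATIVE


now = datetime(2024, 6, 1)
print(determine_status({"title": "Board Meeting", "start": datetime(2024, 7, 1)}, now=now))
# tentative
print(determine_status({"title": "Board Meeting", "start": datetime(2024, 7, 1)}, text="CANCELLED", now=now))
# cancelled
```

Checking for cancellation first means a cancelled meeting keeps that status even after its date passes.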
- class city_scrapers_core.spiders.EventsCalendarSpider(*args, **kwargs)
Subclass of CityScrapersSpider that may be useful for WordPress sites using the Events Calendar plugin. Three additional things need to be implemented when subclassing:
- a categories dict
- _parse_location()
- _parse_links()
- property categories: Dict
The categories dict should have the following format:
    categories = {
        BOARD: ["category-1", "category-2"],
        ...
    }
- Return type
Dict
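A categories dict in this format can be used to look up a classification from an event's Events Calendar categories. The sketch below is a standalone illustration: the classify_event function, the classification strings, the example category names, and the fallback value are hypothetical, and the real spider may apply the mapping differently.

```python
from typing import Dict, Iterable, List

# Assumed classification values for this sketch.
BOARD = "Board"
COMMITTEE = "Committee"
NOT_CLASSIFIED = "Not classified"

# Example categories dict in the documented format, with hypothetical
# category names: classification -> list of Events Calendar categories.
categories: Dict[str, List[str]] = {
    BOARD: ["board-meetings", "board-of-directors"],
    COMMITTEE: ["finance-committee"],
}


def classify_event(event_categories: Iterable[str]) -> str:
    """Return the first classification whose categories overlap the event's."""
    event_set = set(event_categories)
    # Dicts preserve insertion order, so earlier entries take priority.
    for classification, names in categories.items():
        if event_set.intersection(names):
            return classification
    return NOT_CLASSIFIED


print(classify_event(["board-meetings", "public"]))  # Board
print(classify_event(["community"]))  # Not classified
```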
- class city_scrapers_core.spiders.LegistarSpider(*args, **kwargs)
Subclass of CityScrapersSpider that handles processing Legistar sites, which almost always share the same components and general structure. Any methods that don’t pull the correct values can be overridden in a subclass.
- legistar_links(item)
Pulls relevant links from a Legistar item
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
List[Dict]
- Returns
List of meeting links
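A standalone sketch of pulling links from a Legistar item. The item shape is an assumption: document columns such as "Agenda" and "Minutes" are taken to be dicts with a "url" key when a document is attached, and plain strings otherwise. The extract_links name and the "href"/"title" link keys are also hypothetical, not the library's confirmed API.

```python
from typing import Dict, List


def extract_links(item: Dict) -> List[Dict]:
    """Hypothetical legistar_links-style helper (a sketch, not the library's code).

    Assumes "Agenda" and "Minutes" hold {"url": ...} dicts when a document exists.
    """
    links = []
    for doc in ("Agenda", "Minutes"):
        value = item.get(doc)
        # Skip plain-text placeholders and entries without a URL.
        if isinstance(value, dict) and value.get("url"):
            links.append({"href": value["url"], "title": doc})
    return links


row = {
    "Name": "City Council",
    "Agenda": {"label": "Agenda", "url": "https://example.com/agenda.pdf"},
    "Minutes": "Not available",
}
print(extract_links(row))
# [{'href': 'https://example.com/agenda.pdf', 'title': 'Agenda'}]
```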
- legistar_source(item)
Pulls the source URL from a Legistar item: a specific meeting URL if available, otherwise the general Legistar calendar page.
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
str
- Returns
Source URL
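A standalone sketch of the fallback described above. It assumes the meeting-specific URL lives in a "Meeting Details" column shaped like {"url": ...} and that the general calendar page is Calendar.aspx under the site's base URL; the extract_source name and base_url parameter are hypothetical.

```python
from typing import Dict


def extract_source(item: Dict, base_url: str) -> str:
    """Hypothetical legistar_source-style helper (a sketch, not the library's code)."""
    # Assumed fallback: the general calendar page under the site's base URL.
    default_url = base_url.rstrip("/") + "/Calendar.aspx"
    # Prefer a meeting-specific URL from the assumed "Meeting Details" column.
    details = item.get("Meeting Details")
    if isinstance(details, dict) and details.get("url"):
        return details["url"]
    return default_url


print(extract_source({"Meeting Details": "Not available"}, "https://example.com"))
# https://example.com/Calendar.aspx
```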
- legistar_start(item)
Pulls the start time from a Legistar item
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
datetime
- Returns
Meeting start datetime
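A standalone sketch of combining separate date and time columns into one datetime. The column names "Meeting Date" and "Meeting Time" and their formats (for example "3/5/2024" and "6:30 PM") are assumptions about the scraped item, and extract_start is a hypothetical name. Unlike the documented method, this sketch returns None when no date is present.

```python
from datetime import datetime
from typing import Dict, Optional


def extract_start(item: Dict) -> Optional[datetime]:
    """Hypothetical legistar_start-style helper (a sketch, not the library's code).

    Assumes "Meeting Date" looks like "3/5/2024" and "Meeting Time" like "6:30 PM".
    """
    date_str = item.get("Meeting Date")
    time_str = item.get("Meeting Time")
    if not date_str:
        return None
    if time_str:
        try:
            return datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %I:%M %p")
        except ValueError:
            # The time column may hold text such as "Cancelled"; use the date alone.
            pass
    return datetime.strptime(date_str, "%m/%d/%Y")


print(extract_start({"Meeting Date": "3/5/2024", "Meeting Time": "6:30 PM"}))
# 2024-03-05 18:30:00
```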
- parse_legistar(events)
Method to be implemented by Spider classes to handle the response from Legistar. Functions similarly to parse in other Spider classes.
- Parameters
events (Iterable[Dict]) – Iterable of dicts containing scraped results from Legistar
- Raises
NotImplementedError – Must be implemented in subclasses
- Return type
Iterable[Meeting]
- Returns
Meeting objects that will be passed to pipelines, output
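The hook pattern can be sketched without Scrapy or the library installed. In this illustration BaseLegistarSketch is a hypothetical stand-in for LegistarSpider, reduced to the parse_legistar hook, and ExampleCouncilSpider is a hypothetical subclass. Plain dicts stand in for Meeting items, and the "Name", "Meeting Date", and "Meeting Location" keys are assumed item fields.

```python
from datetime import datetime
from typing import Dict, Iterable, Iterator


class BaseLegistarSketch:
    """Hypothetical stand-in for LegistarSpider, reduced to the hook pattern."""

    def parse_legistar(self, events: Iterable[Dict]) -> Iterable[Dict]:
        raise NotImplementedError("Must be implemented in subclasses")


class ExampleCouncilSpider(BaseLegistarSketch):
    """Hypothetical subclass that turns Legistar rows into meeting dicts."""

    def parse_legistar(self, events: Iterable[Dict]) -> Iterator[Dict]:
        for event in events:
            # A real spider would build Meeting items; dicts keep this self-contained.
            yield {
                "title": event["Name"],
                "start": datetime.strptime(event["Meeting Date"], "%m/%d/%Y"),
                "location": event.get("Meeting Location", ""),
            }


rows = [{"Name": "City Council", "Meeting Date": "1/15/2024", "Meeting Location": "City Hall"}]
meetings = list(ExampleCouncilSpider().parse_legistar(rows))
print(meetings[0]["title"], meetings[0]["start"])
# City Council 2024-01-15 00:00:00
```

Writing parse_legistar as a generator mirrors how Scrapy parse callbacks typically yield items one at a time.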