Spiders
- class city_scrapers_core.spiders.CityScrapersSpider(*args, **kwargs)
Base Spider class for City Scrapers projects. Provides a few utilities for common tasks like creating a meeting ID and checking the status based on meeting details.
- get_id(item, identifier=None)
Create an ID for a meeting based on its details like title and start time as well as any agency-provided unique identifiers.
- Parameters
item (Mapping) – Meeting to generate an ID for
identifier (Optional[str]) – Optional unique meeting identifier if available, defaults to None
- Return type
str
- Returns
ID string based on meeting details
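As a rough illustration of how an ID of this kind might be assembled, the sketch below joins a spider name, the start time, an identifier, and a slug of the title. The helper `make_meeting_id`, the `/`-separated layout, and the `"x"` placeholder are assumptions made for illustration; they are not taken from this reference and may differ from what `get_id` actually produces.

```python
import re
from datetime import datetime
from typing import Mapping, Optional


def make_meeting_id(spider_name: str, item: Mapping, identifier: Optional[str] = None) -> str:
    """Hypothetical stand-in for get_id: build an ID from meeting details."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    title_slug = re.sub(r"[^a-z0-9]+", "_", item["title"].lower()).strip("_")
    # Assumed placeholder for meetings without an agency-provided identifier.
    item_id = (identifier or "x").replace("/", "-")
    # Assumed timestamp layout: year, month, day, hour, minute.
    start_str = item["start"].strftime("%Y%m%d%H%M")
    return "/".join([spider_name, start_str, item_id, title_slug])


meeting = {"title": "Board of Directors", "start": datetime(2024, 3, 5, 18, 30)}
print(make_meeting_id("example_board", meeting))
```

Including the start time and title keeps IDs distinct when an agency publishes no identifier of its own.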
- get_status(item, text='')
Determine the status of a meeting based on its details as well as any additional text that may indicate whether it has been cancelled.
- Parameters
item (Mapping) – Meeting to get the status for
text (str) – Any additional text not included in the meeting details that may indicate whether it has been cancelled, defaults to ""
- Return type
str
- Returns
Status constant
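The sketch below shows one way a status check of this kind could work: look for cancellation wording in the title and the extra text, then compare the start time to the current time. The function `guess_status`, the three constants, and the `now` parameter are hypothetical names introduced here; the library's own constants and rules may differ.

```python
from datetime import datetime
from typing import Mapping, Optional

# Hypothetical status constants; the real values are defined by the library.
CANCELLED = "cancelled"
PASSED = "passed"
TENTATIVE = "tentative"


def guess_status(item: Mapping, text: str = "", now: Optional[datetime] = None) -> str:
    """Hypothetical stand-in for get_status."""
    now = now or datetime.now()
    # Search the title and any additional text for cancellation wording.
    haystack = f"{item.get('title', '')} {text}".lower()
    if "cancel" in haystack:
        return CANCELLED
    # A meeting whose start time is before `now` is treated as passed.
    if item["start"] < now:
        return PASSED
    return TENTATIVE


meeting = {"title": "Finance Committee", "start": datetime(2024, 3, 5, 18, 30)}
print(guess_status(meeting, text="Meeting cancelled due to weather"))
```

Passing `now` explicitly is a choice made for this sketch so the result is reproducible in tests.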
- class city_scrapers_core.spiders.LegistarSpider(*args, **kwargs)
Subclass of CityScrapersSpider that handles processing Legistar sites, which almost always share the same components and general structure. Any methods that don't pull the correct values can be replaced.
- legistar_links(item)
Pulls relevant links from a Legistar item.
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
List[Dict]
- Returns
List of meeting links
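To make the idea concrete, here is a minimal sketch of link extraction from a scraped row. It assumes that each linked cell is a dict with a `url` key while plain cells are strings, and that the link columns are named `Agenda`, `Minutes`, and `Video`. The item shape, the column names, and the `href`/`title` keys in the output are all assumptions, not details confirmed by this reference.

```python
from typing import Dict, List

# Assumed column labels; a given Legistar site may use different ones.
LINK_COLUMNS = ("Agenda", "Minutes", "Video")


def extract_links(item: Dict) -> List[Dict]:
    """Hypothetical stand-in for legistar_links."""
    links = []
    for column in LINK_COLUMNS:
        value = item.get(column)
        # Only cells that were scraped as links carry a URL.
        if isinstance(value, dict) and value.get("url"):
            links.append({"href": value["url"], "title": column})
    return links


event = {
    "Name": "City Council",
    "Agenda": {"label": "Agenda", "url": "https://example.com/agenda.pdf"},
    "Minutes": "Not available",
}
print(extract_links(event))
```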
- legistar_source(item)
Pulls the source URL from a Legistar item. Pulls a specific meeting URL if available, otherwise defaults to the general Legistar calendar page.
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
str
- Returns
Source URL
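The fallback described above can be sketched as follows. The function `pick_source`, its `calendar_url` argument, and the `Name` and `Meeting Details` column names are assumptions for illustration; the actual method may look at different fields.

```python
from typing import Dict


def pick_source(item: Dict, calendar_url: str) -> str:
    """Hypothetical stand-in for legistar_source."""
    # Assumed columns that may link to a page for this specific meeting.
    for column in ("Name", "Meeting Details"):
        value = item.get(column)
        if isinstance(value, dict) and value.get("url"):
            return value["url"]
    # No meeting-specific link was found, so use the general calendar page.
    return calendar_url


calendar = "https://example.legistar.com/Calendar.aspx"
with_link = {"Meeting Details": {"label": "Details", "url": "https://example.legistar.com/MeetingDetail.aspx?ID=1"}}
print(pick_source(with_link, calendar))
print(pick_source({"Name": "City Council"}, calendar))
```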
- legistar_start(item)
Pulls the start time from a Legistar item.
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
datetime
- Returns
Meeting start datetime
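A start time is typically assembled from separate date and time cells. The sketch below assumes columns named `Meeting Date` and `Meeting Time` in US-style formats and falls back to midnight when the time cannot be parsed. The column names, formats, and the `None` return for a missing date are assumptions; the documented method returns a `datetime`.

```python
from datetime import datetime
from typing import Dict, Optional


def parse_start(item: Dict) -> Optional[datetime]:
    """Hypothetical stand-in for legistar_start."""
    date_str = item.get("Meeting Date")
    time_str = item.get("Meeting Time")
    if not date_str:
        return None  # Assumed behaviour when no date was scraped.
    if time_str:
        try:
            return datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %I:%M %p")
        except ValueError:
            pass  # Unparseable time: fall through to the date-only value.
    return datetime.strptime(date_str, "%m/%d/%Y")


print(parse_start({"Meeting Date": "03/05/2024", "Meeting Time": "6:30 PM"}))
```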
- parse_legistar(events)
Method to be implemented by Spider classes that will handle the response from Legistar. Functions similarly to parse for other Spider classes.
- Parameters
events (Iterable[Dict]) – Iterable of dicts of scraped results from Legistar
- Raises
NotImplementedError – Must be implemented in subclasses
- Return type
Iterable[Meeting]
- Returns
Meeting objects that will be passed to pipelines and output
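The override pattern can be sketched without the library installed. Below, `BaseLegistarSketch` is a stand-in that only mimics the documented `NotImplementedError`, `ExampleBoardSpider` is a hypothetical subclass, and plain dicts take the place of `Meeting` objects. The `Name` and `Meeting Date` keys are assumed column names.

```python
from datetime import datetime
from typing import Dict, Iterable, Iterator


class BaseLegistarSketch:
    """Stand-in for the documented base class so the example runs on its own."""

    def parse_legistar(self, events: Iterable[Dict]):
        raise NotImplementedError("Must be implemented in subclasses")


class ExampleBoardSpider(BaseLegistarSketch):
    name = "example_board"  # hypothetical spider name

    def parse_legistar(self, events: Iterable[Dict]) -> Iterator[Dict]:
        for event in events:
            # Plain dicts stand in for Meeting objects in this sketch.
            yield {
                "title": event["Name"],
                "start": datetime.strptime(event["Meeting Date"], "%m/%d/%Y"),
            }


scraped = [{"Name": "City Council", "Meeting Date": "03/05/2024"}]
for meeting in ExampleBoardSpider().parse_legistar(scraped):
    print(meeting)
```

In a real spider the subclass would extend `LegistarSpider` and yield `Meeting` items instead of dicts.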
-