Spiders
-
class city_scrapers_core.spiders.CityScrapersSpider(*args, **kwargs)
Base Spider class for City Scrapers projects. Provides utilities for common tasks such as creating a meeting ID and determining a meeting's status from its details.
-
get_id(item, identifier=None)
Create an ID for a meeting from its details, such as title and start time, along with any unique identifier provided by the agency.
- Parameters
item (Mapping) – Meeting to generate an ID for
identifier (Optional[str]) – Optional unique meeting identifier, if available; defaults to None
- Return type
str
- Returns
ID string based on meeting details
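To make the idea concrete, here is a minimal sketch of how an ID could be assembled from a meeting's title, start time and optional identifier. It is a simplified, standalone stand-in, not the library's actual implementation: the helper name `make_meeting_id`, the `spider_name` argument, the `"x"` placeholder for a missing identifier and the slash-separated layout are all assumptions for illustration.

```python
import re
from datetime import datetime
from typing import Mapping, Optional


def make_meeting_id(spider_name: str, item: Mapping, identifier: Optional[str] = None) -> str:
    """Hypothetical helper: build a meeting ID from the title, start time and an optional identifier."""
    # Lowercase the title and collapse each run of non-alphanumeric characters into "_"
    title_slug = re.sub(r"[^a-z0-9]+", "_", item["title"].lower()).strip("_")
    # Use a placeholder when no agency-provided identifier exists; "/" is replaced
    # because it is used as the separator between ID components
    item_id = (identifier or "x").replace("/", "-")
    # Start time formatted to the minute
    start_str = item["start"].strftime("%Y%m%d%H%M")
    return "/".join([spider_name, start_str, item_id, title_slug])


meeting = {"title": "Board of Directors: Regular Meeting", "start": datetime(2024, 3, 5, 18, 30)}
print(make_meeting_id("example_board", meeting))
# example_board/202403051830/x/board_of_directors_regular_meeting
```

Because every component is derived deterministically from the meeting itself, scraping the same meeting twice yields the same ID in this sketch, which is what makes such an ID usable for deduplication.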
-
get_status(item, text='')
Determine the status of a meeting from its details and any additional text that may indicate whether it has been cancelled.
- Parameters
item (Mapping) – Meeting to get the status for
text (str) – Any additional text, not included in the meeting details, that may indicate whether the meeting has been cancelled; defaults to ''
- Return type
str
- Returns
Status constant
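A rough sketch of the kind of logic described above: search the meeting details plus the extra text for cancellation wording, and otherwise classify by start time. This is a standalone illustration, not the library's code. The helper `meeting_status`, the keyword list, the status strings and the `now` parameter (added only to make the example deterministic) are assumptions.

```python
from datetime import datetime
from typing import Mapping, Optional

# Assumed status strings for this sketch; real code should use the library's own constants
CANCELLED = "cancelled"
PASSED = "passed"
TENTATIVE = "tentative"

# Illustrative keywords suggesting a meeting will not happen as scheduled
CANCEL_KEYWORDS = ("cancel", "rescheduled", "postpone")


def meeting_status(item: Mapping, text: str = "", now: Optional[datetime] = None) -> str:
    """Hypothetical helper: classify a meeting from its details plus any extra text."""
    # Search the title, description and caller-supplied text together, case-insensitively
    combined = " ".join(
        [item.get("title") or "", item.get("description") or "", text]
    ).lower()
    if any(keyword in combined for keyword in CANCEL_KEYWORDS):
        return CANCELLED
    now = now or datetime.now()
    # Meetings whose start time is already in the past count as passed
    return PASSED if item["start"] < now else TENTATIVE


meeting = {"title": "Finance Committee", "start": datetime(2024, 1, 10, 9, 0)}
print(meeting_status(meeting, text="This meeting has been CANCELLED", now=datetime(2024, 1, 1)))  # cancelled
print(meeting_status(meeting, now=datetime(2024, 1, 1)))  # tentative
print(meeting_status(meeting, now=datetime(2024, 2, 1)))  # passed
```

Passing `text` lets a spider feed in wording scraped from elsewhere on the page (for example, a notice next to the meeting listing) without storing it in the meeting's description.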
-
-
class city_scrapers_core.spiders.LegistarSpider(*args, **kwargs)
Subclass of CityScrapersSpider that handles processing Legistar sites, which almost always share the same components and general structure. Any method that doesn't pull the correct values for a particular site can be overridden in a subclass.
-
legistar_links(item)
Pulls relevant links from a Legistar item.
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
List[Dict]
- Returns
List of meeting links
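The following sketch shows one plausible way to collect links from a scraped Legistar row. The exact shape of a row is not given here, so the column names `"Agenda"` and `"Minutes"`, the assumption that a linked cell is a dict with a `"url"` key, the `{"href", "title"}` output shape and the helper name `extract_links` are all hypothetical.

```python
from typing import Dict, List

# Assumed columns that may carry document links in a scraped Legistar row
DOC_COLUMNS = ("Agenda", "Minutes")


def extract_links(item: Dict) -> List[Dict]:
    """Hypothetical helper: collect document links from one scraped Legistar row."""
    links = []
    for column in DOC_COLUMNS:
        value = item.get(column)
        # Linked cells are assumed to be dicts with a "url" key; plain-text cells are skipped
        if isinstance(value, dict) and value.get("url"):
            links.append({"href": value["url"], "title": column})
    return links


row = {
    "Name": "City Council",
    "Agenda": {"label": "Agenda", "url": "https://example.legistar.com/View.ashx?M=A&ID=1"},
    "Minutes": "Not available",
}
print(extract_links(row))
# [{'href': 'https://example.legistar.com/View.ashx?M=A&ID=1', 'title': 'Agenda'}]
```

If a particular site exposes other documents (for example, a video column), a spider could override this method and extend the column list.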
-
legistar_source(item)
Pulls the source URL from a Legistar item. Uses the meeting-specific URL if one is available; otherwise falls back to the general Legistar calendar page.
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
str
- Returns
Source URL
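The fallback behaviour described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the `"Meeting Details"` column, the dict-with-`"url"` cell shape, the `/Calendar.aspx` path, the explicit `base_url` argument and the helper name `extract_source` are hypothetical.

```python
from typing import Dict


def extract_source(item: Dict, base_url: str) -> str:
    """Hypothetical helper: prefer a meeting-specific URL, else the general calendar page."""
    # Assumed path of the general Legistar calendar page
    default_url = f"{base_url}/Calendar.aspx"
    details = item.get("Meeting Details")  # assumed column linking to the meeting page
    # Linked cells are assumed to be dicts with a "url" key; anything else falls back
    if isinstance(details, dict) and details.get("url"):
        return details["url"]
    return default_url


base = "https://example.legistar.com"
linked = {"Meeting Details": {"label": "Meeting details", "url": f"{base}/MeetingDetail.aspx?ID=1"}}
print(extract_source(linked, base))  # https://example.legistar.com/MeetingDetail.aspx?ID=1
print(extract_source({"Meeting Details": "Not available"}, base))  # https://example.legistar.com/Calendar.aspx
```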
-
legistar_start(item)
Pulls the start time from a Legistar item.
- Parameters
item (Dict) – Scraped item from Legistar
- Return type
datetime
- Returns
Meeting start datetime
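A minimal sketch of combining separate date and time cells into a single `datetime`, assuming the row stores them under hypothetical `"Meeting Date"` and `"Meeting Time"` keys in US-style formats such as `3/5/2024` and `6:30 PM`. Unlike the documented method, this illustrative helper (`extract_start`) returns `None` when no date is present.

```python
from datetime import datetime
from typing import Dict, Optional


def extract_start(item: Dict) -> Optional[datetime]:
    """Hypothetical helper: combine assumed date and time columns into a datetime."""
    date_str = item.get("Meeting Date")
    time_str = item.get("Meeting Time")
    if not date_str:
        return None
    if time_str:
        try:
            return datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %I:%M %p")
        except ValueError:
            pass  # e.g. a time cell reading "Cancelled"; fall back to the date alone
    return datetime.strptime(date_str, "%m/%d/%Y")


print(extract_start({"Meeting Date": "3/5/2024", "Meeting Time": "6:30 PM"}))  # 2024-03-05 18:30:00
print(extract_start({"Meeting Date": "3/5/2024", "Meeting Time": "Cancelled"}))  # 2024-03-05 00:00:00
```

Falling back to midnight keeps a meeting with an unparseable time in the output rather than dropping it; a site with a different date format would override this method.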
-
parse_legistar(events)
Method to be implemented by Spider subclasses to handle the response from Legistar. Functions similarly to parse in other Spider classes.
- Parameters
events (Iterable[Dict]) – Iterable of dicts containing the scraped results from Legistar
- Raises
NotImplementedError – Must be implemented in subclasses
- Return type
Iterable[Meeting]
- Returns
Meeting objects that will be passed to pipelines and output
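The contract above is a template-method pattern: the base class declares `parse_legistar` and raises `NotImplementedError`, and each spider supplies its own version. The standalone sketch below mimics that pattern without Scrapy. `SimpleMeeting`, `BaseLegistarParser` and `ExampleCouncilParser` are hypothetical stand-ins for the library's `Meeting` item and `LegistarSpider`, and the `"Name"` and `"Meeting Date"` keys are assumed.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, Iterator, List


@dataclass
class SimpleMeeting:
    """Hypothetical stand-in for the library's Meeting item."""
    title: str
    start: datetime
    links: List[Dict] = field(default_factory=list)


class BaseLegistarParser:
    """Hypothetical stand-in for LegistarSpider: subclasses must supply parse_legistar."""

    def parse_legistar(self, events: Iterable[Dict]) -> Iterable[SimpleMeeting]:
        raise NotImplementedError("Must be implemented in subclasses")


class ExampleCouncilParser(BaseLegistarParser):
    def parse_legistar(self, events: Iterable[Dict]) -> Iterator[SimpleMeeting]:
        for event in events:
            # The name cell may be plain text or an assumed {"label", "url"} dict
            name = event["Name"]
            title = name["label"] if isinstance(name, dict) else name
            # Yield one meeting per scraped row, like a generator-style parse callback
            yield SimpleMeeting(
                title=title,
                start=datetime.strptime(event["Meeting Date"], "%m/%d/%Y"),
            )


events = [{"Name": "City Council", "Meeting Date": "3/5/2024"}]
meetings = list(ExampleCouncilParser().parse_legistar(events))
print(meetings[0].title, meetings[0].start)  # City Council 2024-03-05 00:00:00
```

Writing the subclass method as a generator mirrors how Scrapy `parse` callbacks commonly yield items one at a time.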
-