Spiders
-
class city_scrapers_core.spiders.CityScrapersSpider(*args, **kwargs)
Base Spider class for City Scrapers projects. Provides a few utilities for common tasks like creating a meeting ID and checking the status based on meeting details.
-
get_id(item, identifier=None)
Create an ID for a meeting based on its details, such as title and start time, as well as any agency-provided unique identifiers.
- Parameters
item (Mapping) – Meeting to generate an ID for
identifier (Optional[str]) – Optional unique meeting identifier if available, defaults to None
- Return type
str
- Returns
ID string based on meeting details
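The idea of deriving an ID from meeting details can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the helper name sketch_meeting_id, the slash-separated layout, and the "x" placeholder for a missing identifier are all assumptions made for the example.

```python
import re
from datetime import datetime
from typing import Mapping, Optional


def sketch_meeting_id(spider_name: str, item: Mapping, identifier: Optional[str] = None) -> str:
    """Hypothetical helper: combine meeting details into one ID string."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    title_slug = re.sub(r"[^a-z0-9]+", "_", item["title"].lower()).strip("_")
    # Minute-resolution timestamp taken from the meeting start.
    start_str = item["start"].strftime("%Y%m%d%H%M")
    # Assumed convention: "x" stands in when the agency gives no identifier.
    item_id = identifier or "x"
    return "/".join([spider_name, start_str, item_id, title_slug])


meeting = {"title": "Board of Directors", "start": datetime(2019, 1, 9, 14, 30)}
print(sketch_meeting_id("example_agency", meeting))
# prints "example_agency/201901091430/x/board_of_directors"
```

Because the title and start time both feed the ID, two meetings of the same body on different dates get distinct IDs even without an agency-provided identifier.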
-
get_status(item, text='')
Determine the status of a meeting based on its details as well as any additional text that may indicate whether it has been cancelled.
- Parameters
item (Mapping) – Meeting to get the status for
text (str) – Any additional text not included in the meeting details that may indicate whether it’s been cancelled, defaults to an empty string
- Return type
str
- Returns
Status constant
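A status check of this kind can be sketched as a keyword search followed by a date comparison. The sketch below is an assumption-laden illustration, not the library's code: the constant values, the keyword list, the sketch_status name, and the extra now parameter (added so the example is deterministic) are all hypothetical.

```python
from datetime import datetime
from typing import Mapping, Optional

# Hypothetical stand-ins for the library's status constants.
CANCELLED = "cancelled"
PASSED = "passed"
TENTATIVE = "tentative"


def sketch_status(item: Mapping, text: str = "", now: Optional[datetime] = None) -> str:
    """Hypothetical helper: infer a status from meeting details and extra text."""
    now = now or datetime.now()
    # Search the title, description, and any extra text together.
    combined = " ".join([item.get("title", ""), item.get("description", ""), text]).lower()
    # Assumed keyword list; a cancellation notice takes priority over the date.
    if any(word in combined for word in ("cancel", "postpone", "rescheduled")):
        return CANCELLED
    if item["start"] < now:
        return PASSED
    return TENTATIVE


reference = datetime(2019, 1, 1)
upcoming = {"title": "Board of Directors", "start": datetime(2019, 1, 9, 14, 30)}
print(sketch_status(upcoming, now=reference))
print(sketch_status(upcoming, text="Meeting has been cancelled", now=reference))
```

Passing page text that sits outside the meeting fields through text is what lets a notice such as "Meeting has been cancelled" affect the result.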
-
-
class city_scrapers_core.spiders.LegistarSpider(*args, **kwargs)
Subclass of CityScrapersSpider that handles processing Legistar sites, which almost always share the same components and general structure.
Uses the Legistar events scraper from the python-legistar-scraper library (https://github.com/opencivicdata/python-legistar-scraper).
Any methods that don’t pull the correct values can be replaced.
-
property base_url
Property with the Legistar site’s base URL
- Return type
str
- Returns
Legistar base URL
-
legistar_links(item)
Pulls relevant links from a Legistar item
- Parameters
item (Mapping) – Scraped item from Legistar
- Return type
List[Mapping[str, str]]
- Returns
List of meeting links
-
legistar_source(item)
Pulls the source URL from a Legistar item. Pulls a specific meeting URL if available, otherwise defaults to the general Legistar calendar page.
- Parameters
item (Mapping) – Scraped item from Legistar
- Return type
str
- Returns
Source URL
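The fallback described here (a meeting-specific URL when present, otherwise the calendar page) can be sketched as follows. The key name "Meeting Details", the nested "url" field, the Calendar.aspx path, and the example URLs are assumptions made for illustration and may not match the shape of real scraped items.

```python
from typing import Mapping


def sketch_source(item: Mapping, base_url: str) -> str:
    """Hypothetical helper: prefer a meeting-specific URL, else the calendar page."""
    # Assumed item shape: a link field is a dict carrying a "url" key.
    details = item.get("Meeting Details")
    if isinstance(details, dict) and details.get("url"):
        return details["url"]
    # Assumed location of the general calendar page.
    return f"{base_url}/Calendar.aspx"


base = "https://example.legistar.com"
with_link = {"Meeting Details": {"url": "https://example.legistar.com/MeetingDetail.aspx?ID=1"}}
print(sketch_source(with_link, base))
print(sketch_source({"Meeting Details": "Not available"}, base))
```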
-
legistar_start(item)
Pulls the start time from a Legistar item
- Parameters
item (Mapping) – Scraped item from Legistar
- Return type
datetime
- Returns
Meeting start datetime
-
parse(response)
Parse response from the LegistarEventsScraper. Ignores the scrapy Response, which is still requested to be able to hook into scrapy broadly.
-
parse_legistar(events)
Method to be implemented by Spider classes that will handle the response from Legistar. Functions similarly to parse for other Spider classes.
- Parameters
events (Iterable[Tuple[Mapping, Optional[str]]]) – Iterable of tuples, each pairing a dict-like object of scraped results from Legistar with an agenda URL (if available)
- Raises
NotImplementedError – Must be implemented in subclasses
- Return type
Iterable[Meeting]
- Returns
[description]
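The shape of the events argument, an iterable of (scraped item, agenda URL) tuples, can be illustrated with a standalone generator. This sketch uses plain dicts in place of Meeting items so it runs without the library. The event keys ("Name", "Meeting Date", "Meeting Time"), the date format, and the output fields are assumptions and should be checked against the actual scraped data.

```python
from datetime import datetime
from typing import Iterable, Iterator, Mapping, Optional, Tuple


def sketch_parse_legistar(events: Iterable[Tuple[Mapping, Optional[str]]]) -> Iterator[dict]:
    """Hypothetical stand-in for a subclass's parse_legistar, yielding plain dicts."""
    for event, agenda_url in events:
        # Assumed keys and format for the scraped date and time strings.
        start = datetime.strptime(
            f'{event["Meeting Date"]} {event["Meeting Time"]}', "%m/%d/%Y %I:%M %p"
        )
        # The agenda URL is optional, so only add a link when one is present.
        links = [{"href": agenda_url, "title": "Agenda"}] if agenda_url else []
        yield {"title": event["Name"], "start": start, "links": links}


events = [
    ({"Name": "City Council", "Meeting Date": "1/9/2019", "Meeting Time": "2:30 PM"},
     "https://example.legistar.com/View.ashx?M=A&ID=1"),
    ({"Name": "Finance Committee", "Meeting Date": "1/10/2019", "Meeting Time": "10:00 AM"},
     None),
]
for meeting in sketch_parse_legistar(events):
    print(meeting["title"], meeting["start"], len(meeting["links"]))
```

In a real subclass, the loop body would build Meeting items and would typically call the helpers documented above (such as legistar_start, legistar_links, and legistar_source) rather than parsing fields by hand.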
-
property