Spiders

class city_scrapers_core.spiders.CityScrapersSpider(*args, **kwargs)

Base Spider class for City Scrapers projects. Provides a few utilities for common tasks like creating a meeting ID and checking the status based on meeting details.

get_id(item, identifier=None)

Create an ID for a meeting based on its details like title and start time as well as any agency-provided unique identifiers.

Parameters
  • item (Mapping) – Meeting to generate an ID for

  • identifier (Optional[str]) – Optional unique meeting identifier if available, defaults to None

Return type

str

Returns

ID string based on meeting details
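As a rough illustration of the idea described for get_id, the sketch below builds an ID from a spider name, the start time, an optional identifier, and a slugged title. The helper name sketch_meeting_id, the "x" placeholder, the separator, and the exact ID layout are assumptions made for this example and are not taken from the library.

```python
import re
from datetime import datetime
from typing import Mapping, Optional


def sketch_meeting_id(
    spider_name: str, item: Mapping, identifier: Optional[str] = None
) -> str:
    """Hypothetical helper: combine meeting details into one ID string."""
    # Collapse every run of non-alphanumeric characters into one underscore
    title_slug = re.sub(r"[^a-z0-9]+", "_", item["title"].lower()).strip("_")
    # Assumed placeholder "x" when the agency provides no identifier
    item_id = (identifier or "x").replace("/", "-")
    # Minute-level timestamp keeps recurring meetings distinct
    start_str = item["start"].strftime("%Y%m%d%H%M")
    return "/".join([spider_name, start_str, item_id, title_slug])


meeting = {"title": "Board of Directors", "start": datetime(2024, 3, 5, 18, 30)}
example_id = sketch_meeting_id("example_board", meeting)
```

Including the start time alongside the title means two meetings with the same name on different dates get different IDs, while an agency-provided identifier can separate meetings that share both.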

get_status(item, text='')

Determine the status of a meeting based on its details as well as any additional text that may indicate whether it has been cancelled.

Parameters
  • item (Mapping) – Meeting to get the status for

  • text (str) – Any additional text not included in the meeting details that may indicate whether it’s been cancelled, defaults to “”

Return type

str

Returns

Status constant
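One way to picture the status check is the sketch below. The constant values, the keyword list, the field names "title" and "description", and the helper name sketch_status are all assumptions for illustration. The library's own constants and rules may differ.

```python
from datetime import datetime
from typing import Mapping, Optional

# Hypothetical stand-ins for the library's status constants
CANCELLED = "cancelled"
PASSED = "passed"
TENTATIVE = "tentative"


def sketch_status(item: Mapping, text: str = "", now: Optional[datetime] = None) -> str:
    """Hypothetical helper: classify a meeting from its details and extra text."""
    now = now or datetime.now()
    # Search the meeting details together with any extra text
    combined = " ".join(
        [item.get("title", ""), item.get("description", ""), text]
    ).lower()
    # Assumed keywords that signal a cancellation
    if any(word in combined for word in ("cancel", "postponed", "rescheduled")):
        return CANCELLED
    # Meetings that started before "now" are treated as passed
    if item["start"] < now:
        return PASSED
    return TENTATIVE


reference_time = datetime(2024, 3, 1, 12, 0)
upcoming = {"title": "Budget Hearing", "start": datetime(2024, 3, 5, 18, 30)}
status_plain = sketch_status(upcoming, now=reference_time)
status_notice = sketch_status(upcoming, text="Meeting CANCELLED", now=reference_time)
```

The text argument matters when a cancellation notice appears somewhere on the page that is not stored in the meeting's own fields, such as a banner or a link label.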

class city_scrapers_core.spiders.LegistarSpider(*args, **kwargs)

Subclass of CityScrapersSpider that handles processing Legistar sites, which almost always share the same components and general structure.

Uses the Legistar events scraper from the python-legistar-scraper library (https://github.com/opencivicdata/python-legistar-scraper).

Any method that doesn’t pull the correct values for a given site can be overridden in a subclass.

property base_url

Property with the Legistar site’s base URL

Return type

str

Returns

Legistar base URL

legistar_links(item)

Pulls relevant links from a Legistar item

Parameters

item (Mapping) – Scraped item from Legistar

Return type

List[Mapping[str, str]]

Returns

List of meeting links

legistar_source(item)

Pulls the source URL from a Legistar item. Pulls a specific meeting URL if available, otherwise defaults to the general Legistar calendar page.

Parameters

item (Mapping) – Scraped item from Legistar

Return type

str

Returns

Source URL
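The fallback described for legistar_source could look roughly like the sketch below. The item keys "Meeting Details" and "url", the "/Calendar.aspx" path, and the helper name sketch_source are assumptions about how a scraped Legistar item might be shaped, not confirmed details of the library.

```python
from collections.abc import Mapping


def sketch_source(item: Mapping, base_url: str) -> str:
    """Hypothetical helper: prefer a meeting-specific URL, else the calendar page."""
    # Assumed location of the general calendar page
    default_url = f"{base_url}/Calendar.aspx"
    # Assumed shape: a link cell is scraped as a mapping with a "url" key
    details = item.get("Meeting Details")
    if isinstance(details, Mapping) and details.get("url"):
        return details["url"]
    return default_url


base = "https://example.legistar.com"
with_link = {"Meeting Details": {"label": "Details", "url": base + "/MeetingDetail.aspx?ID=1"}}
without_link = {"Meeting Details": "Not available"}
source_specific = sketch_source(with_link, base)
source_default = sketch_source(without_link, base)
```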

legistar_start(item)

Pulls the start time from a Legistar item

Parameters

item (Mapping) – Scraped item from Legistar

Return type

datetime

Returns

Meeting start datetime

parse(response)

Parse the response from the LegistarEventsScraper. The Scrapy Response itself is ignored; the request is still made so that the spider hooks into Scrapy’s normal crawl flow.

Parameters

response (Response) – Scrapy response to be ignored

Return type

Iterable[Meeting]

Returns

Iterable of processed meetings

parse_legistar(events)

Method to be implemented by Spider classes that will handle the response from Legistar. Functions similarly to parse in other Spider classes.

Parameters

events (Iterable[Tuple[Mapping, Optional[str]]]) – Iterable of tuples, each pairing a dict-like object of scraped results from Legistar with an agenda URL (if available)

Raises

NotImplementedError – Must be implemented in subclasses

Return type

Iterable[Meeting]

Returns

Iterable of processed meetings
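To show the shape of the events argument, the sketch below walks an iterable of (item, agenda URL) tuples and yields one record per event. It is a standalone function that yields plain dicts rather than Meeting items so that it runs without the library. The event keys "Name", "Meeting Date", and "Meeting Time", the date format, and the helper name sketch_parse_legistar are assumptions for illustration.

```python
from datetime import datetime
from typing import Iterable, Iterator, Mapping, Optional, Tuple


def sketch_parse_legistar(
    events: Iterable[Tuple[Mapping, Optional[str]]]
) -> Iterator[dict]:
    """Hypothetical stand-in for a subclass's parse_legistar implementation."""
    for event, agenda_url in events:
        # Assumed date and time formats, e.g. "3/5/2024" and "6:30 PM"
        start = datetime.strptime(
            f"{event['Meeting Date']} {event['Meeting Time']}", "%m/%d/%Y %I:%M %p"
        )
        # The agenda URL is optional, so only add a link when one exists
        links = [{"title": "Agenda", "href": agenda_url}] if agenda_url else []
        yield {"title": event["Name"], "start": start, "links": links}


sample_events = [
    (
        {"Name": "City Council", "Meeting Date": "3/5/2024", "Meeting Time": "6:30 PM"},
        "https://example.legistar.com/View.ashx?M=A&ID=1",
    ),
    (
        {"Name": "Finance Committee", "Meeting Date": "3/7/2024", "Meeting Time": "9:00 AM"},
        None,
    ),
]
parsed = list(sketch_parse_legistar(sample_events))
```

In an actual subclass the same loop would live in a parse_legistar method and would typically build each result with the spider's own helpers, such as get_id and get_status, instead of returning bare dicts.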