Spiders

class city_scrapers_core.spiders.CityScrapersSpider(*args, **kwargs)[source]

Base Spider class for City Scrapers projects. Provides a few utilities for common tasks like creating a meeting ID and checking the status based on meeting details.

get_id(item, identifier=None)[source]

Create an ID for a meeting based on its details like title and start time as well as any agency-provided unique identifiers.

Parameters
  • item (Mapping) – Meeting to generate an ID for

  • identifier (Optional[str]) – Optional unique meeting identifier if available, defaults to None

Return type

str

Returns

ID string based on meeting details
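
As a rough illustration of the idea, the sketch below builds an ID from a spider name, the start time, an optional identifier, and a slug of the title. The function name `sketch_meeting_id`, the `spider_name` argument, the "x" placeholder, and the exact layout of the resulting string are assumptions made for illustration; the library's actual format may differ.

```python
import re
from datetime import datetime
from typing import Mapping, Optional


def sketch_meeting_id(
    spider_name: str, item: Mapping, identifier: Optional[str] = None
) -> str:
    """Hypothetical stand-in for get_id; not the library's implementation."""
    # Reduce the title to lowercase alphanumeric words joined by underscores
    title_slug = re.sub(r"[^a-z0-9]+", "_", item["title"].lower()).strip("_")
    # Fall back to a placeholder when the agency provides no identifier,
    # and keep slashes out of the identifier so the segments stay unambiguous
    item_id = (identifier or "x").replace("/", "-")
    # Format the start time as a compact, sortable string
    start_str = item["start"].strftime("%Y%m%d%H%M")
    return "/".join([spider_name, start_str, item_id, title_slug])


meeting = {"title": "Board of Directors", "start": datetime(2024, 3, 5, 18, 30)}
example_id = sketch_meeting_id("example_agency", meeting)
```

Including the start time and title means that two meetings without agency identifiers still receive distinct IDs unless both details match.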

get_status(item, text='')[source]

Determine the status of a meeting based on its details as well as any additional text that may indicate whether it has been cancelled.

Parameters
  • item (Mapping) – Meeting to get the status for

  • text (str) – Any additional text not included in the meeting details that may indicate whether it’s been cancelled, defaults to ''

Return type

str

Returns

Status constant
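
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: the constant values, the `sketch_status` name, the `now` parameter, the choice of fields searched, and the order of the checks are all assumptions.

```python
from datetime import datetime
from typing import Mapping, Optional

# Hypothetical status values; the library defines its own constants
CANCELLED = "cancelled"
PASSED = "passed"
TENTATIVE = "tentative"


def sketch_status(
    item: Mapping, text: str = "", now: Optional[datetime] = None
) -> str:
    """Hypothetical stand-in for get_status."""
    now = now or datetime.now()
    # Search the meeting details and any extra text for cancellation wording
    combined = " ".join(
        [item.get("title", ""), item.get("description", ""), text]
    ).lower()
    if "cancel" in combined:
        return CANCELLED
    # A meeting whose start time is behind us has already happened
    if item["start"] < now:
        return PASSED
    return TENTATIVE
```

Passing the surrounding page text as `text` lets a cancellation notice that sits outside the meeting's own fields still affect the result.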

class city_scrapers_core.spiders.EventsCalendarSpider(*args, **kwargs)[source]

Subclass of CityScrapersSpider that may be useful for WordPress sites with the Events Calendar plugin. Three additional things need to be implemented when subclassing:

  1. a categories dict

  2. _parse_location()

  3. _parse_links()

property categories: Dict

The categories dict should have the following format:

categories = {
    BOARD: ["category-1", "category-2"],
    ...
}

Return type

Dict
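
A minimal sketch of such a mapping is shown below. The constant values, the category slugs, and the `classify` helper are hypothetical; the documentation above does not say how the spider consumes the dict, so the lookup is only one plausible use.

```python
from typing import Dict, List

# Hypothetical classification constants; the real project defines its own
BOARD = "Board"
COMMITTEE = "Committee"
NOT_CLASSIFIED = "Not classified"

# Each classification maps to the calendar category slugs that belong to it
categories: Dict[str, List[str]] = {
    BOARD: ["board-meetings", "board-of-directors"],
    COMMITTEE: ["finance-committee"],
}


def classify(category_slug: str) -> str:
    """Return the classification whose slug list contains category_slug."""
    for classification, slugs in categories.items():
        if category_slug in slugs:
            return classification
    return NOT_CLASSIFIED
```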

class city_scrapers_core.spiders.LegistarSpider(*args, **kwargs)[source]

Subclass of CityScrapersSpider that handles processing Legistar sites, which almost always share the same components and general structure.

Any methods that don’t pull the correct values can be overridden in a subclass.

legistar_links(item)[source]

Pulls relevant links from a Legistar item

Parameters

item (Dict) – Scraped item from Legistar

Return type

List[Dict]

Returns

List of meeting links

legistar_source(item)[source]

Pulls the source URL from a Legistar item. Pulls a specific meeting URL if available, otherwise defaults to the general Legistar calendar page.

Parameters

item (Dict) – Scraped item from Legistar

Return type

str

Returns

Source URL
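
The fallback behavior can be sketched as below. The `"Meeting Details"` key, its nested `"url"` field, the `base_url` argument, and the `/Calendar.aspx` path are assumptions for illustration only and may not match the structure of real scraped items.

```python
from typing import Dict


def sketch_source(item: Dict, base_url: str) -> str:
    """Hypothetical stand-in for legistar_source."""
    # General calendar page used when no meeting-specific URL is present
    default_url = f"{base_url}/Calendar.aspx"
    details = item.get("Meeting Details")
    # Prefer the meeting-specific link when the scraped field carries one
    if isinstance(details, dict):
        return details.get("url", default_url)
    return default_url
```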

legistar_start(item)[source]

Pulls the start time from a Legistar item

Parameters

item (Dict) – Scraped item from Legistar

Return type

datetime

Returns

Meeting start datetime

parse(response)[source]

Creates initial event requests for each queried year.

Parameters

response (Response) – Scrapy response to be ignored

Return type

Iterable[Request]

Returns

Iterable of Request objects for event pages

parse_legistar(events)[source]

Method to be implemented by Spider classes that will handle the response from Legistar. Functions similarly to parse in other Spider classes.

Parameters

events (Iterable[Dict]) – Iterable of dicts of scraped results from Legistar

Raises

NotImplementedError – Must be implemented in subclasses

Return type

Iterable[Meeting]

Returns

Meeting objects that will be passed to pipelines and output
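
The override pattern might look like the sketch below. To keep it runnable without Scrapy installed, `LegistarSpiderStandIn` is a hypothetical stand-in for the real base class, plain dicts stand in for Meeting objects, and the `"Name"` and `"url"` event keys are assumed.

```python
from typing import Dict, Iterable, Iterator


class LegistarSpiderStandIn:
    """Hypothetical stand-in for the base class; not the library's code."""

    def parse_legistar(self, events: Iterable[Dict]) -> Iterable[Dict]:
        raise NotImplementedError("Must implement parse_legistar")


class ExampleBoardSpider(LegistarSpiderStandIn):
    """Illustrative subclass that supplies its own parse_legistar."""

    def parse_legistar(self, events: Iterable[Dict]) -> Iterator[Dict]:
        for event in events:
            # Plain dicts stand in for Meeting objects in this sketch
            yield {"title": event["Name"], "source": event.get("url", "")}


meetings = list(ExampleBoardSpider().parse_legistar([{"Name": "Board"}]))
```

Writing the method as a generator keeps it consistent with the iterable return type documented above.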