Extensions

class city_scrapers_core.extensions.StatusExtension(crawler)

Scrapy extension for maintaining an SVG badge for each scraper’s status.

create_status_svg(spider, status)

Format a template status SVG string based on a spider and status information

Parameters
  • spider (Spider) – Spider to determine the status for

  • status (str) – String indicating scraper status, either “running” or “failing”

Return type

str

Returns

An SVG string formatted for a given spider and status
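
The templating step can be pictured with a minimal sketch. The template markup, the colour map, the function name, and the spider name below are hypothetical stand-ins rather than what the library ships; only the two status values come from the description above.

```python
from xml.sax.saxutils import escape

# Hypothetical badge template; the library's real template is not shown here.
SVG_TEMPLATE = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="140" height="20">'
    '<rect width="140" height="20" fill="{color}"/>'
    '<text x="70" y="14" text-anchor="middle" fill="#fff">'
    "{name}: {status}</text></svg>"
)

# Assumed colours, chosen only for illustration.
STATUS_COLORS = {"running": "#44cc11", "failing": "#cb2431"}


def render_status_svg(spider_name: str, status: str) -> str:
    """Fill the template for one spider; unknown statuses are rejected."""
    if status not in STATUS_COLORS:
        raise ValueError(f"Unknown status: {status!r}")
    return SVG_TEMPLATE.format(
        color=STATUS_COLORS[status],
        name=escape(spider_name),  # keep the name safe inside XML text
        status=status,
    )


svg = render_status_svg("chi_library", "failing")
```

Keeping the template as a plain string makes the result easy to hand to whichever storage backend writes the badge.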

classmethod from_crawler(crawler)

Generate an extension from a crawler

Parameters

crawler (Crawler) – Current scrapy crawler

spider_closed()

Updates the status SVG with a “running” status unless the spider has encountered an error, in which case it returns without updating the badge

spider_error()

Sets the has_error flag on the first spider error and immediately updates the SVG with a “failing” status
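
How the two handlers above interact can be sketched with a stand-in class. This is a simplified illustration of the documented behaviour, not the library's code: the class name and the written list are hypothetical, and the list stands in for rendering and uploading a badge.

```python
class StatusFlowSketch:
    """Stand-in that mirrors only the documented signal-handler behaviour."""

    def __init__(self):
        self.has_error = False
        self.written = []  # statuses that would have been rendered and stored

    def spider_error(self):
        # Flag the failure and publish a "failing" badge right away.
        self.has_error = True
        self.written.append("failing")

    def spider_closed(self):
        # A run that hit an error keeps its "failing" badge.
        if self.has_error:
            return
        self.written.append("running")


clean_run = StatusFlowSketch()
clean_run.spider_closed()

failed_run = StatusFlowSketch()
failed_run.spider_error()
failed_run.spider_closed()
```

In this sketch a clean run ends with a single “running” entry, while a run that hit an error is left with “failing” because spider_closed returns early.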

update_status_svg(spider, svg)

Updates the status badge SVG for a storage provider. Must be implemented by subclasses.

Parameters
  • spider (Spider) – Spider with the status being tracked

  • svg (str) – Templated SVG string

Raises

NotImplementedError – Raised if not implemented on a subclass
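
A subclass for another storage target only needs to override update_status_svg. The sketch below assumes a local directory as the target and uses a simplified stand-in base class so that it runs without Scrapy installed; StatusExtensionStandIn, LocalFileStatusExtension, and the fake spider are all hypothetical names.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


class StatusExtensionStandIn:
    """Simplified stand-in for StatusExtension, not the library class."""

    def update_status_svg(self, spider, svg):
        raise NotImplementedError


class LocalFileStatusExtension(StatusExtensionStandIn):
    """Hypothetical provider that writes each badge to a local directory."""

    def __init__(self, output_dir):
        self.output_dir = Path(output_dir)

    def update_status_svg(self, spider, svg):
        # One file per spider, named after the spider.
        self.output_dir.mkdir(parents=True, exist_ok=True)
        path = self.output_dir / f"{spider.name}.svg"
        path.write_text(svg, encoding="utf-8")
        return path


spider = SimpleNamespace(name="chi_library")  # minimal fake spider
extension = LocalFileStatusExtension(tempfile.mkdtemp())
badge_path = extension.update_status_svg(spider, "<svg></svg>")
```

In a real project the subclass would inherit from city_scrapers_core.extensions.StatusExtension instead of the stand-in and keep the crawler-based constructor.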

class city_scrapers_core.extensions.AzureBlobStatusExtension(crawler)

Implements StatusExtension for Azure Blob Storage

create_status_svg(spider, status)

Format a template status SVG string based on a spider and status information

Parameters
  • spider (Spider) – Spider to determine the status for

  • status (str) – String indicating scraper status, either “running” or “failing”

Return type

str

Returns

An SVG string formatted for a given spider and status

classmethod from_crawler(crawler)

Generate an extension from a crawler

Parameters

crawler (Crawler) – Current scrapy crawler

spider_closed()

Updates the status SVG with a “running” status unless the spider has encountered an error, in which case it returns without updating the badge

spider_error()

Sets the has_error flag on the first spider error and immediately updates the SVG with a “failing” status

update_status_svg(spider, svg)

Implements writing templated status SVG to Azure Blob Storage

Parameters
  • spider (Spider) – Spider with the status being tracked

  • svg (str) – Templated SVG string
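
Scrapy extensions are enabled through the EXTENSIONS setting, so a project would typically turn this class on in its settings module. The fragment below is a sketch: the order value of 100 is an arbitrary choice, and the account, credential, and container settings the extension needs are not covered here.

```python
# settings.py (sketch)
EXTENSIONS = {
    # The value is the order Scrapy uses when loading extensions;
    # 100 is an arbitrary choice for this example.
    "city_scrapers_core.extensions.AzureBlobStatusExtension": 100,
}
```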

class city_scrapers_core.extensions.S3StatusExtension(crawler)

Implements StatusExtension for AWS S3

create_status_svg(spider, status)

Format a template status SVG string based on a spider and status information

Parameters
  • spider (Spider) – Spider to determine the status for

  • status (str) – String indicating scraper status, either “running” or “failing”

Return type

str

Returns

An SVG string formatted for a given spider and status

classmethod from_crawler(crawler)

Generate an extension from a crawler

Parameters

crawler (Crawler) – Current scrapy crawler

spider_closed()

Updates the status SVG with a “running” status unless the spider has encountered an error, in which case it returns without updating the badge

spider_error()

Sets the has_error flag on the first spider error and immediately updates the SVG with a “failing” status

update_status_svg(spider, svg)

Implements writing templated status SVG to AWS S3

Parameters
  • spider (Spider) – Spider with the status being tracked

  • svg (str) – Templated SVG string

class city_scrapers_core.extensions.GCSStatusExtension(crawler)

Implements StatusExtension for Google Cloud Storage

create_status_svg(spider, status)

Format a template status SVG string based on a spider and status information

Parameters
  • spider (Spider) – Spider to determine the status for

  • status (str) – String indicating scraper status, either “running” or “failing”

Return type

str

Returns

An SVG string formatted for a given spider and status

classmethod from_crawler(crawler)

Generate an extension from a crawler

Parameters

crawler (Crawler) – Current scrapy crawler

spider_closed()

Updates the status SVG with a “running” status unless the spider has encountered an error, in which case it returns without updating the badge

spider_error()

Sets the has_error flag on the first spider error and immediately updates the SVG with a “failing” status

update_status_svg(spider, svg)

Implements writing templated status SVG to Google Cloud Storage

Parameters
  • spider (Spider) – Spider with the status being tracked

  • svg (str) – Templated SVG string

class city_scrapers_core.extensions.AzureBlobFeedStorage(uri)

Subclass of scrapy.extensions.feedexport.BlockingFeedStorage for writing scraper results to Azure Blob Storage.

Parameters

uri (str) – Azure Blob Storage URL including an account name, credentials, container, and filename
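
To show how the four components might be packed into one string, the sketch below assumes a layout of azure://<account_name>:<account_key>@<container>/<filename>. That layout is an assumption not stated in the description above, and parse_azure_feed_uri and AzureFeedTarget are hypothetical helpers, not part of the library.

```python
from typing import NamedTuple


class AzureFeedTarget(NamedTuple):
    account_name: str
    account_key: str
    container: str
    filename: str


def parse_azure_feed_uri(uri: str) -> AzureFeedTarget:
    """Split an assumed azure://name:key@container/filename URI into parts."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme != "azure":
        raise ValueError("Expected a URI starting with azure://")
    # Split on the first "@": credentials before it, location after it.
    credentials, at, location = rest.partition("@")
    account_name, colon, account_key = credentials.partition(":")
    container, slash, filename = location.partition("/")
    parts = (account_name, account_key, container, filename)
    if not (at and colon and slash and all(parts)):
        raise ValueError("URI is missing one of its four components")
    return AzureFeedTarget(*parts)


target = parse_azure_feed_uri(
    "azure://myaccount:c2Vj/cmV0==@feeds/2024/01/02/spider.json"
)
```

Plain string partitioning is used here instead of urllib.parse because base64-style keys can contain “/”, which a generic URL parser would treat as the start of the path.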