Commands¶
City Scrapers has several custom Scrapy commands to streamline common tasks.
genspider¶
Syntax:
scrapy genspider <name> <agency> <start_url>
Example:
scrapy genspider chi_planning "Chicago Plan Commission" "https://chicago.gov/"
Scrapy’s genspider command is subclassed for this project to create the boilerplate code for a new scraper.
The command accepts the Spider slug, the full agency name, and a URL that should be initially scraped. It will use this information to create a Spider, an initial Pytest test file, and fixtures for the tests. If the site uses Legistar (based on the URL), it will use a separate template specific to Legistar sites that simplifies some common functionality.
The boilerplate files won’t work for all sites, and in particular they won’t cover cases where multiple pages need to be scraped, but they provide a starting point for some setup tasks that can cause confusion.
combinefeeds¶
Syntax:
scrapy combinefeeds
Combines output files written to a storage backend into latest.json, which contains all meetings scraped; upcoming.json, which only includes meetings in the future; and a file for each agency slug (e.g. chi_plan_commission.json) at the top level of the storage backend with the most recently scraped meetings for that agency.
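The latest/upcoming split can be illustrated with a short sketch. The function and field names below are hypothetical, for illustration only, and not the actual combinefeeds implementation; it assumes meetings carry an ISO 8601 start field, as in the scraped output.

```python
from datetime import datetime, timezone

# Illustrative sketch (not the actual combinefeeds code): every scraped
# meeting goes into the "latest" feed, while "upcoming" keeps only those
# with a start time in the future.
def split_feeds(meetings, now=None):
    now = now or datetime.now(timezone.utc)
    latest = sorted(meetings, key=lambda m: m["start"])
    upcoming = [m for m in latest if datetime.fromisoformat(m["start"]) > now]
    return latest, upcoming

meetings = [
    {"id": "chi_plan_commission/past", "start": "2020-01-06T10:00:00+00:00"},
    {"id": "chi_plan_commission/future", "start": "2999-01-06T10:00:00+00:00"},
]
latest, upcoming = split_feeds(meetings)
# latest holds both meetings; upcoming holds only the future one
```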
runall¶
Syntax:
scrapy runall
This will load all spiders and run them in the same process.
validate¶
Syntax:
scrapy validate <name>
Example:
scrapy validate chi_plan_commission
This command runs a spider with the ValidationPipeline enabled to ensure that the scraper is returning valid output. It is predominantly used in CI.
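As a rough illustration of the kind of check such a pipeline performs, the sketch below counts items that have a set of required fields and fails if too few pass. The field names, threshold, and function names are assumptions for illustration, not the actual ValidationPipeline implementation.

```python
# Hypothetical validation sketch (field names and threshold are illustrative
# assumptions, not the actual ValidationPipeline logic).
REQUIRED_FIELDS = ("title", "start", "id")

def is_valid(item):
    # An item passes only if every required field is present and non-empty
    return all(item.get(field) for field in REQUIRED_FIELDS)

def validate_output(items, min_valid_ratio=0.9):
    # Raise if the share of valid items falls below the threshold,
    # which would fail a CI run
    valid = sum(1 for item in items if is_valid(item))
    if valid / len(items) < min_valid_ratio:
        raise ValueError("Too many invalid items scraped")
    return valid

items = [
    {"id": "chi_plan_commission/1", "title": "Plan Commission", "start": "2020-01-06T10:00:00"},
    {"id": "chi_plan_commission/2", "title": "Plan Commission", "start": "2020-02-06T10:00:00"},
]
count = validate_output(items)
```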