# GitHub Action
The spidermedic GitHub Action runs the crawler inside a Docker container — no Rust toolchain or extra dependencies required.
## Usage
```yaml
- uses: beekmanlabs/spidermedic@main
  with:
    url: 'https://example.com'
```

## Inputs
| Input | Required | Default | Description |
|---|---|---|---|
| url | yes | — | Starting URL to crawl |
| path | no | `/` | Restrict the crawl to this URL path prefix |
| port | no | 80 | Port override |
| interval | no | 300 | Milliseconds between requests |
| concurrency | no | 10 | Maximum parallel in-flight requests |
| max-depth | no | 0 | Maximum crawl depth (0 = unlimited) |
| output | no | terminal | Output format: `terminal`, `json`, or `csv` |
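As an illustration, the inputs above can be combined to tune the crawler for a fragile target, for example a staging server on a non-standard port with a slower request rate. The host name below is hypothetical:

```yaml
- uses: beekmanlabs/spidermedic@main
  with:
    url: 'https://staging.example.com'  # hypothetical host
    port: '8080'
    interval: '1000'      # one request per second
    concurrency: '2'      # keep load on the staging box low
```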
## Exit codes
| Code | Meaning |
|---|---|
| 0 | All crawled pages returned non-error responses |
| 1 | One or more 4xx / 5xx responses were found |
The workflow step will automatically fail when the exit code is 1.
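If you would rather have broken links reported without failing the build, standard GitHub Actions syntax can absorb the non-zero exit code. A minimal sketch:

```yaml
- uses: beekmanlabs/spidermedic@main
  # Keep the workflow green even when 4xx/5xx responses are found;
  # the step is still marked with a warning in the run summary.
  continue-on-error: true
  with:
    url: 'https://example.com'
```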
## Examples
Basic crawl:

```yaml
steps:
  - uses: beekmanlabs/spidermedic@main
    with:
      url: 'https://example.com'
```

Crawl after a deploy, with tuned concurrency and depth:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy site
        run: ./deploy.sh

  check-links:
    needs: deploy
    runs-on: ubuntu-latest
    steps:
      - uses: beekmanlabs/spidermedic@main
        with:
          url: 'https://example.com'
          concurrency: '20'
          max-depth: '5'
```

Scheduled daily crawl:

```yaml
on:
  schedule:
    - cron: '0 6 * * *' # daily at 6am

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - uses: beekmanlabs/spidermedic@main
        with:
          url: 'https://example.com'
```

Restricting the crawl to one section:

```yaml
# Only crawl the /blog section
- uses: beekmanlabs/spidermedic@main
  with:
    url: 'https://example.com/blog'
    path: '/blog'
```

## Capturing JSON output
`output: json` writes results to stdout, which GitHub Actions captures in the step log. To save the results as an artifact instead, run the container directly in a shell step and redirect stdout to a file:
```yaml
- name: Crawl (JSON)
  run: |
    docker run --rm ghcr.io/beekmanlabs/spidermedic \
      --url=https://example.com \
      --output=json > crawl-results.json

- uses: actions/upload-artifact@v4
  with:
    name: crawl-results
    path: crawl-results.json
```
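The uploaded artifact can then be retrieved in a later job with `actions/download-artifact`. A minimal sketch, assuming the crawl ran in a job named `crawl` (the job name and the follow-up step are illustrative):

```yaml
report:
  needs: crawl          # hypothetical name of the job that uploaded the artifact
  runs-on: ubuntu-latest
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: crawl-results
    - name: Show results
      run: cat crawl-results.json
```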