API Reference

This section contains the detailed API reference for all public modules and classes in the AskPablos Scrapy API.

Middleware Module

The middleware module integrates with Scrapy’s downloader middleware system to route requests through the AskPablos Proxy API.

Configuration Options

The middleware accepts the following configuration in request meta:

meta = {
    "askpablos_api_map": {
        "browser": True,              # Optional: Use headless browser
        "screenshot": True,           # Optional: Take screenshot (requires browser: True)
        "operations": [...],          # Optional: Browser operations for SPA interaction (requires browser: True)
        "geoLocation": "US",          # Optional: 2-letter ISO country code (e.g. "PK", "US", "GB")
        "proxyType": "residential"    # Optional: "datacenter", "residential", or "mobile"
    }
}
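
As an illustration, the meta dictionary can be assembled by a small helper before it is attached to a request. The helper name `build_askpablos_meta` and its keyword arguments are hypothetical and not part of the package; only the `askpablos_api_map` key and the option names come from the listing above.

```python
from typing import Any, Dict, List, Optional


def build_askpablos_meta(
    browser: bool = False,
    screenshot: bool = False,
    operations: Optional[List[Dict[str, Any]]] = None,
    geo_location: Optional[str] = None,
    proxy_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build request meta containing only the options that are set."""
    api_map: Dict[str, Any] = {}
    if browser:
        api_map["browser"] = True
    if screenshot:
        api_map["screenshot"] = True
    if operations:
        api_map["operations"] = operations
    if geo_location:
        api_map["geoLocation"] = geo_location
    if proxy_type:
        api_map["proxyType"] = proxy_type
    return {"askpablos_api_map": api_map}


meta = build_askpablos_meta(browser=True, screenshot=True, geo_location="US")
# In a spider, this dictionary would typically be passed as scrapy.Request(url, meta=meta).
```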

HTTP Method Support

The middleware automatically supports both GET and POST HTTP methods:

  • GET requests: Standard Scrapy requests are processed as GET by default

  • POST requests: When using Scrapy FormRequest or setting method='POST', the request body is automatically included in the API payload

Settings Configuration

Global settings must be configured in settings.py:

# Required settings
API_KEY = "your_api_key_here"
SECRET_KEY = "your_secret_key_here"

# Optional settings
APCLOUDY_URL = "https://domain.com"  # Base URL for AskPablos API
TIMEOUT = 30          # Request timeout in seconds
MAX_RETRIES = 2       # Maximum number of retries for failed requests
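
To keep credentials out of version control, settings.py can read them from the environment instead of hard-coding them. The following is a sketch under assumptions: the environment variable names (`ASKPABLOS_API_KEY` and so on) are chosen for illustration and are not defined by the package, while the setting names are the ones listed above.

```python
import os

# Hypothetical environment variable names; adjust them to your deployment.
API_KEY = os.environ.get("ASKPABLOS_API_KEY", "")
SECRET_KEY = os.environ.get("ASKPABLOS_SECRET_KEY", "")

# Optional settings, falling back to the values shown above when unset.
TIMEOUT = int(os.environ.get("ASKPABLOS_TIMEOUT", "30"))
MAX_RETRIES = int(os.environ.get("ASKPABLOS_MAX_RETRIES", "2"))
```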

Operations Module

The operations module handles configuration validation and API payload creation for the optional features: headless browser rendering, screenshots, browser operations, geo-location, and proxy type.

Browser Operations for SPA Interaction

The operations parameter allows you to define advanced browser interactions for Single Page Applications:

"operations": [
    {
        "task": "waitForElement",
        "match": {
            "on": "xpath",      # or "css"
            "rule": "visible",  # or "attached", "hidden", "detached"
            "value": "//*[@id='element']"
        },
        "maxWait": 30,          # Optional: seconds to wait (default: 30)
        "onFailure": "return"   # Optional: "continue", "return", or "throw"
    }
]

Supported Tasks:

  • waitForElement - Wait for an element to match the specified condition

Match Options:

  • on: Selector type - "xpath" or "css"

  • rule: Element state - "visible", "attached", "hidden", or "detached"

  • value: The selector string (XPath or CSS selector)

Optional Parameters:

  • maxWait: Maximum time to wait in seconds (must be > 0)

  • onFailure: Action on failure - "continue" (ignore and continue), "return" (stop and return), or "throw" (raise error)
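
The constraints listed above can be expressed as a small checker. This is a minimal sketch, not the package's validator: the name `check_operation` and the error messages are hypothetical, and only the allowed values are taken from the lists above.

```python
from typing import Any, Dict

VALID_SELECTOR_TYPES = ("xpath", "css")
VALID_RULES = ("visible", "attached", "hidden", "detached")
VALID_ON_FAILURE = ("continue", "return", "throw")


def check_operation(op: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical checker for one operation dict; raises ValueError if invalid."""
    if op.get("task") != "waitForElement":
        raise ValueError(f"Unsupported task: {op.get('task')!r}")

    match = op.get("match")
    if not isinstance(match, dict):
        raise ValueError("'match' must be a dictionary")
    if match.get("on") not in VALID_SELECTOR_TYPES:
        raise ValueError("'match.on' must be 'xpath' or 'css'")
    if match.get("rule") not in VALID_RULES:
        raise ValueError("'match.rule' must be one of: " + ", ".join(VALID_RULES))
    value = match.get("value")
    if not isinstance(value, str) or not value.strip():
        raise ValueError("'match.value' must be a non-empty selector string")

    checked: Dict[str, Any] = {"task": op["task"], "match": dict(match)}

    if "maxWait" in op:
        max_wait = op["maxWait"]
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(max_wait, bool) or not isinstance(max_wait, (int, float)) or max_wait <= 0:
            raise ValueError("'maxWait' must be a number greater than 0")
        checked["maxWait"] = max_wait

    if "onFailure" in op:
        if op["onFailure"] not in VALID_ON_FAILURE:
            raise ValueError("'onFailure' must be 'continue', 'return', or 'throw'")
        checked["onFailure"] = op["onFailure"]

    return checked
```

Checking each operation before the request is scheduled surfaces a malformed selector or an invalid `maxWait` in the spider rather than as an API error later.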

Geo-Location and Proxy Type

Two additional optional parameters control how the proxy is selected:

geoLocation — Route the request through a proxy in a specific country.

  • Value: 2-letter ISO 3166-1 alpha-2 country code (e.g. "US", "PK", "GB")

  • Case-insensitive; normalized to uppercase internally

proxyType — Choose the category of proxy to use.

  • "datacenter" — Fast, cost-efficient data center proxies

  • "residential" — Real ISP-assigned IPs; higher trust and lower detection rate

  • "mobile" — Mobile carrier IPs; highest trust for mobile-targeted sites

Both options are independent of each other and of browser/screenshot/operations.
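
The normalization described above might look like the following sketch. The function names are hypothetical; the check covers only the shape of the country code (two ASCII letters), not whether the code is actually assigned, and treating `proxyType` as case-sensitive is an assumption.

```python
VALID_PROXY_TYPES = ("datacenter", "residential", "mobile")


def normalize_geo_location(value: str) -> str:
    """Return the uppercase 2-letter code, or raise ValueError."""
    if not isinstance(value, str):
        raise ValueError("geoLocation must be a string")
    code = value.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError("geoLocation must be a 2-letter ISO 3166-1 alpha-2 code")
    return code


def check_proxy_type(value: str) -> str:
    """Return the proxy type unchanged, or raise ValueError (assumed case-sensitive)."""
    if value not in VALID_PROXY_TYPES:
        raise ValueError("proxyType must be one of: " + ", ".join(VALID_PROXY_TYPES))
    return value
```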

Operations handler for AskPablos Scrapy API.

This module defines and validates configuration that can be used with the AskPablos API service.

class askpablos_scrapy_api.operations.AskPablosAPIMapValidator

Bases: object

Validates the askpablos_api_map configuration.

classmethod validate_config(config)

Validate and normalize askpablos_api_map configuration.

Parameters:

config (Dict[str, Any]) – Configuration dictionary

Returns:

Validated and normalized configuration

Raises:

ValueError – If configuration is invalid

Return type:

Dict[str, Any]

askpablos_scrapy_api.operations.create_api_payload(request_url, request_method, config)

Create API payload from validated configuration.

Parameters:
  • request_url (str) – The URL to request

  • request_method (str) – HTTP method

  • config (Dict[str, Any]) – Validated configuration

Returns:

API payload dictionary

Return type:

Dict[str, Any]
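
To make the shape of this step concrete, here is a sketch of a payload builder with the same three parameters. It is illustrative only: the payload key names `url` and `method` are assumptions, since the actual keys sent to the API are not documented in this reference, and `sketch_api_payload` is a hypothetical name.

```python
from typing import Any, Dict

# Options documented for askpablos_api_map.
OPTION_KEYS = ("browser", "screenshot", "operations", "geoLocation", "proxyType")


def sketch_api_payload(request_url: str, request_method: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative payload builder; the real payload keys may differ."""
    payload: Dict[str, Any] = {
        "url": request_url,                # assumed key name
        "method": request_method.upper(),  # assumed key name
    }
    # Copy only the documented options that are present in the validated config.
    for key in OPTION_KEYS:
        if key in config:
            payload[key] = config[key]
    return payload
```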

Authentication Module

The authentication module handles secure API authentication using HMAC-SHA256 request signing.

Authentication utilities for AskPablos Scrapy API.

This module provides functions for securely signing API requests and verifying authentication credentials.

askpablos_scrapy_api.auth.sign_request(payload, secret_key)

Sign a request payload with HMAC-SHA256 using the provided secret key.

Parameters:
  • payload (Dict[str, Any]) – The request payload to sign

  • secret_key (str) – The secret key used for signing

Returns:

Tuple containing the (JSON payload string, base64-encoded signature)

Return type:

Tuple[str, str]
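
The signing step can be sketched with the standard library alone. This is a minimal sketch, not the package's implementation: the function names are hypothetical, and the JSON serialization (compact separators here) is an assumption. What matters in any HMAC scheme is that the signature is computed over the exact string that is later sent.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict, Tuple


def sketch_sign_request(payload: Dict[str, Any], secret_key: str) -> Tuple[str, str]:
    """Return (JSON payload string, base64-encoded HMAC-SHA256 signature)."""
    # Assumption: compact separators; the real serialization may differ.
    body = json.dumps(payload, separators=(",", ":"))
    digest = hmac.new(secret_key.encode("utf-8"), body.encode("utf-8"), hashlib.sha256).digest()
    return body, base64.b64encode(digest).decode("ascii")


def signature_matches(body: str, signature: str, secret_key: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(secret_key.encode("utf-8"), body.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(base64.b64encode(expected), signature.encode("utf-8"))
```

Because the signature covers the serialized string, the same `body` value returned here should be sent as the request body; re-serializing the dictionary could change the bytes and invalidate the signature.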

askpablos_scrapy_api.auth.create_auth_headers(api_key, signature)

Create authentication headers for AskPablos API requests.

Parameters:
  • api_key (str) – The API key

  • signature (str) – The base64-encoded signature

Returns:

Dictionary of HTTP headers

Return type:

Dict[str, str]

Configuration Module

The configuration module manages settings and configuration options for the AskPablos API integration.

Configuration management module for AskPablos Scrapy API.

This module provides utilities for securely loading and validating configuration settings from environment variables and settings files.

class askpablos_scrapy_api.config.Config

Bases: object

Configuration manager for AskPablos Scrapy API.

DEFAULT_TIMEOUT = 30

DEFAULT_RETRIES = 2

__init__()

Initialize an empty configuration.

load_from_settings(settings)

Load configuration from Scrapy settings.

Parameters:

settings (Dict[str, Any])

Return type:

None

validate()

Validate that all required configuration is present.

Return type:

None

get(key, default=None)

Get a configuration value.

Parameters:
  • key (str) – The configuration key to retrieve

  • default (Any | None) – Default value if the key is not found

Returns:

The configuration value or the default

Return type:

Any
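
The load, validate, and get cycle described above can be sketched as follows. `SketchConfig` is a hypothetical stand-in that mirrors the documented method names and defaults; how the real class stores values and which exception `validate()` raises are assumptions.

```python
from typing import Any, Dict, Optional


class SketchConfig:
    """Hypothetical stand-in mirroring the documented Config interface."""

    DEFAULT_TIMEOUT = 30
    DEFAULT_RETRIES = 2

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def load_from_settings(self, settings: Dict[str, Any]) -> None:
        # Required keys are copied as-is; optional keys fall back to the class defaults.
        self._values["API_KEY"] = settings.get("API_KEY")
        self._values["SECRET_KEY"] = settings.get("SECRET_KEY")
        self._values["TIMEOUT"] = settings.get("TIMEOUT", self.DEFAULT_TIMEOUT)
        self._values["MAX_RETRIES"] = settings.get("MAX_RETRIES", self.DEFAULT_RETRIES)

    def validate(self) -> None:
        # Assumption: missing credentials raise ValueError.
        missing = [key for key in ("API_KEY", "SECRET_KEY") if not self._values.get(key)]
        if missing:
            raise ValueError("Missing required settings: " + ", ".join(missing))

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._values.get(key, default)


config = SketchConfig()
config.load_from_settings({"API_KEY": "key", "SECRET_KEY": "secret"})
config.validate()
```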

Endpoints Module

The endpoints module provides access to different API endpoints offered by the AskPablos service.

Exceptions Module

The exceptions module defines custom exceptions for error handling within the AskPablos Scrapy API.

Exception handling for AskPablos Scrapy API.

This module provides custom exceptions and error handling utilities for the AskPablos Scrapy API middleware.

exception askpablos_scrapy_api.exceptions.AskPablosAPIError(message, status_code=None, response=None)

Bases: Exception

Base exception class for AskPablos API errors.

__init__(message, status_code=None, response=None)

exception askpablos_scrapy_api.exceptions.AuthenticationError(message, status_code=None, response=None)

Bases: AskPablosAPIError

Raised when API key or secret key authentication fails. This is a critical error that should stop the spider.

__init__(message, status_code=None, response=None)

exception askpablos_scrapy_api.exceptions.RateLimitError(message, status_code=None, response=None)

Bases: AskPablosAPIError

Raised when the API rate limit is exceeded. This is a critical error that should stop the spider.

__init__(message, status_code=None, response=None)

askpablos_scrapy_api.exceptions.handle_api_error(status_code, response_data=None)

Factory function to create and return the appropriate exception based on status code.

Parameters:
  • status_code (int) – HTTP status code

  • response_data (Dict[str, Any] | None) – API response data if available

Returns:

An appropriate Exception instance.

Return type:

AskPablosAPIError
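
A factory of this kind usually maps status codes onto the exception hierarchy. The sketch below uses hypothetical stand-in classes so that it runs on its own, and the status-code mapping (401 and 403 for authentication, 429 for rate limiting) is an assumption based on common HTTP conventions rather than on the package source.

```python
from typing import Any, Dict, Optional


class SketchAPIError(Exception):
    """Stand-in for the documented base exception."""

    def __init__(
        self,
        message: str,
        status_code: Optional[int] = None,
        response: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.status_code = status_code
        self.response = response


class SketchAuthenticationError(SketchAPIError):
    """Stand-in for AuthenticationError."""


class SketchRateLimitError(SketchAPIError):
    """Stand-in for RateLimitError."""


def sketch_handle_api_error(
    status_code: int, response_data: Optional[Dict[str, Any]] = None
) -> SketchAPIError:
    """Return (not raise) an exception instance chosen by status code."""
    # Assumed mapping; the real function may use different codes.
    if status_code in (401, 403):
        return SketchAuthenticationError("Authentication failed", status_code, response_data)
    if status_code == 429:
        return SketchRateLimitError("Rate limit exceeded", status_code, response_data)
    return SketchAPIError(f"API request failed with status {status_code}", status_code, response_data)
```

Returning the instance instead of raising it leaves the caller free to decide whether to raise immediately, log, or retry first.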

Utilities Module

The utilities module provides helper functions for working with the AskPablos Scrapy API.

Utility functions for validating AskPablos API configuration.

This module contains separate validation methods for each configuration option.

askpablos_scrapy_api.utils.validate_browser(config, validated_config)

Validate browser configuration.

Parameters:
  • config (Dict[str, Any]) – Raw configuration dictionary

  • validated_config (Dict[str, Any]) – Dictionary to store validated config

Returns:

True if browser is enabled, False otherwise

Return type:

bool

Raises:

ValueError – If browser value is invalid

askpablos_scrapy_api.utils.validate_screenshot(config, validated_config, browser_enabled)

Validate screenshot configuration.

Parameters:
  • config (Dict[str, Any]) – Raw configuration dictionary

  • validated_config (Dict[str, Any]) – Dictionary to store validated config

  • browser_enabled (bool) – Whether browser mode is enabled

Raises:

ValueError – If screenshot value is invalid

Return type:

None
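
The way `validate_browser` and `validate_screenshot` fit together, with the first returning the flag that the second receives as `browser_enabled`, can be sketched as below. The function names are hypothetical, and raising `ValueError` when a screenshot is requested without browser mode is an assumption; the real validator might handle that case differently.

```python
from typing import Any, Dict


def sketch_validate_browser(config: Dict[str, Any], validated_config: Dict[str, Any]) -> bool:
    """Store the browser flag if enabled and report whether browser mode is on."""
    browser = config.get("browser", False)
    if not isinstance(browser, bool):
        raise ValueError("'browser' must be a boolean")
    if browser:
        validated_config["browser"] = True
    return browser


def sketch_validate_screenshot(
    config: Dict[str, Any], validated_config: Dict[str, Any], browser_enabled: bool
) -> None:
    """Store the screenshot flag; it is only accepted when browser mode is on."""
    screenshot = config.get("screenshot", False)
    if not isinstance(screenshot, bool):
        raise ValueError("'screenshot' must be a boolean")
    if screenshot and not browser_enabled:
        # Assumed behavior: reject instead of silently dropping the option.
        raise ValueError("'screenshot' requires 'browser': True")
    if screenshot:
        validated_config["screenshot"] = True


raw = {"browser": True, "screenshot": True}
validated: Dict[str, Any] = {}
enabled = sketch_validate_browser(raw, validated)
sketch_validate_screenshot(raw, validated, enabled)
```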

askpablos_scrapy_api.utils.validate_geo_location(config, validated_config)

Validate geoLocation configuration.

Parameters:
  • config (Dict[str, Any]) – Raw configuration dictionary

  • validated_config (Dict[str, Any]) – Dictionary to store validated config

Raises:

ValueError – If geoLocation value is invalid

Return type:

None

askpablos_scrapy_api.utils.validate_proxy_type(config, validated_config)

Validate proxyType configuration.

Parameters:
  • config (Dict[str, Any]) – Raw configuration dictionary

  • validated_config (Dict[str, Any]) – Dictionary to store validated config

Raises:

ValueError – If proxyType value is invalid

Return type:

None

askpablos_scrapy_api.utils.validate_operations(config, validated_config, browser_enabled)

Validate operations configuration.

Parameters:
  • config (Dict[str, Any]) – Raw configuration dictionary

  • validated_config (Dict[str, Any]) – Dictionary to store validated config

  • browser_enabled (bool) – Whether browser mode is enabled

Raises:

ValueError – If operations structure is invalid

Return type:

None