Module constants

Workspace-wide public constants.

This module re-exports the constant groups used by the crawler, downloader, middleware, pipelines, and scheduler.

Constants

BLOOM_BUFFER_FLUSH_SIZE
Buffer size that triggers a flush to the Bloom filter.
BLOOM_FILTER_CAPACITY
Capacity of the Bloom filter for duplicate detection.
BLOOM_FILTER_HASH_FUNCTIONS
Number of hash functions used by the Bloom filter.
BLOOM_FLUSH_INTERVAL_MS
Interval in milliseconds for periodic Bloom filter flush.
CONNECTION_POOL_IDLE_TIMEOUT_SECS
Connection pool idle timeout in seconds.
CONNECT_TIMEOUT_SECS
Connect timeout in seconds.
CRAWLER_DEFAULT_CHANNEL_CAPACITY
Default channel capacity for crawler communication.
CRAWLER_IDLE_CHECK_INTERVAL_MS
Interval in milliseconds between crawler idle checks.
CRAWLER_SHUTDOWN_GRACE_PERIOD_SECS
Default grace period for crawler shutdown in seconds.
CSV_BUFFER_SIZE
Buffer size for CSV export pipeline.
DEFAULT_POOL_MAX_IDLE_PER_HOST
Maximum idle connections per host for default client.
DEFAULT_REQUEST_TIMEOUT_SECS
Default request timeout in seconds.
DEFAULT_VISITED_CACHE_SIZE
Default capacity for the visited URL cache when not using checkpoint.
HOST_SPECIFIC_POOL_MAX_IDLE_PER_HOST
Maximum idle connections per host for host-specific clients.
MAX_PENDING_REQUESTS
Maximum number of pending requests before applying backpressure.
MIDDLEWARE_CACHE_CAPACITY
Default cache capacity for middleware.
MIDDLEWARE_CACHE_TTL_SECS
Default cache TTL for middleware (1 hour).
RATE_LIMIT_ERROR_PENALTY_MULTIPLIER
Error penalty multiplier for adaptive rate limiting.
RATE_LIMIT_FORBIDDEN_PENALTY_MULTIPLIER
Forbidden penalty multiplier for adaptive rate limiting.
RATE_LIMIT_INITIAL_DELAY_MS
Initial delay for adaptive rate limiting (500ms).
RATE_LIMIT_MAX_DELAY_MS
Maximum delay for rate limiting (60 seconds).
RATE_LIMIT_MAX_JITTER_MS
Maximum jitter for rate limiting (500ms).
RATE_LIMIT_MIN_DELAY_MS
Minimum delay for rate limiting (50ms).
RATE_LIMIT_SUCCESS_DECAY_MULTIPLIER
Success decay multiplier for adaptive rate limiting.
RETRY_DEFAULT_BACKOFF_FACTOR
Default backoff factor for retry middleware.
RETRY_DEFAULT_HTTP_CODES
Default HTTP status codes to retry.
RETRY_DEFAULT_MAX_DELAY_MS
Default maximum delay for retry middleware (3 minutes).
RETRY_DEFAULT_MAX_RETRIES
Default maximum number of retry attempts for the retry middleware.
SQLITE_CHANNEL_CAPACITY
Channel capacity for SQLite pipeline.
STREAM_JSON_DEFAULT_BATCH_SIZE
Default batch size for stream JSON pipeline.
TCP_KEEPALIVE_SECS
TCP keepalive interval in seconds.
VISITED_URL_CACHE_CAPACITY
Capacity of the visited URL cache in the scheduler.
VISITED_URL_CACHE_TTL_SECS
Time-to-idle for visited URL cache entries (1 hour).
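
The rate-limiting and retry constants above are easiest to read together. The following is a minimal sketch of how they might combine, not the crate's actual implementation. The delay bounds (50 ms, 500 ms, 60 seconds, 3 minutes) come from the descriptions above; the multiplier and backoff-factor values, the `Outcome` enum, and the `next_delay` and `retry_delay` functions are assumptions introduced for illustration only.

```rust
use std::time::Duration;

// Values taken from the constant descriptions above.
pub const RATE_LIMIT_INITIAL_DELAY_MS: u64 = 500;
pub const RATE_LIMIT_MIN_DELAY_MS: u64 = 50;
pub const RATE_LIMIT_MAX_DELAY_MS: u64 = 60_000;
pub const RETRY_DEFAULT_MAX_DELAY_MS: u64 = 180_000;

// ASSUMED values: the documentation does not state these numbers.
pub const RATE_LIMIT_ERROR_PENALTY_MULTIPLIER: f64 = 1.5;
pub const RATE_LIMIT_FORBIDDEN_PENALTY_MULTIPLIER: f64 = 2.0;
pub const RATE_LIMIT_SUCCESS_DECAY_MULTIPLIER: f64 = 0.95;
pub const RETRY_DEFAULT_BACKOFF_FACTOR: f64 = 2.0;

/// Hypothetical classification of a response, used only in this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Outcome {
    Success,
    Error,
    Forbidden,
}

/// Scales the current delay by the multiplier for `outcome`, then clamps
/// the result to the [min, max] delay range.
pub fn next_delay(current: Duration, outcome: Outcome) -> Duration {
    let multiplier = match outcome {
        Outcome::Success => RATE_LIMIT_SUCCESS_DECAY_MULTIPLIER,
        Outcome::Error => RATE_LIMIT_ERROR_PENALTY_MULTIPLIER,
        Outcome::Forbidden => RATE_LIMIT_FORBIDDEN_PENALTY_MULTIPLIER,
    };
    let scaled = (current.as_millis() as f64 * multiplier).round();
    let clamped = scaled.clamp(
        RATE_LIMIT_MIN_DELAY_MS as f64,
        RATE_LIMIT_MAX_DELAY_MS as f64,
    );
    Duration::from_millis(clamped as u64)
}

/// Exponential backoff: `base * factor^attempt`, capped at the retry
/// maximum delay. `base` is a hypothetical parameter; the constants list
/// does not define a base retry delay.
pub fn retry_delay(base: Duration, attempt: u32) -> Duration {
    let factor = RETRY_DEFAULT_BACKOFF_FACTOR.powi(attempt as i32);
    let scaled = (base.as_millis() as f64 * factor).round();
    let capped = scaled.min(RETRY_DEFAULT_MAX_DELAY_MS as f64);
    Duration::from_millis(capped as u64)
}
```

Under these assumed multipliers, a forbidden response doubles a 500 ms delay to 1 second, while repeated successes decay the delay toward, but never below, `RATE_LIMIT_MIN_DELAY_MS`. Clamping after every adjustment keeps the delay bounded regardless of how many consecutive errors occur.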