
Inputs

AMQPInput

Connects to a remote AMQP broker (RabbitMQ) and retrieves messages from the specified queue. As AMQP is dynamically programmable, the broker topology needs to be specified in the plugin configuration.

Config:

  • url (string):

    An AMQP connection string formatted per the RabbitMQ URI Spec.

  • exchange (string):

    AMQP exchange name

  • exchange_type (string):

    AMQP exchange type (fanout, direct, topic, or headers).

  • exchange_durability (bool):

    Whether the exchange should be configured as a durable exchange. Defaults to non-durable.

  • exchange_auto_delete (bool):

    Whether the exchange is deleted when all queues have finished and there is no publishing. Defaults to auto-delete.

  • routing_key (string):

    The message routing key used to bind the queue to the exchange. Defaults to empty string.

  • prefetch_count (int):

    How many messages to fetch at once before message acks are sent. See RabbitMQ performance measurements for help in tuning this number. Defaults to 2.

  • queue (string):

    Name of the queue to consume from. An empty string will have the broker generate a name for the queue. Defaults to empty string.

  • queue_durability (bool):

    Whether the queue is durable or not. Defaults to non-durable.

  • queue_exclusive (bool):

    Whether the queue is exclusive (only one consumer allowed) or not. Defaults to non-exclusive.

  • queue_auto_delete (bool):

    Whether the queue is deleted when the last consumer un-subscribes. Defaults to auto-delete.

  • queue_ttl (int):

    TTL, in milliseconds, to set on the queue declaration for expiring messages. Defaults to undefined/infinite.

  • decoder (string):

    Decoder name used to transform a raw message body into a structured hekad message. Must be a decoder appropriate for the messages that come in from the exchange. If accepting messages that have been generated by an AMQPOutput in another Heka process then this should be a ProtobufDecoder instance.

  • retries (RetryOptions, optional):

    A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior.

New in version 0.6.

  • tls (TlsConfig):

    An optional sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if the url value uses the AMQPS URI scheme. See Configuring TLS.

Since many of these parameters have sane defaults, a minimal configuration to consume serialized messages would look like:

[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"

Or you might use a PayloadRegexDecoder to parse OSX syslog messages with the following:

[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "testout"
exchange_type = "fanout"
decoder = "logparser"

[logparser]
type = "MultiDecoder"
subs = ["logline", "leftovers"]

[logline]
type = "PayloadRegexDecoder"
MatchRegex = '\w+ \d+ \d+:\d+:\d+ \S+ (?P<Reporter>[^\[]+)\[(?P<Pid>\d+)](?P<Sandbox>[^:]+)?: (?P<Remaining>.*)'

    [logline.MessageFields]
    Type = "amqplogline"
    Hostname = "myhost"
    Reporter = "%Reporter%"
    Remaining = "%Remaining%"
    Logger = "%Logger%"
    Payload = "%Remaining%"

[leftovers]
type = "PayloadRegexDecoder"
MatchRegex = '.*'

    [leftovers.MessageFields]
    Type = "drop"
    Payload = ""
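
The exchange and queue options listed above can also be set explicitly rather than left at their defaults. The following is a minimal sketch, not taken from the Heka distribution, that declares a durable topic exchange bound to a named durable queue. The exchange name, queue name, and routing key are hypothetical values, and the decoder choice assumes the messages were published by an AMQPOutput in another Heka process:

```toml
[AMQPInput]
url = "amqp://guest:guest@rabbitmq/"
exchange = "app_events"          # hypothetical exchange name
exchange_type = "topic"
exchange_durability = true       # default is non-durable
exchange_auto_delete = false     # default is auto-delete
routing_key = "logs.#"           # hypothetical binding key
queue = "heka_app_events"        # hypothetical queue name
queue_durability = true          # default is non-durable
queue_auto_delete = false        # default is auto-delete
queue_ttl = 60000                # milliseconds, i.e. 60 seconds
prefetch_count = 10              # default is 2
decoder = "ProtobufDecoder"      # assumes messages come from an AMQPOutput
```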

DockerLogInput

New in version 0.8.

The DockerLogInput plugin attaches to all containers running on a host and sends their log messages into the Heka pipeline. The plugin is based on Logspout by Jeff Lindsay. Messages will be populated as follows:

  • Uuid: Type 4 (random) UUID generated by Heka.
  • Timestamp: Time when the log line was received by the plugin.
  • Type: DockerLog.
  • Hostname: Hostname of the machine on which Heka is running.
  • Payload: The log line received from a Docker container.
  • Logger: stdout or stderr, depending on source.
  • Fields[“ContainerID”] (string): The container ID
  • Fields[“ContainerName”] (string): The container name

Config:

  • endpoint (string):

    A Docker endpoint. Defaults to “unix:///var/run/docker.sock”.

  • decoder (string):

    The name of the decoder used to further transform the message into a structured hekad message. No default decoder is specified.

Example:

[nginx_log_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/nginx_access.lua"

    [nginx_log_decoder.config]
    type = "nginx.access"
    user_agent_transform = true
    log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

[DockerLogInput]
decoder = "nginx_log_decoder"

FilePollingInput

New in version 0.7.

FilePollingInputs periodically read (unbuffered) the contents of a specified file and create a Heka message with the contents of the file as the payload.

Config:

  • file_path (string):

    The absolute path to the file which the input should read.

  • ticker_interval (uint):

    How often, in seconds, the input should read the contents of the file.

  • decoder (string):

    The name of the decoder used to process the payload of the input.

Example:

[MemStats]
type = "FilePollingInput"
ticker_interval = 1
file_path = "/proc/meminfo"
decoder = "MemStatsDecoder"

HttpInput

HttpInput plugins intermittently poll remote HTTP URLs for data and populate message objects based on the results of the HTTP interactions. Messages will be populated as follows:

  • Uuid: Type 4 (random) UUID generated by Heka.

  • Timestamp: Time HTTP request is completed.

  • Type: heka.httpinput.data or heka.httpinput.error depending on whether or not the request completed. (Note that a response returned with an HTTP error code is still considered complete and will generate type heka.httpinput.data.)

  • Hostname: Hostname of the machine on which Heka is running.

  • Payload: Entire contents of the HTTP response body.

  • Severity: HTTP response 200 uses success_severity config value, all other results use error_severity config value.

  • Logger: Fetched URL.

  • Fields[“Status”] (string): HTTP status string value (e.g. “200 OK”).

  • Fields[“StatusCode”] (int): HTTP status code integer value.

  • Fields[“ResponseSize”] (int): Value of HTTP Content-Length header.

  • Fields[“ResponseTime”] (float64): Clock time elapsed for HTTP request, in seconds.

  • Fields[“Protocol”] (string): HTTP protocol used for the request (e.g. “HTTP/1.0”).

The Fields values above will only be populated in the event of a completed HTTP request. Also, it is possible to specify a decoder to further process the results of the HTTP response before injecting the message into the router.

Config:

  • url (string):

    An HTTP URL which this plugin will regularly poll for data. This option cannot be used with the urls option. No default URL is specified.

  • urls (array):

    New in version 0.5.

    An array of HTTP URLs which this plugin will regularly poll for data. This option cannot be used with the url option. No default URLs are specified.

  • method (string):

    New in version 0.5.

    The HTTP method to use for the request. Defaults to “GET”.

  • headers (subsection):

    New in version 0.5.

    Subsection defining headers for the request. By default the User-Agent header is set to “Heka”.

  • body (string):

    New in version 0.5.

    The request body (e.g. for an HTTP POST request). No default body is specified.

  • username (string):

    New in version 0.5.

    The username for HTTP Basic Authentication. No default username is specified.

  • password (string):

    New in version 0.5.

    The password for HTTP Basic Authentication. No default password is specified.

  • ticker_interval (uint):

    Time interval (in seconds) between attempts to poll for new data. Defaults to 10.

  • success_severity (uint):

    New in version 0.5.

    Severity level of successful HTTP request. Defaults to 6 (information).

  • error_severity (uint):

    New in version 0.5.

    Severity level of errors, unreachable connections, and non-200 responses of successful HTTP requests. Defaults to 1 (alert).

  • decoder (string):

    The name of the decoder used to further transform the response body text into a structured hekad message. No default decoder is specified.

Example:

[HttpInput]
url = "http://localhost:9876/"
ticker_interval = 5
success_severity = 6
error_severity = 1
decoder = "MyCustomJsonDecoder"
    [HttpInput.headers]
    user-agent = "MyCustomUserAgent"
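
For requests other than a plain GET, the options added in version 0.5 can be combined. The following is a sketch only, assuming a hypothetical JSON endpoint that requires HTTP Basic Authentication. The section name, URLs, credentials, and request body are placeholders:

```toml
[StatusPoller]
type = "HttpInput"
# urls cannot be combined with the url option.
urls = ["http://localhost:9876/status", "http://localhost:9877/status"]
method = "POST"
body = '{"detail": "full"}'      # placeholder request body
username = "monitor"             # placeholder credentials
password = "changeme"
ticker_interval = 30
success_severity = 6
error_severity = 3

    [StatusPoller.headers]
    content-type = "application/json"
```

Each poll that completes produces a message of type heka.httpinput.data, with the polled URL as the Logger value.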

HttpListenInput

New in version 0.5.

HttpListenInput plugins start a webserver listening on the specified address and port. If no decoder is specified data in the request body will be populated as the message payload. Messages will be populated as follows:

  • Uuid: Type 4 (random) UUID generated by Heka.

  • Timestamp: Time HTTP request is handled.

  • Type: heka.httpdata.request

  • Hostname: The remote network address of requester.

  • Payload: Entire contents of the HTTP request body.

  • Severity: 6

  • Logger: HttpListenInput

  • Fields[“UserAgent”] (string): Request User-Agent header (e.g. “GitHub Hookshot dd0772a”).

  • Fields[“ContentType”] (string): Request Content-Type header (e.g. “application/x-www-form-urlencoded”).

  • Fields[“Protocol”] (string): HTTP protocol used for the request (e.g. “HTTP/1.0”).

Config:

  • address (string):

    An IP address:port on which this plugin will expose an HTTP server. Defaults to “127.0.0.1:8325”.

  • decoder (string):

    The name of the decoder used to further transform the request body text into a structured hekad message. No default decoder is specified.

New in version 0.7.

  • headers (subsection, optional):

    It is possible to inject arbitrary HTTP headers into each outgoing response by adding a TOML subsection entitled “headers” to your HttpListenInput config section. Each entry in the subsection must be a list of string values.

Example:

[HttpListenInput]
address = "0.0.0.0:8325"
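
The headers subsection described above could be written as follows. This is a sketch based only on that description; the header names and values are illustrative:

```toml
[HttpListenInput]
address = "0.0.0.0:8325"

    [HttpListenInput.headers]
    # Each entry is a list of strings, even when there is only one value.
    Access-Control-Allow-Origin = ["*"]
    Cache-Control = ["no-cache", "no-store"]
```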

LogstreamerInput

New in version 0.5.

Tails a single log file, a sequential single log source, or multiple log sources of either a single logstream or multiple logstreams.

Config:

  • hostname (string):

    The hostname to use for the messages. By default this will be the machine’s qualified hostname. This can be set explicitly to ensure it’s the correct name in the event the machine has multiple interfaces/hostnames.

  • oldest_duration (string):

    A time duration string (e.g. “2s”, “2m”, “2h”). Logfiles with a last modified time older than oldest_duration ago will not be included for parsing.

  • journal_directory (string):

    The directory to store the journal files in for tracking the location that has been read to thus far. By default this is stored under heka’s base directory.

  • log_directory (string):

    The root directory to scan files from. This scan is recursive so it should be suitably restricted to the most specific directory this selection of logfiles will be matched under. The log_directory path will be prepended to the file_match.

  • rescan_interval (int):

    During logfile rotation, or if the logfile is not originally present on the system, this interval determines how often the existence of the logfile is checked. The interval is specified in milliseconds. The default of 5 seconds is usually fine.

  • file_match (string):

    Regular expression used to match files located under the log_directory. This regular expression has $ added to the end automatically if not already present, and log_directory as the prefix. WARNING: file_match should typically be delimited with single quotes, indicating use of a raw string, rather than double quotes, which require all backslashes to be escaped. For example, ‘access\.log’ will work as expected, but “access\.log” will not; you would need “access\\.log” to achieve the same result.

  • priority (list of strings):

    When using sequential logstreams, the priority is how to sort the logfiles in order from oldest to newest.

  • differentiator (list of strings):

    When using multiple logstreams, the differentiator is a set of strings that will be used in the naming of the logger, and portions that match a captured group from the file_match will have their matched value substituted in.

  • translation (hash map of hash maps of ints):

    A set of translation mappings for matched groupings to the ints to use for sorting purposes.

  • decoder (string):

    A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the parsed data is available in the Heka message payload.

  • parser_type (string):
    • token - splits the log on a byte delimiter (default).
    • regexp - splits the log on a regexp delimiter.
    • message.proto - splits the log on protobuf message boundaries
  • delimiter (string): Only used for token or regexp parsers.

    Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the log line depending on the delimiter_location configuration. Note: when a start delimiter is used the last line in the file will not be processed (since the next record defines its end) until the log is rolled.

  • delimiter_location (string): Only used for regexp parsers.
    • start - the regexp delimiter occurs at the start of a log line.
    • end - the regexp delimiter occurs at the end of the log line (default).
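
As an illustration of the options above, the following sketch shows two configurations: a single log file, and one logstream per virtual host. The section names, directories, and file names are hypothetical, and the differentiator relies on the behavior described above, where an entry matching a capture group name from file_match is replaced by the captured value:

```toml
# A single log file.
[syslog]
type = "LogstreamerInput"
log_directory = "/var/log"
file_match = 'syslog'            # "$" is appended automatically
oldest_duration = "24h"

# One logstream per domain, e.g. example.com-access.log.
[vhost_access_logs]
type = "LogstreamerInput"
log_directory = "/var/log/nginx"
# Single quotes make this a raw string, so backslashes are not escaped.
file_match = '(?P<DomainName>[^/]+)-access\.log'
# "DomainName" is substituted with the value captured by file_match.
differentiator = ["nginx.", "DomainName", ".access"]
parser_type = "token"
delimiter = "\n"
```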

ProcessInput

Executes one or more external programs on an interval, creating messages from the output. Supports a chain of commands, where stdout from each process will be piped into the stdin for the next process in the chain. In the event the program returns a non-zero exit code, ProcessInput will log that an error occurred.

Config:

  • command (map[uint]cmd_config):

    The command is a structure that contains the full path to the binary, command line arguments, optional environment variables and an optional working directory (see below). ProcessInput expects the commands to be indexed by integers starting with 0, where 0 is the first process in the chain.

  • ticker_interval (uint):

    The number of seconds to wait between each run of command. Defaults to 15. A ticker_interval of 0 indicates that the command is run only once, and should only be used for long running processes that do not exit. If ticker_interval is set to 0 and the process exits, then the ProcessInput will exit, invoking the restart behavior (see Configuring Restarting Behavior).

  • stdout (bool):

    If true, for each run of the process chain a message will be generated with the last command in the chain’s stdout as the payload. Defaults to true.

  • stderr (bool):

    If true, for each run of the process chain a message will be generated with the last command in the chain’s stderr as the payload. Defaults to false.

  • decoder (string):

    Name of the decoder instance to send messages to. If omitted messages will be injected directly into Heka’s message router.

  • parser_type (string):
    • token - splits the log on a byte delimiter (default).
    • regexp - splits the log on a regexp delimiter.
  • delimiter (string): Only used for token or regexp parsers.

    Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the log line depending on the delimiter_location configuration. Note: when a start delimiter is used the last line in the file will not be processed (since the next record defines its end) until the log is rolled.

  • delimiter_location (string): Only used for regexp parsers.
    • start - the regexp delimiter occurs at the start of a log line.
    • end - the regexp delimiter occurs at the end of the log line (default).
  • timeout (uint):

    Timeout in seconds before any one of the commands in the chain is terminated.

  • trim (bool):

    Trim a single trailing newline character if one exists. Default is true.

  • retries (RetryOptions, optional):

    A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior.

cmd_config structure:

  • bin (string):

    The full path to the binary that will be executed.

  • args ([]string):

    Command line arguments to pass into the executable.

  • env ([]string):

    Used to set environment variables before command is run. Default is nil, which uses the heka process’s environment.

  • directory (string):

    Used to set the working directory of bin. Default is “”, which uses the heka process’s working directory.

Example:

[DemoProcessInput]
type = "ProcessInput"
ticker_interval = 2
parser_type = "token"
delimiter = " "
stdout = true
stderr = false
trim = true

    [DemoProcessInput.command.0]
    bin = "/bin/cat"
    args = ["../testsupport/process_input_pipes_test.txt"]

    [DemoProcessInput.command.1]
    bin = "/usr/bin/grep"
    args = ["ignore"]
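
The optional cmd_config settings and the timeout option are not used in the example above. A sketch of how they might be combined follows. The section name, binary path, and values are hypothetical, and the KEY=value form of the env entries is an assumption that is not stated in this document:

```toml
[DiskUsage]
type = "ProcessInput"
ticker_interval = 60
timeout = 10                     # seconds before a command in the chain is terminated
stdout = true
stderr = false

    [DiskUsage.command.0]
    bin = "/bin/df"
    args = ["-k"]
    # Assumption: each entry takes the KEY=value form.
    env = ["LC_ALL=C"]
    directory = "/tmp"           # working directory for bin
```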

ProcessDirectoryInput

New in version 0.5.

The ProcessDirectoryInput periodically scans a filesystem directory looking for ProcessInput configuration files. The ProcessDirectoryInput will maintain a pool of running ProcessInputs based on the contents of this directory, refreshing the set of running inputs as needed with every rescan. This allows Heka administrators to manage a set of data collection processes for a running hekad server without restarting the server.

Each ProcessDirectoryInput has a process_dir configuration setting, which is the root folder of the tree where scheduled jobs are defined. It should contain exactly one nested level of subfolders, named with ASCII numeric characters indicating the interval, in seconds, between each process run. These numeric folders must contain TOML files which specify the details regarding which processes to run.

For example, a process_dir might look like this:

-/usr/share/heka/processes/
 |-5
   |- check_myserver_running.toml
 |-61
   |- cat_proc_mounts.toml
   |- get_running_processes.toml
 |-302
   |- some_custom_query.toml

This indicates one process to be run every five seconds, two processes to be run every 61 seconds, and one process to be run every 302 seconds.

Note that ProcessDirectoryInput will ignore any files that are not nested one level deep, are not in a folder named for an integer 0 or greater, or do not end with ‘.toml’. Each file which meets these criteria, such as those shown in the example above, should contain the TOML configuration for exactly one ProcessInput, matching that of a standalone ProcessInput with the following restrictions:

  • The section name must be ProcessInput. Any TOML sections named anything other than ProcessInput will be ignored.
  • Any specified ticker_interval value will be ignored. The ticker interval value to use will be parsed from the directory path.

If the specified process fails to run or the ProcessInput config fails for any other reason, ProcessDirectoryInput will log an error message and continue.
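
Under these restrictions, a file such as check_myserver_running.toml in the 5 folder might contain something like the sketch below. The binary path and arguments are hypothetical. No ticker_interval is given, since the interval is taken from the folder name:

```toml
# /usr/share/heka/processes/5/check_myserver_running.toml
[ProcessInput]                   # the section name must be ProcessInput
stdout = true
stderr = false

    [ProcessInput.command.0]
    bin = "/usr/bin/pgrep"       # hypothetical check command
    args = ["-f", "myserver"]
```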

Config:

  • ticker_interval (int, optional):

    Amount of time, in seconds, between scans of the process_dir. Defaults to 300 (i.e. 5 minutes).

  • process_dir (string, optional):

    This is the root folder of the tree where the scheduled jobs are defined. Absolute paths will be honored, relative paths will be computed relative to Heka’s globally specified share_dir. Defaults to “processes” (i.e. “$share_dir/processes”).

  • retries (RetryOptions, optional):

    A sub-section that specifies the settings to be used for restart behavior. See Configuring Restarting Behavior.

Example:

[ProcessDirectoryInput]
process_dir = "/etc/hekad/processes.d"
ticker_interval = 120

StatAccumInput

Provides an implementation of the StatAccumulator interface which other plugins can use to submit Stat objects for aggregation and roll-up. Accumulates these stats and then periodically emits a “stat metric” type message containing aggregated information about the stats received since the last generated message.

Config:

  • emit_in_payload (bool):

    Specifies whether or not the aggregated stat information should be emitted in the payload of the generated messages, in the format accepted by the carbon portion of the graphite graphing software. Defaults to true.

  • emit_in_fields (bool):

    Specifies whether or not the aggregated stat information should be emitted in the message fields of the generated messages. Defaults to false. NOTE: At least one of ‘emit_in_payload’ or ‘emit_in_fields’ must be true or it will be considered a configuration error and the input won’t start.

  • percent_threshold (int):

    Percent threshold to use for computing “upper_N%” type stat values. Defaults to 90.

  • ticker_interval (uint):

    Time interval (in seconds) between generated output messages. Defaults to 10.

  • message_type (string):

    String value to use for the Type value of the emitted stat messages. Defaults to “heka.statmetric”.

  • legacy_namespaces (bool):

    If set to true, then use the older format for namespacing counter stats, with rates recorded under stats.<counter_name> and absolute count recorded under stats_counts.<counter_name>. See statsd metric namespacing. Defaults to false.

  • global_prefix (string):

    Global prefix to use for sending stats to graphite. Defaults to “stats”.

  • counter_prefix (string):

    Secondary prefix to use for namespacing counter metrics. Has no impact unless legacy_namespaces is set to false. Defaults to “counters”.

  • timer_prefix (string):

    Secondary prefix to use for namespacing timer metrics. Defaults to “timers”.

  • gauge_prefix (string):

    Secondary prefix to use for namespacing gauge metrics. Defaults to “gauges”.

  • statsd_prefix (string):

    Prefix to use for the statsd numStats metric. Defaults to “statsd”.

  • delete_idle_stats (bool):

    If true, no values are emitted for inactive stats. Otherwise a 0 is sent or, in the case of gauges, the previous value. Defaults to false.
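
Example (a sketch using only the options listed above; the values are illustrative rather than recommended):

```toml
[StatAccumInput]
ticker_interval = 1
emit_in_payload = false
emit_in_fields = true            # at least one of the two emit options must be true
percent_threshold = 95
message_type = "heka.statmetric"
delete_idle_stats = true
```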

StatsdInput

Listens for statsd protocol counter, timer, or gauge messages on a UDP port, and generates Stat objects that are handed to a StatAccumulator for aggregation and processing.

Config:

  • address (string):

    An IP address:port on which this plugin will expose a statsd server. Defaults to “127.0.0.1:8125”.

  • stat_accum_name (string):

    Name of a StatAccumInput instance that this StatsdInput will use as its StatAccumulator for submitting received stat values. Defaults to “StatAccumInput”.

  • max_msg_size (uint):

    Size of the buffer used to read a message from statsd. When a client sends a large number of stats in a single message, this value may need to be increased. All over-length data will be truncated without raising an error. Defaults to 512.

Example:

[StatsdInput]
address = ":8125"
stat_accum_name = "custom_stat_accumulator"
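
Since stat_accum_name must name a StatAccumInput instance, a configuration like the one above would presumably be paired with a matching section. A sketch, with an illustrative max_msg_size:

```toml
[StatsdInput]
address = ":8125"
stat_accum_name = "custom_stat_accumulator"
max_msg_size = 2048              # default is 512

# The instance referenced by stat_accum_name above.
[custom_stat_accumulator]
type = "StatAccumInput"
ticker_interval = 10
emit_in_payload = true
```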

TcpInput

Listens on a specific TCP address and port for messages. If the message is signed, it is verified against the signer name and specified key version. If the signature is not valid, the message is discarded; otherwise the signer name is added to the pipeline pack and can be used to accept messages using the message_signer configuration option.

Config:

  • address (string):

    An IP address:port on which this plugin will listen.

  • signer:

    Optional TOML subsection. Section name consists of a signer name, underscore, and numeric version of the key.

    • hmac_key (string):

      The hash key used to sign the message.

New in version 0.4.

  • decoder (string):

    A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the raw input data is available in the Heka message payload.

  • parser_type (string):
    • token - splits the stream on a byte delimiter.
    • regexp - splits the stream on a regexp delimiter.
    • message.proto - splits the stream on protobuf message boundaries.
  • delimiter (string): Only used for token or regexp parsers.

    Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the message depending on the delimiter_location configuration.

  • delimiter_location (string): Only used for regexp parsers.
    • start - the regexp delimiter occurs at the start of the message.
    • end - the regexp delimiter occurs at the end of the message (default).

New in version 0.5.

  • use_tls (bool):

    Specifies whether or not SSL/TLS encryption should be used for the TCP connections. Defaults to false.

  • tls (TlsConfig):

    A sub-section that specifies the settings to be used for any SSL/TLS encryption. This will only have any impact if use_tls is set to true. See Configuring TLS.

  • net (string, optional, default: “tcp”)

    Network value must be one of: “tcp”, “tcp4”, “tcp6”, “unix” or “unixpacket”.

New in version 0.6.

  • keep_alive (bool):

    Specifies whether or not TCP keepalive should be used for established TCP connections. Defaults to false.

  • keep_alive_period (int):

    Time duration in seconds that a TCP connection will be maintained before keepalive probes start being sent. Defaults to 7200 (i.e. 2 hours).

Example:

[TcpInput]
address = ":5565"
parser_type = "message.proto"
decoder = "ProtobufDecoder"

[TcpInput.signer.ops_0]
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
[TcpInput.signer.ops_1]
hmac_key = "xdd908lfcgikauexdi8elogusridaxoalf"

[TcpInput.signer.dev_1]
hmac_key = "haeoufyaiofeugdsnzaogpi.ua,dp.804u"
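
For newline-delimited text rather than protobuf messages, the token parser and keepalive options could be combined as in the following sketch. The section name, port, and keepalive period are hypothetical. Because no decoder is specified, the raw input data ends up in the message payload:

```toml
[LineTcpInput]
type = "TcpInput"
address = "127.0.0.1:5566"
net = "tcp4"
parser_type = "token"
delimiter = "\n"
keep_alive = true
keep_alive_period = 180          # seconds; default is 7200
```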

UdpInput

Listens on a specific UDP address and port for messages. If the message is signed, it is verified against the signer name and specified key version. If the signature is not valid, the message is discarded; otherwise the signer name is added to the pipeline pack and can be used to accept messages using the message_signer configuration option.

Note

The UDP payload is not restricted to a single message; since the stream parser is being used multiple messages can be sent in a single payload.

Config:

  • address (string):

    An IP address:port or Unix datagram socket file path on which this plugin will listen.

  • signer:

    Optional TOML subsection. Section name consists of a signer name, underscore, and numeric version of the key.

    • hmac_key (string):

      The hash key used to sign the message.

New in version 0.4.

  • decoder (string):

    A ProtobufDecoder instance must be specified for the message.proto parser. Use of a decoder is optional for token and regexp parsers; if no decoder is specified the raw input data is available in the Heka message payload.

  • parser_type (string):
    • token - splits the stream on a byte delimiter.
    • regexp - splits the stream on a regexp delimiter.
    • message.proto - splits the stream on protobuf message boundaries.
  • delimiter (string): Only used for token or regexp parsers.

    Character or regexp delimiter used by the parser (default “\n”). For the regexp delimiter a single capture group can be specified to preserve the delimiter (or part of the delimiter). The capture will be added to the start or end of the message depending on the delimiter_location configuration.

  • delimiter_location (string): Only used for regexp parsers.
    • start - the regexp delimiter occurs at the start of the message.
    • end - the regexp delimiter occurs at the end of the message (default).

New in version 0.5.

  • net (string, optional, default: “udp”)

    Network value must be one of: “udp”, “udp4”, “udp6”, or “unixgram”.

Example:

[UdpInput]
address = "127.0.0.1:4880"
parser_type = "message.proto"
decoder = "ProtobufDecoder"

[UdpInput.signer.ops_0]
hmac_key = "4865ey9urgkidls xtb0[7lf9rzcivthkm"
[UdpInput.signer.ops_1]
hmac_key = "xdd908lfcgikauexdi8elogusridaxoalf"

[UdpInput.signer.dev_1]
hmac_key = "haeoufyaiofeugdsnzaogpi.ua,dp.804u"
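
To listen on a Unix datagram socket instead of an IP address, the net and address options might be set as in this sketch. The section name and socket path are hypothetical:

```toml
[LocalUdpInput]
type = "UdpInput"
net = "unixgram"
address = "/var/run/heka/input.sock"   # hypothetical socket file path
parser_type = "token"
delimiter = "\n"
```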