
Decoders

MultiDecoder

This decoder plugin allows you to specify an ordered list of delegate decoders. The MultiDecoder will pass the PipelinePack to each of the delegate decoders in turn until one of them decodes it successfully. If all of the delegates fail to decode the message, the MultiDecoder will return an error and recycle the message.

Config:

  • subs:

    A subsection is used to declare the TOML configuration for any delegate decoders. The default is that no delegate decoders are defined.

  • order (list of strings):

    PipelinePack objects will be passed in order to each decoder in this list. Default is an empty list.

  • log_sub_errors (bool):

    If true, the DecoderRunner will log the errors returned whenever a delegate decoder fails to decode a message. Defaults to false.

  • cascade_strategy (string):

    Specifies behavior the MultiDecoder should exhibit with regard to cascading through the listed decoders. Supports only two valid values: “first-wins” and “all”. With “first-wins”, each decoder will be tried in turn until there is a successful decoding, after which decoding will be stopped. With “all”, all listed decoders will be applied whether or not they succeed. In each case, decoding will only be considered to have failed if none of the sub-decoders succeed.

Example (Two PayloadRegexDecoder delegates):

[syncdecoder]
type = "MultiDecoder"
order = ['syncformat', 'syncraw']

[syncdecoder.subs.syncformat]
type = "PayloadRegexDecoder"
match_regex = '^(?P<RemoteIP>\S+) \S+ (?P<User>\S+) \[(?P<Timestamp>[^\]]+)\] "(?P<Method>[A-Z]+) (?P<Url>[^\s]+)[^"]*" (?P<StatusCode>\d+) (?P<RequestSize>\d+) "(?P<Referer>[^"]*)" "(?P<Browser>[^"]*)" ".*" ".*" node_s:\d+\.\d+ req_s:(?P<ResponseTime>\d+\.\d+) retries:\d+ req_b:(?P<ResponseSize>\d+)'
timestamp_layout = "02/Jan/2006:15:04:05 -0700"

[syncdecoder.subs.syncformat.message_fields]
RemoteIP|ipv4 = "%RemoteIP%"
User = "%User%"
Method = "%Method%"
Url|uri = "%Url%"
StatusCode = "%StatusCode%"
RequestSize|B = "%RequestSize%"
Referer = "%Referer%"
Browser = "%Browser%"
ResponseTime|s = "%ResponseTime%"
ResponseSize|B = "%ResponseSize%"
Payload = ""

[syncdecoder.subs.syncraw]
type = "PayloadRegexDecoder"
match_regex = '^(?P<TheData>.*)'

[syncdecoder.subs.syncraw.message_fields]
Somedata = "%TheData%"
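
The cascade_strategy and log_sub_errors options described above fit into the same [syncdecoder] table; a minimal sketch that makes the first-wins cascade explicit and logs each delegate failure:

[syncdecoder]
type = "MultiDecoder"
order = ['syncformat', 'syncraw']
cascade_strategy = "first-wins"
log_sub_errors = true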

Nginx Access Log Decoder

New in version 0.5.

Parses the Nginx access logs based on the Nginx ‘log_format’ configuration directive.

Config:

  • log_format (string)

    The ‘log_format’ configuration directive from the nginx.conf. The $time_local or $time_iso8601 variable is converted to the number of nanoseconds since the Unix epoch and used to set the Timestamp on the message. A sketch using $time_iso8601 follows the example message below.

  • type (string, optional, default nil):

    Sets the message ‘Type’ header to the specified value.

  • user_agent_transform (bool, optional, default false)

    Transform the http_user_agent into user_agent_browser, user_agent_version, user_agent_os.

  • user_agent_keep (bool, optional, default false)

    Always preserve the http_user_agent value if transform is enabled.

  • user_agent_conditional (bool, optional, default false)

    Only preserve the http_user_agent value if transform is enabled and fails.

Example Heka Configuration

[FxaNginxAccessDecoder]
type = "SandboxDecoder"
script_type = "lua"
filename = "lua_decoders/nginx_access.lua"

[FxaNginxAccessDecoder.config]
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
user_agent_transform = true

Example Heka Message

Timestamp:2014-01-10 07:04:56 -0800 PST
Type:logfile
Hostname:trink-x230
Pid:0
UUID:8e414f01-9d7f-4a48-a5e1-ae92e5954df5
Logger:FxaNginxAccessInput
Payload:
EnvVersion:
Severity:7
Fields:
name:"remote_user" value_string:"-"
name:"http_x_forwarded_for" value_string:"-"
name:"http_referer" value_string:"-"
name:"body_bytes_sent" value_type:DOUBLE representation:"B" value_double:82
name:"remote_addr" value_string:"62.195.113.219"
name:"status" value_type:DOUBLE value_double:200
name:"request" value_string:"GET /v1/recovery_email/status HTTP/1.1"
name:"user_agent_os" value_string:"FirefoxOS"
name:"user_agent_browser" value_string:"Firefox"
name:"user_agent_version" value_type:DOUBLE value_double:29
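
As noted above, the $time_iso8601 variable can supply the timestamp instead of $time_local. A hedged sketch, assuming an nginx ‘log_format’ built around $time_iso8601 (the directive shown here is an assumption, not taken from a real nginx.conf):

[FxaNginxAccessDecoder.config]
log_format = '$remote_addr - $remote_user [$time_iso8601] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'
user_agent_transform = true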

PayloadJsonDecoder

NOTE: The PayloadJsonDecoder is deprecated. The recommended way to parse JSON is to make use of the cjson Lua library in a SandboxDecoder. This approach is typically both faster and more flexible than using PayloadJsonDecoder.

This decoder plugin accepts JSON blobs and allows you to map parts of the JSON into Field attributes of the pipeline pack message using JSONPath syntax.

Config:

  • json_map:

    A subsection defining a capture name that maps to a JSONPath expression. Each expression can fetch a single value; if the expression does not resolve to a valid node in the JSON message, the capture group will be assigned an empty string value.

  • severity_map:

    Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.

  • message_fields:

    Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any capture names defined in the json_map, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter, e.g. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field, e.g. Payload|json will create Fields[Payload] with a json representation (see Field Variables).

    Interpolated values should be surrounded with % signs, for example:

    [my_decoder.message_fields]
    Type = "%Type%Decoded"
    

    This will result in the new message’s Type being set to the old message’s Type with “Decoded” appended.

  • timestamp_layout (string):

    A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. The default layout is ISO8601, the same as JavaScript; a sketch of a custom layout follows the example below.

  • timestamp_location (string):

    Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout.

  • require_all_fields (bool):

    Requires json mappings to match ALL specified message_fields. If any message field is not matched, the pack is silently passed back unmodified. Useful when used in combination with a MultiDecoder and multiple JSON formats. Defaults to false.

    Example:

    [my_multi_decoder]
    type = "MultiDecoder"
    order = ["amqp_decoder", "log1_json", "log2_json"]
    cascade_strategy = "all"
    
    [my_multi_decoder.subs.log1_json]
    type = "PayloadJsonDecoder"
    require_all_fields = true
    
    [my_multi_decoder.subs.log1_json.json_map]
    field1 = "$.field1"
    field2 = "$.field2"
    field3 = "$.field3"
    
    [my_multi_decoder.subs.log1_json.message_fields]
    field1 = "%field1%"
    field2 = "%field2%"
    field3 = "%field3%"
    
    [my_multi_decoder.subs.log2_json]
    type = "PayloadJsonDecoder"
    require_all_fields = true
    
    [my_multi_decoder.subs.log2_json.json_map]
    field4 = "$.field4"
    field5 = "$.field5"
    field6 = "$.field6"
    
    [my_multi_decoder.subs.log2_json.message_fields]
    field4 = "%field4%"
    field5 = "%field5%"
    field6 = "%field6%"
    

Example:

[myjson_decoder]
type = "PayloadJsonDecoder"

[myjson_decoder.json_map]
Count = "$.statsd.count"
Name = "$.statsd.name"
Pid = "$.pid"
Timestamp = "$.timestamp"
Severity = "$.log_level"

[myjson_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4

[myjson_decoder.message_fields]
Pid = "%Pid%"
StatCount = "%Count%"
StatName =  "%Name%"
Timestamp = "%Timestamp%"
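
If the incoming timestamps are not in the default ISO8601 layout, the timestamp_layout and timestamp_location options described above could be added to the [myjson_decoder] section. A minimal sketch, assuming timestamps such as "2014-01-10 07:04:56" with no embedded time zone information:

# added to the existing [myjson_decoder] section
timestamp_layout = "2006-01-02 15:04:05"
timestamp_location = "America/Los_Angeles"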

PayloadJsonDecoder’s json_map config subsection only supports a small subset of valid JSONPath expressions.

JSONPath   Description
$          the root object/element
.          child operator
[]         subscript operator to iterate over arrays

Examples:

var s = {
    "foo": {
        "bar": [
            {
                "baz": "こんにちわ世界",
                "noo": "aaa"
            },
            {
                "maz": "123",
                "moo": 256
            }
        ],
        "boo": {
            "bag": true,
            "bug": false
        }
    }
}

# Valid paths
$.foo.bar[0].baz
$.foo.bar

PayloadRegexDecoder

Decoder plugin that accepts messages of a specified form and generates new outgoing messages from extracted data, effectively transforming one message format into another.

Config:

  • match_regex:

    Regular expression that must match for the decoder to process the message.

  • severity_map:

    Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.

  • message_fields:

    Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any capture groups from the match_regex, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter, e.g. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field, e.g. Payload|json will create Fields[Payload] with a json representation (see Field Variables).

    Interpolated values should be surrounded with % signs, for example:

    [my_decoder.message_fields]
    Type = "%Type%Decoded"
    

    This will result in the new message’s Type being set to the old message’s Type with “Decoded” appended.

  • timestamp_layout (string):

    A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation.

  • timestamp_location (string):

    Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout.

  • log_errors (bool):

    New in version 0.5.

    If set to false, payloads that cannot be matched against the regex will not be logged as errors. Defaults to true. A sketch disabling this follows the example below.

Example (Parsing Apache Combined Log Format):

[apache_transform_decoder]
type = "PayloadRegexDecoder"
match_regex = '^(?P<RemoteIP>\S+) \S+ \S+ \[(?P<Timestamp>[^\]]+)\] "(?P<Method>[A-Z]+) (?P<Url>[^\s]+)[^"]*" (?P<StatusCode>\d+) (?P<RequestSize>\d+) "(?P<Referer>[^"]*)" "(?P<Browser>[^"]*)"'
timestamp_layout = "02/Jan/2006:15:04:05 -0700"

# severities in this case would work only if a (?P<Severity>...) matching
# group was present in the regex, and the log file contained this information.
[apache_transform_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4

[apache_transform_decoder.message_fields]
Type = "ApacheLogfile"
Logger = "apache"
Url|uri = "%Url%"
Method = "%Method%"
Status = "%StatusCode%"
RequestSize|B = "%RequestSize%"
Referer = "%Referer%"
Browser = "%Browser%"
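
If the log is expected to contain lines that legitimately do not match the regex, the log_errors option described above can be switched off; a minimal sketch:

# added to the existing [apache_transform_decoder] section
log_errors = false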

PayloadXmlDecoder

This decoder plugin accepts XML blobs in the message payload and allows you to map parts of the XML into Field attributes of the pipeline pack message using XPath syntax, as implemented by the xmlpath library.

Config:

  • xpath_map:

    A subsection defining a capture name that maps to an XPath expression. Each expression can fetch a single value; if the expression does not resolve to a valid node in the XML blob, the capture group will be assigned an empty string value.

  • severity_map:

    Subsection defining severity strings and the numerical value they should be translated to. hekad uses numerical severity codes, so a severity of WARNING can be translated to 3 by settings in this section. See Heka Message.

  • message_fields:

    Subsection defining message fields to populate and the interpolated values that should be used. Valid interpolated values are any capture names defined in the xpath_map, and any other field that exists in the message. In the event that a captured name overlaps with a message field, the captured name’s value will be used. Optional representation metadata can be added at the end of the field name using a pipe delimiter, e.g. ResponseSize|B = “%ResponseSize%” will create Fields[ResponseSize] representing the number of bytes. Adding a representation string to a standard message header name will cause it to be added as a user defined field, e.g. Payload|json will create Fields[Payload] with a json representation (see Field Variables).

    Interpolated values should be surrounded with % signs, for example:

    [my_decoder.message_fields]
    Type = "%Type%Decoded"
    

    This will result in the new message’s Type being set to the old message’s Type with “Decoded” appended.

  • timestamp_layout (string):

    A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. The default layout is ISO8601, the same as JavaScript.

  • timestamp_location (string):

    Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout.

Example:

[myxml_decoder]
type = "PayloadXmlDecoder"

[myxml_decoder.xpath_map]
Count = "/some/path/count"
Name = "/some/path/name"
Pid = "//pid"
Timestamp = "//timestamp"
Severity = "//severity"

[myxml_decoder.severity_map]
DEBUG = 7
INFO = 6
WARNING = 4

[myxml_decoder.message_fields]
Pid = "%Pid%"
StatCount = "%Count%"
StatName =  "%Name%"
Timestamp = "%Timestamp%"

PayloadXmlDecoder’s xpath_map config subsection supports XPath as implemented by the xmlpath library.

  • All axes are supported (“child”, “following-sibling”, etc)
  • All abbreviated forms are supported (”.”, “//”, etc)
  • All node types except for namespace are supported
  • Predicates are restricted to [N], [path], and [path=literal] forms
  • Only a single predicate is supported per path step
  • Richer expressions and namespaces are not supported
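
For illustration, a hypothetical XML payload that the xpath_map in the example above would resolve against (the element names are assumptions chosen to match the example paths):

<some>
  <path>
    <count>3</count>
    <name>statsd.count</name>
  </path>
  <pid>1234</pid>
  <timestamp>2014-01-10 07:04:56 -0800</timestamp>
  <severity>INFO</severity>
</some>

Here /some/path/count and /some/path/name fetch the count and name values, while //pid, //timestamp and //severity match the corresponding elements anywhere in the document.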

ProtobufDecoder

The ProtobufDecoder is used for Heka message objects that have been serialized into protocol buffers format. This is the format that Heka uses to communicate with other Heka instances, so it is almost always a good idea to include one in your Heka configuration. The ProtobufDecoder has no configuration options.

The hekad protocol buffers message schema is defined in the message.proto file in the message package.

Example:

[ProtobufDecoder]
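
On its own the ProtobufDecoder does nothing until an input refers to it. A hedged sketch of how it might be wired up, assuming an input (here a TcpInput) that accepts a decoder setting naming a registered decoder; the input configuration is an assumption for illustration, not part of the ProtobufDecoder itself:

[TcpInput]
address = ":5565"
decoder = "ProtobufDecoder"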

Rsyslog Decoder

New in version 0.5.

Parses rsyslog output using the string-based configuration template.

Config:

  • template (string)

    The ‘template’ configuration string from rsyslog.conf.

  • tz (string, optional, defaults to UTC)

    The time zone of the log timestamps. The conversion actually happens on the Go side, since there isn’t good time zone support in the Lua sandbox.

Example Heka Configuration

[RsyslogDecoder]
type = "SandboxDecoder"
script_type = "lua"
filename = "lua_decoders/rsyslog.lua"

[RsyslogDecoder.config]
template = '%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'
tz = "America/Los_Angeles"

Example Heka Message

Timestamp:2014-02-10 12:58:58 -0800 PST
Type:logfile
Hostname:trink-x230
Pid:0
UUID:e0eef205-0b64-41e8-a307-5772b05e16c1
Logger:RsyslogInput
Payload:"imklog 5.8.6, log source = /proc/kmsg started."
EnvVersion:
Severity:7
Fields:
name:"syslogtag" value_string:"kernel:"

SandboxDecoder

The SandboxDecoder provides an isolated execution environment for data parsing and complex transformations without the need to recompile Heka. See Sandbox.

Config:

  • script_type (string):

    The language the sandbox is written in. Currently the only valid option is ‘lua’.

  • filename (string):

    The path to the sandbox code; if specified as a relative path it will be appended to Heka’s global share_dir.

  • preserve_data (bool):

    True if the sandbox global data should be preserved/restored on Heka shutdown/startup.

  • memory_limit (uint):

    The number of bytes the sandbox is allowed to consume before being terminated (max 8MiB, default max).

  • instruction_limit (uint):

    The number of instructions the sandbox is allowed to execute during the process_message function before being terminated (max 1M, default max).

  • output_limit (uint):

    The number of bytes the sandbox output buffer can hold before being terminated (max 63KiB, default max). Anything less than 64B is set to 64B.

  • module_directory (string):

    The directory where ‘require’ will attempt to load the external Lua modules from. Defaults to ${SHARE_DIR}/lua_modules.

  • config (object):

    A map of configuration variables available to the sandbox via read_config. The map consists of string keys with string, bool, int64, or float64 values. A sketch follows the example below.

Example

[sql_decoder]
type = "SandboxDecoder"
script_type = "lua"
filename = "sql_decoder.lua"

ScribbleDecoder

New in version 0.5.

The ScribbleDecoder is a trivial decoder that makes it possible to set one or more static field values on every decoded message. It is often used in conjunction with another decoder (e.g. in a MultiDecoder with cascade_strategy set to “all”) to, for example, set the message type of every message to a specific custom value after the messages have been decoded from Protocol Buffers format. Note that this only supports setting the exact same value on every message. If any dynamic computation is required to determine what the value should be, or whether it should be applied to a specific message, a SandboxDecoder using the provided write_message API call should be used instead.

Config:

  • message_fields:

    Subsection defining message fields to populate. Optional representation metadata can be added at the end of the field name using a pipe delimiter, e.g. host|ipv4 = “192.168.55.55” will create Fields[Host] containing an IPv4 address. Adding a representation string to a standard message header name will cause it to be added as a user defined field, e.g. Payload|json will create Fields[Payload] with a json representation (see Field Variables). Does not support Timestamp or Uuid. A sketch using a representation field follows the example below.

Example (in MultiDecoder context)

[mytypedecoder]
type = "MultiDecoder"
order = ["proto", "mytype"]

    [mytypedecoder.subs.proto]
    type = "ProtobufDecoder"

    [mytypedecoder.subs.mytype]
    type = "ScribbleDecoder"

        [mytypedecoder.subs.mytype.message_fields]
        Type = "MyType"

StatsToFieldsDecoder

New in version 0.4.

The StatsToFieldsDecoder will parse time series statistics data in the graphite message format and encode the data into the message fields, in the same format produced by a StatAccumInput plugin with the emit_in_fields value set to true. This is useful if you have externally generated graphite string data flowing through Heka that you’d like to process without having to roll your own string parsing code.

This decoder has no configuration options. It simply expects to be passed messages with statsd string data in the payload. Incorrect or malformed content will cause a decoding error, dropping the message.

The fields format only contains a single “timestamp” field, so any payloads containing multiple timestamps will end up generating a separate message for each timestamp. Extra messages will be a copy of the original message except a) the payload will be empty and b) the unique timestamp and related stats will be the only message fields.
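
For illustration, a hypothetical graphite-format payload of the kind this decoder expects, one “<name> <value> <timestamp>” triple per line (the stat names are assumptions):

stats.counters.webapp.hits 42 1410823960
stats.timers.webapp.response_time.upper 123 1410823960
stats.gauges.webapp.connections 7 1410823970

Because two distinct timestamps appear, this payload would be split into two messages, one per timestamp, as described above.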

Example:

[StatsToFieldsDecoder]