# Overview of log file formats

You’ve learned how logs record events that happen on a network or system. In security, logs provide key details about activities that occurred across an organization, like who signed into an application at a specific point in time. As a security analyst, you’ll use **log analysis**, which is the process of examining logs to identify events of interest. It’s important to know how to read and interpret different log formats so that you can uncover the key details surrounding an event and identify unusual or malicious activity. In this reading, you’ll review the following log formats:

- JSON
- Syslog
- XML
- CSV
- CEF

## JavaScript Object Notation (JSON)

JavaScript Object Notation (JSON) is a file format that is used to store and transmit data. JSON is known for being lightweight and easy to read and write. It is used for transmitting data in web technologies and is also commonly used in cloud environments. JSON syntax is derived from JavaScript syntax. If you are familiar with JavaScript, you might recognize that JSON contains components from JavaScript including:

- Key-value pairs
- Commas
- Double quotes
- Curly brackets
- Square brackets

### **Key-value pairs**

A **key-value pair** is a set of data that represents two linked items: a key and its corresponding value. A key-value pair consists of a key, followed by a colon, followed by a value. An example of a key-value pair is <var>"Alert": "Malware"</var>.

**Note**: For readability, it is recommended that key-value pairs include a space after the colon that separates the key and value.

### **Commas**

Commas are used to separate data. For example: <var>"Alert": "Malware", "Alert code": 1090, "severity": 10</var>.

### **Double quotes**

Double quotes are used to enclose *text* data, which is also known as a string. For example: <var>"Alert": "Malware"</var>. Data that contains numbers *is not* enclosed in quotes, like this: <var>"Alert code": 1090</var>.

### **Curly brackets**

Curly brackets enclose an **object**, which is a data type that stores data in a comma-separated list of key-value pairs. Objects are often used to describe multiple properties for a given key. JSON log entries start and end with a curly bracket. In this example, <var>User</var> is the object that contains multiple properties:

<var>"User": { "id": "1234", "name": "user", "role": "engineer" }</var>

### **Square brackets**

Square brackets are used to enclose an **array**, which is a data type that stores data in a comma-separated ordered list. Arrays are useful when you want to store data as an ordered collection, for example: <var>\["Administrators", "Users", "Engineering"\]</var>.
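The components above can be seen working together in a short Python sketch that uses the standard `json` module. The log entry is a hypothetical one assembled from the snippets in this section, and the `Groups` key is an assumed name added only to hold the example array.

```python
import json

# Hypothetical log entry built from this section's snippets.
# The "Groups" key is an assumption used to hold the example array.
raw_entry = """
{
  "Alert": "Malware",
  "Alert code": 1090,
  "severity": 10,
  "User": {"id": "1234", "name": "user", "role": "engineer"},
  "Groups": ["Administrators", "Users", "Engineering"]
}
"""

entry = json.loads(raw_entry)

alert = entry["Alert"]              # quoted text becomes a Python str
alert_code = entry["Alert code"]    # unquoted number becomes an int
user_role = entry["User"]["role"]   # curly brackets become a nested dict
first_group = entry["Groups"][0]    # square brackets become an ordered list

print(alert, alert_code, user_role, first_group)
```

Note that `"id": "1234"` stays a string after parsing because it is enclosed in double quotes, while `1090` is parsed as a number.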

## Syslog

Syslog is a standard for logging and transmitting data. It can be used to refer to any of its three different capabilities:

1. **Protocol**: The syslog protocol is used to transport logs to a centralized log server for log management. It uses port 514 for plaintext logs and port 6514 for encrypted logs.
2. **Service**: The syslog service acts as a log forwarding service that consolidates logs from multiple sources into a single location. The service works by receiving and then forwarding any syslog log entries to a remote server.
3. **Log format**: The syslog log format is one of the most commonly used log formats that you will be focusing on. It is the native logging format used in Unix® systems. It consists of three components: a header, structured-data, and a message.

## Syslog log example

Here is an example of a syslog entry that contains all three components: a header, followed by structured-data, and a message:

<var>&lt;236&gt;1 2022-03-21T01:11:11.003Z virtual.machine.com evntslog - ID01 \[user@32473 iut="1" eventSource="Application" eventID="9999"\] This is a log entry!</var>

### **Header** 

The header contains details like the timestamp; the hostname, which is the name of the machine that sends the log; the application name; and the message ID.

- **Timestamp**: The timestamp in this example is <var>2022-03-21T01:11:11.003Z</var>, where <var>2022-03-21</var> is the date in YYYY-MM-DD format. <var>T</var> is used to separate the date and the time. <var>01:11:11.003</var> is the 24-hour format of the time and includes the number of milliseconds <var>003</var>. <var>Z</var> indicates the timezone, which is Coordinated Universal Time (UTC).
- **Hostname**: <var>virtual.machine.com</var>
- **Application**: <var>evntslog</var>
- **Message ID**: <var>ID01</var>
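As a minimal sketch, the timestamp from this example can be converted into a Python `datetime` object with the standard library. The format string is an assumption that matches this exact layout (date, `T`, time with fractional seconds, trailing `Z`); other syslog sources may use different layouts.

```python
from datetime import datetime, timezone

timestamp_text = "2022-03-21T01:11:11.003Z"

# Match the trailing "Z" literally, then attach the UTC time zone explicitly.
parsed = datetime.strptime(timestamp_text, "%Y-%m-%dT%H:%M:%S.%fZ")
parsed = parsed.replace(tzinfo=timezone.utc)

milliseconds = parsed.microsecond // 1000

print(parsed.date(), parsed.time(), milliseconds)
```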

### **Structured-data** 

The structured-data portion of the log entry contains additional logging information. This information is enclosed in square brackets and structured in key-value pairs. Here, there are three keys with corresponding values: <var>\[user@32473 iut="1" eventSource="Application" eventID="9999"\]</var>.

### **Message** 

The message contains a detailed log message about the event. Here, the message is <var>This is a log entry!</var>.
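The following sketch splits the example entry into its header fields, structured-data, and message using a regular expression. It is a simplification rather than a full parser: it assumes exactly one structured-data element with no escaped characters. The field that holds <var>-</var> in the example is named `procid` here, following the header layout in RFC 5424; the other group names are chosen for illustration.

```python
import re

entry = (
    '<236>1 2022-03-21T01:11:11.003Z virtual.machine.com evntslog - ID01 '
    '[user@32473 iut="1" eventSource="Application" eventID="9999"] '
    'This is a log entry!'
)

# Simplified pattern: assumes a single structured-data element.
SYSLOG_PATTERN = re.compile(
    r"<(?P<pri>\d+)>(?P<version>\d+) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"\[(?P<sd_id>\S+) (?P<sd_params>[^\]]*)\] "
    r"(?P<message>.*)"
)

match = SYSLOG_PATTERN.match(entry)
fields = match.groupdict()

# Turn the key="value" pairs of the structured-data into a dictionary.
sd_params = dict(re.findall(r'(\w+)="([^"]*)"', fields["sd_params"]))

print(fields["hostname"], fields["app"], fields["msgid"])
print(sd_params)
print(fields["message"])
```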

### **Priority (PRI)**

The priority (PRI) field indicates the urgency of the logged event and is enclosed in angle brackets. In this example, the priority value is <var>&lt;236&gt;</var>. Generally, the lower the priority level, the more urgent the event is.
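This reading does not explain how the PRI number is built. In RFC 5424, it is calculated as the facility value multiplied by 8, plus the severity value, where severity runs from 0 (most urgent) to 7 (least urgent). The sketch below reverses that calculation for the example value. RFC 5424 defines facility values from 0 to 23, so the facility that results from <var>236</var> falls outside that range, which suggests the example value is illustrative only.

```python
# Severity names as listed in RFC 5424, indexed by severity value.
SEVERITY_NAMES = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
]

pri = 236

# PRI = facility * 8 + severity, so integer division and remainder recover both.
facility, severity = divmod(pri, 8)

print(facility, severity, SEVERITY_NAMES[severity])
```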

**Note**: Syslog headers can be combined with JSON and XML formats. Custom log formats also exist.

## XML (eXtensible Markup Language)

XML (eXtensible Markup Language) is a language and a format used for storing and transmitting data. XML is a native file format used in Windows systems. XML syntax uses the following:

- Tags
- Elements
- Attributes

### **Tags** 

XML uses tags to store and identify data. Tags come in pairs that must contain a start tag and an end tag. The start tag encloses the tag name in angle brackets, for example <var>&lt;tag&gt;</var>, whereas the end tag adds a forward slash after the opening angle bracket, like this: <var>&lt;/tag&gt;</var>.

### **Elements** 

XML elements include *both* the data contained inside of a tag and the tags themselves. All XML entries must contain at least one root element. Root elements contain other elements that sit underneath them, known as child elements.

Here is an example:

<var>&lt;Event&gt; &lt;EventID&gt;4688&lt;/EventID&gt; &lt;Version&gt;5&lt;/Version&gt; &lt;/Event&gt;</var>

In this example, <var>&lt;Event&gt;</var> is the root element and contains two child elements <var>&lt;EventID&gt;</var> and <var>&lt;Version&gt;</var>. There is data contained in each respective child element.
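As a small sketch, Python's standard `xml.etree.ElementTree` module can read this example and list the child elements under the root element.

```python
import xml.etree.ElementTree as ET

xml_entry = "<Event> <EventID>4688</EventID> <Version>5</Version> </Event>"

root = ET.fromstring(xml_entry)

# Iterating over an element yields its direct child elements.
children = {child.tag: child.text for child in root}

print(root.tag)
print(children)
```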

### **Attributes**

XML elements can also contain attributes. Attributes are used to provide additional information about elements. Attributes are included as the second part of the tag itself and must always be quoted using either single or double quotes.

For example:

<var>&lt;EventData&gt; </var>

<var>&lt;Data Name='SubjectUserSid'&gt;S-2-3-11-160321&lt;/Data&gt;</var>

<var> &lt;Data Name='SubjectUserName'&gt;JSMITH&lt;/Data&gt; </var>

<var>&lt;Data Name='SubjectDomainName'&gt;ADCOMP&lt;/Data&gt; </var>

<var>&lt;Data Name='SubjectLogonId'&gt;0x1cf1c12&lt;/Data&gt; </var>

<var>&lt;Data Name='NewProcessId'&gt;0x1404&lt;/Data&gt; </var>

<var>&lt;/EventData&gt;</var>

In the first <var>&lt;Data&gt;</var> line of this example, the tag is <var>&lt;Data&gt;</var> and it uses the attribute <var>Name='SubjectUserSid'</var> to describe the data enclosed in the tag, <var>S-2-3-11-160321</var>.
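Because every child element in this example shares the same tag, the <var>Name</var> attribute is what tells the values apart. A minimal sketch of reading attributes with Python's standard `xml.etree.ElementTree` module:

```python
import xml.etree.ElementTree as ET

xml_entry = """<EventData>
<Data Name='SubjectUserSid'>S-2-3-11-160321</Data>
<Data Name='SubjectUserName'>JSMITH</Data>
<Data Name='SubjectDomainName'>ADCOMP</Data>
<Data Name='SubjectLogonId'>0x1cf1c12</Data>
<Data Name='NewProcessId'>0x1404</Data>
</EventData>"""

root = ET.fromstring(xml_entry)

# Map each Name attribute to the data enclosed in its <Data> element.
event_data = {data.get("Name"): data.text for data in root.findall("Data")}

print(event_data["SubjectUserName"])
print(event_data["NewProcessId"])
```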

## CSV (Comma-Separated Values)

CSV (Comma-Separated Values) uses commas to separate data values. In CSV logs, the position of the data corresponds to its field name, but the field names themselves might not be included in the log. It’s critical to understand what fields the source device (such as an IPS, firewall, or scanner) is including in the log.

Here is an example:

<var>2009-11-24T21:27:09.534255,ALERT,192.168.2.7, 1041,x.x.250.50,80,TCP,ALLOWED,1:2001999:9,"ET MALWARE BTGrab.com Spyware Downloading Ads",1</var>
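The sketch below shows why knowing the field order matters. The field names are guesses made for illustration only, since the log itself does not include them; in practice they would come from the source device's documentation. It uses Python's standard `csv` module, which handles the quoted value correctly.

```python
import csv
import io

# Hypothetical field names: the log does not state them, so these are assumptions.
FIELD_NAMES = [
    "timestamp", "event_type", "src_ip", "src_port", "dst_ip", "dst_port",
    "protocol", "action", "signature_id", "signature", "priority",
]

raw_line = (
    '2009-11-24T21:27:09.534255,ALERT,192.168.2.7, 1041,x.x.250.50,80,TCP,'
    'ALLOWED,1:2001999:9,"ET MALWARE BTGrab.com Spyware Downloading Ads",1'
)

# skipinitialspace drops the space that follows a comma in the example.
reader = csv.reader(io.StringIO(raw_line), skipinitialspace=True)
values = next(reader)

# Pair each value with a name based on its position.
record = dict(zip(FIELD_NAMES, values))

print(record["src_ip"], record["src_port"], record["action"])
print(record["signature"])
```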

## CEF (Common Event Format)

**Common Event Format (CEF)** is a log format that uses key-value pairs to structure data and identify fields and their corresponding values. The CEF syntax is defined as containing the following fields:

<var>CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension</var>

Fields are all separated with a pipe character <var>|</var>. However, anything in the <var>Extension</var> part of the CEF log entry must be written in a key-value format. Syslog is a common method used to transport logs like CEF. When syslog is used, a timestamp and hostname are prepended to the CEF message. Here is an example of a CEF log entry that details malicious activity relating to a worm infection:

<var>Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.2 dst=2.1.2.2 spt=1232</var>

Here is a breakdown of the fields:

- **Syslog Timestamp**: <var>Sep 29 08:26:10</var>
- **Syslog Hostname**: <var>host</var>
- **Version**:  <var>CEF:1</var>
- **Device Vendor**: <var>Security</var>
- **Device Product**: <var>threatmanager</var>
- **Device Version**: <var>1.0</var>
- **Signature ID**: <var>100</var>
- **Name**: <var>worm successfully stopped</var>
- **Severity**: <var>10</var>
- **Extension**: This field contains data written as key-value pairs. There are two IP addresses, <var>src=10.0.0.2</var> and <var>dst=2.1.2.2</var>, and a source port number <var>spt=1232</var>. Extensions are optional.

This log entry contains details about a <var>Security</var> application called <var>threatmanager</var> that <var>successfully stopped a worm</var> from spreading from the internal network at <var>10.0.0.2</var> to the external network <var>2.1.2.2</var> through the port <var>1232</var>. A high severity level of <var>10</var> is reported.

**Note**: Both the extension and the syslog prefix are optional in a CEF log.
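The field breakdown for the example entry can be reproduced with a short Python sketch. It is a simplification under stated assumptions: the syslog prefix has the timestamp-then-hostname shape shown in the example, no field contains an escaped pipe character, and no extension value contains a space.

```python
entry = (
    "Sep 29 08:26:10 host CEF:1|Security|threatmanager|1.0|100|"
    "worm successfully stopped|10|src=10.0.0.2 dst=2.1.2.2 spt=1232"
)

HEADER_NAMES = [
    "Version", "Device Vendor", "Device Product", "Device Version",
    "Signature ID", "Name", "Severity",
]

# Everything before "CEF:" is the syslog prefix.
prefix, _, cef_body = entry.partition("CEF:")
*timestamp_parts, hostname = prefix.split()
timestamp = " ".join(timestamp_parts)

# Split on at most 7 pipes so the extension stays in one piece.
parts = cef_body.split("|", 7)
header = dict(zip(HEADER_NAMES, parts[:7]))

# The extension is a space-separated list of key=value pairs.
extension = dict(pair.split("=", 1) for pair in parts[7].split())

print(timestamp, hostname)
print(header)
print(extension)
```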

## Key takeaways

There is no single standard format used in logging, and many different log formats exist. As a security analyst, you will analyze logs that originate from different sources. Knowing how to interpret different log formats will help you determine key information that you can use to support your investigations.

## Resources for more information

- To learn more about the syslog protocol including priority levels, check out [The Syslog Protocol](https://www.rfc-editor.org/rfc/rfc5424).
- If you would like to explore generating log formats, check out this open-source [test data generator tool](https://generatedata.com/).
- To learn more about timestamp formats, check out [Date and Time on the Internet: Timestamps](https://www.rfc-editor.org/rfc/rfc3339).