Logging
In any application you need some kind of log where you write events, errors, debug messages, and other information. One of the biggest problems is how to analyze the information in these log files. If the logs have a predefined format, you can use log analysis tools and platforms such as Perl scripts, Hadoop MapReduce (if you have big log files), Graylog2 (open source), LogStash (open source), Loggly (SaaS), etc.
The problem in this case might be that you have limited query/analysis capabilities, because you need to code your own logic for filtering and aggregating log information.
If you format the information in your log files as JSON, you get a good trade-off between the flexibility of the information you can store and the ability to process that information with some kind of query language. A common choice is a NoSQL database such as MongoDB or Azure DocumentDB, where you can store JSON messages and search them using a platform-specific query language.
In this article we will see how you can analyze log files containing JSON messages.
Problem
We have log files containing log messages formatted as JSON. How can we analyze these log files?
Solution
Load the text files into SQL Server and analyze them with the OPENJSON function.
How to analyze JSON logs with SQL Server?
SQL Server enables you to load the content of a file formatted as JSON and transform it into a relational format that can be analyzed with standard SQL. We will start with an example of a JSON log file:
[
{"time":"2015-11-27T02:33:05.063","ip":"214.0.57.12","request":"/", "method":"GET", "status":"200"},
{"time":"2015-11-27T02:33:06.003","ip":"224.12.07.25","request":"/",method":"GET", "status":"200"},
{"time":"2015-11-27T02:33:06.032","ip":"192.10.81.115","request":"/contact", "method":"POST", "status":"500", "exception":"Object reference not set to an instance of object",”stackTrace”:”…” },
……..
{"time":"2015-11-27T02:37:06.203","ip":"204.12.27.21","request":"/login",method":"GET", "status":"200"},
{"time":"2015-11-27T02:37:12.016","ip":"214.0.57.12","request":"/login", "method":"POST","status":"404", "exception":"Potentially dangerous value was detected in request"}
]
Here we have standard information such as the time, IP address, requested URL, HTTP method, etc. If an error occurred, we may have additional data such as the exception message, stack trace, etc.
In SQL Server, we can easily read this log file and query the results:
SELECT log.*
FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( time datetime, status varchar(5), method varchar(10), exception nvarchar(200)) AS log
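If you are not sure which properties the log entries contain, you can also call OPENJSON without a WITH clause; it then returns one row per array element with generic key, value, and type columns. A minimal sketch over the same file:
-- Without a WITH clause, OPENJSON returns one row per array element:
-- [key] is the array index, [value] the JSON text of the entry, [type] its JSON type
SELECT entries.[key], entries.[value], entries.[type]
FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn) AS entries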
For better manageability and performance, it is a good idea to split your log files by date or size (using something like the rolling file appender in Log4j); a sketch that combines several daily files follows the report below. Now we can do any type of analysis on the returned data set. The following report returns the list of pages and the number of server errors detected on them for each HTTP method:
SELECT request, method, COUNT(*) AS error_count
FROM OPENROWSET (BULK N'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( time datetime, status int, method varchar(10), request varchar(20), exception nvarchar(200)) AS log
WHERE status >= 500
GROUP BY request, method
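Since the logs are split by date, you may also want a report that spans several days. A minimal sketch, assuming hypothetical daily files for November 26 and 27, that unions the file contents before parsing them:
SELECT request, method, COUNT(*) AS error_count
FROM (
    -- combine the raw content of the daily log files (file names are hypothetical)
    SELECT BulkColumn FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-26.txt', SINGLE_CLOB) AS f1
    UNION ALL
    SELECT BulkColumn FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) AS f2
) AS files
CROSS APPLY OPENJSON(BulkColumn)
WITH( status int, method varchar(10), request varchar(20)) AS log
WHERE status >= 500
GROUP BY request, method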
Notice that we have the full power of T-SQL over JSON log files. You can also load the JSON data into a standard table and create reports on that table, as shown in the sketch below.
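As an illustration of the latter, the following sketch loads the parsed entries into a regular table (the dbo.WebLog name and its columns are hypothetical) so that reports can run against relational data instead of re-reading the file:
-- hypothetical target table for the parsed log entries
CREATE TABLE dbo.WebLog (
    [time] datetime,
    ip varchar(20),
    request varchar(200),
    method varchar(10),
    status int,
    exception nvarchar(4000)
);

-- parse the JSON log file and persist the rows
INSERT INTO dbo.WebLog ([time], ip, request, method, status, exception)
SELECT log.*
FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) AS log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( [time] datetime, ip varchar(20), request varchar(200), method varchar(10), status int, exception nvarchar(4000)) AS log;
Once loaded, you can index the table and run ordinary reports against it.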
Analyzing LD-JSON logs
One of the problems with JSON is the fact that you cannot continuously append JSON messages. If you want a valid array of JSON objects, you need to surround them with brackets, and once you add the final bracket, you cannot append new data.
LD-JSON (line-delimited JSON) is an alternative JSON format that might be a good choice for logging. LD-JSON addresses one of the main issues of the standard JSON format – the inability to continuously append valid JSON objects. LD-JSON introduces a few changes to the standard JSON format:
- A new line is the separator between objects
- The stream of JSON objects is not surrounded with brackets, so you can continuously append JSON information to the file.
An example of LD-JSON content is shown below:
{"time":"2015-11-27T02:33:05.063","ip":"214.0.57.12","request":"/", "method":"GET", "status":"200"}
{"time":"2015-11-27T02:33:06.003","ip":"224.12.07.25","request":"/",method":"GET", "status":"200"}
{"time":"2015-11-27T02:33:06.032","ip":"192.10.81.115","request":"/contact", "method":"POST", "status":"500", "exception":"Object reference not set to an instance of object",”stackTrace”:”…” }
……..
{"time":"2015-11-27T02:37:06.203","ip":"204.12.27.21","request":"/login",method":"GET", "status":"200"}
{"time":"2015-11-27T02:37:12.016","ip":"214.0.57.12","request":"/login", "method":"POST","status":"404", "exception":"Potentially dangerous value was detected in request"}
Now we can read this file using the FORMATFILE option and run the same report:
SELECT request, method, COUNT(*)
FROM OPENROWSET(BULK 'C:\logs\log-ld-json-2015-11-27.txt',
FORMATFILE= 'c:\logs\csv.xml') AS log_file
CROSS APPLY OPENJSON(json)
WITH( time datetime, status int, method varchar(10), request nvarchar(200)) AS log
WHERE status >= 500
GROUP BY request, method
You would need a format file with the following content:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="510" COLLATION="Czech_100_CI_AS"/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="json" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>
Note that the json column used in the T-SQL query above is defined in the format file.
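To check that the format file splits the log correctly, you can select the raw rows before applying OPENJSON; each line of the LD-JSON file should come back as one row with a single json column:
SELECT json
FROM OPENROWSET(BULK 'C:\logs\log-ld-json-2015-11-27.txt',
FORMATFILE= 'c:\logs\csv.xml') AS log_file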
Analyzing files from Azure File Storage
You can use the same approach to read JSON files stored in Azure File Storage. Azure File Storage supports the SMB protocol, so you can map a local virtual drive to an Azure File Storage share using the following procedure:
- Create a file storage account (e.g. mystorage), a file share (e.g. sharejson), and a folder using the Azure portal or the Azure PowerShell SDK.
- Create an outbound rule in Windows Firewall on your computer that allows port 445. Note that this port might be blocked by your internet provider. If you get a DNS error (error 53) in the following step, then you have not opened that port or it is blocked by your ISP.
- Mount the Azure File Storage share as a local drive (e.g. t:) using the following command:
net use [drive letter] \\[storage name].file.core.windows.net\[share name] /u:[storage account name] [storage account access key]
The command that I used is:
net use t: \\mystorage.file.core.windows.net\sharejson /u:mystorage hb5qy6eXLqIdBj0LvGMHdrTiygkjhHDvWjUZg3Gu7bubKLg==
The storage account name and the primary or secondary storage account access key can be found in the Keys section under Settings in the Azure portal.
Now, if you set up your application to write log data to a file on the Azure File Storage share (e.g. log-file.json), you can use the queries above to analyze data loaded from the path t:\log-file.json, which maps to \\mystorage.file.core.windows.net\sharejson\log-file.json.
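For example, assuming the share is mounted as the t: drive and the application writes log-file.json in the LD-JSON format, the earlier report becomes the following (note that the path must be visible to the SQL Server service account; if the mapped drive is not, use the full UNC path in the BULK option instead):
SELECT request, method, COUNT(*) AS error_count
FROM OPENROWSET(BULK 't:\log-file.json',
FORMATFILE= 'c:\logs\csv.xml') AS log_file
CROSS APPLY OPENJSON(json)
WITH( status int, method varchar(10), request nvarchar(200)) AS log
WHERE status >= 500
GROUP BY request, method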
Conclusion
Logging information to traditional text files is the fastest way to log. Formatting the log messages as JSON or LD-JSON gives you a simple, human-readable log format together with the ability to query and analyze the log data.
The new JSON support in SQL Server enables you to easily load these log files and analyze them with standard T-SQL.