Logswan

Fast Web log analyzer using probabilistic data structures.


Logswan is a fast Web log analyzer using probabilistic data structures. It is targeted at very large log files, typically API logs. It has constant memory usage regardless of the log file size, and takes approximately 4 MB of RAM.

Unique visitors are counted using two HyperLogLog counters (one for IPv4 and one for IPv6), providing a relative accuracy of 0.10%. String representations of IP addresses are used, as they offer better precision.

Project design goals include: speed, memory-usage efficiency, and keeping the code as simple as possible.

Logswan is opinionated software:

Logswan is written with security in mind and runs sandboxed on OpenBSD (using pledge) and on Linux (using seccomp). It has also been extensively fuzzed using AFL and Honggfuzz.

Features

Currently implemented features:

Dependencies

Logswan uses the CMake build system and requires the Jansson and libmaxminddb libraries and header files.

Installing dependencies

Building

mkdir build
cd build
cmake ..
make

Logswan has been successfully built and tested on OpenBSD, NetBSD, FreeBSD, Mac OS X, and Linux with both Clang and GCC.

Packages

Packages are available for the following operating systems:

GeoIP2 databases

Logswan looks for GeoIP2 databases in ${CMAKE_INSTALL_PREFIX}/share/GeoIP2 by default, which resolves to /usr/local/share/GeoIP2 with CMake's default install prefix.

A custom directory can be set using the GEOIP2DIR variable when invoking CMake:

cmake -DGEOIP2DIR=/var/db/GeoIP .

The free GeoLite2 databases from MaxMind can be downloaded here:

https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country.tar.gz

Usage

logswan [-ghv] [-d db] file

If file is a single dash (`-'), logswan reads from the standard input.
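The single-dash convention described above is commonly implemented along the following lines. This is only a sketch of the general idiom, not code taken from Logswan; the open_input() helper is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return stdin when the argument is "-",
 * otherwise open the named file for reading (NULL on failure).
 */
FILE *
open_input(const char *path)
{
	assert(path != NULL);

	if (strcmp(path, "-") == 0)
		return stdin;
	return fopen(path, "r");
}
```

With this approach the rest of the program reads from the returned stream the same way in both cases, which is what allows piping a log into the tool.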

The options are as follows:

-d db	Specify path to a GeoIP database.
-g	Enable GeoIP lookups.
-h	Display usage.
-v	Display version.

Logswan outputs JSON data to stdout.

Measuring Logswan memory usage

Heap profiling can be done using valgrind, as follows:

valgrind --tool=massif logswan access.log
ms_print massif.out.*

License

Logswan is released under the BSD 2-Clause license.

Copyright (c) 2015-2019, Frederic Cambus
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

  * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

  * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.