apalogretrieve (binary name: apalog)
retrieves data from an Apache logfile using a syntax
that is derived from (and is a subset of) the SQL language.
SQL is normally used for relational databases,
but here a SQL-like language (a SQL subset) is used to query logfiles.
There are many ways to retrieve data from
an Apache logfile.
To get a quick overview of the requests (with graphics support as well),
you can use webalizer. It's a nice tool, but like every tool it has
disadvantages as well as advantages.
Webalizer does averaging. If you need detailed information for certain queries, this is a bad choice.
You could fall back to grep, awk or Perl, or
use specialized tools, implemented in a common programming language,
that do some specialized analyses.
But grep has some limitations here, because it matches strings line by line,
not on certain fields. awk has no domain-specific named fields.
Even though you can use associative arrays in awk, which makes things easier to handle,
there are no predefined names for the log entry fields. You may also run into problems with
parsing the data, because you have to find out the appropriate way to select the fields.
apalogretrieve gives you specific names for each field.
And if you want to use specialized tools,
they might be too narrow in focus,
even if they are good at what they are intended to do.
If you want to make concise but flexible queries,
neither webalizer nor grep nor awk nor specialized tools might be
the best choice, IMHO.
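To illustrate the difference between linewise matching and matching on named fields, here is a minimal sketch in Python. It is not apalog's own code (apalog is written in OCaml); the regular expression is a simplified reading of the combined logfile format, the field name "status" is an assumption, and the sample log line is invented.

```python
import re

# Simplified pattern for one line in combined logfile format.
# Field names follow the ones used in this document; "status" is an assumed name.
# Requests containing escaped double quotes are not handled.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<userid>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<client>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

# Invented sample line: the host is NOT an .it host, but the referrer is.
line = ('client.example.com - - [31/Jan/2008:19:30:22 +0100] '
        '"GET /index.html HTTP/1.0" 200 2326 '
        '"http://www.example.it/" "Mozilla/5.0"')

entry = parse_line(line)

# A linewise search (what grep does) finds ".it" somewhere in the line ...
print(".it" in line)                    # True
# ... while a test on the named field "host" does not match.
print(entry["host"].endswith(".it"))    # False
```

With named fields, a condition can be attached to exactly the part of the log entry it is meant for, which is what the field names in apalogretrieve are for.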
What apalogretrieve makes possible here is to retrieve the
fields by their name and to use filters (WHERE clause)
as well as boolean operators (AND, OR, NOT).
You also have a simple regular-expression-like mechanism,
similar to the LIKE operator from SQL.
IMHO this makes looking up some special
entries very convenient.
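The combination of a LIKE-style pattern with boolean operators can be sketched in Python as follows. This is only an illustration of the idea, not apalog's implementation: the helper name like() is hypothetical, only the "%" wildcard is handled, matching is case-sensitive (whether apalog behaves the same way is not stated here), and the entries are invented.

```python
import re

def like(value, pattern):
    """SQL-style LIKE in which '%' matches any run of characters.

    Only '%' is treated as a wildcard in this sketch.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, re.DOTALL) is not None

# Invented entries, already split into named fields.
entries = [
    {"host": "a.example.it", "request": "GET /index.html HTTP/1.0"},
    {"host": "b.example.it", "request": "GET /icons/back.gif HTTP/1.0"},
    {"host": "c.example.de", "request": "GET /index.html HTTP/1.0"},
]

# Corresponds to: where host like "%.it" AND (NOT request like "%icon%")
selected = [e for e in entries
            if like(e["host"], "%.it") and not like(e["request"], "%icon%")]

print([e["host"] for e in selected])   # prints ['a.example.it']
```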
During development of apastat (a logfile analyser for user statistics) I needed
to select certain fields when looking up data in logfiles, so that I could
check the functionality of the analyser.
For that, apalogretrieve is an invaluable tool!
News from apalog (31st January 2008):
- 31/01/2008: New Release: version 0-9-6_4 is available.
Changed Lexer and Parser for the logfile as well as for the REPL-loop.
- Logfile parser corrected to read even strange files correctly.
- Command parser: now uses "ORDER BY" instead of "GROUP BY" (the latter was
a misnomer when following SQL rather than common sense / natural language ;-)).
- 09/01/2008: New Release: version 0-9-6_2 is available.
LaTeX-Converter completed, and unnecessary prints in align-output removed.
New Release: version 0-9-6 is now available!
Apalog has always worked on the COMBINED logfile format,
not on the COMMON logfile format.
Sorry for the misnomer in the documentation!
But I extracted only the parts of the common logfile format and ignored
the two (very seldom used) fields logname and userid.
Now I extract all parts of the logfile, without ignoring
logname and userid.
And I continue to work on the combined logfile format as before.
The combined logfile format is a very common format (hence
Apalog (non-released version) can now read gzip-files transparently.
Aligned ASCII output is also ready now.
Apalog (non-released version) has now the possibility to redirect the output to
It can now also output the results as an
HTML table. The release will follow later; I first want to implement
LaTeX output as well as aligned ASCII output.
Release 0.9.4: now GROUP BY is implemented! :)
Some other changes also.
New release: 0.9.2 now applies the WHERE-clause filter condition inside the
logfile lexer, which means less memory usage when you read large logfiles.
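Two of the news items above (filtering while the logfile is being read, and reading gzip files transparently) can be illustrated with a small Python sketch. It is not how apalog does it internally (apalog does this in its OCaml lexer); the function names open_log() and matching_lines(), the gzip detection by magic bytes, and the sample data are all assumptions made for this example.

```python
import gzip
import os
import tempfile

def open_log(path):
    """Open a logfile as text, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")

def matching_lines(path, predicate):
    """Yield matching lines one at a time; rejected lines are never stored."""
    with open_log(path) as f:
        for line in f:
            if predicate(line):
                yield line.rstrip("\n")

# Invented sample data, written once as plain text and once gzip-compressed.
lines = ["a.example.it GET /index.html\n", "b.example.de GET /index.html\n"]
tmpdir = tempfile.mkdtemp()
plain = os.path.join(tmpdir, "access.log")
packed = os.path.join(tmpdir, "access.log.gz")
with open(plain, "w", encoding="utf-8") as f:
    f.writelines(lines)
with gzip.open(packed, "wt", encoding="utf-8") as f:
    f.writelines(lines)

def wanted(line):
    """Stand-in for a WHERE condition."""
    return ".it " in line

# Both files give the same result, and only matching lines are kept in memory.
print(list(matching_lines(plain, wanted)) == list(matching_lines(packed, wanted)))
```

Because the condition is checked while reading, memory use depends on the number of matching entries rather than on the size of the logfile.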
How to Use apalog
Implemented SQL-statements (subset of SQL)
apalog has no line-editing functionality implemented.
If you want to have this feature, please use ledit.
(It's also written in OCaml :))
Debian-package for ledit
Example 1:
SELECT host,date FROM "apache-combined.log" where size > 2000;
Example 2:
SELECT host,date,client,referrer FROM "apache-combined.log" where host = "foobar.host.net";
Example 3:
The task: Look for all entries with domain names ending in ".it", ignoring the entries for icons ("/icons/back.gif", "/icons/folder.gif", "/icons/blank.gif", "/favicon.ico" and so on)
# select host,date,request,referrer from "access.log" where host like "%.it" AND (NOT request like "%icon%");
Example 4: There is no "DISTINCT" clause. How can you get each result only once?
Invoke apalog like this:
$ cat | apalog | sort -u
and then, for example, type this command:
select referrer from "access.log"; quit;
Then each referrer entry is reported only once.
Necessary to mention ("disclaimer")
- I should also mention: the date is compared as a string; it is NOT
a true comparison of dates (at least not in version 0.9.4).
I may change this in later releases (as I didn't use this feature
often, it was not really necessary for me;
implementing "like" was more interesting for me, so this was my priority.)
- Log entries with size == "-" are handled as
size = -1. This means that if you want to look for entries
that have "-" as their size, look for size < 0.
I may later change this to 0.
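What the two points above mean in practice can be shown with a short Python sketch. The helper names parse_date() and parse_size() are hypothetical and only mirror the behaviour described here; the timestamps are invented, and parsing the month name with %b assumes an English (or C) locale.

```python
from datetime import datetime

def parse_date(text):
    """Parse an Apache timestamp such as '31/Jan/2008:19:30:22 +0100'."""
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")

def parse_size(text):
    """Map the '-' placeholder to -1, as described above; otherwise convert to int."""
    return -1 if text == "-" else int(text)

earlier = "31/Jan/2008:19:30:22 +0100"
later = "01/Feb/2008:08:00:00 +0100"

# Compared as strings, "31/Jan/..." sorts after "01/Feb/...", although it is the earlier date.
print(earlier < later)                           # False
# A true date comparison gives the expected order.
print(parse_date(earlier) < parse_date(later))   # True
# Entries with "-" as size can be found with a condition like size < 0.
print(parse_size("-") < 0)                       # True
```

So a WHERE condition on the date is only reliable where string order and date order happen to agree, for example within the same day.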
The language of choice is OCaml,
the ultimate language for high-level programming.
If you want to give feedback (feature wishes, bug reports, or whether you like the tool
and where you use it), do not hesitate to contact me.
Mail: oliver _at_ first.in-berlin.de
$Date: 2008-01-31 19:30:22 +0100 (Do, 31 Jan 2008) $