Concepts in brief

This chapter presents, in a rather concise way, all the concepts at the core of Squey. Understanding every one of them is not mandatory to start using the software, but readers of this manual are strongly encouraged to read this chapter before rushing into the application.

Should the reader be in a real hurry, each section of this chapter starts with a very short presentation of the corresponding concept. We advise all readers to read at least these short definitions.

Moreover, the organisation of this chapter is logical rather than alphabetical: concepts are presented so that each newly defined concept relies only on concepts already defined. Where a term does depend on one defined later in this chapter, a link to that term is given.

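To make this chapter easier to use as a reference, here are the concepts in alphabetical order: Axes combination, Axis, Column, Data collection, Data tree, Event, Field, Filter, Format, Input, Invalid event, Investigation, Layer, Line Property, Mapping, Plotting, Selectable event, Selection, Source, View, Workspace, Zombie event, Zone.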

Event

An Event is a well-defined block of information, commonly represented by a line in a file.

An Event is the basic unit of data that makes up a dataset. For example, in a web server log file whose lines all share the same structure, each individual line can be considered an Event.

When data comes from an industrial process, an Event can consist of the measurements taken (say, every minute) by a set of physical sensors that monitor the state of an industrial plant.

Events are commonly stored as fixed-size tuples (such as in a CSV file); however, some appliances or applications use a tree-based block structure (such as XML) to store the information.
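
The two storage styles can be sketched in a few lines of Python. This is only an illustration with made-up data, not a description of how Squey reads its Inputs; the sample contents and the `entry` tag name are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
csv_text = "10:00:01,alice,200\n10:00:02,bob,404\n"
xml_text = """<log>
  <entry><time>10:00:01</time><login>alice</login></entry>
  <entry><time>10:00:02</time><login>bob</login></entry>
</log>"""

# CSV: each line is one Event, stored as a fixed-size tuple.
csv_events = [tuple(row) for row in csv.reader(io.StringIO(csv_text))]

# XML: each <entry> block is one Event, stored as a small tree.
xml_events = [
    {child.tag: child.text for child in entry}
    for entry in ET.fromstring(xml_text).findall("entry")
]
```

In both cases the result is a list with one element per Event; only the shape of each element differs.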

Input

An Input is a set of Events: it can be a local file, a remote file, or the result of a database query.

The term Input is a generic term used to name a dataset that can be processed by Squey. Most of the time, it is a plain-text file sitting on a local filesystem. But it might also be such a file accessed on a remote computer or the result of a query sent to a database.

To be able to process an Input, Squey needs a descriptive file called a Format (explained later in the chapter).

Field

A Field is a specific part of an Event that can be found in all the Events. Usually, a Field is located at the same position in every Event.

For example, each line of a log file often starts with a timestamp. In this case, one can consider that the first Field of all these Events is a timestamp.

A login name, a URL or a port number are other examples of typical Fields that are commonly found in Log files.

Sometimes, depending on the needs of the analysis, some Fields found in the Events need to be further processed and split into simpler Fields. For example:

  • a URI can be further split into Scheme, Hostname, Port and Resources;

  • a Timestamp can be split into year, month, day, hours, minutes and seconds.

The complete splitting process of an Event into smaller Fields is described in the Format associated with an Input.
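
As a rough illustration of this kind of splitting (it is not Squey's own implementation), the Python sketch below breaks a hypothetical log line into Fields, then splits its timestamp and URI further. The line layout, the helper name `split_event` and the resulting key names are assumptions made for the example.

```python
from datetime import datetime
from urllib.parse import urlsplit


def split_event(line: str) -> dict:
    """Split one hypothetical log line into Fields, then into sub-Fields."""
    # Assumed layout: "<ISO timestamp> <login> <URI>", separated by spaces.
    timestamp, login, uri = line.split(" ", 2)

    ts = datetime.fromisoformat(timestamp)
    parts = urlsplit(uri)

    return {
        "year": ts.year, "month": ts.month, "day": ts.day,
        "hours": ts.hour, "minutes": ts.minute, "seconds": ts.second,
        "login": login,
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port,
        "resource": parts.path,
    }


fields = split_event("2023-04-05T10:20:30 alice https://example.org:8443/index.html")
```

Three original Fields (timestamp, login, URI) become eleven simpler ones, each of which could then be analysed on its own.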

Column

A Column is the natural representation of all the values taken by a given Field in a given Input, starting with the first Event in the first row, and ending with the last Event in the last row of the Column.

This terminology is obvious to anyone who keeps in mind the tabular representation of the Input: the Events correspond to rows, and the Fields correspond to columns.

In fact, an Input acquires its tabular representation (as a dataset) once an associated Format has been applied to it.
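
The row/column duality can be shown with a short Python sketch. The Events and Field names below are invented for the example.

```python
# Each Event is a row of Field values (hypothetical data).
events = [
    ("10:00:01", "alice", "200"),
    ("10:00:02", "bob", "404"),
    ("10:00:05", "alice", "200"),
]
field_names = ["time", "login", "status"]

# zip(*events) transposes rows into columns: one tuple per Field,
# ordered from the first Event to the last.
columns = dict(zip(field_names, zip(*events)))
```

Here `columns["login"]` holds every value of the login Field, in Event order.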

Axis

In Squey, an Axis is the most common mathematical representation of a Column found in a given Input. An Axis is therefore completely tied to the Column it represents.

The term Axis will often come up when dealing with graphical representations (called Views in Squey).

Mapping

A Mapping is the first function (out of two) that is applied to all the values of a Column so that they can be mapped (i.e. positioned) on the associated mathematical Axis.

Squey offers different Mapping functions. Some are generic and can be applied to any type of Field value. Some are only meaningful for specific types of Fields.

In fact, a Mapping holds some more parameters and settings that are useful for the graphical representations (colors, etc.). But the main purpose of a Mapping is to map (hopefully!).

Plotting

A Plotting is the second function that determines how values contained in Fields are plotted to a finite portion of a mathematical axis. The main purpose of a Plotting is to set the scale and range applied to the mathematical values computed by the previous Mapping operation.
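
To make the two-step idea concrete, here is a minimal Python sketch of a Mapping followed by a Plotting. The function names `map_integer` and `plot_min_max` are hypothetical and do not correspond to Squey's actual functions; the linear rescaling onto [0, 1] is just one assumed example of setting a scale and range.

```python
def map_integer(value: str) -> float:
    """Hypothetical Mapping: turn a textual Field value into a number."""
    return float(int(value))


def plot_min_max(mapped: list) -> list:
    """Hypothetical Plotting: rescale mapped values linearly onto [0, 1]."""
    lo, hi = min(mapped), max(mapped)
    if hi == lo:
        # All values are identical: place them in the middle of the axis.
        return [0.5 for _ in mapped]
    return [(v - lo) / (hi - lo) for v in mapped]


column = ["80", "443", "8080", "22"]          # values of a 'port' Field
mapped = [map_integer(v) for v in column]     # step 1: Mapping
plotted = plot_min_max(mapped)                # step 2: Plotting
```

The Mapping decides where each value sits on an unbounded mathematical axis; the Plotting then squeezes those positions into the finite portion that is actually drawn.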

Axes combination

An Axes combination is a very natural concept in Squey. After a given Input has been processed by Squey, a certain number of Fields are available and form a set of Columns/Axes. Out of this set, the user can select the Fields actually needed for the investigation and determine the order in which these Fields should be organized.

Of course, through multiple selection, it is possible to repeat the same Field/Column/Axis in the Axes combination.
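
One simple way to picture an Axes combination is as an ordered list of Field names in which repeats are allowed. The Python sketch below uses that model; the data and names are made up, and this is not how Squey stores the combination internally.

```python
# Columns available after processing (hypothetical data).
columns = {
    "time": ["10:00:01", "10:00:02"],
    "login": ["alice", "bob"],
    "status": ["200", "404"],
}

# An Axes combination modelled as an ordered list of Field names;
# the same name may appear more than once.
axes_combination = ["login", "status", "time", "login"]

# Resolve the combination into the (name, values) pairs to display.
axes = [(name, columns[name]) for name in axes_combination]
```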

Format

A Format describes how an Input will be processed and represented in Squey. The Format defines how the Events are split into Fields; it also provides metadata about the Fields:

  • their type: ‘IP’, ‘integer’, ‘string’, and so on;

  • their displayed name: ‘request’, ‘response’, ‘error code’, and so on;

  • their associated Mapping;

  • their associated Plotting;

  • and a few more parameters.

Moreover, a Format provides an Axes combination.

The association of a specific type of Input and a Format is fundamental in Squey.
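
The kind of information a Format carries can be sketched as a small data structure. The Python classes below are purely illustrative: the class names, attribute names and the default Axes combination are assumptions, and they say nothing about how Squey actually stores a Format.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """Metadata a Format might hold for one Field (names are hypothetical)."""
    name: str                  # displayed name, e.g. 'request'
    type: str                  # e.g. 'IP', 'integer', 'string'
    mapping: str = "default"   # associated Mapping
    plotting: str = "default"  # associated Plotting


@dataclass
class Format:
    """Toy stand-in for a Format: a separator, Field specs, an Axes combination."""
    separator: str
    fields: list
    axes_combination: list = field(default_factory=list)

    def __post_init__(self):
        # Assumed default: show every Field once, in declaration order.
        if not self.axes_combination:
            self.axes_combination = [f.name for f in self.fields]


fmt = Format(
    separator=",",
    fields=[
        FieldSpec("source", "IP"),
        FieldSpec("error code", "integer"),
        FieldSpec("request", "string"),
    ],
)
```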

Invalid event

As said earlier, a Format describes how Events are extracted from an Input and split into Fields. Sometimes, the extraction and splitting process described by a given Format fails for some Events.

This can happen for different reasons:

  • malformed Event in the original Input

  • mismatch between the splitting process coded in the Format and some exotic values contained in the Event

  • etc.

In Squey, these unrecognized Events are named Invalid events. Of course, this is relative to the given Format: the same Events are likely to be handled correctly by a more suitable Format.
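
The idea can be sketched in Python with a regular expression standing in for the splitting logic of a Format. The pattern, the function name `apply_format` and the sample lines are assumptions made for the illustration.

```python
import re

# Hypothetical Format logic: "<3-digit status> <path>" on each line.
PATTERN = re.compile(r"^(\d{3}) (/\S*)$")


def apply_format(lines):
    """Split lines into Fields; keep lines that do not match as invalid."""
    valid, invalid = [], []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            valid.append(match.groups())
        else:
            invalid.append(line)
    return valid, invalid


valid, invalid = apply_format(
    ["200 /index.html", "garbage", "404 /missing", "20x /odd"]
)
```

A different pattern (that is, a more suitable Format) could accept the lines this one rejects, which is why invalidity is always relative to the Format in use.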

Source

A Source is another central concept in Squey. It corresponds to the association of one or more Inputs and a fixed Format that is adapted to these Inputs.

To give a simple example, a set of Proxy files, all of the same type, and a dedicated Format (that is adapted to this type of Log file) form a Source.

Data collection

A Data collection is a convenient way to gather different Sources according to a freely chosen criterion. This criterion can be the duration covered by the Sources, the Source types, etc.

Investigation

An Investigation is what Squey users use to save their work and resume their analysis later. It allows them to save:

  • a connection to the original data that was used for a particular investigation

  • all Graphical Views that were in use at the time of saving

  • all the Layers and their states

Layer

A Layer is a subset of a Source’s Event set.

Zombie event

A Zombie event is an Event which does not belong to the current Layer.

Selectable event

An Event is ‘selectable’ if it belongs to the current Layer’s Event set.

Selection

A Selection is a subset of the Selectable event set of the current Layer.
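
The relationships between Layer, Zombie events, Selectable events and Selection are plain set relationships, which the following Python sketch spells out. Events are identified by a row index here, which is an assumption made for the example.

```python
# Events of a Source identified by their row index (hypothetical data).
source_events = set(range(10))

# A Layer is a subset of the Source's Events.
current_layer = {0, 2, 4, 6, 8}

selectable = source_events & current_layer   # Events in the current Layer
zombies = source_events - current_layer      # Events outside the current Layer

# A Selection must stay within the selectable Events.
selection = {2, 4}
assert selection <= selectable
```

Every Event of the Source is therefore either selectable or a zombie with respect to the current Layer, never both.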

Zone

A zone is a pair of Axes or of Columns.

View

A view is the result of the application of the different processes provided by a Format on a Source: extraction, Mappings, Plottings and Axes combination.

Line Property

A Line Property is a set of properties attached to the Events of a Layer, such as their colors or their selectability.

Filter

A Filter is an algorithm used to alter the Line Property of the currently active Layer of the Layer Stack View.
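
As a hedged illustration of what altering a Line Property can mean, the Python sketch below recolors the Events that match a condition. The `LineProperty` class, the `color_filter` function and the sample Events are hypothetical and are not part of Squey.

```python
from dataclasses import dataclass


@dataclass
class LineProperty:
    """Hypothetical per-Event properties: a color and a selectability flag."""
    color: str = "white"
    selectable: bool = True


def color_filter(events, properties, predicate, color):
    """Toy Filter: recolor the Line Property of every Event matching predicate."""
    for index, event in enumerate(events):
        if predicate(event):
            properties[index].color = color


events = [("alice", 200), ("bob", 404), ("carol", 500)]
properties = [LineProperty() for _ in events]

# Color every Event whose status code is 400 or above in red.
color_filter(events, properties, lambda e: e[1] >= 400, "red")
```

The Events themselves are untouched; only the properties attached to them change.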

Workspace

A Workspace is a dockable area containing all the displayed widgets of a Source.

Data tree

A Data tree is a tree-based hierarchical representation of all existing Views in an Investigation. It also provides the list of Data collections, Sources, Mappings, and Plottings of the current Investigation.