How to Create Easier Security Detections with Elasticsearch Machine Learning

In this blog, we show how the Elastic Stack can help you quickly create detections that track the behaviour of users visiting a website.


As organizations depend more heavily on technology, they are exposed to more complex security threats than ever. Cybersecurity defenses are therefore undergoing major changes in methods and solutions: more data is collected, deeper knowledge is required, and greater automation is preferred. Large volumes of data are processed, and analysts are trained to look for patterns and indicators. The missing part, however, is how to extract security patterns and insights from the data in a form that domain experts can understand and act on. That’s where Machine Learning (ML) plays a crucial role.

For any cybersecurity enthusiast, the question is how ML can help businesses protect against attacks and, more importantly, how technology can bridge the gap between non-technical staff and advanced technical solutions.

An essential concept in machine learning is unsupervised time-series analytics, which plays an important role in cybersecurity. Historical data needs to be collected over time, queried, analysed, and compared with newly arrived data to identify deviations from past normal behaviour as potential indicators of malicious activity. While advanced users can implement these solutions with a variety of statistics and learning packages, there are also open-source packages such as Elasticsearch that offer ready-to-apply search and analytics solutions for non-technical users as well as businesses and large organizations.

This blog covers how time-series analytics can help in cybersecurity and how we can use Elastic resources to implement security analytics solutions.

What Is Time-Series Analytics?

A time series is a series of measurements ordered in time. The data points can be monitoring results of processes, business metrics, application performance, network indicators, etc. Time is the independent variable, and the indicator measurements are the dependent variables whose internal structure and patterns we try to find.

The purpose of time-series analytics is to find internal patterns, such as autocorrelations and seasonal variations, in the sequential measurements of indicators in order to investigate abnormal behaviour or predict future events. An example of time-series data in a cybersecurity environment is a set of user behaviour measurements over time, such as the amount of uploaded content or the number of connections to specific URLs. These measurements may help to identify compromised accounts that are controlled by attackers and show abnormal activity compared to the users’ previous history.
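As a rough illustration of comparing a new measurement with a user’s own history, the following Python sketch flags a value that lies several standard deviations above the mean of earlier observations. The function name, the threshold of 3 and the sample upload volumes are hypothetical and chosen only for illustration; this is a simple z-score check, not the model that Elastic uses.

```python
from statistics import mean, stdev


def is_unusually_high(history, new_value, threshold=3.0):
    """Return True if new_value is more than `threshold` sample standard
    deviations above the mean of the historical values."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any increase counts as unusual.
        return new_value > baseline
    return (new_value - baseline) / spread > threshold


# Hypothetical daily upload volumes (MB) for one user.
uploads_mb = [12, 15, 11, 14, 13, 12, 16, 14]
print(is_unusually_high(uploads_mb, 15))   # a day in the usual range
print(is_unusually_high(uploads_mb, 250))  # a sudden large upload
```

A fixed threshold like this ignores trend and seasonality, which is one reason a learned model of normal behaviour is usually preferable for real data.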

How we can implement Time-Series Analytics with Elastic analytics services

There are a variety of techniques and algorithms for time-series analytics, including statistical analysis and time-series decomposition. The core function of these methods is to model normal behaviour so that we can flag any new measurement that deviates significantly from the normal pattern. This part explains how we can use different Elastic services to perform simple behaviour analytics on wiki page visits. Once you understand the process and the available options, you can apply the same approach to more complex datasets. The dataset we are using is extracted from a Kaggle competition and includes daily views of different wiki pages. We combined the results based on the language of those pages and created a dataset containing the total number of daily visits for some of the most-visited languages. The following table shows a sample record of the dataset.

As you can see, each record includes a category column (lang) indicating the language of the aggregated articles and a value column (visits) for the number of daily visits. The CSV file is imported and indexed in Elastic with the File Data Visualizer feature. We upload the data for each category (language) to a separate Elastic index, and the results are then combined using a general index pattern that shows the data for all languages in one place.
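The roll-up from page-level views to per-language daily totals could look something like the sketch below. The input rows, and the name `date` for the timestamp column, are assumptions made for illustration; only the `lang` and `visits` columns come from the dataset described above.

```python
from collections import defaultdict

# Hypothetical page-level rows: (date, language of the page, views that day).
page_rows = [
    ("2016-01-01", "es", 120),
    ("2016-01-01", "es", 80),
    ("2016-01-01", "media", 40),
    ("2016-01-02", "es", 150),
]


def daily_visits_by_language(rows):
    """Sum page views per (date, lang) and return one record per pair."""
    totals = defaultdict(int)
    for date, lang, views in rows:
        totals[(date, lang)] += views
    return [
        {"date": date, "lang": lang, "visits": visits}
        for (date, lang), visits in sorted(totals.items())
    ]


records = daily_visits_by_language(page_rows)
for record in records:
    print(record)
```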

The value column is a single numerical feature that tracks the number of visiting users over time. Therefore, it can easily be converted to and visualized as a single time series. For example, the time series for the “media” and “es” categories look like this.

The graph shows how daily visits change between 2015 and 2017. Looking at the graph, you can see some unexpected spikes in the number of visits to media pages in the right part of the graph. Moreover, there is a clear periodicity in the data for the “es” pages.
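Periodicity of this kind can also be checked numerically. The sketch below computes the sample autocorrelation of a series at a given lag, using synthetic data with a weekly cycle; the numbers are invented for illustration and are not taken from the wiki dataset. A value close to 1 at lag 7 would suggest a weekly pattern.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    if not 0 < lag < len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series has no variation to correlate
    numer = sum(
        (series[i] - mu) * (series[i + lag] - mu)
        for i in range(len(series) - lag)
    )
    return numer / denom


# Hypothetical daily visits with a weekly cycle (lower at weekends).
week = [100, 105, 110, 108, 102, 60, 55]
visits = week * 8

weekly = autocorrelation(visits, 7)
midweek = autocorrelation(visits, 3)
print(f"lag 7: {weekly:.2f}, lag 3: {midweek:.2f}")
```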

Now that we have imported and explored our data, we skip the cleaning and feature engineering steps (they were already done when creating the dataset from the source; refer to our previous blog here for some examples of feature engineering) and start unsupervised ML analytics to find unexpected patterns in users’ behaviour.

Depending on the purpose of the detection and complexity of data, we have a few options for the selected analytic approach. In the following, we will discuss two of the major categories of unsupervised time-series analytics provided in Elastic.

Single Metric jobs

A single metric job tracks the variations in one indicator over time. Let’s create a job to analyse the changes in total visits to wiki pages for the “media” category. Select the “Single metric” job from the Machine Learning section and add the index containing the “media” data. Once the data is loaded, you can choose from a variety of metrics and analyses for your dataset. We select the “High sum (visits)” metric to detect high spikes in the total number of page visits.


Another configuration option on this page is the bucket span, which defines the interval over which data is aggregated. To make it even simpler, an estimate button lets Elastic select the best value based on the time characteristics of your data. For this example, we select a daily bucket to create the time series. There are also some advanced settings, such as limiting the maximum memory allocated and adding influencers, which we skip for this part and leave at their defaults. We can now save the job and start it.
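The choices made in the wizard correspond roughly to a job definition like the one sketched below, written as a Python dictionary and printed as JSON. The field names (`analysis_config`, `bucket_span`, `detectors`, `function`, `field_name`, `data_description`, `time_field`) follow the Elasticsearch anomaly detection job API as we understand it; the description text and the timestamp field name `date` are assumptions. The wizard also creates a datafeed, which is omitted here.

```python
import json

# Sketch of a single metric job body. Field names follow the anomaly
# detection job API; values marked as assumed are illustrative only.
single_metric_job = {
    "description": "High total daily visits for media pages",
    "analysis_config": {
        "bucket_span": "1d",  # daily buckets, as chosen in the wizard
        "detectors": [
            # "High sum (visits)" in the wizard maps to the high_sum function.
            {"function": "high_sum", "field_name": "visits"}
        ],
    },
    "data_description": {
        "time_field": "date"  # assumed name of the timestamp field
    },
}

print(json.dumps(single_metric_job, indent=2))
```

Writing the job out this way makes it easier to keep detection settings under version control, although the wizard remains the simpler route for a first job.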

View results: Once the job finishes processing, you can select the results option, or go back to the main dashboard and open the “Single Metric Viewer” or “Anomaly Explorer” window, to view dashboards and graphs of the results and the calculated anomaly scores.

The following graph shows the “Single Metric Viewer” for the media dataset. As you can see, anomalies of increasing severity are raised for the unexpected spikes in the data for the selected period. Depending on how unexpected the measurement is, each point is assigned a score between 0 and 100. Higher scores are shown in orange and red. This scoring helps security operations analysts prioritise incidents and start their investigation and response with the highest-severity results. You can also zoom in to see more detail on the raised anomalies by sliding the time selector to select shorter blocks of time.
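To show how such scores might drive triage, the sketch below maps a score to a severity band and sorts anomalies so that the highest scores come first. The four bands and their thresholds (25, 50, 75) are an assumption based on the colour bands commonly shown in Kibana and should be checked against your version; the scores themselves are invented.

```python
def severity_label(score):
    """Map an anomaly score (0-100) to a severity band.

    The thresholds are assumed; check them against your Kibana version.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 75:
        return "critical"
    if score >= 50:
        return "major"
    if score >= 25:
        return "minor"
    return "warning"


# Hypothetical scores for a handful of anomalies, triaged highest first.
scores = [12.4, 91.0, 55.3, 30.1]
for score in sorted(scores, reverse=True):
    print(f"{score:5.1f} -> {severity_label(score)}")
```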

Multi-metric jobs

A single-metric job can answer the question of when there are more visits to the site than usual, but it cannot give us a detailed analysis of the patterns for different languages or track different types of analysis, such as the average or minimum values of indicators.

Multi-metric jobs give the user options to create a single view of different analytics for different categories in data. Let’s create a job that raises alerts for any unexpected upward spike in total visits of pages per language. We briefly explain some of the available options to create customised learning models.

To model visits per language, we select the “High sum” measurement from the “add metric” option list and the “lang” column as the split field. This configuration creates a time series of total visits for each language and models the trend of changes per series. As you can see in the picture, the data is split into multiple time series, one for each language. We again use the “Estimate bucket span” function, which selects a 1-day interval. You can increase the span, for example to 7 days, to see smoother graphs with one value for each week of measurements. Note that the language is added as an influencer, which is the default behaviour for these detections. Users can also add more influencers based on their domain knowledge of which attributes best describe the causes of anomalies. In our dataset, for example, if a bucket is flagged with a high anomaly score, the analyser tries to find the language that contributes most to the calculated value. This gives an idea of the possible reasons for detected anomalies and makes it easier for users to interpret and investigate the results.
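A job of this shape could be described along the following lines. This is a sketch under assumptions: we believe the split field maps to `partition_field_name` in the anomaly detection job API and that influencers are listed under `influencers`, but the exact body that the wizard generates may differ, and the timestamp field name `date` is hypothetical.

```python
import json

# Sketch of a multi-metric job body that models visits per language.
multi_metric_job = {
    "description": "High total daily visits per language",
    "analysis_config": {
        "bucket_span": "1d",  # use "7d" for smoother, weekly values
        "detectors": [
            {
                "function": "high_sum",
                "field_name": "visits",
                # Assumed mapping of the split field: one model per language.
                "partition_field_name": "lang",
            }
        ],
        # The split field is added as an influencer by default.
        "influencers": ["lang"],
    },
    "data_description": {"time_field": "date"},  # assumed timestamp field
}

print(json.dumps(multi_metric_job, indent=2))
```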

View results: The following pictures show two different views of the results. The first picture is the Anomaly Explorer dashboard, which displays the calculated anomaly scores for different buckets and languages. The bottom table lists the raised anomalies and their severity values. Each row also shows the observed value (actual) and the expected normal value (typical) for that bucket. For example, a high-severity anomaly is raised for the English dataset, where the expected number of visits based on history is about 4000 while the measurements show around 8000 visits for that time interval. In a production environment, companies would probably want to send these abnormal findings for further investigation of possible suspicious behaviour, such as DDoS attacks on the site.

For each category (language), you can also see a graph view of the time series and the raised anomalies in the Single Metric Viewer. The second picture shows the time series for the Russian language in early 2016. You can see that while the values are mostly around 1100 visits, there is one measurement higher than 1600, which is raised as a high-severity anomaly. By moving the time slider to focus on mid-2016, you can zoom in on another spike, which lasted for a few weeks before the values returned to normal.


Combining these two views, one can easily switch between high-level normalised results for all data and low-level results for each category. For example, a user can start from the top anomaly bar in the Anomaly Explorer dashboard, which shows the combined, averaged anomaly value across all categories. Then, the user can select the interval containing high-severity buckets and review the bottom table to check individual anomalies. For an interesting anomaly, they can move to its graph view, which shows the general trend around that time and why the observed measurement was flagged as abnormal.

More Advanced Analytics

The job creation dashboard also shows three more analytics options: “Population”, “Advanced” and “Categorisation” jobs. Population analysis is most useful for environments where users and entities show very similar patterns. Such environments can take advantage of population-based techniques, particularly for detections where the number of entities to be modelled (the cardinality) is very large. Think about a company with groups of hundreds of employees, where the employees in each group behave similarly in terms of daily site visits and content downloads. Instead of creating a model per user, population analysis creates a baseline of behaviour for each group and compares the behaviour of each user with that single model. A user’s behaviour is therefore recognised as anomalous if their data deviates significantly from the average behaviour of the group. The Advanced option is the most flexible way to create a job, and it lets you configure the job with all the options available for single metric, multi-metric and population jobs. Categorisation is concerned with text analysis scenarios, which are not relevant to our topic here.
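The idea behind population analysis can be illustrated with a deliberately simple sketch: build one baseline for the whole group and flag members that sit far from it. The user names, download counts and the threshold of 2 standard deviations are hypothetical, and the mean-and-standard-deviation baseline is a stand-in for illustration rather than the model that Elastic builds.

```python
from statistics import mean, pstdev

# Hypothetical daily download counts for employees in one group.
downloads = {
    "alice": 21, "bob": 18, "carol": 24, "dave": 19,
    "erin": 22, "frank": 20, "grace": 23, "mallory": 140,
}


def population_outliers(values_by_user, threshold=2.0):
    """Return users whose value deviates from the group mean by more
    than `threshold` population standard deviations."""
    group_mean = mean(values_by_user.values())
    group_std = pstdev(values_by_user.values())
    if group_std == 0:
        return []  # everyone behaves identically
    return [
        user
        for user, value in values_by_user.items()
        if abs(value - group_mean) / group_std > threshold
    ]


print(population_outliers(downloads))
```

Because the outlier itself inflates the group mean and spread, a more robust baseline (for example, one based on the median) would be a reasonable refinement.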

How to improve the quality of results

An important part of any analytics solution is the input dataset. Even the most advanced analytical techniques can easily fail if the data is not collected and cleaned properly with the user’s requirements and expectations in mind. Therefore, the preprocessing of data, including feature engineering and enhancement, is crucial for any analytics solution. Apart from out-of-the-box tools and custom scripts for processing data, Elastic also offers several feature extraction and transformation capabilities to create meaningful and clean data for processing.

ML analytics are also known to be prone to a high rate of false-positive alerts: alerts that are detected as anomalies but are of no interest to users, because they come from trusted sources or do not indicate harmful behaviour in the context of the application. To achieve high-fidelity results and reduce noise in the investigated alerts, domain users can add their knowledge through filter lists and calendar events. These filter out results that come from known sources, or unusual observations that occur during planned events such as shutdown times.
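The effect of filter lists and calendar events can be pictured with the small sketch below, which drops anomalies that fall inside a planned event window or come from a trusted source. All names, dates and scores are hypothetical, and the sketch filters results after the fact purely for illustration; in Elastic, calendars and filter lists are configured on the jobs themselves.

```python
from datetime import datetime

# Hypothetical planned events (start inclusive, end exclusive).
scheduled_events = [
    (datetime(2016, 7, 1), datetime(2016, 7, 3)),  # planned shutdown
]
trusted_sources = {"internal-crawler"}  # hypothetical allow-list

anomalies = [
    {"time": datetime(2016, 7, 2), "source": "unknown", "score": 88},
    {"time": datetime(2016, 8, 9), "source": "internal-crawler", "score": 91},
    {"time": datetime(2016, 8, 15), "source": "unknown", "score": 79},
]


def is_expected(anomaly):
    """True if the anomaly occurs during a planned event or comes
    from a trusted source."""
    in_event = any(
        start <= anomaly["time"] < end for start, end in scheduled_events
    )
    return in_event or anomaly["source"] in trusted_sources


to_investigate = [a for a in anomalies if not is_expected(a)]
for anomaly in to_investigate:
    print(anomaly["time"].date(), anomaly["source"], anomaly["score"])
```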

What’s next

This blog covered the process of performing time-series anomaly detection with Elastic ML services. The next question is how to respond to the detected security incidents. To be able to take action on raised anomalies, alerts should be generated that send the important indicators to users for further investigation.

This can be added through the built-in Watcher capabilities of Elastic, which also enable custom filtering and transformation of alerts so that only interesting results are sent, or so that alerts are combined and sent as aggregated results for review. However, deciding which indicators are important, and how best to filter results to send high-quality triage information, requires some knowledge of security attack techniques, including their patterns and structure. We will discuss some of the important security attack categories in our next blogs.
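As a starting point, a watch that checks for recent high-severity anomaly records might look like the sketch below, written as a Python dictionary. It assumes that ML results are stored in indices matching `.ml-anomalies-*` with `job_id`, `result_type`, `record_score` and `timestamp` fields; the job id, the score threshold of 75, the schedule and the logging action are hypothetical choices made for illustration.

```python
import json

# Sketch of a watch body that looks for recent high-severity anomalies.
watch_body = {
    "trigger": {"schedule": {"interval": "1h"}},
    "input": {
        "search": {
            "request": {
                "indices": [".ml-anomalies-*"],
                "body": {
                    "size": 10,
                    "query": {
                        "bool": {
                            "filter": [
                                # Hypothetical job id.
                                {"term": {"job_id": "wiki-visits-by-lang"}},
                                {"term": {"result_type": "record"}},
                                # Assumed severity threshold.
                                {"range": {"record_score": {"gte": 75}}},
                                {"range": {"timestamp": {"gte": "now-1d"}}},
                            ]
                        }
                    },
                },
            }
        }
    },
    # Fire only when at least one matching anomaly record exists.
    "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
    "actions": {
        "log_anomalies": {
            "logging": {
                "text": "{{ctx.payload.hits.total}} high-severity anomalies found"
            }
        }
    },
}

print(json.dumps(watch_body, indent=2))
```

In practice the logging action would be replaced with an email, webhook or ticketing action, and the query tuned to the indicators that matter for the attack patterns being tracked.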

Author:  Sara Kardani Moghadam