Noob to Pro: python

Keystroke Dynamics using Statistical Analysis for User Identification

Those who are well-versed in authentication know that there are three kinds of things that may be used as an authentication factor.

Something we know (PINs, passwords).
Something we have (OTP pad).
Something we are (iris, fingerprints).

The least secure authentication mechanisms use only one of the above. A more secure mechanism uses two or more. The popular ones use two of the above factors - called 2FA or Two-Factor Authentication.

The need for eliminating single factor authentication

In many cases (including Gmail), we have just one password which lets us access our accounts. So, what happens if a person with malicious intent gets hold of it? We get a notification that our account has been accessed from a new browser, but that's it. The malicious person gets access to our account and all is lost.

There are many ways to address the above problem. One of them is 2FA, which may be a password/biometric or password/OTP combination. Keystroke dynamics is another solution, which can prevent a person from accessing your account even if they know your password.

What is Keystroke Dynamics?

Keystroke dynamics is a type of behavioral biometric. It relies on the assumption that every person has a unique typing pattern. This typing pattern includes timing samples - the time for which a key is pressed and/or the time between key presses and more.

If a malicious person knows your password, keystroke dynamics can help prevent access to your system because the malicious person most likely has a different typing pattern than you. In the world of cyber-cells and cyber-police, just preventing access is not enough. It is also vital to know the identity of the person trying to hack into the system.

The concept of keystroke dynamics can be used in two ways:

User authentication - involves comparing input data with the authorized user template.
User identification - involves comparing input data with ALL the registered user templates.

In general, user identification is more resource intensive and also more prone to inefficiencies. In this post, I'll set up a simple keystroke-dynamics-enabled user identification system using Python 2.7, which analyzes the typing pattern of the user using simple statistics and identifies the most-probable user if possible.

It is important to note that simple statistical analysis of timing samples leads to identification systems that are not very accurate. However, authentication systems using statistical analysis are reliable.

Which features are we using?

In my system, I'll be using three simple features:

Dwell time - the time interval between a key being pressed down and being released.
Flight time - the time interval between one key press and the next key press.
Key affinity - user preference to use shift or caps lock keys for special or uppercase characters.
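
To make the two timing features concrete, here is a minimal sketch of how they could be computed from recorded key events. The KeyEvent record and the sample timestamps are made up for illustration and are not taken from the actual system, and the sketch is written for Python 3 rather than the Python 2.7 used in the post.

```python
from collections import namedtuple

# Hypothetical event record: key label plus press/release timestamps in seconds.
KeyEvent = namedtuple("KeyEvent", ["key", "press", "release"])


def dwell_times(events):
    """Dwell time per key: release timestamp minus press timestamp."""
    return [e.release - e.press for e in events]


def flight_times(events):
    """Flight time per key pair: next key press minus current key press."""
    return [b.press - a.press for a, b in zip(events, events[1:])]


# Made-up timestamps for the first three characters of a password.
events = [
    KeyEvent(".", 0.00, 0.08),
    KeyEvent("z", 0.20, 0.29),
    KeyEvent("o", 0.35, 0.45),
]

print(dwell_times(events))   # three dwell times, one per key
print(flight_times(events))  # two flight times, one per consecutive pair
```

Note that n keys give n dwell times but only n - 1 flight times, before any unstable samples are dropped.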

Overview

System Training Overview

This algorithm uses the simplest method of training - static training.

We'll be using a single password, .zoroBen1, which each user types thirty times. With this password, we should be able to extract ten dwell time samples and nine flight time samples. I've ignored two flight time samples because of their unstable nature:

the flight time between a character and the caps-lock or shift keys. However, the flight time between caps-lock | shift key and the next character is considered.
the flight time between the last entered character and the carriage return (Enter) key.

A flag marks the difference between the usage of shift or caps lock keys. This flag value combined with the timing samples forms the template for a particular user.
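
As a rough sketch of static training, the repetitions typed by one user can be reduced to per-feature statistics plus the key-affinity flag. The build_template helper and the dictionary layout below are assumptions for illustration; the real system stores its samples in csv files, and its exact template format is not described here.

```python
import statistics


def build_template(samples, uses_shift):
    """Summarise one user's training data.

    samples: list of equal-length timing vectors, one per typed repetition.
    uses_shift: key-affinity flag (True for shift, False for caps lock).
    """
    columns = list(zip(*samples))  # one column per timing feature
    return {
        "mean": [statistics.mean(c) for c in columns],
        "std": [statistics.stdev(c) for c in columns],  # needs >= 2 repetitions
        "uses_shift": uses_shift,
    }


# Made-up data: three repetitions, two timing features each.
samples = [[0.10, 0.20], [0.12, 0.22], [0.14, 0.24]]
template = build_template(samples, uses_shift=True)
print(template)
```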

System Testing Overview

When testing, the input data flows through a statistical layer where it is compared with all the user templates available in the system. The system then outputs a list of usernames from the most probable to the least probable. This kind of rank-based performance measurement is known as the Cumulative Match Characteristic (CMC) curve.
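
A rank-based measurement of this kind can be sketched as follows: for each test attempt, note the position of the correct user in the output list, then report how often that position falls within the top k. The cmc function and the sample ranks are illustrative assumptions, not the evaluation code of the actual system.

```python
def cmc(true_ranks, max_rank):
    """Fraction of attempts where the correct user is ranked within the top k.

    true_ranks: 0-based rank of the correct user for each test attempt.
    Returns a list whose entry k covers ranks 0..k.
    """
    total = float(len(true_ranks))
    return [
        sum(1 for r in true_ranks if r <= k) / total
        for k in range(max_rank)
    ]


# Made-up ranks for five test attempts.
ranks = [0, 0, 1, 3, 0]
curve = cmc(ranks, 5)
print(curve)
```

The first entry is the rank-0 identification rate; the fifth entry is the "in the top 5" rate.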

Algorithm Flow

I've described the flow on a very high level here. If you need more information please refer to the published paper, the link for which I've mentioned in the Readings section.

The first step is to initialize global variables as part of object attributes. This includes - sample password string, number of expected dwell time samples, number of expected flight time samples, accepted deviation, etc.
I've used Python's pygame module to capture the password and calculate the associated timing samples.
A simple string check is executed to check if the entered password is correct.
Depending on the script action, there are two paths the script can take:

if the system is in training mode, it will store the key-affinity flag value and timing samples in a csv file.
if the system is in testing mode, it will move into the comparison layer.

If the code flow reaches this point, then it is in testing mode. At this point, the templates for all users are read from multiple .csv files.
For each character's timing sample, the Euclidean distance between the user input and the user template is calculated (for a single timing value, this is simply the absolute difference). If this is less than a certain multiple of the standard deviation for that character for that specific user template, then the input timing sample for that character is considered a hit (or a match).
Now that we have the comparison data between the user input and user templates, we proceed to calculate the scores. This score is basically a measure of the closeness between the user input and the various user templates.
This score is then combined with the caps-lock | shift flag to form the final rank list of users.
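
The comparison and ranking steps above can be sketched roughly as below. This is not the code from the repository: the template layout, the multiplier k, the use of the hit fraction as the score, and the rule that a matching key-affinity flag sorts first are all assumptions made for illustration.

```python
def count_hits(sample, template, k=2.0):
    """Count timing values lying within k standard deviations of the template mean."""
    hits = 0
    for value, mean, std in zip(sample, template["mean"], template["std"]):
        if abs(value - mean) < k * std:  # 1-D Euclidean distance
            hits += 1
    return hits


def rank_users(sample, uses_shift, templates, k=2.0):
    """Return usernames ordered from most probable to least probable."""
    scored = []
    for user, template in templates.items():
        score = count_hits(sample, template, k) / float(len(sample))
        # Assumption: a matching shift/caps-lock flag outranks any timing score.
        flag_match = template["uses_shift"] == uses_shift
        scored.append((flag_match, score, user))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [user for _, _, user in scored]


# Made-up templates with two timing features each.
templates = {
    "alice": {"mean": [0.10, 0.20], "std": [0.01, 0.01], "uses_shift": True},
    "bob": {"mean": [0.30, 0.40], "std": [0.01, 0.01], "uses_shift": True},
}
ranking = rank_users([0.105, 0.195], True, templates)
print(ranking)
```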

Results

With the help of friends and colleagues at NetApp, I was able to get training samples of 25 users. Each typed the sample password thirty times under a semi-controlled environment.

When ten users typed the password, the system was able to identify them correctly (i.e. at rank-0) 76% of the time and in the top 5, 100% of the time. When twenty users tried typing the password, the system identified them correctly 62% of the time and in the top 5, 92% of the time.

As expected, the system is not very accurate. The most probable cause is the simplicity of the statistics used. As I said before, simple statistics are not sufficient for identification systems.

Readings

Machine learning is a very good concept for this application as explained in this paper: https://ieeexplore.ieee.org/document/7833085/

The paper for the system I described was published at http://www.ijemr.net/DOC/AStudyOfPersonIdentificationUsingKeystrokeDynamicsAndStatisticalAnalysis.pdf

The GitHub repository for this system lies at: https://github.com/nikhilh-20/keystroke_dynamics

Take away

Keystroke dynamics is a reasonably good behavioral biometric for authentication or identification. However, it does have its inherent drawbacks.

A person's typing pattern may change when they are given a new laptop, and again as they get used to it.
Timing samples may vary between keyboards and processors.
Environmental disturbances are unpredictable and may affect the typing pattern.
With billions of people on the planet, there is a very high chance that more than one person has a similar typing pattern.

Demo

Here's a demo on the working of the system:

Building a Simple Slack Bot

I was recently admitted into the University of Maryland, College Park (UMCP) as a graduate student in the M.Eng. Cybersecurity program. I was also admitted into Johns Hopkins University (JHU) Security Informatics (MSSI) program. I chose not to attend JHU for various reasons but I did end up in their WhatsApp group. It was here that I was exposed to Slack.

While I was working at NetApp, other developers and QA engineers used HipChat. I'd never heard of Slack before. So, once I read about it (HipChat and Slack are basically enterprise-level IM apps), I immediately created a Slack workspace (basically similar to WhatsApp groups but bundled with many more features like channels, bots, etc.) and requested my fellow UMCP classmates to join. At NetApp, I worked primarily with test automation, so of course building a Slack bot came naturally as a must-do micro-project. Without further digression, let's begin!

Setting up and installation of the Slack app

What is the purpose of this bot?

I'm a new graduate student and naturally I was looking at the courses being offered this Fall. My fellow classmates were too. Any time someone asked about a course, another would copy-paste the course description from Testudo into the WhatsApp group. Looks quite manual, doesn't it? Automation to the rescue!

What's in a Name?

Naming anything is an insane task. It must sound good, reflect its purpose, etc. Finding names for stuff is not my forte, so I decided to go with Course Descriptions Bot. Coursedesc in short.

How do I create a Slack App?

Visit: https://api.slack.com/apps?new_app=1 Once you do, you'll be greeted with the Create a Slack App interface. Fill in the two fields and click on Create App.

On the next page,

Click on Bot Users in the left column.
Click on Add a Bot User.
Fill in the two fields.
For better usability, it is preferable not to show your bot as online always.
Click on Add Bot User.

Now that you're done setting up the identity of your app, we need to install it in the workspace. Click on Install App in the left column. Then click on Install App to Workspace.

Click on Authorize.

On the next page, we have very important information. Take special note of Bot User OAuth Access Token. Do not share these tokens with the world. For more information on auth tokens, read the documentation of token types: https://api.slack.com/docs/token-types#bot

That's it! You're now ready with a brand new Slack App. The next step is to code the daemon which will connect with our app.

Programming the bot

Slack Client

Remember the Bot User OAuth Access Token we noted before? We need that here to create a slack client object.

import os

from slackclient import SlackClient

slack_client = SlackClient(os.environ.get('SLACK_BOT_TOKEN'))

Slack Real Time Messaging (RTM) API

Slack's RTM API enables us to receive events and send simple messages to Slack in almost real-time. The first step is to connect with our app:

slack_client.rtm_connect()

The above function call returns True if the connection was successfully established and False otherwise (a failure also results in a SlackLoginError exception).

Authentication

Now that we are connected to our app, the next step is to authenticate ourselves using the auth.test() method:

course_desc_bot_id = slack_client.api_call("auth.test")["user_id"]

The response of auth.test() is a Python dictionary which contains keys like url, user_id, user, etc. Here we'll extract the user_id. We'll use this later to check if a Slack message was sent by a user or our bot. To read more about the auth.test() method, visit https://api.slack.com/methods/auth.test

Reading events/messages sent to Slack

When we send a message to Slack, our bot will read it to determine if it is addressed to it. It will reply only if so. To read a Slack event, we'll use the rtm_read() method:

slack_client.rtm_read()

On successful connection (and authentication), the first event will be a simple hello message

[{u'type': u'hello'}]

When you send a message to Slack, the daemon receives an event:

[{u'client_msg_id': u'86e18aef-ac7c-4968-983c-a1a943c1c6f5', u'event_ts': u'1530900342.000367', u'text': u'<@UBHK5HHLH> help', u'ts': u'1530900342.000367', u'user': u'UBJ6NTCNT', u'team': u'TBGJ88997', u'type': u'message', u'channel': u'CBHUMNB9A'}]

The ones of interest here are the type of event, the channel from where it was sent and the text that was sent. Our bot should respond only in the case of a message event type and if it was addressed directly. To verify this, we'll use the values of type and text (a part of it). At the same time, we'll also extract the channel from where it was sent and the message text as well (in this case, help).
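
A minimal sketch of this check, assuming the event shape shown above. The parse_bot_command helper and the regular expression are illustrative and not necessarily what the bot in the repository does; skipping events that carry a subtype is likewise an assumption, made here to ignore edits and bot-generated messages.

```python
import re

# Matches a leading mention such as "<@UBHK5HHLH> help".
MENTION_RE = re.compile(r"^<@(\w+)>\s*(.*)$")


def parse_bot_command(events, bot_id):
    """Return (command, channel) for the first message addressed to bot_id.

    Returns (None, None) if no event in the batch is a direct mention.
    """
    for event in events:
        # Assumption: only plain user messages (no subtype) are of interest.
        if event.get("type") != "message" or "subtype" in event:
            continue
        match = MENTION_RE.match(event.get("text", ""))
        if match and match.group(1) == bot_id:
            return match.group(2).strip(), event["channel"]
    return None, None


# Trimmed copy of the event shown above.
events = [{
    "type": "message",
    "text": "<@UBHK5HHLH> help",
    "user": "UBJ6NTCNT",
    "channel": "CBHUMNB9A",
}]
command, channel = parse_bot_command(events, "UBHK5HHLH")
print(command, channel)
```

In the daemon, events would be the list returned by slack_client.rtm_read() and bot_id would be the course_desc_bot_id obtained from auth.test.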

We have the command. Now what?

As far as the purpose of this bot goes, the code is written to handle only two scenarios:

The student asks for a list of courses available for the term
The student asks for the description of a specific course

In both cases, we'll be scraping the Testudo website for information.

Once we have our information, our bot will send it to Slack, through an API call, on the same channel where the message was initially received.

slack_client.api_call(
    "chat.postMessage",
    channel=channel,
    text=course_description
)

How often to check for Slack events?

There is no fixed answer to this. The more often you check, the more resources you use. The RTM API is not recommended for large teams. In our case, we check for events once every second.
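
The polling loop itself can be sketched as below. The poll function and its parameters are hypothetical; the reader and handler are passed in so the loop can be tried without a live Slack connection, and max_polls exists only so that the example terminates.

```python
import time


def poll(read_events, handle_event, delay=1.0, max_polls=None):
    """Call read_events() repeatedly, sleeping `delay` seconds between polls.

    read_events: callable returning a list of events (possibly empty).
    handle_event: callable invoked once per event.
    max_polls: stop after this many polls; None means run forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for event in read_events():
            handle_event(event)
        polls += 1
        time.sleep(delay)


# Stand-in reader that always returns the hello event.
seen = []
poll(lambda: [{"type": "hello"}], seen.append, delay=0, max_polls=3)
print(len(seen))
```

With the real client, the call would look like poll(slack_client.rtm_read, handler) with the default one-second delay, where handler is whatever function processes a single event.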

Demo

We have a bot that is operational at this moment. However, some development is still pending, most of it future enhancements. To take a sneak peek, visit my GitHub repo: https://github.com/nikhilh-20/slack_bots

Here's a 1:14 minute demo:

That's it! You should have a live working Slack bot ready to make life easy for you!
