
Forensic Analysis of WhatsApp on Android

18/05/20

Disclaimer: This work has been submitted by a student.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.

Abstract

WhatsApp is the most popular messaging application for smartphones in the world. It boasts 1.3 billion monthly active users, 1 billion of whom are active every day. While WhatsApp began life as a social-messaging platform, it is also used by criminals to communicate with each other, as evidenced by the arrest of 39 people in April 2017 as part of a Europol investigation into the sharing of child pornography via WhatsApp. Because WhatsApp uses end-to-end encryption, messages cannot readily be intercepted in transit, so the records in the WhatsApp databases extracted from criminals’ smartphones provide vital digital evidence in the prosecution of criminal cases. This paper describes several forensically significant artefacts in the WhatsApp message database on Android and provides a reference for digital investigators examining WhatsApp.

Keywords

WhatsApp, Android Forensics, Smartphone Forensics, SQLite

1.    Introduction

WhatsApp is the most popular messaging application for smartphones in the world. It boasts 1.3 billion monthly active users, 1 billion of whom are active every day.[1] WhatsApp users send 55 billion messages per day, including 4.5 billion photos and 1 billion videos. WhatsApp was acquired by Facebook in February 2014 for US$19.2 billion, where it stands alongside Facebook’s own messaging application, Facebook Messenger. It is a cross-platform application, available on the Android, iPhone, Windows Phone, BlackBerry and Symbian platforms. While WhatsApp began life as a social-messaging platform, it is being used by criminals to communicate with each other, as evidenced by the arrest of 39 people in April 2017[2] as part of a Europol investigation into the sharing of child pornography via WhatsApp.

The purpose of this paper is to forensically examine significant artefacts present in the message database used by the WhatsApp messaging application on Android 7, codenamed Nougat. WhatsApp advertises “Security by Default”[3] because, since April 2016,[4] it has used end-to-end encryption by default for all its message transfers. However, this protection does not extend to data at rest on the Android handset, which is stored in an unencrypted SQLite database. As this paper shows, forensic examination of this database allows message conversations to be reconstructed easily, and thumbnails of shared images to be recovered even when the actual image has been deleted from the smartphone’s file system.

Two of the basic functions of the WhatsApp application are instant messaging (Fig. 1[5]) and two-way video or voice calling (Fig. 2[6]). Messages can carry various types of content, including text, audio, video and location, and documents can also be shared as attachments. This paper explores how to acquire the WhatsApp message database from a smartphone and how to decrypt the encrypted database backup. It then analyses this database and provides proof-of-concept code to replay message conversations in a human-readable format. Finally, to validate the acquisition and analysis, experimental tests were carried out on various Android smartphones and WhatsApp versions.

Figure 1 – WhatsApp message interface

Figure 2 – WhatsApp two-way call

2.    Related Works

This research began with a review of current trends in the forensic analysis of Android devices. Older versions of Android (up to version 2.3, codenamed Gingerbread) used the YAFFS2 file system. Since then, Android has switched to the ext4 file system, primarily because of its multithreading support. Lessard & Kessler (2010) wrote one of the first papers to address the issues involved in forensically examining a mobile device running the Android operating system. However, it was written at a time when YAFFS2 was the file system of choice on Android devices. The tools they document, for example Android Debug Bridge (ADB) and dd, are still in use today, albeit in updated forms. Their paper first describes the physical forensic examination of SD cards and the challenges involved in accessing the root partition of the physical device and extracting the relevant data from it. It then discusses the analysis of the extracted data, introducing tools such as AccessData’s Forensic Toolkit (FTK), and compares this with logical analysis of the database files used by apps, as well as with Cellebrite UFED (Universal Forensic Extraction Device). However, these tools were not available for this research.

In his book, Hoog (2011) presents background knowledge on how Android applications store data on the device file system, for example in SQLite databases. He also describes strategies and specific utilities that can be used to forensically examine an Android smartphone. Mahajan et al. (2013) focus their research on the forensic analysis of instant messaging applications on Android, including WhatsApp. However, they limit their research to acquisition and analysis using Cellebrite UFED.

Thakur (2013) wrote her thesis on the forensic analysis of WhatsApp on Android smartphones, focussing primarily on memory analysis. Such analysis is out of scope for this paper, which focusses on artefacts from the file system. Sahu (2014) and Anglano (2014) extend the work on the forensic analysis of WhatsApp on Android smartphones. They discuss the forensic artefacts in the message store database and how the database can be reconstructed into a human-readable format. In particular, Anglano analyses the msgstore.db database file, examining each table and determining what forensic evidence, if any, can be derived from it.

3.    Material and Methods

The primary focus of this paper is to analyse the database used by WhatsApp to store its data. This database file was found to be stored in the /data/data/ partition, which is inaccessible to the end user unless the device is rooted[7]. Forensic examination tools such as Cellebrite’s UFED can access this partition, but they are outside the scope of this paper.

3.1  Data Acquisition

The then-current version of Android Studio (version 2.3) was installed on a Windows 10 PC, and a virtual Google Pixel XL smartphone was created using Android Virtual Device (AVD), the emulator that ships with Android Studio. As the AVD emulator does not include the Google Play Store, the WhatsApp APK[8] (version 2.17.130) was downloaded from WhatsApp’s website[9] and installed on the emulated device by dragging and dropping it onto the emulator.

The first step in activating WhatsApp is to associate it with a phone number (Fig. 3). As the AVD emulator has no SIM card and therefore cannot send or receive SMS messages, WhatsApp on the emulator was activated using the phone number of a physical SIM card, and the security code received by SMS on that SIM was entered to complete the activation. Once activation was complete, I verified that WhatsApp was installed correctly by exchanging several messages of various types between WhatsApp on a physical smartphone and the emulated smartphone (Fig. 4).

Figure 3 – WhatsApp activation screen

Figure 4 – Activated WhatsApp

Prior research confirms that WhatsApp stores its message data in an SQLite database located at /data/data/com.whatsapp/databases/msgstore.db. However, the Android Device Monitor that ships with Android Studio 2.3 does not appear to allow access to the /data/data partition when the emulator is running Android 7.x, although it does allow access when emulating earlier versions of Android. A fix is reportedly coming in version 2.4 of Android Studio.[10]

It was noted that, by default, WhatsApp creates an encrypted backup of msgstore.db every night at 02:00 in the user-accessible folder /sdcard/WhatsApp/Databases/. The most recent backup is named msgstore.db.crypt12 (Fig. 5). WhatsApp also keeps the previous eight backups, named using the convention msgstore-YYYY-MM-DD.1.db.crypt12, giving nine encrypted backups in total. The most recent backup file was copied to a Windows 10 PC for analysis and possible decryption.
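
The naming convention above is regular enough to sort a copied Databases folder programmatically, which helps when choosing which of the nine backups to examine. The following is a minimal Python sketch, assuming only the two name patterns observed in this research; the helper order_backups and the example file names are hypothetical, and other WhatsApp versions may use different extensions or suffixes.

```python
import re
from datetime import date

# Backup names observed in /sdcard/WhatsApp/Databases/ (crypt12 format only).
CURRENT_BACKUP = "msgstore.db.crypt12"
DATED_BACKUP = re.compile(r"^msgstore-(\d{4})-(\d{2})-(\d{2})\.1\.db\.crypt12$")


def order_backups(filenames):
    """Return (current, dated): the latest backup name (or None) and a list
    of (backup date, name) pairs for the older backups, newest first.
    Names matching neither pattern are ignored."""
    current = CURRENT_BACKUP if CURRENT_BACKUP in filenames else None
    dated = []
    for name in filenames:
        match = DATED_BACKUP.match(name)
        if match:
            year, month, day = (int(part) for part in match.groups())
            dated.append((date(year, month, day), name))
    dated.sort(reverse=True)  # newest backup date first
    return current, dated


# Example directory listing (illustrative names, not evidence).
listing = [
    "msgstore-2017-04-19.1.db.crypt12",
    "msgstore.db.crypt12",
    "msgstore-2017-04-20.1.db.crypt12",
    "notes.txt",
]
current, dated = order_backups(listing)
print(current)
print([name for _, name in dated])
```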

Figure 5 – WhatsApp encrypted backups in user accessible storage

3.2  Decryption

Research indicates that crypt12 encryption relies on a modified version of Spongy Castle, a cryptography API library for Android.[11] A Python script[12] was found that purports to decrypt WhatsApp database backups when given the decryption key. Previous research (Thakur, 2013) indicated that all WhatsApp backups were encrypted with the same key (346a23652a46392b4d73257c67317e352e3372482177652c); however, this no longer seems to be the case, as an attempt to decrypt a backup with this key failed.

Online research gave the location of the decryption key as /data/data/com.whatsapp/files/key but, as before, this location is inaccessible on non-rooted devices without specialist forensic tools. WhatsApp’s own FAQ[13] gives the following steps for moving WhatsApp Messenger from one phone to another: create a manual backup; copy /sdcard/WhatsApp from the original smartphone to the new smartphone; and install WhatsApp on the new smartphone. This procedure suggests that the encryption/decryption key is somehow tied to the phone number.

Because the Android Device Monitor in Android Studio 2.3 does not allow access to the /data/data partition on Android 7, WhatsApp’s advice on moving databases between smartphones was put to the test. An emulated Android 6 smartphone was created, and WhatsApp was installed on it and activated using the same phone number that had been used to activate WhatsApp on the Android 7 emulator, without copying /sdcard/WhatsApp from the original smartphone. The Android Device Monitor was able to access the /data/data partition on this emulated smartphone, so the key file /data/data/com.whatsapp/files/key was extracted to the local computer. The Python script mentioned earlier was then used to decrypt the backup copied from the Android 7 emulator, using the key from the Android 6 emulator. This appeared to be successful, as what seemed to be a decrypted database file, msgstore.db, was generated.
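
The decryption itself was performed with the third-party script cited above[12] and is not reproduced here. For illustration only, the following sketch shows how such a script is commonly described as locating the pieces it needs. Every offset in it (a 158-byte key file holding the 32-byte AES-256 key at offset 126, a 67-byte backup header containing a 16-byte IV at offset 51, a 20-byte trailer, and a 32-byte value shared by the key file at offset 30 and the backup at offset 3) is an assumption drawn from public descriptions of crypt12 decrypters; it was not verified in this research and may not hold for other WhatsApp versions. The AES-GCM decryption and zlib decompression that follow need a third-party cryptography library and are only described in comments.

```python
# ASSUMED crypt12 layout (unverified; from public descriptions of crypt12
# decrypters). Offsets may differ between WhatsApp versions.
KEY_FILE_SIZE = 158
KEYFILE_CHECK = slice(30, 62)      # 32 bytes expected to equal BACKUP_CHECK
KEYFILE_AES_KEY = slice(126, 158)  # 32-byte AES-256 key
BACKUP_CHECK = slice(3, 35)
BACKUP_IV = slice(51, 67)          # 16-byte IV / GCM nonce
HEADER_LEN = 67
TRAILER_LEN = 20


def split_crypt12(key_file: bytes, backup: bytes):
    """Return (aes_key, iv, ciphertext) from a key file and a crypt12 backup."""
    if len(key_file) != KEY_FILE_SIZE:
        raise ValueError(f"key file is {len(key_file)} bytes, expected {KEY_FILE_SIZE}")
    if len(backup) <= HEADER_LEN + TRAILER_LEN:
        raise ValueError("file is too short to be a crypt12 backup")
    if key_file[KEYFILE_CHECK] != backup[BACKUP_CHECK]:
        # Under the assumed layout, a mismatch suggests a key for another account.
        raise ValueError("key file does not appear to match this backup")
    aes_key = key_file[KEYFILE_AES_KEY]
    iv = backup[BACKUP_IV]
    ciphertext = backup[HEADER_LEN:len(backup) - TRAILER_LEN]
    # Not shown: decrypt `ciphertext` with AES-256 in GCM mode using aes_key
    # and iv, then zlib-decompress the plaintext to obtain msgstore.db.
    return aes_key, iv, ciphertext


# Synthetic inputs that follow the assumed layout (not real WhatsApp data).
check = bytes(range(32))
key_file = bytes(30) + check + bytes(64) + b"K" * 32
backup = bytes(3) + check + bytes(16) + b"I" * 16 + b"ciphertext" + bytes(20)
aes_key, iv, ciphertext = split_crypt12(key_file, backup)
print(len(aes_key), len(iv), ciphertext)
```

The consistency check between key file and backup is included because, under the assumed layout, it gives an early indication that a key extracted from another device belongs to the same account before any decryption is attempted.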

3.3  Analysis

The msgstore.db file was examined using DB Browser for SQLite[14], a graphical browser for SQLite database files. It was immediately obvious that the database schema had changed significantly since Anglano (2014), as the database now contains twenty-one tables, whereas it had three in 2014. Further analysis shows that the three tables that existed when Anglano carried out his research still exist, but now contain many more fields. A listing and description of all twenty-one tables follows (Table 1).

Table Name                  Description
chat_list                   List of conversations
frequents                   List of frequently contacted users; some users appear twice
group_participants          List of WhatsApp groups, along with their participants
group_participants_history  List of WhatsApp groups the user was previously a member of
media_refs                  References to media stored in the user’s file system
media_streaming_sidecar     Evidential value unknown
message_thumbnails          Storage location of media thumbnails since April 2017
messages                    List of messages sent and received
messages_edits              Empty table, possibly for future use
messages_fts                Evidential value unknown; possibly a full-text search index of messages
messages_fts_contents       Evidential value unknown; possibly part of the full-text search index
messages_fts_segdir         Evidential value unknown; possibly part of the full-text search index
messages_fts_segments       Evidential value unknown; possibly part of the full-text search index
messages_links              List of messages containing URL links
messages_quotes             List of messages quoting other messages
messages_vcards             Empty table, possibly for future use
messages_vcards_jids        Empty table, possibly for future use
props                       Internal WhatsApp table
receipts                    Message receipts for messages sent to groups
sqlite_sequence             Internal SQLite table recording AUTOINCREMENT counters
status_list                 Relates to WhatsApp Status, where broadcast messages may be sent to all contacts and disappear after 24 hours

Table 1 – Table names and descriptions from msgstore.db

The tables that contain the most evidential data, and on which the rest of this paper concentrates, are chat_list, messages and message_thumbnails. The group_participants and group_participants_history tables are also of note, but their contents can also be derived from the messages table. A complete description of the three pertinent tables follows.

3.3.1       The chat_list table

The chat_list table contains the list of all the conversations the user was a member of. A conversation is defined as a series of messages, of various types, exchanged between the user and either one other user or a group of users. Each row in the table records one conversation, from its creation time and subject (for group conversations) to the number of unread messages. Table 2 below describes each field of the chat_list table, and Figure 6 contains an example screenshot of the table in DB Browser for SQLite.

Field Name                               Description
_id                                      Key
key_remote_jid                           Conversation ID
message_table_jid                        References the _id column in the messages table; points to the last message received in the conversation
subject                                  Group only: subject of the group
creation                                 Group only: creation time of the group (milliseconds since epoch)
last_read_message_table_id               Like message_table_jid, references the messages table; points to the last read message in the conversation
last_read_receipt_sent_message_table_id  Like message_table_jid, references the messages table; points to the last read message in the conversation for which a read receipt has been sent
archived                                 Archived flag (see https://www.whatsapp.com/faq/en/bb/20888029): 1 if archived, NULL otherwise
sort_timestamp                           Epoch timestamp in milliseconds, used for sorting
mod_tag                                  Unknown
gen                                      Unknown; all NULL
my_messages                              Whether the user has sent a message in this conversation: 1 if so, NULL if not
plaintext_disabled                       Unknown; all 1
last_message_table_id                    Appears to be the same as message_table_jid
unseen_message_count                     Count of unseen messages in the conversation
unseen_missed_calls_count                Count of unseen voice calls in the conversation
unseen_row_count                         Unknown; not the same as unseen_message_count
vcard_ui_dismissed                       Possibly whether the vCard UI has been dismissed

Table 2 – Description of the chat_list table

Figure 6 – The chat_list table, contact information obfuscated
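
Using the fields in Table 2, the conversation list can be summarised newest-first. This is a minimal sketch, assuming the chat_list column names recorded above and millisecond sort_timestamp values; list_conversations is a hypothetical helper, and the two rows in the demonstration are synthetic.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def list_conversations(conn):
    """Return conversations newest-first as (jid, subject, last activity, unseen)."""
    rows = conn.execute(
        "SELECT key_remote_jid, subject, sort_timestamp, unseen_message_count "
        "FROM chat_list ORDER BY sort_timestamp DESC")
    return [
        (jid,
         subject,  # NULL (None) for one-to-one conversations
         EPOCH + timedelta(milliseconds=ts) if ts else None,
         unseen or 0)
        for jid, subject, ts, unseen in rows
    ]


# Synthetic stand-in for the chat_list table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_list (key_remote_jid TEXT, subject TEXT, "
             "sort_timestamp INTEGER, unseen_message_count INTEGER)")
conn.executemany("INSERT INTO chat_list VALUES (?, ?, ?, ?)", [
    ("353881234567@s.whatsapp.net", None, 1492768800000, 0),
    ("353881234567-1490000000@g.us", "Example group", 1492768900000, 3),
])
for row in list_conversations(conn):
    print(row)
```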

The key_remote_jid field holds the conversation ID. For a one-to-one conversation it has the format phonenumber@s.whatsapp.net; for a group conversation it has the format phonenumber-timestamp@g.us. The phone number is in international dialling format without the leading zeros (e.g. 353881234567); in a group conversation it is the number of the group creator, and the timestamp is the number of seconds since the epoch[15], corresponding to the time the group was created.
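
The two key_remote_jid formats described above can be distinguished mechanically. The hypothetical helper parse_jid below assumes exactly these two formats and reports anything else as unknown rather than guessing; note that the group suffix counts seconds, whereas the message timestamps in the database are in milliseconds.

```python
from datetime import datetime, timezone


def parse_jid(jid):
    """Split a key_remote_jid into its parts, assuming the two formats above."""
    local, _, domain = jid.partition("@")
    if domain == "s.whatsapp.net":
        return {"kind": "individual", "phone": local}
    if domain == "g.us":
        creator, sep, created = local.rpartition("-")
        if sep and created.isdigit():
            return {
                "kind": "group",
                "creator": creator,
                # The group suffix counts seconds (not milliseconds) since the epoch.
                "created": datetime.fromtimestamp(int(created), tz=timezone.utc),
            }
    return {"kind": "unknown", "raw": jid}


print(parse_jid("353881234567@s.whatsapp.net"))
print(parse_jid("353881234567-1490000000@g.us"))
```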

3.3.2       The messages table

The messages table is the primary table used by WhatsApp to store the actual content of the conversations referenced by the chat_list table. Each record in the table contains a message sent or received by the user. A description of each field in the messages table follows in Table 3. It should be noted that some fields contain only NULL values despite having descriptive names; it is assumed that these fields are reserved for future use by the WhatsApp messenger application.

Field Name                Description
_id                       Key ID
key_remote_jid            Individual: phonenumber@s.whatsapp.net; Group: phonenumber-GID@g.us, where GID is the group creation timestamp in seconds
key_from_me               Message direction: 0 = to me (received), 1 = from me (sent)
key_id                    Message ID
status                    Meaning unknown; possibly delivery status
needs_push                1 when the message has not yet been sent, otherwise 0
data                      Message text when there is no attachment; when the attachment is a vCard, the vCard itself
timestamp                 Epoch time (milliseconds) at which the message was sent
media_url                 URL of the attachment; encoded for sent messages after January 2016 and for received messages after March 2016
media_mime_type           MIME type of the attachment when it is an image, audio or video
media_wa_type             Attachment type: 0 = no attachment, 1 = image, 2 = audio, 3 = video, 4 = vCard, 5 = location, 8 = call, 9 = document, 13 = possibly also video
media_size                Size of the attachment in bytes
media_name                Name of the attachment when it is an image; preview of the web page when data contains a URL
media_caption             Text caption of the attachment; the title of the web page when data contains a URL
media_hash                Appears to be a Base64-encoded hash of the attachment when it is an image, audio or video
media_duration            Duration of the attachment when it is an audio, video or call
origin                    Unknown; the vast majority of values are 0, about 10% are 1 and a very small number are 2
latitude                  Latitude when the attachment is a location
longitude                 Longitude when the attachment is a location
thumb_image               Possibly a thumbnail of the attached image; untested, but it may decode (see http://stackoverflow.com/questions/24828383/how-to-read-the-thumb-image-column-in-the-whatsapp-messengers-sqlite3-database)
remote_resource           Message sender (other than the user) in group conversations (see key_remote_jid above)
received_timestamp        Epoch time (milliseconds) at which the message was received
send_timestamp            Unknown as yet; all values appear to be -1
receipt_server_timestamp  Epoch time (milliseconds) at which the message reached the WhatsApp servers (single grey tick); own messages only
receipt_device_timestamp  Epoch time (milliseconds) at which the message was received by the recipient, or by all recipients in a group (double grey tick); own messages only
read_device_timestamp     Epoch time (milliseconds) at which the message was read by the recipient, or by all recipients in a group (double blue tick); own messages only
played_device_timestamp   NULL; evidential value unknown
raw_data                  NULL; evidential value unknown
recipient_count           Number of members in the group when the message was sent
participant_hash          Unknown
starred                   NULL; evidential value unknown
quoted_row_id             Row ID (in the messages_quotes table) of the quoted message
mentioned_jids            NULL; evidential value unknown
multicast_id              NULL; evidential value unknown
edit_version              NULL; possibly reserved for when WhatsApp allows sent messages to be edited (cf. the messages_edits table; see http://www.telegraph.co.uk/technology/2017/01/30/whatsapp-let-users-edit-recall-sent-messages/)
media_enc_hash            NULL

Table 3 – Description of the messages table
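
For messages sent by the user, the three receipt timestamps in Table 3 correspond to the familiar tick indicators. The sketch below converts the millisecond epoch values to UTC and reports the most advanced receipt recorded. It assumes that an absent receipt is stored as NULL or as a value of zero or less, which was not confirmed in this research; ms_to_utc and tick_status are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert a millisecond epoch value to a UTC datetime; None if absent.

    ASSUMPTION: a missing receipt is stored as NULL or as a value <= 0.
    """
    if ms is None or ms <= 0:
        return None
    return EPOCH + timedelta(milliseconds=ms)


def tick_status(receipt_server_ts, receipt_device_ts, read_device_ts):
    """Return (indicator, UTC time) for a sent message, most advanced receipt first."""
    for label, ms in (("read (double blue tick)", read_device_ts),
                      ("delivered (double grey tick)", receipt_device_ts),
                      ("reached server (single grey tick)", receipt_server_ts)):
        when = ms_to_utc(ms)
        if when is not None:
            return label, when
    return "no receipt recorded", None


print(tick_status(1492768800000, 1492768805000, None))
```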

Of primary note here is the media_wa_type field. WhatsApp allows the user to send and receive different types of messages, for example images or videos. The media_wa_type field defines the type of message that the record contains, using the following encoding: 0 = text; 1 = image; 2 = audio; 3 = video; 4 = contact / vCard; 5 = location; 8 = audio / video call; 9 = document. Where media_wa_type is 1, 3 or 5, a thumbnail in binary format is also stored in the database. Its location changed during the research for this paper: prior to 21 April 2017, the thumbnail was stored in the data field of the messages table, but since then it has been stored in the message_thumbnails table, as described in the next section. The actual media exchanged, in the case of an image or video, is stored in the user-accessible directory /sdcard/WhatsApp/Media once downloaded, either automatically or by the user.
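
The media_wa_type encoding above can be applied directly when querying the database. This minimal sketch, assuming the messages table and column names in Table 3, counts messages by type; codes outside the list (such as 13) are reported as unlisted rather than assigned a meaning, and values stored as text are converted to integers where possible. The function name message_type_summary is hypothetical and the in-memory table is synthetic.

```python
import sqlite3

# media_wa_type codes as decoded in this section; other codes are left unlabelled.
MEDIA_WA_TYPES = {
    0: "text", 1: "image", 2: "audio", 3: "video", 4: "contact / vCard",
    5: "location", 8: "audio / video call", 9: "document",
}


def message_type_summary(conn):
    """Return {type label: number of messages} from the messages table."""
    summary = {}
    for code, count in conn.execute(
            "SELECT media_wa_type, COUNT(*) FROM messages GROUP BY media_wa_type"):
        try:
            code = int(code)  # tolerate codes stored as text
        except (TypeError, ValueError):
            pass  # NULL or non-numeric: keep the raw value
        label = MEDIA_WA_TYPES.get(code, f"unlisted ({code})")
        summary[label] = summary.get(label, 0) + count
    return summary


# Tiny in-memory stand-in for msgstore.db (synthetic rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (_id INTEGER PRIMARY KEY, media_wa_type INTEGER)")
conn.executemany("INSERT INTO messages (media_wa_type) VALUES (?)",
                 [(0,), (0,), (1,), (5,), (13,)])
print(message_type_summary(conn))
```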

3.3.3       The message_thumbnails table

Prior to April 2017, the message_thumbnails table existed but was empty. Since 21 April 2017, this table has been the storage location of thumbnails of images, videos or Google Maps locations sent or received by the WhatsApp application. Each thumbnail is a representation of the actual image, video or Google Maps location of at most 100×100 pixels. A description of the message_thumbnails table follows in Table 4.

Field Name      Description
thumbnail       BLOB; the first four bytes are FF D8 FF E0, the file signature of a JPEG
timestamp       Date and time as an epoch timestamp
key_remote_jid  Sender / receiver ID
key_from_me     Same as key_from_me in the messages table: 0 = to me, 1 = from me
key_id          30-character hex string

Table 4 – Description of the message_thumbnails table

The thumbnail field is of type BLOB and contains a binary image. During examination of the data in this field, it was noted that the first four bytes of each record were FF D8 FF E0, which is the file signature of the JPEG (JFIF) file format.
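
Because each thumbnail is a self-contained JPEG, it can be written directly to disk for viewing, which is how thumbnails can still be examined after the full image has been deleted from /sdcard/WhatsApp/Media. The sketch below, assuming the message_thumbnails columns in Table 4, exports every blob beginning with FF D8 FF E0 and skips anything else. The function name export_thumbnails and the output file-naming scheme are hypothetical choices, and the demonstration table is synthetic.

```python
import os
import sqlite3
import tempfile

JPEG_JFIF_SIGNATURE = bytes.fromhex("ffd8ffe0")


def export_thumbnails(conn, out_dir):
    """Write each JPEG thumbnail in message_thumbnails to out_dir.

    Returns (written, skipped); blobs without the FF D8 FF E0 signature are skipped.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = skipped = 0
    rows = conn.execute(
        "SELECT key_from_me, key_id, timestamp, thumbnail FROM message_thumbnails")
    for from_me, key_id, timestamp, blob in rows:
        if not blob or not bytes(blob).startswith(JPEG_JFIF_SIGNATURE):
            skipped += 1
            continue
        # Hypothetical naming scheme: timestamp, direction and a sanitised key_id.
        safe_id = "".join(ch for ch in str(key_id) if ch.isalnum())
        path = os.path.join(out_dir, f"{timestamp}_{from_me}_{safe_id}.jpg")
        with open(path, "wb") as handle:
            handle.write(bytes(blob))
        written += 1
    return written, skipped


# Synthetic demonstration table (not real WhatsApp data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message_thumbnails (thumbnail BLOB, timestamp INTEGER, "
             "key_remote_jid TEXT, key_from_me INTEGER, key_id TEXT)")
conn.executemany("INSERT INTO message_thumbnails VALUES (?, ?, ?, ?, ?)", [
    (JPEG_JFIF_SIGNATURE + b"<jpeg body>", 1492768800000,
     "353881234567@s.whatsapp.net", 0, "ABC123"),
    (b"not a jpeg", 1492768900000, "353881234567@s.whatsapp.net", 1, "DEF456"),
])
with tempfile.TemporaryDirectory() as out_dir:
    result = export_thumbnails(conn, out_dir)
    exported = sorted(os.listdir(out_dir))
print(result, exported)
```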

3.4  Conversation Reconstruction

While there are existing scripts[16] that convert the content of a WhatsApp msgstore.db database to HTML to reconstruct the chat history, these are several years old and do not take account of subsequent updates to the WhatsApp application, for example the change in the storage location of thumbnails mentioned in Section 3.3.2. Proof-of-concept code was therefore written in Python to take account of these changes. This code is contained in the attachment examine_msgstore.py, and a screenshot of its output is shown in Figure 7.
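
The examine_msgstore.py script is not reproduced here. As an illustration of the general approach only, the following sketch (a hypothetical replay_conversation function, not the author’s code) prints one conversation in chronological order using the key_remote_jid, key_from_me, remote_resource, timestamp, data and media_wa_type fields from Table 3; non-text messages are shown as a bracketed type label. The demonstration data is synthetic.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MEDIA_LABELS = {1: "image", 2: "audio", 3: "video", 4: "vCard",
                5: "location", 8: "call", 9: "document"}


def replay_conversation(conn, jid):
    """Yield one readable line per message in a conversation, oldest first."""
    rows = conn.execute(
        "SELECT key_from_me, remote_resource, timestamp, data, media_wa_type "
        "FROM messages WHERE key_remote_jid = ? ORDER BY timestamp", (jid,))
    for from_me, resource, ts, data, wa_type in rows:
        when = (EPOCH + timedelta(milliseconds=ts)).strftime("%Y-%m-%d %H:%M:%S")
        # Own messages are "me"; in groups remote_resource identifies the sender,
        # otherwise the other party is the phone number in the conversation JID.
        sender = "me" if from_me == 1 else (resource or jid.partition("@")[0])
        code = int(wa_type or 0)
        if code == 0:
            body = data or ""
        else:
            body = "[" + MEDIA_LABELS.get(code, f"type {code}") + "]"
        yield f"{when} UTC  {sender}: {body}"


# Synthetic stand-in for the messages table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (key_remote_jid TEXT, key_from_me INTEGER, "
             "remote_resource TEXT, timestamp INTEGER, data TEXT, media_wa_type TEXT)")
jid = "353881234567@s.whatsapp.net"
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)", [
    (jid, 1, None, 1492768805000, None, "1"),
    (jid, 0, None, 1492768800000, "Hello", "0"),
])
for line in replay_conversation(conn, jid):
    print(line)
```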

Figure 7 – Example output of the examine_msgstore.py script, some contact details obfuscated

4.    Experiments and Results

As part of the research for this paper, it was necessary to validate the findings across multiple devices. Five different test devices were procured, each running Android 7, and the WhatsApp application was installed on each. The devices were given to five different users, who were instructed to communicate via WhatsApp Messenger with each other and with third parties from their own contacts, using all the message types described in Section 3.3.2. After two weeks, the five devices were examined and the WhatsApp database backup was extracted from each. WhatsApp was then installed on five separate emulators, each activated using the phone number of one of the five devices. The key file was copied from each emulator to a local PC, and an attempt was made to decrypt the corresponding database backup using the method outlined in Section 3.2.

Each of the five databases was opened in DB Browser for SQLite, and its schema was examined to see whether it matched the database acquired earlier from the Android emulator. The databases were then parsed using the examine_msgstore.py proof-of-concept script to see whether the reconstructed conversations matched those shown in the WhatsApp application on each device. The results of the data acquisition, decryption, schema examination and conversation reconstruction experiments are displayed in Table 5.
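
The schema comparison described above can also be automated. The minimal sketch below, assuming both databases are readable SQLite files, collects each table’s column names and lists tables or columns present in the reference database but missing from the candidate, or vice versa. The function names schema and schema_differences are hypothetical; the demonstration uses two small in-memory databases.

```python
import sqlite3


def schema(conn):
    """Return {table: tuple of column names} for an open SQLite connection."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    result = {}
    for name in tables:
        try:
            result[name] = tuple(c[1] for c in conn.execute(f'PRAGMA table_info("{name}")'))
        except sqlite3.OperationalError:
            result[name] = ()  # e.g. a virtual table whose module is unavailable
    return result


def schema_differences(reference, candidate):
    """List human-readable differences between two schema dictionaries."""
    diffs = []
    for table in sorted(reference.keys() | candidate.keys()):
        if table not in candidate:
            diffs.append(f"missing table: {table}")
        elif table not in reference:
            diffs.append(f"extra table: {table}")
        else:
            for col in sorted(set(reference[table]) ^ set(candidate[table])):
                where = "missing" if col in reference[table] else "extra"
                diffs.append(f"{where} column: {table}.{col}")
    return diffs


# Synthetic reference (emulator) and candidate (device) databases.
emulator = sqlite3.connect(":memory:")
emulator.execute("CREATE TABLE messages (_id INTEGER, data TEXT, starred INTEGER)")
emulator.execute("CREATE TABLE message_thumbnails (thumbnail BLOB)")
device = sqlite3.connect(":memory:")
device.execute("CREATE TABLE messages (_id INTEGER, data TEXT)")
print(schema_differences(schema(emulator), schema(device)))
```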

Device                    Android ver.  Acquisition  Decryption  Schema  Conversation
Samsung Galaxy S7         7.0
Samsung Galaxy S8         7.0
HTC 10                    7.0
OnePlus 3                 7.1.1
Google Nexus 6P           7.1.1
Emulated Google Pixel XL  7.1.1

Table 5 – Experimental results

5.    Conclusions and Further Research

In this paper, several stages of the forensic analysis of WhatsApp on Android were examined: data acquisition, decryption, analysis and conversation replay. It was found that the research previously carried out by Anglano (2014) is still valid, with some slight changes, namely that the storage of message thumbnails has changed since his research was carried out. The empty tables in the WhatsApp database may indicate upcoming changes to the application, especially the ability to edit messages, the tracking of such edits, and the possible movement of vCards out of the messages table into their own table. As WhatsApp, and to a lesser extent Android, are constantly updated, it would be worthwhile to repeat these experiments regularly to ensure that the findings remain relevant.

6.    Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

7.    References

  • Anglano, C. Forensic Analysis of WhatsApp Messenger on Android Smartphones. Digital Investigation, 11(3), pp. 201-213. 2014
  • Hoog, A. Android Forensics: Investigation, Analysis, and Mobile Security for Google Android. Syngress Publishing. 2011
  • Lessard, J., Kessler, G.C. Android Forensics: Simplifying Cell Phone Examinations. Small Scale Digital Device Forensics Journal, 4(1), pp. 1-12. 2010
  • Mahajan, A., Dahiya, D.S., Sanghvi, H.P. Forensic Analysis of Instant Messenger Applications on Android Devices. International Journal of Computer Applications, 68(8), pp. 38-44. 2013
  • Sahu, S. An Analysis of WhatsApp Forensics in Android Smartphones. International Journal of Engineering Research, 3(5), pp. 349-350. 2014
  • Thakur, N.S. Forensic Analysis of WhatsApp on Android Smartphones. University of New Orleans Theses and Dissertations, Paper 1706. 2013

[1] https://blog.whatsapp.com/10000631/Connecting-One-Billion-Users-Every-Day

[2] https://www.europol.europa.eu/newsroom/news/global-action-tackles-distribution-of-child-sexual-exploitation-images-whatsapp-39-arrested-so-far

[3] https://www.whatsapp.com/security/

[4] http://www.bbc.com/news/technology-35969739

[5] http://www.whatsapp.com

[6] http://www.whatsapp.com

[7] Rooting is the process of allowing users of smartphones, tablets and other devices running the Android mobile operating system to attain privileged control (known as root access) over various Android subsystems. (Wikipedia)

[8] Android Package Kit (APK) is the package file format used by the Android operating system for distribution and installation of mobile apps and middleware. (Wikipedia)

[9] https://www.cdn.whatsapp.net/android/

[10] https://issuetracker.google.com/issues/37123176

[11] http://www.digitalinternals.com/security/decrypt-whatsapp-crypt12-database-messages/559/

[12] https://github.com/EliteAndroidApps/WhatsApp-Crypt12-Decrypter

[13] https://faq.whatsapp.com/android/20902622

[14] http://sqlitebrowser.org/

[15] January 1, 1970

[16] https://forum.xda-developers.com/showthread.php?t=1583021
