How to parse and extract data from Gmail to Google Sheets

Extracting data from Gmail messages and then routing that data to other automation steps is an easy task to do using Google Apps Script.

My bank sends transaction confirmation emails for each transaction in my account. I will parse and collect the expense data into a Google Sheet. By the end of the month, I will have a nice overview of all the expenses organized in a Google Sheet ready to process and analyze.

With a little customization for your own email messages, you can automate this process for your own requirements.

Create a standalone web app in Google Apps Script

For this project, we need to create a standalone web app in Google Apps Script. Go to https://script.google.com. Create a new script. Give a name to the script.

Filtering the emails

The search feature in Gmail is quite powerful and you can get to the right emails using the gmail search operators. Read more about the Gmail search operators in this article.

You can use the search operators to filter and get the transaction emails. For the purpose of this project, we need only the transaction confirmation emails and not the marketing emails from the bank.

function getRelevantMessages()
{
  var threads = GmailApp.search("from:citicorp.com AND subject:Transaction confirmation",0,10);
  var messages=[];
  threads.forEach(function(thread)
                  {
                    messages.push(thread.getMessages()[0]);
                  });
  return messages;
}

This function searches for messages from “citicorp.com” and with emails having subject containing “Transaction confirmation …”.
So the search function becomes:

GmailApp.search("from:citicorp.com AND subject:Transaction confirmation",0,10);

Note that I am passing the second and third parameter to the search() function. This is to limit the number of messages returned and to make the script run faster. If there are too many messages matching the criteria, the search will take a long time to fetch all the messages. Setting a limit makes it faster.

Let us display the messages returned by the search.

First, add an HTML template. Do: File → New and create a new HTML file. Let us name it “messages.html”

<!DOCTYPE html>
<html>
  <head>
    <base target="_top">
    <title>Message Display Test</title>
    <link rel="stylesheet" href="https://ssl.gstatic.com/docs/script/css/add-ons1.css">
  </head>
  <body style="padding:3em;">
  <h1>Messages</h1>
    <ul>
    <? for(var m=0;m<messages.length;m++){ ?>
    <li><?= messages[m].getSubject() ?></li>
    <p><?= messages[m].getPlainBody() ?></p>
    <? } ?>
    </ul>
  </body>
</html>

Now, here is the script to fetch and display the messages:

function getRelevantMessages()
{
  var filter = "from:citicorp.com AND subject:Transaction confirmation";
  var threads = GmailApp.search(filter,0,10);
  var messages=[];
  threads.forEach(function(thread)
                  {
                    messages.push(thread.getMessages()[0]);
                  });
  return messages;
}

function getMessagesDisplay()
{
  var templ = HtmlService.createTemplateFromFile('messages');
  templ.messages = getRelevantMessages();
  return templ.evaluate();  
}

You can customize the search filter so that it fetches the right email messages.

Do File → Save.

Go to the menu item: Publish → Deploy as web app.

Then provide the permissions. At the end of the sequence, it will give you the link to open the script in the browser. Open the link. (click on the link for the latest development version)

The web page should display the messages.

Parse the message

The next step is to parse and extract the data we are interested in, from the message. Regular expression is one of the easily available feature in such situations.

There is one tool that will make it quite easy to build the regular expression. It is regexr.com.

parse using regular expression

parse using regular expression

Copy and paste the message text to the text area of the tool and then compose the regular expression step-by-step. The tool will show the matches live. This makes it easier to build the regular expression.

Once the regular expression is ready, parsing out the data is easy.

function parseMessageData(messages)
{
  var records=[];
  for(var m=0;m<messages.length;m++)
  {
    var text = messages[m].getPlainBody();
    
    var matches = text.match(/Rs\.\s+([\d\,\.]+)\s+(?:spent on card)\s+(\d+)\s+(?:on)\s+([\d\-A-Z]+)\s+(?:at)\s+([\w\s]+)\./);
    
    if(!matches || matches.length < 5)
    {
      //No matches; couldn't parse continue with the next message
      continue;
    }
    var rec = {};
    rec.amount = matches[1];
    rec.card = matches[2];
    rec.date= matches[3];
    rec.merchant = matches[4];
    
    //cleanup data
    rec.amount = parseFloat(rec.amount.replace(/,/g, ''));
    
    records.push(rec);
  }
  return records;
}

Let us verify that the parsing actually gets the correct data.

Create another HTML file and name it parsed.html. We will send the parsed records to the HTML template and display it.

<!DOCTYPE html>
<html>
  <head>
    <base target="_top">
    <title>Data parsed from emails</title>
    <link rel="stylesheet" href="https://ssl.gstatic.com/docs/script/css/add-ons1.css">
  </head>
  <body style="padding:3em;">
    <h2>Parsed data from Messages</h2>
    <table style="width:420px">
    <thead>
    <tr>
    <th>Date</th><th>Merchant</th><th>Card</th><th>Amount</th>
    </tr>
    </thead>
    <tbody>
    <? for(var m=0;m<records.length;m++){ ?>
    <tr>
    <td><?= records[m].date ?></td>
    <td><?= records[m].merchant ?></td>
    <td><?= records[m].card ?></td>
    <td><?= records[m].amount ?></td>
    </tr>
    <? }?>
    </tbody>
    </table>
    
  </body>
</html>

Here is the script:

function parseMessageData(messages)
{
  var records=[];
  for(var m=0;m<messages.length;m++)
  {
    var text = messages[m].getPlainBody();
    
    var matches = text.match(/Rs\.\s+([\d\,\.]+)\s+(?:spent on card)\s+(\d+)\s+(?:on)\s+([\d\-A-Z]+)\s+(?:at)\s+([\w\s]+)\./);
    
    if(!matches || matches.length < 5)
    {
      //No matches; couldn't parse continue with the next message
      continue;
    }
    var rec = {};
    rec.amount = matches[1];
    rec.card = matches[2];
    rec.date= matches[3];
    rec.merchant = matches[4];
    
    //cleanup data
    rec.amount = parseFloat(rec.amount.replace(/,/g, ''));
    
    records.push(rec);
  }
  return records;
}


function getParsedDataDisplay()
{
  var templ = HtmlService.createTemplateFromFile('parsed');
  templ.records = parseMessageData(getRelevantMessages());
  return templ.evaluate();
}

function doGet()
{
  return getParsedDataDisplay();
}

Note: I didn’t parse the date to a Date() object. This is to keep the code simple. It will take only a few extra steps to parse the date.

gmail parsed data

Save the data to a Google Sheet

Now that we have the data extracted, the next step is to save the data to a Google Sheet.

function saveDataToSheet(records)
{
  var spreadsheet = SpreadsheetApp.openByUrl("https://docs.google.com/spreadsheets/d/1ql-wf1OMoZ5zGb");
  var sheet = spreadsheet.getSheetByName("Sheet1");
  for(var r=0;r<records.length;r++)
  {
    sheet.appendRow([records[r].date,records[r].card, records[r].merchant, records[r].amount ] );
  }
  
}

function processTransactionEmails()
{
  var messages = getRelevantMessages();
  var records = parseMessageData(messages);
  saveDataToSheet(records);
}

The steps are pretty straight forward. Open the spreadsheet, then append the rows to the sheet.

Preventing the same message from getting processed again and again

If you call processTransactionEmails() multiple times, it will keep adding the same rows multiple times.

One way to prevent that from happening is to add a label to the processed emails.

function labelMessagesAsDone(messages)
{
  var label = 'payment_processing_done';
  var label_obj = GmailApp.getUserLabelByName(label);
  if(!label_obj)
  {
    label_obj = GmailApp.createLabel(label);
  }
  
  for(var m =0; m < messages.length; m++ )
  {
     label_obj.addToThread(messages[m].getThread() );  
  }
  
}

function processTransactionEmails()
{
  var messages = getRelevantMessages();
  var records = parseMessageData(messages);
  saveDataToSheet(records);
  labelMessagesAsDone(messages);
  return true;
}

function getRelevantMessages()
{
  var filter = "from:citicorp.com AND subject:Transaction confirmation AND -label:payment_processing_done";
  var threads = GmailApp.search(filter,0,10);
  var messages=[];
  threads.forEach(function(thread)
                  {
                    messages.push(thread.getMessages()[0]);
                  });
  return messages;
}

We add the label payment_processing_done to the messages that are processed. Notice that we updated the search filter -label:payment_processing_done .

This means get the messages, but without the label “payment_processing_done”.

Processing the emails as it arrives in Gmail

In order to completely automate it, this script has to keep running whenever there is a new email. However, there is no easy way to trigger the script when a new email arrives.

The other alternative is to trigger this script on a timer. Let’s run this script every hour.

For each run, we have to check for the emails received in the last one hour. However, there is no direct search filter for that.

This Gmail search: newer_than:1d will get emails in the last one day. Combine it with the other operators and we have a sufficiently narrow filter.

So the updated filter becomes:

var filter = "newer_than:1d AND from:citicorp.com "+
" AND subject:Transaction confirmation "+
" AND -label:payment_processing_done";

Timer Trigger

The next step is to trigger this script every 1 hour. We have to add a timer trigger. Go to the menu item: Edit → Current projects triggers

Click on the new trigger button.

Then create a timer trigger. Select the “processTransactionEmails” function for “the function to run”.

gmail check email timer

See also

Leave a Reply 0 comments

Leave a Reply: