CART-ORP: Predicting Email Open Rate


The question of “what leads to a better open rate” has long troubled marketers. Broadly, three factors affect open rates: campaign timing, campaign list, and subject line. Of these, the subject line is the only factor that the marketer can truly control. Marketers typically resort to A/B testing to find out which subject line works better for a given objective. However, A/B testing does not draw on learnings from past campaigns, and it gives the marketer no recommendations for improving the open rate. Cart-ORP uses those historical learnings and provides a platform for testing the predicted open rate before an email is sent. This gives the marketer an opportunity to tweak the subject line before executing a campaign and achieve higher open rates.

Cart-ORP: Our Approach

Cart-ORP considers historical subject lines and open rates, and builds a random forest model on a set of automatically extracted variables created by parsing the subject lines using text mining techniques. When a fresh subject line is entered in the interface, its open rate is predicted by scoring it on the pre-built model. The basis of our modelling approach is that “a subject line is an ordered set of features”, which means that every word or set of words represents a feature of the subject line.


Depending on the business domain (financial institution, retail, QSR, etc.), we create an exhaustive list of features that could be used in email subject lines. Over 30 such features have been identified for building the predictive model. The next and most critical step is to create a lexicon that contains the word-to-feature mapping.
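To make the word-to-feature mapping concrete, here is a minimal sketch of how a lexicon could turn a subject line into an ordered set of features. The lexicon entries, the feature labels, and the `parse_features` helper are all hypothetical and chosen only for illustration; the actual Cart-ORP lexicon is domain specific and is not shown in this article.

```python
import re

# Hypothetical lexicon: each word maps to one feature label.
# These entries are illustrative only, not the actual Cart-ORP lexicon.
LEXICON = {
    "free": "OFFER",
    "discount": "OFFER",
    "today": "URGENCY",
    "hurry": "URGENCY",
    "new": "NOVELTY",
    "you": "PERSONAL",
    "your": "PERSONAL",
}

def parse_features(subject_line, lexicon=LEXICON):
    """Return (position, feature) pairs in the order they appear.

    Position is the 1-based index of the word within the subject line.
    Words that are not in the lexicon are skipped.
    """
    words = re.findall(r"[a-z0-9']+", subject_line.lower())
    return [(i, lexicon[w]) for i, w in enumerate(words, start=1) if w in lexicon]

features = parse_features("Hurry! Your free gift ends today")
print(features)
```

Keeping the position alongside each feature preserves the "ordered set" view of the subject line, so position-based variables can be derived later.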

Cart-ORP: Predictive Modelling

Two types of variables were considered for the predictive model.

1. Syntactical Variables: number of words, length in characters, presence of special characters, etc.

2. Feature Variables: the subject line is automatically parsed into a set of features using text mining algorithms (as illustrated in the above image). A feature has two attributes:

a. Position: based on the position within the subject line

b. Effectiveness in driving email opens: represented by creating a number of scores for the feature
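The syntactical variables listed above are simple to compute. The sketch below shows one possible way to extract them; the function name, the dictionary keys, and the choice of ASCII punctuation as the definition of "special characters" are assumptions made for illustration.

```python
import string

def syntactic_variables(subject_line):
    """Compute simple syntactical variables for one subject line.

    Assumption: "special characters" means ASCII punctuation, and words
    are separated by whitespace.
    """
    words = subject_line.split()
    return {
        "num_words": len(words),
        "num_chars": len(subject_line),
        "has_special_char": any(ch in string.punctuation for ch in subject_line),
    }

syn = syntactic_variables("Save 20% on your next order!")
print(syn)
```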

The effectiveness of the features in driving open rates is captured in a feature master table that holds historical scores for all the features. For each feature, four types of scores are considered:

  • Simple score: based on the open rate when the feature is present
  • Push score: accounts for the impact of a feature in making the open rate higher than the median open rate for the category
  • Count score: accounts for the frequency of use of the feature
  • Interaction score: accounts for the impact of a feature in pushing the open rate, considering all feature combinations
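The exact formulas behind these scores are not given here, so the following is only a sketch under stated assumptions: the simple score is taken as the mean open rate of campaigns containing the feature, the push score as the share of those campaigns that beat the category median, and the count score as the number of campaigns using the feature. The interaction score is left out because its definition depends on how feature combinations are handled. The sample history and the `feature_scores` helper are hypothetical.

```python
from statistics import mean, median

# Hypothetical history: (features present in the subject line, observed open rate).
history = [
    ({"OFFER", "URGENCY"}, 0.30),
    ({"OFFER"}, 0.20),
    ({"PERSONAL"}, 0.10),
    ({"URGENCY", "PERSONAL"}, 0.25),
]

def feature_scores(history):
    """Build a small feature master table from past campaigns.

    Assumed definitions (not confirmed by the article):
      simple = mean open rate when the feature is present
      push   = fraction of those campaigns above the category median
      count  = number of campaigns that used the feature
    """
    category_median = median(rate for _, rate in history)
    all_features = set().union(*(feats for feats, _ in history))
    table = {}
    for feature in sorted(all_features):
        rates = [rate for feats, rate in history if feature in feats]
        table[feature] = {
            "simple": mean(rates),
            "push": sum(r > category_median for r in rates) / len(rates),
            "count": len(rates),
        }
    return table

scores = feature_scores(history)
for name, row in scores.items():
    print(name, row)
```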

In addition to the above four scores, we created the following variables for use in the predictive model:

  • Feature Combination Index: accounts for the differential effect of combining features
  • Position-wise Feature Index: accounts for the different pull of a position
  • % of Good Words: accounts for the differential impact of words within a feature
  • % of Good Words in the first 5 words
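As an illustration of the last two variables, the sketch below computes the percentage of good words over the whole subject line and over its first five words. What counts as a "good word" is not defined in this article, so the `GOOD_WORDS` set and the `pct_good_words` helper are hypothetical placeholders.

```python
import re

# Hypothetical set of "good words"; in practice this would presumably be
# derived from historical performance rather than hard-coded.
GOOD_WORDS = {"free", "exclusive", "new"}

def pct_good_words(subject_line, good_words=GOOD_WORDS, first_n=None):
    """Percentage of words that are good words, optionally limited to the first_n words."""
    words = re.findall(r"[a-z0-9']+", subject_line.lower())
    if first_n is not None:
        words = words[:first_n]
    if not words:
        return 0.0
    return 100.0 * sum(w in good_words for w in words) / len(words)

line = "Exclusive offer: free shipping on all new arrivals this week"
overall = pct_good_words(line)
first_five = pct_good_words(line, first_n=5)
print(overall, first_five)
```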

A random forest model is built using the syntactical variables and the feature scores for the features present in the subject line.

Every time a new subject line is entered, the modelling variables are automatically extracted by parsing the subject line using text mining techniques. The open rate is then predicted by scoring on the pre-built model. In both ‘out of sample’ and ‘out of time’ testing, the Cart-ORP model shows a very high degree of accuracy, as measured by mean absolute error. The next version of Cart-ORP also contains a diagnostic of the entered subject line in terms of the presence and power of the features it contains.
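For reference, mean absolute error is the average of the absolute differences between actual and predicted open rates. The sketch below shows the calculation on made-up numbers; the values are purely illustrative and are not Cart-ORP results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative open rates in percent (made-up values, not real results).
actual_open_rates = [22.0, 18.5, 30.0, 12.0]
predicted_open_rates = [20.0, 19.5, 27.0, 14.0]

mae = mean_absolute_error(actual_open_rates, predicted_open_rates)
print(mae)
```

Because the error is expressed in the same units as the open rate, it can be read directly as "how many percentage points off, on average".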

