Month: November 2012

Excel VBA Macro Code to Find Word-Level N-Grams in a Text Entry

Excel has a lot of quantitative power, but it has always had issues with parsing text.  While there are the basic text functions (left, mid, right) and a useful substitution function, sometimes something that seems like it should be easy isn’t.  As a case in point, if you want to extract individual words from text, while it can be done with the Text to Columns feature using the space as a delimiter, it is not very easy to do with a formula, because you basically need to find where a certain space is within the text, which means counting where the previous spaces are, which requires a loop.  Which then requires VBA.  So, I did a little VBA coding tonight, and tried to make a function that not only can find single words, but also phrases, as n-grams.  I have now used this to do some qualitative analysis of text, where combining these functions with Pivot Tables, I have been able to have Excel be a decent basic qualitative analysis tool for text.

Here is the VBA file for the code, since when I tried to post the code inline with this message, my WordPress host thought I was trying to do some sort of injection attack or something.  Also you can find an example of how I used the code in the analysis I recently did of Literature and Medicine college courses.

Update: I have improved the efficiency of the code by using VBA’s split function instead of using a lot of extra loops.