Integrating Python into Excel for Web Scraping, Advanced Functions, and More

For some time I’ve been wanting to learn Python.  It seems to be one of the best rapid-development open source languages to do all types of scripting activities and integration between different languages.  It also is very commonly used for scraping and advanced data processing.  And I also have realized from my past, that I want to invest my time in learning things that have long-term usability, and proprietary software, in general, does not necessarily have this benefit.

But I still want to use Excel.  While there are open source data analysis tools that are out there, they either don’t have as many features as Excel (as unfortunately Open & Libre Office don’t), or I haven’t learned them (as some of the open source math packages and languages are).  The feature problem is especially the case with data presentation, Excel can make nice looking charts, diagrams, etc, while most of the others take much more work to do these things.  (Maybe some day I will be able to help with the Open Source projects, and make them better, but for right now, I don’t have the time.)

So, my next incremental step to moving towards using more open source software, is to start to do any advanced programming for Excel spreadsheets in Python instead of VBA.  This is especially relevant to me right now, as there are many spreadsheets that I have, that I need to do web scraping for, and while I have learned to do some of this in VBA, I want to invest my energies into learning Python, which has a lot more code pre-written to do this also.

I also want to be able to create Excel files that can be easily distributed.  And this is where the rub comes.  There are several methods that have been developed to integrate Python and Excel, including using a DLL file, 2 Add-ins that have been developed, and the traditional method of using a Python COM server.  Generally all of these require external files, and sometimes external installations.  But I think I can solve this problem by embedding Python Scripts as OLEObjects and call these from Excel using VBA.   These can then setup the COM Server, and do everything else necessary.

While I’m sure I will run into some walls along the way, I think the idea is sound, and as I develop this more, I will share the VBA code that can be used to do this technique.

One Comment

  1. Pingback: My Python Learning Plan at Jacob J. Walker's Blog

Leave a Comment

Your email address will not be published. Required fields are marked *