Excel VBA Code to Scrape California State Job Openings

I have become interested over time in web scraping and potentially using this to do a massive job market study to determine what education should truly focus on to ensure that we are providing the right skills to our students.   While I’d ultimately like to do this with a website like Indeed, and probably would use something like Python and PostgreSQL, I did a quick VBA script in Excel this morning to scrape the California Job Openings, putting a new opening per sheet, and I thought I’d share the code.

[code language=”vb”]
Sub Scrape_California_Job_Openings()

Start_RecNo = 552090
End_RecNo = 501000

For Counter = Start_RecNo To End_RecNo Step -1
Sheets.Add.Name = Counter
ActiveWorkbook.Sheets(CStr(Counter)).Activate

With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://jobs.spb.ca.gov/spb1/vpos1/vadmin_moreinfo.cfm?recno=" &; CStr(Counter), _
Destination:=Range("$A$1"))
.Name = CStr(Counter)
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "2"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
Worksheets(CStr(Counter)).Visible = False
Next Counter
End Sub
[/code]

I should note that this code hides the worksheet after scraping to it, as to try an not have so many worksheets that Excel slows to such a point it basically crashes…  I’m testing the code out now, to see if this can work, and if I can then use some more VBA to go through each hidden sheet, and extract parts out to a single table.

Leave a Comment

Your email address will not be published. Required fields are marked *