Month: July 2012

Best Currently Known Practice for Integrating Access Databases for Analysis with MS SQL

Currently at my school, we have a need to pull data from software that uses a Microsoft Access back-end.  One of the challenges of this, is that we can not tinker generally around with the MS Access database directly, because it will void any support we want to get, and it usually stops the product from working.   So, I’ve wanted to be able to have the information go into our MS SQL server, so I can access it with our Meta-Database (I talked a little about this idea in my previous post.)

After working on this problem for 20+ hours, and going down a lot of wrong paths, I found a solution that works.  To possibly help others in the same situation, I am writing this blog entry, to not only hopefully help you quickly to get to a solution, but at the same time to help you know why the other paths didn’t work for me, so that you can avoid them, or if you decide to tread down them, that you have some forewarning… 🙂

Final Solution: Linked Servers with Microsoft OLE DB Provider for ODBC and OpenQuery

After going down a lot of bad roads, I found that by creating an ODBC connection to my Access Database, and then using the Microsoft OLE DB Provider for ODBC to attach to that ODBC driver, I was able to get to my data.  But for some reason, I could not still directly query the data, but I could use the OpenQuery command to extract the data, and then use this to make the mirror views in my Meta-Database.

Step 1 – Setup ODBC

I made a System DSN that pointed to my access file.  Since my access file was on another server, I had a hard time typing in the UNC to get to it.  But I was able to get around this by pointing to it locally at first, and then doing a registry hack to point the UNC later.   I also found that if chose “read-only” in the ODBC driver, it screwed things up (I forget the exact error message off-hand, but it took a bit of searching forums to find that this caused problems.)

Step 2 – Setup the Linked Server

I used the following code to create my linked server

EXEC sp_addlinkedserver 
    @server = N'Name_to_use_as_Linked_Server', 
    @provider = N'MSDASQL', 
    @srvproduct = N'Access',
    @datasrc = N'ODBC_Name',
    @catalog = N'Path_to_MDB'
GO
EXEC sp_addlinkedsrvlogin 
    @rmtsrvname = N'Name_to_use_as_Linked_Server',
    @useself = N'FALSE',
    @locallogin = NULL, 
    @rmtuser = NULL, 
    @rmtpassword = N'Database_Password' 
GO

In this code the Name_to_use_as_Linked_Server should be whatever name you want to reference the database with later in MS SQL.   For simplicity sake, I generally name my ODBC connection and Linked Server, the same as the file name of the MDB.  So if I was connecting to sample.mdb, I would call all of those sample.

The @rvproduct apparently doesn’t really matter,   But the ODBC_Name does, although again, if you just use the same name as the MDB then this is easy.   An interesting thing to note is that the Path_to_MDB does not need to be what is defined in the ODBC driver.   So in my case I actually had multiple Linked Servers to Multiple MDBs all using the same ODBC connection (and so I actually broke my rule about having the ODBC driver having the same name as the Linked Server.)

I also was working with a password protected MDB, which made me go down a lot more wrong paths, as I figured that part out.  But by leaving the @rmtuser being NULL and then just having a @rmtpassword, it worked fine.  I suspect that if there were user accounts and passwords on the MS Access files, then the @rmtuser would need to be one of them, and you might run into issues with the System.mdw

Step 3 – Create the Mirror Views using OpenQuery

For some reason, I could never get my queries to work directly against the MS Access linked server.  I got a variety of errors, that all were ambiguous, and that I couldn’t track down a solution from any web searches.  But I found that if I used the OpenQuery command to run the query through the MDB instead of running it through MS SQL, that it actually would return what I was looking for, and by putting this into a view, I could then treat it as a normal MS SQL table (for the most part) in the future.

Here was the syntax I used to make the Mirror Views:

Use Database_Name
GO
CREATE VIEW View_Name AS 
SELECT * FROM OPENQUERY(Name_of_Linked_Server, 'SELECT * From Table_Name')
GO

Where Database_Name is the name of the Meta-Database, and View_Name is the name of the mirror View.  I use naming convention of LinkedServer_Table, thus if I had a linked server named EmployeeDB, and a table named Employees, I would name the Mirror View: EmployeeDB_Employees

So, doing these steps, the process worked, and I’m happy, because it does not seem to conflict with the third party software that uses the MS Access back-end.   And all the updates are real-time, so that when the third party software changes something, the mirror views can instantly see these changes.  Next, lets look at all the other paths I explored before coming to this solution.

Read the full article to find out about methods that were considered but not used

Best Known Practice for Integrating Medium Sized Databases for Analysis with MS SQL

I am always hesitant to use the word “Best Practice”, and even “Best Known Practice” is a little wrong.  But this is the best practice I know of, given the situation that I’m in.

While I’m aware that data-warehousing techniques might be better, the following is a fairly easy way to integrate many databases together, at least for the purposes of extracting and analyzing data.  They key is to make a “Meta-Database” that, for the most part, only contains Views.

Most of the views are what I’m calling “Mirror Views”  (I hope if there is a term that has already been coined, someone tells me, so I can use that word instead.)  Mirror Views are simply a View that Selects * from an equivalent table in another database.   By using these views, I can write additional views to work with them without having to write in the original database, thus I do not void any support options from the original database vendor.

In SQL Server, I use either sp_tables (for databases hosted on the server) or sp_tables_ex  (for linked servers), to retrieve the name of all the tables that I want to make Mirror Views from.   I am pretty sure I could do some sort of T-SQL set of commands to convert these into views, but my T-SQL isn’t that high, so I copied the output into a CSV, then into a spreadsheet and actually wrote formulas to make the Views in the form DBName_TableName in my Meta-Database.  (I would greatly appreciate if any readers who can share some code to do this in T-SQL, as I know the spreadsheet is probably not a best practice, but it gets the job done!)

This technique works easy for databases that are hosted in the SQL server, but for other databases hosted either on other servers, or that are on different platforms (such as MS Access, MySQL, etc.), then using a linked server is the way to go.  BUT, be forewarned, linked servers are not always as easy to setup as one would hope.   I will be writing more about this in a future posting, giving what I have learned through the school of hard knocks.

I am trying to move towards using Free and Open Source software more often, and so if anyone knows how to do something like linked servers in MySQL or Postgre, please let me know, as I’d like to use that solution in the future.

Obfuscation of PII in Released Databases via Randomization

For the past year, my school, Twin Rivers Adult School, has been using an Excel spreadsheet I created as the main database for administering Pell Grants.  While Excel was a good tool to get our system up and running right away, it is not a good long term solution, and a database solution needs to be implemented.  As I talked about in other posts, the commercial solutions are expensive, and do not really meet our needs as a clock-hour based school.  Thus, this summer, myself and Steve Jensen, our Office Technician instructor, are having a special course for our advanced students to work with us to develop a new database.

This I believe will be a good “win-win”.  The students will get real experience working with a real development project, and get some specialized knowledge in Federal Student Aid, which in my opinion is an untapped vertical market.  Further, they will be able to earn lower-division credits from our school, and the adult school will be able to partially reduce its development cost.  (Although, these types of projects are never “free labor” on the part of the students.  There will be an extra investment of time by me and Steve to help the students, and while we hope this time will be less than the equivalent time we would need to do the development ourselves, we are not assured of this.)

But, one of the critical components for us as a school is security.  While we will have our students take a pledge and sign an agreement to not share any personally identifiable information (PII) they should incidentally contact, I still did not want the original data with PII to get disbursed via copying, etc.   So I created a spreadsheet that helped me to randomly assign PII in place of the real information, such as having fake names, emails, etc.   This way I can distribute the real Pell grant spreadsheet to the students, so they can work with real data, and see all the real scenarios the database will need to create, but at the same time, they don’t receive any real personal information about students.

I have placed this randomizing names spreadsheet online, for other database administrators and researchers who wish to obfuscate PII to be able to use.   Currently the spreadsheet does a good job with first and last names, and I hope to improve the middle name algorithm in the future, and also to define the probability distribution function of the the weighted method I have of determining first names and last names.