08 septiembre 2012

Strange Error Loading Excel Files (xlsx) using SQL Server Integration Services

After some months without blogging (too many things happening lately), I’m coming back because I tried to get some answer to my problem in the Internet without any success. Thus, here I am sharing this in order to save you some waste time debugging.

Some days ago, I spent almost 6 hours trying to find “a simple” error (and the solution) related to a SSIS 2008R2 package that loads data from an Excel file and writes it to a SQL Server table without any mayor transformation. The strange thing here was that the package was working correctly in our development environment and “erratically” in our test environment.

The situation was:

  • The SSIS package witch loads the Excel file shown below (Classicare Rx Network 2012 Q2.xlsx), runs correctly in BIDS and when executing it through a SQL Server Agent Job using the current user identity (Proxy Account). The package is using the 32bits version of the ACE Provider.

image

image

  • The SQL Server Agent Job fails when running the SQL Agent Job using the identity of our ETLAccount (a standard domain account created for the SSIS packages execution). The error message indicates that the Excel file cannot be reached as you can see below:

image  image image image

 
“Message Executed as user: NAGNOILAP030-W8\ETLSystem_SQLAgentProxy. Microsoft (R) SQL Server Execute Package Utility  Version 10.50.1600.1 for 32-bit  Copyright (C) Microsoft Corporation 2010. All rights reserved.    Started:  12:10:44 AM  Error: 2012-09-06 00:10:46.53     Code: 0xC0202009     Source: Data Flow Task Excel Source [1]     Description: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.  End Error  Error: 2012-09-06 00:10:46.53     Code: 0xC02020E8     Source: Data Flow Task Excel Source [1]     Description: Opening a rowset for "RxNetwork$" failed. Check that the object exists in the database.  End Error  Error: 2012-09-06 00:10:48.16     Code: 0xC004706B     Source: Data Flow Task SSIS.Pipeline     Description: "component "Excel Source" (1)" failed validation and returned validation status "VS_ISBROKEN".  End Error  Error: 2012-09-06 00:10:48.16     Code: 0xC004700C     Source: Data Flow Task SSIS.Pipeline     Description: One or more component failed validation.  End Error  Error: 2012-09-06 00:10:48.16     Code: 0xC0024107     Source: Data Flow Task      Description: There were errors during task validation.  End Error  DTExec: The package execution returned DTSER_FAILURE (1).  Started:  12:10:44 AM  Finished: 12:10:48 AM  Elapsed:  3.261 seconds.  The package execution failed.  The step failed.”

  • However, the strange thing here was that we have many other packages working correctly (to load some manual dimensions without an existing formal data source), including when the SQL Job is configured to use the ETLAccount credentials. For example, the AdjudicationStatus_Medical.xlsx file shown below is one of those files:

image

image

  • The second, strange  thing was that the failing package executed successfully using a local administrator account:

image image

I reviewed several post about using the Excel OLEDB 12 provider (including those about how to use it in 32bits), without getting a solution to my problem. I also found this very interesting post from Dougbert’s Blog entitled “Excel driver now supported on server for use by SSIS”. Beyond the Microsoft official support for Excel running on SSIS servers, the most important part for me was: “…ACE uses the impersonated user’s Windows temp folder to read-write its data. Therefore if your application is using impersonation with an account that does not have a profile on the server (not an uncommon situation), then ACE will not be able to create its temp files.”

So, according to this post, the ACE driver (used by Excel and Access) requires that a user profile exists in the server for the user (identity) running the Excel packages, but in this case we don’t have a user profile for the ETLAccount. At this point I was more confused: how my other packages work correctly and the one for the Classicare..xlsx don’t?

To understand more the mystery, I did try and error tests to check what was causing the package to fail when it was running using the SQL Server Agent Job and the ETLAccount credentials. I created several Excel files with different number of columns and rows.

Several hours later, I noticed that the problem was not related to the file format nor the number of rows (directly). As you can see in the table below (file with only 3 columns), I was able to reproduced the error when my files were more than 1.7Mb in size approximately:

Rows Size (MB) Job Execution Status
65,000 3.60 Failed
35,000 (and more) 1.79 Failed
33,000 1.69 Failed
32,801 1.68 Failed
32,601 1.67 Failed
32,511 1.66 Success
32,501 1.66 Success
32,000 1.64 Success

image

To validate my hypothesis, I deleted some random rows to my original file (Classicare…xlsx) to cut if down to 1.66Mbs and it worked correctly! Having 1.67Mb or more, the Job was failing as expected.

After this findings, I reproduced the server environment in my own laptop just to test if creating the profile for the user account running the SQL Jobs works, but unfortunately it didn’t. Maybe an additional requirement is needed for this to work, but I got only this work, granting local administrator privileges to the user account.

Some days later, I found this interesting post entitled Excel Imports to SQL Server using Linked Servers, where the author states that any 32bit OLE DB Provider require access permissions to the SQL Server temp folder for the user account who run the query. Thus, I granted permissions to my temp folder (C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Temp) to my ETL account. Again, no good results.

After all my (unsuccessful) tries, my conclusions at this point are:

  • If the account (credential) running your packages has not local admin permissions, you are limited to work with relative small files in size (1.66Mb maximum in my case). The maximum file size could be different in your system.
  • Running the same tests in my laptop, I needed to reduce the file size to 1.28Mb to get it work. This tells me that the exact maximum file size could vary depending on your hardware configuration.
  • After exceeding the size threshold in an Excel file, the ACE Provider needs some permission to write/read some temporary data (it looks it is more than a user profile). If this criteria is not met, you will get strange error messages about problems connecting to the file.
  • My package was failing because it was more than 1.66Mb (12Mb) and the SSIS package execution account was not a local administrator of the machine.
  • Even Microsoft Access is not supported in an enterprise environment (PowerPivot for Sharepoint cannot automate data refresh jobs from Access). I think this should be the same reason for what Excel 2010 is so picky for data loading automation.

My recommendation is:

  • Avoid loading Excel files in a production / automated environment as possible, try comma separated files (CSV) or any other text file format.

Back in my project, we finally “solved” the problem, saving the file as CVS and changing the data source provider of my package from Excel to Flat File. Everything worked correctly this time even running in 64bits.

If you want to share any other information from your experience working with Excel files, feel free to comment.

Alan