After some months without blogging (too many things happening lately), I’m coming back because I searched the Internet for an answer to my problem without any success. So here I am, sharing this to save you some wasted debugging time.
Some days ago, I spent almost 6 hours trying to find “a simple” error (and its solution) related to an SSIS 2008R2 package that loads data from an Excel file and writes it to a SQL Server table without any major transformation. The strange thing was that the package worked correctly in our development environment and “erratically” in our test environment.
The situation was:
- The SSIS package, which loads the Excel file shown below (Classicare Rx Network 2012 Q2.xlsx), runs correctly in BIDS and when executed through a SQL Server Agent Job using the current user’s identity (Proxy Account). The package uses the 32-bit version of the ACE Provider.
- The SQL Server Agent Job fails when it runs under the identity of our ETLAccount (a standard domain account created for executing the SSIS packages). The error message indicates that the Excel file cannot be reached, as you can see below:
“Message Executed as user: NAGNOILAP030-W8\ETLSystem_SQLAgentProxy. Microsoft (R) SQL Server Execute Package Utility Version 10.50.1600.1 for 32-bit Copyright (C) Microsoft Corporation 2010. All rights reserved. Started: 12:10:44 AM Error: 2012-09-06 00:10:46.53 Code: 0xC0202009 Source: Data Flow Task Excel Source  Description: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. End Error Error: 2012-09-06 00:10:46.53 Code: 0xC02020E8 Source: Data Flow Task Excel Source  Description: Opening a rowset for "RxNetwork$" failed. Check that the object exists in the database. End Error Error: 2012-09-06 00:10:48.16 Code: 0xC004706B Source: Data Flow Task SSIS.Pipeline Description: "component "Excel Source" (1)" failed validation and returned validation status "VS_ISBROKEN". End Error Error: 2012-09-06 00:10:48.16 Code: 0xC004700C Source: Data Flow Task SSIS.Pipeline Description: One or more component failed validation. End Error Error: 2012-09-06 00:10:48.16 Code: 0xC0024107 Source: Data Flow Task Description: There were errors during task validation. End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 12:10:44 AM Finished: 12:10:48 AM Elapsed: 3.261 seconds. The package execution failed. The step failed.”
- However, the strange thing was that we have many other packages (which load some manual dimensions that have no formal data source) working correctly, even when the SQL Job is configured to use the ETLAccount credentials. For example, the AdjudicationStatus_Medical.xlsx file shown below is one of those files:
- The second strange thing was that the failing package executed successfully under a local administrator account:
I reviewed several posts about using the Excel OLEDB 12 provider (including those about how to use it in 32-bit mode) without finding a solution to my problem. I also found this very interesting post from Dougbert’s Blog entitled “Excel driver now supported on server for use by SSIS”. Beyond the official Microsoft support for Excel running on SSIS servers, the most important part for me was: “…ACE uses the impersonated user’s Windows temp folder to read-write its data. Therefore if your application is using impersonation with an account that does not have a profile on the server (not an uncommon situation), then ACE will not be able to create its temp files.”
So, according to this post, the ACE driver (used by Excel and Access) requires a user profile to exist on the server for the user (identity) running the Excel packages, but in this case we don’t have a user profile for the ETLAccount. At this point I was even more confused: how could my other packages work correctly while the one for Classicare..xlsx did not?
To understand the mystery better, I ran trial-and-error tests to find out what was causing the package to fail when it ran through the SQL Server Agent Job with the ETLAccount credentials. I created several Excel files with different numbers of columns and rows.
Several hours later, I noticed that the problem was not related to the file format or (directly) to the number of rows. As you can see in the table below (a file with only 3 columns), I was able to reproduce the error when my files were larger than approximately 1.7 MB:
| Rows | Size (MB) | Job Execution Status |
|---|---|---|
| 35,000 (and more) | 1.79 | Failed |
To validate my hypothesis, I deleted some random rows from my original file (Classicare…xlsx) to cut it down to 1.66 MB, and it worked correctly! At 1.67 MB or more, the Job failed as expected.
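If file size really is the trigger, one way to catch risky files before the Job runs is a small pre-flight check. The following is only a sketch in Python, not part of the original package: the function names are hypothetical, the 1.66 MB default is simply the limit I measured on my test server, and I am assuming 1 MB = 1024 * 1024 bytes. Measure your own limit before relying on it.

```python
from pathlib import Path

# Assumption: limit measured on the author's test server; it differed on
# another machine, so treat this value as a placeholder.
THRESHOLD_MB = 1.66
# Assumption: "MB" means 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def size_in_mb(path):
    """Return the size of the file at path in MB."""
    return Path(path).stat().st_size / BYTES_PER_MB


def files_over_threshold(folder, threshold_mb=THRESHOLD_MB):
    """Return (file name, size in MB) for each .xlsx file above threshold_mb."""
    flagged = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        mb = size_in_mb(path)
        if mb > threshold_mb:
            flagged.append((path.name, round(mb, 2)))
    return flagged
```

Calling `files_over_threshold` on the folder that holds the source workbooks returns the files that are likely to fail under a non-administrator account, so they can be converted or split beforehand.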
After these findings, I reproduced the server environment on my own laptop to test whether creating a profile for the user account running the SQL Jobs would work, but unfortunately it didn’t. Maybe an additional requirement is needed for this to work; the only way I got it working was by granting local administrator privileges to the user account.
Some days later, I found this interesting post entitled Excel Imports to SQL Server using Linked Servers, where the author states that any 32-bit OLE DB Provider requires access permissions on the SQL Server temp folder for the user account that runs the query. Thus, I granted my ETL account permissions on my temp folder (C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Temp). Again, no good results.
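When checking temp folder permissions like this, it helps to confirm what the account can actually do rather than what the ACL dialog suggests. Here is a minimal Python sketch (the function name is hypothetical) that tries to create, write and read back a temporary file. It would have to be run under the same account as the Job, and it only tests plain file access: I am not claiming it reproduces what the ACE Provider does internally, so a passing check does not guarantee that the package will run.

```python
import tempfile


def can_write_temp(folder=None):
    """Return True if this process can create, write and read a temp file.

    folder=None checks the temp folder Python resolves for the current
    account; pass an explicit path to check a specific folder instead.
    """
    try:
        target = folder or tempfile.gettempdir()
        with tempfile.TemporaryFile(dir=target) as handle:
            handle.write(b"probe")
            handle.seek(0)
            return handle.read() == b"probe"
    except OSError:
        # Covers a missing folder and permission errors alike.
        return False


if __name__ == "__main__":
    print("default temp folder writable:", can_write_temp())
```

Passing the folder mentioned above as the `folder` argument checks that specific location instead of the account’s default temp folder.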
After all my (unsuccessful) tries, my conclusions at this point are:
- If the account (credential) running your packages does not have local admin permissions, you are limited to relatively small files (1.66 MB maximum in my case). The maximum file size could be different on your system.
- Running the same tests on my laptop, I needed to reduce the file size to 1.28 MB to get it to work. This tells me that the exact maximum file size could vary depending on your hardware configuration.
- Once an Excel file exceeds the size threshold, the ACE Provider needs some permission to write/read temporary data (it looks like it needs more than a user profile). If this criterion is not met, you will get strange error messages about problems connecting to the file.
- My package was failing because the Excel file was larger than 1.66 MB (it was 12 MB) and the SSIS package execution account was not a local administrator of the machine.
- Even Microsoft Access is not supported in an enterprise environment (PowerPivot for Sharepoint cannot automate data refresh jobs from Access). I think this is the same reason why Excel 2010 is so picky about automated data loading.
My recommendation is:
- As far as possible, avoid loading Excel files in a production / automated environment; try comma-separated values (CSV) files or any other text file format instead.
Back in my project, we finally “solved” the problem by saving the file as CSV and changing the data source of my package from Excel to Flat File. Everything worked correctly this time, even running in 64-bit mode.
If you want to share any other information from your experience working with Excel files, feel free to comment.