Excel Power Query Benefits
One of the things I really like to do with Power Query is shape data into optimized tables. To accomplish that goal, I've begun using Power Query to source data instead of Power Pivot's built-in methods. But to build things the way I want, I need an easy way to reference other Power Query queries.

Why would I go to the effort of feeding data through Power Query first? I'm no SQL ninja, and I find Power Query lets me easily re-shape data in ways that would be hard with my SQL knowledge. I can leverage this tool to optimize my tables and build Power Pivot solutions that require fewer tricky and funky DAX measures to compensate for a less-than-ideal data structure. (I'd rather have easy-to-understand relationships and simple DAX measures!)

Methodology

My methodology generally goes something like this:

1. Load a base table into Power Query.
2. Set it to only create a connection. Let's call this my Base Connection.
3. Create as many queries as I need to re-shape the data in the Base Connection into the forms I need, then load those into the data model.

It's that second part that is key: I need to be able to reference other Power Query queries (namely my Base Connection) so that I can prune/trim/re-shape the data.
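To make the pattern concrete, here's a minimal M sketch of the two pieces. The table name Sales and the column names are illustrative assumptions, not from the original workbook:

```
// Query "Base Connection" (loaded as Connection Only)
let
    Source = Excel.CurrentWorkbook(){[Name = "Sales"]}[Content],
    Typed  = Table.TransformColumnTypes(Source, {{"Date", type date}, {"Amount", type number}})
in
    Typed

// Query "Positive Sales": references the Base Connection and re-shapes it
let
    Source   = #"Base Connection",
    Filtered = Table.SelectRows(Source, each [Amount] > 0)
in
    Filtered
```

The key line is Source = #"Base Connection": every shaping query starts from the same source query rather than defining its own connection.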
Reference other Power Query queries - The Old Way

Until recently, I would create my Base Connection, then do the following to create a new query referencing it:

1. Go to the Power Query tab.
2. Show the Workbook Queries pane.
3. Right-click the Base Connection query and choose Reference.

The problem was this: my intention was to reference and customize my query. Instead, it immediately loads it into a worksheet.
I have to wait for that to finish before I can edit the new query and customize it the way I want.

Reference other Power Query queries - The New Way

I learned a new method last week from one of the Power Query team members, which is much better (thanks Miguel!), and I thought it was worth calling out on its own. Instead of following the method above, this time we will:

1. Go to the Power Query tab.
2. Show the Workbook Queries pane.
3. Right-click the Base Connection query and choose Edit.

Now we're taken into the Power Query window. On the left side we can see a collapsed "Queries" pane. When you expand it, you get a list of all Power Query queries in the workbook.
4. Right-click the Base Connection query and choose "Reference".

We now have a new query in the editor that we can edit, without loading it into a worksheet first. Faster, and more in line with my goals. The other thing I like about this method is that it immediately gives me access to that Queries pane. Why is that important? Because I can drill through the other queries and get at their M code without having to close the window and go back to Excel first. So if I have some funky M code I need to re-use, it's way easier to review and copy it.

Hi Ivan,

I'm not sure I fully understand here. Assume that we have Query 1 and add a query called Query 2 that references Query 1.
Are you asking why, when you refresh Query 2, Query 1 refreshes? Are you looking to make Query 1 static, so that it doesn't update unless you specifically refresh it? If that's the case, then no. When you use one query to reference another, it creates a dependency chain: when you refresh the query, it will reach back to the original data source and refresh that.
If that source is a database or file, it will reach out to make sure the data is current. If it's another query, it will refresh that query to make sure it is current, which will then reach back to its sources. I haven't found that structuring queries to read from each other adds any significant time to the refresh process, but to be fair I haven't set up any tests to measure it either. Did that answer your question, or did I misinterpret?

I have a data sheet where the generated date shows up in a single cell at the top, and the data table itself follows. I created a query to just pick up the generated date, but now I want to use that date within a formula of a new column in the second query (the one that pulls/transforms the data table itself). How can I do that? Or how can I add the generated date as a new column within the data table itself? That way I can use this generated date column and create the new column referencing it.
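One possible way to wire that up in M is a two-step pattern: drill the single cell down to a scalar value in its own query, then reference it from the table query. This is only a sketch, and the names GeneratedDate, ReportHeader and DataTable are illustrative assumptions:

```
// Query "GeneratedDate": drill into the single header cell to return a scalar
let
    Source    = Excel.CurrentWorkbook(){[Name = "ReportHeader"]}[Content],
    DateValue = Source{0}[Column1]
in
    DateValue

// In the main data query, stamp that scalar onto every row
let
    Source   = Excel.CurrentWorkbook(){[Name = "DataTable"]}[Content],
    WithDate = Table.AddColumn(Source, "Generated Date", each GeneratedDate)
in
    WithDate
```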
Yeah, that would be a pain, for sure. I guess my thought is that I'd probably try to build a solution that pulled the data in and cleaned/processed it with Power Query before uploading it into a staging database. It would take some VBA knowledge, but I built a prototype in the past that did exactly this. I logged the 'processed files' to a database table to ensure that they would not be refreshed again in future. Kind of a pain to build, but probably the better solution for you, as you can then refresh from the database as needed.

Hi Brian,

Unfortunately the second query can't take advantage of any query folding against the SQL database; only the first query in the chain can.
If I read this right, you've done this:

Query 1: Connects to the raw table
Query 2: References Query 1 and filters the data

Basically, this means that no steps from Query 2 will be rolled up and sent to the database, causing you to load all 100,000 rows. Now if you set it up to do this:

Query 1: Connect to the database, remove columns, remove duplicates
Query 2: References Query 1 for further operations

then the entire contents of Query 1 can be passed back to the SQL database, which processes it far more efficiently and returns just the 8,000 rows.
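In M terms, the fold-friendly arrangement would look something like this sketch (server, database, table and column names are all placeholders):

```
// Query 1: keep the foldable steps together against the source
let
    Source   = Sql.Database("MyServer", "MyDatabase"),
    RawTable = Source{[Schema = "dbo", Item = "Transactions"]}[Data],
    Trimmed  = Table.SelectColumns(RawTable, {"Account", "Country"}),
    Distinct = Table.Distinct(Trimmed)
in
    Distinct

// Query 2: reference Query 1 for further operations
let
    Source = #"Query 1"
    // ...additional re-shaping steps go here...
in
    Source
```

Because Table.SelectColumns and Table.Distinct can both fold, that work is sent to SQL Server instead of being done locally.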
Hi Ken,

The problem I have with this approach is that if I do some filtering of the data, then the referencing query gets affected too. This is inconvenient because sometimes I want to handle that query as a totally different one, while keeping the flexibility of only refreshing one data source once. For example, I can bring the data in as a connection and make another query as a reference. Everything is fine there. Let's say I do a remove-duplicates on the country column and obtain 10 countries. However, if I apply some filter on the first (original) query that limits the countries in scope, then my other query is affected too and I now see fewer countries. The other way of doing it is by duplicating the query, but the problem then is that the file path is hard-coded, so I need to remember to update the path first, and that takes manual effort. Any ideas?

Thanks,
Fernando

Hi Ken,

I've transformed a data query to 'Connection Only' because I want to minimize the file size. Is there a way to reference this 'Connection Only' query from another Excel workbook (i.e. not just another worksheet)?
I want it to be a reference so that any update I make to the base query is picked up when I refresh my referencing query. The only plausible way I've found to do this is to 'Load To' the 'Connection Only' query as a table first. Then, from a different Excel workbook, I extract the query via From File - From Excel and open the first Excel file.
However, loading the base query to a table causes my first Excel file's size to increase tremendously, which I am trying to avoid.

Regards,
Ayfiq
Back on May 18, Reid posted a great post on this very technique. It's a technique that I've been preaching for years, and is something that we teach as well.
I mean, let's face it, it only makes sense, right? Connect to the data source once, and re-use that information over and over again as you spin off different shapes for the rest of your tables. So what if I were to tell you that, from a performance standpoint, this is only actually a good practice in Power BI Desktop, and probably the worst thing you could do to your Excel model? That doing this will drive Power Query refresh speeds down? How's that for a click-bait paragraph? Let's take a look.

A sample model

If you've been working with Power Query for any amount of time, you'll start building models that look like this. In fact, this is based on a real model that one of my clients built. Pretty normal, really; it pulls data from a SQL stored procedure, 4 Excel tables and a custom function.
Many of us build models like this. Now, let me ask you a skill-testing question: when you click Data - Refresh All, how many times is the stored procedure executed? The answer is 1, right?

Trust but verify

My client sat down with his IT department (he is lucky enough to have one that is super cooperative) and worked through some performance testing. The catalyst for this was that the 52,000-row model was taking ages to refresh. The time was sufficient to kill his ability to build and deploy Power Query solutions, as the users just wouldn't accept that kind of latency. Here's how they tested:

1. The model was built in 64-bit Excel 2013 using Power Query, feeding directly into the Power Pivot Data Model.
2. The data was then landed from the Data Model into a Pivot Table.
3. That same workbook was imported into Power BI, and an equivalent Matrix visualization was created with the same information as the pivot table in the Excel version of the model.
4. Each model was refreshed several times. They timed the total refresh time with a stopwatch (since there wasn't really another way to do it) and used SQL Sentry to log the number of times the stored procedure was called.
5. To account for human latency in the stopwatch start/stop, the test was repeated several times and the results averaged.

What they found was disturbing, to say the least.

Power Query refresh test results

Here are the average Power Query refresh times that were logged for this model:

Power BI Desktop: 9 seconds
Excel: 75 seconds!

That is absolutely insane!
Excel takes over 8 times as long to refresh the same data! Imke Feldman mentioned this in her blog post. While the angle of her post is very much how impressed she is with Power BI, I have to ask the opposite question: why does Excel's Power Query performance suck so badly? These two technologies are supposed to be based on the same architecture, aren't they?
So why are we celebrating Power BI's greatness and not screaming about Excel's abysmal performance? The more telling issue actually shows up in the SQL Sentry monitoring of the stored procedure calls:

Power BI Desktop: called 2 times
Excel: called 6 times!

Seriously? How does this even make sense? To be honest, I was actually shocked to see that the stored procedure is called more than once even in Power BI.
I would not have expected that at all, and would have assumed that the SQL stored procedure would be executed once, since it is only referenced by the Historical Data table. The only thing we can fathom is that it is called once for previews and once for the actual load, or possibly that privacy levels force something to be called twice. I'm not 100% certain, except that we now know it is called at least one more time than we think would be necessary. But Excel? Why is Excel calling that stored procedure six times? That makes no sense at all. Why don't we assign a number to each of the queries that relate to the SQL stored procedure: adding one to the total on the theory of a preview, we get to six. Unlike Power BI, in Excel a model refresh re-executes each child query in full, never caching the results. No wonder the performance is horrendous! To solve this issue, my client has had to start flattening his query structure: something that is painful, and becomes impossible when you run afoul of the formula firewall.
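For illustration, "flattening" means collapsing the chain into a single query so there are no child queries to re-execute. A sketch with placeholder names:

```
// One flattened query instead of a Base Connection plus references
let
    Source   = Sql.Database("MyServer", "MyDatabase"),
    Raw      = Source{[Schema = "dbo", Item = "Transactions"]}[Data],
    Trimmed  = Table.SelectColumns(Raw, {"Account", "Country", "Amount"}),
    Filtered = Table.SelectRows(Trimmed, each [Amount] > 0)
in
    Filtered
```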
And that's the biggest rub here. In order to follow best practices and avoid formula firewall issues, you are SUPPOSED to build chained queries as shown above.
And yet your performance will be crucified for doing it. Worse, I have no reason to believe that this does not ALSO happen with all Excel tables in the model. Unfortunately, we don't have software like SQL Sentry that works on Excel files.

Another model

Despite the fact that we can't trace internal Excel refreshes, we can draw some conclusions based on Power Query refresh times. Here's another model to consider, built using a similar referencing method. In this model, I connected to an Access database using a Connection Only query.
I then referenced the "Database" query and:

1. Loaded the Transactions table to the Data Model (61,166 records).
2. Loaded the COA table to the Data Model (334 rows).
3. Merged the two into a table called Flat and loaded this to the Data Model (61,166 rows).
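The Flat query in a model like this is typically just a merge of the two referenced queries. Here's a hedged sketch, assuming the tables join on an Account column (the join key and expanded column names are assumptions):

```
// Query "Flat": merge Transactions with COA and expand the lookup columns
let
    Source   = Table.NestedJoin(Transactions, {"Account"}, COA, {"Account"}, "COA", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Source, "COA", {"Account Name", "Account Type"})
in
    Expanded
```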
Graphically, it looks like this: a Database query feeding the Transactions, COA and Flat queries. I then used VBA to time the refresh of each individual query. "Now wait a minute," you might be thinking, "you are calling each query individually, so of COURSE it's going to refresh the child queries!" And that is absolutely true. So then I compared it to how long it takes to run Excel's Data - Refresh All command. In an ideal world, that should re-use the Transactions and COA tables that were already loaded, correct? Um... oh. It takes a fraction longer, even.
Possibly because it needs to find the precedent queries? I'm not sure, but what I can say is that it is plainly re-calculating each table, as the total refresh takes longer than the sum of the tables refreshed individually.

Interested in learning how to do this kind of thing?

Our training program just launched this year. You'll learn Power Query from the best in the business, two Excel MVPs: Ken Puls and Miguel Escobar. We've included 7 hours of self-paced video over 31 courses, with lots of example files, and a digital badge powered by Acclaim. Sign up to claim your spot.

And lest there be any doubt

This is by far the most frustrating thing here: my client reached out to Microsoft Premier support. (They are big enough that they have a dedicated support rep and premier support tickets.) It took a while, but they finally received this email back from the support team:

The performance difference is expected. Excel isn't running all of the queries in the same cache context, so if two different data sets are based on the same source data then it can't take advantage of local caching, and this is by design. Power BI Desktop runs all of the queries for a "refresh all" inside a single cache context, which would explain why you see the stored procedure being called less often.
At this time, there is no mechanism by which the refresh time in Excel can be improved.

Honestly, I don't believe that to be the case here. I've been discussing this issue with Microsoft program managers and engineers for some time.
This was an issue in Power BI Desktop, and they fixed it. The fix just needs to be ported into Excel; they simply haven't been able to get it done yet. I don't fault Microsoft for building new features. That is their job, and their company's success is based on selling more licenses. Microsoft wants us to use Power Query, and constantly adding easier ways to clean up crappy data is a huge hook in the subscription model. The insider builds of Excel are already hiding the classic data connectors and making Power Query the default Get Data experience, something I think they should have done with the initial release of Excel 2016. When I talk to users, they are blown away by Power Query's ability to transform and clean up their data. It's only recently that I've started getting more and more questions about Power Query refresh performance. I'm convinced that this is because the user base is still small, and many users just accept the awful performance because they can get cool stuff done.
And that's part of the reason Microsoft hasn't been able to justify fixing this. If it was your company, would you fix a performance issue that few people recognize, or develop new features to attract new users to your product? But it's time to change the priority here. Why build all these amazing features and push them if the audience just ends up abandoning them because the refresh time is too long?
To me, the fact that child nodes are not cached is the number one issue facing Power Query today. I love Power Query. I don't think anyone who knows me can say anything else. And I love the monthly update cycle that continues to deliver new features and improve the product.
But believe me when I say that I would happily stop ALL development on Power Query for two years in order to get Power Query refresh fixed. It's that big a problem. I've already seen clients abandon Power Query solutions due to the refresh times, and several others who avoid this great tool at all costs due to the refresh latency. Hanging out at the Ask the Experts booth, 75% of the people who came to talk to me were primarily there to figure out how to improve the performance of their queries.
So what can you do to improve Power Query refresh? Help us raise this as an issue with Microsoft. You can do that in two ways. Most importantly: get everyone you know to vote for this idea. This is one of the primary vehicles for getting our voices heard by the development team.
They are listening, and this idea has been marked as under review, BUT IT STILL NEEDS MORE VOTES. I'd like to see better Power Query refresh become the #1 requested item on Excel UserVoice, and we have a long way to go to get there. But wait, you say, it's marked as "Under Review"! Why do we still need to vote when it's under review? Let me answer that by pointing you to another idea, one to make working with Pivot Tables easier (which you may also want to vote up). It's an idea that I KNOW they want to implement, but it's been under review since Nov 2015 and still hasn't met the bar. I talked to a couple of the team members, and they just haven't had enough votes to signify that Power Query refresh in Excel is more important than some of the other things they are working on. Notice that in the posts they ask us to keep voting. The more votes we get, the easier it is for the teams to justify the cost of the fixes and new features to the ultimate approvers at Microsoft. That's the really important thing to do here: KEEP VOTING.
And if you're really passionate about these (or any) ideas, get your friends to vote. Get your family to vote. In fact, get anyone else who will listen to you to vote!

Voting links:

I thought I saw something like that when refreshes were running (you can see which file it's pulling data from when it's taking a while), but I assumed it was just a display issue.
I never dreamed it would actually do that. No wonder my performance dropped when I went through and separated out my queries to make them easier to troubleshoot! And I completely agree that they could stop everything else for the next two years to fix this; it would be a complete game-changer. Thank you for bringing this up. I'm going to see exactly how many people I can get to upvote this.

I tried that with a model I was working on and I didn't get any performance boost.
It's one of those things I expected to work, but it just didn't. Another way to work around this is to flatten the entire model and then use a parameter table to control related queries centrally. This gave me a big performance boost (thanks to Ken for the coaching on this technique). However, there is only so far that can take you on a complex model. I think the biggest thing is that Microsoft needs to fix this on their side.

Jimmy, no, it's a totally different issue. The challenge in development is that Power Query works with a preview.
When you add a new step, the preview needs to re-execute. Think about it this way: let's pretend there is a 1,000-row limit in your preview. What if you filter rows out of the data set? Power Query needs to go back and re-evaluate to get rid of the inappropriate records and pull in a refreshed set with the new 1,000 rows. I agree that development, particularly on large data sets, is agonizingly slow.
The best advice I can give you here is to make a query that pulls the top 10,000 rows and land that in a raw table. Connect to that table to design your reshaping query, as that should be faster.
Then re-point the reshaping query to the original data set when you're ready to execute it. That way you get past the constant refreshes.
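A sketch of that development-time extract, assuming the full source query is named Raw Data (an illustrative name):

```
// Development-only query: land the first 10,000 rows in a worksheet table
let
    Source = #"Raw Data",
    Top10k = Table.FirstN(Source, 10000)
in
    Top10k
```

Point the reshaping work at the landed table while designing, then swap its source back to #"Raw Data" for production runs.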
This is an important issue. I've been struggling to deal with this problem over the last few weeks, and I think there are some workarounds that can make things a little better. But this is far from good yet! I THINK IT'S IMPORTANT THAT SOMEONE AT MICROSOFT READ THIS:

1 - My tests indicate that, at least in my case, it's MUCH, MUCH SLOWER to load Power Query queries to the Excel Data Model than to a worksheet.

2 - I have some Power Pivot tables linked to my relatively complex Excel Data Model. These Power Pivot tables use some complex DAX measures. Clearly, these Power Pivot tables are updated at EACH query load.
If you update 5 Power Query queries, all the Power Pivot tables are updated 5 times. I've built a VBA script that disables those updates by setting the ManualUpdate property to True on the PivotTable object.
This saved me half of the time. I think it would be easy to correct this BUG.

3 - We have to be extremely cautious with our Excel Data Model. Each new column added to the model increases its complexity, and this gets worse for columns with many distinct values. So take some time to clean your model. Eliminate columns that you added "just in case".
And, most important, get rid of text columns that you don't use in your final reports. When you refresh your Power Query queries, something is updated inside your Excel Data Model, probably many more times than necessary. So keep your Excel Data Model as clean as possible.
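A sketch of that pruning step in M (the query and column names are illustrative):

```
// Keep only the columns the final report actually needs before loading to the Data Model
let
    Source = #"Base Connection",
    Slim   = Table.SelectColumns(Source, {"Date", "Account", "Amount"}, MissingField.Ignore)
in
    Slim
```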
This helped me very much. I am new to using Power Query and have a few questions after reading this and the referenced articles.
Say you have an appended query containing multiple queries from multiple data sources. To optimize refresh time using referencing, should I only refresh the base queries, or does “Refresh All” automatically only refresh the base queries? If I don’t use referencing and have a loaded query for every single data table from every data source, will this take longer to refresh than if referencing was used? I also need to be able to update the appended query while the data sources are in use.
What would be the best way to create the appended PQ to maximize refresh rate and allow for the data sources to be in use? Any recommendations would be much appreciated.
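For reference, the append pattern being described usually looks like this in M. This is only a sketch: Source1 and Source2 stand in for connection-only staging queries pointed at the different data sources:

```
// Append query built over connection-only staging queries
let
    Combined = Table.Combine({Source1, Source2})
in
    Combined
```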