📆 Published on
4 min read
mack by Macktireh

Flexible Data Source Management in Power BI with MySQL


Table of Contents


Introduction

As part of a client project I am currently working on, we encountered an exciting challenge: designing a Power BI dashboard connected to a MySQL database containing millions of rows. This experience allowed us to develop an efficient solution to manage data sources based on the working environment. In this article, I will walk you through the steps of implementing this solution, addressing the challenges encountered and the solutions implemented.

Challenges Encountered

1. Connectivity Issues

After installing the MySQL driver for Power Query, we encountered connection difficulties due to firewall configuration on the client’s professional workstation. Interestingly, the connection worked perfectly from my personal computer and when refreshing reports in the Power BI service.

2. Performance and Data Volume

The database contained several million rows, making local data loading particularly time-consuming. This situation required an alternative approach to optimize performance in the development environment.

Solution Implementation

1. Solution Architecture

To resolve these issues, we developed a hybrid approach:

  • In local environment (Power BI Desktop): using CSV files stored on SharePoint
  • In production (Power BI Service): direct connection to MySQL database

2. Data Preparation

To simplify local development, we first wrote the SQL query to extract the necessary data. Then, we developed a Python script that connects to the database, executes the query, exports the results to CSV format, and we uploaded the files to a SharePoint site.

3. Parameter Configuration

We set up five parameters to dynamically manage data sources:

parameters

  • ENVIRONMENT: choice between LOCAL and PRODUCTION
  • URL_CSV_FILE: link to the CSV file
  • MySQL_HOSTNAME: MySQL hostname
  • MySQL_DATABASE_NAME: database name
  • MySQL_QUERY: pre-built SQL query

Technical Implementation

1. Dynamic Environment Detection Variable

We created a boolean dynamic variable IS_PRODUCTION that simplifies environment detection:

is_prod

let
Source = Text.Contains(Text.Upper(ENVIRONMENT), "PROD")
in
Source

This variable automatically returns:

  • True if the ENVIRONMENT parameter contains “PROD”
  • False in all other cases

This approach allows us to simply use IS_PRODUCTION in our conditions rather than rewriting the complete formula Text.Contains(Text.Upper(ENVIRONMENT), "PROD") each time.

2. GetData Function

We created a Power Query function named GetData, allowing data retrieval from MySQL in production and from a CSV file locally.

getdata

let
GetData = (NumberColumnsCSV as number, MySQLQuery as text) =>
let
CsvSource = Csv.Document(
Web.Contents(URL_CSV_FILE),
[
Delimiter = ",",
Columns = NumberColumnsCSV,
Encoding = 1252,
QuoteStyle = QuoteStyle.None
]
),
CsvPromotedHeaders = Table.PromoteHeaders(CsvSource, [PromoteAllScalars = true]),
SourceMySQL = MySQL.Database(
MySQL_HOSTNAME,
MySQL_DATABASE_NAME,
[
ReturnSingleDatabase = true,
Query = MySQLQuery,
CreateNavigationProperties = false
]
),
Result = if IS_PRODUCTION then SourceMySQL else CsvPromotedHeaders
in
Result
in
GetData

This function performs three essential operations:

  1. Local data loading:

    • Retrieves a CSV file from a URL
    • Uses a comma delimiter
    • Specifies the number of columns
    • Handles encoding and quote style
    • Automatically promotes the first row as headers
  2. Database connection:

    • Establishes a connection to MySQL in production
    • Executes a custom SQL query
    • Retrieves data directly from the database
  3. Dynamic source selection:

    • Automatically switches between local CSV and MySQL database
    • Uses the IS_PRODUCTION variable as selection criteria

3. Using the GetData Function

To use the GetData function, simply create a new query and rename it as desired. In this example, we call it “MyData”.

mydata

let
Source = GetData(5, MySQL_QUERY)
in
Source

This approach allows data retrieval by specifying only two arguments:

  • The number of CSV file columns (5 in this example)
  • The MySQL query to execute (via MySQL_QUERY parameter)

simple, isn’t it? 🙂

4. Optimizing Data Transformations

In some cases, we need to apply specific transformations to CSV data. For this, we identified two possible approaches:

— Modifying the GetData Function

A first approach involves integrating transformations directly into the GetData function:

let
GetData = (NumberColumnsCSV as number, MySQLQuery as text) =>
let
CsvSource = Csv.Document(
Web.Contents(URL_CSV_FILE),
[
Delimiter = ",",
Columns = NumberColumnsCSV,
Encoding = 1252,
QuoteStyle = QuoteStyle.None
]
),
CsvPromotedHeaders = Table.PromoteHeaders(CsvSource, [PromoteAllScalars = true]),
CsvReplacedValue = Table.ReplaceValue(
CsvPromotedHeaders, ".", ",", Replacer.ReplaceText, {"my_column_numeric"}
),
SourceMySQL = MySQL.Database(
MySQL_HOSTNAME,
MySQL_DATABASE_NAME,
[
ReturnSingleDatabase = true,
Query = MySQLQuery,
CreateNavigationProperties = false
]
),
Result = if IS_PRODUCTION then SourceMySQL else CsvPromotedHeaders
Result = if IS_PRODUCTION then SourceMySQL else CsvReplacedValue
in
Result
in
GetData

However, this approach is not recommended as it goes against the single responsibility principle: a function should ideally have only one responsibility.

— Transformation in the MyData Query

A more elegant approach involves separating data retrieval from transformation. Let’s update our “MyData” query to apply specific transformations to CSV data:

let
Source = GetData(5, MySQL_QUERY),
ReplacedValue = Table.ReplaceValue(
Source, ".", ",", Replacer.ReplaceText, {"my_column_numeric"}
),
ChangedType = Table.TransformColumnTypes(
if IS_PRODUCTION then Source else ReplacedValue,
{{"my_column_numeric", type number}}
)
in
ChangedType

This second approach offers several advantages:

  • Clear separation of responsibilities
  • Better code maintainability
  • Greater flexibility for modifying transformations
  • Ability to apply conditional transformations based on environment

Conclusion

In this article, we explored a solution to efficiently manage data sources in Power BI by implementing a dynamic switching system between local CSV files and a production MySQL database. The approach we developed not only solved our immediate connectivity and performance challenges but also provided a flexible framework that can be easily adapted to other environments or data sources. This approach can easily be adapted and extended to meet other needs, such as adding new environments (test, pre-production) or supporting other data sources (PostgreSQL, Oracle, etc.). Otherwise, if you want to import and combine multiple Excel/CSV files in a clean and optimized way, I invite you to read my article “Import Multiple Excel/CSV Files into Power BI with a Custom Power Query Function” .