Getting Python data into Google Data Studio is becoming a major concern for most agencies these days. While Google Data Studio has native connectors for a wide range of platforms, it offers no reliable way to import data directly from Python. For small datasets, Google Sheets, which is natively supported and simple to use, is the most straightforward alternative to Google BigQuery. The pipeline is simple: Python and Pandas extract the data from MySQL and write it to Google Sheets, where Google Data Studio can read it. Google Data Studio is an excellent tool, but there are three basic points about using it that I want us to keep in mind before we get started.
Importing Data into GDS from Pandas
Google Sheets will always be my go-to data source for a variety of reasons.
- For datasets with fewer than 100k rows, the connection is fast enough.
- Google Data Studio and Google Sheets both have extensive documentation on functions and data manipulation.
- Everything you need to import any Google Sheet into a Jupyter Notebook, or to upload DataFrames into a workbook, can be found in the Python gspread package.
gspread can access any sheet you’ve shared with your developer account. To bring a Google Sheet into Pandas, I use a small helper function that reads a worksheet into a DataFrame.
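A minimal sketch of such a helper follows. It assumes the first row of the worksheet holds the column headers and that the service account keyfile is in gspread's default location. The function names worksheet_to_dataframe and read_sheet are my own for illustration, not part of gspread.

```python
import pandas as pd


def worksheet_to_dataframe(worksheet):
    """Return the contents of a gspread worksheet as a DataFrame.

    get_all_records() returns one dict per row, keyed by the header row.
    """
    records = worksheet.get_all_records()
    return pd.DataFrame(records)


def read_sheet(spreadsheet_key, worksheet_title):
    """Open a shared spreadsheet by key and load one worksheet."""
    import gspread  # imported here so the helper above works without it

    client = gspread.service_account()  # reads the default keyfile location
    spreadsheet = client.open_by_key(spreadsheet_key)
    return worksheet_to_dataframe(spreadsheet.worksheet(worksheet_title))
```

Splitting the conversion from the connection keeps the DataFrame logic usable with any worksheet object you already have open.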
But Take Note:
First, the gspread library does not tolerate NaN values. The upload will throw an error if your DataFrame contains any NaNs. It also cannot handle iterables, so any column that contains lists should be unpacked before uploading. Alternatively, you can upload the lists as strings and convert them into a new data type in Google Data Studio, but that choice is up to you.
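One way to prepare the values before uploading is sketched below, under my own assumptions: NaN and None become empty cells, and lists are joined into a comma-separated string. The helper names clean_cell and clean_rows are hypothetical.

```python
import math


def clean_cell(value, separator=", "):
    """Make a single value safe to upload to Google Sheets."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""  # NaN is replaced with an empty cell
    if isinstance(value, (list, tuple)):
        # Iterables are flattened into one string
        return separator.join(str(item) for item in value)
    return value


def clean_rows(rows):
    """Apply clean_cell to every cell in a list of rows."""
    return [[clean_cell(cell) for cell in row] for row in rows]


rows = [["Email", float("nan"), ["a", "b"]], ["Paid", 7.5, []]]
print(clean_rows(rows))
```

With a DataFrame, the rows can be obtained with df.values.tolist() and passed through clean_rows before the upload.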
Second, although Google Sheets appears to be boundless in size, it isn’t. A spreadsheet is capped at 5 million cells, which 100k rows by 50 columns already reaches (Google has since raised the cap to 10 million cells). Note that this limit applies to the entire workbook, not just a single worksheet, so you should upload a large dataset into a new, clean workbook.
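A quick size check before uploading can save a failed run. The sketch below uses the 5 million figure quoted in this post as the default; the limit parameter is there so you can change it if Google's cap differs. The function names are my own.

```python
CELL_LIMIT = 5_000_000  # figure quoted in this post; adjust if the cap differs


def cells_needed(n_rows, n_cols, header=True):
    """Number of cells a table occupies, optionally counting a header row."""
    return (n_rows + (1 if header else 0)) * n_cols


def fits_in_workbook(n_rows, n_cols, cells_already_used=0, limit=CELL_LIMIT):
    """True if the table fits alongside the cells already in the workbook."""
    return cells_already_used + cells_needed(n_rows, n_cols) <= limit


print(fits_in_workbook(99_999, 50))   # exactly 5,000,000 cells with the header
print(fits_in_workbook(120_000, 50))  # about 6 million cells
```

Pass cells_already_used when the workbook already contains other worksheets, since the cap covers the whole workbook.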
With these utilities, you can build a data pipeline straight from your Python working environment to your GDS visualizations. Many well-known software companies also offer Google Sheets plugins, which make it possible to automate your workflow further.
Do Not Expect Technical Assistance from the Support Staff of the GDS
Complex, interactive, and easily shared visualizations are a strong suit of Google Data Studio. Although it has some basic manipulation capabilities, it isn’t a tool for manipulating data in any sophisticated way. As far as I can tell, GDS reads data much like a SQL query would, with automatic group-by behavior and default aggregations. Any table or chart that uses a column will display an error if that column contains conflicting data types (say, a date column that contains the string ‘N/A’).
Data Studio’s merging feature is also heavily constrained. To be fair, it’s impressive that you can combine information from two very distinct sources at all. Suppose you have a list of clients, a list of paid transactions, and a list of the staff who sold the products: together they hold all the information you need. If you’re familiar with Pandas or SQL, you already know that these tables can be joined using two separate keys.
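For comparison, here is how that three-table join might look in Pandas before the data ever reaches GDS. The table contents and column names (client_id, staff_id, and so on) are invented for illustration.

```python
import pandas as pd

clients = pd.DataFrame({"client_id": [1, 2], "client_name": ["Acme", "Globex"]})
staff = pd.DataFrame({"staff_id": [10, 11], "staff_name": ["Dana", "Lee"]})
transactions = pd.DataFrame(
    {
        "transaction_id": [100, 101, 102],
        "client_id": [1, 2, 1],
        "staff_id": [10, 10, 11],
        "amount": [250.0, 100.0, 75.0],
    }
)

# Join on two separate keys: one lookup per dimension table
report = (
    transactions
    .merge(clients, on="client_id", how="left")
    .merge(staff, on="staff_id", how="left")
)
print(report[["transaction_id", "client_name", "staff_name", "amount"]])
```

Uploading the already-joined table means GDS only has to read one flat data source.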
Understand Column vs. Metric
This final point will get its own post in the future, but I want to flag it in advance. Check the data source settings after you have imported your data. Google will try its best to convert the spreadsheet into clean, efficient data types, but typos and outliers will wreak havoc.
You need to understand all of these points before you start importing data into Data Studio using Python.
Here are a few things to keep in mind as well:
A. Verify That the Data Is of the Correct Type
This may sound obvious, but it’s crucial. The classic issue I’ve encountered with GDS is a numeric column being treated as a text column, and it has two usual causes: incompatible data in the column, or unusual formatting carried over from Google Sheets.
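Checking types on the Python side before uploading catches most of these cases. The sketch below uses pandas.to_numeric with errors="coerce", which turns unparseable values into NaN so they can be found and fixed. The helper name coerce_numeric and the sample data are my own.

```python
import pandas as pd


def coerce_numeric(df, columns):
    """Return a copy of df with the given columns converted to numbers.

    Values that cannot be parsed (for example 'N/A') become NaN, so they
    can be inspected and filled or dropped before the upload.
    """
    result = df.copy()
    for column in columns:
        result[column] = pd.to_numeric(result[column], errors="coerce")
    return result


raw = pd.DataFrame(
    {"channel": ["Email", "Paid", "Organic"], "sales": ["10", "N/A", "7"]}
)
clean = coerce_numeric(raw, ["sales"])
print(clean.dtypes)
print(clean[clean["sales"].isna()])  # rows that need attention
```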
B. Develop Functions at the Dataset Level, rather than Within Reports
GDS has a large library of functions for manipulating data, so you can get creative with it. For me, HYPERLINK() is the best way to turn any piece of text into an active link. Remember that while you’re working in a report, you can also create a new metric, otherwise known as a calculated field. The steps are nearly identical in both cases, but when you create the field at the data source level, every report that uses that source has access to it. A field created inside a report behaves more like a “lambda function”: it only affects the chart or table in question.
The Columns to Which Aggregates Are Applied Are Known as Metrics
When importing data into Data Studio using Python, you may have a hard time with this part. GDS’s capacity to group data points quickly and easily is one of its most impressive features. In other words, if you have a product name column and an item count metric in your table, GDS will sum the counts across all the rows that contain that product. But what if the metric were the product’s price? You would get the total of the separate prices, which is probably not what you want. You can always control these aggregations at the report or data source level, which is wonderful, and turning any column into a metric is as simple as adding it to the metric section. The default aggregation for number columns is sum, but you can choose from a wide range of aggregations. For text columns, the only options besides record count are count and count distinct.
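The same behavior is easy to reproduce in Pandas, which helps when deciding what aggregation a metric should use. The sample data and column names below are invented for illustration.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "product": ["Widget", "Widget", "Gadget"],
        "item_count": [2, 3, 1],
        "unit_price": [4.0, 4.0, 10.0],
    }
)

summary = orders.groupby("product").agg(
    total_items=("item_count", "sum"),     # summing counts makes sense
    summed_price=("unit_price", "sum"),    # what a default sum would show
    average_price=("unit_price", "mean"),  # usually the meaningful figure
)
print(summary)
```

Here the summed price for Widget is 8.0 even though each unit costs 4.0, which is exactly the trap a default sum sets for price-like columns.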
Let’s get down to business now.
Importing Data into Data Studio Using Python
Data can still be imported from Python or MySQL into Google Data Studio through Google Sheets and GSpread, despite the lack of native support in Google Data Studio for Python.
Here’s how you can do it –
GSpread is Required
We will store our data in Google Sheets spreadsheets, written from Python with the gspread library, so that it can be imported into Google Data Studio. Install gspread from PyPI to read from and write to Google Sheets with Python. To authenticate, first create a Google Service Account by following the instructions in the gspread documentation, then download the service account JSON keyfile and save it to ~/.config/gspread/service_account.json.
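As a rough sketch, connecting might look like this once gspread is installed (pip install gspread). The helper names default_keyfile_path and connect are my own; gspread.service_account() is the library call that reads the keyfile, and the path shown is its default on Linux and macOS.

```python
from pathlib import Path


def default_keyfile_path():
    """Default keyfile location that gspread checks on Linux and macOS."""
    return Path.home() / ".config" / "gspread" / "service_account.json"


def connect(keyfile=None):
    """Return an authenticated gspread client."""
    import gspread  # imported lazily so the path helper works without it

    if keyfile is None:
        return gspread.service_account()  # uses the default location
    return gspread.service_account(filename=str(keyfile))


print(default_keyfile_path())
```

Passing an explicit keyfile is handy when the credentials live somewhere else, for example on a server.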
Make a Spreadsheet and Distribute it to Others
Now that gspread is installed and your Google Service Account credentials are in the correct place, you can import gspread. Create a new spreadsheet titled “Data: Monthly sales by channel”, share it with your own Google user, and have them notified by email that the spreadsheet exists. Spreadsheets created through gspread will not appear in your Google Drive until they have been shared with you, so this step is required.
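A minimal sketch of this step, assuming client is an authenticated gspread client. The helper name create_shared_spreadsheet and the email address in the usage comment are placeholders of mine.

```python
def create_shared_spreadsheet(client, title, email):
    """Create a spreadsheet and share it with a Google user.

    `client` is expected to be an authenticated gspread client.
    """
    spreadsheet = client.create(title)
    # notify=True asks Google to email the user about the new spreadsheet
    spreadsheet.share(email, perm_type="user", role="writer", notify=True)
    return spreadsheet


# Usage (requires gspread and a service account keyfile):
# import gspread
# client = gspread.service_account()
# sh = create_shared_spreadsheet(
#     client, "Data: Monthly sales by channel", "you@example.com"
# )
```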
Go To the Spreadsheet and Start Working with It
A spreadsheet can be opened in three ways: by name, by key, or by URL. I recommend opening by key, because the key is unique and keeps working if the sheet is renamed. Find the spreadsheet in your Google Drive and copy the key from its URL: it is the long string of characters in the middle.
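The sketch below pulls the key out of a spreadsheet URL and opens the spreadsheet with gspread's open_by_key. The helpers extract_key and open_spreadsheet are my own, and the regular expression assumes the usual /spreadsheets/d/<key>/ URL layout.

```python
import re

_KEY_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_key(url):
    """Return the spreadsheet key embedded in a Google Sheets URL."""
    match = _KEY_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No spreadsheet key found in: {url}")
    return match.group(1)


def open_spreadsheet(client, url_or_key):
    """Open a spreadsheet from either its full URL or its bare key."""
    key = extract_key(url_or_key) if "/" in url_or_key else url_or_key
    # gspread also offers client.open(title) and client.open_by_url(url)
    return client.open_by_key(key)
```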
Put Together a Few Worksheets with Meaningful Names
You can create named worksheets in your spreadsheet just like in ordinary Google Sheets. We will create a handful of named worksheets to hold our data and then delete the default Sheet1. When we create a new worksheet, we must specify the number of rows and columns it will have.
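A sketch of this step, assuming spreadsheet is an open gspread Spreadsheet that still has its default first sheet. The helper name, the worksheet titles in the usage comment, and the default sizes are my own choices.

```python
def create_named_worksheets(spreadsheet, titles, rows=1000, cols=20):
    """Add one worksheet per title, then remove the default first sheet."""
    default_sheet = spreadsheet.sheet1  # the first worksheet, normally Sheet1
    created = [
        spreadsheet.add_worksheet(title=title, rows=rows, cols=cols)
        for title in titles
    ]
    # Delete the default sheet only after the new ones exist, because a
    # spreadsheet must always contain at least one worksheet.
    spreadsheet.del_worksheet(default_sheet)
    return created


# Usage (requires an open gspread spreadsheet `sh`):
# create_named_worksheets(sh, ["Sales by channel", "Sales by month"])
```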
Using Google Sheets to Import Data from the Python Programming Language
We will use SQLAlchemy to run a few SQL queries against MySQL, load the results into Pandas, and then write each result into one of the worksheets we created in the Google Sheets spreadsheet. When using SQLAlchemy, any literal percent symbol in the query must be escaped by adding an extra percent sign (%% instead of %).
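Put together, the step might look like the sketch below. The query, table, and column names are invented, and the connection URL you pass in (something like mysql+pymysql://user:password@host/database) assumes the PyMySQL driver is installed. The upload helper also assumes the DataFrame contains no NaN or list values.

```python
import pandas as pd

# Hypothetical query: note the doubled %% inside DATE_FORMAT
MONTHLY_SALES_QUERY = """
    SELECT DATE_FORMAT(order_date, '%%Y-%%m') AS month,
           channel,
           SUM(revenue) AS revenue
    FROM orders
    GROUP BY month, channel
"""


def query_to_dataframe(query, connection_url):
    """Run a SQL query through SQLAlchemy and return the result as a DataFrame."""
    from sqlalchemy import create_engine  # imported lazily; needs a DB driver

    engine = create_engine(connection_url)
    return pd.read_sql(query, engine)


def dataframe_to_worksheet(df, worksheet):
    """Replace the worksheet contents with the DataFrame, header row first."""
    values = [df.columns.tolist()] + df.values.tolist()
    worksheet.clear()
    worksheet.update(values)
    return len(values)  # number of rows written, including the header
```

Each query result can then be written to its own worksheet by calling dataframe_to_worksheet once per DataFrame.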
If you are still here, you now understand the process of importing data into Data Studio using Python. Your spreadsheet should now be available in your Google Drive. You can access your Pandas, MySQL, or Python data in GDS by going to Google Data Studio and adding the spreadsheet as a data source as usual. Is Google Data Studio a better alternative to Microsoft Excel than Google Sheets? Let us know your opinion and we’ll get back to you as soon as we can. To make things easy for you, Eaglytics.Co has ready-made Data Studio templates, available at https://eaglytics-co.com/products/