Use Python to Stylize the Excel Formatting

 Step by step tutorial to format Excel spreadsheets using Python

Steps to automatically generate a stylized report (Photo by cyda)

Goal

Do you have to regularly update the reports day after day? Have you ever thought of a way to automate these tedious, boring, and robotic works? You may say ‘Yes, but I can’t since there are lots of formatting stuff I need to manually add to Excel’. Today, I am going to show you how Python can work on Excel spreadsheet formatting and automatically generate the report.


Cryptocurrency as Example

I am a cryptocurrency enthusiast and I would check the price for different coins day by day. However, it is extremely time-consuming to check all the coins one by one and that’s why I would like to use a program to automatically generate the summary report for me every morning. If you are interested to know more financial data analysis, feel free to check my publication.


All codes and data are uploaded to my GitHub. You can check and folk this repo to further study. =)

Data Scraping

First of all, we need to collect the data to display in the table. For example, you may query the data from the company database. You may obtain the data from your business partner. Or just like my case, I scrape the data from the Internet instantly. The data source I used here is CoinDesk API. For more details about how this API works and the parameter settings are, you can check my article below.

import json
from urllib import request
url = "https://production.api.coindesk.com/v2/tb/price/ticker?assets=all"
response = request.urlopen(url)
data = json.loads(response.read().decode())

By the above code, we can easily obtain the latest cryptocurrency pricing for different coins. I have shown below how the data should look like.

{'statusCode': 200,
'message': 'OK',
'data': {'BTC': {'iso': 'BTC',
'name': 'Bitcoin',
'slug': 'bitcoin',
'change': {'percent': 0.6642429710971107, 'value': 298.538427},
'ohlc': {'o': 44944.160494,
'h': 46843.67,
'l': 43480.540171,
'c': 45242.698921},
'circulatingSupply': 18814943.79249204,
'marketCap': 851238837219.2552,
'ts': 1631590859000,
'src': 'tb'},
'ETH': {...}}}

Data Tabulation

However, the data is in JSON format which is not easy to read. Therefore, we have to preprocess the data into the table format.

def flatten_json(coin_dict):
flatten_dict = {}
def flatten(x, name=''):
if type(x) is dict:
for key in x:
flatten(x[key], name + key + '_')
else:
flatten_dict[name[:-1]] = x
flatten(coin_dict)
return flatten_dict
import pandas as pd
master_df = pd.DataFrame()
for coin in data['data'].keys():
temp_df = pd.json_normalize(flatten_json(data['data'][coin]))
master_df = master_df.append(temp_df)
master_df = master_df[['iso', 'name', 'ohlc_o', 'ohlc_h', 'ohlc_l', 'ohlc_c', 'change_percent']].reset_index(drop=True)
master_df.columns = ['Symbol', 'Name', 'Open', 'High', 'Low', 'Close', 'Pct_Change']
master_df.iloc[:, 2:] = master_df.iloc[:, 2:].apply(lambda x: round(x, 2))
master_df['Pct_Change'] = master_df['Pct_Change'] / 100
master_df = master_df.sort_values('Pct_Change', ascending=False).reset_index(drop=True)
master_df.to_csv('master_df.csv', index=False)
master_df.head()

In order to transform the JSON into the table, you can follow these steps.

  1. Flatten the dictionary in the JSON for every coin data
  2. Append all the coin data frames into one master data frame
  3. Select the columns that we are interested in
  4. Rename the columns for easy reference
  5. Rounding the figures to simplify the table
  6. Sort by the Percentage Change
Photo by cyda

Excel Formatting

Finally, it comes to the main dish. Just remember one thing, before writing the code, you should have a design in mind first. For example, below is my draft template. The reason to have the draft is because it helps you better define the column or index to put the elements (title, table, remark, etc).

Photo by cyda
1. Add the table to the excel
file_name = "Cryptocurrency.xlsx"
sheet_name = "Summary"
writer = pd.ExcelWriter(file_name, engine='xlsxwriter')
master_df.to_excel(writer, sheet_name=sheet_name, startrow = 2, index = False)

Just one point to note, you may see that there is a parameter startrow set to 2. It is because we would like to leave the first two rows for the title and the spacing and we will start writing the table in row 3 (Python count value starting from 0 so basically setting as 2 means the 3rd row)

Photo by cyda
2. Add the title to the excel
from datetime import datetime
workbook = writer.book
worksheet = writer.sheets[sheet_name]
worksheet.write(0, 0, 'Cryptocurrency Pricing Summary on '+datetime.now().strftime('%d %b %Y'), workbook.add_format({'bold': True, 'color': '#E26B0A', 'size': 14}))

To define which cell to write the text, you can count from top to bottom then from left to right. For example, if you are writing A1, then it is (0, 0). If you are writing C4, then it should be (3, 2).

Photo by cyda
3. Add the remark to the excel
worksheet.write(len(master_df)+4, 0, 'Remark:', workbook.add_format({'bold': True}))
worksheet.write(len(master_df)+5, 0, 'The last update time is ' + datetime.now().strftime('%H:%M') + '.')

Here is one trick to bear in mind. Since the data frame may have a different number of rows in every update, you should better consider its records when writing the remark string. For example, here I set the row index as len(master_df)+4 which is the number of rows in the table plus the title and spacings.

Photo by cyda
4. Add the color to the table header
header_format = workbook.add_format({'bold': True, 'text_wrap': True, 'fg_color': '#FDE9D9', 'border': 1})
for col_num, value in enumerate(master_df.columns.values):
worksheet.write(2, col_num, value, header_format)

For the table headers, two parameters are introduced. The first one is text_wrap. It is set to True so that the text for the header will be wrapped to the next line if the space is not enough to display. The second one is fg_color. It is used to set the foreground color of the cell.

Photo by cyda
5. Add the border to the table
row_idx, col_idx = master_df.shape
for r in range(row_idx):
for c in range(col_idx):
if c == 6:
worksheet.write(r + 3, c, master_df.values[r, c], workbook.add_format({'border': 1, 'num_format': '0.00%'}))
else:
worksheet.write(r + 3, c, master_df.values[r, c], workbook.add_format({'border': 1, 'num_format': '0.00'}))

The trick is similar to previous cases. Just remember to set the row index to be r+3 because there are three rows ahead of the first value in the table.

Photo by cyda
6. Set the Column Width
worksheet.set_column(0, 6, 12)
worksheet.set_column(1, 1, 20)

For your reference, the function parameter is as follow: 
(start column index, end column index, column width)

Photo by cyda

Conclusion

I hope this article can help those who have to regularly update the Excel spreadsheet. If you would like to know more office tricks with programming, you can subscribe to my medium and I will update and share more skills with you all. =)

Comments