How to Extract Monthly Revenue Information in Taiwan Stock Exchange via Python?
Unlike US stock market, the public traded companies in Taiwanese stock market publish revenue every month. This information is very useful to help us track of operations performance, trending, industry status, etc.
Today we are going to show you how to pull this information. First, let’s study on the structure of the following websites before extracting the data.
https://mops.twse.com.tw/nas/t21/XXX/t21sc03_YYY_M_i.html
XXX = sii (stock exchange market) /otc (over-the-counter market)
YYY = R.O.C year (YYY + 1911 = CE)
M = Month
i = 0 (Local companies) / 1 (Companies registered in Cayman Islands)
Step 0: Import packages.
from datetime import date
from datetime import timedelta
import pandas as pd
import requests
import sqlite3
from google.colab import drive
Step 1: Define year/ month
today = date.today()
year = today.year - 1911
month = today.month - 1
date = str(year) + '_' + str(month)
Step 2: Remove the current year-month to make sure there is no duplicates.
drive.mount('/content/drive')
con = sqlite3.connect('/content/drive/MyDrive/data/Stock.db')
delete_sql = "DELETE FROM monthly_revenue WHERE 年月 = '" + date + "'"
con.execute(delete_sql)
con.commit()
Step 3: Extract information for these 4 websites.
url_1 = 'https://mops.twse.com.tw/nas/t21/sii/t21sc03_'+ date +'_0.html'
url_2 = 'https://mops.twse.com.tw/nas/t21/sii/t21sc03_'+ date +'_1.html'
url_3 = 'https://mops.twse.com.tw/nas/t21/otc/t21sc03_'+ date +'_0.html'
url_4 = 'https://mops.twse.com.tw/nas/t21/otc/t21sc03_'+ date +'_1.html'
urls = [url_1, url_2 , url_3 , url_4]
for url in urls:
res = requests.get(url)
res.encoding = 'big5'
html_df = pd.read_html(res.text)
df = pd.concat([df for df in html_df if df.shape[1] == 11])
df.columns = df.columns.get_level_values(1)
df = df[df['公司名稱'] != '合計']
df = df.reset_index(drop=True)
df['年月'] = date
Step 4: Load data into SQLite
df.to_sql('monthly_revenue', con, if_exists='append')
con.close()
Thank you and more to come! We will talk more on the use cases for this output to help us understand operations performance, trending, industry status, etc. Enjoy it :)
If you want to support Informula, you can buy us a coffee here :)