Installing Plotly and Drawing Five Common Chart Types

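Plotly is a new-generation Python data visualization library that offers rich interactivity and flexible plotting options. This article shows beginners how to install plotly, write a first plotting program, and draw five common types of data charts.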

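Compared with Matplotlib and Seaborn, Plotly takes data visualization to another level. It ships with built-in interactivity and editing tools, supports both online and offline modes, and provides a stable API for integration with existing applications. Charts can be displayed in a web browser or saved as a local copy. Its one drawback is the flip side of its flexibility: the sheer number of options can be overwhelming.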

1. Installing Plotly

Plotly is a platform built around the JSON format, and in Python its API is accessed through the plotly package. Open a terminal and run the following command to install it:

~$ pip install plotly

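In online mode, Plotly charts are hosted by a web service, so you first need to create an online account to store your charts. To obtain your personal API key, visit https://plot.ly/settings/api#/. With the key in hand, initialize your credentials with the set_credentials_file() function. The calls shown here follow the plotly 3.x package layout; from plotly 4 onward the online features, including this function, live in the separate chart_studio package. For example: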

import plotly

plotly.tools.set_credentials_file(
    username='YourAccountName',  # account name
    api_key='YourAPIKey'         # API key
)

2. Plotly Data Structure Basics

As noted above, plotly visualizations are built on JSON data structures.

import plotly.plotly as py      # communicates with the plotly server
import plotly.graph_objs as go  # builds graph objects

The graph_objs module contains a set of generic data structures that stay consistent across the different visualization types.

We start with the trace, a single layer that holds the data together with its drawing instructions. Here is an example of a trace structure:

trace1 = {
    "x": ["2017-09-30", "2017-10-31", "2017-11-30", ...],
    "y": [327900.0, 329100.0, 331300.0, ...],
    "line": {
        "color": "#385965",
        "width": 1.5
    },
    "mode": "lines",
    "name": "Hawaii",
    "type": "scatter",
}

As you can see, a trace is a dictionary that stores the data to plot along with styling information such as color and line style.
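
Because a trace is an ordinary dictionary, it can be inspected and serialized with the standard library alone. The sketch below is illustrative only: the two data points are copied from the example above (the elided values are left out), and describe_trace is a hypothetical helper, not part of plotly.

```python
import json

# A shortened copy of the trace shown above; only the first two points are kept.
trace1 = {
    "x": ["2017-09-30", "2017-10-31"],
    "y": [327900.0, 329100.0],
    "line": {"color": "#385965", "width": 1.5},
    "mode": "lines",
    "name": "Hawaii",
    "type": "scatter",
}


def describe_trace(trace):
    """Return a one-line summary of a trace dictionary (hypothetical helper)."""
    return "{name}: {type} trace, mode={mode}, {n} points".format(
        name=trace["name"],
        type=trace["type"],
        mode=trace.get("mode", "default"),
        n=len(trace["x"]),
    )


# The trace survives a JSON round trip unchanged, which is what allows it
# to be handed to a browser or a web service.
encoded = json.dumps(trace1)
decoded = json.loads(encoded)

print(describe_trace(decoded))
```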

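Multiple traces can be collected in a list, which is called data. The order of the traces in data determines the order in which they are placed in the final chart. A typical data list looks like this: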

data = [trace1, trace2, trace3, trace4]

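The layout controls the overall appearance of the chart, including display properties such as the title, axis titles, and fonts. Like a trace, the layout is a dictionary: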

layout = {
    "showlegend": True,
    "title": {"text": "Zillow Home Value Index for Top 5 States"},
    "xaxis": {
        "rangeslider": {"visible": True},
        "title": {"text": "Year from 1996 to 2017"},
        "zeroline": False
    },
    "yaxis": {
        "title": {"text": "ZHVI BottomTier"},
        "zeroline": False
    }
}

Finally, we combine data and layout with go.Figure() and pass the result to the plotting function of our choice.

fig = go.Figure(data = data, layout = layout)
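
Conceptually, the resulting figure is a dictionary with a "data" list and a "layout" dictionary. The following sketch mimics that shape with plain dictionaries so that it runs without plotly installed. Note that build_figure and the sample traces are assumptions made for illustration; the real go.Figure() additionally validates property names.

```python
import json


def build_figure(data, layout):
    """Combine traces and a layout into a figure-shaped dict (illustrative only)."""
    for trace in data:
        if "type" not in trace:
            raise ValueError("every trace needs a 'type' key, e.g. 'scatter'")
    return {"data": list(data), "layout": dict(layout)}


# Two made-up traces; their order in the list is the order they appear in the chart.
hawaii = {"type": "scatter", "mode": "lines", "name": "Hawaii",
          "x": ["2017-09-30", "2017-10-31"], "y": [327900.0, 329100.0]}
sample = {"type": "scatter", "mode": "lines", "name": "Sample state",
          "x": ["2017-09-30", "2017-10-31"], "y": [300000.0, 301000.0]}

layout = {"showlegend": True, "title": {"text": "Example figure"}}

fig_dict = build_figure([hawaii, sample], layout)
trace_names = [trace["name"] for trace in fig_dict["data"]]

print(trace_names)
print(json.dumps(fig_dict["layout"]))
```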

3. Plotly Bar Chart

[Figure: Plotly bar chart]

The following code draws a bar chart:

#Bar Chart
#Mean house values by Bedrooms type and year
# Assumption: df_groupby_datebr is a pandas DataFrame indexed by year,
# with one ZHVI_* column per bedroom type; it is prepared elsewhere.
import plotly.graph_objs as go
import plotly.plotly as py

# One bar trace per bedroom type, each with its own color
trace1 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_1bedroom,
    name = "ZHVI_1bedroom",
    marker = dict(color = 'rgb(102,255,255)'),
    text = df_groupby_datebr['ZHVI_1bedroom'])
trace2 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_2bedroom,
    name = "ZHVI_2bedroom",
    marker = dict(color = 'rgb(102,178,255)'),
    text = df_groupby_datebr['ZHVI_2bedroom'])
trace3 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_3bedroom.values,
    name = "ZHVI_3bedroom",
    marker = dict(color = 'rgb(102,102,255)'),
    text = df_groupby_datebr['ZHVI_3bedroom'])
trace4 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_4bedroom.values,
    name = "ZHVI_4bedroom",
    marker = dict(color = 'rgb(178, 102, 255)'),
    text = df_groupby_datebr['ZHVI_4bedroom'])
trace5 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_5BedroomOrMore.values,
    name = "ZHVI_5BedroomOrMore",
    marker = dict(color = 'rgb(255, 102, 255)'),
    text = df_groupby_datebr['ZHVI_5BedroomOrMore'])

data = [trace1, trace2, trace3, trace4, trace5]
layout = go.Layout(
    barmode = "group",
    title = "Bar Chart: Mean House Values by Bedrooms and Year",
    xaxis = dict(title = 'Year', ticklen = 5, zeroline = False),
    yaxis = dict(title = 'Mean House Values', ticklen = 5, zeroline = False))
fig = go.Figure(data = data, layout = layout)
url = py.plot(fig, validate=False)

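go.Bar() creates a trace of the bar chart type. go.Layout() lets us specify important settings; for example, barmode = "group" places the bars for the different bedroom types side by side within each year.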
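
The five go.Bar() calls above differ only in column name and color, so the same traces could also be generated in a loop. Below is a minimal sketch of that idea using plain trace dictionaries. The yearly values are invented sample numbers, not Zillow data, and make_bar_traces is a hypothetical helper.

```python
def make_bar_traces(years, columns, colors):
    """Build one bar trace per column; all traces share the same x values."""
    traces = []
    for (name, values), color in zip(columns.items(), colors):
        traces.append({
            "type": "bar",
            "x": list(years),
            "y": list(values),
            "name": name,
            "marker": {"color": color},
            "text": list(values),  # per-bar text, as in the code above
        })
    return traces


# Invented sample values for illustration only.
years = [2015, 2016, 2017]
columns = {
    "ZHVI_1bedroom": [100.0, 110.0, 120.0],
    "ZHVI_2bedroom": [150.0, 160.0, 170.0],
}
colors = ["rgb(102,255,255)", "rgb(102,178,255)"]

data = make_bar_traces(years, columns, colors)
layout = {
    "barmode": "group",  # draw the bars of each year side by side
    "title": {"text": "Mean House Values by Bedrooms and Year"},
}

print([trace["name"] for trace in data])
```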

4. Plotly Line Chart

[Figure: Plotly line chart]

The following code draws a line chart:

#Line Plot
#Mean house values by bedrooms and year
trace1 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_1bedroom,
    mode = "lines+markers",
    name = "ZHVI_1bedroom",
    marker = dict(color = 'rgb(102,255,255)'),
    text = df_groupby_datebr['ZHVI_1bedroom'])
trace2 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_2bedroom,
    mode = "lines+markers",
    name = "ZHVI_2bedroom",
    marker = dict(color = 'rgb(102,178,255)'),
    text = df_groupby_datebr['ZHVI_2bedroom'])
trace3 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_3bedroom.values,
    mode = "lines+markers",
    name = "ZHVI_3bedroom",
    marker = dict(color = 'rgb(102,102,255)'),
    text = df_groupby_datebr['ZHVI_3bedroom'])
trace4 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_4bedroom.values,
    mode = "lines+markers",
    name = "ZHVI_4bedroom",
    marker = dict(color = 'rgb(178, 102, 255)'),
    text = df_groupby_datebr['ZHVI_4bedroom'])
trace5 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_5BedroomOrMore.values,
    mode = "lines+markers",
    name = "ZHVI_5BedroomOrMore",
    marker = dict(color = 'rgb(255, 102, 255)'),
    text = df_groupby_datebr['ZHVI_5BedroomOrMore'])

data = [trace1, trace2, trace3, trace4, trace5]
layout = go.Layout(
    title = 'Line Plot: Mean House Values by Bedrooms and Year',
    xaxis = dict(title = 'Year', ticklen = 5, zeroline = False),
    yaxis = dict(title = 'Mean House Values', ticklen = 5, zeroline = False))
fig = go.Figure(data = data, layout = layout)
url = py.plot(fig, validate=False)

go.Scatter() initializes a line chart trace. The mode parameter controls how the points are drawn: as lines, as markers, or both. For example:

mode = "lines+markers"
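
The mode string is a combination of the flags "lines", "markers" and "text" joined with "+". A small sketch of composing it from boolean switches follows; scatter_mode is a hypothetical helper written for this article, not a plotly function.

```python
def scatter_mode(lines=False, markers=False, text=False):
    """Compose a scatter mode string such as 'lines+markers'."""
    flags = []
    if lines:
        flags.append("lines")
    if markers:
        flags.append("markers")
    if text:
        flags.append("text")
    # With no flag set, 'none' draws nothing for the trace.
    return "+".join(flags) if flags else "none"


print(scatter_mode(lines=True))                # lines
print(scatter_mode(markers=True))              # markers
print(scatter_mode(lines=True, markers=True))  # lines+markers
```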

5. Plotly Time Series Line Chart

[Figure: Plotly time series line chart]

The following code draws a time series line chart:

#Time Series Line Chart
import seaborn as sns  # provides color_palette(), used for the colors below

# The five states with the highest mean ZHVI_BottomTier
state_list = df_state.groupby('RegionName')[['ZHVI_BottomTier']].mean().sort_values(
    by='ZHVI_BottomTier', ascending=False)[:5].index.values.tolist()
colors = dict(zip(state_list, sns.color_palette("GnBu_d", len(state_list)).as_hex()))

# One line trace per state
trace_list = []
for state in state_list:
    trace = go.Scatter(
        y=df_state[df_state['RegionName']==state]['ZHVI_BottomTier'].tolist(),
        x=df_state[df_state['RegionName']==state]['Date'].tolist(),
        mode='lines',
        name=state,
        line = dict(
            color = colors[state],
            width = 1.5,
            # dash = 'dot'
        )
    )
    trace_list.append(trace)

layout = go.Layout(
    xaxis=dict(title='Year from 1996 to 2017', zeroline=False, rangeslider=dict(visible=True)),
    yaxis=dict(title='ZHVI BottomTier', zeroline=False),
    title='Zillow Home Value Index for Top 5 States',
    showlegend=True,
)

fig = go.Figure(data=trace_list, layout=layout)
url = py.plot(fig, validate=False, filename='ZHVI BottomTier')

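Here we add a range slider to control how much of the data is shown in the main plot. We also use a dictionary to assign a different color to each state, built with seaborn's color_palette() function. Because plotly does not accept RGB tuples, we convert the palette to hexadecimal codes with as_hex().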
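
For readers without seaborn, the conversion itself is simple. The sketch below shows one way to turn an RGB tuple with components in the range 0 to 1 (the form color_palette() returns) into a hex string. Here rgb_to_hex is a hypothetical helper, the state names are placeholders, and the rounding may differ slightly from what as_hex() produces.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a '#rrggbb' string."""
    channels = []
    for value in rgb:
        if not 0.0 <= value <= 1.0:
            raise ValueError("RGB components must be between 0 and 1")
        channels.append(int(round(value * 255)))
    return "#{:02x}{:02x}{:02x}".format(*channels)


# Map placeholder state names to colors, mirroring the `colors` dict above.
palette = [(1.0, 0.0, 0.0), (0.0, 0.5, 1.0)]
states = ["State A", "State B"]
colors = dict(zip(states, (rgb_to_hex(rgb) for rgb in palette)))

print(colors)
```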

6. Plotly Multiple Scatter Plots

[Figure: Plotly multiple scatter plots]

The following code draws multiple scatter plots:

#Multiple Scatter Plots
from plotly import tools

# Each trace compares the listing prices of two adjacent bedroom categories
trace1 = go.Scatter(x=df_sts.MedianListingPrice_1Bedroom,
                    y=df_sts.MedianListingPrice_2Bedroom, mode='markers',
                    name = "1Bedroom&2Bedroom", marker = dict(
                        color = 'rgb(102,255,255)'))
trace2 = go.Scatter(x=df_sts.MedianListingPrice_2Bedroom,
                    y=df_sts.MedianListingPrice_3Bedroom, mode='markers',
                    name = "2Bedroom&3Bedroom", marker = dict(
                        color = 'rgb(102,178,255)'))
trace3 = go.Scatter(x=df_sts.MedianListingPrice_3Bedroom,
                    y=df_sts.MedianListingPrice_4Bedroom, mode='markers',
                    name = "3Bedroom&4Bedroom", marker = dict(
                        color = 'rgb(102,102,255)'))
trace4 = go.Scatter(x=df_sts.MedianListingPrice_4Bedroom,
                    y=df_sts.MedianListingPrice_5BedroomOrMore, mode='markers',
                    name = "4Bedroom&5+Bedroom", marker = dict(
                        color = 'rgb(178, 102, 255)'))

# A 2x2 grid of subplots, one per trace
fig = tools.make_subplots(rows=2, cols=2,
                          subplot_titles=("1Bedroom & 2Bedroom", "2Bedroom & 3Bedroom",
                                          "3Bedroom&4Bedroom", "4Bedroom&5+Bedroom"))

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)

# Label the x axes of the bottom row and the y axes of the left column
fig['layout']['xaxis3'].update(title='Median Listing Price')#showgrid=False
fig['layout']['xaxis4'].update(title='Median Listing Price')

fig['layout']['yaxis1'].update(title='Median Listing Price')
fig['layout']['yaxis3'].update(title='Median Listing Price')

fig['layout'].update(height=600, width=600,
                     title='Multiple Scatter Plots: Median Listing Price of' +
                           ' Bedrooms')
url = py.plot(fig, validate=False)

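To create this layout, instead of collecting the traces in a single data list, we create the subplots with make_subplots() and then place each trace at a specific grid position with append_trace().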
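
The axis names used in the code above (xaxis3, xaxis4, yaxis1, yaxis3) follow from how the grid cells are numbered. Assuming the default arrangement, in which subplots are numbered row by row starting from the top-left cell and no axes are shared, the mapping from a cell to its axis number can be sketched as follows. subplot_axis_number is a hypothetical helper, not part of plotly.

```python
def subplot_axis_number(row, col, cols):
    """Return the axis number of the subplot at (row, col), both 1-based.

    Assumes row-major numbering from the top-left cell, e.g. in a 2x2 grid
    the bottom-left subplot uses xaxis3 / yaxis3.
    """
    if row < 1 or col < 1 or col > cols:
        raise ValueError("row and col are 1-based and col must not exceed cols")
    return (row - 1) * cols + col


# Axis numbers for the 2x2 grid used above.
for row in (1, 2):
    for col in (1, 2):
        n = subplot_axis_number(row, col, cols=2)
        print("row", row, "col", col, "-> xaxis%d / yaxis%d" % (n, n))
```

Under that assumption, the bottom row corresponds to xaxis3 and xaxis4 and the left column to yaxis1 and yaxis3, which are exactly the axes that receive titles.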

7. Plotly Choropleth Map

[Figure: Plotly choropleth map]

Use the following code to draw a choropleth map:

#Choropleth
import plotly.figure_factory as ff  # figure factory, provides create_choropleth()

# Mean listing price per square foot, by region and year
ZHVI_state_year = df_state.groupby(['RegionName','year'])[['MedianListingPricePerSqft_AllHomes']].mean()
ZHVI_county_year = df_county.groupby(['RegionName','year'])[['MedianListingPricePerSqft_AllHomes']].mean()

# The same statistic restricted to 2017
ZHVI_state_2017 = df_state[df_state.year==2017].groupby(['RegionName'])[['MedianListingPricePerSqft_AllHomes']].mean()
ZHVI_county_2017 = df_county[df_county.year==2017].groupby(['RegionName'])[['MedianListingPricePerSqft_AllHomes']].mean()

#%%
# The county values and the index they are keyed by, passed as FIPS codes below
values = ZHVI_county_2017['MedianListingPricePerSqft_AllHomes'].tolist()
fips = ZHVI_county_2017['MedianListingPricePerSqft_AllHomes'].index.tolist()

ZHVI_county_2017['MedianListingPricePerSqft_AllHomes'].describe()

colorscale = [
    'rgb(102,255,255)',
    'rgb(102,178,255)',
    'rgb(102,102,255)',
    'rgb(178, 102, 255)',
]

fig = ff.create_choropleth(
    fips=fips, values=values, scope=['usa'],
    binning_endpoints=[80.9, 102.8, 135.5], colorscale=colorscale,
    title='United States', legend_title='ZHVI_BottomTier by County'
)
print('done')
url = py.plot(fig, validate=False, filename='ZHVI_cities')
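
In the call above, the three values in binning_endpoints split the data into four intervals, which is why colorscale lists four colors. The sketch below illustrates that bucketing with the standard library. How a value that falls exactly on an endpoint is assigned is an assumption here, the sample values are invented, and bin_index is a hypothetical helper.

```python
from bisect import bisect_right

endpoints = [80.9, 102.8, 135.5]
colorscale = [
    "rgb(102,255,255)",
    "rgb(102,178,255)",
    "rgb(102,102,255)",
    "rgb(178, 102, 255)",
]


def bin_index(value, endpoints):
    """Return the 0-based bin of `value`; n endpoints define n + 1 bins."""
    return bisect_right(endpoints, value)


# Invented price-per-square-foot values for illustration.
for value in (60.0, 95.0, 120.0, 400.0):
    print(value, "->", colorscale[bin_index(value, endpoints)])
```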

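For choropleth maps we can take a shortcut through the figure factory module, which contains a set of convenience functions for creating complex charts.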

import plotly.figure_factory as ff

When calling ff.create_choropleth(), we pass in a list of FIPS values, the geographic identification codes assigned to each county, city, or state.
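
County FIPS codes are five-digit strings made of a two-digit state code followed by a three-digit county code, and the leading zeros are easily lost when the codes are read as integers. A minimal sketch of restoring them follows. normalize_fips and split_fips are hypothetical helpers, and whether ff.create_choropleth() needs this step for a given dataset is not something the original article states.

```python
def normalize_fips(code):
    """Return a county FIPS code as a zero-padded five-character string."""
    text = str(code).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError("not a county FIPS code: %r" % (code,))
    return text.zfill(5)


def split_fips(code):
    """Split a county FIPS code into (state_part, county_part)."""
    fips = normalize_fips(code)
    return fips[:2], fips[2:]


print(normalize_fips(6037))   # Los Angeles County, California
print(split_fips("01001"))    # Autauga County, Alabama
```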


Original article: Getting Started with Plot.ly

Translated and compiled by 汇智网; please credit the source when republishing.