Sunday, October 5, 2014

OpenStack Troubleshooting - VM Fails to Start (No valid host was found)

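1. Problem Description

Yesterday, after installing Neutron by following the official installation guide, I tried to boot my first VM on OpenStack. After running nova boot xxxxxxx, I checked the VM status and found that it was ERROR: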

controller#  nova list
+--------------------------------------+----------------+--------+------------+-------------+----------+
| ID                                   | Name           | Status | Task State | Power State | Networks |
+--------------------------------------+----------------+--------+------------+-------------+----------+
| b7eb4c0f-ef39-4034-b4a9-1d9f7f90b553 | demo-instance1 | ERROR  | -          | NOSTATE     |          |
+--------------------------------------+----------------+--------+------------+-------------+----------+

2. Investigation

2.1 Checking the controller node

2.1.1 Querying the nova error message:

controller#  nova show demo-instance1
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| Property                             | Value                                                                                                                  |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                                                                                 |
| OS-EXT-AZ:availability_zone          | nova                                                                                                                   |
| OS-EXT-STS:power_state               | 0                                                                                                                      |
| OS-EXT-STS:task_state                | -                                                                                                                      |
| OS-EXT-STS:vm_state                  | error                                                                                                                  |
| OS-SRV-USG:launched_at               | -                                                                                                                      |
| OS-SRV-USG:terminated_at             | -                                                                                                                      |
| accessIPv4                           |                                                                                                                        |
| accessIPv6                           |                                                                                                                        |
| config_drive                         |                                                                                                                        |
| created                              | 2014-10-04T13:56:37Z                                                                                                   |
| fault                                | {"message": "No valid host was found.", "code": 500, "created": "2014-10-03T09:50:40Z"} |
| flavor                               | m1.tiny (1)                                                                                                            |
| hostId                               |                                                              |
| id                                   | b7eb4c0f-ef39-4034-b4a9-1d9f7f90b553                                                                                   |
| image                                | cirros-0.3.3-x86_64 (77c0d5f8-1bcc-4937-932c-72f4b0eccbc3)                                                             |
| key_name                             | demo-key                                                                                                               |
| metadata                             | {}                                                                                                                     |
| name                                 | demo-instance1                                                                                                         |
| os-extended-volumes:volumes_attached | []                                                                                                                     |
| status                               | ERROR                                                                                                                  |
| tenant_id                            | 7539436331ca4f9783bf93163e2a2e0f                                                                                       |
| updated                              | 2014-10-04T22:44:43Z                                                                                                   |
| user_id                              | bc1ae50e167f45edb064e582702c5792                                                                                       |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+
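When several instances need checking, the fault field can be pulled out of this kind of output with a script. What follows is only a minimal sketch: the helper names `parse_cli_table` and `extract_fault` are my own (hypothetical), and it assumes each property fits on a single table row, which may not hold when the fault carries a long multi-line detail.

```python
import json


def parse_cli_table(text):
    """Turn a two-column '| Property | Value |' table into a dict."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip borders ('+----+----+') and anything that is not a table row.
        if not line.startswith("|") or not line.endswith("|"):
            continue
        # Split on the first inner '|' only, so values containing '|' survive.
        key, _, value = line[1:-1].partition("|")
        rows[key.strip()] = value.strip()
    return rows


def extract_fault(text):
    """Return the decoded 'fault' field, or None if the instance has none."""
    raw = parse_cli_table(text).get("fault")
    return json.loads(raw) if raw else None


# Shortened sample in the same layout as the `nova show` output.
sample = """
+----------+-----------------------------------------------------------+
| Property | Value                                                     |
+----------+-----------------------------------------------------------+
| fault    | {"message": "No valid host was found.", "code": 500}      |
| status   | ERROR                                                     |
+----------+-----------------------------------------------------------+
"""

fault = extract_fault(sample)
print(fault["message"], fault["code"])
```

In practice the text would come from the saved output of `nova show <instance>`; the sample here is trimmed to two rows to keep the sketch short.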

2.1.2 Checking /var/log/nova/nova-api.log

The following messages appeared:

2014-10-04 13:56:38.074 1347 INFO nova.osapi_compute.wsgi.server [req-18fd79ad-0a01-48a0-b8f0-29fac56a5c09 bc1ae50e167f45edb064e582702c5792 7539436331ca4f9783bf93163e2a2e0f] 10.0.0.11 "GET /v2/7539436331ca4f9783bf93163e2a2e0f/images/77c0d5f8-1bcc-4937-932c-72f4b0eccbc3 HTTP/1.1" status: 200 len: 894 time: 0.4527042
2014-10-04 13:56:38.092 1347 INFO nova.api.openstack.wsgi [req-d508d2a8-4a70-46d6-b11e-e21da95224be bc1ae50e167f45edb064e582702c5792 7539436331ca4f9783bf93163e2a2e0f] HTTP exception thrown: The resource could not be found.
2014-10-04 13:56:38.094 1347 INFO nova.osapi_compute.wsgi.server [req-d508d2a8-4a70-46d6-b11e-e21da95224be bc1ae50e167f45edb064e582702c5792 7539436331ca4f9783bf93163e2a2e0f] 10.0.0.11 "GET /v2/7539436331ca4f9783bf93163e2a2e0f/flavors/m1.tiny HTTP/1.1" status: 404 len: 272 time: 0.0192289
2014-10-04 13:56:38.106 1347 INFO nova.osapi_compute.wsgi.server [req-9b6159e7-67ff-454e-a711-46c626354c7d bc1ae50e167f45edb064e582702c5792 7539436331ca4f9783bf93163e2a2e0f] 10.0.0.11 "GET /v2/7539436331ca4f9783bf93163e2a2e0f/flavors HTTP/1.1" status: 200 len: 1383 time: 0.0111101
2014-10-04 13:56:38.117 1347 INFO nova.osapi_compute.wsgi.server [req-a84e3545-5d20-456f-abad-5e8d5f7bc634 bc1ae50e167f45edb064e582702c5792 7539436331ca4f9783bf93163e2a2e0f] 10.0.0.11 "GET /v2/7539436331ca4f9783bf93163e2a2e0f/flavors HTTP/1.1" status: 200 len: 1383 time: 0.0103061
2014-10-04 13:56:38.130 1347 INFO nova.osapi_compute.wsgi.server [req-f5888737-0e2a-4c43-bffa-7f61479f3844 bc1ae50e167f45edb064e582702c5792 7539436331ca4f9783bf93163e2a2e0f] 10.0.0.11 "GET /v2/7539436331ca4f9783bf93163e2a2e0f/flavors/1 HTTP/1.1" status: 200 len: 591 time: 0.0125630

2.1.3 Querying the Nova DB

In nova.instance_faults, the error detail contained the following:

File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 108, in schedule_run_instance
raise exception.NoValidHost(reason="")

This suggests that nova-scheduler could not find a suitable compute node to act as the host.

2.2 Checking the compute node

2.2.1 Checking /var/log/nova/nova-compute.log

The following messages appeared:

2014-10-04 13:56:11.489 18675 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on controller:5672
2014-10-04 13:56:11.489 18675 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-10-04 13:56:15.501 18675 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on controller:5672 is unreachable: Socket closed. Trying again in 5 seconds.
2014-10-04 13:56:20.504 18675 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on controller:5672
2014-10-04 13:56:20.505 18675 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-10-04 13:56:24.520 18675 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on controller:5672 is unreachable: Socket closed. Trying again in 7 seconds.
2014-10-04 13:56:31.525 18675 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on controller:5672
2014-10-04 13:56:31.525 18675 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-10-04 13:56:35.536 18675 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on controller:5672 is unreachable: Socket closed. Trying again in 9 seconds.
... (remainder omitted)

The log above shows that the compute node cannot communicate with the RabbitMQ service.
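Rather than reading the log by eye, the failed connection attempts can be picked out with a short script. This is a rough sketch: the regular expression is modeled only on the ERROR lines quoted from nova-compute.log, so it is an assumption that other oslo.messaging versions word the message the same way, and `find_amqp_failures` is a name I made up for illustration.

```python
import re

# Pattern modeled on the ERROR lines quoted from nova-compute.log; other
# oslo.messaging versions may word this message differently (assumption).
UNREACHABLE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \d+ ERROR .*"
    r"AMQP server on (?P<host>[^:\s]+):(?P<port>\d+) is unreachable: "
    r"(?P<reason>.+?)\. Trying again in (?P<delay>\d+) seconds"
)


def find_amqp_failures(log_text):
    """Return one dict per 'AMQP server ... is unreachable' line."""
    failures = []
    for line in log_text.splitlines():
        match = UNREACHABLE.search(line)
        if match:
            failures.append(match.groupdict())
    return failures


sample = """\
2014-10-04 13:56:11.489 18675 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on controller:5672
2014-10-04 13:56:15.501 18675 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on controller:5672 is unreachable: Socket closed. Trying again in 5 seconds.
2014-10-04 13:56:24.520 18675 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on controller:5672 is unreachable: Socket closed. Trying again in 7 seconds.
"""

for failure in find_amqp_failures(sample):
    print(failure["ts"], failure["host"], failure["reason"], failure["delay"])
```

A steady stream of these entries with a growing retry delay, as in the log here, means the service never managed to establish a session with the broker.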

I also found a diagram on the official site that shows the complete flow Nova follows when launching a VM instance:
[Figure: Nova VM Provisioning]

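The problem lies in steps 4~8 of that flow. The compute node could not communicate with the queue (the RabbitMQ service), so it could not report that a compute node was available. As a result, nova-scheduler found no suitable compute node and could not dispatch the deployment message to the queue.

Because this environment has only one compute node, Nova Scheduler was left with no compute node on which to deploy the VM instance, hence the error.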
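The reasoning can be illustrated with a toy model. This is not Nova's actual scheduler code, only a simplified sketch of the idea: every name in it (`select_host`, the `alive` and `free_ram_mb` keys) is hypothetical, and the 512 MB request is an assumed value for a small flavor.

```python
class NoValidHost(Exception):
    """Raised when no candidate host survives filtering."""


def select_host(hosts, required_ram_mb):
    """Pick a host that is reporting in and has enough free RAM.

    `hosts` is a list of dicts with hypothetical keys:
    'name', 'alive' (bool) and 'free_ram_mb' (int).
    """
    candidates = [
        h for h in hosts
        if h["alive"] and h["free_ram_mb"] >= required_ram_mb
    ]
    if not candidates:
        raise NoValidHost("No valid host was found.")
    # Prefer the host with the most free RAM (a simple weighting rule).
    return max(candidates, key=lambda h: h["free_ram_mb"])["name"]


# The only compute node cannot reach the message queue, so it is not alive.
hosts = [{"name": "compute1", "alive": False, "free_ram_mb": 2048}]

try:
    select_host(hosts, required_ram_mb=512)
except NoValidHost as exc:
    print("scheduling failed:", exc)
```

With a single compute node, one host dropping out of the candidate list is enough to empty it, which is why the failure shows up as a scheduling error rather than as a messaging error.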

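3. Solution

Having confirmed that the compute node could not talk to RabbitMQ, I first checked whether the RabbitMQ username and password in /etc/nova/nova.conf were correct.

It turned out the password was wrong, which is why the compute node could never connect to RabbitMQ. After correcting it and restarting the nova-compute service, the VM deployed normally!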
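A mismatch like this is easy to miss by eye, so the RabbitMQ settings on the two nodes can be compared with a script. This is a minimal sketch with several assumptions: the option names are the `rabbit_*` settings under `[DEFAULT]` that nova.conf used in releases of this era, the config contents are placeholders rather than my real files, and the helper names are hypothetical.

```python
import configparser

# Option names follow the [DEFAULT] rabbit_* settings used by nova.conf in
# releases of this era; newer releases configure this differently (assumption).
RABBIT_KEYS = ("rabbit_host", "rabbit_port", "rabbit_userid", "rabbit_password")


def rabbit_settings(conf_text):
    """Extract the rabbit_* options from the text of a nova.conf file."""
    # interpolation=None keeps '%' in passwords literal; strict=False
    # tolerates options that appear more than once.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    defaults = parser.defaults()
    return {key: defaults.get(key) for key in RABBIT_KEYS}


def diff_settings(controller_text, compute_text):
    """Return {option: (controller_value, compute_value)} for mismatches."""
    a = rabbit_settings(controller_text)
    b = rabbit_settings(compute_text)
    return {k: (a[k], b[k]) for k in RABBIT_KEYS if a[k] != b[k]}


# Placeholder contents, not real configuration files.
controller_conf = """
[DEFAULT]
rpc_backend = rabbit
rabbit_host = controller
rabbit_password = RABBIT_PASS
"""

compute_conf = """
[DEFAULT]
rpc_backend = rabbit
rabbit_host = controller
rabbit_password = WRONG_PASS
"""

for option in diff_settings(controller_conf, compute_conf):
    # Print only the option name so passwords do not end up in a terminal log.
    print(option, "differs between controller and compute")
```

After fixing the setting and restarting nova-compute, running `nova service-list` on the controller should show the nova-compute service as up, which is a quick way to confirm the node is reporting in again.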
