A quick installation of Ceph distributed storage using ceph-deploy.
Reference documentation: http://docs.ceph.org.cn/start/
The installation environment is CentOS 7 with three servers; internet access is required throughout the installation.
The three hosts are ceph1, ceph2, and ceph3.
1. Initialize the environment
Configure passwordless SSH between nodes, time synchronization, the official yum repository, and so on.
Set the hostname (run on each node):
hostnamectl set-hostname ceph1
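Run the same command on the other nodes, each with its own name:
hostnamectl set-hostname ceph2    # on ceph2
hostnamectl set-hostname ceph3    # on ceph3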
Configure local name resolution:
vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.1 ceph1
192.168.1.2 ceph2
192.168.1.3 ceph3
Generate a local SSH key pair:
ssh-keygen
Copy the key to the node servers (run on all three hosts):
ssh-copy-id ceph1
ssh-copy-id ceph2
ssh-copy-id ceph3
Configure time synchronization (on the primary node):
yum -y install chrony
vim /etc/chrony.conf
# Allow NTP client access from local network.
allow 192.168.1.0/24
# Serve time even if not synchronized to a time source.
local stratum 10
Configure time synchronization (on the storage nodes):
yum -y install chrony
vim /etc/chrony.conf
server 192.168.1.1 iburst
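After editing the configuration, the chronyd service needs to be enabled and restarted on every node; the original steps do not show this, so the following standard chrony commands are a suggested addition:
systemctl enable chronyd
systemctl restart chronyd
chronyc sources    # on the storage nodes, the primary (192.168.1.1) should be listed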
It is recommended to add a crontab job on the storage nodes:
crontab -e -uroot
# Run once every three hours (the ntpdate package must be installed: yum -y install ntpdate).
0 */3 * * * /usr/sbin/ntpdate 192.168.1.1
Disable SELinux:
setenforce 0
vim /etc/selinux/config
SELINUX=permissive
Disable the firewall:
systemctl stop firewalld.service
systemctl disable firewalld.service
iptables -F
Create a ceph user (all subsequent operations are performed as this user):
useradd ceph
echo '******' | passwd --stdin ceph
Grant the user sudo privileges:
vim /etc/sudoers
ceph ALL=(root) NOPASSWD:ALL
Distribute SSH keys again as the ceph user (this user needs its own key pair first):
su - ceph
ssh-keygen
ssh-copy-id ceph1
ssh-copy-id ceph2
ssh-copy-id ceph3
Configure the Ceph yum repository (I replaced the local base repo with the 163 mirror):
vim /etc/yum.repos.d/ceph.repo
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-15.2.9/el7/noarch/
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
Adjust the baseurl path above according to the actual values listed at http://download.ceph.com/.
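After editing the repo file, rebuilding the yum cache is a quick way to confirm the repository is reachable (standard yum commands, not part of the original steps):
yum clean all
yum makecache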
Edit the SSH connection file (run as the ceph user):
vim /home/ceph/.ssh/config
Host ceph1
    Hostname ceph1
    User ceph
Host ceph2
    Hostname ceph2
    User ceph
Host ceph3
    Hostname ceph3
    User ceph
chmod 600 /home/ceph/.ssh/config
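A quick check that the per-user SSH config works, run as the ceph user (illustrative):
ssh ceph2 hostname    # should print "ceph2" without prompting for a password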
Prepare an empty, unmounted disk on each node (steps omitted).
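Since the original omits this step, here is a minimal sketch, assuming the spare disk is /dev/sdb and that the /dev/sdb1 partition used in the OSD step is created from it (adjust device names to your hardware):
lsblk                                                     # confirm /dev/sdb exists and is unmounted
parted -s /dev/sdb mklabel gpt mkpart primary 0% 100%     # create /dev/sdb1 spanning the whole disk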
2. Deploy the Ceph software
Switch to the ceph user:
su - ceph
Create a working directory:
cd
mkdir my-cluster
cd my-cluster
Install ceph-deploy:
sudo yum install ceph-deploy
Create the cluster:
sudo ceph-deploy new ceph1 ceph2 ceph3
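This generates ceph.conf and the monitor keyring in my-cluster. If later steps complain about an undefined public network, a line such as the following can be added to ceph.conf (the subnet here matches this guide's example network and is an assumption for your environment):
public_network = 192.168.1.0/24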
Install Ceph:
sudo ceph-deploy install ceph1 ceph2 ceph3
(Errors may occur here; install the indicated packages on the corresponding nodes as the error messages suggest. The command above can be run repeatedly until the installation completes.)
Perform the initial monitor configuration and gather all keys:
sudo ceph-deploy --overwrite-conf mon create-initial
[ceph@ceph1 my-cluster]$ ll
total 712
-rw-------. 1 root root 113 Jul 15 12:37 ceph.bootstrap-mds.keyring
-rw-------. 1 root root 113 Jul 15 12:37 ceph.bootstrap-mgr.keyring
-rw-------. 1 root root 113 Jul 15 12:37 ceph.bootstrap-osd.keyring
-rw-------. 1 root root 113 Jul 15 12:37 ceph.bootstrap-rgw.keyring
Copy the keyrings:
sudo cp -rp ./*keyring /etc/ceph/
Deploy the mgr daemons:
sudo ceph-deploy mgr create ceph1 ceph2 ceph3
Deploy the OSDs (this step determines the total storage capacity):
sudo ceph-deploy --overwrite-conf osd create --data /dev/sdb1 ceph1
sudo ceph-deploy --overwrite-conf osd create --data /dev/sdb1 ceph2
sudo ceph-deploy --overwrite-conf osd create --data /dev/sdb1 ceph3
sudo ceph osd tree
Check the cluster status:
[ceph@ceph1 my-cluster]$ sudo ceph -s
  cluster:
    id:     af5b857b-e874-4f19-939d-a261e689d0f1
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active
  data:
    pools:   4 pools, 32 pgs
    objects: 221 objects, 1.5 KiB
    usage:   3.0 GiB used, 2.8 TiB / 2.8 TiB avail
    pgs:     32 active+clean
Deploy RGW:
Check that the required packages are installed: yum install ceph-radosgw mailcap
sudo ceph-deploy install --rgw ceph1
sudo ceph-deploy --overwrite-conf rgw create ceph1
ps aux | grep radosgw
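In Ceph releases of this era, RGW listens on port 7480 by default, so a quick reachability check can be done with curl (assuming the default port has not been changed):
curl http://ceph1:7480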
Create an account:
Save the output of the following command.
sudo radosgw-admin user create --uid testid --display-name 'M. Tester' --system
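The JSON output contains a keys section; the access_key and secret_key values (shown here as placeholders, not real output) are the ones needed for the dashboard configuration below:
"keys": [
    {
        "user": "testid",
        "access_key": "<ACCESS_KEY>",
        "secret_key": "<SECRET_KEY>"
    }
],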
Deploy the dashboard:
sudo ceph mgr module enable dashboard
To enable SSL encryption, run the following command (if you do not need SSL, skip it and use the non-SSL steps further below):
sudo ceph dashboard create-self-signed-cert
View the front-end page (check the dashboard value):
sudo ceph mgr services
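The command prints the active endpoints as JSON, for example (illustrative; the scheme and port depend on whether SSL is enabled):
{
    "dashboard": "http://ceph1:8080/"
}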
Create a front-end account:
sudo ceph dashboard set-login-credentials {user} {passwd}
Add permissions (the two values below come from the output saved earlier):
sudo ceph dashboard set-rgw-api-access-key {access-key}
sudo ceph dashboard set-rgw-api-secret-key {secret-key}
If SSL encryption is not enabled, use the following commands instead:
------ The following content is quoted from https://zhuanlan.zhihu.com/p/135288558
Disable SSL:
sudo ceph config set mgr mgr/dashboard/ssl false
Configure the listening IP:
sudo ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
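The listening port can be set the same way if the default does not suit; mgr/dashboard/server_port is a standard dashboard option, and 8080 is its usual non-SSL default (an addition to the quoted steps):
sudo ceph config set mgr mgr/dashboard/server_port 8080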
Apply the changes:
sudo ceph mgr module disable dashboard
sudo ceph mgr module enable dashboard
------ End of quoted content
3. Summary of errors
If errors occur, the following commands can restore a clean state:
ceph-deploy purgedata ceph1 ceph2 ceph3
ceph-deploy forgetkeys
The following command performs a complete purge (it also removes the Ceph packages):
ceph-deploy purge ceph1 ceph2 ceph3
Note: the official cleanup tools do not clean up thoroughly; the following method is recommended:
rm -rf `find / | grep ceph`
yum -y remove `rpm -qa | grep ceph`
This cleanup method deletes some Ceph-related libraries; see Problem 11 below for the fix.
Error 1
[2021-07-15 11:37:51,304][ceph_deploy][ERROR ] RuntimeError: connecting to host: ceph2 resulted in errors: HostNotFound ceph2
The node name cannot be resolved. Test with ping ceph2; if it succeeds, check /home/ceph/.ssh/config for wrong permissions or wrong content. The permissions must be 600.
Error 2
[2021-07-15 11:48:11,766][ceph_deploy][ERROR ] RuntimeError: NoSectionError: No section: 'ceph'
Run the command repeatedly until it succeeds.
Error 3
[2021-07-15 11:52:22,668][ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum install -y https://download.ceph.com/rpm-mimic/el7/noarch/ceph-release-1-0.el7.noarch.rpm
Download the package and install it manually (you may need to change https to http):
wget http://download.ceph.com/rpm-mimic/el7/noarch/ceph-release-1-0.el7.noarch.rpm
yum -y install ceph-release-1-0.el7.noarch.rpm
If the error persists after installing the package, run the command repeatedly.
Error 4
[2021-07-15 11:58:54,404][ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install epel-release
Install the package manually on the server indicated by the node name in the error output:
yum -y install epel-release
If the error persists after installing the package, run the command repeatedly.
Error 5
[2021-07-15 11:59:53,550][ceph_deploy][ERROR ] RuntimeError: Failed to execute command: rpm --import https://download.ceph.com/keys/release.asc
Run the command repeatedly until it succeeds.
Error 6
[2021-07-15 12:03:29,020][ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install ceph ceph-radosgw
Install the package manually on the server indicated by the node name in the error output:
yum -y install ceph ceph-radosgw
If the error persists after installing the package, run the command repeatedly.
Error 7
[2021-07-15 12:29:41,604][ceph3][WARNING] Existing lock /var/run/yum.pid: another copy is running as pid 9409.
......
[2021-07-15 12:30:06,081][ceph_deploy][ERROR ] KeyboardInterrupt
Delete /var/run/yum.pid on the corresponding node server.
Problem 8: issues discovered after reinstalling
There are extra OSDs left over from the previous installation that were not cleaned up; they should be removed here (although leaving them alone is also an option):
[ceph@ceph1 my-cluster]$ sudo ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       5.56613 root default
-3       1.85538     host ceph1
 0   hdd 0.92769         osd.0    down  1.00000 1.00000
 3   hdd 0.92769         osd.3      up  1.00000 1.00000
-5       1.85538     host ceph2
 1   hdd 0.92769         osd.1    down  1.00000 1.00000
 4   hdd 0.92769         osd.4      up  1.00000 1.00000
-7       1.85538     host ceph3
 2   hdd 0.92769         osd.2    down  1.00000 1.00000
 5   hdd 0.92769         osd.5      up  1.00000 1.00000
Clean them up with the following commands:
sudo ceph osd purge osd.0 --yes-i-really-mean-it
sudo ceph osd purge osd.1 --yes-i-really-mean-it
sudo ceph osd purge osd.2 --yes-i-really-mean-it
Check again:
[ceph@ceph1 my-cluster]$ sudo ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       2.78307 root default
-3       0.92769     host ceph1
 3   hdd 0.92769         osd.3      up  1.00000 1.00000
-5       0.92769     host ceph2
 4   hdd 0.92769         osd.4      up  1.00000 1.00000
-7       0.92769     host ceph3
 5   hdd 0.92769         osd.5      up  1.00000 1.00000
Reference: https://blog.csdn.net/qq_40017427/article/details/106219200
Problem 9: error "AttributeError: 'module' object has no attribute 'needs_ssh'"
[ceph@ceph1 my-cluster]$ sudo ceph-deploy new ceph1 ceph2 ceph3
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy new ceph1 ceph2 ceph3
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ]     return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 162, in _main
[ceph_deploy][ERROR ]     return args.func(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/new.py", line 141, in new
[ceph_deploy][ERROR ]     ssh_copy_keys(host, args.username)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/new.py", line 35, in ssh_copy_keys
[ceph_deploy][ERROR ]     if ssh.can_connect_passwordless(hostname):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/ssh.py", line 15, in can_connect_passwordless
[ceph_deploy][ERROR ]     if not remoto.connection.needs_ssh(hostname):
[ceph_deploy][ERROR ] AttributeError: 'module' object has no attribute 'needs_ssh'
[ceph_deploy][ERROR ]
The problem is related to the ceph-deploy version; add the --no-ssh-copykey option to the command:
[ceph@ceph1 my-cluster]$ sudo ceph-deploy new ceph1 ceph2 ceph3 --no-ssh-copykey
If your ceph-deploy version is around 1.5.25, the best solution is to upgrade ceph-deploy to 2.0.1 and re-run the command after upgrading.
Reference: https://www.10qianwan.com/articledetail/698055.html
Problem 10: error "[ERROR ] RuntimeError: Failed to execute command: rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc"
[ceph1][DEBUG ] Loading mirror speeds from cached hostfile
[ceph1][DEBUG ]  * epel: mirrors.bfsu.edu.cn
[ceph1][DEBUG ] Package yum-plugin-priorities-1.1.31-54.el7_8.noarch already installed and latest version
[ceph1][DEBUG ] Nothing to do
[ceph1][DEBUG ] Configure Yum priorities to include obsoletes
[ceph1][WARNIN] check_obsoletes has been enabled for Yum priorities plugin
[ceph1][INFO  ] Running command: rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[ceph1][WARNIN] curl: (22) The requested URL returned error: 404 Not Found
[ceph1][WARNIN] error: https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: import read failed(2).
[ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
Add the --no-adjust-repos option to the command:
sudo ceph-deploy install ceph1 ceph2 ceph3 --no-adjust-repos
If your ceph-deploy version is around 1.5.25, the best solution is to upgrade ceph-deploy to 2.0.1; after upgrading, start over from the cluster-creation step.
Reference: https://tracker.ceph.com/issues/9032
Problem 11: "ImportError: libceph-common.so.0: cannot open shared object file: No such file or directory"
[2021-10-11 12:07:19,044][ceph1][DEBUG ] Complete!
[2021-10-11 12:07:19,160][ceph1][INFO ] Running command: ceph --version
[2021-10-11 12:07:19,678][ceph1][WARNING] Traceback (most recent call last):
[2021-10-11 12:07:19,679][ceph1][WARNING]   File "/bin/ceph", line 130, in <module>
[2021-10-11 12:07:19,679][ceph1][WARNING]     import rados
[2021-10-11 12:07:19,679][ceph1][WARNING] ImportError: libceph-common.so.0: cannot open shared object file: No such file or directory
[2021-10-11 12:07:19,681][ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[2021-10-11 12:07:19,681][ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph --version
The libraries were deleted during the earlier Ceph cleanup; run the following command to reinstall them:
yum -y reinstall lib*
Or fix this specific problem directly:
yum -y reinstall librados2
Reference: https://cbs.centos.org/koji/fileinfo?rpmID=166331&filename=/usr/lib64/ceph/libceph-common.so.0
Problem 12: error when adding a disk
[ceph1][DEBUG ] find the location of an executable
[ceph1][INFO  ] Running command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb1
[ceph1][WARNIN] Running command: /bin/ceph-authtool --gen-print-key
[ceph1][WARNIN] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 1850160e-34d7-423a-b455-4be4ed33d1bd
[ceph1][WARNIN] Running command: /sbin/lvcreate --yes -l 100%FREE -n osd-block-1850160e-34d7-423a-b455-4be4ed33d1bd ceph-3fda7e19-610b-48b8-a85f-121f4204bf11
[ceph1][WARNIN]  stderr: Calculated size of logical volume is 0 extents. Needs to be larger.
[ceph1][WARNIN] --> Was unable to complete a new OSD, will rollback changes
[ceph1][WARNIN] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
[ceph1][WARNIN]  stderr: purged osd.0
[ceph1][WARNIN] --> RuntimeError: command returned non-zero exit status: 5
[ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb1
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
The cause is that the partition was not formatted; formatting it resolves the error:
mkfs.ext4 /dev/sdb1
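If formatting alone does not clear the error, wiping leftover filesystem and LVM signatures from the partition is another common fix worth trying (wipefs is a standard util-linux tool; this suggestion goes beyond the original fix):
wipefs -a /dev/sdb1    # removes old signatures so ceph-volume can reuse the partition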