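As described in the documentation, I want to create a Kubernetes cluster with kOps in an existing VPC. I have already created the VPC, internet gateway, route table, subnet, and an EC2 instance from which I want to run kops create cluster and other commands. These resources were created with the following CloudFormation template: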
AWSTemplateFormatVersion: "2010-09-09"
Description: "AWS CloudFormation Template for Kops Poc"
Resources:
  KopsPocVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 172.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: tbe-kops-poc-vpc
        - Key: Project
          Value: Kops Poc
  KopsPocVPCCidrBlockIPv6:
    Type: AWS::EC2::VPCCidrBlock
    Properties:
      VpcId: !Ref KopsPocVPC
      AmazonProvidedIpv6CidrBlock: true
  KopsPocDHCPOptions:
    Type: AWS::EC2::DHCPOptions
    Properties:
      DomainName: ap-south-1.compute.internal
      DomainNameServers:
        - AmazonProvidedDNS
      Tags:
        - Key: Name
          Value: tbe-kops-poc-dopt
        - Key: Project
          Value: Kops Poc
  KopsPocVPCDHCPOptions:
    Type: AWS::EC2::VPCDHCPOptionsAssociation
    Properties:
      VpcId: !Ref KopsPocVPC
      DhcpOptionsId: !Ref KopsPocDHCPOptions
  KopsPocNetworkAcl:
    Type: AWS::EC2::NetworkAcl
    Properties:
      VpcId: !Ref KopsPocVPC
      Tags:
        - Key: Name
          Value: tbe-kops-poc-acl
        - Key: Project
          Value: Kops Poc
  KopsPocInboundNetworkAclEntryIPv4:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 100
      Protocol: -1
      RuleAction: allow
      Egress: false
      CidrBlock: 0.0.0.0/0
  KopsPocInboundNetworkAclEntryIPv6:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 101
      Protocol: -1
      RuleAction: allow
      Egress: false
      Ipv6CidrBlock: ::/0
  KopsPocOutboundNetworkAclEntryIPv4:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 100
      Protocol: -1
      RuleAction: allow
      Egress: true
      CidrBlock: 0.0.0.0/0
  KopsPocOutboundNetworkAclEntryIPv6:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 101
      Protocol: -1
      RuleAction: allow
      Egress: true
      Ipv6CidrBlock: ::/0
  KopsPocInternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: tbe-kops-poc-igw
        - Key: Project
          Value: Kops Poc
  KopsPocVPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref KopsPocVPC
      InternetGatewayId: !Ref KopsPocInternetGateway
  KopsPocRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref KopsPocVPC
      Tags:
        - Key: Name
          Value: tbe-kops-poc-rt
        - Key: Project
          Value: Kops Poc
  KopsPocRouteIPV4:
    Type: AWS::EC2::Route
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      RouteTableId: !Ref KopsPocRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref KopsPocInternetGateway
  KopsPocRouteIPV6:
    Type: AWS::EC2::Route
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      RouteTableId: !Ref KopsPocRouteTable
      DestinationIpv6CidrBlock: ::/0
      GatewayId: !Ref KopsPocInternetGateway
  KopsPocSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref KopsPocVPC
      CidrBlock: 172.0.1.0/24
      AvailabilityZone: ap-south-1a
      Tags:
        - Key: Name
          Value: tbe-kops-poc-subnet
        - Key: Project
          Value: Kops Poc
  KopsPocSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref KopsPocSubnet
      RouteTableId: !Ref KopsPocRouteTable
  KopsPocSubnetNetworkAclAssociation:
    Type: AWS::EC2::SubnetNetworkAclAssociation
    Properties:
      SubnetId: !Ref KopsPocSubnet
      NetworkAclId: !Ref KopsPocNetworkAcl
  KopsPocManagementInstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      VpcId: !Ref KopsPocVPC
      GroupDescription: Kops Poc Management Instance Security Group
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: Name
          Value: tbe-kops-poc-management-sg
        - Key: Project
          Value: Kops Poc
  KopsPocManagementInstance:
    Type: AWS::EC2::Instance
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      AvailabilityZone: ap-south-1a
      ImageId: ami-0cca134ec43cf708f
      InstanceType: t3a.large
      KeyName: tbe-kops-poc
      NetworkInterfaces:
        - NetworkInterfaceId: !Ref KopsPocEth0
          DeviceIndex: 0
      Volumes:
        - Device: /dev/sdf
          VolumeId: !Ref KopsPocManagementInstanceVolume
      IamInstanceProfile: TBEKopsPocEC2ServiceRole
      Tags:
        - Key: Name
          Value: tbe-kops-poc-management-instance
        - Key: Project
          Value: Kops Poc
  KopsPocIPAddress:
    Type: AWS::EC2::EIP
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      Domain: vpc
      InstanceId: !Ref KopsPocManagementInstance
      Tags:
        - Key: Name
          Value: tbe-kops-poc-eip
        - Key: Project
          Value: Kops Poc
  KopsPocEth0:
    Type: AWS::EC2::NetworkInterface
    Properties:
      GroupSet:
        - !Ref KopsPocManagementInstanceSecurityGroup
      SubnetId: !Ref KopsPocSubnet
      Tags:
        - Key: Name
          Value: tbe-kops-poc-eth0
        - Key: Project
          Value: Kops Poc
  KopsPocManagementInstanceVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: ap-south-1a
      Size: 20
      VolumeType: gp3
      Tags:
        - Key: Name
          Value: tbe-kops-poc-volume
        - Key: Project
          Value: Kops Poc
After that, I can SSH into this EC2 instance. On it, I installed kops and kubectl, and added the following environment variables:
export NAME="tbe-kops-poc.k8s.local"
export KOPS_STATE_STORE="s3://tbe-kops-poc-state-store"
export KOPS_OIDC_STORE="s3://tbe-kops-poc-oidc-store"
export MASTER_SIZE="t3a.large"
export MASTER_COUNT=1
export NODE_SIZE="t3a.large"
export NODE_COUNT=2
export ZONES="ap-south-1a"
export AMI_ID="ami-0cca134ec43cf708f"
export AWS_TAGS="Project=Kops Poc"
If I run the following command without supplying the ID of the previously created VPC:
kops create cluster --name=${NAME} --cloud=aws --cloud-labels="${AWS_TAGS}" --node-count=${NODE_COUNT} --zones=${ZONES} --node-size=${NODE_SIZE} --master-count=${MASTER_COUNT} --master-zones=${ZONES} --master-size=${MASTER_SIZE}
kOps creates the cluster, and the following command validates it successfully:
kops validate cluster --name=${NAME} --wait 10m
However, when I supply the ID of the previously created VPC with the --vpc=<VPC_ID> option, the same kops validate cluster command times out. I even tried --wait 30m, but the result is the same.
The error I get is as follows:
INSTANCE GROUPS
NAME                ROLE    MACHINETYPE  MIN  MAX  SUBNETS
master-ap-south-1a  Master  t3a.large    1    1    ap-south-1a
nodes-ap-south-1a   Node    t3a.large    2    2    ap-south-1a

NODE STATUS
NAME  ROLE  READY

VALIDATION ERRORS
KIND     NAME                 MESSAGE
Machine  i-0638d7877f8030ab3  machine "i-0638d7877f8030ab3" has not yet joined cluster
Machine  i-071746f1afdb86c4f  machine "i-071746f1afdb86c4f" has not yet joined cluster
Machine  i-07e0de0b4734bc99c  machine "i-07e0de0b4734bc99c" has not yet joined cluster
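I don't know why this is happening. Any suggestions or pointers would be greatly appreciated.
Thanks.
Update 1
I re-ran the command with the --ssh-public-key option and then logged in to the master over SSH. /var/log/syslog shows some errors. The log is as follows: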
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: Started Kubernetes Protokube Service.
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: protokube version 0.1
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.387031 5310 aws_volume.go:65] AWS API Request: ec2metadata/GetToken
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.388630 5310 aws_volume.go:65] AWS API Request: ec2metadata/GetDynamicData
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.394627 5310 aws_volume.go:65] AWS API Request: ec2metadata/GetMetadata
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.395412 5310 aws_volume.go:65] AWS API Request: ec2metadata/GetMetadata
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.399317 5310 aws_volume.go:65] AWS API Request: ec2/DescribeInstances
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.461619 5310 gossip.go:59] gossip dns connection limit is:0
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.462052 5310 aws_volume.go:65] AWS API Request: ec2/DescribeInstances
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: W0123 12:54:54.517477 5310 cluster.go:150] couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: E0123 12:54:54.519113 5310 main.go:197] error initializing secondary gossip: %!w(*errors.withStack=&{0xc000091280 0xc0007aa408})
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Main process exited, code=exited, status=1/FAILURE
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Failed with result 'exit-code'.
Jan 23 12:54:57 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Scheduled restart job, restart counter is at 61.
Jan 23 12:54:57 i-0d65b4c07a7dcdcdb systemd[1]: Stopped Kubernetes Protokube Service.
This log keeps repeating.
I have a few observations.
Observation 1:
Observation 2:
The master created without the VPC ID gives the following ifconfig output:
ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
        inet 172.20.57.190 netmask 255.255.224.0 broadcast 172.20.63.255
        inet6 fe80::1c:f5ff:feba:2d4e prefixlen 64 scopeid 0x20<link>
        ether 02:1c:f5:ba:2d:4e txqueuelen 1000 (Ethernet)
        RX packets 896498 bytes 1279076057 (1.2 GB)
        RX errors 0 dropped 2 overruns 0 frame 0
        TX packets 45327 bytes 5946634 (5.9 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 74883 bytes 23190458 (23.1 MB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 74883 bytes 23190458 (23.1 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

veth57fa98b3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 100.96.0.1 netmask 255.255.255.255 broadcast 0.0.0.0
        inet6 fe80::c4fc:cff:fec5:777f prefixlen 64 scopeid 0x20<link>
        ether c6:fc:0c:c5:77:7f txqueuelen 0 (Ethernet)
        RX packets 158 bytes 15575 (15.5 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 173 bytes 16763 (16.7 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethd97efc68: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 100.96.0.1 netmask 255.255.255.255 broadcast 0.0.0.0
        inet6 fe80::b8d3:95ff:fecb:810b prefixlen 64 scopeid 0x20<link>
        ether ba:d3:95:cb:81:0b txqueuelen 0 (Ethernet)
        RX packets 1139 bytes 182549 (182.5 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1298 bytes 817513 (817.5 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The master created with the VPC ID gives:
ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
        inet 172.0.2.197 netmask 255.255.255.0 broadcast 172.0.2.255
        inet6 fe80::cb:acff:febb:a14c prefixlen 64 scopeid 0x20<link>
        ether 02:cb:ac:bb:a1:4c txqueuelen 1000 (Ethernet)
        RX packets 793161 bytes 1076223553 (1.0 GB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 96108 bytes 17026472 (17.0 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 281449 bytes 52537138 (52.5 MB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 281449 bytes 52537138 (52.5 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
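The protokube warning in the syslog excerpt says "no private IP found", and the two ens5 addresses come from different ranges. One thing that may be worth checking is whether each address falls inside the RFC 1918 private IPv4 ranges. The sketch below does only that, using Python's standard ipaddress module. The helper name is_rfc1918 is made up for illustration, and whether protokube applies exactly this criterion is an assumption, not something the logs confirm.

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the IPv4 address lies in one of the RFC 1918 ranges.

    Hypothetical helper for illustration; not part of kOps or protokube.
    """
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


if __name__ == "__main__":
    # ens5 addresses copied from the two ifconfig outputs.
    samples = [
        ("master without --vpc", "172.20.57.190"),
        ("master with --vpc", "172.0.2.197"),
    ]
    for label, address in samples:
        print(f"{label}: {address} in RFC 1918 range: {is_rfc1918(address)}")
```

With these inputs, 172.20.57.190 lies inside 172.16.0.0/12, while 172.0.2.197 (from the template's 172.0.0.0/16 VPC CIDR) lies outside all three ranges. That difference may or may not be related to the warning.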
In addition, for the cluster created with the VPC ID, the command kubectl get pods --all-namespaces returns:
E0123 13:28:46.878743 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.879320 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.880829 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.882274 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.883682 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
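kubectl falls back to http://localhost:8080 when it has no kubeconfig to load, so the output above may simply mean that no kubeconfig exists for the user running the command, separately from the cluster not coming up. A minimal sketch for checking this, assuming the usual lookup of the KUBECONFIG environment variable followed by ~/.kube/config, is below. The function names are hypothetical, and the sketch ignores the --kubeconfig flag and in-cluster configuration.

```python
import os
from pathlib import Path


def kubeconfig_candidates(env, home):
    """List the kubeconfig paths to look at.

    Uses the entries of $KUBECONFIG when it is set, otherwise
    <home>/.kube/config. Hypothetical helper for illustration.
    """
    raw = env.get("KUBECONFIG", "")
    if raw:
        return [Path(entry) for entry in raw.split(os.pathsep) if entry]
    return [Path(home) / ".kube" / "config"]


def existing_kubeconfigs(env, home):
    """Return only the candidate paths that exist as regular files."""
    return [path for path in kubeconfig_candidates(env, home) if path.is_file()]


if __name__ == "__main__":
    found = existing_kubeconfigs(os.environ, Path.home())
    if found:
        print("kubeconfig file(s) found:", ", ".join(str(p) for p in found))
    else:
        print("no kubeconfig found; kubectl would fall back to localhost:8080")
```

If no file is found, the connection-refused errors say little about the API server itself until a kubeconfig for the cluster is in place.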