How do I configure a shared VPC for kOps?

As described in the kOps documentation, I want to use kOps to create a Kubernetes cluster inside an existing VPC. I have already created the VPC, internet gateway, route table, subnet, and an EC2 instance from which I intend to run kops create cluster and the rest of the commands. These resources were created with the following CloudFormation template:

AWSTemplateFormatVersion: "2010-09-09"
Description: "AWS CloudFormation Template for Kops Poc"

Resources:
  KopsPocVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 172.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: tbe-kops-poc-vpc
        - Key: Project
          Value: Kops Poc

  KopsPocVPCCidrBlockIPv6:
    Type: AWS::EC2::VPCCidrBlock
    Properties:
      VpcId: !Ref KopsPocVPC
      AmazonProvidedIpv6CidrBlock: true

  KopsPocDHCPOptions:
    Type: AWS::EC2::DHCPOptions
    Properties:
      DomainName: ap-south-1.compute.internal
      DomainNameServers:
        - AmazonProvidedDNS
      Tags:
        - Key: Name
          Value: tbe-kops-poc-dopt
        - Key: Project
          Value: Kops Poc

  KopsPocVPCDHCPOptions:
    Type: AWS::EC2::VPCDHCPOptionsAssociation
    Properties:
      VpcId: !Ref KopsPocVPC
      DhcpOptionsId: !Ref KopsPocDHCPOptions

  KopsPocNetworkAcl:
    Type: AWS::EC2::NetworkAcl
    Properties:
      VpcId: !Ref KopsPocVPC
      Tags:
        - Key: Name
          Value: tbe-kops-poc-acl
        - Key: Project
          Value: Kops Poc

  KopsPocInboundNetworkAclEntryIPv4:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 100
      Protocol: -1
      RuleAction: allow
      Egress: false
      CidrBlock: 0.0.0.0/0

  KopsPocInboundNetworkAclEntryIPv6:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 101
      Protocol: -1
      RuleAction: allow
      Egress: false
      Ipv6CidrBlock: ::/0

  KopsPocOutboundNetworkAclEntryIPv4:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 100
      Protocol: -1
      RuleAction: allow
      Egress: true
      CidrBlock: 0.0.0.0/0

  KopsPocOutboundNetworkAclEntryIPv6:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 101
      Protocol: -1
      RuleAction: allow
      Egress: true
      Ipv6CidrBlock: ::/0

  KopsPocInternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: tbe-kops-poc-igw
        - Key: Project
          Value: Kops Poc

  KopsPocVPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref KopsPocVPC
      InternetGatewayId: !Ref KopsPocInternetGateway

  KopsPocRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref KopsPocVPC
      Tags:
        - Key: Name
          Value: tbe-kops-poc-rt
        - Key: Project
          Value: Kops Poc

  KopsPocRouteIPV4:
    Type: AWS::EC2::Route
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      RouteTableId: !Ref KopsPocRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref KopsPocInternetGateway

  KopsPocRouteIPV6:
    Type: AWS::EC2::Route
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      RouteTableId: !Ref KopsPocRouteTable
      DestinationIpv6CidrBlock: ::/0
      GatewayId: !Ref KopsPocInternetGateway

  KopsPocSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref KopsPocVPC
      CidrBlock: 172.0.1.0/24
      AvailabilityZone: ap-south-1a
      Tags:
        - Key: Name
          Value: tbe-kops-poc-subnet
        - Key: Project
          Value: Kops Poc

  KopsPocSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref KopsPocSubnet
      RouteTableId: !Ref KopsPocRouteTable

  KopsPocSubnetNetworkAclAssociation:
    Type: AWS::EC2::SubnetNetworkAclAssociation
    Properties:
      SubnetId: !Ref KopsPocSubnet
      NetworkAclId: !Ref KopsPocNetworkAcl

  KopsPocManagementInstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      VpcId: !Ref KopsPocVPC
      GroupDescription: Kops Poc Management Instance Security Group
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: Name
          Value: tbe-kops-poc-management-sg
        - Key: Project
          Value: Kops Poc

  KopsPocManagementInstance:
    Type: AWS::EC2::Instance
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      AvailabilityZone: ap-south-1a
      ImageId: ami-0cca134ec43cf708f
      InstanceType: t3a.large
      KeyName: tbe-kops-poc
      NetworkInterfaces:
        - NetworkInterfaceId: !Ref KopsPocEth0
          DeviceIndex: 0
      Volumes:
        - Device: /dev/sdf
          VolumeId: !Ref KopsPocManagementInstanceVolume
      IamInstanceProfile: TBEKopsPocEC2ServiceRole
      Tags:
        - Key: Name
          Value: tbe-kops-poc-management-instance
        - Key: Project
          Value: Kops Poc

  KopsPocIPAddress:
    Type: AWS::EC2::EIP
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      Domain: vpc
      InstanceId: !Ref KopsPocManagementInstance
      Tags:
        - Key: Name
          Value: tbe-kops-poc-eip
        - Key: Project
          Value: Kops Poc

  KopsPocEth0:
    Type: AWS::EC2::NetworkInterface
    Properties:
      GroupSet:
        - !Ref KopsPocManagementInstanceSecurityGroup
      SubnetId: !Ref KopsPocSubnet
      Tags:
        - Key: Name
          Value: tbe-kops-poc-eth0
        - Key: Project
          Value: Kops Poc

  KopsPocManagementInstanceVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: ap-south-1a
      Size: 20
      VolumeType: gp3
      Tags:
        - Key: Name
          Value: tbe-kops-poc-volume
        - Key: Project
          Value: Kops Poc
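
For reference, a sketch of how such a stack can be deployed with the AWS CLI (the stack name and template file name here are placeholders of mine, not from the original setup):

aws cloudformation deploy \
  --stack-name tbe-kops-poc \
  --template-file kops-poc.yaml \
  --region ap-south-1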

After that, I can ssh into this EC2 instance. On it I installed kops and kubectl, and also added the following environment variables:

export NAME="tbe-kops-poc.k8s.local"
export KOPS_STATE_STORE="s3://tbe-kops-poc-state-store"
export KOPS_OIDC_STORE="s3://tbe-kops-poc-oidc-store"
export MASTER_SIZE="t3a.large"
export MASTER_COUNT=1
export NODE_SIZE="t3a.large"
export NODE_COUNT=2
export ZONES="ap-south-1a"
export AMI_ID="ami-0cca134ec43cf708f"
export AWS_TAGS="Project=Kops Poc"
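
The two buckets referenced above have to exist before kops create cluster is run; a sketch of creating the state-store bucket with the names from the variables above (the kOps docs recommend enabling versioning on it):

aws s3api create-bucket \
  --bucket tbe-kops-poc-state-store \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1

aws s3api put-bucket-versioning \
  --bucket tbe-kops-poc-state-store \
  --versioning-configuration Status=Enabled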

If I run the following command:

kops create cluster --name=${NAME} --cloud=aws --cloud-labels="${AWS_TAGS}" \
  --node-count=${NODE_COUNT} --zones=${ZONES} --node-size=${NODE_SIZE} \
  --master-count=${MASTER_COUNT} --master-zones=${ZONES} --master-size=${MASTER_SIZE}

then kOps is able to create the cluster without being given the ID of the previously created VPC, and kops validate cluster --name=${NAME} --wait 10m is able to validate it.

But when I provide the ID of the previously created VPC with the --vpc=<VPC_ID> option, kops validate cluster --name=${NAME} --wait 10m times out. I even tried --wait 30m, but the result is the same.
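
To be precise, the failing variant is identical except for the extra flag (the VPC ID below is a placeholder for the one created by the CloudFormation stack):

kops create cluster --name=${NAME} --cloud=aws --cloud-labels="${AWS_TAGS}" \
  --node-count=${NODE_COUNT} --zones=${ZONES} --node-size=${NODE_SIZE} \
  --master-count=${MASTER_COUNT} --master-zones=${ZONES} --master-size=${MASTER_SIZE} \
  --vpc=vpc-0123456789abcdef0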

The error I get is the following:

INSTANCE GROUPS
NAME                    ROLE    MACHINETYPE     MIN     MAX     SUBNETS
master-ap-south-1a      Master  t3a.large       1       1       ap-south-1a
nodes-ap-south-1a       Node    t3a.large       2       2       ap-south-1a

NODE STATUS
NAME    ROLE    READY

VALIDATION ERRORS
KIND    NAME                    MESSAGE
Machine i-0638d7877f8030ab3     machine "i-0638d7877f8030ab3" has not yet joined cluster
Machine i-071746f1afdb86c4f     machine "i-071746f1afdb86c4f" has not yet joined cluster
Machine i-07e0de0b4734bc99c     machine "i-07e0de0b4734bc99c" has not yet joined cluster

I have no idea why this is happening. Any suggestions or pointers would be greatly appreciated.

Thanks.

Update 1

I re-ran the command with the --ssh-public-key option and then logged in to the master over ssh. In /var/log/syslog I can see some errors; the excerpt follows the command sketch below.
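
A sketch of the invocation (the key path is my assumption; everything else is unchanged):

kops create cluster --name=${NAME} --cloud=aws --cloud-labels="${AWS_TAGS}" \
  --node-count=${NODE_COUNT} --zones=${ZONES} --node-size=${NODE_SIZE} \
  --master-count=${MASTER_COUNT} --master-zones=${ZONES} --master-size=${MASTER_SIZE} \
  --vpc=vpc-0123456789abcdef0 \
  --ssh-public-key=~/.ssh/id_rsa.pub

The /var/log/syslog excerpt: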

Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: Started Kubernetes Protokube Service.
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: protokube version 0.1
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.387031    5310 aws_volume.go:65] AWS API Request: ec2metadata/GetToken
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.388630    5310 aws_volume.go:65] AWS API Request: ec2metadata/GetDynamicData
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.394627    5310 aws_volume.go:65] AWS API Request: ec2metadata/GetMetadata
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.395412    5310 aws_volume.go:65] AWS API Request: ec2metadata/GetMetadata
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.399317    5310 aws_volume.go:65] AWS API Request: ec2/DescribeInstances
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.461619    5310 gossip.go:59] gossip dns connection limit is:0
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.462052    5310 aws_volume.go:65] AWS API Request: ec2/DescribeInstances
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: W0123 12:54:54.517477    5310 cluster.go:150] couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: E0123 12:54:54.519113    5310 main.go:197] error initializing secondary gossip: %!w(*errors.withStack=&{0xc000091280 0xc0007aa408})
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Main process exited, code=exited, status=1/FAILURE
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Failed with result 'exit-code'.
Jan 23 12:54:57 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Scheduled restart job, restart counter is at 61.
Jan 23 12:54:57 i-0d65b4c07a7dcdcdb systemd[1]: Stopped Kubernetes Protokube Service.

These log entries keep repeating.
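
In case it helps anyone reproduce this, the repeating entries can also be followed live with standard systemd tooling (nothing kops-specific):

journalctl -u protokube.service -f
systemctl status protokube.service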

I have a few observations.

Observation 1:

Without the VPC ID, the route table kOps creates has the following routes: [screenshot of the route table]

And with the VPC ID, the route table it creates has: [screenshot of the route table]
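
Since the screenshots don't carry over well, the same information can be pulled with the AWS CLI (the filter value is a placeholder for the respective VPC ID):

aws ec2 describe-route-tables \
  --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'RouteTables[].Routes'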

Observation 2:

The master created without the VPC ID has the following ifconfig output:

ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.20.57.190  netmask 255.255.224.0  broadcast 172.20.63.255
        inet6 fe80::1c:f5ff:feba:2d4e  prefixlen 64  scopeid 0x20<link>
        ether 02:1c:f5:ba:2d:4e  txqueuelen 1000  (Ethernet)
        RX packets 896498  bytes 1279076057 (1.2 GB)
        RX errors 0  dropped 2  overruns 0  frame 0
        TX packets 45327  bytes 5946634 (5.9 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 74883  bytes 23190458 (23.1 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 74883  bytes 23190458 (23.1 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth57fa98b3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 100.96.0.1  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::c4fc:cff:fec5:777f  prefixlen 64  scopeid 0x20<link>
        ether c6:fc:0c:c5:77:7f  txqueuelen 0  (Ethernet)
        RX packets 158  bytes 15575 (15.5 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 173  bytes 16763 (16.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethd97efc68: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 100.96.0.1  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::b8d3:95ff:fecb:810b  prefixlen 64  scopeid 0x20<link>
        ether ba:d3:95:cb:81:0b  txqueuelen 0  (Ethernet)
        RX packets 1139  bytes 182549 (182.5 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1298  bytes 817513 (817.5 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The master created with the VPC ID has only the following (note that the veth* interfaces and the 100.96.0.1 pod-network addresses are missing):

ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.0.2.197  netmask 255.255.255.0  broadcast 172.0.2.255
        inet6 fe80::cb:acff:febb:a14c  prefixlen 64  scopeid 0x20<link>
        ether 02:cb:ac:bb:a1:4c  txqueuelen 1000  (Ethernet)
        RX packets 793161  bytes 1076223553 (1.0 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 96108  bytes 17026472 (17.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 281449  bytes 52537138 (52.5 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 281449  bytes 52537138 (52.5 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Also, on the cluster created with the VPC ID, the command kubectl get pods --all-namespaces returns:

E0123 13:28:46.878743    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.879320    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.880829    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.882274    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.883682    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
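
The localhost:8080 message just means kubectl on that node has no kubeconfig configured. For completeness, a kubeconfig can be exported from the state store like this (as far as I know, kops 1.19+ needs the --admin flag to obtain an admin credential):

kops export kubecfg --name=${NAME} --admin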
