
Lab Tasks and Resources Overview

Domain | Task | Hardware Resource | Effort | Milestone | Dependencies | Status | Notes
ONAP | ONAP on K8S on x86 | Auto Pod 1 (x86 server) | | 2/2/18 | | Done |
ONAP | ONAP on K8S on ARM | LaaS ARM Server | 1 week | 2/16/18 | ARM servers in LaaS | Blocked |
ONAP | DCAE on OPNFV on ARM | Auto ARM Pod | | 2/9/18 | | In Progress |
ONAP | DCAE on K8S on x86 | Auto Pod 1 (x86 server) | | 2/23/18 | ONAP needs to deliver | Blocked |
ONAP | DCAE on K8S on ARM | LaaS ARM Server | | 2/23/18 | ONAP/ARM-in-LaaS | Blocked |
OPNFV | OpenStack for ONAP VMs (e.g. DCAE) | Auto ARM Pod | | 2/2/18 | | Done |
OPNFV | ARM VIM as an ONAP target | Auto ARM Pod | | 2/2/18 | | Done |
OPNFV | x86 VIM as ONAP target | LaaS x86 Server | | | | In Progress |

ONAP on Kubernetes

Test environment

Lab | POD | OS | CPU | Memory | Storage
UNH LaaS Lab | 10.10.30.157 | CentOS 7.3.1611 | Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz | 117G | 802G

Deployment guide:

https://wiki.onap.org/display/DW/ONAP+on+Kubernetes

vFW blitz daily at 1200 EDT until KubeCon: https://wiki.onap.org/display/DW/Vetted+vFirewall+Demo+-+Full+draft+how-to+for+F2F+and+ReadTheDocs

(official) https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/OOM%20User%20Guide/oom_user_guide.html?highlight=oom

OOM discussion list (post any issues here to notify and get responses from OOM personnel, as in the current thread): https://lists.onap.org/pipermail/onap-discuss/2017-September/004616.html

In general, most of the sync issues are caused by the long lead time for pulling docker images from ONAP Nexus3. This can be fixed by pre-warming your own docker repo or by running the deployment a second time. There is a JIRA item for a script that walks the yamls, extracts the docker image names, and pulls them before bringing up the containers; see https://jira.onap.org/browse/OOM-328
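
A minimal sketch of that kind of pre-pull, assuming the OOM Kubernetes yamls live under ./oom/kubernetes and reference images with standard "image:" lines (paths are illustrative; the official version is the prepull_docker.sh script attached to OOM-328, linked below):

    # Extract every docker image referenced by the OOM yamls and pull it ahead of time,
    # so container startup does not block on slow Nexus3 downloads.
    grep -rhoP '(?<=image: ).*' ./oom/kubernetes \
      | tr -d '"' | sort -u \
      | while read -r image; do
          docker pull "$image"
        done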

Test: Full deploy (status: in progress)

Notes:

Had to clear iptables rules to deploy the Rancher server and agent on one machine (solved: CentOS issue).
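
A minimal sketch of that cleanup, assuming it is acceptable to drop all host firewall rules on the Rancher node (on CentOS, the firewalld service is often the underlying issue):

    # Option A: stop the CentOS firewall service entirely
    sudo systemctl stop firewalld && sudo systemctl disable firewalld

    # Option B: flush the existing iptables rules and chains
    sudo iptables -F && sudo iptables -X
    sudo iptables -t nat -F && sudo iptables -t nat -X

    # Restart docker so it re-creates the chains it needs
    sudo systemctl restart docker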

Rancher can't bring up Kubernetes; all pods in kube-system stay in the Pending state (blocker).

Fix:

https://jira.onap.org/secure/attachment/10501/prepull_docker.sh

OOM-328 (in progress): Preload docker images script before createAll.sh will allow a 7 min startup.

Test: Partial deploy (status: in progress)

Component status:

aaf: (Successful)

aai: (Successful)

appc: (Successful)

clamp: (CrashLoopBackOff)

cli: (Successful)

consul: (Successful)

kube2msb: (CrashLoopBackOff)

log: (Successful)

message-router: (Successful)

msb: (Successful)

mso: (Successful)

multicloud: (Successful)

policy: (Successful)

portal: (CrashLoopBackOff)

robot: (Successful)

sdc: (Successful)

sdnc: (Successful)

vid: (Successful)

vnfsdk: (Successful)

dcae: (TODO)
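
A quick way to re-check these pod states from the Rancher/Kubernetes host (a sketch; the onap- namespace prefix and the pod name below are illustrative and depend on how createAll.sh was run):

    # Show every pod and its current state (Running, Pending, CrashLoopBackOff, ...)
    kubectl get pods --all-namespaces -o wide

    # Dig into a failing component, e.g. a portal pod (pod name is illustrative)
    kubectl -n onap-portal describe pod portalapps
    kubectl -n onap-portal logs portalapps --previous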


ONAP on OpenStack

Test environment (Huawei x86)

Lab: Huawei Shanghai Lab
POD: huawei-pod4
OS: ubuntu

Node | CPU | Memory | Storage
jumphost | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48) | 256G | 4T
host1 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48) | 256G | 4T
host2 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48) | 256G | 4T
host3 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48) | 256G | 4T
host4 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48) | 256G | 4T
host5 | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48) | 256G | 4T


Deployment guide:

1. Set up OpenStack

Deploy the os-nosdn-nofeature scenario using Euphrates Compass4nfv (see: Containerized Compass).

2. ONAP Installation in Vanilla OpenStack

https://wiki.onap.org/display/DW/ONAP+Installation+in+Vanilla+OpenStack


Test: Full deploy (status: in progress)

Heat Template and env parameters: https://nexus.onap.org/content/sites/raw/org.onap.demo/heat/ONAP/1.1.0-SNAPSHOT/
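
A minimal sketch of launching the stack from those artifacts, assuming the template and environment files have been downloaded and the environment file has been filled in for this lab (file and stack names are illustrative):

    # Create the ONAP stack from the Heat template and its environment file
    openstack stack create -t onap_openstack.yaml -e onap_openstack.env ONAP

    # Watch the resources come up; most VMs appear even if the stack itself fails
    openstack stack resource list ONAP
    openstack server list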

VMs:

onap-aai-inst1 (Active)

onap-aai-inst2 (Active)

onap-appc (Active)

onap-clamp (Active)

onap-dcae-bootstrap (Active)

onap-dns-server (Active)

onap-message-router (Active)

onap-multi-service (Active)

onap-policy (Active)

onap-portal (Active):

onap/portal-apps:1.3.0 (Up):

  • http://<onap-portal ip>:8989/ONAPPORTAL/login.html returns a 404

ERROR in catalina.out: ERROR in ch.qos.logback.core.joran.spi.Interpreter@114:21 - no applicable action for [totalSizeCap], current ElementPath  is [[configuration][appender][rollingPolicy][totalSizeCap]]

onap/portal-wms:1.3.0 (Up)

onap/cli:1.3.0 (Up)

onap/portal-db:1.3.0 (Up)

onap-robot (Active)

onap-sdc (Active)

onap-so (Active)

onap-vid (Active)



Test environment (UNH ARM)

Lab: UNH Auto Lab
POD: arm-pod
OS: ubuntu

Node | CPU | Memory | Storage
jumphost | Cavium(R) ThunderX(TM) 2.0GHz (48) | 64G | 450G
host1 | Cavium(R) ThunderX(TM) 2.0GHz (96) | 128G | 450G
host2 | Cavium(R) ThunderX(TM) 2.0GHz (96) | 128G | 450G
host3 | Cavium(R) ThunderX(TM) 2.0GHz (48) | 64G | 450G
host4 | Cavium(R) ThunderX(TM) 2.0GHz (48) | 64G | 450G
host5 | Cavium(R) ThunderX(TM) 2.0GHz (48) | 64G | 450G

Deployment guide:

1. Set up OpenStack

On the jump host, do the following:

    mkdir -p /armband
    cd /armband
    git clone -b stable/euphrates http://github.com/opnfv/armband
    cd /armband/armband
    ci/deploy.sh -b file:///home/ubuntu -l arm -p pod-auto -s os-nosdn-nofeature-ha -B admin1_br0,mgmt1_br0,,,

Note: the PDF and IDF files referenced via the "arm" and "pod-auto" arguments to -l and -p are linked below.

2. Follow the steps described above for the Huawei lab.

Current status: bringing up the heat template, onap_openstack.yaml.

  • Working through a combination of issues, including 
    • Timeouts for some neutron service requests related to floating IP addresses.
      • These timeouts don't appear to have much effect, as the VMs are all ping-able/ssh-able after adding the necessary rules (see below).
      • stack_status_reason   | Resource CREATE failed: NeutronClientException: resources.dcae_c_floating_ip: <html><body><h1>504 Gateway Time-out</h1>                      | The server didn't respond in time.
    • Volume create of vol1-sdc-data is failing. It's a 100G volume, and the complaint is that there is not enough space.
      • FUEL-330: There appears to be a problem with cinder-volume. It's a loop device (maybe a problem, maybe not), but it's also set to 20G.
      • So the problem is either that cinder is using the loop device or that the loop device is set to 20G.
      • The workaround for now is to change the size of the 100G volume to 10G in the heat template file "onap_openstack.yaml" (see the sketch after this list).
  • Due to the above timeouts, the stack creation usually fails, although most/all of the VMs are created.
    • Across numerous "stack create" runs of the heat template, the stack has been created successfully (as far as heat is concerned) on two occasions; it has failed probably 10-15 times.
  • The user_data script that is run by cloud-init is not succeeding:
    • onap_portal
      • The installation of the docker components fails for ARM:
        • The docker and docker-engine packages are installed via "apt" from the dockerproject.org repo, but this repo has no support for arm64.
        • The docker-compose component is fetched via "curl" from the dockerproject.org repo, which is also not supported for arm64.
      • I modified the "portal_install.sh" script that is retrieved via wget from the ONAP Nexus, put the modified copy on the jump server, and modified the user_data script accordingly.
      • The Ubuntu 16.04 image (14.04 is requested by the heat template, but docker is not supported for ARM on 14.04) only defines enough storage in the image for /dev/vda1 to have 2G, which is overrun when the various docker images are pulled, causing the docker pulls to fail with "not enough space on device".
        • I updated the "large" flavor to have a 10G root disk, and this is no longer an issue (about 3G of space appears to be needed); see the sketch after this list.
      • The portal_install.sh gets further now, but still encounters issues
        • Most notably pulling the nexus3.onap.org:10001/onap/portal-app
        • Error: image onap/portal-app:v1.3.0 not found
    • onap_vid
      • Same as above. This hasn't been modified as it was for onap_portal.
    • onap-dcae-bootstrap
      • Looks similar to above.
    • Suspect all of the xxx_install.sh scripts have the same issue.
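
Two of the workarounds above, sketched as commands (the volume property, flavor name, and flavor sizes are illustrative and should be matched against what the heat template actually requests):

    # Shrink the vol1-sdc-data volume from 100G to 10G in the heat template
    # (assumes the volume resource uses a plain "size: 100" property)
    sed -i 's/size: 100/size: 10/' onap_openstack.yaml

    # Recreate the "large" flavor with a 10G root disk
    # (vCPU/RAM values are illustrative; reuse the existing flavor's values)
    openstack flavor delete m1.large
    openstack flavor create --vcpus 8 --ram 16384 --disk 10 m1.large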

Troubleshooting Environment Bring-Up after Server Relocation

Notes for this episode (1/18/18 thru 1/25/18) are archived here: ARM Pod Deployment: Debugging after Server Relocation.

Current Deployment: 2/2/18

The environment was re-deployed on Friday, 2/2/18, with an "os-nosdn-nofeature-ha" scenario.

Network Topology

The 6 ARM servers and their roles and IPMI addresses are shown in the following table:

Nickname | Server Description | IPMI Address | Public IP Address | Standard Work Assignment
Big Cavium 1 | 96 core, 128G | 10.10.52.10 | | Compute Node 1, or 'cmp001'
Big Cavium 2 | 96 core, 128G | 10.10.52.11 | | Compute Node 2, or 'cmp002'
Small Cavium 1 | 48 core, 64G | 10.10.52.12 | 10.10.50.12 | Jump Host
Small Cavium 2 | 48 core, 64G | 10.10.52.13 | | KVM Host for Controller VMs, or 'kvm01'
Small Cavium 3 | 48 core, 64G | 10.10.52.14 | | KVM Host for Controller VMs, or 'kvm02'
Small Cavium 4 | 48 core, 64G | 10.10.52.15 | | KVM Host for Controller VMs, or 'kvm03'

The jump host is running Ubuntu 16.04.3 LTS, user/passwd is "ubuntu"/"ubuntu".

We use the Euphrates/Stable MCP (new name for Fuel) installer, which is described here: 

http://docs.opnfv.org/en/stable-euphrates/submodules/fuel/docs/release/installation/installation.instruction.html

We use the following PDF file: Auto Lab ARM Pod Description File (PDF)

We use the following IDF file: Auto Lab ARM Pod IDF File

The above files are listed here for information. They will likely change soon for schema validation and eventually be checked into the Pharos git repo.

A quick tour of how to collect information about the OPNFV installation on the ARM Pod is at: Tour of ARM Pod Installation.

Note: If you want to skip the above tour, the floating controller address is 172.16.10.10, accessible from the jump server for user ubuntu using the ssh key in /var/lib/opnfv/mcp.rsa.  The credentials are in the files /root/keystonerc and /root/keystonercv3.  It's possible that a previous user has copied these credentials to the /home/ubuntu directory.
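
For example, from the jump server (a sketch based on the paths above; which rc file to source depends on whether the Keystone v2 or v3 API is wanted):

    # SSH to the floating controller using the key on the jump server
    ssh -i /var/lib/opnfv/mcp.rsa ubuntu@172.16.10.10

    # On the controller, load the OpenStack credentials and verify access
    sudo -i
    source /root/keystonercv3
    openstack server list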

Remote Management

Remote access is required for …

    • Developers to access deploy/test environments (credentials to be issued per POD / user) at 100Mbps upload and download speed

OpenVPN is generally used for remote access; however, community-hosted labs may vary due to company security rules. Please refer to the individual lab documentation/wiki page, as each company may have different access rules and policies.

Basic requirements:

    • SSH sessions to be established (initially on the jump server)
    • Packages to be installed on a system by pulling from an external repo.

Firewall rules accommodate:

    • SSH sessions

Lights-out management network requirements:

    • Out-of-band management for power on/off/reset and bare-metal provisioning
    • Access to server is through a lights-out-management tool and/or a serial console
    • Refer to the applicable lights-out management documentation from the server manufacturer






5 Comments

  1. For the pending state: this is the result of parallel loading of docker images causing startup contention. The fix is to pre-pull all docker images into your local repo; see the script on the Kubernetes page.

    https://wiki.onap.org/display/DW/ONAP+on+Kubernetes#ONAPonKubernetes-QuickstartInstallation

    https://jira.onap.org/secure/attachment/10501/prepull_docker.sh

    OOM-328 (in progress): Preload docker images script before createAll.sh will allow a 7 min startup.

     

    For the search line limits: this is a known pre-Rancher-2.0 bug. It is a red herring in that it does not affect functionality; it is actually a warning about more than 5 DNS search terms, and you can ignore it.

    https://github.com/rancher/rancher/issues/9303

     

    Also, everything after "cli" in setenv.sh is still a WIP as the merge from OPEN-O into ONAP occurs. R1 RC0 was last Thursday, and the Integration team is addressing issues first in HEAT, with each team adjusting the OOM config to match.

    /michael

    1. How about running a Registry as a pull-through cache? https://docs.docker.com/registry/recipes/mirror/#how-does-it-work

      Here is an interesting thread about use cases (https://github.com/docker/distribution/issues/1431), as well as a suggestion to use https://github.com/virtuald/docker-registry-cache

       

      pros:

      • to me it seems we just need to update the docker daemon with the mirror address (see the sketch below the cons)
      • minimal traffic hitting the original registries

      cons:

      • yet another component to take care of
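
      A minimal sketch of that setup, assuming a single cache host reachable by every node (hostname and port are illustrative; note that docker's registry-mirrors setting only applies to Docker Hub pulls, so nexus3.onap.org images would still need a separate proxy or local registry):

        # Run the official registry image as a pull-through cache for Docker Hub
        docker run -d -p 5000:5000 --restart=always --name registry-mirror \
          -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
          registry:2

        # On each node, point the docker daemon at the mirror and restart it
        echo '{ "registry-mirrors": ["http://cache-host.example.com:5000"] }' \
          | sudo tee /etc/docker/daemon.json
        sudo systemctl restart docker
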
      1. Good idea,

       Some of the images are built daily, hence the daily pull refresh. Yes, a docker registry on premises would be better; some of the teams, like Bell, do this, and you can even set one up directly in its own container. Anyway, Nexus3 got a double allocation a week ago, so the pull time for all of ONAP is down to 20 min, with a 7 min startup (without DCAE), which requires OpenStack or OpenStack on K8s.

        /michael