
Theme | Need / Pain Point | Polestar category | Suggested By | Comments
Customer OPEX integration cost (Standard industry Frameworks & APIs)

ONUG has pointed out that integrating and maintaining multiple providers' "sets of APIs" can easily increase cost by one or even two orders of magnitude. This is in fact why ONUG exists: as a "reasonable" solution, ONUG has a "network telemetry" API project to define a standard, industry-level network telemetry API framework that makes the solution cost effective for its customer base.

In other words, the customer has moved to software and bears an integration cost for each provider. Could NFV please take this "per provider" industry-level cost into account in the "industry solution"? For the industry to provide a reasonable solution, there should be some common solution.

Model Driven

ONUG via Michael Bugenhagen
  • Consider a common industry-level network telemetry API framework that removes the bulk of the integration cost for the customer. It should still be extensible per provider while remaining compliant with the common principles.
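A minimal sketch of what "common, yet extensible per provider" could look like. The record layout and field names are hypothetical assumptions, not taken from ONUG or any published API: the consumer parses the common fields with one parser for all providers, while provider-specific data lives under a namespaced `extensions` map that consumers may ignore.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical common fields: illustrative only, not taken from ONUG or
# from any published telemetry specification.
COMMON_FIELDS = ("provider", "resource_id", "metric", "value", "timestamp")


@dataclass
class TelemetryRecord:
    provider: str
    resource_id: str
    metric: str
    value: float
    timestamp: str  # ISO 8601, e.g. "2016-09-01T07:41:22Z"
    # Provider-specific data, namespaced by provider name so it can never
    # collide with the common fields or with another provider's extensions.
    extensions: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def parse_record(raw: str) -> TelemetryRecord:
    """Single parser for every provider: common fields are mandatory,
    non-standard top-level keys are rejected, extensions pass through."""
    data = json.loads(raw)
    missing = [name for name in COMMON_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing common fields: {missing}")
    unknown = set(data) - set(COMMON_FIELDS) - {"extensions"}
    if unknown:
        raise ValueError(f"non-standard top-level fields: {sorted(unknown)}")
    return TelemetryRecord(
        provider=data["provider"],
        resource_id=data["resource_id"],
        metric=data["metric"],
        value=float(data["value"]),
        timestamp=data["timestamp"],
        extensions=data.get("extensions", {}),
    )


if __name__ == "__main__":
    raw = json.dumps({
        "provider": "provider-a",
        "resource_id": "vnf-42",
        "metric": "if_rx_bytes",
        "value": 1024,
        "timestamp": "2016-09-01T07:41:22Z",
        "extensions": {"provider-a": {"queue_depth": 3}},
    })
    record = parse_record(raw)
    print(record.metric, record.value, record.extensions)
```

The design choice here is that extensibility is confined to one well-known slot, so integrating a new provider reuses the same parser and only opts in to that provider's extensions when they are needed.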
Provider Touchless customer portal framework

The key to lowering cost and meeting customer expectations is a solid NFV customer portal schema that is "provider touchless". This means all design should be centered on the portal: the orchestrator reacts to, and orchestrates, portal objects (resources and services), while the portal transparently shows resource/service state, faults, PM, etc. This is the primary goal.

 

Industry need - Match customer experience with "provider touchless service portals" (cloud portal alignment)

Provide the modular objects to enable a "Cloud Portal Framework" that each provider can customize, yet that remains alignable across providers.

Align at "object levels" (Model driven)

  • Resource and Service objects and their states, PM, and FM must be shared between providers for "joint provider service delivery" (portal in portal, or portal behind a portal).

Large enterprise customers such as ONUG are already asking for these "standard industry objects" between providers so that they can display them in their portals.

Templates and model-driven constructs (TOSCA, YANG)

Michael Bugenhagen

The portal is the critical element; however, we tend to focus on orchestration. If we orchestrate without a portal, we have missed the key customer demand.

If we fail to have orchestrators use "model-driven" portal objects, we have also failed to create a standard way to share resource views between providers and customers.

Resilience at Scale

Platform resilience and performance for clusters up to 1000 nodes.

There are two different interpretations of "nodes" here: compute nodes (e.g. servers/CPUs) vs. locations (NFVI-PoPs). bryan: this was mainly meant to refer to distinct sites, i.e. a complete control plane (cloud/SDN VIMs) and virtual infrastructure (VI) deployment (servers, etc.) that provides service at a particular location. "Nodes" refers primarily to the number of servers, distributed across VIM and VI.

Platform resilience refers to the ability of the OPNFV Reference Platform to maintain normal operations as resources become available or unavailable (e.g. due to failures). This also includes the case of expanding or shrinking the aggregate footprint of the NFVI upon which the OPNFV Reference Platform is executing.

Small NFVI nodes may be homogeneous, but larger NFVI deployments are likely to have variations in the deployed infrastructure, e.g. compute nodes with server blades of different vintages, different BIOS versions, etc.

bryan: Performance focuses include the control plane (e.g. orchestration, monitoring, closed-loop control) and VI (e.g. data plane), as the number of nodes scales up.

The performance of OPNFV Reference Platform operations may vary for operations that impact a large number of nodes. For example, the time to deploy a VNF may be longer for VNFs whose VNFCs are deployed on different nodes.

The capacity of the NFVI to support VNFs also changes as the resources available change.
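As a rough illustration of why capacity moves with resource availability, the sketch below recomputes how many instances of a fixed-size VNFC fit on the currently available nodes. It assumes a deliberately simplified model (per-node vCPU and memory only, no overcommit, an instance cannot span nodes); the node names and sizes are hypothetical, with mixed node sizes standing in for the mixed-vintage blades mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Node:
    name: str
    vcpus: int
    mem_gb: int
    available: bool = True  # False once the node has failed or been drained


def instances_per_node(node: Node, vnfc_vcpus: int, vnfc_mem_gb: int) -> int:
    """Whole VNFC instances a single node can host (no overcommit)."""
    if not node.available:
        return 0
    return min(node.vcpus // vnfc_vcpus, node.mem_gb // vnfc_mem_gb)


def capacity(nodes: Iterable[Node], vnfc_vcpus: int, vnfc_mem_gb: int) -> int:
    """Aggregate capacity: instances cannot span nodes, so sum per node."""
    return sum(instances_per_node(n, vnfc_vcpus, vnfc_mem_gb) for n in nodes)


if __name__ == "__main__":
    pool = [
        Node("blade-old-1", vcpus=16, mem_gb=64),    # older-vintage blade
        Node("blade-new-1", vcpus=48, mem_gb=256),   # newer-vintage blade
        Node("blade-new-2", vcpus=48, mem_gb=256, available=False),  # failed
    ]
    print(capacity(pool, vnfc_vcpus=4, vnfc_mem_gb=8))
```

Losing one large node here removes 12 of 28 possible instances, which shows why heterogeneous pools make capacity changes uneven depending on which node fails.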

 

We also need to support the platform at small scale: analysts consider enterprise NFV a significant application. The platform in this case is typically a much smaller device, e.g. a pizza box.

NFV Reference Platform - TBD - Daisy may be partially addressing this

Bryan

Steven expanded discussion of some of the terms
Security and Policy

Integrate platform services with enterprise security systems e.g. for RBAC, lock out, password aging, encryption, vulnerability scanning, policy management. Includes some items in scope for the PCI DSS feature of Keystone in Newton.

Roots of trust and legal requirements for critical infrastructure.

A recommendation for Linux hardening (best practices) that does not impact OpenStack functionality would be useful.

- NFVI networking security mechanism

As an OPNFV Platform Provider or Network Service Operator, I need a security mechanism for NFVI networking that applies a unified baseline security policy to all traffic across our managed NFVI. When we use DPDK or SR-IOV to improve VNF networking performance, the security policy (e.g. a Neutron security group) is bypassed and ignored. We need a solution that does not rely on Linux kernel functions of the host OS, such as iptables. The OVS-based firewall driver released in OpenStack Mitaka is a candidate to solve this problem on the DPDK side, but other data plane acceleration methods such as SR-IOV should also be covered.
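To make "unified baseline policy" concrete: the same rule set should give the same verdict for a packet whether it traverses the kernel, an OVS-DPDK datapath, or an SR-IOV VF. The sketch below shows only the policy-evaluation semantics, using Python's standard `ipaddress` module. The rule format (first match wins, explicit allow/deny, default deny) and the management subnet are hypothetical simplifications; this is not the Neutron security group model, which uses stateful allow-only rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    action: str              # "allow" or "deny"
    src: str                 # CIDR, e.g. "10.0.0.0/8"
    dst: str                 # CIDR
    protocol: Optional[str]  # "tcp", "udp", or None for any protocol
    dst_port: Optional[int]  # None for any port


def matches(rule: Rule, src: str, dst: str, protocol: str, dst_port: int) -> bool:
    return (
        ipaddress.ip_address(src) in ipaddress.ip_network(rule.src)
        and ipaddress.ip_address(dst) in ipaddress.ip_network(rule.dst)
        and (rule.protocol is None or rule.protocol == protocol)
        and (rule.dst_port is None or rule.dst_port == dst_port)
    )


def evaluate(rules: Sequence[Rule], src: str, dst: str, protocol: str, dst_port: int) -> str:
    """First matching rule wins; anything unmatched is denied (baseline)."""
    for rule in rules:
        if matches(rule, src, dst, protocol, dst_port):
            return rule.action
    return "deny"


# Hypothetical baseline: protect a management subnet, then allow HTTPS and DNS
# between tenant addresses.
BASELINE = [
    Rule("deny", "0.0.0.0/0", "192.168.100.0/24", None, None),
    Rule("allow", "10.0.0.0/8", "10.0.0.0/8", "tcp", 443),
    Rule("allow", "10.0.0.0/8", "10.0.0.0/8", "udp", 53),
]
```

The gap the user story describes is not the evaluation logic itself but where it runs: with SR-IOV the host never sees the packet, so an equivalent rule set would have to be enforced in the NIC or elsewhere in the path.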

 

  NFV Reference Platform

Copper - Policy platform

Models - Policy model

Moon - Security framework - analytic tools

TBD - security holes of platform & infrastructure

Need Data Plane WG

Bryan + Diego + Morgan + Hajime

Note that the security project in OPNFV produced its first OpenSCAP report in Colorado (http://artifacts.opnfv.org/logs/functest/lf-pod1/colorado/2016-09-01_07-47-15/colorado/security_scan/192.30.9.9_2016-09-01_07-41-22/report.hmtl). This is a first step; hardening should be possible.
VNF Onboarding and Lifecycle Management

Drive convergence across End Users for a comprehensive, end-to-end (developer-to-operations) process of service/VNF development, packaging, onboarding, service creation, cataloging, deployment, and lifecycle management. Support this through tooling (e.g. SDKs) and open source for key components of the end-to-end process.

From the high-level messaging page: As a VNF Provider, I use the OPNFV platform to validate interoperability with multiple orchestration scenarios so that ingestion and administration problems in my customers' operational deployments are minimized.

This could be refined as: As a VNF Provider, I use the OPNFV platform to validate VNF packaging so that the OPNFV platform returns no errors when onboarding a VNF under any of the OPNFV platform configuration options. This may require a number of Reference VNFs to validate all of the VNFC deployment constraints that could be expressed in the VNF Package.

 Automation & Testing

  • Models Project

 

Bryan

Steve added epic format statements on ingestion

Created a daughter page to expand on the VNF onboarding topic.

CI/CD & Day 2 Operations

Support migration to a true DevOps-based production deployment, with infrastructure lifecycle management features such as in-place upgrades, interoperability between platform/component versions, etc. This includes various prerequisites and enablers such as API micro-versioning, containerization of OpenStack services, and a control plane maintenance mode.

 Automation

  • Escalator

 

Bryan 
Networking for NFV

Enhance network capabilities and performance to meet NFV demands. This includes capabilities such as driving Gluon (a common port abstraction enabling flexible networking / SDNC provider integration), L3VPN-based service chaining, data plane acceleration (e.g. DPDK, SR-IOV, FD.io/VPP), Smart NICs (e.g. P4), ...

NFV Reference Platform

Bryan
Multisite Federation

Support multisite deployments ranging from simple site pairs to massively distributed and hierarchical deployments. Key aspects include identity/auth federation, quota management, catalog distribution, global/local resource orchestration, ...
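Quota management, one of the aspects listed above, becomes a distributed problem once sites are federated. A minimal sketch, assuming a hypothetical model in which a tenant has one global quota plus per-site quotas, and the global check uses the sum of usage reported by all sites; a real deployment would also have to handle stale or partitioned site reports, which this ignores.

```python
from typing import Dict


def can_allocate(global_quota: int,
                 usage_by_site: Dict[str, int],
                 site: str,
                 requested: int,
                 site_quota: Dict[str, int]) -> bool:
    """Allow a request only if both the per-site quota and the tenant's
    global (federated) quota would still be respected afterwards.
    Sites without a configured quota are treated as having quota 0."""
    current_site = usage_by_site.get(site, 0)
    if current_site + requested > site_quota.get(site, 0):
        return False
    total = sum(usage_by_site.values())
    return total + requested <= global_quota


if __name__ == "__main__":
    usage = {"site-a": 40, "site-b": 50}    # hypothetical vCPUs in use
    limits = {"site-a": 60, "site-b": 60}   # hypothetical per-site quotas
    print(can_allocate(100, usage, "site-a", 8, limits))   # fits both limits
    print(can_allocate(100, usage, "site-a", 12, limits))  # breaks global quota
```

The second request fits within site-a's own quota but is rejected because, summed across the federation, it would exceed the tenant's global quota.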

NFV Reference Platform

Bryan
Platform Operational Insights

Across platform components, provide consistent approaches to deep observability into the platform (control plane, physical/virtual infrastructure) and workload analytics. This includes fast site deployments: both the incorporation of new white-box NFVI (storage, servers, switches, networking) within an NFVI-PoP and the integration of new sites within a multisite deployment.

Manage complex validation challenges driven by frequent changes of a huge number of interdependent platform and application components. 

Automation?

Bryan

 Steve added fast deployment

Herbert added "Manage complex validation challenges ..." (could also be a separate topic)

Cloud-Native Architecture

Evolve toward platform infrastructure and workload support based on a containerized, microservices-oriented model. This will be necessary to overcome the performance/cost limitations of VM-based infrastructure and workloads, as well as to achieve truly agile lifecycle management.

Cloud-native architecture applies independently to VNFs and to the OPNFV Reference Platform. A distinguishing feature of cloud-native applications is that they are designed to operate on cloud infrastructure.

NFV Reference Platform

Bryan
Cloud-Native Architecture / Application CI/CD

Support migration to true DevOps-based management of applications, with automated testing and staging, in order to unleash the full speed of agile software delivery.

DevOps / Continuous Integration / Continuous Delivery processes will apply independently to the VNFs and to the OPNFV Reference Platform. A distinguishing feature of DevOps/CI/CD is the faster and asynchronous cycling of versions, so that delivered functionality and performance improve in frequent small increments.

The OPNFV Reference Platform needs automated procedures for onboarding VNFs that are made available on a CI/CD basis.

The deployed VNFs need to remain in normal operation through OPNFV Reference Platform CI/CD upgrades. [see also API backwards compatibility below]
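One common way to keep deployed VNFs in service during a platform upgrade is a rolling upgrade in batches. A minimal sketch, assuming a hypothetical threshold on the fraction of compute nodes that must stay in service; evacuating or live-migrating workloads off each batch before upgrading it is assumed and not shown.

```python
import math
from typing import List


def plan_batches(nodes: List[str], min_available_fraction: float) -> List[List[str]]:
    """Split nodes into sequential upgrade batches such that, while any one
    batch is being upgraded, at least min_available_fraction of the nodes
    remain in service."""
    if not 0 < min_available_fraction <= 1:
        raise ValueError("min_available_fraction must be in (0, 1]")
    # Round the number of nodes that must stay up upwards, so the threshold
    # is never breached.
    must_stay_up = math.ceil(len(nodes) * min_available_fraction)
    max_down = len(nodes) - must_stay_up
    if max_down < 1:
        raise ValueError("no node can be taken down without breaching the threshold")
    return [nodes[i:i + max_down] for i in range(0, len(nodes), max_down)]


if __name__ == "__main__":
    computes = [f"compute-{i}" for i in range(10)]
    for batch in plan_batches(computes, 0.8):
        # Hypothetical per-batch steps: evacuate/live-migrate workloads,
        # upgrade, verify health, then continue with the next batch.
        print("upgrade", batch)
```

With 10 nodes and an 80% threshold this yields five batches of two nodes; a stricter threshold trades a longer upgrade window for more headroom.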

Automation?

Herbert

Could be a separate topic or combined with "Cloud-Native Architecture"
Reference VNFs

Build a catalog of production-quality Reference VNFs for common network functions, to be used in OPNFV testing and otherwise.

As an OPNFV Platform Provider, I need a set of Reference VNFs to validate that VNFs (including VNFCs) are correctly deployed by the OPNFV Platform, such that the deployed configuration complies with the VNFD information in the VNF Package (e.g. VNFC deployments respect affinity/anti-affinity constraints, VNFC deployments respect portability constraints, and internal connectivity between VNFC deployments is established). Separate Reference VNFs may be required for each deployment constraint in the VNF Package.
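The first user story implies a check that a Reference VNF test could run after deployment: compare where the VIM actually placed each VNFC against the affinity/anti-affinity groups declared in the VNFD. A minimal sketch, assuming the placement has already been read back from the VIM as a VNFC-to-host mapping and the VNFD groups have been reduced to plain lists; both representations are hypothetical simplifications, not the TOSCA VNFD schema.

```python
from typing import Dict, List


def check_placement(placement: Dict[str, str],
                    affinity_groups: List[List[str]],
                    anti_affinity_groups: List[List[str]]) -> List[str]:
    """Compare actual VNFC -> host placement with the declared groups.
    Returns a list of violations; an empty list means compliant."""
    violations: List[str] = []
    for kind, groups in (("affinity", affinity_groups),
                         ("anti-affinity", anti_affinity_groups)):
        for group in groups:
            missing = [v for v in group if v not in placement]
            if missing:
                violations.append(f"{kind} {group}: not deployed {missing}")
                continue
            hosts = [placement[v] for v in group]
            if kind == "affinity" and len(set(hosts)) != 1:
                violations.append(f"affinity {group} spread over {sorted(set(hosts))}")
            if kind == "anti-affinity" and len(set(hosts)) != len(hosts):
                violations.append(f"anti-affinity {group} shares a host: {hosts}")
    return violations


if __name__ == "__main__":
    # Hypothetical VNFC names and hosts.
    placement = {"lb-1": "h1", "lb-2": "h1", "db-1": "h3", "db-2": "h3"}
    print(check_placement(placement, [["db-1", "db-2"]], [["lb-1", "lb-2"]]))
```

Reporting violations as a list rather than failing on the first one lets a single test run cover every constraint expressed in the package.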

As an OPNFV Platform Provider or Network Service Operator, I need a set of Reference VNFs to benchmark the performance of a specific OPNFV Platform deployment on an NFVI Node using a defined reference load, so that infrastructure performance benchmarks from different deployments are comparable. Separate Reference VNFs may be required for different representative workloads (e.g. compute intensive, memory intensive, network traffic intensive, intensive VNFC instance deployment/deletion operations, etc.).

As an OPNFV Platform Provider or Network Service Operator, I need a set of Reference VNFs to benchmark the performance of a specific OPNFV Platform deployment on an NFVI Node using a defined reference load, so that OPNFV Platform performance benchmarks from different deployments are comparable. Separate Reference VNFs may be required for different representative workloads (e.g. intensive VNFC instance deployment/deletion operations, intra-VNF networking changes, VNFC migration operations, etc.).

Automation / Testing

Bryan

Steve added epic format statements for the cases of Ingestion validation, infrastructure benchmarking, and VNF Management operations.

We will likely need a separate page for discussion around the scope of potential reference VNFs. I suspect we would need to include some of the following:

  • a null VNF that simply provides connectivity between externally visible connectivity points of the VNF
  • an ECHO VNF that simply reflects back packets received on an externally visible VNF connectivity point.
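To show how small the ECHO VNF's behaviour is, here is a minimal user-space sketch of the reflection logic as a UDP echo loop, using only Python's standard `socket` and `threading` modules. This is a stand-in under strong assumptions: a real Reference VNF would run inside a VM or container, would likely reflect at L2/L3 rather than over UDP sockets, and would need far higher packet rates than a Python loop can deliver.

```python
import socket
import threading


def echo_server(sock: socket.socket, max_packets: int) -> None:
    """Reflect each received datagram back to its sender, unchanged."""
    for _ in range(max_packets):
        data, addr = sock.recvfrom(65535)
        sock.sendto(data, addr)


def demo() -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_server, args=(server, 1), daemon=True)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    client.sendto(b"ping", ("127.0.0.1", port))
    reply, _ = client.recvfrom(65535)
    worker.join(timeout=2.0)
    client.close()
    server.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

The null VNF differs only in forwarding: instead of sending each packet back to its source, it would send it out through the other externally visible connection point.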

OpenStack Operators WG working page on the subject: https://etherpad.openstack.org/p/generic-nfv-platform

 

 

API Backwards Compatibility

All OPNFV Platform APIs must be versioned (micro-versioning is desirable). An API version must not be removed without a deprecation cycle of two releases. New incompatible APIs may only be introduced with a new version. The old (deprecated) API must be supported for the duration of the deprecation cycle.

As a Network Service Provider, I need two release cycles to work with a VNF provider to ensure upgraded OPNFV Platform APIs are supported, so that a smooth migration of existing services can be planned.
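The deprecation rule reduces to simple release arithmetic. A minimal sketch, assuming releases are numbered consecutively (the OPNFV release names are indexed from Arno = 0 purely for illustration) and reading "a deprecation cycle of two releases" as: an API version deprecated in release R must still be served in R and R+1, and may be removed no earlier than R+2.

```python
from typing import Optional

DEPRECATION_CYCLE = 2  # releases, per the policy above

# Illustrative indexing only: position in this list is the release number.
RELEASES = ["Arno", "Brahmaputra", "Colorado", "Danube", "Euphrates"]


def earliest_removal(deprecated_in: int) -> int:
    """First release index in which a deprecated API version may be removed."""
    return deprecated_in + DEPRECATION_CYCLE


def must_still_serve(deprecated_in: Optional[int], release: int) -> bool:
    """True if an API version must still be supported in `release`.
    `deprecated_in` is None for a version that has not been deprecated."""
    if deprecated_in is None:
        return True
    return release < earliest_removal(deprecated_in)


if __name__ == "__main__":
    deprecated = RELEASES.index("Colorado")
    print("removable from", RELEASES[earliest_removal(deprecated)])
```

Under this reading, an API version deprecated in Colorado is still served in Colorado and Danube and may be dropped from Euphrates onward.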

NFV Reference Platform

Steve

Real-time performance monitoring and fast fault detection

The volume of alarms and statistical data collected from the NFVI to detect failures will increase as the NFVI resources we need to monitor (CPU, memory, disk, etc., collected from the OS and hypervisor) grow.

To avoid overloading the monitoring system, however, this data collection is made coarse-grained (e.g. 5-minute intervals). This in turn delays both failure detection and failure healing.

Therefore, real-time monitoring and a fast failure-detection mechanism are required in the NFV platform.
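The trade-off described above can be quantified with simple arithmetic. Assuming a pure polling model in which a failure is only noticed at the next sample (ignoring collection, transport, and analysis latency), the worst-case detection delay equals the polling interval and the average is half of it, while the sample volume grows inversely with the interval. The metric count is a hypothetical figure.

```python
from typing import Tuple


def detection_delay_s(interval_s: float) -> Tuple[float, float]:
    """(worst case, average) delay before a polled failure is noticed,
    assuming failures occur uniformly at random between two samples."""
    return interval_s, interval_s / 2


def samples_per_hour(num_metrics: int, interval_s: float) -> float:
    """Load on the monitoring system: samples collected per hour."""
    return num_metrics * 3600 / interval_s


if __name__ == "__main__":
    for interval in (300, 10, 1):
        worst, avg = detection_delay_s(interval)
        print(f"{interval:>4} s interval: worst {worst} s, avg {avg} s, "
              f"{samples_per_hour(10_000, interval):,.0f} samples/h for 10,000 metrics")
```

Moving from 5-minute to 1-second polling cuts the worst-case delay from 300 s to 1 s but multiplies the collected sample volume by 300, which is exactly the overload the paragraphs above describe.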

NFV Reference Platform

Masanori

bryan: metrics collection for fault/performance purposes is enabled by VES/Barometer.
VM placement optimization

Linked to multisite and/or cloud-native applications.

There are already some solutions, but few clear guidelines/rules.

Placement within one VIM: some solutions exist that take into account affinity/anti-affinity rules and EPA requirements.

Multi-site deployment: this complicates things quite a bit. Operators have to decide which sites the VNFs/VMs need to be deployed in, which depends on (1) the geography of data centers, (2) network topologies within and between data centers, (3) latency requirements from access networks and other resources, (4) capacities needed/allocated at each site, (5) vendor placement rules, if any, (6) redundancy requirements (e.g., N:1), and (7) recovery procedures in case of failure.

Rather than remaining a laborious manual planning/design task, this needs to be automated as much as possible. One possibility is to create operator deployment “policies” or rules as input to the orchestration system, since these considerations stem from the operator deployment environment rather than from the VNF/service itself. Other solutions are welcome, though.
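A minimal sketch of the "operator policy as orchestrator input" idea, covering only three of the factors listed above: latency, free capacity, and geographic redundancy. The site data, the policy fields, and the greedy choice of the lowest-latency site per region are hypothetical simplifications, not how any particular orchestrator behaves.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Site:
    name: str
    region: str
    latency_ms: float  # latency from the relevant access network
    free_vcpus: int


@dataclass(frozen=True)
class PlacementPolicy:
    max_latency_ms: float
    vcpus_per_site: int
    redundancy: int  # number of sites required, each in a distinct region


def select_sites(sites: List[Site], policy: PlacementPolicy) -> List[Site]:
    """Filter sites by hard constraints, then greedily pick the lowest-latency
    site per region until the redundancy requirement is met."""
    if policy.redundancy < 1:
        raise ValueError("redundancy must be at least 1")
    eligible = [s for s in sites
                if s.latency_ms <= policy.max_latency_ms
                and s.free_vcpus >= policy.vcpus_per_site]
    chosen: List[Site] = []
    used_regions = set()
    for site in sorted(eligible, key=lambda s: s.latency_ms):
        if site.region in used_regions:
            continue
        chosen.append(site)
        used_regions.add(site.region)
        if len(chosen) == policy.redundancy:
            return chosen
    raise RuntimeError("policy cannot be satisfied with the available sites")


if __name__ == "__main__":
    sites = [Site("A", "east", 5, 100), Site("B", "east", 3, 100),
             Site("C", "west", 8, 50), Site("D", "west", 20, 200),
             Site("E", "north", 4, 10)]
    policy = PlacementPolicy(max_latency_ms=10, vcpus_per_site=32, redundancy=2)
    print([s.name for s in select_sites(sites, policy)])
```

Keeping the policy as data separate from the VNF descriptor reflects the point above: these rules come from the operator's environment, so the same VNF can be placed differently by different operators.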

NFV Reference platform

Morgan, Serge

bryan: metrics collection for real-time inventory/placement purposes is enabled by VES/Barometer. Projects such as Valet (proposed by AT&T for OpenStack) specifically address advanced workload placement based on real-time inventory and SLAs.
Metrics for billing/charging

Are the current metrics in OpenStack and/or the controllers sufficient to build the charging chain for the VNF (e.g. bandwidth used by a VM/tenant/stack)?

NFV Reference platform

Morgan

bryan: metrics collection is enabled by VES/Barometer
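Whatever the metric source turns out to be, one step in a bandwidth charging chain is turning cumulative interface byte counters into per-period usage and rolling it up per tenant. A minimal sketch under stated assumptions: samples are hypothetical (VM, tenant, counter at period start, counter at period end) readings, and a counter that goes backwards is treated as a reset (e.g. after a VM reboot), in which case only the end reading is counted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (vm_id, tenant_id, bytes_at_period_start, bytes_at_period_end)
Sample = Tuple[str, str, int, int]


def usage_per_tenant(samples: Iterable[Sample]) -> Dict[str, int]:
    """Bytes used per tenant over one billing period."""
    totals: Dict[str, int] = defaultdict(int)
    for _vm_id, tenant, start, end in samples:
        if end >= start:
            used = end - start
        else:
            # Counter reset during the period: count only what was seen since.
            used = end
        totals[tenant] += used
    return dict(totals)


if __name__ == "__main__":
    print(usage_per_tenant([
        ("vm1", "tenant-a", 100, 600),
        ("vm2", "tenant-a", 0, 400),
        ("vm3", "tenant-b", 1000, 50),  # counter reset
    ]))
```

The reset rule under-counts traffic sent before the reset, which is one reason the question above about whether the existing metrics are sufficient for charging is a real concern.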
Energy

A tool is missing to give a clear view of power consumption. Power consumption is a driver of NFV/virtualization; however, today there is no native feedback on such consumption, and no connection with VM placement.

NFV Reference platform

Morgan

As far as I know, there is a project in ODL, and there used to be a project in OpenStack some time ago.

bryan: should be enabled by VES/Barometer

Troubleshooting SLAs

Tooling is also missing to provide reliable and fast feedback in case of trouble.

NFV reference platform

Morgan

There are projects in OPNFV (pinpoint, bottlenecks) dealing with such issues

bryan: enabled by VES/Barometer, at least for data collection.

Interoperability

Interoperability assessment with orchestration platforms (linked to VNF onboarding)

NFV reference platform

Diego

bryan: should be covered by VNF Onboarding above.
Security aspects

Especially those related to roots of trust and legal requirements for critical infrastructures

NFV reference platform

Diego

bryan: should be covered by Security and Policy above.
Documentation / technical architecture

Due to the extreme modularity of the systems, defining a clear list of the flows that need to be allowed through filtering between all components would be helpful.

NFV reference platform

Morgan

In OPNFV, Pharos defines some rules, but due to the diversity of installers/scenarios, it should also be up to the scenario owner to provide a flow matrix associated with the integration scenario.
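A flow matrix can be kept as plain data that is both human-readable and machine-checkable. A minimal sketch for a hypothetical scenario; the entries use well-known default OpenStack API ports (Keystone 5000, Nova 8774, Neutron 9696) purely as examples, and a real scenario owner would publish its own list.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    src: str       # source component
    dst: str       # destination component
    protocol: str  # "tcp" or "udp"
    port: int


# Hypothetical flow matrix for one integration scenario.
FLOW_MATRIX: Set[Flow] = {
    Flow("horizon", "keystone", "tcp", 5000),
    Flow("horizon", "nova-api", "tcp", 8774),
    Flow("nova-compute", "neutron-server", "tcp", 9696),
}


def is_allowed(flow: Flow) -> bool:
    """True if the flow is declared in the scenario's matrix."""
    return flow in FLOW_MATRIX


def render(matrix: Set[Flow]) -> str:
    """Render the matrix as a sorted plain-text table for documentation."""
    header = f"{'source':<15}{'destination':<16}{'proto':<7}port"
    rows = [f"{f.src:<15}{f.dst:<16}{f.protocol:<7}{f.port}" for f in sorted(matrix)]
    return "\n".join([header, *rows])


if __name__ == "__main__":
    print(render(FLOW_MATRIX))
```

Because the same data drives both the published table and the check, the documentation cannot drift from what is actually validated.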
Log management

Linked to troubleshooting/SLAs: today there are lots of modules/components, and logs are everywhere. You do not really care when everything is fine.

But when there is trouble, it is sometimes a nightmare to find the right log file in the right place. Log aggregation and some convergence of log formats would save time.
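A minimal sketch of log aggregation with format convergence: parse lines from two differently formatted sources into one common record and merge them by timestamp. Both input formats are simplified assumptions (roughly an OpenStack-service style line and a syslog-style line with an ISO timestamp), not exact specifications of either format, and timestamps are assumed to be UTC.

```python
import re
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: datetime
    source: str
    level: Optional[str]  # None when the source format carries no level
    message: str


# Simplified, assumed formats (not exact specifications).
SERVICE_RE = re.compile(r"^(\S+ \S+) \d+ (\w+) (\S+) (.*)$")
SYSLOG_RE = re.compile(r"^(\S+Z) (\S+) ([^:]+): (.*)$")


def parse(line: str) -> LogRecord:
    """Convert one line from either format into the common record."""
    m = SERVICE_RE.match(line)
    if m:
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        return LogRecord(ts.replace(tzinfo=timezone.utc), m.group(3), m.group(2), m.group(4))
    m = SYSLOG_RE.match(line)
    if m:
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%SZ")
        return LogRecord(ts.replace(tzinfo=timezone.utc),
                         f"{m.group(2)}/{m.group(3)}", None, m.group(4))
    raise ValueError(f"unrecognised log format: {line!r}")


def aggregate(*sources: List[str]) -> List[LogRecord]:
    """Parse every source and merge all records into one timeline."""
    records = [parse(line) for lines in sources for line in lines]
    return sorted(records, key=lambda r: r.timestamp)


if __name__ == "__main__":
    service_log = ["2016-09-01 07:41:22.123 1234 ERROR nova.compute.manager Instance failed to spawn"]
    host_syslog = ["2016-09-01T07:41:22Z compute-1 ovs-vswitchd: port tap0 down"]
    for rec in aggregate(service_log, host_syslog):
        print(rec.timestamp.isoformat(), rec.source, rec.level, rec.message)
```

A single timeline across components is what turns "find the right log file" into reading one ordered sequence of events around the failure.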

NFV reference platformMorgan

The task does not seem easy, as any Telco Cloud solution based on open source components is the result of an integration in which logs are managed differently. It could, however, be a good opportunity for a new project dealing with integration, such as OPNFV.

bryan: Syslog archiving (as one key log example) is in scope for VES.

Backup & Restore

Ability to restore customer virtual machines if the stack is deleted and recreated (not sure if this is Red Hat specific).

NFV reference platform

Morgan

There are several technical options, but no native/integrated/reference ones.
Real HA

Despite HA modes for individual components, practice has shown that manual operations may be needed to restore the project. A better end-to-end HA is needed, and if not all subsystems are HA, it shall be clear what is HA and what is not.

NFV reference platform

Morgan

In OPNFV there is an HA project addressing these issues. It has already created test cases (executed in Yardstick). However, recent discussions also show that "HA" was probably misleading in the scenario naming, as most of the controllers are not HA today.
Service Function Plane

Ability to compose network services with flexible combinations of VNF- and PNF-embodied service functions participating in dynamic service function chains that are underlay-network agnostic (with service functions being unaware of network topology), with service function data plane interfaces that are standardized and independent of the underlay network, enabling service functions to be placed anywhere within an administrative domain on any platform-supported underlay network.

NFV reference platform

Meador

bryan: likely aligned with cloud-native goals of abstraction from the infrastructure environment. First-generation VNF solutions are inherently complex regarding compatibility, due to their own size and complexity/dependencies.
Certification Program(s)

Some VIM suppliers (OpenStack distros) have initiated programs to "certify" various VNFs from different suppliers as successfully onboarded on their VIM. Such certification programs should be led by OPNFV so that they are structured and verified by a vendor-agnostic entity. We might expand the scope to other interfaces (VNFM/VIM, VNFM/VNF-O, etc.).

NFV reference platform

Samer

bryan: OPNFV should promote its role in VNF portability, but diversity in all aspects (even certification) is a market enabler, as long as it does not exacerbate fragmentation and become a market inhibitor.

Versioning + Skip release(s) when upgrading a (live) system

Linked to CI/CD and API backwards compatibility: upgrading from version A to version B must not require manual operations, especially for critical components. Moreover, it shall be possible to skip intermediate versions and upgrade only between stable versions, not necessarily from version N to N+1.

Upgrading OpenStack in a commercial deployment requires a huge amount of integration effort and testing, despite progress in deployment and test automation. Given this, and the fact that OpenStack releases on a 6-month cadence, I, as an end user, may not want to update my commercial deployment at the same cadence, but may prefer a yearly or even bi-yearly upgrade of my systems. For example, I want to be able to “jump” directly from the D-release to the G-release, skipping the E- and F-releases. Ideally, such system upgrades shall be done “live”, i.e. without the need to shut down everything running on the older release.
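Skipping releases amounts to finding a path through the set of upgrade transitions that have actually been validated. A minimal sketch, assuming a hypothetical table of validated direct upgrades between release letters: breadth-first search returns a path with the fewest upgrade steps, so a validated direct D-to-G jump is preferred over stepping through E and F.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical: which direct upgrades have been validated.
VALIDATED_UPGRADES: Dict[str, List[str]] = {
    "D": ["E", "G"],  # includes a validated skip-level jump D -> G
    "E": ["F"],
    "F": ["G"],
    "G": ["H"],
}


def upgrade_path(current: str, target: str,
                 edges: Optional[Dict[str, List[str]]] = None) -> Optional[List[str]]:
    """Shortest sequence of releases from current to target, or None if the
    target cannot be reached through validated upgrades."""
    edges = VALIDATED_UPGRADES if edges is None else edges
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(upgrade_path("D", "G"))
    print(upgrade_path("E", "H"))
```

If no skip-level transition has been validated, the search falls back to the N to N+1 chain, which makes explicit that skipping releases depends on someone having tested that specific jump.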

Note: we may also want to consider the option of live/rolling upgrades for OPNFV.

NFV Reference platform

Ashiq, Gerald, Morgan

bryan: suggest feedback from Escalator project.
Portability

The platform and VNFs are reported to have difficulties in deployment due to minor changes in NFVI versions, e.g. BIOS upgrades, changes in the brand of device deployed, etc. Processes are needed to support in-service deployments across new infrastructure instances as deployments scale. This exacerbates the 1000-node scaling issue above, as deployments across 1000 NFVI-PoPs are unlikely to all be at the same hardware version. A hardware lifecycle needs to be supported for the refresh of compute, storage, and networking white boxes.

NFVI Platform + automation

Steve

bryan: reported issues should be raised specifically so that we can address them specifically. An enabler (part) of the solution is likely a deep HW environment inspection/inventory capability and compatibility verification pre-install.
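The suggested enabler, pre-install inspection plus compatibility verification, could start as a comparison between a node's discovered inventory and a list of validated hardware configurations. A minimal sketch; the attribute names, values, and the exact-match validated list are hypothetical, and in practice the inventory would come from a hardware introspection tool.

```python
from typing import Dict, List, Optional

# Hypothetical validated hardware configurations for one platform release.
VALIDATED: List[Dict[str, str]] = [
    {"server_model": "blade-gen8", "bios": "2.40", "nic": "nic-model-x"},
    {"server_model": "blade-gen9", "bios": "1.10", "nic": "nic-model-y"},
]


def compatibility_gaps(inventory: Dict[str, str],
                       validated: List[Dict[str, str]]) -> Dict[str, str]:
    """Return {} if the node exactly matches a validated configuration;
    otherwise the attribute differences against the closest configuration."""
    if not validated:
        raise ValueError("no validated configurations to compare against")
    best: Optional[Dict[str, str]] = None
    for config in validated:
        gaps = {key: f"found {inventory.get(key)!r}, validated {expected!r}"
                for key, expected in config.items()
                if inventory.get(key) != expected}
        if not gaps:
            return {}
        if best is None or len(gaps) < len(best):
            best = gaps
    assert best is not None  # validated is non-empty, so best was set
    return best


if __name__ == "__main__":
    node = {"server_model": "blade-gen8", "bios": "2.50", "nic": "nic-model-x"}
    print(compatibility_gaps(node, VALIDATED))
```

Reporting the differences against the closest validated configuration, rather than a plain pass/fail, gives an operator the specific item (here a BIOS upgrade) to investigate before installing.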
 

‘Data Plane’ VNF decoupling from physical high-throughput NICs and usage of virtual NICs with the ‘same’ high speed/throughput

  • The initial VNFs to be deployed in an NFV infrastructure will typically be VNFs acting on the user/data plane, such as DPI functions, virtual BNGs, and so on.
  • These VNFs have strong speed/throughput requirements: they need to maintain high speed and throughput (close to what the physical NICs allow). This is currently achieved by letting the VNF access the NIC directly, bypassing the hypervisor, which creates a strong binding between the specific VNF and the physical NICs of specific physical hardware.
  • We need a way to guarantee that the VNF reaches the same speed/throughput while being decoupled from the physical context, thereby obtaining the advantages of virtualization (flexibility, hardware decoupling, and so on). This issue may be addressed in OPNFV DPACC and other projects.

VNF requirements /

VNF Packaging

Andrea & Cecelia 

(12/7 email to list)