
This page collects end-user pain points with various components and aspects of NFV/SDN platforms. It can help us coalesce initiatives, the messaging around them, and the strategy supporting them.

Links to Epics created for the Euphrates release (created as subpages of this page):

Subpage draft Epics / User Stories:

Each entry below lists: Theme | Need / Pain Point | Category | Suggested By | Comments.
Customer OPEX

Integration cost (standard industry frameworks and APIs). See Customer Interfaces.
Provider Touchless Customer Portal Framework

See Customer Interfaces.
Resilience at Scale

See Resilience at Scale.
Security and Policy

See Security and Policy.
VNF Onboarding and Lifecycle Management

See VNF Lifecycle.
CI/CD & Day 2 Operations

Support migration to a true DevOps-based production deployment, with infrastructure lifecycle management features such as in-place upgrades and interoperability between platform/component versions. Includes various prerequisites and enablers such as API micro-versioning, containerization of OpenStack services, and a control-plane maintenance mode.

Category: Automation. Suggested by: Bryan. Related project: Escalator.
Networking for NFV

Enhance network capabilities and performance to meet NFV demands. Includes capabilities such as driving Gluon (a common port abstraction enabling flexible networking and SDNC provider integration), L3VPN-based service chaining, dataplane acceleration (e.g. DPDK, SR-IOV, FD.io/VPP), and smart NICs (e.g. P4).

Category: NFV Reference Platform. Suggested by: Bryan.
Multisite Federation

Support multisite deployments ranging from simple site pairs to massively distributed and hierarchical deployments. Key aspects include identity/auth federation, quota management, catalog distribution, and global/local resource orchestration.

Category: NFV Reference Platform. Suggested by: Bryan.

"As multi-cloud is one of the major technical trends for next-generation cloud infrastructure, the Moon project has started working on a multi-POD security solution. The main idea is to install different modules on different PODs which collaborate with each other.

The deployment of such a solution includes:

  1. Installing different modules on each POD
  2. Setting up a connection between these PODs

However, a connection between geographically distributed PODs seems very difficult, perhaps impossible, today. We hope that OPNFV can start working on this, taking Moon as a scenario for setting up such a multi-POD deployment."

— Ruan He, Moon PTL

Platform Operational Insights

Across platform components, provide consistent approaches to deep observability into the platform (control plane, physical/virtual infrastructure) and workload analytics. This includes fast site deployments: both the incorporation of new white-box NFVI (storage, servers, switches, networking) within an NFVI-PoP and the integration of new sites within a multisite deployment.

Manage complex validation challenges driven by frequent changes to a huge number of interdependent platform and application components.

Category: Automation (?). Suggested by: Bryan.

Steve added fast deployment.

Herbert added "Manage complex validation challenges ..." (could also be a separate topic).

Cloud-Native Architecture

Evolve toward platform infrastructure and workload support based upon a containerized, microservices-oriented model. This will be necessary to overcome the performance/cost limitations of VM-based infrastructure and workloads, and to achieve truly agile lifecycle management.

Cloud-native architecture applies independently to VNFs and to the OPNFV Reference Platform. A distinguishing feature of cloud-native applications is that they are designed for operation in a cloud infrastructure.

Category: NFV Reference Platform. Suggested by: Bryan.
Cloud-Native Architecture / Application CI/CD

Support migration to truly DevOps-based management of applications, with automated testing and staging, in order to unlock the full speed of agile software delivery.

DevOps / Continuous Integration / Continuous Delivery processes will apply independently to the VNFs and to the OPNFV Reference Platform. A distinguishing feature of DevOps/CI/CD is the faster and asynchronous cycling of versions, so that delivered functionality and performance improve in frequent small increments.

The OPNFV Reference Platform needs automated procedures for onboarding VNFs that are delivered on a CI/CD basis.

Deployed VNFs need to remain in normal operation through OPNFV Reference Platform CI/CD upgrades. [See also API Backwards Compatibility below.]

Category: Automation (?). Suggested by: Herbert. Comment: could be a separate topic, or combined with "Cloud-Native Architecture".
Reference VNFs

Build a catalog of production-quality Reference VNFs for common network functions, to be used in OPNFV testing and otherwise.

As an OPNFV Platform Provider, I need a set of Reference VNFs to validate that VNFs (including VNFCs) are correctly deployed by the OPNFV Platform, such that the deployed configuration complies with the VNFD information in the VNF Package (e.g., VNFC deployments respect affinity/anti-affinity constraints, VNFC deployments respect portability constraints, and internal connectivity between VNFC deployments is established). Separate Reference VNFs may be required for each deployment constraint in the VNF Package.

As an OPNFV Platform Provider or Network Service Operator, I need a set of Reference VNFs to benchmark performance of a specific OPNFV Platform deployment on an NFVI Node using a defined reference load, so that infrastructure performance benchmarks from different deployments are comparable. Separate Reference VNFs may be required for different representative workloads (e.g., compute-intensive, memory-intensive, networking-traffic-intensive, intensive VNFC instance deployment/deletion operations, etc.).

As an OPNFV Platform Provider or Network Service Operator, I need a set of Reference VNFs to benchmark performance of a specific OPNFV Platform deployment on an NFVI Node using a defined reference load, so that OPNFV Platform performance benchmarks from different deployments are comparable. Separate Reference VNFs may be required for different representative workloads (e.g., intensive VNFC instance deployment/deletion operations, intra-VNF networking changes, VNFC migration operations, etc.).

Category: Automation / Testing. Suggested by: Bryan.

Steve added Epic-format statements for the cases of ingestion validation, infrastructure benchmarking, and VNF management operations.

We will likely need a separate page for discussion of the scope of potential Reference VNFs. I suspect we would need to include some of the following:

  • a null VNF that simply provides connectivity between the externally visible connectivity points of the VNF
  • an echo VNF that simply reflects back packets received on an externally visible VNF connectivity point

Openstack Operators WG working page on the subject: https://etherpad.openstack.org/p/generic-nfv-platform
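The echo VNF idea above can be illustrated with a minimal sketch: a service that reflects each received UDP datagram back to its sender unchanged. This is illustrative only (plain Python sockets, invented function name), not a proposal for the actual reference VNF implementation.

```python
import socket

def serve_echo_once(sock: socket.socket) -> bytes:
    """Receive one datagram and reflect it back to the sender unchanged.

    A real echo reference VNF would loop forever; a single iteration is
    enough to show the behavior under test.
    """
    data, addr = sock.recvfrom(65535)  # block until a packet arrives
    sock.sendto(data, addr)           # reflect it back verbatim
    return data

if __name__ == "__main__":
    # Bind the echo endpoint to an ephemeral localhost port and probe it.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"ping", server.getsockname())
    serve_echo_once(server)
    reply, _ = client.recvfrom(65535)
    print(reply)  # b'ping'
```

Even a VNF this trivial is useful for validating deployed connectivity: if the reflected packet comes back on the expected connectivity point, the platform wired the VNF correctly.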

 

 

API Backwards Compatibility

All OPNFV Platform APIs must be versioned (micro-versioning is desirable). An API version must not be removed without a deprecation cycle (two releases). New, incompatible APIs may only be introduced with a new version. The old (deprecated) API must be supported for the duration of the deprecation cycle.

As a Network Service Provider, I need two release cycles to work with a VNF provider to ensure upgraded OPNFV Platform APIs are supported, so that a smooth migration of existing services can be planned.

Category: NFV Reference Platform. Suggested by: Steve.
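The micro-versioning rule above implies a negotiation step: the client names the API version it was written against, and the platform either serves it (possibly flagging it as deprecated) or rejects it. A minimal sketch of that logic, with invented names and version numbers; this is not the actual OpenStack microversion mechanism, only the shape of the policy.

```python
def parse_version(v: str) -> tuple:
    """Turn 'MAJOR.MINOR' into a comparable tuple, e.g. '2.60' -> (2, 60)."""
    major, minor = v.split(".")
    return (int(major), int(minor))

def negotiate(requested: str, supported_min: str, supported_max: str,
              deprecated: frozenset = frozenset()) -> tuple:
    """Accept a requested microversion if it falls in the supported range.

    Returns (version, is_deprecated). Deprecated versions are still served
    for the whole deprecation cycle, but the caller is warned. Versions
    outside the supported range raise ValueError.
    """
    req = parse_version(requested)
    if not parse_version(supported_min) <= req <= parse_version(supported_max):
        raise ValueError(f"version {requested} not supported")
    return requested, requested in deprecated

# Example: a service currently supporting 2.1 .. 2.60, with 2.1 deprecated.
print(negotiate("2.38", "2.1", "2.60"))                     # ('2.38', False)
print(negotiate("2.1", "2.1", "2.60", frozenset({"2.1"})))  # ('2.1', True)
```

The two-release deprecation cycle then amounts to keeping an old version inside the supported range, marked deprecated, for two releases before moving `supported_min` past it.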

Real-time performance monitoring and fast fault detection

The amount of alarms and statistical data collected from the NFVI to detect failures will increase as the NFVI resources to monitor (CPU, memory, disk, etc., collected from the OS and hypervisor) grow.

However, such data collection tends to become coarse-grained (e.g., 5-minute intervals) to prevent overload on the monitoring system. This eventually delays both failure detection and failure healing.

Therefore, a real-time monitoring and fast fault detection mechanism is required in the NFV platform.

Category: NFV Reference Platform. Suggested by: Masanori. bryan: metrics collection for fault/performance purposes is enabled by VES/Barometer.
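One common fast-detection pattern consistent with the pain point above is to keep per-resource samples at a short interval and raise an alarm only after k consecutive threshold breaches, instead of waiting for a coarse 5-minute aggregate. A hedged sketch (class name and thresholds are invented, not from any OPNFV project):

```python
from collections import deque

class FastFaultDetector:
    """Raise an alarm after `k` consecutive samples breach `threshold`.

    Short sampling intervals give fast detection; requiring k consecutive
    breaches filters out single-sample noise without 5-minute averaging.
    """
    def __init__(self, threshold: float, k: int = 3):
        self.threshold = threshold
        self.window = deque(maxlen=k)  # rolling record of breach flags

    def add_sample(self, value: float) -> bool:
        """Record one sample; return True if an alarm should fire."""
        self.window.append(value > self.threshold)
        return len(self.window) == self.window.maxlen and all(self.window)

detector = FastFaultDetector(threshold=90.0, k=3)
samples = [50.0, 95.0, 96.0, 40.0, 97.0, 98.0, 99.0]  # e.g. CPU utilisation %
print([detector.add_sample(s) for s in samples])
# [False, False, False, False, False, False, True]
```

With, say, 1-second samples and k=3, detection latency drops to a few seconds while the data shipped upstream can still be aggregated coarsely.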
VM placement optimization

Linked to multisite and/or cloud-native applications.

There are already some solutions, but few clear guidelines/rules.

Placement within one VIM: some solutions exist to take into account affinity/anti-affinity rules and EPA requirements.

Multi-site deployment: complicates things quite a bit. Operators have to consider which sites the VNFs/VMs need to be deployed in, which depends on (1) the geography of data centers, (2) network topologies within and between data centers, (3) latency requirements from access networks and other resources, (4) capacities needed/allocated at each site, (5) vendor placement rules, if any, (6) redundancy requirements (e.g., N:1), and (7) recovery procedures in case of failure.

Instead of a laborious human planning/design task, this needs to be automated as much as possible. One possibility is to create operator deployment "policies" or rules as input to the orchestration system, since these considerations stem from the operator's deployment environment rather than from the VNF/service itself. But we are open to any solution.

Category: NFV Reference Platform. Suggested by: Morgan, Serge. bryan: metrics collection for realtime inventory/placement purposes is enabled by VES/Barometer. Projects such as Valet (proposed by AT&T for OpenStack) specifically address advanced workload placement based upon realtime inventory and SLAs.
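The filter-then-score approach used by most schedulers maps directly onto the considerations listed above: hard constraints (capacity, latency budget, anti-affinity) filter candidate sites, then a score ranks the survivors. A simplified sketch with invented data structures, standing in for a real placement engine:

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    free_vcpus: int
    latency_ms: float                        # latency from the access network
    placed_groups: set = field(default_factory=set)

def choose_site(sites, vcpus: int, max_latency_ms: float, group: str) -> str:
    """Pick a site for one VM: filter on hard constraints, then score.

    Hard constraints: enough capacity, latency budget, and anti-affinity
    (no two members of the same group on one site). Score: most free
    capacity first, a stand-in for richer operator policies.
    """
    candidates = [s for s in sites
                  if s.free_vcpus >= vcpus
                  and s.latency_ms <= max_latency_ms
                  and group not in s.placed_groups]
    if not candidates:
        raise RuntimeError("no site satisfies the placement constraints")
    best = max(candidates, key=lambda s: s.free_vcpus)
    best.free_vcpus -= vcpus
    best.placed_groups.add(group)
    return best.name

sites = [Site("paris", 64, 4.0), Site("lyon", 32, 9.0), Site("nice", 8, 30.0)]
print(choose_site(sites, vcpus=16, max_latency_ms=10.0, group="vFW"))  # paris
print(choose_site(sites, vcpus=16, max_latency_ms=10.0, group="vFW"))  # lyon
```

The second call lands on a different site because anti-affinity excludes the one already hosting a "vFW" member: this is the kind of rule an operator policy would feed into the orchestrator.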
Metrics for Billing/Charging

Are the current metrics in OpenStack and/or the controllers enough to build the charging chain for a VNF (e.g., bandwidth used by a VM/tenant/stack)?

Category: NFV Reference Platform. Suggested by: Morgan. bryan: metrics collection is enabled by VES/Barometer.
Energy

A tool is missing to provide a clear view of power consumption. Power consumption is a driver of NFV/virtualization, yet today there is no native feedback on consumption, and no connection with VM placement.

Category: NFV Reference Platform. Suggested by: Morgan.

As far as I know there is a project in ODL, and there used to be a project in OpenStack some time ago.

bryan: should be enabled by VES/Barometer.

Troubleshooting SLAs

Tooling is also missing to provide reliable and fast feedback in case of trouble.

Category: NFV Reference Platform. Suggested by: Morgan.

There are projects in OPNFV (Pinpoint, Bottlenecks) dealing with such issues.

bryan: enabled by VES/Barometer, at least for data collection.

Interoperability

Interoperability assessment with orchestration platforms (linked to VNF onboarding).

Category: NFV Reference Platform. Suggested by: Diego. bryan: should be covered by VNF Onboarding above.
Security Aspects

Especially those related to roots of trust and legal requirements for critical infrastructures.

Category: NFV Reference Platform. Suggested by: Diego. bryan: should be covered by Security and Policy above.
Documentation / Technical Architecture

Due to the extreme modularity of these systems, defining a clear list of the flows that need filtering between all components would be helpful.

Category: NFV Reference Platform. Suggested by: Morgan. Comment: in OPNFV, Pharos defines some rules, but given the diversity of installers/scenarios, it should also be up to the scenario owner to provide a flow matrix associated with the integration scenario.
Log management

Linked to troubleshooting/SLAs: today there are lots of modules/components, and logs are everywhere. You do not really care when everything is fine, but when there is trouble it is sometimes a nightmare to find the right log file in the right place. Aggregating the logs, with some convergence of formats, would save time.

Category: NFV Reference Platform. Suggested by: Morgan.

The task does not seem easy, as any Telco Cloud solution based on open-source components is the result of an integration where logs are managed differently. It could, however, be a good opportunity for a project dealing with integration, such as OPNFV.

bryan: Syslog archiving (as one key log example) is in scope for VES.
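A first step toward the aggregation and format convergence suggested above is normalizing heterogeneous log lines into one common record shape. A sketch handling two invented example formats; a real integration would need one pattern per component's log dialect:

```python
import re

# One regex per known log dialect; both formats here are invented examples.
PATTERNS = [
    # syslog-like: "Jan 12 10:01:02 host service: message"
    re.compile(r"(?P<ts>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) (?P<svc>[\w-]+): (?P<msg>.*)"),
    # ISO-like: "2017-01-12T10:01:02 ERROR nova-compute message"
    re.compile(r"(?P<ts>\d{4}-\d\d-\d\dT[\d:]{8}) (?P<level>\w+) (?P<svc>[\w-]+) (?P<msg>.*)"),
]

def normalize(line: str):
    """Map one raw log line onto a common record, or None if unrecognized."""
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            # Start from a fixed schema so every record has the same keys.
            rec = {"ts": None, "host": None, "level": None, "svc": None, "msg": None}
            rec.update(m.groupdict())
            return rec
    return None

print(normalize("Jan 12 10:01:02 node1 neutron-agent: port binding failed")["svc"])
# neutron-agent
print(normalize("2017-01-12T10:01:02 ERROR nova-compute spawn failed")["level"])
# ERROR
```

Once every component's lines land in the same schema, shipping them to one searchable store (the "aggregation" half of the pain point) becomes a routing problem rather than a parsing one.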

Backup & Restore

Ability to restore customer virtual machines if the stack is deleted and recreated (not sure if this is Red Hat specific).

Category: NFV Reference Platform. Suggested by: Morgan. Comment: there are several technical options but no native/integrated/reference one.
Real HA

Despite HA modes for single components, practice has shown that manual operations may be needed to restore the platform. Better end-to-end HA is needed, and if not all subsystems are HA, it shall be clear what is HA and what is not.

Category: NFV Reference Platform. Suggested by: Morgan. Comment: in OPNFV there is an HA project addressing these issues; it has already created test cases (executed in Yardstick). However, recent discussions also show that "HA" was probably misleading in the scenario naming, as most of the controllers are not HA today.
Service Function Plane

Ability to compose network services from flexible combinations of VNF- and PNF-embodied service functions participating in dynamic service function chains that are underlay-network agnostic (with service functions unaware of the network topology), with service function data-plane interfaces standardized and independent of the underlay network, enabling service functions to be placed anywhere within an administrative domain on any platform-supported underlay network.

Category: NFV Reference Platform. Suggested by: Meador. bryan: likely aligned with cloud-native goals of abstraction from the infrastructure environment. First-generation VNF solutions are inherently complex regarding compatibility, due to their own size and complexity/dependencies.
Certification Program(s)

Some VIM suppliers (OpenStack distros) have initiated programs to "certify" VNF variants from different suppliers as successfully onboarded on their VIM. Such certification programs should be led by OPNFV, so that they are structured and verified by a vendor-agnostic entity. We might expand the scope to other interfaces (VNFM/VIM, VNFM/VNF-O, etc.).

Category: NFV Reference Platform. Suggested by: Samer. bryan: OPNFV should promote its role in VNF portability, but diversity in all aspects (even certification) is a market enabler, as long as it does not exacerbate fragmentation and become a market inhibitor.

Versioning + Skip release(s) when upgrading a (live) system

Linked to CI/CD and API backwards compatibility: an upgrade from version A to version B must not require manual operations, especially for critical components. Moreover, it shall be possible to skip intermediate versions and upgrade only to stable versions, not necessarily from version N to N+1.

Upgrading OpenStack in a commercial deployment requires a huge amount of integration effort and testing, despite progress in the automation of deployment and testing. Given this, and the fact that OpenStack releases on a 6-month cadence, I, as an end user, may not want to update my commercial deployment at the same cadence, but may want a yearly or even bi-yearly upgrade of my systems. For example, I want to be able to "jump" directly from the D release to the G release, skipping the E and F releases. Ideally, such system upgrades shall be done "live", i.e. without the need to shut down everything running on the older release.

Note: we may also want to consider the option of live/rolling upgrades for OPNFV.

Category: NFV Reference Platform. Suggested by: Ashiq, Gerald, Morgan. bryan: suggest feedback from the Escalator project.

 

Boston Forum session: https://etherpad.openstack.org/p/BOS-forum-skip-level-upgrading
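Skip-level upgrades turn release planning into a path search: each release declares which releases it can be upgraded to directly, and a planner finds the shortest hop sequence from the running release to the target. A sketch with an invented upgrade matrix (the single-letter release names echo the D-to-G example above):

```python
from collections import deque

def plan_upgrade(current: str, target: str, direct_jumps: dict) -> list:
    """Shortest sequence of upgrade hops from `current` to `target`.

    `direct_jumps` maps a release to the set of releases reachable in one
    upgrade step. Breadth-first search yields the plan with the fewest
    hops, so a supported skip-level jump beats stepping through every
    intermediate release.
    """
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in direct_jumps.get(path[-1], set()) - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    raise RuntimeError(f"no upgrade path from {current} to {target}")

# Invented matrix: N -> N+1 always works; D additionally supports a jump to G.
jumps = {"D": {"E", "G"}, "E": {"F"}, "F": {"G"}}
print(plan_upgrade("D", "G", jumps))  # ['D', 'G']
print(plan_upgrade("E", "G", jumps))  # ['E', 'F', 'G']
```

The interesting engineering is of course in making each edge of that graph safe (schema migrations, API deprecation windows, live/rolling restarts); the planner only decides which edges to take.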

Portability

The platform and VNFs are reported as having deployment difficulties due to minor changes in NFVI versions (e.g. BIOS upgrades, changes in the brand of device deployed, etc.). We need processes to support in-service deployments across new infrastructure instances as deployments scale. This exacerbates the 1000-node scaling issue above, as deployments at 1000 NFVI-PoPs are unlikely to be at the same hardware version. We need to support a hardware lifecycle for refresh with new compute, storage, and networking whiteboxes.

Category: NFVI Platform + Automation. Suggested by: Steve. bryan: reported issues should be brought up specifically so that we can address them specifically. An enabler (part) of the solution is likely a deep hardware environment inspection/inventory capability with compatibility verification pre-install.
 

'Data Plane' VNF Decoupling from Physical High-Throughput NICs, Using Virtual NICs at the 'Same' High Speed/Throughput

  • The initial VNFs to be deployed in an NFV infrastructure will typically act on the user/data plane, like DPI functions, virtual BNGs, and so on.
  • These VNFs have strong requirements in terms of speed/throughput. For these VNFs there is a need to maintain high speed and throughput (near the physical speed and throughput allowed by the physical NICs). This is currently achieved by allowing the VNF to go directly to the NIC, bypassing the hypervisor, which creates a strong binding between the specific VNF and the physical NICs of specific physical hardware.
  • We need a way to guarantee that a VNF reaches the same speed/throughput while decoupling the VNF from the physical context, thereby obtaining the virtualization advantages (flexibility, hardware decoupling, and so on); this issue may be addressed in OPNFV DPACC and other projects.

Category: VNF Requirements / VNF Packaging. Suggested by: Andrea & Cecelia (12/7 email to list).

 

12 Comments

  1. "Networking for NFV" feels a bit more vague than the rest of the table.

  2. An interesting exercise, which may be too detailed, may be to see which OPNFV projects and which upstream activities support some of these buckets. 

  3. I can come back with more detail as needed. But if this is already too detailed,

    1. I guess that would be something like:

      • Resilience at scale
        • OPNFV projects: Doctor, Predictor, Multisite, HA, ...
        • Upstream projects: Pacemaker, OpenSAF, Chaos Monkey, Blazar, Nova, Senlin, ...
      • Security and policy
        • OPNFV projects: Moon, Security Scanner, Copper, ...
        • Upstream projects: OpenSCAP, ...

      I don't think that would be overkill at all.

  4. Bryan: this is a good list. So is this from the OPNFV user group, the OpenStack Telco WG, or the AT&T OpenStack user group?

    Just want to know where this got created. Maybe you should add a column indicating which group, to see if they align. Thanks.

  5. Isn't multi-site the same as scale to 1000? I.e., does the first item cover the multi-site item?

  6. The Polestar WG is evaluating proposals in terms of categories, and it may be useful to assign these pain points to those categories so that we can communicate with them effectively, so I've added a column to the table above assigning these pain points to Polestar categories. The categories they use are:

    NFV Reference Platform (What)
    –Carrier Grade
    –Security
    –Interoperability
    –Policy Management
    –Compliance
    –Globally Applicable
    –Scalable
    –Multi-site
    –Evolvable/modular
    Automation (MANO+) (How to execute)
    –CI/CD
    –Service Assurance
    –Service Provisioning
    –Service Agility
    –Optimized resource usage
    •Data Plane (what)
    •Testing (How to execute)
    •Documentation
  7. I think there is a hot topic missing: release management both for the OPNFV platform and the VNFs.

    For cloud native architecture, we may reference existing best practices such as the 12 factor app (https://12factor.net/)

    1. @Eric, perhaps release management could be a generalization of the API backwards compatibility pain point identified above? Can you break down a little more what you would see under this release management topic?

    2. I assume cloud native architecture is a separate pain point compared to release management.  We have a couple of entries above on cloud native architecture. The 12 factor approach is perhaps a way to determine whether a particular application is cloud native or not. 

      What is it that you want to see developed as Cloud native - the VNFs or the OPNFV Platform?

      Can you articulate WHY a user cares that something is cloud native - preferably in a one sentence Epic format?  

  8. Gerald Kunzmann & Ashiq: I think we can merge your pain point with the "Versioning" one.

     

    1. Hi Morgan, sorry, it seems I had missed this one. I fully agree to merge. Will you do it or you prefer I move it?