Name: Stephen Wong

Project name: Clover

Milestones affected: MS6

Rescheduled or exempted?: Rescheduled

Other projects affected: None

Background description: New project; it took time for the initial patches to be submitted and merged. An open source VNF was initially chosen, but it turned out not to be optimal for Clover's cloud native definition, so the team decided to pivot to the project's own cloud native VNF.

Schedule impact: Still on schedule for MS7, so no major impact; still on schedule for branch freeze and release.

Recovery plan: A preliminary documentation patch is in flight and will merge soon; test cases still need to be merged and integrated with Jenkins.

Milestone schedule change: MS6 and MS7 will happen at the same time.

Risk: Patches are in flight, so risk should be minimal.

Status: Closed

Decision: Approved

6 Comments

  1. If it's still on schedule for MS7 and patches are in flight, the risk is low. Approved. Out of curiosity, which open source VNF was chosen, and why was it suboptimal? Because it didn't follow the cloud native approach?

    1. It was Clearwater IMS. It was chosen for the project from the get-go because it presumably already runs on Kubernetes, is already componentized, and (predominantly) uses HTTP as the transport for communication between components. But as we spent more time making it work in the service mesh (Istio in our case), a couple of things stood out:

      1.) The components are really tightly coupled. For example, the Homestead component (one of the db server components) supposedly serves as a service to be consumed by Sprout (the SIP gateway), but if you look into the Homestead code, it actually maintains data structures for Sprout connections. This badly violates the (service mesh) principle of decoupling connectivity knowledge from the application and delegating it to a "smart" pipe.


      2.) A good chunk of Clover's goals is to use ecosystem (operations) projects such as Jaeger (OpenTracing), Prometheus, and Fluentd / Elasticsearch to demonstrate ease of operation and debuggability (primarily of the mesh). To gain the application knowledge needed to correlate the various logs, traces, and metrics, we need some control over the application (the VNF in this case). In the beginning we thought we could just run an adapter or ambassador container within the pod, but the lack of a clean-cut API and (even localhost) connectivity model for the Clearwater components makes this an unexpectedly complicated task, so we decided it would be better to change course.


      3.) This is probably more of an Envoy problem, but sending non-HTTP traffic over the service proxy does lead to traffic loss / blackholing that is very hard to debug in Envoy. So the team decided that for Fraser (in particular, the first release of Clover), we should focus on an HTTP-based VNF.

  2. Re: "the team decided to pivot to the project's own cloud native VNF" (Stephen Wong)

    My question is: what is the project's own cloud-native VNF? Will the Clover team develop a new VNF based on HTTP/HTTPS to meet the requirements of Istio?

    What's the timeline and what's the current status?


    I think it is OK even if Clover cannot keep up with the schedule. It's a new project in Fraser and it won't affect any other project.

    1. We basically assembled a mix of HTTP-related Docker images (load balancer, HTTP firewall, web cache, and IDS) into a web / HTTP processing VNF with a particular topology (FW → LB → WC, or directly to the server, depending on the LB version). We primarily use Snort and Nginx in Fraser (moving forward we will explore ModSecurity with Nginx and HAProxy), and we developed a gRPC server as an ambassador container that exposes a gRPC interface (and in turn also acts as a gRPC client) so the components can communicate with each other over a control channel. So I hesitate to say we developed a "new" VNF; we basically leverage existing (well-tested and widely used) HTTP-based services already capable of running on Linux as the components.

      Patches for these are currently under review:

      https://gerrit.opnfv.org/gerrit/#/c/54225/

      https://gerrit.opnfv.org/gerrit/#/c/54269/

      The beauty of this is that BOTH the control plane (via gRPC) and the data plane (HTTP payload from an HTTP traffic generator such as Locust) traverse the Istio service mesh and are subject to route rules (and other policies); furthermore, the OpenTracing calls from Envoy are able to trace both data and control messages, giving us a preliminary view of latency and timing information. We also have enough control over the app to preserve the trace ID (embedded in the HTTP headers), giving us a fully correlated trace from ingress to server; a minimal sketch of that header propagation is shown below.
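
      To illustrate the trace-ID preservation, here is a minimal sketch (not Clover code) of a component that copies the standard Envoy/B3 trace headers from its inbound request onto its outbound call, so the sidecar-reported spans join one Jaeger trace. The Flask framework, the header list, and the upstream URL are assumptions for the example, not the actual Clover implementation.

        # Minimal sketch: propagate Envoy/Istio trace headers so the inbound
        # and outbound spans join the same trace. Flask and the upstream URL
        # are placeholders, not the real Clover services.
        import requests
        from flask import Flask, request

        app = Flask(__name__)

        # Standard headers Envoy uses for trace propagation (x-request-id plus B3).
        TRACE_HEADERS = [
            "x-request-id",
            "x-b3-traceid",
            "x-b3-spanid",
            "x-b3-parentspanid",
            "x-b3-sampled",
            "x-b3-flags",
            "x-ot-span-context",
        ]

        @app.route("/")
        def handle():
            # Copy whichever trace headers the sidecar injected on the way in...
            fwd = {h: request.headers[h] for h in TRACE_HEADERS if h in request.headers}
            # ...and attach them to the outbound hop so Envoy reports spans
            # belonging to the same trace as the inbound request.
            upstream = requests.get("http://example-upstream:8080/", headers=fwd)
            return upstream.text, upstream.status_code

        if __name__ == "__main__":
            app.run(host="0.0.0.0", port=9090)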

  3. Looks ok to me also, nothing to add

  4. Allowing late exceptions increases the work in Functest and we should be part of the decision process.

    Here it results in cutting the Functest Kubernetes container after tagging the repo (and then updating the releng Jenkins jobs).

    I fully disagree with 'no major impact' even if we help you.