
Migration Roadmap


A brief roadmap for the future of the Early Detection Research Network (EDRN) portal, and for data-centric portals in general.

🏃‍♀️ Motivation

Plone is incredibly secure, which is fantastic, but it's also incredibly challenging to use. Major hurdles exist today for the EDRN portal:

  • The integration of data-rich statistical graphics.
    • There are plenty of off-the-shelf visualization and interactivity libraries, notably D3 and Plotly Dash. These can be dropped into numerous content management systems, but not Plone: Plone has its own, different, incompatible way of handling JavaScript and CSS add-ons.
    • We have abused iframes and pre-generated images to get the graphics we want so far, but this isn't an agile approach.
      • Plone handles iframes especially poorly because each one arrives as an additional concurrent HTTP request, which slows down the entire page load.
  • The migration to Python 3. Python 2 reached end-of-life over a year ago, and the vulnerability scans are finding more and more problems.
    • We are now at the point where, in order to continue making portal releases, we can no longer upgrade dependent packages and must instead surgically excise shared objects, libraries, and individual Python source files to satisfy the scans.
    • If the scan were to find a problem in Python 2 itself, we would have no way to proceed. This is a real possibility: security updates to Python 2 are no longer being published. If a vulnerability is discovered, it's game over.
    • Plone finally supports Python 3, but Plone uses a "no-SQL" hierarchical database that contains serialized Python objects, and the objects are completely incompatible between Python versions. The EDRN portal contains thousands of Python 2 objects.
      • In some rare cases it's possible to automate the migration from Python 2 to 3, but that assuredly doesn't apply to us. Why? Because the current Python 2 database contains dozens, perhaps hundreds of objects whose code no longer exists thanks to numerous upgrades over time, and this prevents migrations from proceeding.
  • Major sites no longer use Plone. The following are big, public-facing websites that once used Plone but have faced the same challenges that we do and have since migrated to other platforms:
    • science.nasa.gov
    • cia.gov
    • oxfam.org
    • amnesty.org
    • developer.ebay.com

🏁 Goals

Given the hurdles described above, we can enumerate the goals of a migration away from Plone, and the reasoning behind them, as follows:

  • Open development to more people. Plone and Zope have such steep learning curves that essentially only one person on the Informatics Center team is able to make progress.
  • Embrace future-looking technologies, including data graphics, advanced faceted and other modes of search, and interactivity, which are difficult if not impossible with Plone.
  • Use open standards such as relational databases instead of implementation-specific serialized object stores to help ensure future portability.
  • Remain on an upgrade path that avoids the use of end-of-life products and technologies.

🐦 Migration

Migrating away from Plone requires three phases with movement between each phase as obstacles are identified and uncertainties are narrowed. These phases are:

  1. Technology identification. Although developing a web solution from scratch is attractive, leveraging the power, features, and security of an existing content management system or web application framework is a vital time-saver. So far we have opted to explore Wagtail for its track record with JPL-hosted solutions; Wagtail itself is based on the extremely popular Django web framework. As these are Python-based technologies, we are optimistic that existing code from the EDRN portal can be reused, especially the RDF parsing and ingestion frameworks.
  2. Prototyping. As part of the risk reduction approach, we plan on creating a prototype cancer data portal that meets the essential "go/no-go" decision points (see below). If for any reason the prototype fails to meet the criteria, or falls short on a non-functional requirement (such as difficulty of use similar to that of Plone), we'll return to step 1 and identify other technologies.
  3. Full migration. Once the prototype satisfies and demonstrates the requirements, we will proceed with a full migration. The intent is not to add any new features at this time, but only to change the technology stack.

🚀 Go/No-go Features

The following is a brief checklist of the core features we must have in order to proceed. As part of the risk reduction approach, if a solution can't satisfy all of these features, we must find another path. The core features are:

  • LDAP authentication and authorization.
  • Protection of pages based on roles and privileges accorded to the logged-in user.
    • Group- and role-based access to certain portal sections.
    • Mixed public and private access to biomarker pages (this is critical; biomarkers are public but certain details are private to certain groups).
  • RDF ingest of data for page population.
  • Import of existing Plone "static" pages, files, and images.
  • Statistical charts and data graphics.
  • Ability to perform interactive editing of page content.
  • Tech stack with no components at or beyond "end of life" status.
  • Maintainability by content editors.
  • Implementation of look-and-feel: NCI branding, Section 508, and 21st Century IDEA.

We are conducting a series of experiments to determine if our newly chosen platform can satisfy each of the go/no-go features listed above. We record the results of each experiment in the following section. If all of the experiments are successful, we will then begin the migration as described in the next section.

👩‍🔬 Experiments

The experiments are a risk-reduction exercise that serves a twofold purpose:

  1. Each experiment assists with the learning process of the chosen migration platform by challenging the developers with an immersive and measurable problem space that hones their skills with the platform.
  2. Each experiment exhibits a solution to one aspect of the portal implementation and reduces risk by proving that the aspect, whether in the core architecture or in a visible end-user feature, is feasible in the chosen technology platform.

The experiments are as follows:

Experiment 1: Database Population through RDF

Hypothesis. We are able to read existing RDF data and use it to populate the Wagtail backend database.

Process. Open an RDF data source as usual via rdflib (reusing the P5 portal RDF parser, which is not specific to Plone). Using the Django ORM, create matching database objects. Start with a simple RDF source such as body-system. Advance to a complex RDF source such as publication.
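
To make the intent concrete, here is a minimal sketch of what such an ingest could look like, assuming a hypothetical BodySystem Django model and illustrative type and predicate URIs (the real EDRN RDF vocabulary would be substituted):

```python
import rdflib
from rdflib.namespace import RDF

# Hypothetical Django model, assumed to be defined elsewhere in the Wagtail app:
#     class BodySystem(models.Model):
#         identifier = models.URLField(unique=True)
#         title = models.CharField(max_length=200)
from edrn.knowledge.models import BodySystem  # hypothetical app and model

# Illustrative URIs; the real EDRN RDF types and predicates would be used instead.
BODY_SYSTEM_TYPE = rdflib.URIRef('http://edrn.nci.nih.gov/rdf/types.rdf#BodySystem')
TITLE_PREDICATE = rdflib.URIRef('http://purl.org/dc/terms/title')


def ingest(rdf_url: str) -> None:
    """Read an RDF source and create or update matching database objects."""
    graph = rdflib.Graph()
    graph.parse(rdf_url)
    for subject in graph.subjects(RDF.type, BODY_SYSTEM_TYPE):
        title = graph.value(subject, TITLE_PREDICATE, default='(unknown)')
        # The Django ORM handles inserts versus updates for us:
        BodySystem.objects.update_or_create(
            identifier=str(subject),
            defaults={'title': str(title)},
        )
```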

Experiment 2: Database Population through Flat Files

Hypothesis. We are able to populate the Wagtail backend database with flat files exported from P5.

Process. Export the static file/folder structure (retrofitting the P4 export/setup) for P5. Write a program to crawl the files and, using the Django ORM, create matching database objects. Start with simple HTML pages and folders. Progress to Image and File objects.
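
As an illustration, the crawl could look something like the following sketch, assuming a hypothetical StaticPage page type with a rich-text body field and an assumed export location:

```python
from pathlib import Path

from wagtail.core.models import Page            # "wagtail.models" in Wagtail ≥ 3
from edrn.content.models import StaticPage      # hypothetical page type

EXPORT_DIR = Path('/usr/local/edrn/p5-export')  # assumed location of the P5 export


def import_html(parent: Page) -> None:
    """Walk the export and create a StaticPage for every HTML file found."""
    for html_file in sorted(EXPORT_DIR.rglob('*.html')):
        page = StaticPage(
            title=html_file.stem.replace('-', ' ').title(),
            slug=html_file.stem,
            body=html_file.read_text(encoding='utf-8'),
        )
        parent.add_child(instance=page)   # Wagtail pages live in a tree
        page.save_revision().publish()    # make the imported page live
```

Image and File objects would follow the same pattern using Wagtail's wagtail.images and wagtail.documents models.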

Experiment 3: Permissions

Hypothesis. We are able to use EDRN Directory Service (LDAP) users in Wagtail.

Process. Connect Wagtail user/group permission systems to python-ldap. Connect python-ldap to ldaps://edrn-ds.jpl.nasa.gov. Start by assigning private HTML pages to specific groups and testing that they're protected from the public and from users not in those groups. Progress to mixed-content pages that vary what's shown based on the permissions of the logged-in user.
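
One possible configuration sketch uses django-auth-ldap (which wraps python-ldap); the base DNs, search filters, and group object class below are assumptions about the EDRN Directory Service, not its actual layout:

```python
# Excerpt from a Django/Wagtail settings module.
import ldap
from django_auth_ldap.config import GroupOfUniqueNamesType, LDAPSearch

AUTH_LDAP_SERVER_URI = 'ldaps://edrn-ds.jpl.nasa.gov'
AUTH_LDAP_USER_SEARCH = LDAPSearch(
    'ou=users,o=EDRN',                   # assumed base DN
    ldap.SCOPE_SUBTREE,
    '(uid=%(user)s)',
)
AUTH_LDAP_GROUP_SEARCH = LDAPSearch(
    'ou=groups,o=EDRN',                  # assumed base DN
    ldap.SCOPE_SUBTREE,
    '(objectClass=groupOfUniqueNames)',  # assumed group object class
)
AUTH_LDAP_GROUP_TYPE = GroupOfUniqueNamesType()
AUTH_LDAP_MIRROR_GROUPS = True           # mirror LDAP groups as Django groups

AUTHENTICATION_BACKENDS = [
    'django_auth_ldap.backend.LDAPBackend',
    'django.contrib.auth.backends.ModelBackend',
]
```

With the directory groups mirrored into Django, Wagtail's built-in page privacy settings can then restrict individual pages or whole sections to specific groups.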

Experiment 4: Theme

Hypothesis. We are able to display NCI branding.

Process. TBD.

Experiment 5: Charts

Hypothesis. We can incorporate rich data graphics and statistical charts into Wagtail.

Process. Integrate django-plotly-dash into a Wagtail instance. Create a Wagtail view that contains some data to plot. Create a Django page template that retrieves data from the view. Present the statistical graphics to a public, non-authenticated user. Include some basic interactivity, such as zooming on a graph.
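
A minimal sketch of such an integration, assuming django-plotly-dash is installed and configured; the app name and plotted data are purely illustrative:

```python
import plotly.express as px
from dash import dcc, html                  # Dash ≥ 2.0 import style
from django_plotly_dash import DjangoDash

# Register a Dash app that Django templates can embed by name.
app = DjangoDash('BiomarkerCounts')         # hypothetical chart name

figure = px.bar(
    x=['Breast', 'Lung', 'Prostate'],       # illustrative data only
    y=[42, 37, 29],
    labels={'x': 'Organ', 'y': 'Biomarkers'},
)

# Plotly charts provide zooming and panning interactivity by default.
app.layout = html.Div([dcc.Graph(figure=figure)])
```

The Django page template would then load the plotly_dash tag library and embed the chart with something like {% plotly_app name="BiomarkerCounts" %}.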

Experiment 6: End-to-End Deployment

Hypothesis. We can deploy a basic-but-complete "hello EDRN" portal in Wagtail, from developers' desktops through continuous integration/continuous deployment to the dev platform.

Process. Create the complete filesystem layout of the EDRN Wagtail portal along with database deployment and container composition. Populate the database with the EDRN home page—including images if possible but omitting them at first. Instantiate the portal in such an environment to support development activities. Configure Jenkins for deployment on the dev platform to receive pings or notifications from GitHub. Jenkins shall run unit, functional, and integration tests and—if all are successful—deploy a dev portal at https://edrn-dev.jpl.nasa.gov/portal/renaissance/.

🗺 Workplan

Time and again, the proven approach for the development of large-scale software projects is to first create a skeletal architecture that can support the basic core function of the discipline area, and then build on that structure with bolt-on features that satisfy each requirement. This reduces risk further by enabling the architecture to be re-shaped early on, before too many bolt-ons make that impossible. Another lesson learned is that the in-development morphology of the project should match the deployment shape as closely as it can; this mitigates the need for numerous and painful reconfigurations when moving from development to operations.

Thus, the workplan for the portal migration is roughly linear and as follows:

  1. Creation of environmental context (automatic continuous integration/deployment of demonstration platform).
  2. Containerization of basic requirements (with identical morphology for development, demonstration, acceptance testing, and operations).
  3. Construction of initial architecture (and implementation within the container and context described in the immediately preceding steps) that includes a basic home page and look-and-feel.
  4. Log-in for users.
  5. RDF ingest for a basic knowledge object.
  6. Implementation of the knowledge environment.
  7. Import of the existing static pages.
  8. Import of binary large objects (PDFs and the like).

While look-and-feel may seem like an unimportant need to fulfill early in the workplan, the sad truth of the human psyche is that we judge books by their covers; as a result, having a portal that looks production-ready early on is vital for acceptance by stakeholders.

Astute readers may notice there are no formal demonstrations in the workplan; this is because demonstrations are continuous and available throughout. Our continuous integration and continuous delivery system (Jenkins) will ensure that deployed platforms are available for feedback at least nightly, if not more frequently.

🗓 Schedule

A schedule of milestones for each important feature of the migration is now available.