MORPHEMIC (https://www.morphemic.cloud/) introduces two novel concepts in Cloud application modelling: 1) polymorphic adaptation, i.e. the ability to deploy and run a component in different technical forms and different environments, depending on its requirements and workload, and 2) proactive adaptation, i.e. the forecasting of future resource needs and deployment configurations so that adaptation can be performed effectively and seamlessly for the users of the application. Together, the two concepts make it possible to adapt and optimize Cloud computing applications in a unique way. Engineering, a partner of the MORPHEMIC project, took part in the design of the architecture and the implementation of the polymorphic adaptation components.

How and why does the Polymorphic Adaptation leverage open source repositories?

MORPHEMIC’s Polymorphic Adaptation works at both the architectural and the cloud service level by defining the most suitable deployment model according to internal (e.g. available infrastructures) and external (e.g. load) constraints. To accomplish this, one of the main activities is to define the characteristics of the application and select one or more candidate deployment models by comparing multiple architecture variants (e.g. FPGA, GPU). When selecting the most suitable deployment model it is important to have as wide a range of choices as possible, which is why we turned to online open source repositories. The web hosts a wealth of applications from open source projects, and these can support us in searching, retrieving and loading metadata to be used for application profiling. From a technical point of view, the component is made up of code mining modules which extract appropriate metadata about open source projects from code repositories. The code and the associated metadata are found, stored and analysed. Crawling (searching for useful code in external repositories) and the storage of the source code and project metadata are the main activities in which Engineering was involved.
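To give a flavour of the kind of metadata extraction involved, here is a minimal Python sketch that fetches basic project metadata from the public GitHub REST API. It is only an illustration of the idea: the project's actual code mining modules have their own pipeline, and the field selection below is our assumption of what is useful for application profiling.

```python
import requests

def fetch_project_metadata(owner: str, repo: str) -> dict:
    """Fetch basic open source project metadata from the GitHub REST API.

    Illustrative only: unauthenticated requests are subject to
    GitHub's rate limits.
    """
    response = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    response.raise_for_status()
    data = response.json()
    # Keep only a handful of fields relevant for profiling (an assumption).
    return {
        "name": data["name"],
        "description": data["description"],
        "programming_language": data["language"],
        "license": (data["license"] or {}).get("spdx_id"),
        "repository_url": data["html_url"],
        "topics": data.get("topics", []),
    }

print(fetch_project_metadata("ewilderj", "doap"))
```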

How are the remote repositories of open source projects exploited technically?

In MORPHEMIC, the main components that enable the scraping, download and processing of the data found in open source software repositories are the Web-Crawler and the KnowledgeBase.

The Web-Crawler scrapes data from dedicated open source repositories (such as GitHub) or directory listings (such as Apache’s) and classifies the code by leveraging machine learning. It was selected from among several candidate open source assets for its ability to extract sets of information (metadata) associated with projects hosted on known source repositories (for example GitHub). The Web-Crawler implementation was based on the outcome of an EU co-funded research project, MARKOS. In the first prototype release of the Web-Crawler, the processed metadata of open source projects was stored in a standard Relational Database Management System (RDBMS), and the HTTP POST method was used to send the processed metadata to the other services in a JSON data format based on the DOAP model. The standard DOAP model (https://github.com/ewilderj/doap) is an RDF Schema and XML vocabulary that describes free and open source software project metadata such as release information, a reference link to the repository where the source code is stored, the license of use, topics, labels, maintainers and developers, organization, programming language and other relevant information.
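The following Python sketch illustrates this flow: it builds a DOAP-inspired metadata record and forwards it to a downstream service with an HTTP POST. The endpoint URL and the exact field names are illustrative assumptions drawn from the DOAP vocabulary described above, not the Web-Crawler's actual schema.

```python
import requests

# DOAP-inspired metadata record; the field names follow the DOAP
# vocabulary (name, homepage, license, maintainer, release, ...) but
# the exact schema used by the Web-Crawler may differ.
project_metadata = {
    "name": "example-project",
    "homepage": "https://example.org/example-project",
    "repository": "https://github.com/example/example-project",
    "license": "Apache-2.0",
    "programming-language": "Java",
    "category": ["cloud", "data-processing"],
    "maintainer": ["Jane Doe"],
    "release": {"revision": "1.2.0", "created": "2021-03-15"},
}

# Hypothetical endpoint of a downstream metadata service.
response = requests.post(
    "http://localhost:8080/metadata",
    json=project_metadata,  # serialized as a JSON request body
    timeout=10,
)
response.raise_for_status()
```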

Once the concept was validated, we started working on scraping and downloading a greater amount of metadata, with real-time processing and analysis. At this point a dedicated service called the KnowledgeBase was implemented on top of ElasticSearch (https://www.elastic.co/), which stores the processed metadata. ElasticSearch is an open source search engine based on the Lucene (https://lucene.apache.org/) search library, with full-text search capabilities and support for a distributed architecture. All of its functionality is natively exposed through a RESTful interface, and the information is stored as JSON (https://www.elastic.co/pricing/faq/licensing).
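Because ElasticSearch exposes its functionality over a RESTful interface, storing a processed metadata record amounts to a single HTTP request. The sketch below indexes a document into a hypothetical "projects" index on a local ElasticSearch node; the index name, document id and document shape are illustrative assumptions, not the KnowledgeBase's actual layout.

```python
import requests

doc = {
    "name": "example-project",
    "description": "An example open source project",
    "license": "Apache-2.0",
    "programming_language": "Java",
}

# Index (create or update) the document with id "example-project"
# into the hypothetical "projects" index on a local node.
response = requests.put(
    "http://localhost:9200/projects/_doc/example-project",
    json=doc,
    timeout=10,
)
response.raise_for_status()
print(response.json()["result"])  # "created" or "updated"
```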

By leveraging the KnowledgeBase, MORPHEMIC will be able to perform fast searches, visualize data and obtain near real-time responses (with a typical delay of around one second). Last but not least, the KnowledgeBase is scalable, which is critical in the MORPHEMIC context because the platform runs on multiple servers and exploits a huge amount of data. Implementation and integration work is currently being completed, and soon the MORPHEMIC platform will be able to take advantage of the wealth of open source applications to improve its Polymorphic Adaptation capabilities.
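As an example of the kind of fast, full-text query this enables, the following sketch searches the same hypothetical "projects" index for repositories whose description mentions GPU acceleration. Again, the index name and field are assumptions for illustration.

```python
import requests

# Full-text "match" query against the description field.
query = {"query": {"match": {"description": "GPU acceleration"}}}

# By default ElasticSearch makes newly indexed documents searchable
# after a refresh interval of roughly one second, hence the
# near real-time behaviour described above.
response = requests.post(
    "http://localhost:9200/projects/_search",
    json=query,
    timeout=10,
)
response.raise_for_status()
for hit in response.json()["hits"]["hits"]:
    print(hit["_source"]["name"], hit["_score"])
```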

For more information on the Web-Crawler and the KnowledgeBase, you can check our deliverable at the following link: https://www.morphemic.cloud/wp-content/uploads/2021/04/D3.1-Software-tools-and-repositories-for-the-code-mining.pdf

Engineering Group is the Digital Transformation Company, a leader in Italy with an expanding global footprint. The Group has been supporting the continuous evolution of companies and organizations for more than 40 years, thanks to a deep understanding of business processes in all market segments, fully leveraging the opportunities offered by advanced digital technologies and proprietary solutions. It integrates best-of-breed market solutions and managed services, and continues to expand its expertise through M&As and partnerships with leading technology players. The Group invests strongly in innovation through its R&I division, participating in international R&D projects while exploring groundbreaking technologies and developing new business solutions.