Intelligent Data Selection

We synchronize dozens of data sources that together provide millions of data points: on-chain, social, news, off-chain APIs, and even on-site metrics. EGG puts pre-modeled and labeled blockchain data in the hands of communities. Through dashboard and visualization tools, as well as auto-generated API endpoints, queries can easily be added to the algorithm.

In contrast to traditional data sources (e.g., centrally controlled corporate databases), blockchains by design provide several benefits that are important for Data Science applications:

  • High data quality: All new records go through a rigorous, blockchain-specific validation process powered by one of the many “consensus mechanisms”. Once validated and approved, these records become immutable: no one can modify them for any purpose, good or malicious. Blockchain data are typically well structured and their schemas are well documented. This makes the life of a Data Scientist who works with such data much easier and more predictable.

  • Traceability: Blockchain records contain all the information necessary to track their origin and context, e.g., which address initiated a transaction, when it happened, the amount of the asset transferred, and which address received it. Moreover, most public blockchains have “explorers”, websites where anyone can examine any record that has ever been generated on the respective blockchain (see, for example, the Bitcoin, Ethereum, and Ripple explorers).

  • Built-in anonymity: Blockchains do not require their users to provide any personal information, which is important in a world where keeping one’s privacy has become a real issue. From a Data Scientist’s perspective, this helps to overcome the headaches associated with some of the regulations (e.g., GDPR in Europe) that require personal data to be anonymized before processing.

  • Large data volumes: Many Machine Learning algorithms require large amounts of data to train models. This is not a problem in mature blockchains, which offer gigabytes of data.

Leveraging a proprietary knowledge graph able to draw inferences from data points, EGG created a multidisciplinary platform that extracts insights and delivers products and services tailored to a number of specific industries. The code base, sources and methodologies are meant to be open-source and publicly available. They also meet all compliance requirements for traditional financial applications.

EGG combines on-chain and off-chain data acquisition and processes a constantly growing database containing millions of wallets. More than 150M labeled wallets are analyzed, along with their activity. Decentralization, trusted nodes, premium data, and cryptographic proofs are used to connect highly accurate real-world data and APIs to any smart contract.

Separating the signal from the noise in blockchain data becomes child’s play. Our research spectrum is expanding on Ethereum and other smart-contract-compatible blockchains.

Real-Time Alerts

Smart alerts notify you when a specific event occurs, putting you in the know before anyone else. For example, an alert appears when a whale trades a token, or moves it into or out of a contract, above a specified size.

We make sure that you are notified promptly by using multiple delivery channels. Alerts can take the form of a browser or mobile push notification, an on-site alert, an email, a call, or an SMS.
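
As an illustration of the size-threshold alert described above, here is a minimal Python sketch. The `Transfer` and `AlertRule` types, their field names, and the `alerts_for` helper are hypothetical and chosen for this example; they are not EGG's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One token transfer. Field names are illustrative assumptions."""
    sender: str
    receiver: str
    token: str
    amount: float  # in token units


@dataclass(frozen=True)
class AlertRule:
    """Fire when a watched address moves at least `min_amount` of `token`."""
    watched: frozenset
    token: str
    min_amount: float

    def matches(self, t: Transfer) -> bool:
        involves_watched = t.sender in self.watched or t.receiver in self.watched
        return (
            involves_watched
            and t.token == self.token
            and t.amount >= self.min_amount
        )


def alerts_for(transfers, rules):
    """Return (rule, transfer) pairs for every transfer that triggers a rule."""
    return [(r, t) for t in transfers for r in rules if r.matches(t)]
```

Each matching pair could then be handed to whichever delivery channel the user has enabled (push, email, SMS, and so on).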

On Chain Portfolios Data Acquisition

EGG’s portfolio analytics solution provides support for:

  • multi-asset portfolio construction and backtesting;

  • quantitative performance and risk analysis of the portfolio and its components;

  • measurement of performance against a benchmark and calculation of investment risk indicators;

  • analysis of stress events, drawdowns, and value-at-risk;

  • visualization of risk metrics and the efficient frontier;

  • evaluation of sector allocation, correlations, and decomposition of portfolio risk;

  • optimization of the portfolio with predefined mean-variance strategies.
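
Two of the risk indicators named above, drawdown and value-at-risk, can be sketched in a few lines of Python. This is a generic textbook formulation (maximum drawdown and historical VaR), not a description of how EGG computes them; the function names and the choice of the historical method are assumptions.

```python
def max_drawdown(values):
    """Largest peak-to-trough decline as a fraction of the peak (0.25 = 25%)."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def simple_returns(values):
    """Period-over-period simple returns of a series of portfolio values."""
    return [values[i] / values[i - 1] - 1.0 for i in range(1, len(values))]


def historical_var(returns, confidence=0.95):
    """Historical VaR: the loss at the (1 - confidence) quantile of past returns.

    Returned as a positive number for a loss.
    """
    ordered = sorted(returns)
    # round() guards against float error such as (1 - 0.9) * 10 == 0.9999...
    k = int(round((1.0 - confidence) * len(ordered), 9))
    return -ordered[k]
```

For a value series of 100, 120, 90, 110, 80, the maximum drawdown is the fall from 120 to 80, i.e. one third of the peak.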

EGG has used on-chain data to find specific patterns of activity that would give us a competitive edge when presenting digital assets. The mantra our team followed was simple and humble: Wallets with money know more than we do.

In doing so, we set out to find the wallets that seemed to be moving the markets. Although we could not detect the trades that were occurring behind the scenes of centralized exchanges like Binance, Poloniex or Kraken for specific wallets, we could detect the wallets that were depositing and withdrawing the most value.

Exchange flow analysis is based on monitoring funds moving in and out of centralized exchanges by analyzing on-chain transactions. This process requires associating specific addresses with exchanges and then accurately monitoring the flows. Trying to do this manually is highly inaccurate and unsustainable over time.
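
Once addresses have been associated with exchanges, the flow monitoring itself reduces to a simple aggregation. The sketch below assumes transfers arrive as `(sender, receiver, amount)` tuples and that `address_labels` maps known addresses to an exchange name; both shapes are hypothetical and only meant to show the idea.

```python
from collections import defaultdict


def exchange_net_flows(transfers, address_labels):
    """Return net inflow (deposits minus withdrawals) per labeled exchange.

    transfers: iterable of (sender, receiver, amount) tuples (assumed shape).
    address_labels: dict mapping address -> exchange name (assumed labels).
    """
    net = defaultdict(float)
    for sender, receiver, amount in transfers:
        src = address_labels.get(sender)
        dst = address_labels.get(receiver)
        if src == dst:
            # Either no exchange is involved, or it is an internal transfer
            # between two wallets of the same exchange; neither is a flow.
            continue
        if dst is not None:
            net[dst] += amount  # deposit into the exchange
        if src is not None:
            net[src] -= amount  # withdrawal from the exchange
    return dict(net)
```

A positive value means more funds entered the exchange than left it over the observed transfers.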

We follow a machine learning approach with some human intervention, and the first wave of results seems robust. The best-performing models are applied in EGG’s main data selection algorithm.

For example, we were able to identify all of the largest depositors on exchanges and trace where their funds originated.

We started to understand the flow of value and the personas of trading activity behind the wallets. In doing so, we have successfully been able to target opportunities that would otherwise go unnoticed. One of them was to create models linking one or more wallets to the same entity, which gives a more global view of the winning activities.
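
The source does not describe how the linking models work. Assuming some model or heuristic has already produced pairwise links between wallets, grouping them into entities is a standard union-find problem; the sketch below shows only that grouping step, with a hypothetical `cluster_wallets` helper.

```python
def cluster_wallets(links):
    """Group addresses into entities from pairwise (a, b) links via union-find.

    Returns a sorted list of sorted address lists, one per entity.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for a in parent:
        clusters.setdefault(find(a), set()).add(a)
    return sorted(sorted(members) for members in clusters.values())
```

Links are transitive here: if wallet w1 is linked to w2 and w2 to w3, all three end up in the same entity.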

By design, the identity of blockchain address owners is usually unknown. However, being able to categorize certain addresses into predefined groups or, better yet, link them to real-world entities can be of great value. This is especially true for addresses involved in illicit activities, such as money laundering, distribution of drugs, ransomware, Ponzi schemes, human trafficking, and even terrorism financing. The problem of address categorization has been successfully tackled with supervised learning in several studies that developed either binary classifiers or multi-class classifiers (e.g., “exchange”, “pool”, “gambling”, and “service” addresses).

The respective models have been developed using on-chain data and a variety of standard Machine Learning algorithms. Interestingly, in contrast to the cryptocurrency price forecasting studies, tree-based methods (in particular, Random Forest) have often outperformed Deep Learning-based algorithms. Tree-based methods will therefore be part of our solutions.

On Chain Assets Data Acquisition

Cross-chain analytics is presented the right way. EGG gives you the tools to find signals that might otherwise go unnoticed. Our combination of multi-chain data and labeled addresses is incredibly powerful.

The tools we combine give an overall snapshot of moving funds. For further details, you can trace transactions down to the most granular level. We track exchanges, token teams, and funds, which means you can see exactly which entities are accumulating, or selling off, a specific token. Token metrics on usage, engagement, and liquidity are available so you can make informed decisions before investing in a new token.

Price analysis is the ongoing process of cryptocurrency traders and analysts finding patterns in the market to determine optimal trading strategies and gauge market sentiment for the cryptocurrency market at large or for specific assets.

The main question is whether the market is bullish, with upward-trending price action, or bearish, with downward pressure on price. Price analysis techniques vary and are often used in combination to provide as detailed a perspective of market conditions as possible.

The majority of published studies attempted to predict the future price of Bitcoin or Ethereum, either in absolute terms or as price direction (up or down). This has been done with algorithms ranging from simple logistic regression to XGBoost and Deep Learning-based methods.

Model inputs typically included a combination of on-chain (e.g., number of active addresses, transaction volume, mining difficulty, transaction graph metrics) and off-chain variables (e.g., trading volumes on exchanges). Overall, neural networks (in particular, those with LSTM-based architectures) have been found to outperform tree-based algorithms, although the accuracy of prediction even in the best-performing models was only marginally higher than a random guess.
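
To make the comparison with a random guess concrete, a direction-forecasting model can be scored against a trivial baseline. The sketch below is a generic evaluation helper, not taken from any of the studies mentioned; the function names and input shapes are assumptions.

```python
def realized_directions(prices):
    """True where the price rose from one period to the next."""
    return [prices[i + 1] > prices[i] for i in range(len(prices) - 1)]


def directional_accuracy(predicted_up, prices):
    """Share of periods where the predicted direction matched the realized one."""
    actual_up = realized_directions(prices)
    if len(predicted_up) != len(actual_up):
        raise ValueError("need exactly one prediction per price interval")
    hits = sum(p == a for p, a in zip(predicted_up, actual_up))
    return hits / len(actual_up)


def majority_baseline(prices):
    """Accuracy of always predicting the more frequent direction."""
    actual_up = realized_directions(prices)
    share_up = sum(actual_up) / len(actual_up)
    return max(share_up, 1.0 - share_up)
```

A model is only useful if its directional accuracy clearly exceeds both 0.5 and the majority baseline on data it was not trained on.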

EGG’s asset metrics include network data metrics and certain market data metrics that are aggregated at the asset level (e.g., reference rates/prices and trusted volume), as well as at the wallet level.

Off Chain Oracles Data Acquisition

Ethereum’s smart contracts are completely self-contained, and their access to off-chain data is limited. This is necessary for security: execution on a blockchain must be deterministic, whereas the response to repeated calls to outside APIs can change in unpredictable ways. Nevertheless, some desirable kinds of smart contracts are feasible only with additional outside data.

An oracle is a conceptual solution that takes off-chain information from the real world and writes an immutable copy of it into blocks, making it available for later use by smart contracts.
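
One common way to make such an oracle value robust is to collect reports from several independent sources and take the median before anything is written on-chain. The sketch below shows only that off-chain aggregation step under assumed names (`aggregate_reports`, `min_reports`); it is a generic illustration, not the protocol of any particular oracle network.

```python
from statistics import median


def aggregate_reports(reports, min_reports=3):
    """Combine independent oracle reports into a single value via the median.

    The median tolerates a minority of wildly wrong or malicious reports.
    """
    if len(reports) < min_reports:
        raise ValueError("not enough reports to aggregate")
    return median(reports)
```

With reports of 100.0, 101.0, 99.5, 1000.0 and 100.5, the outlier 1000.0 has no effect on the aggregated value.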

Chainlink teams brainstormed with us to integrate oracles into the programs set up by EGG.

Turn Data Into Strategies

As of today, there is no such thing as on-chain high-frequency trading. The best available option is broad network coverage and a highly reputed node that listens and broadcasts information to different networks. Pre-block rendering of transactions and annotations are key components of the future of decentralized finance and trading.

Currently, almost all trading takes place between the top cryptocurrency exchanges. At EGG, we are extending it to any centralized or decentralized exchange carrying assets that can be collateralized. Because these signals are public knowledge, anyone can verify the activity. Therefore, the determining factor in maximizing profitability is the delay between the moment a signal is issued and the response to it.
