Weekly Digest

Extraction Attacks, Text-to-Image, and InnerEye Deployment


Does my model know your phone number?

Recent work from researchers at Google, Stanford, UC Berkeley, OpenAI, Harvard, and Apple investigates privacy considerations in Large Language Models


Machine Learning models are trained using large amounts of data. This is a no-brainer. However, the data used for training can sometimes contain sensitive information, especially in specific industries such as Healthcare or Finance. As Google’s People AI Research (PAIR) group explains, “if they’re [the Machine Learning models] not trained correctly, sometimes that [sensitive] data is inadvertently revealed.” For an interactive and intuitive explanation, I highly recommend checking out PAIR’s article entitled “Why Some Models Leak Data”.

What’s new

In a recent paper, researchers from these renowned AI institutions attempt to extract training data from large language models using an adversarial method called an “extraction attack”. The method consists of two main steps:

  1. Generation of a large number of samples by interacting with the Large Language Model as a black box. This is done by feeding the model text prompts and collecting the model’s output samples.
  2. Selection of the samples that have an abnormally high likelihood. More specifically, the paper compares the likelihood assigned to each sample by a large version of GPT-2 with that assigned by a smaller version. The rationale for this approach is that smaller models (those with fewer parameters) are less prone to memorization.
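The two steps above can be sketched as a simple likelihood-ratio ranking. The scoring functions below are toy stand-ins for the log-likelihoods of a large and a small language model, not the paper's actual code:

```python
import math

def rank_samples(samples, large_logprob, small_logprob):
    """Rank generated samples by how much more likely the large model
    finds them than the small model -- a proxy for memorization."""
    scored = []
    for s in samples:
        # Ratio > 1 means the large model assigns the sample a higher
        # likelihood than the small model, hinting at memorization.
        ratio = math.exp(large_logprob(s) - small_logprob(s))
        scored.append((ratio, s))
    return [s for _, s in sorted(scored, reverse=True)]

# Toy stand-in scores: the "memorized" string gets a much higher
# log-likelihood under the large model than under the small one.
large = {"public fact": -5.0, "secret phone number": -2.0}.get
small = {"public fact": -5.5, "secret phone number": -9.0}.get

print(rank_samples(["public fact", "secret phone number"], large, small))
```

In the real attack, the scores come from running both GPT-2 variants over each generated sample; the top-ranked samples are the ones inspected for memorization.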

Source: Google AI Blog

The selected samples are then manually searched for on the web to check whether they appear verbatim. If so, the researchers at OpenAI, who have access to GPT-2’s training data, can indicate the number of training documents that include the sample.

The paper found that 604 of the 1,800 selected samples contain verbatim reproduced text that can only be found in one document in the training data.

These memorized samples include personally identifiable information (names, phone numbers, and email addresses), JavaScript code, log messages, 128-bit UUIDs, and others.

Source: BAIR Blog

It is however important to note that in most of these cases, the unique document that contains the training example contains multiple instances of the memorized sample. This is mentioned not only on the Google AI Blog but also in an in-depth paper explanation video by Yannic Kilcher.

Why it matters

Extracting training data from models that use private data for training can be extremely harmful. While the training data from the model studied in the paper is public, it raises serious questions concerning data privacy. Misuse of personal data can present serious legal issues.

At the moment, there is a legal grey area as to how data privacy regulations like the GDPR should apply to Machine Learning models. For instance, the GDPR grants users the right to be forgotten: they may request that a service provider delete all the personal data it has gathered about them. Does this mean companies will need to retrain their models from scratch every time a user invokes this right, even when training these models costs upwards of several million USD?

As posted on the Berkeley AI Research Blog, “The fact that models can memorize and misuse an individual’s personal information certainly makes the case for data deletion and retraining more compelling.”

What’s next

So, Large Language Models can sometimes memorize training data. In some cases, this memorization is problematic, as it can lead to ethical and legal consequences. What do we do? Who is responsible for preventing such issues?

A common response to such issues is the use of Differential Privacy. As explained by PAIR: “Training models with differential privacy stops the training data from leaking by limiting how much the model can learn from any single data point. Differentially private models are still at the cutting edge of research, but they’re being packaged into machine learning frameworks, making them much easier to use.”
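As a concrete (and heavily simplified) illustration of the idea, here is a minimal NumPy sketch of the core DP-SGD update: clip each example's gradient, average, and add calibrated Gaussian noise. All numbers are illustrative; production use would rely on a maintained library such as Opacus or TensorFlow Privacy.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                rng=np.random.default_rng(0)):
    """One differentially private gradient step (the DP-SGD core idea):
    clip each example's gradient, average, then add Gaussian noise so
    no single data point can dominate what the model learns."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return mean_grad + noise

grads = [np.array([3.0, 4.0]), np.array([0.1, -0.2])]  # one grad per example
print(dp_sgd_step(grads))  # clipped + noised average gradient
```

The clipping bound limits how much any single data point can move the model, which is exactly the "limiting how much the model can learn from any single data point" that PAIR describes.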

Yet, it seems that applying differential privacy in a principled and effective way is difficult when it comes to preventing the memorization of data found on the web. Particular examples are information snippets that occur multiple times in the same document, or copyrighted information such as complete books.

This raises the following question: is training models on the entire content of the internet a bad idea in the first place? The corpus of the internet is not sanitized, it raises immense privacy and legal concerns, and it contains significant inherent biases. The researchers suggest that the better way forward could be better curation of the dataset used for training. They state that “if even a small fraction of the millions of dollars that are invested into training language models were instead put into collecting better training data, significant progress could be made to mitigate language models’ harmful side effects.”

What does an armchair in the shape of an avocado look like?

Microsoft-backed research institution OpenAI shows impressive progress in text-to-image synthesis


If I were to ask you what the important AI model advances in 2020 were, your answer would most likely include some of the following: Generative Models, Transformers for text (GPT-3), Transformers for images (ViT, Image GPT), and Transformers again (AlphaFold 2).

It was only a matter of time before one of the big players decided to merge all of these topics together and create a large scale text-to-image model.

What’s new

It comes as no surprise that OpenAI, the research lab responsible for GPT-3 and Image GPT, has taken on the challenge of creating large models that work with text-image pairs. Last week, the company posted two blog posts introducing two such models: DALL·E and CLIP. The former is a model that leverages a scaled-down version of GPT-3 (12 billion parameters instead of GPT-3’s 175 billion) and is trained to generate images from text descriptions. The latter is a separate neural network trained to learn visual concepts from natural language in order to classify images in a “zero-shot” manner, meaning the classes are only observed at inference time, not during training.

As you might have guessed, DALL·E is a transformer language model. Using a (presumably large) training dataset of text-image pairs, the model receives the text and the image as a single tokenized stream of data. This single-stream formulation is what allows for image generation from scratch.

While OpenAI has yet to publish a paper explaining the theoretical details behind DALL·E, the blog post allows us to make some educated guesses regarding the model’s architecture. Looking at the references to other research papers in the side-notes, it seems that the model is a combination of GPT-3 and a Vector Quantized-Variational AutoEncoder (VQ-VAE).

As hypothesized in an explanation video by Yannic Kilcher, the custom scaled-down GPT-3 model would be responsible for taking the text input and transforming it into a sequence of tokens that adhere to a specific vocabulary. The objective is for this sequence to be a sensible latent representation of the image. The decoder part of a VQ-VAE model can then use this sequence of tokens to generate the image.
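This hypothesized two-stage pipeline can be sketched as follows. Everything here is illustrative (the toy predictor stands in for the transformer, and the VQ-VAE decoder is only mentioned in a comment); it is not OpenAI's implementation:

```python
# Hypothesized DALL·E-style pipeline: an autoregressive transformer emits
# image tokens conditioned on text tokens; a VQ-VAE decoder then turns
# those discrete codes into pixels. Toy stand-ins below.

def generate_image_tokens(text_tokens, n_image_tokens, predict_next):
    """Autoregressively extend the text tokens with image tokens,
    treating text + image as one stream."""
    stream = list(text_tokens)
    for _ in range(n_image_tokens):
        stream.append(predict_next(stream))  # a transformer would go here
    return stream[len(text_tokens):]  # keep only the image part

# Toy "model": always predicts (sum of stream) mod vocabulary size.
toy_predict = lambda stream: sum(stream) % 16

image_tokens = generate_image_tokens([3, 5], n_image_tokens=4,
                                     predict_next=toy_predict)
# A VQ-VAE decoder would map these discrete codes back to an image.
print(image_tokens)
```

The key point is that text tokens and image tokens live in one sequence, so the same next-token objective that powers GPT-3 can generate images.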

The results found in the blog post are very impressive. OpenAI has cherry-picked some textual inputs that you can modify in this part of the blog post. I highly recommend playing around with their examples to get a good grasp of how well the model performs.

Source: OpenAI Blog

While the model is excellent at reproducing local information (such as different styles, textures, and colors), it is less accurate when it comes to global information (such as counting and ordering objects, temporal and geographical knowledge).

Why it matters

The business potential of text-to-image use-cases is immeasurable. Particular fields like stock photography and illustration are the first that come to mind. While the use of Transformers comes as no surprise, the impressive results reinforce the reasoning behind the trend. This work is a very important milestone in the text-to-image synthesis area of research, which has only been around since 2016.

What’s next

The researchers state that they “plan to provide more details about the architecture and training procedure in an upcoming paper.” Future research will tackle “how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer-term ethical challenges implied by this technology,” the team wrote.

To play around with DALL·E yourself, check out OpenAI’s blog post.

Deep Learning Tumor Contouring deployed in Addenbrooke’s Hospital

A Microsoft AI tool has been deployed in a Cambridge Hospital to help speed up cancer treatment


The potential impact of Deep Learning solutions on augmenting the imaging workflow in healthcare is immense. As we’ve seen over the past years, the technology needed already exists: Computer Vision algorithms consistently achieve high performance in object detection and image classification tasks when constrained to specific domains. The main obstacles are (1) small training sets, (2) data privacy and security compliance concerning the use of patient scans, (3) addressing ethical considerations, and (4) finding a way to integrate such Deep Learning models seamlessly into health professionals’ daily workflows. In fact, the vast majority of healthcare solutions leveraging Deep Learning methods remain in research labs.

What’s new

After eight years of working with the Microsoft Research Lab in Cambridge on a pilot version of CT-scan tumor-contouring software, Addenbrooke’s Hospital will now use the solution, called InnerEye, in clinical practice.

The solution leverages a Neural Network to contour tumors and healthy organs on 3D CT (Computed Tomography) scans. This lengthy procedure (several hours) is usually performed by highly specialized health professionals. It is an extremely important part of a patient’s cancer treatment, as these contours are used to guide high-intensity radiation beams whose objective is to damage the DNA of cancerous cells while avoiding the surrounding healthy organs.

Source: Microsoft AI Blog

Trained on the hospital’s own data, InnerEye is able to perform this contouring task 13 times faster than a human. As stated by Dr. Raj Jena, Oncologist at Addenbrooke’s, “the results from InnerEye are a game-changer. To be diagnosed with a tumor of any kind is an incredibly traumatic experience for patients. So as clinicians we want to start radiotherapy promptly to improve survival rates and reduce anxiety. Using machine learning tools can save time for busy clinicians and help get our patients into treatment as quickly as possible.”

The whole procedure is integrated into an oncologist’s routine using the augmented intelligence framework. In fact, the contouring done by InnerEye is checked and confirmed by a clinical oncologist before the patient receives treatment. “The AI is helping me in my professional role; it’s not replacing me in the process. I double-check everything the AI does and can change it if I need to. The key thing is that most of the time, I don’t need to change anything” says Yvonne Rimmer, a Clinical Oncologist at the Hospital.

Why it matters

The deployment of Deep Learning in an oncologist’s daily routine has never been done before. In doing so, Addenbrooke’s Hospital is the first hospital in the world to successfully leverage this type of ground-breaking technology in order to improve survival rates for some cancers.

In a country where up to half of the people are diagnosed with cancer at some stage in their lives, such technologies will allow doctors to treat patients faster.

What’s next

The goal of Microsoft’s InnerEye project is to “Democratize Medical Imaging AI”. As such, they have made the code available online. In the case of Addenbrooke’s, the Deep Learning models are hosted on Microsoft Azure, ensuring that all data is securely kept in the UK and available only to the oncologists who need it.

This highlights an important aspect with respect to this deployment. While the software has been open-sourced by Microsoft, its clinical use remains subject to regulatory approval. Addenbrooke’s is a medical center renowned internationally for dealing with rare and complex conditions needing cutting-edge facilities and equipment as well as the best doctors. In that regard, it comes as no surprise that the Hospital is at the forefront of innovation in healthcare. It does, however, raise the question: when and where can we expect widespread use of such technologies?

Don’t forget to subscribe!

If you want to receive a summarized version of the Visium Digest in your inbox every two weeks, subscribe below!

2020 AI Recap, Global ML Community, and Mobile Object Detection


Artificial Intelligence in 2020

As we reach the end of the year, we look back at the major Artificial Intelligence milestones of 2020


This year has been AI’s most exciting year yet. As of today, the implementation of Machine Learning solutions in global industries is still led by the early adopters. However, amidst the Coronavirus pandemic, people across the world are beginning to grasp the impact of the massive digitization that is to come in the near future. As the milestones of 2020 have shown us, the role of AI in this huge transformation is both promising and unsettling.

Tens of thousands of Machine Learning papers are published each year. Unfortunately, the impact each of them will have in real life remains unclear. Meanwhile, the Machine Learning algorithms run by tech giants (e.g. Apple, Amazon, Facebook, Google, etc.), whose real-world impact is immense, are developed behind closed doors. There remains quite a path to clear before the use of Machine Learning is democratized across industries and applications.

The most important language model yet: GPT-3

Released by OpenAI, GPT-3 is an auto-regressive Natural Language Processing model. Boasting an extensive set of 175 billion parameters, it achieves never-before-seen text generation capabilities.

Despite its impressive performance, GPT-3 has sparked a lot of debate concerning the large monetary and environmental cost of large language models (it cost approximately $12 million to train GPT-3), as well as their tendency to produce biased outputs.

In response to these concerns, researchers are starting to propose new Transformer-based methods such as Performers and Linformers. Their goal is to mitigate the lengthy training time while maintaining high performance.

AI for the good of society

As the adoption of AI increases and the understanding of ML Operations is refined, 2020 has seen many data-driven solutions for the good of society.

Whether it is finding ways to diagnose COVID using cough recordings, diagnosing tinnitus from brain scans, or solving the 50-year-old protein folding problem: Artificial Intelligence has clearly opened up some very interesting avenues for research.

The rise of the GANs: deepfakes

As working from home during the pandemic blurred the concept of time and space, state-of-the-art GANs started blurring out faces and replacing them to create indistinguishable fake content.

Deepfakes aren’t new, but they have seen some incredible advances this year. Exemplified by Jordan Peele making a fake Obama address and President Richard Nixon giving an alternate address about the moon landings, deepfakes are becoming increasingly convincing.

AI Governance: the most important challenge yet

Tools driven by Machine Learning can be extremely powerful. In this sense, these solutions are double-edged swords. Luckily, awareness of the dangers and potential misuse of Machine Learning has increased in the past year, from addressing bias in datasets and evaluating the fairness of model outcomes, all the way to regulating the type of data that can be used and tracked.

Recently, we have seen BMW release its code of ethics and California pass the AB-730 bill, designed to criminalize the use of deepfakes that give false impressions of politicians’ words or actions. Moreover, public debate about ethics in AI has seen a recent jump after Google fired one of its prominent AI ethicists, Timnit Gebru.

What’s next

We are looking forward to the AI advances coming in 2021. Hopefully, as model performances increase, the democratization of Machine Learning solutions will in turn put more importance on accountability as well as highlight possible biases and ethical misuses of the technology.

We will keep covering important AI milestones in 2021. We will curate, evaluate, and publish the three most relevant topics every two weeks. Feel free to join us by subscribing to the digest using the form below!

The Launch of MLCommons

50+ global technology and academic leaders in AI unite with the objective of accelerating innovation in Machine Learning


Machine Learning is a relatively young field. Over the years, many actors have attempted to create standardized material to unify certain aspects, from modeling and testing libraries to deployment toolkits and data versioning software. Some of these attempts, such as the GLUE benchmark for NLP or the PapersWithCode initiative on arXiv, have been very well received by the industry.

One of these attempts is MLPerf, a benchmarking tool for measuring the performance of hardware for Machine Learning tasks.

What’s new

The founders of MLPerf have brought together an engineering consortium of companies, schools, and research labs to build open-source and standardized tools for machine learning.

This consortium, called MLCommons, includes representatives from Alibaba, Facebook AI, Google, Intel, Dell, Samsung, NVIDIA, and many others. The list of partnering schools mostly includes universities with a global reputation for leading AI research, such as UC Berkeley, Stanford, Harvard, the University of Toronto, and others.

Source: MLCommons

MLCommons will focus on three pillars:

  • Benchmarks and Metrics that transparently compare ML solutions, software, and systems.
  • Publicly available crowd-sourced Datasets and Models to build new state-of-the-art AI solutions and applications.
  • Best Practices to allow sharing models between teams globally.

Today, MLCommons already includes one project for each of these pillars.

With regards to benchmarks and metrics, MLPerf has become an industry standard for evaluating training and inference performance across different infrastructures.

For datasets and models, MLCommons has released People’s Speech, a large dataset containing 87,000 hours of speech in 57 different languages.

Finally, in the Best Practices category, MLCube is a set of common conventions that allow users to run and share models with anyone, anywhere.

Why it matters

Publicly available datasets and benchmarks have driven the majority of recent progress in the Machine Learning Industry. The production and maintenance of such resources are complex, expensive, and require input and feedback from many different actors. MLCommons takes on the challenge by bringing 50+ leading organizations together.

As David Kanter, the Executive Director of MLCommons states, “MLCommons is the first organization that focuses on collective engineering to build that infrastructure. We are thrilled to launch the organization today to establish measurements, datasets, and development practices that will be essential for fairness and transparency across the community.”

What’s next

The global Machine Learning community is impatiently awaiting MLCommons’ next releases. Hopefully, these standardized tools and methods will spearhead innovative initiatives in the field.

“MLCommons has a clear mission – accelerate Machine Learning innovation to ‘raise all boats’ and increase positive impact on society,” states Peter Mattson, the President of MLCommons.

Simultaneous Face, Hand, and Pose detection on Mobile

Google AI has developed an all-in-one face, hand and pose detection solution for mobile using multiple and dependent neural networks


Some of the most important advances in Computer Vision revolve around the adequate detection of human behavior in images. Correctly detecting and tracking objects has many potential applications across industries and at diverse steps of the value chain. It comes as no surprise that models that detect human pose, faces, and hand positions share that characteristic.

MediaPipe, an open-source project by Google, offers cross-platform ML solutions for real-time media analysis. These ML solutions include standalone tasks such as Face Detection, Hair Segmentation, Object Detection, Iris detection, and many others.

What’s new

Until recently, MediaPipe offered separate solutions for Face, Hand, and Pose Detection. Last week, Google researchers released MediaPipe Holistic, a solution that combines all three tasks.

The consolidated pipeline integrates three separate models for Face, Hand, and Pose Detection. While each model uses the same input image, said input is processed differently to achieve optimal task-specific results. For instance, the face and hand detection models require image cropping before reducing the input’s pixel resolution (which is done to allow real-time inference). In the end, the solution yields a total of 540+ key-points for each analyzed frame.
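The “540+” figure is consistent with simply summing the landmark counts of MediaPipe’s individual models (assuming the holistic output is their union):

```python
# Back-of-the-envelope tally of MediaPipe Holistic's key-points per frame,
# assuming the holistic output is the union of the component models.
landmarks = {
    "face_mesh": 468,   # MediaPipe Face Mesh landmarks
    "pose": 33,         # MediaPipe Pose landmarks
    "left_hand": 21,    # MediaPipe Hands landmarks, per hand
    "right_hand": 21,
}
total = sum(landmarks.values())
print(total)  # 543 -- matching the "540+" key-points mentioned above
```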

Source: Google AI Blog

Each task is run in real-time with minimal memory transfer between inference backends. Furthermore, the solution is modular in the sense that it allows for component interchangeability depending on your device’s hardware performance.

An additional note with regards to the solution’s Model Card, a topic discussed in a previous Digest: this solution does not yet have a Model Card. However, as all other MediaPipe solutions do have one, it seems as if it is simply a matter of time for the Google Research team to add the relevant documents to the MediaPipe documentation.

Why it matters

Using real-time detection models in cross-platform applications enables a large variety of impactful use-cases. Some examples are sign language recognition, augmented reality effects, additional features in video-conferencing applications, fitness detection, and gesture control. Moreover, applications like this one prove the technical feasibility of integrating complex Machine Learning solutions in mobile- and edge-devices.

What’s next

As stated by the researchers, “We hope the release of MediaPipe Holistic will inspire the research and development community members to build new unique applications. We anticipate that these pipelines will open up avenues for future research into challenging domains, such as sign-language recognition, touchless control interfaces, or other complex use cases.”

The great news is that you can try the solutions directly in the web browser right now! You can use Python notebooks in MediaPipe on Google Colab. Otherwise, you can see direct results using the MediaPipe CodePen for JavaScript applications with your own webcam.

Subscribe to the Digest!

Unsupervised Diagnostics, Amazon Monitoring, and Probabilistic Programming


NEW: You can now subscribe to Visium Digest!

From now on, you can receive your favorite source of curated AI News directly in your inbox. Don’t worry, we won’t send you any unsolicited content or marketing emails, just your Digest, every two weeks.

Using your Doctor’s notes for disease detection

Researchers from Stanford propose a method that uses text to generate features for its associated unlabeled image


Helping doctors make key decisions using data-driven solutions is an important challenge. With image classification models aplenty in other fields, health science applications are struggling to augment their workflows using Machine Learning. In large part, this delay can be attributed to a lack of labeled training data. In fact, medical images require a lot of domain knowledge for accurate labeling, and trained doctors and physicians are often too busy with higher-stakes, more tangible tasks. In a previous digest, we discussed how synthetic data augmentation could help resolve such issues.

What’s new

A team of researchers from Stanford University has come up with an interesting unsupervised alternative. The method, called ConVIRT, leverages the naturally occurring pairing of images and textual data to classify medical imagery.

In fact, the text reports accompanying medical images often contain very useful information about the image’s contents. This information can be used to extract the class associated with the input, without any expert labeling whatsoever!


Source: The Batch

The authors built two separate pipelines: one for the textual input and another for the image. The NLP pipeline consists of a BERT variant. To compare the image encoding with the textual encoding in a consistent space, a single hidden layer was added to a ResNet-50. For more information regarding the specific architecture, a PDF version of the paper is available.
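While the exact ConVIRT objective is specified in the paper, the core idea of a bidirectional contrastive loss over matching image-text pairs can be sketched in NumPy. The embeddings and temperature below are illustrative:

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.1):
    """Bidirectional contrastive (InfoNCE-style) loss over a batch of
    matching image/text embedding pairs: row i of each matrix comes
    from the same image/report pair."""
    # Cosine similarity matrix between every image and every text.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T / temperature

    # Cross-entropy with the diagonal (true pairs) as targets, both ways.
    def xent(logits):
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (xent(sim) + xent(sim.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
loss = contrastive_loss(img, img + 0.01 * rng.normal(size=(4, 8)))
print(loss)  # small here, since matching pairs are nearly identical
```

Minimizing this loss pulls each image embedding toward its own report's embedding and away from the other reports in the batch, which is what lets the model learn classes without expert labels.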

Why it matters

The proposed method was evaluated on four medical image classification tasks and two zero-shot retrieval tasks. The results indicate that the method considerably outperforms strong baselines (a ResNet-50 pre-trained on ImageNet and in-domain initialization methods). In fact, it requires only 10% as much labeled training data as the ImageNet-initialized baseline to achieve better performance.

This improved data efficiency is very promising as it could help alleviate the high cost of medical data labeling.

How Amazon monitors factory workers and machines

In its expansion into the industrial sector, the tech giant uses state-of-the-art AI to monitor factories


A large majority of companies still rely on scheduled maintenance procedures to verify the state of machinery. This is done in order to reduce the occurrence of line outages and factory shutdowns, which can bring enormous inconvenience or product unavailability to the end customers.

Furthermore, in factories around the world, compliance regulations need to be upheld. Employees are often required to wear Personal Protective Equipment (PPE) and follow specific guidelines such as staying out of unauthorized zones, maintaining social distancing, etc. More often than not, disregard for these regulations and guidelines leads to potentially costly and dangerous accidents.

What’s new

Cloud computing leader AWS has developed hardware-reliant systems to monitor the health of heavy machinery and detect worker compliance.

The former relies on a two-inch, 50-gram sensor called Monitron. It can record vibration and temperature, which a Machine Learning model then uses to flag anomalous behavior.

Leveraging data-driven solutions to predict machine failure instead allows companies to replace or maintain their machinery during set maintenance windows. This way, machines don’t break down at unexpected times and customers are spared the negative impact.
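Amazon has not published the details of Monitron’s model; the following is only a minimal sketch of the kind of “learn a baseline, flag outliers” logic such a system might implement on vibration readings:

```python
import statistics

def flag_anomalies(readings, window=10, z_threshold=3.0):
    """Flag readings that deviate strongly from the recent baseline --
    the simplest version of 'learn normal behavior, alert on outliers'."""
    flags = []
    for i, x in enumerate(readings):
        history = readings[max(0, i - window):i]
        if len(history) < 3:
            flags.append(False)  # not enough baseline data yet
            continue
        mean = statistics.fmean(history)
        std = statistics.stdev(history) or 1e-9
        flags.append(abs(x - mean) / std > z_threshold)
    return flags

vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 5.0]  # last reading spikes
print(flag_anomalies(vibration))  # only the final spike is flagged
```

A production system would learn richer temporal patterns per machine, but the goal is the same: alert before the anomaly becomes a breakdown.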


Source: AWS

Amazon has been testing 1,000 Monitron sensors at its fulfillment centers near Mönchengladbach in Germany. Their new system is being tested to monitor conveyor belts handling packages.

AWS’s second addition to its industrial product line is called Panorama. The system enables pushing Machine Learning models to the edge, connecting to pre-installed camera systems. This way, managers can automate the monitoring of workers. The system can detect misuse of or missing PPE, vehicles in unauthorized parking spots, compliance with social distancing measures, and so on.


Source: AWS

A set of companies are testing AWS Panorama. Siemens Mobility said it will use the new technology to monitor traffic flow in different cities. Furthermore, Deloitte has stated that it was working with a major North American seaport to utilize the tool for monitoring shipments.

Why it matters

These new Amazon products demonstrate the benefits of using data-driven solutions in a factory setting. Furthermore, they show that implementing end-to-end solutions is crucial to ensuring that AI delivers added value.

“This idea of predictive analytics can go beyond a factory floor. It can go into a car, on to a bridge, or on to an oil rig. It can cross-fertilize a lot of different industries,” said Matt Garman (AWS’s head of sales and marketing), speaking to the Financial Times.

What’s next

While the new products have raised some concerns with critics, the advantages they bring are indubitable. The concerns are mostly linked to the fact that the client company does not seem to have enough control over the Machine Learning models embedded in Monitron and Panorama. In fact, the capabilities seem extremely generalized. This is where AI providers such as Visium can provide solutions that are highly optimized to a client’s needs, all whilst using Amazon’s standardized and compliant hardware.

While Amazon ensures no pre-packaged facial recognition capabilities are embedded within Panorama, there has been debate about the ethical issues surrounding packaged monitoring systems in general. To mitigate this issue, Amazon relies on a defined list of terms and regulations to ensure that their systems are used solely for safety purposes.

Facebook AI’s evaluation framework for Probabilistic Programming Languages

Facebook AI introduces a new benchmark called PPL Bench for evaluating Probabilistic Programming Languages on a variety of statistical models


Using Probabilistic Programming Languages (PPLs), statisticians and data scientists alike are able to formulate probability models in a formal language. Using a probability model allows you to perform Bayesian inference by computing the posterior probability of an event. To be more specific, you are able to assess the probability of an event by combining a prior probabilistic belief with a set of observations.

The advantages of combining such techniques with Machine Learning algorithms are multiple and diverse. First, you can aggregate similar behavior together (e.g. a hierarchical structure in your dataset) to increase the accuracy of your model. Second, you can improve consistency and robustness by incorporating the beliefs of professionals with expert domain knowledge. Finally, formulating Machine Learning problems using probability models allows you to leverage probabilistic output, taking uncertainty into account to assess risk.
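As a worked example of the posterior update described above, here is a conjugate Beta-Binomial model computed by hand rather than through a PPL (the prior and observations are made up for illustration):

```python
from fractions import Fraction

def beta_binomial_posterior(alpha, beta, successes, failures):
    """Bayesian update for a coin-flip probability: a Beta(alpha, beta)
    prior belief combined with observed data yields a Beta posterior."""
    return alpha + successes, beta + failures

def posterior_mean(alpha, beta):
    return Fraction(alpha, alpha + beta)

# Uniform prior Beta(1, 1), then observe 7 successes and 3 failures.
a, b = beta_binomial_posterior(1, 1, successes=7, failures=3)
print(a, b, posterior_mean(a, b))  # Beta(8, 4), posterior mean 2/3
```

PPLs generalize exactly this pattern to models without closed-form posteriors, where sampling methods take over from conjugate arithmetic.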

What’s new

Researchers from Facebook AI have created an open-source benchmark framework for evaluating PPLs used for statistical modeling. PPL Bench has a dual objective: (1) evaluate improvements in PPLs in a standardized setting and (2) help users pick the right PPL for their modeling application.

Implemented in Python 3, PPL Bench handles Model Instantiation and Data Generation, Implementation and Posterior Sampling, and Evaluation. The modular workflow is explained graphically in the image below.


Source: Facebook AI

The PPL implementations are evaluated using several metrics:

  1. Predictive log-likelihood with respect to samples. This allows users to evaluate how fast each PPL converges to final predictions.
  2. Gelman-Rubin convergence statistic.
  3. The effective sample size, which evaluates whether there are positive correlations between generated samples; samples should theoretically be independent, so in practice correlation should be kept to a minimum.
  4. Inference time is used to evaluate the potential runtime of practical use cases.
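Of these metrics, the Gelman-Rubin statistic is simple to sketch: it compares between-chain and within-chain variance, with values near 1 suggesting convergence. The version below is a simplified illustration, not PPL Bench's exact implementation:

```python
import statistics

def gelman_rubin(chains):
    """Simplified Gelman-Rubin R-hat over several same-length MCMC chains.
    Values close to 1.0 indicate the chains agree (convergence)."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    # Within-chain variance W and between-chain variance B.
    w = statistics.fmean(statistics.variance(c) for c in chains)
    b = n * statistics.variance(means)
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

# Two chains sampling the same distribution -> R-hat near 1.
chain_a = [0.1, -0.2, 0.05, 0.3, -0.1, 0.15]
chain_b = [0.0, 0.25, -0.15, 0.1, -0.05, 0.2]
print(gelman_rubin([chain_a, chain_b]))
```

If the chains were exploring different regions, the between-chain variance would dominate and R-hat would drift well above 1, signaling that inference has not converged.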

Why it matters

Probabilistic Programming is a very powerful tool whose use has exploded in the last decade. Proposing an open-source evaluation framework for PPLs creates a standardized mechanism for comparing implementations. Not only does it raise awareness and spark discussions, but it also allows users to pick the right PPL for their task at hand using data-driven insights along the most common PPL considerations.

What’s next

As Bradford Cottel, Technical Program Manager at Facebook AI, states: “We hope that community contributions will help grow and diversify PPL Bench and encourage wider industrial deployments of PPLs.”

Here are the relevant links to the paper and code.

Sign up to get the digest directly in your inbox!


Protein Folding Breakthrough, TLDR in Science, and Robot Bias


AlphaFold solves the protein folding problem

DeepMind’s AlphaFold reaches 90+ percent accuracy at protein structure prediction competition


Proteins are indubitably the most important molecules for sustaining life. Practically all functions, from transporting oxygen through our blood to giving leaves their bold colors, are supported by proteins.

These proteins can be described using three different structural languages, which each have a varying degree of complexity and abstraction and are depicted below.


Source: MSOE Center for BioMolecular Modeling

There are different ways of determining protein structure, and each of these methods yields information about the protein in one of the different structural languages. For instance, while mass spectrometry can yield primary structure, only Nuclear Magnetic Resonance (NMR), X-ray crystallography, and cryo-electron microscopy (which are immensely time- and resource-intensive) are able to yield tertiary structure.

What’s new

Last week, DeepMind’s AlphaFold competed in the biennial Critical Assessment of protein Structure Prediction (CASP). The challenge asks participants to predict a protein’s tertiary structure from its primary structure alone. The metric used for evaluation is called the Global Distance Test (GDT). In short, the score ranges from 0 to 100 and indicates how close the predicted structure is to the ground truth.
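The spirit of the GDT score can be sketched in a few lines of numpy. This simplified version averages, over several distance cutoffs, the fraction of residues predicted within that cutoff of the true position (the real metric also searches over structural superpositions, which is skipped here):

```python
import numpy as np

def gdt_ts(pred, true, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS: mean over distance cutoffs of the percentage
    of residues whose predicted 3-D position lies within the cutoff of
    the true position (superposition search omitted)."""
    dists = np.linalg.norm(pred - true, axis=1)  # per-residue error (angstroms)
    return 100.0 * np.mean([(dists <= c).mean() for c in cutoffs])

true = np.zeros((4, 3))
pred = np.array([[0.5, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [9.0, 0, 0]])
print(gdt_ts(pred, true))  # 56.25
```

A perfect prediction scores 100; the 90 GDT “solution” threshold roughly corresponds to errors of about an atom’s width.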

In the past 7 editions of CASP, the winning scores never exceeded 75 GDT, and even stayed below 50 GDT before the 2018 edition. This year, however, AlphaFold’s state-of-the-art AI model achieved a median score of 92.4 GDT. This surpasses the 90 GDT threshold that is considered a ‘solution’ to the protein folding problem.


Source: DeepMind

Their solution implements new deep learning techniques that consider a folded protein as a spatial graph. Using an attention-based neural network, evolutionarily related sequences, and multiple sequence alignment, the system develops strong predictions of the underlying physical structure of the protein.

Why it matters

For 50 years, researchers in Biology have been looking for a method to determine tertiary structure using only the information from the primary structure. This is essential as the tertiary structure is closely linked to its function. Therefore, knowing a protein’s tertiary structure unlocks a greater understanding of what it does and how it works.

What’s next

The DeepMind team states that they’re “optimistic about the impact AlphaFold can have on biological research and the wider world, and excited to collaborate with others to learn more about its potential in the years ahead. Alongside working on a peer-reviewed paper, we’re exploring how best to provide broader access to the system in a scalable way.”

‘Too Long; Didn’t Read’ comes to scientific literature

A new state-of-the-art summarization model is being used to distill the information of AI research papers into a single sentence


In recent years, many different summarization models have been released. Their common goal: reduce reading time without compromising understanding. You can easily find online bots such as summarizebot, summarization APIs such as one from DeepAI, and articles explaining the key technical concepts behind these types of models. What’s the catch? The common flaw of these models is that they don’t generalize well. If applied to text that is uncommon in the dataset that was used for training, the model will perform significantly worse.

What’s new

Researchers from the Allen Institute “introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression, requiring expert background knowledge and complex language understanding.” This quote is a summarized version of the abstract of their paper using the method described in said paper.

Using a multitask learning strategy on top of pretrained BART, the researchers trained a model on SciTLDR, a dataset they compiled for the task. By analyzing only a paper’s abstract, intro, and conclusion (for computational reasons), the method can summarize 5,000-word articles in as few as 20 words.
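The generation itself relies on the fine-tuned BART model, but the extreme-compression flavor of the task can be illustrated with a naive extractive stand-in (emphatically not the authors’ method): pick the abstract sentence sharing the most words with the title.

```python
import re

def words(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def naive_tldr(title, sentences):
    """Crude extractive stand-in for TLDR generation: return the
    sentence with the largest word overlap with the paper title."""
    title_words = words(title)
    return max(sentences, key=lambda s: len(title_words & words(s)))

title = "TLDR: Extreme Summarization of Scientific Documents"
abstract = [
    "We introduce TLDR generation for scientific documents.",
    "Our dataset contains expert-written summaries.",
    "Experiments show strong results over baselines.",
]
print(naive_tldr(title, abstract))  # prints the first sentence
```

An abstractive model like BART goes much further, composing a new sentence rather than copying one, which is what the high-compression task demands.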


Source: Semantic Scholar

The AI solution has been deployed as a beta-version on Semantic Scholar. Displaying the TLDR of articles directly on the search results page enables you to quickly locate the right papers for you. The feature is already available for nearly 10 million computer science papers, and counting!

Why it matters

Staying up to date with scientific literature is an essential part of a researcher’s workflow. However, parsing through a long list of papers from different sources by reading abstracts is extremely time-consuming.

TLDRs can help researchers make quick and informed decisions about which papers are relevant to them. TLDRs also provide paper summaries for explaining the content in other contexts, such as sharing a paper on social media platforms.

What’s next

Summarizing a paper in 20 words gives you a good idea of its direction. However, in complex domains such as Computer Science, a couple of dozen words is not enough to distill the information. A possibility for the future might be dynamic N-sentence summarizers.

Making Robots less biased than humans

Researchers in Robotics have committed to actively ensuring fairness in AI-driven solutions


Almost all police robots in use today are straightforward remote-control devices. However, more sophisticated robots are being developed in labs around the world. Increasingly, they use Artificial Intelligence to integrate many more complex and diverse features.

Many researchers find this problematic. In fact, several AI algorithms for facial recognition, predicting people’s actions, or nonlethal projectile launching have led to controversy in the past few years. The reason is clear: many of these algorithms are biased against people of color and other minorities. Researchers from Google have argued that the police shouldn’t use this type of software. Beyond that, some private citizens are now using facial recognition against the police, as mentioned in a previous digest.

What’s new

Earlier this year, hundreds of AI and robotics researchers committed to actively changing some practices in their field of work. A first Open Letter from the organization Black in Computing states that “the technologies we help create to benefit society are also disrupting Black communities through the proliferation of racial profiling.” A second statement, “No Justice, No Robots”, calls for its signers to refuse work with or for law enforcement.

Researchers in robotics are trained to solve difficult technical problems. They are rarely educated to consider the societal questions raised by the robots they build. Nevertheless, they have committed themselves to actions whose end goal is to make the creation and usage of AI in Robotics more just.


Source: Wes Frazer for The New York Times

Why it matters

The adoption of AI systems is growing exponentially. Today there are AI systems built into self-driving cars meant specifically for the detection of pedestrians. A study by Benjamin Wilson and his colleagues from Georgia Tech has found that eight such systems were significantly worse at detecting people with darker skin tones than lighter ones.

As Dr. Jason Borenstein, a public policy researcher from Georgia Tech, puts it: “it is disconcerting that robot peacekeepers, including police and military robots, will, at some point, be given increased freedom to decide whether to take a human life, especially if problems related to bias have not been resolved.”

What’s next

The root cause of this issue, as Dr. Odest Chadwicke Jenkins (one of the main organizers of the open letter mentioned above) from the University of Michigan states, “is representation in the room — in the research lab, in the classroom, and the development team, the executive board.”

In parallel, some technical progress is trying to mitigate the potential unfair outcomes of AI systems. For instance, Google has developed Model Cards, a system for building a shared understanding of AI models, as mentioned in a previous digest that discussed background features in Google Meet. In a Model Card, bias is tested across different geographies, skin tones, and genders. The method clearly identifies the metrics used and the results found, adding a lot of transparency and accountability to Machine Learning modeling.

Additionally, the market for synthetic datasets is growing rapidly. This methodology, which is covered in more detail in a previous digest, makes it possible to balance datasets that could otherwise produce unfair outcomes.


Robot Recyclers, Hate-Speech on Facebook, and Ethical GANs


Robots fight for our Climate

Recycling robots leverage the power of AI to make the cleaning of single streams of waste financially viable


At the end of 2017, China began closing its borders to imports of recyclable waste. In response, western countries, which used to be the main waste exporters to China, were forced to strengthen their domestic waste processing. With the rise of IoT technology in industrial settings in recent decades, it comes as no surprise that western countries have turned to robotic technologies to solve this problem.

What’s new

Founded last year, AMP Robotics, a company from Louisville, Colorado, USA, sells and leases AI-driven recycling robots. Having raised $23 million in venture funding, the company has sold or leased 100 robots to more than 40 recycling plants across the globe.


Source: AMP Robotics

The robots work on a stream of recycling in which paper, plastics, and aluminum are mixed together. These mixed streams, known as single-stream recycling, are analyzed using proprietary computer vision techniques. The company’s impressive self-reported 99% accuracy rate outperforms competing technologies such as optical sorters.


Source: AMP Robotics

When compared to humans, who are able to pick up 50 pieces of waste per minute on average, robots can pick up 80.

Why it matters

After China’s ban, western countries’ recycling stream was not pure enough. AMP’s robots allow for a cleaner recycling output with more downstream market value.

What’s next

AMP has already started working on new projects. Extending beyond single-stream recycling, they have started handling waste from electronics recycling as well as construction and demolition facilities.

How Facebook handles harmful content

Facebook AI Research reveals how Machine Learning is used to handle different forms of harmful content


As one of the leading social media platforms, Facebook is reliant on scalable and intelligent solutions to detect harmful content. The company has implemented a range of specific policies and products whose goal is to mitigate the spread of misinformation and harmful content on its platform. In short, these include (1) adding a warning to content that has been rated by third-party fact-checkers, (2) reducing the distribution of harmful content, and (3) removing misinformation if it can contribute to imminent harm.

The modern political environment is becoming increasingly polarized (which is partly due to social media platforms such as Facebook and the spread of misinformation). Therefore, it is interesting to see how large corporations such as Facebook handle scaling their efforts in detecting and mitigating the spread of harmful content and misinformation.

What’s new

FAIR (Facebook AI Research) has recently developed two new Artificial Intelligence technologies to help protect people from hate speech. They claim to have proactively detected 94.7% of hate speech in Q3 2020 (compared to 80.5% in Q3 2019 and 24% in Q3 2017) using these new technologies.

The first technology is called Reinforced Integrity Optimizer (RIO) and integrates real online examples and metrics directly into the training of its classification models.


Source: FAIR

The second technology, called Linformer, decreases the computational requirements of training state-of-the-art models based on the Transformer architecture. By approximating self-attention with a mechanism whose cost is linear in sequence length, it makes it possible to train these models on longer pieces of text. The code is available online.
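The core idea, as described in the Linformer paper, is to project the length-n key and value matrices down to a fixed size k before attention, shrinking the n×n attention map to n×k. A rough numpy sketch (names and dimensions are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Linear-complexity self-attention: E and F are (k, n) projections
    that compress keys and values along the sequence dimension, so the
    attention map is (n, k) instead of (n, n)."""
    d = Q.shape[-1]
    K_proj, V_proj = E @ K, F @ V            # (k, d) each
    scores = Q @ K_proj.T / np.sqrt(d)       # (n, k) attention map
    return softmax(scores) @ V_proj          # (n, d) output

rng = np.random.default_rng(0)
n, d, k = 128, 16, 8                         # k << n gives the saving
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
E, F = (rng.normal(size=(k, n)) for _ in range(2))
print(linformer_attention(Q, K, V, E, F).shape)  # (128, 16)
```

Because k stays fixed as n grows, memory and compute scale linearly with sequence length rather than quadratically.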


Source: FAIR

Why it matters

Facebook currently uses both RIO and Linformer in production to analyze harmful content in many different regions around the world.

These gains in efficiency and coverage are paramount in dealing with hate speech and harmful content before it has a chance to spread.

What’s next

In the long term, Facebook’s objective is to “deploy a state-of-the-art model that learns from text, images, and speech and effectively detects not just hate speech but human trafficking, bullying, and other forms of harmful content”. There is a long way to go before this objective becomes a reality, and as such users must remain wary of the adverse effects a social media platform such as Facebook can have.

Others will argue that these adverse effects have little to do with hate speech or overtly harmful content itself. As long as internet platforms run systems wired to predict which information will keep you scrolling rather than which information you ought to know, it remains wise to get your news from other sources as well.

Ethical considerations for GANs

Ethical considerations of GANs arise in the face of improved portrait-generating technologies


GANs, or Generative Adversarial Networks, are a powerful Artificial Intelligence tool able to generate new data that bears strong statistical relations to the training dataset. The method is used for diverse applications such as the creation of synthetic datasets.

GANs have also been used to create stock photos with dummy faces. The goal is to demonstrate and promote diversity, which can in turn generate business opportunities. This particular use of GANs leads to numerous ethical implications.

What’s new

Nowadays, GANs have gotten so powerful that their results are almost indistinguishable from real photographs. Additionally, one can now easily customize and edit the generated content. Modifying the shade of the skin or the color of one’s hair is possible at the click of a button.


Source: Medium Towards AI

The business opportunities resulting from this technological advance are endless. In fact, promoting ‘counterfeit’ diversity is now cheaper than ever before. Photo agency databases predominantly contain images of white men: minorities are severely under-represented. Using GANs, businesses can quickly produce thousands of fake photographs depicting diverse subjects. However, relying on fake diversity risks creating a mere illusion of it: a company can publicly promote an image of diversity while still ostracizing minorities in practice.


Source: NY Times

From an ethical perspective, the use of this technology raises several discussions related to the validity of what can be found online, even from trustworthy sources.

Why it matters

GANs are considered one of the most powerful machine learning technologies available today. However, awareness should be raised concerning the ethical implications of these techniques. They are already being used, for instance, by people impersonating journalists on Twitter with generated profile pictures.

What’s next

The progress observed between 2014 and today is quite extraordinary, which raises the question: where will this technology stand 5 years from now? Increased interest and awareness concerning the ethical use of AI technologies are necessary. AI governance and accountability are important issues that must be confronted sooner rather than later.



Visium Weekly Digest Week 48


Questioning Augmented Video Surveillance

Modern surveillance systems are getting better, which is raising ethical questions around how they are trained and used


Surveillance systems as we know them are used for live monitoring by operators or post-hoc verification in case of accident or theft. However, adding data-driven analytics to these systems (where the cameras are already installed) allows for breakthrough advances in surveillance and detection. These methods can reduce response time and greatly increase efficiency. For instance, Vaak, a Japanese start-up, has developed systems that monitor suspicious attributes among shoppers and alert retail store managers through smartphone notifications. The goal here is prevention, and the 77% drop in shoplifting losses attests to the system’s effectiveness. Usually, if a suspicious target is approached and asked whether they need any help, that attention alone is enough to curb the probability of theft.

What’s new

The potential advantages of these techniques are indubitable. However, if such a system is trained on biased data or processes sensitive information, it can clearly be dangerous in the wrong hands.

Recently, a surveillance system in Buenos Aires has sparked a lot of debate because it mixes personal information with criminal records. Furthermore, it uses personal information about minors, which goes against the international Convention on the Rights of the Child (ratified by Argentina in 1990).


Source: Getty Images

The system cross-references two databases: one of outstanding arrest warrants and one of facial images. It has led to the arrest of up to 595 suspects per month with a low false positive rate (5 of the 595 in that particular month).

Why it matters

Video surveillance systems have very strong potential for both companies and governments. However, Buenos Aires’ system violates international human rights law. Furthermore, it uses criminal records as training data, which is known to be highly biased partly due to historical systemic inequalities.

What’s next

All AI solutions, but especially those used for surveillance purposes, need clean and compliant data to ensure equal social outcomes and fairness in real-life applications. There remains a clear lack of large-scale governance and standardized ethical frameworks for high-stakes Machine Learning solutions.

AI for Healthcare

Healthcare is being transformed by a rise in the adoption of data-driven solutions


The market for the Internet of Medical Things (IoMT) is expanding rapidly. From glucose monitors to MRI scanners, sophisticated sensors are increasingly being matched with AI-powered analytics. The IoMT market, which Deloitte estimates will grow to 158.1 billion USD by 2022, is undertaking a mission to improve efficiency, lower care costs, and drive better health outcomes in healthcare using data-driven insights.

Recently, COVID has highlighted the need for remote patient monitoring. In order to lower hospital readmissions and emergency visits, the large majority of healthcare providers are beginning to invest in remote systems. These allow for monitoring essential health metrics for at-risk patients without needing a visit to the doctor.

What’s new

Last week, the US Center for Medicare & Medicaid Services (CMS) stated it would start reimbursing for the use of two novel Augmented Intelligence systems. The first, called IDx-DR, can diagnose diabetic retinopathy, a diabetes complication that can cause blindness, using retina scans.



The second is a piece of software from Viz.ai called ContaCT. The system can alert a neurosurgeon when a CT scan shows evidence that a patient has a blood clot in their brain. Rapid diagnosis is essential in these situations, as saving a few minutes can dramatically reduce potential disabilities. Results show that Viz ICH is 98% faster than the standard of care.

In other news, a real-world oncology analytics platform called Cota Health has recently raised $10 million. By organizing fragmented real-world data, their solution yields insights into cancer treatments and variations in care delivery.

Why it matters

The willingness to pay for the standardized use of AI tools is great news for other companies working on medical AI products. It should be noted, however, that these solutions are not replacing healthcare workers. Instead, the solutions provide augmented intelligence that allows the workers to spend more time on essential tasks. These data-driven analytics act as a support system for healthcare, enabling a more informed decision-making process.

What’s next

While the increasingly-connected environment of IoMT brings a lot of advantages and increases the efficiency of many processes, one must not forget the new security risks that come along with them. These sophisticated sensors act as edge-devices in their respective networks. As such, they open up new vulnerabilities for cybercriminals to exploit.

In general, it is paramount for the future of healthcare that adequate AI governance is put in place to mitigate these drawbacks. Furthermore, there are many other factors such as data privacy, fairness, and ethics that come into play when deploying data-driven solutions to real-world high-stake environments.

Hum to Search

Google has revealed the Machine Learning technology behind their Hum to Search feature


It is no surprise that Google’s Now Playing and Sound Search features are powered by Machine Learning approaches. Released in 2017 and 2018 respectively, they use deep neural networks to identify songs by using your device’s microphone. While the aforementioned features are accurate at finding played songs in multiple settings and environments, you still couldn’t find the song responsible for that melody stuck in your head! Frustrating, especially when research suggests the best way to get rid of an earworm is by listening to the song in question.

What’s new

Google released the Hum to Search feature in October. Just last week, the Google researchers responsible for the feature revealed the Machine Learning behind it.

As you might imagine, a studio recording is quite different from a hummed song. Often, the pitch, tempo, and rhythm vary significantly between the two. Fortunately, drawing on experience from building the older features, the researchers knew how to spot the similarities between spectrograms.


Source: Google AI Blog

Using this knowledge coupled with a state-of-the-art Deep Learning retrieval model, they were able to match millions of songs to hummed melodies with decent accuracy.
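Because a hummed query rarely matches a recording’s tempo, melody matching is classically illustrated with dynamic time warping over pitch contours. The toy sketch below is that classical idea, not Google’s actual embedding-based retrieval model:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D pitch contours,
    tolerating local tempo differences between query and reference."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

query = np.array([60, 62, 64, 65])            # hummed melody (MIDI pitches)
song_a = np.array([60, 60, 62, 62, 64, 65])   # same tune, slower tempo
song_b = np.array([55, 57, 55, 53])           # different tune
print(dtw_distance(query, song_a) < dtw_distance(query, song_b))  # True
```

The deep retrieval model replaces this pairwise alignment with learned embeddings, which is what makes matching against millions of songs tractable.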

Click here to learn more about the method and additionally how the researchers tackled the challenge of obtaining enough training data.

Why it matters

This recent development shows that useful data can be extracted from sound recordings. Applying Machine Learning algorithms to processes that involve sound in other environments can be of immense use. Recently, Visium has developed a sound-based solution called ListenToMachines, helping Nestlé tackle predictive maintenance in factories.



Visium Weekly Digest Week 47


Detecting COVID using forced cough recordings

The release of an infection-detection model using cough recordings, as well as a COVID-19 simulator, is at the forefront of AI developments tackling the virus’ spread


With the second wave of the Coronavirus disease in full swing all around the world, researchers are leveraging data-driven methods in an effort to reduce virus spread. The ability to use aggregated data from the first wave allows for a diverse set of new ideas and accuracy improvements in existing solutions. Recently, we covered a virus identification technique using Computer Vision from researchers at Oxford University.

What’s new

Last week, researchers from MIT and Harvard published a paper in the IEEE Open Journal of Engineering in Medicine and Biology putting forward a diagnostic tool using only cough recordings.

As you can imagine, coughs from COVID-19-negative and COVID-19-positive patients cannot be distinguished by the human ear. However, a model based on a Convolutional Neural Network architecture can distinguish them with high accuracy. Indeed, it identified 98.5% of coughs from patients who tested positive, including 100% of coughs from asymptomatic patients.
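A CNN like this operates on a time-frequency image of the audio rather than the raw waveform. As a generic illustration of that preprocessing step (a plain short-time Fourier transform; the paper itself uses MFCC-based features), consider:

```python
import numpy as np

def spectrogram(signal, frame=256, hop=128):
    """Magnitude spectrogram: slice the waveform into overlapping
    windowed frames and take the FFT magnitude of each, yielding the
    2-D time-frequency image a CNN can classify."""
    frames = [signal[i:i + frame] * np.hanning(frame)
              for i in range(0, len(signal) - frame + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, time_steps)

sr = 8000
t = np.arange(sr) / sr                            # one second of audio
cough_like = np.sin(2 * np.pi * 440 * t) * np.exp(-5 * t)  # decaying burst
S = spectrogram(cough_like)
print(S.shape)  # (129, 61)
```

The resulting image is then fed to convolutional layers exactly as a photograph would be, which is why architectures from computer vision transfer so readily to audio.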


Source: Laguarta et al. 2020

Interestingly, the research group’s prior work included similar algorithms for the identification of Alzheimer’s disease. Unfortunately, the code does not seem to have been made available to the public.

In other news, AWS recently open-sourced a COVID-19 Simulator and Machine Learning Toolkit. The goal is to enable data scientists to better model and understand disease progression in a given community over time. This is done by modeling the disease progression for each individual using a finite state machine. Furthermore, the simulator allows for testing the impact of various ‘what-if’ intervention scenarios. The code is available here.
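The per-individual finite state machine behind such a simulator can be sketched as a probabilistic transition table. The states and daily transition probabilities below are purely illustrative, not AWS’s calibrated parameters:

```python
import random

# Hypothetical daily transition probabilities; the real simulator
# calibrates these per community and supports intervention scenarios.
TRANSITIONS = {
    "susceptible": [("exposed", 0.02)],
    "exposed":     [("infectious", 0.25)],
    "infectious":  [("recovered", 0.10)],
    "recovered":   [],  # terminal state in this toy model
}

def step(state, rng):
    """Advance one individual's disease state by one day."""
    for nxt, p in TRANSITIONS[state]:
        if rng.random() < p:
            return nxt
    return state

rng = random.Random(42)
state, history = "susceptible", []
for _ in range(120):                 # simulate 120 days
    state = step(state, rng)
    history.append(state)
print(history[-1] in TRANSITIONS)    # True: always a valid state
```

Running many such individuals in parallel and aggregating their states over time yields the community-level progression curves; ‘what-if’ interventions amount to changing the transition probabilities mid-run.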

What’s next

With regard to the cough detection tool, the team is currently looking into deploying the model in a user-friendly app. If approved by the FDA, the app could enable many use cases: daily country-wide screenings, outbreak monitoring, and the selection of candidates for test pooling.

In fact, it would give access to a free, convenient, and non-invasive pre-screening tool. Patients could log in every day, forcibly cough into their phone’s microphone, and get information on whether they are possibly infected and hence should take a formal test. This reminds us that for data-driven solutions to work in a real-life setting, the insights must be actionable.

As they propose in the paper, “Pandemics could be a thing of the past if pre-screening tools are always on in the background and constantly improved.”

The AWS COVID-19 simulator aims to encourage data-driven decisions with regard to restrictions.

Why it matters

This research shows that using data can lead to a plethora of different solutions to a unique problem. With a complex problem such as a pandemic, many factors are at play. The large majority of these factors can be monitored, tracked, and modeled in some way, shape, or form.

Here, the unconventional idea of using cough recordings for disease detection leads to a non-invasive diagnostic tool that is essentially free and offers quasi-unlimited throughput, real-time results, and longitudinal monitoring.

AI in the hands of the oppressed

US citizens use Computer Vision models to identify abusive law enforcement officials


Developing and deploying Machine Learning solutions is not something anyone can do. Not only do you need the technical know-how, but you also need the data. For a long time, only large tech companies were able to deploy large-scale, robust, high-stakes Machine Learning solutions. Recently, some individuals have proven that open-source tools, knowledge, and software can be enough to build a Machine Learning solution on one’s own.

What’s new

Private citizens have been using face recognition software to identify abusive law enforcement officials. Using publicly available models and crowd-sourced datasets, their solutions can identify police officers in photos and videos. Christopher Howell from Portland, Oregon, is one of these individuals. Using images from the news, social media, and a public dataset, Cops Photo, he developed a model that can recognize about 20% of the city’s police force.

In Belarus, which is in the midst of a highly disputed presidential election, an individual named Andrew Maximov has designed a similar solution to identify mask-wearing police officers. He demonstrates the solution in a YouTube clip.

Source: Andrew Maximov YouTube

What’s next

In some jurisdictions, police officers are not required to display their name tags and are allowed to wear face masks. Moreover, the number of protests has increased greatly in the past decade, and in some countries and circumstances these protests can become incredibly violent. These examples show that in the hands of citizens, AI tools can increase police accountability and stem abuse. On the other hand, such tools could also be used with malicious intent, harassing officers who have done nothing wrong. Worse, a homegrown solution could be less accurate than professional systems and identify the wrong officers.

Why it matters

Face recognition is a double-edged sword in a politically polarized world. This shows that adequate governance with respect to the democratization of Artificial Intelligence is essential. The use of these tools by individuals, companies, or governments, comes with immense responsibilities.

Tackling group distribution shift in production

Researchers from Berkeley have developed Adaptive Risk Minimization, a meta-learning approach for tackling group shift


There is a huge difference between Machine Learning in Research and in Industry. In research, an enormous amount of importance is put on the model’s performance. Unfortunately, using benchmarks and standardized datasets does not reflect the use of these solutions in real life.

In fact, so much is needed in addition to the model training and prediction scripts that there has been a recent boost in DevOps for Machine Learning solutions (also referred to as MLOps) led by Allegro AI, MLflow, Weights & Biases, DataRobot, DVC, Snorkel AI, and many others. The specific features of all of these platforms differ in some way, shape, or form. However, what brings them all together is the idea of standardized and collaboration-friendly tools for monitoring, testing, and versioning ML products (as well as their data, artifacts, etc.) in both development and production.

One very important aspect is distribution shift in the production setting. Consider handwriting transcription: what happens when end users write differently from the writers the training data was collected from?

What’s new

This week, a team of researchers from Berkeley Artificial Intelligence Research (BAIR) published a paper proposing a new meta-learning approach: Adaptive Risk Minimization. The method aims to tackle group distribution shift, which occurs when the training and testing data are not drawn from the same underlying distribution. The shift can stem from temporal correlations, specific end users, or many other factors, and it occurs in almost all practical Machine Learning applications. A fundamental assumption is thus violated, and real-life results may not reflect the test accuracies obtained during the training phase.

The method presented in the paper assumes that the training data is distributed in groups. The data processed in production can then be modeled as a shift in the group distribution. Meta-learning is used to train a model that can adapt to this shift at test time.
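To make the idea concrete, here is a toy numpy sketch of test-time adaptation in the spirit of the method (a deliberate simplification with made-up data, not the authors' implementation): a fixed classifier fails on a group with an unseen offset, while a model that recenters itself using an unlabeled test batch recovers.

```python
import numpy as np

# Toy sketch of test-time adaptation in the spirit of Adaptive Risk
# Minimization (hypothetical simplification, not the authors' code):
# each "group" writes the same two classes with a different offset,
# and the adaptive model estimates that offset from an unlabeled
# test batch before classifying.

rng = np.random.default_rng(0)

def make_group(offset, n=200):
    """1-D features: class 0 near offset-1, class 1 near offset+1."""
    y = rng.integers(0, 2, n)
    x = offset + (2 * y - 1) + 0.3 * rng.standard_normal(n)
    return x, y

def predict_static(x):
    # Non-adaptive model: threshold learned on offset-0 training data.
    return (x > 0).astype(int)

def predict_adaptive(x_batch):
    # Adaptation: the batch mean estimates the group's offset because
    # classes are balanced around it; recenter, then threshold.
    return (x_batch - x_batch.mean() > 0).astype(int)

# A test-time group with a large unseen offset ("new handwriting style").
x, y = make_group(offset=5.0)
acc_static = (predict_static(x) == y).mean()
acc_adaptive = (predict_adaptive(x) == y).mean()
print(f"static: {acc_static:.2f}  adaptive: {acc_adaptive:.2f}")
```

The static model predicts everything as the positive class once the offset pushes all features past its threshold, while the batch-adapted model stays accurate.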

Source: Marvin Zhang

Let us go back to the example mentioned above. If the model is analyzing handwriting from a user, it can use batched letters (all the letters in a sentence or paragraph) to adapt to that user's distribution. For instance, if the user writes the digit 2 without a loop, then an ambiguous shape that a classical approach might read as either the letter a or the digit 2 is most likely the letter a. An illustration is provided above.

What’s next

The authors believe their method and its empirical results “convincingly argue for further study into general techniques for adaptive models”. The outcomes and consequences of AI solutions in real life gain an increasing amount of importance. Indeed, new subject areas for the upcoming Conference on Computer Vision and Pattern Recognition include Deepfake detection, Ethics in Vision, Fairness, Accountability, Privacy, and Dataset bias. It seems clear that adaptive models will become crucial for Machine Learning to achieve its potential in complex, real-world environments.

Why it matters

Handling data from new users is far from the only potential application for adaptive models. The authors state that “in an ever-changing world, autonomous cars need to adapt to new weather conditions and locations, image classifiers need to adapt to new cameras with different intrinsics, and recommender systems need to adapt to users’ evolving preferences”. Humans have clearly demonstrated that they can adapt by inferring information using examples from test distributions. Will humans be able to develop methods that can allow machine learning models to do the same?


Sign up to get the digest directly in your inbox!


Visium Weekly Digest Week 46


Background Features in Google Meet

Google launches background features in its online video conference platform that work directly in your browser.


As the working-from-home culture quickly spreads around the world, there is growing demand for added functionality in conferencing applications. To keep the focus on the meeting itself and prevent distractions from the setting or background objects, several applications have implemented background modification tools. Unfortunately, these functionalities typically require extensive computing power and often the installation of specific software.

What’s new

Google AI researchers have recently developed background features for Google Meet, their Workspace’s real-time meeting application. The new set of features allows users to (1) slightly blur, (2) heavily blur, or (3) completely replace their background.

The interesting aspect here is that everything runs in high quality directly in your browser! Researchers used MediaPipe, Google’s open-source Live ML platform, and WebAssembly, a low-level web code format, to achieve high performance at extremely fast inference speed. For more information about making this solution run on a wide range of devices and with low power consumption, read the article.
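For intuition, here is a minimal numpy sketch of how a background-blur feature can composite its output once a segmentation mask is available (the compositing step is an assumption about the general approach, not Google's actual pipeline):

```python
import numpy as np

# Minimal sketch of background-blur compositing (illustrative assumption,
# not Google's implementation): a per-pixel foreground mask from a
# segmentation model blends the sharp frame with a blurred copy of itself.

def box_blur(img, k=5):
    """Naive box blur: average over a k x k window via shifted sums."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def composite(frame, mask, k=5):
    """mask is 1.0 on the person, 0.0 on the background."""
    return mask * frame + (1.0 - mask) * box_blur(frame, k)

frame = np.random.default_rng(1).random((32, 32))
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0          # pretend the center region is the person
out = composite(frame, mask)
print("foreground pixel unchanged:", np.allclose(out[12, 12], frame[12, 12]))
```

Foreground pixels pass through untouched while background pixels are smoothed; the real feature replaces the toy blur and hand-drawn mask with the segmentation model's output.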

Source: Google AI Blog

Lastly, Google provides a Model Card for the segmentation model used in these features. It lays out many interesting aspects of the model, such as its limitations and ethical considerations. Furthermore, it provides the evaluation metrics used for evaluating model performance and detailed results testing fairness across geographic regions, skin tones, and gender. This Model Card format, based on a paper by Google, is relevant for shedding light on intended use cases and promoting transparent model reporting.

Why it matters

The complexity of a Machine Learning model is not necessarily related to its potential utility in the real world. Here, Google researchers show that a fairly simple and lightweight segmentation model optimized for web performance can have a tremendous impact on a daily application. Regardless of their complexity, successful solutions are transparent, accessible, and suited to common real-life use cases.

AI Governance

AI governance around ethics, fairness, transparency, and explainability is paramount when putting Machine Learning solutions into production.


When coming out of the research setting, AI models can introduce unique problems. Training data often doesn’t reflect real-life data. Be it through errors, duplication, or bias, when training data is flawed, the model doesn’t perform well. Even worse, it could produce discriminatory or unfair results.

Additionally, models go stale over time. The inference quality of a model is known to drift as the input stream becomes increasingly different from the data the model was trained on.

ML tools for production (controversially called ‘MLOps’) are on the rise: tools such as Allegro AI and MLflow from Databricks advertise end-to-end ML operations management, from experiment tracking to deployment in production. With or without these tools, companies today need to define process management frameworks that take all external factors into account. Read more in this Forbes article.

What’s new

Adding on to the fundamental requirements formulated by the EU for trustworthy AI, BMW Group has written its code of ethics for AI. It states seven basic principles covering the use of AI within the company, which are displayed in the image below.

Source: BMW Group

The code of ethics is a great start and shows BMW’s hands-on approach to tackling AI governance. However, concepts such as ethics, fairness, explainability, and transparency are still topics of debate in the AI industry. They are ever-changing, which is what makes AI governance so challenging. BMW affirms that this list will be refined and adapted continuously.

Why it matters

It is fundamental for any company using AI in their products or services to define internal AI governance. Its widespread use and increasing diversity of use cases demonstrate the need for companies to manage their processes and take responsibility for their products’ outcomes. Especially as AI is being democratized and increasingly leveraged in small and medium enterprises.

GANs with small datasets

Dynamic data augmentation allows GANs to produce good results with less data.


Generative Adversarial Networks often need huge amounts of data for good results. You might think this is a non-issue with the seemingly unlimited supply of images online. However, it remains challenging to collect a large dataset for an application with specific constraints. Constraints can be subject type, image quality, location, privacy, and many more. Unfortunately, when trained on small datasets, GANs tend to replicate the training data or output extremely noisy results.

What’s new

Researchers from Nvidia have developed a discriminator augmentation mechanism to stabilize training in scenarios where less data is available. The technique, called Adaptive Discriminator Augmentation, dynamically augments the training data with image scaling, rotations, color transformations, and more. The key is to apply these common augmentation techniques in the right proportion to prevent overfitting.
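The adaptive part can be sketched as a small feedback controller (the update rule is paraphrased; the target and step constants below are illustrative, not the paper's values): the augmentation probability is nudged up when an overfitting heuristic exceeds a target, and down otherwise.

```python
import numpy as np

# Hedged sketch of the adaptive control loop behind Adaptive Discriminator
# Augmentation (paraphrased idea, illustrative constants): the augmentation
# probability p rises while the discriminator overfits and falls otherwise.

def update_p(p, r_t, target=0.6, step=0.01):
    """r_t in [0, 1] is an overfitting heuristic, e.g. the fraction of real
    images the discriminator confidently scores as real (near 1.0 suggests
    it has memorized the training set)."""
    p += step * np.sign(r_t - target)
    return float(np.clip(p, 0.0, 1.0))

# Simulated run: the discriminator overfits early (high r_t), then settles.
p = 0.0
for r_t in [0.9] * 30 + [0.55] * 10:
    p = update_p(p, r_t)
print("augmentation probability:", round(p, 2))
```

With 30 "overfitting" steps up and 10 steps back down, p settles around 0.2; in training this value would modulate how often each augmentation is applied to discriminator inputs.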

Training a StyleGAN model on the Flickr Face High-Quality dataset, researchers were able to significantly improve the evaluation metrics when compared to the baseline StyleGAN model they used. In fact, their version even beat the baseline model that was trained on a dataset that is 5 times larger! These results are shown in the figure below, taken from the paper.

Source: Karras et al. 2020

Why it matters

It takes tens of thousands of images to train a GAN successfully, and gathering all the necessary data is an extremely resource-intensive task. Reducing the number of images needed by an order of magnitude can notably reduce this effort. Indeed, Adaptive Discriminator Augmentation makes GANs more accessible and increases the feasibility of high-stakes Machine Learning tasks.


Visium Weekly Digest Week 45


Producing fair outcomes with synthetic data

Not only can the use of synthetic data make high-stakes use cases more feasible, it can also reduce bias found in datasets


The generation of synthetic datasets for training Machine Learning systems is becoming more popular. This technique, which uses generative AI, allows engineers to have larger datasets that carry the same properties as the original dataset. Furthermore, as the synthetic data is not based on real-world sampling, it is privacy-preserving.

What’s new

Synthetaic, a company focused on creating synthetic data for high-stakes Machine Learning solutions, has just raised $3.5M in seed funding.

The company’s founder, Corey Jaskolski, got the idea when he was creating a full digital record of one of the last remaining Sumatran Rhinos in Indonesia. The 3D scan (displayed below) was so realistic, he thought it was a photo. He argues that “if my synthetic digitized rhino is indistinguishable from a photo, is it as real as a photo? I realized that if I can create 3D models that look real to me, I can use these images to train AI systems.”

Source: Corey Jaskolski

This technique permits the generation of data where real-world examples are sparse. For example, for the detection of rare brain cancers or extremist stickers on cars, the imbalance between positive and negative examples heavily impacts the model’s performance.

Recently, Synthetaic worked on generating chest x-rays of COVID patients to assist doctors in the disease’s detection.

Source: Synthetaic

However, if you generate synthetic datasets from biased sources, and real-life datasets often are biased (e.g., the COMPAS recidivism dataset), you perpetuate that bias through your data augmentation. Therefore, always be aware of the potential biases that exist in your data and include parity penalties in your optimization procedures.
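As a concrete illustration, a parity penalty can be as simple as the gap between per-group positive rates (a generic demographic-parity regularizer sketched here, not a specific library's API):

```python
import numpy as np

# Illustrative sketch of a demographic-parity penalty (a generic fairness
# regularizer, not a specific library's API): penalize the gap between the
# model's average positive score in each group. During training, this term
# would be added, with a weight, to the task loss.

def parity_penalty(scores, group):
    """scores: predicted positive probabilities; group: 0/1 membership."""
    rate_a = scores[group == 0].mean()
    rate_b = scores[group == 1].mean()
    return abs(rate_a - rate_b)

scores = np.array([0.9, 0.8, 0.7, 0.2, 0.3, 0.1])
group = np.array([0, 0, 0, 1, 1, 1])
# Group 0 averages 0.8, group 1 averages 0.2: the penalty is ~0.6.
print("parity penalty:", parity_penalty(scores, group))
```

A model that scores both groups similarly incurs no penalty, so the optimizer is pushed toward fairer outcomes even when the (possibly synthetic) training data is skewed.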

Why it matters

Real datasets are often too small for adequate training. This heavily impacts the feasibility of high-stakes, high-reward Machine Learning use cases. While the generation of synthetic data can help solve this problem, researchers must still be aware of social biases in their original training datasets.

It is true that modifying the ratio of a feature in an original dataset during augmentation could be considered as injecting a new bias. In fact, your new dataset might not reflect reality as well as the original one. One might say that you are simply replacing one bias with another. Remember that the aim is not always to represent reality accurately but to produce fair outcomes.

Performers – the new and improved Transformers

By approximating Transformers’ attention mechanism, researchers have drastically reduced their computational cost


Transformers have recently revolutionized the Artificial Intelligence community. This type of deep learning model has demonstrated state-of-the-art performance in NLP. Promising results show that they are also relevant in Computer Vision tasks. Unfortunately, Transformers scale quadratically with the number of tokens. This leads to a heavy computational load when training these models. As a consequence, most AI teams are unable to leverage the power of this technique.

What’s new

A team of researchers from Google, the University of Cambridge, DeepMind, and the Alan Turing Institute proposes a new type of Transformer, dubbed the Performer. The new technique approximates regular full-rank-attention Transformers with high accuracy. As can be observed in the image below, the calculation of the attention mechanism is decomposed, reducing its operational cost from quadratic to linear. The paper contains extensive mathematical theory guaranteeing unbiased (or nearly unbiased) estimation of the attention matrix, uniform convergence, and low estimation variance.
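The decomposition can be illustrated in a few lines of numpy (a simplified sketch of the positive-random-features idea; it omits the 1/sqrt(d) scaling and the orthogonal-features refinement used in the paper): exp(q·k) is approximated by an inner product of random features, so the n x n attention matrix never has to be formed.

```python
import numpy as np

# Simplified Performer-style attention via positive random features
# (illustrative sketch, not the authors' full FAVOR+ implementation).

rng = np.random.default_rng(0)

def softmax_attention(Q, K, V):
    """Exact attention: O(n^2) time and memory in sequence length n."""
    A = np.exp(Q @ K.T)
    return (A / A.sum(axis=1, keepdims=True)) @ V

def performer_attention(Q, K, V, m=256):
    """Linear attention: O(n) in sequence length for fixed feature count m."""
    d = Q.shape[1]
    W = rng.standard_normal((m, d))
    # phi(x) = exp(Wx - |x|^2 / 2) / sqrt(m): E[phi(q).phi(k)] = exp(q.k)
    phi = lambda X: np.exp(X @ W.T - 0.5 * (X ** 2).sum(1, keepdims=True)) / np.sqrt(m)
    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)                       # never forms the n x n matrix
    den = Qp @ Kp.sum(axis=0, keepdims=True).T  # row-normalization term
    return num / den

n, d = 64, 8
Q, K, V = (0.3 * rng.standard_normal((n, d)) for _ in range(3))
exact = softmax_attention(Q, K, V)
approx = performer_attention(Q, K, V)
print("max abs error:", np.abs(exact - approx).max())
```

Associativity is the whole trick: computing Kp.T @ V first keeps every intermediate at most n x d or m x d, so cost grows linearly with sequence length.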

Source: Google AI Blog

The code is available online. Moreover, you can find a brief summary of the paper on Google AI Blog and an in-depth explanatory video review by Yannic Kilcher on YouTube.

Why it matters

Transformers provide an intelligent mechanism for identifying complex dependencies in input sequences. Unfortunately, that mechanism carries an immense computational cost, prohibiting their use. Performers use a different backbone mechanism to calculate attention, providing accuracy with linear (instead of quadratic) cost. This effectively makes the method more accessible, which in turn democratizes the use of Artificial Intelligence in both research and industry.

Interactive Data Science Communication

A new visual article about COVID-19 marks a trend of increasing interactive communication in Data Science


The past 40 years have seen a complete shift in how people communicate. The internet allows for instant transmission of information. In this age, sorting out the valid from the unreliable and unproven has become an enormous challenge.

Furthermore, video and audio formats make up an increasing amount of shared information. Explaining through text and data visualizations is difficult as a reader’s background heavily influences their level of comprehension.

What’s new

To cope with this, data science writers aim to make their articles more interactive. Displaying data dynamically enhances readers’ learning experience. They are able to play with the visualization and understand concepts clearly.

A beautiful example relevant to the current scenario is an article from the Financial Times Visual Journalism Team. The article, entitled “Covid-19: The global crisis — in data”, uses data from around the world to tell the Coronavirus story. Dynamic visualizations coupled with good storytelling and relevant external links make for a poignant article.

Source: Financial Times

The Financial Times Visual Journalism Team has created a plethora of other articles, which you can find here.

In the Machine Learning field, Distill is a publication platform for interactive articles. The platform aims to advance dialogue, promote outstanding communication, and support scientific integrity. Leveraging web tools allows for reactive diagrams, breaking free from the traditional PDF format. Examples include using t-SNE effectively (displayed below), attention and augmented RNNs, and a visual exploration of Gaussian Processes.

Source: Distill Pub

Why it matters

New tools, such as Observable, allow you to interact with your readers to convey information with style and clarity. In an age combining misinformation with a trend of ever-increasing amounts of generated data, communicating clearly and efficiently is paramount.

Sign up to get the digest directly in your inbox!


Visium Weekly Digest Week 44


The power of bio-inspired Artificial Intelligence

The imitation of the nematode’s nervous system using only 19 neurons shows promising results in the context of autonomous driving


Deep Neural Networks perform incredibly well when there is enough data to train them. Unfortunately, these models often don’t generalize well or efficiently. Furthermore, they require heavy computational power to train their hundreds of thousands of parameters.

What’s new

Inspired by the nematode’s nervous system, researchers developed a sparse recurrent neural network model called Neural Circuit Policies (NCP). The model first uses a small convolutional feature extractor to transform the camera’s input into structured features. These are subsequently fed into the NCP network, containing 19 neurons, whose role is to output the motor commands that control the car.
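The wiring idea can be sketched in a few lines (illustrative only: the actual NCP neurons follow an ODE-based dynamics model, which this toy replaces with a plain tanh recurrence over a fixed sparse connectivity mask):

```python
import numpy as np

# Toy sketch of a sparsely wired recurrent controller in the spirit of
# Neural Circuit Policies (illustrative simplification, not the authors'
# model): 19 units, a fixed sparse wiring, and a steering readout.

rng = np.random.default_rng(0)
n_units, n_features = 19, 32

W_in = 0.1 * rng.standard_normal((n_units, n_features))
W_rec = 0.1 * rng.standard_normal((n_units, n_units))
wiring = rng.random((n_units, n_units)) < 0.25   # keep ~25% of synapses
W_rec *= wiring                                   # enforce the sparse wiring
w_out = 0.1 * rng.standard_normal(n_units)        # motor (steering) readout

def step(h, features):
    """One recurrent update; tanh keeps the tiny state bounded."""
    return np.tanh(W_in @ features + W_rec @ h)

h = np.zeros(n_units)
for _ in range(10):                               # ten camera frames
    features = rng.standard_normal(n_features)    # stand-in for the CNN output
    h = step(h, features)
steering = float(w_out @ h)
print("steering command:", steering)
```

Because most recurrent weights are masked to zero, the wiring stays small and inspectable, which is part of what makes the real system auditable.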

Source: Neural circuit policies enabling auditable autonomy by Lechner et al.

This system shows promising results considering the size of the network. The worm’s neural system is minuscule but allows for locomotion, motor control, and navigation. These abilities are exactly what is needed for applications like autonomous driving. The authors state that “the system shows superior generalizability, interpretability, and robustness compared with orders-of-magnitude larger black-box learning systems”.

Source: What’s AI Medium

The code is open-sourced. There is also a video explaining the technique in more detail.

Why it matters

Autonomous driving is an important challenge for AI to tackle. Beyond combining technically complex systems, important ethical questions can arise. For these reasons, robustness and interpretability are key factors for the potential widespread adoption of autonomous driving.

Diagnose COVID-19 with Machine Learning

Scientists from Oxford have developed an extremely rapid Coronavirus diagnostic tool


The current testing framework for SARS-CoV-2 (more commonly referred to as Coronavirus) is mainly focused on viral testing. You’ve most probably already been subject to this type of test: a nasal swab detects the virus’s nucleic acid or antigen. An important drawback of this testing method is the response delay. Samples are sent to a laboratory where a method called PCR is performed. This protocol, followed by result extraction and communication to the patient, usually takes between 24 and 72 hours.

What’s new

Scientists from Oxford University have recently developed an extremely rapid diagnostic test, which can detect and identify different viruses (including SARS-CoV-2) in less than five minutes. The method uses images captured using a wide-field fluorescence microscope. The images are processed using adaptive filtering algorithms and analyzed using Machine Learning. More specifically, a Convolutional Neural Network is used to classify the image as containing SARS-CoV-2 or not.
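To illustrate the kind of preprocessing such a pipeline relies on, here is a hedged numpy sketch of background subtraction for particle enhancement (an assumed stand-in for the paper's adaptive filtering, not the authors' actual algorithm):

```python
import numpy as np

# Hedged sketch of fluorescence-image preprocessing (illustrative, not the
# paper's filter): subtract a local-mean background estimate so that small
# bright particles stand out before a classifier sees the image.

def local_mean(img, k=7):
    """Local average via shifted sums (a simple smoothing filter)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def enhance_particles(img, k=7):
    """Background-subtracted image: flat background -> ~0, particles remain."""
    return np.clip(img - local_mean(img, k), 0.0, None)

# Synthetic frame: smooth background gradient plus one bright particle.
img = np.linspace(0.2, 0.4, 64 * 64).reshape(64, 64)
img[30, 30] += 1.0
out = enhance_particles(img)
print("brightest residual at:", np.unravel_index(out.argmax(), out.shape))
```

After subtraction, the smooth background collapses toward zero while the particle survives, which is the kind of signal a downstream Convolutional Neural Network can then classify.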

Source: Virus detection and identification in minutes using single-particle imaging and deep learning by Shiaelis et al.

While the method works considerably better for the flu (85% accuracy), the results for detecting Coronavirus are promising (70% accuracy). Using state-of-the-art Computer Vision techniques could play an important role in speeding up viral testing.

It remains to be seen how such a method could be integrated into health systems around the world. Indeed, the integration and deployment of a Machine Learning project is often complex, as it needs to take all parts of the data pipeline into account: extraction, aggregation, processing, analysis, and result communication.

Why it matters

Reducing result delay for viral testing in a pandemic scenario has the potential of having a massive impact on virus spread and contamination. Solving this problem by extracting the necessary information from images and a Convolutional Neural Network demonstrates the potential of data-driven techniques.

Re-inventing NLP model testing

Adding a human-centric approach to NLP testing is revealing flaws in the best state-of-the-art models


AI development is driven by benchmarks. Whether it is ImageNet for Computer Vision tasks or GLUE and SQuAD for Natural Language Understanding tasks, benchmarks have been instrumental in driving AI progress. By laying a solid basis for model performance comparison, they lead researchers to improve models. However, many benchmarks, including the ones mentioned above, come with flaws: they contain artifacts, can be deceiving, are not human-centric, and researchers overfit to them. As stated in Goodhart’s law as generalized by Marilyn Strathern, “When a measure becomes a target, it ceases to be a good measure”.

What’s new

Facebook has recently developed Dynabench, an online tool where users can try to fool language models. The goal is to gather human input dynamically to measure progress in NLP more accurately. This follows a trend of earlier efforts to test NLP models using human input such as Trick Me If You Can and Beat the AI from researchers at the University of Maryland and UCL respectively.

CheckList, a collaborative effort between Microsoft Research and the University of Washington, is a task-agnostic method for behavioral testing of NLP models. Inspired by typical testing methods from Software Engineering, the researchers have developed a matrix for testing a large and diverse number of cases. It consists of three types of tests applied across a large array of linguistic capabilities, such as specific vocabulary, negation, semantic role labeling, fairness, and many others:

  1. Minimum Functionality Test (MFT) to target a specific behavior (similar to unit testing),
  2. Invariance Test (INV) for testing small perturbations that should not modify the result, and
  3. Directional Expectation test (DIR) for testing perturbations that should produce an expected result.
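The three test types can be mirrored in a few lines of plain Python (the `toy_sentiment` function below is a hypothetical stand-in model, not one from the paper; only the MFT/INV/DIR structure follows CheckList):

```python
# Toy versions of the three CheckList test types, applied to a stand-in
# sentiment "model" (hypothetical keyword counter, not from the paper).

def toy_sentiment(text):
    """Stand-in model: counts positive vs negative keywords."""
    pos = sum(w in text.lower() for w in ("great", "good", "love"))
    neg = sum(w in text.lower() for w in ("bad", "awful", "hate"))
    return 1 if pos >= neg else 0     # 1 = positive, 0 = negative

# 1. MFT: a targeted behavior must hold (like a unit test).
assert toy_sentiment("I love this airline") == 1

# 2. INV: an irrelevant perturbation (swapping a named entity) must not
#    change the prediction.
assert toy_sentiment("United was great") == toy_sentiment("Delta was great")

# 3. DIR: a directional perturbation (appending negative content) must not
#    move the prediction toward positive.
base = toy_sentiment("The flight was good")
perturbed = toy_sentiment("The flight was good but the crew was awful")
assert perturbed <= base

print("all behavioral checks passed")
```

Real CheckList suites generate thousands of such cases from templates; the point is that accuracy on a held-out set says nothing about whether these targeted behaviors hold.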

Some examples of these tests can be observed in the image below. The paper tests state-of-the-art models from Microsoft, Google, and Amazon, as well as BERT and RoBERTa. As can be observed in the table taken from the paper, the results show some alarming failure rates, even in the best models. Additionally, usage of the tool by researchers has proven to increase the number of tests performed and the number of identified bugs.

A recent seminar hosted by Stanford ML Systems with the paper’s first author can be found on YouTube. Furthermore, the CheckList repository is open source. (Pro tip: since the recent addition of a code tab following a collaboration with PapersWithCode, you can find an arXiv paper’s code directly on the arXiv website.)

Source: Beyond Accuracy: Behavioral Testing of NLP models with CheckList by Ribeiro et al.

Why it matters

Especially in NLP tasks, a human-centric approach to testing models is essential. For instance, users can use negation, specific entity names, and temporal vocabulary to attempt to trick modern state-of-the-art models. If a social network wants to implement the classification of hate-speech, it should be robust against adversarial statements using these techniques.

