corpora.ai: Tips and Tricks Pt. 1 of...

Prompts, Functions, Insights

Learn how to truly become a corpora.ai master

This is our first article in what will become a series of helpful guides on how to leverage corpora.ai to trivialize and democratize deep, meaningful research into a variety of subjects, individuals and disciplines. We will cover the introduction of our pseudo query language, corpora.ai Prompting Language (CPL), as well as some of the functions and features that are now exposed to the user via CPL.

For any links to examples, you will require a corpora.ai account. If you don't have one, you can register for access and will be invited to create your account after a short review process.

Tips covered in this article

Our first tip/trick is Controlling Source Content by group:

Understanding the value of this tip takes a bit of context and a light explanation of the core engine and data behind corpora.ai. The core engine ingests content in real time and currently has local access to, and an understanding of, over 1 petabyte of compressed content. This patented and proprietary engine understands the contextual grouping of content, authors and publishers, allowing corpora.ai to maintain a current and past view of 'the world' based on the taxonomical grouping of authors and publishers. Every publisher and author can hold multiple classifications, e.g. a political publisher can also be a news publisher.

With CPL, users can define the source content for their research in a variety of ways. The two key methods are a function call and natural language, and both support a handful of alias functions and keywords. The list below details the alias functions.

Source Content Functions

  • using()
  • focus()
  • from()
  • with()
  • content()
  • source()

The keywords are the function names without the parentheses, but they can also be phrased far more naturally. Either form can be used at any logical, natural position within the query, e.g. using(tech, news)... or How has EV technology evolved from legal. The first example filters the source content to publishers and authors classified in the tech or news groups; the second filters it to those classified in the legal group.

The source content groups are shown in a list below:

  • tech
  • legal
  • medical
  • finance
  • financial
  • news

The list above is current at the time of writing and will be updated over time. We will write blog posts about those updates when they are released.
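As a rough illustration of the function-call form, the sketch below assembles a query string from one of the alias functions and a list of groups. The helper name with_source_groups is hypothetical and not part of corpora.ai; it only builds text to paste into a query, and placing the function call at the start of the query is an assumption based on the using(tech, news)... example above.

```python
# Hypothetical helper: builds a CPL query string locally; it does not call corpora.ai.
SOURCE_FUNCTIONS = {"using", "focus", "from", "with", "content", "source"}
SOURCE_GROUPS = {"tech", "legal", "medical", "finance", "financial", "news"}


def with_source_groups(question, groups, function="using"):
    """Prefix a question with a source content function call."""
    if function not in SOURCE_FUNCTIONS:
        raise ValueError(f"unknown source content function: {function}")
    if not groups:
        raise ValueError("at least one source content group is required")
    unknown = [g for g in groups if g not in SOURCE_GROUPS]
    if unknown:
        raise ValueError(f"unknown source content groups: {unknown}")
    # Assumption: the function call leads the query, as in using(tech, news)...
    return f"{function}({', '.join(groups)}) {question}"


print(with_source_groups("How has EV technology evolved", ["tech", "news"]))
# prints: using(tech, news) How has EV technology evolved
```

Validating the group names up front keeps typos out of the query; the set of groups would need updating as the list above changes.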

The idiom "A picture paints a thousand words" is always true in my experience, so below is a collection of links to example overviews using a mix of content source filter functions and keywords supported by CPL.

The second tip/trick in this article is Controlling Source Content by Date

Users can control the timeframe of the source content that their research is built upon through various natural language mechanisms. This is very powerful as it allows the user to compare what is known about an entity or hypothesis today with the view from, say, a year earlier.

The following keywords are available for date filtering (at the time of writing, there are no date filter functions):

  • before
  • after
  • during
  • between
  • in
  • until
  • upto
  • prior
  • range

The above keywords can be used in any logical and natural position within a corpora.ai query, e.g. What was the political landscape like before 1939 or Clinical trials targeting peritoneal mesothelioma between 2010 and 2019.
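To make the pattern concrete, here is a minimal sketch that appends a date keyword to a question. The helper with_date_filter is hypothetical, and appending the filter to the end of the question is only one of the positions the keywords allow. The two-year form is limited to between because that is the only two-value phrasing shown in this article.

```python
# Hypothetical helper: builds a query string locally; it does not call corpora.ai.
DATE_KEYWORDS = {
    "before", "after", "during", "between", "in",
    "until", "upto", "prior", "range",
}


def with_date_filter(question, keyword, start, end=None):
    """Append a date keyword and year(s) to a natural language question."""
    if keyword not in DATE_KEYWORDS:
        raise ValueError(f"unknown date keyword: {keyword}")
    if end is None:
        return f"{question} {keyword} {start}"
    if keyword != "between":
        # Assumption: only "between X and Y" is shown with two years.
        raise ValueError("a second year is only sketched for 'between'")
    return f"{question} between {start} and {end}"


print(with_date_filter("What was the political landscape like", "before", 1939))
# prints: What was the political landscape like before 1939
print(with_date_filter(
    "Clinical trials targeting peritoneal mesothelioma", "between", 2010, 2019))
# prints: Clinical trials targeting peritoneal mesothelioma between 2010 and 2019
```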

Below are some links to queries utilizing the Date Filtering functionality of CPL:

The third tip/trick in this article is Comparison Queries

Comparison Queries expose the discovery aspect of corpora.ai perfectly. Given a topic, corpora.ai identifies competitors that meet any other provided criteria and then constructs a research report on each entity, focusing on shared metrics so that the comparison is consistent and easier to comprehend. The only CPL function not supported concurrently with Comparison Queries is Content Source Filtering.

This is a contextual comparison, so users can use the following keyword phrases to initiate comparison queries and build multiple research books:

  • find competitors of x
  • ...described as x
  • products for x
  • ...that are described as x
  • ...that compete with x
  • challengers to/of x
  • alternatives to/of x
  • replacements for x
  • rivals to x
  • opposite to x
  • antagonist to x
  • slanderer of x
  • allies of x
  • similar to x
  • associates with x
  • supports x

The reason for such variance in the keywords is that it affords the greatest flexibility to the user.
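The sketch below shows one way to fill the x placeholder in a keyword phrase and to guard against mixing in a source content function call, since Content Source Filtering is not supported with Comparison Queries. The helper comparison_query and the subject "electric bikes" are purely illustrative. The check only catches the function-call form (e.g. using(...)), because the bare keywords are ordinary words that cannot be detected reliably.

```python
import re

# Matches source content function calls such as using( or from(.
SOURCE_FUNCTION_CALL = re.compile(r"\b(using|focus|from|with|content|source)\(")


def comparison_query(template, subject, criteria=""):
    """Hypothetical helper: replace the trailing x in a keyword phrase."""
    if not template.endswith(" x"):
        raise ValueError("template must end with the placeholder ' x'")
    query = template[:-1] + subject  # drop the x, keep the space before it
    if criteria:
        query = f"{query} {criteria}"
    if SOURCE_FUNCTION_CALL.search(query):
        raise ValueError(
            "source content functions are not supported in comparison queries")
    return query


print(comparison_query("find competitors of x", "electric bikes"))
# prints: find competitors of electric bikes
```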

Below are links to example queries that utilize the Comparison Query processing functionality of corpora.ai:

The fourth tip/trick in this article is Controlling Source Content by Language

Users can also control the language of the source content their research is built upon through CPL functions, while also controlling the output language. For example, a query can use the structure written(x, y) in z, where x and y are source languages and z is the output language.

This is very powerful as it gives the researcher the ability to view differing regional content and understanding of various topics. Below is a list of the functions that can be used to achieve this.

  • output
  • in
  • use
  • report
  • language
  • authored
  • written
  • composed
  • published

To demonstrate the above, the examples all show a mix of function use.
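As a small sketch of the written(x, y) in z structure, the hypothetical helper below appends the source and output languages to a question. Spelling languages as plain names such as french, and placing the clause after the question, are assumptions; only written and in are used here because they are the two forms this article shows together.

```python
def with_languages(question, source_languages, output_language):
    """Hypothetical helper: append written(x, y) in z to a question."""
    if not source_languages:
        raise ValueError("at least one source language is required")
    sources = ", ".join(source_languages)
    # Assumption: the language clause follows the question text.
    return f"{question} written({sources}) in {output_language}"


print(with_languages("History of the electric car", ["french", "german"], "english"))
# prints: History of the electric car written(french, german) in english
```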

The fifth and final tip/trick to cover in this article is Controlling Output Persona

The output persona controls the style of language used in the research book that is generated. Persona is a function with several alias functions, detailed below:

  • persona
  • audience
  • reader
  • recipient
  • listener

These functions take in one parameter which is natural language to succinctly describe the target audience. E.g. audience(high school), reader(phd student), persona(professional). Each of those examples will control the output construction process used to author the research books and ensure the content is appropriate for the provided persona.
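A persona call can be assembled the same way. In this sketch the helper with_persona is hypothetical, and appending the call to the end of the question is an assumption; the function names come from the list above.

```python
PERSONA_FUNCTIONS = {"persona", "audience", "reader", "recipient", "listener"}


def with_persona(question, description, function="persona"):
    """Hypothetical helper: append a persona function call to a question."""
    if function not in PERSONA_FUNCTIONS:
        raise ValueError(f"unknown persona function: {function}")
    description = description.strip()
    if not description:
        raise ValueError("the persona description must not be empty")
    # Assumption: the persona call follows the question text.
    return f"{question} {function}({description})"


print(with_persona("How do vaccines work", "high school", function="audience"))
# prints: How do vaccines work audience(high school)
```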

The links below show this in practice.

We hope these Tips and Tricks are helpful and unlock your research potential with corpora.ai. Our intention is to continue to grow and evolve the platform to support more configuration of user queries through CPL.

Feel free to let us know if this article and type of content is helpful. We welcome your feedback via email at support@corpora.ai, on X @corpora_ai, on LinkedIn or any of our other social platforms, and in the comments below.