About PanKB

The microbial world holds vast potential for advancements in diverse fields such as food production, human health, and ecological sustainability. PanKB is a pangenomic knowledgebase working to empower practitioners to leverage microbial functions beyond those of a select few model organisms. PanKB provides:

  • A growing dataset of pangenomic results,
  • Interactive data reports and analytics for exploration, analysis, and new potential discoveries,
  • Global database search for genes, pathways, products, species, and more,
  • Alleleomes describing the amino acid variants across gene alleles,
  • Dataset download providing access to raw data, search results, and pangenomics analytics for custom analysis,
  • A bibliome of open-access pangenomic publications,
  • A specialized LLM-powered chat interface that accelerates knowledge acquisition by giving detailed answers to queries on publication content, with supporting references, and that is constrained to avoid returning hallucinated content.


Using PanKB

Pangenomes

PanKB includes multiple interactive analytics and tables that give an overview of a pangenome's contents. These analytics can be found on, or reached from, an individual pangenome's page.

Using Lactiplantibacillus plantarum as an example:

  • Overview Page shows the gene presence/absence matrix, COG category distribution, core/accessory/rare gene categorization, and pangenome openness of the Lactiplantibacillus plantarum pangenome.
  • Gene Annotation Table provides detailed gene annotation of all gene clusters found in the Lactiplantibacillus plantarum pangenome.
  • Phylogenetic Tree displays the phylogenetic structure of the Lactiplantibacillus plantarum species, annotated with isolation source and country information.
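
To illustrate the core/accessory/rare categorization mentioned above, the following is a minimal sketch of how gene clusters could be binned by their frequency in a presence/absence matrix. The toy matrix, the gene names, and the cutoff values (0.99 and 0.15) are assumptions chosen for illustration only; they are not necessarily the data or thresholds PanKB uses.

```python
from typing import Dict, List

# Hypothetical toy presence/absence matrix:
# gene cluster -> one 0/1 flag per genome (10 genomes here).
presence: Dict[str, List[int]] = {
    "geneA": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # present in 10 of 10 genomes
    "geneB": [1, 1, 1, 0, 1, 0, 1, 1, 0, 0],  # present in 6 of 10 genomes
    "geneC": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # present in 1 of 10 genomes
}


def classify_gene(flags: List[int],
                  core_cutoff: float = 0.99,
                  rare_cutoff: float = 0.15) -> str:
    """Bin a gene cluster by the fraction of genomes that carry it.

    The cutoffs are illustrative assumptions, not PanKB's definitions.
    """
    frequency = sum(flags) / len(flags)
    if frequency >= core_cutoff:
        return "core"
    if frequency <= rare_cutoff:
        return "rare"
    return "accessory"


categories = {gene: classify_gene(flags) for gene, flags in presence.items()}
for gene, category in categories.items():
    print(gene, category)
```

With these assumed cutoffs, a gene found in every genome is labeled core, one found in a single genome is labeled rare, and everything in between is accessory. Other cutoffs would shift the boundaries between the categories.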

Alleleomes

Pangenomes also serve as the foundation for further large-scale analyses, and PanKB is actively integrating the results of these analyses. Recent pangenome-scale analyses of variants, named alleleomics, demonstrated unique value in narrowing the solution search space for feasible genetic variants in E. coli. PanKB currently includes the alleleomes of all of its pangenomes. Alleleome analytics can be found on pangenome pages and on individual gene pages.

Using Lactiplantibacillus plantarum as an example:

  • The genome-wide alleleome can be found on the Overview Page.
  • For a single-gene alleleome, take the gene accA2 in Lactiplantibacillus plantarum as an example: clicking accA2 in the Lactiplantibacillus plantarum Gene Annotation Table opens the accA2 gene page, which shows the alleleome of accA2.
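
As a rough illustration of what "amino acid variants across gene alleles" means, the sketch below derives a consensus sequence from a handful of allele protein sequences and lists where each allele differs from it. The sequences and allele names are made up, and the code assumes the alleles are already aligned, of equal length, and free of gaps. It is a simplified stand-in, not PanKB's actual alleleome pipeline.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, pre-aligned protein sequences for four alleles of one gene.
alleles: Dict[str, str] = {
    "allele_1": "MKTAYIAK",
    "allele_2": "MKTAYIAK",
    "allele_3": "MKSAYIAK",
    "allele_4": "MKTAYVAK",
}


def consensus_sequence(seqs: List[str]) -> str:
    """Return the most common residue at each aligned position."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*seqs)
    )


def amino_acid_variants(
    allele_seqs: Dict[str, str]
) -> Tuple[str, Dict[str, List[str]]]:
    """List each allele's differences from the consensus, e.g. 'T3S'.

    Variant labels use consensus residue, 1-based position, allele residue.
    """
    consensus = consensus_sequence(list(allele_seqs.values()))
    variants: Dict[str, List[str]] = {}
    for name, seq in allele_seqs.items():
        variants[name] = [
            f"{ref}{pos}{alt}"
            for pos, (ref, alt) in enumerate(zip(consensus, seq), start=1)
            if ref != alt
        ]
    return consensus, variants


consensus, variants = amino_acid_variants(alleles)
print(consensus)
print(variants)
```

In this toy example, two alleles match the consensus exactly and the other two each carry a single substitution. Real alleleome data would also need to handle insertions, deletions, and alignment, which this sketch deliberately leaves out.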

Data Accessibility

PanKB offers several ways to access its data. All of PanKB's data can be reached through its navigation links. Specific data can be found quickly through the global database search, which is accessible on most pages. Finally, the raw data can be downloaded from the various analytics hosted on database pages.


Data Application

Combined, PanKB's features enable valuable workflows for enzyme and strain engineering. These include identifying genes for new enzyme production or reintroduction into strains, pinpointing precise gene edits to modify activity, discovering and optimizing valuable pathways, and selecting optimal starting strains. Current strain engineering relies heavily on model or familiar strains; the features and data of PanKB empower strain engineers to start leveraging pangenomic data for targeted bioengineering.


PanKB LLM

Scientific progress often necessitates extensive literature review, a traditionally time-consuming process. Large Language Models (LLMs) offer a potential solution by aggregating and summarizing knowledge across documents. PanKB includes an LLM chatbot (AI Assistant) focused on an open-access pangenomic bibliome. It is designed to answer in-depth questions on pangenomics accurately, cite relevant articles, and avoid hallucinating inaccurate content. This feature is an initial experiment in combining an LLM with a specialized scientific database to accelerate scientific knowledge acquisition through automated knowledge extraction. PanKB LLM can be accessed by clicking the AI Assistant link in the navbar.

How to Cite PanKB

When using PanKB in your research, we kindly ask that you cite the latest publication. Your citation of PanKB will help us apply for ongoing funding to maintain this resource.

Latest Publication:

Sun, Binhuan, Liubov Pashkova, Pascal Aldo Pieters, Archana Sanjay Harke, Omkar Satyavan Mohite, Alberto Santos, Daniel C. Zielinski, Bernhard O. Palsson, and Patrick Victor Phaneuf. 2024. “PanKB: An Interactive Microbial Pangenome Knowledgebase for Research, Biotechnological Innovation, and Knowledge Mining.” Nucleic Acids Research, November, gkae1042.

Contact Us

We value your feedback and are here to assist with any questions or issues.

For reporting bugs, suggesting features, or asking questions, please visit our GitHub Issue Tracker.

For general inquiries, or if you prefer email, feel free to reach out to the team:

Contributors

The following people contributed to the development of PanKB:

  • Binhuan Sun
  • Pascal A. Pieters
  • Liubov Pashkova
  • Aaron C. Thiel
  • Archana S. Harke
  • Omkar S. Mohite
  • Alberto Santos
  • Daniel C. Zielinski
  • Bernhard Ö. Palsson
  • Patrick Victor Phaneuf

Funding

This work was funded by the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF Grant Number NNF20CC0035580).