Undiscovered public knowledge discovery

# Undiscovered public knowledge discovery - UPKD

Undiscovered public knowledge discovery is a line of [[§Augmented Knowledge Generation]] based on the premise that 1) there is a body of public knowledge and 2) you can create new knowledge by creating new connections between pieces of that public knowledge. Public knowledge is knowledge that is not in people’s heads but has been encoded in papers, databases, etc. Roughly, UPKD relaxes the constraint that a connection between two areas can only be discovered by someone with expertise in both.

Like [[§Augmented Knowledge Generation]] in general, UPKD can take advantage of both [[Crowd-based knowledge synthesis - ToA]] and [[Algorithmic knowledge generation]].

Outside of biology, chemistry, and materials science, approaches to undiscovered public knowledge either look like paper search engines ([[Commercial approaches to UPKD act like specialized search engines]]) or are at the proof-of-concept stage.

Abstractly, there is a spectrum of approaches to UPKD, from using only the existing forms of public knowledge to changing the form of public knowledge.

### Lines of Undiscovered Public Knowledge in Biology, Chemistry, and Materials Science

* [[Using published literature to find causal relationships that can be connected to lead to a new causal relationship]]. This method has shown some academic results, and one company ([[Lum.ai]]) is attempting to commercialize it.
* [[Using computers to search databases of existing molecules or compounds to suggest new uses]]. Of the three, this approach has been commercialized the most. It is the approach used by [[Citrine Informatics]], [[Atomwise]], [[BenevolentAI]], and others.
* [[Training computers on existing molecules or compounds and using them to predict properties of new molecules or compounds]].

Note that these are not mutually exclusive.

Originally I was going to have two separate sections: one for Biology and one for Chemistry and Materials Science.
But when I went to write them, the approaches turned out to be very similar.

### Proof of Concept Approaches - TRL 3 or Below

* [[People have tried many different attempts to get experts to encode public knowledge in ways that make it easier to connect the pieces]]. This is both necessary for
* [[A system that encoded papers as problems and solutions and then surfaced the papers and associated researchers based on the problem you had]]. It fell into the trap that [[Structuring knowledge is expensive]].
* [[Zitian Chen]] created a proof-of-concept system for ontology-based knowledge discovery in social science, [[Metabus.org]]. Ultimately it falls into the two traps that [[Commercial approaches to UPKD act like specialized search engines]] and [[Structuring knowledge is expensive]].
* [[IBM Watson]] attempted [[Using published literature to find causal relationships that can be connected to lead to a new causal relationship]] but cancelled the program because the autonomous version didn’t produce results useful enough to be commercially viable. Interestingly, one of the people on the program noted that the approach showed promise with humans in the loop ([[Scott Spangler Conversation 12/2/20]]).
* The analogy-based approach (see [[Accelerating Innovation Through Analogy Mining]] and [[Scaling Up Serendipity]] for overviews) has a few proofs of concept in product design and a toy problem in research ([[chanSOLVENTMixedInitiative2018]]). These systems attempt to address the [[Structuring knowledge is expensive]] problem by using machine learning to extract the purpose and mechanism from papers automatically. It doesn’t look promising any time soon.
* There are too many projects trying to cluster papers and patents to enumerate. This is where [[Patent on finding correlations quickly in a graph of works]] falls.
* Arguably, [[Polyplexus]] and [[Micropublication Biology]] are attempts to create public knowledge that is more discoverable and recombinable.
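
The chaining idea behind [[Using published literature to find causal relationships that can be connected to lead to a new causal relationship]] can be sketched in a few lines. This is a minimal illustration, not how [[Lum.ai]] or [[IBM Watson]] implemented it. It assumes the hard part (extracting causal claims from papers) is already done, and the function name `propose_links`, the tuple layout, and the toy claims are all hypothetical.

```python
from collections import defaultdict


def propose_links(claims):
    """Chain published claims A->B and B->C into candidate A->C links.

    `claims` is an iterable of (cause, effect, source) tuples (an assumed,
    hypothetical encoding). Returns a dict mapping each candidate
    (A, C) pair that no source states directly to the list of
    (B, source_ab, source_bc) bridges that support it.
    """
    claims = list(claims)  # allow generators; we iterate more than once
    known = {(cause, effect) for cause, effect, _ in claims}

    effects_of = defaultdict(list)  # cause -> [(effect, source), ...]
    for cause, effect, source in claims:
        effects_of[cause].append((effect, source))

    candidates = defaultdict(list)
    for a, b, source_ab in claims:
        for c, source_bc in effects_of.get(b, []):
            # Skip trivial loops and links that are already published.
            if c != a and (a, c) not in known:
                candidates[(a, c)].append((b, source_ab, source_bc))
    return dict(candidates)


# Toy, made-up claims: placeholders, not real findings.
toy_claims = [
    ("compound_x", "process_y", "paper_1"),
    ("process_y", "symptom_z", "paper_2"),
    ("compound_x", "process_w", "paper_3"),
]

for (a, c), bridges in propose_links(toy_claims).items():
    print(f"{a} -> {c} via {bridges}")
```

Each candidate keeps its bridging claims and sources, so a human can inspect why a link was proposed. That fits the observation above that this approach showed more promise with humans in the loop.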
### Criticism of UPKD

[[UPKD assumes that all valuable knowledge is both written down somewhere and interpretable when you find it]]. [[UPKD requires public knowledge to be both discoverable and interpretable in order to produce useful results]]. [[In order for knowledge to be discoverable and interpretable that means that it needs to be encodable]].

[[Of conception and implementation of undiscovered public knowledge, conception is the low-hanging fruit]].

Discovery is less useful for making things happen than interpretability. In its platonic form, a pure discovery approach would give you the exact piece of literature you need at the exact moment you need it. However, real experience suggests that [[Just having the piece of literature doesn’t solve the problem]]. Therefore, [[The limiting reagent for UPKD is better methods for increasing interpretability]].

### Increasing Interpretability

#### Challenges to Interpretability

#### Possibilities for Increasing Interpretability

## History and Case Studies

People have known forever that random connections are useful - see Vannevar Bush and the memex. [[Undiscovered Public Knowledge (Paper)]] laid down the philosophical groundwork in 1986.

[[DARPA and IARPA have run at least five programs focused on UPKD covering both crowd and algorithmic knowledge discovery]].

## Hypotheses

* It should be possible to encode knowledge in a way that at least suggests which pieces of knowledge will be “pre-requisites” for another piece of knowledge and, conversely, to encode what the pre-requisites are for something to “become a reality”.

## Open Questions

* Do you think these are the right projects?
* Are there any concepts from computer science that are useful here?
How the fuck would you encode these things?
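
One possible starting point for the encoding question, offered only as a sketch: treat the “pre-requisites” from the Hypotheses section as a directed graph and borrow topological sorting from computer science. Everything here is an assumption made for illustration: the dict-of-lists encoding, the function name `unmet_prerequisites`, and the toy entries. It says nothing about how to obtain the prerequisite links, which is where [[Structuring knowledge is expensive]] bites.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def unmet_prerequisites(target, prereqs, established):
    """List what still has to be established before `target` can happen.

    `prereqs` maps each piece of knowledge to the pieces it directly
    depends on (hypothetical encoding). `established` is the set of pieces
    already known. The result is ordered so that every item appears after
    its own prerequisites.
    """
    # Collect everything `target` depends on, directly or transitively.
    needed, stack = set(), [target]
    while stack:
        node = stack.pop()
        for dep in prereqs.get(node, ()):
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)

    # Order the whole graph, then keep only the unmet dependencies.
    order = TopologicalSorter(prereqs).static_order()
    return [n for n in order if n in needed and n not in established]


# Toy, made-up graph: placeholders, not real research dependencies.
toy_prereqs = {
    "working_device": ["material_with_property_p", "fabrication_method"],
    "material_with_property_p": ["predicted_candidate"],
    "fabrication_method": [],
}

print(unmet_prerequisites("working_device", toy_prereqs, {"fabrication_method"}))
```

`TopologicalSorter` raises `graphlib.CycleError` if the prerequisite graph contains a cycle, which would at least flag an inconsistent encoding.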