The Bioinformatics and Institutional limits of the questions you ask are important…but the mistakes are often not only in Virology, which result in LAI, but also in the data science linked to the Error Detection Paths and Patterns.
Dear NLM Officer,
Re: SARS-CoV-2 reference sequence suppression and other COVID Origin research GenBank data omissions
I am writing to learn more about how the SARS-CoV-2 reference sequence came to be suppressed in May 2023.
<<May 6, 2023 02:21 PM; suppressed; This record was removed by RefSeq staff. Please contact info@ncbi.nlm.nih.gov for further details.>>
https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?report=girevhist
Also, I would like to know more about the deleted WH01 sequence from Lili Ren, discussed in the article below.
https://www.science.org/content/article/first-sars-cov-2-genome-deposited-us-database-earlier-than-previously-known
<<Verifying the quality of sequencing data submissions is necessary to maintain the integrity of such databases managed by NIH and ensures that users have access to trusted and reliable data.>>
<<Thus, the sequence was never made publicly available on GenBank.
In the interim, another submission to GenBank from a different submitter was received and published on January 12, 2020.
That submission published on January 12 provided the genetic sequence for SARS-CoV-2.
The sequence published on January 12, 2020, was nearly identical to the sequence that was submitted by Lili Ren.>>
And from the GenBank correspondence here:
https://d1dth6e84htgma.cloudfront.net/Ford_H2_316_20240111_152518_05f9837537.pdf
Was a GI number given to this sequence?
The GenBank correspondence states:
<<AUTODELETED group 7385146
Groups deleted (use these grids to undelete if necessary):>>
Can a version of this submission from group 7385146 please be undeleted and made available, so that a more exact comparison to the SARS-CoV-2 reference sequence can be calculated? <<Nearly identical>> is not very precise.
Thank you.
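To show what a more exact comparison than <<nearly identical>> could look like, here is a minimal sketch in Python. It assumes the two sequences have already been aligned to the same length by some alignment tool; the function names and the toy sequences are hypothetical and are not the WH01 submission or the reference sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping columns where either sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are reported separately by differences()
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def differences(aligned_a: str, aligned_b: str) -> list:
    """Return (1-based column, base_a, base_b) for every mismatch or gap."""
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]


# Toy sequences for illustration only (assumption: already aligned).
TOY_REF = "ATGTTTGTTTTTCTTGTT"
TOY_SUB = "ATGTTTGTCTTTCTT-TT"

print(round(percent_identity(TOY_REF, TOY_SUB), 2))
print(differences(TOY_REF, TOY_SUB))
```

A result reported this way gives both a percentage and the exact positions that differ, which is the level of detail being requested for the group 7385146 submission.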
Are any of the submissions from the Lili Ren WH01 series perhaps available in this database archive?
https://web.archive.org/web/20200222054741/https:/bigd.big.ac.cn/ncov/genome/
Further, given that
<<Verifying the quality of sequencing data submissions is necessary to maintain the integrity of such databases managed by NIH and ensures that users have access to trusted and reliable data.>>
Can the GenBank indexers please provide some more information about BioProject: 1097963?
Now it is not public:
<<The following ID is not public in BioProject: 1097963
ID:1097963>>
https://web.archive.org/web/20240705052322/https://www.ncbi.nlm.nih.gov/bioproject/?term=cybersecurity
But previously it was public…and available when searching for cybersecurity resources of GenBank:
https://web.archive.org/web/20240807055019/https://www.ncbi.nlm.nih.gov/bioproject/?term=Cybersecurity
This BioProject included a link to an IT education company…has this training now been completed by the GenBank indexers?
https://maetechacademy.edu.my/course.html
Given the stringent quality control NIH conducts, how is it that this helpful BioProject is no longer available to the public?
Was it perhaps spam, or some other failure of the NIH’s high standards of data integrity and/or cybersecurity?
Can the NLM GenBank resources please provide more streamlined support services and resources for reporting suspect submissions that have made it past the NIH’s strict data integrity processes?
Data contamination is an important issue to avoid in GenBank submissions.
But JAMOGK000000000.1 Pseudomonas aeruginosa was removed by GenBank even though this contamination was found in related submissions with AI/ML tools:
<<This record was removed because the sequence was determined to be contaminated. Please contact info@ncbi.nlm.nih.gov for further details>>
Yet under GenBank’s stringent data integrity rules, has there been no attempt to identify the actual contaminant in the JAMOGK01 series of data?
https://www.ncbi.nlm.nih.gov/Traces/wgs/JAMOGK01?display=contigs
In previous contaminated submissions it was obvious where the PLA had contaminated the records with data relevant to COVID Origin.
For example, the NY5541 urine sample from 2019:
https://www.ncbi.nlm.nih.gov/Traces/wgs/JAMOHC01?display=contigs&page=1&state=dead
And… NY5537 collected 2019 sputum sample submitted by Zhou,D:
Submitted (19-MAY-2022) State Key Laboratory of Pathogen and
Biosecurity, Beijing Institute of Microbiology and Epidemiology,
No. 20, Dongdajie, Fengtai, Beijing, Beijing 100071, China
https://www.ncbi.nlm.nih.gov/nuccore/JAMOGK010000088.1?report=GenBank
See <<Breaking: SARS-CoV-2 Spike found in bacteria samples taken from China, 2019
January 20, 2023 >> Adeno News article
https://web.archive.org/web/20230122173319/https://adeno-news.com/2023/01/20/breaking-sars-cov-2-spike-found-in-bacteria-samples-taken-from-china-2019/
So too with JAMOGK01: can you please identify the contaminant and make this contamination public BEFORE you comply with the request from the PLA to remove data…
If you can flag the contaminant codes now, when you are aware of what they are, this would be helpful.
Thank you.
Finally;
I am interested in COVID origin research.
Prof Zhengli Shi of WIV was quite clear that I should examine all available data on GenBank before asking for access to WIV’s virus databases.
See previous correspondence on this issue:
<<Case CAS-1324284-Y8N8F1 - National Library of Medicine Customer Service confirmation TRACKING:000435001291518>>
This is quite challenging as some of the data in GenBank are hidden from view.
I would like to be able to see that data, and I would like to know how it became suppressed.
For example: the pre-print <<Spread and Geographic Structure of SARS-related Coronaviruses in Bats and the Origin of Human SARS Coronavirus; Yu2018unpublished>>,
together with its data set and the submitting authors’ correspondence with GenBank, is important to ongoing COVID Origin research.
Now the title of the pre-print gives zero results…due to suppression.
No items found.
On August 09, 2022 this paper gave 163 search results in GenBank as seen in this archive.
https://web.archive.org/web/20220809085043/https:/www.ncbi.nlm.nih.gov/nuccore/?term=Spread+and+Geographic+Structure+of+SARS-related+Coronaviruses+in+++++++++++++Bats+and+the+Origin+of+Human+SARS+Coronavirus
By basic bioinformatics analysis we can see the recoverable series of nucleotide & protein submissions by the authors
<<Yu,P., Hu,B., Li,B., Luo,D., Zhu,G., Zhang,L., Holmes,E.C., Shi,Z. and Cui,J.>> extends from the suppressed record:-
<<GI 1769824624>> Record suppressed: spike protein [Bat SARS-like coronavirus] - Protein - NCBI
And
<<GI 1769824623>> Record suppressed: Bat SARS-like coronavirus strain Rs161465_Guangdong spike protein (S) - Nucleotide - NCBI
To:-
<<GI 1769824592>> where it is interrupted by an unrelated sequence placed on 25-OCT-2018 that is not suppressed:
Salmonella enterica subsp. enterica serovar Infantis strain FSIS170230 - Nucleotide - NCBI
Then to <<GI 1769824316>> Record suppressed: Bat SARS-like coronavirus strain Rs5725_Yunnan ORF8 gene, complete cds
Followed again by the unrelated <<GI 1769824315>>
Also placed on 25-OCT-2019: Salmonella enterica subsp. enterica serovar Infantis strain FSIS170230
This series of 308 suppressed GI GenBank submissions is extensive, but it is not yet the full set. The 163 nucleotide results should correspond to 326 GI numbers (one nucleotide and one protein record each); only 308 have been recovered, so by my calculations 18 GI submissions are still missing.
This means that of the at least 163 nucleotide and protein sequence pairs submitted for this preprint and placed in GenBank, only 154 can be recovered for analysis at this stage by examining the series of GI numbers.
How many original GI data points were placed for this preprint?
Also, where are the final nine nucleotide and protein sequence pairs that were searchable on August 09, 2022?
That means at least eighteen GI numbers are missing and, due to GenBank suppression, cannot be found.
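The count above can be reproduced with a short Python sketch. The GI endpoints and the figure of 163 results are taken from this letter. Two things are assumptions made for illustration: that every GI between the two endpoints, other than the interrupting Salmonella record, belongs to the suppressed series, and that each nucleotide record has exactly one paired protein record. The function and constant names are hypothetical.

```python
# GI values quoted in the letter; the helper names are illustrative only.
FIRST_GI = 1769824316            # lowest suppressed GI in the recovered series
LAST_GI = 1769824624             # highest suppressed GI in the recovered series
INTERRUPTING_GIS = {1769824592}  # unrelated Salmonella record inside the range
NUCLEOTIDE_RESULTS = 163         # archived search results on August 09, 2022


def recovered_gis(first: int, last: int, excluded: set) -> list:
    """Every GI in [first, last] that is not a known unrelated record (assumption)."""
    return [gi for gi in range(first, last + 1) if gi not in excluded]


def missing_counts(nucleotide_results: int, recovered: int) -> tuple:
    """Assume one protein GI per nucleotide GI; return (expected GIs, missing GIs, missing pairs)."""
    expected = 2 * nucleotide_results
    missing = expected - recovered
    return expected, missing, missing // 2


series = recovered_gis(FIRST_GI, LAST_GI, INTERRUPTING_GIS)
expected, missing_gis, missing_pairs = missing_counts(NUCLEOTIDE_RESULTS, len(series))
print(len(series), expected, missing_gis, missing_pairs)  # 308 326 18 9
```

Under these assumptions the recovered range holds 308 GI numbers (154 pairs) against 326 expected, leaving 18 GI numbers (nine pairs) unaccounted for.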
Perhaps there is another way to recover these missing files?
Was there an earlier GI number series that was placed when the preprint was originally submitted?
When was this series linked to <<Yu2018unpublished>> originally submitted?
Given that cybersecurity concerns were highlighted by Prof ZLShi as a reason for limiting access to WIV’s extensive bat virus databases, can you please reassure me that the missing data from <<Yu2018unpublished>> are safe, and send links to the remaining suppressed and missing sequence submissions?
Thank you for your assistance.
Kind regards
Mr Tommy Cleary
Postgrad Student UNDA.
3/-
Part of the solution is here in the Synthetic Markers of tail codes Cyphers in Baric Lab products…
https://northernvirginiamag.com/culture/culture-features/2022/10/14/cia-kryptos-sculpture-cipher/
The main error of Virology and associated Science is not being direct enough to finish the Math and calculate not only the risk of LAI in an age of Synthetics…but also to extrapolate this to determine the Extinction level risk of LAI.
2/2
Citations to come.
I have to post and edit and add links?
Expand on what David Relman, Ralph Baric and Tom Inglesby seem to want to say…as well as what Eddie Holmes et al refuse to say…
Given our one home here, how can the extinction level risks of GOF be simply ignored?
The Theory of Mind of Biological Weapons is explored here by Baric:
https://www.jcvi.org/sites/default/files/assets/projects/synthetic-genomics-options-for-governance/Baric-Synthetic-Viral-Genomics.pdf
Dear Simon,
Teaching and learning are entwined…
<<Aside 3
It is assumed that the reader knows something of the GOF controversy in virology. To ensure the essays remain short, they are best read as a series. Are the essays too dense or difficult to absorb? Comments please. Suggestions for an article around which a future essay could be crafted would be welcome.>>
Your essays are straightforward and easy to understand.
The reason you leave certain topics alone is not.
Dual Use Research of Concern is implicitly involved with Science and the arguments for and against GOF but also deeply and irreparably linked to the Dark Side…Biological Weapons and the extinction level danger of synthetics.