| Title: | Ultra-Fast Analysis of Sparse DNA Methylome via Recurrent Pattern Encoding |
|---|---|
| Description: | Methods for analyzing DNA methylation data via Most Recurrent Methylation Patterns (MRMPs). Supports cell-type annotation, spatial deconvolution, unsupervised clustering, and cancer cell-of-origin inference. Includes C-backed summaries for YAME “.cg/.cm” files (overlap counts, log2 odds ratios, beta/depth aggregation), an XGBoost classifier, NNLS deconvolution, and plotting utilities. Scales to large spatial and single-cell methylomes and is robust to extreme sparsity. |
| Authors: | Hongxiang Fu [aut, cre] (ORCID: <https://orcid.org/0000-0002-9873-8606>), Wanding Zhou [cph, fnd], The SAMtools/HTSlib authors [ctb, cph] (BGZF components; see inst/COPYRIGHTS), Attractive Chaos [ctb, cph] (Author and copyright holder of khash.h (klib, MIT license)) |
| Maintainer: | Hongxiang Fu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.1 |
| Built: | 2026-05-27 06:09:49 UTC |
| Source: | https://github.com/cran/MethScope |
Produce confidence score for XGBoost prediction
confidence_score(vec)

| Argument | Description |
|---|---|
| vec | A vector of predicted probability for each cell type |

A numeric confidence score from 0 to 1.
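
The manual does not state how the score is computed. As a purely illustrative sketch, one common way to map a probability vector to a 0 to 1 confidence is one minus the normalized Shannon entropy. The helper `entropy_confidence` below is hypothetical and is not claimed to be what `confidence_score` does.

```r
# Hypothetical helper, NOT the package's implementation:
# confidence = 1 - (Shannon entropy / maximum possible entropy).
entropy_confidence <- function(vec) {
  stopifnot(is.numeric(vec), length(vec) > 1, all(vec >= 0), sum(vec) > 0)
  p <- vec / sum(vec)        # renormalize in case of rounding
  nz <- p[p > 0]             # treat 0 * log(0) as 0
  h <- -sum(nz * log(nz))    # Shannon entropy in nats
  1 - h / log(length(p))     # 1 for a one-hot vector, 0 for a uniform one
}

conf_onehot  <- entropy_confidence(c(1, 0, 0))     # fully certain prediction
conf_uniform <- entropy_confidence(rep(0.25, 4))   # maximally uncertain prediction
```

A score like this rewards predictions that concentrate probability on a single cell type, which is the property the 0 to 1 range above suggests.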
Produce confidence score based on top 95 percent for XGBoost prediction
confidence_score_top95(vec)

| Argument | Description |
|---|---|
| vec | A vector of predicted probability for each cell type |

A numeric confidence score from 0 to 1.
Filter final prediction to reduce noise
filter_cell(pred_result, knn_res, KNeighbor = 5)

| Argument | Description |
|---|---|
| pred_result | The prediction result from XGBoost |
| knn_res | kNN graph from smooth_matrix |
| KNeighbor | Number of kNN neighbors to use for smoothing (Default: 5) |

The final prediction result after dropping a few cell types.
Generate pattern level data for cell type annotation
GenerateInput(query_fn, knowledge_fn)

| Argument | Description |
|---|---|
| query_fn | File path to query .cg |
| knowledge_fn | File path to pattern file .cm |

A cell by pattern matrix.

    qry <- system.file("extdata", "toy.cg", package = "MethScope")
    msk <- system.file("extdata", "toy.cm", package = "MethScope")
    res <- GenerateInput(qry, msk)

Generate reference pattern labels (no default writing)
GenerateReference(binary_file, min_CG = 50, output_path = NULL)

| Argument | Description |
|---|---|
| binary_file | Path to the pattern strings file (one string per line). |
| min_CG | Minimum CpG count a pattern must have to keep its own ID (default: 50). Patterns with frequency <= 'min_CG' are grouped as "Pna". |
| output_path | Optional file path to write the resulting labels. If 'NULL' (default), nothing is written and the labels are only returned. |

A character vector of pattern labels (same length/order as the input file).

    ## Not run:
    # DO write only to a temp location in examples/vignettes/tests:
    tmp_out <- file.path(tempdir(), "patterns.txt")
    labs <- GenerateReference("path/to/pattern_strings.txt", min_CG = 50, output_path = tmp_out)
    # Or skip writing and just get the vector:
    labs <- GenerateReference("path/to/pattern_strings.txt", min_CG = 50)
    ## End(Not run)

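
The grouping rule described for `min_CG` can be sketched in base R. This is a minimal illustration, assuming the input is already a character vector of pattern strings (one per CpG); the helper name `label_patterns` and the `"P1"`, `"P2"`, ... ID scheme are assumptions, not the package's actual naming.

```r
# Hypothetical sketch of the min_CG rule, not the package's implementation.
label_patterns <- function(pattern_strings, min_CG = 50) {
  freq <- table(pattern_strings)                 # CpGs per distinct pattern
  keep <- names(freq)[freq > min_CG]             # frequency <= min_CG is collapsed
  ids <- stats::setNames(sprintf("P%d", seq_along(keep)), keep)  # assumed ID scheme
  labels <- unname(ids[pattern_strings])         # NA where the pattern was not kept
  labels[is.na(labels)] <- "Pna"
  labels                                         # same length and order as the input
}

toy_patterns <- c("0101", "0101", "0101", "1100", "0011", "0101")
toy_labels <- label_patterns(toy_patterns, min_CG = 2)
```

Here only `"0101"` occurs more than twice, so it keeps its own ID while the two singleton patterns are grouped as `"Pna"`.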
Impute missing values for a 100K window matrix
imputeRowMean(mtx, na_percent = 30)

| Argument | Description |
|---|---|
| mtx | A cell by 100K window data frame with missing values |
| na_percent | NA percentage threshold used for filtering (Default: 30) |

A cell by 100K window data frame with imputed values
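
Row-mean imputation with an NA filter can be sketched as follows. The manual does not say whether the threshold applies to windows or to cells; this sketch assumes windows (columns) with more than `na_percent` percent missing values are dropped before imputation. The helper `impute_row_mean` is hypothetical.

```r
# Hypothetical sketch, assuming the NA filter is applied per window (column).
impute_row_mean <- function(mtx, na_percent = 30) {
  m <- as.matrix(mtx)
  keep <- colMeans(is.na(m)) * 100 <= na_percent   # drop windows with too many NAs
  m <- m[, keep, drop = FALSE]
  row_mu <- rowMeans(m, na.rm = TRUE)              # per-cell mean over kept windows
  idx <- which(is.na(m), arr.ind = TRUE)           # (row, col) positions of NAs
  m[idx] <- row_mu[idx[, "row"]]                   # fill each NA with its row mean
  as.data.frame(m)
}

toy <- data.frame(
  w1 = c(0.2, NA, 0.8),
  w2 = c(NA_real_, NA_real_, NA_real_),
  w3 = c(0.4, 0.6, 1.0)
)
imputed <- impute_row_mean(toy, na_percent = 50)
```

With a 50 percent threshold, the all-missing window `w2` is removed and the single gap in `w1` is filled with that cell's mean over the remaining windows.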
Train XGBoost model to predict cell type
Input_training(summary_results, cell_type_label, number_patterns = 1000, cross_validation = FALSE, xgb_parameters = list())

| Argument | Description |
|---|---|
| summary_results | A wide cell by pattern matrix generated from the GenerateInput function |
| cell_type_label | A vector of the corresponding cell type label for each row of summary_results |
| number_patterns | A numeric value indicating the number of patterns to use (Default: 1000) |
| cross_validation | A Boolean variable indicating whether to perform cross-validation to obtain the best hyperparameters for the model (Default: FALSE) |
| xgb_parameters | An optional list of XGBoost model parameters provided by the user |

The trained XGBoost model.
Estimate cell type relative proportion
nnls_deconv(ref, mixture_matrix, number_patterns = 1000, var_threshold = 0.01)

| Argument | Description |
|---|---|
| ref | An imputed wide cell by pattern matrix generated from the GenerateInput function using reference pseudobulk |
| mixture_matrix | An imputed wide cell by pattern matrix generated from the GenerateInput function |
| number_patterns | A numeric value indicating the number of patterns to use (Default: 1000) |
| var_threshold | A numeric variance threshold used to filter the patterns (Default: 0.01) |

A cell type by cell matrix showing the relative cell type proportion estimate for each cell.
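
To make the deconvolution step concrete, the sketch below solves a non-negative least squares problem for each cell and rescales the coefficients to proportions. It is an illustration only: it uses base R's `optim` with the box-constrained `"L-BFGS-B"` method in place of a dedicated NNLS solver, it skips the pattern selection controlled by `number_patterns` and `var_threshold`, and the helper `nnls_proportions` is hypothetical.

```r
# Hypothetical sketch: min ||A x - b||^2 subject to x >= 0, solved per cell.
nnls_proportions <- function(ref, mixture) {
  A <- t(as.matrix(ref))                       # patterns x cell types
  solve_one <- function(b) {
    obj  <- function(x) sum((A %*% x - b)^2)   # squared reconstruction error
    grad <- function(x) as.vector(2 * crossprod(A, A %*% x - b))
    fit <- optim(rep(1 / ncol(A), ncol(A)), obj, grad,
                 method = "L-BFGS-B", lower = 0)
    s <- sum(fit$par)
    if (s > 0) fit$par / s else fit$par        # rescale to relative proportions
  }
  out <- apply(as.matrix(mixture), 1, solve_one)  # cell types x cells
  rownames(out) <- rownames(ref)
  out
}

ref <- rbind(A = c(1, 0, 0.5, 0.2), B = c(0, 1, 0.5, 0.8))
mixture <- rbind(cell1 = 0.7 * ref["A", ] + 0.3 * ref["B", ],
                 cell2 = ref["B", ])
est <- nnls_proportions(ref, mixture)
```

In this toy case `cell1` is a noiseless 70/30 blend of the two reference profiles and `cell2` is pure `B`, so the estimated columns should recover those proportions up to optimizer tolerance. For real data, a dedicated solver such as the CRAN `nnls` package is likely a better choice than a general-purpose optimizer.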
Generate confusion table for the final prediction
PlotConfusion(prediction_result, actual_label, log2 = FALSE)

| Argument | Description |
|---|---|
| prediction_result | Prediction result from PredictCellType |
| actual_label | Ground truth cell label |
| log2 | Whether to show counts on a log2 scale (Default: FALSE) |

A ggplot2 confusion table object.
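
The counts behind a confusion table can be tabulated in base R before any plotting. This is a minimal sketch with made-up labels; the `log2(n + 1)` pseudocount is an assumption about how a log scale might be applied, not a description of what `PlotConfusion` does.

```r
# Illustrative labels only; not data from the package.
actual    <- c("Neuron", "Neuron", "Glia", "Glia", "Glia")
predicted <- c("Neuron", "Glia",   "Glia", "Glia", "Neuron")

# Share one level set so every class appears on both axes.
lv <- sort(unique(c(actual, predicted)))
conf_counts <- table(actual    = factor(actual, levels = lv),
                     predicted = factor(predicted, levels = lv))

# Long format (one row per cell of the table) is convenient for tile plots.
conf_long <- as.data.frame(conf_counts, responseName = "n")
conf_long$log2_n <- log2(conf_long$n + 1)   # pseudocount is an assumption
```

The long-format data frame has one row per (actual, predicted) pair, which is the shape a heatmap-style confusion plot typically consumes.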
Generate F1 score barplot for each class
PlotF1(prediction_result, actual_label)

| Argument | Description |
|---|---|
| prediction_result | Prediction result from PredictCellType |
| actual_label | Ground truth cell label |

A ggplot2 object.
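
For reference, the per-class F1 score is 2·TP / (2·TP + FP + FN). The sketch below computes it from two label vectors; the helper `per_class_f1` is hypothetical, and returning 0 when a class has no true positives is a convention chosen here, not something the manual specifies.

```r
# Hypothetical helper: per-class F1 from predicted and ground-truth labels.
per_class_f1 <- function(predicted, actual) {
  predicted <- as.character(predicted)
  actual <- as.character(actual)
  classes <- sort(unique(c(predicted, actual)))
  vapply(classes, function(cl) {
    tp <- sum(predicted == cl & actual == cl)   # true positives
    fp <- sum(predicted == cl & actual != cl)   # false positives
    fn <- sum(predicted != cl & actual == cl)   # false negatives
    if (tp == 0) return(0)                      # convention when F1 is undefined
    2 * tp / (2 * tp + fp + fn)
  }, numeric(1))
}

f1 <- per_class_f1(
  predicted = c("Neuron", "Glia", "Glia", "Glia", "Neuron"),
  actual    = c("Neuron", "Neuron", "Glia", "Glia", "Glia")
)
```

The result is a named numeric vector with one entry per class, which maps directly onto a bar plot with one bar per cell type.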
Generate UMAP for the final prediction based on cell patterns
PlotUMAP(predictMatrix, prediction_result, n_component = 30, seed = 123, ...)

| Argument | Description |
|---|---|
| predictMatrix | A wide cell by pattern matrix generated from the GenerateInput function |
| prediction_result | Prediction result from PredictCellType |
| n_component | Number of PCA components to use (Default: 30) |
| seed | A number for the random seed (Default: 123) |
| ... | Additional arguments passed to 'uwot::umap' (e.g., 'n_neighbors', 'metric'). |

A list of two ggplot2 UMAP objects.
Generate UMAP for the final prediction based on fixed windows, e.g. 100 kb bin windows
PlotUMAP_fixedwindow(query_fn, knowledge_fn, prediction_result, n_component = 30, seed = 123, ...)

| Argument | Description |
|---|---|
| query_fn | File path to query .cg |
| knowledge_fn | File path to 100 kb bin windows or reference pattern |
| prediction_result | Prediction result from PredictCellType |
| n_component | Number of PCA components to use (Default: 30) |
| seed | A number for the random seed (Default: 123) |
| ... | Additional arguments passed to 'uwot::umap' (e.g., 'n_neighbors', 'metric'). |

A list of two ggplot2 UMAP objects.
Predict cell type annotation from the trained model
PredictCellType(bst_model, predictMatrix, smooth = FALSE, KNeighbor = 5)

| Argument | Description |
|---|---|
| bst_model | The boosting model returned by Input_training |
| predictMatrix | A wide cell by pattern matrix generated from the GenerateInput function |
| smooth | A Boolean variable indicating whether to smooth the matrix (Default: FALSE) |
| KNeighbor | Number of kNN neighbors to use for smoothing (Default: 5) |

A cell by cell type matrix with confidence score and labeled cell type.

    qry <- system.file("extdata", "toy.cg", package = "MethScope")
    msk <- system.file("extdata", "toy.cm", package = "MethScope")
    res <- GenerateInput(qry, msk)
    ## Not run:
    prediction <- PredictCellType(Liu2021_MouseBrain_P1000, res)
    ## End(Not run)

Smooth cell by pattern matrix to reduce noise
smooth_matrix(predictMatrix, KNeighbor = 5)

| Argument | Description |
|---|---|
| predictMatrix | A wide cell by pattern matrix generated from the GenerateInput function |
| KNeighbor | Number of kNN neighbors to use for smoothing (Default: 5) |

The smoothed wide cell by pattern matrix, together with the kNN graph.
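
The general idea of kNN smoothing can be sketched in base R: find each cell's nearest neighbors and average their pattern values. This is an illustration under stated assumptions (Euclidean distance, each cell averaged together with its own `k` neighbors, and the neighbor indices returned as a cells by `k` matrix); the helper `knn_smooth` is hypothetical and the package may use a different distance, weighting, or graph representation.

```r
# Hypothetical sketch of kNN smoothing, not the package's implementation.
knn_smooth <- function(mat, k = 5) {
  mat <- as.matrix(mat)
  stopifnot(k >= 1, k < nrow(mat))
  d <- as.matrix(dist(mat))     # Euclidean cell-to-cell distances
  diag(d) <- Inf                # a cell is not its own neighbor
  # cells x k matrix of neighbor row indices, nearest first
  knn <- do.call(rbind, lapply(seq_len(nrow(mat)), function(i) {
    order(d[i, ])[seq_len(k)]
  }))
  # average each cell with its k neighbors
  smoothed <- do.call(rbind, lapply(seq_len(nrow(mat)), function(i) {
    colMeans(mat[c(i, knn[i, ]), , drop = FALSE])
  }))
  dimnames(smoothed) <- dimnames(mat)
  list(smoothed = smoothed, knn = knn)
}

toy_cells <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
sm <- knn_smooth(toy_cells, k = 1)
```

In the toy matrix the first two cells are each other's nearest neighbor, as are the last two, so smoothing with `k = 1` pulls each pair toward its midpoint without mixing the two groups.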