Focus On Oracle






  • Treat - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby
  • Ruby Linguistics - Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module which contains various English-language utilities.
  • Stemmer - Expose libstemmer_c to Ruby
  • Ruby Wordnet - This library is a Ruby interface to WordNet
  • Raspell - raspell is an interface binding to Aspell for Ruby
  • UEA Stemmer - Ruby port of UEALite Stemmer - a conservative stemmer for search and indexing
  • Twitter-text-rb - A library that does auto linking and extraction of usernames, lists and hashtags in tweets
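To give a feel for what "extraction of usernames and hashtags" involves, here is a minimal sketch in plain Ruby. It does not use the twitter-text gem and is not that library's API; the constants `MENTION_RE` and `HASHTAG_RE` and the two helper methods are hypothetical names chosen for illustration, and the patterns only handle ASCII names.

```ruby
# Simplified sketch using plain regular expressions, NOT the twitter-text API.
# Real tweets need more careful handling (Unicode, URLs, e-mail addresses).

# "@name": up to 15 word characters, not preceded by a word character,
# so that "user@example.com" is not treated as a mention.
MENTION_RE = /(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})/

# "#tag": word characters containing at least one letter, so "#123" is skipped.
HASHTAG_RE = /(?<![A-Za-z0-9_&])#([A-Za-z0-9_]*[A-Za-z][A-Za-z0-9_]*)/

# Returns the mentioned usernames without the leading "@".
def extract_mentions(text)
  text.scan(MENTION_RE).flatten
end

# Returns the hashtags without the leading "#".
def extract_hashtags(text)
  text.scan(HASHTAG_RE).flatten
end

tweet = "Thanks @alice and @bob_42 for the #Ruby #nlp tips!"
p extract_mentions(tweet)  # prints ["alice", "bob_42"]
p extract_hashtags(tweet)  # prints ["Ruby", "nlp"]
```

For production use, a maintained library such as the one listed above is the safer choice, since it tracks the platform's actual tokenization rules.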



  • rsruby - Ruby - R bridge
  • data-visualization-ruby - Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby
  • ruby-plot - gnuplot wrapper for ruby, especially for plotting roc curves into svg files
  • plot-rb - A plotting library in Ruby built on top of Vega and D3.
  • scruffy - A beautiful graphing toolkit for Ruby
  • SciRuby
  • Glean - A data management tool for humans
  • Bioruby
  • Arel




  • ahaz - ahaz: Regularization for semiparametric additive hazards regression
  • arules - arules: Mining Association Rules and Frequent Itemsets
  • bigrf - bigrf: Big Random Forests: Classification and Regression Forests for Large Data Sets
  • bigRR - bigRR: Generalized Ridge Regression (with special advantage for p >> n cases)
  • bmrm - bmrm: Bundle Methods for Regularized Risk Minimization Package
  • Boruta - Boruta: A wrapper algorithm for all-relevant feature selection
  • bst - bst: Gradient Boosting
  • C50 - C50: C5.0 Decision Trees and Rule-Based Models
  • caret - Classification and Regression Training: Unified interface to ~150 ML algorithms in R.
  • caretEnsemble - caretEnsemble: Framework for fitting multiple caret models as well as creating ensembles of such models.
  • Clever Algorithms For Machine Learning
  • CORElearn - CORElearn: Classification, regression, feature evaluation and ordinal evaluation
  • CoxBoost - CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks
  • Cubist - Cubist: Rule- and Instance-Based Regression Modeling
  • e1071 - e1071: Misc Functions of the Department of Statistics (e1071), TU Wien
  • earth - earth: Multivariate Adaptive Regression Spline Models
  • elasticnet - elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA
  • ElemStatLearn - ElemStatLearn: Data sets, functions and examples from the book: "The Elements of Statistical Learning, Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani and Jerome Friedman
  • evtree - evtree: Evolutionary Learning of Globally Optimal Trees
  • fpc - fpc: Flexible procedures for clustering
  • frbs - frbs: Fuzzy Rule-based Systems for Classification and Regression Tasks
  • GAMBoost - GAMBoost: Generalized linear and additive models by likelihood based boosting
  • gamboostLSS - gamboostLSS: Boosting Methods for GAMLSS
  • gbm - gbm: Generalized Boosted Regression Models
  • glmnet - glmnet: Lasso and elastic-net regularized generalized linear models
  • glmpath - glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model
  • GMMBoost - GMMBoost: Likelihood-based Boosting for Generalized mixed models
  • grplasso - grplasso: Fitting user specified models with Group Lasso penalty
  • grpreg - grpreg: Regularization paths for regression models with grouped covariates
  • h2o - A framework for fast, parallel, and distributed machine learning algorithms at scale: deep learning, random forests, GBM, k-means, PCA, GLM
  • hda - hda: Heteroscedastic Discriminant Analysis
  • Introduction to Statistical Learning
  • ipred - ipred: Improved Predictors
  • kernlab - kernlab: Kernel-based Machine Learning Lab
  • klaR - klaR: Classification and visualization
  • lars - lars: Least Angle Regression, Lasso and Forward Stagewise
  • lasso2 - lasso2: L1 constrained estimation aka ‘lasso’
  • LiblineaR - LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library
  • LogicReg - LogicReg: Logic Regression
  • Machine Learning For Hackers
  • maptree - maptree: Mapping, pruning, and graphing tree models
  • mboost - mboost: Model-Based Boosting
  • medley - medley: Blending regression models, using a greedy stepwise approach
  • mlr - mlr: Machine Learning in R
  • mvpart - mvpart: Multivariate partitioning
  • ncvreg - ncvreg: Regularization paths for SCAD- and MCP-penalized regression models
  • nnet - nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models
  • oblique.tree - oblique.tree: Oblique Trees for Classification Data
  • pamr - pamr: Pam: prediction analysis for microarrays
  • party - party: A Laboratory for Recursive Partytioning
  • partykit - partykit: A Toolkit for Recursive Partytioning
  • penalized - penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model
  • penalizedLDA - penalizedLDA: Penalized classification using Fisher's linear discriminant
  • penalizedSVM - penalizedSVM: Feature Selection SVM using penalty functions
  • quantregForest - quantregForest: Quantile Regression Forests
  • randomForest - randomForest: Breiman and Cutler's random forests for classification and regression
  • randomForestSRC - randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC)
  • rattle - rattle: Graphical user interface for data mining in R
  • rda - rda: Shrunken Centroids Regularized Discriminant Analysis
  • rdetools - rdetools: Relevant Dimension Estimation (RDE) in Feature Spaces
  • REEMtree - REEMtree: Regression Trees with Random Effects for Longitudinal (Panel) Data
  • relaxo - relaxo: Relaxed Lasso
  • rgenoud - rgenoud: R version of GENetic Optimization Using Derivatives
  • rgp - rgp: R genetic programming framework
  • Rmalschains - Rmalschains: Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R
  • rminer - rminer: Simpler use of data mining methods (e.g. NN and SVM) in classification and regression
  • ROCR - ROCR: Visualizing the performance of scoring classifiers
  • RoughSets - RoughSets: Data Analysis Using Rough Set and Fuzzy Rough Set Theories
  • rpart - rpart: Recursive Partitioning and Regression Trees
  • RPMM - RPMM: Recursively Partitioned Mixture Model
  • RSNNS - RSNNS: Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)
  • RWeka - RWeka: R/Weka interface
  • RXshrink - RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression
  • sda - sda: Shrinkage Discriminant Analysis and CAT Score Variable Selection
  • SDDA - SDDA: Stepwise Diagonal Discriminant Analysis
  • SuperLearner and subsemble - Multi-algorithm ensemble learning packages.
  • svmpath - svmpath: the SVM Path algorithm
  • tgp - tgp: Bayesian treed Gaussian process models
  • tree - tree: Classification and regression trees
  • varSelRF - varSelRF: Variable selection using random forests
  • XGBoost.R - R binding for eXtreme Gradient Boosting (Tree) Library
  • Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with R.


  • ggplot2 - A data visualization package based on the grammar of graphics.



  • ScalaNLP - ScalaNLP is a suite of machine learning and numerical computing libraries.
  • Breeze - Breeze is a numerical processing library for Scala.
  • Chalk - Chalk is a natural language processing library.
  • FACTORIE - FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.


  • MLlib in Apache Spark - Distributed machine learning library in Spark
  • Scalding - A Scala API for Cascading
  • Summingbird - Streaming MapReduce with Scalding and Storm
  • Algebird - Abstract Algebra for Scala
  • xerial - Data management utilities for Scala
  • simmer - Reduce your data. A unix filter for algebird-powered aggregation.
  • PredictionIO - PredictionIO, a machine learning server for software developers and data engineers.
  • BIDMat - CPU and GPU-accelerated matrix library intended to support large-scale exploratory data analysis.
  • Wolfe Declarative Machine Learning


  • Conjecture - Scalable Machine Learning in Scalding
  • brushfire - Distributed decision tree ensemble learning in Scala
  • ganitha - Scalding-powered machine learning
  • adam - A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
  • bioscala - Bioinformatics for the Scala programming language
  • BIDMach - CPU and GPU-accelerated Machine Learning Library.
  • Figaro - a Scala library for constructing probabilistic models.
  • H2O Sparkling Water - H2O and Spark interoperability.


Keywords: ml open

