misc/CHANGELOG-archive.md
These are the release notes for older versions. For current releases, see CHANGELOG.md.
- ObjectSpace.memsize_of is now safe to call on Documents with complex DTDs. In previous versions, this debugging method could result in a segfault. [#2923, #2924]
- XML::Node as the first parameter to CDATA.new now raises a TypeError. Previously this would result in either a segfault (CRuby) or a Java exception (JRuby). [#2920]
- XML::Node as the first parameter to Schema.from_document now raises a TypeError. Previously this would result in either a segfault (CRuby) or a Java exception (JRuby). [#2920]
- XML::Node as the second parameter to Text.new now raises a TypeError. Previously this would result in a segfault. [#2920]
- Node#inner_html=, #children=, and #replace no longer defensively dups the node's next sibling if it is a Text node. This behavior was originally adopted to work around libxml2's memory management (see #283 and #595) but should not have included operations involving xmlAddChild(). [#2916]
malloc and free
Since 2009, Nokogiri has configured libxml2 to use ruby_xmalloc et al for memory management. This has provided benefits for memory management, but comes with a performance penalty.
Users can now opt into using system malloc for libxml2 memory management by setting an environment variable:
# "default" here means "libxml2's default" which is system malloc
NOKOGIRI_LIBXML_MEMORY_MANAGEMENT=default
Benchmarks show that this setting will significantly improve performance, but be aware that the tradeoff may involve poorer memory management including bloated heap sizes and/or OOM conditions.
You can read more about this in the decision record at adr/2023-04-libxml-memory-management.md.
- Encoding objects may now be passed to serialization methods like #to_xml, #to_html, #serialize, and #write_to to specify the output encoding. Previously only encoding names (strings) were accepted. [#2774, #2798] (@ellaklara)
- malloc for libxml2 memory management. For more detail, see note above or adr/2023-04-libxml-memory-management.md.
- Schema.from_document now makes a defensive copy of the document if it has blank text nodes with Ruby objects instantiated for them. This prevents unsafe behavior in libxml2 from causing a segfault. There is a small performance cost, but we think this has the virtue of being "what the user meant" since modifying the original is surprising behavior for most users. Previously this was addressed in v1.10.9 by raising an exception.
- XSLT.transform now makes a defensive copy of the document if it has blank text nodes with Ruby objects instantiated for them and the template uses xsl:strip-space. This prevents unsafe behavior in libxslt from causing a segfault. There is a small performance cost, but we think this has the virtue of being "what the user meant" since modifying the original is surprising behavior for most users. Previously this would allow unsafe memory access and potentially segfault. [#2800]
- Nokogiri::XML::Node::SaveOptions#inspect now shows the names of the options set in the bitmask, similar to ParseOptions. [#2767]
- #inspect and pretty-printing are improved for AttributeDecl, ElementContent, ElementDecl, and EntityDecl.
- ObjectSpace.memsize_of reports a pretty good guess of memory usage when called on Nokogiri::XML::Document objects. [#2807] (@etiennebarrie and @byroot)
- config.guess and config.sub that supports new architectures like loongarch64. [#2831] (@zhangwenlong8911)
- xlink:arcrole and removing xml:base [#2841, #2842]
- <hr> in <select> [whatwg/html#3410, whatwg/html#9124]
- Node#first_element_child now returns nil if there are only non-element children. Previously a null pointer exception was raised. [#2808, #2844]
- Nokogiri::XSLT now has usage examples including custom function handlers.
- Nokogiri::XML::Node as the first parameter to CDATA.new is deprecated and will generate a warning. This parameter should be a kind of Nokogiri::XML::Document. This will become an error in a future version of Nokogiri.
- Nokogiri::XML::Node as the first parameter to Schema.from_document is deprecated and will generate a warning. This parameter should be a kind of Nokogiri::XML::Document. This will become an error in a future version of Nokogiri.
- Nokogiri::XML::Node as the second parameter to Text.new is deprecated and will generate a warning. This parameter should be a kind of Nokogiri::XML::Document. This will become an error in a future version of Nokogiri.
- nokogiri namespace is deprecated and will generate a warning. Support for non-namespaced functions will be removed in a future version of Nokogiri. (Note that JRuby has never supported non-namespaced custom XPath functions.)
The following people and organizations were kind enough to sponsor @flavorjones or the Nokogiri project during the development of v1.15.0:
We'd also like to thank @github who donate a ton of compute time for our CI pipelines!
To ensure that JRuby users on Java 8 can apply the security changes from v1.14.4, we're cutting this release on the v1.14.x branch. We don't expect to make any more v1.14.x releases.
(The changes in this release are incorporated into the v1.15.x release branch at v1.15.2.)
[JRuby] Vendored Xalan-J is updated to v2.7.3. This is the first Xalan release in nine years, and it was done to address CVE-2022-34169.
The Nokogiri maintainers wish to stress that Nokogiri users were not vulnerable to this CVE, as we explained in GHSA-qwq9-89rg-ww72, and so upgrading is really at the discretion of users.
This release was cut primarily so that JRuby users of v1.14.x can avoid vulnerability scanner alerts on earlier versions of Xalan-J.
- NodeSet#to_html on an empty node set no longer raises an encoding-related exception. This bug was introduced in v1.14.0 while fixing #2649. [#2784]
- Zip::OutputStream). This was a regression in v1.14.0 due to the fix for #752 in #2434, and was not completely fixed by #2753. [#2773]
- void* casting and old-style C function definitions.
This release introduces native gem support for Ruby 3.2. (Also see "Technical note" under "Changed" below.)
This release ends support for:
aarch64-linux (aka linux/arm64/v8)
This version of Nokogiri ships official native gem support for the aarch64-linux platform, which should support AWS Graviton and other ARM64 Linux platforms. Please note that glibc >= 2.29 is required for aarch64-linux systems, see Supported Platforms for more information.
arm-linux (aka linux/arm/v7)
This version of Nokogiri ships experimental native gem support for the arm-linux platform. Please note that glibc >= 2.29 is required for arm-linux systems, see Supported Platforms for more information.
This version introduces an experimental pattern matching API for XML::Attr, XML::Document, XML::DocumentFragment, XML::Namespace, XML::Node, and XML::NodeSet (and their subclasses).
Some documentation on what can be matched:
- XML::Attr#deconstruct_keys
- XML::Document#deconstruct_keys
- XML::Namespace#deconstruct_keys
- XML::Node#deconstruct_keys
- XML::DocumentFragment#deconstruct
- XML::NodeSet#deconstruct
We welcome feedback on this API at #2360.
- jar-dependencies to manage most of the vendored Java dependencies. nokogiri -v now outputs maven metadata for all Java dependencies, and Nokogiri::VERSION_INFO also contains this metadata. [#2432]
- net.sourceforge.htmlunit:neko-htmlunit:2.61.0 (previously Nokogiri used a fork of org.cyberneko.html:nekohtml)
- com.thaiopensource:jing:20091111 to nu.validator:jing:20200702VNU.
- net.sf.saxon:Saxon-HE:9.6.0-4 (via nu.validator:jing:20200702VNU).
- Node#wrap and NodeSet#wrap now also accept a Node type argument, which will be duped for each wrapper. For cases where many nodes are being wrapped, creating a Node once using Document#create_element and passing that Node multiple times is significantly faster than re-parsing markup on each call. [#2657]
- nokogiri namespace prefix. Historically, the JRuby implementation required this namespace but the CRuby implementation did not support it. It's recommended that all XPath and CSS queries use the nokogiri namespace going forward. Invocation without the namespace is planned for deprecation in v1.15.0 and removal in a future release. [#2147]
- HTML5::Document#quirks_mode and HTML5::DocumentFragment#quirks_mode expose the quirks mode used by the parser.
- Encoding objects rather than compare their names. This is a slight performance improvement and is future-proof. [#2454] (@casperisfine)
- Document#canonicalize now raises an exception if inclusive_namespaces is non-nil and the mode is inclusive, i.e. XML_C14N_1_0 or XML_C14N_1_1. inclusive_namespaces can only be passed with exclusive modes, and previously this silently failed.
- Nokogiri::CSS::SyntaxError message, "empty CSS selector". Previously the exception raised from the bowels of racc was "unexpected '$' after ''". [#2700]
- XML::Reader parsing errors encountered during Reader#attribute_hash and Reader#namespaces now raise an XML::SyntaxError. Previously these methods would return nil and users would generally experience NoMethodErrors from elsewhere in the code.
- ruby_xmalloc to malloc within the C extension. [#2480] (@Garfield96)
- gumbo.h on OpenBSD. [#2464]
- vasprintf in favor of platform-independent rb_vsprintf
- -Wno-unknown-warning-option to avoid errors when Ruby injects options that clang doesn't know about. [#2689]
- SAX::Parser's encoding attribute will not be clobbered when an alternative encoding is passed into SAX::Parser#parse_io. [#1942] (@kp666)
- HTML4::DocumentFragment will now be properly encoded. Previously this empty string was encoded as US-ASCII. [#2649]
- Node#wrap now uses the parent as the context node for parsing wrapper markup, falling back to the document for unparented nodes. Previously the document was always used.
- form elements.
- HTML5::Document#fragment now always uses body as the parsing context. Previously, fragments were parsed in the context of the associated document's root node, which allowed for inconsistent parsing. [#2553]
- Nokogiri::HTML5::Document#url now correctly returns the URL passed to the constructor method. Previously it always returned nil. [#2583]
- HTML5 encoding detection is now case-insensitive with respect to meta tag charset declaration. [#2693]
- HTML5 fragment parsing in context of an annotation-xml node now works. Previously this rarely-used path invoked rb_funcall with incorrect parameters, resulting in an exception, a fatal error, or potentially a segfault. [#2692]
- HTML5 quirks mode during fragment parsing more closely matches document parsing. [#2646]
- #add_namespace_definition. [#1247]
- NodeSet#[] now raises a TypeError if passed an invalid parameter type. [#2211]
- Nokogiri.install_default_aliases is deprecated in favor of Nokogiri::EncodingHandler.install_default_aliases. This is part of a private API and is probably not called by anybody, but we'll go through a deprecation cycle before removal anyway. [#2643, #2446]
The following people and organizations were kind enough to sponsor @flavorjones or the Nokogiri project during the development of v1.14.0:
- xmlTextReaderExpand. See GHSA-qv4q-mr5r-qprj for more information.
- XML::Reader#attribute_hash now returns nil on parse errors. This restores the behavior of #attributes from v1.13.7 and earlier. [#2715]
- Nokogiri::XML::Namespace objects, when compacted, update their internal struct's reference to the Ruby object wrapper. Previously, with GC compaction enabled, a segmentation fault was possible after compaction was triggered. [#2658] (@eightbitraptor and @peterzhu2118)
- Document#remove_namespaces! now defers freeing the underlying xmlNs struct until the Document is GCed. Previously, maintaining a reference to a Namespace object that was removed in this way could lead to a segfault. [#2658]
- XML::Reader#attribute_nodes is deprecated due to incompatibility between libxml2's xmlReader memory semantics and Ruby's garbage collector. Although this method continues to exist for backwards compatibility, it is unsafe to call and may segfault. This method will be removed in a future version of Nokogiri, and callers should use #attribute_hash instead. [#2598]
- XML::Reader#attribute_hash is a new method to safely retrieve the attributes of a node from XML::Reader. [#2598, #2599]
- XML::Reader#attributes is now safe to call. In Nokogiri <= 1.13.7 this method may segfault. [#2598, #2599]
- XML::Node objects, when compacted, update their internal struct's reference to the Ruby object wrapper. Previously, with GC compaction enabled, a segmentation fault was possible after compaction was triggered. [#2578] (@eightbitraptor)
- {HTML4,XML}::SAX::{Parser,ParserContext} constructor methods now raise TypeError instead of segfaulting when an incorrect type is passed.
- < characters.
- <![CDATA[ and incorrectly-opened comments will result in HTML text nodes starting with <! instead of skipping the invalid tag. This behavior is a direct result of the quadratic-behavior fix noted above. The behavior of downstream sanitizers relying on this behavior will also change. Some tests describing the changed behavior are in test/html4/test_comments.rb.
- xerces:xercesImpl) is updated to address CVE-2022-23437. See GHSA-xxx9-3xcr-gjj3 for more information.
- org.cyberneko.html) is updated to address CVE-2022-24839. See GHSA-gx8x-g87m-h5q6 for more information.
- xerces:xercesImpl) is updated from 2.12.0 to 2.12.2.
- org.cyberneko.html) is updated from a fork of 1.9.21 to 1.9.22.noko2. This fork is now publicly developed at https://github.com/sparklemotion/nekohtml
- < character in some contexts. This version of Nokogiri restores the earlier behavior, which is to recover from the parse error and treat the < as normal character data (which will be serialized as &lt; in a text node). The bug (and the fix) is only relevant when the RECOVER parse option is set, as it is by default. [#2461]
Please see GHSA-fq42-c5rg-92c2 for more information about these CVEs.
- Nokogiri::XSLT.quote_params regression in v1.13.0 that raised an exception when non-string stylesheet parameters were passed. Non-string parameters (e.g., integers and symbols) are now explicitly supported and both keys and values will be stringified with #to_s. [#2418]
- Nokogiri::XML::XPath::SyntaxError when parsing XPath attributes mixed into the CSS query. Although this mash-up of XPath and CSS syntax previously worked unintentionally, it is now an officially supported feature and is documented as such. [#2419]
This release introduces native gem support for Ruby 3.1. Please note that Windows users should use the x64-mingw-ucrt platform gem for Ruby 3.1, and x64-mingw32 for Ruby 2.6–3.0 (see RubyInstaller 3.1.0 release notes).
This release ends support for:
This version of Nokogiri ships experimental native gem support for the aarch64-linux platform, which should support AWS Graviton and other ARM Linux platforms. We don't yet have CI running for this platform, and so we're interested in hearing back from y'all whether this is working, and what problems you're seeing. Please send us feedback here: Feedback: Have you used the aarch64-linux native gem?
This version of Nokogiri opts-in to the "MFA required to publish" setting on Rubygems.org. This and all future Nokogiri gem files must be published to Rubygems by an account with multi-factor authentication enabled. This should provide some additional protection against supply-chain attacks.
A related discussion about Trust exists at #2357 in which I invite you to participate if you have feelings or opinions on this topic.
- LICENSE-DEPENDENCIES.md for more information.) [#2206]
- ~> 2.6.1 to ~> 2.7.0. ("ruby" platform gem only.)
- {XML,HTML4}::DocumentFragment constructors all now take an optional parse options parameter or block (similar to Document constructors). [#1692] (@JackMc)
- Nokogiri::CSS.xpath_for allows an XPathVisitor to be injected, for finer-grained control over how CSS queries are translated into XPath.
- XML::Reader#encoding will return the encoding detected by the parser when it's not passed to the constructor. [#980]
- Node#line is no longer capped at 65535. libxml v2.9.0 and later support a new parse option, exposed as Nokogiri::XML::ParseOptions::PARSE_BIG_LINES, which is turned on by default in ParseOptions::DEFAULT_{XML,XSLT,HTML,SCHEMA} (Note that JRuby already supported large line numbers.) [#1764, #1493, #1617, #1505, #1003, #533]
- RuntimeError is raised. libxml2 does no checking for this, which means cycles would otherwise result in infinite loops on subsequent operations. (Note that JRuby already did this.) [#1912]
- Node#line behavior has been modified to return the line number of the node in the final DOM structure. This behavior is different from CRuby, which returns the node's position in the input string. Ideally the two implementations would be the same, but at least it is now officially documented and tested. The real-world impact of this change is that the value returned in JRuby is greater by 1 to account for the XML prolog in the output. [#2380] (@dabdine)
- XML::Builder blocks restore context properly when exceptions are raised. [#2372] (@ric2b and @rinthedev)
- Nokogiri::CSS::Parser cache now uses the XPathVisitor configuration as part of the cache key, preventing incorrect cache results from being returned when multiple XPathVisitor options are being used.
- Node#parse) now always uses the correct DocumentFragment class. Previously Nokogiri::HTML4::DocumentFragment was always used, even for XML documents. [#1158]
- DocumentFragment#> now works properly, matching a CSS selector against only the fragment roots. [#1857]
- XML::DocumentFragment#errors now correctly contains any parsing errors encountered. Previously this was always empty. (Note that HTML::DocumentFragment#errors already did this.)
- Document#canonicalize when inclusive namespaces are passed in. [#2345]
- Document#canonicalize when an argument type error is raised. [#2345]
- EncodingHandler where iconv handlers were not being cleaned up. [#2345]
- Reader#base_uri where the string returned by libxml2 was not freed. [#2347]
- Namespace from a NodeSet no longer modifies the href to be the default namespace URL.
- Nokogiri::XML::Node as the second parameter to Node.new is deprecated and will generate a warning. This parameter should be a kind of Nokogiri::XML::Document. This will become an error in a future version of Nokogiri. [#975]
- Nokogiri::CSS::Parser, Nokogiri::CSS::Tokenizer, and Nokogiri::CSS::Node are now internal-only APIs that are no longer documented, and should not be considered stable. With the introduction of XPathVisitor injection into Nokogiri::CSS.xpath_for there should be no reason to rely on these internal APIs.
- Nokogiri::CSS::XPathVisitorAlwaysUseBuiltins and XPathVisitorOptimallyUseBuiltins are deprecated. Prefer Nokogiri::CSS::XPathVisitor with appropriate constructor arguments. These classes will be removed in a future version of Nokogiri.
[JRuby] Address CVE-2021-41098 (GHSA-2rr5-8q37-2w7h).
In Nokogiri v1.12.4 and earlier, on JRuby only, the SAX parsers resolve external entities (XXE) by default. This fix turns off entity-resolution-by-default in the JRuby SAX parsers to match the CRuby SAX parsers' behavior.
CRuby users are not affected by this CVE.
Document#to_xhtml properly serializes self-closing tags in libxml > 2.9.10. A behavior change introduced in libxml 2.9.11 resulted in emitting start and end tags (e.g., `<br></br>`) instead of a self-closing tag (e.g., `<br/>`) in previous Nokogiri versions. [#2324]
Namespace behavior when reparenting nodes has historically been poorly specified and the behavior diverged between CRuby and JRuby. As a result, making this behavior consistent in v1.12.0 introduced a breaking change.
This patch release reverts the Builder behavior present in v1.12.0..v1.12.3 but keeps the Document behavior. This release also introduces a Document attribute to allow affected users to easily change this behavior for their legacy code without invasive changes.
This release of Nokogiri introduces a new Document boolean attribute, namespace_inheritance, which controls whether children should inherit a namespace when they are reparented. Nokogiri::XML::Document defaults this attribute to false, meaning "do not inherit," thereby making explicit the behavior change introduced in v1.12.0.
CRuby users who desire the pre-v1.12.0 behavior may set document.namespace_inheritance = true before reparenting nodes.
See https://nokogiri.org/rdoc/Nokogiri/XML/Document.html#namespace_inheritance-instance_method for example usage.
However, recognizing that we want Builder-created children to inherit namespaces, Builder now will set namespace_inheritance=true on the underlying document for both JRuby and CRuby. This means that, on CRuby, the pre-v1.12.0 behavior is restored.
Users who want to turn this behavior off may pass a keyword argument to the Builder constructor like so:
Nokogiri::XML::Builder.new(namespace_inheritance: false)
See https://nokogiri.org/rdoc/Nokogiri/XML/Builder.html#label-Namespace+inheritance for example usage.
Note that any downstream gems may want to specifically omit Nokogiri v1.12.0--v1.12.3 from their dependency specification if they rely on child namespace inheritance:
Gem::Specification.new do |gem|
  # ...
  gem.add_runtime_dependency 'nokogiri', '!=1.12.3', '!=1.12.2', '!=1.12.1', '!=1.12.0'
  # ...
end
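To sanity-check what a dependency specification like this allows, one option is to evaluate the same constraints with RubyGems' own Gem::Requirement class. This is a minimal sketch; the list of versions being checked is arbitrary and chosen only for illustration.

```ruby
require "rubygems"

# The same constraints as in the gemspec snippet above.
requirement = Gem::Requirement.new("!=1.12.3", "!=1.12.2", "!=1.12.1", "!=1.12.0")

# Arbitrary sample versions, chosen only to show both outcomes.
%w[1.11.7 1.12.0 1.12.3 1.12.4].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'excluded'}"
end
```

Only the four listed versions are excluded; earlier and later releases still satisfy the requirement.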
- systemId. [#2296] (@pepijnve)
- require and rely on $LOAD_PATH instead of using require_relative. This issue only exists when deleting shared libraries that exist outside the extensions directory, something users occasionally do to conserve disk space. [#2300]
HTML5 support has been added (to CRuby only) by merging Nokogumbo into Nokogiri. The Nokogumbo public API has been preserved, so this functionality is available under the Nokogiri::HTML5 namespace. [#2204]
Please note that HTML5 support is not available for JRuby in this version. However, we feel it is important to think about JRuby and we hope to work on this in the future. If you're interested in helping with HTML5 support on JRuby, please reach out to the maintainers by commenting on issue #2227.
Many thanks to Sam Ruby, Steve Checkoway, and Craig Barnes for creating and maintaining Nokogumbo and supporting the Gumbo HTML5 parser. They're now Nokogiri core contributors with all the powers and privileges pertaining thereto. 🙌
Nokogiri::HTML4 module and namespace
Nokogiri::HTML has been renamed to Nokogiri::HTML4, and Nokogiri::HTML is aliased to preserve backwards-compatibility. Nokogiri::HTML and Nokogiri::HTML4 parse methods still use libxml2's (or NekoHTML's) HTML4 parser in the v1.12 release series.
Take special note that if you rely on the class name of an object in your code, objects will now report a class of Nokogiri::HTML4::Foo where they previously reported Nokogiri::HTML::Foo. Instead of relying on the string returned by Object#class, prefer Class#=== or Object#is_a? or Object#instance_of?.
Future releases of Nokogiri may deprecate HTML methods or otherwise change this behavior, so please start using HTML4 in place of HTML.
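The effect of a renamed-and-aliased module on class names can be reproduced without Nokogiri. In this sketch the Demo namespace and its classes are hypothetical stand-ins for Nokogiri::HTML4 and Nokogiri::HTML; it only illustrates why string comparisons on class names break while is_a? and Class#=== keep working.

```ruby
# Hypothetical stand-ins for Nokogiri::HTML4 and its alias Nokogiri::HTML.
module Demo
  module HTML4
    class Document; end
  end

  # The old name is kept as an alias pointing at the same module object.
  HTML = HTML4
end

doc = Demo::HTML::Document.new

# The reported class name follows the module's real name, not the alias.
puts doc.class.name

# Checks against the constant work through either name.
puts doc.is_a?(Demo::HTML::Document)
puts Demo::HTML4::Document === doc
```

Code that compares against the old name as a string would stop matching, while both constant-based checks succeed.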
- Nokogiri::VERSION_INFO["libxslt"]["datetime_enabled"] is a new boolean value which describes whether libxslt (or, more properly, libexslt) has compiled-in datetime support. This is generally going to be true, but some distros ship without this support (e.g., some mingw UCRT-based packages, see https://github.com/msys2/MINGW-packages/pull/8957). See #2272 for more details.
- Nokogiri::XML::ParseOptions::DEFAULT_XSLT, which adds the libxslt-preferred options of NOENT | DTDLOAD | DTDATTR | NOCDATA to ParseOptions::DEFAULT_XML.
- Nokogiri.XSLT parses stylesheets using ParseOptions::DEFAULT_XSLT, which should make some edge-case XSL transformations match libxslt's default behavior. [#1940]
- libiconv, libxml2, and libxslt by using autoconf's --disable-dependency-tracking option. ("ruby" platform gem only.)
- Nokogiri::HTML5.get. This method will be removed in a future version of Nokogiri.
- ~> 2.5.0 to ~> 2.6.1. ("ruby" platform gem only.)
- DocumentFragment#path now does proper error-checking to handle behavior introduced in libxml > 2.9.10. In v1.11.4 and v1.11.5, calling DocumentFragment#path could result in a segfault.
[Windows CRuby] Work around segfault at process exit on Windows when using libxml2 system DLLs.
libxml 2.9.12 introduced new behavior to avoid memory leaks when unloading libxml2 shared libraries (see libxml/!66). Early testing caught this segfault on non-Windows platforms (see #2059 and libxml@956534e) but it was incompletely fixed and is still an issue on Windows platforms that are using system DLLs.
We work around this by configuring libxml2 in this situation to use its default memory management functions. Note that if Nokogiri is not on Windows, or is not using shared system libraries, it will continue to configure libxml2 to use Ruby's memory management functions. Nokogiri::VERSION_INFO["libxml"]["memory_management"] will allow you to verify when the default memory management functions are being used. [#2241]
Nokogiri::VERSION_INFO["libxml"] now contains the key "memory_management" to declare whether libxml2 is using its default memory management functions, or whether it uses the memory management functions from ruby. See above for more details.
[CRuby] Vendored libxml2 upgraded to v2.9.12 which addresses:
Note that two additional CVEs were addressed upstream but are not relevant to this release. CVE-2021-3516 via xmllint is not present in Nokogiri, and CVE-2020-7595 has been patched in Nokogiri since v1.10.8 (see #1992).
Please see nokogiri/GHSA-7rrm-v45f-jp64 or #2233 for a more complete analysis of these CVEs and patches.
- Node objects to Document#root= now raises an ArgumentError exception. Previously this likely segfaulted. [#1900]
- Node objects to Document#root= now raises an ArgumentError exception. Previously this raised a TypeError exception.
- NodeSet may now safely contain Node objects from multiple documents. Previously the GC lifecycle of the parent Document objects could lead to nodes being GCed while still in scope. [#1952]
- nokogiri.so by including LDFLAGS in Nokogiri::VERSION_INFO. [#2167]
- {XML,HTML}::Document.parse now invokes #initialize exactly once. Previously #initialize was invoked twice on each object.
- {XML,HTML}::Document.parse now invokes #initialize exactly once. Previously #initialize was not called, which was a problem for subclassing such as done by Loofah.
- HTML::DocumentFragment. [#2087] (@ashmaroli)
- Node#line to be wrong less-often. The underlying parser, Xerces, does not track line numbers, and so we've always used a hacky solution for this method. [#1223, #2177]
- --enable-system-libraries and --disable-system-libraries flags to extconf.rb. These flags provide the same functionality as --use-system-libraries and the NOKOGIRI_USE_SYSTEM_LIBRARIES environment variable, but are more idiomatic. [#2193] (@eregon)
- --disable-static is now the default on TruffleRuby when the packaged libraries are used. This is more flexible and compiles faster. (Note, though, that the default on TR is still to use system libraries.) [#2191, #2193] (@eregon)
- Nokogiri::XML::Path is now a Module (previously it has been a Class). It has been acting solely as a Module since v1.0.0. See 8461c74.
- libxml-ruby is loaded before nokogiri, the SAX and Push parsers no longer call libxml-ruby's handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]
"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need for compiling the C extension and the packaged libraries. This results in much faster and more reliable installation, which as you probably know are the biggest headaches for Nokogiri users.
We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:
- x86-linux and x86_64-linux -- including musl platforms like alpine
- x86_64-darwin and arm64-darwin
We'd appreciate your thoughts and feedback on this work at #2075.
This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.
This release ends support for:
- ~> 2.4.0 to ~> 2.5.0 [#2005] (@alejandroperea)
See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema parsing treats input as untrusted by default".
- class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]
- a:has(> b), a:has(~ b), and a:has(+ b). [#688] (@jonathanhefner)
- Node#value? to better match expected semantics of a Hash-like object. [#1838, #1840] (@MatzFan)
- Nokogiri::XML::Node#line= for use by downstream libs like nokogumbo. [#1918] (@stevecheckoway)
- nokogiri.gemspec is back after a 10-year hiatus. We still prefer you use the official releases, but main is pretty stable these days, and YOLO.
- ~= operator and class selector . are about 2x faster. [#2137, #2135]
- strlen from xmlStrlen rather than the naive implementation, because strlen is generally optimized for the architecture. [#2144] (@ilyazub)
- RelaxNG.from_document no longer leaks memory. [#2114]
- {HTML,XML}::Document#parse now accept Pathname objects. Previously this worked only if the referenced file was less than 4096 bytes long; longer files resulted in undefined behavior because the read method would be repeatedly invoked. [#1821, #2110] (@doriantaylor and @phokz)
- frozen_string_literal: true magic comment to all lib files. [#1745] (@oniofchaos)
- RECOVER parse option not set) now correctly raises a XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby. [#2130]
- ~= operator now correctly handles non-space whitespace in the class attribute. commit e45dedd
- add_previous_sibling, previous=, before, add_next_sibling, next=, after, replace, and swap now correctly use their parent as the context node for parsing markup. These methods now also raise a RuntimeError if they are called on a node with no parent. [nokogumbo#160]
- XML::Schema XSD validation errors are captured in XML::Schema#errors. These errors were previously ignored.
- Node#<=> now matches CRuby/libxml2 behavior.
- Document#errors for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
- vasprintf. [#1908]
- canonicalize. [#2105]
- Nokogiri::CSS::Parser.cache_on= has been removed. Use .set_cache if you need to muck with the cache internals.
- Nokogiri::CSS::Parser.parse has been removed. This was originally deprecated in 2009 in 13db61b. Use Nokogiri::CSS.parse instead.
XML::Schema input is now "untrusted" by default
Address CVE-2020-26247.
In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by Nokogiri::XML::Schema were trusted by default, allowing external resources to be accessed over the network, potentially enabling XXE or SSRF attacks.
This behavior is counter to the security policy intended by Nokogiri maintainers, which is to treat all input as untrusted by default whenever possible.
Please note that this security fix was pushed into a new minor version, 1.11.x, rather than a patch release to the 1.10.x branch, because it is a breaking change for some schemas and the risk was assessed to be "Low Severity".
More information and instructions for enabling "trusted input" behavior in v1.11.0.rc4 and later is available at the public advisory.
strict or norecover parsing option
(Also noted above in the "Fixed" section) HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises a XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby.
If you're using the default parser options, you will be unaffected by this fix. If you're passing strict or norecover to your HTML parser call, you may be surprised to see that the parser now fails to recover and raises a XML::SyntaxError exception. Given the number of HTML documents on the internet that libxml2 would consider to be ill-formed, this is probably not what you want, and you can omit setting that parse option to restore the behavior that you have been relying upon.
Apologies to anyone inconvenienced by this breaking bugfix being present in a minor release, but I felt it was appropriate to introduce this fix because it's straightforward to fix any code that has been relying on this buggy behavior.
VersionInfo, the output of nokogiri -v, and related constants
This release changes the metadata provided in Nokogiri::VersionInfo which also affects the output of nokogiri -v. Some related constants have also been changed. If you're using VersionInfo programmatically, or relying on constants related to underlying library versions, please read the detailed changes for Nokogiri::VersionInfo at #2139 and accept our apologies for the inconvenience.
NodeSet#to_a to return a RubyArray instead of Object, for compilation under JRuby 9.2.9 and later. [#1968, #1969] (@headius)
[MRI] Pulled in upstream patch from libxml that addresses CVE-2020-7595. Full details are available in #1992. Note that this patch is not yet (as of 2020-02-10) in an upstream release of libxml.
patch. [#1954]
[MRI] Vendored libxslt upgraded to v1.1.34 which addresses three CVEs for libxslt:
More details are available at #1943.
Address CVE-2019-5477 [#1915].
A command injection vulnerability in Nokogiri v1.10.3 and earlier allows commands to be executed in a subprocess by Ruby's Kernel.open method. Processes are vulnerable only if the undocumented method Nokogiri::CSS::Tokenizer#load_file is being passed untrusted user input.
This vulnerability appears in code generated by the Rexical gem versions v1.0.6 and earlier. Rexical is used by Nokogiri to generate lexical scanner code for parsing CSS queries. The underlying vulnerability was addressed in Rexical v1.0.7 and Nokogiri upgraded to this version of Rexical in Nokogiri v1.10.4.
This CVE's public notice is #1915.
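The class of bug described here can be sketched without Nokogiri at all. Kernel#open has historically treated a string beginning with "|" as a command to run, while File.open always treats its argument as a filename. The helper below is hypothetical (it is not the Rexical or Nokogiri fix itself) and only shows the safer call for reading a path that comes from untrusted input.

```ruby
require "tempfile"

# Hypothetical helper (not part of Nokogiri or Rexical): read a file named by
# untrusted input without giving Ruby a chance to spawn a subprocess.
def read_untrusted_path(path)
  # File.open never interprets a leading "|" as a command.
  File.open(path, "r") { |file| file.read }
end

begin
  read_untrusted_path("| echo pwned")
rescue SystemCallError => e
  # The string is looked up as a literal filename, so nothing is executed.
  puts "refused: #{e.class}"
end

# Ordinary files are still readable.
Tempfile.create("notes") do |file|
  file.write("hello")
  file.flush
  puts read_untrusted_path(file.path)
end
```

The same reasoning applies to any code that passes user-supplied strings to Kernel#open: prefer File.open (or validate the input) when only files are expected.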
[MRI] Pulled in upstream patch from libxslt that addresses CVE-2019-11068. Full details are available in #1892. Note that this patch is not yet (as of 2019-04-22) in an upstream release of libxslt.
Procs in many methods. [#1776] (@chopraanmol1)
:has() now correctly matches against any descendant. Previously this selector matched against only direct children. [#350] (@Phrogz)
NodeSet#attr now returns nil if it's empty. Previously this raised a NoMethodError.
XSLT::Stylesheet#transform. Previously these errors were suppressed which led to silent failures and a subsequent segfault. [#1802]
XML::DocumentFragment#dup no longer returned an instance of the callee's class, instead always returning an XML::DocumentFragment. This notably broke any subclass of XML::DocumentFragment including HTML::DocumentFragment as well as the Loofah gem's Loofah::HTML::DocumentFragment. [#1846]
/test) from the packaged gems. [#1719] (@stevecrozz)
XML::Attr#value= allows HTML node attribute values to be set to either a blank string or an empty boolean attribute. [#1800]
XML::Node#wrap which does what XML::NodeSet#wrap has always done, but for a single node. [#1531] (@ethirajsrinivasan)
Node#dup supports copying a node directly to a new document. See the method documentation for details.
DocumentFragment#dup is now more memory-efficient, avoiding making unnecessary copies. [#1063]
NodeSet has been rewritten to improve performance! [#1795]
NodeSet#each now returns self instead of zero. [#1822] (@olehif)
XML::Builder to create nodes with namespaces. [#1810]
RbConfig::CONFIG instead of ::MAKEFILE_CONFIG to fix installations that use Makefile macros. [#1820] (@nobu)
~> 2.3.0 to ~> 2.4.0
[MRI] Pulled in upstream patches from libxml2 that address CVE-2018-14404 and CVE-2018-14567. Full details are available in #1785. Note that these patches are not yet (as of 2018-10-04) in an upstream release of libxml2.
[MRI] Behavior in libxml2 has been reverted which caused CVE-2018-8048 (loofah gem), CVE-2018-3740 (sanitize gem), and CVE-2018-3741 (rails-html-sanitizer gem). The commit in question is here:
and more information is available about this commit and its impact here:
This release simply reverts the libxml2 commit in question to protect users of Nokogiri's vendored libraries from similar vulnerabilities.
If you're offended by what happened here, I'd kindly ask that you comment on the upstream bug report here:
[MRI] Vendored libxml2 upgraded to v2.9.8 which addresses CVE-2016-9318 [#1582].
Node#classes, #add_class, #append_class, and #remove_class are added.
NodeSet#append_class is added.
NodeSet#remove_attribute is a new alias for NodeSet#remove_attr.
NodeSet#each now returns an Enumerator when no block is passed (@park53kr)
Reader [#898]
Node#replace to insert Comment and CDATA nodes. [#1666]
Node, Sax::PushParser, and the JRuby implementation [#1708, #1710, #1501]
[MRI] The update of vendored libxml2 from 2.9.5 to 2.9.7 addresses at least one published vulnerability, CVE-2017-15412. [#1714 has complete details]
Node#serialize once again returns UTF-8-encoded strings. [#1659]
~> 1.1 (from ~> 1.1.7). [#1660]
~> 2.2.0 to ~> 2.3.0, which will validate checksums on the vendored libxml2 and libxslt tarballs before using them.
NodeSet#first with an integer argument longer than the length of the NodeSet now correctly clamps the length of the returned NodeSet to the original length. [#1650] (@Derenge)
content argument is not implicitly convertible into a string. [#1669]
This release ends support for Ruby 2.1 on Windows in the x86-mingw32 and x64-mingw32 platform gems (containing pre-compiled DLLs). Official support ended for Ruby 2.1 on 2017-04-01.
Please note that this deprecation note only applies to the precompiled Windows gems. Ruby 2.1 continues to be supported (for now) in the default gem when compiled on installation.
~> 2.1.0 to ~> 2.2.0
jruby --1.8 code paths. [#1607] (@kares)
NodeSet#clone is now an alias for NodeSet#dup [#1503] (@stephankaag)
PushParser#replace_entities and #replace_entities= will control whether entities are replaced or not. [#1017] (@spraints)
SyntaxError#to_s now includes line number, column number, and log level if made available by the parser. [#1304, #1637] (@spk and @ccarruitero)
HTML::SAX::Parser#parse_io now correctly parses HTML and not XML [#1577] (Thanks for the test case, @gregors)
lib64 site config. [#1562]
XML::Attr.new checks type of Document arg to prevent segfaults. [#1477]
#to_html, #to_s, et al) a document with explicit encoding now works correctly. [#1281, #1440] (@kares)
XML::Reader now returns parse errors [#1586] (@kares)
NodeSets are now decorated properly. [#1319] (@kares)
[MRI] Upstream libxslt patches are applied to the vendored libxslt 1.1.29 which address CVE-2017-5029 and CVE-2016-4738.
For more information:
[MRI] Upstream libxml2 patches are applied to the vendored libxml 2.9.4 which address CVE-2016-4658 and CVE-2016-5131.
For more information:
This release ends support for:
Removes required dependency on the pkg-config gem. This dependency
was introduced in v1.6.8 and, because it's distributed under LGPL, was
objectionable to many Nokogiri users (#1488, #1496).
This version makes pkg-config an optional dependency. If it's
installed, it's used; but otherwise Nokogiri will attempt to work
around its absence.
[MRI] Bundled libxml2 is upgraded to 2.9.4, which fixes many security issues. Many of these had previously been patched in the vendored libxml 2.9.2 in the 1.6.7.x branch, but some are newer.
See these libxml2 email posts for more:
For a more detailed analysis, you may care to read Canonical's take on these security issues:
[MRI] Bundled libxslt is upgraded to 1.1.29, which fixes a security issue as well as many long-known outstanding bugs, some features, some portability improvements, and general cleanup.
See this libxslt email post for more:
Several changes were made to improve performance:
NodeSet#to_a with a minor speed-up. [#1397]
XML::Node#ancestors optimization. [#1297] (Bruno Sutic)
Symbol#to_proc where we weren't previously. [#1296] (Bruno Sutic)
XML::DTD#each uses implicit block calls. (@glaucocustodio)
pkg-config gem if we're having trouble finding the system libxml2. This should help many FreeBSD users. [#1417]
NodeSet#drop [#1042] (@mkristian)
style tags are no longer encoded [#1316] (@tbeauvais)
libxml-ruby gem's global callbacks were smashing the heap. [#1426]. (Thanks to @bbergstrom for providing an isolated test case)
Sax::Parser xmldecl callback. [#844]
This version pulls in several upstream patches to the vendored libxml2 and libxslt to address:
Ubuntu classifies this as "Priority: Low", RedHat classifies this as "Impact: Moderate", and NIST classifies this as "Severity: 5.0 (MEDIUM)".
MITRE record is https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-7499
This version pulls in several upstream patches to the vendored libxml2 and libxslt to address:
See also http://www.ubuntu.com/usn/usn-2834-1/
This version supports native builds on Windows using the RubyInstaller DevKit. It also supports Ruby 2.2.x on Windows, as well as making several other improvements to the installation process on various platforms.
This version also includes the security patches already applied in v1.6.6.3 and v1.6.6.4 to the vendored libxml2 and libxslt source. See #1374 and #1376 for details.
config.guess files brought up to date. [#1326] (@hernan-erasmo)
iconv.h. (#1206, #1210, #1218, #1345) (@neonichu)
Document#parse should support IO objects that respond to #read. [#1124] (Jake Byman)
id attribute on HTML documents are now silenced. [#1262]
This version pulls in an upstream patch to the vendored libxml2 to address:
This issue was assigned CVE-2015-8710 after the fact. See http://seclists.org/oss-sec/2015/q4/616 for details.
This version pulls in several upstream patches to the vendored libxml2 and libxslt to address:
See #1374 for details.
Note that 1.6.6.0 was not released.
Node and NodeSet implementations of #search, #xpath and #css.Node#lang and Node#lang=.bin/nokogiri passes the URI to parse() if an HTTP URL is given.bin/nokogiri now loads ~/.nokogirirc so user can define helper methods, etc.bin/nokogiri can be configured to use Pry instead of IRB by adding a couple of lines to ~/.nokogirirc. [#1198]bin/nokogiri can better handle urls from STDIN (aiding use of xargs). [#1065]DocumentFragment#search now matches against root nodes. [#1205]DocumentFragment#dup. [#1196]XML::Comment.new argument types are now consistent and safe (and documented) across MRI and JRuby. [#1224]zlib is available before building libxml2. [#1188]Slop#respond_to_missing?. [#1176]an+b CSS query.Document#dup in Document#errors. [#1196]Document#canonicalize parameters are now consistent with MRI. [#1189]nokogiri --version will include a list of applied patches.DocumentFragment#element_children [#1138].Node#document? and Node#processing_instruction?libxml-ruby and nokogiri together in multi-threaded environment. [#895] (@ender672)Node#parse now works again for HTML document nodes (broken in 1.6.2+).Node#add_next_sibling.mini_portile to address the git dependency detailed in [#1102].A set of security and bugfix patches have been backported from the libxml2 and libxslt repositories onto the version of 2.8.0 packaged with Nokogiri, including these notable security fixes:
It is recommended that you upgrade from 1.6.x to this version as soon as possible.
Now requires libxml >= 2.6.21 (was previously >= 2.6.17).
extconf.rb. [#923]extconf.rb. [#952]:not() functions in selectors. [#887] (Magnus Bergmark)extconf.rb option --use-system-libraries, alternative to setting the environment variable NOKOGIRI_USE_SYSTEM_LIBRARIES.Nokogiri::HTML::Document#title= and #meta_encoding= now always add an element if not present, trying hard to find the best place to put it.Nokogiri::XML::DTD#html_dtd? and #html5_dtd? are added.Nokogiri::XML::Node#prepend_child is added. [#664]Nokogiri::XML::SAX::ParserContext#recovery is added. [#453]XML::Node#namespace. [#803, #802] (Hoylen Sue)Nokogiri::XML::Node#parse from unparented non-element nodes. [#407]extconf.rb [#931] (Shota Fukumori)Nokogiri.parse() does not mistake a non-HTML document like a RSS document as HTML document. [#932] (Yamagishi Kazutoshi)SyntaxError in strict mode. [#1005]This release was based on v1.5.10 and 1.6.0.rc1, and contains changes mentioned in both.
This release was based on v1.5.9, and so does not contain any fixes mentioned in the notes for v1.5.10.
nokogiri -v) exposes whether libxml was compiled from packaged source, or the system library was used.Nokogiri::XML::Builder [#868]SAX::Parser.parse_io throw an error when used with lower case encoding. [#828]Nokogiri::XML::Reader broken (as a pull parser) on jruby - reads the whole XML document. [#831]Node#content= incompatibility. [#839]EntityReference after a Text node mangles the entity in JRuby. [#835]XML::Document#collect_namespaces. [#761] (Juergen Mangler)SAX::Document#processing_instruction (Kitaiti Makoto)Node#native_content= allows setting unescaped node content. [#768]XML::Node#[]= stringifies values. [#729] (Ben Langfeld.)bin/nokogiri will process a document from $stdinbin/nokogiri -e will execute a program from the command linebin/nokogiri --version will print the Xerces and NekoHTML versions.Nokogiri::XML::Node#content inconsistency between Java and C. [#794, #797]Node#content now renders newlines properly. [#737] (Piotr Szmielew)Nokogiri::XML::Document#wrap raises undefined method `length' for nil:NilClass when trying to << to a node. [#781]NodeSet, decorating NodeSet's base document raises exception. [#514]RDF::RDFXML::Writer. [#683]--rng option. [#675] (Dan Radez)-Werror=format-security. [#680].at() and search(). [#690](MRI) Default parse options for XML documents were changed to not make network connections during document parsing, to avoid XXE vulnerability. [#693]
To re-enable this behavior, the configuration method nononet may be called, like this:
Nokogiri::XML::Document.parse(xml) { |config| config.nononet }
Insert your own joke about double-negatives here.
Nokogiri::XML::Node#css now works for XML documents with default namespaces when the rule contains attribute selector without namespace.
Nokogiri::XML::Reader#outer_xml is broken in JRuby [#617]
Nokogiri::XML::Attribute on JRuby returns a nil namespace [#647]
Nokogiri::XML::Node#namespace= cannot set a namespace without a prefix on JRuby [#648]
HTML::Document#meta_encoding does not raise exception on docs with malformed content-type. [#655]
Repackaging of 1.5.1 with a gemspec that is compatible with older Rubies. [#631, #632]
XML::Builder#comment allows creation of comment nodes.XML::Document.wrap and XML::Document#to_java methods are available.nokogiri cli utility. [#591] (Dan Radez)Nokogiri::XML::Node on JRuby (1.6.4/5) fails [#560]XML::Attr nodes are not allowed to be added as node children, so an exception is raised. [#558]Node#add_next_sibling and Node#add_previous_sibling calls. [#595].<p /></p> when tag is empty [#557]Document#add_child now accepts a Node, NodeSet, DocumentFragment, or String. [#546].Document#create_element now recognizes namespaces containing non-word characters (like "SOAP-ENV"). This is mostly relevant to users of Builder, which calls Document#create_element for nearly everything. [#531].::HTML [#542]Node#to_xml does not override :save_with if it is provided. [#505]Node#set is a private method (JRuby). [#564] (Nick Sieger)Node#canonicalize (Ivan Pirlik) [#563]Node::SaveOptions into Node::SaveOptions::DEFAULT_{X,H,XH}TML (refactor)NodeSets with null nodes member safe to operate on. [#443]<meta charset="...">.Node#inner_text no longer returns nil. (JRuby) [#264]Nokogiri::HTML::Document#title accessor gets and sets the document title.Node::SaveOptions into Node::SaveOptions::DEFAULT_{X,H,XH}TML (refactor)Nokogiri::XML::Schema#validate. [#406]Node#serialize-and-friends now accepts a SaveOption object as the, erm, save object.Nokogiri::CSS::Parser has-a Nokogiri::CSS::Tokenizerstart_element() callback (currently used for HTML::SAX::Parser) pass attributes in assoc array, just as emulated start_element() callback does. rel. [#356]HTML::SAX::Parser should call back a block given to parse*() if any, just as XML::SAX::Parser does.Document#remove_namespaces! now handles attributes with namespaces. [#396]XSLT::Stylesheet#transform no longer segfaults when handed a non-XML::Document. [#452]XML::Reader no longer segfaults when under GC pressure. 
[#439]XML::Node#children= sets the node's inner html (much like #inner_html=), but returns the reparent node(s).XML::Reader node type constants. [#369]XML::DTD#attributes returns an empty hash instead of nil when there are no attributes.XML::DTD#{keys,each} now work as expected. [#324]{XML,HTML}::DocumentFragment.{new,parse} no longer strip leading and trailing whitespace. [#319]XML::Node#{add_child,add_previous_sibling,add_next_sibling,replace} return a NodeSet when passed a string.XML::Node#{replace,add_previous_sibling,add_next_sibling} edge cases fixed related to libxml's text node merging. [#308]Slop decorator to work with previously defined methods. [#330]nth-last-{child,of-type} CSS selectors when NOT using an+b notation. [#354]SAX::Document#start_element. [#356]NodeSet#wrap on nodes within a fragment. [#331]XML::Reader#empty_element? returns true for empty elements. [#262]Node#remove_namespaces! now removes namespace declarations as well. [#294]NodeSet#at_xpath, NodeSet#at_css and NodeSet#> do what the corresponding methods of Node do.XML::NodeSet#{include?,delete,push} accept an XML::NamespaceXML::Document#parse added for parsing in the context of a documentXML::DocumentFragment#inner_html= works with contextual parsing! [#298, #281]lib/nokogiri/css/parser.y Combined CSS functions + pseudo selectors fixedxmlFirstElementChild et al. [#303]XML::Attr#add_namespace now works as expected. [#252]HTML::DocumentFragment uses the string's encoding. [#305]XML::Node#parse will parse XML or HTML fragments with respect to the context node.XML::Node#namespaces returns all namespaces defined in the node and all ancestor nodes (previously did not return ancestors' namespace definitions).Enumerable to XML::NodeNokogiri::XML::Schema#validate now uses xmlSchemaValidateFile if a filename is passed, which is faster and more memory-efficient. [#219]XML::Document#create_entity will create new EntityDecl objects. 
[#174]ObjectSpace._id2ref, instead using Charles Nutter's rocking Weakling gem.Nokogiri::XML::Node#first_element_child fetch the first child node that is an ELEMENT node.Nokogiri::XML::Node#last_element_child fetch the last child node that is an ELEMENT node.Nokogiri::XML::Node#elements fetch all children nodes that are ELEMENT nodes.Nokogiri::XML::Node#add_child, #add_previous_sibling, #before, #add_next_sibling, #after, #inner_html, #swap and #replace all now accept a Node, DocumentFragment, NodeSet, or a string containing markup.Node#fragment? indicates whether a node is a DocumentFragment.XML::NodeSet is now always decorated (if the document has decorators). [#198]XML::NodeSet#slice gracefully handles offset+length larger than the set length. [#200]XML::Node#content= safely unlinks previous content. [#203]XML::Node#namespace= takes nil as a parameterXML::Node#xpath returns things other than NodeSet objects. [#208]XSLT::StyleSheet#transform accepts hashes for parameters. [#223]not() work. [#205]XML::Builder doesn't break when nodes are unlinked. [#228] (vihai)XML::DocumentFragment uses XML::Node#parse to determine children.Node#replace returns the new child node as claimed in the RDoc. Previously returned +self+.Nokogiri::LIBXML_ICONV_ENABLEDNode#[] to Node#attrXML::Node#next_element addedXML::Node#> added for searching a nodes immediate childrenXML::NodeSet#reverse addedNode#add_child, Node#add_next_sibling, Node#add_previous_sibling, and Node#replace.XML::Node#previous_element implemented:has()XML::NodeSet#filter() was addedXML::Node.next= and .previous= are aliases for add_next_sibling and add_previous_sibling. [#183]Node#matches? works in nodes contained by a DocumentFragment. [#158]Document should not define add_namespace() method. [#169]XPath queries returning namespace declarations do not segfault.Node#replace works with nodes from different documents. 
[#162]XML::Document#collect_namespacesXML::Node#next_element for certain edge casesXSLT#apply_to will honor the "output method". (richardlehane)Node#at_xpath returns the first element of the NodeSet matching the XPath expression.Node#at_css returns the first element of the NodeSet matching the CSS selector.NodeSet#| for unions [#119] (Serabe)NodeSet#inspect makes prettier outputNode#inspect implemented for more rubyish document inspectingXML::DTD#external_idXML::DTD#system_idXML::ElementContent for DTD Element content validityNokogiri::XML::BuilderXML::Node#external_subsetXML::Node#create_external_subsetXML::Node#create_internal_subsetXML::SAX::ParserContext addedXML::Document#remove_namespaces! for the namespace-impairedRbConfig::CONFIG['host_os'] to adjust ENV['PATH'] [#113]NodeSet#search is more efficient [#119] (Serabe)NodeSet#xpath handles custom xpath functionsXML::Reader gets attributes for current nodeNode#inner_html takes the same arguments as Node#to_html [#117]DocumentFragment#css delegates to it's child nodes [#123]NodeSet#[] works with slices larger than NodeSet#length [#131]XML::Document to NodeSetXML::SyntaxError can be duplicated. [#148]NodeSet#children returns all children of all nodesParseOption#strict fixedNode#inner_html= [#88]Nokogiri::XML::DTD#validate will validate your documentNokogiri::XML::NodeSet#search will search top level nodes. [#73]Nokogiri::XML::DocumentNokogiri::XML::Document#clone is now an alias of dupNokogiri::XML::SAX::Document#start_element_ns is deprecated, please switch to Nokogiri::XML::SAX::Document#start_element_namespaceNokogiri::XML::SAX::Document#end_element_ns is deprecated, please switch to Nokogiri::XML::SAX::Document#end_element_namespaceextconf.rb checks for optional RelaxNG and Schema functionsNokogiri::XML::Node#<=> compares nodes based on Document positionNokogiri::XML::Node#matches? 
returns true if Node can be found with given selector.Nokogiri::XML::Node#ancestors now returns an Nokogiri::XML::NodeSetNokogiri::XML::Node#ancestors will match parents against optional selectorNokogiri::HTML::Document#meta_encoding for getting the meta encodingNokogiri::HTML::Document#meta_encoding= for setting the meta encodingNokogiri::XML::Document#encoding= to set the document encodingNokogiri::XML::Schema for validating documents against XSD schemaNokogiri::XML::RelaxNG for validating documents against RelaxNG schemaNokogiri::HTML::ElementDescription for fetching HTML element descriptionsNokogiri::XML::Node#description to fetch the node descriptionNokogiri::XML::Node#accept implements Visitor patternbin/nokogiri for easily examining documents (Yutaka HARA)Nokogiri::XML::NodeSet now supports more Array and Enumerable operators: index, delete, slice, - (difference), + (concatenation), & (intersection), push, pop, shift, ==Nokogiri.XML, Nokogiri.HTML take blocks that receive Nokogiri::XML::ParseOptions objectsNokogiri::XML::Node#namespace returns a Nokogiri::XML::NamespaceNokogiri::XML::Node#namespace= for setting a node's namespaceNokogiri::XML::DocumentFragment and Nokogiri::HTML::DocumentFragment have a sensible API and a more robust implementation.Node#{before/after/inner_html=}. 
[#35]Node#newNokogiri::XML::NodeSet#dup works [#10]Nokogiri::HTML returns an empty Document when given a blank string [#11]XSD::XMLParser::NokogiriNokogiri::XML::Node#inner_html= to set the inner html for a nodeNokogiri::XML::Node#swap swaps html for current node [LH#50]Nokogiri::HTML.fragment will properly handle text only nodes [LH#43]Nokogiri::XML::Node#before will prepend text nodes [LH#44]Nokogiri::XML::Node#after will append text nodesNokogiri::XML::Node#search automatically registers root namespaces [LH#42]Nokogiri::XML::NodeSet#search automatically registers namespacesNokogiri::HTML::NamedCharacters delegates to libxml2Nokogiri::XML::Node#[] can take a symbol [LH#48]Nokogiri::XML::Node#[]= should not encode entities [LH#55]Document#dup should create a new document of the same type [LH#59]Document should not have a parent method [LH#64]Nokogiri::XML::Document#encoding get encoding used for this documentNokogiri::XML::Document#url get the document urlNokogiri::XML::Node#add_namespace add a namespace to the node [LH#38]Nokogiri::XML::Node#each iterate over attribute name, value pairsNokogiri::XML::Node#keys get all attribute namesNokogiri::XML::Node#line get the line number for a node (Dirkjan Bussink)Nokogiri::XML::Node#serialize now takes an optional encoding parameterNokogiri::XML::Node#to_html, to_xml, and to_xhtml take an optional encodingNokogiri::XML::Node#to_strNokogiri::XML::Node#to_xhtml to produce XHTML documentsNokogiri::XML::Node#values get all attribute valuesNokogiri::XML::Node#write_to writes the node to an IO object with optional encodingNokogiri::XML::ProcessingInstruction.newNokogiri::XML::SAX::PushParser for all your push parsing needs.Nokogiri::XML::Document#dupNokogiri::XML::Node.new_from_str will be deprecated in 1.3.0Nokogiri::HTML.fragment now returns an XML::DocumentFragment [LH#32]XML::Node#elem?XML::Node#attribute_nodesXML::AttrXML::Node#delete added.XML::NodeSet#inner_html added.CSS::SelectorHandler and 
XML::XPathHandlerXML::Node#attributes returns an Attr node for the value.XML::NodeSet implements to_xmlNokogiri::XML::Node#xpathNokogiri::XML::Node#cssNokogiri::XML::Node#<< will add a child to the current nodeXML::Node#to_xml now takes an indentation argumentXML::Node#dup takes an optional depth argumentXML::Node#add_previous_sibling returns new sibling node.Nokogiri() should delegate to Nokogiri.parse()ENV['PATH'] on windowsSyntaxError on parse failureSyntaxError on parse failurefilter() and not() hpricot compatibility addedNode#search are now always relativeREADME.txtENV['PATH'] on windows if it doesn't existNodeSet#[] on DocumentNodeSet now implements to_aryXML::Document should not implement parentinner_html fixed. (Thanks Yehuda!)extconf.rb should not check for frex and raccextconf.rb searched libdir and prefix so that ports libxml/ruby will link properly. Thanks lucsky!